[jira] Commented: (HDFS-347) DFS read performance suboptimal when client co-located on nodes with data
[ https://issues.apache.org/jira/browse/HDFS-347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12994199#comment-12994199 ]

ryan rawson commented on HDFS-347:
----------------------------------

In a test with 15 threads of HBase clients, latency drops from 12.1 ms to 6.9 ms with this patch. Based on my report to the user@hbase list, there are a few people pulling down my patched Hadoop variant who want to test and run with it. Going by the iceberg theory of interest, this is one of the most in-demand things I've seen in a while -- people want it NOW.

DFS read performance suboptimal when client co-located on nodes with data
-------------------------------------------------------------------------

Key: HDFS-347
URL: https://issues.apache.org/jira/browse/HDFS-347
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: George Porter
Assignee: Todd Lipcon
Attachments: BlockReaderLocal1.txt, HADOOP-4801.1.patch, HADOOP-4801.2.patch, HADOOP-4801.3.patch, HDFS-347-branch-20-append.txt, all.tsv, hdfs-347.png, hdfs-347.txt, local-reads-doc

One of the major strategies Hadoop uses to get scalable data processing is to move the code to the data. However, putting the DFS client on the same physical node as the data blocks it acts on doesn't improve read performance as much as expected. After looking at Hadoop and O/S traces (via HADOOP-4049), I think the problem is that the HDFS streaming protocol causes many more read I/O operations (iops) than necessary.

Consider the case of a DFSClient fetching a 64 MB disk block from the DataNode process (running in a separate JVM) on the same machine. The DataNode satisfies the single disk-block request by sending data back to the HDFS client in 64 KB chunks. In BlockSender.java this is done in the sendChunk() method, which relies on Java's transferTo() method. Depending on the host O/S and JVM implementation, transferTo() is implemented as either a sendfilev() syscall or a pair of mmap() and write() calls. In either case, each chunk is read from the disk with a separate I/O operation, so the single request for a 64 MB block ends up hitting the disk as over a thousand smaller 64 KB requests. Since the DFSClient runs in a different JVM and process than the DataNode, shuttling data from the disk to the DFSClient also incurs context switches each time network packets are sent (each 64 KB chunk turns into a large number of 1500-byte packet send operations). Thus we see a large number of context switches for each block-send operation.

I'd like to get some feedback on the best way to address this, but I think the answer is a mechanism for a DFSClient to directly open data blocks that happen to be on the same machine. It could do this by examining the set of LocatedBlocks returned by the NameNode and marking those that should be resident on the local host. Since the DataNode and DFSClient (probably) share the same Hadoop configuration, the DFSClient should be able to find the files holding the block data, open them directly, and read the data itself. This would avoid the context switches imposed by the network layer and would allow much larger read buffers than 64 KB, which should reduce the number of iops per block read.

--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
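The chunk-size arithmetic in the issue description above can be made concrete with a back-of-the-envelope sketch (this is not HDFS code; the 4 MB "large buffer" figure is illustrative):

```java
// Back-of-the-envelope sketch, not HDFS code: how many read operations a
// 64 MB block costs at different buffer sizes. Fewer reads means fewer
// iops against the disk, which is the core of the complaint above.
public class LocalReadIops {
    // Reads needed to cover blockSize bytes at the given buffer size
    // (ceiling division).
    static long readsNeeded(long blockSize, int bufferSize) {
        return (blockSize + bufferSize - 1) / bufferSize;
    }

    public static void main(String[] args) {
        long block = 64L * 1024 * 1024;                          // one 64 MB HDFS block
        System.out.println(readsNeeded(block, 64 * 1024));       // 64 KB chunks: 1024 reads
        System.out.println(readsNeeded(block, 4 * 1024 * 1024)); // 4 MB buffer: 16 reads
    }
}
```

A local reader that opens the block file itself could pick whatever buffer size it likes, which is where the "over a thousand smaller requests" collapse to a handful.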
[jira] Commented: (HDFS-347) DFS read performance suboptimal when client co-located on nodes with data
[ https://issues.apache.org/jira/browse/HDFS-347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992766#comment-12992766 ]

ryan rawson commented on HDFS-347:
----------------------------------

dhruba, I am not seeing the file src/hdfs/org/apache/hadoop/hdfs/metrics/DFSClientMetrics.java in branch-20-append (nor cdh3b2). I also got a number of rejects; here are some highlights:

ClientDatanodeProtocol: your variant has copyBlock, ours does not (hence the rej).
Misc field differences in DFSClient, including the metrics object.

After resolving them I was able to get it up and going. I'm not able to get the unit test to pass; I'm guessing it's this:

2011-02-09 14:35:49,926 DEBUG hdfs.DFSClient (DFSClient.java:fetchBlockByteRange(1927)) - fetchBlockByteRange shortCircuitLocalReads true localhst h132.sfo.stumble.net/10.10.1.132 targetAddr /127.0.0.1:62665

Since we don't recognize that we are 'local', we take the normal read path, which is failing. Any tips?
[jira] Commented: (HDFS-347) DFS read performance suboptimal when client co-located on nodes with data
[ https://issues.apache.org/jira/browse/HDFS-347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992786#comment-12992786 ]

ryan rawson commented on HDFS-347:
----------------------------------

Applying this patch to branch-20-append, the unit test passes. Still trying to figure out why it works in one place and not the other. The patch is pretty dang simple too.
[jira] Commented: (HDFS-347) DFS read performance suboptimal when client co-located on nodes with data
[ https://issues.apache.org/jira/browse/HDFS-347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992801#comment-12992801 ]

ryan rawson commented on HDFS-347:
----------------------------------

OK, this was my bad -- I applied the patch wrong. The unit test passes. I'll attach a patch for others.
[jira] Updated: (HDFS-347) DFS read performance suboptimal when client co-located on nodes with data
[ https://issues.apache.org/jira/browse/HDFS-347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ryan rawson updated HDFS-347:
-----------------------------

Attachment: HDFS-347-branch-20-append.txt

Applies to head of branch-20-append.
[jira] Commented: (HDFS-1052) HDFS scalability with multiple namenodes
[ https://issues.apache.org/jira/browse/HDFS-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900605#action_12900605 ]

ryan rawson commented on HDFS-1052:
-----------------------------------

Have you considered merely increasing the heap size? Switch to a no-pause GC collector; one can be found at http://www.managedruntime.org/ Right now a machine can have 256 GB of RAM for ~$10,000. That is a 4x increase over what we have now. Added bonus: no additional complexity!

HDFS scalability with multiple namenodes
----------------------------------------

Key: HDFS-1052
URL: https://issues.apache.org/jira/browse/HDFS-1052
Project: Hadoop HDFS
Issue Type: New Feature
Components: name-node
Affects Versions: 0.22.0
Reporter: Suresh Srinivas
Assignee: Suresh Srinivas
Attachments: Block pool proposal.pdf, Mulitple Namespaces5.pdf

HDFS currently uses a single namenode, which limits the scalability of the cluster. This jira proposes an architecture to scale the nameservice horizontally using multiple namenodes.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1320) Add LOG.isDebugEnabled() guard for each LOG.debug(...)
[ https://issues.apache.org/jira/browse/HDFS-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892557#action_12892557 ]

ryan rawson commented on HDFS-1320:
-----------------------------------

Does the JVM not optimize for this case in the fast path?

Add LOG.isDebugEnabled() guard for each LOG.debug(...)
------------------------------------------------------

Key: HDFS-1320
URL: https://issues.apache.org/jira/browse/HDFS-1320
Project: Hadoop HDFS
Issue Type: Improvement
Affects Versions: 0.22.0
Reporter: Erik Steffl
Fix For: 0.22.0
Attachments: HDFS-1320-0.22.patch

Each LOG.debug(...) should be executed only if LOG.isDebugEnabled() is true; in some cases it's expensive to construct the string that is being printed to the log. It's much easier to always use LOG.isDebugEnabled(), because that is easier to check than reasoning in each case about whether the guard is necessary.
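On the fast-path question: in general the JIT cannot skip the work, because Java evaluates method arguments eagerly, and building the message string may involve calls with side effects. A minimal sketch with a stand-in logger (not the real commons-logging Log) shows what the guard buys:

```java
// Stand-in logger, not the real commons-logging Log: demonstrates that the
// unguarded call pays for message construction even with debug logging off.
public class DebugGuardDemo {
    static class Log {
        boolean isDebugEnabled() { return false; } // debug logging is off
        void debug(Object msg) { /* message discarded */ }
    }

    static int expensiveCalls = 0;
    static String expensiveState() { expensiveCalls++; return "huge state dump"; }

    public static void main(String[] args) {
        Log log = new Log();

        // Unguarded: the argument string is built before debug() is called,
        // so expensiveState() runs even though the message is discarded.
        log.debug("state: " + expensiveState());

        // Guarded: the argument expression is never evaluated.
        if (log.isDebugEnabled()) {
            log.debug("state: " + expensiveState());
        }

        System.out.println(expensiveCalls); // only the unguarded call ran
    }
}
```

The JIT can sometimes inline and eliminate trivially dead argument construction, but it cannot do so when the construction has observable effects, which is exactly the expensive case the issue is about.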
[jira] Commented: (HDFS-1052) HDFS scalability with multiple namenodes
[ https://issues.apache.org/jira/browse/HDFS-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848562#action_12848562 ]

ryan rawson commented on HDFS-1052:
-----------------------------------

This sounds great! Also, as part of the architecture, can you explain how you will improve availability?
[jira] Commented: (HDFS-1051) Umbrella Jira for Scaling the HDFS Name Service
[ https://issues.apache.org/jira/browse/HDFS-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847653#action_12847653 ]

ryan rawson commented on HDFS-1051:
-----------------------------------

Perhaps we could think about using Zab from ZooKeeper to keep a cluster of NewNameNodes in sync all the time? Instead of doing failure detection, failover, and recovery, we could keep 2N+1 nodes always up to date. No recovery would be necessary during node failover, and rolling restarts and machine moves would be doable (just like in ZK). This could reduce the deployment complexity over some of the other options.

For metadata scalability in RAM, a possibility would be to leverage flash disk in some capacity. Swap, mmap, or explicit local files in flash would be reasonably fast and could extend the serviceable life of the "one machine holds all metadata" architecture.

Umbrella Jira for Scaling the HDFS Name Service
-----------------------------------------------

Key: HDFS-1051
URL: https://issues.apache.org/jira/browse/HDFS-1051
Project: Hadoop HDFS
Issue Type: New Feature
Affects Versions: 0.22.0
Reporter: Sanjay Radia
Assignee: Sanjay Radia
Fix For: 0.22.0

The HDFS Name service currently uses a single Namenode, which limits its scalability. This is a master jira to track sub-jiras to address this problem.
[jira] Commented: (HDFS-1051) Umbrella Jira for Scaling the HDFS Name Service
[ https://issues.apache.org/jira/browse/HDFS-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847655#action_12847655 ]

ryan rawson commented on HDFS-1051:
-----------------------------------

One other note: most people have problems with the availability of the NameNode, not the scalability of the cluster (i.e., medium data rather than big data). I would argue that availability of the namenode is the highest priority and is holding HDFS back from widespread adoption.
[jira] Commented: (HDFS-1002) Secondary Name Node crash, NPE in edit log replay
[ https://issues.apache.org/jira/browse/HDFS-1002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846525#action_12846525 ]

ryan rawson commented on HDFS-1002:
-----------------------------------

We create the directory first; here is the relevant code (StoreFile.java:398 in HBase):

if (!fs.exists(dir)) {
  fs.mkdirs(dir);
}
Path path = getUniqueFile(fs, dir);
return new HFile.Writer(fs, path, blocksize,
    algorithm == null ? HFile.DEFAULT_COMPRESSION_ALGORITHM : algorithm,
    c == null ? KeyValue.KEY_COMPARATOR : c);

In this case getUniqueFile generates that 9207375265821366914 filename. The constructor of HFile.Writer() will create the file using fs.create(path). Also, 'dir' == /hbase/stumbles_by_userid/compaction.dir/378232123 in this context. So yes, we create the directory before creating the file.

Secondary Name Node crash, NPE in edit log replay
-------------------------------------------------

Key: HDFS-1002
URL: https://issues.apache.org/jira/browse/HDFS-1002
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 0.21.0
Reporter: ryan rawson
Fix For: 0.21.0
Attachments: snn_crash.tar.gz, snn_log.txt

An NPE in the SNN; the core of the message looks like so:

2010-02-25 11:54:05,834 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: java.lang.NullPointerException
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1152)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1164)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addNode(FSDirectory.java:1067)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedAddFile(FSDirectory.java:213)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadEditRecords(FSEditLog.java:511)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:401)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:368)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1172)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.doMerge(SecondaryNameNode.java:594)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.access$000(SecondaryNameNode.java:476)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doMerge(SecondaryNameNode.java:353)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:317)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:219)
at java.lang.Thread.run(Thread.java:619)

This happens even if I restart the SNN over and over again.
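For reference, the order of operations described in that comment, sketched in plain java.io terms (illustrative only; the real code goes through Hadoop's FileSystem and HFile.Writer, and the method names below are hypothetical):

```java
import java.io.File;
import java.io.IOException;

// Plain-java sketch of the ordering in StoreFile.java: the directory is
// created first, then a unique file inside it. Names are illustrative.
public class CreateOrderSketch {
    static File writeNewStoreFile(File dir, String name) throws IOException {
        if (!dir.exists()) {
            dir.mkdirs();                // directory before file, as in StoreFile.java
        }
        File path = new File(dir, name); // stands in for getUniqueFile()
        path.createNewFile();            // stands in for fs.create() in HFile.Writer
        return path;
    }

    public static void main(String[] args) throws IOException {
        File dir = new File(System.getProperty("java.io.tmpdir"), "compaction.dir");
        File f = writeNewStoreFile(dir, "9207375265821366914");
        System.out.println(f.exists());
    }
}
```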
[jira] Commented: (HDFS-265) Revisit append
[ https://issues.apache.org/jira/browse/HDFS-265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846704#action_12846704 ]

ryan rawson commented on HDFS-265:
----------------------------------

The head of the mail conversation is here: http://mail-archives.apache.org/mod_mbox/hadoop-general/201002.mbox/dfe484f01002161412h8bc953axee2a73d81a234...@mail.gmail.com

There was no close to the discussion. At this point there do not seem to be any plans for Hadoop 0.21.

Revisit append
--------------

Key: HDFS-265
URL: https://issues.apache.org/jira/browse/HDFS-265
Project: Hadoop HDFS
Issue Type: Improvement
Affects Versions: 0.21.0
Reporter: Hairong Kuang
Assignee: Hairong Kuang
Fix For: 0.21.0
Attachments: a.sh, appendDesign.pdf, appendDesign.pdf, appendDesign1.pdf, appendDesign2.pdf, AppendSpec.pdf, AppendTestPlan.html, AppendTestPlan.html, AppendTestPlan.html, AppendTestPlan.html, AppendTestPlan.html, AppendTestPlan.html, AppendTestPlan.html, AppendTestPlan.html, AppendTestPlan.html, TestPlanAppend.html

HADOOP-1700 and related issues put a lot of effort into providing the first implementation of append. However, append is a complex feature, and it turns out that there are issues that initially seemed trivial but need careful design. This jira revisits append, aiming for a design and implementation that support semantics acceptable to its users.
[jira] Commented: (HDFS-918) Use single Selector and small thread pool to replace many instances of BlockSender for reads
[ https://issues.apache.org/jira/browse/HDFS-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12844195#action_12844195 ]

ryan rawson commented on HDFS-918:
----------------------------------

This sounds great, guys. For HBase, the read path matters more than the write path. I would love to test this for HBase, and we could postpone/delay the write-path code.

Use single Selector and small thread pool to replace many instances of BlockSender for reads
--------------------------------------------------------------------------------------------

Key: HDFS-918
URL: https://issues.apache.org/jira/browse/HDFS-918
Project: Hadoop HDFS
Issue Type: Improvement
Components: data-node
Reporter: Jay Booth
Fix For: 0.22.0
Attachments: hdfs-918-20100201.patch, hdfs-918-20100203.patch, hdfs-918-20100211.patch, hdfs-918-20100228.patch, hdfs-918-20100309.patch, hdfs-multiplex.patch

Currently, on read requests, the DataXceiver server allocates a new thread per request, and each thread must allocate its own buffers; this leads to higher-than-optimal CPU and memory usage by the sending threads. If we had a single selector and a small thread pool to multiplex request packets, we could theoretically achieve higher performance while taking up fewer resources and leaving more CPU on datanodes available for mapred, HBase, or whatever. This can be done without changing any wire protocols.
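A rough sketch of the resource argument (illustrative numbers, not the HDFS-918 patch): a small fixed pool serves many block-read requests with a bounded number of threads and buffers, whereas thread-per-request allocates one of each per request.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch, not the HDFS-918 patch: a fixed pool of a few
// threads serves many simulated block-read requests, versus the
// thread-per-request model where each request gets its own thread.
public class ReadPoolSketch {
    // Serves `requests` simulated reads on `threads` pooled threads and
    // returns how many were served.
    static int serve(int requests, int threads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger served = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(requests);
        for (int i = 0; i < requests; i++) {
            pool.submit(() -> {
                served.incrementAndGet(); // stand-in for sending one block
                done.countDown();
            });
        }
        done.await();
        pool.shutdown();
        return served.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(serve(100, 4)); // 4 threads handle all 100 requests
    }
}
```

The actual proposal goes further by using a single NIO Selector so even the pooled threads only run when a socket is writable, but the bounded-resources point is the same.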
[jira] Commented: (HDFS-1028) INode.getPathNames could split more efficiently
[ https://issues.apache.org/jira/browse/HDFS-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12842215#action_12842215 ]

ryan rawson commented on HDFS-1028:
-----------------------------------

+1. tsuna on our staff found that in a string-intensive processing mini-app, he was able to speed his app up by a factor of 2 by not using that API. Also, PrintWriter is many times faster than PrintStream.

INode.getPathNames could split more efficiently
-----------------------------------------------

Key: HDFS-1028
URL: https://issues.apache.org/jira/browse/HDFS-1028
Project: Hadoop HDFS
Issue Type: Improvement
Components: name-node
Reporter: Todd Lipcon
Priority: Minor

INode.getPathNames uses String.split(String), which actually uses the full Java regex implementation. Since we're always splitting on a single char, we could implement a faster one, like StringUtils.split() (but without the escape character). This takes a significant amount of CPU during FSImage loading, so it should be a worthwhile speedup.
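A sketch of what such a single-char splitter could look like (the method name is illustrative, not the HDFS one): walk the string with indexOf(), avoiding the regex machinery that String.split(String) invokes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative single-char splitter: no regex compilation, just indexOf()
// and substring(). Not the actual HDFS implementation.
public class FastSplitSketch {
    static String[] splitOnChar(String s, char sep) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = s.indexOf(sep); i >= 0; i = s.indexOf(sep, start)) {
            parts.add(s.substring(start, i)); // component before the separator
            start = i + 1;
        }
        parts.add(s.substring(start));        // final component
        return parts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String path = "/hbase/table/region/file";
        // Matches String.split for typical paths (leading '/' yields "").
        System.out.println(Arrays.equals(splitOnChar(path, '/'), path.split("/")));
    }
}
```

One edge case to settle in a real implementation: unlike String.split, this version keeps trailing empty strings, so a path ending in '/' would differ; HDFS paths are normalized, but the choice should be made explicitly.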
[jira] Commented: (HDFS-686) NullPointerException is thrown while merging edit log and image
[ https://issues.apache.org/jira/browse/HDFS-686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838497#action_12838497 ]

ryan rawson commented on HDFS-686:
----------------------------------

Why are you closing this as fixed when it clearly isn't?

NullPointerException is thrown while merging edit log and image
---------------------------------------------------------------

Key: HDFS-686
URL: https://issues.apache.org/jira/browse/HDFS-686
Project: Hadoop HDFS
Issue Type: Bug
Components: name-node
Affects Versions: 0.20.1, 0.21.0
Reporter: Hairong Kuang
Assignee: Hairong Kuang
Priority: Blocker
Attachments: nullSetTime.patch, openNPE-trunk.patch, openNPE.patch

Our secondary name node is not able to start, failing on a NullPointerException:

ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: java.lang.NullPointerException
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedSetTimes(FSDirectory.java:1232)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedSetTimes(FSDirectory.java:1221)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:776)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:992)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.doMerge(SecondaryNameNode.java:590)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.access$000(SecondaryNameNode.java:473)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doMerge(SecondaryNameNode.java:350)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:314)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:225)
at java.lang.Thread.run(Thread.java:619)

This was caused by setting the access time on a non-existent file.
[jira] Created: (HDFS-1002) Secondary Name Node crash, NPE in edit log replay
Secondary Name Node crash, NPE in edit log replay
-------------------------------------------------

Key: HDFS-1002
URL: https://issues.apache.org/jira/browse/HDFS-1002
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 0.21.0
Reporter: ryan rawson
Fix For: 0.21.0
[jira] Commented: (HDFS-1002) Secondary Name Node crash, NPE in edit log replay
[ https://issues.apache.org/jira/browse/HDFS-1002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838589#action_12838589 ]

ryan rawson commented on HDFS-1002:

I forgot, this is a hadoop 0.21 build, based on SVN revision 901028. It's from http://github.com/apache/hadoop-hdfs/commit/de5d50187a34d35e5e1a9ea8bfa1a2fedb9a7df4 which is actually https://svn.apache.org/repos/asf/hadoop/hdfs/branches/branch-0...@901028

Inspecting the code on my systems also verifies that yes, we have this commit: http://svn.apache.org/viewvc/hadoop/hdfs/branches/branch-0.21/src/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java?p2=/hadoop/hdfs/branches/branch-0.21/src/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java&p1=/hadoop/hdfs/branches/branch-0.21/src/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java&r1=824955&r2=824954&view=diff&pathrev=824955

Attachments: snn_crash.tar.gz, snn_log.txt
[jira] Commented: (HDFS-1002) Secondary Name Node crash, NPE in edit log replay
[ https://issues.apache.org/jira/browse/HDFS-1002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838626#action_12838626 ]

ryan rawson commented on HDFS-1002:

The edits log is 17 MB; it is included in the attachments here.
[jira] Commented: (HDFS-1002) Secondary Name Node crash, NPE in edit log replay
[ https://issues.apache.org/jira/browse/HDFS-1002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838640#action_12838640 ]

ryan rawson commented on HDFS-1002:

My JVM has this in its lsof:

java 10109 hadoop mem REG 8,3 832194 205563 /home/hadoop/hadoop-0.21/hdfs/hadoop-hdfs-0.21.0-SNAPSHOT.jar

and that build is most definitely from the 0.21 branch with the aforementioned revision. Unless the gremlins have come out again and changed the bits on me...
[jira] Commented: (HDFS-686) NullPointerException is thrown while merging edit log and image
[ https://issues.apache.org/jira/browse/HDFS-686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835100#action_12835100 ]

ryan rawson commented on HDFS-686:

Hey guys, I ran into this on branch-0.21, probably because the patch contained in the file nullSetTime.patch was never committed! Why would this issue be closed?!
[jira] Updated: (HDFS-686) NullPointerException is thrown while merging edit log and image
[ https://issues.apache.org/jira/browse/HDFS-686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ryan rawson updated HDFS-686:

Priority: Blocker (was: Major)
Affects Version/s: 0.21.0

NullPointerException is thrown while merging edit log and image
Key: HDFS-686
Fix For: 0.20.2, 0.21.0
[jira] Commented: (HDFS-686) NullPointerException is thrown while merging edit log and image
[ https://issues.apache.org/jira/browse/HDFS-686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835101#action_12835101 ]

ryan rawson commented on HDFS-686:

Here is my stack trace:

2010-02-13 21:46:54,748 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Throwable Exception in doCheckpoint:
2010-02-13 21:46:54,749 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: java.lang.NullPointerException
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedSetTimes(FSDirectory.java:1399)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedSetTimes(FSDirectory.java:1387)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadEditRecords(FSEditLog.java:672)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:401)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:368)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1172)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.doMerge(SecondaryNameNode.java:594)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.access$000(SecondaryNameNode.java:476)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doMerge(SecondaryNameNode.java:353)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:317)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:219)
    at java.lang.Thread.run(Thread.java:619)
[jira] Commented: (HDFS-918) Use single Selector and small thread pool to replace many instances of BlockSender for reads
[ https://issues.apache.org/jira/browse/HDFS-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831696#action_12831696 ]

ryan rawson commented on HDFS-918:

Great work, but Todd is dead on: we need to look at the scenario of a few hundred small chunked reads. It's currently where we hurt badly, and one of the things this patch can hopefully address. I would be happy to test this out when you are ready.

Use single Selector and small thread pool to replace many instances of BlockSender for reads

Key: HDFS-918
URL: https://issues.apache.org/jira/browse/HDFS-918
Project: Hadoop HDFS
Issue Type: Improvement
Components: data-node
Reporter: Jay Booth
Fix For: 0.22.0
Attachments: hdfs-918-20100201.patch, hdfs-918-20100203.patch, hdfs-multiplex.patch

Currently, on read requests, the DataXceiver server allocates a new thread per request, which must allocate its own buffers; this leads to higher-than-optimal CPU and memory usage by the sending threads. If we had a single selector and a small thread pool to multiplex request packets, we could theoretically achieve higher performance while taking up fewer resources and leaving more CPU on datanodes available for mapred, hbase, or whatever. This can be done without changing any wire protocols.
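The resource argument in HDFS-918, a small shared pool serving many read requests instead of one dedicated thread and buffer per request, can be illustrated with a toy sketch. All names below are illustrative; this is not the patch's actual BlockSender or selector code:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model of the proposal: a fixed pool of a few threads multiplexes many
// read requests, rather than spawning one thread per request as the
// thread-per-xceiver model does. Thread count stays bounded regardless of
// how many concurrent requests arrive.
public class PooledReads {
    public static int serve(int requests, int poolThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(poolThreads);
        AtomicInteger served = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(requests);
        for (int i = 0; i < requests; i++) {
            pool.submit(() -> {
                served.incrementAndGet(); // stand-in for sending one chunk
                done.countDown();
            });
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        pool.shutdown();
        return served.get();
    }
}
```

With 200 requests and 4 pool threads, all 200 complete while only 4 sender threads (and their buffers) ever exist, which is the memory/CPU saving the JIRA describes.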
[jira] Commented: (HDFS-918) Use single Selector and small thread pool to replace many instances of BlockSender for reads
[ https://issues.apache.org/jira/browse/HDFS-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829293#action_12829293 ]

ryan rawson commented on HDFS-918:

I have done some thinking about HBase performance in relation to HDFS, and right now we are bottlenecking on reads to a single file. By moving to a re-entrant API (pread) we are looking to unleash the parallelism. This is important, I think, because we want to push as many parallel reads as possible from our clients down into the datanode and then down into the kernel, to benefit from the I/O scheduling in the kernel and hardware. This could mean literally dozens of parallel reads per node on a busy cluster. Perhaps even hundreds, per node. To ensure scalability we'd probably want to get away from the xceiver model, for more than one reason... If I remember correctly, xceivers not only consume threads (hundreds of threads is OK but not ideal) but also consume epolls, and there are only so many epolls available. So I heartily approve of the direction of this JIRA!
[jira] Commented: (HDFS-918) Use single Selector and small thread pool to replace many instances of BlockSender for reads
[ https://issues.apache.org/jira/browse/HDFS-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829368#action_12829368 ]

ryan rawson commented on HDFS-918:

The problem was that we were previously using a stateful interface, because it was faster in scan tests, so we serialized reads within one regionserver to any given HFile. With multiple client handler threads asking for different parts of a large file, we get serialized behaviour, which hurts random-get performance. So we are moving back to pread, which means we will get more parallelism - depending on your table read pattern, of course. But I want to get even more parallelism, by preading multiple hfiles during a scan/get for example. This will just increase the thread pressure on the datanode.
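The parallelism gain from pread comes from positioned reads being stateless: there is no shared file pointer to serialize on, so concurrent handler threads can read different offsets of one open file at once. A sketch of that property using java.nio's FileChannel as a stand-in for HDFS's positioned-read API (an analogy only, not HDFS code):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// A positioned read takes an explicit offset instead of advancing a shared
// cursor, so it is re-entrant: many threads can call it on the same file
// concurrently without serializing, which is the behaviour pread gives
// HBase's handler threads against an HFile.
public class PReadDemo {
    public static byte[] preadAt(Path file, long pos, int len) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(len);
            while (buf.hasRemaining()) {
                // read at an absolute position; channel position is untouched
                if (ch.read(buf, pos + buf.position()) < 0) {
                    break; // hit EOF before filling the buffer
                }
            }
            buf.flip();
            byte[] out = new byte[buf.remaining()];
            buf.get(out);
            return out;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```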
[jira] Commented: (HDFS-826) Allow a mechanism for an application to detect that datanode(s) have died in the write pipeline
[ https://issues.apache.org/jira/browse/HDFS-826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799464#action_12799464 ]

ryan rawson commented on HDFS-826:

Why wouldn't DFSClient just handle the write-pipeline problems and move on? Otherwise it sounds like, if the client gets an error, it must close the file, delete the file, then re-try again? This doesn't sound like a robust filesystem to me...

Allow a mechanism for an application to detect that datanode(s) have died in the write pipeline

Key: HDFS-826
URL: https://issues.apache.org/jira/browse/HDFS-826
Project: Hadoop HDFS
Issue Type: Improvement
Components: hdfs client
Reporter: dhruba borthakur
Assignee: dhruba borthakur
Attachments: ReplicableHdfs.txt

HDFS does not replicate the last block of a file that is currently being written to by an application. Every datanode death in the write pipeline decreases the reliability of the last block of the currently-being-written file. This situation can be improved if the application can be notified of a datanode death in the write pipeline; the application can then decide the right course of action to take on this event. In our use-case, the application can close the file on the first datanode death and start writing to a newly created file. This ensures that the reliability guarantee of a block stays close to 3 at all times. One idea is to make DFSOutputStream.write() throw an exception if the number of datanodes in the write pipeline falls below the minimum.replication.factor that is set on the client (this is backward compatible).
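The backward-compatible idea in the description, making write() fail once the pipeline shrinks below a client-side minimum so the application can close the file and open a new one, can be sketched as follows. All names here are hypothetical, not the actual DFSOutputStream implementation:

```java
import java.io.IOException;

// Hypothetical sketch: the writer tracks how many datanodes remain alive in
// its pipeline; once that drops below the client-configured minimum
// replication, write() throws, notifying the application of the datanode
// death so it can close this file and start a new one.
public class PipelineWriter {
    private int liveDatanodes;
    private final int minReplication;

    public PipelineWriter(int liveDatanodes, int minReplication) {
        this.liveDatanodes = liveDatanodes;
        this.minReplication = minReplication;
    }

    // Called when a pipeline datanode is detected dead.
    public void datanodeDied() {
        liveDatanodes--;
    }

    public void write(byte[] chunk) throws IOException {
        if (liveDatanodes < minReplication) {
            throw new IOException("pipeline has " + liveDatanodes
                + " datanodes, below minimum " + minReplication);
        }
        // ... stream the chunk to the remaining pipeline ...
    }
}
```

Because existing applications that never configure a minimum (or set it to 1) see no new exceptions, the change is backward compatible in the sense the description claims.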
[jira] Updated: (HDFS-127) DFSClient block read failures cause open DFSInputStream to become unusable
[ https://issues.apache.org/jira/browse/HDFS-127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ryan rawson updated HDFS-127:

Attachment: HDFS-127-v2.patch

Here is a fixed version for hdfs branch-0.21.

DFSClient block read failures cause open DFSInputStream to become unusable

Key: HDFS-127
URL: https://issues.apache.org/jira/browse/HDFS-127
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Igor Bolotin
Attachments: 4681.patch, h127_20091016.patch, HDFS-127-v2.patch

We are using some Lucene indexes directly from HDFS, and for quite a long time we were using Hadoop version 0.15.3. When we tried to upgrade to Hadoop 0.19, index searches started to fail with exceptions like:

2008-11-13 16:50:20,314 WARN [Listener-4] [] DFSClient : DFS Read: java.io.IOException: Could not obtain block: blk_5604690829708125511_15489 file=/usr/collarity/data/urls-new/part-0/20081110-163426/_0.tis
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1708)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1536)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1663)
    at java.io.DataInputStream.read(DataInputStream.java:132)
    at org.apache.nutch.indexer.FsDirectory$DfsIndexInput.readInternal(FsDirectory.java:174)
    at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:152)
    at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38)
    at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:76)
    at org.apache.lucene.index.TermBuffer.read(TermBuffer.java:63)
    at org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:131)
    at org.apache.lucene.index.SegmentTermEnum.scanTo(SegmentTermEnum.java:162)
    at org.apache.lucene.index.TermInfosReader.scanEnum(TermInfosReader.java:223)
    at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:217)
    at org.apache.lucene.index.SegmentTermDocs.seek(SegmentTermDocs.java:54)
    ...

The investigation showed that the root of this issue is that we exceeded the number of xcievers in the data nodes, and that was fixed by changing the configuration setting to 2k. However, one thing that bothered me was that even after the datanodes recovered from overload and most of the client servers had been shut down, we still observed errors in the logs of running servers. Further investigation showed that the fix for HADOOP-1911 introduced another problem: the DFSInputStream instance might become unusable once the number of failures over the lifetime of the instance exceeds a configured threshold. The fix for this specific issue seems to be trivial - just reset the failure counter before reading the next block (patch will be attached shortly). This also seems to be related to HADOOP-3185, but I'm not sure I really understand the necessity of keeping track of failed block accesses in the DFS client.
[jira] Updated: (HDFS-127) DFSClient block read failures cause open DFSInputStream to become unusable
[ https://issues.apache.org/jira/browse/HDFS-127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ryan rawson updated HDFS-127:

Attachment: (was: HDFS-127-v2.patch)
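The "trivial fix" the reporter describes, resetting the failure counter before each new block rather than accumulating failures over the stream's whole lifetime, can be sketched with a toy model (illustrative names only, not the actual DFSClient code):

```java
// Toy model of the HDFS-127 fix: count read failures per block attempt,
// not cumulatively per stream, so a long-lived input stream survives
// transient datanode errors instead of becoming permanently unusable once
// a lifetime threshold is crossed.
public class BlockRetryReader {
    // Stand-in for one attempt to read a block replica from a datanode.
    public interface BlockSource {
        boolean readOnce(); // true on success, false on a transient failure
    }

    // Reads every block, allowing up to maxFailures attempts per block.
    // The one-line essence of the fix is resetting `failures` inside the
    // loop, before each new block.
    public static int readAll(BlockSource[] blocks, int maxFailures) {
        int blocksRead = 0;
        for (BlockSource block : blocks) {
            int failures = 0; // reset per block, not per stream
            while (true) {
                if (block.readOnce()) {
                    blocksRead++;
                    break;
                }
                if (++failures >= maxFailures) {
                    throw new RuntimeException("could not obtain block");
                }
            }
        }
        return blocksRead;
    }
}
```

With a lifetime counter, five blocks that each fail once would exhaust a threshold of two after the second block; with the per-block reset, all five succeed.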
[jira] Commented: (HDFS-200) In HDFS, sync() not yet guarantees data available to the new readers
[ https://issues.apache.org/jira/browse/HDFS-200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758946#action_12758946 ]

ryan rawson commented on HDFS-200:

I'm not sure exactly what I'm seeing; here is the flow of events. The namenode says, when the regionserver closes a log:

2009-09-23 17:21:05,128 DEBUG org.apache.hadoop.hdfs.StateChange: *BLOCK* NameNode.blockReceived: from 10.10.21.38:50010 1 blocks.
2009-09-23 17:21:05,128 DEBUG org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.blockReceived: blk_8594965619504827451_4351 is received from 10.10.21.38:50010
2009-09-23 17:21:05,128 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.10.21.38:50010 is added to blk_8594965619504827451_4351 size 573866
2009-09-23 17:21:05,130 DEBUG org.apache.hadoop.hdfs.StateChange: *BLOCK* NameNode.blockReceived: from 10.10.21.45:50010 1 blocks.
2009-09-23 17:21:05,130 DEBUG org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.blockReceived: blk_8594965619504827451_4351 is received from 10.10.21.45:50010
2009-09-23 17:21:05,130 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.10.21.45:50010 is added to blk_8594965619504827451_4351 size 573866
2009-09-23 17:21:05,131 DEBUG org.apache.hadoop.hdfs.StateChange: *BLOCK* NameNode.blockReceived: from 10.10.21.32:50010 1 blocks.
2009-09-23 17:21:05,131 DEBUG org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.blockReceived: blk_8594965619504827451_4351 is received from 10.10.21.32:50010
2009-09-23 17:21:05,131 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.10.21.32:50010 is added to blk_8594965619504827451_4351 size 573866
2009-09-23 17:21:05,131 DEBUG org.apache.hadoop.hdfs.StateChange: *DIR* NameNode.complete: /hbase/.logs/sv4borg32,60020,1253751520085/hlog.dat.1253751663228 for DFSClient_-2129099062
2009-09-23 17:21:05,131 DEBUG org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.completeFile: /hbase/.logs/sv4borg32,60020,1253751520085/hlog.dat.1253751663228 for DFSClient_-2129099062
2009-09-23 17:21:05,132 DEBUG org.apache.hadoop.hdfs.StateChange: DIR* FSDirectory.closeFile: /hbase/.logs/sv4borg32,60020,1253751520085/hlog.dat.1253751663228 with 1 blocks is persisted to the file system
2009-09-23 17:21:05,132 DEBUG org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.completeFile: /hbase/.logs/sv4borg32,60020,1253751520085/hlog.dat.1253751663228 blocklist persisted

So at this point we have 3 replicas of the block, and they have all checked in, right?

Then during logfile recovery we see:

2009-09-23 17:21:45,997 DEBUG org.apache.hadoop.hdfs.StateChange: *DIR* NameNode.append: file /hbase/.logs/sv4borg32,60020,1253751520085/hlog.dat.1253751663228 for DFSClient_-828773542 at 10.10.21.29
2009-09-23 17:21:45,997 DEBUG org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.startFile: src=/hbase/.logs/sv4borg32,60020,1253751520085/hlog.dat.1253751663228, holder=DFSClient_-828773542, clientMachine=10.10.21.29, replication=512, overwrite=false, append=true
2009-09-23 17:21:45,997 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of transactions: 567 Total time for transactions(ms): 9 Number of transactions batched in Syncs: 54 Number of syncs: 374 SyncTimes(ms): 12023 4148 3690 7663
2009-09-23 17:21:45,997 DEBUG org.apache.hadoop.hdfs.StateChange: UnderReplicationBlocks.update blk_8594965619504827451_4351 curReplicas 0 curExpectedReplicas 3 oldReplicas 0 oldExpectedReplicas 3 curPri 2 oldPri 2
2009-09-23 17:21:45,997 DEBUG org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.UnderReplicationBlock.update: blk_8594965619504827451_4351 has only 0 replicas and need 3 replicas so is added to neededReplications at priority level 2
2009-09-23 17:21:45,997 DEBUG org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.appendFile: file /hbase/.logs/sv4borg32,60020,1253751520085/hlog.dat.1253751663228 for DFSClient_-828773542 at 10.10.21.29 block blk_8594965619504827451_4351 block size 573866
2009-09-23 17:21:45,997 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=hadoop,hadoop ip=/10.10.21.29 cmd=append src=/hbase/.logs/sv4borg32,60020,1253751520085/hlog.dat.1253751663228 dst=null perm=null
2009-09-23 17:21:47,265 DEBUG org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.UnderReplicationBlock.remove: Removing block blk_8594965619504827451_4351 from priority queue 2
2009-09-23 17:21:56,016 DEBUG org.apache.hadoop.hdfs.StateChange: *BLOCK* NameNode.addBlock: file /hbase/.logs/sv4borg32,60020,1253751520085/hlog.dat.1253751663228 for DFSClient_-828773542
2009-09-23 17:21:56,016 DEBUG org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.getAdditionalBlock: file /hbase/.logs/sv4borg32,60020,1253751520085/hlog.dat.1253751663228 for DFSClient_-828773542
2009-09-23 17:21:56,016
[jira] Commented: (HDFS-200) In HDFS, sync() not yet guarantees data available to the new readers
[ https://issues.apache.org/jira/browse/HDFS-200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758949#action_12758949 ]

ryan rawson commented on HDFS-200:
--
scratch the last, i was having some environment/library version problems.

In HDFS, sync() not yet guarantees data available to the new readers

Key: HDFS-200
URL: https://issues.apache.org/jira/browse/HDFS-200
Project: Hadoop HDFS
Issue Type: New Feature
Reporter: Tsz Wo (Nicholas), SZE
Assignee: dhruba borthakur
Priority: Blocker
Attachments: 4379_20081010TC3.java, fsyncConcurrentReaders.txt, fsyncConcurrentReaders11_20.txt, fsyncConcurrentReaders12_20.txt, fsyncConcurrentReaders13_20.txt, fsyncConcurrentReaders14_20.txt, fsyncConcurrentReaders3.patch, fsyncConcurrentReaders4.patch, fsyncConcurrentReaders5.txt, fsyncConcurrentReaders6.patch, fsyncConcurrentReaders9.patch, hadoop-stack-namenode-aa0-000-12.u.powerset.com.log.gz, hdfs-200-ryan-existing-file-fail.txt, hypertable-namenode.log.gz, namenode.log, namenode.log, Reader.java, Reader.java, reopen_test.sh, ReopenProblem.java, Writer.java, Writer.java

In the append design doc (https://issues.apache.org/jira/secure/attachment/12370562/Appends.doc), it says:

* A reader is guaranteed to be able to read data that was 'flushed' before the reader opened the file

However, this feature is not yet implemented. Note that the operation 'flushed' is now called sync.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
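The flushed-before-open guarantee quoted above can be modeled with plain java.nio against a local file. This is only an illustration of the contract under discussion, not HDFS code; the class and method names here are mine, and `FileChannel.force()` merely stands in for HDFS's `sync()`/`hflush()`:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class SyncVisibility {
    // Local-file model of the HDFS-200 contract: bytes "flushed" (here,
    // force()d) before a reader opens the file must be visible to that reader.
    public static int bytesVisibleAfterFlush(byte[] payload) throws IOException {
        Path p = Files.createTempFile("hlog", ".dat");
        try (FileChannel writer = FileChannel.open(p, StandardOpenOption.WRITE)) {
            writer.write(ByteBuffer.wrap(payload));
            writer.force(true); // stands in for HDFS sync()/hflush()
            // A reader opened *after* the flush sees everything flushed so far.
            try (FileChannel reader = FileChannel.open(p, StandardOpenOption.READ)) {
                ByteBuffer buf = ByteBuffer.allocate(payload.length + 16);
                return reader.read(buf);
            }
        } finally {
            Files.deleteIfExists(p);
        }
    }
}
```

The point of the issue is exactly that HDFS of this era did not honor this contract: a reader opening the file after a sync() could still see a shorter (or zero) length until the block was finalized.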
[jira] Commented: (HDFS-127) DFSClient block read failures cause open DFSInputStream to become unusable
[ https://issues.apache.org/jira/browse/HDFS-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757355#action_12757355 ]

ryan rawson commented on HDFS-127:
--
+1 this patch is a _must have_ for anyone running hbase. The lack of it in hadoop trunk is forcing us to ship a non-standard hadoop jar just for a 2-line fix. Please commit already!

DFSClient block read failures cause open DFSInputStream to become unusable

Key: HDFS-127
URL: https://issues.apache.org/jira/browse/HDFS-127
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Igor Bolotin
Attachments: 4681.patch

We are using some Lucene indexes directly from HDFS, and for quite a long time we were using Hadoop version 0.15.3. When we tried to upgrade to Hadoop 0.19, index searches started to fail with exceptions like:

2008-11-13 16:50:20,314 WARN [Listener-4] [] DFSClient : DFS Read: java.io.IOException: Could not obtain block: blk_5604690829708125511_15489 file=/usr/collarity/data/urls-new/part-0/20081110-163426/_0.tis
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1708)
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1536)
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1663)
at java.io.DataInputStream.read(DataInputStream.java:132)
at org.apache.nutch.indexer.FsDirectory$DfsIndexInput.readInternal(FsDirectory.java:174)
at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:152)
at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38)
at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:76)
at org.apache.lucene.index.TermBuffer.read(TermBuffer.java:63)
at org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:131)
at org.apache.lucene.index.SegmentTermEnum.scanTo(SegmentTermEnum.java:162)
at org.apache.lucene.index.TermInfosReader.scanEnum(TermInfosReader.java:223)
at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:217)
at org.apache.lucene.index.SegmentTermDocs.seek(SegmentTermDocs.java:54)
...

The investigation showed that the root of this issue is that we exceeded the number of xcievers on the data nodes; that was fixed by raising the configured limit to 2k. However, one thing that bothered me was that even after the datanodes recovered from overload, and most of the client servers had been shut down, we still observed errors in the logs of the running servers. Further investigation showed that the fix for HADOOP-1911 introduced another problem: the DFSInputStream instance might become unusable once the number of failures over the lifetime of the instance exceeds a configured threshold. The fix for this specific issue seems to be trivial: just reset the failure counter before reading the next block (patch will be attached shortly). This also seems related to HADOOP-3185, but I'm not sure I really understand the necessity of keeping track of failed block accesses in the DFS client.
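The 2-line fix described above (reset the failure counter before each block read, rather than counting failures over the stream's whole lifetime) can be sketched as a standalone model. The class, method names, and the max-failures constant here are illustrative, not DFSClient's actual fields; in real clients the threshold comes from a configurable limit on block-acquire failures:

```java
public class RetryCounterModel {
    // Illustrative threshold, standing in for the client's configured
    // maximum number of block-acquire failures.
    static final int MAX_BLOCK_ACQUIRE_FAILURES = 3;

    // Pre-fix behavior: one failure counter for the stream's whole lifetime.
    // Transient failures on early blocks eventually make the stream unusable.
    static boolean readAllBlocksLifetimeCounter(int blocks, int failuresPerBlock) {
        int failures = 0;
        for (int b = 0; b < blocks; b++) {
            for (int f = 0; f < failuresPerBlock; f++) {
                if (++failures > MAX_BLOCK_ACQUIRE_FAILURES) {
                    return false; // stream gives up permanently
                }
            }
        }
        return true;
    }

    // HDFS-127-style fix: reset the counter before each block, so transient
    // failures on earlier blocks don't poison later reads.
    static boolean readAllBlocksResetPerBlock(int blocks, int failuresPerBlock) {
        for (int b = 0; b < blocks; b++) {
            int failures = 0; // the 2-line fix: reset per block
            for (int f = 0; f < failuresPerBlock; f++) {
                if (++failures > MAX_BLOCK_ACQUIRE_FAILURES) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

With one transient failure per block across four blocks, the lifetime counter trips the threshold on the fourth block while the per-block counter never does, which is exactly the long-running-server symptom the reporter describes.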
[jira] Commented: (HDFS-265) Revisit append
[ https://issues.apache.org/jira/browse/HDFS-265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755181#action_12755181 ]

ryan rawson commented on HDFS-265:
--
hey guys, it's been 4 months since this issue started moving, can we get some source so we can review and possibly test? Thanks!

Revisit append

Key: HDFS-265
URL: https://issues.apache.org/jira/browse/HDFS-265
Project: Hadoop HDFS
Issue Type: Improvement
Affects Versions: Append Branch
Reporter: Hairong Kuang
Assignee: Hairong Kuang
Fix For: Append Branch
Attachments: appendDesign.pdf, appendDesign.pdf, appendDesign1.pdf, appendDesign2.pdf, AppendSpec.pdf, AppendTestPlan.html, AppendTestPlan.html, AppendTestPlan.html, AppendTestPlan.html, AppendTestPlan.html, AppendTestPlan.html, AppendTestPlan.html, TestPlanAppend.html

HADOOP-1700 and related issues put a lot of effort into providing the first implementation of append. However, append is a complex feature: issues that initially seemed trivial turned out to need careful design. This jira revisits append, aiming for a design and implementation with semantics that are acceptable to its users.
[jira] Commented: (HDFS-200) In HDFS, sync() not yet guarantees data available to the new readers
[ https://issues.apache.org/jira/browse/HDFS-200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12748236#action_12748236 ]

ryan rawson commented on HDFS-200:
--
I've been testing this on my 20-node cluster here. Lease recovery can take a long time, which is a bit of an issue. The sync seems to be pretty good overall: we are recovering most of the edits up until the last flush, and it's pretty responsive. However, I have discovered a new bug. The scenario is like so:

- we roll the logs every 1MB (block size).
- we now have 18 logs to recover. The first 17 were closed properly; only the last one was in mid-write.
- during log recovery, the hbase master calls fs.append(f); out.close();
- but the master gets stuck at the out.close(); it can't seem to progress.

Investigating the logs, it looks like the namenode 'forgets' about the other 2 replicas for the block (the file is 1 block), and thus we are stuck until another replica comes back. I've attached logs, hadoop fsck output, and stack traces from hbase.
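One way to picture the hang described above: out.close() blocks indefinitely while the namenode has no live replica to finalize the block against. A bounded retry around the append-and-close step (purely an illustrative model; the names below are mine and HBase's actual recovery code differs) would at least surface the failure instead of wedging the master:

```java
public class LeaseRecoveryModel {
    // Stand-in for the fs.append(f) + out.close() step; returns true once
    // the namenode again knows about a live replica of the last block.
    public interface LogRecovery {
        boolean tryAppendAndClose();
    }

    // Bounded retry: report which attempt succeeded, or -1 to give up
    // instead of blocking forever inside close().
    public static int recover(LogRecovery fs, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (fs.tryAppendAndClose()) {
                return attempt;
            }
        }
        return -1;
    }
}
```

This doesn't fix the underlying bug (the namenode dropping its knowledge of the other replicas); it only bounds how long the master is held hostage by it.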