Re: Sailfish
Srivas,

Sailfish builds upon record append (a feature not present in HDFS). The software that is currently released is based on Hadoop-0.20.2. You use the Sailfish version of Hadoop-0.20.2, KFS for the intermediate data, and then HDFS (or KFS) for storing the job input. Since the changes are all in the handling of map output/reduce input, it is transparent to existing jobs. What is being proposed below is to bolt all the starting/stopping of the related daemons into YARN as a first step. There are other approaches that are possible, which have a similar effect.

Hope this helps.

Sriram

On Thu, May 10, 2012 at 10:50 PM, M. C. Srivas mcsri...@gmail.com wrote:
> Sriram, Sailfish depends on append. I just noticed that HDFS disabled append. How does one use this with Hadoop?

On Wed, May 9, 2012 at 9:00 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:

Hi Sriram,

> The I-file concept could possibly be implemented here in a fairly self-contained way. One could even colocate/embed a KFS filesystem with such an alternate shuffle, like how MR task temporary space is usually colocated with HDFS storage.
> Exactly.
> Does this seem reasonable in any way?
> Great. Where do we go from here? How do we get a collaborative effort going?

Sounds like a JIRA issue should be opened, the approach briefly described, and the first implementation attempt made. Then iterate. I look forward to seeing this! :)

Otis
--
Performance Monitoring for Solr / ElasticSearch / HBase - http://sematext.com/spm

From: Sriram Rao srirams...@gmail.com
To: common-dev@hadoop.apache.org
Sent: Tuesday, May 8, 2012 6:48 PM
Subject: Re: Sailfish

Dear Andy,

From: Andrew Purtell apurt...@apache.org
> ... Do you intend this to be a joint project with the Hadoop community or a technology competitor?

As I had said in my email, we are looking for folks to collaborate with us to help get us integrated with Hadoop. So, to be explicitly clear, we are intending for this to be a joint project with the community.
> Regrettably, KFS is not a drop-in replacement for HDFS. Hypothetically: I have several petabytes of data in an existing HDFS deployment, which is the norm, and a continuous MapReduce workflow. How do you propose I, practically, migrate to something like Sailfish without a major capital expenditure and/or downtime and/or data loss?

Well, we are not asking for KFS to replace HDFS. One path you could take is to experiment with Sailfish: use KFS just for the intermediate data and HDFS for everything else. There is no major capex :). While you get comfortable with pushing intermediate data into a DFS, we get the ideas added to HDFS.

> This simplifies deployment considerations. However, can the Sailfish I-files implementation be plugged in as an alternate shuffle implementation in MRv2 (see MAPREDUCE-3060 and MAPREDUCE-4049),

This'd be great!

> with necessary additional plumbing for dynamic adjustment of the reduce task population? And the workbuilder could be part of an alternate MapReduce Application Master?

It should be part of the AM. (Currently, with our implementation in Hadoop-0.20.2, the workbuilder serves the role of an AM.)

> The I-file concept could possibly be implemented here in a fairly self-contained way. One could even colocate/embed a KFS filesystem with such an alternate shuffle, like how MR task temporary space is usually colocated with HDFS storage.

Exactly.

> Does this seem reasonable in any way?

Great. Where do we go from here? How do we get a collaborative effort going?

Best,
Sriram

From: Sriram Rao srirams...@gmail.com
To: common-dev@hadoop.apache.org
Sent: Tuesday, May 8, 2012 10:32 AM
Subject: Project announcement: Sailfish (also, looking for collaborators)

Hi,

I'd like to announce the release of a new open source project, Sailfish: http://code.google.com/p/sailfish/

Sailfish tries to improve Hadoop performance, particularly for large jobs which process TBs of data and run for hours.
In building Sailfish, we modify how map output is handled and transported from map to reduce. The project pages provide more information about the project. We are looking for collaborators who can help get some of the ideas into Apache Hadoop. A possible step forward could be to make the shuffle phase of Hadoop pluggable. If you are interested in working with us, please get in touch with me.

Sriram

--
Best regards,
- Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
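The suggestion above, making the shuffle phase pluggable, is essentially the MAPREDUCE-4049 line of work in MRv2: the framework instantiates a shuffle implementation chosen by class name from the job configuration. The sketch below is a self-contained illustration of that reflective-plugin pattern only; the `Shuffle` interface and all names here are invented for illustration and are not Hadoop's actual API.

```java
public class PluggableShuffle {
    // Stand-in for a shuffle plugin contract. Hadoop's real hook
    // (ShuffleConsumerPlugin, per MAPREDUCE-4049) is much richer than this.
    public interface Shuffle {
        String fetchAndMerge(String mapOutputId);
    }

    // Default implementation, analogous in spirit to the built-in HTTP shuffle.
    public static class DefaultShuffle implements Shuffle {
        public String fetchAndMerge(String mapOutputId) {
            return "default-shuffle:" + mapOutputId;
        }
    }

    // Load whichever implementation the configuration names, the same way
    // MRv2 instantiates a plugin class by reflection.
    public static Shuffle load(String className) throws Exception {
        return (Shuffle) Class.forName(className)
                .getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        // A job config would supply this class name; hard-coded here.
        Shuffle s = load("PluggableShuffle$DefaultShuffle");
        System.out.println(s.fetchAndMerge("attempt_0001_m_000000"));
        // prints: default-shuffle:attempt_0001_m_000000
    }
}
```

An alternate implementation (Sailfish's I-file reader, say) would just be another class name in the configuration; nothing in the calling code changes.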
Re: Sailfish
Hey Sriram,

We discussed this before, but for the benefit of the wider audience: :)

It seems like the requirements imposed on KFS by Sailfish are in most ways much simpler than the requirements of a full distributed filesystem. The one thing we need is atomic record append -- but we don't need anything else, like filesystem metadata/naming, replication, corrupt data scanning, etc. All of the data is transient/short-lived and at replication count 1. So I think building something specific to this use case would be pretty practical - and my guess is it might even have some benefits over trying to use a full DFS.

In the MR2 architecture, I'd probably try to build this as a service plugin in the NodeManager (similar to the way that the ShuffleHandler in the current implementation works).

-Todd

On Thu, May 10, 2012 at 11:01 PM, Sriram Rao srirams...@gmail.com wrote:
> ...
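The primitive this thread keeps returning to is atomic record append: many map tasks append records to a shared per-partition I-file, and each append lands whole, at an offset assigned by the filesystem rather than the writer. A toy, single-process model of that contract (real KFS implements it across the network with chunkservers; everything here is a stand-in for illustration):

```java
import java.util.ArrayList;
import java.util.List;

public class AtomicRecordAppend {
    // Toy I-file: appends are atomic, and the file -- not the caller --
    // assigns each record's position. This is the contract Sailfish needs,
    // and all it needs: no naming, replication, or scanning.
    public static class IFile {
        private final List<byte[]> records = new ArrayList<byte[]>();

        // Returns the record index assigned to this append.
        public synchronized int append(byte[] record) {
            records.add(record.clone()); // copy: a record lands whole or not at all
            return records.size() - 1;
        }

        public synchronized int recordCount() {
            return records.size();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        final IFile ifile = new IFile();
        // Simulate several map tasks appending to one I-file concurrently.
        Thread[] mappers = new Thread[4];
        for (int m = 0; m < mappers.length; m++) {
            mappers[m] = new Thread(new Runnable() {
                public void run() {
                    for (int i = 0; i < 1000; i++) {
                        ifile.append("key\tvalue".getBytes());
                    }
                }
            });
            mappers[m].start();
        }
        for (Thread t : mappers) {
            t.join();
        }
        System.out.println(ifile.recordCount()); // 4000: no appends lost or torn
    }
}
```

The point of the exercise: a reducer only needs "all records for my partition are in this one file, each one intact", which is far less than a full DFS provides.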
Build failed in Jenkins: Hadoop-Common-trunk #403
See https://builds.apache.org/job/Hadoop-Common-trunk/403/changes

Changes:

[atm] HDFS-3026. HA: Handle failure during HA state transition. Contributed by Aaron T. Myers.
[eli] HDFS-3400. DNs should be able to start with jsvc even if security is disabled. Contributed by Aaron T. Myers
[szetszwo] HDFS-3385. The last block of INodeFileUnderConstruction is not necessarily a BlockInfoUnderConstruction, so do not cast it in FSNamesystem.recoverLeaseInternal(..).
[eli] HDFS-3401. Cleanup DatanodeDescriptor creation in the tests. Contributed by Eli Collins
[eli] HADOOP-8388. Remove unused BlockLocation serialization. Contributed by Colin Patrick McCabe
[eli] Remove SHORT_STRING_MAX, left out of the previous commit.
[eli] HADOOP-8361. Avoid out-of-memory problems when deserializing strings. Contributed by Colin Patrick McCabe
[eli] HDFS-3134. Harden edit log loader against malformed or malicious input. Contributed by Colin Patrick McCabe
[szetszwo] HDFS-3369. Rename {get|set|add}INode(..) methods in BlockManager and BlocksMap to {get|set|add}BlockCollection(..). Contributed by John George
[bobby] HADOOP-8375. test-patch should stop immediately once it has found compilation errors (bobby)
[eli] HDFS-3230. Cleanup DatanodeID creation in the tests. Contributed by Eli Collins
[atm] HDFS-3395. NN doesn't start with HA+security enabled and HTTP address set to 0.0.0.0. Contributed by Aaron T. Myers.
[umamahesh] Reverting (need to re-do the patch; new BlockInfo does not set iNode) HDFS-3157. Error in deleting block keeps coming from DN even after the block report and directory scanning have happened.
[eli] HDFS-3396. FUSE build fails on Ubuntu 12.04. Contributed by Colin Patrick McCabe
[eli] HADOOP-7868. Hadoop native fails to compile when default linker option is -Wl,--as-needed. Contributed by Trevor Robinson
[eli] HDFS-3328. NPE in DataNode.getIpcPort. Contributed by Eli Collins
[todd] HDFS-3341, HADOOP-8340. SNAPSHOT build versions should compare as less than their eventual release. Contributed by Todd Lipcon.
[suresh] HADOOP-8372. NetUtils.normalizeHostName() incorrectly handles hostnames starting with a numeric character. Contributed by Junping Du.
[bobby] MAPREDUCE-4237. TestNodeStatusUpdater can fail if localhost has a domain associated with it (bobby)
[atm] HDFS-3390. DFSAdmin should print full stack traces of errors when DEBUG logging is enabled. Contributed by Aaron T. Myers.
[bobby] HADOOP-8373. Port RPC.getServerAddress to 0.23 (Daryn Sharp via bobby)
[bobby] HADOOP-8354. test-patch findbugs may fail if a dependent module is changed. Contributed by Tom White and Robert Evans.
--
[...truncated 45274 lines...]
[DEBUG] (f) reactorProjects = [MavenProject: org.apache.hadoop:hadoop-annotations:3.0.0-SNAPSHOT @ https://builds.apache.org/job/Hadoop-Common-trunk/ws/trunk/hadoop-common-project/hadoop-annotations/pom.xml, MavenProject: org.apache.hadoop:hadoop-auth:3.0.0-SNAPSHOT @ https://builds.apache.org/job/Hadoop-Common-trunk/ws/trunk/hadoop-common-project/hadoop-auth/pom.xml, MavenProject: org.apache.hadoop:hadoop-auth-examples:3.0.0-SNAPSHOT @ https://builds.apache.org/job/Hadoop-Common-trunk/ws/trunk/hadoop-common-project/hadoop-auth-examples/pom.xml, MavenProject: org.apache.hadoop:hadoop-common:3.0.0-SNAPSHOT @ https://builds.apache.org/job/Hadoop-Common-trunk/ws/trunk/hadoop-common-project/hadoop-common/pom.xml, MavenProject: org.apache.hadoop:hadoop-common-project:3.0.0-SNAPSHOT @ https://builds.apache.org/job/Hadoop-Common-trunk/ws/trunk/hadoop-common-project/pom.xml]
[DEBUG] (f) useDefaultExcludes = true
[DEBUG] (f) useDefaultManifestFile = false
[DEBUG] -- end configuration --
[INFO]
[INFO] --- maven-enforcer-plugin:1.0:enforce (dist-enforce) @ hadoop-common-project ---
[DEBUG] Configuring mojo org.apache.maven.plugins:maven-enforcer-plugin:1.0:enforce from plugin realm ClassRealm[plugin>org.apache.maven.plugins:maven-enforcer-plugin:1.0, parent: sun.misc.Launcher$AppClassLoader@126b249]
[DEBUG] Configuring mojo 'org.apache.maven.plugins:maven-enforcer-plugin:1.0:enforce' with basic configurator --
[DEBUG] (s) fail = true
[DEBUG] (s) failFast = false
[DEBUG] (f) ignoreCache = false
[DEBUG] (s) project = MavenProject: org.apache.hadoop:hadoop-common-project:3.0.0-SNAPSHOT @ https://builds.apache.org/job/Hadoop-Common-trunk/ws/trunk/hadoop-common-project/pom.xml
[DEBUG] (s) version = [3.0.2,)
[DEBUG] (s) version = 1.6
[DEBUG] (s) rules = [org.apache.maven.plugins.enforcer.RequireMavenVersion@6d9538, org.apache.maven.plugins.enforcer.RequireJavaVersion@5fd060]
[DEBUG] (s) session = org.apache.maven.execution.MavenSession@cf68af
[DEBUG] (s) skip = false
[DEBUG] -- end configuration --
[DEBUG] Executing rule: org.apache.maven.plugins.enforcer.RequireMavenVersion
[DEBUG] Rule org.apache.maven.plugins.enforcer.RequireMavenVersion is cacheable.
[DEBUG] Key
Is it possible to execute Hive queries parallelly by writing mapper and reducer
Hello all,

I am asking about increasing the performance of Hive. I tried with mappers and reducers but I didn't see a difference in execution time. I don't know why; maybe I did it in some incorrect way, or there is some other reason.

What I am wondering is: is it possible to execute Hive queries in parallel? Normally the queries execute one after another, in queue fashion: query1, query2, query3, ... queryN. I am thinking that if we use a MapReduce program in a Hive JDBC program, then it might be possible to execute them in parallel. I don't know whether it will work or not, which is why I am asking. My questions are:

1) If it is possible, does it require multiple Hive Thrift Servers?
2) Is it possible to open multiple Hive Thrift Servers?
3) I think it is not possible to open multiple Hive Thrift Servers on the same port?
4) Can we open multiple Hive Thrift Servers on different ports?

Please suggest some solution to this. If you have any other ideas, please share them and I will try those as well.

Thanks
--
Regards,
Bhavesh Shah
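Independent of the Thrift-server questions, one common approach is to give each query its own JDBC connection and submit the queries through a thread pool, so they run concurrently rather than in queue order. The sketch below only demonstrates the threading pattern: `runQuery` is a stand-in for real `java.sql` calls against a Hive JDBC URL, which are an assumption here, not tested behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelHiveQueries {
    // Stand-in for real JDBC work: in a real client, each worker would open
    // its OWN java.sql.Connection to the Hive JDBC URL (connections are not
    // safe to share across threads) and execute the query on it.
    static String runQuery(String sql) {
        return "result-of:" + sql;
    }

    // Submit every query to a fixed-size pool and collect results in
    // submission order via Futures.
    public static List<String> runAll(List<String> queries, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<Future<String>>();
            for (final String q : queries) {
                futures.add(pool.submit(new Callable<String>() {
                    public String call() {
                        return runQuery(q);
                    }
                }));
            }
            List<String> results = new ArrayList<String>();
            for (Future<String> f : futures) {
                results.add(f.get()); // blocks until that query finishes
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> queries = new ArrayList<String>();
        queries.add("SELECT count(*) FROM t1");
        queries.add("SELECT count(*) FROM t2");
        System.out.println(runAll(queries, 2));
    }
}
```

Note the caveat: even if the client submits in parallel, total throughput still depends on how many concurrent sessions the Hive server and the underlying MapReduce cluster can actually service.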
Re: Sailfish
That makes perfect sense to me, especially because it really is a new implementation of shuffle that is optimized for very large jobs. I am happy to see anything go in that is going to improve the performance of Hadoop, and I look forward to running some benchmarks on the changes.

I am not super familiar with Sailfish, but from what I remember from a while ago, it is the modified version of KFS that in reality does the sorting. The maps output data to chunks (aka blocks); when each chunk is full, it is sorted. When the sorting is finished for a chunk, the reducers are free to pull the sorted data from the chunks and run. I have a few concerns with it, though.

1. How do we securely handle different comparators? Currently comparators run as the user that launched the job, not as a privileged user. Sailfish seems to require that comparators run as a privileged user, or that we only support pure bitwise sorting of keys.

2. How does this work in a mixed environment? Sailfish, as I understand it, is optimized for large map/reduce jobs, and can be slower on small jobs than the current implementation. How do we make it so that large jobs are able to run faster, but not negatively impact the more common small jobs? We could run both in parallel and switch between them depending on the size of the job's input, or a config key of some sort, but then the RAM needed to make these big jobs run fast would not be available for smaller jobs to use when no really big job is running.

--Bobby Evans

On 5/11/12 1:32 AM, Todd Lipcon t...@cloudera.com wrote:
> Hey Sriram, We discussed this before, but for the benefit of the wider audience: :) It seems like the requirements imposed on KFS by Sailfish are in most ways much simpler than the requirements of a full distributed filesystem. The one thing we need is atomic record append -- but we don't need anything else, like filesystem metadata/naming, replication, corrupt data scanning, etc.
> All of the data is transient/short-lived and at replication count 1. So I think building something specific to this use case would be pretty practical - and my guess is it might even have some benefits over trying to use a full DFS. In the MR2 architecture, I'd probably try to build this as a service plugin in the NodeManager (similar to the way that the ShuffleHandler in the current implementation works)
> -Todd
[jira] [Created] (HADOOP-8391) Hadoop-auth should use log4j
Eli Collins created HADOOP-8391:
-----------------------------------

Summary: Hadoop-auth should use log4j
Key: HADOOP-8391
URL: https://issues.apache.org/jira/browse/HADOOP-8391
Project: Hadoop Common
Issue Type: Improvement
Components: conf
Affects Versions: 2.0.0
Reporter: Eli Collins

Per HADOOP-8086, hadoop-auth uses slf4j; I don't see why it shouldn't use log4j, to be consistent with the rest of Hadoop.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-8392) Add YARN audit logging to log4j.properties
Eli Collins created HADOOP-8392:
-----------------------------------

Summary: Add YARN audit logging to log4j.properties
Key: HADOOP-8392
URL: https://issues.apache.org/jira/browse/HADOOP-8392
Project: Hadoop Common
Issue Type: Improvement
Affects Versions: 2.0.0
Reporter: Eli Collins

MAPREDUCE-2655 added MR/NM audit logging, but it's not hooked up into log4j.properties or the bin and env scripts like the other audit logs, so you have to modify the deployed binary to change them. Let's add the relevant plumbing that the other audit loggers have, and update log4j.properties with a sample configuration that's disabled by default; e.g. see [this comment|https://issues.apache.org/jira/browse/MAPREDUCE-2655?focusedCommentId=13084191&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13084191]. Also, it looks like mapred.AuditLogger and its plumbing can be removed.
[jira] [Created] (HADOOP-8395) Text shell command unnecessarily demands that a SequenceFile's key class be WritableComparable
Harsh J created HADOOP-8395:
-----------------------------------

Summary: Text shell command unnecessarily demands that a SequenceFile's key class be WritableComparable
Key: HADOOP-8395
URL: https://issues.apache.org/jira/browse/HADOOP-8395
Project: Hadoop Common
Issue Type: Bug
Components: util
Affects Versions: 2.0.0
Reporter: Harsh J
Priority: Trivial

Text, from the Display set of shell commands (hadoop fs -text), has a strict check that a sequence file's header-loaded key class be a subclass of WritableComparable. The sequence file writer itself has no such check (one can create sequence files with plain Writable keys; Comparable is needed only for the sequence file's sorter, which not everyone uses), and hence it's not reasonable for the Text command to carry it either. We should relax the check and simply check for Writable, not WritableComparable.
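The proposed fix amounts to relaxing one reflective check. A minimal stand-alone illustration, with stand-in marker interfaces in place of org.apache.hadoop.io.Writable/WritableComparable so the sketch does not depend on Hadoop itself:

```java
public class KeyClassCheck {
    // Stand-ins for org.apache.hadoop.io.Writable / WritableComparable,
    // so this sketch is self-contained.
    interface Writable {}
    interface WritableComparable<T> extends Writable {}

    // A key type that is Writable but NOT WritableComparable -- exactly the
    // kind of key the current fs -text check rejects.
    static class PlainKey implements Writable {}

    // Current, overly strict check:
    static boolean strictCheck(Class<?> keyClass) {
        return WritableComparable.class.isAssignableFrom(keyClass);
    }

    // Proposed, relaxed check:
    static boolean relaxedCheck(Class<?> keyClass) {
        return Writable.class.isAssignableFrom(keyClass);
    }

    public static void main(String[] args) {
        System.out.println(strictCheck(PlainKey.class));  // false: rejected today
        System.out.println(relaxedCheck(PlainKey.class)); // true: accepted after the fix
    }
}
```

Since WritableComparable extends Writable, every key that passes today still passes; the change only stops rejecting plain-Writable keys.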
[jira] [Created] (HADOOP-8396) DataStreamer, OutOfMemoryError, unable to create new native thread
Catalin Alexandru Zamfir created HADOOP-8396:
-----------------------------------

Summary: DataStreamer, OutOfMemoryError, unable to create new native thread
Key: HADOOP-8396
URL: https://issues.apache.org/jira/browse/HADOOP-8396
Project: Hadoop Common
Issue Type: Bug
Components: io
Affects Versions: 1.0.2
Environment: Ubuntu 64bit, 4GB of RAM, Core Duo processors, commodity hardware.
Reporter: Catalin Alexandru Zamfir
Priority: Blocker

We're trying to write about a few billion records via Avro, when we got this error, which is unrelated to our code:

10725984 [Main] INFO net.gameloft.RnD.Hadoop.App - ## At: 2:58:43.290 # Written: 52100 records
Exception in thread "DataStreamer for file /Streams/Cubed/Stuff/objGame/aRandomGame/objType/aRandomType/2012/05/11/20/29/Shard.avro block blk_3254486396346586049_75838" java.lang.OutOfMemoryError: unable to create new native thread
	at java.lang.Thread.start0(Native Method)
	at java.lang.Thread.start(Thread.java:657)
	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:612)
	at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:184)
	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1202)
	at org.apache.hadoop.ipc.Client.call(Client.java:1046)
	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
	at $Proxy8.getProtocolVersion(Unknown Source)
	at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
	at org.apache.hadoop.hdfs.DFSClient.createClientDatanodeProtocolProxy(DFSClient.java:160)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:3117)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2200(DFSClient.java:2586)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2790)
10746169 [Main] INFO net.gameloft.RnD.Hadoop.App - ## At: 2:59:03.474 # Written: 52200 records
Exception in thread "ResponseProcessor for block blk_4201760269657070412_73948" java.lang.OutOfMemoryError
	at sun.misc.Unsafe.allocateMemory(Native Method)
	at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:117)
	at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:305)
	at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:75)
	at sun.nio.ch.IOUtil.read(IOUtil.java:223)
	at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:254)
	at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
	at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
	at java.io.DataInputStream.readFully(DataInputStream.java:195)
	at java.io.DataInputStream.readLong(DataInputStream.java:416)
	at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:124)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2964)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 32 bytes for intptr_t in /build/buildd/openjdk-6-6b23~pre11/build/openjdk/hotspot/src/share/vm/runtime/deoptimization.cpp
[thread 1587264368 also had an error]
[thread 309168 also had an error]
[thread 1820371824 also had an error]
[thread 1343454064 also had an error]
[thread 1345444720 also had an error]
# An error report file with more information is saved as:
# [thread 1345444720 also had an error]
[thread -1091290256 also had an error]
[thread 678165360 also had an error]
[thread 678497136 also had an error]
[thread 675511152 also had an error]
[thread 1385937776 also had an error]
[thread 911969136 also had an error]
[thread -1086207120 also had an error]
[thread -1088251024 also had an error]
[thread -1088914576 also had an error]
[thread -1086870672 also had an error]
[thread 441797488 also had an error]
[thread 445778800 also had an error]
[thread 440400752 also had an error]
[thread 444119920 also had an error]
[thread 1151298416 also had an error]
[thread 443124592 also had an error]
[thread 1152625520 also had an error]
[thread 913628016 also had an error]
[thread -1095345296 also had an error]
[thread 1390799728 also had an error]
[thread 443788144 also had an error]
[thread 676506480 also had an error]
[thread 1630595952 also had an error]
pure virtual method called
terminate called without an active exception
pure virtual method called
Aborted

It seems to be a memory leak. We were opening 5-10 buffers to different paths when writing and closing them. We've tested that those buffers do not overrun. And
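For context on diagnosing this class of failure: "unable to create new native thread" usually means the process hit the OS per-user thread/process limit (`ulimit -u`) or ran out of native memory for thread stacks, not Java heap. Since each open HDFS output stream keeps a DataStreamer/ResponseProcessor thread pair alive, a steadily climbing thread count is a quick signal of a stream leak. A small JDK-only check (the class and method names here are mine, not from the report):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class ThreadPressure {
    // Current number of live threads in this JVM.
    public static int liveThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        return mx.getThreadCount();
    }

    // Highest live-thread count seen since JVM start (or last peak reset).
    // If this keeps growing while records are written, threads are leaking,
    // e.g. output streams that are opened but never closed.
    public static int peakThreads() {
        return ManagementFactory.getThreadMXBean().getPeakThreadCount();
    }

    public static void main(String[] args) {
        System.out.println("live=" + liveThreads() + " peak=" + peakThreads());
    }
}
```

Logging these two numbers alongside the "Written: N records" progress lines would show whether the thread count tracks the number of open streams, which would point at a close() being missed rather than a Hadoop bug.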