Re: S3N copy creating recursive folders
Subroto and Shumin,

Try adding a trailing slash to the s3n source path:

- hadoop fs -cp s3n://acessKey:acesssec...@bucket.name/srcData /test/srcData
+ hadoop fs -cp s3n://acessKey:acesssec...@bucket.name/srcData/ /test/srcData

Without the trailing slash, each listing of srcData returns srcData itself again, which leads to the infinite recursion you experienced.

George

I used to have a similar problem. It looks like there is a recursive folder creation bug. How about removing srcData from the destination, for example:

hadoop fs -cp s3n://acessKey:acesssec...@bucket.name/srcData /test/

Or with distcp:

hadoop distcp s3n://acessKey:acesssec...@bucket.name/srcData /test/

HTH.
Shumin

On Wed, Mar 6, 2013 at 5:44 AM, Subroto ssan...@datameer.com wrote:
Hi Mike,

I have tried distcp as well and it ended up with an exception:

13/03/06 05:41:13 INFO tools.DistCp: srcPaths=[s3n://acessKey:acesssec...@dm.test.bucket/srcData]
13/03/06 05:41:13 INFO tools.DistCp: destPath=/test/srcData
13/03/06 05:41:18 INFO tools.DistCp: /test/srcData does not exist.
org.apache.hadoop.tools.DistCp$DuplicationException: Invalid input, there are duplicated files in the sources: s3n://acessKey:acesssec...@dm.test.bucket/srcData/compressed, s3n://acessKey:acesssec...@dm.test.bucket/srcData/compressed
    at org.apache.hadoop.tools.DistCp.checkDuplication(DistCp.java:1368)
    at org.apache.hadoop.tools.DistCp.setup(DistCp.java:1176)
    at org.apache.hadoop.tools.DistCp.copy(DistCp.java:666)
    at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)

One more interesting thing to note: the same command works fine with Hadoop 2.0.

Cheers,
Subroto Sanyal

On Mar 6, 2013, at 11:12 AM, Michel Segel wrote:
Have you tried using distcp?

Sent from a remote device. Please excuse any typos...
Mike Segel

On Mar 5, 2013, at 8:37 AM, Subroto ssan...@datameer.com wrote:
Hi,

It's not because there are too many recursive folders in the S3 bucket; in fact there are no recursive folders in the source. If I list the S3 bucket with native S3 tools, I can find a file named srcData, with size 0, inside the folder srcData. The copy command keeps creating the folder /test/srcData/srcData/srcData (appending srcData each time).

Cheers,
Subroto Sanyal

On Mar 5, 2013, at 3:32 PM, 卖报的小行家 wrote:
Hi Subroto,

I haven't used the s3n filesystem, but from the output "cp: java.io.IOException: mkdirs: Pathname too long. Limit 8000 characters, 1000 levels." I think the problem is the path itself. Is the path longer than 8000 characters, or is it more than 1000 levels deep? You only have 998 folders. Maybe the last one is longer than 8000 characters; why not count the last one's length?

BRs//Julian

------------------ Original ------------------
From: Subroto ssan...@datameer.com
Date: Tue, Mar 5, 2013 10:22 PM
To: user user@hadoop.apache.org
Subject: S3N copy creating recursive folders

Hi,

I am using Hadoop 1.0.3 and trying to execute:

hadoop fs -cp s3n://acessKey:acesssec...@bucket.name/srcData /test/srcData

This ends up with:

cp: java.io.IOException: mkdirs: Pathname too long. Limit 8000 characters, 1000 levels.

When I try to list the folder /test/srcData recursively, it lists 998 folders like:

drwxr-xr-x - root supergroup 0 2013-03-05 08:49 /test/srcData/srcData
drwxr-xr-x - root supergroup 0 2013-03-05 08:49 /test/srcData/srcData/srcData
drwxr-xr-x - root supergroup 0 2013-03-05 08:49 /test/srcData/srcData/srcData/srcData
drwxr-xr-x - root supergroup 0 2013-03-05 08:49 /test/srcData/srcData/srcData/srcData/srcData
drwxr-xr-x - root supergroup 0 2013-03-05 08:49 /test/srcData/srcData/srcData/srcData/srcData/srcData

Is there a problem with the s3n filesystem?

Cheers,
Subroto Sanyal
Re: Question related to Decompressor interface
Hello,

> Can someone share some idea of what the Hadoop source code of class org.apache.hadoop.io.compress.BlockDecompressorStream, method rawReadInt(), is trying to do here?

The BlockDecompressorStream class is used for block-based decompression (e.g. Snappy). Each chunk has a header indicating how many bytes it contains. That header is read by the rawReadInt method, so it is expected to return a non-negative value (since you can't have a negative length).

George
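For reference, a minimal sketch of the kind of logic rawReadInt() implements: reading a 4-byte, big-endian chunk-length header from the underlying stream. This is an illustration written from memory, not a verbatim copy of the Hadoop source.

import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class ChunkHeaderReader {
    // Read a 4-byte big-endian integer (the chunk-length header) from the
    // underlying stream. Throws EOFException if the stream ends before all
    // four bytes are available, hence the result is always non-negative.
    static int rawReadInt(InputStream in) throws IOException {
        int b1 = in.read();
        int b2 = in.read();
        int b3 = in.read();
        int b4 = in.read();
        if ((b1 | b2 | b3 | b4) < 0) {
            throw new EOFException();
        }
        // Compose the four bytes into one int: the number of bytes in the next chunk.
        return (b1 << 24) + (b2 << 16) + (b3 << 8) + b4;
    }
}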
Re: xcievers
Patai,

> I am still curious, how do we monitor the consumption of this value in each datanode?

You can use the getDataNodeStats() method of your DistributedFileSystem instance. It returns an array of DatanodeInfo which contains, among other things, the xceiver count that you are looking for.

George
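A minimal sketch of that approach; it assumes the client-side configuration points at your HDFS cluster, and the method names are the Hadoop 1.x ones as I recall them.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

public class XceiverReport {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumes fs.default.name points at the namenode, so this is a DistributedFileSystem.
        DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);

        // One DatanodeInfo entry per datanode known to the namenode.
        for (DatanodeInfo dn : dfs.getDataNodeStats()) {
            System.out.println(dn.getHostName() + " xceivers=" + dn.getXceiverCount());
        }
    }
}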
Re: what happens when a datanode rejoins?
Hi Mehul,

> Some of the blocks it was managing are deleted/modified?

The namenode will asynchronously replicate the blocks to other datanodes in order to maintain the replication factor after a datanode has not been in contact for 10 minutes.

> The size of the blocks are now modified say from 64MB to 128MB?

Block size is a per-file setting, so new files will use 128MB blocks, but the old ones will remain at 64MB.

> What if the block replication factor was one (yea not in most deployments but say incase) so does the namenode recreate a file once the datanode rejoins?

(Assuming you didn't perform a decommission.) Blocks that lived only on that datanode will be declared missing, and the files associated with those blocks will not be able to be fully read until the datanode rejoins.

George
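To illustrate the per-file nature of the block size, here is a minimal sketch using the FileSystem.create() overload that takes an explicit block size; the path and values are made up. A file written this way gets 128MB blocks, while files written earlier keep whatever block size they were created with (e.g. 64MB).

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FSDataOutputStream out = fs.create(
                new Path("/test/newfile"),   // hypothetical path
                true,                        // overwrite
                4096,                        // buffer size
                (short) 3,                   // replication
                128L * 1024 * 1024);         // block size for this file only
        out.close();
    }
}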
Re: what happens when a datanode rejoins?
Mehul,

Let me make an addition.

> Some of the blocks it was managing are deleted/modified?

Blocks that are deleted in the interim will be deleted on the rejoining node as well, after it rejoins. Regarding modification, I'd advise against modifying blocks after they have been fully written.

George
Re: How to get the HDFS I/O information
Qu,

Every job has a history file that is, by default, stored under $HADOOP_LOG_DIR/history. These job history files list the amount of HDFS read/write (and lots of other things) for every task.

On 2012/04/25 7:25, Qu Chen wrote:
Let me add, I'd like to do this periodically to gather some performance profile information.

On Tue, Apr 24, 2012 at 5:47 PM, Qu Chen chenqu...@gmail.com wrote:
I am trying to gather the info regarding the amount of HDFS read/write for each task in a given map-reduce job. How can I do that?
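If per-job totals are enough, the same numbers can also be pulled programmatically from the job counters. A sketch against the old (mapred) JobClient API follows; the job id is made up, and the group/counter names ("FileSystemCounters", HDFS_BYTES_READ/HDFS_BYTES_WRITTEN) are the ones I believe the 1.x line uses. Per-task figures still come from the history files.

import org.apache.hadoop.mapred.Counters;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobID;
import org.apache.hadoop.mapred.RunningJob;

public class HdfsIoCounters {
    public static void main(String[] args) throws Exception {
        // Assumes the local configuration points at your jobtracker.
        JobClient client = new JobClient(new JobConf());
        // Hypothetical job id; substitute a real one from your cluster.
        RunningJob job = client.getJob(JobID.forName("job_201204250725_0001"));
        Counters counters = job.getCounters();
        long read = counters.findCounter("FileSystemCounters", "HDFS_BYTES_READ").getCounter();
        long written = counters.findCounter("FileSystemCounters", "HDFS_BYTES_WRITTEN").getCounter();
        System.out.println("HDFS bytes read=" + read + " written=" + written);
    }
}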
Re: Reducer not firing
Arko,

Change Iterator to Iterable in your reduce() signature.

George

On 2012/04/18 8:16, Arko Provo Mukherjee wrote:
Hello,

Thanks everyone for helping me. Here are my observations:

Devaraj - I didn't find any bug in the log files. In fact, none of the print statements in my reducer are even appearing in the logs. I can share the syslogs if you want. I didn't paste them here so that the email doesn't get cluttered.

Kasi - Thanks for the suggestion. I tried it but got the same output. The system just created 1 reducer as my test data set is small.

Bejoy - Can you please advise how I can pinpoint whether the IdentityReducer is being used or not.

Steven - I tried compiling with your suggestion. However, if I put an @Override on top of my reduce method, I get the following error: "method does not override or implement a method from a supertype". The code compiles without it. I do have an @Override on top of my map method though.

public class Reduce_First extends Reducer<IntWritable, Text, NullWritable, Text> {
    public void reduce(IntWritable key, Iterator<Text> values, Context context)
            throws IOException, InterruptedException {
        while (values.hasNext()) {
            // Process
        }
        // Finally emit
    }
}

Thanks a lot again!
Warm regards
Arko

On Tue, Apr 17, 2012 at 3:19 PM, Steven Willis swil...@compete.com wrote:
Try putting @Override before your reduce method to make sure you're overriding the method properly. You'll get a compile time error if not.

-Steven Willis

From: Bejoy KS [mailto:bejoy.had...@gmail.com]
Sent: Tuesday, April 17, 2012 10:03 AM
To: mapreduce-user@hadoop.apache.org
Subject: Re: Reducer not firing

Hi Arko,

From the naming of the output files, your job has the reduce phase. But the reducer being used is the IdentityReducer instead of your custom reducer. That is the reason you are seeing the same map output in the output files as well. You need to evaluate your code and logs to see why the IdentityReducer is being triggered.

Regards
Bejoy KS

Sent from handheld, please excuse typos.

From: kasi subrahmanyam kasisubbu...@gmail.com
Date: Tue, 17 Apr 2012 19:10:33 +0530
To: mapreduce-user@hadoop.apache.org
ReplyTo: mapreduce-user@hadoop.apache.org
Subject: Re: Reducer not firing

Could you comment out the property where you are setting the number of reducer tasks and see the behaviour of the program once? If you have already tried that, could you share the output?

On Tue, Apr 17, 2012 at 3:00 PM, Devaraj k devara...@huawei.com wrote:
Can you check the task attempt logs in your cluster and find out what is happening in the reduce phase? By default, task attempt logs are present in $HADOOP_LOG_DIR/userlogs/job-id/. There could be some bug in your reducer which is leading to this output.

Thanks
Devaraj

From: Arko Provo Mukherjee [arkoprovomukher...@gmail.com]
Sent: Tuesday, April 17, 2012 2:07 PM
To: mapreduce-user@hadoop.apache.org
Subject: Re: Reducer not firing

Hello,

Many thanks for the reply. The 'no_of_reduce_tasks' is set to 2. I have a print statement before the code I pasted below to check that. Also, I can find two output files, part-r-0 and part-r-1, but they contain the values that were output by the Mapper logic.

Please let me know what I can check further.

Thanks a lot in advance!
Warm regards
Arko

On Tue, Apr 17, 2012 at 12:48 AM, Devaraj k devara...@huawei.com wrote:
Hi Arko,

What is the value of 'no_of_reduce_tasks'? If the number of reduce tasks is 0, then the map task will directly write map output into the job output path.

Thanks
Devaraj

From: Arko Provo Mukherjee [arkoprovomukher...@gmail.com]
Sent: Tuesday, April 17, 2012 10:32 AM
To: mapreduce-user@hadoop.apache.org
Subject: Reducer not firing

Dear All,

I am porting code from the old API to the new API (Context objects) and running on Hadoop 0.20.203.

Job job_first = new Job();
job_first.setJarByClass(My.class);
job_first.setNumReduceTasks(no_of_reduce_tasks);
job_first.setJobName("My_Job");
FileInputFormat.addInputPath(job_first, new Path(Input_Path));
FileOutputFormat.setOutputPath(job_first, new Path(Output_Path));
job_first.setMapperClass(Map_First.class);
job_first.setReducerClass(Reduce_First.class);
job_first.setMapOutputKeyClass(IntWritable.class);
job_first.setMapOutputValueClass(Text.class);
job_first.setOutputKeyClass(NullWritable.class);
job_first.setOutputValueClass(Text.class);
job_first.waitForCompletion(true);

The problem I am facing is that instead of emitting values to the reducers, the mappers are directly writing their output to the output path and the reducers are not processing anything. As I read from the online materials available, both my Map and Reduce methods use the context.write method to emit the values.

Please help. Thanks a lot in advance!!
Warm regards
Arko
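For reference, a minimal sketch of the change George suggests above, using Arko's class and type parameters. With the new (Context) API the framework only dispatches to a reduce() method that takes an Iterable, so the Iterator version never overrides it and the default pass-through (identity) behaviour wins; with Iterable, the @Override annotation also compiles.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class Reduce_First extends Reducer<IntWritable, Text, NullWritable, Text> {
    @Override  // now compiles, because this matches the framework's signature
    protected void reduce(IntWritable key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        for (Text value : values) {
            // process each value for this key, then emit
            context.write(NullWritable.get(), value);
        }
    }
}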
Re: Best practices configuring libraries on the backend.
Dmitriy,

I've tested it on Hadoop 1.0.0 and 1.0.1 (I don't know which version cdh3u3 is based off of). In hadoop-env.sh, if I set HADOOP_TASKTRACKER_OPTS=-Djava.library.path=/usr/blah, the TaskTracker sees that option. Then it gets passed along to all M/R child tasks on that node. Can you confirm that your TaskTrackers are actually seeing the passed option? (through the ps command)

George

On 2012/03/29 5:19, Dmitriy Lyubimov wrote:
Hm. Doesn't seem to work for me (with cdh3u3). I defined

export HADOOP_TASKTRACKER_OPTS=-Djava.library.path=/usr/

and it doesn't seem to work (as opposed to when I set it with a final mapred.child.java.opts property on the data node). Still puzzling.

On Tue, Mar 27, 2012 at 7:17 PM, George Datskos george.dats...@jp.fujitsu.com wrote:
Dmitriy,

I just double-checked, and the caveat I stated earlier is incorrect. So, -Djava.library.path set in the client's {mapred.child.java.opts} should just append to the -Djava.library.path that each TaskTracker has when creating the library path for each child (M/R) task. So that's even better, I guess.

George

On 2012/03/28 11:06, George Datskos wrote:
Dmitriy,

To deal with different servers having various shared libraries in different locations, you can simply make sure the _TaskTracker_'s -Djava.library.path is set correctly on each server. That library path should be passed along to each child (M/R) task, in *addition* to the {mapred.child.java.opts} that you specify in the client-side configuration options.

One caveat: on the client side, don't include -Djava.library.path or that path will be passed along to all of the child tasks, overriding the site-specific one you set on the TaskTracker.

George

On 2012/03/28 10:43, Dmitriy Lyubimov wrote:
Hello,

I have a couple of questions regarding mapreduce configurations.

We install various platforms on data nodes that require a mixed set of native libraries. Part of the problem is that, in the general case, these software platforms may be installed into different locations on the backend (we try to unify it, but still). What this means is that it may require a site-specific -Djava.library.path setting.

I configured individual JVM options (mapred.child.java.opts) on each node to include the specific set of paths. However, I encountered 2 problems:

#1: My setting doesn't go into effect unless I also declare it final on the data node. It's just being overridden by the default -Xmx200 value from the driver EVEN when I don't set it on the driver at all (and there seems to be no way to unset it). However, using a final spec on the backend creates a problem if one of the numerous jobs we run wishes to override the setting. The ideal behavior is: if I don't set it in the driver, then the backend value kicks in; otherwise the driver's value wins. But I did not find a way to do that for this particular setting for some reason. Could somebody clarify the best workaround? Thank you.

#2: The ideal behavior would actually be to merge driver-specific and backend-specific settings. E.g. the backend may need to configure specific software package locations while the client may sometimes wish to set heap size etc. Is there a best practice to achieve this effect?

Thank you very much in advance.
-Dmitriy
Re: Mapper Record Spillage
Actually, if you set {io.sort.mb} to 2048, your map tasks will always fail. The maximum {io.sort.mb} is hard-coded to 2047. Which means if you think you've set 2048 and your tasks aren't failing, then you probably haven't actually changed io.sort.mb. Double-check what configuration settings the JobTracker actually saw by looking at

$ hadoop fs -cat hdfs://JOB_OUTPUT_DIR/_logs/history/*.xml | grep io.sort.mb

George

On 2012/03/11 22:38, Harsh J wrote:
Hans,

I don't think io.sort.mb can support a whole 2048 value (it builds one array with that size, and the JVM may not be allowing that). Can you lower it to 2000 ± 100 and try again?

On Sun, Mar 11, 2012 at 1:36 PM, Hans Uhlig huh...@uhlisys.com wrote:
If that is the case, then these two lines should make more than enough memory available, on a virtually unused cluster:

job.getConfiguration().setInt("io.sort.mb", 2048);
job.getConfiguration().set("mapred.map.child.java.opts", "-Xmx3072M");

such that a conversion from 1GB of CSV text to binary primitives should fit easily. But Java still throws a heap error even when there is 25 GB of memory free.

On Sat, Mar 10, 2012 at 11:50 PM, Harsh J ha...@cloudera.com wrote:
Hans,

You can change memory requirements for tasks of a single job, but not of a single task inside that job. This is briefly how the 0.20 framework (by default) works: the TT has notions only of slots, and carries a maximum _number_ of simultaneous slots it may run. It does not know what each task, occupying one slot, would demand in resource terms. Your job then supplies a # of map tasks, and the amount of memory required per map task in general, as a configuration. TTs then merely start the task JVMs with the provided heap configuration.

On Sun, Mar 11, 2012 at 11:24 AM, Hans Uhlig huh...@uhlisys.com wrote:
That was a typo in my email, not in the configuration. Is the memory reserved for the tasks when the task tracker starts? You seem to be suggesting that I need to set the memory to be the same for all map tasks. Is there no way to override it for a single map task?

On Sat, Mar 10, 2012 at 8:41 PM, Harsh J ha...@cloudera.com wrote:
Hans,

It's possible you have a typo issue: mapred.map.child.jvm.opts - such a property does not exist. Perhaps you wanted mapred.map.child.java.opts?

Additionally, the computation you need to do is (# of map slots on a TT * per-map-task-heap-requirement), which should be at most (Total RAM - 2/3 GB). With your 4 GB requirement, I guess you can support a max of 6-7 slots per machine (i.e. not counting reducer heap requirements in parallel).

On Sun, Mar 11, 2012 at 9:30 AM, Hans Uhlig huh...@uhlisys.com wrote:
I am attempting to speed up a mapping process whose input is GZIP compressed CSV files. The files range from 1-2GB, and I am running on a cluster where each node has a total of 32GB memory available to use. I have attempted to tweak mapred.map.child.jvm.opts with -Xmx4096mb and io.sort.mb to 2048 to accommodate the size, but I keep getting Java heap errors or other memory related problems. My row count per mapper is well below the Integer.MAX_VALUE limit by several orders of magnitude and the box is NOT using anywhere close to its full memory allotment. How can I specify that this map task can have 3-4 GB of memory for the collection, partition and sort process without constantly spilling records to disk?
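A sketch of job-side settings that respect that ceiling. The values are illustrative, and the generic mapred.child.java.opts key is used because, as discussed in the "setting mapred.map.child.java.opts not working" thread below, the map-specific variant is not honored on 0.20/1.0.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SpillTuningExample {
    public static Job buildJob() throws Exception {
        Configuration conf = new Configuration();
        // Keep the sort buffer under the hard-coded 2047 MB ceiling.
        conf.setInt("io.sort.mb", 2000);
        // Child JVM heap must hold the sort buffer plus the task's own objects.
        conf.set("mapred.child.java.opts", "-Xmx3072m");
        return new Job(conf, "spill-tuning-example");  // hypothetical job name
    }
}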
Re: Tracking Job completion times
Bharath,

Try the hadoop job -history API.

On 2012/03/05 8:06, Bharath Ravi wrote:
The Web UI does give me start and finish times, but I was wondering if there is a way to access these stats through an API, without having to grep through HTML. The hadoop job -status API was useful, but it doesn't seem to list wall completion times (it does give me CPU time, though). Am I missing something?
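A usage sketch, with a made-up path: the argument is the job's output directory, where the 1.x framework keeps a copy of the history under _logs/history; the submit, launch, and finish times are printed per job and per task. The "all" variant, if I recall the CLI correctly, includes additional per-task detail.

$ hadoop job -history /path/to/job/output
$ hadoop job -history all /path/to/job/output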
Re: Adding nodes
Mohit,

New datanodes will connect to the namenode, so that's how the namenode knows. Just make sure the datanodes have the correct {fs.default.name} in their configuration and then start them. The namenode can, however, choose to reject a datanode if you are using the {dfs.hosts} and {dfs.hosts.exclude} settings in the namenode's hdfs-site.xml.

The namenode doesn't actually care about the slaves file. It's only used by the start/stop scripts.

On 2012/03/02 10:35, Mohit Anchlia wrote:
I actually meant to ask how the namenode/jobtracker knows there is a new node in the cluster. Is it initiated by the namenode when the slaves file is edited? Or is it initiated by the tasktracker when the tasktracker is started?
Re: setting mapred.map.child.java.opts not working
Koji, Harsh,

MAPREDUCE-478 seems to be in v1, but those new settings have not yet been added to mapred-default.xml. (For backwards compatibility?)

George

On 2012/01/12 13:50, Koji Noguchi wrote:
Hi Harsh,

Wasn't MAPREDUCE-478 in 1.0? Maybe the Jira is not up to date.

Koji

On 1/11/12 8:44 PM, Harsh J ha...@cloudera.com wrote:
These properties are not available on Apache Hadoop 1.0 (formerly known as 0.20.x). This was a feature introduced in 0.21 (https://issues.apache.org/jira/browse/MAPREDUCE-478), and is available today on the 0.22 and 0.23 lines of releases.

For 1.0/0.20, use mapred.child.java.opts, which applies to both map and reduce commonly.

It would also be helpful if you could tell us what doc guided you to use these property names instead of the proper one, so we can fix it.

On Thu, Jan 12, 2012 at 8:44 AM, T Vinod Gupta tvi...@readypulse.com wrote:
Hi,

Can someone help me asap? When I run my mapred job, it fails with this error -

12/01/12 02:58:36 INFO mapred.JobClient: Task Id : attempt_201112151554_0050_m_71_0, Status : FAILED
Error: Java heap space
attempt_201112151554_0050_m_71_0: log4j:ERROR Failed to flush writer,
attempt_201112151554_0050_m_71_0: java.io.IOException: Stream closed
attempt_201112151554_0050_m_71_0: at sun.nio.cs.StreamEncoder.ensureOpen(StreamEncoder.java:44)
attempt_201112151554_0050_m_71_0: at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:139)
attempt_201112151554_0050_m_71_0: at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229)
attempt_201112151554_0050_m_71_0: at org.apache.log4j.helpers.QuietWriter.flush(QuietWriter.java:58)
attempt_201112151554_0050_m_71_0: at org.apache.hadoop.mapred.TaskLogAppender.flush(TaskLogAppender.java:94)
attempt_201112151554_0050_m_71_0: at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:260)
attempt_201112151554_0050_m_71_0: at org.apache.hadoop.mapred.Child$2.run(Child.java:142)

So I updated my mapred-site.xml with these settings -

<property>
  <name>mapred.map.child.java.opts</name>
  <value>-Xmx2048M</value>
</property>
<property>
  <name>mapred.reduce.child.java.opts</name>
  <value>-Xmx2048M</value>
</property>

Also, when I run my jar, I provide -Dmapred.map.child.java.opts=-Xmx4000m at the end. In spite of this, the task is not getting the max heap size I'm setting. Where did I go wrong? After changing mapred-site.xml, I restarted the jobtracker and tasktracker.. is that not good enough?

Thanks
legacy hadoop versions
Is there an Apache Hadoop policy towards maintenance/support of older Hadoop versions? It seems like 0.20.20* (now 1.0), 0.22, and 0.23 are the currently active branches. Regarding versions like 0.18 and 0.19, is there some policy like up to N years or up to M releases prior where legacy versions are still maintained? George
Re: Hadoop Question
Nitin,

On 2011/07/28 14:51, Nitin Khandelwal wrote:
> How can I determine if a file is being written to (by any thread) in HDFS?

That information is exposed by the NameNode http servlet. You can obtain it with the fsck tool (hadoop fsck /path/to/dir -openforwrite) or you can do an HTTP GET:

http://namenode:port/fsck?path=/your/path&openforwrite=1

George