du.reserved can run into problems with reserved disk capacity set by tune2fs

2013-02-12 Thread Alexander Fahlke
Hi! I'm using hadoop-0.20.2 on Debian Squeeze and ran into the same confusion as many others over the dfs.datanode.du.reserved parameter. One day some datanodes got out-of-disk errors although there was space left on the disks. The following values are rounded to make the problem more
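
For context: ext3/ext4 filesystems reserve about 5% of their blocks for root by default, and dfs.datanode.du.reserved knows nothing about that, so the two reserves can stack. A quick way to inspect and shrink the filesystem-level reserve (the device name is illustrative):

    # Show how many blocks ext3/ext4 keeps back for root (default ~5%)
    sudo tune2fs -l /dev/sda1 | grep -i 'reserved block count'

    # Lower the root reserve to 1% on a data-only partition
    sudo tune2fs -m 1 /dev/sda1

dfs.datanode.du.reserved (set in hdfs-site.xml, in bytes per volume) should then be sized against the space that is actually writable.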

Fwd: Delivery Status Notification (Failure)

2013-02-12 Thread samir das mohapatra
Hi All, I wanted to know how to connect Hive (hadoop-cdh4 distribution) with MicroStrategy. Any help is very helpful. Waiting for your response. Note: It is a little bit urgent; does anyone have experience with that? Thanks, samir

Re: number input files to mapreduce job

2013-02-12 Thread Mahesh Balija
Hi Vikas, You can get the FileSystem instance by calling FileSystem.get(Configuration); once you have the FileSystem instance you can use FileSystem.listStatus(inputPath) to get the FileStatus instances. Best, Mahesh Balija, Calsoft Labs. On Tue, Feb 12, 2013
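
A minimal sketch of that suggestion (the input path is illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ListInputFiles {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            // List every file under the job's input directory.
            FileStatus[] statuses = fs.listStatus(new Path("/user/vikas/input"));
            for (FileStatus status : statuses) {
                System.out.println(status.getPath() + " (" + status.getLen() + " bytes)");
            }
        }
    }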

NullPointerException in Spring Data Hadoop with CDH4

2013-02-12 Thread Christian Schneider
Hi, I am trying to use Spring Data Hadoop with CDH4 to write a MapReduce job. On startup, I get the following exception: Exception in thread "SimpleAsyncTaskExecutor-1" java.lang.ExceptionInInitializerError at org.springframework.data.hadoop.mapreduce.JobExecutor$2.run(JobExecutor.java:183)

Error for Pseudo-distributed Mode

2013-02-12 Thread yeyu1899
Hi all, I installed RedHat Enterprise Linux x86 in VMware Workstation and gave the virtual machine 1 GB of memory. Then I followed the steps in the guide Installing CDH4 on a Single Linux Node in Pseudo-distributed Mode ——

Re: NullPointerException in Spring Data Hadoop with CDH4

2013-02-12 Thread Costin Leau
Hi, For Spring Data Hadoop problems, it's best to use the designated forum [1]. That being said, I've tried to reproduce your error but I can't - I've upgraded the build to CDH 4.1.3, which runs fine against the VM on the CI (4.1.1). Maybe you have some other libraries on the client

RE: Error for Pseudo-distributed Mode

2013-02-12 Thread Vijay Thakorlal
Hi, Could you first try running the example: $ /usr/bin/hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar grep input output 'dfs[a-z.]+' Do you receive the same error? I'm not sure if it's related to a lack of RAM, but the stack trace shows errors with network timeouts (I

RE: number input files to mapreduce job

2013-02-12 Thread java8964 java8964
I don't think you can get a list of all input files in the mapper; what you can get is the current file's information. From the context object reference you can get the InputSplit, which should give you all the information you want about the current input file.
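
A short sketch of that approach inside a mapper (the cast assumes a file-based input format):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;

    public class FileAwareMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void setup(Context context) {
            // The split describes only what this mapper processes, not the
            // whole input set; FileSplit exposes the backing file.
            FileSplit split = (FileSplit) context.getInputSplit();
            Path currentFile = split.getPath();
            System.out.println("Processing " + currentFile
                    + " from offset " + split.getStart()
                    + ", length " + split.getLength());
        }
    }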

RE: Loader for small files

2013-02-12 Thread java8964 java8964
Hi, David: I am not sure I understand this suggestion. Why would a smaller block size help this performance issue? From what the original question describes, it looks like the performance problem is due to there being a lot of small files, each of which runs in its own mapper. As hadoop needs
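
The usual remedy is to pack many small files into fewer splits. A hedged sketch using CombineTextInputFormat, which ships with newer Hadoop releases (on 0.20.x you would subclass CombineFileInputFormat and supply your own RecordReader instead):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class SmallFilesJob {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "small-files");
            // One split now covers many small files instead of one file each.
            job.setInputFormatClass(CombineTextInputFormat.class);
            // Cap each combined split at 128 MB (tune to the cluster).
            CombineTextInputFormat.setMaxInputSplitSize(job, 128L * 1024 * 1024);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            // ... set mapper, reducer, and output as usual ...
        }
    }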

Re: NullPointerException in Spring Data Hadoop with CDH4

2013-02-12 Thread Christian Schneider
With the help of Costin I got a running Maven configuration. Thank you :). This is a pom.xml for Spring Data Hadoop and CDH4: <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
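
The original pom is cut off above; a hedged reconstruction of the pieces that usually matter for CDH4 (versions and artifacts are illustrative, not Christian's exact file):

    <repositories>
      <repository>
        <id>cloudera</id>
        <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
      </repository>
    </repositories>

    <dependencies>
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.0.0-cdh4.1.3</version>
      </dependency>
      <dependency>
        <groupId>org.springframework.data</groupId>
        <artifactId>spring-data-hadoop</artifactId>
        <version>1.0.0.RELEASE</version>
      </dependency>
    </dependencies>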

Re: Decommissioning Nodes in Production Cluster.

2013-02-12 Thread Robert Molina
Hi Dhanasekaran, I believe you are asking whether it is recommended to use the decommissioning feature to remove datanodes from your cluster; the answer is yes. As for how to do it, there is some information here http://wiki.apache.org/hadoop/FAQ that should help. Regards,

Re: Decommissioning Nodes in Production Cluster.

2013-02-12 Thread Benjamin Kim
Hi, I would like to add another scenario: what are the steps for removing a dead node when the server has had an unrecoverable hardware failure? Thanks, Ben On Tuesday, February 12, 2013 7:30:57 AM UTC-8, sudhakara st wrote: The decommissioning process is controlled by an exclude file, which

Re: Decommissioning Nodes in Production Cluster.

2013-02-12 Thread sudhakara st
The decommissioning process is controlled by an exclude file, which for HDFS is set by the dfs.hosts.exclude property, and for MapReduce by the mapred.hosts.exclude property. In most cases there is one shared file, referred to as the exclude file. This exclude file name should be specified
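
A minimal sketch of that configuration (the file path is illustrative):

    <!-- hdfs-site.xml -->
    <property>
      <name>dfs.hosts.exclude</name>
      <value>/etc/hadoop/conf/excludes</value>
    </property>

    <!-- mapred-site.xml -->
    <property>
      <name>mapred.hosts.exclude</name>
      <value>/etc/hadoop/conf/excludes</value>
    </property>

After adding the hostnames to that file, running hadoop dfsadmin -refreshNodes tells the namenode to start decommissioning them.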

Re: Decommissioning Nodes in Production Cluster.

2013-02-12 Thread shashwat shriparv
On Tue, Feb 12, 2013 at 11:43 PM, Robert Molina rmol...@hortonworks.com wrote: As far as how to do it, there should be some information here This is the best way to remove a datanode from a cluster. You have done the right thing. ∞ Shashwat Shriparv

Re: Loader for small files

2013-02-12 Thread Something Something
No, Yong, I believe you misunderstood. David's explanation makes sense. As pointed out in my original email, everything is going to one mapper; it's not creating multiple mappers. BTW, the code given in my original email indeed works as expected. It does trigger multiple mappers, but it doesn't

Java submit job to remote server

2013-02-12 Thread Alex Thieme
I apologize for asking what seems to be such a basic question, but I could use some help with submitting a job to a remote server. I have downloaded and installed hadoop locally in pseudo-distributed mode. I have written some Java code to submit a job. Here's the org.apache.hadoop.util.Tool

Re: Java submit job to remote server

2013-02-12 Thread Nitin Pawar
conf.set("mapred.job.tracker", "localhost:9001"); means that your jobtracker is on port 9001 on localhost. If you change it to the remote host, and that's the port it's running on, then it should work as expected. What's the exception you are getting? On Wed, Feb 13, 2013 at 2:41 AM, Alex Thieme
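
A minimal sketch of a client configured against a remote cluster, using the classic 1.x keys discussed in this thread (hostname and ports are illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class RemoteSubmit {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Both keys must point at the remote cluster, not localhost.
            conf.set("fs.default.name", "hdfs://remote-host:9000");
            conf.set("mapred.job.tracker", "remote-host:9001");
            Job job = new Job(conf, "remote-submit");
            // ... set mapper, reducer, input and output paths ...
            job.submit();
        }
    }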

Re: Java submit job to remote server

2013-02-12 Thread Alex Thieme
Thanks for the prompt reply and I'm sorry I forgot to include the exception. My bad. I've included it below. There certainly appears to be a server running on localhost:9001. At least, I was able to telnet to that address. While in development, I'm treating the server on localhost as the remote

Re: Question related to Decompressor interface

2013-02-12 Thread George Datskos
Hello, Can someone share some idea of what the Hadoop source code of class org.apache.hadoop.io.compress.BlockDecompressorStream, method rawReadInt(), is trying to do here? The BlockDecompressorStream class is used for block-based decompression (e.g. Snappy). Each chunk has a header indicating
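
rawReadInt() reads that header: four bytes assembled into a big-endian int giving the chunk length. A rough standalone equivalent of its logic (not the actual Hadoop source):

    import java.io.EOFException;
    import java.io.IOException;
    import java.io.InputStream;

    class HeaderReader {
        // Read four bytes and assemble them into a big-endian int, the way
        // BlockDecompressorStream reads a chunk-length header.
        static int rawReadInt(InputStream in) throws IOException {
            int b1 = in.read();
            int b2 = in.read();
            int b3 = in.read();
            int b4 = in.read();
            if ((b1 | b2 | b3 | b4) < 0) {
                throw new EOFException(); // stream ended mid-header
            }
            return (b1 << 24) | (b2 << 16) | (b3 << 8) | b4;
        }
    }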

Re: Java submit job to remote server

2013-02-12 Thread Hemanth Yamijala
Can you please include the complete stack trace and not just the root? Also, have you set fs.default.name to an HDFS location like hdfs://localhost:9000? Thanks Hemanth On Wednesday, February 13, 2013, Alex Thieme wrote: Thanks for the prompt reply and I'm sorry I forgot to include the

Re: Delivery Status Notification (Failure)

2013-02-12 Thread Arun C Murthy
Pls don't cross-post; this belongs only on the cdh lists. On Feb 12, 2013, at 12:55 AM, samir das mohapatra wrote: Hi All, I wanted to know how to connect Hive (hadoop-cdh4 distribution) with MicroStrategy. Any help is very helpful. Waiting for your response. Note: It is a little bit

Re: Delivery Status Notification (Failure)

2013-02-12 Thread Raj Vishwanathan
Arun, I don't understand your reply! Had you redirected this person to the Hive mailing list I would have understood. My philosophy on any mailing list has always been: if I know the answer to a question, I reply; else I humbly walk away! I got a lot of help from this group for my (mostly

Re: Delivery Status Notification (Failure)

2013-02-12 Thread Konstantin Boudnik
With all due respect, sir, these mailing lists have certain rules that evidently don't coincide with your philosophy. Cos On Tue, Feb 12, 2013 at 08:45PM, Raj Vishwanathan wrote: Arun, I don't understand your reply! Had you redirected this person to the hive mailing list I would have

Re: Anyway to load certain Key/Value pair fast?

2013-02-12 Thread Harsh J
Please do not use the general@ lists for any user-oriented questions. Please redirect them to user@hadoop.apache.org lists, which is where the user community and questions lie. I've moved your post there and have added you on CC in case you haven't subscribed there. Please reply back only to the

Re: Anyway to load certain Key/Value pair fast?

2013-02-12 Thread Harsh J
My reply to your questions is inline. On Wed, Feb 13, 2013 at 10:59 AM, Harsh J ha...@cloudera.com wrote: Please do not use the general@ lists for any user-oriented questions. Please redirect them to user@hadoop.apache.org lists, which is where the user community and questions lie. I've

Re: Anyway to load certain Key/Value pair fast?

2013-02-12 Thread William Kang
Hi Harsh, Thanks a lot for your reply and great suggestions. In practical cases, the values usually do not reside in the same datanode. Instead, they are mostly distributed by the key range itself. So it does require 20G of memory, but distributed across different nodes. The MapFile solution
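
For reference, a hedged sketch of the MapFile lookup being discussed: the reader binary-searches its in-memory index and seeks straight to the matching block, so a point lookup does not scan the whole file (path and key/value types are illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.MapFile;
    import org.apache.hadoop.io.Text;

    public class MapFileLookup {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            // Directory holding the MapFile's "data" and "index" parts.
            MapFile.Reader reader = new MapFile.Reader(fs, "/data/mapfile", conf);
            try {
                Text key = new Text("some-key");
                BytesWritable value = new BytesWritable();
                if (reader.get(key, value) != null) {
                    System.out.println("found " + value.getLength() + " bytes");
                }
            } finally {
                reader.close();
            }
        }
    }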

Re: Delivery Status Notification (Failure)

2013-02-12 Thread Raj Vishwanathan
Cos, I understand that there are rules. What are these rules? Is it Hive vs. Hadoop (this I understand) or Apache Hadoop vs. a specific distribution? (This I am not clear about.) Sent from my iPad. Please excuse the typos. On Feb 12, 2013, at 8:56 PM, Konstantin Boudnik c...@apache.org