Hi!
I'm using hadoop-0.20.2 on Debian Squeeze and ran into the same confusion
as many others about the dfs.datanode.du.reserved parameter.
One day some data nodes reported out-of-disk-space errors although there
was space left on the disks.
The following values are rounded to make the problem more
Hi All,
I wanted to know how to connect Hive (hadoop-cdh4 distribution) with
MicroStrategy.
Any help would be appreciated.
Waiting for your response.
Note: It is a little bit urgent. Does anyone have experience with that?
Thanks,
samir
Hi Vikas,
You can get the FileSystem instance by calling
FileSystem.get(Configuration);
Once you have the FileSystem instance, you can call its
listStatus(Path) method with the input path to get the FileStatus instances.
Best,
Mahesh Balija,
Calsoft Labs.
On Tue, Feb 12, 2013
Hi,
I am trying to use Spring Data Hadoop with CDH4 to write a MapReduce job.
On startup, I get the following exception:
Exception in thread "SimpleAsyncTaskExecutor-1" java.lang.ExceptionInInitializerError
at org.springframework.data.hadoop.mapreduce.JobExecutor$2.run(JobExecutor.java:183)
Hi all,
I installed redhat_enterprise-linux-x86 in VMware Workstation and gave the
virtual machine 1 GB of memory.
Then I followed the steps in the guide Installing CDH4 on a Single Linux Node in
Pseudo-distributed Mode ——
Hi,
For Spring Data Hadoop problems, it's best to use the designated forum
[1]. That being said, I've tried to reproduce your error but I can't -
I've upgraded the build to CDH 4.1.3 which runs fine against the VM on
the CI (4.1.1).
Maybe you have some other libraries on the client
Hi,
Could you first try running the example:
$ /usr/bin/hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar
grep input output 'dfs[a-z.]+'
Do you receive the same error?
Not sure if it's related to a lack of RAM, but as the stack trace shows
errors with network timeout (I
I don't think you can get a list of all input files in the mapper, but what you
can get is the current file's information.
From the context object you can get the InputSplit (via getInputSplit()), which
should give you all the information you want about the current input file.
Hi, Davie:
I am not sure I understand this suggestion. Why would a smaller block size help
with this performance issue?
From the original question, it looks like the performance problem is caused by
a large number of small files, each of which runs in its
own mapper.
As hadoop needs
With the help of Costin I got a running Maven configuration.
Thank you :).
This is a pom.xml for Spring Data Hadoop and CDH4:
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
Hi Dhanasekaran,
If you are asking whether it is recommended to use the
decommissioning feature to remove datanodes from your cluster, the answer
is yes. As for how to do it, there should be some information
at http://wiki.apache.org/hadoop/FAQ that should help.
Regards,
Hi,
I would like to add another scenario. What are the steps for removing a
dead node when the server has had an unrecoverable hard failure?
Thanks,
Ben
On Tuesday, February 12, 2013 7:30:57 AM UTC-8, sudhakara st wrote:
The decommissioning process is controlled by an exclude file, which for
HDFS is set by the dfs.hosts.exclude property, and for MapReduce by
the mapred.hosts.exclude property. In most cases, there is one shared file,
referred to as the exclude file. This exclude file name should be specified
On Tue, Feb 12, 2013 at 11:43 PM, Robert Molina rmol...@hortonworks.com wrote:
to do it, there should be some information he
This is the best way to remove a data node from a cluster. You have done the
right thing.
∞
Shashwat Shriparv
No, Yong, I believe you misunderstood. David's explanation makes sense. As
pointed out in my original email, everything is going to 1 Mapper. It's
not creating multiple mappers.
BTW, the code given in my original email, indeed works as expected. It
does trigger multiple mappers, but it doesn't
I apologize for asking what seems to be such a basic question, but I could use
some help with submitting a job to a remote server.
I have downloaded and installed hadoop locally in pseudo-distributed mode. I
have written some Java code to submit a job.
Here's the org.apache.hadoop.util.Tool
conf.set("mapred.job.tracker", "localhost:9001");
This means that your JobTracker is on port 9001 on localhost.
If you change it to the remote host and the port the JobTracker is running on
there, it should work as expected.
What is the exception you are getting?
On Wed, Feb 13, 2013 at 2:41 AM, Alex Thieme
Thanks for the prompt reply and I'm sorry I forgot to include the exception. My
bad. I've included it below. There certainly appears to be a server running on
localhost:9001. At least, I was able to telnet to that address. While in
development, I'm treating the server on localhost as the remote
Hello
Can someone explain what the Hadoop source code of class
org.apache.hadoop.io.compress.BlockDecompressorStream, method
rawReadInt(), is trying to do here?
The BlockDecompressorStream class is used for block-based decompression
(e.g. snappy). Each chunk has a header indicating
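To make the rawReadInt() question more concrete, here is a minimal sketch of what a helper like that typically does, assuming the header is a 4-byte big-endian integer. The class name RawReadIntSketch and the sample bytes are made up for illustration; this is not a copy of the Hadoop source.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stand-alone sketch, not the Hadoop class itself.
class RawReadIntSketch {

    // Reads four bytes from the stream and combines them into one
    // big-endian int (first byte is the most significant).
    static int rawReadInt(InputStream in) throws IOException {
        int b1 = in.read();
        int b2 = in.read();
        int b3 = in.read();
        int b4 = in.read();
        // InputStream.read() returns -1 at end of stream, so OR-ing the
        // four values yields a negative number if any read hit the end.
        if ((b1 | b2 | b3 | b4) < 0) {
            throw new EOFException();
        }
        return (b1 << 24) | (b2 << 16) | (b3 << 8) | b4;
    }

    public static void main(String[] args) throws IOException {
        // Assumed sample header: bytes 00 00 01 02, i.e. 0x00000102.
        byte[] header = {0x00, 0x00, 0x01, 0x02};
        System.out.println(rawReadInt(new ByteArrayInputStream(header))); // prints 258
    }
}
```

Reading byte by byte like this avoids wrapping the underlying stream in a DataInputStream just to read one length field, and the single OR check covers a truncated header in one test.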
Can you please include the complete stack trace and not just the root?
Also, have you set fs.default.name to an HDFS location like
hdfs://localhost:9000?
Thanks
Hemanth
On Wednesday, February 13, 2013, Alex Thieme wrote:
Thanks for the prompt reply and I'm sorry I forgot to include the
Please don't cross-post; this belongs only on the CDH lists.
On Feb 12, 2013, at 12:55 AM, samir das mohapatra wrote:
Hi All,
I wanted to know how to connect Hive (hadoop-cdh4 distribution) with
MicroStrategy.
Any help would be appreciated.
Waiting for your response.
Note: It is a little bit
Arun
I don't understand your reply! Had you redirected this person to the Hive
mailing list I would have understood.
My philosophy on any mailing list has always been: if I know the answer to a
question, I reply. Otherwise I humbly walk away!
I got a lot of help from this group for my (mostly
With all due respect, sir, these mailing lists have certain rules that
evidently don't coincide with your philosophy.
Cos
On Tue, Feb 12, 2013 at 08:45PM, Raj Vishwanathan wrote:
Arun
I don't understand your reply! Had you redirected this person to the hive
mailing list I would have
Please do not use the general@ lists for any user-oriented questions.
Please redirect them to user@hadoop.apache.org lists, which is where
the user community and questions lie.
I've moved your post there and have added you on CC in case you
haven't subscribed there. Please reply back only to the
My reply to your questions is inline.
On Wed, Feb 13, 2013 at 10:59 AM, Harsh J ha...@cloudera.com wrote:
Please do not use the general@ lists for any user-oriented questions.
Please redirect them to user@hadoop.apache.org lists, which is where
the user community and questions lie.
I've
Hi Harsh,
Thanks a lot for your reply and great suggestions.
In practical cases, the values usually do not reside on the same
data node. Instead, they are mostly distributed by the key range
itself. So, it does require 20G of memory, but distributed across
different nodes.
The MapFile solution
Cos,
I understand that there are rules. What are these rules? Is it Hive vs. Hadoop
(this I understand) or Apache Hadoop vs. a specific distribution (this I am
not clear about)?
Sent from my iPad
Please excuse the typos.
On Feb 12, 2013, at 8:56 PM, Konstantin Boudnik c...@apache.org