The block scanner is a simple, independent operation of the DN that
runs periodically and does its work in small phases, verifying that
stored blocks still match their checksums (it's an automatic data
validator), so that it can report corrupt or rotting blocks and help
keep the cluster healthy.
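For reference, the re-scan interval is controlled (if I remember the property name correctly) by dfs.datanode.scan.period.hours in hdfs-site.xml; a sketch, with the value shown only as an example:

  <property>
    <name>dfs.datanode.scan.period.hours</name>
    <!-- 504 hours (three weeks) is the usual default scan period -->
    <value>504</value>
  </property>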
Thanks Harsh & Manoj for the inputs.
Now I found that the DataNode is busy with block scanning. I have TBs of data
attached to each DataNode, so it's taking days to complete the block
scanning. I have two questions.
1. Will the DataNode not allow data to be written during the DataBlockScanning
pr
When you run with "java -jar", as previously stated on another thread,
you aren't loading any of the configs present on the installation (the
ones that configure HDFS to be the default filesystem).
When you run with "hadoop jar", the configs under /etc/hadoop/conf get
applied automatically to your program, making i
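A minimal sketch of the Tool/ToolRunner pattern that lets "hadoop jar" (and GenericOptionsParser) feed those installed configs into the job; the class and job names here are illustrative, not from the original thread:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.conf.Configured;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
  import org.apache.hadoop.util.Tool;
  import org.apache.hadoop.util.ToolRunner;

  public class MyJob extends Configured implements Tool {
    public int run(String[] args) throws Exception {
      // getConf() carries whatever "hadoop jar" loaded from /etc/hadoop/conf
      Job job = new Job(getConf(), "my-job");
      job.setJarByClass(MyJob.class);
      FileInputFormat.addInputPath(job, new Path(args[0]));
      FileOutputFormat.setOutputPath(job, new Path(args[1]));
      return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
      System.exit(ToolRunner.run(new Configuration(), new MyJob(), args));
    }
  }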
Oops, moving for sure this time :)
On Wed, May 1, 2013 at 10:35 AM, Harsh J wrote:
> Moving the question to Apache Avro's user@ lists. Please use the right
> lists for the most relevant answers.
>
> Avro is a different serialization technique that intends to replace
> the Writable serialization d
Moving the question to Apache Avro's user@ lists. Please use the right
lists for the most relevant answers.
Avro is a different serialization framework that is intended to replace
the Writable serialization defaults in Hadoop. MR accepts a list of
serializers it can use for its key/value structures an
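The list being referred to is (if I recall the key correctly) the io.serializations property; registering Avro's serializers next to the Writable one looks roughly like this, with the class list purely illustrative and worth checking against the Hadoop/Avro versions you actually run:

  <property>
    <name>io.serializations</name>
    <value>org.apache.hadoop.io.serializer.WritableSerialization,org.apache.hadoop.io.serializer.avro.AvroSpecificSerialization,org.apache.hadoop.io.serializer.avro.AvroReflectSerialization</value>
  </property>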
What do you mean by "increasing the size"? I'm talking more about increasing the
number of partitions... which actually decreases individual file size.
On Apr 30, 2013, at 4:09 PM, Mohammad Tariq wrote:
> Increasing the size can help us to an extent, but increasing it further might
> cause proble
Thank you Mitra. I will change the hostname.
On Tue, Apr 30, 2013 at 6:16 PM, Mitra Kaseebhotla <
mitra.kaseebho...@gmail.com> wrote:
> and change the hostname to reflect your actual hostnames.
>
>
>
> On Tue, Apr 30, 2013 at 3:14 PM, Mohammad Tariq wrote:
>
>> comment out 127.0.1.1 ubuntu in both the machines.
Thank you Tariq. I will try that...
On Tue, Apr 30, 2013 at 6:14 PM, Mohammad Tariq wrote:
> comment out 127.0.1.1 ubuntu in both the machines.
>
> if it still doesn't work, change 127.0.1.1 master to something else,
> like 127.0.0.3 or something.
>
> Warm Regards,
> Tariq
> https://mtariq.ju
and change the hostname to reflect your actual hostnames.
On Tue, Apr 30, 2013 at 3:14 PM, Mohammad Tariq wrote:
> comment out 127.0.1.1 ubuntu in both the machines.
>
> if it still doesn't work, change 127.0.1.1 master to something else,
> like 127.0.0.3 or something.
>
> Warm Regards,
> Ta
comment out 127.0.1.1 ubuntu in both the machines.
if it still doesn't work, change 127.0.1.1 master to something else, like
127.0.0.3 or something.
Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
On Wed, May 1, 2013 at 3:34 AM, Automation Me wrote:
> Hi Tariq,
>
>
> Mas
Hi Tariq,
Master:
Users:
hduser hduser
hostname:
ubuntu
*etc/hosts*
127.0.0.1 localhost
127.0.1.1 ubuntu
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
127
@Mitra: Yes, I cloned the same VMs. By default Ubuntu takes the 127.0.0.1
ubuntu hostname entry for all machines.
@Tariq: I will send the hosts file and users of all the machines.
On Tue, Apr 30, 2013 at 5:42 PM, Mitra Kaseebhotla <
mitra.kaseebho...@gmail.com> wrote:
> Looks like you have just cloned/copi
Looks like you have just cloned/copied the same VMs. Change the hostname of
each:
http://askubuntu.com/questions/87665/how-do-i-change-the-hostname-without-a-restart
On Tue, Apr 30, 2013 at 2:30 PM, Automation Me wrote:
> Thank you Tariq.
>
> I am using the same username on both the machines
Show me your /etc/hosts file along with the output of "users" and
"hostname" on both the machines.
Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
On Wed, May 1, 2013 at 3:00 AM, Automation Me wrote:
> Thank you Tariq.
>
> I am using the same username on both the machines
Thank you Tariq.
I am using the same username on both the machines, and when I try to copy a
file from master to slave just to make sure SSH is working fine, the file
gets copied onto the master itself, not the slave machine.
scp -r /usr/local/somefile hduser@slave:/usr/local/somefile
Any suggestions...
ssh is actually *user@some_machine* to *user@some_other_machine*. Either
use the same username on both the machines or add the IPs along with the
proper user@hostname entries in the /etc/hosts file.
HTH
Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
On Wed, May 1, 2013 at 2:39 AM, Automation M
Hello,
I am new to Hadoop and am trying to install a multinode cluster on Ubuntu VMs.
I am not able to communicate between the two VMs using SSH.
My hosts file:
127.0.1.1 Master
127.0.1.2 Slave
The following changes I made in the two VMs:
1. Updated the /etc/hosts file in both VMs.
On the Master VM, I did SS
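What the later replies boil down to: each VM's hosts file should map the hostnames to the machines' real, non-loopback addresses. A sketch with placeholder IPs (use whatever your VMs actually have):

  192.168.56.101   master
  192.168.56.102   slave

With entries like these on both machines, ssh/scp to hduser@slave resolves to the other VM instead of a local 127.x loopback address.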
Increasing the size can help us to an extent, but increasing it further
might cause problems during copy and shuffle. If the partitions are too big
to be held in memory, we'll end up with a *disk based shuffle*, which is
going to be slower than a *RAM based shuffle*, thus delaying the entire reduce
pha
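For reference, the MR1 knobs that govern how much of that reduce-side shuffle stays in RAM are, if I remember the names right, along these lines (values shown are the usual defaults; verify against your version):

  <property>
    <name>mapred.job.shuffle.input.buffer.percent</name>
    <!-- share of the reducer's heap used to buffer fetched map outputs -->
    <value>0.70</value>
  </property>
  <property>
    <name>mapred.inmem.merge.threshold</name>
    <!-- number of in-memory map output segments accumulated before an on-disk merge -->
    <value>1000</value>
  </property>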
Yes, it is a problem at the first stage. What I'm wondering, though, is
whether the intermediate results - which happen after the mapper phase - can
be optimized.
On Tue, Apr 30, 2013 at 3:38 PM, Mohammad Tariq wrote:
> Hmmm. I was actually thinking about the very first step. How are you going
>
Hmmm. I was actually thinking about the very first step. How are you going
to create the maps? Suppose you are on a block-less filesystem and you have
a custom Format that is going to give you the splits dynamically. This means
that you are going to store the file as a whole and create the splits as
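A rough sketch of such a Format, purely illustrative (the class name and split size are made up), that carves a file stored as a whole into fixed-size logical splits so the map count no longer depends on block size:

  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.mapreduce.JobContext;
  import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

  public class FixedSizeInputFormat extends TextInputFormat {
    private static final long SPLIT_BYTES = 64L * 1024 * 1024; // hypothetical 64 MB splits

    @Override
    protected long computeSplitSize(long blockSize, long minSize, long maxSize) {
      // ignore the underlying block size entirely
      return SPLIT_BYTES;
    }

    @Override
    protected boolean isSplitable(JobContext context, Path file) {
      // assumes the record reader can re-synchronize at arbitrary offsets
      return true;
    }
  }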
Tariq,
Thank you. I tried this and the summary of the MapReduce job looks like:
13/04/30 14:02:35 INFO mapred.JobClient: Job complete: job_201304301251_0004
13/04/30 14:02:35 INFO mapred.JobClient: Counters: 7
13/04/30 14:02:35 INFO mapred.JobClient: Job Counters
13/04/30 14:02:35 INF
Well, to be more clear, I'm wondering how hadoop-mapreduce can be optimized
in a block-less filesystem... And I am thinking about application-tier ways
to simulate blocks - i.e. by making the granularity of partitions smaller.
Wondering if there is a way to hack an increased number of partitions a
Hello Jay,
What are you going to do in your custom InputFormat and partitioner? Is
your InputFormat going to create larger splits which will overlap with
larger blocks? If that is the case, IMHO, you are going to reduce the
no. of mappers, thus reducing the parallelism. Also, much larger
Hi guys:
I'm wondering - if I'm running mapreduce jobs on a cluster with large block
sizes - can I increase performance with either:
1) A custom FileInputFormat
2) A custom partitioner
3) -DnumReducers
Clearly, (3) will be an issue due to the fact that it might overload tasks
and network traffi
We/I are/am making progress. Now I get the error:
13/04/30 12:59:40 WARN mapred.JobClient: Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the same.
13/04/30 12:59:40 INFO mapred.JobClient: Cleaning up the staging area
hdfs://devubuntu05:9000/data/had
Set "HADOOP_MAPRED_HOME" in your hadoop-env.sh file and re-run the job. See
if it helps.
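Something along these lines in hadoop-env.sh; the path is only an example, point it at wherever your MapReduce installation actually lives:

  export HADOOP_MAPRED_HOME=/usr/lib/hadoop-0.20-mapreduce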
Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
On Tue, Apr 30, 2013 at 10:10 PM, Kevin Burton wrote:
> To be clear when this code is run with ‘java –jar’ it runs without
> exception. Th
Hi Hadoop,
Sorry Kevin, I was away for a while. Are you good now?
Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
On Tue, Apr 30, 2013 at 9:50 PM, Arpit Gupta wrote:
> Kevin
>
> You will have to create a new account if you did not have one before.
>
> --
> Arpit
>
> On Apr 30, 2013, at 9
To be clear, when this code is run with 'java -jar' it runs without
exception. The exception occurs when I run it with 'hadoop jar'.
From: Kevin Burton [mailto:rkevinbur...@charter.net]
Sent: Tuesday, April 30, 2013 11:36 AM
To: user@hadoop.apache.org
Subject: Can't initialize cluster
I have a
I have a simple MapReduce job that I am trying to get to run on my cluster.
When I run it I get:
13/04/30 11:27:45 INFO mapreduce.Cluster: Failed to use
org.apache.hadoop.mapred.LocalClientProtocolProvider due to error: Invalid
"mapreduce.jobtracker.address" configuration value for LocalJobRunn
Kevin
You will have to create a new account if you did not have one before.
--
Arpit
On Apr 30, 2013, at 9:11 AM, Kevin Burton wrote:
I don’t see a “create issue” button or tab. If I need to log in, then I am
not sure what credentials I should use to log in, because everything I tried failed.
*From
I am not sure how to create a JIRA.
Again I am not sure I understand your workaround. You are suggesting that I
create /data/hadoop/tmp on HDFS like:
sudo -u hdfs hadoop fs -mkdir /data/hadoop/tmp
I don't think I can chmod -R 777 on /data since it is a disk and as I
indicated it is bein
I am not clear on what you are suggesting to create on HDFS or the local
file system. As I understand it hadoop.tmp.dir is the local file system. I
changed it so that the temporary files would be on a disk that has more
capacity than /tmp. So you are suggesting that I create /data/hadoop/tmp on
HDF
Ah, this is what mapred.system.dir defaults to:

  mapred.system.dir = ${hadoop.tmp.dir}/mapred/system
  The directory where MapReduce stores control files.

So that's why it's trying to write to /data/hadoop/tmp/hadoop-mapred/mapred/system.
So if you want hadoop.tmp.dir to be /data/hadoop/tmp/
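One way to decouple the two, sketched here with an example path only (the thread later suggests a /mapred/... location), is to pin mapred.system.dir explicitly in mapred-site.xml instead of letting it ride on hadoop.tmp.dir:

  <property>
    <name>mapred.system.dir</name>
    <!-- an HDFS directory the mapred user can write to; example value only -->
    <value>/mapred/system</value>
  </property>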
In core-site.xml I have:

  fs.default.name = hdfs://devubuntu05:9000
  The name of the default file system. A URI whose scheme and
  authority determine the FileSystem implementation.

In hdfs-site.xml I have:

  hadoop.tmp.dir = /data/hadoop/tmp/hadoop-${user.name}
  Hadoop tempo
Hi everyone,
We have implemented a cluster with Apache Hadoop 1.0.4.
Due to some corporate requirements, we need to change the version to Cloudera
CDH 4.2.0.
1. How do we uninstall the current Hadoop without leaving any unnecessary files
behind?
2. Is there any preparation needed for this change?
BRs
Geelong
--
From Go
Based on the logs, your system dir is set to
> hdfs://devubuntu05:9000/data/hadoop/tmp/hadoop-mapred/mapred/system
What are your fs.default.name and hadoop.tmp.dir in core-site.xml set to?
--
Arpit Gupta
Hortonworks Inc.
http://hortonworks.com/
On Apr 30, 2013, at 7:39 AM, "Kevin Burton" wrot
Thank you.
mapred.system.dir is not set. I am guessing that it is whatever the default
is. What should I set it to?
/tmp is already 777
kevin@devUbuntu05:~$ hadoop fs -ls /tmp
Found 1 items
drwxr-xr-x - hdfs supergroup 0 2013-04-29 15:45 /tmp/mapred
kevin@devUbuntu05:~$
What is your mapred.system.dir set to in mapred-site.xml?
By default it will write to /tmp on HDFS.
So you can do the following:
create /tmp on HDFS and chmod it to 777 as user hdfs, and then restart the
jobtracker and tasktrackers.
In case it's set to /mapred/something, then create /mapred and chown
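A sketch of those steps with the stock CLI (paths follow the two cases described above; adjust to whatever your mapred.system.dir really is):

  sudo -u hdfs hadoop fs -mkdir /tmp
  sudo -u hdfs hadoop fs -chmod -R 777 /tmp
  # or, if mapred.system.dir points under /mapred instead:
  sudo -u hdfs hadoop fs -mkdir /mapred
  sudo -u hdfs hadoop fs -chown -R mapred /mapred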
To further complicate the issue, the log file in
(/var/log/hadoop-0.20-mapreduce/hadoop-hadoop-jobtracker-devUbuntu05.log) is
owned by mapred:mapred and the name of the file seems to indicate some other
lineage (hadoop,hadoop). I am out of my league in understanding the
permission structure for hado
That is what I perceive as the problem. The hdfs file system was created
with the user 'hdfs' owning the root ('/'), but for some reason with an M/R
job the user 'mapred' needs to have write permission to the root. I don't
know how to satisfy both conditions. That is one reason that I relaxed the
per
Hi,
When dealing with Avro data files in MR jobs, we use AvroMapper. I noticed
that the output K and V of AvroMapper aren't Writable, and neither is the key
comparable (these are AvroKey and AvroValue). As the general
serialization mechanism is Writable, how are the K,V pairs, in the case of Avro,
tr
I have relaxed it even further so now it is 775
kevin@devUbuntu05:/var/log/hadoop-0.20-mapreduce$ hadoop fs -ls -d /
Found 1 items
drwxrwxr-x - hdfs supergroup 0 2013-04-29 15:43 /
But I still get this error:
2013-04-30 07:43:02,520 FATAL org.apache.hadoop.mapred.JobTracker
I don't think you can control how many reducers run in parallel via the
framework.
Another way to do this is to increase the memory given to each individual
reducer, so that the tasktracker is limited by memory and cannot launch as
many reducers at the same time; the rest will queue up.
You can try setting up this ma
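Possibly what was being hinted at, with the property name quoted from memory (verify it against your Hadoop version): bump the per-reduce JVM heap in the job conf so fewer reducers fit on a TaskTracker at once, e.g.

  conf.set("mapred.reduce.child.java.opts", "-Xmx2048m");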
Yes. In the conf file of my cluster, mapred.tasktracker.reduce.tasks.maximum
is 8.
And for this job, I want it to be 4.
I set it through the conf and build the job with this conf, then submit it. But
Hadoop launches 8 reduces per datanode...
2013/4/30 Nitin Pawar
> so basically if I understand corre
So basically, if I understand correctly,
you want to limit the number of reducers executing in parallel only for this job?
On Tue, Apr 30, 2013 at 4:02 PM, Han JU wrote:
> Thanks.
>
> In fact I don't want to set reducer or mapper numbers, they are fine.
> I want to set the reduce slot capacity of my cl
Hi Pralabh,
1. The Map input bytes counter belongs to the MapReduce framework. The
Hadoop Definitive Guide explains that:
The number of bytes of uncompressed input consumed by all the maps in the
job. Incremented every time a record is read from a RecordReader and passed
to the map's map() me
Thanks.
In fact I don't want to set reducer or mapper numbers, they are fine.
I want to set the reduce slot capacity of my cluster when it executes my
specific job. Say I have 100 reduce tasks for this job; I want my cluster
to execute 4 of them at the same time, not 8 of them at the same time, on
Forgot to add, there is a similar method for the reducer as well:
job.setNumReduceTasks(0);
On Tue, Apr 30, 2013 at 3:56 PM, Nitin Pawar wrote:
> The *mapred*.*tasktracker*.*reduce*.*tasks*.*maximum* parameter sets the
> maximum number of reduce tasks that may be run by an individual TaskTracker
> se
The mapred.tasktracker.reduce.tasks.maximum parameter sets the
maximum number of reduce tasks that may be run by an individual TaskTracker
server at one time. This is not a per-job configuration.
The number of map tasks for a given job is driven by the number of input
splits and not by the
Thanks Nitin.
What I need is to set the slots only for a specific job, not for the whole
cluster conf.
But what I did does NOT work ... Have I done something wrong?
2013/4/30 Nitin Pawar
> The config you are setting is for job only
>
> But if you want to reduce the slota on tasktrackers then you wi
The config you are setting is for the job only.
But if you want to reduce the slots on the tasktrackers, then you will need to
edit the tasktracker conf and restart the tasktrackers.
On Apr 30, 2013 3:30 PM, "Han JU" wrote:
> Hi,
>
> I want to change the cluster's capacity of reduce slots on a per job
> basis. Orig
Hi,
I want to change the cluster's capacity of reduce slots on a per-job basis.
Originally I have 8 reduce slots for a tasktracker.
I did:
conf.set("mapred.tasktracker.reduce.tasks.maximum", "4");
...
Job job = new Job(conf, ...)
And in the web UI I can see that for this job, the max reduce tas