Re: Realtime sensor's tcpip data to hadoop

2014-05-13 Thread Azuryy Yu
Hi Alex,

You can try Apache Flume.
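A minimal flume.conf sketch of what that could look like (the agent name,
port and paths below are assumptions for illustration, not from this thread):
a TCP syslog source feeding a memory channel and an HDFS sink. Flume also
ships an HBase sink (org.apache.flume.sink.hbase.HBaseSink) if you want to
land the data in HBase directly.

# hypothetical agent "a1": syslog-over-TCP in, HDFS out
a1.sources  = r1
a1.channels = c1
a1.sinks    = k1

# source: listens for TCP data from the sensors
a1.sources.r1.type = syslogtcp
a1.sources.r1.host = 0.0.0.0
a1.sources.r1.port = 5140
a1.sources.r1.channels = c1

# channel: in-memory buffer between source and sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000

# sink: writes events into HDFS, rolling files every 5 minutes
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/sensors/%Y-%m-%d
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.rollInterval = 300
a1.sinks.k1.hdfs.useLocalTimeStamp = true

For the load-sharing question, you can run several such agents on different
hosts and have the sensors (or a load balancer) spread connections across them.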


On Wed, May 7, 2014 at 10:48 AM, Alex Lee eliy...@hotmail.com wrote:

 Sensors may send TCP/IP data to the server. Each sensor may send its data
 as a stream to the server, and both the number of sensors and the data rate
 are high.

 Firstly, how can the data arriving over TCP/IP be put into Hadoop? It needs
 some processing and then has to be stored in HBase. Does it have to be saved
 to data files first and then loaded into Hadoop, or can it be done in some
 direct way from TCP/IP? Is there any software module that can take care of
 this? Searching suggested that Ganglia, Nagios and Flume might do it, but on
 closer inspection Ganglia and Nagios are more for monitoring the Hadoop
 cluster itself, and Flume is aimed at log files.

 Secondly, if the total network traffic from the sensors exceeds the capacity
 of one LAN port, how can the load be shared? Is there any component in Hadoop
 that handles this automatically?

 Any suggestions, thanks.



Re: speed of replication for under replicated blocks by namenode

2014-05-13 Thread Ravi Prakash


Hi Chandra!

Replication is done according to priority (e.g. a block with only 1 of its 3
replicas remaining is higher priority than one with 2 of 3 remaining).
Every time a DN heartbeats into the NN, it *may* be assigned some
replication work according to some criteria; see
dfs.namenode.replication.work.multiplier.per.iteration.
The list of blocks that need to be replicated is recalculated once every
dfs.namenode.replication.interval.
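
For illustration, a hedged hdfs-site.xml sketch of those two knobs (the values
shown are just examples, and the defaults I quote are from memory, so check
your version's hdfs-default.xml):

<!-- roughly, how much replication work the NN hands out per iteration,
     scaled by the number of live DNs; default is 2 -->
<property>
  <name>dfs.namenode.replication.work.multiplier.per.iteration</name>
  <value>4</value>
</property>

<!-- how often (seconds) the NN recomputes the under-replicated block list;
     default is 3 -->
<property>
  <name>dfs.namenode.replication.interval</name>
  <value>3</value>
</property>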

HTH
Ravi

On 05/11/14 07:20, chandra kant wrote:
 Hi,

 Some of my blocks are under-replicated. Can anybody tell me at what speed
 the namenode re-replicates them so that the dfs.replication factor is
 satisfied and the blocks become healthy again? Or does it depend solely on
 the load on the namenode, so that it can't be measured accurately?

 thanx
 Chandra





Re: Conversion from MongoDB to hadoop

2014-05-13 Thread Raj K Singh
Are you using MongoDB's built-in mapReduce/aggregation feature, or a
scripting language (e.g. Python) to emit the key/value pairs?


Raj K Singh
http://in.linkedin.com/in/rajkrrsingh
http://www.rajkrrsingh.blogspot.com
Mobile  Tel: +91 (0)9899821370


On Mon, May 12, 2014 at 11:18 AM, Ranjini Rathinam
ranjinibe...@gmail.comwrote:

 Hi,

 How can I convert a MapReduce job written for MongoDB into a Hadoop
 MapReduce job?

 Please suggest .

 Thanks in advance.

 Ranjini.



Re: Data node with multiple disks

2014-05-13 Thread kishore alajangi
replication factor=1


On Tue, May 13, 2014 at 11:04 AM, SF Hadoop sfhad...@gmail.com wrote:

 Your question is unclear. Please restate and describe what you are
 attempting to do.

 Thanks.


 On Monday, May 12, 2014, Marcos Sousa falecom...@marcossousa.com wrote:

 Hi,

 I have 20 servers, each with 10 × 400GB SATA disks. I'd like to use them as
 my datanodes:

 /vol1/hadoop/data
 /vol2/hadoop/data
 /vol3/hadoop/data
 /volN/hadoop/data

 How do I use those distinct disks without replicating across them?

 Best regards,

 --
 Marcos Sousa




-- 
Thanks,
Kishore.


Re: Data node with multiple disks

2014-05-13 Thread Nitin Pawar
Hi Marcos,
If these disks are not shared across nodes, I would not worry: Hadoop already
makes sure a block's replicas are not all placed on a single node.

But if all 20 nodes are sharing those 10 HDDs, then you may have to assign
specific disks to specific nodes and make your cluster rack-aware, so that the
replica within the same rack goes to a different node and the replica in the
second rack goes to a different disk.
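
For the rack-aware part, a hedged sketch (the script path and the
subnet-to-rack mapping below are made up for illustration): point
net.topology.script.file.name in core-site.xml (topology.script.file.name on
older releases) at a script that prints one rack id per host argument.

<!-- core-site.xml -->
<property>
  <name>net.topology.script.file.name</name>
  <value>/etc/hadoop/conf/rack-topology.sh</value>
</property>

#!/bin/bash
# /etc/hadoop/conf/rack-topology.sh (hypothetical): Hadoop calls this with one
# or more hostnames/IPs and expects one rack path per argument on stdout.
for host in "$@"; do
  case "$host" in
    10.0.1.*) echo "/rack1" ;;
    10.0.2.*) echo "/rack2" ;;
    *)        echo "/default-rack" ;;
  esac
done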





On Tue, May 13, 2014 at 1:38 PM, kishore alajangi alajangikish...@gmail.com
 wrote:

 replication factor=1


 On Tue, May 13, 2014 at 11:04 AM, SF Hadoop sfhad...@gmail.com wrote:

 Your question is unclear. Please restate and describe what you are
 attempting to do.

 Thanks.


 On Monday, May 12, 2014, Marcos Sousa falecom...@marcossousa.com wrote:

 Hi,

 I have 20 servers, each with 10 × 400GB SATA disks. I'd like to use them as
 my datanodes:

 /vol1/hadoop/data
 /vol2/hadoop/data
 /vol3/hadoop/data
 /volN/hadoop/data

 How do I use those distinct disks without replicating across them?

 Best regards,

 --
 Marcos Sousa




 --
 Thanks,
 Kishore.




-- 
Nitin Pawar


Re: LVM to JBOD conversion without data loss

2014-05-13 Thread Akira AJISAKA

Hi Bharath,

The steps don't look correct to me. Data loss can happen if you reduce the
replication factor and remove a DataNode at the same time. I would instead:


1) decommission a DataNode (or a few DataNodes)

2) change the configuration of the DataNode(s)

3) add the DataNode(s) back to the cluster

Repeat 1) - 3) for all the DataNodes (a sketch of the decommissioning step is
below).
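
The decommissioning step itself, as a hedged sketch (the file locations are
assumptions; adjust them to your layout): list the node in an exclude file
referenced by dfs.hosts.exclude and tell the NameNode to re-read it.

<!-- hdfs-site.xml on the NameNode -->
<property>
  <name>dfs.hosts.exclude</name>
  <value>/etc/hadoop/conf/dfs.exclude</value>
</property>

# add the DataNode's hostname to /etc/hadoop/conf/dfs.exclude, then:
hdfs dfsadmin -refreshNodes
# wait until the node shows as "Decommissioned" (NN web UI or dfsadmin -report)
# before stopping it and changing its dfs.datanode.data.dir layout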

Regards,
Akira

(2014/05/12 16:18), Bharath Kumar wrote:

Hi, I have a query regarding JBOD.

Is it possible to migrate from LVM to JBOD without losing data? Is there
any reference documentation?

The steps I can think of are:

1) reduce the replication factor

2) change the hdfs-site.xml on the namenode

3) add the data node back with the new JBOD directory structure for Hadoop

4) rebalance the cluster

Repeat 1-4 for all data nodes. Are these steps correct?

--
Warm Regards,
Bharath Kumar





Re: LVM to JBOD conversion without data loss

2014-05-13 Thread Ravi Prakash


One way I can think of is decommissioning the nodes and then basically
re-imaging them however you want. Is that not an option?

On 05/12/14 00:18, Bharath Kumar wrote:
 Hi, I have a query regarding JBOD.

 Is it possible to migrate from LVM to JBOD without losing data? Is there
 any reference documentation?

 The steps I can think of are:

 1) reduce the replication factor

 2) change the hdfs-site.xml on the namenode

 3) add the data node back with the new JBOD directory structure for Hadoop

 4) rebalance the cluster

 Repeat 1-4 for all data nodes. Are these steps correct?





all tasks failing for MR job on Hadoop 2.4

2014-05-13 Thread Gäde, Sebastian
Hi,

I've set up a Hadoop 2.4 cluster with three nodes. Namenode and Resourcemanager 
are running on one node, Datanodes and Nodemanagers on the other two. All 
services are starting up without problems (as far as I can see), web apps show 
all nodes as running.

However, I am not able to run MapReduce jobs:
yarn jar hadoop-mapreduce-examples-2.4.0.jar pi 5 100
submits the job and it appears in the web app, but the state is stuck in
ACCEPTED; instead I'm receiving these messages:

14/05/13 12:15:48 INFO mapreduce.Job: Task Id : 
attempt_1399971492349_0004_m_00_0, Status : FAILED
14/05/13 12:15:48 INFO mapreduce.Job: Task Id : 
attempt_1399971492349_0004_m_01_0, Status : FAILED


the log shows:

2014-05-13 12:13:56,702 WARN [main] org.apache.hadoop.conf.Configuration: 
job.xml:an attempt to override final parameter: mapreduce.cluster.temp.dir;  
Ignoring.
2014-05-13 12:15:27,896 INFO [main] 
org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from 
hadoop-metrics2.properties
2014-05-13 12:15:28,146 INFO [main] 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 
10 second(s).
2014-05-13 12:15:28,146 INFO [main] 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system 
started
2014-05-13 12:15:28,185 INFO [main] org.apache.hadoop.mapred.YarnChild: 
Executing with tokens:
2014-05-13 12:15:28,192 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: 
mapreduce.job, Service: job_1399971492349_0004, Ident: 
(org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@364879)
2014-05-13 12:15:28,453 INFO [main] org.apache.hadoop.mapred.YarnChild: 
Sleeping for 0ms before retrying again. Got null now.
2014-05-13 12:15:28,662 WARN [main] org.apache.hadoop.ipc.Client: Address 
change detected. Old: localhost/127.0.1.1:41395 New: localhost/127.0.0.1:41395
2014-05-13 12:15:29,664 INFO [main] org.apache.hadoop.ipc.Client: Retrying 
connect to server: localhost/127.0.0.1:41395. Already tried 0 time(s); retry 
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
MILLISECONDS)
2014-05-13 12:15:30,665 INFO [main] org.apache.hadoop.ipc.Client: Retrying 
connect to server: localhost/127.0.0.1:41395. Already tried 1 time(s); retry 
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
MILLISECONDS)
2014-05-13 12:15:31,666 INFO [main] org.apache.hadoop.ipc.Client: Retrying 
connect to server: localhost/127.0.0.1:41395. Already tried 2 time(s); retry 
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
MILLISECONDS)
2014-05-13 12:15:32,667 INFO [main] org.apache.hadoop.ipc.Client: Retrying 
connect to server: localhost/127.0.0.1:41395. Already tried 3 time(s); retry 
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
MILLISECONDS)
2014-05-13 12:15:33,668 INFO [main] org.apache.hadoop.ipc.Client: Retrying 
connect to server: localhost/127.0.0.1:41395. Already tried 4 time(s); retry 
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
MILLISECONDS)
2014-05-13 12:15:34,669 INFO [main] org.apache.hadoop.ipc.Client: Retrying 
connect to server: localhost/127.0.0.1:41395. Already tried 5 time(s); retry 
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
MILLISECONDS)
2014-05-13 12:15:35,669 INFO [main] org.apache.hadoop.ipc.Client: Retrying 
connect to server: localhost/127.0.0.1:41395. Already tried 6 time(s); retry 
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
MILLISECONDS)
2014-05-13 12:15:36,670 INFO [main] org.apache.hadoop.ipc.Client: Retrying 
connect to server: localhost/127.0.0.1:41395. Already tried 7 time(s); retry 
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
MILLISECONDS)
2014-05-13 12:15:37,671 INFO [main] org.apache.hadoop.ipc.Client: Retrying 
connect to server: localhost/127.0.0.1:41395. Already tried 8 time(s); retry 
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
MILLISECONDS)
2014-05-13 12:15:38,672 INFO [main] org.apache.hadoop.ipc.Client: Retrying 
connect to server: localhost/127.0.0.1:41395. Already tried 9 time(s); retry 
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
MILLISECONDS)
2014-05-13 12:15:38,675 WARN [main] org.apache.hadoop.mapred.YarnChild: 
Exception running child : java.net.ConnectException: Call From 
hd-slave-172.ffm.telekom.de/164.26.155.172 to localhost:41395 failed on 
connection exception: java.net.ConnectException: Verbindungsaufbau abgelehnt (Connection refused); 
For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at 

enable regular expression on which parameter?

2014-05-13 Thread Avinash Kujur
MAPREDUCE-5851
I can see many parameters in the DistCp class. For which parameter do we
need to enable regular expressions?

private static final String usage = NAME
  + " [OPTIONS] <srcurl>* <desturl>" +
  "\n\nOPTIONS:" +
  "\n-p[rbugp]              Preserve status" +
  "\n                       r: replication number" +
  "\n                       b: block size" +
  "\n                       u: user" +
  "\n                       g: group" +
  "\n                       p: permission" +
  "\n                       -p alone is equivalent to -prbugp" +
  "\n-i                     Ignore failures" +
  "\n-log <logdir>          Write logs to <logdir>" +
  "\n-m <num_maps>          Maximum number of simultaneous copies" +
  "\n-overwrite             Overwrite destination" +
  "\n-update                Overwrite if src size different from dst size" +
  "\n-f <urilist_uri>       Use list at <urilist_uri> as src list" +
  "\n-filelimit <n>         Limit the total number of files to be <= n" +
  "\n-sizelimit <n>         Limit the total size to be <= n bytes" +
  "\n-delete                Delete the files existing in the dst but not in src" +
  "\n-mapredSslConf <f>     Filename of SSL configuration for mapper task" +

  "\n\nNOTE 1: if -overwrite or -update are set, each source URI is " +
  "\n        interpreted as an isomorphic update to an existing directory." +
  "\nFor example:" +
  "\nhadoop " + NAME + " -p -update \"hdfs://A:8020/user/foo/bar\" " +
  "\"hdfs://B:8020/user/foo/baz\"\n" +
  "\n     would update all descendants of 'baz' also in 'bar'; it would " +
  "\n     *not* update /user/foo/baz/bar" +

  "\n\nNOTE 2: The parameter <n> in -filelimit and -sizelimit can be " +
  "\n        specified with symbolic representation.  For examples," +
  "\n          1230k = 1230 * 1024 = 1259520" +
  "\n          891g = 891 * 1024^3 = 956703965184" +

  "\n";


Re: Data node with multiple disks

2014-05-13 Thread SF Hadoop
Your question is unclear. Please restate and describe what you are
attempting to do.

Thanks.


On Monday, May 12, 2014, Marcos Sousa falecom...@marcossousa.com wrote:

 Hi,

 I have 20 servers, each with 10 × 400GB SATA disks. I'd like to use them as
 my datanodes:

 /vol1/hadoop/data
 /vol2/hadoop/data
 /vol3/hadoop/data
 /volN/hadoop/data

 How do I use those distinct disks without replicating across them?

 Best regards,

 --
 Marcos Sousa



Using Lookup file in mapreduce

2014-05-13 Thread Siddharth Tiwari
Hi team

I have a huge lookup file, around 5 GB, and I need to use it to map users to
categories in my MapReduce job. Can you suggest the best way to achieve this?

Sent from my iPhone

Re: Data node with multiple disks

2014-05-13 Thread Aitor Perez Cedres


If you specify a list of directories in the property dfs.datanode.data.dir,
Hadoop will distribute the data blocks among all of those disks; it will not
replicate data between them. If you want to use the disks as a single volume,
you have to build an LVM array (or use some other solution) to present them
as a single device to the OS.

However, benchmarks show that specifying a list of disks and letting Hadoop
distribute data among them gives better performance.
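
To make that concrete, a hedged hdfs-site.xml sketch using the directories
from the original post (Hadoop round-robins new blocks across the
comma-separated list; nothing here causes replication between the disks):

<property>
  <name>dfs.datanode.data.dir</name>
  <value>/vol1/hadoop/data,/vol2/hadoop/data,/vol3/hadoop/data</value>
  <!-- one comma-separated list per datanode; add /volN/hadoop/data as needed -->
</property>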


On 13/05/14 17:12, Marcos Sousa wrote:

Yes,

I don't want to replicate; I just want to use them as one disk. Isn't it
possible to make this work?


Best regards,

Marcos


On Tue, May 13, 2014 at 6:55 AM, Rahul Chaudhari
rahulchaudhari0...@gmail.com wrote:


Marcos,
While configuring Hadoop, the dfs.datanode.data.dir property
in hdfs-default.xml should have this list of disks specified on
separate lines. If you specify a comma-separated list, it will
replicate on all those disks/partitions.

_Rahul
Sent from my iPad

 On 13-May-2014, at 12:22 am, Marcos Sousa falecom...@marcossousa.com wrote:

 Hi,

 I have 20 servers, each with 10 × 400GB SATA disks. I'd like to use
them as my datanodes:

 /vol1/hadoop/data
 /vol2/hadoop/data
 /vol3/hadoop/data
 /volN/hadoop/data

 How do I use those distinct disks without replicating across them?

 Best regards,

 --
 Marcos Sousa




--
Marcos Sousa
www.marcossousa.com Enjoy it!


--
Aitor Pérez
Big Data System Engineer

Telf.: +34 917 680 490
Fax: +34 913 833 301
C/Manuel Tovar, 49-53 - 28034 Madrid - Spain

http://www.bidoop.es



Questions about Hadoop logs and mapred.local.dir

2014-05-13 Thread sam liu
Hi Experts,

1. The size of mapred.local.dir is big (30 GB); what are the correct ways to
clean it up?
2. For the logs of the NameNode/DataNode/JobTracker/TaskTracker, are they all
rolling logs? What is their maximum size? I cannot find the specific settings
for them in log4j.properties.
3. The sizes of dfs.name.dir and dfs.data.dir are very big now; are there any
files under them that can safely be removed, or should nothing under those
two directories be removed at all?

Thanks!


Re: No job can run in YARN (Hadoop-2.2)

2014-05-13 Thread Tao Xiao
The FileNotFoundException was thrown when I tried to submit the job that
calculates PI; there is actually no such exception when I submit a wordcount
job, but I can still see "Exception from container-launch..." there, and any
other job throws such exceptions as well.

Every job runs successfully when I comment out the properties
mapreduce.map.java.opts and mapreduce.reduce.java.opts.

It does sound odd, but I think it may be because these two properties
conflict with other memory-related properties, so the container cannot be
launched.
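
If that is indeed the cause, the usual fix is to keep the JVM heap inside the
container size rather than dropping the properties. A hedged mapred-site.xml
sketch (the sizes are example values, not taken from this thread; keep each
-Xmx comfortably below the matching *.memory.mb, and both below
yarn.scheduler.maximum-allocation-mb):

<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1024</value>   <!-- YARN container size for map tasks -->
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx820m</value>   <!-- heap roughly 80% of the container -->
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>2048</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx1640m</value>
</property>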


2014-05-12 3:37 GMT+08:00 Jay Vyas jayunit...@gmail.com:

 Sounds odd... So (1) you got a FileNotFoundException and (2) you fixed it
 by commenting out memory-specific config parameters?

 Not sure how that would work... Any other details or am I missing
 something else?

 On May 11, 2014, at 4:16 AM, Tao Xiao xiaotao.cs@gmail.com wrote:

 I'm sure this problem is caused by an incorrect configuration. I
 commented out all the memory-related configuration, and then jobs run
 successfully.


 2014-05-11 0:01 GMT+08:00 Tao Xiao xiaotao.cs@gmail.com:

 I installed Hadoop-2.2 in a cluster of 4 nodes, following Hadoop YARN
 Installation: The definitive guide
 (http://www.alexjf.net/blog/distributed-systems/hadoop-yarn-installation-definitive-guide).

 The configurations are as follows:

 ~/.bashrc: http://pastebin.com/zQgwuQv2
 core-site.xml: http://pastebin.com/rBAaqZps
 hdfs-site.xml: http://pastebin.com/bxazvp2G
 mapred-site.xml: http://pastebin.com/N00SsMbz
 slaves: http://pastebin.com/8VjsZ1uu
 yarn-site.xml: http://pastebin.com/XwLQZTQb


 I started the NameNode, DataNodes, ResourceManager and NodeManagers
 successfully, but no job can run successfully. For example, I run the
 following job:

 [root@Single-Hadoop ~]#yarn jar
 /var/soft/apache/hadoop-2.2.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar
 pi 2 4

 The output is as follows:

 14/05/10 23:56:25 INFO mapreduce.Job: Task Id :
 attempt_1399733823963_0004_m_00_0, Status : FAILED
 Exception from container-launch:
 org.apache.hadoop.util.Shell$ExitCodeException:
 at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
 at org.apache.hadoop.util.Shell.run(Shell.java:379)
  at
 org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
 at
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
  at
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
 at
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
  at
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
  at java.lang.Thread.run(Thread.java:662)



 14/05/10 23:56:25 INFO mapreduce.Job: Task Id :
 attempt_1399733823963_0004_m_01_0, Status : FAILED
 Exception from container-launch:
 org.apache.hadoop.util.Shell$ExitCodeException:
 at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
  at org.apache.hadoop.util.Shell.run(Shell.java:379)
 at
 org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
  at
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
 at
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
  at
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
  at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
 at java.lang.Thread.run(Thread.java:662)

 ... ...


 14/05/10 23:56:36 INFO mapreduce.Job:  map 100% reduce 100%
 14/05/10 23:56:37 INFO mapreduce.Job: Job job_1399733823963_0004 failed
 with state FAILED due to: Task failed task_1399733823963_0004_m_00
 Job failed as tasks failed. failedMaps:1 failedReduces:0

 14/05/10 23:56:37 INFO mapreduce.Job: Counters: 10
 Job Counters
 Failed map tasks=7
  Killed map tasks=1
 Launched map tasks=8
 Other local map tasks=6
  Data-local map tasks=2
 Total time spent by all maps in occupied slots (ms)=21602
 Total time spent by all reduces in occupied slots (ms)=0
  Map-Reduce Framework
 CPU time spent (ms)=0
 Physical memory (bytes) snapshot=0
  Virtual memory (bytes) snapshot=0
 Job Finished in 24.515 seconds
 java.io.FileNotFoundException: File does not exist: hdfs://
 

Re: Data node with multiple disks

2014-05-13 Thread Marcos Sousa
Yes,

I don't want to replicate; I just want to use them as one disk. Isn't it
possible to make this work?

Best regards,

Marcos


On Tue, May 13, 2014 at 6:55 AM, Rahul Chaudhari 
rahulchaudhari0...@gmail.com wrote:

 Marcos,
 While configuring Hadoop, the dfs.datanode.data.dir property in
 hdfs-default.xml should have this list of disks specified on separate lines.
 If you specify a comma-separated list, it will replicate on all those
 disks/partitions.

 _Rahul
 Sent from my iPad

  On 13-May-2014, at 12:22 am, Marcos Sousa falecom...@marcossousa.com
 wrote:
 
  Hi,
 
  I have 20 servers, each with 10 × 400GB SATA disks. I'd like to use them as
 my datanodes:
 
  /vol1/hadoop/data
  /vol2/hadoop/data
  /vol3/hadoop/data
  /volN/hadoop/data
 
  How do I use those distinct disks without replicating across them?
 
  Best regards,
 
  --
  Marcos Sousa




-- 
Marcos Sousa
www.marcossousa.com Enjoy it!


Re: enable regular expression on which parameter?

2014-05-13 Thread Ravi Prakash


Avinash!

That JIRA is still open and does not seem to have been fixed. There are
a lot of issues with providing regexes, though. A long-standing issue has
been https://issues.apache.org/jira/browse/HDFS-13, which makes it even
harder.

HTH
Ravi

On 05/13/14 02:33, Avinash Kujur wrote:
 MAPREDUCE-5851
 I can see many parameters in the DistCp class. For which parameter do we
 need to enable regular expressions?

 private static final String usage = NAME
   + " [OPTIONS] <srcurl>* <desturl>" +
   "\n\nOPTIONS:" +
   "\n-p[rbugp]              Preserve status" +
   "\n                       r: replication number" +
   "\n                       b: block size" +
   "\n                       u: user" +
   "\n                       g: group" +
   "\n                       p: permission" +
   "\n                       -p alone is equivalent to -prbugp" +
   "\n-i                     Ignore failures" +
   "\n-log <logdir>          Write logs to <logdir>" +
   "\n-m <num_maps>          Maximum number of simultaneous copies" +
   "\n-overwrite             Overwrite destination" +
   "\n-update                Overwrite if src size different from dst size" +
   "\n-f <urilist_uri>       Use list at <urilist_uri> as src list" +
   "\n-filelimit <n>         Limit the total number of files to be <= n" +
   "\n-sizelimit <n>         Limit the total size to be <= n bytes" +
   "\n-delete                Delete the files existing in the dst but not in src" +
   "\n-mapredSslConf <f>     Filename of SSL configuration for mapper task" +

   "\n\nNOTE 1: if -overwrite or -update are set, each source URI is " +
   "\n        interpreted as an isomorphic update to an existing directory." +
   "\nFor example:" +
   "\nhadoop " + NAME + " -p -update \"hdfs://A:8020/user/foo/bar\" " +
   "\"hdfs://B:8020/user/foo/baz\"\n" +
   "\n     would update all descendants of 'baz' also in 'bar'; it would " +
   "\n     *not* update /user/foo/baz/bar" +

   "\n\nNOTE 2: The parameter <n> in -filelimit and -sizelimit can be " +
   "\n        specified with symbolic representation.  For examples," +
   "\n          1230k = 1230 * 1024 = 1259520" +
   "\n          891g = 891 * 1024^3 = 956703965184" +

   "\n";

