Re: Realtime sensor's tcpip data to hadoop
Hi Alex, you can try Apache Flume.

On Wed, May 7, 2014 at 10:48 AM, Alex Lee eliy...@hotmail.com wrote:

Sensors may send TCP/IP data to the server. Each sensor may send its data as a stream, and both the number of sensors and the data rate are high.

Firstly, how can the data arriving over TCP/IP be put into Hadoop? It needs some processing and then has to be stored in HBase. Does it have to be saved to data files first and then loaded into Hadoop, or can it be done in some more direct way from TCP/IP? Is there any software module that can take care of this? I found that Ganglia, Nagios and Flume might do it, but when looking into the details, Ganglia and Nagios are more for monitoring the Hadoop cluster itself, and Flume is for log files.

Secondly, if the total network traffic from the sensors exceeds the limit of one LAN port, how can the load be shared? Is there any component in Hadoop that does this automatically?

Any suggestions, thanks.
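For what it's worth, a sensor gateway does not have to write files first: it can push each reading straight to a Flume agent whose sink writes to HDFS or HBase. Below is a minimal, untested sketch using Flume's RPC client API; the host name flume-host, the port 41414 and the CSV payload are made-up placeholders for wherever your agent's Avro source actually listens and whatever your sensors actually send.

    import java.nio.charset.StandardCharsets;
    import org.apache.flume.Event;
    import org.apache.flume.EventDeliveryException;
    import org.apache.flume.api.RpcClient;
    import org.apache.flume.api.RpcClientFactory;
    import org.apache.flume.event.EventBuilder;

    public class SensorToFlume {
        public static void main(String[] args) throws EventDeliveryException {
            // Placeholder address of a Flume agent running an Avro source.
            RpcClient client = RpcClientFactory.getDefaultInstance("flume-host", 41414);
            try {
                // In a real gateway this payload would come off the sensor's TCP stream.
                Event e = EventBuilder.withBody("sensor-42,2014-05-07T10:48:00,23.7",
                                                StandardCharsets.UTF_8);
                client.append(e);   // the agent's HDFS or HBase sink takes it from here
            } finally {
                client.close();
            }
        }
    }

Several such gateways behind a load balancer, or several Flume agents feeding a collector tier, is the usual answer to the single-LAN-port concern.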
Re: speed of replication for under replicated blocks by namenode
Hi Chandra!

Replication is done according to priority (e.g. a block with only 1 of 3 replicas remaining has higher priority than one with 2 of 3 remaining). Every time a DataNode heartbeats into the NameNode, it *may* be assigned some replication work according to some criteria; see dfs.namenode.replication.work.multiplier.per.iteration. The list of blocks that need to be replicated is recalculated once every dfs.namenode.replication.interval.

HTH
Ravi

On 05/11/14 07:20, chandra kant wrote:

Hi, some of my blocks are under-replicated. Can anybody tell me at what speed the NameNode replicates them so that dfs.replication is satisfied and the blocks become healthy again? Or does it depend solely on the load on the NameNode, so that it can't be measured accurately?

Thanks,
Chandra
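As an illustration of Ravi's pointers, here is an untested sketch that prints the current under-replicated block count and notes, in comments, the two knobs he mentions. The NameNode URI is a placeholder, and it assumes (if I remember right) that DistributedFileSystem exposes getUnderReplicatedBlocksCount(), the same figure that hdfs dfsadmin -report shows.

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.hdfs.DistributedFileSystem;

    public class UnderReplicatedCheck {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // The two properties Ravi names live in hdfs-site.xml on the NameNode:
            //   dfs.namenode.replication.work.multiplier.per.iteration  (replication work handed out per heartbeat)
            //   dfs.namenode.replication.interval                       (seconds between recomputations of the queue)
            FileSystem fs = FileSystem.get(URI.create("hdfs://namenode-host:8020"), conf);  // placeholder NN
            if (fs instanceof DistributedFileSystem) {
                DistributedFileSystem dfs = (DistributedFileSystem) fs;
                System.out.println("Under-replicated blocks: " + dfs.getUnderReplicatedBlocksCount());
            }
            fs.close();
        }
    }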
Re: Conversion from MongoDB to hadoop
Are you using MongoDB's aggregation/mapReduce feature, or some scripting language (Python) to emit the key/value pairs?

Raj K Singh
http://in.linkedin.com/in/rajkrrsingh
http://www.rajkrrsingh.blogspot.com
Mobile Tel: +91 (0)9899821370

On Mon, May 12, 2014 at 11:18 AM, Ranjini Rathinam ranjinibe...@gmail.com wrote:

Hi, how can I convert a MapReduce job from MongoDB to a MapReduce job on Hadoop? Please suggest. Thanks in advance.
Ranjini.
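For orientation, a MongoDB map function that calls emit(key, value) and a reduce that folds the values corresponds fairly directly to a Hadoop Mapper/Reducer pair. The skeleton below is a hedged sketch, not Ranjini's actual job: the input format, field positions and counting logic are invented for illustration. It emits one (key, 1) pair per record and sums them in the reducer, the Hadoop equivalent of a simple group-and-count in MongoDB.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class GroupCount {
        public static class EmitMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            @Override
            protected void map(LongWritable offset, Text line, Context ctx)
                    throws IOException, InterruptedException {
                // Equivalent of MongoDB's emit(this.someField, 1); first CSV column assumed to be the key.
                String key = line.toString().split(",")[0];
                ctx.write(new Text(key), ONE);
            }
        }

        public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) sum += v.get();   // equivalent of MongoDB's reduce(key, values)
                ctx.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "mongo-style group count");
            job.setJarByClass(GroupCount.class);
            job.setMapperClass(EmitMapper.class);
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }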
Re: Data node with multiple disks
Set the replication factor to 1 (dfs.replication=1).

On Tue, May 13, 2014 at 11:04 AM, SF Hadoop sfhad...@gmail.com wrote:

Your question is unclear. Please restate and describe what you are attempting to do. Thanks.

On Monday, May 12, 2014, Marcos Sousa falecom...@marcossousa.com wrote:

Hi, I have 20 servers, each with 10 x 400 GB SATA disks, that I'd like to use as my DataNodes:

/vol1/hadoop/data
/vol2/hadoop/data
/vol3/hadoop/data
/volN/hadoop/data

How do I use those distinct disks without replicating the data? Best regards,

-- Marcos Sousa

-- Thanks, Kishore.
Re: Data node with multiple disks
Hi Marcos,

If these disks are not shared across nodes, I would not worry: Hadoop takes care of making sure that replicas of the same block are not placed on a single node. But if all 20 nodes are sharing these 10 HDDs, then you may have to assign specific disks to specific nodes and make your cluster rack-aware, so that the replica within the same rack goes to a different node and the replica on the second rack goes to yet another disk.

On Tue, May 13, 2014 at 1:38 PM, kishore alajangi alajangikish...@gmail.com wrote:

replication factor=1

On Tue, May 13, 2014 at 11:04 AM, SF Hadoop sfhad...@gmail.com wrote:

Your question is unclear. Please restate and describe what you are attempting to do. Thanks.

On Monday, May 12, 2014, Marcos Sousa falecom...@marcossousa.com wrote:

Hi, I have 20 servers, each with 10 x 400 GB SATA disks, that I'd like to use as my DataNodes:

/vol1/hadoop/data
/vol2/hadoop/data
/vol3/hadoop/data
/volN/hadoop/data

How do I use those distinct disks without replicating the data? Best regards,

-- Marcos Sousa

-- Thanks, Kishore.

-- Nitin Pawar
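If you want to see where the replicas of a given file actually landed (nodes and racks), something like the sketch below works; the NameNode URI and the file path are placeholders, and the rack column is only meaningful once a topology script is configured.

    import java.net.URI;
    import java.util.Arrays;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class WhereAreMyBlocks {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(URI.create("hdfs://namenode-host:8020"), conf); // placeholder NN
            FileStatus st = fs.getFileStatus(new Path("/some/file"));                      // placeholder path
            for (BlockLocation loc : fs.getFileBlockLocations(st, 0, st.getLen())) {
                System.out.println("offset " + loc.getOffset()
                        + "  nodes " + Arrays.toString(loc.getHosts())
                        + "  racks " + Arrays.toString(loc.getTopologyPaths()));
            }
            fs.close();
        }
    }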
Re: LVM to JBOD conversion without data loss
Hi Bharath,

Those steps do not look correct to me: data loss can happen if you reduce the replication factor and remove a DataNode at the same time. Instead:

1) decommission a DataNode (or a few DataNodes)
2) change the configuration of the DataNode(s)
3) add the DataNode(s) back to the cluster

and repeat 1) - 3) for all the DataNodes.

Regards,
Akira

(2014/05/12 16:18), Bharath Kumar wrote:

Hi, I have a query regarding JBOD: is it possible to migrate from LVM to JBOD without losing data? Is there any reference documentation? The steps I can think of are:

1) reduce the replication factor
2) change the hdfs-site.xml on the NameNode
3) add the DataNode back with the new JBOD directory structure for Hadoop
4) rebalance the cluster

and repeat 1 - 4 for all DataNodes. Are these steps correct?

-- Warm Regards,
Bharath Kumar
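For reference, decommissioning is driven from the NameNode: you add the host to the exclude file named by dfs.hosts.exclude and refresh the node list. A rough sketch is below; the exclude-file path and the DataNode host name are placeholders, this is normally done with the hdfs dfsadmin CLI rather than from Java, and you must wait for the node to show as Decommissioned before touching its disks.

    import java.io.FileWriter;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hdfs.tools.DFSAdmin;
    import org.apache.hadoop.util.ToolRunner;

    public class DecommissionOneNode {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Assumes dfs.hosts.exclude already points at this file in the NameNode's hdfs-site.xml.
            String excludeFile = conf.get("dfs.hosts.exclude", "/etc/hadoop/conf/dfs.exclude");
            try (FileWriter w = new FileWriter(excludeFile, true)) {
                w.write("datanode-07.example.com\n");   // placeholder: the node being converted to JBOD
            }
            // Same effect as running: hdfs dfsadmin -refreshNodes
            int rc = ToolRunner.run(new DFSAdmin(conf), new String[] {"-refreshNodes"});
            // Watch the NameNode UI until the node reports "Decommissioned", then re-image and add it back.
            System.exit(rc);
        }
    }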
Re: LVM to JBOD conversion without data loss
One way I can think of is decommissioning the nodes and then basically re-imaging them however you want to. Is that not an option?

On 05/12/14 00:18, Bharath Kumar wrote:

Hi, I have a query regarding JBOD: is it possible to migrate from LVM to JBOD without losing data? Is there any reference documentation? The steps I can think of are:

1) reduce the replication factor
2) change the hdfs-site.xml on the NameNode
3) add the DataNode back with the new JBOD directory structure for Hadoop
4) rebalance the cluster

and repeat 1 - 4 for all DataNodes. Are these steps correct?
all tasks failing for MR job on Hadoop 2.4
Hi,

I've set up a Hadoop 2.4 cluster with three nodes. NameNode and ResourceManager are running on one node, DataNodes and NodeManagers on the other two. All services are starting up without problems (as far as I can see), and the web UIs show all nodes as running. However, I am not able to run MapReduce jobs:

yarn jar hadoop-mapreduce-examples-2.4.0.jar pi 5 100

submits the job and it appears in the web UI, but its state is stuck in ACCEPTED. Instead I'm receiving messages:

14/05/13 12:15:48 INFO mapreduce.Job: Task Id : attempt_1399971492349_0004_m_00_0, Status : FAILED
14/05/13 12:15:48 INFO mapreduce.Job: Task Id : attempt_1399971492349_0004_m_01_0, Status : FAILED

The log shows:

2014-05-13 12:13:56,702 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.cluster.temp.dir; Ignoring.
2014-05-13 12:15:27,896 INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2014-05-13 12:15:28,146 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2014-05-13 12:15:28,146 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system started
2014-05-13 12:15:28,185 INFO [main] org.apache.hadoop.mapred.YarnChild: Executing with tokens:
2014-05-13 12:15:28,192 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: mapreduce.job, Service: job_1399971492349_0004, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@364879)
2014-05-13 12:15:28,453 INFO [main] org.apache.hadoop.mapred.YarnChild: Sleeping for 0ms before retrying again. Got null now.
2014-05-13 12:15:28,662 WARN [main] org.apache.hadoop.ipc.Client: Address change detected. Old: localhost/127.0.1.1:41395 New: localhost/127.0.0.1:41395
2014-05-13 12:15:29,664 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:41395. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2014-05-13 12:15:30,665 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:41395. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2014-05-13 12:15:31,666 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:41395. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2014-05-13 12:15:32,667 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:41395. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2014-05-13 12:15:33,668 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:41395. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2014-05-13 12:15:34,669 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:41395. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2014-05-13 12:15:35,669 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:41395. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2014-05-13 12:15:36,670 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:41395. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2014-05-13 12:15:37,671 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:41395. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2014-05-13 12:15:38,672 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:41395. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2014-05-13 12:15:38,675 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.net.ConnectException: Call From hd-slave-172.ffm.telekom.de/164.26.155.172 to localhost:41395 failed on connection exception: java.net.ConnectException: Verbindungsaufbau abgelehnt; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at
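The interesting lines are the address change from localhost/127.0.1.1 to 127.0.0.1 and the refused connection to localhost:41395: the task is being told to reach a daemon via "localhost", which on Debian-style hosts often resolves to 127.0.1.1 via /etc/hosts. A quick, self-contained way to check what each node resolves its own hostname to is sketched below; nothing in it is Hadoop-specific, and the hostnames you pass in are simply whatever your machines call themselves.

    import java.net.InetAddress;

    public class ResolveCheck {
        public static void main(String[] args) throws Exception {
            InetAddress self = InetAddress.getLocalHost();
            // On a healthy cluster node this should print the real LAN address, not 127.0.1.1 or 127.0.0.1.
            System.out.println(self.getHostName() + " -> " + self.getHostAddress());
            for (String host : args) {
                // Pass the other nodes' hostnames as arguments to verify they resolve consistently too.
                System.out.println(host + " -> " + InetAddress.getByName(host).getHostAddress());
            }
        }
    }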
enable regular expression on which parameter?
MAPREDUCE-5851: I can see many parameters in the DistCp class. In which parameter do we need to enable regular expressions? The usage string (the private static final String usage field) renders as:

NAME [OPTIONS] srcurl* desturl

OPTIONS:
-p[rbugp]              Preserve status
                       r: replication number
                       b: block size
                       u: user
                       g: group
                       p: permission
                       -p alone is equivalent to -prbugp
-i                     Ignore failures
-log logdir            Write logs to logdir
-m num_maps            Maximum number of simultaneous copies
-overwrite             Overwrite destination
-update                Overwrite if src size different from dst size
-f urilist_uri         Use list at urilist_uri as src list
-filelimit n           Limit the total number of files to be <= n
-sizelimit n           Limit the total size to be <= n bytes
-delete                Delete the files existing in the dst but not in src
-mapredSslConf f       Filename of SSL configuration for mapper task

NOTE 1: if -overwrite or -update are set, each source URI is interpreted as an isomorphic update to an existing directory. For example:
  hadoop NAME -p -update "hdfs://A:8020/user/foo/bar" "hdfs://B:8020/user/foo/baz"
would update all descendants of 'baz' also in 'bar'; it would *not* update /user/foo/baz/bar

NOTE 2: The parameter n in -filelimit and -sizelimit can be specified with symbolic representation. For example:
  1230k = 1230 * 1024 = 1259520
  891g = 891 * 1024^3 = 956703965184
Re: Data node with multiple disks
Your question is unclear. Please restate and describe what you are attempting to do. Thanks.

On Monday, May 12, 2014, Marcos Sousa falecom...@marcossousa.com wrote:

Hi, I have 20 servers, each with 10 x 400 GB SATA disks, that I'd like to use as my DataNodes:

/vol1/hadoop/data
/vol2/hadoop/data
/vol3/hadoop/data
/volN/hadoop/data

How do I use those distinct disks without replicating the data? Best regards,

-- Marcos Sousa
Using Lookup file in mapreduce
Hi team,

I have a huge lookup file, around 5 GB, and I need to use it to map users to categories in my MapReduce job. Can you suggest the best way to achieve this?

Sent from my iPhone
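One common pattern, if the lookup table fits in each task's memory, is to ship it via the distributed cache and load it in the Mapper's setup(); at 5 GB that is borderline, so a reduce-side join or serving the table from HBase may be the better fit. The sketch below shows the distributed-cache variant; the HDFS path, the tab-separated format and the field layout are all assumptions made for illustration.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class UserCategoryMapper extends Mapper<LongWritable, Text, Text, Text> {
        private final Map<String, String> userToCategory = new HashMap<String, String>();

        @Override
        protected void setup(Context context) throws IOException {
            // "users" is the link name given by the cache URI fragment in the driver (see comment below).
            try (BufferedReader r = new BufferedReader(new FileReader("users"))) {
                String line;
                while ((line = r.readLine()) != null) {
                    String[] parts = line.split("\t", 2);   // assumed layout: userId<TAB>category
                    if (parts.length == 2) userToCategory.put(parts[0], parts[1]);
                }
            }
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String userId = value.toString().split("\t")[0];   // assumed: userId is the first input field
            String category = userToCategory.get(userId);
            if (category != null) context.write(new Text(userId), new Text(category));
        }

        // In the driver (hypothetical HDFS path):
        //   job.addCacheFile(new java.net.URI("/lookup/user_categories.txt#users"));
    }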
Re: Data node with multiple disks
If you specify a list of directories in the property dfs.datanode.data.dir, Hadoop will distribute the data blocks among all those disks; it will not replicate data between them. If you want to use the disks as a single volume, you would have to build an LVM array (or use some other solution) to present them as a single device to the OS. However, benchmarks show that specifying a list of disks and letting Hadoop distribute data among them gives better performance.

On 13/05/14 17:12, Marcos Sousa wrote:

Yes, I don't want to replicate, just use them as one disk. Isn't it possible to make this work? Best regards, Marcos

On Tue, May 13, 2014 at 6:55 AM, Rahul Chaudhari rahulchaudhari0...@gmail.com wrote:

Marcos, while configuring Hadoop, the dfs.datanode.data.dir property in hdfs-default.xml should have this list of disks specified on separate lines. If you specify a comma-separated list, it will replicate on all those disks/partitions.
_Rahul
Sent from my iPad

On 13-May-2014, at 12:22 am, Marcos Sousa falecom...@marcossousa.com wrote:

Hi, I have 20 servers, each with 10 x 400 GB SATA disks, that I'd like to use as my DataNodes:

/vol1/hadoop/data
/vol2/hadoop/data
/vol3/hadoop/data
/volN/hadoop/data

How do I use those distinct disks without replicating the data? Best regards,

-- Marcos Sousa
www.marcossousa.com Enjoy it!

--
Aitor Pérez
Big Data System Engineer
Telf.: +34 917 680 490
Fax: +34 913 833 301
C/Manuel Tovar, 49-53 - 28034 Madrid - Spain
http://www.bidoop.es
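To make the behaviour concrete: each entry in dfs.datanode.data.dir becomes an independent storage directory, and new block replicas are spread across the entries, so listing ten disks does not create extra copies. A tiny sketch follows; the paths are Marcos's own, and in practice the property is set in hdfs-site.xml rather than in code.

    import org.apache.hadoop.conf.Configuration;

    public class DataDirList {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Comma-separated list, exactly as it would appear in hdfs-site.xml on each DataNode.
            conf.set("dfs.datanode.data.dir",
                     "/vol1/hadoop/data,/vol2/hadoop/data,/vol3/hadoop/data,/vol4/hadoop/data");
            // The DataNode treats every entry as a separate volume and spreads new blocks
            // across them; it does NOT write the same block to each entry.
            for (String dir : conf.getTrimmedStrings("dfs.datanode.data.dir")) {
                System.out.println("storage dir: " + dir);
            }
        }
    }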
Questions about Hadoop logs and mapred.local.dir
Hi Experts,

1. The size of mapred.local.dir has grown large (30 GB); what is the correct way to clean it up?
2. For the logs of the NameNode/DataNode/JobTracker/TaskTracker, are they all rolling logs? What is their maximum size? I cannot find the specific settings for them in log4j.properties.
3. I find that dfs.name.dir and dfs.data.dir are now very large; are there any files under them that can actually be removed, or must nothing under those two directories be touched at all?

Thanks!
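No answer appears in this thread, but before deleting anything under mapred.local.dir it helps to see how much space each configured directory is actually holding (dfs.name.dir and dfs.data.dir should never be pruned by hand). A small, Hadoop-agnostic sketch for measuring a directory; the default path is only a placeholder.

    import java.io.File;

    public class DirSize {
        private static long sizeOf(File f) {
            if (f.isFile()) return f.length();
            long total = 0;
            File[] children = f.listFiles();
            if (children != null) {
                for (File c : children) total += sizeOf(c);   // recurse into subdirectories
            }
            return total;
        }

        public static void main(String[] args) {
            // Point this at each entry of mapred.local.dir; the default below is just a placeholder.
            File dir = new File(args.length > 0 ? args[0] : "/tmp/mapred/local");
            System.out.println(dir + " uses " + (sizeOf(dir) >> 20) + " MB");
        }
    }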
Re: No job can run in YARN (Hadoop-2.2)
The *FileNotFoundException* was thrown when I tried to submit a job calculating pi; actually no such exception is thrown when I submit a wordcount job, but I can still see "Exception from container-launch ...", and any other job throws such exceptions. Every job runs successfully when I comment out the properties *mapreduce.map.java.opts* and *mapreduce.reduce.java.opts*. It does sound odd, but I think it may be because these two properties conflict with other memory-related properties, so the container cannot be launched.

2014-05-12 3:37 GMT+08:00 Jay Vyas jayunit...@gmail.com:

Sounds odd... So (1) you got a FileNotFoundException and (2) you fixed it by commenting out memory-specific config parameters? Not sure how that would work... Any other details, or am I missing something else?

On May 11, 2014, at 4:16 AM, Tao Xiao xiaotao.cs@gmail.com wrote:

I'm sure this problem is caused by the incorrect configuration. I commented out all the configurations regarding memory, and then jobs could run successfully.

2014-05-11 0:01 GMT+08:00 Tao Xiao xiaotao.cs@gmail.com:

I installed Hadoop-2.2 in a cluster of 4 nodes, following "Hadoop YARN Installation: The definitive guide" (http://www.alexjf.net/blog/distributed-systems/hadoop-yarn-installation-definitive-guide). The configurations are as follows:

~/.bashrc        http://pastebin.com/zQgwuQv2
core-site.xml    http://pastebin.com/rBAaqZps
hdfs-site.xml    http://pastebin.com/bxazvp2G
mapred-site.xml  http://pastebin.com/N00SsMbz
slaves           http://pastebin.com/8VjsZ1uu
yarn-site.xml    http://pastebin.com/XwLQZTQb

I started the NameNode, DataNodes, ResourceManager and NodeManagers successfully, but no job can run successfully. For example, I run the following job:

[root@Single-Hadoop ~]# yarn jar /var/soft/apache/hadoop-2.2.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 2 4

The output is as follows:

14/05/10 23:56:25 INFO mapreduce.Job: Task Id : attempt_1399733823963_0004_m_00_0, Status : FAILED
Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException:
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
        at org.apache.hadoop.util.Shell.run(Shell.java:379)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
        at java.lang.Thread.run(Thread.java:662)

14/05/10 23:56:25 INFO mapreduce.Job: Task Id : attempt_1399733823963_0004_m_01_0, Status : FAILED
Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException:
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
        at org.apache.hadoop.util.Shell.run(Shell.java:379)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
        at java.lang.Thread.run(Thread.java:662)

... ...

14/05/10 23:56:36 INFO mapreduce.Job: map 100% reduce 100%
14/05/10 23:56:37 INFO mapreduce.Job: Job job_1399733823963_0004 failed with state FAILED due to: Task failed task_1399733823963_0004_m_00
Job failed as tasks failed. failedMaps:1 failedReduces:0
14/05/10 23:56:37 INFO mapreduce.Job: Counters: 10
        Job Counters
                Failed map tasks=7
                Killed map tasks=1
                Launched map tasks=8
                Other local map tasks=6
                Data-local map tasks=2
                Total time spent by all maps in occupied slots (ms)=21602
                Total time spent by all reduces in occupied slots (ms)=0
        Map-Reduce Framework
                CPU time spent (ms)=0
                Physical memory (bytes) snapshot=0
                Virtual memory (bytes) snapshot=0
Job Finished in 24.515 seconds
java.io.FileNotFoundException: File does not exist: hdfs://
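The usual cause of this symptom is that the heap given in mapreduce.{map,reduce}.java.opts is larger than (or otherwise inconsistent with) the container sizes in mapreduce.{map,reduce}.memory.mb and the YARN scheduler limits, so the container launch fails. Below is a hedged sketch of one consistent set of values; the numbers are illustrative, not recommendations, and in practice these settings go in mapred-site.xml and yarn-site.xml rather than in code.

    import org.apache.hadoop.conf.Configuration;

    public class MemorySettings {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Container sizes requested from YARN:
            conf.setInt("mapreduce.map.memory.mb", 1024);
            conf.setInt("mapreduce.reduce.memory.mb", 2048);
            // JVM heaps must fit inside those containers, so keep -Xmx around 75-80% of the container:
            conf.set("mapreduce.map.java.opts", "-Xmx800m");
            conf.set("mapreduce.reduce.java.opts", "-Xmx1600m");
            // Both container sizes must also lie between yarn.scheduler.minimum-allocation-mb and
            // yarn.scheduler.maximum-allocation-mb, and fit within yarn.nodemanager.resource.memory-mb.
            System.out.println("map container=" + conf.get("mapreduce.map.memory.mb")
                    + "m, map heap=" + conf.get("mapreduce.map.java.opts"));
        }
    }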
Re: Data node with multiple disks
Yes, I don't want to replicate, just use them as one disk. Isn't it possible to make this work? Best regards, Marcos

On Tue, May 13, 2014 at 6:55 AM, Rahul Chaudhari rahulchaudhari0...@gmail.com wrote:

Marcos, while configuring Hadoop, the dfs.datanode.data.dir property in hdfs-default.xml should have this list of disks specified on separate lines. If you specify a comma-separated list, it will replicate on all those disks/partitions.
_Rahul
Sent from my iPad

On 13-May-2014, at 12:22 am, Marcos Sousa falecom...@marcossousa.com wrote:

Hi, I have 20 servers, each with 10 x 400 GB SATA disks, that I'd like to use as my DataNodes:

/vol1/hadoop/data
/vol2/hadoop/data
/vol3/hadoop/data
/volN/hadoop/data

How do I use those distinct disks without replicating the data? Best regards,

-- Marcos Sousa
www.marcossousa.com Enjoy it!
Re: enable regular expression on which parameter?
Avinash!

That JIRA is still open and does not seem to have been fixed. There are a lot of issues with providing regexes, though; a long-standing one has been https://issues.apache.org/jira/browse/HDFS-13, which makes it even harder.

HTH
Ravi

On 05/13/14 02:33, Avinash Kujur wrote:

MAPREDUCE-5851: I can see many parameters in the DistCp class. In which parameter do we need to enable regular expressions? The usage string renders as:

NAME [OPTIONS] srcurl* desturl

OPTIONS:
-p[rbugp]              Preserve status
                       r: replication number
                       b: block size
                       u: user
                       g: group
                       p: permission
                       -p alone is equivalent to -prbugp
-i                     Ignore failures
-log logdir            Write logs to logdir
-m num_maps            Maximum number of simultaneous copies
-overwrite             Overwrite destination
-update                Overwrite if src size different from dst size
-f urilist_uri         Use list at urilist_uri as src list
-filelimit n           Limit the total number of files to be <= n
-sizelimit n           Limit the total size to be <= n bytes
-delete                Delete the files existing in the dst but not in src
-mapredSslConf f       Filename of SSL configuration for mapper task

NOTE 1: if -overwrite or -update are set, each source URI is interpreted as an isomorphic update to an existing directory. For example:
  hadoop NAME -p -update "hdfs://A:8020/user/foo/bar" "hdfs://B:8020/user/foo/baz"
would update all descendants of 'baz' also in 'bar'; it would *not* update /user/foo/baz/bar

NOTE 2: The parameter n in -filelimit and -sizelimit can be specified with symbolic representation. For example:
  1230k = 1230 * 1024 = 1259520
  891g = 891 * 1024^3 = 956703965184
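Until that JIRA lands, one workaround is to expand a glob yourself (Hadoop globs support *, ?, [abc] and {a,b}, which is not full regex) and feed the matching paths to distcp via -f. A rough sketch follows; the glob pattern and the listing-file path are placeholders.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BuildDistcpSourceList {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            // Glob, not regex: would match e.g. /user/foo/logs/2014-05-01 ... 2014-05-31 (placeholder pattern).
            FileStatus[] matches = fs.globStatus(new Path("/user/foo/logs/2014-05-*"));
            if (matches == null) matches = new FileStatus[0];   // globStatus may return null when nothing matches
            Path listing = new Path("/tmp/distcp-src-list");     // then run: hadoop distcp -f /tmp/distcp-src-list <dest>
            try (FSDataOutputStream out = fs.create(listing, true)) {
                for (FileStatus m : matches) {
                    out.writeBytes(m.getPath().toUri().toString() + "\n");
                }
            }
            fs.close();
        }
    }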