Using a lookup file in MapReduce
Hi team, I have a huge lookup file (around 5 GB) and I need to use it to map users to categories in my MapReduce job. Can you suggest the best way to achieve this? Sent from my iPhone
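One common pattern for this kind of user-to-category mapping is a map-side join: load the lookup into memory in the mapper's setup() and probe it per record. Below is a minimal sketch, assuming a tab-separated lookup file whose HDFS path is passed in as "lookup.path" (the class name, the property name and the file format are hypothetical, not from the original mail). Caveat: a full 5 GB table will usually not fit in a task heap, so this only works if the lookup can be filtered or partitioned first; otherwise a reduce-side join or an external store is the more usual answer.

// Hypothetical sketch of a map-side lookup loaded in setup().
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class UserCategoryMapper extends Mapper<LongWritable, Text, Text, Text> {

  private final Map<String, String> userToCategory = new HashMap<String, String>();

  @Override
  protected void setup(Context context) throws IOException {
    // Lookup file format assumed: user<TAB>category, one pair per line.
    Path lookup = new Path(context.getConfiguration().get("lookup.path"));
    FileSystem fs = lookup.getFileSystem(context.getConfiguration());
    BufferedReader reader = new BufferedReader(new InputStreamReader(fs.open(lookup)));
    try {
      String line;
      while ((line = reader.readLine()) != null) {
        String[] parts = line.split("\t", 2);
        if (parts.length == 2) {
          userToCategory.put(parts[0], parts[1]);
        }
      }
    } finally {
      reader.close();
    }
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // Input records assumed to start with the user id, tab-separated.
    String user = value.toString().split("\t", 2)[0];
    String category = userToCategory.get(user);
    if (category != null) {
      context.write(new Text(user), new Text(category));
    }
  }
}

The job driver would call conf.set("lookup.path", "/lookups/user_categories.txt") before submission; the path and format are illustrative only.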
R on Hadoop
Hi team, is there any documentation around installing R on Hadoop? Sent from my iPhone
RE: Huge disk IO on only one disk
Thanks Brahma, that answers my question.

** Cheers !!! Siddharth Tiwari Have a refreshing day !!! Every duty is holy, and devotion to duty is the highest form of worship of God.” Maybe other people will try to limit me but I don't limit myself

From: brahmareddy.batt...@huawei.com To: user@hadoop.apache.org Subject: RE: Huge disk IO on only one disk Date: Mon, 3 Mar 2014 06:51:30 +

> What should be the standard around setting up the hadoop.tmp.dir parameter.

As far as I know, hadoop.tmp.dir is used as the default for the following properties; if you configure these properties explicitly, you do not need to configure hadoop.tmp.dir at all.

MapReduce:
- mapreduce.cluster.local.dir (default ${hadoop.tmp.dir}/mapred/local): the local directory where MapReduce stores intermediate data files. May be a comma-separated list of directories on different devices in order to spread disk I/O. Directories that do not exist are ignored.
- mapreduce.jobtracker.system.dir (default ${hadoop.tmp.dir}/mapred/system): the directory where MapReduce stores control files.
- mapreduce.jobtracker.staging.root.dir (default ${hadoop.tmp.dir}/mapred/staging): the root of the staging area for users' job files. In practice, this should be the directory where users' home directories are located (usually /user).
- mapreduce.cluster.temp.dir (default ${hadoop.tmp.dir}/mapred/temp): a shared directory for temporary files.

YARN:
- yarn.nodemanager.local-dirs (default ${hadoop.tmp.dir}/nm-local-dir): list of directories to store localized files in. An application's localized file directory will be found in ${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}. Individual containers' work directories, called container_${contid}, will be subdirectories of this.

HDFS:
- dfs.namenode.name.dir (default file://${hadoop.tmp.dir}/dfs/name): determines where on the local filesystem the DFS name node should store the name table (fsimage). If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy.
- dfs.datanode.data.dir (default file://${hadoop.tmp.dir}/dfs/data): determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.
- dfs.namenode.checkpoint.dir (default file://${hadoop.tmp.dir}/dfs/namesecondary): determines where on the local filesystem the DFS secondary name node should store the temporary images to merge. If this is a comma-delimited list of directories then the image is replicated in all of the directories for redundancy.

Thanks & Regards, Brahma Reddy Battula

From: Siddharth Tiwari [siddharth.tiw...@live.com] Sent: Monday, March 03, 2014 11:20 AM To: USers Hadoop Subject: RE: Huge disk IO on only one disk

Hi Brahma, No I haven't; I have put a comma-separated list of disks in dfs.datanode.data.dir, and disk5 for hadoop.tmp.dir. My question is: should we set up hadoop.tmp.dir at all, and if yes, what should the standards around it be?

** Cheers !!! Siddharth Tiwari Have a refreshing day !!! Every duty is holy, and devotion to duty is the highest form of worship of God.” Maybe other people will try to limit me but I don't limit myself

From: brahmareddy.batt...@huawei.com To: user@hadoop.apache.org Subject: RE: Huge disk IO on only one disk Date: Mon, 3 Mar 2014 05:14:34 +

It seems you started the cluster with default values for the following two properties and configured only hadoop.tmp.dir:
- dfs.datanode.data.dir (default file://${hadoop.tmp.dir}/dfs/data): determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices.
- yarn.nodemanager.local-dirs (default ${hadoop.tmp.dir}/nm-local-dir): stores localized files, i.e. intermediate files.

Please configure the above two values as multiple directories.

Thanks & Regards, Brahma Reddy Battula

From: Siddharth Tiwari [siddharth.tiw...@live.com] Sent: Monday, March 03, 2014 5:58 AM To: USers Hadoop Subject: Huge disk IO on only one disk

Hi Team, I have 10 disks over which I am running my HDFS. Of these, disk5 is where hadoop.tmp.dir is configured, and I see far heavier IO on this disk than on the others when I run my jobs. Can you guide me to the standards to follow so that this IO is distributed across the other disks as well? What should be the standard around setting up the hadoop.tmp.dir parameter? Any help would be highly appreciated. Below is the IO while I am running a huge job. Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn sda
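A minimal sketch of the change suggested above, assuming the ten data disks are mounted as /disk1 through /disk10 (hypothetical mount points): give both properties a comma-separated list spanning the disks instead of letting them default to locations under hadoop.tmp.dir on disk5.

<!-- hdfs-site.xml -->
<property>
  <name>dfs.datanode.data.dir</name>
  <!-- one entry per data disk -->
  <value>/disk1/dfs/data,/disk2/dfs/data,/disk3/dfs/data,/disk4/dfs/data,/disk6/dfs/data,/disk7/dfs/data,/disk8/dfs/data,/disk9/dfs/data,/disk10/dfs/data</value>
</property>

<!-- yarn-site.xml -->
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <!-- localized files and container intermediate data are spread across these directories -->
  <value>/disk1/yarn/local,/disk2/yarn/local,/disk3/yarn/local,/disk4/yarn/local</value>
</property>

On an MR1 cluster the analogous local-directory key is mapreduce.cluster.local.dir (mapred.local.dir in older releases), which takes the same kind of comma-separated list.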
FW: Query hangs at 99.97 % for one reducer in Hive
Forwarding this message to the hadoop list as well; appreciate any help.

** Cheers !!! Siddharth Tiwari Have a refreshing day !!! Every duty is holy, and devotion to duty is the highest form of worship of God.” Maybe other people will try to limit me but I don't limit myself

From: siddharth.tiw...@live.com To: u...@hive.apache.org Subject: Query hangs at 99.97 % for one reducer in Hive Date: Sun, 2 Mar 2014 23:09:25 +

Hi team, the following query hangs at 99.97% for one reducer; kindly help or point to what the cause could be:

drop table if exists sample.dpi_short_lt;
create table sample.dpi_short_lt as
select b.msisdn, a.area_erb, a.longitude, a.latitude,
  substring(b.msisdn,1,2) as country,
  substring(b.msisdn,3,2) as area_code,
  substring(b.start_time,1,4) as year,
  substring(b.start_time,6,2) as month,
  substring(b.start_time,9,2) as day,
  substring(b.start_time,12,2) as hour,
  cast(b.procedure_duration as double) as duracao_ms,
  cast(b.internet_latency as double) as int_internet_latency,
  cast(b.ran_latency as double) as int_ran_latency,
  cast(b.http_latency as double) as int_http_latency,
  (case when b.internet_latency='' then 1 else 0 end) as internet_latency_missing,
  (case when b.ran_latency='' then 1 else 0 end) as ran_latency_missing,
  (case when b.http_latency='' then 1 else 0 end) as http_latency_missing,
  (cast(b.mean_throughput_ul as int) * cast(procedure_duration as int) / 1000) as total_up_bytes,
  (cast(b.mean_throughput_dl as int) * cast(procedure_duration as int) / 1000) as total_dl_bytes,
  cast(b.missing_packets_ul as int) as int_missing_packets_ul,
  cast(b.missing_packets_dl as int) as int_missing_packets_dl
from sample.dpi_large b
left outer join sample.science_new a
on b.cgi = regexp_replace(a.codigo_cgi_ecgi,'-','')
where msisdn!='';

Hive was heuristically selecting 1000 reducers and it was hanging at 99.97 percent on one reduce task. I then changed the settings to 3 GB per reducer and 500 reducers and started hitting this error:

java.lang.RuntimeException: Hive Runtime Error while closing operators: Unable to rename output from: hdfs://tlvcluster/tmp/hive-hadoop/hive_2014-03-01_03-14-36_812_8390586541316719852-1/_task_tmp.-ext-10001/_tmp.03_0 to: hdfs://tlvcluster/tmp/hive-hadoop/hive_2014-03-01_03-14-36_812_8390586541316719852-1/_tmp.-ext-10001/03_0
at org.apache.hadoop.hive.ql.exec.ExecReducer.close(ExecReducer.java:313)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:516)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output from: hdfs://tlvcluster/tmp/hive-hadoop/hive_2014-03-01_03-14-36_812

I have a 22-node cluster running CDH 4.3. Please help me locate what the issue could be.

** Cheers !!! Siddharth Tiwari Have a refreshing day !!! Every duty is holy, and devotion to duty is the highest form of worship of God.” Maybe other people will try to limit me but I don't limit myself
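For reference, when one reducer out of hundreds hangs near 100% on a query like this, the usual suspect is key skew on the join column: if many rows share one value of b.cgi (for example an empty or null CGI), they all hash to the same reducer. A few settings that are commonly tried on Hive 0.10-era clusters, shown as a hedged sketch with illustrative values only:

-- fewer/larger reducers: roughly 1 GB of input per reducer
set hive.exec.reducers.bytes.per.reducer=1073741824;
-- or pin the reducer count explicitly
set mapred.reduce.tasks=500;
-- let Hive split heavily skewed join keys across reducers
set hive.optimize.skewjoin=true;
set hive.skewjoin.key=100000;

Filtering out empty or invalid join keys before the join (the way msisdn!='' is already filtered) is often more effective than any reducer setting.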
Huge disk IO on only one disk
Hi Team, I have 10 disks over which I am running my HDFS. Of these, disk5 is where hadoop.tmp.dir is configured, and I see far heavier IO on this disk than on the others when I run my jobs. Can you guide me to the standards to follow so that this IO is distributed across the other disks as well? What should be the standard around setting up the hadoop.tmp.dir parameter? Any help would be highly appreciated. Below is the IO while I am running a huge job.

Device:  tps    Blk_read/s  Blk_wrtn/s  Blk_read     Blk_wrtn
sda      2.11   37.65       226.20      313512628    1883809216
sdb      1.47   96.44       152.48      803144582    1269829840
sdc      1.45   93.03       153.10      774765734    1274979080
sdd      1.46   95.06       152.73      791690022    1271944848
sde      1.47   92.70       153.24      772025750    1276195288
sdf      1.55   95.77       153.06      797567654    1274657320
sdg      10.10  364.26      1951.79     3033537062   16254346480
sdi      1.46   94.82       152.98      789646630    1274014936
sdh      1.44   94.09       152.57      783547390    1270598232
sdj      1.44   91.94       153.37      765678470    1277220208
sdk      1.52   97.01       153.02      807928678    1274300360

** Cheers !!! Siddharth Tiwari Have a refreshing day !!! Every duty is holy, and devotion to duty is the highest form of worship of God.” Maybe other people will try to limit me but I don't limit myself
Row exception in Hive
Hi team, What does the following error signify ? java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=1) {key:{joinkey0:},value:{_col2:92,_col11:-60-01-21,00,_col12:-03-07-04,00},alias:1} at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:270) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447) at org.apache.hadoop.mapred.Child$4.run(Child.java:268) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408) at org.apache.hadoop.mapred.Child.main(Child.java:262) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=1) {key:{joinkey0:},value:{_col2:92,_col11:-60-01-21,00,_col12:-03-07-04,00},alias:1} at org.apache.hadoop.hive.ql.exec.ExecRedu ** Cheers !!! Siddharth Tiwari Have a refreshing day !!! Every duty is holy, and devotion to duty is the highest form of worship of God.” Maybe other people will try to limit me but I don't limit myself
RE: Huge disk IO on only one disk
Hi Brahma, No I havnt, I have put comma separated list of disks here dfs.datanode.data.dir . Have put disk5 for hadoop.tmp.dir. My Q is, should we set up hadoop.tmp.dir or not ? if yes what should be standards around. ** Cheers !!! Siddharth Tiwari Have a refreshing day !!! Every duty is holy, and devotion to duty is the highest form of worship of God.” Maybe other people will try to limit me but I don't limit myself From: brahmareddy.batt...@huawei.com To: user@hadoop.apache.org Subject: RE: Huge disk IO on only one disk Date: Mon, 3 Mar 2014 05:14:34 + Seems to be you had started cluster with default values for the following two properties and configured for only hadoop.tmp.dir . dfs.datanode.data.dir --- file://${hadoop.tmp.dir}/dfs/data (Default value) Determines where on the local filesystem an DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices yarn.nodemanager.local-dirs -- ${hadoop.tmp.dir}/nm-local-dir (Default value) To store localized files, It's like inetermediate files Please configure above two values as muliple dir's.. Thanks Regards Brahma Reddy Battula From: Siddharth Tiwari [siddharth.tiw...@live.com] Sent: Monday, March 03, 2014 5:58 AM To: USers Hadoop Subject: Huge disk IO on only one disk Hi Team, I have 10 disks over which I am running my HDFS. Out of this on disk5 I have my hadoop.tmp.dir configured. I see that on this disk I have huge IO when I run my jobs compared to other disks. Can you guide my to the standards to follow so that this IO can be distributed across to other disks as well. What should be the standard around setting up the hadoop.tmp.dir parameter. Any help would be highly appreciated. below is IO while I am running a huge job. Device:tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn sda 2.1137.65 226.20 313512628 1883809216 sdb 1.4796.44 152.48 803144582 1269829840 sdc 1.4593.03 153.10 774765734 1274979080 sdd 1.4695.06 152.73 791690022 1271944848 sde 1.4792.70 153.24 772025750 1276195288 sdf 1.5595.77 153.06 797567654 1274657320 sdg 10.10 364.26 1951.79 3033537062 16254346480 sdi 1.4694.82 152.98 789646630 1274014936 sdh 1.4494.09 152.57 783547390 1270598232 sdj 1.4491.94 153.37 765678470 1277220208 sdk 1.5297.01 153.02 807928678 1274300360 ** Cheers !!! Siddharth Tiwari Have a refreshing day !!! Every duty is holy, and devotion to duty is the highest form of worship of God.” Maybe other people will try to limit me but I don't limit myself
Seeing strange error in Hive
Hi Team, I am seeing following error in hive in reduce phase,can you guide me on its cause and possible solution ? java.lang.RuntimeException: Hive Runtime Error while closing operators: Unable to rename output from: hdfs://tlvcluster/tmp/hive-hadoop/hive_2014-03-01_03-14-36_812_8390586541316719852-1/_task_tmp.-ext-10001/_tmp.03_0 to: hdfs://tlvcluster/tmp/hive-hadoop/hive_2014-03-01_03-14-36_812_8390586541316719852-1/_tmp.-ext-10001/03_0 at org.apache.hadoop.hive.ql.exec.ExecReducer.close(ExecReducer.java:313) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:516) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447) at org.apache.hadoop.mapred.Child$4.run(Child.java:268) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408) at org.apache.hadoop.mapred.Child.main(Child.java:262) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output from: hdfs://tlvcluster/tmp/hive-hadoop/hive_2014-03-01_03-14-36_812 I am using hive-10.x , hadoop-2.0.0,. Appreciate any help in understanding the issue. ** Cheers !!! Siddharth Tiwari Have a refreshing day !!! Every duty is holy, and devotion to duty is the highest form of worship of God.” Maybe other people will try to limit me but I don't limit myself
Re: umount bad disk
Try doing umount -l. Sent from my iPhone

On Feb 13, 2014, at 11:10 AM, Arpit Agarwal aagar...@hortonworks.com wrote: bcc'ed hadoop-user. Lei, perhaps hbase-user can help.

-- Forwarded message -- From: lei liu liulei...@gmail.com Date: Thu, Feb 13, 2014 at 1:04 AM Subject: umount bad disk To: user@hadoop.apache.org

I use HBase 0.96 and CDH 4.3.1 with short-circuit local reads enabled:

<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<property>
  <name>dfs.domain.socket.path</name>
  <value>/home/hadoop/cdh4-dn-socket/dn_socket</value>
</property>

When one disk goes bad, the RegionServer still has files open on it, so the unmount fails:

sudo umount -f /disk10
umount2: Device or resource busy
umount: /disk10: device is busy
umount2: Device or resource busy
umount: /disk10: device is busy

I have to stop the RegionServer in order to run the umount command. How can I remove the bad disk without stopping the RegionServer? Thanks, LiuLei
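For reference, a hedged sketch of the lazy-unmount route suggested above, using the /disk10 path from the original mail; umount -l detaches the mount point immediately and lets the kernel finish the unmount once the RegionServer's open file handles go away (until then the bad disk stays pinned by those handles):

# see which processes still hold files open on the failed disk
lsof +D /disk10        # or: fuser -vm /disk10
# lazily detach the mount point; cleanup completes when the last handle closes
sudo umount -l /disk10

Removing the directory from dfs.datanode.data.dir and restarting the DataNode so nothing new is written there is the usual companion step; whether that is acceptable without touching the RegionServer depends on the deployment.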
Re: HA Jobtracker failure
How have you implemented the failover ? Also can you attach JTHA logs ? If you hav implemented it using. Zkfc, it would be interesting to look in zookeeper logs as well. Sent from my iPhone On Jan 27, 2014, at 3:00 PM, Karthik Kambatla ka...@cloudera.com wrote: (Redirecting to cdh-user, moving user@hadoop to bcc). Hi Oren Can you attach slightly longer versions of the log files on both the JTs? Also, if this is something recurring, it would be nice to monitor the JT heap usage and GC timeouts using jstat -gcutil jt-pid. Thanks Karthik On Thu, Jan 23, 2014 at 8:11 AM, Oren Marmor or...@infolinks.com wrote: Hi. We have two HA Jobtrackers in active/standby mode. (CDH4.2 on ubuntu server) We had a problem during which the active node suddenly became standby and the standby server attempted to start resulting in a java heap space failure. any ideas to why the active node turned to standby? logs attached: on (original) active node: 2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobTracker: Initializing job_201401041634_5858 2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobInProgress: Initializing job_201401041634_5858 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to standby 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker: Stopping pluginDispatcher 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker: Stopping infoServer 2014-01-22 06:50:44,093 WARN org.apache.hadoop.ipc.Client: interrupted waiting to send params to server java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:979) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281) at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218) at java.util.concurrent.FutureTask.get(FutureTask.java:83) at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:913) at org.apache.hadoop.ipc.Client.call(Client.java:1198) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202) at $Proxy9.getFileInfo(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:628) at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83) at $Proxy10.getFileInfo(Unknown Source) at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1532) at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:803) at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1332) at org.apache.hadoop.mapred.JobTrackerHAServiceProtocol$SystemDirectoryMonitor.run(JobTrackerHAServiceProtocol.java:96) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98) at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) 2014-01-22 06:51:55,637 INFO org.mortbay.log: Stopped SelectChannelConnector@0.0.0.0:50031 on standby node 2014-01-22 06:50:05,010 INFO org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to active 2014-01-22 06:50:05,010 INFO org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopping JobTrackerHAHttpRedirector on port 50030 2014-01-22 06:50:05,098 INFO org.mortbay.log: Stopped SelectChannelConnector@0.0.0.0:50030 2014-01-22 06:50:05,198 INFO org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopped 2014-01-22 06:50:05,201 INFO org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Renaming previous system directory hdfs://***/tmp/mapred/system/seq-0022 to hdfs://t
RE: Strange error on Datanodes
Thanks Jeet can you suggest me the parameter which controls the timeout value ? ** Cheers !!! Siddharth Tiwari Have a refreshing day !!! Every duty is holy, and devotion to duty is the highest form of worship of God.” Maybe other people will try to limit me but I don't limit myself Date: Tue, 3 Dec 2013 15:38:50 +0530 Subject: Re: Strange error on Datanodes From: jeetuyadav200...@gmail.com To: user@hadoop.apache.org; cdh-u...@cloudera.org Sorry for the incomplete mail. Instead of one issue I think you may have two issues going on. I'm also adding CDH mailing list for more inputs on the same. 1. 2013-12-02 13:11:36,441 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-1854340821-10.238.9.151-1385733655875:blk_-2927699636194035560_63092 java.net.SocketTimeoutException: 65000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected This error could be possible in a scenario where your DN process having long time GC push, Increasing the timeout value may resolve this issue. Or your client connect could be disconnected abnormal. 2. 2013-12-02 13:12:06,586 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: brtlvlts0088co:50010:DataXceiver error processing WRITE_BLOCK operation src: /10.238.10.43:54040 dest: /10.238.10.43:50010 java.io.IOException: Premature EOF from inputStream at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:194) Try to increase the dfs.datanode.max.xcievers conf value in the datanode hdfs-site.conf RegardsJitendra On Tue, Dec 3, 2013 at 3:17 PM, Jitendra Yadav jeetuyadav200...@gmail.com wrote: I did some analysis on the provided logs and confs. Instead of one issue i believe you may have two issue going on. 1.java.net.SocketTimeoutException: 65000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected 2. 2013-12-02 13:12:06,586 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: brtlvlts0088co:50010:DataXceiver error processing WRITE_BLOCK operation src: /10.238.10.43:54040 dest: /10.238.10.43:50010 java.io.IOException: Premature EOF from inputStream at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:194) On Mon, Dec 2, 2013 at 9:30 PM, Siddharth Tiwari siddharth.tiw...@live.com wrote: Hi JeetI am using CDH 4 , but I have manually installed NN and JT with HA not using cdh manager. I am attaching NN logs here, I sent a mail just before this for other files. This is frustrating , why is it happening. ** Cheers !!! Siddharth Tiwari Have a refreshing day !!! Every duty is holy, and devotion to duty is the highest form of worship of God.” Maybe other people will try to limit me but I don't limit myself Date: Mon, 2 Dec 2013 21:24:43 +0530 Subject: Re: Strange error on Datanodes From: jeetuyadav200...@gmail.com To: user@hadoop.apache.org Which hadoop destro you are using?, It would be good if you share the logs from data node on which the data block(blk_-2927699636194035560_63092) exist and from name nodes also. Regards Jitendra On Mon, Dec 2, 2013 at 9:13 PM, Siddharth Tiwari siddharth.tiw...@live.com wrote: Hi Jeet I have a cluster of size 25, 4 Admin nodes and 21 datanodes. 2 NN 2 JT 3 Zookeepers and 3 QJNs if you could help me in understanding what kind of logs you want I will provide it to you. Do you need hdfs-site.xml, core-site.xml and mapred-site.xmls ? ** Cheers !!! Siddharth Tiwari Have a refreshing day !!! 
Every duty is holy, and devotion to duty is the highest form of worship of God.” Maybe other people will try to limit me but I don't limit myself Date: Mon, 2 Dec 2013 21:09:03 +0530 Subject: Re: Strange error on Datanodes From: jeetuyadav200...@gmail.com To: user@hadoop.apache.org Hi, Can you share some more logs from Data nodes? could you please also share the conf and cluster size? RegardsJitendra On Mon, Dec 2, 2013 at 8:49 PM, Siddharth Tiwari siddharth.tiw...@live.com wrote: Hi team I see following errors on datanodes. What is the reason for this and how can it will be resolved:- 2013-12-02 13:11:36,441 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-1854340821-10.238.9.151-1385733655875:blk_-2927699636194035560_63092 java.net.SocketTimeoutException: 65000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.238.10.43:54040 remote=/10.238.10.43:50010] at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:165) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:156) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:129) at org.apache.hadoop.net.SocketInputStream.read
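For reference, a minimal hdfs-site.xml sketch of the two knobs discussed in this thread; the 65000 ms in the exception corresponds to the pipeline read timeout derived from dfs.socket.timeout, and the values below are illustrative for a CDH4-era cluster rather than recommendations:

<property>
  <name>dfs.socket.timeout</name>
  <value>180000</value> <!-- read timeout, ms -->
</property>
<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>180000</value> <!-- write timeout, ms -->
</property>
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value> <!-- max concurrent block transfer threads (note the historical spelling) -->
</property>

Raising timeouts only papers over long GC pauses, so checking the DataNode GC behaviour (e.g. with jstat -gcutil) is worth doing alongside any timeout change.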
Strange error on Datanodes
Hi team I see following errors on datanodes. What is the reason for this and how can it will be resolved:- 2013-12-02 13:11:36,441 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-1854340821-10.238.9.151-1385733655875:blk_-2927699636194035560_63092 java.net.SocketTimeoutException: 65000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.238.10.43:54040 remote=/10.238.10.43:50010] at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:165) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:156) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:129) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:117) at java.io.FilterInputStream.read(FilterInputStream.java:83) at java.io.FilterInputStream.read(FilterInputStream.java:83) at org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:169) at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:114) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:694) 2013-12-02 13:12:06,572 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1 2013-12-02 13:12:06,581 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hadoop (auth:SIMPLE) cause:java.io.IOException: All datanodes 10.238.10.43:50010 are bad. Aborting... 2013-12-02 13:12:06,581 WARN org.apache.hadoop.mapred.Child: Error running child java.io.IOException: All datanodes 10.238.10.43:50010 are bad. Aborting... at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:959) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:779) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:448) 2013-12-02 13:12:06,583 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task ** Cheers !!! Siddharth Tiwari Have a refreshing day !!! Every duty is holy, and devotion to duty is the highest form of worship of God.” Maybe other people will try to limit me but I don't limit myself
RE: Strange error on Datanodes
Hi Jeet I have a cluster of size 25, 4 Admin nodes and 21 datanodes.2 NN 2 JT 3 Zookeepers and 3 QJNs if you could help me in understanding what kind of logs you want I will provide it to you. Do you need hdfs-site.xml, core-site.xml and mapred-site.xmls ? ** Cheers !!! Siddharth Tiwari Have a refreshing day !!! Every duty is holy, and devotion to duty is the highest form of worship of God.” Maybe other people will try to limit me but I don't limit myself Date: Mon, 2 Dec 2013 21:09:03 +0530 Subject: Re: Strange error on Datanodes From: jeetuyadav200...@gmail.com To: user@hadoop.apache.org Hi, Can you share some more logs from Data nodes? could you please also share the conf and cluster size? RegardsJitendra On Mon, Dec 2, 2013 at 8:49 PM, Siddharth Tiwari siddharth.tiw...@live.com wrote: Hi team I see following errors on datanodes. What is the reason for this and how can it will be resolved:- 2013-12-02 13:11:36,441 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-1854340821-10.238.9.151-1385733655875:blk_-2927699636194035560_63092 java.net.SocketTimeoutException: 65000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.238.10.43:54040 remote=/10.238.10.43:50010] at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:165) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:156) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:129) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:117) at java.io.FilterInputStream.read(FilterInputStream.java:83) at java.io.FilterInputStream.read(FilterInputStream.java:83) at org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:169) at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:114) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:694) 2013-12-02 13:12:06,572 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1 2013-12-02 13:12:06,581 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hadoop (auth:SIMPLE) cause:java.io.IOException: All datanodes 10.238.10.43:50010 are bad. Aborting... 2013-12-02 13:12:06,581 WARN org.apache.hadoop.mapred.Child: Error running child java.io.IOException: All datanodes 10.238.10.43:50010 are bad. Aborting... at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:959) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:779) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:448) 2013-12-02 13:12:06,583 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task ** Cheers !!! Siddharth Tiwari Have a refreshing day !!! Every duty is holy, and devotion to duty is the highest form of worship of God.” Maybe other people will try to limit me but I don't limit myself
JT HA issue on CDH4
I implemented JT HA on CDH4.4.2. The JobTrackers keep failing over to each other, jobs keep restarting, the NameNode also goes down at times, and I can see logs from a few datanodes saying "all datanodes are bad. aborting". I installed JT HA manually like this: after configuring JT HA and running formatZK, I started the JobTracker HA daemon with "nohup hadoop jobtrackerha", and then started mrzkfc with "nohup hadoop mrzkfc". Please advise if I am doing anything wrong. Also, is that the right way to start JT HA and the failover controller? Sent from my iPad
Error for larger jobs
Hi Team, I am getting the following strange error; can you point me to the possible reason? I have set the heap size to 4 GB but am still getting it. Please help.

syslog logs:
2013-11-27 19:01:50,678 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2013-11-27 19:01:51,051 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
2013-11-27 19:01:51,539 WARN org.apache.hadoop.conf.Configuration: session.id is deprecated. Instead, use dfs.metrics.session-id
2013-11-27 19:01:51,540 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
2013-11-27 19:01:51,867 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0
2013-11-27 19:01:51,870 INFO org.apache.hadoop.mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@4a0bd13d
2013-11-27 19:01:52,217 INFO org.apache.hadoop.mapred.MapTask: Processing split: org.apache.hadoop.examples.terasort.TeraGen$RangeInputFormat$RangeInputSplit@6c30aec7
2013-11-27 19:01:52,222 WARN mapreduce.Counters: Counter name MAP_INPUT_BYTES is deprecated. Use FileInputFormatCounters as group name and BYTES_READ as counter name instead
2013-11-27 19:01:52,226 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 0
2013-11-27 19:01:52,250 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hadoop (auth:SIMPLE) cause:java.io.IOException: Cannot run program chmod: error=11, Resource temporarily unavailable
2013-11-27 19:01:52,250 WARN org.apache.hadoop.mapred.Child: Error running child
java.io.IOException: Cannot run program chmod: error=11, Resource temporarily unavailable
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1041)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:206)
at org.apache.hadoop.util.Shell.run(Shell.java:188)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:381)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:467)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:450)
at org.apache.hadoop.fs.RawLocalFileSystem.execCommand(RawLocalFileSystem.java:593)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:584)
at org.apache.hadoop.io.SecureIOUtils.insecureCreateForWrite(SecureIOUtils.java:146)
at org.apache.hadoop.io.SecureIOUtils.createForWrite(SecureIOUtils.java:168)
at org.apache.hadoop.mapred.TaskLog.writeToIndexFile(TaskLog.java:310)
at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:383)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.io.IOException: error=11, Resource temporarily unavailable
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.init(UNIXProcess.java:135)
at java.lang.ProcessImpl.start(ProcessImpl.java:130)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1022)
... 16 more
2013-11-27 19:01:52,256 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task

** Cheers !!! Siddharth Tiwari Have a refreshing day !!! Every duty is holy, and devotion to duty is the highest form of worship of God.” Maybe other people will try to limit me but I don't limit myself
Error=11 resource temporarily unavailable
Hi team, I am getting this strange error; below is the trace:

2013-11-27 19:01:50,678 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2013-11-27 19:01:51,051 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
2013-11-27 19:01:51,539 WARN org.apache.hadoop.conf.Configuration: session.id is deprecated. Instead, use dfs.metrics.session-id
2013-11-27 19:01:51,540 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
2013-11-27 19:01:51,867 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0
2013-11-27 19:01:51,870 INFO org.apache.hadoop.mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@4a0bd13d
2013-11-27 19:01:52,217 INFO org.apache.hadoop.mapred.MapTask: Processing split: org.apache.hadoop.examples.terasort.TeraGen$RangeInputFormat$RangeInputSplit@6c30aec7
2013-11-27 19:01:52,222 WARN mapreduce.Counters: Counter name MAP_INPUT_BYTES is deprecated. Use FileInputFormatCounters as group name and BYTES_READ as counter name instead
2013-11-27 19:01:52,226 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 0
2013-11-27 19:01:52,250 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hadoop (auth:SIMPLE) cause:java.io.IOException: Cannot run program chmod: error=11, Resource temporarily unavailable
2013-11-27 19:01:52,250 WARN org.apache.hadoop.mapred.Child: Error running child
java.io.IOException: Cannot run program chmod: error=11, Resource temporarily unavailable
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1041)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:206)
at org.apache.hadoop.util.Shell.run(Shell.java:188)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:381)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:467)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:450)
at org.apache.hadoop.fs.RawLocalFileSystem.execCommand(RawLocalFileSystem.java:593)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:584)
at org.apache.hadoop.io.SecureIOUtils.insecureCreateForWrite(SecureIOUtils.java:146)
at org.apache.hadoop.io.SecureIOUtils.createForWrite(SecureIOUtils.java:168)
at org.apache.hadoop.mapred.TaskLog.writeToIndexFile(TaskLog.java:310)
at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:383)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.io.IOException: error=11, Resource temporarily unavailable
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.init(UNIXProcess.java:135)
at java.lang.ProcessImpl.start(ProcessImpl.java:130)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1022)
... 16 more
2013-11-27 19:01:52,256 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task

I have set map slots to 24 and reduce slots to 12 on a 32-core machine (with HT); ulimit is 64K. What is causing this and how can we get rid of it? It happens only for bigger jobs, e.g. terasort.

** Cheers !!! Siddharth Tiwari Have a refreshing day !!! Every duty is holy, and devotion to duty is the highest form of worship of God.” Maybe other people will try to limit me but I don't limit myself
RE: Error for larger jobs
Hi Azury Thanks for response. I have plenty of space on my Disks so that cannot be the issue. ** Cheers !!! Siddharth Tiwari Have a refreshing day !!! Every duty is holy, and devotion to duty is the highest form of worship of God.” Maybe other people will try to limit me but I don't limit myself Date: Thu, 28 Nov 2013 08:10:06 +0800 Subject: Re: Error for larger jobs From: azury...@gmail.com To: user@hadoop.apache.org Your disk is full from the log. On 2013-11-28 5:27 AM, Siddharth Tiwari siddharth.tiw...@live.com wrote: Hi Team I am getting following strange error, can you point me to the possible reason. I have set heap size to 4GB but still getting it. please help syslog logs 2013-11-27 19:01:50,678 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 2013-11-27 19:01:51,051 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead 2013-11-27 19:01:51,539 WARN org.apache.hadoop.conf.Configuration: session.id is deprecated. Instead, use dfs.metrics.session-id 2013-11-27 19:01:51,540 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId= 2013-11-27 19:01:51,867 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0 2013-11-27 19:01:51,870 INFO org.apache.hadoop.mapred.Task: Using ResourceCalculatorPlugin :org.apache.hadoop.util.LinuxResourceCalculatorPlugin@4a0bd13d 2013-11-27 19:01:52,217 INFO org.apache.hadoop.mapred.MapTask: Processing split:org.apache.hadoop.examples.terasort.TeraGen$RangeInputFormat$RangeInputSplit@6c30aec7 2013-11-27 19:01:52,222 WARN mapreduce.Counters: Counter name MAP_INPUT_BYTES is deprecated. 
Use FileInputFormatCounters as group name and BYTES_READ as counter name instead 2013-11-27 19:01:52,226 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 0 2013-11-27 19:01:52,250 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hadoop (auth:SIMPLE) cause:java.io.IOException: Cannot run program chmod: error=11, Resource temporarily unavailable 2013-11-27 19:01:52,250 WARN org.apache.hadoop.mapred.Child: Error running child java.io.IOException: Cannot run program chmod: error=11, Resource temporarily unavailable at java.lang.ProcessBuilder.start(ProcessBuilder.java:1041) at org.apache.hadoop.util.Shell.runCommand(Shell.java:206) at org.apache.hadoop.util.Shell.run(Shell.java:188) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:381) at org.apache.hadoop.util.Shell.execCommand(Shell.java:467) at org.apache.hadoop.util.Shell.execCommand(Shell.java:450) at org.apache.hadoop.fs.RawLocalFileSystem.execCommand(RawLocalFileSystem.java:593) at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:584) at org.apache.hadoop.io.SecureIOUtils.insecureCreateForWrite(SecureIOUtils.java:146) at org.apache.hadoop.io.SecureIOUtils.createForWrite(SecureIOUtils.java:168) at org.apache.hadoop.mapred.TaskLog.writeToIndexFile(TaskLog.java:310) at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:383) at org.apache.hadoop.mapred.Child$4.run(Child.java:270) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408) at org.apache.hadoop.mapred.Child.main(Child.java:262) Caused by: java.io.IOException: error=11, Resource temporarily unavailable at java.lang.UNIXProcess.forkAndExec(Native Method) at java.lang.UNIXProcess.init(UNIXProcess.java:135) at java.lang.ProcessImpl.start(ProcessImpl.java:130) at java.lang.ProcessBuilder.start(ProcessBuilder.java:1022) ... 16 more 2013-11-27 19:01:52,256 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task ** Cheers !!! Siddharth Tiwari Have a refreshing day !!! Every duty is holy, and devotion to duty is the highest form of worship of God.” Maybe other people will try to limit me but I don't limit myself
Re: Error for larger jobs
Hi Vinay and Azuryy Thanks for your responses. I get these error when I just run a teragen. Also, do you suggest me to increase nproc value ? What should I increase it to ? Sent from my iPad On Nov 27, 2013, at 11:08 PM, Vinayakumar B vinayakuma...@huawei.com wrote: Hi Siddharth, Looks like the issue with one of the machine. Or its happening in different machines also? I don’t think it’s a problem with JVM heap memory. Suggest you to check this once, http://stackoverflow.com/questions/8384000/java-io-ioexception-error-11 Thanks and Regards, Vinayakumar B From: Siddharth Tiwari [mailto:siddharth.tiw...@live.com] Sent: 28 November 2013 05:50 To: USers Hadoop Subject: RE: Error for larger jobs Hi Azury Thanks for response. I have plenty of space on my Disks so that cannot be the issue. ** Cheers !!! Siddharth Tiwari Have a refreshing day !!! Every duty is holy, and devotion to duty is the highest form of worship of God.” Maybe other people will try to limit me but I don't limit myself Date: Thu, 28 Nov 2013 08:10:06 +0800 Subject: Re: Error for larger jobs From: azury...@gmail.com To: user@hadoop.apache.org Your disk is full from the log. On 2013-11-28 5:27 AM, Siddharth Tiwari siddharth.tiw...@live.com wrote: Hi Team I am getting following strange error, can you point me to the possible reason. I have set heap size to 4GB but still getting it. please help syslog logs 2013-11-27 19:01:50,678 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 2013-11-27 19:01:51,051 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead 2013-11-27 19:01:51,539 WARN org.apache.hadoop.conf.Configuration: session.id is deprecated. Instead, use dfs.metrics.session-id 2013-11-27 19:01:51,540 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId= 2013-11-27 19:01:51,867 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0 2013-11-27 19:01:51,870 INFO org.apache.hadoop.mapred.Task: Using ResourceCalculatorPlugin :org.apache.hadoop.util.LinuxResourceCalculatorPlugin@4a0bd13d 2013-11-27 19:01:52,217 INFO org.apache.hadoop.mapred.MapTask: Processing split:org.apache.hadoop.examples.terasort.TeraGen$RangeInputFormat$RangeInputSplit@6c30aec7 2013-11-27 19:01:52,222 WARN mapreduce.Counters: Counter name MAP_INPUT_BYTES is deprecated. 
Use FileInputFormatCounters as group name and BYTES_READ as counter name instead 2013-11-27 19:01:52,226 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 0 2013-11-27 19:01:52,250 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hadoop (auth:SIMPLE) cause:java.io.IOException: Cannot run program chmod: error=11, Resource temporarily unavailable 2013-11-27 19:01:52,250 WARN org.apache.hadoop.mapred.Child: Error running child java.io.IOException: Cannot run program chmod: error=11, Resource temporarily unavailable at java.lang.ProcessBuilder.start(ProcessBuilder.java:1041) at org.apache.hadoop.util.Shell.runCommand(Shell.java:206) at org.apache.hadoop.util.Shell.run(Shell.java:188) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:381) at org.apache.hadoop.util.Shell.execCommand(Shell.java:467) at org.apache.hadoop.util.Shell.execCommand(Shell.java:450) at org.apache.hadoop.fs.RawLocalFileSystem.execCommand(RawLocalFileSystem.java:593) at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:584) at org.apache.hadoop.io.SecureIOUtils.insecureCreateForWrite(SecureIOUtils.java:146) at org.apache.hadoop.io.SecureIOUtils.createForWrite(SecureIOUtils.java:168) at org.apache.hadoop.mapred.TaskLog.writeToIndexFile(TaskLog.java:310) at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:383) at org.apache.hadoop.mapred.Child$4.run(Child.java:270) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408) at org.apache.hadoop.mapred.Child.main(Child.java:262) Caused by: java.io.IOException: error=11, Resource temporarily unavailable at java.lang.UNIXProcess.forkAndExec(Native Method) at java.lang.UNIXProcess.init(UNIXProcess.java:135) at java.lang.ProcessImpl.start(ProcessImpl.java:130) at java.lang.ProcessBuilder.start(ProcessBuilder.java:1022) ... 16 more 2013-11-27 19:01:52,256 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task
RE: Error for larger jobs
What shall I put in my bash_profile ? Date: Thu, 28 Nov 2013 10:04:58 +0800 Subject: Re: Error for larger jobs From: azury...@gmail.com To: user@hadoop.apache.org yes. you need to increase it, a simple way is put it in your /etc/profile On Thu, Nov 28, 2013 at 9:59 AM, Siddharth Tiwari siddharth.tiw...@live.com wrote: Hi Vinay and AzuryyThanks for your responses.I get these error when I just run a teragen. Also, do you suggest me to increase nproc value ? What should I increase it to ? Sent from my iPad On Nov 27, 2013, at 11:08 PM, Vinayakumar B vinayakuma...@huawei.com wrote: Hi Siddharth, Looks like the issue with one of the machine. Or its happening in different machines also? I don’t think it’s a problem with JVM heap memory. Suggest you to check this once, http://stackoverflow.com/questions/8384000/java-io-ioexception-error-11 Thanks and Regards, Vinayakumar B From: Siddharth Tiwari [mailto:siddharth.tiw...@live.com] Sent: 28 November 2013 05:50 To: USers Hadoop Subject: RE: Error for larger jobs Hi Azury Thanks for response. I have plenty of space on my Disks so that cannot be the issue. ** Cheers !!! Siddharth Tiwari Have a refreshing day !!! Every duty is holy, and devotion to duty is the highest form of worship of God.” Maybe other people will try to limit me but I don't limit myself Date: Thu, 28 Nov 2013 08:10:06 +0800 Subject: Re: Error for larger jobs From: azury...@gmail.com To: user@hadoop.apache.org Your disk is full from the log. On 2013-11-28 5:27 AM, Siddharth Tiwari siddharth.tiw...@live.com wrote: Hi Team I am getting following strange error, can you point me to the possible reason. I have set heap size to 4GB but still getting it. please help syslog logs 2013-11-27 19:01:50,678 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 2013-11-27 19:01:51,051 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead 2013-11-27 19:01:51,539 WARN org.apache.hadoop.conf.Configuration: session.id is deprecated. Instead, use dfs.metrics.session-id 2013-11-27 19:01:51,540 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId= 2013-11-27 19:01:51,867 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0 2013-11-27 19:01:51,870 INFO org.apache.hadoop.mapred.Task: Using ResourceCalculatorPlugin :org.apache.hadoop.util.LinuxResourceCalculatorPlugin@4a0bd13d 2013-11-27 19:01:52,217 INFO org.apache.hadoop.mapred.MapTask: Processing split:org.apache.hadoop.examples.terasort.TeraGen$RangeInputFormat$RangeInputSplit@6c30aec7 2013-11-27 19:01:52,222 WARN mapreduce.Counters: Counter name MAP_INPUT_BYTES is deprecated. 
Use FileInputFormatCounters as group name and BYTES_READ as counter name instead 2013-11-27 19:01:52,226 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 0 2013-11-27 19:01:52,250 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hadoop (auth:SIMPLE) cause:java.io.IOException: Cannot run program chmod: error=11, Resource temporarily unavailable 2013-11-27 19:01:52,250 WARN org.apache.hadoop.mapred.Child: Error running child java.io.IOException: Cannot run program chmod: error=11, Resource temporarily unavailable at java.lang.ProcessBuilder.start(ProcessBuilder.java:1041) at org.apache.hadoop.util.Shell.runCommand(Shell.java:206) at org.apache.hadoop.util.Shell.run(Shell.java:188) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:381) at org.apache.hadoop.util.Shell.execCommand(Shell.java:467) at org.apache.hadoop.util.Shell.execCommand(Shell.java:450) at org.apache.hadoop.fs.RawLocalFileSystem.execCommand(RawLocalFileSystem.java:593) at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:584) at org.apache.hadoop.io.SecureIOUtils.insecureCreateForWrite(SecureIOUtils.java:146) at org.apache.hadoop.io.SecureIOUtils.createForWrite(SecureIOUtils.java:168) at org.apache.hadoop.mapred.TaskLog.writeToIndexFile(TaskLog.java:310) at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:383) at org.apache.hadoop.mapred.Child$4.run(Child.java:270) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408) at org.apache.hadoop.mapred.Child.main(Child.java:262) Caused by: java.io.IOException: error=11, Resource temporarily unavailable at java.lang.UNIXProcess.forkAndExec(Native Method) at java.lang.UNIXProcess.init
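For reference, error=11 (EAGAIN) on fork usually means the user hit its process/thread limit rather than a heap limit, which matches the nproc advice in this thread. Rather than only an interactive shell profile, the limit is normally raised in /etc/security/limits.conf so it applies to the daemons and task JVMs as well; a hedged sketch with illustrative values:

# /etc/security/limits.conf (or a file under /etc/security/limits.d/)
hadoop  soft  nproc   32768
hadoop  hard  nproc   65536
hadoop  soft  nofile  65536
hadoop  hard  nofile  65536

# verify from a fresh login as the hadoop user
ulimit -u   # max user processes
ulimit -n   # max open files

On RHEL/CentOS 6, also check /etc/security/limits.d/90-nproc.conf, which caps nproc for all non-root users and can override limits.conf. The TaskTracker has to be restarted from a session that already has the new limits for its task JVMs to inherit them.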
best solution for data ingestion
Hi team, seeking your advice on what would be the best way to ingest a lot of data into Hadoop. Also, what are your views on FUSE? ** Cheers !!! Siddharth Tiwari Have a refreshing day !!! Every duty is holy, and devotion to duty is the highest form of worship of God.” Maybe other people will try to limit me but I don't limit myself
Flume not moving data help !!!
Hi team, I created a Flume source and sink as follows on Hadoop YARN, and data is not getting transferred from the source to the sink in HDFS: it doesn't create any file there, and locally, every time I start the agent it creates one empty file. Below are my configs for the source and the sink.

Source:

agent.sources = logger1
agent.sources.logger1.type = exec
agent.sources.logger1.command = tail -f /var/log/messages
agent.sources.logger1.batchsSize = 0
agent.sources.logger1.channels = memoryChannel
agent.channels = memoryChannel
agent.channels.memoryChannel.type = memory
agent.channels.memoryChannel.capacity = 100
agent.sinks = AvroSink
agent.sinks.AvroSink.type = avro
agent.sinks.AvroSink.channel = memoryChannel
agent.sinks.AvroSink.hostname = 192.168.147.101
agent.sinks.AvroSink.port = 4545
agent.sources.logger1.interceptors = itime ihost
agent.sources.logger1.interceptors.itime.type = TimestampInterceptor
agent.sources.logger1.interceptors.ihost.type = host
agent.sources.logger1.interceptors.ihost.useIP = false
agent.sources.logger1.interceptors.ihost.hostHeader = host

Sink at one of the slaves (a datanode on my YARN cluster):

collector.sources = AvroIn
collector.sources.AvroIn.type = avro
collector.sources.AvroIn.bind = 0.0.0.0
collector.sources.AvroIn.port = 4545
collector.sources.AvroIn.channels = mc1 mc2
collector.channels = mc1 mc2
collector.channels.mc1.type = memory
collector.channels.mc1.capacity = 100
collector.channels.mc2.type = memory
collector.channels.mc2.capacity = 100
collector.sinks = LocalOut HadoopOut
collector.sinks.LocalOut.type = file_roll
collector.sinks.LocalOut.sink.directory = /home/hadoop/flume
collector.sinks.LocalOut.sink.rollInterval = 0
collector.sinks.LocalOut.channel = mc1
collector.sinks.HadoopOut.type = hdfs
collector.sinks.HadoopOut.channel = mc2
collector.sinks.HadoopOut.hdfs.path = /flume
collector.sinks.HadoopOut.hdfs.fileType = DataStream
collector.sinks.HadoopOut.hdfs.writeFormat = Text
collector.sinks.HadoopOut.hdfs.rollSize = 0
collector.sinks.HadoopOut.hdfs.rollCount = 1
collector.sinks.HadoopOut.hdfs.rollInterval = 600

Can somebody point me to what I am doing wrong? This is what I get in my local directory:

[hadoop@node1 flume]$ ls -lrt
total 0
-rw-rw-r-- 1 hadoop hadoop 0 Oct 31 11:25 1383243942803-1
-rw-rw-r-- 1 hadoop hadoop 0 Oct 31 11:28 1383244097923-1
-rw-rw-r-- 1 hadoop hadoop 0 Oct 31 11:31 1383244302225-1
-rw-rw-r-- 1 hadoop hadoop 0 Oct 31 11:33 1383244404929-1

When I restart the collector it creates one 0-byte file. Please help. ** Cheers !!! Siddharth Tiwari Have a refreshing day !!! Every duty is holy, and devotion to duty is the highest form of worship of God.” Maybe other people will try to limit me but I don't limit myself
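For reference, a hedged sketch of how the two agents above are typically launched with the flume-ng script; the --name value must match the agent name used in each properties file (agent on the source host, collector on the datanode), and the conf paths and file names here are hypothetical:

# on the source host
flume-ng agent --conf /etc/flume-ng/conf --conf-file source.conf --name agent -Dflume.root.logger=INFO,console
# on the collector (datanode) host
flume-ng agent --conf /etc/flume-ng/conf --conf-file collector.conf --name collector -Dflume.root.logger=INFO,console

Running with the console logger makes it easy to see whether the Avro source ever receives events and whether the HDFS sink fails to write, for example because the /flume path, or the fs.defaultFS it resolves against, is not what you expect.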
Trash in yarn
How can I enable Trash in hadoop-2.2.0? Also, if I drop a table in Hive, does the Trash help recover it? Please help; thanks in advance. Sent from my iPhone
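For reference, Trash is enabled in core-site.xml with fs.trash.interval (minutes to keep deleted files); a minimal sketch:

<!-- core-site.xml -->
<property>
  <name>fs.trash.interval</name>
  <value>1440</value> <!-- keep deleted files for 24 hours; 0 disables Trash -->
</property>

As far as Hive goes: dropping a managed table deletes its warehouse directory through the normal HDFS delete path, so with Trash enabled the data normally lands under the user's .Trash directory and can be copied back, but the table metadata itself is not recoverable this way.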
Re: Cannot start resourcemanager
Hi team, Resource manager works now; the capacity scheduler xml had a wrong entry somehow. But there is another small issue: I have NN HA enabled and wanted to run HBase with it, and even though I set hbase.rootdir to the FsNameservice value it always throws an exception saying it cannot recognize the nameservice value. I did put the core-site and hdfs-site in the HBase conf. Can you help me in setting it up with namenode HA in the new hadoop-2.2.0 stable release ? Also, what versions of Hive, Mahout and Pig would be compatible with it ? I am using the hbase-0.94.12 release. Sent from my iPad On Oct 17, 2013, at 12:48 PM, Arun C Murthy a...@hortonworks.com wrote: What command did you use to start the RM? On Oct 17, 2013, at 10:18 AM, Siddharth Tiwari siddharth.tiw...@live.com wrote: Hi Team, trying to start resourcemanager in the latest hadoop-2.2.0 stable release. It throws the following error. Please help 2013-10-17 10:01:51,230 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping ResourceManager metrics system... 2013-10-17 10:01:51,230 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics system stopped. 2013-10-17 10:01:51,231 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics system shutdown complete. 2013-10-17 10:01:51,232 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.yarn.server.resourcemanager.resource.DefaultResourceCalculator not found at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1752) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration.getResourceCalculator(CapacitySchedulerConfiguration.java:333) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:263) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:249) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:871) Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.yarn.server.resourcemanager.resource.DefaultResourceCalculator not found at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1720) at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1744) ... 5 more Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.yarn.server.resourcemanager.resource.DefaultResourceCalculator not found at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1626) at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1718) ... 6 more 2013-10-17 10:01:51,239 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: SHUTDOWN_MSG: / SHUTDOWN_MSG: Shutting down ResourceManager at node1/192.168.147.101 ** Cheers !!! Siddharth Tiwari Have a refreshing day !!! Every duty is holy, and devotion to duty is the highest form of worship of God.” Maybe other people will try to limit me but I don't limit myself -- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/
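For anyone hitting the same ClassNotFoundException: in Hadoop 2.2.0 the resource calculator moved out of the old org.apache.hadoop.yarn.server.resourcemanager.resource package, so a capacity-scheduler.xml carried over from an older alpha build points at a class that no longer exists. A sketch of the corrected entry follows; verify the exact value against the capacity-scheduler.xml shipped with your 2.2.0 tarball before relying on it.

  # capacity-scheduler.xml:
  #   <property>
  #     <name>yarn.scheduler.capacity.resource-calculator</name>
  #     <value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value>
  #   </property>

  # Then restart the ResourceManager:
  sbin/yarn-daemon.sh stop resourcemanager
  sbin/yarn-daemon.sh start resourcemanager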
Using Ambari to deploy Apache hadoop
Hi team, Is it possible to deploy Hadoop from Apache via Ambari ? Also, is there a link for a full offline installation ? We do not have access to the outside world and we want to use Ambari for deploying Hadoop ( not the Hortonworks release though ) Sent from my iPhone
Using Hbase with NN HA
Hi team, Can HBase be used with namenode HA in the latest hadoop-2.2.0 ? If yes, is there anything else required to be done other than the following ? 1. Set hbase.rootdir to the logical name of the namenode service 2. Keep core-site and hdfs-site in the HBase conf I did the above two but the logical name is not recognized. Also, it would be helpful if I could get some help with which versions of HBase, Hive, Pig and Mahout are compatible with the latest YARN release, hadoop-2.2.0. I am using hbase-0.94.12 Thanks Sent from my iPhone
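A rough sketch of the configuration that usually lets HBase 0.94 resolve an HA nameservice on hadoop-2.2.0; the nameservice name mycluster, the host names and the paths are placeholders. The key point is that the HDFS client settings have to be visible on HBase's classpath, either by copying or symlinking core-site.xml and hdfs-site.xml into the HBase conf directory or by repeating the properties in hbase-site.xml. If the logical name is still not recognized, a common culprit is older Hadoop client jars sitting in HBase's lib directory.

  # hbase-site.xml -- point the root dir at the nameservice, not at a host:port:
  #   hbase.rootdir = hdfs://mycluster/hbase
  #
  # hdfs-site.xml (must be readable by the HBase daemons):
  #   dfs.nameservices = mycluster
  #   dfs.ha.namenodes.mycluster = nn1,nn2
  #   dfs.namenode.rpc-address.mycluster.nn1 = node1:8020
  #   dfs.namenode.rpc-address.mycluster.nn2 = node2:8020
  #   dfs.client.failover.proxy.provider.mycluster = org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider

  # One common way to expose the client config to HBase (paths are examples):
  ln -s /opt/hadoop/hadoop-2.2.0/etc/hadoop/core-site.xml /opt/hbase/conf/core-site.xml
  ln -s /opt/hadoop/hadoop-2.2.0/etc/hadoop/hdfs-site.xml /opt/hbase/conf/hdfs-site.xml

  # Sanity check that the nameservice resolves with exactly this configuration:
  hadoop --config /opt/hbase/conf fs -ls hdfs://mycluster/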
Warning while starting services
Hi, I get the following warning when I start the services in hadoop-2.2.0. What does it signify and how do I get rid of it ? Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /opt/hadoop/hadoop-2.2.0/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now. It's highly recommended that you fix the library with 'execstack -c libfile', or link it with '-z noexecstack'. ** Cheers !!! Siddharth Tiwari Have a refreshing day !!! Every duty is holy, and devotion to duty is the highest form of worship of God.” Maybe other people will try to limit me but I don't limit myself
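The JVM message itself names the usual fix. A quick sketch, assuming the library path from the warning above and that the execstack tool (from the prelink/execstack package) is installed on the node; the warning concerns the executable-stack flag on the native library and does not normally affect correctness.

  # Clear the executable-stack flag on the library the JVM complained about:
  execstack -c /opt/hadoop/hadoop-2.2.0/lib/native/libhadoop.so.1.0.0

  # Verify: a '-' in the first column means the flag is now cleared
  execstack -q /opt/hadoop/hadoop-2.2.0/lib/native/libhadoop.so.1.0.0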
RE: Error in documentation
Can I get access to update the same ? ** Cheers !!! Siddharth Tiwari Have a refreshing day !!! Every duty is holy, and devotion to duty is the highest form of worship of God.” Maybe other people will try to limit me but I don't limit myself Date: Fri, 18 Oct 2013 11:42:29 +0200 Subject: Re: Error in documentation From: ake...@concurrentinc.com To: user@hadoop.apache.org The best thing to do is to open a JIRA here: https://issues.apache.org/jira/secure/Dashboard.jspa You might also want to submit a patch, which is very easy. - André On Fri, Oct 18, 2013 at 11:28 AM, Siddharth Tiwari siddharth.tiw...@live.com wrote: The installation documentation for Hadoop YARN at this link http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html has an error in the yarn-site entry for the property yarn.nodemanager.aux-services. It should be mapreduce_shuffle rather than mapreduce.shuffle. ** Cheers !!! Siddharth Tiwari Have a refreshing day !!! Every duty is holy, and devotion to duty is the highest form of worship of God.” Maybe other people will try to limit me but I don't limit myself -- André Kelpe an...@concurrentinc.com http://concurrentinc.com
Cannot start resourcemanager
Hi Team, trying to start resourcemanager in the latest hadoop-2.2.0 stable release. It throws the following error. Please help 2013-10-17 10:01:51,230 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping ResourceManager metrics system... 2013-10-17 10:01:51,230 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics system stopped. 2013-10-17 10:01:51,231 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics system shutdown complete. 2013-10-17 10:01:51,232 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.yarn.server.resourcemanager.resource.DefaultResourceCalculator not found at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1752) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration.getResourceCalculator(CapacitySchedulerConfiguration.java:333) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:263) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:249) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:871) Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.yarn.server.resourcemanager.resource.DefaultResourceCalculator not found at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1720) at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1744) ... 5 more Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.yarn.server.resourcemanager.resource.DefaultResourceCalculator not found at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1626) at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1718) ... 6 more 2013-10-17 10:01:51,239 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: SHUTDOWN_MSG: / SHUTDOWN_MSG: Shutting down ResourceManager at node1/192.168.147.101 ** Cheers !!! Siddharth Tiwari Have a refreshing day !!! Every duty is holy, and devotion to duty is the highest form of worship of God.” Maybe other people will try to limit me but I don't limit myself
Removing queues
What's the easiest way to remove queues from Hadoop without restarting services ? Why can't we just use refreshQueues ? Sent from my iPhone
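For the CapacityScheduler in this era, yarn rmadmin -refreshQueues can add queues and change capacities on a running ResourceManager, but deleting a queue outright still requires an RM restart; the usual pattern is to stop the queue so it drains, then remove it at the next restart. A sketch, assuming a hypothetical queue named reports:

  # In capacity-scheduler.xml, mark the queue stopped so it accepts no new apps:
  #   yarn.scheduler.capacity.root.reports.state = STOPPED

  # Push the change to the running ResourceManager:
  yarn rmadmin -refreshQueues

  # Once applications in the queue have drained, delete the queue's properties
  # from capacity-scheduler.xml and restart the RM for the removal to take effect.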
size of input files
Hi Friends, Is there a way to find out the size of the input to each job, from the logs or any other place, for all jobs submitted ? Please help ** Cheers !!! Siddharth Tiwari Have a refreshing day !!! Every duty is holy, and devotion to duty is the highest form of worship of God.” Maybe other people will try to limit me but I don't limit myself
RE: size of input files
Do the counters provide the input file size ? I mean, is bytes read equal to the input file size ? Is there any log where I could find the input file size submitted to each job ? I believe that bytes read from the FS can differ from the input file size of the job. ** Cheers !!! Siddharth Tiwari Have a refreshing day !!! Every duty is holy, and devotion to duty is the highest form of worship of God.” Maybe other people will try to limit me but I don't limit myself From: rahul.rec@gmail.com Date: Sun, 2 Jun 2013 23:26:08 +0530 Subject: Re: size of input files To: user@hadoop.apache.org Counters can help. Input to MR is a directory. The counters can point to the number of bytes read from that FS directory. Rahul On Sun, Jun 2, 2013 at 11:22 PM, Siddharth Tiwari siddharth.tiw...@live.com wrote: Hi Friends, Is there a way to find out what was the size of the input file to each of the jobs from the logs or any other place for all jobs submitted ? Please help ** Cheers !!! Siddharth Tiwari Have a refreshing day !!! Every duty is holy, and devotion to duty is the highest form of worship of God.” Maybe other people will try to limit me but I don't limit myself
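A sketch of pulling the relevant counters for a finished job from the command line; the job id is a placeholder, and the counter group name differs between Hadoop 1 (FileSystemCounters) and Hadoop 2 (org.apache.hadoop.mapreduce.FileSystemCounter), so adjust to whatever hadoop job -status prints for your version. HDFS_BYTES_READ reflects what the tasks actually read, which can differ from the submitted input size (failed or speculative attempts re-read data, and compressed input is counted as stored bytes), so the definitive answer is simply the size of the input path at submission time.

  # Bytes the job read from HDFS (Hadoop 2 group name shown; Hadoop 1 uses FileSystemCounters):
  hadoop job -counter job_201306021122_0042 org.apache.hadoop.mapreduce.FileSystemCounter HDFS_BYTES_READ

  # The size of the input directory itself (-dus on older releases):
  hadoop fs -du -s /user/siddharth/input          # hypothetical input path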
Data migration from one cluster to other running diff. versions
Hi Team, What is the best way to migrate data residing on one cluster to another cluster ? Are there better methods available than distcp ? What if the two clusters are running different RPC protocol versions ? ** Cheers !!! Siddharth Tiwari Have a refreshing day !!! Every duty is holy, and devotion to duty is the highest form of worship of God.” Maybe other people will try to limit me but I don't limit myself
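distcp remains the standard tool here; the wrinkle with mismatched RPC versions is that the read side has to go over HTTP instead of the native RPC. A sketch with hypothetical NameNode hostnames and default ports; the copy is run on the destination (typically the newer) cluster so that the write side uses its own RPC.

  # Same or wire-compatible versions -- plain distcp over hdfs://
  hadoop distcp hdfs://src-nn:8020/data/logs hdfs://dst-nn:8020/data/logs

  # Different RPC versions -- read the source through its read-only HTTP interface.
  # Run on the destination cluster; 50070 is the source NameNode HTTP port.
  hadoop distcp hftp://src-nn:50070/data/logs hdfs://dst-nn:8020/data/logs

  # If WebHDFS is enabled on both sides, webhdfs:// also works across versions:
  hadoop distcp webhdfs://src-nn:50070/data/logs hdfs://dst-nn:8020/data/logs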
RE: One petabyte of data loading into HDFS with in 10 min.
Well, can't you load only the incremental data ? The goal seems quite unrealistic. The big guns have already spoken :P ** Cheers !!! Siddharth Tiwari Have a refreshing day !!! Every duty is holy, and devotion to duty is the highest form of worship of God.” Maybe other people will try to limit me but I don't limit myself From: alex.gauth...@teradata.com To: user@hadoop.apache.org; mike.se...@thinkbiganalytics.com Subject: RE: One petabyte of data loading into HDFS with in 10 min. Date: Mon, 10 Sep 2012 16:17:20 + Well said Mike. Lots of “funny questions” around here lately… From: Michael Segel [mailto:michael_se...@hotmail.com] Sent: Monday, September 10, 2012 4:50 AM To: user@hadoop.apache.org Cc: Michael Segel Subject: Re: One petabyte of data loading into HDFS with in 10 min. On Sep 10, 2012, at 2:40 AM, prabhu K prabhu.had...@gmail.com wrote: Hi Users, Thanks for the response. We have loaded 100GB of data into HDFS; time taken 1hr, with the below configuration. Each Node (1 machine is master, 2 machines are slaves) 1. 500 GB hard disk. 2. 4 GB RAM 3. 3 quad-core CPUs. 4. Speed 1333 MHz Now, we are planning to load 1 petabyte of data (single file) into Hadoop HDFS and a Hive table within 10-20 minutes. For this we need clarification on the below. Ok... Some say that I am sometimes too harsh in my criticisms so take what I say with a grain of salt... You loaded 100GB in an hour using woefully underperforming hardware and are now saying you want to load 1PB in 10 mins. I would strongly suggest that you first learn more about Hadoop. No really. Looking at your first machine, it's obvious that you don't really grok Hadoop and what it requires to achieve optimum performance. You couldn't even extrapolate any meaningful data from your current environment. Secondly, I think you need to actually think about the problem. Did you mean PB or TB? Because your math seems to be off by a couple of orders of magnitude. A single file measured in PBs? That is currently impossible using today's (2012) technology. In fact a single file that is measured in PBs wouldn't exist within the next 5 years and most likely the next decade. [Moore's law is all about CPU power, not disk density.] Also take a look at networking. ToR switch design differs, however with current technology the fabric tends to max out at 40GBs. What's the widest fabric on a backplane? That's your first bottleneck, because even if you had 1PB of data, you couldn't feed it to the cluster fast enough. Forget disk; look at PCIe-based memory. (Money no object, right? ) You still couldn't populate it fast enough. I guess Steve hit this nail on the head when he talked about this being a homework assignment. High school maybe? 1. What system configuration setup is required for all the 3 machines ? 2. Hard disk size. 3. RAM size. 4. Motherboard 5. Network cable 6. How much Gbps InfiniBand is required ? For the same setup do we need a cloud computing environment too? Please suggest and help me on this. Thanks, Prabhu. On Fri, Sep 7, 2012 at 7:30 PM, Michael Segel michael_se...@hotmail.com wrote: Sorry, but you didn't account for the network saturation. And why 1GbE and not 10GbE? Also, which version of Hadoop? Here MapR works well with bonding two 10GbE ports, and with the right switch you could do ok. Also 2 ToR switches... per rack. etc... How many machines? 150? 300? more? Then you don't talk about how much memory, CPUs, what type of storage... Lots of factors. I'm sorry to interrupt this mental masturbation about how to load 1PB in 10min.
There are a lot more questions that should be asked that weren't. Hey, but look. It's a Friday, so I suggest some pizza, beer and then take it to a white board. But what do I know? In a different thread, I'm talking about how to tame HR and Accounting so they let me play with my team Ninja! :-P On Sep 5, 2012, at 9:56 AM, zGreenfelder zgreenfel...@gmail.com wrote: On Wed, Sep 5, 2012 at 10:43 AM, Cosmin Lehene cleh...@adobe.com wrote: Here's an extremely naïve ballpark estimation: at theoretical hardware speed, for 3PB representing 1PB with 3x replication, over a single 1Gbps connection (and I'm not sure you can actually reach 1Gbps): (3 petabytes) / (1 Gbps) = 291.27 days So you'd need at least 40,000 1Gbps network cards to get that in 10 minutes :) - (3PB/1Gbps)/4 The actual number of nodes would depend a lot on the actual network architecture, the type of storage you use (SSD, HDD), etc. Cosmin ah, I went the other direction with the math, and assumed no replication (completely unsafe and never reasonable for a real, production environment, but since we're all theory and just looking for starting point numbers) 1PB in 10 min == 1,000,000 GB in 10 min
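For what it's worth, the 291.27-day figure above checks out if you read 3 PB as 3 x 2^50 bytes and 1 Gbps as 2^30 bits per second; a quick back-of-the-envelope check in the shell (assumes bc is installed):

  # days = (3 * 2^50 bytes * 8 bits/byte) / (2^30 bits/s) / 86400 s/day
  echo 'scale=2; (3 * 2^50 * 8) / 2^30 / 86400' | bc     # -> 291.27

  # and the "how many 1 Gbps links to do it in 10 minutes" version:
  echo '(3 * 2^50 * 8) / 2^30 / 600' | bc                # -> 41943, i.e. roughly 40,000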
Record separator
Hi list, Out of context, has anyone encountered a record separator delimiter problem ? I have a log file in which each record is separated using the RECORD SEPARATOR delimiter ( ^^ ); can anyone help me with how I can use it as a delimiter ? Thanks ** Cheers !!! Siddharth Tiwari Have a refreshing day !!! Every duty is holy, and devotion to duty is the highest form of worship of God.” Maybe other people will try to limit me but I don't limit myself
RE: Record separator
Hi Harsh, I am using CDH3U4. The records are separated by the following ASCII character: ^^ (decimal 30, hex 1E, RS ␞, Record Separator). I did not understand what you intend me to do so that I can use this ? Thanks ** Cheers !!! Siddharth Tiwari Have a refreshing day !!! Every duty is holy, and devotion to duty is the highest form of worship of God.” Maybe other people will try to limit me but I don't limit myself From: ha...@cloudera.com Date: Tue, 11 Sep 2012 01:11:16 +0530 Subject: Re: Record separator To: user@hadoop.apache.org What version of Hadoop are you using? Via https://issues.apache.org/jira/browse/MAPREDUCE-2254 you can simply set your custom delimiter string as a configuration option, and get this to work right out of the box. On Tue, Sep 11, 2012 at 12:59 AM, Siddharth Tiwari siddharth.tiw...@live.com wrote: Hi list, Out of context, has anyone encountered a record separator delimiter problem ? I have a log file in which each record is separated using the RECORD SEPARATOR delimiter ( ^^ ); can anyone help me with how I can use it as a delimiter ? Thanks ** Cheers !!! Siddharth Tiwari Have a refreshing day !!! Every duty is holy, and devotion to duty is the highest form of worship of God.” Maybe other people will try to limit me but I don't limit myself -- Harsh J
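Since CDH3u4 predates MAPREDUCE-2254, the textinputformat.record.delimiter option mentioned above is generally not available there, and the least invasive workaround is to normalise the delimiter before (or while) loading the data, so the stock TextInputFormat sees ordinary newline-terminated records. A sketch with placeholder file names; 036 is the octal code for the ASCII RS character (0x1E):

  # Confirm the records really are RS-delimited (\036 should appear between records):
  head -c 400 rs-delimited.log | od -c | head

  # Translate RS to newlines, then load the cleaned file into HDFS:
  tr '\036' '\n' < rs-delimited.log > newline-delimited.log
  hadoop fs -put newline-delimited.log /user/siddharth/logs/

On releases that do carry MAPREDUCE-2254 (0.23/2.x and some backports), setting textinputformat.record.delimiter in the job configuration avoids the preprocessing step entirely.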
Hadoop and MainFrame integration
Hi Users, We have flat files on mainframes with around a billion records. We need to sort them and then use them with different jobs on the mainframe for report generation. I was wondering whether there was any way I could integrate the mainframe with Hadoop, do the sorting, and keep the file on the server itself ( I do not want to FTP the file to a Hadoop cluster and then FTP the sorted file back to the mainframe, as it would waste MIPS and nullify the advantage ). This way I could save on MIPS and ultimately improve profitability. Thank you in advance ** Cheers !!! Siddharth Tiwari Have a refreshing day !!! Every duty is holy, and devotion to duty is the highest form of worship of God.” Maybe other people will try to limit me but I don't limit myself
RE: Reading multiple lines from a microsoft doc in hadoop
Can anybody enlighten me on what could be wrong ? ** Cheers !!! Siddharth Tiwari Have a refreshing day !!! Every duty is holy, and devotion to duty is the highest form of worship of God.” Maybe other people will try to limit me but I don't limit myself From: siddharth.tiw...@live.com To: user@hadoop.apache.org; bejoy.had...@gmail.com; bejoy...@yahoo.com Subject: RE: Reading multiple lines from a microsoft doc in hadoop Date: Sat, 25 Aug 2012 05:35:48 + Any help on the below would be really appreciated. I am stuck with it ** Cheers !!! Siddharth Tiwari Have a refreshing day !!! Every duty is holy, and devotion to duty is the highest form of worship of God.” Maybe other people will try to limit me but I don't limit myself From: siddharth.tiw...@live.com To: user@hadoop.apache.org; bejoy.had...@gmail.com; bejoy...@yahoo.com Subject: RE: Reading multiple lines from a microsoft doc in hadoop Date: Fri, 24 Aug 2012 20:23:45 + Hi , Can anyone please help ? Thank you in advance ** Cheers !!! Siddharth Tiwari Have a refreshing day !!! Every duty is holy, and devotion to duty is the highest form of worship of God.” Maybe other people will try to limit me but I don't limit myself From: siddharth.tiw...@live.com To: user@hadoop.apache.org; bejoy.had...@gmail.com; bejoy...@yahoo.com Subject: RE: Reading multiple lines from a microsoft doc in hadoop Date: Fri, 24 Aug 2012 16:22:57 + Hi Team, Thanks a lot for so many good suggestions. I wrote a custom input format for reading one paragraph at a time. But when I use it I still get individual lines read. Can you please suggest what changes I must make to read one paragraph at a time, separated by null lines ? Below is the code I wrote:-

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;
import org.apache.hadoop.util.LineReader;

/**
 * @author 460615
 */
// FileInputFormat is the base class for all file-based InputFormats
public class ParaInputFormat extends FileInputFormat<LongWritable, Text> {

    private String nullRegex = "^\\s*$";
    public String StrLine = null;

    /* public RecordReader<LongWritable, Text> getRecordReader(InputSplit genericSplit, JobConf job, Reporter reporter) throws IOException {
        reporter.setStatus(genericSplit.toString());
        return new ParaInputFormat(job, (FileSplit) genericSplit);
    } */

    public RecordReader<LongWritable, Text> createRecordReader(InputSplit genericSplit, TaskAttemptContext context) throws IOException {
        context.setStatus(genericSplit.toString());
        return new LineRecordReader();
    }

    public InputSplit[] getSplits(JobContext job, Configuration conf) throws IOException {
        ArrayList<FileSplit> splits = new ArrayList<FileSplit>();
        for (FileStatus status : listStatus(job)) {
            Path fileName = status.getPath();
            if (status.isDir()) {
                throw new IOException("Not a file: " + fileName);
            }
            FileSystem fs = fileName.getFileSystem(conf);
            LineReader lr = null;
            try {
                FSDataInputStream in = fs.open(fileName);
                lr = new LineReader(in, conf);
                // String regexMatch = in.readLine();
                Text line = new Text();
                long begin = 0;
                long length = 0;
                int num = -1;
                String boolTest = null;
                boolean match = false;
                Pattern p = Pattern.compile(nullRegex);
                // Matcher matcher = new p.matcher();
                while ((boolTest = in.readLine()) != null && (num = lr.readLine(line)) > 0 && !(in.readLine().isEmpty())) {
                    // numLines++;
                    length += num;
                    splits.add(new FileSplit(fileName, begin, length, new String[]{}));
                }
                begin = length;
            } finally {
                if (lr != null) {
                    lr.close();
                }
            }
        }
        return splits.toArray(new FileSplit[splits.size()]);
    }
}

** Cheers !!! Siddharth Tiwari Have a refreshing day !!! Every duty is holy, and devotion to duty is the highest form of worship of God.” Maybe other people will try to limit me but I don't limit myself Date: Fri, 24 Aug 2012 09:54:10 +0200 Subject: Re: Reading multiple lines from a microsoft doc in hadoop From: haavard.kongsga...@gmail.com To: user@hadoop.apache.org Hi, maybe you should check out the old nutch project http
RE: Reading multiple lines from a microsoft doc in hadoop
Hi, Thank you for the suggestion. Actually I was using POI to extract text, but since I now have so many documents I thought I would use Hadoop directly to parse as well. The average size of each document is around 120 KB. Also, I want to read multiple lines from the text until I find a blank line. I do not have any idea, Ankit, how to design a custom input format and record reader. Please help with some tutorial, code or resource around it. I am struggling with the issue. I will be highly grateful. Thank you so much once again Date: Fri, 24 Aug 2012 08:07:39 +0200 Subject: Re: Reading multiple lines from a microsoft doc in hadoop From: haavard.kongsga...@gmail.com To: user@hadoop.apache.org It's much easier if you convert the documents to text first; use http://tika.apache.org/ or some other doc parser -Håvard On Fri, Aug 24, 2012 at 7:52 AM, Siddharth Tiwari siddharth.tiw...@live.com wrote: hi, I have doc files in MS Word doc and docx format. These have entries which are separated by an empty line. Is it possible for me to read the lines between empty lines as one record at a time ? Also, which input format shall I use to read doc/docx ? Please help ** Cheers !!! Siddharth Tiwari Have a refreshing day !!! Every duty is holy, and devotion to duty is the highest form of worship of God.” Maybe other people will try to limit me but I don't limit myself -- Håvard Wahl Kongsgård Faculty of Medicine Department of Mathematical Sciences NTNU http://havard.security-review.net/
RE: namenode not starting
Hi Abhay, I totally concur with Bejoy. Can you paste your mapred-site.xml and hdfs-site.xml content here ? ** Cheers !!! Siddharth Tiwari Have a refreshing day !!! Every duty is holy, and devotion to duty is the highest form of worship of God.” Maybe other people will try to limit me but I don't limit myself From: lle...@ddn.com To: user@hadoop.apache.org Subject: RE: namenode not starting Date: Fri, 24 Aug 2012 16:38:01 + Abhay, Sounds like your namenode cannot find the metadata information it needs to start (the path/current | image | *checkpoints etc). Basically, if you cannot locate that data locally or on your NFS Server, your cluster is busted. But, let us be optimistic about this. There is a chance that your NFS Server is down or the mounted path is lost. If it is NFS mounted (as you suggested), check that your host still has that path mounted (from the proper NFS Server); ( [shell] mount ) can tell. * Obviously, if you originally mounted from foo:/mydata and now do bar:/mydata, you'll need to do some digging to find which NFS server it was writing to before. If you fail to locate your namenode metadata (locally or on any of your NFS Servers), either because the NFS Server decided to become a black hole or because someone or something removed it, and you don't have a backup of your namenode (tape or Secondary Namenode), I think you are in a world of hurt there. In theory you can read the blocks on the DN and try to recover some of your data (assuming it is not in a CODEC / compressed). Humm.. anyone know about recovery services? (^^) -Original Message- From: Håvard Wahl Kongsgård [mailto:haavard.kongsga...@gmail.com] Sent: Friday, August 24, 2012 5:38 AM To: user@hadoop.apache.org Subject: Re: namenode not starting You should start with a reboot of the system. A lesson to everyone: this is exactly why you should have a secondary name node (http://wiki.apache.org/hadoop/FAQ#What_is_the_purpose_of_the_secondary_name-node.3F) and run the namenode on a mirrored RAID-5/10 disk. -Håvard On Fri, Aug 24, 2012 at 9:40 AM, Abhay Ratnaparkhi abhay.ratnapar...@gmail.com wrote: Hello, I was using the cluster for a long time and have not formatted the namenode. I ran the bin/stop-all.sh and bin/start-all.sh scripts only. I am using NFS for dfs.name.dir. hadoop.tmp.dir is a /tmp directory. I've not restarted the OS. Any way to recover the data? Thanks, Abhay On Fri, Aug 24, 2012 at 1:01 PM, Bejoy KS bejoy.had...@gmail.com wrote: Hi Abhay What is the value for hadoop.tmp.dir or dfs.name.dir ? If it was set to /tmp the contents would be deleted on an OS restart. You need to change this location before you start your NN. Regards Bejoy KS Sent from handheld, please excuse typos. From: Abhay Ratnaparkhi abhay.ratnapar...@gmail.com Date: Fri, 24 Aug 2012 12:58:41 +0530 To: user@hadoop.apache.org ReplyTo: user@hadoop.apache.org Subject: namenode not starting Hello, I had a running hadoop cluster. I restarted it and after that the namenode is unable to start. I am getting an error saying that it's not formatted. :( Is it possible to recover the data on HDFS? 2012-08-24 03:17:55,378 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed. java.io.IOException: NameNode is not formatted.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:434) at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:110) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:291) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.init(FSNamesystem.java:270) at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:271) at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:303) at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:433) at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:421) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1359) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:13 68) 2012-08-24 03:17:55,380 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.io.IOException: NameNode is not formatted. at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:434) at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:110) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:291) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.init(FSNamesystem.java:270
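A quick triage sketch along the lines of the advice above, checking whether the NFS-backed name directory is still mounted and still holds NameNode metadata; the mount point and conf path are placeholders for whatever dfs.name.dir actually points at on this cluster:

  # What does the config say the name directory is?
  grep -A1 'dfs.name.dir' $HADOOP_CONF_DIR/hdfs-site.xml

  # Is the NFS export still mounted, and from the expected server?
  mount | grep nfs

  # Does the name directory still contain metadata? A healthy Hadoop 1.x layout
  # has current/VERSION, current/fsimage and current/edits underneath it.
  ls -l /mnt/namenode-nfs/current          # placeholder mount point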
RE: How do we view the blocks of a file in HDFS
Hi Abhishek, You can use fsck for this purpose: hadoop fsck <HDFS directory> -files -blocks -locations --- displays what you want ** Cheers !!! Siddharth Tiwari Have a refreshing day !!! Every duty is holy, and devotion to duty is the highest form of worship of God.” Maybe other people will try to limit me but I don't limit myself From: abhisheksgum...@gmail.com Date: Fri, 24 Aug 2012 22:10:37 +0530 Subject: How do we view the blocks of a file in HDFS To: user@hadoop.apache.org hi, If I push a file into HDFS that runs on a 4 node cluster with 1 namenode and 3 datanodes, how can I view where on the datanodes the blocks of this file are? I would like to view the blocks and their replicas individually. How can I do this? The answer is very critical for my current task, which is halted :) A detailed answer will be highly appreciated. Thank you! With Regards, Abhishek S
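A concrete invocation of the command suggested above, with a hypothetical file path; each block line in the output ends with the datanode addresses holding a replica, roughly as sketched in the comment:

  hadoop fsck /user/abhishek/data/input.txt -files -blocks -locations
  # per-block output looks roughly like:
  #   0. blk_<id>_<genstamp> len=67108864 repl=3 [dn1:50010, dn2:50010, dn3:50010]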
RE: Reading multiple lines from a microsoft doc in hadoop
Any help on the below would be really appreciated. I am stuck with it ** Cheers !!! Siddharth Tiwari Have a refreshing day !!! Every duty is holy, and devotion to duty is the highest form of worship of God.” Maybe other people will try to limit me but I don't limit myself From: siddharth.tiw...@live.com To: user@hadoop.apache.org; bejoy.had...@gmail.com; bejoy...@yahoo.com Subject: RE: Reading multiple lines from a microsoft doc in hadoop Date: Fri, 24 Aug 2012 20:23:45 + Hi , Can anyone please help ? Thank you in advance ** Cheers !!! Siddharth Tiwari Have a refreshing day !!! Every duty is holy, and devotion to duty is the highest form of worship of God.” Maybe other people will try to limit me but I don't limit myself From: siddharth.tiw...@live.com To: user@hadoop.apache.org; bejoy.had...@gmail.com; bejoy...@yahoo.com Subject: RE: Reading multiple lines from a microsoft doc in hadoop Date: Fri, 24 Aug 2012 16:22:57 + Hi Team, Thanks a lot for so many good suggestions. I wrote a custom input format for reading one paragraph at a time. But when I use it I still get individual lines read. Can you please suggest what changes I must make to read one paragraph at a time, separated by null lines ? Below is the code I wrote:-

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;
import org.apache.hadoop.util.LineReader;

/**
 * @author 460615
 */
// FileInputFormat is the base class for all file-based InputFormats
public class ParaInputFormat extends FileInputFormat<LongWritable, Text> {

    private String nullRegex = "^\\s*$";
    public String StrLine = null;

    /* public RecordReader<LongWritable, Text> getRecordReader(InputSplit genericSplit, JobConf job, Reporter reporter) throws IOException {
        reporter.setStatus(genericSplit.toString());
        return new ParaInputFormat(job, (FileSplit) genericSplit);
    } */

    public RecordReader<LongWritable, Text> createRecordReader(InputSplit genericSplit, TaskAttemptContext context) throws IOException {
        context.setStatus(genericSplit.toString());
        return new LineRecordReader();
    }

    public InputSplit[] getSplits(JobContext job, Configuration conf) throws IOException {
        ArrayList<FileSplit> splits = new ArrayList<FileSplit>();
        for (FileStatus status : listStatus(job)) {
            Path fileName = status.getPath();
            if (status.isDir()) {
                throw new IOException("Not a file: " + fileName);
            }
            FileSystem fs = fileName.getFileSystem(conf);
            LineReader lr = null;
            try {
                FSDataInputStream in = fs.open(fileName);
                lr = new LineReader(in, conf);
                // String regexMatch = in.readLine();
                Text line = new Text();
                long begin = 0;
                long length = 0;
                int num = -1;
                String boolTest = null;
                boolean match = false;
                Pattern p = Pattern.compile(nullRegex);
                // Matcher matcher = new p.matcher();
                while ((boolTest = in.readLine()) != null && (num = lr.readLine(line)) > 0 && !(in.readLine().isEmpty())) {
                    // numLines++;
                    length += num;
                    splits.add(new FileSplit(fileName, begin, length, new String[]{}));
                }
                begin = length;
            } finally {
                if (lr != null) {
                    lr.close();
                }
            }
        }
        return splits.toArray(new FileSplit[splits.size()]);
    }
}

** Cheers !!! Siddharth Tiwari Have a refreshing day !!! Every duty is holy, and devotion to duty is the highest form of worship of God.” Maybe other people will try to limit me but I don't limit myself Date: Fri, 24 Aug 2012 09:54:10 +0200 Subject: Re: Reading multiple lines from a microsoft doc in hadoop From: haavard.kongsga...@gmail.com To: user@hadoop.apache.org Hi, maybe you should check out the old nutch project http://nutch.apache.org/ (hadoop was developed for nutch). It's a web crawler and indexer, but the mailing lists hold much info on doc/pdf parsing which also relates to hadoop. Have never parsed many docx or doc files, but it should be straightforward. But generally for text analysis preprocessing is the KEY! For example, replacing dual lines \r\n\r\n or (\n\n) is a simple trick. -Håvard On Fri, Aug 24, 2012 at 9:30 AM, Siddharth Tiwari siddharth.tiw...@live.com wrote: Hi, Thank you
Reading multiple lines from a microsoft doc in hadoop
hi, I have doc files in MS Word doc and docx format. These have entries which are separated by an empty line. Is it possible for me to read the lines between empty lines as one record at a time ? Also, which input format shall I use to read doc/docx ? Please help ** Cheers !!! Siddharth Tiwari Have a refreshing day !!! Every duty is holy, and devotion to duty is the highest form of worship of God.” Maybe other people will try to limit me but I don't limit myself
Streaming issue ( URGENT )
Hi team, I have a Python script which normally runs like this locally: python mapper.py file1 file2 2 . How can I achieve this using the streaming API, with the script as the mapper ? It actually joins the three files on a column which is passed as a parameter ( numeric ) . Also, how can I use the paste command in the mapper to concatenate files ? Ex: paste file1 file2 file3 file4 . This is in a normal shell; how do I achieve it over streaming ? If possible, please explain how I can achieve it using multiple mappers and one reducer. It would be great if I could get some examples; I tried searching a lot :( Thanks in advance, please help ** Cheers !!! Siddharth Tiwari Have a refreshing day !!! Every duty is holy, and devotion to duty is the highest form of worship of God.” Maybe other people will try to limit me but I don't limit myself
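A sketch of one common way to run such a script under the streaming API; the jar location, paths, file names and the column value are placeholders, and the exact jar path varies by version and distribution (contrib/streaming on Hadoop 1.x, share/hadoop/tools/lib on 2.x). Streaming mappers read their input from stdin rather than from argv, so the script usually needs a small change: side files shipped with -files are localized into the task's working directory, the large input arrives on stdin, and extra parameters can be passed via environment variables with -cmdenv.

  # Ship mapper.py plus the smaller side files to every task, stream the big file
  # as the job input, and pass the join column through the environment.
  hadoop jar /path/to/hadoop-streaming.jar \
      -files mapper.py,file2,file3 \
      -cmdenv JOIN_COLUMN=2 \
      -input  /user/siddharth/file1 \
      -output /user/siddharth/joined \
      -mapper "python mapper.py" \
      -reducer cat \
      -numReduceTasks 1

  # Inside mapper.py: read records from sys.stdin, open ./file2 and ./file3 directly
  # (they sit next to the script in the task working directory), and join on JOIN_COLUMN.

This pattern only works cleanly when file2 and file3 are small enough to ship to every task; joining three genuinely large files is normally done as a reduce-side join instead, and a paste-style concatenation is better done before loading, or in a single-reducer job, rather than inside individual mappers.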
RE: hi facing issue with mysql in
The description is quite vague. ** Cheers !!! Siddharth Tiwari Have a refreshing day !!! Every duty is holy, and devotion to duty is the highest form of worship of God.” Maybe other people will try to limit me but I don't limit myself From: rahul.rec@gmail.com Date: Mon, 20 Aug 2012 22:00:19 +0530 Subject: Re: hi facing issue with mysql in To: user@hadoop.apache.org This list, I believe, is for Hadoop users. On Mon, Aug 20, 2012 at 9:58 PM, rahul p rahulpoolancha...@gmail.com wrote: Hi, Please help set up my MySQL. It is giving a permission issue.
RE: Streaming Issue
Hi Mohit, The script normally runs like this locally: python mapper.py file1 file2 file3. How can I achieve this ? Also, how can I use the paste command in the mapper ? Ex: paste file1 file2 file3 file4. This is in a normal shell; how do I achieve it over streaming ? Thanks in advance, please help Date: Sun, 19 Aug 2012 13:42:20 -0700 Subject: Re: FW: Streaming Issue From: mohitanch...@gmail.com To: user@hadoop.apache.org Are you looking for something like this? hadoop jar hadoop-streaming.jar -input 'file1 -input file2 On Sun, Aug 19, 2012 at 11:16 AM, Siddharth Tiwari siddharth.tiw...@live.com wrote: Hi Friends, Can you please suggest how I can pass 3 files as parameters to a mapper written in Python in the Hadoop streaming API, which will process data from these three different files ? Please help. ** Cheers !!! Siddharth Tiwari Have a refreshing day !!! Every duty is holy, and devotion to duty is the highest form of worship of God.” Maybe other people will try to limit me but I don't limit myself