Using Lookup file in mapreduce

2014-05-13 Thread Siddharth Tiwari
Hi team

I have a huge lookup file, around 5 GB, and I need to use it to map users to 
categories in my MapReduce job. Can you suggest the best way to achieve this?

Sent from my iPhone

R on hadoop

2014-03-24 Thread Siddharth Tiwari
Hi team, is there any documentation around installing R on Hadoop?

Sent from my iPhone


RE: Huge disk IO on only one disk

2014-03-03 Thread Siddharth Tiwari
Thanks Brahma,
That answers my question.

**

Cheers !!!

Siddharth Tiwari

Have a refreshing day !!!
Every duty is holy, and devotion to duty is the highest form of worship of 
God.” 

Maybe other people will try to limit me but I don't limit myself


From: brahmareddy.batt...@huawei.com
To: user@hadoop.apache.org
Subject: RE: Huge disk IO on only one disk
Date: Mon, 3 Mar 2014 06:51:30 +








 

 

What should be the standard around setting up the hadoop.tmp.dir parameter.

As far as I know, hadoop.tmp.dir will be used for the following properties. If you 
are configuring the following properties explicitly, then you do not need to 
configure hadoop.tmp.dir itself.

MapReduce:

mapreduce.cluster.local.dir (default: ${hadoop.tmp.dir}/mapred/local)
The local directory where MapReduce stores intermediate data files. May be a 
comma-separated list of directories on different devices in order to spread 
disk I/O. Directories that do not exist are ignored.

mapreduce.jobtracker.system.dir (default: ${hadoop.tmp.dir}/mapred/system)
The directory where MapReduce stores control files.

mapreduce.jobtracker.staging.root.dir (default: ${hadoop.tmp.dir}/mapred/staging)
The root of the staging area for users' job files. In practice, this should be 
the directory where users' home directories are located (usually /user).

mapreduce.cluster.temp.dir (default: ${hadoop.tmp.dir}/mapred/temp)
A shared directory for temporary files.

Yarn:

yarn.nodemanager.local-dirs (default: ${hadoop.tmp.dir}/nm-local-dir)
List of directories to store localized files in. An application's localized 
file directory will be found in: 
${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}. 
Individual containers' work directories, called container_${contid}, will be 
subdirectories of this.

HDFS:

dfs.namenode.name.dir (default: file://${hadoop.tmp.dir}/dfs/name)
Determines where on the local filesystem the DFS name node should store the 
name table (fsimage). If this is a comma-delimited list of directories then the 
name table is replicated in all of the directories, for redundancy.

dfs.datanode.data.dir (default: file://${hadoop.tmp.dir}/dfs/data)
Determines where on the local filesystem a DFS data node should store its 
blocks. If this is a comma-delimited list of directories, then data will be 
stored in all named directories, typically on different devices. Directories 
that do not exist are ignored.

dfs.namenode.checkpoint.dir (default: file://${hadoop.tmp.dir}/dfs/namesecondary)
Determines where on the local filesystem the DFS secondary name node should 
store the temporary images to merge. If this is a comma-delimited list of 
directories then the image is replicated in all of the directories for 
redundancy.
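
For illustration, here is a minimal sketch of how these directories are typically 
spread across multiple disks instead of inheriting hadoop.tmp.dir; the /data1, 
/data2, ... mount points are placeholders, not paths from this thread:

<!-- hdfs-site.xml (placeholder paths) -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///data1/dfs/dn,file:///data2/dfs/dn,file:///data3/dfs/dn</value>
</property>

<!-- yarn-site.xml (placeholder paths) -->
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/data1/yarn/local,/data2/yarn/local,/data3/yarn/local</value>
</property>

<!-- mapred-site.xml, MR1 (placeholder paths) -->
<property>
  <name>mapreduce.cluster.local.dir</name>
  <value>/data1/mapred/local,/data2/mapred/local,/data3/mapred/local</value>
</property>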







 

 

 

 

Thanks & Regards

 

Brahma Reddy Battula

 





From: Siddharth Tiwari [siddharth.tiw...@live.com]

Sent: Monday, March 03, 2014 11:20 AM

To: USers Hadoop

Subject: RE: Huge disk IO on only one disk






Hi Brahma,



No I haven't; I have put a comma-separated list of disks in dfs.datanode.data.dir, 
and have put disk5 for hadoop.tmp.dir. My question is: should we set up 
hadoop.tmp.dir or not? If yes, what should the standards around it be?




**

Cheers !!!

Siddharth
Tiwari

Have a refreshing day !!!

Every duty is holy, and devotion to duty is the highest form of worship of 
God.”


Maybe other people will try to limit me but I don't limit myself







From: brahmareddy.batt...@huawei.com

To: user@hadoop.apache.org

Subject: RE: Huge disk IO on only one disk

Date: Mon, 3 Mar 2014 05:14:34 +







 

It seems you have started the cluster with default values for the following two 
properties and configured only hadoop.tmp.dir.

dfs.datanode.data.dir --- file://${hadoop.tmp.dir}/dfs/data (default value)

Determines where on the local filesystem a DFS data node should store its 
blocks. If this is a comma-delimited list of directories, then data will be 
stored in all named directories, typically on different devices.

yarn.nodemanager.local-dirs --- ${hadoop.tmp.dir}/nm-local-dir (default value)

Used to store localized files; these are like intermediate files.

Please configure the above two values as multiple directories.



 

 

Thanks & Regards

Brahma Reddy Battula

 





From: Siddharth Tiwari [siddharth.tiw...@live.com]

Sent: Monday, March 03, 2014 5:58 AM

To: USers Hadoop

Subject: Huge disk IO on only one disk






Hi Team,



I have 10 disks over which I am running my HDFS. Out of these, disk5 is where my 
hadoop.tmp.dir is configured. I see that this disk has huge IO when I run my jobs 
compared to the other disks. Can you guide me to the standards to follow so that 
this IO can be distributed across the other disks as well? What should be the 
standard around setting up the hadoop.tmp.dir parameter? Any help would be highly 
appreciated. Below is the IO while I am running a huge job.








Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn

sda

FW: Query hangs at 99.97 % for one reducer in Hive

2014-03-02 Thread Siddharth Tiwari
Forwarding this message to the Hadoop list as well. Appreciate any help.

**

Cheers !!!

Siddharth Tiwari

Have a refreshing day !!!
Every duty is holy, and devotion to duty is the highest form of worship of 
God.” 

Maybe other people will try to limit me but I don't limit myself


From: siddharth.tiw...@live.com
To: u...@hive.apache.org
Subject: Query hangs at 99.97 % for one reducer in Hive
Date: Sun, 2 Mar 2014 23:09:25 +




Hi team,
The following query hangs at 99.97% for one reducer. Kindly help or point me to 
what the cause could be:
drop table if exists sample.dpi_short_lt;
create table sample.dpi_short_lt as
select
    b.msisdn,
    a.area_erb,
    a.longitude,
    a.latitude,
    substring(b.msisdn,1,2) as country,
    substring(b.msisdn,3,2) as area_code,
    substring(b.start_time,1,4) as year,
    substring(b.start_time,6,2) as month,
    substring(b.start_time,9,2) as day,
    substring(b.start_time,12,2) as hour,
    cast(b.procedure_duration as double) as duracao_ms,
    cast(b.internet_latency as double) as int_internet_latency,
    cast(b.ran_latency as double) as int_ran_latency,
    cast(b.http_latency as double) as int_http_latency,
    (case when b.internet_latency='' then 1 else 0 end) as internet_latency_missing,
    (case when b.ran_latency='' then 1 else 0 end) as ran_latency_missing,
    (case when b.http_latency='' then 1 else 0 end) as http_latency_missing,
    (cast(b.mean_throughput_ul as int) * cast(procedure_duration as int) / 1000) as total_up_bytes,
    (cast(b.mean_throughput_dl as int) * cast(procedure_duration as int) / 1000) as total_dl_bytes,
    cast(b.missing_packets_ul as int) as int_missing_packets_ul,
    cast(b.missing_packets_dl as int) as int_missing_packets_dl
from sample.dpi_large b
left outer join sample.science_new a
on b.cgi = regexp_replace(a.codigo_cgi_ecgi,'-','')
where msisdn != '';
Hive was heuristically selecting 1000 reducers and it was hanging at 99.97 
percent on one reduce task. I then changed the above values to 3GB per reducer 
and 500 reducers and started hitting this error.
java.lang.RuntimeException: Hive Runtime Error while closing operators: Unable 
to rename output from: 
hdfs://tlvcluster/tmp/hive-hadoop/hive_2014-03-01_03-14-36_812_8390586541316719852-1/_task_tmp.-ext-10001/_tmp.03_0
 to: 
hdfs://tlvcluster/tmp/hive-hadoop/hive_2014-03-01_03-14-36_812_8390586541316719852-1/_tmp.-ext-10001/03_0
at 
org.apache.hadoop.hive.ql.exec.ExecReducer.close(ExecReducer.java:313)
at 
org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:516)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename 
output from: hdfs://tlvcluster/tmp/hive-hadoop/hive_2014-03-01_03-14-36_812

I have a 22-node cluster running CDH 4.3. Please help me locate what the issue 
could be.
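
For reference, a minimal sketch of the session settings implied by "3GB per reducer 
and 500 reducers"; the property names are the standard Hive/MR1 knobs and the 
values below are only illustrative, not taken from the original job:

-- roughly 3 GB of input per reducer (illustrative value)
set hive.exec.reducers.bytes.per.reducer=3221225472;
-- or pin the reducer count explicitly (illustrative value)
set mapred.reduce.tasks=500;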

**

Cheers !!!

Siddharth Tiwari

Have a refreshing day !!!
Every duty is holy, and devotion to duty is the highest form of worship of 
God.” 

Maybe other people will try to limit me but I don't limit myself

  

Huge disk IO on only one disk

2014-03-02 Thread Siddharth Tiwari
Hi Team,
I have 10 disks over which I am running my HDFS. Out of these, disk5 is where my 
hadoop.tmp.dir is configured. I see that this disk has huge IO when I run my jobs 
compared to the other disks. Can you guide me to the standards to follow so that 
this IO can be distributed across the other disks as well? What should be the 
standard around setting up the hadoop.tmp.dir parameter? Any help would be highly 
appreciated. Below is the IO while I am running a huge job.

Device:            tps   Blk_read/s   Blk_wrtn/s    Blk_read     Blk_wrtn
sda               2.11        37.65       226.20   313512628   1883809216
sdb               1.47        96.44       152.48   803144582   1269829840
sdc               1.45        93.03       153.10   774765734   1274979080
sdd               1.46        95.06       152.73   791690022   1271944848
sde               1.47        92.70       153.24   772025750   1276195288
sdf               1.55        95.77       153.06   797567654   1274657320
sdg              10.10       364.26      1951.79  3033537062  16254346480
sdi               1.46        94.82       152.98   789646630   1274014936
sdh               1.44        94.09       152.57   783547390   1270598232
sdj               1.44        91.94       153.37   765678470   1277220208
sdk               1.52        97.01       153.02   807928678   1274300360

**

Cheers !!!

Siddharth Tiwari

Have a refreshing day !!!
Every duty is holy, and devotion to duty is the highest form of worship of 
God.” 

Maybe other people will try to limit me but I don't limit myself
  

Row exception in Hive

2014-03-02 Thread Siddharth Tiwari
Hi team,
What does the following error signify ?
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
Hive Runtime Error while processing row (tag=1) 
{key:{joinkey0:},value:{_col2:92,_col11:-60-01-21,00,_col12:-03-07-04,00},alias:1}
at 
org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:270)
at 
org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row (tag=1) 
{key:{joinkey0:},value:{_col2:92,_col11:-60-01-21,00,_col12:-03-07-04,00},alias:1}
at org.apache.hadoop.hive.ql.exec.ExecRedu

**

Cheers !!!

Siddharth Tiwari

Have a refreshing day !!!
Every duty is holy, and devotion to duty is the highest form of worship of 
God.” 

Maybe other people will try to limit me but I don't limit myself
  

RE: Huge disk IO on only one disk

2014-03-02 Thread Siddharth Tiwari
Hi Brahma,
No I haven't; I have put a comma-separated list of disks in dfs.datanode.data.dir, 
and have put disk5 for hadoop.tmp.dir. My question is: should we set up 
hadoop.tmp.dir or not? If yes, what should the standards around it be?

**

Cheers !!!

Siddharth Tiwari

Have a refreshing day !!!
Every duty is holy, and devotion to duty is the highest form of worship of 
God.” 

Maybe other people will try to limit me but I don't limit myself


From: brahmareddy.batt...@huawei.com
To: user@hadoop.apache.org
Subject: RE: Huge disk IO on only one disk
Date: Mon, 3 Mar 2014 05:14:34 +








 

 

It seems you have started the cluster with default values for the following two 
properties and configured only hadoop.tmp.dir.

dfs.datanode.data.dir --- file://${hadoop.tmp.dir}/dfs/data (default value)

Determines where on the local filesystem a DFS data node should store its 
blocks. If this is a comma-delimited list of directories, then data will be 
stored in all named directories, typically on different devices.

yarn.nodemanager.local-dirs --- ${hadoop.tmp.dir}/nm-local-dir (default value)

Used to store localized files; these are like intermediate files.

Please configure the above two values as multiple directories.


 

 

 


Thanks & Regards

Brahma Reddy Battula

 





From: Siddharth Tiwari [siddharth.tiw...@live.com]

Sent: Monday, March 03, 2014 5:58 AM

To: USers Hadoop

Subject: Huge disk IO on only one disk






Hi Team,



I have 10 disks over which I am running my HDFS. Out of these, disk5 is where my 
hadoop.tmp.dir is configured. I see that this disk has huge IO when I run my jobs 
compared to the other disks. Can you guide me to the standards to follow so that 
this IO can be distributed across the other disks as well? What should be the 
standard around setting up the hadoop.tmp.dir parameter? Any help would be highly 
appreciated. Below is the IO while I am running a huge job.








Device:            tps   Blk_read/s   Blk_wrtn/s    Blk_read     Blk_wrtn
sda               2.11        37.65       226.20   313512628   1883809216
sdb               1.47        96.44       152.48   803144582   1269829840
sdc               1.45        93.03       153.10   774765734   1274979080
sdd               1.46        95.06       152.73   791690022   1271944848
sde               1.47        92.70       153.24   772025750   1276195288
sdf               1.55        95.77       153.06   797567654   1274657320
sdg              10.10       364.26      1951.79  3033537062  16254346480
sdi               1.46        94.82       152.98   789646630   1274014936
sdh               1.44        94.09       152.57   783547390   1270598232
sdj               1.44        91.94       153.37   765678470   1277220208
sdk               1.52        97.01       153.02   807928678   1274300360




**

Cheers !!!

Siddharth
Tiwari

Have a refreshing day !!!

Every duty is holy, and devotion to duty is the highest form of worship of 
God.”


Maybe other people will try to limit me but I don't limit myself





  

Seeing strange error in Hive

2014-03-01 Thread Siddharth Tiwari
Hi Team,
I am seeing the following error in Hive in the reduce phase; can you guide me on 
its cause and a possible solution?
java.lang.RuntimeException: Hive Runtime Error while closing operators: Unable 
to rename output from: 
hdfs://tlvcluster/tmp/hive-hadoop/hive_2014-03-01_03-14-36_812_8390586541316719852-1/_task_tmp.-ext-10001/_tmp.03_0
 to: 
hdfs://tlvcluster/tmp/hive-hadoop/hive_2014-03-01_03-14-36_812_8390586541316719852-1/_tmp.-ext-10001/03_0
at 
org.apache.hadoop.hive.ql.exec.ExecReducer.close(ExecReducer.java:313)
at 
org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:516)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename 
output from: hdfs://tlvcluster/tmp/hive-hadoop/hive_2014-03-01_03-14-36_812
I am using hive-10.x and hadoop-2.0.0. Appreciate any help in understanding the 
issue.

**

Cheers !!!

Siddharth Tiwari

Have a refreshing day !!!
Every duty is holy, and devotion to duty is the highest form of worship of 
God.” 

Maybe other people will try to limit me but I don't limit myself
  

Re: umount bad disk

2014-02-13 Thread Siddharth Tiwari
Try doing umount -l

Sent from my iPhone

 On Feb 13, 2014, at 11:10 AM, Arpit Agarwal aagar...@hortonworks.com 
 wrote:
 
 bcc'ed hadoop-user
 
 Lei, perhaps hbase-user can help.
 
 -- Forwarded message --
 From: lei liu liulei...@gmail.com
 Date: Thu, Feb 13, 2014 at 1:04 AM
 Subject: umount bad disk
 To: user@hadoop.apache.org
 
 
 I use HBase0.96 and CDH4.3.1.
 
 I use Short-Circuit Local Read:
 <property>
   <name>dfs.client.read.shortcircuit</name>
   <value>true</value>
 </property>
 <property>
   <name>dfs.domain.socket.path</name>
   <value>/home/hadoop/cdh4-dn-socket/dn_socket</value>
 </property>
 
 When one disk is bad, because the RegionServer has some files open on that disk, 
 I can't run umount. For example:
 sudo umount -f /disk10
 umount2: Device or resource busy
 
 
 umount: /disk10: device is busy
 umount2: Device or resource busy
 umount: /disk10: device is busy 
 
 I must stop RegionServer in order to run umount command.  
 
 
 How can I remove the bad disk without stopping the RegionServer?
 
 Thanks,
 
 LiuLei
 
 
 
 
 
 
 
 
 
 
 


Re: HA Jobtracker failure

2014-01-27 Thread Siddharth Tiwari
How have you implemented the failover? Also, can you attach the JT HA logs? If you 
have implemented it using ZKFC, it would be interesting to look at the ZooKeeper 
logs as well.

Sent from my iPhone

 On Jan 27, 2014, at 3:00 PM, Karthik Kambatla ka...@cloudera.com wrote:
 
 (Redirecting to cdh-user, moving user@hadoop to bcc).
 
 Hi Oren
 
 Can you attach slightly longer versions of the log files on both the JTs? 
 Also, if this is something recurring, it would be nice to monitor the JT heap 
 usage and GC timeouts using jstat -gcutil <jt-pid>.
 
 Thanks
 Karthik
 
 
 
 
 On Thu, Jan 23, 2014 at 8:11 AM, Oren Marmor or...@infolinks.com wrote:
 Hi.
 We have two HA Jobtrackers in active/standby mode. (CDH4.2 on ubuntu server)
 We had a problem during which the active node suddenly became standby, and 
 the standby server attempted to start, resulting in a Java heap space failure.
 Any ideas as to why the active node turned to standby?
 
 logs attached:
 on (original) active node:
 2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobTracker: 
 Initializing job_201401041634_5858
 2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobInProgress: 
 Initializing job_201401041634_5858
 2014-01-22 06:50:27,386 INFO 
 org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to 
 standby
 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker: Stopping 
 pluginDispatcher
 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker: Stopping 
 infoServer
 2014-01-22 06:50:44,093 WARN org.apache.hadoop.ipc.Client: interrupted 
 waiting to send params to server
 java.lang.InterruptedException
 at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:979)
 at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
 at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
 at java.util.concurrent.FutureTask.get(FutureTask.java:83)
 at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:913)
 at org.apache.hadoop.ipc.Client.call(Client.java:1198)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
 at $Proxy9.getFileInfo(Unknown Source)
 at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:628)
 at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
 at $Proxy10.getFileInfo(Unknown Source)
 at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1532)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:803)
 at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1332)
 at 
 org.apache.hadoop.mapred.JobTrackerHAServiceProtocol$SystemDirectoryMonitor.run(JobTrackerHAServiceProtocol.java:96)
 at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
 at 
 java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
 at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 2014-01-22 06:51:55,637 INFO org.mortbay.log: Stopped 
 SelectChannelConnector@0.0.0.0:50031
 
 on standby node
 2014-01-22 06:50:05,010 INFO 
 org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to active
 2014-01-22 06:50:05,010 INFO 
 org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopping 
 JobTrackerHAHttpRedirector on port 50030
 2014-01-22 06:50:05,098 INFO org.mortbay.log: Stopped 
 SelectChannelConnector@0.0.0.0:50030
 2014-01-22 06:50:05,198 INFO 
 org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopped
 2014-01-22 06:50:05,201 INFO 
 org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Renaming previous 
 system directory hdfs://***/tmp/mapred/system/seq-0022 to hdfs://t
 

RE: Strange error on Datanodes

2013-12-03 Thread Siddharth Tiwari
Thanks Jeet
 
Can you suggest the parameter which controls the timeout value?

**

Cheers !!!

Siddharth Tiwari

Have a refreshing day !!!
Every duty is holy, and devotion to duty is the highest form of worship of 
God.” 

Maybe other people will try to limit me but I don't limit myself

 
Date: Tue, 3 Dec 2013 15:38:50 +0530
Subject: Re: Strange error on Datanodes
From: jeetuyadav200...@gmail.com
To: user@hadoop.apache.org; cdh-u...@cloudera.org

Sorry for the incomplete mail.
Instead of one issue, I think you may have two issues going on. I'm also adding 
the CDH mailing list for more input on the same.

1. 2013-12-02 13:11:36,441 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream 
ResponseProcessor exception for block 
BP-1854340821-10.238.9.151-1385733655875:blk_-2927699636194035560_63092 
java.net.SocketTimeoutException: 65000 millis timeout while waiting for channel 
to be ready for read. ch : java.nio.channels.SocketChannel[connected 

This error is possible in a scenario where your DN process is having long GC pauses; 
increasing the timeout value may resolve the issue. Or your client connection could 
have been disconnected abnormally.

2. 2013-12-02 13:12:06,586 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: 
brtlvlts0088co:50010:DataXceiver error processing WRITE_BLOCK operation src: 
/10.238.10.43:54040 dest: /10.238.10.43:50010 java.io.IOException: Premature EOF 
from inputStream at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:194)

Try increasing the dfs.datanode.max.xcievers conf value in the datanode hdfs-site.xml.
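
For illustration, a minimal sketch of the kind of hdfs-site.xml overrides being 
suggested here; the values are placeholders and the names are the legacy keys of 
that era (dfs.datanode.max.xcievers was later renamed dfs.datanode.max.transfer.threads):

<!-- client/datanode socket read timeout in milliseconds (placeholder value) -->
<property>
  <name>dfs.socket.timeout</name>
  <value>180000</value>
</property>
<!-- maximum concurrent block transfer threads per datanode (placeholder value) -->
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>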


Regards
Jitendra



On Tue, Dec 3, 2013 at 3:17 PM, Jitendra Yadav jeetuyadav200...@gmail.com 
wrote:

I did some analysis on the provided logs and confs.
Instead of one issue, I believe you may have two issues going on.
1.java.net.SocketTimeoutException: 65000 millis timeout while waiting for 
channel to be ready for read. ch : java.nio.channels.SocketChannel[connected 
2. 2013-12-02 13:12:06,586 ERROR 
org.apache.hadoop.hdfs.server.datanode.DataNode: 
brtlvlts0088co:50010:DataXceiver error processing WRITE_BLOCK operation  src: 
/10.238.10.43:54040 dest: /10.238.10.43:50010
java.io.IOException: Premature EOF from inputStream
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:194)






On Mon, Dec 2, 2013 at 9:30 PM, Siddharth Tiwari siddharth.tiw...@live.com 
wrote:






Hi Jeet,
I am using CDH 4, but I have manually installed the NN and JT with HA, not using 
CDH Manager. I am attaching the NN logs here; I sent a mail just before this for 
the other files. This is frustrating; why is it happening?



**

Cheers !!!

Siddharth Tiwari

Have a refreshing day !!!
Every duty is holy, and devotion to duty is the highest form of worship of 
God.” 



Maybe other people will try to limit me but I don't limit myself


Date: Mon, 2 Dec 2013 21:24:43 +0530


Subject: Re: Strange error on Datanodes
From: jeetuyadav200...@gmail.com
To: user@hadoop.apache.org



Which Hadoop distro are you using? It would be good if you share the logs from the 
data node on which the data block (blk_-2927699636194035560_63092) exists, and 
from the name nodes also.
Regards


Jitendra

On Mon, Dec 2, 2013 at 9:13 PM, Siddharth Tiwari siddharth.tiw...@live.com 
wrote:






Hi Jeet
I have a cluster of size 25: 4 admin nodes and 21 datanodes, with 
2 NN, 2 JT, 3 ZooKeepers and 3 QJNs.
If you could help me understand what kind of logs you want, I will provide them 
to you. Do you need hdfs-site.xml, core-site.xml and mapred-site.xml?





**

Cheers !!!

Siddharth Tiwari

Have a refreshing day !!!
Every duty is holy, and devotion to duty is the highest form of worship of 
God.” 




Maybe other people will try to limit me but I don't limit myself


Date: Mon, 2 Dec 2013 21:09:03 +0530



Subject: Re: Strange error on Datanodes
From: jeetuyadav200...@gmail.com
To: user@hadoop.apache.org




Hi,
Can you share some more logs from the data nodes? Could you please also share the 
conf and cluster size?
Regards
Jitendra




On Mon, Dec 2, 2013 at 8:49 PM, Siddharth Tiwari siddharth.tiw...@live.com 
wrote:







Hi team
I see the following errors on datanodes. What is the reason for this and how can 
it be resolved:




2013-12-02 13:11:36,441 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream 
ResponseProcessor exception  for block 
BP-1854340821-10.238.9.151-1385733655875:blk_-2927699636194035560_63092
java.net.SocketTimeoutException: 65000 millis timeout while waiting for channel 
to be ready for read. ch : java.nio.channels.SocketChannel[connected 
local=/10.238.10.43:54040 remote=/10.238.10.43:50010]
at 
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:165)
at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:156)
at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:129)
at 
org.apache.hadoop.net.SocketInputStream.read

Strange error on Datanodes

2013-12-02 Thread Siddharth Tiwari
Hi team
I see the following errors on datanodes. What is the reason for this and how can 
it be resolved:
2013-12-02 13:11:36,441 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream 
ResponseProcessor exception  for block 
BP-1854340821-10.238.9.151-1385733655875:blk_-2927699636194035560_63092
java.net.SocketTimeoutException: 65000 millis timeout while waiting for channel 
to be ready for read. ch : java.nio.channels.SocketChannel[connected 
local=/10.238.10.43:54040 remote=/10.238.10.43:50010]
at 
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:165)
at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:156)
at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:129)
at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:117)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at 
org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:169)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:114)
at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:694)
2013-12-02 13:12:06,572 INFO org.apache.hadoop.mapred.TaskLogsTruncater: 
Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2013-12-02 13:12:06,581 ERROR org.apache.hadoop.security.UserGroupInformation: 
PriviledgedActionException as:hadoop (auth:SIMPLE) cause:java.io.IOException: 
All datanodes 10.238.10.43:50010 are bad. Aborting...
2013-12-02 13:12:06,581 WARN org.apache.hadoop.mapred.Child: Error running child
java.io.IOException: All datanodes 10.238.10.43:50010 are bad. Aborting...
at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:959)
at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:779)
at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:448)
2013-12-02 13:12:06,583 INFO org.apache.hadoop.mapred.Task: Runnning cleanup 
for the task

**

Cheers !!!

Siddharth Tiwari

Have a refreshing day !!!
Every duty is holy, and devotion to duty is the highest form of worship of 
God.” 

Maybe other people will try to limit me but I don't limit myself
  

RE: Strange error on Datanodes

2013-12-02 Thread Siddharth Tiwari
Hi Jeet
I have a cluster of size 25: 4 admin nodes and 21 datanodes, with 2 NN, 2 JT, 3 
ZooKeepers and 3 QJNs.
If you could help me understand what kind of logs you want, I will provide them 
to you. Do you need hdfs-site.xml, core-site.xml and mapred-site.xml?

**

Cheers !!!

Siddharth Tiwari

Have a refreshing day !!!
Every duty is holy, and devotion to duty is the highest form of worship of 
God.” 

Maybe other people will try to limit me but I don't limit myself


Date: Mon, 2 Dec 2013 21:09:03 +0530
Subject: Re: Strange error on Datanodes
From: jeetuyadav200...@gmail.com
To: user@hadoop.apache.org

Hi,
Can you share some more logs from the data nodes? Could you please also share the 
conf and cluster size?
Regards
Jitendra



On Mon, Dec 2, 2013 at 8:49 PM, Siddharth Tiwari siddharth.tiw...@live.com 
wrote:




Hi team
I see the following errors on datanodes. What is the reason for this and how can 
it be resolved:

2013-12-02 13:11:36,441 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream 
ResponseProcessor exception  for block 
BP-1854340821-10.238.9.151-1385733655875:blk_-2927699636194035560_63092
java.net.SocketTimeoutException: 65000 millis timeout while waiting for channel 
to be ready for read. ch : java.nio.channels.SocketChannel[connected 
local=/10.238.10.43:54040 remote=/10.238.10.43:50010]
at 
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:165)
at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:156)
at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:129)
at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:117)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at 
org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:169)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:114)
at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:694)
2013-12-02 13:12:06,572 INFO org.apache.hadoop.mapred.TaskLogsTruncater: 
Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2013-12-02 13:12:06,581 ERROR org.apache.hadoop.security.UserGroupInformation: 
PriviledgedActionException as:hadoop (auth:SIMPLE) cause:java.io.IOException: 
All datanodes 10.238.10.43:50010 are bad. Aborting...
2013-12-02 13:12:06,581 WARN org.apache.hadoop.mapred.Child: Error running child
java.io.IOException: All datanodes 10.238.10.43:50010 are bad. Aborting...
at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:959)
at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:779)
at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:448)
2013-12-02 13:12:06,583 INFO org.apache.hadoop.mapred.Task: Runnning cleanup 
for the task

**

Cheers !!!

Siddharth Tiwari

Have a refreshing day !!!
Every duty is holy, and devotion to duty is the highest form of worship of 
God.” 


Maybe other people will try to limit me but I don't limit myself
  


  

Jt ha issue on cdh4

2013-12-01 Thread Siddharth Tiwari
I implemented JT HA on cdh4.4.2. The JobTracker keeps failing over back and forth, 
jobs keep restarting, the NameNode also goes down at times, and I can see logs on a 
few datanodes mentioning "all data nodes are bad, aborting".

I installed JT HA manually like this:

After configuring JT HA, I started the jobtracker HA daemon (after running formatzk) 
using:
nohup hadoop jobtrackerha
Then I started mrzkfc using:
nohup hadoop mrzkfc

Please advise me if I am doing anything wrong. Also, is that the right way to start 
the JT HA daemon and failover controller?

Sent from my iPad

Error for larger jobs

2013-11-27 Thread Siddharth Tiwari
Hi Team
I am getting the following strange error; can you point me to the possible reason? 
I have set the heap size to 4 GB but am still getting it. Please help.
syslog logs
2013-11-27 19:01:50,678 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2013-11-27 19:01:51,051 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
2013-11-27 19:01:51,539 WARN org.apache.hadoop.conf.Configuration: session.id is deprecated. Instead, use dfs.metrics.session-id
2013-11-27 19:01:51,540 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
2013-11-27 19:01:51,867 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0
2013-11-27 19:01:51,870 INFO org.apache.hadoop.mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@4a0bd13d
2013-11-27 19:01:52,217 INFO org.apache.hadoop.mapred.MapTask: Processing split: org.apache.hadoop.examples.terasort.TeraGen$RangeInputFormat$RangeInputSplit@6c30aec7
2013-11-27 19:01:52,222 WARN mapreduce.Counters: Counter name MAP_INPUT_BYTES is deprecated. Use FileInputFormatCounters as group name and BYTES_READ as counter name instead
2013-11-27 19:01:52,226 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 0
2013-11-27 19:01:52,250 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hadoop (auth:SIMPLE) cause:java.io.IOException: Cannot run program chmod: error=11, Resource temporarily unavailable
2013-11-27 19:01:52,250 WARN org.apache.hadoop.mapred.Child: Error running child
java.io.IOException: Cannot run program chmod: error=11, Resource temporarily unavailable
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1041)
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:206)
        at org.apache.hadoop.util.Shell.run(Shell.java:188)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:381)
        at org.apache.hadoop.util.Shell.execCommand(Shell.java:467)
        at org.apache.hadoop.util.Shell.execCommand(Shell.java:450)
        at org.apache.hadoop.fs.RawLocalFileSystem.execCommand(RawLocalFileSystem.java:593)
        at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:584)
        at org.apache.hadoop.io.SecureIOUtils.insecureCreateForWrite(SecureIOUtils.java:146)
        at org.apache.hadoop.io.SecureIOUtils.createForWrite(SecureIOUtils.java:168)
        at org.apache.hadoop.mapred.TaskLog.writeToIndexFile(TaskLog.java:310)
        at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:383)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
        at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.io.IOException: error=11, Resource temporarily unavailable
        at java.lang.UNIXProcess.forkAndExec(Native Method)
        at java.lang.UNIXProcess.init(UNIXProcess.java:135)
        at java.lang.ProcessImpl.start(ProcessImpl.java:130)
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1022)
        ... 16 more
2013-11-27 19:01:52,256 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task

**

Cheers !!!

Siddharth Tiwari

Have a refreshing day !!!
Every duty is holy, and devotion to duty is the highest form of worship of 
God.” 

Maybe other people will try to limit me but I don't limit myself
  

Error=11 resource temporarily unavailable

2013-11-27 Thread Siddharth Tiwari
Hi team,
I am getting this strange error; below is the trace:
2013-11-27 19:01:50,678 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2013-11-27 19:01:51,051 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
2013-11-27 19:01:51,539 WARN org.apache.hadoop.conf.Configuration: session.id is deprecated. Instead, use dfs.metrics.session-id
2013-11-27 19:01:51,540 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
2013-11-27 19:01:51,867 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0
2013-11-27 19:01:51,870 INFO org.apache.hadoop.mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@4a0bd13d
2013-11-27 19:01:52,217 INFO org.apache.hadoop.mapred.MapTask: Processing split: org.apache.hadoop.examples.terasort.TeraGen$RangeInputFormat$RangeInputSplit@6c30aec7
2013-11-27 19:01:52,222 WARN mapreduce.Counters: Counter name MAP_INPUT_BYTES is deprecated. Use FileInputFormatCounters as group name and BYTES_READ as counter name instead
2013-11-27 19:01:52,226 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 0
2013-11-27 19:01:52,250 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hadoop (auth:SIMPLE) cause:java.io.IOException: Cannot run program chmod: error=11, Resource temporarily unavailable
2013-11-27 19:01:52,250 WARN org.apache.hadoop.mapred.Child: Error running child
java.io.IOException: Cannot run program chmod: error=11, Resource temporarily unavailable
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1041)
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:206)
        at org.apache.hadoop.util.Shell.run(Shell.java:188)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:381)
        at org.apache.hadoop.util.Shell.execCommand(Shell.java:467)
        at org.apache.hadoop.util.Shell.execCommand(Shell.java:450)
        at org.apache.hadoop.fs.RawLocalFileSystem.execCommand(RawLocalFileSystem.java:593)
        at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:584)
        at org.apache.hadoop.io.SecureIOUtils.insecureCreateForWrite(SecureIOUtils.java:146)
        at org.apache.hadoop.io.SecureIOUtils.createForWrite(SecureIOUtils.java:168)
        at org.apache.hadoop.mapred.TaskLog.writeToIndexFile(TaskLog.java:310)
        at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:383)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
        at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.io.IOException: error=11, Resource temporarily unavailable
        at java.lang.UNIXProcess.forkAndExec(Native Method)
        at java.lang.UNIXProcess.init(UNIXProcess.java:135)
        at java.lang.ProcessImpl.start(ProcessImpl.java:130)
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1022)
        ... 16 more
2013-11-27 19:01:52,256 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task

I have set map slots to 24 and reduce slots to 12 on a 32-core machine (with HT); 
ulimit is 64K.
What is causing it and how can we get rid of it? It is happening only for bigger 
jobs, say terasort.


**

Cheers !!!

Siddharth Tiwari

Have a refreshing day !!!
Every duty is holy, and devotion to duty is the highest form of worship of 
God.” 

Maybe other people will try to limit me but I don't limit myself
  

RE: Error for larger jobs

2013-11-27 Thread Siddharth Tiwari
Hi Azury
Thanks for response. I have plenty of space on my Disks so that cannot be the 
issue.

**

Cheers !!!

Siddharth Tiwari

Have a refreshing day !!!
Every duty is holy, and devotion to duty is the highest form of worship of 
God.” 

Maybe other people will try to limit me but I don't limit myself


Date: Thu, 28 Nov 2013 08:10:06 +0800
Subject: Re: Error for larger jobs
From: azury...@gmail.com
To: user@hadoop.apache.org

Your disk is full from the log.
On 2013-11-28 5:27 AM, Siddharth Tiwari siddharth.tiw...@live.com wrote:




Hi Team
I am getting following strange error, can you point me to the possible reason.
I have set heap size to 4GB but still getting it. please help

syslog logs
2013-11-27 19:01:50,678 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to 
load native-hadoop library for your platform... using builtin-java classes 
where applicable
2013-11-27 19:01:51,051 WARN mapreduce.Counters: Group 
org.apache.hadoop.mapred.Task$Counter is deprecated. Use 
org.apache.hadoop.mapreduce.TaskCounter instead
2013-11-27 19:01:51,539 WARN org.apache.hadoop.conf.Configuration: session.id 
is deprecated. Instead, use dfs.metrics.session-id
2013-11-27 19:01:51,540 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: 
Initializing JVM Metrics with processName=MAP, sessionId=
2013-11-27 19:01:51,867 INFO org.apache.hadoop.util.ProcessTree: setsid exited 
with exit code 0
2013-11-27 19:01:51,870 INFO org.apache.hadoop.mapred.Task:  Using 
ResourceCalculatorPlugin 
:org.apache.hadoop.util.LinuxResourceCalculatorPlugin@4a0bd13d
2013-11-27 19:01:52,217 INFO org.apache.hadoop.mapred.MapTask: Processing 
split:org.apache.hadoop.examples.terasort.TeraGen$RangeInputFormat$RangeInputSplit@6c30aec7
2013-11-27 19:01:52,222 WARN mapreduce.Counters: Counter name MAP_INPUT_BYTES 
is deprecated. Use FileInputFormatCounters as group name and  BYTES_READ as 
counter name instead
2013-11-27 19:01:52,226 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 0
2013-11-27 19:01:52,250 ERROR org.apache.hadoop.security.UserGroupInformation: 
PriviledgedActionException as:hadoop (auth:SIMPLE) cause:java.io.IOException: 
Cannot run program chmod: error=11, Resource temporarily unavailable
2013-11-27 19:01:52,250 WARN org.apache.hadoop.mapred.Child: Error running child
java.io.IOException: Cannot run program chmod: error=11, Resource temporarily 
unavailable
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1041)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:206)
at org.apache.hadoop.util.Shell.run(Shell.java:188)
at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:381)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:467)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:450)
at 
org.apache.hadoop.fs.RawLocalFileSystem.execCommand(RawLocalFileSystem.java:593)
at 
org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:584)
at 
org.apache.hadoop.io.SecureIOUtils.insecureCreateForWrite(SecureIOUtils.java:146)
at 
org.apache.hadoop.io.SecureIOUtils.createForWrite(SecureIOUtils.java:168)
at org.apache.hadoop.mapred.TaskLog.writeToIndexFile(TaskLog.java:310)
at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:383)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.io.IOException: error=11, Resource temporarily unavailable
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.init(UNIXProcess.java:135)
at java.lang.ProcessImpl.start(ProcessImpl.java:130)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1022)
... 16 more
2013-11-27 19:01:52,256 INFO org.apache.hadoop.mapred.Task: Runnning cleanup 
for the task

**


Cheers !!!

Siddharth Tiwari

Have a refreshing day !!!
Every duty is holy, and devotion to duty is the highest form of worship of 
God.” 


Maybe other people will try to limit me but I don't limit myself
  

  

Re: Error for larger jobs

2013-11-27 Thread Siddharth Tiwari
Hi Vinay and Azuryy
Thanks for your responses.
I get this error when I just run a teragen.
Also, do you suggest I increase the nproc value? What should I increase it to?

Sent from my iPad

 On Nov 27, 2013, at 11:08 PM, Vinayakumar B vinayakuma...@huawei.com 
 wrote:
 
 Hi Siddharth,
  
 Looks like an issue with one of the machines. Or is it happening on other 
 machines also?
  
 I don’t think it’s a problem with JVM heap memory.
  
 Suggest you to check this once, 
 http://stackoverflow.com/questions/8384000/java-io-ioexception-error-11
  
 Thanks and Regards,
 Vinayakumar B
  
 From: Siddharth Tiwari [mailto:siddharth.tiw...@live.com] 
 Sent: 28 November 2013 05:50
 To: USers Hadoop
 Subject: RE: Error for larger jobs
  
 Hi Azury
  
 Thanks for response. I have plenty of space on my Disks so that cannot be the 
 issue.
 
 
 **
 Cheers !!!
 Siddharth Tiwari
 Have a refreshing day !!!
 Every duty is holy, and devotion to duty is the highest form of worship of 
 God.” 
 Maybe other people will try to limit me but I don't limit myself
 
 
 Date: Thu, 28 Nov 2013 08:10:06 +0800
 Subject: Re: Error for larger jobs
 From: azury...@gmail.com
 To: user@hadoop.apache.org
 
 Your disk is full from the log.
 On 2013-11-28 5:27 AM, Siddharth Tiwari siddharth.tiw...@live.com wrote:
 Hi Team
  
 I am getting following strange error, can you point me to the possible reason.
 I have set heap size to 4GB but still getting it. please help
  
 syslog logs
 2013-11-27 19:01:50,678 WARN org.apache.hadoop.util.NativeCodeLoader: Unable 
 to load native-hadoop library for your platform... using builtin-java classes 
 where applicable
 2013-11-27 19:01:51,051 WARN mapreduce.Counters: Group 
 org.apache.hadoop.mapred.Task$Counter is deprecated. Use 
 org.apache.hadoop.mapreduce.TaskCounter instead
 2013-11-27 19:01:51,539 WARN org.apache.hadoop.conf.Configuration: session.id 
 is deprecated. Instead, use dfs.metrics.session-id
 2013-11-27 19:01:51,540 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: 
 Initializing JVM Metrics with processName=MAP, sessionId=
 2013-11-27 19:01:51,867 INFO org.apache.hadoop.util.ProcessTree: setsid 
 exited with exit code 0
 2013-11-27 19:01:51,870 INFO org.apache.hadoop.mapred.Task:  Using 
 ResourceCalculatorPlugin 
 :org.apache.hadoop.util.LinuxResourceCalculatorPlugin@4a0bd13d
 2013-11-27 19:01:52,217 INFO org.apache.hadoop.mapred.MapTask: Processing 
 split:org.apache.hadoop.examples.terasort.TeraGen$RangeInputFormat$RangeInputSplit@6c30aec7
 2013-11-27 19:01:52,222 WARN mapreduce.Counters: Counter name MAP_INPUT_BYTES 
 is deprecated. Use FileInputFormatCounters as group name and  BYTES_READ as 
 counter name instead
 2013-11-27 19:01:52,226 INFO org.apache.hadoop.mapred.MapTask: 
 numReduceTasks: 0
 2013-11-27 19:01:52,250 ERROR 
 org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException 
 as:hadoop (auth:SIMPLE) cause:java.io.IOException: Cannot run program 
 chmod: error=11, Resource temporarily unavailable
 2013-11-27 19:01:52,250 WARN org.apache.hadoop.mapred.Child: Error running 
 child
 java.io.IOException: Cannot run program chmod: error=11, Resource 
 temporarily unavailable
 at java.lang.ProcessBuilder.start(ProcessBuilder.java:1041)
 at org.apache.hadoop.util.Shell.runCommand(Shell.java:206)
 at org.apache.hadoop.util.Shell.run(Shell.java:188)
 at 
 org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:381)
 at org.apache.hadoop.util.Shell.execCommand(Shell.java:467)
 at org.apache.hadoop.util.Shell.execCommand(Shell.java:450)
 at 
 org.apache.hadoop.fs.RawLocalFileSystem.execCommand(RawLocalFileSystem.java:593)
 at 
 org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:584)
 at 
 org.apache.hadoop.io.SecureIOUtils.insecureCreateForWrite(SecureIOUtils.java:146)
 at 
 org.apache.hadoop.io.SecureIOUtils.createForWrite(SecureIOUtils.java:168)
 at org.apache.hadoop.mapred.TaskLog.writeToIndexFile(TaskLog.java:310)
 at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:383)
 at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
 at org.apache.hadoop.mapred.Child.main(Child.java:262)
 Caused by: java.io.IOException: error=11, Resource temporarily unavailable
 at java.lang.UNIXProcess.forkAndExec(Native Method)
 at java.lang.UNIXProcess.init(UNIXProcess.java:135)
 at java.lang.ProcessImpl.start(ProcessImpl.java:130)
 at java.lang.ProcessBuilder.start(ProcessBuilder.java:1022)
 ... 16 more
 2013-11-27 19:01:52,256 INFO org.apache.hadoop.mapred.Task: Runnning cleanup 
 for the task

RE: Error for larger jobs

2013-11-27 Thread Siddharth Tiwari
What shall I put in my bash_profile ?

Date: Thu, 28 Nov 2013 10:04:58 +0800
Subject: Re: Error for larger jobs
From: azury...@gmail.com
To: user@hadoop.apache.org

Yes, you need to increase it; a simple way is to put it in your /etc/profile.
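
For illustration, a minimal sketch of raising the max-user-processes (nproc) limit 
for the user running the tasks; the numbers and file locations below are 
assumptions, adjust them for your environment:

# In /etc/profile or ~/.bash_profile (hypothetical value):
ulimit -u 65536

# Or persistently in /etc/security/limits.conf (hypothetical values):
hadoop  soft  nproc  65536
hadoop  hard  nproc  65536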



On Thu, Nov 28, 2013 at 9:59 AM, Siddharth Tiwari siddharth.tiw...@live.com 
wrote:

Hi Vinay and Azuryy, thanks for your responses. I get this error when I just run 
a teragen.
Also, do you suggest I increase the nproc value? What should I increase it to?

Sent from my iPad
On Nov 27, 2013, at 11:08 PM, Vinayakumar B vinayakuma...@huawei.com wrote:










Hi Siddharth,
 
Looks like an issue with one of the machines. Or is it happening on other 
machines also?

 
I don’t think it’s a problem with JVM heap memory.
 
Suggest you to check this once, 

http://stackoverflow.com/questions/8384000/java-io-ioexception-error-11

 
Thanks and Regards,
Vinayakumar B
 


From: Siddharth Tiwari [mailto:siddharth.tiw...@live.com]


Sent: 28 November 2013 05:50

To: USers Hadoop

Subject: RE: Error for larger jobs


 

Hi Azury

 


Thanks for response. I have plenty of space on my Disks so that cannot be the 
issue.






**

Cheers !!!


Siddharth Tiwari

Have a refreshing day !!!

Every duty is holy, and devotion to duty is the highest form of worship of 
God.”


Maybe other people will try to limit me but I don't limit myself








Date: Thu, 28 Nov 2013 08:10:06 +0800

Subject: Re: Error for larger jobs

From: azury...@gmail.com

To: user@hadoop.apache.org
Your disk is full from the log.

On 2013-11-28 5:27 AM, Siddharth Tiwari siddharth.tiw...@live.com wrote:



Hi Team

 


I am getting following strange error, can you point me to the possible reason.



I have set heap size to 4GB but still getting it. please help



 


syslog logs

2013-11-27 19:01:50,678 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to 
load native-hadoop library for your platform... using builtin-java
 classes where applicable
2013-11-27 19:01:51,051 WARN mapreduce.Counters: Group 
org.apache.hadoop.mapred.Task$Counter is deprecated. Use 
org.apache.hadoop.mapreduce.TaskCounter
 instead
2013-11-27 19:01:51,539 WARN org.apache.hadoop.conf.Configuration:
session.id is deprecated. Instead, use dfs.metrics.session-id

2013-11-27 19:01:51,540 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: 
Initializing JVM Metrics with processName=MAP, sessionId=

2013-11-27 19:01:51,867 INFO org.apache.hadoop.util.ProcessTree: setsid exited 
with exit code 0

2013-11-27 19:01:51,870 INFO org.apache.hadoop.mapred.Task:  Using 
ResourceCalculatorPlugin 
:org.apache.hadoop.util.LinuxResourceCalculatorPlugin@4a0bd13d

2013-11-27 19:01:52,217 INFO org.apache.hadoop.mapred.MapTask: Processing 
split:org.apache.hadoop.examples.terasort.TeraGen$RangeInputFormat$RangeInputSplit@6c30aec7

2013-11-27 19:01:52,222 WARN mapreduce.Counters: Counter name MAP_INPUT_BYTES 
is deprecated. Use FileInputFormatCounters as group name and 
 BYTES_READ as counter name instead
2013-11-27 19:01:52,226 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 0

2013-11-27 19:01:52,250 ERROR org.apache.hadoop.security.UserGroupInformation: 
PriviledgedActionException as:hadoop (auth:SIMPLE) cause:java.io.IOException:
 Cannot run program chmod: error=11, Resource temporarily unavailable
2013-11-27 19:01:52,250 WARN org.apache.hadoop.mapred.Child: Error running child

java.io.IOException: Cannot run program chmod: error=11, Resource temporarily 
unavailable

at java.lang.ProcessBuilder.start(ProcessBuilder.java:1041)

at org.apache.hadoop.util.Shell.runCommand(Shell.java:206)

at org.apache.hadoop.util.Shell.run(Shell.java:188)

at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:381)

at org.apache.hadoop.util.Shell.execCommand(Shell.java:467)

at org.apache.hadoop.util.Shell.execCommand(Shell.java:450)

at 
org.apache.hadoop.fs.RawLocalFileSystem.execCommand(RawLocalFileSystem.java:593)

at 
org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:584)

at 
org.apache.hadoop.io.SecureIOUtils.insecureCreateForWrite(SecureIOUtils.java:146)

at 
org.apache.hadoop.io.SecureIOUtils.createForWrite(SecureIOUtils.java:168)

at org.apache.hadoop.mapred.TaskLog.writeToIndexFile(TaskLog.java:310)

at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:383)

at org.apache.hadoop.mapred.Child$4.run(Child.java:270)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:415)

at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)

at org.apache.hadoop.mapred.Child.main(Child.java:262)

Caused by: java.io.IOException: error=11, Resource temporarily unavailable

at java.lang.UNIXProcess.forkAndExec(Native Method)

at java.lang.UNIXProcess.init

best solution for data ingestion

2013-11-01 Thread Siddharth Tiwari
Hi team,
Seeking your advice on what could be the best way to ingest a lot of data into 
Hadoop. Also, what are your views about FUSE?

**

Cheers !!!

Siddharth Tiwari

Have a refreshing day !!!
Every duty is holy, and devotion to duty is the highest form of worship of 
God.” 

Maybe other people will try to limit me but I don't limit myself
  

Flume not moving data help !!!

2013-10-31 Thread Siddharth Tiwari
Hi team, I created a Flume source and sink as follows on Hadoop YARN, and I am 
not getting data transferred from the source to the sink in HDFS: it doesn't create 
any file, and locally, every time I start the agent it creates one empty file. 
Below are my configs for the source and sink.

Source :-
agent.sources = logger1
agent.sources.logger1.type = exec
agent.sources.logger1.command = tail -f /var/log/messages
agent.sources.logger1.batchsSize = 0
agent.sources.logger1.channels = memoryChannel
agent.channels = memoryChannel
agent.channels.memoryChannel.type = memory
agent.channels.memoryChannel.capacity = 100
agent.sinks = AvroSink
agent.sinks.AvroSink.type = avro
agent.sinks.AvroSink.channel = memoryChannel
agent.sinks.AvroSink.hostname = 192.168.147.101
agent.sinks.AvroSink.port = 4545
agent.sources.logger1.interceptors = itime ihost
agent.sources.logger1.interceptors.itime.type = TimestampInterceptor
agent.sources.logger1.interceptors.ihost.type = host
agent.sources.logger1.interceptors.ihost.useIP = false
agent.sources.logger1.interceptors.ihost.hostHeader = host

Sink at one of the slave ( datanodes on my Yarn cluster ) :
collector.sources = AvroIn
collector.sources.AvroIn.type = avro
collector.sources.AvroIn.bind = 0.0.0.0
collector.sources.AvroIn.port = 4545
collector.sources.AvroIn.channels = mc1 mc2
collector.channels = mc1 mc2
collector.channels.mc1.type = memory
collector.channels.mc1.capacity = 100
collector.channels.mc2.type = memory
collector.channels.mc2.capacity = 100
collector.sinks = LocalOut HadoopOut
collector.sinks.LocalOut.type = file_roll
collector.sinks.LocalOut.sink.directory = /home/hadoop/flume
collector.sinks.LocalOut.sink.rollInterval = 0
collector.sinks.LocalOut.channel = mc1
collector.sinks.HadoopOut.type = hdfs
collector.sinks.HadoopOut.channel = mc2
collector.sinks.HadoopOut.hdfs.path = /flume
collector.sinks.HadoopOut.hdfs.fileType = DataStream
collector.sinks.HadoopOut.hdfs.writeFormat = Text
collector.sinks.HadoopOut.hdfs.rollSize = 0
collector.sinks.HadoopOut.hdfs.rollCount = 1
collector.sinks.HadoopOut.hdfs.rollInterval = 600

Can somebody point out what I am doing wrong?
This is what I get in my local directory:

[hadoop@node1 flume]$ ls -lrt
total 0
-rw-rw-r-- 1 hadoop hadoop 0 Oct 31 11:25 1383243942803-1
-rw-rw-r-- 1 hadoop hadoop 0 Oct 31 11:28 1383244097923-1
-rw-rw-r-- 1 hadoop hadoop 0 Oct 31 11:31 1383244302225-1
-rw-rw-r-- 1 hadoop hadoop 0 Oct 31 11:33 1383244404929-1

When I restart the collector it creates one 0-byte file.
Please help 
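
(For reference, one detail often checked first when the HDFS sink writes nothing is whether hdfs.path is fully qualified, so the sink does not depend on picking up fs.defaultFS from a Hadoop config on the Flume classpath. A sketch only; the host is taken from the source config above and the NameNode port 8020 is an assumption:)

collector.sinks.HadoopOut.hdfs.path = hdfs://192.168.147.101:8020/flume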
**

Cheers !!!

Siddharth Tiwari

Have a refreshing day !!!
Every duty is holy, and devotion to duty is the highest form of worship of 
God.” 

Maybe other people will try to limit me but I don't limit myself
  

Trash in yarn

2013-10-25 Thread Siddharth Tiwari
How can I enable trash in hadoop-2.2.0? Also, if I drop a table in Hive, does the 
trash help there?
Please help; thanks in advance.

Sent from my iPhone
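
(For reference, trash is switched on through fs.trash.interval in core-site.xml; a minimal sketch, where the 1440-minute, i.e. one-day, retention is just an example value. As far as I recall, with trash enabled a Hive DROP TABLE on a managed table also moves the data into the owner's .Trash instead of deleting it outright.)

<property>
  <name>fs.trash.interval</name>
  <value>1440</value>
</property>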

Re: Cannot start resourcemanager

2013-10-18 Thread Siddharth Tiwari
Hi team,
The ResourceManager works now; the capacity-scheduler XML had a wrong entry somehow. 
But there is another small issue: I have NN HA enabled and wanted to run HBase with 
it, and even though I set hbase.rootdir to the FS nameservice value it always throws 
an exception saying it cannot recognize the nameservice value. I did put the core-site 
and hdfs-site in the HBase conf. Can you help me in setting it up with NameNode HA in 
the new hadoop-2.2.0 stable release? Also, what versions of Hive, Mahout and Pig would 
be compatible with it? I am using the hbase-0.94.12 release.

Sent from my iPad

 On Oct 17, 2013, at 12:48 PM, Arun C Murthy a...@hortonworks.com wrote:
 
 What command did you use to start the RM?
 
 On Oct 17, 2013, at 10:18 AM, Siddharth Tiwari siddharth.tiw...@live.com 
 wrote:
 
 Hi Team,
 
 trying to start resourcemanager in the latest hadoop-2.2.0 stable release. 
 It throws following error. Please help
 
 2013-10-17 10:01:51,230 INFO 
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping ResourceManager 
 metrics system...
 2013-10-17 10:01:51,230 INFO 
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics 
 system stopped.
 2013-10-17 10:01:51,231 INFO 
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics 
 system shutdown complete.
 2013-10-17 10:01:51,232 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error 
 starting ResourceManager
 java.lang.RuntimeException: java.lang.RuntimeException: 
 java.lang.ClassNotFoundException: Class 
 org.apache.hadoop.yarn.server.resourcemanager.resource.DefaultResourceCalculator
  not found
  at 
 org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1752)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration.getResourceCalculator(CapacitySchedulerConfiguration.java:333)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:263)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:249)
  at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:871)
 Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: 
 Class 
 org.apache.hadoop.yarn.server.resourcemanager.resource.DefaultResourceCalculator
  not found
  at 
 org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1720)
  at 
 org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1744)
  ... 5 more
 Caused by: java.lang.ClassNotFoundException: Class 
 org.apache.hadoop.yarn.server.resourcemanager.resource.DefaultResourceCalculator
  not found
  at 
 org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1626)
  at 
 org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1718)
  ... 6 more
 2013-10-17 10:01:51,239 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: SHUTDOWN_MSG: 
 /
 SHUTDOWN_MSG: Shutting down ResourceManager at node1/192.168.147.101
 
 
 
 **
 Cheers !!!
 Siddharth Tiwari
 Have a refreshing day !!!
 Every duty is holy, and devotion to duty is the highest form of worship of 
 God.” 
 Maybe other people will try to limit me but I don't limit myself
 
 --
 Arun C. Murthy
 Hortonworks Inc.
 http://hortonworks.com/
 
 
 
 CONFIDENTIALITY NOTICE
 NOTICE: This message is intended for the use of the individual or entity to 
 which it is addressed and may contain information that is confidential, 
 privileged and exempt from disclosure under applicable law. If the reader of 
 this message is not the intended recipient, you are hereby notified that any 
 printing, copying, dissemination, distribution, disclosure or forwarding of 
 this communication is strictly prohibited. If you have received this 
 communication in error, please contact the sender immediately and delete it 
 from your system. Thank You.
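
(For reference, the missing class in the trace points at capacity-scheduler.xml. In 2.2.0 the calculator lives under org.apache.hadoop.yarn.util.resource, not under ...server.resourcemanager.resource, so the stock entry looks like the sketch below; the property name is taken from the default CapacityScheduler configuration as far as I recall.)

<property>
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value>
</property>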


Using Ambari to deploy Apache hadoop

2013-10-18 Thread Siddharth Tiwari
Hi team,

Is it possible to deploy Hadoop from Apache via Ambari? Also, is there a link for a 
full offline installation? We do not have access to the outside world and we want to 
use Ambari for deploying Hadoop (not the Hortonworks release though).

Sent from my iPhone

Using Hbase with NN HA

2013-10-18 Thread Siddharth Tiwari
Hi team,
Can Hbase be used with namenode HA in latest hadoop-2.2.0 ?
If yes, is there something else required to be done other than the following?
1. Set hbase.rootdir to the logical name of the namenode service
2. Keep core-site and hdfs-site in the HBase conf

I did the above two but the logical name is not recognized. 

Also, it would be helpful if I could get some help with which versions of HBase, 
Hive, Pig and Mahout are compatible with the latest YARN release, hadoop-2.2.0.  

I am using hbase-0.94.12

Thanks

Sent from my iPhone
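
(For reference, a minimal hbase-site.xml sketch for an HA rootdir, assuming the HDFS nameservice is called mycluster; the logical name only resolves if the cluster's core-site.xml and hdfs-site.xml, with dfs.nameservices, dfs.ha.namenodes.* and dfs.client.failover.proxy.provider.*, are visible on the HBase classpath.)

<property>
  <name>hbase.rootdir</name>
  <value>hdfs://mycluster/hbase</value>
</property>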

Warning while starting services

2013-10-18 Thread Siddharth Tiwari
Hi,
I get the following warning when I start the services in hadoop-2.2.0. What does it 
signify and how do I get rid of it?
Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library 
/opt/hadoop/hadoop-2.2.0/lib/native/libhadoop.so.1.0.0 which might have 
disabled stack guard. The VM will try to fix the stack guard now. It's highly 
recommended that you fix the library with 'execstack -c <libfile>', or link it 
with '-z noexecstack'.
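
(For reference, the fix the JVM itself suggests is a one-liner run against that library on each node showing the warning; the alternative is relinking the native library with -z noexecstack:)

    execstack -c /opt/hadoop/hadoop-2.2.0/lib/native/libhadoop.so.1.0.0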

**

Cheers !!!

Siddharth Tiwari

Have a refreshing day !!!
Every duty is holy, and devotion to duty is the highest form of worship of 
God.” 

Maybe other people will try to limit me but I don't limit myself
  

RE: Error in documentation

2013-10-18 Thread Siddharth Tiwari
Can I get access to update the same ?

**

Cheers !!!

Siddharth Tiwari

Have a refreshing day !!!
Every duty is holy, and devotion to duty is the highest form of worship of 
God.” 

Maybe other people will try to limit me but I don't limit myself


 Date: Fri, 18 Oct 2013 11:42:29 +0200
 Subject: Re: Error in documentation
 From: ake...@concurrentinc.com
 To: user@hadoop.apache.org
 
 The best thing to do is to open a JIRA here:
 https://issues.apache.org/jira/secure/Dashboard.jspa You might also
 want to submit a patch, which is very easy.
 
 - André
 
 On Fri, Oct 18, 2013 at 11:28 AM, Siddharth Tiwari
 siddharth.tiw...@live.com wrote:
  The installation documentation for Hadoop yarn at this link
  http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html
  has error in the yarn-site for property yarn.nodemanager.aux-services. it
  should be  mapreduce_shuffle rather than mapreduce.shuffle.
 
 
 
 
 
 
  **
  Cheers !!!
  Siddharth Tiwari
  Have a refreshing day !!!
  Every duty is holy, and devotion to duty is the highest form of worship of
  God.”
  Maybe other people will try to limit me but I don't limit myself
 
 
 
 -- 
 André Kelpe
 an...@concurrentinc.com
 http://concurrentinc.com
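
(For reference, the corrected yarn-site.xml entry being discussed, with the underscore rather than the dot:)

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>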
  

Cannot start resourcemanager

2013-10-17 Thread Siddharth Tiwari
Hi Team,
I am trying to start the ResourceManager in the latest hadoop-2.2.0 stable release. It 
throws the following error. Please help.
2013-10-17 10:01:51,230 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping ResourceManager metrics system...
2013-10-17 10:01:51,230 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics system stopped.
2013-10-17 10:01:51,231 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics system shutdown complete.
2013-10-17 10:01:51,232 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.yarn.server.resourcemanager.resource.DefaultResourceCalculator not found
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1752)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration.getResourceCalculator(CapacitySchedulerConfiguration.java:333)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:263)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:249)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:871)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.yarn.server.resourcemanager.resource.DefaultResourceCalculator not found
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1720)
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1744)
        ... 5 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.yarn.server.resourcemanager.resource.DefaultResourceCalculator not found
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1626)
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1718)
        ... 6 more
2013-10-17 10:01:51,239 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down ResourceManager at node1/192.168.147.101


**

Cheers !!!

Siddharth Tiwari

Have a refreshing day !!!
Every duty is holy, and devotion to duty is the highest form of worship of 
God.” 

Maybe other people will try to limit me but I don't limit myself
  

Removing queues

2013-08-25 Thread Siddharth Tiwari
What's the easiest way to remove queues from Hadoop without restarting services? 
Why can't we just run -refreshQueues? 

Sent from my iPhone
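
(For reference, the refresh itself is just an admin command, sketched below. As far as I recall, for the CapacityScheduler of that era a refresh could add queues or change capacities, but actually deleting a queue still required restarting the ResourceManager.)

    # re-reads capacity-scheduler.xml on a running ResourceManager
    yarn rmadmin -refreshQueues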

size of input files

2013-06-02 Thread Siddharth Tiwari
Hi Friends,
Is there a way to find out what was the size of the input file to each of the 
jobs from the logs or any other place for all jobs submitted ?
Please help

**

Cheers !!!

Siddharth Tiwari

Have a refreshing day !!!
Every duty is holy, and devotion to duty is the highest form of worship of 
God.” 

Maybe other people will try to limit me but I don't limit myself
  

RE: size of input files

2013-06-02 Thread Siddharth Tiwari
Do the counters provide the input file size ? I mean is bytes read equal to 
input file size ? Is there any log where I could find input file size submitted 
to each job. I believed that bytes read from fs is different from the input 
file size to the job.

**

Cheers !!!

Siddharth Tiwari

Have a refreshing day !!!
Every duty is holy, and devotion to duty is the highest form of worship of 
God.” 

Maybe other people will try to limit me but I don't limit myself


From: rahul.rec@gmail.com
Date: Sun, 2 Jun 2013 23:26:08 +0530
Subject: Re: size of input files
To: user@hadoop.apache.org

Counters can help. Input to mr is a directory. The counters can point to the 
number of bytes read from that fs directory.



Rahul


On Sun, Jun 2, 2013 at 11:22 PM, Siddharth Tiwari siddharth.tiw...@live.com 
wrote:





Hi Friends,
Is there a way to find out what was the size of the input file to each of the 
jobs from the logs or any other place for all jobs submitted ?


Please help


**

Cheers !!!

Siddharth Tiwari

Have a refreshing day !!!
Every duty is holy, and devotion to duty is the highest form of worship of 
God.” 



Maybe other people will try to limit me but I don't limit myself
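
(For what it's worth, the counters are also reachable from the command line; a sketch, with the job id as a placeholder. HDFS_BYTES_READ is usually close to the input size for plain files, but it is not defined as the submitted file size, so treat it as an approximation.)

    # print status and counters for a finished job
    hadoop job -status job_201306021230_0042

    # or pull one counter directly
    hadoop job -counter job_201306021230_0042 FileSystemCounters HDFS_BYTES_READ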
  



  

Data migration from one cluster to other running diff. versions

2013-01-30 Thread Siddharth Tiwari
Hi Team,

What is the best way to migrate data residing on one cluster to another cluster 
?
Are there better methods available than distcp ?
What if both the clusters running different RPC protocol versions ?

**

Cheers !!!

Siddharth Tiwari

Have a refreshing day !!!
Every duty is holy, and devotion to duty is the highest form of worship of 
God.” 

Maybe other people will try to limit me but I don't limit myself
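
(For reference, DistCp is the usual tool here; when the clusters run incompatible RPC versions, the common workaround is to read the source over the HTTP-based, read-only HFTP interface and run the copy on the destination cluster. A sketch with placeholder hosts and paths; 50070 is the default NameNode HTTP port and 8020 the default RPC port.)

    hadoop distcp hftp://source-nn:50070/user/data hdfs://dest-nn:8020/user/data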
  

RE: One petabyte of data loading into HDFS with in 10 min.

2012-09-10 Thread Siddharth Tiwari

Well, can't you load only the incremental data? The goal seems quite 
unrealistic. The big guns have already spoken :P

**

Cheers !!!

Siddharth Tiwari

Have a refreshing day !!!
Every duty is holy, and devotion to duty is the highest form of worship of 
God.” 

Maybe other people will try to limit me but I don't limit myself


From: alex.gauth...@teradata.com
To: user@hadoop.apache.org; mike.se...@thinkbiganalytics.com
Subject: RE: One petabyte of data loading into HDFS with in 10 min.
Date: Mon, 10 Sep 2012 16:17:20 +









Well said Mike. Lots of “funny questions” around here lately…

 


From: Michael Segel [mailto:michael_se...@hotmail.com]


Sent: Monday, September 10, 2012 4:50 AM

To: user@hadoop.apache.org

Cc: Michael Segel

Subject: Re: One petabyte of data loading into HDFS with in 10 min.


 
 


On Sep 10, 2012, at 2:40 AM, prabhu K prabhu.had...@gmail.com wrote:







Hi Users,


 


Thanks for the response.


 


We have loaded 100 GB of data into HDFS; it took 1 hr with the configuration below.
Each node (1 master machine, 2 slave machines):

1. 500 GB hard disk.
2. 4 GB RAM.
3. 3 quad-core CPUs.
4. Speed 1333 MHz.

Now, we are planning to load 1 petabyte of data (a single file) into Hadoop HDFS 
and a Hive table within 10-20 minutes. For this we need clarification on the points below.

Ok...


 


Some say that I am sometimes too harsh in my criticisms so take what I say with 
a grain of salt...


 


You loaded 100GB in an hour using woefully underperforming hardware and are now 
saying you want to load 1PB in 10 mins.


 


I would strongly suggest that you first learn more about Hadoop.  No really. 
Looking at your first machine, its obvious that you don't really grok hadoop 
and what it requires to achieve optimum performance.  You couldn't even 
extrapolate
 any meaningful data from your current environment.


 


Secondly, I think you need to actually think about the problem. Did you mean PB 
or TB? Because your math seems to be off by a couple orders of magnitude. 


 


A single file measured in PBs? That is currently impossible using today (2012) 
technology. In fact a single file that is measured in PBs wouldn't exist within 
the next 5 years and most likely the next decade. [Moore's law is all about CPU
 power, not disk density.]


 


Also take a look at networking. 


ToR switch design differs, however current technology, the fabric tends to max 
out at 40GBs.  What's the widest fabric on a backplane? 


That's your first bottleneck because even if you had a 1PB of data, you 
couldn't feed it to the cluster fast enough. 


 


Forget disk. look at PCIe based memory. (Money no object, right? ) 


You still couldn't populate it fast enough.


 


I guess Steve hit this nail on the head when he talked about this being a 
homework assignment. 


 


High school maybe? 


 








1. What system configuration setup is required for all the 3 machines?
2. Hard disk size.
3. RAM size.
4. Motherboard.
5. Network cable.
6. How much Gbps InfiniBand is required?
For the same setup do we need a cloud computing environment too?
Please suggest and help me on this.
Thanks,
Prabhu.


On Fri, Sep 7, 2012 at 7:30 PM, Michael Segel michael_se...@hotmail.com wrote:
Sorry, but you didn't account for the network saturation.



And why 1GBe and not 10GBe? Also which version of hadoop?



Here MapR works well with bonding two 10GBe ports and with the right switch, 
you could do ok.

Also 2 ToR switches... per rack. etc...



How many machines? 150? 300? more?



Then you don't talk about how much memory, CPUs, what type of storage...



Lots of factors.



I'm sorry to interrupt this mental masturbation about how to load 1PB in 10min.

There is a lot more questions that should be asked that weren't.



Hey but look. Its a Friday, so I suggest some pizza, beer and then take it to a 
white board.



But what do I know? In a different thread, I'm talking about how to tame HR and 
Accounting so they let me play with my team Ninja!

:-P




On Sep 5, 2012, at 9:56 AM, zGreenfelder zgreenfel...@gmail.com wrote:



 On Wed, Sep 5, 2012 at 10:43 AM, Cosmin Lehene cleh...@adobe.com wrote:

 Here's an extremely naïve ballpark estimation: at theoretical hardware

 speed, for 3PB representing 1PB with 3x replication



 Over a single 1Gbps connection (and I'm not sure, you can actually reach

 1Gbps)

 (3 petabytes) / (1 Gbps) = 291.27 days



 So you'd need at least 40,000 1Gbps network cards to get that in 10 minutes

 :) - (3PB/1Gbps)/4



 The actual number of nodes would depend a lot on the actual network

 architecture, the type of storage you use (SSD,  HDD), etc.



 Cosmin



 ah, I went the other direction with the math, and assumed no

 replication (completely unsafe and never reasonable for a real,

 production environment, but since we're all theory and just looking

 for starting point numbers)





 1PB in 10 min ==

 1,000,000gB in 10 min

Record seperator

2012-09-10 Thread Siddharth Tiwari

Hi list,

Out of context: has anyone encountered a record-separator delimiter problem? I 
have a log file in which each record is separated by the RECORD SEPARATOR 
delimiter ( ^^ ); can anyone help me with how I can use it as the delimiter?

Thanks 

**

Cheers !!!

Siddharth Tiwari

Have a refreshing day !!!
Every duty is holy, and devotion to duty is the highest form of worship of 
God.” 

Maybe other people will try to limit me but I don't limit myself
  

RE: Record seperator

2012-09-10 Thread Siddharth Tiwari

Hi Harsh,
I am using CDH3U4. 
The records are separated by the following ASCII character: ^^ (caret notation) = 
decimal 30 = hex 1E = RS ( ␞ , Record Separator).


I did not understand what you intend me to do so that I can use this one. 
Thanks

**

Cheers !!!

Siddharth Tiwari

Have a refreshing day !!!
Every duty is holy, and devotion to duty is the highest form of worship of 
God.” 

Maybe other people will try to limit me but I don't limit myself


 From: ha...@cloudera.com
 Date: Tue, 11 Sep 2012 01:11:16 +0530
 Subject: Re: Record seperator
 To: user@hadoop.apache.org
 
 What version of Hadoop are you using? Via
 https://issues.apache.org/jira/browse/MAPREDUCE-2254 you can simply
 set your custom delimiter string as a configuration option, and get
 this to work right out of the box.
 
 On Tue, Sep 11, 2012 at 12:59 AM, Siddharth Tiwari
 siddharth.tiw...@live.com wrote:
  Hi list,
 
  Out of context, does any one encountered record separator delimiter problem.
  I have a log file in which each record is separated using  RECORD SEPERATOR
  delimiter ( ^^ ) , can any one help me on this on how can I use it as
  delimiter ?
 
  Thanks
 
 
  **
  Cheers !!!
  Siddharth Tiwari
  Have a refreshing day !!!
  Every duty is holy, and devotion to duty is the highest form of worship of
  God.”
  Maybe other people will try to limit me but I don't limit myself
 
 
 
 -- 
 Harsh J
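
(For reference, a minimal sketch of what Harsh describes: on releases carrying MAPREDUCE-2254, TextInputFormat honours the textinputformat.record.delimiter property, so the RS byte can be declared as the record boundary. Whether the CDH3u4 build in question includes that change is something to verify; the snippet below only shows the configuration call.)

    import org.apache.hadoop.conf.Configuration;

    public class RsDelimiterExample {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // ASCII 0x1E (decimal 30) is the RS control character shown as ^^ above.
            conf.set("textinputformat.record.delimiter", "\u001E");
            System.out.println("delimiter length = "
                    + conf.get("textinputformat.record.delimiter").length());
        }
    }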
  

Hadoop and MainFrame integration

2012-08-28 Thread Siddharth Tiwari

Hi Users.

We have flat files on mainframes with around a billion records. We need to sort 
them and then use them with different jobs on the mainframe for report generation. 
I was wondering whether there is any way I could integrate the mainframe with 
Hadoop, do the sorting there, and keep the file on the server itself (I do not want 
to FTP the file to a Hadoop cluster and then FTP the sorted file back to the 
mainframe, as that would waste MIPS and nullify the advantage). This way I could 
save on MIPS and ultimately improve profitability. 

Thank you in advance

**

Cheers !!!

Siddharth Tiwari

Have a refreshing day !!!
Every duty is holy, and devotion to duty is the highest form of worship of 
God.” 

Maybe other people will try to limit me but I don't limit myself
  

RE: Reading multiple lines from a microsoft doc in hadoop

2012-08-25 Thread Siddharth Tiwari

Can anybody enlighten me on what could be wrong?

**

Cheers !!!

Siddharth Tiwari

Have a refreshing day !!!
Every duty is holy, and devotion to duty is the highest form of worship of 
God.” 

Maybe other people will try to limit me but I don't limit myself


From: siddharth.tiw...@live.com
To: user@hadoop.apache.org; bejoy.had...@gmail.com; bejoy...@yahoo.com
Subject: RE: Reading multiple lines from a microsoft doc in hadoop
Date: Sat, 25 Aug 2012 05:35:48 +





Any help on the below would be really appreciated. I am stuck with it.

**

Cheers !!!

Siddharth Tiwari

Have a refreshing day !!!
Every duty is holy, and devotion to duty is the highest form of worship of 
God.” 

Maybe other people will try to limit me but I don't limit myself


From: siddharth.tiw...@live.com
To: user@hadoop.apache.org; bejoy.had...@gmail.com; bejoy...@yahoo.com
Subject: RE: Reading multiple lines from a microsoft doc in hadoop
Date: Fri, 24 Aug 2012 20:23:45 +





Hi ,

Can anyone please help ?

Thank you in advance

**

Cheers !!!

Siddharth Tiwari

Have a refreshing day !!!
Every duty is holy, and devotion to duty is the highest form of worship of 
God.” 

Maybe other people will try to limit me but I don't limit myself


From: siddharth.tiw...@live.com
To: user@hadoop.apache.org; bejoy.had...@gmail.com; bejoy...@yahoo.com
Subject: RE: Reading multiple lines from a microsoft doc in hadoop
Date: Fri, 24 Aug 2012 16:22:57 +





Hi Team,

Thanks a lot for so many good suggestions. I wrote a custom InputFormat for 
reading one paragraph at a time, but when I use it I still get single lines as 
records. Can you please suggest what changes I must make to read one paragraph at 
a time, separated by blank lines?
Below is the code I wrote:


import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;
import org.apache.hadoop.util.LineReader;

/**
 * @author 460615
 */
// FileInputFormat is the base class for all file-based InputFormats
public class ParaInputFormat extends FileInputFormat<LongWritable, Text> {

    private String nullRegex = "^\\s*$";
    public String StrLine = null;

    /*
    public RecordReader<LongWritable, Text> getRecordReader(InputSplit genericSplit,
            JobConf job, Reporter reporter) throws IOException {
        reporter.setStatus(genericSplit.toString());
        return new ParaInputFormat(job, (FileSplit) genericSplit);
    }
    */

    public RecordReader<LongWritable, Text> createRecordReader(InputSplit genericSplit,
            TaskAttemptContext context) throws IOException {
        context.setStatus(genericSplit.toString());
        return new LineRecordReader();
    }

    public InputSplit[] getSplits(JobContext job, Configuration conf) throws IOException {
        ArrayList<FileSplit> splits = new ArrayList<FileSplit>();
        for (FileStatus status : listStatus(job)) {
            Path fileName = status.getPath();
            if (status.isDir()) {
                throw new IOException("Not a file: " + fileName);
            }
            FileSystem fs = fileName.getFileSystem(conf);
            LineReader lr = null;
            try {
                FSDataInputStream in = fs.open(fileName);
                lr = new LineReader(in, conf);
                // String regexMatch = in.readLine();
                Text line = new Text();
                long begin = 0;
                long length = 0;
                int num = -1;
                String boolTest = null;
                boolean match = false;
                Pattern p = Pattern.compile(nullRegex);
                // Matcher matcher = new p.matcher();
                while ((boolTest = in.readLine()) != null
                        && (num = lr.readLine(line)) > 0
                        && !(in.readLine().isEmpty())) {
                    // numLines++;
                    length += num;
                    splits.add(new FileSplit(fileName, begin, length, new String[] {}));
                }
                begin = length;
            } finally {
                if (lr != null) {
                    lr.close();
                }
            }
        }
        return splits.toArray(new FileSplit[splits.size()]);
    }
}
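
(For reference, on Hadoop releases that support a configurable record delimiter via textinputformat.record.delimiter, the paragraph-per-record behaviour can be had without a custom InputFormat by declaring the blank line as the boundary. A minimal driver sketch, with an identity map and placeholder paths; whether your release carries that option needs to be checked.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class ParagraphRecordsDriver {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Two consecutive newlines (an empty line) end a record, so each
            // map() call receives one whole paragraph as its value.
            conf.set("textinputformat.record.delimiter", "\n\n");

            Job job = Job.getInstance(conf, "paragraph-records");
            job.setJarByClass(ParagraphRecordsDriver.class);
            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputKeyClass(LongWritable.class);
            job.setOutputValueClass(Text.class);
            job.setNumReduceTasks(0); // identity map only, to show the splitting
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }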




**

Cheers !!!

Siddharth Tiwari

Have a refreshing day !!!
Every duty is holy, and devotion to duty is the highest form of worship of 
God.” 

Maybe other people will try to limit me but I don't limit myself


 Date: Fri, 24 Aug 2012 09:54:10 +0200
 Subject: Re: Reading multiple lines from a microsoft doc in hadoop
 From: haavard.kongsga...@gmail.com
 To: user@hadoop.apache.org
 
 Hi, maybe you should check out the old nutch project
 http

RE: Reading multiple lines from a microsoft doc in hadoop

2012-08-24 Thread Siddharth Tiwari
Hi,
Thank you for the suggestion. Actually I was using POI to extract text, but since I 
now have so many documents I thought I would use Hadoop to parse them directly as 
well. The average size of each document is around 120 KB. Also, I want to read 
multiple lines from the text until I find a blank line. I do not have any idea how 
to design a custom InputFormat and RecordReader. Please help with some tutorial, 
code or resource around it; I am struggling with the issue. I will be highly 
grateful. Thank you so much once again.

 Date: Fri, 24 Aug 2012 08:07:39 +0200
 Subject: Re: Reading multiple lines from a microsoft doc in hadoop
 From: haavard.kongsga...@gmail.com
 To: user@hadoop.apache.org
 
 It's much easier if you convert the documents to text first
 
 use
 http://tika.apache.org/
 
 or some other doc parser
 
 
 -Håvard
 
 On Fri, Aug 24, 2012 at 7:52 AM, Siddharth Tiwari
 siddharth.tiw...@live.com wrote:
  hi,
  I have doc files in msword doc and docx format. These have entries which are
  seperated by an empty line. Is it possible for me to read
  these lines separated from empty lines at a time. Also which inpurformat
  shall I use to read doc docx. Please help
 
  **
  Cheers !!!
  Siddharth Tiwari
  Have a refreshing day !!!
  Every duty is holy, and devotion to duty is the highest form of worship of
  God.”
  Maybe other people will try to limit me but I don't limit myself
 
 
 
 -- 
 Håvard Wahl Kongsgård
 Faculty of Medicine 
 Department of Mathematical Sciences
 NTNU
 
 http://havard.security-review.net/
  

RE: namenode not starting

2012-08-24 Thread Siddharth Tiwari

Hi Abhay,

I totally agree with Bejoy. Can you paste your mapred-site.xml and 
hdfs-site.xml content here?

**

Cheers !!!

Siddharth Tiwari

Have a refreshing day !!!
Every duty is holy, and devotion to duty is the highest form of worship of 
God.” 

Maybe other people will try to limit me but I don't limit myself


 From: lle...@ddn.com
 To: user@hadoop.apache.org
 Subject: RE: namenode not starting
 Date: Fri, 24 Aug 2012 16:38:01 +
 
 Abhay,
   Sounds like your namenode cannot find the metadata information it needs to 
 start (the path/current | image | *checkpoints etc)
 
   Basically, if you cannot locate that data locally or on your NFS Server,  
 your cluster is busted.
 
  But, let us be optimistic about this. 
 
  There is a chance that your NFS Server is down or the path mounted is lost.
 
   If it is NFS mounted (as you suggested) check that your host still have 
 that path mounted. (from the proper NFS Server)
   ( [shell] mount ) can tell. 
   * obviously if you originally mounted from foo:/mydata  and now do 
 bar:/mydata /you'll need to do some digging to find which NFS server it 
 was writing to before.
 
  Failing to locate your namenode metadata (locally or on any of your NFS 
 Server)  either because the NFS Server decided to become a blackhole, or 
 someone|thing removed it.
 
   And you don't have a backup of your namenode (tape or Secondary Namenode),  
   I think you are in a world of hurt there.
 
   In theory you can read the blocks on the DN and try to recover some of your 
 data (assume not in CODEC / compressed) .
 Humm.. anyone knows about recovery services? (^^)
 
 
 
 -Original Message-
 From: Håvard Wahl Kongsgård [mailto:haavard.kongsga...@gmail.com] 
 Sent: Friday, August 24, 2012 5:38 AM
 To: user@hadoop.apache.org
 Subject: Re: namenode not starting
 
 You should start with a reboot of the system.
 
 A lesson to everyone, this is exactly why you should have a secondary name 
 node 
 (http://wiki.apache.org/hadoop/FAQ#What_is_the_purpose_of_the_secondary_name-node.3F)
 and run the namenode a mirrored RAID-5/10 disk.
 
 
 -Håvard
 
 
 
 On Fri, Aug 24, 2012 at 9:40 AM, Abhay Ratnaparkhi 
 abhay.ratnapar...@gmail.com wrote:
  Hello,
 
  I was using cluster for long time and not formatted the namenode.
  I ran bin/stop-all.sh and bin/start-all.sh scripts only.
 
  I am using NFS for dfs.name.dir.
  hadoop.tmp.dir is a /tmp directory. I've not restarted the OS.  Any 
  way to recover the data?
 
  Thanks,
  Abhay
 
 
  On Fri, Aug 24, 2012 at 1:01 PM, Bejoy KS bejoy.had...@gmail.com wrote:
 
  Hi Abhay
 
  What is the value for hadoop.tmp.dir or dfs.name.dir . If it was set 
  to /tmp the contents would be deleted on a OS restart. You need to 
  change this location before you start your NN.
  Regards
  Bejoy KS
 
  Sent from handheld, please excuse typos.
  
  From: Abhay Ratnaparkhi abhay.ratnapar...@gmail.com
  Date: Fri, 24 Aug 2012 12:58:41 +0530
  To: user@hadoop.apache.org
  ReplyTo: user@hadoop.apache.org
  Subject: namenode not starting
 
  Hello,
 
  I had a running hadoop cluster.
  I restarted it and after that namenode is unable to start. I am 
  getting error saying that it's not formatted. :( Is it possible to 
  recover the data on HDFS?
 
  2012-08-24 03:17:55,378 ERROR
  org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem 
  initialization failed.
  java.io.IOException: NameNode is not formatted.
  at
  org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:434)
  at
  org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:110)
  at
  org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:291)
  at
  org.apache.hadoop.hdfs.server.namenode.FSNamesystem.init(FSNamesystem.java:270)
  at
  org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:271)
  at
  org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:303)
  at
  org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:433)
  at
  org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:421)
  at
  org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1359)
  at
  org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:13
  68)
  2012-08-24 03:17:55,380 ERROR
  org.apache.hadoop.hdfs.server.namenode.NameNode: java.io.IOException:
  NameNode is not formatted.
  at
  org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:434)
  at
  org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:110)
  at
  org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:291)
  at
  org.apache.hadoop.hdfs.server.namenode.FSNamesystem.init(FSNamesystem.java:270
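
(For reference, the recurring advice in this thread reduces to keeping the NameNode metadata out of /tmp and writing it to more than one directory, one of them typically an NFS mount. A minimal hdfs-site.xml sketch using that era's property name; both paths are placeholders.)

<property>
  <name>dfs.name.dir</name>
  <value>/data/1/dfs/nn,/mnt/nfs/dfs/nn</value>
</property>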

RE: How do we view the blocks of a file in HDFS

2012-08-24 Thread Siddharth Tiwari

Hi Abhishek,

You can use fsck for this purpose

hadoop fsck <HDFS directory> -files -blocks -locations   --- displays what you want

**

Cheers !!!

Siddharth Tiwari

Have a refreshing day !!!
Every duty is holy, and devotion to duty is the highest form of worship of 
God.” 

Maybe other people will try to limit me but I don't limit myself


From: abhisheksgum...@gmail.com
Date: Fri, 24 Aug 2012 22:10:37 +0530
Subject: How do we view the blocks of a file in HDFS
To: user@hadoop.apache.org

hi,
   If I push a file into HDFS running on a 4-node cluster with 1 namenode and 
3 datanodes, how can I view where on the datanodes the blocks of this file are? I 
would like to view the blocks and their replicas individually. How can I do this?

The answer is very critical for my current task, which is halted :) A detailed 
answer will be highly appreciated. Thank you!

With Regards,
Abhishek S
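
(An example invocation with a placeholder path is below; each block it lists also exists on the reported DataNodes as a blk_<id> file under the configured data directory, e.g. somewhere below dfs.data.dir/current in that era's layout.)

    hadoop fsck /user/abhishek/data.txt -files -blocks -locations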

  

RE: Reading multiple lines from a microsoft doc in hadoop

2012-08-24 Thread Siddharth Tiwari

Any help on the below would be really appreciated. I am stuck with it.

**

Cheers !!!

Siddharth Tiwari

Have a refreshing day !!!
Every duty is holy, and devotion to duty is the highest form of worship of 
God.” 

Maybe other people will try to limit me but I don't limit myself


From: siddharth.tiw...@live.com
To: user@hadoop.apache.org; bejoy.had...@gmail.com; bejoy...@yahoo.com
Subject: RE: Reading multiple lines from a microsoft doc in hadoop
Date: Fri, 24 Aug 2012 20:23:45 +





Hi ,

Can anyone please help ?

Thank you in advance

**

Cheers !!!

Siddharth Tiwari

Have a refreshing day !!!
Every duty is holy, and devotion to duty is the highest form of worship of 
God.” 

Maybe other people will try to limit me but I don't limit myself


From: siddharth.tiw...@live.com
To: user@hadoop.apache.org; bejoy.had...@gmail.com; bejoy...@yahoo.com
Subject: RE: Reading multiple lines from a microsoft doc in hadoop
Date: Fri, 24 Aug 2012 16:22:57 +





Hi Team,

Thanks a lot for so many good suggestions. I wrote a custom InputFormat for 
reading one paragraph at a time, but when I use it I still get single lines as 
records. Can you please suggest what changes I must make to read one paragraph at 
a time, separated by blank lines?
Below is the code I wrote:


import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;
import org.apache.hadoop.util.LineReader;

/**
 * @author 460615
 */
// FileInputFormat is the base class for all file-based InputFormats
public class ParaInputFormat extends FileInputFormat<LongWritable, Text> {

    private String nullRegex = "^\\s*$";
    public String StrLine = null;

    /*
    public RecordReader<LongWritable, Text> getRecordReader(InputSplit genericSplit,
            JobConf job, Reporter reporter) throws IOException {
        reporter.setStatus(genericSplit.toString());
        return new ParaInputFormat(job, (FileSplit) genericSplit);
    }
    */

    public RecordReader<LongWritable, Text> createRecordReader(InputSplit genericSplit,
            TaskAttemptContext context) throws IOException {
        context.setStatus(genericSplit.toString());
        return new LineRecordReader();
    }

    public InputSplit[] getSplits(JobContext job, Configuration conf) throws IOException {
        ArrayList<FileSplit> splits = new ArrayList<FileSplit>();
        for (FileStatus status : listStatus(job)) {
            Path fileName = status.getPath();
            if (status.isDir()) {
                throw new IOException("Not a file: " + fileName);
            }
            FileSystem fs = fileName.getFileSystem(conf);
            LineReader lr = null;
            try {
                FSDataInputStream in = fs.open(fileName);
                lr = new LineReader(in, conf);
                // String regexMatch = in.readLine();
                Text line = new Text();
                long begin = 0;
                long length = 0;
                int num = -1;
                String boolTest = null;
                boolean match = false;
                Pattern p = Pattern.compile(nullRegex);
                // Matcher matcher = new p.matcher();
                while ((boolTest = in.readLine()) != null
                        && (num = lr.readLine(line)) > 0
                        && !(in.readLine().isEmpty())) {
                    // numLines++;
                    length += num;
                    splits.add(new FileSplit(fileName, begin, length, new String[] {}));
                }
                begin = length;
            } finally {
                if (lr != null) {
                    lr.close();
                }
            }
        }
        return splits.toArray(new FileSplit[splits.size()]);
    }
}




**

Cheers !!!

Siddharth Tiwari

Have a refreshing day !!!
Every duty is holy, and devotion to duty is the highest form of worship of 
God.” 

Maybe other people will try to limit me but I don't limit myself


 Date: Fri, 24 Aug 2012 09:54:10 +0200
 Subject: Re: Reading multiple lines from a microsoft doc in hadoop
 From: haavard.kongsga...@gmail.com
 To: user@hadoop.apache.org
 
 Hi, maybe you should check out the old nutch project
 http://nutch.apache.org/ (hadoop was developed for nutch).
 It's a web crawler and indexer, but the malinglists hold much info
 doc/pdf parsing which also relates to hadoop.
 
 Have never parsed many docx or doc files, but it should be
 strait-forward. But generally for text analysis preprocessing is the
 KEY! For example replace dual lines \r\n\r\n or (\n\n) with  is a
 simple trick)
 
 
 -Håvard
 
 On Fri, Aug 24, 2012 at 9:30 AM, Siddharth Tiwari
 siddharth.tiw...@live.com wrote:
  Hi,
  Thank you

Reading multiple lines from a microsoft doc in hadoop

2012-08-23 Thread Siddharth Tiwari

hi,
I have documents in MS Word doc and docx format. These have entries which are 
separated by an empty line. Is it possible for me to read these blank-line-separated 
entries one at a time? Also, which InputFormat shall I use to read doc/docx? Please help

**

Cheers !!!

Siddharth Tiwari

Have a refreshing day !!!
Every duty is holy, and devotion to duty is the highest form of worship of 
God.” 

Maybe other people will try to limit me but I don't limit myself
  

Streaming issue ( URGENT )

2012-08-20 Thread Siddharth Tiwari

Hi team,

I have a Python script which normally runs like this locally:

    python mapper.py file1 file2 2

How can I achieve this using the streaming API, with the script as the mapper? It 
actually joins the three files on a column which is passed as a (numeric) parameter.

Also, how can I use the paste command in a mapper to concatenate three files?

For example: paste file1 file2 file3 > file4

This is in a normal shell; how do I achieve it over streaming?

If possible, please explain how I can achieve it using multiple mappers and one 
reducer. It would be great if I could get some examples; I tried searching a lot :(

Thanks in advance, please help

**

Cheers !!!

Siddharth Tiwari

Have a refreshing day !!!
Every duty is holy, and devotion to duty is the highest form of worship of 
God.” 

Maybe other people will try to limit me but I don't limit myself
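
(For reference, a minimal sketch of the streaming mechanics being asked about; the jar path varies by distribution, and the HDFS paths, file names and trailing column argument are placeholders. -file ships the listed local files into each task's working directory, so the script can open file2 and file3 by name, while the records of the -input path arrive on the mapper's stdin, which is how streaming feeds a mapper; the script would need to read its primary input that way.)

    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
        -file mapper.py -file file2 -file file3 \
        -input /user/siddharth/file1 \
        -output /user/siddharth/joined \
        -mapper "python mapper.py file2 file3 2" \
        -numReduceTasks 1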
  

RE: hi facing issue with mysql in

2012-08-20 Thread Siddharth Tiwari

The description is quite vague. 

**

Cheers !!!

Siddharth Tiwari

Have a refreshing day !!!
Every duty is holy, and devotion to duty is the highest form of worship of 
God.” 

Maybe other people will try to limit me but I don't limit myself


From: rahul.rec@gmail.com
Date: Mon, 20 Aug 2012 22:00:19 +0530
Subject: Re: hi facing issue with mysql in
To: user@hadoop.apache.org

This list, I believe, is for Hadoop users.


On Mon, Aug 20, 2012 at 9:58 PM, rahul p rahulpoolancha...@gmail.com wrote:


Hi,
Please help me set up my MySQL.
It is giving a permission issue.



  

RE: Streaming Issue

2012-08-19 Thread Siddharth Tiwari
Hi Mohit,
The script normally runs like this locally:
    python mapper.py file1 file2 file3
How can I achieve this?
Also, how can I use the paste command in a mapper?
E.g., paste file1 file2 file3 > file4
This is in a normal shell; how do I achieve it over streaming?
Thanks in advance, please help

Date: Sun, 19 Aug 2012 13:42:20 -0700
Subject: Re: FW: Streaming Issue
From: mohitanch...@gmail.com
To: user@hadoop.apache.org

Are you looking for something like this?
 
hadoop jar hadoop-streaming.jar -input 'file1 -input file2


On Sun, Aug 19, 2012 at 11:16 AM, Siddharth Tiwari siddharth.tiw...@live.com 
wrote:










Hi Friends,

Can you please suggest me how can I pass 3 files as parameters to the mapper 
written in python in hadoop streaming API, which will process data from this 
three different files . Please help.

 


**
Cheers !!!

Siddharth Tiwari
Have a refreshing day !!!

Every duty is holy, and devotion to duty is the highest form of worship of 
God.” 
Maybe other people will try to limit me but I don't limit myself