Multiple data node and namenode ?

2013-07-25 Thread Manish Bhoge
I am configuring a cluster and starting with the first machine, so I have configured 
core-site, hdfs-site and mapred-site to run Hadoop on only one machine. But 
somehow I am getting two copies of the datanode, namenode and secondary namenode. 
I am not able to figure out why.



SecurityAuth-hdfs.audit 0 bytes Jul 17, 2013 4:34:56 AM
hadoop-hdfs-datanode-INBEDU011997A.log 243358 bytes Jul 25, 2013 2:29:15 AM
hadoop-hdfs-datanode-myhost-1.log 1893692 bytes Jul 25, 2013 2:50:42 AM
hadoop-hdfs-namenode-INBEDU011997A.log 19545397 bytes Jul 25, 2013 2:50:40 AM
hadoop-hdfs-namenode-myhost-1.log 2438271 bytes Jul 25, 2013 2:48:57 AM
hadoop-hdfs-secondarynamenode-INBEDU011997A.log 4061047 bytes Jul 25, 2013 2:30:10 AM
hadoop-hdfs-secondarynamenode-myhost-1.log 9707957 bytes Jul 25, 2013 2:46:18 AM

Re: Multiple data node and namenode ?

2013-07-25 Thread Manish Bhoge
Unfortunately, I don't have jps installed. But as you can see, it shows two 
datanodes.
 hadoop-hdfs-datanode-INBEDU011997A.log 243358 bytes Jul 25, 2013 2:29:15 AM 
hadoop-hdfs-datanode-myhost-1.log 1893692 bytes Jul 25, 2013 2:50:42 AM


Is it because I have given three directories under dfs.data.dir in hdfs-site.xml?
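
For reference, the directories are listed under a single dfs.data.dir value in 
hdfs-site.xml; a quick way to check how many are configured (the conf path below 
is just an example, adjust to your install):

  $ grep -A 1 'dfs.data.dir' /etc/hadoop/conf/hdfs-site.xml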



 From: Devaraj k devara...@huawei.com
To: common-user@hadoop.apache.org common-user@hadoop.apache.org 
Sent: Thursday, 25 July 2013 12:41 PM
Subject: RE: Multiple data node and namenode ?
 

Hi Manish,

  Can you check how many datanode processes are actually running on the machine, 
using the 'jps' or 'ps' command?
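
For example, something along these lines (run with sudo so jps can see daemons 
owned by other users; the output will of course differ on your machine):

  $ sudo jps | grep -iE 'NameNode|DataNode'
  $ ps -ef | grep -i '[d]atanode'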

Thanks
Devaraj k


-Original Message-
From: Manish Bhoge [mailto:manishbh...@rocketmail.com] 
Sent: 25 July 2013 12:29
To: common-user@hadoop.apache.org
Subject: Multiple data node and namenode ?

I am configuring a cluster and starting with the first machine, so I have configured 
core-site, hdfs-site and mapred-site to run Hadoop on only one machine. But 
somehow I am getting two copies of the datanode, namenode and secondary namenode. 
I am not able to figure out why.



SecurityAuth-hdfs.audit 0 bytes Jul 17, 2013 4:34:56 AM 
hadoop-hdfs-datanode-INBEDU011997A.log 243358 bytes Jul 25, 2013 2:29:15 AM 
hadoop-hdfs-datanode-myhost-1.log 1893692 bytes Jul 25, 2013 2:50:42 AM 
hadoop-hdfs-namenode-INBEDU011997A.log 19545397 bytes Jul 25, 2013 2:50:40 AM 
hadoop-hdfs-namenode-myhost-1.log 2438271 bytes Jul 25, 2013 2:48:57 AM 
hadoop-hdfs-secondarynamenode-INBEDU011997A.log 4061047 bytes Jul 25, 2013 2:30:10 AM 
hadoop-hdfs-secondarynamenode-myhost-1.log 9707957 bytes Jul 25, 2013 2:46:18 AM

Re: REg - Hive

2012-10-19 Thread Manish Bhoge
Check which metastore you used when you configured Hive on your Hadoop cluster. 
If you have MySQL as the metastore then you are done: you can go to the metastore 
tables to pull the attribute information.
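
For example, with a MySQL metastore, something along these lines pulls the column 
(attribute) names and types of a table straight from the metastore tables. The 
database name 'metastore', the user, and the made-up table name 'locations' are 
placeholders, and the TBLS/SDS/COLUMNS_V2 join assumes a Hive 0.8-or-later schema 
(older releases keep columns in a COLUMNS table keyed by SD_ID):

  $ mysql -u hiveuser -p metastore -e "
      SELECT t.TBL_NAME, c.COLUMN_NAME, c.TYPE_NAME
        FROM TBLS t
        JOIN SDS s        ON t.SD_ID = s.SD_ID
        JOIN COLUMNS_V2 c ON s.CD_ID = c.CD_ID
       WHERE t.TBL_NAME = 'locations';"

Note that the metastore only holds column names and types, not the data values, 
so mapping a value such as New York back to the columns it appears in still needs 
a query over the table data itself.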


Sent from my BlackBerry, pls excuse typo

-Original Message-
From: sudha sadhasivam sudhasadhasi...@yahoo.com
Date: Fri, 19 Oct 2012 03:13:54 
To: common-ser-group <common-user@hadoop.apache.org>; hadoop core 
developer <common-...@hadoop.apache.org>
Reply-To: common-user@hadoop.apache.org
Cc: hadoop users <core-u...@hadoop.apache.org>
Subject: REg - Hive

Sir

In our system, we need a Hive query to retrieve the field name where a 
particular data item belongs. For example, if we have a table containing location 
information, when queried for New York, the query should return the field names 
where New York occurs, such as City, Administrative division, etc.

Kindly inform whether it is possible to retrieve metadata information for a 
particular field value in Hive.
Thanking you
G Sudha



Re: Hadoop Admin

2012-07-15 Thread Manish Bhoge
The Hadoop in Action book can give you good hands-on admin experience. It is 
freely available on the net.
Sent from my BlackBerry, pls excuse typo

-Original Message-
From: prabhu K prabhu.had...@gmail.com
Date: Sun, 15 Jul 2012 17:48:03 
To: common-user@hadoop.apache.org
Reply-To: common-user@hadoop.apache.org
Subject: Hadoop Admin

Hi Users,

Can anyone please provide me with in-depth Hadoop administrator related web
links and PPTs?

Thanks,
Prabhu.





Re: How to load raw log file into HDFS?

2012-05-14 Thread Manish Bhoge
You first need to copy the data to HDFS using copyFromLocal, and then you can 
use Pig and Hive programs, which run on MapReduce, for further analysis. Yes, the 
warehouse directory is in HDFS. If you want to run (test) Pig in local mode, then 
you don't need to copy the data to HDFS.
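
For example (the paths and script name below are purely illustrative):

  $ hadoop fs -mkdir /user/hadoop/logs
  $ hadoop fs -copyFromLocal /var/log/myapp/access.log /user/hadoop/logs/
  $ pig -x mapreduce analysis.pig   # reads the files from HDFS
  $ pig -x local analysis.pig       # local test run on local files, no HDFS copy needed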
Sent from my BlackBerry, pls excuse typo

-Original Message-
From: Michael Wang michael.w...@meredith.com
Date: Mon, 14 May 2012 18:43:47 
To: common-user@hadoop.apache.org <common-user@hadoop.apache.org>
Reply-To: common-user@hadoop.apache.org
Subject: RE: How to load raw log file into HDFS?

I have the same question and I am glad to get your help. I am also a novice 
in Hadoop :)
I am using Pig and Hive to analyze the logs. My logs are in LOCAL_FILE_PATH. 
Do I need to use hadoop fs -copyFromLocal to put the files into HDFS_FILE_PATH 
first, and then load the data files into Pig or Hive from HDFS_FILE_PATH? Or can 
I just load the logs from LOCAL_FILE_PATH directly into Pig or Hive? After I load 
the files into Hive, I found they are put under /user/hive/warehouse. Is 
/user/hive/warehouse an HDFS path?
How do I know which HDFS_FILE_PATHs are available? 
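
For what it's worth, listing HDFS shows which paths exist, and /user/hive/warehouse 
is indeed an HDFS path by default (the exact URIs depend on your fs.default.name 
and Hive configuration):

  $ hadoop fs -ls /
  $ hadoop fs -ls /user/hive/warehouse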

-Original Message-
From: Alexander Fahlke [mailto:alexander.fahlke.mailingli...@googlemail.com] 
Sent: Monday, May 14, 2012 1:53 AM
To: common-user@hadoop.apache.org
Subject: Re: How to load raw log file into HDFS?

Hi,

the best would be to read the documentation and some books to get familiar
with Hadoop.

One of my favourite books is Hadoop in Action from Manning
(http://www.manning.com/lam/).
This book has an example of putting (log) files into HDFS. Check out
source listing 3-1.

Later you can also check out Cloudera's Flume:
https://github.com/cloudera/flume/wiki

-- 
BR

Alexander Fahlke
Java Developer
www.nurago.com | www.fahlke.org


On Mon, May 14, 2012 at 7:24 AM, Amith D K amit...@huawei.com wrote:

 You can even use put/copyFromLocal.

 Both are similar and do the job via the terminal.

 Or you can write a simple client program to do the job :)

 Amith


 
 From: samir das mohapatra [samir.help...@gmail.com]
 Sent: Sunday, May 13, 2012 9:13 PM
 To: common-user@hadoop.apache.org
 Subject: Re: How to load raw log file into HDFS?

 Hi,
 To load any file from local:

  syntax:  hadoop fs -copyFromLocal LOCAL_FILE_PATH HDFS_FILE_PATH
  example: hadoop fs -copyFromLocal input/logs hdfs://localhost/user/dataset/

 More commands:
 http://hadoop.apache.org/common/docs/r0.17.1/hdfs_shell.html


 On Sun, May 13, 2012 at 9:53 AM, AnExplorer satishtha...@gmail.com
 wrote:

 
  Hi, I am a novice in Hadoop. Kindly suggest how we can load log files into
  HDFS.
  Please suggest the command and steps.
  Thanks in advance!!
  --
  View this message in context:
 
 http://old.nabble.com/How-to-load-raw-log-file-into-HDFS--tp33815208p33815208.html
  Sent from the Hadoop core-user mailing list archive at Nabble.com.
 
 





Re: Hive Thrift help

2012-04-17 Thread Manish Bhoge
Michael,

Out of the box, I am taking this as a metadata problem, as we also faced the same 
kind of issue when connecting Tableau with Hive, and the problem was identified 
in the metastore setup. If your metadata is in the default Apache DB, i.e. Derby, 
then the JDBC connection doesn't work. As a workaround, we stored the Hive 
metadata in MySQL and configured hive-site.xml accordingly, and now we're able 
to establish the JDBC connection.
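
Roughly, these are the kind of properties that go inside <configuration> in 
hive-site.xml for a MySQL metastore (the URL, database name, user and password 
below are placeholders, not actual values):

  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/metastore?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hiveuser</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hivepassword</value>
  </property>

The MySQL JDBC driver jar also needs to be on Hive's classpath (for example in 
Hive's lib directory).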

Thank you,

Sent from my BlackBerry, pls excuse typo

-Original Message-
From: Michael Wang michael.w...@meredith.com
Date: Mon, 16 Apr 2012 20:53:31 
To: common-user@hadoop.apache.org <common-user@hadoop.apache.org>
Reply-To: common-user@hadoop.apache.org
Subject: Hive Thrift help

We need to connect to Hive from MicroStrategy reports, and that requires the Hive 
Thrift server. But when I tried to start it, it just hangs as below.
# hive --service hiveserver
Starting Hive Thrift Server
Any ideas?
Thanks,
Michael




Re: Basic setup questions on Ubuntu

2012-04-15 Thread Manish Bhoge
Prashant,
Post your questions to cdh-u...@cloudera.org.

Follow the CDH3 installation guide. After installing the package and individual 
components, you need to configure all the configuration files, such as 
core-site.xml, hdfs-site.xml, etc.
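
On CDH3 the quickest way to get a working single-node (pseudo-distributed) setup 
is the bundled pseudo configuration package; the package and alternative names 
below are from a typical CDH3-on-Ubuntu install, so adjust if yours differ:

  $ sudo apt-get install hadoop-0.20-conf-pseudo
  $ update-alternatives --display hadoop-0.20-conf   # shows which conf directory is active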

Thanks
Manish
Sent from my BlackBerry, pls excuse typo

-Original Message-
From: shan s mysub...@gmail.com
Date: Mon, 16 Apr 2012 02:49:51 
To: common-user@hadoop.apache.org
Reply-To: common-user@hadoop.apache.org
Subject: Basic setup questions on Ubuntu

I am a newbie to Unix/Hadoop and have basic questions about the CDH3 setup.


I installed CDH3 on an Ubuntu 11.0 box. I want to set up a pseudo-distributed
cluster where I can run my Pig jobs in mapreduce mode.
How do I achieve that?

1. I could not find the core-site.xml, hdfs-site.xml and mapred-site.xml
files with all default parameters set. Where are these located?
 (I see the files under the example-conf dir, but I guess they are example
files.)
2. I see several config files under /usr/lib/hadoop/conf, but all of them
are empty, with comments that these can be used to override the
configuration; however, they are read-only files. What is the intention of
these files being read-only?


Many Thanks,
Prashant



Job tracker service start issue.

2012-03-23 Thread Manish Bhoge
I have Hadoop running on a standalone box. When I start the daemons for the 
namenode, secondary namenode, job tracker, task tracker and data node, they 
start gracefully. But soon after the job tracker starts, it no longer shows up 
as a running service: when I run 'jps' it shows me all the services, including 
the task tracker, except the job tracker.

Is there any time limit that needs to be set up, or is it going into safe 
mode? Because this is what the job tracker log shows; it looks like the 
JobTracker starts but shuts down soon after:

2012-03-22 23:26:04,061 INFO org.apache.hadoop.mapred.JobTracker: STARTUP_MSG: 
/
STARTUP_MSG: Starting JobTracker
STARTUP_MSG:   host = manish/10.131.18.119
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.2-cdh3u3
STARTUP_MSG:   build = 
file:///data/1/tmp/nightly_2012-02-16_09-46-24_3/hadoop-0.20-0.20.2+923.195-1~maverick
 -r 217a3767c48ad11d4632e19a22897677268c40c4; compiled by 'root' on Thu Feb 16 
10:22:53 PST 2012
/
2012-03-22 23:26:04,140 INFO 
org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
 Updating the current master key for generating delegation tokens
2012-03-22 23:26:04,141 INFO 
org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
 Starting expired delegation token remover thread, tokenRemoverScanInterval=60 
min(s)
2012-03-22 23:26:04,141 INFO 
org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
 Updating the current master key for generating delegation tokens
2012-03-22 23:26:04,142 INFO org.apache.hadoop.mapred.JobTracker: Scheduler 
configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT, 
limitMaxMemForMapTasks, limitMaxMemForReduceTasks) (-1, -1, -1, -1)
2012-03-22 23:26:04,143 INFO org.apache.hadoop.util.HostsFileReader: Refreshing 
hosts (include/exclude) list
2012-03-22 23:26:04,186 INFO org.apache.hadoop.mapred.JobTracker: Starting 
jobtracker with owner as mapred
2012-03-22 23:26:04,201 INFO org.apache.hadoop.ipc.Server: Starting Socket 
Reader #1 for port 54311
2012-03-22 23:26:04,203 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: 
Initializing RPC Metrics with hostName=JobTracker, port=54311
2012-03-22 23:26:04,206 INFO org.apache.hadoop.ipc.metrics.RpcDetailedMetrics: 
Initializing RPC Metrics with hostName=JobTracker, port=54311
2012-03-22 23:26:09,250 INFO org.mortbay.log: Logging to 
org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2012-03-22 23:26:09,298 INFO org.apache.hadoop.http.HttpServer: Added global 
filtersafety (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
2012-03-22 23:26:09,318 INFO org.apache.hadoop.http.HttpServer: Port returned 
by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the 
listener on 50030
2012-03-22 23:26:09,318 INFO org.apache.hadoop.http.HttpServer: 
listener.getLocalPort() returned 50030 
webServer.getConnectors()[0].getLocalPort() returned 50030
2012-03-22 23:26:09,318 INFO org.apache.hadoop.http.HttpServer: Jetty bound to 
port 50030
2012-03-22 23:26:09,319 INFO org.mortbay.log: jetty-6.1.26.cloudera.1
2012-03-22 23:26:09,517 INFO org.mortbay.log: Started 
SelectChannelConnector@0.0.0.0:50030
2012-03-22 23:26:09,519 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: 
Initializing JVM Metrics with processName=JobTracker, sessionId=
2012-03-22 23:26:09,519 INFO org.apache.hadoop.mapred.JobTracker: JobTracker up 
at: 54311
2012-03-22 23:26:09,519 INFO org.apache.hadoop.mapred.JobTracker: JobTracker 
webserver: 50030
2012-03-22 23:26:09,648 WARN org.apache.hadoop.mapred.JobTracker: Failed to 
operate on mapred.system.dir 
(hdfs://localhost:54310/app/hadoop/tmp/mapred/system) because of permissions.
2012-03-22 23:26:09,648 WARN org.apache.hadoop.mapred.JobTracker: This 
directory should be owned by the user 'mapred (auth:SIMPLE)'
2012-03-22 23:26:09,650 WARN org.apache.hadoop.mapred.JobTracker: Bailing out 
... 
org.apache.hadoop.security.AccessControlException: The systemdir hdfs://localhost:54310/app/hadoop/tmp/mapred/system is not owned by mapred
        at org.apache.hadoop.mapred.JobTracker.init(JobTracker.java:2241)
        at org.apache.hadoop.mapred.JobTracker.init(JobTracker.java:2050)
        at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:296)
        at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:288)
        at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:4792)
2012-03-22 23:26:09,652 FATAL org.apache.hadoop.mapred.JobTracker: org.apache.hadoop.security.AccessControlException: The systemdir hdfs://localhost:54310/app/hadoop/tmp/mapred/system is not owned by mapred
        at org.apache.hadoop.mapred.JobTracker.init(JobTracker.java:2241)
        at org.apache.hadoop.mapred.JobTracker.init(JobTracker.java:2050)
        at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:296)
        at 
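
The AccessControlException above says mapred.system.dir 
(hdfs://localhost:54310/app/hadoop/tmp/mapred/system) is not owned by the mapred 
user. A sketch of the usual fix, assuming the HDFS superuser is 'hdfs' (i.e. 
whichever user the namenode runs as, as on a package-based CDH3 install):

  $ sudo -u hdfs hadoop fs -chown -R mapred /app/hadoop/tmp/mapred/system

After that the JobTracker should get past this check; if the directory does not 
exist yet, create it first with hadoop fs -mkdir.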

Issue when starting services on CDH3

2012-03-15 Thread Manish Bhoge
I have CDH3 installed in standalone mode, and I have installed all the Hadoop 
components. When I start the services (namenode, secondary namenode, job tracker, 
task tracker) I can start them gracefully from /usr/lib/hadoop/ with 
./bin/start-all.sh. But when I start the same services from 
/etc/init.d/hadoop-0.20-*, I am unable to start them. Why? I also want to start 
Hue, which is in init.d; that I couldn't start either. I suspect an 
authentication issue here, because all the services in init.d are under the root 
user and root group. Please suggest; I am stuck here. I tried Hive and it seems 
to be running fine.
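
A quick way to see why a given init script dies is to start one service at a time 
and read its log (the script and log paths below are from a typical CDH3 layout 
and may differ on your box):

  $ sudo /etc/init.d/hadoop-0.20-namenode start
  $ sudo tail -n 50 /var/log/hadoop*/hadoop-*-namenode-*.log

If the daemons were first started by hand as root via start-all.sh, the data, pid 
and log directories are often left owned by root, which then stops the hdfs and 
mapred users used by the init scripts from starting the daemons.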
Thanks
Manish.
Sent from my BlackBerry, pls excuse typo



Re: Issue when starting services on CDH3

2012-03-15 Thread Manish Bhoge
Manu,
None of the services are coming up, including the namenode, secondary namenode, 
tasktracker and jobtracker.

Sent from my BlackBerry, pls excuse typo

-Original Message-
From: Manu S manupk...@gmail.com
Date: Thu, 15 Mar 2012 21:31:34 
To: common-user@hadoop.apache.org; manishbh...@rocketmail.com
Subject: Re: Issue when starting services on CDH3

Dear Manish,
Which daemons are not starting?

On Mar 15, 2012 9:21 PM, Manish Bhoge manishbh...@rocketmail.com wrote:

 I have CDH3 installed in standalone mode, and I have installed all the Hadoop
components. When I start the services (namenode, secondary namenode, job
tracker, task tracker) I can start them gracefully from /usr/lib/hadoop/ with
./bin/start-all.sh. But when I start the same services from
/etc/init.d/hadoop-0.20-*, I am unable to start them. Why? I also want to start
Hue, which is in init.d; that I couldn't start either. I suspect an
authentication issue here, because all the services in init.d are under the root
user and root group. Please suggest; I am stuck here. I tried Hive and it seems
to be running fine.
 Thanks
 Manish.
 Sent from my BlackBerry, pls excuse typo