Multiple data node and namenode?
I am configuring a cluster and starting with the first machine, so I have configured core-site, hdfs-site and mapred-site to run Hadoop on only one machine. But somehow I am getting two copies of the datanode, namenode and secondary namenode logs, and I am not able to figure out why:

SecurityAuth-hdfs.audit                           0 bytes         Jul 17, 2013 4:34:56 AM
hadoop-hdfs-datanode-INBEDU011997A.log            243358 bytes    Jul 25, 2013 2:29:15 AM
hadoop-hdfs-datanode-myhost-1.log                 1893692 bytes   Jul 25, 2013 2:50:42 AM
hadoop-hdfs-namenode-INBEDU011997A.log            19545397 bytes  Jul 25, 2013 2:50:40 AM
hadoop-hdfs-namenode-myhost-1.log                 2438271 bytes   Jul 25, 2013 2:48:57 AM
hadoop-hdfs-secondarynamenode-INBEDU011997A.log   4061047 bytes   Jul 25, 2013 2:30:10 AM
hadoop-hdfs-secondarynamenode-myhost-1.log        9707957 bytes   Jul 25, 2013 2:46:18 AM
Re: Multiple data node and namenode?
Unfortunately, I don't have jps installed. But as you can see, it shows two datanode logs:

hadoop-hdfs-datanode-INBEDU011997A.log   243358 bytes    Jul 25, 2013 2:29:15 AM
hadoop-hdfs-datanode-myhost-1.log        1893692 bytes   Jul 25, 2013 2:50:42 AM

Is it because I have given three directories under dfs.data.dir in hdfs-site.xml?

From: Devaraj k devara...@huawei.com
To: common-user@hadoop.apache.org
Sent: Thursday, 25 July 2013 12:41 PM
Subject: RE: Multiple data node and namenode?

Hi Manish,

Can you check how many datanode processes are really running on the machine, using the command 'jps' or 'ps'?

Thanks
Devaraj k
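As Devaraj suggests, counting the actual processes is the decisive check. Note also that the duplicate log files carry two different hostnames (INBEDU011997A and myhost-1); Hadoop embeds the hostname in log file names, so these may simply be stale logs left over from a hostname change rather than a second daemon. A minimal sketch of the check (the grep patterns are only illustrations):

    # Count the daemon JVMs that are actually running.
    # jps ships with the JDK; if it is unavailable, plain ps works too.
    jps

    # Fallback without jps: the bracket trick keeps grep from matching itself.
    ps -ef | grep -i '[d]atanode'
    ps -ef | grep -i '[n]amenode'

If each pattern matches a single process, there is only one daemon per role; multiple directories listed under dfs.data.dir are storage locations used by one datanode, not separate datanodes.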
Re: Reg - Hive
Check which metastore you used when you configured Hive on your Hadoop cluster. If you have MySQL as the metastore then you are done: you can go to the meta tables to pull attribute information.

Sent from my BlackBerry, pls excuse typo

-----Original Message-----
From: sudha sadhasivam sudhasadhasi...@yahoo.com
Date: Fri, 19 Oct 2012 03:13:54
To: common-user@hadoop.apache.org; common-...@hadoop.apache.org
Reply-To: common-user@hadoop.apache.org
Cc: core-u...@hadoop.apache.org
Subject: Reg - Hive

Sir,

In our system, we need a Hive query to retrieve the field name where a particular data item belongs. For example, if we have a table holding location information, when queried for New York the query should return the field names where New York occurs, such as City, Administrative division, etc. Kindly inform us whether it is possible to retrieve metadata information for a particular field value in Hive.

Thanking you,
G Sudha
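As a concrete starting point, Hive can already list a table's field names without touching the metastore tables directly; a minimal sketch, assuming a hypothetical table named locations:

    # List the column (field) names and types of a table.
    hive -e 'DESCRIBE locations;'

Note that the metastore only knows the schema: finding which column actually contains a given value such as "New York" still requires querying the data itself, for example with a WHERE clause per candidate column.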
Re: Hadoop Admin
The Hadoop in Action book can give you good hands-on admin experience. It is freely available on the net.

Sent from my BlackBerry, pls excuse typo

-----Original Message-----
From: prabhu K prabhu.had...@gmail.com
Date: Sun, 15 Jul 2012 17:48:03
To: common-user@hadoop.apache.org
Reply-To: common-user@hadoop.apache.org
Subject: Hadoop Admin

Hi users,

Can anyone please provide me with in-depth Hadoop administration web links and PPTs?

Thanks,
Prabhu.
Re: How to load raw log file into HDFS?
You first need to copy the data to HDFS using copyFromLocal, and then you can use Pig and Hive, which run on MapReduce, for further analysis. Yes, the warehouse directory is in HDFS. If you want to run (test) Pig in local mode, then you don't need to copy the data to HDFS.

Sent from my BlackBerry, pls excuse typo

-----Original Message-----
From: Michael Wang michael.w...@meredith.com
Date: Mon, 14 May 2012 18:43:47
To: common-user@hadoop.apache.org
Reply-To: common-user@hadoop.apache.org
Subject: RE: How to load raw log file into HDFS?

I have the same question and I am glad to get your help. I am also a novice in Hadoop :) I am using Pig and Hive to analyze the logs. My logs are in LOCAL_FILE_PATH. Do I need to use hadoop fs -copyFromLocal to put the files into HDFS_FILE_PATH first, and then load the data files into Pig or Hive from HDFS_FILE_PATH? Or can I load logs from LOCAL_FILE_PATH directly into Pig or Hive? After I load the files into Hive, I found they are put at /user/hive/warehouse. Is /user/hive/warehouse in HDFS? How do I know what HDFS_FILE_PATHs are available?

-----Original Message-----
From: Alexander Fahlke [mailto:alexander.fahlke.mailingli...@googlemail.com]
Sent: Monday, May 14, 2012 1:53 AM
To: common-user@hadoop.apache.org
Subject: Re: How to load raw log file into HDFS?

Hi,

the best would be to read the documentation and some books to get familiar with Hadoop. One of my favourite books is Hadoop in Action from Manning (http://www.manning.com/lam/). This book has an example for putting (log) files into HDFS; check out source listing 3-1. Later you can also check out Cloudera's Flume: https://github.com/cloudera/flume/wiki

--
BR
Alexander Fahlke
Java Developer
www.nurago.com | www.fahlke.org

On Mon, May 14, 2012 at 7:24 AM, Amith D K amit...@huawei.com wrote:

You can even use put/copyFromLocal; both are similar and do the job via the terminal. Or you can write a simple client program to do the job :)

Amith

From: samir das mohapatra [samir.help...@gmail.com]
Sent: Sunday, May 13, 2012 9:13 PM
To: common-user@hadoop.apache.org
Subject: Re: How to load raw log file into HDFS?

Hi,

To load any file from local:

Syntax:  hadoop fs -copyFromLocal LOCAL_FILE_PATH HDFS_FILE_PATH
Example: hadoop fs -copyFromLocal input/logs hdfs://localhost/user/dataset/

More commands: http://hadoop.apache.org/common/docs/r0.17.1/hdfs_shell.html

On Sun, May 13, 2012 at 9:53 AM, AnExplorer satishtha...@gmail.com wrote:

Hi, I am a novice in Hadoop. Kindly suggest how we load log files into HDFS. Please suggest the command and steps. Thanks in advance!!
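Putting the thread's advice together, a minimal sketch of the usual flow, with hypothetical paths: copy the raw logs into HDFS first, then point Pig (in mapreduce mode) or Hive at the HDFS path:

    # 1. Create a target directory in HDFS and copy the local log file into it.
    hadoop fs -mkdir /user/hadoop/logs
    hadoop fs -copyFromLocal /var/log/myapp/access.log /user/hadoop/logs/

    # 2. Confirm the file arrived (this is also how you browse available HDFS paths).
    hadoop fs -ls /user/hadoop/logs

    # 3. Load it from the HDFS path, e.g. in the Pig grunt shell:
    #    grunt> raw = LOAD '/user/hadoop/logs/access.log' AS (line:chararray);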
Re: Hive Thrift help
Michael,

Out of the box I am treating this as a metadata problem, as we also faced the same kind of issue when connecting Tableau with Hive, and the problem was identified in the metastore setup. If your metadata is in the default Apache DB, i.e. Derby, then the JDBC connection doesn't work. As a workaround we stored the Hive metadata in MySQL and configured hive-site.xml accordingly, and now we are able to establish the JDBC connection.

Thank you,

Sent from my BlackBerry, pls excuse typo

-----Original Message-----
From: Michael Wang michael.w...@meredith.com
Date: Mon, 16 Apr 2012 20:53:31
To: common-user@hadoop.apache.org
Reply-To: common-user@hadoop.apache.org
Subject: Hive Thrift help

We need to connect to Hive from MicroStrategy reports, and that requires the Hive Thrift server. But I tried to start it, and it just hangs as below.

# hive --service hiveserver
Starting Hive Thrift Server

Any ideas?

Thanks,
Michael
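For reference, the workaround described above amounts to pointing the metastore at MySQL in hive-site.xml. A minimal sketch, with a hypothetical host, database name, user and password (the MySQL JDBC driver jar must also be on Hive's classpath):

    <!-- hive-site.xml: keep the metastore in MySQL instead of embedded Derby -->
    <property>
      <name>javax.jdo.option.ConnectionURL</name>
      <value>jdbc:mysql://dbhost:3306/hive_metastore?createDatabaseIfNotExist=true</value>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionDriverName</name>
      <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionUserName</name>
      <value>hiveuser</value>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionPassword</name>
      <value>hivepass</value>
    </property>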
Re: Basic setup questions on Ubuntu
Prashant,

Post your questions to cdh-u...@cloudera.org. Follow the CDH3 installation guide. After installing the package and the individual components, you need to set up all the configuration files, like core-site.xml, hdfs-site.xml, etc.

Thanks
Manish

Sent from my BlackBerry, pls excuse typo

-----Original Message-----
From: shan s mysub...@gmail.com
Date: Mon, 16 Apr 2012 02:49:51
To: common-user@hadoop.apache.org
Reply-To: common-user@hadoop.apache.org
Subject: Basic setup questions on Ubuntu

I am a newbie to Unix/Hadoop and have basic questions about CDH3 setup. I installed CDH3 on an Ubuntu 11.0 box. I want to set up a pseudo-distributed cluster where I can run my Pig jobs in mapreduce mode. How do I achieve that?

1. I could not find the core-site.xml, hdfs-site.xml and mapred-site.xml files with all default parameters set. Where are these located? (I see the files under the example-conf dir, but I guess they are example files.)

2. I see several config files under /usr/lib/hadoop/conf, but all of them are empty files, with comments saying they can be used to override the configuration, yet they are read-only files. What is the intention of these files being read-only?

Many thanks,
Prashant
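For the pseudo-distributed setup the question is after, only a handful of properties are needed across the three files. A minimal sketch, assuming the default CDH3 ports (the property values are the point here, not the file layout):

    <!-- core-site.xml -->
    <property>
      <name>fs.default.name</name>
      <value>hdfs://localhost:8020</value>
    </property>

    <!-- hdfs-site.xml: one machine, so keep a single replica -->
    <property>
      <name>dfs.replication</name>
      <value>1</value>
    </property>

    <!-- mapred-site.xml -->
    <property>
      <name>mapred.job.tracker</name>
      <value>localhost:8021</value>
    </property>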
Job tracker service start issue.
I have Hadoop running on a standalone box. When I start the daemons for the namenode, secondary namenode, job tracker, task tracker and data node, they start gracefully. But soon after starting, the job tracker service disappears: when I run 'jps' it shows me all the services, including the task tracker, except the job tracker. Is there a time limit that needs to be set, or is it going into safe mode? When I looked at the job tracker log, this is what it shows; it looks like it starts but shuts down soon after:

2012-03-22 23:26:04,061 INFO org.apache.hadoop.mapred.JobTracker: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting JobTracker
STARTUP_MSG:   host = manish/10.131.18.119
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.2-cdh3u3
STARTUP_MSG:   build = file:///data/1/tmp/nightly_2012-02-16_09-46-24_3/hadoop-0.20-0.20.2+923.195-1~maverick -r 217a3767c48ad11d4632e19a22897677268c40c4; compiled by 'root' on Thu Feb 16 10:22:53 PST 2012
************************************************************/
2012-03-22 23:26:04,140 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
2012-03-22 23:26:04,141 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Starting expired delegation token remover thread, tokenRemoverScanInterval=60 min(s)
2012-03-22 23:26:04,141 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
2012-03-22 23:26:04,142 INFO org.apache.hadoop.mapred.JobTracker: Scheduler configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT, limitMaxMemForMapTasks, limitMaxMemForReduceTasks) (-1, -1, -1, -1)
2012-03-22 23:26:04,143 INFO org.apache.hadoop.util.HostsFileReader: Refreshing hosts (include/exclude) list
2012-03-22 23:26:04,186 INFO org.apache.hadoop.mapred.JobTracker: Starting jobtracker with owner as mapred
2012-03-22 23:26:04,201 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 54311
2012-03-22 23:26:04,203 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=JobTracker, port=54311
2012-03-22 23:26:04,206 INFO org.apache.hadoop.ipc.metrics.RpcDetailedMetrics: Initializing RPC Metrics with hostName=JobTracker, port=54311
2012-03-22 23:26:09,250 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2012-03-22 23:26:09,298 INFO org.apache.hadoop.http.HttpServer: Added global filtersafety (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
2012-03-22 23:26:09,318 INFO org.apache.hadoop.http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 50030
2012-03-22 23:26:09,318 INFO org.apache.hadoop.http.HttpServer: listener.getLocalPort() returned 50030 webServer.getConnectors()[0].getLocalPort() returned 50030
2012-03-22 23:26:09,318 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 50030
2012-03-22 23:26:09,319 INFO org.mortbay.log: jetty-6.1.26.cloudera.1
2012-03-22 23:26:09,517 INFO org.mortbay.log: Started SelectChannelConnector@0.0.0.0:50030
2012-03-22 23:26:09,519 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
2012-03-22 23:26:09,519 INFO org.apache.hadoop.mapred.JobTracker: JobTracker up at: 54311
2012-03-22 23:26:09,519 INFO org.apache.hadoop.mapred.JobTracker: JobTracker webserver: 50030
2012-03-22 23:26:09,648 WARN org.apache.hadoop.mapred.JobTracker: Failed to operate on mapred.system.dir (hdfs://localhost:54310/app/hadoop/tmp/mapred/system) because of permissions.
2012-03-22 23:26:09,648 WARN org.apache.hadoop.mapred.JobTracker: This directory should be owned by the user 'mapred (auth:SIMPLE)'
2012-03-22 23:26:09,650 WARN org.apache.hadoop.mapred.JobTracker: Bailing out ...
org.apache.hadoop.security.AccessControlException: The systemdir hdfs://localhost:54310/app/hadoop/tmp/mapred/system is not owned by mapred
	at org.apache.hadoop.mapred.JobTracker.init(JobTracker.java:2241)
	at org.apache.hadoop.mapred.JobTracker.init(JobTracker.java:2050)
	at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:296)
	at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:288)
	at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:4792)
2012-03-22 23:26:09,652 FATAL org.apache.hadoop.mapred.JobTracker: org.apache.hadoop.security.AccessControlException: The systemdir hdfs://localhost:54310/app/hadoop/tmp/mapred/system is not owned by mapred
	at org.apache.hadoop.mapred.JobTracker.init(JobTracker.java:2241)
	at org.apache.hadoop.mapred.JobTracker.init(JobTracker.java:2050)
	at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:296)
	at
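The stack trace names the cause directly: the directory configured as mapred.system.dir must be owned by the user the JobTracker runs as (mapred). A minimal sketch of the usual remedy, assuming hdfs is the HDFS superuser on this CDH3 box:

    # Hand ownership of the MapReduce system directory to the mapred user.
    sudo -u hdfs hadoop fs -chown -R mapred /app/hadoop/tmp/mapred/system

    # Restart the JobTracker and confirm it now stays up.
    sudo /etc/init.d/hadoop-0.20-jobtracker restart
    jps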
Issue when starting services on CDH3
I have CDH3 installed in standalone mode, with all the Hadoop components installed. When I start the services (namenode, secondary namenode, job tracker, task tracker) from /usr/lib/hadoop/ with ./bin/start-all.sh, they start gracefully. But when I start the same services from /etc/init.d/hadoop-0.20-* I am unable to start them. Why? I also want to start Hue, which is in init.d, and I couldn't start that either. Here I suspect an authentication issue, because all the services in init.d are under the root user and root group. Please suggest; I am stuck here. I tried Hive and it seems to be running fine.

Thanks
Manish.

Sent from my BlackBerry, pls excuse typo
Re: Issue when starting services on CDH3
Manu,

None of the services is coming up, including the namenode, secondary namenode, tasktracker and jobtracker.

Sent from my BlackBerry, pls excuse typo

-----Original Message-----
From: Manu S manupk...@gmail.com
Date: Thu, 15 Mar 2012 21:31:34
To: common-user@hadoop.apache.org; manishbh...@rocketmail.com
Subject: Re: Issue when starting services on CDH3

Dear Manish,

Which daemons are not starting?
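For the init.d route on CDH3, each daemon has its own script and its own log; a minimal sketch of starting them individually and finding out why one dies, assuming the standard CDH3 package names and log location:

    # Start the daemons one at a time as root; the scripts drop privileges to
    # the hdfs/mapred users themselves. Mixing this with bin/start-all.sh can
    # leave stale pid files and occupied ports, so use one method at a time.
    sudo /etc/init.d/hadoop-0.20-namenode start
    sudo /etc/init.d/hadoop-0.20-datanode start
    sudo /etc/init.d/hadoop-0.20-jobtracker start
    sudo /etc/init.d/hadoop-0.20-tasktracker start

    # When a service exits immediately, the reason is in its log file:
    tail -n 50 /var/log/hadoop-0.20/*namenode*.log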