Re: Hbase with Hadoop
Jignesh, I have been able to deploy HBase 0.90.3 and 0.90.4 with hadoop-0.20.205. Below are the steps I followed:

1. Make sure none of the HBase master, regionservers or zookeeper are running. As Matt pointed out, turn on append.
2. hbase-daemon.sh --config $HBASE_CONF_DIR start zookeeper
3. hbase-daemon.sh --config $HBASE_CONF_DIR start master
4. hbase-daemon.sh --config $HBASE_CONF_DIR start regionserver
5. hbase --config $HBASE_CONF_DIR shell

Hope it helps.
Ramya

On Thu, Oct 13, 2011 at 4:11 PM, Jignesh Patel jign...@websoft.com wrote:

Is there a way to resolve this weird problem? bin/start-hbase.sh is supposed to start zookeeper, but it doesn't start it. On the other hand, if zookeeper is up and running, it says:

Couldnt start ZK at requested address of 2181, instead got: 2182. Aborting. Why? Because clients (eg shell) wont be able to find this ZK quorum

On Oct 13, 2011, at 5:40 PM, Jignesh Patel wrote:

OK, now the problem is that if I only use bin/start-hbase.sh, it doesn't start zookeeper. But if I run bin/hbase-daemon.sh start zookeeper before starting bin/start-hbase.sh, then it tries to start zookeeper on port 2181 and I get the following error:

Couldnt start ZK at requested address of 2181, instead got: 2182. Aborting. Why? Because clients (eg shell) wont be able to find this ZK quorum

So I am wondering: if bin/start-hbase.sh tries to start zookeeper, then it should start zookeeper when it is not already running. I only get the error if zookeeper is already running.
-Jignesh

On Oct 13, 2011, at 4:53 PM, Ramya Sunil wrote:

You already have zookeeper running on 2181 according to your jps output. That is the reason the master seems to be complaining. Can you please stop zookeeper, verify that no daemons are running on 2181, and restart your master?
On Thu, Oct 13, 2011 at 12:37 PM, Jignesh Patel jign...@websoft.com wrote:

Ramya, based on "HBase: The Definitive Guide", it seems zookeeper is started by HBase, so there is no need to start it separately (maybe this has changed for 0.90.4). Anyway, the following is the updated status:

Jignesh-MacBookPro:hadoop-hbase hadoop-user$ bin/start-hbase.sh
starting master, logging to /users/hadoop-user/hadoop-hbase/logs/hbase-hadoop-user-master-Jignesh-MacBookPro.local.out
Couldnt start ZK at requested address of 2181, instead got: 2182. Aborting. Why? Because clients (eg shell) wont be able to find this ZK quorum

Jignesh-MacBookPro:hadoop-hbase hadoop-user$ jps
41486 HQuorumPeer
38814 SecondaryNameNode
41578 Jps
38878 JobTracker
38726 DataNode
38639 NameNode
38964 TaskTracker

On Oct 13, 2011, at 3:23 PM, Ramya Sunil wrote:

Jignesh, I don't see zookeeper running on your master. My cluster reads the following:

$ jps
15315 Jps
13590 HMaster
15235 HQuorumPeer

Can you please shut down your HMaster and run the following first:

$ hbase-daemon.sh start zookeeper

and then start your HBase master and regionservers?

Thanks
Ramya

On Thu, Oct 13, 2011 at 12:01 PM, Jignesh Patel jign...@websoft.com wrote:

OK, --config worked, but it is showing me the same error. How to resolve this? http://pastebin.com/UyRBA7vX

On Oct 13, 2011, at 1:34 PM, Ramya Sunil wrote:

Hi Jignesh, --config (i.e. two dashes) is the option to use, not -config. Alternatively you can also set HBASE_CONF_DIR.
Below is the exact command line:

$ hbase --config /home/ramya/hbase/conf shell
hbase(main):001:0> create 'newtable','family'
0 row(s) in 0.5140 seconds
hbase(main):002:0> list 'newtable'
TABLE
newtable
1 row(s) in 0.0120 seconds

OR

$ export HBASE_CONF_DIR=/home/ramya/hbase/conf
$ hbase shell
hbase(main):001:0> list 'newtable'
TABLE
newtable
1 row(s) in 0.3860 seconds

Thanks
Ramya

On Thu, Oct 13, 2011 at 8:30 AM, jigneshmpatel jigneshmpa...@gmail.com wrote:

There is no command like -config; see below:

Jignesh-MacBookPro:hadoop-hbase hadoop-user$ bin/hbase -config ./config shell
Unrecognized option: -config
Could not create the Java virtual machine.

--
View this message in context: http://lucene.472066.n3.nabble.com/Hbase-with-Hadoop-tp3413950p3418924.html
Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
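Much of the thread above turns on whether start-hbase.sh manages ZooKeeper itself, and that is governed by hbase-site.xml. Below is a minimal sketch for a pseudo-distributed setup; the host/port values are illustrative placeholders, not taken from this thread, and hbase.rootdir must match the NameNode address in core-site.xml:

```xml
<configuration>
  <property>
    <!-- Where HBase stores its data; must match fs.default.name. -->
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>
  </property>
  <property>
    <!-- false: everything runs in one JVM and start-hbase.sh starts only
         the master; true: start-hbase.sh also launches the zookeeper and
         regionserver daemons as separate processes. -->
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>localhost</value>
  </property>
</configuration>
```

With hbase.cluster.distributed left at false, an externally started ZooKeeper already bound to 2181 collides with the one the master tries to manage internally, which matches the "instead got: 2182" symptom in the thread.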
FUSE CRASHING
Hi, I am trying to run FUSE and it is crashing randomly in the middle with the following error:

fuse_dfs: tpp.c:66: __pthread_tpp_change_priority: Assertion `previous_prio == -1 || (previous_prio >= __sched_fifo_min_prio && previous_prio <= __sched_fifo_max_prio)' failed.

Does anyone know the possible reason for such an error? And is it a known bug in FUSE? The FUSE version I am using is 2.8.5. Kindly help.

Thanks,
Deepti
Re: Hbase with Hadoop
Ramya, I have followed the steps you mention, but in these steps I don't see you starting HBase. I have followed steps 1, 2 and 3. Here is how my hdfs-site.xml looks:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified at create time.</description>
  </property>
  <property>
    <name>dfs.support.append</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value>
  </property>
</configuration>

For step 4 I got the following message, which is OK as I am running in pseudo mode:

starting regionserver, logging to /Users/hadoop-user/hadoop-hbase/bin/../logs/hbase-hadoop-user-regionserver-Jignesh-MacBookPro.local.out
11/10/14 10:25:55 WARN regionserver.HRegionServerCommandLine: Not starting a distinct region server because hbase.cluster.distributed is false

Then, when I tried to start HBase with bin/start-hbase.sh --config ./config, I got the same old error:

Couldnt start ZK at requested address of 2181, instead got: 2182. Aborting. Why? Because clients (eg shell) wont be able to find this ZK quorum

-Jignesh

On Oct 14, 2011, at 2:31 AM, Ramya Sunil wrote:

Jignesh, I have been able to deploy HBase 0.90.3 and 0.90.4 with hadoop-0.20.205. Below are the steps I followed:

1. Make sure none of the HBase master, regionservers or zookeeper are running. As Matt pointed out, turn on append.
2. hbase-daemon.sh --config $HBASE_CONF_DIR start zookeeper
3. hbase-daemon.sh --config $HBASE_CONF_DIR start master
4. hbase-daemon.sh --config $HBASE_CONF_DIR start regionserver
5. hbase --config $HBASE_CONF_DIR shell

Hope it helps.
Ramya

On Thu, Oct 13, 2011 at 4:11 PM, Jignesh Patel jign...@websoft.com wrote:

Is there a way to resolve this weird problem? bin/start-hbase.sh is supposed to start zookeeper but it doesn't start.
Re: FUSE CRASHING
Hi Deepti,

That appears to crash deep in pthread, which would scare me a bit. Are you using a strange/non-standard platform? What Java version? What HDFS version?

Brian

On Oct 14, 2011, at 3:59 AM, Banka, Deepti wrote:

Hi, I am trying to run FUSE and it is crashing randomly in the middle with the following error:

fuse_dfs: tpp.c:66: __pthread_tpp_change_priority: Assertion `previous_prio == -1 || (previous_prio >= __sched_fifo_min_prio && previous_prio <= __sched_fifo_max_prio)' failed.

Does anyone know the possible reason for such an error? And is it a known bug in FUSE? The FUSE version I am using is 2.8.5. Kindly help.

Thanks,
Deepti
Re: wordcount example throwing null pointer with ConcurrentHashMap
ConcurrentHashMap does not accept null keys, so get() must have been called with null. Looking briefly, it seems that a map completion event contained a tracker HTTP address without a hostname? That might be enough to help you debug it in your setup; I don't know.
S.

On 13 October 2011 22:10, Santosh Belda santosh.be...@broadridge.com wrote:

Hi, I have set up Hadoop on a single node and it worked fine, but when executing the wordcount example the following error is thrown. Is this a configuration issue?

bin/hadoop jar hadoop-examples-0.20.2-cdh3u1.jar wordcount /user/hduser/testfiles /user/hduser/output
11/10/14 10:29:53 INFO input.FileInputFormat: Total input paths to process : 3
11/10/14 10:29:53 WARN snappy.LoadSnappy: Snappy native library is available
11/10/14 10:29:53 INFO util.NativeCodeLoader: Loaded the native-hadoop library
11/10/14 10:29:53 INFO snappy.LoadSnappy: Snappy native library loaded
11/10/14 10:29:53 INFO mapred.JobClient: Running job: job_201110141028_0001
11/10/14 10:29:54 INFO mapred.JobClient: map 0% reduce 0%
11/10/14 10:29:59 INFO mapred.JobClient: map 66% reduce 0%
11/10/14 10:30:01 INFO mapred.JobClient: Task Id : attempt_201110141028_0001_r_00_0, Status : FAILED
Error: java.lang.NullPointerException
at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.getMapCompletionEvents(ReduceTask.java:2824)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.run(ReduceTask.java:2744)
11/10/14 10:30:02 INFO mapred.JobClient: map 100% reduce 0%
11/10/14 10:30:03 INFO mapred.JobClient: Task Id : attempt_201110141028_0001_r_00_1, Status : FAILED
Error: java.lang.NullPointerException
at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.getMapCompletionEvents(ReduceTask.java:2824)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.run(ReduceTask.java:2744)
11/10/14 10:30:05 INFO mapred.JobClient: Task Id : attempt_201110141028_0001_r_00_2, Status : FAILED
Error: java.lang.NullPointerException
at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.getMapCompletionEvents(ReduceTask.java:2824)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.run(ReduceTask.java:2744)
11/10/14 10:30:08 INFO mapred.JobClient: Job complete: job_201110141028_0001
11/10/14 10:30:08 INFO mapred.JobClient: Counters: 18
11/10/14 10:30:08 INFO mapred.JobClient:   Job Counters
11/10/14 10:30:08 INFO mapred.JobClient:     Launched reduce tasks=4
11/10/14 10:30:08 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=9167
11/10/14 10:30:08 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
11/10/14 10:30:08 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
11/10/14 10:30:08 INFO mapred.JobClient:     Launched map tasks=3
11/10/14 10:30:08 INFO mapred.JobClient:     Data-local map tasks=3
11/10/14 10:30:08 INFO mapred.JobClient:     Failed reduce tasks=1
11/10/14 10:30:08 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=3292
11/10/14 10:30:08 INFO mapred.JobClient:   FileSystemCounters
11/10/14 10:30:08 INFO mapred.JobClient:     FILE_BYTES_READ=740427
11/10/14 10:30:08 INFO mapred.JobClient:     HDFS_BYTES_READ=2863597
11/10/14 10:30:08 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=2161157
11/10/14 10:30:08 INFO mapred.JobClient:   Map-Reduce Framework
11/10/14 10:30:08 INFO mapred.JobClient:     Combine output records=87431
11/10/14 10:30:08 INFO mapred.JobClient:     Map input records=58570
11/10/14 10:30:08 INFO mapred.JobClient:     Spilled Records=138742
11/10/14 10:30:08 INFO mapred.JobClient:     Map output bytes=4774081
11/10/14 10:30:08 INFO mapred.JobClient:     Combine input records=487561
11/10/14 10:30:08 INFO mapred.JobClient:     Map output records=487561
11/10/14 10:30:08 INFO mapred.JobClient:     SPLIT_RAW_BYTES=361

--
View this message in context: http://old.nabble.com/wordcount-example-throwing-null-pointer-with-ConcurrentHashMap-tp32650178p32650178.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
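The null-key behavior behind the stack trace is easy to confirm with plain JDK code, with no Hadoop involved; a minimal sketch:

```java
import java.util.HashMap;
import java.util.concurrent.ConcurrentHashMap;

public class ChmNullKeyDemo {
    public static void main(String[] args) {
        // HashMap tolerates null keys: get(null) simply returns null.
        Integer fromHashMap = new HashMap<String, Integer>().get(null);
        if (fromHashMap != null) throw new AssertionError();

        // ConcurrentHashMap rejects them: get(null) throws NPE, which is
        // exactly the ConcurrentHashMap.get frame in the reduce-task trace.
        boolean threw = false;
        try {
            new ConcurrentHashMap<String, Integer>().get(null);
        } catch (NullPointerException expected) {
            threw = true;
        }
        if (!threw) throw new AssertionError("expected NullPointerException");
    }
}
```

So the map lookup itself is behaving as documented; the bug is whatever produced a null lookup key (per the reply above, likely a tracker address missing its hostname in the map completion event).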
Re: Is Hadoop the right platform for my HPC application?
On 12 September 2011 14:23, Alberto Andreotti albertoandreo...@gmail.com wrote:

Hi Parker, I'm also interested in exploring hadoop's capabilities for HPC; I've been doing some experiments with heat transfer problems. Which workloads are you trying?

My limited understanding suggests you might also look at Pregel or Giraph for heat transfer problems.
S.
Re: How to evenly split data file
I can't answer your question fully without sitting in front of it with a debugger, but the principle is this: Hadoop's JobClient splits the data into approximately evenly sized blocks of bytes. The trick then is to align those blocks on record boundaries. All file formats behave approximately the same way, and as I understand it, the rough algorithm is this:

Each file must have synchronization points (a term I now define; it is not common terminology in Hadoop). A synchronization point in a text file is a newline. A synchronization point in an RCFile or SequenceFile is a block header, which is recognized by a randomly selected sequence of 16(?) bytes. Offset 0 is also a synchronization point. Some file formats cannot detect sync points from the data itself and must rely on external indexes (e.g. LZO), in which case a sync point is a reinitialization point for the compression algorithm (which is always a block compressor), looked up in that external index.

A task is given (start, end) as byte offsets. It finds the first sync point at or after 'start'. It then reads records in any block starting after 'start', but not starting after 'end'. In the case of a text file, block = line = record. In other formats, the concepts are distinct. If a task is given a short block (x, x+1), then it will find the first sync point after x, which will also be after x+1, so it will read no records. Thus no records are read twice, and blocks must be large enough to give each task at least some records between the sync points in its block.

If your file format syncs every 64MB, your record is 10 bytes, and you give out 1MB splits in the hope of getting 100K records per mapper, then 1 in 64 mappers does about 6 million records, and 63 in 64 mappers do nothing. Thus each task processes a roughly equal number of bytes, but not an equal number of records.

I'm afraid I can't help more, but these are the principles you are looking for.
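The sync-point rule above can be sketched with a toy model. This is an illustration of the principle, not Hadoop's actual reader; the sizes are scaled down (sync every 64 bytes, 2-byte records, 4-byte splits) but keep the same proportions as the 64MB/10-byte/1MB example:

```java
public class SplitDemo {
    static final long SYNC = 64;  // distance between sync points (scaled down)
    static final long REC = 2;    // record size in bytes
    static final long LEN = 128;  // file length: sync points at offsets 0 and 64

    // First sync point at or after pos.
    static long nextSync(long pos) {
        return ((pos + SYNC - 1) / SYNC) * SYNC;
    }

    // Records a task with byte range [start, end) processes: seek to the
    // first sync point at or after start; if none falls inside the range,
    // read nothing; otherwise read whole records up to the first sync
    // point at or after end.
    static long records(long start, long end) {
        long s = nextSync(start);
        if (s >= end) return 0;
        return (nextSync(end) - s) / REC;
    }

    public static void main(String[] args) {
        int tasks = 32;            // 4-byte splits, far smaller than SYNC
        long chunk = LEN / tasks;
        long total = 0;
        for (int i = 0; i < tasks; i++) {
            long start = i * chunk;
            long end = (i == tasks - 1) ? LEN : start + chunk;
            total += records(start, end);
        }
        // Every record is read exactly once in total, but only the two
        // splits containing a sync point (tasks 0 and 16) did any work.
        if (total != LEN / REC) throw new AssertionError("total=" + total);
    }
}
```

Separately, when constructing splits by hand as in the code quoted below, note that the old-API FileSplit constructor is FileSplit(Path, start, length, hosts): its third argument is a length, not an end offset, which is worth double-checking when split byte ranges don't add up.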
By the way, does anyone know how or whether SequenceFile avoids a heavy HDFS hit on the first block of the file, where it looks up the magic byte signature for that file? I'm too lazy to look this morning, but the thought occurs to me.
S.

On 5 October 2011 22:35, Thomas Anderson t.dt.aander...@gmail.com wrote:

I don't use mapreduce; I am just practicing with the Hadoop common API to manually split a data file, in which data is stored in SequenceFile format. The way I split the file is by dividing the file length by the total task count. The InputSplit created is passed to a RecordReader, which reads from the designated path. The code is as below:

private void readPartOfDataFile() throws IOException {
    int taskId = getTaskId();
    InputSplit split = getSplit(taskId);
    SequenceFileRecordReader<Text, CustomData> input =
        new SequenceFileRecordReader<Text, CustomData>(conf, (FileSplit) split);
    Text url = input.createKey();
    CustomData d = input.createValue();
    int count = 0;
    while (input.next(url, d)) {
        count++;
    }
}

private InputSplit getSplit(final int taskid) throws IOException {
    FileSystem fs = FileSystem.get(conf);
    Path filePath = new Path("path/to/", file);
    FileStatus[] status = fs.listStatus(filePath);
    int maxTasks = conf.getInt("test.maxtasks", 12);
    for (FileStatus file : status) {
        if (file.isDir()) {
            // get data file
            Path dataFile = new Path(file.getPath(), "data");
            FileStatus data = fs.getFileStatus(dataFile);
            long dataLength = data.getLen();
            BlockLocation[] locations =
                fs.getFileBlockLocations(data, 0, dataLength);
            if (0 < dataLength) {
                long chunk = dataLength / (long) maxTasks;
                long beg = (taskid * chunk) + (long) 1;
                long end = (taskid + 1) * chunk;
                if (maxTasks == (taskid + 1)) {
                    end = dataLength;
                }
                return new FileSplit(dataFile, beg, end,
                    locations[locations.length - 1].getHosts());
            } else {
                LOG.info("No Data for file:" + file.getPath());
            }
        } // is dir
    } // for
    return null;
}

However, it seems that the records read from the data file are not equally distributed. For instance, the data file may contain 1200 records and the data length is around 74250. With 12 max tasks, each task should roughly hold a size of around 6187 (per split). But the record counts displayed show that each task may hold a varying number of records (e.g. task 4 read 526 records, task 5 read 632, task 6 read 600), and the total count of records is larger than the total records stored. I checked JobClient.writeOldSplits(); it seems similar to the way JobClient divides data. What is missing when splitting data with the Hadoop common API?
Re: Hbase with Hadoop
On Oct 14, 2011, at 2:44 PM, Jignesh Patel wrote:

According to start-hbase.sh, if distributed mode is false then I am only supposed to start the master; it is not required to start zookeeper. See the script below, from the file:

if [ "$distMode" == 'false' ]
then
  $bin/hbase-daemon.sh start master
else
  $bin/hbase-daemons.sh --config ${HBASE_CONF_DIR} start zookeeper
  $bin/hbase-daemon.sh --config ${HBASE_CONF_DIR} start master
  $bin/hbase-daemons.sh --config ${HBASE_CONF_DIR} \
    --hosts ${HBASE_REGIONSERVERS} start regionserver
  $bin/hbase-daemons.sh --config ${HBASE_CONF_DIR} \
    --hosts ${HBASE_BACKUP_MASTERS} start master-backup
fi

According to the above script, zookeeper is not required to be started, since I am not running the server in distributed mode but in pseudo mode. But then it gives an error that zookeeper is not able to connect.
-Jignesh
controlling hbase memory
How do I control HBase memory? As soon as I run the command bin/start-hbase.sh, my 8 GB RAM machine runs out of memory. -Jignesh
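The per-daemon JVM heap is capped in conf/hbase-env.sh. A hedged sketch (1000 MB is the 0.90.x default, shown explicitly here; it is not a recommendation for any particular workload):

```sh
# conf/hbase-env.sh
# Maximum heap for each HBase daemon's JVM, in MB (default 1000).
export HBASE_HEAPSIZE=1000
```

Note that the master, regionserver, and zookeeper can each run in their own JVM, each entitled to this much heap, on top of the Hadoop daemons, so memory disappears quickly; lowering HBASE_HEAPSIZE and checking jps for stray daemons is a reasonable first step.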
Re: Hbase with Hadoop
Can somebody help me get Hadoop 0.20.205.0 and HBase 0.90.4 working in pseudo mode? This is the third day in a row, and I am not able to make it run. The details are as follows: http://pastebin.com/KrJePt64

If this is not going to work, then let me know which version I should use to get it to run.
Re: Hbase with Hadoop
On Wed, Oct 12, 2011 at 9:31 AM, Vinod Gupta Tankala tvi...@readypulse.com wrote:

its free and open source too.. basically, their releases are ahead of public releases of hadoop/hbase - from what i understand, major bug fixes and enhancements are checked in to their branch first and then eventually make it to public release branches.

You've got it a bit backwards - except for very rare exceptions, we check our fixes into the public ASF codebase before we commit anything to CDH releases. Sometimes a fix will show up in a CDH release before an ASF release, but the changes are always done as backports from the ASF's Subversion. You can see the list of public JIRAs referenced in our changelists here: http://archive.cloudera.com/cdh/3/hadoop-0.20.2+923.97.CHANGES.txt

Apologies for the vendor-specific comment: I just wanted to clarify that Cloudera's aim is to contribute to the community, not any kind of fork as suggested above. Back to work on 0.23 for me!

-Todd

--
Todd Lipcon
Software Engineer, Cloudera
Re: Hbase with Hadoop
At last I move one step further. It was a problem with the hadoop jar file. I need to replace hadoop-core-xx.jar in base/lib with hadoop/lib. After replacing it I got following error: 2011-10-14 17:09:12,409 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown. java.lang.NoClassDefFoundError: org/apache/commons/configuration/Configuration at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.init(DefaultMetricsSystem.java:37) at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.clinit(DefaultMetricsSystem.java:34) at org.apache.hadoop.security.UgiInstrumentation.create(UgiInstrumentation.java:51) at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:196) at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:159) at org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:216) at org.apache.hadoop.security.KerberosName.clinit(KerberosName.java:83) at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:189) at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:159) at org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:216) at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:409) at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:395) at org.apache.hadoop.fs.FileSystem$Cache$Key.init(FileSystem.java:1436) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1337) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:244) at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187) at org.apache.hadoop.hbase.util.FSUtils.getRootDir(FSUtils.java:364) at org.apache.hadoop.hbase.master.MasterFileSystem.init(MasterFileSystem.java:81) at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:346) at 
org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:282) at org.apache.hadoop.hbase.master.HMasterCommandLine$LocalHMaster.run(HMasterCommandLine.java:193) at java.lang.Thread.run(Thread.java:680) Caused by: java.lang.ClassNotFoundException: org.apache.commons.configuration.Configuration at java.net.URLClassLoader$1.run(URLClassLoader.java:202) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at java.lang.ClassLoader.loadClass(ClassLoader.java:306) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) at java.lang.ClassLoader.loadClass(ClassLoader.java:247) ... 22 more On Oct 14, 2011, at 3:35 PM, Jignesh Patel wrote: Can somebody help me to work Hadoop 0.20.205.0 and Hbase 0.90.4 in pseudo mode. This is third day in a row and I am not able to make it run. The details are as follows http://pastebin.com/KrJePt64 If this is not going to work then let me know which version I should use to get it run. On Oct 14, 2011, at 2:46 PM, Jignesh Patel wrote: On Oct 14, 2011, at 2:44 PM, Jignesh Patel wrote: According to start-hase.sh if distributed mode=flase then I am supposed to start only masters it doesn't required to start zookeeper, see the script below from the file. if [ $distMode == 'false' ] then $bin/hbase-daemon.sh start master else $bin/hbase-daemons.sh --config ${HBASE_CONF_DIR} start zookeeper $bin/hbase-daemon.sh --config ${HBASE_CONF_DIR} start master $bin/hbase-daemons.sh --config ${HBASE_CONF_DIR} \ --hosts ${HBASE_REGIONSERVERS} start regionserver $bin/hbase-daemons.sh --config ${HBASE_CONF_DIR} \ --hosts ${HBASE_BACKUP_MASTERS} start master-backup fi According to above script the zookeeper is not required to start as I am not running server in distributed mode but in pseudo mode. But then it is giving error for zookeeper is not able to connect. 
-Jignesh

On Fri, Oct 14, 2011 at 2:31 AM, Ramya Sunil [via Lucene] ml-node+s472066n342086...@n3.nabble.com wrote:

Jignesh, I have been able to deploy HBase 0.90.3 and 0.90.4 with hadoop-0.20.205. Below are the steps I followed:

1. Make sure none of the hbase master, regionservers or zookeeper are running. As Matt pointed out, turn on append.
2. hbase-daemon.sh --config $HBASE_CONF_DIR start zookeeper
3. hbase-daemon.sh --config $HBASE_CONF_DIR start master
4. hbase-daemon.sh --config $HBASE_CONF_DIR start regionserver
5. hbase --config $HBASE_CONF_DIR shell

Hope it helps.
Ramya

On Thu, Oct 13, 2011 at 4:11 PM, Jignesh Patel [hidden email] wrote: Is there a way to resolve this weird problem.
Re: Hbase with Hadoop
Cool! Everything is good now after copying the commons-configuration jar file. No need to start zookeeper or master separately; only run bin/start-hbase.sh and everything works. I see my status changed.

On Oct 14, 2011, at 5:16 PM, Jignesh Patel wrote:
java.lang.NoClassDefFoundError: org/apache/commons/configuration/Configuration
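For readers hitting the same NoClassDefFoundError, the jar shuffle described in this thread can be sketched as a small script. `fix_hbase_libs` is a hypothetical helper name, and the jar patterns and install paths are assumptions to adapt to your own layout:

```shell
# Sketch of the jar swap discussed above (assumed layout: a Hadoop
# install at $1 and an HBase install at $2).
fix_hbase_libs() {
    hadoop_home=$1
    hbase_home=$2
    # Drop the hadoop-core jar that ships with HBase ...
    rm -f "$hbase_home"/lib/hadoop-core-*.jar
    # ... replace it with the one the running Hadoop actually uses ...
    cp "$hadoop_home"/hadoop-core-*.jar "$hbase_home"/lib/
    # ... and add commons-configuration, which hadoop-0.20.205's
    # security/metrics classes need on the HBase classpath.
    cp "$hadoop_home"/lib/commons-configuration-*.jar "$hbase_home"/lib/
}

# Example invocation (paths are illustrative):
# fix_hbase_libs /usr/local/hadoop /usr/local/hbase
```

After the copy, restart with bin/start-hbase.sh as described above.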
mapreduce linear chaining: ClassCastException
Hi all,

I am trying a simple extension of the WordCount example in Hadoop: I want to get the word counts in descending order of frequency. For that I employ a linear chain of MR jobs. The first MR job (MR-1) does the regular word count (the usual example). For the next MR job I set the mapper to swap each (word, count) pair to (count, word), then use the identity reducer to simply store the results. MR-1 does its job correctly and stores the result in a temp path.

Question 1: The mapper of the second MR job (MR-2) doesn't like the input format. I have properly set the types on MapClass2 for what it expects and what its output must be, yet it seems to be expecting a LongWritable. I suspect it is trying to look at some index file; I am not sure. It throws an error like this:

java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text

Some info:
- I use the old API (org.apache.hadoop.mapred.*); I am asked to stick with it for now.
- I use hadoop-0.20.2.

For MR-1:
- conf1.setOutputKeyClass(Text.class);
- conf1.setOutputValueClass(IntWritable.class);

For MR-2 (takes in a Text word and an IntWritable sum):
- conf2.setOutputKeyClass(IntWritable.class);
- conf2.setOutputValueClass(Text.class);

public class MapClass2 extends MapReduceBase
        implements Mapper<Text, IntWritable, IntWritable, Text> {
    @Override
    public void map(Text word, IntWritable sum,
                    OutputCollector<IntWritable, Text> output,
                    Reporter reporter) throws IOException {
        output.collect(sum, word); // (word, sum) -> (sum, word)
    }
}

Any suggestions would be helpful. Is my MapClass2 code right in the first place for swapping? Or should I assume the mapper reads line by line, so I must read in one line, use a StringTokenizer to split it up, and convert the second token (sum) from String to int? Or should I mess around with the OutputKeyComparator class?

Thanks,
PD
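A likely cause of the ClassCastException above, offered as an editor's note: unless conf2.setInputFormat(...) is changed, MR-2 reads MR-1's text output with the default TextInputFormat, which always hands the mapper (LongWritable byte-offset, Text line) pairs, no matter what generic types MapClass2 declares. Two common fixes: (a) have MR-1 write a SequenceFile (conf1.setOutputFormat(SequenceFileOutputFormat.class)) and have MR-2 read it back (conf2.setInputFormat(SequenceFileInputFormat.class)), which preserves the Text/IntWritable pairs between jobs; or (b) keep plain text and parse each "word<TAB>count" line in MR-2's mapper, as the poster guessed. The standalone sketch below shows the parsing logic of option (b) outside Hadoop (SwapSketch and swapLine are made-up names); in the real mapper the same logic would end with output.collect(new IntWritable(sum), new Text(word)).

```java
// Standalone sketch (no Hadoop dependency) of the parse-and-swap a
// TextInputFormat-based MapClass2 would have to do: each input value
// is one "word<TAB>count" line from MR-1's text output.
public class SwapSketch {

    /** Turn "word\tcount" into "count\tword", the pair MR-2 should emit. */
    static String swapLine(String line) {
        int tab = line.indexOf('\t');
        String word = line.substring(0, tab);
        int sum = Integer.parseInt(line.substring(tab + 1).trim());
        return sum + "\t" + word;
    }

    public static void main(String[] args) {
        System.out.println(swapLine("hadoop\t42")); // prints "42<TAB>hadoop"
    }
}
```

For the descending order itself, the old API's conf2.setOutputKeyComparatorClass(...) with a comparator that reverses the IntWritable ordering is the usual knob, so "messing around with OutputKeyComparator" is indeed the right instinct.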
Re: controlling hbase memory
Jignesh,

Please use the HBase user list (u...@hbase.apache.org) for all your HBase questions. This list is for Hadoop Common.

On Sat, Oct 15, 2011 at 12:26 AM, Jignesh Patel jign...@websoft.com wrote: How do I control HBase memory? As soon as I use the command bin/start-hbase.sh, my 8 GB RAM machine runs out of memory. -Jignesh

--
Harsh J
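As an editor's aside (the HBase user list is indeed the right venue for details): in HBase 0.90.x the JVM heap is capped in conf/hbase-env.sh via HBASE_HEAPSIZE. The value below is only an example; 1000 MB is the shipped default, and lowering or raising it bounds what bin/start-hbase.sh can consume per daemon.

```shell
# conf/hbase-env.sh
# Maximum heap, in MB, given to each HBase JVM that the start scripts
# launch (master, regionserver, and the bundled zookeeper peer).
export HBASE_HEAPSIZE=1000
```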