Re: Hbase with Hadoop

2011-10-14 Thread Ramya Sunil
Jignesh,

I have been able to deploy Hbase 0.90.3 and 0.90.4 with hadoop-0.20.205.
Below are the steps I followed:

1. Make sure none of hbasemaster, regionservers or zookeeper are running. As
Matt pointed out, turn on append.
2. hbase-daemon.sh --config $HBASE_CONF_DIR start zookeeper
3. hbase-daemon.sh --config $HBASE_CONF_DIR start master
4. hbase-daemon.sh --config $HBASE_CONF_DIR start regionserver
5. hbase --config $HBASE_CONF_DIR shell
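
For reference, a minimal hbase-site.xml behind steps like these might look as
follows (the rootdir URL and port are illustrative assumptions, not values
taken from this thread):

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <!-- must match fs.default.name in core-site.xml -->
    <value>hdfs://localhost:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <!-- true, so zookeeper, master and regionserver run as separate daemons -->
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>localhost</value>
  </property>
</configuration>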


Hope it helps.
Ramya



On Thu, Oct 13, 2011 at 4:11 PM, Jignesh Patel jign...@websoft.com wrote:

 Is there a way to resolve this weird problem?

  bin/start-hbase.sh is supposed to start zookeeper but it doesn't start.
 On the other hand, if zookeeper is up and running, then it says

  Couldnt start ZK at requested address of 2181, instead got: 2182.
 Aborting. Why? Because clients (eg shell) wont be able to find this ZK
 quorum



 On Oct 13, 2011, at 5:40 PM, Jignesh Patel wrote:

  Ok now the problem is
 
  if I only use bin/start-hbase.sh then it doesn't start zookeeper.
 
  But if I use bin/hbase-daemon.sh start zookeeper before starting
 bin/start-hbase.sh, then it tries to start zookeeper at port 2181 and then
 I get the following error.
 
  Couldnt start ZK at requested address of 2181, instead got: 2182.
 Aborting. Why? Because clients (eg shell) wont be able to find this ZK
 quorum
 
 
  So I am wondering: if bin/start-hbase.sh is trying to start zookeeper, then
 when zookeeper is not running it should start it. I only get the
 error if zookeeper is already running.
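 
 For reference, the port the master tries to claim for its ZK is governed by
 the standard client-port property, which can be pinned in hbase-site.xml; an
 illustrative snippet, not taken from this thread:
 
 <property>
   <name>hbase.zookeeper.property.clientPort</name>
   <value>2181</value>
 </property>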
 
 
  -Jignesh
 
 
  On Oct 13, 2011, at 4:53 PM, Ramya Sunil wrote:
 
  You already have zookeeper running on 2181 according to your jps output.
  That is the reason the master seems to be complaining.
  Can you please stop zookeeper, verify that no daemons are running on 2181,
  and restart your master?
 
  On Thu, Oct 13, 2011 at 12:37 PM, Jignesh Patel jign...@websoft.com
 wrote:
 
  Ramya,
 
 
  Based on HBase: The Definitive Guide, it seems zookeeper is started by
  hbase itself and there is no need to start it separately (maybe this has
  changed for 0.90.4). Anyway, the following is the updated status.
 
  Jignesh-MacBookPro:hadoop-hbase hadoop-user$ bin/start-hbase.sh
  starting master, logging to
 
 /users/hadoop-user/hadoop-hbase/logs/hbase-hadoop-user-master-Jignesh-MacBookPro.local.out
  Couldnt start ZK at requested address of 2181, instead got: 2182.
 Aborting.
  Why? Because clients (eg shell) wont be able to find this ZK quorum
  Jignesh-MacBookPro:hadoop-hbase hadoop-user$ jps
  41486 HQuorumPeer
  38814 SecondaryNameNode
  41578 Jps
  38878 JobTracker
  38726 DataNode
  38639 NameNode
  38964 TaskTracker
 
  On Oct 13, 2011, at 3:23 PM, Ramya Sunil wrote:
 
  Jignesh,
 
  I don't see zookeeper running on your master. My cluster reads the
  following:
 
  $ jps
  15315 Jps
  13590 HMaster
  15235 HQuorumPeer
 
  Can you please shut down your HMaster and run the following first:
  $ hbase-daemon.sh start zookeeper
 
  And then start your hbasemaster and regionservers?
 
  Thanks
  Ramya
 
  On Thu, Oct 13, 2011 at 12:01 PM, Jignesh Patel jign...@websoft.com
  wrote:
 
  ok, --config worked but it is showing me the same error. How do I resolve
 this?
 
  http://pastebin.com/UyRBA7vX
 
  On Oct 13, 2011, at 1:34 PM, Ramya Sunil wrote:
 
  Hi Jignesh,
 
  --config (i.e. - - config) is the option to use and not -config.
  Alternatively you can also set HBASE_CONF_DIR.
 
  Below is the exact command line:
 
  $ hbase --config /home/ramya/hbase/conf shell
  hbase(main):001:0> create 'newtable','family'
  0 row(s) in 0.5140 seconds
 
  hbase(main):002:0> list 'newtable'
  TABLE
  newtable
  1 row(s) in 0.0120 seconds
 
  OR
 
  $ export HBASE_CONF_DIR=/home/ramya/hbase/conf
  $ hbase shell
 
  hbase(main):001:0> list 'newtable'
  TABLE
 
  newtable
 
  1 row(s) in 0.3860 seconds
 
 
  Thanks
  Ramya
 
 
  On Thu, Oct 13, 2011 at 8:30 AM, jigneshmpatel jigneshmpa...@gmail.com wrote:
 
  There is no option like -config; see below:
 
  Jignesh-MacBookPro:hadoop-hbase hadoop-user$ bin/hbase -config
  ./config
  shell
  Unrecognized option: -config
  Could not create the Java virtual machine.
 
 
 
 
 
 
 




FUSE CRASHING

2011-10-14 Thread Banka, Deepti
Hi,

I am trying to run FUSE and it's crashing randomly in the middle with
the following error:

 

fuse_dfs:  tpp.c:66: __pthread_tpp_change_priority: Assertion
`previous_prio == -1 || (previous_prio >= __sched_fifo_min_prio &&
previous_prio <= __sched_fifo_max_prio)' failed.

 

Does anyone know the possible reason for such an error? And is it a
known bug  in FUSE? The FUSE version I am using is 2.8.5.

Kindly help.

Thanks,

Deepti







Re: Hbase with Hadoop

2011-10-14 Thread Jignesh Patel
Ramya,

I have followed the steps you mention, but in these steps I don't see you
starting hbase.
I have followed steps 1, 2 and 3.
Here is how my hdfs-site.xml looks.

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication.
      The actual number of replications can be specified when the file is created.
      The default is used if replication is not specified in create time.
    </description>
  </property>
  <property>
    <name>dfs.support.append</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value>
  </property>
</configuration>


For step 4 I got the following message, which is OK as I am running in pseudo
mode.

starting regionserver, logging to 
/Users/hadoop-user/hadoop-hbase/bin/../logs/hbase-hadoop-user-regionserver-Jignesh-MacBookPro.local.out
11/10/14 10:25:55 WARN regionserver.HRegionServerCommandLine: Not starting a 
distinct region server because hbase.cluster.distributed is false

Then, when I tried to start hbase (bin/start-hbase.sh --config ./config) I
got the same old error.

Couldnt start ZK at requested address of 2181, instead got: 2182. Aborting. 
Why? Because clients (eg shell) wont be able to find this ZK quorum



-Jignesh
On Oct 14, 2011, at 2:31 AM, Ramya Sunil wrote:

 Jignesh,
 
 I have been able to deploy Hbase 0.90.3 and 0.90.4 with hadoop-0.20.205. 
 Below are the steps I followed:
 
 1. Make sure none of hbasemaster, regionservers or zookeeper are running. As 
 Matt pointed out, turn on append.
 2. hbase-daemon.sh --config $HBASE_CONF_DIR start zookeeper
 3. hbase-daemon.sh --config $HBASE_CONF_DIR start master
 4. hbase-daemon.sh --config $HBASE_CONF_DIR start regionserver
 5. hbase --config $HBASE_CONF_DIR shell
 
 
 Hope it helps.
 Ramya
 
 
 

Re: FUSE CRASHING

2011-10-14 Thread Brian Bockelman
Hi Deepti,

That appears to crash deep in pthread, which would scare me a bit.  Are you 
using a strange/non-standard platform?  What Java version?  What HDFS version?

Brian

On Oct 14, 2011, at 3:59 AM, Banka, Deepti wrote:

 Hi,
 
 I am trying to run FUSE and it's crashing randomly in the middle with
 the following error:
 
 
 
 fuse_dfs:  tpp.c:66: __pthread_tpp_change_priority: Assertion
 `previous_prio == -1 || (previous_prio >= __sched_fifo_min_prio &&
 previous_prio <= __sched_fifo_max_prio)' failed.
 
 
 
 Does anyone know the possible reason for such an error? And is it a
 known bug  in FUSE? The FUSE version I am using is 2.8.5.
 
 Kindly help.
 
 Thanks,
 
 Deepti
 
 
 
 
 





Re: wordcount example throwing null pointer with ConcurrentHashMap

2011-10-14 Thread Shevek
ConcurrentHashMap does not accept null keys, so get() must have been called
with null.

Looking briefly, it seems that a map completion event contained a tracker
http address without a hostname? That might be enough to help you debug it
in your setup; I don't know.
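
A minimal standalone illustration of that constraint (my own sketch, not code
from your job or from Hadoop):

import java.util.concurrent.ConcurrentHashMap;

public class ChmNullKeyDemo {
    public static void main(String[] args) {
        ConcurrentHashMap<String, String> trackers =
            new ConcurrentHashMap<String, String>();
        trackers.put("tracker_host1:50060", "host1");
        // Unlike HashMap, ConcurrentHashMap rejects null keys: the next line
        // throws NullPointerException, just like the get() call inside
        // getMapCompletionEvents() in your stack trace.
        trackers.get(null);
    }
}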

S.

On 13 October 2011 22:10, Santosh Belda santosh.be...@broadridge.com wrote:



 Hi,

 I have set up hadoop on a single node and it worked fine, but when executing
 the wordcount example the following error is thrown. Is this a configuration
 issue?

  bin/hadoop jar hadoop-examples-0.20.2-cdh3u1.jar wordcount
 /user/hduser/testfiles /user/hduser/output
 11/10/14 10:29:53 INFO input.FileInputFormat: Total input paths to process : 3
 11/10/14 10:29:53 WARN snappy.LoadSnappy: Snappy native library is
 available
 11/10/14 10:29:53 INFO util.NativeCodeLoader: Loaded the native-hadoop
 library
 11/10/14 10:29:53 INFO snappy.LoadSnappy: Snappy native library loaded
 11/10/14 10:29:53 INFO mapred.JobClient: Running job: job_201110141028_0001
 11/10/14 10:29:54 INFO mapred.JobClient:  map 0% reduce 0%
 11/10/14 10:29:59 INFO mapred.JobClient:  map 66% reduce 0%
 11/10/14 10:30:01 INFO mapred.JobClient: Task Id :
 attempt_201110141028_0001_r_00_0, Status : FAILED
 Error: java.lang.NullPointerException
at
 java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768)
at

 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.getMapCompletionEvents(ReduceTask.java:2824)
at

 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.run(ReduceTask.java:2744)

 11/10/14 10:30:02 INFO mapred.JobClient:  map 100% reduce 0%
 11/10/14 10:30:03 INFO mapred.JobClient: Task Id :
 attempt_201110141028_0001_r_00_1, Status : FAILED
 Error: java.lang.NullPointerException
at
 java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768)
at

 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.getMapCompletionEvents(ReduceTask.java:2824)
at

 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.run(ReduceTask.java:2744)

 11/10/14 10:30:05 INFO mapred.JobClient: Task Id :
 attempt_201110141028_0001_r_00_2, Status : FAILED
 Error: java.lang.NullPointerException
at
 java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768)
at

 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.getMapCompletionEvents(ReduceTask.java:2824)
at

 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.run(ReduceTask.java:2744)

 11/10/14 10:30:08 INFO mapred.JobClient: Job complete:
 job_201110141028_0001
 11/10/14 10:30:08 INFO mapred.JobClient: Counters: 18
 11/10/14 10:30:08 INFO mapred.JobClient:   Job Counters
 11/10/14 10:30:08 INFO mapred.JobClient: Launched reduce tasks=4
 11/10/14 10:30:08 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=9167
 11/10/14 10:30:08 INFO mapred.JobClient: Total time spent by all
 reduces
 waiting after reserving slots (ms)=0
 11/10/14 10:30:08 INFO mapred.JobClient: Total time spent by all maps
 waiting after reserving slots (ms)=0
 11/10/14 10:30:08 INFO mapred.JobClient: Launched map tasks=3
 11/10/14 10:30:08 INFO mapred.JobClient: Data-local map tasks=3
 11/10/14 10:30:08 INFO mapred.JobClient: Failed reduce tasks=1
 11/10/14 10:30:08 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=3292
 11/10/14 10:30:08 INFO mapred.JobClient:   FileSystemCounters
 11/10/14 10:30:08 INFO mapred.JobClient: FILE_BYTES_READ=740427
 11/10/14 10:30:08 INFO mapred.JobClient: HDFS_BYTES_READ=2863597
 11/10/14 10:30:08 INFO mapred.JobClient: FILE_BYTES_WRITTEN=2161157
 11/10/14 10:30:08 INFO mapred.JobClient:   Map-Reduce Framework
 11/10/14 10:30:08 INFO mapred.JobClient: Combine output records=87431
 11/10/14 10:30:08 INFO mapred.JobClient: Map input records=58570
 11/10/14 10:30:08 INFO mapred.JobClient: Spilled Records=138742
 11/10/14 10:30:08 INFO mapred.JobClient: Map output bytes=4774081
 11/10/14 10:30:08 INFO mapred.JobClient: Combine input records=487561
 11/10/14 10:30:08 INFO mapred.JobClient: Map output records=487561
 11/10/14 10:30:08 INFO mapred.JobClient: SPLIT_RAW_BYTES=361




Re: Is Hadoop the right platform for my HPC application?

2011-10-14 Thread Shevek
On 12 September 2011 14:23, Alberto Andreotti albertoandreo...@gmail.com wrote:

 Hi Parker,

 I'm also interested in exploring hadoop capabilities for HPC, I've been
 doing some experiments with heat transfer problems. Which workloads are you
 trying?


My limited understanding suggests you might also look at Pregel or Giraph
for heat transfer problems?

S.


Re: How to evenly split data file

2011-10-14 Thread Shevek
I can't answer your question fully without sitting in front of it using a
debugger, but the principle is this:

Hadoop's JobClient splits the data into approximately evenly sized blocks in
bytes. The trick now is to synchronize those blocks on record boundaries.
All file formats behave approximately the same way, and as I understand it,
the rough algorithm is this:

Each file must have synchronization points (a term I now define, not
common terminology in Hadoop). A synchronization point in a text file is a
newline. A synchronization point in an RCFile or SequenceFile is a block
header, which is recognized by a randomly selected sequence of 16(?) bytes.
Offset 0 is also a synchronization point. Some file formats cannot detect
sync points using the data and must rely on external indexes (e.g. LZO), in
which case a sync point is a reinitialization point for the compression
algorithm (which is always a block compressor), looked up in that external
index.

A task is given (start, end) as byte offsets. It finds the first sync point
at or after 'start'. It then reads records in any block starting after
'start', but not starting after 'end'. In the case of a text file, block =
line = record. In other files, the concepts are distinct.

If a task is given a short block (x, x+1) then it will find the first sync
point after x, which will also be after x+1, so it will read no records.
Thus no records are read twice, and blocks must be large enough to give each
task at least some records between sync points in its block. If your file
format syncs every 64MB, and your record is 10 bytes, and you give out 1MB
splits in the hope of getting 100K records per mapper, you will find that 1 in
64 mappers does 6 million records, and 63 in 64 mappers do nothing.

Thus each task processes a roughly equal number of bytes, but not an equal
number of records.
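
To make those principles concrete, here is a small sketch of the reading loop
a task runs over its (start, end) byte range. Every helper name below is a
hypothetical stand-in for illustration, not real Hadoop API:

// Pseudocode for the split-reading rule described above.
long pos = firstSyncPointAtOrAfter(start);
while (pos != EOF && pos <= end) {   // a block may begin at 'end', never after it
    readRecordsInBlockAt(pos);       // may consume bytes well past 'end'
    pos = nextSyncPointAfter(pos);
}
// A degenerate split (x, x+1) usually contains no sync point at all, so the
// loop body never runs and that task emits no records.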

I'm afraid I can't help more, but these are the principles you are looking
for.

By the way, does anyone know how or whether SequenceFile avoids a heavy HDFS
hit on the first block of the file, where it looks up the magic byte
signature for that file? I'm too lazy to look this morning, but the thought
occurs to me.

S.

On 5 October 2011 22:35, Thomas Anderson t.dt.aander...@gmail.com wrote:

 I don't use mapreduce; I am just practicing with the Hadoop common api to
 manually split a data file, in which data is stored in SequenceFile format
 (read via SequenceFileInputFormat).

 The file is split by dividing the file length by the total number of tasks.
 The InputSplit created is passed to a RecordReader, which reads from the
 designated path. The code is as below:

  private void readPartOfDataFile() throws IOException {
    int taskid = getTaskId();
    InputSplit split = getSplit(taskid);
    SequenceFileRecordReader<Text, CustomData> input =
        new SequenceFileRecordReader<Text, CustomData>(conf, (FileSplit) split);
    Text url = input.createKey();
    CustomData d = input.createValue();
    int count = 0;
    while (input.next(url, d)) {
      count++;
    }
  }

  private InputSplit getSplit(final int taskid) throws IOException {
    FileSystem fs = FileSystem.get(conf);
    Path filePath = new Path("path/to/", file);
    FileStatus[] status = fs.listStatus(filePath);
    int maxTasks = conf.getInt("test.maxtasks", 12);
    for (FileStatus file : status) {
      if (file.isDir()) { // get data file
        Path dataFile = new Path(file.getPath(), "data");
        FileStatus data = fs.getFileStatus(dataFile);
        long dataLength = data.getLen();
        BlockLocation[] locations =
            fs.getFileBlockLocations(data, 0, dataLength);
        if (0 < dataLength) {
          long chunk = dataLength / (long) maxTasks;
          long beg = (taskid * chunk) + (long) 1;
          long end = (taskid + 1) * chunk;
          if (maxTasks == (taskid + 1)) {
            end = dataLength;
          }
          return new FileSplit(dataFile, beg, end,
              locations[locations.length - 1].getHosts());
        } else {
          LOG.info("No Data for file: " + file.getPath());
        }
      } // is dir
    } // for
    return null;
  }

 However, it seems that the records read from the data file are not equally
 distributed. For instance, the data file may contain 1200 records and the
 data length is around 74250. With 12 max tasks, each task should roughly
 hold a size of around 6187 (per split). But the record counts displayed show
 that each task may hold a varying number of records (e.g. task 4 reads 526,
 task 5 reads 632, task 6 reads 600), and the total count of records is
 larger than the total records stored. I checked
 JobClient.writeOldSplits(); it seems similar to the way JobClient
 divides data. What is missing when splitting data with the hadoop
 common api?



Re: Hbase with Hadoop

2011-10-14 Thread Jignesh Patel

On Oct 14, 2011, at 2:44 PM, Jignesh Patel wrote:

 According to start-hbase.sh, if distributed mode is false then I am supposed to 
 start only the master; it isn't required to start zookeeper. See the script 
 below, from the file.
 
 if [ "$distMode" == 'false' ]
 then
   "$bin"/hbase-daemon.sh start master
 else
   "$bin"/hbase-daemons.sh --config "${HBASE_CONF_DIR}" start zookeeper
   "$bin"/hbase-daemon.sh --config "${HBASE_CONF_DIR}" start master
   "$bin"/hbase-daemons.sh --config "${HBASE_CONF_DIR}" \
     --hosts "${HBASE_REGIONSERVERS}" start regionserver
   "$bin"/hbase-daemons.sh --config "${HBASE_CONF_DIR}" \
     --hosts "${HBASE_BACKUP_MASTERS}" start master-backup
 fi
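 
 The distMode value above is read from the config; it can be checked with the
 HBaseConfTool class that start-hbase.sh itself uses (a hedged example,
 assuming the script derives distMode the same way in your version):
 
 bin/hbase --config "${HBASE_CONF_DIR}" org.apache.hadoop.hbase.util.HBaseConfTool hbase.cluster.distributed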
 
 According to the above script, zookeeper is not required to be started, as I
 am not running the server in distributed mode but in pseudo mode. But then it
 gives an error that zookeeper is not able to connect.

-Jignesh
 
 
 is supposed to start zookeeper and master as per the 
 
 On Fri, Oct 14, 2011 at 2:31 AM, Ramya Sunil [via Lucene] 
 ml-node+s472066n342086...@n3.nabble.com wrote:
 Jignesh, 
 
 I have been able to deploy Hbase 0.90.3 and 0.90.4 with hadoop-0.20.205. 
 Below are the steps I followed: 
 
 1. Make sure none of hbasemaster, regionservers or zookeeper are running. As 
 Matt pointed out, turn on append. 
 2. hbase-daemon.sh --config $HBASE_CONF_DIR start zookeeper 
 3. hbase-daemon.sh --config $HBASE_CONF_DIR start master 
 4. hbase-daemon.sh --config $HBASE_CONF_DIR start regionserver 
 5. hbase --config $HBASE_CONF_DIR shell 
 
 
 Hope it helps. 
 Ramya 
 
 
 
 On Thu, Oct 13, 2011 at 4:11 PM, Jignesh Patel [hidden email] wrote: 
 
  Is there a way to resolve this weird problem. 
  
   bin/hbase-start.sh is supposed to start zookeeper but it doesn't start. 
  But on the other side if zookeeper up and running then it says 
  
   Couldnt start ZK at requested address of 2181, instead got: 2182. 
  Aborting. Why? Because clients (eg shell) wont be able to find this ZK 
  quorum 
  
  
  
  On Oct 13, 2011, at 5:40 PM, Jignesh Patel wrote: 
  
   Ok now the problem is 
   
   if I only use bin/hbase-start.sh then it doesn't start zookeeper. 
   
   But if I use bin/hbase-daemon.sh start zookeeper before starting 
  bin/hbase-start.sh then it will try to start zookeeper at port 2181 and 
  then 
  I have following error. 
   
   Couldnt start ZK at requested address of 2181, instead got: 2182. 
  Aborting. Why? Because clients (eg shell) wont be able to find this ZK 
  quorum 
   
   
   So I am wondering if bin/hbase-start.sh is trying to start zookeeper then 
  while zookeeper is not running it should start the zookeeper. I only get 
  the 
  error if zookeeper already running. 
   
   
   -Jignesh 
   
   
   On Oct 13, 2011, at 4:53 PM, Ramya Sunil wrote: 
   
   You already have zookeeper running on 2181 according to your jps output. 
   That is the reason, master seems to be complaining. 
   Can you please stop zookeeper, verify that no daemons are running on 
  2181 
   and restart your master? 
   
   On Thu, Oct 13, 2011 at 12:37 PM, Jignesh Patel [hidden email] 
  wrote: 
   
   Ramya, 
   
   
   Based on Hbase the definite guide it seems zookeeper being started by 
   hbase no need to start it separately(may be this is changed for 0.90.4. 
   Anyways now  following is the updated status. 
   
   Jignesh-MacBookPro:hadoop-hbase hadoop-user$ bin/start-hbase.sh 
   starting master, logging to 
   
  /users/hadoop-user/hadoop-hbase/logs/hbase-hadoop-user-master-Jignesh-MacBookPro.local.out
   
   Couldnt start ZK at requested address of 2181, instead got: 2182. 
  Aborting. 
   Why? Because clients (eg shell) wont be able to find this ZK quorum 
   Jignesh-MacBookPro:hadoop-hbase hadoop-user$ jps 
   41486 HQuorumPeer 
   38814 SecondaryNameNode 
   41578 Jps 
   38878 JobTracker 
   38726 DataNode 
   38639 NameNode 
   38964 TaskTracker 
   
   On Oct 13, 2011, at 3:23 PM, Ramya Sunil wrote: 
   
   Jignesh, 
   
   I dont see zookeeper running on your master. My cluster reads the 
   following: 
   
   $ jps 
   15315 Jps 
   13590 HMaster 
   15235 HQuorumPeer 
   
   Can you please shutdown your Hmaster and run the following first: 
   $ hbase-daemon.sh start zookeeper 
   
   And then start your hbasemaster and regionservers? 
   
   Thanks 
   Ramya 
   
   On Thu, Oct 13, 2011 at 12:01 PM, Jignesh Patel [hidden email] 
   wrote: 
   
   ok --config worked but it is showing me same error. How to resolve 
  this. 
   
   http://pastebin.com/UyRBA7vX
   
   On Oct 13, 2011, at 1:34 PM, Ramya Sunil wrote: 
   
   Hi Jignesh, 
   
   --config (i.e. - - config) is the option to use and not -config. 
   Alternatively you can also set HBASE_CONF_DIR. 
   
   Below is the exact command line: 
   
   $ hbase --config /home/ramya/hbase/conf shell 
   hbase(main):001:0 create 'newtable','family' 
   0 row(s) in 0.5140 seconds 
   
   hbase(main):002:0 list 'newtable' 
   TABLE 
   newtable 
   1 row(s) in 0.0120 seconds 
   
   OR 
   
   $ 

controlling hbase memory

2011-10-14 Thread Jignesh Patel
How do I control HBase memory? As soon as I use the command bin/start-hbase.sh, my 8 
GB RAM machine runs out of memory.

-Jignesh

Re: Hbase with Hadoop

2011-10-14 Thread Jignesh Patel
Can somebody help me get Hadoop 0.20.205.0 and HBase 0.90.4 working in pseudo mode? 
This is the third day in a row and I am not able to make it run.

The details are as follows

http://pastebin.com/KrJePt64


If this is not going to work, then let me know which version I should use to get 
it running. 


Re: Hbase with Hadoop

2011-10-14 Thread Todd Lipcon
On Wed, Oct 12, 2011 at 9:31 AM, Vinod Gupta Tankala
tvi...@readypulse.com wrote:
 its free and open source too.. basically, their releases are ahead of public
 releases of hadoop/hbase - from what i understand, major bug fixes and
 enhancements are checked in to their branch first and then eventually make
 it to public release branches.


You've got it a bit backwards - except for very rare exceptions, we
check our fixes into the public ASF codebase before we commit anything
to CDH releases. Sometimes, it will show up in a CDH release before an
ASF release, but the changes are always done as backports from the ASF's
Subversion. You can see the list of public JIRAs referenced in our
changelists here:
http://archive.cloudera.com/cdh/3/hadoop-0.20.2+923.97.CHANGES.txt

Apologies for the vendor-specific comment: I just wanted to clarify
that Cloudera's aim is to contribute to the community and not any kind
of fork as suggested above.

Back to work on 0.23 for me!

-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Hbase with Hadoop

2011-10-14 Thread Jignesh Patel
At last I moved one step further. It was a problem with the hadoop jar file. I 
needed to replace hadoop-core-xx.jar in hbase/lib with the one from hadoop/lib.
After replacing it I got the following error:

2011-10-14 17:09:12,409 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled 
exception. Starting shutdown.
java.lang.NoClassDefFoundError: org/apache/commons/configuration/Configuration
at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<init>(DefaultMetricsSystem.java:37)
at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<clinit>(DefaultMetricsSystem.java:34)
at 
org.apache.hadoop.security.UgiInstrumentation.create(UgiInstrumentation.java:51)
at 
org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:196)
at 
org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:159)
at 
org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:216)
at 
org.apache.hadoop.security.KerberosName.<clinit>(KerberosName.java:83)
at 
org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:189)
at 
org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:159)
at 
org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:216)
at 
org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:409)
at 
org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:395)
at 
org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:1436)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1337)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:244)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
at org.apache.hadoop.hbase.util.FSUtils.getRootDir(FSUtils.java:364)
at 
org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:81)
at 
org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:346)
at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:282)
at 
org.apache.hadoop.hbase.master.HMasterCommandLine$LocalHMaster.run(HMasterCommandLine.java:193)
at java.lang.Thread.run(Thread.java:680)
Caused by: java.lang.ClassNotFoundException: 
org.apache.commons.configuration.Configuration
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
... 22 more


On Oct 14, 2011, at 3:35 PM, Jignesh Patel wrote:

 Can somebody help me get Hadoop 0.20.205.0 and HBase 0.90.4 working in pseudo 
 mode? This is the third day in a row and I am not able to make it run.
 
 The details are as follows
 
 http://pastebin.com/KrJePt64
 
 
 If this is not going to work, then let me know which version I should use to 
 get it running. 
 

Re: Hbase with Hadoop

2011-10-14 Thread Jignesh Patel
Cool! Everything is good now after copying the commons-configuration.jar file.

No need to start zookeeper or master separately. Only run bin/start-hbase.sh and 
everything works. I see my status changed.
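
For the record, a sketch of what that fix amounted to (paths are illustrative;
adjust HADOOP_HOME and HBASE_HOME to your own install directories):

# copy the jar hbase was missing from hadoop's lib into hbase's lib
cp $HADOOP_HOME/lib/commons-configuration-*.jar $HBASE_HOME/lib/
# then restart hbase
bin/stop-hbase.sh
bin/start-hbase.sh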

On Oct 14, 2011, at 5:16 PM, Jignesh Patel wrote:

 java.lang.NoClassDefFoundError: org/apache/commons/configuration/Configuration



mapreduce linear chaining: ClassCastException

2011-10-14 Thread Periya.Data
Hi all,
   I am trying a simple extension of the WordCount example in Hadoop. I want to
get the frequency of word counts in descending order. For that I employ a linear
chain of MR jobs. The first MR job (MR-1) does the regular wordcount (the
usual example). For the next MR job (MR-2), I set the mapper to swap the
<word, count> pair to <count, word>. Then I have the identity reducer simply
store the results.

My MR-1 does its job correctly and stores the result in a temp path.

Question 1: The mapper of the second MR job (MR-2) doesn't like the input
format. I have properly set the input types for MapClass2: what it expects and
what its output must be. It seems to be expecting a LongWritable. I suspect
that it is trying to look at some index file; I am not sure.


It throws an error like this:

<code>
java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot
be cast to org.apache.hadoop.io.Text
</code>

Some Info:
- I use the old API (org.apache.hadoop.mapred.*); I am asked to stick with it
for now.
- I use hadoop-0.20.2

For MR-1:
- conf1.setOutputKeyClass(Text.class);
- conf1.setOutputValueClass(IntWritable.class);

For MR-2
- takes in a Text (word) and IntWritable (sum)
- conf2.setOutputKeyClass(IntWritable.class);
- conf2.setOutputValueClass(Text.class);

<code>
public class MapClass2 extends MapReduceBase
    implements Mapper<Text, IntWritable, IntWritable, Text> {

  @Override
  public void map(Text word, IntWritable sum,
                  OutputCollector<IntWritable, Text> output,
                  Reporter reporter) throws IOException {

    output.collect(sum, word);   // sum, word
  }
}
</code>

Any suggestions would be helpful. Is my MapClass2 code right in the first
place...for swapping? Or should I assume that the mapper reads line by line,
so I must read in one line, then use a StringTokenizer to split it up and
convert the second token (sum) from string to int? Or should I mess around
with the OutputKeyComparator class?
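
One thing I am considering (sketched below as an assumption, not yet verified):
if MR-1 writes its result with the default TextOutputFormat and MR-2 reads it
back with the default TextInputFormat, the keys handed to MapClass2 would be
LongWritable byte offsets, which matches this exact cast error. Passing the
intermediate data through SequenceFile formats would preserve the types;
conf1, conf2 and tempPath below are the ones described above:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.hadoop.mapred.SequenceFileOutputFormat;
import org.apache.hadoop.mapred.lib.IdentityReducer;

// MR-1: keep the <Text, IntWritable> pairs typed instead of flattening to text
conf1.setOutputKeyClass(Text.class);
conf1.setOutputValueClass(IntWritable.class);
conf1.setOutputFormat(SequenceFileOutputFormat.class);
FileOutputFormat.setOutputPath(conf1, tempPath);

// MR-2: read the same typed pairs back, so map() really gets (Text, IntWritable)
conf2.setInputFormat(SequenceFileInputFormat.class);
FileInputFormat.setInputPaths(conf2, tempPath);
conf2.setMapperClass(MapClass2.class);
conf2.setReducerClass(IdentityReducer.class);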

Thanks,
PD


Re: controlling hbase memory

2011-10-14 Thread Harsh J
Jignesh,

Please use the HBase user list (u...@hbase.apache.org) for all your
HBase questions. This list is for Hadoop Common.
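
That said, one hedged pointer: the heap each HBase daemon requests is set in
conf/hbase-env.sh, for example:

# The maximum amount of heap to use, in MB. Default is 1000.
export HBASE_HEAPSIZE=1000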

On Sat, Oct 15, 2011 at 12:26 AM, Jignesh Patel jign...@websoft.com wrote:
 How do I control HBase memory? As soon as I use the command bin/start-hbase.sh, my 8 
 GB RAM machine runs out of memory.

 -Jignesh



-- 
Harsh J