Re: Insert blocked

2012-07-23 Thread Asaf Mesika
Is your HTable in autoFlush mode? What's your client write buffer size?
What is the thread stuck on? Take a thread dump.
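
(For reference, a minimal sketch of the auto-flush / write-buffer knobs being asked about, assuming the 0.92/0.94-era HTable client API; the table name and buffer size are only illustrative:)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;

public class AutoFlushExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "session_timeline"); // illustrative table name
    table.setAutoFlush(false);                  // buffer puts on the client side
    table.setWriteBufferSize(2L * 1024 * 1024); // flush once ~2 MB of puts accumulate
    // ... table.put(...) calls ...
    table.flushCommits();                       // push anything still in the buffer
    table.close();
    // For the thread dump of a stuck client: jstack <pid>  (or kill -QUIT <pid>)
  }
}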

Sent from my iPad

On 24 July 2012, at 03:42, Mohit Anchlia  wrote:

> I am now using HTablePool but still the call hangs at "put". My code is
> something like this:
>
>
> hTablePool = new HTablePool(config, MAX_POOL_SIZE);
>
> result = new SessionTimelineDAO(hTablePool.getTable(t.name()),
> ColumnFamily.S_T_MTX);
>
> public SessionTimelineDAO(HTableInterface hTableInterface, ColumnFamily
> cf){
>  this.tableInt = hTableInterface;
>  this.cf = cf.name().getBytes();
>  log.info("Table " + hTableInterface + " " + cf);
> }
>
> @Override
> public void create(DataStoreModel dm) throws DataStoreException {
>  if(null == dm || null == dm.getKey()){
>   log.error("DataStoreModel is invalid");
>   return;
>  }
>
>  Put p = new Put(dm.getKey().array());
>
>  for(ByteBuffer bf : dm.getCols().keySet()){
>   p.add(cf, bf.array(), dm.getColumnValue(bf).array());
>  }
>
>  try {
>   log.info("In create ");
>   tableInt.put(p);
>  } catch (IOException e) {
>   log.error("Error writing " , e);
>   throw new DataStoreException(e);
>  } finally{
>   cleanUp();
>
>  }
> }
>
>
> private void cleanUp() {
>  if(null != tableInt){
>   try {
>    tableInt.close();
>   } catch (IOException e) {
>    log.error("Failed while closing table interface", e);
>   }
>  }
> }
> On Mon, Jul 23, 2012 at 4:15 PM, Mohit Anchlia wrote:
>
>>
>>
>> On Mon, Jul 23, 2012 at 3:54 PM, Elliott Clark wrote:
>>
>>> HTable is not thread safe[1]. It's better to use HTablePool if you want to
>>> share things across multiple threads.[2]
>>>
>>> 1
>>> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html
>>> 2
>>>
>>> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTablePool.html
>>>
>>> Thanks! I'll change my code to use HTablePool
>>
>>> On Mon, Jul 23, 2012 at 3:48 PM, Mohit Anchlia >>> wrote:
>>>
 I am writing a stress tool to test my specific use case. In my current
 implementation HTable is a global static variable that I initialize just
 once and use it across multiple threads. Is this ok?

 My row key consists of (timestamp - (timestamp % 1000)) and cols are
 counters. What I am seeing is that when I run my test after first row is
 created the application just hangs. I just wanted to check if there are
 obvious things that I should watch out for.

 I am currently testing a few threads in Eclipse, but I'll still try to
 generate a stack trace

>>>
>>
>>


Re: Insert blocked

2012-07-23 Thread lars hofhansl
Or you can pre-create your HConnection and Threadpool and use the HTable 
constructor that takes these as arguments.
That is faster and less "byzantine" compared to the HTablePool "monster".

Also see here (if you don't mind the plug): 
http://hadoop-hbase.blogspot.com/2011/12/long-running-hbase-clients.html
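
(A minimal sketch of that pattern, assuming the 0.92/0.94-era client API - i.e. HConnectionManager.createConnection and the HTable constructor that takes a connection and an ExecutorService; the table name and pool size are illustrative:)

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HConnection;
import org.apache.hadoop.hbase.client.HConnectionManager;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class PooledClientExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Created once per JVM and shared by all threads:
    HConnection connection = HConnectionManager.createConnection(conf);
    ExecutorService pool = Executors.newFixedThreadPool(10);

    // Cheap to create per thread/request, since connection and pool are reused:
    HTable table = new HTable(Bytes.toBytes("session_timeline"), connection, pool);
    try {
      // table.put(...), table.get(...), ...
    } finally {
      table.close(); // flushes this HTable; the shared connection and pool stay up
    }

    // On application shutdown:
    pool.shutdown();
    connection.close();
  }
}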


-- Lars



- Original Message -
From: Elliott Clark 
To: user@hbase.apache.org
Cc: 
Sent: Monday, July 23, 2012 3:54 PM
Subject: Re: Insert blocked

HTable is not thread safe[1]. It's better to use HTablePool if you want to
share things across multiple threads.[2]

1 http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html
2
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTablePool.html

On Mon, Jul 23, 2012 at 3:48 PM, Mohit Anchlia wrote:

> I am writing a stress tool to test my specific use case. In my current
> implementation HTable is a global static variable that I initialize just
> once and use it across multiple threads. Is this ok?
>
> My row key consists of (timestamp - (timestamp % 1000)) and cols are
> counters. What I am seeing is that when I run my test after first row is
> created the application just hangs. I just wanted to check if there are
> obvious things that I should watch out for.
>
> I am currently testing a few threads in Eclipse, but I'll still try to
> generate a stack trace
>



Re: Efficient read/write - Iterative M/R jobs

2012-07-23 Thread Ioakim Perros
Update (for anyone ending up here after a possible google search on the 
issue) :


Finally, running an M/R job in order to bulk import data into a 
pseudo-distributed setup is feasible (for testing purposes).


The error concerning TotalOrderPartitioner had something to do with a 
trivial bug in the keys I passed from the mappers.


The thing is that you need to add "guava-r09.jar" (or any recent version of 
Guava, I suppose - it is located under the lib folder of the HBase setup 
path) to the lib folder of the Hadoop setup path. I suppose that in order 
for the same job to run on a truly distributed environment, one has to 
add -libjars /path/to/guava.jar to the options of the hadoop jar command.
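
(For example - the paths, jar version and driver class below are only illustrative, and -libjars assumes the driver parses options via ToolRunner/GenericOptionsParser:)

# pseudo-distributed / local testing: make Guava visible to the MR daemons
cp $HBASE_HOME/lib/guava-r09.jar $HADOOP_HOME/lib/
# then restart the MapReduce daemons so they pick it up

# fully distributed: ship the jar with the job instead
hadoop jar my-bulkload.jar com.example.BulkLoadDriver \
  -libjars $HBASE_HOME/lib/guava-r09.jar <input> <output>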


On 07/24/2012 02:06 AM, Jean-Daniel Cryans wrote:

... INFO mapred.JobClient: Task Id : attempt_201207232344_0001_m_00_0,
Status : FAILED
java.lang.IllegalArgumentException: *Can't read partitions file*
 at
org.apache.hadoop.hbase.mapreduce.hadoopbackport.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:111)
...

I followed this link, while googling for the solution :
http://hbase.apache.org/book/trouble.mapreduce.html
and it implies a misconfiguration concerning a fully distributed
environment.

I would like, therefore, to ask if it is even possible to bulk import data
in a pseudo-distributed mode and if this is the case, does anyone have a
guess about this error?

AFAIK you just can't use the local job tracker for this, so you do
need to start one.

J-D







Re: Index building process design

2012-07-23 Thread Amandeep Khurana
You have an interesting question. I haven't come across anyone trying this 
approach yet. Replication is a relatively new feature so that might be one 
reason. But I think you are over engineering the solution here. I'll try to 
break it down to the best of my abilities.

1. First question to answer is - where are the indices going to be stored? Are 
they going to be other HBase tables (something like secondary indices)? Or are 
they going to be in an independent indexing system like Lucene/Solr/Elastic 
Search/Blurr? The number of fields you want to index on will determine that 
answer IMO.

2. Does the index building happen synchronously as the writes come in or is it 
an asynchronous process? The answer to this question will depend on the 
expected throughput and the cluster size. Also, the answer to this question 
will determine your indexing process.

a) If you are going to build indices synchronously, you have two options. First 
being - get your client to write to HBase as well as create the index. Second 
option - use coprocessors. Coprocs are also a new feature and I'm not certain 
yet that they'll solve your problem. Try them out though, they are pretty neat.
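
(If it helps, a synchronous secondary-index coprocessor would look roughly like the sketch below. It assumes the 0.92/0.94 RegionObserver API; the index table name, column family and row layout are made up purely for illustration.)

import java.io.IOException;
import java.util.List;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical observer: for every cell written to the base table, write an
// inverted-index entry (cell value -> base row key) into an "idx" table.
public class SecondaryIndexObserver extends BaseRegionObserver {
  private static final byte[] IDX_TABLE = Bytes.toBytes("idx");
  private static final byte[] IDX_CF = Bytes.toBytes("d");

  @Override
  public void postPut(ObserverContext<RegionCoprocessorEnvironment> ctx,
      Put put, WALEdit edit, boolean writeToWAL) throws IOException {
    HTableInterface idx = ctx.getEnvironment().getTable(IDX_TABLE);
    try {
      for (List<KeyValue> kvs : put.getFamilyMap().values()) {
        for (KeyValue kv : kvs) {
          if (kv.getValue().length == 0) continue;        // empty value -> nothing to index
          Put indexPut = new Put(kv.getValue());           // index row key = cell value
          indexPut.add(IDX_CF, put.getRow(), HConstants.EMPTY_BYTE_ARRAY); // qualifier = base row
          idx.put(indexPut);
        }
      }
    } finally {
      idx.close();
    }
  }
}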

b) If you are going to build indices asynchronously, you'll likely be using MR 
for it. MR can run on the same cluster but if you pound your cluster with full 
table scans, your latencies for real time serving will get affected. Now, if 
you have relatively low throughput for data ingests, you might be able to get 
away with running fewer tasks in your MR job. So, data serving plus index 
creation can happen on the same cluster. If your throughput is high and you 
need MR jobs to run full force, separate the clusters out. Have a cluster to 
run the MR jobs and a separate cluster to serve the data. My approach in the 
second case would be to dump new data into HDFS directly as flat files, run MR 
over them to create the index and also put them into HBase for serving.
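
(As a concrete, purely hypothetical starting point for the same-cluster scan variant: a TableMapper job that scans the raw table and writes inverted-index entries into an index table. Table names, the "d" column family and the index layout are made up; the API is the 0.92/0.94-era TableMapReduceUtil.)

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.IdentityTableReducer;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;

public class IndexBuilder {

  // Emit one index Put per cell: index row = cell value, qualifier = base row key.
  public static class IndexMapper extends TableMapper<ImmutableBytesWritable, Put> {
    private static final byte[] IDX_CF = Bytes.toBytes("d");

    @Override
    protected void map(ImmutableBytesWritable row, Result columns, Context ctx)
        throws IOException, InterruptedException {
      for (KeyValue kv : columns.raw()) {
        if (kv.getValue().length == 0) continue;
        Put indexPut = new Put(kv.getValue());
        indexPut.add(IDX_CF, row.get(), HConstants.EMPTY_BYTE_ARRAY);
        ctx.write(new ImmutableBytesWritable(kv.getValue()), indexPut);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "index-builder");
    job.setJarByClass(IndexBuilder.class);

    Scan scan = new Scan();
    scan.setCaching(500);        // bigger scanner caching for MR scans
    scan.setCacheBlocks(false);  // don't churn the block cache on the serving cluster

    TableMapReduceUtil.initTableMapperJob("raw_data", scan, IndexMapper.class,
        ImmutableBytesWritable.class, Put.class, job);
    TableMapReduceUtil.initTableReducerJob("raw_data_idx", IdentityTableReducer.class, job);
    job.setNumReduceTasks(1);    // throttle the write load, per the point above

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}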

Hope that gives you some more ideas to think about.

-ak 


On Wednesday, July 11, 2012 at 10:26 PM, Eric Czech wrote:

> Hi everyone,
> 
> I have a general design question (apologies in advanced if this has
> been asked before).
> 
> I'd like to build indexes off of a raw data store and I'm trying to
> think of the best way to control processing so some part of my cluster
> can still serve reads and writes without being affected heavily by the
> index building process.
> 
> I get the sense that the typical process for this involves something
> like the following:
> 
> 1. Dedicate one cluster for index building (let's call it the INDEX
> cluster) and one for serving application reads on the indexes as well
> as writes/reads on the raw data set (let's call it the MAIN cluster).
> 2. Have the raw data set replicated from the MAIN cluster to the INDEX 
> cluster.
> 3. On the INDEX cluster, use the replicated raw data to constantly
> rebuild indexes and copy the new versions to the MAIN cluster,
> overwriting the old versions if necessary.
> 
> While conceptually simple, I can't help but wonder if it doesn't make
> more sense to simply switch application reads / writes from one
> cluster to another based on which one is NOT currently building
> indexes (but still have the raw data set replicate master-master
> between them).
> 
> To be more clear, I'm proposing doing this:
> 
> 1. Have two clusters, call them CLUSTER_1 and CLUSTER_2, and have the
> raw data set replicated master-master between them.
> 2. if CLUSTER_1 is currently rebuilding indexes, redirect all
> application traffic to CLUSTER_2 including reads from the indexes as
> well as writes to the raw data set (and vise-versa).
> 
> I know I'm not addressing a lot of details here but I'm just curious
> if anyone has ever implemented something along these lines.
> 
> The main advantage to what I'm proposing would be not having to copy
> potentially massive indexes across the network but at the cost of
> having to deal with having clients not always read from the same
> cluster (seems doable though).
> 
> Any advice would be much appreciated!
> 
> Thanks 



Re: Insert blocked

2012-07-23 Thread Mohit Anchlia
I am now using HTablePool but still the call hangs at "put". My code is
something like this:


hTablePool = new HTablePool(config, MAX_POOL_SIZE);

result = new SessionTimelineDAO(hTablePool.getTable(t.name()),
ColumnFamily.S_T_MTX);

 public SessionTimelineDAO(HTableInterface hTableInterface, ColumnFamily
cf){
  this.tableInt = hTableInterface;
  this.cf = cf.name().getBytes();
  log.info("Table " + hTableInterface + " " + cf);
 }

 @Override
 public void create(DataStoreModel dm) throws DataStoreException {
  if(null == dm || null == dm.getKey()){
   log.error("DataStoreModel is invalid");
   return;
  }

  Put p = new Put(dm.getKey().array());

  for(ByteBuffer bf : dm.getCols().keySet()){
   p.add(cf, bf.array(), dm.getColumnValue(bf).array());
  }

  try {
   log.info("In create ");
   tableInt.put(p);
  } catch (IOException e) {
   log.error("Error writing " , e);
   throw new DataStoreException(e);
  } finally{
   cleanUp();

  }
 }


 private void cleanUp() {
  if(null != tableInt){
   try {
    tableInt.close();
   } catch (IOException e) {
    log.error("Failed while closing table interface", e);
   }
  }
 }
On Mon, Jul 23, 2012 at 4:15 PM, Mohit Anchlia wrote:

>
>
>  On Mon, Jul 23, 2012 at 3:54 PM, Elliott Clark wrote:
>
>> HTable is not thread safe[1]. It's better to use HTablePool if you want to
>> share things across multiple threads.[2]
>>
>> 1
>> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html
>> 2
>>
>> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTablePool.html
>>
>> Thanks! I'll change my code to use HTablePool
>
>> On Mon, Jul 23, 2012 at 3:48 PM, Mohit Anchlia > >wrote:
>>
>> > I am writing a stress tool to test my specific use case. In my current
>> > implementation HTable is a global static variable that I initialize just
>> > once and use it across multiple threads. Is this ok?
>> >
>> > My row key consists of (timestamp - (timestamp % 1000)) and cols are
>> > counters. What I am seeing is that when I run my test after first row is
>> > created the application just hangs. I just wanted to check if there are
>> > obvious things that I should watch out for.
>> >
>> > I am currently testing a few threads in Eclipse, but I'll still try to
>> > generate a stack trace
>> >
>>
>
>


Re: hbase threw NotServingRegionException

2012-07-23 Thread Ey-Chih chow
Besides GC info, the master log is as follows.

Ey-Chih

===

12/07/22 00:47:18 WARN master.CatalogJanitor: REGIONINFO_QUALIFIER is empty in 
keyvalues={session_1_prod2,1110----,1339745664043.27d8af632e8a790d4a72254cadbdcb0c./info:server/1339746517691/Put/vlen=22,
 
session_1_prod2,1110----,1339745664043.27d8af632e8a790d4a72254cadbdcb0c./info:serverstartcode/1339746517691/Put/vlen=8}
12/07/22 00:47:18 WARN master.CatalogJanitor: REGIONINFO_QUALIFIER is empty in 
keyvalues={session_1_prod2,1DDC----,1339745664043.287beb1a649505008edb34564b67dbaa./info:server/1339790024840/Put/vlen=22,
 
session_1_prod2,1DDC----,1339745664043.287beb1a649505008edb34564b67dbaa./info:serverstartcode/1339790024840/Put/vlen=8}
2012-07-22T00:47:18.399-0700: [GC [ParNew: 17219K->172K(19136K), 0.0013380 
secs] 23565K->6518K(346732K) icms_dc=0 , 0.0014010 secs] [Times: user=0.01 
sys=0.00, real=0.00 secs] 
12/07/22 00:47:18 WARN master.CatalogJanitor: REGIONINFO_QUALIFIER is empty in 
keyvalues={session_1_prod2,3FFC----,1339745664044.655378f6ee550200f85415aad81a876a./info:server/1339746494906/Put/vlen=22,
 
session_1_prod2,3FFC----,1339745664044.655378f6ee550200f85415aad81a876a./info:serverstartcode/1339746494906/Put/vlen=8}
12/07/22 00:47:18 WARN master.CatalogJanitor: REGIONINFO_QUALIFIER is empty in 
keyvalues={session_prod2,0CCC----,1339612083332.2437e319dd4ae29ab9d43bc7962c022e./info:server/1339614727706/Put/vlen=22,
 
session_prod2,0CCC----,1339612083332.2437e319dd4ae29ab9d43bc7962c022e./info:serverstartcode/1339614727706/Put/vlen=8}
12/07/22 00:47:18 WARN master.CatalogJanitor: REGIONINFO_QUALIFIER is empty in 
keyvalues={session_prod2,2AA8----,133961208.71a8929cb03baf96b3a5d2c2ddfda600./info:server/1339614726844/Put/vlen=22,
 
session_prod2,2AA8----,133961208.71a8929cb03baf96b3a5d2c2ddfda600./info:serverstartcode/1339614726844/Put/vlen=8}
12/07/22 00:47:18 WARN master.CatalogJanitor: REGIONINFO_QUALIFIER is empty in 
keyvalues={session_prod2,3FFC----,133961208.18ec0eae6fdb1c720684b8e85f02f482./info:server/1339613525749/Put/vlen=22,
 
session_prod2,3FFC----,133961208.18ec0eae6fdb1c720684b8e85f02f482./info:serverstartcode/1339613525749/Put/vlen=8}
12/07/22 00:47:18 WARN master.CatalogJanitor: REGIONINFO_QUALIFIER is empty in 
keyvalues={session_prod2,4440----,133961208.e55655bcdb5f369158ef80b9a8f83a71./info:server/1339614726816/Put/vlen=22,
 
session_prod2,4440----,133961208.e55655bcdb5f369158ef80b9a8f83a71./info:serverstartcode/1339614726816/Put/vlen=8}
12/07/22 00:47:18 WARN master.CatalogJanitor: REGIONINFO_QUALIFIER is empty in 
keyvalues={session_prod2,4CC8----,133961208.29a52f98f640f35e187de9208a16f77a./info:server/1339614727337/Put/vlen=22,
 
session_prod2,4CC8----,133961208.29a52f98f640f35e187de9208a16f77a./info:serverstartcode/1339614727337/Put/vlen=8}
12/07/22 00:47:18 WARN master.CatalogJanitor: REGIONINFO_QUALIFIER is empty in 
keyvalues={session_prod2,510C----,133961208.4cf3dbac075c508e6bb4868d3138d223./info:server/1339614730155/Put/vlen=22,
 
session_prod2,510C----,133961208.4cf3dbac075c508e6bb4868d3138d223./info:serverstartcode/1339614730155/Put/vlen=8}
12/07/22 00:47:18 WARN master.CatalogJanitor: REGIONINFO_QUALIFIER is empty in 
keyvalues={session_prod2,6EE8----,133961208.b90def2e52482c15a1ea275aa8bcabbf./info:server/1339614728640/Put/vlen=22,
 
session_prod2,6EE8----,133961208.b90def2e52482c15a1ea275aa8bcabbf./info:serverstartcode/1339614728640/Put/vlen=8}
12/07/22 00:47:18 WARN master.CatalogJanitor: REGIONINFO_QUALIFIER is empty in 
keyvalues={session_prod2,732C----,133961208.8cdb04c17aa9dd9e9eb05b8be0d5b58c./info:server/1339614728349/Put/vlen=22,
 
session_prod2,732C----,133961208.8cdb04c17aa9dd9e9eb05b8be0d5b58c./info:serverstartcode/1339614728349/Put/vlen=8}
12/07/22 00:47:18 WARN master.CatalogJanitor: REGIONINFO_QUALIFIER is empty in 
keyvalues={session_prod2,843C----,133961208.17609b20443e66270e3bc08ee8d1b31c./info:server/1339614728931/Put/vlen=22,
 
session_prod2,843C----,133961208.17609b20443e66270e3bc08ee8d1b31c./info:serverstartcode/1339614728931/Put/vlen=8}
12/07/22 00:47:18 WARN master.CatalogJanitor: REGIONINFO_QUALIFIER is empty in 
keyvalues={session_prod2,9DD4----,1339612083334.df6df275cd52f326f7f1098379db626b./

Re: Insert blocked

2012-07-23 Thread Mohit Anchlia
On Mon, Jul 23, 2012 at 3:54 PM, Elliott Clark wrote:

> HTable is not thread safe[1]. It's better to use HTablePool if you want to
> share things across multiple threads.[2]
>
> 1
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html
> 2
>
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTablePool.html
>
> Thanks! I'll change my code to use HTablePool

> On Mon, Jul 23, 2012 at 3:48 PM, Mohit Anchlia  >wrote:
>
> > I am writing a stress tool to test my specific use case. In my current
> > implementation HTable is a global static variable that I initialize just
> > once and use it across multiple threads. Is this ok?
> >
> > My row key consists of (timestamp - (timestamp % 1000)) and cols are
> > counters. What I am seeing is that when I run my test after first row is
> > created the application just hangs. I just wanted to check if there are
> > obvious things that I should watch out for.
> >
> > I am currently testing a few threads in Eclipse, but I'll still try to
> > generate a stack trace
> >
>


Re: Efficient read/write - Iterative M/R jobs

2012-07-23 Thread Ioakim Perros

Thank you very much for your instant response :-)

Hope Amazon Web Services will help me with this one.
IP


On 07/24/2012 02:06 AM, Jean-Daniel Cryans wrote:

... INFO mapred.JobClient: Task Id : attempt_201207232344_0001_m_00_0,
Status : FAILED
java.lang.IllegalArgumentException: *Can't read partitions file*
 at
org.apache.hadoop.hbase.mapreduce.hadoopbackport.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:111)
...

I followed this link, while googling for the solution :
http://hbase.apache.org/book/trouble.mapreduce.html
and it implies a misconfiguration concerning a fully distributed
environment.

I would like, therefore, to ask if it is even possible to bulk import data
in a pseudo-distributed mode and if this is the case, does anyone have a
guess about this error?

AFAIK you just can't use the local job tracker for this, so you do
need to start one.

J-D




Re: Index building process design

2012-07-23 Thread Michael Segel
Ok, I'll take a stab at the shorter one. :-)

You can create a base data table which contains your raw data. 
Depending on your index... like an inverted table, you can run a map/reduce job 
that builds up a second table.  And a third, a fourth... depending on how many 
inverted indexes you want. 

When you want to find a data set based on a known value in the index, you can 
scan the index table, and the result set will contain a list of keys for the 
data in the base table. 

Now you can then just fetch those rows from HBase.
If you are using multiple indexes, you just take the intersection of the result 
set(s) and now you have the end data set to fetch.
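
(Roughly, that read path looks like the sketch below - the table and family names are hypothetical, and it assumes the index row stores the base-table keys as column qualifiers:)

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class IndexReadExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable index = new HTable(conf, "base_data_idx"); // inverted index: value -> base row keys
    HTable base = new HTable(conf, "base_data");

    // 1. Look up the index row for the known value.
    Result idxRow = index.get(new Get(Bytes.toBytes("some-indexed-value")));

    // 2. Each qualifier of the index row is a base-table row key.
    //    (With multiple indexes, intersect these key sets before fetching.)
    List<Get> gets = new ArrayList<Get>();
    if (!idxRow.isEmpty()) {
      for (byte[] baseKey : idxRow.getFamilyMap(Bytes.toBytes("d")).keySet()) {
        gets.add(new Get(baseKey));
      }
    }

    // 3. Fetch the matching rows from the base table in one batch.
    Result[] rows = base.get(gets);
    System.out.println("fetched " + rows.length + " rows");

    index.close();
    base.close();
  }
}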

Not sure why you would want a second cluster. Could you expand on your use case?

On Jul 23, 2012, at 3:06 PM, Eric Czech wrote:

> Hmm, maybe that was too long -- I'll keep this one shorter I swear:
> 
> Would it make sense to build indexes with two Hadoop/Hbase clusters by
> simply pointing client traffic at the cluster that is currently NOT
> building indexes via M/R jobs?  Basically, has anyone ever tried switching
> back and forth between clusters instead of building indexes on one cluster
> and copying them to another?
> 
> 
> On Thu, Jul 12, 2012 at 1:26 AM, Eric Czech  wrote:
> 
>> Hi everyone,
>> 
>> I have a general design question (apologies in advanced if this has
>> been asked before).
>> 
>> I'd like to build indexes off of a raw data store and I'm trying to
>> think of the best way to control processing so some part of my cluster
>> can still serve reads and writes without being affected heavily by the
>> index building process.
>> 
>> I get the sense that the typical process for this involves something
>> like the following:
>> 
>> 1.  Dedicate one cluster for index building (let's call it the INDEX
>> cluster) and one for serving application reads on the indexes as well
>> as writes/reads on the raw data set (let's call it the MAIN cluster).
>> 2.  Have the raw data set replicated from the MAIN cluster to the INDEX
>> cluster.
>> 3.  On the INDEX cluster, use the replicated raw data to constantly
>> rebuild indexes and copy the new versions to the MAIN cluster,
>> overwriting the old versions if necessary.
>> 
>> While conceptually simple, I can't help but wonder if it doesn't make
>> more sense to simply switch application reads / writes from one
>> cluster to another based on which one is NOT currently building
>> indexes (but still have the raw data set replicate master-master
>> between them).
>> 
>> To be more clear, I'm proposing doing this:
>> 
>> 1.  Have two clusters, call them CLUSTER_1 and CLUSTER_2, and have the
>> raw data set replicated master-master between them.
>> 2.  if CLUSTER_1 is currently rebuilding indexes, redirect all
>> application traffic to CLUSTER_2 including reads from the indexes as
>> well as writes to the raw data set (and vise-versa).
>> 
>> I know I'm not addressing a lot of details here but I'm just curious
>> if anyone has ever implemented something along these lines.
>> 
>> The main advantage to what I'm proposing would be not having to copy
>> potentially massive indexes across the network but at the cost of
>> having to deal with having clients not always read from the same
>> cluster (seems doable though).
>> 
>> Any advice would be much appreciated!
>> 
>> Thanks
>> 



Re: Efficient read/write - Iterative M/R jobs

2012-07-23 Thread Jean-Daniel Cryans
> ... INFO mapred.JobClient: Task Id : attempt_201207232344_0001_m_00_0,
> Status : FAILED
> java.lang.IllegalArgumentException: *Can't read partitions file*
> at
> org.apache.hadoop.hbase.mapreduce.hadoopbackport.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:111)
> ...
>
> I followed this link, while googling for the solution :
> http://hbase.apache.org/book/trouble.mapreduce.html
> and it implies a misconfiguration concerning a fully distributed
> environment.
>
> I would like, therefore, to ask if it is even possible to bulk import data
> in a pseudo-distributed mode and if this is the case, does anyone have a
> guess about this error?

AFAIK you just can't use the local job tracker for this, so you do
need to start one.
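
(That is, in pseudo-distributed mode point mapred.job.tracker at a real JobTracker instead of "local" and start the MR daemons - with a Hadoop 1.x layout, roughly:)

# mapred-site.xml: set mapred.job.tracker to e.g. localhost:9001 (not "local")
$HADOOP_HOME/bin/start-mapred.sh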

J-D


Re: Efficient read/write - Iterative M/R jobs

2012-07-23 Thread Ioakim Perros

Thank you very much for responding :-)

I also found this one : http://www.deerwalk.com/bulk_importing_data , 
which seems very informative.


The thing is that I tried to create and run a simple (custom) bulk 
loading job and I tried to run it locally (in pseudo-distributed mode) - 
and the following error occurs:


... INFO mapred.JobClient: Task Id : 
attempt_201207232344_0001_m_00_0, Status : FAILED

java.lang.IllegalArgumentException: *Can't read partitions file*
at 
org.apache.hadoop.hbase.mapreduce.hadoopbackport.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:111) 
...


I followed this link, while googling for the solution : 
http://hbase.apache.org/book/trouble.mapreduce.html
and it implies a misconfiguration concerning a fully distributed 
environment.


I would like, therefore, to ask if it is even possible to bulk import 
data in a pseudo-distributed mode and if this is the case, does anyone 
have a guess about this error?


Thanks in advance!
IP


On 07/23/2012 07:40 AM, Sonal Goyal wrote:

Hi,

You can check the bulk loading section at

http://hbase.apache.org/book/arch.bulk.load.html

Best Regards,
Sonal
Crux: Reporting for HBase 
Nube Technologies 







On Mon, Jul 23, 2012 at 6:15 AM, Ioakim Perros  wrote:


Hi,

Is there any efficient way (beyond the trivial using TableMapReduceUtil /
TableOutputFormat) to perform faster read and write operations to tables ?
Could anyone provide some example code of it ?

As of faster importing to table, I am aware of tools such as
completebulkload, but I would prefer triggering such a process through M/R
code, as I would like a whole table to be read and updated through
iterations of M/R jobs.

Thanks in advance!
IP





Re: Hbase bkup options

2012-07-23 Thread Michael Segel
There are a couple of nits...

1) Compression. This will help a bit when moving the files around. 

2) Data size.  You may have bandwidth issues.  Moving TBs of data over a 1GbE 
network can impact your cluster's performance.  (Even with compression)

Depending on your cluster(s) and infrastructure,  there is going to be a point 
where the cost of trying to back up to tape is going to exceed the cost of 
replicating to a second cluster. At the same time, you have to remember that 
restoring TBs of data will take time. 

How large that data set is will vary by organization. Again, only you can determine 
the value of your data. 

If you are backing up to a secondary cluster ... you can use the replication 
feature in HBase. This would be a better fit if you are looking at backing up a 
large set of HBase tables. 


On Jul 23, 2012, at 10:33 AM, Amlan Roy wrote:

> Hi Michael,
> 
> Thanks a lot for the reply. What I want to achieve is, if my cluster goes
> down for some reason, I should be able to create a new cluster and should be
> able to import all the backed up data. As I want to store all the tables, I
> expect the data size to be huge (on the order of terabytes) and it will keep
> growing.
> 
> If I have understood correctly, you have suggested to run "export" to get
> the data into hdfs and then run "hadoop fs -copyToLocal" to get it into
> local file. If I take a back up of the files, is it possible to import that
> data to a new Hbase cluster?
> 
> Thanks and regards,
> Amlan
> 
> -Original Message-
> From: Michael Segel [mailto:michael_se...@hotmail.com] 
> Sent: Monday, July 23, 2012 8:19 PM
> To: user@hbase.apache.org
> Subject: Re: Hbase bkup options
> 
> Amlan, 
> 
> Like always the answer to your question is... it depends.
> 
> First, how much data are we talking about? 
> 
> What's the value of the underlying data? 
> 
> One possible scenario...
> You run a M/R job to copy data from the table to an HDFS file, that is then
> copied to attached storage on an edge node and then to tape. 
> Depending on how much data, how much disk is in the attached storage you may
> want to keep a warm copy there, a 'warmer/hot' copy on HDFS and then a cold
> copy on tape off to some offsite storage facility.
> 
> There are other options, but it all depends on what you want to achieve. 
> 
> With respect to the other tools...
> 
> You can export (which is an M/R job) to a local directory, then use distcp
> to a different cluster.  hadoop fs -copyToLocal will let you copy off the
> cluster. 
> You could write your own code, but you don't get much gain over existing
> UNIX/Linux tools. 
> 
> 
> On Jul 23, 2012, at 7:52 AM, Amlan Roy wrote:
> 
>> Hi,
>> 
>> 
>> 
>> Is it feasible to do disk or tape backup for Hbase tables?
>> 
>> 
>> 
>> I have read about the tools like Export, CopyTable, Distcp. It seems like
>> they will require a separate HDFS cluster to do that.
>> 
>> 
>> 
>> Regards,
>> 
>> Amlan
>> 
> 
> 



Re: Insert blocked

2012-07-23 Thread Elliott Clark
HTable is not thread safe[1]. It's better to use HTablePool if you want to
share things across multiple threads.[2]

1 http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html
2
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTablePool.html

On Mon, Jul 23, 2012 at 3:48 PM, Mohit Anchlia wrote:

> I am writing a stress tool to test my specific use case. In my current
> implementation HTable is a global static variable that I initialize just
> once and use it across multiple threads. Is this ok?
>
> My row key consists of (timestamp - (timestamp % 1000)) and cols are
> counters. What I am seeing is that when I run my test after first row is
> created the application just hangs. I just wanted to check if there are
> obvious things that I should watch out for.
>
> I am currently testing a few threads in Eclipse, but I'll still try to
> generate a stack trace
>


Insert blocked

2012-07-23 Thread Mohit Anchlia
I am writing a stress tool to test my specific use case. In my current
implementation HTable is a global static variable that I initialize just
once and use it across multiple threads. Is this ok?

My row key consists of (timestamp - (timestamp % 1000)) and cols are
counters. What I am seeing is that when I run my test after first row is
created the application just hangs. I just wanted to check if there are
obvious things that I should watch out for.

I am currently testing a few threads in Eclipse, but I'll still try to
generate a stack trace


Re: hbase threw NotServingRegionException

2012-07-23 Thread Elliott Clark
hbck should help expose more problems than a single scan would.  With that
said the logs are the best bet in understanding what is going on with the
cluster at the time of the issue.  You posted logs that seem to only really
contain GC info.  Do you have more information about the cluster state when
the exceptions were thrown ?  Do you have the logs from the master? Was the
RS responding at the time? Were there compactions or balancer actions
happening ?
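
(For reference, hbck is run from the command line on a node with the HBase client config, e.g.:)

$ hbase hbck            # summary of region consistency
$ hbase hbck -details   # per-table / per-region detail if problems are reported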

On Mon, Jul 23, 2012 at 2:05 PM, Ey-Chih chow  wrote:

> Thanks.  But if we do a scan on the table via the hbase shell, data in the
> table did show up.
>
> Ey-Chih
>
> On Jul 23, 2012, at 1:10 PM, Mohammad Tariq wrote:
>
> > Hello sir,
> >
> > A possible reason could be that your client is contacting the given
> > regionserver, and that regionserver keeps rejecting the requests.
> > Are you sure your table and all its regions are online? Use hbck once and see
> > if you find anything interesting.
> >
> > Regards,
> >Mohammad Tariq
> >
> >
> > On Tue, Jul 24, 2012 at 1:26 AM, Ey-Chih chow  wrote:
> >> Sorry I pasted the wrong portion of the region server log.  The right
> portion should be as follows:
> >>
> >>
> >> ==
> >> 2012-07-22T00:48:57.147-0700: [GC [ParNew: 18863K->2112K(19136K),
> 0.0029870 secs] 57106K->42831K(97048K) icms_dc=0 , 0.0030480 secs] [Times:
> user=0.01 sys=0.00, real=0.00 secs]
> >> 2012-07-22T00:49:09.475-0700: [GC [ParNew: 18686K->2112K(19136K),
> 0.0219960 secs] 59405K->43951K(97048K) icms_dc=0 , 0.0220530 secs] [Times:
> user=0.15 sys=0.00, real=0.02 secs]
> >> 2012-07-22T00:49:20.872-0700: [GC [ParNew: 19135K->2112K(19136K),
> 0.0091670 secs] 60974K->44659K(97048K) icms_dc=0 , 0.0092480 secs] [Times:
> user=0.06 sys=0.00, real=0.01 secs]
> >> 2012-07-22T00:49:22.285-0700: [GC [ParNew: 19136K->2112K(19136K),
> 0.0021870 secs] 61683K->46380K(97048K) icms_dc=0 , 0.0022480 secs] [Times:
> user=0.01 sys=0.00, real=0.00 secs]
> >> 2012-07-22T00:49:24.999-0700: [GC [ParNew: 19136K->2087K(19136K),
> 0.0019340 secs] 63404K->47842K(97048K) icms_dc=0 , 0.0019910 secs] [Times:
> user=0.01 sys=0.00, real=0.00 secs]
> >> 2012-07-22T00:49:26.618-0700: [GC [ParNew: 18002K->2112K(19136K),
> 0.0016160 secs] 63757K->48065K(97048K) icms_dc=0 , 0.0016780 secs] [Times:
> user=0.01 sys=0.00, real=0.00 secs]
> >> 2012-07-22T00:49:28.089-0700: [GC [ParNew: 19136K->1413K(19136K),
> 0.0033140 secs] 65089K->48748K(97048K) icms_dc=0 , 0.0033810 secs] [Times:
> user=0.01 sys=0.00, real=0.00 secs]
> >> 2012-07-22T00:49:29.352-0700: [GC [ParNew: 18423K->2112K(19136K),
> 0.0063630 secs] 65759K->50827K(97048K) icms_dc=0 , 0.0064210 secs] [Times:
> user=0.05 sys=0.00, real=0.01 secs]
> >> 2012-07-22T00:49:30.226-0700: [GC [ParNew: 19136K->1897K(19136K),
> 0.0021470 secs] 67851K->52058K(97048K) icms_dc=0 , 0.0022010 secs] [Times:
> user=0.01 sys=0.00, real=0.00 secs]
> >> 2012-07-22T00:49:31.190-0700: [GC [ParNew: 18921K->1606K(19136K),
> 0.0013270 secs] 69082K->52336K(97048K) icms_dc=0 , 0.0013780 secs] [Times:
> user=0.01 sys=0.00, real=0.00 secs]
> >> 2012-07-22T00:49:33.111-0700: [GC [ParNew: 18482K->2112K(19136K),
> 0.0017970 secs] 69212K->53496K(97048K) icms_dc=0 , 0.0018510 secs] [Times:
> user=0.01 sys=0.00, real=0.00 secs]
> >> 2012-07-22T00:49:34.396-0700: [GC [ParNew: 19136K->1468K(19136K),
> 0.0120290 secs] 70520K->54718K(97048K) icms_dc=0 , 0.0120820 secs] [Times:
> user=0.07 sys=0.00, real=0.02 secs]
> >> 2012-07-22T00:49:35.294-0700: [GC [ParNew: 18181K->2112K(19136K),
> 0.0011290 secs] 71431K->55399K(97048K) icms_dc=0 , 0.0011840 secs] [Times:
> user=0.01 sys=0.00, real=0.00 secs]
> >> 2012-07-22T00:49:39.165-0700: [GC [ParNew: 19003K->547K(19136K),
> 0.0130200 secs] 72291K->55364K(97048K) icms_dc=0 , 0.0130880 secs] [Times:
> user=0.08 sys=0.00, real=0.02 secs]
> >> 2012-07-22T00:49:40.642-0700: [GC [ParNew: 17571K->646K(19136K),
> 0.0006740 secs] 72388K->55463K(97048K) icms_dc=0 , 0.0007220 secs] [Times:
> user=0.00 sys=0.00, real=0.00 secs]
> >> 2012-07-22T00:49:41.541-0700: [GC [ParNew: 17670K->2112K(19136K),
> 0.0104430 secs] 72487K->58179K(97048K) icms_dc=0 , 0.0105000 secs] [Times:
> user=0.07 sys=0.00, real=0.01 secs]
> >> 2012-07-22T00:49:42.972-0700: [GC [ParNew: 19136K->868K(19136K),
> 0.0015360 secs] 75203K->58718K(97048K) icms_dc=0 , 0.0015880 secs] [Times:
> user=0.02 sys=0.00, real=0.00 secs]
> >> 2012-07-22T00:49:43.355-0700: [GC [ParNew: 17892K->2112K(19136K),
> 0.0017450 secs] 75742K->60552K(97048K) icms_dc=0 , 0.0018070 secs] [Times:
> user=0.01 sys=0.00, real=0.00 secs]
> >> 2012-07-22T00:49:46.711-0700: [GC [ParNew: 18900K->644K(19136K),
> 0.0015250 secs] 77340K->60427K(97048K) icms_dc=0 , 0.0015760 secs] [Times:
> user=0.01 sys=0.00, real=0.00 secs]
> >> 2012-07-22T00:49:47.824-0700: [GC [ParNew: 17668K->1986K(19136K),
> 0.0015260 secs] 77451K->61943K(97048K) icms_dc=0 , 0.0015780 secs] [Times:
> user=0.00 sys=0.00, real=0.00 secs]
> >> 2012-07-22T00:49:51.552-0700: [GC [ParNew: 19010K->752K

Re: hbase threw NotServingRegionException

2012-07-23 Thread Ey-Chih chow
Thanks.  But if we do a scan on the table via the hbase shell, data in the 
table did show up.

Ey-Chih
 
On Jul 23, 2012, at 1:10 PM, Mohammad Tariq wrote:

> Hello sir,
> 
> A possible reason could be that your client is contacting the given
> regionserver, and that regionserver keeps rejecting the requests.
> Are you sure your table and all its regions are online? Use hbck once and see
> if you find anything interesting.
> 
> Regards,
>Mohammad Tariq
> 
> 
> On Tue, Jul 24, 2012 at 1:26 AM, Ey-Chih chow  wrote:
>> Sorry I pasted the wrong portion of the region server log.  The right 
>> portion should be as follows:
>> 
>> 
>> ==
>> 2012-07-22T00:48:57.147-0700: [GC [ParNew: 18863K->2112K(19136K), 0.0029870 
>> secs] 57106K->42831K(97048K) icms_dc=0 , 0.0030480 secs] [Times: user=0.01 
>> sys=0.00, real=0.00 secs]
>> 2012-07-22T00:49:09.475-0700: [GC [ParNew: 18686K->2112K(19136K), 0.0219960 
>> secs] 59405K->43951K(97048K) icms_dc=0 , 0.0220530 secs] [Times: user=0.15 
>> sys=0.00, real=0.02 secs]
>> 2012-07-22T00:49:20.872-0700: [GC [ParNew: 19135K->2112K(19136K), 0.0091670 
>> secs] 60974K->44659K(97048K) icms_dc=0 , 0.0092480 secs] [Times: user=0.06 
>> sys=0.00, real=0.01 secs]
>> 2012-07-22T00:49:22.285-0700: [GC [ParNew: 19136K->2112K(19136K), 0.0021870 
>> secs] 61683K->46380K(97048K) icms_dc=0 , 0.0022480 secs] [Times: user=0.01 
>> sys=0.00, real=0.00 secs]
>> 2012-07-22T00:49:24.999-0700: [GC [ParNew: 19136K->2087K(19136K), 0.0019340 
>> secs] 63404K->47842K(97048K) icms_dc=0 , 0.0019910 secs] [Times: user=0.01 
>> sys=0.00, real=0.00 secs]
>> 2012-07-22T00:49:26.618-0700: [GC [ParNew: 18002K->2112K(19136K), 0.0016160 
>> secs] 63757K->48065K(97048K) icms_dc=0 , 0.0016780 secs] [Times: user=0.01 
>> sys=0.00, real=0.00 secs]
>> 2012-07-22T00:49:28.089-0700: [GC [ParNew: 19136K->1413K(19136K), 0.0033140 
>> secs] 65089K->48748K(97048K) icms_dc=0 , 0.0033810 secs] [Times: user=0.01 
>> sys=0.00, real=0.00 secs]
>> 2012-07-22T00:49:29.352-0700: [GC [ParNew: 18423K->2112K(19136K), 0.0063630 
>> secs] 65759K->50827K(97048K) icms_dc=0 , 0.0064210 secs] [Times: user=0.05 
>> sys=0.00, real=0.01 secs]
>> 2012-07-22T00:49:30.226-0700: [GC [ParNew: 19136K->1897K(19136K), 0.0021470 
>> secs] 67851K->52058K(97048K) icms_dc=0 , 0.0022010 secs] [Times: user=0.01 
>> sys=0.00, real=0.00 secs]
>> 2012-07-22T00:49:31.190-0700: [GC [ParNew: 18921K->1606K(19136K), 0.0013270 
>> secs] 69082K->52336K(97048K) icms_dc=0 , 0.0013780 secs] [Times: user=0.01 
>> sys=0.00, real=0.00 secs]
>> 2012-07-22T00:49:33.111-0700: [GC [ParNew: 18482K->2112K(19136K), 0.0017970 
>> secs] 69212K->53496K(97048K) icms_dc=0 , 0.0018510 secs] [Times: user=0.01 
>> sys=0.00, real=0.00 secs]
>> 2012-07-22T00:49:34.396-0700: [GC [ParNew: 19136K->1468K(19136K), 0.0120290 
>> secs] 70520K->54718K(97048K) icms_dc=0 , 0.0120820 secs] [Times: user=0.07 
>> sys=0.00, real=0.02 secs]
>> 2012-07-22T00:49:35.294-0700: [GC [ParNew: 18181K->2112K(19136K), 0.0011290 
>> secs] 71431K->55399K(97048K) icms_dc=0 , 0.0011840 secs] [Times: user=0.01 
>> sys=0.00, real=0.00 secs]
>> 2012-07-22T00:49:39.165-0700: [GC [ParNew: 19003K->547K(19136K), 0.0130200 
>> secs] 72291K->55364K(97048K) icms_dc=0 , 0.0130880 secs] [Times: user=0.08 
>> sys=0.00, real=0.02 secs]
>> 2012-07-22T00:49:40.642-0700: [GC [ParNew: 17571K->646K(19136K), 0.0006740 
>> secs] 72388K->55463K(97048K) icms_dc=0 , 0.0007220 secs] [Times: user=0.00 
>> sys=0.00, real=0.00 secs]
>> 2012-07-22T00:49:41.541-0700: [GC [ParNew: 17670K->2112K(19136K), 0.0104430 
>> secs] 72487K->58179K(97048K) icms_dc=0 , 0.0105000 secs] [Times: user=0.07 
>> sys=0.00, real=0.01 secs]
>> 2012-07-22T00:49:42.972-0700: [GC [ParNew: 19136K->868K(19136K), 0.0015360 
>> secs] 75203K->58718K(97048K) icms_dc=0 , 0.0015880 secs] [Times: user=0.02 
>> sys=0.00, real=0.00 secs]
>> 2012-07-22T00:49:43.355-0700: [GC [ParNew: 17892K->2112K(19136K), 0.0017450 
>> secs] 75742K->60552K(97048K) icms_dc=0 , 0.0018070 secs] [Times: user=0.01 
>> sys=0.00, real=0.00 secs]
>> 2012-07-22T00:49:46.711-0700: [GC [ParNew: 18900K->644K(19136K), 0.0015250 
>> secs] 77340K->60427K(97048K) icms_dc=0 , 0.0015760 secs] [Times: user=0.01 
>> sys=0.00, real=0.00 secs]
>> 2012-07-22T00:49:47.824-0700: [GC [ParNew: 17668K->1986K(19136K), 0.0015260 
>> secs] 77451K->61943K(97048K) icms_dc=0 , 0.0015780 secs] [Times: user=0.00 
>> sys=0.00, real=0.00 secs]
>> 2012-07-22T00:49:51.552-0700: [GC [ParNew: 19010K->752K(19136K), 0.0011550 
>> secs] 78967K->60715K(97048K) icms_dc=0 , 0.0012500 secs] [Times: user=0.00 
>> sys=0.00, real=0.00 secs]
>> 2012-07-22T00:49:58.460-0700: [GC [ParNew: 17776K->2055K(19136K), 0.0010860 
>> secs] 77739K->62018K(97048K) icms_dc=0 , 0.0011440 secs] [Times: user=0.01 
>> sys=0.00, real=0.00 secs]
>> 2012-07-22T00:50:03.692-0700: [GC [ParNew: 19079K->2112K(19136K), 0.0257530 
>> secs] 79042K->63417K(97048K) icms_dc=0 , 0.0258130 secs] [Times: user=0.18 
>> sys=0.00, real=0.02 secs]
>> 20

Re: drop table

2012-07-23 Thread Mohit Anchlia
Thanks everyone for your help

On Mon, Jul 23, 2012 at 1:40 PM, Mohammad Tariq  wrote:

> Also, we don't have to worry about compaction under normal conditions.
> When something is written to HBase, it is first written to an
> in-memory store (memstore); once this memstore reaches a certain size,
> it is flushed to disk into a store file (everything is also written
> immediately to a log file for durability). The store files created on
> disk are immutable. Sometimes the store files are merged together;
> this is done by a process called compaction.
>
> Regards,
> Mohammad Tariq
>
>
> On Tue, Jul 24, 2012 at 2:00 AM, Mohammad Tariq 
> wrote:
> > The HBase processes expose a web-based user interface (in short, UI),
> > which you can use to gain insight into the cluster's state, as well as
> > the tables it hosts. Just point your web browser to
> > "http://hmaster:60010". Although the majority of the functionality is
> > read-only, there are a few selected operations you can trigger
> > through the UI (like splitting and compaction).
> >
> > Regards,
> > Mohammad Tariq
> >
> >
> > On Tue, Jul 24, 2012 at 1:56 AM, Rob Roland 
> wrote:
> >> You don't have to run the major compaction - the shell is doing that for
> >> you.  You must disable the table first, like:
> >>
> >> disable 'session_timeline'
> >> drop 'session_timeline'
> >>
> >> See the admin.rb file:
> >>
> >> def drop(table_name)
> >>   tableExists(table_name)
> >>   raise ArgumentError, "Table #{table_name} is enabled. Disable it
> >> first.'" if enabled?(table_name)
> >>
> >>   @admin.deleteTable(table_name)
> >>   flush(org.apache.hadoop.hbase.HConstants::META_TABLE_NAME)
> >>   major_compact(org.apache.hadoop.hbase.HConstants::META_TABLE_NAME)
> >> end
> >>
> >> On Mon, Jul 23, 2012 at 1:22 PM, Mohit Anchlia  >wrote:
> >>
> >>> Thanks! but I am still trying to understand these 2 questions:
> >>>
> >>> 1. How to see if this table has more than one region?
> >>> 2. And why do I need to run major compact if I have more than one
> region?
> >>>
> >>> On Mon, Jul 23, 2012 at 1:14 PM, Mohammad Tariq 
> >>> wrote:
> >>>
> >>> > Hi Mohit,
> >>> >
> >>> >   A table must be disabled first in order to get deleted.
> >>> > Regards,
> >>> > Mohammad Tariq
> >>> >
> >>> >
> >>> > On Tue, Jul 24, 2012 at 1:38 AM, Mohit Anchlia <
> mohitanch...@gmail.com>
> >>> > wrote:
> >>> > > I am trying to drop one of the tables but on the shell I get run
> >>> > > major_compact. I have couple of questions:
> >>> > >
> >>> > > 1. How to see if this table has more than one region?
> >>> > > 2. And why do I need to run major compact
> >>> > >
> >>> > >
> >>> > > hbase(main):010:0* drop 'SESSION_TIMELINE'
> >>> > >
> >>> > > ERROR: Table SESSION_TIMELINE is enabled. Disable it first.'
> >>> > >
> >>> > > Here is some help for this command:
> >>> > >
> >>> > > Drop the named table. Table must first be disabled. If table has
> >>> > >
> >>> > > more than one region, run a major compaction on .META.:
> >>> > >
> >>> > > hbase> major_compact ".META."
> >>> >
> >>>
>


Re: drop table

2012-07-23 Thread Mohammad Tariq
Also, we don't have to worry about compaction under normal conditions.
When something is written to HBase, it is first written to an
in-memory store (memstore); once this memstore reaches a certain size,
it is flushed to disk into a store file (everything is also written
immediately to a log file for durability). The store files created on
disk are immutable. Sometimes the store files are merged together;
this is done by a process called compaction.
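
(Both of those steps can also be triggered by hand from the HBase shell when needed, e.g.:)

hbase> flush 'session_timeline'          # force the memstore out to a store file
hbase> major_compact 'session_timeline'  # merge the store files for that table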

Regards,
Mohammad Tariq


On Tue, Jul 24, 2012 at 2:00 AM, Mohammad Tariq  wrote:
> The HBase processes expose a web-based user interface (in short, UI),
> which you can use to gain insight into the cluster's state, as well as
> the tables it hosts. Just point your web browser to
> "http://hmaster:60010". Although the majority of the functionality is
> read-only, there are a few selected operations you can trigger
> through the UI (like splitting and compaction).
>
> Regards,
> Mohammad Tariq
>
>
> On Tue, Jul 24, 2012 at 1:56 AM, Rob Roland  wrote:
>> You don't have to run the major compaction - the shell is doing that for
>> you.  You must disable the table first, like:
>>
>> disable 'session_timeline'
>> drop 'session_timeline'
>>
>> See the admin.rb file:
>>
>> def drop(table_name)
>>   tableExists(table_name)
>>   raise ArgumentError, "Table #{table_name} is enabled. Disable it
>> first.'" if enabled?(table_name)
>>
>>   @admin.deleteTable(table_name)
>>   flush(org.apache.hadoop.hbase.HConstants::META_TABLE_NAME)
>>   major_compact(org.apache.hadoop.hbase.HConstants::META_TABLE_NAME)
>> end
>>
>> On Mon, Jul 23, 2012 at 1:22 PM, Mohit Anchlia wrote:
>>
>>> Thanks! but I am still trying to understand these 2 questions:
>>>
>>> 1. How to see if this table has more than one region?
>>> 2. And why do I need to run major compact if I have more than one region?
>>>
>>> On Mon, Jul 23, 2012 at 1:14 PM, Mohammad Tariq 
>>> wrote:
>>>
>>> > Hi Mohit,
>>> >
>>> >   A table must be disabled first in order to get deleted.
>>> > Regards,
>>> > Mohammad Tariq
>>> >
>>> >
>>> > On Tue, Jul 24, 2012 at 1:38 AM, Mohit Anchlia 
>>> > wrote:
>>> > > I am trying to drop one of the tables but on the shell I get run
>>> > > major_compact. I have couple of questions:
>>> > >
>>> > > 1. How to see if this table has more than one region?
>>> > > 2. And why do I need to run major compact
>>> > >
>>> > >
>>> > > hbase(main):010:0* drop 'SESSION_TIMELINE'
>>> > >
>>> > > ERROR: Table SESSION_TIMELINE is enabled. Disable it first.'
>>> > >
>>> > > Here is some help for this command:
>>> > >
>>> > > Drop the named table. Table must first be disabled. If table has
>>> > >
>>> > > more than one region, run a major compaction on .META.:
>>> > >
>>> > > hbase> major_compact ".META."
>>> >
>>>


Re: drop table

2012-07-23 Thread Mohammad Tariq
The HBase processes expose a web-based user interface (in short, UI),
which you can use to gain insight into the cluster's state, as well as
the tables it hosts. Just point your web browser to
"http://hmaster:60010". Although the majority of the functionality is
read-only, there are a few selected operations you can trigger
through the UI (like splitting and compaction).

Regards,
Mohammad Tariq


On Tue, Jul 24, 2012 at 1:56 AM, Rob Roland  wrote:
> You don't have to run the major compaction - the shell is doing that for
> you.  You must disable the table first, like:
>
> disable 'session_timeline'
> drop 'session_timeline'
>
> See the admin.rb file:
>
> def drop(table_name)
>   tableExists(table_name)
>   raise ArgumentError, "Table #{table_name} is enabled. Disable it
> first.'" if enabled?(table_name)
>
>   @admin.deleteTable(table_name)
>   flush(org.apache.hadoop.hbase.HConstants::META_TABLE_NAME)
>   major_compact(org.apache.hadoop.hbase.HConstants::META_TABLE_NAME)
> end
>
> On Mon, Jul 23, 2012 at 1:22 PM, Mohit Anchlia wrote:
>
>> Thanks! but I am still trying to understand these 2 questions:
>>
>> 1. How to see if this table has more than one region?
>> 2. And why do I need to run major compact if I have more than one region?
>>
>> On Mon, Jul 23, 2012 at 1:14 PM, Mohammad Tariq 
>> wrote:
>>
>> > Hi Mohit,
>> >
>> >   A table must be disabled first in order to get deleted.
>> > Regards,
>> > Mohammad Tariq
>> >
>> >
>> > On Tue, Jul 24, 2012 at 1:38 AM, Mohit Anchlia 
>> > wrote:
>> > > I am trying to drop one of the tables but on the shell I get run
>> > > major_compact. I have couple of questions:
>> > >
>> > > 1. How to see if this table has more than one region?
>> > > 2. And why do I need to run major compact
>> > >
>> > >
>> > > hbase(main):010:0* drop 'SESSION_TIMELINE'
>> > >
>> > > ERROR: Table SESSION_TIMELINE is enabled. Disable it first.'
>> > >
>> > > Here is some help for this command:
>> > >
>> > > Drop the named table. Table must first be disabled. If table has
>> > >
>> > > more than one region, run a major compaction on .META.:
>> > >
>> > > hbase> major_compact ".META."
>> >
>>


Re: drop table

2012-07-23 Thread Rob Roland
You don't have to run the major compaction - the shell is doing that for
you.  You must disable the table first, like:

disable 'session_timeline'
drop 'session_timeline'

See the admin.rb file:

def drop(table_name)
  tableExists(table_name)
  raise ArgumentError, "Table #{table_name} is enabled. Disable it
first.'" if enabled?(table_name)

  @admin.deleteTable(table_name)
  flush(org.apache.hadoop.hbase.HConstants::META_TABLE_NAME)
  major_compact(org.apache.hadoop.hbase.HConstants::META_TABLE_NAME)
end

On Mon, Jul 23, 2012 at 1:22 PM, Mohit Anchlia wrote:

> Thanks! but I am still trying to understand these 2 questions:
>
> 1. How to see if this table has more than one region?
> 2. And why do I need to run major compact if I have more than one region?
>
> On Mon, Jul 23, 2012 at 1:14 PM, Mohammad Tariq 
> wrote:
>
> > Hi Mohit,
> >
> >   A table must be disabled first in order to get deleted.
> > Regards,
> > Mohammad Tariq
> >
> >
> > On Tue, Jul 24, 2012 at 1:38 AM, Mohit Anchlia 
> > wrote:
> > > I am trying to drop one of the tables but on the shell I get run
> > > major_compact. I have couple of questions:
> > >
> > > 1. How to see if this table has more than one region?
> > > 2. And why do I need to run major compact
> > >
> > >
> > > hbase(main):010:0* drop 'SESSION_TIMELINE'
> > >
> > > ERROR: Table SESSION_TIMELINE is enabled. Disable it first.'
> > >
> > > Here is some help for this command:
> > >
> > > Drop the named table. Table must first be disabled. If table has
> > >
> > > more than one region, run a major compaction on .META.:
> > >
> > > hbase> major_compact ".META."
> >
>


Re: drop table

2012-07-23 Thread Jean-Marc Spaggiari
1) http://URL_OF_YOUR_MASTER:60010/table.jsp?name=NAME_OF_YOUR_TABLE
will show you all the regions of your table.
2) I have no clue ;)

2012/7/23, Mohit Anchlia :
> Thanks! but I am still trying to understand these 2 questions:
>
> 1. How to see if this table has more than one region?
> 2. And why do I need to run major compact if I have more than one region?
>
> On Mon, Jul 23, 2012 at 1:14 PM, Mohammad Tariq  wrote:
>
>> Hi Mohit,
>>
>>   A table must be disabled first in order to get deleted.
>> Regards,
>> Mohammad Tariq
>>
>>
>> On Tue, Jul 24, 2012 at 1:38 AM, Mohit Anchlia 
>> wrote:
>> > I am trying to drop one of the tables but on the shell I get run
>> > major_compact. I have couple of questions:
>> >
>> > 1. How to see if this table has more than one region?
>> > 2. And why do I need to run major compact
>> >
>> >
>> > hbase(main):010:0* drop 'SESSION_TIMELINE'
>> >
>> > ERROR: Table SESSION_TIMELINE is enabled. Disable it first.'
>> >
>> > Here is some help for this command:
>> >
>> > Drop the named table. Table must first be disabled. If table has
>> >
>> > more than one region, run a major compaction on .META.:
>> >
>> > hbase> major_compact ".META."
>>
>


Re: drop table

2012-07-23 Thread Mohit Anchlia
Thanks! but I am still trying to understand these 2 questions:

1. How to see if this table has more than one region?
2. And why do I need to run major compact if I have more than one region?

On Mon, Jul 23, 2012 at 1:14 PM, Mohammad Tariq  wrote:

> Hi Mohit,
>
>   A table must be disabled first in order to get deleted.
> Regards,
> Mohammad Tariq
>
>
> On Tue, Jul 24, 2012 at 1:38 AM, Mohit Anchlia 
> wrote:
> > I am trying to drop one of the tables but on the shell I get run
> > major_compact. I have couple of questions:
> >
> > 1. How to see if this table has more than one region?
> > 2. And why do I need to run major compact
> >
> >
> > hbase(main):010:0* drop 'SESSION_TIMELINE'
> >
> > ERROR: Table SESSION_TIMELINE is enabled. Disable it first.'
> >
> > Here is some help for this command:
> >
> > Drop the named table. Table must first be disabled. If table has
> >
> > more than one region, run a major compaction on .META.:
> >
> > hbase> major_compact ".META."
>


Re: drop table

2012-07-23 Thread Mohammad Tariq
Hi Mohit,

  A table must be disabled first in order to get deleted.
Regards,
Mohammad Tariq


On Tue, Jul 24, 2012 at 1:38 AM, Mohit Anchlia  wrote:
> I am trying to drop one of the tables but on the shell I get run
> major_compact. I have couple of questions:
>
> 1. How to see if this table has more than one region?
> 2. And why do I need to run major compact
>
>
> hbase(main):010:0* drop 'SESSION_TIMELINE'
>
> ERROR: Table SESSION_TIMELINE is enabled. Disable it first.'
>
> Here is some help for this command:
>
> Drop the named table. Table must first be disabled. If table has
>
> more than one region, run a major compaction on .META.:
>
> hbase> major_compact ".META."


Re: drop table

2012-07-23 Thread Jean-Marc Spaggiari
Hi Mohit,

You have the response in your question ;)

Simply type:

major_compact ".META."

On the shell.

To drop your table, just do:
disable 'SESSION_TIMELINE'
drop 'SESSION_TIMELINE'


--
JM

2012/7/23, Mohit Anchlia :
> I am trying to drop one of the tables but on the shell I get run
> major_compact. I have couple of questions:
>
> 1. How to see if this table has more than one region?
> 2. And why do I need to run major compact
>
>
> hbase(main):010:0* drop 'SESSION_TIMELINE'
>
> ERROR: Table SESSION_TIMELINE is enabled. Disable it first.'
>
> Here is some help for this command:
>
> Drop the named table. Table must first be disabled. If table has
>
> more than one region, run a major compaction on .META.:
>
> hbase> major_compact ".META."
>


Re: hbase threw NotServingRegionException

2012-07-23 Thread Mohammad Tariq
Hello sir,

 A possible reason could be that your client is contacting the given
regionserver, and that regionserver keeps rejecting the requests.
Are you sure your table and all its regions are online? Use hbck once and see
if you find anything interesting.

Regards,
Mohammad Tariq


On Tue, Jul 24, 2012 at 1:26 AM, Ey-Chih chow  wrote:
> Sorry I pasted the wrong portion of the region server log.  The right portion 
> should be as follows:
>
>
> ==
> 2012-07-22T00:48:57.147-0700: [GC [ParNew: 18863K->2112K(19136K), 0.0029870 
> secs] 57106K->42831K(97048K) icms_dc=0 , 0.0030480 secs] [Times: user=0.01 
> sys=0.00, real=0.00 secs]
> 2012-07-22T00:49:09.475-0700: [GC [ParNew: 18686K->2112K(19136K), 0.0219960 
> secs] 59405K->43951K(97048K) icms_dc=0 , 0.0220530 secs] [Times: user=0.15 
> sys=0.00, real=0.02 secs]
> 2012-07-22T00:49:20.872-0700: [GC [ParNew: 19135K->2112K(19136K), 0.0091670 
> secs] 60974K->44659K(97048K) icms_dc=0 , 0.0092480 secs] [Times: user=0.06 
> sys=0.00, real=0.01 secs]
> 2012-07-22T00:49:22.285-0700: [GC [ParNew: 19136K->2112K(19136K), 0.0021870 
> secs] 61683K->46380K(97048K) icms_dc=0 , 0.0022480 secs] [Times: user=0.01 
> sys=0.00, real=0.00 secs]
> 2012-07-22T00:49:24.999-0700: [GC [ParNew: 19136K->2087K(19136K), 0.0019340 
> secs] 63404K->47842K(97048K) icms_dc=0 , 0.0019910 secs] [Times: user=0.01 
> sys=0.00, real=0.00 secs]
> 2012-07-22T00:49:26.618-0700: [GC [ParNew: 18002K->2112K(19136K), 0.0016160 
> secs] 63757K->48065K(97048K) icms_dc=0 , 0.0016780 secs] [Times: user=0.01 
> sys=0.00, real=0.00 secs]
> 2012-07-22T00:49:28.089-0700: [GC [ParNew: 19136K->1413K(19136K), 0.0033140 
> secs] 65089K->48748K(97048K) icms_dc=0 , 0.0033810 secs] [Times: user=0.01 
> sys=0.00, real=0.00 secs]
> 2012-07-22T00:49:29.352-0700: [GC [ParNew: 18423K->2112K(19136K), 0.0063630 
> secs] 65759K->50827K(97048K) icms_dc=0 , 0.0064210 secs] [Times: user=0.05 
> sys=0.00, real=0.01 secs]
> 2012-07-22T00:49:30.226-0700: [GC [ParNew: 19136K->1897K(19136K), 0.0021470 
> secs] 67851K->52058K(97048K) icms_dc=0 , 0.0022010 secs] [Times: user=0.01 
> sys=0.00, real=0.00 secs]
> 2012-07-22T00:49:31.190-0700: [GC [ParNew: 18921K->1606K(19136K), 0.0013270 
> secs] 69082K->52336K(97048K) icms_dc=0 , 0.0013780 secs] [Times: user=0.01 
> sys=0.00, real=0.00 secs]
> 2012-07-22T00:49:33.111-0700: [GC [ParNew: 18482K->2112K(19136K), 0.0017970 
> secs] 69212K->53496K(97048K) icms_dc=0 , 0.0018510 secs] [Times: user=0.01 
> sys=0.00, real=0.00 secs]
> 2012-07-22T00:49:34.396-0700: [GC [ParNew: 19136K->1468K(19136K), 0.0120290 
> secs] 70520K->54718K(97048K) icms_dc=0 , 0.0120820 secs] [Times: user=0.07 
> sys=0.00, real=0.02 secs]
> 2012-07-22T00:49:35.294-0700: [GC [ParNew: 18181K->2112K(19136K), 0.0011290 
> secs] 71431K->55399K(97048K) icms_dc=0 , 0.0011840 secs] [Times: user=0.01 
> sys=0.00, real=0.00 secs]
> 2012-07-22T00:49:39.165-0700: [GC [ParNew: 19003K->547K(19136K), 0.0130200 
> secs] 72291K->55364K(97048K) icms_dc=0 , 0.0130880 secs] [Times: user=0.08 
> sys=0.00, real=0.02 secs]
> 2012-07-22T00:49:40.642-0700: [GC [ParNew: 17571K->646K(19136K), 0.0006740 
> secs] 72388K->55463K(97048K) icms_dc=0 , 0.0007220 secs] [Times: user=0.00 
> sys=0.00, real=0.00 secs]
> 2012-07-22T00:49:41.541-0700: [GC [ParNew: 17670K->2112K(19136K), 0.0104430 
> secs] 72487K->58179K(97048K) icms_dc=0 , 0.0105000 secs] [Times: user=0.07 
> sys=0.00, real=0.01 secs]
> 2012-07-22T00:49:42.972-0700: [GC [ParNew: 19136K->868K(19136K), 0.0015360 
> secs] 75203K->58718K(97048K) icms_dc=0 , 0.0015880 secs] [Times: user=0.02 
> sys=0.00, real=0.00 secs]
> 2012-07-22T00:49:43.355-0700: [GC [ParNew: 17892K->2112K(19136K), 0.0017450 
> secs] 75742K->60552K(97048K) icms_dc=0 , 0.0018070 secs] [Times: user=0.01 
> sys=0.00, real=0.00 secs]
> 2012-07-22T00:49:46.711-0700: [GC [ParNew: 18900K->644K(19136K), 0.0015250 
> secs] 77340K->60427K(97048K) icms_dc=0 , 0.0015760 secs] [Times: user=0.01 
> sys=0.00, real=0.00 secs]
> 2012-07-22T00:49:47.824-0700: [GC [ParNew: 17668K->1986K(19136K), 0.0015260 
> secs] 77451K->61943K(97048K) icms_dc=0 , 0.0015780 secs] [Times: user=0.00 
> sys=0.00, real=0.00 secs]
> 2012-07-22T00:49:51.552-0700: [GC [ParNew: 19010K->752K(19136K), 0.0011550 
> secs] 78967K->60715K(97048K) icms_dc=0 , 0.0012500 secs] [Times: user=0.00 
> sys=0.00, real=0.00 secs]
> 2012-07-22T00:49:58.460-0700: [GC [ParNew: 17776K->2055K(19136K), 0.0010860 
> secs] 77739K->62018K(97048K) icms_dc=0 , 0.0011440 secs] [Times: user=0.01 
> sys=0.00, real=0.00 secs]
> 2012-07-22T00:50:03.692-0700: [GC [ParNew: 19079K->2112K(19136K), 0.0257530 
> secs] 79042K->63417K(97048K) icms_dc=0 , 0.0258130 secs] [Times: user=0.18 
> sys=0.00, real=0.02 secs]
> 2012-07-22T00:50:07.522-0700: [GC [ParNew: 19136K->2112K(19136K), 0.0028960 
> secs] 80441K->67709K(97048K) icms_dc=0 , 0.0029600 secs] [Times: user=0.02 
> sys=0.00, real=0.00 secs]
> 2012-07-22T00:50:30.491-0700: [GC [ParNew: 19136K->2112K(19136K), 0.002391

drop table

2012-07-23 Thread Mohit Anchlia
I am trying to drop one of the tables, but the shell tells me to run
major_compact. I have a couple of questions:

1. How can I see whether this table has more than one region?
2. Why do I need to run a major compaction?


hbase(main):010:0* drop 'SESSION_TIMELINE'

ERROR: Table SESSION_TIMELINE is enabled. Disable it first.'

Here is some help for this command:

Drop the named table. Table must first be disabled. If table has

more than one region, run a major compaction on .META.:

hbase> major_compact ".META."
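
For reference, the disable-then-drop sequence (and a quick region count) can
also be done from the Java client API. This is only a rough sketch against the
0.9x-era client, reusing the table name from the shell session above:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;

public class DropTableSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();

    // One start key per region, so this answers question 1.
    HTable table = new HTable(conf, "SESSION_TIMELINE");
    System.out.println("Regions: " + table.getStartKeys().length);
    table.close();

    // A table must be disabled before it can be dropped (the error above).
    HBaseAdmin admin = new HBaseAdmin(conf);
    if (admin.isTableEnabled("SESSION_TIMELINE")) {
      admin.disableTable("SESSION_TIMELINE");
    }
    admin.deleteTable("SESSION_TIMELINE");
    admin.close();
  }
}

From the shell the equivalent is simply disable 'SESSION_TIMELINE' followed by
drop 'SESSION_TIMELINE'; the major_compact ".META." hint in the help text only
concerns cleaning up the catalog entries of a multi-region table after the drop.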


Re: Index building process design

2012-07-23 Thread Eric Czech
Hmm, maybe that was too long -- I'll keep this one shorter I swear:

Would it make sense to build indexes with two Hadoop/Hbase clusters by
simply pointing client traffic at the cluster that is currently NOT
building indexes via M/R jobs?  Basically, has anyone ever tried switching
back and forth between clusters instead of building indexes on one cluster
and copying them to another?


On Thu, Jul 12, 2012 at 1:26 AM, Eric Czech  wrote:

> Hi everyone,
>
> I have a general design question (apologies in advanced if this has
> been asked before).
>
> I'd like to build indexes off of a raw data store and I'm trying to
> think of the best way to control processing so some part of my cluster
> can still serve reads and writes without being affected heavily by the
> index building process.
>
> I get the sense that the typical process for this involves something
> like the following:
>
> 1.  Dedicate one cluster for index building (let's call it the INDEX
> cluster) and one for serving application reads on the indexes as well
> as writes/reads on the raw data set (let's call it the MAIN cluster).
> 2.  Have the raw data set replicated from the MAIN cluster to the INDEX
> cluster.
> 3.  On the INDEX cluster, use the replicated raw data to constantly
> rebuild indexes and copy the new versions to the MAIN cluster,
> overwriting the old versions if necessary.
>
> While conceptually simple, I can't help but wonder if it doesn't make
> more sense to simply switch application reads / writes from one
> cluster to another based on which one is NOT currently building
> indexes (but still have the raw data set replicate master-master
> between them).
>
> To be more clear, I'm proposing doing this:
>
> 1.  Have two clusters, call them CLUSTER_1 and CLUSTER_2, and have the
> raw data set replicated master-master between them.
> 2.  if CLUSTER_1 is currently rebuilding indexes, redirect all
> application traffic to CLUSTER_2 including reads from the indexes as
> well as writes to the raw data set (and vice versa).
>
> I know I'm not addressing a lot of details here but I'm just curious
> if anyone has ever implemented something along these lines.
>
> The main advantage to what I'm proposing would be not having to copy
> potentially massive indexes across the network but at the cost of
> having to deal with having clients not always read from the same
> cluster (seems doable though).
>
> Any advice would be much appreciated!
>
> Thanks
>
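
For what it's worth, the client-side switch described above doesn't need much
machinery: keep one Configuration per cluster and open tables against whichever
cluster is not currently rebuilding indexes. A minimal sketch, assuming the
ZooKeeper quorum strings are placeholders and that the "who is building" flag
is fed by whatever coordinates your index jobs:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.HTableInterface;

public class ClusterSwitcher {
  private final Configuration cluster1;
  private final Configuration cluster2;
  private volatile boolean cluster1BuildingIndexes;  // flipped by your index-build coordinator

  public ClusterSwitcher(String quorum1, String quorum2) {
    cluster1 = HBaseConfiguration.create();
    cluster1.set("hbase.zookeeper.quorum", quorum1);  // placeholder, e.g. "zk1a,zk1b,zk1c"
    cluster2 = HBaseConfiguration.create();
    cluster2.set("hbase.zookeeper.quorum", quorum2);  // placeholder, e.g. "zk2a,zk2b,zk2c"
  }

  // Reads and writes go to whichever cluster is NOT busy building indexes.
  public HTableInterface openTable(String tableName) throws IOException {
    Configuration active = cluster1BuildingIndexes ? cluster2 : cluster1;
    return new HTable(active, tableName);
  }

  public void setCluster1BuildingIndexes(boolean building) {
    cluster1BuildingIndexes = building;
  }
}

The master-master replication keeps the raw data in sync either way; the extra
care is in draining in-flight clients before flipping the flag so a reader does
not land on half-built indexes.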


Re: hbase threw NotServingRegionException

2012-07-23 Thread Ey-Chih chow
Sorry I pasted the wrong portion of the region server log.  The right portion 
should be as follows:


==
2012-07-22T00:48:57.147-0700: [GC [ParNew: 18863K->2112K(19136K), 0.0029870 
secs] 57106K->42831K(97048K) icms_dc=0 , 0.0030480 secs] [Times: user=0.01 
sys=0.00, real=0.00 secs] 
2012-07-22T00:49:09.475-0700: [GC [ParNew: 18686K->2112K(19136K), 0.0219960 
secs] 59405K->43951K(97048K) icms_dc=0 , 0.0220530 secs] [Times: user=0.15 
sys=0.00, real=0.02 secs] 
2012-07-22T00:49:20.872-0700: [GC [ParNew: 19135K->2112K(19136K), 0.0091670 
secs] 60974K->44659K(97048K) icms_dc=0 , 0.0092480 secs] [Times: user=0.06 
sys=0.00, real=0.01 secs] 
2012-07-22T00:49:22.285-0700: [GC [ParNew: 19136K->2112K(19136K), 0.0021870 
secs] 61683K->46380K(97048K) icms_dc=0 , 0.0022480 secs] [Times: user=0.01 
sys=0.00, real=0.00 secs] 
2012-07-22T00:49:24.999-0700: [GC [ParNew: 19136K->2087K(19136K), 0.0019340 
secs] 63404K->47842K(97048K) icms_dc=0 , 0.0019910 secs] [Times: user=0.01 
sys=0.00, real=0.00 secs] 
2012-07-22T00:49:26.618-0700: [GC [ParNew: 18002K->2112K(19136K), 0.0016160 
secs] 63757K->48065K(97048K) icms_dc=0 , 0.0016780 secs] [Times: user=0.01 
sys=0.00, real=0.00 secs] 
2012-07-22T00:49:28.089-0700: [GC [ParNew: 19136K->1413K(19136K), 0.0033140 
secs] 65089K->48748K(97048K) icms_dc=0 , 0.0033810 secs] [Times: user=0.01 
sys=0.00, real=0.00 secs] 
2012-07-22T00:49:29.352-0700: [GC [ParNew: 18423K->2112K(19136K), 0.0063630 
secs] 65759K->50827K(97048K) icms_dc=0 , 0.0064210 secs] [Times: user=0.05 
sys=0.00, real=0.01 secs] 
2012-07-22T00:49:30.226-0700: [GC [ParNew: 19136K->1897K(19136K), 0.0021470 
secs] 67851K->52058K(97048K) icms_dc=0 , 0.0022010 secs] [Times: user=0.01 
sys=0.00, real=0.00 secs] 
2012-07-22T00:49:31.190-0700: [GC [ParNew: 18921K->1606K(19136K), 0.0013270 
secs] 69082K->52336K(97048K) icms_dc=0 , 0.0013780 secs] [Times: user=0.01 
sys=0.00, real=0.00 secs] 
2012-07-22T00:49:33.111-0700: [GC [ParNew: 18482K->2112K(19136K), 0.0017970 
secs] 69212K->53496K(97048K) icms_dc=0 , 0.0018510 secs] [Times: user=0.01 
sys=0.00, real=0.00 secs] 
2012-07-22T00:49:34.396-0700: [GC [ParNew: 19136K->1468K(19136K), 0.0120290 
secs] 70520K->54718K(97048K) icms_dc=0 , 0.0120820 secs] [Times: user=0.07 
sys=0.00, real=0.02 secs] 
2012-07-22T00:49:35.294-0700: [GC [ParNew: 18181K->2112K(19136K), 0.0011290 
secs] 71431K->55399K(97048K) icms_dc=0 , 0.0011840 secs] [Times: user=0.01 
sys=0.00, real=0.00 secs] 
2012-07-22T00:49:39.165-0700: [GC [ParNew: 19003K->547K(19136K), 0.0130200 
secs] 72291K->55364K(97048K) icms_dc=0 , 0.0130880 secs] [Times: user=0.08 
sys=0.00, real=0.02 secs] 
2012-07-22T00:49:40.642-0700: [GC [ParNew: 17571K->646K(19136K), 0.0006740 
secs] 72388K->55463K(97048K) icms_dc=0 , 0.0007220 secs] [Times: user=0.00 
sys=0.00, real=0.00 secs] 
2012-07-22T00:49:41.541-0700: [GC [ParNew: 17670K->2112K(19136K), 0.0104430 
secs] 72487K->58179K(97048K) icms_dc=0 , 0.0105000 secs] [Times: user=0.07 
sys=0.00, real=0.01 secs] 
2012-07-22T00:49:42.972-0700: [GC [ParNew: 19136K->868K(19136K), 0.0015360 
secs] 75203K->58718K(97048K) icms_dc=0 , 0.0015880 secs] [Times: user=0.02 
sys=0.00, real=0.00 secs] 
2012-07-22T00:49:43.355-0700: [GC [ParNew: 17892K->2112K(19136K), 0.0017450 
secs] 75742K->60552K(97048K) icms_dc=0 , 0.0018070 secs] [Times: user=0.01 
sys=0.00, real=0.00 secs] 
2012-07-22T00:49:46.711-0700: [GC [ParNew: 18900K->644K(19136K), 0.0015250 
secs] 77340K->60427K(97048K) icms_dc=0 , 0.0015760 secs] [Times: user=0.01 
sys=0.00, real=0.00 secs] 
2012-07-22T00:49:47.824-0700: [GC [ParNew: 17668K->1986K(19136K), 0.0015260 
secs] 77451K->61943K(97048K) icms_dc=0 , 0.0015780 secs] [Times: user=0.00 
sys=0.00, real=0.00 secs] 
2012-07-22T00:49:51.552-0700: [GC [ParNew: 19010K->752K(19136K), 0.0011550 
secs] 78967K->60715K(97048K) icms_dc=0 , 0.0012500 secs] [Times: user=0.00 
sys=0.00, real=0.00 secs] 
2012-07-22T00:49:58.460-0700: [GC [ParNew: 17776K->2055K(19136K), 0.0010860 
secs] 77739K->62018K(97048K) icms_dc=0 , 0.0011440 secs] [Times: user=0.01 
sys=0.00, real=0.00 secs] 
2012-07-22T00:50:03.692-0700: [GC [ParNew: 19079K->2112K(19136K), 0.0257530 
secs] 79042K->63417K(97048K) icms_dc=0 , 0.0258130 secs] [Times: user=0.18 
sys=0.00, real=0.02 secs] 
2012-07-22T00:50:07.522-0700: [GC [ParNew: 19136K->2112K(19136K), 0.0028960 
secs] 80441K->67709K(97048K) icms_dc=0 , 0.0029600 secs] [Times: user=0.02 
sys=0.00, real=0.00 secs] 
2012-07-22T00:50:30.491-0700: [GC [ParNew: 19136K->2112K(19136K), 0.0023910 
secs] 84733K->70543K(97048K) icms_dc=0 , 0.0024560 secs] [Times: user=0.02 
sys=0.01, real=0.00 secs] 
2012-07-22T00:50:39.242-0700: [GC [ParNew: 18974K->1947K(19136K), 0.0009880 
secs] 87406K->70737K(97048K) icms_dc=0 , 0.0010440 secs] [Times: user=0.01 
sys=0.00, real=0.00 secs] 
2012-07-22T00:50:51.654-0700: [GC [ParNew: 18765K->2112K(19136K), 0.0017700 
secs] 87555K->71087K(97048K) icms_dc=0 , 0.0018250 secs] [Times: user=0.01 
sys=0.00, real=0.00 secs] 
20

hbase threw NotServingRegionException

2012-07-23 Thread Ey-Chih chow
Hi,

We got a Map/Reduce job that threw NotServingRegionException when the reducer
was about to insert data into an HBase table. The error message is as follows.
I also copied the corresponding region server log at the end of the message.
We also browsed the HBase administrative page for the table and couldn't see
the list of Table Regions (the table is pre-split). Does anybody know what's
happening? Thanks.

Ey-Chih Chow

= log from the map/reduce job =

12:59:06 [dba@dba@h01 1-exec][INFO] 12/07/22 00:50:35 INFO mapred.JobClient: 
Task Id : attempt_201206142240_19696_r_05_0, Status : FAILED 
12:59:06 [dba@dba@h01 1-exec][INFO] 
org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 110 
actions: NotServingRegionException: 110 times, servers with issues: 
h07.mtv.byah.net:60020, 
12:59:06 [dba@dba@h01 1-exec][INFO] at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1227)
 
12:59:06 [dba@dba@h01 1-exec][INFO] at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchOfPuts(HConnectionManager.java:1241)
 
12:59:06 [dba@dba@h01 1-exec][INFO] at 
org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:826) 
12:59:06 [dba@dba@h01 1-exec][INFO] at 
org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:682) 
12:59:06 [dba@dba@h01 1-exec][INFO] at 
org.apache.hadoop.hbase.client.HTable.put(HTable.java:667) 
12:59:06 [dba@dba@h01 1-exec][INFO] at 
org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:127)
 
12:59:06 [dba@dba@h01 1-exec][INFO] at 
org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:82)
 
12:59:06 [dba@dba@h01 1-exec][INFO] at 
com.booyah.analytics.mapreduce.AvroHbaseTableOutputFormat$AvroHbaseTableRecordWriter.write(AvroHbaseTableOutputFormat.java:75)
 
12:59:06 [dba@dba@h01 1-exec][INFO] at 
com.booyah.analytics.mapred.AdaptAvroHbaseTableOu 
12:59:06 [dba@dba@h01 1-exec][INFO] attempt_201206142240_19696_r_05_0: INFO 
22-07 00:45:54,927 - Loaded the native-hadoop library

=




log from the corresponding region server==

2012-07-22T09:08:25.833-0700: [GC [ParNew: 136739K->138K(153344K), 0.0027890 
secs] 316329K->179757K(1621376K) icms_dc=0 , 0.0028470 secs] [Times: user=0.03 
sys=0.00, real=0.01 secs] 
2012-07-22T09:25:00.822-0700: [GC [ParNew: 136458K->82K(153344K), 0.0028930 
secs] 316077K->179701K(1621376K) icms_dc=0 , 0.0029500 secs] [Times: user=0.02 
sys=0.00, real=0.00 secs] 
2012-07-22T09:41:35.638-0700: [GC [ParNew: 136402K->49K(153344K), 0.0030770 
secs] 316021K->179668K(1621376K) icms_dc=0 , 0.0031310 secs] [Times: user=0.02 
sys=0.00, real=0.01 secs] 
2012-07-22T09:58:10.796-0700: [GC [ParNew: 136351K->44K(153344K), 0.0028190 
secs] 315970K->179663K(1621376K) icms_dc=0 , 0.0028750 secs] [Times: user=0.03 
sys=0.00, real=0.01 secs] 
2012-07-22T10:14:45.638-0700: [GC [ParNew: 136364K->66K(153344K), 0.0031410 
secs] 315983K->179694K(1621376K) icms_dc=0 , 0.0031960 secs] [Times: user=0.02 
sys=0.00, real=0.01 secs] 
2012-07-22T10:31:15.761-0700: [GC [ParNew: 136346K->52K(153344K), 0.0029310 
secs] 315974K->179680K(1621376K) icms_dc=0 , 0.0029870 secs] [Times: user=0.02 
sys=0.00, real=0.00 secs] 
2012-07-22T10:47:48.745-0700: [GC [ParNew: 136341K->37K(153344K), 0.0031490 
secs] 315969K->179665K(1621376K) icms_dc=0 , 0.0032070 secs] [Times: user=0.02 
sys=0.00, real=0.00 secs] 
2012-07-22T11:04:25.008-0700: [GC [ParNew: 136357K->39K(153344K), 0.0027710 
secs] 315985K->179667K(1621376K) icms_dc=0 , 0.0028260 secs] [Times: user=0.01 
sys=0.00, real=0.00 secs] 
2012-07-22T11:20:55.715-0700: [GC [ParNew: 136337K->39K(153344K), 0.0032670 
secs] 315965K->179667K(1621376K) icms_dc=0 , 0.0033210 secs] [Times: user=0.03 
sys=0.00, real=0.00 secs] 
2012-07-22T11:37:28.701-0700: [GC [ParNew: 136327K->39K(153344K), 0.0027510 
secs] 315955K->179667K(1621376K) icms_dc=0 , 0.0028070 secs] [Times: user=0.02 
sys=0.00, real=0.01 secs] 
2012-07-22T11:54:02.688-0700: [GC [ParNew: 136342K->39K(153344K), 0.0033410 
secs] 315971K->179667K(1621376K) icms_dc=0 , 0.0033980 secs] [Times: user=0.03 
sys=0.00, real=0.00 secs] 
2012-07-22T12:10:35.639-0700: [GC [ParNew: 136359K->39K(153344K), 0.0026440 
secs] 315987K->179667K(1621376K) icms_dc=0 , 0.0027000 secs] [Times: user=0.02 
sys=0.00, real=0.00 secs] 
2012-07-22T12:27:05.649-0700: [GC [ParNew: 136359K->39K(153344K), 0.0027960 
secs] 315987K->179667K(1621376K) icms_dc=0 , 0.0028520 secs] [Times: user=0.02 
sys=0.00, real=0.00 secs] 
2012-07-22T12:43:35.627-0700: [GC [ParNew: 136340K->39K(153344K), 0.0030750 
secs] 315968K->179667K(1621376K) icms_dc=0 , 0.0031320 secs] [Times: user=0.02 
sys=0.00, real=0.00 secs] 
2012-07-22T13:00:05.607-0700: [GC [ParNew: 136359K->39K(153344K), 0.0032030

Re: Basic Question on Partitioner,Combiner and Co-Processor

2012-07-23 Thread shashwat shriparv
Check out this link too..

http://sharepointorange.blogspot.in/2012/07/how-to-working-with-hbase-coprocessor.html


On Tue, Jul 24, 2012 at 12:09 AM, syed kather  wrote:

> In my use case i have Mapper function which emit . Number of
> Map will be around 10 billion X 2000 user . which is best partition method
> which i can follow for my use case ?. Sorry i am first time writing my
> partition . So i have this doubt .
>
> Thanks in advance
>
> S SYED ABDUL KATHER
>
>
>
> On Tue, Jul 24, 2012 at 12:03 AM, syed kather  wrote:
>
> > Thanks Shashwat Shriparv..
> > Is there any interface or abstract class partitioner avaliable for hbase
> > specifically ..
> > Thanks and Regards,
> > S SYED ABDUL KATHER
> >
> >
> >
> > On Mon, Jul 23, 2012 at 11:54 PM, shashwat shriparv <
> > dwivedishash...@gmail.com> wrote:
> >
> >> Check out this link may be will help you..
> >>
> >> http://developer.yahoo.com/hadoop/tutorial/module5.html#partitioning
> >>
> >>
> http://www.ashishpaliwal.com/blog/2012/05/hadoop-recipe-implementing-custom-partitioner/
> >>
> >>
> >> Regards
> >>
> >>
> >> ∞
> >> Shashwat Shriparv
> >>
> >>
> >>
> >> On Mon, Jul 23, 2012 at 11:22 PM, syed kather 
> wrote:
> >>
> >> > Hi ,
> >> >
> >> > I am very much interested to know how to implement the custom
> >> > Partitioner  . Is there any blog let me know . As i knew the number of
> >> > reducer is depends upon the partitioner . Correct me if i am wrong
> >> >
> >> > How to implement Co-Processor(Min,Max) . Is there any tutorial
> >> available on
> >> > implementing Co-Processor . This is the first time i am working with
> >> > Co-Processor.
> >> >
> >> > How to use combiner in HBase ( Trying to write the Map-Reduce Program
> >> for
> >> > finding the Max Value from the HBase Columns )?
> >> >
> >> > Thanks in advance ..
> >> > Thanks and Regards,
> >> > S SYED ABDUL KATHER
> >> >
> >>
> >>
> >>
> >> --
> >>
> >>
> >> ∞
> >> Shashwat Shriparv
> >>
> >
> >
>



-- 


∞
Shashwat Shriparv


Re: Hbase bkup options

2012-07-23 Thread Minh Duc Nguyen
Once your backup data has been put back into HDFS, you can import it into
HBase using this command:

bin/hbase org.apache.hadoop.hbase.mapreduce.Import  

See http://hbase.apache.org/book/ops_mgt.html#import for more information.

HTH,
Minh

On Mon, Jul 23, 2012 at 11:33 AM, Amlan Roy  wrote:

> Hi Michael,
>
> Thanks a lot for the reply. What I want to achieve is, if my cluster goes
> down for some reason, I should be able to create a new cluster and should
> be
> able to import all the backed up data. As I want to store all the tables, I
> expect the data size to be huge (in order of Tera Bytes) and it will keep
> growing.
>
> If I have understood correctly, you have suggested to run "export" to get
> the data into hdfs and then run "hadoop fs -copyToLocal" to get it into
> local file. If I take a back up of the files, is it possible to import that
> data to a new Hbase cluster?
>
> Thanks and regards,
> Amlan
>
> -Original Message-
> From: Michael Segel [mailto:michael_se...@hotmail.com]
> Sent: Monday, July 23, 2012 8:19 PM
> To: user@hbase.apache.org
> Subject: Re: Hbase bkup options
>
> Amian,
>
> Like always the answer to your question is... it depends.
>
> First, how much data are we talking about?
>
> What's the value of the underlying data?
>
> One possible scenario...
> You run a M/R job to copy data from the table to an HDFS file, that is then
> copied to attached storage on an edge node and then to tape.
> Depending on how much data, how much disk is in the attached storage you
> may
> want to keep a warm copy there, a 'warmer/hot' copy on HDFS and then a cold
> copy on tape off to some offsite storage facility.
>
> There are other options, but it all depends on what you want to achieve.
>
> With respect to the other tools...
>
> You can export  (which is a m/r job) to a local directory, then use distcp
> to a different cluster.  hadoop fs -copyToLocal will let you copy off the
> cluster.
> You could write your own code, but you don't get much gain over existing
> UNIX/Linux tools.
>
>
> On Jul 23, 2012, at 7:52 AM, Amlan Roy wrote:
>
> > Hi,
> >
> >
> >
> > Is it feasible to do disk or tape backup for Hbase tables?
> >
> >
> >
> > I have read about the tools like Export, CopyTable, Distcp. It seems like
> > they will require a separate HDFS cluster to do that.
> >
> >
> >
> > Regards,
> >
> > Amlan
> >
>
>
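
To make the round trip concrete, Michael's export/copy/restore sequence pairs
with the Import step above roughly like this (the table name and paths here are
placeholders, and the copy-to-local/tape leg is optional if you can distcp
straight to the new cluster):

bin/hbase org.apache.hadoop.hbase.mapreduce.Export MY_TABLE /backup/MY_TABLE
hadoop fs -copyToLocal /backup/MY_TABLE /mnt/backup/MY_TABLE
# ...later, on the new cluster...
hadoop fs -copyFromLocal /mnt/backup/MY_TABLE /restore/MY_TABLE
bin/hbase org.apache.hadoop.hbase.mapreduce.Import MY_TABLE /restore/MY_TABLE

Note that Import expects the target table, with its column families, to already
exist on the new cluster, so the schema has to be recreated (or scripted) before
the restore.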


Re: Basic Question on Partitioner,Combiner and Co-Processor

2012-07-23 Thread syed kather
In my use case I have a Mapper function which emits . The number of map
outputs will be around 10 billion X 2000 users. Which is the best partitioning
method for my use case? Sorry, this is the first time I am writing a
partitioner, so I have this doubt.

Thanks in advance

S SYED ABDUL KATHER



On Tue, Jul 24, 2012 at 12:03 AM, syed kather  wrote:

> Thanks Shashwat Shriparv..
> Is there any interface or abstract class partitioner avaliable for hbase
> specifically ..
> Thanks and Regards,
> S SYED ABDUL KATHER
>
>
>
> On Mon, Jul 23, 2012 at 11:54 PM, shashwat shriparv <
> dwivedishash...@gmail.com> wrote:
>
>> Check out this link may be will help you..
>>
>> http://developer.yahoo.com/hadoop/tutorial/module5.html#partitioning
>>
>> http://www.ashishpaliwal.com/blog/2012/05/hadoop-recipe-implementing-custom-partitioner/
>>
>>
>> Regards
>>
>>
>> ∞
>> Shashwat Shriparv
>>
>>
>>
>> On Mon, Jul 23, 2012 at 11:22 PM, syed kather  wrote:
>>
>> > Hi ,
>> >
>> > I am very much interested to know how to implement the custom
>> > Partitioner  . Is there any blog let me know . As i knew the number of
>> > reducer is depends upon the partitioner . Correct me if i am wrong
>> >
>> > How to implement Co-Processor(Min,Max) . Is there any tutorial
>> available on
>> > implementing Co-Processor . This is the first time i am working with
>> > Co-Processor.
>> >
>> > How to use combiner in HBase ( Trying to write the Map-Reduce Program
>> for
>> > finding the Max Value from the HBase Columns )?
>> >
>> > Thanks in advance ..
>> > Thanks and Regards,
>> > S SYED ABDUL KATHER
>> >
>>
>>
>>
>> --
>>
>>
>> ∞
>> Shashwat Shriparv
>>
>
>
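
On the partitioning question above: there is no HBase-specific partitioner
interface to implement; you extend the plain org.apache.hadoop.mapreduce.Partitioner
(HBase does ship an HRegionPartitioner for jobs that write straight into a
table's regions). A rough sketch of a hash partitioner, assuming the mapper
emits ImmutableBytesWritable keys with LongWritable values (the value type is
just an assumption here):

import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.Partitioner;

// Spreads map output keys evenly over the reducers using the key's own hash code.
public class UserKeyPartitioner extends Partitioner<ImmutableBytesWritable, LongWritable> {
  @Override
  public int getPartition(ImmutableBytesWritable key, LongWritable value, int numPartitions) {
    // Mask off the sign bit so the result is never negative.
    return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
  }
}

Wire it in with job.setPartitionerClass(UserKeyPartitioner.class). Note that
the number of reducers is set by job.setNumReduceTasks(n), not by the
partitioner; the partitioner only decides which of those n reducers each key
goes to, so for 10 billion records the main thing to check is that your keys
hash evenly rather than piling up on a few reducers.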


Re: Use of MD5 as row keys - is this safe?

2012-07-23 Thread Amandeep Khurana
On Mon, Jul 23, 2012 at 9:58 AM, Jonathan Bishop wrote:

> Hi,
> Thanks everyone for the informative discussion on this topic.
>
> I think that for project I am involved in I must remove the risk, however
> small, of a row key collision, and append the original id (in my case a
> long) to the hash, whatever hash I use. I don't want to be in the situation
> where occasionally something goes wrong and needing to eliminate the
> possibility of a collision.
>
> I was confused by a discussion in a book I was reading on HBase, "HBase in
> Action", which used MD5 directly as the row key, leaving the impression
> that this was a completely reliable way of creating unique row keys from
> strings.
>

The book talks about hashing as well as salting. I'll add notes to it about
possible collisions while using hashing. Thanks for pointing this out.


>
> Jon
>


Re: Basic Question on Partitioner,Combiner and Co-Processor

2012-07-23 Thread syed kather
Thanks Shashwat Shriparv.
Is there any interface or abstract class partitioner available specifically
for HBase?
Thanks and Regards,
S SYED ABDUL KATHER



On Mon, Jul 23, 2012 at 11:54 PM, shashwat shriparv <
dwivedishash...@gmail.com> wrote:

> Check out this link may be will help you..
>
> http://developer.yahoo.com/hadoop/tutorial/module5.html#partitioning
>
> http://www.ashishpaliwal.com/blog/2012/05/hadoop-recipe-implementing-custom-partitioner/
>
>
> Regards
>
>
> ∞
> Shashwat Shriparv
>
>
>
> On Mon, Jul 23, 2012 at 11:22 PM, syed kather  wrote:
>
> > Hi ,
> >
> > I am very much interested to know how to implement the custom
> > Partitioner  . Is there any blog let me know . As i knew the number of
> > reducer is depends upon the partitioner . Correct me if i am wrong
> >
> > How to implement Co-Processor(Min,Max) . Is there any tutorial available
> on
> > implementing Co-Processor . This is the first time i am working with
> > Co-Processor.
> >
> > How to use combiner in HBase ( Trying to write the Map-Reduce Program for
> > finding the Max Value from the HBase Columns )?
> >
> > Thanks in advance ..
> > Thanks and Regards,
> > S SYED ABDUL KATHER
> >
>
>
>
> --
>
>
> ∞
> Shashwat Shriparv
>


Re: Basic Question on Partitioner,Combiner and Co-Processor

2012-07-23 Thread shashwat shriparv
Check out these links; maybe they will help you:

http://developer.yahoo.com/hadoop/tutorial/module5.html#partitioning
http://www.ashishpaliwal.com/blog/2012/05/hadoop-recipe-implementing-custom-partitioner/


Regards


∞
Shashwat Shriparv



On Mon, Jul 23, 2012 at 11:22 PM, syed kather  wrote:

> Hi ,
>
> I am very much interested to know how to implement the custom
> Partitioner  . Is there any blog let me know . As i knew the number of
> reducer is depends upon the partitioner . Correct me if i am wrong
>
> How to implement Co-Processor(Min,Max) . Is there any tutorial available on
> implementing Co-Processor . This is the first time i am working with
> Co-Processor.
>
> How to use combiner in HBase ( Trying to write the Map-Reduce Program for
> finding the Max Value from the HBase Columns )?
>
> Thanks in advance ..
> Thanks and Regards,
> S SYED ABDUL KATHER
>



-- 


∞
Shashwat Shriparv


Basic Question on Partitioner,Combiner and Co-Processor

2012-07-23 Thread syed kather
Hi ,

I am very much interested in learning how to implement a custom
Partitioner. Is there any blog that explains it? As I understand it, the
number of reducers depends on the partitioner. Correct me if I am wrong.

How do I implement a coprocessor (Min, Max)? Is there any tutorial available
on implementing coprocessors? This is the first time I am working with
coprocessors.

How do I use a combiner in HBase (I am trying to write a Map-Reduce program
to find the maximum value from an HBase column)?

Thanks in advance ..
Thanks and Regards,
S SYED ABDUL KATHER
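
On the combiner part of the question above: a combiner in an HBase-backed job
is just an ordinary Hadoop combiner. You read the table with a TableMapper,
emit the candidate values under one key, and reuse the max reducer as the
combiner so each map task pre-reduces locally. A rough sketch; the column
family/qualifier names and the assumption that values are 8-byte longs are
mine, not from the thread:

import java.io.IOException;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MaxValue {
  static final Text MAX_KEY = new Text("max");

  // Emits the numeric cell value of every row under a single key.
  public static class MaxMapper extends TableMapper<Text, LongWritable> {
    @Override
    protected void map(ImmutableBytesWritable row, Result result, Context context)
        throws IOException, InterruptedException {
      byte[] cell = result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("counter")); // assumed names
      if (cell != null) {
        context.write(MAX_KEY, new LongWritable(Bytes.toLong(cell)));
      }
    }
  }

  // Used as both the combiner and the reducer: keeps only the largest value seen.
  public static class MaxReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
        throws IOException, InterruptedException {
      long max = Long.MIN_VALUE;
      for (LongWritable v : values) {
        max = Math.max(max, v.get());
      }
      context.write(key, new LongWritable(max));
    }
  }
}

Set it up with TableMapReduceUtil.initTableMapperJob(...), then
job.setCombinerClass(MaxValue.MaxReducer.class) and
job.setReducerClass(MaxValue.MaxReducer.class). For the coprocessor route, the
AggregationClient/AggregateImplementation coprocessor that ships with 0.92+
already provides min/max, so it is worth looking at before writing your own.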


Re: Use of MD5 as row keys - is this safe?

2012-07-23 Thread Jonathan Bishop
Hi,
Thanks everyone for the informative discussion on this topic.

I think that for the project I am involved in I must remove the risk, however
small, of a row key collision, and append the original id (in my case a
long) to the hash, whatever hash I use. I don't want to be in a situation
where something occasionally goes wrong, so I need to eliminate the
possibility of a collision.

I was confused by a discussion in a book I was reading on HBase, "HBase in
Action", which used MD5 directly as the row key, leaving the impression
that this was a completely reliable way of creating unique row keys from
strings.

Jon
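
For anyone following along, the "hash plus original id" key Jon describes can
be built with the JDK's MessageDigest and the HBase Bytes helpers. This is just
a sketch of that idea, not something taken from the book:

import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import org.apache.hadoop.hbase.util.Bytes;

public class RowKeys {
  // Builds a row key of the form MD5(id) + id. The 16-byte hash spreads writes
  // across regions, and appending the original long keeps the key unique even
  // if two ids ever hashed to the same value.
  public static byte[] rowKey(long id) {
    try {
      MessageDigest md5 = MessageDigest.getInstance("MD5");
      byte[] hash = md5.digest(Bytes.toBytes(id));
      return Bytes.add(hash, Bytes.toBytes(id));
    } catch (NoSuchAlgorithmException e) {
      throw new RuntimeException("MD5 not available", e);
    }
  }
}

Point reads stay cheap because the prefix is recomputable from the id itself;
what you give up is range scanning in the original id order, which is the usual
trade-off with hashed keys.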


Re: Hbase bkup options

2012-07-23 Thread Alok Kumar
Hello everyone,

I too have a similar use case, where I've set up a separate HBase replica
cluster and enabled REPLICATION_SCOPE for my tables.

Q. Do I need to create the table and column family in the backup cluster
every time a new *table* gets created in the 'production' cluster, or is
there a way for the table schema to be replicated across clusters as well
(the way puts and deletes are)?

Your help is highly appreciated
Thanks

(I tried sending a separate email to the group, but it got returned as spam.)

On Mon, Jul 23, 2012 at 9:03 PM, Amlan Roy  wrote:

> Hi Michael,
>
> Thanks a lot for the reply. What I want to achieve is, if my cluster goes
> down for some reason, I should be able to create a new cluster and should
> be
> able to import all the backed up data. As I want to store all the tables, I
> expect the data size to be huge (in order of Tera Bytes) and it will keep
> growing.
>
> If I have understood correctly, you have suggested to run "export" to get
> the data into hdfs and then run "hadoop fs -copyToLocal" to get it into
> local file. If I take a back up of the files, is it possible to import that
> data to a new Hbase cluster?
>
> Thanks and regards,
> Amlan
>
> -Original Message-
> From: Michael Segel [mailto:michael_se...@hotmail.com]
> Sent: Monday, July 23, 2012 8:19 PM
> To: user@hbase.apache.org
> Subject: Re: Hbase bkup options
>
> Amian,
>
> Like always the answer to your question is... it depends.
>
> First, how much data are we talking about?
>
> What's the value of the underlying data?
>
> One possible scenario...
> You run a M/R job to copy data from the table to an HDFS file, that is then
> copied to attached storage on an edge node and then to tape.
> Depending on how much data, how much disk is in the attached storage you
> may
> want to keep a warm copy there, a 'warmer/hot' copy on HDFS and then a cold
> copy on tape off to some offsite storage facility.
>
> There are other options, but it all depends on what you want to achieve.
>
> With respect to the other tools...
>
> You can export  (which is a m/r job) to a local directory, then use distcp
> to a different cluster.  hadoop fs -copyToLocal will let you copy off the
> cluster.
> You could write your own code, but you don't get much gain over existing
> UNIX/Linux tools.
>
>
> On Jul 23, 2012, at 7:52 AM, Amlan Roy wrote:
>
> > Hi,
> >
> >
> >
> > Is it feasible to do disk or tape backup for Hbase tables?
> >
> >
> >
> > I have read about the tools like Export, CopyTable, Distcp. It seems like
> > they will require a separate HDFS cluster to do that.
> >
> >
> >
> > Regards,
> >
> > Amlan
> >
>
>
-- 
Alok Kumar


Shell - scripts

2012-07-23 Thread Claudiu Olteanu
Is there a security measure for running scripts like "while(1) do;end" in the 
shell? How is the thread closed?


Re: host:port problem

2012-07-23 Thread Mohammad Tariq
Hi Rajendra,

If the web service core was built with the ZooKeeper jar
included in some older HBase release, and you have now upgraded your
HBase version, then this could happen. Try adding the new jar to your
web service core.

Regards,
Mohammad Tariq


On Mon, Jul 23, 2012 at 9:03 PM, Amandeep Khurana  wrote:
> This is most likely because of a mismatch in the ZK library version between 
> your web service and the HBase install. Can you confirm you got the same 
> version in both places?
>
>
> On Monday, July 23, 2012 at 8:31 AM, Rajendra Manjunath wrote:
>
>> i have hbase configured in pseudo distributed mode and i am accessing it
>> through a web service. till now every thing was running fine but from the
>> morning i am getting
>> this error
>> 8774@rajendrarajendra,60020,1343050590428host:port pair: �
>>
>> all the process are running fine and i am able to access my hbase from the
>> shell
>>
>> Need the help and Many thanks
>>
>> From
>> Rajendra
>>
>>
>
>
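
One quick way to confirm which ZooKeeper jar the web service is actually
loading (a throwaway check, not an HBase API) is to print where the ZooKeeper
class came from:

import org.apache.zookeeper.ZooKeeper;

public class WhichZkJar {
  public static void main(String[] args) {
    // Prints the path of the jar that supplied the ZooKeeper class on this classpath.
    System.out.println(ZooKeeper.class.getProtectionDomain().getCodeSource().getLocation());
  }
}

Run it once with the web service's classpath and once with the HBase install's
classpath; if the two paths point at different ZooKeeper jars, that is the
version mismatch Amandeep is describing.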


Re: host:port problem

2012-07-23 Thread Amandeep Khurana
This is most likely because of a mismatch in the ZK library version between 
your web service and the HBase install. Can you confirm you got the same 
version in both places?  


On Monday, July 23, 2012 at 8:31 AM, Rajendra Manjunath wrote:

> i have hbase configured in pseudo distributed mode and i am accessing it
> through a web service. till now every thing was running fine but from the
> morning i am getting
> this error
> 8774@rajendrarajendra,60020,1343050590428host:port pair: �
>  
> all the process are running fine and i am able to access my hbase from the
> shell
>  
> Need the help and Many thanks
>  
> From
> Rajendra
>  
>  




RE: Hbase bkup options

2012-07-23 Thread Amlan Roy
Hi Michael,

Thanks a lot for the reply. What I want to achieve is this: if my cluster goes
down for some reason, I should be able to create a new cluster and
import all the backed-up data. As I want to store all the tables, I
expect the data size to be huge (on the order of terabytes) and it will keep
growing.

If I have understood correctly, you have suggested to run "export" to get
the data into hdfs and then run "hadoop fs -copyToLocal" to get it into
local file. If I take a back up of the files, is it possible to import that
data to a new Hbase cluster?

Thanks and regards,
Amlan

-Original Message-
From: Michael Segel [mailto:michael_se...@hotmail.com] 
Sent: Monday, July 23, 2012 8:19 PM
To: user@hbase.apache.org
Subject: Re: Hbase bkup options

Amian, 

Like always the answer to your question is... it depends.

First, how much data are we talking about? 

What's the value of the underlying data? 

One possible scenario...
You run a M/R job to copy data from the table to an HDFS file, that is then
copied to attached storage on an edge node and then to tape. 
Depending on how much data, how much disk is in the attached storage you may
want to keep a warm copy there, a 'warmer/hot' copy on HDFS and then a cold
copy on tape off to some offsite storage facility.

There are other options, but it all depends on what you want to achieve. 

With respect to the other tools...

You can export  (which is a m/r job) to a local directory, then use distcp
to a different cluster.  hadoop fs -copyToLocal will let you copy off the
cluster. 
You could write your own code, but you don't get much gain over existing
UNIX/Linux tools. 


On Jul 23, 2012, at 7:52 AM, Amlan Roy wrote:

> Hi,
> 
> 
> 
> Is it feasible to do disk or tape backup for Hbase tables?
> 
> 
> 
> I have read about the tools like Export, CopyTable, Distcp. It seems like
> they will require a separate HDFS cluster to do that.
> 
> 
> 
> Regards,
> 
> Amlan
> 



host:port problem

2012-07-23 Thread Rajendra Manjunath
I have HBase configured in pseudo-distributed mode and I am accessing it
through a web service. Until now everything was running fine, but since this
morning I am getting
this error:
8774@rajendrarajendra,60020,1343050590428host:port pair: �

All the processes are running fine and I am able to access my HBase from the
shell.

I need your help. Many thanks.

From
Rajendra


Re: Hbase bkup options

2012-07-23 Thread Michael Segel
Amlan,

Like always the answer to your question is... it depends.

First, how much data are we talking about? 

What's the value of the underlying data? 

One possible scenario...
You run a M/R job to copy data from the table to an HDFS file, that is then 
copied to attached storage on an edge node and then to tape. 
Depending on how much data, how much disk is in the attached storage you may 
want to keep a warm copy there, a 'warmer/hot' copy on HDFS and then a cold 
copy on tape off to some offsite storage facility.

There are other options, but it all depends on what you want to achieve. 

With respect to the other tools...

You can export  (which is a m/r job) to a local directory, then use distcp to a 
different cluster.  hadoop fs -copyToLocal will let you copy off the cluster. 
You could write your own code, but you don't get much gain over existing 
UNIX/Linux tools. 


On Jul 23, 2012, at 7:52 AM, Amlan Roy wrote:

> Hi,
> 
> 
> 
> Is it feasible to do disk or tape backup for Hbase tables?
> 
> 
> 
> I have read about the tools like Export, CopyTable, Distcp. It seems like
> they will require a separate HDFS cluster to do that.
> 
> 
> 
> Regards,
> 
> Amlan
> 



Hbase bkup options

2012-07-23 Thread Amlan Roy
Hi,

 

Is it feasible to do disk or tape backup for Hbase tables?

 

I have read about the tools like Export, CopyTable, Distcp. It seems like
they will require a separate HDFS cluster to do that.

 

Regards,

Amlan



Re: Install hbase on Ubuntu 11.10

2012-07-23 Thread iwannaplay games
Hi,
Please make sure you follow all the steps written here:

http://biforbeginners.blogspot.in/2012/07/step-by-step-installation-of-hadoop-on.html

For Ubuntu you can refer to Michael Noll's setup:

http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/

Regards
Prabhjot

On 7/23/12, Debarshi Bhattacharya  wrote:
> I am getting the same type of error while installing HBASE. In have Apache
> Hadoop 0.20.2 installed in my system(ubuntu 11.10). Hbase is getting
> installed,
> but when I try to create a table I am getting the following error :-
>
>
> THE COMPLETE STACKTRACE :
>
>
> ERROR: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
> after
> attempts=7, exceptions:
> Mon Jul 23 11:44:12 IST 2012,
> org.apache.hadoop.hbase.client.ScannerCallable@1d3e069,
> org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find
> region
> for test,,00 after 7 tries.
> Mon Jul 23 11:51:13 IST 2012,
> org.apache.hadoop.hbase.client.ScannerCallable@1d3e069,
> org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find
> region
> for test,,00 after 7 tries.
> Mon Jul 23 11:58:14 IST 2012,
> org.apache.hadoop.hbase.client.ScannerCallable@1d3e069,
> org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find
> region
> for test,,00 after 7 tries.
> Mon Jul 23 12:05:15 IST 2012,
> org.apache.hadoop.hbase.client.ScannerCallable@1d3e069,
> org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find
> region
> for test,,00 after 7 tries.
> Mon Jul 23 12:12:17 IST 2012,
> org.apache.hadoop.hbase.client.ScannerCallable@1d3e069,
> org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find
> region
> for test,,00 after 7 tries.
> Mon Jul 23 12:19:19 IST 2012,
> org.apache.hadoop.hbase.client.ScannerCallable@1d3e069,
> org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find
> region
> for test,,00 after 7 tries.
> Mon Jul 23 12:26:23 IST 2012,
> org.apache.hadoop.hbase.client.ScannerCallable@1d3e069,
> org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find
> region
> for test,,00 after 7 tries.
>
> Here is some help for this command:
> Create table; pass table name, a dictionary of specifications per
> column family, and optionally a dictionary of table configuration.
> Dictionaries are described below in the GENERAL NOTES section.
> Examples:
>
>   hbase> create 't1', {NAME => 'f1', VERSIONS => 5}
>   hbase> create 't1', {NAME => 'f1'}, {NAME => 'f2'}, {NAME => 'f3'}
>   hbase> # The above in shorthand would be the following:
>   hbase> create 't1', 'f1', 'f2', 'f3'
>   hbase> create 't1', {NAME => 'f1', VERSIONS => 1, TTL => 2592000,
> BLOCKCACHE
> => true}
>   hbase> create 't1', 'f1', {SPLITS => ['10', '20', '30', '40']}
>   hbase> create 't1', 'f1', {SPLITS_FILE => 'splits.txt'}
>
>
> hbase(main):002:0>
>
>
>
>
>
>


Re: Use of MD5 as row keys - is this safe?

2012-07-23 Thread Michael Segel
Ethan,

It's not a question of collision attacks, although that is an indication of the
strength of the hash. 

Just because both MD5 and SHA-1 yield a key of the same length, that doesn't 
mean that they will have the same chance of a collision. The algorithm has some 
impact on this too. 

I agree that if you're looking to hash the key, MD5 should be good enough. If 
you don't like it, SHA-1 or SHA-2 are also available as part of Sun's JDK. 
(See Appendix A here: 
http://docs.oracle.com/javase/6/docs/technotes/guides/security/p11guide.html) 

Keys longer than 512 bits are not exportable from the US.

The point I was making was that if you are not comfortable in using MD5 and you 
want to append the key to the hash, you may want to consider a different hash. 

In terms of uniqueness, take a UUID, hash it, and then truncate the hash from
160 bits to 128 bits. This actually is a form of the UUID. While this could
increase the likelihood of a collision, I haven't seen one in real use.

The idea of creating a hash and then prepending it to the key is a tad 
pessimistic.

That was the point. 
 


On Jul 22, 2012, at 9:21 AM, Ethan Jewett wrote:

> To echo Joe Pallas:
> 
> Any fairly "random" hash algorithm producing the same length output should
> have about the same extremely small chance of producing the same output for
> two different inputs - a collision. It's a problem you need to be aware of
> no matter what hash algorithm you use. (Hash functions are mappings from a
> theoretically infinite input space to a finitely large output space, so
> they obviously generate the same output for multiple inputs.)
> 
> SHA-1 specifically (and MD5 even more-so) has an attack that shows that
> given a specific input and output, we can calculate a new input that
> produces the same output with better than brute-force efficiency.
> 
> Collisions and collision attacks are two different things. Collision
> attacks are a problem for cryptographic uses like signing, but how does
> this have anything to do with the problem of generating hBase row keys?
> Just use the fastest, most accessible, random-enough algorithm you can
> find, and if you are really worried about collisions then do something to
> ensure that the key will be unique. Right?
> 
> Cheers,
> Ethan
> 
> On Sun, Jul 22, 2012 at 2:00 PM, Michel Segel 
> wrote:
> 
>> http://en.wikipedia.org/wiki/SHA-1
>> 
>> Check out the comparisons between the different SHA algos.
>> 
>> In theory a collision was found for SHA-1, but none found for SHA-2 does
>> that mean that a collision doesn't exist? No, it means that it hasn't
>> happened yet and the odds are that it won't be found. Possible? Yes,
>> however, highly improbable. You have a better chance of winning the lotto...
>> 
>> The point was that if you are going to hash your key,then concatenate the
>> initial key, you would be better off looking at the SHA-1 option. You have
>> to consider a couple of factors...
>> 1: availability of the algo. SHA-1 is in the standard java API and is
>> readily available.
>> 2: speed. Is SHA-1fast enough? Maybe, depending on your requirements. For
>> most, I'll say probably.
>> 3: Size of Key. SHA-1 is probably be smaller than having an MD-5 hash and
>> the original key added.
>> 
>> Just food for thought...
>> 
>> Sent from a remote device. Please excuse any typos...
>> 
>> Mike Segel
>> 
>> On Jul 20, 2012, at 3:35 PM, Joe Pallas  wrote:
>> 
>>> 
>>> On Jul 20, 2012, at 12:16 PM, Michel Segel wrote:
>>> 
 I don't believe that there has been any reports of collisions, but if.
>> You are concerned you could use the SHA-1 for generating the hash.
>> Relatively speaking, SHA-1is slower, but still fast enough for most
>> applications.
>>> 
>>> Every hash function can have collisions, by definition.  If the
>> correctness of your design depends on collisions being impossible, rather
>> than very rare, then your design is faulty.
>>> 
>>> Cryptographic hash functions have the property that it is
>> computationally hard to create inputs that match a given output.  That
>> doesn’t in itself make cryptographic hash functions better than other hash
>> functions for avoiding hot-spotting.  (But it does usually make
>> cryptographic hash functions more expensive to compute than other hash
>> functions.)
>>> 
>>> You may want to look at   and <
>> http://programmers.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed/145633#145633
>>> .
>>> 
>>> Hope this helps,
>>> joe
>>> 
>>> 
>> 



Re: Install hbase on Ubuntu 11.10

2012-07-23 Thread Jean-Marc Spaggiari
Hi Debarshi,

You can use jps to list the Java daemons running on your different machines.

You should see at least those one:
3420 QuorumPeerMain
2564 TaskTracker
2252 NameNode
2456 DataNode
2360 JobTracker
2703 HMaster
2815 HRegionServer

The number is the process ID, so it will change each time you start
the application. And all those processes can be on different machines,
not necessarily the same one.

If you have all those processes running, then you should take a look at
the log files.

JM

2012/7/23, Mohammad Tariq :
> Just make sure all the daemon processes are running. Your logs show
> there is some problem with your regionserver and it is not reachable.
>
> Regards,
> Mohammad Tariq
>
>
> On Mon, Jul 23, 2012 at 3:22 PM, Debarshi Bhattacharya
>  wrote:
>> I am getting the same type of error while installing HBASE. In have
>> Apache
>> Hadoop 0.20.2 installed in my system(ubuntu 11.10). Hbase is getting
>> installed,
>> but when I try to create a table I am getting the following error :-
>>
>>
>> THE COMPLETE STACKTRACE :
>>
>>
>> ERROR: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
>> after
>> attempts=7, exceptions:
>> Mon Jul 23 11:44:12 IST 2012,
>> org.apache.hadoop.hbase.client.ScannerCallable@1d3e069,
>> org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find
>> region
>> for test,,00 after 7 tries.
>> Mon Jul 23 11:51:13 IST 2012,
>> org.apache.hadoop.hbase.client.ScannerCallable@1d3e069,
>> org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find
>> region
>> for test,,00 after 7 tries.
>> Mon Jul 23 11:58:14 IST 2012,
>> org.apache.hadoop.hbase.client.ScannerCallable@1d3e069,
>> org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find
>> region
>> for test,,00 after 7 tries.
>> Mon Jul 23 12:05:15 IST 2012,
>> org.apache.hadoop.hbase.client.ScannerCallable@1d3e069,
>> org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find
>> region
>> for test,,00 after 7 tries.
>> Mon Jul 23 12:12:17 IST 2012,
>> org.apache.hadoop.hbase.client.ScannerCallable@1d3e069,
>> org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find
>> region
>> for test,,00 after 7 tries.
>> Mon Jul 23 12:19:19 IST 2012,
>> org.apache.hadoop.hbase.client.ScannerCallable@1d3e069,
>> org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find
>> region
>> for test,,00 after 7 tries.
>> Mon Jul 23 12:26:23 IST 2012,
>> org.apache.hadoop.hbase.client.ScannerCallable@1d3e069,
>> org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find
>> region
>> for test,,00 after 7 tries.
>>
>> Here is some help for this command:
>> Create table; pass table name, a dictionary of specifications per
>> column family, and optionally a dictionary of table configuration.
>> Dictionaries are described below in the GENERAL NOTES section.
>> Examples:
>>
>>   hbase> create 't1', {NAME => 'f1', VERSIONS => 5}
>>   hbase> create 't1', {NAME => 'f1'}, {NAME => 'f2'}, {NAME => 'f3'}
>>   hbase> # The above in shorthand would be the following:
>>   hbase> create 't1', 'f1', 'f2', 'f3'
>>   hbase> create 't1', {NAME => 'f1', VERSIONS => 1, TTL => 2592000,
>> BLOCKCACHE
>> => true}
>>   hbase> create 't1', 'f1', {SPLITS => ['10', '20', '30', '40']}
>>   hbase> create 't1', 'f1', {SPLITS_FILE => 'splits.txt'}
>>
>>
>> hbase(main):002:0>
>>
>>
>>
>>
>>
>


Re: Install hbase on Ubuntu 11.10

2012-07-23 Thread Mohammad Tariq
Just make sure all the daemon processes are running. Your logs show
there is some problem with your regionserver and it is not reachable.

Regards,
Mohammad Tariq


On Mon, Jul 23, 2012 at 3:22 PM, Debarshi Bhattacharya
 wrote:
> I am getting the same type of error while installing HBASE. In have Apache
> Hadoop 0.20.2 installed in my system(ubuntu 11.10). Hbase is getting 
> installed,
> but when I try to create a table I am getting the following error :-
>
>
> THE COMPLETE STACKTRACE :
>
>
> ERROR: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
> attempts=7, exceptions:
> Mon Jul 23 11:44:12 IST 2012,
> org.apache.hadoop.hbase.client.ScannerCallable@1d3e069,
> org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find 
> region
> for test,,00 after 7 tries.
> Mon Jul 23 11:51:13 IST 2012,
> org.apache.hadoop.hbase.client.ScannerCallable@1d3e069,
> org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find 
> region
> for test,,00 after 7 tries.
> Mon Jul 23 11:58:14 IST 2012,
> org.apache.hadoop.hbase.client.ScannerCallable@1d3e069,
> org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find 
> region
> for test,,00 after 7 tries.
> Mon Jul 23 12:05:15 IST 2012,
> org.apache.hadoop.hbase.client.ScannerCallable@1d3e069,
> org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find 
> region
> for test,,00 after 7 tries.
> Mon Jul 23 12:12:17 IST 2012,
> org.apache.hadoop.hbase.client.ScannerCallable@1d3e069,
> org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find 
> region
> for test,,00 after 7 tries.
> Mon Jul 23 12:19:19 IST 2012,
> org.apache.hadoop.hbase.client.ScannerCallable@1d3e069,
> org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find 
> region
> for test,,00 after 7 tries.
> Mon Jul 23 12:26:23 IST 2012,
> org.apache.hadoop.hbase.client.ScannerCallable@1d3e069,
> org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find 
> region
> for test,,00 after 7 tries.
>
> Here is some help for this command:
> Create table; pass table name, a dictionary of specifications per
> column family, and optionally a dictionary of table configuration.
> Dictionaries are described below in the GENERAL NOTES section.
> Examples:
>
>   hbase> create 't1', {NAME => 'f1', VERSIONS => 5}
>   hbase> create 't1', {NAME => 'f1'}, {NAME => 'f2'}, {NAME => 'f3'}
>   hbase> # The above in shorthand would be the following:
>   hbase> create 't1', 'f1', 'f2', 'f3'
>   hbase> create 't1', {NAME => 'f1', VERSIONS => 1, TTL => 2592000, BLOCKCACHE
> => true}
>   hbase> create 't1', 'f1', {SPLITS => ['10', '20', '30', '40']}
>   hbase> create 't1', 'f1', {SPLITS_FILE => 'splits.txt'}
>
>
> hbase(main):002:0>
>
>
>
>
>


Re: Install hbase on Ubuntu 11.10

2012-07-23 Thread Debarshi Bhattacharya
I am getting the same type of error while installing HBase. I have Apache
Hadoop 0.20.2 installed on my system (Ubuntu 11.10). HBase is getting installed,
but when I try to create a table I am getting the following error:


THE COMPLETE STACKTRACE :


ERROR: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
attempts=7, exceptions:
Mon Jul 23 11:44:12 IST 2012,
org.apache.hadoop.hbase.client.ScannerCallable@1d3e069,
org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find region
for test,,00 after 7 tries.
Mon Jul 23 11:51:13 IST 2012,
org.apache.hadoop.hbase.client.ScannerCallable@1d3e069,
org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find region
for test,,00 after 7 tries.
Mon Jul 23 11:58:14 IST 2012,
org.apache.hadoop.hbase.client.ScannerCallable@1d3e069,
org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find region
for test,,00 after 7 tries.
Mon Jul 23 12:05:15 IST 2012,
org.apache.hadoop.hbase.client.ScannerCallable@1d3e069,
org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find region
for test,,00 after 7 tries.
Mon Jul 23 12:12:17 IST 2012,
org.apache.hadoop.hbase.client.ScannerCallable@1d3e069,
org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find region
for test,,00 after 7 tries.
Mon Jul 23 12:19:19 IST 2012,
org.apache.hadoop.hbase.client.ScannerCallable@1d3e069,
org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find region
for test,,00 after 7 tries.
Mon Jul 23 12:26:23 IST 2012,
org.apache.hadoop.hbase.client.ScannerCallable@1d3e069,
org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find region
for test,,00 after 7 tries.

Here is some help for this command:
Create table; pass table name, a dictionary of specifications per
column family, and optionally a dictionary of table configuration.
Dictionaries are described below in the GENERAL NOTES section.
Examples:

  hbase> create 't1', {NAME => 'f1', VERSIONS => 5}
  hbase> create 't1', {NAME => 'f1'}, {NAME => 'f2'}, {NAME => 'f3'}
  hbase> # The above in shorthand would be the following:
  hbase> create 't1', 'f1', 'f2', 'f3'
  hbase> create 't1', {NAME => 'f1', VERSIONS => 1, TTL => 2592000, BLOCKCACHE
=> true}
  hbase> create 't1', 'f1', {SPLITS => ['10', '20', '30', '40']}
  hbase> create 't1', 'f1', {SPLITS_FILE => 'splits.txt'}


hbase(main):002:0>







Re: Hbase useful weblinks

2012-07-23 Thread iwannaplay games
I have listed the steps of HBase installation here:

http://biforbeginners.blogspot.in/2012/07/step-by-step-installation-of-hbase.html

Regards
Prabhjot


On 7/22/12, prabhu k  wrote:
> Hi Users,
>
> Could some one provide the following links.
>
> 1. Hbase stable version Installation.
>
> 2. Hbase study material and other stuff like ppt's etc...
>
> Thanks,
> Prabhu.
>