Root Region Error with hbase 0.98.4

2015-03-19 Thread Nishanth S
Hello,

I am running a YCSB instance to insert data into HBase. All was well when
this ran against HBase 0.96.1. Now I am trying to run the same program
against another cluster configured with HBase 0.98.4, and I get the error
below on the client side. Could someone help me with this?


The znode for the -ROOT- region doesn't exist!

Thanks,
Nishan


Re: oldWALs: what it is and how can I clean it?

2015-02-25 Thread Nishanth S
Do you have replication turned on in HBase, and if so, is your slave cluster
consuming the replicated data?
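
If it helps, here is a minimal sketch (assuming the 0.96/0.98 Java client API; the class
name is made up) of checking whether any replication peers are still registered. Archived
WALs stay under /hbase/oldWALs until every registered peer has shipped them, so a peer
whose slave cluster is down or idle will keep that folder growing.

import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.replication.ReplicationAdmin;

public class ListReplicationPeers {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    ReplicationAdmin replicationAdmin = new ReplicationAdmin(conf);
    // Each entry maps a peer id to the slave cluster key; any peer listed here
    // keeps archived WALs around until that peer has consumed them.
    Map<String, String> peers = replicationAdmin.listPeers();
    System.out.println("replication peers: " + peers);
    replicationAdmin.close();
  }
}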

-Nishanth

On Wed, Feb 25, 2015 at 10:19 AM, Madeleine Piffaretti 
mpiffare...@powerspace.com wrote:

 Hi all,

 We are running out of space in our small hadoop cluster so I was checking
 disk usage on HDFS and I saw that most of the space was occupied by the
 /hbase/oldWALs folder.

 I have checked in HBase: The Definitive Guide and other books and web sites,
 and I have also searched for my issue on Google, but I didn't find a proper
 answer...

 So I would like to know what this folder is, what it is used for, and also how
 I can free space from it without breaking everything...


 If it's related to a specific version... our cluster runs
 5.3.0-1.cdh5.3.0.p0.30 from Cloudera (HBase 0.98.6).

 Thx for your help!



Re: Removing a Stored Field from Solr Schema

2015-01-30 Thread Nishanth S
Please ignore..

On Fri, Jan 30, 2015 at 10:39 AM, Nishanth S nishanth.2...@gmail.com
wrote:

 Hello,

 I have a field which is indexed and stored in the Solr schema (Solr Cloud
 4.4). This field is relatively large and I plan to only index the field
 and not store it. Is there a need to re-index the documents once this
 change is made?

 Thanks,
 Nishanth



Re: Tool to execute a benchmark for HBase.

2015-01-30 Thread Nishanth S
You are hitting HBase harder now, which is important for benchmarking. If
there is no data loss it means your HBase cluster is good enough to handle
the load. You are simply making more use of the cores on the machine from
which you launch the ycsb processes. Write your own workload, matched to your
record sizes and format, to see what can be achieved in your particular use case.

-Nishanth

On Fri, Jan 30, 2015 at 5:34 AM, Guillermo Ortiz konstt2...@gmail.com
wrote:

 I have come back to the benchmark. I executed this command:
 ycsb run hbase -P workflowA -p columnfamily=cf -p
 operationcount=10 threads=32

 And I got a performance of 2000 ops/sec.
 What I did later was to execute ten of those commands in parallel, and
 I got about 18000 ops/sec in total. I don't get 2000 ops/sec for each of
 those executions, but I get about 1800 ops/sec each.

 I don't know if it's an HBase question, but I don't understand why I
 get more performance if I execute more commands in parallel when I
 already run 32 threads.
 I took a look at top and saw that in the first case (just one
 process) the CPU was working at about 20-60%; when I launch more processes
 the CPU is at about 400-500%.



 2015-01-29 18:23 GMT+01:00 Guillermo Ortiz konstt2...@gmail.com:
   There's an option when you execute ycsb to say how many client
   threads you want to use. I tried with 1/8/16/32. Those results are
   with 16; the improvement from 1 to 8 is pretty high, not as much from 16 to 32.
   I only use one ycsb instance, could it be that important?
 
  -threads : the number of client threads. By default, the YCSB Client
  uses a single worker thread, but additional threads can be specified.
  This is often done to increase the amount of load offered against the
  database.
 
  2015-01-29 17:27 GMT+01:00 Nishanth S nishanth.2...@gmail.com:
  How many instances of ycsb do you run and how many threads do you use
 per
  instance.I guess these ops are per instance and  you should get similar
  numbers if you run  more instances.In short try running more  workload
  instances...
 
  -Nishanth
 
  On Thu, Jan 29, 2015 at 8:49 AM, Guillermo Ortiz konstt2...@gmail.com
  wrote:
 
  Yes, I'm using 40%. i can't access to those data either.
  I don't know how YSCB executes the reads and if they are random and
  could take advange of the cache.
 
  Do you think that it's an acceptable performance?
 
 
  2015-01-29 16:26 GMT+01:00 Ted Yu yuzhih...@gmail.com:
   What's the value for hfile.block.cache.size ?
  
   By default it is 40%. You may want to increase its value if you're
 using
   default.
  
   Andrew published some ycsb results :
   http://people.apache.org/~apurtell/results-ycsb-0.98.8/ycsb
   -0.98.0-vs-0.98.8.pdf
  
   However, I couldn't access the above now.
  
   Cheers
  
   On Thu, Jan 29, 2015 at 7:14 AM, Guillermo Ortiz 
 konstt2...@gmail.com
   wrote:
  
   Is there any result with that benchmark to compare??
   I'm executing the different workloads and for example for 100% Reads
   in a table with 10Millions of records I only get an performance of
   2000operations/sec. I hoped much better performance but I could be
   wrong. I'd like to know if it's a normal performance or I could have
   something bad configured.
  
  
   I have splitted the tabled and all the records are balanced and used
   snappy.
   The cluster has a master and 4 regions servers with 256Gb,Cores 2
 (32
   w/ Hyperthreading), 0.98.6-cdh5.3.0,
  
   RegionServer is executed with these parameters:
/usr/java/jdk1.7.0_67-cloudera/bin/java -Dproc_regionserver
   -XX:OnOutOfMemoryError=kill -9 %p -Xmx1000m
   -Djava.net.preferIPv4Stack=true -Xms640679936 -Xmx640679936
   -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:-CMSConcurrentMTEnabled
   -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled
   -XX:OnOutOfMemoryError=/usr/lib64/cmf/service/common/killparent.sh
   -Dhbase.log.dir=/var/log/hbase
  
  
 
 -Dhbase.log.file=hbase-cmf-hbase-REGIONSERVER-cnsalbsrvcl23.lvtc.gsnet.corp.log.out
  
 
 -Dhbase.home.dir=/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hbase
   -Dhbase.id.str= -Dhbase.root.logger=INFO,RFA
  
  
 
 -Djava.library.path=/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hadoop/lib/native
   -Dhbase.security.logger=INFO,RFAS
   org.apache.hadoop.hbase.regionserver.HRegionServer start
  
  
   The results for 100% reads are
   [OVERALL], RunTime(ms), 42734.0
   [OVERALL], Throughput(ops/sec), 2340.0570973931763
   [UPDATE], Operations, 1.0
   [UPDATE], AverageLatency(us), 103170.0
   [UPDATE], MinLatency(us), 103168.0
   [UPDATE], MaxLatency(us), 103171.0
   [UPDATE], 95thPercentileLatency(ms), 103.0
   [UPDATE], 99thPercentileLatency(ms), 103.0
   [READ], Operations, 10.0
   [READ], AverageLatency(us), 412.5534
   [READ], AverageLatency(us,corrected), 581.6249026771276
   [READ], MinLatency(us), 218.0
   [READ], MaxLatency(us), 268383.0
   [READ], MaxLatency(us,corrected), 268383.0
   [READ], 95thPercentileLatency(ms), 0.0
   [READ], 95thPercentileLatency(ms,corrected), 0.0
   [READ

Removing a Stored Field from Solr Schema

2015-01-30 Thread Nishanth S
Hello,

I have a field which is indexed and stored in the Solr schema (Solr Cloud
4.4). This field is relatively large and I plan to only index the field
and not store it. Is there a need to re-index the documents once this
change is made?

Thanks,
Nishanth


Re: Tool to execute a benchmark for HBase.

2015-01-29 Thread Nishanth S
How many instances of ycsb do you run, and how many threads do you use per
instance? I guess these ops are per instance, and you should get similar
numbers if you run more instances. In short, try running more workload
instances...

-Nishanth

On Thu, Jan 29, 2015 at 8:49 AM, Guillermo Ortiz konstt2...@gmail.com
wrote:

 Yes, I'm using 40%. I can't access those data either.
 I don't know how YCSB executes the reads, whether they are random, and whether
 they could take advantage of the cache.

 Do you think that it's acceptable performance?


 2015-01-29 16:26 GMT+01:00 Ted Yu yuzhih...@gmail.com:
  What's the value for hfile.block.cache.size ?
 
  By default it is 40%. You may want to increase its value if you're using
  default.
 
  Andrew published some ycsb results :
  http://people.apache.org/~apurtell/results-ycsb-0.98.8/ycsb
  -0.98.0-vs-0.98.8.pdf
 
  However, I couldn't access the above now.
 
  Cheers
 
  On Thu, Jan 29, 2015 at 7:14 AM, Guillermo Ortiz konstt2...@gmail.com
  wrote:
 
  Is there any result with that benchmark to compare?
  I'm executing the different workloads and, for example, for 100% reads
  on a table with 10 million records I only get a performance of
  2000 operations/sec. I hoped for much better performance, but I could be
  wrong. I'd like to know whether this is normal performance or whether I have
  something badly configured.


  I have split the table, all the records are balanced, and I use
  snappy.
  The cluster has a master and 4 region servers with 256 GB, 2 CPUs (32
  w/ Hyperthreading), running 0.98.6-cdh5.3.0.
 
  RegionServer is executed with these parameters:
   /usr/java/jdk1.7.0_67-cloudera/bin/java -Dproc_regionserver
  -XX:OnOutOfMemoryError=kill -9 %p -Xmx1000m
  -Djava.net.preferIPv4Stack=true -Xms640679936 -Xmx640679936
  -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:-CMSConcurrentMTEnabled
  -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled
  -XX:OnOutOfMemoryError=/usr/lib64/cmf/service/common/killparent.sh
  -Dhbase.log.dir=/var/log/hbase
 
 
 -Dhbase.log.file=hbase-cmf-hbase-REGIONSERVER-cnsalbsrvcl23.lvtc.gsnet.corp.log.out
 
 -Dhbase.home.dir=/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hbase
  -Dhbase.id.str= -Dhbase.root.logger=INFO,RFA
 
 
 -Djava.library.path=/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hadoop/lib/native
  -Dhbase.security.logger=INFO,RFAS
  org.apache.hadoop.hbase.regionserver.HRegionServer start
 
 
  The results for 100% reads are
  [OVERALL], RunTime(ms), 42734.0
  [OVERALL], Throughput(ops/sec), 2340.0570973931763
  [UPDATE], Operations, 1.0
  [UPDATE], AverageLatency(us), 103170.0
  [UPDATE], MinLatency(us), 103168.0
  [UPDATE], MaxLatency(us), 103171.0
  [UPDATE], 95thPercentileLatency(ms), 103.0
  [UPDATE], 99thPercentileLatency(ms), 103.0
  [READ], Operations, 10.0
  [READ], AverageLatency(us), 412.5534
  [READ], AverageLatency(us,corrected), 581.6249026771276
  [READ], MinLatency(us), 218.0
  [READ], MaxLatency(us), 268383.0
  [READ], MaxLatency(us,corrected), 268383.0
  [READ], 95thPercentileLatency(ms), 0.0
  [READ], 95thPercentileLatency(ms,corrected), 0.0
  [READ], 99thPercentileLatency(ms), 0.0
  [READ], 99thPercentileLatency(ms,corrected), 0.0
  [READ], Return=0, 10
  [CLEANUP], Operations, 1.0
  [CLEANUP], AverageLatency(us), 103598.0
  [CLEANUP], MinLatency(us), 103596.0
  [CLEANUP], MaxLatency(us), 103599.0
  [CLEANUP], 95thPercentileLatency(ms), 103.0
  [CLEANUP], 99thPercentileLatency(ms), 103.0
 
  hbase(main):030:0> describe 'username'
  DESCRIPTION                                                         ENABLED
   'username', {NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE',          true
   BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1',
   COMPRESSION => 'SNAPPY', MIN_VERSIONS => '0', TTL => 'FOREVER',
   KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536',
   IN_MEMORY => 'false', BLOCKCACHE => 'true'}
  1 row(s) in 0.0170 seconds
 
  2015-01-29 5:27 GMT+01:00 Ted Yu yuzhih...@gmail.com:
   Maybe ask on Cassandra mailing list for the benchmark tool they use ?
  
   Cheers
  
   On Wed, Jan 28, 2015 at 1:23 PM, Guillermo Ortiz 
 konstt2...@gmail.com
   wrote:
  
   I was checking that web page. Do you know if there's another possibility,
   since the last update for Cassandra was two years ago, and I'd like to
   compare both of them with the same kind of tool/code.
  
   2015-01-28 22:10 GMT+01:00 Ted Yu yuzhih...@gmail.com:
Guillermo:
If you use hbase 0.98.x, please consider Andrew's ycsb repo:
   
https://github.com/apurtell/ycsb/tree/new_hbase_client
   
Cheers
   
On Wed, Jan 28, 2015 at 12:41 PM, Nishanth S 
 nishanth.2...@gmail.com
  
wrote:
   
You can use ycsb for this purpose.See here
   
https://github.com/brianfrankcooper/YCSB/wiki/Getting-Started
-Nishanth
   
On Wed, Jan 28, 2015 at 1:37 PM, Guillermo Ortiz 
  konstt2...@gmail.com
wrote:
   
 Hi,

 I'd like to do some benchmarks fo HBase but I don't know what
 tool
 could use. I started to make some

Re: Tool to execute a benchmark for HBase.

2015-01-28 Thread Nishanth S
You can use ycsb for this purpose. See here:

https://github.com/brianfrankcooper/YCSB/wiki/Getting-Started
-Nishanth

On Wed, Jan 28, 2015 at 1:37 PM, Guillermo Ortiz konstt2...@gmail.com
wrote:

 Hi,

 I'd like to do some benchmarks of HBase but I don't know what tool I
 could use. I started to write some code, but I guess there are easier options.

 I've taken a look at JMeter, but I guess that I'd attack HBase directly from
 Java; JMeter looks great but I don't know if it fits well in this
 scenario. What tool could I use to take some measures, such as the response
 time of read and write requests, etc.? I'd like to be able to run the same
 benchmarks against Cassandra.



Deleting Files in oldwals

2015-01-26 Thread Nishanth S
HI,

We were running an HBase cluster with replication enabled. However, we have
moved away from replication and turned it off. I also went ahead and
removed the peers from the HBase shell. However, the oldWALs directory is not
being cleaned up. I am using HBase version 0.96.1. Is it safe to delete
these logs?


Thanks,
Nishanth


Hbase Error When running Map reduce

2015-01-22 Thread Nishanth S
Hi All,
I am running a map reduce job which scans the HBase table for a particular
time period and then creates some files from that. The job runs fine for 10
minutes or so, and around 10% of the maps complete successfully. Here is
the error that I am getting. Can someone help?


15/01/22 19:34:33 INFO mapreduce.TableRecordReaderImpl: recovered from
org.apache.hadoop.hbase.client.ScannerTimeoutException: 559843ms
passed since the last invocation, timeout is currently set to 6
at 
org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:352)
at 
org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(TableRecordReaderImpl.java:194)
at 
org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue(TableRecordReader.java:138)
at 
org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:533)
at 
org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at 
org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: org.apache.hadoop.hbase.UnknownScannerException:
org.apache.hadoop.hbase.UnknownScannerException: Name:
3432603283499371482, already closed?
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:2973)
at 
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26929)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2175)
at 
org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1879)

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at 
org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
at 
org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
at 
org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:277)
at 
org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:198)
at 
org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:57)
at 
org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:120)
at 
org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:96)
at 
org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:336)
... 13 more
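
Not a fix for the root cause, but a sketch of the knobs usually involved, assuming the
job is set up with TableMapReduceUtil (the table name and mapper below are placeholders,
not the real job). Lowering scan caching is usually the main fix, since fewer rows are
handed to the mapper per next() call; the timeout property is also read by the region
servers from their own hbase-site.xml, so raising it only on the client may not be enough.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;

public class ScanJobSetup {

  // Placeholder mapper: just counts rows; the real job writes files instead.
  public static class ExportMapper extends TableMapper<ImmutableBytesWritable, Result> {
    @Override
    protected void map(ImmutableBytesWritable key, Result value, Context context) {
      context.getCounter("scan", "rows").increment(1);
    }
  }

  public static Job createJob() throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Scanner timeout seen by the client (placeholder value of 10 minutes).
    conf.setLong("hbase.client.scanner.timeout.period", 600000L);

    Scan scan = new Scan();
    scan.setCaching(100);        // fewer rows per next() call, so the lease is renewed more often
    scan.setCacheBlocks(false);  // recommended for full-table MapReduce scans

    Job job = Job.getInstance(conf, "hbase-scan-export");
    TableMapReduceUtil.initTableMapperJob("mytable", scan, ExportMapper.class,
        ImmutableBytesWritable.class, Result.class, job);
    return job;
  }
}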


Using Pig To Scan Hbase

2014-12-05 Thread Nishanth S
Hey folks,

I am trying to write a map reduce job in Pig against my HBase table. I have
salting in my row key, appended with reverse timestamps, so I guess the best
way is to do a scan for all the dates for which I need to pull out
records. Does anyone know if Pig supports an HBase scan out of the box, or do
we need to write a UDF for that?

Thanks,
Nishanth


Re: Using Pig To Scan Hbase

2014-12-05 Thread Nishanth S
Thank you Pradeep. That was helpful.

-Nishan

On Fri, Dec 5, 2014 at 11:08 AM, Pradeep Gollakota pradeep...@gmail.com
wrote:

 There is a built in storage handler for HBase. Take a look at the docs at

 https://pig.apache.org/docs/r0.14.0/api/org/apache/pig/backend/hadoop/hbase/HBaseStorage.html

 It doesn't support dealing with salted rowkeys (or reverse timestamps) out
 of the box, so you may have to munge with the data a little bit after it's
 loaded to get what you want.

 Hope this helps.
 Pradeep

 On Fri Dec 05 2014 at 9:55:04 AM Nishanth S nishanth.2...@gmail.com
 wrote:

  Hey folks,
 
  I am trying to write a map reduce in pig against my hbase table.I have a
  salting in my rowkey appended with  reverse timestamps ,so I guess the
 best
  way is to do a scan for all the dates that I require to pull out
  records.Does any one know if pig supports  hbase scan out of the box or
 do
  we need  to write a udf for that.
 
  Thannks,
  Nishanth
 



Re: Re: Hbase Dead region Server

2014-11-04 Thread Nishanth S
Thanks everyone. It turned out that there were a few empty WAL
directories corresponding to the dead region servers. I moved them out of
/hbase and failed over the master. Things started working fine after that.

-Nishanth

On Mon, Nov 3, 2014 at 10:25 PM, yeweichen2...@gmail.com 
yeweichen2...@gmail.com wrote:

 Nishanth,
   What version of HBase are you using?

   You can try clearing the znodes for the region server list in ZooKeeper
 under /hbase/ and then restart the HMaster.

 --
 yeweichen2...@gmail.com


 *From:* Nishanth S nishanth.2...@gmail.com
 *Date:* 2014-11-04 02:32
 *To:* user user@hbase.apache.org
 *Subject:* Re: Hbase Dead region Server
 Thanks Pere. I just did that and still  has the dead region server  showing
 up in Master UI as well as  in status command.I have replication turned on
  in hbase and seeing few issues.Below is the stack trace I am seeing.

 2014-11-03 18:31:00,215 WARN
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Can't
 replicate because of a local or network error:
 java.io.IOException: No replication sinks are available
 at

 org.apache.hadoop.hbase.replication.regionserver.ReplicationSinkManager.getReplicationSink(ReplicationSinkManager.java:117)
 at

 org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:652)
 at

 org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:350)
 2014-11-03 18:31:00,459 WARN
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Can't
 replicate because of a local or network error:
 java.io.IOException: No replication sinks are available
 at

 org.apache.hadoop.hbase.replication.regionserver.ReplicationSinkManager.getReplicationSink(ReplicationSinkManager.java:117)
 at

 org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:652)
 at

 org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:350)

 On Mon, Nov 3, 2014 at 11:18 AM, Pere Kyle p...@whisper.sh wrote:

  Nishanth,
 
  In my experience the only way I have been able to clear the dead region
  servers is to restart the master daemon.
 
  -Pere
 
  On Mon, Nov 3, 2014 at 9:49 AM, Nishanth S nishanth.2...@gmail.com
  wrote:
 
   Hey folks,
  
   How do I remove a dead region server?.I manually failed over the hbase
   master but this is still appearing in master UI and also on the status
   command that I run.
  
   Thanks,
   Nishan
  
 




Hbase Dead region Server

2014-11-03 Thread Nishanth S
Hey folks,

How do I remove a dead region server? I manually failed over the HBase
master, but it still appears in the master UI and also in the status
command that I run.

Thanks,
Nishan


Re: Hbase Dead region Server

2014-11-03 Thread Nishanth S
Thanks Pere. I just did that and still have the dead region server showing
up in the Master UI as well as in the status command. I have replication turned
on in HBase and am seeing a few issues. Below is the stack trace I am seeing.

2014-11-03 18:31:00,215 WARN
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Can't
replicate because of a local or network error:
java.io.IOException: No replication sinks are available
at
org.apache.hadoop.hbase.replication.regionserver.ReplicationSinkManager.getReplicationSink(ReplicationSinkManager.java:117)
at
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:652)
at
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:350)
2014-11-03 18:31:00,459 WARN
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Can't
replicate because of a local or network error:
java.io.IOException: No replication sinks are available
at
org.apache.hadoop.hbase.replication.regionserver.ReplicationSinkManager.getReplicationSink(ReplicationSinkManager.java:117)
at
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:652)
at
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:350)

On Mon, Nov 3, 2014 at 11:18 AM, Pere Kyle p...@whisper.sh wrote:

 Nishanth,

 In my experience the only way I have been able to clear the dead region
 servers is to restart the master daemon.

 -Pere

 On Mon, Nov 3, 2014 at 9:49 AM, Nishanth S nishanth.2...@gmail.com
 wrote:

  Hey folks,
 
  How do I remove a dead region server?.I manually failed over the hbase
  master but this is still appearing in master UI and also on the status
  command that I run.
 
  Thanks,
  Nishan
 



Re: Connecting via API to a remote HBASE installation

2014-10-26 Thread Nishanth S
Can you telnet to ports 2181 and 60020 on the remote cluster, if you are
running the default ports? I had a similar issue in the past where there was a
firewall.
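
For reference, a minimal connection sketch assuming the default ports: 2181 is
ZooKeeper's client port, while 8020 (used in the code quoted below) is the HDFS
NameNode port. Also note the UI reports the quorum as localhost:2181, so the server
side may itself need hbase.zookeeper.quorum set to the real hostname before remote
clients can find it. The hostname below is a placeholder.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class RemoteConnect {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.zookeeper.quorum", "remote-host.example.com");  // the remote ZK host
    conf.set("hbase.zookeeper.property.clientPort", "2181");        // ZK client port, not 8020
    HBaseAdmin admin = new HBaseAdmin(conf);
    System.out.println("master running: " + admin.isMasterRunning());
    admin.close();
  }
}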

Thanks,
Nishanth

On Sun, Oct 26, 2014 at 9:39 AM, Ted Yu yuzhih...@gmail.com wrote:

 Is hbase-site.xml corresponding to your cluster on the classpath of your
 Windows program ?

 Cheers

 On Sun, Oct 26, 2014 at 8:08 AM, Sznajder ForMailingList 
 bs4mailingl...@gmail.com wrote:

  Hi
 
  I am running some code from windows (java)
  and I would like to connect to the HBASE 0.98.5 installed on a remote
  cluster (pseudodistributed mode)
 
  The UI gives me the following info:
 
   Software Attributes:
   - HBase Version: 0.98.5-hadoop2, rUnknown
   - HBase Compiled: Mon Aug 4 23:58:06 PDT 2014, apurtell
   - Hadoop Version: 2.2.0, r1529768
   - Hadoop Compiled: 2013-10-07T06:28Z, hortonmu
   - Zookeeper Quorum: localhost:2181 (zk dump:
     http://lnx-apollo.haifa.ibm.com:60010/zk.jsp)
   - HBase Root Directory: hdfs://localhost:8020/hbase
   - HMaster Start Time: Wed Sep 03 16:53:19 GMT+02:00 2014
   - HMaster Active Time: Wed Sep 03 16:53:19 GMT+02:00 2014
   - HBase Cluster ID: 950d5ca3-a174-482c-9b6e-3c858d06f44f
   - Load average: 2.00
   - Coprocessors: []
 
 
  However when connecting from my code with the following code:
  Configuration conf = HBaseConfiguration.create();
  conf.set(HBASE_CONFIGURATION_ZOOKEEPER_QUORUM, name);
  conf.set(HBASE_CONFIGURATION_ZOOKEEPER_CLIENTPORT, 8020);
  admin = new HBaseAdmin(conf);
 
 
  I get the following:
 
  java.net.ConnectException: Connection refused: no further information
  at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
  at
  sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:585)
  at
 
 
 org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:286)
  at
 org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1035)
 
 
   What am I missing, please?
 
  Benjamin
 



Re: Using parquet

2014-10-21 Thread Nishanth S
Thanks all. I will get back if I take that direction.

-Nishanth

On Tue, Oct 21, 2014 at 8:15 AM, Ted Yu yuzhih...@gmail.com wrote:

 The link is about Cassandra, not hbase.

 Cheers

 On Tue, Oct 21, 2014 at 2:53 AM, Qiang Tian tian...@gmail.com wrote:

  Do you want some SQL-on-Hadoop engine that could access HBase files directly?
  I did a quick search and found
  http://www.slideshare.net/Stratio/integrating-sparkandcassandra (P35),
  but I'm not sure if I understand correctly.
 
  On Tue, Oct 21, 2014 at 12:15 PM, Nick Dimiduk ndimi...@gmail.com
 wrote:
 
    Not currently. HBase uses its own file format that makes different
    assumptions than parquet. Instead, HBase supports its own format
    optimizations, such as block encodings and compression. I would be
    interested in an exercise to see what things are necessary for HBase to
    support a columnar format such as parquet or orc; no such investigation
    has been undertaken that I am aware of.
  
   Thanks,
   Nick
  
   On Monday, October 20, 2014, Nishanth S nishanth.2...@gmail.com
 wrote:
  
Hey folks,
   
I  have been reading a bit about parque and how hive and impala
 works
   well
on data stored in parque format.Is it even  possible to do the same
  with
hbase to reduce storage etc..
   
   
Thanks,
Nishanth
   
  
 



Using parquet

2014-10-20 Thread Nishanth S
Hey folks,

I have been reading a bit about Parquet and how Hive and Impala work well
on data stored in Parquet format. Is it even possible to do the same with
HBase, to reduce storage etc.?


Thanks,
Nishanth


Re: custom filter on hbase 0.96

2014-10-15 Thread Nishanth S
Hi Ted,
 Since I am also working on a similar thing, is there a way we can first test
the filter on the client side? You know what I mean, without disrupting others
who are using the same cluster for other work?

Thanks,
Nishanth

On Wed, Oct 15, 2014 at 3:17 PM, Ted Yu yuzhih...@gmail.com wrote:

 bq. Or create a new file, compile it into ...

 You should go with the above approach.

 On Wed, Oct 15, 2014 at 2:08 PM, Matt K matvey1...@gmail.com wrote:

  Hi all,
 
  I'm trying to get a custom filter to work on HBase 0.96. After some
  searching, I found that starting from 0.96, the implementer is required
 to
  implement toByteArray and parseFrom methods, using Protocol Buffers.
  But I'm having trouble with the how.
 
  The proto file for the existing filters is located here:
 
 
 https://github.com/apache/hbase/blob/master/hbase-protocol/src/main/protobuf/Filter.proto
 
  Am I supposed to modify that file? Or create a new file, compile it into
  Java, and package it up with the filter?
 
  In the meantime, I've taken a shortcut that's not working. Here's my
 code:
  http://pastebin.com/iHFKu9Xz
 
  I'm using PrefixFilter, which comes with HBase, since I'm also
 filtering
  by prefix. However, that errors out with the following:
  http://pastebin.com/zBg47p6Z
 
  Thanks in advance for helping!
 
  -Matt
 



Re: custom filter on hbase 0.96

2014-10-15 Thread Nishanth S
Thanks Ted. I will take a look.

-Nishanth

On Wed, Oct 15, 2014 at 3:43 PM, Ted Yu yuzhih...@gmail.com wrote:

 Nishanth:
 Good question.

  As a general coding guideline, writing a unit test is always a good start. Using
  Matt's case as an example, take a look at TestPrefixFilter.

  There are various unit tests for Filters in the hbase code.
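
For example, something along these lines (a rough sketch with JUnit 4 and a hypothetical
MyTruncateFilter under test) runs the filter against an in-process mini cluster, so
nothing touches the shared cluster; keep in mind that on a real cluster the filter jar
still has to be deployed to the region servers.

import static org.junit.Assert.assertEquals;

import org.apache.hadoop.hbase.HBaseTestingUtility;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;
import org.junit.AfterClass;
import org.junit.BeforeClass;
import org.junit.Test;

public class TestMyTruncateFilter {
  private static final HBaseTestingUtility UTIL = new HBaseTestingUtility();

  @BeforeClass
  public static void setUpCluster() throws Exception {
    UTIL.startMiniCluster();
  }

  @AfterClass
  public static void tearDownCluster() throws Exception {
    UTIL.shutdownMiniCluster();
  }

  @Test
  public void scanAppliesFilter() throws Exception {
    HTable table = UTIL.createTable(Bytes.toBytes("t"), Bytes.toBytes("cf"));
    Put put = new Put(Bytes.toBytes("row-1"));
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("message"), Bytes.toBytes("a long value"));
    table.put(put);

    Scan scan = new Scan();
    scan.setFilter(new MyTruncateFilter());  // hypothetical custom filter under test
    int rows = 0;
    for (Result r : table.getScanner(scan)) {
      rows++;
    }
    assertEquals(1, rows);
    table.close();
  }
}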

 Cheers

 On Wed, Oct 15, 2014 at 2:30 PM, Nishanth S nishanth.2...@gmail.com
 wrote:

  Hi Ted ,
   Since I am also working on similar thing is there a way we can  first
 test
  the filter on client side?.You know what I  mean without disrupting
 others
  who are using the same cluster for other work?
 
  Thanks,
  Nishanth
 
  On Wed, Oct 15, 2014 at 3:17 PM, Ted Yu yuzhih...@gmail.com wrote:
 
   bq. Or create a new file, compile it into ...
  
   You should go with the above approach.
  
   On Wed, Oct 15, 2014 at 2:08 PM, Matt K matvey1...@gmail.com wrote:
  
Hi all,
   
I'm trying to get a custom filter to work on HBase 0.96. After some
searching, I found that starting from 0.96, the implementer is
 required
   to
implement toByteArray and parseFrom methods, using Protocol
  Buffers.
But I'm having trouble with the how.
   
The proto file for the existing filters is located here:
   
   
  
 
 https://github.com/apache/hbase/blob/master/hbase-protocol/src/main/protobuf/Filter.proto
   
Am I supposed to modify that file? Or create a new file, compile it
  into
Java, and package it up with the filter?
   
In the meantime, I've taken a shortcut that's not working. Here's my
   code:
http://pastebin.com/iHFKu9Xz
   
I'm using PrefixFilter, which comes with HBase, since I'm also
   filtering
by prefix. However, that errors out with the following:
http://pastebin.com/zBg47p6Z
   
Thanks in advance for helping!
   
-Matt
   
  
 



Loading hbase from parquet files

2014-10-08 Thread Nishanth S
Hey folks,

I am evaluating loading an HBase table from Parquet files, based on
some rules that would be applied to the Parquet file records. Could someone
help me with what would be the best way to do this?


Thanks,
Nishan


Re: Loading hbase from parquet files

2014-10-08 Thread Nishanth S
I was thinking of using the org.apache.hadoop.hbase.mapreduce.Driver import
utility. I could see that we can pass filters to this utility, but it looks
less flexible, since you need to deploy a new filter every time the rules for
processing records change. Is there some way that we could define a rules
engine?


Thanks,
-Nishan

On Wed, Oct 8, 2014 at 9:50 AM, Nishanth S nishanth.2...@gmail.com wrote:

 Hey folks,

 I am evaluating on loading  an  hbase table from parquet files based on
 some rules that  would be applied on  parquet file records.Could some one
 help me on what would be the best way to do this?.


 Thanks,
 Nishan



Re: Loading hbase from parquet files

2014-10-08 Thread Nishanth S
Thanks Andrey. In the current system the HBase column families have a TTL of
30 days and data gets deleted after this (with snappy compression). Below is
what I am trying to achieve.

1. Export the data from the HBase table before it gets deleted.
2. Store it in some format which supports maximum compression (storage cost
is my primary concern here), so I am looking at Parquet.
3. Load a subset of this data back into HBase based on certain rules (say I
want to load all rows which have a particular string in one of the fields).


I was thinking of bulk loading this data back into HBase, but I am not sure
how I can load a subset of the data using the
org.apache.hadoop.hbase.mapreduce.Driver import.
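
For step 3, a rough sketch of the bulk-load route Andrey points to below, assuming
HBase 0.98 (HFileOutputFormat2; older releases have HFileOutputFormat with the same
configureIncrementalLoad call). ParquetFilterMapper, the table name and the output
directory are placeholders: the mapper would read the exported records, apply the
rules and emit a Put only for the rows that should go back into HBase.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class FilteredBulkLoad {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "filtered-bulkload");
    job.setJarByClass(FilteredBulkLoad.class);
    job.setMapperClass(ParquetFilterMapper.class);  // hypothetical rule-applying mapper
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(Put.class);

    HTable table = new HTable(conf, "mytable");
    HFileOutputFormat2.configureIncrementalLoad(job, table);  // partitions output by region
    Path hfileDir = new Path("/tmp/filtered-hfiles");
    FileOutputFormat.setOutputPath(job, hfileDir);

    if (job.waitForCompletion(true)) {
      // Moves (not copies) the generated HFiles into the table's regions.
      new LoadIncrementalHFiles(conf).doBulkLoad(hfileDir, table);
    }
    table.close();
  }
}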






On Wed, Oct 8, 2014 at 10:20 AM, Andrey Stepachev oct...@gmail.com wrote:

 Hi Nishanth.

 Not clear what exactly you are building.
 Can you share a more detailed description of what you are building and how the
 parquet files are supposed to be ingested?
 Some questions arise:
 1. is this an online import or a bulk load?
 2. why do the rules need to be deployed to the cluster? Do you intend to do the
 reading inside the hbase region server?

 As for deploying filters, you can try to use coprocessors instead. They can
 be configurable and loadable (but not
 unloadable, so you need to think about some class loading magic like
 ClassWorlds).
 For bulk imports you can create HFiles directly and add them incrementally:
 http://hbase.apache.org/book/arch.bulk.load.html

 On Wed, Oct 8, 2014 at 8:13 PM, Nishanth S nishanth.2...@gmail.com
 wrote:

  I was thinking of using org.apache.hadoop.hbase.mapreduce.Driver import.
 I
  could see that we can pass in filters  to this utility but looks less
  flexible since  you need to deploy a new filter every time  the rules for
  processing records change.Is there some way that we could define a rules
  engine?
 
 
  Thanks,
  -Nishan
 
  On Wed, Oct 8, 2014 at 9:50 AM, Nishanth S nishanth.2...@gmail.com
  wrote:
 
   Hey folks,
  
   I am evaluating on loading  an  hbase table from parquet files based on
   some rules that  would be applied on  parquet file records.Could some
 one
   help me on what would be the best way to do this?.
  
  
   Thanks,
   Nishan
  
 



 --
 Andrey.



Re: Loading hbase from parquet files

2014-10-08 Thread Nishanth S
Thank you guys for the information.

-cheers
Nishan

On Wed, Oct 8, 2014 at 12:49 PM, Andrey Stepachev oct...@gmail.com wrote:

 For that use case I'd prefer to write new filtered HFiles with map reduce
 and then import that data into hbase using bulk import. Keep in mind that
 the incremental load tool moves files, not copies them. So once written you
 will not do any additional writes (except for those regions which were split
 while you were filtering data). If the imported data is small that would not
 be a problem.

 On Wed, Oct 8, 2014 at 8:45 PM, Nishanth S nishanth.2...@gmail.com
 wrote:

  Thanks Andrey.In the current system  the hbase cfs have a ttl of  30 days
  and data gets deleted after this(has snappy compression).Below is
 something
  what I am trying to acheive.
 
  1.Export the data from hbase  table  before it gets deleted.
  2.Store it  in some format  which supports maximum compression(storage
 cost
  is my primary concern here),so looking at parquet.
  3.Load a subset of this data back into hbase based on  certain rules(say
 i
  want  to load all rows which has a particular string in one of the
 fields).
 
 
  I was thinking of bulkloading this data back into hbase but I am not sure
  how I can  load a subset of the data using
  org.apache.hadoop.hbase.mapreduce.Driver
  import.
 
 
 
 
 
 
  On Wed, Oct 8, 2014 at 10:20 AM, Andrey Stepachev oct...@gmail.com
  wrote:
 
   Hi Nishanth.
  
   Not clear what exactly you are building.
   Can you share more detailed description of what you are building, how
   parquet files are supposed to be ingested.
   Some questions arise:
   1. is that online import or bulk load
   2. why rules need to be deployed to cluster. Do you suppose to do
 reading
   inside hbase region server?
  
   As for deploying filters your cat try to use coprocessors instead. They
  can
   be configurable and loadable (but not
   unloadable, so you need to think about some class loading magic like
   ClassWorlds)
   For bulk imports you can create HFiles directly and add them
  incrementally:
   http://hbase.apache.org/book/arch.bulk.load.html
  
   On Wed, Oct 8, 2014 at 8:13 PM, Nishanth S nishanth.2...@gmail.com
   wrote:
  
I was thinking of using org.apache.hadoop.hbase.mapreduce.Driver
  import.
   I
could see that we can pass in filters  to this utility but looks less
flexible since  you need to deploy a new filter every time  the rules
  for
processing records change.Is there some way that we could define a
  rules
engine?
   
   
Thanks,
-Nishan
   
On Wed, Oct 8, 2014 at 9:50 AM, Nishanth S nishanth.2...@gmail.com
wrote:
   
 Hey folks,

 I am evaluating on loading  an  hbase table from parquet files
 based
  on
 some rules that  would be applied on  parquet file records.Could
 some
   one
 help me on what would be the best way to do this?.


 Thanks,
 Nishan

   
  
  
  
   --
   Andrey.
  
 



 --
 Andrey.



Re: Wide Rows vs Multiple column families

2014-09-29 Thread Nishanth S
Hey Ted,

I was in the process of comparing the insert throughputs which we
discussed, using ycsb. What I found is that when I split the data into
multiple column families, the insert throughput comes down to half of what I
get when persisting into a single column family. Do you think this is
possible, or am I doing something wrong?

-Nishan

On Thu, Sep 25, 2014 at 11:56 AM, Ted Yu yuzhih...@gmail.com wrote:

 There should not be impact to hbase write performance for two column
 families.

 Cheers

 On Thu, Sep 25, 2014 at 10:53 AM, Nishanth S nishanth.2...@gmail.com
 wrote:

  Thank you Ted.No I do not  plan to use bulk loading  since the data is
   incremental in nature.
 
  On Thu, Sep 25, 2014 at 11:36 AM, Ted Yu yuzhih...@gmail.com wrote:
 
   For #1, do you plan to use bulk load ?
  
   For #3, take a look at HBASE-5416 which introduced essential column
  family.
   In your query, you can designate the smaller column family as essential
   column family where smaller columns are queried.
  
   Cheers
  
   On Thu, Sep 25, 2014 at 9:57 AM, Nishanth S nishanth.2...@gmail.com
   wrote:
  
Hi everyone,
   
This question may have been asked many times  but I would really
   appreciate
if some one can help me  on how to go about this.
   
   
Currently my hbase table consists of about 10 columns per row  which
 in
total has an average size of  5K.The chunk of the size is held by
 one
particular column(more than 4K).Would it help to move  this column
 out
   to a
different column family when we do reads.There are cases where we
 just
   need
to access the smaller columns and  there is another set of use cases
   where
you need both the data(the one in smaller column and this huge data
chunk).In general I am trying to answer the below questions in this
scenario.
   
   
1.Would seperating to multiple column families affect  hbase write
performance?
   
2. How would if affect my read performance considering both the read
   cases?
   
3.Is there any advantage that I am gaining by seperating into
 multiple
   cfs?
   
   
I would really appreciate if any one could  point me in the right
direction.
   
   
-Thanks
Nishan
   
  
 



Re: Wide Rows vs Multiple column families

2014-09-29 Thread Nishanth S
HBase release: 0.96.1
The number of column families at which the issue is observed is 2. Earlier I
had one single column family where all the data was persisted. In the new case
I was storing all metadata in column family 1 (less than 1 KB) and a blob
in the second column family (around 7 KB).
We have a 9 node cluster with 7 HBase region servers and are using Hadoop 2.3.0.

I am also using the asynchbase client 1.5 for ingesting data into HBase. I had
to spawn multiple put requests in this case because there is no API for
sending insert requests to multiple column families.
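
For what it's worth, with the standard synchronous client a single Put can carry cells
for both families, so the metadata and the blob share one RPC; the sketch below uses
made-up table, family and qualifier names.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class TwoFamilyPut {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(HBaseConfiguration.create(), "logtable");
    byte[] blob = new byte[7 * 1024];  // stand-in for the real ~7 KB blob

    Put put = new Put(Bytes.toBytes("rowkey-1"));
    // Metadata cell in the first family and the blob in the second family,
    // both carried by the same Put and written atomically for the row.
    put.add(Bytes.toBytes("meta"), Bytes.toBytes("c1"), Bytes.toBytes("small value"));
    put.add(Bytes.toBytes("blob"), Bytes.toBytes("message"), blob);
    table.put(put);
    table.close();
  }
}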

Thanks,
Nishan


On Mon, Sep 29, 2014 at 10:49 AM, Ted Yu yuzhih...@gmail.com wrote:

 Can you give a bit more detail, such as:

 the release of HBase you're using
 number of column families where slowdown is observed
 size of cluster
 release of hadoop you're using

 Thanks

 On Mon, Sep 29, 2014 at 9:43 AM, Nishanth S nishanth.2...@gmail.com
 wrote:

  Hey  Ted,
 
  I was  in the process of comparing   insert throughputs   which we
  discussed using ycsb.What I could find is that when I split the data into
  multiple column families  the insert through  is  coming down to half
 when
  compared to persisting into a single column family.Do you think this is
  possible or am I doing some thing wrong.
 
  -Nishan
 
  On Thu, Sep 25, 2014 at 11:56 AM, Ted Yu yuzhih...@gmail.com wrote:
 
   There should not be impact to hbase write performance for two column
   families.
  
   Cheers
  
   On Thu, Sep 25, 2014 at 10:53 AM, Nishanth S nishanth.2...@gmail.com
   wrote:
  
Thank you Ted.No I do not  plan to use bulk loading  since the data
 is
 incremental in nature.
   
On Thu, Sep 25, 2014 at 11:36 AM, Ted Yu yuzhih...@gmail.com
 wrote:
   
 For #1, do you plan to use bulk load ?

 For #3, take a look at HBASE-5416 which introduced essential column
family.
 In your query, you can designate the smaller column family as
  essential
 column family where smaller columns are queried.

 Cheers

 On Thu, Sep 25, 2014 at 9:57 AM, Nishanth S 
 nishanth.2...@gmail.com
  
 wrote:

  Hi everyone,
 
  This question may have been asked many times  but I would really
 appreciate
  if some one can help me  on how to go about this.
 
 
  Currently my hbase table consists of about 10 columns per row
  which
   in
  total has an average size of  5K.The chunk of the size is held by
   one
  particular column(more than 4K).Would it help to move  this
 column
   out
 to a
  different column family when we do reads.There are cases where we
   just
 need
  to access the smaller columns and  there is another set of use
  cases
 where
  you need both the data(the one in smaller column and this huge
 data
  chunk).In general I am trying to answer the below questions in
 this
  scenario.
 
 
  1.Would seperating to multiple column families affect  hbase
 write
  performance?
 
  2. How would if affect my read performance considering both the
  read
 cases?
 
  3.Is there any advantage that I am gaining by seperating into
   multiple
 cfs?
 
 
  I would really appreciate if any one could  point me in the right
  direction.
 
 
  -Thanks
  Nishan
 

   
  
 



Wide Rows vs Multiple column families

2014-09-25 Thread Nishanth S
Hi everyone,

This question may have been asked many times, but I would really appreciate
it if someone could help me with how to go about this.


Currently my HBase table consists of about 10 columns per row, which in
total have an average size of 5 KB. The bulk of the size is held by one
particular column (more than 4 KB). Would it help to move this column out to a
different column family when we do reads? There are cases where we just need
to access the smaller columns, and there is another set of use cases where
you need both the data in the smaller columns and this huge data
chunk. In general I am trying to answer the questions below in this
scenario.


1. Would separating into multiple column families affect HBase write
performance?

2. How would it affect my read performance, considering both read cases?

3. Is there any advantage that I am gaining by separating into multiple CFs?


I would really appreciate it if anyone could point me in the right direction.


-Thanks
Nishan


Re: Wide Rows vs Multiple column families

2014-09-25 Thread Nishanth S
Thank you Ted. No, I do not plan to use bulk loading, since the data is
incremental in nature.

On Thu, Sep 25, 2014 at 11:36 AM, Ted Yu yuzhih...@gmail.com wrote:

 For #1, do you plan to use bulk load ?

 For #3, take a look at HBASE-5416 which introduced essential column family.
 In your query, you can designate the smaller column family as essential
 column family where smaller columns are queried.
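
As a sketch of what that could look like on the read side (family, qualifier and table
names are made up): SingleColumnValueFilter marks only its own family as essential, so
with on-demand loading the blob family is read only for rows that match.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class EssentialFamilyScan {
  public static void main(String[] args) throws Exception {
    Scan scan = new Scan();
    // Filter on a small column in the metadata family; only that family is "essential".
    scan.setFilter(new SingleColumnValueFilter(
        Bytes.toBytes("meta"), Bytes.toBytes("c1"),
        CompareOp.EQUAL, Bytes.toBytes("x")));
    // Non-essential families (the blob) are loaded only for rows that pass the filter.
    scan.setLoadColumnFamiliesOnDemand(true);

    HTable table = new HTable(HBaseConfiguration.create(), "logtable");
    for (Result r : table.getScanner(scan)) {
      // both families are available here for matching rows
    }
    table.close();
  }
}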

 Cheers

 On Thu, Sep 25, 2014 at 9:57 AM, Nishanth S nishanth.2...@gmail.com
 wrote:

  Hi everyone,
 
  This question may have been asked many times  but I would really
 appreciate
  if some one can help me  on how to go about this.
 
 
  Currently my hbase table consists of about 10 columns per row  which in
  total has an average size of  5K.The chunk of the size is held by  one
  particular column(more than 4K).Would it help to move  this column out
 to a
  different column family when we do reads.There are cases where we just
 need
  to access the smaller columns and  there is another set of use cases
 where
  you need both the data(the one in smaller column and this huge data
  chunk).In general I am trying to answer the below questions in this
  scenario.
 
 
  1.Would seperating to multiple column families affect  hbase write
  performance?
 
  2. How would if affect my read performance considering both the read
 cases?
 
  3.Is there any advantage that I am gaining by seperating into multiple
 cfs?
 
 
  I would really appreciate if any one could  point me in the right
  direction.
 
 
  -Thanks
  Nishan
 



Re: Wide Rows vs Multiple column families

2014-09-25 Thread Nishanth S
Thank you Ted.

-Nishan

On Thu, Sep 25, 2014 at 11:56 AM, Ted Yu yuzhih...@gmail.com wrote:

 There should not be impact to hbase write performance for two column
 families.

 Cheers

 On Thu, Sep 25, 2014 at 10:53 AM, Nishanth S nishanth.2...@gmail.com
 wrote:

  Thank you Ted.No I do not  plan to use bulk loading  since the data is
   incremental in nature.
 
  On Thu, Sep 25, 2014 at 11:36 AM, Ted Yu yuzhih...@gmail.com wrote:
 
   For #1, do you plan to use bulk load ?
  
   For #3, take a look at HBASE-5416 which introduced essential column
  family.
   In your query, you can designate the smaller column family as essential
   column family where smaller columns are queried.
  
   Cheers
  
   On Thu, Sep 25, 2014 at 9:57 AM, Nishanth S nishanth.2...@gmail.com
   wrote:
  
Hi everyone,
   
This question may have been asked many times  but I would really
   appreciate
if some one can help me  on how to go about this.
   
   
Currently my hbase table consists of about 10 columns per row  which
 in
total has an average size of  5K.The chunk of the size is held by
 one
particular column(more than 4K).Would it help to move  this column
 out
   to a
different column family when we do reads.There are cases where we
 just
   need
to access the smaller columns and  there is another set of use cases
   where
you need both the data(the one in smaller column and this huge data
chunk).In general I am trying to answer the below questions in this
scenario.
   
   
1.Would seperating to multiple column families affect  hbase write
performance?
   
2. How would if affect my read performance considering both the read
   cases?
   
3.Is there any advantage that I am gaining by seperating into
 multiple
   cfs?
   
   
I would really appreciate if any one could  point me in the right
direction.
   
   
-Thanks
Nishan
   
  
 



Help in purging hbase data

2014-09-24 Thread Nishanth S
Hi All,

We were using the TTL feature to delete the HBase data, since we were able to
define the retention days at the column family level. But right now we have a
requirement for storing data with different retention periods in this
column family. So we would need to do a select and delete. What would be the
best way to do this?

Thanks,
Nishan


Re: Help in purging hbase data

2014-09-24 Thread Nishanth S
Thank you Jean. Is there some batch API which HBase exposes for deletes, to
which I can feed the row keys?
The reason I am asking is that I have a Solr implementation running in the
background which has the HBase row key as one of its fields. So it would be
pretty easy to grab the set of row keys that need to be deleted.
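
In case it helps, a minimal sketch of the plain client-side batch (not the server-side
bulk delete Jean-Marc mentions below), assuming the row keys come back from Solr as
strings and the table name is a placeholder:

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class BatchDelete {
  public static void main(String[] args) throws Exception {
    // Placeholder for the row keys fetched from the Solr index.
    List<String> rowKeysFromSolr = new ArrayList<String>();

    HTable table = new HTable(HBaseConfiguration.create(), "logtable");
    List<Delete> deletes = new ArrayList<Delete>();
    for (String rowKey : rowKeysFromSolr) {
      deletes.add(new Delete(Bytes.toBytes(rowKey)));
    }
    // One call deletes the whole batch; successfully deleted entries are removed
    // from the passed list, so anything left in it afterwards failed.
    table.delete(deletes);
    table.close();
  }
}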

-Nishan

On Wed, Sep 24, 2014 at 12:14 PM, Jean-Marc Spaggiari 
jean-m...@spaggiari.org wrote:

 Hi Nishan,

 What you are looking for is HBASE-11764
 https://issues.apache.org/jira/browse/HBASE-11764 and not available yet.

 JM

 2014-09-24 14:12 GMT-04:00 Nishanth S nishanth.2...@gmail.com:

  Hi All,
 
  We were using TTL feature to delete the hbase data  since we were able to
   define the retention days at column family level.But right now we have a
  requirement for  storing data with different retention period in this
  column family.So we would need to do a select and delete.What  would be
 the
  best way to do this?.
 
  Thanks,
  Nishan
 



Fwd: Restructuring Hbase Table

2014-09-23 Thread Nishanth S
Hi folks,

We have an HBase table with 4 column families which stores log data. The
columns and the content stored in each of these column families are the
same. The reason for having multiple families is that we needed 4 retention
buckets for messages and were using the TTL feature of HBase to achieve
this. Each of our HBase rows has a predefined set of meta fields and a
large blob message.

I was considering restructuring the table with 2 column families: one
column family for the metadata and the other for the blob message, which is
the meatier chunk. The reason for this approach is that most of the analytics
queries would be directed at the metadata, which is in cf1, and a few at cf2,
which has the blob message. There will be a few use cases where you would need
to query the data in both cf1 and cf2, but that is not the dominant use
case. We would then devise some method to purge the data manually (using a
retention bucket + timestamp in the row key). How does this look so far? Is
there a better way to implement this?


Thanks,
Nishanth


Restructuring Hbase Table

2014-09-22 Thread Nishanth S
Hi folks,

We have an HBase table with 4 column families which stores log data. The
columns and the content stored in each of these column families are the
same. The reason for having multiple families is that we needed 4 retention
buckets for messages and were using the TTL feature of HBase to achieve
this. Each of our HBase rows has a predefined set of meta fields and a
large blob message.

I was considering restructuring the table with 2 column families: one
column family for the metadata and the other for the blob message, which is
the meatier chunk. The reason for this approach is that most of the analytics
queries would be directed at the metadata, which is in cf1, and a few at cf2,
which has the blob message. There will be a few use cases where you would need
to query the data in both cf1 and cf2, but that is not the dominant use
case. We would then devise some method to purge the data manually (using a
retention bucket + timestamp in the row key). How does this look so far? Is
there a better way?


Thanks,
Nishanth


Re: Custom Filter on hbase Column

2014-09-15 Thread Nishanth S
Thanks Anoop. I did that, but the only method that was getting called in my
filter was public byte[] toByteArray(), even though I override
transformCell.


Thanks,
Nishanth

On Thu, Sep 11, 2014 at 10:51 PM, Anoop John anoop.hb...@gmail.com wrote:

  And you have to implement
  transformCell(final Cell v)
  in your custom Filter.

 JFYI

 -Anoop-

 On Fri, Sep 12, 2014 at 4:36 AM, Nishanth S nishanth.2...@gmail.com
 wrote:

  Sure  Sean.This is much needed.
 
  -Nishan
 
  On Thu, Sep 11, 2014 at 3:57 PM, Sean Busbey bus...@cloudera.com
 wrote:
 
   I filed HBASE-11950 to get some details added to the book on this
  topic[1].
  
   Nishanth, could you follow that ticket and give feedback on whatever
  update
   ends up proposed?
  
   [1]: https://issues.apache.org/jira/browse/HBASE-11950
  
   On Thu, Sep 11, 2014 at 4:40 PM, Ted Yu yuzhih...@gmail.com wrote:
  
See http://search-hadoop.com/m/DHED4xWh622
   
On Thu, Sep 11, 2014 at 2:37 PM, Nishanth S nishanth.2...@gmail.com
 
wrote:
   
 Hey All,

 I am sorry if this is a naive question.Do we need to generate a
 proto
file
 using proto buffer compiler when implementing a filter.I did not
 see
   that
 any where in the documentation.Can some one  help please?

 On Thu, Sep 11, 2014 at 12:41 PM, Nishanth S 
  nishanth.2...@gmail.com
 wrote:

  Thanks Dima and Ted.Yes I need to  return the first the 1000
  characters.There is no matching involved.
 
  -Nishan
 
  On Thu, Sep 11, 2014 at 12:24 PM, Ted Yu yuzhih...@gmail.com
   wrote:
 
  In Nishanth's case, the 5K message is stored in one KeyValue,
  right
   ?
 
  If only the first 1000 Characters of this message are to be
   returned,
  a new KeyValue
  needs to be composed and returned.
 
  Cheers
 
  On Thu, Sep 11, 2014 at 11:09 AM, Dima Spivak 
  dspi...@cloudera.com
   
  wrote:
 
   Hi Nishanth,
  
   Take a look at
 http://hbase.apache.org/book/client.filter.html
  .
If I
   understand your use case correctly, you might want to look at
   RegexStringComparator to match the first 1000 characters of
 your
 column
   qualifier.
  
   -Dima
  
   On Thu, Sep 11, 2014 at 12:37 PM, Nishanth S 
nishanth.2...@gmail.com
 
   wrote:
  
Hi All,
   
I have an hbase table with multiple cfs (say c1,c2,c3).Each
 of
this
   column
family has  a column 'message' which is about 5K.What I need
  to
   do
 is
  to
grab only the first 1000 Characters of this message when I
 do
  a
get
 on
   the
table using row Key.I was thinking of using  filters to do
  this
   on
  hbase
sever side.Can some one help me on how to go about this.
   
   
Thanks,
Nishan
   
  
 
 
 

   
  
  
  
   --
   Sean
  
 



Custom Filter on hbase Column

2014-09-11 Thread Nishanth S
Hi All,

I have an HBase table with multiple CFs (say c1, c2, c3). Each of these column
families has a column 'message' which is about 5 KB. What I need to do is to
grab only the first 1000 characters of this message when I do a get on the
table using the row key. I was thinking of using filters to do this on the
HBase server side. Can someone help me with how to go about this?
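
In case a concrete shape helps, here is only a rough sketch of such a server-side filter
(the class name, constant and qualifier handling are made up). It truncates the returned
value in transformCell; a real version still needs toByteArray/parseFrom serialization
before the region servers can instantiate it, and the cut here is by bytes rather than
characters.

import java.util.Arrays;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.filter.FilterBase;
import org.apache.hadoop.hbase.util.Bytes;

public class TruncateMessageFilter extends FilterBase {
  private static final byte[] MESSAGE = Bytes.toBytes("message");
  private static final int LIMIT = 1000;

  @Override
  public Cell transformCell(Cell cell) {
    // Only touch the 'message' column; pass every other cell through unchanged.
    if (!CellUtil.matchingQualifier(cell, MESSAGE)) {
      return cell;
    }
    byte[] value = CellUtil.cloneValue(cell);
    if (value.length <= LIMIT) {
      return cell;
    }
    // Rebuild the cell with the value cut down to the first LIMIT bytes.
    return new KeyValue(CellUtil.cloneRow(cell), CellUtil.cloneFamily(cell),
        CellUtil.cloneQualifier(cell), cell.getTimestamp(),
        Arrays.copyOf(value, LIMIT));
  }
}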


Thanks,
Nishan


Re: Custom Filter on hbase Column

2014-09-11 Thread Nishanth S
Thanks Dima and Ted. Yes, I need to return the first 1000
characters. There is no matching involved.

-Nishan

On Thu, Sep 11, 2014 at 12:24 PM, Ted Yu yuzhih...@gmail.com wrote:

 In Nishanth's case, the 5K message is stored in one KeyValue, right ?

 If only the first 1000 Characters of this message are to be returned,
 a new KeyValue
 needs to be composed and returned.

 Cheers

 On Thu, Sep 11, 2014 at 11:09 AM, Dima Spivak dspi...@cloudera.com
 wrote:

  Hi Nishanth,
 
  Take a look at http://hbase.apache.org/book/client.filter.html . If I
  understand your use case correctly, you might want to look at
  RegexStringComparator to match the first 1000 characters of your column
  qualifier.
 
  -Dima
 
  On Thu, Sep 11, 2014 at 12:37 PM, Nishanth S nishanth.2...@gmail.com
  wrote:
 
   Hi All,
  
   I have an hbase table with multiple cfs (say c1,c2,c3).Each of this
  column
   family has  a column 'message' which is about 5K.What I need to do is
 to
   grab only the first 1000 Characters of this message when I do a get on
  the
   table using row Key.I was thinking of using  filters to do this on
 hbase
   sever side.Can some one help me on how to go about this.
  
  
   Thanks,
   Nishan
  
 



Re: Custom Filter on hbase Column

2014-09-11 Thread Nishanth S
Hey All,

I am sorry if this is a naive question. Do we need to generate a proto file
using the protocol buffer compiler when implementing a filter? I did not see
that anywhere in the documentation. Can someone help, please?

On Thu, Sep 11, 2014 at 12:41 PM, Nishanth S nishanth.2...@gmail.com
wrote:

 Thanks Dima and Ted.Yes I need to  return the first the 1000
 characters.There is no matching involved.

 -Nishan

 On Thu, Sep 11, 2014 at 12:24 PM, Ted Yu yuzhih...@gmail.com wrote:

 In Nishanth's case, the 5K message is stored in one KeyValue, right ?

 If only the first 1000 Characters of this message are to be returned,
 a new KeyValue
 needs to be composed and returned.

 Cheers

 On Thu, Sep 11, 2014 at 11:09 AM, Dima Spivak dspi...@cloudera.com
 wrote:

  Hi Nishanth,
 
  Take a look at http://hbase.apache.org/book/client.filter.html . If I
  understand your use case correctly, you might want to look at
  RegexStringComparator to match the first 1000 characters of your column
  qualifier.
 
  -Dima
 
  On Thu, Sep 11, 2014 at 12:37 PM, Nishanth S nishanth.2...@gmail.com
  wrote:
 
   Hi All,
  
   I have an hbase table with multiple cfs (say c1,c2,c3).Each of this
  column
   family has  a column 'message' which is about 5K.What I need to do is
 to
   grab only the first 1000 Characters of this message when I do a get on
  the
   table using row Key.I was thinking of using  filters to do this on
 hbase
   sever side.Can some one help me on how to go about this.
  
  
   Thanks,
   Nishan
  
 





Re: Custom Filter on hbase Column

2014-09-11 Thread Nishanth S
Sure, Sean. This is much needed.

-Nishan

On Thu, Sep 11, 2014 at 3:57 PM, Sean Busbey bus...@cloudera.com wrote:

 I filed HBASE-11950 to get some details added to the book on this topic[1].

 Nishanth, could you follow that ticket and give feedback on whatever update
 ends up proposed?

 [1]: https://issues.apache.org/jira/browse/HBASE-11950

 On Thu, Sep 11, 2014 at 4:40 PM, Ted Yu yuzhih...@gmail.com wrote:

  See http://search-hadoop.com/m/DHED4xWh622
 
  On Thu, Sep 11, 2014 at 2:37 PM, Nishanth S nishanth.2...@gmail.com
  wrote:
 
   Hey All,
  
   I am sorry if this is a naive question.Do we need to generate a proto
  file
   using proto buffer compiler when implementing a filter.I did not see
 that
   any where in the documentation.Can some one  help please?
  
   On Thu, Sep 11, 2014 at 12:41 PM, Nishanth S nishanth.2...@gmail.com
   wrote:
  
Thanks Dima and Ted.Yes I need to  return the first the 1000
characters.There is no matching involved.
   
-Nishan
   
On Thu, Sep 11, 2014 at 12:24 PM, Ted Yu yuzhih...@gmail.com
 wrote:
   
In Nishanth's case, the 5K message is stored in one KeyValue, right
 ?
   
If only the first 1000 Characters of this message are to be
 returned,
a new KeyValue
needs to be composed and returned.
   
Cheers
   
On Thu, Sep 11, 2014 at 11:09 AM, Dima Spivak dspi...@cloudera.com
 
wrote:
   
 Hi Nishanth,

 Take a look at http://hbase.apache.org/book/client.filter.html .
  If I
 understand your use case correctly, you might want to look at
 RegexStringComparator to match the first 1000 characters of your
   column
 qualifier.

 -Dima

 On Thu, Sep 11, 2014 at 12:37 PM, Nishanth S 
  nishanth.2...@gmail.com
   
 wrote:

  Hi All,
 
  I have an hbase table with multiple cfs (say c1,c2,c3).Each of
  this
 column
  family has  a column 'message' which is about 5K.What I need to
 do
   is
to
  grab only the first 1000 Characters of this message when I do a
  get
   on
 the
  table using row Key.I was thinking of using  filters to do this
 on
hbase
  sever side.Can some one help me on how to go about this.
 
 
  Thanks,
  Nishan
 

   
   
   
  
 



 --
 Sean



Time Series Report generation in hbase

2014-08-22 Thread Nishanth S
Hi  everyone,

We have an HBase implementation where we have a single table which stores
different types of log messages. We have a requirement to notify (send an
email to a mailing list) when we receive a particular type of message. I will
be able to identify this type of message by looking at one of the
column values which we populate. I would need to do this every hour and
send the cumulative result. Could you please point me in the right
direction on what would be the best way to implement this?


Hbase Table

cf:family 1,cf:family 2

c1,c2,c3

If the value of c1 = 'x' I need to notify. Let me know if you need more
information on this.
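
A rough sketch of what the hourly check could look like, assuming the flag column c1
lives in a family called cf1 and the match is an exact value 'x'; the scheduling and
the email sending are left out, and the table name is a placeholder.

import java.util.concurrent.TimeUnit;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class HourlyReport {
  public static void main(String[] args) throws Exception {
    long end = System.currentTimeMillis();
    long start = end - TimeUnit.HOURS.toMillis(1);

    Scan scan = new Scan();
    scan.setTimeRange(start, end);  // only cells written in the last hour
    scan.setFilter(new SingleColumnValueFilter(
        Bytes.toBytes("cf1"), Bytes.toBytes("c1"),
        CompareOp.EQUAL, Bytes.toBytes("x")));

    HTable table = new HTable(HBaseConfiguration.create(), "logtable");
    ResultScanner scanner = table.getScanner(scan);
    int matches = 0;
    for (Result r : scanner) {
      matches++;  // collect whatever the notification email needs from r
    }
    scanner.close();
    table.close();
    System.out.println("matching messages in the last hour: " + matches);
  }
}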

Thanks
-Nishan


Re: Time Series Report generation in hbase

2014-08-22 Thread Nishanth S
Thanks Ted. I will definitely take a look at these.

-Nishanth


On Fri, Aug 22, 2014 at 10:11 AM, Ted Yu yuzhih...@gmail.com wrote:

 You can utilize the following method in Scan :

   public Scan setTimeRange(long minStamp, long maxStamp)

 since you're doing periodic scanning.

 Also take a look at HBASE-5416 which introduced essential column family.

 Cheers


 On Fri, Aug 22, 2014 at 8:41 AM, Nishanth S nishanth.2...@gmail.com
 wrote:

  Hi  everyone,
 
  We have an hbase implementation  where we have a single table  which
 stores
   different types of  log messages.We have a requirement to notify (send
  email to mailing list) when we receive a particular type of message.I
 will
  be able to able to identify this type of message by looking at one of the
  column values which we populate.I would need to do this every hour  and
   send the cumulative result.Could you  please point me in the right
  direction on  what would be the best way to implement this?
 
 
  Hbase Table
 
  cf:family 1,cf:family 2
 
  c1,c2,c3
 
  If value of c1='x' I need to notify.Let me know if you need more
  information on this.
 
  Thanks
  -Nishan