Root Region Error with hbase 0.98.4
Hello, I am running a YCSB instance to insert data into HBase. All was well when this was run against HBase 0.96.1. Now I am trying to run the same program against another cluster, which is configured with HBase 0.98.4, and I get the error below on the client side. Could someone help me with this? The znode for the -ROOT- region doesn't exist! Thanks, Nishan
Re: oldWALs: what it is and how can I clean it?
Do you have replication turned on in HBase, and if so, is your slave cluster consuming the replicated data? -Nishanth On Wed, Feb 25, 2015 at 10:19 AM, Madeleine Piffaretti mpiffare...@powerspace.com wrote: Hi all, We are running out of space in our small Hadoop cluster, so I was checking disk usage on HDFS and saw that most of the space was occupied by the /hbase/oldWALs folder. I have checked the HBase Definitive Guide and other books and web sites, and I have also searched for my issue on Google, but I didn't find a proper answer... So I would like to know what this folder is, what it is used for, and how I can free space from it without breaking anything. If it's related to a specific version, our cluster runs 5.3.0-1.cdh5.3.0.p0.30 from Cloudera (HBase 0.98.6). Thanks for your help!
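If it helps, the configured peers can also be checked programmatically. This is only a rough sketch against the 0.98-era client; the ReplicationAdmin method names are from memory and worth verifying for your release:

import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.replication.ReplicationAdmin;

public class ListReplicationPeers {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    ReplicationAdmin replicationAdmin = new ReplicationAdmin(conf);
    try {
      // Each entry maps a peer id to the slave cluster's ZooKeeper quorum key.
      // If peers are listed but the slave cluster is down or not consuming,
      // WALs accumulate under /hbase/oldWALs until replication catches up.
      Map<String, String> peers = replicationAdmin.listPeers();
      for (Map.Entry<String, String> peer : peers.entrySet()) {
        System.out.println("peer " + peer.getKey() + " -> " + peer.getValue());
      }
    } finally {
      replicationAdmin.close();
    }
  }
}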
Re: Removing a Stored Field from Solr Schema
Please ignore. On Fri, Jan 30, 2015 at 10:39 AM, Nishanth S nishanth.2...@gmail.com wrote: Hello, I have a field which is indexed and stored in the Solr schema (Solr Cloud 4.4). This field is relatively large, and I plan to only index the field and not store it. Is there a need to re-index the documents once this change is made? Thanks, Nishanth
Re: Tool to execute a benchmark for HBase.
You are hitting HBase harder now, which is important for benchmarking. If there is no data loss, it means your HBase cluster is good enough to handle the load. You are simply making more use of the cores on the machine from which you launch the YCSB processes. Write your own workload based on your record sizes and format to see what can be achieved in your particular use case. -Nishanth On Fri, Jan 30, 2015 at 5:34 AM, Guillermo Ortiz konstt2...@gmail.com wrote: I have come back to the benchmark. I executed this command: ycsb run hbase -P workflowA -p columnfamily=cf -p operationcount=10 threads=32 and got a performance of 2000 ops/sec. What I did later was to execute ten of those commands in parallel, and I got about 18000 ops/sec in total. I don't get 2000 ops/sec for each of those executions, but I get about 1800 ops/sec. I don't know if it's an HBase question, but I don't understand why I get more performance if I execute more commands in parallel when I already run 32 threads. I took a look at top and saw that with just one process the CPU was working at about 20-60%; when I launch more processes the CPU is at about 400-500%. 2015-01-29 18:23 GMT+01:00 Guillermo Ortiz konstt2...@gmail.com: There's an option when you execute ycsb to say how many client threads you want to use. I tried with 1/8/16/32. Those results are with 16; the improvement from 1 to 8 is pretty high, not as much from 16 to 32. I only use one ycsb instance, could that be important? -threads : the number of client threads. By default, the YCSB Client uses a single worker thread, but additional threads can be specified. This is often done to increase the amount of load offered against the database. 2015-01-29 17:27 GMT+01:00 Nishanth S nishanth.2...@gmail.com: How many instances of ycsb do you run and how many threads do you use per instance? I guess these ops are per instance, and you should get similar numbers if you run more instances. In short, try running more workload instances... -Nishanth On Thu, Jan 29, 2015 at 8:49 AM, Guillermo Ortiz konstt2...@gmail.com wrote: Yes, I'm using 40%. I can't access that data either. I don't know how YCSB executes the reads, whether they are random and could take advantage of the cache. Do you think that it's acceptable performance? 2015-01-29 16:26 GMT+01:00 Ted Yu yuzhih...@gmail.com: What's the value for hfile.block.cache.size ? By default it is 40%. You may want to increase its value if you're using the default. Andrew published some ycsb results : http://people.apache.org/~apurtell/results-ycsb-0.98.8/ycsb-0.98.0-vs-0.98.8.pdf However, I couldn't access the above now. Cheers On Thu, Jan 29, 2015 at 7:14 AM, Guillermo Ortiz konstt2...@gmail.com wrote: Are there any results for that benchmark to compare against? I'm executing the different workloads, and for example for 100% reads on a table with 10 million records I only get a performance of 2000 operations/sec. I hoped for much better performance, but I could be wrong. I'd like to know whether that's normal performance or whether I have something configured badly. I have pre-split the table, all the records are balanced, and I used snappy.
The cluster has a master and 4 regions servers with 256Gb,Cores 2 (32 w/ Hyperthreading), 0.98.6-cdh5.3.0, RegionServer is executed with these parameters: /usr/java/jdk1.7.0_67-cloudera/bin/java -Dproc_regionserver -XX:OnOutOfMemoryError=kill -9 %p -Xmx1000m -Djava.net.preferIPv4Stack=true -Xms640679936 -Xmx640679936 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:-CMSConcurrentMTEnabled -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled -XX:OnOutOfMemoryError=/usr/lib64/cmf/service/common/killparent.sh -Dhbase.log.dir=/var/log/hbase -Dhbase.log.file=hbase-cmf-hbase-REGIONSERVER-cnsalbsrvcl23.lvtc.gsnet.corp.log.out -Dhbase.home.dir=/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hbase -Dhbase.id.str= -Dhbase.root.logger=INFO,RFA -Djava.library.path=/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hadoop/lib/native -Dhbase.security.logger=INFO,RFAS org.apache.hadoop.hbase.regionserver.HRegionServer start The results for 100% reads are [OVERALL], RunTime(ms), 42734.0 [OVERALL], Throughput(ops/sec), 2340.0570973931763 [UPDATE], Operations, 1.0 [UPDATE], AverageLatency(us), 103170.0 [UPDATE], MinLatency(us), 103168.0 [UPDATE], MaxLatency(us), 103171.0 [UPDATE], 95thPercentileLatency(ms), 103.0 [UPDATE], 99thPercentileLatency(ms), 103.0 [READ], Operations, 10.0 [READ], AverageLatency(us), 412.5534 [READ], AverageLatency(us,corrected), 581.6249026771276 [READ], MinLatency(us), 218.0 [READ], MaxLatency(us), 268383.0 [READ], MaxLatency(us,corrected), 268383.0 [READ], 95thPercentileLatency(ms), 0.0 [READ], 95thPercentileLatency(ms,corrected), 0.0 [READ
Removing a Stored Field from Solr Schema
Hello, I have a field which is indexed and stored in the Solr schema (Solr Cloud 4.4). This field is relatively large, and I plan to only index the field and not store it. Is there a need to re-index the documents once this change is made? Thanks, Nishanth
Re: Tool to execute a benchmark for HBase.
How many instances of ycsb do you run and how many threads do you use per instance.I guess these ops are per instance and you should get similar numbers if you run more instances.In short try running more workload instances... -Nishanth On Thu, Jan 29, 2015 at 8:49 AM, Guillermo Ortiz konstt2...@gmail.com wrote: Yes, I'm using 40%. i can't access to those data either. I don't know how YSCB executes the reads and if they are random and could take advange of the cache. Do you think that it's an acceptable performance? 2015-01-29 16:26 GMT+01:00 Ted Yu yuzhih...@gmail.com: What's the value for hfile.block.cache.size ? By default it is 40%. You may want to increase its value if you're using default. Andrew published some ycsb results : http://people.apache.org/~apurtell/results-ycsb-0.98.8/ycsb -0.98.0-vs-0.98.8.pdf However, I couldn't access the above now. Cheers On Thu, Jan 29, 2015 at 7:14 AM, Guillermo Ortiz konstt2...@gmail.com wrote: Is there any result with that benchmark to compare?? I'm executing the different workloads and for example for 100% Reads in a table with 10Millions of records I only get an performance of 2000operations/sec. I hoped much better performance but I could be wrong. I'd like to know if it's a normal performance or I could have something bad configured. I have splitted the tabled and all the records are balanced and used snappy. The cluster has a master and 4 regions servers with 256Gb,Cores 2 (32 w/ Hyperthreading), 0.98.6-cdh5.3.0, RegionServer is executed with these parameters: /usr/java/jdk1.7.0_67-cloudera/bin/java -Dproc_regionserver -XX:OnOutOfMemoryError=kill -9 %p -Xmx1000m -Djava.net.preferIPv4Stack=true -Xms640679936 -Xmx640679936 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:-CMSConcurrentMTEnabled -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled -XX:OnOutOfMemoryError=/usr/lib64/cmf/service/common/killparent.sh -Dhbase.log.dir=/var/log/hbase -Dhbase.log.file=hbase-cmf-hbase-REGIONSERVER-cnsalbsrvcl23.lvtc.gsnet.corp.log.out -Dhbase.home.dir=/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hbase -Dhbase.id.str= -Dhbase.root.logger=INFO,RFA -Djava.library.path=/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hadoop/lib/native -Dhbase.security.logger=INFO,RFAS org.apache.hadoop.hbase.regionserver.HRegionServer start The results for 100% reads are [OVERALL], RunTime(ms), 42734.0 [OVERALL], Throughput(ops/sec), 2340.0570973931763 [UPDATE], Operations, 1.0 [UPDATE], AverageLatency(us), 103170.0 [UPDATE], MinLatency(us), 103168.0 [UPDATE], MaxLatency(us), 103171.0 [UPDATE], 95thPercentileLatency(ms), 103.0 [UPDATE], 99thPercentileLatency(ms), 103.0 [READ], Operations, 10.0 [READ], AverageLatency(us), 412.5534 [READ], AverageLatency(us,corrected), 581.6249026771276 [READ], MinLatency(us), 218.0 [READ], MaxLatency(us), 268383.0 [READ], MaxLatency(us,corrected), 268383.0 [READ], 95thPercentileLatency(ms), 0.0 [READ], 95thPercentileLatency(ms,corrected), 0.0 [READ], 99thPercentileLatency(ms), 0.0 [READ], 99thPercentileLatency(ms,corrected), 0.0 [READ], Return=0, 10 [CLEANUP], Operations, 1.0 [CLEANUP], AverageLatency(us), 103598.0 [CLEANUP], MinLatency(us), 103596.0 [CLEANUP], MaxLatency(us), 103599.0 [CLEANUP], 95thPercentileLatency(ms), 103.0 [CLEANUP], 99thPercentileLatency(ms), 103.0 hbase(main):030:0 describe 'username' DESCRIPTION ENABLED 'username', {NAME = 'cf', DATA_BLOCK_ENCODING = 'NONE', BLOOMFILTER = 'ROW', REPLICATION_SCOPE = '0', true VERSIONS = '1', COMPRESSION = 'SNAPPY', MIN_VERSIONS = '0', TTL = 'FOREVER', 
KEEP_DELETED_CELLS = ' false', BLOCKSIZE = '65536', IN_MEMORY = 'false', BLOCKCACHE = 'true'} 1 row(s) in 0.0170 seconds 2015-01-29 5:27 GMT+01:00 Ted Yu yuzhih...@gmail.com: Maybe ask on Cassandra mailing list for the benchmark tool they use ? Cheers On Wed, Jan 28, 2015 at 1:23 PM, Guillermo Ortiz konstt2...@gmail.com wrote: I was checking that web, do you know if there's another possibility since last updated for Cassandra was two years ago and I'd like to compare bothof them with kind of same tool/code. 2015-01-28 22:10 GMT+01:00 Ted Yu yuzhih...@gmail.com: Guillermo: If you use hbase 0.98.x, please consider Andrew's ycsb repo: https://github.com/apurtell/ycsb/tree/new_hbase_client Cheers On Wed, Jan 28, 2015 at 12:41 PM, Nishanth S nishanth.2...@gmail.com wrote: You can use ycsb for this purpose.See here https://github.com/brianfrankcooper/YCSB/wiki/Getting-Started -Nishanth On Wed, Jan 28, 2015 at 1:37 PM, Guillermo Ortiz konstt2...@gmail.com wrote: Hi, I'd like to do some benchmarks fo HBase but I don't know what tool could use. I started to make some
Re: Tool to execute a benchmark for HBase.
You can use YCSB for this purpose. See here: https://github.com/brianfrankcooper/YCSB/wiki/Getting-Started -Nishanth On Wed, Jan 28, 2015 at 1:37 PM, Guillermo Ortiz konstt2...@gmail.com wrote: Hi, I'd like to do some benchmarks of HBase, but I don't know what tool I could use. I started to write some code, but I guess there are easier options. I've taken a look at JMeter, but I think I'd rather attack the cluster directly from Java; JMeter looks great but I don't know if it fits well in this scenario. What tool could I use to take measurements such as the response time of read and write requests, etc.? I'd also like to be able to run the same benchmarks against Cassandra.
Deleting Files in oldwals
Hi, We were running an HBase cluster with replication enabled. However, we have moved away from replication and turned this off. I also went ahead and removed the peers from the hbase shell. However, the oldWALs directory is not cleaned up. I am using HBase version 0.96.1. Is it safe to delete these logs? Thanks, Nishanth
Hbase Error When running Map reduce
Hi All, I am running a MapReduce job which scans the HBase table for a particular time period and then creates some files from it. The job runs fine for 10 minutes or so, and around 10% of the maps complete successfully. Here is the error I am getting. Can someone help? 15/01/22 19:34:33 INFO mapreduce.TableRecordReaderImpl: recovered from org.apache.hadoop.hbase.client.ScannerTimeoutException: 559843ms passed since the last invocation, timeout is currently set to 6 at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:352) at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(TableRecordReaderImpl.java:194) at org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue(TableRecordReader.java:138) at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:533) at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80) at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) Caused by: org.apache.hadoop.hbase.UnknownScannerException: org.apache.hadoop.hbase.UnknownScannerException: Name: 3432603283499371482, already closed? at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:2973) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26929) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2175) at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1879) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95) at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:277) at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:198) at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:57) at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:120) at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:96) at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:336) ... 13 more
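The ScannerTimeoutException above usually means the mapper spends longer between two next() calls than the scanner lease allows. A hedged sketch of the usual mitigations, smaller scan caching and a larger client scanner timeout, might look like the following; MyMapper is a placeholder for the job's existing mapper, and whether the timeout property also has to be raised in the region servers' configuration depends on the version, so verify both against your cluster:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.mapreduce.Job;

Configuration conf = HBaseConfiguration.create();
// Give the scanner lease more headroom than the slowest map() call.
conf.setLong("hbase.client.scanner.timeout.period", 600000L); // 10 minutes

Scan scan = new Scan();
scan.setCaching(100);        // fewer rows per next(), so we return to the region server sooner
scan.setCacheBlocks(false);  // usual recommendation for full-table MR scans

Job job = Job.getInstance(conf, "hbase-time-range-export");
// MyMapper is a placeholder for the job's existing TableMapper implementation.
TableMapReduceUtil.initTableMapperJob("mytable", scan, MyMapper.class,
    ImmutableBytesWritable.class, Result.class, job);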
Using Pig To Scan Hbase
Hey folks, I am trying to write a map reduce job in Pig against my HBase table. I have salting in my row key, appended with reverse timestamps, so I guess the best way is to do a scan for all the dates whose records I need to pull out. Does anyone know if Pig supports HBase scans out of the box, or do we need to write a UDF for that? Thanks, Nishanth
Re: Using Pig To Scan Hbase
Thank you Pradeep.That was helpful. -Nishan On Fri, Dec 5, 2014 at 11:08 AM, Pradeep Gollakota pradeep...@gmail.com wrote: There is a built in storage handler for HBase. Take a look at the docs at https://pig.apache.org/docs/r0.14.0/api/org/apache/pig/backend/hadoop/hbase/HBaseStorage.html It doesn't support dealing with salted rowkeys (or reverse timestamps) out of the box, so you may have to munge with the data a little bit after it's loaded to get what you want. Hope this helps. Pradeep On Fri Dec 05 2014 at 9:55:04 AM Nishanth S nishanth.2...@gmail.com wrote: Hey folks, I am trying to write a map reduce in pig against my hbase table.I have a salting in my rowkey appended with reverse timestamps ,so I guess the best way is to do a scan for all the dates that I require to pull out records.Does any one know if pig supports hbase scan out of the box or do we need to write a udf for that. Thannks, Nishanth
Re: Re: Hbase Dead region Server
Thanks Every one.It turned out that I there were a few empty wal directories corresponding to the dead region servers.Moved them out of /hbase and failed over the master.Things started working fine after that. -Nishanth On Mon, Nov 3, 2014 at 10:25 PM, yeweichen2...@gmail.com yeweichen2...@gmail.com wrote: Nishanth, What version of HBase you are using? You can try clear the ZNode about regionserver list in zookeeper /hbase/ and then restart HMaster. -- yeweichen2...@gmail.com *From:* Nishanth S nishanth.2...@gmail.com *Date:* 2014-11-04 02:32 *To:* user user@hbase.apache.org *Subject:* Re: Hbase Dead region Server Thanks Pere. I just did that and still has the dead region server showing up in Master UI as well as in status command.I have replication turned on in hbase and seeing few issues.Below is the stack trace I am seeing. 2014-11-03 18:31:00,215 WARN org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Can't replicate because of a local or network error: java.io.IOException: No replication sinks are available at org.apache.hadoop.hbase.replication.regionserver.ReplicationSinkManager.getReplicationSink(ReplicationSinkManager.java:117) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:652) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:350) 2014-11-03 18:31:00,459 WARN org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Can't replicate because of a local or network error: java.io.IOException: No replication sinks are available at org.apache.hadoop.hbase.replication.regionserver.ReplicationSinkManager.getReplicationSink(ReplicationSinkManager.java:117) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:652) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:350) On Mon, Nov 3, 2014 at 11:18 AM, Pere Kyle p...@whisper.sh wrote: Nishanth, In my experience the only way I have been able to clear the dead region servers is to restart the master daemon. -Pere On Mon, Nov 3, 2014 at 9:49 AM, Nishanth S nishanth.2...@gmail.com wrote: Hey folks, How do I remove a dead region server?.I manually failed over the hbase master but this is still appearing in master UI and also on the status command that I run. Thanks, Nishan
Hbase Dead region Server
Hey folks, How do I remove a dead region server? I manually failed over the HBase master, but it still appears in the master UI and in the output of the status command that I run. Thanks, Nishan
Re: Hbase Dead region Server
Thanks Pere. I just did that and still has the dead region server showing up in Master UI as well as in status command.I have replication turned on in hbase and seeing few issues.Below is the stack trace I am seeing. 2014-11-03 18:31:00,215 WARN org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Can't replicate because of a local or network error: java.io.IOException: No replication sinks are available at org.apache.hadoop.hbase.replication.regionserver.ReplicationSinkManager.getReplicationSink(ReplicationSinkManager.java:117) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:652) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:350) 2014-11-03 18:31:00,459 WARN org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Can't replicate because of a local or network error: java.io.IOException: No replication sinks are available at org.apache.hadoop.hbase.replication.regionserver.ReplicationSinkManager.getReplicationSink(ReplicationSinkManager.java:117) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:652) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:350) On Mon, Nov 3, 2014 at 11:18 AM, Pere Kyle p...@whisper.sh wrote: Nishanth, In my experience the only way I have been able to clear the dead region servers is to restart the master daemon. -Pere On Mon, Nov 3, 2014 at 9:49 AM, Nishanth S nishanth.2...@gmail.com wrote: Hey folks, How do I remove a dead region server?.I manually failed over the hbase master but this is still appearing in master UI and also on the status command that I run. Thanks, Nishan
Re: Connecting via API to a remote HBASE installation
Can you telnet to ports 2181 and 60020 on the remote cluster, if you are running the default ports? I had a similar issue in the past where there was a firewall in between. Thanks, Nishanth On Sun, Oct 26, 2014 at 9:39 AM, Ted Yu yuzhih...@gmail.com wrote: Is the hbase-site.xml corresponding to your cluster on the classpath of your Windows program? Cheers On Sun, Oct 26, 2014 at 8:08 AM, Sznajder ForMailingList bs4mailingl...@gmail.com wrote: Hi, I am running some code from Windows (Java) and I would like to connect to HBase 0.98.5 installed on a remote cluster (pseudo-distributed mode). The UI gives me the following Software Attributes info: HBase Version: 0.98.5-hadoop2, rUnknown; HBase Compiled: Mon Aug 4 23:58:06 PDT 2014, apurtell; Hadoop Version: 2.2.0, r1529768; Hadoop Compiled: 2013-10-07T06:28Z, hortonmu; ZooKeeper Quorum: localhost:2181 (see zk dump at http://lnx-apollo.haifa.ibm.com:60010/zk.jsp); HBase Root Directory: hdfs://localhost:8020/hbase; HMaster Start Time: Wed Sep 03 16:53:19 GMT+02:00 2014; HMaster Active Time: Wed Sep 03 16:53:19 GMT+02:00 2014; HBase Cluster ID: 950d5ca3-a174-482c-9b6e-3c858d06f44f; Load average: 2.00; Coprocessors: []. However, when connecting from my code with the following: Configuration conf = HBaseConfiguration.create(); conf.set(HBASE_CONFIGURATION_ZOOKEEPER_QUORUM, name); conf.set(HBASE_CONFIGURATION_ZOOKEEPER_CLIENTPORT, 8020); admin = new HBaseAdmin(conf); I get the following: java.net.ConnectException: Connection refused: no further information at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:585) at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:286) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1035) What am I missing, please? Benjamin
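One thing that stands out in the quoted snippet is that the ZooKeeper client port is set to 8020, which is the HDFS NameNode port rather than ZooKeeper's. A minimal corrected sketch, using the hostname from the UI dump above purely as a stand-in, would be:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

Configuration conf = HBaseConfiguration.create();
// The quorum is the ZooKeeper host, and the client port is ZooKeeper's 2181;
// 8020 is the HDFS NameNode port, and ZooKeeper is not listening there.
conf.set("hbase.zookeeper.quorum", "lnx-apollo.haifa.ibm.com");
conf.set("hbase.zookeeper.property.clientPort", "2181");
HBaseAdmin admin = new HBaseAdmin(conf);

Note too that the UI above reports the quorum as localhost:2181, which may itself be part of the problem for remote clients; as Ted suggests, putting the cluster's hbase-site.xml on the client classpath avoids setting these properties by hand.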
Re: Using parquet
Thanks all. I will get back if I take that direction. -Nishanth On Tue, Oct 21, 2014 at 8:15 AM, Ted Yu yuzhih...@gmail.com wrote: The link is about Cassandra, not HBase. Cheers On Tue, Oct 21, 2014 at 2:53 AM, Qiang Tian tian...@gmail.com wrote: Do you want some SQL-on-Hadoop engine that can access HBase files directly? I did a quick search and found http://www.slideshare.net/Stratio/integrating-sparkandcassandra (p. 35), but I am not sure I understand correctly. On Tue, Oct 21, 2014 at 12:15 PM, Nick Dimiduk ndimi...@gmail.com wrote: Not currently. HBase uses its own file format that makes different assumptions than Parquet. Instead, HBase supports its own format optimizations, such as block encodings and compression. I would be interested in an exercise to see what things are necessary for HBase to support a columnar format such as Parquet or ORC; no such investigation has been undertaken that I am aware of. Thanks, Nick On Monday, October 20, 2014, Nishanth S nishanth.2...@gmail.com wrote: Hey folks, I have been reading a bit about Parquet and how Hive and Impala work well on data stored in Parquet format. Is it even possible to do the same with HBase, to reduce storage etc.? Thanks, Nishanth
Using parquet
Hey folks, I have been reading a bit about Parquet and how Hive and Impala work well on data stored in Parquet format. Is it even possible to do the same with HBase, to reduce storage etc.? Thanks, Nishanth
Re: custom filter on hbase 0.96
Hi Ted , Since I am also working on similar thing is there a way we can first test the filter on client side?.You know what I mean without disrupting others who are using the same cluster for other work? Thanks, Nishanth On Wed, Oct 15, 2014 at 3:17 PM, Ted Yu yuzhih...@gmail.com wrote: bq. Or create a new file, compile it into ... You should go with the above approach. On Wed, Oct 15, 2014 at 2:08 PM, Matt K matvey1...@gmail.com wrote: Hi all, I'm trying to get a custom filter to work on HBase 0.96. After some searching, I found that starting from 0.96, the implementer is required to implement toByteArray and parseFrom methods, using Protocol Buffers. But I'm having trouble with the how. The proto file for the existing filters is located here: https://github.com/apache/hbase/blob/master/hbase-protocol/src/main/protobuf/Filter.proto Am I supposed to modify that file? Or create a new file, compile it into Java, and package it up with the filter? In the meantime, I've taken a shortcut that's not working. Here's my code: http://pastebin.com/iHFKu9Xz I'm using PrefixFilter, which comes with HBase, since I'm also filtering by prefix. However, that errors out with the following: http://pastebin.com/zBg47p6Z Thanks in advance for helping! -Matt
Re: custom filter on hbase 0.96
Thanks Ted .I will take a look. -Nishanth On Wed, Oct 15, 2014 at 3:43 PM, Ted Yu yuzhih...@gmail.com wrote: Nishanth: Good question. As a general coding guide, writing unit test is always a good start. Using Matt's case as an example, take a look at TestPrefixFilter. There're various unit tests for Filters in hbase code. Cheers On Wed, Oct 15, 2014 at 2:30 PM, Nishanth S nishanth.2...@gmail.com wrote: Hi Ted , Since I am also working on similar thing is there a way we can first test the filter on client side?.You know what I mean without disrupting others who are using the same cluster for other work? Thanks, Nishanth On Wed, Oct 15, 2014 at 3:17 PM, Ted Yu yuzhih...@gmail.com wrote: bq. Or create a new file, compile it into ... You should go with the above approach. On Wed, Oct 15, 2014 at 2:08 PM, Matt K matvey1...@gmail.com wrote: Hi all, I'm trying to get a custom filter to work on HBase 0.96. After some searching, I found that starting from 0.96, the implementer is required to implement toByteArray and parseFrom methods, using Protocol Buffers. But I'm having trouble with the how. The proto file for the existing filters is located here: https://github.com/apache/hbase/blob/master/hbase-protocol/src/main/protobuf/Filter.proto Am I supposed to modify that file? Or create a new file, compile it into Java, and package it up with the filter? In the meantime, I've taken a shortcut that's not working. Here's my code: http://pastebin.com/iHFKu9Xz I'm using PrefixFilter, which comes with HBase, since I'm also filtering by prefix. However, that errors out with the following: http://pastebin.com/zBg47p6Z Thanks in advance for helping! -Matt
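For reference, a toy skeleton of the 0.96+ serialization hooks is below. It is only an untested sketch: the class name is made up, and the real filters in the HBase source serialize through generated protobuf messages rather than raw bytes, but nothing forces that as long as toByteArray() and the static parseFrom() agree with each other.

import org.apache.hadoop.hbase.exceptions.DeserializationException;
import org.apache.hadoop.hbase.filter.FilterBase;
import org.apache.hadoop.hbase.util.Bytes;

// Toy row-prefix filter: the only state to ship to the region servers is the
// prefix itself, so serialization can pass the raw bytes straight through.
public class SimplePrefixFilter extends FilterBase {
  private final byte[] prefix;

  public SimplePrefixFilter(byte[] prefix) {
    this.prefix = prefix;
  }

  @Override
  public boolean filterRowKey(byte[] buffer, int offset, int length) {
    // Returning true means "filter this row out".
    if (length < prefix.length) {
      return true;
    }
    return Bytes.compareTo(buffer, offset, prefix.length, prefix, 0, prefix.length) != 0;
  }

  // Required since 0.96: how the client serializes the filter for the RPC...
  @Override
  public byte[] toByteArray() {
    return prefix.clone();
  }

  // ...and how the region server rebuilds it from those bytes.
  public static SimplePrefixFilter parseFrom(byte[] bytes) throws DeserializationException {
    return new SimplePrefixFilter(bytes);
  }
}

For testing without touching a shared cluster, the unit tests Ted points at (e.g. TestPrefixFilter) drive filters directly through the Filter API, and HBaseTestingUtility can stand up a local mini-cluster.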
Loading hbase from parquet files
Hey folks, I am evaluating loading an HBase table from Parquet files, based on some rules that would be applied to the Parquet file records. Could someone help me with what would be the best way to do this? Thanks, Nishan
Re: Loading hbase from parquet files
I was thinking of using org.apache.hadoop.hbase.mapreduce.Driver import. I could see that we can pass filters to this utility, but it looks less flexible, since you need to deploy a new filter every time the rules for processing records change. Is there some way we could define a rules engine? Thanks, -Nishan On Wed, Oct 8, 2014 at 9:50 AM, Nishanth S nishanth.2...@gmail.com wrote: Hey folks, I am evaluating loading an HBase table from Parquet files, based on some rules that would be applied to the Parquet file records. Could someone help me with what would be the best way to do this? Thanks, Nishan
Re: Loading hbase from parquet files
Thanks Andrey. In the current system the HBase column families have a TTL of 30 days, and data gets deleted after that (with snappy compression). Below is what I am trying to achieve. 1. Export the data from the HBase table before it gets deleted. 2. Store it in some format which supports maximum compression (storage cost is my primary concern here), so I am looking at Parquet. 3. Load a subset of this data back into HBase based on certain rules (say I want to load all rows which have a particular string in one of the fields). I was thinking of bulk loading this data back into HBase, but I am not sure how I can load a subset of the data using org.apache.hadoop.hbase.mapreduce.Driver import. On Wed, Oct 8, 2014 at 10:20 AM, Andrey Stepachev oct...@gmail.com wrote: Hi Nishanth. It is not clear what exactly you are building. Can you share a more detailed description of what you are building and how the Parquet files are supposed to be ingested? Some questions arise: 1. Is that an online import or a bulk load? 2. Why do rules need to be deployed to the cluster? Do you intend to do the reading inside the HBase region server? As for deploying filters, you can try to use coprocessors instead. They can be configurable and loadable (but not unloadable, so you need to think about some class loading magic like ClassWorlds). For bulk imports you can create HFiles directly and add them incrementally: http://hbase.apache.org/book/arch.bulk.load.html On Wed, Oct 8, 2014 at 8:13 PM, Nishanth S nishanth.2...@gmail.com wrote: I was thinking of using org.apache.hadoop.hbase.mapreduce.Driver import. I could see that we can pass filters to this utility, but it looks less flexible, since you need to deploy a new filter every time the rules for processing records change. Is there some way we could define a rules engine? Thanks, -Nishan On Wed, Oct 8, 2014 at 9:50 AM, Nishanth S nishanth.2...@gmail.com wrote: Hey folks, I am evaluating loading an HBase table from Parquet files, based on some rules that would be applied to the Parquet file records. Could someone help me with what would be the best way to do this? Thanks, Nishan -- Andrey.
Re: Loading hbase from parquet files
Thank you guys for the information. -cheers Nishan On Wed, Oct 8, 2014 at 12:49 PM, Andrey Stepachev oct...@gmail.com wrote: For that use case I'd prefer to write new filtered HFiles with map reduce and then import those data into hbase using bulk import. Keep in mind, that incremental load tool moves files, not copies them. So once written you will not do any additional writes (except for those regions which was split while you filtering data). If importing data is small that would not be a problem. On Wed, Oct 8, 2014 at 8:45 PM, Nishanth S nishanth.2...@gmail.com wrote: Thanks Andrey.In the current system the hbase cfs have a ttl of 30 days and data gets deleted after this(has snappy compression).Below is something what I am trying to acheive. 1.Export the data from hbase table before it gets deleted. 2.Store it in some format which supports maximum compression(storage cost is my primary concern here),so looking at parquet. 3.Load a subset of this data back into hbase based on certain rules(say i want to load all rows which has a particular string in one of the fields). I was thinking of bulkloading this data back into hbase but I am not sure how I can load a subset of the data using org.apache.hadoop.hbase.mapreduce.Driver import. On Wed, Oct 8, 2014 at 10:20 AM, Andrey Stepachev oct...@gmail.com wrote: Hi Nishanth. Not clear what exactly you are building. Can you share more detailed description of what you are building, how parquet files are supposed to be ingested. Some questions arise: 1. is that online import or bulk load 2. why rules need to be deployed to cluster. Do you suppose to do reading inside hbase region server? As for deploying filters your cat try to use coprocessors instead. They can be configurable and loadable (but not unloadable, so you need to think about some class loading magic like ClassWorlds) For bulk imports you can create HFiles directly and add them incrementally: http://hbase.apache.org/book/arch.bulk.load.html On Wed, Oct 8, 2014 at 8:13 PM, Nishanth S nishanth.2...@gmail.com wrote: I was thinking of using org.apache.hadoop.hbase.mapreduce.Driver import. I could see that we can pass in filters to this utility but looks less flexible since you need to deploy a new filter every time the rules for processing records change.Is there some way that we could define a rules engine? Thanks, -Nishan On Wed, Oct 8, 2014 at 9:50 AM, Nishanth S nishanth.2...@gmail.com wrote: Hey folks, I am evaluating on loading an hbase table from parquet files based on some rules that would be applied on parquet file records.Could some one help me on what would be the best way to do this?. Thanks, Nishan -- Andrey. -- Andrey.
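A rough sketch of the write-HFiles-then-bulk-load flow Andrey describes might look like the following. The mapper class, table name and paths are placeholders, and the input format would be whatever reads the archived (e.g. Parquet) records and emits one Put per row that matches the reload rules:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

Configuration conf = HBaseConfiguration.create();
Job job = Job.getInstance(conf, "filtered-reload");
// FilteredReloadMapper is a placeholder: it reads the archived records,
// applies the reload rules, and emits one Put per matching row.
job.setJarByClass(FilteredReloadMapper.class);
job.setMapperClass(FilteredReloadMapper.class);
job.setMapOutputKeyClass(ImmutableBytesWritable.class);
job.setMapOutputValueClass(Put.class);

Path hfileDir = new Path("/tmp/filtered-hfiles");
FileOutputFormat.setOutputPath(job, hfileDir);

HTable table = new HTable(conf, "mytable");
// Sets the output format, reducer and partitioner so HFiles line up with regions.
HFileOutputFormat.configureIncrementalLoad(job, table);

if (job.waitForCompletion(true)) {
  // Moves (not copies) the generated HFiles into the table's regions.
  new LoadIncrementalHFiles(conf).doBulkLoad(hfileDir, table);
}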
Re: Wide Rows vs Multiple column families
Hey Ted, I was in the process of comparing insert throughputs, as we discussed, using YCSB. What I found is that when I split the data into multiple column families, the insert throughput comes down to half of what I get when persisting into a single column family. Do you think this is possible, or am I doing something wrong? -Nishan On Thu, Sep 25, 2014 at 11:56 AM, Ted Yu yuzhih...@gmail.com wrote: There should not be an impact on HBase write performance with two column families. Cheers On Thu, Sep 25, 2014 at 10:53 AM, Nishanth S nishanth.2...@gmail.com wrote: Thank you Ted. No, I do not plan to use bulk loading since the data is incremental in nature. On Thu, Sep 25, 2014 at 11:36 AM, Ted Yu yuzhih...@gmail.com wrote: For #1, do you plan to use bulk load? For #3, take a look at HBASE-5416, which introduced the essential column family. In your query, you can designate the smaller column family as the essential column family when only the smaller columns are queried. Cheers On Thu, Sep 25, 2014 at 9:57 AM, Nishanth S nishanth.2...@gmail.com wrote: Hi everyone, This question may have been asked many times, but I would really appreciate it if someone could help me with how to go about this. Currently my HBase table consists of about 10 columns per row, which in total have an average size of 5K. The bulk of the size is held by one particular column (more than 4K). Would it help to move this column out to a different column family when we do reads? There are cases where we just need to access the smaller columns, and there is another set of use cases where you need all the data (the smaller columns and this huge data chunk). In general, I am trying to answer the questions below in this scenario. 1. Would separating into multiple column families affect HBase write performance? 2. How would it affect my read performance, considering both read cases? 3. Is there any advantage that I am gaining by separating into multiple CFs? I would really appreciate it if anyone could point me in the right direction. -Thanks Nishan
Re: Wide Rows vs Multiple column families
Hbase Release: 0.96.1 Number of column families at which issue is observed is 2.Earlier I had one single column family where all the data was persisted.In the new case I was storing all meta data into column family 1(less than 1k) and a blob on second column family(around 7Kb). We have 9 node cluster with 7 hbase region servers and using hadoop 2.3.0. I am also using asynch hbase client 1.5 for ingesting data into hbase.I had to spawn multiple put requests in this case because there is no API for sending insert requests to multiple column family. Thanks, Nishan On Mon, Sep 29, 2014 at 10:49 AM, Ted Yu yuzhih...@gmail.com wrote: Can you give a bit more detail, such as: the release of HBase you're using number of column families where slowdown is observed size of cluster release of hadoop you're using Thanks On Mon, Sep 29, 2014 at 9:43 AM, Nishanth S nishanth.2...@gmail.com wrote: Hey Ted, I was in the process of comparing insert throughputs which we discussed using ycsb.What I could find is that when I split the data into multiple column families the insert through is coming down to half when compared to persisting into a single column family.Do you think this is possible or am I doing some thing wrong. -Nishan On Thu, Sep 25, 2014 at 11:56 AM, Ted Yu yuzhih...@gmail.com wrote: There should not be impact to hbase write performance for two column families. Cheers On Thu, Sep 25, 2014 at 10:53 AM, Nishanth S nishanth.2...@gmail.com wrote: Thank you Ted.No I do not plan to use bulk loading since the data is incremental in nature. On Thu, Sep 25, 2014 at 11:36 AM, Ted Yu yuzhih...@gmail.com wrote: For #1, do you plan to use bulk load ? For #3, take a look at HBASE-5416 which introduced essential column family. In your query, you can designate the smaller column family as essential column family where smaller columns are queried. Cheers On Thu, Sep 25, 2014 at 9:57 AM, Nishanth S nishanth.2...@gmail.com wrote: Hi everyone, This question may have been asked many times but I would really appreciate if some one can help me on how to go about this. Currently my hbase table consists of about 10 columns per row which in total has an average size of 5K.The chunk of the size is held by one particular column(more than 4K).Would it help to move this column out to a different column family when we do reads.There are cases where we just need to access the smaller columns and there is another set of use cases where you need both the data(the one in smaller column and this huge data chunk).In general I am trying to answer the below questions in this scenario. 1.Would seperating to multiple column families affect hbase write performance? 2. How would if affect my read performance considering both the read cases? 3.Is there any advantage that I am gaining by seperating into multiple cfs? I would really appreciate if any one could point me in the right direction. -Thanks Nishan
Wide Rows vs Multiple column families
Hi everyone, This question may have been asked many times, but I would really appreciate it if someone could help me with how to go about this. Currently my HBase table consists of about 10 columns per row, which in total have an average size of 5K. The bulk of the size is held by one particular column (more than 4K). Would it help to move this column out to a different column family when we do reads? There are cases where we just need to access the smaller columns, and there is another set of use cases where you need all the data (the smaller columns and this huge data chunk). In general, I am trying to answer the questions below in this scenario. 1. Would separating into multiple column families affect HBase write performance? 2. How would it affect my read performance, considering both read cases? 3. Is there any advantage that I am gaining by separating into multiple CFs? I would really appreciate it if anyone could point me in the right direction. -Thanks Nishan
Re: Wide Rows vs Multiple column families
Thank you Ted.No I do not plan to use bulk loading since the data is incremental in nature. On Thu, Sep 25, 2014 at 11:36 AM, Ted Yu yuzhih...@gmail.com wrote: For #1, do you plan to use bulk load ? For #3, take a look at HBASE-5416 which introduced essential column family. In your query, you can designate the smaller column family as essential column family where smaller columns are queried. Cheers On Thu, Sep 25, 2014 at 9:57 AM, Nishanth S nishanth.2...@gmail.com wrote: Hi everyone, This question may have been asked many times but I would really appreciate if some one can help me on how to go about this. Currently my hbase table consists of about 10 columns per row which in total has an average size of 5K.The chunk of the size is held by one particular column(more than 4K).Would it help to move this column out to a different column family when we do reads.There are cases where we just need to access the smaller columns and there is another set of use cases where you need both the data(the one in smaller column and this huge data chunk).In general I am trying to answer the below questions in this scenario. 1.Would seperating to multiple column families affect hbase write performance? 2. How would if affect my read performance considering both the read cases? 3.Is there any advantage that I am gaining by seperating into multiple cfs? I would really appreciate if any one could point me in the right direction. -Thanks Nishan
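As a rough illustration of the read patterns being discussed (family and column names are placeholders, and the on-demand column family loading call should be verified against your release):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "logs");

// Point read that touches only the small metadata family, so the big blob
// family is never fetched.
Get get = new Get(Bytes.toBytes("someRowKey"));
get.addFamily(Bytes.toBytes("meta"));
Result metaOnly = table.get(get);

// Scan that filters on a metadata column; with essential-column-family
// behaviour (HBASE-5416) the blob family is only loaded for matching rows.
Scan scan = new Scan();
scan.setFilter(new SingleColumnValueFilter(Bytes.toBytes("meta"),
    Bytes.toBytes("c1"), CompareFilter.CompareOp.EQUAL, Bytes.toBytes("x")));
scan.setLoadColumnFamiliesOnDemand(true);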
Re: Wide Rows vs Multiple column families
Thank you Ted. -Nishan On Thu, Sep 25, 2014 at 11:56 AM, Ted Yu yuzhih...@gmail.com wrote: There should not be impact to hbase write performance for two column families. Cheers On Thu, Sep 25, 2014 at 10:53 AM, Nishanth S nishanth.2...@gmail.com wrote: Thank you Ted.No I do not plan to use bulk loading since the data is incremental in nature. On Thu, Sep 25, 2014 at 11:36 AM, Ted Yu yuzhih...@gmail.com wrote: For #1, do you plan to use bulk load ? For #3, take a look at HBASE-5416 which introduced essential column family. In your query, you can designate the smaller column family as essential column family where smaller columns are queried. Cheers On Thu, Sep 25, 2014 at 9:57 AM, Nishanth S nishanth.2...@gmail.com wrote: Hi everyone, This question may have been asked many times but I would really appreciate if some one can help me on how to go about this. Currently my hbase table consists of about 10 columns per row which in total has an average size of 5K.The chunk of the size is held by one particular column(more than 4K).Would it help to move this column out to a different column family when we do reads.There are cases where we just need to access the smaller columns and there is another set of use cases where you need both the data(the one in smaller column and this huge data chunk).In general I am trying to answer the below questions in this scenario. 1.Would seperating to multiple column families affect hbase write performance? 2. How would if affect my read performance considering both the read cases? 3.Is there any advantage that I am gaining by seperating into multiple cfs? I would really appreciate if any one could point me in the right direction. -Thanks Nishan
Help in purging hbase data
Hi All, We were using the TTL feature to delete HBase data, since we were able to define the retention days at the column family level. But now we have a requirement to store data with different retention periods in this column family, so we would need to do a select and delete. What would be the best way to do this? Thanks, Nishan
Re: Help in purging hbase data
Thank you Jean. Is there some batch API which HBase exposes for deletes, to which I can feed the row keys? The reason I am asking is that I have a Solr implementation running in the background which has the HBase row key as one of its fields, so it would be pretty easy to grab the set of row keys that need to be deleted. -Nishan On Wed, Sep 24, 2014 at 12:14 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi Nishan, What you are looking for is HBASE-11764 https://issues.apache.org/jira/browse/HBASE-11764 and it is not available yet. JM 2014-09-24 14:12 GMT-04:00 Nishanth S nishanth.2...@gmail.com: Hi All, We were using the TTL feature to delete HBase data, since we were able to define the retention days at the column family level. But now we have a requirement to store data with different retention periods in this column family, so we would need to do a select and delete. What would be the best way to do this? Thanks, Nishan
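One batch API the 0.9x client does expose is HTable.delete(List<Delete>). A small sketch, where the table name and the Solr lookup are placeholders:

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "logs");
try {
  // fetchRowKeysFromSolr() is a hypothetical helper that runs the Solr query
  // and returns the HBase row keys that are due for deletion.
  List<String> rowKeysFromSolr = fetchRowKeysFromSolr();
  List<Delete> batch = new ArrayList<Delete>();
  for (String rowKey : rowKeysFromSolr) {
    batch.add(new Delete(Bytes.toBytes(rowKey)));
  }
  // One batched round of deletes; chunk the list (say, a few thousand at a
  // time) for very large purges.
  table.delete(batch);
} finally {
  table.close();
}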
Fwd: Restructuring Hbase Table
Hi folks, We have an HBase table with 4 column families which stores log data. The columns and the content stored in each of these column families are the same. The reason for having multiple families is that we needed 4 retention buckets for messages and were using the TTL feature of HBase to achieve this. Each of our HBase rows has a predefined set of meta fields and a large blob message. I was considering restructuring the table with 2 column families: one column family for the metadata and the other for the blob message, which is the meatier chunk. The reason for this approach is that most of the analytics queries would be directed at the metadata in cf1, and only a few at cf2, which holds the blob message. There will be a few use cases where you need to query the data in both cf1 and cf2, but that is not the dominant use case. We would then devise some method to purge the data manually (using a retention bucket + timestamp in the row key). How does this look so far? Is there a better way to implement this? Thanks, Nishanth
Restructuring Hbase Table
Hi folks, We have an HBase table with 4 column families which stores log data. The columns and the content stored in each of these column families are the same. The reason for having multiple families is that we needed 4 retention buckets for messages and were using the TTL feature of HBase to achieve this. Each of our HBase rows has a predefined set of meta fields and a large blob message. I was considering restructuring the table with 2 column families: one column family for the metadata and the other for the blob message, which is the meatier chunk. The reason for this approach is that most of the analytics queries would be directed at the metadata in cf1, and only a few at cf2, which holds the blob message. There will be a few use cases where you need to query the data in both cf1 and cf2, but that is not the dominant use case. We would then devise some method to purge the data manually (using a retention bucket + timestamp in the row key). How does this look so far? Is there a better way? Thanks, Nishanth
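For what it's worth, a minimal sketch of creating the proposed two-family layout with the Java admin API; the table and family names are made up for illustration:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;

Configuration conf = HBaseConfiguration.create();
HBaseAdmin admin = new HBaseAdmin(conf);
HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("logs_v2"));
desc.addFamily(new HColumnDescriptor("meta")); // small, frequently queried fields
desc.addFamily(new HColumnDescriptor("msg"));  // large blob message
admin.createTable(desc);
admin.close();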
Re: Custom Filter on hbase Column
Thanks Anoop.I did that but the only method that was getting called in my filter was public byte[] toByteArray() ,even though I over ride transformcell. Thanks, Nishanth On Thu, Sep 11, 2014 at 10:51 PM, Anoop John anoop.hb...@gmail.com wrote: And u have to implement transformCell(*final* Cell v) in your custom Filter. JFYI -Anoop- On Fri, Sep 12, 2014 at 4:36 AM, Nishanth S nishanth.2...@gmail.com wrote: Sure Sean.This is much needed. -Nishan On Thu, Sep 11, 2014 at 3:57 PM, Sean Busbey bus...@cloudera.com wrote: I filed HBASE-11950 to get some details added to the book on this topic[1]. Nishanth, could you follow that ticket and give feedback on whatever update ends up proposed? [1]: https://issues.apache.org/jira/browse/HBASE-11950 On Thu, Sep 11, 2014 at 4:40 PM, Ted Yu yuzhih...@gmail.com wrote: See http://search-hadoop.com/m/DHED4xWh622 On Thu, Sep 11, 2014 at 2:37 PM, Nishanth S nishanth.2...@gmail.com wrote: Hey All, I am sorry if this is a naive question.Do we need to generate a proto file using proto buffer compiler when implementing a filter.I did not see that any where in the documentation.Can some one help please? On Thu, Sep 11, 2014 at 12:41 PM, Nishanth S nishanth.2...@gmail.com wrote: Thanks Dima and Ted.Yes I need to return the first the 1000 characters.There is no matching involved. -Nishan On Thu, Sep 11, 2014 at 12:24 PM, Ted Yu yuzhih...@gmail.com wrote: In Nishanth's case, the 5K message is stored in one KeyValue, right ? If only the first 1000 Characters of this message are to be returned, a new KeyValue needs to be composed and returned. Cheers On Thu, Sep 11, 2014 at 11:09 AM, Dima Spivak dspi...@cloudera.com wrote: Hi Nishanth, Take a look at http://hbase.apache.org/book/client.filter.html . If I understand your use case correctly, you might want to look at RegexStringComparator to match the first 1000 characters of your column qualifier. -Dima On Thu, Sep 11, 2014 at 12:37 PM, Nishanth S nishanth.2...@gmail.com wrote: Hi All, I have an hbase table with multiple cfs (say c1,c2,c3).Each of this column family has a column 'message' which is about 5K.What I need to do is to grab only the first 1000 Characters of this message when I do a get on the table using row Key.I was thinking of using filters to do this on hbase sever side.Can some one help me on how to go about this. Thanks, Nishan -- Sean
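For comparison, here is an untested sketch of the transformCell approach Anoop describes. The class name and fixed length are made up, and note the compiled filter jar also has to be on the region servers' classpath, since the transformation runs server-side:

import java.util.Arrays;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.exceptions.DeserializationException;
import org.apache.hadoop.hbase.filter.FilterBase;
import org.apache.hadoop.hbase.util.Bytes;

// Truncates every returned cell value to maxLen bytes. A real filter would
// probably also check the qualifier so only the 'message' column is cut.
public class TruncateValueFilter extends FilterBase {
  private final int maxLen;

  public TruncateValueFilter(int maxLen) {
    this.maxLen = maxLen;
  }

  @Override
  public Cell transformCell(Cell cell) {
    byte[] value = CellUtil.cloneValue(cell);
    if (value.length <= maxLen) {
      return cell;
    }
    return new KeyValue(CellUtil.cloneRow(cell), CellUtil.cloneFamily(cell),
        CellUtil.cloneQualifier(cell), cell.getTimestamp(),
        Arrays.copyOf(value, maxLen));
  }

  // Serialization hooks required since 0.96 so the filter can be shipped
  // to the region servers, where transformCell actually runs.
  @Override
  public byte[] toByteArray() {
    return Bytes.toBytes(maxLen);
  }

  public static TruncateValueFilter parseFrom(byte[] bytes) throws DeserializationException {
    return new TruncateValueFilter(Bytes.toInt(bytes));
  }
}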
Custom Filter on hbase Column
Hi All, I have an HBase table with multiple column families (say c1, c2, c3). Each of these column families has a column 'message' which is about 5K. What I need to do is grab only the first 1000 characters of this message when I do a get on the table using the row key. I was thinking of using filters to do this on the HBase server side. Can someone help me with how to go about this? Thanks, Nishan
Re: Custom Filter on hbase Column
Thanks Dima and Ted.Yes I need to return the first the 1000 characters.There is no matching involved. -Nishan On Thu, Sep 11, 2014 at 12:24 PM, Ted Yu yuzhih...@gmail.com wrote: In Nishanth's case, the 5K message is stored in one KeyValue, right ? If only the first 1000 Characters of this message are to be returned, a new KeyValue needs to be composed and returned. Cheers On Thu, Sep 11, 2014 at 11:09 AM, Dima Spivak dspi...@cloudera.com wrote: Hi Nishanth, Take a look at http://hbase.apache.org/book/client.filter.html . If I understand your use case correctly, you might want to look at RegexStringComparator to match the first 1000 characters of your column qualifier. -Dima On Thu, Sep 11, 2014 at 12:37 PM, Nishanth S nishanth.2...@gmail.com wrote: Hi All, I have an hbase table with multiple cfs (say c1,c2,c3).Each of this column family has a column 'message' which is about 5K.What I need to do is to grab only the first 1000 Characters of this message when I do a get on the table using row Key.I was thinking of using filters to do this on hbase sever side.Can some one help me on how to go about this. Thanks, Nishan
Re: Custom Filter on hbase Column
Hey All, I am sorry if this is a naive question.Do we need to generate a proto file using proto buffer compiler when implementing a filter.I did not see that any where in the documentation.Can some one help please? On Thu, Sep 11, 2014 at 12:41 PM, Nishanth S nishanth.2...@gmail.com wrote: Thanks Dima and Ted.Yes I need to return the first the 1000 characters.There is no matching involved. -Nishan On Thu, Sep 11, 2014 at 12:24 PM, Ted Yu yuzhih...@gmail.com wrote: In Nishanth's case, the 5K message is stored in one KeyValue, right ? If only the first 1000 Characters of this message are to be returned, a new KeyValue needs to be composed and returned. Cheers On Thu, Sep 11, 2014 at 11:09 AM, Dima Spivak dspi...@cloudera.com wrote: Hi Nishanth, Take a look at http://hbase.apache.org/book/client.filter.html . If I understand your use case correctly, you might want to look at RegexStringComparator to match the first 1000 characters of your column qualifier. -Dima On Thu, Sep 11, 2014 at 12:37 PM, Nishanth S nishanth.2...@gmail.com wrote: Hi All, I have an hbase table with multiple cfs (say c1,c2,c3).Each of this column family has a column 'message' which is about 5K.What I need to do is to grab only the first 1000 Characters of this message when I do a get on the table using row Key.I was thinking of using filters to do this on hbase sever side.Can some one help me on how to go about this. Thanks, Nishan
Re: Custom Filter on hbase Column
Sure Sean.This is much needed. -Nishan On Thu, Sep 11, 2014 at 3:57 PM, Sean Busbey bus...@cloudera.com wrote: I filed HBASE-11950 to get some details added to the book on this topic[1]. Nishanth, could you follow that ticket and give feedback on whatever update ends up proposed? [1]: https://issues.apache.org/jira/browse/HBASE-11950 On Thu, Sep 11, 2014 at 4:40 PM, Ted Yu yuzhih...@gmail.com wrote: See http://search-hadoop.com/m/DHED4xWh622 On Thu, Sep 11, 2014 at 2:37 PM, Nishanth S nishanth.2...@gmail.com wrote: Hey All, I am sorry if this is a naive question.Do we need to generate a proto file using proto buffer compiler when implementing a filter.I did not see that any where in the documentation.Can some one help please? On Thu, Sep 11, 2014 at 12:41 PM, Nishanth S nishanth.2...@gmail.com wrote: Thanks Dima and Ted.Yes I need to return the first the 1000 characters.There is no matching involved. -Nishan On Thu, Sep 11, 2014 at 12:24 PM, Ted Yu yuzhih...@gmail.com wrote: In Nishanth's case, the 5K message is stored in one KeyValue, right ? If only the first 1000 Characters of this message are to be returned, a new KeyValue needs to be composed and returned. Cheers On Thu, Sep 11, 2014 at 11:09 AM, Dima Spivak dspi...@cloudera.com wrote: Hi Nishanth, Take a look at http://hbase.apache.org/book/client.filter.html . If I understand your use case correctly, you might want to look at RegexStringComparator to match the first 1000 characters of your column qualifier. -Dima On Thu, Sep 11, 2014 at 12:37 PM, Nishanth S nishanth.2...@gmail.com wrote: Hi All, I have an hbase table with multiple cfs (say c1,c2,c3).Each of this column family has a column 'message' which is about 5K.What I need to do is to grab only the first 1000 Characters of this message when I do a get on the table using row Key.I was thinking of using filters to do this on hbase sever side.Can some one help me on how to go about this. Thanks, Nishan -- Sean
Time Series Report generation in hbase
Hi everyone, We have an HBase implementation with a single table which stores different types of log messages. We have a requirement to notify (send email to a mailing list) when we receive a particular type of message. I will be able to identify this type of message by looking at one of the column values which we populate. I would need to do this every hour and send the cumulative result. Could you please point me in the right direction on what would be the best way to implement this? HBase table: cf:family 1, cf:family 2; columns c1, c2, c3. If the value of c1='x' I need to notify. Let me know if you need more information on this. Thanks -Nishan
Re: Time Series Report generation in hbase
Thanks Ted. I will definitely take a look at these. -Nishanth On Fri, Aug 22, 2014 at 10:11 AM, Ted Yu yuzhih...@gmail.com wrote: You can utilize the following method on Scan: public Scan setTimeRange(long minStamp, long maxStamp) since you're doing periodic scanning. Also take a look at HBASE-5416, which introduced the essential column family. Cheers On Fri, Aug 22, 2014 at 8:41 AM, Nishanth S nishanth.2...@gmail.com wrote: Hi everyone, We have an HBase implementation with a single table which stores different types of log messages. We have a requirement to notify (send email to a mailing list) when we receive a particular type of message. I will be able to identify this type of message by looking at one of the column values which we populate. I would need to do this every hour and send the cumulative result. Could you please point me in the right direction on what would be the best way to implement this? HBase table: cf:family 1, cf:family 2; columns c1, c2, c3. If the value of c1='x' I need to notify. Let me know if you need more information on this. Thanks -Nishan
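Putting Ted's setTimeRange suggestion together with a value check on c1, an hourly job could look roughly like this sketch; table, family and column names are placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "logs");

// Only look at cells written in the last hour, and keep rows whose c1 = 'x'.
long now = System.currentTimeMillis();
Scan scan = new Scan();
scan.setTimeRange(now - 3600 * 1000L, now);
SingleColumnValueFilter filter = new SingleColumnValueFilter(
    Bytes.toBytes("cf"), Bytes.toBytes("c1"),
    CompareFilter.CompareOp.EQUAL, Bytes.toBytes("x"));
filter.setFilterIfMissing(true); // skip rows that do not have c1 at all
scan.setFilter(filter);

int matches = 0;
ResultScanner scanner = table.getScanner(scan);
for (Result r : scanner) {
  matches++; // accumulate whatever the hourly notification email needs
}
scanner.close();
table.close();
// if (matches > 0), send the cumulative notification email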