Re: [ANNOUNCE] New HBase committer Nihal Jain

2023-05-03 Thread ramkrishna vasudevan
Congratulations !!!

On Thu, May 4, 2023 at 8:39 AM 张铎(Duo Zhang)  wrote:

> Congratulations!
>
> Viraj Jasani wrote on Wed, May 3, 2023, at 23:47:
>
> > Congratulations Nihal!! Very well deserved!!
> >
> > On Wed, May 3, 2023 at 5:12 AM Nick Dimiduk  wrote:
> >
> > > Hello!
> > >
> > > On behalf of the Apache HBase PMC, I am pleased to announce that Nihal
> > Jain
> > > has accepted the PMC's invitation to become a committer on the project.
> > We
> > > appreciate all of Nihal's generous contributions thus far and look
> > forward
> > > to his continued involvement.
> > >
> > > Congratulations and welcome, Nihal Jain!
> > >
> > > Thanks,
> > > Nick
> > >
> >
>


Re: Extremely long flush times

2020-01-29 Thread ramkrishna vasudevan
Hi Minwoo Kang

Any updates here? Were you able to overcome the issue with the upgrade,
or by applying the patch?

Regards
Ram

On Fri, Jan 10, 2020 at 11:44 AM Kang Minwoo 
wrote:

> Thanks for the reply.
> It is a lot of help to me.
>
> Best regards,
> Minwoo Kang
>
> ____
> From: ramkrishna vasudevan 
> Sent: Friday, January 10, 2020, 14:35
> To: Hbase-User
> Cc: Stack
> Subject: Re: Extremely long flush times
>
> Hi
>
> In your case you have large compactions going on and, at the same time,
> heavy reads happening. Since there are a lot of deletes, the scan spends
> considerable time in file reads.
> Since compactions/flushes happen every now and then, the readers keep
> getting reset, which requires acquiring the lock; and since multiple
> threads are competing, your scans suffer more because they are not able to
> reset themselves.
>
> Yes - if the above case is true for your scenario - then HBASE-13082 will
> help you out, since it avoids resetting scanners on compactions and does so
> only on flushes, and even then it is not a hard reset: if the scanner finds
> the boolean to be set then it resets, otherwise the scan just goes on.
>
> Regards
> Ram
>
> On Fri, Jan 10, 2020 at 10:01 AM Kang Minwoo 
> wrote:
>
> > Thank you for reply.
> >
> > All Regions or just the one?
> > => just one
> >
> > Do thread dumps show the lock-holding thread reading against hdfs every time you take one?
> > => yes
> >
> > Is it always inside updateReaders? Is there a bad file or lots of
> files
> > to add to the list?
> > => always inside updateReaders.
> >
> > 
> >
> > Sorry for the delay in replying.
> >
> > I had to handle this issue.
> > Temporarily, I changed my code so that it does not hit that situation,
> > i.e. reading worthless (already deleted) cells.
> > After that, the issue hasn't occurred.
> >
> > Background:
> > My application deletes out-of-date data every day.
> > And the region is extremely big, so major compaction takes a lot of time
> > and tombstone cells remain for a long time.
> > If the client reads the full data, there are a lot of worthless cells.
> > I think that is the reason the lock-holding thread is reading hdfs files.
> >
> > I'm looking at HBASE-13082 [1].
> > (I am not sure HBASE-13082 is related.)
> >
> > [1]: https://issues.apache.org/jira/browse/HBASE-13082
> >
> > Best regards,
> > Minwoo Kang
> >
> > 
> > From: Stack 
> > Sent: Saturday, January 4, 2020, 03:40
> > To: Hbase-User
> > Subject: Re: Extremely long flush times
> >
> > All Regions or just the one?
> >
> > Do thread dumps show the lock-holding thread reading against hdfs every time you take one?
> >
> > Is it always inside updateReaders? Is there a bad file or lots of
> files
> > to add to the list?
> >
> > Yours,
> > S
> >
> >
> >
> > On Thu, Jan 2, 2020 at 8:34 PM Kang Minwoo 
> > wrote:
> >
> > > Hello Users,
> > >
> > > I met an issue where flush times are too long.
> > >
> > > MemStoreFlusher is waiting for a lock.
> > > ```
> > > "MemStoreFlusher.0"
> > >java.lang.Thread.State: WAITING (parking)
> > > at sun.misc.Unsafe.park(Native Method)
> > > - parking to wait for  <0x7f0412bddcb8> (a
> > > java.util.concurrent.locks.ReentrantLock$NonfairSync)
> > > at
> > > java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> > > at
> > >
> >
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> > > at
> > >
> >
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
> > > at
> > >
> >
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
> > > at
> > >
> >
> java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
> > > at
> > > java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
> > > at
> > >
> >
> org.apache.hadoop.hbase.regionserver.StoreScanner.updateReaders(StoreScanner.java:692)
> > > at
> > >
> >
> org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:1

Re: Extremely long flush times

2020-01-09 Thread ramkrishna vasudevan
Hi

In your case you have large compactions going on and, at the same time,
heavy reads happening. Since there are a lot of deletes, the scan spends
considerable time in file reads.
Since compactions/flushes happen every now and then, the readers keep
getting reset, which requires acquiring the lock; and since multiple threads
are competing, your scans suffer more because they are not able to reset
themselves.

Yes - if the above case is true for your scenario - then HBASE-13082 will
help you out, since it avoids resetting scanners on compactions and does so
only on flushes, and even then it is not a hard reset: if the scanner finds
the boolean to be set then it resets, otherwise the scan just goes on.
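
To make the idea concrete, here is a very simplified sketch of the difference
(this is not the actual HBASE-13082 code, just an illustration of the
flag-based reset versus the eager lock):

```
// Simplified illustration of the HBASE-13082 idea; not actual HBase code.
import java.util.concurrent.locks.ReentrantLock;

class EagerScanner {
  private final ReentrantLock lock = new ReentrantLock();

  // Old behaviour: every flush/compaction blocks on the scanner lock,
  // competing with all in-flight next() calls.
  void updateReaders() {
    lock.lock();
    try {
      reopenStoreFiles();
    } finally {
      lock.unlock();
    }
  }

  void reopenStoreFiles() { /* reload the HFile readers */ }
}

class LazyScanner {
  private volatile boolean flushed = false;

  // New behaviour: a flush only raises a flag; no lock is taken here.
  void updateReaders() {
    flushed = true;
  }

  // The scan thread checks the flag at a convenient point and resets itself.
  void next() {
    if (flushed) {
      flushed = false;
      reopenStoreFiles();
    }
    // ... continue scanning ...
  }

  void reopenStoreFiles() { /* reload the HFile readers */ }
}
```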

Regards
Ram

On Fri, Jan 10, 2020 at 10:01 AM Kang Minwoo 
wrote:

> Thank you for reply.
>
> All Regions or just the one?
> => just one
>
> Do thread dumps show the lock-holding thread reading against hdfs every time you take one?
> => yes
>
> Is it always inside updateReaders? Is there a bad file or lots of files
> to add to the list?
> => always inside updateReaders.
>
> 
>
> Sorry for the delay in replying.
>
> I had to handle this issue.
> Temporarily, I changed my code so that it does not hit that situation,
> i.e. reading worthless (already deleted) cells.
> After that, the issue hasn't occurred.
>
> Background:
> My application deletes out-of-date data every day.
> And the region is extremely big, so major compaction takes a lot of time
> and tombstone cells remain for a long time.
> If the client reads the full data, there are a lot of worthless cells.
> I think that is the reason the lock-holding thread is reading hdfs files.
>
> I'm looking at HBASE-13082 [1].
> (I am not sure HBASE-13082 is related.)
>
> [1]: https://issues.apache.org/jira/browse/HBASE-13082
>
> Best regards,
> Minwoo Kang
>
> 
> From: Stack 
> Sent: Saturday, January 4, 2020, 03:40
> To: Hbase-User
> Subject: Re: Extremely long flush times
>
> All Regions or just the one?
>
> Do thread dumps show the lock-holding thread reading against hdfs every time you take one?
>
> Is it always inside updateReaders? Is there a bad file or lots of files
> to add to the list?
>
> Yours,
> S
>
>
>
> On Thu, Jan 2, 2020 at 8:34 PM Kang Minwoo 
> wrote:
>
> > Hello Users,
> >
> > I met an issue where flush times are too long.
> >
> > MemStoreFlusher is waiting for a lock.
> > ```
> > "MemStoreFlusher.0"
> >java.lang.Thread.State: WAITING (parking)
> > at sun.misc.Unsafe.park(Native Method)
> > - parking to wait for  <0x7f0412bddcb8> (a
> > java.util.concurrent.locks.ReentrantLock$NonfairSync)
> > at
> > java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> > at
> >
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> > at
> >
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
> > at
> >
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
> > at
> >
> java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
> > at
> > java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
> > at
> >
> org.apache.hadoop.hbase.regionserver.StoreScanner.updateReaders(StoreScanner.java:692)
> > at
> >
> org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:1100)
> > at
> >
> org.apache.hadoop.hbase.regionserver.HStore.updateStorefiles(HStore.java:1079)
> > at
> > org.apache.hadoop.hbase.regionserver.HStore.access$700(HStore.java:118)
> > at
> >
> org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.commit(HStore.java:2321)
> > at
> >
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2430)
> > at
> >
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2153)
> > at
> >
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2115)
> > at
> >
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2005)
> > at
> > org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:1930)
> > at
> >
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:514)
> > at
> >
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:475)
> > at
> >
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$900(MemStoreFlusher.java:75)
> > at
> >
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:263)
> > at java.lang.Thread.run(Thread.java:748)
> >Locked ownable synchronizers:
> > - None
> > ```
> >
> >
> > RPC Handler had the lock.
> > ```
> > "B.defaultRpcServer.handler"
> >java.lang.Thread.State: RUNNABLE
> > at org.apache.log4j.

Re: [ANNOUNCE] New HBase committer Viraj Jasani

2019-12-31 Thread ramkrishna vasudevan
Congratulations Viraj.

On Tue, Dec 31, 2019 at 10:59 AM Pankaj kr  wrote:

> Congratulations Viraj...!!
>
> Regards,
> Pankaj
>
> From:Anoop John 
> To:dev 
> Cc:hbase-user 
> Date:2019-12-31 10:15:07
> Subject:Re: [ANNOUNCE] New HBase committer Viraj Jasani
>
> Congrats Viraj !!!
>
> Anoop
>
> On Tue, Dec 31, 2019 at 9:26 AM Sukumar Maddineni
>  wrote:
>
> > Wow congrats Viraj and Keep up the good work.
> >
> > --
> > Sukumar
> >
> > On Mon, Dec 30, 2019 at 5:45 PM 宾莉金(binlijin) 
> wrote:
> >
> > > Welcome and Congratulations, Viraj!
> > >
> > > > 张铎 (Duo Zhang) wrote on Mon, Dec 30, 2019, at 1:18 PM:
> > >
> > > > Congratulations!
> > > >
> > > > > 李响 (Xiang Li) wrote on Mon, Dec 30, 2019, at 10:12 AM:
> > > >
> > > > >- Congratulations and warmly welcome \^o^/
> > > > >
> > > > >
> > > > > On Sun, Dec 29, 2019 at 2:14 AM Jan Hentschel <
> > > > > jan.hentsc...@ultratendency.com> wrote:
> > > > >
> > > > > > Congrats and welcome, Viraj! Well deserved.
> > > > > >
> > > > > > From: Peter Somogyi 
> > > > > > Reply-To: "d...@hbase.apache.org" 
> > > > > > Date: Friday, December 27, 2019 at 2:02 PM
> > > > > > To: HBase Dev List , hbase-user <
> > > > > > user@hbase.apache.org>
> > > > > > Subject: [ANNOUNCE] New HBase committer Viraj Jasani
> > > > > >
> > > > > > On behalf of the Apache HBase PMC I am pleased to announce that
> > > > > > Viraj Jasani has accepted the PMC's invitation to become a
> > > > > > committer on the project.
> > > > > >
> > > > > > Thanks so much for the work you've been contributing. We look
> > forward
> > > > > > to your continued involvement.
> > > > > >
> > > > > > Congratulations and welcome!
> > > > > >
> > > > > >
> > > > >
> > > > > --
> > > > >
> > > > >李响 Xiang Li
> > > > >
> > > > > cellphone: +86-136-8113-8972
> > > > > e-mail: wate...@gmail.com
> > > > >
> > > >
> > >
> > >
> > > --
> > > *Best Regards,*
> > >  lijin bin
> > >
> >
> >
> > --
> >
> > 
> >
>


Re: a problem of long STW because of GC ref-proc

2019-09-30 Thread ramkrishna vasudevan
Hi

Thanks Zheng for pinging here.
As far as I know, I have not delved deeper into this offset lock and its
soft reference. I think after Zheng's suggestion the STW came down a lot
after making the block size 64 KB - because the number of blocks, and hence
the number of soft references, is reduced. But the time still seems big for
the user. I think it is worth checking the impact of this now, particularly
since we suggest bigger-sized bucket caches. Will be back.
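
As a rough back-of-the-envelope illustration (assuming, as discussed later in
this thread, roughly one offset-lock soft reference per cached block), the
block counts for a 70g file-mode bucket cache work out like this:

```
// Rough estimate of how many blocks (and hence offset-lock soft references)
// a 70 GB file-mode bucket cache can hold; numbers taken from this thread.
public class BucketCacheBlockCount {
  public static void main(String[] args) {
    long cacheBytes = 70L * 1024 * 1024 * 1024;    // 70 GB bucket cache
    long[] blockSizes = {16 * 1024, 64 * 1024};    // 16 KB vs 64 KB block size
    for (long blockSize : blockSizes) {
      System.out.printf("block size %d KB -> up to %,d blocks%n",
          blockSize / 1024, cacheBytes / blockSize);
    }
    // 16 KB -> 4,587,520 blocks; 64 KB -> 1,146,880 blocks, which matches
    // the "still more than 1,000,000 soft refs" observation below.
  }
}
```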

Regards
Ram


On Mon, Sep 30, 2019 at 9:03 AM OpenInx  wrote:

> OK, the huge number of SoftReferences from the offsetLock for each block
> still seems to be the main problem.
> I'm not sure whether there is some G1 option that can help to optimize the
> long STW.
> One solution I can imagine for now: limit the bucketcache size for a single
> RS, say the 70g bucketcache may
> need to be split across two RSs.
>
> As far as I know, Anoop & Ram have some good practical experience with huge
> bucket caches. Ping Anoop & Ramkrishna:
> any thoughts about this GC issue?
>
>
> On Mon, Sep 30, 2019 at 11:09 AM zheng wang <18031...@qq.com> wrote:
>
> > Even if set to 64KB, there are still more than 1,000,000 SoftReferences,
> > and it will still take too long.
> >
> >
> > this "GC ref-proc" process 50w softRef and cost 700ms:
> >
> >
> > 2019-09-18T03:16:42.088+0800: 125161.477:
> > [GC remark
> > 2019-09-18T03:16:42.088+0800: 125161.477:
> > [Finalize Marking, 0.0018076 secs]
> > 2019-09-18T03:16:42.089+0800: 125161.479:
> > [GC ref-proc
> > 2019-09-18T03:16:42.089+0800: 125161.479: [SoftReference,
> > 499278 refs, 0.1382086 secs]
> > 2019-09-18T03:16:42.228+0800: 125161.617: [WeakReference,
> > 3750 refs, 0.0049171 secs]
> > 2019-09-18T03:16:42.233+0800: 125161.622:
> [FinalReference,
> > 1040 refs, 0.0009375 secs]
> > 2019-09-18T03:16:42.234+0800: 125161.623:
> > [PhantomReference, 0 refs, 21921 refs, 0.0058014 secs]
> > 2019-09-18T03:16:42.239+0800: 125161.629: [JNI Weak
> > Reference, 0.0001070 secs]
> > , 0.6667733 secs]
> > 2019-09-18T03:16:42.756+0800: 125162.146:
> > [Unloading, 0.0224078 secs]
> > , 0.6987032 secs]
> >
> >
> > -- Original Message --
> > From: "OpenInx";
> > Sent: Monday, September 30, 2019, 10:27 AM
> > To: "Hbase-User";
> >
> > Subject: Re: a problem of long STW because of GC ref-proc
> >
> >
> >
> A 100% get workload is not the right reason for choosing 16KB, I think,
> because if you read a block, there is a good chance that we will also read
> the adjacent cells in the same block... I think caching a 16KB block or
> caching a 64KB block in the BucketCache won't make a big difference?
> (But if your cell byte size is quite small, then there will be many cells
> encoded in a 64KB block, and a smaller block size will be better because
> we search the cells in a block one by one, i.e. O(N) complexity.)
> >
> >
> > On Mon, Sep 30, 2019 at 10:08 AM zheng wang <18031...@qq.com> wrote:
> >
> > Yes, it would be alleviated by your advice, but there are only get
> > requests in our business, so 16KB is better.
> > IMO, the offset locks will always be used, so would a strong reference be
> > a better choice?
> > >
> > >
> > >
> > >
> > > -- Original Message --
> > > From: "OpenInx";
> > > Sent: Monday, September 30, 2019, 9:46 AM
> > > To: "Hbase-User";
> > >
> > > Subject: Re: a problem of long STW because of GC ref-proc
> > >
> > >
> > >
> > > Seems your block size is very small (16KB), so there will be
> > > 70*1024*1024/16=4587520 blocks (at most) in your BucketCache.
> > > For each block, the RS will maintain a soft-reference idLock and a
> > > BucketEntry in its bucket cache. So maybe you can try to
> > > enlarge the block size?
> > >
> > > On Sun, Sep 29, 2019 at 10:14 PM zheng wang <18031...@qq.com> wrote:
> > >
> > > > Hi~
> > > >
> > > >
> > > > My live cluster env config below:
> > > > hbase version:cdh6.0.1(apache hbase2.0.0)
> > > > hbase config: bucketCache(70g),blocksize(16k)
> > > >
> > > >
> > > > java version:1.8.0_51
> > > > javaconfig:heap(32g),-XX:+UseG1GC  -XX:MaxGCPauseMillis=100
> > > > -XX:+ParallelRefProcEnabled
> > > >
> > > >
> > > > About every 1-2 days, the regionserver hits an old-gen GC that costs
> > > > 1~2s in the
> > > > remark phase:
> > > >
> > > >
> > > > 2019-09-29T01:55:45.186+0800: 365222.053:
> > > > [GC remark
> > > > 2019-09-29T01:55:45.186+0800: 365222.053:
> > > > [Finalize Marking, 0.0016327 secs]
> > > > 2019-09-29T01:55:45.188+0800: 365222.054:
> > > > [GC ref-proc
> > > > 2019-09-29T01:55:45.188+0800: 365222.054:
> > [SoftReference,
> > > > 1264586 refs, 0.3151392 secs]
> > > > 2019-09-29T01:55:45.503+0800: 365222.370:
> > [WeakReference,
> > > > 4317 refs, 0.0024381 secs]
> > > > 2019-09-29T01:55:45.505+0800: 365222.372:
> > > [FinalReference,
> > > > 9791 refs, 0.0037445 secs]
> > > > 2019-09-29T01:55:45.509+080

Re: Cacheblocksonwrite not working during compaction?

2019-09-23 Thread ramkrishna vasudevan
Hi

I can see your case, where compaction ends up with your reads hitting S3 to
fetch the new blocks. However, please note that in the version you are
using, whenever a compaction happens, any scans/reads happening at that
point in time will still try to use the existing HFiles that were available
at the start of the scan, and hence those blocks are not invalidated.

However, for any new files created by compaction, new scans that start after
the compaction will need to fetch those blocks into the cache on the first
read (because we don't cache the block on write after compaction). But you
need to be careful: once you enable cache-on-write for compaction (after
your patch), the LRU behaviour may start evicting other blocks which may be
needed by other scans fired on that region server. If your use case is that
the scan queries repeatedly touch the same set of files, then enabling
cache-on-write after compaction may help - also considering the fact that
you use the file-mode bucket cache, which is very big in size, so evictions
may not be common.
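
For reference, a minimal sketch of the cache-related knobs touched on in this
thread, using the HBase 1.x client API (illustrative only - as noted in this
thread, the compaction writer currently ignores the cache-on-write flag):

```
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;

// Illustrative only: per-family cache settings plus the region-server-wide
// cache-on-write flag discussed in this thread (HBase 1.x API).
public class CacheOnWriteExample {
  public static void main(String[] args) {
    HColumnDescriptor cf = new HColumnDescriptor("cf");
    cf.setPrefetchBlocksOnOpen(true);  // warm the cache when a store file opens
    cf.setCacheDataOnWrite(true);      // cache blocks written by flushes
    cf.setBlockCacheEnabled(true);
    // The descriptor would then be applied via Admin createTable/modifyColumn.

    // Region-server-wide setting (normally in hbase-site.xml); as discussed,
    // compaction writers hardcode this to false, so it only affects flushes.
    Configuration conf = HBaseConfiguration.create();
    conf.setBoolean("hbase.rs.cacheblocksonwrite", true);
  }
}
```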

>>I'll plan on opening a JIRA ticket for this and I'd also be happy to take
a stab at creating a patch.
Pls feel free to open a JIRA.

Regards
Ram


On Mon, Sep 23, 2019 at 8:42 AM Jacob LeBlanc 
wrote:

> My questions were primarily around how cacheblocksonwrite, prefetching,
> and compaction work together, which I think is not AWS specific. Although
> it may be that yes, the 1+ hour prefetching I am seeing is an AWS-specific
> phenomenon.
>
> I've looked at the 1.4.9 source a bit more now that I have a better
> understanding of everything. As you say cacheDataOnWrite is hardcoded to
> false for compactions so the hbase.rs.cacheblocksonwrite setting will have
> no effect in these cases.
>
> I also now understand that the cache key is partly based on filename, so
> disabling hbase.rs.evictblocksonclose isn't going to help for compactions
> either since the pre-compaction filenames will no longer be relevant.
>
> Prefetching also makes more sense once I looked at the code. I see now it
> comes into effect for HFileReaderV2, so happens on a per-file basis, not
> per-region. I was confused before why I was seeing prefetching happen when
> the region was not opened recently, but now it makes sense because it is
> occurring when the compacted file is opened, not the region.
>
> So unfortunately, it looks like I'm sunk in terms of caching data during
> compaction. Thanks for the aid in understanding this.
>
> However, I do think this is a valid use case and also seems like it should
> be fairly easy to implement with a new cache config setting. On the one
> hand there is this nice prefetching feature which is acknowledging the use
> case for when people want to cache entire tables, and this use case is more
> common when considering larger L2 caches. Then on the other hand there is
> this hardcoded setting that is assuming nobody would ever want to cache all
> of the blocks being written during a compaction which seems at odds with
> the use case prefetching is trying to address. Don't get me wrong: I
> understand that in many use cases caching while writing during compaction
> is not desirable in that you don't want to evict blocks that you care about
> during the compaction process. In other words it sort of throws a big
> monkey wrench into the concept of an LRU cache. I also realize that
> hbase.rs.cachedataonwrite is geared more towards flushes for use cases
> where people often read what was recently written and don't necessarily
> want to cache the entire table. But a new config option (call it 
> hbase.rs.cacheblocksoncompaction?)
> to address this specific use case would be nice.
>
> I'll plan on opening a JIRA ticket for this and I'd also be happy to take
> a stab at creating a patch.
>
> --Jacob LeBlanc
>
> -Original Message-
> From: Vladimir Rodionov [mailto:vladrodio...@gmail.com]
> Sent: Friday, September 20, 2019 10:29 PM
> To: user@hbase.apache.org
> Subject: Re: Cacheblocksonwrite not working during compaction?
>
> You are asking questions on the Apache HBase user forum which are more
> appropriate to ask on an AWS forum, taking into account that you are using
> an Amazon-specific distribution of HBase and an Amazon-specific
> implementation of an S3 file system.
>
> As for hbase.rs.cacheblocksonwrite not working: HBase ignores this flag
> and sets it to false forcefully if the file writer is opened by a
> compaction thread (this is true for 2.x, but I am pretty sure that in 1.x
> it is the same).
>
> -Vlad
>
> On Fri, Sep 20, 2019 at 4:24 PM Jacob LeBlanc <
> jacob.lebl...@microfocus.com>
> wrote:
>
> > Thank you for the feedback!
> >
> > Our cache size *is* larger than our data size, at least for our
> > heavily accessed tables. Memory may be prohibitively expensive for
> > keeping large tables in an in-memory cache, but storage is cheap, so
> > hosting a 1 TB bucketcache on the local disk of each of our 

Re: HBase Scan consumes high cpu

2019-09-16 Thread ramkrishna vasudevan
Hi Solvannan

Currently there is no easy way to overcome this case, because deletes and
their tracking take precedence before the filter is even applied.

I get your case: you really don't know which columns could have been
previously deleted, and hence you specify the entire range of columns in
the filter. When this Put/Delete combination keeps growing, you end up with
these issues.

I am not aware of the use case here, but is there a better way to model
your schema for these cases?

Regards
Ram

On Mon, Sep 16, 2019 at 10:54 PM Solvannan R M 
wrote:

> Hi Ramkrishna,
>
> Thank you for your inputs! Unfortunately we would not be knowing the
> column names beforehand. We had generated the above scenario for
> illustration purposes.
>
> The intent of our query is that, given a single row key, a start column
> key and an end column key, scan for the columns that are between the two
> column keys.  We have been achieving that by using ColumnRangeFilter.
> Our write pattern would be Put followed by Delete immediately
> (Keep_deleted_cells is set to false). So as more Deletes start to
> accumulate, we notice the scan time starts to be very long and the cpu
> shoots up to 100% for a core during every scan. On trying to debug we
> observed the following behavior:
>
> At any instant, the cells of the particular row would be roughly
> organized like
>
> D1 P1 D2 P2 D3 P3  Dn-1 Pn-1 Dn Pn Pn+1 Pn+2 Pn+3 Pn+4
>
> where D and P are Delete and it's corresponding Put. The newer values
> from Pn haven't been deleted yet.
>
> As the scan initiates, inside the StoreScanner,
> NormalUserScanQueryMatcher would match the first cell (D1). It would be
> added to the DeleteTracker and a MatchCode of SKIP is returned. Now for
> the next cell (P1) the matcher would check with the DeleteTracker and
> return a code of SEEK_NEXT_COL. Again the next cell would be D2 and this
> would happen alternately. No filter is applied. This goes on till it
> encounters Pn where filter is applied, SEEK_NEXT_USING_HINT is done and
> now reseek happens to position near the desired range. The result is
> returned quickly after that.
>
> The SKIP iterations happen a lot because our pattern would have very
> less active cells and only towards the latest column qualifiers(ordered
> high lexicographically). We were wondering if the query could be
> modified so that the filter could be applied initially or some other way
> to seek to the desired range directly.
>
> Regards,
> Solvannan R M
>
>
> On 2019/09/13 15:53:51, ramkrishna vasudevan wrote:
>  > Hi>
>  > Generally if you can form the column names like you did in the above
> case>
>  > it is always better you add them using>
>  > scan#addColumn(family, qual). I am not sure of the shell syntax to add>
>  > multiple columns but am sure there is a provision to do it.>
>  >
>  > This will ensure that the scan starts from the given column and
> fetches the>
>  > required column only. In your case probably you need to pass a set of>
>  > qualifiers (instead of just 1).>
>  >
>  > Regards>
>  > Ram>
>  >
>  > On Fri, Sep 13, 2019 at 8:45 PM Solvannan R M >
>  > wrote:>
>  >
>  > > Hi Anoop,>
>  > >>
>  > > We have executed the query with the qualifier set like you advised.>
>  > > But we dont get the results for the range but only the specified>
>  > > qualifier cell is returned.>
>  > >>
>  > > Query & Result:>
>  > >>
>  > > hbase(main):008:0> get 'mytable', 'MY_ROW',>
>  > > {COLUMN=>["pcf:\x00\x16\xDFx"],>
>  > > FILTER=>ColumnRangeFilter.new(Bytes.toBytes(1499000.to_java(:int)),>
>  > > true, Bytes.toBytes(1499010.to_java(:int)), false)}>
>  > > COLUMN CELL>
>  > > pcf:\x00\x16\xDFx timestamp=1568380663616,>
>  > > value=\x00\x16\xDFx>
>  > > 1 row(s) in 0.0080 seconds>
>  > >>
>  > > hbase(main):009:0>>
>  > >>
>  > >>
>  > > Is there any other way to get arond this ?.>
>  > >>
>  > >>
>  > > Regards,>
>  > >>
>  > > Solvannan R M>
>  > >>
>  > >>
>  > > On 2019/09/13 04:53:45, Anoop John wrote:>
>  > > > Hi>>
>  > > > When you did a put with a lower qualifier int (put 'mytable',>>
>  > > > 'MY_ROW', "pcf:\x0A", "\x00") the system flow is getting a valid
> cell>
>  > > at>>
>  > &g

Re: HBase Scan consumes high cpu

2019-09-13 Thread ramkrishna vasudevan
Hi
Generally, if you can form the column names like you did in the above case,
it is always better to add them using
scan#addColumn(family, qual). I am not sure of the shell syntax to add
multiple columns but am sure there is a provision to do it.

This will ensure that the scan starts from the given column and fetches only
the required columns. In your case you probably need to pass a set of
qualifiers (instead of just one).
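
A rough Java sketch of the idea (the table, family and qualifier values are
just the ones from this thread):

```
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.filter.ColumnRangeFilter;
import org.apache.hadoop.hbase.util.Bytes;

// Sketch of the suggestion above: name the wanted qualifiers explicitly so
// the read seeks straight to them instead of walking over all the delete
// markers; the original range filter can be kept as well.
public class ColumnRangeGetSketch {
  public static void main(String[] args) {
    byte[] family = Bytes.toBytes("pcf");
    Get get = new Get(Bytes.toBytes("MY_ROW"));

    // Qualifiers are 4-byte ints in this thread; add the ones in the range.
    for (int q = 1499000; q < 1499010; q++) {
      get.addColumn(family, Bytes.toBytes(q));
    }

    get.setFilter(new ColumnRangeFilter(
        Bytes.toBytes(1499000), true, Bytes.toBytes(1499010), false));

    // table.get(get) would then return only the named columns.
  }
}
```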

Regards
Ram

On Fri, Sep 13, 2019 at 8:45 PM Solvannan R M 
wrote:

> Hi Anoop,
>
> We have executed the query with the qualifier set as you advised.
> But we don't get the results for the range; only the specified
> qualifier cell is returned.
>
> Query & Result:
>
> hbase(main):008:0> get 'mytable', 'MY_ROW',
> {COLUMN=>["pcf:\x00\x16\xDFx"],
> FILTER=>ColumnRangeFilter.new(Bytes.toBytes(1499000.to_java(:int)),
> true, Bytes.toBytes(1499010.to_java(:int)), false)}
> COLUMN CELL
>   pcf:\x00\x16\xDFx timestamp=1568380663616,
> value=\x00\x16\xDFx
> 1 row(s) in 0.0080 seconds
>
> hbase(main):009:0>
>
>
> Is there any other way to get around this?
>
>
> Regards,
>
> Solvannan R M
>
>
> On 2019/09/13 04:53:45, Anoop John wrote:
>  > Hi>
>  > When you did a put with a lower qualifier int (put 'mytable',>
>  > 'MY_ROW', "pcf:\x0A", "\x00") the system flow is getting a valid cell
> at>
>  > 1st step itself and that getting passed to the Filter. The Filter is
> doing>
>  > a seek which just avoids all the in between deletes and puts
> processing..>
>  > In 1st case the Filter wont get into action at all unless the scan flow>
>  > sees a valid cell. The delete processing happens as 1st step before the>
>  > filter processinf step happening.>
>  >
>  > In this case I am wondering why you can not add the specific 1st
> qualifier>
>  > in the get part itself along with the column range filter. I mean>
>  >
>  > get 'mytable', 'MY_ROW', {COLUMN=>['pcf: *1499000 * '],>
>  > FILTER=>ColumnRangeFilter.new(Bytes.toBytes(1499000.to_java(:int)),>
>  > true, Bytes.toBytes(1499010.to_java(:int)), false)}>
>  >
>  > Pardon the syntax it might not be proper for the shell.. Can this be
> done?>
>  > This will make the scan to make a seek to the given qualifier at 1st
> step>
>  > itself.>
>  >
>  > Anoop>
>  >
>  > On Thu, Sep 12, 2019 at 10:18 PM Udai Bhan Kashyap (BLOOMBERG/
> PRINCETON) <>
>  > ukashy...@bloomberg.net> wrote:>
>  >
>  > > Are you keeping the deleted cells? Check 'VERSIONS' for the column
> family>
>  > > and set it to 1 if you don't want to keep the deleted cells.>
>  > >>
>  > > From: user@hbase.apache.org At: 09/12/19 12:40:01To:>
>  > > user@hbase.apache.org>
>  > > Subject: Re: HBase Scan consumes high cpu>
>  > >>
>  > > Hi,>
>  > >>
>  > > As said earlier, we have populated the rowkey "MY_ROW" with integers>
>  > > from 0 to 150 as column qualifiers. Then we have deleted the>
>  > > qualifiers from 0 to 1499000.>
>  > >>
>  > > We executed the following query. It took 15.3750 seconds to execute.>
>  > >>
>  > > hbase(main):057:0> get 'mytable', 'MY_ROW', {COLUMN=>['pcf'],>
>  > > FILTER=>ColumnRangeFilter.new(Bytes.toBytes(1499000.to_java(:int)),>
>  > > true, Bytes.toBytes(1499010.to_java(:int)), false)}>
>  > > COLUMN CELL>
>  > > pcf:\x00\x16\xDFx timestamp=1568123881899,>
>  > > value=\x00\x16\xDFx>
>  > > pcf:\x00\x16\xDFy timestamp=1568123881899,>
>  > > value=\x00\x16\xDFy>
>  > > pcf:\x00\x16\xDFz timestamp=1568123881899,>
>  > > value=\x00\x16\xDFz>
>  > > pcf:\x00\x16\xDF{ timestamp=1568123881899,>
>  > > value=\x00\x16\xDF{>
>  > > pcf:\x00\x16\xDF| timestamp=1568123881899,>
>  > > value=\x00\x16\xDF|>
>  > > pcf:\x00\x16\xDF} timestamp=1568123881899,>
>  > > value=\x00\x16\xDF}>
>  > > pcf:\x00\x16\xDF~ timestamp=1568123881899,>
>  > > value=\x00\x16\xDF~>
>  > > pcf:\x00\x16\xDF\x7F timestamp=1568123881899,>
>  > > value=\x00\x16\xDF\x7F>
>  > > pcf:\x00\x16\xDF\x80 timestamp=1568123881899,>
>  > > value=\x00\x16\xDF\x80>
>  > > pcf:\x00\x16\xDF\x81 timestamp=1568123881899,>
>  > > value=\x00\x16\xDF\x81>
>  > > 1 row(s) in 15.3750 seconds>
>  > >>
>  > >>
>  > > Now we inserted a new column with qualifier 10 (\x0A), such that it>
>  > > comes earlier in lexicographical order. Now we executed the same
> query.>
>  > > It only took 0.0240 seconds.>
>  > >>
>  > > hbase(main):058:0> put 'mytable', 'MY_ROW', "pcf:\x0A", "\x00">
>  > > 0 row(s) in 0.0150 seconds>
>  > > hbase(main):059:0> get 'mytable', 'MY_ROW', {COLUMN=>['pcf'],>
>  > > FILTER=>ColumnRangeFilter.new(Bytes.toBytes(1499000.to_java(:int)),>
>  > > true, Bytes.toBytes(1499010.to_java(:int)), false)}>
>  > > COLUMN CELL>
>  > > pcf:\x00\x16\xDFx timestamp=1568123881899,>
>  > > value=\x00\x16\xDFx>
>  > > pcf:\x00\x16\xDFy timestamp=1568123881899,>
>  > > value=\x00\x16\xDFy>
>  > > pcf:\x00\x16\xDFz timestamp=1568123881899,>
>  > > value=\x00\x16\xDFz>
>  > > pcf:\x00\x16\xDF{ timestamp=1568123881899,>
>  > > value=\x00\x16\xDF{>
>  > > pcf:\x00\x16\xDF| timestamp=1568123881899,>

Re: [ANNOUNCE] new HBase committer Sakthi

2019-08-01 Thread ramkrishna vasudevan
Congratulations Sakthi !!!

On Thu, Aug 1, 2019 at 3:34 PM 张铎(Duo Zhang)  wrote:

> Congratulations!
>
> Pankaj kr wrote on Thu, Aug 1, 2019, at 5:56 PM:
>
> > Congratulation Sakthi..!!
> >
> > Regards,
> > Pankaj
> >
> > -Original Message-
> > From: Sean Busbey [mailto:bus...@apache.org]
> > Sent: 01 August 2019 05:35
> > To: user@hbase.apache.org; dev 
> > Subject: [ANNOUNCE] new HBase committer Sakthi
> >
> > On behalf of the HBase PMC, I'm pleased to announce that Sakthi has
> > accepted our invitation to become an HBase committer.
> >
> > We'd like to thank Sakthi for all of his diligent contributions to the
> > project thus far. We look forward to his continued participation in our
> > community.
> >
> > Congrats and welcome Sakthi!
> >
>


Re: [ANNOUNCE] Please welcome Jan Hentschel to the Apache HBase PMC

2019-05-08 Thread ramkrishna vasudevan
Congratulations Jan !!

On Thu, May 9, 2019 at 9:33 AM Yu Li  wrote:

> Congratulations and welcome, Jan!
>
> Best Regards,
> Yu
>
>
> On Thu, 9 May 2019 at 10:06, OpenInx  wrote:
>
> > Congratulation, Jan! Thanks for your work.
> >
> > On Thu, May 9, 2019 at 6:08 AM Artem Ervits 
> wrote:
> >
> > > Well deserved Jan!
> > >
> > > On Wed, May 8, 2019, 5:37 PM Sean Busbey  wrote:
> > >
> > > > On behalf of the Apache HBase PMC I am pleased to announce that Jan
> > > > Hentschel has accepted our invitation to become a PMC member on the
> > > > HBase project. We appreciate Jan stepping up to take more
> > > > responsibility in the HBase project.
> > > >
> > > > Please join me in welcoming Jan to the HBase PMC!
> > > >
> > > >
> > > >
> > > > As a reminder, if anyone would like to nominate another person as a
> > > > committer or PMC member, even if you are not currently a committer or
> > > > PMC member, you can always drop a note to priv...@hbase.apache.org
> to
> > > > let us know.
> > > >
> > > > -busbey
> > > >
> > >
> >
>


Re: Debugging High I/O Wait

2019-04-10 Thread ramkrishna vasudevan
Hi

I think you can try to take that fix into your version. IMO the SSD
fragmentation issue may also be due to the way the bucket allocator works.

Regards
Ram

On Thu, Apr 11, 2019 at 4:20 AM Srinidhi Muppalla 
wrote:

> Thanks for the suggestions! The total size of the bucket cache is 72.00
> GB. We generally have close to half of that used when the issue happens. We
> are using only one file path for the bucket cache. We will try using
> multiple paths and also adding an additional disk to our region-servers as
> suggested.
>
> When looking through the HBase Jira I came across this ticket --
> https://issues.apache.org/jira/browse/HBASE-16630 that affects the
> version of HBase that we are running. From what I can tell, this bug + fix
> looks like it only applies when the Bucket Cache is running in memory. Is
> there an equivalent bug + fix for a Bucket Cache running in file mode?
>
> Thanks,
> Srinidhi
>
> On 4/5/19, 5:14 AM, "Anoop John"  wrote:
>
> Hi Srinidhi
> You have a file-mode bucket cache. What is the size of the cache? Did you
> configure a single file path for the cache, or more than one? If the
> former, splitting the cache into multiple files (paths can be given
> comma-separated in the config) may help.
>
> Anoop
>
> On Fri, Apr 5, 2019 at 2:58 AM Srinidhi Muppalla  >
> wrote:
>
> > After some more digging, I discovered that during the time that the
> RS is
> > stuck the kernel message buffer outputted only this message
> >
> > "[1031214.108110] XFS: java(6522) possible memory allocation
> deadlock size
> > 32944 in kmem_alloc (mode:0x2400240)"
> >
> > From my reading online, the cause of this error appears to generally
> be
> > excessive memory and file fragmentation. We haven't changed the mslab
> > config and we are running HBase 1.3.0 so it should be running by
> default.
> > The issue tends to arise consistently and regularly (every 10 or so
> days)
> > and once one node is affected other nodes start to follow after a few
> > hours. What could be causing this to happen and is there any way to
> prevent
> > or minimize fragmentation?
> >
> > Best,
> > Srinidhi
> >
> > On 3/29/19, 11:02 AM, "Srinidhi Muppalla" 
> wrote:
> >
> > Stack and Ram,
> >
> > Attached the thread dumps. 'Jstack normal' is the normal node.
> 'Jstack
> > problematic' was taken when the node was stuck.
> >
> > We don't have full I/O stats for the problematic node.
> Unfortunately,
> > it was impacting production so we had to recreate the cluster as
> soon as
> > possible and couldn't get full data. I attached the dashboards with
> the
> > wait I/O and other CPU stats. Thanks for helping look into the issue!
> >
> > Best,
> > Srinidhi
> >
> >
> >
> > On 3/28/19, 2:41 PM, "Stack"  wrote:
> >
> > Mind putting up a thread dump?
> >
> > How many spindles?
> >
> > If you compare the i/o stats between a good RS and a stuck
> one,
> > how do they
> > compare?
> >
> > Thanks,
> > S
> >
> >
> > On Wed, Mar 27, 2019 at 11:57 AM Srinidhi Muppalla <
> > srinid...@trulia.com>
> > wrote:
> >
> > > Hello,
> > >
> > > We've noticed an issue in our HBase cluster where one of
> the
> > > region-servers has a spike in I/O wait associated with a
> spike
> > in Load for
> > > that node. As a result, our request times to the cluster
> increase
> > > dramatically. Initially, we suspected that we were
> experiencing
> > > hotspotting, but even after temporarily blocking requests
> to the
> > highest
> > > volume regions on that region-servers the issue persisted.
> > Moreover, when
> > > looking at request counts to the regions on the
> region-server
> > from the
> > > HBase UI, they were not particularly high and our own
> > application level
> > > metrics on the requests we were making were not very high
> > either. From
> > > looking at a thread dump of the region-server, it appears
> that
> > our get and
> > > scan requests are getting stuck when trying to read from
> the
> > blocks in our
> > > bucket cache leaving the threads in a 'runnable' state. For
> > context, we are
> > > running HBase 1.30 on a cluster backed by S3 running on
> EMR and
> > our bucket
> > > cache is running in File mode. Our region-servers all have
> SSDs.
> > We have a
> > > combined cache with the L1 standard LRU cache and the L2
> file
> > mode bucket
> > > cache. Our Bucket Cache utilization is less than 50% of the
> > allocated space.
> > >
> >  

Re: Debugging High I/O Wait

2019-04-09 Thread ramkrishna vasudevan
Hi Srinidhi

I am not able to view the attachments for some reason. However, as Anoop
suggested, can you try multiple paths for the bucket cache? As said in the
first email, a separate SSD for WAL writes and multiple file paths for the
bucket cache SSD may help. Here again, the multiple paths can also be on
multiple devices.
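
A minimal sketch of what that could look like (values are illustrative and
would normally live in hbase-site.xml; the exact engine prefix for multiple
files can differ between releases, so check the reference guide for your
version):

```
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

// Illustrative bucket cache settings: a file-mode engine spread over several
// paths, ideally on separate devices, as suggested in this thread.
public class BucketCacheMultiPathSketch {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // Comma-separated file paths for the file-backed IOEngine (check whether
    // your release expects the "file:" or "files:" prefix).
    conf.set("hbase.bucketcache.ioengine",
        "files:/mnt/ssd1/bucketcache.data,/mnt/ssd2/bucketcache.data");
    // Total cache size in MB (example value; size it to the backing volumes).
    conf.set("hbase.bucketcache.size", "98304");
  }
}
```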

Regards
Ram

On Fri, Apr 5, 2019 at 2:58 AM Srinidhi Muppalla 
wrote:

> After some more digging, I discovered that during the time that the RS is
> stuck the kernel message buffer outputted only this message
>
> "[1031214.108110] XFS: java(6522) possible memory allocation deadlock size
> 32944 in kmem_alloc (mode:0x2400240)"
>
> From my reading online, the cause of this error appears to generally be
> excessive memory and file fragmentation. We haven't changed the mslab
> config and we are running HBase 1.3.0 so it should be running by default.
> The issue tends to arise consistently and regularly (every 10 or so days)
> and once one node is affected other nodes start to follow after a few
> hours. What could be causing this to happen and is there any way to prevent
> or minimize fragmentation?
>
> Best,
> Srinidhi
>
> On 3/29/19, 11:02 AM, "Srinidhi Muppalla"  wrote:
>
> Stack and Ram,
>
> Attached the thread dumps. 'Jstack normal' is the normal node. 'Jstack
> problematic' was taken when the node was stuck.
>
> We don't have full I/O stats for the problematic node. Unfortunately,
> it was impacting production so we had to recreate the cluster as soon as
> possible and couldn't get full data. I attached the dashboards with the
> wait I/O and other CPU stats. Thanks for helping look into the issue!
>
> Best,
> Srinidhi
>
>
>
> On 3/28/19, 2:41 PM, "Stack"  wrote:
>
> Mind putting up a thread dump?
>
> How many spindles?
>
> If you compare the i/o stats between a good RS and a stuck one,
> how do they
> compare?
>
> Thanks,
> S
>
>
> On Wed, Mar 27, 2019 at 11:57 AM Srinidhi Muppalla <
> srinid...@trulia.com>
> wrote:
>
> > Hello,
> >
> > We've noticed an issue in our HBase cluster where one of the
> > region-servers has a spike in I/O wait associated with a spike
> in Load for
> > that node. As a result, our request times to the cluster increase
> > dramatically. Initially, we suspected that we were experiencing
> > hotspotting, but even after temporarily blocking requests to the
> highest
> > volume regions on that region-servers the issue persisted.
> Moreover, when
> > looking at request counts to the regions on the region-server
> from the
> > HBase UI, they were not particularly high and our own
> application level
> > metrics on the requests we were making were not very high
> either. From
> > looking at a thread dump of the region-server, it appears that
> our get and
> > scan requests are getting stuck when trying to read from the
> blocks in our
> > bucket cache leaving the threads in a 'runnable' state. For
> context, we are
> > running HBase 1.30 on a cluster backed by S3 running on EMR and
> our bucket
> > cache is running in File mode. Our region-servers all have SSDs.
> We have a
> > combined cache with the L1 standard LRU cache and the L2 file
> mode bucket
> > cache. Our Bucket Cache utilization is less than 50% of the
> allocated space.
> >
> > We suspect that part of the issue is our disk space utilization
> on the
> > region-server as our max disk space utilization also increased
> as this
> > happened. What things can we do to minimize disk space
> utilization? The
> > actual HFiles are on S3 -- only the cache, application logs, and
> write
> > ahead logs are on the region-servers. Other than the disk space
> > utilization, what factors could cause high I/O wait in HBase and
> is there
> > anything we can do to minimize it?
> >
> > Right now, the only thing that works is terminating and
> recreating the
> > cluster (which we can do safely because it's S3 backed).
> >
> > Thanks!
> > Srinidhi
> >
>
>
>
>
>


Re: Debugging High I/O Wait

2019-03-28 Thread ramkrishna vasudevan
Hi Srinidhi

Thanks for the details. As Stack said, can you get a thread dump and I/O
stats while this issue happens? You can compare them with the case when the
RS is in good shape.

If the SSD writes and reads are the reason the bucket cache reads perform
slower, then it might be better to have a separate SSD. But let's first
check the dumps to know if that is the real reason.

Regards
Ram

On Fri, Mar 29, 2019 at 3:11 AM Stack  wrote:

> Mind putting up a thread dump?
>
> How many spindles?
>
> If you compare the i/o stats between a good RS and a stuck one, how do they
> compare?
>
> Thanks,
> S
>
>
> On Wed, Mar 27, 2019 at 11:57 AM Srinidhi Muppalla 
> wrote:
>
> > Hello,
> >
> > We've noticed an issue in our HBase cluster where one of the
> > region-servers has a spike in I/O wait associated with a spike in Load
> for
> > that node. As a result, our request times to the cluster increase
> > dramatically. Initially, we suspected that we were experiencing
> > hotspotting, but even after temporarily blocking requests to the highest
> > volume regions on that region-servers the issue persisted. Moreover, when
> > looking at request counts to the regions on the region-server from the
> > HBase UI, they were not particularly high and our own application level
> > metrics on the requests we were making were not very high either. From
> > looking at a thread dump of the region-server, it appears that our get
> and
> > scan requests are getting stuck when trying to read from the blocks in
> our
> > bucket cache leaving the threads in a 'runnable' state. For context, we
> are
> > running HBase 1.30 on a cluster backed by S3 running on EMR and our
> bucket
> > cache is running in File mode. Our region-servers all have SSDs. We have
> a
> > combined cache with the L1 standard LRU cache and the L2 file mode bucket
> > cache. Our Bucket Cache utilization is less than 50% of the allocated
> space.
> >
> > We suspect that part of the issue is our disk space utilization on the
> > region-server as our max disk space utilization also increased as this
> > happened. What things can we do to minimize disk space utilization? The
> > actual HFiles are on S3 -- only the cache, application logs, and write
> > ahead logs are on the region-servers. Other than the disk space
> > utilization, what factors could cause high I/O wait in HBase and is there
> > anything we can do to minimize it?
> >
> > Right now, the only thing that works is terminating and recreating the
> > cluster (which we can do safely because it's S3 backed).
> >
> > Thanks!
> > Srinidhi
> >
>


Re: Debugging High I/O Wait

2019-03-27 Thread ramkrishna vasudevan
Hi Srinidhi

As you said, the cache and WAL files are on the RS SSD drives. Do the cache
and the WAL files reside on separate SSDs or on the same SSD?

Are there also writes happening while these reads happen from the bucket
cache? Is your LRU cache big enough to hold all the index blocks?

Regards
Ram


On Thu, Mar 28, 2019 at 12:27 AM Srinidhi Muppalla 
wrote:

> Hello,
>
> We've noticed an issue in our HBase cluster where one of the
> region-servers has a spike in I/O wait associated with a spike in Load for
> that node. As a result, our request times to the cluster increase
> dramatically. Initially, we suspected that we were experiencing
> hotspotting, but even after temporarily blocking requests to the highest
> volume regions on that region-servers the issue persisted. Moreover, when
> looking at request counts to the regions on the region-server from the
> HBase UI, they were not particularly high and our own application level
> metrics on the requests we were making were not very high either. From
> looking at a thread dump of the region-server, it appears that our get and
> scan requests are getting stuck when trying to read from the blocks in our
> bucket cache leaving the threads in a 'runnable' state. For context, we are
> running HBase 1.30 on a cluster backed by S3 running on EMR and our bucket
> cache is running in File mode. Our region-servers all have SSDs. We have a
> combined cache with the L1 standard LRU cache and the L2 file mode bucket
> cache. Our Bucket Cache utilization is less than 50% of the allocated space.
>
> We suspect that part of the issue is our disk space utilization on the
> region-server as our max disk space utilization also increased as this
> happened. What things can we do to minimize disk space utilization? The
> actual HFiles are on S3 -- only the cache, application logs, and write
> ahead logs are on the region-servers. Other than the disk space
> utilization, what factors could cause high I/O wait in HBase and is there
> anything we can do to minimize it?
>
> Right now, the only thing that works is terminating and recreating the
> cluster (which we can do safely because it's S3 backed).
>
> Thanks!
> Srinidhi
>


Re: [ANNOUNCE] Please welcome Peter Somogyi to the HBase PMC

2019-01-21 Thread ramkrishna vasudevan
Congratulations Peter.

On Tue, Jan 22, 2019 at 11:48 AM Tamas Penzes 
wrote:

> Congrats Peter!
>
> On Tue, Jan 22, 2019, 02:36 Duo Zhang 
> > On behalf of the Apache HBase PMC I am pleased to announce that Peter
> > Somogyi
> > has accepted our invitation to become a PMC member on the Apache HBase
> > project.
> > We appreciate Peter stepping up to take more responsibility in the HBase
> > project.
> >
> > Please join me in welcoming Peter to the HBase PMC!
> >
>


Re: Pictures, Videos and Slides for HBaseConAsia2018

2018-09-01 Thread ramkrishna vasudevan
Thanks to Yu Li and Stack for enabling the videos and slides.

Regards
Ram

On Sat, Sep 1, 2018 at 12:15 PM Yu Li  wrote:

> Thank you for all the helps Stack! It must have cost lots of your time
> downloading the videos then uploading to youtube, uploading slides onto
> slideshare, and put up all together into the blog!
>
> The success of the conference is attributed to all PC members and supports
> from hbase community rather than me alone (smile)
>
> Best Regards,
> Yu
>
>
> On Sat, 1 Sep 2018 at 03:12, Stack  wrote:
>
> > On Fri, Aug 31, 2018 at 10:22 AM Chetan Khatri <
> > chetan.opensou...@gmail.com>
> > wrote:
> >
> > > Thank you Stack for everything.
> > >
> > >
> > Thanks Chetan but our mighty Yu Li did all the work!
> > S
> >
> >
> >
> > > On Fri, Aug 31, 2018 at 8:18 PM Stack  wrote:
> > >
> > > > I blew the cobwebs off our blog and put up a short note on the
> > conference
> > > > by Yu Li and myself. See here: https://blogs.apache.org/hbase/
> > > >
> > > > S
> > > >
> > > > On Wed, Aug 22, 2018 at 3:03 AM Yu Li  wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > HBaseConAsia2018 is successfully held on Aug. 17th in Beijing,
> China
> > > and
> > > > > please following below links for a quick review:
> > > > >
> > > > > Pictures:
> > > > >
> > >
> https://drive.google.com/drive/folders/1eGuNI029a78s_BdH37VsSr4uOalyLi5O
> > > > >
> > > > > Slides and Video recording:
> > > > > https://yq.aliyun.com/articles/626119
> > > > >
> > > > > Enjoy it and let's expect the next year!
> > > > >
> > > > > Yu - on behalf of HBaseConAsia2018 PC
> > > > >
> > > >
> > >
> >
>


Re: Scan problem

2018-03-18 Thread ramkrishna vasudevan
Hi

First regarding the scans,

Generally the data resides in the store files, which are in HDFS. So the
first scan that you are doing is probably reading from HDFS, which involves
disk reads. Once the blocks are read, they are cached in HBase's block
cache, so your further reads go through that and hence you see the speed-up
in the later scans.
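
For reference, a rough sketch of such a repeated bucket scan with a
RegionScanner inside a coprocessor (simplified; the Region handle would come
from the coprocessor environment). The first pass pays the HDFS/disk cost,
later passes are served from the block cache:

```
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.regionserver.Region;
import org.apache.hadoop.hbase.regionserver.RegionScanner;

// Simplified sketch: scanning one bucket of rows with a RegionScanner.
// The first scan reads blocks from HDFS; once they are in the BlockCache,
// repeated scans over the same range are much faster.
public class BucketScanSketch {
  static long scanOneBucket(Region region, byte[] start, byte[] stop)
      throws Exception {
    Scan scan = new Scan(start, stop);
    scan.setCacheBlocks(true); // keep the blocks cached for later scans
    long t0 = System.currentTimeMillis();
    long num = 0;
    try (RegionScanner scanner = region.getScanner(scan)) {
      List<Cell> cells = new ArrayList<>();
      boolean more;
      do {
        cells.clear();
        more = scanner.next(cells);
        num += cells.size();
      } while (more);
    }
    System.out.println("OneBucket Scan cost is : "
        + (System.currentTimeMillis() - t0) + " ms Num is : " + num);
    return num;
  }
}
```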

>> And another question about region split: I want to know which
RegionServer will load the new regions after the split.
Will it be the same one as the old region?
Yes. Generally the same region server hosts them.

In master the code is here,
https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/SplitTableRegionProcedure.java

You may need to understand the entire flow to know how the regions are
opened after a split.

Regards
Ram

On Sat, Mar 17, 2018 at 9:02 PM, Yang Zhang  wrote:

> Hello everyone
>
> I try to do many Scans using RegionScanner in a coprocessor, and every
> time the first Scan costs about 10 times more than the others.
> I don't know why this happens.
>
> OneBucket Scan cost is : 8794 ms Num is : 710
> OneBucket Scan cost is : 91 ms Num is : 776
> OneBucket Scan cost is : 87 ms Num is : 808
> OneBucket Scan cost is : 105 ms Num is : 748
> OneBucket Scan cost is : 68 ms Num is : 200
>
>
> And another question about region split: I want to know which RegionServer
> will load the new regions after the split.
> Will it be the same one as the old region? Does anyone know where I can
> find the code to learn about that?
>
>
> Thanks for your help
>


Re: [ANNOUNCE] New HBase committer Zach York

2018-03-08 Thread ramkrishna vasudevan
Congratulations Zach !!!

On Thu, Mar 8, 2018 at 11:03 AM, Yu Li  wrote:

> Congratulations, Zach!
>
> Best Regards,
> Yu
>
> On 8 March 2018 at 06:13, Mike Drob  wrote:
>
> > Congratulations, Zach!
> >
> > On Wed, Mar 7, 2018 at 4:03 PM, Andrew Purtell 
> > wrote:
> >
> > > Congratulations and welcome Zach!
> > >
> > >
> > > On Wed, Mar 7, 2018 at 8:27 AM, Sean Busbey  wrote:
> > >
> > > > On behalf of the Apache HBase PMC, I am pleased to announce that Zach
> > > > York has accepted the PMC's invitation to become a committer on the
> > > > project.
> > > >
> > > > We appreciate all of Zach's great work thus far and look forward to
> > > > continued involvement.
> > > >
> > > > Please join me in congratulating Zach!
> > > >
> > >
> > >
> > >
> > > --
> > > Best regards,
> > > Andrew
> > >
> > > Words like orphans lost among the crosstalk, meaning torn from truth's
> > > decrepit hands
> > >- A23, Crosstalk
> > >
> >
>


Re: How Long Will HBase Hold A Row Write Lock?

2018-03-04 Thread ramkrishna vasudevan
Hi Saad

Your argument here
>> The
>>theory is that since prefetch is an async operation, a lot of the reads in
>>the checkAndPut for the region in question start reading from S3 which is
>>slow. So the write lock obtained for the checkAndPut is held for a longer
>>duration than normal. This has cascading upstream effects. Does that sound
>>plausible?

Seems very plausible. So before the prefetch even happens for, say,
'block 1' - and you have already issued N checkAndPut calls for the rows in
that 'block 1' - all those checkAndPuts will have to read that block from
S3 to perform the get() and then apply the mutation.

This may happen for multiple threads at the same time, because we are not
sure when the prefetch will actually have completed. I don't know what the
general read characteristics are when a read happens from S3, but you could
compare how things work when a read happens from S3 versus after the
prefetch completes (by ensuring the same checkAndPut() is done, from cache
this time) to really see what difference S3 makes there.
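
For context, each such operation in the 1.x client API looks roughly like the
sketch below (table and values are placeholders); the compare step is the
read that has to fetch the block - from cache, or from S3 if prefetch has not
reached it yet - while the row lock is held:

```
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Placeholder sketch of a checkAndPut call (HBase 1.x client API). The server
// takes the row lock, reads the current value of cf:q to evaluate the check
// (the read that may go to S3 before prefetch finishes), then applies the Put
// only if the value matches.
public class CheckAndPutSketch {
  public static void main(String[] args) throws Exception {
    try (Connection conn =
             ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = conn.getTable(TableName.valueOf("my_table"))) {
      byte[] row = Bytes.toBytes("row-1");
      byte[] cf = Bytes.toBytes("cf");
      byte[] q = Bytes.toBytes("q");

      Put put = new Put(row).addColumn(cf, q, Bytes.toBytes("new-value"));
      boolean applied =
          table.checkAndPut(row, cf, q, Bytes.toBytes("expected-value"), put);
      System.out.println("mutation applied: " + applied);
    }
  }
}
```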

Regards
Ram

On Fri, Mar 2, 2018 at 2:57 AM, Saad Mufti  wrote:

> So after much investigation I can confirm:
>
> a) it was indeed one of the regions that was being compacted, major
> compaction in one case, minor compaction in another, the issue started just
> after compaction completed blowing away bucket cached blocks for the older
> HFile's
> b) in another case there was no compaction just a newly opened region in a
> region server that hadn't finished perfetching its pages from S3
>
> We have prefetch on open set to true. Our load is heavy on checkAndPut .The
> theory is that since prefetch is an async operation, a lot of the reads in
> the checkAndPut for the region in question start reading from S3 which is
> slow. So the write lock obtained for the checkAndPut is held for a longer
> duration than normal. This has cascading upstream effects. Does that sound
> plausible?
>
> The part I don't understand still is all the locks held are for the same
> region but are all for different rows. So once the prefetch is completed,
> shouldn't the problem clear up quickly? Or does the slow region slow down
> anyone trying to do checkAndPut on any row in the same region even after
> the prefetch has completed. That is, do the long held row locks prevent
> others from getting a row lock on a different row in the same region?
>
> In any case, we trying to use
> https://issues.apache.org/jira/browse/HBASE-16388 support in HBase 1.4.0
> to
> both insulate the app a bit from this situation and hoping that it will
> reduce pressure on the region server in question, allowing it to recover
> faster. I haven't quite tested that yet, any advice in the meantime would
> be appreciated.
>
> Cheers.
>
> 
> Saad
>
>
>
> On Thu, Mar 1, 2018 at 9:21 AM, Saad Mufti  wrote:
>
> > Actually it happened again while some minior compactions were running, so
> > don't think it related to our major compaction tool, which isn't even
> > running right now. I will try to capture a debug dump of threads and
> > everything while the event is ongoing. Seems to last at least half an
> hour
> > or so and sometimes longer.
> >
> > 
> > Saad
> >
> >
> > On Thu, Mar 1, 2018 at 7:54 AM, Saad Mufti  wrote:
> >
> >> Unfortunately I lost the stack trace overnight. But it does seem related
> >> to compaction, because now that the compaction tool is done, I don't see
> >> the issue anymore. I will run our incremental major compaction tool
> again
> >> and see if I can reproduce the issue.
> >>
> >> On the plus side the system stayed stable and eventually recovered,
> >> although it did suffer all those timeouts.
> >>
> >> 
> >> Saad
> >>
> >>
> >> On Wed, Feb 28, 2018 at 10:18 PM, Saad Mufti 
> >> wrote:
> >>
> >>> I'll paste a thread dump later, writing this from my phone  :-)
> >>>
> >>> So the same issue has happened at different times for different
> regions,
> >>> but I couldn't see that the region in question was the one being
> compacted,
> >>> either this time or earlier. Although I might have missed an earlier
> >>> correlation in the logs where the issue started just after the
> compaction
> >>> completed.
> >>>
> >>> Usually a compaction for this table's regions take around 5-10 minutes,
> >>> much less for its smaller column family which is block cache enabled,
> >>> around a minute or less, and 5-10 minutes for the much larger one for
> which
> >>> we have block cache disabled in the schema, because we don't ever read
> it
> >>> in the primary cluster. So the only impact on reads would be from that
> >>> smaller column family which takes less than a minute to compact.
> >>>
> >>> But the issue once started doesn't seem to recover for a long time,
> long
> >>> past when any compaction on the region itself could impact anything.
> The
> >>> compaction tool which is our own code has long since moved to other
> >>> regions.
> >>>
> >>> Cheers.
> >>>
> >>> 
> >>> Saad
> >>>
> >>>
> >>> On Wed, 

Re: Bucket Cache Failure In HBase 1.3.1

2018-02-25 Thread ramkrishna vasudevan
From the logs, it seems there was some issue with the file that was used
by the bucket cache. Probably the volume where the file was mounted had
some issues.
If you can confirm that, then this issue should be pretty straightforward.
If not, let us know and we can help.

Regards
Ram

On Sun, Feb 25, 2018 at 9:40 PM, Ted Yu  wrote:

> Here is related code for disabling bucket cache:
>
> if (this.ioErrorStartTime > 0) {
>
>   if (cacheEnabled && (now - ioErrorStartTime) > this.
> ioErrorsTolerationDuration) {
>
> LOG.error("IO errors duration time has exceeded " +
> ioErrorsTolerationDuration +
>
>   "ms, disabling cache, please check your IOEngine");
>
> disableCache();
>
> Can you search in the region server log to see if the above occurred ?
>
> Was this server the only one with disabled cache ?
>
> Cheers
>
> On Sun, Feb 25, 2018 at 6:20 AM, Saad Mufti 
> wrote:
>
> > HI,
> >
> > I am running an HBase 1.3.1 cluster on AWS EMR. The bucket cache is
> > configured to use two attached EBS disks of 50 GB each and I provisioned
> > the bucket cache to be a bit less than the total, at a total of 98 GB per
> > instance to be on the safe side. My tables have column families set to
> > prefetch on open.
> >
> > On some instances during cluster startup, the bucket cache starts
> throwing
> > errors, and eventually the bucket cache gets completely disabled on this
> > instance. The instance still stays up as a valid region server and the
> only
> > clue in the region server UI is that the bucket cache tab reports a count
> > of 0, and size of 0 bytes.
> >
> > I have already opened a ticket with AWS to see if there are problems with
> > the EBS volumes, but wanted to tap the open source community's hive-mind
> to
> > see what kind of problem would cause the bucket cache to get disabled. If
> > the application depends on the bucket cache for performance, wouldn't it
> be
> > better to just remove that region server from the pool if its bucket
> cache
> > cannot be recovered/enabled?
> >
> > The error look like the following. Would appreciate any insight, thank:
> >
> > 2018-02-25 01:12:47,780 ERROR [hfile-prefetch-1519513834057]
> > bucket.BucketCache: Failed reading block
> > 332b0634287f4c42851bc1a55ffe4042_1348128 from bucket cache
> > java.nio.channels.ClosedByInterruptException
> > at
> > java.nio.channels.spi.AbstractInterruptibleChannel.end(
> > AbstractInterruptibleChannel.java:202)
> > at sun.nio.ch.FileChannelImpl.readInternal(FileChannelImpl.
> > java:746)
> > at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:727)
> > at
> > org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine$
> > FileReadAccessor.access(FileIOEngine.java:219)
> > at
> > org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine.
> > accessFile(FileIOEngine.java:170)
> > at
> > org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine.
> > read(FileIOEngine.java:105)
> > at
> > org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.
> > getBlock(BucketCache.java:492)
> > at
> > org.apache.hadoop.hbase.io.hfile.CombinedBlockCache.
> > getBlock(CombinedBlockCache.java:84)
> > at
> > org.apache.hadoop.hbase.io.hfile.HFileReaderV2.
> > getCachedBlock(HFileReaderV2.java:279)
> > at
> > org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(
> > HFileReaderV2.java:420)
> > at
> > org.apache.hadoop.hbase.io.hfile.HFileReaderV2$1.run(
> > HFileReaderV2.java:209)
> > at
> > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> > at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> > at
> > java.util.concurrent.ScheduledThreadPoolExecutor$
> > ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> > at
> > java.util.concurrent.ScheduledThreadPoolExecutor$
> ScheduledFutureTask.run(
> > ScheduledThreadPoolExecutor.java:293)
> > at
> > java.util.concurrent.ThreadPoolExecutor.runWorker(
> > ThreadPoolExecutor.java:1149)
> > at
> > java.util.concurrent.ThreadPoolExecutor$Worker.run(
> > ThreadPoolExecutor.java:624)
> > at java.lang.Thread.run(Thread.java:748)
> >
> > and
> >
> > 2018-02-25 01:12:52,432 ERROR [regionserver/
> > ip-xx-xx-xx-xx.xx-xx-xx.us-east-1.ec2.xx.net/xx.xx.xx.xx:
> > 16020-BucketCacheWriter-7]
> > bucket.BucketCache: Failed writing to bucket cache
> > java.nio.channels.ClosedChannelException
> > at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.
> java:110)
> > at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:758)
> > at
> > org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine$
> > FileWriteAccessor.access(FileIOEngine.java:227)
> > at
> > org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine.
> > accessFile(FileIOEngine.java:170)
> > at
> > org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine.
> > write(FileIOEngine.java:116)
> > at
> > org.apache.hadoop.hbase.i

Re: [ANNOUNCE] New HBase committer Peter Somogyi

2018-02-23 Thread ramkrishna vasudevan
Congratulations Peter !!!

On Fri, Feb 23, 2018 at 3:40 PM, Peter Somogyi  wrote:

> Thank you very much everyone!
>
> On Thu, Feb 22, 2018 at 8:08 PM, Sean Busbey  wrote:
>
> > On behalf of the Apache HBase PMC, I am pleased to announce that Peter
> > Somogyi has accepted the PMC's invitation to become a committer on the
> > project.
> >
> > We appreciate all of Peter's great work thus far and look forward to
> > continued involvement.
> >
> > Please join me in congratulating Peter!
> >
>


Re: Uninitialized Message Exception thrown while getting values.

2018-01-17 Thread ramkrishna vasudevan
Hi

In which version of HBase do you get this problem? Do you have any
protobuf (pb) classpath issues?

Regards
Ram

On Thu, Jan 18, 2018 at 12:40 PM, Karthick Ram 
wrote:

> "UninitializedMessageException : Message missing required fields : region,
> get", is thrown while performing Get. Due to this all the Get requests to
> the same Region Server are getting stalled.
>
> com.google.protobuf.UninitializedMessageException: Message missing
> required fields : region, get
> at com.google.protobuf.AbstractMessage$Build.
> newUninitializedMessageException(AbstractMessage.java:770)
> at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$
> Builder.build(ClientProtos.java:6377)
> at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$
> Builder.build(ClientProtos.java:6309)
> at org.apache.hadoop.hbase.ipc.RpcServer$Connection.
> processRequest(RpcServer.java:1840)
> at org.apache.hadoop.hbase.ipc.RpcServer$Connection.
> processOneRpc(RpcServer.java:1775)
> at org.apache.hadoop.hbase.ipc.RpcServer$Connection.process(
> RPcServer.java:1623)
> at org.apache.hadoop.hbase.ipc.RpcServer$Connection.
> readAndProcess(RpcServer.java:1603)
> at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(
> RpcServer.java:861)
> at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.
> doRunLoop(RpcServer.java:643)
> at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(
> RpcServer.java:619)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
>


Re: [ANNOUNCE] Please welcome new HBase committer YI Liang

2017-12-20 Thread ramkrishna vasudevan
Congratulations Yi Liang!!!

On Thu, Dec 21, 2017 at 12:21 PM, Pankaj kr  wrote:

> Congratulations YI Liang..!!
>
> Thanks & Regards,
> Pankaj
>
> HUAWEI TECHNOLOGIES CO.LTD.
> Huawei Tecnologies India Pvt. Ltd.
> Near EPIP Industrial Area, Kundalahalli Village
> Whitefield, Bangalore-560066
> www.huawei.com
> 
> -
> This e-mail and its attachments contain confidential information from
> HUAWEI, which
> is intended only for the person or entity whose address is listed above.
> Any use of the
> information contained herein in any way (including, but not limited to,
> total or partial
> disclosure, reproduction, or dissemination) by persons other than the
> intended
> recipient(s) is prohibited. If you receive this e-mail in error, please
> notify the sender by
> phone or email immediately and delete it!
>
> -Original Message-
> From: Jerry He [mailto:jerry...@gmail.com]
> Sent: Thursday, December 21, 2017 5:37 AM
> To: dev; user@hbase.apache.org
> Subject: [ANNOUNCE] Please welcome new HBase committer YI Liang
>
> On behalf of the Apache HBase PMC, I am pleased to announce that Yi Liang
> has accepted the PMC's invitation to become a committer on the project.
>
> We appreciate all of Yi's great work thus far and look forward to his
> continued involvement.
>
> Please join me in congratulating Yi!
>
> --
> Thanks,
> Jerry
>


Re: Regionservers consuming too much ram in HDP 2.6.

2017-12-08 Thread ramkrishna vasudevan
Thanks Esteban.

Probably Esteban is right. I have not seen a case where we have configured
1G but the RS alone occupies 16G of RAM, so my knowledge in this case
could be limited.

Yes, the important thing is whether the RS hits an OOME. If that is not the
case, then the cluster is operable and does not go for aborts.

Regards
Ram




On Thu, Dec 7, 2017 at 9:40 PM, Esteban Gutierrez 
wrote:

> Hi Lalit,
>
> I don't think there should be any concern for you about RAM used by the
> JVM as long as it stays around the heap size + a few 100s of MBs for stack
> and DM as Ram mentioned. Even if there is no activity on HBase (no puts, no
> reads) objects will be created and destroyed from time to time and the JVM
> will be touching RAM pages during this process, and the OS will be marking
> those memory pages as used even if they have been GC'd by the JVM. As long
> as you don't see OOMEs in the JVM or you are not running low on available
> memory in the JVM, there is no issue.
>
> thanks,
> esteban.
>
>
> --
> Cloudera, Inc.
>
>
> On Thu, Dec 7, 2017 at 2:43 AM, ramkrishna vasudevan <
> ramkrishna.s.vasude...@gmail.com> wrote:
>
> > Your JDK parameters and configs. Heap configs set in hbase-env.sh etc.
> > The hbase-site.xml details related to memstore and cache.
> >
> > Regards
> > Ram
> >
> > On Thu, Dec 7, 2017 at 12:07 PM, Lalit Jadhav <
> lalit.jad...@nciportal.com>
> > wrote:
> >
> > > Hello,
> > >
> > > Yes, it is an interesting case.
> > > Ram, Sorry I cannot share whole details, but let me know params you
> need.
> > >
> > >
> > > On Thu, Dec 7, 2017 at 11:58 AM, ramkrishna vasudevan <
> > > ramkrishna.s.vasude...@gmail.com> wrote:
> > >
> > > > I don have any idea now from this. Seems to be an interesting case
> > since
> > > > none of the users have reported such an issue. I need to try out this
> > to
> > > > ascertain what is really happening.
> > > >
> > > > Can you paste your hbase-env.sh details and the hbase-site.xml just
> to
> > be
> > > > sure there is nothing overriding the configs?
> > > >
> > > > Regards
> > > > Ram
> > > >
> > > >
> > > > On Wed, Dec 6, 2017 at 5:41 PM, Lalit Jadhav <
> > lalit.jad...@nciportal.com
> > > >
> > > > wrote:
> > > >
> > > > > Are you sure the 16G taken up in the RAM is due to the region
> server?
> > > > > :Yes
> > > > >
> > > > > Are you having any other cache configuration like  bucket cache?
> > > > > :No
> > > > >
> > > > > Are you allocating any Direct_memory for the region servers?
> > > > > :No
> > > > >
> > > > > So when does this raise to 16G happen - is it after the regions are
> > > > created
> > > > > or even before them i.e just when you start the region server?
> > > > > : Even before regions are created.
> > > > >
> > > > > Which version of hbase is it?
> > > > > 1.1.2
> > > > >
> > > > > On Dec 6, 2017 4:02 PM, "ramkrishna vasudevan" <
> > > > > ramkrishna.s.vasude...@gmail.com> wrote:
> > > > >
> > > > > > Few more questions,
> > > > > > Are you sure the 16G taken up in the RAM is due to the region
> > server?
> > > > Are
> > > > > > you having any other cache configuration like  bucket cache?
> > > > > > Are you allocating any Direct_memory for the region servers?
> > > > > >
> > > > > > So when does this raise to 16G happen - is it after the regions
> are
> > > > > created
> > > > > > or even before them i.e just when you start the region server?
> > Which
> > > > > > version of hbase is it?
> > > > > >
> > > > > > Regards
> > > > > > Ram
> > > > > >
> > > > > > On Wed, Dec 6, 2017 at 3:57 PM, Lalit Jadhav <
> > > > lalit.jad...@nciportal.com
> > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Ramkrishna,
> > > > > > >
> > > > > > > Thanks for reply,
> > > > > > > Right now I am not performing any operation on HBase(it is
> idle),
> > > > > Still,
> > &g

Re: Regionservers consuming too much ram in HDP 2.6.

2017-12-07 Thread ramkrishna vasudevan
Your JDK parameters and configs: the heap configs set in hbase-env.sh etc.,
and the hbase-site.xml details related to memstore and cache.

Regards
Ram
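
For reference, the kind of entries being asked about typically look like
the following (the 1 GB heap values simply mirror what was reported in this
thread; everything here is illustrative, not a recommendation):

  # hbase-env.sh
  export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -Xms1g -Xmx1g"
  export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -Xms1g -Xmx1g"

It is also worth sharing the hbase-site.xml values for
hbase.regionserver.global.memstore.size and hfile.block.cache.size
(both usually default to 0.4 of the heap).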

On Thu, Dec 7, 2017 at 12:07 PM, Lalit Jadhav 
wrote:

> Hello,
>
> Yes, it is an interesting case.
> Ram, Sorry I cannot share whole details, but let me know params you need.
>
>
> On Thu, Dec 7, 2017 at 11:58 AM, ramkrishna vasudevan <
> ramkrishna.s.vasude...@gmail.com> wrote:
>
> > I don have any idea now from this. Seems to be an interesting case since
> > none of the users have reported such an issue. I need to try out this to
> > ascertain what is really happening.
> >
> > Can you paste your hbase-env.sh details and the hbase-site.xml just to be
> > sure there is nothing overriding the configs?
> >
> > Regards
> > Ram
> >
> >
> > On Wed, Dec 6, 2017 at 5:41 PM, Lalit Jadhav  >
> > wrote:
> >
> > > Are you sure the 16G taken up in the RAM is due to the region server?
> > > :Yes
> > >
> > > Are you having any other cache configuration like  bucket cache?
> > > :No
> > >
> > > Are you allocating any Direct_memory for the region servers?
> > > :No
> > >
> > > So when does this raise to 16G happen - is it after the regions are
> > created
> > > or even before them i.e just when you start the region server?
> > > : Even before regions are created.
> > >
> > > Which version of hbase is it?
> > > 1.1.2
> > >
> > > On Dec 6, 2017 4:02 PM, "ramkrishna vasudevan" <
> > > ramkrishna.s.vasude...@gmail.com> wrote:
> > >
> > > > Few more questions,
> > > > Are you sure the 16G taken up in the RAM is due to the region server?
> > Are
> > > > you having any other cache configuration like  bucket cache?
> > > > Are you allocating any Direct_memory for the region servers?
> > > >
> > > > So when does this raise to 16G happen - is it after the regions are
> > > created
> > > > or even before them i.e just when you start the region server? Which
> > > > version of hbase is it?
> > > >
> > > > Regards
> > > > Ram
> > > >
> > > > On Wed, Dec 6, 2017 at 3:57 PM, Lalit Jadhav <
> > lalit.jad...@nciportal.com
> > > >
> > > > wrote:
> > > >
> > > > > Hi Ramkrishna,
> > > > >
> > > > > Thanks for reply,
> > > > > Right now I am not performing any operation on HBase(it is idle),
> > > Still,
> > > > > utilization is 16GB. But when I shut down them, It frees this
> memory.
> > > > >
> > > > > On Wed, Dec 6, 2017 at 2:41 PM, ramkrishna vasudevan <
> > > > > ramkrishna.s.vasude...@gmail.com> wrote:
> > > > >
> > > > > > Hi Lalith
> > > > > >
> > > > > > Seems you have configured very minimum heap space.
> > > > > > So when you say RAM size is increasing  what are the operations
> > that
> > > > you
> > > > > > are performing when the memory increases. I can see heap is only
> 1G
> > > but
> > > > > > still your  memory is 16G.Are you sure that 16G is only due to
> > Region
> > > > > > servers? Are you having heavy writes or reads during that time.
> > > > > >
> > > > > > Regards
> > > > > > Ram
> > > > > >
> > > > > > On Wed, Dec 6, 2017 at 2:31 PM, Lalit Jadhav <
> > > > lalit.jad...@nciportal.com
> > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Adding More info,
> > > > > > >
> > > > > > > Regions per region server : 13-15 Regions per RS
> > > > > > >
> > > > > > > memory region server is taking : 16GB
> > > > > > >
> > > > > > > Memstore size : 64 MB
> > > > > > >
> > > > > > > configured heap for Master : 1 GB
> > > > > > >
> > > > > > > configured heap for RegionServer : 1 GB
> > > > > > >
> > > > > > > On Tue, Dec 5, 2017 at 5:40 PM, Lalit Jadhav <
> > > > > lalit.jad...@nciportal.com
> > > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hello All,
> > > > > > > >
> > > > > > > > When we do any operations on the database(HBase),
> RegionServers
> > > are
> > > > > > > taking
> > > > > > > > too much ram and not releases until we restart them. Is there
> > any
> > > > > > > parameter
> > > > > > > > or property to release ram or to restrict RegionServers to
> take
> > > > this
> > > > > > much
> > > > > > > > of memory? Help will be appreciated.
> > > > > > > >
> > > > > > > > --
> > > > > > > > Regards,
> > > > > > > > Lalit Jadhav
> > > > > > > > Network Component Private Limited.
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Regards,
> > > > > > > Lalit Jadhav
> > > > > > > Network Component Private Limited.
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Regards,
> > > > > Lalit Jadhav
> > > > > Network Component Private Limited.
> > > > >
> > > >
> > >
> >
>
>
>
> --
> Regards,
> Lalit Jadhav
> Network Component Private Limited.
>


Re: Regionservers consuming too much ram in HDP 2.6.

2017-12-06 Thread ramkrishna vasudevan
I don't have any ideas from this so far. It seems to be an interesting case,
since none of the users have reported such an issue. I need to try this out
to ascertain what is really happening.

Can you paste your hbase-env.sh details and the hbase-site.xml just to be
sure there is nothing overriding the configs?

Regards
Ram


On Wed, Dec 6, 2017 at 5:41 PM, Lalit Jadhav 
wrote:

> Are you sure the 16G taken up in the RAM is due to the region server?
> :Yes
>
> Are you having any other cache configuration like  bucket cache?
> :No
>
> Are you allocating any Direct_memory for the region servers?
> :No
>
> So when does this raise to 16G happen - is it after the regions are created
> or even before them i.e just when you start the region server?
> : Even before regions are created.
>
> Which version of hbase is it?
> 1.1.2
>
> On Dec 6, 2017 4:02 PM, "ramkrishna vasudevan" <
> ramkrishna.s.vasude...@gmail.com> wrote:
>
> > Few more questions,
> > Are you sure the 16G taken up in the RAM is due to the region server? Are
> > you having any other cache configuration like  bucket cache?
> > Are you allocating any Direct_memory for the region servers?
> >
> > So when does this raise to 16G happen - is it after the regions are
> created
> > or even before them i.e just when you start the region server? Which
> > version of hbase is it?
> >
> > Regards
> > Ram
> >
> > On Wed, Dec 6, 2017 at 3:57 PM, Lalit Jadhav  >
> > wrote:
> >
> > > Hi Ramkrishna,
> > >
> > > Thanks for reply,
> > > Right now I am not performing any operation on HBase(it is idle),
> Still,
> > > utilization is 16GB. But when I shut down them, It frees this memory.
> > >
> > > On Wed, Dec 6, 2017 at 2:41 PM, ramkrishna vasudevan <
> > > ramkrishna.s.vasude...@gmail.com> wrote:
> > >
> > > > Hi Lalith
> > > >
> > > > Seems you have configured very minimum heap space.
> > > > So when you say RAM size is increasing  what are the operations that
> > you
> > > > are performing when the memory increases. I can see heap is only 1G
> but
> > > > still your  memory is 16G.Are you sure that 16G is only due to Region
> > > > servers? Are you having heavy writes or reads during that time.
> > > >
> > > > Regards
> > > > Ram
> > > >
> > > > On Wed, Dec 6, 2017 at 2:31 PM, Lalit Jadhav <
> > lalit.jad...@nciportal.com
> > > >
> > > > wrote:
> > > >
> > > > > Adding More info,
> > > > >
> > > > > Regions per region server : 13-15 Regions per RS
> > > > >
> > > > > memory region server is taking : 16GB
> > > > >
> > > > > Memstore size : 64 MB
> > > > >
> > > > > configured heap for Master : 1 GB
> > > > >
> > > > > configured heap for RegionServer : 1 GB
> > > > >
> > > > > On Tue, Dec 5, 2017 at 5:40 PM, Lalit Jadhav <
> > > lalit.jad...@nciportal.com
> > > > >
> > > > > wrote:
> > > > >
> > > > > > Hello All,
> > > > > >
> > > > > > When we do any operations on the database(HBase), RegionServers
> are
> > > > > taking
> > > > > > too much ram and not releases until we restart them. Is there any
> > > > > parameter
> > > > > > or property to release ram or to restrict RegionServers to take
> > this
> > > > much
> > > > > > of memory? Help will be appreciated.
> > > > > >
> > > > > > --
> > > > > > Regards,
> > > > > > Lalit Jadhav
> > > > > > Network Component Private Limited.
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Regards,
> > > > > Lalit Jadhav
> > > > > Network Component Private Limited.
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Regards,
> > > Lalit Jadhav
> > > Network Component Private Limited.
> > >
> >
>


Re: Regionservers consuming too much ram in HDP 2.6.

2017-12-06 Thread ramkrishna vasudevan
A few more questions:
Are you sure the 16G taken up in RAM is due to the region server? Are
you having any other cache configuration like bucket cache?
Are you allocating any direct memory for the region servers?

So when does this rise to 16G happen - is it after the regions are created,
or even before them, i.e. just when you start the region server? Which
version of HBase is it?

Regards
Ram

On Wed, Dec 6, 2017 at 3:57 PM, Lalit Jadhav 
wrote:

> Hi Ramkrishna,
>
> Thanks for reply,
> Right now I am not performing any operation on HBase(it is idle), Still,
> utilization is 16GB. But when I shut down them, It frees this memory.
>
> On Wed, Dec 6, 2017 at 2:41 PM, ramkrishna vasudevan <
> ramkrishna.s.vasude...@gmail.com> wrote:
>
> > Hi Lalith
> >
> > Seems you have configured very minimum heap space.
> > So when you say RAM size is increasing  what are the operations that you
> > are performing when the memory increases. I can see heap is only 1G but
> > still your  memory is 16G.Are you sure that 16G is only due to Region
> > servers? Are you having heavy writes or reads during that time.
> >
> > Regards
> > Ram
> >
> > On Wed, Dec 6, 2017 at 2:31 PM, Lalit Jadhav  >
> > wrote:
> >
> > > Adding More info,
> > >
> > > Regions per region server : 13-15 Regions per RS
> > >
> > > memory region server is taking : 16GB
> > >
> > > Memstore size : 64 MB
> > >
> > > configured heap for Master : 1 GB
> > >
> > > configured heap for RegionServer : 1 GB
> > >
> > > On Tue, Dec 5, 2017 at 5:40 PM, Lalit Jadhav <
> lalit.jad...@nciportal.com
> > >
> > > wrote:
> > >
> > > > Hello All,
> > > >
> > > > When we do any operations on the database(HBase), RegionServers are
> > > taking
> > > > too much ram and not releases until we restart them. Is there any
> > > parameter
> > > > or property to release ram or to restrict RegionServers to take this
> > much
> > > > of memory? Help will be appreciated.
> > > >
> > > > --
> > > > Regards,
> > > > Lalit Jadhav
> > > > Network Component Private Limited.
> > > >
> > >
> > >
> > >
> > > --
> > > Regards,
> > > Lalit Jadhav
> > > Network Component Private Limited.
> > >
> >
>
>
>
> --
> Regards,
> Lalit Jadhav
> Network Component Private Limited.
>


Re: Regionservers consuming too much ram in HDP 2.6.

2017-12-06 Thread ramkrishna vasudevan
Hi Lalit

It seems you have configured very minimal heap space.
So when you say the RAM usage is increasing, what are the operations that
you are performing when the memory increases? I can see the heap is only 1G,
but your memory is still 16G. Are you sure that the 16G is only due to the
region servers? Are you having heavy writes or reads during that time?

Regards
Ram

On Wed, Dec 6, 2017 at 2:31 PM, Lalit Jadhav 
wrote:

> Adding More info,
>
> Regions per region server : 13-15 Regions per RS
>
> memory region server is taking : 16GB
>
> Memstore size : 64 MB
>
> configured heap for Master : 1 GB
>
> configured heap for RegionServer : 1 GB
>
> On Tue, Dec 5, 2017 at 5:40 PM, Lalit Jadhav 
> wrote:
>
> > Hello All,
> >
> > When we do any operations on the database(HBase), RegionServers are
> taking
> > too much ram and not releases until we restart them. Is there any
> parameter
> > or property to release ram or to restrict RegionServers to take this much
> > of memory? Help will be appreciated.
> >
> > --
> > Regards,
> > Lalit Jadhav
> > Network Component Private Limited.
> >
>
>
>
> --
> Regards,
> Lalit Jadhav
> Network Component Private Limited.
>


Re: [ANNOUNCE] New HBase committer Zheng Hu

2017-10-22 Thread ramkrishna vasudevan
Welcome Zheng and congratulations !!!

On Mon, Oct 23, 2017 at 11:48 AM, Duo Zhang  wrote:

> On behalf of the Apache HBase PMC, I am pleased to announce that Zheng Hu
> has accepted the PMC's invitation to become a committer on the project. We
> appreciate all of Zheng's generous contributions thus far and look forward
> to his continued involvement.
>
> Congratulations and welcome, Zheng!
>


Re: HBase Replication vs Read Replicas

2017-10-10 Thread ramkrishna vasudevan
Hi Kahlil

Your understanding is right: HBase replication is across data centres,
whereas HBase read replicas are more for providing faster availability for
reads.

>>not be the proper tool to use here since it appears to have higher
replication latency and be more catered towards Disaster Recovery than High
Availability.
Yes you are right here.

As for read replicas, however, I am not sure we can have them across data
centres. You can have your region servers hosted across different racks,
and the region replicas are created in such a way that you could have your
replica regions in different racks, so even if a rack is down your data can
be served from the other replica regions.


>>My use case is that I have a table I'd like to replicate between data
centers A and B. It is OK if all writes can only go through one data center
(say, A). However, all clients should be able to read from either A or B.
In particular, I'd like for some clients to be able to specifically say
they'd like to read from A and others to say they'd like to read from B,
for any given row key.

To answer this: a while back we had a feature developed called cross-site
big table, which allows you to configure two data centres and lets your
client write to both of them; as you wanted, the reads can be specifically
targeted at data centre A or B by mentioning that in the client API. There
will be some lag as the replication has to happen, but it allows you to
manage your writes and reads across clusters using a single client.

https://www.slideshare.net/HBaseCon/ecosystem-session-3.

Regards
Ram
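
For completeness, a small sketch of how a client performs a
timeline-consistent read once a table has region replicas (table and row
names are placeholders; this only illustrates the read-replica side, not
cross-site big table):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.TableName;
  import org.apache.hadoop.hbase.client.*;
  import org.apache.hadoop.hbase.util.Bytes;

  Configuration conf = HBaseConfiguration.create();
  try (Connection conn = ConnectionFactory.createConnection(conf);
       Table table = conn.getTable(TableName.valueOf("my_table"))) {
    Get get = new Get(Bytes.toBytes("some-row"));
    get.setConsistency(Consistency.TIMELINE); // allow a secondary replica to answer
    Result result = table.get(get);
    if (result.isStale()) {
      // answered by a secondary replica; it may lag the primary slightly
    }
  }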


On Tue, Oct 10, 2017 at 8:25 PM, Kahlil Oppenheimer <
kahliloppenhei...@gmail.com> wrote:

> Hi All,
>
> I have some questions about when to use HBase Replication vs. HBase Read
> Replicas. They seem to accomplish similar-ish things, and I'm trying to
> figure out which I should use.
>
> I've read through the documentation, but I am confused on a few points. It
> seems that HBase Replication can have very high latency for replication (on
> a magnitude of minutes). My application can tolerate a rough maximum of 60s
> of replication latency, so that would be problematic for me.
>
> Read Replicas seem to have quite low (configurable) replication latency,
> but do not seem to lend themselves cross-datacenter replication. For
> instance, having Replica 1 in Datacenter A and Replica 2 in Datacenter B,
> allowing clients to say "Read only from Datacenter A" vs. "Read only from
> Datacenter B".
>
> My use case is that I have a table I'd like to replicate between data
> centers A and B. It is OK if all writes can only go through one data center
> (say, A). However, all clients should be able to read from either A or B.
> In particular, I'd like for some clients to be able to specifically say
> they'd like to read from A and others to say they'd like to read from B,
> for any given row key.
>
> It is also OK if the data coming from one of these reads can be stale, so
> long as it is no more than 60s stale, and that the client has some
> indication that the data may not be up to date.
>
> Because of the 60s stale constraint, it seems like HBase Replication may
> not be the proper tool to use here since it appears to have higher
> replication latency and be more catered towards Disaster Recovery than High
> Availability.
>
> Read Replicas seem like the proper solution here, but the Timeline
> consistency model doesn't seem to let me say "Read from datacenter B", it
> just says "Try to read from all data-centers and return B if it gets back
> first". Furthermore, it doesn't seem intuitive to force the region replicas
> to be hosted on datacenter B.
>
> What would you all recommend? Am I misunderstanding either of these HBase
> features, or is there a more intuitive feature of HBase I should reference
> to solve this problem?
>
> For what it's worth, I'm running the CDH-5.9-1.2.0 version of HBase.
>
> Many thanks,
> Kahlil
>


Re: Hbase regionserver heap occupancy because of MSLAB

2017-10-04 Thread ramkrishna vasudevan
Hi

When MSLAB is enabled, it can run in two modes - one with a pool and
another without a pool.

I believe you are using MSLAB without the pool in this case. So only when
writes happen do we create 2MB chunks and write data to them. As and when a
chunk is full we keep creating new chunks.

So when a memstore snapshot happens, we flush out all those chunks and
again keep creating chunks to continue with the writes.
So in your case, when there are empty regions - yes, there won't be any
chunks being created.

>>since data are removed after
configured TTL. Would like to know would that cause unnecessary heap usage?
If there is an active write happening, then the last chunk would be around
till that is flushed and cleared off. How long is the TTL? Is it a few
seconds or minutes, such that TTL expiry can even happen on the memstore
data itself?

Regards
Ram
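
For reference, a minimal sketch of the MSLAB-related hbase-site.xml settings
discussed above (the values shown are the usual defaults; please verify them
against your exact version):

  <property>
    <name>hbase.hregion.memstore.mslab.enabled</name>
    <value>true</value>  <!-- MSLAB on; chunks are allocated lazily on first write -->
  </property>
  <property>
    <name>hbase.hregion.memstore.mslab.chunksize</name>
    <value>2097152</value>  <!-- 2 MB per chunk -->
  </property>
  <property>
    <name>hbase.hregion.memstore.chunkpool.maxsize</name>
    <value>0.0</value>  <!-- 0.0 means no chunk pool, i.e. MSLAB "without pool" -->
  </property>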

On Thu, Oct 5, 2017 at 9:25 AM, Stack  wrote:

> On Wed, Oct 4, 2017 at 9:42 AM, Subash Kunjupillai 
> wrote:
>
> > As sited in  HBase Doc
> >  > -cdh5.1.3/book/regions.arch.html#too_many_regions>
> > , I understand that MSLAB of each memstore will occupy 2MB(default) of
> the
> > region server heap memory. So if there are 2000 regions in my region
> > server,
> > 3.9GB of my heap would be occupied.
> >
> > My question is, memory for MSLAB will be allocated as soon as the regions
> > come online or only if data is written to memstore?
>
>
> It looks like the latter, lazy allocation on first Cell add.
> See HeapMemStoreLAB alloc called out of DefaultMemStore.
>
> If up for experimenting, you could try running without MSLAB with G1GC
> enabled; various report G1GC mitigates need for MSLAB (I've not tried this
> myself).
>
>
>
> > Because I've a cluster
> > where 1000 of empty regions in a table, since data are removed after
> > configured TTL. Would like to know would that cause unnecessary heap
> usage?
> >
> >
> >
> As I read it, it would; the last Chunk allocated in DefaultMemStore would
> stick around.
>
> St.Ack
>
>
>
>
> >
> >
> > --
> > Sent from: http://apache-hbase.679495.n3.nabble.com/HBase-User-f4020416
> > .html
> >
>


Re: Deletes with cell visibility labels

2017-09-21 Thread ramkrishna vasudevan
So the problem that you are facing here is that you don't know what
visibility labels are associated with a row that was added during a 'PUT'.
Now, to form the delete, you are not sure what the exact labels are so that
the 'PUT' remains masked.

But in terms of a generic use case, that is the way you can exactly mask a
PUT with visibility labels, right?
Say a row has sensitive info and is marked with PRIVATE & SECRET labels;
then, if you need to remove that row (so that it is later changed to
PUBLIC), it is always better to specify the row with the exact labels.
The current implementation is that we mask only those PUTs which match
exactly with the delete's visibility labels.

And to answer your question in a simple way:
since you are not sure what labels were added for a PUT, you need to re-run
the algorithm that generated the labels and add them to the deletes if that
specific row needs to be masked.

Regards
Ram
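
To make that concrete, a minimal sketch (the table, row, family and
qualifier names are placeholders, and the label expression is assumed to be
whatever your algorithm originally attached to the PUT):

  import org.apache.hadoop.hbase.client.Delete;
  import org.apache.hadoop.hbase.client.Table;
  import org.apache.hadoop.hbase.security.visibility.CellVisibility;
  import org.apache.hadoop.hbase.util.Bytes;

  // Re-run whatever logic produced the labels for the PUT, then attach the
  // same expression to the delete so that the PUT gets masked.
  Delete d = new Delete(Bytes.toBytes("row-1"));
  d.addColumns(Bytes.toBytes("cf"), Bytes.toBytes("q"));
  d.setCellVisibility(new CellVisibility("PRIVATE&SECRET"));
  table.delete(d);  // 'table' is an already-open Table instance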

On Thu, Sep 21, 2017 at 9:53 PM, Mike Thomsen 
wrote:

> Yes, I realized my mistake shortly after posting. So my question is how do
> you form a proper delete? Is the expected behavior roughly...
>
> 1. Get the row.
> 2. Rerun the algorithm that computed the visibility label on the row.
> 3. Build a list of deletes.
>
> Is that what we're expected to do here? Or is there a simpler way of
> handling this?
>
> Thanks,
>
> Mike
>
> On Thu, Sep 21, 2017 at 12:18 PM, ramkrishna vasudevan <
> ramkrishna.s.vasude...@gmail.com> wrote:
>
> > Hi Thomson
> >
> > I think you are saying that the shell allows you to specify
> > delete 'tablename', 'row', 'family', ts1, {VISIBILITY=>'PRIVATE|SECRET'}
> > but the java client does not allow to do it? I doubt it.
> >
> > In case of mutations like  puts and deletes what we pass is the
> visibility
> > labels. Now when you do a scan that is where we specify the AUTHORIZTIONs
> > so that only those cells with visibility cells as passed in the
> > AUTHORIZATIONS are returned back.
> >
> > Hope you find this useful. Let us know if you need further inputs.
> >
> > Regards
> > Ram
> >
> > On Thu, Sep 21, 2017 at 6:04 PM, Mike Thomsen 
> > wrote:
> >
> > > According to the javadocs and some examples I've seen, it looks like
> with
> > > the Java client you have to know the visibility label of the cell you
> > want
> > > to delete. You cannot just pass a token list like you can in the shell
> > > (delete TABLE, ROW, COLUMN, {AUTHORIZATIONS => ["token", "token"]})
> > >
> > > Is this true or am I missing something?
> > >
> > > Thanks,
> > >
> > > Mike
> > >
> >
>


Re: Deletes with cell visibility labels

2017-09-21 Thread ramkrishna vasudevan
Hi Thomsen

I think you are saying that the shell allows you to specify
delete 'tablename', 'row', 'family', ts1, {VISIBILITY=>'PRIVATE|SECRET'}
but the Java client does not allow you to do it? I doubt it.

In the case of mutations like puts and deletes, what we pass is the
visibility labels. Now, when you do a scan, that is where we specify the
AUTHORIZATIONS, so that only those cells whose visibility labels match the
passed AUTHORIZATIONS are returned back.

Hope you find this useful. Let us know if you need further inputs.

Regards
Ram
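
As a small sketch of that split (label and authorization names here are just
examples):

  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.security.visibility.Authorizations;
  import org.apache.hadoop.hbase.security.visibility.CellVisibility;
  import org.apache.hadoop.hbase.util.Bytes;

  // Labels go on the mutation...
  Put p = new Put(Bytes.toBytes("row-1"));
  p.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value"));
  p.setCellVisibility(new CellVisibility("PRIVATE|SECRET"));

  // ...authorizations go on the read; only cells whose label expression is
  // satisfied by these authorizations come back.
  Scan scan = new Scan();
  scan.setAuthorizations(new Authorizations("PRIVATE"));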

On Thu, Sep 21, 2017 at 6:04 PM, Mike Thomsen 
wrote:

> According to the javadocs and some examples I've seen, it looks like with
> the Java client you have to know the visibility label of the cell you want
> to delete. You cannot just pass a token list like you can in the shell
> (delete TABLE, ROW, COLUMN, {AUTHORIZATIONS => ["token", "token"]})
>
> Is this true or am I missing something?
>
> Thanks,
>
> Mike
>


Re: Offheap config question for Hbase 1.1.2

2017-09-12 Thread ramkrishna vasudevan
Oh I did not know that the version was HDP. I just thought the set up
process was through some HDP links. Thanks.

On Tue, Sep 12, 2017 at 10:44 PM, Ted Yu  wrote:

> Ram:
> As Arul indicated, HDP is being used where there're a lot of backports to
> 1.1.2 baseline.
>
> Arul:
> I suggest going thru vendor's channel first.
>
> Cheers
>
> On Tue, Sep 12, 2017 at 10:11 AM, ramkrishna vasudevan <
> ramkrishna.s.vasude...@gmail.com> wrote:
>
> > Why Master also accepts MaxDirectMemory config is probably because in 1.1
> > onwards we treat HMaster as HregionServer but it does the region
> managment
> > also. Am not very sure in 1.1.2 is HMaster allowed to host regions ? If
> so
> > you need to configure MaxDirectMemory, if not probably we can see how we
> > can avoid it. We need to raise a JIRA for that.
> >
> > Coming to the size of MaxDirectMemory less than bucket cache - I am
> > wondering was there a bug previously? Because assume you need 25G offheap
> > bucket cache then atleast 25G MaxDirectMemory is a must. Ideally you may
> > need some delta more than 25G.
> >
> > 0.98 is obsolete now so its better we go with how 1.1.2 works. But if you
> > feel there is a documentation that could help I think it is better we
> > provide one so that users like you are not affected.
> >
> > Regards
> > Ram
> >
> >
> >
> >
> > On Tue, Sep 12, 2017 at 10:22 PM, Arul Ramachandran 
> > wrote:
> >
> > > Thank you, Ram.
> > >
> > > >> So are you trying to use bucket cache feature in offheap mode with
> > > 1.1.2?
> > >
> > > Yes.
> > >
> > > >> So even in 0.98 you were using bucket cache in offheap mode?
> > >
> > > Yes, but it is a different hbase cluster and it run 0.98. The one I am
> > > trying to setup offheap cache is hbase 1.1.2
> > >
> > >
> > > -arul
> > >
> > >
> > > On Tue, Sep 12, 2017 at 9:34 AM, ramkrishna vasudevan <
> > > ramkrishna.s.vasude...@gmail.com> wrote:
> > >
> > > > Hi
> > > >
> > > > So are you trying to use bucket cache feature in offheap mode with
> > 1.1.2?
> > > > If so then it is needed that the MaxDirectMemory is greater than the
> > > > offheap bucket cache size.
> > > >
> > > > If you are not using in offheap mode then probably there is no need
> for
> > > > MaxDirectMemory to be greater than bucket cache size.
> > > >
> > > > >>in Hbase 0.98, I had to set -XX:MaxDirectMemorySize less than
> > > > hbase.bucket.cache.size
> > > > So even in 0.98 you were using bucket cache in offheap mode?
> > > >
> > > > Regards
> > > > Ram
> > > >
> > > > On Tue, Sep 12, 2017 at 9:40 PM, Ted Yu  wrote:
> > > >
> > > > > Looks like the config you meant should be hbase.bucketcache.size
> > > > >
> > > > > As the refguide says:
> > > > >
> > > > > A float that EITHER represents a percentage of total heap memory
> size
> > > to
> > > > > give to the cache (if < 1.0) OR, it is the total capacity in
> > megabytes
> > > of
> > > > > BucketCache. Default: 0.0
> > > > >
> > > > > If you specify the size as capacity, -XX:MaxDirectMemorySize should
> > be
> > > > > bigger than the capacity.
> > > > >
> > > > > For #2, did you encounter some error ?
> > > > >
> > > > > Cheers
> > > > >
> > > > > On Tue, Sep 12, 2017 at 8:52 AM, Arul Ramachandran <
> > arkup...@gmail.com
> > > >
> > > > > wrote:
> > > > >
> > > > > > In HBase 1.1.2, I am setting up bucket cache. I set
> MaxDirectMemory
> > > > size
> > > > > > greater than hbase.bucket.cache.size  - only then it would work.
> > > > > >
> > > > > > 1) Does HBASE_REGIONSERVER_OPTS -XX:MaxDirectMemorySize needs to
> be
> > > > > greater
> > > > > > than hbase.bucket.cache.size?
> > > > > > 2) It seems with hbase 1.1.2, HBASE_MASTER_OPTS also needs the
> > > > > >  -XX:MaxDirectMemorySize setting?
> > > > > >
> > > > > > IIRC, in Hbase 0.98, I had to set -XX:MaxDirectMemorySize less
> than
> > > > > > hbase.bucket.cache.size --and-- I did not have to set
> > > > > >  -XX:MaxDirectMemorySize for HBASE_MASTER_OPTS.
> > > > > >
> > > > > >
> > > > > > Thanks,
> > > > > > Arul
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: Offheap config question for Hbase 1.1.2

2017-09-12 Thread ramkrishna vasudevan
The reason the Master also accepts the MaxDirectMemorySize config is
probably because from 1.1 onwards we treat HMaster as an HRegionServer that
also does the region management. I am not very sure whether in 1.1.2
HMaster is allowed to host regions. If so, you need to configure
MaxDirectMemorySize; if not, we can probably see how we can avoid it. We
need to raise a JIRA for that.

Coming to MaxDirectMemorySize being less than the bucket cache size - I am
wondering whether there was a bug previously? Because, assuming you need a
25G offheap bucket cache, then at least 25G of MaxDirectMemorySize is a
must. Ideally you may need some delta more than 25G.

0.98 is obsolete now, so it's better we go with how 1.1.2 works. But if you
feel there is documentation that could help, I think it is better we
provide it so that users like you are not affected.

Regards
Ram




On Tue, Sep 12, 2017 at 10:22 PM, Arul Ramachandran 
wrote:

> Thank you, Ram.
>
> >> So are you trying to use bucket cache feature in offheap mode with
> 1.1.2?
>
> Yes.
>
> >> So even in 0.98 you were using bucket cache in offheap mode?
>
> Yes, but it is a different hbase cluster and it run 0.98. The one I am
> trying to setup offheap cache is hbase 1.1.2
>
>
> -arul
>
>
> On Tue, Sep 12, 2017 at 9:34 AM, ramkrishna vasudevan <
> ramkrishna.s.vasude...@gmail.com> wrote:
>
> > Hi
> >
> > So are you trying to use bucket cache feature in offheap mode with 1.1.2?
> > If so then it is needed that the MaxDirectMemory is greater than the
> > offheap bucket cache size.
> >
> > If you are not using in offheap mode then probably there is no need for
> > MaxDirectMemory to be greater than bucket cache size.
> >
> > >>in Hbase 0.98, I had to set -XX:MaxDirectMemorySize less than
> > hbase.bucket.cache.size
> > So even in 0.98 you were using bucket cache in offheap mode?
> >
> > Regards
> > Ram
> >
> > On Tue, Sep 12, 2017 at 9:40 PM, Ted Yu  wrote:
> >
> > > Looks like the config you meant should be hbase.bucketcache.size
> > >
> > > As the refguide says:
> > >
> > > A float that EITHER represents a percentage of total heap memory size
> to
> > > give to the cache (if < 1.0) OR, it is the total capacity in megabytes
> of
> > > BucketCache. Default: 0.0
> > >
> > > If you specify the size as capacity, -XX:MaxDirectMemorySize should be
> > > bigger than the capacity.
> > >
> > > For #2, did you encounter some error ?
> > >
> > > Cheers
> > >
> > > On Tue, Sep 12, 2017 at 8:52 AM, Arul Ramachandran  >
> > > wrote:
> > >
> > > > In HBase 1.1.2, I am setting up bucket cache. I set MaxDirectMemory
> > size
> > > > greater than hbase.bucket.cache.size  - only then it would work.
> > > >
> > > > 1) Does HBASE_REGIONSERVER_OPTS -XX:MaxDirectMemorySize needs to be
> > > greater
> > > > than hbase.bucket.cache.size?
> > > > 2) It seems with hbase 1.1.2, HBASE_MASTER_OPTS also needs the
> > > >  -XX:MaxDirectMemorySize setting?
> > > >
> > > > IIRC, in Hbase 0.98, I had to set -XX:MaxDirectMemorySize less than
> > > > hbase.bucket.cache.size --and-- I did not have to set
> > > >  -XX:MaxDirectMemorySize for HBASE_MASTER_OPTS.
> > > >
> > > >
> > > > Thanks,
> > > > Arul
> > > >
> > >
> >
>


Re: Offheap config question for Hbase 1.1.2

2017-09-12 Thread ramkrishna vasudevan
Hi

So are you trying to use the bucket cache feature in offheap mode with
1.1.2? If so, then MaxDirectMemorySize needs to be greater than the offheap
bucket cache size.

If you are not using it in offheap mode, then there is probably no need for
MaxDirectMemorySize to be greater than the bucket cache size.

>>in Hbase 0.98, I had to set -XX:MaxDirectMemorySize less than
hbase.bucket.cache.size
So even in 0.98 you were using bucket cache in offheap mode?

Regards
Ram
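
For illustration only (the sizes are assumptions, not numbers from this
thread), the relationship looks roughly like this:

  # hbase-env.sh: direct memory must cover the offheap cache plus headroom
  export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -XX:MaxDirectMemorySize=30g"

  # hbase-site.xml
  <property>
    <name>hbase.bucketcache.ioengine</name>
    <value>offheap</value>
  </property>
  <property>
    <name>hbase.bucketcache.size</name>
    <value>25600</value>  <!-- capacity in MB, i.e. 25 GB, below the 30 GB of direct memory -->
  </property>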

On Tue, Sep 12, 2017 at 9:40 PM, Ted Yu  wrote:

> Looks like the config you meant should be hbase.bucketcache.size
>
> As the refguide says:
>
> A float that EITHER represents a percentage of total heap memory size to
> give to the cache (if < 1.0) OR, it is the total capacity in megabytes of
> BucketCache. Default: 0.0
>
> If you specify the size as capacity, -XX:MaxDirectMemorySize should be
> bigger than the capacity.
>
> For #2, did you encounter some error ?
>
> Cheers
>
> On Tue, Sep 12, 2017 at 8:52 AM, Arul Ramachandran 
> wrote:
>
> > In HBase 1.1.2, I am setting up bucket cache. I set MaxDirectMemory size
> > greater than hbase.bucket.cache.size  - only then it would work.
> >
> > 1) Does HBASE_REGIONSERVER_OPTS -XX:MaxDirectMemorySize needs to be
> greater
> > than hbase.bucket.cache.size?
> > 2) It seems with hbase 1.1.2, HBASE_MASTER_OPTS also needs the
> >  -XX:MaxDirectMemorySize setting?
> >
> > IIRC, in Hbase 0.98, I had to set -XX:MaxDirectMemorySize less than
> > hbase.bucket.cache.size --and-- I did not have to set
> >  -XX:MaxDirectMemorySize for HBASE_MASTER_OPTS.
> >
> >
> > Thanks,
> > Arul
> >
>


Re: Visibility labels without Kerberos

2017-09-06 Thread ramkrishna vasudevan
You can set up visibility labels without Kerberos. You just need the
visibility-related coprocessor in your config file.

Regards
Ram


excuse typos. sent from mobile
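
For reference, a minimal sketch of the hbase-site.xml entries that load the
visibility coprocessor (this part does not require Kerberos):

  <property>
    <name>hbase.coprocessor.master.classes</name>
    <value>org.apache.hadoop.hbase.security.visibility.VisibilityController</value>
  </property>
  <property>
    <name>hbase.coprocessor.region.classes</name>
    <value>org.apache.hadoop.hbase.security.visibility.VisibilityController</value>
  </property>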

On 6 Sep 2017 17:59, "Mike Thomsen"  wrote:

Is it possible to use visibility labels without Kerberos?  Our admins are
still figuring out how to set up Kerberos, and we just need something
really simple to get started like being able to set a list of tokens on a
scanner and go with that. Is that possible?

Thanks,

Mike


Re: Multiple column families - scan performance

2017-08-22 Thread ramkrishna vasudevan
In HBase, even if you use KeyOnlyFilter, there is still a column family
involved. In this case, if the scan does not specify addFamily(), then I
think all the column families will be loaded.

Regards
Ram
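
For illustration, a small sketch of a row-key-only scan restricted to one
family (the family name and row boundaries are placeholders):

  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.filter.FilterList;
  import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
  import org.apache.hadoop.hbase.filter.KeyOnlyFilter;
  import org.apache.hadoop.hbase.util.Bytes;

  byte[] startRow = Bytes.toBytes("start");
  byte[] stopRow = Bytes.toBytes("stop");
  Scan scan = new Scan(startRow, stopRow);
  scan.addFamily(Bytes.toBytes("cf1"));  // without this, all families are opened
  scan.setFilter(new FilterList(new FirstKeyOnlyFilter(), new KeyOnlyFilter()));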

On Tue, Aug 22, 2017 at 6:47 PM, Partha  wrote:

> One other observation - even scanning 1MM rowkeys (using keyonlyfilter and
> firstkeyonlyfilter) takes 4x the time on 2nd table. No column family is
> queried at all in this test..
>
> On Aug 21, 2017 10:47 PM, "Partha"  wrote:
>
> > hbase(main):001:0> describe 'TABLE1'
> > Table TABLE1 is ENABLED
> > TABLE1
> > COLUMN FAMILIES DESCRIPTION
> > {NAME => 'cf1', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY =>
> > 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING =>
> 'FAST_DIFF',
> > TTL => 'FOREVER', COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLO
> > CKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
> > 1 row(s) in 0.2410 seconds
> >
> > hbase(main):002:0> describe 'TABLE2'
> > Table TABLE2 is ENABLED
> > TABLE2
> > COLUMN FAMILIES DESCRIPTION
> > {NAME => 'cf1', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY =>
> > 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING =>
> 'FAST_DIFF',
> > TTL => 'FOREVER', COMPRESSION => 'GZ', MIN_VERSIONS => '0', BL
> > OCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
> > {NAME => 'cf2', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY =>
> > 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING =>
> 'FAST_DIFF',
> > TTL => 'FOREVER', COMPRESSION => 'GZ', MIN_VERSIONS => '0', BL
> > OCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
> > {NAME => 'cf3', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY =>
> > 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING =>
> 'FAST_DIFF',
> > TTL => 'FOREVER', COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOC
> > KCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
> > {NAME => 'cf4', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY =>
> > 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING =>
> 'FAST_DIFF',
> > TTL => 'FOREVER', COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLO
> > CKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
> >
> > Here are the table definitions..
> >
> > On Mon, Aug 21, 2017 at 10:06 PM, Partha  wrote:
> >
> >>   final Scan scan = new Scan(startInclusive, endExclusive)
> >> .addFamily(stage.getBytes())
> >> .setCaching(DEFAULT_BATCH_SIZE)
> >> .setCacheBlocks(false);
> >>
> >> Here is the scan test code. This will return ~1MM rows from both tables,
> >> while limiting scan to a single column family..
> >>
> >> Thanks.
> >>
> >> On Mon, Aug 21, 2017 at 2:16 PM, Partha  wrote:
> >>
> >>> addFamily only. There is only 1 column/qualifier per column family
> >>>
> >>>
> >>> On Aug 21, 2017 2:05 PM, "Anoop John"  wrote:
> >>>
> >>> In ur test are u using Scan#addColumn(byte [] family, byte []
> >>> qualifier)  or it is addFamily(byte [] family) only?
> >>>
> >>> On Mon, Aug 21, 2017 at 10:02 PM, Partha 
> wrote:
> >>> > Block cache is disabled on both scan tests. Setcaching is set to 500
> >>> in both
> >>> > scans. Hbase version is 1.1.2.2.6.0.3-8
> >>> >
> >>> > Will post client scan test code.
> >>> >
> >>> > Thanks
> >>> >
> >>> >
> >>> > On Aug 21, 2017 8:57 AM, "Anoop John"  wrote:
> >>> >
> >>> > I was abt to ask to whether have run the tests after a major
> >>> > compaction.  But there also u are facing same issue it seems !
> >>> >
> >>> > Which version of HBase?
> >>> >
> >>> > Block cache been used?  What are the size and configs related to
> cache?
> >>> >
> >>> > Can u pls paste the exact client side code been used in tests?
> >>> >
> >>> > -Anoop-
> >>> >
> >>> > On Sun, Aug 20, 2017 at 4:36 AM, Partha 
> >>> wrote:
> >>> >> Anoop,
> >>> >>
> >>> >> Yes, each column family (in both tables) uses the same encoding
> >>> >> (fast-diff)
> >>> >> and same compression (gzip).
> >>> >>
> >>> >> I suggest you to just try the simple test as my case and see if you
> >>> notice
> >>> >> a
> >>> >> similar drop in performance (almost linear to the # of column
> >>> families)
> >>> >
> >>> >
> >>>
> >>>
> >>>
> >>
> >
>


Re: Multiple column families - scan performance

2017-08-21 Thread ramkrishna vasudevan
One more request would be to check the same test case with a newer version
of HBase - probably with the latest 1.3 or 1.2. This is just to confirm
whether the problem that you see exists across all releases.

Because a simple test case reveals that with addFamily only the specified
column family is scanned and we don't read the other families at all (with
or without encoding).

Regards
Ram

On Tue, Aug 22, 2017 at 10:49 AM, ramkrishna vasudevan <
ramkrishna.s.vasude...@gmail.com> wrote:

> Can you try one more thing - instead of addFamily try using
> addColumn(byte[] fam, byte[] qual). Since you are sure that there is only
> one qualifier.
> See how it works? Does it reduce the performance or increase the
> performance than the addFamily() and how is it related to the 1 CF case.
>
> Also just to be sure - are you sure that the 4 CF table has only one
> qualifier?
>
> Regards
> Ram
>
> On Tue, Aug 22, 2017 at 8:17 AM, Partha  wrote:
>
>> hbase(main):001:0> describe 'TABLE1'
>> Table TABLE1 is ENABLED
>> TABLE1
>> COLUMN FAMILIES DESCRIPTION
>> {NAME => 'cf1', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY =>
>> 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING =>
>> 'FAST_DIFF',
>> TTL => 'FOREVER', COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLO
>> CKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
>> 1 row(s) in 0.2410 seconds
>>
>> hbase(main):002:0> describe 'TABLE2'
>> Table TABLE2 is ENABLED
>> TABLE2
>> COLUMN FAMILIES DESCRIPTION
>> {NAME => 'cf1', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY =>
>> 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING =>
>> 'FAST_DIFF',
>> TTL => 'FOREVER', COMPRESSION => 'GZ', MIN_VERSIONS => '0', BL
>> OCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
>> {NAME => 'cf2', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY =>
>> 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING =>
>> 'FAST_DIFF',
>> TTL => 'FOREVER', COMPRESSION => 'GZ', MIN_VERSIONS => '0', BL
>> OCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
>> {NAME => 'cf3', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY =>
>> 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING =>
>> 'FAST_DIFF',
>> TTL => 'FOREVER', COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOC
>> KCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
>> {NAME => 'cf4', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY =>
>> 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING =>
>> 'FAST_DIFF',
>> TTL => 'FOREVER', COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLO
>> CKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
>>
>> Here are the table definitions..
>>
>> On Mon, Aug 21, 2017 at 10:06 PM, Partha  wrote:
>>
>> >   final Scan scan = new Scan(startInclusive, endExclusive)
>> > .addFamily(stage.getBytes())
>> > .setCaching(DEFAULT_BATCH_SIZE)
>> > .setCacheBlocks(false);
>> >
>> > Here is the scan test code. This will return ~1MM rows from both tables,
>> > while limiting scan to a single column family..
>> >
>> > Thanks.
>> >
>> > On Mon, Aug 21, 2017 at 2:16 PM, Partha  wrote:
>> >
>> >> addFamily only. There is only 1 column/qualifier per column family
>> >>
>> >>
>> >> On Aug 21, 2017 2:05 PM, "Anoop John"  wrote:
>> >>
>> >> In ur test are u using Scan#addColumn(byte [] family, byte []
>> >> qualifier)  or it is addFamily(byte [] family) only?
>> >>
>> >> On Mon, Aug 21, 2017 at 10:02 PM, Partha 
>> wrote:
>> >> > Block cache is disabled on both scan tests. Setcaching is set to 500
>> in
>> >> both
>> >> > scans. Hbase version is 1.1.2.2.6.0.3-8
>> >> >
>> >> > Will post client scan test code.
>> >> >
>> >> > Thanks
>> >> >
>> >> >
>> >> > On Aug 21, 2017 8:57 AM, "Anoop John"  wrote:
>> >> >
>> >> > I was abt to ask to whether have run the tests after a major
>> >> > compaction.  But there also u are facing same issue it seems !
>> >> >
>> >> > Which version of HBase?
>> >> >
>> >> > Block cache been used?  What are the size and configs related to
>> cache?
>> >> >
>> >> > Can u pls paste the exact client side code been used in tests?
>> >> >
>> >> > -Anoop-
>> >> >
>> >> > On Sun, Aug 20, 2017 at 4:36 AM, Partha 
>> wrote:
>> >> >> Anoop,
>> >> >>
>> >> >> Yes, each column family (in both tables) uses the same encoding
>> >> >> (fast-diff)
>> >> >> and same compression (gzip).
>> >> >>
>> >> >> I suggest you to just try the simple test as my case and see if you
>> >> notice
>> >> >> a
>> >> >> similar drop in performance (almost linear to the # of column
>> families)
>> >> >
>> >> >
>> >>
>> >>
>> >>
>> >
>>
>
>


Re: Multiple column families - scan performance

2017-08-21 Thread ramkrishna vasudevan
Can you try one more thing - instead of addFamily, try using
addColumn(byte[] fam, byte[] qual), since you are sure that there is only
one qualifier.
See how it works: does it reduce or increase the performance compared to
addFamily(), and how does it relate to the 1 CF case?

Also, just to be sure - are you sure that the 4 CF table has only one
qualifier?

Regards
Ram
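
A sketch of the suggested variant of the scan from your test code (the
qualifier name is a placeholder, since only you know the actual one):

  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.util.Bytes;

  final Scan scan = new Scan(startInclusive, endExclusive) // same boundaries as your test code
      .addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("q")) // name the single qualifier explicitly
      .setCaching(500)
      .setCacheBlocks(false);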

On Tue, Aug 22, 2017 at 8:17 AM, Partha  wrote:

> hbase(main):001:0> describe 'TABLE1'
> Table TABLE1 is ENABLED
> TABLE1
> COLUMN FAMILIES DESCRIPTION
> {NAME => 'cf1', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY =>
> 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF',
> TTL => 'FOREVER', COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLO
> CKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
> 1 row(s) in 0.2410 seconds
>
> hbase(main):002:0> describe 'TABLE2'
> Table TABLE2 is ENABLED
> TABLE2
> COLUMN FAMILIES DESCRIPTION
> {NAME => 'cf1', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY =>
> 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF',
> TTL => 'FOREVER', COMPRESSION => 'GZ', MIN_VERSIONS => '0', BL
> OCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
> {NAME => 'cf2', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY =>
> 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF',
> TTL => 'FOREVER', COMPRESSION => 'GZ', MIN_VERSIONS => '0', BL
> OCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
> {NAME => 'cf3', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY =>
> 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF',
> TTL => 'FOREVER', COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOC
> KCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
> {NAME => 'cf4', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY =>
> 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF',
> TTL => 'FOREVER', COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLO
> CKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
>
> Here are the table definitions..
>
> On Mon, Aug 21, 2017 at 10:06 PM, Partha  wrote:
>
> >   final Scan scan = new Scan(startInclusive, endExclusive)
> > .addFamily(stage.getBytes())
> > .setCaching(DEFAULT_BATCH_SIZE)
> > .setCacheBlocks(false);
> >
> > Here is the scan test code. This will return ~1MM rows from both tables,
> > while limiting scan to a single column family..
> >
> > Thanks.
> >
> > On Mon, Aug 21, 2017 at 2:16 PM, Partha  wrote:
> >
> >> addFamily only. There is only 1 column/qualifier per column family
> >>
> >>
> >> On Aug 21, 2017 2:05 PM, "Anoop John"  wrote:
> >>
> >> In ur test are u using Scan#addColumn(byte [] family, byte []
> >> qualifier)  or it is addFamily(byte [] family) only?
> >>
> >> On Mon, Aug 21, 2017 at 10:02 PM, Partha 
> wrote:
> >> > Block cache is disabled on both scan tests. Setcaching is set to 500
> in
> >> both
> >> > scans. Hbase version is 1.1.2.2.6.0.3-8
> >> >
> >> > Will post client scan test code.
> >> >
> >> > Thanks
> >> >
> >> >
> >> > On Aug 21, 2017 8:57 AM, "Anoop John"  wrote:
> >> >
> >> > I was abt to ask to whether have run the tests after a major
> >> > compaction.  But there also u are facing same issue it seems !
> >> >
> >> > Which version of HBase?
> >> >
> >> > Block cache been used?  What are the size and configs related to
> cache?
> >> >
> >> > Can u pls paste the exact client side code been used in tests?
> >> >
> >> > -Anoop-
> >> >
> >> > On Sun, Aug 20, 2017 at 4:36 AM, Partha 
> wrote:
> >> >> Anoop,
> >> >>
> >> >> Yes, each column family (in both tables) uses the same encoding
> >> >> (fast-diff)
> >> >> and same compression (gzip).
> >> >>
> >> >> I suggest you to just try the simple test as my case and see if you
> >> notice
> >> >> a
> >> >> similar drop in performance (almost linear to the # of column
> families)
> >> >
> >> >
> >>
> >>
> >>
> >
>


Re: Multiple column families - scan performance

2017-08-17 Thread ramkrishna vasudevan
bq. a scan test on (any) single
column family in the 2nd table takes 4x the time to scan the single column
family from the 1st table
So that means your scan is targeted at a specific family only, I believe?

Are you seeing a lot of cache misses for the 4 column family table, whereas
the 1 column family table does not have a heavy cache miss rate?

Regards
Ram

On Fri, Aug 18, 2017 at 7:18 AM, Anoop John  wrote:

> So on the 2nd table, even though there are 4 CFs, while scanning you need
> only data from a single CF.  And is the CF under test similar to what you
> have in the 1st table?  I mean the same encoding and compression scheme
> and data size?   While creating the scan for the 2nd table, how do you
> build it?  I hope you do
> Scan s = new Scan();
> s.setStartRow
> s.setStopRow
> s.addFamily(cf)
>
> Correct?
>
> -Anoop-
>
> On Thu, Aug 17, 2017 at 4:42 PM, Partha  wrote:
> > I have 2 HBase tables - one with a single column family, and other has 4
> > column families. Both tables are keyed by same rowkey, and the column
> > families all have a single column qualifier each, with a json string as
> > value (each json payload is about 10-20K in size). All column families
> use
> > fast-diff encoding and gzip compression.
> >
> > After loading about 60MM rows to each table, a scan test on (any) single
> > column family in the 2nd table takes 4x the time to scan the single
> column
> > family from the 1st table. In both cases, the scanner is bounded by a
> start
> > and stop key to scan 1MM rows. Performance did not change much even after
> > running a major compaction on both tables.
> >
> > Though HBase doc and other tech forums recommend not using more than 1
> > column family per table, nothing I have read so far suggests scan
> > performance will linearly degrade based on number of column families. Has
> > anyone else experienced this, and is there a simple explanation for this?
> >
> > To note, the reason second table has 4 column families is even though I
> > only scan one column family at a time now, there are requirements to scan
> > multiple column families from that table given a set of rowkeys.
> >
> > Thanks for any insight into the performance question.
>


Re: Awesome HBase - a curated list

2017-08-01 Thread ramkrishna vasudevan
I was going through those links yesterday and wanted to ask you: is this
project, http://www.geomesa.org/, still active?

BTW, thanks for letting me know about the ecosystem projects that are
available on HBase.

Regards
Ram

On Tue, Aug 1, 2017 at 10:59 PM, Anoop John  wrote:

> This is great and very useful.. Tks.
>
> On Mon, Jul 31, 2017 at 8:34 PM, Robert Yokota  wrote:
> > To help me keep track of all the awesome stuff in the HBase ecosystem, I
> > started a list.  Let me know if I missed anything awesome.
> >
> > https://github.com/rayokota/awesome-hbase
>


Re: [ANNOUNCE] New HBase committer Vikas Vishwakarma

2017-07-30 Thread ramkrishna vasudevan
Congrats Vikas !!

Regards
Ram

On Mon, Jul 31, 2017 at 10:16 AM, Pankaj kr  wrote:

> Congratulations Vikas..!!
>
> Thanks & Regards,
> Pankaj
>
> HUAWEI TECHNOLOGIES CO.LTD.
> Huawei Tecnologies India Pvt. Ltd.
> Near EPIP Industrial Area, Kundalahalli Village
> Whitefield, Bangalore-560066
> www.huawei.com
> 
>
>
> -Original Message-
> From: Andrew Purtell [mailto:apurt...@apache.org]
> Sent: Saturday, July 29, 2017 6:33 AM
> To: d...@hbase.apache.org; user@hbase.apache.org
> Subject: [ANNOUNCE] New HBase committer Vikas Vishwakarma
>
> On behalf of the Apache HBase PMC, I am pleased to announce that Vikas
> Vishwakarma has accepted the PMC's invitation to become a committer on the
> project.
>
> We appreciate all of Vikas's great work thus far and look forward to
> continued involvement.
>
> Please join me in congratulating Vikas!
>
> --
> Best regards,
> Andrew
>


Re: [ANNOUNCE] New HBase committer Abhishek Singh Chouhan

2017-07-30 Thread ramkrishna vasudevan
Congratulations Abhishek !!!

Regards
Ram

On Mon, Jul 31, 2017 at 10:16 AM, Pankaj kr  wrote:

> Congratulations Abhishek..!!
>
> Thanks & Regards,
> Pankaj
>
> HUAWEI TECHNOLOGIES CO.LTD.
> Huawei Tecnologies India Pvt. Ltd.
> Near EPIP Industrial Area, Kundalahalli Village
> Whitefield, Bangalore-560066
> www.huawei.com
> 
>
> -Original Message-
> From: Andrew Purtell [mailto:apurt...@apache.org]
> Sent: Saturday, July 29, 2017 6:32 AM
> To: d...@hbase.apache.org; user@hbase.apache.org
> Subject: [ANNOUNCE] New HBase committer Abhishek Singh Chouhan
>
> On behalf of the Apache HBase PMC, I am pleased to announce that Abhishek
> Singh Chouhan has accepted the PMC's invitation to become a committer on
> the project.
>
> We appreciate all of Abhishek's great work thus far and look forward to
> continued involvement.
>
> Please join me in congratulating Abhishek!
>
> --
> Best regards,
> Andrew
>


Re: [ANNOUNCE] Chunhui Shen joins the Apache HBase PMC

2017-07-04 Thread ramkrishna vasudevan
Congratulations !!!

On Wed, Jul 5, 2017 at 10:33 AM, QI Congyun  wrote:

> Congratulations, Chunhui.
>
> -Original Message-
> From: Jerry He [mailto:jerry...@gmail.com]
> Sent: Wednesday, July 05, 2017 1:02 PM
> To: d...@hbase.apache.org; user@hbase.apache.org
> Subject: Re: [ANNOUNCE] Chunhui Shen joins the Apache HBase PMC
>
> Congrats,  Chunhui!
>
> Thanks
>
> Jerry
>
> On Tue, Jul 4, 2017 at 8:37 PM Anoop John  wrote:
>
> > Congrats Chunhui..
> >
> > On Wed, Jul 5, 2017 at 6:55 AM, Pankaj kr  wrote:
> > > Congratulations Chunhui..!!
> > >
> > > Regards,
> > > Pankaj
> > >
> > >
> > > -Original Message-
> > > From: Yu Li [mailto:car...@gmail.com]
> > > Sent: Tuesday, July 04, 2017 1:24 PM
> > > To: d...@hbase.apache.org; Hbase-User
> > > Subject: [ANNOUNCE] Chunhui Shen joins the Apache HBase PMC
> > >
> > > On behalf of the Apache HBase PMC I am pleased to announce that
> > > Chunhui
> > Shen has accepted our invitation to become a PMC member on the Apache
> > HBase project. He has been an active contributor to HBase for past many
> years.
> > Looking forward for many more contributions from him.
> > >
> > > Please join me in welcoming Chunhui to the HBase PMC!
> > >
> > > Best Regards,
> > > Yu
> >
>


Re: [ANNOUNCE] Chunhui Shen joins the Apache HBase PMC

2017-07-03 Thread ramkrishna vasudevan
Congratulations !!

On Tue, Jul 4, 2017 at 11:00 AM, 张铎(Duo Zhang) 
wrote:

> Congratulations!
>
> Yu Li 于2017年7月4日 周二13:24写道:
>
> > On behalf of the Apache HBase PMC I am pleased to announce that Chunhui
> > Shen
> > has accepted our invitation to become a PMC member on the Apache
> > HBase project. He has been an active contributor to HBase for past many
> > years. Looking forward for many more contributions from him.
> >
> > Please join me in welcoming Chunhui to the HBase PMC!
> >
> > Best Regards,
> > Yu
> >
>


Fwd: Encryption of exisiting data in Stripe Compaction

2017-06-20 Thread ramkrishna vasudevan
Hi all

Interesting case with stripe compaction and encryption. Does anyone have
any suggestions for Karthick's case? The initial mail was targeted at
issues@, so I am forwarding it to dev@ and user@.


Regards
Ram

-- Forwarded message --
From: ramkrishna vasudevan 
Date: Tue, Jun 20, 2017 at 4:51 PM
Subject: Re: Encryption of exisiting data in Stripe Compaction
To: Karthick Ram 


I am not aware of any other mechanism. I just noticed that you had forwarded
the message to issues@ and not to dev@ and users@. Let me forward it to
those mailing addresses. Thanks Karthick.

Regards
Ram

On Mon, Jun 19, 2017 at 1:07 PM, Karthick Ram 
wrote:

> Hi,
> Yes we are doing exactly the same. We altered the table with
> exploringcompaction and triggered a major compaction. But when it comes to
> key rotation, which we do very often, we have to manually alter the table
> and rollback to previous compaction policy. Currently we have a cron job
> for this. Is there any other way to automate this?
>
> Regards
> Karthick R
>
> On Thu, Jun 15, 2017 at 9:55 AM, ramkrishna vasudevan <
> ramkrishna.s.vasude...@gmail.com> wrote:
>
>> Hi
>> Very interesting case. Yes, stripe compaction does not need to undergo a
>> major compaction if it is already running under stripe compaction (that is
>> my reading of the docs).
>> Since you enabled encryption at a later point in time, you are facing this
>> issue, I believe. The naive workaround I can think of is to alter the
>> table to the default compaction policy, let it run a major compaction, and
>> once that is done move back to stripe compaction. Will that work?
>>
>> I would like to hear the opinions of others who have experience with stripe
>> compaction.
>>
>> Regards
>> Ram
>>
>> On Wed, Jun 14, 2017 at 10:25 AM, Karthick Ram 
>> wrote:
>>
>>> We have a table which has time series data with Stripe Compaction
>>> enabled.
>>> After encryption has been enabled for this table the newer entries are
>>> encrypted and inserted. However to encrypt the existing data in the
>>> table,
>>> a major compaction has to run. Since, stripe compaction doesn't allow a
>>> major compaction to run, we are unable to encrypt the previous data.
>>> Please
>>> suggest some ways to rectify this problem.
>>>
>>> Regards,
>>> Karthick R
>>>
>>
>>
>


Re: [ANNOUNCE] New Apache HBase committer Ashu Pachauri

2017-06-18 Thread ramkrishna vasudevan
Congratulations !!!

On Mon, Jun 19, 2017 at 9:11 AM, Jingcheng Du  wrote:

> Congratulations!
>
> 2017-06-19 11:26 GMT+08:00 宾莉金(binlijin) :
>
> > Congratulations, Ashu !
> >
> > 2017-06-17 7:27 GMT+08:00 Gary Helmling :
> >
> > > On behalf of the Apache HBase PMC, I am pleased to announce that Ashu
> > > Pachauri has accepted the PMC's invitation to become a committer on the
> > > project.  We appreciate all of Ashu's generous contributions thus far
> and
> > > look forward to his continued involvement.
> > >
> > > Congratulations and welcome, Ashu!
> > >
> >
> >
> >
> > --
> > *Best Regards,*
> >  lijin bin
> >
>


Re: [ANNOUNCE] New HBase committer Allan Yang

2017-06-18 Thread ramkrishna vasudevan
Congratulations !!!

On Mon, Jun 19, 2017 at 8:57 AM, 宾莉金(binlijin)  wrote:

> Congratulations, Allan !
>
> 2017-06-09 11:49 GMT+08:00 Yu Li :
>
> > On behalf of the Apache HBase PMC, I am pleased to announce that Allan
> Yang
> > has accepted the PMC's invitation to become a committer on the
> > project. We appreciate all of Allan's generous contributions thus far and
> > look forward to his continued involvement.
> >
> > Congratulations and welcome, Allan!
> >
>
>
>
> --
> *Best Regards,*
>  lijin bin
>


Re: HBase API for Scala

2017-06-15 Thread ramkrishna vasudevan
Thanks Vladimir. Thanks for the information and the link.

Scala developers may find this useful, or it may serve as a useful reference
for them.

Regards
Ram

On Fri, Jun 16, 2017 at 4:05 AM, Vladimir Glushak 
wrote:

> Good day hbase community,
>
> I recently had a small Scala project that needed to be integrated with
> HBase (simple CRUD operations with some scan requirements).
> Unfortunately, I did not find any decent Scala API to work with HBase.
>
> So I've built a small library on top of the HBase Java client.
> It has a tidy API and removes most of the boilerplate code.
>
> Maybe it can be useful for the community.
> Take a look: GitHub.com/hbase4s
> Otherwise, sorry for the spam.
>
> Thanks.
>


Re: ANNOUNCE: Yu Li joins the Apache HBase PMC

2017-04-16 Thread ramkrishna vasudevan
Congratulations!!

On Mon, Apr 17, 2017 at 7:05 AM, Guanghao Zhang  wrote:

> Congratulations!
>
> 2017-04-16 21:36 GMT+08:00 Yu Li :
>
> > Thanks all! My honor and will do my best.
> >
> > Best Regards,
> > Yu
> >
> > On 16 April 2017 at 14:20, 张铎(Duo Zhang)  wrote:
> >
> > > Congratulations!
> > >
> > > 2017-04-16 11:24 GMT+08:00 Mikhail Antonov :
> > >
> > >> Congratulations Yu!
> > >>
> > >> -Mikhail
> > >>
> > >> On Sat, Apr 15, 2017 at 12:44 PM, Nick Dimiduk 
> > >> wrote:
> > >>
> > >> > Congratulations Yu and thanks a lot! Keep up the good work!
> > >> >
> > >> > On Fri, Apr 14, 2017 at 7:22 AM Anoop John 
> > >> wrote:
> > >> >
> > >> > > On behalf of the Apache HBase PMC I"m pleased to announce that Yu
> Li
> > >> > > has accepted our invitation to become a PMC member on the Apache
> > HBase
> > >> > > project. He has been an active contributor to HBase for past many
> > >> > > years. Looking forward for
> > >> > > many more contributions from him.
> > >> > >
> > >> > > Welcome to the PMC, Yu Li...
> > >> > >
> > >> > >
> > >> > > -Anoop-
> > >> > >
> > >> >
> > >>
> > >>
> > >>
> > >> --
> > >> Thanks,
> > >> Michael Antonov
> > >>
> > >
> > >
> >
>


[ANNOUNCE] - Welcome our new HBase committer Anastasia Braginsky

2017-03-27 Thread ramkrishna vasudevan
Hi All

Welcome Anastasia Braginsky, one more female committer to HBase. She has
been active for a while now with her compacting memstore feature, and she,
along with Eshcar, has given a lot of talks at various meetups and at
HBaseCon on their feature.

Welcome onboard, and I am looking forward to working with you, Anastasia!!!

Regards
Ram


Re: Optimizations for a Read-only database

2017-03-17 Thread ramkrishna vasudevan
Hi

The default memstore share is 0.42 of the heap size. You can just make it 0.1
and give the rest to the block cache. That will ensure you have the maximum
amount of data in the block cache, which is better for reads. On the latest
trunk you can even use the bucket cache in off-heap mode, but as that is not
available in a release yet you may not be able to make use of it.

How many hfiles do you expect to be loaded? If it is not too many, you can
disable major compaction.
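
For illustration, this is roughly the kind of configuration I mean (a sketch
only; the property names are from recent 1.x releases, and these are
server-side settings that belong in the region servers' hbase-site.xml, shown
here through the Configuration API just to keep the example short):

// Hypothetical values for a read-mostly cluster; tune for your own heap.
Configuration conf = HBaseConfiguration.create();
conf.setFloat("hbase.regionserver.global.memstore.size", 0.1f); // shrink the memstore share
conf.setFloat("hfile.block.cache.size", 0.6f);                  // give the freed heap to the block cache
conf.setLong("hbase.hregion.majorcompaction", 0L);              // 0 disables periodic major compactions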

BTW which version are you using?

Regards
Ram

On Fri, Mar 17, 2017 at 11:02 PM, jeff saremi 
wrote:

> We're creating a readonly database and would like to know the recommended
> optimizations we could do. We'd be loading data via direct write to HFiles.
>
> One thing i could immediately think of is to eliminate the memory for
> Memstore. What is the minimum that we could get away with?
>
> How about disabling some regular operations to save CPU time. I think
> Compaction is one of those we'd like to stop.
>
> thanks
>
> Jeff
>


Re: Parallel Scanner

2017-02-20 Thread ramkrishna vasudevan
So you are trying to scan one region itself in parallel; then I misunderstood
you as well. Richard's suggestion is the right choice for a client-only solution.

On Mon, Feb 20, 2017 at 7:40 PM, Anil  wrote:

> Thanks Richard :)
>
> On 20 February 2017 at 18:56, Richard Startin 
> wrote:
>
> > RegionLocator is not deprecated, hence the suggestion to use it if it's
> > available in place of whatever is still available on HTable for your
> > version of HBase - it will make upgrades easier. For instance
> > HTable::getRegionsInRange no longer exists on the current master branch.
> >
> >
> > "I am trying to scan a region in parallel :)"
> >
> >
> > I thought you were asking about scanning many regions at the same time,
> > not scanning a single region in parallel? HBASE-1935 is about
> parallelising
> > scans over regions, not within regions.
> >
> >
> > If you want to parallelise within a region, you could write a little
> > method to split the first and last key of the region into several
> disjoint
> > lexicographic buckets and create a scan for each bucket, then execute
> those
> > scans in parallel. Your data probably doesn't distribute uniformly over
> > lexicographic buckets though so the scans are unlikely to execute at a
> > constant rate and you'll get results in time proportional to the
> > lexicographic bucket with the highest cardinality in the region. I'd be
> > interested to know if anyone on the list has ever tried this and what the
> > results were?
> >
> >
> > Using the much simpler approach of parallelising over regions by creating
> > multiple disjoint scans client side, as suggested, your performance now
> > depends on your regions which you have some control over. You can achieve
> > the same effect by pre-splitting your table such that you empirically
> > optimise read performance for the dataset you store.
> >
> >
> > Thanks,
> >
> > Richard
> >
> >
> > 
> > From: Anil 
> > Sent: 20 February 2017 12:35
> > To: user@hbase.apache.org
> > Subject: Re: Parallel Scanner
> >
> > Thanks Richard.
> >
> > I am able to get the regions for data to be loaded from table. I am
> trying
> > to scan a region in parallel :)
> >
> > Thanks
> >
> > On 20 February 2017 at 16:44, Richard Startin <
> richardstar...@outlook.com>
> > wrote:
> >
> > > For a client only solution, have you looked at the RegionLocator
> > > interface? It gives you a list of pairs of byte[] (the start and stop
> > keys
> > > for each region). You can easily use a ForkJoinPool recursive task or
> > java
> > > 8 parallel stream over that list. I implemented a spark RDD to do that
> > and
> > > wrote about it with code samples here:
> > >
> > > https://richardstartin.com/2016/11/07/co-locating-spark-
> >
> > > partitions-with-hbase-regions/
> > >
> > > Forget about the spark details in the post (and forget that Hortonworks
> > > have a library to do the same thing :)) the idea of creating one scan
> per
> > > region and setting scan starts and stops from the region locator would
> > give
> > > you a parallel scan. Note you can also group the scans by region
> server.
> > >
> > > Cheers,
> > > Richard
> > > On 20 Feb 2017, at 07:33, Anil  wrote:
> > >
> > > Thanks Ram. I will look into EndPoints.
> > >
> > > On 20 February 2017 at 12:29, ramkrishna vasudevan <
> > > ramkrishna.s.vasude...@gmail.com>
> > > wrote:
> > >
> > > Yes. There is way.
> > >
> > > Have you seen Endpoints? Endpoints are triggers like points that allows
> > > your client to trigger them parallely in one ore more regions using the
> > > start and end key of the region. This executes parallely and then you
> may
> > > have to sort out the results as per your need.
> > >
> > > But these endpoints have to running on your region servers and it is
> not
> > a
> > > client only soln.
> > > https://blogs.apache.org/hbase/entry/coprocessor_introduction.

Re: Parallel Scanner

2017-02-19 Thread ramkrishna vasudevan
Yes, there is a way.

Have you seen Endpoints? Endpoints are trigger-like hooks that allow
your client to invoke them in parallel on one or more regions using the
start and end keys of the region. They execute in parallel, and then you may
have to sort out the results as per your need.

But these endpoints have to be running on your region servers, so it is not a
client-only solution.
https://blogs.apache.org/hbase/entry/coprocessor_introduction.
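
As a concrete illustration of the mechanism (only a sketch, and not your exact
use case), the AggregateImplementation endpoint that ships with HBase can be
driven from the client through AggregationClient, which invokes the endpoint
on every region in the given range:

// Assumes the org.apache.hadoop.hbase.coprocessor.AggregateImplementation
// endpoint is loaded on table "mytable" and that family "f" exists
// (both names are made up).
Configuration conf = HBaseConfiguration.create();
AggregationClient aggClient = new AggregationClient(conf);
Scan scan = new Scan();
scan.addFamily(Bytes.toBytes("f"));      // rowCount() expects exactly one family on the scan
long rows = aggClient.rowCount(TableName.valueOf("mytable"),
    new LongColumnInterpreter(), scan);  // declared as "throws Throwable", handle accordingly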

Be careful when you use them. Since these endpoints run on the server, ensure
that they are not heavy and do not consume too much memory, which can have
adverse effects on the server.


Regards
Ram

On Mon, Feb 20, 2017 at 12:18 PM, Anil  wrote:

> Thanks Ram.
>
> So, you mean that there is no harm in using  HTable#getRegionsInRange in
> the application code.
>
> HTable#getRegionsInRange returned single entry for all my region start key
> and end key. i need to explore more on this.
>
> "If you know the table region's start and end keys you could create
> parallel scans in your application code."  - is there any way to scan a
> region in the application code other than the one i put in the original
> email ?
>
> "One thing to watch out is that if there is a split in the region then
> this start
> and end row may change so in that case it is better you try to get
> the regions every time before you issue a scan"
>  - Agree. i am dynamically determining the region start key and end key
> before initiating scan operations for every initial load.
>
> Thanks.
>
>
>
>
> On 20 February 2017 at 10:59, ramkrishna vasudevan <
> ramkrishna.s.vasude...@gmail.com> wrote:
>
> > Hi Anil,
> >
> > HBase directly does not provide parallel scans. If you know the table
> > region's start and end keys you could create parallel scans in your
> > application code.
> >
> > In the above code snippet, the intent is right - you get the required
> > regions and can issue parallel scans from your app.
> >
> > One thing to watch out is that if there is a split in the region then
> this
> > start and end row may change so in that case it is better you try to get
> > the regions every time before you issue a scan. Does that make sense to
> > you?
> >
> > Regards
> > Ram
> >
> > On Sat, Feb 18, 2017 at 1:44 PM, Anil  wrote:
> >
> > > Hi ,
> > >
> > > I am building an usecase where i have to load the hbase data into
> > In-memory
> > > database (IMDB). I am scanning the each region and loading data into
> > IMDB.
> > >
> > > i am looking at parallel scanner ( https://issues.apache.org/
> > > jira/browse/HBASE-8504, HBASE-1935 ) to reduce the load time and
> HTable#
> > > getRegionsInRange(byte[] startKey, byte[] endKey, boolean reload) is
> > > deprecated, HBASE-1935 is still open.
> > >
> > > I see Connection from ConnectionFactory is HConnectionImplementation by
> > > default and creates HTable instance.
> > >
> > > Do you see any issues in using HTable from Table instance ?
> > > for each region {
> > > int i = 0;
> > > List regions =
> > > hTable.getRegionsInRange(scans.getStartRow(), scans.getStopRow(),
> true);
> > >
> > > for (HRegionLocation region : regions){
> > > startRow = i == 0 ? scans.getStartRow() :
> > > region.getRegionInfo().getStartKey();
> > > i++;
> > > endRow = i == regions.size()? scans.getStopRow() :
> > > region.getRegionInfo().getEndKey();
> > >  }
> > >}
> > >
> > > are there any alternatives to achieve parallel scan? Thanks.
> > >
> > > Thanks
> > >
> >
>


Re: Parallel Scanner

2017-02-19 Thread ramkrishna vasudevan
Hi Anil,

HBase does not directly provide parallel scans. If you know the table
regions' start and end keys, you can create parallel scans in your
application code.

In the above code snippet, the intent is right - you get the required
regions and can issue parallel scans from your app.

One thing to watch out for is that if there is a split in the region, the
start and end rows may change; in that case it is better to fetch the
regions afresh every time before you issue a scan. Does that make sense to you?
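
A rough client-only sketch of the idea (1.x API; the table name, pool size and
error handling are all placeholders):

void parallelScan(final Connection conn) throws Exception {
  final TableName tableName = TableName.valueOf("mytable");
  ExecutorService pool = Executors.newFixedThreadPool(8);
  List<Future<?>> futures = new ArrayList<>();
  try (RegionLocator locator = conn.getRegionLocator(tableName)) {
    for (HRegionLocation loc : locator.getAllRegionLocations()) {
      final byte[] start = loc.getRegionInfo().getStartKey();
      final byte[] stop = loc.getRegionInfo().getEndKey();
      futures.add(pool.submit(() -> {
        Scan scan = new Scan();
        scan.setStartRow(start);
        scan.setStopRow(stop);           // an empty end key means "to the end of the table"
        try (Table table = conn.getTable(tableName);
             ResultScanner rs = table.getScanner(scan)) {
          for (Result r : rs) {
            // hand r to the application here
          }
        }
        return null;
      }));
    }
  }
  for (Future<?> f : futures) {
    f.get();                             // on failure, re-read the region list and retry
  }
  pool.shutdown();
}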

Regards
Ram

On Sat, Feb 18, 2017 at 1:44 PM, Anil  wrote:

> Hi ,
>
> I am building an usecase where i have to load the hbase data into In-memory
> database (IMDB). I am scanning the each region and loading data into IMDB.
>
> i am looking at parallel scanner ( https://issues.apache.org/
> jira/browse/HBASE-8504, HBASE-1935 ) to reduce the load time and HTable#
> getRegionsInRange(byte[] startKey, byte[] endKey, boolean reload) is
> deprecated, HBASE-1935 is still open.
>
> I see Connection from ConnectionFactory is HConnectionImplementation by
> default and creates HTable instance.
>
> Do you see any issues in using HTable from Table instance ?
> for each region {
> int i = 0;
> List<HRegionLocation> regions =
> hTable.getRegionsInRange(scans.getStartRow(), scans.getStopRow(), true);
>
> for (HRegionLocation region : regions){
> startRow = i == 0 ? scans.getStartRow() :
> region.getRegionInfo().getStartKey();
> i++;
> endRow = i == regions.size()? scans.getStopRow() :
> region.getRegionInfo().getEndKey();
>  }
>}
>
> are there any alternatives to achieve parallel scan? Thanks.
>
> Thanks
>


Re: [ANNOUNCE] New HBase committer Guanghao Zhang

2016-12-20 Thread ramkrishna vasudevan
Congratulations and welcome Guanghao.

Regards
Ram

On Tue, Dec 20, 2016 at 4:07 PM, Guanghao Zhang  wrote:

> Thanks all. Looking forward to work with you guys and keep contributing for
> HBase. Thanks.
>
> 2016-12-20 16:48 GMT+08:00 Yu Li :
>
> > Congratulations and welcome Guanghao!
> >
> > Best Regards,
> > Yu
> >
> > On 20 December 2016 at 12:59, 宾莉金 or binlijin 
> wrote:
> >
> > > Congratulations and welcome!
> > >
> > > 2016-12-20 12:54 GMT+08:00 Nick Dimiduk :
> > >
> > > > Congratulations Guanghao and thank you for all your contributions!
> > > >
> > > > On Mon, Dec 19, 2016 at 5:37 PM Duo Zhang 
> wrote:
> > > >
> > > > > On behalf of the Apache HBase PMC, I am pleased to announce that
> > > Guanghao
> > > > >
> > > > > Zhang has accepted the PMC's invitation to become a committer on
> the
> > > > >
> > > > > project. We appreciate all of Guanghao's generous contributions
> thus
> > > far
> > > > >
> > > > > and look forward to his continued involvement.
> > > > >
> > > > >
> > > > >
> > > > > Congratulations and welcome, Guanghao!
> > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > *Best Regards,*
> > >  lijin bin
> > >
> >
>


Re: [ANNOUNCE] New HBase Committer Josh Elser

2016-12-12 Thread ramkrishna vasudevan
Congratulations Josh !!!

Regards
Ram

On Tue, Dec 13, 2016 at 4:18 AM, Enis Söztutar  wrote:

> Congrats Josh!
>
> Enis
>
> On Mon, Dec 12, 2016 at 11:39 AM, Esteban Gutierrez 
> wrote:
>
> > Congrats and welcome, Josh!
> >
> > esteban.
> >
> >
> > --
> > Cloudera, Inc.
> >
> >
> > On Sun, Dec 11, 2016 at 10:17 PM, Yu Li  wrote:
> >
> > > Congratulations and welcome!
> > >
> > > Best Regards,
> > > Yu
> > >
> > > On 12 December 2016 at 12:47, Mikhail Antonov 
> > > wrote:
> > >
> > > > Congratulations Josh!
> > > >
> > > > -Mikhail
> > > >
> > > > On Sun, Dec 11, 2016 at 5:20 PM, 张铎  wrote:
> > > >
> > > > > Congratulations!
> > > > >
> > > > > 2016-12-12 9:03 GMT+08:00 Jerry He :
> > > > >
> > > > > > Congratulations , Josh!
> > > > > >
> > > > > > Good work on the PQS too.
> > > > > >
> > > > > > Jerry
> > > > > >
> > > > > > On Sun, Dec 11, 2016 at 12:14 PM, Josh Elser 
> > > > wrote:
> > > > > >
> > > > > > > Thanks, all. I'm looking forward to continuing to work with you
> > > all!
> > > > > > >
> > > > > > >
> > > > > > > Nick Dimiduk wrote:
> > > > > > >
> > > > > > >> On behalf of the Apache HBase PMC, I am pleased to announce
> that
> > > > Josh
> > > > > > >> Elser
> > > > > > >> has accepted the PMC's invitation to become a committer on the
> > > > > project.
> > > > > > We
> > > > > > >> appreciate all of Josh's generous contributions thus far and
> > look
> > > > > > forward
> > > > > > >> to his continued involvement.
> > > > > > >>
> > > > > > >> Allow me to be the first to congratulate and welcome Josh into
> > his
> > > > new
> > > > > > >> role!
> > > > > > >>
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Thanks,
> > > > Michael Antonov
> > > >
> > >
> >
>


Re: Talks from hbaseconeast2016 have been posted

2016-12-09 Thread ramkrishna vasudevan
Thanks for the slides.

Regards
Ram

On Fri, Dec 9, 2016 at 5:49 AM, Stack  wrote:

> The slides were just pushed up on slideshare:
> http://www.slideshare.net/search/slideshow?searchfrom=
> header&q=hbaseconeast2016
>
> St.Ack
> P.S. Thanks Carter
>
> On Thu, Oct 27, 2016 at 9:56 AM, Stack  wrote:
>
> > Some good stuff in here:
> >
> > + BigTable Lead on some interesting tricks done to make the service more
> > robust
> > + Compare of SQL tiers by the boys from Splice
> > + Mikhail (looking like a gangster!) on the dead-but-not-dead
> RegionServer
> >
> > Then there's our JMS on HBase+Spark,  Joep and Sangjin on HBase as store
> > for Yarn Timeline Service v2 Lots of meaty material.
> >
> > Yours,
> > The Program Committee
> > P.S. Thanks Carter Page for assemblage.
> >
> >
> >
>


Re: Scan a region in parallel

2016-10-21 Thread ramkrishna vasudevan
Phoenix does support more intelligent ways of querying by columns, since it
is a SQL engine.

There the parallelism happens by using guideposts - these are evenly spaced
row keys stored in a separate stats table. So when you run a query, Phoenix
internally spawns parallel scan queries using those guideposts, which makes
querying faster.

Regards
Ram

On Fri, Oct 21, 2016 at 1:26 PM, Anil  wrote:

> Thank you Ram.
>
> "So now  you are spawning those many scan threads equal to the number of
> regions " - YES
>
> There are two ways of scanning a region in parallel:
>
> 1. scan a region with a start row and stop row in parallel with a single scan
> operation on the server side, and HBase takes care of the parallelism internally.
> 2. transform the start row and stop row of a region into a number of start and
> stop rows (by some criteria) and spawn a scan query for each start and stop
> row.
>
> #1 is not supported (as you also said).
>
> I am looking for #2. I checked the Phoenix documentation and code; it seems
> to me that Phoenix is doing #2. I looked into the Phoenix code and could not
> understand it completely.
>
> The use case is very simple. HBase is not good (at least in terms of
> OLTP performance) at querying by arbitrary columns (other than the row key)
> and at sorting by columns of a row; even Phoenix has the same limitation.
>
> So i am planning load the hbase/phoenix table into in-memory data base for
> faster access.
>
> scanning of big region sequentially will lead to larger load time. so
> finding ways to minimize the load time.
>
> Hope this helps.
>
> Thanks.
>
>
> On 21 October 2016 at 09:30, ramkrishna vasudevan <
> ramkrishna.s.vasude...@gmail.com> wrote:
>
> > Hi Anil
> >
> > So now  you are spawning those many scan threads equal to the number of
> > regions.
> > bq.Is there any way to scan a region in parallel ?
> > You mean with in a region you want to scan parallely? Which means that a
> > single query you want to split up into N number of small scans and read
> and
> > aggregate on the client side/server side?
> >
> > Currently you cannot do that. Once you set a start and stoprow the scan
> > will determine which region it belongs to and retrieves the data
> > sequentially in that region (it applies the filtering that you do during
> > the course of the scan).
> >
> > Have you tried Apache Phoenix?  Its a SQL wrapper over HBase and there
> you
> > could do parallel scans for a given SQL query if there are some guide
> posts
> > collected. Such things cannot be an integral part of HBase. But I fear
> as I
> > am not aware of your usecase we cannot suggest on this.
> >
> > REgards
> > Ram
> >
> >
> > On Fri, Oct 21, 2016 at 8:40 AM, Anil  wrote:
> >
> > > Any pointers ?
> > >
> > > On 20 October 2016 at 18:15, Anil  wrote:
> > >
> > > > HI,
> > > >
> > > > I am loading hbase table into an in-memory db to support filter,
> > ordering
> > > > and pagination.
> > > >
> > > > I am scanning region and inserting data into in-memory db. each
> region
> > > > scan is done in single thread so each region is scanned in parallel.
> > > >
> > > > Is there any way to scan a region in parallel ? any pointers would be
> > > > helpful.
> > > >
> > > > Thanks
> > > >
> > >
> >
>


Re: Scan a region in parallel

2016-10-20 Thread ramkrishna vasudevan
Hi Anil

So right now you are spawning as many scan threads as there are regions.

bq. Is there any way to scan a region in parallel ?
You mean you want to scan in parallel within a region? That is, you want to
split a single query up into N small scans and read and aggregate on the
client side/server side?

Currently you cannot do that. Once you set a start and stop row, the scan
determines which region it belongs to and retrieves the data
sequentially in that region (applying whatever filtering you specify during
the course of the scan).

Have you tried Apache Phoenix?  It is a SQL wrapper over HBase, and there you
can do parallel scans for a given SQL query if some guideposts have been
collected. Such things cannot be an integral part of HBase. But since I
am not aware of your use case, I cannot suggest more than this.

REgards
Ram


On Fri, Oct 21, 2016 at 8:40 AM, Anil  wrote:

> Any pointers ?
>
> On 20 October 2016 at 18:15, Anil  wrote:
>
> > HI,
> >
> > I am loading hbase table into an in-memory db to support filter, ordering
> > and pagination.
> >
> > I am scanning region and inserting data into in-memory db. each region
> > scan is done in single thread so each region is scanned in parallel.
> >
> > Is there any way to scan a region in parallel ? any pointers would be
> > helpful.
> >
> > Thanks
> >
>


Re: [ANNOUNCE] Stephen Yuan Jiang joins Apache HBase PMC

2016-10-17 Thread ramkrishna vasudevan
Congrats Stephen!!

On Tue, Oct 18, 2016 at 2:37 AM, Stack  wrote:

> Wahoo!
>
> On Fri, Oct 14, 2016 at 11:27 AM, Enis Söztutar  wrote:
>
> > On behalf of the Apache HBase PMC, I am happy to announce that Stephen
> has
> > accepted our invitation to become a PMC member of the Apache HBase
> project.
> >
> > Stephen has been working on HBase for a couple of years, and is already a
> > committer for more than a year. Apart from his contributions in proc v2,
> > hbck and other areas, he is also helping for the 2.0 release which is the
> > most important milestone for the project this year.
> >
> > Welcome to the PMC Stephen,
> > Enis
> >
>


Re: How to get Last 1000 records from 1 millions records

2016-08-25 Thread ramkrishna vasudevan
And reading through the mail chain, as Ted suggested, if you set the scan to
reversed and swap your stop and start rows, you can just keep a count in your
row filter until 10k is reached and then skip all the other
results.

In the other logic that I described, you may have to do a sort before
returning the collected result. In the reverse-scan case too, if you need
the results in lexicographical order you may need to sort them on the client
side.
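
A rough sketch of that reverse-scan variant (1.x client API; the prefix is the
one from the mail quoted below, `connection` is assumed to be an open
Connection, and the counting is done on the client side rather than in a filter):

byte[] prefix = Bytes.toBytes("A_98_");
byte[] reverseStart = Arrays.copyOf(prefix, prefix.length);
reverseStart[reverseStart.length - 1]++;   // first key just past the prefix range (assumes the last byte is not 0xFF)
Scan scan = new Scan();
scan.setStartRow(reverseStart);            // a reverse scan starts from the "largest" row
scan.setStopRow(prefix);                   // and stops (exclusive) at the prefix itself
scan.setReversed(true);
scan.setCaching(1000);
int count = 0;
try (Table table = connection.getTable(TableName.valueOf("t"));
     ResultScanner scanner = table.getScanner(scan)) {
  for (Result r : scanner) {
    // collect r ...
    if (++count >= 1000) {
      break;                               // keep only the last 1000 rows
    }
  }
}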

Regards
Ram

On Thu, Aug 25, 2016 at 3:11 PM, ramkrishna vasudevan <
ramkrishna.s.vasude...@gmail.com> wrote:

> Hi Manjeet
>
> For your first question regarding fetching last 1000 records
>
> First in your scan you set your start Row with the bytes corresponding to (
> A_98_)
> and let the end byte be the byte representation of  A_98 + 1 . I
> mean add +1 to the last byte of what comes out of  (A_98_). So
> this will ensure you scan only the rows corresponding to  (A_98_).
>
> Just thinking the first thing that I can see is that it may be easier to
> do this with CPs than Filters. Because filters deals with per cell or that
> row. Adding the results and maintaing the last 10k records may be
> difficult. I have to see in detail if possible.
>
> Do you know the number of columns you have?  If there are multiple columns
> then it is quite tricky. But if you have only one column per row then or
> you want only the row keys
>
> You can implement an User Coprocessor and in that you can implement
> preStoreScannerOpen(). Take for eg.  you have only one family so in that
> case in you preStoreScannerOpen you will create your own StoreScanner and
> in the StoreScanner.next() you can
> just skip all KeyValues and during that process keep collecting your
> cells. Ensure you keep collecting the cells row wise by adding to a list.
> You will have to have only the latest 1 cells in the list any time.
>
> Every time keep checking if the row has reached the stopRow that is set in
> the scan (so may be it moves to A_981112_).
> Once you see this condition you may have to replace the list given by the
> StoreScanner.next() call with the list that you have collected and send it
> to the client.
> I have not yet tried it but it can give you an idea with CPs.
>
> With filters am not sure as I said as I need to read the flow and see if
> there are any such APIs to mimic the above.
>
> PS. Don't take this as a working algo. There may be reasons why it may not
> work but you can see and read about CPs to see if something like above can
> work out.
>
> Regards
> Ram
>
>
>
>
> On Thu, Aug 25, 2016 at 2:16 PM, Manjeet Singh  > wrote:
>
>> Hi All
>>
>> I have one another question for same case
>>
>> below is my sample Hbase data  as we all know that hbase store data on the
>> basis of rowkey (sorted)
>> below is IP as we can see 2.168.129.81_1 is in last what I am expecting it
>> shuld come just after 1.168.129.81_2
>>
>>
>>
>>  1.168.129.81_0
>>  column=c2:D_com.stackoverflow/questions/4, timestamp=1472104396288,
>> value=4
>>  1.168.129.81_1
>>  column=c2:D_com.stackoverflow/questions/1, timestamp=1472104396288,
>> value=1
>>  1.168.129.81_1
>>  column=c2:D_com.stackoverflow/questions/2, timestamp=1472104396288,
>> value=2
>>  1.168.129.81_2
>>  column=c2:D_com.stackoverflow/questions/0, timestamp=1472104396288,
>> value=0
>>  192.168.129.81_1
>>  column=c2:D_com.stackoverflow/questions/2, timestamp=1472104386671,
>> value=2
>>  192.168.129.81_1
>>  column=c2:D_com.stackoverflow/questions/4, timestamp=1472104386671,
>> value=4
>>  192.168.129.81_2
>>  column=c2:D_com.stackoverflow/questions/1, timestamp=1472104386671,
>> value=1
>>  192.168.129.81_3
>>  column=c2:D_com.stackoverflow/questions/0, timestamp=1472104386671,
>> value=0
>>  192.168.129.81_3
>>  column=c2:D_com.stackoverflow/questions/3, timestamp=1472104386671,
>> value=3
>>  2.168.129.81_1
>>  column=c2:D_com.stackoverflow/questions/0, timestamp=1472104404609,
>> value=0
>>  2.168.129.81_1
>>  column=c2:D_com.stackoverflow/questions/1, timestamp=1472104404609,
>> value=1
>>  2.168.129.81_1
>>  column=c2:D_com.stackoverflow/questions/2, timestamp=1472104404609,
>> value=2
>>  2.168.129.81_3
>>  column=c2:D_com.stackoverflow/questions/4, timestamp=1472104404609,
>> value=4
>>
>>
>>
>> On Thu, Aug 25, 2016 at 12:36 PM, Manjeet Singh <
>> manjeet.chand...@gmail.com>
>> wrote:
>>
>> > I am using some logical salt say I have mobile number in my row key so I
>> > am using some 

Re: How to get Last 1000 records from 1 millions records

2016-08-25 Thread ramkrishna vasudevan
Hi Manjeet

For your first question regarding fetching last 1000 records

First, in your scan set your start row to the bytes corresponding to (A_98_)
and let the stop row be the byte representation of (A_98_) + 1. I
mean, add +1 to the last byte of what comes out of (A_98_). This
will ensure you scan only the rows corresponding to (A_98_).
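
Roughly, in code (assuming the literal prefix really is "A_98_" and that its
last byte is not 0xFF):

byte[] startRow = Bytes.toBytes("A_98_");
byte[] stopRow = Arrays.copyOf(startRow, startRow.length);
stopRow[stopRow.length - 1]++;       // the smallest key that sorts after every "A_98_..." row
Scan scan = new Scan();
scan.setStartRow(startRow);
scan.setStopRow(stopRow);            // the stop row is exclusive
// newer clients can do the same in one call with scan.setRowPrefixFilter(startRow)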

Just thinking about it, the first thing I can see is that it may be easier to
do this with CPs (coprocessors) than with filters, because a filter deals with
one cell or one row at a time. Accumulating the results and maintaining the
last 10k records may be difficult. I have to look at it in detail to see if it
is possible.

Do you know the number of columns you have?  If there are multiple columns
then it is quite tricky. But if you have only one column per row, or you want
only the row keys:

You can implement a user coprocessor and in it implement
preStoreScannerOpen(). Take for example the case where you have only one
family: in your preStoreScannerOpen you create your own StoreScanner, and in
StoreScanner.next() you just skip all KeyValues while collecting your cells
along the way. Ensure you collect the cells row-wise by adding them to a
list, and keep only the latest 1000 cells in the list at any time.

Every time, keep checking whether the row has reached the stopRow that is set
in the scan (so maybe it moves to A_981112_).
Once you see this condition, replace the list given by the
StoreScanner.next() call with the list that you have collected and send it
to the client.

I have not tried it yet, but it can give you an idea of what is possible with CPs.

With filters I am not sure, as I said; I need to read the flow and see if
there are any such APIs to mimic the above.

PS. Don't take this as a working algorithm. There may be reasons why it may
not work, but you can read about CPs and see if something like the above can
work out.
Regards
Ram




On Thu, Aug 25, 2016 at 2:16 PM, Manjeet Singh 
wrote:

> Hi All
>
> I have one another question for same case
>
> below is my sample Hbase data  as we all know that hbase store data on the
> basis of rowkey (sorted)
> below is IP as we can see 2.168.129.81_1 is in last what I am expecting it
> shuld come just after 1.168.129.81_2
>
>
>
>  1.168.129.81_0
>  column=c2:D_com.stackoverflow/questions/4, timestamp=1472104396288,
> value=4
>  1.168.129.81_1
>  column=c2:D_com.stackoverflow/questions/1, timestamp=1472104396288,
> value=1
>  1.168.129.81_1
>  column=c2:D_com.stackoverflow/questions/2, timestamp=1472104396288,
> value=2
>  1.168.129.81_2
>  column=c2:D_com.stackoverflow/questions/0, timestamp=1472104396288,
> value=0
>  192.168.129.81_1
>  column=c2:D_com.stackoverflow/questions/2, timestamp=1472104386671,
> value=2
>  192.168.129.81_1
>  column=c2:D_com.stackoverflow/questions/4, timestamp=1472104386671,
> value=4
>  192.168.129.81_2
>  column=c2:D_com.stackoverflow/questions/1, timestamp=1472104386671,
> value=1
>  192.168.129.81_3
>  column=c2:D_com.stackoverflow/questions/0, timestamp=1472104386671,
> value=0
>  192.168.129.81_3
>  column=c2:D_com.stackoverflow/questions/3, timestamp=1472104386671,
> value=3
>  2.168.129.81_1
>  column=c2:D_com.stackoverflow/questions/0, timestamp=1472104404609,
> value=0
>  2.168.129.81_1
>  column=c2:D_com.stackoverflow/questions/1, timestamp=1472104404609,
> value=1
>  2.168.129.81_1
>  column=c2:D_com.stackoverflow/questions/2, timestamp=1472104404609,
> value=2
>  2.168.129.81_3
>  column=c2:D_com.stackoverflow/questions/4, timestamp=1472104404609,
> value=4
>
>
>
> On Thu, Aug 25, 2016 at 12:36 PM, Manjeet Singh <
> manjeet.chand...@gmail.com>
> wrote:
>
> > I am using some logical salt say I have mobile number in my row key so I
> > am using some algo and fitting this mobile number into some ASCII char
> > So each time I know what will be the salt so its clear to me and it will
> > never change the order
> > example
> > if based on my algo I get A for 98
> > so each time it will always return me A for 98
> > so if I have my row key Like
> > A_98_101
> > A_98_102
> > A_98_103
> > A_98_104
> > A_98_105
> > A_98_106
> > A_98_107
> > A_98_108
> >
> > it will sort my row keys in the same manner as shown above. Now there are
> > millions of records and I want to get the last 1000 records.
> > Is there any way to get them? My concern is to perform all calculation on
> > the server side, not the client side.
> >
> >
> > Thanks
> > Manjeet
> >
> >
> > On Thu, Aug 25, 2016 at 1:06 AM, Esteban Gutierrez  >
> > wrote:
> >
> >> As long as new rows are added to the latest region that "might" work.
> But
> >> if the table is using hashed keys or rows are added randomly to the
> table
> >> then retrieving the last million will be trickier and you will have to
> >> scan
> >> based on timestamp (if not modified) and then filter one more time.
> >>
> >> esteban.
> >>
> >>
> >> --
> >> Cloudera, Inc.
> >>
> >>
> >> On 

Re: Hbase regionserver.MultiVersionConcurrencyControl Warning

2016-08-12 Thread ramkrishna vasudevan
>>We saw this as well at Splice Machine.  This led us to run compactions in
Spark.
It will be great to see how this is done. Moving one of HBase's internals
to an external process is quite out of the box. But I am eager to see it. Looks
interesting.

Regards
Ram

On Thu, Aug 11, 2016 at 8:43 PM, John Leach 
wrote:

> We saw this as well at Splice Machine.  This led us to run compactions in
> Spark.  Once we did this, we saw the compaction effects go away almost
> entirely.
>
> Here is a link to our code.
>
> https://github.com/splicemachine/spliceengine/blob/73640a81972ef5831c1ea834ac9ac22f5b3428db/hbase_sql/src/main/java/com/splicemachine/olap/CompactionJob.java
>
> We have a todo to get this back in the community.
>
> Regards,
> John Leach
>
> > On Aug 11, 2016, at 8:03 AM, Sterfield  wrote:
> >
> > And it's gone [1]. No more spikes in the writes / read, no more OpenTSDB
> > error. So I think it's safe to assume that OpenTSDB compaction is
> > generating some additional load that is not very well handled by the
> HBase,
> > and therefore, generating the issues I'm mentioning.
> >
> > It seems also that the MVCC error are gone (to be checked).
> >
> > I don't know how to manage Hbase in order to make it possible to handle
> > compaction without any issues, but at least, I know where it comes from
> >
> > [1] :
> > https://www.dropbox.com/s/d6l2lngr6mpizh9/Without%
> 20OpenTSDB%20compaction.png?dl=0
> >
> > 2016-08-11 13:18 GMT+02:00 Sterfield :
> >
> >> Hello,
> >>
> >>
>  Hi,
> 
>  Thanks for your answer.
> 
>  I'm currently testing OpenTSDB + HBase, so I'm generating thousands of
> >>> HTTP
>  POST on OpenTSDB in order to write data points (currently up to
> 300k/s).
>  OpenTSDB is only doing increment / append (AFAIK)
> 
>  How many nodes or is that 300k/s on a single machine?
> >>
> >>
> >> 1 master node, 4 slaves, colo HDFS + RS.
> >>
> >> Master : m4.2xlarge (8CPU, 32GB RAM)
> >> Slave : d2.2xlarge (8CPU, 61GB RAM, 6x2T disk)
> >>
> >>
>  If I have understood your answer correctly, some write ops are queued,
> >>> and
>  some younger ops in the queue are "done" while some older are not.
> 
> 
> >>> What Anoop said plus, we'll see the STUCK notice when it is taking a
> long
> >>> time for the MVCC read point to come up to the write point of the
> >>> currently
> >>> ongoing transaction. We will hold the updating thread until the
> readpoint
> >>> is equal or greater than the current transactions write point. We do
> this
> >>> to ensure a client can read its own writes. The MVCC is region wide. If
> >>> many ongoing updates, a slightly slower one may drag down other
> >>> outstanding
> >>> transactions completing. The STUCK message goes away after some time?
> It
> >>> happens frequently? A thread dump while this is going on would be
> >>> interesting if possible or what else is going on on the server around
> this
> >>> time (see in logs?)
> >>
> >>
> >> Yes, the STUCK message happens for quite some time (a dozen of minutes,
> >> each hour.). It happens every hour.
> >>
> >>> Few additional questions :
> 
>    - Is it a problem regarding the data or is it "safe" ? In other
> >>> words,
>    the old data not been written yet will be dropped or they will be
>  written
>    correctly, just later ?
> 
> >>> No data is going to be dropped. The STUCK message is just flagging you
> >>> that
> >>> a write is taking a while to complete while we wait on MVCC. You
> backed up
> >>> on disk or another resource or a bunch of writers have all happened to
> >>> arrive at one particular region (MVCC is by region)?
> >>
> >>
> >> I've pre-splitted my "tsdb" region, to be managed by all 4 servers, so I
> >> think I'm ok on this side. All information are stored locally on the EC2
> >> disks.
> >>
> >>
>    - How can I debug this and if possible, fix it ?
> 
> 
> >>> See above. Your writes are well distributed. Disks are healthy?
> >>
> >>
> >>
> >> So here's the result of my investigation + my assumptions :
> >>
> >>
> >>   - Every hour, my RS have a peak of Load / CPU. I was looking at a RS
> >>   when it happened (that's easy, it's at the beginning of each hour),
> and the
> >>   RS java process was taking all the CPU available on the machine,
> hence the
> >>   load. You can see the load of all my servers on those images, see [1]
> and
> >>   [2].
> >>   - Disk are fine IMO. Write IO is OK on average, peak to 300 / 400
> >>   IOPS, in range of a correct mechanical drive. I don't see a
> particular IOPS
> >>   load at that time
> >>   - However, you can see that every hour (see [3]) :
> >>  - Calls are queued
> >>  - Write are impacted (from 160k/s, down to 150 - 140k/s)
> >>  - Read RPC are increased (I suppose that the RS is not answering,
> >>  

Re: Cannot create table with DateTiered compaction

2016-08-10 Thread ramkrishna vasudevan
Then the choice you have is to backport the feature and the related
changes to your branch, but I am not sure whether that is feasible for you.
Otherwise, your only choice is to upgrade the HBase release if you are in
need of it.

Regards
Ram

On Thu, Aug 11, 2016 at 8:49 AM, spats  wrote:

> Hi Ram,
>
> Unfortunately I don't have an HBase version with DateTiered compaction. I tried
> with version 1.0.0 but, as expected, it doesn't work, as the changes are not in
> that version either, as per https://issues.apache.org/jira/browse/HBASE-15181
>
>
>
> --
> View this message in context: http://apache-hbase.679495.n3.
> nabble.com/Cannot-create-table-with-DateTiered-
> compaction-tp4081690p4081714.html
> Sent from the HBase User mailing list archive at Nabble.com.
>


Re: Cannot create table with DateTiered compaction

2016-08-10 Thread ramkrishna vasudevan
Hi Spats

This date-tiered support is in 0.98.18, I believe, and not in 0.98.6. Can you
try the latest release of the 0.98 branch? If it does not work, please report back.

Regards
Ram

On Wed, Aug 10, 2016 at 1:56 PM, spats  wrote:

> I cannot create a table with DateTiered compaction in HBase 0.98.6. I also tried
> altering an existing table to the DateTiered store engine; this hangs
> indefinitely.
>
> Create table command
> create 'XXX', { NAME => 'f1', COMPRESSION => 'SNAPPY', DATA_BLOCK_ENCODING
> => 'FAST_DIFF', REPLICATION_SCOPE => '1', CONFIGURATION =>
> {'hbase.hstore.engine.class' =>
> 'org.apache.hadoop.hbase.regionserver.DateTieredStoreEngine'}}
>
> error:
> ERROR: 6 millis timeout while waiting for channel
>
> Does anyone know whether DateTiered storage is supported in HBase 0.98.6? Or am I
> doing something wrong in the create table command?
>
>
>
> --
> View this message in context: http://apache-hbase.679495.n3.
> nabble.com/Cannot-create-table-with-DateTiered-compaction-tp4081690.html
> Sent from the HBase User mailing list archive at Nabble.com.
>


Re: (no subject)

2016-06-22 Thread ramkrishna vasudevan
Hi WangYQ,

For code-related suggestions, if you feel there is an improvement or a bug, it
is preferable to raise a JIRA and provide a patch. Please feel free to raise a
JIRA with your suggestion and the reason you want to change it.

Regards
Ram

On Thu, Jun 23, 2016 at 9:36 AM, WangYQ  wrote:

>
> there is a class named "HDFSBlocksDistribution", which uses a tree map
> "hostAndWeight" to store its data:
> private Map hostAndWeight   (currently a TreeMap)
>
> I think we can use a HashMap instead:
> private Map hostAndWeight   (a HashMap)
> to store the data
>
>
> thanks


Re: Writing visibility labels with HFileOutputFormat2

2016-06-16 Thread ramkrishna vasudevan
>>so long as only the HBase user and the spark user can read/write to the
file, I'm not sure what the risk is?
I was saying more with respect to the sensitivity of the data that was
written.
Say there are the following users:
Admin
Manager
Worker1
Worker 2

and the following labels
CONFIDENTIAL, SECRET, PUBLIC, WORKER_1_INFO, WORKER_2_INFO
Now say the manager has associated Worker1 with WORKER_1_INFO and Worker2
with WORKER_2_INFO. When Worker1 wants to read his information, he
should set WORKER_1_INFO as the authorization in his scan.
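
In code, that read would look roughly like this (a sketch; the table name is
made up and `connection` is assumed to be an open Connection):

Scan scan = new Scan();
scan.setAuthorizations(new Authorizations("WORKER_1_INFO"));
try (Table table = connection.getTable(TableName.valueOf("t"));
     ResultScanner results = table.getScanner(scan)) {
  for (Result r : results) {
    // only cells whose visibility expression matches WORKER_1_INFO come back
  }
}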

So if there is a bulk load scenario where the entire file is read, the
user doing the bulk load in this example should not be Worker1 or Worker2;
it should be either the Admin or the Manager.

Now, in your case, if the Spark user and the HBase user are like the Admin or
Manager (as in my example), then it is perfectly fine.

>>am I able to read the HFile manually to determine if Tags have been
written properly?
HBASE-15707 is a case where tags were not being allowed to be written while
creating the file. You may need that fix when you are adding tags
directly. But in your case they are visibility tags, which you are not
supposed to add directly except via setCellVisibility(). Still,
it is better to have that fix in your branch as well.

>>"hbase.security.visibility.mutations.checkauths" - for now the method of
set_auths 'client','system' along with only giving 'client' read on
'hbase:labels' is working for me.

Fine. I have some doubts here with respect to how SYSTEM tags are
implemented. I will get back on this.

Regards
Ram

On Thu, Jun 16, 2016 at 9:11 PM, Ellis, Tom (Financial Markets IT) <
tom.el...@lloydsbanking.com.invalid> wrote:

> Hi Again Ram,
>
> "hbase.security.visibility.mutations.checkauths" - for now the method of
> set_auths 'client','system' along with only giving 'client' read on
> 'hbase:labels' is working for me.
>
> "Coming to reading the HFile and creating a bulk load, I think we should
> be more cautious here " - I don't follow again sorry. The spark user writes
> the HFile, and then initiates the load with
> LoadIncrementalHFiles.doBulkLoad - so long as only the HBase user and the
> spark user can read/write to the file, I'm not sure what the risk is?
>
> HBASE-15707 - am I able to read the HFile manually to determine if Tags
> have been written properly?
>
> Cheers,
>
> Tom
>
>
> -Original Message-
> From: ramkrishna vasudevan [mailto:ramkrishna.s.vasude...@gmail.com]
> Sent: 16 June 2016 06:01
> To: user@hbase.apache.org
> Subject: Re: Writing visibility labels with HFileOutputFormat2
>
>
>
> Thanks for the updates here. Going through the mails here
> >> Why is it that a client user without admin/super user privileges can
> >> set
> a visibility expression using Put.setCellVisibility, but if we want to
> write using HFiles,
>
> I get your point now. There is a property
> '"hbase.security.visibility.mutations.checkauths" if set will check if the
> user is authorized to mutate the visibility labels that he is trying to
> write. If the user is not allowed to add that label the mutation will fail.
> Can you see if this solves the other problem of allowing any client user
> to write? If the above is not well documented pls feel free to raise a JIRA
> and we are happy to address it.
>
> Coming to reading the HFile and creating a bulk load, I think we should be
> more cautious here. There are some critical info stored in the HFile and
> just allowing any user to read it is going to be risky.
>
> Coming to the PutSortReducer problem,  I think what you say is true. Not
> sure if there is a bug already, if not pls feel free to raise a bug here.
> We need to fix it.
>
>  HBASE-15707 - you may need this because for scala's HBasecontext you need
> to ensure tags are included just incase ImportTSV has to be used.
>
> Write back, if I had missed something or if my info was lacking. Its been
> quite sometime we had worked in this area so have to see code every time to
> know what was done.
>
> Regards
> Ram
>
> On Wed, Jun 15, 2016 at 11:29 PM, Ellis, Tom (Financial Markets IT) <
> tom.el...@lloydsbanking.com.invalid> wrote:
>
> > So, I can see that I can correctly get the Lists from the
> > VisibilityExpressionResolver, set them on the Cell, and write them
> > using HFileOutputFormat2, however when I scan using an unprivileged
> > user I can still see the cells. If I write the cells with
> > setCellVisibility the unprivileged user can't see them.
> >
> > Then I noticed the fix for HBAS

Re: Writing visibility labels with HFileOutputFormat2

2016-06-15 Thread ramkrishna vasudevan
> Why is it that a client user without admin/super user privileges can set a visibility expression using
> Put.setCellVisibility, but if we want to write using HFiles, the client
> user has to have admin/super user privileges so they can use
> VisibilityExpressionResolver to correctly create the tags on the Cell with
> correct ordinals?
>
> Cheers,
>
> Tom Ellis
> Consultant Developer – Excelian
> Data Lake | Financial Markets IT
> LLOYDS BANK COMMERCIAL BANKING
>
>
> E: tom.el...@lloydsbanking.com
> Website: www.lloydsbankcommercial.com
>
>
> -Original Message-
> From: Ellis, Tom (Financial Markets IT) [mailto:
> tom.el...@lloydsbanking.com.INVALID]
> Sent: 15 June 2016 16:25
> To: user@hbase.apache.org
> Subject: RE: Writing visibility labels with HFileOutputFormat2
>
>
>
> So I have a working prototype using just bulk puts on a table and using
> setCellVisibility as necessary. Now I'm trying to do it using HFile.
>
> Sorry Ram, I don't quite follow why the user doing the writing of the
> HFile has to be an admin/super user? Is that necessary to load HFiles?
>
> The use case is to hopefully have an application user (non admin)
> performing the writes to an hbase table via a bulk load of an hfile,
> setting visibility labels on individual cells as necessary. Then business
> users who has been given the auth to view that label can see those cells,
> and others not.
>
> I've seen that it's possible to do this with map reduce & setting the map
> output to be a Put (and thus could setCellVisibility on the puts), but I'm
> struggling to do this with Spark, as I keep getting the exception that I
> can't cast a Put to a Cell.
>
> Cheers,
>
> Tom Ellis
> Consultant Developer – Excelian
> Data Lake | Financial Markets IT
> LLOYDS BANK COMMERCIAL BANKING
>
>
> E: tom.el...@lloydsbanking.com
> Website: www.lloydsbankcommercial.com
> , , ,
> Reduce printing. Lloyds Banking Group is helping to build the low carbon
> economy.
> Corporate Responsibility Report: www.lloydsbankinggroup-cr.com/downloads
>
>
> -Original Message-
> From: ramkrishna vasudevan [mailto:ramkrishna.s.vasude...@gmail.com]
> Sent: 15 June 2016 12:31
> To: user@hbase.apache.org
> Subject: Re: Writing visibility labels with HFileOutputFormat2
>
> -- This email has reached the Bank via an external source --
>
>
> >>We could I guess create multiple puts for cells in the same row with
> different labels and use the setCellVisibility on each individual
> put/cell, but will this create additional overhead?
> This can be done. If you want different cells in the same row to have
> different labels then it is better to create those many puts and
> setCellVisibility on each of them. What type of overhead you see here? In
> terms of the server processing them? If so there should not be much
> overhead here and also adding different cells to every column inturn means
> you need every cell to be treated differenly in terms of security. so
> should be fine IMHO.
>
> Without doing put.setCellvisibility() there is no other way I believe. One
> question regarding your use case Now in the mail you had told about the
> spark job where you will create a bulk loaded file. Now if that is to have
> all the visibility related information of all the cells then the user doing
> this job should be an admin or super user right Why is the case that a
> normal client user will read through all the visibility cells which may or
> may not be associated with that user?
>
> Thank you very much for testing and using this feature. LEt us know your
> feedback and if you find any gaps here. Happy to help.
>
> Regards
> Ram
>
>
> On Wed, Jun 15, 2016 at 4:09 PM, Ellis, Tom (Financial Markets IT) <
> tom.el...@lloydsbanking.com.invalid> wrote:
>
> > Hmm, is there no other way to set labels on individual cells where we
> > don't have to give the client users system perms? For instance, client
> > users can set the cell visibility on the entire put without having
> > this (i.e. put.setCellVisibility("label")) and the
> > VisibilityController will check this.
> >
> > We could I guess create multiple puts for cells in the same row with
> > different labels and use the setCellVisibility on each individual
> > put/cell, but will this create additional overhead?
> >
> > Cheers,
> >
> > Tom Ellis
> > Consultant Developer – Excelian
> > Data Lake | Financial Markets IT
> > LLOYDS BANK COMMER

Re: Writing visibility labels with HFileOutputFormat2

2016-06-15 Thread ramkrishna vasudevan
>>We could I guess create multiple puts for cells in the same row with
different labels and use the setCellVisibility on each individual put/cell,
but will this create additional overhead?
This can be done. If you want different cells in the same row to have
different labels, then it is better to create that many Puts and call
setCellVisibility on each of them. What kind of overhead do you see here? In
terms of the server processing them? If so, there should not be much overhead;
and splitting the row into one Put per column in turn means you need every
cell to be treated differently in terms of security, so this should be fine
IMHO.
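
For example, a minimal sketch of that idea (the table name 't1' and the labels
'secret', 'public' and 'internal' are only illustrative and assumed to exist
already) could look like this with the standard Java client:

import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.security.visibility.CellVisibility;
import org.apache.hadoop.hbase.util.Bytes;

public class PerCellVisibilityPuts {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("t1"))) {
      byte[] row = Bytes.toBytes("row-1");
      byte[] cf = Bytes.toBytes("cf1");

      // One Put per distinct visibility expression, all on the same row key.
      Put secret = new Put(row);
      secret.addColumn(cf, Bytes.toBytes("q1"), Bytes.toBytes("v1"));
      secret.setCellVisibility(new CellVisibility("secret"));

      Put open = new Put(row);
      open.addColumn(cf, Bytes.toBytes("q2"), Bytes.toBytes("v2"));
      open.setCellVisibility(new CellVisibility("public|internal"));

      // The row ends up with cells carrying different visibility labels.
      table.put(Arrays.asList(secret, open));
    }
  }
}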

Without doing put.setCellVisibility() there is no other way, I believe. One
question regarding your use case: in your mail you mentioned the Spark job
where you will create a bulk-loaded file. If that file is to carry all the
visibility-related information for all the cells, then the user doing this job
should be an admin or super user, right? Why would a normal client user need
to read through all the visibility cells, which may or may not be associated
with that user?

Thank you very much for testing and using this feature. Let us know your
feedback and if you find any gaps here. Happy to help.

Regards
Ram


On Wed, Jun 15, 2016 at 4:09 PM, Ellis, Tom (Financial Markets IT) <
tom.el...@lloydsbanking.com.invalid> wrote:

> Hmm, is there no other way to set labels on individual cells where we
> don't have to give the client users system perms? For instance, client
> users can set the cell visibility on the entire put without having this
> (i.e. put.setCellVisibility("label")) and the VisibilityController will
> check this.
>
> We could I guess create multiple puts for cells in the same row with
> different labels and use the setCellVisibility on each individual put/cell,
> but will this create additional overhead?
>
> Cheers,
>
> Tom Ellis
> Consultant Developer – Excelian
> Data Lake | Financial Markets IT
> LLOYDS BANK COMMERCIAL BANKING
>
>
> E: tom.el...@lloydsbanking.com
> Website: www.lloydsbankcommercial.com
> , , ,
> Reduce printing. Lloyds Banking Group is helping to build the low carbon
> economy.
> Corporate Responsibility Report: www.lloydsbankinggroup-cr.com/downloads
>
>
> -Original Message-
> From: ramkrishna vasudevan [mailto:ramkrishna.s.vasude...@gmail.com]
> Sent: 15 June 2016 11:24
> To: user@hbase.apache.org
> Subject: Re: Writing visibility labels with HFileOutputFormat2
>
> -- This email has reached the Bank via an external source --
>
>
> The visibility expression resolver tries to scan the labels table and the
> user using the resolver should have the SYSTEM privileges. Since the
> information that is getting accessed is sensitive information.
>
> Suppose in your above case you have the client user added as a an admin
> then when you scan the label table you should be able to  scan it.
>
> Regards
> Ram
>
> On Wed, Jun 15, 2016 at 3:09 PM, Ellis, Tom (Financial Markets IT) <
> tom.el...@lloydsbanking.com.invalid> wrote:
>
> > Yeah, thanks for this Ram. Although in my testing I have found that a
> > client user attempting to use the visibility expression resolver
> > doesn't seem to have the ability to scan the hbase:labels table for
> > the full list of labels and thus can't get the ordinals/tags to add to
> > the cell. Does the client user attempting to use the
> > VisibilityExpressionResolver have to have some special permissions?
> >
> > Scan of hbase:labels by client user:
> >
> > hbase(main):003:0> scan 'hbase:labels'
> > ROW COLUMN+CELL
> >  \x00\x00\x00\x01   column=f:\x00,
> > timestamp=1465216652662, value=system
> > 1 row(s) in 0.0650 seconds
> >
> > Scan of hbase:labels by hbase user:
> >
> > hbase(main):001:0> scan 'hbase:labels'
> > ROW COLUMN+CELL
> >  \x00\x00\x00\x01   column=f:\x00,
> > timestamp=1465216652662, value=system
> >  \x00\x00\x00\x02   column=f:\x00,
> > timestamp=1465216944935, value=protected
> >  \x00\x00\x00\x02   column=f:hbase,
> > timestamp=1465547138533, value=
> >  \x00\x00\x00\x02   column=f:tom,
> > timestamp=1465980236882, value=
> >  \x00\x00\x00\x03   column=f:\x00,
> > timestamp=1465500156667, value=testtesttest
> >  \x00\x00\x00\x03   column=f:@hadoop,
> > timestamp=1465980236967, value=
> >  \x00\x00\x00\x03   column=f:hadoop,
> &g

Re: Writing visibility labels with HFileOutputFormat2

2016-06-15 Thread ramkrishna vasudevan
The visibility expression resolver tries to scan the labels table, and the
user using the resolver should have SYSTEM privileges, since the information
that is being accessed is sensitive.

Suppose in your case above you have the client user added as an admin; then
when you scan the labels table you should be able to scan it.

Regards
Ram

On Wed, Jun 15, 2016 at 3:09 PM, Ellis, Tom (Financial Markets IT) <
tom.el...@lloydsbanking.com.invalid> wrote:

> Yeah, thanks for this Ram. Although in my testing I have found that a
> client user attempting to use the visibility expression resolver doesn't
> seem to have the ability to scan the hbase:labels table for the full list
> of labels and thus can't get the ordinals/tags to add to the cell. Does the
> client user attempting to use the VisibilityExpressionResolver have to have
> some special permissions?
>
> Scan of hbase:labels by client user:
>
> hbase(main):003:0> scan 'hbase:labels'
> ROW COLUMN+CELL
>  \x00\x00\x00\x01   column=f:\x00,
> timestamp=1465216652662, value=system
> 1 row(s) in 0.0650 seconds
>
> Scan of hbase:labels by hbase user:
>
> hbase(main):001:0> scan 'hbase:labels'
> ROW COLUMN+CELL
>  \x00\x00\x00\x01   column=f:\x00,
> timestamp=1465216652662, value=system
>  \x00\x00\x00\x02   column=f:\x00,
> timestamp=1465216944935, value=protected
>  \x00\x00\x00\x02   column=f:hbase,
> timestamp=1465547138533, value=
>  \x00\x00\x00\x02   column=f:tom,
> timestamp=1465980236882, value=
>  \x00\x00\x00\x03   column=f:\x00,
> timestamp=1465500156667, value=testtesttest
>  \x00\x00\x00\x03   column=f:@hadoop,
> timestamp=1465980236967, value=
>  \x00\x00\x00\x03   column=f:hadoop,
> timestamp=1465547304610, value=
>  \x00\x00\x00\x03   column=f:hive,
> timestamp=1465501322616, value=
>  \x00\x00\x00\x04   column=f:\x00,
> timestamp=1465570719901, value=confidential
>  \x00\x00\x00\x05   column=f:\x00,
> timestamp=1465835047835, value=branch
>  \x00\x00\x00\x05   column=f:hdfs,
> timestamp=1465980237060, value=
>  \x00\x00\x00\x06   column=f:\x00,
> timestamp=1465980447307, value=group
>  \x00\x00\x00\x06   column=f:hdfs,
> timestamp=1465980454130, value=
> 6 row(s) in 0.7370 seconds
>
> Cheers,
>
> Tom Ellis
> Consultant Developer – Excelian
> Data Lake | Financial Markets IT
> LLOYDS BANK COMMERCIAL BANKING
>
>
> E: tom.el...@lloydsbanking.com
> Website: www.lloydsbankcommercial.com
> , , ,
> Reduce printing. Lloyds Banking Group is helping to build the low carbon
> economy.
> Corporate Responsibility Report: www.lloydsbankinggroup-cr.com/downloads
>
> -Original Message-
> From: Anoop John [mailto:anoop.hb...@gmail.com]
> Sent: 08 June 2016 11:58
> To: user@hbase.apache.org
> Subject: Re: Writing visibility labels with HFileOutputFormat2
>
> -- This email has reached the Bank via an external source --
>
>
> Thanks Ram.. Ya that seems the best way as CellCreator is public exposed
> class. May be we should explain abt this in hbase book under the Visibility
> labels area.  Good to know you have Visibility labels based usecase. Let us
> know in case of any trouble.  Thanks.
>
> -Anoop-
>
> On Wed, Jun 8, 2016 at 1:43 PM, ramkrishna vasudevan <
> ramkrishna.s.vasude...@gmail.com> wrote:
> > Hi
> >
> > It can be done. See the class CellCreator which is Public facing
> interface.
> > When you create your spark job to create the hadoop files that
> > produces the
> > HFileOutputformat2 data. While creating the KeyValues you can use the
> > CellCreator to create your KeyValues and use the
> > CellCreator.getVisibilityExpressionResolver() to map your String
> > Visibility tags with the system generated ordinals.
> >
> > For eg, you can see how TextSortReducer works.  I think this should
> > help you solve your problem. Let us know if you need further information.
> >
> > Regards
> > Ram
> >
> > On Tue, Jun 7, 2016 at 3:58 PM, Ellis, Tom (Financial Markets IT) <
> > tom.el...@lloydsbanking.com.invalid> wrote:
> >
> >> Hi Ram,
> >>
> >> We're attempting to do it programmatically so:
> >>
> >> The HFile is created by a Spark job using saveAsNewAPIHadoopFile, and
> &

Re: Writing visibility labels with HFileOutputFormat2

2016-06-08 Thread ramkrishna vasudevan
Hi

It can be done. See the class CellCreator, which is a public-facing interface,
when you create the Spark job that produces the HFileOutputFormat2 data. While
creating the KeyValues you can use the CellCreator to create your KeyValues,
and use CellCreator.getVisibilityExpressionResolver() to map your String
visibility expressions to the system-generated ordinals.

For example, you can see how TextSortReducer works. I think this should help
you solve your problem. Let us know if you need further information.
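
As a rough sketch of that flow (not a drop-in implementation - the exact
KeyValue constructor taking a tag list is an assumption you should verify
against your HBase version), the part of the job that builds the KeyValues
could do something like:

import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.Tag;
import org.apache.hadoop.hbase.mapreduce.CellCreator;
import org.apache.hadoop.hbase.util.Bytes;

public class TaggedKeyValueFactory {

  private final CellCreator cellCreator;

  public TaggedKeyValueFactory(Configuration conf) {
    // CellCreator wires up the configured VisibilityExpressionResolver,
    // which reads hbase:labels to learn the label -> ordinal mapping.
    this.cellCreator = new CellCreator(conf);
  }

  // Builds a KeyValue that already carries the visibility tags.
  public KeyValue create(String row, String family, String qualifier,
                         String value, String visExpression) throws Exception {
    // Turn the human-readable expression (e.g. "secret&pii") into the
    // ordinal-based tags the server would normally create for you.
    List<Tag> visTags = cellCreator.getVisibilityExpressionResolver()
        .createVisibilityExpTags(visExpression);
    return new KeyValue(Bytes.toBytes(row), Bytes.toBytes(family),
        Bytes.toBytes(qualifier), System.currentTimeMillis(),
        Bytes.toBytes(value), visTags);
  }
}

The KeyValues produced this way can then be written through HFileOutputFormat2
in the Spark job exactly like untagged ones.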

Regards
Ram

On Tue, Jun 7, 2016 at 3:58 PM, Ellis, Tom (Financial Markets IT) <
tom.el...@lloydsbanking.com.invalid> wrote:

> Hi Ram,
>
> We're attempting to do it programmatically so:
>
> The HFile is created by a Spark job using saveAsNewAPIHadoopFile, and
> using ImmutableBytesWritable as the key (rowkey) with KeyValue as the
> value, and using the HFilOutputFormat2 format.
> This HFile is then loaded using HBase client's
> LoadIncrementalHFiles.doBulkLoad
>
> Is there a way to do this programmatically without using the ImportTsv
> tool? I was taking a look at VisibilityUtils.createVisibilityExpTags and
> maybe being able to just create the Tags myself that way (although it's
> obviously @InterfaceAudience.Private) but it seems to be able to use that
> I'd need to know Label ordinality client side..
>
> Thanks for your help,
>
> Tom
>
> -Original Message-
> From: ramkrishna vasudevan [mailto:ramkrishna.s.vasude...@gmail.com]
> Sent: 07 June 2016 11:19
> To: user@hbase.apache.org
> Subject: Re: Writing visibility labels with HFileOutputFormat2
>
> -- This email has reached the Bank via an external source --
>
>
> Hi Ellis
>
> How is the HFileOutputFormat2 files created?  Are you using the ImportTsv
> tool?  If you are using the ImportTsv tool then yes there is a way to
> specify visibility tags while loading from the ImportTsv tool and those
> visibility tags are also bulk loaded as HFile.
>
> There is an attribute CELL_VISIBILITY_COLUMN_SPEC that can be used to
> indicate that the data will have Visibility Tags and the tool will
> automatically parse the specified field as Visibility Tag.
>
> In case you have access to the code you can see the test case
> TestImportTSVWithVisibilityLabels to get an initial idea of how it is being
> done. If not get back to us, happy to help .
>
> Regards
> Ram
>
>
>
> On Tue, Jun 7, 2016 at 3:36 PM, Ellis, Tom (Financial Markets IT) <
> tom.el...@lloydsbanking.com.invalid> wrote:
>
> > Hi,
> >
> > I was wondering if it's possible/how to write Visibility Labels to an
> > HFileOutputFormat2? I believe Visibility Labels are just implemented
> > as Tags, but with the normal way of writing them with
> > Mutation#setCellVisibility these are formally written as Tags to the
> > cells during the VisibilityController coprocessor as we need to assert
> > the expression is valid for the labels configured.
> >
> > How can we add visibility labels to cells if we have a job that
> > creates an HFile with HFileOutputFormat2 which is then subsequently
> > loaded using LoadIncrementalHFiles?
> >
> > Cheers,
> >
> > Tom Ellis
> > Consultant Developer - Excelian
> > Data Lake | Financial Markets IT
> > LLOYDS BANK COMMERCIAL BANKING
> > 
> >
> > E: tom.el...@lloydsbanking.com<mailto:tom.el...@lloydsbanking.com>
> > Website:
> > www.lloydsbankcommercial.com<http://www.lloydsbankcommercial.com/
> > >
> > , , ,
> > Reduce printing. Lloyds Banking Group is helping to build the low
> > carbon economy.
> > Corporate Responsibility Report:
> > www.lloydsbankinggroup-cr.com/downloads<
> > http://www.lloydsbankinggroup-cr.com/downloads>
> >
> >
> >
> > Lloyds Banking Group plc. Registered Office: The Mound, Edinburgh EH1
> 1YZ.
> > Registered in Scotland no. SC95000. Telephone: 0131 225 4555. Lloyds
> > Bank plc. Registered Office: 25 Gresham Street, London EC2V 7HN.
> > Registered in England and Wales no. 2065. Telephone 0207626 1500. Bank
> of Scotland plc.
> > Registered Office: The Mound, Edinburgh EH1 1YZ. Registered in Scotland
> no.
> > SC327000. Telephone: 03457 801 801. Cheltenham & Gloucester plc.
> > Registered
> > Office: Barnett Way, Gloucester GL4 3RL. Registered in England and
> > Wales 2299428. Telephone: 0345 603 1637
> >
> > Lloyds Bank plc, Bank of Scotland plc are authorised by the Prudential
> > Regulation Authority and regulated by the Financial Conduct Authority
> > and Prudential Regulation Authority.
> >

Re: Writing visibility labels with HFileOutputFormat2

2016-06-07 Thread ramkrishna vasudevan
Hi Ellis

How are the HFileOutputFormat2 files created? Are you using the ImportTsv
tool? If you are using the ImportTsv tool, then yes, there is a way to specify
visibility labels while loading with ImportTsv, and those visibility labels
are also bulk loaded as part of the HFile.

There is an attribute, CELL_VISIBILITY_COLUMN_SPEC, that can be used to
indicate that the data will have visibility labels, and the tool will
automatically parse the specified field as the visibility expression.

In case you have access to the code, you can see the test case
TestImportTSVWithVisibilityLabels to get an initial idea of how it is being
done. If not, get back to us; happy to help.
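
Purely as a hedged sketch (it assumes your release exposes ImportTsv as a
Tool, and the table name, paths and column layout below are made up), driving
the tool with a visibility column looks roughly like this:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.ImportTsv;
import org.apache.hadoop.util.ToolRunner;

public class ImportTsvWithVisibility {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Each input line would look like: row1<TAB>secret&pii<TAB>some-value
    // HBASE_CELL_VISIBILITY marks the field holding the visibility expression.
    String[] importArgs = new String[] {
        "-Dimporttsv.columns=HBASE_ROW_KEY,HBASE_CELL_VISIBILITY,cf1:q1",
        "-Dimporttsv.bulk.output=/tmp/t1-hfiles", // write HFiles instead of Puts
        "t1",            // target table
        "/tmp/t1-input"  // input directory on HDFS
    };
    System.exit(ToolRunner.run(conf, new ImportTsv(), importArgs));
  }
}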

Regards
Ram



On Tue, Jun 7, 2016 at 3:36 PM, Ellis, Tom (Financial Markets IT) <
tom.el...@lloydsbanking.com.invalid> wrote:

> Hi,
>
> I was wondering if it's possible/how to write Visibility Labels to an
> HFileOutputFormat2? I believe Visibility Labels are just implemented as
> Tags, but with the normal way of writing them with
> Mutation#setCellVisibility these are formally written as Tags to the cells
> during the VisibilityController coprocessor as we need to assert the
> expression is valid for the labels configured.
>
> How can we add visibility labels to cells if we have a job that creates an
> HFile with HFileOutputFormat2 which is then subsequently loaded using
> LoadIncrementalHFiles?
>
> Cheers,
>
> Tom Ellis
> Consultant Developer - Excelian
> Data Lake | Financial Markets IT
> LLOYDS BANK COMMERCIAL BANKING
> 
>
> E: tom.el...@lloydsbanking.com
> Website: www.lloydsbankcommercial.com >
> , , ,
> Reduce printing. Lloyds Banking Group is helping to build the low carbon
> economy.
> Corporate Responsibility Report: www.lloydsbankinggroup-cr.com/downloads<
> http://www.lloydsbankinggroup-cr.com/downloads>
>
>
>
> Lloyds Banking Group plc. Registered Office: The Mound, Edinburgh EH1 1YZ.
> Registered in Scotland no. SC95000. Telephone: 0131 225 4555. Lloyds Bank
> plc. Registered Office: 25 Gresham Street, London EC2V 7HN. Registered in
> England and Wales no. 2065. Telephone 0207626 1500. Bank of Scotland plc.
> Registered Office: The Mound, Edinburgh EH1 1YZ. Registered in Scotland no.
> SC327000. Telephone: 03457 801 801. Cheltenham & Gloucester plc. Registered
> Office: Barnett Way, Gloucester GL4 3RL. Registered in England and Wales
> 2299428. Telephone: 0345 603 1637
>
> Lloyds Bank plc, Bank of Scotland plc are authorised by the Prudential
> Regulation Authority and regulated by the Financial Conduct Authority and
> Prudential Regulation Authority.
>
> Cheltenham & Gloucester plc is authorised and regulated by the Financial
> Conduct Authority.
>
> Halifax is a division of Bank of Scotland plc. Cheltenham & Gloucester
> Savings is a division of Lloyds Bank plc.
>
> HBOS plc. Registered Office: The Mound, Edinburgh EH1 1YZ. Registered in
> Scotland no. SC218813.
>
> This e-mail (including any attachments) is private and confidential and
> may contain privileged material. If you have received this e-mail in error,
> please notify the sender and delete it (including any attachments)
> immediately. You must not copy, distribute, disclose or use any of the
> information in it or any attachments. Telephone calls may be monitored or
> recorded.
>


Re: Hbase ACL

2016-05-05 Thread ramkrishna vasudevan
Thanks Anoop for pointing to the JIRA.
>>One thing I don't understand - if I don't do the initial RW grant then
userX will never be allowed to write to the table,

Even if you don't grant permission on the table to userX, he will still be
able to write to and read from the table, provided the cell he is
writing/reading carries the specific permission for that userX.

I think in your case it is better that userX is given access to the specific
qualifier and specific family only, if you know the list of qualifiers
beforehand, rather than giving permission for the entire table. This way you
can ensure that your cell-level permission takes effect. I verified this with
a test case too.
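
If the qualifiers are known up front, a small sketch of doing that grant
programmatically (AccessControlClient signatures vary a little between
releases, and the table, family and qualifier names here are only
illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.security.access.AccessControlClient;
import org.apache.hadoop.hbase.security.access.Permission;
import org.apache.hadoop.hbase.util.Bytes;

public class GrantQualifierOnly {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf)) {
      // Grant userX read/write on cf1:q1 only, instead of on the whole table,
      // so that cell-level ACLs on other cells still take effect.
      AccessControlClient.grant(conn, TableName.valueOf("ns1:t1"), "userX",
          Bytes.toBytes("cf1"), Bytes.toBytes("q1"),
          Permission.Action.READ, Permission.Action.WRITE);
    }
  }
}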

Regards
Ram

On Fri, May 6, 2016 at 12:14 PM, Anoop John  wrote:

> >>I am working on Hbase ACLs in order to lock a particular cell value for
> writes by a user for an indefinite amount of time. This same user will be
> writing to Hbase during normal program execution, and he needs to be able
> to continue to write to other cells during the single cell lock period.
> I’ve been experimenting with simple authentication (i.e. No Kerberos), and
> the plan is to extend to a Kerberized cluster once I get this working.
>
> So you want one user to be not allowed to write to a single cell or
> one column (cf:qual)? Do u know all possible column names for this
> table?
>
> -Anoop-
>
> On Fri, May 6, 2016 at 11:59 AM, Anoop John  wrote:
> > HBASE-11432 removed the cell first strategy
> >
> > -Anoop-
> >
> > On Thu, May 5, 2016 at 4:49 PM, Tokayer, Jason M.
> >  wrote:
> >> Hi Ram,
> >>
> >> I very much appreciate the guidance. I should be able to run through
> the gambit of tests via code this afternoon and will report back when I do.
> >>
> >> One thing I don't understand - if I don't do the initial RW grant then
> userX will never be allowed to write to the table, so I don't see how I can
> use that approach.
> >>
> >> Thanks,
> >> Jason
> >>
> >>
> >>
> >> Sent with Good (www.good.com)
> >> 
> >> From: ramkrishna vasudevan 
> >> Sent: Thursday, May 5, 2016 4:03:48 AM
> >> To: user@hbase.apache.org
> >> Subject: Re: Hbase ACL
> >>
> >> I verified the above behaviour using test case as the cluster was busy
> with
> >> other activities.
> >> So in the above example that you mentioned, you had already issued RW
> >> access to user-X on the table. Then a specific cell is over written
> with R
> >> permission using the special 'grant' command.
> >>
> >> Now as per the code since you already have a Write permission granted
> the
> >> cell level access does not work. Instead if you don't grant the RW
> >> permission to the user-X and try your steps it should work fine.
> >>
> >> So when a user with no permission on a table tries to do some mutations
> >> then if there is already a cell level permission granted by the super
> user
> >> then the cell level permission takes precedence.
> >>
> >> One thing to note is that the special 'grant' command is only for
> testing
> >> and not for production use case. You should always go with storing the
> ACL
> >> per mutation.
> >>
> >> Also see this section
> >> ACL Granularity and Evaluation Order
> >>
> >> ACLs are evaluated from least granular to most granular, and when an
> ACL is
> >> reached that grants permission, evaluation stops. This means that cell
> ACLs
> >> do not override ACLs at less granularity.
> >> I will raise a bug to remove the OP_ATTRIBUTE_ACL_STRATEGY from
> >> AccessControlConstants if it is misleading to users.
> >>
> >> Feel free to write back incase am missing something or need further
> inputs.
> >> Happy to help.
> >>
> >> Regards
> >> Ram
> >>
> >>
> >>
> >>
> >>
> >> On Wed, May 4, 2016 at 11:38 PM, ramkrishna vasudevan <
> >> ramkrishna.s.vasude...@gmail.com> wrote:
> >>
> >>> I tried out with the examples already available in the code base. Will
> try
> >>> it out on a cluster which I did not have access to today. Will probably
> >>> have access tomorrow.
> >>>
> >>> I was not aware of that 'grant' feature which allows to set permission
> on
> >>> all the cells with a specific prefix and on a specific qualifier. I
> will
> >>> che

Re: Access cell tags from HBase shell

2016-05-05 Thread ramkrishna vasudevan
>>Then how can I revert them to a recognizable form?
For that I don't think we have any APIs. Maybe for now you have to parse the
tag expression and map every ordinal to the visibility label string.
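
A rough sketch of getting at the raw tags on the client (run as the super user
with hbase.client.rpc.codec set to KeyValueCodecWithTags; the Tag accessors
below are the 1.x ones and may differ in other versions):

import java.util.Iterator;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.Tag;
import org.apache.hadoop.hbase.TagType;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class VisibilityTagDumper {
  // Prints the raw (ordinal-encoded) visibility tag of every cell in a Result.
  public static void dumpVisibilityTags(Result result) {
    for (Cell cell : result.rawCells()) {
      if (cell.getTagsLength() == 0) {
        continue; // no tags came back; check the codec configuration
      }
      Iterator<Tag> tags = CellUtil.tagsIterator(
          cell.getTagsArray(), cell.getTagsOffset(), cell.getTagsLength());
      while (tags.hasNext()) {
        Tag tag = tags.next();
        if (tag.getType() == TagType.VISIBILITY_TAG_TYPE) {
          // The tag value is a serialized list of label ordinals, not label
          // strings; mapping them back means joining against hbase:labels.
          System.out.println("visibility tag: " + Bytes.toStringBinary(
              tag.getBuffer(), tag.getTagOffset(), tag.getTagLength()));
        }
      }
    }
  }
}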

Regards
Ram

On Thu, May 5, 2016 at 9:09 PM, 
wrote:

> Yes, cell.getTagsLength is != 0. Good.
>
>
>
> I’m running HBase locally in pseudo distributed mode so I only have one
> hbase-site.xml to edit (?). But I must force the issue in the java client
> by setting the configuration programmatically so:
>
>
>
>
> config.set("hbase.client.rpc.codec", 
> "org.apache.hadoop.hbase.codec.KeyValueCodecWithTags");
>
>
>
> (If I don’t do this I get cell.getTagsLength = 0 even though the property
> is set in hbase-site.xml)
>
>
>
> I guess I could now use cell.getTagsArray to “see” the tags. (Though maybe
> both these steps rule out the use of the shell.)
>
>
>
> Then how can I revert them to a recognizable form?
>
>
>
> Thanks for your help,
>
>
>
> Ben
>
>
>
>
>
> *From:* ramkrishna vasudevan [mailto:ramkrishna.s.vasude...@gmail.com]
> *Sent:* 05 May 2016 08:00
> *To:* Whittam Smith, Benedict (TR Technology & Ops)
> *Cc:* Anoop John; user@hbase.apache.org
>
> *Subject:* Re: Access cell tags from HBase shell
>
>
>
> Have you set the property in the hbase-site.xml on the client side also?
>
> Can you try to retrieve per cell from the REsult and check if the
> cell.getTagsLength is != 0?
>
>
>
> Regards
>
> Ram
>
>
>
> On Thu, May 5, 2016 at 1:41 AM, 
> wrote:
>
> Thanks guys. I've given it a go with the Java client but without success.
>
> I assume I set this property in hbase-site.xml:
>
> 
>   hbase.client.rpc.codec
>   org.apache.hadoop.hbase.codec.KeyValueCodecWithTags
> 
>
> Then I successfully set the labels. But the scan returns nothing new:
>
> scan = new Scan();
> ResultScanner rs = table.getScanner(scan);
> try {
>   for (Result r = rs.next(); r != null; r = rs.next()) {
>   System.out.println(r);
>   } ...
>
> Where am I doing wrong/not noticing?
>
> Many thanks,
>
> Ben
>
> > -Original Message-
> > From: Anoop John [mailto:anoop.hb...@gmail.com]
>
> > Sent: 03 May 2016 18:35
> > To: user@hbase.apache.org
> > Subject: Re: Access cell tags from HBase shell
> >
> > You have to config the Codec KeyValueCodecWithTags at client side.
> > Server also will use same Codec to talk with this client.  Ya just
> > check with a java client 1st and then experiment with shell.
> >
> > -Anoop-
> >
> > On Tue, May 3, 2016 at 9:43 PM, ramkrishna vasudevan
> >  wrote:
> > > Hi Benedict
> > >
> > > As super user you should be able to get back the tags as ordinals and
> > make
> > > sure you set the codec KeyValueCodecWithTags.
> > > But I am not sure if it is possible to do it from the hBase shell.
> > Can you
> > > try from a java client ?
> > >
> > > I did not do the hands on of late on this but I can do it if you face
> > any
> > > difficulties and revert back if needed.
> > >
> > > Regards
> > > Ram
> > >
> > > On Tue, May 3, 2016 at 8:19 PM,
> > 
> > > wrote:
> > >
> > >> Hi Anoop,
> > >>
> > >> Can I still get the labels back (as ordinals, as a super user, and
> > using
> > >> the KeyValueCodecWithTags codec) using the HBase shell?
> > >>
> > >> If so, what are the steps I need to take (i.e. doesn't seem to be
> > working
> > >> for me, but then I've likely made a mistake setting the codec).
> > >>
> > >> Thanks,
> > >>
> > >> Ben
> > >>
> > >> > -Original Message-
> > >> > From: Anoop John [mailto:anoop.hb...@gmail.com]
> > >> > Sent: 15 September 2015 14:28
> > >> > To: user@hbase.apache.org
> > >> > Subject: Re: Access cell tags from HBase shell
> > >> >
> > >> > We are not returning back the cell labels back to client.  So what
> > I
> > >> > will
> > >> > recommend you to test is by having a predicate in scan and test
> > you see
> > >> > only the relevant data back.
> > >> > But there is way to return cells (all*) with out any vis check and
> > >> > cells in
> > >> > client will have the vis label tag also in it. This is by issuing
> > the
> > >&

Re: Hbase ACL

2016-05-05 Thread ramkrishna vasudevan
I verified the above behaviour using a test case, as the cluster was busy with
other activities.
So in the example that you mentioned, you had already issued RW access to
user-X on the table. Then a specific cell is overwritten with R permission
using the special 'grant' command.

Now, as per the code, since you already have a Write permission granted, the
cell-level access does not take effect. Instead, if you don't grant the RW
permission to user-X and try your steps, it should work fine.

So when a user with no permission on a table tries to do some mutations, and
there is already a cell-level permission granted by the super user, then the
cell-level permission takes precedence.

One thing to note is that the special 'grant' command is only for testing and
not for production use. You should always go with storing the ACL per
mutation.

Also see this section
ACL Granularity and Evaluation Order

ACLs are evaluated from least granular to most granular, and when an ACL is
reached that grants permission, evaluation stops. This means that cell ACLs
do not override ACLs at less granularity.
I will raise a bug to remove the OP_ATTRIBUTE_ACL_STRATEGY from
AccessControlConstants if it is misleading to users.

Feel free to write back in case I am missing something or you need further inputs.
Happy to help.

Regards
Ram





On Wed, May 4, 2016 at 11:38 PM, ramkrishna vasudevan <
ramkrishna.s.vasude...@gmail.com> wrote:

> I tried out with the examples already available in the code base. Will try
> it out on a cluster which I did not have access to today. Will probably
> have access tomorrow.
>
> I was not aware of that 'grant' feature which allows to set permission on
> all the cells with a specific prefix and on a specific qualifier. I will
> check and get back to you on that.
>
> Regards
> Ram
>
> On Wed, May 4, 2016 at 10:25 PM, Tokayer, Jason M. <
> jason.toka...@capitalone.com> wrote:
>
>> Hi Ram,
>>
>> Thanks for the reply. I can take a look at that Mutation documentation.
>> But I wanted to first confirm that this works at all, which is why I
>> started in the shell. The docs I’ve been using are here:
>>
>> https://github.com/apache/hbase/blob/master/src/main/asciidoc/_chapters/sec
>> urity.adoc. If you search for 'The syntax for granting cell ACLs uses the
>> following syntax:’ you'll find the example I’ve been following for cell
>> ACLs. According to the docs, "The shell will run a scanner with the given
>> criteria, rewrite the found cells with new ACLs, and store them back to
>> their exact coordinates.”. So I was under the impression that this would
>> lock ALL cells that meet the criteria, and if I wanted to lock a specific
>> cell I could add some more filters. Might I be reading that wrong?
>>
>> I can access the examples and will take a look. Were you able to confirm
>> proper functioning for table overrides on existing cells?
>>
>> --
>> Warmest Regards,
>> Jason Tokayer, PhD
>>
>>
>>
>>
>> On 5/4/16, 12:30 PM, "ramkrishna vasudevan"
>>  wrote:
>>
>> >Superuser:
>> >grant 'ns1:t1', {'userX' => 'R' }, { COLUMNS => 'cf1', FILTER =>
>> >"(PrefixFilter ('r2'))" }
>> >
>> >So you are trying to grant R permission to user-X for a given qualifier.
>> >Please not that this is NOT for a given Cell.
>> >
>> >Reiterating from your first mail
>> >>>What I need to be able to do next is to set user-X’s permissions on a
>> >particular cell to read only and have that take precedence over the table
>> >permissions.
>> >So where is this  being done in your above example? I may be missing
>> >something here.
>> >
>> >You need to create Put mutation and set READ permission using the
>> >Mutation.setACL API for User-X for that specific cell.
>> >
>> >Can you see an example in TestCellACLs in case you have access to the
>> >code?
>> >
>> >The idea of cell level ACLs is to give cell level access. So in this case
>> >your super-user can pass a mutation with ACL set on the mutation which
>> >could say - Grant R permission to user-X.
>> >
>> >So only user-X can read the cell but he will not be able to do any
>> updates
>> >to that cell.
>> >
>> >I think once you see some examples in TestCellACLs you can be more clear
>> >on
>> >how it is being done.
>> >
>> >Regards
>> >Ram
>> >
>> >
>> >On Wed, May 4, 2016 at 6:02 PM, Tokayer, J

Re: Access cell tags from HBase shell

2016-05-04 Thread ramkrishna vasudevan
Have you set the property in hbase-site.xml on the client side as well?
Can you try to retrieve each Cell from the Result and check whether
cell.getTagsLength() is != 0?
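
For instance, a minimal sketch of that check from a Java client (run as the
super user; the table name is made up):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;

public class TagsLengthCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Client-side codec so tags are shipped back with the cells.
    conf.set("hbase.client.rpc.codec",
        "org.apache.hadoop.hbase.codec.KeyValueCodecWithTags");
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("t1"));
         ResultScanner scanner = table.getScanner(new Scan())) {
      for (Result r : scanner) {
        for (Cell cell : r.rawCells()) {
          // A non-zero length means the tags actually reached the client.
          System.out.println(cell + " tagsLength=" + cell.getTagsLength());
        }
      }
    }
  }
}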

Regards
Ram

On Thu, May 5, 2016 at 1:41 AM, 
wrote:

> Thanks guys. I've given it a go with the Java client but without success.
>
> I assume I set this property in hbase-site.xml:
>
> 
>   hbase.client.rpc.codec
>   org.apache.hadoop.hbase.codec.KeyValueCodecWithTags
> 
>
> Then I successfully set the labels. But the scan returns nothing new:
>
> scan = new Scan();
> ResultScanner rs = table.getScanner(scan);
> try {
>   for (Result r = rs.next(); r != null; r = rs.next()) {
>   System.out.println(r);
>   } ...
>
> Where am I doing wrong/not noticing?
>
> Many thanks,
>
> Ben
>
> > -Original Message-
> > From: Anoop John [mailto:anoop.hb...@gmail.com]
> > Sent: 03 May 2016 18:35
> > To: user@hbase.apache.org
> > Subject: Re: Access cell tags from HBase shell
> >
> > You have to config the Codec KeyValueCodecWithTags at client side.
> > Server also will use same Codec to talk with this client.  Ya just
> > check with a java client 1st and then experiment with shell.
> >
> > -Anoop-
> >
> > On Tue, May 3, 2016 at 9:43 PM, ramkrishna vasudevan
> >  wrote:
> > > Hi Benedict
> > >
> > > As super user you should be able to get back the tags as ordinals and
> > make
> > > sure you set the codec KeyValueCodecWithTags.
> > > But I am not sure if it is possible to do it from the hBase shell.
> > Can you
> > > try from a java client ?
> > >
> > > I did not do the hands on of late on this but I can do it if you face
> > any
> > > difficulties and revert back if needed.
> > >
> > > Regards
> > > Ram
> > >
> > > On Tue, May 3, 2016 at 8:19 PM,
> > 
> > > wrote:
> > >
> > >> Hi Anoop,
> > >>
> > >> Can I still get the labels back (as ordinals, as a super user, and
> > using
> > >> the KeyValueCodecWithTags codec) using the HBase shell?
> > >>
> > >> If so, what are the steps I need to take (i.e. doesn't seem to be
> > working
> > >> for me, but then I've likely made a mistake setting the codec).
> > >>
> > >> Thanks,
> > >>
> > >> Ben
> > >>
> > >> > -Original Message-
> > >> > From: Anoop John [mailto:anoop.hb...@gmail.com]
> > >> > Sent: 15 September 2015 14:28
> > >> > To: user@hbase.apache.org
> > >> > Subject: Re: Access cell tags from HBase shell
> > >> >
> > >> > We are not returning back the cell labels back to client.  So what
> > I
> > >> > will
> > >> > recommend you to test is by having a predicate in scan and test
> > you see
> > >> > only the relevant data back.
> > >> > But there is way to return cells (all*) with out any vis check and
> > >> > cells in
> > >> > client will have the vis label tag also in it. This is by issuing
> > the
> > >> > scan
> > >> > as a super user.  And also set the codec as KeyValueCodecWithTags.
> > >> > But one thing we wont be storing the vis label with Cells as
> > string..
> > >> > We
> > >> > will optimize..  We will store them as ordinals and & and |
> > condition
> > >> > also
> > >> > we will optimize. So even if you read back the vis label tags back
> > in
> > >> > client it will be hard to parse it and understand..  Any thing
> > more you
> > >> > would like to know, pls let me know..  Will be happy to help.
> > >> >
> > >> > BTW  once you test and if start to use the feature pls let me
> > know..
> > >> > Will
> > >> > be great to hear the usage cases and feedback.
> > >> >
> > >> > -Anoop-
> > >> >
> > >> >
> > >> > On Fri, Sep 11, 2015 at 5:35 AM, Suresh Subbiah
> > >> > 
> > >> > wrote:
> > >> >
> > >> > > Hi Anoop,
> > >> > >
> > >> > > Thank you very much for the offer to help.
> > >> > >
> > >> > > I have been thinking some more about what it is that we need to
> > do
> > >> > and have
> > >> > > reali

Re: Hbase ACL

2016-05-04 Thread ramkrishna vasudevan
I tried it out with the examples already available in the code base. I will
try it out on a cluster, which I did not have access to today; I will probably
have access tomorrow.

I was not aware of that 'grant' feature which allows setting permission on all
the cells with a specific prefix and on a specific qualifier. I will check and
get back to you on that.

Regards
Ram

On Wed, May 4, 2016 at 10:25 PM, Tokayer, Jason M. <
jason.toka...@capitalone.com> wrote:

> Hi Ram,
>
> Thanks for the reply. I can take a look at that Mutation documentation.
> But I wanted to first confirm that this works at all, which is why I
> started in the shell. The docs I’ve been using are here:
> https://github.com/apache/hbase/blob/master/src/main/asciidoc/_chapters/sec
> urity.adoc. If you search for 'The syntax for granting cell ACLs uses the
> following syntax:’ you'll find the example I’ve been following for cell
> ACLs. According to the docs, "The shell will run a scanner with the given
> criteria, rewrite the found cells with new ACLs, and store them back to
> their exact coordinates.”. So I was under the impression that this would
> lock ALL cells that meet the criteria, and if I wanted to lock a specific
> cell I could add some more filters. Might I be reading that wrong?
>
> I can access the examples and will take a look. Were you able to confirm
> proper functioning for table overrides on existing cells?
>
> --
> Warmest Regards,
> Jason Tokayer, PhD
>
>
>
>
> On 5/4/16, 12:30 PM, "ramkrishna vasudevan"
>  wrote:
>
> >Superuser:
> >grant 'ns1:t1', {'userX' => 'R' }, { COLUMNS => 'cf1', FILTER =>
> >"(PrefixFilter ('r2'))" }
> >
> >So you are trying to grant R permission to user-X for a given qualifier.
> >Please not that this is NOT for a given Cell.
> >
> >Reiterating from your first mail
> >>>What I need to be able to do next is to set user-X’s permissions on a
> >particular cell to read only and have that take precedence over the table
> >permissions.
> >So where is this  being done in your above example? I may be missing
> >something here.
> >
> >You need to create Put mutation and set READ permission using the
> >Mutation.setACL API for User-X for that specific cell.
> >
> >Can you see an example in TestCellACLs in case you have access to the
> >code?
> >
> >The idea of cell level ACLs is to give cell level access. So in this case
> >your super-user can pass a mutation with ACL set on the mutation which
> >could say - Grant R permission to user-X.
> >
> >So only user-X can read the cell but he will not be able to do any updates
> >to that cell.
> >
> >I think once you see some examples in TestCellACLs you can be more clear
> >on
> >how it is being done.
> >
> >Regards
> >Ram
> >
> >
> >On Wed, May 4, 2016 at 6:02 PM, Tokayer, Jason M. <
> >jason.toka...@capitalone.com> wrote:
> >
> >> Hi Ram,
> >>
> >> Unfortunately, that configuration doesn’t seem to help. I’ve pasted my
> >> config followed by the CLI commands I’ve been running so that the issue
> >> can be reproduced.
> >>
> >>
> >> CONFIG:
> >> 
> >>   hbase.security.authentication
> >>   simple
> >> 
> >> 
> >>   hbase.security.authorization
> >>   true
> >> 
> >> 
> >> hbase.security.access.early_out
> >> false
> >> 
> >> 
> >>   hbase.coprocessor.master.classes
> >>
> >>
> >>org.apache.hadoop.hbase.security.access.AccessController,org.apach
> >>e.
> >> hadoop.hbase.security.visibility.VisibilityController
> >> 
> >> 
> >>   hbase.coprocessor.region.classes
> >>
> >>
> >>org.apache.hadoop.hbase.security.access.AccessController,org.apach
> >>e.
> >> hadoop.hbase.security.visibility.VisibilityController
> >> 
> >> 
> >>   hbase.coprocessor.regionserver.classes
> >>
> >>
> >>org.apache.hadoop.hbase.security.access.AccessController,org.apach
> >>e.
> >>
> >>hadoop.hbase.security.visibility.VisibilityController$VisibilityReplicati
> >>on
> >> 
> >> 
> >>
> >>
> >>
> >> CLI COMMANDS:
> >>
> >> Superuser:
> >> create_namespace 'ns1'
> >> create 'ns1:t1','cf1'
> >> grant 'userX','RW','n

Re: Hbase ACL

2016-05-04 Thread ramkrishna vasudevan
Superuser:
grant 'ns1:t1', {'userX' => 'R' }, { COLUMNS => 'cf1', FILTER =>
"(PrefixFilter ('r2'))" }

So you are trying to grant R permission to user-X for a given qualifier.
Please note that this is NOT for a given Cell.

Reiterating from your first mail
>>What I need to be able to do next is to set user-X’s permissions on a
particular cell to read only and have that take precedence over the table
permissions.
So where is this  being done in your above example? I may be missing
something here.

You need to create a Put mutation and set READ permission using the
Mutation.setACL API for user-X on that specific cell.
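
A minimal sketch of that (the table, row and user names are only illustrative;
the Put is issued by the admin/super user):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.security.access.Permission;
import org.apache.hadoop.hbase.util.Bytes;

public class ReadOnlyCellPut {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("ns1:t1"))) {
      Put put = new Put(Bytes.toBytes("r2"));
      put.addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("q1"), Bytes.toBytes("v1"));
      // Cell-level ACL: user-X may only READ this particular cell.
      put.setACL("userX", new Permission(Permission.Action.READ));
      table.put(put);
    }
  }
}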

Can you see an example in TestCellACLs in case you have access to the code?

The idea of cell level ACLs is to give cell level access. So in this case
your super-user can pass a mutation with ACL set on the mutation which
could say - Grant R permission to user-X.

So only user-X can read the cell but he will not be able to do any updates
to that cell.

I think once you see some examples in TestCellACLs you can be more clear on
how it is being done.

Regards
Ram


On Wed, May 4, 2016 at 6:02 PM, Tokayer, Jason M. <
jason.toka...@capitalone.com> wrote:

> Hi Ram,
>
> Unfortunately, that configuration doesn’t seem to help. I’ve pasted my
> config followed by the CLI commands I’ve been running so that the issue
> can be reproduced.
>
>
> CONFIG:
> 
>   hbase.security.authentication
>   simple
> 
> 
>   hbase.security.authorization
>   true
> 
> 
> hbase.security.access.early_out
> false
> 
> 
>   hbase.coprocessor.master.classes
>
> org.apache.hadoop.hbase.security.access.AccessController,org.apache.
> hadoop.hbase.security.visibility.VisibilityController
> 
> 
>   hbase.coprocessor.region.classes
>
> org.apache.hadoop.hbase.security.access.AccessController,org.apache.
> hadoop.hbase.security.visibility.VisibilityController
> 
> 
>   hbase.coprocessor.regionserver.classes
>
> org.apache.hadoop.hbase.security.access.AccessController,org.apache.
> hadoop.hbase.security.visibility.VisibilityController$VisibilityReplication
> 
> 
>
>
>
> CLI COMMANDS:
>
> Superuser:
> create_namespace 'ns1'
> create 'ns1:t1','cf1'
> grant 'userX','RW','ns1:t1'
>
>
> userX:
> put 'ns1:t1', 'r2', 'cf1:q1', 'v1',1462364682267
> put 'ns1:t1', 'r2', 'cf1:q2', 'v2',1462364700012
>
> Superuser:
> grant 'ns1:t1', {'userX' => 'R' }, { COLUMNS => 'cf1', FILTER =>
> "(PrefixFilter ('r2'))" }
>
> userX:
> put 'ns1:t1', 'r2', 'cf1:q1', 'v2',1462364682267 #WORKS, BUT SHOULD IT???
>
>
>
> Any help/guidance you can provide will be greatly appreciated.
>
> --
> Warmest Regards,
> Jason Tokayer, PhD
>
>
>
> On 5/3/16, 2:30 PM, "ramkrishna vasudevan"
>  wrote:
>
> >I think reading the code - there should be no change between the version
> >that you are using and the trunk version.
> >
> >Set this property to false
> >'hbase.security.access.early_out' and try once.
> >Tomorrow early in the morning I will try out some test case and will
> >revert
> >back to you.
> >Do let me know if the above config works for you.
> >
> >Regards
> >Ram
> >
> >On Tue, May 3, 2016 at 11:27 PM, Tokayer, Jason M. <
> >jason.toka...@capitalone.com> wrote:
> >
> >> Hi Ram,
> >>
> >> We are using 1.1.2, but can update to most recent if the desired feature
> >> is provided. We do set authorization to true, and I can confirm that I
> >>can
> >> block writes to the entire table for user-X. But, it that when I grant
> >>RW
> >> permission (to user-X) on a table and R only on a specific cell in that
> >> table then user-X can still write to that cell. This indicates to me
> >>that
> >> table/cf ACLs are given preference over cell ACLs.
> >>
> >> Have there been significant upgrades to this particular feature since
> >> v1.1.2? Would you recommend attempting an upgrade, or do you think the
> >> issue is still present in trunk? Can you verify via tests that
> >> CHECK_CELL_DEFAULT is (a) used by default and (b) is working properly? I
> >> don't see any unit tests in the codebase for this feature.
> >>
> >> --
> >> Warmest Regards,
> >> Jason Tokayer, PhD
> >>
> >>
> >>
> >> On 

Re: Hbase ACL

2016-05-03 Thread ramkrishna vasudevan
Reading the code, I think there should be no change between the version that
you are using and the trunk version.

Set the property 'hbase.security.access.early_out' to false and try once.
Tomorrow, early in the morning, I will try out a test case and revert back to
you.
Do let me know if the above config works for you.

Regards
Ram

On Tue, May 3, 2016 at 11:27 PM, Tokayer, Jason M. <
jason.toka...@capitalone.com> wrote:

> Hi Ram,
>
> We are using 1.1.2, but can update to most recent if the desired feature
> is provided. We do set authorization to true, and I can confirm that I can
> block writes to the entire table for user-X. But, it that when I grant RW
> permission (to user-X) on a table and R only on a specific cell in that
> table then user-X can still write to that cell. This indicates to me that
> table/cf ACLs are given preference over cell ACLs.
>
> Have there been significant upgrades to this particular feature since
> v1.1.2? Would you recommend attempting an upgrade, or do you think the
> issue is still present in trunk? Can you verify via tests that
> CHECK_CELL_DEFAULT is (a) used by default and (b) is working properly? I
> don't see any unit tests in the codebase for this feature.
>
> --
> Warmest Regards,
> Jason Tokayer, PhD
>
>
>
> On 5/3/16, 1:41 PM, "ramkrishna vasudevan"
>  wrote:
>
> >Hi Jason
> >Which version of HBase are you using?
> >
> >Atleast in trunk I could see that 'OP_ATTRIBUTE_ACL_STRATEGY_CELL_FIRST'
> >is
> >not used rather by default CHECK_CELL_DEFAULT strategy is what getting
> >used
> >now.
> >
> >Ensure that 'hbase.security.authorization' is set to true in
> >hbase-site.xml. If you could tell the version you are using can be much
> >more specific.
> >
> >Regards
> >Ram
> >
> >On Tue, May 3, 2016 at 6:22 PM, Tokayer, Jason M. <
> >jason.toka...@capitalone.com> wrote:
> >
> >> I am working on Hbase ACLs in order to lock a particular cell value for
> >> writes by a user for an indefinite amount of time. This same user will
> >>be
> >> writing to Hbase during normal program execution, and he needs to be
> >>able
> >> to continue to write to other cells during the single cell lock period.
> >> I've been experimenting with simple authentication (i.e. No Kerberos),
> >>and
> >> the plan is to extend to a Kerberized cluster once I get this working.
> >>
> >> First, I am able to grant 'user-X' read and write permissions to a
> >> particular namespace. In this way user-X can write to any Hbase table in
> >> that namespace during normal execution. What I need to be able to do
> >>next
> >> is to set user-X's permissions on a particular cell to read only and
> >>have
> >> that take precedence over the table permissions. I found a parameter in
> >>the
> >> codebase here
> >>
> >>
> https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/or
> >>g/apache/hadoop/hbase/security/access/AccessControlConstants.java,
> >> namely OP_ATTRIBUTE_ACL_STRATEGY_CELL_FIRST, that seems to allow for
> >>this
> >> prioritization of cell-level over table-/column-level. But I cannot
> >>figure
> >> out how to set this with key OP_ATTRIBUTE_ACL_STRATEGY. Is it possible
> >>to
> >> set the strategy to cell-level prioritization, preferably in
> >> hbase-site.xml? This feature is critical to our cell-level access
> >>control.
> >>
> >> --
> >> *Warmest Regards,*
> >> *Jason Tokayer, PhD*
> >>
> >> --
> >>
> >> The information contained in this e-mail is confidential and/or
> >> proprietary to Capital One and/or its affiliates and may only be used
> >> solely in performance of work or services for Capital One. The
> >>information
> >> transmitted herewith is intended only for use by the individual or
> >>entity
> >> to which it is addressed. If the reader of this message is not the
> >>intended
> >> recipient, you are hereby notified that any review, retransmission,
> >> dissemination, distribution, copying or other use of, or taking of any
> >> action in reliance upon this information is strictly prohibited. If you
> >> have received this communication in error, please contact the sender and
> >> delete the material from your computer.
> >>
>
> 
>
> The information contained in this e-mail is confidential and/or
> proprietary to Capital One and/or its affiliates and may only be used
> solely in performance of work or services for Capital One. The information
> transmitted herewith is intended only for use by the individual or entity
> to which it is addressed. If the reader of this message is not the intended
> recipient, you are hereby notified that any review, retransmission,
> dissemination, distribution, copying or other use of, or taking of any
> action in reliance upon this information is strictly prohibited. If you
> have received this communication in error, please contact the sender and
> delete the material from your computer.
>
>


Re: Hbase ACL

2016-05-03 Thread ramkrishna vasudevan
Hi Jason
Which version of HBase are you using?

At least in trunk I could see that 'OP_ATTRIBUTE_ACL_STRATEGY_CELL_FIRST' is
not used; rather, the CHECK_CELL_DEFAULT strategy is what gets used by default
now.

Ensure that 'hbase.security.authorization' is set to true in hbase-site.xml.
If you could tell us the version you are using, we can be much more specific.

Regards
Ram

On Tue, May 3, 2016 at 6:22 PM, Tokayer, Jason M. <
jason.toka...@capitalone.com> wrote:

> I am working on Hbase ACLs in order to lock a particular cell value for
> writes by a user for an indefinite amount of time. This same user will be
> writing to Hbase during normal program execution, and he needs to be able
> to continue to write to other cells during the single cell lock period.
> I’ve been experimenting with simple authentication (i.e. No Kerberos), and
> the plan is to extend to a Kerberized cluster once I get this working.
>
> First, I am able to grant ‘user-X’ read and write permissions to a
> particular namespace. In this way user-X can write to any Hbase table in
> that namespace during normal execution. What I need to be able to do next
> is to set user-X’s permissions on a particular cell to read only and have
> that take precedence over the table permissions. I found a parameter in the
> codebase here
> https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/security/access/AccessControlConstants.java,
> namely OP_ATTRIBUTE_ACL_STRATEGY_CELL_FIRST, that seems to allow for this
> prioritization of cell-level over table-/column-level. But I cannot figure
> out how to set this with key OP_ATTRIBUTE_ACL_STRATEGY. Is it possible to
> set the strategy to cell-level prioritization, preferably in
> hbase-site.xml? This feature is critical to our cell-level access control.
>
> --
> *Warmest Regards,*
> *Jason Tokayer, PhD*
>
> --
>
> The information contained in this e-mail is confidential and/or
> proprietary to Capital One and/or its affiliates and may only be used
> solely in performance of work or services for Capital One. The information
> transmitted herewith is intended only for use by the individual or entity
> to which it is addressed. If the reader of this message is not the intended
> recipient, you are hereby notified that any review, retransmission,
> dissemination, distribution, copying or other use of, or taking of any
> action in reliance upon this information is strictly prohibited. If you
> have received this communication in error, please contact the sender and
> delete the material from your computer.
>


Re: Access cell tags from HBase shell

2016-05-03 Thread ramkrishna vasudevan
Hi Benedict

As super user you should be able to get back the tags as ordinals; make sure
you set the codec KeyValueCodecWithTags.
But I am not sure if it is possible to do it from the HBase shell. Can you try
from a Java client?

I have not done hands-on work on this of late, but I can do it if you face any
difficulties; revert back if needed.

Regards
Ram

On Tue, May 3, 2016 at 8:19 PM, 
wrote:

> Hi Anoop,
>
> Can I still get the labels back (as ordinals, as a super user, and using
> the KeyValueCodecWithTags codec) using the HBase shell?
>
> If so, what are the steps I need to take (i.e. doesn't seem to be working
> for me, but then I've likely made a mistake setting the codec).
>
> Thanks,
>
> Ben
>
> > -Original Message-
> > From: Anoop John [mailto:anoop.hb...@gmail.com]
> > Sent: 15 September 2015 14:28
> > To: user@hbase.apache.org
> > Subject: Re: Access cell tags from HBase shell
> >
> > We are not returning back the cell labels back to client.  So what I
> > will
> > recommend you to test is by having a predicate in scan and test you see
> > only the relevant data back.
> > But there is way to return cells (all*) with out any vis check and
> > cells in
> > client will have the vis label tag also in it. This is by issuing the
> > scan
> > as a super user.  And also set the codec as KeyValueCodecWithTags.
> > But one thing we wont be storing the vis label with Cells as string..
> > We
> > will optimize..  We will store them as ordinals and & and | condition
> > also
> > we will optimize. So even if you read back the vis label tags back in
> > client it will be hard to parse it and understand..  Any thing more you
> > would like to know, pls let me know..  Will be happy to help.
> >
> > BTW  once you test and if start to use the feature pls let me know..
> > Will
> > be great to hear the usage cases and feedback.
> >
> > -Anoop-
> >
> >
> > On Fri, Sep 11, 2015 at 5:35 AM, Suresh Subbiah
> > 
> > wrote:
> >
> > > Hi Anoop,
> > >
> > > Thank you very much for the offer to help.
> > >
> > > I have been thinking some more about what it is that we need to do
> > and have
> > > realized that we don't need custom cell tags.
> > > We we will only be using visibility labels. This is basically for
> > testing
> > > purpose and to understand exactly how data looks.
> > >
> > > How do we see visibility labels that are applied to a particular
> > cell? For
> > > ex, if we want to know all the labels that have been applied to
> > > all cells, how do we do that? Or can that only be done by applying a
> > > predicate and then check to see if the pred passes?
> > >
> > > Is there a way to pass visibility labels to client is a test mode ?
> > >
> > > Thanks
> > > Suresh
> > >
> > >
> > > On Thu, Sep 3, 2015 at 11:07 PM, Anoop John 
> > wrote:
> > >
> > > > Hi Suresh
> > > > You wan to use ur own custom tags with cells?  The
> > > features
> > > > like cell level vis labels etc are also implemented by storing them
> > as
> > > cell
> > > > tags.  Yes as others said, the tags is by default a server only
> > thing.
> > > > Means you can not pass tags from/to client along with cells.  There
> > is
> > > some
> > > > security reasons why we had opted this path.  And there were no
> > custom
> > > tag
> > > > needs by then. Pls let us know what you want to achieve.   There is
> > ways
> > > to
> > > > pass tags to/from client. I can help you.
> > > >
> > > > -Anoop-
> > > >
> > > >
> > > > On Tue, Sep 1, 2015 at 4:29 AM, Jerry He 
> > wrote:
> > > >
> > > > > Hi, Suresh
> > > > >
> > > > > In you Java client program, you can 'label' the cells in your
> > PUT.  You
> > > > can
> > > > > ask which labeled cells to be returned in your Get and Scan, but
> > the
> > > > labels
> > > > > are not returned with the cells.
> > > > > Yes, "labels on cells are only interpreted server side"
> > > > >
> > > > >
> > > > > Jerry
> > > > >
> > > > > On Mon, Aug 31, 2015 at 1:27 PM, Suresh Subbiah <
> > > > > suresh.subbia...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Thank you very much Ted, Jean-Marc.
> > > > > >
> > > > > > I see that slide 4 in
> > > > > > https://urldefense.proofpoint.com/v2/url?u=http-
> > 3A__www.slideshare.net_HBaseCon_features-2Dsession-
> > 2D2&d=CwIBaQ&c=4ZIZThykDLcoWk-
> > GVjSLm9hvvvzvGv0FLoWSRuCSs5Q&r=GQ6xvz2BG1vCgiGGeLHdL1qJLbLUqYG6W19eFBlz
> > nzDGH3wjzyriGVJemENTKsgx&m=sLIg484DFLi0oSu5ylkGuIuB-
> > re6sXaYY0fb9BreY2o&s=BhpulFRnZ_JNgAOPjb_MFtv0rnH9yaNXtQZE_g7y-28&e=
> > states
> > > > > > that "cells are only interpreted server side"
> > > > > > However https://urldefense.proofpoint.com/v2/url?u=https-
> > 3A__issues.apache.org_jira_browse_HBASE-
> > 2D9056&d=CwIBaQ&c=4ZIZThykDLcoWk-
> > GVjSLm9hvvvzvGv0FLoWSRuCSs5Q&r=GQ6xvz2BG1vCgiGGeLHdL1qJLbLUqYG6W19eFBlz
> > nzDGH3wjzyriGVJemENTKsgx&m=sLIg484DFLi0oSu5ylkGuIuB-
> > re6sXaYY0fb9BreY2o&s=u_ISxz2OpkFA6Y5cYXGcQqpG24S54zDi1WhuHfbq18A&e=  &
> > > > > > https://urldefense.proofpoint.com/v2/url?u=https-
> > 3A__issues.apache.

Re: Getting all the columns of row-key at once

2016-01-21 Thread ramkrishna vasudevan
No, it is not possible. Even in an RDBMS, if you do a select * from a table
where row=100, you still need to call rs.getXXX(i) with a suitable index to
pick out each expected column value, right?

If you want specific columns you specify that in the select query - it is a
similar case here. When you say you don't want to iterate, you mean that the
Result object should just have all the values of all the columns appended to
it, and currently there is no such way possible.
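
For illustration only (this sketch is not part of the original message, and
the table/column names are invented), the usual client-side pattern is to
walk the cells of the Result yourself and build whatever combined view you
need:

import java.util.LinkedHashMap;
import java.util.Map;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class ReadWholeRow {
  // Returns every qualifier -> value pair of one row, e.g. {col1=xxx, col2=yyy, ...}.
  static Map<String, String> readRow(Connection conn, String row) throws Exception {
    Map<String, String> columns = new LinkedHashMap<>();
    try (Table table = conn.getTable(TableName.valueOf("test_table"))) {
      Result result = table.get(new Get(Bytes.toBytes(row)));
      for (Cell cell : result.rawCells()) {        // one Cell per column
        columns.put(Bytes.toString(CellUtil.cloneQualifier(cell)),
                    Bytes.toString(CellUtil.cloneValue(cell)));
      }
    }
    return columns;
  }
}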

Regards
Ram

On Fri, Jan 22, 2016 at 11:11 AM, Rajeshkumar J  wrote:

> If that is the case if I do maintain only one versions of my data is this
> retrieval is possible?
>
> Thanks
>
> On Fri, Jan 22, 2016 at 11:01 AM, ramkrishna vasudevan <
> ramkrishna.s.vasude...@gmail.com> wrote:
>
> > Currently that is not possible. The reason being that the columns are not
> > fixed in HBase.
> > There could be another row or may another version of this row 100 where
> > there are only col2 and col4 populated and there is no col1 and col3.
> > So as per your schema you should be knowing with which column the value
> is
> > associated.
> > In other words
> >
> > Row-key   col1   col2   col3   col4
> >
> > 100       xxx    yyy    zzz    aaa
> > 100       xxx           yyy
> >
> > Now how do you know xxx is associated with col2 or col1 when you try to
> > retrieve the latest version of row key 100?
> >
> > Regards
> > Ram
> >
> >
> > On Fri, Jan 22, 2016 at 10:55 AM, Rajeshkumar J <
> > rajeshkumarit8...@gmail.com
> > > wrote:
> >
> > > Hi,
> > >
> > >   For instance
> > >
> > > Row-key   col1   col2   col3   col4
> > >
> > > 100       xxx    yyy    zzz    aaa
> > >
> > > I am scanning this row-key(100) and I want to get the value as
> > > xxx,yyy,zzz,aaa from Result instance. Not using iterator to get xxx
> > > then yyy then zzz then aaa.
> > >
> > > Thanks
> > >
> > > On Fri, Jan 22, 2016 at 10:47 AM, ramkrishna vasudevan <
> > > ramkrishna.s.vasude...@gmail.com> wrote:
> > >
> > > > Once you retrieve a result it will have all the columns that were
> > > scanned.
> > > > If suppose you had 5 columns and you specifically wanted only 2
> columns
> > > out
> > > > of it you can add the required columns using scan.addColumn() API
> then
> > > the
> > > > result will have only those 2 columns.
> > > > If nothing is specified your result will have entire set of columns
> > that
> > > > comprises that row (including multiple Column families).
> > > >
> > > > But every column's result is an individual KeyValue which you may
> have
> > to
> > > > iterate and get it.
> > > > >> So is there any option to get all the column
> > > > values of row-key at once.
> > > > So this is already happening for you.  Am I missing something here?
> > > >
> > > > On Fri, Jan 22, 2016 at 10:31 AM, Rajeshkumar J <
> > > > rajeshkumarit8...@gmail.com
> > > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > >   I have already posted this in mailing list but with changes in my
> > use
> > > > > case.  Is there any options to retrieve all the columns of row-key
> at
> > > > once.
> > > > >
> > > > > ResultScanner resultScanner = table.getScanner(scan);
> > > > > Iterator<Result> iterator = resultScanner.iterator();
> > > > > while (iterator.hasNext()) {
> > > > >  Result next = iterator.next();
> > > > > for (KeyValue key : next.list()) {
> > > > >
> > > > >  System.out.println(Bytes.toString(key.getValue()));
> > > > > }
> > > > >
> > > > >
> > > > >
> > > > > This  is how I am doing scan using java api. Using this I can get
> > only
> > > > one
> > > > > columns in each iteration. So is there any option to get all the
> > column
> > > > > values of row-key at once.
> > > > >
> > > > > Thanks
> > > > >
> > > >
> > >
> >
>


Re: Getting all the columns of row-key at once

2016-01-21 Thread ramkrishna vasudevan
Currently that is not possible. The reason is that the columns are not
fixed in HBase.
There could be another row, or maybe another version of this row 100, where
only col2 and col4 are populated and there is no col1 and col3.
So as per your schema you should know with which column each value is
associated.
In other words:

Row-key   col1   col2   col3   col4

100       xxx    yyy    zzz    aaa
100       xxx           yyy

Now how do you know xxx is associated with col2 or col1 when you try to
retrieve the latest version of row key 100?

Regards
Ram


On Fri, Jan 22, 2016 at 10:55 AM, Rajeshkumar J  wrote:

> Hi,
>
>   For instance
>
> Row-key   col1   col2   col3   col4
>
> 100       xxx    yyy    zzz    aaa
>
> I am scanning this row-key(100) and I want to get the value as
> xxx,yyy,zzz,aaa from Result instance. Not using iterator to get xxx then
> yyy then zzz then aaa.
>
> Thanks
>
> On Fri, Jan 22, 2016 at 10:47 AM, ramkrishna vasudevan <
> ramkrishna.s.vasude...@gmail.com> wrote:
>
> > Once you retrieve a result it will have all the columns that were
> scanned.
> > If suppose you had 5 columns and you specifically wanted only 2 columns
> out
> > of it you can add the required columns using scan.addColumn() API then
> the
> > result will have only those 2 columns.
> > If nothing is specified your result will have entire set of columns that
> > comprises that row (including multiple Column families).
> >
> > But every column's result is an individual KeyValue which you may have to
> > iterate and get it.
> > >> So is there any option to get all the column
> > values of row-key at once.
> > So this is already happening for you.  Am I missing something here?
> >
> > On Fri, Jan 22, 2016 at 10:31 AM, Rajeshkumar J <
> > rajeshkumarit8...@gmail.com
> > > wrote:
> >
> > > Hi,
> > >
> > >   I have already posted this in mailing list but with changes in my use
> > > case.  Is there any options to retrieve all the columns of row-key at
> > once.
> > >
> > > ResultScanner resultScanner = table.getScanner(scan);
> > > Iterator<Result> iterator = resultScanner.iterator();
> > > while (iterator.hasNext()) {
> > >  Result next = iterator.next();
> > > for (KeyValue key : next.list()) {
> > >
> > >  System.out.println(Bytes.toString(key.getValue()));
> > > }
> > >
> > >
> > >
> > > This  is how I am doing scan using java api. Using this I can get only
> > one
> > > columns in each iteration. So is there any option to get all the column
> > > values of row-key at once.
> > >
> > > Thanks
> > >
> >
>


Re: Getting all the columns of row-key at once

2016-01-21 Thread ramkrishna vasudevan
Once you retrieve a result it will have all the columns that were scanned.
Suppose you had 5 columns and you specifically wanted only 2 of them: you
can add the required columns using the scan.addColumn() API, and then the
result will have only those 2 columns.
If nothing is specified your result will have the entire set of columns that
comprise that row (including multiple column families).

But every column's result is an individual KeyValue which you may have to
iterate over to get.
>> So is there any option to get all the column
values of row-key at once.
So this is already happening for you.  Am I missing something here?
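
As a small illustrative sketch (not part of the original reply; the table and
column names are invented), restricting a scan to two columns looks roughly
like this:

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class AddColumnSketch {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection();
         Table table = conn.getTable(TableName.valueOf("test_table"))) {
      Scan scan = new Scan();
      // Only these two columns come back in each Result; without
      // addColumn/addFamily the whole row (all families) is returned.
      scan.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col1"));
      scan.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col4"));
      try (ResultScanner scanner = table.getScanner(scan)) {
        for (Result r : scanner) {
          System.out.println(Bytes.toString(r.getRow()) + " -> "
              + Bytes.toString(r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("col1"))) + ", "
              + Bytes.toString(r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("col4"))));
        }
      }
    }
  }
}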

On Fri, Jan 22, 2016 at 10:31 AM, Rajeshkumar J  wrote:

> Hi,
>
>   I have already posted this in mailing list but with changes in my use
> case.  Is there any options to retrieve all the columns of row-key at once.
>
> ResultScanner resultScanner = table.getScanner(scan);
> Iterator<Result> iterator = resultScanner.iterator();
> while (iterator.hasNext()) {
>  Result next = iterator.next();
> for (KeyValue key : next.list()) {
>
>  System.out.println(Bytes.toString(key.getValue()));
> }
>
>
>
> This  is how I am doing scan using java api. Using this I can get only one
> columns in each iteration. So is there any option to get all the column
> values of row-key at once.
>
> Thanks
>


Re: hbase (coprocessors & cell tags) used in hadoop-yarn

2016-01-04 Thread ramkrishna vasudevan
I saw the patches some time back but got lost in other work. You are
creating the new cells with Tags inside the Coprocessors?
Do you see any need for Tags to be added directly from the
client side as part of Puts (for your use case)? Currently HBase does not
support Tags on the client side. Tags are now server-side pieces of a Cell.
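
(Illustrative aside, not from the original mail: visibility labels are one
tag-backed feature that already has a client-facing API - the client sends an
expression and the server stores it as a cell tag. A rough sketch, assuming
the labels have already been defined and using invented table/column names:)

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.security.visibility.Authorizations;
import org.apache.hadoop.hbase.security.visibility.CellVisibility;
import org.apache.hadoop.hbase.util.Bytes;

public class VisibilityLabelSketch {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection();
         Table table = conn.getTable(TableName.valueOf("demo_table"))) {
      // The expression travels with the Put; server side it becomes a tag.
      Put put = new Put(Bytes.toBytes("row1"));
      put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("c1"), Bytes.toBytes("v1"));
      put.setCellVisibility(new CellVisibility("MANAGER"));
      table.put(put);

      // On read you pass authorizations; matching cells are returned, but the
      // label itself is not shipped back to the client.
      Scan scan = new Scan();
      scan.setAuthorizations(new Authorizations("MANAGER"));
      try (ResultScanner scanner = table.getScanner(scan)) {
        for (Result r : scanner) {
          System.out.println(r);
        }
      }
    }
  }
}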

Regards
Ram

On Tue, Jan 5, 2016 at 1:34 AM, Vrushali Channapattan 
wrote:

> I see, thanks Anoop. We wanted to use cell tags for indicating the context
> of the information in cells for aggregation purposes. It is
> referred to only in the coprocessor. We also use it in the flush/compaction
> processing to decide which cells to discard/what info to keep.
>
> I will be on the lookout for Tag interface changes.
>
> On Thu, Dec 24, 2015 at 7:19 AM, Anoop John  wrote:
>
> > I can see in the patches that you are trying to use Cell creation with
> Tags
> > and use of Tag APIs..  Only concern is Tag is Private audience marked. It
> > was created to support per cell ACL/ visibility etc.
> >
> > As part of off heaping effort, we are planning to make some changes to
> Tag
> > APIs.. (To make it interface impl itself).. This will happen in HBase
> > trunk..   So later when you move to newer version need to change it.
> >
> > -Anoop-
> >
> >
> > On Tue, Dec 22, 2015 at 12:37 PM, Vrushali Channapattan <
> > vrushal...@gmail.com> wrote:
> >
> > > A group of us in the hadoop community are working on Yarn's next gen
> > > timeline service component
> > https://issues.apache.org/jira/browse/YARN-2928
> > >
> > > that will be storing for application that runs on a hadoop cluster all
> of
> > > the application stats, workflow metadata and container metrics
> > information
> > > in hbase tables (some plain hbase tables and some phoenix based ones).
> > >
> > > We have been thinking about validating some of the implementation
> > > approaches we are taking with HBase. It would be great to get some
> > feedback
> > > on the code and design from the HBase dev perspective.
> > >
> > > Among other things, we are making use of cell tags in coprocessors for
> > > summation, min and max operations on different versions of cells in a
> > given
> > > column during read as well flush and compaction operations.  Some
> > relevant
> > > subjiras that deal with hbase coprocessors
> > > https://issues.apache.org/jira/browse/YARN-4062
> > > https://issues.apache.org/jira/browse/YARN-3901
> > >
> > > We have the schema documented with example records in the code as well
> as
> > > in pdf on the jira.
> > >
> > >
> > >
> >
> https://github.com/apache/hadoop/blob/feature-YARN-2928/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/flow/FlowRunTable.java#L34
> > >
> > >
> > >
> >
> https://github.com/apache/hadoop/blob/feature-YARN-2928/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/entity/EntityTable.java#L40
> > >
> > > Schema jira (pdf attachment that describes the schema)
> > > https://issues.apache.org/jira/browse/YARN-3411
> > >
> > > Would appreciate any feedback/comments that you have and be glad to
> > answer
> > > any questions to clarify in depth further.
> > >
> > > thanks
> > > Vrushali
> > >
> >
>


Re: delete of cells with visibility expressions

2015-11-03 Thread ramkrishna vasudevan
Looking at the code, I think we have identified the issue, as Anoop John
said. We could probably fix this in the next release.  Let us know if you
want us to file a JIRA for this.

Regards
Ram


On Wed, Nov 4, 2015 at 11:09 AM, ramkrishna vasudevan <
ramkrishna.s.vasude...@gmail.com> wrote:

> Sorry for the confusion. Yes the bug exists. When I tried in the cluster
> the Visibility CP was not on. So it is better we can raise a JIRA and fix
> this over there. Thanks Anoop Sharma and sorry for the delay from my side
> due to wrong info.
>
> Regards
> Ram
>
> On Wed, Nov 4, 2015 at 4:15 AM, Anoop Sharma 
> wrote:
>
>> hi
>>
>> which hbase version did you try this on?
>> We tried on the following 2 hbase versions and see the delete problem.
>>
>>Version 1.0.2, r76745a2cbffe08b812be16e0e19e637a23a923c5, Tue Aug 25
>> 15:59:49 PDT 2015
>>Version 1.1.2, rcc2b70cf03e3378800661ec5cab11eb43fafe0fc, Wed Aug 26
>> 20:11:27 PDT 2015
>>
>> Is there a later version that has the fix?
>>
>> thanks
>>
>> -Original Message-
>> From: ramkrishna vasudevan [mailto:ramkrishna.s.vasude...@gmail.com]
>> Sent: Sunday, November 1, 2015 11:11 PM
>> To: user@hbase.apache.org
>> Subject: Re: delete of cells with visibility expressions
>>
>> Is it still a bug? I reproduced the above steps in latest trunk and I
>> thought the behaviour was corrected due to a recent bug fix?  Is it not
>> that
>> case ?
>>
>> Regards
>> Ram
>>
>> On Mon, Nov 2, 2015 at 12:20 PM, Anoop John 
>> wrote:
>>
>> > I believe it is a bug.. I think I know the reason also..  Can you file
>> > a jira?  We can discuss under that.  Thanks for the test.
>> >
>> > -Anoop-
>> >
>> > On Sat, Oct 31, 2015 at 12:45 AM, Anoop Sharma
>> > 
>> > wrote:
>> >
>> > > Thanks Ram.
>> > >
>> > > we are using hbase 1.0.2.
>> > >
>> > > anoop
>> > >
>> > > -Original Message-
>> > > From: ramkrishna vasudevan [mailto:ramkrishna.s.vasude...@gmail.com]
>> > > Sent: Thursday, October 29, 2015 10:22 PM
>> > > To: user@hbase.apache.org
>> > > Subject: Re: delete of cells with visibility expressions
>> > >
>> > > Hi Anoop
>> > >
>> > > Which version of the HBase are you using?  This got solved in the
>> > > latest version of 0.98 and above. Could you try that?  I just
>> > > reproduced this
>> > and
>> > > this problem no longer occurs.
>> > >
>> > > Regards
>> > > Ram
>> > >
>> > > On Fri, Oct 30, 2015 at 3:26 AM, Anoop Sharma
>> > > 
>> > > wrote:
>> > >
>> > > > hi
>> > > >
>> > > >   running into an issue related to visibility expressions and
>> delete.
>> > > >
>> > > > Example run from hbase shell is listed below.
>> > > >
>> > > > Will appreciate any help on this issue.
>> > > >
>> > > > thanks.
>> > > >
>> > > >
>> > > >
>> > > > In the example below, user running queries has ‘MANAGER’
>> > > > authorization.
>> > > >
>> > > >
>> > > >
>> > > > *First example:*
>> > > >
>> > > >   add a column with visib expr ‘MANAGER’
>> > > >
>> > > >   delete it by passing in visibility of ‘MANAGER’
>> > > >
>> > > >   This works and scan doesn’t return anything.
>> > > >
>> > > >
>> > > >
>> > > > *Second example:*
>> > > >
>> > > >   add a column with visib expr ‘MANAGER’
>> > > >
>> > > >   delete it by not passing in any visibility.
>> > > >
>> > > >   This doesn’t delete the column.
>> > > >
>> > > >   Scan doesn’t return the row but RAW scan shows the column
>> > > >
>> > > >   marked as deleteColumn.
>> > > >
>> > > >
>> > > >
>> > > >   Now if delete is done again with visibility of ‘MANAGER’,
>> > > >
>> > > >   it still doesn’t delete it and scan returns the original column.
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
> >

Re: delete of cells with visibility expressions

2015-11-03 Thread ramkrishna vasudevan
Sorry for the confusion. Yes, the bug exists. When I tried it in the cluster
the Visibility CP was not on. So it is better we raise a JIRA and fix
this over there. Thanks Anoop Sharma, and sorry for the delay from my side
due to the wrong info.

Regards
Ram

On Wed, Nov 4, 2015 at 4:15 AM, Anoop Sharma  wrote:

> hi
>
> which hbase version did you try this on?
> We tried on the following 2 hbase versions and see the delete problem.
>
>Version 1.0.2, r76745a2cbffe08b812be16e0e19e637a23a923c5, Tue Aug 25
> 15:59:49 PDT 2015
>Version 1.1.2, rcc2b70cf03e3378800661ec5cab11eb43fafe0fc, Wed Aug 26
> 20:11:27 PDT 2015
>
> Is there a later version that has the fix?
>
> thanks
>
> -Original Message-
> From: ramkrishna vasudevan [mailto:ramkrishna.s.vasude...@gmail.com]
> Sent: Sunday, November 1, 2015 11:11 PM
> To: user@hbase.apache.org
> Subject: Re: delete of cells with visibility expressions
>
> Is it still a bug? I reproduced the above steps in latest trunk and I
> thought the behaviour was corrected due to a recent bug fix?  Is it not
> that
> case ?
>
> Regards
> Ram
>
> On Mon, Nov 2, 2015 at 12:20 PM, Anoop John  wrote:
>
> > I believe it is a bug.. I think I know the reason also..  Can you file
> > a jira?  We can discuss under that.  Thanks for the test.
> >
> > -Anoop-
> >
> > On Sat, Oct 31, 2015 at 12:45 AM, Anoop Sharma
> > 
> > wrote:
> >
> > > Thanks Ram.
> > >
> > > we are using hbase 1.0.2.
> > >
> > > anoop
> > >
> > > -Original Message-
> > > From: ramkrishna vasudevan [mailto:ramkrishna.s.vasude...@gmail.com]
> > > Sent: Thursday, October 29, 2015 10:22 PM
> > > To: user@hbase.apache.org
> > > Subject: Re: delete of cells with visibility expressions
> > >
> > > Hi Anoop
> > >
> > > Which version of the HBase are you using?  This got solved in the
> > > latest version of 0.98 and above. Could you try that?  I just
> > > reproduced this
> > and
> > > this problem no longer occurs.
> > >
> > > Regards
> > > Ram
> > >
> > > On Fri, Oct 30, 2015 at 3:26 AM, Anoop Sharma
> > > 
> > > wrote:
> > >
> > > > hi
> > > >
> > > >   running into an issue related to visibility expressions and delete.
> > > >
> > > > Example run from hbase shell is listed below.
> > > >
> > > > Will appreciate any help on this issue.
> > > >
> > > > thanks.
> > > >
> > > >
> > > >
> > > > In the example below, user running queries has ‘MANAGER’
> > > > authorization.
> > > >
> > > >
> > > >
> > > > *First example:*
> > > >
> > > >   add a column with visib expr ‘MANAGER’
> > > >
> > > >   delete it by passing in visibility of ‘MANAGER’
> > > >
> > > >   This works and scan doesn’t return anything.
> > > >
> > > >
> > > >
> > > > *Second example:*
> > > >
> > > >   add a column with visib expr ‘MANAGER’
> > > >
> > > >   delete it by not passing in any visibility.
> > > >
> > > >   This doesn’t delete the column.
> > > >
> > > >   Scan doesn’t return the row but RAW scan shows the column
> > > >
> > > >   marked as deleteColumn.
> > > >
> > > >
> > > >
> > > >   Now if delete is done again with visibility of ‘MANAGER’,
> > > >
> > > >   it still doesn’t delete it and scan returns the original column.
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > *Example 1:*
> > > >
> > > > hbase(main):096:0> create 'HBT1', 'cf'
> > > >
> > > >
> > > >
> > > > hbase(main):098:0* *put 'HBT1', 'John', 'cf:a', 'CA',
> > > > {VISIBILITY=>'MANAGER'}*
> > > >
> > > >
> > > >
> > > > hbase(main):099:0> *scan 'HBT1'*
> > > >
> > > > ROW
> > > > COLUMN+CELL
> > > >
> > > >  John column=cf:a, timestamp=1446154722055,
> > > > value=CA
> > > >
> > > > 1 row(s) in 0.0030 seconds
> > > >

Re: delete of cells with visibility expressions

2015-11-01 Thread ramkrishna vasudevan
Is it still a bug? I reproduced the above steps on the latest trunk and I
thought the behaviour was corrected by a recent bug fix.  Is that not
the case?

Regards
Ram

On Mon, Nov 2, 2015 at 12:20 PM, Anoop John  wrote:

> I believe it is a bug.. I think I know the reason also..  Can you file a
> jira?  We can discuss under that.  Thanks for the test.
>
> -Anoop-
>
> On Sat, Oct 31, 2015 at 12:45 AM, Anoop Sharma 
> wrote:
>
> > Thanks Ram.
> >
> > we are using hbase 1.0.2.
> >
> > anoop
> >
> > -Original Message-
> > From: ramkrishna vasudevan [mailto:ramkrishna.s.vasude...@gmail.com]
> > Sent: Thursday, October 29, 2015 10:22 PM
> > To: user@hbase.apache.org
> > Subject: Re: delete of cells with visibility expressions
> >
> > Hi Anoop
> >
> > Which version of the HBase are you using?  This got solved in the latest
> > version of 0.98 and above. Could you try that?  I just reproduced this
> and
> > this problem no longer occurs.
> >
> > Regards
> > Ram
> >
> > On Fri, Oct 30, 2015 at 3:26 AM, Anoop Sharma 
> > wrote:
> >
> > > hi
> > >
> > >   running into an issue related to visibility expressions and delete.
> > >
> > > Example run from hbase shell is listed below.
> > >
> > > Will appreciate any help on this issue.
> > >
> > > thanks.
> > >
> > >
> > >
> > > In the example below, user running queries has ‘MANAGER’ authorization.
> > >
> > >
> > >
> > > *First example:*
> > >
> > >   add a column with visib expr ‘MANAGER’
> > >
> > >   delete it by passing in visibility of ‘MANAGER’
> > >
> > >   This works and scan doesn’t return anything.
> > >
> > >
> > >
> > > *Second example:*
> > >
> > >   add a column with visib expr ‘MANAGER’
> > >
> > >   delete it by not passing in any visibility.
> > >
> > >   This doesn’t delete the column.
> > >
> > >   Scan doesn’t return the row but RAW scan shows the column
> > >
> > >   marked as deleteColumn.
> > >
> > >
> > >
> > >   Now if delete is done again with visibility of ‘MANAGER’,
> > >
> > >   it still doesn’t delete it and scan returns the original column.
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > *Example 1:*
> > >
> > > hbase(main):096:0> create 'HBT1', 'cf'
> > >
> > >
> > >
> > > hbase(main):098:0* *put 'HBT1', 'John', 'cf:a', 'CA',
> > > {VISIBILITY=>'MANAGER'}*
> > >
> > >
> > >
> > > hbase(main):099:0> *scan 'HBT1'*
> > >
> > > ROW
> > > COLUMN+CELL
> > >
> > >  John column=cf:a, timestamp=1446154722055,
> > > value=CA
> > >
> > > 1 row(s) in 0.0030 seconds
> > >
> > >
> > >
> > > hbase(main):100:0> *delete 'HBT1', 'John', 'cf:a',
> > > {VISIBILITY=>'MANAGER'}*
> > >
> > > 0 row(s) in 0.0030 seconds
> > >
> > >
> > >
> > > hbase(main):101:0> *scan 'HBT1'*
> > >
> > > ROW
> > > COLUMN+CELL
> > >
> > > 0 row(s) in 0.0030 seconds
> > >
> > >
> > >
> > >
> > >
> > > *Example 2:*
> > >
> > > hbase(main):010:0* *put 'HBT1', 'John', 'cf:a', 'CA',
> > > {VISIBILITY=>'MANAGER'}*
> > >
> > > 0 row(s) in 0.0040 seconds
> > >
> > >
> > >
> > > hbase(main):011:0> *scan 'HBT1'*
> > >
> > > ROW
> > > COLUMN+CELL
> > >
> > >  John column=cf:a, timestamp=1446155346473,
> > > value=CA
> > >
> > > 1 row(s) in 0.0060 seconds
> > >
> > >
> > >
> > > hbase(main):012:0> *delete 'HBT1', 'John', 'cf:a'*
> > >
> > > 0 row(s) in 0.0090 seconds
> > >
> > >
> > >
> > > hbase(main):013:0> *scan 'HBT1'*
> > >
> > > ROW
> > > COLUMN+CELL
> > >
> > >  John column=cf:a, timestamp=1446155346473,
> > > value=CA
> > >
> > > 1 row(s) in 0.0050 seconds
> > >
> > >
> > >
> > > hbase(main):014:0> *scan 'HBT1', {RAW => true}*
> > >
> > > ROW
> > > COLUMN+CELL
> > >
> > >  John column=cf:a, timestamp=1446155346519,
> > > type=DeleteColumn
> > >
> > > 1 row(s) in 0.0060 seconds
> > >
> > >
> > >
> > > hbase(main):015:0> *delete 'HBT1', 'John', 'cf:a',
> > > {VISIBILITY=>'MANAGER'}*
> > >
> > > 0 row(s) in 0.0030 seconds
> > >
> > >
> > >
> > > hbase(main):016:0> *scan 'HBT1'*
> > >
> > > ROW
> > > COLUMN+CELL
> > >
> > >  John column=cf:a, timestamp=1446155346473,
> > > value=CA
> > >
> > > 1 row(s) in 0.0040 seconds
> > >
> > >
> > >
> > > hbase(main):017:0> *scan 'HBT1', {RAW => true}*
> > >
> > > ROW
> > > COLUMN+CELL
> > >
> > >  John column=cf:a, timestamp=1446155346601,
> > > type=DeleteColumn
> > >
> > > 1 row(s) in 0.0060 seconds
> > >
> >
>


Re: delete of cells with visibility expressions

2015-10-29 Thread ramkrishna vasudevan
Hi Anoop

Which version of HBase are you using?  This got solved in the latest
version of 0.98 and above. Could you try that?  I just reproduced this and
this problem no longer occurs.

Regards
Ram

On Fri, Oct 30, 2015 at 3:26 AM, Anoop Sharma 
wrote:

> hi
>
>   running into an issue related to visibility expressions and delete.
>
> Example run from hbase shell is listed below.
>
> Will appreciate any help on this issue.
>
> thanks.
>
>
>
> In the example below, user running queries has ‘MANAGER’ authorization.
>
>
>
> *First example:*
>
>   add a column with visib expr ‘MANAGER’
>
>   delete it by passing in visibility of ‘MANAGER’
>
>   This works and scan doesn’t return anything.
>
>
>
> *Second example:*
>
>   add a column with visib expr ‘MANAGER’
>
>   delete it by not passing in any visibility.
>
>   This doesn’t delete the column.
>
>   Scan doesn’t return the row but RAW scan shows the column
>
>   marked as deleteColumn.
>
>
>
>   Now if delete is done again with visibility of ‘MANAGER’,
>
>   it still doesn’t delete it and scan returns the original column.
>
>
>
>
>
>
>
> *Example 1:*
>
> hbase(main):096:0> create 'HBT1', 'cf'
>
>
>
> hbase(main):098:0* *put 'HBT1', 'John', 'cf:a', 'CA',
> {VISIBILITY=>'MANAGER'}*
>
>
>
> hbase(main):099:0> *scan 'HBT1'*
>
> ROW
> COLUMN+CELL
>
>  John column=cf:a, timestamp=1446154722055,
> value=CA
>
> 1 row(s) in 0.0030 seconds
>
>
>
> hbase(main):100:0> *delete 'HBT1', 'John', 'cf:a', {VISIBILITY=>'MANAGER'}*
>
> 0 row(s) in 0.0030 seconds
>
>
>
> hbase(main):101:0> *scan 'HBT1'*
>
> ROW
> COLUMN+CELL
>
> 0 row(s) in 0.0030 seconds
>
>
>
>
>
> *Example 2:*
>
> hbase(main):010:0* *put 'HBT1', 'John', 'cf:a', 'CA',
> {VISIBILITY=>'MANAGER'}*
>
> 0 row(s) in 0.0040 seconds
>
>
>
> hbase(main):011:0> *scan 'HBT1'*
>
> ROW
> COLUMN+CELL
>
>  John column=cf:a, timestamp=1446155346473,
> value=CA
>
> 1 row(s) in 0.0060 seconds
>
>
>
> hbase(main):012:0> *delete 'HBT1', 'John', 'cf:a'*
>
> 0 row(s) in 0.0090 seconds
>
>
>
> hbase(main):013:0> *scan 'HBT1'*
>
> ROW
> COLUMN+CELL
>
>  John column=cf:a, timestamp=1446155346473,
> value=CA
>
> 1 row(s) in 0.0050 seconds
>
>
>
> hbase(main):014:0> *scan 'HBT1', {RAW => true}*
>
> ROW
> COLUMN+CELL
>
>  John column=cf:a, timestamp=1446155346519,
> type=DeleteColumn
>
> 1 row(s) in 0.0060 seconds
>
>
>
> hbase(main):015:0> *delete 'HBT1', 'John', 'cf:a', {VISIBILITY=>'MANAGER'}*
>
> 0 row(s) in 0.0030 seconds
>
>
>
> hbase(main):016:0> *scan 'HBT1'*
>
> ROW
> COLUMN+CELL
>
>  John column=cf:a, timestamp=1446155346473,
> value=CA
>
> 1 row(s) in 0.0040 seconds
>
>
>
> hbase(main):017:0> *scan 'HBT1', {RAW => true}*
>
> ROW
> COLUMN+CELL
>
>  John column=cf:a, timestamp=1446155346601,
> type=DeleteColumn
>
> 1 row(s) in 0.0060 seconds
>


Re: Visibility Expressions and Deletes

2015-10-26 Thread ramkrishna vasudevan
Hi

>>  Is there a way to delete all cells of a 'row' irrespective of what
visibility expression is associated with a particular cell?

I do think that this is the expected behaviour.  A user cannot be allowed to
delete all the rows that do not match the visibility that he is
associated with, right?
Maybe one thing - do you expect the 'super user' to issue a delete and
want that to delete all the rows with visibility expressions also?

Regarding your second question, for the delete to work we need to have the
exact match, because a row with DEVELOPER | MANAGER and a row with DEVELOPER
are considered different.  But for the scan case, if the scan specifies
DEVELOPER he should be able to see both the entries.
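
(As an illustrative sketch of that point - not from the original reply; the
table, row, family and qualifier are taken from the example below - the
Delete would carry the same expression the cell was written with:)

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.security.visibility.CellVisibility;
import org.apache.hadoop.hbase.util.Bytes;

public class VisibilityDeleteSketch {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection();
         Table table = conn.getTable(TableName.valueOf("HBT1"))) {
      // The cell was written with 'DEVELOPER | MANAGER', so the Delete has to
      // carry the same expression for the cell to be masked.
      Delete delete = new Delete(Bytes.toBytes("John"));
      delete.addColumns(Bytes.toBytes("cf"), Bytes.toBytes("address2"));
      delete.setCellVisibility(new CellVisibility("DEVELOPER | MANAGER"));
      table.delete(delete);
    }
  }
}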

Let me check the code to be more specific as am away from the system now.

REgards
Ram

On Mon, Oct 26, 2015 at 9:02 PM, Anoop Sharma 
wrote:

>
>
> hi, Couple questions regarding Visibility Expressions and Delete.
>
>
>
> 1)  If delete of a row is done without passing any visib expression,
> then it deletes cells that contain no visib expressions.
>
>If a delete is done after specifying a visibility label, then it
> deletes cells which match that expression.
>
>Is there a way to delete all cells of a 'row' irrespective of what
> visibility expression is associated with a particular cell?
>
>Otherwise one need to know what visibility expression is  stored in
> a
> cell and keeping track of that is not trivial.
>
>
>
> 2)  What is the expected behavior in the following example:
>
>  put 'HBT1', 'John', 'cf:address2', 'CA',
> {VISIBILITY=>'DEVELOPER | MANAGER'}
>
>  delete 'HBT1', 'John', 'cf:address2',
> {VISIBILITY=>'DEVELOPER'}
>
>Should the delete remove that cell since the visib expr of that cell
> is an 'OR'?
>
>   Right now, it seems like the original expr (DEVELOPER | MANAGER)  is
> needed to delete it?
>
>
>
>
>
> thanks for any help you can provide.
>
>
>
> anoop
>
>


Re: Slow region moves

2015-10-21 Thread ramkrishna vasudevan
It seems that BucketAllocator#freeBlock() is synchronized, and hence all
the bulk closes that it tries to do will be blocked on the synchronized
block.  Maybe something like the IdLock has to be tried here?

Regards
Ram

On Wed, Oct 21, 2015 at 4:20 PM, ramkrishna vasudevan <
ramkrishna.s.vasude...@gmail.com> wrote:

> I think the forceful clearing of the blocks from the bucket cache is
> hurting in this case.  I think it is worth opening a JIRA for this and work
> on a fix.
>
> Regards
> Ram
>
> On Wed, Oct 21, 2015 at 12:00 AM, Randy Fox  wrote:
>
>> Hi Vlad,
>>
>> I tried it on a table and on a RegionServer basis and it appears to have
>> no affect.
>> Are we sure it is supported for bucket cache?  From my charts the bucket
>> cache is getting cleared at the same time as the region moves occurred.
>> The regions slow to move are the ones with bucket cache.
>>
>> I took a table with 102 regions and blockcache true and turned off block
>> cache via alter while the table is enabled - it took 19 minutes.  To turn
>> block cache back on took 4.3 seconds.
>>
>> Let me know if there is anything else to try.  This issue is really
>> hurting our day to day ops.
>>
>> Thanks,
>>
>> Randy
>>
>>
>>
>> On 10/15/15, 3:55 PM, "Vladimir Rodionov"  wrote:
>>
>> >Hey, Randy
>> >
>> >You can verify your hypothesis by setting hbase.rs.evictblocksonclose to
>> >false for your tables.
>> >
>> >-Vlad
>> >
>> >On Thu, Oct 15, 2015 at 1:06 PM, Randy Fox  wrote:
>> >
>> >> Caveat - we are trying to tune the BucketCache (probably a new thread
>> - as
>> >> we are not sure we are getting the most out of it)
>> >> 72G off heap
>> >>
>> >> 
>> >> <property>
>> >>   <name>hfile.block.cache.size</name>
>> >>   <value>0.58</value>
>> >> </property>
>> >>
>> >> <property>
>> >>   <name>hbase.bucketcache.ioengine</name>
>> >>   <value>offheap</value>
>> >> </property>
>> >>
>> >> <property>
>> >>   <name>hbase.bucketcache.size</name>
>> >>   <value>72800</value>
>> >> </property>
>> >>
>> >> <property>
>> >>   <name>hbase.bucketcache.bucket.sizes</name>
>> >>   <value>9216,17408,33792,66560</value>
>> >> </property>
>> >> 
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> On 10/15/15, 12:00 PM, "Ted Yu"  wrote:
>> >>
>> >> >I am a bit curious.
>> >> >0.94 doesn't have BucketCache.
>> >> >
>> >> >Can you share BucketCache related config parameters in your cluster ?
>> >> >
>> >> >Cheers
>> >> >
>> >> >On Thu, Oct 15, 2015 at 11:11 AM, Randy Fox 
>> wrote:
>> >> >
>> >> >>
>> >> >> "StoreFileCloserThread-L-1" prio=10 tid=0x027ec800
>> nid=0xad84
>> >> >> runnable [0x7fbcc0c65000]
>> >> >>java.lang.Thread.State: RUNNABLE
>> >> >> at java.util.LinkedList.indexOf(LinkedList.java:602)
>> >> >> at java.util.LinkedList.contains(LinkedList.java:315)
>> >> >> at
>> >> >>
>> >>
>> org.apache.hadoop.hbase.io.hfile.bucket.BucketAllocator$BucketSizeInfo.freeBlock(BucketAllocator.java:247)
>> >> >> at
>> >> >>
>> >>
>> org.apache.hadoop.hbase.io.hfile.bucket.BucketAllocator.freeBlock(BucketAllocator.java:449)
>> >> >> - locked <0x00041b0887a8> (a
>> >> >> org.apache.hadoop.hbase.io.hfile.bucket.BucketAllocator)
>> >> >> at
>> >> >>
>> >>
>> org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.evictBlock(BucketCache.java:459)
>> >> >> at
>> >> >>
>> >>
>> org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.evictBlocksByHfileName(BucketCache.java:1036)
>> >> >> at
>> >> >>
>> >>
>> org.apache.hadoop.hbase.io.hfile.CombinedBlockCache.evictBlocksByHfileName(CombinedBlockCache.java:90)
>> >> >> at
>> >> >>
>> >>
>> org.apache.hadoop.hbase.io.hfile.HFileReaderV2.close(HFileReaderV2.java:516)
>> >> >> at
>> >> >>
>> >>
>> org.apache.hadoop.hbase.regionserver.StoreFile$Reader.close(StoreFile.java:1143)
>> >> >> at
>>

Re: Slow region moves

2015-10-21 Thread ramkrishna vasudevan
I think the forceful clearing of the blocks from the bucket cache is
hurting in this case.  I think it is worth opening a JIRA for this and
working on a fix.

Regards
Ram

On Wed, Oct 21, 2015 at 12:00 AM, Randy Fox  wrote:

> Hi Vlad,
>
> I tried it on a table and on a RegionServer basis and it appears to have
> no affect.
> Are we sure it is supported for bucket cache?  From my charts the bucket
> cache is getting cleared at the same time as the region moves occurred.
> The regions slow to move are the ones with bucket cache.
>
> I took a table with 102 regions and blockcache true and turned off block
> cache via alter while the table is enabled - it took 19 minutes.  To turn
> block cache back on took 4.3 seconds.
>
> Let me know if there is anything else to try.  This issue is really
> hurting our day to day ops.
>
> Thanks,
>
> Randy
>
>
>
> On 10/15/15, 3:55 PM, "Vladimir Rodionov"  wrote:
>
> >Hey, Randy
> >
> >You can verify your hypothesis by setting hbase.rs.evictblocksonclose to
> >false for your tables.
> >
> >-Vlad
> >
> >On Thu, Oct 15, 2015 at 1:06 PM, Randy Fox  wrote:
> >
> >> Caveat - we are trying to tune the BucketCache (probably a new thread -
> as
> >> we are not sure we are getting the most out of it)
> >> 72G off heap
> >>
> >> 
> >> <property>
> >>   <name>hfile.block.cache.size</name>
> >>   <value>0.58</value>
> >> </property>
> >>
> >> <property>
> >>   <name>hbase.bucketcache.ioengine</name>
> >>   <value>offheap</value>
> >> </property>
> >>
> >> <property>
> >>   <name>hbase.bucketcache.size</name>
> >>   <value>72800</value>
> >> </property>
> >>
> >> <property>
> >>   <name>hbase.bucketcache.bucket.sizes</name>
> >>   <value>9216,17408,33792,66560</value>
> >> </property>
> >> 
> >>
> >>
> >>
> >>
> >>
> >>
> >> On 10/15/15, 12:00 PM, "Ted Yu"  wrote:
> >>
> >> >I am a bit curious.
> >> >0.94 doesn't have BucketCache.
> >> >
> >> >Can you share BucketCache related config parameters in your cluster ?
> >> >
> >> >Cheers
> >> >
> >> >On Thu, Oct 15, 2015 at 11:11 AM, Randy Fox 
> wrote:
> >> >
> >> >>
> >> >> "StoreFileCloserThread-L-1" prio=10 tid=0x027ec800 nid=0xad84
> >> >> runnable [0x7fbcc0c65000]
> >> >>java.lang.Thread.State: RUNNABLE
> >> >> at java.util.LinkedList.indexOf(LinkedList.java:602)
> >> >> at java.util.LinkedList.contains(LinkedList.java:315)
> >> >> at
> >> >>
> >>
> org.apache.hadoop.hbase.io.hfile.bucket.BucketAllocator$BucketSizeInfo.freeBlock(BucketAllocator.java:247)
> >> >> at
> >> >>
> >>
> org.apache.hadoop.hbase.io.hfile.bucket.BucketAllocator.freeBlock(BucketAllocator.java:449)
> >> >> - locked <0x00041b0887a8> (a
> >> >> org.apache.hadoop.hbase.io.hfile.bucket.BucketAllocator)
> >> >> at
> >> >>
> >>
> org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.evictBlock(BucketCache.java:459)
> >> >> at
> >> >>
> >>
> org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.evictBlocksByHfileName(BucketCache.java:1036)
> >> >> at
> >> >>
> >>
> org.apache.hadoop.hbase.io.hfile.CombinedBlockCache.evictBlocksByHfileName(CombinedBlockCache.java:90)
> >> >> at
> >> >>
> >>
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2.close(HFileReaderV2.java:516)
> >> >> at
> >> >>
> >>
> org.apache.hadoop.hbase.regionserver.StoreFile$Reader.close(StoreFile.java:1143)
> >> >> at
> >> >>
> >>
> org.apache.hadoop.hbase.regionserver.StoreFile.closeReader(StoreFile.java:503)
> >> >> - locked <0x0004944ff2d8> (a
> >> >> org.apache.hadoop.hbase.regionserver.StoreFile)
> >> >> at
> >> >> org.apache.hadoop.hbase.regionserver.HStore$2.call(HStore.java:873)
> >> >> at
> >> >> org.apache.hadoop.hbase.regionserver.HStore$2.call(HStore.java:870)
> >> >> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> >> >> at
> >> >>
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> >> >> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> >> >> at
> >> >>
> >>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >> >> at
> >> >>
> >>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >> >> at java.lang.Thread.run(Thread.java:745)
> >> >>
> >> >>
> >>
> "StoreCloserThread-Wildfire_graph3,\x00\x04lK\x1B\xFC\x10\xD2,1402949830657.afb6a1720d936a83d73022aeb9ddbb6c.-1"
> >> >> prio=10 tid=0x03508800 nid=0xad83 waiting on condition
> >> >> [0x7fbcc5dcc000]
> >> >>java.lang.Thread.State: WAITING (parking)
> >> >> at sun.misc.Unsafe.park(Native Method)
> >> >> - parking to wait for  <0x000534e90a80> (a
> >> >>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> >> >> at
> >> >> java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> >> >> at
> >> >>
> >>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
> >> >> at
> >> >>
> >>
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> >> >> at
> >> >>
> >>
> java.util.concurrent.

Re: HBase versions problem

2015-10-16 Thread ramkrishna vasudevan
With VERSIONS=>1 and MIN_VERSIONS=>0, I think the intended behaviour is to
always have the current version as the one to be returned. So in your case
you inserted two cells at different timestamps. But the Delete with
addColumn will always try to delete the latest version.  So in this case
the latest version happens to be the value2 version.  Hence for C1 that
value got masked, and whether you do the flush or not you always end up with
the value1 cell (i.e. the first one).
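
(A minimal sketch of the distinction being discussed - not from the original
reply; the table/family/column names are taken from the example below:)

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class DeleteVersionsSketch {
  public static void main(String[] args) throws Exception {
    byte[] row = Bytes.toBytes("ro1");
    byte[] family = Bytes.toBytes("C1");
    byte[] qualifier = Bytes.toBytes("col1");
    try (Connection conn = ConnectionFactory.createConnection();
         Table table = conn.getTable(TableName.valueOf("test"))) {
      // addColumn (no trailing 's') marks only the *latest* version for delete,
      // which is why the older value1 cell shows up again in this thread.
      Delete latestOnly = new Delete(row);
      latestOnly.addColumn(family, qualifier);
      table.delete(latestOnly);

      // addColumns (with the 's') masks *all* versions of the column instead.
      Delete allVersions = new Delete(row);
      allVersions.addColumns(family, qualifier);
      table.delete(allVersions);
    }
  }
}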

Coming to the family C2, where you want at least one version to be
maintained: I need to check the code more thoroughly to determine the
behaviour difference between flush and no flush.

REgards
Ram

On Fri, Oct 16, 2015 at 1:05 PM, mukund murrali 
wrote:

> Can anyone enlighten why this is happening. This is causing problems in our
> production.
>
> On Thu, Oct 15, 2015 at 4:15 PM, mukund murrali 
> wrote:
>
> > Hi
> >
> > I am using hbase-1.0. I had two column families C1 and C2.
> >
> > C1 => 'VERSIONS => 1, MIN_VERSIONS => 0 (default)
> > C2 => 'VERSIONS' => 1, MIN_VERSIONS => 1
> >
> > I inserted two versions as follows
> > put 'test','ro1','C1:col1','value1'
> > put 'test','ro1','C2:col1','value1'
> >
> >
> > put 'test','ro1','C1:col1','value2'
> > put 'test','ro1','C2:col1','value2'
> >
> > I did a delete using java API with addColumn ( not Columns) for both C1
> > and C2 column families. On a get call I got the result as
> >
> > C1:col1   timestamp=1444904709797,value=value1
> >
> > C2:col1timestamp=1444904695656, value=value1
> >
> > on doing a *flush* on the table, the C2 data got vanished and subsequent
> > get call returned with first version of C1.
> >
> > C1:col1   timestamp=1444904709797,value=value1
> >
> > On deeper analysis, we found that a flush before delete will purge both
> of
> > those data.
> >
> > My question is with MIN_VERSIONS => 0 and VERSIONS => 1, why does the
> > second version gets promoted during deletion and does not get removed
> even
> > after flush?
> >
> > In other words with VERSIONS =>1 , why should the earlier versions be
> > stored?
> >
> > Also with MIN_VERSIONS => 1 and VERSIONS => 1, though the second version
> > promoted but subsequent flush purged it.
> >
> > Is this an inconsistency or my understanding is wrong?
> >
> > Thanks
> >
> > Regards
> > Mukund Murrali
> >
>


Re: Unexpected behaviour when VisibilityController coprocessor is used

2015-10-12 Thread ramkrishna vasudevan
I tried it on the latest trunk and this issue is not there. So, as Anoop
said, the latest version of 0.98 should solve this problem.
@Suresh
Let us know if you still find the issue in later versions of 0.98 and we
can work on it to solve the problem.

Regards
Ram

On Tue, Oct 13, 2015 at 9:09 AM, Anoop John  wrote:

> Yes as such there is not mandatory to use AC along with VC.  It can be used
> alone..
> I believe u r getting the bug HBASE-13734.  This is fixed in 98.13 only.
> Just change ur version from 98.6 to 98.13 and test once.   Let us know how
> is it then.
>
> -Anoop-
>
> On Tue, Oct 13, 2015 at 9:01 AM, ramkrishna vasudevan <
> ramkrishna.s.vasude...@gmail.com> wrote:
>
> > I think, even with only configuring VisibilityController there should not
> > be a different behaviour, considering the fact that there are no
> visibility
> > labels.  With just VisibilityController configured and doing puts and
> scans
> > using super user let me check what is happening.
> >
> > Regards
> > Ram
> >
> > On Tue, Oct 13, 2015 at 8:47 AM, Anoop John 
> wrote:
> >
> > > Hi Suresh
> > >You said abt doing test as an HBase super user.  You mean even when
> > scan
> > > is issues as a super user, u are not getting the rows back?
> > >
> > > -Anoop-
> > >
> > > On Tue, Oct 13, 2015 at 4:06 AM, Ted Yu  wrote:
> > >
> > > > Convention is to put AccessController ahead of VisibilityController
> in
> > > > hbase-site.xml
> > > >
> > > > Took a quick pass over region server log but haven't found much yet.
> > > >
> > > > FYI
> > > >
> > > > On Mon, Oct 12, 2015 at 3:28 PM, Suresh Subbiah <
> > > > suresh.subbia...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Ted,
> > > > >
> > > > > Thank you. Yes HDFS cluster has also been kerberized. BTW, this is
> a
> > > > > "cluster" with only one node.
> > > > >
> > > > > Master hbase-site.xml, RS hbase-site.ml and RS log for the time
> > > interval
> > > > > test was run is attached
> > > > >
> > > > > http://pastebin.com/zuqCC4xG
> > > > > http://pastebin.com/88Wx0KDf
> > > > > http://pastebin.com/QZqihN1W
> > > > >
> > > > > Will try deploying 1.1.2 next.
> > > > >
> > > > > Thanks
> > > > > Suresh
> > > > >
> > > > >
> > > > >
> > > > > On Mon, Oct 12, 2015 at 3:46 PM, Ted Yu 
> wrote:
> > > > >
> > > > > > bq. cluster enabled for secure HBase with kerberos
> > > > > >
> > > > > > I assume your hdfs cluster has also been kerberized.
> > > > > >
> > > > > > Please pastebin the complete hbase-site.xml
> > > > > >
> > > > > > Please turn on DEBUG logging and pastebin the region server log
> > which
> > > > > hosts
> > > > > > visibilityTest
> > > > > >
> > > > > > BTW if possible, can you deploy 1.1.2 ?
> > > > > >
> > > > > > Cheers
> > > > > >
> > > > > > On Mon, Oct 12, 2015 at 1:14 PM, Suresh Subbiah <
> > > > > > suresh.subbia...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Ted,
> > > > > > >
> > > > > > > I understand that using VisibilityController on an unsercure
> > > cluster
> > > > is
> > > > > > of
> > > > > > > limited value. I am still in the early stages of my task. I am
> > > logged
> > > > > in
> > > > > > as
> > > > > > > HBase super user and was simply checking if rows could be
> > accessed.
> > > > > > >
> > > > > > > With my colleague's help we did get the cluster enabled for
> > secure
> > > > > HBase
> > > > > > > with kerberos. I repeated the test to get the same result. Our
> > > > cluster
> > > > > is
> > > > > > > on 1.0. Do you think I may be doing something incorrectly? What
> > > > > > information
> > > > > > > can I send to help ensure that I have not made a mistake.
> > >

Re: Unexpected behaviour when VisibilityController coprocessor is used

2015-10-12 Thread ramkrishna vasudevan
I think, even with only the VisibilityController configured, there should not
be a different behaviour, considering the fact that there are no visibility
labels.  With just the VisibilityController configured and doing puts and
scans as the super user, let me check what is happening.

Regards
Ram

On Tue, Oct 13, 2015 at 8:47 AM, Anoop John  wrote:

> Hi Suresh
>You said abt doing test as an HBase super user.  You mean even when scan
> is issues as a super user, u are not getting the rows back?
>
> -Anoop-
>
> On Tue, Oct 13, 2015 at 4:06 AM, Ted Yu  wrote:
>
> > Convention is to put AccessController ahead of VisibilityController in
> > hbase-site.xml
> >
> > Took a quick pass over region server log but haven't found much yet.
> >
> > FYI
> >
> > On Mon, Oct 12, 2015 at 3:28 PM, Suresh Subbiah <
> > suresh.subbia...@gmail.com>
> > wrote:
> >
> > > Hi Ted,
> > >
> > > Thank you. Yes HDFS cluster has also been kerberized. BTW, this is a
> > > "cluster" with only one node.
> > >
> > > Master hbase-site.xml, RS hbase-site.ml and RS log for the time
> interval
> > > test was run is attached
> > >
> > > http://pastebin.com/zuqCC4xG
> > > http://pastebin.com/88Wx0KDf
> > > http://pastebin.com/QZqihN1W
> > >
> > > Will try deploying 1.1.2 next.
> > >
> > > Thanks
> > > Suresh
> > >
> > >
> > >
> > > On Mon, Oct 12, 2015 at 3:46 PM, Ted Yu  wrote:
> > >
> > > > bq. cluster enabled for secure HBase with kerberos
> > > >
> > > > I assume your hdfs cluster has also been kerberized.
> > > >
> > > > Please pastebin the complete hbase-site.xml
> > > >
> > > > Please turn on DEBUG logging and pastebin the region server log which
> > > hosts
> > > > visibilityTest
> > > >
> > > > BTW if possible, can you deploy 1.1.2 ?
> > > >
> > > > Cheers
> > > >
> > > > On Mon, Oct 12, 2015 at 1:14 PM, Suresh Subbiah <
> > > > suresh.subbia...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Ted,
> > > > >
> > > > > I understand that using VisibilityController on an unsercure
> cluster
> > is
> > > > of
> > > > > limited value. I am still in the early stages of my task. I am
> logged
> > > in
> > > > as
> > > > > HBase super user and was simply checking if rows could be accessed.
> > > > >
> > > > > With my colleague's help we did get the cluster enabled for secure
> > > HBase
> > > > > with kerberos. I repeated the test to get the same result. Our
> > cluster
> > > is
> > > > > on 1.0. Do you think I may be doing something incorrectly? What
> > > > information
> > > > > can I send to help ensure that I have not made a mistake.
> > > > >
> > > > > Thanks
> > > > > Suresh
> > > > >
> > > > > hbase shell
> > > > > 15/10/12 14:35:09 INFO Configuration.deprecation: hadoop.native.lib
> > is
> > > > > deprecated. Instead, use io.native.lib.available
> > > > > HBase Shell; enter 'help' for list of supported commands.
> > > > > Type "exit" to leave the HBase Shell
> > > > > Version 1.0.0-cdh5.4.4, rUnknown, Mon Jul  6 16:59:55 PDT 2015
> > > > >
> > > > > hbase(main):001:0> create 'visibilityTest', 'f1'
> > > > > 0 row(s) in 0.7780 seconds
> > > > >
> > > > > => Hbase::Table - visibilityTest
> > > > > hbase(main):002:0> put 'visibilityTest', 'r1', 'f1:c1', 'value1'
> > > > > 0 row(s) in 0.1300 seconds
> > > > >
> > > > > hbase(main):003:0> deleteall 'visibilityTest', 'r1'
> > > > > 0 row(s) in 0.0330 seconds
> > > > >
> > > > > hbase(main):004:0> put 'visibilityTest', 'r1', 'f1:c1', 'value2'
> > > > > 0 row(s) in 0.0150 seconds
> > > > >
> > > > > hbase(main):005:0> scan 'visibilityTest'
> > > > > ROW   COLUMN+CELL
> > > > >
> > > > > 0 row(s) in 0.0550 seconds
> > > > >
> > > > > hbase(main):006:0> scan 'visibilityTest', {RAW=>TRUE}
> > > > > ROW   COLUMN+CELL
> > > > >
> > > > >  r1   column=f1:, timestamp=1444660561138,
> > > > > type=DeleteFamily
> > > > >  r1   column=f1:c1, timestamp=1444660576868,
> > > value=value2
> > > > >
> > > > > 1 row(s) in 0.0370 seconds
> > > > >
> > > > > -
> > > > > 
> > > > > hbase.coprocessor.master.classes
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.security.visibility.VisibilityController,org.apache.hadoop.hbase.security.access.AccessController
> > > > >   
> > > > >
> > > > > 
> > > > > hbase.coprocessor.region.classes
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.security.visibility.VisibilityController,org.apache.hadoop.hbase.security.token.TokenProvider,org.apache.hadoop.hbase.security.access.SecureBulkLoadEndpoint,org.apache.hadoop.hbase.security.access.AccessController
> > > > >   
> > > > >
> > > > > 
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Sat, Oct 10, 2015 at 9:51 PM, Ted Yu 
> wrote:
> > > > >
> > > > > > To my understanding, VisibilityController is used in a secure
> > > cluster.
> > > > > > Without security, how do you enforce that only select user(s) can
> > > > a

Re: Large number of column qualifiers

2015-09-24 Thread ramkrishna vasudevan
Hi

In the version that you were using, by default the caching was 1000 (I
believe - I need to see the old code).  So in that case it was trying to
fetch 1000 rows, each row with 20k cols.  Now, when you say that the
client was missing rows, did you check the server logs?

Did you get any OutOfOrderScannerException?  There is something called
'client.rpc.timeout' which can be increased in your case - but provided
your caching and batching are adjusted.

In the current trunk code there is no default caching value (unless
specified); the server tries to fetch 2MB of data and that is sent back to
the client.
In any case I would suggest checking your server logs for any exceptions.
Increase the timeout property and adjust your caching and batching to fetch
the data.  If the client is still missing rows then we need the logs to
analyse things.  Ted's mail referring to
https://issues.apache.org/jira/browse/HBASE-11544 will give an idea of the
general behaviour of scans and how it is affected by scanning bigger and
wider rows.
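
(For illustration only - this sketch is not from the original reply; the
table name is invented and the exact timeout property can differ by version:)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class WideRowScanSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    conf.setInt("hbase.rpc.timeout", 120000);   // give wide-row RPCs more time
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("wide_table"))) {
      Scan scan = new Scan();
      scan.setCaching(1);    // one 20k-column row per RPC is already a lot of data
      scan.setBatch(5000);   // return at most 5000 cells of a row per Result
      try (ResultScanner scanner = table.getScanner(scan)) {
        for (Result chunk : scanner) {
          // With batching, one wide row may arrive as several Result chunks.
          System.out.println(Bytes.toString(chunk.getRow()) + ": "
              + chunk.rawCells().length + " cells");
        }
      }
    }
  }
}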

Regards
Ram


On Thu, Sep 24, 2015 at 2:32 PM, Gaurav Agarwal  wrote:

> Hi,
>
> The problem that I am actually facing is that when doing a scan over rows
> where each row has very large number of cells (large number of columns),
> the scan API seems to be transparently dropping data - in my case I noticed
> that entire row of data was missing in few cases.
>
> On suggestions from Ram(above), I tried doing *scan.setCaching(1)* and
> optionally,* scan.setBatch(5000)* and the problem got resolved (at least
> for now).  So this indicates that the client (cannot be server I hope) was
> dropping the cells if the number (or maybe bytes) of cells became quite
> large across number of rows cached. Note that in my case, the number of
> bytes per cell is close to 30B (including qualifier,value and timestamp)
> and each row key is close to 20B.
>
> I am not clear what setting controls the maximum number/bytes of cells that
> can be received by the client before this problem surfaces. Can someone
> please point me these settings/code?
>
> On Thu, Sep 24, 2015 at 12:05 PM, Gaurav Agarwal  wrote:
>
> > After spending more time I realised that my understanding and my question
> > (was invalid).
> > I am still trying to get more information regarding the problem and will
> > update the thread once I have a better handle on the problem.
> >
> > Apologies for the confusion..
> >
> > On Thu, Sep 24, 2015 at 10:32 AM, ramkrishna vasudevan <
> > ramkrishna.s.vasude...@gmail.com> wrote:
> >
> >> Am not sure whether you have tried it. the scan API has got an API
> called
> >> 'batching'. Did you try it?  So per row if there are more columns you
> can
> >> still limit the amount of data being sent to the client. I think the
> main
> >> issue you are facing is that the qualifiers getting returned are more in
> >> number and so the client is not able to accept them?
> >>
> >> 'Short.MAX_VALUE which is 32,767 bytes.'
> >> This comment applies for the qualifier length ie. the name that you
> >> specify
> >> for the qualifier not on the number of qualifiers.
> >>
> >> Regards
> >> Ram
> >>
> >> On Thu, Sep 24, 2015 at 8:52 AM, Anoop John 
> >> wrote:
> >>
> >> > >>I have Column Family with very large number of column qualifiers (>
> >> > 50,000). Each column qualifier is 8 bytes long.
> >> >
> >> > When u say u have 5 qualifiers in a CF, means u will have those
> many
> >> > cells coming under that CF per row.  So am not getting what is the
> >> > qualifier length limit as such coming. Per qualifier, you will have a
> >> diff
> >> > cell and its qualifier.
> >> >
> >> > -Anoop-
> >> >
> >> >
> >> > On Thu, Sep 24, 2015 at 1:13 AM, Vladimir Rodionov <
> >> vladrodio...@gmail.com
> >> > >
> >> > wrote:
> >> >
> >> > > Yes, the comment is incorrect.
> >> > >
> >> > > hbase.client.keyvalue.maxsize controls max key-value size, but its
> >> > > unlimited in a master (I was wrong about 1MB, this is probably for
> >> older
> >> > > versions of HBase)
> >> > >
> >> > >
> >> > > -Vlad
> >> > >
> >> > > On Wed, Sep 23, 2015 at 11:45 AM, Gaurav Agarwal 
> >> > wrote:
> >> > >
> >> > > > Thanks Vlad. Could you please point me the KV size setting
> (default
> >> > 1MB

Re: HBase Filter Problem

2015-09-23 Thread ramkrishna vasudevan
Just trying to understand more:
you have a combination of PrefixFilter and SingleColumnValueFilter -
now, the column you have specified in the SingleColumnValueFilter - is it
the only column that you have in your table?  Or are there many other
columns, and one such column was used in the SingleColumnValueFilter?

The idea of FirstKeyOnlyFilter is just to skip to the next row on getting
the first ever column in that row.  Maybe the combination of these two is
causing some issues.
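
(Purely as an illustration of the two combinations being compared - not from
the original reply; the prefix, family, qualifier and value are invented:)

import java.util.Arrays;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
import org.apache.hadoop.hbase.filter.PrefixFilter;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class FilterComboSketch {
  public static void main(String[] args) {
    Filter prefix = new PrefixFilter(Bytes.toBytes("2015"));
    Filter scvf = new SingleColumnValueFilter(
        Bytes.toBytes("cf"), Bytes.toBytes("status"),
        CompareOp.EQUAL, Bytes.toBytes("OK"));

    // The combination reported to be fast: prefix + column-value check.
    FilterList fast = new FilterList(FilterList.Operator.MUST_PASS_ALL,
        Arrays.asList(prefix, scvf));

    // The combination reported to be slow in this thread adds FirstKeyOnlyFilter.
    FilterList slow = new FilterList(FilterList.Operator.MUST_PASS_ALL,
        Arrays.asList(prefix, scvf, new FirstKeyOnlyFilter()));

    Scan scan = new Scan();
    scan.setFilter(fast);   // or 'slow', to reproduce the behaviour above
  }
}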

Regards
Ram

On Wed, Sep 23, 2015 at 2:31 PM, donhoff_h <165612...@qq.com> wrote:

> Hi,
>
> There are 90 Million records in the table. And I use the MUST_PASS_ALL
> for all my filters.  When I use PrefixFilter + SingleColumnValueFilter, it
> returned fast. So I supposed that the combination of PrefixFilter +
> SingleColumnValueFilter + FirstKeyOnlyFilter should be fast. But the fact
> is just in contrast. Do you know the reason that cause it?
>
> Thanks!
>
>
>
> -- Original Message --
> From: "Fulin Sun";;
> Sent: Wednesday, September 23, 2015, 4:53 PM
> To: "HBase User";
>
> Subject: Re: HBase Filter Problem
>
>
>
> Hi , there
>
> How many rows are there in the hbase table?  Do you want to achieve the
> default FilterList.Operator.MUST_PASS_ALL or
> do you just want to use OR conditions for these filters?
>
> I think the reason is that this kind of filter list just does more scan work
> and lowers performance.
>
> Best,
> Sun.
>
>
>
>
> CertusNet
>
> 发件人: donhoff_h
> 发送时间: 2015-09-23 16:33
> 收件人: user
> 主题: HBase Filter Problem
> Hi,
>
> I wrote a program whose function is to extract some data from an HBase
> table. According to business requirements I had to use the PrefixFilter and
> the SingleColumnValueFilter to filter the data.  The program ran very fast
> and returned in 1 sec.
>
> Considering I just need the rowkey of each record in my final result, I
> tried to improve my program by using the PrefixFilter +
> SingleColumnValueFilter + FirstKeyOnlyFilter. To my surprise the program
> ran very slow this time. It ran for about 20 min and still had not
> finished. So I had to kill it.
>
> Does anybody know the reason that causes my program to run so slow?  Since I
> set the PrefixFilter as the first filter in the FilterList object, I think
> the program should run fast.
>
> Many Thanks!
>


Re: Large number of column qualifiers

2015-09-23 Thread ramkrishna vasudevan
Am not sure whether you have tried it: the Scan API has an option called
'batching'. Did you try it?  So per row, if there are more columns, you can
still limit the amount of data being sent to the client. I think the main
issue you are facing is that the qualifiers being returned are more in
number and so the client is not able to accept them?
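
(A small sketch of that option - not from the original reply; it assumes an
already-open Table handle and an invented batch size:)

import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class BatchingSketch {
  static void scanWithBatching(Table table) throws Exception {
    Scan scan = new Scan();
    scan.setBatch(1000);  // hand back at most 1000 cells of a row per Result
    try (ResultScanner scanner = table.getScanner(scan)) {
      for (Result chunk : scanner) {
        // The same row key can appear in consecutive chunks when a row is wide.
        System.out.println(Bytes.toString(chunk.getRow()) + ": "
            + chunk.rawCells().length + " cells");
      }
    }
  }
}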

'Short.MAX_VALUE which is 32,767 bytes.'
This comment applies to the qualifier length, i.e. the name that you specify
for the qualifier, not to the number of qualifiers.

Regards
Ram

On Thu, Sep 24, 2015 at 8:52 AM, Anoop John  wrote:

> >>I have Column Family with very large number of column qualifiers (>
> 50,000). Each column qualifier is 8 bytes long.
>
> When u say u have 5 qualifiers in a CF, means u will have those many
> cells coming under that CF per row.  So am not getting what is the
> qualifier length limit as such coming. Per qualifier, you will have a diff
> cell and its qualifier.
>
> -Anoop-
>
>
> On Thu, Sep 24, 2015 at 1:13 AM, Vladimir Rodionov  >
> wrote:
>
> > Yes, the comment is incorrect.
> >
> > hbase.client.keyvalue.maxsize controls max key-value size, but its
> > unlimited in a master (I was wrong about 1MB, this is probably for older
> > versions of HBase)
> >
> >
> > -Vlad
> >
> > On Wed, Sep 23, 2015 at 11:45 AM, Gaurav Agarwal 
> wrote:
> >
> > > Thanks Vlad. Could you please point me the KV size setting (default
> 1MB)?
> > > Just to make sure that I understand correct, are you suggesting that
> the
> > > following comment is incorrect in Cell.java?
> > >
> > >  /**
> > >* Contiguous raw bytes that may start at any index in the containing
> > > array. Max length is
> > >* Short.MAX_VALUE which is 32,767 bytes.
> > >* @return The array containing the qualifier bytes.
> > >*/
> > >   byte[] getQualifierArray();
> > >
> > > On Thu, Sep 24, 2015 at 12:10 AM, Gaurav Agarwal 
> > wrote:
> > >
> > > > Thanks Vlad. Could you please point me the KV size setting (default
> > 1MB)?
> > > > Just to make sure that I understand correct - the following comment
> is
> > > > incorrect in Cell.java:
> > > >
> > > >  /**
> > > >* Contiguous raw bytes that may start at any index in the
> containing
> > > > array. Max length is
> > > >* Short.MAX_VALUE which is 32,767 bytes.
> > > >* @return The array containing the qualifier bytes.
> > > >*/
> > > >   byte[] getQualifierArray();
> > > >
> > > > On Wed, Sep 23, 2015 at 11:43 PM, Vladimir Rodionov <
> > > > vladrodio...@gmail.com> wrote:
> > > >
> > > >> Check KeyValue class (Cell's implementation). getQualifierArray()
> > > returns
> > > >> kv's backing array. There is no SHORT limit on a size of this array,
> > but
> > > >> there are other limits in  HBase - maximum KV size, for example,
> which
> > > is
> > > >> configurable, but, by default, is 1MB. Having 50K qualifiers is a
> bad
> > > >> idea.
> > > >> Consider redesigning your data model and use rowkey instead.
> > > >>
> > > >> -Vlad
> > > >>
> > > >> On Wed, Sep 23, 2015 at 10:24 AM, Ted Yu 
> wrote:
> > > >>
> > > >> > Please take a look at HBASE-11544 which is in hbase 1.1
> > > >> >
> > > >> > Cheers
> > > >> >
> > > >> > On Wed, Sep 23, 2015 at 10:18 AM, Gaurav Agarwal <
> gau...@arkin.net>
> > > >> wrote:
> > > >> >
> > > >> > > Hi All,
> > > >> > >
> > > >> > > I have Column Family with very large number of column qualifiers
> > (>
> > > >> > > 50,000). Each column qualifier is 8 bytes long. The problem is
> the
> > > >> when I
> > > >> > > do a scan operation to fetch some rows, the client side Cell
> > object
> > > >> does
> > > >> > > not have enough space allocated in it to hold all the
> > > columnQaulifiers
> > > >> > for
> > > >> > > a given row and hence I cannot read all the columns back for a
> > given
> > > >> row.
> > > >> > >
> > > >> > > Please see the code snippet that I am using:
> > > >> > >
> > > >> > >  final ResultScanner rs = htable.getScanner(scan);
> > > >> > >  for (Result row = rs.next(); row != null; row = rs.next()) {
> > > >> > > final Cell[] cells = row.rawCells();
> > > >> > > if (cells != null) {
> > > >> > > for (final Cell cell : cells) {
> > > >> > > final long c = Bytes.toLong(
> > > >> > > *cell.getQualifierArray()*,
> > > >> > cell.getQualifierOffset(),
> > > >> > > cell.getQualifierLength());
> > > >> > > final long v = Bytes.toLong(cell.getValueArray(),
> > > >> > > cell.getValueOffset());
> > > >> > > points.put(c, v);
> > > >> > > }
> > > >> > > }
> > > >> > > }
> > > >> > >
> > > >> > > The cell.getQualifierArray() method says that it's 'Max length
> is
> > > >> > > Short.MAX_VALUE which is 32,767 bytes.'. Hence it can only hold
> > > around
> > > >> > > 4,000 columnQualfiers.
> > > >> > >
> > > >> > > Is there an alternate API that I should be using or am I missing
> > > some
> > > >> > > setting here? Note that in worst case I need to read all the
> > > >> > > columnQualifiers in
