Re: Use experience and performance data of offheap from Alibaba online cluster

2016-11-18 Thread Anoop John
Because of some compatibility issues, we decided that this will be done
in 2.0 only..  Ya, as Andy said, it would be great to share the 1.x
backported patches.  Is it a mega patch at your end, or issue-by-issue
patches?  The latter would be best.  Please share the patches somewhere along
with a list of the issues backported. I can help with verifying the issues
once, so as to make sure we don't miss any...

-Anoop-

On Sat, Nov 19, 2016 at 12:32 AM, Enis Söztutar  wrote:
> Thanks for sharing this. Great work.
>
> I don't see any reason why we cannot backport to branch-1.
>
> Enis
>
> On Fri, Nov 18, 2016 at 9:37 AM, Andrew Purtell 
> wrote:
>
>> Yes, please, the patches will be useful to the community even if we decide
>> not to backport into an official 1.x release.
>>
>>
>> > On Nov 18, 2016, at 12:25 PM, Bryan Beaudreault <
>> bbeaudrea...@hubspot.com> wrote:
>> >
>> > Is the backported patch available anywhere? Not seeing it on the
>> referenced
>> > JIRA. If it ends up not getting officially backported to branch-1 due to
>> > 2.0 around the corner, some of us who build our own deploy may want to
>> > integrate into our builds. Thanks! These numbers look great
>> >
>> >> On Fri, Nov 18, 2016 at 12:20 PM Anoop John 
>> wrote:
>> >>
>> >> Hi Yu Li
>> >>   Good to see that the off heap work help you..  The perf
>> >> numbers looks great.  So this is a compare of on heap L1 cache vs off
>> heap
>> >> L2 cache(HBASE-11425 enabled).   So for 2.0 we should make L2 off heap
>> >> cache ON by default I believe.  Will raise a jira for that we can
>> discuss
>> >> under that.   Seems like L2 off heap cache for data blocks and L1 cache
>> for
>> >> index blocks seems a right choice.
>> >>
>> >> Thanks for the backport and the help in testing the feature..  You were
>> >> able to find some corner case bugs and helped community to fix them..
>> >> Thanks goes to ur whole team.
>> >>
>> >> -Anoop-
>> >>
>> >>
>> >>> On Fri, Nov 18, 2016 at 10:14 PM, Yu Li  wrote:
>> >>>
>> >>> Sorry guys, let me retry the inline images:
>> >>>
>> >>> Performance w/o offheap:
>> >>>
>> >>>
>> >>> Performance w/ offheap:
>> >>>
>> >>>
>> >>> Peak Get QPS of one single RS during Singles' Day (11/11):
>> >>>
>> >>>
>> >>>
>> >>> And attach the files in case inline still not working:
>> >>>
>> >>> Performance_without_offheap.png
>> >>> <
>> >> https://drive.google.com/file/d/0B017Q40_F5uwbWEzUGktYVIya3JkcXVjRkFvVG
>> NtM0VxWC1n/view?usp=drive_web
>> >>>
>> >>>
>> >>> Performance_with_offheap.png
>> >>> <
>> >> https://drive.google.com/file/d/0B017Q40_F5uweGR2cnJEU0M1MWwtRFJ5YkxUeF
>> VrcUdPc2ww/view?usp=drive_web
>> >>>
>> >>>
>> >>> Peak_Get_QPS_of_Single_RS.png
>> >>> <
>> >> https://drive.google.com/file/d/0B017Q40_F5uwQ2FkR2k0ZmEtRVNGSFp5RUxHM3
>> F6bHpNYnJz/view?usp=drive_web
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> Best Regards,
>> >>> Yu
>> >>>
>>  On 18 November 2016 at 19:29, Ted Yu  wrote:
>> 
>>  Yu:
>>  With positive results, more hbase users would be asking for the
>> backport
>>  of offheap read path patches.
>> 
>>  Do you think you or your coworker has the bandwidth to publish
>> backport
>>  for branch-1 ?
>> 
>>  Thanks
>> 
>> > On Nov 18, 2016, at 12:11 AM, Yu Li  wrote:
>> >
>> > Dear all,
>> >
>> > We have backported read path offheap (HBASE-11425) to our customized
>>  hbase-1.1.2 (thanks @Anoop for the help/support) and run it online for
>> >> more
>>  than a month, and would like to share our experience, for what it's
>> >> worth
>>  (smile).
>> >
>> > Generally speaking, we gained a better and more stable
>>  throughput/performance with offheap, and below are some details:
>> > 1. QPS become more stable with offheap
>> >
>> > Performance w/o offheap:
>> >
>> >
>> >
>> > Performance w/ offheap:
>> >
>> >
>> >
>> > These data come from our online A/B test cluster (with 450 physical
>>  machines, and each with 256G memory + 64 core) with real world
>> >> workloads,
>>  it shows using offheap we could gain a more stable throughput as well
>> as
>>  better performance
>> >
>> > Not showing fully online data here because for online we published
>> the
>>  version with both offheap and NettyRpcServer together, so no
>> standalone
>>  comparison data for offheap
>> >
>> > 2. Full GC frequency and cost
>> >
>> > Average Full GC STW time reduce from 11s to 7s with offheap.
>> >
>> > 3. Young GC frequency and cost
>> >
>> > No performance degradation observed with offheap.
>> >
>> > 4. Peak throughput of one single RS
>> >
>> > On Singles Day (11/11), peak throughput of one single RS reached
>> 100K,
>>  among which 90K from Get. Plus internet in/out data we could know the
>>  average result size of get request is ~1KB
>> >
>> >
>> >
>> > Offheap are used on all online machines (more than 1600 nodes)
>> instead
>>

Re: Why hbase doesn't remove the empty region

2016-11-18 Thread Nick Dimiduk
HBase will not remove empty regions. It assumes you know what you're doing.

In 1.2 there's a new "Region Normalizer" [0] feature that runs in the
Master, which acts a bit like the Balancer, but for region sizes. I think
it's still considered experimental, so YMMV. We're still on 1.1, so I
haven't tried it in prod yet.
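
For reference, here is a minimal sketch of driving it through the Java Admin
API, assuming the HBASE-13103 additions in 1.2+ (Admin#setNormalizerRunning,
Admin#normalize, and the per-table NORMALIZATION_ENABLED flag); the table name
is a placeholder and this is illustrative only, not a recommendation:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class NormalizerSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      // Flip the master-side normalizer switch on.
      admin.setNormalizerRunning(true);

      // Opt a table into normalization; "t1" is a placeholder name.
      TableName table = TableName.valueOf("t1");
      HTableDescriptor desc = admin.getTableDescriptor(table);
      desc.setNormalizationEnabled(true);
      admin.modifyTable(table, desc);

      // Ask the master to run a normalization pass now rather than waiting
      // for the periodic chore.
      boolean ran = admin.normalize();
      System.out.println("Normalizer run requested: " + ran);
    }
  }
}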

I also have this problem in our cluster, because of a combination of TTL
and Phoenix's Rowkey Timestamp feature. This results in region boundaries
including the timestamp, and data that goes away after a period. Eventually
all regions will become empty. I have a script that runs periodically to
calculate region sizes and drop empty regions. I volunteered to contribute
it [1], but haven't pulled it out of our puppet infra in the form of a
patch. Let me revive that thread internally.
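
In the meantime, a rough sketch of the idea (not the actual script from [1],
just an illustration against the 1.x client API): find regions whose store
files and memstore report zero size, then merge each into a neighbour with
Admin#mergeRegions. Neighbour selection, retries and safety checks are
deliberately left out:

import java.util.Map;

import org.apache.hadoop.hbase.ClusterStatus;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.RegionLoad;
import org.apache.hadoop.hbase.ServerLoad;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.util.Bytes;

public class EmptyRegionReport {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Admin admin = conn.getAdmin()) {
      ClusterStatus status = admin.getClusterStatus();
      for (ServerName sn : status.getServers()) {
        ServerLoad sl = status.getLoad(sn);
        for (Map.Entry<byte[], RegionLoad> e : sl.getRegionsLoad().entrySet()) {
          RegionLoad rl = e.getValue();
          // Sizes are reported in MB, so a 549-byte region shows up as 0 here;
          // no store file data and an empty memstore make it a candidate for
          // being merged away.
          if (rl.getStorefileSizeMB() == 0 && rl.getMemStoreSizeMB() == 0) {
            System.out.println("Empty region candidate: " + Bytes.toStringBinary(e.getKey()));
            // A real cleanup script would pick an adjacent region and call
            // admin.mergeRegions(regionA, regionB, false) after sanity checks.
          }
        }
      }
    }
  }
}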

-n

[0]: https://issues.apache.org/jira/browse/HBASE-13103
[1]: https://issues.apache.org/jira/browse/HBASE-15712

On Fri, Nov 18, 2016 at 4:59 PM, Xi Yang  wrote:

> Yes, we're still on old release of 0.98.x
>
> Many of empty region didn't be deleted. Take one for example:
>
>
> > "Namespace_default_table_java_app_logs_region_
> 29d4fa5482d17daf813e12732155e0ee_metric_storeCount"
> > : 1,
> >
> > "Namespace_default_table_java_app_logs_region_
> 29d4fa5482d17daf813e12732155e0ee_metric_storeFileCount"
> > : 1,
> >
> > "Namespace_default_table_java_app_logs_region_
> 29d4fa5482d17daf813e12732155e0ee_metric_memStoreSize"
> > : 424,
> >
> > "Namespace_default_table_java_app_logs_region_
> 29d4fa5482d17daf813e12732155e0ee_metric_storeFileSize"
> > : 549,
> >
> > "Namespace_default_table_java_app_logs_region_
> 29d4fa5482d17daf813e12732155e0ee_metric_compactionsCompletedCount"
> > : 4,
> >
> > "Namespace_default_table_java_app_logs_region_
> 29d4fa5482d17daf813e12732155e0ee_metric_numBytesCompactedCount"
> > : 8616506268,
> >
> > "Namespace_default_table_java_app_logs_region_
> 29d4fa5482d17daf813e12732155e0ee_metric_numFilesCompactedCount"
> > : 4,
>
>
> The log about last time major compaction of this region is:
>
> 2016-11-03 22:45:33,453 INFO org.apache.hadoop.hbase.regionserver.HStore:
> > Completed major compaction of 1 (all) file(s) in log of
> > java_app_logs,CM-sjcmhpcapp01-1476076988799-135168375-
> 178aebc1-4bfc-47c4-907f-952f50270f50-b,1476360804449.
> 29d4fa5482d17daf813e12732155e0ee.
> > into 47b8ee796fc74ff0b8bf60318f99(size=549), total size for store is
> > 549. This selection was in queue for 0sec, and took 0sec to execute.
> > 2016-11-03 22:45:33,454 INFO
> > org.apache.hadoop.hbase.regionserver.CompactSplitThread: Completed
> > compaction: Request =
> > regionName=java_app_logs,CM-sjcmhpcapp01-1476076988799-
> 135168375-178aebc1-4bfc-47c4-907f-952f50270f50-b,1476360804449.
> 29d4fa5482d17daf813e12732155e0ee.,
> > storeName=log, fileCount=1, fileSize=1.3 G, priority=9,
> > time=25652433393376299; duration=0sec
>
>
> You can find that its size is only 549 bytes at that time. But HBase didn't
> remove it.
>
> Here is the result I run HFile tool
>
> $ hbase org.apache.hadoop.hbase.io.hfile.HFile -v -f
> > hdfs://xxx:8020/hbase/data/default/java_app_logs/
> 29d4fa5482d17daf813e12732155e0ee/log/47b8ee796fc74ff0b8bf60318f99
> > 16/11/18 16:48:25 INFO Configuration.deprecation: hadoop.native.lib is
> > deprecated. Instead, use io.native.lib.available
> > 16/11/18 16:48:29 INFO Configuration.deprecation: fs.default.name is
> > deprecated. Instead, use fs.defaultFS
> > Scanning ->
> > hdfs://xxx:8020/hbase/data/default/java_app_logs/
> 29d4fa5482d17daf813e12732155e0ee/log/47b8ee796fc74ff0b8bf60318f99
> > 16/11/18 16:48:29 INFO hfile.CacheConfig: CacheConfig:disabled
> > Scanned kv count -> 0
>
>
>
> 2016-11-17 18:45 GMT-08:00 Ted Yu :
>
> > You would be able to find in region server log(s).
> >
> > Can you do quick inspection of the hfiles using:
> >
> > http://hbase.apache.org/book.html#hfile_tool
> >
> > BTW you're still on old release of 0.98.x, right ?
> >
> > On Thu, Nov 17, 2016 at 6:40 PM, Xi Yang  wrote:
> >
> > > Thank you for reply.
> > >
> > > Yes. There are many regions have this problem. And those regions were
> > > created several years ago.
> > > hbase.hregion.majorcompaction of our cluster is 7 days.
> > > Where can I see the last major compaction time?
> > >
> > > Thanks,
> > > Alex
> > >
> > > 2016-11-17 18:22 GMT-08:00 Ted Yu :
> > >
> > > > When was the last time major compaction was performed on this region
> ?
> > > >
> > > > Were you referring to the store files by 'them' in your question ?
> > > >
> > > > Cheers
> > > >
> > > > On Thu, Nov 17, 2016 at 6:18 PM, Xi Yang 
> > wrote:
> > > >
> > > > > I have some regions are empty. But HBase doesn't remove them, why?
> > > > >
> > > > > You can see the storeFileSize is only 549. But I use hbase shell to
> > > scan
> > > > > this region, the result is empty
> > > > >
> > > > > "Namespace_default_table_java_app_logs_region_
> > > > > 3854912733f1acaaa8a255abd6b7b1ec_metric_storeFileCount"
> > > > > : 1,
> > > > > 

Re: Why hbase doesn't remove the empty region

2016-11-18 Thread Xi Yang
Yes, we're still on an old release of 0.98.x

Many of the empty regions weren't deleted. Take one for example:


> "Namespace_default_table_java_app_logs_region_29d4fa5482d17daf813e12732155e0ee_metric_storeCount"
> : 1,
>
> "Namespace_default_table_java_app_logs_region_29d4fa5482d17daf813e12732155e0ee_metric_storeFileCount"
> : 1,
>
> "Namespace_default_table_java_app_logs_region_29d4fa5482d17daf813e12732155e0ee_metric_memStoreSize"
> : 424,
>
> "Namespace_default_table_java_app_logs_region_29d4fa5482d17daf813e12732155e0ee_metric_storeFileSize"
> : 549,
>
> "Namespace_default_table_java_app_logs_region_29d4fa5482d17daf813e12732155e0ee_metric_compactionsCompletedCount"
> : 4,
>
> "Namespace_default_table_java_app_logs_region_29d4fa5482d17daf813e12732155e0ee_metric_numBytesCompactedCount"
> : 8616506268,
>
> "Namespace_default_table_java_app_logs_region_29d4fa5482d17daf813e12732155e0ee_metric_numFilesCompactedCount"
> : 4,


The log about last time major compaction of this region is:

2016-11-03 22:45:33,453 INFO org.apache.hadoop.hbase.regionserver.HStore:
> Completed major compaction of 1 (all) file(s) in log of
> java_app_logs,CM-sjcmhpcapp01-1476076988799-135168375-178aebc1-4bfc-47c4-907f-952f50270f50-b,1476360804449.29d4fa5482d17daf813e12732155e0ee.
> into 47b8ee796fc74ff0b8bf60318f99(size=549), total size for store is
> 549. This selection was in queue for 0sec, and took 0sec to execute.
> 2016-11-03 22:45:33,454 INFO
> org.apache.hadoop.hbase.regionserver.CompactSplitThread: Completed
> compaction: Request =
> regionName=java_app_logs,CM-sjcmhpcapp01-1476076988799-135168375-178aebc1-4bfc-47c4-907f-952f50270f50-b,1476360804449.29d4fa5482d17daf813e12732155e0ee.,
> storeName=log, fileCount=1, fileSize=1.3 G, priority=9,
> time=25652433393376299; duration=0sec


You can see that its size was only 549 bytes at that time, but HBase didn't
remove it.

Here is the result when I run the HFile tool:

$ hbase org.apache.hadoop.hbase.io.hfile.HFile -v -f
> hdfs://xxx:8020/hbase/data/default/java_app_logs/29d4fa5482d17daf813e12732155e0ee/log/47b8ee796fc74ff0b8bf60318f99
> 16/11/18 16:48:25 INFO Configuration.deprecation: hadoop.native.lib is
> deprecated. Instead, use io.native.lib.available
> 16/11/18 16:48:29 INFO Configuration.deprecation: fs.default.name is
> deprecated. Instead, use fs.defaultFS
> Scanning ->
> hdfs://xxx:8020/hbase/data/default/java_app_logs/29d4fa5482d17daf813e12732155e0ee/log/47b8ee796fc74ff0b8bf60318f99
> 16/11/18 16:48:29 INFO hfile.CacheConfig: CacheConfig:disabled
> Scanned kv count -> 0



2016-11-17 18:45 GMT-08:00 Ted Yu :

> You would be able to find in region server log(s).
>
> Can you do quick inspection of the hfiles using:
>
> http://hbase.apache.org/book.html#hfile_tool
>
> BTW you're still on old release of 0.98.x, right ?
>
> On Thu, Nov 17, 2016 at 6:40 PM, Xi Yang  wrote:
>
> > Thank you for reply.
> >
> > Yes. There are many regions have this problem. And those regions were
> > created several years ago.
> > hbase.hregion.majorcompaction of our cluster is 7 days.
> > Where can I see the last major compaction time?
> >
> > Thanks,
> > Alex
> >
> > 2016-11-17 18:22 GMT-08:00 Ted Yu :
> >
> > > When was the last time major compaction was performed on this region ?
> > >
> > > Were you referring to the store files by 'them' in your question ?
> > >
> > > Cheers
> > >
> > > On Thu, Nov 17, 2016 at 6:18 PM, Xi Yang 
> wrote:
> > >
> > > > I have some regions are empty. But HBase doesn't remove them, why?
> > > >
> > > > You can see the storeFileSize is only 549. But I use hbase shell to
> > scan
> > > > this region, the result is empty
> > > >
> > > > "Namespace_default_table_java_app_logs_region_
> > > > 3854912733f1acaaa8a255abd6b7b1ec_metric_storeFileCount"
> > > > : 1,
> > > > "Namespace_default_table_java_app_logs_region_
> > > > 3854912733f1acaaa8a255abd6b7b1ec_metric_memStoreSize"
> > > > : 424,
> > > > "Namespace_default_table_java_app_logs_region_
> > > > 3854912733f1acaaa8a255abd6b7b1ec_metric_storeFileSize"
> > > > : 549,
> > > > "Namespace_default_table_java_app_logs_region_
> > > > 3854912733f1acaaa8a255abd6b7b1ec_metric_compactionsCompletedCount"
> > > > : 4,
> > > > "Namespace_default_table_java_app_logs_region_
> > > > 3854912733f1acaaa8a255abd6b7b1ec_metric_numBytesCompactedCount"
> > > > : 850327,
> > > > "Namespace_default_table_java_app_logs_region_
> > > > 3854912733f1acaaa8a255abd6b7b1ec_metric_numFilesCompactedCount"
> > > > : 4,
> > > >
> > >
> >
>


Re: [DISCUSSION] Merge Backup / Restore - Branch HBASE-7912

2016-11-18 Thread Stack
On Fri, Nov 18, 2016 at 3:53 PM, Ted Yu  wrote:

> Thanks, Matteo.
>
> bq. restore is not clear if given an incremental id it will do the full
> restore from full up to that point or if i need to apply manually
> everything
>
> The restore takes into consideration of the dependent backup(s).
> So there is no need to apply preceding backup(s) manually.
>
>
I asked this question on the issue. It is not clear from the usage or doc how
to run a restore from an incremental backup. Can you fix the doc and usage to
say how, so I can be clear on it and try it? Currently I am stuck verifying a
round-trip backup/restore made of incrementals.

Thanks,
S



> On Fri, Nov 18, 2016 at 3:48 PM, Matteo Bertozzi 
> wrote:
>
> > I did one last pass to the mega patch. I don't see anything major that
> > should block the merge.
> >
> > - most of the code is isolated in the backup package
> > - all the backup code is client side
> > - there are few changes to the server side, mainly for cleaners, wal
> > rolling and similar (which is ok)
> > - there is a good number of tests, and an integration test
> >
> > the code seems to have still some left overs from the old implementation,
> > and some stuff needs a cleanup. but I don't think this should be used as
> an
> > argument to block the merge. I think the guys will keep working on this
> and
> > they may also get help of others once the patch is in master.
> >
> > I still have my concerns about the current limitations, but these are
> > things already planned for phase 3, so some of this stuff may even be in
> > the final 2.0.
> > but as long as we have a "current limitations" section in the user guide
> > mentioning important stuff like the ones below, I'm ok with it.
> >  - if you write to the table with Durability.SKIP_WALS your data will not
> > be in the incremental-backup
> >  - if you bulkload files that data will not be in the incremental backup
> > (HBASE-14417)
> >  - the incremental backup will not only contains the data of the table
> you
> > specified but also the regions from other tables that are on the same set
> > of RSs (HBASE-14141) ...maybe a note about security around this topic
> >  - the incremental backup will not contains just the "latest row" between
> > backup A and B, but it will also contains all the updates occurred in
> > between. but the restore does not allow you to restore up to a certain
> > point in time, the restore will always be up to the "latest backup
> point".
> >  - you should limit the number of "incremental" up to N (or maybe SIZE),
> to
> > avoid replay time becoming the bottleneck. (HBASE-14135)
> >
> > I'll be ok even with the above not being in the final 2.0,
> > but i'd like to see as blocker for the final 2.0 (not the merge)
> >  - the backup code moved in an hbase-backup module
> >  - and some more work around tools, especially to try to unify and make
> > simple the backup experience (simple example: in some case there is a
> > backup_id argument in others a backupId argument. or things like..
> restore
> > is not clear if given an incremental id it will do the full restore from
> > full up to that point or if i need to apply manually everything).
> >
> > in conclusion, I think we can open a merge vote. I'll be +1 on it, and I
> > think we should try to reject -1 with just a "code cleanup" motivation,
> > since there will still be work going on on the code after the merge.
> >
> > Matteo
> >
> >
> > On Sun, Nov 6, 2016 at 10:54 PM, Devaraj Das 
> wrote:
> >
> > > Stack and others, anything else on the patch? Merge to master now?
> > >
> >
>


Re: [DISCUSSION] Merge Backup / Restore - Branch HBASE-7912

2016-11-18 Thread Stack
Nice writeup Matteo. I'm trying to get to where you are at but am not there
yet. Almost. Agree that it should be a blocker that all gets hoisted out of
core into a backup module and that the limitations are spelled out clearly
in doc.

I have spent a good bit of time reviewing and testing this feature. I would
like my review and concerns addressed and I'd like it to be clear how;
either explicit follow-on issues, pointers to where in the patch or doc my
remarks have been catered to, etc. Until then, I am against commit.

St.Ack


On Fri, Nov 18, 2016 at 3:48 PM, Matteo Bertozzi 
wrote:

> I did one last pass to the mega patch. I don't see anything major that
> should block the merge.
>
> - most of the code is isolated in the backup package
> - all the backup code is client side
> - there are few changes to the server side, mainly for cleaners, wal
> rolling and similar (which is ok)
> - there is a good number of tests, and an integration test
>
> the code seems to have still some left overs from the old implementation,
> and some stuff needs a cleanup. but I don't think this should be used as an
> argument to block the merge. I think the guys will keep working on this and
> they may also get help of others once the patch is in master.
>
> I still have my concerns about the current limitations, but these are
> things already planned for phase 3, so some of this stuff may even be in
> the final 2.0.
> but as long as we have a "current limitations" section in the user guide
> mentioning important stuff like the ones below, I'm ok with it.
>  - if you write to the table with Durability.SKIP_WALS your data will not
> be in the incremental-backup
>  - if you bulkload files that data will not be in the incremental backup
> (HBASE-14417)
>  - the incremental backup will not only contains the data of the table you
> specified but also the regions from other tables that are on the same set
> of RSs (HBASE-14141) ...maybe a note about security around this topic
>  - the incremental backup will not contains just the "latest row" between
> backup A and B, but it will also contains all the updates occurred in
> between. but the restore does not allow you to restore up to a certain
> point in time, the restore will always be up to the "latest backup point".
>  - you should limit the number of "incremental" up to N (or maybe SIZE), to
> avoid replay time becoming the bottleneck. (HBASE-14135)
>
> I'll be ok even with the above not being in the final 2.0,
> but i'd like to see as blocker for the final 2.0 (not the merge)
>  - the backup code moved in an hbase-backup module
>  - and some more work around tools, especially to try to unify and make
> simple the backup experience (simple example: in some case there is a
> backup_id argument in others a backupId argument. or things like.. restore
> is not clear if given an incremental id it will do the full restore from
> full up to that point or if i need to apply manually everything).
>
> in conclusion, I think we can open a merge vote. I'll be +1 on it, and I
> think we should try to reject -1 with just a "code cleanup" motivation,
> since there will still be work going on on the code after the merge.
>
> Matteo
>
>
> On Sun, Nov 6, 2016 at 10:54 PM, Devaraj Das  wrote:
>
> > Stack and others, anything else on the patch? Merge to master now?
> >
>


Re: [DISCUSSION] Merge Backup / Restore - Branch HBASE-7912

2016-11-18 Thread Vladimir Rodionov
Thanks, Matteo

It is a good write up.

HBASE-14417 (bulk load support) is pretty close to code complete now. There
is already work in progress on
HBASE-14141 (Filtering WAL on backup). I would say they will be complete in
2-3 weeks.

and I agree that "code clean up before merge" requests should be ignored.
This is work in progress and we are doing clean up all the time.

-Vlad

On Fri, Nov 18, 2016 at 3:53 PM, Ted Yu  wrote:

> Thanks, Matteo.
>
> bq. restore is not clear if given an incremental id it will do the full
> restore from full up to that point or if i need to apply manually
> everything
>
> The restore takes into consideration of the dependent backup(s).
> So there is no need to apply preceding backup(s) manually.
>
> On Fri, Nov 18, 2016 at 3:48 PM, Matteo Bertozzi 
> wrote:
>
> > I did one last pass to the mega patch. I don't see anything major that
> > should block the merge.
> >
> > - most of the code is isolated in the backup package
> > - all the backup code is client side
> > - there are few changes to the server side, mainly for cleaners, wal
> > rolling and similar (which is ok)
> > - there is a good number of tests, and an integration test
> >
> > the code seems to have still some left overs from the old implementation,
> > and some stuff needs a cleanup. but I don't think this should be used as
> an
> > argument to block the merge. I think the guys will keep working on this
> and
> > they may also get help of others once the patch is in master.
> >
> > I still have my concerns about the current limitations, but these are
> > things already planned for phase 3, so some of this stuff may even be in
> > the final 2.0.
> > but as long as we have a "current limitations" section in the user guide
> > mentioning important stuff like the ones below, I'm ok with it.
> >  - if you write to the table with Durability.SKIP_WALS your data will not
> > be in the incremental-backup
> >  - if you bulkload files that data will not be in the incremental backup
> > (HBASE-14417)
> >  - the incremental backup will not only contains the data of the table
> you
> > specified but also the regions from other tables that are on the same set
> > of RSs (HBASE-14141) ...maybe a note about security around this topic
> >  - the incremental backup will not contains just the "latest row" between
> > backup A and B, but it will also contains all the updates occurred in
> > between. but the restore does not allow you to restore up to a certain
> > point in time, the restore will always be up to the "latest backup
> point".
> >  - you should limit the number of "incremental" up to N (or maybe SIZE),
> to
> > avoid replay time becoming the bottleneck. (HBASE-14135)
> >
> > I'll be ok even with the above not being in the final 2.0,
> > but i'd like to see as blocker for the final 2.0 (not the merge)
> >  - the backup code moved in an hbase-backup module
> >  - and some more work around tools, especially to try to unify and make
> > simple the backup experience (simple example: in some case there is a
> > backup_id argument in others a backupId argument. or things like..
> restore
> > is not clear if given an incremental id it will do the full restore from
> > full up to that point or if i need to apply manually everything).
> >
> > in conclusion, I think we can open a merge vote. I'll be +1 on it, and I
> > think we should try to reject -1 with just a "code cleanup" motivation,
> > since there will still be work going on on the code after the merge.
> >
> > Matteo
> >
> >
> > On Sun, Nov 6, 2016 at 10:54 PM, Devaraj Das 
> wrote:
> >
> > > Stack and others, anything else on the patch? Merge to master now?
> > >
> >
>


Re: [DISCUSSION] Merge Backup / Restore - Branch HBASE-7912

2016-11-18 Thread Ted Yu
Thanks, Matteo.

bq. restore is not clear if given an incremental id it will do the full
restore from full up to that point or if i need to apply manually everything

The restore takes the dependent backup(s) into consideration.
So there is no need to apply the preceding backup(s) manually.

On Fri, Nov 18, 2016 at 3:48 PM, Matteo Bertozzi 
wrote:

> I did one last pass to the mega patch. I don't see anything major that
> should block the merge.
>
> - most of the code is isolated in the backup package
> - all the backup code is client side
> - there are few changes to the server side, mainly for cleaners, wal
> rolling and similar (which is ok)
> - there is a good number of tests, and an integration test
>
> the code seems to have still some left overs from the old implementation,
> and some stuff needs a cleanup. but I don't think this should be used as an
> argument to block the merge. I think the guys will keep working on this and
> they may also get help of others once the patch is in master.
>
> I still have my concerns about the current limitations, but these are
> things already planned for phase 3, so some of this stuff may even be in
> the final 2.0.
> but as long as we have a "current limitations" section in the user guide
> mentioning important stuff like the ones below, I'm ok with it.
>  - if you write to the table with Durability.SKIP_WALS your data will not
> be in the incremental-backup
>  - if you bulkload files that data will not be in the incremental backup
> (HBASE-14417)
>  - the incremental backup will not only contains the data of the table you
> specified but also the regions from other tables that are on the same set
> of RSs (HBASE-14141) ...maybe a note about security around this topic
>  - the incremental backup will not contains just the "latest row" between
> backup A and B, but it will also contains all the updates occurred in
> between. but the restore does not allow you to restore up to a certain
> point in time, the restore will always be up to the "latest backup point".
>  - you should limit the number of "incremental" up to N (or maybe SIZE), to
> avoid replay time becoming the bottleneck. (HBASE-14135)
>
> I'll be ok even with the above not being in the final 2.0,
> but i'd like to see as blocker for the final 2.0 (not the merge)
>  - the backup code moved in an hbase-backup module
>  - and some more work around tools, especially to try to unify and make
> simple the backup experience (simple example: in some case there is a
> backup_id argument in others a backupId argument. or things like.. restore
> is not clear if given an incremental id it will do the full restore from
> full up to that point or if i need to apply manually everything).
>
> in conclusion, I think we can open a merge vote. I'll be +1 on it, and I
> think we should try to reject -1 with just a "code cleanup" motivation,
> since there will still be work going on on the code after the merge.
>
> Matteo
>
>
> On Sun, Nov 6, 2016 at 10:54 PM, Devaraj Das  wrote:
>
> > Stack and others, anything else on the patch? Merge to master now?
> >
>


Re: [DISCUSSION] Merge Backup / Restore - Branch HBASE-7912

2016-11-18 Thread Matteo Bertozzi
I did one last pass over the mega patch. I don't see anything major that
should block the merge.

- most of the code is isolated in the backup package
- all the backup code is client side
- there are a few changes to the server side, mainly for cleaners, wal
rolling and similar (which is ok)
- there is a good number of tests, and an integration test

the code still seems to have some leftovers from the old implementation,
and some stuff needs a cleanup. but I don't think this should be used as an
argument to block the merge. I think the guys will keep working on this and
they may also get help of others once the patch is in master.

I still have my concerns about the current limitations, but these are
things already planned for phase 3, so some of this stuff may even be in
the final 2.0.
but as long as we have a "current limitations" section in the user guide
mentioning important stuff like the ones below, I'm ok with it.
 - if you write to the table with Durability.SKIP_WALS your data will not
be in the incremental-backup
 - if you bulkload files that data will not be in the incremental backup
(HBASE-14417)
 - the incremental backup will not only contain the data of the table you
specified but also the regions from other tables that are on the same set
of RSs (HBASE-14141) ...maybe a note about security around this topic
 - the incremental backup will not contain just the "latest row" between
backup A and B, but it will also contain all the updates that occurred in
between. but the restore does not allow you to restore up to a certain
point in time, the restore will always be up to the "latest backup point".
 - you should limit the number of "incrementals" to N (or maybe SIZE), to
avoid replay time becoming the bottleneck. (HBASE-14135)

I'll be ok even with the above not being in the final 2.0,
but i'd like to see as blocker for the final 2.0 (not the merge)
 - the backup code moved in an hbase-backup module
 - and some more work around tools, especially to try to unify and simplify
the backup experience (simple example: in some cases there is a backup_id
argument and in others a backupId argument. or things like.. it is not clear
whether restore, given an incremental id, will do the full restore from the
full backup up to that point or whether i need to apply everything manually).

in conclusion, I think we can open a merge vote. I'll be +1 on it, and I
think we should try to reject -1 with just a "code cleanup" motivation,
since there will still be work going on on the code after the merge.

Matteo


On Sun, Nov 6, 2016 at 10:54 PM, Devaraj Das  wrote:

> Stack and others, anything else on the patch? Merge to master now?
>


[jira] [Created] (HBASE-17130) Add support to specify an arbitrary number of reducers when writing HFiles for bulk load

2016-11-18 Thread Esteban Gutierrez (JIRA)
Esteban Gutierrez created HBASE-17130:
-

 Summary: Add support to specify an arbitrary number of reducers 
when writing HFiles for bulk load
 Key: HBASE-17130
 URL: https://issues.apache.org/jira/browse/HBASE-17130
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Reporter: Esteban Gutierrez


From the discussion in HBASE-16894 there is a set of use cases where writing
to multiple regions in a single reducer can be helpful to reduce the overhead
of MR jobs when a large number of regions exist in an HBase cluster and some
regions present a data skew, e.g. 100s or 1000s of regions with a very small
number of rows vs. regions with 10s of millions of rows as part of the same
job. And merging regions is not an option for the use case.
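
For context, a sketch (not part of this change) of how the reducer count is
tied to the region count today through HFileOutputFormat2#configureIncrementalLoad;
the table name and output path are placeholders:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadDriverSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "hfile-writer");
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("my_table"));
         RegionLocator locator = conn.getRegionLocator(table.getName())) {
      // configureIncrementalLoad() derives the TotalOrderPartitioner split
      // points from the table's region boundaries and sets the number of
      // reduce tasks to the number of regions, so 1000s of mostly-empty
      // regions mean 1000s of reducers. This issue proposes letting one
      // reducer cover several regions.
      HFileOutputFormat2.configureIncrementalLoad(job, table, locator);
    }
    FileOutputFormat.setOutputPath(job, new Path("/tmp/hfiles"));
    // mapper setup and job.waitForCompletion(true) omitted
  }
}
{code}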



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] HBase Logging Guidelines

2016-11-18 Thread Umesh Agashe
Thanks Ted! Anyone who has the link can now comment on the doc.

Umesh


On Fri, Nov 18, 2016 at 1:50 PM, Ted Yu  wrote:

> Can you give permission (to everyone) to comment on the doc ?
>
> On Fri, Nov 18, 2016 at 1:39 PM, Umesh Agashe 
> wrote:
>
> > Hello HBase Community!
> >
> > I am relatively recent addition to the HBase community. Since I started
> > working on the HBase, I had a few opportunities to debug the HBase code.
> I
> > would like to share with you the Logging Guidelines Document with the
> goal
> > to improve the readability and debuggability of HBase log files. I would
> > appreciate your reviews, suggestions, thoughts on this. Similar set of
> > guidelines helped us in the past. I hope HBase will benefit from it as
> > well.
> >
> > You can view the document here:
> > *https://docs.google.com/document/d/1a7P2MiUGpHKK0-
> > BFk3zGQ-2g5Kwvw7P-3zTtd-Vdvg8
> >  > BFk3zGQ-2g5Kwvw7P-3zTtd-Vdvg8>*
> > .
> >
> >
> > Thanks,
> > Umesh
> >
>


[jira] [Resolved] (HBASE-11402) Scanner performs redundand datanode requests

2016-11-18 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-11402.
---
Resolution: Duplicate

Resolving as duplicated by HBASE-10676

Do not take my closing of this issue as a discounting of the work done in here 
[~shmuma]; rather, your work has been carried over into the duplicate and later 
in HBASE-17072. This issue is still a problem. You identified the useless seek. 
The threadlocal in turn is super problematic.

> Scanner performs redundand datanode requests
> 
>
> Key: HBASE-11402
> URL: https://issues.apache.org/jira/browse/HBASE-11402
> Project: HBase
>  Issue Type: Bug
>  Components: HFile, Scanners
>Reporter: Max Lapan
>
> Using hbase 0.94.6 I found duplicate datanode requests of this sort:
> {noformat}
> 2014-06-09 14:12:22,039 INFO 
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: 
> /10.103.0.73:50010, dest: /10.103.0.38:57897, bytes: 1056768, op: HDFS_READ, 
> cliID: DFSClient_NONMAPREDUCE_1702752887_26, offset: 35840, srvID: 
> DS-504316153-10.103.0.73-50010-1342437562377, blockid: 
> BP-404551095-10.103.0.38-1376045452213:blk_3541255952831727320_613837, 
> duration: 109928797000
> 2014-06-09 14:12:22,080 INFO 
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: 
> /10.103.0.73:50010, dest: /10.103.0.38:57910, bytes: 1056768, op: HDFS_READ, 
> cliID: DFSClient_NONMAPREDUCE_1702752887_26, offset: 0, srvID: 
> DS-504316153-10.103.0.73-50010-1342437562377, blockid: 
> BP-404551095-10.103.0.38-1376045452213:blk_3541255952831727320_613837, 
> duration: 3825
> {noformat}
> After short investigation, I found the source of such behaviour:
> * StoreScanner in constructor calls StoreFileScanner::seek, which (after 
> several levels of calls) is calling HFileBlock::readBlockDataInternal which 
> reads block and pre-reads header of the next block.
> * This pre-readed header is stored in ThreadLocal variable 
> and stream is left in a position right behind the header of next block.
> * After constructor finished, scanner code does scanning, and, after 
> pre-readed block data finished, it calls HFileReaderV2::readNextDataBlock, 
> which again calls HFileBlock::readBlockDataInternal, but this call occured 
> from different thread and there is nothing usefull in ThreadLocal variable
> * Due to this, stream is asked to seek backwards, and this cause duplicate DN 
> request.
> As far as I understood from trunk code, the problem hasn't fixed yet.
> Log of calls with process above:
> {noformat}
> 2014-06-18 14:55:36,616 INFO 
> org.apache.hadoop.hbase.io.hfile.HFileBlockIndex: loadDataBlockWithScanInfo: 
> entered
> 2014-06-18 14:55:36,616 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: 
> seekTo: readBlock, ofs = 0, size = -1
> 2014-06-18 14:55:36,617 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: 
> Before block read: path = 
> hdfs://tsthdp1.p:9000/hbase/webpagesII/ba16051997b1272f00bed5f65094dc63/p/c866b7b0eded4b
> 2014-06-18 14:55:36,617 INFO org.apache.hadoop.hbase.io.hfile.HFile: 
> readBlockDataInternal. Ofs = 0, is.pos = 137257042, ondDiskSizeWithHeader = -1
> 2014-06-18 14:55:36,617 INFO org.apache.hadoop.hbase.io.hfile.HFile: 
> readBlockDataInternal: prefetchHeader.ofs = -1, thread = 48
> 2014-06-18 14:55:36,617 INFO org.apache.hadoop.hbase.io.hfile.HFile: 
> FSReaderV2: readAtOffset: size = 24, offset = 0, peekNext = false
> 2014-06-18 14:55:36,617 INFO org.apache.hadoop.hdfs.DFSClient: seek: 
> targetPos = 0, pos = 137257042, blockEnd = 137257229
> 2014-06-18 14:55:36,617 INFO org.apache.hadoop.hdfs.DFSClient: seek: not 
> done, blockEnd = -1
> 2014-06-18 14:55:36,617 INFO org.apache.hadoop.hdfs.DFSClient: 
> readWithStrategy: before seek, pos = 0, blockEnd = -1, currentNode = 
> 10.103.0.73:50010
> 2014-06-18 14:55:36,618 INFO org.apache.hadoop.hdfs.DFSClient: getBlockAt: 
> blockEnd updated to 137257229
> 2014-06-18 14:55:36,618 INFO org.apache.hadoop.hdfs.DFSClient: blockSeekTo: 
> loop, target = 0
> 2014-06-18 14:55:36,618 INFO org.apache.hadoop.hdfs.DFSClient: 
> getBlockReader: dn = tsthdp2.p, file = 
> /hbase/webpagesII/ba16051997b1272f00bed5f65094dc63/p/c866b7b0eded4b42bc40aa9e18ac8a4b,
>  bl
> 2014-06-18 14:55:36,627 INFO org.apache.hadoop.hdfs.DFSClient: readBuffer: 
> ofs = 0, len = 24
> 2014-06-18 14:55:36,627 INFO org.apache.hadoop.hdfs.DFSClient: readBuffer: 
> try to read
> 2014-06-18 14:55:36,641 INFO org.apache.hadoop.hdfs.DFSClient: readBuffer: 
> done, len = 24
> 2014-06-18 14:55:36,641 INFO org.apache.hadoop.hbase.io.hfile.HFile: 
> FSReaderV2: readAtOffset: size = 35899, offset = 24, peekNext = true
> 2014-06-18 14:55:36,641 INFO org.apache.hadoop.hdfs.DFSClient: seek: 
> targetPos = 24, pos = 24, blockEnd = 137257229
> 2014-06-18 14:55:36,641 INFO org.apache.hadoop.hdfs.DFSClient: seek: check 

Re: [DISCUSS] HBase Logging Guidelines

2016-11-18 Thread Ted Yu
Can you give permission (to everyone) to comment on the doc ?

On Fri, Nov 18, 2016 at 1:39 PM, Umesh Agashe  wrote:

> Hello HBase Community!
>
> I am relatively recent addition to the HBase community. Since I started
> working on the HBase, I had a few opportunities to debug the HBase code. I
> would like to share with you the Logging Guidelines Document with the goal
> to improve the readability and debuggability of HBase log files. I would
> appreciate your reviews, suggestions, thoughts on this. Similar set of
> guidelines helped us in the past. I hope HBase will benefit from it as
> well.
>
> You can view the document here:
> *https://docs.google.com/document/d/1a7P2MiUGpHKK0-
> BFk3zGQ-2g5Kwvw7P-3zTtd-Vdvg8
>  BFk3zGQ-2g5Kwvw7P-3zTtd-Vdvg8>*
> .
>
>
> Thanks,
> Umesh
>


[jira] [Resolved] (HBASE-17121) Undo the building of xref as part of site build

2016-11-18 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-17121.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

Resolving as done.

> Undo the building of xref as part of site build
> ---
>
> Key: HBASE-17121
> URL: https://issues.apache.org/jira/browse/HBASE-17121
> Project: HBase
>  Issue Type: Task
>Reporter: stack
>Assignee: stack
> Fix For: 2.0.0
>
>
> Remove xref generation as part of site build. It was useful once before 
> grepcode and easy perusal of src via git views: 
> https://git-wip-us.apache.org/repos/asf?p=hbase.git;a=summary
> Replace the xref link with a pointer to apache git.
> DISCUSS thread on dev list: http://osdir.com/ml/general/2016-11/msg22051.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[DISCUSS] HBase Logging Guidelines

2016-11-18 Thread Umesh Agashe
Hello HBase Community!

I am a relatively recent addition to the HBase community. Since I started
working on HBase, I have had a few opportunities to debug the HBase code. I
would like to share with you the Logging Guidelines document, with the goal of
improving the readability and debuggability of HBase log files. I would
appreciate your reviews, suggestions, and thoughts on this. A similar set of
guidelines helped us in the past. I hope HBase will benefit from it as well.

You can view the document here:
*https://docs.google.com/document/d/1a7P2MiUGpHKK0-BFk3zGQ-2g5Kwvw7P-3zTtd-Vdvg8
*
.


Thanks,
Umesh


Logging guidelines - Invitation to comment

2016-11-18 Thread Umesh Agashe (via Google Docs)

I've shared an item with you:

Logging guidelines
https://docs.google.com/document/d/1MNLbNbmSFXddbfqj2P0zqoHSOkf_2TZmWwzAQiLNi0o/edit?usp=sharing&invite=CJfd3Z0M&ts=582f6f49

It's not an attachment -- it's stored online. To open this item, just click  
the link above.


Hello HBase Community!

I am a relatively recent addition to the HBase community. Since I started
working on HBase, I have had a few opportunities to debug the HBase code. I
would like to share with you the Logging Guidelines document, with the goal of
improving the readability and debuggability of HBase log files. I would
appreciate your reviews, suggestions, and thoughts on this. A similar set of
guidelines helped us in the past. I hope HBase will benefit from it as well.


Thanks,
Umesh


[jira] [Created] (HBASE-17129) Remove public from methods in DataType interface

2016-11-18 Thread Jan Hentschel (JIRA)
Jan Hentschel created HBASE-17129:
-

 Summary: Remove public from methods in DataType interface
 Key: HBASE-17129
 URL: https://issues.apache.org/jira/browse/HBASE-17129
 Project: HBase
  Issue Type: Improvement
Reporter: Jan Hentschel
Assignee: Jan Hentschel
Priority: Minor


There is no need to declare the methods in the DataType interface as public. The
explicit visibility modifier on the declared methods can be removed.
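
Illustrative only, not the actual DataType source: interface members are
implicitly public, so the modifier is redundant and can simply be dropped.

{code}
public interface DataTypeExample<T> {
  // was: public int encodedLength(T val);
  int encodedLength(T val);

  // was: public byte[] encode(T val);
  byte[] encode(T val);
}
{code}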



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] No regions on Master node in 2.0

2016-11-18 Thread Francis Liu
Just some extra bits of information:

Another way to isolate user regions from meta is to create a regionserver group
(HBASE-6721) dedicated to the system tables. This is what we do at Y!. If the load
on meta gets too high (and it does), we split meta so the load gets spread across
more regionservers (HBASE-11165); this way availability for any client is not
affected. Tho agreeing with Stack that something is really broken if high-priority
rpcs cannot get through to meta.

Does "single writer to meta" refer to the zkless assignment feature? If so, hasn't
that feature been available since 0.98.6 (meta _not_ on master)? We've been running
with it on all our clusters for quite some time now (with some enhancements, i.e.
split meta etc).

Cheers,
Francis

On Wednesday, November 16, 2016 10:47 PM, Stack  wrote:
 

 On Wed, Nov 16, 2016 at 10:44 PM, Stack  wrote:

> On Wed, Nov 16, 2016 at 10:57 AM, Gary Helmling 
> wrote:
>
>>
>> Do you folks run the meta-carrying-master form G?
>
> Pardon me. I missed a paragraph. I see you folks do deploy this form.
St.Ack





> St.Ack
>
>
>
>
>
>>
>>
>> > > >
>> > > Is this just because meta had a dedicated server?
>> > >
>> > >
>> > I'm sure that having dedicated resources for meta helps.  But I don't
>> think
>> > that's sufficient.  The key is that master writes to meta are local, and
>> do
>> > not have to contend with the user requests to meta.
>> >
>> > It seems premature to be discussing dropping a working implementation
>> which
>> > eliminates painful parts of distributed consensus, until we have a
>> complete
>> > working alternative to evaluate.  Until then, why are we looking at
>> > features that are in use and work well?
>> >
>> >
>> >
>> How to move forward here? The Pv2 master is almost done. An ITBLL bakeoff
>> of new Pv2 based assign vs a Master that exclusively hosts hbase:meta?
>>
>>
>> I think that's a necessary test for proving out the new AM implementation.
>> But remember that we are comparing a feature which is actively supporting
>> production workloads with a line of active development.  I think there
>> should also be additional testing around situations of high meta load and
>> end-to-end assignment latency.
>>
>
>


   

Re: Use experience and performance data of offheap from Alibaba online cluster

2016-11-18 Thread Enis Söztutar
Thanks for sharing this. Great work.

I don't see any reason why we cannot backport to branch-1.

Enis

On Fri, Nov 18, 2016 at 9:37 AM, Andrew Purtell 
wrote:

> Yes, please, the patches will be useful to the community even if we decide
> not to backport into an official 1.x release.
>
>
> > On Nov 18, 2016, at 12:25 PM, Bryan Beaudreault <
> bbeaudrea...@hubspot.com> wrote:
> >
> > Is the backported patch available anywhere? Not seeing it on the
> referenced
> > JIRA. If it ends up not getting officially backported to branch-1 due to
> > 2.0 around the corner, some of us who build our own deploy may want to
> > integrate into our builds. Thanks! These numbers look great
> >
> >> On Fri, Nov 18, 2016 at 12:20 PM Anoop John 
> wrote:
> >>
> >> Hi Yu Li
> >>   Good to see that the off heap work help you..  The perf
> >> numbers looks great.  So this is a compare of on heap L1 cache vs off
> heap
> >> L2 cache(HBASE-11425 enabled).   So for 2.0 we should make L2 off heap
> >> cache ON by default I believe.  Will raise a jira for that we can
> discuss
> >> under that.   Seems like L2 off heap cache for data blocks and L1 cache
> for
> >> index blocks seems a right choice.
> >>
> >> Thanks for the backport and the help in testing the feature..  You were
> >> able to find some corner case bugs and helped community to fix them..
> >> Thanks goes to ur whole team.
> >>
> >> -Anoop-
> >>
> >>
> >>> On Fri, Nov 18, 2016 at 10:14 PM, Yu Li  wrote:
> >>>
> >>> Sorry guys, let me retry the inline images:
> >>>
> >>> Performance w/o offheap:
> >>>
> >>> ​
> >>> Performance w/ offheap:
> >>>
> >>> ​
> >>> Peak Get QPS of one single RS during Singles' Day (11/11):
> >>>
> >>> ​
> >>>
> >>> And attach the files in case inline still not working:
> >>> ​​​
> >>> Performance_without_offheap.png
> >>> <
> >> https://drive.google.com/file/d/0B017Q40_F5uwbWEzUGktYVIya3JkcXVjRkFvVG
> NtM0VxWC1n/view?usp=drive_web
> >>>
> >>> ​​
> >>> Performance_with_offheap.png
> >>> <
> >> https://drive.google.com/file/d/0B017Q40_F5uweGR2cnJEU0M1MWwtRFJ5YkxUeF
> VrcUdPc2ww/view?usp=drive_web
> >>>
> >>> ​​
> >>> Peak_Get_QPS_of_Single_RS.png
> >>> <
> >> https://drive.google.com/file/d/0B017Q40_F5uwQ2FkR2k0ZmEtRVNGSFp5RUxHM3
> F6bHpNYnJz/view?usp=drive_web
> >>>
> >>> ​
> >>>
> >>>
> >>> Best Regards,
> >>> Yu
> >>>
>  On 18 November 2016 at 19:29, Ted Yu  wrote:
> 
>  Yu:
>  With positive results, more hbase users would be asking for the
> backport
>  of offheap read path patches.
> 
>  Do you think you or your coworker has the bandwidth to publish
> backport
>  for branch-1 ?
> 
>  Thanks
> 
> > On Nov 18, 2016, at 12:11 AM, Yu Li  wrote:
> >
> > Dear all,
> >
> > We have backported read path offheap (HBASE-11425) to our customized
>  hbase-1.1.2 (thanks @Anoop for the help/support) and run it online for
> >> more
>  than a month, and would like to share our experience, for what it's
> >> worth
>  (smile).
> >
> > Generally speaking, we gained a better and more stable
>  throughput/performance with offheap, and below are some details:
> > 1. QPS become more stable with offheap
> >
> > Performance w/o offheap:
> >
> >
> >
> > Performance w/ offheap:
> >
> >
> >
> > These data come from our online A/B test cluster (with 450 physical
>  machines, and each with 256G memory + 64 core) with real world
> >> workloads,
>  it shows using offheap we could gain a more stable throughput as well
> as
>  better performance
> >
> > Not showing fully online data here because for online we published
> the
>  version with both offheap and NettyRpcServer together, so no
> standalone
>  comparison data for offheap
> >
> > 2. Full GC frequency and cost
> >
> > Average Full GC STW time reduce from 11s to 7s with offheap.
> >
> > 3. Young GC frequency and cost
> >
> > No performance degradation observed with offheap.
> >
> > 4. Peak throughput of one single RS
> >
> > On Singles Day (11/11), peak throughput of one single RS reached
> 100K,
>  among which 90K from Get. Plus internet in/out data we could know the
>  average result size of get request is ~1KB
> >
> >
> >
> > Offheap are used on all online machines (more than 1600 nodes)
> instead
>  of LruCache, so the above QPS is gained from offheap bucketcache,
> along
>  with NettyRpcServer(HBASE-15756).
> >
> > Just let us know if any comments. Thanks.
> >
> > Best Regards,
> > Yu
> >
> >
> >
> >
> >
> >
> >
> 
> >>>
> >>>
> >>
>


Re: Use experience and performance data of offheap from Alibaba online cluster

2016-11-18 Thread Loïc Chanel
Nope, same here !

Loïc CHANEL
System Big Data engineer
MS&T - WASABI - Worldline (Villeurbanne, France)

2016-11-18 9:54 GMT+01:00 Du, Jingcheng :

> Thanks Yu for the sharing, great achievements.
> It seems the images cannot be displayed? Maybe just me?
>
> Regards,
> Jingcheng
>
> From: Yu Li [mailto:car...@gmail.com]
> Sent: Friday, November 18, 2016 4:11 PM
> To: u...@hbase.apache.org; dev@hbase.apache.org
> Subject: Use experience and performance data of offheap from Alibaba
> online cluster
>
> Dear all,
>
> We have backported read path offheap (HBASE-11425) to our customized
> hbase-1.1.2 (thanks @Anoop for the help/support) and run it online for more
> than a month, and would like to share our experience, for what it's worth
> (smile).
>
> Generally speaking, we gained a better and more stable
> throughput/performance with offheap, and below are some details:
>
> 1. QPS become more stable with offheap
>
> Performance w/o offheap:
>
>
> Performance w/ offheap:
>
>
> These data come from our online A/B test cluster (with 450 physical
> machines, and each with 256G memory + 64 core) with real world workloads,
> it shows using offheap we could gain a more stable throughput as well as
> better performance
>
> Not showing fully online data here because for online we published the
> version with both offheap and NettyRpcServer together, so no standalone
> comparison data for offheap
>
> 2. Full GC frequency and cost
>
> Average Full GC STW time reduce from 11s to 7s with offheap.
>
> 3. Young GC frequency and cost
>
> No performance degradation observed with offheap.
>
> 4. Peak throughput of one single RS
>
> On Singles Day (11/11), peak throughput of one single RS reached 100K,
> among which 90K from Get. Plus internet in/out data we could know the
> average result size of get request is ~1KB
>
>
> Offheap are used on all online machines (more than 1600 nodes) instead of
> LruCache, so the above QPS is gained from offheap bucketcache, along with
> NettyRpcServer(HBASE-15756).
> Just let us know if any comments. Thanks.
>
> Best Regards,
> Yu
>
>
>
>
>
>
>
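
For anyone wanting to try a similar setup, a minimal sketch of the knobs
involved for the off-heap L2 bucket cache; the values below are placeholders
rather than Alibaba's actual settings, and they are normally set in
hbase-site.xml / hbase-env.sh rather than in code:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class OffheapCacheConfigSketch {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // L2 block cache backed by direct (off-heap) memory.
    conf.set("hbase.bucketcache.ioengine", "offheap");
    // Bucket cache capacity in MB (placeholder value). The RS also needs a
    // matching direct-memory budget, e.g. HBASE_OFFHEAPSIZE in hbase-env.sh
    // (or -XX:MaxDirectMemorySize).
    conf.setInt("hbase.bucketcache.size", 4096);
    // With this setup data blocks land in the off-heap L2, while index/bloom
    // blocks stay in the on-heap L1 LRU cache, which is the split Anoop
    // describes above.
    System.out.println("bucketcache ioengine = " + conf.get("hbase.bucketcache.ioengine"));
  }
}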


[jira] [Created] (HBASE-17128) Find Cause of a Write Perf Regression in branch-1.2

2016-11-18 Thread stack (JIRA)
stack created HBASE-17128:
-

 Summary: Find Cause of a Write Perf Regression in branch-1.2
 Key: HBASE-17128
 URL: https://issues.apache.org/jira/browse/HBASE-17128
 Project: HBase
  Issue Type: Task
Reporter: stack


As reported by [~gbaecher] up on the mailing list, there is a regression in 
1.2. The regression is in a CDH version of 1.2 actually but the CDH hbase is a 
near pure 1.2. This is a working issue to figure out which of the below changes
brought on the slower writes (the list comes from doing the following... git log
--oneline  
remotes/origin/cdh5-1.2.0_5.8.0_dev..remotes/origin/cdh5-1.2.0_5.9.0_dev ... I 
stripped the few CDH specific changes, packaging and tagging only, and then 
made two groupings; candidates and the unlikelies):

{code}
  1 bbc6762 HBASE-16023 Fastpath for the FIFO rpcscheduler Adds an executor 
that does balanced queue and fast path handing off requests directly to waiting 
handlers if any present. Idea taken from Apace Kudu (incubating). See 
https://gerr#
  2 a260917 HBASE-16288 HFile intermediate block level indexes might recurse 
forever creating multi TB files
  3 5633281 HBASE-15811 Batch Get after batch Put does not fetch all Cells We 
were not waiting on all executors in a batch to complete. The test for 
no-more-executors was damaged by the 0.99/0.98.4 fix "HBASE-11403 Fix race 
conditions aro#
  4 780f720 HBASE-11625 - Verifies data before building HFileBlock. - Adds 
HFileBlock.Header class which contains information about location of fields. 
Testing: Adds CorruptedFSReaderImpl to TestChecksum. (Apekshit)
  5 d735680 HBASE-12133 Add FastLongHistogram for metric computation (Yi Deng)
  6 c4ee832 HBASE-15222 Use less contended classes for metrics
  7
  8 17320a4 HBASE-15683 Min latency in latency histograms are emitted as 
Long.MAX_VALUE
  9 283b39f HBASE-15396 Enhance mapreduce.TableSplit to add encoded region name
 10 39db592 HBASE-16195 Should not add chunk into chunkQueue if not using chunk 
pool in HeapMemStoreLAB
 11 5ff28b7 HBASE-16194 Should count in MSLAB chunk allocation into heap size 
change when adding duplicate cells
 12 5e3e0d2 HBASE-16318 fail build while rendering velocity template if 
dependency license isn't in whitelist.
 13 3ed66e3 HBASE-16318 consistently use the correct name for 'Apache License, 
Version 2.0'
 14 351832d HBASE-16340 exclude Xerces iplementation jars from coming in 
transitively.
 15 b6aa4be HBASE-16321 ensure no findbugs-jsr305
 16 4f9dde7 HBASE-16317 revert all ESAPI changes
 17 71b6a8a HBASE-16284 Unauthorized client can shutdown the cluster (Deokwoo 
Han)
 18 523753f HBASE-16450 Shell tool to dump replication queues
 19 ca5f2ee HBASE-16379 [replication] Minor improvement to 
replication/copy_tables_desc.rb
 20 effd105 HBASE-16135 PeerClusterZnode under rs of removed peer may never be 
deleted
 21 a5c6610 HBASE-16319 Fix TestCacheOnWrite after HBASE-16288
 22 1956bb0 HBASE-15808 Reduce potential bulk load intermediate space usage and 
waste
 23 031c54e HBASE-16096 Backport. Cleanly remove replication peers from 
ZooKeeper.
 24 60a3b12 HBASE-14963 Remove use of Guava Stopwatch from HBase client code 
(Devaraj Das)
 25 c7724fc HBASE-16207 can't restore snapshot without "Admin" permission
 26 8322a0b HBASE-16227 [Shell] Column value formatter not working in scans. 
Tested : manually using shell.
 27 8f86658 HBASE-14818 user_permission does not list namespace permissions (li 
xiang)
 28 775cd21 HBASE-15465 userPermission returned by getUserPermission() for the 
selected namespace does not have namespace set (li xiang)
 29 8d85aff HBASE-16093 Fix splits failed before creating daughter regions 
leave meta inconsistent
 30 bc41317 HBASE-16140 bump owasp.esapi from 2.1.0 to 2.1.0.1
 31 6fc70cd HBASE-16035 Nested AutoCloseables might not all get closed (Sean 
Mackrory)
 32 fe28fe84 HBASE-15891. Closeable resources potentially not getting closed if 
exception is thrown.
 33 1d2bf3c HBASE-14644 Region in transition metric is broken -- addendum 
(Huaxiang Sun)
 34 fd5f56c HBASE-16056 Procedure v2 - fix master crash for FileNotFound
 35 10cd038 HBASE-16034 Fix ProcedureTestingUtility#LoadCounter.setMaxProcId()
 36 dae4db4 HBASE-15872 Split TestWALProcedureStore
 37 e638d86 HBASE-14644 Region in transition metric is broken (Huaxiang Sun)
 38 f01b01d HBASE-15496 Throw RowTooBigException only for user scan/get 
(Guanghao Zhang)
 39 cc0ce66 HBASE-15746 Remove extra RegionCoprocessor preClose() in 
RSRpcServices#closeRegion (Stephen Yuan Jiang)
 40 923f6d7 HBASE-15873 ACL for snapshot restore / clone is not enforced
 41 62df392 HBASE-15946. Eliminate possible security concerns in Store File 
metrics.
 42 293db90 HBASE-15925 provide default values for hadoop compat module related 
properties that match default hadoop profile.
 43 b1b5b66 HBASE-15889. String case conversions are locale-sensitive, used 
without locale
 44 4a8c4e7 HBASE-15698 In

[jira] [Created] (HBASE-17127) Locate region should fail fast if underlying Connection already closed

2016-11-18 Thread Yu Li (JIRA)
Yu Li created HBASE-17127:
-

 Summary: Locate region should fail fast if underlying Connection 
already closed
 Key: HBASE-17127
 URL: https://issues.apache.org/jira/browse/HBASE-17127
 Project: HBase
  Issue Type: Bug
Reporter: Yu Li
Assignee: Yu Li


Currently, if we try to locate a region when the underlying connection is already
closed, we will retry and wait at least 10s per round until retries are exhausted
(refer to the catch clause of {{RpcRetryingCallerImpl#callWithRetries}} and
{{RegionServerCallable#sleep}} for more details), which is unnecessary and
time-consuming.

The issue is caused by incorrectly manipulating the connection, which shows the
disadvantage of forcing users to handle the connection life cycle and proves the
necessity of supporting auto-managed connections as we did before, as indicated in
HBASE-17009.

In this JIRA we will make it fail fast in the above case.
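
A rough sketch of the intended fail-fast behaviour, expressed against the public 
client API (the actual change belongs in the internal locate/retry path, so the 
helper below and its names are only illustrative): check whether the connection has 
already been closed before attempting the lookup, and throw a DoNotRetryIOException 
so the caller skips the retry/backoff rounds entirely.

{code}
import java.io.IOException;
import org.apache.hadoop.hbase.DoNotRetryIOException;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.util.Bytes;

public class FailFastLocate {
  static HRegionLocation locateOrFail(Connection conn, TableName table, byte[] row)
      throws IOException {
    if (conn == null || conn.isClosed()) {
      // DoNotRetryIOException short-circuits the client retry/backoff logic, so the
      // caller fails immediately instead of sleeping through every retry round.
      throw new DoNotRetryIOException("Connection already closed, cannot locate region for "
          + table + ", row=" + Bytes.toStringBinary(row));
    }
    // Connection is still usable, so do the normal region lookup.
    try (RegionLocator locator = conn.getRegionLocator(table)) {
      return locator.getRegionLocation(row);
    }
  }
}
{code}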



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Use experience and performance data of offheap from Alibaba online cluster

2016-11-18 Thread Andrew Purtell
Yes, please, the patches will be useful to the community even if we decide not 
to backport into an official 1.x release.


> On Nov 18, 2016, at 12:25 PM, Bryan Beaudreault  
> wrote:
> 
> Is the backported patch available anywhere? Not seeing it on the referenced
> JIRA. If it ends up not getting officially backported to branch-1 due to
> 2.0 around the corner, some of us who build our own deploy may want to
> integrate into our builds. Thanks! These numbers look great
> 
>> On Fri, Nov 18, 2016 at 12:20 PM Anoop John  wrote:
>> 
>> Hi Yu Li
>>   Good to see that the off heap work help you..  The perf
>> numbers looks great.  So this is a compare of on heap L1 cache vs off heap
>> L2 cache(HBASE-11425 enabled).   So for 2.0 we should make L2 off heap
>> cache ON by default I believe.  Will raise a jira for that we can discuss
>> under that.   Seems like L2 off heap cache for data blocks and L1 cache for
>> index blocks seems a right choice.
>> 
>> Thanks for the backport and the help in testing the feature..  You were
>> able to find some corner case bugs and helped community to fix them..
>> Thanks goes to ur whole team.
>> 
>> -Anoop-
>> 
>> 
>>> On Fri, Nov 18, 2016 at 10:14 PM, Yu Li  wrote:
>>> 
>>> Sorry guys, let me retry the inline images:
>>> 
>>> Performance w/o offheap:
>>> 
>>> Performance w/ offheap:
>>> 
>>> Peak Get QPS of one single RS during Singles' Day (11/11):
>>> 
>>> 
>>> And attach the files in case inline still not working:
>>> Performance_without_offheap.png
>>> <
>> https://drive.google.com/file/d/0B017Q40_F5uwbWEzUGktYVIya3JkcXVjRkFvVGNtM0VxWC1n/view?usp=drive_web
>>> 
>>> Performance_with_offheap.png
>>> <
>> https://drive.google.com/file/d/0B017Q40_F5uweGR2cnJEU0M1MWwtRFJ5YkxUeFVrcUdPc2ww/view?usp=drive_web
>>> 
>>> Peak_Get_QPS_of_Single_RS.png
>>> <
>> https://drive.google.com/file/d/0B017Q40_F5uwQ2FkR2k0ZmEtRVNGSFp5RUxHM3F6bHpNYnJz/view?usp=drive_web
>>> 
>>> 
>>> 
>>> Best Regards,
>>> Yu
>>> 
 On 18 November 2016 at 19:29, Ted Yu  wrote:
 
 Yu:
 With positive results, more hbase users would be asking for the backport
 of offheap read path patches.
 
 Do you think you or your coworker has the bandwidth to publish backport
 for branch-1 ?
 
 Thanks
 
> On Nov 18, 2016, at 12:11 AM, Yu Li  wrote:
> 
> Dear all,
> 
> We have backported read path offheap (HBASE-11425) to our customized
 hbase-1.1.2 (thanks @Anoop for the help/support) and run it online for
>> more
 than a month, and would like to share our experience, for what it's
>> worth
 (smile).
> 
> Generally speaking, we gained a better and more stable
 throughput/performance with offheap, and below are some details:
> 1. QPS become more stable with offheap
> 
> Performance w/o offheap:
> 
> 
> 
> Performance w/ offheap:
> 
> 
> 
> These data come from our online A/B test cluster (with 450 physical
 machines, and each with 256G memory + 64 core) with real world
>> workloads,
 it shows using offheap we could gain a more stable throughput as well as
 better performance
> 
> Not showing fully online data here because for online we published the
 version with both offheap and NettyRpcServer together, so no standalone
 comparison data for offheap
> 
> 2. Full GC frequency and cost
> 
> Average Full GC STW time reduce from 11s to 7s with offheap.
> 
> 3. Young GC frequency and cost
> 
> No performance degradation observed with offheap.
> 
> 4. Peak throughput of one single RS
> 
> On Singles Day (11/11), peak throughput of one single RS reached 100K,
 among which 90K from Get. Plus internet in/out data we could know the
 average result size of get request is ~1KB
> 
> 
> 
> Offheap are used on all online machines (more than 1600 nodes) instead
 of LruCache, so the above QPS is gained from offheap bucketcache, along
 with NettyRpcServer(HBASE-15756).
> 
> Just let us know if any comments. Thanks.
> 
> Best Regards,
> Yu
> 
> 
> 
> 
> 
> 
> 
 
>>> 
>>> 
>> 


Re: Use experience and performance data of offheap from Alibaba online cluster

2016-11-18 Thread Bryan Beaudreault
Is the backported patch available anywhere? Not seeing it on the referenced
JIRA. If it ends up not getting officially backported to branch-1 due to
2.0 around the corner, some of us who build our own deploy may want to
integrate into our builds. Thanks! These numbers look great

On Fri, Nov 18, 2016 at 12:20 PM Anoop John  wrote:

> Hi Yu Li
>Good to see that the off heap work help you..  The perf
> numbers looks great.  So this is a compare of on heap L1 cache vs off heap
> L2 cache(HBASE-11425 enabled).   So for 2.0 we should make L2 off heap
> cache ON by default I believe.  Will raise a jira for that we can discuss
> under that.   Seems like L2 off heap cache for data blocks and L1 cache for
> index blocks seems a right choice.
>
> Thanks for the backport and the help in testing the feature..  You were
> able to find some corner case bugs and helped community to fix them..
> Thanks goes to ur whole team.
>
> -Anoop-
>
>
> On Fri, Nov 18, 2016 at 10:14 PM, Yu Li  wrote:
>
> > Sorry guys, let me retry the inline images:
> >
> > Performance w/o offheap:
> >
> > Performance w/ offheap:
> >
> > Peak Get QPS of one single RS during Singles' Day (11/11):
> >
> >
> > And attach the files in case inline still not working:
> >  Performance_without_offheap.png
> > <
> https://drive.google.com/file/d/0B017Q40_F5uwbWEzUGktYVIya3JkcXVjRkFvVGNtM0VxWC1n/view?usp=drive_web
> >
> >  Performance_with_offheap.png
> > <
> https://drive.google.com/file/d/0B017Q40_F5uweGR2cnJEU0M1MWwtRFJ5YkxUeFVrcUdPc2ww/view?usp=drive_web
> >
> >  Peak_Get_QPS_of_Single_RS.png
> > <
> https://drive.google.com/file/d/0B017Q40_F5uwQ2FkR2k0ZmEtRVNGSFp5RUxHM3F6bHpNYnJz/view?usp=drive_web
> >
> >
> >
> > Best Regards,
> > Yu
> >
> > On 18 November 2016 at 19:29, Ted Yu  wrote:
> >
> >> Yu:
> >> With positive results, more hbase users would be asking for the backport
> >> of offheap read path patches.
> >>
> >> Do you think you or your coworker has the bandwidth to publish backport
> >> for branch-1 ?
> >>
> >> Thanks
> >>
> >> > On Nov 18, 2016, at 12:11 AM, Yu Li  wrote:
> >> >
> >> > Dear all,
> >> >
> >> > We have backported read path offheap (HBASE-11425) to our customized
> >> hbase-1.1.2 (thanks @Anoop for the help/support) and run it online for
> more
> >> than a month, and would like to share our experience, for what it's
> worth
> >> (smile).
> >> >
> >> > Generally speaking, we gained a better and more stable
> >> throughput/performance with offheap, and below are some details:
> >> > 1. QPS become more stable with offheap
> >> >
> >> > Performance w/o offheap:
> >> >
> >> >
> >> >
> >> > Performance w/ offheap:
> >> >
> >> >
> >> >
> >> > These data come from our online A/B test cluster (with 450 physical
> >> machines, and each with 256G memory + 64 core) with real world
> workloads,
> >> it shows using offheap we could gain a more stable throughput as well as
> >> better performance
> >> >
> >> > Not showing fully online data here because for online we published the
> >> version with both offheap and NettyRpcServer together, so no standalone
> >> comparison data for offheap
> >> >
> >> > 2. Full GC frequency and cost
> >> >
> >> > Average Full GC STW time reduce from 11s to 7s with offheap.
> >> >
> >> > 3. Young GC frequency and cost
> >> >
> >> > No performance degradation observed with offheap.
> >> >
> >> > 4. Peak throughput of one single RS
> >> >
> >> > On Singles Day (11/11), peak throughput of one single RS reached 100K,
> >> among which 90K from Get. Plus internet in/out data we could know the
> >> average result size of get request is ~1KB
> >> >
> >> >
> >> >
> >> > Offheap are used on all online machines (more than 1600 nodes) instead
> >> of LruCache, so the above QPS is gained from offheap bucketcache, along
> >> with NettyRpcServer(HBASE-15756).
> >> >
> >> > Just let us know if any comments. Thanks.
> >> >
> >> > Best Regards,
> >> > Yu
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >>
> >
> >
>


Re: Use experience and performance data of offheap from Alibaba online cluster

2016-11-18 Thread Anoop John
Hi Yu Li
   Good to see that the off heap work help you..  The perf
numbers looks great.  So this is a compare of on heap L1 cache vs off heap
L2 cache(HBASE-11425 enabled).   So for 2.0 we should make L2 off heap
cache ON by default I believe.  Will raise a jira for that we can discuss
under that.   Seems like L2 off heap cache for data blocks and L1 cache for
index blocks seems a right choice.

Thanks for the backport and the help in testing the feature..  You were
able to find some corner case bugs and helped community to fix them..
Thanks goes to ur whole team.

-Anoop-


On Fri, Nov 18, 2016 at 10:14 PM, Yu Li  wrote:

> Sorry guys, let me retry the inline images:
>
> Performance w/o offheap:
>
> Performance w/ offheap:
>
> Peak Get QPS of one single RS during Singles' Day (11/11):
>
>
> And attach the files in case inline still not working:
>  Performance_without_offheap.png
> 
>  Performance_with_offheap.png
> 
>  Peak_Get_QPS_of_Single_RS.png
> 
>
>
> Best Regards,
> Yu
>
> On 18 November 2016 at 19:29, Ted Yu  wrote:
>
>> Yu:
>> With positive results, more hbase users would be asking for the backport
>> of offheap read path patches.
>>
>> Do you think you or your coworker has the bandwidth to publish backport
>> for branch-1 ?
>>
>> Thanks
>>
>> > On Nov 18, 2016, at 12:11 AM, Yu Li  wrote:
>> >
>> > Dear all,
>> >
>> > We have backported read path offheap (HBASE-11425) to our customized
>> hbase-1.1.2 (thanks @Anoop for the help/support) and run it online for more
>> than a month, and would like to share our experience, for what it's worth
>> (smile).
>> >
>> > Generally speaking, we gained a better and more stable
>> throughput/performance with offheap, and below are some details:
>> > 1. QPS become more stable with offheap
>> >
>> > Performance w/o offheap:
>> >
>> >
>> >
>> > Performance w/ offheap:
>> >
>> >
>> >
>> > These data come from our online A/B test cluster (with 450 physical
>> machines, and each with 256G memory + 64 core) with real world workloads,
>> it shows using offheap we could gain a more stable throughput as well as
>> better performance
>> >
>> > Not showing fully online data here because for online we published the
>> version with both offheap and NettyRpcServer together, so no standalone
>> comparison data for offheap
>> >
>> > 2. Full GC frequency and cost
>> >
>> > Average Full GC STW time reduce from 11s to 7s with offheap.
>> >
>> > 3. Young GC frequency and cost
>> >
>> > No performance degradation observed with offheap.
>> >
>> > 4. Peak throughput of one single RS
>> >
>> > On Singles Day (11/11), peak throughput of one single RS reached 100K,
>> among which 90K from Get. Plus internet in/out data we could know the
>> average result size of get request is ~1KB
>> >
>> >
>> >
>> > Offheap are used on all online machines (more than 1600 nodes) instead
>> of LruCache, so the above QPS is gained from offheap bucketcache, along
>> with NettyRpcServer(HBASE-15756).
>> >
>> > Just let us know if any comments. Thanks.
>> >
>> > Best Regards,
>> > Yu
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>>
>
>


Re: Use experience and performance data of offheap from Alibaba online cluster

2016-11-18 Thread Yu Li
@Ted:
bq. Do you think you or your coworker has the bandwidth to publish backport
for branch-1 ?
Yes, we'd like to try, but does this follow the tradition? AFAIK 2.0 is
about to come out (IIRC the plan is to release an alpha by the end of the
year?) and this should be a core feature for advertising 2.0? Please forgive
me if I'm being over-conservative (smile).

Removing user@ mailing list to keep the discussion among developers until
we reach a conclusion.

Best Regards,
Yu

On 19 November 2016 at 00:44, Yu Li  wrote:

> Sorry guys, let me retry the inline images:
>
> Performance w/o offheap:
>
> Performance w/ offheap:
>
> Peak Get QPS of one single RS during Singles' Day (11/11):
>
>
> And attach the files in case inline still not working:
>  Performance_without_offheap.png
> 
>  Performance_with_offheap.png
> 
>  Peak_Get_QPS_of_Single_RS.png
> 
>
>
> Best Regards,
> Yu
>
> On 18 November 2016 at 19:29, Ted Yu  wrote:
>
>> Yu:
>> With positive results, more hbase users would be asking for the backport
>> of offheap read path patches.
>>
>> Do you think you or your coworker has the bandwidth to publish backport
>> for branch-1 ?
>>
>> Thanks
>>
>> > On Nov 18, 2016, at 12:11 AM, Yu Li  wrote:
>> >
>> > Dear all,
>> >
>> > We have backported read path offheap (HBASE-11425) to our customized
>> hbase-1.1.2 (thanks @Anoop for the help/support) and run it online for more
>> than a month, and would like to share our experience, for what it's worth
>> (smile).
>> >
>> > Generally speaking, we gained a better and more stable
>> throughput/performance with offheap, and below are some details:
>> > 1. QPS become more stable with offheap
>> >
>> > Performance w/o offheap:
>> >
>> >
>> >
>> > Performance w/ offheap:
>> >
>> >
>> >
>> > These data come from our online A/B test cluster (with 450 physical
>> machines, and each with 256G memory + 64 core) with real world workloads,
>> it shows using offheap we could gain a more stable throughput as well as
>> better performance
>> >
>> > Not showing fully online data here because for online we published the
>> version with both offheap and NettyRpcServer together, so no standalone
>> comparison data for offheap
>> >
>> > 2. Full GC frequency and cost
>> >
>> > Average Full GC STW time reduce from 11s to 7s with offheap.
>> >
>> > 3. Young GC frequency and cost
>> >
>> > No performance degradation observed with offheap.
>> >
>> > 4. Peak throughput of one single RS
>> >
>> > On Singles Day (11/11), peak throughput of one single RS reached 100K,
>> among which 90K from Get. Plus internet in/out data we could know the
>> average result size of get request is ~1KB
>> >
>> >
>> >
>> > Offheap are used on all online machines (more than 1600 nodes) instead
>> of LruCache, so the above QPS is gained from offheap bucketcache, along
>> with NettyRpcServer(HBASE-15756).
>> >
>> > Just let us know if any comments. Thanks.
>> >
>> > Best Regards,
>> > Yu
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>>
>
>


[jira] [Reopened] (HBASE-16935) Java API method Admin.deleteColumn(table, columnFamily) doesn't delete family's StoreFile from file system.

2016-11-18 Thread Mikhail Zvagelsky (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Zvagelsky reopened HBASE-16935:
---

It would be handy if we could completely delete a column family and its store 
files from a table without disabling the table, for the case where we can 
guarantee there will be no read/write operations against this column family 
and we no longer need it.

> Java API method Admin.deleteColumn(table, columnFamily) doesn't delete 
> family's StoreFile from file system.
> ---
>
> Key: HBASE-16935
> URL: https://issues.apache.org/jira/browse/HBASE-16935
> Project: HBase
>  Issue Type: New Feature
>  Components: Admin
>Affects Versions: 1.2.3
>Reporter: Mikhail Zvagelsky
> Attachments: Selection_008.png
>
>
> The method deleteColumn(TableName tableName, byte[] columnName) of the class 
> org.apache.hadoop.hbase.client.Admin should delete the specified column family 
> from the specified table. (Despite its name, the method removes the family, not 
> a single column; see [HBASE-1989|https://issues.apache.org/jira/browse/HBASE-1989].)
> This method changes the table's schema, but it doesn't delete the column 
> family's store files from the file system. To be precise, I ran this code:
> {code:|borderStyle=solid}
> import java.io.IOException;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.HColumnDescriptor;
> import org.apache.hadoop.hbase.HTableDescriptor;
> import org.apache.hadoop.hbase.TableName;
> import org.apache.hadoop.hbase.client.*;
> import org.apache.hadoop.hbase.util.Bytes;
>
> public class ToHBaseIssueTracker {
>     public static void main(String[] args) throws IOException {
>         TableName tableName = TableName.valueOf("test_table");
>         HTableDescriptor desc = new HTableDescriptor(tableName);
>         desc.addFamily(new HColumnDescriptor("cf1"));
>         desc.addFamily(new HColumnDescriptor("cf2"));
>         Configuration conf = HBaseConfiguration.create();
>         Connection connection = ConnectionFactory.createConnection(conf);
>         Admin admin = connection.getAdmin();
>         admin.createTable(desc);
>         Table table = connection.getTable(tableName);
>         for (int i = 0; i < 4; i++) {
>             Put put = new Put(Bytes.toBytes(i)); // Use i as row key.
>             put.addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("a"), Bytes.toBytes("value"));
>             put.addColumn(Bytes.toBytes("cf2"), Bytes.toBytes("a"), Bytes.toBytes("value"));
>             table.put(put);
>         }
>         admin.deleteColumn(tableName, Bytes.toBytes("cf2")); // removes "cf2" from the schema
>         admin.majorCompact(tableName);
>         // The store files of "cf2" are still present in the file system afterwards.
>         table.close();
>         admin.close();
>         connection.close();
>     }
> }
> {code}
> Then I see that the store file for the "cf2" family persists in the file system.
> I observe this effect both in a standalone hbase installation and in 
> pseudo-distributed mode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Use experience and performance data of offheap from Alibaba online cluster

2016-11-18 Thread Yu Li
Sorry guys, let me retry the inline images:

Performance w/o offheap:

Performance w/ offheap:

Peak Get QPS of one single RS during Singles' Day (11/11):

And attach the files in case inline still not working:

 Performance_without_offheap.png

 Performance_with_offheap.png

 Peak_Get_QPS_of_Single_RS.png


Best Regards,
Yu

On 18 November 2016 at 19:29, Ted Yu  wrote:

> Yu:
> With positive results, more hbase users would be asking for the backport
> of offheap read path patches.
>
> Do you think you or your coworker has the bandwidth to publish backport
> for branch-1 ?
>
> Thanks
>
> > On Nov 18, 2016, at 12:11 AM, Yu Li  wrote:
> >
> > Dear all,
> >
> > We have backported read path offheap (HBASE-11425) to our customized
> hbase-1.1.2 (thanks @Anoop for the help/support) and run it online for more
> than a month, and would like to share our experience, for what it's worth
> (smile).
> >
> > Generally speaking, we gained a better and more stable
> throughput/performance with offheap, and below are some details:
> > 1. QPS become more stable with offheap
> >
> > Performance w/o offheap:
> >
> >
> >
> > Performance w/ offheap:
> >
> >
> >
> > These data come from our online A/B test cluster (with 450 physical
> machines, and each with 256G memory + 64 core) with real world workloads,
> it shows using offheap we could gain a more stable throughput as well as
> better performance
> >
> > Not showing fully online data here because for online we published the
> version with both offheap and NettyRpcServer together, so no standalone
> comparison data for offheap
> >
> > 2. Full GC frequency and cost
> >
> > Average Full GC STW time reduce from 11s to 7s with offheap.
> >
> > 3. Young GC frequency and cost
> >
> > No performance degradation observed with offheap.
> >
> > 4. Peak throughput of one single RS
> >
> > On Singles Day (11/11), peak throughput of one single RS reached 100K,
> among which 90K from Get. Plus internet in/out data we could know the
> average result size of get request is ~1KB
> >
> >
> >
> > Offheap are used on all online machines (more than 1600 nodes) instead
> of LruCache, so the above QPS is gained from offheap bucketcache, along
> with NettyRpcServer(HBASE-15756).
> >
> > Just let us know if any comments. Thanks.
> >
> > Best Regards,
> > Yu
> >
> >
> >
> >
> >
> >
> >
>


Successful: HBase Generate Website

2016-11-18 Thread Apache Jenkins Server
Build status: Successful

If successful, the website and docs have been generated. To update the live 
site, follow the instructions below. If failed, skip to the bottom of this 
email.

Use the following commands to download the patch and apply it to a clean branch 
based on origin/asf-site. If you prefer to keep the hbase-site repo around 
permanently, you can skip the clone step.

  git clone https://git-wip-us.apache.org/repos/asf/hbase-site.git

  cd hbase-site
  wget -O- 
https://builds.apache.org/job/hbase_generate_website/409/artifact/website.patch.zip
 | funzip > 046d1a56b242d8586d8bcd81d26c11a859651972.patch
  git fetch
  git checkout -b asf-site-046d1a56b242d8586d8bcd81d26c11a859651972 
origin/asf-site
  git am --whitespace=fix 046d1a56b242d8586d8bcd81d26c11a859651972.patch

At this point, you can preview the changes by opening index.html or any of the 
other HTML pages in your local 
asf-site-046d1a56b242d8586d8bcd81d26c11a859651972 branch.

There are lots of spurious changes, such as timestamps and CSS styles in 
tables, so a generic git diff is not very useful. To see a list of files that 
have been added, deleted, renamed, changed type, or are otherwise interesting, 
use the following command:

  git diff --name-status --diff-filter=ADCRTXUB origin/asf-site

To see only files that had 100 or more lines changed:

  git diff --stat origin/asf-site | grep -E '[1-9][0-9]{2,}'

When you are satisfied, publish your changes to origin/asf-site using these 
commands:

  git commit --allow-empty -m "Empty commit" # to work around a current ASF 
INFRA bug
  git push origin asf-site-046d1a56b242d8586d8bcd81d26c11a859651972:asf-site
  git checkout asf-site
  git branch -D asf-site-046d1a56b242d8586d8bcd81d26c11a859651972

Changes take a couple of minutes to be propagated. You can verify whether they 
have been propagated by looking at the Last Published date at the bottom of 
http://hbase.apache.org/. It should match the date in the index.html on the 
asf-site branch in Git.

As a courtesy, reply-all to this email to let other committers know you pushed 
the site.



If failed, see https://builds.apache.org/job/hbase_generate_website/409/console

Re: Use experience and performance data of offheap from Alibaba online cluster

2016-11-18 Thread Ted Yu
Yu:
With positive results, more hbase users would be asking for the backport of 
offheap read path patches. 

Do you think you or your coworker has the bandwidth to publish backport for 
branch-1 ?

Thanks 

> On Nov 18, 2016, at 12:11 AM, Yu Li  wrote:
> 
> Dear all,
> 
> We have backported read path offheap (HBASE-11425) to our customized 
> hbase-1.1.2 (thanks @Anoop for the help/support) and run it online for more 
> than a month, and would like to share our experience, for what it's worth 
> (smile).
> 
> Generally speaking, we gained a better and more stable throughput/performance 
> with offheap, and below are some details:
> 1. QPS become more stable with offheap
> 
> Performance w/o offheap:
> 
> 
> 
> Performance w/ offheap:
> 
> 
> 
> These data come from our online A/B test cluster (with 450 physical machines, 
> and each with 256G memory + 64 core) with real world workloads, it shows 
> using offheap we could gain a more stable throughput as well as better 
> performance
> 
> Not showing fully online data here because for online we published the 
> version with both offheap and NettyRpcServer together, so no standalone 
> comparison data for offheap
> 
> 2. Full GC frequency and cost
> 
> Average Full GC STW time reduce from 11s to 7s with offheap.
> 
> 3. Young GC frequency and cost
> 
> No performance degradation observed with offheap.
> 
> 4. Peak throughput of one single RS
> 
> On Singles Day (11/11), peak throughput of one single RS reached 100K, among 
> which 90K from Get. Plus internet in/out data we could know the average 
> result size of get request is ~1KB
> 
> 
> 
> Offheap are used on all online machines (more than 1600 nodes) instead of 
> LruCache, so the above QPS is gained from offheap bucketcache, along with 
> NettyRpcServer(HBASE-15756).
> 
> Just let us know if any comments. Thanks.
> 
> Best Regards,
> Yu
> 
> 
> 
> 
> 
> 
> 


Re: Use experience and performance data of offheap from Alibaba online cluster

2016-11-18 Thread 曾伟展
+ 1

 Original message
From: 张铎
To: dev@hbase.apache.org; u...@hbase.apache.org
Sent: Friday, November 18, 2016, 17:19
Subject: Re: Use experience and performance data of offheap from Alibaba online 
cluster



[jira] [Created] (HBASE-17126) Expose KeyValue#checkParameters() and checkForTagsLength() to be used by other Cell implementations

2016-11-18 Thread Xiang Li (JIRA)
Xiang Li created HBASE-17126:


 Summary: Expose KeyValue#checkParameters() and 
checkForTagsLength() to be used by other Cell implementations
 Key: HBASE-17126
 URL: https://issues.apache.org/jira/browse/HBASE-17126
 Project: HBase
  Issue Type: Improvement
  Components: Client, regionserver
Affects Versions: 2.0.0
Reporter: Xiang Li
Assignee: Xiang Li
Priority: Minor
 Fix For: 2.0.0






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Use experience and performance data of offheap from Alibaba online cluster

2016-11-18 Thread 张铎
I can not see the images either...

On Friday, 18 November 2016 at 16:57, Du, Jingcheng wrote:

> Thanks Yu for the sharing, great achievements.
> It seems the images cannot be displayed? Maybe just me?
>
> Regards,
> Jingcheng
>
> From: Yu Li [mailto:car...@gmail.com]
> Sent: Friday, November 18, 2016 4:11 PM
> To: u...@hbase.apache.org; dev@hbase.apache.org
> Subject: Use experience and performance data of offheap from Alibaba
> online cluster
>
> Dear all,
>
> We have backported read path offheap (HBASE-11425) to our customized
> hbase-1.1.2 (thanks @Anoop for the help/support) and run it online for more
> than a month, and would like to share our experience, for what it's worth
> (smile).
>
> Generally speaking, we gained a better and more stable
> throughput/performance with offheap, and below are some details:
>
> 1. QPS become more stable with offheap
>
> Performance w/o offheap:
>
> [cid:part1.582d4b6424f071c]
>
> Performance w/ offheap:
>
> [cid:part2.582d4b6424f071c]
>
> These data come from our online A/B test cluster (with 450 physical
> machines, and each with 256G memory + 64 core) with real world workloads,
> it shows using offheap we could gain a more stable throughput as well as
> better performance
>
> Not showing fully online data here because for online we published the
> version with both offheap and NettyRpcServer together, so no standalone
> comparison data for offheap
>
> 2. Full GC frequency and cost
>
> Average Full GC STW time reduce from 11s to 7s with offheap.
>
> 3. Young GC frequency and cost
>
> No performance degradation observed with offheap.
>
> 4. Peak throughput of one single RS
>
> On Singles Day (11/11), peak throughput of one single RS reached 100K,
> among which 90K from Get. Plus internet in/out data we could know the
> average result size of get request is ~1KB
>
> [cid:part3.582d4b6424f071c]
>
> Offheap are used on all online machines (more than 1600 nodes) instead of
> LruCache, so the above QPS is gained from offheap bucketcache, along with
> NettyRpcServer(HBASE-15756).
> Just let us know if any comments. Thanks.
>
> Best Regards,
> Yu
>
>
>
>
>
>
>


RE: Use experience and performance data of offheap from Alibaba online cluster

2016-11-18 Thread Du, Jingcheng
Thanks Yu for the sharing, great achievements.
It seems the images cannot be displayed? Maybe just me?

Regards,
Jingcheng

From: Yu Li [mailto:car...@gmail.com]
Sent: Friday, November 18, 2016 4:11 PM
To: u...@hbase.apache.org; dev@hbase.apache.org
Subject: Use experience and performance data of offheap from Alibaba online 
cluster

Dear all,

We have backported read path offheap (HBASE-11425) to our customized 
hbase-1.1.2 (thanks @Anoop for the help/support) and run it online for more 
than a month, and would like to share our experience, for what it's worth 
(smile).

Generally speaking, we gained a better and more stable throughput/performance 
with offheap, and below are some details:

1. QPS become more stable with offheap

Performance w/o offheap:

[cid:part1.582d4b6424f071c]

Performance w/ offheap:

[cid:part2.582d4b6424f071c]

These data come from our online A/B test cluster (with 450 physical machines, 
and each with 256G memory + 64 core) with real world workloads, it shows using 
offheap we could gain a more stable throughput as well as better performance

Not showing fully online data here because for online we published the version 
with both offheap and NettyRpcServer together, so no standalone comparison data 
for offheap

2. Full GC frequency and cost

Average Full GC STW time reduce from 11s to 7s with offheap.

3. Young GC frequency and cost

No performance degradation observed with offheap.

4. Peak throughput of one single RS

On Singles Day (11/11), peak throughput of one single RS reached 100K, among 
which 90K from Get. Plus internet in/out data we could know the average result 
size of get request is ~1KB

[cid:part3.582d4b6424f071c]

Offheap are used on all online machines (more than 1600 nodes) instead of 
LruCache, so the above QPS is gained from offheap bucketcache, along with 
NettyRpcServer(HBASE-15756).
Just let us know if any comments. Thanks.

Best Regards,
Yu








[jira] [Created] (HBASE-17125) Inconsistent result when use filter to read data

2016-11-18 Thread Guanghao Zhang (JIRA)
Guanghao Zhang created HBASE-17125:
--

 Summary: Inconsistent result when use filter to read data
 Key: HBASE-17125
 URL: https://issues.apache.org/jira/browse/HBASE-17125
 Project: HBase
  Issue Type: Bug
Reporter: Guanghao Zhang


Assume a column's max versions is 3 and we write 4 versions of this column. 
The oldest version is not removed immediately, but from the user's point of 
view it is already gone. However, when the user queries with a filter that 
skips a newer version, the oldest version becomes visible again; after the 
region is compacted it can never be seen. This is confusing for users: the 
same query returns inconsistent results before and after region compaction.

The reason lies in the matchColumn method of UserScanQueryMatcher: it first 
checks the cell against the filter and only then checks the number of versions 
needed. So if the filter skips a newer version, the oldest version becomes 
visible again as long as it has not yet been removed.

After an offline discussion with [~Apache9] and [~fenghh], we now have two 
candidate solutions for this problem. The first idea is to check the number of 
versions first and then check the cell against the filter. This matches the 
javadoc of setFilter, which says the filter is called after all tests for ttl, 
column match, deletes and max versions have been run.
{code}
  /**
   * Apply the specified server-side filter when performing the Query.
   * Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
   * for ttl, column match, deletes and max versions have been run.
   * @param filter filter to run on the server
   * @return this for invocation chaining
   */
  public Query setFilter(Filter filter) {
this.filter = filter;
return this;
  }
{code}
But this idea has its own problem: if a column's max versions is 5 and the 
user's query only asks for 3 versions, then checking the version count first 
and the filter second may return fewer than 3 cells, even though there are 
still 2 stored versions that were never evaluated.

So the second idea has three steps:
1. check against the max versions of this column
2. check the cell by filter
3. check against the number of versions the user requested
But this makes the ScanQueryMatcher more complicated, and it also breaks the 
contract described in the javadoc of Query.setFilter.

We don't have a final solution for this problem yet; suggestions are welcome.
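
For illustration, a minimal client-side sketch of the behaviour described above. 
It assumes a pre-created table 't' with a family 'f' whose MAX_VERSIONS is 3; the 
row, qualifier and values are made up, and the ValueFilter is just one way to make 
the filter skip the newest version:

{code}
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.BinaryComparator;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.ValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class FilterVersionInconsistency {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = conn.getTable(TableName.valueOf("t"))) {
      byte[] row = Bytes.toBytes("r"), f = Bytes.toBytes("f"), q = Bytes.toBytes("q");
      // Write 4 versions with explicit timestamps 1..4; with MAX_VERSIONS=3 only
      // v2..v4 should ever be visible to readers.
      for (int ts = 1; ts <= 4; ts++) {
        table.put(new Put(row).addColumn(f, q, ts, Bytes.toBytes("v" + ts)));
      }
      // The filter skips the newest version (v4). Before a major compaction the
      // supposedly-dead oldest version (v1) shows up in the result; after a major
      // compaction it does not, so the same query returns different results.
      Get get = new Get(row).setMaxVersions(3);
      get.setFilter(new ValueFilter(CompareOp.NOT_EQUAL, new BinaryComparator(Bytes.toBytes("v4"))));
      for (Cell cell : table.get(get).rawCells()) {
        System.out.println(Bytes.toString(CellUtil.cloneValue(cell)));
      }
    }
  }
}
{code}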



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Use experience and performance data of offheap from Alibaba online cluster

2016-11-18 Thread Yu Li
Dear all,

We have backported read path offheap (HBASE-11425) to our customized
hbase-1.1.2 (thanks @Anoop for the help/support) and run it online for more
than a month, and would like to share our experience, for what it's worth
(smile).

Generally speaking, we gained a better and more stable
throughput/performance with offheap, and below are some details:

1. QPS become more stable with offheap

Performance w/o offheap:

Performance w/ offheap:

These data come from our online A/B test cluster (with 450 physical
machines, and each with 256G memory + 64 core) with real world workloads,
it shows using offheap we could gain a more stable throughput as well as
better performance

Not showing fully online data here because for online we published the
version with both offheap and NettyRpcServer together, so no standalone
comparison data for offheap

2. Full GC frequency and cost

Average Full GC STW time reduce from 11s to 7s with offheap.

3. Young GC frequency and cost

No performance degradation observed with offheap.

4. Peak throughput of one single RS

On Singles Day (11/11), peak throughput of one single RS reached 100K,
among which 90K from Get. Plus internet in/out data we could know the
average result size of get request is ~1KB

Offheap are used on all online machines (more than 1600 nodes) instead of
LruCache, so the above QPS is gained from offheap bucketcache, along with
NettyRpcServer(HBASE-15756).
Just let us know if any comments. Thanks.

Best Regards,
Yu
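
For reference, a minimal sketch of the kind of configuration that enables the 
off-heap BucketCache discussed in this thread. The values below are illustrative 
examples only, not Alibaba's production settings; in a real deployment they would 
normally live in hbase-site.xml, with the direct-memory budget set through 
HBASE_OFFHEAPSIZE in hbase-env.sh.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class OffheapCacheConfigSketch {
  // Illustrative settings for an off-heap L2 BucketCache plus a small on-heap L1.
  public static Configuration offheapCacheConf() {
    Configuration conf = HBaseConfiguration.create();
    // Back the L2 block cache with off-heap memory.
    conf.set("hbase.bucketcache.ioengine", "offheap");
    // Bucket cache size in MB; must fit within the direct memory budget
    // (HBASE_OFFHEAPSIZE / -XX:MaxDirectMemorySize).
    conf.setInt("hbase.bucketcache.size", 16384);
    // Keep the on-heap LRU (L1) small; it then mainly serves index/bloom blocks.
    conf.setFloat("hfile.block.cache.size", 0.1f);
    return conf;
  }
}
{code}

In practice the same keys would simply be added as <property> entries in 
hbase-site.xml on every RegionServer.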