[jira] [Commented] (HBASE-17162) Avoid unconditional call to getXXXArray() in write path

2016-11-23 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15692557#comment-15692557
 ] 

Anoop Sam John commented on HBASE-17162:


A few places are still left:
1. MOB - will be covered as part of HBASE-17169.
2. HRegion - append and increment - plan to raise another sub-task for this, 
as we will need a bigger change there than just adding a condition before the 
call. There is already a TODO in this path and I would like to address that 
along with it.
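For illustration, a minimal sketch of the pattern this issue drives at (class 
and method names assumed from the 2.0 code base, not the actual patch): check 
whether the cell is ByteBuffer backed before touching getXXXArray().
{code}
import java.io.IOException;
import java.io.OutputStream;

import org.apache.hadoop.hbase.ByteBufferCell;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.util.ByteBufferUtils;

public final class CellWriteSketch {
  /**
   * Write a cell's row bytes without unconditionally calling getRowArray(),
   * which would force an on-heap copy for an off-heap cell.
   */
  static void writeRow(OutputStream out, Cell cell) throws IOException {
    if (cell instanceof ByteBufferCell) {
      ByteBufferCell bbCell = (ByteBufferCell) cell;
      // Stream straight from the (possibly off-heap) ByteBuffer.
      ByteBufferUtils.copyBufferToStream(out, bbCell.getRowByteBuffer(),
          bbCell.getRowPosition(), cell.getRowLength());
    } else {
      // Array-backed cell: getRowArray() is a cheap accessor here.
      out.write(cell.getRowArray(), cell.getRowOffset(), cell.getRowLength());
    }
  }
}
{code}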

> Avoid unconditional call to getXXXArray() in write path
> ---
>
> Key: HBASE-17162
> URL: https://issues.apache.org/jira/browse/HBASE-17162
> Project: HBase
>  Issue Type: Sub-task
>  Components: regionserver
>Reporter: Anoop Sam John
>Assignee: Anoop Sam John
> Fix For: 2.0.0
>
> Attachments: HBASE-17162.patch
>
>
> Still some calls left. Patch will address these areas.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17072) CPU usage starts to climb up to 90-100% when using G1GC; purge ThreadLocal usage

2016-11-23 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15692551#comment-15692551
 ] 

Anoop Sam John commented on HBASE-17072:



Oh, you mean we will always be passing the block's on-disk size, so there is 
no way we need the prefetched header? That was exactly my doubt. Reading the 
code more, I can see only HFileScanner#seekTo() passing the block's on-disk 
size as -1, and that is for the 1st block.
Other places we seem to pass a real size (block.getNextBlockOnDiskSize()):
{code}
/**
 * The on-disk size of the next block, including the header and checksums if
 * present, obtained by peeking into the first
 * {@link HConstants#HFILEBLOCK_HEADER_SIZE} bytes of the next block's header,
 * or UNSET if unknown.
 *
 * Blocks try to carry the size of the next block to read in this data member.
 * They will even have this value when served from cache. Could save a seek in
 * the case where we are iterating through a file and some of the blocks come
 * from cache. If from cache, then having this info to hand will save us doing
 * a seek to read the header so we can read the body of a block.
 * TODO: see how effective this is at saving seeks.
 */
private int nextBlockOnDiskSize = UNSET;
{code}
So it says this state is obtained by peeking at the next block's header bytes. 
So we cannot really avoid it? Or is this comment wrong now?
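For illustration, a hedged sketch (a hypothetical reader, not the HFileBlock 
code) of why carrying the next block's on-disk size saves a seek:
{code}
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;

/** Hypothetical reader fragment, not the real HFileBlock code. */
class BlockReadSketch {
  static final int HEADER_SIZE = 33;  // HConstants.HFILEBLOCK_HEADER_SIZE

  final RandomAccessFile file;
  BlockReadSketch(RandomAccessFile file) { this.file = file; }

  byte[] readBlock(long offset, int onDiskSizeWithHeader) throws IOException {
    if (onDiskSizeWithHeader < 0) {
      // Size unknown (UNSET): pay an extra read just to learn the length.
      byte[] header = pread(offset, HEADER_SIZE);
      onDiskSizeWithHeader = decodeOnDiskSizeWithHeader(header);
    }
    // Size known: header + body come back in a single positioned read.
    return pread(offset, onDiskSizeWithHeader);
  }

  private byte[] pread(long offset, int len) throws IOException {
    byte[] b = new byte[len];
    file.seek(offset);
    file.readFully(b);
    return b;
  }

  private int decodeOnDiskSizeWithHeader(byte[] header) {
    // Stand-in: onDiskSizeWithoutHeader sits after the 8-byte block magic.
    return ByteBuffer.wrap(header).getInt(8) + HEADER_SIZE;
  }
}
{code}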

> CPU usage starts to climb up to 90-100% when using G1GC; purge ThreadLocal 
> usage
> 
>
> Key: HBASE-17072
> URL: https://issues.apache.org/jira/browse/HBASE-17072
> Project: HBase
>  Issue Type: Bug
>  Components: Performance, regionserver
>Affects Versions: 1.0.0, 2.0.0, 1.2.0
>Reporter: Eiichi Sato
> Attachments: HBASE-17072.master.001.patch, 
> HBASE-17072.master.002.patch, HBASE-17072.master.003.patch, 
> HBASE-17072.master.004.patch, HBASE-17072.master.005.patch, 
> disable-block-header-cache.patch, mat-threadlocals.png, mat-threads.png, 
> metrics.png, slave1.svg, slave2.svg, slave3.svg, slave4.svg
>
>
> h5. Problem
> CPU usage of a region server in our CDH 5.4.5 cluster, at some point, starts 
> to gradually get higher up to nearly 90-100% when using G1GC.  We've also run 
> into this problem on CDH 5.7.3 and CDH 5.8.2.
> In our production cluster, it normally takes a few weeks for this to happen 
> after restarting a RS.  We reproduced this on our test cluster and attached 
> the results.  Please note that, to make it easy to reproduce, we did some 
> "anti-tuning" on a table when running tests.
> In metrics.png, soon after we started running some workloads against a test 
> cluster (CDH 5.8.2) at about 7 p.m., CPU usage of the two RSs started to 
> rise. Flame Graphs (slave1.svg to slave4.svg) were generated from jstack 
> dumps of each RS process around 10:30 a.m. the next day.
> After investigating heapdumps from another occurrence on a test cluster 
> running CDH 5.7.3, we found that the ThreadLocalMap contains a lot of 
> contiguous entries of {{HFileBlock$PrefetchedHeader}}, probably due to 
> primary clustering.  This caused more loops in 
> {{ThreadLocalMap#expungeStaleEntries()}}, consuming a certain amount of CPU 
> time.  What is worse is that the method is called from RPC metrics code, 
> which means even a small amount of per-RPC time soon adds up to a huge 
> amount of CPU time.
> This is very similar to the issue in HBASE-16616, but we have many 
> {{HFileBlock$PrefetchedHeader}} instances, not only {{Counter$IndexHolder}} 
> instances.
> Here are some OQL counts from Eclipse Memory Analyzer (MAT).  This shows a 
> number of ThreadLocal instances in the ThreadLocalMap of a single handler 
> thread.
> {code}
> SELECT *
> FROM OBJECTS (SELECT AS RETAINED SET OBJECTS value
> FROM OBJECTS 0x4ee380430) obj
> WHERE obj.@clazz.@name = 
> "org.apache.hadoop.hbase.io.hfile.HFileBlock$PrefetchedHeader"
> #=> 10980 instances
> {code}
> {code}
> SELECT *
> FROM OBJECTS (SELECT AS RETAINED SET OBJECTS value
> FROM OBJECTS 0x4ee380430) obj
> WHERE obj.@clazz.@name = "org.apache.hadoop.hbase.util.Counter$IndexHolder"
> #=> 2052 instances
> {code}
> Although, as described in HBASE-16616, this somewhat seems to be an issue on 
> the G1GC side regarding weakly-reachable objects, we should keep ThreadLocal 
> usage minimal and avoid creating an indefinite number (in this case, one per 
> HFile) of ThreadLocal instances.
> HBASE-16146 removes ThreadLocals from the RPC metrics code.  That may solve 
> the issue (I just saw the patch, never tested it at all), but the 
> {{HFileBlock$PrefetchedHeader}} instances are still there in the 
> ThreadLocalMap, which may cause issues again in the future.
> h5. Our Solution
> We simply removed the whole {{HFileBlock$PrefetchedHeader}} caching and 
> fortunately we didn't notice any performance degradation for our production 
> workloads.
> Because the PrefetchedHeader caching uses ThreadLocal and because RPCs are 
> handled randomly in any of the handlers, small Get or small Scan RPCs do not 
> benefit from the caching (See HBASE-10676 and HBASE-11402 for the details).  
> Probably, we need to see how well reads are saved by the caching for large 
> Scan or Get RPCs and especially for compactions if we really remove the 
> caching. It's probably better if we can remove ThreadLocals without breaking 
> the current caching behavior.
> FWIW, I'm attaching the patch we applied. It's for CDH 5.4.5.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-15786) Create DBB backed MSLAB pool

2016-11-23 Thread Anoop Sam John (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John updated HBASE-15786:
---
Hadoop Flags: Reviewed
  Status: Patch Available  (was: Open)

> Create DBB backed MSLAB pool
> 
>
> Key: HBASE-15786
> URL: https://issues.apache.org/jira/browse/HBASE-15786
> Project: HBase
>  Issue Type: Sub-task
>  Components: regionserver
>Reporter: ramkrishna.s.vasudevan
>Assignee: Anoop Sam John
> Fix For: 2.0.0
>
> Attachments: HBASE-15786.patch, HBASE-15786_V2.patch, 
> HBASE-15786_V2.patch, HBASE-15786_V3.patch, HBASE-15786_V4.patch, 
> HBASE-15786_V5.patch
>
>
> We can make use of an MSLAB pool for this off-heap memstore.
> Right now one can specify the global memstore size (heap size) as a % of max 
> memory using a config. We will add another config with which one can specify 
> the global off-heap memstore size. This will be an exact size, not a %. When 
> the off-heap memstore is in use, we will give this entire area to the MSLAB 
> pool, and it will create off-heap chunks. So when cells are added to the 
> memstore, the cell data gets copied into the off-heap MSLAB chunk space. 
> Note that when the pool size is not really enough and we need additional 
> chunk creation, we won't use the off-heap area for that. We don't want to 
> create so many DBBs on demand.
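For illustration, a minimal sketch of the idea (invented names, not the actual 
patch): carve the configured off-heap area into fixed-size direct ByteBuffer 
chunks up front and recycle them through a pool, falling back to on-heap 
allocation when the pool runs dry.
{code}
import java.nio.ByteBuffer;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical pool of off-heap MSLAB chunks. */
public final class DirectChunkPool {
  private final BlockingQueue<ByteBuffer> pool;
  private final int chunkSize;

  DirectChunkPool(long globalOffheapSize, int chunkSize) {
    this.chunkSize = chunkSize;
    int count = Math.max(1, (int) (globalOffheapSize / chunkSize));
    this.pool = new LinkedBlockingQueue<>(count);
    for (int i = 0; i < count; i++) {
      // The whole configured area becomes direct (off-heap) chunks up front.
      pool.offer(ByteBuffer.allocateDirect(chunkSize));
    }
  }

  /** Returns a pooled off-heap chunk, or an on-heap chunk if the pool is empty. */
  ByteBuffer getChunk() {
    ByteBuffer b = pool.poll();
    // Per the description above: don't create direct buffers on demand.
    return b != null ? b : ByteBuffer.allocate(chunkSize);
  }

  void returnChunk(ByteBuffer b) {
    if (b.isDirect()) {
      b.clear();
      pool.offer(b);
    }
  }
}
{code}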



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-15786) Create DBB backed MSLAB pool

2016-11-23 Thread Anoop Sam John (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John updated HBASE-15786:
---
Attachment: HBASE-15786_V5.patch

> Create DBB backed MSLAB pool
> 
>
> Key: HBASE-15786
> URL: https://issues.apache.org/jira/browse/HBASE-15786
> Project: HBase
>  Issue Type: Sub-task
>  Components: regionserver
>Reporter: ramkrishna.s.vasudevan
>Assignee: Anoop Sam John
> Fix For: 2.0.0
>
> Attachments: HBASE-15786.patch, HBASE-15786_V2.patch, 
> HBASE-15786_V2.patch, HBASE-15786_V3.patch, HBASE-15786_V4.patch, 
> HBASE-15786_V5.patch
>
>
> We can make use of an MSLAB pool for this off-heap memstore.
> Right now one can specify the global memstore size (heap size) as a % of max 
> memory using a config. We will add another config with which one can specify 
> the global off-heap memstore size. This will be an exact size, not a %. When 
> the off-heap memstore is in use, we will give this entire area to the MSLAB 
> pool, and it will create off-heap chunks. So when cells are added to the 
> memstore, the cell data gets copied into the off-heap MSLAB chunk space. 
> Note that when the pool size is not really enough and we need additional 
> chunk creation, we won't use the off-heap area for that. We don't want to 
> create so many DBBs on demand.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17072) CPU usage starts to climb up to 90-100% when using G1GC; purge ThreadLocal usage

2016-11-23 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15692449#comment-15692449
 ] 

stack commented on HBASE-17072:
---

Patch v5 restores caching of the next block's header. While it is not needed 
for the length (because we read the next header anyway), the cached header is 
needed so we don't have to back up the stream. This was the cause of the 20% 
slowdown when single-threaded. v5 is the same speed as what we had before. The 
patch is in essence the patch up on HBASE-10676 by [~zhaojianbo].

I think this reading of the next block's header is not needed any more. Will 
open a new issue to undo it. It makes a bunch of stuff more complicated than 
it needs to be.
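For context, a hedged sketch (assumed names, not the committed code) of what 
the restored cache buys on a sequential-read path:
{code}
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical fragment showing what the restored header cache avoids. */
class HeaderCacheSketch {
  static final int HEADER_SIZE = 33;   // HConstants.HFILEBLOCK_HEADER_SIZE
  private byte[] prefetchedHeader;     // filled as look-ahead while reading block N
  private long prefetchedOffset = -1;

  byte[] readHeader(RandomAccessFile in, long blockOffset) throws IOException {
    if (prefetchedHeader != null && prefetchedOffset == blockOffset) {
      // Header already in hand from the previous read's look-ahead:
      // no backward seek on the stream, just carry on reading the body.
      return prefetchedHeader;
    }
    // Cache miss: we are positioned past this header, so back up and re-read.
    in.seek(blockOffset);
    byte[] header = new byte[HEADER_SIZE];
    in.readFully(header);
    return header;
  }
}
{code}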

> CPU usage starts to climb up to 90-100% when using G1GC; purge ThreadLocal 
> usage
> 
>
> Key: HBASE-17072
> URL: https://issues.apache.org/jira/browse/HBASE-17072
> Project: HBase
>  Issue Type: Bug
>  Components: Performance, regionserver
>Affects Versions: 1.0.0, 2.0.0, 1.2.0
>Reporter: Eiichi Sato
> Attachments: HBASE-17072.master.001.patch, 
> HBASE-17072.master.002.patch, HBASE-17072.master.003.patch, 
> HBASE-17072.master.004.patch, HBASE-17072.master.005.patch, 
> disable-block-header-cache.patch, mat-threadlocals.png, mat-threads.png, 
> metrics.png, slave1.svg, slave2.svg, slave3.svg, slave4.svg
>
>
> h5. Problem
> CPU usage of a region server in our CDH 5.4.5 cluster, at some point, starts 
> to gradually get higher up to nearly 90-100% when using G1GC.  We've also run 
> into this problem on CDH 5.7.3 and CDH 5.8.2.
> In our production cluster, it normally takes a few weeks for this to happen 
> after restarting a RS.  We reproduced this on our test cluster and attached 
> the results.  Please note that, to make it easy to reproduce, we did some 
> "anti-tuning" on a table when running tests.
> In metrics.png, soon after we started running some workloads against a test 
> cluster (CDH 5.8.2) at about 7 p.m., CPU usage of the two RSs started to 
> rise. Flame Graphs (slave1.svg to slave4.svg) were generated from jstack 
> dumps of each RS process around 10:30 a.m. the next day.
> After investigating heapdumps from another occurrence on a test cluster 
> running CDH 5.7.3, we found that the ThreadLocalMap contains a lot of 
> contiguous entries of {{HFileBlock$PrefetchedHeader}}, probably due to 
> primary clustering.  This caused more loops in 
> {{ThreadLocalMap#expungeStaleEntries()}}, consuming a certain amount of CPU 
> time.  What is worse is that the method is called from RPC metrics code, 
> which means even a small amount of per-RPC time soon adds up to a huge 
> amount of CPU time.
> This is very similar to the issue in HBASE-16616, but we have many 
> {{HFileBlock$PrefetchedHeader}} instances, not only {{Counter$IndexHolder}} 
> instances.
> Here are some OQL counts from Eclipse Memory Analyzer (MAT).  This shows a 
> number of ThreadLocal instances in the ThreadLocalMap of a single handler 
> thread.
> {code}
> SELECT *
> FROM OBJECTS (SELECT AS RETAINED SET OBJECTS value
> FROM OBJECTS 0x4ee380430) obj
> WHERE obj.@clazz.@name = 
> "org.apache.hadoop.hbase.io.hfile.HFileBlock$PrefetchedHeader"
> #=> 10980 instances
> {code}
> {code}
> SELECT *
> FROM OBJECTS (SELECT AS RETAINED SET OBJECTS value
> FROM OBJECTS 0x4ee380430) obj
> WHERE obj.@clazz.@name = "org.apache.hadoop.hbase.util.Counter$IndexHolder"
> #=> 2052 instances
> {code}
> Although, as described in HBASE-16616, this somewhat seems to be an issue on 
> the G1GC side regarding weakly-reachable objects, we should keep ThreadLocal 
> usage minimal and avoid creating an indefinite number (in this case, one per 
> HFile) of ThreadLocal instances.
> HBASE-16146 removes ThreadLocals from the RPC metrics code.  That may solve 
> the issue (I just saw the patch, never tested it at all), but the 
> {{HFileBlock$PrefetchedHeader}} instances are still there in the 
> ThreadLocalMap, which may cause issues again in the future.
> h5. Our Solution
> We simply removed the whole {{HFileBlock$PrefetchedHeader}} caching and 
> fortunately we didn't notice any performance degradation for our production 
> workloads.
> Because the PrefetchedHeader caching uses ThreadLocal and because RPCs are 
> handled randomly in any of the handlers, small Get or small Scan RPCs do not 
> benefit from the caching (See HBASE-10676 and HBASE-11402 for the details).  
> Probably, we need to see how well reads are saved by the caching for large 
> Scan or Get RPCs and especially for compactions if we really remove the 
> caching. It's probably better if we can remove ThreadLocals without breaking 
> the current caching behavior.
> FWIW, I'm attaching the patch we applied. It's for CDH 5.4.5.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HBASE-17072) CPU usage starts to climb up to 90-100% when using G1GC; purge ThreadLocal usage

2016-11-23 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-17072:
--
Summary: CPU usage starts to climb up to 90-100% when using G1GC; purge 
ThreadLocal usage  (was: CPU usage starts to climb up to 90-100% when using 
G1GC)

> CPU usage starts to climb up to 90-100% when using G1GC; purge ThreadLocal 
> usage
> 
>
> Key: HBASE-17072
> URL: https://issues.apache.org/jira/browse/HBASE-17072
> Project: HBase
>  Issue Type: Bug
>  Components: Performance, regionserver
>Affects Versions: 1.0.0, 2.0.0, 1.2.0
>Reporter: Eiichi Sato
> Attachments: HBASE-17072.master.001.patch, 
> HBASE-17072.master.002.patch, HBASE-17072.master.003.patch, 
> HBASE-17072.master.004.patch, HBASE-17072.master.005.patch, 
> disable-block-header-cache.patch, mat-threadlocals.png, mat-threads.png, 
> metrics.png, slave1.svg, slave2.svg, slave3.svg, slave4.svg
>
>
> h5. Problem
> CPU usage of a region server in our CDH 5.4.5 cluster, at some point, starts 
> to gradually get higher up to nearly 90-100% when using G1GC.  We've also run 
> into this problem on CDH 5.7.3 and CDH 5.8.2.
> In our production cluster, it normally takes a few weeks for this to happen 
> after restarting a RS.  We reproduced this on our test cluster and attached 
> the results.  Please note that, to make it easy to reproduce, we did some 
> "anti-tuning" on a table when running tests.
> In metrics.png, soon after we started running some workloads against a test 
> cluster (CDH 5.8.2) at about 7 p.m., CPU usage of the two RSs started to 
> rise. Flame Graphs (slave1.svg to slave4.svg) were generated from jstack 
> dumps of each RS process around 10:30 a.m. the next day.
> After investigating heapdumps from another occurrence on a test cluster 
> running CDH 5.7.3, we found that the ThreadLocalMap contains a lot of 
> contiguous entries of {{HFileBlock$PrefetchedHeader}}, probably due to 
> primary clustering.  This caused more loops in 
> {{ThreadLocalMap#expungeStaleEntries()}}, consuming a certain amount of CPU 
> time.  What is worse is that the method is called from RPC metrics code, 
> which means even a small amount of per-RPC time soon adds up to a huge 
> amount of CPU time.
> This is very similar to the issue in HBASE-16616, but we have many 
> {{HFileBlock$PrefetchedHeader}} instances, not only {{Counter$IndexHolder}} 
> instances.
> Here are some OQL counts from Eclipse Memory Analyzer (MAT).  This shows a 
> number of ThreadLocal instances in the ThreadLocalMap of a single handler 
> thread.
> {code}
> SELECT *
> FROM OBJECTS (SELECT AS RETAINED SET OBJECTS value
> FROM OBJECTS 0x4ee380430) obj
> WHERE obj.@clazz.@name = 
> "org.apache.hadoop.hbase.io.hfile.HFileBlock$PrefetchedHeader"
> #=> 10980 instances
> {code}
> {code}
> SELECT *
> FROM OBJECTS (SELECT AS RETAINED SET OBJECTS value
> FROM OBJECTS 0x4ee380430) obj
> WHERE obj.@clazz.@name = "org.apache.hadoop.hbase.util.Counter$IndexHolder"
> #=> 2052 instances
> {code}
> Although, as described in HBASE-16616, this somewhat seems to be an issue on 
> the G1GC side regarding weakly-reachable objects, we should keep ThreadLocal 
> usage minimal and avoid creating an indefinite number (in this case, one per 
> HFile) of ThreadLocal instances.
> HBASE-16146 removes ThreadLocals from the RPC metrics code.  That may solve 
> the issue (I just saw the patch, never tested it at all), but the 
> {{HFileBlock$PrefetchedHeader}} instances are still there in the 
> ThreadLocalMap, which may cause issues again in the future.
> h5. Our Solution
> We simply removed the whole {{HFileBlock$PrefetchedHeader}} caching and 
> fortunately we didn't notice any performance degradation for our production 
> workloads.
> Because the PrefetchedHeader caching uses ThreadLocal and because RPCs are 
> handled randomly in any of the handlers, small Get or small Scan RPCs do not 
> benefit from the caching (See HBASE-10676 and HBASE-11402 for the details).  
> Probably, we need to see how well reads are saved by the caching for large 
> Scan or Get RPCs and especially for compactions if we really remove the 
> caching. It's probably better if we can remove ThreadLocals without breaking 
> the current caching behavior.
> FWIW, I'm attaching the patch we applied. It's for CDH 5.4.5.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-17072) CPU usage starts to climb up to 90-100% when using G1GC

2016-11-23 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-17072:
--
Attachment: HBASE-17072.master.005.patch

> CPU usage starts to climb up to 90-100% when using G1GC
> ---
>
> Key: HBASE-17072
> URL: https://issues.apache.org/jira/browse/HBASE-17072
> Project: HBase
>  Issue Type: Bug
>  Components: Performance, regionserver
>Affects Versions: 1.0.0, 2.0.0, 1.2.0
>Reporter: Eiichi Sato
> Attachments: HBASE-17072.master.001.patch, 
> HBASE-17072.master.002.patch, HBASE-17072.master.003.patch, 
> HBASE-17072.master.004.patch, HBASE-17072.master.005.patch, 
> disable-block-header-cache.patch, mat-threadlocals.png, mat-threads.png, 
> metrics.png, slave1.svg, slave2.svg, slave3.svg, slave4.svg
>
>
> h5. Problem
> CPU usage of a region server in our CDH 5.4.5 cluster, at some point, starts 
> to gradually get higher up to nearly 90-100% when using G1GC.  We've also run 
> into this problem on CDH 5.7.3 and CDH 5.8.2.
> In our production cluster, it normally takes a few weeks for this to happen 
> after restarting a RS.  We reproduced this on our test cluster and attached 
> the results.  Please note that, to make it easy to reproduce, we did some 
> "anti-tuning" on a table when running tests.
> In metrics.png, soon after we started running some workloads against a test 
> cluster (CDH 5.8.2) at about 7 p.m., CPU usage of the two RSs started to 
> rise. Flame Graphs (slave1.svg to slave4.svg) were generated from jstack 
> dumps of each RS process around 10:30 a.m. the next day.
> After investigating heapdumps from another occurrence on a test cluster 
> running CDH 5.7.3, we found that the ThreadLocalMap contains a lot of 
> contiguous entries of {{HFileBlock$PrefetchedHeader}}, probably due to 
> primary clustering.  This caused more loops in 
> {{ThreadLocalMap#expungeStaleEntries()}}, consuming a certain amount of CPU 
> time.  What is worse is that the method is called from RPC metrics code, 
> which means even a small amount of per-RPC time soon adds up to a huge 
> amount of CPU time.
> This is very similar to the issue in HBASE-16616, but we have many 
> {{HFileBlock$PrefetchedHeader}} instances, not only {{Counter$IndexHolder}} 
> instances.
> Here are some OQL counts from Eclipse Memory Analyzer (MAT).  This shows a 
> number of ThreadLocal instances in the ThreadLocalMap of a single handler 
> thread.
> {code}
> SELECT *
> FROM OBJECTS (SELECT AS RETAINED SET OBJECTS value
> FROM OBJECTS 0x4ee380430) obj
> WHERE obj.@clazz.@name = 
> "org.apache.hadoop.hbase.io.hfile.HFileBlock$PrefetchedHeader"
> #=> 10980 instances
> {code}
> {code}
> SELECT *
> FROM OBJECTS (SELECT AS RETAINED SET OBJECTS value
> FROM OBJECTS 0x4ee380430) obj
> WHERE obj.@clazz.@name = "org.apache.hadoop.hbase.util.Counter$IndexHolder"
> #=> 2052 instances
> {code}
> Although, as described in HBASE-16616, this somewhat seems to be an issue on 
> the G1GC side regarding weakly-reachable objects, we should keep ThreadLocal 
> usage minimal and avoid creating an indefinite number (in this case, one per 
> HFile) of ThreadLocal instances.
> HBASE-16146 removes ThreadLocals from the RPC metrics code.  That may solve 
> the issue (I just saw the patch, never tested it at all), but the 
> {{HFileBlock$PrefetchedHeader}} instances are still there in the 
> ThreadLocalMap, which may cause issues again in the future.
> h5. Our Solution
> We simply removed the whole {{HFileBlock$PrefetchedHeader}} caching and 
> fortunately we didn't notice any performance degradation for our production 
> workloads.
> Because the PrefetchedHeader caching uses ThreadLocal and because RPCs are 
> handled randomly in any of the handlers, small Get or small Scan RPCs do not 
> benefit from the caching (See HBASE-10676 and HBASE-11402 for the details).  
> Probably, we need to see how well reads are saved by the caching for large 
> Scan or Get RPCs and especially for compactions if we really remove the 
> caching. It's probably better if we can remove ThreadLocals without breaking 
> the current caching behavior.
> FWIW, I'm attaching the patch we applied. It's for CDH 5.4.5.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16417) In-Memory MemStore Policy for Flattening and Compactions

2016-11-23 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15692398#comment-15692398
 ] 

ramkrishna.s.vasudevan commented on HBASE-16417:


I agree. So this again takes us back to the point that if we have the L2 cache 
off-heap, then we will benefit from MSLAB and the chunk pool. Just saying.

> In-Memory MemStore Policy for Flattening and Compactions
> 
>
> Key: HBASE-16417
> URL: https://issues.apache.org/jira/browse/HBASE-16417
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Anastasia Braginsky
>Assignee: Eshcar Hillel
> Fix For: 2.0.0
>
> Attachments: HBASE-16417-benchmarkresults-20161101.pdf, 
> HBASE-16417-benchmarkresults-20161110.pdf, 
> HBASE-16417-benchmarkresults-20161123.pdf
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-10676) Removing ThreadLocal of PrefetchedHeader in HFileBlock.FSReaderV2 makes higher performance of scan

2016-11-23 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15692392#comment-15692392
 ] 

stack commented on HBASE-10676:
---

HBASE-17072 actually includes the AtomicReference-for-ThreadLocal replacement 
part of this patch. Thanks.
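For anyone curious, a minimal sketch of that replacement (field shapes 
assumed, not the committed code): one shared slot per reader instead of one 
ThreadLocal entry per handler thread, so the object count no longer scales 
with handlers x open HFiles.
{code}
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical holder for a prefetched header, keyed by its file offset.
final class PrefetchedHeader {
  final long offset;
  final ByteBuffer header;
  PrefetchedHeader(long offset, ByteBuffer header) {
    this.offset = offset;
    this.header = header;
  }
}

class ReaderState {
  // Before: a ThreadLocal<PrefetchedHeader> per reader, i.e. an entry in every
  // handler thread's ThreadLocalMap. After: a single shared slot; a stale
  // value is simply treated as a cache miss.
  private final AtomicReference<PrefetchedHeader> prefetched = new AtomicReference<>();

  ByteBuffer getCachedHeader(long wantedOffset) {
    PrefetchedHeader p = prefetched.get();
    return (p != null && p.offset == wantedOffset) ? p.header : null;
  }

  void cacheHeader(long offset, ByteBuffer header) {
    prefetched.set(new PrefetchedHeader(offset, header));
  }
}
{code}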

> Removing ThreadLocal of PrefetchedHeader in HFileBlock.FSReaderV2 makes higher 
> performance of scan
> 
>
> Key: HBASE-10676
> URL: https://issues.apache.org/jira/browse/HBASE-10676
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.99.0
>Reporter: zhaojianbo
>Assignee: zhaojianbo
> Attachments: HBASE-10676-0.98-branch-AtomicReferenceV2.patch, 
> HBASE-10676-0.98-branchV2.patch
>
>
> PrefetchedHeader variable in HFileBlock.FSReaderV2 is used for avoiding 
> backward seek operation as the comment said:
> {quote}
> we will not incur a backward seek operation if we have already read this 
> block's header as part of the previous read's look-ahead. And we also want to 
> skip reading the header again if it has already been read.
> {quote}
> But that is not the case. In the 0.98 code, prefetchedHeader is a ThreadLocal 
> per storefile reader, and within a RegionScanner's lifecycle, different RPC 
> handlers will serve scan requests of the same scanner. Even though the 
> handler of a previous scan call prefetched the next block header, the handler 
> of the current scan call will still trigger a backward seek operation. The 
> process is like this:
> # rs handler1 serves the scan call, reads block1, and prefetches the header 
> of block2
> # rs handler2 serves the same scanner's next scan call; because rs handler2 
> doesn't know the header of block2 was already prefetched by rs handler1, it 
> triggers a backward seek, reads block2, and prefetches the header of block3.
> So the read is not sequential, and I think the ThreadLocal is useless and 
> should be abandoned. I did the work and evaluated the performance of one, 
> two, and four clients scanning the same region with one storefile. The test 
> environment is:
> # An HDFS cluster with a namenode, a secondary namenode, and a datanode on 
> one machine
> # An HBase cluster with a ZK, a master, and a regionserver on the same 
> machine
> # Clients are also on the same machine.
> So all the data is local. The storefile is about 22.7GB from our online data, 
> 18995949 KVs. Caching is set to 1000, and setCacheBlocks(false) is used.
> With the improvement, the total client scan time decreases 21% for the 
> one-client case and 11% for the two-client case, but the four-client case is 
> almost the same. The detailed test data is the following:
> ||case||client||time(ms)||
> | original | 1 | 306222 |
> | new | 1 | 241313 |
> | original | 2 | 416390 |
> | new | 2 | 369064 |
> | original | 4 | 555986 |
> | new | 4 | 562152 |
> With some modification (see the comments below), the newest result is:
> ||case||client||time(ms)||case||client||time(ms)||case||client||time(ms)||
> |original|1|306222|new with synchronized|1|239510|new with AtomicReference|1|241243|
> |original|2|416390|new with synchronized|2|365367|new with AtomicReference|2|368952|
> |original|4|555986|new with synchronized|4|540642|new with AtomicReference|4|545715|
> |original|8|854029|new with synchronized|8|852137|new with AtomicReference|8|850401|



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17138) Backport read-path offheap (HBASE-11425) to branch-1

2016-11-23 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15692393#comment-15692393
 ] 

ramkrishna.s.vasudevan commented on HBASE-17138:


I think some of the building blocks may be a bigger change and may run into 
problems that are not easy to backport. I agree with [~Apache9] here.
First let us see what compatibility issues we have, and then based on that see 
if it is easy to backport.
I too don't remember all the compatibility issues, but I don't think there is 
anything in terms of user APIs.

> Backport read-path offheap (HBASE-11425) to branch-1
> 
>
> Key: HBASE-17138
> URL: https://issues.apache.org/jira/browse/HBASE-17138
> Project: HBase
>  Issue Type: Improvement
>Reporter: Yu Li
>Assignee: Yu Sun
>
> From the 
> [thread|http://mail-archives.apache.org/mod_mbox/hbase-user/201611.mbox/%3CCAM7-19%2Bn7cEiY4H9iLQ3N9V0NXppOPduZwk-hhgNLEaJfiV3kA%40mail.gmail.com%3E]
>  of sharing our experience and performance data of read-path offheap usage in 
Alibaba search, we could see people are positive about having HBASE-11425 in 
> branch-1, so I'd like to create a JIRA and move the discussion and decision 
> making here.
> Echoing some comments from the mail thread:
> Bryan:
> Is the backported patch available anywhere? If it ends up not getting 
> officially backported to branch-1 due to 2.0 around the corner, some of us 
> who build our own deploy may want to integrate into our builds
> Andrew:
> Yes, please, the patches will be useful to the community even if we decide 
> not to backport into an official 1.x release.
> Enis:
> I don't see any reason why we cannot backport to branch-1.
> Ted:
> Opening a JIRA would be fine. This makes it easier for people to obtain the 
> patch(es)
> Nick:
> From the DISCUSS thread re: EOL of 1.1, it seems we'll continue to
> support 1.x releases for some time... I would guess these will be
> maintained until 2.2 at least. Therefore, offheap patches that have seen
> production exposure seem like a reasonable candidate for backport, perhaps in 
> a 1.4 or 1.5 release timeframe.
> Anoop:
> Because of some compatibility issues, we decide that this will be done in 2.0 
> only..  Ya as Andy said, it would be great to share the 1.x backported 
> patches.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HBASE-17048) Introduce a helper class to calculate suitable ByteBuf size when allocating send buffer in FanOutOneBlockAsyncDFSOutputHelper

2016-11-23 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-17048:
--

Assignee: ramkrishna.s.vasudevan

> Introduce a helper class to calculate suitable ByteBuf size when allocating 
> send buffer in FanOutOneBlockAsyncDFSOutputHelper
> 
>
> Key: HBASE-17048
> URL: https://issues.apache.org/jira/browse/HBASE-17048
> Project: HBase
>  Issue Type: Sub-task
>  Components: wal
>Affects Versions: 2.0.0
>Reporter: Duo Zhang
>Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0
>
>
> As [~ram_krish] mentioned in HBASE-17021
> https://issues.apache.org/jira/browse/HBASE-17021?focusedCommentId=15646938=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15646938
> The default ByteBuf size is 256B, which is too small.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-17012) Handle Offheap cells in CompressedKvEncoder

2016-11-23 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-17012:
---
Status: Patch Available  (was: Open)

> Handle Offheap cells in CompressedKvEncoder
> ---
>
> Key: HBASE-17012
> URL: https://issues.apache.org/jira/browse/HBASE-17012
> Project: HBase
>  Issue Type: Sub-task
>  Components: regionserver
>Affects Versions: 2.0.0
>Reporter: Anoop Sam John
>Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0
>
> Attachments: HBASE-17012_1.patch, HBASE-17012_2.patch
>
>
> When we deal with off-heap cells, we will end up copying the Cell components 
> onto the heap:
> {code}
> public void write(Cell cell) throws IOException {
>   ...
>   write(cell.getRowArray(), cell.getRowOffset(), cell.getRowLength(),
>       compression.rowDict);
>   write(cell.getFamilyArray(), cell.getFamilyOffset(), cell.getFamilyLength(),
>       compression.familyDict);
>   write(cell.getQualifierArray(), cell.getQualifierOffset(),
>       cell.getQualifierLength(), compression.qualifierDict);
>   ...
>   out.write(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength());
>   ...
> {code}
> We need to avoid this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-17012) Handle Offheap cells in CompressedKvEncoder

2016-11-23 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-17012:
---
Attachment: HBASE-17012_2.patch

Patch that works with SecureWAL also. The CryptoOS is now wrapped by the 
ByteBufferWriterOutputStream.
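Roughly, the wrapping order described above (a hedged sketch, not the patch 
itself):
{code}
import java.io.OutputStream;

import org.apache.hadoop.hbase.io.ByteBufferWriterOutputStream;

final class WalStreamSketch {
  /**
   * The encrypting stream stays the innermost sink; the ByteBufferWriter-capable
   * stream sits on top so ByteBuffer-backed (off-heap) cells can be written
   * without an intermediate on-heap copy.
   */
  static OutputStream wrap(OutputStream cryptoOut) {
    return new ByteBufferWriterOutputStream(cryptoOut);
  }
}
{code}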

> Handle Offheap cells in CompressedKvEncoder
> ---
>
> Key: HBASE-17012
> URL: https://issues.apache.org/jira/browse/HBASE-17012
> Project: HBase
>  Issue Type: Sub-task
>  Components: regionserver
>Affects Versions: 2.0.0
>Reporter: Anoop Sam John
>Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0
>
> Attachments: HBASE-17012_1.patch, HBASE-17012_2.patch
>
>
> When we deal with off-heap cells, we will end up copying the Cell components 
> onto the heap:
> {code}
> public void write(Cell cell) throws IOException {
>   ...
>   write(cell.getRowArray(), cell.getRowOffset(), cell.getRowLength(),
>       compression.rowDict);
>   write(cell.getFamilyArray(), cell.getFamilyOffset(), cell.getFamilyLength(),
>       compression.familyDict);
>   write(cell.getQualifierArray(), cell.getQualifierOffset(),
>       cell.getQualifierLength(), compression.qualifierDict);
>   ...
>   out.write(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength());
>   ...
> {code}
> We need to avoid this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-17012) Handle Offheap cells in CompressedKvEncoder

2016-11-23 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-17012:
---
Status: Open  (was: Patch Available)

> Handle Offheap cells in CompressedKvEncoder
> ---
>
> Key: HBASE-17012
> URL: https://issues.apache.org/jira/browse/HBASE-17012
> Project: HBase
>  Issue Type: Sub-task
>  Components: regionserver
>Affects Versions: 2.0.0
>Reporter: Anoop Sam John
>Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0
>
> Attachments: HBASE-17012_1.patch
>
>
> When we deal with off-heap cells, we will end up copying the Cell components 
> onto the heap:
> {code}
> public void write(Cell cell) throws IOException {
>   ...
>   write(cell.getRowArray(), cell.getRowOffset(), cell.getRowLength(),
>       compression.rowDict);
>   write(cell.getFamilyArray(), cell.getFamilyOffset(), cell.getFamilyLength(),
>       compression.familyDict);
>   write(cell.getQualifierArray(), cell.getQualifierOffset(),
>       cell.getQualifierLength(), compression.qualifierDict);
>   ...
>   out.write(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength());
>   ...
> {code}
> We need to avoid this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HBASE-17140) Throw RegionOfflineException directly when request for a disabled table

2016-11-23 Thread Guanghao Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15692366#comment-15692366
 ] 

Guanghao Zhang edited comment on HBASE-17140 at 11/24/16 6:37 AM:
--

Now the RegionReplicaReplicationEndpoint will skip WAL edits for disabled and 
dropped tables. There is a cache for disabled and dropped tables that has a 
default expiry of 5 sec, so I added Thread.sleep(1) in 
TestRegionReplicaFailover. [~enis] I think this doesn't need to cache the 
disabled tables. It will miss some edits if we disable a table and immediately 
enable it again...


was (Author: zghaobac):
Now the RegionReplicaReplicationEndpoint will skip WAL edits for disabled and 
dropped tables. There is a cache for disabled and dropped tables that has a 
default expiry of 5 sec, so I added Thread.sleep(1) in 
TestRegionReplicaFailover. [~enis] I thought this didn't need to cache the 
disabled tables. It will miss some edits if we disable a table and immediately 
enable it again...

> Throw RegionOfflineException directly when request for a disabled table
> ---
>
> Key: HBASE-17140
> URL: https://issues.apache.org/jira/browse/HBASE-17140
> Project: HBase
>  Issue Type: Improvement
>  Components: Client
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
> Attachments: HBASE-17140-v1.patch, HBASE-17140-v2.patch, 
> HBASE-17140-v3.patch
>
>
> Now, when a request is made for a disabled table, it needs 3 RPC calls 
> before failing:
> 1. get the region location
> 2. send the call to the RS and get a NotServingRegionException
> 3. retry, check the table state, then throw TableNotEnabledException
> The table state check is added for disabled tables. But the prepare method in 
> RegionServerCallable shows that every retried request will fetch the table 
> state first.
> {code}
> public void prepare(final boolean reload) throws IOException {
>   // check table state if this is a retry
>   if (reload && !tableName.equals(TableName.META_TABLE_NAME) &&
>       getConnection().isTableDisabled(tableName)) {
>     throw new TableNotEnabledException(tableName.getNameAsString() + " is disabled.");
>   }
>   try (RegionLocator regionLocator = connection.getRegionLocator(tableName)) {
>     this.location = regionLocator.getRegionLocation(row);
>   }
>   if (this.location == null) {
>     throw new IOException("Failed to find location, tableName=" + tableName +
>         ", row=" + Bytes.toString(row) + ", reload=" + reload);
>   }
>   setStubByServiceName(this.location.getServerName());
> }
> {code}
> An improvement is to set the region offline in HRegionInfo, and then throw a 
> RegionOfflineException when getting the region location.
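A hedged sketch of that proposal (illustrative only, class names from the 
client API): once the offline flag is visible during location lookup, the 
client can fail in step 1 instead of burning three round trips.
{code}
import java.io.IOException;

import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.client.RegionOfflineException;

final class FailFastSketch {
  /** Hypothetical fail-fast check inside the location lookup. */
  static HRegionLocation locate(RegionLocator regionLocator, byte[] row)
      throws IOException {
    HRegionLocation location = regionLocator.getRegionLocation(row);
    if (location != null && location.getRegionInfo().isOffline()) {
      throw new RegionOfflineException(
          "Region " + location.getRegionInfo().getRegionNameAsString() + " is offline");
    }
    return location;
  }
}
{code}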



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17140) Throw RegionOfflineException directly when request for a disabled table

2016-11-23 Thread Guanghao Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15692366#comment-15692366
 ] 

Guanghao Zhang commented on HBASE-17140:


Now the RegionReplicaReplicationEndpoint will skip WAL edits for disabled and 
dropped tables. There is a cache for disabled and dropped tables that has a 
default expiry of 5 sec, so I added Thread.sleep(1) in 
TestRegionReplicaFailover. [~enis] I thought this didn't need to cache the 
disabled tables. It will miss some edits if we disable a table and immediately 
enable it again...

> Throw RegionOfflineException directly when request for a disabled table
> ---
>
> Key: HBASE-17140
> URL: https://issues.apache.org/jira/browse/HBASE-17140
> Project: HBase
>  Issue Type: Improvement
>  Components: Client
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
> Attachments: HBASE-17140-v1.patch, HBASE-17140-v2.patch, 
> HBASE-17140-v3.patch
>
>
> Now, when a request is made for a disabled table, it needs 3 RPC calls 
> before failing:
> 1. get the region location
> 2. send the call to the RS and get a NotServingRegionException
> 3. retry, check the table state, then throw TableNotEnabledException
> The table state check is added for disabled tables. But the prepare method in 
> RegionServerCallable shows that every retried request will fetch the table 
> state first.
> {code}
> public void prepare(final boolean reload) throws IOException {
>   // check table state if this is a retry
>   if (reload && !tableName.equals(TableName.META_TABLE_NAME) &&
>       getConnection().isTableDisabled(tableName)) {
>     throw new TableNotEnabledException(tableName.getNameAsString() + " is disabled.");
>   }
>   try (RegionLocator regionLocator = connection.getRegionLocator(tableName)) {
>     this.location = regionLocator.getRegionLocation(row);
>   }
>   if (this.location == null) {
>     throw new IOException("Failed to find location, tableName=" + tableName +
>         ", row=" + Bytes.toString(row) + ", reload=" + reload);
>   }
>   setStubByServiceName(this.location.getServerName());
> }
> {code}
> An improvement is to set the region offline in HRegionInfo, and then throw a 
> RegionOfflineException when getting the region location.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16417) In-Memory MemStore Policy for Flattening and Compactions

2016-11-23 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15692359#comment-15692359
 ] 

Eshcar Hillel commented on HBASE-16417:
---

My explanation is that with 100% writes the block cache is empty and does not 
take any memory from the heap.
Whereas when there are even just 5% reads (it will be the same for even less), 
the block cache is full, taking 40% (!!!) of the heap space (or 38% to be 
precise in our settings). Only 62% of the heap space is left to be used by the 
memstore, chunk pool, compactions (both from disk and in memory), and also the 
GC, which is known to take a lot of space (G1GC).
So at some point there is just not enough space for all of these to work well, 
or even to work at all.

> In-Memory MemStore Policy for Flattening and Compactions
> 
>
> Key: HBASE-16417
> URL: https://issues.apache.org/jira/browse/HBASE-16417
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Anastasia Braginsky
>Assignee: Eshcar Hillel
> Fix For: 2.0.0
>
> Attachments: HBASE-16417-benchmarkresults-20161101.pdf, 
> HBASE-16417-benchmarkresults-20161110.pdf, 
> HBASE-16417-benchmarkresults-20161123.pdf
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-17140) Throw RegionOfflineException directly when request for a disabled table

2016-11-23 Thread Guanghao Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-17140:
---
Attachment: HBASE-17140-v3.patch

Fix ut.

> Throw RegionOfflineException directly when request for a disabled table
> ---
>
> Key: HBASE-17140
> URL: https://issues.apache.org/jira/browse/HBASE-17140
> Project: HBase
>  Issue Type: Improvement
>  Components: Client
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
> Attachments: HBASE-17140-v1.patch, HBASE-17140-v2.patch, 
> HBASE-17140-v3.patch
>
>
> Now, when a request is made for a disabled table, it needs 3 RPC calls 
> before failing:
> 1. get the region location
> 2. send the call to the RS and get a NotServingRegionException
> 3. retry, check the table state, then throw TableNotEnabledException
> The table state check is added for disabled tables. But the prepare method in 
> RegionServerCallable shows that every retried request will fetch the table 
> state first.
> {code}
> public void prepare(final boolean reload) throws IOException {
>   // check table state if this is a retry
>   if (reload && !tableName.equals(TableName.META_TABLE_NAME) &&
>       getConnection().isTableDisabled(tableName)) {
>     throw new TableNotEnabledException(tableName.getNameAsString() + " is disabled.");
>   }
>   try (RegionLocator regionLocator = connection.getRegionLocator(tableName)) {
>     this.location = regionLocator.getRegionLocation(row);
>   }
>   if (this.location == null) {
>     throw new IOException("Failed to find location, tableName=" + tableName +
>         ", row=" + Bytes.toString(row) + ", reload=" + reload);
>   }
>   setStubByServiceName(this.location.getServerName());
> }
> {code}
> An improvement is to set the region offline in HRegionInfo, and then throw a 
> RegionOfflineException when getting the region location.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17110) Add an "Overall Strategy" option (balanced both on table level and server level) to SimpleLoadBalancer

2016-11-23 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15692329#comment-15692329
 ] 

Yu Li commented on HBASE-17110:
---

Ok, waiting for the comments (smile).

> Add an "Overall Strategy" option(balanced both on table level and server 
> level) to SimpleLoadBalancer
> -
>
> Key: HBASE-17110
> URL: https://issues.apache.org/jira/browse/HBASE-17110
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Affects Versions: 2.0.0, 1.2.4
>Reporter: Charlie Qiangeng Xu
>Assignee: Charlie Qiangeng Xu
> Attachments: HBASE-17110-V2.patch, HBASE-17110-V3.patch, 
> HBASE-17110-V4.patch, HBASE-17110-V5.patch, HBASE-17110.patch
>
>
> This jira is about an enhancement of SimpleLoadBalancer. Here we introduce a 
> new strategy, "bytableOverall", which can be enabled by adding:
> {noformat}
> <property>
>   <name>hbase.master.loadbalance.bytableOverall</name>
>   <value>true</value>
> </property>
> {noformat}
> We have been using the strategy on our largest cluster for several months. 
> It's proven to be very helpful and stable; especially, the result is quite 
> visible to the users.
> Here is the reason why it's helpful:
> When operating large-scale clusters (our case), some companies still prefer 
> to use {{SimpleLoadBalancer}} due to its simplicity, quick balance plan 
> generation, etc. The current SimpleLoadBalancer has two modes:
> 1. byTable, which only guarantees that the regions of one table are uniformly 
> distributed.
> 2. byCluster, which ignores the distribution within tables and balances the 
> regions all together.
> If the pressures on different tables are different, the byTable option is the 
> preferable one in most cases. Yet this choice sacrifices cluster-level 
> balance and can cause some servers to have significantly higher load, e.g. 
> 242 regions on server A but 417 regions on server B (real-world stats).
> Consider this case,  a cluster has 3 tables and 4 servers:
> {noformat}
>   server A has 3 regions: table1:1, table2:1, table3:1
>   server B has 3 regions: table1:2, table2:2, table3:2
>   server C has 3 regions: table1:3, table2:3, table3:3
>   server D has 0 regions.
> {noformat}
> From the byTable strategy's perspective, the cluster has already been 
> perfectly balanced on table level. But a perfect status should be like:
> {noformat}
>   server A has 2 regions: table2:1, table3:1
>   server B has 2 regions: table1:2, table3:2
>   server C has 3 regions: table1:3, table2:3, table3:3
>   server D has 2 regions: table1:1, table2:2
> {noformat}
> We can see the server loads change from 3,3,3,0 to 2,2,3,2, while table1, 
> table2, and table3 all stay balanced.
> And this is what the new mode "byTableOverall" can achieve.
> Two UTs have been added as well, and the last one demonstrates the advantage 
> of the new strategy.
> Also, an onConfigurationChange method has been implemented to allow hot 
> control of the "slop" variable.
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17116) [PerformanceEvaluation] Add option to configure block size

2016-11-23 Thread Esteban Gutierrez (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15692328#comment-15692328
 ] 

Esteban Gutierrez commented on HBASE-17116:
---

+1 Thanks [~easyliangjob]

> [PerformanceEvaluation] Add option to configure block size
> --
>
> Key: HBASE-17116
> URL: https://issues.apache.org/jira/browse/HBASE-17116
> Project: HBase
>  Issue Type: Bug
>  Components: tooling
>Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.2.5
>Reporter: Esteban Gutierrez
>Assignee: Yi Liang
>Priority: Trivial
> Fix For: 2.0.0
>
> Attachments: HBASE-17116-V1.patch
>
>
> Followup from HBASE-9940 to add option to configure block size for 
> PerformanceEvaluation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17116) [PerformanceEvaluation] Add option to configure block size

2016-11-23 Thread Jerry He (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15692321#comment-15692321
 ] 

Jerry He commented on HBASE-17116:
--

+1

> [PerformanceEvaluation] Add option to configure block size
> --
>
> Key: HBASE-17116
> URL: https://issues.apache.org/jira/browse/HBASE-17116
> Project: HBase
>  Issue Type: Bug
>  Components: tooling
>Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.2.5
>Reporter: Esteban Gutierrez
>Assignee: Yi Liang
>Priority: Trivial
> Fix For: 2.0.0
>
> Attachments: HBASE-17116-V1.patch
>
>
> Followup from HBASE-9940 to add option to configure block size for 
> PerformanceEvaluation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17110) Add an "Overall Strategy" option (balanced both on table level and server level) to SimpleLoadBalancer

2016-11-23 Thread Ashish Singhi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15692290#comment-15692290
 ] 

Ashish Singhi commented on HBASE-17110:
---

Give me some time, I'm also checking it.
Thanks.

> Add an "Overall Strategy" option(balanced both on table level and server 
> level) to SimpleLoadBalancer
> -
>
> Key: HBASE-17110
> URL: https://issues.apache.org/jira/browse/HBASE-17110
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Affects Versions: 2.0.0, 1.2.4
>Reporter: Charlie Qiangeng Xu
>Assignee: Charlie Qiangeng Xu
> Attachments: HBASE-17110-V2.patch, HBASE-17110-V3.patch, 
> HBASE-17110-V4.patch, HBASE-17110-V5.patch, HBASE-17110.patch
>
>
> This jira is about an enhancement of SimpleLoadBalancer. Here we introduce a 
> new strategy, "bytableOverall", which can be enabled by adding:
> {noformat}
> <property>
>   <name>hbase.master.loadbalance.bytableOverall</name>
>   <value>true</value>
> </property>
> {noformat}
> We have been using the strategy on our largest cluster for several months. 
> It's proven to be very helpful and stable; especially, the result is quite 
> visible to the users.
> Here is the reason why it's helpful:
> When operating large-scale clusters (our case), some companies still prefer 
> to use {{SimpleLoadBalancer}} due to its simplicity, quick balance plan 
> generation, etc. The current SimpleLoadBalancer has two modes:
> 1. byTable, which only guarantees that the regions of one table are uniformly 
> distributed.
> 2. byCluster, which ignores the distribution within tables and balances the 
> regions all together.
> If the pressures on different tables are different, the byTable option is the 
> preferable one in most cases. Yet this choice sacrifices cluster-level 
> balance and can cause some servers to have significantly higher load, e.g. 
> 242 regions on server A but 417 regions on server B (real-world stats).
> Consider this case,  a cluster has 3 tables and 4 servers:
> {noformat}
>   server A has 3 regions: table1:1, table2:1, table3:1
>   server B has 3 regions: table1:2, table2:2, table3:2
>   server C has 3 regions: table1:3, table2:3, table3:3
>   server D has 0 regions.
> {noformat}
> From the byTable strategy's perspective, the cluster has already been 
> perfectly balanced on table level. But a perfect status should be like:
> {noformat}
>   server A has 2 regions: table2:1, table3:1
>   server B has 2 regions: table1:2, table3:2
>   server C has 3 regions: table1:3, table2:3, table3:3
>   server D has 2 regions: table1:1, table2:2
> {noformat}
> We can see the server loads change from 3,3,3,0 to 2,2,3,2, while table1, 
> table2, and table3 all stay balanced.
> And this is what the new mode "byTableOverall" can achieve.
> Two UTs have been added as well, and the last one demonstrates the advantage 
> of the new strategy.
> Also, an onConfigurationChange method has been implemented to allow hot 
> control of the "slop" variable.
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17072) CPU usage starts to climb up to 90-100% when using G1GC

2016-11-23 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15692284#comment-15692284
 ] 

stack commented on HBASE-17072:
---

No. The order. Before the patch it is a seek to offset 0 twice, then to the 
trailer twice, and then off we go. W/ the patch it seeks to the trailer twice 
first (which seems right), then does two seeks to 0 and then goes back to the 
trailer again (?).

I think I've figured out why we are slow when single-threaded. The cached 
header was stopping us rereading the header (seeking back before reading in 
the whole header). Let me see.

> CPU usage starts to climb up to 90-100% when using G1GC
> ---
>
> Key: HBASE-17072
> URL: https://issues.apache.org/jira/browse/HBASE-17072
> Project: HBase
>  Issue Type: Bug
>  Components: Performance, regionserver
>Affects Versions: 1.0.0, 2.0.0, 1.2.0
>Reporter: Eiichi Sato
> Attachments: HBASE-17072.master.001.patch, 
> HBASE-17072.master.002.patch, HBASE-17072.master.003.patch, 
> HBASE-17072.master.004.patch, disable-block-header-cache.patch, 
> mat-threadlocals.png, mat-threads.png, metrics.png, slave1.svg, slave2.svg, 
> slave3.svg, slave4.svg
>
>
> h5. Problem
> CPU usage of a region server in our CDH 5.4.5 cluster, at some point, starts 
> to gradually get higher up to nearly 90-100% when using G1GC.  We've also run 
> into this problem on CDH 5.7.3 and CDH 5.8.2.
> In our production cluster, it normally takes a few weeks for this to happen 
> after restarting a RS.  We reproduced this on our test cluster and attached 
> the results.  Please note that, to make it easy to reproduce, we did some 
> "anti-tuning" on a table when running tests.
> In metrics.png, soon after we started running some workloads against a test 
> cluster (CDH 5.8.2) at about 7 p.m. CPU usage of the two RSs started to rise. 
>  Flame Graphs (slave1.svg to slave4.svg) are generated from jstack dumps of 
> each RS process around 10:30 a.m. the next day.
> After investigating heapdumps from another occurrence on a test cluster 
> running CDH 5.7.3, we found that the ThreadLocalMap contain a lot of 
> contiguous entries of {{HFileBlock$PrefetchedHeader}} probably due to primary 
> clustering.  This caused more loops in 
> {{ThreadLocalMap#expungeStaleEntries()}}, consuming a certain amount of CPU 
> time.  What is worse is that the method is called from RPC metrics code, 
> which means even a small amount of per-RPC time soon adds up to a huge amount 
> of CPU time.
> This is very similar to the issue in HBASE-16616, but we have many 
> {{HFileBlock$PrefetchedHeader}} not only {{Counter$IndexHolder}} instances.  
> Here are some OQL counts from Eclipse Memory Analyzer (MAT).  This shows a 
> number of ThreadLocal instances in the ThreadLocalMap of a single handler 
> thread.
> {code}
> SELECT *
> FROM OBJECTS (SELECT AS RETAINED SET OBJECTS value
> FROM OBJECTS 0x4ee380430) obj
> WHERE obj.@clazz.@name = 
> "org.apache.hadoop.hbase.io.hfile.HFileBlock$PrefetchedHeader"
> #=> 10980 instances
> {code}
> {code}
> SELECT *
> FROM OBJECTS (SELECT AS RETAINED SET OBJECTS value
> FROM OBJECTS 0x4ee380430) obj
> WHERE obj.@clazz.@name = "org.apache.hadoop.hbase.util.Counter$IndexHolder"
> #=> 2052 instances
> {code}
> Although as described in HBASE-16616 this somewhat seems to be an issue in 
> G1GC side regarding weakly-reachable objects, we should keep ThreadLocal 
> usage minimal and avoid creating an indefinite number (in this case, a number 
> of HFiles) of ThreadLocal instances.
> HBASE-16146 removes ThreadLocals from the RPC metrics code.  That may solve 
> the issue (I just saw the patch, never tested it at all), but the 
> {{HFileBlock$PrefetchedHeader}} are still there in the ThreadLocalMap, which 
> may cause issues in the future again.
> h5. Our Solution
> We simply removed the whole {{HFileBlock$PrefetchedHeader}} caching and 
> fortunately we didn't notice any performance degradation for our production 
> workloads.
> Because the PrefetchedHeader caching uses ThreadLocal and because RPCs are 
> handled randomly in any of the handlers, small Get or small Scan RPCs do not 
> benefit from the caching (See HBASE-10676 and HBASE-11402 for the details).  
> Probably, we need to see how well reads are saved by the caching for large 
> Scan or Get RPCs and especially for compactions if we really remove the 
> caching. It's probably better if we can remove ThreadLocals without breaking 
> the current caching behavior.
> FWIW, I'm attaching the patch we applied. It's for CDH 5.4.5.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-15787) Change the flush related heuristics to work with offheap size configured

2016-11-23 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15692275#comment-15692275
 ] 

ramkrishna.s.vasudevan commented on HBASE-15787:


bq.DefaultHeapMemoryTuner dealing with off heap flush count etc.. IMO we should 
avoid this.
I think we are not dealing with or tuning the offheap memory here. We only use 
those counts to make a better onheap tuning decision, so I don't think we are 
making the heap memory tuner deal with offheap.
bq.Why we need on heap off heap diff types in FlushType ?
I need those types to know the exact flush counts I get whether we have an 
onheap memstore or an offheap memstore. Maybe this enum can be moved out of the 
tuner? I added some comments in the patch to decide on that.
Actually the decision to increase/decrease the memstore size and block cache 
size is more intelligent now: it works in smaller steps. Say the global 
memstore limit is 12G (a fraction of 0.42 with a 30G xmx); decreasing the 
fraction by 0.1 to 0.32 would drop us drastically to about 9G. The current code 
does not work that way; it moves in smaller steps.
So when there is an offheap memstore, reads dominate, and we need more block 
cache space, we need to reduce the heap memstore limit. In that case we reduce 
it by just the minimum step size.
It is not directly possible to apply something like this
{code}
blocked flushes = blocked flushes by heap overhead pressure + 0.75 * blocked 
flushes by offheap pressure
{code}
We can only change the step size indirectly, and in this case we try to make 
the step size as small as possible (a small sketch follows below).
One thing to note is that for an R/W workload things will eventually settle 
down over time, and, as in the current impl, if we are not able to take a 
decision the current settings are left intact. So I think by all means we are 
safe.
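
To illustrate the point, a minimal sketch of a weighted, step-bounded tuning 
decision (hypothetical names and constants, not the actual patch):
{code}
// Sketch only: offheap-pressure flushes are counted with a lesser weight so
// they nudge, rather than drive, the onheap decision, and any adjustment is
// capped at one small step per tuning period.
class TunerStepSketch {
  static final float OFFHEAP_FLUSH_WEIGHT = 0.5f; // the lesser weightage
  static final float STEP = 0.02f;                // minimum step size (illustrative)

  static float nextMemstoreFraction(float current, long onheapFlushes,
      long offheapFlushes, long blockCacheEvictions) {
    double weightedFlushes = onheapFlushes + OFFHEAP_FLUSH_WEIGHT * offheapFlushes;
    // Illustrative heuristic: if cache pressure dominates, give heap back to
    // the block cache; either way, move only by the minimum step, never by a
    // drastic fraction change.
    return blockCacheEvictions > weightedFlushes ? current - STEP : current + STEP;
  }
}
{code}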

> Change the flush related heuristics to work with offheap size configured
> 
>
> Key: HBASE-15787
> URL: https://issues.apache.org/jira/browse/HBASE-15787
> Project: HBase
>  Issue Type: Sub-task
>  Components: regionserver
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0
>
> Attachments: HBASE-15787.patch, HBASE-15787_1.patch
>
>
> With offheap MSLAB in place we may have to change the flush related 
> heuristics to work with offheap size configured rather than the java heap 
> size.
> Since we now have a clear separation of the memstore data size and memstore 
> heap size, for offheap memstore
> -> Decide if the global.offheap.memstore.size is breached for blocking 
> updates and force flushes. 
> -> If the onheap global.memstore.size is breached (due to heap overhead) even 
> then block updates and force flushes.
> -> The global.memstore.size.lower.limit is now by default 95% of the 
> global.memstore.size. So now we apply this 95% on the 
> global.offheap.memstore.size and also on global.memstore.size (as it was done 
> for onheap case).
> -> We will have new FlushTypes introduced
> {code}
>   ABOVE_ONHEAP_LOWER_MARK, /* happens due to lower mark breach of onheap 
> memstore settings
>   An offheap memstore can even breach the 
> onheap_lower_mark*/
>   ABOVE_ONHEAP_HIGHER_MARK,/* happens due to higher mark breach of onheap 
> memstore settings
>   An offheap memstore can even breach the 
> onheap_higher_mark*/
>   ABOVE_OFFHEAP_LOWER_MARK,/* happens due to lower mark breach of offheap 
> memstore settings*/
>   ABOVE_OFFHEAP_HIGHER_MARK;
> {code}
> -> regionServerAccounting does all the accounting.
> -> HeapMemoryTuner is what is a little tricky here. First thing to note is that 
> at no point it will try to increase or decrease the 
> global.offheap.memstore.size. If there is a heap pressure then it will try to 
> increase the memstore heap limit. 
> In case of offheap memstore there is always a chance that the heap pressure 
> does not increase. In that case we could ideally decrease the heap limit for 
> memstore. The current logic of the heap memory tuner is such that things will 
> naturally settle down. But on discussion, what we thought is: let us include 
> the flush count that happens due to offheap pressure but give it a lesser 
> weightage, and thus ensure that the initial decrease of the memstore heap 
> limit does not happen. Currently that fraction is set to 0.5. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17072) CPU usage starts to climb up to 90-100% when using G1GC

2016-11-23 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15692255#comment-15692255
 ] 

ramkrishna.s.vasudevan commented on HBASE-17072:


bq.I'm not sure why above changes file scan startup sequence.
Can you say more about how the sequence is affected? Do you mean in terms of 
the number of seeks, as you mentioned in the previous comment?

> CPU usage starts to climb up to 90-100% when using G1GC
> ---
>
> Key: HBASE-17072
> URL: https://issues.apache.org/jira/browse/HBASE-17072
> Project: HBase
>  Issue Type: Bug
>  Components: Performance, regionserver
>Affects Versions: 1.0.0, 2.0.0, 1.2.0
>Reporter: Eiichi Sato
> Attachments: HBASE-17072.master.001.patch, 
> HBASE-17072.master.002.patch, HBASE-17072.master.003.patch, 
> HBASE-17072.master.004.patch, disable-block-header-cache.patch, 
> mat-threadlocals.png, mat-threads.png, metrics.png, slave1.svg, slave2.svg, 
> slave3.svg, slave4.svg
>
>
> h5. Problem
> CPU usage of a region server in our CDH 5.4.5 cluster, at some point, starts 
> to gradually get higher up to nearly 90-100% when using G1GC.  We've also run 
> into this problem on CDH 5.7.3 and CDH 5.8.2.
> In our production cluster, it normally takes a few weeks for this to happen 
> after restarting a RS.  We reproduced this on our test cluster and attached 
> the results.  Please note that, to make it easy to reproduce, we did some 
> "anti-tuning" on a table when running tests.
> In metrics.png, soon after we started running some workloads against a test 
> cluster (CDH 5.8.2) at about 7 p.m. CPU usage of the two RSs started to rise. 
>  Flame Graphs (slave1.svg to slave4.svg) are generated from jstack dumps of 
> each RS process around 10:30 a.m. the next day.
> After investigating heapdumps from another occurrence on a test cluster 
> running CDH 5.7.3, we found that the ThreadLocalMap contains a lot of 
> contiguous entries of {{HFileBlock$PrefetchedHeader}} probably due to primary 
> clustering.  This caused more loops in 
> {{ThreadLocalMap#expungeStaleEntries()}}, consuming a certain amount of CPU 
> time.  What is worse is that the method is called from RPC metrics code, 
> which means even a small amount of per-RPC time soon adds up to a huge amount 
> of CPU time.
> This is very similar to the issue in HBASE-16616, but we have many 
> {{HFileBlock$PrefetchedHeader}} instances, not only {{Counter$IndexHolder}} ones.  
> Here are some OQL counts from Eclipse Memory Analyzer (MAT).  This shows a 
> number of ThreadLocal instances in the ThreadLocalMap of a single handler 
> thread.
> {code}
> SELECT *
> FROM OBJECTS (SELECT AS RETAINED SET OBJECTS value
> FROM OBJECTS 0x4ee380430) obj
> WHERE obj.@clazz.@name = 
> "org.apache.hadoop.hbase.io.hfile.HFileBlock$PrefetchedHeader"
> #=> 10980 instances
> {code}
> {code}
> SELECT *
> FROM OBJECTS (SELECT AS RETAINED SET OBJECTS value
> FROM OBJECTS 0x4ee380430) obj
> WHERE obj.@clazz.@name = "org.apache.hadoop.hbase.util.Counter$IndexHolder"
> #=> 2052 instances
> {code}
> Although as described in HBASE-16616 this somewhat seems to be an issue in 
> G1GC side regarding weakly-reachable objects, we should keep ThreadLocal 
> usage minimal and avoid creating an indefinite number (in this case, a number 
> of HFiles) of ThreadLocal instances.
> HBASE-16146 removes ThreadLocals from the RPC metrics code.  That may solve 
> the issue (I just saw the patch, never tested it at all), but the 
> {{HFileBlock$PrefetchedHeader}} are still there in the ThreadLocalMap, which 
> may cause issues in the future again.
> h5. Our Solution
> We simply removed the whole {{HFileBlock$PrefetchedHeader}} caching and 
> fortunately we didn't notice any performance degradation for our production 
> workloads.
> Because the PrefetchedHeader caching uses ThreadLocal and because RPCs are 
> handled randomly in any of the handlers, small Get or small Scan RPCs do not 
> benefit from the caching (See HBASE-10676 and HBASE-11402 for the details).  
> Probably, we need to see how well reads are saved by the caching for large 
> Scan or Get RPCs and especially for compactions if we really remove the 
> caching. It's probably better if we can remove ThreadLocals without breaking 
> the current caching behavior.
> FWIW, I'm attaching the patch we applied. It's for CDH 5.4.5.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17049) Find out why AsyncFSWAL issues much more syncs than FSHLog

2016-11-23 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15692257#comment-15692257
 ] 

ramkrishna.s.vasudevan commented on HBASE-17049:


A larger batch size with WALPE helped me; with the default batch size I saw 
lower performance. I could verify that.
Let me check HBASE-17048.

> Find out why AsyncFSWAL issues much more syncs than FSHLog
> --
>
> Key: HBASE-17049
> URL: https://issues.apache.org/jira/browse/HBASE-17049
> Project: HBase
>  Issue Type: Sub-task
>  Components: wal
>Affects Versions: 2.0.0
>Reporter: Duo Zhang
> Fix For: 2.0.0
>
> Attachments: delay-sync.patch
>
>
> https://issues.apache.org/jira/browse/HBASE-16890?focusedCommentId=15647590=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15647590



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16417) In-Memory MemStore Policy for Flattening and Compactions

2016-11-23 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15692240#comment-15692240
 ] 

ramkrishna.s.vasudevan commented on HBASE-16417:


I am more interested in that fig 8. Why do you think we get much poorer latency 
and throughput with MSLAB and chunks? We only have 5% reads.

> In-Memory MemStore Policy for Flattening and Compactions
> 
>
> Key: HBASE-16417
> URL: https://issues.apache.org/jira/browse/HBASE-16417
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Anastasia Braginsky
>Assignee: Eshcar Hillel
> Fix For: 2.0.0
>
> Attachments: HBASE-16417-benchmarkresults-20161101.pdf, 
> HBASE-16417-benchmarkresults-20161110.pdf, 
> HBASE-16417-benchmarkresults-20161123.pdf
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17144) Possible offheap read ByteBuffers leak

2016-11-23 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15692135#comment-15692135
 ] 

binlijin commented on HBASE-17144:
--

[~anoop.hbase] [~carp84] Thanks very much for the review; will commit it soon 
if there are no other concerns.

> Possible offheap read ByteBuffers leak
> --
>
> Key: HBASE-17144
> URL: https://issues.apache.org/jira/browse/HBASE-17144
> Project: HBase
>  Issue Type: Bug
>  Components: rpc
>Affects Versions: 2.0.0
>Reporter: binlijin
>Assignee: binlijin
> Fix For: 2.0.0
>
> Attachments: HBASE-17144.patch
>
>
> At HBASE-15788 we reuse off heap read BBs, the CallCleanup will not called in 
> some circumstances for example CALL_QUEUE_TOO_BIG_EXCEPTION and 
> readParamsFailedCall.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17144) Possible offheap read ByteBuffers leak

2016-11-23 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15692128#comment-15692128
 ] 

binlijin commented on HBASE-17144:
--

The timed-out case is unrelated; it runs successfully locally.

> Possible offheap read ByteBuffers leak
> --
>
> Key: HBASE-17144
> URL: https://issues.apache.org/jira/browse/HBASE-17144
> Project: HBase
>  Issue Type: Bug
>  Components: rpc
>Affects Versions: 2.0.0
>Reporter: binlijin
>Assignee: binlijin
> Fix For: 2.0.0
>
> Attachments: HBASE-17144.patch
>
>
> At HBASE-15788 we reuse off heap read BBs, the CallCleanup will not called in 
> some circumstances for example CALL_QUEUE_TOO_BIG_EXCEPTION and 
> readParamsFailedCall.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17172) Optimize major mob compaction with _del files

2016-11-23 Thread Jingcheng Du (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15692109#comment-15692109
 ] 

Jingcheng Du commented on HBASE-17172:
--

Hi [~huaxiang], why do you need the major compaction? Do you want to reduce the 
number of files? If so, can the proposal in HBASE-16981 solve this? I guess the 
major compaction won't be needed once HBASE-16981 is implemented.

> Optimize major mob compaction with _del files
> -
>
> Key: HBASE-17172
> URL: https://issues.apache.org/jira/browse/HBASE-17172
> Project: HBase
>  Issue Type: Improvement
>  Components: mob
>Affects Versions: 2.0.0
>Reporter: huaxiang sun
>Assignee: huaxiang sun
>
> Today, when there is a _del file in mobdir, a major mob compaction recompacts 
> every mob file. This causes lots of IO and slows down the major mob 
> compaction (it may take months to finish), so it needs to be improved. A few 
> ideas are: 
> 1) Do not compact all _del files into one; instead, compact them into groups 
> keyed by startKey, then use each mob file's firstKey/startKey to see whether 
> the _del file needs to be included for that partition.
> 2) Based on the timerange of the _del file, compactions of files written 
> after that timerange do not need to include the _del file, as those files are 
> newer (a small sketch of this check follows).
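
A small sketch of the timerange check in idea 2 (made-up names, not the MOB 
code):
{code}
// Sketch only: a mob file whose cells are all newer than the _del file's time
// range cannot contain anything the _del file masks, so it can be skipped
// during major mob compaction.
class DelFileSkipSketch {
  static boolean mustIncludeDelFile(long mobFileMinTimestamp, long delFileMaxTimestamp) {
    // A delete marker only masks cells written at or before its timestamp.
    return mobFileMinTimestamp <= delFileMaxTimestamp;
  }
}
{code}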



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17144) Possible offheap read ByteBuffers leak

2016-11-23 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15692102#comment-15692102
 ] 

Yu Li commented on HBASE-17144:
---

Patch LGTM, +1

> Possible offheap read ByteBuffers leak
> --
>
> Key: HBASE-17144
> URL: https://issues.apache.org/jira/browse/HBASE-17144
> Project: HBase
>  Issue Type: Bug
>  Components: rpc
>Affects Versions: 2.0.0
>Reporter: binlijin
>Assignee: binlijin
> Fix For: 2.0.0
>
> Attachments: HBASE-17144.patch
>
>
> At HBASE-15788 we reuse off heap read BBs, the CallCleanup will not called in 
> some circumstances for example CALL_QUEUE_TOO_BIG_EXCEPTION and 
> readParamsFailedCall.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17172) Optimize major mob compaction with _del files

2016-11-23 Thread Jingcheng Du (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15692088#comment-15692088
 ] 

Jingcheng Du commented on HBASE-17172:
--

Thanks [~huaxiang]!
A major compaction compacts all the files even when there are no del files, 
which is slow. Is the slowness related to the del files? How about increasing 
the number of compaction threads to reduce the running time?
Actually, if deletes are rare, we can always keep the delete markers in the 
hbase files of a mob-enabled cf, even across all files and major compactions, 
and then we won't need the .del files in mob anymore.
If the slowness is not related to the .del files, I guess we have to fix the 
slow compaction by implementing a distributed compaction. I filed HBASE-15381 
to implement this; the patch is there, but I haven't rebased it for a long 
time. Are you interested in taking it over?

> Optimize major mob compaction with _del files
> -
>
> Key: HBASE-17172
> URL: https://issues.apache.org/jira/browse/HBASE-17172
> Project: HBase
>  Issue Type: Improvement
>  Components: mob
>Affects Versions: 2.0.0
>Reporter: huaxiang sun
>Assignee: huaxiang sun
>
> Today, when there is a _del file in mobdir, a major mob compaction recompacts 
> every mob file. This causes lots of IO and slows down the major mob 
> compaction (it may take months to finish), so it needs to be improved. A few 
> ideas are: 
> 1) Do not compact all _del files into one; instead, compact them into groups 
> keyed by startKey, then use each mob file's firstKey/startKey to see whether 
> the _del file needs to be included for that partition.
> 2) Based on the timerange of the _del file, compactions of files written 
> after that timerange do not need to include the _del file, as those files are 
> newer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17110) Add an "Overall Strategy" option(balanced both on table level and server level) to SimpleLoadBalancer

2016-11-23 Thread Guanghao Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15692058#comment-15692058
 ] 

Guanghao Zhang commented on HBASE-17110:


Yeah, thanks for the explanation, [~xharlie].

> Add an "Overall Strategy" option(balanced both on table level and server 
> level) to SimpleLoadBalancer
> -
>
> Key: HBASE-17110
> URL: https://issues.apache.org/jira/browse/HBASE-17110
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Affects Versions: 2.0.0, 1.2.4
>Reporter: Charlie Qiangeng Xu
>Assignee: Charlie Qiangeng Xu
> Attachments: HBASE-17110-V2.patch, HBASE-17110-V3.patch, 
> HBASE-17110-V4.patch, HBASE-17110-V5.patch, HBASE-17110.patch
>
>
> This jira is about an enhancement of simpleLoadBalancer. Here we introduce a 
> new strategy: "bytableOverall" which could be controlled by adding:
> {noformat}
> <property>
>   <name>hbase.master.loadbalance.bytableOverall</name>
>   <value>true</value>
> </property>
> {noformat}
> We have been using the strategy on our largest cluster for several months. 
> It has proven to be very helpful and stable; in particular, the result is 
> quite visible to the users.
> Here is the reason why it's helpful:
> When operating large scale clusters(our case), some companies still prefer to 
> use {{SimpleLoadBalancer}} due to its simplicity, quick balance plan 
> generation, etc. Current SimpleLoadBalancer has two modes: 
> 1. byTable, which only guarantees that the regions of one table could be 
> uniformly distributed. 
> 2. byCluster, which ignores the distribution within tables and balances the 
> regions all together.
> If the pressures on different tables are different, the first byTable option 
> is the preferable one in most cases. Yet this choice sacrifices the cluster 
> level balance and can leave some servers with significantly higher load, 
> e.g. 242 regions on server A but 417 regions on server B (real-world stats).
> Consider this case,  a cluster has 3 tables and 4 servers:
> {noformat}
>   server A has 3 regions: table1:1, table2:1, table3:1
>   server B has 3 regions: table1:2, table2:2, table3:2
>   server C has 3 regions: table1:3, table2:3, table3:3
>   server D has 0 regions.
> {noformat}
> From the byTable strategy's perspective, the cluster has already been 
> perfectly balanced on table level. But a perfect status should be like:
> {noformat}
>   server A has 2 regions: table2:1, table3:1
>   server B has 2 regions: table1:2, table3:2
>   server C has 3 regions: table1:3, table2:3, table3:3
>   server D has 2 regions: table1:1, table2:2
> {noformat}
> We can see the server loads change from 3,3,3,0 to 2,2,3,2, while the table1, 
> table2 and table3 still keep balanced.   
> And this is what the new mode "byTableOverall" can achieve.
> Two UTs have been added as well and the last one demonstrates the advantage 
> of the new strategy.
> Also, an onConfigurationChange method has been implemented to allow hot 
> control of the "slop" variable.
>  
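
A toy sketch of the selection rule behind byTableOverall (hypothetical names, 
not the patch): among the servers that would keep the current table balanced, 
prefer the one with the fewest regions overall, which is how the 3,3,3,0 layout 
above converges to 2,2,3,2.
{code}
import java.util.Comparator;
import java.util.Map;
import java.util.Set;

// Sketch only: pick the overall-least-loaded server among the candidates that
// keep the table being balanced uniformly distributed.
class ByTableOverallSketch {
  static String pickServer(Map<String, Integer> overallLoad, Set<String> tableBalancedCandidates) {
    return tableBalancedCandidates.stream()
        .min(Comparator.comparingInt(overallLoad::get))
        .orElseThrow(IllegalStateException::new);
  }
}
{code}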



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16489) Configuration parsing

2016-11-23 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15692036#comment-15692036
 ] 

Enis Soztutar commented on HBASE-16489:
---

Thanks for the latest version of the patch. A couple of comments: 
- This still assumes that the hbase source code is at {{/usr/src/hbase}}. 
Instead you should use relative directories or find PWD and use {{$PWD/build}} 
or something. 
{code}
+const std::string kHBaseConfPath("/usr/src/hbase/hbase-native-client/build/");
{code}

Same thing with kDefHBaseConfPath as well. You can manually create a conf dir 
under build if you want to set a search path there.  
- Let's rename the files as well: {{configuration_loader.h}} -> 
{{hbase_configuration_loader.h}}, and the same for {{.cc}}. 
- You should not do this kind of conditional in unit tests: 
{code}
+  if (conf) {
+EXPECT_STREQ((*conf).Get("custom-prop", "").c_str(), "custom-value");
+EXPECT_STRNE((*conf).Get("custom-prop", "").c_str(), "some-value");
+  }
{code} 
The goal of the unit test is to fail with an exception if there is something 
wrong, and to do assertions in the success code paths. If the configuration 
does not get parsed above (let's say a later patch breaks this path), we 
actually want the test to fail so that we can catch the issue. The above code 
instead silently succeeds, defeating the purpose of having a unit test. The 
best way would be to add an assertion after the config parsing which checks 
that the conf optional is set. Something like this: 
{code}
+  HBaseConfigurationLoader loader;
+  std::vector<std::string> resources { kHBaseSiteXml };
+  hbase::optional<Configuration> conf = loader.LoadResources(kHBaseConfPath,
+      resources);
+  ASSERT_TRUE(conf.has_value()) << "Configuration parsing failed!";
+  EXPECT_STREQ((*conf).Get("custom-prop", "").c_str(), "custom-value");
+  EXPECT_STRNE((*conf).Get("custom-prop", "").c_str(), "some-value");
{code}

- Also, I forgot to mention earlier that in the java code base we have strict 
line wrapping at 100 columns. Let's follow that in the C++ code base as well. 
Our precommit script (hadoopqa) checks this for regular patches, but it is not 
hooked up for this branch and the C++ code yet, so we can do the check manually 
for now. 




> Configuration parsing
> -
>
> Key: HBASE-16489
> URL: https://issues.apache.org/jira/browse/HBASE-16489
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Sudeep Sunthankar
>Assignee: Sudeep Sunthankar
> Attachments: HBASE-16489.HBASE-14850.v1.patch, 
> HBASE-16489.HBASE-14850.v2.patch, HBASE-16489.HBASE-14850.v3.patch, 
> HBASE-16489.HBASE-14850.v4.patch, HBASE-16489.HBASE-14850.v5.patch, 
> HBASE-16489.HBASE-14850.v6.patch
>
>
> Reading hbase-site.xml is required to read various properties viz. 
> zookeeper-quorum, client retires etc.  We can either use Apache Xerces or 
> Boost libraries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-9774) Provide a way for coprocessors to register and report custom metrics

2016-11-23 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15691988#comment-15691988
 ] 

Enis Soztutar commented on HBASE-9774:
--

I liked HBASE-14282, which is just ripping out metrics. This patch will give us 
DW-based metrics first in the code base; then we can incrementally move to 
using more and more internal stuff and eventually contain everything for 
metrics2 in the hadoop-compat module. 

> Provide a way for coprocessors to register and report custom metrics
> 
>
> Key: HBASE-9774
> URL: https://issues.apache.org/jira/browse/HBASE-9774
> Project: HBase
>  Issue Type: New Feature
>  Components: Coprocessors, metrics
>Reporter: Gary Helmling
>
> It would help provide better visibility into what coprocessors are doing if 
> we provided a way for coprocessors to export their own metrics.  The general 
> idea is to:
> * extend access to the HBase "metrics bus" down into the coprocessor 
> environments
> * coprocessors can then register and increment custom metrics
> * coprocessor metrics are then reported along with all others through normal 
> mechanisms



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-17110) Add an "Overall Strategy" option(balanced both on table level and server level) to SimpleLoadBalancer

2016-11-23 Thread Yu Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Li updated HBASE-17110:
--
Issue Type: Improvement  (was: New Feature)

Updating JIRA type from "New Feature" to "Improvement" since we decided not to 
add a new strategy but to improve the existing one. Will update the JIRA title 
and complete the release note later, after the review is done.

> Add an "Overall Strategy" option(balanced both on table level and server 
> level) to SimpleLoadBalancer
> -
>
> Key: HBASE-17110
> URL: https://issues.apache.org/jira/browse/HBASE-17110
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Affects Versions: 2.0.0, 1.2.4
>Reporter: Charlie Qiangeng Xu
>Assignee: Charlie Qiangeng Xu
> Attachments: HBASE-17110-V2.patch, HBASE-17110-V3.patch, 
> HBASE-17110-V4.patch, HBASE-17110-V5.patch, HBASE-17110.patch
>
>
> This jira is about an enhancement of simpleLoadBalancer. Here we introduce a 
> new strategy: "bytableOverall" which could be controlled by adding:
> {noformat}
> <property>
>   <name>hbase.master.loadbalance.bytableOverall</name>
>   <value>true</value>
> </property>
> {noformat}
> We have been using the strategy on our largest cluster for several months. 
> It has proven to be very helpful and stable; in particular, the result is 
> quite visible to the users.
> Here is the reason why it's helpful:
> When operating large scale clusters(our case), some companies still prefer to 
> use {{SimpleLoadBalancer}} due to its simplicity, quick balance plan 
> generation, etc. Current SimpleLoadBalancer has two modes: 
> 1. byTable, which only guarantees that the regions of one table could be 
> uniformly distributed. 
> 2. byCluster, which ignores the distribution within tables and balances the 
> regions all together.
> If the pressures on different tables are different, the first byTable option 
> is the preferable one in most cases. Yet this choice sacrifices the cluster 
> level balance and can leave some servers with significantly higher load, 
> e.g. 242 regions on server A but 417 regions on server B (real-world stats).
> Consider this case,  a cluster has 3 tables and 4 servers:
> {noformat}
>   server A has 3 regions: table1:1, table2:1, table3:1
>   server B has 3 regions: table1:2, table2:2, table3:2
>   server C has 3 regions: table1:3, table2:3, table3:3
>   server D has 0 regions.
> {noformat}
> From the byTable strategy's perspective, the cluster has already been 
> perfectly balanced on table level. But a perfect status should be like:
> {noformat}
>   server A has 2 regions: table2:1, table3:1
>   server B has 2 regions: table1:2, table3:2
>   server C has 3 regions: table1:3, table2:3, table3:3
>   server D has 2 regions: table1:1, table2:2
> {noformat}
> We can see the server loads change from 3,3,3,0 to 2,2,3,2, while the table1, 
> table2 and table3 still keep balanced.   
> And this is what the new mode "byTableOverall" can achieve.
> Two UTs have been added as well and the last one demonstrates the advantage 
> of the new strategy.
> Also, an onConfigurationChange method has been implemented to allow hot 
> control of the "slop" variable.
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17110) Add an "Overall Strategy" option(balanced both on table level and server level) to SimpleLoadBalancer

2016-11-23 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15691981#comment-15691981
 ] 

Yu Li commented on HBASE-17110:
---

Patch v5 lgtm, +1.

[~zghaobac] do the above answers from [~xharlie] address your concerns? Thanks.

[~tedyu], [~anoop.hbase] and [~enis], any more comments on the new patch?

Preparing to commit if there are no more comments/objections (smile), targeting 
master, branch-1, branch-1.2 and branch-1.3 (1.3.1).

> Add an "Overall Strategy" option(balanced both on table level and server 
> level) to SimpleLoadBalancer
> -
>
> Key: HBASE-17110
> URL: https://issues.apache.org/jira/browse/HBASE-17110
> Project: HBase
>  Issue Type: New Feature
>  Components: Balancer
>Affects Versions: 2.0.0, 1.2.4
>Reporter: Charlie Qiangeng Xu
>Assignee: Charlie Qiangeng Xu
> Attachments: HBASE-17110-V2.patch, HBASE-17110-V3.patch, 
> HBASE-17110-V4.patch, HBASE-17110-V5.patch, HBASE-17110.patch
>
>
> This jira is about an enhancement of simpleLoadBalancer. Here we introduce a 
> new strategy: "bytableOverall" which could be controlled by adding:
> {noformat}
> <property>
>   <name>hbase.master.loadbalance.bytableOverall</name>
>   <value>true</value>
> </property>
> {noformat}
> We have been using the strategy on our largest cluster for several months. 
> It has proven to be very helpful and stable; in particular, the result is 
> quite visible to the users.
> Here is the reason why it's helpful:
> When operating large scale clusters(our case), some companies still prefer to 
> use {{SimpleLoadBalancer}} due to its simplicity, quick balance plan 
> generation, etc. Current SimpleLoadBalancer has two modes: 
> 1. byTable, which only guarantees that the regions of one table could be 
> uniformly distributed. 
> 2. byCluster, which ignores the distribution within tables and balances the 
> regions all together.
> If the pressures on different tables are different, the first byTable option 
> is the preferable one in most cases. Yet this choice sacrifices the cluster 
> level balance and can leave some servers with significantly higher load, 
> e.g. 242 regions on server A but 417 regions on server B (real-world stats).
> Consider this case,  a cluster has 3 tables and 4 servers:
> {noformat}
>   server A has 3 regions: table1:1, table2:1, table3:1
>   server B has 3 regions: table1:2, table2:2, table3:2
>   server C has 3 regions: table1:3, table2:3, table3:3
>   server D has 0 regions.
> {noformat}
> From the byTable strategy's perspective, the cluster has already been 
> perfectly balanced on table level. But a perfect status should be like:
> {noformat}
>   server A has 2 regions: table2:1, table3:1
>   server B has 2 regions: table1:2, table3:2
>   server C has 3 regions: table1:3, table2:3, table3:3
>   server D has 2 regions: table1:1, table2:2
> {noformat}
> We can see the server loads change from 3,3,3,0 to 2,2,3,2, while the table1, 
> table2 and table3 still keep balanced.   
> And this is what the new mode "byTableOverall" can achieve.
> Two UTs have been added as well and the last one demonstrates the advantage 
> of the new strategy.
> Also, an onConfigurationChange method has been implemented to allow hot 
> control of the "slop" variable.
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-15902) Scan Object

2016-11-23 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15691972#comment-15691972
 ] 

Enis Soztutar commented on HBASE-15902:
---

Thanks Sudeep for the patch. This is a good start. 

- The patch file seems to contain only the second of two commits: 
{code}
Subject: [PATCH 2/2] Adding sources for Scan object
{code}
Did you mean to attach the first part as well? I don't see changes to the BUCK 
files, so maybe they are there? 

- This should be private: 
{code}
+  bool IsStartRowAndEqualsStopRow() const;
{code}

- It seems we did not do the Query interface in the Get patch. In Java-land 
that is how we carry the filter, as well as some extra methods. I would suggest 
leaving those out for now and opening another issue for the Query interface + 
the Filter methods. Query has a bunch of other methods that we do not need, so 
we can discuss what to include there. 

- We are trying to simplify the Scan API elsewhere in the java client. Some of 
these will be deprecated and removed in java as well. There are a few methods 
that are in the Scan API to deal with how to do paging for large rows 
containing thousands / millions of cells. In our first cut, we would not deal 
with the complexity of passing partial results within rows at all. Once the 
client / scanner is in better shape, we can add that as a feature with better 
APIs (hopefully by that time Scan's API in Java will also be simplified). See 
https://issues.apache.org/jira/browse/HBASE-15484?focusedCommentId=15325647=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15325647.
 

For now, let's remove these methods, and keep the API simple (again we can 
revisit in a later issue). 
{code}
+Scan& Scan::SetRowOffsetPerColumnFamily(int store_offset) {
+int Scan::RowOffsetPerColumnFamily() const {
+  int MaxResultsPerColumnFamily() const;
+  Scan (int batch);
+  int Batch() const;
{code}
So basically, we should remove everything related to rowOffsetPerColumnFamily, 
MaxResultsPerColumnFamily, and Batch. We can keep AllowPartialResults for now 
since that is unlikely to change. The first implementation probably will not 
deal with partial results though. Sorry, we should have talked about this 
before. 

- We can remove this method as well, since it is relevant only on the server 
side (Gets are internally processed as a Scan with start row = stop row; that 
is why this method is used in some server-side code paths). 
{code}
+  bool IsGetScan() const;
{code}

- In this patch, we are allocating TimeRange objects in heap via a pointer, 
versus in the Get object, we are embedding those objects: 
{code}
+Scan& Scan::SetTimeRange(long min_stamp, long max_stamp) {
+  if (nullptr != tr_) {
+delete tr_;
+tr_ = nullptr;
+  }
+  tr_ = new TimeRange(min_stamp, max_stamp);
+  return *this;
+}
Get& Get::SetTimeRange(long min_timestamp, long max_timestamp) {
  this->tr_ = TimeRange(min_timestamp, max_timestamp);
  return *this;
}
{code}
We should stick with a single approach. Does Get::SetTimeRange() end up 
pointing to a stack-allocated object (sorry, my C++ memory model is rusty)? We 
should fix it if so.  


> Scan Object
> ---
>
> Key: HBASE-15902
> URL: https://issues.apache.org/jira/browse/HBASE-15902
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Sudeep Sunthankar
>Assignee: Sudeep Sunthankar
> Attachments: HBASE-15902.HBASE-14850.patch, 
> HBASE-15902.HBASE-14850.v2.patch
>
>
> Patch for creating Scan objects. Scan objects thus created can be used by 
> Table implementation to fetch results for a given row.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17167) Pass mvcc to client when scan

2016-11-23 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15691902#comment-15691902
 ] 

Enis Soztutar commented on HBASE-17167:
---

The good news is that we discard the partial results if we restart the scanner 
when allowPartialResults=false. Of course, if we make it so that 
allowPartialResults=true also sees the row atomically, it will be way better. 
Also see HBASE-9797: we need to track the read point not just for partial 
results but across rows as well, in case multi-row transactions are used (as 
we do in meta, for example). 

> Pass mvcc to client when scan
> -
>
> Key: HBASE-17167
> URL: https://issues.apache.org/jira/browse/HBASE-17167
> Project: HBase
>  Issue Type: Sub-task
>  Components: Client, scan
>Reporter: Duo Zhang
> Fix For: 2.0.0, 1.4.0
>
>
> In the current implementation, if we use batch or allowPartial when 
> scanning, row-level atomicity cannot be guaranteed if we need to restart a 
> scan in the middle of a row due to a region move or something else.
> We can return the mvcc used to open the scanner to the client, and the 
> client could use this mvcc to restart a scan and keep row-level atomicity.
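
A self-contained sketch of the proposed flow (made-up types, not the real 
client code):
{code}
// Sketch only: the open-scanner response carries the mvcc read point the
// server used; the client stores it and sends it back when reopening the
// scanner, so the new region scanner reads at the same point and rows stay
// atomic even with batch/allowPartialResults set.
class ScannerRestartSketch {
  private long mvccReadPoint = -1; // unset until the first open-scanner response

  void onScannerOpened(long serverReadPoint) {
    if (mvccReadPoint < 0) {
      mvccReadPoint = serverReadPoint; // pin the read point of the first open
    }
  }

  long readPointForRestart() {
    // Sent with the reopen request instead of letting the new scanner pick a
    // newer read point.
    return mvccReadPoint;
  }
}
{code}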



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17072) CPU usage starts to climb up to 90-100% when using G1GC

2016-11-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15691884#comment-15691884
 ] 

Hadoop QA commented on HBASE-17072:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
37s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
38s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
34s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
37s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 31s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
38s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
23m 25s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha1. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
48s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 85m 9s 
{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
17s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 119m 30s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:8d52d23 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12840342/HBASE-17072.master.004.patch
 |
| JIRA Issue | HBASE-17072 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  |
| uname | Linux 12a30c7e2b51 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 
13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 0b0e857 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/4605/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/4605/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> CPU usage starts to climb up to 90-100% when using G1GC
> ---
>
> Key: HBASE-17072
> URL: https://issues.apache.org/jira/browse/HBASE-17072
> Project: HBase
>  Issue Type: Bug
>  Components: Performance, regionserver
>Affects Versions: 1.0.0, 2.0.0, 1.2.0
>Reporter: Eiichi Sato
> Attachments: HBASE-17072.master.001.patch, 
> 

[jira] [Commented] (HBASE-17116) [PerformanceEvaluation] Add option to configure block size

2016-11-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15691820#comment-15691820
 ] 

Hadoop QA commented on HBASE-17116:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
40s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
38s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
33s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
37s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 32s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
37s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
23m 12s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha1. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
39s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 31m 2s {color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
9s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 64m 50s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Timed out junit tests | org.apache.hadoop.hbase.ipc.TestSimpleRpcScheduler |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:8d52d23 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12840343/HBASE-17116-V1.patch |
| JIRA Issue | HBASE-17116 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  |
| uname | Linux 247426aaec1e 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 
13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh
 |
| git revision | master / 0b0e857 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| unit | 
https://builds.apache.org/job/PreCommit-HBASE-Build/4606/artifact/patchprocess/patch-unit-hbase-server.txt
 |
| unit test logs |  
https://builds.apache.org/job/PreCommit-HBASE-Build/4606/artifact/patchprocess/patch-unit-hbase-server.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/4606/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/4606/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> [PerformanceEvaluation] Add option to configure block size
> --
>
> 

[jira] [Commented] (HBASE-17171) IntegrationTestTimeBoundedRequestsWithRegionReplicas fails with obtuse error when readers have no time to run

2016-11-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15691728#comment-15691728
 ] 

Hudson commented on HBASE-17171:


SUCCESS: Integrated in Jenkins build HBase-1.4 #546 (See 
[https://builds.apache.org/job/HBase-1.4/546/])
HBASE-17171 Proactively catch the case when no time remains for random (stack: 
rev 2f35956eb831c11436b785c37bed0bd93550ab99)
* (edit) 
hbase-it/src/test/java/org/apache/hadoop/hbase/test/IntegrationTestTimeBoundedRequestsWithRegionReplicas.java


> IntegrationTestTimeBoundedRequestsWithRegionReplicas fails with obtuse error 
> when readers have no time to run
> -
>
> Key: HBASE-17171
> URL: https://issues.apache.org/jira/browse/HBASE-17171
> Project: HBase
>  Issue Type: Bug
>  Components: integration tests
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Minor
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17171.001.patch
>
>
> Just noticed an odd error message that cropped up in some $dayjob internal 
> testing.
> Sometimes, executions of IntegrationTestTimeBoundedRequestsWithRegionReplicas 
> would result in an error:
> {noformat}
> Caused by: java.lang.IllegalArgumentException: Please configure 
> hbase.TimeBoundedMultiThreadedReader.runtime
>   at 
> org.apache.hadoop.hbase.test.IntegrationTestTimeBoundedRequestsWithRegionReplicas$TimeBoundedMultiThreadedReader.
> {noformat}
> After digging into the test a bit more, I realized that this is actually 
> failing because the remaining time left after the writers finish (that is, 
> {{hbase.IntegrationTestTimeBoundedRequestsWithRegionReplicas.runtime}} minus 
> the time the writers took) was negative. So, the test harness passed a value 
> which always caused this error.
> We should catch when the time available for the readers is negative and throw 
> an appropriate error instructing the human to either increase the amount of 
> time for 
> {{hbase.IntegrationTestTimeBoundedRequestsWithRegionReplicas.runtime}} or 
> decrease the amount of data written.
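
A minimal sketch of the proposed check (hypothetical names):
{code}
// Sketch only: fail fast with an actionable message instead of handing the
// readers a negative runtime.
class ReaderTimeCheckSketch {
  static long readerRuntimeMs(long configuredRuntimeMs, long writerElapsedMs) {
    long remaining = configuredRuntimeMs - writerElapsedMs;
    if (remaining <= 0) {
      throw new IllegalArgumentException("Writers used " + writerElapsedMs
          + "ms of the configured " + configuredRuntimeMs + "ms; increase the"
          + " configured runtime or decrease the amount of data written");
    }
    return remaining;
  }
}
{code}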



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-17116) [PerformanceEvaluation] Add option to configure block size

2016-11-23 Thread Yi Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liang updated HBASE-17116:
-
Fix Version/s: 2.0.0
   Status: Patch Available  (was: Open)

> [PerformanceEvaluation] Add option to configure block size
> --
>
> Key: HBASE-17116
> URL: https://issues.apache.org/jira/browse/HBASE-17116
> Project: HBase
>  Issue Type: Bug
>  Components: tooling
>Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.2.5
>Reporter: Esteban Gutierrez
>Assignee: Yi Liang
>Priority: Trivial
> Fix For: 2.0.0
>
> Attachments: HBASE-17116-V1.patch
>
>
> Followup from HBASE-9940 to add option to configure block size for 
> PerformanceEvaluation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17160) Undo unnecessary inter-module dependency; spark to hbase-it and hbase-it to shell

2016-11-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15691677#comment-15691677
 ] 

Hudson commented on HBASE-17160:


FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #2007 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/2007/])
HBASE-17160 Undo unnecessary inter-module dependency; spark to hbase-it (stack: 
rev 0b0e85746511a1dbcdb6c1c951f7b5b8f211d7f5)
* (edit) hbase-endpoint/pom.xml
* (edit) hbase-shell/pom.xml
* (edit) hbase-prefix-tree/pom.xml


> Undo unnecessary inter-module dependency; spark to hbase-it and hbase-it to 
> shell
> -
>
> Key: HBASE-17160
> URL: https://issues.apache.org/jira/browse/HBASE-17160
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: stack
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-17160.master.001.patch, 
> HBASE-17160.master.002.patch, HBASE-17160.master.002.patch, hbase.png, 
> minor_hbase.png, untangled_hbase.png
>
>
> Very minor untangling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14198) Eclipse project generation is broken in master

2016-11-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15691678#comment-15691678
 ] 

Hudson commented on HBASE-14198:


FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #2007 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/2007/])
HBASE-14198 Eclipse project generation is broken in master Small fix. (stack: 
rev a6f3057dbffe6fc55cb69dff73837655cfcf1cc1)
* (edit) pom.xml


> Eclipse project generation is broken in master
> --
>
> Key: HBASE-14198
> URL: https://issues.apache.org/jira/browse/HBASE-14198
> Project: HBase
>  Issue Type: Bug
>Reporter: Vladimir Rodionov
>
> After running 
> mvn eclipse:eclipse I tried to import projects into Eclipse (Luna) and got 
> multiple build errors, similar to:
> {code}
> Cannot nest output folder 'hbase-thrift/target/test-classes/META-INF' inside 
> output folder 'hbase-thrift/target/test-classes' hbase-thrift
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17171) IntegrationTestTimeBoundedRequestsWithRegionReplicas fails with obtuse error when readers have no time to run

2016-11-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15691676#comment-15691676
 ] 

Hudson commented on HBASE-17171:


FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #2007 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/2007/])
HBASE-17171 Proactively catch the case when no time remains for random (stack: 
rev c50a79a9eea06b592a73d7f689f2f85f025298bc)
* (edit) 
hbase-it/src/test/java/org/apache/hadoop/hbase/test/IntegrationTestTimeBoundedRequestsWithRegionReplicas.java


> IntegrationTestTimeBoundedRequestsWithRegionReplicas fails with obtuse error 
> when readers have no time to run
> -
>
> Key: HBASE-17171
> URL: https://issues.apache.org/jira/browse/HBASE-17171
> Project: HBase
>  Issue Type: Bug
>  Components: integration tests
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Minor
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17171.001.patch
>
>
> Just noticed an odd error message that cropped up in some $dayjob internal 
> testing.
> Sometimes, executions of IntegrationTestTimeBoundedRequestsWithRegionReplicas 
> would result in an error:
> {noformat}
> Caused by: java.lang.IllegalArgumentException: Please configure 
> hbase.TimeBoundedMultiThreadedReader.runtime
>   at 
> org.apache.hadoop.hbase.test.IntegrationTestTimeBoundedRequestsWithRegionReplicas$TimeBoundedMultiThreadedReader.
> {noformat}
> After digging into the test a bit more, I realized that this is actually 
> failing because the remaining time left after the writers finish (that is, 
> {{hbase.IntegrationTestTimeBoundedRequestsWithRegionReplicas.runtime}} minus 
> the time the writers took) was negative. So, the test harness passed a value 
> which always caused this error.
> We should catch when the time available for the readers is negative and throw 
> an appropriate error instructing the human to either increase the amount of 
> time for 
> {{hbase.IntegrationTestTimeBoundedRequestsWithRegionReplicas.runtime}} or 
> decrease the amount of data written.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-17116) [PerformanceEvaluation] Add option to configure block size

2016-11-23 Thread Yi Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liang updated HBASE-17116:
-
Attachment: HBASE-17116-V1.patch

> [PerformanceEvaluation] Add option to configure block size
> --
>
> Key: HBASE-17116
> URL: https://issues.apache.org/jira/browse/HBASE-17116
> Project: HBase
>  Issue Type: Bug
>  Components: tooling
>Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.2.5
>Reporter: Esteban Gutierrez
>Assignee: Yi Liang
>Priority: Trivial
> Attachments: HBASE-17116-V1.patch
>
>
> Followup from HBASE-9940 to add option to configure block size for 
> PerformanceEvaluation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17116) [PerformanceEvaluation] Add option to configure block size

2016-11-23 Thread Yi Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15691667#comment-15691667
 ] 

Yi Liang commented on HBASE-17116:
--

Hi [~esteban],
  Provided a patch that adds blockSize to PE. The output is shown below.

hbase pe --blockSize=32768 sequentialWrite 1 
{code}
shell > describe 'TestTable'

COLUMN FAMILIES DESCRIPTION 
{NAME => 'info', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', 
KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', 
COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', 
BLOCKSIZE => '32768', REPLICATION_SCOPE => '0'} 
{code}
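
For context, a sketch of how the new option could be applied to the column 
family (paraphrased, not the actual patch; HColumnDescriptor#setBlocksize is 
the existing API that ends up as BLOCKSIZE in the describe output above):
{code}
import org.apache.hadoop.hbase.HColumnDescriptor;

// Sketch only: build the PE column family with the configured block size.
class BlockSizeSketch {
  static HColumnDescriptor familyWithBlockSize(int blockSize) {
    HColumnDescriptor family = new HColumnDescriptor("info");
    family.setBlocksize(blockSize); // e.g. 32768 from --blockSize=32768
    return family;
  }
}
{code}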

> [PerformanceEvaluation] Add option to configure block size
> --
>
> Key: HBASE-17116
> URL: https://issues.apache.org/jira/browse/HBASE-17116
> Project: HBase
>  Issue Type: Bug
>  Components: tooling
>Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.2.5
>Reporter: Esteban Gutierrez
>Assignee: Yi Liang
>Priority: Trivial
>
> Followup from HBASE-9940 to add option to configure block size for 
> PerformanceEvaluation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HBASE-17116) [PerformanceEvaluation] Add option to configure block size

2016-11-23 Thread Yi Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liang reassigned HBASE-17116:


Assignee: Yi Liang

> [PerformanceEvaluation] Add option to configure block size
> --
>
> Key: HBASE-17116
> URL: https://issues.apache.org/jira/browse/HBASE-17116
> Project: HBase
>  Issue Type: Bug
>  Components: tooling
>Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.2.5
>Reporter: Esteban Gutierrez
>Assignee: Yi Liang
>Priority: Trivial
>
> Followup from HBASE-9940 to add option to configure block size for 
> PerformanceEvaluation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17072) CPU usage starts to climb up to 90-100% when using G1GC

2016-11-23 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15691661#comment-15691661
 ] 

stack commented on HBASE-17072:
---

Posted a patch that removes the above, though it saves the odd seek. Needs more 
study. I'm not sure why the above changes the file scan startup sequence.

> CPU usage starts to climb up to 90-100% when using G1GC
> ---
>
> Key: HBASE-17072
> URL: https://issues.apache.org/jira/browse/HBASE-17072
> Project: HBase
>  Issue Type: Bug
>  Components: Performance, regionserver
>Affects Versions: 1.0.0, 2.0.0, 1.2.0
>Reporter: Eiichi Sato
> Attachments: HBASE-17072.master.001.patch, 
> HBASE-17072.master.002.patch, HBASE-17072.master.003.patch, 
> HBASE-17072.master.004.patch, disable-block-header-cache.patch, 
> mat-threadlocals.png, mat-threads.png, metrics.png, slave1.svg, slave2.svg, 
> slave3.svg, slave4.svg
>
>
> h5. Problem
> CPU usage of a region server in our CDH 5.4.5 cluster, at some point, starts 
> to gradually get higher up to nearly 90-100% when using G1GC.  We've also run 
> into this problem on CDH 5.7.3 and CDH 5.8.2.
> In our production cluster, it normally takes a few weeks for this to happen 
> after restarting a RS.  We reproduced this on our test cluster and attached 
> the results.  Please note that, to make it easy to reproduce, we did some 
> "anti-tuning" on a table when running tests.
> In metrics.png, soon after we started running some workloads against a test 
> cluster (CDH 5.8.2) at about 7 p.m. CPU usage of the two RSs started to rise. 
>  Flame Graphs (slave1.svg to slave4.svg) are generated from jstack dumps of 
> each RS process around 10:30 a.m. the next day.
> After investigating heapdumps from another occurrence on a test cluster 
> running CDH 5.7.3, we found that the ThreadLocalMap contains a lot of 
> contiguous entries of {{HFileBlock$PrefetchedHeader}} probably due to primary 
> clustering.  This caused more loops in 
> {{ThreadLocalMap#expungeStaleEntries()}}, consuming a certain amount of CPU 
> time.  What is worse is that the method is called from RPC metrics code, 
> which means even a small amount of per-RPC time soon adds up to a huge amount 
> of CPU time.
> This is very similar to the issue in HBASE-16616, but we have many 
> {{HFileBlock$PrefetchedHeader}} not only {{Counter$IndexHolder}} instances.  
> Here are some OQL counts from Eclipse Memory Analyzer (MAT).  This shows a 
> number of ThreadLocal instances in the ThreadLocalMap of a single handler 
> thread.
> {code}
> SELECT *
> FROM OBJECTS (SELECT AS RETAINED SET OBJECTS value
> FROM OBJECTS 0x4ee380430) obj
> WHERE obj.@clazz.@name = 
> "org.apache.hadoop.hbase.io.hfile.HFileBlock$PrefetchedHeader"
> #=> 10980 instances
> {code}
> {code}
> SELECT *
> FROM OBJECTS (SELECT AS RETAINED SET OBJECTS value
> FROM OBJECTS 0x4ee380430) obj
> WHERE obj.@clazz.@name = "org.apache.hadoop.hbase.util.Counter$IndexHolder"
> #=> 2052 instances
> {code}
> Although as described in HBASE-16616 this somewhat seems to be an issue in 
> G1GC side regarding weakly-reachable objects, we should keep ThreadLocal 
> usage minimal and avoid creating an indefinite number (in this case, a number 
> of HFiles) of ThreadLocal instances.
> HBASE-16146 removes ThreadLocals from the RPC metrics code.  That may solve 
> the issue (I just saw the patch, never tested it at all), but the 
> {{HFileBlock$PrefetchedHeader}} are still there in the ThreadLocalMap, which 
> may cause issues in the future again.
> h5. Our Solution
> We simply removed the whole {{HFileBlock$PrefetchedHeader}} caching and 
> fortunately we didn't notice any performance degradation for our production 
> workloads.
> Because the PrefetchedHeader caching uses ThreadLocal and because RPCs are 
> handled randomly in any of the handlers, small Get or small Scan RPCs do not 
> benefit from the caching (See HBASE-10676 and HBASE-11402 for the details).  
> Probably, we need to see how well reads are saved by the caching for large 
> Scan or Get RPCs and especially for compactions if we really remove the 
> caching. It's probably better if we can remove ThreadLocals without breaking 
> the current caching behavior.
> FWIW, I'm attaching the patch we applied. It's for CDH 5.4.5.
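> One ThreadLocal-free direction, sketched below under the assumption that a 
> single shared cached-header slot per reader is acceptable (a sketch, not 
> necessarily what the attached patches do):
> {code}
> import java.util.concurrent.atomic.AtomicReference;
> 
> // One lock-free cached-header slot per reader instead of one ThreadLocal
> // entry per (thread, file). Losing the slot to a concurrent writer costs at
> // most an extra header read, never correctness.
> class PrefetchedHeaderCacheSketch {
>   static final class PrefetchedHeader {
>     final long offset;
>     final byte[] header; // HConstants.HFILEBLOCK_HEADER_SIZE bytes in real code
> 
>     PrefetchedHeader(long offset, byte[] header) {
>       this.offset = offset;
>       this.header = header;
>     }
>   }
> 
>   private final AtomicReference<PrefetchedHeader> slot = new AtomicReference<>();
> 
>   byte[] getIfCached(long offset) {
>     PrefetchedHeader h = slot.get();
>     return (h != null && h.offset == offset) ? h.header : null;
>   }
> 
>   void cache(long offset, byte[] header) {
>     slot.set(new PrefetchedHeader(offset, header));
>   }
> }
> {code}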



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-17072) CPU usage starts to climb up to 90-100% when using G1GC

2016-11-23 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-17072:
--
Attachment: HBASE-17072.master.004.patch

> CPU usage starts to climb up to 90-100% when using G1GC
> ---
>
> Key: HBASE-17072
> URL: https://issues.apache.org/jira/browse/HBASE-17072
> Project: HBase
>  Issue Type: Bug
>  Components: Performance, regionserver
>Affects Versions: 1.0.0, 2.0.0, 1.2.0
>Reporter: Eiichi Sato
> Attachments: HBASE-17072.master.001.patch, 
> HBASE-17072.master.002.patch, HBASE-17072.master.003.patch, 
> HBASE-17072.master.004.patch, disable-block-header-cache.patch, 
> mat-threadlocals.png, mat-threads.png, metrics.png, slave1.svg, slave2.svg, 
> slave3.svg, slave4.svg
>
>
> h5. Problem
> CPU usage of a region server in our CDH 5.4.5 cluster, at some point, starts 
> to gradually get higher up to nearly 90-100% when using G1GC.  We've also run 
> into this problem on CDH 5.7.3 and CDH 5.8.2.
> In our production cluster, it normally takes a few weeks for this to happen 
> after restarting a RS.  We reproduced this on our test cluster and attached 
> the results.  Please note that, to make it easy to reproduce, we did some 
> "anti-tuning" on a table when running tests.
> In metrics.png, soon after we started running some workloads against a test 
> cluster (CDH 5.8.2) at about 7 p.m. CPU usage of the two RSs started to rise. 
>  Flame Graphs (slave1.svg to slave4.svg) are generated from jstack dumps of 
> each RS process around 10:30 a.m. the next day.
> After investigating heapdumps from another occurrence on a test cluster 
> running CDH 5.7.3, we found that the ThreadLocalMap contains a lot of 
> contiguous entries of {{HFileBlock$PrefetchedHeader}} probably due to primary 
> clustering.  This caused more loops in 
> {{ThreadLocalMap#expungeStaleEntries()}}, consuming a certain amount of CPU 
> time.  What is worse is that the method is called from RPC metrics code, 
> which means even a small amount of per-RPC time soon adds up to a huge amount 
> of CPU time.
> This is very similar to the issue in HBASE-16616, but we have many 
> {{HFileBlock$PrefetchedHeader}} not only {{Counter$IndexHolder}} instances.  
> Here are some OQL counts from Eclipse Memory Analyzer (MAT).  This shows a 
> number of ThreadLocal instances in the ThreadLocalMap of a single handler 
> thread.
> {code}
> SELECT *
> FROM OBJECTS (SELECT AS RETAINED SET OBJECTS value
> FROM OBJECTS 0x4ee380430) obj
> WHERE obj.@clazz.@name = 
> "org.apache.hadoop.hbase.io.hfile.HFileBlock$PrefetchedHeader"
> #=> 10980 instances
> {code}
> {code}
> SELECT *
> FROM OBJECTS (SELECT AS RETAINED SET OBJECTS value
> FROM OBJECTS 0x4ee380430) obj
> WHERE obj.@clazz.@name = "org.apache.hadoop.hbase.util.Counter$IndexHolder"
> #=> 2052 instances
> {code}
> Although as described in HBASE-16616 this somewhat seems to be an issue in 
> G1GC side regarding weakly-reachable objects, we should keep ThreadLocal 
> usage minimal and avoid creating an indefinite number (in this case, a number 
> of HFiles) of ThreadLocal instances.
> HBASE-16146 removes ThreadLocals from the RPC metrics code.  That may solve 
> the issue (I just saw the patch, never tested it at all), but the 
> {{HFileBlock$PrefetchedHeader}} are still there in the ThreadLocalMap, which 
> may cause issues in the future again.
> h5. Our Solution
> We simply removed the whole {{HFileBlock$PrefetchedHeader}} caching and 
> fortunately we didn't notice any performance degradation for our production 
> workloads.
> Because the PrefetchedHeader caching uses ThreadLocal and because RPCs are 
> handled randomly in any of the handlers, small Get or small Scan RPCs do not 
> benefit from the caching (See HBASE-10676 and HBASE-11402 for the details).  
> Probably, we need to see how well reads are saved by the caching for large 
> Scan or Get RPCs and especially for compactions if we really remove the 
> caching. It's probably better if we can remove ThreadLocals without breaking 
> the current caching behavior.
> FWIW, I'm attaching the patch we applied. It's for CDH 5.4.5.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-17072) CPU usage starts to climb up to 90-100% when using G1GC

2016-11-23 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-17072:
--
Attachment: HBASE-17072.master.003.patch

> CPU usage starts to climb up to 90-100% when using G1GC
> ---
>
> Key: HBASE-17072
> URL: https://issues.apache.org/jira/browse/HBASE-17072
> Project: HBase
>  Issue Type: Bug
>  Components: Performance, regionserver
>Affects Versions: 1.0.0, 2.0.0, 1.2.0
>Reporter: Eiichi Sato
> Attachments: HBASE-17072.master.001.patch, 
> HBASE-17072.master.002.patch, HBASE-17072.master.003.patch, 
> HBASE-17072.master.004.patch, disable-block-header-cache.patch, 
> mat-threadlocals.png, mat-threads.png, metrics.png, slave1.svg, slave2.svg, 
> slave3.svg, slave4.svg
>
>
> h5. Problem
> CPU usage of a region server in our CDH 5.4.5 cluster, at some point, starts 
> to gradually get higher up to nearly 90-100% when using G1GC.  We've also run 
> into this problem on CDH 5.7.3 and CDH 5.8.2.
> In our production cluster, it normally takes a few weeks for this to happen 
> after restarting a RS.  We reproduced this on our test cluster and attached 
> the results.  Please note that, to make it easy to reproduce, we did some 
> "anti-tuning" on a table when running tests.
> In metrics.png, soon after we started running some workloads against a test 
> cluster (CDH 5.8.2) at about 7 p.m. CPU usage of the two RSs started to rise. 
>  Flame Graphs (slave1.svg to slave4.svg) are generated from jstack dumps of 
> each RS process around 10:30 a.m. the next day.
> After investigating heapdumps from another occurrence on a test cluster 
> running CDH 5.7.3, we found that the ThreadLocalMap contains a lot of 
> contiguous entries of {{HFileBlock$PrefetchedHeader}} probably due to primary 
> clustering.  This caused more loops in 
> {{ThreadLocalMap#expungeStaleEntries()}}, consuming a certain amount of CPU 
> time.  What is worse is that the method is called from RPC metrics code, 
> which means even a small amount of per-RPC time soon adds up to a huge amount 
> of CPU time.
> This is very similar to the issue in HBASE-16616, but we have many 
> {{HFileBlock$PrefetchedHeader}} not only {{Counter$IndexHolder}} instances.  
> Here are some OQL counts from Eclipse Memory Analyzer (MAT).  This shows a 
> number of ThreadLocal instances in the ThreadLocalMap of a single handler 
> thread.
> {code}
> SELECT *
> FROM OBJECTS (SELECT AS RETAINED SET OBJECTS value
> FROM OBJECTS 0x4ee380430) obj
> WHERE obj.@clazz.@name = 
> "org.apache.hadoop.hbase.io.hfile.HFileBlock$PrefetchedHeader"
> #=> 10980 instances
> {code}
> {code}
> SELECT *
> FROM OBJECTS (SELECT AS RETAINED SET OBJECTS value
> FROM OBJECTS 0x4ee380430) obj
> WHERE obj.@clazz.@name = "org.apache.hadoop.hbase.util.Counter$IndexHolder"
> #=> 2052 instances
> {code}
> Although as described in HBASE-16616 this somewhat seems to be an issue in 
> G1GC side regarding weakly-reachable objects, we should keep ThreadLocal 
> usage minimal and avoid creating an indefinite number (in this case, a number 
> of HFiles) of ThreadLocal instances.
> HBASE-16146 removes ThreadLocals from the RPC metrics code.  That may solve 
> the issue (I just saw the patch, never tested it at all), but the 
> {{HFileBlock$PrefetchedHeader}} are still there in the ThreadLocalMap, which 
> may cause issues in the future again.
> h5. Our Solution
> We simply removed the whole {{HFileBlock$PrefetchedHeader}} caching and 
> fortunately we didn't notice any performance degradation for our production 
> workloads.
> Because the PrefetchedHeader caching uses ThreadLocal and because RPCs are 
> handled randomly in any of the handlers, small Get or small Scan RPCs do not 
> benefit from the caching (See HBASE-10676 and HBASE-11402 for the details).  
> Probably, we need to see how well reads are saved by the caching for large 
> Scan or Get RPCs and especially for compactions if we really remove the 
> caching. It's probably better if we can remove ThreadLocals without breaking 
> the current caching behavior.
> FWIW, I'm attaching the patch we applied. It's for CDH 5.4.5.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17072) CPU usage starts to climb up to 90-100% when using G1GC

2016-11-23 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15691636#comment-15691636
 ] 

stack commented on HBASE-17072:
---

The test fails because of the below part of the patch, which goes to the index 
to find block lengths when seeking. If I instrument the test, I can see that we 
seek one time less w/ the below in place. We hit the cache one extra time, 
which is why the test fails. The patch also seems to change how we start up our 
scan. Before the patch we seek twice to point zero, then twice to the trailer. 
With the below in place we seek to the trailer twice, to zero twice, and then 
back to the trailer. Not sure why.

{code}
diff --git a/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderImpl.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderImpl.java
index 4887550..be8fc89 100644
--- a/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderImpl.java
+++ b/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderImpl.java
@@ -1087,7 +1087,6 @@ public class HFileReaderImpl implements HFile.Reader, Configurable {
 
     /**
      * Positions this scanner at the start of the file.
-     *
      * @return false if empty file; i.e. a call to next would return false and
      *         the current key and value are undefined.
      * @throws IOException
@@ -1104,12 +1103,14 @@ public class HFileReaderImpl implements HFile.Reader, Configurable {
       }
 
       long firstDataBlockOffset = reader.getTrailer().getFirstDataBlockOffset();
-      if (curBlock != null
-          && curBlock.getOffset() == firstDataBlockOffset) {
+      if (curBlock != null && curBlock.getOffset() == firstDataBlockOffset) {
         return processFirstDataBlock();
       }
-
-      readAndUpdateNewBlock(firstDataBlockOffset);
+      Cell firstKey = this.reader.getFirstKey();
+      HFileBlockIndex.BlockIndexReader indexReader = reader.getDataBlockIndexReader();
+      BlockWithScanInfo blockWithScanInfo = indexReader.loadDataBlockWithScanInfo(firstKey, curBlock,
+          cacheBlocks, pread, isCompaction, getEffectiveDataBlockEncoding());
+      updateCurrentBlock(blockWithScanInfo.getHFileBlock());
       return true;
     }
 
@@ -1119,16 +1120,6 @@ public class HFileReaderImpl implements HFile.Reader, Configurable {
       return true;
     }
 
-    protected void readAndUpdateNewBlock(long firstDataBlockOffset) throws IOException,
-        CorruptHFileException {
-      HFileBlock newBlock = reader.readBlock(firstDataBlockOffset, -1, cacheBlocks, pread,
-          isCompaction, true, BlockType.DATA, getEffectiveDataBlockEncoding());
-      if (newBlock.getOffset() < 0) {
-        throw new IOException("Invalid block offset: " + newBlock.getOffset());
-      }
-      updateCurrentBlock(newBlock);
-    }
-
     protected int loadBlockAndSeekToKey(HFileBlock seekToBlock, Cell nextIndexedKey,
         boolean rewind, Cell key, boolean seekBefore) throws IOException {
       if (this.curBlock == null
{code}

> CPU usage starts to climb up to 90-100% when using G1GC
> ---
>
> Key: HBASE-17072
> URL: https://issues.apache.org/jira/browse/HBASE-17072
> Project: HBase
>  Issue Type: Bug
>  Components: Performance, regionserver
>Affects Versions: 1.0.0, 2.0.0, 1.2.0
>Reporter: Eiichi Sato
> Attachments: HBASE-17072.master.001.patch, 
> HBASE-17072.master.002.patch, disable-block-header-cache.patch, 
> mat-threadlocals.png, mat-threads.png, metrics.png, slave1.svg, slave2.svg, 
> slave3.svg, slave4.svg
>
>
> h5. Problem
> CPU usage of a region server in our CDH 5.4.5 cluster, at some point, starts 
> to gradually get higher up to nearly 90-100% when using G1GC.  We've also run 
> into this problem on CDH 5.7.3 and CDH 5.8.2.
> In our production cluster, it normally takes a few weeks for this to happen 
> after restarting a RS.  We reproduced this on our test cluster and attached 
> the results.  Please note that, to make it easy to reproduce, we did some 
> "anti-tuning" on a table when running tests.
> In metrics.png, soon after we started running some workloads against a test 
> cluster (CDH 5.8.2) at about 7 p.m. CPU usage of the two RSs started to rise. 
>  Flame Graphs (slave1.svg to slave4.svg) are generated from jstack dumps of 
> each RS process around 10:30 a.m. the next day.
> After investigating heapdumps from another occurrence on a test cluster 
> running CDH 5.7.3, we found that the ThreadLocalMap contains a lot of 
> contiguous entries of {{HFileBlock$PrefetchedHeader}} probably due to primary 
> clustering.  This caused more loops in 
> {{ThreadLocalMap#expungeStaleEntries()}}, consuming a certain amount of CPU 
> time.  What is worse is that the method is called from RPC metrics code, 
> which means 

[jira] [Commented] (HBASE-15314) Allow more than one backing file in bucketcache

2016-11-23 Thread Aaron Tokhy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15691553#comment-15691553
 ] 

Aaron Tokhy commented on HBASE-15314:
-

I have started working on improving v2 of this patch to add some additional 
tests, while keeping the implementation mostly the same. I've addressed some 
issues, such as guaranteeing that the total aggregate file size is greater than 
or equal to the total allocatable size (using long ceiling division). This also 
adds some cleanup logic if an exception is thrown on allocation failure.

Included with the change is a set of parameterized tests, as well as some 
changes to have TestBucketCache also test various configurable IOEngine types.
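
For illustration, the sizing invariant can be kept with long ceiling division 
along these lines (the names are mine, not from the patch):
{code}
// Round the per-file size up so that fileCount * perFileSize >= totalCapacity.
static long perFileSize(long totalCapacity, int fileCount) {
  // Long ceiling division without floating point.
  return (totalCapacity + fileCount - 1) / fileCount;
}
{code}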

> Allow more than one backing file in bucketcache
> ---
>
> Key: HBASE-15314
> URL: https://issues.apache.org/jira/browse/HBASE-15314
> Project: HBase
>  Issue Type: Sub-task
>  Components: BucketCache
>Reporter: stack
>Assignee: Amal Joshy
> Attachments: HBASE-15314-v2.patch, HBASE-15314.patch
>
>
> Allow bucketcache use more than just one backing file: e.g. chassis has more 
> than one SSD in it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17101) FavoredNodes should not apply to system tables

2016-11-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15691545#comment-15691545
 ] 

Hadoop QA commented on HBASE-17101:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
33s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
38s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
32s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
37s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 32s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
37s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
23m 6s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha1. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
40s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 86m 34s 
{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 120m 33s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:8d52d23 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12840318/HBASE-17101.master.001.patch
 |
| JIRA Issue | HBASE-17101 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  |
| uname | Linux 11b46f53c1ea 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 
13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 0b0e857 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/4604/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/4604/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> FavoredNodes should not apply to system tables
> --
>
> Key: HBASE-17101
> URL: https://issues.apache.org/jira/browse/HBASE-17101
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
> 

[jira] [Commented] (HBASE-15429) Add a split policy for busy regions

2016-11-23 Thread Ashu Pachauri (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15691459#comment-15691459
 ] 

Ashu Pachauri commented on HBASE-15429:
---

The findbugs warnings are not related to this patch.

> Add a split policy for busy regions
> ---
>
> Key: HBASE-15429
> URL: https://issues.apache.org/jira/browse/HBASE-15429
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: Ashu Pachauri
>Assignee: Ashu Pachauri
> Fix For: 2.0.0, 1.4.0, 1.3.1
>
> Attachments: HBASE-15429-V1.patch, HBASE-15429-V2.patch, 
> HBASE-15429.branch-1.patch, HBASE-15429.patch
>
>
> Currently, all region split policies are based on size. However, in certain 
> cases, it is a wise choice to make a split decision based on number of 
> requests to the region and split busy regions.
> A crude metric is that if a region blocks writes often and throws 
> RegionTooBusyException, it's probably a good idea to split it.
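> For illustration, a policy keyed off blocked requests could look roughly like 
> the sketch below (the threshold field and its value are assumptions, not 
> taken from the attached patches):
> {code}
> import org.apache.hadoop.hbase.regionserver.IncreasingToUpperBoundRegionSplitPolicy;
> 
> // Sketch only: split when the region is busy, not just big.
> public class BusySplitPolicySketch extends IncreasingToUpperBoundRegionSplitPolicy {
>   private final float busyThreshold = 0.2f; // assumed fraction of blocked requests
> 
>   @Override
>   protected boolean shouldSplit() {
>     if (super.shouldSplit()) {
>       return true; // keep the size-based behavior
>     }
>     long total = region.getReadRequestsCount() + region.getWriteRequestsCount();
>     return total > 0 && region.getBlockedRequestsCount() / (float) total > busyThreshold;
>   }
> }
> {code}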



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16700) Allow for coprocessor whitelisting

2016-11-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15691401#comment-15691401
 ] 

Hadoop QA commented on HBASE-16700:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
48s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 7s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
25s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
28s {color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s 
{color} | {color:blue} Skipped patched modules with no Java source: . {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
39s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 20s 
{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
26s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 9s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 3m 9s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
27s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
26s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
26m 15s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha1. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s 
{color} | {color:blue} Skipped patched modules with no Java source: . {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 7s 
{color} | {color:red} hbase-server generated 1 new + 0 unchanged - 0 fixed = 1 
total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 47s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 96m 44s 
{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 123m 10s 
{color} | {color:green} root in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
35s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 274m 51s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hbase-server |
|  |  Comparison of String objects using == or != in 
org.apache.hadoop.hbase.security.access.CoprocessorWhitelistMasterObserver.verifyCoprocessors(ObserverContext,
 HTableDescriptor)   At CoprocessorWhitelistMasterObserver.java:== or != in 
org.apache.hadoop.hbase.security.access.CoprocessorWhitelistMasterObserver.verifyCoprocessors(ObserverContext,
 HTableDescriptor)   At CoprocessorWhitelistMasterObserver.java:[line 182] |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:8d52d23 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12840277/HBASE-16700.006.patch 
|
| JIRA Issue | HBASE-16700 |
| Optional 

[jira] [Updated] (HBASE-17101) FavoredNodes should not apply to system tables

2016-11-23 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HBASE-17101:
-
Description: 
As described in the doc (see HBASE-15531), we would like to start with user 
tables for favored nodes. This task ensures FN does not apply to system tables.

System tables are in memory and won't benefit from favored nodes. Since we also 
maintain FN information for user regions in meta, it helps to keep the 
implementation simpler by ignoring system tables for the first iterations.

  was:As described in the doc (see HBASE-15531), we would like to start with 
user tables for favored nodes. This task ensures FN does not apply to system 
tables.


> FavoredNodes should not apply to system tables
> --
>
> Key: HBASE-17101
> URL: https://issues.apache.org/jira/browse/HBASE-17101
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
> Fix For: 2.0.0
>
> Attachments: HBASE-17101.master.001.patch, 
> HBASE_17101_rough_draft.patch
>
>
> As described in the doc (see HBASE-15531), we would like to start with user 
> tables for favored nodes. This task ensures FN does not apply to system 
> tables.
> System tables are in memory and won't benefit from favored nodes. Since we 
> also maintain FN information for user regions in meta, it helps to keep the 
> implementation simpler by ignoring system tables for the first iterations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-17101) FavoredNodes should not apply to system tables

2016-11-23 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HBASE-17101:
-
Status: Patch Available  (was: Open)

> FavoredNodes should not apply to system tables
> --
>
> Key: HBASE-17101
> URL: https://issues.apache.org/jira/browse/HBASE-17101
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
> Fix For: 2.0.0
>
> Attachments: HBASE-17101.master.001.patch, 
> HBASE_17101_rough_draft.patch
>
>
> As described in the doc (see HBASE-15531), we would like to start with user 
> tables for favored nodes. This task ensures FN does not apply to system 
> tables.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-17101) FavoredNodes should not apply to system tables

2016-11-23 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HBASE-17101:
-
Attachment: HBASE-17101.master.001.patch

> FavoredNodes should not apply to system tables
> --
>
> Key: HBASE-17101
> URL: https://issues.apache.org/jira/browse/HBASE-17101
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
> Fix For: 2.0.0
>
> Attachments: HBASE-17101.master.001.patch, 
> HBASE_17101_rough_draft.patch
>
>
> As described in the doc (see HBASE-15531), we would like to start with user 
> tables for favored nodes. This task ensures FN does not apply to system 
> tables.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12894) Upgrade Jetty to 9.2.6

2016-11-23 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15691286#comment-15691286
 ] 

Sean Busbey commented on HBASE-12894:
-

Unfortunately, it's a judgement call. I generally don't trust POMs in maven 
central (especially when you're talking about a license from an inherited 
parent pom section).

# If there's license information in the actual JAR artifact, that should 
probably be used
# If none from above, but there's license information in the source code 
repository, that should probably be used
# If none from above, but there's license information on individual source 
files, that should probably be used (hopefully they agree?)
# If none from above, but there's a project page with license info, that should 
probably be used. I'd favor project-created website info over 
project-aggregator information. In your hk2 example above, that means I'd favor 
the license from hk2.java.net over the one from the java.net project 
description.
# If none from above, but there's license information in maven metadata, that 
should be used. Naturally with immediate pom entries carrying more weight than 
those from parent poms.

When these things disagree, it's a good idea for us to document that they 
disagree in the comments for our supplemental-info file and file issues with 
the source project to fix their ambiguity.

When these things disagree too much, we should probably jettison the dependency 
until the source project fixes things. (An example would be if one of the 
sources of information claims that only a category-x license is allowed.)

> Upgrade Jetty to 9.2.6
> --
>
> Key: HBASE-12894
> URL: https://issues.apache.org/jira/browse/HBASE-12894
> Project: HBase
>  Issue Type: Improvement
>  Components: REST, UI
>Affects Versions: 0.98.0
>Reporter: Rick Hallihan
>Assignee: Guang Yang
>Priority: Critical
>  Labels: MicrosoftSupport
> Fix For: 2.0.0
>
> Attachments: HBASE-12894_Jetty9_v0.patch, 
> HBASE-12894_Jetty9_v1.patch, HBASE-12894_Jetty9_v1.patch, 
> HBASE-12894_Jetty9_v2.patch, HBASE-12894_Jetty9_v3.patch, 
> HBASE-12894_Jetty9_v4.patch, HBASE-12894_Jetty9_v5.patch, 
> HBASE-12894_Jetty9_v6.patch, HBASE-12894_Jetty9_v7.patch, 
> HBASE-12894_Jetty9_v8.patch, dependency_list_after, dependency_list_before
>
>
> The Jetty component that is used for the HBase Stargate REST endpoint is 
> version 6.1.26 and is fairly outdated. We recently had a customer inquire 
> about enabling cross-origin resource sharing (CORS) for the REST endpoint and 
> found that this older version does not include the necessary filter or 
> configuration options, highlighted at: 
> http://wiki.eclipse.org/Jetty/Feature/Cross_Origin_Filter
> The Jetty project has had significant updates through versions 7, 8 and 9, 
> including a transition to be an Eclipse subproject, so updating to the latest 
> version may be non-trivial. The last update to the Jetty component in 
> https://issues.apache.org/jira/browse/HBASE-3377 was a minor version update 
> and did not require significant work. This update will include a package 
> namespace update so there will likely be a larger number of required changes. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16417) In-Memory MemStore Policy for Flattening and Compactions

2016-11-23 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15691259#comment-15691259
 ] 

Eshcar Hillel commented on HBASE-16417:
---

Evaluation results of benchmarks on a 3 machine cluster are attached. Main 
points:
(1) In the write-only workload, ic2 and dc outperform no compaction (with no 
mslabs) by 25%. This can be attributed in part to running less GC and in part 
to executing less IO. dc also improves write amplification by 25%.
(2) In the mixed workload, all three options with no mslabs (no compaction, 
ic2, and dc) have comparable read latency and throughput. Avg read latency of 
no compaction with mslabs is 2.5x that of the other options running with no 
mslabs (one run in this setting even failed to complete).

> In-Memory MemStore Policy for Flattening and Compactions
> 
>
> Key: HBASE-16417
> URL: https://issues.apache.org/jira/browse/HBASE-16417
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Anastasia Braginsky
>Assignee: Eshcar Hillel
> Fix For: 2.0.0
>
> Attachments: HBASE-16417-benchmarkresults-20161101.pdf, 
> HBASE-16417-benchmarkresults-20161110.pdf, 
> HBASE-16417-benchmarkresults-20161123.pdf
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-16417) In-Memory MemStore Policy for Flattening and Compactions

2016-11-23 Thread Eshcar Hillel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eshcar Hillel updated HBASE-16417:
--
Attachment: HBASE-16417-benchmarkresults-20161123.pdf

> In-Memory MemStore Policy for Flattening and Compactions
> 
>
> Key: HBASE-16417
> URL: https://issues.apache.org/jira/browse/HBASE-16417
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Anastasia Braginsky
>Assignee: Eshcar Hillel
> Fix For: 2.0.0
>
> Attachments: HBASE-16417-benchmarkresults-20161101.pdf, 
> HBASE-16417-benchmarkresults-20161110.pdf, 
> HBASE-16417-benchmarkresults-20161123.pdf
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17158) Avoid deadlock caused by HRegion#doDelta

2016-11-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15691226#comment-15691226
 ] 

Hudson commented on HBASE-17158:


SUCCESS: Integrated in Jenkins build HBase-Trunk_matrix #2006 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/2006/])
HBASE-17158 Avoid deadlock caused by HRegion#doDelta (ChiaPing Tsai) (tedyu: 
rev 9f5b8a83b70cfb4aaf1e22be666a2516a0aa50ac)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide3.java


> Avoid deadlock caused by HRegion#doDelta
> 
>
> Key: HBASE-17158
> URL: https://issues.apache.org/jira/browse/HBASE-17158
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: ChiaPing Tsai
>Assignee: ChiaPing Tsai
> Fix For: 2.0.0
>
> Attachments: HBASE-17158.v0.patch, HBASE-17158.v1.patch, 
> HBASE-17158.v2.patch
>
>
> {code:title=HRegion.java|borderStyle=solid}
> private Result doDelta(Operation op, Mutation mutation, long nonceGroup, long 
> nonce,
>   boolean returnResults) throws IOException {
> checkReadOnly();
> checkResources();
> checkRow(mutation.getRow(), op.toString());
> checkFamilies(mutation.getFamilyCellMap().keySet());
> this.writeRequestsCount.increment();
> WriteEntry writeEntry = null;
> startRegionOperation(op);
> List results = returnResults? new ArrayList(mutation.size()): 
> null;
> RowLock rowLock = getRowLockInternal(mutation.getRow(), false);
> MemstoreSize memstoreSize = new MemstoreSize();
> }
> {code}
> The getRowLockInternal() should be moved inside the try block so that a 
> timeout won't cause a lock leak. Otherwise, we will get stuck in 
> HRegion#doClose when closing.
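> The fix direction, sketched below in simplified form (the committed patch 
> handles more state, e.g. the WriteEntry and memstore accounting, than shown):
> {code}
> startRegionOperation(op);
> RowLock rowLock = null;
> try {
>   rowLock = getRowLockInternal(mutation.getRow(), false);
>   // ... apply the increment/append under the row lock ...
> } finally {
>   if (rowLock != null) {
>     // An exception after acquisition no longer leaks the lock.
>     rowLock.release();
>   }
>   closeRegionOperation(op);
> }
> {code}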



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17167) Pass mvcc to client when scan

2016-11-23 Thread Jerry He (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15691218#comment-15691218
 ] 

Jerry He commented on HBASE-17167:
--

Good point. The mvcc/seqid should be kept in the server/hfiles long enough. 
There were discussions on that previously as well.

> Pass mvcc to client when scan
> -
>
> Key: HBASE-17167
> URL: https://issues.apache.org/jira/browse/HBASE-17167
> Project: HBase
>  Issue Type: Sub-task
>  Components: Client, scan
>Reporter: Duo Zhang
> Fix For: 2.0.0, 1.4.0
>
>
> For the current implementation, if we use batch or allowPartial when 
> scanning, then row-level atomicity can not be guaranteed if we need to 
> restart a scan in the middle of a record due to a region move or something 
> else.
> We can return the mvcc used to open the scanner to the client, and the 
> client could use this mvcc to restart the scan and keep row-level atomicity.
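> A rough sketch of the client-side idea, with hypothetical accessor names 
> (neither method exists as a public API in this issue):
> {code}
> // Hypothetical sketch only: getMvccReadPoint()/setMvccReadPoint() are
> // illustrative names, not an existing public API.
> long mvcc = openScannerResponse.getMvccReadPoint(); // server's read point
> Scan restarted = new Scan(originalScan);
> restarted.setMvccReadPoint(mvcc); // reused when the scan is restarted
> // Every partial Result for the row is then read from the same MVCC
> // snapshot, preserving row-level atomicity across a region move.
> {code}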



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17171) IntegrationTestTimeBoundedRequestsWithRegionReplicas fails with obtuse error when readers have no time to run

2016-11-23 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15691204#comment-15691204
 ] 

Josh Elser commented on HBASE-17171:


Superb. Thanks for the quick reviews Stack and Ted (and the push, Stack)!

> IntegrationTestTimeBoundedRequestsWithRegionReplicas fails with obtuse error 
> when readers have no time to run
> -
>
> Key: HBASE-17171
> URL: https://issues.apache.org/jira/browse/HBASE-17171
> Project: HBase
>  Issue Type: Bug
>  Components: integration tests
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Minor
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17171.001.patch
>
>
> Just noticed an odd error message that cropped up in some $dayjob internal 
> testing.
> Sometimes, executions of IntegrationTestTimeBoundedRequestsWithRegionReplicas 
> would result in an error:
> {noformat}
> Caused by: java.lang.IllegalArgumentException: Please configure 
> hbase.TimeBoundedMultiThreadedReader.runtime
>   at 
> org.apache.hadoop.hbase.test.IntegrationTestTimeBoundedRequestsWithRegionReplicas$TimeBoundedMultiThreadedReader.
> {noformat}
> After digging into the test a bit more, I realized that this is actually 
> failing because the remaining time left after the writers finish (that is, 
> {{hbase.IntegrationTestTimeBoundedRequestsWithRegionReplicas.runtime}} minus 
> the time the writers took) was negative. So, the test harness passed a value 
> which always caused this error.
> We should catch when the time available for the readers is negative and throw 
> an appropriate error instructing the human to either increase the amount of 
> time for 
> {{hbase.IntegrationTestTimeBoundedRequestsWithRegionReplicas.runtime}} or 
> decrease the amount of data written.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-17160) Undo unnecessary inter-module dependency; spark to hbase-it and hbase-it to shell

2016-11-23 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-17160:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Pushed to master.

> Undo unnecessary inter-module dependency; spark to hbase-it and hbase-it to 
> shell
> -
>
> Key: HBASE-17160
> URL: https://issues.apache.org/jira/browse/HBASE-17160
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: stack
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-17160.master.001.patch, 
> HBASE-17160.master.002.patch, HBASE-17160.master.002.patch, hbase.png, 
> minor_hbase.png, untangled_hbase.png
>
>
> Very minor untangling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16996) Implement storage/retrieval of filesystem-use quotas into quota table

2016-11-23 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15691185#comment-15691185
 ] 

Josh Elser commented on HBASE-16996:


Ah, the spurious failures are because it's building against a feature branch 
now and the HBase master branch hadoop versions aren't being pared down. So, 
the hadoopcheck -1's can just be ignored.

Will have to run TestAvoidCellReferencesIntoShippedBlocks locally and see if 
it's flakey or what.

For now, will throw it up on RB.

> Implement storage/retrieval of filesystem-use quotas into quota table
> -
>
> Key: HBASE-16996
> URL: https://issues.apache.org/jira/browse/HBASE-16996
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Josh Elser
>Assignee: Josh Elser
> Fix For: 2.0.0
>
> Attachments: HBASE-16996-HBASE-16961.002.patch, HBASE-16996.001.patch
>
>
> Provide read/write API for accessing the new filesystem-usage quotas in the 
> existing {{hbase:quota}} table.
> Make sure that both the client can read the quotas in the table and the 
> Master can perform the necessary update/delete actions per the quota RPCs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-17171) IntegrationTestTimeBoundedRequestsWithRegionReplicas fails with obtuse error when readers have no time to run

2016-11-23 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-17171:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 1.4.0
   Status: Resolved  (was: Patch Available)

Pushed to master and branch-1. Thanks for the patch [~elserj]

> IntegrationTestTimeBoundedRequestsWithRegionReplicas fails with obtuse error 
> when readers have no time to run
> -
>
> Key: HBASE-17171
> URL: https://issues.apache.org/jira/browse/HBASE-17171
> Project: HBase
>  Issue Type: Bug
>  Components: integration tests
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Minor
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17171.001.patch
>
>
> Just noticed an odd error message that cropped up in some $dayjob internal 
> testing.
> Sometimes, executions of IntegrationTestTimeBoundedRequestsWithRegionReplicas 
> would result in an error:
> {noformat}
> Caused by: java.lang.IllegalArgumentException: Please configure 
> hbase.TimeBoundedMultiThreadedReader.runtime
>   at 
> org.apache.hadoop.hbase.test.IntegrationTestTimeBoundedRequestsWithRegionReplicas$TimeBoundedMultiThreadedReader.
> {noformat}
> After digging into the test a bit more, I realized that this is actually 
> failing because the remaining time left after the writers finish (that is, 
> {{hbase.IntegrationTestTimeBoundedRequestsWithRegionReplicas.runtime}} minus 
> the time the writers took) was negative. So, the test harness passed a value 
> which always caused this error.
> We should catch when the time available for the readers is negative and throw 
> an appropriate error instructing the human to either increase the amount of 
> time for 
> {{hbase.IntegrationTestTimeBoundedRequestsWithRegionReplicas.runtime}} or 
> decrease the amount of data written.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-17173) update ref guide links for discussions to use lists.apache.org

2016-11-23 Thread Sean Busbey (JIRA)
Sean Busbey created HBASE-17173:
---

 Summary: update ref guide links for discussions to use 
lists.apache.org
 Key: HBASE-17173
 URL: https://issues.apache.org/jira/browse/HBASE-17173
 Project: HBase
  Issue Type: Task
  Components: community, website
Reporter: Sean Busbey
Priority: Minor


Right now the [reference guide|http://hbase.apache.org/book.html] has several 
places where we link to discussions on dev@hbase to explain something or 
document where a decision was made.

Those links right now rely on "hadoop-search.com". We should update them to 
link to the now-available [lists.apache.org view of the mailing 
list|https://lists.apache.org/list.html?d...@hbase.apache.org] since it 
provides conversation views and is an ASF resource.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17171) IntegrationTestTimeBoundedRequestsWithRegionReplicas fails with obtuse error when readers have no time to run

2016-11-23 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15691072#comment-15691072
 ] 

Ted Yu commented on HBASE-17171:


+1

> IntegrationTestTimeBoundedRequestsWithRegionReplicas fails with obtuse error 
> when readers have no time to run
> -
>
> Key: HBASE-17171
> URL: https://issues.apache.org/jira/browse/HBASE-17171
> Project: HBase
>  Issue Type: Bug
>  Components: integration tests
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-17171.001.patch
>
>
> Just noticed an odd error message that cropped up in some $dayjob internal 
> testing.
> Sometimes, executions of IntegrationTestTimeBoundedRequestsWithRegionReplicas 
> would result in an error:
> {noformat}
> Caused by: java.lang.IllegalArgumentException: Please configure 
> hbase.TimeBoundedMultiThreadedReader.runtime
>   at 
> org.apache.hadoop.hbase.test.IntegrationTestTimeBoundedRequestsWithRegionReplicas$TimeBoundedMultiThreadedReader.
> {noformat}
> After digging into the test a bit more, I realized that this is actually 
> failing because the remaining time left after the writers finish (that is, 
> {{hbase.IntegrationTestTimeBoundedRequestsWithRegionReplicas.runtime}} minus 
> the time the writers took) was negative. So, the test harness passed a value 
> which always caused this error.
> We should catch when the time available for the readers is negative and throw 
> an appropriate error instructing the human to either increase the amount of 
> time for 
> {{hbase.IntegrationTestTimeBoundedRequestsWithRegionReplicas.runtime}} or 
> decrease the amount of data written.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17171) IntegrationTestTimeBoundedRequestsWithRegionReplicas fails with obtuse error when readers have no time to run

2016-11-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15691066#comment-15691066
 ] 

Hadoop QA commented on HBASE-17171:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
33s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 19s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
10s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
19s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 0s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 9s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
21s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 18s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 18s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
9s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
17s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
31m 11s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha1. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 0s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 9s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 19s 
{color} | {color:green} hbase-it in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
9s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 37m 46s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:8d52d23 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12840295/HBASE-17171.001.patch 
|
| JIRA Issue | HBASE-17171 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  |
| uname | Linux 3797e3aa6c37 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 
21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh
 |
| git revision | master / a6f3057 |
| Default Java | 1.8.0_111 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/4603/testReport/ |
| modules | C: hbase-it U: hbase-it |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/4603/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> IntegrationTestTimeBoundedRequestsWithRegionReplicas fails with obtuse error 
> when readers have no time to run
> -
>
> Key: HBASE-17171
> URL: https://issues.apache.org/jira/browse/HBASE-17171
> Project: HBase
>  Issue Type: Bug
>  Components: integration tests
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: 

[jira] [Updated] (HBASE-17158) Avoid deadlock caused by HRegion#doDelta

2016-11-23 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-17158:
---
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

> Avoid deadlock caused by HRegion#doDelta
> 
>
> Key: HBASE-17158
> URL: https://issues.apache.org/jira/browse/HBASE-17158
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: ChiaPing Tsai
>Assignee: ChiaPing Tsai
> Fix For: 2.0.0
>
> Attachments: HBASE-17158.v0.patch, HBASE-17158.v1.patch, 
> HBASE-17158.v2.patch
>
>
> {code:title=HRegion.java|borderStyle=solid}
> private Result doDelta(Operation op, Mutation mutation, long nonceGroup, long 
> nonce,
>   boolean returnResults) throws IOException {
> checkReadOnly();
> checkResources();
> checkRow(mutation.getRow(), op.toString());
> checkFamilies(mutation.getFamilyCellMap().keySet());
> this.writeRequestsCount.increment();
> WriteEntry writeEntry = null;
> startRegionOperation(op);
> List<Cell> results = returnResults ? new ArrayList<Cell>(mutation.size()) : 
> null;
> RowLock rowLock = getRowLockInternal(mutation.getRow(), false);
> MemstoreSize memstoreSize = new MemstoreSize();
> }
> {code}
> The getRowLockInternal() should be moved inside the try block so that a 
> timeout won't cause a lock leak. Otherwise, we will be stuck in 
> HRegion#doClose when closing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HBASE-15902) Scan Object

2016-11-23 Thread Sudeep Sunthankar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sudeep Sunthankar reassigned HBASE-15902:
-

Assignee: Sudeep Sunthankar

> Scan Object
> ---
>
> Key: HBASE-15902
> URL: https://issues.apache.org/jira/browse/HBASE-15902
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Sudeep Sunthankar
>Assignee: Sudeep Sunthankar
> Attachments: HBASE-15902.HBASE-14850.patch, 
> HBASE-15902.HBASE-14850.v2.patch
>
>
> Patch for creating Scan objects. Scan objects thus created can be used by 
> Table implementation to fetch results for a given row.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-15902) Scan Object

2016-11-23 Thread Sudeep Sunthankar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sudeep Sunthankar updated HBASE-15902:
--
Attachment: HBASE-15902.HBASE-14850.v2.patch

Patch for Scan API

> Scan Object
> ---
>
> Key: HBASE-15902
> URL: https://issues.apache.org/jira/browse/HBASE-15902
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Sudeep Sunthankar
> Attachments: HBASE-15902.HBASE-14850.patch, 
> HBASE-15902.HBASE-14850.v2.patch
>
>
> Patch for creating Scan objects. Scan objects thus created can be used by 
> Table implementation to fetch results for a given row.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-17172) Optimize major mob compaction with _del files

2016-11-23 Thread huaxiang sun (JIRA)
huaxiang sun created HBASE-17172:


 Summary: Optimize major mob compaction with _del files
 Key: HBASE-17172
 URL: https://issues.apache.org/jira/browse/HBASE-17172
 Project: HBase
  Issue Type: Improvement
  Components: mob
Affects Versions: 2.0.0
Reporter: huaxiang sun
Assignee: huaxiang sun


Today, when there is a _del file in mobdir, every mob file will be recompacted 
during major mob compaction. This causes lots of IO and slows down major mob 
compaction (it may take months to finish). This needs to be improved. A few 
ideas are: 

1) Do not compact all _del files into one; instead, compact them into groups 
with startKey as the key. Then use each mob file's firstKey/startKey to see 
whether a given _del file needs to be included for that partition.

2) Based on the timerange of the _del file, compaction of files written after 
that timerange does not need to include the _del file, as these are newer 
files.
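
A rough sketch of the idea-2 check (a hypothetical helper, not committed code; 
it assumes the min/max cell timestamps of each file are tracked, e.g. in file 
metadata):

{code}
// Hedged sketch of idea 2 (hypothetical helper): a delete marker with
// timestamp T only masks cells with ts <= T, so a mob file whose oldest
// cell is newer than every marker in the _del file can be compacted
// without reading that _del file at all.
static boolean delFileApplies(long mobFileMinTimestamp, long delFileMaxTimestamp) {
  return mobFileMinTimestamp <= delFileMaxTimestamp;
}
{code}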



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17171) IntegrationTestTimeBoundedRequestsWithRegionReplicas fails with obtuse error when readers have no time to run

2016-11-23 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15690976#comment-15690976
 ] 

stack commented on HBASE-17171:
---

+1

Excellent.

> IntegrationTestTimeBoundedRequestsWithRegionReplicas fails with obtuse error 
> when readers have no time to run
> -
>
> Key: HBASE-17171
> URL: https://issues.apache.org/jira/browse/HBASE-17171
> Project: HBase
>  Issue Type: Bug
>  Components: integration tests
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-17171.001.patch
>
>
> Just noticed an odd error message that cropped up in some $dayjob internal 
> testing.
> Sometimes, executions of IntegrationTestTimeBoundedRequestsWithRegionReplicas 
> would result in an error:
> {noformat}
> Caused by: java.lang.IllegalArgumentException: Please configure 
> hbase.TimeBoundedMultiThreadedReader.runtime
>   at 
> org.apache.hadoop.hbase.test.IntegrationTestTimeBoundedRequestsWithRegionReplicas$TimeBoundedMultiThreadedReader.
> {noformat}
> After digging into the test a bit more, I realized that this is actually 
> failing because the remaining time left after the writers finish (that is, 
> {{hbase.IntegrationTestTimeBoundedRequestsWithRegionReplicas.runtime}} minus 
> the time the writers took) was negative. So, the test harness passed a value 
> which always caused this error.
> We should catch when the time available for the readers is negative and throw 
> an appropriate error instructing the human to either increase the amount of 
> time for 
> {{hbase.IntegrationTestTimeBoundedRequestsWithRegionReplicas.runtime}} or 
> decrease the amount of data written.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-17171) IntegrationTestTimeBoundedRequestsWithRegionReplicas fails with obtuse error when readers have no time to run

2016-11-23 Thread Josh Elser (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HBASE-17171:
---
Status: Patch Available  (was: Open)

> IntegrationTestTimeBoundedRequestsWithRegionReplicas fails with obtuse error 
> when readers have no time to run
> -
>
> Key: HBASE-17171
> URL: https://issues.apache.org/jira/browse/HBASE-17171
> Project: HBase
>  Issue Type: Bug
>  Components: integration tests
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-17171.001.patch
>
>
> Just noticed an odd error message that cropped up in some $dayjob internal 
> testing.
> Sometimes, executions of IntegrationTestTimeBoundedRequestsWithRegionReplicas 
> would result in an error:
> {noformat}
> Caused by: java.lang.IllegalArgumentException: Please configure 
> hbase.TimeBoundedMultiThreadedReader.runtime
>   at 
> org.apache.hadoop.hbase.test.IntegrationTestTimeBoundedRequestsWithRegionReplicas$TimeBoundedMultiThreadedReader.
> {noformat}
> After digging into the test a bit more, I realized that this is actually 
> failing because the remaining time left after the writers finish (that is, 
> {{hbase.IntegrationTestTimeBoundedRequestsWithRegionReplicas.runtime}} minus 
> the time the writers took) was negative. So, the test harness passed a value 
> which always caused this error.
> We should catch when the time available for the readers is negative and throw 
> an appropriate error instructing the human to either increase the amount of 
> time for 
> {{hbase.IntegrationTestTimeBoundedRequestsWithRegionReplicas.runtime}} or 
> decrease the amount of data written.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-17171) IntegrationTestTimeBoundedRequestsWithRegionReplicas fails with obtuse error when readers have no time to run

2016-11-23 Thread Josh Elser (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HBASE-17171:
---
Attachment: HBASE-17171.001.patch

.001 The framework sets a configuration property to control how long reads
should be executed. When writes take too long, no time remains for reads
and the user sees an error about a property they must set. We should
prevent this case and log an appropriate message.

Also fixes a rogue character in the class-level javadoc.
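
Roughly, the guard amounts to something like the following (a sketch only, 
with a hypothetical method name; the real change is in the attached patch):

{code}
// Sketch of the fail-fast guard (hypothetical method, not the patch itself).
static long readerRuntimeMs(org.apache.hadoop.conf.Configuration conf,
    long writerElapsedMs) {
  long total = conf.getLong(
      "hbase.IntegrationTestTimeBoundedRequestsWithRegionReplicas.runtime", 0);
  long remaining = total - writerElapsedMs;
  if (remaining <= 0) {
    // Tell the human what to fix instead of surfacing the obtuse
    // "Please configure hbase.TimeBoundedMultiThreadedReader.runtime" error.
    throw new IllegalArgumentException("Writers ran for " + writerElapsedMs
        + " ms, leaving no time for readers; increase the test runtime or "
        + "decrease the amount of data written");
  }
  return remaining;
}
{code}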

> IntegrationTestTimeBoundedRequestsWithRegionReplicas fails with obtuse error 
> when readers have no time to run
> -
>
> Key: HBASE-17171
> URL: https://issues.apache.org/jira/browse/HBASE-17171
> Project: HBase
>  Issue Type: Bug
>  Components: integration tests
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-17171.001.patch
>
>
> Just noticed an odd error message that cropped up in some $dayjob internal 
> testing.
> Sometimes, executions of IntegrationTestTimeBoundedRequestsWithRegionReplicas 
> would result in an error:
> {noformat}
> Caused by: java.lang.IllegalArgumentException: Please configure 
> hbase.TimeBoundedMultiThreadedReader.runtime
>   at 
> org.apache.hadoop.hbase.test.IntegrationTestTimeBoundedRequestsWithRegionReplicas$TimeBoundedMultiThreadedReader.
> {noformat}
> After digging into the test a bit more, I realized that this is actually 
> failing because the remaining time left after the writers finish (that is, 
> {{hbase.IntegrationTestTimeBoundedRequestsWithRegionReplicas.runtime}} minus 
> the time the writers took) was negative. So, the test harness passed a value 
> which always caused this error.
> We should catch when the time available for the readers is negative and throw 
> an appropriate error instructing the human to either increase the amount of 
> time for 
> {{hbase.IntegrationTestTimeBoundedRequestsWithRegionReplicas.runtime}} or 
> decrease the amount of data written.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-17171) IntegrationTestTimeBoundedRequestsWithRegionReplicas fails with obtuse error when readers have no time to run

2016-11-23 Thread Josh Elser (JIRA)
Josh Elser created HBASE-17171:
--

 Summary: IntegrationTestTimeBoundedRequestsWithRegionReplicas 
fails with obtuse error when readers have no time to run
 Key: HBASE-17171
 URL: https://issues.apache.org/jira/browse/HBASE-17171
 Project: HBase
  Issue Type: Bug
  Components: integration tests
Reporter: Josh Elser
Assignee: Josh Elser
Priority: Minor
 Fix For: 2.0.0


Just noticed an odd error message that cropped up in some $dayjob internal 
testing.

Sometimes, executions of IntegrationTestTimeBoundedRequestsWithRegionReplicas 
would result in an error:

{noformat}
Caused by: java.lang.IllegalArgumentException: Please configure 
hbase.TimeBoundedMultiThreadedReader.runtime
  at 
org.apache.hadoop.hbase.test.IntegrationTestTimeBoundedRequestsWithRegionReplicas$TimeBoundedMultiThreadedReader.
{noformat}

After digging into the test a bit more, I realized that this is actually 
failing because the remaining time left after the writers finish (that is, 
{{hbase.IntegrationTestTimeBoundedRequestsWithRegionReplicas.runtime}} minus 
the time the writers took) was negative. So, the test harness passed a value 
which always caused this error.

We should catch when the time available for the readers is negative and throw 
an appropriate error instructing the human to either increase the amount of 
time for {{hbase.IntegrationTestTimeBoundedRequestsWithRegionReplicas.runtime}} 
or decrease the amount of data written.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17160) Undo unnecessary inter-module dependency; spark to hbase-it and hbase-it to shell

2016-11-23 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15690952#comment-15690952
 ] 

stack commented on HBASE-17160:
---

I'll apply this in a while unless there's an objection. It flattens our 
interdependencies amongst modules, making them more standalone. Check out the 
last image posted ... compare to the first.

> Undo unnecessary inter-module dependency; spark to hbase-it and hbase-it to 
> shell
> -
>
> Key: HBASE-17160
> URL: https://issues.apache.org/jira/browse/HBASE-17160
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: stack
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-17160.master.001.patch, 
> HBASE-17160.master.002.patch, HBASE-17160.master.002.patch, hbase.png, 
> minor_hbase.png, untangled_hbase.png
>
>
> Very minor untangling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14198) Eclipse project generation is broken in master

2016-11-23 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15690932#comment-15690932
 ] 

stack commented on HBASE-14198:
---

I pushed the above.

> Eclipse project generation is broken in master
> --
>
> Key: HBASE-14198
> URL: https://issues.apache.org/jira/browse/HBASE-14198
> Project: HBase
>  Issue Type: Bug
>Reporter: Vladimir Rodionov
>
> After running 
> mvn eclipse:eclipse I tried to import projects into Eclipse (Luna) and got 
> multiple build errors, similar to:
> {code}
> Cannot nest output folder 'hbase-thrift/target/test-classes/META-INF' inside 
> output folder 'hbase-thrift/target/test-classes' hbase-thrift
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17160) Undo unnecessary inter-module dependency; spark to hbase-it and hbase-it to shell

2016-11-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15690938#comment-15690938
 ] 

Hadoop QA commented on HBASE-17160:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 28s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
27s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
40s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s 
{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
43s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
38s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 5s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
28m 58s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha1. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 21s 
{color} | {color:green} hbase-prefix-tree in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 7m 19s 
{color} | {color:green} hbase-shell in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 24s 
{color} | {color:green} hbase-endpoint in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
26s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 46m 24s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:8d52d23 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12840197/HBASE-17160.master.002.patch
 |
| JIRA Issue | HBASE-17160 |
| Optional Tests |  asflicense  javac  javadoc  unit  xml  compile  |
| uname | Linux a63a496aad37 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 
21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh
 |
| git revision | master / 9f5b8a8 |
| Default Java | 1.8.0_111 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/4602/testReport/ |
| modules | C: hbase-prefix-tree hbase-shell hbase-endpoint U: . |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/4602/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> Undo unnecessary inter-module dependency; spark to hbase-it and hbase-it to 
> shell
> -
>
> Key: HBASE-17160
> URL: https://issues.apache.org/jira/browse/HBASE-17160
> Project: HBase
>  Issue Type: Bug
> 

[jira] [Commented] (HBASE-14198) Eclipse project generation is broken in master

2016-11-23 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15690928#comment-15690928
 ] 

stack commented on HBASE-14198:
---

Looking at master today... Here is one change needed to clean up the 
hbase-server error in eclipse.

{code}
kalashnikov:hbase.git stack$ git diff
diff --git a/pom.xml b/pom.xml
index 86edc82..7b2cfae 100644
--- a/pom.xml
+++ b/pom.xml
@@ -736,6 +736,7 @@
 [1.5,)
 
   process
+  bundle
 
   
   
{code}

Let me commit this.

> Eclipse project generation is broken in master
> --
>
> Key: HBASE-14198
> URL: https://issues.apache.org/jira/browse/HBASE-14198
> Project: HBase
>  Issue Type: Bug
>Reporter: Vladimir Rodionov
>
> After running 
> mvn eclipse:eclipse I tried to import projects into Eclipse (Luna) and got 
> multiple build errors, similar to:
> {code}
> Cannot nest output folder 'hbase-thrift/target/test-classes/META-INF' inside 
> output folder 'hbase-thrift/target/test-classes' hbase-thrift
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17072) CPU usage starts to climb up to 90-100% when using G1GC

2016-11-23 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15690869#comment-15690869
 ] 

stack commented on HBASE-17072:
---

bq. Normal read flow, we make use of readBlockDataInternal() and there we dont 
have prefetched next block header usage now.. 

For a 'normal read', the block size is passed into readBlockDataInternal. We 
keep the block size in the hfile index. This is where we get the length from.

Reran scan compares w/ 5 concurrent. W/o patch, 972->1000 seconds to do 10M. 
Patched, it's 979 to 985 seconds.

Single-threaded Scan w/ nothing else happening on the box runs slower with the 
patch. I see 1040 seconds vs 850 seconds, about 20% slower. Let me see if I 
can figure out why the difference (seeks are the same).
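
For context on the caching under discussion, the problematic shape is roughly 
the following (a simplified sketch, not the actual HFileBlock source): each 
reader instance owns its own ThreadLocal, so ThreadLocalMap entries grow with 
the number of open HFiles times the number of handler threads.

{code}
// Simplified sketch of the per-instance ThreadLocal pattern at issue
// (not the real HFileBlock code): every reader instance allocates its own
// ThreadLocal, so each (reader, handler thread) pair leaves one entry in
// that thread's ThreadLocalMap.
class BlockReaderSketch {
  static class PrefetchedHeader {
    long offset = -1;
    byte[] header = new byte[33]; // roughly HFILEBLOCK_HEADER_SIZE
  }
  private final ThreadLocal<PrefetchedHeader> prefetchedHeader =
      ThreadLocal.withInitial(PrefetchedHeader::new);
}
{code}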

> CPU usage starts to climb up to 90-100% when using G1GC
> ---
>
> Key: HBASE-17072
> URL: https://issues.apache.org/jira/browse/HBASE-17072
> Project: HBase
>  Issue Type: Bug
>  Components: Performance, regionserver
>Affects Versions: 1.0.0, 2.0.0, 1.2.0
>Reporter: Eiichi Sato
> Attachments: HBASE-17072.master.001.patch, 
> HBASE-17072.master.002.patch, disable-block-header-cache.patch, 
> mat-threadlocals.png, mat-threads.png, metrics.png, slave1.svg, slave2.svg, 
> slave3.svg, slave4.svg
>
>
> h5. Problem
> CPU usage of a region server in our CDH 5.4.5 cluster, at some point, starts 
> to gradually get higher up to nearly 90-100% when using G1GC.  We've also run 
> into this problem on CDH 5.7.3 and CDH 5.8.2.
> In our production cluster, it normally takes a few weeks for this to happen 
> after restarting a RS.  We reproduced this on our test cluster and attached 
> the results.  Please note that, to make it easy to reproduce, we did some 
> "anti-tuning" on a table when running tests.
> In metrics.png, soon after we started running some workloads against a test 
> cluster (CDH 5.8.2) at about 7 p.m. CPU usage of the two RSs started to rise. 
>  Flame Graphs (slave1.svg to slave4.svg) are generated from jstack dumps of 
> each RS process around 10:30 a.m. the next day.
> After investigating heapdumps from another occurrence on a test cluster 
> running CDH 5.7.3, we found that the ThreadLocalMap contain a lot of 
> contiguous entries of {{HFileBlock$PrefetchedHeader}} probably due to primary 
> clustering.  This caused more loops in 
> {{ThreadLocalMap#expungeStaleEntries()}}, consuming a certain amount of CPU 
> time.  What is worse is that the method is called from RPC metrics code, 
> which means even a small amount of per-RPC time soon adds up to a huge amount 
> of CPU time.
> This is very similar to the issue in HBASE-16616, but we have many 
> {{HFileBlock$PrefetchedHeader}} not only {{Counter$IndexHolder}} instances.  
> Here are some OQL counts from Eclipse Memory Analyzer (MAT).  This shows a 
> number of ThreadLocal instances in the ThreadLocalMap of a single handler 
> thread.
> {code}
> SELECT *
> FROM OBJECTS (SELECT AS RETAINED SET OBJECTS value
> FROM OBJECTS 0x4ee380430) obj
> WHERE obj.@clazz.@name = 
> "org.apache.hadoop.hbase.io.hfile.HFileBlock$PrefetchedHeader"
> #=> 10980 instances
> {code}
> {code}
> SELECT *
> FROM OBJECTS (SELECT AS RETAINED SET OBJECTS value
> FROM OBJECTS 0x4ee380430) obj
> WHERE obj.@clazz.@name = "org.apache.hadoop.hbase.util.Counter$IndexHolder"
> #=> 2052 instances
> {code}
> Although as described in HBASE-16616 this somewhat seems to be an issue in 
> G1GC side regarding weakly-reachable objects, we should keep ThreadLocal 
> usage minimal and avoid creating an indefinite number (in this case, a number 
> of HFiles) of ThreadLocal instances.
> HBASE-16146 removes ThreadLocals from the RPC metrics code.  That may solve 
> the issue (I just saw the patch, never tested it at all), but the 
> {{HFileBlock$PrefetchedHeader}} are still there in the ThreadLocalMap, which 
> may cause issues in the future again.
> h5. Our Solution
> We simply removed the whole {{HFileBlock$PrefetchedHeader}} caching and 
> fortunately we didn't notice any performance degradation for our production 
> workloads.
> Because the PrefetchedHeader caching uses ThreadLocal and because RPCs are 
> handled randomly in any of the handlers, small Get or small Scan RPCs do not 
> benefit from the caching (See HBASE-10676 and HBASE-11402 for the details).  
> Probably, we need to see how well reads are saved by the caching for large 
> Scan or Get RPCs and especially for compactions if we really remove the 
> caching. It's probably better if we can remove ThreadLocals without breaking 
> the current caching behavior.
> FWIW, I'm attaching the patch we applied. It's for CDH 5.4.5.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17158) Avoid deadlock caused by HRegion#doDelta

2016-11-23 Thread ChiaPing Tsai (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15690794#comment-15690794
 ] 

ChiaPing Tsai commented on HBASE-17158:
---

This issue is caused by 
[HBASE-15158|https://issues.apache.org/jira/browse/HBASE-15158]. HBASE-15158 
is committed only to master, and I don't find the lock leak in branch-1.
So this issue applies only to master.
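
The shape of the fix is roughly the following (a condensed sketch; see the 
attached patches for the actual change):

{code}
// Condensed sketch of the locking fix: take the row lock inside the try so
// the finally block always releases it, even when a later step times out
// or throws; otherwise doClose() waits forever on the leaked lock.
startRegionOperation(op);
RowLock rowLock = null;
try {
  rowLock = getRowLockInternal(mutation.getRow(), false);
  // ... perform the delta under the lock ...
} finally {
  if (rowLock != null) {
    rowLock.release();
  }
  closeRegionOperation(op);
}
{code}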

> Avoid deadlock caused by HRegion#doDelta
> 
>
> Key: HBASE-17158
> URL: https://issues.apache.org/jira/browse/HBASE-17158
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: ChiaPing Tsai
>Assignee: ChiaPing Tsai
> Fix For: 2.0.0
>
> Attachments: HBASE-17158.v0.patch, HBASE-17158.v1.patch, 
> HBASE-17158.v2.patch
>
>
> {code:title=HRegion.java|borderStyle=solid}
> private Result doDelta(Operation op, Mutation mutation, long nonceGroup, long 
> nonce,
>   boolean returnResults) throws IOException {
> checkReadOnly();
> checkResources();
> checkRow(mutation.getRow(), op.toString());
> checkFamilies(mutation.getFamilyCellMap().keySet());
> this.writeRequestsCount.increment();
> WriteEntry writeEntry = null;
> startRegionOperation(op);
> List<Cell> results = returnResults ? new ArrayList<Cell>(mutation.size()) : 
> null;
> RowLock rowLock = getRowLockInternal(mutation.getRow(), false);
> MemstoreSize memstoreSize = new MemstoreSize();
> }
> {code}
> The getRowLockInternal() should be moved inside the try block so that a 
> timeout won't cause a lock leak. Otherwise, we will be stuck in 
> HRegion#doClose when closing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-17160) Undo unnecessary inter-module dependency; spark to hbase-it and hbase-it to shell

2016-11-23 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-17160:
--
Status: Patch Available  (was: Reopened)

> Undo unnecessary inter-module dependency; spark to hbase-it and hbase-it to 
> shell
> -
>
> Key: HBASE-17160
> URL: https://issues.apache.org/jira/browse/HBASE-17160
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: stack
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-17160.master.001.patch, 
> HBASE-17160.master.002.patch, HBASE-17160.master.002.patch, hbase.png, 
> minor_hbase.png, untangled_hbase.png
>
>
> Very minor untangling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-16700) Allow for coprocessor whitelisting

2016-11-23 Thread Clay B. (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Clay B. updated HBASE-16700:

Attachment: HBASE-16700.006.patch

This adds test cases for:
* A classpath coprocessor (here 
{{org.apache.hadoop.hbase.coprocessor.BaseRegionObserver}} since it's handy); 
this one is a table creation test
* Adds back the accidentally dropped table creation test showing that creation 
should fail
* Adds HBase reference documentation (I had to build this dropping the 
{{}} tag in {{src/main/site/site.xml}}, as {{mvn site}} was breaking for me 
the way Jenkins recently reported)

As to:
{quote}I was asking whether we want to do class name white listing on top of 
path white listing. It should be fine for now.{quote}
I think it would make sense to allow certain tables only certain paths or 
coprocessors, but for now that seems a pretty far-off use case from what I 
have seen; it could certainly be a follow-up JIRA if anyone wants it.
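
For reference, the blunt kill-switch contrasted in the description below is 
plain site configuration (a minimal hbase-site.xml snippet; the path whitelist 
itself is what this patch adds):

{code}
<!-- Existing kill-switch: disables all non-system (user) coprocessors. -->
<property>
  <name>hbase.coprocessor.user.enabled</name>
  <value>false</value>
</property>
{code}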

> Allow for coprocessor whitelisting
> --
>
> Key: HBASE-16700
> URL: https://issues.apache.org/jira/browse/HBASE-16700
> Project: HBase
>  Issue Type: Improvement
>  Components: Coprocessors
>Reporter: Clay B.
>Priority: Minor
>  Labels: security
> Attachments: HBASE-16700.000.patch, HBASE-16700.001.patch, 
> HBASE-16700.002.patch, HBASE-16700.003.patch, HBASE-16700.004.patch, 
> HBASE-16700.005.patch, HBASE-16700.006.patch
>
>
> Today one can turn off all non-system coprocessors with 
> {{hbase.coprocessor.user.enabled}} however, this disables very useful things 
> like Apache Phoenix's coprocessors. Some tenants of a multi-user HBase may 
> also need to run bespoke coprocessors. But as an operator I would not want 
> wanton coprocessor usage. Ideally, one could do one of two things:
> * Allow coprocessors defined in {{hbase-site.xml}} -- this can only be 
> administratively changed in most cases
> * Allow coprocessors from table descriptors but only if the coprocessor is 
> whitelisted



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16984) Implement getScanner

2016-11-23 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15690499#comment-15690499
 ] 

Yu Li commented on HBASE-16984:
---

Added some comments in RB.

And in the JIRA description, AsyncPrefetchClientScanner -> AsyncClientScanner 
(or maybe you plan, or once planned, to add "Prefetch" to the class name? 
smile)

> Implement getScanner
> 
>
> Key: HBASE-16984
> URL: https://issues.apache.org/jira/browse/HBASE-16984
> Project: HBase
>  Issue Type: Sub-task
>  Components: Client
>Affects Versions: 2.0.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0
>
> Attachments: HBASE-16984-v1.patch, HBASE-16984-v2.patch, 
> HBASE-16984-v3.patch, HBASE-16984-v4.patch, HBASE-16984-v4.patch, 
> HBASE-16984.patch
>
>
> It will just return the old ResultScanner and work like the 
> AsyncPrefetchClientScanner. I think we still need this as we can not do time 
> consuming work in the ScanObserver introduced in HBASE-16838.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-16851) User-facing documentation for the In-Memory Compaction feature

2016-11-23 Thread Edward Bortnikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Bortnikov updated HBASE-16851:
-
Attachment: Accordion HBase In-Memory Compaction - Nov 23.pdf

> User-facing documentation for the In-Memory Compaction feature
> --
>
> Key: HBASE-16851
> URL: https://issues.apache.org/jira/browse/HBASE-16851
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Edward Bortnikov
> Attachments: Accordion HBase In-Memory Compaction - Nov 1 .pdf, 
> Accordion HBase In-Memory Compaction - Nov 23.pdf, Accordion_ HBase In-Memory 
> Compaction - Oct 27.pdf, HBaseAcceleratedHbaseConf-final.pptx
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HBASE-16851) User-facing documentation for the In-Memory Compaction feature

2016-11-23 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15690414#comment-15690414
 ] 

Edward Bortnikov edited comment on HBASE-16851 at 11/23/16 3:30 PM:


Updated version of the blog post in 
https://docs.google.com/document/d/16XOiOuG9e0l6D_mD-oM5JHcSVmKC8rpIVJGH239gWsQ.
 

Hopefully, the final configuration syntax and technical description. 


was (Author: ebortnik):
Updated version of the blog post in 
https://docs.google.com/document/d/16XOiOuG9e0l6D_mD-oM5JHcSVmKC8rpIVJGH239gWsQ.
 

> User-facing documentation for the In-Memory Compaction feature
> --
>
> Key: HBASE-16851
> URL: https://issues.apache.org/jira/browse/HBASE-16851
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Edward Bortnikov
> Attachments: Accordion HBase In-Memory Compaction - Nov 1 .pdf, 
> Accordion_ HBase In-Memory Compaction - Oct 27.pdf, 
> HBaseAcceleratedHbaseConf-final.pptx
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16851) User-facing documentation for the In-Memory Compaction feature

2016-11-23 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15690414#comment-15690414
 ] 

Edward Bortnikov commented on HBASE-16851:
--

Updated version of the blog post in 
https://docs.google.com/document/d/16XOiOuG9e0l6D_mD-oM5JHcSVmKC8rpIVJGH239gWsQ.
 

> User-facing documentation for the In-Memory Compaction feature
> --
>
> Key: HBASE-16851
> URL: https://issues.apache.org/jira/browse/HBASE-16851
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Edward Bortnikov
> Attachments: Accordion HBase In-Memory Compaction - Nov 1 .pdf, 
> Accordion_ HBase In-Memory Compaction - Oct 27.pdf, 
> HBaseAcceleratedHbaseConf-final.pptx
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17158) Avoid deadlock caused by HRegion#doDelta

2016-11-23 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15690393#comment-15690393
 ] 

Ted Yu commented on HBASE-17158:


Mind attaching patch for branch-1 ?

Thanks

> Avoid deadlock caused by HRegion#doDelta
> 
>
> Key: HBASE-17158
> URL: https://issues.apache.org/jira/browse/HBASE-17158
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: ChiaPing Tsai
>Assignee: ChiaPing Tsai
> Fix For: 2.0.0
>
> Attachments: HBASE-17158.v0.patch, HBASE-17158.v1.patch, 
> HBASE-17158.v2.patch
>
>
> {code:title=HRegion.java|borderStyle=solid}
> private Result doDelta(Operation op, Mutation mutation, long nonceGroup, long 
> nonce,
>   boolean returnResults) throws IOException {
> checkReadOnly();
> checkResources();
> checkRow(mutation.getRow(), op.toString());
> checkFamilies(mutation.getFamilyCellMap().keySet());
> this.writeRequestsCount.increment();
> WriteEntry writeEntry = null;
> startRegionOperation(op);
> List<Cell> results = returnResults ? new ArrayList<Cell>(mutation.size()) : 
> null;
> RowLock rowLock = getRowLockInternal(mutation.getRow(), false);
> MemstoreSize memstoreSize = new MemstoreSize();
> }
> {code}
> The getRowLockInternal() should be moved inside the try block so that a 
> timeout won't cause a lock leak. Otherwise, we will be stuck in 
> HRegion#doClose when closing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17170) HBase is also retrying DoNotRetryIOException because of class loader differences.

2016-11-23 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15690329#comment-15690329
 ] 

Ted Yu commented on HBASE-17170:


Thanks for reporting this, Ankit.

Want to attach a patch ?
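
For what it's worth, a minimal sketch of the direction proposed in the 
description below (a hypothetical class; nothing here is committed API): 
resolve the remote exception class against HBase's own class loader and fail 
loudly instead of returning the un-unwrapped wrapper.

{code}
import java.io.IOException;

// Hedged sketch of the proposed HBaseRemoteWithExtrasException idea
// (hypothetical, not committed code).
public class HBaseRemoteWithExtrasException extends IOException {
  private final String className;

  public HBaseRemoteWithExtrasException(String className, String msg) {
    super(msg);
    this.className = className;
  }

  public IOException unwrapRemoteException() throws ClassNotFoundException {
    // Key point: use this class's own loader (which can see the HBase and
    // Phoenix jars), not Thread.currentThread().getContextClassLoader().
    ClassLoader loader = HBaseRemoteWithExtrasException.class.getClassLoader();
    Class<? extends IOException> realClass =
        Class.forName(className, false, loader).asSubclass(IOException.class);
    try {
      return realClass.getConstructor(String.class).newInstance(getMessage());
    } catch (ReflectiveOperationException e) {
      // Fail loudly so a DoNotRetryIOException is never silently retried.
      throw new ClassNotFoundException("Cannot instantiate " + className, e);
    }
  }
}
{code}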

> HBase is also retrying DoNotRetryIOException because of class loader 
> differences.
> -
>
> Key: HBASE-17170
> URL: https://issues.apache.org/jira/browse/HBASE-17170
> Project: HBase
>  Issue Type: Bug
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
>
> The class loader used by the API exposed by Hadoop and the context class 
> loader used by RunJar (bin/hadoop jar phoenix-client.jar …) are different, 
> resulting in classes loaded from the jar not being visible to the current 
> class loader used by the API. 
> {code}
> 16/04/26 21:18:00 INFO client.RpcRetryingCaller: Call exception, tries=32, 
> retries=35, started=491541 ms ago, cancelled=false, msg=
> 16/04/26 21:18:21 INFO client.RpcRetryingCaller: Call exception, tries=33, 
> retries=35, started=511747 ms ago, cancelled=false, msg=
> 16/04/26 21:18:41 INFO client.RpcRetryingCaller: Call exception, tries=34, 
> retries=35, started=531820 ms ago, cancelled=false, msg=
> Exception in thread "main" org.apache.phoenix.exception.PhoenixIOException: 
> Failed after attempts=35, exceptions:
> Tue Apr 26 21:09:49 UTC 2016, 
> RpcRetryingCaller{globalStartTime=1461704989282, pause=100, retries=35}, 
> org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.NamespaceExistException):
>  org.apache.hadoop.hbase.NamespaceExistException: SYSTEM
> at 
> org.apache.hadoop.hbase.master.TableNamespaceManager.create(TableNamespaceManager.java:156)
> at 
> org.apache.hadoop.hbase.master.TableNamespaceManager.create(TableNamespaceManager.java:131)
> at org.apache.hadoop.hbase.master.HMaster.createNamespace(HMaster.java:2553)
> at 
> org.apache.hadoop.hbase.master.MasterRpcServices.createNamespace(MasterRpcServices.java:447)
> at 
> org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:58043)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2115)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:102)
> {code}
> The actual problem is stated in the comment below 
> https://issues.apache.org/jira/browse/PHOENIX-3495?focusedCommentId=15677081=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15677081
> If the hbase classes are not loaded from the Hadoop classpath (from where the 
> hadoop jars are loaded), then the RemoteException will not get unwrapped 
> because of a ClassNotFoundException, and the client will keep retrying even 
> if the cause of the exception is a DoNotRetryIOException.
> RunJar#main() context class loader.
> {code}
> ClassLoader loader = createClassLoader(file, workDir);
> Thread.currentThread().setContextClassLoader(loader);
> Class mainClass = Class.forName(mainClassName, true, loader);
> Method main = mainClass.getMethod("main", new Class[] {
>   Array.newInstance(String.class, 0).getClass()
> });
> HBase classes can be loaded from jar(phoenix-client.jar):-
> hadoop --config /etc/hbase/conf/ jar 
> ~/git/apache/phoenix/phoenix-client/target/phoenix-4.9.0-HBase-1.2-client.jar 
> org.apache.phoenix.mapreduce.CsvBulkLoadTool   --table GIGANTIC_TABLE --input 
> /tmp/b.csv --zookeeper localhost:2181
> {code}
> API(using current class loader).
> {code}
> public class RpcRetryingCaller {
> public IOException unwrapRemoteException() {
> try {
>   Class realClass = Class.forName(getClassName());
>   return instantiateException(realClass.asSubclass(IOException.class));
> } catch(Exception e) {
>   // cannot instantiate the original exception, just return this
> }
> return this;
>   }
> {code}
> *Possible solution:* We can create our own HBaseRemoteWithExtrasException (an 
> extension of RemoteWithExtrasException) so that the default class loader will 
> be the one that loaded the hbase classes, and extend unwrapRemoteException() 
> to throw an exception if the unwrapping doesn't take place because of a CNF 
> exception.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17162) Avoid unconditional call to getXXXArray() in write path

2016-11-23 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15690221#comment-15690221
 ] 

ramkrishna.s.vasudevan commented on HBASE-17162:


+1 if it covers all.

> Avoid unconditional call to getXXXArray() in write path
> ---
>
> Key: HBASE-17162
> URL: https://issues.apache.org/jira/browse/HBASE-17162
> Project: HBase
>  Issue Type: Sub-task
>  Components: regionserver
>Reporter: Anoop Sam John
>Assignee: Anoop Sam John
> Fix For: 2.0.0
>
> Attachments: HBASE-17162.patch
>
>
> Still some calls left. Patch will address these areas.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17140) Throw RegionOfflineException directly when request for a disabled table

2016-11-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15690208#comment-15690208
 ] 

Hadoop QA commented on HBASE-17140:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 24s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
1s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 51s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
42s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
22s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
26s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 42s 
{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
57s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 50s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 50s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
41s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
22s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
25m 22s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha1. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
42s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 41s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 55s 
{color} | {color:green} hbase-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 93m 16s {color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
29s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 135m 35s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hbase.replication.regionserver.TestRegionReplicaReplicationEndpoint |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:8d52d23 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12840239/HBASE-17140-v2.patch |
| JIRA Issue | HBASE-17140 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  |
| uname | Linux 13538b9dbfcf 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 
17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 511398f |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| unit | 
https://builds.apache.org/job/PreCommit-HBASE-Build/4600/artifact/patchprocess/patch-unit-hbase-server.txt
 |
| unit test logs |  
https://builds.apache.org/job/PreCommit-HBASE-Build/4600/artifact/patchprocess/patch-unit-hbase-server.txt
 |
|  Test Results 

[jira] [Commented] (HBASE-17081) Flush the entire CompactingMemStore content to disk

2016-11-23 Thread Anastasia Braginsky (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15690099#comment-15690099
 ] 

Anastasia Braginsky commented on HBASE-17081:
-

The new patch is also on the RB. I am copy-pasting one of my RB answers here 
to make it clearer:

So here is how index-compaction now works:

1. THRESHOLD_PIPELINE_SEGMENTS is now set to 10
2. Until we reach this number of segments in the pipeline we keep flattening 
only, and if there is a snapshot request we flush everything
3. There is a big chance we never reach THRESHOLD_PIPELINE_SEGMENTS segments 
in the pipeline
4. Actually, once the first segment is flattened we will always flush 
everything upon snapshot request (note that nothing reverses the boolean once 
it is set)

Regarding whether to do the same in the data-compaction case, Eshcar is 
currently running benchmarks with all the possibilities, and we can decide 
what is better to do based on her experiments.
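
Condensed into code, the policy reads roughly as follows (hypothetical names 
apart from THRESHOLD_PIPELINE_SEGMENTS; a sketch, not the patch itself):

{code}
// Condensed sketch of the index-compaction policy above (hypothetical
// names except THRESHOLD_PIPELINE_SEGMENTS).
static final int THRESHOLD_PIPELINE_SEGMENTS = 10;
private boolean hasFlattened = false; // point 4: never reset once set

void onInMemoryCompaction() {
  if (pipelineSize() < THRESHOLD_PIPELINE_SEGMENTS) {
    flattenYoungestSegment(); // point 2: flatten only, below the threshold
    hasFlattened = true;
  }
}

boolean flushEntirePipelineOnSnapshot() {
  // points 2 and 4: once anything has been flattened, a snapshot request
  // flushes the whole pipeline rather than only its tail.
  return hasFlattened;
}
{code}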

> Flush the entire CompactingMemStore content to disk
> ---
>
> Key: HBASE-17081
> URL: https://issues.apache.org/jira/browse/HBASE-17081
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Anastasia Braginsky
>Assignee: Anastasia Braginsky
> Attachments: HBASE-17081-V01.patch, HBASE-17081-V02.patch, 
> Pipelinememstore_fortrunk_3.patch
>
>
> Part of CompactingMemStore's memory is held by an active segment, and another 
> part is divided between immutable segments in the compacting pipeline. Upon 
> flush-to-disk request we want to flush all of it to disk, in contrast to 
> flushing only tail of the compacting pipeline.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17081) Flush the entire CompactingMemStore content to disk

2016-11-23 Thread Anastasia Braginsky (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15690085#comment-15690085
 ] 

Anastasia Braginsky commented on HBASE-17081:
-

Attaching the improved patch. I have answered all of the RB comments.

> Flush the entire CompactingMemStore content to disk
> ---
>
> Key: HBASE-17081
> URL: https://issues.apache.org/jira/browse/HBASE-17081
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Anastasia Braginsky
>Assignee: Anastasia Braginsky
> Attachments: HBASE-17081-V01.patch, HBASE-17081-V02.patch, 
> Pipelinememstore_fortrunk_3.patch
>
>
> Part of CompactingMemStore's memory is held by an active segment, and another 
> part is divided between immutable segments in the compacting pipeline. Upon 
> flush-to-disk request we want to flush all of it to disk, in contrast to 
> flushing only tail of the compacting pipeline.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-17081) Flush the entire CompactingMemStore content to disk

2016-11-23 Thread Anastasia Braginsky (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anastasia Braginsky updated HBASE-17081:

Attachment: HBASE-17081-V02.patch

> Flush the entire CompactingMemStore content to disk
> ---
>
> Key: HBASE-17081
> URL: https://issues.apache.org/jira/browse/HBASE-17081
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Anastasia Braginsky
>Assignee: Anastasia Braginsky
> Attachments: HBASE-17081-V01.patch, HBASE-17081-V02.patch, 
> Pipelinememstore_fortrunk_3.patch
>
>
> Part of CompactingMemStore's memory is held by an active segment, and another 
> part is divided between immutable segments in the compacting pipeline. Upon 
> flush-to-disk request we want to flush all of it to disk, in contrast to 
> flushing only tail of the compacting pipeline.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17169) Remove Cell variants with ShareableMemory

2016-11-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15690079#comment-15690079
 ] 

Hadoop QA commented on HBASE-17169:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 32s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
27s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 50s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
39s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
40s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
14s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 41s 
{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
56s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 52s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 52s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
38s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
23s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
25m 53s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha1. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
31s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 41s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 42s 
{color} | {color:green} hbase-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 87m 30s 
{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
29s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 131m 39s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.11.2 Server=1.11.2 Image:yetus/hbase:8d52d23 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12840236/HBASE-17169.patch |
| JIRA Issue | HBASE-17169 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  |
| uname | Linux a2343c413ee8 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 511398f |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/4599/testReport/ |
| modules | C: hbase-common hbase-server U: . |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/4599/console |
| Powered by | Apache Yetus 
