[jira] [Created] (HBASE-16765) Improve IncreasingToUpperBoundRegionSplitPolicy

2016-10-04 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created HBASE-16765:
-

 Summary: Improve IncreasingToUpperBoundRegionSplitPolicy
 Key: HBASE-16765
 URL: https://issues.apache.org/jira/browse/HBASE-16765
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl


We just did some experiments on some larger clusters and found that while using 
IncreasingToUpperBoundRegionSplitPolicy generally works well and is very 
convenient, it does tend to produce too many regions.

Since the logic is - by design - local, checking the number of regions of the 
table in question on the local server only, we end up with more regions than 
necessary.
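For context, the split trigger in IncreasingToUpperBoundRegionSplitPolicy grows cubically with the number of regions of the table seen on the local server, capped by the max file size. A rough sketch of that calculation (simplified from the actual HBase implementation; the constants and names here are illustrative):

```java
// Sketch of the split-size calculation, simplified from HBase's
// IncreasingToUpperBoundRegionSplitPolicy. Numbers are illustrative.
class SplitSizeSketch {
    // Split when a store grows past min(maxFileSize, initialSize * count^3),
    // where count is the number of regions of this table on the LOCAL server.
    static long increasingToUpperBound(long flushSize, long maxFileSize, int localRegionCount) {
        long initialSize = 2 * flushSize;
        long cubed = initialSize * localRegionCount * localRegionCount * localRegionCount;
        return Math.min(maxFileSize, cubed);
    }

    public static void main(String[] args) {
        long flush = 128L << 20;   // 128 MB memstore flush size
        long max = 10L << 30;      // 10 GB max file size
        // With one local region we split at only 256 MB, with two at 2 GB, etc.
        System.out.println(increasingToUpperBound(flush, max, 1) >> 20); // in MB
        System.out.println(increasingToUpperBound(flush, max, 2) >> 20);
    }
}
```

Because each server only sees its own region count, every server ramps up from the small initial size independently, which is how the excess regions accumulate.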



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HBASE-14613) Remove MemStoreChunkPool?

2016-10-06 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl resolved HBASE-14613.
---
Resolution: Won't Fix

> Remove MemStoreChunkPool?
> -
>
> Key: HBASE-14613
> URL: https://issues.apache.org/jira/browse/HBASE-14613
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Lars Hofhansl
>Priority: Minor
> Attachments: 14613-0.98.txt, gc.png, writes.png
>
>
> I just stumbled across MemStoreChunkPool. The idea behind it is to reuse 
> chunks of allocations rather than letting the GC handle this.
> Now, it's off by default, and it seems to me to be of dubious value. I'd 
> recommend just removing it.
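For readers unfamiliar with the idea: a chunk pool keeps fixed-size allocation chunks on a free list and hands them back out instead of letting the GC reclaim them. A minimal sketch of the pattern (not the actual MemStoreChunkPool code):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Minimal sketch of a chunk pool: reuse byte[] chunks rather than
// allocating fresh ones and letting the GC collect them.
class ChunkPoolSketch {
    private final BlockingQueue<byte[]> pool;
    private final int chunkSize;

    ChunkPoolSketch(int maxChunks, int chunkSize) {
        this.pool = new ArrayBlockingQueue<>(maxChunks);
        this.chunkSize = chunkSize;
    }

    byte[] getChunk() {
        byte[] chunk = pool.poll();           // reuse if one is available
        return chunk != null ? chunk : new byte[chunkSize];
    }

    void putBack(byte[] chunk) {
        if (chunk.length == chunkSize) {
            pool.offer(chunk);                // silently dropped if pool is full
        }
    }
}
```

Whether the reuse pays for the bookkeeping is exactly the question this issue raises; with a modern collector the plain allocation path is often cheap enough.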





[jira] [Resolved] (HBASE-16765) New SteppingRegionSplitPolicy, avoid too aggressive spread of regions for small tables.

2016-11-01 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl resolved HBASE-16765.
---
   Resolution: Fixed
 Assignee: Lars Hofhansl
Fix Version/s: 1.1.8
   0.98.24
   1.2.4
   1.4.0
   1.3.0
   2.0.0

> New SteppingRegionSplitPolicy, avoid too aggressive spread of regions for 
> small tables.
> ---
>
> Key: HBASE-16765
> URL: https://issues.apache.org/jira/browse/HBASE-16765
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.4, 0.98.24, 1.1.8
>
> Attachments: 16765-0.98.txt
>
>
> We just did some experiments on some larger clusters and found that while 
> using IncreasingToUpperBoundRegionSplitPolicy generally works well and is 
> very convenient, it does tend to produce too many regions.
> Since the logic is - by design - local, checking the number of regions of the 
> table in question on the local server only, we end up with more regions than 
> necessary.
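The SteppingRegionSplitPolicy that resolved this steps straight from a small first-split size to the configured maximum, instead of growing cubically with the local region count. A hedged sketch of the idea (simplified; names and defaults are illustrative, not the exact HBase code):

```java
// Sketch of the stepping idea: the first region splits early, everything
// afterwards waits for the full max file size. Simplified illustration.
class SteppingSketch {
    static long steppingSplitSize(long flushSize, long maxFileSize, int localRegionCount) {
        // One region: split early (2 * flush size) to get initial parallelism.
        // Otherwise: wait for the normal max file size; no cubic ramp-up.
        return localRegionCount <= 1 ? 2 * flushSize : maxFileSize;
    }

    public static void main(String[] args) {
        long flush = 128L << 20, max = 10L << 30;
        System.out.println(steppingSplitSize(flush, max, 1)); // early split
        System.out.println(steppingSplitSize(flush, max, 5)); // full size
    }
}
```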





[jira] [Resolved] (HBASE-12570) Improve table configuration sanity checking

2016-11-11 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl resolved HBASE-12570.
---
Resolution: Duplicate

> Improve table configuration sanity checking
> ---
>
> Key: HBASE-12570
> URL: https://issues.apache.org/jira/browse/HBASE-12570
> Project: HBase
>  Issue Type: Umbrella
>Reporter: James Taylor
>
> See PHOENIX-1473. If a split policy class cannot be resolved, then your HBase 
> cluster will be brought down as each region server that successively attempts 
> to open the region will not find the class and will bring itself down.
> One idea to prevent this would be to fail the CREATE TABLE or ALTER TABLE 
> admin call if the split policy class cannot be found.
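The proposed check is essentially: before accepting a CREATE/ALTER TABLE, try to resolve the configured split policy class on the master and reject the DDL if that fails. A sketch of such a check (method and exception choices are illustrative):

```java
// Sketch: fail a CREATE/ALTER TABLE up front if the configured split
// policy class cannot be loaded, instead of letting every region server
// discover the problem (and bring itself down) at region-open time.
class SplitPolicySanityCheck {
    static void checkSplitPolicy(String className) {
        if (className == null) {
            return; // no explicit policy configured; the default applies
        }
        try {
            Class.forName(className);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException(
                "Split policy class not found: " + className, e);
        }
    }
}
```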





[jira] [Resolved] (HBASE-12433) Coprocessors not dynamically reordered when reset priority

2016-11-11 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl resolved HBASE-12433.
---
Resolution: Not A Bug

> Coprocessors not dynamically reordered when reset priority
> --
>
> Key: HBASE-12433
> URL: https://issues.apache.org/jira/browse/HBASE-12433
> Project: HBase
>  Issue Type: Bug
>  Components: Coprocessors
>Affects Versions: 0.98.7
>Reporter: James Taylor
>
> When modifying the coprocessor priority through the HBase shell, the order of 
> the firing of the coprocessors wasn't changing. It probably would have with a 
> cluster bounce, but if we can make it dynamic easily, that would be 
> preferable.





[jira] [Resolved] (HBASE-16115) Missing security context in RegionObserver coprocessor when a compaction/split is triggered manually

2016-12-12 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl resolved HBASE-16115.
---
Resolution: Won't Fix

> Missing security context in RegionObserver coprocessor when a 
> compaction/split is triggered manually
> 
>
> Key: HBASE-16115
> URL: https://issues.apache.org/jira/browse/HBASE-16115
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.20
>Reporter: Lars Hofhansl
>
> We ran into an interesting phenomenon which can easily render a cluster 
> unusable.
> We loaded some tests data into a test table and forced a manual compaction 
> through the UI. We have some compaction hooks implemented in a region 
> observer, which writes back to another HBase table when the compaction 
> finishes. We noticed that this coprocessor is not set up correctly; it seems 
> the security context is missing.
> The interesting part is that this _only_ happens when the compaction is 
> triggered through the UI. Automatic compactions (major or minor) or compactions 
> triggered via the HBase shell (following a kinit) work fine. Only the 
> UI-triggered compactions cause this issue and lead to essentially 
> never-ending compactions, immovable regions, etc.
> Not sure what exactly the issue is, but I wanted to make sure I capture this.
> [~apurtell], [~ghelmling], FYI.





[jira] [Created] (HBASE-17440) [0.98] Make sure DelayedClosing chore is stopped as soon as an HConnection is closed

2017-01-09 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created HBASE-17440:
-

 Summary: [0.98] Make sure DelayedClosing chore is stopped as soon 
as an HConnection is closed
 Key: HBASE-17440
 URL: https://issues.apache.org/jira/browse/HBASE-17440
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl


We're seeing many issues with runaway ZK client connections in long-running app 
servers. 10k or more send or event threads are happening frequently.

While looking around in the code I noticed that the DelayedClosing chore is not 
immediately stopped when an HConnection is closed. When there's an issue with 
HBase or ZK and clients reconnect in a tight loop, this can temporarily lead to 
very many threads running. These will all get cleaned out after at most 60s, 
but during that time a lot of threads can be created.

The fix is a one-liner. We'll likely file other issues soon.

Interestingly branch-1 and beyond do not have this chore anymore, although - at 
least in branch-1 and later - I still see the ZooKeeperAliveConnection.
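The shape of the one-line fix is the usual resource-lifecycle rule: anything a connection schedules must be cancelled in its close() path. A generic sketch (stand-in names, not the actual HConnection code):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Sketch: a connection that schedules a periodic "delayed closing" task
// must cancel that task as soon as the connection itself is closed,
// otherwise reconnect loops pile up threads until the tasks time out.
class ConnectionSketch implements AutoCloseable {
    private final ScheduledExecutorService chorePool =
        Executors.newSingleThreadScheduledExecutor();
    private final ScheduledFuture<?> delayedClosing;

    ConnectionSketch() {
        // periodically close idle internal connections (the "chore")
        delayedClosing = chorePool.scheduleAtFixedRate(
            () -> { /* close idle ZK connections */ }, 60, 60, TimeUnit.SECONDS);
    }

    @Override
    public void close() {
        delayedClosing.cancel(true);  // the one-liner: stop the chore immediately
        chorePool.shutdownNow();      // and tear down its thread
    }

    boolean choreStopped() {
        return delayedClosing.isCancelled() && chorePool.isShutdown();
    }
}
```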






[jira] [Resolved] (HBASE-17440) [0.98] Make sure DelayedClosing chore is stopped as soon as an HConnection is closed

2017-01-09 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl resolved HBASE-17440.
---
   Resolution: Fixed
Fix Version/s: 0.98.25

Done.

> [0.98] Make sure DelayedClosing chore is stopped as soon as an HConnection is 
> closed
> 
>
> Key: HBASE-17440
> URL: https://issues.apache.org/jira/browse/HBASE-17440
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.24
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Fix For: 0.98.25
>
> Attachments: 17440.txt
>
>
> We're seeing many issues with runaway ZK client connections in long-running 
> app servers. 10k or more send or event threads are happening frequently.
> While looking around in the code I noticed that the DelayedClosing chore is 
> not immediately stopped when an HConnection is closed. When there's an issue 
> with HBase or ZK and clients reconnect in a tight loop, this can temporarily 
> lead to very many threads running. These will all get cleaned out after at 
> most 60s, but during that time a lot of threads can be created.
> The fix is a one-liner. We'll likely file other issues soon.
> Interestingly branch-1 and beyond do not have this chore anymore, although - 
> at least in branch-1 and later - I still see the ZooKeeperAliveConnection.





[jira] [Created] (HBASE-17877) Replace/improve HBase's byte[] comparator

2017-04-04 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created HBASE-17877:
-

 Summary: Replace/improve HBase's byte[] comparator
 Key: HBASE-17877
 URL: https://issues.apache.org/jira/browse/HBASE-17877
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl


[~vik.karma] did some extensive tests and found that Hadoop's version is faster 
- dramatically faster in some cases.

Patch forthcoming.
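The Hadoop/Guava-style comparator referenced here gets its speed from comparing eight bytes at a time as unsigned longs instead of byte-by-byte. A simplified, pure-Java sketch of the technique (the real versions use sun.misc.Unsafe; this portable sketch reads big-endian longs via ByteBuffer instead):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch of long-at-a-time lexicographic byte[] comparison, the trick
// behind the faster Hadoop/Guava comparators. Big-endian longs preserve
// lexicographic byte order, so we can compare 8 bytes per step.
class LexComparatorSketch {
    static int compare(byte[] a, byte[] b) {
        int minLen = Math.min(a.length, b.length);
        int i = 0;
        ByteBuffer ba = ByteBuffer.wrap(a).order(ByteOrder.BIG_ENDIAN);
        ByteBuffer bb = ByteBuffer.wrap(b).order(ByteOrder.BIG_ENDIAN);
        for (; i + 8 <= minLen; i += 8) {
            long la = ba.getLong(i), lb = bb.getLong(i);
            if (la != lb) {
                // bytes are unsigned, so the longs must be compared unsigned
                return Long.compareUnsigned(la, lb) < 0 ? -1 : 1;
            }
        }
        for (; i < minLen; i++) {        // remaining tail bytes, one at a time
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;      // shared prefix: shorter array sorts first
    }
}
```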





[jira] [Resolved] (HBASE-5311) Allow inmemory Memstore compactions

2017-04-06 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl resolved HBASE-5311.
--
Resolution: Won't Fix

Closing old issue.

> Allow inmemory Memstore compactions
> ---
>
> Key: HBASE-5311
> URL: https://issues.apache.org/jira/browse/HBASE-5311
> Project: HBase
>  Issue Type: Improvement
>Reporter: Lars Hofhansl
> Attachments: InternallyLayeredMap.java
>
>
> Just like we periodically compact the StoreFiles we should also periodically 
> compact the MemStore.
> During these compactions we eliminate deleted cells, expired cells, cells to 
> be removed because of version count, etc., before we even do a memstore flush.
> Besides the optimization that we could get from this, it should also allow us 
> to remove the special handling of ICV, Increment, and Append (all of which 
> use upsert logic to avoid accumulating excessive cells in the Memstore).
> Not targeting this.





[jira] [Resolved] (HBASE-5475) Allow importtsv and Import to work truly offline when using bulk import option

2017-04-06 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl resolved HBASE-5475.
--
Resolution: Won't Fix

Closing old issue.

> Allow importtsv and Import to work truly offline when using bulk import option
> --
>
> Key: HBASE-5475
> URL: https://issues.apache.org/jira/browse/HBASE-5475
> Project: HBase
>  Issue Type: Improvement
>  Components: mapreduce
>Reporter: Lars Hofhansl
>Priority: Minor
>
> Currently importtsv (and now also Import with HBASE-5440) support using 
> HFileOutputFormat for later bulk loading.
> However, currently that cannot be done without access to the table we're 
> going to import to, because both importtsv and Import need to look up the 
> split points and find the compression setting.
> It would be nice if there would be an offline way to provide the split point 
> and compression setting.





[jira] [Resolved] (HBASE-10028) Cleanup metrics documentation

2017-04-06 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl resolved HBASE-10028.
---
Resolution: Won't Fix

Closing old issue.

> Cleanup metrics documentation
> -
>
> Key: HBASE-10028
> URL: https://issues.apache.org/jira/browse/HBASE-10028
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>
> The current documentation of the metrics is incomplete and at points 
> incorrect (HDFS latencies are in ns rather than ms, for example).
> We should clean this up and add other related metrics as well.





[jira] [Resolved] (HBASE-6492) Remove Reflection based Hadoop abstractions

2017-04-06 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl resolved HBASE-6492.
--
Resolution: Won't Fix

Closing old issue.

> Remove Reflection based Hadoop abstractions
> ---
>
> Key: HBASE-6492
> URL: https://issues.apache.org/jira/browse/HBASE-6492
> Project: HBase
>  Issue Type: Improvement
>Reporter: Lars Hofhansl
>
> In 0.96 we now have the Hadoop1-compat and Hadoop2-compat projects.
> The reflection we're using to deal with different versions of Hadoop should 
> be removed in favour of using the compat projects.





[jira] [Resolved] (HBASE-10145) Table creation should proceed in the presence of a stale znode

2017-04-06 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl resolved HBASE-10145.
---
Resolution: Won't Fix

Closing old issue.

> Table creation should proceed in the presence of a stale znode
> --
>
> Key: HBASE-10145
> URL: https://issues.apache.org/jira/browse/HBASE-10145
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Priority: Minor
>
> HBASE-7600 fixed a race condition where concurrent attempts to create the 
> same table could succeed.
> An unfortunate side effect is that it is now impossible to create a table as 
> long as the table's znode is around, which is an issue when a cluster was 
> wiped at the HDFS level.
> Minor issue as we have discussed this many times before, but it ought to be 
> possible to check whether the table directory exists and if not either create 
> it or remove the corresponding znode.





[jira] [Resolved] (HBASE-9739) HBaseClient does not behave nicely when the called thread is interrupted

2017-04-07 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl resolved HBASE-9739.
--
Resolution: Won't Fix

Old issue. No update. Closing.

> HBaseClient does not behave nicely when the called thread is interrupted
> 
>
> Key: HBASE-9739
> URL: https://issues.apache.org/jira/browse/HBASE-9739
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>
> Just ran into a scenario where HBaseClient became permanently useless after 
> we interrupted the thread using it.
> The problem is here:
> {code}
>   } catch(IOException e) {
> markClosed(e);
> {code}
> In sendParam(...).
> If the IOException is caused by an interrupt we should not close the 
> connection.
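The fix suggested here is to distinguish interrupt-induced I/O failures from real connection failures before tearing the connection down. A sketch of that distinction (the method name and the markClosed context it would feed are stand-ins for the HBaseClient internals):

```java
import java.io.IOException;
import java.nio.channels.ClosedByInterruptException;

// Sketch: only mark the shared connection closed for genuine I/O
// failures; an interrupt of the calling thread should fail that one
// call, not poison the connection for every other user.
class InterruptAwareSendSketch {
    static boolean shouldCloseConnection(IOException e) {
        if (e instanceof java.io.InterruptedIOException
                || e instanceof ClosedByInterruptException) {
            return false;  // caller was interrupted; the connection may be fine
        }
        return true;       // real transport problem; close and reconnect
    }
}
```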





[jira] [Created] (HBASE-17893) Allow HBase to build against Hadoop 2.8.0

2017-04-07 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created HBASE-17893:
-

 Summary: Allow HBase to build against Hadoop 2.8.0
 Key: HBASE-17893
 URL: https://issues.apache.org/jira/browse/HBASE-17893
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl


{code}
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-remote-resources-plugin:1.5:process (default) on 
project hbase-assembly: Error rendering velocity resource. Error invoking 
method 'get(java.lang.Integer)' in java.util.ArrayList at 
META-INF/LICENSE.vm[line 1671, column 8]: InvocationTargetException: Index: 0, 
Size: 0 -> [Help 1]
{code}

Then, in the generated LICENSE:
{code}
This product includes Nimbus JOSE+JWT licensed under the The Apache Software 
License, Version 2.0.

${dep.licenses[0].comments}
Please check  this License for acceptability here:

https://www.apache.org/legal/resolved

If it is okay, then update the list named 'non_aggregate_fine' in the 
LICENSE.vm file.
If it isn't okay, then revert the change that added the dependency.

More info on the dependency:

com.nimbusds
nimbus-jose-jwt
3.9

maven central search
g:com.nimbusds AND a:nimbus-jose-jwt AND v:3.9

project website
https://bitbucket.org/connect2id/nimbus-jose-jwt
project source
https://bitbucket.org/connect2id/nimbus-jose-jwt
{code}

Maybe the problem is just that it says: Apache _Software_ License





[jira] [Created] (HBASE-18000) Make sure we always return the scanner id with ScanResponse

2017-05-04 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created HBASE-18000:
-

 Summary: Make sure we always return the scanner id with 
ScanResponse
 Key: HBASE-18000
 URL: https://issues.apache.org/jira/browse/HBASE-18000
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl


Some external tooling (like OpenTSDB) relies on the scanner id to tie 
asynchronous responses back to their requests.

(see comments on HBASE-17489)





[jira] [Created] (HBASE-18165) Predicate based deletion during major compactions

2017-06-05 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created HBASE-18165:
-

 Summary: Predicate based deletion during major compactions
 Key: HBASE-18165
 URL: https://issues.apache.org/jira/browse/HBASE-18165
 Project: HBase
  Issue Type: Brainstorming
Reporter: Lars Hofhansl


In many cases it is expensive to place a delete per version, column, or family.
HBase should have a way to specify a predicate and remove all Cells matching 
the predicate during the next compactions (major and minor).

Nothing more concrete. The tricky part would be to know when it is safe to 
remove the predicate, i.e. when we can be sure that all Cells matching the 
predicate actually have been removed.

Could potentially use HBASE-12859 for that.
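One way to picture the proposal: a registered predicate acts like a filter applied to every cell the compaction scanner emits, and matches are simply not rewritten. A toy sketch of the mechanics (Cell is reduced to a plain holder class here; the real integration point would be the compaction scanner):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Toy sketch of predicate-based deletion at compaction time: cells
// matching the registered predicate are not rewritten into the new
// file -- i.e. they are deleted without per-cell delete markers.
// "Cell" is a stand-in for HBase's Cell interface.
class CompactionPredicateSketch {
    static final class Cell {
        final String row;
        final long timestamp;
        Cell(String row, long timestamp) { this.row = row; this.timestamp = timestamp; }
    }

    static List<Cell> compact(List<Cell> input, Predicate<Cell> dropIf) {
        List<Cell> kept = new ArrayList<>();
        for (Cell c : input) {
            if (!dropIf.test(c)) {   // keep only cells the predicate does NOT match
                kept.add(c);
            }
        }
        return kept;
    }
}
```

The hard part flagged above is not this filtering step but knowing when every cell the predicate could match has passed through a compaction, so the predicate can be retired.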





[jira] [Created] (HBASE-18228) HBCK improvements

2017-06-16 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created HBASE-18228:
-

 Summary: HBCK improvements
 Key: HBASE-18228
 URL: https://issues.apache.org/jira/browse/HBASE-18228
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl


We just had a prod issue, and running HBCK the way we did actually caused more 
problems.
In part HBCK did stuff we did not expect, in part we had little visibility into 
what HBCK was doing, and in part the logging was confusing.

I'm proposing 2 improvements:
1. A dry-run mode. Run, and just list what would have been done.
2. An interactive mode. Run, and for each action request Y/N user input, so 
that a user can opt out of stuff.

[~jmhsieh], FYI
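Both proposed modes fit a single pattern: route every repair action through a gate that either just logs it (dry-run) or asks for confirmation (interactive). A sketch of that gate (names are illustrative, not actual HBCK code):

```java
import java.util.Scanner;

// Sketch of the two proposed HBCK modes: a dry-run that only reports
// what would be done, and an interactive mode that asks Y/N per action.
class RepairGateSketch {
    enum Mode { NORMAL, DRY_RUN, INTERACTIVE }

    private final Mode mode;
    private final Scanner in = new Scanner(System.in);

    RepairGateSketch(Mode mode) { this.mode = mode; }

    /** Returns true if the named repair action should actually run. */
    boolean approve(String description) {
        switch (mode) {
            case DRY_RUN:
                System.out.println("[dry-run] would: " + description);
                return false;                       // report only, never act
            case INTERACTIVE:
                System.out.print(description + "? [y/N] ");
                return in.hasNextLine() && in.nextLine().trim().equalsIgnoreCase("y");
            default:
                return true;                        // current HBCK behavior
        }
    }
}
```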





[jira] [Created] (HBASE-19458) Allow building HBase 1.3.x against Hadoop 2.8.2

2017-12-07 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created HBASE-19458:
-

 Summary: Allow building HBase 1.3.x against Hadoop 2.8.2
 Key: HBASE-19458
 URL: https://issues.apache.org/jira/browse/HBASE-19458
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl








[jira] [Created] (HBASE-19534) Document risks of RegionObserver.preStoreScannerOpen

2017-12-16 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created HBASE-19534:
-

 Summary: Document risks of RegionObserver.preStoreScannerOpen
 Key: HBASE-19534
 URL: https://issues.apache.org/jira/browse/HBASE-19534
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl


We just had an outage because we used preStoreScannerOpen, which, in our case, 
created a new StoreScanner. In HBase versions before 1.3 this caused a definite 
memory leak: a reference to the old StoreScanner (if not null) would be held by 
the store until the region is closed.

In 1.3 and later there's no such leak, but the old scanner is still not 
properly closed.

This should be added to the Javadoc and the ZooKeeperScanPolicyObserver example 
should be fixed.
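The documentation fix amounts to a rule for the hook: if your preStoreScannerOpen replaces the scanner, you are responsible for closing the one you replace. A schematic sketch of the safe pattern (interfaces reduced to stand-ins, not the real RegionObserver signature):

```java
// Schematic of the safe pattern when replacing a scanner in a
// RegionObserver-style hook: the scanner you replace must be closed,
// or it lingers (and before 1.3, leaked) until the region closes.
// "Scanner" is a stand-in for the real scanner interface.
class ScannerHookSketch {
    interface Scanner extends AutoCloseable {
        @Override void close();  // narrowed: no checked exception
    }

    static Scanner replaceScanner(Scanner provided, Scanner replacement) {
        if (provided != null) {
            provided.close();    // the missing step behind the outage
        }
        return replacement;
    }
}
```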





[jira] [Resolved] (HBASE-13094) Consider Filters that are evaluated before deletes and see delete markers

2017-12-23 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl resolved HBASE-13094.
---
Resolution: Won't Fix

No interest. Closing.

> Consider Filters that are evaluated before deletes and see delete markers
> -
>
> Key: HBASE-13094
> URL: https://issues.apache.org/jira/browse/HBASE-13094
> Project: HBase
>  Issue Type: Brainstorming
>  Components: regionserver, Scanners
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Attachments: 13094-0.98.txt
>
>
> That would be good for full control filtering of all cells, such as needed 
> for some transaction implementations.
> [~ghelmling]





[jira] [Resolved] (HBASE-15453) [Performance] Considering reverting HBASE-10015 - reinstate synchronized in StoreScanner

2017-12-23 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl resolved HBASE-15453.
---
Resolution: Won't Fix

Lemme just close. In 1.3+ it's not an issue anyway (the need to synchronize is 
gone there).

> [Performance] Considering reverting HBASE-10015 - reinstate synchronized in 
> StoreScanner
> 
>
> Key: HBASE-15453
> URL: https://issues.apache.org/jira/browse/HBASE-15453
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
>Priority: Critical
> Attachments: 15453-0.98.txt
>
>
> In HBASE-10015 back then I found that intrinsic locks (synchronized) in 
> StoreScanner are slower than explicit locks.
> I was surprised by this. To make sure I added a simple perf test and many 
> folks ran it on their machines. All found that explicit locks were faster.
> Now... I just ran that test again. On the latest JDK8 I find that now the 
> intrinsic locks are significantly faster:
> (OpenJDK Runtime Environment (build 1.8.0_72-b15))
> Explicit locks:
> 10 runs  mean:2223.6 sigma:72.29412147609237
> Intrinsic locks:
> 10 runs  mean:1865.3 sigma:32.63755505548784
> I confirmed the same with timing some Phoenix scans. We can save a bunch of 
> time by changing this back.
> Arrghhh... So maybe it's time to revert this now...?
> (Note that in trunk due to [~ram_krish]'s work, we do not lock in 
> StoreScanner anymore)
> I'll attach the perf test and a patch that changes lock to synchronized, if 
> some folks could run this on 0.98, that'd be great.
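The perf test described boils down to hammering the same guarded increment through both locking styles and timing each. A self-contained sketch of such a micro-benchmark (single-threaded and without JMH-style warmup, so its numbers are only indicative; the original test exercised 0.98's StoreScanner):

```java
import java.util.concurrent.locks.ReentrantLock;

// Micro-benchmark sketch: intrinsic (synchronized) vs explicit
// (ReentrantLock) locking around a trivial critical section. A real
// comparison needs warmup and many runs; this only shows the shape.
class LockBenchSketch {
    private long counter;
    private final ReentrantLock lock = new ReentrantLock();

    synchronized void incIntrinsic() { counter++; }

    void incExplicit() {
        lock.lock();
        try { counter++; } finally { lock.unlock(); }
    }

    /** Runs one variant and returns elapsed nanoseconds. */
    long run(boolean intrinsic, int iterations) {
        counter = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            if (intrinsic) incIntrinsic(); else incExplicit();
        }
        return System.nanoTime() - start;
    }

    long count() { return counter; }
}
```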





[jira] [Created] (HBASE-19631) Allow building HBase 1.5.x against Hadoop 3.0.0

2017-12-26 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created HBASE-19631:
-

 Summary: Allow building HBase 1.5.x against Hadoop 3.0.0
 Key: HBASE-19631
 URL: https://issues.apache.org/jira/browse/HBASE-19631
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl








[jira] [Resolved] (HBASE-19631) Allow building HBase 1.5.x against Hadoop 3.0.0

2018-01-25 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl resolved HBASE-19631.
---
Resolution: Fixed

Committed to hbase-1 branch.

> Allow building HBase 1.5.x against Hadoop 3.0.0
> ---
>
> Key: HBASE-19631
> URL: https://issues.apache.org/jira/browse/HBASE-19631
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
>Priority: Major
> Fix For: 1.5.0
>
> Attachments: 19631.txt
>
>






[jira] [Created] (HBASE-20446) Allow building HBase 1.x against Hadoop 3.1.0

2018-04-17 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created HBASE-20446:
-

 Summary: Allow building HBase 1.x against Hadoop 3.1.0
 Key: HBASE-20446
 URL: https://issues.apache.org/jira/browse/HBASE-20446
 Project: HBase
  Issue Type: Improvement
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: 20446.txt







[jira] [Created] (HBASE-20459) Majority of scan time in HBase-1 spent in size estimation

2018-04-19 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created HBASE-20459:
-

 Summary: Majority of scan time in HBase-1 spent in size estimation
 Key: HBASE-20459
 URL: https://issues.apache.org/jira/browse/HBASE-20459
 Project: HBase
  Issue Type: Improvement
Reporter: Lars Hofhansl
 Attachments: Screenshot_20180419_162559.png







[jira] [Resolved] (HBASE-6562) Fake KVs are sometimes passed to filters

2018-05-12 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl resolved HBASE-6562.
--
Resolution: Fixed

In 1.4+ this should be fixed.

> Fake KVs are sometimes passed to filters
> 
>
> Key: HBASE-6562
> URL: https://issues.apache.org/jira/browse/HBASE-6562
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Priority: Minor
> Attachments: 6562-0.94-v1.txt, 6562-0.96-v1.txt, 6562-v2.txt, 
> 6562-v3.txt, 6562-v4.txt, 6562-v5.txt, 6562.txt, minimalTest.java
>
>
> In internal tests at Salesforce we found that fake row keys sometimes are 
> passed to filters (Filter.filterRowKey(...) specifically).
> The KVs are eventually filtered by the StoreScanner/ScanQueryMatcher, but the 
> row key is passed to filterRowKey in RegionScannerImpl *before* that happens.





[jira] [Created] (HBASE-21033) Separate StoreHeap from StoreFileHeap

2018-08-09 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created HBASE-21033:
-

 Summary: Separate StoreHeap from StoreFileHeap
 Key: HBASE-21033
 URL: https://issues.apache.org/jira/browse/HBASE-21033
 Project: HBase
  Issue Type: Improvement
Reporter: Lars Hofhansl


Currently KeyValueHeap is used both for heaps of StoreScanners at the Region 
level and for heaps of StoreFileScanners (and a MemstoreScanner) at the Store 
level.

This causes various problems:
 # Some incorrect method usage can only be detected at runtime via a runtime 
exception.
 # In profiling sessions it's hard to distinguish the two.
 # It's just not clean :)

 





[jira] [Created] (HBASE-11767) [0.94] Unnecessary garbage produced by schema metrics during scanning

2014-08-16 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created HBASE-11767:
-

 Summary: [0.94] Unnecessary garbage produced by schema metrics 
during scanning
 Key: HBASE-11767
 URL: https://issues.apache.org/jira/browse/HBASE-11767
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.94.23


Near the end of StoreScanner.next(...) we find this gem:
{code}
} finally {
  if (cumulativeMetric > 0 && metric != null) {
RegionMetricsStorage.incrNumericMetric(this.metricNamePrefix + metric,
cumulativeMetric);
  }
}
{code}

So, for each row generated we build up a new metric string that will be 
identical for each invocation of the StoreScanner anyway (a store scanner is 
valid for at most one region and one operation).
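The fix is the standard one for per-call string concatenation: compute the full metric name once when the scanner is constructed, since both the prefix and the metric are fixed for a given scanner. A minimal before/after sketch (names are illustrative, not the 0.94 code):

```java
// Sketch: hoist the metric-name concatenation out of the per-row path.
// Each "+" on the hot path allocates a fresh String (garbage per row);
// the precomputed field allocates exactly once per scanner.
class MetricNameSketch {
    private final String metricNamePrefix;
    private final String fullMetricName;   // computed once, at construction

    MetricNameSketch(String prefix, String metric) {
        this.metricNamePrefix = prefix;
        this.fullMetricName = prefix + metric;
    }

    // before: builds an identical new String on every invocation
    String perRowName(String metric) { return metricNamePrefix + metric; }

    // after: reuses the precomputed name
    String cachedName() { return fullMetricName; }
}
```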






[jira] [Resolved] (HBASE-11767) [0.94] Unnecessary garbage produced by schema metrics during scanning

2014-08-16 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl resolved HBASE-11767.
---

  Resolution: Fixed
Hadoop Flags: Reviewed

Committed to 0.94.
Thanks for having a look [~mbertozzi]

> [0.94] Unnecessary garbage produced by schema metrics during scanning
> -
>
> Key: HBASE-11767
> URL: https://issues.apache.org/jira/browse/HBASE-11767
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Fix For: 0.94.23
>
> Attachments: 11767.txt
>
>
> Near the end of StoreScanner.next(...) we find this gem:
> {code}
> } finally {
>   if (cumulativeMetric > 0 && metric != null) {
> RegionMetricsStorage.incrNumericMetric(this.metricNamePrefix + metric,
> cumulativeMetric);
>   }
> }
> {code}
> So, for each row generated we build up a new metric string that will be 
> identical for each invocation of the StoreScanner anyway (a store scanner is 
> valid for at most one region and one operation).





[jira] [Created] (HBASE-11778) Scale timestamps by 1000

2014-08-19 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created HBASE-11778:
-

 Summary: Scale timestamps by 1000
 Key: HBASE-11778
 URL: https://issues.apache.org/jira/browse/HBASE-11778
 Project: HBase
  Issue Type: Brainstorming
Reporter: Lars Hofhansl


The KV timestamps are used for various reasons:
# ordering of KVs
# resolving conflicts
# enforce TTL

Currently we assume that the timestamps have a resolution of 1ms, and because 
of that we made the resolution at which we can determine time identical to the 
resolution at which we can store time.
I think it is time to disentangle the two... At least allow a higher resolution 
of time to be stored. That way we could have a centralized transaction oracle 
that produces ids that relate to wall clock time, and at the same time allow 
producing more than 1000/s.

The simplest way is to just store time in us (microseconds). I.e. we'd still 
collect time in ms by default and just multiply that with 1000 before we store 
it. With 8 bytes that still gives us a range of 292471 years.
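The range claim is easy to verify: a signed 64-bit microsecond counter covers Long.MAX_VALUE divided by the number of microseconds in a (365-day) year. A quick check of that arithmetic:

```java
// Verifying the "292471 years" claim: the range of a signed 64-bit
// timestamp stored in microseconds instead of milliseconds.
class MicrosRangeSketch {
    // ms -> us scaling as proposed: multiply by 1000 before storing
    static long scaleMsToMicros(long ms) { return ms * 1000L; }

    static long rangeInYears() {
        long microsPerYear = 1_000_000L * 60 * 60 * 24 * 365;  // ~3.15e13
        return Long.MAX_VALUE / microsPerYear;                  // ~292,471
    }
}
```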

We'd have to grandfather in old data. We could write a metadata entry into 
each HFile declaring what the TS resolution is if it is different from ms.

Not sure, yet, how this would relate to using the TS for things like seqIds.

Let's do some brainstorming. 





[jira] [Reopened] (HBASE-9746) RegionServer can't start when replication tries to replicate to an unknown host

2014-08-21 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl reopened HBASE-9746:
--


We've just seen this again. It's bad when HBase cannot come up due to missing 
DNS entries for a slave cluster.

> RegionServer can't start when replication tries to replicate to an unknown 
> host
> ---
>
> Key: HBASE-9746
> URL: https://issues.apache.org/jira/browse/HBASE-9746
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.94.12
>Reporter: Lars Hofhansl
>Priority: Minor
> Fix For: 0.99.0, 2.0.0, 0.98.7, 0.94.24
>
>
> Just ran into this:
> {code}
> 13/10/11 00:37:02 [regionserver60020] WARN  zookeeper.ZKConfig(204): 
> java.net.UnknownHostException: : Name or service not known
>   at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
>   at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:894)
>   at 
> java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1286)
>   at java.net.InetAddress.getAllByName0(InetAddress.java:1239)
>   at java.net.InetAddress.getAllByName(InetAddress.java:1155)
>   at java.net.InetAddress.getAllByName(InetAddress.java:1091)
>   at java.net.InetAddress.getByName(InetAddress.java:1041)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZKConfig.getZKQuorumServersString(ZKConfig.java:201)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZKConfig.getZKQuorumServersString(ZKConfig.java:245)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:147)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:127)
>   at 
> org.apache.hadoop.hbase.replication.ReplicationPeer.reloadZkWatcher(ReplicationPeer.java:170)
>   at 
> org.apache.hadoop.hbase.replication.ReplicationPeer.<init>(ReplicationPeer.java:69)
>   at 
> org.apache.hadoop.hbase.replication.ReplicationZookeeper.getPeer(ReplicationZookeeper.java:343)
>   at 
> org.apache.hadoop.hbase.replication.ReplicationZookeeper.connectToPeer(ReplicationZookeeper.java:308)
>   at 
> org.apache.hadoop.hbase.replication.ReplicationZookeeper.connectExistingPeers(ReplicationZookeeper.java:189)
>   at 
> org.apache.hadoop.hbase.replication.ReplicationZookeeper.<init>(ReplicationZookeeper.java:156)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.Replication.initialize(Replication.java:89)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.newReplicationInstance(HRegionServer.java:3986)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.createNewReplicationInstance(HRegionServer.java:3955)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.setupWALAndReplication(HRegionServer.java:1412)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.handleReportForDutyResponse(HRegionServer.java:1096)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:749)
>   at java.lang.Thread.run(Thread.java:722)
> 13/10/11 00:37:02 [regionserver60020] ERROR zookeeper.ZKConfig(210): no valid 
> quorum servers found in zoo.cfg
> 13/10/11 00:37:02 [regionserver60020] WARN  regionserver.HRegionServer(1108): 
> Exception in region server : 
> java.io.IOException: Unable to determine ZooKeeper ensemble
>   at org.apache.hadoop.hbase.zookeeper.ZKUtil.connect(ZKUtil.java:116)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:153)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:127)
>   at 
> org.apache.hadoop.hbase.replication.ReplicationPeer.reloadZkWatcher(ReplicationPeer.java:170)
>   at 
> org.apache.hadoop.hbase.replication.ReplicationPeer.<init>(ReplicationPeer.java:69)
>   at 
> org.apache.hadoop.hbase.replication.ReplicationZookeeper.getPeer(ReplicationZookeeper.java:343)
>   at 
> org.apache.hadoop.hbase.replication.ReplicationZookeeper.connectToPeer(ReplicationZookeeper.java:308)
>   at 
> org.apache.hadoop.hbase.replication.ReplicationZookeeper.connectExistingPeers(ReplicationZookeeper.java:189)
>   at 
> org.apache.hadoop.hbase.replication.ReplicationZookeeper.<init>(ReplicationZookeeper.java:156)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.Replication.initialize(Replication.java:89)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.newReplicationInstance(HRegionServer.java:3986)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.createNewReplicationInstance(HRegionServer.java:3955)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.setupWALAndReplication(HRegionServer.java:1412)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.handleReportForDutyResponse(HRegionServer.java:1096)
>   a

[jira] [Created] (HBASE-11811) Use binary search for seeking into a block

2014-08-22 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created HBASE-11811:
-

 Summary: Use binary search for seeking into a block
 Key: HBASE-11811
 URL: https://issues.apache.org/jira/browse/HBASE-11811
 Project: HBase
  Issue Type: Brainstorming
Reporter: Lars Hofhansl


Currently, upon every seek (including Gets), we need to look linearly through 
the block from the beginning until we find the Cell we are looking for.

It should be possible to build a simple cache of offsets of Cells for each 
block as it is loaded and then use binary search to find the Cell in question.
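The idea can be sketched like this (illustrative Python, not the actual HFile block reader; `build_index` and `seek` are made-up names):

```python
import bisect

# Record each cell's key and byte offset once while the block is decoded,
# then answer seeks with binary search instead of a linear scan.
def build_index(cells):
    # cells: (key, offset) pairs, already in the block's sorted key order
    keys = [k for k, _ in cells]
    offsets = [off for _, off in cells]
    return keys, offsets

def seek(keys, offsets, target):
    # Find the first cell whose key sorts >= target (seek semantics).
    i = bisect.bisect_left(keys, target)
    return offsets[i] if i < len(keys) else None
```

Building the index costs one pass per block load, but every subsequent seek into the block becomes O(log n).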



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Reopened] (HBASE-9746) RegionServer can't start when replication tries to replicate to an unknown host

2014-08-25 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl reopened HBASE-9746:
--


Looks like this has broken some tests potentially... Will check tomorrow. 
Reopening for now.

> RegionServer can't start when replication tries to replicate to an unknown 
> host
> ---
>
> Key: HBASE-9746
> URL: https://issues.apache.org/jira/browse/HBASE-9746
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.94.12
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Fix For: 0.99.0, 2.0.0, 0.94.23, 0.98.6
>
> Attachments: 9746-0.98.txt, 9746-trunk.txt
>
>
> Just ran into this:
> {code}
> 13/10/11 00:37:02 [regionserver60020] WARN  zookeeper.ZKConfig(204): 
> java.net.UnknownHostException: : Name or service not known
>   at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
>   at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:894)
>   at 
> java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1286)
>   at java.net.InetAddress.getAllByName0(InetAddress.java:1239)
>   at java.net.InetAddress.getAllByName(InetAddress.java:1155)
>   at java.net.InetAddress.getAllByName(InetAddress.java:1091)
>   at java.net.InetAddress.getByName(InetAddress.java:1041)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZKConfig.getZKQuorumServersString(ZKConfig.java:201)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZKConfig.getZKQuorumServersString(ZKConfig.java:245)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:147)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:127)
>   at 
> org.apache.hadoop.hbase.replication.ReplicationPeer.reloadZkWatcher(ReplicationPeer.java:170)
>   at 
> org.apache.hadoop.hbase.replication.ReplicationPeer.<init>(ReplicationPeer.java:69)
>   at 
> org.apache.hadoop.hbase.replication.ReplicationZookeeper.getPeer(ReplicationZookeeper.java:343)
>   at 
> org.apache.hadoop.hbase.replication.ReplicationZookeeper.connectToPeer(ReplicationZookeeper.java:308)
>   at 
> org.apache.hadoop.hbase.replication.ReplicationZookeeper.connectExistingPeers(ReplicationZookeeper.java:189)
>   at 
> org.apache.hadoop.hbase.replication.ReplicationZookeeper.<init>(ReplicationZookeeper.java:156)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.Replication.initialize(Replication.java:89)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.newReplicationInstance(HRegionServer.java:3986)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.createNewReplicationInstance(HRegionServer.java:3955)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.setupWALAndReplication(HRegionServer.java:1412)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.handleReportForDutyResponse(HRegionServer.java:1096)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:749)
>   at java.lang.Thread.run(Thread.java:722)
> 13/10/11 00:37:02 [regionserver60020] ERROR zookeeper.ZKConfig(210): no valid 
> quorum servers found in zoo.cfg
> 13/10/11 00:37:02 [regionserver60020] WARN  regionserver.HRegionServer(1108): 
> Exception in region server : 
> java.io.IOException: Unable to determine ZooKeeper ensemble
>   at org.apache.hadoop.hbase.zookeeper.ZKUtil.connect(ZKUtil.java:116)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:153)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:127)
>   at 
> org.apache.hadoop.hbase.replication.ReplicationPeer.reloadZkWatcher(ReplicationPeer.java:170)
>   at 
> org.apache.hadoop.hbase.replication.ReplicationPeer.<init>(ReplicationPeer.java:69)
>   at 
> org.apache.hadoop.hbase.replication.ReplicationZookeeper.getPeer(ReplicationZookeeper.java:343)
>   at 
> org.apache.hadoop.hbase.replication.ReplicationZookeeper.connectToPeer(ReplicationZookeeper.java:308)
>   at 
> org.apache.hadoop.hbase.replication.ReplicationZookeeper.connectExistingPeers(ReplicationZookeeper.java:189)
>   at 
> org.apache.hadoop.hbase.replication.ReplicationZookeeper.<init>(ReplicationZookeeper.java:156)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.Replication.initialize(Replication.java:89)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.newReplicationInstance(HRegionServer.java:3986)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.createNewReplicationInstance(HRegionServer.java:3955)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.setupWALAndReplication(HRegionServer.java:1412)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.handleReportForDu

[jira] [Resolved] (HBASE-9746) RegionServer can't start when replication tries to replicate to an unknown host

2014-08-26 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl resolved HBASE-9746.
--

Resolution: Fixed

Committed addendum.
Meh, that was more painful than it should have been.

> RegionServer can't start when replication tries to replicate to an unknown 
> host
> ---
>
> Key: HBASE-9746
> URL: https://issues.apache.org/jira/browse/HBASE-9746
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.94.12
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Fix For: 0.99.0, 2.0.0, 0.94.23, 0.98.6
>
> Attachments: 9746-0.98.txt, 9746-addendum.txt, 9746-trunk.txt
>
>
> Just ran into this:
> {code}
> 13/10/11 00:37:02 [regionserver60020] WARN  zookeeper.ZKConfig(204): 
> java.net.UnknownHostException: : Name or service not known
>   at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
>   at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:894)
>   at 
> java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1286)
>   at java.net.InetAddress.getAllByName0(InetAddress.java:1239)
>   at java.net.InetAddress.getAllByName(InetAddress.java:1155)
>   at java.net.InetAddress.getAllByName(InetAddress.java:1091)
>   at java.net.InetAddress.getByName(InetAddress.java:1041)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZKConfig.getZKQuorumServersString(ZKConfig.java:201)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZKConfig.getZKQuorumServersString(ZKConfig.java:245)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:147)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:127)
>   at 
> org.apache.hadoop.hbase.replication.ReplicationPeer.reloadZkWatcher(ReplicationPeer.java:170)
>   at 
> org.apache.hadoop.hbase.replication.ReplicationPeer.<init>(ReplicationPeer.java:69)
>   at 
> org.apache.hadoop.hbase.replication.ReplicationZookeeper.getPeer(ReplicationZookeeper.java:343)
>   at 
> org.apache.hadoop.hbase.replication.ReplicationZookeeper.connectToPeer(ReplicationZookeeper.java:308)
>   at 
> org.apache.hadoop.hbase.replication.ReplicationZookeeper.connectExistingPeers(ReplicationZookeeper.java:189)
>   at 
> org.apache.hadoop.hbase.replication.ReplicationZookeeper.<init>(ReplicationZookeeper.java:156)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.Replication.initialize(Replication.java:89)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.newReplicationInstance(HRegionServer.java:3986)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.createNewReplicationInstance(HRegionServer.java:3955)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.setupWALAndReplication(HRegionServer.java:1412)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.handleReportForDutyResponse(HRegionServer.java:1096)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:749)
>   at java.lang.Thread.run(Thread.java:722)
> 13/10/11 00:37:02 [regionserver60020] ERROR zookeeper.ZKConfig(210): no valid 
> quorum servers found in zoo.cfg
> 13/10/11 00:37:02 [regionserver60020] WARN  regionserver.HRegionServer(1108): 
> Exception in region server : 
> java.io.IOException: Unable to determine ZooKeeper ensemble
>   at org.apache.hadoop.hbase.zookeeper.ZKUtil.connect(ZKUtil.java:116)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:153)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:127)
>   at 
> org.apache.hadoop.hbase.replication.ReplicationPeer.reloadZkWatcher(ReplicationPeer.java:170)
>   at 
> org.apache.hadoop.hbase.replication.ReplicationPeer.<init>(ReplicationPeer.java:69)
>   at 
> org.apache.hadoop.hbase.replication.ReplicationZookeeper.getPeer(ReplicationZookeeper.java:343)
>   at 
> org.apache.hadoop.hbase.replication.ReplicationZookeeper.connectToPeer(ReplicationZookeeper.java:308)
>   at 
> org.apache.hadoop.hbase.replication.ReplicationZookeeper.connectExistingPeers(ReplicationZookeeper.java:189)
>   at 
> org.apache.hadoop.hbase.replication.ReplicationZookeeper.<init>(ReplicationZookeeper.java:156)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.Replication.initialize(Replication.java:89)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.newReplicationInstance(HRegionServer.java:3986)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.createNewReplicationInstance(HRegionServer.java:3955)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.setupWALAndReplication(HRegionServer.java:1412)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServ

[jira] [Created] (HBASE-11876) RegionScanner.nextRaw(...) should update metrics

2014-09-01 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created HBASE-11876:
-

 Summary: RegionScanner.nextRaw(...) should update metrics
 Key: HBASE-11876
 URL: https://issues.apache.org/jira/browse/HBASE-11876
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.6
Reporter: Lars Hofhansl


I added RegionScanner.nextRaw(...) to allow a "smart" client to avoid some of 
the default work that HBase is doing, such as {start|stop}RegionOperation and 
synchronized(scanner) for each row.

Metrics should follow the same approach. Collecting them per row is expensive 
and a caller should have the option to collect those later or to avoid 
collecting them completely.

We can also save some cycles in RSRpcServices.scan(...) if we updated the 
metric only once per batch instead of for each row.
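The per-batch idea can be sketched like this (illustrative Python with made-up names, not the real RSRpcServices code):

```python
# Collect a whole batch of rows via raw next() calls, then publish the scan
# metrics once per batch instead of once per row.
class ScanMetrics:
    def __init__(self):
        self.rows_scanned = 0

    def add_rows(self, n):
        self.rows_scanned += n

def scan_batch(row_iter, metrics, batch_size=100):
    rows = []
    for row in row_iter:
        rows.append(row)
        if len(rows) >= batch_size:
            break
    metrics.add_rows(len(rows))  # single metrics update for the whole batch
    return rows
```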




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-11923) Potential race condition in RecoverableZookeeper.checkZk()

2014-09-09 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created HBASE-11923:
-

 Summary: Potential race condition in 
RecoverableZookeeper.checkZk()
 Key: HBASE-11923
 URL: https://issues.apache.org/jira/browse/HBASE-11923
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.99.0, 2.0.0, 0.98.7, 0.94.24


[~apurtell] pointed out a potential race condition in 
RecoverableZookeeper.checkZk() that I introduced in parent.

If multiple threads entered that method at the same time without a valid 
ZooKeeper reference, we could leak ZooKeeper objects.

Since this is not on a hot code path, we should just synchronize the two 
involved methods.
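The fix can be sketched like this (illustrative Python standing in for the Java synchronization; `RecoverableZK` and `factory` are made-up names, not the actual RecoverableZooKeeper API):

```python
import threading

# Lazily (re)create the ZooKeeper handle under a lock so that concurrent
# callers cannot each create, and then leak, their own handle.
class RecoverableZK:
    def __init__(self, factory):
        self._factory = factory       # creates a new ZooKeeper handle
        self._zk = None
        self._lock = threading.Lock()

    def check_zk(self):
        # Not a hot code path, so plain mutual exclusion is acceptable.
        with self._lock:
            if self._zk is None:
                self._zk = self._factory()
            return self._zk
```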



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-11935) Unbounded creation of Replication Failover workers

2014-09-10 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created HBASE-11935:
-

 Summary: Unbounded creation of Replication Failover workers
 Key: HBASE-11935
 URL: https://issues.apache.org/jira/browse/HBASE-11935
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Priority: Critical
 Fix For: 2.0.0, 0.98.7, 0.94.24, 0.99.1


We just ran into a production incident with TCP SYN storms on port 2181 
(zookeeper).

In our case the slave cluster was not running. When we bounced the primary 
cluster we saw an "unbounded" number of failover threads all hammering the 
slave cluster's ZK hosts (which were not running ZK at the time), causing 
overall degradation of network performance between datacenters.

Looking at the code we noticed that the thread pool handling of the failover 
workers was probably unintended.

Patch coming soon.
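A bounded pool along the lines of the fix might look like this (illustrative sketch; the cap and function names are made up):

```python
from concurrent.futures import ThreadPoolExecutor

MAX_FAILOVER_WORKERS = 4  # made-up cap; the real value would be configurable

def claim_queues(dead_server_queues, handle_queue):
    # Tasks beyond the cap wait in the executor's queue rather than each
    # spawning a new failover thread that hammers the peer's ZK hosts.
    with ThreadPoolExecutor(max_workers=MAX_FAILOVER_WORKERS) as pool:
        futures = [pool.submit(handle_queue, q) for q in dead_server_queues]
        return [f.result() for f in futures]
```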



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HBASE-11963) Synchronize peer cluster replication connection attempts

2014-09-12 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl resolved HBASE-11963.
---
  Resolution: Fixed
Hadoop Flags: Reviewed

Fixed. Thanks [~sukuna...@gmail.com].

> Synchronize peer cluster replication connection attempts
> 
>
> Key: HBASE-11963
> URL: https://issues.apache.org/jira/browse/HBASE-11963
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Andrew Purtell
>Assignee: Maddineni Sukumar
> Fix For: 2.0.0, 0.98.7, 0.94.24, 0.99.1
>
> Attachments: 11963-0.94.txt, HBASE-11963-0.98.patch, HBASE-11963.patch
>
>
> Synchronize peer cluster connection attempts to avoid races and rate limit 
> connections when multiple replication sources try to connect to the peer 
> cluster. If the peer cluster is down we can get out of control over time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HBASE-11957) Backport to 0.94 HBASE-5974 Scanner retry behavior with RPC timeout on next() seems incorrect

2014-09-19 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl resolved HBASE-11957.
---
  Resolution: Fixed
Hadoop Flags: Reviewed

> Backport to 0.94 HBASE-5974 Scanner retry behavior with RPC timeout on next() 
> seems incorrect
> -
>
> Key: HBASE-11957
> URL: https://issues.apache.org/jira/browse/HBASE-11957
> Project: HBase
>  Issue Type: Bug
>Reporter: Liu Shaohui
>Assignee: Liu Shaohui
>Priority: Critical
> Fix For: 0.94.24
>
> Attachments: HBASE-5974-0.94-v1.diff, verify-test.patch
>
>
> HBASE-5974: Scanner retry behavior with RPC timeout on next() seems incorrect, 
> which causes data to go missing in HBase scans.
> I think we should fix it in 0.94.
> [~lhofhansl]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-12077) FilterList creates many ArrayList$Itr objects per row.

2014-09-23 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created HBASE-12077:
-

 Summary: FilterList creates many ArrayList$Itr objects per row.
 Key: HBASE-12077
 URL: https://issues.apache.org/jira/browse/HBASE-12077
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
 Attachments: 12077-0.98.txt





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-12091) optionally ignore edits for dropped tables for replication

2014-09-24 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created HBASE-12091:
-

 Summary: optionally ignore edits for dropped tables for replication
 Key: HBASE-12091
 URL: https://issues.apache.org/jira/browse/HBASE-12091
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl


We just ran into a scenario where we dropped a table from both the source and 
the sink, but the source still had outstanding edits that it could not get rid 
of. Now all replication is backed up behind these unreplicatable edits.
We should have an option to ignore edits for tables dropped at the source.
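The option could behave roughly like this (illustrative sketch with made-up names, not the actual ReplicationSource code):

```python
# Before shipping a batch of WAL entries, optionally drop those whose table
# no longer exists at the source, so the queue can drain past them.
def filter_edits(entries, existing_tables, ignore_dropped=True):
    if not ignore_dropped:
        return entries
    return [e for e in entries if e["table"] in existing_tables]

batch = [{"table": "t1", "row": "a"}, {"table": "dropped", "row": "b"}]
shipped = filter_edits(batch, existing_tables={"t1"})
```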



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (HBASE-11957) Backport to 0.94 HBASE-5974 Scanner retry behavior with RPC timeout on next() seems incorrect

2014-09-26 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl reopened HBASE-11957:
---

> Backport to 0.94 HBASE-5974 Scanner retry behavior with RPC timeout on next() 
> seems incorrect
> -
>
> Key: HBASE-11957
> URL: https://issues.apache.org/jira/browse/HBASE-11957
> Project: HBase
>  Issue Type: Bug
>Reporter: Liu Shaohui
>Assignee: Liu Shaohui
>Priority: Critical
> Fix For: 0.94.24
>
> Attachments: HBASE-5974-0.94-v1.diff, verify-test.patch
>
>
> HBASE-5974: Scanner retry behavior with RPC timeout on next() seems incorrect, 
> which causes data to go missing in HBase scans.
> I think we should fix it in 0.94.
> [~lhofhansl]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HBASE-11957) Backport to 0.94 HBASE-5974 Scanner retry behavior with RPC timeout on next() seems incorrect

2014-09-26 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl resolved HBASE-11957.
---
Resolution: Fixed

> Backport to 0.94 HBASE-5974 Scanner retry behavior with RPC timeout on next() 
> seems incorrect
> -
>
> Key: HBASE-11957
> URL: https://issues.apache.org/jira/browse/HBASE-11957
> Project: HBase
>  Issue Type: Bug
>Reporter: Liu Shaohui
>Assignee: Liu Shaohui
>Priority: Critical
> Fix For: 0.94.24
>
> Attachments: 11957-addendum.txt, HBASE-5974-0.94-v1.diff, 
> verify-test.patch
>
>
> HBASE-5974: Scanner retry behavior with RPC timeout on next() seems incorrect, 
> which causes data to go missing in HBase scans.
> I think we should fix it in 0.94.
> [~lhofhansl]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HBASE-12113) Backport to 0.94: HBASE-5525 Truncate and preserve region boundaries option

2014-09-28 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl resolved HBASE-12113.
---
  Resolution: Fixed
Hadoop Flags: Reviewed

Committed to 0.94.
Thanks [~busbey]

> Backport to 0.94: HBASE-5525 Truncate and preserve region boundaries option
> ---
>
> Key: HBASE-12113
> URL: https://issues.apache.org/jira/browse/HBASE-12113
> Project: HBase
>  Issue Type: Task
>  Components: shell
>Affects Versions: 0.94.24
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Minor
> Fix For: 0.94.24
>
> Attachments: HBASE-12113-0.94.1.patch.txt
>
>
> per [mailing 
> list|http://mail-archives.apache.org/mod_mbox/hbase-user/201409.mbox/%3C1411880636.89071.YahooMailNeo%40web140604.mail.bf1.yahoo.com%3E]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HBASE-12114) Meta table cache hashing may access the wrong table

2014-09-29 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl resolved HBASE-12114.
---
Resolution: Fixed

Committed to 0.94

Thanks for the patch [~heliangliang].
If you get time to think of a test, we can add one in another jira.

> Meta table cache hashing may access the wrong table
> ---
>
> Key: HBASE-12114
> URL: https://issues.apache.org/jira/browse/HBASE-12114
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Reporter: He Liangliang
>Assignee: He Liangliang
> Fix For: 0.94.24
>
> Attachments: HBASE-12114.patch
>
>
> The cache key used a hash code from a weak hash algorithm, 
> Bytes.mapKey(tableName), which can cause collisions with quite high 
> probability, making the client access a wrong region from another table.
> Example: "2fx0" and "2fvn"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (HBASE-12019) hbase-daemon.sh overwrite HBASE_ROOT_LOGGER and HBASE_SECURITY_LOGGER variables

2014-09-29 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl reopened HBASE-12019:
---

[~stack], looks like you misapplied the 0.94 patch. (replaced DRFA\{S} with 
RFA\{S})

> hbase-daemon.sh overwrite HBASE_ROOT_LOGGER and HBASE_SECURITY_LOGGER 
> variables
> ---
>
> Key: HBASE-12019
> URL: https://issues.apache.org/jira/browse/HBASE-12019
> Project: HBase
>  Issue Type: Bug
>  Components: scripts
>Affects Versions: 0.99.0, 0.94.23, 0.98.6
> Environment: Linux
>Reporter: Sebastien Barrier
>Assignee: Sebastien Barrier
>Priority: Minor
>  Labels: patch
> Fix For: 2.0.0, 0.98.7, 0.94.24, 0.99.1
>
> Attachments: HBASE-12019.patch
>
>
> hbase-env.sh is supposed to be used to set environment variables like 
> HBASE_ROOT_LOGGER and HBASE_SECURITY_LOGGER, but hbase-daemon.sh overwrites 
> the export of those, so the benefit of hbase-env.sh is lost.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HBASE-12019) hbase-daemon.sh overwrite HBASE_ROOT_LOGGER and HBASE_SECURITY_LOGGER variables

2014-09-29 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl resolved HBASE-12019.
---
Resolution: Fixed

Done. Rolling new RC.

> hbase-daemon.sh overwrite HBASE_ROOT_LOGGER and HBASE_SECURITY_LOGGER 
> variables
> ---
>
> Key: HBASE-12019
> URL: https://issues.apache.org/jira/browse/HBASE-12019
> Project: HBase
>  Issue Type: Bug
>  Components: scripts
>Affects Versions: 0.99.0, 0.94.23, 0.98.6
> Environment: Linux
>Reporter: Sebastien Barrier
>Assignee: Sebastien Barrier
>Priority: Minor
>  Labels: patch
> Fix For: 2.0.0, 0.98.7, 0.94.24, 0.99.1
>
> Attachments: 12019-0.94-addendum.txt, HBASE-12019.patch
>
>
> hbase-env.sh is supposed to be used to set environment variables like 
> HBASE_ROOT_LOGGER and HBASE_SECURITY_LOGGER, but hbase-daemon.sh overwrites 
> the export of those, so the benefit of hbase-env.sh is lost.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-12146) RegionServerTracker should escape data in log messages

2014-10-01 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created HBASE-12146:
-

 Summary: RegionServerTracker should escape data in log messages
 Key: HBASE-12146
 URL: https://issues.apache.org/jira/browse/HBASE-12146
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Priority: Trivial


Trivial thing I observed when testing 0.94.24RC2.

I see a log message of the form:
2014-10-01 13:52:35,632 INFO 
org.apache.hadoop.hbase.zookeeper.RegionServerTracker: Rs node: 
/hbase/rs/newbunny,52514,1412196754788 data: PBUο^C

Obviously the tracker does not escape the value (or maybe it shouldn't log it 
in the first place)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-12173) Backport: [PE] Allow random value size

2014-10-04 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created HBASE-12173:
-

 Summary: Backport: [PE] Allow random value size
 Key: HBASE-12173
 URL: https://issues.apache.org/jira/browse/HBASE-12173
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
 Fix For: 0.98.7, 0.94.25






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (HBASE-12146) RegionServerTracker should escape data in log messages

2014-10-04 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl reopened HBASE-12146:
---

Still want the message fix in 0.94 and 0.98 :)

> RegionServerTracker should escape data in log messages
> --
>
> Key: HBASE-12146
> URL: https://issues.apache.org/jira/browse/HBASE-12146
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Assignee: stack
>Priority: Trivial
> Fix For: 2.0.0, 0.99.1
>
> Attachments: 12146.txt
>
>
> Trivial thing I observed when testing 0.94.24RC2.
> I see a log message of the form:
> 2014-10-01 13:52:35,632 INFO 
> org.apache.hadoop.hbase.zookeeper.RegionServerTracker: Rs node: 
> /hbase/rs/newbunny,52514,1412196754788 data: PBUο^C
> Obviously the tracker does not escape the value (or maybe it shouldn't log it 
> in the first place)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HBASE-12171) Backport: PerformanceEvaluation: getSplits doesn't provide right splits.

2014-10-04 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl resolved HBASE-12171.
---
  Resolution: Fixed
Hadoop Flags: Reviewed

Committed to 0.94 (without the totalRows bit)

Thanks [~jmspaggi]

> Backport: PerformanceEvaluation: getSplits doesn't provide right splits.
> 
>
> Key: HBASE-12171
> URL: https://issues.apache.org/jira/browse/HBASE-12171
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.94.23
>Reporter: Jean-Marc Spaggiari
>Assignee: Jean-Marc Spaggiari
> Fix For: 0.94.25
>
> Attachments: HBASE-12171-v0-0.94.patch
>
>
> Only in the 0.94 branch. getSplits provides an extra region: when asked for 
> 24, you will get 25, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-12303) Seek to next row after family delete markers

2014-10-20 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created HBASE-12303:
-

 Summary: Seek to next row after family delete markers
 Key: HBASE-12303
 URL: https://issues.apache.org/jira/browse/HBASE-12303
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl


Currently we seek to the next column when we encounter a family delete marker.
I think we can safely seek the current store to the next row instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-12311) Version stats in HFiles?

2014-10-21 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created HBASE-12311:
-

 Summary: Version stats in HFiles?
 Key: HBASE-12311
 URL: https://issues.apache.org/jira/browse/HBASE-12311
 Project: HBase
  Issue Type: Brainstorming
Reporter: Lars Hofhansl


In HBASE-9778 I basically punted to the user the decision on whether to do 
repeated scanner.next() calls instead of issuing (re)seeks.
I think we can do better.

One way to do that is to maintain simple stats of the maximum number of 
versions we've seen for any row/col combination and store these in the HFile's 
metadata (just like the time range, oldest Put, etc).

Then we can estimate fairly accurately whether we have to expect lots of 
versions (in which case seeking between columns is better) or not (in which 
case we'd issue repeated next()'s).
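A sketch of the stats-plus-heuristic idea (made-up names and threshold, nothing from the actual HFile writer or reader code):

```python
from collections import Counter

# Track the max version count per row/column as cells are written, store it
# with the file's metadata, and use it on the read side to pick a strategy.
def max_versions(cells):
    # cells: (row, column) pairs in the order they were written
    return max(Counter(cells).values()) if cells else 0

def read_strategy(max_versions_in_file, threshold=3):
    # Few versions per column: repeated next() calls are cheaper than seeks;
    # many versions: an explicit (re)seek between columns wins.
    return "next" if max_versions_in_file <= threshold else "seek"
```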




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HBASE-12334) Handling of DeserializationException causes needless retry on failure

2014-10-24 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl resolved HBASE-12334.
---
  Resolution: Fixed
Hadoop Flags: Reviewed

Pushed to 0.98+

> Handling of DeserializationException causes needless retry on failure
> -
>
> Key: HBASE-12334
> URL: https://issues.apache.org/jira/browse/HBASE-12334
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.7
>Reporter: James Taylor
>Assignee: Lars Hofhansl
>  Labels: Phoenix
> Fix For: 2.0.0, 0.98.8, 0.99.2
>
> Attachments: 12334-0.98.txt
>
>
> If an unexpected exception occurs when deserialization occurs for a custom 
> filter, the exception gets wrapped in a DeserializationException. Since this 
> exception is in turn wrapped in an IOException, the many loop retry logic 
> kicks in. The net effect is that this same deserialization error occurs again 
> and again as the retries occur, just causing the client to wait needlessly.
> IMO, either the parseFrom methods should be allowed to throw whatever type of 
> IOException they'd like, in which case they could throw a 
> DoNotRetryIOException, or a DeserializationException should be wrapped in a 
> DoNotRetryIOException.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-12363) KEEP_DELETED_CELLS considered harmful?

2014-10-28 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created HBASE-12363:
-

 Summary: KEEP_DELETED_CELLS considered harmful?
 Key: HBASE-12363
 URL: https://issues.apache.org/jira/browse/HBASE-12363
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl


Brainstorming...

This morning in the train (of all places) I realized a fundamental issue in how 
KEEP_DELETED_CELLS is implemented.

The problem is around knowing when it is safe to remove a delete marker (we 
cannot remove it unless all cells affected by it are removed as well).
This was particularly hard for family markers, since they sort before all cells 
of a row, and hence when scanning forward through an HFile you cannot know whether 
the family markers are still needed until at least the entire row is scanned.

My solution was to keep the TS of the oldest put in any given HFile, and only 
remove delete markers older than that TS.
That sounds good on the face of it... But now imagine you wrote a version of 
ROW 1 and then never update it again. Then later you write a billion other rows 
and delete them all. Since the TS of the cells in ROW 1 is older than all the 
delete markers for the other billion rows, these will never be collected... At 
least for the region that hosts ROW 1 after a major compaction.

I don't see a good way out of this. In the parent issue I outlined these four options:
# Only allow the new flag set on CFs with TTL set. MIN_VERSIONS would not apply 
to deleted rows or delete marker rows (wouldn't know how long to keep family 
deletes in that case). (MAX)VERSIONS would still be enforced on all rows types 
except for family delete markers.
# Translate family delete markers to column delete marker at (major) compaction 
time.
# Change HFileWriterV* to keep track of the earliest put TS in a store and 
write it to the file metadata. Use that to expire delete markers that are older 
and hence can't affect any puts in the file.
# Have Store.java keep track of the earliest put in internalFlushCache and 
compactStore and then append it to the file metadata. That way HFileWriterV* 
would not need to know about KVs.

And I implemented #4.

I'd love to get input on ideas.
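A minimal sketch of option #4's bookkeeping, assuming the tracker is fed every put cell during flush/compaction (class and method names are illustrative):

```java
// Sketch of option #4: track the earliest put timestamp while flushing or
// compacting, store it in the HFile metadata, and use it to decide whether
// a delete marker can be expired at major compaction time.
class EarliestPutTracker {
    static final long NO_TS = Long.MAX_VALUE; // no puts seen yet

    private long earliestPutTs = NO_TS;

    // Called for every put cell written during flush/compaction.
    void trackPut(long ts) {
        earliestPutTs = Math.min(earliestPutTs, ts);
    }

    long earliestPutTs() { return earliestPutTs; }

    // A delete marker strictly older than every put in the store cannot
    // shadow anything, so a major compaction may safely drop it.
    boolean canExpireDeleteMarker(long deleteTs) {
        return deleteTs < earliestPutTs;
    }
}
```

Note this is exactly where the described pathology bites: a single very old put pins earliestPutTs low and keeps every newer delete marker alive.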





[jira] [Reopened] (HBASE-12274) Race between RegionScannerImpl#nextInternal() and RegionScannerImpl#close() may produce null pointer exception

2014-10-30 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl reopened HBASE-12274:
---

Just seeing this now.
Adding synchronized to nextRaw is *not* OK.
I added nextRaw specifically for callers who know what they are doing (like 
Phoenix).

This needs to be reverted and we need to find another solution.

-1


> Race between RegionScannerImpl#nextInternal() and RegionScannerImpl#close() 
> may produce null pointer exception
> --
>
> Key: HBASE-12274
> URL: https://issues.apache.org/jira/browse/HBASE-12274
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.6.1
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 2.0.0, 0.98.8, 0.99.2
>
> Attachments: 12274-region-server.log, 12274-v2.txt, 12274-v2.txt, 
> 12274-v3.txt
>
>
> I saw the following in region server log:
> {code}
> 2014-10-15 03:28:36,976 ERROR 
> [B.DefaultRpcServer.handler=0,queue=0,port=60020] ipc.RpcServer: Unexpected 
> throwable object
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:5023)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:4932)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:4923)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3245)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29994)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2078)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> This is where the NPE happened:
> {code}
> // Let's see what we have in the storeHeap.
> KeyValue current = this.storeHeap.peek();
> {code}
> The cause was race between nextInternal(called through nextRaw) and close 
> methods.
> nextRaw() is not synchronized.





[jira] [Created] (HBASE-12411) Avoid seek + read completely?

2014-11-02 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created HBASE-12411:
-

 Summary: Avoid seek + read completely?
 Key: HBASE-12411
 URL: https://issues.apache.org/jira/browse/HBASE-12411
 Project: HBase
  Issue Type: Brainstorming
Reporter: Lars Hofhansl


In the light of HDFS-6735 we might want to consider refraining from seek + read 
completely and only perform preads.

For example, currently a compaction can lock out every other scanner over the 
file it is currently reading for compaction.

At the very least we can introduce an option to avoid seek + read, so we can 
allow testing this in various scenarios.
This will definitely be of great importance for projects like Phoenix, which 
parallelize queries within a region (and hence readers will be used concurrently by 
multiple scanners with high likelihood).
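The difference between the two read modes can be sketched with plain java.io as a stand-in for HDFS's FSDataInputStream: seek + read shares a stream cursor and must be serialized, while a positional read (pread) passes the offset per call:

```java
// Sketch: seek + read mutates the shared stream position and must be
// serialized across readers; a positional read is stateless per call and
// safe for concurrent readers of the same file.
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

class ReadModes {
    // seek + read: stateful, needs external locking when readers share the stream
    static int seekRead(RandomAccessFile f, long pos, byte[] buf) throws IOException {
        synchronized (f) {      // one reader at a time
            f.seek(pos);
            return f.read(buf);
        }
    }

    // pread: the offset travels with the call, no shared cursor to protect
    static int pread(FileChannel ch, long pos, ByteBuffer buf) throws IOException {
        return ch.read(buf, pos);
    }
}
```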





[jira] [Created] (HBASE-12417) Scan copy constructor does not retain "small" attribute

2014-11-03 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created HBASE-12417:
-

 Summary: Scan copy constructor does not retain "small" attribute
 Key: HBASE-12417
 URL: https://issues.apache.org/jira/browse/HBASE-12417
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl


See Scan(Scan), the small member is not copied over.
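A stripped-down illustration of the bug class (the real Scan lives in org.apache.hadoop.hbase.client; this stand-in keeps only the relevant field):

```java
// Minimal illustration: a copy constructor that must copy every field,
// including the 'small' flag that was dropped in HBASE-12417.
class Scan {
    private byte[] startRow = new byte[0];
    private boolean small;          // the attribute that was not copied

    Scan() {}

    // Copy constructor: the 'small' assignment is the missing line.
    Scan(Scan other) {
        this.startRow = other.startRow.clone();
        this.small = other.small;
    }

    Scan setSmall(boolean small) { this.small = small; return this; }
    boolean isSmall() { return small; }
}
```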





[jira] [Resolved] (HBASE-12417) Scan copy constructor does not retain small attribute

2014-11-03 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl resolved HBASE-12417.
---
   Resolution: Fixed
Fix Version/s: 0.99.2
   0.98.8
   2.0.0
 Hadoop Flags: Reviewed

Thanks [~apurtell]. Committed to 0.98+

> Scan copy constructor does not retain small attribute
> -
>
> Key: HBASE-12417
> URL: https://issues.apache.org/jira/browse/HBASE-12417
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Fix For: 2.0.0, 0.98.8, 0.99.2
>
> Attachments: 12417.txt
>
>
> See Scan(Scan), the small member is not copied over.





[jira] [Resolved] (HBASE-12363) Improve how KEEP_DELETED_CELLS works with MIN_VERSIONS

2014-11-04 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl resolved HBASE-12363.
---
  Resolution: Fixed
Hadoop Flags: Reviewed

Committed to 0.98, branch-1, and master.

Thanks for the reviews.

> Improve how KEEP_DELETED_CELLS works with MIN_VERSIONS
> --
>
> Key: HBASE-12363
> URL: https://issues.apache.org/jira/browse/HBASE-12363
> Project: HBase
>  Issue Type: Sub-task
>  Components: regionserver
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
>  Labels: Phoenix
> Fix For: 2.0.0, 0.98.8, 0.99.2
>
> Attachments: 12363-0.98.txt, 12363-1.0.txt, 12363-master.txt, 
> 12363-test.txt, 12363-v2.txt, 12363-v3.txt
>
>
> Brainstorming...
> This morning in the train (of all places) I realized a fundamental issue in 
> how KEEP_DELETED_CELLS is implemented.
> The problem is around knowing when it is safe to remove a delete marker (we 
> cannot remove it unless all cells affected by it are removed as well).
> This was particularly hard for family markers, since they sort before all 
> cells of a row, and hence when scanning forward through an HFile you cannot 
> know whether the family markers are still needed until at least the entire 
> row is scanned.
> My solution was to keep the TS of the oldest put in any given HFile, and only 
> remove delete markers older than that TS.
> That sounds good on the face of it... But now imagine you wrote a version of 
> ROW 1 and then never update it again. Then later you write a billion other 
> rows and delete them all. Since the TS of the cells in ROW 1 is older than 
> all the delete markers for the other billion rows, these will never be 
> collected... At least for the region that hosts ROW 1 after a major 
> compaction.
> Note, in a sense that is what HBase is supposed to do when keeping deleted 
> cells: Keep them until they would be removed by some other means (for example 
> TTL, or MAX_VERSION when new versions are inserted).
> The specific problem here is that even when all KVs affected by a delete marker 
> are expired this way, the marker would not be removed if there is just one older 
> KV in the HStore.
> I don't see a good way out of this. In the parent issue I outlined these four 
> options:
> # Only allow the new flag set on CFs with TTL set. MIN_VERSIONS would not 
> apply to deleted rows or delete marker rows (wouldn't know how long to keep 
> family deletes in that case). (MAX)VERSIONS would still be enforced on all 
> rows types except for family delete markers.
> # Translate family delete markers to column delete marker at (major) 
> compaction time.
> # Change HFileWriterV* to keep track of the earliest put TS in a store and 
> write it to the file metadata. Use that to expire delete markers that are 
> older and hence can't affect any puts in the file.
> # Have Store.java keep track of the earliest put in internalFlushCache and 
> compactStore and then append it to the file metadata. That way HFileWriterV* 
> would not need to know about KVs.
> And I implemented #4.
> I'd love to get input on ideas.





[jira] [Resolved] (HBASE-12443) After increasing the TTL value of a Hbase Table , table gets inaccessible. Scan table not working.

2014-11-06 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl resolved HBASE-12443.
---
Resolution: Duplicate

Duplicate of HBASE-11419.
Please do not file the same issue again. In HBASE-11419 you mention 0.94.6 being 
your version. Can you try a later version of 0.94?
You can upgrade from 0.94.6 directly to 0.94.24 with a rolling restart.

> After increasing the TTL value of a Hbase Table , table gets inaccessible. 
> Scan table not working.
> --
>
> Key: HBASE-12443
> URL: https://issues.apache.org/jira/browse/HBASE-12443
> Project: HBase
>  Issue Type: Bug
>  Components: HFile
>Reporter: Prabhu Joseph
>Priority: Blocker
>
> After increasing the TTL value of a Hbase Table , table gets inaccessible. 
> Scan table not working.
> Scan in hbase shell throws
> java.lang.IllegalStateException: Block index not loaded
> at com.google.common.base.Preconditions.checkState(Preconditions.java:145)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV1.blockContainingKey(HFileReaderV1.java:181)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV1$AbstractScannerV1.seekTo(HFileReaderV1.java:426)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:226)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:145)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.(StoreScanner.java:131)
> at org.apache.hadoop.hbase.regionserver.Store.getScanner(Store.java:2015)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.(HRegion.java:3706)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:1761)
> at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1753)
> at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1730)
> at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:2409)
> at sun.reflect.GeneratedMethodAccessor56.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at 
> org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
> at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
> STEPS to Reproduce:
>  create 'debugger',{NAME => 'd',TTL => 15552000}
>  put 'debugger','jdb','d:desc','Java debugger',1399699792000
>  disable 'debugger'
> alter 'debugger',{NAME => 'd',TTL => 6912}
> enable 'debugger'
> scan 'debugger'
> Reason for the issue:
> When inserting already expired data into the debugger table, HBase creates an 
> HFile with an empty data block and index block. On scanning the table, 
> StoreFile.Reader checks whether the TimeRangeTracker's maximum timestamp is 
> greater than the TTL value, so it skips the empty file.
> But when the TTL is changed, the maximum timestamp will be less than the TTL 
> value, so StoreFile.Reader tries to read the index block from the HFile, 
> leading to java.lang.IllegalStateException: Block index not loaded.
> SOLUTION:
> In StoreFile.java:
> {code}
> boolean passesTimerangeFilter(Scan scan, long oldestUnexpiredTS) {
>   if (timeRangeTracker == null) {
>     return true;
>   } else {
>     return timeRangeTracker.includesTimeRange(scan.getTimeRange()) &&
>         timeRangeTracker.getMaximumTimestamp() >= oldestUnexpiredTS;
>   }
> }
> {code}
> In the above method, by checking (via the FixedFileTrailer block) whether 
> there are entries in the HFile, we can skip scanning the empty HFile:
> {code}
> // changed code will solve the issue
> boolean passesTimerangeFilter(Scan scan, long oldestUnexpiredTS) {
>   if (timeRangeTracker == null) {
>     return true;
>   } else {
>     return timeRangeTracker.includesTimeRange(scan.getTimeRange()) &&
>         timeRangeTracker.getMaximumTimestamp() >= oldestUnexpiredTS &&
>         reader.getEntries() > 0;
>   }
> }
> {code}





[jira] [Created] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction

2014-11-10 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created HBASE-12457:
-

 Summary: Regions in transition for a long time when CLOSE 
interleaves with a slow compaction
 Key: HBASE-12457
 URL: https://issues.apache.org/jira/browse/HBASE-12457
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl


Under heavy load we have observed regions remaining in transition for 20 
minutes when the master requests a close while a slow compaction is running.

The pattern is always something like this:
# RS starts a compaction
# HM requests the region to be closed on this RS
# The compaction is not aborted for another 20 minutes
# The region is in transition and not usable.

In every case I tracked down so far the time between the requested CLOSE and 
abort of the compaction is almost exactly 20 minutes, which is suspicious.

Of course part of the issue is having compactions that take over 20 minutes, 
but maybe we can do better here.






[jira] [Created] (HBASE-12545) Fix backwards compatibility issue introduced with HBASE-12363

2014-11-20 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created HBASE-12545:
-

 Summary: Fix backwards compatibility issue introduced with 
HBASE-12363
 Key: HBASE-12545
 URL: https://issues.apache.org/jira/browse/HBASE-12545
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl








[jira] [Resolved] (HBASE-12185) Deadlock in HConnectionManager$HConnectionImplementation

2014-11-23 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl resolved HBASE-12185.
---
   Resolution: Cannot Reproduce
Fix Version/s: (was: 0.94.26)

Closing until we can confirm that this is still an issue in the latest 0.94.
There have been 23 releases since 0.94.2.

Please feel free to re-open.

> Deadlock in HConnectionManager$HConnectionImplementation
> 
>
> Key: HBASE-12185
> URL: https://issues.apache.org/jira/browse/HBASE-12185
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.94.2
> Environment: CDH 4.2.0
>Reporter: Michael Tamm
>Priority: Critical
>
> Here you can see the relevant section of a thread dump:
> {noformat}
> Found one Java-level deadlock:
> =
> "AsyncSave-700512-Worker-EventThread":
>   waiting to lock monitor 0x7f8d90eecd20 (object 0x0005c0a8e1d0, a 
> org.apache.hadoop.hbase.zookeeper.RootRegionTracker),
>   which is held by "AsyncSave-700546-Worker"
> "AsyncSave-700546-Worker":
>   waiting to lock monitor 0x7f8d90149700 (object 0x000571404180, a 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation),
>   which is held by "AsyncSave-700512-Worker-EventThread"
> Java stack information for the threads listed above:
> ===
> "AsyncSave-700512-Worker-EventThread":
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.stop(ZooKeeperNodeTracker.java:98)
>   - waiting to lock <0x0005c0a8e1d0> (a 
> org.apache.hadoop.hbase.zookeeper.RootRegionTracker)
>   at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.resetZooKeeperTrackers(HConnectionManager.java:603)
>   - locked <0x000571404180> (a 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation)
>   at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.abort(HConnectionManager.java:1681)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:389)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:286)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> "AsyncSave-700546-Worker":
>   at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.resetZooKeeperTrackers(HConnectionManager.java:598)
>   - waiting to lock <0x000571404180> (a 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation)
>   at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.abort(HConnectionManager.java:1681)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:132)
>   - locked <0x0005c0a8e1d0> (a 
> org.apache.hadoop.hbase.zookeeper.RootRegionTracker)
>   at 
> org.apache.hadoop.hbase.zookeeper.RootRegionTracker.waitRootRegionLocation(RootRegionTracker.java:83)
>   at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:841)
>   at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:954)
>   at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:852)
>   at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:954)
>   at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:856)
>   at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:813)
>   at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1503)
>   at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1388)
>   at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:955)
>   at 
> org.apache.hadoop.hbase.client.HTablePool$PooledHTable.flushCommits(HTablePool.java:449)
>   at ...
> {noformat}





[jira] [Created] (HBASE-12621) Explain in the book when KEEP_DELETED_CELLS is useful

2014-12-02 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created HBASE-12621:
-

 Summary: Explain in the book when KEEP_DELETED_CELLS is useful
 Key: HBASE-12621
 URL: https://issues.apache.org/jira/browse/HBASE-12621
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl


KEEP_DELETED_CELLS seems to be very confusing.
The book needs further clarification on when this setting is useful and what the 
implications are.

(and maybe we should discuss whether there's a simpler way to achieve what I 
intended when I first implemented this)





[jira] [Resolved] (HBASE-12657) The Region is not being split and far exceeds the desired maximum size.

2014-12-11 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl resolved HBASE-12657.
---
  Resolution: Fixed
Hadoop Flags: Reviewed

Committed to 0.94.
Thanks for the patch, [~vrodionov].

> The Region is not being split and far exceeds the desired maximum size.
> ---
>
> Key: HBASE-12657
> URL: https://issues.apache.org/jira/browse/HBASE-12657
> Project: HBase
>  Issue Type: Bug
>  Components: Compaction
>Affects Versions: 0.94.25
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 0.94.26
>
> Attachments: HBASE-12657-0.94.patch, HBASE-12657-0.94.patch.2
>
>
> We are seeing this behavior when creating indexes in one of our environment.
> When an index is being created, most of the "requests" go into a single 
> region.  The amount of time to create an index seems to take longer than 
> usual and it can take days for the regions to compact and split after the 
> index is created.
> Here is a du of the HBase index table:
> {code}
> -bash-4.1$ sudo -su hdfs hadoop fs -du /hbase/43681
> 705  /hbase/43681/.tableinfo.01
> 0/hbase/43681/.tmp
> 27981697293  /hbase/43681/0492e22092e21d35fca8e779b21ec797
> 539687093/hbase/43681/832298c4e975fc47210feb6bac3d2f71
> 560660531/hbase/43681/be9bdb3bdf9365afe5fe90db4247d82c
> 7081938297   /hbase/43681/cd440e524f96fbe0719b2fe969848560
> 6297860287   /hbase/43681/dc893a2d8daa08c689dc69e6bb2c5b50
> 7189607722   /hbase/43681/ffbceaea5e2f142dbe6cd4cbeacc00e8
> ...
> {code}





[jira] [Resolved] (HBASE-10071) Backport HBASE-6592 to 0.94 branch

2014-12-15 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl resolved HBASE-10071.
---
   Resolution: Won't Fix
Fix Version/s: (was: 0.94.26)

> Backport HBASE-6592 to 0.94 branch
> --
>
> Key: HBASE-10071
> URL: https://issues.apache.org/jira/browse/HBASE-10071
> Project: HBase
>  Issue Type: Improvement
>  Components: shell
>Affects Versions: 0.94.14
>Reporter: cuijianwei
>Assignee: Sean Busbey
> Attachments: HBASE-10071-0.94-v1.patch
>
>
> Users tend to run the hbase shell to query hbase quickly. The result is 
> shown in binary format, which may not be clear enough when users write 
> columns using specific types such as long/int/short. Therefore, it may be 
> helpful if the results could be shown in the specified format. We made a patch 
> to extend get/scan in the hbase shell so that users can specify the data type 
> in get/scan for each column as:
> {code}
> scan 'table', {COLUMNS=>['CF:QF:long']}
> get 'table', 'r0', {COLUMN=>'CF:QF:long'}
> {code}
> Then, the result will be shown as Long type. The result of above get will be:
> {code}
> COLUMN     CELL
>  CF:QF     timestamp=24311261, value=24311229
> {code}
> This extended format is compatible with previous format, if users do not 
> specify the data type, the command will also work and output binary format.





[jira] [Created] (HBASE-12765) SplitTransaction creates too many threads (potentially)

2014-12-27 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created HBASE-12765:
-

 Summary: SplitTransaction creates too many threads (potentially)
 Key: HBASE-12765
 URL: https://issues.apache.org/jira/browse/HBASE-12765
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl


In splitStoreFiles(...) we create a new thread pool with as many threads as 
there are files to split.
We should be able to do better. During times of very heavy write loads there 
might be a lot of files to split and multiple splits might be going on at the 
same time on the same region server.
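A bounded pool is one way to do better; a sketch, with the cap chosen arbitrarily here rather than taken from any actual HBase configuration:

```java
// Sketch: instead of one thread per store file, cap the pool size and let
// the remaining file splits queue behind the running ones.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class BoundedSplitPool {
    static List<Future<Void>> splitStoreFiles(List<Runnable> fileSplits,
                                              int maxThreads) throws InterruptedException {
        // min(files, cap) threads rather than one thread per file
        int threads = Math.max(1, Math.min(fileSplits.size(), maxThreads));
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<Void>> results = new ArrayList<>();
        for (Runnable split : fileSplits) {
            results.add(pool.submit(split, (Void) null));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return results;
    }
}
```

With many concurrent splits on one region server, the cap also bounds the total number of split threads across the server.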






[jira] [Created] (HBASE-12776) SpliTransaction: Log number of files to be split

2014-12-29 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created HBASE-12776:
-

 Summary: SpliTransaction: Log number of files to be split
 Key: HBASE-12776
 URL: https://issues.apache.org/jira/browse/HBASE-12776
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Priority: Minor
 Fix For: 1.0.0, 2.0.0, 0.98.10, 0.94.27
 Attachments: 12765.txt







[jira] [Created] (HBASE-12779) SplitTransaction: Add metrics

2014-12-29 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created HBASE-12779:
-

 Summary: SplitTransaction: Add metrics
 Key: HBASE-12779
 URL: https://issues.apache.org/jira/browse/HBASE-12779
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl








[jira] [Resolved] (HBASE-12776) SpliTransaction: Log number of files to be split

2014-12-29 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl resolved HBASE-12776.
---
  Resolution: Fixed
Assignee: Lars Hofhansl
Hadoop Flags: Reviewed

Committed to all branches.
(the 0.94 version only has the first part as it does not keep track of the 
daughter a/b counts, and it did not make sense to add that part)

> SpliTransaction: Log number of files to be split
> 
>
> Key: HBASE-12776
> URL: https://issues.apache.org/jira/browse/HBASE-12776
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
>Priority: Minor
> Fix For: 2.0.0, 0.98.10, 1.0.1, 0.94.27
>
> Attachments: 12765.txt
>
>






[jira] [Resolved] (HBASE-12173) Backport: [PE] Allow random value size

2015-01-04 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl resolved HBASE-12173.
---
   Resolution: Later
Fix Version/s: (was: 0.98.10)

> Backport: [PE] Allow random value size
> --
>
> Key: HBASE-12173
> URL: https://issues.apache.org/jira/browse/HBASE-12173
> Project: HBase
>  Issue Type: Sub-task
>  Components: Performance
>Reporter: Lars Hofhansl
>
> Looked at it briefly. Didn't have time to fix up the patch.
> Actually, removing this from 0.94.





[jira] [Resolved] (HBASE-12779) SplitTransaction: Add metrics

2015-01-05 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl resolved HBASE-12779.
---
Resolution: Fixed

And... 0.98

> SplitTransaction: Add metrics
> -
>
> Key: HBASE-12779
> URL: https://issues.apache.org/jira/browse/HBASE-12779
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Fix For: 1.0.0, 2.0.0, 0.98.10, 1.1.0
>
> Attachments: 12779-v2-0.98.txt, 12779-v2.txt, 12779.txt
>
>






[jira] [Resolved] (HBASE-7724) [0.94] Just OPENED regions are ignored by ServerShutdownHandler and go unassigned for ever

2015-01-09 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl resolved HBASE-7724.
--
Resolution: Duplicate

> [0.94] Just OPENED regions are ignored by ServerShutdownHandler and go 
> unassigned for ever
> --
>
> Key: HBASE-7724
> URL: https://issues.apache.org/jira/browse/HBASE-7724
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Reporter: stack
>
> Visiting a user today, I came across following interesting case (0.94.2 
> HBase).
> A server was added to cluster.  It was assigned regions by the balancer.  A 
> bunch opened on the regionserver and just after the open, the regionserver 
> was manually shutdown.  There was a lag processing the zk region open events 
> in the master (because clean shutdown, there was no zk activity when the 
> regions were closed on the shutdown regionserver).  Processing the server 
> shutdown, we do this for a good few of the regions that had just been opened 
> on the regionserver:
> 2013-01-30 02:41:19,917 INFO 
> org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Skip assigning 
> region 
> OBFUSCATED_TABLE,OBFUSCATED_STARTROW,1344723216908.55e9cb551edeea0b52bb91af7c2de199.
>  state=OPEN, ts=1359513674715, server=XX.XX.18.40,10304,1359513445136
> Seems like an outright bug that we'd skip a region in transition that 
> is in the OPEN state.
> More detail to follow.





[jira] [Resolved] (HBASE-8265) Simplify ScanQueryMatcher

2015-01-09 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl resolved HBASE-8265.
--
Resolution: Won't Fix

No longer applicable with trunk changes.

> Simplify ScanQueryMatcher
> -
>
> Key: HBASE-8265
> URL: https://issues.apache.org/jira/browse/HBASE-8265
> Project: HBase
>  Issue Type: Brainstorming
>Reporter: Lars Hofhansl
>
> HBASE-8188 has an experimental patch to simplify ScanQueryMatcher and some of 
> its caller. In this issue we can discuss further possible changes.





[jira] [Created] (HBASE-12859) Major compaction completion tracker

2015-01-14 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created HBASE-12859:
-

 Summary: Major compaction completion tracker
 Key: HBASE-12859
 URL: https://issues.apache.org/jira/browse/HBASE-12859
 Project: HBase
  Issue Type: Brainstorming
Reporter: Lars Hofhansl


In various scenarios it is helpful to know a guaranteed timestamp up to which 
all data in a table was major compacted.
We can do that by keeping a major compaction timestamp in META.
A client can then iterate over all regions of a table and find a definite timestamp, 
which is the oldest compaction timestamp of any of the regions.

[~apurtell], [~ghelmling], [~giacomotaylor].
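The client-side computation could be sketched as follows, assuming the per-region timestamps have already been read from META (the map and method names are illustrative):

```java
// Sketch: the table-wide "major compacted up to" bound is the minimum of
// the per-region major compaction timestamps stored in META.
import java.util.Map;

class CompactionTracker {
    // regionName -> last major compaction timestamp (as read from META)
    static long majorCompactedUpTo(Map<String, Long> regionTimestamps) {
        if (regionTimestamps.isEmpty()) {
            return 0L;              // no regions => no guarantee
        }
        long bound = Long.MAX_VALUE;
        for (long ts : regionTimestamps.values()) {
            bound = Math.min(bound, ts); // a never-compacted region pulls this down
        }
        return bound;
    }
}
```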





[jira] [Resolved] (HBASE-10528) DefaultBalancer selects plans to move regions onto draining nodes

2015-01-15 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl resolved HBASE-10528.
---
Resolution: Fixed

> DefaultBalancer selects plans to move regions onto draining nodes
> -
>
> Key: HBASE-10528
> URL: https://issues.apache.org/jira/browse/HBASE-10528
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.94.5
>Reporter: churro morales
>Assignee: churro morales
> Fix For: 1.0.0, 2.0.0, 0.98.10, 1.1.0, 0.94.27
>
> Attachments: 10528-1.0.addendum, HBASE-10528-0.94.patch, 
> HBASE-10528-0.98.patch, HBASE-10528-0.98.v2.patch, HBASE-10528.patch, 
> HBASE-10528.v2.patch
>
>
> We have quite a large cluster with > 100k regions, and we needed to isolate a 
> region that was very hot until we could push a patch.  We put this region on its 
> own regionserver and set it in the draining state.  The default balancer was 
> still selecting regions to move to this server in its region plans.  
> It just so happened that for other tables, the default load balancer was creating 
> plans for the draining servers, even though they were not available to move 
> regions to.  Thus we were closing regions, then attempting to move them to 
> the draining server, only to find out it's draining. 
> We had to disable the balancer to resolve this issue.
> There are some approaches we can take here.
> 1. Exclude draining servers altogether; don't even pass those into the load 
> balancer from HMaster.
> 2. We could exclude draining servers from ceiling and floor calculations, 
> where we could potentially skip load balancing because those draining servers 
> won't be represented when deciding whether to balance.
> 3. Along with #2, when assigning regions we would skip plans to assign 
> regions to those draining servers.
> I am in favor of #1, which simply removes servers as candidates for 
> balancing if they are in the draining state.
> But I would love to hear what everyone else thinks.
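Option #1 above amounts to a pre-filter on the candidate server list before the balancer ever runs. A minimal sketch (server names are hypothetical; in HBase the master would apply this to the live-server list it passes to the balancer):

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Sketch of option #1: drop draining servers before the balancer sees them,
// so no region plan can ever target one.
public class BalancerCandidates {
    public static List<String> excludeDraining(List<String> servers, Set<String> draining) {
        return servers.stream()
                .filter(s -> !draining.contains(s))  // draining servers are not candidates
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> all = List.of("rs1", "rs2", "rs3");
        System.out.println(excludeDraining(all, Set.of("rs2"))); // prints [rs1, rs3]
    }
}
```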



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-12904) Threading issues in region_mover.rb

2015-01-22 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created HBASE-12904:
-

 Summary: Threading issues in region_mover.rb
 Key: HBASE-12904
 URL: https://issues.apache.org/jira/browse/HBASE-12904
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl


We've seen various race conditions when using region_mover with multiple 
threads.

{code}
NoMethodError: undefined method `getScanner' for nil:NilClass
  isSuccessfulScan at 
/home/sfdc/current//bigdata-hbase/hbase/hbase/bin/region_mover.rb:138
 unloadRegions at 
/home/sfdc/current//bigdata-hbase/hbase/hbase/bin/region_mover.rb:360
{code}

{code}
NoMethodError: undefined method `[]=' for nil:NilClass
   getTable at 
/home/sfdc/current//bigdata-hbase/hbase/hbase/bin/region_mover.rb:64
  unloadRegions at 
/home/sfdc/current//bigdata-hbase/hbase/hbase/bin/region_mover.rb:359
{code}

Looking at getTable, it's not thread safe. So the multithreading that was added 
is incorrect.
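One way to make a shared "get or create" table cache safe for multiple threads is an atomic compute-if-absent. This is only an illustrative Java model of the kind of fix needed (the Ruby script caches tables in a plain Hash, which is what races):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch: replace an unsynchronized "check map, then insert" in getTable with
// computeIfAbsent, so two threads never race to initialize the same entry.
public class TableCache {
    private final ConcurrentMap<String, Object> tables = new ConcurrentHashMap<>();

    public Object getTable(String name) {
        // computeIfAbsent runs the factory at most once per key, atomically.
        return tables.computeIfAbsent(name, this::newConnection);
    }

    private Object newConnection(String name) {
        return "connection-to-" + name;  // stand-in for creating an HTable
    }

    public static void main(String[] args) {
        TableCache cache = new TableCache();
        System.out.println(cache.getTable("t1"));
        // The same cached instance is returned on every subsequent call.
        System.out.println(cache.getTable("t1") == cache.getTable("t1")); // prints true
    }
}
```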



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-12910) describe in the shell is broken

2015-01-22 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created HBASE-12910:
-

 Summary: describe in the shell is broken
 Key: HBASE-12910
 URL: https://issues.apache.org/jira/browse/HBASE-12910
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.10
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Critical


Just noticed that in 0.98 describe does not work in the shell.
This is caused by HBASE-12832.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HBASE-12859) New master API to track major compaction completion

2015-01-29 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl resolved HBASE-12859.
---
  Resolution: Fixed
Hadoop Flags: Reviewed

Pushed to 2.0 and 1.1

> New master API to track major compaction completion
> ---
>
> Key: HBASE-12859
> URL: https://issues.apache.org/jira/browse/HBASE-12859
> Project: HBase
>  Issue Type: Brainstorming
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Fix For: 2.0.0, 1.1.0
>
> Attachments: 12859-v1.txt, 12859-v2.txt, 12859-v3.txt, 12859-v4.txt, 
> 12859-v5.txt, 12859-v6.txt, 12859-v7-1.1.txt, 12859-v7.txt, 
> 12859-wip-UNFINISHED.txt
>
>
> In various scenarios it is helpful to know a guaranteed timestamp up to which 
> all data in a table was major compacted.
> We can do that keeping a major compaction timestamp in META.
> A client then can iterate all region of a table and find a definite 
> timestamp, which is the oldest compaction timestamp of any of the regions.
> [~apurtell], [~ghelmling], [~giacomotaylor].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-12945) Port: New master API to track major compaction completion to 0.98

2015-01-29 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created HBASE-12945:
-

 Summary: Port: New master API to track major compaction completion 
to 0.98
 Key: HBASE-12945
 URL: https://issues.apache.org/jira/browse/HBASE-12945
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
 Fix For: 0.98.11






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-12976) Default hbase.client.scanner.max.result.size

2015-02-04 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created HBASE-12976:
-

 Summary: Default hbase.client.scanner.max.result.size
 Key: HBASE-12976
 URL: https://issues.apache.org/jira/browse/HBASE-12976
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl


Setting scanner caching is somewhat of a black art. It's hard to estimate ahead 
of time how large the result set will be.

I propose we default hbase.client.scanner.max.result.size to 2mb. That is a good 
compromise between performance and buffer usage on typical networks (avoiding 
OOMs when the caching was chosen too high).

To an HTable client this is completely transparent.
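Until a default is shipped, the proposed value can be set explicitly. A sketch of the hbase-site.xml entry (the 2 MB value is simply the one proposed above):

```xml
<!-- Cap the data returned per scan RPC, independent of scanner caching. -->
<property>
  <name>hbase.client.scanner.max.result.size</name>
  <value>2097152</value> <!-- 2 MB = 2 * 1024 * 1024 bytes -->
</property>
```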




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HBASE-12968) [0.94]SecureServer should not ignore CallQueueSize

2015-02-07 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl resolved HBASE-12968.
---
  Resolution: Fixed
Hadoop Flags: Reviewed

Committed to 0.94 branch.
Thanks for the patch.

> [0.94]SecureServer should not ignore CallQueueSize
> --
>
> Key: HBASE-12968
> URL: https://issues.apache.org/jira/browse/HBASE-12968
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.94.15
>Reporter: hongyu bi
>Assignee: hongyu bi
> Fix For: 0.94.27
>
> Attachments: HBASE-12968-v0.patch
>
>
> Per HBASE-5190, HBaseServer will reject the request if callQueueSize exceeds 
> "ipc.server.max.callqueue.length", but SecureServer ignores this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HBASE-878) cell purge feature

2015-02-08 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl resolved HBASE-878.
-
Resolution: Implemented

We have delete markers that target a specific version (including the latest) of 
any cell, making the next older one visible.
Closing this after six years!

> cell purge feature
> --
>
> Key: HBASE-878
> URL: https://issues.apache.org/jira/browse/HBASE-878
> Project: HBase
>  Issue Type: New Feature
>  Components: regionserver
>Affects Versions: 0.2.0
>Reporter: Michael Bieniosek
>
> Sometimes cells get inserted by accident, and we want to delete them so that 
> the cells behind them become visible.  "delete" just inserts a deleted cell 
> at a newer timestamp, which makes the entire column disappear; but in some 
> cases it's preferable to make the next newest value the current value.
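The version-delete behavior that resolves this can be modeled with a sorted map of versions (an illustrative model, not the HBase implementation; in the real client it corresponds to a version delete marker for one exact timestamp):

```java
import java.util.TreeMap;

// Model: versions of one cell keyed by timestamp. "Purging" a specific version
// (a version delete marker in HBase) makes the next older value visible again.
public class VersionedCell {
    private final TreeMap<Long, String> versions = new TreeMap<>();

    public void put(long ts, String value) { versions.put(ts, value); }

    // Version delete: hide exactly one timestamp, leaving older versions intact.
    public void deleteVersion(long ts) { versions.remove(ts); }

    // The "current" value is the one with the newest remaining timestamp.
    public String current() {
        return versions.isEmpty() ? null : versions.lastEntry().getValue();
    }

    public static void main(String[] args) {
        VersionedCell c = new VersionedCell();
        c.put(1L, "old");
        c.put(2L, "accidental");
        c.deleteVersion(2L);             // purge the accidental write...
        System.out.println(c.current()); // ...and the older value is visible: prints old
    }
}
```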



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HBASE-1225) [performance] Return results on scanner open, not just on next + make close asynchronous and unnecessary if scanner is exhausted

2015-02-08 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl resolved HBASE-1225.
--
Resolution: Implemented

We have "result from first RPC" and we also have "small" scans that finish the 
entire operation in one RPC (if possible).
We do not have the asynchronous close per se, but it would happen when the 
lease expired. Closing old issue.
(Feel free to re-open, obviously)

> [performance] Return results on scanner open, not just on next + make close 
> asynchronous and unnecessary if scanner is exhausted
> 
>
> Key: HBASE-1225
> URL: https://issues.apache.org/jira/browse/HBASE-1225
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance
>Reporter: stack
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HBASE-1406) Refactor HRS

2015-02-08 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl resolved HBASE-1406.
--
Resolution: Won't Fix

Doesn't look like this is going to happen.

> Refactor HRS
> 
>
> Key: HBASE-1406
> URL: https://issues.apache.org/jira/browse/HBASE-1406
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Affects Versions: 0.20.0
>Reporter: Nitay Joffe
>
> The current HRS implementation is a mess, especially after ZooKeeper 
> additions to handle Session Expired events. See HBASE-1311.
> It contains logic to restart itself, which caused a lot of fields to be 
> non-final that could be final. It should be split out into two separate classes: one that 
> runs HRS duties and one that watches over the first in a join/restart loop. 
> Using this means a special event like a ZooKeeper Session Expired wouldn't 
> require special restart code, just an abort. The new wrapper class will 
> handle restarting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HBASE-2121) HBase client doesn't retry the right number of times when a region is unavailable

2015-02-08 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl resolved HBASE-2121.
--
Resolution: Duplicate

I fixed this in a later issue. (Can't find it right now, but it is fixed.)

> HBase client doesn't retry the right number of times when a region is 
> unavailable
> -
>
> Key: HBASE-2121
> URL: https://issues.apache.org/jira/browse/HBASE-2121
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.20.2, 0.90.0
>Reporter: Benoit Sigoure
>
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries
>  retries 10 times (by default).   It ends up calling 
> HConnectionManager$TableServers.locateRegionInMeta, which retries 10 times on 
> its own.  So the HBase client is effectively retrying 100 times before giving 
> up, instead of 10 (10 is the default hbase.client.retries.number).
> I'm using hbase trunk HEAD.  I verified this bug is also in 0.20.2.
> Sample call stack:
>  org.apache.hadoop.hbase.client.RegionOfflineException: region offline: 
> mytable,,1263421423787
>   at 
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:709)
>   at 
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:640)
>   at 
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:609)
>   at 
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionLocation(HConnectionManager.java:430)
>   at 
> org.apache.hadoop.hbase.client.ServerCallable.instantiateServer(ServerCallable.java:57)
>   at 
> org.apache.hadoop.hbase.client.ScannerCallable.instantiateServer(ScannerCallable.java:62)
>   at 
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1047)
>   at 
> org.apache.hadoop.hbase.client.HTable$ClientScanner.nextScanner(HTable.java:836)
>   at 
> org.apache.hadoop.hbase.client.HTable$ClientScanner.initialize(HTable.java:756)
>   at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:354)
>   at 
> How to reproduce:
> with a trivial HBase client (mine was just trying to scan the table), start 
> the client, take the table the client uses offline, and tell the client to start 
> the scan.  The client will not give up after 10 attempts, unlike what it's 
> supposed to do.
> If locateRegionInMeta is only ever called from getRegionServerWithRetries, 
> then the fix is trivial: just remove the retry logic in there.  If it has 
> some other callers who possibly relied on the retry logic in 
> locateRegionInMeta, then the fix is going to be a bit more involved.
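The multiplicative effect of nested retry loops described above is easy to model (a toy sketch, not the actual HConnectionManager code):

```java
// Sketch: an outer retry loop whose body itself retries multiplies the total
// number of attempts -- 10 outer x 10 inner = 100, not the configured 10.
public class NestedRetries {
    static int innerAttempts = 0;

    // Inner call (like locateRegionInMeta) retries on its own before failing.
    static boolean locateRegion(int innerRetries) {
        for (int i = 0; i < innerRetries; i++) {
            innerAttempts++;  // count every round-trip the client would make
        }
        return false;  // always fails in this model
    }

    // Outer caller (like getRegionServerWithRetries) retries the inner call.
    static void getRegionServerWithRetries(int outerRetries, int innerRetries) {
        for (int i = 0; i < outerRetries; i++) {
            locateRegion(innerRetries);
        }
    }

    public static void main(String[] args) {
        getRegionServerWithRetries(10, 10);
        System.out.println(innerAttempts); // 100 attempts, not the configured 10
    }
}
```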



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HBASE-2506) Too easy to OOME a RS

2015-02-08 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl resolved HBASE-2506.
--
Resolution: Duplicate

Done in HBASE-12976 (and continued in HBASE-11544)

> Too easy to OOME a RS
> -
>
> Key: HBASE-2506
> URL: https://issues.apache.org/jira/browse/HBASE-2506
> Project: HBase
>  Issue Type: Bug
>Reporter: Jean-Daniel Cryans
>  Labels: moved_from_0_20_5
>
> Testing a cluster with 1GB heap, I found that we are letting the region 
> servers kill themselves too easily when scanning using pre-fetching. To 
> reproduce, get 10-20M rows using PE and run a count in the shell using CACHE 
> => 3 or any other very high number. For good measure, here's the stack 
> trace:
> {code}
> 2010-04-30 13:20:23,241 FATAL 
> org.apache.hadoop.hbase.regionserver.HRegionServer: OutOfMemoryError, 
> aborting.
> java.lang.OutOfMemoryError: Java heap space
> at java.util.Arrays.copyOf(Arrays.java:2786)
> at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
> at java.io.DataOutputStream.write(DataOutputStream.java:90)
> at org.apache.hadoop.hbase.client.Result.writeArray(Result.java:478)
> at 
> org.apache.hadoop.hbase.io.HbaseObjectWritable.writeObject(HbaseObjectWritable.java:312)
> at 
> org.apache.hadoop.hbase.io.HbaseObjectWritable.write(HbaseObjectWritable.java:229)
> at 
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:941)
> 2010-04-30 13:20:23,241 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: 
> request=0.0, regions=29, stores=29, storefiles=44, storefileIndexSize=6, 
> memstoreSize=255,
>  compactionQueueSize=0, usedHeap=926, maxHeap=987, blockCacheSize=1700064, 
> blockCacheFree=205393696, blockCacheCount=0, blockCacheHitRatio=0
> {code}
> I guess the same could happen with largish write buffers. We need something 
> better than OOME.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HBASE-2807) Create a helper method to compare family from KV

2015-02-08 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl resolved HBASE-2807.
--
Resolution: Implemented

We have that now:
# KeyValue.matchingFamily
and/or
# Bytes.equals with Cell.getFamily{Array|Offset|Length}
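The second form, comparing the family bytes in place without materializing a per-cell copy, can be sketched in plain Java (an illustration of the idea, not the actual Bytes/KeyValue implementation; the flat backing-array layout below is hypothetical):

```java
import java.util.Arrays;

// Sketch: compare the family slice of a cell's backing array against a target
// family without allocating a new byte[] for the family.
public class FamilyCompare {
    public static boolean matchingFamily(byte[] backing, int famOffset, int famLength,
                                         byte[] family) {
        // Arrays.equals with ranges (Java 9+) compares the slices copy-free.
        return Arrays.equals(backing, famOffset, famOffset + famLength,
                             family, 0, family.length);
    }

    public static void main(String[] args) {
        // Hypothetical backing array: row ("row1") | family ("cf1") | qualifier.
        byte[] backing = "row1cf1qualA".getBytes();
        System.out.println(matchingFamily(backing, 4, 3, "cf1".getBytes())); // prints true
        System.out.println(matchingFamily(backing, 4, 3, "cf2".getBytes())); // prints false
    }
}
```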

> Create a helper method to compare family from KV
> 
>
> Key: HBASE-2807
> URL: https://issues.apache.org/jira/browse/HBASE-2807
> Project: HBase
>  Issue Type: Improvement
>Reporter: Jean-Daniel Cryans
>
> From Stack's review in HBASE-2223:
> {quote}
> It'd be good if we didn't have to create this byte [] per edit.  I should 
> reinstitute the comparator I had that took KVs but only compared family 
> portion of the KV... no need to create family byte [] then.
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HBASE-11935) ZooKeeper connection storm after queue failover with slave cluster down

2015-02-12 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl resolved HBASE-11935.
---
Resolution: Implemented

> ZooKeeper connection storm after queue failover with slave cluster down
> ---
>
> Key: HBASE-11935
> URL: https://issues.apache.org/jira/browse/HBASE-11935
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.99.0, 2.0.0, 0.94.23, 0.98.6
>Reporter: Lars Hofhansl
>Priority: Critical
>
> We just ran into a production incident with TCP SYN storms on port 2181 
> (zookeeper).
> In our case the slave cluster was not running. When we bounced the primary 
> cluster we saw an "unbounded" number of failover threads all hammering the 
> hosts on the slave ZK machines (which did not run ZK at the time)... Causing 
> overall degradation of network performance between datacenters.
> Looking at the code we noticed that the thread pool handling of the Failover 
> workers was probably unintended.
> Patch coming soon.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-13046) Review replication timeouts

2015-02-14 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created HBASE-13046:
-

 Summary: Review replication timeouts
 Key: HBASE-13046
 URL: https://issues.apache.org/jira/browse/HBASE-13046
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl


See discussion on HBASE-12971.
We should review all the config options we have for replication timeouts and 
make sure the defaults make sense.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HBASE-12971) Replication stuck due to large default value for replication.source.maxretriesmultiplier

2015-02-14 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl resolved HBASE-12971.
---
Resolution: Fixed

Pushed to 1.0.1, 1.1, and 2.0

> Replication stuck due to large default value for 
> replication.source.maxretriesmultiplier
> 
>
> Key: HBASE-12971
> URL: https://issues.apache.org/jira/browse/HBASE-12971
> Project: HBase
>  Issue Type: Bug
>  Components: hbase
>Affects Versions: 1.0.0, 0.98.10
>Reporter: Adrian Muraru
> Fix For: 2.0.0, 1.0.1, 1.1.0
>
> Attachments: 12971-v2.txt, 12971.txt
>
>
> We are setting in hbase-site the default value of 300 for 
> {{replication.source.maxretriesmultiplier}} introduced in HBASE-11964.
> While this value works fine to recover for transient errors with remote ZK 
> quorum from the peer Hbase cluster - it proved to have side effects in the 
> code introduced in HBASE-11367 Pluggable replication endpoint, where the 
> default is much lower (10).
> See:
> 1. 
> https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java#L169
> 2. 
> https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/HBaseInterClusterReplicationEndpoint.java#L79
> The two default values are definitely conflicting - when 
> {{replication.source.maxretriesmultiplier}} is set in hbase-site to 300 
> this will lead to a sleep time of 300*300 seconds (25h!) when a socket timeout 
> exception is thrown.
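The arithmetic behind the 25-hour figure (an illustrative sketch of the worst-case backoff; the real retry logic lives in ReplicationSource and its endpoint):

```java
// Sketch: when the base sleep and the retries multiplier are both 300,
// the worst-case backoff is 300 * 300 = 90,000 seconds = 25 hours.
public class ReplicationBackoff {
    public static long maxSleepSeconds(long sleepForRetries, long maxRetriesMultiplier) {
        return sleepForRetries * maxRetriesMultiplier;
    }

    public static void main(String[] args) {
        long seconds = maxSleepSeconds(300, 300);
        System.out.println(seconds);          // prints 90000
        System.out.println(seconds / 3600.0); // prints 25.0 (hours)
    }
}
```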



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-13068) Unexpected client exception with slow scan

2015-02-18 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created HBASE-13068:
-

 Summary: Unexpected client exception with slow scan
 Key: HBASE-13068
 URL: https://issues.apache.org/jira/browse/HBASE-13068
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl


I just came across an interesting exception:
{code}
Caused by: java.io.IOException: Call 10 not added as the connection 
newbunny/127.0.0.1:60020/ClientService/lars (auth:SIMPLE)/6 is closing
at 
org.apache.hadoop.hbase.ipc.RpcClient$Connection.addCall(RpcClient.java:495)
at 
org.apache.hadoop.hbase.ipc.RpcClient.getConnection(RpcClient.java:1534)
at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1442)
... 13 more
{code}

Called from here:
{code}
at 
org.apache.hadoop.hbase.client.ScannerCallable.close(ScannerCallable.java:291)
at 
org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:160)
at 
org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:59)
at 
org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:115)
at 
org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:91)
at 
org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:247)
{code}

This happened when I scanned with multiple clients against a single region 
server while all data is filtered at the server by a filter.
I had 10 clients; the region server has 30 handlers.

This means the scanners are not getting closed and their leases have to expire.

The workaround is to increase hbase.ipc.client.connection.maxidletime.
But it's strange that this *only* happens at close time. And since I am not 
using up all handlers there shouldn't be any starvation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-13082) Coarsen StoreScanner locks to RegionScanner

2015-02-20 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created HBASE-13082:
-

 Summary: Coarsen StoreScanner locks to RegionScanner
 Key: HBASE-13082
 URL: https://issues.apache.org/jira/browse/HBASE-13082
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl


Continuing where HBASE-10015 left of.
We can avoid locking (and memory fencing) inside StoreScanner by deferring to 
the lock already held by the RegionScanner.
In tests this shows quite a scan improvement and reduced CPU (the fences make 
the cores wait for memory fetches).

There are some drawbacks too:
* All calls to RegionScanner need to remain synchronized
* Implementors of coprocessors need to be diligent in following the locking 
contract. For example Phoenix does not lock RegionScanner.nextRaw() as 
required by the documentation (not picking on Phoenix, this one is my fault as 
I told them it's OK)
* possible starving of flushes and compactions under heavy read load: 
RegionScanner operations would keep getting the locks, and the 
flushes/compactions would not be able to finalize the set of files.

I'll have a patch soon.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-13094) Consider Filters that are evaluated before deletes and see delete markers

2015-02-24 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created HBASE-13094:
-

 Summary: Consider Filters that are evaluated before deletes and 
see delete markers
 Key: HBASE-13094
 URL: https://issues.apache.org/jira/browse/HBASE-13094
 Project: HBase
  Issue Type: Brainstorming
  Components: regionserver, Scanners
Reporter: Lars Hofhansl


That would be good for full control filtering of all cells, such as needed for 
some transaction implementations.

[~ghelmling]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-13109) Use scanner look ahead for timeranges as well

2015-02-25 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created HBASE-13109:
-

 Summary: Use scanner look ahead for timeranges as well
 Key: HBASE-13109
 URL: https://issues.apache.org/jira/browse/HBASE-13109
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Priority: Minor


This is a continuation of HBASE-9778.
We've seen a scenario of a very slow scan over a region using a timerange that 
happens to fall after the ts of any Cell in the region.
Turns out we spend a lot of time seeking.

Tested with a 5 column table, and the scan is 5x faster when the timerange 
falls before all Cells' ts.
We can use the lookahead hint introduced in HBASE-9778 to do opportunistic 
SKIPing before we actually seek.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-13159) Consider RangeReferences with transformations

2015-03-05 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created HBASE-13159:
-

 Summary: Consider RangeReferences with transformations
 Key: HBASE-13159
 URL: https://issues.apache.org/jira/browse/HBASE-13159
 Project: HBase
  Issue Type: Brainstorming
Reporter: Lars Hofhansl


Currently we have References, used by HalfStoreReaders, and HFileLinks.
For various use cases we have a need for RangeReferences with 
simple transformations of the keys.
That would allow us to map HFiles between regions or even tables without copying 
any data.

We can probably combine HalfStores, HFileLinks, and RangeReferences into a 
single concept:
* RangeReference = arbitrary start and stop row, arbitrary key transformation
* HFileLink = start and stop keys set to the linked file's start/stop key, 
transformation = identity
* (HalfStore) References = start/stop key set according to top or bottom 
reference, transformation = identity

Note this is a *brainstorming* issue. :)
(Could start with just references with arbitrary start/stop keys, and do 
transformations later)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (HBASE-13082) Coarsen StoreScanner locks to RegionScanner

2015-03-06 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl reopened HBASE-13082:
---

Reverted... Meh :(

> Coarsen StoreScanner locks to RegionScanner
> ---
>
> Key: HBASE-13082
> URL: https://issues.apache.org/jira/browse/HBASE-13082
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Fix For: 2.0.0, 1.1.0
>
> Attachments: 13082-test.txt, 13082.txt, 13082.txt, gc.png, gc.png, 
> gc.png, hits.png, next.png, next.png
>
>
> Continuing where HBASE-10015 left of.
> We can avoid locking (and memory fencing) inside StoreScanner by deferring to 
> the lock already held by the RegionScanner.
> In tests this shows quite a scan improvement and reduced CPU (the fences make 
> the cores wait for memory fetches).
> There are some drawbacks too:
> * All calls to RegionScanner need to remain synchronized
> * Implementors of coprocessors need to be diligent in following the locking 
> contract. For example Phoenix does not lock RegionScanner.nextRaw() as 
> required by the documentation (not picking on Phoenix, this one is my fault 
> as I told them it's OK)
> * possible starving of flushes and compactions under heavy read load: 
> RegionScanner operations would keep getting the locks, and the 
> flushes/compactions would not be able to finalize the set of files.
> I'll have a patch soon.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

