[jira] [Created] (HBASE-5813) Retry immediately after a NotServingRegionException in a multiput

2012-04-17 Thread Mikhail Bautin (Created) (JIRA)
Retry immediately after a NotServingRegionException in a multiput
-

 Key: HBASE-5813
 URL: https://issues.apache.org/jira/browse/HBASE-5813
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin


After we get some errors in a multiput we invalidate the region location cache 
and wait for the configured time interval according to the backoff policy. 
However, if all errors in multiput processing were 
NotServingRegionExceptions, we don't really need to wait. We can retry 
immediately.
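
A minimal sketch of the decision described above. The class and method names are hypothetical, and NotServingRegionException is redefined locally as a stand-in so the example compiles on its own; the real client code may be structured quite differently.

```java
import java.util.List;

// Hypothetical stand-in for HBase's NotServingRegionException (assumption).
class NotServingRegionException extends java.io.IOException {
  NotServingRegionException(String message) {
    super(message);
  }
}

// Hypothetical helper; not part of the HBase client API.
final class MultiputRetryPolicy {
  private MultiputRetryPolicy() {}

  /**
   * Returns how long to pause before retrying a multiput.
   * If every error is a NotServingRegionException the pause is zero;
   * otherwise the configured backoff applies.
   */
  static long pauseBeforeRetryMs(List<? extends Throwable> errors, long backoffMs) {
    for (Throwable t : errors) {
      if (!(t instanceof NotServingRegionException)) {
        return backoffMs; // some other failure: respect the backoff policy
      }
    }
    return 0L; // only NotServingRegionExceptions (or no errors): retry immediately
  }
}
```

The region location cache is still invalidated in both cases; only the wait differs.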

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-5803) [89-fb] Upgrade hbase 0.89-fb to Thrift 0.8.0 and bring Thrift server enhancements from trunk

2012-04-16 Thread Mikhail Bautin (Created) (JIRA)
[89-fb] Upgrade hbase 0.89-fb to Thrift 0.8.0 and bring Thrift server 
enhancements from trunk
-

 Key: HBASE-5803
 URL: https://issues.apache.org/jira/browse/HBASE-5803
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin


TBoundedThreadPoolServer has been a problem for us when there is a large number 
of clients. We need to migrate to Thrift 0.8.0 in 89-fb and bring over the 
relevant improvements from trunk, including support for TThreadedSelectorServer.





[jira] [Created] (HBASE-5763) Fix random failures in TestFSErrorsExposed

2012-04-10 Thread Mikhail Bautin (Created) (JIRA)
Fix random failures in TestFSErrorsExposed
--

 Key: HBASE-5763
 URL: https://issues.apache.org/jira/browse/HBASE-5763
 Project: HBase
  Issue Type: Bug
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Minor








[jira] [Created] (HBASE-5744) Thrift server metrics should be long instead of int

2012-04-06 Thread Mikhail Bautin (Created) (JIRA)
Thrift server metrics should be long instead of int
---

 Key: HBASE-5744
 URL: https://issues.apache.org/jira/browse/HBASE-5744
 Project: HBase
  Issue Type: Bug
Reporter: Mikhail Bautin
Priority: Minor


As we measure our Thrift call latencies in nanoseconds, we need to make 
latencies long instead of int everywhere.
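
To illustrate why the width matters: an int holds at most 2,147,483,647, so a latency expressed in nanoseconds wraps once a call takes longer than about 2.1 seconds. The sketch below uses hypothetical names and only demonstrates the arithmetic.

```java
// Illustration only; the class and method names are hypothetical.
final class LatencyWidthDemo {
  private LatencyWidthDemo() {}

  /** Narrowing a nanosecond latency to int silently wraps beyond Integer.MAX_VALUE. */
  static int narrowToInt(long latencyNanos) {
    return (int) latencyNanos;
  }

  /** Longest latency, in whole milliseconds, that still fits in an int of nanoseconds. */
  static long maxIntLatencyMillis() {
    return Integer.MAX_VALUE / 1_000_000L;
  }
}
```

With these definitions a 3-second call (3,000,000,000 ns) narrows to a negative number, which would corrupt any average or percentile computed from it.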





[jira] [Created] (HBASE-5731) Make max line length 100 in linter

2012-04-05 Thread Mikhail Bautin (Created) (JIRA)
Make max line length 100 in linter
--

 Key: HBASE-5731
 URL: https://issues.apache.org/jira/browse/HBASE-5731
 Project: HBase
  Issue Type: New Feature
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Minor


We have switched to 100 characters per line in our Java files. This JIRA makes 
the corresponding change in the linter.





[jira] [Created] (HBASE-5708) [89-fb] Make MiniMapRedCluster directory a subdirectory of target/test

2012-04-03 Thread Mikhail Bautin (Created) (JIRA)
[89-fb] Make MiniMapRedCluster directory a subdirectory of target/test
--

 Key: HBASE-5708
 URL: https://issues.apache.org/jira/browse/HBASE-5708
 Project: HBase
  Issue Type: Bug
Reporter: Mikhail Bautin
Priority: Minor


Some map-reduce-based tests are failing when executed concurrently in 89-fb 
because the mini-map-reduce cluster uses /tmp/hadoop-username for temporary 
data.





[jira] [Created] (HBASE-5700) [89-fb] Fix TestMiniClusterLoad* test failures

2012-04-02 Thread Mikhail Bautin (Created) (JIRA)
[89-fb] Fix TestMiniClusterLoad* test failures
--

 Key: HBASE-5700
 URL: https://issues.apache.org/jira/browse/HBASE-5700
 Project: HBase
  Issue Type: Bug
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Minor


Porting TestMiniClusterLoad* tests to 89-fb in HBASE-5679 uncovered certain 
problems with mini-cluster setup in 89-fb that need to be fixed.





[jira] [Created] (HBASE-5703) Bound the number of threads in HRegionThriftServer

2012-04-02 Thread Mikhail Bautin (Created) (JIRA)
Bound the number of threads in HRegionThriftServer
--

 Key: HBASE-5703
 URL: https://issues.apache.org/jira/browse/HBASE-5703
 Project: HBase
  Issue Type: Bug
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin


We need to bound the number of threads spawned in HRegionThriftServer, 
similarly to what was done in HBASE-4863 to the standalone Thrift gateway.
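
One generic way to bound threads in Java is a ThreadPoolExecutor with a fixed maximum and a bounded queue. The sketch below is an assumption about the general shape of such a fix, not the change made in HBASE-4863; the class name and limits are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical factory; the limits are illustrative, not HBase defaults.
final class BoundedWorkerPool {
  private BoundedWorkerPool() {}

  static ThreadPoolExecutor create(int maxThreads, int maxQueued) {
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        maxThreads, maxThreads,                      // never more than maxThreads workers
        60L, TimeUnit.SECONDS,                       // idle workers exit after a minute
        new ArrayBlockingQueue<Runnable>(maxQueued), // bounded backlog of pending requests
        new ThreadPoolExecutor.AbortPolicy());       // reject work beyond both bounds
    pool.allowCoreThreadTimeOut(true);
    return pool;
  }
}
```

Rejecting work when both the pool and the queue are full is only one possible policy; blocking the caller is another.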





[jira] [Created] (HBASE-5679) [89-fb] Port load test tool and related unit tests from trunk to 89-fb

2012-03-30 Thread Mikhail Bautin (Created) (JIRA)
[89-fb] Port load test tool and related unit tests from trunk to 89-fb
--

 Key: HBASE-5679
 URL: https://issues.apache.org/jira/browse/HBASE-5679
 Project: HBase
  Issue Type: Test
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin


When open-sourcing LoadTestTool that originated in 89-fb, numerous improvements 
to the tool were made, and unit tests based on the tool were created as part of 
HBASE-4908. These improvements need to be ported back to 89-fb.





[jira] [Created] (HBASE-5684) Make ProcessBasedLocalHBaseCluster run HDFS and make it more robust

2012-03-30 Thread Mikhail Bautin (Created) (JIRA)
Make ProcessBasedLocalHBaseCluster run HDFS and make it more robust
---

 Key: HBASE-5684
 URL: https://issues.apache.org/jira/browse/HBASE-5684
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin


Currently ProcessBasedLocalHBaseCluster runs on top of the raw local 
filesystem. We need it to start a process-based HDFS cluster as well. We also 
need to make the whole thing more stable so we can use it in unit tests.





[jira] [Created] (HBASE-5612) Data types for HBase values

2012-03-21 Thread Mikhail Bautin (Created) (JIRA)
Data types for HBase values
---

 Key: HBASE-5612
 URL: https://issues.apache.org/jira/browse/HBASE-5612
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin


In many real-life applications all values in a certain column family are of a 
certain data type, e.g. 64-bit integer. We could specify that in the column 
descriptor and enable data type-specific compression such as variable-length 
integer encoding.
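
As an illustration of variable-length integer encoding, the sketch below stores 7 payload bits per byte, so small values need far fewer than the 8 bytes of a fixed-width 64-bit integer. The codec is hypothetical and is not claimed to be the encoding HBase would adopt.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical codec for illustration; not an HBase API.
final class VarLong {
  private VarLong() {}

  /** Encodes a non-negative value using 7 payload bits per byte, low bits first. */
  static byte[] encode(long value) {
    if (value < 0) {
      throw new IllegalArgumentException("negative values are not handled in this sketch");
    }
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    while ((value & ~0x7FL) != 0) {
      out.write((int) ((value & 0x7F) | 0x80)); // continuation bit set
      value >>>= 7;
    }
    out.write((int) value); // final byte, continuation bit clear
    return out.toByteArray();
  }

  /** Decodes a value produced by encode(). */
  static long decode(byte[] bytes) {
    long result = 0;
    int shift = 0;
    for (byte b : bytes) {
      result |= (long) (b & 0x7F) << shift;
      shift += 7;
      if ((b & 0x80) == 0) {
        break; // last byte of this value
      }
    }
    return result;
  }
}
```

Values below 128 take one byte and values below 16,384 take two, which is where the savings come from when a column mostly holds small counters.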





[jira] [Created] (HBASE-5601) Add per-column-family data block cache hit ratios

2012-03-19 Thread Mikhail Bautin (Created) (JIRA)
Add per-column-family data block cache hit ratios
-

 Key: HBASE-5601
 URL: https://issues.apache.org/jira/browse/HBASE-5601
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin


In addition to the overall block cache hit ratio it would be extremely useful 
to have per-column-family data block cache hit ratio metrics.
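
A rough sketch of what such a metric could track: one request counter and one hit counter per column family. All names are hypothetical and this does not use HBase's metrics framework.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical metrics holder; names are illustrative, not HBase's metrics API.
final class PerFamilyCacheStats {
  private final ConcurrentMap<String, AtomicLong> hits =
      new ConcurrentHashMap<String, AtomicLong>();
  private final ConcurrentMap<String, AtomicLong> requests =
      new ConcurrentHashMap<String, AtomicLong>();

  /** Records one data block cache lookup for the given column family. */
  void recordAccess(String family, boolean hit) {
    counter(requests, family).incrementAndGet();
    if (hit) {
      counter(hits, family).incrementAndGet();
    }
  }

  /** Returns hits / requests for the family, or 0.0 if it has seen no requests. */
  double hitRatio(String family) {
    long total = counter(requests, family).get();
    return total == 0 ? 0.0 : (double) counter(hits, family).get() / total;
  }

  private static AtomicLong counter(ConcurrentMap<String, AtomicLong> map, String family) {
    AtomicLong existing = map.get(family);
    if (existing != null) {
      return existing;
    }
    AtomicLong created = new AtomicLong();
    AtomicLong raced = map.putIfAbsent(family, created);
    return raced != null ? raced : created;
  }
}
```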





[jira] [Created] (HBASE-5602) Add cache access pattern statistics and report hot blocks/keys

2012-03-19 Thread Mikhail Bautin (Created) (JIRA)
Add cache access pattern statistics and report hot blocks/keys
--

 Key: HBASE-5602
 URL: https://issues.apache.org/jira/browse/HBASE-5602
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin


In many practical applications it would be very useful to know how well 
utilized the block cache is, i.e. how many times we actually access a block 
once it gets into the cache. This would also allow us to evaluate 
cache-on-write on flush. In addition, we need to keep track of and report some 
set of the hottest blocks in the cache, and possibly even the hottest keys. 
This would allow us to diagnose hot-row problems in real time.
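
The idea can be sketched as a per-block access counter plus a "top N" report. The tracker below is hypothetical and identifies blocks by plain strings; a production version would likely need bounded memory and cheaper ranking than a full sort.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical tracker; block names are plain strings for illustration.
final class BlockAccessTracker {
  private final Map<String, Long> accessCounts = new HashMap<String, Long>();

  /** Call on every cache read of the given block. */
  synchronized void recordAccess(String blockName) {
    Long old = accessCounts.get(blockName);
    accessCounts.put(blockName, old == null ? 1L : old + 1L);
  }

  /** Returns up to n block names, most frequently accessed first. */
  synchronized List<String> hottest(int n) {
    List<Map.Entry<String, Long>> entries =
        new ArrayList<Map.Entry<String, Long>>(accessCounts.entrySet());
    Collections.sort(entries, new Comparator<Map.Entry<String, Long>>() {
      @Override
      public int compare(Map.Entry<String, Long> a, Map.Entry<String, Long> b) {
        return Long.compare(b.getValue(), a.getValue()); // descending by count
      }
    });
    List<String> names = new ArrayList<String>();
    for (int i = 0; i < Math.min(n, entries.size()); i++) {
      names.add(entries.get(i).getKey());
    }
    return names;
  }
}
```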





[jira] [Created] (HBASE-5575) Configure Arcanist lint engine for HBase

2012-03-13 Thread Mikhail Bautin (Created) (JIRA)
Configure Arcanist lint engine for HBase


 Key: HBASE-5575
 URL: https://issues.apache.org/jira/browse/HBASE-5575
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin


We need to enable the Arcanist lint engine in HBase, so that a commit can be 
checked by running arc lint.





[jira] [Created] (HBASE-5576) Configure Arcanist lint engine for HBase

2012-03-13 Thread Mikhail Bautin (Created) (JIRA)
Configure Arcanist lint engine for HBase


 Key: HBASE-5576
 URL: https://issues.apache.org/jira/browse/HBASE-5576
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin


We need to be able to use arc lint to check a patch for code style errors 
before submission.





[jira] [Created] (HBASE-5566) [89-fb] Region server can get stuck getMaster on master failover

2012-03-12 Thread Mikhail Bautin (Created) (JIRA)
[89-fb] Region server can get stuck getMaster on master failover


 Key: HBASE-5566
 URL: https://issues.apache.org/jira/browse/HBASE-5566
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.89-fb
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin


Reported by Prakash. We have a retry loop in HRegionServer.getMaster where we 
do not read the location of the master from ZK, so a region server can get 
stuck there on master failover. We need to add a unit test to reliably catch 
this, and fix the bug.
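
A simplified sketch of the intended shape of the fix: the master address is re-read on every iteration rather than once before the loop. The types here are hypothetical stand-ins (a Supplier plays the role of the ZK lookup), and sleeping between attempts is omitted for brevity.

```java
import java.util.function.Supplier;

// Hypothetical sketch; MasterConnector and the address lookup are stand-ins, not HBase APIs.
final class MasterRetryLoop {
  interface MasterConnector {
    boolean tryConnect(String address);
  }

  private MasterRetryLoop() {}

  /**
   * Re-reads the master address on every attempt, so a failover that changes
   * the address is picked up instead of retrying the dead master forever.
   * Returns the address that accepted the connection, or null after maxAttempts.
   */
  static String connect(Supplier<String> readAddressFromZk, MasterConnector connector,
      int maxAttempts) {
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      String address = readAddressFromZk.get(); // refreshed inside the loop
      if (address != null && connector.tryConnect(address)) {
        return address;
      }
    }
    return null;
  }
}
```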






[jira] [Created] (HBASE-5557) [89-fb] Fix incorrect writer / thread interaction in HBaseTest

2012-03-09 Thread Mikhail Bautin (Created) (JIRA)
[89-fb] Fix incorrect writer / thread interaction in HBaseTest
--

 Key: HBASE-5557
 URL: https://issues.apache.org/jira/browse/HBASE-5557
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.89-fb
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Minor


In the HBaseTest load test there is a race: the writer may not have written 
any keys yet when the reader attempts to read key 0, resulting in a failure. 
This bug is specific to 89-fb because it was fixed while open-sourcing 
HBaseTest as LoadTestTool, and those improvements still have not been 
back-ported to 89-fb. This JIRA applies a temporary fix; the full back-port 
will follow later.

12/03/09 14:12:52 INFO utils.MultiThreadedReader: Key = 
cfcd208495d565ef66e7dff9f98764da:0
12/03/09 14:12:52 ERROR utils.MultiThreadedReader: No data returned, tried to 
get actions for key = cfcd208495d565ef66e7dff9f98764da:0
12/03/09 14:12:52 INFO utils.MultiThreadedReader: Key = 
cfcd208495d565ef66e7dff9f98764da:0
12/03/09 14:12:52 INFO utils.MultiThreadedReader: Key = 
cfcd208495d565ef66e7dff9f98764da:0
12/03/09 14:12:52 ERROR utils.MultiThreadedReader: No data returned, tried to 
get actions for key = cfcd208495d565ef66e7dff9f98764da:0
12/03/09 14:12:52 ERROR utils.MultiThreadedReader: No data returned, tried to 
get actions for key = cfcd208495d565ef66e7dff9f98764da:0
12/03/09 14:12:52 INFO utils.MultiThreadedReader: Key = 
cfcd208495d565ef66e7dff9f98764da:0
12/03/09 14:12:52 ERROR utils.MultiThreadedReader: No data returned, tried to 
get actions for key = cfcd208495d565ef66e7dff9f98764da:0
12/03/09 14:12:52 ERROR utils.MultiThreadedReader: Aborting run -- found more 
than three errors
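
One way to express the guard is to let the reader pick keys only from the range the writer has already completed. This is a hypothetical sketch with illustrative names, not the actual temporary fix.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the guard; field and method names are illustrative.
final class KeyRange {
  // Number of keys the writer has fully written; keys 0..written-1 are safe to read.
  private final AtomicLong written = new AtomicLong(0);

  /** Called by the writer after each key has been written successfully. */
  void markWritten() {
    written.incrementAndGet();
  }

  /** Returns a key that is safe to read, or -1 if nothing has been written yet. */
  long pickKeyToRead(long randomNonNegative) {
    long limit = written.get();
    if (limit == 0) {
      return -1L; // the reader should wait rather than fetch key 0
    }
    return randomNonNegative % limit;
  }
}
```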






[jira] [Created] (HBASE-5469) Add baseline compression efficiency to DataBlockEncodingTool

2012-02-23 Thread Mikhail Bautin (Created) (JIRA)
Add baseline compression efficiency to DataBlockEncodingTool


 Key: HBASE-5469
 URL: https://issues.apache.org/jira/browse/HBASE-5469
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Minor


DataBlockEncodingTool currently does not provide a baseline compression 
efficiency, i.e. the result of applying a Hadoop compression codec to 
unencoded data. For example, if we are using LZO to compress blocks, we would 
like to have the following columns in the report (possibly as percentages of 
raw data size):

Baseline K+V in block cache | Baseline K+V on disk (LZO compressed) | 
K+V DataBlockEncoded in block cache | K+V DataBlockEncoded + LZO compressed (on disk)

Background: we never store compressed blocks in cache, but we always store 
encoded data blocks in cache if data block encoding is enabled for the column 
family.






[jira] [Created] (HBASE-5470) Make DataBlockEncodingTool work correctly with no native compression codecs loaded

2012-02-23 Thread Mikhail Bautin (Created) (JIRA)
Make DataBlockEncodingTool work correctly with no native compression codecs 
loaded
--

 Key: HBASE-5470
 URL: https://issues.apache.org/jira/browse/HBASE-5470
 Project: HBase
  Issue Type: Bug
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Minor


DataBlockEncodingTool was fixed as part of porting data block encoding 
(HBASE-4218) to 89-fb 
(https://reviews.facebook.net/rHBASEEIGHTNINEFBBRANCH1245291, 
https://reviews.facebook.net/D1659). The bug appeared when using GZ as the 
baseline compression codec without loading native Hadoop libraries, in which 
case the compressor instance would be null. The purpose of this JIRA is to 
bring the trunk version of DataBlockEncodingTool to parity with the 89-fb 
version; further improvements to the tool will be made separately.






[jira] [Created] (HBASE-5442) Use builder pattern in StoreFile and HFile

2012-02-21 Thread Mikhail Bautin (Created) (JIRA)
Use builder pattern in StoreFile and HFile
--

 Key: HBASE-5442
 URL: https://issues.apache.org/jira/browse/HBASE-5442
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin


We have five ways to create an HFile writer, two ways to create a StoreFile 
writer, and the sets of parameters keep changing, creating a lot of confusion, 
especially when porting patches across branches. The same thing is happening to 
HColumnDescriptor. I think we should move to a builder pattern solution, e.g.

{code:java}
  HFileWriter w = HFile.getWriterBuilder(conf, some common args)
  .setParameter1(value1)
  .setParameter2(value2)
  ...
  .build();
{code}

Each parameter setter being on its own line will make merges/cherry-pick work 
properly, we will not have to even mention default parameters again, and we can 
eliminate a dozen impossible-to-remember constructors.
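
For readers unfamiliar with the pattern, here is a self-contained sketch of a builder with defaults. The class, its fields, and the default values are hypothetical and do not reflect the actual HFile or StoreFile writer parameters.

```java
// Hypothetical classes for illustration; not the HFile/StoreFile API.
final class WriterOptions {
  final int blockSize;
  final String compression;
  final boolean cacheOnWrite;

  private WriterOptions(Builder b) {
    this.blockSize = b.blockSize;
    this.compression = b.compression;
    this.cacheOnWrite = b.cacheOnWrite;
  }

  static Builder builder() {
    return new Builder();
  }

  static final class Builder {
    // Defaults live in one place, so callers never have to restate them.
    private int blockSize = 64 * 1024;
    private String compression = "NONE";
    private boolean cacheOnWrite = false;

    Builder setBlockSize(int blockSize) {
      this.blockSize = blockSize;
      return this;
    }

    Builder setCompression(String compression) {
      this.compression = compression;
      return this;
    }

    Builder setCacheOnWrite(boolean cacheOnWrite) {
      this.cacheOnWrite = cacheOnWrite;
      return this;
    }

    WriterOptions build() {
      return new WriterOptions(this);
    }
  }
}
```

A caller that only cares about compression would write WriterOptions.builder().setCompression("GZ").build() and inherit the other defaults; adding a new parameter later does not touch existing call sites.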

This particular JIRA addresses StoreFile and HFile refactoring. For 
HColumnDescriptor refactoring see HBASE-5357.





[jira] [Created] (HBASE-5382) Test that we always cache index and bloom blocks

2012-02-10 Thread Mikhail Bautin (Created) (JIRA)
Test that we always cache index and bloom blocks


 Key: HBASE-5382
 URL: https://issues.apache.org/jira/browse/HBASE-5382
 Project: HBase
  Issue Type: Test
Reporter: Mikhail Bautin


This is a unit test that should have been part of HBASE-4683 but was not 
committed. The original test was reviewed at https://reviews.facebook.net/D807. 
This JIRA submits the unit test as a separate patch and extends the scope of 
the test to also handle the case when block cache is enabled for the column 
family. The new review is at https://reviews.facebook.net/D1695.





[jira] [Created] (HBASE-5387) Reuse compression streams in HFileBlock.Writer

2012-02-10 Thread Mikhail Bautin (Created) (JIRA)
Reuse compression streams in HFileBlock.Writer
--

 Key: HBASE-5387
 URL: https://issues.apache.org/jira/browse/HBASE-5387
 Project: HBase
  Issue Type: Bug
Reporter: Mikhail Bautin


We need to reuse compression streams in HFileBlock.Writer instead of 
allocating them every time. The motivation is that when using Java's built-in 
implementation of Gzip, we allocate a new GZIPOutputStream object and an 
associated native data structure every time. This is one suspected cause of 
recent TestHFileBlock failures on Hadoop QA: 
https://builds.apache.org/job/HBase-TRUNK/2658/testReport/org.apache.hadoop.hbase.io.hfile/TestHFileBlock/testPreviousOffset_1_/.
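
The reuse idea can be illustrated with java.util.zip.Deflater, which can be reset between inputs instead of being reallocated. Note that this sketch uses Deflater directly (zlib framing rather than gzip) and a hypothetical wrapper class; it is not the HFileBlock.Writer change itself.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

// Hypothetical sketch using java.util.zip.Deflater directly; not HFileBlock.Writer's code.
final class ReusableCompressor {
  private final Deflater deflater = new Deflater(); // allocated once, native state included
  private final byte[] buffer = new byte[4096];

  /** Compresses one block, resetting the shared Deflater instead of allocating a new one. */
  byte[] compress(byte[] block) {
    deflater.reset();
    deflater.setInput(block);
    deflater.finish();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    while (!deflater.finished()) {
      int n = deflater.deflate(buffer);
      out.write(buffer, 0, n);
    }
    return out.toByteArray();
  }

  /** Releases the native resources once the writer is closed. */
  void close() {
    deflater.end();
  }
}
```

The instance is not thread-safe, so in this sketch each writer would own its own compressor.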






[jira] [Created] (HBASE-5375) Ensure that compactions use already cached blocks but do not cache new data blocks

2012-02-09 Thread Mikhail Bautin (Created) (JIRA)
Ensure that compactions use already cached blocks but do not cache new data 
blocks
--

 Key: HBASE-5375
 URL: https://issues.apache.org/jira/browse/HBASE-5375
 Project: HBase
  Issue Type: Test
Reporter: Mikhail Bautin


Create a unit test to verify that compactions reuse existing cached blocks but 
do not thrash the cache with newly read blocks. Also need to verify that we 
only read every data block once, e.g. that we don't re-read the block on every 
next() operation. HBASE-1597 did not seem to include a unit test, so we need to 
add a test now. This and HBASE-4683 (the unit test that was not checked in) are 
the remaining missing pieces before we can close HBASE-3976.





[jira] [Created] (HBASE-5357) Use builder pattern in StoreFile, HFile, and HColumnDescriptor instantiation

2012-02-08 Thread Mikhail Bautin (Created) (JIRA)
Use builder pattern in StoreFile, HFile, and HColumnDescriptor instantiation


 Key: HBASE-5357
 URL: https://issues.apache.org/jira/browse/HBASE-5357
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin


We have five ways to create an HFile writer, two ways to create a StoreFile 
writer, and the sets of parameters keep changing, creating a lot of confusion, 
especially when porting patches across branches. The same thing is happening to 
HColumnDescriptor. I think we should move to a builder pattern solution, e.g.

{code:java}
  HFileWriter w = HFile.getWriterBuilder(conf, some common args)
  .setParameter1(value1)
  .setParameter2(value2)
  ...
  .instantiate();
{code}

Each parameter setter being on its own line will make merges/cherry-pick work 
properly, we will not have to even mention default parameters again, and we can 
eliminate a dozen impossible-to-remember constructors.






[jira] [Created] (HBASE-5344) [89-fb] Scan unassigned region directory on master failover

2012-02-06 Thread Mikhail Bautin (Created) (JIRA)
[89-fb] Scan unassigned region directory on master failover
---

 Key: HBASE-5344
 URL: https://issues.apache.org/jira/browse/HBASE-5344
 Project: HBase
  Issue Type: Bug
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin


In case the master dies after a regionserver writes region state as OPENED or 
CLOSED in ZK but before the update is received by master and written to meta, 
the new master that comes up has to pick up the region state from ZK and write 
it to meta. Otherwise we can get multiply-assigned regions.





[jira] [Created] (HBASE-5320) Create client API to handle HBase maintenance gracefully

2012-02-01 Thread Mikhail Bautin (Created) (JIRA)
Create client API to handle HBase maintenance gracefully


 Key: HBASE-5320
 URL: https://issues.apache.org/jira/browse/HBASE-5320
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Priority: Minor


When we do HBase cluster maintenance, we typically have to manually stop or 
disable the client temporarily. It would be nice to have a way for the client 
to find out that HBase is undergoing maintenance through an appropriate API and 
gracefully handle it on its own.





[jira] [Created] (HBASE-5261) Update HBase for Java 7

2012-01-23 Thread Mikhail Bautin (Created) (JIRA)
Update HBase for Java 7
---

 Key: HBASE-5261
 URL: https://issues.apache.org/jira/browse/HBASE-5261
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin


We need to make sure that HBase compiles and works with JDK 7. Once we verify 
it is reasonably stable, we can explore utilizing the G1 garbage collector. 
When all deployments are ready to move to JDK 7, we can start using new 
language features, but in the transition period we will need to maintain a 
codebase that compiles both with JDK 6 and JDK 7.





[jira] [Created] (HBASE-5262) Structured event log for HBase for monitoring and auto-tuning performance

2012-01-23 Thread Mikhail Bautin (Created) (JIRA)
Structured event log for HBase for monitoring and auto-tuning performance
-

 Key: HBASE-5262
 URL: https://issues.apache.org/jira/browse/HBASE-5262
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin


Creating this JIRA to open a discussion about a structured (machine-readable) 
log that will record events such as compaction start/end times, compaction 
input/output files, their sizes, the same for flushes, etc. This can be stored 
e.g. in a new system table in HBase itself. The data from this log can then be 
analyzed and used to optimize compactions at run time, or otherwise auto-tune 
HBase configuration to reduce the number of knobs the user has to configure.





[jira] [Created] (HBASE-5263) Preserving cached data on compactions through cache-on-write

2012-01-23 Thread Mikhail Bautin (Created) (JIRA)
Preserving cached data on compactions through cache-on-write


 Key: HBASE-5263
 URL: https://issues.apache.org/jira/browse/HBASE-5263
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Minor


We are tackling HBASE-3976 and HBASE-5230 to make sure we don't trash the block 
cache on compactions if cache-on-write is enabled. However, it would be ideal 
to reduce the effect compactions have on the cached data. For every block we 
are writing for a compacted file we can decide whether it needs to be cached 
based on whether the original blocks containing the same data were already in 
cache. More precisely, for every HFile reader in a compaction we can maintain a 
boolean flag saying whether the current key-value came from a disk IO or the 
block cache. In the HFile writer for the compaction's output we can maintain a 
flag that is set if any of the key-values in the block being written came from 
a cached block, use that flag at the end of a block to decide whether to 
cache-on-write the block, and reset the flag to false on a block boundary. If 
such an inclusive approach still trashes the cache, we could restrict the 
total number of blocks to be cached per output HFile, switch from OR logic to 
AND logic when deciding whether to cache an output file block, or only cache a 
certain percentage of the output file blocks that contain some of the 
previously cached data.
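
The bookkeeping on the writer side can be sketched as follows. This covers only the inclusive (OR) variant, with hypothetical names, and leaves out the actual block writing and caching.

```java
// Hypothetical sketch of the bookkeeping described above; not HBase code.
final class CompactionBlockTracker {
  private boolean sawCachedSource = false; // OR of "came from cache" over the current block
  private int markedCount = 0;

  /** Call for each key-value appended to the current output block. */
  void append(boolean cameFromCachedBlock) {
    sawCachedSource |= cameFromCachedBlock;
  }

  /**
   * Call at a block boundary. Returns whether to cache-on-write the finished
   * block, then resets the flag for the next block.
   */
  boolean finishBlock() {
    boolean cacheThisBlock = sawCachedSource;
    sawCachedSource = false;
    if (cacheThisBlock) {
      markedCount++;
    }
    return cacheThisBlock;
  }

  /** Number of output blocks marked for caching so far. */
  int markedCount() {
    return markedCount;
  }
}
```

The AND variant would start each block with the flag set to true and clear it on the first key-value that came from disk; a per-file cap could compare markedCount() against a limit before returning true.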

Thanks to Nicolas for this elegant online algorithm idea!






[jira] [Created] (HBASE-5230) Unit test to ensure compactions don't cache data on write

2012-01-18 Thread Mikhail Bautin (Created) (JIRA)
Unit test to ensure compactions don't cache data on write
-

 Key: HBASE-5230
 URL: https://issues.apache.org/jira/browse/HBASE-5230
 Project: HBase
  Issue Type: Test
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Minor


Create a unit test for HBASE-3976 (making sure we don't cache data blocks on 
write during compactions even if cache-on-write is generally enabled). 
This is because we have very different implementations of HBASE-3976 without 
HBASE-4422 CacheConfig (on top of 89-fb, created by Liyin) and with CacheConfig 
(presumably it's there, but it is not clear whether it even works, since the 
patch in HBASE-3976 may not have been committed). We need to create a unit test 
to verify that we don't cache data blocks on write during compactions, and 
resolve HBASE-3976 so that this new unit test does not fail.





[jira] [Created] (HBASE-5224) midkey() returns 12 extra bytes in HFile v2

2012-01-17 Thread Mikhail Bautin (Created) (JIRA)
midkey() returns 12 extra bytes in HFile v2
---

 Key: HBASE-5224
 URL: https://issues.apache.org/jira/browse/HBASE-5224
 Project: HBase
  Issue Type: Bug
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Minor


HFile's midkey() is implemented as the first key of the middle index block in 
both HFile v1 and HFile v2 (the middle leaf index block is used in v2). 
However, in HFile v2 midkey() currently grabs 12 more bytes from the next leaf 
index entry, representing the offset and compressed size of the data block 
pointed to by that entry. While this probably does not affect the 
interpretation of the returned buffer as an HBase key (the last 12 bytes are 
simply discarded), this has to be cleaned up.





[jira] [Created] (HBASE-5130) A map-reduce wrapper for HBase test suite (mrunit)

2012-01-05 Thread Mikhail Bautin (Created) (JIRA)
A map-reduce wrapper for HBase test suite (mrunit)


 Key: HBASE-5130
 URL: https://issues.apache.org/jira/browse/HBASE-5130
 Project: HBase
  Issue Type: Test
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin


We have a tool we call mrunit that runs HBase unit tests on a map-reduce 
cluster. We need to modify it to use the distributed cache to deploy the code on the 
cluster instead of our internal deployment tool, and open-source it.





[jira] [Created] (HBASE-5048) Use EnvironmentEdgeManager.currentTimeMillis() instead of System.currentTimeMillis()

2011-12-15 Thread Mikhail Bautin (Created) (JIRA)
Use EnvironmentEdgeManager.currentTimeMillis() instead of 
System.currentTimeMillis()


 Key: HBASE-5048
 URL: https://issues.apache.org/jira/browse/HBASE-5048
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Priority: Minor


We need to switch to using EnvironmentEdgeManager.currentTimeMillis() instead 
of System.currentTimeMillis() across the codebase to reduce confusion when 
writing tests that require custom timing of operations.
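
The benefit for tests can be illustrated with a minimal sketch of the injectable-clock pattern. The names TimeSource, TimeSourceManager and ManualTimeSource are hypothetical stand-ins chosen for this example; they are not the actual EnvironmentEdgeManager API, only the idea behind it.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical pluggable time source (stand-in, not an HBase class). */
interface TimeSource {
  long currentTimeMillis();
}

/** Hypothetical process-wide holder; code calls this instead of System.currentTimeMillis(). */
final class TimeSourceManager {
  private static volatile TimeSource delegate = System::currentTimeMillis;

  private TimeSourceManager() {}

  static long currentTimeMillis() {
    return delegate.currentTimeMillis();
  }

  /** Tests install their own clock here. */
  static void inject(TimeSource source) {
    delegate = source;
  }

  /** Restores the system clock. */
  static void reset() {
    delegate = System::currentTimeMillis;
  }
}

/** Test clock that only moves when the test advances it. */
final class ManualTimeSource implements TimeSource {
  private final AtomicLong now;

  ManualTimeSource(long startMillis) {
    now = new AtomicLong(startMillis);
  }

  @Override
  public long currentTimeMillis() {
    return now.get();
  }

  void advance(long millis) {
    now.addAndGet(millis);
  }
}
```

A test can then inject a ManualTimeSource, advance it past a TTL or timeout, and observe the effect without sleeping. Code that still calls System.currentTimeMillis() directly would not see the injected time, which is the confusion this issue aims to remove.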





[jira] [Created] (HBASE-5031) [89-fb] Remove hard-coded non-existent host name from TestScanner

2011-12-14 Thread Mikhail Bautin (Created) (JIRA)
[89-fb] Remove hard-coded non-existent host name from TestScanner 
--

 Key: HBASE-5031
 URL: https://issues.apache.org/jira/browse/HBASE-5031
 Project: HBase
  Issue Type: Bug
Reporter: Mikhail Bautin
Priority: Minor


TestScanner is failing on 0.89-fb because it has a hard-coded fake host name 
that it is trying to look up. Replacing this with 127.0.0.1:random_port 
instead.





[jira] [Created] (HBASE-5010) Filter HFiles based on TTL

2011-12-12 Thread Mikhail Bautin (Created) (JIRA)
Filter HFiles based on TTL
--

 Key: HBASE-5010
 URL: https://issues.apache.org/jira/browse/HBASE-5010
 Project: HBase
  Issue Type: Bug
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin


In ScanWildcardColumnTracker we have

{
this.oldestStamp = EnvironmentEdgeManager.currentTimeMillis() - ttl;

...

  private boolean isExpired(long timestamp) {
    return timestamp < oldestStamp;
  }
}

but this time range filtering does not participate in HFile selection. In one 
real case this caused next() calls to time out because all KVs in a table got 
expired, but next() had to iterate over the whole table to find that out. We 
should be able to filter out those HFiles right away. I think a reasonable 
approach is to add a default timerange filter to every scan for a CF with a 
finite TTL and utilize existing filtering in 
StoreFile.Reader.passesTimerangeFilter.
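
A minimal sketch of the proposed file-level check, assuming each file exposes the maximum timestamp of its cells. FileTimeRange and TtlFileSelector are hypothetical names for this illustration and do not reflect the StoreFile API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical summary of a store file: its name and newest cell timestamp. */
final class FileTimeRange {
  final String name;
  final long maxTimestamp;

  FileTimeRange(String name, long maxTimestamp) {
    this.name = name;
    this.maxTimestamp = maxTimestamp;
  }
}

final class TtlFileSelector {
  private TtlFileSelector() {}

  /** Returns the files that may still contain unexpired cells, in input order. */
  static List<FileTimeRange> selectUnexpired(
      List<FileTimeRange> files, long nowMillis, long ttlMillis) {
    // Same cutoff as oldestStamp in the snippet above.
    long oldestStamp = nowMillis - ttlMillis;
    List<FileTimeRange> kept = new ArrayList<>();
    for (FileTimeRange file : files) {
      // If even the newest cell is older than the cutoff, every cell in the
      // file is expired and the file can be skipped without reading it.
      if (file.maxTimestamp >= oldestStamp) {
        kept.add(file);
      }
    }
    return kept;
  }
}
```

With this check a scan over a table whose cells have all expired would select no files at all, instead of iterating over every KV to discover the same thing.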






[jira] [Created] (HBASE-5000) Speed up simultaneous reads of a block when block caching is turned off

2011-12-09 Thread Mikhail Bautin (Created) (JIRA)
Speed up simultaneous reads of a block when block caching is turned off
---

 Key: HBASE-5000
 URL: https://issues.apache.org/jira/browse/HBASE-5000
 Project: HBase
  Issue Type: Bug
Reporter: Mikhail Bautin
Priority: Minor


With block caching, when one client starts reading a block and another one 
comes around asking for the same block, the second client waits for the first 
one to finish reading and returns the block from cache. This is achieved by 
locking on the block offset using IdLock, a sparse lock primitive that allows 
locking on arbitrary long values. However, when there is no block caching, 
there is no reason to wait for other clients that are reading the same block. 
One challenge in optimizing this is that we don't necessarily have accurate 
information about whether other HFile API clients interested in the block would 
cache it.
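
For readers unfamiliar with the idea, here is a minimal sketch of a sparse lock keyed on long values. SparseIdLock is a hypothetical class written for this illustration; the real IdLock may be implemented quite differently.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

/** Sketch of a lock over arbitrary long ids that only stores ids currently in use. */
final class SparseIdLock {
  private static final class Slot {
    final ReentrantLock lock = new ReentrantLock();
    int users;  // holders plus waiters; only modified inside compute()
  }

  private final ConcurrentHashMap<Long, Slot> slots = new ConcurrentHashMap<>();

  /** Blocks until the calling thread holds the lock for the given id. */
  void lock(long id) {
    // Register interest atomically so the slot cannot be removed under us.
    Slot slot = slots.compute(id, (key, existing) -> {
      Slot s = (existing == null) ? new Slot() : existing;
      s.users++;
      return s;
    });
    slot.lock.lock();
  }

  /** Releases the lock for the given id and drops the slot once nobody uses it. */
  void unlock(long id) {
    Slot slot = slots.get(id);
    if (slot == null) {
      throw new IllegalStateException("id " + id + " is not locked");
    }
    slot.lock.unlock();
    // Returning null from compute() removes the mapping.
    slots.compute(id, (key, s) -> (--s.users == 0) ? null : s);
  }

  /** Number of ids currently locked or waited on. */
  int trackedIds() {
    return slots.size();
  }
}
```

In the cached case a reader would lock on the block offset, check the cache, read the block on a miss, and unlock. The optimization discussed here would skip the lock when no reader is going to cache the block.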

Setting priority as minor, as it is very unusual to turn off block caching.






[jira] [Created] (HBASE-4976) Add compaction/flush queue size metrics mistakenly removed by HFile v2

2011-12-07 Thread Mikhail Bautin (Created) (JIRA)
Add compaction/flush queue size metrics mistakenly removed by HFile v2
--

 Key: HBASE-4976
 URL: https://issues.apache.org/jira/browse/HBASE-4976
 Project: HBase
  Issue Type: Bug
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin








[jira] [Created] (HBASE-4962) Optimize time range scans using a delete Bloom filter

2011-12-05 Thread Mikhail Bautin (Created) (JIRA)
Optimize time range scans using a delete Bloom filter
-

 Key: HBASE-4962
 URL: https://issues.apache.org/jira/browse/HBASE-4962
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Minor


To speed up time range scans we need to seek to the maximum timestamp of the 
requested range, instead of going to the first KV of the (row, column) pair and 
iterating from there. If we don't know the (row, column), e.g. if it is not 
specified in the query, we need to go to the end of the current row/column pair 
first, get a KV from there, and do another seek to (row', column', 
timerange_max) from there. We can only skip over to the timerange_max timestamp 
when we know that there are no DeleteColumn records at the top of that 
row/column with a higher timestamp. We can utilize another Bloom filter keyed 
on (row, column) to quickly find that out.





[jira] [Created] (HBASE-4963) [89-fb] Per-table getsize metrics are broken

2011-12-05 Thread Mikhail Bautin (Created) (JIRA)
[89-fb] Per-table getsize metrics are broken


 Key: HBASE-4963
 URL: https://issues.apache.org/jira/browse/HBASE-4963
 Project: HBase
  Issue Type: Bug
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin


We need to make sure we get per-(table, CF) get size metrics in 0.89-fb, 
similarly to what was done in https://reviews.facebook.net/D483 for the trunk. 
Currently we only get metrics such as 

  hadoop.regionserver_cf.cf.getsize

even with per-table metrics turned on.





[jira] [Created] (HBASE-4952) Master startup is too slow on HBase trunk

2011-12-04 Thread Mikhail Bautin (Created) (JIRA)
Master startup is too slow on HBase trunk
-

 Key: HBASE-4952
 URL: https://issues.apache.org/jira/browse/HBASE-4952
 Project: HBase
  Issue Type: Bug
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin


When I start the HBase trunk master on my five-node cluster, it gets stuck in 
the state "initializing master service threads" for a minute or two, then 
"waiting for regionserver number to settle", and only then starts log 
splitting. We don't have such delays in the 0.89-fb master, and I believe we 
can optimize the new master to eliminate these delays as well.





[jira] [Created] (HBASE-4953) Root region does not get assigned after doing kill -9 on all daemons and restarting HBase

2011-12-04 Thread Mikhail Bautin (Created) (JIRA)
Root region does not get assigned after doing kill -9 on all daemons and 
restarting HBase
-

 Key: HBASE-4953
 URL: https://issues.apache.org/jira/browse/HBASE-4953
 Project: HBase
  Issue Type: Bug
Reporter: Mikhail Bautin


When doing a kill -9 on all HBase processes and attempting to re-start HBase, 
the master does not properly assign the root region. The 
/hbase/root-region-server znode still contains the old regionserver, but the 
regionserver referenced in it does not get assigned the root region. This might 
get resolved after the znode expires, but some testing is required.





[jira] [Created] (HBASE-4908) HBase cluster test tool (port from 0.89-fb)

2011-11-30 Thread Mikhail Bautin (Created) (JIRA)
HBase cluster test tool (port from 0.89-fb)
---

 Key: HBASE-4908
 URL: https://issues.apache.org/jira/browse/HBASE-4908
 Project: HBase
  Issue Type: Test
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin


Porting one of our HBase cluster test tools (a single-process multi-threaded 
load generator and verifier) from 0.89-fb to trunk.
I cleaned up the code a bit compared to what's in 0.89-fb, and discovered that 
it has some features that I have not tried yet (some kind of a kill test, and 
some way to run HBase as multiple processes on one machine).
The main utility of this piece of code for us has been the HBaseClusterTest 
command-line tool (called HBaseTest in 0.89-fb), which we usually invoke as a 
load test in our five-node dev cluster testing, e.g.:

hbase org.apache.hadoop.hbase.manual.HBaseTest -load 10:50:100:20 -tn 
load_test -read 1:10:50:20 -zk zk_quorum -bloom ROWCOL -compression 
GZIP

I will be using this code to load-test the delta encoding patch and making 
fixes, but I am submitting the patch for early feedback. I will probably try 
out its other functionality and comment on how it works.






[jira] [Created] (HBASE-4863) Make HBase Thrift server more configurable and add a command-line UI test

2011-11-24 Thread Mikhail Bautin (Created) (JIRA)
Make HBase Thrift server more configurable and add a command-line UI test
-

 Key: HBASE-4863
 URL: https://issues.apache.org/jira/browse/HBASE-4863
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin


This started as an internal hotfix where we found out that the Thrift server 
spawned 15000 threads. To bound the thread pool size I added a custom thread 
pool server implementation called HBaseThreadPoolServer into the HBase codebase, 
and made the following parameters configurable from both command line and as 
config settings: minWorkerThreads, maxWorkerThreads, and maxQueuedRequests. 
Under an increasing load, the server creates new threads for every connection 
before the pool size reaches minWorkerThreads. After that, the server puts new 
connections into the queue and only creates a new thread when the queue is 
full. If an attempt to create a new thread fails, the server drops the connection. 
The default TThreadPoolServer would crash in that case, but it never happened 
because the thread pool was unbounded, so the server would hang indefinitely, 
consume a lot of memory, and cause huge latency spikes on the client side.
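
The thread-creation policy described above matches the documented behavior of java.util.concurrent.ThreadPoolExecutor with a bounded queue, so it can be sketched with the JDK alone. This is not the HBaseThreadPoolServer code; BoundedPoolSketch is a hypothetical class, and mapping minWorkerThreads, maxWorkerThreads and maxQueuedRequests onto the executor's core size, maximum size and queue capacity is an assumption made for illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class BoundedPoolSketch {
  private BoundedPoolSketch() {}

  /** Builds a pool that grows to min, then queues, then grows to max, then rejects. */
  static ThreadPoolExecutor newPool(
      int minWorkerThreads, int maxWorkerThreads, int maxQueuedRequests) {
    return new ThreadPoolExecutor(
        minWorkerThreads,
        maxWorkerThreads,
        60L, TimeUnit.SECONDS,  // idle time before threads above the minimum exit
        new ArrayBlockingQueue<>(maxQueuedRequests));
  }

  /** Submits n tasks that block until release is counted down; returns the number rejected. */
  static int submitBlocked(ThreadPoolExecutor pool, int n, CountDownLatch release) {
    int rejected = 0;
    for (int i = 0; i < n; i++) {
      try {
        pool.execute(() -> {
          try {
            release.await();
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        });
      } catch (RejectedExecutionException e) {
        rejected++;  // the analogue of dropping the connection
      }
    }
    return rejected;
  }
}
```

With newPool(2, 4, 3), the first two blocked tasks each get a new thread, the next three are queued, the next two push the pool to four threads, and any further task is rejected until a worker frees up.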

Another part of this fix is refactoring and unit testing of the command-line 
part of the Thrift server. The logic there is sufficiently complicated, and the 
existing ThriftServer class does not test that part at all. The new 
TestThriftServerCmdLine test starts the Thrift server on a random port with 
various combinations of options and talks to it through the client API from 
another thread.





[jira] [Created] (HBASE-4867) A tool to merge configuration files

2011-11-24 Thread Mikhail Bautin (Created) (JIRA)
A tool to merge configuration files
---

 Key: HBASE-4867
 URL: https://issues.apache.org/jira/browse/HBASE-4867
 Project: HBase
  Issue Type: New Feature
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Minor


With our cluster configuration setup it would be good to have a tool that would 
merge HBase configuration, so that files appearing later in the list would 
override properties specified in earlier files. This way we could merge an 
application-specific configuration file with the cluster-specific configuration 
file (with the latter overriding the former) and produce a single HBase 
configuration file to install on the cluster.
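
The override rule can be sketched as follows, assuming each configuration file has already been parsed into a map of property names to values. ConfigMergeSketch is a hypothetical class; XML parsing and writing are deliberately left out.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ConfigMergeSketch {
  private ConfigMergeSketch() {}

  /** Merges parsed files in list order; later files override earlier ones. */
  static Map<String, String> merge(List<Map<String, String>> filesInOrder) {
    Map<String, String> merged = new LinkedHashMap<>();
    for (Map<String, String> file : filesInOrder) {
      // putAll overwrites the value of any key already present.
      merged.putAll(file);
    }
    return merged;
  }
}
```

Passing the application-specific map first and the cluster-specific map second gives the precedence described above, with the cluster-specific values winning on conflicts.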





[jira] [Created] (HBASE-4821) A fully automated comprehensive distributed integration test for HBase

2011-11-18 Thread Mikhail Bautin (Created) (JIRA)
A fully automated comprehensive distributed integration test for HBase
--

 Key: HBASE-4821
 URL: https://issues.apache.org/jira/browse/HBASE-4821
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin


To properly verify that a particular version of HBase is good for production 
deployment we need a better way to do real cluster testing after incremental 
changes. Running unit tests is good, but we also need to deploy HBase to a 
cluster, run integration tests, load tests, Thrift server tests, kill some 
region servers, kill the master, and produce a report. All of this needs to 
happen in 20-30 minutes with minimal manual intervention. I think this way we 
can combine agile development with high stability of the codebase. I am 
envisioning a high-level framework written in a scripting language (e.g. 
Python) that would abstract external operations such as deploy to test 
cluster, kill a particular server, run load test A, run load test B (we 
already have a few kinds of load tests implemented in Java, and we could write 
a Thrift load test in Python). This tool should also produce intermediate 
output, allowing us to catch problems early and restart the test.

No implementation has yet been done. Any ideas or suggestions are welcome.






[jira] [Created] (HBASE-4824) TestZKLeaderManager is flaky

2011-11-18 Thread Mikhail Bautin (Created) (JIRA)
TestZKLeaderManager is flaky


 Key: HBASE-4824
 URL: https://issues.apache.org/jira/browse/HBASE-4824
 Project: HBase
  Issue Type: Bug
Reporter: Mikhail Bautin
Priority: Minor


TestZKLeaderManager is flaky. It failed in a full test suite run for me, then 
passed when I reran it locally, but then failed when I ran it in a loop.





[jira] [Created] (HBASE-4795) Fix TestHFileBlock when running on a 32-bit JVM

2011-11-15 Thread Mikhail Bautin (Created) (JIRA)
Fix TestHFileBlock when running on a 32-bit JVM
---

 Key: HBASE-4795
 URL: https://issues.apache.org/jira/browse/HBASE-4795
 Project: HBase
  Issue Type: Bug
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin


Our Hudson test server seems to run a 32-bit JVM. This patch fixes 
TestHFileBlock to work correctly for both 64-bit and 32-bit JVM.





[jira] [Created] (HBASE-4768) Per-(table, columnFamily) metrics with configurable table name inclusion

2011-11-10 Thread Mikhail Bautin (Created) (JIRA)
Per-(table, columnFamily) metrics with configurable table name inclusion


 Key: HBASE-4768
 URL: https://issues.apache.org/jira/browse/HBASE-4768
 Project: HBase
  Issue Type: New Feature
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin


As we kept adding more granular block read and block cache usage statistics, a 
combinatorial explosion of various cases to monitor started to happen, 
especially when we wanted both per-table/column family/block type statistics 
and aggregate statistics on various subsets of these dimensions. Here, we 
un-clutter HFile readers, LruBlockCache, StoreFile, etc. by creating a 
centralized class that knows how to update all kinds of per-table/CF/block type 
counters. 

Table name and column family configuration have been pushed to a base class, 
SchemaConfigured. This is convenient as many of the existing classes that have 
these properties (HFile readers/writers, HFile blocks, etc.) did not have a 
base class. Whether to collect per-(table, columnFamily) or per-columnFamily 
only metrics can be configured with the hbase.metrics.showTableName 
configuration key. We don't expect this configuration to change at runtime, so 
we cache the setting statically and log a warning when an attempt is made to 
flip it once already set. This way we don't have to pass configuration to a lot 
more places, e.g. everywhere an HFile reader is instantiated.
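
A minimal sketch of the statically cached, set-once setting described above. SetOnceFlag is a hypothetical class, and keeping the original value when a flip is attempted is an assumption of this sketch; the description only says that a warning is logged.

```java
import java.util.logging.Logger;

/** Hypothetical holder for a process-wide boolean that is configured once. */
final class SetOnceFlag {
  private static final Logger LOG = Logger.getLogger(SetOnceFlag.class.getName());

  private static Boolean value;  // null until the first call to configure()

  private SetOnceFlag() {}

  /**
   * Records the setting on first use. Returns false and logs a warning if a
   * later call asks for a different value; the first value is kept (assumption).
   */
  static synchronized boolean configure(boolean requested) {
    if (value == null) {
      value = requested;
      return true;
    }
    if (value != requested) {
      LOG.warning("Ignoring attempt to change setting from " + value + " to " + requested);
      return false;
    }
    return true;
  }

  /** Returns the cached setting; fails if configure() was never called. */
  static synchronized boolean get() {
    if (value == null) {
      throw new IllegalStateException("setting has not been configured yet");
    }
    return value;
  }
}
```

Callers such as a reader constructor can then consult SetOnceFlag.get() without being handed a configuration object, which is the convenience the paragraph above describes.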

Thanks to Liyin for his initial version of per-table metrics patch and a lot of 
valuable feedback.






[jira] [Created] (HBASE-4757) [89-fb] Fix TestHQuorumPeer for non-default values of hbase.tmp.dir

2011-11-07 Thread Mikhail Bautin (Created) (JIRA)
[89-fb] Fix TestHQuorumPeer for non-default values of hbase.tmp.dir 


 Key: HBASE-4757
 URL: https://issues.apache.org/jira/browse/HBASE-4757
 Project: HBase
  Issue Type: Bug
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Minor


TestHQuorumPeer currently fails if hbase.tmp.dir is different from 
/tmp/hbase-username. However, for our internal parallel test runner we use a 
different temporary HBase directory.





[jira] [Created] (HBASE-4758) [89-fb] Make test methods independent in TestMasterTransitions

2011-11-07 Thread Mikhail Bautin (Created) (JIRA)
[89-fb] Make test methods independent in TestMasterTransitions
--

 Key: HBASE-4758
 URL: https://issues.apache.org/jira/browse/HBASE-4758
 Project: HBase
  Issue Type: Bug
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Minor


Currently TestMasterTransitions is flaky, and one way to hopefully make it more 
stable is to create a separate MiniHBaseCluster for every test method, and get 
rid of BeforeClass/AfterClass. So far I have successfully run 
TestMasterTransitions a few times with the fix, while it was failing without 
the fix.

TestMasterTransitions in trunk is a different story (most of the test is 
commented out in the trunk) and is out of scope of this JIRA.





[jira] [Created] (HBASE-4746) Use a random ZK client port in unit tests so we can run them in parallel

2011-11-03 Thread Mikhail Bautin (Created) (JIRA)
Use a random ZK client port in unit tests so we can run them in parallel


 Key: HBASE-4746
 URL: https://issues.apache.org/jira/browse/HBASE-4746
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin


The hard-coded ZK client port has long been a problem for running the HBase test 
suite in parallel. The mini ZK cluster should run on a random free port, and 
that port should be passed to all parts of the unit tests that need to talk to 
the mini cluster. In fact, randomizing the port exposes a lot of places in the 
code where a new configuration is instantiated, and as a result the client 
tries to talk to the default ZK client port and times out.
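
One common way to obtain a random free port in Java is to bind a ServerSocket to port 0 and read back the port the OS assigned. The sketch below uses only the JDK; FreePortSketch is a hypothetical helper, not the code from the patch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

final class FreePortSketch {
  private FreePortSketch() {}

  /** Asks the OS for a currently unused TCP port by binding to port 0. */
  static int findFreePort() {
    try (ServerSocket socket = new ServerSocket(0)) {
      return socket.getLocalPort();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

The port is released when the socket closes, so another process could in principle take it before the mini cluster binds to it. A test harness may want to retry on bind failure.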

The initial fix is for 0.89-fb, where it already allows running unit tests in 
parallel in 10 minutes. A fix for the trunk will follow.





[jira] [Created] (HBASE-4704) A JRuby script for identifying active master

2011-10-30 Thread Mikhail Bautin (Created) (JIRA)
A JRuby script for identifying active master


 Key: HBASE-4704
 URL: https://issues.apache.org/jira/browse/HBASE-4704
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Trivial
 Fix For: 0.94.0


This simple script reads the HBase master ZK node and outputs the hostname of 
the active master. This is needed so that operational scripts can decide where 
the primary master is running. I am also including a one-line hbase-jruby 
script so we can make our jruby scripts proper UNIX executables by including an 
#!/usr/bin/env hbase-jruby at the top.





[jira] [Created] (HBASE-4607) SplitLogWorker should correctly terminate when waiting for ZK node

2011-10-17 Thread Mikhail Bautin (Created) (JIRA)
SplitLogWorker should correctly terminate when waiting for ZK node
--

 Key: HBASE-4607
 URL: https://issues.apache.org/jira/browse/HBASE-4607
 Project: HBase
  Issue Type: Bug
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Minor


This is an attempt to fix the fact that SplitLogWorker threads are not being 
terminated properly in some unit tests. This probably does not happen in 
production because the master always creates the log-splitting ZK node, but it 
does happen in 89-fb. Thanks to Prakash Khemani for help on this.





[jira] [Created] (HBASE-4534) A new unit test for lazy seek and StoreScanner in general

2011-10-03 Thread Mikhail Bautin (Created) (JIRA)
A new unit test for lazy seek and StoreScanner in general
-

 Key: HBASE-4534
 URL: https://issues.apache.org/jira/browse/HBASE-4534
 Project: HBase
  Issue Type: Test
Affects Versions: 0.94.0
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin


A randomized unit test for Gets/Scans (all-row, single-row, multi-row, 
all-column, single-column, and multi-column). Also all combinations of Bloom 
filters and compression (NONE vs GZIP) are tested. The unit test flushes 
multiple StoreFiles with disjoint timestamp ranges and runs various types of 
queries against them. Currently we are not testing overlapping timestamp ranges.





[jira] [Created] (HBASE-4522) Make hbase-site-custom.xml override the hbase-site.xml

2011-09-30 Thread Mikhail Bautin (Created) (JIRA)
Make hbase-site-custom.xml override the hbase-site.xml
--

 Key: HBASE-4522
 URL: https://issues.apache.org/jira/browse/HBASE-4522
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Liyin Tang
Priority: Minor
 Fix For: 0.94.0


The motivation for this diff is that we want to easily override some config 
settings for any specific cluster by just adding the config entries in the 
hbase-site-custom.xml for that cluster. This change adds the 
hbase-site-custom.xml configuration file into HBaseConfiguration.






[jira] [Created] (HBASE-4516) HFile-level load tester with compaction and random-read workloads

2011-09-29 Thread Mikhail Bautin (Created) (JIRA)
HFile-level load tester with compaction and random-read workloads
-

 Key: HBASE-4516
 URL: https://issues.apache.org/jira/browse/HBASE-4516
 Project: HBase
  Issue Type: Test
Reporter: Mikhail Bautin
Priority: Minor
 Fix For: 0.94.0


This is a load testing tool for HFile implementations, which supports two 
workloads:
- Compactions (merge the input HFiles). A special case of this is only one 
input, which allows HFile format conversions.
- Random reads. Launches the specified number of threads that do seeks and 
short scans on randomly generated keys.

The original purpose of this tool was to ensure that HFile format v2 did not 
introduce performance regressions.

Keys for the read workload are generated randomly between the first and the 
last key of the HFile. At each position, instead of precisely calculating the 
correct probability for every byte value b, we select a uniformly random byte 
in the allowed [low, high] range. In addition, there is a heuristic 
that determines the positions at which the key has hex characters, and the 
random key contains hex characters at those positions as well.
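
The per-position selection can be sketched as below. This is an illustration only: RandomKeySketch is a hypothetical class, it assumes the first and last keys have equal length, and it omits the hex-character heuristic.

```java
import java.util.Random;

final class RandomKeySketch {
  private RandomKeySketch() {}

  /** Returns a random key k with first <= k <= last in unsigned byte order. */
  static byte[] randomKeyBetween(byte[] first, byte[] last, Random rng) {
    if (first.length != last.length || compareUnsigned(first, last) > 0) {
      throw new IllegalArgumentException("need equal-length keys with first <= last");
    }
    byte[] key = new byte[first.length];
    boolean onLowerBound = true;  // prefix so far equals the prefix of first
    boolean onUpperBound = true;  // prefix so far equals the prefix of last
    for (int i = 0; i < key.length; i++) {
      // A bound only constrains this byte while the prefix still touches it.
      int low = onLowerBound ? (first[i] & 0xFF) : 0;
      int high = onUpperBound ? (last[i] & 0xFF) : 255;
      int b = low + rng.nextInt(high - low + 1);  // uniform in [low, high]
      key[i] = (byte) b;
      onLowerBound = onLowerBound && b == (first[i] & 0xFF);
      onUpperBound = onUpperBound && b == (last[i] & 0xFF);
    }
    return key;
  }

  /** Lexicographic comparison treating bytes as unsigned values. */
  static int compareUnsigned(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
      if (diff != 0) {
        return diff;
      }
    }
    return a.length - b.length;
  }
}
```

Every generated key falls inside the file's key range, but keys are not uniformly distributed over that range, which is the trade-off the paragraph above accepts in exchange for simplicity.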

Example output for the random read workload:
Time: 120 sec, seek/sec: 8290, kv/sec: 30351, kv bytes/sec: 91868121, blk/sec: 
10147, unique keys: 232779






[jira] [Created] (HBASE-4520) Better handling of Bloom filter type discrepancy between HFile and CF config

2011-09-29 Thread Mikhail Bautin (Created) (JIRA)
Better handling of Bloom filter type discrepancy between HFile and CF config


 Key: HBASE-4520
 URL: https://issues.apache.org/jira/browse/HBASE-4520
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Minor
 Fix For: 0.94.0


Modify StoreFile to make it clear where Bloom filter type settings come from. 
We have two sources of truth: (1) HFile; and (2) CF configuration. (1) takes 
precedence in the reader, and (2) takes precedence in the writer.
