[jira] [Commented] (HBASE-3629) Update our thrift to 0.6

2011-04-25 Thread Jake Farrell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024802#comment-13024802
 ] 

Jake Farrell commented on HBASE-3629:
-

Libthrift and libfb303 are now available in the Apache repo 
(http://repo1.maven.org/maven2):

<dependency>
  <groupId>org.apache.thrift</groupId>
  <artifactId>libthrift</artifactId>
  <version>[0.6.1,)</version>
</dependency>

 Update our thrift to 0.6
 

 Key: HBASE-3629
 URL: https://issues.apache.org/jira/browse/HBASE-3629
 Project: HBase
  Issue Type: Task
Reporter: stack
Assignee: Moaz Reyad

 HBASE-3117 was about updating to 0.5.  Moaz Reyad over in that issue is 
 trying to move us to 0.6.  Let's move the 0.6 upgrade effort here.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-3629) Update our thrift to 0.6

2011-04-25 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-3629:
-

Attachment: pom.diff

Here is a patch for our pom.  Needs more before we can commit; need to regen 
using Thrift 0.6.x.

 Update our thrift to 0.6
 

 Key: HBASE-3629
 URL: https://issues.apache.org/jira/browse/HBASE-3629
 Project: HBase
  Issue Type: Task
Reporter: stack
Assignee: Moaz Reyad
 Attachments: pom.diff


 HBASE-3117 was about updating to 0.5.  Moaz Reyad over in that issue is 
 trying to move us to 0.6.  Let's move the 0.6 upgrade effort here.



[jira] [Resolved] (HBASE-2470) Add Scan.setTimeRange() support in Shell

2011-04-25 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-2470.
--

  Resolution: Fixed
Hadoop Flags: [Reviewed]

Committed to TRUNK.  Thank you for the patch Harsh.

 Add Scan.setTimeRange() support in Shell
 

 Key: HBASE-2470
 URL: https://issues.apache.org/jira/browse/HBASE-2470
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.20.3
Reporter: Lars George
Priority: Trivial
  Labels: noob
 Attachments: HBASE-2470.r1.diff, HBASE-2470.r2.diff


 The shell does not support scans by time range.
 This is trivial: simply add two more optional values MINTIMESTAMP and 
 MAXTIMESTAMP and if both are set call Scan.setTimeRange(minStamp, maxStamp).
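The described change boils down to: read two optional args and, only when both are present, call Scan.setTimeRange(minStamp, maxStamp). A minimal sketch in plain Java (option names are taken from this description; the shell itself is Ruby and the committed patch may name things differently):

```java
import java.util.Map;
import java.util.Optional;

class TimeRangeArgs {
    // Returns {minStamp, maxStamp} only when both optional values were
    // supplied; the caller would then invoke Scan.setTimeRange(min, max).
    static Optional<long[]> timeRange(Map<String, Long> args) {
        Long min = args.get("MINTIMESTAMP");
        Long max = args.get("MAXTIMESTAMP");
        if (min == null || max == null) {
            return Optional.empty();  // unset: scan the full time range
        }
        return Optional.of(new long[] { min, max });
    }
}
```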



[jira] [Commented] (HBASE-3812) Tidy up naming consistency and documentation in coprocessor framework

2011-04-25 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024854#comment-13024854
 ] 

jirapos...@reviews.apache.org commented on HBASE-3812:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/660/
---

Review request for hbase and Gary Helmling.


Summary
---

As described at hbase-3812:
* rename BaseRegionObserverCoprocessor -> BaseRegionObserver
* in BaseMasterObserver, rename ObserverContext parameter variable from env 
to c or ctx
* unnecessary public modifier for methods in RegionObserver interface


This addresses bug hbase-3812.
https://issues.apache.org/jira/browse/hbase-3812


Diffs
-

  src/main/java/org/apache/hadoop/hbase/coprocessor/BaseMasterObserver.java 
8df6aa4 
  src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.java 
PRE-CREATION 
  
src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserverCoprocessor.java
 a8c7cb2 
  src/main/java/org/apache/hadoop/hbase/coprocessor/RegionObserver.java 5c5d18d 
  
src/test/java/org/apache/hadoop/hbase/coprocessor/SampleRegionWALObserver.java 
c1fe58d 
  src/test/java/org/apache/hadoop/hbase/coprocessor/SimpleRegionObserver.java 
e201b3f 
  
src/test/java/org/apache/hadoop/hbase/coprocessor/TestCoprocessorInterface.java 
0a07e03 
  
src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionObserverStacking.java
 20a8d7d 
  src/test/java/org/apache/hadoop/hbase/coprocessor/TestWALCoprocessors.java 
27c38f9 
  src/test/java/org/apache/hadoop/hbase/coprocessor/TestWALObserver.java 
PRE-CREATION 

Diff: https://reviews.apache.org/r/660/diff


Testing
---

Tests passed locally.


Thanks,

Mingjie



 Tidy up naming consistency and documentation in coprocessor framework
 -

 Key: HBASE-3812
 URL: https://issues.apache.org/jira/browse/HBASE-3812
 Project: HBase
  Issue Type: Improvement
  Components: coprocessors
Reporter: Gary Helmling
Priority: Critical
 Fix For: 0.92.0


 We have a few naming inconsistencies in the coprocessor API and some stale 
 javadocs that have been spotted by Lars George as he digs through it.  We 
 should clean these up before we have an official release and are forced to go 
 through a round of deprecation to make any changes.
 Current items on the list:
  * rename BaseRegionObserverCoprocessor -> BaseRegionObserver
  * in BaseMasterObserver, rename ObserverContext parameter variable from 
 env to c or ctx
  * unnecessary public modifier for methods in RegionObserver interface
 As part of this, we should take a pass through the javadocs and verify they 
 are up to date with what is currently implemented.
 Please tack on other cosmetic changes or inconsistencies as you find them.
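The third bullet is purely cosmetic because Java interface methods are implicitly public. A toy illustration (names hypothetical, not the real coprocessor API):

```java
// Stand-in for an observer interface; the names are illustrative only.
interface ObserverLike {
    void preGet(String ctx);          // implicitly public abstract
    // public void preGet(String c); // the redundant form being removed
}

class BaseObserverLike implements ObserverLike {
    // Implementing classes must still declare the method public.
    public void preGet(String ctx) { /* no-op default */ }
}
```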



[jira] [Commented] (HBASE-3812) Tidy up naming consistency and documentation in coprocessor framework

2011-04-25 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024867#comment-13024867
 ] 

jirapos...@reviews.apache.org commented on HBASE-3812:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/660/#review534
---

Ship it!


+1 on commit (Minors below to resolve on commit)


src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.java
https://reviews.apache.org/r/660/#comment

2011



src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.java
https://reviews.apache.org/r/660/#comment1112

I see RegionObserver but not 'Coprocessor'.  Is there a CP Interface?



src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.java
https://reviews.apache.org/r/660/#comment1113

you -> your



src/main/java/org/apache/hadoop/hbase/coprocessor/RegionObserver.java
https://reviews.apache.org/r/660/#comment1114

Good



src/test/java/org/apache/hadoop/hbase/coprocessor/TestWALObserver.java
https://reviews.apache.org/r/660/#comment1115

2011


- Michael


On 2011-04-25 17:11:22, Mingjie Lai wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/660/
bq.  ---
bq.  
bq.  (Updated 2011-04-25 17:11:22)
bq.  
bq.  
bq.  Review request for hbase and Gary Helmling.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  As described at hbase-3812:
bq.  * rename BaseRegionObserverCoprocessor -> BaseRegionObserver
bq.  * in BaseMasterObserver, rename ObserverContext parameter variable from 
env to c or ctx
bq.  * unnecessary public modifier for methods in RegionObserver interface
bq.  
bq.  
bq.  This addresses bug hbase-3812.
bq.  https://issues.apache.org/jira/browse/hbase-3812
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.
src/main/java/org/apache/hadoop/hbase/coprocessor/BaseMasterObserver.java 
8df6aa4 
bq.
src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.java 
PRE-CREATION 
bq.
src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserverCoprocessor.java
 a8c7cb2 
bq.src/main/java/org/apache/hadoop/hbase/coprocessor/RegionObserver.java 
5c5d18d 
bq.
src/test/java/org/apache/hadoop/hbase/coprocessor/SampleRegionWALObserver.java 
c1fe58d 
bq.
src/test/java/org/apache/hadoop/hbase/coprocessor/SimpleRegionObserver.java 
e201b3f 
bq.
src/test/java/org/apache/hadoop/hbase/coprocessor/TestCoprocessorInterface.java 
0a07e03 
bq.
src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionObserverStacking.java
 20a8d7d 
bq.
src/test/java/org/apache/hadoop/hbase/coprocessor/TestWALCoprocessors.java 
27c38f9 
bq.src/test/java/org/apache/hadoop/hbase/coprocessor/TestWALObserver.java 
PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/660/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  Tests passed locally.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Mingjie
bq.  
bq.



 Tidy up naming consistency and documentation in coprocessor framework
 -

 Key: HBASE-3812
 URL: https://issues.apache.org/jira/browse/HBASE-3812
 Project: HBase
  Issue Type: Improvement
  Components: coprocessors
Reporter: Gary Helmling
Priority: Critical
 Fix For: 0.92.0


 We have a few naming inconsistencies in the coprocessor API and some stale 
 javadocs that have been spotted by Lars George as he digs through it.  We 
 should clean these up before we have an official release and are forced to go 
 through a round of deprecation to make any changes.
 Current items on the list:
  * rename BaseRegionObserverCoprocessor -> BaseRegionObserver
  * in BaseMasterObserver, rename ObserverContext parameter variable from 
 env to c or ctx
  * unnecessary public modifier for methods in RegionObserver interface
 As part of this, we should take a pass through the javadocs and verify they 
 are up to date with what is currently implemented.
 Please tack on other cosmetic changes or inconsistencies as you find them.



[jira] [Resolved] (HBASE-3539) Improve shell help to reflect all possible options

2011-04-25 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-3539.
--

  Resolution: Fixed
Assignee: Harsh J Chouraria
Hadoop Flags: [Reviewed]

Applied to TRUNK.  Thank you for the patch Harsh (I added you as a contributor 
and assigned you this issue).

 Improve shell help to reflect all possible options
 --

 Key: HBASE-3539
 URL: https://issues.apache.org/jira/browse/HBASE-3539
 Project: HBase
  Issue Type: Bug
  Components: shell
Affects Versions: 0.90.0
Reporter: Lars George
Assignee: Harsh J Chouraria
Priority: Trivial
 Fix For: 0.92.0

 Attachments: HBASE-3539.r1.diff


 The shell is not consistent in its help texts. For example:
 {code}
 Scan a table; pass table name and optionally a dictionary of scanner
 specifications.  Scanner specifications may include one or more of
 the following: LIMIT, STARTROW, STOPROW, TIMESTAMP, or COLUMNS.  If
 no columns are specified, all columns will be scanned.
 {code}
 but in the code you have
 {code}
  filter = args[FILTER]
  startrow = args[STARTROW] || ''
  stoprow = args[STOPROW]
  timestamp = args[TIMESTAMP]
  columns = args[COLUMNS] || args[COLUMN] || get_all_columns
  cache = args[CACHE_BLOCKS] || true
  versions = args[VERSIONS] || 1
 {code}
 VERSIONS is missing from the help. 
 Check all commands and make sure all options are stated and examples given.
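Omissions like VERSIONS are easy to catch mechanically: diff the option names the code reads against the help text. A hedged sketch (illustrative Java, not the shell's Ruby):

```java
import java.util.ArrayList;
import java.util.List;

class HelpAudit {
    // Returns option names used by the code but absent from the help text.
    static List<String> missingFromHelp(String helpText, List<String> options) {
        List<String> missing = new ArrayList<>();
        for (String opt : options) {
            if (!helpText.contains(opt)) {
                missing.add(opt);
            }
        }
        return missing;
    }
}
```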



[jira] [Assigned] (HBASE-2470) Add Scan.setTimeRange() support in Shell

2011-04-25 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack reassigned HBASE-2470:


Assignee: Harsh J Chouraria

Assigning issue to Harsh

 Add Scan.setTimeRange() support in Shell
 

 Key: HBASE-2470
 URL: https://issues.apache.org/jira/browse/HBASE-2470
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.20.3
Reporter: Lars George
Assignee: Harsh J Chouraria
Priority: Trivial
  Labels: noob
 Attachments: HBASE-2470.r1.diff, HBASE-2470.r2.diff


 The shell does not support scans by time range.
 This is trivial: simply add two more optional values MINTIMESTAMP and 
 MAXTIMESTAMP and if both are set call Scan.setTimeRange(minStamp, maxStamp).



[jira] [Resolved] (HBASE-3817) HBase Shell has an issue accepting FILTER for the 'scan' command.

2011-04-25 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-3817.
--

  Resolution: Fixed
Assignee: Harsh J Chouraria
Hadoop Flags: [Reviewed]

Committed to branch and trunk.  Thank you for the patch Harsh.

 HBase Shell has an issue accepting FILTER for the 'scan' command.
 -

 Key: HBASE-3817
 URL: https://issues.apache.org/jira/browse/HBASE-3817
 Project: HBase
  Issue Type: Bug
  Components: shell
Affects Versions: 0.90.2
 Environment: OSX/Ruby
Reporter: Harsh J Chouraria
Assignee: Harsh J Chouraria
Priority: Trivial
  Labels: filter, scan, shell
 Fix For: 0.90.3, 0.92.0

 Attachments: HBASE-3817.r1.diff


 Stack had encountered+fixed an issue nearly a couple of years ago related to 
 FILTER not being accepted by the 'scan' command in the shell. This, however, 
 didn't make it to the trunk I believe. I hit it today while revamping some 
 docs for HBASE-3539.
 The thread where Stack had posted a patch: 
 http://mail-archives.apache.org/mod_mbox/hbase-user/200912.mbox/%3c7c962aed0912181049l2f9110c3q43b1e3d897a27...@mail.gmail.com%3E



[jira] [Updated] (HBASE-2470) Add Scan.setTimeRange() support in Shell

2011-04-25 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-2470:
-

Fix Version/s: 0.90.3

Brought back into branch.

 Add Scan.setTimeRange() support in Shell
 

 Key: HBASE-2470
 URL: https://issues.apache.org/jira/browse/HBASE-2470
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.20.3
Reporter: Lars George
Assignee: Harsh J Chouraria
Priority: Trivial
  Labels: noob
 Fix For: 0.90.3

 Attachments: HBASE-2470.r1.diff, HBASE-2470.r2.diff


 The shell does not support scans by time range.
 This is trivial: simply add two more optional values MINTIMESTAMP and 
 MAXTIMESTAMP and if both are set call Scan.setTimeRange(minStamp, maxStamp).



[jira] [Resolved] (HBASE-3634) Fix JavaDoc for put(List<Put> puts) in HTableInterface

2011-04-25 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-3634.
--

   Resolution: Fixed
Fix Version/s: (was: 0.92.0)
   0.90.3
 Assignee: Harsh J Chouraria
 Hadoop Flags: [Reviewed]

Applied to branch and trunk.  Thank you for the patch Harsh.

 Fix JavaDoc for put(List<Put> puts) in HTableInterface
 --

 Key: HBASE-3634
 URL: https://issues.apache.org/jira/browse/HBASE-3634
 Project: HBase
  Issue Type: Improvement
  Components: client
Affects Versions: 0.90.1
Reporter: Lars George
Assignee: Harsh J Chouraria
Priority: Trivial
 Fix For: 0.90.3

 Attachments: HBASE-3634.r1.diff


 We say this in the interface:
 {code}
   /**
* Puts some data in the table, in batch.
* <p>
* If {@link #isAutoFlush isAutoFlush} is false, the update is buffered
* until the internal buffer is full.
* @param puts The list of mutations to apply.  The list gets modified by 
 this
* method (in particular it gets re-ordered, so the order in which the 
 elements
* are inserted in the list gives no guarantee as to the order in which the
* {@link Put}s are executed).
* @throws IOException if a remote or network exception occurs. In that case
* the {@code puts} argument will contain the {@link Put} instances that
* have not be successfully applied.
* @since 0.20.0
*/
   void put(List<Put> puts) throws IOException;
 {code}
 This is outdated and needs to be updated to reflect that this is nothing 
 but a client-side iteration over all puts, using the write buffer to 
 aggregate them into one RPC. The list is never modified and after the call contains 
 the same number of elements.
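The corrected behavior reduces to a client-side loop that funnels puts through a write buffer and ships each full buffer as one aggregated batch, leaving the caller's list untouched. A toy model (illustrative names, not HBase's HTable code):

```java
import java.util.ArrayList;
import java.util.List;

class BufferedPuts {
    final List<String> writeBuffer = new ArrayList<>();
    final List<List<String>> rpcBatches = new ArrayList<>();  // stands in for RPCs sent
    final int flushSize;

    BufferedPuts(int flushSize) {
        this.flushSize = flushSize;
    }

    // Client-side iteration; the input list is never modified.
    void put(List<String> puts) {
        for (String p : puts) {
            writeBuffer.add(p);
            if (writeBuffer.size() >= flushSize) {
                flush();
            }
        }
    }

    void flush() {
        if (!writeBuffer.isEmpty()) {
            rpcBatches.add(new ArrayList<>(writeBuffer));  // one aggregated "RPC"
            writeBuffer.clear();
        }
    }
}
```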



[jira] [Commented] (HBASE-3812) Tidy up naming consistency and documentation in coprocessor framework

2011-04-25 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024886#comment-13024886
 ] 

jirapos...@reviews.apache.org commented on HBASE-3812:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/660/
---

(Updated 2011-04-25 17:58:04.147369)


Review request for hbase and Gary Helmling.


Changes
---

Changes according to Stack's review.


Summary
---

As described at hbase-3812:
* rename BaseRegionObserverCoprocessor -> BaseRegionObserver
* in BaseMasterObserver, rename ObserverContext parameter variable from env 
to c or ctx
* unnecessary public modifier for methods in RegionObserver interface


This addresses bug hbase-3812.
https://issues.apache.org/jira/browse/hbase-3812


Diffs (updated)
-

  src/main/java/org/apache/hadoop/hbase/coprocessor/BaseMasterObserver.java 
8df6aa4 
  src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.java 
PRE-CREATION 
  
src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserverCoprocessor.java
 a8c7cb2 
  src/main/java/org/apache/hadoop/hbase/coprocessor/RegionObserver.java 5c5d18d 
  
src/test/java/org/apache/hadoop/hbase/coprocessor/SampleRegionWALObserver.java 
c1fe58d 
  src/test/java/org/apache/hadoop/hbase/coprocessor/SimpleRegionObserver.java 
e201b3f 
  
src/test/java/org/apache/hadoop/hbase/coprocessor/TestCoprocessorInterface.java 
0a07e03 
  
src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionObserverStacking.java
 20a8d7d 
  src/test/java/org/apache/hadoop/hbase/coprocessor/TestWALCoprocessors.java 
27c38f9 
  src/test/java/org/apache/hadoop/hbase/coprocessor/TestWALObserver.java 
PRE-CREATION 

Diff: https://reviews.apache.org/r/660/diff


Testing
---

Tests passed locally.


Thanks,

Mingjie



 Tidy up naming consistency and documentation in coprocessor framework
 -

 Key: HBASE-3812
 URL: https://issues.apache.org/jira/browse/HBASE-3812
 Project: HBase
  Issue Type: Improvement
  Components: coprocessors
Reporter: Gary Helmling
Priority: Critical
 Fix For: 0.92.0


 We have a few naming inconsistencies in the coprocessor API and some stale 
 javadocs that have been spotted by Lars George as he digs through it.  We 
 should clean these up before we have an official release and are forced to go 
 through a round of deprecation to make any changes.
 Current items on the list:
  * rename BaseRegionObserverCoprocessor -> BaseRegionObserver
  * in BaseMasterObserver, rename ObserverContext parameter variable from 
 env to c or ctx
  * unnecessary public modifier for methods in RegionObserver interface
 As part of this, we should take a pass through the javadocs and verify they 
 are up to date with what is currently implemented.
 Please tack on other cosmetic changes or inconsistencies as you find them.



[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-25 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024894#comment-13024894
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review536
---


I think it's almost there.  This patch won't compile (see below for why).  I'd 
be game for applying the next version.  This patch has come a long way.  
Let's open new issues for anything found in it after it is applied (this patch 
does include a nice set of tests).

- Michael


On 2011-04-23 16:39:37, Ted Yu wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/585/
bq.  ---
bq.  
bq.  (Updated 2011-04-23 16:39:37)
bq.  
bq.  
bq.  Review request for hbase and Gary Helmling.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  This patch provides reference implementation for aggregate function 
support through Coprocessor framework.
bq.  ColumnInterpreter interface allows client to specify how the value's byte 
array is interpreted.
bq.  Some of the thoughts are summarized at 
http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html
bq.  
bq.  Himanshu Vashishtha started the work. I provided some review comments and 
some of the code.
bq.  
bq.  
bq.  This addresses bug HBASE-1512.
bq.  https://issues.apache.org/jira/browse/HBASE-1512
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
 PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
 PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateImplementation.java 
PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocol.java 
PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java 
PRE-CREATION 
bq.
/src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggregateProtocol.java 
PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/585/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  TestAggFunctions passes.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Ted
bq.  
bq.



 Coprocessors: Support aggregate functions
 -

 Key: HBASE-1512
 URL: https://issues.apache.org/jira/browse/HBASE-1512
 Project: HBase
  Issue Type: Sub-task
  Components: coprocessors
Reporter: stack
 Attachments: 1512.zip, AggregateCpProtocol.java, 
 AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
 patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, 
 patch-1512-6.txt, patch-1512-7.txt, patch-1512.txt


 Chatting with jgray and holstad at the kitchen table about counts, sums, and 
 other aggregating facility, facility generally where you want to calculate 
 some meta info on your table, it seems like it wouldn't be too hard making a 
 filter type that could run a function server-side and return the result ONLY 
 of the aggregation or whatever.
 For example, say you just want to count rows, currently you scan, server 
 returns all data to client and count is done by client counting up row keys.  
 A bunch of time and resources have been wasted returning data that we're not 
 interested in.  With this new filter type, the counting would be done 
 server-side and then it would make up a new result that was the count only 
 (kinda like mysql when you ask it to count, it returns a 'table' with a count 
 column whose value is count of rows).   We could have it so the count was 
 just done per region and return that.  Or we could maybe make a small change 
 in scanner too so that it aggregated the per-region counts.  
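The per-region idea above reduces to: each region computes only its count server-side, and the client sums the per-region results. A toy model (not the coprocessor endpoint API):

```java
import java.util.List;

class RegionCount {
    // Server-side per region: return only the count, never the rows.
    static long countRegion(List<String> regionRows) {
        return regionRows.size();
    }

    // Client-side: aggregate the per-region counts into a table total.
    static long countTable(List<List<String>> regions) {
        long total = 0;
        for (List<String> rows : regions) {
            total += countRegion(rows);
        }
        return total;
    }
}
```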



[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-25 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024893#comment-13024893
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review535
---


I think it's almost there.  This patch won't compile (see below for why).  I'd 
be game for applying the next version.  This patch has come a long way.  
Let's open new issues for anything found in it after it is applied (this patch 
does include a nice set of tests).


/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment1116

I agree w/ the review that suggested we spell out 'agg' rather than use the 
abbreviation, especially in javadoc.



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment1117

'if' should be 'where'.  Should we throw an exception if multiple families 
are supplied, so users are not surprised when they don't get answers for 
multiple families?



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment1118

I'd say leave implementation details out of the public javadoc (the bit 
about calling private methods)



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment1119

Does Scan do this test?  Internally? (I'm not sure)



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
https://reviews.apache.org/r/585/#comment1120

'should' or 'does'?  I think you want to say the latter?



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
https://reviews.apache.org/r/585/#comment1121

Why this javadoc?  Don't we inherit javadoc from the Interface?



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
https://reviews.apache.org/r/585/#comment1122

What's this?  We do nothing on serialization?  Is that right?  It could be.  
It just strikes me as a little odd.  Maybe put a comment in here to say 
'nothing to serialize'?



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java
https://reviews.apache.org/r/585/#comment1123

Do we agree that AggregateCpProtocol was not a good name, that rather it 
should be AggregateProtocol since cp is in the package name?

I see you have a AP later in this patch.  Let me go look at it.

I think I see what's going on... you didn't mean to include this in the 
patch?



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java
https://reviews.apache.org/r/585/#comment1124

Otherwise, this Interface looks good.



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java
https://reviews.apache.org/r/585/#comment1125

Yeah, this class shouldn't be included either.


- Michael


On 2011-04-23 16:39:37, Ted Yu wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/585/
bq.  ---
bq.  
bq.  (Updated 2011-04-23 16:39:37)
bq.  
bq.  
bq.  Review request for hbase and Gary Helmling.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  This patch provides reference implementation for aggregate function 
support through Coprocessor framework.
bq.  ColumnInterpreter interface allows client to specify how the value's byte 
array is interpreted.
bq.  Some of the thoughts are summarized at 
http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html
bq.  
bq.  Himanshu Vashishtha started the work. I provided some review comments and 
some of the code.
bq.  
bq.  
bq.  This addresses bug HBASE-1512.
bq.  https://issues.apache.org/jira/browse/HBASE-1512
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
 PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
 PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateImplementation.java 
PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocol.java 
PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java 
PRE-CREATION 
bq.
/src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggregateProtocol.java 
PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/585/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  TestAggFunctions passes.

[jira] [Assigned] (HBASE-3812) Tidy up naming consistency and documentation in coprocessor framework

2011-04-25 Thread Mingjie Lai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingjie Lai reassigned HBASE-3812:
--

Assignee: Mingjie Lai

 Tidy up naming consistency and documentation in coprocessor framework
 -

 Key: HBASE-3812
 URL: https://issues.apache.org/jira/browse/HBASE-3812
 Project: HBase
  Issue Type: Improvement
  Components: coprocessors
Reporter: Gary Helmling
Assignee: Mingjie Lai
Priority: Critical
 Fix For: 0.92.0


 We have a few naming inconsistencies in the coprocessor API and some stale 
 javadocs that have been spotted by Lars George as he digs through it.  We 
 should clean these up before we have an official release and are forced to go 
 through a round of deprecation to make any changes.
 Current items on the list:
  * rename BaseRegionObserverCoprocessor -> BaseRegionObserver
  * in BaseMasterObserver, rename ObserverContext parameter variable from 
 env to c or ctx
  * unnecessary public modifier for methods in RegionObserver interface
 As part of this, we should take a pass through the javadocs and verify they 
 are up to date with what is currently implemented.
 Please tack on other cosmetic changes or inconsistencies as you find them.



[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-25 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024900#comment-13024900
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review537
---



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
https://reviews.apache.org/r/585/#comment1126

In my version, I have:
public interface ColumnInterpreter<T> extends Serializable {

There is no need to extend Writable.


- Ted
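Ted's Serializable variant might look like this minimal sketch (the real ColumnInterpreter declares more operations than the two shown; LongColumnInterpreter here is a toy that reads a big-endian long):

```java
import java.io.Serializable;
import java.nio.ByteBuffer;

// Hedged sketch of the Serializable form described above; the actual
// interface carries more methods than the two shown here.
interface ColumnInterpreter<T> extends Serializable {
    T getValue(byte[] value);  // interpret the cell's byte array
    T add(T a, T b);           // combine two values during aggregation
}

class LongColumnInterpreter implements ColumnInterpreter<Long> {
    public Long getValue(byte[] value) {
        return ByteBuffer.wrap(value).getLong();  // big-endian 8-byte long
    }

    public Long add(Long a, Long b) {
        return a + b;
    }
}
```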


On 2011-04-23 16:39:37, Ted Yu wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/585/
bq.  ---
bq.  
bq.  (Updated 2011-04-23 16:39:37)
bq.  
bq.  
bq.  Review request for hbase and Gary Helmling.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  This patch provides reference implementation for aggregate function 
support through Coprocessor framework.
bq.  ColumnInterpreter interface allows client to specify how the value's byte 
array is interpreted.
bq.  Some of the thoughts are summarized at 
http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html
bq.  
bq.  Himanshu Vashishtha started the work. I provided some review comments and 
some of the code.
bq.  
bq.  
bq.  This addresses bug HBASE-1512.
bq.  https://issues.apache.org/jira/browse/HBASE-1512
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
 PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
 PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateImplementation.java 
PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocol.java 
PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java 
PRE-CREATION 
bq.
/src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggregateProtocol.java 
PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/585/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  TestAggFunctions passes.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Ted
bq.  
bq.



 Coprocessors: Support aggregate functions
 -

 Key: HBASE-1512
 URL: https://issues.apache.org/jira/browse/HBASE-1512
 Project: HBase
  Issue Type: Sub-task
  Components: coprocessors
Reporter: stack
 Attachments: 1512.zip, AggregateCpProtocol.java, 
 AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
 patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, 
 patch-1512-6.txt, patch-1512-7.txt, patch-1512.txt


 Chatting with jgray and holstad at the kitchen table about counts, sums, and 
 other aggregating facility, facility generally where you want to calculate 
 some meta info on your table, it seems like it wouldn't be too hard making a 
 filter type that could run a function server-side and return the result ONLY 
 of the aggregation or whatever.
 For example, say you just want to count rows, currently you scan, server 
 returns all data to client and count is done by client counting up row keys.  
 A bunch of time and resources have been wasted returning data that we're not 
 interested in.  With this new filter type, the counting would be done 
 server-side and then it would make up a new result that was the count only 
 (kinda like mysql when you ask it to count, it returns a 'table' with a count 
 column whose value is count of rows).   We could have it so the count was 
 just done per region and return that.  Or we could maybe make a small change 
 in scanner too so that it aggregated the per-region counts.  
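
The idea described above — compute the count per region on the server and merge the partial results at the client — can be modeled in miniature as follows. All names here are hypothetical stand-ins; the actual patch's AggregationClient and coprocessor endpoint look different.

```java
import java.util.List;

// Toy model of server-side aggregation: each "region" returns only its
// row count, and the client merges the partial counts, instead of
// shipping every row back to be counted client-side.
class RegionCountDemo {
    // Stand-in for what each region's coprocessor endpoint would compute.
    static long regionRowCount(List<String> regionRows) {
        return regionRows.size();
    }

    // Client-side merge of the per-region partial results.
    static long totalRowCount(List<List<String>> regions) {
        long total = 0;
        for (List<String> region : regions) {
            total += regionRowCount(region);  // only a long crosses the wire
        }
        return total;
    }
}
```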



[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-25 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13024906#comment-13024906
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--



bq.  On 2011-04-25 18:31:17, Ted Yu wrote:
bq.   
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java,
 line 99
bq.   https://reviews.apache.org/r/585/diff/3-6/?file=15641#file15641line99
bq.  
bq.   In my version, I have:
bq.   public interface ColumnInterpreter<T> extends Serializable {
bq.   
bq.   There is no need to extend Writable.
bq.  
bq.  Michael Stack wrote:
bq.  Ok.  Then we should remove that from the patch (Good one Ted)

Whoops.  Sorry, did you say Serializable, Ted, as in java.io.Serializable?  We 
don't want j.i.s.  That's what Writable replaces.


- Michael


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review537
---







[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-25 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13024907#comment-13024907
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review540
---



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
https://reviews.apache.org/r/585/#comment1129

Yes, but it seems Writable is the standard way to go in Hadoop for these 
RPCs. No big issue either way.
Since it doesn't have any state, there is nothing to serialize here. It 
seems we can make this a static util class (?).


- Himanshu







[jira] [Commented] (HBASE-3818) docs adding troubleshooting.xml

2011-04-25 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13024910#comment-13024910
 ] 

Jean-Daniel Cryans commented on HBASE-3818:
---

Excellent job Doug, I'm going to refer people to this document a lot.

 docs adding troubleshooting.xml
 ---

 Key: HBASE-3818
 URL: https://issues.apache.org/jira/browse/HBASE-3818
 Project: HBase
  Issue Type: Improvement
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Minor
 Fix For: 0.90.3

 Attachments: troubleshooting.xml


 Adding troubleshooting.xml in docs.  The prose is a port of JD's 
 troubleshooting writeup into docbook form.  I added shell of common errors at 
 bottom plus added log info at top.



[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-25 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13024911#comment-13024911
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review538
---



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment1127

I exchanged emails with Himanshu about supporting multiple column families.



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
https://reviews.apache.org/r/585/#comment1131

I prefer using Serializable for the interpreter which is stateless.

It is supported by HbaseObjectWritable.



- Ted







[jira] [Resolved] (HBASE-3812) Tidy up naming consistency and documentation in coprocessor framework

2011-04-25 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-3812.
--

  Resolution: Fixed
Hadoop Flags: [Reviewed]

Committed to trunk.  Thanks for the patch Mingjie.

 Tidy up naming consistency and documentation in coprocessor framework
 -

 Key: HBASE-3812
 URL: https://issues.apache.org/jira/browse/HBASE-3812
 Project: HBase
  Issue Type: Improvement
  Components: coprocessors
Reporter: Gary Helmling
Assignee: Mingjie Lai
Priority: Critical
 Fix For: 0.92.0


 We have a few naming inconsistencies in the coprocessor API and some stale 
 javadocs that have been spotted by Lars George as he digs through it.  We 
 should clean these up before we have an official release and are forced to go 
 through a round of deprecation to make any changes.
 Current items on the list:
  * rename BaseRegionObserverCoprocessor - BaseRegionObserver
  * in BaseMasterObserver, rename ObserverContext parameter variable from 
 env to c or ctx
  * unnecessary public modifier for methods in RegionObserver interface
 As part of this, we should take a pass through the javadocs and verify they 
 are up to date with what is currently implemented.
 Please tack on other cosmetic changes or inconsistencies as you find them.



[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-25 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13024917#comment-13024917
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review542
---



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment1132

ok. now it throws an exception when > 1 families are defined.



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment1133

removed the javadoc related to private method calls



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment1134

I didn't want it to propagate to the server just to return an exception. I 
thought that aggregate functions should work on a distinct set of rows, i.e., 
startRow < endRow should always be true (except when they are empty). 
There is a check in HTable's getStartKeysInRange() that throws an exception 
when startRow > endRow, but I needed the boundary condition too.
Please let me know if we should remove this condition.
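
The boundary check described here can be sketched as follows. This is a hypothetical helper, not the patch's code; the byte-wise comparison stands in for HBase's Bytes.compareTo.

```java
// Hypothetical range validation for an aggregation scan:
// reject ranges where startRow >= endRow, unless either bound is
// empty (empty means "unbounded" in HBase scan semantics).
class ScanRangeCheck {
    // Lexicographic unsigned byte[] comparison, as Bytes.compareTo does.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    static void validate(byte[] startRow, byte[] endRow) {
        if (startRow.length == 0 || endRow.length == 0) return;
        if (compare(startRow, endRow) >= 0) {
            throw new IllegalArgumentException("startRow must be < endRow");
        }
    }
}
```

Validating on the client avoids a round trip to the region server just to surface the same exception.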



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java
https://reviews.apache.org/r/585/#comment1135

Yes, its not there in later versions.


- Himanshu






[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-25 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13024919#comment-13024919
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--



bq.  On 2011-04-25 18:58:37, Ted Yu wrote:
bq.   
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java,
 line 99
bq.   https://reviews.apache.org/r/585/diff/3-6/?file=15641#file15641line99
bq.  
bq.   I prefer using Serializable for the interpreter which is stateless.
bq.   
bq.   It is supported by HbaseObjectWritable.
bq.  

My personal opinion is that it will not go over well with others if one uses 
Serializable. It is supported in HBaseObjectWritable only for legacy reasons, 
I believe.


- Himanshu


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review538
---







[jira] [Commented] (HBASE-3818) docs adding troubleshooting.xml

2011-04-25 Thread Doug Meil (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13024924#comment-13024924
 ] 

Doug Meil commented on HBASE-3818:
--

Thanks!



[jira] [Updated] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-25 Thread Himanshu Vashishtha (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Himanshu Vashishtha updated HBASE-1512:
---

Attachment: patch-1512-8.txt

Changes as per Stack's review. 
Major changes include:
a) LongColumnInterpreter still implements Writable (though with empty 
read/write methods). 
b) An exception is thrown in case more than one family is defined.
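
Change (a) — a stateless interpreter that keeps Writable but serializes nothing — looks roughly like this. This is a sketch, not the actual patch code; the Writable interface is redeclared locally so the snippet stands alone without a Hadoop dependency.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Local stand-in for org.apache.hadoop.io.Writable, so the sketch
// compiles without Hadoop on the classpath.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Stateless interpreter: implements Writable for RPC compatibility,
// but its read/write methods are empty because there is no state.
class LongColumnInterpreterSketch implements Writable {
    public Long getValue(byte[] value) {
        // Big-endian decode of an 8-byte value, as Bytes.toLong would do.
        long v = 0;
        for (byte b : value) v = (v << 8) | (b & 0xFF);
        return v;
    }
    public void write(DataOutput out) throws IOException {
        // no state to serialize
    }
    public void readFields(DataInput in) throws IOException {
        // no state to deserialize
    }
}
```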



[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-25 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13024926#comment-13024926
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review546
---



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
https://reviews.apache.org/r/585/#comment1140

There are 10 classes which implement Serializable under 
src/main/java/org/apache/hadoop/hbase/rest/model/

And:
public class Pair<T1, T2> implements Serializable
./src/main/java/org/apache/hadoop/hbase/util/Pair.java
  public static class CustomSerializable implements Serializable {
./src/test/java/org/apache/hadoop/hbase/io/TestHbaseObjectWritable.java


- Ted





 Coprocessors: Support aggregate functions
 -

 Key: HBASE-1512
 URL: https://issues.apache.org/jira/browse/HBASE-1512
 Project: HBase
  Issue Type: Sub-task
  Components: coprocessors
Reporter: stack
 Attachments: 1512.zip, AggregateCpProtocol.java, 
 AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
 patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, 
 patch-1512-6.txt, patch-1512-7.txt, patch-1512-8.txt, patch-1512.txt


 Chatting with jgray and holstad at the kitchen table about counts, sums, and 
 other aggregating facility, facility generally where you want to calculate 
 some meta info on your table, it seems like it wouldn't be too hard making a 
 filter type that could run a function server-side and return the result ONLY 
 of the aggregation or whatever.
 For example, say you just want to count rows, currently you scan, server 
 returns all data to client and count is done by client counting up row keys.  
 A bunch of time and resources have been wasted returning data that we're not 
 interested in.  With this new filter type, the counting would be done 
 server-side and then it would make up a new result that was the count only 
 (kinda like mysql when you ask it to count, it returns a 'table' with a count 
 column whose value is count of rows).   We could have it so the count was 
 just done per region and return that.  Or we could maybe make a small change 
 in scanner too so that it aggregated the per-region counts.  
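The server-side counting idea above can be sketched in plain Java. These are toy classes, not the HBase coprocessor API: each "region" computes its row count locally and returns only that number, and the client merely sums the per-region partial results instead of pulling every row over the wire.

```java
import java.util.Arrays;
import java.util.List;

public class RegionCountSketch {
    // Stand-in for a region: holds rows, exposes only an aggregate.
    static class Region {
        private final List<String> rows;
        Region(String... rows) { this.rows = Arrays.asList(rows); }
        long countRows() { return rows.size(); } // runs "server-side"
    }

    static long rowCount(List<Region> regions) {
        long total = 0;
        for (Region r : regions) {
            total += r.countRows(); // only the count crosses the wire
        }
        return total;
    }

    public static void main(String[] args) {
        List<Region> regions = Arrays.asList(
            new Region("r1", "r2"), new Region("r3"),
            new Region("r4", "r5", "r6"));
        System.out.println(rowCount(regions)); // prints 6
    }
}
```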

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-25 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024932#comment-13024932
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--



bq.  On 2011-04-25 19:44:07, Ted Yu wrote:
bq.   
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java,
 line 99
bq.   https://reviews.apache.org/r/585/diff/3-6/?file=15641#file15641line99
bq.  
bq.   There are 10 classes which implement Serializable under 
src/main/java/org/apache/hadoop/hbase/rest/model/
bq.   
bq.   And:
bq.   public class Pair<T1, T2> implements Serializable
bq.   ./src/main/java/org/apache/hadoop/hbase/util/Pair.java
bq. public static class CustomSerializable implements Serializable {
bq.   
./src/test/java/org/apache/hadoop/hbase/io/TestHbaseObjectWritable.java

My guess is that for REST, it's because of the REST engine we use.  For Pair, we 
should probably change it to be Writable.  Same for CustomSerializable.
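Moving a Pair from Serializable to a Writable-style contract can be sketched as follows. The `Writable` interface here is a local stand-in mirroring Hadoop's `org.apache.hadoop.io.Writable` (write/readFields) so the example runs without Hadoop on the classpath:

```java
import java.io.*;

public class WritablePairSketch {
    // Local stand-in for org.apache.hadoop.io.Writable.
    interface Writable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // A pair serialized field-by-field instead of via Java serialization:
    // no serialVersionUID, no reflection, compact on the wire.
    static class StringPair implements Writable {
        String first, second;
        public void write(DataOutput out) throws IOException {
            out.writeUTF(first);
            out.writeUTF(second);
        }
        public void readFields(DataInput in) throws IOException {
            first = in.readUTF();
            second = in.readUTF();
        }
    }

    public static void main(String[] args) throws IOException {
        StringPair p = new StringPair();
        p.first = "row"; p.second = "family";
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        p.write(new DataOutputStream(buf));

        StringPair copy = new StringPair();
        copy.readFields(new DataInputStream(
            new ByteArrayInputStream(buf.toByteArray())));
        System.out.println(copy.first + "," + copy.second); // prints row,family
    }
}
```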

Otherwise, Ted, if you +1 Himanshu's patch, I'll commit it.


- Michael


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review546
---


On 2011-04-25 19:53:33, Ted Yu wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/585/
bq.  ---
bq.  
bq.  (Updated 2011-04-25 19:53:33)
bq.  
bq.  
bq.  Review request for hbase and Gary Helmling.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  This patch provides reference implementation for aggregate function 
support through Coprocessor framework.
bq.  ColumnInterpreter interface allows client to specify how the value's byte 
array is interpreted.
bq.  Some of the thoughts are summarized at 
http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html
bq.  
bq.  Himanshu Vashishtha started the work. I provided some review comments and 
some of the code.
bq.  
bq.  
bq.  This addresses bug HBASE-1512.
bq.  https://issues.apache.org/jira/browse/HBASE-1512
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq./src/main/java/org/apache/hadoop/hbase/client/CursorCallable.java 
PRE-CREATION 
bq./src/main/java/org/apache/hadoop/hbase/client/CursorCp.java PRE-CREATION 
bq./src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 
1088894 
bq./src/main/java/org/apache/hadoop/hbase/client/HTable.java 1088894 
bq./src/main/java/org/apache/hadoop/hbase/client/HTableInterface.java 
1088894 
bq./src/main/java/org/apache/hadoop/hbase/client/Scan.java 1088894 
bq.
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
 PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
 PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateImplementation.java 
PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocol.java 
PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java 
PRE-CREATION 
bq.
/src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggregateProtocol.java 
PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/585/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  TestAggFunctions passes.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Ted
bq.  
bq.



 Coprocessors: Support aggregate functions
 -

 Key: HBASE-1512
 URL: https://issues.apache.org/jira/browse/HBASE-1512
 Project: HBase
  Issue Type: Sub-task
  Components: coprocessors
Reporter: stack
 Attachments: 1512.zip, AggregateCpProtocol.java, 
 AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
 patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, 
 patch-1512-6.txt, patch-1512-7.txt, patch-1512-8.txt, patch-1512.txt



[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-25 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024933#comment-13024933
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/
---

(Updated 2011-04-25 19:53:33.129920)


Review request for hbase and Gary Helmling.


Changes
---

Changes as per Stack's review.
Major changes include:
a) LongColumnInterpreter still implements Writable (though with empty 
read/write methods).
b) An exception is thrown if more than one family is defined.


Summary
---

This patch provides reference implementation for aggregate function support 
through Coprocessor framework.
ColumnInterpreter interface allows client to specify how the value's byte array 
is interpreted.
Some of the thoughts are summarized at 
http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html
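The ColumnInterpreter idea above can be illustrated with a minimal sketch: the client supplies the rule for turning a cell's byte[] into a typed value, so the aggregation code itself stays type-agnostic. Names below are illustrative, not the committed HBase API:

```java
import java.nio.ByteBuffer;
import java.util.List;

public class InterpreterSketch {
    // Client-supplied rule for interpreting a cell's raw bytes.
    interface ColumnInterpreter<T extends Comparable<T>> {
        T getValue(byte[] cellValue);
    }

    // Interprets an 8-byte big-endian value as a Long.
    static class LongInterpreter implements ColumnInterpreter<Long> {
        public Long getValue(byte[] v) { return ByteBuffer.wrap(v).getLong(); }
    }

    // Generic max over raw cell values, parameterized by the interpreter.
    static <T extends Comparable<T>> T max(List<byte[]> cells,
                                           ColumnInterpreter<T> ci) {
        T best = null;
        for (byte[] c : cells) {
            T v = ci.getValue(c);
            if (best == null || v.compareTo(best) > 0) best = v;
        }
        return best;
    }

    static byte[] toBytes(long v) {
        return ByteBuffer.allocate(8).putLong(v).array();
    }

    public static void main(String[] args) {
        List<byte[]> cells = java.util.Arrays.asList(
            toBytes(3), toBytes(42), toBytes(7));
        System.out.println(max(cells, new LongInterpreter())); // prints 42
    }
}
```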

Himanshu Vashishtha started the work. I provided some review comments and some 
of the code.


This addresses bug HBASE-1512.
https://issues.apache.org/jira/browse/HBASE-1512


Diffs (updated)
-

  /src/main/java/org/apache/hadoop/hbase/client/CursorCallable.java 
PRE-CREATION 
  /src/main/java/org/apache/hadoop/hbase/client/CursorCp.java PRE-CREATION 
  /src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 1088894 
  /src/main/java/org/apache/hadoop/hbase/client/HTable.java 1088894 
  /src/main/java/org/apache/hadoop/hbase/client/HTableInterface.java 1088894 
  /src/main/java/org/apache/hadoop/hbase/client/Scan.java 1088894 
  
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
 PRE-CREATION 
  
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
 PRE-CREATION 
  
/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateImplementation.java 
PRE-CREATION 
  /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocol.java 
PRE-CREATION 
  /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java 
PRE-CREATION 
  /src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggregateProtocol.java 
PRE-CREATION 

Diff: https://reviews.apache.org/r/585/diff


Testing
---

TestAggFunctions passes.


Thanks,

Ted



 Coprocessors: Support aggregate functions
 -

 Key: HBASE-1512
 URL: https://issues.apache.org/jira/browse/HBASE-1512
 Project: HBase
  Issue Type: Sub-task
  Components: coprocessors
Reporter: stack
 Attachments: 1512.zip, AggregateCpProtocol.java, 
 AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
 patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, 
 patch-1512-6.txt, patch-1512-7.txt, patch-1512-8.txt, patch-1512.txt





[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-25 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024934#comment-13024934
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review548
---



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
https://reviews.apache.org/r/585/#comment1145

I am fine with using Writable.


- Ted





 Coprocessors: Support aggregate functions
 -

 Key: HBASE-1512
 URL: https://issues.apache.org/jira/browse/HBASE-1512
 Project: HBase
  Issue Type: Sub-task
  Components: coprocessors
Reporter: stack
 Attachments: 1512.zip, AggregateCpProtocol.java, 
 AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
 patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, 
 patch-1512-6.txt, patch-1512-7.txt, patch-1512-8.txt, patch-1512.txt





[jira] [Commented] (HBASE-3777) Redefine Identity Of HBase Configuration

2011-04-25 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024938#comment-13024938
 ] 

jirapos...@reviews.apache.org commented on HBASE-3777:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/643/#review543
---


Good work, lads (Karthick and Ted reviewing).  Small nitpicks below.  Let's get 
this in if all tests pass.


src/main/java/org/apache/hadoop/hbase/HConstants.java
https://reviews.apache.org/r/643/#comment1136

Copy/paste issue (minor)



src/main/java/org/apache/hadoop/hbase/HConstants.java
https://reviews.apache.org/r/643/#comment1137

Thanks for moving these configs. in here.



src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
https://reviews.apache.org/r/643/#comment1141

This looks like a good change.



src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
https://reviews.apache.org/r/643/#comment1143

Implement Closeable now that you've added close()?



src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
https://reviews.apache.org/r/643/#comment1142

Good



src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
https://reviews.apache.org/r/643/#comment1146

This is painful, but makes sense.



src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
https://reviews.apache.org/r/643/#comment1147

Not important but if closed, just return immediately and then you can save 
indenting whole method.  Not important.  Just style diff.



src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
https://reviews.apache.org/r/643/#comment1148

So, this is just insurance as you say in the issue.  Thats fine I'd say (I 
agree w/ Ted that we shouldn't rely on finalize)



src/main/java/org/apache/hadoop/hbase/client/HTable.java
https://reviews.apache.org/r/643/#comment1149

Good.



src/main/java/org/apache/hadoop/hbase/client/HTable.java
https://reviews.apache.org/r/643/#comment1150

Yeah, this is ugly; it's almost as though you should have a special 
method for it, one that does not up the counters?



src/main/java/org/apache/hadoop/hbase/client/HTable.java
https://reviews.apache.org/r/643/#comment1151

Just remove this.



src/main/java/org/apache/hadoop/hbase/client/HTablePool.java
https://reviews.apache.org/r/643/#comment1152

Just remove.



src/main/java/org/apache/hadoop/hbase/master/HMaster.java
https://reviews.apache.org/r/643/#comment1153

Interesting but I go along w/ it.  Looks like we only made this connection 
for CT?  If so, bad design fixed by your CT change.



src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
https://reviews.apache.org/r/643/#comment1154

ditto


- Michael


On 2011-04-22 21:16:59, Karthick Sankarachary wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/643/
bq.  ---
bq.  
bq.  (Updated 2011-04-22 21:16:59)
bq.  
bq.  
bq.  Review request for hbase and Ted Yu.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Judging from the javadoc in HConnectionManager, sharing connections across 
multiple clients going to the same cluster is supposedly a good thing. However, 
the fact that there is a one-to-one mapping between a configuration and 
connection instance, kind of works against that goal. Specifically, when you 
create HTable instances using a given Configuration instance and a copy 
thereof, we end up with two distinct HConnection instances under the covers. Is 
this really expected behavior, especially given that the configuration instance 
gets cloned a lot?
bq.  
bq.  Here, I'd like to play devil's advocate and propose that we deep-compare 
HBaseConfiguration instances, so that multiple HBaseConfiguration instances 
that have the same properties map to the same HConnection instance. In case one 
is concerned that a single HConnection is insufficient for sharing amongst 
clients, to quote the javadoc, then one should be able to mark a given 
HBaseConfiguration instance as being uniquely identifiable.
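The identity-vs-content caching question can be demonstrated with plain Java maps (not HBase classes): keying the cache on object identity yields two "connections" for two property-equal configs, while keying on content yields one.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

public class ConfigIdentitySketch {
    static int connectionsCreated = 0;
    static class Conn {}

    // Look up a connection in the cache, creating one on a miss.
    static Conn get(Map<Object, Conn> cache, Object key) {
        return cache.computeIfAbsent(key, k -> {
            connectionsCreated++;
            return new Conn();
        });
    }

    public static void main(String[] args) {
        Map<String, String> confA = new HashMap<>();
        confA.put("hbase.zookeeper.quorum", "zk1");
        Map<String, String> confB = new HashMap<>(confA); // copy, equal content

        Map<Object, Conn> byIdentity = new IdentityHashMap<>();
        get(byIdentity, confA);
        get(byIdentity, confB);
        System.out.println(connectionsCreated); // 2: the copy misses the cache

        connectionsCreated = 0;
        Map<Object, Conn> byContent = new HashMap<>(); // equals()/hashCode()
        get(byContent, confA);
        get(byContent, confB);
        System.out.println(connectionsCreated); // 1: equal configs share
    }
}
```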
bq.  
bq.  
bq.  This addresses bug HBASE-3777.
bq.  https://issues.apache.org/jira/browse/HBASE-3777
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/main/java/org/apache/hadoop/hbase/HConstants.java 5701639 
bq.src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java 
be31179 
bq.src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java afb666a 
bq.src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 
c348f7a 
bq.src/main/java/org/apache/hadoop/hbase/client/HTable.java edacf56 
bq.

[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-25 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024940#comment-13024940
 ] 

stack commented on HBASE-1512:
--

@Himanshu I think you uploaded the wrong patch for v8.  It's all about 
CursorCallable...

 Coprocessors: Support aggregate functions
 -

 Key: HBASE-1512
 URL: https://issues.apache.org/jira/browse/HBASE-1512
 Project: HBase
  Issue Type: Sub-task
  Components: coprocessors
Reporter: stack
 Attachments: 1512.zip, AggregateCpProtocol.java, 
 AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
 patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, 
 patch-1512-6.txt, patch-1512-7.txt, patch-1512-8.txt, patch-1512.txt





[jira] [Commented] (HBASE-3766) Manage Connections Through Reference Counts

2011-04-25 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024941#comment-13024941
 ] 

stack commented on HBASE-3766:
--

Is this issue still relevant after the addition of ref counting over in 
HBASE-3777?

 Manage Connections Through Reference Counts
 ---

 Key: HBASE-3766
 URL: https://issues.apache.org/jira/browse/HBASE-3766
 Project: HBase
  Issue Type: Improvement
  Components: client
Affects Versions: 0.90.2
Reporter: Karthick Sankarachary
Assignee: Karthick Sankarachary
Priority: Minor
 Fix For: 0.92.0

 Attachments: HBASE-3766.patch, HBASE-3766.patch, HBASE-3766.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 As of now, the onus is on the developer to explicitly close connections once 
 they're done using them. Furthermore, since connections are managed based on 
 the identity of the {{Configuration}} object, one is forced to clone the 
 configuration object in order to be able to clean it up safely (for a case in 
 point, see {{HTablePool's}} constructor). As a matter of fact, this issue has 
 been well-documented in the HConnectionManager class:
 {quote}
 But sharing connections makes clean up of {{HConnection}} instances a little 
 awkward.  Currently, clients cleanup by calling 
 {{#deleteConnection(Configuration, boolean)}}.  This will shutdown the 
 zookeeper connection the {{HConnection}} was using and clean up all 
 {{HConnection}} resources as well as stopping proxies to servers out on the 
 cluster. Not running the cleanup will not end the world; it'll just stall the 
 closeup some and spew some zookeeper connection failed messages into the log. 
  Running the cleanup on a {{HConnection}} that is subsequently used by 
 another will cause breakage so be careful running cleanup. To create a 
 {{HConnection}} that is not shared by others, you can create a new 
 {{Configuration}} instance, pass this new instance to 
 {{#getConnection(Configuration)}}, and then when done, close it up by doing 
 something like the following:
 {code}
  Configuration newConfig = new Configuration(originalConf);
  HConnection connection = HConnectionManager.getConnection(newConfig);
  // Use the connection to your hearts' delight and then when done...
  HConnectionManager.deleteConnection(newConfig, true);
 {code}
 {quote}
 Here, we propose a reference-count based mechanism for managing connections 
 that will allow {{HTables}} to clean up after themselves. In particular, we 
 extend the {{HConnectionInterface}} interface so as to facilitate reference 
 counting, where, typically, a reference indicates that it is being used by a 
 {{HTable}}, although there could be other sources. 
 To elaborate, when a HTable is constructed, it increments the reference count 
 on the connection given to it. Similarly, when it is closed, that reference 
 count is decremented. In the event there are no more references to that 
 connection, {{HTable#close}} takes it upon itself to delete the connection, 
 thereby sparing the developer from doing so.
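The reference-counting scheme described above can be sketched with toy classes (not the actual HBase patch): each table retains the shared connection on construction and releases it on close; the last release tears the connection down.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RefCountSketch {
    static class Connection {
        private final AtomicInteger refs = new AtomicInteger(0);
        boolean closed = false;
        void retain() { refs.incrementAndGet(); }
        void release() {
            // Last reference gone: tear down (real code would stop ZK, proxies).
            if (refs.decrementAndGet() == 0) closed = true;
        }
    }

    static class Table implements AutoCloseable {
        private final Connection conn;
        Table(Connection conn) { this.conn = conn; conn.retain(); }
        public void close() { conn.release(); }
    }

    public static void main(String[] args) {
        Connection shared = new Connection();
        Table t1 = new Table(shared);
        Table t2 = new Table(shared);
        t1.close();
        System.out.println(shared.closed); // false: t2 still holds a reference
        t2.close();
        System.out.println(shared.closed); // true: last reference released
    }
}
```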

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3670) Fix error handling in get(List<Get> gets)

2011-04-25 Thread Harsh J Chouraria (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024942#comment-13024942
 ] 

Harsh J Chouraria commented on HBASE-3670:
--

OK, I believe it is alright if we mention that an exception is thrown (it's 
always better that the dev knows the causes of a failure). Perhaps we can add it 
as an @throws along with IOException, but I am not sure if that's considered an 
API breakage.

I'm now confused about which class's docs to change here: HTableInterface or 
HTable itself? Both have duplicate docs in places, and they differ in some.

 Fix error handling in get(List<Get> gets)
 -

 Key: HBASE-3670
 URL: https://issues.apache.org/jira/browse/HBASE-3670
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.92.0
Reporter: Lars George
 Fix For: 0.92.0


 See HBASE-3634 for details. The get(List<Get> gets) call needs to catch (or 
 rather use a try/finally) the exception thrown by batch() and copy the Result 
 instances over and return them. If that is not intended then we need to fix the 
 JavaDoc in HTableInterface to reflect the new behavior. 
 In general it seems to make sense to check the various methods (list based 
 put, get, delete compared to batch) and agree on the correct behavior.



[jira] [Updated] (HBASE-3766) Manage Connections Through Reference Counts

2011-04-25 Thread Karthick Sankarachary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthick Sankarachary updated HBASE-3766:
-

Resolution: Duplicate
Status: Resolved  (was: Patch Available)

 Manage Connections Through Reference Counts
 ---

 Key: HBASE-3766
 URL: https://issues.apache.org/jira/browse/HBASE-3766
 Project: HBase
  Issue Type: Improvement
  Components: client
Affects Versions: 0.90.2
Reporter: Karthick Sankarachary
Assignee: Karthick Sankarachary
Priority: Minor
 Fix For: 0.92.0

 Attachments: HBASE-3766.patch, HBASE-3766.patch, HBASE-3766.patch

   Original Estimate: 1h
  Remaining Estimate: 1h




[jira] [Commented] (HBASE-3766) Manage Connections Through Reference Counts

2011-04-25 Thread Karthick Sankarachary (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024945#comment-13024945
 ] 

Karthick Sankarachary commented on HBASE-3766:
--

No, it's not relevant anymore, given that we ended up doing it as part of 
HBASE-3777. For the sake of completeness, I'll add a comment here later on how 
reference counting ended up being implemented.

 Manage Connections Through Reference Counts
 ---

 Key: HBASE-3766
 URL: https://issues.apache.org/jira/browse/HBASE-3766
 Project: HBase
  Issue Type: Improvement
  Components: client
Affects Versions: 0.90.2
Reporter: Karthick Sankarachary
Assignee: Karthick Sankarachary
Priority: Minor
 Fix For: 0.92.0

 Attachments: HBASE-3766.patch, HBASE-3766.patch, HBASE-3766.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 As of now, the onus is on the developer to explicitly close connections once 
 they're done using them. Furthermore, since connections are managed based on 
 the identity of the {{Configuration}} object, one is forced to clone the 
 configuration object in order to be able to clean it up safely (for a case in 
 point, see {{HTablePool's}} constructor). As a matter of fact, this issue has 
 been well-documented in the HConnectionManager class:
 {quote}
 But sharing connections makes clean up of {{HConnection}} instances a little 
 awkward.  Currently, clients cleanup by calling 
 {{#deleteConnection(Configuration, boolean)}}.  This will shutdown the 
 zookeeper connection the {{HConnection}} was using and clean up all 
 {{HConnection}} resources as well as stopping proxies to servers out on the 
 cluster. Not running the cleanup will not end the world; it'll just stall the 
 closeup some and spew some zookeeper connection failed messages into the log. 
  Running the cleanup on a {{HConnection}} that is subsequently used by 
 another will cause breakage so be careful running cleanup. To create a 
 {{HConnection}} that is not shared by others, you can create a new 
 {{Configuration}} instance, pass this new instance to 
 {{#getConnection(Configuration)}}, and then when done, close it up by doing 
 something like the following:
 {code}
  Configuration newConfig = new Configuration(originalConf);
  HConnection connection = HConnectionManager.getConnection(newConfig);
  // Use the connection to your hearts' delight and then when done...
  HConnectionManager.deleteConnection(newConfig, true);
 {code}
 {quote}
 Here, we propose a reference-count based mechanism for managing connections 
 that will allow {{HTables}} to clean up after themselves. In particular, we 
 extend the {{HConnectionInterface}} interface so as to facilitate reference 
 counting, where, typically, a reference indicates that it is being used by a 
 {{HTable}}, although there could be other sources. 
 To elaborate, when a HTable is constructed, it increments the reference count 
 on the connection given to it. Similarly, when it is closed, that reference 
 count is decremented. In the event there are no more references to that 
 connection, {{HTable#close}} takes it upon itself to delete the connection, 
 thereby sparing the developer from doing so.
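The reference-counting lifecycle described above can be sketched in plain Java. This is a hedged model, not the committed HBase API: names like {{incCount}}/{{decCount}} and the {{deleted}} flag are assumptions standing in for {{HConnectionManager}} cleanup.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hedged sketch of the proposal: each HTable-like user bumps the count on
// construction and drops it on close; the last close deletes the connection,
// sparing the developer an explicit deleteConnection() call.
class CountedConnection {
    private final AtomicInteger refCount = new AtomicInteger(0);
    private volatile boolean deleted = false;

    // Called when a table starts using this connection.
    void incCount() {
        refCount.incrementAndGet();
    }

    // Called from the table's close(); at zero references the connection
    // cleans itself up (stands in for HConnectionManager cleanup).
    void decCount() {
        if (refCount.decrementAndGet() <= 0) {
            deleted = true;
        }
    }

    boolean isDeleted() {
        return deleted;
    }
}
```

With two tables sharing the connection, closing the first leaves it alive; closing the second triggers the cleanup.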

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3670) Fix error handling in get(List&lt;Get&gt; gets)

2011-04-25 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13024948#comment-13024948
 ] 

stack commented on HBASE-3670:
--

It won't break the API if what you add is a subclass of IOE.

Change HTI I'd say.  Maybe file another issue if HTI and HT differ.  Doc should 
be in one only (FYI, HTI came after HT).

 Fix error handling in get(List&lt;Get&gt; gets)
 -

 Key: HBASE-3670
 URL: https://issues.apache.org/jira/browse/HBASE-3670
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.92.0
Reporter: Lars George
 Fix For: 0.92.0


 See HBASE-3634 for details. The get(List&lt;Get&gt; gets) call needs to catch (or 
 rather use a try/finally around) the exception thrown by batch(), copy the 
 Result instances over, and return them. If that is not intended, then we need 
 to fix the JavaDoc in HTableInterface to reflect the new behavior. 
 In general it seems to make sense to check the various methods (list based 
 put, get, delete compared to batch) and agree on the correct behavior.
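The try/finally fix suggested above can be modeled with a self-contained sketch. The names here ({{BatchGetModel}}, the simulated {{batch()}}) are illustrative assumptions, not the real HBase client API; the point is only that results gathered before the failure survive for the caller.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hedged model of the suggested fix: batch() fills in results as it goes,
// and the try/finally lets the caller keep partial Results when it throws.
class BatchGetModel {
    // Simulated batch(): populates results until it hits failAt, then throws.
    static void batch(List<String> gets, Object[] results, int failAt)
            throws IOException {
        for (int i = 0; i < gets.size(); i++) {
            if (i == failAt) {
                throw new IOException("simulated region error at " + gets.get(i));
            }
            results[i] = "value-of-" + gets.get(i);
        }
    }

    // get(List<Get>) analogue: instead of discarding everything with the
    // exception, the partial results remain available to the caller.
    static Object[] get(List<String> gets) {
        Object[] results = new Object[gets.size()];
        try {
            batch(gets, results, 2); // fail on the third get
        } catch (IOException e) {
            // results[0..1] survived; a real client could log or rethrow here
        }
        return results;
    }
}
```

Calling {{get}} on four keys with a failure at index 2 leaves the first two slots populated and the rest null.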

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-25 Thread Himanshu Vashishtha (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Himanshu Vashishtha updated HBASE-1512:
---

Attachment: patch-1512-9.txt

My bad, uploaded a wrong patch. Sorry for the mishap Stack.

 Coprocessors: Support aggregate functions
 -

 Key: HBASE-1512
 URL: https://issues.apache.org/jira/browse/HBASE-1512
 Project: HBase
  Issue Type: Sub-task
  Components: coprocessors
Reporter: stack
 Attachments: 1512.zip, AggregateCpProtocol.java, 
 AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
 patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, 
 patch-1512-6.txt, patch-1512-7.txt, patch-1512-8.txt, patch-1512-9.txt, 
 patch-1512.txt


 Chatting with jgray and holstad at the kitchen table about counts, sums, and 
 other aggregating facility, facility generally where you want to calculate 
 some meta info on your table, it seems like it wouldn't be too hard making a 
 filter type that could run a function server-side and return the result ONLY 
 of the aggregation or whatever.
 For example, say you just want to count rows, currently you scan, server 
 returns all data to client and count is done by client counting up row keys.  
 A bunch of time and resources have been wasted returning data that we're not 
 interested in.  With this new filter type, the counting would be done 
 server-side and then it would make up a new result that was the count only 
 (kinda like mysql when you ask it to count, it returns a 'table' with a count 
 column whose value is count of rows).   We could have it so the count was 
 just done per region and return that.  Or we could maybe make a small change 
 in scanner too so that it aggregated the per-region counts.  
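The idea above (count per region server-side, then aggregate the per-region counts at the client) can be sketched in plain Java. This is a hedged illustration, not the coprocessor API; regions are modeled as simple row lists.

```java
import java.util.Arrays;
import java.util.List;

// Hedged sketch: each region ships back only a number instead of its rows;
// the client sums the per-region counts.
class RegionCountSketch {
    // "Server side": count rows within one region, return only the count.
    static long countRegion(List<String> regionRows) {
        return regionRows.size();
    }

    // "Client side": aggregate the per-region counts -- no row data crosses
    // the wire, which is the whole point of the proposal.
    static long countTable(List<List<String>> regions) {
        long total = 0;
        for (List<String> region : regions) {
            total += countRegion(region);
        }
        return total;
    }
}
```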

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3777) Redefine Identity Of HBase Configuration

2011-04-25 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13024968#comment-13024968
 ] 

jirapos...@reviews.apache.org commented on HBASE-3777:
--



bq.  On 2011-04-23 03:02:04, Ted Yu wrote:
bq.   src/test/java/org/apache/hadoop/hbase/mapreduce/TestTableMapReduce.java, 
line 147
bq.   https://reviews.apache.org/r/643/diff/3/?file=16927#file16927line147
bq.  
bq.   Here we mix user code with test cluster management code.
bq.   I think table.close() should be called first in the finally block.

Closing the table before shutting down the cluster seems appropriate.


- Karthick


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/643/#review532
---


On 2011-04-22 21:16:59, Karthick Sankarachary wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/643/
bq.  ---
bq.  
bq.  (Updated 2011-04-22 21:16:59)
bq.  
bq.  
bq.  Review request for hbase and Ted Yu.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Judging from the javadoc in HConnectionManager, sharing connections across 
multiple clients going to the same cluster is supposedly a good thing. However, 
the fact that there is a one-to-one mapping between a configuration and 
connection instance, kind of works against that goal. Specifically, when you 
create HTable instances using a given Configuration instance and a copy 
thereof, we end up with two distinct HConnection instances under the covers. Is 
this really expected behavior, especially given that the configuration instance 
gets cloned a lot?
bq.  
bq.  Here, I'd like to play devil's advocate and propose that we deep-compare 
HBaseConfiguration instances, so that multiple HBaseConfiguration instances 
that have the same properties map to the same HConnection instance. In case one 
is concerned that a single HConnection is insufficient for sharing amongst 
clients, to quote the javadoc, then one should be able to mark a given 
HBaseConfiguration instance as being uniquely identifiable.
bq.  
bq.  
bq.  This addresses bug HBASE-3777.
bq.  https://issues.apache.org/jira/browse/HBASE-3777
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/main/java/org/apache/hadoop/hbase/HConstants.java 5701639 
bq.src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java 
be31179 
bq.src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java afb666a 
bq.src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 
c348f7a 
bq.src/main/java/org/apache/hadoop/hbase/client/HTable.java edacf56 
bq.src/main/java/org/apache/hadoop/hbase/client/HTablePool.java 88827a8 
bq.src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java 9e3f4d1 
bq.
src/main/java/org/apache/hadoop/hbase/client/replication/ReplicationAdmin.java 
d76e333 
bq.
src/main/java/org/apache/hadoop/hbase/mapreduce/replication/VerifyReplication.java
 ed88bfa 
bq.src/main/java/org/apache/hadoop/hbase/master/HMaster.java 79a48ba 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 
d0a1e11 
bq.
src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
 78c3b42 
bq.src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 5da5e34 
bq.src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java b624d28 
bq.src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java 7f5b377 
bq.src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java 
dc471c4 
bq.src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java 
e25184e 
bq.src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java 
60320a3 
bq.src/test/java/org/apache/hadoop/hbase/client/TestHCM.java b01a2d2 
bq.src/test/java/org/apache/hadoop/hbase/mapreduce/TestTableMapReduce.java 
624f4a8 
bq.src/test/java/org/apache/hadoop/hbase/util/TestMergeTable.java 8992dbb 
bq.  
bq.  Diff: https://reviews.apache.org/r/643/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  mvn test
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Karthick
bq.  
bq.



 Redefine Identity Of HBase Configuration
 

 Key: HBASE-3777
 URL: https://issues.apache.org/jira/browse/HBASE-3777
 Project: HBase
  Issue Type: Improvement
  Components: client, ipc
Affects Versions: 0.90.2
Reporter: Karthick Sankarachary
Assignee: Karthick Sankarachary
Priority: Minor
 Fix For: 0.92.0

 Attachments: 3777-TOF.patch, HBASE-3777-V2.patch, 
 

[jira] [Commented] (HBASE-3777) Redefine Identity Of HBase Configuration

2011-04-25 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13024967#comment-13024967
 ] 

jirapos...@reviews.apache.org commented on HBASE-3777:
--



bq.  On 2011-04-25 20:05:54, Michael Stack wrote:
bq.   src/main/java/org/apache/hadoop/hbase/HConstants.java, line 116
bq.   https://reviews.apache.org/r/643/diff/3/?file=16908#file16908line116
bq.  
bq.   Copy/paste issue (minor)

Will change it to "Default limit on concurrent client-side zookeeper 
connections".


bq.  On 2011-04-25 20:05:54, Michael Stack wrote:
bq.   src/main/java/org/apache/hadoop/hbase/HConstants.java, line 442
bq.   https://reviews.apache.org/r/643/diff/3/?file=16908#file16908line442
bq.  
bq.   Thanks for moving these configs. in here.

Yeah, the HConnectionKey would not have looked pretty if we hadn't moved those 
configs to HConstants. 

I will remove the trailing space. 


bq.  On 2011-04-25 20:05:54, Michael Stack wrote:
bq.   src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java, line 
177
bq.   https://reviews.apache.org/r/643/diff/3/?file=16909#file16909line177
bq.  
bq.   This looks like a good change.

As a matter of fact, the CatalogTracker was the only class that was being 
handed a connection, which made cleanup tricky since it didn't really own that 
connection (as Todd rightly pointed out). Making it take a configuration seemed 
like the most pragmatic thing to do.


bq.  On 2011-04-25 20:05:54, Michael Stack wrote:
bq.   src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java, line 63
bq.   https://reviews.apache.org/r/643/diff/3/?file=16910#file16910line63
bq.  
bq.   Implement Closeable now you've added close?

Yes, we can. I'll make HConnection implement Closeable as well. If you want, we 
can make HTablePool implement Closeable by calling closeTablePool on all of its 
tables.


bq.  On 2011-04-25 20:05:54, Michael Stack wrote:
bq.   src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java, 
line 265
bq.   https://reviews.apache.org/r/643/diff/3/?file=16911#file16911line265
bq.  
bq.   This is painful, but makes sense.

A small price to pay, in my opinion.


bq.  On 2011-04-25 20:05:54, Michael Stack wrote:
bq.   src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java, 
line 1207
bq.   https://reviews.apache.org/r/643/diff/3/?file=16911#file16911line1207
bq.  
bq.   Not important but if closed, just return immediately and then you 
bq. can save indenting the whole method.  Not important.  Just style diff.

Will do.


bq.  On 2011-04-25 20:05:54, Michael Stack wrote:
bq.   src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java, 
line 1667
bq.   https://reviews.apache.org/r/643/diff/3/?file=16911#file16911line1667
bq.  
bq.   So, this is just insurance as you say in the issue.  That's fine, I'd 
bq. say (I agree w/ Ted that we shouldn't rely on finalize)

Exactly - it's just insurance, a fall-back in case some thread somewhere was 
unable to close the connection for whatever reason.


bq.  On 2011-04-25 20:05:54, Michael Stack wrote:
bq.   src/main/java/org/apache/hadoop/hbase/master/HMaster.java, line 355
bq.   https://reviews.apache.org/r/643/diff/3/?file=16917#file16917line355
bq.  
bq.   Interesting but I go along w/ it.  Looks like we only made this 
connection for CT?  If so, bad design fixed by your CT change.

Yes, for the most part, the connection that was being given to CT was not used 
for anything else. There was one exception though (TestCatalogTracker), which 
was doing all kinds of things on the connection outside of the CT, and to 
accommodate that, I left open a package-level constructor in CT that is visible 
only by that test case (it'd be too much trouble to change it).


bq.  On 2011-04-25 20:05:54, Michael Stack wrote:
bq.   src/main/java/org/apache/hadoop/hbase/client/HTablePool.java, line 150
bq.   https://reviews.apache.org/r/643/diff/3/?file=16913#file16913line150
bq.  
bq.   Just remove.

Ok, will remove all dead code.


bq.  On 2011-04-25 20:05:54, Michael Stack wrote:
bq.   src/main/java/org/apache/hadoop/hbase/client/HTable.java, line 259
bq.   https://reviews.apache.org/r/643/diff/3/?file=16912#file16912line259
bq.  
bq.   Yeah, this is ugly; it's almost as though you should have a 
bq. special method for it, one that does not up the counters?

Just a thought - how about if we hide the ugliness in HCM, like so:

  public abstract class Connectable<T> {
    public Configuration conf;

    public Connectable(Configuration conf) {
      this.conf = conf;
    }

    public abstract T connect(Connection connection);
  }

  public static <T> T execute(Connectable<T> connectable) {
    if (connectable == null || connectable.conf == null) {
      return null;
    }
    Configuration conf = connectable.conf;
    HConnection connection = HConnectionManager.getConnection(conf);
    try {
      return connectable.connect(connection);
    } finally {
      HConnectionManager.deleteConnection(conf, false);
    }
  }

  That way, the HTable call would look somewhat prettier:

  HConnectionManager.execute(new Connectable<Boolean>(conf) {
    public Boolean connect(Connection connection) {
      return connection.isTableEnabled(tableName);
    }
  });

[jira] [Commented] (HBASE-3777) Redefine Identity Of HBase Configuration

2011-04-25 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13024969#comment-13024969
 ] 

jirapos...@reviews.apache.org commented on HBASE-3777:
--



bq.  On 2011-04-23 02:14:11, Ted Yu wrote:
bq.   src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java, 
line 228
bq.   https://reviews.apache.org/r/643/diff/3/?file=16911#file16911line228
bq.  
bq.   Shall we break out of the loop here ?

Will do.


- Karthick


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/643/#review531
---


On 2011-04-22 21:16:59, Karthick Sankarachary wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/643/
bq.  ---
bq.  
bq.  (Updated 2011-04-22 21:16:59)
bq.  
bq.  
bq.  Review request for hbase and Ted Yu.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Judging from the javadoc in HConnectionManager, sharing connections across 
multiple clients going to the same cluster is supposedly a good thing. However, 
the fact that there is a one-to-one mapping between a configuration and 
connection instance, kind of works against that goal. Specifically, when you 
create HTable instances using a given Configuration instance and a copy 
thereof, we end up with two distinct HConnection instances under the covers. Is 
this really expected behavior, especially given that the configuration instance 
gets cloned a lot?
bq.  
bq.  Here, I'd like to play devil's advocate and propose that we deep-compare 
HBaseConfiguration instances, so that multiple HBaseConfiguration instances 
that have the same properties map to the same HConnection instance. In case one 
is concerned that a single HConnection is insufficient for sharing amongst 
clients, to quote the javadoc, then one should be able to mark a given 
HBaseConfiguration instance as being uniquely identifiable.
bq.  
bq.  
bq.  This addresses bug HBASE-3777.
bq.  https://issues.apache.org/jira/browse/HBASE-3777
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/main/java/org/apache/hadoop/hbase/HConstants.java 5701639 
bq.src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java 
be31179 
bq.src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java afb666a 
bq.src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 
c348f7a 
bq.src/main/java/org/apache/hadoop/hbase/client/HTable.java edacf56 
bq.src/main/java/org/apache/hadoop/hbase/client/HTablePool.java 88827a8 
bq.src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java 9e3f4d1 
bq.
src/main/java/org/apache/hadoop/hbase/client/replication/ReplicationAdmin.java 
d76e333 
bq.
src/main/java/org/apache/hadoop/hbase/mapreduce/replication/VerifyReplication.java
 ed88bfa 
bq.src/main/java/org/apache/hadoop/hbase/master/HMaster.java 79a48ba 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 
d0a1e11 
bq.
src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
 78c3b42 
bq.src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 5da5e34 
bq.src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java b624d28 
bq.src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java 7f5b377 
bq.src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java 
dc471c4 
bq.src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java 
e25184e 
bq.src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java 
60320a3 
bq.src/test/java/org/apache/hadoop/hbase/client/TestHCM.java b01a2d2 
bq.src/test/java/org/apache/hadoop/hbase/mapreduce/TestTableMapReduce.java 
624f4a8 
bq.src/test/java/org/apache/hadoop/hbase/util/TestMergeTable.java 8992dbb 
bq.  
bq.  Diff: https://reviews.apache.org/r/643/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  mvn test
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Karthick
bq.  
bq.



 Redefine Identity Of HBase Configuration
 

 Key: HBASE-3777
 URL: https://issues.apache.org/jira/browse/HBASE-3777
 Project: HBase
  Issue Type: Improvement
  Components: client, ipc
Affects Versions: 0.90.2
Reporter: Karthick Sankarachary
Assignee: Karthick Sankarachary
Priority: Minor
 Fix For: 0.92.0

 Attachments: 3777-TOF.patch, HBASE-3777-V2.patch, 
 HBASE-3777-V3.patch, HBASE-3777-V4.patch, HBASE-3777-V6.patch, 
 HBASE-3777.patch


 Judging from the javadoc in {{HConnectionManager}}, sharing connections 
 across 

[jira] [Commented] (HBASE-3777) Redefine Identity Of HBase Configuration

2011-04-25 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13024977#comment-13024977
 ] 

jirapos...@reviews.apache.org commented on HBASE-3777:
--



bq.  On 2011-04-25 20:05:54, Michael Stack wrote:
bq.   src/main/java/org/apache/hadoop/hbase/client/HTable.java, line 259
bq.   https://reviews.apache.org/r/643/diff/3/?file=16912#file16912line259
bq.  
bq.   Yeah, this is ugly; it's almost as though you should have a 
bq. special method for it, one that does not up the counters?
bq.  
bq.  Karthick Sankarachary wrote:
bq.  Just a thought - how about if we hide the ugliness in HCM, like so:
bq.  
bq.public abstract class Connectable<T> {
bq.  public Configuration conf;
bq.  
bq.  public Connectable(Configuration conf) {
bq.    this.conf = conf;
bq.  }
bq.  
bq.  public abstract T connect(Connection connection);
bq.}
bq.  
bq.public static <T> T execute(Connectable<T> connectable) {
bq.  if (connectable == null || connectable.conf == null) {
bq.    return null;
bq.  }
bq.  Configuration conf = connectable.conf;
bq.  HConnection connection = HConnectionManager.getConnection(conf);
bq.  try {
bq.    return connectable.connect(connection);
bq.  } finally {
bq.    HConnectionManager.deleteConnection(conf, false);
bq.  }
bq.}
bq.  
bq.  That way, the HTable call would look somewhat prettier:
bq.  
bq.HConnectionManager.execute(new Connectable<Boolean>(conf) {
bq.  public Boolean connect(Connection connection) {
bq.    return connection.isTableEnabled(tableName);
bq.  }
bq.});

BTW, if we bypass the reference counters in this situation, there's a chance, 
albeit small, that the connection might get closed by someone else while this 
guy is still trying to talk to it, which could result in a "connection is 
closed" type of error.


- Karthick


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/643/#review543
---


On 2011-04-22 21:16:59, Karthick Sankarachary wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/643/
bq.  ---
bq.  
bq.  (Updated 2011-04-22 21:16:59)
bq.  
bq.  
bq.  Review request for hbase and Ted Yu.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Judging from the javadoc in HConnectionManager, sharing connections across 
multiple clients going to the same cluster is supposedly a good thing. However, 
the fact that there is a one-to-one mapping between a configuration and 
connection instance, kind of works against that goal. Specifically, when you 
create HTable instances using a given Configuration instance and a copy 
thereof, we end up with two distinct HConnection instances under the covers. Is 
this really expected behavior, especially given that the configuration instance 
gets cloned a lot?
bq.  
bq.  Here, I'd like to play devil's advocate and propose that we deep-compare 
HBaseConfiguration instances, so that multiple HBaseConfiguration instances 
that have the same properties map to the same HConnection instance. In case one 
is concerned that a single HConnection is insufficient for sharing amongst 
clients, to quote the javadoc, then one should be able to mark a given 
HBaseConfiguration instance as being uniquely identifiable.
bq.  
bq.  
bq.  This addresses bug HBASE-3777.
bq.  https://issues.apache.org/jira/browse/HBASE-3777
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/main/java/org/apache/hadoop/hbase/HConstants.java 5701639 
bq.src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java 
be31179 
bq.src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java afb666a 
bq.src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 
c348f7a 
bq.src/main/java/org/apache/hadoop/hbase/client/HTable.java edacf56 
bq.src/main/java/org/apache/hadoop/hbase/client/HTablePool.java 88827a8 
bq.src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java 9e3f4d1 
bq.
src/main/java/org/apache/hadoop/hbase/client/replication/ReplicationAdmin.java 
d76e333 
bq.
src/main/java/org/apache/hadoop/hbase/mapreduce/replication/VerifyReplication.java
 ed88bfa 
bq.src/main/java/org/apache/hadoop/hbase/master/HMaster.java 79a48ba 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 
d0a1e11 
bq.
src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
 78c3b42 
bq.

[jira] [Resolved] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-25 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-1512.
--

   Resolution: Fixed
Fix Version/s: 0.92.0
 Release Note: A coprocessor to do basic aggregating; max, min, counts, etc.
 Hadoop Flags: [Reviewed]

Committed to TRUNK.  Thank you for the nice feature Himanshu. Nice counseling 
and reviewing Ted.

 Coprocessors: Support aggregate functions
 -

 Key: HBASE-1512
 URL: https://issues.apache.org/jira/browse/HBASE-1512
 Project: HBase
  Issue Type: Sub-task
  Components: coprocessors
Reporter: stack
Assignee: Himanshu Vashishtha
 Fix For: 0.92.0

 Attachments: 1512.zip, AggregateCpProtocol.java, 
 AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
 patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, 
 patch-1512-6.txt, patch-1512-7.txt, patch-1512-8.txt, patch-1512-9.txt, 
 patch-1512.txt


 Chatting with jgray and holstad at the kitchen table about counts, sums, and 
 other aggregating facility, facility generally where you want to calculate 
 some meta info on your table, it seems like it wouldn't be too hard making a 
 filter type that could run a function server-side and return the result ONLY 
 of the aggregation or whatever.
 For example, say you just want to count rows, currently you scan, server 
 returns all data to client and count is done by client counting up row keys.  
 A bunch of time and resources have been wasted returning data that we're not 
 interested in.  With this new filter type, the counting would be done 
 server-side and then it would make up a new result that was the count only 
 (kinda like mysql when you ask it to count, it returns a 'table' with a count 
 column whose value is count of rows).   We could have it so the count was 
 just done per region and return that.  Or we could maybe make a small change 
 in scanner too so that it aggregated the per-region counts.  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-3819) TestSplitLogWorker has too many SLWs running -- makes for contention and occasional failures

2011-04-25 Thread stack (JIRA)
TestSplitLogWorker has too many SLWs running -- makes for contention and 
occasional failures


 Key: HBASE-3819
 URL: https://issues.apache.org/jira/browse/HBASE-3819
 Project: HBase
  Issue Type: Bug
Reporter: stack
 Attachments: tslw.patch

I noticed that TSPLW has a background SLW running.  Sometimes it wins the race 
for tasks messing up tests.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-3819) TestSplitLogWorker has too many SLWs running -- makes for contention and occasional failures

2011-04-25 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-3819:
-

Attachment: tslw.patch

Patch that makes a new SLW per test (and cleans up in finally clause before 
exit)

 TestSplitLogWorker has too many SLWs running -- makes for contention and 
 occasional failures
 

 Key: HBASE-3819
 URL: https://issues.apache.org/jira/browse/HBASE-3819
 Project: HBase
  Issue Type: Bug
Reporter: stack
 Attachments: tslw.patch


 I noticed that TSPLW has a background SLW running.  Sometimes it wins the 
 race for tasks messing up tests.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HBASE-3819) TestSplitLogWorker has too many SLWs running -- makes for contention and occasional failures

2011-04-25 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-3819.
--

   Resolution: Fixed
Fix Version/s: 0.92.0
 Assignee: stack

Committed small fix to trunk.

 TestSplitLogWorker has too many SLWs running -- makes for contention and 
 occasional failures
 

 Key: HBASE-3819
 URL: https://issues.apache.org/jira/browse/HBASE-3819
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
 Fix For: 0.92.0

 Attachments: tslw.patch


 I noticed that TSPLW has a background SLW running.  Sometimes it wins the 
 race for tasks messing up tests.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HBASE-3749) Master can't exit when open port failed

2011-04-25 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-3749.
--

   Resolution: Fixed
Fix Version/s: 0.90.3
 Hadoop Flags: [Reviewed]

I think this is right.  It'll be caught higher up.  Applied to branch and 
trunk.  Thanks for the patch gaojinchao.

 Master can't exit when open port failed
 ---

 Key: HBASE-3749
 URL: https://issues.apache.org/jira/browse/HBASE-3749
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.1
Reporter: gaojinchao
 Fix For: 0.90.3

 Attachments: HMasterPachV1_Trunk.patch


 When the HMaster crashed and restarted, the HMaster hung.
 // start up all service threads.
 startServiceThreads();   <--- opening the port failed here!
 // Wait for region servers to report in.  Returns count of regions.
 int regionCount = this.serverManager.waitForRegionServers();
 // TODO: Should do this in background rather than block master startup
 this.fileSystemManager.
   splitLogAfterStartup(this.serverManager.getOnlineServers());
 // Make sure root and meta assigned before proceeding.
 assignRootAndMeta();   <--- hung up in this function, because root can't 
 be assigned.
   if (!catalogTracker.verifyRootRegionLocation(timeout)) {
     this.assignmentManager.assignRoot();
     this.catalogTracker.waitForRoot();   <--- This statement hangs. 
     assigned++;
   }
 Log is as:
 2011-04-07 16:38:22,850 INFO org.mortbay.log: Logging to 
 org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via 
 org.mortbay.log.Slf4jLog
 2011-04-07 16:38:22,908 INFO org.apache.hadoop.http.HttpServer: Port returned 
 by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening 
 the listener on 60010
 2011-04-07 16:38:22,909 FATAL org.apache.hadoop.hbase.master.HMaster: Failed 
 startup
 java.net.BindException: Address already in use
  at sun.nio.ch.Net.bind(Native Method)
  at 
 sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:119)
  at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
  at 
 org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216)
  at org.apache.hadoop.http.HttpServer.start(HttpServer.java:445)
  at 
 org.apache.hadoop.hbase.master.HMaster.startServiceThreads(HMaster.java:542)
  at 
 org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:373)
  at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:278)
 2011-04-07 16:38:22,910 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
 2011-04-07 16:38:22,911 INFO org.apache.hadoop.hbase.master.ServerManager: 
 Exiting wait on regionserver(s) to checkin; count=0, stopped=true, count of 
 regions out on cluster=0
 2011-04-07 16:38:22,914 DEBUG 
 org.apache.hadoop.hbase.master.MasterFileSystem: No log files to split, 
 proceeding...
 2011-04-07 16:38:22,930 INFO org.apache.hadoop.ipc.HbaseRPC: Server at 
 167-6-1-12/167.6.1.12:60020 could not be reached after 1 tries, giving up.
 2011-04-07 16:38:22,930 INFO 
 org.apache.hadoop.hbase.catalog.RootLocationEditor: Unsetting ROOT region 
 location in ZooKeeper
 2011-04-07 16:38:22,941 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x22f2c49d2590021 Creating (or updating) unassigned node for 
 70236052 with OFFLINE state
 2011-04-07 16:38:22,956 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Server stopped; skipping 
 assign of -ROOT-,,0.70236052 state=OFFLINE, ts=1302165502941
 2011-04-07 16:38:32,746 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager$TimeoutMonitor: 
 167-6-1-11:6.timeoutMonitor exiting
 2011-04-07 16:39:22,770 INFO org.apache.hadoop.hbase.master.LogCleaner: 
 master-167-6-1-11:6.oldLogCleaner exiting  
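The patch's intent — let the bind failure propagate so startup aborts instead of hanging later — can be illustrated with a self-contained sketch. The class and method names below are illustrative, not the HMaster code; only the propagation behavior is the point.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hedged sketch: the service-thread startup does NOT swallow the bind
// failure. java.net.BindException extends IOException, so "Address already
// in use" surfaces to the caller, which can log FATAL and exit rather than
// proceeding into assignRootAndMeta() and hanging.
class FailFastStartup {
    static ServerSocket startServiceThreads(int port) throws IOException {
        return new ServerSocket(port); // throws BindException if port is taken
    }
}
```

Binding a second socket to a port already held by a live socket reproduces the {{BindException}} seen in the log, and the caller sees it immediately.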

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3674) Treat ChecksumException as we would a ParseException splitting logs; else we replay split on every restart

2011-04-25 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13025008#comment-13025008
 ] 

stack commented on HBASE-3674:
--

bq. I actually missed when this went in. Having the default be to skip errors 
seems really really dangerous especially as a change in a release branch.

Default was true but we would keep going anyways; it was incorrectly 
implemented.  I opened HBASE-3675 at the time.  Suggest that we continue the 
discussion around how dangerous changing the config is/was over there.  In here 
I'll just make sure the distributed splitter has same flags as the splitter it 
replaces.

Thanks Todd for helping keep us honest.

 Treat ChecksumException as we would a ParseException splitting logs; else we 
 replay split on every restart
 --

 Key: HBASE-3674
 URL: https://issues.apache.org/jira/browse/HBASE-3674
 Project: HBase
  Issue Type: Bug
  Components: wal
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 0.90.2

 Attachments: 3674-v2.txt, 3674.txt


 In short, a ChecksumException will fail log processing for a server so we 
 skip out w/o archiving logs.  On restart, we'll then reprocess the logs -- 
 hit the checksumexception anew, usually -- and so on.
 Here is the splitLog method (edited):
 {code}
   private List<Path> splitLog(final FileStatus[] logfiles) throws IOException {
     outputSink.startWriterThreads(entryBuffers);
     try {
       int i = 0;
       for (FileStatus log : logfiles) {
         Path logPath = log.getPath();
         long logLength = log.getLen();
         splitSize += logLength;
         LOG.debug("Splitting hlog " + (i++ + 1) + " of " + logfiles.length
             + ": " + logPath + ", length=" + logLength);
         try {
           recoverFileLease(fs, logPath, conf);
           parseHLog(log, entryBuffers, fs, conf);
           processedLogs.add(logPath);
         } catch (EOFException eof) {
           // truncated files are expected if a RS crashes (see HBASE-2643)
           LOG.info("EOF from hlog " + logPath + ". Continuing");
           processedLogs.add(logPath);
         } catch (FileNotFoundException fnfe) {
           // A file may be missing if the region server was able to archive it
           // before shutting down. This means the edits were persisted already
           LOG.info("A log was missing " + logPath +
               ", probably because it was moved by the " +
               "now dead region server. Continuing");
           processedLogs.add(logPath);
         } catch (IOException e) {
           // If the IOE resulted from bad file format,
           // then this problem is idempotent and retrying won't help
           if (e.getCause() instanceof ParseException ||
               e.getCause() instanceof ChecksumException) {
             LOG.warn("ParseException from hlog " + logPath + ".  continuing");
             processedLogs.add(logPath);
           } else {
             if (skipErrors) {
               LOG.info("Got while parsing hlog " + logPath +
                   ". Marking as corrupted", e);
               corruptedLogs.add(logPath);
             } else {
               throw e;
             }
           }
         }
       }
       if (fs.listStatus(srcDir).length > processedLogs.size()
           + corruptedLogs.size()) {
         throw new OrphanHLogAfterSplitException(
             "Discovered orphan hlog after split. Maybe the "
             + "HRegionServer was not dead when we started");
       }
       archiveLogs(srcDir, corruptedLogs, processedLogs, oldLogDir, fs, conf);
     } finally {
       splits = outputSink.finishWritingAndClose();
     }
     return splits;
   }
 {code}
 Notice how we'll archive logs only if we successfully split all of them.  
 We won't archive 31 of 35 files if we happen to get a checksum exception on 
 file 32.
 I think we should treat a ChecksumException the same as a ParseException; a 
 retry will not fix it if HDFS could not get around the ChecksumException 
 (seems like in our case all replicas were corrupt).
 Here is a play-by-play from the logs:
 {code}
 813572 2011-03-18 20:31:44,687 DEBUG 
 org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting hlog 34 of 
 35: 
 hdfs://sv2borg170:9000/hbase/.logs/sv2borg182,60020,1300384550664/sv2borg182%3A60020.1300461329481,
  length=150   65662813573 2011-03-18 20:31:44,687 INFO 
 org.apache.hadoop.hbase.util.FSUtils: Recovering file 
 hdfs://sv2borg170:9000/hbase/.logs/sv2borg182,60020,1300384550664/sv2borg182%3A60020.1300461329481
 
 813617 2011-03-18 20:31:46,238 INFO org.apache.hadoop.fs.FSInputChecker: 
 Found checksum error: b[0, 
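
The classification proposed above, treating ChecksumException like ParseException as an idempotent failure that a retry cannot cure, can be sketched in isolation. The ChecksumException class below is a hypothetical stand-in so the example compiles without Hadoop on the classpath; the real class is org.apache.hadoop.fs.ChecksumException:

```java
import java.io.IOException;
import java.text.ParseException;

// Stand-in for org.apache.hadoop.fs.ChecksumException so this sketch
// is self-contained.
class ChecksumException extends IOException {
    ChecksumException(String msg) { super(msg); }
}

class LogSplitErrors {
    // An error is "idempotent" when retrying the split cannot help:
    // the bytes on disk are bad, so re-reading them yields the same failure.
    static boolean isIdempotent(IOException e) {
        Throwable cause = e.getCause();
        return cause instanceof ParseException
            || cause instanceof ChecksumException;
    }

    public static void main(String[] args) {
        IOException parse = new IOException(new ParseException("bad entry", 0));
        IOException checksum = new IOException(new ChecksumException("bad block"));
        IOException maybeTransient = new IOException("lease not yet recovered");
        System.out.println(isIdempotent(parse));          // true
        System.out.println(isIdempotent(checksum));       // true
        System.out.println(isIdempotent(maybeTransient)); // false
    }
}
```

With this split, idempotent failures go straight to processedLogs (archiving proceeds), while only genuinely retryable errors are gated by the skip-errors flag.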
 

[jira] [Updated] (HBASE-3741) OpenRegionHandler and CloseRegionHandler are possibly racing

2011-04-25 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-3741:
--

Attachment: HBASE-3741-rsfix-v2.patch

This is the same patch but with a new unit test, a new exception type and its 
handling in the master.

 OpenRegionHandler and CloseRegionHandler are possibly racing
 

 Key: HBASE-3741
 URL: https://issues.apache.org/jira/browse/HBASE-3741
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.1
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.90.3

 Attachments: HBASE-3741-rsfix-v2.patch, HBASE-3741-rsfix.patch


 This is a serious issue about a race between regions being opened and closed 
 in region servers. We had this situation where the master tried to unassign a 
 region for balancing, failed, force unassigned it, force assigned it 
 somewhere else, failed to open it on another region server (took too long), 
 and then reassigned it back to the original region server. A few seconds 
 later, the region server processed the first close and the region was left 
 unassigned.
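
One way to reason about this race is to version each assignment: a CLOSE issued against an earlier assignment must not be applied after the region has been reopened on the same server. The sketch below is purely illustrative (the names and structure are hypothetical, not HBase's actual RegionServer code):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

class RegionTransitions {
    // Each OPEN stamps the region with a fresh epoch; a CLOSE is only
    // honored if it names the epoch it was issued against, so a stale
    // CLOSE from a prior assignment is rejected.
    private final ConcurrentHashMap<String, Long> openEpoch = new ConcurrentHashMap<>();
    private final AtomicLong clock = new AtomicLong();

    long open(String region) {
        long epoch = clock.incrementAndGet();
        openEpoch.put(region, epoch);
        return epoch;
    }

    boolean close(String region, long epoch) {
        // Atomic remove-if-equal: succeeds only for the targeted epoch.
        return openEpoch.remove(region, epoch);
    }

    public static void main(String[] args) {
        RegionTransitions rs = new RegionTransitions();
        long first = rs.open("stumbles_by_userid2,...,1470298961");
        // Master reassigns the region back before the first CLOSE lands.
        long second = rs.open("stumbles_by_userid2,...,1470298961");
        System.out.println(rs.close("stumbles_by_userid2,...,1470298961", first));  // false: stale
        System.out.println(rs.close("stumbles_by_userid2,...,1470298961", second)); // true
    }
}
```

The attached patch's new exception type plays a similar role: it lets the master distinguish "close rejected because the region moved on" from a genuine close failure.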
 This is from the master log:
 {quote}
 11-04-05 15:11:17,758 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
 Sent CLOSE to serverName=sv4borg42,60020,1300920459477, load=(requests=187, 
 regions=574, usedHeap=3918, maxHeap=6973) for region 
 stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
 2011-04-05 15:12:10,021 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
 out:  
 stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
  state=PENDING_CLOSE, ts=1302041477758
 2011-04-05 15:12:10,021 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Region has been 
 PENDING_CLOSE for too long, running forced unassign again on 
 region=stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
 ...
 2011-04-05 15:14:45,783 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; 
 was=stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
  state=CLOSED, ts=1302041685733
 2011-04-05 15:14:45,783 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x42ec2cece810b68 Creating (or updating) unassigned node for 
 1470298961 with OFFLINE state
 ...
 2011-04-05 15:14:45,885 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for 
 region 
 stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961;
  
 plan=hri=stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961,
  src=sv4borg42,60020,1300920459477, dest=sv4borg40,60020,1302041218196
 2011-04-05 15:14:45,885 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
 stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
  to sv4borg40,60020,1302041218196
 2011-04-05 15:15:39,410 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
 out:  
 stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
  state=PENDING_OPEN, ts=1302041700944
 2011-04-05 15:15:39,410 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Region has been 
 PENDING_OPEN for too long, reassigning 
 region=stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
 2011-04-05 15:15:39,410 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; 
 was=stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
  state=PENDING_OPEN, ts=1302041700944
 ...
 2011-04-05 15:15:39,410 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan 
 was found (or we are ignoring an existing plan) for 
 stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
  so generated a random one; 
 hri=stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961,
  src=, dest=sv4borg42,60020,1300920459477; 19 (online=19, exclude=null) 
 available servers
 2011-04-05 15:15:39,410 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
 stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
  to sv4borg42,60020,1300920459477
 2011-04-05 15:15:40,951 DEBUG 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: 
 master:6-0x42ec2cece810b68 Received ZooKeeper 

[jira] [Created] (HBASE-3820) Splitlog() executed while the namenode was in safemode may cause data-loss

2011-04-25 Thread Jieshan Bean (JIRA)
Splitlog() executed while the namenode was in safemode may cause data-loss
--

 Key: HBASE-3820
 URL: https://issues.apache.org/jira/browse/HBASE-3820
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.2
Reporter: Jieshan Bean


I found this problem while the namenode went into safemode due to some unclear 
reasons. 
There's one patch about this problem:

   try {
      HLogSplitter splitter = HLogSplitter.createLogSplitter(
        conf, rootdir, logDir, oldLogDir, this.fs);
      try {
        splitter.splitLog();
      } catch (OrphanHLogAfterSplitException e) {
        LOG.warn("Retrying splitting because of:", e);
        // An HLogSplitter instance can only be used once.  Get new instance.
        splitter = HLogSplitter.createLogSplitter(conf, rootdir, logDir,
          oldLogDir, this.fs);
        splitter.splitLog();
      }
      splitTime = splitter.getTime();
      splitLogSize = splitter.getSize();
    } catch (IOException e) {
      checkFileSystem();
      LOG.error("Failed splitting " + logDir.toString(), e);
      master.abort("Shutting down HBase cluster: Failed splitting hlog "
        + "files...", e);
    } finally {
      this.splitLogLock.unlock();
    }

It did give some useful help, covering the cases where the namenode process 
exited or was killed, but it does not consider the namenode safemode 
exception.
   I think the root cause is in the checkFileSystem() method.
   It provides a way to check whether HDFS works normally (reads and writes 
can succeed), and that was probably the original purpose of this method. 
This is how the method is implemented:

DistributedFileSystem dfs = (DistributedFileSystem) fs;
try {
  if (dfs.exists(new Path("/"))) {
    return;
  }
} catch (IOException e) {
  exception = RemoteExceptionHandler.checkIOException(e);
}
   
   I have checked the HDFS code and learned that while the namenode is in 
safemode, dfs.exists(new Path("/")) returns true, because the file system 
can still provide read-only service. So this method only checks whether the 
dfs can be read, which I think is not reasonable.
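
A write-based probe avoids this false positive: in safemode, reads succeed but writes do not. The sketch below is a hypothetical illustration using the local filesystem via java.nio.file rather than the HDFS API; on HDFS one could instead create and delete a small probe file under the HBase root, or query safemode state directly on DistributedFileSystem.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class FsWriteProbe {
    // A read-only check like dfs.exists(new Path("/")) passes even when
    // the filesystem is in a read-only state. Probing with an actual
    // write-then-delete distinguishes a fully healthy filesystem from a
    // read-only one. (Local-filesystem stand-in for the HDFS case.)
    static boolean isWritable(Path dir) {
        try {
            Path probe = Files.createTempFile(dir, ".fsprobe", null);
            Files.delete(probe);
            return true;
        } catch (IOException e) {
            // Write or delete failed: treat the filesystem as unhealthy.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("probe");
        System.out.println(isWritable(tmp)); // true on a writable filesystem
        Files.delete(tmp);
    }
}
```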

   

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3675) hbase.hlog.split.skip.errors is false by default but we don't act properly when its true; can make for inconsistent view

2011-04-25 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13025038#comment-13025038
 ] 

stack commented on HBASE-3675:
--

@Prakash We could do that.  Same w/ being liberal about ignoring errors in last 
file.

This is a kind of tough one in that on the one hand, the first time this option 
comes up is when folks are trying to get their cluster back online and they 
can't because it won't process the logs.  In this context, they are usually a 
bit pissed off.  On the other hand, we don't want data loss.

I suppose we need to doc this as one of the important configs over in our 
website book.  We also need to make this flag work properly as part of this 
issue.
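
Making the flag work properly means deciding up front, from configuration, whether a corrupt log aborts the split. A minimal sketch of that decision, using java.util.Properties as a stand-in for Hadoop's Configuration (an assumption made here so the example is self-contained):

```java
import java.util.Properties;

class SplitErrorPolicy {
    // Properties stands in for org.apache.hadoop.conf.Configuration.
    // The point of HBASE-3675: when hbase.hlog.split.skip.errors is false
    // (the default), a corrupt log must abort the split rather than being
    // quietly marked corrupted and skipped.
    static boolean shouldAbortOnCorruptLog(Properties conf) {
        boolean skipErrors = Boolean.parseBoolean(
            conf.getProperty("hbase.hlog.split.skip.errors", "false"));
        return !skipErrors;
    }

    public static void main(String[] args) {
        Properties conf = new Properties(); // flag unset, so default false
        System.out.println(shouldAbortOnCorruptLog(conf)); // true: abort
        conf.setProperty("hbase.hlog.split.skip.errors", "true");
        System.out.println(shouldAbortOnCorruptLog(conf)); // false: skip and continue
    }
}
```

Whatever the defaults end up being, the key property is that the flag is consulted in exactly one place, so the shipped behavior and the documented behavior cannot drift apart again.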

 hbase.hlog.split.skip.errors is false by default but we don't act properly 
 when its true; can make for inconsistent view
 

 Key: HBASE-3675
 URL: https://issues.apache.org/jira/browse/HBASE-3675
 Project: HBase
  Issue Type: Bug
Reporter: stack
Priority: Critical

 So, by default hbase.hlog.split.skip.errors is false, so we should not be 
 skipping errors (What should we do, abort?).
 Anyways, see https://issues.apache.org/jira/browse/HBASE-3674.  It has a 
 checksum error near the last log BUT it writes out the recovered.edits gotten 
 so far.  We then go and assign the regions anyways, applying the edits gotten 
 so far, though there are edits behind the checksum error still to be 
 recovered.  Not good.



[jira] [Commented] (HBASE-3820) Splitlog() executed while the namenode was in safemode may cause data-loss

2011-04-25 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13025039#comment-13025039
 ] 

stack commented on HBASE-3820:
--

@Jieshan Bean Please add a patch (See 'attach file' above) with your change 
only in it.  See http://wiki.apache.org/hadoop/Hbase/HowToContribute for how to 
make a patch if you are unclear.  Thank you.

 Splitlog() executed while the namenode was in safemode may cause data-loss
 --

 Key: HBASE-3820
 URL: https://issues.apache.org/jira/browse/HBASE-3820
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.2
Reporter: Jieshan Bean





[jira] [Commented] (HBASE-3820) Splitlog() executed while the namenode was in safemode may cause data-loss

2011-04-25 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13025044#comment-13025044
 ] 

Jieshan Bean commented on HBASE-3820:
-

Thanks stack.
I'm just running some tests to verify my modification. After that, I will 
submit the patch immediately.

 Splitlog() executed while the namenode was in safemode may cause data-loss
 --

 Key: HBASE-3820
 URL: https://issues.apache.org/jira/browse/HBASE-3820
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.2
Reporter: Jieshan Bean





[jira] [Commented] (HBASE-3674) Treat ChecksumException as we would a ParseException splitting logs; else we replay split on every restart

2011-04-25 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13025051#comment-13025051
 ] 

stack commented on HBASE-3674:
--

bq. The patch sets the hbase.hlog.split.skip.errors to true by default. I am 
wondering why the CheckSumException was not ignored as originally proposed?

Prakash, there are two patches.  The first adds the ignoring of ChecksumException.

 Treat ChecksumException as we would a ParseException splitting logs; else we 
 replay split on every restart
 --

 Key: HBASE-3674
 URL: https://issues.apache.org/jira/browse/HBASE-3674
 Project: HBase
  Issue Type: Bug
  Components: wal
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 0.90.2

 Attachments: 3674-v2.txt, 3674.txt



 

[jira] [Updated] (HBASE-3674) Treat ChecksumException as we would a ParseException splitting logs; else we replay split on every restart

2011-04-25 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-3674:
-

Attachment: 3674-distributed.txt

How about this Prakash?  This adds back the changes and sets default to true 
for the fail flag.

 Treat ChecksumException as we would a ParseException splitting logs; else we 
 replay split on every restart
 --

 Key: HBASE-3674
 URL: https://issues.apache.org/jira/browse/HBASE-3674
 Project: HBase
  Issue Type: Bug
  Components: wal
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 0.90.2

 Attachments: 3674-distributed.txt, 3674-v2.txt, 3674.txt



 

[jira] [Commented] (HBASE-3674) Treat ChecksumException as we would a ParseException splitting logs; else we replay split on every restart

2011-04-25 Thread Prakash Khemani (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13025071#comment-13025071
 ] 

Prakash Khemani commented on HBASE-3674:


+1

 Treat ChecksumException as we would a ParseException splitting logs; else we 
 replay split on every restart
 --

 Key: HBASE-3674
 URL: https://issues.apache.org/jira/browse/HBASE-3674
 Project: HBase
  Issue Type: Bug
  Components: wal
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 0.90.2

 Attachments: 3674-distributed.txt, 3674-v2.txt, 3674.txt

