[jira] [Commented] (HBASE-12346) Scan's default auths behavior under Visibility labels

2014-11-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14192978#comment-14192978
 ] 

Hadoop QA commented on HBASE-12346:
---

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12678663/HBASE-12346-master-v3.patch
  against trunk revision .
  ATTACHMENT ID: 12678663

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11552//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11552//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11552//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11552//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11552//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11552//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11552//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11552//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11552//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11552//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11552//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11552//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11552//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11552//console

This message is automatically generated.

 Scan's default auths behavior under Visibility labels
 -

 Key: HBASE-12346
 URL: https://issues.apache.org/jira/browse/HBASE-12346
 Project: HBase
  Issue Type: Bug
  Components: API, security
Affects Versions: 0.98.7, 0.99.1
Reporter: Jerry He
 Fix For: 0.98.8, 0.99.2

 Attachments: HBASE-12346-master-v2.patch, 
 HBASE-12346-master-v3.patch, HBASE-12346-master.patch


 In Visibility Labels security, a set of labels (auths) is administered and 
 associated with a user.
 During a scan, a user can normally only see cells whose labels are part of 
 the user's label set (auths).
 A Scan uses setAuthorizations to indicate that it wants to use these auths to 
 access the cells.
 Similarly in the shell:
 {code}
 scan 'table1', AUTHORIZATIONS => ['private']
 {code}
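 The same request via the Java client (a minimal sketch; table and label names 
 as above):
 {code}
 import org.apache.hadoop.hbase.client.Scan;
 import org.apache.hadoop.hbase.security.visibility.Authorizations;
 
 Scan scan = new Scan();
 // ask for cells labeled 'private'
 scan.setAuthorizations(new Authorizations("private"));
 {code}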
 But it is a surprise to find that setAuthorizations seems to be 'mandatory' 
 in the default visibility label security setting.  Every scan needs to call 
 setAuthorizations before it can get back any cells, even when the cells are 
 under labels that the requesting user is associated with.
 The following steps will illustrate the issue:
 Run as superuser.
 {code}
 1. create a visibility label called 'private'
 2. create 'table1'
 3. put data into 'table1' and label it as 'private'
 4. set_auths 'user1', 'private'
 5. grant 'user1', 'RW', 'table1'
 {code}
 Run as 

[jira] [Commented] (HBASE-12406) Bulk load fails in 0.98 against hadoop-1 due to unmatched family name

2014-11-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14192983#comment-14192983
 ] 

Hadoop QA commented on HBASE-12406:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12678664/12406-0.98-v1.txt
  against trunk revision .
  ATTACHMENT ID: 12678664

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11553//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11553//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11553//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11553//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11553//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11553//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11553//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11553//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11553//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11553//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11553//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11553//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11553//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11553//console

This message is automatically generated.

 Bulk load fails in 0.98 against hadoop-1 due to unmatched family name
 -

 Key: HBASE-12406
 URL: https://issues.apache.org/jira/browse/HBASE-12406
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu
 Fix For: 0.98.8

 Attachments: 12406-0.98-v1.txt


 From 
 https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/614/testReport/org.apache.hadoop.hbase.mapreduce/TestCopyTable/testCopyTableWithBulkload/
  :
 {code}
 java.io.IOException: Unmatched family names found: unmatched family names in 
 HFiles to be bulkloaded: [_logs]; valid family names of table testCopyTable2 
 are: [family]
   at 
 org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:268)
   at 
 org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:907)
   at org.apache.hadoop.hbase.mapreduce.CopyTable.run(CopyTable.java:344)
 {code}
 The above failure was due to the presence of a history directory under the 
 _logs directory.
 e.g.
 {code}
 hdfs://nn:59313/user/tyu/copytable/4282249372082687850/_logs/history
 {code}
 HBASE-12375 removed the check for directory names which start with an 
 underscore.

[jira] [Updated] (HBASE-12363) KEEP_DELETED_CELLS considered harmful?

2014-11-01 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-12363:
--
Attachment: 12363-master.txt

Here's a patch.
* Adds a new TTL option to KEEP_DELETED_CELLS
* 100% backwards compatible in HColumnDescriptor (can parse the old 'true', 
'false' strings)
* 100% compatible in shell (arg.to_s.upcase, so booleans and strings will work 
exactly as before)
* the only difference is that a newly created table will show 'TRUE' instead 
of 'true'; even that is forward compatible with the old case, as the old code 
will try to parse it as a Boolean
* added tests

ScanQueryMatcher doesn't exactly look nicer now. If somebody suggests some 
easy simplifications here I'm happy to incorporate them. 

I think it's time to refactor it... For another jira.

TL;DR: with KEEP_DELETED_CELLS=TTL, deleted cells *and* their delete markers 
are removed when the TTL expires (regardless of the MIN_VERSIONS setting). 
I.e. one can keep TTL + MIN_VERSIONS and still get rid of old deleted rows.

We could even add another enum, MARKERS_ONLY, and remove the 
hbase.hstore.time.to.purge.deletes config option, but that's also another 
jira.
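
In code the new option would be set on the column family roughly like so (a 
sketch, assuming the KeepDeletedCells enum and the HColumnDescriptor setter 
this patch adds):
{code}
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.KeepDeletedCells;

HColumnDescriptor hcd = new HColumnDescriptor("f");
hcd.setKeepDeletedCells(KeepDeletedCells.TTL); // TRUE/FALSE keep working as before
hcd.setTimeToLive(7 * 24 * 3600); // purge deleted cells and their markers after a week
hcd.setMinVersions(1);            // while still retaining at least one live version
{code}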

 KEEP_DELETED_CELLS considered harmful?
 --

 Key: HBASE-12363
 URL: https://issues.apache.org/jira/browse/HBASE-12363
 Project: HBase
  Issue Type: Sub-task
  Components: regionserver
Reporter: Lars Hofhansl
  Labels: Phoenix
 Attachments: 12363-master.txt, 12363-test.txt


 Brainstorming...
 This morning in the train (of all places) I realized a fundamental issue in 
 how KEEP_DELETED_CELLS is implemented.
 The problem is around knowing when it is safe to remove a delete marker (we 
 cannot remove it unless all cells affected by it have been removed as well).
 This was particularly hard for family markers, since they sort before all 
 cells of a row, and hence when scanning forward through an HFile you cannot 
 know whether the family markers are still needed until at least the entire 
 row is scanned.
 My solution was to keep the TS of the oldest put in any given HFile, and only 
 remove delete markers older than that TS.
 That sounds good on the face of it... But now imagine you wrote a version of 
 ROW 1 and then never update it again. Then later you write a billion other 
 rows and delete them all. Since the TS of the cells in ROW 1 is older than 
 all the delete markers for the other billion rows, these will never be 
 collected... At least for the region that hosts ROW 1 after a major 
 compaction.
 Note, in a sense that is what HBase is supposed to do when keeping deleted 
 cells: Keep them until they would be removed by some other means (for example 
 TTL, or MAX_VERSIONS when new versions are inserted).
 The specific problem here is that even as all KVs affected by a delete marker 
 are expired this way, the marker would not be removed if there is just one 
 older KV in the HStore.
 I don't see a good way out of this. In the parent I outlined these four 
 options:
 # Only allow the new flag to be set on CFs with TTL set. MIN_VERSIONS would 
 not apply to deleted rows or delete marker rows (we wouldn't know how long to 
 keep family deletes in that case). (MAX)VERSIONS would still be enforced on 
 all row types except for family delete markers.
 # Translate family delete markers to column delete markers at (major) 
 compaction time.
 # Change HFileWriterV* to keep track of the earliest put TS in a store and 
 write it to the file metadata. Use that to expire delete markers that are 
 older and hence can't affect any puts in the file.
 # Have Store.java keep track of the earliest put in internalFlushCache and 
 compactStore and then append it to the file metadata. That way HFileWriterV* 
 would not need to know about KVs.
 And I implemented #4.
 I'd love to get input on these ideas.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HBASE-12363) KEEP_DELETED_CELLS considered harmful?

2014-11-01 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14192990#comment-14192990
 ] 

Lars Hofhansl edited comment on HBASE-12363 at 11/1/14 6:38 AM:


Here's a patch.
* Adds a new TTL option to KEEP_DELETED_CELLS
* 100% backwards compatible in HColumnDescriptor (can parse the old 'true', 
'false' strings)
* 100% compatible in shell (arg.to_s.upcase, so booleans and strings will work 
exactly as before)
* the only difference is that a newly created table will show 'TRUE' instead 
of 'true'; even that is forward compatible with the old case, as the old code 
will try to parse it as a Boolean
* added tests

ScanQueryMatcher doesn't exactly look nicer now. If somebody suggests some 
easy simplifications here I'm happy to incorporate them. 

I think it's time to refactor it... For another jira.

TL;DR: with KEEP_DELETED_CELLS=TTL, deleted cells *and* their delete markers 
are removed when the TTL expires (regardless of the MIN_VERSIONS setting). 
I.e. one can keep TTL + MIN_VERSIONS and still get rid of old deleted rows.

We could even add another enum, MARKERS_ONLY, and remove the 
hbase.hstore.time.to.purge.deletes config option, but that's also another 
jira.


was (Author: lhofhansl):
Here's a patch.
* Adds new TTL option to KEEP_DELETED_CELLS
* 100% backwards compatible in HColumnDescriptor (can parse the old 'true', 
'false' string)
* 100% compatible in shell (arg.to_s.upcase to boolean and strings will work 
exactly as before)
* the only difference is that a newly created table will show 'TRUE' instead 
'true', even that is compatible forward compatible for old case, as the old 
code will try to parse it as Boolean
* added tests

Now, ScanQueryMatcher doesn't exactly look nicer now. If somebody suggests some 
easy simplifications here I'm happy to incorporate them. 

It's think it's time to refactor it... For another jira.

TL;DR: with KEEP_DELETED_CELLS=TTL deleted cells *and* their delete markers 
are removed when the TTL expired (regardless of MIN_VERSION setting). I.e. one 
can keep TTL + MIN_VERSIONS and still get rid of old deleted rows.

We could even add another enum: MAKERS_ONLY and remove the 
hbase.hstore.time.to.purge.deletes config option, but that's also another 
jira.

 KEEP_DELETED_CELLS considered harmful?
 --

 Key: HBASE-12363
 URL: https://issues.apache.org/jira/browse/HBASE-12363
 Project: HBase
  Issue Type: Sub-task
  Components: regionserver
Reporter: Lars Hofhansl
  Labels: Phoenix
 Attachments: 12363-master.txt, 12363-test.txt


 Brainstorming...
 This morning in the train (of all places) I realized a fundamental issue in 
 how KEEP_DELETED_CELLS is implemented.
 The problem is around knowing when it is safe to remove a delete marker (we 
 cannot remove it unless all cells affected by it have been removed as well).
 This was particularly hard for family markers, since they sort before all 
 cells of a row, and hence when scanning forward through an HFile you cannot 
 know whether the family markers are still needed until at least the entire 
 row is scanned.
 My solution was to keep the TS of the oldest put in any given HFile, and only 
 remove delete markers older than that TS.
 That sounds good on the face of it... But now imagine you wrote a version of 
 ROW 1 and then never update it again. Then later you write a billion other 
 rows and delete them all. Since the TS of the cells in ROW 1 is older than 
 all the delete markers for the other billion rows, these will never be 
 collected... At least for the region that hosts ROW 1 after a major 
 compaction.
 Note, in a sense that is what HBase is supposed to do when keeping deleted 
 cells: Keep them until they would be removed by some other means (for example 
 TTL, or MAX_VERSIONS when new versions are inserted).
 The specific problem here is that even as all KVs affected by a delete marker 
 are expired this way, the marker would not be removed if there is just one 
 older KV in the HStore.
 I don't see a good way out of this. In the parent I outlined these four 
 options:
 # Only allow the new flag to be set on CFs with TTL set. MIN_VERSIONS would 
 not apply to deleted rows or delete marker rows (we wouldn't know how long to 
 keep family deletes in that case). (MAX)VERSIONS would still be enforced on 
 all row types except for family delete markers.
 # Translate family delete markers to column delete markers at (major) 
 compaction time.
 # Change HFileWriterV* to keep track of the earliest put TS in a store and 
 write it to the file metadata. Use that to expire delete markers that are 
 older and hence can't affect any puts in the file.
 # Have Store.java keep track of the earliest put in internalFlushCache and 
 compactStore and then append it to the file metadata. That way HFileWriterV* 
 would not need to know about KVs.
 And I implemented #4.
 I'd love to get input on these ideas.

[jira] [Assigned] (HBASE-12363) KEEP_DELETED_CELLS considered harmful?

2014-11-01 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl reassigned HBASE-12363:
-

Assignee: Lars Hofhansl

 KEEP_DELETED_CELLS considered harmful?
 --

 Key: HBASE-12363
 URL: https://issues.apache.org/jira/browse/HBASE-12363
 Project: HBase
  Issue Type: Sub-task
  Components: regionserver
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
  Labels: Phoenix
 Attachments: 12363-master.txt, 12363-test.txt


 Brainstorming...
 This morning in the train (of all places) I realized a fundamental issue in 
 how KEEP_DELETED_CELLS is implemented.
 The problem is around knowing when it is safe to remove a delete marker (we 
 cannot remove it unless all cells affected by it have been removed as well).
 This was particularly hard for family markers, since they sort before all 
 cells of a row, and hence when scanning forward through an HFile you cannot 
 know whether the family markers are still needed until at least the entire 
 row is scanned.
 My solution was to keep the TS of the oldest put in any given HFile, and only 
 remove delete markers older than that TS.
 That sounds good on the face of it... But now imagine you wrote a version of 
 ROW 1 and then never update it again. Then later you write a billion other 
 rows and delete them all. Since the TS of the cells in ROW 1 is older than 
 all the delete markers for the other billion rows, these will never be 
 collected... At least for the region that hosts ROW 1 after a major 
 compaction.
 Note, in a sense that is what HBase is supposed to do when keeping deleted 
 cells: Keep them until they would be removed by some other means (for example 
 TTL, or MAX_VERSIONS when new versions are inserted).
 The specific problem here is that even as all KVs affected by a delete marker 
 are expired this way, the marker would not be removed if there is just one 
 older KV in the HStore.
 I don't see a good way out of this. In the parent I outlined these four 
 options:
 # Only allow the new flag to be set on CFs with TTL set. MIN_VERSIONS would 
 not apply to deleted rows or delete marker rows (we wouldn't know how long to 
 keep family deletes in that case). (MAX)VERSIONS would still be enforced on 
 all row types except for family delete markers.
 # Translate family delete markers to column delete markers at (major) 
 compaction time.
 # Change HFileWriterV* to keep track of the earliest put TS in a store and 
 write it to the file metadata. Use that to expire delete markers that are 
 older and hence can't affect any puts in the file.
 # Have Store.java keep track of the earliest put in internalFlushCache and 
 compactStore and then append it to the file metadata. That way HFileWriterV* 
 would not need to know about KVs.
 And I implemented #4.
 I'd love to get input on these ideas.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-12363) KEEP_DELETED_CELLS considered harmful?

2014-11-01 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-12363:
--
Status: Patch Available  (was: Open)

 KEEP_DELETED_CELLS considered harmful?
 --

 Key: HBASE-12363
 URL: https://issues.apache.org/jira/browse/HBASE-12363
 Project: HBase
  Issue Type: Sub-task
  Components: regionserver
Reporter: Lars Hofhansl
  Labels: Phoenix
 Attachments: 12363-master.txt, 12363-test.txt


 Brainstorming...
 This morning in the train (of all places) I realized a fundamental issue in 
 how KEEP_DELETED_CELLS is implemented.
 The problem is around knowing when it is safe to remove a delete marker (we 
 cannot remove it unless all cells affected by it have been removed as well).
 This was particularly hard for family markers, since they sort before all 
 cells of a row, and hence when scanning forward through an HFile you cannot 
 know whether the family markers are still needed until at least the entire 
 row is scanned.
 My solution was to keep the TS of the oldest put in any given HFile, and only 
 remove delete markers older than that TS.
 That sounds good on the face of it... But now imagine you wrote a version of 
 ROW 1 and then never update it again. Then later you write a billion other 
 rows and delete them all. Since the TS of the cells in ROW 1 is older than 
 all the delete markers for the other billion rows, these will never be 
 collected... At least for the region that hosts ROW 1 after a major 
 compaction.
 Note, in a sense that is what HBase is supposed to do when keeping deleted 
 cells: Keep them until they would be removed by some other means (for example 
 TTL, or MAX_VERSIONS when new versions are inserted).
 The specific problem here is that even as all KVs affected by a delete marker 
 are expired this way, the marker would not be removed if there is just one 
 older KV in the HStore.
 I don't see a good way out of this. In the parent I outlined these four 
 options:
 # Only allow the new flag to be set on CFs with TTL set. MIN_VERSIONS would 
 not apply to deleted rows or delete marker rows (we wouldn't know how long to 
 keep family deletes in that case). (MAX)VERSIONS would still be enforced on 
 all row types except for family delete markers.
 # Translate family delete markers to column delete markers at (major) 
 compaction time.
 # Change HFileWriterV* to keep track of the earliest put TS in a store and 
 write it to the file metadata. Use that to expire delete markers that are 
 older and hence can't affect any puts in the file.
 # Have Store.java keep track of the earliest put in internalFlushCache and 
 compactStore and then append it to the file metadata. That way HFileWriterV* 
 would not need to know about KVs.
 And I implemented #4.
 I'd love to get input on these ideas.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12363) KEEP_DELETED_CELLS considered harmful?

2014-11-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14192991#comment-14192991
 ] 

Hadoop QA commented on HBASE-12363:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12678669/12363-master.txt
  against trunk revision .
  ATTACHMENT ID: 12678669

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 27 new 
or modified tests.

{color:red}-1 javac{color}.  The patch appears to cause mvn compile goal to 
fail.

Compilation errors resume:
[ERROR] COMPILATION ERROR : 
[ERROR] 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase-client/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java:[162,23]
 cannot find symbol
[ERROR] 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase-client/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java:[429,36]
 cannot find symbol
[ERROR] 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase-client/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java:[794,10]
 cannot find symbol
[ERROR] 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase-client/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java:[819,48]
 cannot find symbol
[ERROR] 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase-client/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java:[162,63]
 cannot find symbol
[ERROR] 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase-client/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java:[798,14]
 cannot find symbol
[ERROR] 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase-client/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java:[811,61]
 cannot find symbol
[ERROR] 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase-client/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java:[811,85]
 cannot find symbol
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-compiler-plugin:3.2:compile (default-compile) on 
project hbase-client: Compilation failure: Compilation failure:
[ERROR] 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase-client/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java:[162,23]
 cannot find symbol
[ERROR] symbol:   class KeepDeletedCells
[ERROR] location: class org.apache.hadoop.hbase.HColumnDescriptor
[ERROR] 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase-client/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java:[429,36]
 cannot find symbol
[ERROR] symbol:   class KeepDeletedCells
[ERROR] location: class org.apache.hadoop.hbase.HColumnDescriptor
[ERROR] 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase-client/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java:[794,10]
 cannot find symbol
[ERROR] symbol:   class KeepDeletedCells
[ERROR] location: class org.apache.hadoop.hbase.HColumnDescriptor
[ERROR] 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase-client/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java:[819,48]
 cannot find symbol
[ERROR] symbol:   class KeepDeletedCells
[ERROR] location: class org.apache.hadoop.hbase.HColumnDescriptor
[ERROR] 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase-client/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java:[162,63]
 cannot find symbol
[ERROR] symbol:   variable KeepDeletedCells
[ERROR] location: class org.apache.hadoop.hbase.HColumnDescriptor
[ERROR] 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase-client/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java:[798,14]
 cannot find symbol
[ERROR] symbol:   variable KeepDeletedCells
[ERROR] location: class org.apache.hadoop.hbase.HColumnDescriptor
[ERROR] 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase-client/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java:[811,61]
 cannot find symbol
[ERROR] symbol:   variable KeepDeletedCells
[ERROR] location: class org.apache.hadoop.hbase.HColumnDescriptor
[ERROR] 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase-client/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java:[811,85]
 cannot find symbol
[ERROR] symbol:   variable KeepDeletedCells
[ERROR] location: class org.apache.hadoop.hbase.HColumnDescriptor
[ERROR] - [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR] 

[jira] [Updated] (HBASE-12363) KEEP_DELETED_CELLS considered harmful?

2014-11-01 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-12363:
--
Attachment: (was: 12363-master.txt)

 KEEP_DELETED_CELLS considered harmful?
 --

 Key: HBASE-12363
 URL: https://issues.apache.org/jira/browse/HBASE-12363
 Project: HBase
  Issue Type: Sub-task
  Components: regionserver
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
  Labels: Phoenix
 Attachments: 12363-master.txt, 12363-test.txt


 Brainstorming...
 This morning in the train (of all places) I realized a fundamental issue in 
 how KEEP_DELETED_CELLS is implemented.
 The problem is around knowing when it is safe to remove a delete marker (we 
 cannot remove it unless all cells affected by it have been removed as well).
 This was particularly hard for family markers, since they sort before all 
 cells of a row, and hence when scanning forward through an HFile you cannot 
 know whether the family markers are still needed until at least the entire 
 row is scanned.
 My solution was to keep the TS of the oldest put in any given HFile, and only 
 remove delete markers older than that TS.
 That sounds good on the face of it... But now imagine you wrote a version of 
 ROW 1 and then never update it again. Then later you write a billion other 
 rows and delete them all. Since the TS of the cells in ROW 1 is older than 
 all the delete markers for the other billion rows, these will never be 
 collected... At least for the region that hosts ROW 1 after a major 
 compaction.
 Note, in a sense that is what HBase is supposed to do when keeping deleted 
 cells: Keep them until they would be removed by some other means (for example 
 TTL, or MAX_VERSIONS when new versions are inserted).
 The specific problem here is that even as all KVs affected by a delete marker 
 are expired this way, the marker would not be removed if there is just one 
 older KV in the HStore.
 I don't see a good way out of this. In the parent I outlined these four 
 options:
 # Only allow the new flag to be set on CFs with TTL set. MIN_VERSIONS would 
 not apply to deleted rows or delete marker rows (we wouldn't know how long to 
 keep family deletes in that case). (MAX)VERSIONS would still be enforced on 
 all row types except for family delete markers.
 # Translate family delete markers to column delete markers at (major) 
 compaction time.
 # Change HFileWriterV* to keep track of the earliest put TS in a store and 
 write it to the file metadata. Use that to expire delete markers that are 
 older and hence can't affect any puts in the file.
 # Have Store.java keep track of the earliest put in internalFlushCache and 
 compactStore and then append it to the file metadata. That way HFileWriterV* 
 would not need to know about KVs.
 And I implemented #4.
 I'd love to get input on these ideas.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-12363) KEEP_DELETED_CELLS considered harmful?

2014-11-01 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-12363:
--
Attachment: 12363-master.txt

Whoops... Correct version this time.

 KEEP_DELETED_CELLS considered harmful?
 --

 Key: HBASE-12363
 URL: https://issues.apache.org/jira/browse/HBASE-12363
 Project: HBase
  Issue Type: Sub-task
  Components: regionserver
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
  Labels: Phoenix
 Attachments: 12363-master.txt, 12363-test.txt


 Brainstorming...
 This morning in the train (of all places) I realized a fundamental issue in 
 how KEEP_DELETED_CELLS is implemented.
 The problem is around knowing when it is safe to remove a delete marker (we 
 cannot remove it unless all cells affected by it have been removed as well).
 This was particularly hard for family markers, since they sort before all 
 cells of a row, and hence when scanning forward through an HFile you cannot 
 know whether the family markers are still needed until at least the entire 
 row is scanned.
 My solution was to keep the TS of the oldest put in any given HFile, and only 
 remove delete markers older than that TS.
 That sounds good on the face of it... But now imagine you wrote a version of 
 ROW 1 and then never update it again. Then later you write a billion other 
 rows and delete them all. Since the TS of the cells in ROW 1 is older than 
 all the delete markers for the other billion rows, these will never be 
 collected... At least for the region that hosts ROW 1 after a major 
 compaction.
 Note, in a sense that is what HBase is supposed to do when keeping deleted 
 cells: Keep them until they would be removed by some other means (for example 
 TTL, or MAX_VERSIONS when new versions are inserted).
 The specific problem here is that even as all KVs affected by a delete marker 
 are expired this way, the marker would not be removed if there is just one 
 older KV in the HStore.
 I don't see a good way out of this. In the parent I outlined these four 
 options:
 # Only allow the new flag to be set on CFs with TTL set. MIN_VERSIONS would 
 not apply to deleted rows or delete marker rows (we wouldn't know how long to 
 keep family deletes in that case). (MAX)VERSIONS would still be enforced on 
 all row types except for family delete markers.
 # Translate family delete markers to column delete markers at (major) 
 compaction time.
 # Change HFileWriterV* to keep track of the earliest put TS in a store and 
 write it to the file metadata. Use that to expire delete markers that are 
 older and hence can't affect any puts in the file.
 # Have Store.java keep track of the earliest put in internalFlushCache and 
 compactStore and then append it to the file metadata. That way HFileWriterV* 
 would not need to know about KVs.
 And I implemented #4.
 I'd love to get input on these ideas.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12363) KEEP_DELETED_CELLS considered harmful?

2014-11-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14193026#comment-14193026
 ] 

Hadoop QA commented on HBASE-12363:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12678670/12363-master.txt
  against trunk revision .
  ATTACHMENT ID: 12678670

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 27 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:red}-1 checkstyle{color}.  The applied patch generated 
3784 checkstyle errors (more than the trunk's current 3781 errors).

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 release 
audit warnings (more than the trunk's current 0 warnings).

{color:red}-1 lineLengths{color}.  The patch introduces the following lines 
longer than 100:
+return setValue(KEEP_DELETED_CELLS, (keepDeletedCells ? 
KeepDeletedCells.TRUE : KeepDeletedCells.FALSE).toString());
+this.keepDeletedCells = scan.isRaw() ? KeepDeletedCells.TRUE : isUserScan 
? KeepDeletedCells.FALSE : scanInfo.getKeepDeletedCells();
+this.seePastDeleteMarkers = scanInfo.getKeepDeletedCells() != 
KeepDeletedCells.FALSE && isUserScan;
+ScanInfo scanInfo = new ScanInfo(null, 0, 1, HConstants.LATEST_TIMESTAMP, 
KeepDeletedCells.FALSE,
+  
family.setKeepDeletedCells(org.apache.hadoop.hbase.KeepDeletedCells.valueOf(arg.delete(org.apache.hadoop.hbase.HColumnDescriptor::KEEP_DELETED_CELLS).to_s.upcase))
 if arg.include?(org.apache.hadoop.hbase.HColumnDescriptor::KEEP_DELETED_CELLS)

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11555//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11555//artifact/patchprocess/patchReleaseAuditWarnings.txt
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11555//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11555//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11555//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11555//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11555//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11555//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11555//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11555//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11555//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11555//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11555//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11555//artifact/patchprocess/checkstyle-aggregate.html

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11555//console

This message is automatically generated.

 KEEP_DELETED_CELLS considered harmful?
 --

 Key: HBASE-12363
 URL: https://issues.apache.org/jira/browse/HBASE-12363
 Project: HBase
  Issue Type: Sub-task
  Components: regionserver
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
  Labels: Phoenix
 Attachments: 12363-master.txt, 12363-test.txt


 Brainstorming...
 This morning in the train (of all places) I realized a fundamental issue in 
 how 

[jira] [Reopened] (HBASE-12285) Builds are failing, possibly because of SUREFIRE-1091

2014-11-01 Thread Dima Spivak (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dima Spivak reopened HBASE-12285:
-

Lots of failing builds recently with {{Stream Closed}} being replaced with 
{code}
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-surefire-plugin:2.18-SNAPSHOT:test 
(secondPartTestsExecution) on project hbase-server: There was a timeout or 
other error in the fork - [Help 1]
{code}
since we switched to Surefire 2.18-SNAPSHOT. I'm also still bothered by not 
being able to answer [~stack]'s question of why this was only hitting branch-1 
(even when using the known-faulty 2.17 version), so I'm reopening this.

 Builds are failing, possibly because of SUREFIRE-1091
 -

 Key: HBASE-12285
 URL: https://issues.apache.org/jira/browse/HBASE-12285
 Project: HBase
  Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Dima Spivak
Assignee: Dima Spivak
Priority: Blocker
 Fix For: 2.0.0, 0.99.2

 Attachments: HBASE-12285_branch-1_v1.patch, 
 HBASE-12285_branch-1_v1.patch


 Our branch-1 builds on builds.apache.org have been failing in recent days 
 after we switched over to an official version of Surefire a few days back 
 (HBASE-4955). The version we're using, 2.17, is hit by a bug 
 ([SUREFIRE-1091|https://jira.codehaus.org/browse/SUREFIRE-1091]) that results 
 in an IOException, which looks like what we're seeing on Jenkins.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-12405) WAL accounting by Store

2014-11-01 Thread zhangduo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangduo updated HBASE-12405:
-
Description: 
HBASE-10201 has made flush decisions per Store, but has not done enough work on 
HLog, so there are two problems:
1. We record minSeqId both in HRegion and FSHLog, which is a duplication.
2. There may be holes in WAL accounting.
For example, assume family A with sequence ids 1 and 3, and family B with 
seqId 2. If we flush family A, we can only record that the WAL before 
sequence id 1 can be removed safely. If we do a replay at this point, 
sequence id 3 will also be replayed, which is unnecessary.
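
A toy sketch of what per-store accounting could look like (hypothetical names; 
no patch is attached yet, so this is illustrative only, using java.util 
collections and Bytes.BYTES_COMPARATOR):
{code}
// Instead of one region-wide minSeqId, track the lowest unflushed
// sequence id per store; the WAL can then be truncated below the
// minimum over all stores, and a flush of family A only moves A's entry.
Map<byte[], Long> lowestUnflushedSeqIdOfStore =
    new TreeMap<byte[], Long>(Bytes.BYTES_COMPARATOR);

long walTruncationPoint() {
  return lowestUnflushedSeqIdOfStore.isEmpty()
      ? Long.MAX_VALUE
      : Collections.min(lowestUnflushedSeqIdOfStore.values());
}
{code}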

  was:
HBASE-10201 has made flush decisions per Store, but has not done enough work on 
HLog, so there are two problems:
1. We record minSeqId both in HRegion and FSHLog, which is a duplication.
2. There maybe holes in WAL accounting.
For example, assume family A with sequence id 1 and 3, family B with seqId 
2. If we flush family A, we can only record that WAL before sequence id 1 can 
be removed safely. If we do a replay at this point, sequence id 4 will also be 
replayed which is unnecessary.


 WAL accounting by Store
 ---

 Key: HBASE-12405
 URL: https://issues.apache.org/jira/browse/HBASE-12405
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: zhangduo
Assignee: zhangduo

 HBASE-10201 has made flush decisions per Store, but has not done enough work 
 on HLog, so there are two problems:
 1. We record minSeqId both in HRegion and FSHLog, which is a duplication.
 2. There may be holes in WAL accounting.
 For example, assume family A with sequence ids 1 and 3, and family B with 
 seqId 2. If we flush family A, we can only record that the WAL before 
 sequence id 1 can be removed safely. If we do a replay at this point, 
 sequence id 3 will also be replayed, which is unnecessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-12393) The regionserver web will throw exception if we disable block cache

2014-11-01 Thread ChiaPing Tsai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ChiaPing Tsai updated HBASE-12393:
--
Labels: patch  (was: )
Status: Patch Available  (was: Open)

To avoid invoking a method on the disabled block cache, we use an additional 
statement (else if) to check the value of the block cache.
If the block cache is null, the block cache stats page will display "Block 
Cache is disabled".
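
Roughly, the guard is (a plain-Java sketch of the template logic, not the 
literal Jamon output in the patch):
{code}
BlockCache bc = cacheConfig.getBlockCache();
if (bc == null) {
  jamonWriter.write("Block Cache is disabled");  // new branch
} else {
  org.jamon.escaping.Escaping.HTML.write(
      org.jamon.emit.StandardEmitter.valueOf(
          StringUtils.humanReadableInt(bc.size())), jamonWriter);
}
{code}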

 The regionserver web will throw exception if we disable block cache
 ---

 Key: HBASE-12393
 URL: https://issues.apache.org/jira/browse/HBASE-12393
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.98.7
 Environment: ubuntu 12.04 64bits, hadoop-2.2.0, hbase-0.98.7-hadoop2
Reporter: ChiaPing Tsai
Priority: Minor
  Labels: patch
 Attachments: HBASE-12393.patch


 CacheConfig.getBlockCache() returns null when we set 
 hfile.block.cache.size to zero.
 This causes BlockCacheTmplImpl.java:123 to throw a NullPointerException.
 {code}
 org.jamon.escaping.Escaping.HTML.write(org.jamon.emit.StandardEmitter.valueOf(StringUtils.humanReadableInt(cacheConfig.getBlockCache().size())),
  jamonWriter);
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12393) The regionserver web will throw exception if we disable block cache

2014-11-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14193136#comment-14193136
 ] 

Hadoop QA commented on HBASE-12393:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12678584/HBASE-12393.patch
  against trunk revision .
  ATTACHMENT ID: 12678584

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:red}-1 checkstyle{color}.  The applied patch generated 
3782 checkstyle errors (more than the trunk's current 3781 errors).

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11556//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11556//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11556//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11556//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11556//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11556//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11556//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11556//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11556//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11556//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11556//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11556//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11556//artifact/patchprocess/checkstyle-aggregate.html

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11556//console

This message is automatically generated.

 The regionserver web will throw exception if we disable block cache
 ---

 Key: HBASE-12393
 URL: https://issues.apache.org/jira/browse/HBASE-12393
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.98.7
 Environment: ubuntu 12.04 64bits, hadoop-2.2.0, hbase-0.98.7-hadoop2
Reporter: ChiaPing Tsai
Priority: Minor
  Labels: patch
 Attachments: HBASE-12393.patch


 CacheConfig.getBlockCache() returns null when we set 
 hfile.block.cache.size to zero.
 This causes BlockCacheTmplImpl.java:123 to throw a NullPointerException.
 {code}
 org.jamon.escaping.Escaping.HTML.write(org.jamon.emit.StandardEmitter.valueOf(StringUtils.humanReadableInt(cacheConfig.getBlockCache().size())),
  jamonWriter);
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12406) Bulk load fails in 0.98 against hadoop-1 due to unmatched family name

2014-11-01 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14193181#comment-14193181
 ] 

Anoop Sam John commented on HBASE-12406:


Any other such 'to be excluded' dirs?
Ping [~ashish singhi]

 Bulk load fails in 0.98 against hadoop-1 due to unmatched family name
 -

 Key: HBASE-12406
 URL: https://issues.apache.org/jira/browse/HBASE-12406
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu
 Fix For: 0.98.8

 Attachments: 12406-0.98-v1.txt


 From 
 https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/614/testReport/org.apache.hadoop.hbase.mapreduce/TestCopyTable/testCopyTableWithBulkload/
  :
 {code}
 java.io.IOException: Unmatched family names found: unmatched family names in 
 HFiles to be bulkloaded: [_logs]; valid family names of table testCopyTable2 
 are: [family]
   at 
 org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:268)
   at 
 org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:907)
   at org.apache.hadoop.hbase.mapreduce.CopyTable.run(CopyTable.java:344)
 {code}
 The above failure was due to the presence of a history directory under the 
 _logs directory.
 e.g.
 {code}
 hdfs://nn:59313/user/tyu/copytable/4282249372082687850/_logs/history
 {code}
 HBASE-12375 removed the check for directory names which start with an 
 underscore.
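 A sketch of the kind of guard that was lost (illustrative only, not the 
 actual patch; 'fs' is the FileSystem and 'hfofDir' the bulk load output dir):
 {code}
 for (FileStatus stat : fs.listStatus(hfofDir)) {
   Path familyDir = stat.getPath();
   // skip non-family entries such as _logs or _SUCCESS
   if (familyDir.getName().startsWith("_")) continue;
   // ... treat familyDir as a column-family directory of HFiles ...
 }
 {code}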



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12393) The regionserver web will throw exception if we disable block cache

2014-11-01 Thread ChiaPing Tsai (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14193208#comment-14193208
 ] 

ChiaPing Tsai commented on HBASE-12393:
---

{quote}
-1 tests included. The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.
{quote}
No UT was added, as this is only a message change in the UI.
The manual verification steps are shown below (see also the snippet after the 
list):
# set hfile.block.cache.size to zero.
# open the RegionServer UI; there is no NullPointerException anymore.
# click on the Block Cache's Stats; the message "Block Cache is disabled" 
will appear.
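
For reference, a sketch of step 1 as a programmatic setting (the same key can 
be set in hbase-site.xml; Configuration/HBaseConfiguration as usual):
{code}
Configuration conf = HBaseConfiguration.create();
conf.setFloat("hfile.block.cache.size", 0f); // 0 disables the block cache
{code}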

{quote}
-1 checkstyle. The applied patch generated 3782 checkstyle errors (more than 
the trunk's current 3781 errors).
{quote}
BlockCacheTmplImpl.java is the auto-generated Jamon implementation. The 
whitespace errors are due to Jamon's code style.



 The regionserver web will throw exception if we disable block cache
 ---

 Key: HBASE-12393
 URL: https://issues.apache.org/jira/browse/HBASE-12393
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.98.7
 Environment: ubuntu 12.04 64bits, hadoop-2.2.0, hbase-0.98.7-hadoop2
Reporter: ChiaPing Tsai
Priority: Minor
  Labels: patch
 Attachments: HBASE-12393.patch


 CacheConfig.getBlockCache() returns null when we set 
 hfile.block.cache.size to zero.
 This causes BlockCacheTmplImpl.java:123 to throw a NullPointerException.
 {code}
 org.jamon.escaping.Escaping.HTML.write(org.jamon.emit.StandardEmitter.valueOf(StringUtils.humanReadableInt(cacheConfig.getBlockCache().size())),
  jamonWriter);
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HBASE-12406) Bulk load fails in 0.98 against hadoop-1 due to unmatched family name

2014-11-01 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu reassigned HBASE-12406:
--

Assignee: Ted Yu

 Bulk load fails in 0.98 against hadoop-1 due to unmatched family name
 -

 Key: HBASE-12406
 URL: https://issues.apache.org/jira/browse/HBASE-12406
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Ted Yu
 Fix For: 0.98.8

 Attachments: 12406-0.98-v1.txt


 From 
 https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/614/testReport/org.apache.hadoop.hbase.mapreduce/TestCopyTable/testCopyTableWithBulkload/
  :
 {code}
 java.io.IOException: Unmatched family names found: unmatched family names in 
 HFiles to be bulkloaded: [_logs]; valid family names of table testCopyTable2 
 are: [family]
   at 
 org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:268)
   at 
 org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:907)
   at org.apache.hadoop.hbase.mapreduce.CopyTable.run(CopyTable.java:344)
 {code}
 The above failure was due to the presence of a history directory under the 
 _logs directory.
 e.g.
 {code}
 hdfs://nn:59313/user/tyu/copytable/4282249372082687850/_logs/history
 {code}
 HBASE-12375 removed the check for directory names which start with an 
 underscore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12363) KEEP_DELETED_CELLS considered harmful?

2014-11-01 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14193213#comment-14193213
 ] 

Ted Yu commented on HBASE-12363:


What if a table with KEEP_DELETED_CELLS set to TTL is exported to a cluster 
that is running an older release?
Would the exported table be parsed correctly?

 KEEP_DELETED_CELLS considered harmful?
 --

 Key: HBASE-12363
 URL: https://issues.apache.org/jira/browse/HBASE-12363
 Project: HBase
  Issue Type: Sub-task
  Components: regionserver
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
  Labels: Phoenix
 Attachments: 12363-master.txt, 12363-test.txt


 Brainstorming...
 This morning in the train (of all places) I realized a fundamental issue in 
 how KEEP_DELETED_CELLS is implemented.
 The problem is around knowing when it is safe to remove a delete marker (we 
 cannot remove it unless all cells affected by it are remove otherwise).
 This was particularly hard for family marker, since they sort before all 
 cells of a row, and hence scanning forward through an HFile you cannot know 
 whether the family markers are still needed until at least the entire row is 
 scanned.
 My solution was to keep the TS of the oldest put in any given HFile, and only 
 remove delete markers older than that TS.
 That sounds good on the face of it... But now imagine you wrote a version of 
 ROW 1 and then never update it again. Then later you write a billion other 
 rows and delete them all. Since the TS of the cells in ROW 1 is older than 
 all the delete markers for the other billion rows, these will never be 
 collected... At least for the region that hosts ROW 1 after a major 
 compaction.
 Note, in a sense that is what HBase is supposed to do when keeping deleted 
 cells: Keep them until they would be removed by some other means (for example 
 TTL, or MAX_VERSION when new versions are inserted).
 The specific problem here is that even as all KVs affected by a delete marker 
 are expired this way, the marker would not be removed if there is just one 
 older KV in the HStore.
 I don't see a good way out of this. In the parent issue I outlined these four 
 options:
 # Only allow the new flag to be set on CFs with TTL set. MIN_VERSIONS would not 
 apply to deleted rows or delete marker rows (wouldn't know how long to keep 
 family deletes in that case). (MAX)VERSIONS would still be enforced on all 
 row types except for family delete markers.
 # Translate family delete markers to column delete markers at (major) 
 compaction time.
 # Change HFileWriterV* to keep track of the earliest put TS in a store and 
 write it to the file metadata. Use that to expire delete markers that are 
 older and hence can't affect any puts in the file.
 # Have Store.java keep track of the earliest put in internalFlushCache and 
 compactStore and then append it to the file metadata. That way HFileWriterV* 
 would not need to know about KVs.
 And I implemented #4.
 I'd love to get input on ideas.
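 A minimal sketch of option #4, assuming flush-time tracking (the constant and 
 variable names are illustrative, not necessarily what the patch uses):
 {code}
 // Hypothetical: remember the smallest put timestamp seen during the flush and
 // stamp it into the HFile's metadata, so a compaction can safely expire any
 // delete marker older than every put in the file.
 long earliestPutTs = Long.MAX_VALUE;
 for (KeyValue kv : kvsToFlush) {
   if (kv.getTypeByte() == KeyValue.Type.Put.getCode()) {
     earliestPutTs = Math.min(earliestPutTs, kv.getTimestamp());
   }
 }
 writer.appendFileInfo(Bytes.toBytes("EARLIEST_PUT_TS"),
     Bytes.toBytes(earliestPutTs));
 {code}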



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-12403) IntegrationTestMTTR flaky due to aggressive RS restart timeout

2014-11-01 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HBASE-12403:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Pushed to 0.98+. Thanks folks.

 IntegrationTestMTTR flaky due to aggressive RS restart timeout
 --

 Key: HBASE-12403
 URL: https://issues.apache.org/jira/browse/HBASE-12403
 Project: HBase
  Issue Type: Test
  Components: integration tests
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
Priority: Minor
 Fix For: 2.0.0, 0.98.8, 0.99.2

 Attachments: HBASE-12403.00.patch


 TL;DR: the CM RestartRS action timeout is only 60 seconds. Considering the RS 
 must connect to the Master before it can be online, this is not enough time 
 in an environment where the Master can also be killed.
 Failure from the console says the test failed because a 
 RestartRsHoldingMetaAction timed out.
 {noformat}
 Caused by: java.io.IOException: did timeout waiting for region server to 
 start:ip-172-31-42-248.ec2.internal
 at 
 org.apache.hadoop.hbase.HBaseCluster.waitForRegionServerToStart(HBaseCluster.java:153)
 at org.apache.hadoop.hbase.chaos.actions.Action.startRs(Action.java:93)
 at 
 org.apache.hadoop.hbase.chaos.actions.RestartActionBaseAction.restartRs(RestartActionBaseAction.java:52)
 at 
 org.apache.hadoop.hbase.chaos.actions.RestartRsHoldingMetaAction.perform(RestartRsHoldingMetaAction.java:38)
 at 
 org.apache.hadoop.hbase.mttr.IntegrationTestMTTR$ActionCallable.call(IntegrationTestMTTR.java:559)
 at 
 org.apache.hadoop.hbase.mttr.IntegrationTestMTTR$ActionCallable.call(IntegrationTestMTTR.java:550)
 at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 {noformat}
 This is only reported at the end of the test run. There's no indication as to 
 when during the test run this failure happened. The timeout on the start RS 
 operation is 60 seconds.
 Hacking out the start/stop messages from the logs during the time window when 
 this test ran, it appears that at one point the RS took 2min 12s between when 
 it was launched and when it reported for duty:
 {noformat}
 Fri Oct 31 14:53:17 UTC 2014 Starting regionserver on ip-172-31-42-248
 2014-10-31 14:55:29,049 INFO  [regionserver60020] regionserver.HRegionServer: 
 Serving as ip-172-31-42-248.ec2.internal,60020,1414767238992, RpcServer on 
 ip-172-31-42-248.ec2.internal/172.31.42.248:60020, sessionid=0x249661c2b7b0118
 {noformat}
 The RS came up without incident. It spent 1min 4s of that time waiting on the 
 master to start, and attempted to report for duty from 14:54:28 to 14:55:24.
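 A sketch of the shape of the fix, assuming the wait timeout is made 
 configurable (the property name and default below are illustrative, not 
 necessarily the committed values):
 {code}
 // Hypothetical: read the RS-start timeout from the configuration instead of
 // hard-coding 60 seconds, so CM tolerates a concurrently-killed Master.
 long startRsTimeoutMs = getConf().getLong(
     "hbase.chaosmonkey.action.startrstimeout", 3 * 60 * 1000L);
 cluster.startRegionServer(server.getHostname());
 cluster.waitForRegionServerToStart(server.getHostname(), startRsTimeoutMs);
 {code}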



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12394) Support multiple regions as input to each mapper in map/reduce jobs

2014-11-01 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193306#comment-14193306
 ] 

Ted Yu commented on HBASE-12394:


Mind putting the patch on reviewboard?
hbase.mapreduce.scan.regionspermapper controls how many mappers would be used.
Have you considered specifying the number of mappers for this feature?

Thanks

 Support multiple regions as input to each mapper in map/reduce jobs
 ---

 Key: HBASE-12394
 URL: https://issues.apache.org/jira/browse/HBASE-12394
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Affects Versions: 2.0.0, 0.98.6.1
Reporter: Weichen Ye
 Attachments: HBASE-12394.patch


 For a Hadoop cluster, a job with a large HBase table as input always consumes 
 a large amount of computing resources. For example, we need to create a job 
 with 1000 mappers to scan a table with 1000 regions. This patch is to support 
 one mapper using multiple regions as input.
  
 The following new files are included in this patch:
 TableMultiRegionInputFormat.java
 TableMultiRegionInputFormatBase.java
 TableMultiRegionMapReduceUtil.java
 *TestTableMultiRegionInputFormatScan1.java
 *TestTableMultiRegionInputFormatScan2.java
 *TestTableMultiRegionInputFormatScanBase.java
 *TestTableMultiRegionMapReduceUtil.java
  
 The files starting with * are tests.
 In order to support multiple regions for one mapper, we need a new 
 configuration property: hbase.mapreduce.scan.regionspermapper
 This is an example, which means each mapper has 3 regions as input:
 {code}
 <property>
   <name>hbase.mapreduce.scan.regionspermapper</name>
   <value>3</value>
 </property>
 {code}
 This is an example of the Java code:
 {code}
 TableMultiRegionMapReduceUtil.initTableMapperJob(tablename, scan, Map.class, 
 Text.class, Text.class, job);
 {code}
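 For context, a fuller job setup might look like the following sketch (only the 
 TableMultiRegion* classes come from the patch; the table name, mapper class, 
 and surrounding boilerplate are assumed):
 {code}
 Configuration conf = HBaseConfiguration.create();
 conf.setInt("hbase.mapreduce.scan.regionspermapper", 3); // 3 regions/mapper
 Job job = Job.getInstance(conf, "multi-region-scan");
 Scan scan = new Scan();
 TableMultiRegionMapReduceUtil.initTableMapperJob("mytable", scan,
     MyMapper.class, Text.class, Text.class, job);
 job.waitForCompletion(true);
 {code}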
  
   



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-12399) Master startup race between metrics and RpcServer

2014-11-01 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HBASE-12399:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Pushed to 0.98+

 Master startup race between metrics and RpcServer
 -

 Key: HBASE-12399
 URL: https://issues.apache.org/jira/browse/HBASE-12399
 Project: HBase
  Issue Type: Bug
  Components: master
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
Priority: Minor
 Fix For: 2.0.0, 0.98.8, 0.99.2

 Attachments: 12399.patch, HBASE-12399.00.patch


 Seeing this on CM tests with frequent master thrashing:
 {noformat}
 2014-10-31 12:01:59,196 ERROR [Timer for 'HBase' metrics system] 
 impl.MetricsSourceAdapter: Error getting metrics from source IPC,sub=IPC
 java.lang.NullPointerException
   at 
 org.apache.hadoop.hbase.ipc.FifoRpcScheduler.getGeneralQueueLength(FifoRpcScheduler.java:81)
   at 
 org.apache.hadoop.hbase.ipc.MetricsHBaseServerWrapperImpl.getGeneralQueueLength(MetricsHBaseServerWrapperImpl.java:43)
   at 
 org.apache.hadoop.hbase.ipc.MetricsHBaseServerSourceImpl.getMetrics(MetricsHBaseServerSourceImpl.java:117)
   at 
 org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:195)
   at 
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl.snapshotMetrics(MetricsSystemImpl.java:419)
   at 
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl.sampleMetrics(MetricsSystemImpl.java:406)
   at 
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl.onTimerEvent(MetricsSystemImpl.java:382)
   at 
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl$4.run(MetricsSystemImpl.java:369)
   at java.util.TimerThread.mainLoop(Timer.java:555)
   at java.util.TimerThread.run(Timer.java:505)
 {noformat}
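 A minimal sketch of the kind of guard that avoids the race (illustrative; the 
 committed patch may differ):
 {code}
 // Hypothetical defensive check in the metrics wrapper: report 0 while the
 // RpcServer/scheduler has not finished initializing, instead of letting the
 // metrics timer thread dereference a null field.
 @Override
 public int getGeneralQueueLength() {
   if (server == null || server.getScheduler() == null) {
     return 0;
   }
   return server.getScheduler().getGeneralQueueLength();
 }
 {code}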



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12402) ZKPermissionWatcher race condition in refreshing the cache leaving stale ACLs and causing AccessDenied

2014-11-01 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193372#comment-14193372
 ] 

Enis Soztutar commented on HBASE-12402:
---

I have run IntegrationTestIngest with CM on a cluster of 4 nodes 10 times to 
test the change. It seems good to go.

 ZKPermissionWatcher race condition in refreshing the cache leaving stale ACLs 
 and causing AccessDenied
 --

 Key: HBASE-12402
 URL: https://issues.apache.org/jira/browse/HBASE-12402
 Project: HBase
  Issue Type: Bug
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 2.0.0, 0.98.8, 0.99.2

 Attachments: hbase-12402_v1.patch


 In testing, we have seen an issue where a region in a newly created table 
 will throw AccessDeniedException. 
 There seems to be a race condition in the ZKPermissionWatcher when it is just 
 starting up, and a new table is created around the same time. 
 The master just created the table, and adds permissions to acl table:
 {code}
 2014-10-30 19:21:26,494 DEBUG 
 [MASTER_TABLE_OPERATIONS-ip-172-31-32-87:6-0] access.AccessControlLists: 
 Writing permission with rowKey loadtest_d1 hrt_qa: RWXCA
 {code}
 One of the region servers is just starting: 
 {code}
 Thu Oct 30 19:21:11 UTC 2014 Starting regionserver on ip-172-31-32-90
 2014-10-30 19:21:13,915 INFO  [main] util.VersionInfo: HBase 
 0.98.4.2.2.0.0-1194-hadoop2
 {code}
 The node creation event is received 
 {code}
 2014-10-30 19:21:26,764 DEBUG [regionserver60020-EventThread] 
 access.ZKPermissionWatcher: Updating permissions cache from node loadtest_d1 
 with data: 
 PBUF\x0A0\x0A\x06hrt_qa\x12\x08\x03\x0A\x16\x0A\x07default\x12\x0Bloadtest_d1
  \x00 \x01 \x02 \x03 \x04
 {code}
 which put the right data into the cache, only for it to be invalidated shortly 
 after:
 {code}
 ...
 2014-10-30 19:21:26,855 DEBUG [RS_OPEN_REGION-ip-172-31-32-90:60020-1] 
 access.ZKPermissionWatcher: Updating permissions cache from node 
 tabletwo_copytable_cell_versions_two with data: 
 PBUF\x0AI\x0A\x06hrt_qa\x12?\x08\x03;\x0A/\x0A\x07default\x12$tabletwo_copytable_cell_versions_two
  \x00 \x01 \x02 \x03 \x04
 2014-10-30 19:21:26,856 DEBUG [RS_OPEN_REGION-ip-172-31-32-90:60020-1] 
 access.ZKPermissionWatcher: Updating permissions cache from node loadtest_d1 
 with data: PBUF
 2014-10-30 19:21:26,856 DEBUG [RS_OPEN_REGION-ip-172-31-32-90:60020-1] 
 access.ZKPermissionWatcher: Updating permissions cache from node 
 tablefour_cell_version_snapshots_copy with data: PBUF
 ...
 {code}
 Notice that the threads are different. The first one is the ZK event 
 notification thread, while the other is the thread from the OpenRegionHandler.
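 One way to remove the race, sketched under the assumption that all cache 
 refreshes can be funneled through one lock (the names below are illustrative, 
 not the actual patch):
 {code}
 // Hypothetical: serialize updates from the ZK event thread and the
 // OpenRegionHandler thread so a stale snapshot cannot clobber newer data.
 private final Object cacheLock = new Object();

 void refreshEntry(String entry, byte[] nodeData) throws IOException {
   synchronized (cacheLock) {
     if (nodeData == null || nodeData.length == 0) {
       permissionCache.remove(entry);                  // node was cleared
     } else {
       permissionCache.put(entry, parsePermissions(nodeData));
     }
   }
 }
 {code}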



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12399) Master startup race between metrics and RpcServer

2014-11-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193376#comment-14193376
 ] 

Hudson commented on HBASE-12399:


SUCCESS: Integrated in HBase-TRUNK #5735 (See 
[https://builds.apache.org/job/HBase-TRUNK/5735/])
HBASE-12399 Master startup race between metrics and RpcServer (ndimiduk: rev 
b5764a8e74179bfc0c09a416d51271116b903c2c)
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/MetricsHBaseServerWrapperImpl.java


 Master startup race between metrics and RpcServer
 -

 Key: HBASE-12399
 URL: https://issues.apache.org/jira/browse/HBASE-12399
 Project: HBase
  Issue Type: Bug
  Components: master
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
Priority: Minor
 Fix For: 2.0.0, 0.98.8, 0.99.2

 Attachments: 12399.patch, HBASE-12399.00.patch


 Seeing this on CM tests with frequent master thrashing:
 {noformat}
 2014-10-31 12:01:59,196 ERROR [Timer for 'HBase' metrics system] 
 impl.MetricsSourceAdapter: Error getting metrics from source IPC,sub=IPC
 java.lang.NullPointerException
   at 
 org.apache.hadoop.hbase.ipc.FifoRpcScheduler.getGeneralQueueLength(FifoRpcScheduler.java:81)
   at 
 org.apache.hadoop.hbase.ipc.MetricsHBaseServerWrapperImpl.getGeneralQueueLength(MetricsHBaseServerWrapperImpl.java:43)
   at 
 org.apache.hadoop.hbase.ipc.MetricsHBaseServerSourceImpl.getMetrics(MetricsHBaseServerSourceImpl.java:117)
   at 
 org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:195)
   at 
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl.snapshotMetrics(MetricsSystemImpl.java:419)
   at 
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl.sampleMetrics(MetricsSystemImpl.java:406)
   at 
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl.onTimerEvent(MetricsSystemImpl.java:382)
   at 
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl$4.run(MetricsSystemImpl.java:369)
   at java.util.TimerThread.mainLoop(Timer.java:555)
   at java.util.TimerThread.run(Timer.java:505)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12403) IntegrationTestMTTR flaky due to aggressive RS restart timeout

2014-11-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193375#comment-14193375
 ] 

Hudson commented on HBASE-12403:


SUCCESS: Integrated in HBase-TRUNK #5735 (See 
[https://builds.apache.org/job/HBase-TRUNK/5735/])
HBASE-12403 IntegrationTestMTTR flaky due to aggressive RS restart timeout 
(ndimiduk: rev 3c06b48181e22eb4ce91d6d8a455a1617f13d85f)
* hbase-it/src/test/java/org/apache/hadoop/hbase/mttr/IntegrationTestMTTR.java
* hbase-it/src/test/java/org/apache/hadoop/hbase/chaos/actions/Action.java


 IntegrationTestMTTR flaky due to aggressive RS restart timeout
 --

 Key: HBASE-12403
 URL: https://issues.apache.org/jira/browse/HBASE-12403
 Project: HBase
  Issue Type: Test
  Components: integration tests
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
Priority: Minor
 Fix For: 2.0.0, 0.98.8, 0.99.2

 Attachments: HBASE-12403.00.patch


 TL;DR: the CM RestartRS action timeout is only 60 seconds. Considering the RS 
 must connect to the Master before it can be online, this is not enough time 
 in an environment where the Master can also be killed.
 Failure from the console says the test failed because a 
 RestartRsHoldingMetaAction timed out.
 {noformat}
 Caused by: java.io.IOException: did timeout waiting for region server to 
 start:ip-172-31-42-248.ec2.internal
 at 
 org.apache.hadoop.hbase.HBaseCluster.waitForRegionServerToStart(HBaseCluster.java:153)
 at org.apache.hadoop.hbase.chaos.actions.Action.startRs(Action.java:93)
 at 
 org.apache.hadoop.hbase.chaos.actions.RestartActionBaseAction.restartRs(RestartActionBaseAction.java:52)
 at 
 org.apache.hadoop.hbase.chaos.actions.RestartRsHoldingMetaAction.perform(RestartRsHoldingMetaAction.java:38)
 at 
 org.apache.hadoop.hbase.mttr.IntegrationTestMTTR$ActionCallable.call(IntegrationTestMTTR.java:559)
 at 
 org.apache.hadoop.hbase.mttr.IntegrationTestMTTR$ActionCallable.call(IntegrationTestMTTR.java:550)
 at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 {noformat}
 This is only reported at the end of the test run. There's no indication as to 
 when during the test run this failure happened. The timeout on the start RS 
 operation is 60 seconds.
 Hacking out the start/stop messages from the logs during the time window when 
 this test ran, it appears that at one point the RS took 2min 12s between when 
 it was launched and when it reported for duty:
 {noformat}
 Fri Oct 31 14:53:17 UTC 2014 Starting regionserver on ip-172-31-42-248
 2014-10-31 14:55:29,049 INFO  [regionserver60020] regionserver.HRegionServer: 
 Serving as ip-172-31-42-248.ec2.internal,60020,1414767238992, RpcServer on 
 ip-172-31-42-248.ec2.internal/172.31.42.248:60020, sessionid=0x249661c2b7b0118
 {noformat}
 The RS came up without incident. It spent 1min 4s of that time waiting on the 
 master to start, and attempted to report for duty from 14:54:28 to 14:55:24.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12399) Master startup race between metrics and RpcServer

2014-11-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193408#comment-14193408
 ] 

Hudson commented on HBASE-12399:


FAILURE: Integrated in HBase-1.0 #405 (See 
[https://builds.apache.org/job/HBase-1.0/405/])
HBASE-12399 Master startup race between metrics and RpcServer (ndimiduk: rev 
c3a7f2f3bbb2a12bfffeff6d181e619a1545c41a)
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/MetricsHBaseServerWrapperImpl.java


 Master startup race between metrics and RpcServer
 -

 Key: HBASE-12399
 URL: https://issues.apache.org/jira/browse/HBASE-12399
 Project: HBase
  Issue Type: Bug
  Components: master
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
Priority: Minor
 Fix For: 2.0.0, 0.98.8, 0.99.2

 Attachments: 12399.patch, HBASE-12399.00.patch


 Seeing this on CM tests with frequent master thrashing:
 {noformat}
 2014-10-31 12:01:59,196 ERROR [Timer for 'HBase' metrics system] 
 impl.MetricsSourceAdapter: Error getting metrics from source IPC,sub=IPC
 java.lang.NullPointerException
   at 
 org.apache.hadoop.hbase.ipc.FifoRpcScheduler.getGeneralQueueLength(FifoRpcScheduler.java:81)
   at 
 org.apache.hadoop.hbase.ipc.MetricsHBaseServerWrapperImpl.getGeneralQueueLength(MetricsHBaseServerWrapperImpl.java:43)
   at 
 org.apache.hadoop.hbase.ipc.MetricsHBaseServerSourceImpl.getMetrics(MetricsHBaseServerSourceImpl.java:117)
   at 
 org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:195)
   at 
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl.snapshotMetrics(MetricsSystemImpl.java:419)
   at 
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl.sampleMetrics(MetricsSystemImpl.java:406)
   at 
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl.onTimerEvent(MetricsSystemImpl.java:382)
   at 
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl$4.run(MetricsSystemImpl.java:369)
   at java.util.TimerThread.mainLoop(Timer.java:555)
   at java.util.TimerThread.run(Timer.java:505)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12403) IntegrationTestMTTR flaky due to aggressive RS restart timeout

2014-11-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193407#comment-14193407
 ] 

Hudson commented on HBASE-12403:


FAILURE: Integrated in HBase-1.0 #405 (See 
[https://builds.apache.org/job/HBase-1.0/405/])
HBASE-12403 IntegrationTestMTTR flaky due to aggressive RS restart timeout 
(ndimiduk: rev 687710eb2869817952461796d04e35de29a98fdb)
* hbase-it/src/test/java/org/apache/hadoop/hbase/chaos/actions/Action.java
* hbase-it/src/test/java/org/apache/hadoop/hbase/mttr/IntegrationTestMTTR.java


 IntegrationTestMTTR flaky due to aggressive RS restart timeout
 --

 Key: HBASE-12403
 URL: https://issues.apache.org/jira/browse/HBASE-12403
 Project: HBase
  Issue Type: Test
  Components: integration tests
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
Priority: Minor
 Fix For: 2.0.0, 0.98.8, 0.99.2

 Attachments: HBASE-12403.00.patch


 TL;DR: the CM RestartRS action timeout is only 60 seconds. Considering the RS 
 must connect to the Master before it can be online, this is not enough time 
 in an environment where the Master can also be killed.
 Failure from the console says the test failed because a 
 RestartRsHoldingMetaAction timed out.
 {noformat}
 Caused by: java.io.IOException: did timeout waiting for region server to 
 start:ip-172-31-42-248.ec2.internal
 at 
 org.apache.hadoop.hbase.HBaseCluster.waitForRegionServerToStart(HBaseCluster.java:153)
 at org.apache.hadoop.hbase.chaos.actions.Action.startRs(Action.java:93)
 at 
 org.apache.hadoop.hbase.chaos.actions.RestartActionBaseAction.restartRs(RestartActionBaseAction.java:52)
 at 
 org.apache.hadoop.hbase.chaos.actions.RestartRsHoldingMetaAction.perform(RestartRsHoldingMetaAction.java:38)
 at 
 org.apache.hadoop.hbase.mttr.IntegrationTestMTTR$ActionCallable.call(IntegrationTestMTTR.java:559)
 at 
 org.apache.hadoop.hbase.mttr.IntegrationTestMTTR$ActionCallable.call(IntegrationTestMTTR.java:550)
 at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 {noformat}
 This is only reported at the end of the test run. There's no indication as to 
 when during the test run this failure happened. The timeout on the start RS 
 operation is 60 seconds.
 Hacking out the start/stop messages from the logs during the time window when 
 this test ran, it appears that at one point the RS took 2min 12s between when 
 it was launched and when it reported for duty:
 {noformat}
 Fri Oct 31 14:53:17 UTC 2014 Starting regionserver on ip-172-31-42-248
 2014-10-31 14:55:29,049 INFO  [regionserver60020] regionserver.HRegionServer: 
 Serving as ip-172-31-42-248.ec2.internal,60020,1414767238992, RpcServer on 
 ip-172-31-42-248.ec2.internal/172.31.42.248:60020, sessionid=0x249661c2b7b0118
 {noformat}
 The RS came up without incident. It spent 1min 4s of that time waiting on the 
 master to start, and attempted to report for duty from 14:54:28 to 14:55:24.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12399) Master startup race between metrics and RpcServer

2014-11-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193437#comment-14193437
 ] 

Hudson commented on HBASE-12399:


FAILURE: Integrated in HBase-0.98 #647 (See 
[https://builds.apache.org/job/HBase-0.98/647/])
HBASE-12399 Master startup race between metrics and RpcServer (ndimiduk: rev 
da145ae2da11d0b59f47ca78bb26c166a84bf386)
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/MetricsHBaseServerWrapperImpl.java


 Master startup race between metrics and RpcServer
 -

 Key: HBASE-12399
 URL: https://issues.apache.org/jira/browse/HBASE-12399
 Project: HBase
  Issue Type: Bug
  Components: master
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
Priority: Minor
 Fix For: 2.0.0, 0.98.8, 0.99.2

 Attachments: 12399.patch, HBASE-12399.00.patch


 Seeing this on CM tests with frequent master thrashing:
 {noformat}
 2014-10-31 12:01:59,196 ERROR [Timer for 'HBase' metrics system] 
 impl.MetricsSourceAdapter: Error getting metrics from source IPC,sub=IPC
 java.lang.NullPointerException
   at 
 org.apache.hadoop.hbase.ipc.FifoRpcScheduler.getGeneralQueueLength(FifoRpcScheduler.java:81)
   at 
 org.apache.hadoop.hbase.ipc.MetricsHBaseServerWrapperImpl.getGeneralQueueLength(MetricsHBaseServerWrapperImpl.java:43)
   at 
 org.apache.hadoop.hbase.ipc.MetricsHBaseServerSourceImpl.getMetrics(MetricsHBaseServerSourceImpl.java:117)
   at 
 org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:195)
   at 
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl.snapshotMetrics(MetricsSystemImpl.java:419)
   at 
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl.sampleMetrics(MetricsSystemImpl.java:406)
   at 
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl.onTimerEvent(MetricsSystemImpl.java:382)
   at 
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl$4.run(MetricsSystemImpl.java:369)
   at java.util.TimerThread.mainLoop(Timer.java:555)
   at java.util.TimerThread.run(Timer.java:505)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12403) IntegrationTestMTTR flaky due to aggressive RS restart timeout

2014-11-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193436#comment-14193436
 ] 

Hudson commented on HBASE-12403:


FAILURE: Integrated in HBase-0.98 #647 (See 
[https://builds.apache.org/job/HBase-0.98/647/])
HBASE-12403 IntegrationTestMTTR flaky due to aggressive RS restart timeout 
(ndimiduk: rev 414bed7197097db4e2ce638f46d9996fdfb305b1)
* hbase-it/src/test/java/org/apache/hadoop/hbase/chaos/actions/Action.java
* hbase-it/src/test/java/org/apache/hadoop/hbase/mttr/IntegrationTestMTTR.java


 IntegrationTestMTTR flaky due to aggressive RS restart timeout
 --

 Key: HBASE-12403
 URL: https://issues.apache.org/jira/browse/HBASE-12403
 Project: HBase
  Issue Type: Test
  Components: integration tests
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
Priority: Minor
 Fix For: 2.0.0, 0.98.8, 0.99.2

 Attachments: HBASE-12403.00.patch


 TL;DR: the CM RestartRS action timeout is only 60 seconds. Considering the RS 
 must connect to the Master before it can be online, this is not enough time 
 in an environment where the Master can also be killed.
 Failure from the console says the test failed because a 
 RestartRsHoldingMetaAction timed out.
 {noformat}
 Caused by: java.io.IOException: did timeout waiting for region server to 
 start:ip-172-31-42-248.ec2.internal
 at 
 org.apache.hadoop.hbase.HBaseCluster.waitForRegionServerToStart(HBaseCluster.java:153)
 at org.apache.hadoop.hbase.chaos.actions.Action.startRs(Action.java:93)
 at 
 org.apache.hadoop.hbase.chaos.actions.RestartActionBaseAction.restartRs(RestartActionBaseAction.java:52)
 at 
 org.apache.hadoop.hbase.chaos.actions.RestartRsHoldingMetaAction.perform(RestartRsHoldingMetaAction.java:38)
 at 
 org.apache.hadoop.hbase.mttr.IntegrationTestMTTR$ActionCallable.call(IntegrationTestMTTR.java:559)
 at 
 org.apache.hadoop.hbase.mttr.IntegrationTestMTTR$ActionCallable.call(IntegrationTestMTTR.java:550)
 at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 {noformat}
 This is only reported at the end of the test run. There's no indication as to 
 when during the test run this failure happened. The timeout on the start RS 
 operation is 60 seconds.
 Hacking out the start/stop messages from the logs during the time window when 
 this test ran, it appears that at one point the RS took 2min 12s between when 
 it was launched and when it reported for duty:
 {noformat}
 Fri Oct 31 14:53:17 UTC 2014 Starting regionserver on ip-172-31-42-248
 2014-10-31 14:55:29,049 INFO  [regionserver60020] regionserver.HRegionServer: 
 Serving as ip-172-31-42-248.ec2.internal,60020,1414767238992, RpcServer on 
 ip-172-31-42-248.ec2.internal/172.31.42.248:60020, sessionid=0x249661c2b7b0118
 {noformat}
 The RS came up without incident. It spent 1min 4s of that time waiting on the 
 master to start, and attempted to report for duty from 14:54:28 to 14:55:24.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12403) IntegrationTestMTTR flaky due to aggressive RS restart timeout

2014-11-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193450#comment-14193450
 ] 

Hudson commented on HBASE-12403:


FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #615 (See 
[https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/615/])
HBASE-12403 IntegrationTestMTTR flaky due to aggressive RS restart timeout 
(ndimiduk: rev 414bed7197097db4e2ce638f46d9996fdfb305b1)
* hbase-it/src/test/java/org/apache/hadoop/hbase/mttr/IntegrationTestMTTR.java
* hbase-it/src/test/java/org/apache/hadoop/hbase/chaos/actions/Action.java


 IntegrationTestMTTR flaky due to aggressive RS restart timeout
 --

 Key: HBASE-12403
 URL: https://issues.apache.org/jira/browse/HBASE-12403
 Project: HBase
  Issue Type: Test
  Components: integration tests
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
Priority: Minor
 Fix For: 2.0.0, 0.98.8, 0.99.2

 Attachments: HBASE-12403.00.patch


 TL;DR: the CM RestartRS action timeout is only 60 seconds. Considering the RS 
 must connect to the Master before it can be online, this is not enough time 
 in an environment where the Master can also be killed.
 Failure from the console says the test failed because a 
 RestartRsHoldingMetaAction timed out.
 {noformat}
 Caused by: java.io.IOException: did timeout waiting for region server to 
 start:ip-172-31-42-248.ec2.internal
 at 
 org.apache.hadoop.hbase.HBaseCluster.waitForRegionServerToStart(HBaseCluster.java:153)
 at org.apache.hadoop.hbase.chaos.actions.Action.startRs(Action.java:93)
 at 
 org.apache.hadoop.hbase.chaos.actions.RestartActionBaseAction.restartRs(RestartActionBaseAction.java:52)
 at 
 org.apache.hadoop.hbase.chaos.actions.RestartRsHoldingMetaAction.perform(RestartRsHoldingMetaAction.java:38)
 at 
 org.apache.hadoop.hbase.mttr.IntegrationTestMTTR$ActionCallable.call(IntegrationTestMTTR.java:559)
 at 
 org.apache.hadoop.hbase.mttr.IntegrationTestMTTR$ActionCallable.call(IntegrationTestMTTR.java:550)
 at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 {noformat}
 This is only reported at the end of the test run. There's no indication as to 
 when during the test run this failure happened. The timeout on the start RS 
 operation is 60 seconds.
 Hacking out the start/stop messages from the logs during the time window when 
 this test ran, it appears that at one point the RS took 2min 12s between when 
 it was launched and when it reported for duty:
 {noformat}
 Fri Oct 31 14:53:17 UTC 2014 Starting regionserver on ip-172-31-42-248
 2014-10-31 14:55:29,049 INFO  [regionserver60020] regionserver.HRegionServer: 
 Serving as ip-172-31-42-248.ec2.internal,60020,1414767238992, RpcServer on 
 ip-172-31-42-248.ec2.internal/172.31.42.248:60020, sessionid=0x249661c2b7b0118
 {noformat}
 The RS came up without incident. It spent 1min 4s of that time waiting on the 
 master to start, and attempted to report for duty from 14:54:28 to 14:55:24.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12399) Master startup race between metrics and RpcServer

2014-11-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193451#comment-14193451
 ] 

Hudson commented on HBASE-12399:


FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #615 (See 
[https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/615/])
HBASE-12399 Master startup race between metrics and RpcServer (ndimiduk: rev 
da145ae2da11d0b59f47ca78bb26c166a84bf386)
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/MetricsHBaseServerWrapperImpl.java


 Master startup race between metrics and RpcServer
 -

 Key: HBASE-12399
 URL: https://issues.apache.org/jira/browse/HBASE-12399
 Project: HBase
  Issue Type: Bug
  Components: master
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
Priority: Minor
 Fix For: 2.0.0, 0.98.8, 0.99.2

 Attachments: 12399.patch, HBASE-12399.00.patch


 Seeing this on CM tests with frequent master thrashing:
 {noformat}
 2014-10-31 12:01:59,196 ERROR [Timer for 'HBase' metrics system] 
 impl.MetricsSourceAdapter: Error getting metrics from source IPC,sub=IPC
 java.lang.NullPointerException
   at 
 org.apache.hadoop.hbase.ipc.FifoRpcScheduler.getGeneralQueueLength(FifoRpcScheduler.java:81)
   at 
 org.apache.hadoop.hbase.ipc.MetricsHBaseServerWrapperImpl.getGeneralQueueLength(MetricsHBaseServerWrapperImpl.java:43)
   at 
 org.apache.hadoop.hbase.ipc.MetricsHBaseServerSourceImpl.getMetrics(MetricsHBaseServerSourceImpl.java:117)
   at 
 org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:195)
   at 
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl.snapshotMetrics(MetricsSystemImpl.java:419)
   at 
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl.sampleMetrics(MetricsSystemImpl.java:406)
   at 
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl.onTimerEvent(MetricsSystemImpl.java:382)
   at 
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl$4.run(MetricsSystemImpl.java:369)
   at java.util.TimerThread.mainLoop(Timer.java:555)
   at java.util.TimerThread.run(Timer.java:505)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12363) KEEP_DELETED_CELLS considered harmful?

2014-11-01 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193510#comment-14193510
 ] 

Lars Hofhansl commented on HBASE-12363:
---

Obviously that is not going to work. The old code would interpret that as not 
true (i.e. false) and have KEEP_DELETED_CELLS disabled.

One would have to be aware of that before enabling the new feature.

I also need to fix the long lines and put an interface 
annotation/comment/license into the KeepDeletedCells enum.
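A sketch of the likely shape of that enum once annotated (the values follow the 
discussion in this issue; the exact code may differ):
{code}
@InterfaceAudience.Public
public enum KeepDeletedCells {
  FALSE, // do not keep deleted cells
  TRUE,  // keep deleted cells until they can be removed by other means
  TTL    // keep deleted cells until the CF's TTL expires them
}
{code}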


 KEEP_DELETED_CELLS considered harmful?
 --

 Key: HBASE-12363
 URL: https://issues.apache.org/jira/browse/HBASE-12363
 Project: HBase
  Issue Type: Sub-task
  Components: regionserver
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
  Labels: Phoenix
 Attachments: 12363-master.txt, 12363-test.txt


 Brainstorming...
 This morning in the train (of all places) I realized a fundamental issue in 
 how KEEP_DELETED_CELLS is implemented.
 The problem is around knowing when it is safe to remove a delete marker (we 
 cannot remove it unless all cells affected by it are removed otherwise).
 This was particularly hard for family markers, since they sort before all 
 cells of a row, and hence, scanning forward through an HFile, you cannot know 
 whether the family markers are still needed until at least the entire row is 
 scanned.
 My solution was to keep the TS of the oldest put in any given HFile, and only 
 remove delete markers older than that TS.
 That sounds good on the face of it... But now imagine you wrote a version of 
 ROW 1 and then never updated it again. Then later you write a billion other 
 rows and delete them all. Since the TS of the cells in ROW 1 is older than 
 all the delete markers for the other billion rows, these will never be 
 collected... At least for the region that hosts ROW 1 after a major 
 compaction.
 Note, in a sense that is what HBase is supposed to do when keeping deleted 
 cells: Keep them until they would be removed by some other means (for example 
 TTL, or MAX_VERSION when new versions are inserted).
 The specific problem here is that even as all KVs affected by a delete marker 
 are expired this way, the marker would not be removed if there is just one 
 older KV in the HStore.
 I don't see a good way out of this. In the parent issue I outlined these four 
 options:
 # Only allow the new flag to be set on CFs with TTL set. MIN_VERSIONS would not 
 apply to deleted rows or delete marker rows (wouldn't know how long to keep 
 family deletes in that case). (MAX)VERSIONS would still be enforced on all 
 row types except for family delete markers.
 # Translate family delete markers to column delete markers at (major) 
 compaction time.
 # Change HFileWriterV* to keep track of the earliest put TS in a store and 
 write it to the file metadata. Use that to expire delete markers that are 
 older and hence can't affect any puts in the file.
 # Have Store.java keep track of the earliest put in internalFlushCache and 
 compactStore and then append it to the file metadata. That way HFileWriterV* 
 would not need to know about KVs.
 And I implemented #4.
 I'd love to get input on ideas.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (HBASE-12219) Cache more efficiently getAll() and get() in FSTableDescriptors

2014-11-01 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack reopened HBASE-12219:
---

Reverted branch-1 patch and addendum.  Builds have been unstable since this 
patch went in.  I'm reverting till the build is back to stable again, then will 
put stuff back.

 Cache more efficiently getAll() and get() in FSTableDescriptors
 ---

 Key: HBASE-12219
 URL: https://issues.apache.org/jira/browse/HBASE-12219
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.94.24, 0.99.1, 0.98.6.1
Reporter: Esteban Gutierrez
Assignee: Esteban Gutierrez
  Labels: scalability
 Fix For: 2.0.0, 0.98.8, 0.99.2

 Attachments: HBASE-12219-0.98.patch, HBASE-12219-0.98.v1.patch, 
 HBASE-12219-0.99.addendum.patch, HBASE-12219-0.99.patch, 
 HBASE-12219-v1.patch, HBASE-12219-v1.patch, HBASE-12219.v0.txt, 
 HBASE-12219.v2.patch, HBASE-12219.v3.patch, list.png


 Currently table descriptors and tables are cached once they are accessed for 
 the first time. Subsequent calls to the master only require a trip to HDFS to 
 look up the modified time in order to reload the table descriptors if 
 modified. However, in clusters with a large number of tables or concurrent 
 clients this can be too aggressive on HDFS and the master, causing 
 contention while processing other requests. A simple solution is to have a 
 TTL-based cache for FSTableDescriptors#getAll() and 
 FSTableDescriptors#TableDescriptorAndModtime(), which can allow the master to 
 process those calls faster without causing contention and without having to 
 perform a trip to HDFS for every call to listtables() or getTableDescriptor().
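 A minimal sketch of such a TTL-gated cache (the field and helper names here 
 are illustrative, not the actual patch):
 {code}
 // Hypothetical: serve getAll() from memory and only re-list HDFS once the
 // cached snapshot is older than a configurable TTL.
 private Map<String, HTableDescriptor> cache;
 private long lastRefreshTs;
 private final long cacheTtlMs = 60 * 1000L; // illustrative default

 public synchronized Map<String, HTableDescriptor> getAll() throws IOException {
   long now = EnvironmentEdgeManager.currentTimeMillis();
   if (cache == null || now - lastRefreshTs > cacheTtlMs) {
     cache = listTableDescriptorsFromFs(); // hypothetical helper: one HDFS scan
     lastRefreshTs = now;
   }
   return cache;
 }
 {code}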



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-12407) HConnectionKey doesn't contain CUSTOM_CONTROLLER_CONF_KEY in CONNECTION_PROPERTIES

2014-11-01 Thread Jeffrey Zhong (JIRA)
Jeffrey Zhong created HBASE-12407:
-

 Summary: HConnectionKey doesn't contain CUSTOM_CONTROLLER_CONF_KEY 
in CONNECTION_PROPERTIES 
 Key: HBASE-12407
 URL: https://issues.apache.org/jira/browse/HBASE-12407
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.1, 0.98.7, 2.0.0
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong


This causes an HTable instance created with a custom 
RpcControllerFactory.CUSTOM_CONTROLLER_CONF_KEY conf setting to internally use 
a cached connection without this custom conf setting, because 
CUSTOM_CONTROLLER_CONF_KEY isn't part of HConnectionKey.CONNECTION_PROPERTIES.
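A sketch of the likely shape of the fix (only a sample of the existing keys is 
shown; the actual list in HConnectionKey is longer):
{code}
// Hypothetical: include the controller key among the properties that
// distinguish cached connections, so two configurations that differ only in
// their custom RpcControllerFactory no longer share a connection.
private static final String[] CONNECTION_PROPERTIES = new String[] {
    HConstants.ZOOKEEPER_QUORUM,
    HConstants.ZOOKEEPER_ZNODE_PARENT,
    // ... other existing keys ...
    RpcControllerFactory.CUSTOM_CONTROLLER_CONF_KEY, // the missing entry
};
{code}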



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-12407) HConnectionKey doesn't contain CUSTOM_CONTROLLER_CONF_KEY in CONNECTION_PROPERTIES

2014-11-01 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-12407:
--
Attachment: HBASE-12407.patch

 HConnectionKey doesn't contain CUSTOM_CONTROLLER_CONF_KEY in 
 CONNECTION_PROPERTIES 
 ---

 Key: HBASE-12407
 URL: https://issues.apache.org/jira/browse/HBASE-12407
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.0, 0.98.7, 0.99.1
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
 Attachments: HBASE-12407.patch


 This causes an HTable instance created with a custom 
 RpcControllerFactory.CUSTOM_CONTROLLER_CONF_KEY conf setting to internally use 
 a cached connection without this custom conf setting, because 
 CUSTOM_CONTROLLER_CONF_KEY isn't part of HConnectionKey.CONNECTION_PROPERTIES.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-12407) HConnectionKey doesn't contain CUSTOM_CONTROLLER_CONF_KEY in CONNECTION_PROPERTIES

2014-11-01 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-12407:
--
Status: Patch Available  (was: Open)

 HConnectionKey doesn't contain CUSTOM_CONTROLLER_CONF_KEY in 
 CONNECTION_PROPERTIES 
 ---

 Key: HBASE-12407
 URL: https://issues.apache.org/jira/browse/HBASE-12407
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.1, 0.98.7, 2.0.0
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
 Attachments: HBASE-12407.patch


 This causes an HTable instance created with a custom 
 RpcControllerFactory.CUSTOM_CONTROLLER_CONF_KEY conf setting to internally use 
 a cached connection without this custom conf setting, because 
 CUSTOM_CONTROLLER_CONF_KEY isn't part of HConnectionKey.CONNECTION_PROPERTIES.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12407) HConnectionKey doesn't contain CUSTOM_CONTROLLER_CONF_KEY in CONNECTION_PROPERTIES

2014-11-01 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193593#comment-14193593
 ] 

Ted Yu commented on HBASE-12407:


+1

 HConnectionKey doesn't contain CUSTOM_CONTROLLER_CONF_KEY in 
 CONNECTION_PROPERTIES 
 ---

 Key: HBASE-12407
 URL: https://issues.apache.org/jira/browse/HBASE-12407
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.0, 0.98.7, 0.99.1
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
 Attachments: HBASE-12407.patch


 This causes an HTable instance created with a custom 
 RpcControllerFactory.CUSTOM_CONTROLLER_CONF_KEY conf setting to internally use 
 a cached connection without this custom conf setting, because 
 CUSTOM_CONTROLLER_CONF_KEY isn't part of HConnectionKey.CONNECTION_PROPERTIES.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12407) HConnectionKey doesn't contain CUSTOM_CONTROLLER_CONF_KEY in CONNECTION_PROPERTIES

2014-11-01 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193597#comment-14193597
 ] 

Enis Soztutar commented on HBASE-12407:
---

This looks good. Remember that cached/managed connections are going away, so we 
should switch to using the new style of connections in Phoenix in the future. 

 HConnectionKey doesn't contain CUSTOM_CONTROLLER_CONF_KEY in 
 CONNECTION_PROPERTIES 
 ---

 Key: HBASE-12407
 URL: https://issues.apache.org/jira/browse/HBASE-12407
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.0, 0.98.7, 0.99.1
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
 Attachments: HBASE-12407.patch


 This causes an HTable instance created with a custom 
 RpcControllerFactory.CUSTOM_CONTROLLER_CONF_KEY conf setting to internally use 
 a cached connection without this custom conf setting, because 
 CUSTOM_CONTROLLER_CONF_KEY isn't part of HConnectionKey.CONNECTION_PROPERTIES.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12219) Cache more efficiently getAll() and get() in FSTableDescriptors

2014-11-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193614#comment-14193614
 ] 

Hudson commented on HBASE-12219:


SUCCESS: Integrated in HBase-1.0 #406 (See 
[https://builds.apache.org/job/HBase-1.0/406/])
HBASE-12219 Cache more efficiently getAll() and get() in FSTableDescriptors; 
REVERTgit log! branch-1 patch AND addendum (stack: rev 
0aca51e89cd0fe69d9cd57648949df5c5b506c53)
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestFSTableDescriptors.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/TableDescriptors.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/handler/CreateTableHandler.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/util/FSTableDescriptors.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java


 Cache more efficiently getAll() and get() in FSTableDescriptors
 ---

 Key: HBASE-12219
 URL: https://issues.apache.org/jira/browse/HBASE-12219
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.94.24, 0.99.1, 0.98.6.1
Reporter: Esteban Gutierrez
Assignee: Esteban Gutierrez
  Labels: scalability
 Fix For: 2.0.0, 0.98.8, 0.99.2

 Attachments: HBASE-12219-0.98.patch, HBASE-12219-0.98.v1.patch, 
 HBASE-12219-0.99.addendum.patch, HBASE-12219-0.99.patch, 
 HBASE-12219-v1.patch, HBASE-12219-v1.patch, HBASE-12219.v0.txt, 
 HBASE-12219.v2.patch, HBASE-12219.v3.patch, list.png


 Currently table descriptors and tables are cached once they are accessed for 
 the first time. Subsequent calls to the master only require a trip to HDFS to 
 look up the modified time in order to reload the table descriptors if 
 modified. However, in clusters with a large number of tables or concurrent 
 clients this can be too aggressive on HDFS and the master, causing 
 contention while processing other requests. A simple solution is to have a 
 TTL-based cache for FSTableDescriptors#getAll() and 
 FSTableDescriptors#TableDescriptorAndModtime(), which can allow the master to 
 process those calls faster without causing contention and without having to 
 perform a trip to HDFS for every call to listtables() or getTableDescriptor().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12407) HConnectionKey doesn't contain CUSTOM_CONTROLLER_CONF_KEY in CONNECTION_PROPERTIES

2014-11-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193638#comment-14193638
 ] 

Hadoop QA commented on HBASE-12407:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12678737/HBASE-12407.patch
  against trunk revision .
  ATTACHMENT ID: 12678737

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.client.TestFastFail

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11557//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11557//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11557//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11557//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11557//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11557//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11557//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11557//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11557//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11557//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11557//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11557//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11557//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11557//console

This message is automatically generated.

 HConnectionKey doesn't contain CUSTOM_CONTROLLER_CONF_KEY in 
 CONNECTION_PROPERTIES 
 ---

 Key: HBASE-12407
 URL: https://issues.apache.org/jira/browse/HBASE-12407
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.0, 0.98.7, 0.99.1
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
 Attachments: HBASE-12407.patch


 This causes an HTable instance created with a custom 
 RpcControllerFactory.CUSTOM_CONTROLLER_CONF_KEY conf setting to internally use 
 a cached connection without this custom conf setting, because 
 CUSTOM_CONTROLLER_CONF_KEY isn't part of HConnectionKey.CONNECTION_PROPERTIES.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12398) Region isn't assigned in an extreme race condition

2014-11-01 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193657#comment-14193657
 ] 

Jimmy Xiang commented on HBASE-12398:
-

The master branch should not have such a problem because only the master 
updates the region states (step b won't happen). So I think we don't need a 
patch for master.

 Region isn't assigned in an extreme race condition
 --

 Key: HBASE-12398
 URL: https://issues.apache.org/jira/browse/HBASE-12398
 Project: HBase
  Issue Type: Bug
  Components: Region Assignment
Affects Versions: 0.98.7
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
 Attachments: HBASE-12398.patch


 In a test, [~enis] has seen a condition which made one of the regions 
 unassigned. 
 The client failed since the region is not online anywhere: 
 {code}
 2014-10-29 01:51:40,731 WARN  [HBaseReaderThread_13] 
 util.MultiThreadedReader: 
 org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after 
 attempts=35, exceptions:
 Wed Oct 29 01:39:51 UTC 2014, 
 org.apache.hadoop.hbase.client.RpcRetryingCaller@cc21330, 
 org.apache.hadoop.hbase.NotServingRegionException: 
 org.apache.hadoop.hbase.NotServingRegionException: Region 
 IntegrationTestRegionReplicaReplication,0666,1414545619766_0001.689b77e1bad7e951b0d9ef4663b217e9.
  is not online on hor8n08.gq1.ygridcore.net,60020,1414546670414
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2774)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:4257)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2906)
 at 
 org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29990)
 at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2078)
 at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
 at 
 org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
 at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
 at java.lang.Thread.run(Thread.java:722)
 {code}
 The root cause of the issue is an extreme race condition:
 a) a region is about to open and receives a closeRpc request triggered by a 
 second re-assignment
 b) the second re-assignment updates the region state to offline, which is 
 immediately overwritten to OPEN by the previous region open's ZK OPENED 
 notification
 c) when the region is reopened on the same RS by the second assignment, AM 
 forces the region to close as its region state isn't in the 
 PendingOpenOrOpening state
 d) the region ends up offline and can't serve any request
 Region Server Side:
 1) Region 689b77e1bad7e951b0d9ef4663b217e9 has almost finished opening when 
 the RS (hor8n10) receives a closeRegion request.
 {noformat}
 2014-10-29 01:39:43,153 INFO  
 [PriorityRpcServer.handler=2,queue=0,port=60020] regionserver.HRegionServer: 
 Received CLOSE for the region:689b77e1bad7e951b0d9ef4663b217e9 , which we are 
 already trying to OPEN. Cancelling OPENING.
 {noformat}
 2) Since region 689b77e1bad7e951b0d9ef4663b217e9 was already opened except for 
 some final steps, the RS logs the following message and closes 
 689b77e1bad7e951b0d9ef4663b217e9 immediately after it updates the ZK node 
 state to 'OPENED'.
 {noformat}
 2014-10-29 01:39:43,198 ERROR [RS_OPEN_REGION-hor8n10:60020-0] 
 handler.OpenRegionHandler: Race condition: we've finished to open a region, 
 while a close was requested  on 
 region=IntegrationTestRegionReplicaReplication,0666,1414545619766_0001.689b77e1bad7e951b0d9ef4663b217e9..
  It can be a critical error, as a region that should be closed is now opened. 
 Closing it now
 {noformat}
 Master Server Side:
 {noformat}
 2014-10-29 01:39:43,177 DEBUG [AM.ZK.Worker-pool2-t55] 
 master.AssignmentManager: Handling RS_ZK_REGION_OPENED, 
 server=hor8n10.gq1.ygridcore.net,60020,1414546531945, 
 region=689b77e1bad7e951b0d9ef4663b217e9, 
 current_state={689b77e1bad7e951b0d9ef4663b217e9 state=OPENING, 
 ts=1414546783152, server=hor8n10.gq1.ygridcore.net,60020,1414546531945}
 
 2014-10-29 01:39:43,255 DEBUG [AM.-pool1-t16] master.AssignmentManager: 
 Offline 
 IntegrationTestRegionReplicaReplication,0666,1414545619766_0001.689b77e1bad7e951b0d9ef4663b217e9.,
  it's not any more on hor8n10.gq1.ygridcore.net,60020,1414546531945
 
 2014-10-29 01:39:43,942 DEBUG [AM.ZK.Worker-pool2-t58] 
 master.AssignmentManager: Handling RS_ZK_REGION_OPENED, 
 server=hor8n10.gq1.ygridcore.net,60020,1414546531945, 
 region=689b77e1bad7e951b0d9ef4663b217e9, 
 current_state={689b77e1bad7e951b0d9ef4663b217e9 state=OPEN, ts=1414546783387, 
 server=hor8n10.gq1.ygridcore.net,60020,1414546531945}
 

[jira] [Commented] (HBASE-12219) Cache more efficiently getAll() and get() in FSTableDescriptors

2014-11-01 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193669#comment-14193669
 ] 

stack commented on HBASE-12219:
---

Builds on branch-1 are blue again after backing this out.  I think this is the 
zombie maker.  Leaving open till we figure out why.

 Cache more efficiently getAll() and get() in FSTableDescriptors
 ---

 Key: HBASE-12219
 URL: https://issues.apache.org/jira/browse/HBASE-12219
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.94.24, 0.99.1, 0.98.6.1
Reporter: Esteban Gutierrez
Assignee: Esteban Gutierrez
  Labels: scalability
 Fix For: 2.0.0, 0.98.8, 0.99.2

 Attachments: HBASE-12219-0.98.patch, HBASE-12219-0.98.v1.patch, 
 HBASE-12219-0.99.addendum.patch, HBASE-12219-0.99.patch, 
 HBASE-12219-v1.patch, HBASE-12219-v1.patch, HBASE-12219.v0.txt, 
 HBASE-12219.v2.patch, HBASE-12219.v3.patch, list.png


 Currently table descriptors and tables are cached once they are accessed
 for the first time. Subsequent calls to the master only require a trip to
 HDFS to look up the modification time in order to reload the table
 descriptors if they changed. However, in clusters with a large number of
 tables or many concurrent clients, this can be too aggressive toward HDFS
 and the master, causing contention that slows the processing of other
 requests. A simple solution is a TTL-based cache for
 FSTableDescriptors#getAll() and FSTableDescriptors#TableDescriptorAndModtime()
 so the master can serve those calls faster, without contention and without
 a trip to HDFS for every call to listTables() or getTableDescriptor().
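
A minimal sketch of the TTL idea, assuming a generic loader callback rather
than the actual FSTableDescriptors internals (TtlCache and its members are
illustrative names, not the patch):

{code:java}
import java.util.concurrent.Callable;

// Hypothetical sketch of a TTL-based cache; illustrative, not the patch.
final class TtlCache<V> {
  private final long ttlMs;
  private final Callable<V> loader;  // e.g. the HDFS descriptor scan
  private volatile V value;
  private volatile long loadedAtMs;

  TtlCache(long ttlMs, Callable<V> loader) {
    this.ttlMs = ttlMs;
    this.loader = loader;
  }

  V get() throws Exception {
    V v = value;
    if (v != null && System.currentTimeMillis() - loadedAtMs < ttlMs) {
      return v;                      // still fresh: no HDFS round trip
    }
    synchronized (this) {            // one reload at a time; others reuse it
      if (value == null || System.currentTimeMillis() - loadedAtMs >= ttlMs) {
        value = loader.call();
        loadedAtMs = System.currentTimeMillis();
      }
      return value;
    }
  }
}
{code}

Within the TTL window every getAll() is served from memory; staleness is
bounded by the TTL, trading a little freshness for far fewer HDFS lookups.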



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12285) Builds are failing, possibly because of SUREFIRE-1091

2014-11-01 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193670#comment-14193670
 ] 

stack commented on HBASE-12285:
---

[~dimaspivak] Open a new issue instead?

The surefire snapshot and the culling of the logs put us in a better place for 
sure.  We have mostly blues now when we build.

We've been failing since #400 because of HBASE-12219.  Was this causing the 
"There was a timeout or other error in the fork" failures?

Good on you Dima

 Builds are failing, possibly because of SUREFIRE-1091
 -

 Key: HBASE-12285
 URL: https://issues.apache.org/jira/browse/HBASE-12285
 Project: HBase
  Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Dima Spivak
Assignee: Dima Spivak
Priority: Blocker
 Fix For: 2.0.0, 0.99.2

 Attachments: HBASE-12285_branch-1_v1.patch, 
 HBASE-12285_branch-1_v1.patch


 Our branch-1 builds on builds.apache.org have been failing in recent days 
 after we switched over to an official version of Surefire a few days back 
 (HBASE-4955). The version we're using, 2.17, is hit by a bug 
 ([SUREFIRE-1091|https://jira.codehaus.org/browse/SUREFIRE-1091]) that results 
 in an IOException, which looks like what we're seeing on Jenkins.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12219) Cache more efficiently getAll() and get() in FSTableDescriptors

2014-11-01 Thread Dima Spivak (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193693#comment-14193693
 ] 

Dima Spivak commented on HBASE-12219:
-

Using [~manukranthk]'s awesome findHangingTests script, I found that the set 
of runs that were red all had org.apache.hadoop.hbase.client.TestAdmin hang, 
which caused the Surefire-forked process to time out after 15 minutes and fail 
the Maven build.

 Cache more efficiently getAll() and get() in FSTableDescriptors
 ---

 Key: HBASE-12219
 URL: https://issues.apache.org/jira/browse/HBASE-12219
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.94.24, 0.99.1, 0.98.6.1
Reporter: Esteban Gutierrez
Assignee: Esteban Gutierrez
  Labels: scalability
 Fix For: 2.0.0, 0.98.8, 0.99.2

 Attachments: HBASE-12219-0.98.patch, HBASE-12219-0.98.v1.patch, 
 HBASE-12219-0.99.addendum.patch, HBASE-12219-0.99.patch, 
 HBASE-12219-v1.patch, HBASE-12219-v1.patch, HBASE-12219.v0.txt, 
 HBASE-12219.v2.patch, HBASE-12219.v3.patch, list.png


 Currently table descriptors and tables are cached once they are accessed
 for the first time. Subsequent calls to the master only require a trip to
 HDFS to look up the modification time in order to reload the table
 descriptors if they changed. However, in clusters with a large number of
 tables or many concurrent clients, this can be too aggressive toward HDFS
 and the master, causing contention that slows the processing of other
 requests. A simple solution is a TTL-based cache for
 FSTableDescriptors#getAll() and FSTableDescriptors#TableDescriptorAndModtime()
 so the master can serve those calls faster, without contention and without
 a trip to HDFS for every call to listTables() or getTableDescriptor().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HBASE-12285) Builds are failing, possibly because of SUREFIRE-1091

2014-11-01 Thread Dima Spivak (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dima Spivak resolved HBASE-12285.
-
Resolution: Fixed

You're right, [~stack]. Sorry for being quick to reopen, was just paranoid. But 
yay for CI actually helping us track down faulty commits! :)

 Builds are failing, possibly because of SUREFIRE-1091
 -

 Key: HBASE-12285
 URL: https://issues.apache.org/jira/browse/HBASE-12285
 Project: HBase
  Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Dima Spivak
Assignee: Dima Spivak
Priority: Blocker
 Fix For: 2.0.0, 0.99.2

 Attachments: HBASE-12285_branch-1_v1.patch, 
 HBASE-12285_branch-1_v1.patch


 Our branch-1 builds on builds.apache.org have been failing in recent days 
 after we switched over to an official version of Surefire a few days back 
 (HBASE-4955). The version we're using, 2.17, is hit by a bug 
 ([SUREFIRE-1091|https://jira.codehaus.org/browse/SUREFIRE-1091]) that results 
 in an IOException, which looks like what we're seeing on Jenkins.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)