[jira] Commented: (HBASE-3524) NPE from CompactionChecker

2011-02-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12994649#comment-12994649
 ] 

Hudson commented on HBASE-3524:
---

Integrated in HBase-TRUNK #1745 (See 
[https://hudson.apache.org/hudson/job/HBase-TRUNK/1745/])


 NPE from CompactionChecker
 --

 Key: HBASE-3524
 URL: https://issues.apache.org/jira/browse/HBASE-3524
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.0
Reporter: James Kennedy
Assignee: ryan rawson
Priority: Blocker
 Fix For: 0.90.1, 0.90.2

 Attachments: 3524.txt


 I recently updated production data to use HBase 0.90.0.
 Now I'm periodically seeing:
 [10/02/11 17:23:27] 30076066 [mpactionChecker] ERROR nServer$MajorCompactionChecker  - Caught exception
 java.lang.NullPointerException
   at org.apache.hadoop.hbase.regionserver.Store.isMajorCompaction(Store.java:832)
   at org.apache.hadoop.hbase.regionserver.Store.isMajorCompaction(Store.java:810)
   at org.apache.hadoop.hbase.regionserver.HRegion.isMajorCompaction(HRegion.java:2800)
   at org.apache.hadoop.hbase.regionserver.HRegionServer$MajorCompactionChecker.chore(HRegionServer.java:1047)
   at org.apache.hadoop.hbase.Chore.run(Chore.java:66)
 The only negative effect is that this prevents compactions from running. 
 But that is pretty serious, and it might be a sign of data corruption?
 Maybe it's just my data, but this task should at least involve improving the 
 handling to catch the NPE and still iterate through the other onlineRegions 
 that might compact without error.  The MajorCompactionChecker.chore() method 
 only catches IOExceptions, so this NPE breaks out of that loop. 
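
 The failure mode described above, where one unchecked exception ends the whole pass over the online regions, can be illustrated with a small self-contained sketch. The Region interface and the checkAll and region helpers are hypothetical stand-ins, not the actual HBase MajorCompactionChecker code; the sketch only shows the pattern of catching per region so the remaining regions are still examined.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a region; not the HBase HRegion API.
interface Region {
    String name();
    boolean isMajorCompaction() throws IOException;
}

class CompactionCheckSketch {
    // Returns the names of regions that report a major compaction is due.
    // A failure in one region is recorded in `failed` and the loop continues.
    static List<String> checkAll(List<Region> regions, List<String> failed) {
        List<String> due = new ArrayList<>();
        for (Region r : regions) {
            try {
                if (r.isMajorCompaction()) {
                    due.add(r.name());
                }
            } catch (IOException | RuntimeException e) {
                // Catching RuntimeException as well keeps an NPE in one
                // region from ending the pass over the others.
                failed.add(r.name());
            }
        }
        return due;
    }

    // Builds a test region; a null answer models the broken store file,
    // because unboxing a null Boolean throws NullPointerException.
    static Region region(String name, Boolean answer) {
        return new Region() {
            public String name() { return name; }
            public boolean isMajorCompaction() { return answer; }
        };
    }

    public static void main(String[] args) {
        List<String> failed = new ArrayList<>();
        List<Region> regions = Arrays.asList(
            region("a", true), region("broken", null), region("c", true));
        System.out.println(checkAll(regions, failed) + " failed=" + failed);
    }
}
```

 Whether swallowing the exception per region is the right policy, as opposed to fixing the null dereference itself, is a separate design question.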

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HBASE-3524) NPE from CompactionChecker

2011-02-11 Thread James Kennedy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12993653#comment-12993653
 ] 

James Kennedy commented on HBASE-3524:
--

So that .meta file with the DATA LOSS warning is definitely old (2010-05-20).
Looking back over old logs, I realized that the DATA LOSS WARN has been there 
for a while, so that is probably a separate issue from this CompactionChecker 
problem.
Guess I'll just delete the file in HDFS.

So, it looks like my data is stable now after the forced compactions. I didn't 
have to apply the patch in production code to stop the NPEs.

I'm still concerned about why this happened to some regions and not others. 
All were left up long enough to get to that NPE point, yet it only prevented 
the first post-0.90.0-upgrade full compactions for 8 out of 50 tables. Maybe 
the other 42 were updated as part of the initial startup process...

 NPE from CompactionChecker
 --

 Key: HBASE-3524
 URL: https://issues.apache.org/jira/browse/HBASE-3524
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.0
Reporter: James Kennedy
Assignee: James Kennedy
Priority: Blocker
 Fix For: 0.90.1, 0.90.2






[jira] Commented: (HBASE-3524) NPE from CompactionChecker

2011-02-11 Thread James Kennedy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12993792#comment-12993792
 ] 

James Kennedy commented on HBASE-3524:
--

Why choose Long.MIN_VALUE? Wouldn't Long.MAX_VALUE encourage a major compaction 
and get pre-0.90.0 StoreFiles out of the picture sooner?





[jira] Commented: (HBASE-3524) NPE from CompactionChecker

2011-02-11 Thread James Kennedy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12993794#comment-12993794
 ] 

James Kennedy commented on HBASE-3524:
--

duh, yep i get it. Just crossed a wire somewhere.





[jira] Commented: (HBASE-3524) NPE from CompactionChecker

2011-02-10 Thread James Kennedy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12993318#comment-12993318
 ] 

James Kennedy commented on HBASE-3524:
--

Did some more debugging and got a little more intel:  What's null on that line 
is sf.getReader().timeRangeTracker.

It seems to be consistently null for many if not all tables.  Anyone know how 
this could happen?

 NPE from CompactionChecker
 --

 Key: HBASE-3524
 URL: https://issues.apache.org/jira/browse/HBASE-3524
 Project: HBase
  Issue Type: Bug
Reporter: James Kennedy
 Fix For: 0.90.2






[jira] Commented: (HBASE-3524) NPE from CompactionChecker

2011-02-10 Thread James Kennedy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12993320#comment-12993320
 ] 

James Kennedy commented on HBASE-3524:
--

I found this in the hbase.log:


[10/02/11 18:37:29] 44386  [1297391814420-0] WARN  adoop.hbase.regionserver.Store  - Skipping hdfs://localhost:7701/hbase/.META./1028785192/info/2685681686584745388 because its empty. HBASE-646 DATA LOSS?

So perhaps this issue is a symptom of corrupt meta data. HOW can I fix this!?





[jira] Commented: (HBASE-3524) NPE from CompactionChecker

2011-02-10 Thread ryan rawson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12993321#comment-12993321
 ] 

ryan rawson commented on HBASE-3524:


Old files causing new code to break it seems. Good job tracking it down!






[jira] Commented: (HBASE-3524) NPE from CompactionChecker

2011-02-10 Thread James Kennedy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12993326#comment-12993326
 ] 

James Kennedy commented on HBASE-3524:
--

Thanks. I'm in a bit of a pickle. Though I tested all upgrades on QA and test 
data, this issue has only cropped up on a production deploy. Since our 
production app appeared to be running smoothly we gave it a +1 and there is 
already new user data in there. I'm wondering if I should revert to older data 
anyway (some user data loss) until this corruption is handled...

Shouldn't 0.90.0 automatically upgrade old data?





[jira] Commented: (HBASE-3524) NPE from CompactionChecker

2011-02-10 Thread ryan rawson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12993327#comment-12993327
 ] 

ryan rawson commented on HBASE-3524:


The issue is that if the hfile does not have timerangeBytes, this code doesn't 
trigger:

(StoreFile.java)
  if (timerangeBytes != null) {
    this.reader.timeRangeTracker = new TimeRangeTracker();
    Writables.copyWritable(timerangeBytes, this.reader.timeRangeTracker);
  }

And timeRangeTracker remains null.

But this code doesn't check for null:

(Store.java, line 832)
  long oldest = now - sf.getReader().timeRangeTracker.minimumTimestamp;


If timeRangeTracker is null, we should probably use Integer.MIN_VALUE for 
minimumTimestamp.
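
A minimal, self-contained model of the situation described above may help. TrackerModel, StoreFileReaderModel and the two age methods are hypothetical simplifications, not the real HBase classes; only the field names timeRangeTracker and minimumTimestamp come from the snippets quoted here. The sketch shows an optional metadata field left null for files that lack it, and a guard that substitutes a caller-chosen sentinel instead of dereferencing null. Which sentinel is appropriate is left open.

```java
// Hypothetical simplification of the tracker named in the snippets above.
class TrackerModel {
    long minimumTimestamp;
    TrackerModel(long minimumTimestamp) { this.minimumTimestamp = minimumTimestamp; }
}

// Hypothetical simplification of the store file reader.
class StoreFileReaderModel {
    // Stays null when the file carries no time-range metadata.
    TrackerModel timeRangeTracker;

    StoreFileReaderModel(Long storedMinimumTimestamp) {
        if (storedMinimumTimestamp != null) {
            this.timeRangeTracker = new TrackerModel(storedMinimumTimestamp);
        }
    }
}

class NullTrackerSketch {
    // Unguarded form: throws NullPointerException for a file without metadata.
    static long ageUnguarded(StoreFileReaderModel r, long now) {
        return now - r.timeRangeTracker.minimumTimestamp;
    }

    // Guarded form: returns the caller-supplied fallback when metadata is missing.
    static long ageGuarded(StoreFileReaderModel r, long now, long fallback) {
        return (r.timeRangeTracker == null)
            ? fallback
            : now - r.timeRangeTracker.minimumTimestamp;
    }

    public static void main(String[] args) {
        StoreFileReaderModel oldFile = new StoreFileReaderModel(null);
        StoreFileReaderModel newFile = new StoreFileReaderModel(1_000L);
        System.out.println(ageGuarded(newFile, 5_000L, Long.MIN_VALUE));
        System.out.println(ageGuarded(oldFile, 5_000L, Long.MIN_VALUE));
        try {
            ageUnguarded(oldFile, 5_000L);
        } catch (NullPointerException e) {
            System.out.println("NPE for file without time-range metadata");
        }
    }
}
```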

What is the creation time of your empty file? When is it from? Maybe it's old?





[jira] Commented: (HBASE-3524) NPE from CompactionChecker

2011-02-10 Thread ryan rawson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12993328#comment-12993328
 ] 

ryan rawson commented on HBASE-3524:


try this patch:

diff --git a/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java b/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
index d7e3ce3..519111a 100644
--- a/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
+++ b/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
@@ -829,7 +829,10 @@ public class Store implements HeapSize {
       if (filesToCompact.size() == 1) {
         // Single file
         StoreFile sf = filesToCompact.get(0);
-        long oldest = now - sf.getReader().timeRangeTracker.minimumTimestamp;
+        long oldest =
+            (sf.getReader().timeRangeTracker == null) ?
+                Long.MIN_VALUE :
+                now - sf.getReader().timeRangeTracker.minimumTimestamp;
         if (sf.isMajorCompaction() &&
             (this.ttl == HConstants.FOREVER || oldest < this.ttl)) {
           if (LOG.isDebugEnabled()) {

no test yet! doh!





[jira] Commented: (HBASE-3524) NPE from CompactionChecker

2011-02-10 Thread James Kennedy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12993341#comment-12993341
 ] 

James Kennedy commented on HBASE-3524:
--

This patch obviously stops the NPE and allows compaction checking to follow 
through.

Furthermore, I added a log output line that indicates when and which stores 
have timeRangeTracker == null.  It seemed that 7 or 8 tables (out of 50) had 
this problem, and when I forced their major compaction from the hbase shell 
they stopped reporting the error.

So it looks like the major compactions created new stores with timeRangeTracker 
set properly.

I'm still concerned though about how this happened in the first place and I 
need to do some thorough testing of the data to ensure nothing was lost.

Ryan, in your opinion do you think this data is likely to have survived 
corruption?

And thanks for your speedy help.





[jira] Commented: (HBASE-3524) NPE from CompactionChecker

2011-02-10 Thread James Kennedy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12993365#comment-12993365
 ] 

James Kennedy commented on HBASE-3524:
--

> What is the creation time of your empty file? When is it from? Maybe it's old?

Let me re-reproduce these issues from scratch tomorrow morning.
