date:20111016


[ 
https://issues.apache.org/jira/browse/HBASE-4589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13128380#comment-13128380
 ] 

Hudson commented on HBASE-4589:
---

Integrated in HBase-TRUNK #2325 (See 
[https://builds.apache.org/job/HBase-TRUNK/2325/])
HBASE-4589 CacheOnWrite broken in some cases because it can conflict with 
evictOnClose (jgray)

jgray : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestFSErrorsExposed.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java


 CacheOnWrite broken in some cases because it can conflict with evictOnClose
 ---

 Key: HBASE-4589
 URL: https://issues.apache.org/jira/browse/HBASE-4589
 Project: HBase
  Issue Type: Bug
  Components: io
Affects Versions: 0.92.0, 0.94.0
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Critical
 Fix For: 0.92.0

 Attachments: HBASE-4589-v1.patch


 The commit of HBASE-4078 added extra StoreFile verification that opens a 
 StoreFile reader and then closes it, to ensure no exception is thrown.  If 
 evict-on-close is on, which it is by default, closing this reader evicts all 
 blocks of the file even though the file is still open.
 We need to pass a boolean into the close call, the same way we pass booleans 
 for cacheBlocks, because some callers need to make a localized decision.
 Most call sites can simply pass cacheConf.shouldEvictOnClose(), so the change 
 shouldn't be too burdensome.
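
A minimal sketch of the idea, not the actual patch: the caller of close decides whether the file's blocks are evicted, so a validation-only open/close can leave the cache alone. The class `EvictOnCloseSketch`, its `blockCache` map, the `Reader` type and the `validate` helper are all hypothetical stand-ins for the StoreFile reader and the block cache.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a StoreFile reader backed by a shared block cache. */
class EvictOnCloseSketch {
    /** Block cache keyed by "fileName@offset" (assumed key format for this sketch). */
    static final Map<String, byte[]> blockCache = new HashMap<>();

    static class Reader {
        private final String fileName;

        Reader(String fileName) {
            this.fileName = fileName;
        }

        /** The caller, not a global default, decides whether closing evicts blocks. */
        void close(boolean evictOnClose) {
            if (evictOnClose) {
                blockCache.keySet().removeIf(key -> key.startsWith(fileName + "@"));
            }
        }
    }

    /** Validation pass: open a reader, then close it without touching cached blocks. */
    static void validate(String fileName) {
        Reader reader = new Reader(fileName);
        reader.close(false);
    }

    public static void main(String[] args) {
        blockCache.put("hfile-1@0", new byte[] {1});
        blockCache.put("hfile-1@64", new byte[] {2});
        validate("hfile-1");
        System.out.println("cached after validation: " + blockCache.size());
        new Reader("hfile-1").close(true);
        System.out.println("cached after real close: " + blockCache.size());
    }
}
```

A call site with no special needs would pass the configured default (cacheConf.shouldEvictOnClose() in the description above) instead of a literal.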

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4556) Fix all incorrect uses of InternalScanner.next(...)


[ 
https://issues.apache.org/jira/browse/HBASE-4556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13128378#comment-13128378
 ] 

Hudson commented on HBASE-4556:
---

Integrated in HBase-TRUNK #2325 (See 
[https://builds.apache.org/job/HBase-TRUNK/2325/])
HBASE-4556 Fix all incorrect uses of InternalScanner.next(...)

larsh : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/HMerge.java


 Fix all incorrect uses of InternalScanner.next(...)
 ---

 Key: HBASE-4556
 URL: https://issues.apache.org/jira/browse/HBASE-4556
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: 4556-v1.txt, 4556.txt


 There are cases all over the code where InternalScanner.next(...) is not used 
 correctly.
 I see this a lot:
 {code}
 while(scanner.next(...)) {
 }
 {code}
 The correct pattern is:
 {code}
 boolean more = false;
 do {
   more = scanner.next(...);
 } while (more);
 {code}
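
To illustrate why the first pattern is wrong, here is a small sketch under one assumption: the scanner's next(...) fills the result list and returns whether more rows remain after this call, so the final call can still deliver a row while returning false. `FakeScanner` is a hypothetical stand-in for InternalScanner, and the two drain methods exist only for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class ScannerLoopSketch {
    /** Hypothetical scanner: fills results, returns true if more rows remain afterwards. */
    static class FakeScanner {
        private final Iterator<String> rows;

        FakeScanner(List<String> rows) {
            this.rows = rows.iterator();
        }

        boolean next(List<String> results) {
            if (rows.hasNext()) {
                results.add(rows.next());
            }
            return rows.hasNext();
        }
    }

    /** Incorrect pattern: the row delivered by the final call is never processed. */
    static List<String> drainWithWhile(FakeScanner scanner) {
        List<String> seen = new ArrayList<>();
        List<String> results = new ArrayList<>();
        while (scanner.next(results)) {
            seen.addAll(results);
            results.clear();
        }
        return seen;
    }

    /** Correct pattern: process the results of every call, then test the flag. */
    static List<String> drainWithDoWhile(FakeScanner scanner) {
        List<String> seen = new ArrayList<>();
        List<String> results = new ArrayList<>();
        boolean more;
        do {
            more = scanner.next(results);
            seen.addAll(results);
            results.clear();
        } while (more);
        return seen;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("row1", "row2", "row3");
        System.out.println(drainWithWhile(new FakeScanner(rows)));   // last row is missing
        System.out.println(drainWithDoWhile(new FakeScanner(rows))); // all rows
    }
}
```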


[jira] [Commented] (HBASE-4568) Make zk dump jsp response more quickly


[ 
https://issues.apache.org/jira/browse/HBASE-4568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13128377#comment-13128377
 ] 

Hudson commented on HBASE-4568:
---

Integrated in HBase-TRUNK #2325 (See 
[https://builds.apache.org/job/HBase-TRUNK/2325/])
HBASE-4568 Make zk dump jsp response faster

nspiegelberg : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/RetryCounter.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/RecoverableZooKeeper.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java
* /hbase/trunk/src/main/resources/hbase-webapps/master/zk.jsp


 Make zk dump jsp response more quickly
 --

 Key: HBASE-4568
 URL: https://issues.apache.org/jira/browse/HBASE-4568
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Fix For: 0.92.0, 0.94.0

 Attachments: HBASE-4568.patch


 1) For each zk dump, HBase currently creates a new zk client instance. 
 This is quite slow when any machine in the quorum is dead, because the new 
 client connects to each machine in the zk quorum again.
 {code}
 HMaster master = (HMaster)getServletContext().getAttribute(HMaster.MASTER);
 Configuration conf = master.getConfiguration();
 HBaseAdmin hbadmin = new HBaseAdmin(conf);
 HConnection connection = hbadmin.getConnection();
 ZooKeeperWatcher watcher = connection.getZooKeeperWatcher();
 {code}
 We can simplify this by reusing the master's watcher:
 {code}
 HMaster master = (HMaster)getServletContext().getAttribute(HMaster.MASTER);
 ZooKeeperWatcher watcher = master.getZooKeeperWatcher();
 {code}
 2) When HBase calls getServerStats() for each machine in the zk quorum, the 
 timeout is hard-coded to 1 minute. 
 It would be nice to make this configurable and set it to a low value.
 When HBase connects to a machine in the zk quorum, it creates the socket, 
 then sets the socket timeout, and then reads with that timeout.
 This means HBase first creates the socket and connects to the zk server with 
 a timeout of 0, which can take a long time, because a timeout of zero is 
 interpreted as an infinite timeout: the connect blocks until the connection 
 is established or an error occurs.
 3) The recoverable zookeeper should use true exponential backoff when it hits 
 a connection loss exception, which gives HBase a much longer time window to 
 recover from zk machine failures.
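
Points 2) and 3) can be sketched as follows. This is an illustration under assumptions, not the code from the patch: `ZkRetrySketch`, `backoffMillis` and `connect` are hypothetical names, and the base delay and cap are arbitrary. The only library calls used are java.net.Socket's connect(SocketAddress, int) and setSoTimeout(int).

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class ZkRetrySketch {
    /** Delay before retry number attempt (0-based): base * 2^attempt, capped at maxMillis. */
    static long backoffMillis(long baseMillis, int attempt, long maxMillis) {
        if (attempt >= 62) {
            return maxMillis; // the shift below would overflow a long
        }
        long factor = 1L << attempt;
        if (baseMillis > maxMillis / factor) {
            return maxMillis; // the product would exceed the cap
        }
        return baseMillis * factor;
    }

    /** Bounds the connect itself, not only later reads, with the same timeout. */
    static Socket connect(String host, int port, int timeoutMillis) throws IOException {
        Socket socket = new Socket(); // unconnected, so connect() can take a timeout
        try {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            socket.setSoTimeout(timeoutMillis);
            return socket;
        } catch (IOException e) {
            socket.close();
            throw e;
        }
    }

    public static void main(String[] args) {
        for (int attempt = 0; attempt < 8; attempt++) {
            System.out.println("retry " + attempt + ": sleep "
                + backoffMillis(1000, attempt, 60000) + " ms");
        }
    }
}
```

Capping the delay keeps a long outage from producing unbounded sleeps; whether a cap is wanted here is a design choice the issue does not state.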


[jira] [Commented] (HBASE-4551) Small fixes to compile against 0.23-SNAPSHOT


[ 
https://issues.apache.org/jira/browse/HBASE-4551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13128379#comment-13128379
 ] 

Hudson commented on HBASE-4551:
---

Integrated in HBase-TRUNK #2325 (See 
[https://builds.apache.org/job/HBase-TRUNK/2325/])
HBASE-4551  Fix pom and some test cases to compile and run against Hadoop 
0.23

todd : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/pom.xml
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/coprocessor/TestWALObserver.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLog.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogSplit.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java


 Small fixes to compile against 0.23-SNAPSHOT
 

 Key: HBASE-4551
 URL: https://issues.apache.org/jira/browse/HBASE-4551
 Project: HBase
  Issue Type: Bug
  Components: build
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: 0.92.0

 Attachments: hbase-4551.txt, hbase-4551.txt


 - fix pom.xml to properly pull the test artifacts
 - fix TestHLog to not use the private cluster.getNameNode() API


[jira] [Commented] (HBASE-4078) Silent Data Offlining During HDFS Flakiness


[ 
https://issues.apache.org/jira/browse/HBASE-4078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13128381#comment-13128381
 ] 

Hudson commented on HBASE-4078:
---

Integrated in HBase-TRUNK #2325 (See 
[https://builds.apache.org/job/HBase-TRUNK/2325/])
HBASE-4078 Validate store files after flush/compaction

nspiegelberg : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java


 Silent Data Offlining During HDFS Flakiness
 ---

 Key: HBASE-4078
 URL: https://issues.apache.org/jira/browse/HBASE-4078
 Project: HBase
  Issue Type: Bug
  Components: io, regionserver
Affects Versions: 0.89.20100924, 0.90.3, 0.92.0
Reporter: Nicolas Spiegelberg
Assignee: Pritam Damania
Priority: Blocker
 Fix For: 0.92.0, 0.94.0

 Attachments: 
 0001-Validate-store-files-after-compactions-flushes.patch, 
 0001-Validate-store-files.patch


 See HBASE-1436.  The bug fix for that JIRA is a temporary workaround for 
 improperly moving partially-written files from TMP into the region directory 
 when a FS error occurs.  Unfortunately, the fix is to ignore all IO 
 exceptions, which masks off-lining due to FS flakiness.  We need to 
 permanently fix the problem that created HBASE-1436, then at least have the 
 option to not open a region during times of flaky FS.


[jira] [Commented] (HBASE-4597) [book] performance.xml Adding comment about EC2


[ 
https://issues.apache.org/jira/browse/HBASE-4597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13128388#comment-13128388
 ] 

Hudson commented on HBASE-4597:
---

Integrated in HBase-TRUNK #2325 (See 
[https://builds.apache.org/job/HBase-TRUNK/2325/])
HBASE-4597 performance.xml ec2 section

dmeil : 
Files : 
* /hbase/trunk/src/docbkx/performance.xml


 [book] performance.xml Adding comment about EC2
 ---

 Key: HBASE-4597
 URL: https://issues.apache.org/jira/browse/HBASE-4597
 Project: HBase
  Issue Type: Improvement
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Minor
 Attachments: performance_HBASE_4597.xml.patch


 I added a section under performance reminding people that running HBase on 
 EC2 isn't the same thing as running on a dedicated server.
 This type of question seems to come up fairly often on the dist-list.


[jira] [Commented] (HBASE-4469) Avoid top row seek by looking up bloomfilter


[ 
https://issues.apache.org/jira/browse/HBASE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13128383#comment-13128383
 ] 

Hudson commented on HBASE-4469:
---

Integrated in HBase-TRUNK #2325 (See 
[https://builds.apache.org/job/HBase-TRUNK/2325/])
HBASE-4469  Avoid top row seek by looking up bloomfilter (liyin via jgray)

jgray : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java


 Avoid top row seek by looking up bloomfilter
 

 Key: HBASE-4469
 URL: https://issues.apache.org/jira/browse/HBASE-4469
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Fix For: 0.94.0

 Attachments: HBASE-4469_1.patch


 The problem is that when seeking to a row/col in the hfile, we go to the top 
 of the row in order to check for a row delete marker (delete family). 
 However, if the bloomfilter is enabled for the column family and a delete 
 family operation is done on a row, the row has already been added to the 
 bloomfilter. We can take advantage of this fact to avoid seeking to the top 
 of the row.
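
A rough sketch of the optimization, assuming a row-level filter to which every row written to the file is added, including rows that carry only a delete-family marker. `RowBloom` is a toy Bloom filter written for this example, not HBase's implementation, and `mustSeekToTopOfRow` is a hypothetical helper name.

```java
import java.util.BitSet;

class DeleteFamilyBloomSketch {
    /** Toy Bloom filter over row keys with two hash probes per key. */
    static class RowBloom {
        private static final int BITS = 1 << 12;
        private final BitSet bits = new BitSet(BITS);

        private int slot(String row, int seed) {
            // Integer overflow is intentional; floorMod keeps the index non-negative.
            return Math.floorMod(row.hashCode() * 31 + seed * 0x9E3779B9, BITS);
        }

        void add(String row) {
            bits.set(slot(row, 1));
            bits.set(slot(row, 2));
        }

        /** false: the row was definitely never added; true: it may have been. */
        boolean mightContain(String row) {
            return bits.get(slot(row, 1)) && bits.get(slot(row, 2));
        }
    }

    /**
     * The seek to the top of the row (to look for a delete-family marker) is only
     * needed when the filter says the file may contain the row at all.
     */
    static boolean mustSeekToTopOfRow(RowBloom bloom, String row) {
        return bloom.mightContain(row);
    }

    public static void main(String[] args) {
        RowBloom bloom = new RowBloom();
        bloom.add("row-with-delete-family");
        System.out.println(mustSeekToTopOfRow(bloom, "row-with-delete-family"));
    }
}
```

A Bloom filter never reports a false negative, so skipping the seek on a negative answer cannot miss a marker; a false positive only costs the seek that would have happened anyway.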


[jira] [Commented] (HBASE-4282) RegionServer should abort when WAL close encounters an error with unflushed edits


[ 
https://issues.apache.org/jira/browse/HBASE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13128384#comment-13128384
 ] 

Hudson commented on HBASE-4282:
---

Integrated in HBase-TRUNK #2325 (See 
[https://builds.apache.org/job/HBase-TRUNK/2325/])
HBASE-4282  RegionServer should abort when WAL close fails with unflushed 
edits

garyh : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRollAbort.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRolling.java


 RegionServer should abort when WAL close encounters an error with unflushed 
 edits
 -

 Key: HBASE-4282
 URL: https://issues.apache.org/jira/browse/HBASE-4282
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0, 0.90.5
Reporter: Gary Helmling
Assignee: Gary Helmling
Priority: Blocker
 Fix For: 0.92.0, 0.94.0, 0.90.5

 Attachments: HBASE-4282_0.90_2.patch, HBASE-4282_0.90_final.patch, 
 HBASE-4282_0.92_final.patch, HBASE-4282_trunk_2.patch, 
 HBASE-4282_trunk_3.patch, HBASE-4282_trunk_final.patch, 
 HBASE-4282_trunk_prelim.patch


 The ability to ride over WAL close errors on log rolling added in HBASE-4222 
 could lead to missing HLog entries if:
 * A table has DEFERRED_LOG_FLUSH=true
 * There are unflushed WALEdit entries for that table in the current 
 SequenceFile writer buffer
 Since the writes were already acknowledged to the client, just ignoring the 
 close error to allow for another log roll doesn't seem like the right thing 
 to do here.
 We could easily flag this state and only ride over the close error if there 
 aren't unflushed entries.  This would bring the above condition back to the 
 previous behavior of aborting the region server.  However, aborting the 
 region server in this state is still guaranteeing data loss.  Is there 
 anything we can do better in this case?  
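
The "flag this state" idea could look roughly like the sketch below. It is a simplified illustration with hypothetical names (`WalCloseSketch`, `Writer`, `closeForRoll`), not HLog's actual code: a close error is ridden over only when no appended entries are still unflushed, otherwise it is rethrown so the caller can abort.

```java
import java.io.IOException;

class WalCloseSketch {
    /** Hypothetical log writer whose close() may fail on a flaky filesystem. */
    interface Writer {
        void close() throws IOException;
    }

    /** Entries appended but not yet synced (assumed bookkeeping for this sketch). */
    private int unflushedEntries;

    void append() {
        unflushedEntries++;
    }

    void sync() {
        unflushedEntries = 0;
    }

    /**
     * Closes the writer during a log roll. Returns true on a clean close and false
     * when a close error was ridden over; rethrows when unflushed entries exist.
     */
    boolean closeForRoll(Writer writer) throws IOException {
        try {
            writer.close();
            unflushedEntries = 0;
            return true;
        } catch (IOException e) {
            if (unflushedEntries > 0) {
                throw new IOException("WAL close failed with " + unflushedEntries
                    + " unflushed entries; caller should abort", e);
            }
            return false; // nothing acknowledged was pending, safe to roll anyway
        }
    }

    public static void main(String[] args) throws IOException {
        WalCloseSketch wal = new WalCloseSketch();
        Writer failing = () -> { throw new IOException("flaky filesystem"); };
        System.out.println("rode over error: " + !wal.closeForRoll(failing));
        wal.append();
        try {
            wal.closeForRoll(failing);
        } catch (IOException e) {
            System.out.println("abort: " + e.getMessage());
        }
    }
}
```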


[jira] [Commented] (HBASE-4558) Refactor TestOpenedRegionHandler and TestOpenRegionHandler.


[ 
https://issues.apache.org/jira/browse/HBASE-4558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13128385#comment-13128385
 ] 

Hudson commented on HBASE-4558:
---

Integrated in HBase-TRUNK #2325 (See 
[https://builds.apache.org/job/HBase-TRUNK/2325/])
HBASE-4558 - Addendum for TestMasterFailOver (Ram)
HBASE-4558 Refactor TestOpenedRegionHandler and TestOpenRegionHandler. (Ram)

ramkrishna : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java

ramkrishna : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestOpenedRegionHandler.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/handler/MockRegionServerServices.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/handler/MockServer.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/handler/TestCloseRegionHandler.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/handler/TestOpenRegionHandler.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/MockRegionServerServices.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/MockServer.java


 Refactor TestOpenedRegionHandler and TestOpenRegionHandler.
 ---

 Key: HBASE-4558
 URL: https://issues.apache.org/jira/browse/HBASE-4558
 Project: HBase
  Issue Type: Improvement
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Minor
 Fix For: 0.92.0

 Attachments: HBASE-4558_1.patch, HBASE-4558_2.patch, 
 HBASE-4558_3.patch


 This is an improvement task to refactor TestOpenedRegionHandler and 
 TestOpenRegionHandler so that MockServer and MockRegionServerServices can be 
 accessed from a common utility package.
 With this change, one of the testcases in TestOpenedRegionHandler no longer 
 needs to start up a cluster, and moving the mocks into a common package will 
 help in mocking the server for future testcases.


[jira] [Commented] (HBASE-4596) [book] chapter reordering


[ 
https://issues.apache.org/jira/browse/HBASE-4596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13128387#comment-13128387
 ] 

Hudson commented on HBASE-4596:
---

Integrated in HBase-TRUNK #2325 (See 
[https://builds.apache.org/job/HBase-TRUNK/2325/])
HBASE-4596 book.xml chapter reordering

dmeil : 
Files : 
* /hbase/trunk/src/docbkx/book.xml


 [book] chapter reordering
 -

 Key: HBASE-4596
 URL: https://issues.apache.org/jira/browse/HBASE-4596
 Project: HBase
  Issue Type: Improvement
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Minor
 Attachments: book_HBASE_4596.xml.patch


 Since the book grew organically, things just kept getting added to the end, 
 whether or not that was the best place for them.
 The first 4 chapters stay the same, the change is aimed at the chapters after 
 HBase Shell.  I'm pushing the conceptual material up front, keeping the 
 support chapters together, and keeping the Developing HBase at the end.  For 
 example, right after the book introduces the shell, BAM!  Write a MapReduce 
 program!  Even before you know how to create a table, or even what the 
 overall datamodel is.  Etc.
 Before...
 Getting started
 Configuration
 Upgrading
 HBase Shell
 HBase and MapReduce
 HBase and Schema Design
 Metrics
 Cluster Replication
 Data Model
 Architecture
 Performance Tuning
 Troubleshooting
 Building HBase
 Developing HBase
 External APIs
 HBase Operational Mgt
 After...
 Getting started
 Configuration
 Upgrading
 HBase Shell
 Data Model
 HBase and Schema Design
 HBase and MapReduce
 Architecture
 External APIs
 Performance Tuning
 Troubleshooting
 HBase Operational Mgt
 Building and Developing HBase
 (In another Jira this week, Cluster Replication was put under HBase 
 Operational Mgt, Metrics were put under HBase Operational Mgt, and Building 
 HBase was moved under Developing HBase)


[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions


[ 
https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13128382#comment-13128382
 ] 

Hudson commented on HBASE-3446:
---

Integrated in HBase-TRUNK #2325 (See 
[https://builds.apache.org/job/HBase-TRUNK/2325/])
HBASE-3446 ProcessServerShutdown fails if META moves, orphaning lots of 
regions

stack : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/HConstants.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/KeyValue.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/LocalHBaseCluster.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTable.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Result.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedException.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerServices.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java
* /hbase/trunk/src/main/ruby/hbase/admin.rb
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestRegionRebalancing.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestHCM.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestMetaMigration.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMaster.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/TestMergeTable.java
* /hbase/trunk/src/test/ruby/hbase/admin_test.rb
* /hbase/trunk/src/test/ruby/shell/shell_test.rb


 ProcessServerShutdown fails if META moves, orphaning lots of regions
 

 Key: HBASE-3446
 URL: https://issues.apache.org/jira/browse/HBASE-3446
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.0
Reporter: Todd Lipcon
Assignee: stack
Priority: Blocker
 Fix For: 0.92.0

 Attachments: 3446-v11.txt, 3446-v12.txt, 3446-v13.txt, 3446-v14.txt, 
 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt, 
 3446v15.txt, 3446v23.txt


 I ran a rolling restart on a 5 node cluster with lots of regions, and 
 afterwards had LOTS of regions left orphaned. The issue appears to be that 
 ProcessServerShutdown failed because the server hosting META was restarted 
 around the same time as another server was being processed.


[jira] [Commented] (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme


[ 
https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13128386#comment-13128386
 ] 

Hudson commented on HBASE-3417:
---

Integrated in HBase-TRUNK #2325 (See 
[https://builds.apache.org/job/HBase-TRUNK/2325/])
HBASE-3417  CacheOnWrite is using the temporary output path for block 
names, need to use a more consistent block naming scheme (jgray)

jgray : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileWriter.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/CacheConfig.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java


 CacheOnWrite is using the temporary output path for block names, need to use 
 a more consistent block naming scheme
 --

 Key: HBASE-3417
 URL: https://issues.apache.org/jira/browse/HBASE-3417
 Project: HBase
  Issue Type: Bug
  Components: io, regionserver
Affects Versions: 0.92.0
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Critical
 Fix For: 0.92.0

 Attachments: HBASE-3417-redux-v1.patch, HBASE-3417-v1.patch, 
 HBASE-3417-v2.patch, HBASE-3417-v5.patch


 Currently the block names used in the block cache are built using the 
 filesystem path.  However, for cache on write, the path is a temporary output 
 file.
 The original COW patch actually made some modifications to block naming stuff 
 to make it more consistent but did not do enough.  Should add a separate 
 method somewhere for generating block names using some more easily mocked 
 scheme (rather than just raw path as we generate a random unique file name 
 twice, once for tmp and then again when moved into place).
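
One way such a naming method might look, as a sketch only: the cache key is built from a file id chosen once plus the block offset, so it stays the same whether the file sits in the temporary directory or in its final place. The class, the method names and the directory layout below are hypothetical and are not taken from the patch.

```java
import java.util.UUID;

class BlockNameSketch {
    /** Cache key from a stable file id and block offset, independent of the directory. */
    static String blockName(String fileId, long offset) {
        return fileId + "_" + offset;
    }

    /** Hypothetical temporary location used while the file is being written. */
    static String tmpPath(String fileId) {
        return "/hbase/table/region/.tmp/" + fileId;
    }

    /** Hypothetical final location after the file is moved into place. */
    static String finalPath(String fileId) {
        return "/hbase/table/region/family/" + fileId;
    }

    /** The file id is the last path component. */
    static String fileIdOf(String path) {
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        // Generate the unique name once and reuse it for both locations.
        String fileId = UUID.randomUUID().toString().replace("-", "");
        String whileWriting = blockName(fileIdOf(tmpPath(fileId)), 0L);
        String afterMove = blockName(fileIdOf(finalPath(fileId)), 0L);
        System.out.println(whileWriting.equals(afterMove));
    }
}
```

Because the key no longer depends on a filesystem path, a test can construct block names directly without creating files.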


[jira] [Updated] (HBASE-4598) [book] adding HDFS information, updating FAQ with an EC2 reference


 [ 
https://issues.apache.org/jira/browse/HBASE-4598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doug Meil updated HBASE-4598:
-

Attachment: docbkx_HBASE_4598.patch

 [book] adding HDFS information, updating FAQ with an EC2 reference
 --

 Key: HBASE-4598
 URL: https://issues.apache.org/jira/browse/HBASE-4598
 Project: HBase
  Issue Type: Improvement
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Minor
 Attachments: docbkx_HBASE_4598.patch


 book.xml
 * Moved EC2 remote connection question in FAQ to Troubleshooting chapter.
 * Created new general EC2 entry in FAQ with pointers to EC2 sections in Perf 
 and Trouble chapters.
 * Added HDFS section in Architecture chapter, with link to Hadoop HDFS 
 documentation.
 ** These types of questions come up from time-to-time on the dist-list.
 Performance.xml
 * Added section in Performance chapter for HDFS
 ** One sub-section is link to umbrella Jira for HDFS tickets for low-latency 
 reads.
 ** Another is section on HBase vs. HDFS performance in a batch context.
 Trouble.xml
 * Moving EC2 entry from FAQ


[jira] [Updated] (HBASE-4598) [book] adding HDFS information, updating FAQ with an EC2 reference


 [ 
https://issues.apache.org/jira/browse/HBASE-4598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doug Meil updated HBASE-4598:
-

Status: Patch Available  (was: Open)

 [book] adding HDFS information, updating FAQ with an EC2 reference
 --

 Key: HBASE-4598
 URL: https://issues.apache.org/jira/browse/HBASE-4598
 Project: HBase
  Issue Type: Improvement
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Minor
 Attachments: docbkx_HBASE_4598.patch


 book.xml
 * Moved EC2 remote connection question in FAQ to Troubleshooting chapter.
 * Created new general EC2 entry in FAQ with pointers to EC2 sections in Perf 
 and Trouble chapters.
 * Added HDFS section in Architecture chapter, with link to Hadoop HDFS 
 documentation.
 ** These types of questions come up from time-to-time on the dist-list.
 Performance.xml
 * Added section in Performance chapter for HDFS
 ** One sub-section is link to umbrella Jira for HDFS tickets for low-latency 
 reads.
 ** Another is section on HBase vs. HDFS performance in a batch context.
 Trouble.xml
 * Moving EC2 entry from FAQ


[jira] [Created] (HBASE-4598) [book] adding HDFS information, updating FAQ with an EC2 reference

2011-10-16 Thread Doug Meil (Created) (JIRA)

[book] adding HDFS information, updating FAQ with an EC2 reference
--

 Key: HBASE-4598
 URL: https://issues.apache.org/jira/browse/HBASE-4598
 Project: HBase
  Issue Type: Improvement
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Minor
 Attachments: docbkx_HBASE_4598.patch

book.xml
* Moved EC2 remote connection question in FAQ to Troubleshooting chapter.
* Created new general EC2 entry in FAQ with pointers to EC2 sections in Perf 
and Trouble chapters.
* Added HDFS section in Architecture chapter, with link to Hadoop HDFS 
documentation.
** These types of questions come up from time-to-time on the dist-list.

Performance.xml
* Added section in Performance chapter for HDFS
** One sub-section is link to umbrella Jira for HDFS tickets for low-latency 
reads.
** Another is section on HBase vs. HDFS performance in a batch context.

Trouble.xml
* Moving EC2 entry from FAQ


[jira] [Updated] (HBASE-4598) [book] adding HDFS information, updating FAQ with an EC2 reference


 [ 
https://issues.apache.org/jira/browse/HBASE-4598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doug Meil updated HBASE-4598:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

 [book] adding HDFS information, updating FAQ with an EC2 reference
 --

 Key: HBASE-4598
 URL: https://issues.apache.org/jira/browse/HBASE-4598
 Project: HBase
  Issue Type: Improvement
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Minor
 Attachments: docbkx_HBASE_4598.patch


 book.xml
 * Moved EC2 remote connection question in FAQ to Troubleshooting chapter.
 * Created new general EC2 entry in FAQ with pointers to EC2 sections in Perf 
 and Trouble chapters.
 * Added HDFS section in Architecture chapter, with link to Hadoop HDFS 
 documentation.
 ** These types of questions come up from time to time on the dist-list.
 Performance.xml
 * Added section in Performance chapter for HDFS
 ** One sub-section is a link to the umbrella Jira for HDFS tickets for low-latency 
 reads.
 ** Another is a section on HBase vs. HDFS performance in a batch context.
 Trouble.xml
 * Moving EC2 entry from FAQ


[jira] [Commented] (HBASE-4536) Allow CF to retain deleted rows

[
https://issues.apache.org/jira/browse/HBASE-4536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13128415#comment-13128415
]

Ted Yu commented on HBASE-4536:
---

bq. Thinking about a ScanConfig (or ScanInfo)
ScanInfo seems to be a better name.

bq. And then maybe a ScanType enum
This is good.

bq. a delete cell does not increase the version count
This should be fine.

Allow CF to retain deleted rows
---

Key: HBASE-4536
URL: https://issues.apache.org/jira/browse/HBASE-4536
Project: HBase
Issue Type: New Feature
Components: regionserver
Affects Versions: 0.92.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Fix For: 0.94.0

Parent allows for a cluster to retain rows for a TTL or keep a minimum number
of versions.
However, if a client deletes a row, all versions older than the delete
tombstone will be removed at the next major compaction (and even at memstore
flush; see HBASE-4241).
There should be a way to retain those versions to guard against software error.
I see two options here:
1. Add a new flag to HColumnDescriptor, something like RETAIN_DELETED.
2. Fold this into the parent change, i.e. keep the minimum number of versions
even past the delete marker.
#1 would allow for more flexibility. #2 comes somewhat naturally with the parent
(from a user viewpoint).
Comments? Any other options?
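
To make option #1 concrete, here is a toy model in plain Java with no HBase dependency. It is only a sketch of the behaviour described above, not the proposed implementation: RetainDeletedSketch, majorCompact and the retainDeleted parameter are hypothetical names, and plain timestamps stand in for cell versions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: plain timestamps stand in for cell versions.
// RetainDeletedSketch and majorCompact are hypothetical names, not HBase API.
class RetainDeletedSketch {

    // Returns the version timestamps that survive a simulated major compaction.
    static List<Long> majorCompact(List<Long> versions, long deleteMarkerTs,
                                   boolean retainDeleted) {
        List<Long> survivors = new ArrayList<>();
        for (long ts : versions) {
            // A delete marker covers every version that is not newer than it.
            boolean coveredByDelete = ts <= deleteMarkerTs;
            // Without the flag, covered versions are dropped; with it, they stay.
            if (!coveredByDelete || retainDeleted) {
                survivors.add(ts);
            }
        }
        return survivors;
    }

    public static void main(String[] args) {
        List<Long> versions = Arrays.asList(10L, 20L, 30L);
        System.out.println("default:        " + majorCompact(versions, 25L, false));
        System.out.println("RETAIN_DELETED: " + majorCompact(versions, 25L, true));
    }
}
```

With versions at 10, 20 and 30 and a delete marker at 25, the default path keeps only version 30, while the flag keeps all three. Option #2 would instead tie retention to the minimum-number-of-versions setting and is not modelled here.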


[jira] [Updated] (HBASE-4563) When error occurs in this.parent.close(false) of split, the split region cannot write & read


 [ 
https://issues.apache.org/jira/browse/HBASE-4563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4563:
--

Summary: When error occurs in this.parent.close(false) of split, the split 
region cannot write & read  (was: When split doing this.parent.close(false) 
occurs error,it'll cause the splited region cann't write & read)

 When error occurs in this.parent.close(false) of split, the split region 
 cannot write & read
 

 Key: HBASE-4563
 URL: https://issues.apache.org/jira/browse/HBASE-4563
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.4, 0.92.0
Reporter: bluedavy
Priority: Blocker
 Fix For: 0.90.5

 Attachments: HBASE-4563-0.90.patch, HBASE-4563-0.92.patch, 
 HBASE-4563-trunk.patch, test-4563-0.90.txt, test-4563-0.92.txt, 
 test-4563-trunk.txt


 Follow the steps below to reproduce the problem:
 1. Change SplitTransaction.java as below to mock an HDFS error.
{code:title=SplitTransaction.java|borderStyle=solid}
   List<StoreFile> hstoreFilesToSplit = this.parent.close(false);
   throw new IOException("some unexpected error in close store files");
{code} 
 2. Update the regionserver code and restart.
 3. Create a table and put some data into it.
 4. Split the table.
 5. Scan the table; the scan will fail.
 The attached patch fixes the bug.


[jira] [Updated] (HBASE-4563) When error occurs in this.parent.close(false) of split, the split region cannot write or read


 [ 
https://issues.apache.org/jira/browse/HBASE-4563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4563:
--

Assignee: bluedavy
 Summary: When error occurs in this.parent.close(false) of split, the split 
region cannot write or read  (was: When error occurs in 
this.parent.close(false) of split, the split region cannot write & read)

 When error occurs in this.parent.close(false) of split, the split region 
 cannot write or read
 -

 Key: HBASE-4563
 URL: https://issues.apache.org/jira/browse/HBASE-4563
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.4, 0.92.0
Reporter: bluedavy
Assignee: bluedavy
Priority: Blocker
 Fix For: 0.90.5

 Attachments: HBASE-4563-0.90.patch, HBASE-4563-0.92.patch, 
 HBASE-4563-trunk.patch, test-4563-0.90.txt, test-4563-0.92.txt, 
 test-4563-trunk.txt


 Follow the steps below to reproduce the problem:
 1. Change SplitTransaction.java as below to mock an HDFS error.
{code:title=SplitTransaction.java|borderStyle=solid}
   List<StoreFile> hstoreFilesToSplit = this.parent.close(false);
   throw new IOException("some unexpected error in close store files");
{code} 
 2. Update the regionserver code and restart.
 3. Create a table and put some data into it.
 4. Split the table.
 5. Scan the table; the scan will fail.
 The attached patch fixes the bug.


[jira] [Commented] (HBASE-4562) When split doing offlineParentInMeta encounters error, it'll cause data loss


[ 
https://issues.apache.org/jira/browse/HBASE-4562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13128418#comment-13128418
 ] 

Ted Yu commented on HBASE-4562:
---

In JIRA description:
bq. 5. kill the regionserver hosted the table;


 When split doing offlineParentInMeta encounters error, it'll cause data loss
 

 Key: HBASE-4562
 URL: https://issues.apache.org/jira/browse/HBASE-4562
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.4
Reporter: bluedavy
Priority: Blocker
 Fix For: 0.90.5

 Attachments: HBASE-4562-0.90.patch, HBASE-4562-0.92.patch, 
 HBASE-4562-trunk.patch, test-4562-0.90.txt, test-4562-0.92.txt, 
 test-4562-trunk.txt


 Follow the steps below to reproduce the problem:
 1. Change SplitTransaction.java as below to mock a timeout error.
{code:title=SplitTransaction.java|borderStyle=solid}
   if (!testing) {
     MetaEditor.offlineParentInMeta(server.getCatalogTracker(),
       this.parent.getRegionInfo(), a.getRegionInfo(), b.getRegionInfo());
     throw new IOException("some unexpected error in split");
   }
{code} 
 2. Update the regionserver code and restart.
 3. Create a table and put some data into it.
 4. Split the table.
 5. Kill the regionserver hosting the table.
 6. Wait until the master's ServerShutdownHandler.process has executed, then 
 scan the table; the data written earlier is lost.
 The attached patch fixes the bug.


[jira] [Created] (HBASE-4599) [book] performance.xml - nit grammatical error in EC2 section

2011-10-16 Thread Doug Meil (Created) (JIRA)

[book] performance.xml - nit grammatical error in EC2 section
-

 Key: HBASE-4599
 URL: https://issues.apache.org/jira/browse/HBASE-4599
 Project: HBase
  Issue Type: Bug
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Trivial





[jira] [Updated] (HBASE-4599) [book] performance.xml - nit grammatical error in EC2 section


 [ 
https://issues.apache.org/jira/browse/HBASE-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doug Meil updated HBASE-4599:
-

Attachment: performance_HBASE_4599.xml.patch

 [book] performance.xml - nit grammatical error in EC2 section
 -

 Key: HBASE-4599
 URL: https://issues.apache.org/jira/browse/HBASE-4599
 Project: HBase
  Issue Type: Bug
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Trivial
 Attachments: performance_HBASE_4599.xml.patch





[jira] [Updated] (HBASE-4599) [book] performance.xml - nit grammatical error in EC2 section


 [ 
https://issues.apache.org/jira/browse/HBASE-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doug Meil updated HBASE-4599:
-

Status: Patch Available  (was: Open)

 [book] performance.xml - nit grammatical error in EC2 section
 -

 Key: HBASE-4599
 URL: https://issues.apache.org/jira/browse/HBASE-4599
 Project: HBase
  Issue Type: Bug
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Trivial
 Attachments: performance_HBASE_4599.xml.patch





[jira] [Updated] (HBASE-4599) [book] performance.xml - nit grammatical error in EC2 section


 [ 
https://issues.apache.org/jira/browse/HBASE-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doug Meil updated HBASE-4599:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

 [book] performance.xml - nit grammatical error in EC2 section
 -

 Key: HBASE-4599
 URL: https://issues.apache.org/jira/browse/HBASE-4599
 Project: HBase
  Issue Type: Bug
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Trivial
 Attachments: performance_HBASE_4599.xml.patch





[jira] [Commented] (HBASE-4562) When split doing offlineParentInMeta encounters error, it'll cause data loss


[ 
https://issues.apache.org/jira/browse/HBASE-4562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13128422#comment-13128422
 ] 

Ted Yu commented on HBASE-4562:
---

If offlineParentInMeta() times out, SplitRequest.run() would execute the 
following code:
{code}
  if (st.rollback(this.server, this.server)) {
    LOG.info("Successful rollback of failed split of " +
      parent.getRegionNameAsString());
  } else {
    this.server.abort("Abort; we got an error after point-of-no-return");
  }
{code}

I agree that the comments should be consistent in all patches.

 When split doing offlineParentInMeta encounters error, it'll cause data loss
 

 Key: HBASE-4562
 URL: https://issues.apache.org/jira/browse/HBASE-4562
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.4
Reporter: bluedavy
Priority: Blocker
 Fix For: 0.90.5

 Attachments: HBASE-4562-0.90.patch, HBASE-4562-0.92.patch, 
 HBASE-4562-trunk.patch, test-4562-0.90.txt, test-4562-0.92.txt, 
 test-4562-trunk.txt


 Follow the steps below to reproduce the problem:
 1. Change SplitTransaction.java as below to mock a timeout error.
{code:title=SplitTransaction.java|borderStyle=solid}
   if (!testing) {
     MetaEditor.offlineParentInMeta(server.getCatalogTracker(),
       this.parent.getRegionInfo(), a.getRegionInfo(), b.getRegionInfo());
     throw new IOException("some unexpected error in split");
   }
{code} 
 2. Update the regionserver code and restart.
 3. Create a table and put some data into it.
 4. Split the table.
 5. Kill the regionserver hosting the table.
 6. Wait until the master's ServerShutdownHandler.process has executed, then 
 scan the table; the data written earlier is lost.
 The attached patch fixes the bug.


[jira] [Updated] (HBASE-4562) When split doing offlineParentInMeta encounters error, it'll cause data loss

2011-10-16 Thread jirapos...@reviews.apache.org (Commented) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4562:
--

Comment: was deleted

(was: In JIRA description:
bq. 5. kill the regionserver hosted the table;
)

 When split doing offlineParentInMeta encounters error, it'll cause data loss
 

 Key: HBASE-4562
 URL: https://issues.apache.org/jira/browse/HBASE-4562
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.4
Reporter: bluedavy
Priority: Blocker
 Fix For: 0.90.5

 Attachments: HBASE-4562-0.90.patch, HBASE-4562-0.92.patch, 
 HBASE-4562-trunk.patch, test-4562-0.90.txt, test-4562-0.92.txt, 
 test-4562-trunk.txt


 Follow the steps below to reproduce the problem:
 1. Change SplitTransaction.java as below to mock a timeout error.
{code:title=SplitTransaction.java|borderStyle=solid}
   if (!testing) {
     MetaEditor.offlineParentInMeta(server.getCatalogTracker(),
       this.parent.getRegionInfo(), a.getRegionInfo(), b.getRegionInfo());
     throw new IOException("some unexpected error in split");
   }
{code} 
 2. Update the regionserver code and restart.
 3. Create a table and put some data into it.
 4. Split the table.
 5. Kill the regionserver hosting the table.
 6. Wait until the master's ServerShutdownHandler.process has executed, then 
 scan the table; the data written earlier is lost.
 The attached patch fixes the bug.


[jira] [Commented] (HBASE-4070) [Coprocessors] Improve region server metrics to report loaded coprocessors to master


[ 
https://issues.apache.org/jira/browse/HBASE-4070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13128436#comment-13128436
 ] 

jirapos...@reviews.apache.org commented on HBASE-4070:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2029/
---

(Updated 2011-10-16 16:22:16.729481)


Review request for hbase and Mingjie Lai.


Changes
---

Updated 'testing done' section.


Summary
---

Proposed fix for HBASE-4070. 


This addresses bug HBASE-4070.
https://issues.apache.org/jira/browse/HBASE-4070


Diffs
-

  src/main/jamon/org/apache/hbase/tmpl/master/MasterStatusTmpl.jamon abeb850 
  src/main/jamon/org/apache/hbase/tmpl/regionserver/RSStatusTmpl.jamon be6fceb 
  src/main/java/org/apache/hadoop/hbase/ClusterStatus.java 01bc1dd 
  src/main/java/org/apache/hadoop/hbase/HServerLoad.java 0c680e4 
  src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java 92c959c 
  src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java 
7d2f82e 
  src/main/java/org/apache/hadoop/hbase/master/HMaster.java 50b49a6 
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java e2e694a 
  src/test/java/org/apache/hadoop/hbase/coprocessor/TestClassLoading.java 
eda5a9b 

Diff: https://reviews.apache.org/r/2029/diff


Testing (updated)
---

Two new tests : testRegionServerCoprocessorReported() and 
testMasterServerCoprocessorsReported() added to (existing) 
src/test/java/org/apache/hadoop/hbase/coprocessor/TestClassLoading.java.


Thanks,

Eugene



 [Coprocessors] Improve region server metrics to report loaded coprocessors to 
 master
 

 Key: HBASE-4070
 URL: https://issues.apache.org/jira/browse/HBASE-4070
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.3
Reporter: Mingjie Lai
Assignee: Eugene Koontz
 Attachments: HBASE-4070.patch, HBASE-4070.patch, HBASE-4070.patch, 
 HBASE-4070.patch, master-web-ui.jpg, rs-status-web-ui.jpg


 HBASE-3512 is about listing loaded cp classes in the shell. To make it more 
 generic, we need a way to report this information from the region to the 
 master (or just at the region server level). Later on, we can then display the 
 loaded class names in the shell as well as the web console. 


[jira] [Commented] (HBASE-4599) [book] performance.xml - nit grammatical error in EC2 section


[ 
https://issues.apache.org/jira/browse/HBASE-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13128444#comment-13128444
 ] 

Hudson commented on HBASE-4599:
---

Integrated in HBase-TRUNK #2326 (See 
[https://builds.apache.org/job/HBase-TRUNK/2326/])
HBASE-4599.  performance.xml - correcting small error in EC2 section

dmeil : 
Files : 
* /hbase/trunk/src/docbkx/performance.xml


 [book] performance.xml - nit grammatical error in EC2 section
 -

 Key: HBASE-4599
 URL: https://issues.apache.org/jira/browse/HBASE-4599
 Project: HBase
  Issue Type: Bug
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Trivial
 Attachments: performance_HBASE_4599.xml.patch





[jira] [Commented] (HBASE-4598) [book] adding HDFS information, updating FAQ with an EC2 reference


[ 
https://issues.apache.org/jira/browse/HBASE-4598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13128443#comment-13128443
 ] 

Hudson commented on HBASE-4598:
---

Integrated in HBase-TRUNK #2326 (See 
[https://builds.apache.org/job/HBase-TRUNK/2326/])
HBASE-4598 book update (book.xml, perf.xml, trouble.xml)

dmeil : 
Files : 
* /hbase/trunk/src/docbkx/book.xml
* /hbase/trunk/src/docbkx/performance.xml
* /hbase/trunk/src/docbkx/troubleshooting.xml


 [book] adding HDFS information, updating FAQ with an EC2 reference
 --

 Key: HBASE-4598
 URL: https://issues.apache.org/jira/browse/HBASE-4598
 Project: HBase
  Issue Type: Improvement
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Minor
 Attachments: docbkx_HBASE_4598.patch


 book.xml
 * Moved EC2 remote connection question in FAQ to Troubleshooting chapter.
 * Created new general EC2 entry in FAQ with pointers to EC2 sections in Perf 
 and Trouble chapters.
 * Added HDFS section in Architecture chapter, with link to Hadoop HDFS 
 documentation.
 ** These types of questions come up from time to time on the dist-list.
 Performance.xml
 * Added section in Performance chapter for HDFS
 ** One sub-section is a link to the umbrella Jira for HDFS tickets for low-latency 
 reads.
 ** Another is a section on HBase vs. HDFS performance in a batch context.
 Trouble.xml
 * Moving EC2 entry from FAQ


[jira] [Created] (HBASE-4600) [book] book.xml comment about what to do instead of using explicit timestamp, minor reformatting in KeyValue example

2011-10-16 Thread Doug Meil (Created) (JIRA)

[book] book.xml comment about what to do instead of using explicit timestamp, 
minor reformatting in KeyValue example


 Key: HBASE-4600
 URL: https://issues.apache.org/jira/browse/HBASE-4600
 Project: HBase
  Issue Type: Improvement
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Minor


book.xml
* further explanation on what to do instead of using explicit Put timestamp.
* minor reformatting in KeyValue example.


[jira] [Updated] (HBASE-4600) [book] book.xml comment about what to do instead of using explicit timestamp, minor reformatting in KeyValue example


 [ 
https://issues.apache.org/jira/browse/HBASE-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doug Meil updated HBASE-4600:
-

Status: Patch Available  (was: Open)

 [book] book.xml comment about what to do instead of using explicit timestamp, 
 minor reformatting in KeyValue example
 

 Key: HBASE-4600
 URL: https://issues.apache.org/jira/browse/HBASE-4600
 Project: HBase
  Issue Type: Improvement
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Minor
 Attachments: book_HBASE_4600.xml.patch


 book.xml
 * further explanation on what to do instead of using explicit Put timestamp.
 * minor reformatting in KeyValue example.


[jira] [Updated] (HBASE-4600) [book] book.xml comment about what to do instead of using explicit timestamp, minor reformatting in KeyValue example


 [ 
https://issues.apache.org/jira/browse/HBASE-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doug Meil updated HBASE-4600:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

 [book] book.xml comment about what to do instead of using explicit timestamp, 
 minor reformatting in KeyValue example
 

 Key: HBASE-4600
 URL: https://issues.apache.org/jira/browse/HBASE-4600
 Project: HBase
  Issue Type: Improvement
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Minor
 Attachments: book_HBASE_4600.xml.patch


 book.xml
 * further explanation on what to do instead of using explicit Put timestamp.
 * minor reformatting in KeyValue example.


[jira] [Updated] (HBASE-4600) [book] book.xml comment about what to do instead of using explicit timestamp, minor reformatting in KeyValue example

2011-10-16 Thread Kannan Muthukkaruppan (Commented) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doug Meil updated HBASE-4600:
-

Attachment: book_HBASE_4600.xml.patch

 [book] book.xml comment about what to do instead of using explicit timestamp, 
 minor reformatting in KeyValue example
 

 Key: HBASE-4600
 URL: https://issues.apache.org/jira/browse/HBASE-4600
 Project: HBase
  Issue Type: Improvement
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Minor
 Attachments: book_HBASE_4600.xml.patch


 book.xml
 * further explanation on what to do instead of using explicit Put timestamp.
 * minor reformatting in KeyValue example.


[jira] [Commented] (HBASE-3443) ICV optimization to look in memstore first and then store files (HBASE-3082) does not work when deletes are in the mix


[ 
https://issues.apache.org/jira/browse/HBASE-3443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13128529#comment-13128529
 ] 

Kannan Muthukkaruppan commented on HBASE-3443:
--

Now that we have lazy seeks, i.e. HBASE-4465, we should be able to revert the 
work/optimization done in HBASE-3082 and avoid this bug. What do you folks think?



 ICV optimization to look in memstore first and then store files (HBASE-3082) 
 does not work when deletes are in the mix
 --

 Key: HBASE-3443
 URL: https://issues.apache.org/jira/browse/HBASE-3443
 Project: HBase
  Issue Type: Bug
Reporter: Kannan Muthukkaruppan

 For incrementColumnValue() HBASE-3082 adds an optimization to check memstores 
 first, and only if not present in the memstore then check the store files. In 
 the presence of deletes, the above optimization is not reliable.
 If the column is marked as deleted in the memstore, one should not look 
 further into the store files. But currently, the code does so.
 Sample test code outline:
 {code}
 admin.createTable(desc)
 table = HTable.new(conf, tableName)
 table.incrementColumnValue(Bytes.toBytes(row), cf1name, 
 Bytes.toBytes(column), 5);
 admin.flush(tableName)
 sleep(2)
 del = Delete.new(Bytes.toBytes(row))
 table.delete(del)
 table.incrementColumnValue(Bytes.toBytes(row), cf1name, 
 Bytes.toBytes(column), 5);
 get = Get.new(Bytes.toBytes(row))
 keyValues = table.get(get).raw()
 keyValues.each do |keyValue|
   puts "Expect 5; Got Value=#{Bytes.toLong(keyValue.getValue())}";
 end
 {code}
 The above prints:
 {code}
 Expect 5; Got Value=10
 {code}
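
 The failure mode described above can also be reproduced in a toy model, in plain Java with no HBase dependency. This is only a sketch under stated assumptions: IcvDeleteSketch, run and honorDeletes are made-up names, two maps stand in for the memstore and a flushed store file, and a set stands in for delete markers held in the memstore.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only: none of these names are HBase classes or methods.
class IcvDeleteSketch {
    final Map<String, Long> memstore = new HashMap<>();
    final Map<String, Long> storeFile = new HashMap<>();
    final Set<String> memstoreDeletes = new HashSet<>();

    // Simulated flush: live memstore values move to the store file.
    void flush() {
        storeFile.putAll(memstore);
        memstore.clear();
    }

    // Simulated delete: record a delete marker in the memstore.
    void delete(String column) {
        memstore.remove(column);
        memstoreDeletes.add(column);
    }

    // Memstore-first increment. When honorDeletes is false, a delete marker
    // is treated as "not present" and the lookup falls through to the
    // store file, which is the behaviour the report describes.
    long increment(String column, long amount, boolean honorDeletes) {
        Long current = memstore.get(column);
        if (current == null) {
            boolean deleted = honorDeletes && memstoreDeletes.contains(column);
            current = deleted ? null : storeFile.get(column);
        }
        long updated = (current == null ? 0L : current) + amount;
        memstore.put(column, updated);
        return updated;
    }

    // Same sequence as the test outline: increment, flush, delete, increment.
    static long run(boolean honorDeletes) {
        IcvDeleteSketch s = new IcvDeleteSketch();
        s.increment("column", 5, honorDeletes);
        s.flush();
        s.delete("column");
        return s.increment("column", 5, honorDeletes);
    }

    public static void main(String[] args) {
        System.out.println("deletes ignored: " + run(false));
        System.out.println("deletes honored: " + run(true));
    }
}
```

 In this model run(false) returns 10, matching the output reported above, and run(true) returns the expected 5.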


[jira] [Commented] (HBASE-3443) ICV optimization to look in memstore first and then store files (HBASE-3082) does not work when deletes are in the mix


[ 
https://issues.apache.org/jira/browse/HBASE-3443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13128530#comment-13128530
 ] 

Ted Yu commented on HBASE-3443:
---

+1 on the proposal.

 ICV optimization to look in memstore first and then store files (HBASE-3082) 
 does not work when deletes are in the mix
 --

 Key: HBASE-3443
 URL: https://issues.apache.org/jira/browse/HBASE-3443
 Project: HBase
  Issue Type: Bug
Reporter: Kannan Muthukkaruppan

 For incrementColumnValue() HBASE-3082 adds an optimization to check memstores 
 first, and only if not present in the memstore then check the store files. In 
 the presence of deletes, the above optimization is not reliable.
 If the column is marked as deleted in the memstore, one should not look 
 further into the store files. But currently, the code does so.
 Sample test code outline:
 {code}
 admin.createTable(desc)
 table = HTable.new(conf, tableName)
 table.incrementColumnValue(Bytes.toBytes(row), cf1name, 
 Bytes.toBytes(column), 5);
 admin.flush(tableName)
 sleep(2)
 del = Delete.new(Bytes.toBytes(row))
 table.delete(del)
 table.incrementColumnValue(Bytes.toBytes(row), cf1name, 
 Bytes.toBytes(column), 5);
 get = Get.new(Bytes.toBytes(row))
 keyValues = table.get(get).raw()
 keyValues.each do |keyValue|
   puts "Expect 5; Got Value=#{Bytes.toLong(keyValue.getValue())}";
 end
 {code}
 The above prints:
 {code}
 Expect 5; Got Value=10
 {code}


[jira] [Commented] (HBASE-4600) [book] book.xml comment about what to do instead of using explicit timestamp, minor reformatting in KeyValue example


[ 
https://issues.apache.org/jira/browse/HBASE-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13128537#comment-13128537
 ] 

Hudson commented on HBASE-4600:
---

Integrated in HBase-TRUNK #2328 (See 
[https://builds.apache.org/job/HBase-TRUNK/2328/])
HBASE-4600 book.xml

dmeil : 
Files : 
* /hbase/trunk/src/docbkx/book.xml


 [book] book.xml comment about what to do instead of using explicit timestamp, 
 minor reformatting in KeyValue example
 

 Key: HBASE-4600
 URL: https://issues.apache.org/jira/browse/HBASE-4600
 Project: HBase
  Issue Type: Improvement
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Minor
 Attachments: book_HBASE_4600.xml.patch


 book.xml
 * further explanation on what to do instead of using explicit Put timestamp.
 * minor reformatting in KeyValue example.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4511) There is data loss when master failovers

2011-10-16 Thread gaojinchao (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13128554#comment-13128554
 ] 

gaojinchao commented on HBASE-4511:
---

This cannot be reproduced in a real cluster, so I am downgrading its priority.

 There is data loss when master failovers
 

 Key: HBASE-4511
 URL: https://issues.apache.org/jira/browse/HBASE-4511
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.92.0
Reporter: gaojinchao
Priority: Critical
 Fix For: 0.92.0

 Attachments: 
 org.apache.hadoop.hbase.master.TestMasterFailover-output.rar


 It goes like this:
 The master crashed; at the same time the RS hosting meta is crashing, but the 
 RS doesn't exit.
 The master starts up again and finds all living RSs. 
 The master fails to verify meta, because this RS is crashing.
 The master reassigns meta, but it doesn't split the HLog. 
 So some meta data is lost.
 Below are the logs of a failing failover test case. 
 // The log says that we want to kill an RS
 2011-09-28 19:54:45,694 INFO  [Thread-988] regionserver.HRegionServer(1443): 
 STOPPED: Killing for unit test
 2011-09-28 19:54:45,694 INFO  [Thread-988] master.TestMasterFailover(1007): 
 RS 192.168.2.102,54385,1317264874629 killed 
 // The RS didn't crash. 
 2011-09-28 19:54:51,763 INFO  [Master:0;192.168.2.102,54557,1317264885720] 
 master.HMaster(458): Registering server found up in zk: 
 192.168.2.102,54385,1317264874629
 2011-09-28 19:54:51,763 INFO  [Master:0;192.168.2.102,54557,1317264885720] 
 master.ServerManager(232): Registering 
 server=192.168.2.102,54385,1317264874629
 2011-09-28 19:54:51,770 DEBUG [Master:0;192.168.2.102,54557,1317264885720] 
 zookeeper.ZKUtil(491): master:54557-0x132b31adbb30005 Unable to get data of 
 znode /hbase/unassigned/1028785192 because node does not exist (not an error)
 2011-09-28 19:54:51,771 DEBUG [Master:0;192.168.2.102,54557,1317264885720] 
 zookeeper.ZKUtil(1003): master:54557-0x132b31adbb30005 Retrieved 33 byte(s) 
 of data from znode /hbase/root-region-server and set watcher; 
 192.168.2.102,54383,131726487...
 //Meta verification failed and the master reassigned .META., so all the 
 regions in .META. are lost.
 2011-09-28 19:54:51,773 INFO  [Master:0;192.168.2.102,54557,1317264885720] 
 catalog.CatalogTracker(476): Failed verification of .META.,,1 at 
 address=192.168.2.102,54385,1317264874629; 
 org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: 
 org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: Server 
 192.168.2.102,54385,1317264874629 not running, aborting
 2011-09-28 19:54:51,773 DEBUG [Master:0;192.168.2.102,54557,1317264885720] 
 catalog.CatalogTracker(316): new .META. server: 
 192.168.2.102,54385,1317264874629 isn't valid. Cached .META. server: null
 2011-09-28 19:54:52,274 DEBUG [Master:0;192.168.2.102,54557,1317264885720] 
 zookeeper.ZKUtil(1003): master:54557-0x132b31adbb30005 Retrieved 33 byte(s) 
 of data from znode /hbase/root-region-server and set watcher; 
 192.168.2.102,54383,131726487...
 2011-09-28 19:54:52,277 INFO  [Master:0;192.168.2.102,54557,1317264885720] 
 catalog.CatalogTracker(476): Failed verification of .META.,,1 at 
 address=192.168.2.102,54385,1317264874629; 
 org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: 
 org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: Server 
 192.168.2.102,54385,1317264874629 not running, aborting
 2011-09-28 19:54:52,277 DEBUG [Master:0;192.168.2.102,54557,1317264885720] 
 catalog.CatalogTracker(316): new .META. server: 
 192.168.2.102,54385,1317264874629 isn't valid. Cached .META. server: null
 2011-09-28 19:54:52,778 DEBUG [Master:0;192.168.2.102,54557,1317264885720] 
 zookeeper.ZKUtil(1003): master:54557-0x132b31adbb30005 Retrieved 33 byte(s) 
 of data from znode /hbase/root-region-server and set watcher; 
 192.168.2.102,54383,131726487...
 2011-09-28 19:54:52,782 INFO  [Master:0;192.168.2.102,54557,1317264885720] 
 catalog.CatalogTracker(476): Failed verification of .META.,,1 at 
 address=192.168.2.102,54385,1317264874629; 
 org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: 
 org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: Server 
 192.168.2.102,54385,1317264874629 not running, aborting
 2011-09-28 19:54:52,782 DEBUG [Master:0;192.168.2.102,54557,1317264885720] 
 catalog.CatalogTracker(316): new .META. server: 
 192.168.2.102,54385,1317264874629 isn't valid. Cached .META. server: null
 2011-09-28 19:54:52,782 DEBUG [Master:0;192.168.2.102,54557,1317264885720] 
 zookeeper.ZKAssign(264): master:54557-0x132b31adbb30005 Creating (or 
 updating) unassigned node for 1028785192 with OFFLINE state
 2011-09-28 19:54:52,825 DEBUG [Thread-988-EventThread] 
 zookeeper.ZooKeeperWatcher(233): master:54557-0x132b31adbb30005 Received

[jira] [Updated] (HBASE-4511) There is data loss when master failovers

2011-10-16 Thread gaojinchao (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gaojinchao updated HBASE-4511:
--

Priority: Minor  (was: Critical)

 There is data loss when master failovers
 

 Key: HBASE-4511
 URL: https://issues.apache.org/jira/browse/HBASE-4511
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.92.0
Reporter: gaojinchao
Priority: Minor
 Fix For: 0.92.0

 Attachments: 
 org.apache.hadoop.hbase.master.TestMasterFailover-output.rar


 It goes like this:
 The master crashes at the same time as the RS hosting .META. is crashing, but 
 the RS does not exit.
 The master starts up again and finds all living RSs.
 The master's verification of .META. fails because this RS is crashing.
 The master reassigns .META. but does not split the HLog,
 so some meta data is lost.
 Logs from a failing run of the failover test case:
 //The test reports that it wants to kill an RS
 2011-09-28 19:54:45,694 INFO  [Thread-988] regionserver.HRegionServer(1443): 
 STOPPED: Killing for unit test
 2011-09-28 19:54:45,694 INFO  [Thread-988] master.TestMasterFailover(1007): 
 RS 192.168.2.102,54385,1317264874629 killed 
 //The RS did not actually crash.
 2011-09-28 19:54:51,763 INFO  [Master:0;192.168.2.102,54557,1317264885720] 
 master.HMaster(458): Registering server found up in zk: 
 192.168.2.102,54385,1317264874629
 2011-09-28 19:54:51,763 INFO  [Master:0;192.168.2.102,54557,1317264885720] 
 master.ServerManager(232): Registering 
 server=192.168.2.102,54385,1317264874629
 2011-09-28 19:54:51,770 DEBUG [Master:0;192.168.2.102,54557,1317264885720] 
 zookeeper.ZKUtil(491): master:54557-0x132b31adbb30005 Unable to get data of 
 znode /hbase/unassigned/1028785192 because node does not exist (not an error)
 2011-09-28 19:54:51,771 DEBUG [Master:0;192.168.2.102,54557,1317264885720] 
 zookeeper.ZKUtil(1003): master:54557-0x132b31adbb30005 Retrieved 33 byte(s) 
 of data from znode /hbase/root-region-server and set watcher; 
 192.168.2.102,54383,131726487...
 //Meta verification failed and the master reassigned .META., so all the 
 regions in .META. are lost.
 2011-09-28 19:54:51,773 INFO  [Master:0;192.168.2.102,54557,1317264885720] 
 catalog.CatalogTracker(476): Failed verification of .META.,,1 at 
 address=192.168.2.102,54385,1317264874629; 
 org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: 
 org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: Server 
 192.168.2.102,54385,1317264874629 not running, aborting
 2011-09-28 19:54:51,773 DEBUG [Master:0;192.168.2.102,54557,1317264885720] 
 catalog.CatalogTracker(316): new .META. server: 
 192.168.2.102,54385,1317264874629 isn't valid. Cached .META. server: null
 2011-09-28 19:54:52,274 DEBUG [Master:0;192.168.2.102,54557,1317264885720] 
 zookeeper.ZKUtil(1003): master:54557-0x132b31adbb30005 Retrieved 33 byte(s) 
 of data from znode /hbase/root-region-server and set watcher; 
 192.168.2.102,54383,131726487...
 2011-09-28 19:54:52,277 INFO  [Master:0;192.168.2.102,54557,1317264885720] 
 catalog.CatalogTracker(476): Failed verification of .META.,,1 at 
 address=192.168.2.102,54385,1317264874629; 
 org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: 
 org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: Server 
 192.168.2.102,54385,1317264874629 not running, aborting
 2011-09-28 19:54:52,277 DEBUG [Master:0;192.168.2.102,54557,1317264885720] 
 catalog.CatalogTracker(316): new .META. server: 
 192.168.2.102,54385,1317264874629 isn't valid. Cached .META. server: null
 2011-09-28 19:54:52,778 DEBUG [Master:0;192.168.2.102,54557,1317264885720] 
 zookeeper.ZKUtil(1003): master:54557-0x132b31adbb30005 Retrieved 33 byte(s) 
 of data from znode /hbase/root-region-server and set watcher; 
 192.168.2.102,54383,131726487...
 2011-09-28 19:54:52,782 INFO  [Master:0;192.168.2.102,54557,1317264885720] 
 catalog.CatalogTracker(476): Failed verification of .META.,,1 at 
 address=192.168.2.102,54385,1317264874629; 
 org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: 
 org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: Server 
 192.168.2.102,54385,1317264874629 not running, aborting
 2011-09-28 19:54:52,782 DEBUG [Master:0;192.168.2.102,54557,1317264885720] 
 catalog.CatalogTracker(316): new .META. server: 
 192.168.2.102,54385,1317264874629 isn't valid. Cached .META. server: null
 2011-09-28 19:54:52,782 DEBUG [Master:0;192.168.2.102,54557,1317264885720] 
 zookeeper.ZKAssign(264): master:54557-0x132b31adbb30005 Creating (or 
 updating) unassigned node for 1028785192 with OFFLINE state
 2011-09-28 19:54:52,825 DEBUG [Thread-988-EventThread] 
 zookeeper.ZooKeeperWatcher(233): master:54557-0x132b31adbb30005 Received 
 ZooKeeper Event, type=NodeCreated, state=SyncConnected, 
 path=/hbase/unassigned/1028785192
 //It said

[jira] [Commented] (HBASE-3443) ICV optimization to look in memstore first and then store files (HBASE-3082) does not work when deletes are in the mix


[ 
https://issues.apache.org/jira/browse/HBASE-3443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13128564#comment-13128564
 ] 

Lars Hofhansl commented on HBASE-3443:
--

I agree. I think delete handling is generally a bit funky in HBase (see also 
HBASE-4536).

 ICV optimization to look in memstore first and then store files (HBASE-3082) 
 does not work when deletes are in the mix
 --

 Key: HBASE-3443
 URL: https://issues.apache.org/jira/browse/HBASE-3443
 Project: HBase
  Issue Type: Bug
Reporter: Kannan Muthukkaruppan

 For incrementColumnValue() HBASE-3082 adds an optimization to check memstores 
 first, and only if not present in the memstore then check the store files. In 
 the presence of deletes, the above optimization is not reliable.
 If the column is marked as deleted in the memstore, one should not look 
 further into the store files. But currently, the code does so.
 Sample test code outline:
 {code}
 admin.createTable(desc)
 table = HTable.new(conf, tableName)
 # First increment: the counter becomes 5
 table.incrementColumnValue(Bytes.toBytes(row), cf1name, Bytes.toBytes(column), 5)
 # Flush so that the value 5 now lives in a store file
 admin.flush(tableName)
 sleep(2)
 # Delete the row; the delete marker sits in the memstore
 del = Delete.new(Bytes.toBytes(row))
 table.delete(del)
 # Second increment should start again from 0 and yield 5
 table.incrementColumnValue(Bytes.toBytes(row), cf1name, Bytes.toBytes(column), 5)
 get = Get.new(Bytes.toBytes(row))
 keyValues = table.get(get).raw()
 keyValues.each do |keyValue|
   puts "Expect 5; Got Value=#{Bytes.toLong(keyValue.getValue())}"
 end
 {code}
 The above prints:
 {code}
 Expect 5; Got Value=10
 {code}
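
 The behaviour described above can be illustrated with a small, self-contained 
 Java model. This is only a sketch under stated assumptions: the class 
 IcvSketch, its two maps and its method names are hypothetical and are not 
 HBase code. It models one memstore in front of one store file and compares a 
 read that ignores delete markers in the memstore with a read that stops at 
 them.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model (not HBase code): one memstore map in front of one store-file map. */
class IcvSketch {
    /** A cell is either a live value or a delete marker (tombstone). */
    static final class Cell {
        final boolean deleted;
        final long value;
        private Cell(boolean deleted, long value) { this.deleted = deleted; this.value = value; }
        static Cell put(long v) { return new Cell(false, v); }
        static Cell tombstone() { return new Cell(true, 0L); }
    }

    final Map<String, Cell> memstore = new HashMap<>();
    final Map<String, Cell> storeFile = new HashMap<>();

    /** Moves every memstore entry into the store file; newer entries overwrite older ones. */
    void flush() { storeFile.putAll(memstore); memstore.clear(); }

    /** Records a delete marker in the memstore. */
    void delete(String column) { memstore.put(column, Cell.tombstone()); }

    /** Unreliable read: a tombstone in the memstore is treated as "not present". */
    long readIgnoringTombstone(String column) {
        Cell c = memstore.get(column);
        if (c != null && !c.deleted) return c.value;
        Cell f = storeFile.get(column);          // falls through to older data
        return (f != null && !f.deleted) ? f.value : 0L;
    }

    /** Read that stops at any memstore entry, including a tombstone. */
    long readHonoringTombstone(String column) {
        Cell c = memstore.get(column);
        if (c != null) return c.deleted ? 0L : c.value;
        Cell f = storeFile.get(column);
        return (f != null && !f.deleted) ? f.value : 0L;
    }

    long increment(String column, long amount, boolean honorTombstone) {
        long current = honorTombstone ? readHonoringTombstone(column)
                                      : readIgnoringTombstone(column);
        long updated = current + amount;
        memstore.put(column, Cell.put(updated));
        return updated;
    }

    /** Replays the sequence from the report: increment, flush, delete, increment. */
    static long replay(boolean honorTombstone) {
        IcvSketch store = new IcvSketch();
        store.increment("c", 5, honorTombstone);
        store.flush();
        store.delete("c");
        return store.increment("c", 5, honorTombstone);
    }
}
```

 In this model, replay(false) reproduces the reported value and replay(true) 
 gives the expected one. How the real code should track delete markers is not 
 stated in this report.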


[jira] [Commented] (HBASE-4562) When split doing offlineParentInMeta encounters error, it'll cause data loss


[ 
https://issues.apache.org/jira/browse/HBASE-4562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13128573#comment-13128573
 ] 

Lars Hofhansl commented on HBASE-4562:
--

I see... Thanks Ted.

 When split doing offlineParentInMeta encounters error, it'll cause data loss
 

 Key: HBASE-4562
 URL: https://issues.apache.org/jira/browse/HBASE-4562
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.4
Reporter: bluedavy
Priority: Blocker
 Fix For: 0.90.5

 Attachments: HBASE-4562-0.90.patch, HBASE-4562-0.92.patch, 
 HBASE-4562-trunk.patch, test-4562-0.90.txt, test-4562-0.92.txt, 
 test-4562-trunk.txt


 Follow the steps below to reproduce the problem:
 1. Change SplitTransaction.java as below, to mock a timeout error.
{code:title=SplitTransaction.java|borderStyle=solid}
   if (!testing) {
     MetaEditor.offlineParentInMeta(server.getCatalogTracker(),
       this.parent.getRegionInfo(), a.getRegionInfo(), b.getRegionInfo());
     // Injected failure: thrown after the call that offlines the parent in .META.
     throw new IOException("some unexpected error in split");
   }
{code} 
 2. Update the regionserver code and restart.
 3. Create a table and put some data into it.
 4. Split the table.
 5. Kill the regionserver hosting the table.
 6. Wait until the master's ServerShutdownHandler.process has executed, then 
 scan the table; you'll find that the data written earlier is lost.
 The attached patch fixes the bug.
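
 One plausible reading of why these steps lose data can be sketched as a toy 
 model in Java. Everything here is an assumption made for illustration: the 
 class SplitRollbackSketch, its fields and its recovery rule are hypothetical 
 and do not reproduce SplitTransaction or ServerShutdownHandler. The model 
 assumes that rollback removes the daughters and reopens the parent without 
 undoing the .META. edit, and that recovery after a server death trusts .META.

```java
import java.io.IOException;

/** Toy model (not HBase code) of a split that fails right after the meta edit. */
class SplitRollbackSketch {
    boolean parentOfflineInMeta = false;
    boolean daughtersOnDisk = false;
    boolean parentOnline = true;

    /** Runs a split in which the meta edit is applied but an error is raised right after it. */
    void splitWithLateFailure() {
        parentOnline = false;      // parent closed for the split
        daughtersOnDisk = true;    // daughter directories created
        try {
            parentOfflineInMeta = true; // the edit lands in .META.
            throw new IOException("some unexpected error in split");
        } catch (IOException e) {
            rollback();
        }
    }

    /** Assumed rollback: undoes local work but leaves .META. untouched. */
    void rollback() {
        daughtersOnDisk = false;   // daughter directories removed
        parentOnline = true;       // parent reopened on this server
    }

    /** Assumed recovery rule: an offline parent in .META. is not reassigned. */
    boolean dataReachableAfterServerCrash() {
        parentOnline = false;      // the hosting server is gone
        boolean parentReassigned = !parentOfflineInMeta;
        return parentReassigned || daughtersOnDisk;
    }
}
```

 In this model, once the hosting server dies after the failed split, neither 
 the parent nor the daughters serve the data. The attached patches are the 
 authority on how the real code handles this case.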


[jira] [Updated] (HBASE-4563) When error occurs in this.parent.close(false) of split, the split region cannot write or read


 [ 
https://issues.apache.org/jira/browse/HBASE-4563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bluedavy updated HBASE-4563:


Attachment: (was: HBASE-4563-0.90.patch)

 When error occurs in this.parent.close(false) of split, the split region 
 cannot write or read
 -

 Key: HBASE-4563
 URL: https://issues.apache.org/jira/browse/HBASE-4563
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.4, 0.92.0
Reporter: bluedavy
Assignee: bluedavy
Priority: Blocker
 Fix For: 0.90.5

 Attachments: test-4563-0.90.txt, test-4563-0.92.txt, 
 test-4563-trunk.txt


 Follow the steps below to reproduce the problem:
 1. Change SplitTransaction.java as below, to mock an HDFS error.
{code:title=SplitTransaction.java|borderStyle=solid}
   List<StoreFile> hstoreFilesToSplit = this.parent.close(false);
   // Injected failure: thrown right after the parent region has been closed
   throw new IOException("some unexpected error in close store files");
{code} 
 2. Update the regionserver code and restart.
 3. Create a table and put some data into it.
 4. Split the table.
 5. Scan the table; the scan will fail.
 The attached patch fixes the bug.
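
 The failure mode in these steps can be sketched with a small journal-based 
 rollback model in Java. This is an illustration under assumptions, not the 
 patch: the class CloseJournalSketch, the Step enum and the flag 
 journalBeforeClose are hypothetical. The model assumes that rollback only 
 undoes steps recorded in a journal, so a close that fails before it is 
 journaled leaves the parent closed.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model (not HBase code) of a split whose close step fails. */
class CloseJournalSketch {
    enum Step { CLOSED_PARENT }

    boolean parentOpen = true;
    final Deque<Step> journal = new ArrayDeque<>();

    /** Always throws; stands in for the injected error from the steps above. */
    private void failAfterClose() throws IOException {
        throw new IOException("some unexpected error in close store files");
    }

    /** Closes the parent and then fails; the journal entry is written before or after the close. */
    void closeParent(boolean journalBeforeClose) throws IOException {
        if (journalBeforeClose) journal.push(Step.CLOSED_PARENT);
        parentOpen = false;        // the close itself takes effect
        failAfterClose();          // the error surfaces here
        if (!journalBeforeClose) journal.push(Step.CLOSED_PARENT); // never reached
    }

    /** Undoes only the steps that were journaled. */
    void rollback() {
        while (!journal.isEmpty()) {
            if (journal.pop() == Step.CLOSED_PARENT) parentOpen = true; // reopen the parent
        }
    }

    /** Returns whether the parent can serve reads and writes after the failed split. */
    static boolean parentUsableAfterFailedSplit(boolean journalBeforeClose) {
        CloseJournalSketch split = new CloseJournalSketch();
        try {
            split.closeParent(journalBeforeClose);
        } catch (IOException e) {
            split.rollback();
        }
        return split.parentOpen;
    }
}
```

 In the model, journaling the step before attempting it lets rollback reopen 
 the parent; journaling only after success does not. Whether the attached 
 patch takes this approach is not stated in this thread.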


[jira] [Updated] (HBASE-4563) When error occurs in this.parent.close(false) of split, the split region cannot write or read


 [ 
https://issues.apache.org/jira/browse/HBASE-4563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bluedavy updated HBASE-4563:


Attachment: (was: HBASE-4563-trunk.patch)

 When error occurs in this.parent.close(false) of split, the split region 
 cannot write or read
 -

 Key: HBASE-4563
 URL: https://issues.apache.org/jira/browse/HBASE-4563
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.4, 0.92.0
Reporter: bluedavy
Assignee: bluedavy
Priority: Blocker
 Fix For: 0.90.5

 Attachments: test-4563-0.90.txt, test-4563-0.92.txt, 
 test-4563-trunk.txt


 Follow the steps below to reproduce the problem:
 1. Change SplitTransaction.java as below, to mock an HDFS error.
{code:title=SplitTransaction.java|borderStyle=solid}
   List<StoreFile> hstoreFilesToSplit = this.parent.close(false);
   // Injected failure: thrown right after the parent region has been closed
   throw new IOException("some unexpected error in close store files");
{code} 
 2. Update the regionserver code and restart.
 3. Create a table and put some data into it.
 4. Split the table.
 5. Scan the table; the scan will fail.
 The attached patch fixes the bug.


[jira] [Updated] (HBASE-4563) When error occurs in this.parent.close(false) of split, the split region cannot write or read


 [ 
https://issues.apache.org/jira/browse/HBASE-4563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bluedavy updated HBASE-4563:


Attachment: (was: HBASE-4563-0.92.patch)

 When error occurs in this.parent.close(false) of split, the split region 
 cannot write or read
 -

 Key: HBASE-4563
 URL: https://issues.apache.org/jira/browse/HBASE-4563
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.4, 0.92.0
Reporter: bluedavy
Assignee: bluedavy
Priority: Blocker
 Fix For: 0.90.5

 Attachments: test-4563-0.90.txt, test-4563-0.92.txt, 
 test-4563-trunk.txt


 Follow the steps below to reproduce the problem:
 1. Change SplitTransaction.java as below, to mock an HDFS error.
{code:title=SplitTransaction.java|borderStyle=solid}
   List<StoreFile> hstoreFilesToSplit = this.parent.close(false);
   // Injected failure: thrown right after the parent region has been closed
   throw new IOException("some unexpected error in close store files");
{code} 
 2. Update the regionserver code and restart.
 3. Create a table and put some data into it.
 4. Split the table.
 5. Scan the table; the scan will fail.
 The attached patch fixes the bug.


[jira] [Updated] (HBASE-4562) When split doing offlineParentInMeta encounters error, it'll cause data loss


 [ 
https://issues.apache.org/jira/browse/HBASE-4562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bluedavy updated HBASE-4562:


Attachment: (was: HBASE-4562-0.90.patch)

 When split doing offlineParentInMeta encounters error, it'll cause data loss
 

 Key: HBASE-4562
 URL: https://issues.apache.org/jira/browse/HBASE-4562
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.4
Reporter: bluedavy
Priority: Blocker
 Fix For: 0.90.5

 Attachments: HBASE-4562-0.90.patch, HBASE-4562-0.92.patch, 
 HBASE-4562-trunk.patch, test-4562-0.90.txt, test-4562-0.92.txt, 
 test-4562-trunk.txt


 Follow the steps below to reproduce the problem:
 1. Change SplitTransaction.java as below, to mock a timeout error.
{code:title=SplitTransaction.java|borderStyle=solid}
   if (!testing) {
     MetaEditor.offlineParentInMeta(server.getCatalogTracker(),
       this.parent.getRegionInfo(), a.getRegionInfo(), b.getRegionInfo());
     // Injected failure: thrown after the call that offlines the parent in .META.
     throw new IOException("some unexpected error in split");
   }
{code} 
 2. Update the regionserver code and restart.
 3. Create a table and put some data into it.
 4. Split the table.
 5. Kill the regionserver hosting the table.
 6. Wait until the master's ServerShutdownHandler.process has executed, then 
 scan the table; you'll find that the data written earlier is lost.
 The attached patch fixes the bug.


[jira] [Updated] (HBASE-4563) When error occurs in this.parent.close(false) of split, the split region cannot write or read


 [ 
https://issues.apache.org/jira/browse/HBASE-4563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bluedavy updated HBASE-4563:


Attachment: HBASE-4563-trunk.patch
HBASE-4563-0.92.patch
HBASE-4563-0.90.patch

 When error occurs in this.parent.close(false) of split, the split region 
 cannot write or read
 -

 Key: HBASE-4563
 URL: https://issues.apache.org/jira/browse/HBASE-4563
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.4, 0.92.0
Reporter: bluedavy
Assignee: bluedavy
Priority: Blocker
 Fix For: 0.90.5

 Attachments: HBASE-4563-0.90.patch, HBASE-4563-0.92.patch, 
 HBASE-4563-trunk.patch, test-4563-0.90.txt, test-4563-0.92.txt, 
 test-4563-trunk.txt


 Follow the steps below to reproduce the problem:
 1. Change SplitTransaction.java as below, to mock an HDFS error.
{code:title=SplitTransaction.java|borderStyle=solid}
   List<StoreFile> hstoreFilesToSplit = this.parent.close(false);
   // Injected failure: thrown right after the parent region has been closed
   throw new IOException("some unexpected error in close store files");
{code} 
 2. Update the regionserver code and restart.
 3. Create a table and put some data into it.
 4. Split the table.
 5. Scan the table; the scan will fail.
 The attached patch fixes the bug.


[jira] [Updated] (HBASE-4562) When split doing offlineParentInMeta encounters error, it'll cause data loss


 [ 
https://issues.apache.org/jira/browse/HBASE-4562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bluedavy updated HBASE-4562:


Attachment: (was: HBASE-4562-trunk.patch)

 When split doing offlineParentInMeta encounters error, it'll cause data loss
 

 Key: HBASE-4562
 URL: https://issues.apache.org/jira/browse/HBASE-4562
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.4
Reporter: bluedavy
Priority: Blocker
 Fix For: 0.90.5

 Attachments: HBASE-4562-0.90.patch, HBASE-4562-0.92.patch, 
 HBASE-4562-trunk.patch, test-4562-0.90.txt, test-4562-0.92.txt, 
 test-4562-trunk.txt


 Follow the steps below to reproduce the problem:
 1. Change SplitTransaction.java as below, to mock a timeout error.
{code:title=SplitTransaction.java|borderStyle=solid}
   if (!testing) {
     MetaEditor.offlineParentInMeta(server.getCatalogTracker(),
       this.parent.getRegionInfo(), a.getRegionInfo(), b.getRegionInfo());
     // Injected failure: thrown after the call that offlines the parent in .META.
     throw new IOException("some unexpected error in split");
   }
{code} 
 2. Update the regionserver code and restart.
 3. Create a table and put some data into it.
 4. Split the table.
 5. Kill the regionserver hosting the table.
 6. Wait until the master's ServerShutdownHandler.process has executed, then 
 scan the table; you'll find that the data written earlier is lost.
 The attached patch fixes the bug.


[jira] [Updated] (HBASE-4562) When split doing offlineParentInMeta encounters error, it'll cause data loss


 [ 
https://issues.apache.org/jira/browse/HBASE-4562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bluedavy updated HBASE-4562:


Attachment: HBASE-4562-trunk.patch
HBASE-4562-0.92.patch
HBASE-4562-0.90.patch

 When split doing offlineParentInMeta encounters error, it'll cause data loss
 

 Key: HBASE-4562
 URL: https://issues.apache.org/jira/browse/HBASE-4562
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.4
Reporter: bluedavy
Priority: Blocker
 Fix For: 0.90.5

 Attachments: HBASE-4562-0.90.patch, HBASE-4562-0.92.patch, 
 HBASE-4562-trunk.patch, test-4562-0.90.txt, test-4562-0.92.txt, 
 test-4562-trunk.txt


 Follow the steps below to reproduce the problem:
 1. Change SplitTransaction.java as below, to mock a timeout error.
{code:title=SplitTransaction.java|borderStyle=solid}
   if (!testing) {
     MetaEditor.offlineParentInMeta(server.getCatalogTracker(),
       this.parent.getRegionInfo(), a.getRegionInfo(), b.getRegionInfo());
     // Injected failure: thrown after the call that offlines the parent in .META.
     throw new IOException("some unexpected error in split");
   }
{code} 
 2. Update the regionserver code and restart.
 3. Create a table and put some data into it.
 4. Split the table.
 5. Kill the regionserver hosting the table.
 6. Wait until the master's ServerShutdownHandler.process has executed, then 
 scan the table; you'll find that the data written earlier is lost.
 The attached patch fixes the bug.


[jira] [Commented] (HBASE-4563) When error occurs in this.parent.close(false) of split, the split region cannot write or read

2011-10-16 Thread bluedavy (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13128575#comment-13128575
 ] 

bluedavy commented on HBASE-4563:
-

I fixed the formatting.

 When error occurs in this.parent.close(false) of split, the split region 
 cannot write or read
 -

 Key: HBASE-4563
 URL: https://issues.apache.org/jira/browse/HBASE-4563
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.4, 0.92.0
Reporter: bluedavy
Assignee: bluedavy
Priority: Blocker
 Fix For: 0.90.5

 Attachments: HBASE-4563-0.90.patch, HBASE-4563-0.92.patch, 
 HBASE-4563-trunk.patch, test-4563-0.90.txt, test-4563-0.92.txt, 
 test-4563-trunk.txt


 Follow the steps below to reproduce the problem:
 1. Change SplitTransaction.java as below, to mock an HDFS error.
{code:title=SplitTransaction.java|borderStyle=solid}
   List<StoreFile> hstoreFilesToSplit = this.parent.close(false);
   // Injected failure: thrown right after the parent region has been closed
   throw new IOException("some unexpected error in close store files");
{code} 
 2. Update the regionserver code and restart.
 3. Create a table and put some data into it.
 4. Split the table.
 5. Scan the table; the scan will fail.
 The attached patch fixes the bug.


[jira] [Commented] (HBASE-4562) When split doing offlineParentInMeta encounters error, it'll cause data loss

2011-10-16 Thread bluedavy (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13128574#comment-13128574
 ] 

bluedavy commented on HBASE-4562:
-

I fixed the comments to keep them consistent across all patches.

 When split doing offlineParentInMeta encounters error, it'll cause data loss
 

 Key: HBASE-4562
 URL: https://issues.apache.org/jira/browse/HBASE-4562
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.4
Reporter: bluedavy
Priority: Blocker
 Fix For: 0.90.5

 Attachments: HBASE-4562-0.90.patch, HBASE-4562-0.92.patch, 
 HBASE-4562-trunk.patch, test-4562-0.90.txt, test-4562-0.92.txt, 
 test-4562-trunk.txt


 Follow the steps below to reproduce the problem:
 1. Change SplitTransaction.java as below, to mock a timeout error.
{code:title=SplitTransaction.java|borderStyle=solid}
   if (!testing) {
     MetaEditor.offlineParentInMeta(server.getCatalogTracker(),
       this.parent.getRegionInfo(), a.getRegionInfo(), b.getRegionInfo());
     // Injected failure: thrown after the call that offlines the parent in .META.
     throw new IOException("some unexpected error in split");
   }
{code} 
 2. Update the regionserver code and restart.
 3. Create a table and put some data into it.
 4. Split the table.
 5. Kill the regionserver hosting the table.
 6. Wait until the master's ServerShutdownHandler.process has executed, then 
 scan the table; you'll find that the data written earlier is lost.
 The attached patch fixes the bug.


[jira] [Commented] (HBASE-4562) When split doing offlineParentInMeta encounters error, it'll cause data loss


[ 
https://issues.apache.org/jira/browse/HBASE-4562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13128602#comment-13128602
 ] 

Lars Hofhansl commented on HBASE-4562:
--

+1 for latest patches (assuming all tests pass)

 When split doing offlineParentInMeta encounters error, it'll cause data loss
 

 Key: HBASE-4562
 URL: https://issues.apache.org/jira/browse/HBASE-4562
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.4
Reporter: bluedavy
Priority: Blocker
 Fix For: 0.90.5

 Attachments: HBASE-4562-0.90.patch, HBASE-4562-0.92.patch, 
 HBASE-4562-trunk.patch, test-4562-0.90.txt, test-4562-0.92.txt, 
 test-4562-trunk.txt


 Follow the steps below to reproduce the problem:
 1. Change SplitTransaction.java as below, to mock a timeout error.
{code:title=SplitTransaction.java|borderStyle=solid}
   if (!testing) {
     MetaEditor.offlineParentInMeta(server.getCatalogTracker(),
       this.parent.getRegionInfo(), a.getRegionInfo(), b.getRegionInfo());
     // Injected failure: thrown after the call that offlines the parent in .META.
     throw new IOException("some unexpected error in split");
   }
{code} 
 2. Update the regionserver code and restart.
 3. Create a table and put some data into it.
 4. Split the table.
 5. Kill the regionserver hosting the table.
 6. Wait until the master's ServerShutdownHandler.process has executed, then 
 scan the table; you'll find that the data written earlier is lost.
 The attached patch fixes the bug.


[jira] [Updated] (HBASE-4511) There is data loss when master failovers


 [ 
https://issues.apache.org/jira/browse/HBASE-4511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4511:
--

Fix Version/s: (was: 0.92.0)
   0.94.0

 There is data loss when master failovers
 

 Key: HBASE-4511
 URL: https://issues.apache.org/jira/browse/HBASE-4511
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.92.0
Reporter: gaojinchao
Priority: Minor
 Fix For: 0.94.0

 Attachments: 
 org.apache.hadoop.hbase.master.TestMasterFailover-output.rar


 It goes like this:
 The master crashes at the same time as the RS hosting .META. is crashing, but 
 the RS does not exit.
 The master starts up again and finds all living RSs.
 The master's verification of .META. fails because this RS is crashing.
 The master reassigns .META. but does not split the HLog,
 so some meta data is lost.
 Logs from a failing run of the failover test case:
 //The test reports that it wants to kill an RS
 2011-09-28 19:54:45,694 INFO  [Thread-988] regionserver.HRegionServer(1443): 
 STOPPED: Killing for unit test
 2011-09-28 19:54:45,694 INFO  [Thread-988] master.TestMasterFailover(1007): 
 RS 192.168.2.102,54385,1317264874629 killed 
 //The RS did not actually crash.
 2011-09-28 19:54:51,763 INFO  [Master:0;192.168.2.102,54557,1317264885720] 
 master.HMaster(458): Registering server found up in zk: 
 192.168.2.102,54385,1317264874629
 2011-09-28 19:54:51,763 INFO  [Master:0;192.168.2.102,54557,1317264885720] 
 master.ServerManager(232): Registering 
 server=192.168.2.102,54385,1317264874629
 2011-09-28 19:54:51,770 DEBUG [Master:0;192.168.2.102,54557,1317264885720] 
 zookeeper.ZKUtil(491): master:54557-0x132b31adbb30005 Unable to get data of 
 znode /hbase/unassigned/1028785192 because node does not exist (not an error)
 2011-09-28 19:54:51,771 DEBUG [Master:0;192.168.2.102,54557,1317264885720] 
 zookeeper.ZKUtil(1003): master:54557-0x132b31adbb30005 Retrieved 33 byte(s) 
 of data from znode /hbase/root-region-server and set watcher; 
 192.168.2.102,54383,131726487...
 //Meta verification failed and the master reassigned .META., so all the 
 regions in .META. are lost.
 2011-09-28 19:54:51,773 INFO  [Master:0;192.168.2.102,54557,1317264885720] 
 catalog.CatalogTracker(476): Failed verification of .META.,,1 at 
 address=192.168.2.102,54385,1317264874629; 
 org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: 
 org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: Server 
 192.168.2.102,54385,1317264874629 not running, aborting
 2011-09-28 19:54:51,773 DEBUG [Master:0;192.168.2.102,54557,1317264885720] 
 catalog.CatalogTracker(316): new .META. server: 
 192.168.2.102,54385,1317264874629 isn't valid. Cached .META. server: null
 2011-09-28 19:54:52,274 DEBUG [Master:0;192.168.2.102,54557,1317264885720] 
 zookeeper.ZKUtil(1003): master:54557-0x132b31adbb30005 Retrieved 33 byte(s) 
 of data from znode /hbase/root-region-server and set watcher; 
 192.168.2.102,54383,131726487...
 2011-09-28 19:54:52,277 INFO  [Master:0;192.168.2.102,54557,1317264885720] 
 catalog.CatalogTracker(476): Failed verification of .META.,,1 at 
 address=192.168.2.102,54385,1317264874629; 
 org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: 
 org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: Server 
 192.168.2.102,54385,1317264874629 not running, aborting
 2011-09-28 19:54:52,277 DEBUG [Master:0;192.168.2.102,54557,1317264885720] 
 catalog.CatalogTracker(316): new .META. server: 
 192.168.2.102,54385,1317264874629 isn't valid. Cached .META. server: null
 2011-09-28 19:54:52,778 DEBUG [Master:0;192.168.2.102,54557,1317264885720] 
 zookeeper.ZKUtil(1003): master:54557-0x132b31adbb30005 Retrieved 33 byte(s) 
 of data from znode /hbase/root-region-server and set watcher; 
 192.168.2.102,54383,131726487...
 2011-09-28 19:54:52,782 INFO  [Master:0;192.168.2.102,54557,1317264885720] 
 catalog.CatalogTracker(476): Failed verification of .META.,,1 at 
 address=192.168.2.102,54385,1317264874629; 
 org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: 
 org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: Server 
 192.168.2.102,54385,1317264874629 not running, aborting
 2011-09-28 19:54:52,782 DEBUG [Master:0;192.168.2.102,54557,1317264885720] 
 catalog.CatalogTracker(316): new .META. server: 
 192.168.2.102,54385,1317264874629 isn't valid. Cached .META. server: null
 2011-09-28 19:54:52,782 DEBUG [Master:0;192.168.2.102,54557,1317264885720] 
 zookeeper.ZKAssign(264): master:54557-0x132b31adbb30005 Creating (or 
 updating) unassigned node for 1028785192 with OFFLINE state
 2011-09-28 19:54:52,825 DEBUG [Thread-988-EventThread] 
 zookeeper.ZooKeeperWatcher(233): master:54557-0x132b31adbb30005 Received 
 ZooKeeper Event, type=NodeCreated, state=SyncConnected,

[jira] [Assigned] (HBASE-4562) When split doing offlineParentInMeta encounters error, it'll cause data loss

2011-10-16 Thread Ted Yu (Assigned) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu reassigned HBASE-4562:
-

Assignee: bluedavy

 When split doing offlineParentInMeta encounters error, it'll cause data loss
 

 Key: HBASE-4562
 URL: https://issues.apache.org/jira/browse/HBASE-4562
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.4
Reporter: bluedavy
Assignee: bluedavy
Priority: Blocker
 Fix For: 0.90.5

 Attachments: HBASE-4562-0.90.patch, HBASE-4562-0.92.patch, 
 HBASE-4562-trunk.patch, test-4562-0.90.txt, test-4562-0.92.txt, 
 test-4562-trunk.txt


 Follow the steps below to reproduce the problem:
 1. Change SplitTransaction.java as below, to mock a timeout error.
{code:title=SplitTransaction.java|borderStyle=solid}
   if (!testing) {
     MetaEditor.offlineParentInMeta(server.getCatalogTracker(),
       this.parent.getRegionInfo(), a.getRegionInfo(), b.getRegionInfo());
     // Injected fault to simulate a timeout error
     throw new IOException("some unexpected error in split");
   }
{code} 
 2. Update the regionserver code and restart.
 3. Create a table and put some data into it.
 4. Split the table.
 5. Kill the regionserver hosting the table.
 6. Wait until the master's ServerShutdownHandler.process has executed, then 
 scan the table; the data written earlier is lost.
 The attached patch fixes the bug.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Resolved] (HBASE-4563) When error occurs in this.parent.close(false) of split, the split region cannot write or read

2011-10-16 Thread bluedavy (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bluedavy resolved HBASE-4563.
-

Resolution: Fixed

 When error occurs in this.parent.close(false) of split, the split region 
 cannot write or read
 -

 Key: HBASE-4563
 URL: https://issues.apache.org/jira/browse/HBASE-4563
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.4, 0.92.0
Reporter: bluedavy
Assignee: bluedavy
Priority: Blocker
 Fix For: 0.90.5

 Attachments: HBASE-4563-0.90.patch, HBASE-4563-0.92.patch, 
 HBASE-4563-trunk.patch, test-4563-0.90.txt, test-4563-0.92.txt, 
 test-4563-trunk.txt


 Follow the steps below to reproduce the problem:
 1. Change SplitTransaction.java as below, to mock an HDFS error.
{code:title=SplitTransaction.java|borderStyle=solid}
   List<StoreFile> hstoreFilesToSplit = this.parent.close(false);
   // Injected fault to simulate an HDFS error while closing the store files
   throw new IOException("some unexpected error in close store files");
{code} 
 2. Update the regionserver code and restart.
 3. Create a table and put some data into it.
 4. Split the table.
 5. Scan the table; the scan fails.
 The attached patch fixes the bug.


[jira] [Work started] (HBASE-4562) When split doing offlineParentInMeta encounters error, it'll cause data loss

2011-10-16 Thread bluedavy (Work started) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-4562 started by bluedavy.

 When split doing offlineParentInMeta encounters error, it'll cause data loss
 

 Key: HBASE-4562
 URL: https://issues.apache.org/jira/browse/HBASE-4562
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.4
Reporter: bluedavy
Assignee: bluedavy
Priority: Blocker
 Fix For: 0.90.5

 Attachments: HBASE-4562-0.90.patch, HBASE-4562-0.92.patch, 
 HBASE-4562-trunk.patch, test-4562-0.90.txt, test-4562-0.92.txt, 
 test-4562-trunk.txt


 Follow the steps below to reproduce the problem:
 1. Change SplitTransaction.java as below, to mock a timeout error.
{code:title=SplitTransaction.java|borderStyle=solid}
   if (!testing) {
     MetaEditor.offlineParentInMeta(server.getCatalogTracker(),
       this.parent.getRegionInfo(), a.getRegionInfo(), b.getRegionInfo());
     // Injected fault to simulate a timeout error
     throw new IOException("some unexpected error in split");
   }
{code} 
 2. Update the regionserver code and restart.
 3. Create a table and put some data into it.
 4. Split the table.
 5. Kill the regionserver hosting the table.
 6. Wait until the master's ServerShutdownHandler.process has executed, then 
 scan the table; the data written earlier is lost.
 The attached patch fixes the bug.


[jira] [Resolved] (HBASE-4562) When split doing offlineParentInMeta encounters error, it'll cause data loss

2011-10-16 Thread bluedavy (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bluedavy resolved HBASE-4562.
-

Resolution: Fixed

 When split doing offlineParentInMeta encounters error, it'll cause data loss
 

 Key: HBASE-4562
 URL: https://issues.apache.org/jira/browse/HBASE-4562
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.4
Reporter: bluedavy
Assignee: bluedavy
Priority: Blocker
 Fix For: 0.90.5

 Attachments: HBASE-4562-0.90.patch, HBASE-4562-0.92.patch, 
 HBASE-4562-trunk.patch, test-4562-0.90.txt, test-4562-0.92.txt, 
 test-4562-trunk.txt


 Follow the steps below to reproduce the problem:
 1. Change SplitTransaction.java as below, to mock a timeout error.
{code:title=SplitTransaction.java|borderStyle=solid}
   if (!testing) {
     MetaEditor.offlineParentInMeta(server.getCatalogTracker(),
       this.parent.getRegionInfo(), a.getRegionInfo(), b.getRegionInfo());
     // Injected fault to simulate a timeout error
     throw new IOException("some unexpected error in split");
   }
{code} 
 2. Update the regionserver code and restart.
 3. Create a table and put some data into it.
 4. Split the table.
 5. Kill the regionserver hosting the table.
 6. Wait until the master's ServerShutdownHandler.process has executed, then 
 scan the table; the data written earlier is lost.
 The attached patch fixes the bug.


[jira] [Reopened] (HBASE-4562) When split doing offlineParentInMeta encounters error, it'll cause data loss

2011-10-16 Thread bluedavy (Reopened) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bluedavy reopened HBASE-4562:
-


Waiting for a committer to commit this to svn.

 When split doing offlineParentInMeta encounters error, it'll cause data loss
 

 Key: HBASE-4562
 URL: https://issues.apache.org/jira/browse/HBASE-4562
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.4
Reporter: bluedavy
Assignee: bluedavy
Priority: Blocker
 Fix For: 0.90.5

 Attachments: HBASE-4562-0.90.patch, HBASE-4562-0.92.patch, 
 HBASE-4562-trunk.patch, test-4562-0.90.txt, test-4562-0.92.txt, 
 test-4562-trunk.txt


 Follow the steps below to reproduce the problem:
 1. Change SplitTransaction.java as below, to mock a timeout error.
{code:title=SplitTransaction.java|borderStyle=solid}
   if (!testing) {
     MetaEditor.offlineParentInMeta(server.getCatalogTracker(),
       this.parent.getRegionInfo(), a.getRegionInfo(), b.getRegionInfo());
     // Injected fault to simulate a timeout error
     throw new IOException("some unexpected error in split");
   }
{code} 
 2. Update the regionserver code and restart.
 3. Create a table and put some data into it.
 4. Split the table.
 5. Kill the regionserver hosting the table.
 6. Wait until the master's ServerShutdownHandler.process has executed, then 
 scan the table; the data written earlier is lost.
 The attached patch fixes the bug.


[jira] [Reopened] (HBASE-4563) When error occurs in this.parent.close(false) of split, the split region cannot write or read

2011-10-16 Thread bluedavy (Reopened) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bluedavy reopened HBASE-4563:
-


Waiting for a committer to commit this to svn.

 When error occurs in this.parent.close(false) of split, the split region 
 cannot write or read
 -

 Key: HBASE-4563
 URL: https://issues.apache.org/jira/browse/HBASE-4563
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.4, 0.92.0
Reporter: bluedavy
Assignee: bluedavy
Priority: Blocker
 Fix For: 0.90.5

 Attachments: HBASE-4563-0.90.patch, HBASE-4563-0.92.patch, 
 HBASE-4563-trunk.patch, test-4563-0.90.txt, test-4563-0.92.txt, 
 test-4563-trunk.txt


 Follow the steps below to reproduce the problem:
 1. Change SplitTransaction.java as below, to mock an HDFS error.
{code:title=SplitTransaction.java|borderStyle=solid}
   List<StoreFile> hstoreFilesToSplit = this.parent.close(false);
   // Injected fault to simulate an HDFS error while closing the store files
   throw new IOException("some unexpected error in close store files");
{code} 
 2. Update the regionserver code and restart.
 3. Create a table and put some data into it.
 4. Split the table.
 5. Scan the table; the scan fails.
 The attached patch fixes the bug.


[jira] [Updated] (HBASE-4588) The floating point arithmetic to validate memory allocation configurations need to be done as integers

2011-10-16 Thread dhruba borthakur (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HBASE-4588:


Attachment: configVerify2.txt

Addressed Ted's review comments.

 The floating point arithmetic to validate memory allocation configurations 
 need to be done as integers
 --

 Key: HBASE-4588
 URL: https://issues.apache.org/jira/browse/HBASE-4588
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Jonathan Gray
Assignee: dhruba borthakur
Priority: Minor
 Fix For: 0.92.0

 Attachments: configVerify1.txt, configVerify2.txt


 The floating point arithmetic to validate memory allocation configurations 
 needs to be done as integers.
 On our cluster, we had block cache = 0.6 and memstore = 0.2.  It was saying 
 this was > 0.8 when it is actually equal.
 Minor bug but annoying nonetheless.
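
A minimal sketch of the problem and of an integer-based check, assuming the two settings are read as floats and the limit is 0.8. The class, method, and constant names are hypothetical and are not taken from the attached patch.

```java
// Hypothetical sketch: names are illustrative, not from the HBase patch.
final class MemoryConfigCheck {
    // Maximum combined heap share, as a fraction and as a whole percentage.
    static final double MAX_FRACTION = 0.8;
    static final int MAX_PERCENT = 80;

    // Naive check: the float sum is widened to double for the comparison,
    // and the float nearest to 0.8 is slightly larger than the double 0.8.
    static boolean exceedsLimitAsFloat(float blockCache, float memstore) {
        return (blockCache + memstore) > MAX_FRACTION;
    }

    // Integer check: convert each fraction to a whole percentage first,
    // so that 60 + 20 is compared exactly against 80.
    static boolean exceedsLimitAsPercent(float blockCache, float memstore) {
        int blockCachePct = Math.round(blockCache * 100f);
        int memstorePct = Math.round(memstore * 100f);
        return blockCachePct + memstorePct > MAX_PERCENT;
    }

    public static void main(String[] args) {
        System.out.println(exceedsLimitAsFloat(0.6f, 0.2f));
        System.out.println(exceedsLimitAsPercent(0.6f, 0.2f));
    }
}
```

With block cache = 0.6 and memstore = 0.2, the float comparison reports that the limit is exceeded, while the percentage comparison does not.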


[jira] [Updated] (HBASE-4588) The floating point arithmetic to validate memory allocation configurations need to be done as integers

2011-10-16 Thread dhruba borthakur (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HBASE-4588:


Attachment: configVerify2.txt

Attaching the appropriate patch file with review comments fixes.

 The floating point arithmetic to validate memory allocation configurations 
 need to be done as integers
 --

 Key: HBASE-4588
 URL: https://issues.apache.org/jira/browse/HBASE-4588
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Jonathan Gray
Assignee: dhruba borthakur
Priority: Minor
 Fix For: 0.92.0

 Attachments: configVerify1.txt, configVerify2.txt, configVerify2.txt


 The floating point arithmetic to validate memory allocation configurations 
 needs to be done as integers.
 On our cluster, we had block cache = 0.6 and memstore = 0.2.  It was saying 
 this was > 0.8 when it is actually equal.
 Minor bug but annoying nonetheless.


[jira] [Commented] (HBASE-4588) The floating point arithmetic to validate memory allocation configurations need to be done as integers


[ 
https://issues.apache.org/jira/browse/HBASE-4588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13128637#comment-13128637
 ] 

Ted Yu commented on HBASE-4588:
---

+1 on patch v2.

 The floating point arithmetic to validate memory allocation configurations 
 need to be done as integers
 --

 Key: HBASE-4588
 URL: https://issues.apache.org/jira/browse/HBASE-4588
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Jonathan Gray
Assignee: dhruba borthakur
Priority: Minor
 Fix For: 0.92.0

 Attachments: configVerify1.txt, configVerify2.txt, configVerify2.txt


 The floating point arithmetic to validate memory allocation configurations 
 needs to be done as integers.
 On our cluster, we had block cache = 0.6 and memstore = 0.2.  It was saying 
 this was > 0.8 when it is actually equal.
 Minor bug but annoying nonetheless.


[jira] [Updated] (HBASE-4588) The floating point arithmetic to validate memory allocation configurations need to be done as integers

2011-10-16 Thread jirapos...@reviews.apache.org (Commented) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4588:
--

Attachment: (was: configVerify2.txt)

 The floating point arithmetic to validate memory allocation configurations 
 need to be done as integers
 --

 Key: HBASE-4588
 URL: https://issues.apache.org/jira/browse/HBASE-4588
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Jonathan Gray
Assignee: dhruba borthakur
Priority: Minor
 Fix For: 0.92.0

 Attachments: configVerify1.txt, configVerify2.txt


 The floating point arithmetic to validate memory allocation configurations 
 needs to be done as integers.
 On our cluster, we had block cache = 0.6 and memstore = 0.2.  It was saying 
 this was > 0.8 when it is actually equal.
 Minor bug but annoying nonetheless.


[jira] [Commented] (HBASE-4528) The put operation can release the rowlock before sync-ing the Hlog


[ 
https://issues.apache.org/jira/browse/HBASE-4528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13128650#comment-13128650
 ] 

jirapos...@reviews.apache.org commented on HBASE-4528:
--



bq.  On 2011-10-15 11:55:54, Ted Yu wrote:
bq.   We're closer.
bq.   Thanks for the perseverance, Dhruba.

I will post another version of this patch with some typos corrected.


bq.  On 2011-10-15 11:55:54, Ted Yu wrote:
bq.   /src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 1884
bq.   https://reviews.apache.org/r/2141/diff/5/?file=50446#file50446line1884
bq.  
bq.   We know that w != null here, so w.getWriteNumber() should be passed to rollbackMemstore().

There is no need to explicitly pass in w.getWriteNumber(). All the keys that 
are hanging off the familyMaps variable have their memstoreTS set 
appropriately.  These were set in the call to applyFamilyMapToMemstore(). This 
memstoreTS will be used in the rollback methods to ensure that only keys in the 
memstore that also have a matching memstoreTS value are removed.
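
A minimal sketch of the rollback idea described above, assuming each entry is stamped with its write number when it is applied. The classes below are hypothetical, simplified stand-ins, not the HBase KeyValue or MemStore classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical, simplified stand-ins; not the HBase KeyValue or MemStore classes.
final class RollbackSketch {
    static final class Entry implements Comparable<Entry> {
        final String key;
        final long memstoreTS; // write number stamped when the entry was applied

        Entry(String key, long memstoreTS) {
            this.key = key;
            this.memstoreTS = memstoreTS;
        }

        @Override
        public int compareTo(Entry o) {
            int c = key.compareTo(o.key);
            return c != 0 ? c : Long.compare(memstoreTS, o.memstoreTS);
        }
    }

    final ConcurrentSkipListSet<Entry> memstore = new ConcurrentSkipListSet<>();

    // Apply: stamp every pending key with the write number and insert it.
    List<Entry> apply(List<String> keys, long writeNumber) {
        List<Entry> applied = new ArrayList<>();
        for (String k : keys) {
            Entry e = new Entry(k, writeNumber);
            memstore.add(e);
            applied.add(e);
        }
        return applied;
    }

    // Rollback: no write number argument is needed; each entry carries its
    // own stamp, so only the matching (key, memstoreTS) pair is removed.
    void rollback(List<Entry> applied) {
        for (Entry e : applied) {
            memstore.remove(e);
        }
    }
}
```

An older entry for the same key with a different memstoreTS survives the rollback, which is the property the reply relies on.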


bq.  On 2011-10-15 11:55:54, Ted Yu wrote:
bq.   /src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 2195
bq.   https://reviews.apache.org/r/2141/diff/5/?file=50446#file50446line2195
bq.  
bq.   We should have memstoreTS parameter here.

No need to pass in memstoreTS. The kvs hanging off the parameter 'familyMaps' 
already have the memstoreTS that was used to insert these keys in the memstore.


bq.  On 2011-10-15 11:55:54, Ted Yu wrote:
bq.   /src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 2228
bq.   https://reviews.apache.org/r/2141/diff/5/?file=50446#file50446line2228
bq.  
bq.   I think this should be in a finally block corresponding to the try at line 2205.

I do not think a finally block is needed. If the getlock itself threw an 
exception, then there is no reason to do a releaseLock. Nothing else in this 
code section can throw an exception.


- Dhruba


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2141/#review2611
---


On 2011-10-15 07:32:28, Dhruba Borthakur wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2141/
bq.  ---
bq.  
bq.  (Updated 2011-10-15 07:32:28)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  This changes the multiPut operation so that the sync to the WAL occurs outside the rowlock.
bq.  
bq.  This enhancement is done only to HRegion.mut(Put[]) because this is the only method that gets invoked from an application. HRegion.put(Put) is used only by unit tests and should possibly be deprecated.
bq.  
bq.  
bq.  This addresses bug HBASE-4528.
bq.  https://issues.apache.org/jira/browse/HBASE-4528
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.    /src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1183585 
bq.    /src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueSkipListSet.java 1183585 
bq.    /src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java 1183585 
bq.    /src/main/java/org/apache/hadoop/hbase/regionserver/ReadWriteConsistencyControl.java 1183585 
bq.    /src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1183585 
bq.    /src/main/java/org/apache/hadoop/hbase/regionserver/StoreFlusher.java 1183585 
bq.    /src/test/java/org/apache/hadoop/hbase/regionserver/TestParallelPut.java PRE-CREATION 
bq.    /src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java 1183585 
bq.    /src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRolling.java 1183585 
bq.  
bq.  Diff: https://reviews.apache.org/r/2141/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  I ran TestLogRolling over and over again, about 50 times, and it did not fail a single time.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Dhruba
bq.  
bq.



 The put operation can release the rowlock before sync-ing the Hlog
 --

 Key: HBASE-4528
 URL: https://issues.apache.org/jira/browse/HBASE-4528
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Fix For: 0.94.0

 Attachments: appendNoSync5.txt, appendNoSyncPut1.txt, 
 appendNoSyncPut2.txt, appendNoSyncPut3.txt, appendNoSyncPut4.txt


 This allows for better throughput when there are hot rows. A single row 
 update improves from 100 puts/sec/server to 5000 puts/sec/server.


[jira] [Updated] (HBASE-4528) The put operation can release the rowlock before sync-ing the Hlog

2011-10-16 Thread dhruba borthakur (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HBASE-4528:


Attachment: appendNoSyncPut5.txt

Fixed typos.

Performance numbers measured on hbase-92 with a variant of hdfs 0.20:


vanilla hdfs: 1200 puts/sec (no patch),
  5000 puts/sec (with patch)
synconsync hdfs : 80 puts/sec (no patch)

The synconsync version of hdfs is an internal version of hdfs that makes the 
datanode issue a sync() on the corresponding ext3 block file for every 
invocation of DFSClient.sync(). This ensures that an hbase transaction is 
really, really on disk before the put rpc returns to the client.


 The put operation can release the rowlock before sync-ing the Hlog
 --

 Key: HBASE-4528
 URL: https://issues.apache.org/jira/browse/HBASE-4528
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Fix For: 0.94.0

 Attachments: appendNoSync5.txt, appendNoSyncPut1.txt, 
 appendNoSyncPut2.txt, appendNoSyncPut3.txt, appendNoSyncPut4.txt, 
 appendNoSyncPut5.txt


 This allows for better throughput when there are hot rows. A single row 
 update improves from 100 puts/sec/server to 5000 puts/sec/server.
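
A minimal sketch of the locking order this issue describes: the edit is appended and applied under the row lock, and the expensive sync runs after the lock is released. The Wal class and all names are hypothetical stand-ins, not the HBase HLog API. A real implementation would also have to handle a failed sync, for example by rolling back the memstore as discussed in the review.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch; the Wal class is a stand-in, not the HBase HLog API.
final class PutSketch {
    static final class Wal {
        final List<String> appended = new ArrayList<>();
        int synced = 0; // number of appended edits known to be durable

        synchronized void append(String edit) {
            appended.add(edit);
        }

        // Stands in for the slow, durable flush of everything appended so far.
        synchronized void sync() {
            synced = appended.size();
        }
    }

    final ReentrantLock rowLock = new ReentrantLock();
    final Wal wal = new Wal();
    final List<String> memstore = new ArrayList<>();

    void put(String edit) {
        rowLock.lock();
        try {
            wal.append(edit);   // cheap: buffer the edit in the log
            memstore.add(edit); // apply to the in-memory store
        } finally {
            rowLock.unlock();   // released before the expensive sync
        }
        wal.sync();             // slow: runs without holding the row lock
    }
}
```

Because the lock is held only for the cheap steps, other writers to the same hot row can append while an earlier sync is still in flight.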


[jira] [Commented] (HBASE-4528) The put operation can release the rowlock before sync-ing the Hlog

2011-10-16 Thread jirapos...@reviews.apache.org (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13128655#comment-13128655
 ] 

jirapos...@reviews.apache.org commented on HBASE-4528:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2141/
---

(Updated 2011-10-17 04:39:55.174101)


Review request for hbase.


Changes
---

Fixed typos.


Summary
---

This changes the multiPut operation so that the sync to the WAL occurs outside 
the rowlock.

This enhancement is done only to HRegion.mut(Put[]) because this is the only 
method that gets invoked from an application. HRegion.put(Put) is used only 
by unit tests and should possibly be deprecated.


This addresses bug HBASE-4528.
https://issues.apache.org/jira/browse/HBASE-4528


Diffs (updated)
-

  /src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1184991 
  /src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueSkipListSet.java 1184991 
  /src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java 1184991 
  /src/main/java/org/apache/hadoop/hbase/regionserver/ReadWriteConsistencyControl.java 1184991 
  /src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1184991 
  /src/main/java/org/apache/hadoop/hbase/regionserver/StoreFlusher.java 1184991 
  /src/test/java/org/apache/hadoop/hbase/regionserver/TestParallelPut.java PRE-CREATION 
  /src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java 1184991 
  /src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRolling.java 1184991 

Diff: https://reviews.apache.org/r/2141/diff


Testing
---

I ran TestLogRolling over and over again, about 50 times, and it did not fail 
a single time.


Thanks,

Dhruba



 The put operation can release the rowlock before sync-ing the Hlog
 --

 Key: HBASE-4528
 URL: https://issues.apache.org/jira/browse/HBASE-4528
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Fix For: 0.94.0

 Attachments: appendNoSync5.txt, appendNoSyncPut1.txt, 
 appendNoSyncPut2.txt, appendNoSyncPut3.txt, appendNoSyncPut4.txt, 
 appendNoSyncPut5.txt


 This allows for better throughput when there are hot rows. A single row 
 update improves from 100 puts/sec/server to 5000 puts/sec/server.


[jira] [Commented] (HBASE-2856) TestAcidGuarantee broken on trunk


[ 
https://issues.apache.org/jira/browse/HBASE-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13128657#comment-13128657
 ] 

Ted Yu commented on HBASE-2856:
---

For patch v7, boolean ignoreCount is added to checkColumn(). I think javadoc 
for this new parameter should be added to ColumnTracker.java
Javadoc for long readPointToUse of ScanQueryMatcher ctor should be added.
Javadoc for boolean useRWCC of StoreFileScanner ctor and 
getScannersForStoreFiles() should be added.
There is duplicate code in StoreFileScanner.next(): lines 164 to 172.

 TestAcidGuarantee broken on trunk 
 --

 Key: HBASE-2856
 URL: https://issues.apache.org/jira/browse/HBASE-2856
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.89.20100621
Reporter: ryan rawson
Assignee: Amitanand Aiyer
Priority: Blocker
 Fix For: 0.94.0

 Attachments: 2856-v2.txt, 2856-v3.txt, 2856-v4.txt, 2856-v5.txt, 
 acid.txt


 TestAcidGuarantee has a test whereby it attempts to read a number of columns 
 from a row, and every so often the first column of N is different, when it 
 should be the same.  This is a bug deep inside the scanner whereby the first 
 peek() of a row is done at time T then the rest of the read is done at T+1 
 after a flush, thus the memstoreTS data is lost, and previously 'uncommitted' 
 data becomes committed and flushed to disk.
 One possible solution is to introduce the memstoreTS (or similarly equivalent 
 value) to the HFile thus allowing us to preserve read consistency past 
 flushes.  Another solution involves fixing the scanners so that peek() is not 
 destructive (and thus might return different things at different times alas).
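
A minimal sketch of the first proposed solution, assuming the memstoreTS is kept with each cell after a flush so that a reader can keep filtering against the read point it fixed when the scan opened. The Cell class and method names are hypothetical, not HBase classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; names are illustrative, not HBase classes.
final class ReadPointSketch {
    static final class Cell {
        final String column;
        final String value;
        final long memstoreTS; // write number preserved with the cell

        Cell(String column, String value, long memstoreTS) {
            this.column = column;
            this.value = value;
            this.memstoreTS = memstoreTS;
        }
    }

    // Returns the cells visible to a reader whose read point was fixed
    // when the scan opened; later writes are skipped even after a flush.
    static List<Cell> visible(List<Cell> stored, long readPoint) {
        List<Cell> out = new ArrayList<>();
        for (Cell c : stored) {
            if (c.memstoreTS <= readPoint) {
                out.add(c);
            }
        }
        return out;
    }
}
```

If the stamp were dropped at flush time, the filter would have nothing to compare against, which matches the failure described above.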


[jira] [Commented] (HBASE-4563) When error occurs in this.parent.close(false) of split, the split region cannot write or read


[ 
https://issues.apache.org/jira/browse/HBASE-4563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13128660#comment-13128660
 ] 

Lars Hofhansl commented on HBASE-4563:
--

@Ted... You wanna commit, or should I? I'm happy to.

 When error occurs in this.parent.close(false) of split, the split region 
 cannot write or read
 -

 Key: HBASE-4563
 URL: https://issues.apache.org/jira/browse/HBASE-4563
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.4, 0.92.0
Reporter: bluedavy
Assignee: bluedavy
Priority: Blocker
 Fix For: 0.90.5

 Attachments: HBASE-4563-0.90.patch, HBASE-4563-0.92.patch, 
 HBASE-4563-trunk.patch, test-4563-0.90.txt, test-4563-0.92.txt, 
 test-4563-trunk.txt


 Follow the steps below to reproduce the problem:
 1. Change SplitTransaction.java as below, to mock an HDFS error.
{code:title=SplitTransaction.java|borderStyle=solid}
   List<StoreFile> hstoreFilesToSplit = this.parent.close(false);
   // Injected fault to simulate an HDFS error while closing the store files
   throw new IOException("some unexpected error in close store files");
{code} 
 2. Update the regionserver code and restart.
 3. Create a table and put some data into it.
 4. Split the table.
 5. Scan the table; the scan fails.
 The attached patch fixes the bug.


[jira] [Commented] (HBASE-4563) When error occurs in this.parent.close(false) of split, the split region cannot write or read


[ 
https://issues.apache.org/jira/browse/HBASE-4563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13128665#comment-13128665
 ] 

Ted Yu commented on HBASE-4563:
---

@Lars:
Go ahead.

 When error occurs in this.parent.close(false) of split, the split region 
 cannot write or read
 -

 Key: HBASE-4563
 URL: https://issues.apache.org/jira/browse/HBASE-4563
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.4, 0.92.0
Reporter: bluedavy
Assignee: bluedavy
Priority: Blocker
 Fix For: 0.90.5

 Attachments: HBASE-4563-0.90.patch, HBASE-4563-0.92.patch, 
 HBASE-4563-trunk.patch, test-4563-0.90.txt, test-4563-0.92.txt, 
 test-4563-trunk.txt


 Follow the steps below to reproduce the problem:
 1. Change SplitTransaction.java as below, to mock an HDFS error.
{code:title=SplitTransaction.java|borderStyle=solid}
   List<StoreFile> hstoreFilesToSplit = this.parent.close(false);
   // Injected fault to simulate an HDFS error while closing the store files
   throw new IOException("some unexpected error in close store files");
{code} 
 2. Update the regionserver code and restart.
 3. Create a table and put some data into it.
 4. Split the table.
 5. Scan the table; the scan fails.
 The attached patch fixes the bug.


[jira] [Resolved] (HBASE-4563) When error occurs in this.parent.close(false) of split, the split region cannot write or read

2011-10-16 Thread Lars Hofhansl (Resolved) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Lars Hofhansl resolved HBASE-4563.
--

Resolution: Fixed
Hadoop Flags: Reviewed

Committed to 0.90, 0.92, and trunk

When error occurs in this.parent.close(false) of split, the split region
cannot write or read
-

Key: HBASE-4563
URL: https://issues.apache.org/jira/browse/HBASE-4563
Project: HBase
Issue Type: Bug
Components: regionserver
Affects Versions: 0.90.4, 0.92.0
Reporter: bluedavy
Assignee: bluedavy
Priority: Blocker
Fix For: 0.90.5

Attachments: HBASE-4563-0.90.patch, HBASE-4563-0.92.patch,
HBASE-4563-trunk.patch, test-4563-0.90.txt, test-4563-0.92.txt,
test-4563-trunk.txt

Follow the steps below to reproduce the problem:
1. Change SplitTransaction.java as below to simulate an HDFS error:
{code:title=SplitTransaction.java|borderStyle=solid}
List<StoreFile> hstoreFilesToSplit = this.parent.close(false);
throw new IOException("some unexpected error in close store files");
{code}
2. Rebuild the regionserver with this change and restart it.
3. Create a table and put some data into it.
4. Split the table.
5. Scan the table; the scan fails.
The attached patch fixes the bug.

[jira] [Commented] (HBASE-4562) When split doing offlineParentInMeta encounters error, it'll cause data loss


[ 
https://issues.apache.org/jira/browse/HBASE-4562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13128675#comment-13128675
 ] 

Lars Hofhansl commented on HBASE-4562:
--

Committing this too.

 When split doing offlineParentInMeta encounters error, it'll cause data loss
 

 Key: HBASE-4562
 URL: https://issues.apache.org/jira/browse/HBASE-4562
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.4
Reporter: bluedavy
Assignee: bluedavy
Priority: Blocker
 Fix For: 0.90.5

 Attachments: HBASE-4562-0.90.patch, HBASE-4562-0.92.patch, 
 HBASE-4562-trunk.patch, test-4562-0.90.txt, test-4562-0.92.txt, 
 test-4562-trunk.txt


 Follow the steps below to reproduce the problem:
 1. Change SplitTransaction.java as below to simulate a timeout error:
{code:title=SplitTransaction.java|borderStyle=solid}
   if (!testing) {
     MetaEditor.offlineParentInMeta(server.getCatalogTracker(),
       this.parent.getRegionInfo(), a.getRegionInfo(), b.getRegionInfo());
     throw new IOException("some unexpected error in split");
   }
{code} 
 2. Rebuild the regionserver with this change and restart it.
 3. Create a table and put some data into it.
 4. Split the table.
 5. Kill the regionserver hosting the table.
 6. Wait until the master's ServerShutdownHandler.process has run, then 
 scan the table; the data written before the split is missing.
 The attached patch fixes the bug.


[jira] [Commented] (HBASE-4562) When split doing offlineParentInMeta encounters error, it'll cause data loss