[jira] Updated: (HADOOP-2311) [hbase] Could not complete hdfs write out to flush file forcing regionserver restart
[ https://issues.apache.org/jira/browse/HADOOP-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2311: -- Priority: Trivial (was: Critical) Dropping priority since this bug has not re-occurred. [hbase] Could not complete hdfs write out to flush file forcing regionserver restart Key: HADOOP-2311 URL: https://issues.apache.org/jira/browse/HADOOP-2311 Project: Hadoop Issue Type: Bug Components: contrib/hbase Reporter: stack Priority: Trivial Attachments: delete-logging.patch I've spent some time looking into this issue but there are not enough clues in the logs to tell where the problem is. Here's what I know. Two region servers went down last night, a minute apart, during Paul Saab's 6hr run inserting 300million rows into hbase. The regionservers went down to force rerun of hlog and avoid possible data loss after a failure writing memory flushes to hdfs. Here is the lead up to the failed flush: ... 2007-11-28 22:40:02,231 INFO hbase.HRegionServer - MSG_REGION_OPEN : regionname: postlog,img149/4699/133lm0.jpg,1196318393738, startKey: img149/4699/133lm0.jpg, tableDesc: {name: postlog, families: {cookie:={name: cookie, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}, ip:={name: ip, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}}} 2007-11-28 22:40:02,242 DEBUG hbase.HStore - starting 1703405830/cookie (no reconstruction log) 2007-11-28 22:40:02,741 DEBUG hbase.HStore - maximum sequence id for hstore 1703405830/cookie is 29077708 2007-11-28 22:40:03,094 DEBUG hbase.HStore - starting 1703405830/ip (no reconstruction log) 2007-11-28 22:40:03,852 DEBUG hbase.HStore - maximum sequence id for hstore 1703405830/ip is 29077708 2007-11-28 22:40:04,138 DEBUG hbase.HRegion - Next sequence id for region postlog,img149/4699/133lm0.jpg,1196318393738 is 29077709 2007-11-28 22:40:04,141 INFO hbase.HRegion - region postlog,img149/4699/133lm0.jpg,1196318393738 available 2007-11-28 22:40:04,141 DEBUG hbase.HLog - changing sequence number from 21357623 to 29077709 2007-11-28 22:40:04,141 INFO hbase.HRegionServer - MSG_REGION_OPEN : regionname: postlog,img149/7512/dscnlightenedfi3.jpg,1196318393739, startKey: img149/7512/dscnlightenedfi3.jpg, tableDesc: {name: postlog, families: {cookie:={name: cookie, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}, ip:={name: ip, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}}} 2007-11-28 22:40:04,145 DEBUG hbase.HStore - starting 376748222/cookie (no reconstruction log) 2007-11-28 22:40:04,223 DEBUG hbase.HStore - maximum sequence id for hstore 376748222/cookie is 29077708 2007-11-28 22:40:04,277 DEBUG hbase.HStore - starting 376748222/ip (no reconstruction log) 2007-11-28 22:40:04,353 DEBUG hbase.HStore - maximum sequence id for hstore 376748222/ip is 29077708 2007-11-28 22:40:04,699 DEBUG hbase.HRegion - Next sequence id for region postlog,img149/7512/dscnlightenedfi3.jpg,1196318393739 is 29077709 2007-11-28 22:40:04,701 INFO hbase.HRegion - region postlog,img149/7512/dscnlightenedfi3.jpg,1196318393739 available 2007-11-28 22:40:34,427 DEBUG hbase.HRegionServer - flushing region postlog,img143/1310/yashrk3.jpg,1196317258704 2007-11-28 22:40:34,428 DEBUG hbase.HRegion - Not flushing cache for region postlog,img143/1310/yashrk3.jpg,1196317258704: snapshotMemcaches() determined that there was nothing to do 2007-11-28 22:40:55,745 DEBUG hbase.HRegionServer - flushing region postlog,img142/8773/1001417zc4.jpg,1196317258703 2007-11-28 22:40:55,745 DEBUG hbase.HRegion - Not flushing cache for region postlog,img142/8773/1001417zc4.jpg,1196317258703: snapshotMemcaches() determined that there was nothing to do 2007-11-28 22:41:04,144 DEBUG hbase.HRegionServer - flushing region postlog,img149/4699/133lm0.jpg,1196318393738 2007-11-28 22:41:04,144 DEBUG hbase.HRegion - Started memcache flush for region postlog,img149/4699/133lm0.jpg,1196318393738. Size 74.7k 2007-11-28 22:41:04,764 DEBUG hbase.HStore - Added 1703405830/ip/610047924323344967 with sequence id 29081563 and size 53.8k 2007-11-28 22:41:04,902 DEBUG hbase.HStore - Added 1703405830/cookie/3147798053949544972 with sequence id 29081563 and size 41.3k 2007-11-28 22:41:04,902 DEBUG hbase.HRegion - Finished memcache flush for region postlog,img149/4699/133lm0.jpg,1196318393738 in 758ms, sequenceid=29081563 2007-11-28 22:41:04,902 DEBUG hbase.HStore - compaction for HStore postlog,img149/4699/133lm0.jpg,1196318393738/ip needed.
[jira] Updated: (HADOOP-2311) [hbase] Could not complete hdfs write out to flush file forcing regionserver restart
[ https://issues.apache.org/jira/browse/HADOOP-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Duxbury updated HADOOP-2311: -- Priority: Critical (was: Minor) Sounds serious. Changing to critical. [hbase] Could not complete hdfs write out to flush file forcing regionserver restart Key: HADOOP-2311 URL: https://issues.apache.org/jira/browse/HADOOP-2311 Project: Hadoop Issue Type: Bug Components: contrib/hbase Reporter: stack Priority: Critical Attachments: delete-logging.patch I've spent some time looking into this issue but there are not enough clues in the logs to tell where the problem is. Here's what I know. Two region servers went down last night, a minute apart, during Paul Saab's 6hr run inserting 300million rows into hbase. The regionservers went down to force rerun of hlog and avoid possible data loss after a failure writing memory flushes to hdfs. Here is the lead up to the failed flush: ... 2007-11-28 22:40:02,231 INFO hbase.HRegionServer - MSG_REGION_OPEN : regionname: postlog,img149/4699/133lm0.jpg,1196318393738, startKey: img149/4699/133lm0.jpg, tableDesc: {name: postlog, families: {cookie:={name: cookie, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}, ip:={name: ip, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}}} 2007-11-28 22:40:02,242 DEBUG hbase.HStore - starting 1703405830/cookie (no reconstruction log) 2007-11-28 22:40:02,741 DEBUG hbase.HStore - maximum sequence id for hstore 1703405830/cookie is 29077708 2007-11-28 22:40:03,094 DEBUG hbase.HStore - starting 1703405830/ip (no reconstruction log) 2007-11-28 22:40:03,852 DEBUG hbase.HStore - maximum sequence id for hstore 1703405830/ip is 29077708 2007-11-28 22:40:04,138 DEBUG hbase.HRegion - Next sequence id for region postlog,img149/4699/133lm0.jpg,1196318393738 is 29077709 2007-11-28 22:40:04,141 INFO hbase.HRegion - region postlog,img149/4699/133lm0.jpg,1196318393738 available 2007-11-28 22:40:04,141 DEBUG hbase.HLog - changing sequence number from 21357623 to 29077709 2007-11-28 22:40:04,141 INFO hbase.HRegionServer - MSG_REGION_OPEN : regionname: postlog,img149/7512/dscnlightenedfi3.jpg,1196318393739, startKey: img149/7512/dscnlightenedfi3.jpg, tableDesc: {name: postlog, families: {cookie:={name: cookie, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}, ip:={name: ip, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}}} 2007-11-28 22:40:04,145 DEBUG hbase.HStore - starting 376748222/cookie (no reconstruction log) 2007-11-28 22:40:04,223 DEBUG hbase.HStore - maximum sequence id for hstore 376748222/cookie is 29077708 2007-11-28 22:40:04,277 DEBUG hbase.HStore - starting 376748222/ip (no reconstruction log) 2007-11-28 22:40:04,353 DEBUG hbase.HStore - maximum sequence id for hstore 376748222/ip is 29077708 2007-11-28 22:40:04,699 DEBUG hbase.HRegion - Next sequence id for region postlog,img149/7512/dscnlightenedfi3.jpg,1196318393739 is 29077709 2007-11-28 22:40:04,701 INFO hbase.HRegion - region postlog,img149/7512/dscnlightenedfi3.jpg,1196318393739 available 2007-11-28 22:40:34,427 DEBUG hbase.HRegionServer - flushing region postlog,img143/1310/yashrk3.jpg,1196317258704 2007-11-28 22:40:34,428 DEBUG hbase.HRegion - Not flushing cache for region postlog,img143/1310/yashrk3.jpg,1196317258704: snapshotMemcaches() determined that there was nothing to do 2007-11-28 22:40:55,745 DEBUG hbase.HRegionServer - flushing region postlog,img142/8773/1001417zc4.jpg,1196317258703 2007-11-28 22:40:55,745 DEBUG hbase.HRegion - Not flushing cache for region postlog,img142/8773/1001417zc4.jpg,1196317258703: snapshotMemcaches() determined that there was nothing to do 2007-11-28 22:41:04,144 DEBUG hbase.HRegionServer - flushing region postlog,img149/4699/133lm0.jpg,1196318393738 2007-11-28 22:41:04,144 DEBUG hbase.HRegion - Started memcache flush for region postlog,img149/4699/133lm0.jpg,1196318393738. Size 74.7k 2007-11-28 22:41:04,764 DEBUG hbase.HStore - Added 1703405830/ip/610047924323344967 with sequence id 29081563 and size 53.8k 2007-11-28 22:41:04,902 DEBUG hbase.HStore - Added 1703405830/cookie/3147798053949544972 with sequence id 29081563 and size 41.3k 2007-11-28 22:41:04,902 DEBUG hbase.HRegion - Finished memcache flush for region postlog,img149/4699/133lm0.jpg,1196318393738 in 758ms, sequenceid=29081563 2007-11-28 22:41:04,902 DEBUG hbase.HStore - compaction for HStore postlog,img149/4699/133lm0.jpg,1196318393738/ip needed. 2007-11-28 22:41:04,903
[jira] Updated: (HADOOP-2311) [hbase] Could not complete hdfs write out to flush file forcing regionserver restart
[ https://issues.apache.org/jira/browse/HADOOP-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HADOOP-2311: -- Attachment: delete-logging.patch Log all deletes. [hbase] Could not complete hdfs write out to flush file forcing regionserver restart Key: HADOOP-2311 URL: https://issues.apache.org/jira/browse/HADOOP-2311 Project: Hadoop Issue Type: Bug Components: contrib/hbase Reporter: stack Priority: Minor Attachments: delete-logging.patch I've spent some time looking into this issue but there are not enough clues in the logs to tell where the problem is. Here's what I know. Two region servers went down last night, a minute apart, during Paul Saab's 6hr run inserting 300million rows into hbase. The regionservers went down to force rerun of hlog and avoid possible data loss after a failure writing memory flushes to hdfs. Here is the lead up to the failed flush: ... 2007-11-28 22:40:02,231 INFO hbase.HRegionServer - MSG_REGION_OPEN : regionname: postlog,img149/4699/133lm0.jpg,1196318393738, startKey: img149/4699/133lm0.jpg, tableDesc: {name: postlog, families: {cookie:={name: cookie, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}, ip:={name: ip, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}}} 2007-11-28 22:40:02,242 DEBUG hbase.HStore - starting 1703405830/cookie (no reconstruction log) 2007-11-28 22:40:02,741 DEBUG hbase.HStore - maximum sequence id for hstore 1703405830/cookie is 29077708 2007-11-28 22:40:03,094 DEBUG hbase.HStore - starting 1703405830/ip (no reconstruction log) 2007-11-28 22:40:03,852 DEBUG hbase.HStore - maximum sequence id for hstore 1703405830/ip is 29077708 2007-11-28 22:40:04,138 DEBUG hbase.HRegion - Next sequence id for region postlog,img149/4699/133lm0.jpg,1196318393738 is 29077709 2007-11-28 22:40:04,141 INFO hbase.HRegion - region postlog,img149/4699/133lm0.jpg,1196318393738 available 2007-11-28 22:40:04,141 DEBUG hbase.HLog - changing sequence number from 21357623 to 29077709 2007-11-28 22:40:04,141 INFO hbase.HRegionServer - MSG_REGION_OPEN : regionname: postlog,img149/7512/dscnlightenedfi3.jpg,1196318393739, startKey: img149/7512/dscnlightenedfi3.jpg, tableDesc: {name: postlog, families: {cookie:={name: cookie, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}, ip:={name: ip, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}}} 2007-11-28 22:40:04,145 DEBUG hbase.HStore - starting 376748222/cookie (no reconstruction log) 2007-11-28 22:40:04,223 DEBUG hbase.HStore - maximum sequence id for hstore 376748222/cookie is 29077708 2007-11-28 22:40:04,277 DEBUG hbase.HStore - starting 376748222/ip (no reconstruction log) 2007-11-28 22:40:04,353 DEBUG hbase.HStore - maximum sequence id for hstore 376748222/ip is 29077708 2007-11-28 22:40:04,699 DEBUG hbase.HRegion - Next sequence id for region postlog,img149/7512/dscnlightenedfi3.jpg,1196318393739 is 29077709 2007-11-28 22:40:04,701 INFO hbase.HRegion - region postlog,img149/7512/dscnlightenedfi3.jpg,1196318393739 available 2007-11-28 22:40:34,427 DEBUG hbase.HRegionServer - flushing region postlog,img143/1310/yashrk3.jpg,1196317258704 2007-11-28 22:40:34,428 DEBUG hbase.HRegion - Not flushing cache for region postlog,img143/1310/yashrk3.jpg,1196317258704: snapshotMemcaches() determined that there was nothing to do 2007-11-28 22:40:55,745 DEBUG hbase.HRegionServer - flushing region postlog,img142/8773/1001417zc4.jpg,1196317258703 2007-11-28 22:40:55,745 DEBUG hbase.HRegion - Not flushing cache for region postlog,img142/8773/1001417zc4.jpg,1196317258703: snapshotMemcaches() determined that there was nothing to do 2007-11-28 22:41:04,144 DEBUG hbase.HRegionServer - flushing region postlog,img149/4699/133lm0.jpg,1196318393738 2007-11-28 22:41:04,144 DEBUG hbase.HRegion - Started memcache flush for region postlog,img149/4699/133lm0.jpg,1196318393738. Size 74.7k 2007-11-28 22:41:04,764 DEBUG hbase.HStore - Added 1703405830/ip/610047924323344967 with sequence id 29081563 and size 53.8k 2007-11-28 22:41:04,902 DEBUG hbase.HStore - Added 1703405830/cookie/3147798053949544972 with sequence id 29081563 and size 41.3k 2007-11-28 22:41:04,902 DEBUG hbase.HRegion - Finished memcache flush for region postlog,img149/4699/133lm0.jpg,1196318393738 in 758ms, sequenceid=29081563 2007-11-28 22:41:04,902 DEBUG hbase.HStore - compaction for HStore postlog,img149/4699/133lm0.jpg,1196318393738/ip needed. 2007-11-28 22:41:04,903 DEBUG hbase.HRegion - 1703405830/ip