Hello again. My second question concerns one of my region servers (usually the same one), which often shuts down because it misses the window for sending heartbeats to the master. Maybe it is overloaded, but it stays silent for a very long time: the log shows a gap of about 6 minutes.
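To make the symptom concrete, here is how I understand the check behind the "We slept ..." warning in the log. This is only my own simplified Python sketch, not the HBase source: the function name `sleep_and_check`, its parameters, and the factor of ten are assumptions I took from the wording of the log message.

```python
import time


def sleep_and_check(period_ms, overrun_factor=10,
                    sleep_fn=time.sleep, clock=time.monotonic):
    """Sleep for period_ms and measure how long the sleep really took.

    Returns (elapsed_ms, overran). overran is True when the thread woke up
    more than overrun_factor times later than scheduled, which is the
    situation the warning in my log seems to describe.
    """
    start = clock()
    sleep_fn(period_ms / 1000.0)
    # clock() returns seconds; convert the difference to whole milliseconds.
    elapsed_ms = round((clock() - start) * 1000)
    overran = elapsed_ms > period_ms * overrun_factor
    return elapsed_ms, overran


if __name__ == "__main__":
    # Simulate my case with a fake clock: scheduled 3000 ms, woke up
    # 265463 ms later (no real sleeping is done here).
    ticks = iter([0.0, 265.463])
    elapsed, overran = sleep_and_check(
        3000, sleep_fn=lambda seconds: None, clock=lambda: next(ticks))
    print(elapsed, overran)
```

If this picture is right, the whole JVM (or at least this thread) was stalled between the two clock readings, which is why I suspect swapping or some other load on the machine rather than the compaction itself.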
I turned the log level to DEBUG, but I haven't found anything more interesting. The last action is a compaction, but it ends normally. Maybe it is followed by a heavy Hadoop task? Or maybe it is linked to the fact that there is only 1 GB of free hard disk space? That is the only difference I notice between this node and the others. Note that its hostname is the first in the regionservers list. Does this position increase the amount of work (e.g. is the META table always loaded there)?

By the way, on a computer that has only 1 GB of RAM, should I decrease the maximum JVM heap size for the Hadoop datanode and the HBase regionserver (the default is 1 GB for each, I think) to avoid endless swapping?

Nothing in JIRA seems to match my problem. Any other ideas?

--- region server log ---
2008-10-16 15:18:45,812 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction requested for region: table-0.3,PLQ80+70101200 :key/miss;j1DB44040DD81BA02D4E0E9A0D8698DA9
2008-10-16 15:18:45,812 INFO org.apache.hadoop.hbase.regionserver.HRegion: starting compaction on region table-0.3,PLQ80+70101200 :key/miss;j1DB44040DD81BA02D4E0E9A0D8698DA9,1224096999059
2008-10-16 15:18:45,820 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Skipped compaction of 1 file; compaction size of 1082805005/header: 4 83.5k; Skipped 3 files, size: 488461
2008-10-16 15:18:45,826 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Skipped compaction of 1 file; compaction size of 1082805005/bytes: 13 6.5m; Skipped 2 files, size: 141612841
2008-10-16 15:18:45,833 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Skipped compaction of 1 file; compaction size of 1082805005/info: 1.1 m; Skipped 3 files, size: 1109592
2008-10-16 15:18:45,833 INFO org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on region table-0.3,PLQ80+70101200 :key/miss;j1DB44040DD81BA02D4E0E9A0D8698DA9,1224096999059 in 0sec
2008-10-16 15:24:32,656 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 265463ms, ten times longer than scheduled: 3000
2008-10-16 15:24:32,656 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: unable to report to master for 265463 milliseconds - aborting server
2008-10-16 15:24:32,656 DEBUG org.apache.hadoop.hbase.RegionHistorian: Offlined
2008-10-16 15:24:32,657 INFO org.apache.hadoop.ipc.Server: Stopping server on 60020
2

Again, thank you for your advice.

--
Jean-Adrien

Cluster setup:
Ubuntu Linux
4 regionservers / datanodes; 1 is master / namenode as well
java-6-sun
Total size of HDFS: 81.98 GB (replication factor 3)
fsck -> healthy
hadoop: 0.18.1
hbase: 0.18.0 (jar of hadoop replaced with 0.18.1)
1 GB RAM per node

--
View this message in context: http://www.nabble.com/Regionserver-sleeps-too-much-tp20014722p20014722.html
Sent from the HBase User mailing list archive at Nabble.com.
