Hi. HDFS capacity is about 800 GB (8 datanodes) and current usage is about 30 GB. This is after a total re-format of HDFS that was done an hour before.
BTW, the logs I sent are from the first exception that I found in them.
Best Regards.

On Tue, Oct 28, 2008 at 7:12 PM, stack <[EMAIL PROTECTED]> wrote:
> I took a quick look Slava (thanks for sending the files). Here are a few
> notes:
>
> + The logs are from after the damage is done; the transition from good to
>   bad is missing. If I could see that, that would help.
> + But what seems to be plain is that your HDFS is very sick. See this
>   from the head of one of the regionserver logs:
>
> 2008-10-27 23:41:12,682 WARN org.apache.hadoop.dfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block.
>         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2349)
>         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1800(DFSClient.java:1735)
>         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1912)
>
> 2008-10-27 23:41:12,682 WARN org.apache.hadoop.dfs.DFSClient: Error Recovery for block blk_-5188192041705782716_60000 bad datanode[0]
> 2008-10-27 23:41:12,685 ERROR org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction/Split failed for region BizDB,1.1.PerfBO1.f2188a42-5eb7-4a6a-82ef-2da0d0ea4ce0,1225136351518
> java.io.IOException: Could not get block locations. Aborting...
>
> If HDFS is ailing, hbase is too. In fact, the regionservers will shut
> themselves down to protect themselves against damaging or losing data:
>
> 2008-10-27 23:41:12,688 FATAL org.apache.hadoop.hbase.regionserver.Flusher: Replay of hlog required. Forcing server restart
>
> So, what's up with your HDFS? Not enough space allotted? What happens if
> you run "./bin/hadoop fsck /"? Does that give you a clue as to what
> happened? Dig in the datanode and namenode logs. Look for where the
> exceptions start. It might give you a clue.
>
> + The suse regionserver log had garbage in it.
>
> St.Ack
>
>
> Slava Gorelik wrote:
>
>> Hi.
>> My happiness was very short :-( After I successfully added 1M rows (50k
>> each row), I tried to add 10M rows. After 3-4 working hours it started
>> dying. First one regionserver died, then another one, and eventually the
>> whole cluster was dead.
>>
>> I attached log files (the relevant parts, archived) from the region
>> servers and from the master.
>>
>> Best Regards.
>>
>>
>> On Mon, Oct 27, 2008 at 11:19 AM, Slava Gorelik <[EMAIL PROTECTED]> wrote:
>>
>> Hi.
>> So far so good: after changing the file descriptor limit
>> and dfs.datanode.socket.write.timeout, dfs.datanode.max.xcievers,
>> my cluster works stably.
>> Thank You and Best Regards.
>>
>> P.S. Regarding the missing delete-multiple-columns functionality, I
>> filed a JIRA: https://issues.apache.org/jira/browse/HBASE-961
>>
>>
>> On Sun, Oct 26, 2008 at 12:58 AM, Michael Stack <[EMAIL PROTECTED]> wrote:
>>
>> Slava Gorelik wrote:
>>
>> Hi. Haven't tried them yet; I'll try tomorrow morning. In general the
>> cluster is working well; the problems begin if I'm trying to add 10M
>> rows. It happened after 1.2M.
>>
>> Anything else running beside the regionserver or datanodes that would
>> suck resources? When datanodes begin to slow, we begin to see the
>> issue Jean-Adrien's configurations address. Are you uploading using
>> MapReduce? Are TTs running on the same nodes as the datanode and
>> regionserver? How are you doing the upload? Describe what your
>> uploader looks like (sorry if you've already done this).
>>
>> I already changed the limit of file descriptors.
>>
>> Good.
>>
>> I'll try to change the properties:
>>
>> <property>
>>   <name>dfs.datanode.socket.write.timeout</name>
>>   <value>0</value>
>> </property>
>>
>> <property>
>>   <name>dfs.datanode.max.xcievers</name>
>>   <value>1023</value>
>> </property>
>>
>> Yeah, try it.
>>
>> And let you know. Are there any other prescriptions? Did I miss
>> something?
>>
>> BTW, off topic, but I sent an e-mail recently to the list and I can't
>> see it: is it possible to delete multiple columns in any way by regex,
>> for example column_name_* ?
>>
>> Not that I know of. If it's not in the API, it should be. Mind filing
>> a JIRA?
>>
>> Thanks Slava.
>> St.Ack
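
For reference, the HDFS health check stack suggests above can be run as below. This is a sketch against a live 0.18-era Hadoop install (paths and flags are the standard ones for that version; the extra flags are optional but make missing or corrupt blocks visible per file):

```shell
# From the Hadoop home directory: walk the namespace and report health.
# -files/-blocks/-locations print per-file block placement, so
# under-replicated or missing blocks show up explicitly.
./bin/hadoop fsck / -files -blocks -locations

# Cluster-wide capacity and per-datanode usage summary; useful for
# confirming whether space ran out on individual datanodes.
./bin/hadoop dfsadmin -report
```

Both commands need a running cluster and only read metadata, so they are safe to run while diagnosing.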
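
Since the thread turned on the file descriptor limit, a quick sanity check that the raised limit actually took effect can be done from the shell the daemons are started from. This is a sketch; the 32768 figure and the "hbase" user name are common examples, not values from this thread:

```shell
# Print the open-file limit for the current shell and its children.
# HBase regionservers and HDFS datanodes should see the raised value
# (e.g. 32768), not the common default of 1024.
ulimit -n

# The limit is typically raised per-user in /etc/security/limits.conf:
#   hbase  -  nofile  32768
# ("hbase" here is an assumed service user; adjust to your setup.)
```

Run it as the same user that launches the daemons; a root shell can show a different limit than the service account gets.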
