Hi. First of all, I want to say thank you for your assistance! Should DEBUG be enabled on Hadoop or on HBase, and how do I enable it? fsck said that HDFS is healthy.
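In case it helps to make sure I reach for the right knob: my understanding is that DEBUG is switched on through the log4j configuration, roughly like the sketch below (the HBase side in conf/log4j.properties, plus the matching line for the DFS classes on the Hadoop side), followed by a restart of the daemons. Please correct me if the logger names differ in our versions.

# $HBASE_HOME/conf/log4j.properties
log4j.logger.org.apache.hadoop.hbase=DEBUG

# $HADOOP_HOME/conf/log4j.properties (for the DFS/datanode side)
log4j.logger.org.apache.hadoop.dfs=DEBUG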
Best Regards and Thank You

On Tue, Oct 28, 2008 at 8:45 PM, stack <[EMAIL PROTECTED]> wrote:

> Slava Gorelik wrote:
>
>> Hi. HDFS capacity is about 800GB (8 datanodes) and the current usage is
>> about 30GB. This is after a total re-format of the HDFS that was made an
>> hour before.
>>
>> BTW, the logs I sent are from the first exception that I found in them.
>> Best Regards.
>>
> Please enable DEBUG and retry. Send me all logs. What does the fsck on
> HDFS say? There is something seriously wrong with your cluster that you
> are having so much trouble getting it running. Let's try and figure it
> out.
>
> St.Ack
>
>> On Tue, Oct 28, 2008 at 7:12 PM, stack <[EMAIL PROTECTED]> wrote:
>>
>>> I took a quick look, Slava (thanks for sending the files). Here are a
>>> few notes:
>>>
>>> + The logs are from after the damage is done; the transition from good
>>> to bad is missing. If I could see that, that would help.
>>> + But what seems to be plain is that your HDFS is very sick. See this
>>> from the head of one of the regionserver logs:
>>>
>>> 2008-10-27 23:41:12,682 WARN org.apache.hadoop.dfs.DFSClient: DataStreamer
>>> Exception: java.io.IOException: Unable to create new block.
>>>         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2349)
>>>         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1800(DFSClient.java:1735)
>>>         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1912)
>>>
>>> 2008-10-27 23:41:12,682 WARN org.apache.hadoop.dfs.DFSClient: Error
>>> Recovery for block blk_-5188192041705782716_60000 bad datanode[0]
>>> 2008-10-27 23:41:12,685 ERROR org.apache.hadoop.hbase.regionserver.CompactSplitThread:
>>> Compaction/Split failed for region
>>> BizDB,1.1.PerfBO1.f2188a42-5eb7-4a6a-82ef-2da0d0ea4ce0,1225136351518
>>> java.io.IOException: Could not get block locations. Aborting...
>>>
>>> If HDFS is ailing, hbase is too. In fact, the regionservers will shut
>>> themselves down to protect themselves against damaging or losing data:
>>>
>>> 2008-10-27 23:41:12,688 FATAL org.apache.hadoop.hbase.regionserver.Flusher:
>>> Replay of hlog required. Forcing server restart
>>>
>>> So, what's up with your HDFS? Not enough space allotted? What happens
>>> if you run "./bin/hadoop fsck /"? Does that give you a clue as to what
>>> happened? Dig in the datanode and namenode logs. Look for where the
>>> exceptions start. It might give you a clue.
>>>
>>> + The suse regionserver log had garbage in it.
>>>
>>> St.Ack
>>>
>>> Slava Gorelik wrote:
>>>
>>>> Hi.
>>>> My happiness was very short :-( After I successfully added 1M rows (50k
>>>> each row) I tried to add 10M rows.
>>>> And after 3-4 working hours it started to die. First one region server
>>>> died, then another one, and eventually the whole cluster was dead.
>>>>
>>>> I attached log files (relevant part, archived) from the region servers
>>>> and from the master.
>>>>
>>>> Best Regards.
>>>>
>>>> On Mon, Oct 27, 2008 at 11:19 AM, Slava Gorelik <[EMAIL PROTECTED]> wrote:
>>>>
>>>> Hi.
>>>> So far so good; after changing the file descriptors
>>>> and dfs.datanode.socket.write.timeout, dfs.datanode.max.xcievers,
>>>> my cluster works stably.
>>>> Thank You and Best Regards.
>>>>
>>>> P.S. Regarding the missing delete-multiple-columns functionality, I
>>>> filed a JIRA: https://issues.apache.org/jira/browse/HBASE-961
>>>> (a rough sketch of a client-side workaround appears after the quoted
>>>> thread below).
>>>>
>>>> On Sun, Oct 26, 2008 at 12:58 AM, Michael Stack <[EMAIL PROTECTED]> wrote:
>>>>
>>>> Slava Gorelik wrote:
>>>>
>>>> Hi. Haven't tried them yet; I'll try tomorrow morning. In general the
>>>> cluster is working well; the problems begin when I try to add 10M rows:
>>>> it happened after 1.2M.
>>>>
>>>> Anything else running beside the regionserver or datanodes
>>>> that would suck resources? When datanodes begin to slow, we
>>>> begin to see the issue Jean-Adrien's configurations address.
>>>> Are you uploading using MapReduce? Are TTs running on the same
>>>> nodes as the datanode and regionserver? How are you doing the
>>>> upload? Describe what your uploader looks like (sorry if
>>>> you've already done this).
>>>>
>>>> I already changed the limit of file descriptors.
>>>>
>>>> Good.
>>>>
>>>> I'll try to change the properties:
>>>>
>>>> <property>
>>>>   <name>dfs.datanode.socket.write.timeout</name>
>>>>   <value>0</value>
>>>> </property>
>>>>
>>>> <property>
>>>>   <name>dfs.datanode.max.xcievers</name>
>>>>   <value>1023</value>
>>>> </property>
>>>>
>>>> Yeah, try it.
>>>>
>>>> And I'll let you know. Are there any other prescriptions? Did I miss
>>>> something?
>>>>
>>>> BTW, off topic, but I sent an e-mail recently to the list and I can't
>>>> see it: is it possible to delete multiple columns in any way by regex,
>>>> for example column_name_*?
>>>>
>>>> Not that I know of. If it's not in the API, it should be.
>>>> Mind filing a JIRA?
>>>>
>>>> Thanks Slava.
>>>> St.Ack
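P.P.S. For the HBASE-961 item above, the client-side workaround I have in mind until the feature lands is simply to fetch the row and delete whichever columns match the pattern myself. Below is a rough sketch against the 0.18-era client API as I remember it; the HTable/BatchUpdate method names are from memory, and the table name, row key and column pattern are invented for illustration, so please check everything against the javadoc of your release.

import java.util.regex.Pattern;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.BatchUpdate;
import org.apache.hadoop.hbase.io.RowResult;
import org.apache.hadoop.hbase.util.Bytes;

public class DeleteColumnsByRegex {
  public static void main(String[] args) throws Exception {
    // Table name, row key and pattern are made-up examples; the calls
    // below are my recollection of the 0.18 client API, not verified.
    HTable table = new HTable(new HBaseConfiguration(), "BizDB");
    byte[] row = Bytes.toBytes("some-row-key");
    Pattern pattern = Pattern.compile("f2:column_name_.*");

    // Fetch every column of the row, collect the full column names
    // ("family:qualifier") that match, and delete them in one commit.
    RowResult result = table.getRow(row);
    BatchUpdate update = new BatchUpdate(row);
    for (byte[] column : result.keySet()) {
      if (pattern.matcher(Bytes.toString(column)).matches()) {
        update.delete(column);
      }
    }
    table.commit(update);
  }
}

Obviously a fetch-then-delete round trip per row is not great for wide rows, which is why a server-side regex delete (the JIRA) would still be much nicer.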
