Slava, http://wiki.apache.org/hadoop/Hbase/FAQ#5
J-D

On Tue, Oct 28, 2008 at 3:31 PM, Slava Gorelik <[EMAIL PROTECTED]> wrote:

> Hi. First of all, I want to say thank you for your assistance!!!
>
> DEBUG on hadoop or hbase? And how can I enable it?
> fsck said that HDFS is healthy.
>
> Best Regards and Thank You
>
> On Tue, Oct 28, 2008 at 8:45 PM, stack <[EMAIL PROTECTED]> wrote:
>
>> Slava Gorelik wrote:
>>
>>> Hi. HDFS capacity is about 800GB (8 datanodes) and the current usage
>>> is about 30GB. This is after a total re-format of the HDFS that was
>>> made an hour before.
>>>
>>> BTW, the logs I sent are from the first exception that I found in
>>> them. Best Regards.
>>
>> Please enable DEBUG and retry. Send me all logs. What does fsck on
>> HDFS say? There is something seriously wrong with your cluster if you
>> are having so much trouble getting it running. Let's try to figure it
>> out.
>>
>> St.Ack
>>
>>> On Tue, Oct 28, 2008 at 7:12 PM, stack <[EMAIL PROTECTED]> wrote:
>>>
>>>> I took a quick look, Slava (thanks for sending the files). Here's a
>>>> few notes:
>>>>
>>>> + The logs are from after the damage is done; the transition from
>>>> good to bad is missing. If I could see that, it would help.
>>>> + But what seems plain is that your HDFS is very sick. See this from
>>>> the head of one of the regionserver logs:
>>>>
>>>> 2008-10-27 23:41:12,682 WARN org.apache.hadoop.dfs.DFSClient: DataStreamer
>>>> Exception: java.io.IOException: Unable to create new block.
>>>>         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2349)
>>>>         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1800(DFSClient.java:1735)
>>>>         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1912)
>>>>
>>>> 2008-10-27 23:41:12,682 WARN org.apache.hadoop.dfs.DFSClient: Error
>>>> Recovery for block blk_-5188192041705782716_60000 bad datanode[0]
>>>> 2008-10-27 23:41:12,685 ERROR org.apache.hadoop.hbase.regionserver.CompactSplitThread:
>>>> Compaction/Split failed for region
>>>> BizDB,1.1.PerfBO1.f2188a42-5eb7-4a6a-82ef-2da0d0ea4ce0,1225136351518
>>>> java.io.IOException: Could not get block locations. Aborting...
>>>>
>>>> If HDFS is ailing, hbase is too. In fact, the regionservers will shut
>>>> themselves down to protect themselves against damaging or losing data:
>>>>
>>>> 2008-10-27 23:41:12,688 FATAL org.apache.hadoop.hbase.regionserver.Flusher:
>>>> Replay of hlog required. Forcing server restart
>>>>
>>>> So, what's up with your HDFS? Not enough space allotted? What happens
>>>> if you run "./bin/hadoop fsck /"? Does that give you a clue as to
>>>> what happened? Dig in the datanode and namenode logs. Look for where
>>>> the exceptions start. It might give you a clue.
>>>>
>>>> + The suse regionserver log had garbage in it.
>>>>
>>>> St.Ack
>>>>
>>>> Slava Gorelik wrote:
>>>>
>>>>> Hi.
>>>>> My happiness was very short :-( After I successfully added 1M rows
>>>>> (50KB each row), I tried to add 10M rows. After 3-4 working hours it
>>>>> started dying: first one region server died, then another one, and
>>>>> eventually the whole cluster was dead.
>>>>>
>>>>> I attached log files (the relevant part, archived) from the region
>>>>> servers and from the master.
>>>>>
>>>>> Best Regards.
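Stack's advice above (run "./bin/hadoop fsck /", then dig in the datanode and namenode logs for where the exceptions start) can be sketched in shell. The sample log below is fabricated for illustration; on a real cluster you would point grep at the actual datanode log under $HADOOP_HOME/logs, and the log path naming is an assumption:

```shell
# Fabricated datanode log, standing in for a real one such as
# $HADOOP_HOME/logs/hadoop-<user>-datanode-<host>.log (that path is an
# assumption, not taken from this thread).
printf '%s\n' \
  '2008-10-27 23:40:01,000 INFO org.apache.hadoop.dfs.DataNode: ok' \
  '2008-10-27 23:41:12,682 WARN org.apache.hadoop.dfs.DFSClient: DataStreamer Exception' \
  '2008-10-27 23:41:12,685 ERROR org.apache.hadoop.hbase.regionserver.CompactSplitThread: failed' \
  > /tmp/datanode-sample.log

# Print the first WARN/ERROR line with its line number; its timestamp marks
# where the cluster went from healthy to sick, i.e. where to start reading.
grep -n -m 1 -E ' (WARN|ERROR) ' /tmp/datanode-sample.log
```

The -m 1 flag stops grep at the first match, so the printed line number and timestamp tell you where to start reading the surrounding context.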
>>>>> On Mon, Oct 27, 2008 at 11:19 AM, Slava Gorelik <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>     Hi.
>>>>>     So far so good: after changing the file descriptors
>>>>>     and dfs.datanode.socket.write.timeout, dfs.datanode.max.xcievers,
>>>>>     my cluster works stably.
>>>>>     Thank You and Best Regards.
>>>>>
>>>>>     P.S. Regarding the missing delete-multiple-columns functionality,
>>>>>     I filed a JIRA: https://issues.apache.org/jira/browse/HBASE-961
>>>>>
>>>>>     On Sun, Oct 26, 2008 at 12:58 AM, Michael Stack <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>         Slava Gorelik wrote:
>>>>>
>>>>>             Hi. Haven't tried them yet; I'll try tomorrow morning.
>>>>>             In general the cluster is working well; the problems
>>>>>             begin when I try to add 10M rows. It happened after 1.2M.
>>>>>
>>>>>         Anything else running beside the regionserver or datanodes
>>>>>         that would suck resources? When datanodes begin to slow, we
>>>>>         begin to see the issue Jean-Adrien's configurations address.
>>>>>         Are you uploading using MapReduce? Are TTs running on the
>>>>>         same nodes as the datanode and regionserver? How are you
>>>>>         doing the upload? Describe what your uploader looks like
>>>>>         (sorry if you've already done this).
>>>>>
>>>>>             I already changed the limit of file descriptors,
>>>>>
>>>>>         Good.
>>>>>
>>>>>             I'll try to change the properties:
>>>>>
>>>>>             <property>
>>>>>               <name>dfs.datanode.socket.write.timeout</name>
>>>>>               <value>0</value>
>>>>>             </property>
>>>>>
>>>>>             <property>
>>>>>               <name>dfs.datanode.max.xcievers</name>
>>>>>               <value>1023</value>
>>>>>             </property>
>>>>>
>>>>>         Yeah, try it.
>>>>>
>>>>>             And I'll let you know. Are there any other
>>>>>             prescriptions? Did I miss something?
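The file-descriptor change Slava mentions is typically checked and raised along these lines. This is a minimal sketch; the user name and limit values are illustrative assumptions, not taken from this thread:

```shell
# Show the current per-process open-file limit. HBase regionservers and
# HDFS datanodes keep many files and sockets open at once, so low defaults
# such as 1024 tend to cause "Too many open files" errors under load.
ulimit -n

# A typical (illustrative) raise for the user running hadoop/hbase, set in
# /etc/security/limits.conf; the user name and value here are assumptions:
#   hadoop  soft  nofile  32768
#   hadoop  hard  nofile  32768
```

After editing limits.conf, the user must log in again (or the daemons be restarted under the new session) for the raised limit to take effect.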
>>>>>             BTW, off topic, but I sent an e-mail recently to the
>>>>>             list and I can't see it: is it possible to delete
>>>>>             multiple columns in any way by regex, for example
>>>>>             column_name_* ?
>>>>>
>>>>>         Not that I know of. If it's not in the API, it should be.
>>>>>         Mind filing a JIRA?
>>>>>
>>>>>         Thanks Slava.
>>>>>         St.Ack
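Since no regex delete existed in the API at the time (hence HBASE-961), one client-side workaround is to enumerate the column names, filter them by pattern, and issue one per-column delete per match. A minimal shell sketch of that filtering step follows; the input file and the echoed "delete" are stand-ins, not real HBase commands:

```shell
# Fabricated list of column names, one per line, standing in for whatever
# column enumeration the client API provides.
printf '%s\n' column_name_a other_family:x column_name_b > /tmp/columns.txt

# Keep only names matching the column_name_* pattern and issue one
# simulated per-column delete for each match; a real client would call
# its delete API here instead of echo.
grep '^column_name_' /tmp/columns.txt | while read -r col; do
    echo "delete $col"
done
```

The pattern matching happens entirely on the client side, so the server still sees only plain single-column deletes.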
