Hi Josh,

As far as I know, Hypertable doesn't redistribute ranges. The only time a
range gets loaded onto another server is after a split: the lower half of
the split range is loaded onto a target server, and that target is chosen
round-robin from among the live servers registered in Hyperspace.
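Just to make the mechanism concrete, here is a minimal sketch of what that
round-robin assignment amounts to (hypothetical names and structure, not
the actual Master code):

#include <cstdio>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch, not the actual Master code: the lower half of
// each split is handed to the next live server in a fixed rotation.
class SplitTargetChooser {
public:
  explicit SplitTargetChooser(std::vector<std::string> live_servers)
      : m_servers(std::move(live_servers)) {}

  // Called once per split; walks the live-server list round robin.
  const std::string &next_target() {
    return m_servers[m_next++ % m_servers.size()];
  }

private:
  std::vector<std::string> m_servers;
  std::size_t m_next = 0;
};

int main() {
  // Three live range servers registered in Hyperspace (made-up names).
  SplitTargetChooser chooser({"rs1", "rs2", "rs3"});
  for (int split = 0; split < 5; ++split)
    std::printf("split %d -> lower half goes to %s\n",
                split, chooser.next_target().c_str());
}

Note the asymmetry: the cold lower half moves to the next server in the
rotation, while the hot upper half stays where it is, which matters for
the write pattern you describe below.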
To see exactly which ranges are loaded on a range server, you can use the
rsstat or rsdump tool in the hypertable/bin directory. You may also want
to try my stupid web monitoring tool: download it from
http://groups.google.com/group/hypertable-dev/files

About Hyperspace, I suggest setting the expire time to 60 seconds. I also
think putting Hypertable.Master + Hyperspace.Master + HDFS NameNode on a
dedicated server would be a good idea.

Donald

On Mon, Sep 8, 2008 at 4:11 AM, Joshua Taylor <[EMAIL PROTECTED]> wrote:
> Hi Donald,
>
> Thanks for the insights! That's interesting that the server has so many
> ranges loaded on it. Does Hypertable not yet redistribute ranges for
> balancing?
>
> Looking in /hypertable/tables/X/default/, I see 4313 directories, which
> I guess correspond to the ranges. If what you're saying is true, then
> that one server has all the ranges. When I was looking at the METADATA
> table earlier, I seem to remember that the ranges were spread around as
> far as the METADATA table was concerned. I can't verify that now,
> because half of the RangeServers in the cluster went down after I tried
> the 15-way load last night. Maybe these log directories indicate that
> each range was created on this one server but isn't necessarily still
> hosted there.
>
> Looking in the table range directories, I see that most of them are
> empty. Of the 4313 range directories, only 12 have content, with the
> following size distribution:
>
> Name                        Size in bytes
> 71F33965BA815E48705DB484           772005
> D611DD0EE66B8CF9FB4AA997         40917711
> 38D1E3EA8AD2F6D4BA9A4DF8         74199178
> AB2A0D28DE6B77FFDD6C72AF     659455660576
> 4F07C111DD9998285C68F405              900
> F449F89DDE481715AE83F46C         29046097
> 1A0950A7883F9AC068C6B5FD         54621737
> 9213BEAADBFF69E633617D98              900
> 6224D36D9A7D3C5B4AE941B2        131677668
> 6C33339858EDF470B771637C        132973214
> 64365528C0D82ED25FC7FFB0        170159530
> C874EFC44725DB064046A0FF              900
>
> It's really skewed, but maybe that isn't a big deal. I'm going to guess
> that the ~660 GB entry corresponds to the end range of the table, where
> most of the data gets created. When a split happens, the new range holds
> a reference to the files in the original range and never needs to do a
> compaction into its own data space.
>
> As for the log recovery process: when I wrote the last message, the
> recovery was still happening and had been running for 115 minutes. I let
> it continue to see if it would actually finish, and it did. Looking at
> the log, it took around 180 minutes to complete and get back to the
> outstanding scanner request, which had long since timed out. After the
> recovery, the server is back up to 2.8 GB of memory. The log directory
> still contains the 4300+ split directories, and the user commit log
> directory still contains 350+ GB of data.
>
> You suggested that the log data is supposed to be cleaned up. I'm using
> a post-0.9.0.10 build (v0.9.0.10-14-g50e5f71, to be exact). It contains
> what I think is the patch you're referencing:
>
> commit 38bbfd60d1a52aff3230dea80aa4f3c0c07daae4
> Author: Donald <[EMAIL PROTECTED]>
>     Fixed a bug in RangeServer::schedule_log_cleanup_compactions that
>     prevents log cleanup com...
>
> I'm hoping the maintenance task threads weren't too busy for this
> workload, as it was pretty light: a 15-server cluster with a single
> active client writing to the table and nobody reading from it. Like I
> said earlier, I tried a 15-way write after the recovery completed and
> half the RangeServers died. It looks like they all lost their Hyperspace
> lease, and the Hyperspace.Master machine was 80% in iowait with a load
> average of 20 for a while. That server hosts an HDFS datanode, a
> RangeServer, and Hyperspace.Master. Maybe Hyperspace.Master needs a
> dedicated server? I should probably take that issue to another thread.
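For reference, the two knobs discussed here would go in hypertable.cfg,
along these lines. The MaintenanceThreads property name is the one quoted
later in this thread; the Hyperspace expiry property name is an
assumption, so check the name your build actually uses:

# hypertable.cfg -- illustrative values only
# Hyperspace session/lease expiry, per the 60-second suggestion above.
# NOTE: this property name is an assumption; verify it in your build.
Hyperspace.Lease.Interval=60

# Range server maintenance worker threads (name as quoted in this
# thread); raise this if compactions and log cleanup fall behind.
Hypertable.RangeServer.MaintenanceThreads=4

The value 4 is only a placeholder; tune it to the cores left over after
HDFS and the range server take their share.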
> I'll look into it further, probably tomorrow.
>
> Josh
>
> On Sat, Sep 6, 2008 at 9:29 PM, Liu Kejia(Donald) <[EMAIL PROTECTED]>
> wrote:
>>
>> Hi Josh,
>>
>> The 4311 directories are for split logs; they are used while a range
>> is splitting in two. This indicates you have at least 4K+ ranges on
>> that server, which is pretty big (I usually have several hundred per
>> server). The 3670 files are commit log files. Taking 115 minutes to
>> replay roughly 350 GB of logs is actually quite good performance:
>> 351,031,700,665 bytes over 6,900 seconds works out to about 50 MB/s.
>> The problem is that most of these commit log files should be removed
>> over time, after compactions of the ranges take place. Ideally you'll
>> only have one or two of these files left after all the maintenance
>> tasks are done, in which case the replay process only takes a few
>> seconds.
>>
>> One reason the commit log files are not getting reclaimed is a bug in
>> the range server code; I've pushed out a fix for it, and it should be
>> included in the latest 0.9.0.10 release. Another reason could be that
>> your maintenance task threads are too busy to get the work done in
>> time; you can try increasing the number of maintenance threads by
>> setting Hypertable.RangeServer.MaintenanceThreads in your
>> hypertable.cfg file.
>>
>> About load balance, I think your guess is right. About HDFS, it seems
>> HDFS always tries to put one copy of each file block on the local
>> datanode. That gives good write performance, but certainly poor load
>> balance if you keep writing from a single server.
>>
>> Donald
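A rough sketch of the cleanup rule Donald describes, with hypothetical
types and names (not the actual RangeServer code): a commit log fragment
becomes removable once every range has compacted past the fragment's
newest update, so the oldest unflushed revision across all ranges
determines what can be deleted.

#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <limits>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch: a commit log fragment can be deleted once every
// range has compacted (flushed) all updates as old as the fragment's
// newest revision.
struct LogFragment {
  uint32_t id;            // fragment file number in .../log/user/
  int64_t last_revision;  // newest update revision in this fragment
};

// earliest_unflushed maps range name -> oldest revision still only in
// the commit log (i.e. not yet persisted by a compaction).
std::vector<uint32_t>
removable_fragments(const std::vector<LogFragment> &fragments,
                    const std::map<std::string, int64_t> &earliest_unflushed) {
  int64_t oldest_needed = std::numeric_limits<int64_t>::max();
  for (const auto &entry : earliest_unflushed)
    oldest_needed = std::min(oldest_needed, entry.second);

  std::vector<uint32_t> removable;
  for (const auto &frag : fragments)
    if (frag.last_revision < oldest_needed)  // no range still needs it
      removable.push_back(frag.id);
  return removable;
}

int main() {
  std::vector<LogFragment> frags = {{1, 100}, {2, 200}, {3, 300}};
  std::map<std::string, int64_t> unflushed = {{"rangeA", 250}};
  for (uint32_t id : removable_fragments(frags, unflushed))
    std::printf("fragment %u is safe to delete\n", id);  // prints 1, 2
}

Raising MaintenanceThreads simply makes the compactions that advance
each range's earliest unflushed revision happen sooner, so fragments
become removable faster.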
>> On Sun, Sep 7, 2008 at 10:20 AM, Joshua Taylor <[EMAIL PROTECTED]>
>> wrote:
>> > I had a RangeServer process that was taking up around 5.8 GB of
>> > memory, so I shut it down and restarted it. The RangeServer has spent
>> > the last 80 CPU-minutes (>115 minutes on the clock) in
>> > local_recover(). Is this normal?
>> >
>> > Looking around HDFS, I see around 3670 files in the server's
>> > /.../log/user/ directory, most of which are around 100 MB in size
>> > (total directory size: 351,031,700,665 bytes). I also see 4311
>> > directories in the parent directory, of which 4309 are named with a
>> > 24-character hex string. Spot inspection shows that most (all?) of
>> > these contain a single zero-byte file named "0".
>> >
>> > The RangeServer log file since the restart currently contains over
>> > 835,000 lines. The bulk seems to be lines like:
>> >
>> > 1220752472 INFO Hypertable.RangeServer :
>> > (/home/josh/hypertable/src/cc/Hypertable/RangeServer/RangeServer.cc:1553)
>> > replay_update - length=30
>> > 1220752472 INFO Hypertable.RangeServer :
>> > (/home/josh/hypertable/src/cc/Hypertable/RangeServer/RangeServer.cc:1553)
>> > replay_update - length=30
>> >
>> > The memory usage may be the same issue that Donald was reporting
>> > earlier in his discussion of fragmentation. The new RangeServer
>> > process has grown to 1.5 GB of memory again, but the max cache size
>> > is 200 MB (the default).
>> >
>> > I'd been loading into a 15-node Hypertable cluster all week using a
>> > single loader process. I'd loaded about 5 billion cells, or around
>> > 1.5 TB of data, before I decided to kill the loader because it was
>> > taking too long (and that one server was getting huge). The total
>> > data set is around 3.5 TB, and it took under a week to generate the
>> > original set (using 15-way parallelism, not just a single loader), so
>> > I decided to try loading the rest in a distributed manner.
>> >
>> > The loading was happening in ascending row order, and it seems like
>> > all of it was happening on the same server. I'm guessing that when
>> > splits happened, the low range got moved off and the same server
>> > continued to load the end range. That would explain why one server
>> > was getting all the traffic.
>> >
>> > Looking at HDFS disk usage, the loaded server has 954 GB of disk used
>> > for Hadoop, while the other 14 all have around 140 GB. This behavior
>> > also has me wondering what happens when that one machine fills up
>> > (another couple hundred GB). Does the whole system crash, or does
>> > HDFS get smarter about balancing?
>> >
>> > Josh
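On the ascending-row-order hotspot Josh describes above: a common
client-side workaround, not a Hypertable feature, is to salt the row key
so a monotonically increasing key stream is spread over several ranges.
A minimal sketch with made-up names, and with the caveat that scans in
original key order then have to merge across buckets:

#include <cstdio>
#include <functional>
#include <initializer_list>
#include <string>

// Hypothetical workaround: prefix each row key with a small
// hash-derived bucket so consecutive keys spread across several ranges
// instead of always hitting the last one.
std::string salted_row_key(const std::string &row, unsigned buckets) {
  unsigned bucket = std::hash<std::string>{}(row) % buckets;
  char prefix[8];
  std::snprintf(prefix, sizeof(prefix), "%02u_", bucket);
  return prefix + row;
}

int main() {
  // Consecutive keys land in (usually) different buckets/ranges.
  for (const char *row : {"row000001", "row000002", "row000003"})
    std::printf("%s -> %s\n", row, salted_row_key(row, 16).c_str());
}

The cost falls on the read side: an ordered scan must open one cursor
per bucket and merge, so this only pays off for write-heavy loads like
the one in this thread.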
