Hi Josh,

As far as I know, Hypertable doesn't redistribute ranges yet. The only
time a range gets loaded onto another server is after a split: the lower
half of the split range is moved to a target server, which is chosen in
round-robin fashion among all live servers registered in Hyperspace.

To see exactly which ranges are loaded on a range server, you can use
the rsstat or rsdump tool in the hypertable/bin directory. You may also
want to try my stupid web monitoring tool; you can download it from
http://groups.google.com/group/hypertable-dev/files

About Hyperspace, I suggest setting the session expiration time to 60
seconds. I also think putting Hypertable.Master + Hyperspace.Master +
the HDFS NameNode on a dedicated server would be a good idea.
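
In hypertable.cfg it would look something like this (I'm writing the
property name and units from memory, so please double-check them against
the sample config shipped with your release):

  # Hyperspace session expiration, assuming the value is in milliseconds
  Hyperspace.Lease.Interval=60000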

Donald

On Mon, Sep 8, 2008 at 4:11 AM, Joshua Taylor <[EMAIL PROTECTED]> wrote:
> Hi Donald,
>
> Thanks for the insights!  That's interesting that the server has so many
> ranges loaded on it.  Does Hypertable not yet redistribute ranges for
> balancing?
>
> Looking in /hypertable/tables/X/default/, I see 4313 directories, which I
> guess correspond to the ranges.  If what you're saying is true, then that
> one server has all the ranges.  When I was looking at the METADATA table
> earlier, I seem to remember that the ranges were spread around, at least as
> far as the METADATA table was concerned.  I can't verify that now because
> half of the RangeServers in the cluster went down after I tried the 15-way
> load last night.  Maybe these log directories indicate that each range was
> created on this one server, but isn't necessarily still hosted there.
>
> Looking in the table range directories, I see that most of them are empty.
> Of the 4313 table range directories, only 12 have content, with the
> following size distribution:
>
> Name                     Size in bytes
> 71F33965BA815E48705DB484 772005
> D611DD0EE66B8CF9FB4AA997 40917711
> 38D1E3EA8AD2F6D4BA9A4DF8 74199178
> AB2A0D28DE6B77FFDD6C72AF 659455660576
> 4F07C111DD9998285C68F405 900
> F449F89DDE481715AE83F46C 29046097
> 1A0950A7883F9AC068C6B5FD 54621737
> 9213BEAADBFF69E633617D98 900
> 6224D36D9A7D3C5B4AE941B2 131677668
> 6C33339858EDF470B771637C 132973214
> 64365528C0D82ED25FC7FFB0 170159530
> C874EFC44725DB064046A0FF 900
>
> It's really skewed, but maybe this isn't a big deal.  I'm going to guess
> that the 650 GB slice corresponds to the end range of the table, since most
> of the data gets created there.  When a split happens, the new range holds a
> reference to the files in the original range and never needs to do a
> compaction into its own data space.
>
> As for the log recovery process...  when I wrote the last message, the
> recovery was still happening and had been running for 115 minutes.  I let it
> continue to run to see if it would actually finish, and it did.  Looking at
> the log, it appears that it actually took around 180 minutes to complete and
> get back to the outstanding scanner request, which had long since timed
> out.  After the recovery, the server is back up to 2.8 GB of memory.  The
> log directory still contains the 4300+ split directories, and the user
> commit log directory still contains 350+ GB of data.
>
> You suggest that the log data is supposed to be cleaned up.  I'm using a
> post-0.9.0.10 build (v0.9.0.10-14-g50e5f71 to be exact).  It contains what I
> think is the patch you're referencing:
> commit 38bbfd60d1a52aff3230dea80aa4f3c0c07daae4
> Author: Donald <[EMAIL PROTECTED]>
>     Fixed a bug in RangeServer::schedule_log_cleanup_compactions that
> prevents log cleanup com...
>
> I'm hoping the maintenance task threads weren't too busy for this workload,
> as it was pretty light.  This is a 15 server cluster with a single active
> client writing to the table and nobody reading from the table.  Like I said
> earlier, I tried a 15-way write after the recovery completed and half the
> RangeServers died.  It looks like they all lost their Hyperspace lease, and
> the Hyperspace.master machine was 80% in the iowait state with a load
> average of 20 for a while.  That server hosts an HDFS data node, a
> RangeServer, and Hyperspace.master.  Maybe Hyperspace.master needs a
> dedicated server?  I should probably take that issue to another thread.
>
> I'll look into it further, probably tomorrow.
>
> Josh
>
>
>
> On Sat, Sep 6, 2008 at 9:29 PM, Liu Kejia(Donald) <[EMAIL PROTECTED]>
> wrote:
>>
>> Hi Josh,
>>
>> The 4311 directories are for split logs; they are used while a range
>> is splitting into two. This indicates you have at least 4K+ ranges on
>> that server, which is pretty big (I usually have several hundred per
>> server). The 3670 files are commit log files. Replaying roughly 350 GB
>> of logs in 115 minutes is actually quite good performance:
>> 351,031,700,665 bytes / 6,900 seconds works out to about 50 MB/s of
>> throughput. The problem is that many of these commit log files should
>> have been removed over time, as compactions of the ranges take place.
>> Ideally you'll only have 1 or 2 of these files left after all the
>> maintenance tasks are done, in which case the replay process only
>> takes a few seconds.
>>
>> One reason the commit log files are not getting reclaimed is a bug in
>> the range server code; I've pushed out a fix for it, and it should be
>> included in the latest 0.9.0.10 release. Another reason could be that
>> your maintenance task threads are too busy to get the work done in
>> time. You can try increasing the number of maintenance threads by
>> setting Hypertable.RangeServer.MaintenanceThreads in your
>> hypertable.cfg file, for example as sketched below.
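>>
>> This is just a sketch; the thread count here is an illustrative value,
>> not a recommendation, so tune it for your hardware:
>>
>>   # in hypertable.cfg: number of RangeServer maintenance threads
>>   Hypertable.RangeServer.MaintenanceThreads=4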
>>
>> About load balance, I think your guess is right. As for HDFS, it seems
>> HDFS always tries to put one copy of each file block on the local
>> datanode. That gives good write performance, but certainly poor load
>> balance if you keep writing from a single server.
>>
>> Donald
>>
>> On Sun, Sep 7, 2008 at 10:20 AM, Joshua Taylor <[EMAIL PROTECTED]>
>> wrote:
>> > I had a RangeServer process that was taking up around 5.8 GB of memory,
>> > so I shut it down and restarted it.  The RangeServer has spent the last
>> > 80 CPU-minutes (>115 minutes on the clock) in local_recover().  Is this
>> > normal?
>> >
>> > Looking around HDFS, I see around 3670 files in the server's
>> > /.../log/user/ directory, most of which are around 100 MB in size
>> > (total directory size: 351,031,700,665 bytes).  I also see 4311
>> > directories in the parent directory, of which 4309 are named with a
>> > 24-character hex string.  Spot inspection of these shows that most
>> > (all?) of them contain a single 0-byte file named "0".
>> >
>> > The RangeServer log file since the restart currently contains over
>> > 835,000 lines.  The bulk seems to be lines like:
>> >
>> > 1220752472 INFO Hypertable.RangeServer :
>> >     (/home/josh/hypertable/src/cc/Hypertable/RangeServer/RangeServer.cc:1553) replay_update - length=30
>> > 1220752472 INFO Hypertable.RangeServer :
>> >     (/home/josh/hypertable/src/cc/Hypertable/RangeServer/RangeServer.cc:1553) replay_update - length=30
>> > 1220752472 INFO Hypertable.RangeServer :
>> >     (/home/josh/hypertable/src/cc/Hypertable/RangeServer/RangeServer.cc:1553) replay_update - length=30
>> > 1220752472 INFO Hypertable.RangeServer :
>> >     (/home/josh/hypertable/src/cc/Hypertable/RangeServer/RangeServer.cc:1553) replay_update - length=30
>> > 1220752472 INFO Hypertable.RangeServer :
>> >     (/home/josh/hypertable/src/cc/Hypertable/RangeServer/RangeServer.cc:1553) replay_update - length=30
>> >
>> > The memory usage may be the same issue that Donald was reporting
>> > earlier in his discussion of fragmentation.  The new RangeServer
>> > process has already grown back up to 1.5 GB of memory, but the max
>> > cache size is 200 MB (the default).
>> >
>> > I'd been loading into a 15-node Hypertable cluster all week using a
>> > single loader process.  I'd loaded about 5 billion cells, or around
>> > 1.5 TB of data, before I decided to kill the loader because it was
>> > taking too long (and that one server was getting huge).  The total data
>> > set size is around 3.5 TB, and it took under a week to generate the
>> > original set (using 15-way parallelism, not just a single loader), so I
>> > decided to try loading the rest in a distributed manner.
>> >
>> > The loading was happening in ascending row order.  It seems like all
>> > of the loading was happening on the same server.  I'm guessing that
>> > when splits happened, the low range got moved off, and the same server
>> > continued to load the end range.  That might explain why one server was
>> > getting all the traffic.
>> >
>> > Looking at HDFS disk usage, the heavily loaded server has 954 GB of
>> > disk used for Hadoop, and the other 14 all have around 140 GB of disk
>> > usage.  This behavior also has me wondering what happens when that one
>> > machine fills up (in another couple hundred GB).  Does the whole system
>> > crash, or does HDFS get smarter about balancing?
>> >
>> > Josh
>> >
>> >
>> > >
>> >
>>
>>
>
>
> >
>
