Hi Donald,

Thanks for the insights!  It's interesting that the server has so many
ranges loaded on it.  Does Hypertable not yet redistribute ranges for
balancing?

Looking in /hypertable/tables/X/default/, I see 4313 directories, which I
guess correspond to the ranges.  If what you're saying is true, then that
one server has all the ranges.  When I was looking at the METADATA table
earlier, I seem to remember that the ranges were spread around as far as
the METADATA table was concerned.  I can't verify that now because half of
the RangeServers in the cluster went down after I tried the 15-way load
last night.  Maybe these log directories just indicate that each range was
created on this one server, but each range isn't necessarily still hosted
there.

Looking in the table range directories, I see that most of them are empty.
Of the 4313 range directories, only 12 have content, with the following
size distribution:

Name                       Size in bytes
71F33965BA815E48705DB484          772005
D611DD0EE66B8CF9FB4AA997        40917711
38D1E3EA8AD2F6D4BA9A4DF8        74199178
AB2A0D28DE6B77FFDD6C72AF    659455660576
4F07C111DD9998285C68F405             900
F449F89DDE481715AE83F46C        29046097
1A0950A7883F9AC068C6B5FD        54621737
9213BEAADBFF69E633617D98             900
6224D36D9A7D3C5B4AE941B2       131677668
6C33339858EDF470B771637C       132973214
64365528C0D82ED25FC7FFB0       170159530
C874EFC44725DB064046A0FF             900
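
(In case anyone wants to reproduce that listing, a quick script along the
following lines should do it.  This is only a sketch: the table path is the
one mentioned above, and the exact output format of "hadoop fs -du" varies
between Hadoop versions, so the parsing is deliberately loose.)

#!/usr/bin/env python
# Sketch: tally the total size of each range directory under a table
# directory by shelling out to "hadoop fs -du".  The column order of the
# -du output differs between Hadoop versions, so this just picks out the
# numeric field and the path field from each line.
import subprocess

TABLE_DIR = "/hypertable/tables/X/default"   # the table directory above

def range_sizes(table_dir):
    out = subprocess.check_output(["hadoop", "fs", "-du", table_dir])
    sizes = {}
    for line in out.decode().splitlines():
        fields = line.split()
        size = next((int(f) for f in fields if f.isdigit()), None)
        path = next((f for f in fields if "/" in f), None)
        if size is not None and path is not None:
            sizes[path.rsplit("/", 1)[-1]] = size
    return sizes

if __name__ == "__main__":
    sizes = range_sizes(TABLE_DIR)
    nonempty = dict((n, s) for n, s in sizes.items() if s > 0)
    print("%d of %d range dirs have content" % (len(nonempty), len(sizes)))
    for name, size in sorted(nonempty.items(), key=lambda kv: kv[1]):
        print("%s  %d" % (name, size))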

It's really skewed, but maybe this isn't a big deal.  I'm going to guess
that the 650 GB slice corresponds to the end range of the table, since most
of the data gets created there.  When a split happens, the new range holds
a reference to the files in the original range and never needs to do a
compaction into its own data space.
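
(To make sure I'm describing that right, here's a toy sketch of my mental
model of a split.  It's not the actual Hypertable code, just an
illustration of why a child range can be nearly empty on disk: both
children keep referencing the parent's files until a compaction writes new
ones.)

# Toy model only; not Hypertable's implementation.
class Range(object):
    def __init__(self, start_row, end_row, files):
        self.start_row = start_row
        self.end_row = end_row
        self.files = files                  # shared on-disk file list

def split(rng, split_row):
    # Both children reference the same files; no data is copied.
    low = Range(rng.start_row, split_row, rng.files)
    high = Range(split_row, rng.end_row, rng.files)
    return low, high

def compact(rng):
    # Only now does the range get a file of its own, in its own directory.
    rng.files = ["cellstore-%s-%s" % (rng.start_row, rng.end_row)]

if __name__ == "__main__":
    parent = Range("a", "z", ["CS0"])
    low, high = split(parent, "m")
    print(low.files is high.files)          # True: same file references
    compact(low)
    print(low.files, high.files)            # low now has its own file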

As for the log recovery process: when I wrote the last message, the
recovery was still happening and had been running for 115 minutes.  I let it
continue to run to see if it would actually finish, and it did.  Looking at
the log, it appears that it actually took around 180 minutes to complete and
get back to the outstanding scanner request, which had long since timed
out.  After the recovery, the server is back up to 2.8 GB of memory.  The
log directory still contains the 4300+ split directories, and the user
commit log directory still contains 350+ GB of data.
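
(For what it's worth, if the full ~350 GB of user commit log had to be
replayed, 180 minutes works out to roughly 351,000,000,000 bytes / 10,800
seconds, or about 32 MB/s of sustained replay throughput.)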

You mentioned that the log data is supposed to be cleaned up.  I'm using a
post-0.9.0.10 build (v0.9.0.10-14-g50e5f71 to be exact).  It contains what
I think is the patch you're referencing:

commit 38bbfd60d1a52aff3230dea80aa4f3c0c07daae4
Author: Donald <[EMAIL PROTECTED]>

    Fixed a bug in RangeServer::schedule_log_cleanup_compactions that
    prevents log cleanup com...
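
(For anyone else who wants to double-check their build, something like
"git branch --contains 38bbfd6" or "git log v0.9.0.10.." should show
whether the fix is present; the -14- in the build string above just means
14 commits on top of the v0.9.0.10 tag.)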

I'm hoping the maintenance task threads weren't too busy for this workload,
as it was pretty light.  This is a 15-server cluster with a single active
client writing to the table and nobody reading from it.  Like I said
earlier, I tried a 15-way write after the recovery completed and half the
RangeServers died.  It looks like they all lost their Hyperspace leases,
and the Hyperspace.master machine was spending 80% of its time in iowait
with a load average of 20 for a while.  That machine hosts an HDFS data
node, a RangeServer, and Hyperspace.master.  Maybe Hyperspace.master needs
a dedicated server?  I should probably take that issue to another thread.
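
(If the maintenance threads do turn out to be the bottleneck, I'll try
raising the setting you mentioned in hypertable.cfg, e.g. a line like

Hypertable.RangeServer.MaintenanceThreads=4

though that value is just a guess on my part; I don't know what a sensible
number is.)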

I'll look into it further, probably tomorrow.

Josh



On Sat, Sep 6, 2008 at 9:29 PM, Liu Kejia (Donald) <[EMAIL PROTECTED]> wrote:

>
> Hi Josh,
>
> The 4311 directories are for split logs; they are used while a range
> is splitting into two.  This indicates you have at least 4K+ ranges
> on that server, which is pretty big (I usually have several hundred
> per server).  The 3670 files are commit log files.  I think it's
> actually quite good performance to take 115 minutes to replay a total
> of ~350G of logs; that's about 50MB/s of throughput.  The problem is
> that many of these commit log files should be removed over time, after
> compactions of the ranges take place.  Ideally you'll only have 1 or 2
> of these files left after all the maintenance tasks are done.  If so,
> the replay process only takes several seconds.
>
> One reason the commit log files are not getting reclaimed is a bug in
> the range server code; I've pushed out a fix for it and it should be
> included in the latest 0.9.0.10 release.  Another reason could be that
> your maintenance task threads are too busy to get the work done in
> time; you can try increasing the number of maintenance threads by
> setting Hypertable.RangeServer.MaintenanceThreads in your
> hypertable.cfg file.
>
> About load balance, I think your guess is right.  About HDFS, it seems
> HDFS always tries to put one copy of each file block on the local
> datanode.  That gives good performance, but certainly bad load balance
> if you keep writing from one server.
>
> Donald
>
> On Sun, Sep 7, 2008 at 10:20 AM, Joshua Taylor <[EMAIL PROTECTED]> wrote:
> > I had a RangeServer process that was taking up around 5.8 GB of
> > memory, so I shut it down and restarted it.  The RangeServer has spent
> > the last 80 CPU-minutes (>115 minutes on the clock) in
> > local_recover().  Is this normal?
> >
> > Looking around HDFS, I see around 3670 files in the server's
> > /.../log/user/ directory, most of which are around 100 MB in size
> > (total directory size: 351,031,700,665 bytes).  I also see 4311
> > directories in the parent directory, of which 4309 are named with a
> > 24-character hex string.  Spot inspection shows that most (all?) of
> > these contain a single 0-byte file named "0".
> >
> > The RangeServer log file since the restart currently contains over
> > 835,000 lines.  The bulk seems to be lines like:
> >
> > 1220752472 INFO Hypertable.RangeServer :
> > (/home/josh/hypertable/src/cc/Hypertable/RangeServer/RangeServer.cc:1553)
> > replay_update - length=30
> > 1220752472 INFO Hypertable.RangeServer :
> > (/home/josh/hypertable/src/cc/Hypertable/RangeServer/RangeServer.cc:1553)
> > replay_update - length=30
> > 1220752472 INFO Hypertable.RangeServer :
> > (/home/josh/hypertable/src/cc/Hypertable/RangeServer/RangeServer.cc:1553)
> > replay_update - length=30
> > 1220752472 INFO Hypertable.RangeServer :
> > (/home/josh/hypertable/src/cc/Hypertable/RangeServer/RangeServer.cc:1553)
> > replay_update - length=30
> > 1220752472 INFO Hypertable.RangeServer :
> > (/home/josh/hypertable/src/cc/Hypertable/RangeServer/RangeServer.cc:1553)
> > replay_update - length=30
> >
> > The memory usage may be the same issue that Donald was reporting
> > earlier in his discussion of fragmentation.  The new RangeServer
> > process has grown up to 1.5 GB of memory again, but the max cache size
> > is 200 MB (default).
> >
> > I'd been loading into a 15-node Hypertable cluster all week using a
> > single loader process.  I'd loaded about 5 billion cells, or around
> > 1.5 TB of data, before I decided to kill the loader because it was
> > taking too long (and that one server was getting huge).  The total
> > data set size is around 3.5 TB and it took under a week to generate
> > the original set (using 15-way parallelism, not just a single loader),
> > so I decided to try to load the rest in a distributed manner.
> >
> > The loading was happening in ascending row order.  It seems like all
> > of the loading was happening on the same server.  I'm guessing that
> > when splits happened, the low range got moved off, and the same server
> > continued to load the end range.  That might explain why one server
> > was getting all the traffic.
> >
> > Looking at HDFS disk usage, the loaded server has 954 GB of disk used
> > for Hadoop and the other 14 all have around 140 GB of disk usage.
> > This behavior also has me wondering what happens when that one machine
> > fills up (another couple hundred GB).  Does the whole system crash, or
> > does HDFS get smarter about balancing?
> >
> > Josh
