P.S. The memory exhaustion problem will be fixed in the 0.9.2.4 release.

On Fri, Mar 20, 2009 at 10:37 AM, Doug Judd <[email protected]> wrote:

> With the help of Earle Ady, we've found and fixed the large load corruption
> problem with the 0.9.2.2 release.  To get the fixed version, please pull the
> latest code from the git repository
> <http://code.google.com/p/hypertable/wiki/SourceCode?tm=4>.
> We'll be releasing 0.9.2.3 soon.
>
> Here's a summary of the problem:
>
> With the fix of issue 246
> <http://code.google.com/p/hypertable/issues/detail?id=246>,
> compactions are now happening regularly, as they should.  However, this has
> added substantial load to the system.  When a range split and the Master was
> notified of the newly split-off range, the Master selected (round-robin) a
> new RangeServer to own the range.  However, due to the increased load on the
> system and a 30-second hardcoded timeout in the Master, the
> RangeServer::load_range() command was timing out (it was taking 32 to 37
> seconds).  This timeout was reported back to the originating RangeServer,
> which paused for fifteen seconds and tried again.  But on the second
> attempt to notify the Master of the newly split-off range, the Master would
> (round-robin) select another RangeServer and invoke
> RangeServer::load_range() on that (different) server.  This had the effect
> of the same range being loaded by three different RangeServers, which was
> wreaking havoc on the system.  There were two fixes for this problem:
>
> 1. The hardcoded timeout was removed, and (almost) all timeouts in the
> system are now based on the "Hypertable.Request.Timeout" property, which
> has a default value of 180 seconds.
>
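> For reference, explicitly setting this property in hypertable.cfg would
> presumably look like the line below.  This is only a sketch: it assumes the
> same key=value syntax as the FlushDelay property mentioned later in this
> message, and that the value is given in seconds as described above; 180 is
> simply the documented default.
>
> Hypertable.Request.Timeout=180
>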
> 2. An interim fix was put in place in the Master whereby, upon
> RangeServer::load_range() failure, the Master will remember which RangeServer
> it attempted to do the load on.  The next time it gets notified and attempts
> to load the same range, it will choose the same RangeServer.  If it gets back
> the error RANGE_ALREADY_LOADED, it will interpret that as success.  The
> reason this fix is interim is that it does not persist the
> Range-to-RangeServer mapping information, so if the Master were to fail at an
> inopportune time and come back up, we'd be subject to the same failure.
> This will get fixed with Issue 74 - Master directed RangeServer recovery
> <http://code.google.com/p/hypertable/issues/detail?id=79>,
> since the Master will have a meta-log and will be able to persist
> this mapping as re-constructible state information.
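>
> To make the interim scheme concrete, here is a rough C++ sketch of the idea.
> It is illustrative only, not the actual Master code; the function names,
> types, and stub RPCs below are all hypothetical:
>
>   #include <map>
>   #include <string>
>
>   enum class LoadResult { OK, TIMEOUT, RANGE_ALREADY_LOADED };
>
>   // Stand-in for the real RangeServer::load_range() RPC (hypothetical
>   // signature); under heavy load the real call can time out.
>   LoadResult issue_load_range(const std::string & /*server*/,
>                               const std::string & /*range*/) {
>     return LoadResult::TIMEOUT;
>   }
>
>   // Stand-in for round-robin RangeServer selection.
>   std::string next_server_round_robin() {
>     static const char *servers[] = {"rs1", "rs2", "rs3"};
>     static int i = 0;
>     return servers[i++ % 3];
>   }
>
>   // In-memory only, which is exactly why the fix is interim: this mapping
>   // is lost if the Master goes down at an inopportune time.
>   static std::map<std::string, std::string> last_attempted_server;
>
>   // Called each time a RangeServer notifies the Master of a split-off range.
>   bool handle_split_notification(const std::string &range) {
>     auto it = last_attempted_server.find(range);
>     std::string server = (it != last_attempted_server.end())
>                              ? it->second              // retry same server
>                              : next_server_round_robin();
>
>     LoadResult result = issue_load_range(server, range);
>
>     // RANGE_ALREADY_LOADED from that server means an earlier attempt
>     // actually succeeded (it only appeared to fail), so treat it as OK.
>     if (result == LoadResult::OK ||
>         result == LoadResult::RANGE_ALREADY_LOADED) {
>       last_attempted_server.erase(range);
>       return true;
>     }
>
>     // Remember which server was tried so the next notification for this
>     // range goes back to it rather than to a different round-robin pick.
>     last_attempted_server[range] = server;
>     return false;   // the originating RangeServer will retry later
>   }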
>
> After we fixed this problem, the next problem that Earle ran into was that
> the RangeServer was exhausting memory and crashing.  To fix this, we added
> the following property to the hypertable.cfg file on the machine that was
> doing the LOAD DATA INFILE:
>
> Hypertable.Lib.Mutator.FlushDelay=100
>
> Keep this in mind if you encounter the same problem.
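>
> The idea, presumably, is that the mutator pauses for the configured delay
> (assumed here to be milliseconds; the units are not spelled out above) after
> each buffer flush, throttling the writer so the RangeServers can keep up.
> Conceptually it amounts to something like the C++ sketch below, which is a
> hypothetical illustration and not the Hypertable client library:
>
>   #include <chrono>
>   #include <cstddef>
>   #include <string>
>   #include <thread>
>   #include <vector>
>
>   // Hypothetical flush-delay throttle: buffer rows, and after each flush
>   // pause briefly so the RangeServers get breathing room for compactions.
>   void load_rows(const std::vector<std::string> &rows, int flush_delay_ms) {
>     const std::size_t flush_threshold = 1000;   // made-up batch size
>     std::vector<std::string> buffer;
>     for (const std::string &row : rows) {
>       buffer.push_back(row);
>       if (buffer.size() >= flush_threshold) {
>         // ... send `buffer` to the RangeServers here ...
>         buffer.clear();
>         std::this_thread::sleep_for(
>             std::chrono::milliseconds(flush_delay_ms));
>       }
>     }
>   }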
>
> - Doug
>
>
