Re: Cannot open filename Exceptions

Zheng Lv Wed, 24 Mar 2010 20:01:51 -0700

Hello Stack,
  Yesterday we hit another problem: a "zookeeper session expired" error
that led to a regionserver shutdown, which had never happened before.
  I googled it and found some docs, but I am still not certain how it
happens or how to avoid it.
  I have uploaded the corresponding logs to
http://rapidshare.com/files/367820690/208-0324.log.html.
  I look forward to your reply.
  Thank you.
    LvZheng


2010/3/24 Zheng Lv <lvzheng19800...@gmail.com>

> Hello Stack,
>   Thank you for your explanations, they are very helpful.
>   If I find anything new, I'll contact you.
>   Regards,
>     LvZheng
>
> 2010/3/24 Stack <st...@duboce.net>
>
>> On Tue, Mar 23, 2010 at 8:42 PM, Zheng Lv <lvzheng19800...@gmail.com> wrote:
>> > Hello Stack,
>> >  >So, for sure ugly stuff is going on.  I filed
>> >  >https://issues.apache.org/jira/browse/HBASE-2365.  It looks like we're
>> >  >doubly assigning a region.
>> >  Can you tell me how this happened in detail? Thanks a lot.
>> >
>>
>> Yes.
>>
>> Splits are run by the regionserver.  It figures a region needs to be
>> split and goes ahead closing the parent and creating the daughter
>> regions.  It then adds edits to the meta table offlining the parent
>> and inserting the two new daughter regions.  Next it sends a message
>> to the master telling it that a region has been split.   The message
>> contains names of the daughter regions.  On receipt of the message,
>> the master adds the new daughter regions to the unassigned regions
>> list so they'll be passed out the next time a regionserver checks in.
>>
>> Concurrently, the master is running a scan of the meta table every
>> minute making sure all is in order.  One thing it does: if it finds
>> unassigned regions, it adds them to the unassigned regions list (this
>> process is what gets all regions assigned after a startup).
>>
>> In your case, what's happening is that there is a long period between
>> the add of the new split regions to the meta table and the report of
>> split to the master.  During this time, the master meta scan ran,
>> found one of the daughters and went and assigned it.  Then the split
>> message came in and the daughter was assigned again!
>>
>> There was supposed to be protection against this happening IIRC.
>> Looking at responsible code, we are trying to defend against this
>> happening in ServerManager:
>>
>>  /*
>>   * Assign new daughter-of-a-split UNLESS it's already been assigned.
>>   * It could have been assigned already in the rare case where there was a
>>   * large gap between insertion of the daughter region into .META. by the
>>   * splitting regionserver and receipt of the split message in master (See
>>   * HBASE-1784).
>>   * @param hri Region to assign.
>>   */
>>  private void assignSplitDaughter(final HRegionInfo hri) {
>>    MetaRegion mr =
>>      this.master.regionManager.getFirstMetaRegionForRegion(hri);
>>    Get g = new Get(hri.getRegionName());
>>    g.addFamily(HConstants.CATALOG_FAMILY);
>>    try {
>>      HRegionInterface server =
>>        master.connection.getHRegionConnection(mr.getServer());
>>      Result r = server.get(mr.getRegionName(), g);
>>      // If size >= 3 -- presume regioninfo, startcode and server -- then
>>      // presume that this daughter is already assigned and return.
>>      if (r.size() >= 3) return;
>>    } catch (IOException e) {
>>      LOG.warn("Failed get on " + HConstants.CATALOG_FAMILY_STR +
>>        "; possible double-assignment?", e);
>>    }
>>    this.master.regionManager.setUnassigned(hri, false);
>>  }
>>
>> So, the above is not working in your case for some reason.   I'll take
>> a look but I'm not sure I can figure it out w/o DEBUG (thanks for letting
>> me know about the out-of-sync clocks... Now I can have more faith in
>> what the logs are telling me).
>>
>> >
>> >  >With DEBUG enabled have you been able to reproduce?
>> >  The exception has not appeared again these days; if it does, I'll show you
>> > the logs.
>> >
>>
>> For sure, if you come across it again, I'm interested.
>>
>> Thanks Zheng,
>> St.Ack
>>
>
>
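
For readers following the thread, the double assignment described in the
quoted message can be sketched as a toy model. This is not HBase code: the
class and method names (SplitRaceSketch, ToyMaster, metaScan, onSplitReport)
are hypothetical, .META. is reduced to a map, and an assignment is assumed to
record its server immediately, which the real master does not do. It only
shows the ordering of events (daughter inserted into meta, meta scan runs,
split report arrives late) and what a "check meta before assigning" guard is
meant to achieve.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these names are HBase classes.
final class SplitRaceSketch {

    static final class ToyMaster {
        // Stand-in for .META.: region name -> server (null means none recorded yet).
        private final Map<String, String> meta = new HashMap<>();
        // Every assignment handed out, in order.
        private final List<String> assignments = new ArrayList<>();
        // Whether the split-report path looks at meta before assigning.
        private final boolean checkMetaOnSplitReport;

        ToyMaster(boolean checkMetaOnSplitReport) {
            this.checkMetaOnSplitReport = checkMetaOnSplitReport;
        }

        // The splitting regionserver writes the daughter row before reporting the split.
        void regionServerInsertsDaughter(String region) {
            meta.put(region, null);
        }

        // Periodic scan: assign every region that has no server recorded.
        void metaScan() {
            for (String region : new ArrayList<>(meta.keySet())) {
                if (meta.get(region) == null) {
                    assign(region);
                }
            }
        }

        // Split report from the regionserver, possibly arriving after a scan.
        void onSplitReport(String daughter) {
            if (checkMetaOnSplitReport && meta.get(daughter) != null) {
                return; // already assigned by the scan; do nothing
            }
            assign(daughter);
        }

        // Simplification: the server is recorded in meta at once.
        private void assign(String region) {
            assignments.add(region);
            meta.put(region, "server-" + assignments.size());
        }

        long timesAssigned(String region) {
            return assignments.stream().filter(region::equals).count();
        }
    }

    // Replays the ordering from the thread and returns how often the daughter was assigned.
    static long run(boolean checkMetaOnSplitReport) {
        ToyMaster master = new ToyMaster(checkMetaOnSplitReport);
        master.regionServerInsertsDaughter("daughterA");
        master.metaScan();                 // scan wins the race
        master.onSplitReport("daughterA"); // late split report
        return master.timesAssigned("daughterA");
    }

    public static void main(String[] args) {
        System.out.println("without check: " + run(false)); // prints "without check: 2"
        System.out.println("with check: " + run(true));     // prints "with check: 1"
    }
}
```

In this model the guard works because the scan's assignment is visible in meta
by the time the split report arrives. Whether that holds in a real cluster is
exactly the open question in the thread.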
