[jira] Assigned: (DERBY-2338) improve page allocation when there are many threaded inserts to same table .

Mike Matrigali (JIRA) Tue, 29 Jun 2010 16:07:18 -0700

     [ https://issues.apache.org/jira/browse/DERBY-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Mike Matrigali reassigned DERBY-2338:
-------------------------------------

    Assignee: Mike Matrigali

> improve page allocation when there are many threaded inserts to same table .
> ----------------------------------------------------------------------------
>
>                 Key: DERBY-2338
>                 URL: https://issues.apache.org/jira/browse/DERBY-2338
>             Project: Derby
>          Issue Type: Improvement
>          Components: Store
>    Affects Versions: 10.2.2.0, 10.7.0.0
>            Reporter: Mike Matrigali
>            Assignee: Mike Matrigali
>         Attachments: fillrate.txt
>
>
> The Derby strategy for picking the page on which to insert a new row could be
> tuned to behave better when multiple users insert into the same table.
> The algorithm currently has the following information to use:
> 1) the number of the last page an insert was done on
> 2) a bit map indicating all the completely empty, but allocated pages
> 3) a bit map indicating all the allocated pages which are "half-filled",
> where "half-filled" is very loosely defined: basically, pages with some
> rows on them, where a previous insert has not failed because the row was
> too big.
> 4) two pointers to where we last left off in a linear search of half-filled
> pages and free pages.
> Derby chooses to optimize inserts for future select performance by trying to
> fit entire rows on pages.  To this end it uses a 3-try method
> when picking what page to insert a new row on (this is for base tables,
> referred to internally as heaps).  Note that currently, when the insert is
> attempted, the store does not know how long the row is until after it picks
> the page to insert on and streams the row into a log record.
> 1) try to insert the entire row on the last page an insert was attempted on.
> If it doesn't fit, move on to the next step.
> 2) try to insert the entire row onto a "half-filled" page.  If it doesn't
> fit, move on to the next step.
> 3) get an empty page, insert the row, and overflow any part that does not fit.
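
The three tries above can be sketched roughly as follows.  This is only a toy
model under stated assumptions, not the store's actual code: pages are reduced
to a free-byte count, the row length is assumed to be known up front (the
description notes that the store currently does not know it at this point), and
every name here (ThreeTryInsertSketch, PAGE_SIZE, halfFilledCursor,
overflowBytes) is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the 3-try page choice; all names are hypothetical.
class ThreeTryInsertSketch {
    static final int PAGE_SIZE = 100;            // assumed page capacity in bytes

    final List<Integer> free = new ArrayList<>(); // free bytes per page, index = page number
    int lastPage = -1;                            // page of the most recent insert, -1 if none
    int halfFilledCursor = 0;                     // where the half-filled search left off
    int overflowBytes = 0;                        // bytes that did not fit on the chosen page

    /** Returns the page number the row (or its first portion) was placed on. */
    int insert(int rowLen) {
        // Try 1: the last page an insert was done on.
        if (lastPage >= 0 && free.get(lastPage) >= rowLen) {
            return place(lastPage, rowLen);
        }
        // Try 2: one half-filled page, found by resuming the linear search.
        int candidate = nextHalfFilled();
        if (candidate >= 0 && free.get(candidate) >= rowLen) {
            return place(candidate, rowLen);
        }
        // Try 3: an empty page; whatever does not fit is counted as overflow.
        int page = emptyPage();
        int onPage = Math.min(rowLen, PAGE_SIZE);
        overflowBytes += rowLen - onPage;
        return place(page, onPage);
    }

    // Next page that has some rows and some free space, or -1 if there is none.
    private int nextHalfFilled() {
        for (int i = 0; i < free.size(); i++) {
            int p = (halfFilledCursor + i) % free.size();
            int f = free.get(p);
            if (f > 0 && f < PAGE_SIZE) {
                halfFilledCursor = (p + 1) % free.size();
                return p;
            }
        }
        return -1;
    }

    // Reuse an allocated but empty page if one exists, otherwise allocate one.
    private int emptyPage() {
        int p = free.indexOf(PAGE_SIZE);
        if (p < 0) {
            free.add(PAGE_SIZE);
            p = free.size() - 1;
        }
        return p;
    }

    private int place(int page, int bytes) {
        free.set(page, free.get(page) - bytes);
        lastPage = page;
        return page;
    }
}
```

Note that the single lastPage field is exactly the shared state that becomes a
bottleneck once several threads insert into the same table at the same time.
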
> There are a number of optimizations that could be done (if anyone chooses to 
> work on one it may be better to log a separate jira to track that
> specific project):
> 1) in memory, keep track of more than one last page.  In a multi-user
> environment it may be better, if one gets a latch wait on the picked page, to
> try a different page, and to keep a "group/queue" of last pages, maybe
> sorted by space available.  It would be good if such code were zero-admin,
> in that it could configure itself based on the dynamic concurrency it
> observes.  The current algorithm works pretty well until there are
> concurrent inserts by multiple threads into the same table at the same
> time (problems probably start showing up with either dual-core or
> real multi-processor machines).  Note that in the insert algorithm described
> above I believe there is a problem if many threads hit step 1 and each finds
> it can't insert on page 100, but then all choose some different page as part
> of step 2 and/or 3.  In particular, if there are no unfilled pages, each
> may allocate a new page, and only the page of the last thread to do its insert
> will be remembered as the "last page"; all subsequent inserts would start at
> that last page.
> 2) For steps 2 and 3 one may know the entire size of the row, or at the very
> least the minimum size of the row.  This info could be used to better
> pick a candidate page.
> 3) unfilled page tracking is very limited given only 1 bit per page in the
> allocation map: one can't really tell the difference between 1 byte left and
> (page size - 1) bytes left.  There are a couple of options.  One could expand
> the on-disk allocation maps.  The downside is more disk overhead, and an
> upgrade issue.  One could also just do a better job of maintaining in-memory
> information in the alloc cache as one reads pages from disk, and avoid the
> on-disk changes.  Just keeping a queue of recently seen unfilled pages,
> ordered inversely by space available, might be a big improvement.  The queue
> need not hold the actual pages, just page number/space available pairs.  The
> info could be treated as a hint, and so could avoid extra latching/concurrency
> issues.
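
One way the in-memory queue from optimization 3, combined with the minimum row
size from optimization 2, might look is sketched below.  It is an illustration
only, and it makes several assumptions: the class and method names
(UnfilledPageHints, note, pick) are hypothetical, the queue is bounded to a
fixed capacity, "inverse by space available" is read as "most free space
first", and a picked hint is removed so that concurrent inserters are steered
to different pages.  Because the entries are only hints, the caller would still
have to check the real free space after latching the page.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical in-memory hint queue: (page number, free bytes) pairs only.
class UnfilledPageHints {
    static final class Hint {
        final long page;
        final int free;
        Hint(long page, int free) { this.page = page; this.free = free; }
    }

    // Most free space first; page number breaks ties so entries stay distinct.
    private static final Comparator<Hint> MOST_SPACE_FIRST =
        Comparator.comparingInt((Hint h) -> h.free).reversed()
                  .thenComparingLong(h -> h.page);

    private final int capacity;
    private final Map<Long, Integer> freeByPage = new HashMap<>();
    private final TreeSet<Hint> bySpace = new TreeSet<>(MOST_SPACE_FIRST);

    UnfilledPageHints(int capacity) { this.capacity = capacity; }

    /** Record what was seen when a page was read; freeBytes <= 0 drops the hint. */
    synchronized void note(long page, int freeBytes) {
        Integer old = freeByPage.remove(page);
        if (old != null) {
            bySpace.remove(new Hint(page, old));
        }
        if (freeBytes <= 0) {
            return;
        }
        freeByPage.put(page, freeBytes);
        bySpace.add(new Hint(page, freeBytes));
        if (bySpace.size() > capacity) {
            Hint worst = bySpace.pollLast();   // evict the hint with the least space
            freeByPage.remove(worst.page);
        }
    }

    /** Returns a page believed to hold at least minRowLen bytes, or -1 if none. */
    synchronized long pick(int minRowLen) {
        if (bySpace.isEmpty() || bySpace.first().free < minRowLen) {
            return -1;
        }
        Hint best = bySpace.pollFirst();       // remove so other threads pick elsewhere
        freeByPage.remove(best.page);
        return best.page;
    }
}
```

A caller would invoke note() again with the remaining space once its insert is
done, which puts the page back in the queue at its new position.
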
> Please attach any other ideas.
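
As one such idea, the "group/queue" of last pages from optimization 1 could be
sketched as below: an inserter never waits on a busy page, it tries the next
page in the group and allocates a fresh one only if none is usable.  This is a
hedged sketch, not Derby code.  The names (LastPageGroupSketch, Page,
latchPageFor, insertAndUnlatch) are hypothetical, a java.util.concurrent
Semaphore with one permit stands in for a page latch, and the group is left
unbounded here where a real version would need to prune full pages.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Semaphore;

// Hypothetical group of "last pages" that avoids latch waits.
class LastPageGroupSketch {
    static final class Page {
        final int number;
        final Semaphore latch = new Semaphore(1); // stand-in for a page latch
        int freeBytes;                            // only touched while latched
        Page(int number, int freeBytes) {
            this.number = number;
            this.freeBytes = freeBytes;
        }
    }

    private final int pageSize;
    private final List<Page> lastPages = new ArrayList<>();
    private int nextPageNumber = 0;

    LastPageGroupSketch(int pageSize) { this.pageSize = pageSize; }

    /** Returns a latched page with room for rowLen; pair with insertAndUnlatch. */
    Page latchPageFor(int rowLen) {
        List<Page> snapshot;
        synchronized (this) {
            snapshot = new ArrayList<>(lastPages);
        }
        for (Page p : snapshot) {
            if (p.latch.tryAcquire()) {           // skip busy pages instead of waiting
                if (p.freeBytes >= rowLen) {
                    return p;
                }
                p.latch.release();
            }
        }
        synchronized (this) {
            // Every known page was busy or too full: allocate and remember a new one.
            Page fresh = new Page(nextPageNumber++, pageSize);
            fresh.latch.acquireUninterruptibly();
            lastPages.add(fresh);
            return fresh;
        }
    }

    /** Accounts for the inserted row and releases the latch. */
    void insertAndUnlatch(Page p, int rowLen) {
        p.freeBytes -= rowLen;
        p.latch.release();
    }
}
```

Unlike the single "last page", every page allocated under contention stays in
the group, so later inserts can still fill it.
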

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
