[
https://issues.apache.org/jira/browse/DERBY-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mike Matrigali reassigned DERBY-2338:
-------------------------------------
Assignee: Mike Matrigali
> improve page allocation when there are many threaded inserts to same table .
> ----------------------------------------------------------------------------
>
> Key: DERBY-2338
> URL: https://issues.apache.org/jira/browse/DERBY-2338
> Project: Derby
> Issue Type: Improvement
> Components: Store
> Affects Versions: 10.2.2.0, 10.7.0.0
> Reporter: Mike Matrigali
> Assignee: Mike Matrigali
> Attachments: fillrate.txt
>
>
> The derby strategy for picking the page to insert a new row on could be
> tuned to act better in a multi-user insert to same table environment.
> The algoriithm currently has the following information to use:
> 1) the number of the last page an insert was done on
> 2) a bit map indicating all the completely empty, but allocated pages
> 3) a bit map indicating all the allocated pages which are "half-filled",
> where half filled is very loosely defined. Basically pages with some
> rows on it, where a previous insert has not failed because it was too
> big.
> 4) 2 pointers to where we last left off in a linear search of half filled
> pages and free pages.
> Derby chooses to optimize inserts for future select performance by trying to
> fit entire rows on pages. To this end it uses a 3 try method
> when picking what page to insert a new row on (this is for base tables,
> referred internally as heaps). Not currently when the insert is attempted
> the store does not know how long the row is until after it picks the page to
> insert on, and streams it into a log record.
> 1) try to insert entire row on last page an insert was attempted. If it
> doesn't fit move on to next step.
> 2) try to insert entire row onto a "half-filled page", if it doesn't fit move
> on to next step.
> 3) get empty page and insert row and overflow any part that does not fit.
> There are a number of optimizations that could be done (if anyone chooses to
> work on one it may be better to log a separate jira to track that
> specific project):
> 1) in memory keep track of more than one last page. In a multi-user
> environment it may be better if one got a latch wait on the picked page to
> try a different page and keep a "group/queue" of last pages - maybe
> sorted by space available. It would be good if such code was zero
> admin in that it could configure itself based on the dynamic concurrency it
> recognized. The current algorithm works pretty well until there are
> concurrent inserts by multiple threads into the same one table at the same
> time (problems probably start showing up with either dual core or
> real multi-processor). Note that in the insert algorithm described above I
> believe there is a problem if many threads hit step 1 and each find
> they can't insert on page 100 but then all choose some different page as part
> of step 2 and/or 3. Especially if there are no unfilled pages each
> may allocate a new page and only the last one to do the insert will remember
> that as the "last page" and all subsequent inserts would start at
> that last page.
> 2) For step 2 and 3 one may know the entire size of the row , or at the very
> least the minimum size of the row. This info could be used to better
> pick a candidate page.
> 3) unfilled page tracking is very limited given only 1 bit per page in the
> allocation map - can't really tell difference between 1 byte left and page-1
> byte left. There are a couple of options. One could expand the on
> disk allocation maps. The downside is more disk overhead, and an upgrade
> issue. One could also just do a better job of maintaining in memory
> information in the alloc cache as one reads pages from disk and avoid the on
> disk changes. Just keeping a queue of recently seen unfilled pages
> inverse by space available might be a big improvement. The queue need not be
> the actual pages, just a page number/space available. The info
> could be treated as a hint so could avoid extra latching/concurrency issues.
> Please attach any other ideas.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.