Posting your test to a JIRA issue would be best. It would be
interesting to post the space table results after each
insert/delete/compress iteration (or every 10, ...).
When do you commit (every row or every 10000)? Is it multi-threaded?
Does your test always insert rows at one end of the index and delete
them from the other end? If so, it may be DERBY-5473 (a runtime
issue, not a compress table issue).
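
For example, a minimal JDBC sketch of capturing those numbers per
iteration (assuming the APP schema and the LOCATION table from the
test quoted below; SYSCS_DIAG.SPACE_TABLE is the diagnostic table
function that reports these columns):

    import java.sql.*;

    // Dumps allocated/free/unfilled page counts for LOCATION and its
    // indexes; call after each insert/delete/compress iteration.
    final class SpaceDump {
        static void dump(Connection conn) throws SQLException {
            String sql =
                "SELECT CONGLOMERATENAME, NUMALLOCATEDPAGES, "
              + "NUMFREEPAGES, NUMUNFILLEDPAGES "
              + "FROM TABLE (SYSCS_DIAG.SPACE_TABLE('APP', 'LOCATION')) AS T";
            try (Statement s = conn.createStatement();
                 ResultSet rs = s.executeQuery(sql)) {
                while (rs.next()) {
                    System.out.printf("%-20s %8d %8d %8d%n",
                        rs.getString(1), rs.getLong(2),
                        rs.getLong(3), rs.getLong(4));
                }
            }
        }
    }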

In-place compress as currently implemented does not do much for
indexes. The problem it was originally solving was row-level space
reclamation in base tables; indexes were not an issue. Lately,
multiple reports of index space issues have been coming up, so more
work in this area would be profitable.
Currently its only use for indexes is to give space back to the OS
if possible, and that depends on the highest page number of a page
still in use.
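
For completeness, the call with all three options enabled looks like
this from JDBC (a sketch; the schema and table names are taken from
the test quoted below):

    import java.sql.*;

    // SYSCS_INPLACE_COMPRESS_TABLE takes a schema name, a table name,
    // and three SMALLINT flags: PURGE_ROWS, DEFRAGMENT_ROWS, TRUNCATE_END.
    final class InplaceCompress {
        static void compress(Connection conn) throws SQLException {
            try (CallableStatement cs = conn.prepareCall(
                    "CALL SYSCS_UTIL.SYSCS_INPLACE_COMPRESS_TABLE(?, ?, ?, ?, ?)")) {
                cs.setString(1, "APP");
                cs.setString(2, "LOCATION");
                cs.setShort(3, (short) 1);  // PURGE_ROWS
                cs.setShort(4, (short) 1);  // DEFRAGMENT_ROWS
                cs.setShort(5, (short) 1);  // TRUNCATE_END
                cs.execute();
            }
        }
    }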

Here is a discussion of the 3 phases:

1) It does not purge index rows.
   The original reason for this is that purging alone yields no
   space savings in indexes. "Half filled" pages can't be reused as
   they are in base tables: each leaf page "owns" a range of keys,
   and when a key is inserted into that range and no space is found
   on the page, the deleted rows are automatically purged at that
   point, before splitting (see the sketch below).

   Given DERBY-5473, it may make sense to implement purging and
   merging to handle cases where we "miss" reclaiming fully empty
   pages.
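
   Roughly, the leaf insert path described above behaves like this
   (an illustrative sketch only; LeafPage and Key are stand-ins, not
   the actual Derby store classes):

       // Stand-in types for the sketch.
       interface Key {}

       interface LeafPage {
           boolean hasSpaceFor(Key k);
           void purgeDeletedRows();  // reclaim rows marked deleted
           void split();             // ordinary B-tree page split
           void insert(Key k);
       }

       final class LeafInsertSketch {
           static void insertKey(LeafPage leaf, Key key) {
               if (!leaf.hasSpaceFor(key)) {
                   // Deleted rows on the page are purged first ...
                   leaf.purgeDeletedRows();
                   if (!leaf.hasSpaceFor(key)) {
                       // ... and the page is split only if purging
                       // did not free enough space for the new key.
                       leaf.split();
                   }
               }
               leaf.insert(key);
           }
       }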

2) It does not defragment indexes.

   Especially without a table-level lock this is very complicated.
   Moving rows is not possible, as each key can only go on the page
   that owns its range. It could be possible to move whole pages,
   but then all pointers to them would also have to be fixed up.
   Currently, code only exists to do splits under row-level locking,
   and this operation would be much more complicated than a split.
   The process is further complicated by the fact that one cannot
   look at a page in an index and find, from that page, all the
   pointers that need to be updated (there are no parent pointers
   on children).

3) It does try to truncate the index's file if possible, but it
   does no work to manipulate the pages so that the last used page
   in the file is moved earlier in the file, which would allow more
   of the file to be truncated.
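
   In other words, the reclaimable space is bounded by the highest
   in-use page. A sketch of the arithmetic (names are illustrative):

       // The file can only be truncated back to just past the highest
       // in-use page, so one used page near the end of the file blocks
       // reclaiming all of the free pages before it.
       final class TruncateSketch {
           static long truncatedFileSize(long highestUsedPage,
                                         long pageSize) {
               return (highestUsedPage + 1) * pageSize;
           }

           public static void main(String[] args) {
               // e.g. a 1000-page file with 4K pages where page 998 is
               // still in use: only one page can go back to the OS.
               System.out.println(truncatedFileSize(998, 4096));
           }
       }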



Sundar Narayanaswamy wrote:
Hi,
I am trying to use the Derby database in embedded mode in an
application. I wrote a test program with a table (named LOCATION)
with 4 columns, as below:
create table location(id int, num int, addr varchar(40), zip int, primary key(id, zip))
create index loc_index on location (num)

I insert 10000 rows into the table, then delete all of those rows.
I then call SYSCS_UTIL.SYSCS_INPLACE_COMPRESS_TABLE with just the
option to purge rows, so that the space left behind by deleted rows
can be reused for future inserts. I have the inserts, deletes, and
compress in a loop. After running through the loop a few times, I am
noticing that the number of allocated and unfilled pages for the
primary key keeps growing (causing the database size to grow over
time). The longer I run the loop, the larger these numbers are. That
does not happen for the table or the index files, though.

CONGLOMERATE            NumAllocatedPages   NumFreePages    NumUnFilledPages
LOCATION                1                   831             0
SQL111027234806120      1342                294             594
LOC_INDEX               1                   521             1

The primary key space continues to grow even when I pass the
defragment_rows and truncate_end options to the above function.

CONGLOMERATE            NumAllocatedPages   NumFreePages    NumUnFilledPages
LOCATION                1                   0               0
SQL111027233119770      1674                47              704
LOC_INDEX               13                  357             3

The SYSCS_UTIL.SYSCS_COMPRESS_TABLE function works well and leaves
no free/unfilled pages, as expected. However, I am concerned about
the efficiency (time taken) of the compress function, since there
could be tens of millions of rows in a production setup. It seems
that the SYSCS_UTIL.SYSCS_INPLACE_COMPRESS_TABLE function with just
the purge_rows option would work best for my situation. Is this
function not expected to free up deleted space in primary key files?

Thanks very much for your help,
Sundar.

