Posting your test to a JIRA issue would be best. It would be
interesting to post the space table results after each
insert/delete/compress iteration (or every 10, ...).
When do you commit (every row or every 10000)? Is it multi-threaded?
Does your test always insert rows at one end of the index and delete
them from the other end? If so, it may be DERBY-5473 (a runtime issue,
not a compress table issue).
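For example, a small helper like the following (a sketch, assuming the
default APP schema and names stored in uppercase in the catalogs) would
capture those numbers each time through the loop:

import java.sql.*;

public class SpaceDump {
    // Print page counts for every conglomerate (base table, primary
    // key, indexes) backing the given table, using the space
    // diagnostic table. Pass the table name as stored in the
    // catalogs (usually uppercase), e.g. "LOCATION".
    public static void dumpSpace(Connection conn, String table)
            throws SQLException {
        String sql =
            "SELECT CONGLOMERATENAME, NUMALLOCATEDPAGES, NUMFREEPAGES,"
            + " NUMUNFILLEDPAGES"
            + " FROM TABLE (SYSCS_DIAG.SPACE_TABLE('APP', '" + table + "')) AS T";
        try (Statement s = conn.createStatement();
             ResultSet rs = s.executeQuery(sql)) {
            while (rs.next()) {
                System.out.printf("%-20s %10d %10d %10d%n",
                    rs.getString(1), rs.getLong(2),
                    rs.getLong(3), rs.getLong(4));
            }
        }
    }
}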
In-place compress as currently implemented does not do much for
indexes. Originally the problem it solved was row-level space
reclamation in base tables; indexes were not an issue. Lately,
multiple reports of index space issues have been coming in, so more
work in this area would be profitable.
Currently its only use is to give space back to the OS when possible,
and that depends on the highest page number of a used page.
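For reference, the procedure takes three SMALLINT flags, one for each
of the phases discussed next; a JDBC sketch of the call (the APP schema
name is an assumption):

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;

public class InplaceCompress {
    // Each non-zero flag enables one phase: purge, defragment,
    // truncate.
    public static void compress(Connection conn, String table,
            boolean purge, boolean defragment, boolean truncate)
            throws SQLException {
        try (CallableStatement cs = conn.prepareCall(
                "CALL SYSCS_UTIL.SYSCS_INPLACE_COMPRESS_TABLE(?, ?, ?, ?, ?)")) {
            cs.setString(1, "APP");
            cs.setString(2, table);
            cs.setShort(3, (short) (purge ? 1 : 0));
            cs.setShort(4, (short) (defragment ? 1 : 0));
            cs.setShort(5, (short) (truncate ? 1 : 0));
            cs.execute();
        }
    }
}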
Here is a discussion of the 3 phases:
1) It does not purge index rows.
The original reason is that purging alone yields no space savings in
indexes: "half filled" pages can't be reused the way they are in base
tables. Each leaf page "owns" a range of keys, and when a key is
inserted into that range and no space is found on the page, the page
automatically purges its deleted rows at that point, before splitting.
Given DERBY-5473 it may make sense to implement purging and merging
to handle cases where we "miss" reclaiming fully empty pages.
2) It does not defragment indexes.
Especially without a table-level lock this is very complicated.
Moving rows is not possible, since each row can only live on the page
that owns its key range. It might be possible to move pages, but then
all pointers to them would also have to be fixed up. Currently code
exists only to do splits under row-level locking, and this operation
would be much more complicated than a split. The process is further
complicated by the fact that one cannot look at a page in an index
and find all the pointers that need to be updated from that page
(there are no parent pointers on children).
3) It does try to truncate the index file if possible, but it does
no work to move pages so that the last used page in the file lands
earlier in the file.
Sundar Narayanaswamy wrote:
Hi,
I am trying to use the Derby database in embedded mode in an
application. I wrote a test program that uses a table (named LOCATION)
with 4 columns, as below:
create table location(id int, num int, addr varchar(40), zip int,
primary key(id, zip))
create index loc_index on location (num)
I insert 10000 rows into the table, then delete all of those rows. I
then call SYSCS_UTIL.SYSCS_INPLACE_COMPRESS_TABLE with just the option
to purge rows, so that the space left behind by deleted rows can be
reused for future inserts. I have the inserts, deletes, and compress
in a loop.
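In outline, the loop looks like this (a simplified sketch; the table is
created as above, the embedded database URL is a placeholder, and
autocommit is left at its default so each statement commits on its own):

import java.sql.*;

public class LocationLoop {
    public static void main(String[] args) throws SQLException {
        Connection conn = DriverManager.getConnection(
                "jdbc:derby:testdb;create=true");
        for (int iter = 0; iter < 5; iter++) {
            // Insert 10000 rows.
            try (PreparedStatement ins = conn.prepareStatement(
                    "insert into location values (?, ?, ?, ?)")) {
                for (int i = 0; i < 10000; i++) {
                    ins.setInt(1, i);                 // id
                    ins.setInt(2, i);                 // num
                    ins.setString(3, "address " + i); // addr
                    ins.setInt(4, i);                 // zip
                    ins.executeUpdate();
                }
            }
            // Delete every row, then purge the deleted space.
            try (Statement del = conn.createStatement()) {
                del.executeUpdate("delete from location");
            }
            try (CallableStatement cs = conn.prepareCall(
                    "CALL SYSCS_UTIL.SYSCS_INPLACE_COMPRESS_TABLE("
                    + "'APP', 'LOCATION', 1, 0, 0)")) {
                cs.execute(); // purge only; defragment/truncate off
            }
        }
        conn.close();
    }
}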
After running through the loop a few times, I am noticing that the
number of allocated and unfilled pages for the primary key keeps
growing, causing the database size to grow over time. The longer I run
the loop, the larger these numbers get. That does not happen for the
table or the index files, though:
CONGLOMERATE         NumAllocatedPages   NumFreePages   NumUnFilledPages
LOCATION                             1            831                  0
SQL111027234806120                1342            294                594
LOC_INDEX                            1            521                  1
The primary key space continues to grow even when I include the
defragment_rows and truncate_end options in the call to the above
function:
CONGLOMERATE         NumAllocatedPages   NumFreePages   NumUnFilledPages
LOCATION                             1              0                  0
SQL111027233119770                1674             47                704
LOC_INDEX                           13            357                  3
The SYSCS_UTIL.SYSCS_COMPRESS_TABLE function works well and leaves no
free/unfilled pages, as expected. However, I am concerned about the
efficiency (time taken) of using that compress function, since there
could be tens of millions of rows in a production setup. It seems that
the SYSCS_UTIL.SYSCS_INPLACE_COMPRESS_TABLE function with just the
purge_rows option would work best for my situation. Is this function
not expected to free up deleted space in primary key files?
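For reference, a sketch of the full-compress call (the APP schema name
is an assumption; a non-zero last argument requests a sequential,
one-conglomerate-at-a-time rebuild):

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;

public class FullCompress {
    // Rebuilds the table and all of its indexes from scratch; slower
    // than in-place compress, but it leaves no free/unfilled pages.
    public static void compress(Connection conn) throws SQLException {
        try (CallableStatement cs = conn.prepareCall(
                "CALL SYSCS_UTIL.SYSCS_COMPRESS_TABLE('APP', 'LOCATION', 1)")) {
            cs.execute();
        }
    }
}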
Thanks very much for your help,
Sundar.