[dba-issues] [Issue 99961] Modifying table design cau ses the db to grow (old data might be kept in file)

wimbaetens Fri, 06 Mar 2009 02:45:03 -0800

To comment on the following update, log in, then open the issue:
http://www.openoffice.org/issues/show_bug.cgi?id=99961
                 Issue #|99961
                 Summary|Modifying table design causes the db to grow (old data
                        | might be kept in file)
               Component|Database access
                 Version|OOo 3.0.1
                Platform|Unknown
                     URL|
              OS/Version|Windows 2000
                  Status|UNCONFIRMED
       Status whiteboard|
                Keywords|
              Resolution|
              Issue type|DEFECT
                Priority|P3
            Subcomponent|none
             Assigned to|dbaneedsconfirm
             Reported by|wimbaetens






------- Additional comments from wimbaet...@openoffice.org Fri Mar  6 10:44:59 
+0000 2009 -------
DB type=HSQLDB

When repeatedly modifying the table design (e.g. during a normalisation), I have
the feeling that a lot of (old) garbage data is still kept in the db-file. My
suspicion is confirmed when I create a totally new DB and copy table by table,
query by query to the new, empty db-file. The difference can be quite big: the
new copy can be 2MB, where the original db-file reaches sometimes 10x as much
(20MB!), which makes it a real pain to use that db. Is the old data somehow
staying behind in the db-file or am I doing something really stupid? 

I think that the following example may reproduce the phenomenon.

I started out with non-normalised tables, which I started to normalise, for
example (tblNames):
ID;Name;Type
"0";"name1";"some very long description containing a lot of text duplication"
"1";"name2";"some very long description containing a lot of text duplication"
"2";"name3";"some very long description containing a lot of text duplication"
"3";"name4";"another long description"

where 
ID=integer (primary key, autovalue)
Name=VARCHAR 10
Type=VARCHAR 255

For these four records, I have maximum 4x (integer+Varchar10+Varchar255) bytes =
4x (10+10+255) = 1100

The example used here contains only a few records. In reality, I had a table
with hundreds of records and only about three distinct values for 'Type'.

Created a second table (tblDescription):
ID;TXT
"0";"some very long description containing a lot of text duplication"
"1";"another long description"

tblNames gets a new column:
ID;Name;Type;D_ID
where tblNames.D_ID refers to tblDescription.ID

Eventually, column tblNames.Type is deleted.
At this point, I have the following estimate:
tblNames: 4x (integer+Varchar10+integer) = 4x (10+10+10) = 120
tblDescription: 2x (integer+Varchar255) = 2x (10+255) = 530
TOTAL: max 650 bytes

In practice, the db-file keeps growing, as if the original tblNames.type column
is still kept somewhere. It's true that this is just a small example, but as
stated above, a db can grow from 2 to 20MB using similar operations.
To get the data from the beginning, I use a query or table view that
dereferences tblNames.D_ID to tblDescription.TXT.

If necessary, I can come up with a larger example db-file (however, it will cost
me quite some time to repeat the process)

---------------------------------------------------------------------
Please do not reply to this automatically generated notification from
Issue Tracker. Please log onto the website and enter your comments.
http://qa.openoffice.org/issue_handling/project_issues.html#notification

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@dba.openoffice.org
For additional commands, e-mail: issues-h...@dba.openoffice.org

[dba-issues] [Issue 99961] Modifying table design cau ses the db to grow (old data might be kept in file)

Reply via email to