Dear friends,

I'm using Derby to record word frequencies from a large text corpus with a
Java program. It works fine with plain statements like "INSERT INTO WORDS
VALUES('"+word+"',1)" (it takes about 50MB to store 400,000 words), but when
I switched to prepared statements and nested statements (to improve
performance) and repeated the process, after a few hours of processing
(200MB of plain text) the database's disk consumption reached an absurd
size: 13GB! That is, 13GB of disk space to store 400,000 words (of ordinary
length) and their frequencies! What could the problem be?
The biggest file is seg0\c3c0.dat (13GB); there is no problem with the log files.
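For reference, the plain-Statement version that stayed at ~50MB looked roughly like the sketch below (the `buildInsertSql` helper and the exact placement of the trim/lowercase normalization are just for illustration; `EmbeddedDBMSConnectionBroker` is the same broker used in the prepared-statement code further down):

```java
import java.sql.Connection;
import java.sql.Statement;

public class PlainInsert {

    // Build the literal INSERT used in the original (non-prepared) version.
    static String buildInsertSql(String word) {
        return "INSERT INTO WORDS VALUES('" + word.trim().toLowerCase() + "',1)";
    }

    // Sketch of the plain-Statement path: one literal INSERT per word,
    // committed immediately.
    static void insertWord(Connection con, String word) throws Exception {
        try (Statement st = con.createStatement()) {
            st.execute(buildInsertSql(word));
        }
        con.commit();
    }
}
```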

Here is how I'm making insertions and updates:

    Connection con = EmbeddedDBMSConnectionBroker.getConnection();

    // normalize BEFORE binding, so the INSERT and the UPDATE see the same key
    word = word.trim().toLowerCase();

    PreparedStatement st = con.prepareStatement("INSERT INTO WORDS VALUES(?,1)");
    st.setString(1, word);

    try {
        st.execute();
    }
    catch (SQLIntegrityConstraintViolationException e) {
        // word already present: bump its frequency instead
        PreparedStatement ps = con.prepareStatement(
                "UPDATE WORDS SET frequency = frequency + 1 WHERE word = ?");
        ps.setString(1, word);
        ps.execute();
        ps.close();
    }

    st.close();
    con.commit();
    con.close();

This method is used concurrently by 100 threads. Does anyone know the cause
of this strange Derby behavior? (Using gigabytes of disk space just to
store a few hundred thousand words isn't reasonable!)

Thanks in advance

Héctor
-- 
View this message in context: 
http://www.nabble.com/Derby-problem%3A-13GB-of-space-with-200000-records%21-tp19433858p19433858.html
Sent from the Apache Derby Users mailing list archive at Nabble.com.
