Dear friends,
Thanks for your suggestions.
The problem is fixed! I changed the frequency-update logic from
try{insert} catch{update} to count=update; if (count==0) insert, kept the
multithreaded strategy (also reusing connections), and it works like a charm:
700 MB of text now fits in a 13 MB database...
I still don't know why this happened (apparently the problem was doing the
update inside the exception handler), and the trim operation didn't affect
the outcome.
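For anyone reading the archive later, the update-first pattern looks like this in miniature. This is only a sketch: it uses Python's built-in sqlite3 module so it runs self-contained, whereas the actual program talks to Derby through JDBC; the WORDS table layout and the trim/lowercase normalization mirror the thread.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE words (word TEXT PRIMARY KEY, frequency INTEGER)")

def record(word):
    # Normalize BEFORE touching the table, so "Derby" and " derby "
    # land on the same row.
    word = word.strip().lower()
    # Try the update first; rowcount tells us whether a row matched.
    cur = con.execute(
        "UPDATE words SET frequency = frequency + 1 WHERE word = ?", (word,))
    if cur.rowcount == 0:
        # No existing row: insert it with an initial count of 1.
        con.execute("INSERT INTO words VALUES (?, 1)", (word,))
    con.commit()

for w in ["Derby", " derby ", "derby", "words"]:
    record(w)

print(dict(con.execute("SELECT word, frequency FROM words")))
# → {'derby': 3, 'words': 1}
```

The point of the pattern is that the common case (the word already exists) costs one statement and never raises, instead of relying on a constraint violation as control flow.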
Best regards,
Héctor
Sai Pullabhotla wrote:
>
> Definitely the word.trim().toLowerCase() should appear before setting
> it on the statement. Otherwise, the statement would actually insert
> the original word with whitespace and mixed case, which could be
> why there are so many words.
>
> Also, when you do an update in the catch block, I don't think you have
> to do the inner select. You could simply use a statement like this:
>
> update words set frequency=frequency+1 where word=?
>
> Sai Pullabhotla
> Phone: (402) 408-5753
> Fax: (402) 408-6861
> www.jMethods.com
>
>
>
> On Thu, Sep 11, 2008 at 7:39 AM, hcadavid <[EMAIL PROTECTED]> wrote:
>>
>> Dear friends,
>>
>> I'm using Derby to record word frequencies from a large text corpus
>> with a Java program. It works nicely with standard statements, like:
>> "INSERT INTO WORDS VALUES('"+word+"',1)" (it takes 50 MB to store
>> 400000 words), but when I switched to prepared statements and inner
>> statements (in order to improve performance) and repeated the process,
>> after a few hours of processing (200 MB of plain text) the database's
>> disk consumption reached an absurd size: 13 GB! I mean, 13 GB of disk
>> space to store 400000 words (of standard length) and their
>> frequencies!! What could the problem be?
>> The biggest file is seg0\c3c0.dat (13 GB); there is no problem with
>> the log files.
>>
>> Here is how I'm making insertions and updates:
>>
>> Connection con = EmbeddedDBMSConnectionBroker.getConnection();
>> PreparedStatement st = con.prepareStatement(
>>         "INSERT INTO WORDS VALUES(?,1)");
>> st.setString(1, word);
>>
>> word = word.trim().toLowerCase();
>>
>> try {
>>     st.execute();
>> }
>> catch (SQLIntegrityConstraintViolationException e) {
>>     PreparedStatement ps = con.prepareStatement(
>>             "update words set frequency=((select frequency from words"
>>             + " where word=?)+1) where word=?");
>>     ps.setString(1, word);
>>     ps.setString(2, word);
>>     ps.execute();
>> }
>>
>> con.commit();
>> con.close();
>>
>> This method is used concurrently by 100 threads. Does anyone know the
>> cause of this strange Derby behavior? (Using gigabytes of disk space
>> just to store a few words isn't reasonable!)
>>
>> Thanks in advance
>>
>> Héctor
>> --
>> View this message in context:
>> http://www.nabble.com/Derby-problem%3A-13GB-of-space-with-200000-records%21-tp19433858p19433858.html
>> Sent from the Apache Derby Users mailing list archive at Nabble.com.
>>
>>
>
>
--
View this message in context:
http://www.nabble.com/Derby-problem%3A-13GB-of-space-with-200000-records%21-tp19433858p19465767.html
Sent from the Apache Derby Users mailing list archive at Nabble.com.