Re: [sqlite] Speed up DELETE of a lot of records

2010-10-12 Thread Simon Slavin

On 12 Oct 2010, at 6:30pm, Scott Hess wrote:

>  Something like:
> 
>  INSERT INTO my_table VALUES (blah, blah, blah);
>  DELETE FROM my_table WHERE date_or_whatever < some_limit LIMIT 5;
> 
> If your insert load consistently tracks from day to day, this mostly
> looks like a circular buffer.  The main downside is that it makes your
> heaviest insert period also be your heaviest delete period, which is
> unfortunate.

Instead of having a queue length which is an integer number of days, have one 
which is, for example, 7.5 days long.  That way at 3pm you're deleting the 
records which were made at 3am.
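
A minimal sketch of that fractional window, assuming the timestamps are stored 
as Unix epoch seconds in an indexed column called ts (names hypothetical):

  -- keep 7.5 days of history; run periodically
  DELETE FROM my_table
   WHERE ts < CAST(strftime('%s','now') AS INTEGER) - 7.5 * 86400;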

Simon.


Re: [sqlite] Speed up DELETE of a lot of records

2010-10-12 Thread Scott Hess
On Mon, Oct 11, 2010 at 8:13 AM, Jay A. Kreibich  wrote:
> On Mon, Oct 11, 2010 at 02:08:54PM +0200, Michele Pradella scratched on the 
> wall:
>>   Ok so the main idea is always the same: split the DELETE to make the
>> operation touch fewer records, but do it more often.
>
>  Another thought occurs to me...   If your insert rates are fairly
>  consistent (e.g. the number of records per minute is consistent, or
>  at least predictable) then you can just use the table as a circular
>  buffer.  That is, rather than inserting new data and deleting the
>  old data, simply overwrite the older records with the new data.
>  Rather than inserting new records, simply find the oldest record and
>  update it with the new data.

In a past life, I ran a large production system on MySQL, and what we
did in a similar situation is add the appropriate delete after
inserts.  Something like:

  INSERT INTO my_table VALUES (blah, blah, blah);
  DELETE FROM my_table WHERE date_or_whatever < some_limit LIMIT 5;

If your insert load consistently tracks from day to day, this mostly
looks like a circular buffer.  The main downside is that it makes your
heaviest insert period also be your heaviest delete period, which is
unfortunate.  The LIMIT 5 is there so that if your volume varies, you
eventually catch up.  If your volume varies a lot, you probably would
need to increase the value.
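
One caveat: stock SQLite only accepts a LIMIT clause on DELETE when compiled 
with SQLITE_ENABLE_UPDATE_DELETE_LIMIT.  A portable sketch of the same bounded 
delete (same hypothetical table and column as above) is:

  DELETE FROM my_table
   WHERE rowid IN (SELECT rowid FROM my_table
                    WHERE date_or_whatever < some_limit
                    LIMIT 5);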

It might seem like this will make inserts twice as slow, but I'd
implement it this way before complexifying things.  Since the index
the delete runs against is probably in the page cache, you might find
that it works just fine.  If I had to modify it, I might have it do
the delete on every 100th insert, and change the limit to 100 +
epsilon.  After the delete, use sqlite3_changes() to check whether it
deleted to the limit, and increase epsilon a bit for the next pass,
until the limit is not reached, then drop epsilon back towards 0.
But, like I said, you might find that the simple solution does the job
just fine.
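
A hedged sketch of that adaptive scheme at the SQL level, using the changes() 
function (the SQL counterpart of sqlite3_changes()); the epsilon bookkeeping 
itself lives in the host program:

  -- run after every 100th insert; :lim is 100 + epsilon, supplied by the host
  DELETE FROM my_table
   WHERE rowid IN (SELECT rowid FROM my_table
                    WHERE date_or_whatever < some_limit
                    LIMIT :lim);
  -- if this returns :lim, the delete hit its cap: grow epsilon for the
  -- next pass; otherwise shrink epsilon back toward 0
  SELECT changes();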

Modifying your selects to ignore too-old records shouldn't slow things
down materially.  If you had done the query 10 minutes ago, you'd have
gotten those 25 records, and when you do it now you get those 25
records plus 1 more, and throw away 3 of them - but if it would have
been adequate performance 10 minutes ago, why is it not adequate now?
So long as you keep the number of filtered records contained, it's
just a slight constant factor, and if you keep them very tightly
contained, the cost is almost minimal unless the query could otherwise
run against indices alone (once SQLite has fetched the row from the
table, doing an additional check on data within that row is generally
not expensive).
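
In other words, the only change to each read is one extra indexed predicate, 
something like (names hypothetical):

  SELECT COUNT(*) FROM my_table
   WHERE date_or_whatever >= some_limit;  -- skip too-old, not-yet-deleted rows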

-scott


Re: [sqlite] Speed up DELETE of a lot of records

2010-10-12 Thread Michele Pradella
  Thank you for the advice, I'll try it in my application.
regards

On 12/10/2010 17.17, Pavel Ivanov wrote:
> Michele,
>
> Here is another thought for you to consider. Apparently your
> application consistently generates records, each record is marked
> with a timestamp of its creation, and after some time you have to
> garbage-collect all records that are more than a certain amount of
> time in the past. You can do this with generational GC: split your
> full database into parts according to record creation time, and then
> garbage collection consists of deleting a database file from the file
> system without deleting any records. E.g. let's say you start with one
> part and write your records into it for 15 minutes. Then you create a new
> part and write all new records into it. The old part just sits there for
> querying. After 15 minutes you create a new part, and so on. Then when
> the time comes that all records in the first part must be deleted, you just
> delete that database file and that's it - no interruption in access
> to newly created records. Of course this implementation complicates
> querying of existing records, especially when you need statistics
> pretty often (as opposed to querying individual records), but for some
> access patterns it can give you a huge performance improvement (it
> actually did for my application).
>
>
> Pavel
>
> On Mon, Oct 11, 2010 at 11:13 AM, Jay A. Kreibich  wrote:
>> On Mon, Oct 11, 2010 at 02:08:54PM +0200, Michele Pradella scratched on the 
>> wall:
>>>    Ok so the main idea is always the same: split the DELETE to make the
>>> operation touch fewer records, but do it more often.
>>   Another thought occurs to me...   If your insert rates are fairly
>>   consistent (e.g. the number of records per minute is consistent, or
>>   at least predictable) then you can just use the table as a circular
>>   buffer.  That is, rather than inserting new data and deleting the
>>   old data, simply overwrite the older records with the new data.
>>   Rather than inserting new records, simply find the oldest record and
>>   update it with the new data.
>>
>>   If the records have similar static length content (e.g. no variable
>>   length text strings) this should be fairly fast, and will eliminate
>>   the need to delete whole records.  The UPDATE might be a tad
>>   slower than an unencumbered INSERT, but it is still likely to be
>>   faster than an INSERT plus a later DELETE.
>>
>>   Be careful, however, as even integers are variable-length records in
>>   SQLite (if their magnitude is different).  It might be best to use
>>   fixed size strings, even for the numeric values.  The UPDATE will be
>>   much faster if the new data fits into the same "slot" as the old
>>   record, and it can be updated in-place.
>>
>>   You could set up the inserts to find the oldest time and update that
>>   record (which should be fairly quick if there is an index on your
>>   timestamp column) or you could just manually create however many
>>   NULL entries you need and explicitly update incremental records.
>>   When the application starts up, just find the oldest date and
>>   continue.
>>
>>   This all depends on being able to predict the number of records
>>   required to meet your storage needs, however.  I suppose you could
>>   allow the database to expand as needed (that is, get the oldest date,
>>   and if it is not outside your window, INSERT rather than UPDATE).
>>   There are lots of ways to do this, the specifics depend on your
>>   needs.
>>
>>
>>-j
>>




Re: [sqlite] Speed up DELETE of a lot of records

2010-10-12 Thread Pavel Ivanov
Michele,

Here is another thought for you to consider. Apparently your
application consistently generates records, each record is marked
with a timestamp of its creation, and after some time you have to
garbage-collect all records that are more than a certain amount of
time in the past. You can do this with generational GC: split your
full database into parts according to record creation time, and then
garbage collection consists of deleting a database file from the file
system without deleting any records. E.g. let's say you start with one
part and write your records into it for 15 minutes. Then you create a new
part and write all new records into it. The old part just sits there for
querying. After 15 minutes you create a new part, and so on. Then when
the time comes that all records in the first part must be deleted, you just
delete that database file and that's it - no interruption in access
to newly created records. Of course this implementation complicates
querying of existing records, especially when you need statistics
pretty often (as opposed to querying individual records), but for some
access patterns it can give you a huge performance improvement (it
actually did for my application).
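
A rough sketch of the mechanics using ATTACH (file, table, and alias names all 
hypothetical):

  -- open a new 15-minute generation and direct all inserts at it
  ATTACH DATABASE 'records-20101012-1500.db' AS gen_new;
  CREATE TABLE gen_new.records(ts INTEGER, data TEXT);
  INSERT INTO gen_new.records VALUES(1286892000, 'payload');
  -- expiring the oldest generation is a DETACH plus an OS-level file
  -- delete; no row-by-row DELETE at all
  DETACH DATABASE gen_old;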


Pavel

On Mon, Oct 11, 2010 at 11:13 AM, Jay A. Kreibich  wrote:
> On Mon, Oct 11, 2010 at 02:08:54PM +0200, Michele Pradella scratched on the 
> wall:
>>   Ok so the main idea is always the same: split the DELETE to make the
>> operation touch fewer records, but do it more often.
>
>  Another thought occurs to me...   If your insert rates are fairly
>  consistent (e.g. the number of records per minute is consistent, or
>  at least predictable) then you can just use the table as a circular
>  buffer.  That is, rather than inserting new data and deleting the
>  old data, simply overwrite the older records with the new data.
>  Rather than inserting new records, simply find the oldest record and
>  update it with the new data.
>
>  If the records have similar static length content (e.g. no variable
>  length text strings) this should be fairly fast, and will eliminate
>  the need to delete whole records.  The UPDATE might be a tad
>  slower than an unencumbered INSERT, but it is still likely to be
>  faster than an INSERT plus a later DELETE.
>
>  Be careful, however, as even integers are variable-length records in
>  SQLite (if their magnitude is different).  It might be best to use
>  fixed size strings, even for the numeric values.  The UPDATE will be
>  much faster if the new data fits into the same "slot" as the old
>  record, and it can be updated in-place.
>
>  You could set up the inserts to find the oldest time and update that
>  record (which should be fairly quick if there is an index on your
>  timestamp column) or you could just manually create however many
>  NULL entries you need and explicitly update incremental records.
>  When the application starts up, just find the oldest date and
>  continue.
>
>  This all depends on being able to predict the number of records
>  required to meet your storage needs, however.  I suppose you could
>  allow the database to expand as needed (that is, get the oldest date,
>  and if it is not outside your window, INSERT rather than UPDATE).
>  There are lots of ways to do this, the specifics depend on your
>  needs.
>
>
>   -j
>


Re: [sqlite] Speed up DELETE of a lot of records

2010-10-11 Thread Jay A. Kreibich
On Mon, Oct 11, 2010 at 02:08:54PM +0200, Michele Pradella scratched on the 
wall:
>   Ok so the main idea is always the same: split the DELETE to make the 
> operation touch fewer records, but do it more often.

  Another thought occurs to me...   If your insert rates are fairly
  consistent (e.g. the number of records per minute is consistent, or
  at least predictable) then you can just use the table as a circular
  buffer.  That is, rather than inserting new data and deleting the 
  old data, simply overwrite the older records with the new data.
  Rather than inserting new records, simply find the oldest record and
  update it with the new data.

  If the records have similar static length content (e.g. no variable
  length text strings) this should be fairly fast, and will eliminate
  the need to delete whole records.  The UPDATE might be a tad
  slower than an unencumbered INSERT, but it is still likely to be
  faster than an INSERT plus a later DELETE.

  Be careful, however, as even integers are variable-length records in
  SQLite (if their magnitude is different).  It might be best to use
  fixed size strings, even for the numeric values.  The UPDATE will be
  much faster if the new data fits into the same "slot" as the old
  record, and it can be updated in-place.

  You could set up the inserts to find the oldest time and update that
  record (which should be fairly quick if there is an index on your
  timestamp column) or you could just manually create however many
  NULL entries you need and explicitly update incremental records.
  When the application starts up, just find the oldest date and
  continue.

  This all depends on being able to predict the number of records
  required to meet your storage needs, however.  I suppose you could
  allow the database to expand as needed (that is, get the oldest date,
  and if it is not outside your window, INSERT rather than UPDATE).
  There are lots of ways to do this, the specifics depend on your
  needs.
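
  A minimal sketch of that write path (column names hypothetical; an index
  on the timestamp column keeps the subquery cheap):

    UPDATE my_table
       SET ts = :now, data = :data
     WHERE rowid = (SELECT rowid FROM my_table
                     ORDER BY ts LIMIT 1);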


   -j



Re: [sqlite] Speed up DELETE of a lot of records

2010-10-11 Thread Michele Pradella
  Ok so the main idea is always the same: split the DELETE to make the 
operation touch fewer records, but do it more often.

On 11/10/2010 13.33, Black, Michael (IS) wrote:
> I said this before but never saw a response...
>
> Just put your delete inside a select loop so it will always be interruptible.
>
> I assume you are talking about writing your own code in C or such?
>
> So rather than:
>
> DELETE FROM TABLE WHERE datetime < oldtime;
>
> do this:
>
> SELECT datetime FROM TABLE WHERE datetime < oldtime;
> BEGIN;
> -- for each datetime value returned by the SELECT:
> DELETE FROM TABLE WHERE datetime = thistime;
> COMMIT;
>
> I don't know how distinct your datetime values are...this delete could delete 
> multiple records at once which would be OK.
>
> One other possibility is to break out your delete statements like this:
>
> DELETE FROM TABLE WHERE datetime < oldtime AND ((datetime % 100)=0)
> DELETE FROM TABLE WHERE datetime < oldtime AND ((datetime % 100)=1)
> DELETE FROM TABLE WHERE datetime < oldtime AND ((datetime % 100)=2)
> DELETE FROM TABLE WHERE datetime < oldtime AND ((datetime % 100)=3)
> ...
>
> So each delete would be approx 100 times faster and would allow interruption 
> in between deletes.
>
>
>
>
> From: sqlite-users-boun...@sqlite.org on behalf of Michele Pradella
> Sent: Mon 10/11/2010 4:56 AM
> To: General Discussion of SQLite Database
> Subject: EXTERNAL:Re: [sqlite] Speed up DELETE of a lot of records
>
>
>
>
>    I know that in this use case UPDATE is lighter than DELETE, but if I
> make a DELETE at 4am I can have the DB locked for a long time at 4am:
> I'm only shifting the problem.
> It's not a problem of interface efficiency for user experience: the goal
> is to make the system always responsive without slowing down all the other
> DB operations.
>
On 11/10/2010 11.46, Simon Slavin wrote:
>> On 11 Oct 2010, at 10:26am, Michele Pradella wrote:
>>
>>>    Soft delete could slow down the SELECTs, because you always have to check
>>> the "deleted" column.
>>> Moreover the DB will grow without limit if no one physically deletes
>>> the records: anyway, UPDATE of a lot of records could be expensive too.
>>> I think the only way is to DELETE fewer records more frequently.
>> I recommend that you try it.  At least in the command-line application if 
>> not in your own code.  Deleting a record is very expensive in terms of time 
>> and resources.  Changing a field from a 1 to a 0 is far cheaper and faster, 
>> even if it's in an index or two.
>>
>> You can have a regular task that runs at 4am that deletes all the records 
>> with a 0 in.  That bit is easy.
>>
>> Simon.
>




Re: [sqlite] Speed up DELETE of a lot of records

2010-10-11 Thread Michele Pradella
  That's what I'm doing: I'm deleting a small number of records at a time, 
every 15 minutes.

On 11/10/2010 12.40, Simon Slavin wrote:
> On 11 Oct 2010, at 10:56am, Michele Pradella wrote:
>
>>   I know that in this use case UPDATE is lighter than DELETE, but if I
>> make a DELETE at 4am I can have the DB locked for a long time at 4am:
>> I'm only shifting the problem.
>> It's not a problem of interface efficiency for user experience: the goal
>> is to make the system always responsive without slowing down all the other
>> DB operations.
> Then delete the records immediately.  Don't wait for a batch to build up.  
> Delete some every minute.
>
> Simon.




Re: [sqlite] Speed up DELETE of a lot of records

2010-10-11 Thread Simon Slavin

On 11 Oct 2010, at 10:56am, Michele Pradella wrote:

>  I know that in this use case UPDATE is lighter than DELETE, but if I 
> make a DELETE at 4am I can have the DB locked for a long time at 4am: 
> I'm only shifting the problem.
> It's not a problem of interface efficiency for user experience: the goal 
> is to make the system always responsive without slowing down all the other 
> DB operations.

Then delete the records immediately.  Don't wait for a batch to build up.  
Delete some every minute.

Simon.


Re: [sqlite] Speed up DELETE of a lot of records

2010-10-11 Thread Michele Pradella
  I know that in this use case UPDATE is lighter than DELETE, but if I 
make a DELETE at 4am I can have the DB locked for a long time at 4am: 
I'm only shifting the problem.
It's not a problem of interface efficiency for user experience: the goal 
is to make the system always responsive without slowing down all the other 
DB operations.

On 11/10/2010 11.46, Simon Slavin wrote:
> On 11 Oct 2010, at 10:26am, Michele Pradella wrote:
>
>>   Soft delete could slow down the SELECTs, because you always have to check
>> the "deleted" column.
>> Moreover the DB will grow without limit if no one physically deletes
>> the records: anyway, UPDATE of a lot of records could be expensive too.
>> I think the only way is to DELETE fewer records more frequently.
> I recommend that you try it.  At least in the command-line application if not 
> in your own code.  Deleting a record is very expensive in terms of time and 
> resources.  Changing a field from a 1 to a 0 is far cheaper and faster, even 
> if it's in an index or two.
>
> You can have a regular task that runs at 4am that deletes all the records 
> with a 0 in.  That bit is easy.
>
> Simon.




Re: [sqlite] Speed up DELETE of a lot of records

2010-10-11 Thread Simon Slavin

On 11 Oct 2010, at 10:26am, Michele Pradella wrote:

>  Soft delete could slow down the SELECTs, because you always have to check 
> the "deleted" column.
> Moreover the DB will grow without limit if no one physically deletes 
> the records: anyway, UPDATE of a lot of records could be expensive too.
> I think the only way is to DELETE fewer records more frequently.

I recommend that you try it.  At least in the command-line application if not 
in your own code.  Deleting a record is very expensive in terms of time and 
resources.  Changing a field from a 1 to a 0 is far cheaper and faster, even if 
it's in an index or two.

You can have a regular task that runs at 4am that deletes all the records with 
a 0 in.  That bit is easy.
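
A minimal sketch, using the same 1-means-live convention (names hypothetical):

  -- the cheap "delete" during the day:
  UPDATE my_table SET live = 0 WHERE ts < :cutoff;
  -- the 4am batch job:
  DELETE FROM my_table WHERE live = 0;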

Simon.


Re: [sqlite] Speed up DELETE of a lot of records

2010-10-11 Thread Michele Pradella
  Soft delete could slow down the SELECTs, because you always have to check 
the "deleted" column.
Moreover the DB will grow without limit if no one physically deletes 
the records: anyway, UPDATE of a lot of records could be expensive too.
I think the only way is to DELETE fewer records more frequently.
If someone has a smart idea, I'm listening ;)

On 08/10/2010 22.54, Nicolas Williams wrote:
> On Fri, Oct 08, 2010 at 05:49:18PM +0100, Simon Slavin wrote:
>> On 8 Oct 2010, at 5:48pm, Stephan Wehner wrote:
>>> On Fri, Oct 8, 2010 at 7:14 AM, Michele Pradella
>>>   wrote:
   "science fiction?" was a rhetorically question. I'm only wondering
 about what is the best and fastest way to DELETE a lot of records from
 huge DB. I know and understand physical limit of data moving: anyway for
 now I'm trying to split the BIG DELETE in some smaller DELETE to spread
 the time used. It's the only way I can figure out at the moment.
>>> Is a soft-delete faster? Then you could add a slow-moving delete
>>> (mentioned earlier by Aldes Rossi, for example)
>>> for the soft-deleted records.
>> Soft-delete ?  Is that having another column which is a '1' if the
>> record is supposed to exist and a '0' if it's supposed to be deleted ?
> Use NULL to indicate deleted-ness.  But note that you may still have to
> update lots of rows and indexes.  Indeed, a true delete might not need
> much more I/O (whereas a VACUUM after the DELETE would).




Re: [sqlite] Speed up DELETE of a lot of records

2010-10-08 Thread Nicolas Williams
On Fri, Oct 08, 2010 at 05:49:18PM +0100, Simon Slavin wrote:
> On 8 Oct 2010, at 5:48pm, Stephan Wehner wrote:
> > On Fri, Oct 8, 2010 at 7:14 AM, Michele Pradella
> >  wrote:
> >>  "science fiction?" was a rhetorical question. I'm only wondering
> >> about what is the best and fastest way to DELETE a lot of records from a
> >> huge DB. I know and understand the physical limits of moving data: anyway, for
> >> now I'm trying to split the BIG DELETE into several smaller DELETEs to spread
> >> out the time used. It's the only way I can figure out at the moment.
> > 
> > Is a soft-delete faster? Then you could add a slow-moving delete
> > (mentioned earlier by Aldes Rossi, for example)
> > for the soft-deleted records.
> 
> Soft-delete ?  Is that having another column which is a '1' if the
> record is supposed to exist and a '0' if it's supposed to be deleted ?

Use NULL to indicate deleted-ness.  But note that you may still have to
update lots of rows and indexes.  Indeed, a true delete might not need
much more I/O (whereas a VACUUM after the DELETE would).


Re: [sqlite] Speed up DELETE of a lot of records

2010-10-08 Thread Stephan Wehner
On Fri, Oct 8, 2010 at 9:49 AM, Simon Slavin  wrote:
>
> On 8 Oct 2010, at 5:48pm, Stephan Wehner wrote:
>
>> On Fri, Oct 8, 2010 at 7:14 AM, Michele Pradella
>>  wrote:
>>>  "science fiction?" was a rhetorical question. I'm only wondering
>>> about what is the best and fastest way to DELETE a lot of records from a
>>> huge DB. I know and understand the physical limits of moving data: anyway, for
>>> now I'm trying to split the BIG DELETE into several smaller DELETEs to spread
>>> out the time used. It's the only way I can figure out at the moment.
>>
>> Is a soft-delete faster? Then you could add a slow-moving delete
>> (mentioned earlier by Aldes Rossi, for example)
>> for the soft-deleted records.
>
> Soft-delete ?  Is that having another column which is a '1' if the record is 
> supposed to exist and a '0' if it's supposed to be deleted ?
>

Yes, that's what I meant. The column is usually called 'deleted': 1 means
deleted, 0 means not-deleted.

Stephan

> Simon.





Re: [sqlite] Speed up DELETE of a lot of records

2010-10-08 Thread Simon Slavin

On 8 Oct 2010, at 5:48pm, Stephan Wehner wrote:

> On Fri, Oct 8, 2010 at 7:14 AM, Michele Pradella
>  wrote:
>>  "science fiction?" was a rhetorical question. I'm only wondering
>> about what is the best and fastest way to DELETE a lot of records from a
>> huge DB. I know and understand the physical limits of moving data: anyway, for
>> now I'm trying to split the BIG DELETE into several smaller DELETEs to spread
>> out the time used. It's the only way I can figure out at the moment.
> 
> Is a soft-delete faster? Then you could add a slow-moving delete
> (mentioned earlier by Aldes Rossi, for example)
> for the soft-deleted records.

Soft-delete ?  Is that having another column which is a '1' if the record is 
supposed to exist and a '0' if it's supposed to be deleted ?

Simon.


Re: [sqlite] Speed up DELETE of a lot of records

2010-10-08 Thread Stephan Wehner
On Fri, Oct 8, 2010 at 7:14 AM, Michele Pradella
 wrote:
>  "science fiction?" was a rhetorical question. I'm only wondering
> about what is the best and fastest way to DELETE a lot of records from a
> huge DB. I know and understand the physical limits of moving data: anyway, for
> now I'm trying to split the BIG DELETE into several smaller DELETEs to spread
> out the time used. It's the only way I can figure out at the moment.
>

Is a soft-delete faster? Then you could add a slow-moving delete
(mentioned earlier by Aldes Rossi, for example)
for the soft-deleted records.

Stephan

> On 08/10/2010 15.55, Jay A. Kreibich wrote:
>> On Fri, Oct 08, 2010 at 09:09:09AM +0200, Michele Pradella scratched on the 
>> wall:
>>>    I was thinking this too, but I'm keeping it as a last resort: my hope is I
>>> can delete 5 million records in a few seconds. Science fiction? :)
>>    Science fiction of the worst B-grade sort.
>>
>>    Think about the numbers.  You're talking about updating a significant
>>    chunk of a multi-gigabyte file.  The WAL file tells you the changes
>>    amount to ~600MB of writes.  That's a whole CD's worth of data.  These
>>    days that might not be much for storage, but it is still a lot of
>>    data to move around.  Even if your storage system has a continuous,
>>    sustained write ability of 20MB/sec, that's a half minute.  How fast
>>    can your disk copy 600MB worth of data?
>>
>>    But you're not just writing.  You're doing a lot of reads from all
>>    over the file in an attempt to figure out what to modify and write.
>>    Both the reads and the writes (the integration, at least) are
>>    scattered and small, so you're not going to get anywhere near the
>>    sustained performance levels.  10x slower would be extremely good.
>>
>>    Or think of it in more physical numbers... If you're using a single
>>    vanilla disk, it likely spins at 7200 RPMs.  If it takes five minutes
>>    to update 5,000,000 records, that's an average of almost 140 records
>>    per disk revolution.  That's pretty good, considering everything else
>>    that is going on!
>>
>>
>>
>>    The only possible way to manipulate that much data in a "few seconds"
>>    is to load up on RAM, get a real operating system, and throw the
>>    whole database into memory.  Or spend many, many, many thousands of
>>    dollars on a very wide disk array with a very large battery-backed
>>    cache and a huge pipe between your host and the array.
>>
>>    Big storage is cheap.  Fast storage is not.  Don't confuse the two.
>>
>>     -j





Re: [sqlite] Speed up DELETE of a lot of records

2010-10-08 Thread Michele Pradella
  "science fiction?" was a rhetorically question. I'm only wondering 
about what is the best and fastest way to DELETE a lot of records from 
huge DB. I know and understand physical limit of data moving: anyway for 
now I'm trying to split the BIG DELETE in some smaller DELETE to spread 
the time used. It's the only way I can figure out at the moment.

On 08/10/2010 15.55, Jay A. Kreibich wrote:
> On Fri, Oct 08, 2010 at 09:09:09AM +0200, Michele Pradella scratched on the 
> wall:
>> I was thinking this too, but I'm keeping it as a last resort: my hope is I
>> can delete 5 million records in a few seconds. Science fiction? :)
>Science fiction of the worst B-grade sort.
>
>Think about the numbers.  You're talking about updating a significant
>chunk of a multi-gigabyte file.  The WAL file tells you the changes
> amount to ~600MB of writes.  That's a whole CD's worth of data.  These
>days that might not be much for storage, but it is still a lot of
>data to move around.  Even if your storage system has a continuous,
>sustained write ability of 20MB/sec, that's a half minute.  How fast
>can your disk copy 600MB worth of data?
>
>But you're not just writing.  You're doing a lot of reads from all
>over the file in an attempt to figure out what to modify and write.
>Both the reads and the writes (the integration, at least) are
>scattered and small, so you're not going to get anywhere near the
>sustained performance levels.  10x slower would be extremely good.
>
>Or think of it in more physical numbers... If you're using a single
>vanilla disk, it likely spins at 7200 RPMs.  If it takes five minutes
>to update 5,000,000 records, that's an average of almost 140 records
>per disk revolution.  That's pretty good, considering everything else
>that is going on!
>
>
>
>The only possible way to manipulate that much data in a "few seconds"
>is to load up on RAM, get a real operating system, and throw the
>whole database into memory.  Or spend many, many, many thousands of
>dollars on a very wide disk array with a very large battery-backed
>cache and a huge pipe between your host and the array.
>
>Big storage is cheap.  Fast storage is not.  Don't confuse the two.
>
> -j
>
>




Re: [sqlite] Speed up DELETE of a lot of records

2010-10-08 Thread Jay A. Kreibich
On Fri, Oct 08, 2010 at 09:09:09AM +0200, Michele Pradella scratched on the 
wall:
>   I was thinking this too, but I'm keeping it as a last resort: my hope is I 
> can delete 5 million records in a few seconds. Science fiction? :)

  Science fiction of the worst B-grade sort.

  Think about the numbers.  You're talking about updating a significant
  chunk of a multi-gigabyte file.  The WAL file tells you the changes
  amount to ~600MB of writes.  That's a whole CD's worth of data.  These
  days that might not be much for storage, but it is still a lot of
  data to move around.  Even if your storage system has a continuous,
  sustained write ability of 20MB/sec, that's a half minute.  How fast
  can your disk copy 600MB worth of data?

  But you're not just writing.  You're doing a lot of reads from all
  over the file in an attempt to figure out what to modify and write.
  Both the reads and the writes (the integration, at least) are
  scattered and small, so you're not going to get anywhere near the
  sustained performance levels.  10x slower would be extremely good.

  Or think of it in more physical numbers... If you're using a single
  vanilla disk, it likely spins at 7200 RPMs.  If it takes five minutes
  to update 5,000,000 records, that's an average of almost 140 records
  per disk revolution.  That's pretty good, considering everything else
  that is going on!



  The only possible way to manipulate that much data in a "few seconds"
  is to load up on RAM, get a real operating system, and throw the
  whole database into memory.  Or spend many, many, many thousands of
  dollars on a very wide disk array with a very large battery-backed
  cache and a huge pipe between your host and the array.

  Big storage is cheap.  Fast storage is not.  Don't confuse the two.

   -j




Re: [sqlite] Speed up DELETE of a lot of records

2010-10-08 Thread Max Vlasov
On Thu, Oct 7, 2010 at 8:05 PM, Michele Pradella  wrote:

>  Hi all, I have a question about how to speed up a DELETE statement.
> I have a DB of about 3GB: the DB has about 23 million records.
> The DB is indexed by a DateTime column (a 64-bit integer), and
> suppose you want to delete all records before a date.
> Now I'm using a syntax like this (I tried all the statements with the
> sqlite shell):
> suppose we use __int64 DateValue = the date limit you want to delete before
>
> DELETE FROM table_name WHERE DateTime < DateValue;
>
> the statement is trying to delete about 5 million records and it takes
> about 4-5 minutes.
> Is there a way to speed up the DELETE?
>



I think there is some optimization possible in this case, but I don't know
whether it exists in the optimizer currently.
It could work like this: if sqlite first deletes entries from the index
B-tree while appending rowids to an automatic index based on the rowid of the
main table (the rowid is a part of the index entry, obviously), the next delete
from the main B-tree can use this index to delete faster. I tried to determine
the existence of such an optimization from the numbers of reads and writes, and
it seems that DELETE does a separate lookup in the main table for every index
entry it finds.
Does something like this exist in sqlite, and if not, would it be worth
implementing?

Max


Re: [sqlite] Speed up DELETE of a lot of records

2010-10-08 Thread Black, Michael (IS)
Rather than doing batch deletes, why don't you add a date/time constraint to 
your selects?

Then you can have a thread which does a lazy delete on the old data.  Or you 
can create a trigger on inserts which deletes anything older than N days or 
such (see the sketch below).

Or... if all you want is an interruptible process, do your delete from a select 
statement one record at a time.

It should take about the same amount of time, but your users wouldn't notice it.

You're never going to get through 600MB of data and 5 million records in 
"seconds".
 
 



From: sqlite-users-boun...@sqlite.org on behalf of Michele Pradella
Sent: Fri 10/8/2010 4:06 AM
To: General Discussion of SQLite Database
Subject: EXTERNAL:Re: [sqlite] Speed up DELETE of a lot of records




  I tried 32 and it takes the same time. Moreover, the RAM used is about 500MB.
Even with "PRAGMA synchronous = OFF;" the operation time is the same.
I think the only way for now is to run more DELETEs with fewer elements each,
but that's not the best way.






Re: [sqlite] Speed up DELETE of a lot of records

2010-10-08 Thread Michele Pradella
  I tried 32 and it takes the same time. Moreover, the RAM used is about 500MB.
Even with "PRAGMA synchronous = OFF;" the operation time is the same.
I think the only way for now is to run more DELETEs with fewer elements each,
but that's not the best way.


On 08/10/2010 10.41, Marcus Grimm wrote:
> Michele Pradella wrote:
>>    OK, I'll try with 3.7.3.
>> DELETE is a little bit faster, and the -wal is reintegrated when I close
>> the connection.
>> Changing cache_size (I tried 1) the DELETE takes the same time to complete.
> 1 doesn't sound very big; I used to set it to e.g. 32 instead when
> working with a GB-size DB (mainly for index creation and "PRAGMA 
> integrity_check;").
>
> Maybe playing with temporarily setting PRAGMA synchronous = OFF; would be 
> useful,
> although I think the wal mode should already avoid too much syncing.
>
> Marcus
>
>
>> It was my fault: to close the shell connection I used Ctrl+C, but
>> this leaves the -wal file. If I close with .quit the -wal file is
>> reintegrated.
>> I thought that Ctrl+C was like a ".quit" but it's not.
>> Anyway, if I close the DB connection with Ctrl+C and then reopen the
>> connection and close it with .quit, the -wal file is not reintegrated.
>>
>> On 08/10/2010 9.56, Michele Pradella wrote:
>>> I'll try to increase the cache size, and I'll try the operation on my DB with
>>> 3.7.3; anyway, I already ported the fix for the WAL issue from a recent
>>> snapshot. I'll try and let you know.
>>>
>>> On 08/10/2010 9.52, Marcus Grimm wrote:
 Michele Pradella wrote:
>  As I explained in a previous email, I think that recreating the index is
> the slowest operation I can do on my DB.
> Anyway, in my first email I asked another question about the -wal file.
> Trying to DELETE the (5 million) records with the SQLite shell interface,
> I can see the -wal grow up to 600MB. I cannot understand why the -wal
> is no longer reintegrated, and even when I close the connection (closing
> the shell) the -wal file still exists.
> Trying for example the statement "create table new as select * from
> current where condition = keep;", the -wal file grows up to 1.5GB and
> then the same: after closing the shell the -wal remains.
>
> Moreover, the operation above ("create new") terminates with "Error:
> disk I/O error".
> The hard disk I use has a lot of free space and it's a SATA2 hard disk, so
> it is internal.
 You may try with the new 3.7.3 version, the 3.7.2 doesn't operate
 very well on win32 when doing huge transactions in wal mode.

 2nd, when running sqlite with a gigabyte sized database it is useful
 to heavily increase the cache size, not sure if that helps for delete
 statements though, but it does in particular when creating indices.

 Marcus

> On 07/10/2010 20.38, Petite Abeille wrote:
>> On Oct 7, 2010, at 6:05 PM, Michele Pradella wrote:
>>
>>> I have a DB of about 3GB: the DB has about 23 million records. [..]
>>> the statement is trying to delete about 5 million records and it takes
>>> about 4-5 minutes.
>>> Is there a way to speed up the DELETE?
>> Considering that you want to delete about a quarter of the records, 
>> perhaps it would be more efficient to recreate that table altogether, no?
>>
>> Pseudocode:
>>
>> create table new as select * from current where condition = keep;
>> create index on new;
>> drop table current;
>> alter table rename new to current;
>>




Re: [sqlite] Speed up DELETE of a lot of records

2010-10-08 Thread Michele Pradella
  A DELETE on the PrimaryKey instead of the DateTime index takes the same time.

On 08/10/2010 10.30, Michele Pradella wrote:
>    OK, I'll try with 3.7.3.
> DELETE is a little bit faster, and the -wal is reintegrated when I close
> the connection.
> Changing cache_size (I tried 1) the DELETE takes the same time to complete.
> It was my fault: to close the shell connection I used Ctrl+C, but
> this leaves the -wal file. If I close with .quit the -wal file is
> reintegrated.
> I thought that Ctrl+C was like a ".quit" but it's not.
> Anyway, if I close the DB connection with Ctrl+C and then reopen the
> connection and close it with .quit, the -wal file is not reintegrated.
>
> On 08/10/2010 9.56, Michele Pradella wrote:
>> I'll try to increase the cache size, and I'll try the operation on my DB with
>> 3.7.3; anyway, I already ported the fix for the WAL issue from a recent
>> snapshot. I'll try and let you know.
>>
>> On 08/10/2010 9.52, Marcus Grimm wrote:
>>> Michele Pradella wrote:
  As I explained in a previous email, I think that recreating the index is
 the slowest operation I can do on my DB.
 Anyway, in my first email I asked another question about the -wal file.
 Trying to DELETE the (5 million) records with the SQLite shell interface,
 I can see the -wal grow up to 600MB. I cannot understand why the -wal
 is no longer reintegrated, and even when I close the connection (closing
 the shell) the -wal file still exists.
 Trying for example the statement "create table new as select * from
 current where condition = keep;", the -wal file grows up to 1.5GB and
 then the same: after closing the shell the -wal remains.

 Moreover, the operation above ("create new") terminates with "Error:
 disk I/O error".
 The hard disk I use has a lot of free space and it's a SATA2 hard disk, so
 it is internal.
>>> You may try with the new 3.7.3 version, the 3.7.2 doesn't operate
>>> very well on win32 when doing huge transactions in wal mode.
>>>
>>> 2nd, when running sqlite with a gigabyte sized database it is useful
>>> to heavily increase the cache size, not sure if that helps for delete
>>> statements though, but it does in particular when creating indices.
>>>
>>> Marcus
>>>
 On 07/10/2010 20.38, Petite Abeille wrote:
> On Oct 7, 2010, at 6:05 PM, Michele Pradella wrote:
>
>> I have a DB of about 3GB: the DB has about 23 million records. [..]
>> the statement is trying to delete about 5 million records and it takes
>> about 4-5 minutes.
>> Is there a way to speed up the DELETE?
> Considering that you want to delete about a quarter of the records, 
> perhaps it would be more efficient to recreate that table altogether, no?
>
> Pseudocode:
>
> create table new as select * from current where condition = keep;
> create index on new;
> drop table current;
> alter table rename new to current;
>
>




Re: [sqlite] Speed up DELETE of a lot of records

2010-10-08 Thread Marcus Grimm

Michele Pradella wrote:
>   OK, I'll try with 3.7.3.
> DELETE is a little bit faster, and the -wal is reintegrated when I close 
> the connection.
> Changing cache_size (I tried 1) the DELETE takes the same time to complete.

1 doesn't sound very big; I used to set it to e.g. 32 instead when
working with a GB-size DB (mainly for index creation and "PRAGMA 
integrity_check;").

Maybe playing with temporarily setting PRAGMA synchronous = OFF; would be useful,
although I think the wal mode should already avoid too much syncing.

Marcus


> It was my fault: to close the shell connection I used Ctrl+C, but 
> this leaves the -wal file. If I close with .quit the -wal file is 
> reintegrated.
> I thought that Ctrl+C was like a ".quit" but it's not.
> Anyway, if I close the DB connection with Ctrl+C and then reopen the 
> connection and close it with .quit, the -wal file is not reintegrated.
> 
> On 08/10/2010 9.56, Michele Pradella wrote:
>> I'll try to increase the cache size, and I'll try the operation on my DB with
>> 3.7.3; anyway, I already ported the fix for the WAL issue from a recent
>> snapshot. I'll try and let you know.
>>
>> On 08/10/2010 9.52, Marcus Grimm wrote:
>>> Michele Pradella wrote:
 As I explained in a previous email, I think that recreating the index is
 the slowest operation I can do on my DB.
 Anyway, in my first email I asked another question about the -wal file.
 Trying to DELETE the (5 million) records with the SQLite shell interface,
 I can see the -wal grow up to 600MB. I cannot understand why the -wal
 is no longer reintegrated, and even when I close the connection (closing
 the shell) the -wal file still exists.
 Trying for example the statement "create table new as select * from
 current where condition = keep;", the -wal file grows up to 1.5GB and
 then the same: after closing the shell the -wal remains.

 Moreover, the operation above ("create new") terminates with "Error:
 disk I/O error".
 The hard disk I use has a lot of free space and it's a SATA2 hard disk, so
 it is internal.
>>> You may try with the new 3.7.3 version, the 3.7.2 doesn't operate
>>> very well on win32 when doing huge transactions in wal mode.
>>>
>>> 2nd, when running sqlite with a gigabyte sized database it is useful
>>> to heavily increase the cache size, not sure if that helps for delete
>>> statements though, but it does in particular when creating indices.
>>>
>>> Marcus
>>>
 On 07/10/2010 20.38, Petite Abeille wrote:
> On Oct 7, 2010, at 6:05 PM, Michele Pradella wrote:
>
>> I have a DB of about 3GB: the DB has about 23 million records. [..]
>> the statement is trying to delete about 5 million records and it takes
>> about 4-5 minutes.
>> Is there a way to speed up the DELETE?
> Considering that you want to delete about a quarter of the records, 
> perhaps it would be more efficient to recreate that table altogether, no?
>
> Pseudocode:
>
> create table new as select * from current where condition = keep;
> create index on new;
> drop table current;
> alter table rename new to current;
>


Re: [sqlite] Speed up DELETE of a lot of records

2010-10-08 Thread Michele Pradella
  OK, I'll try with 3.7.3.
DELETE is a little bit faster, and the -wal is reintegrated when I close 
the connection.
Changing cache_size (I tried 1) the DELETE takes the same time to complete.
It was my fault: to close the shell connection I used Ctrl+C, but 
this leaves the -wal file. If I close with .quit the -wal file is 
reintegrated.
I thought that Ctrl+C was like a ".quit" but it's not.
Anyway, if I close the DB connection with Ctrl+C and then reopen the 
connection and close it with .quit, the -wal file is not reintegrated.
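
For what it's worth, a checkpoint can also be requested by hand from SQL, 
without closing the connection; a minimal example (available in the 
WAL-capable 3.7.x versions):

  PRAGMA wal_checkpoint;   -- transfer -wal content back into the main DB file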

On 08/10/2010 9.56, Michele Pradella wrote:
> I'll try to increase the cache size, and I'll try the operation on my DB with
> 3.7.3; anyway, I already ported the fix for the WAL issue from a recent
> snapshot. I'll try and let you know.
>
> On 08/10/2010 9.52, Marcus Grimm wrote:
>> Michele Pradella wrote:
>>> As I explained in a previous email, I think that recreating the index is
>>> the slowest operation I can do on my DB.
>>> Anyway, in my first email I asked another question about the -wal file.
>>> Trying to DELETE the (5 million) records with the SQLite shell interface,
>>> I can see the -wal grow up to 600MB. I cannot understand why the -wal
>>> is no longer reintegrated, and even when I close the connection (closing
>>> the shell) the -wal file still exists.
>>> Trying for example the statement "create table new as select * from
>>> current where condition = keep;", the -wal file grows up to 1.5GB and
>>> then the same: after closing the shell the -wal remains.
>>>
>>> Moreover, the operation above ("create new") terminates with "Error:
>>> disk I/O error".
>>> The hard disk I use has a lot of free space and it's a SATA2 hard disk, so
>>> it is internal.
>> You may try with the new 3.7.3 version, the 3.7.2 doesn't operate
>> very well on win32 when doing huge transactions in wal mode.
>>
>> 2nd, when running sqlite with a gigabyte sized database it is useful
>> to heavily increase the cache size, not sure if that helps for delete
>> statements though, but it does in particular when creating indices.
>>
>> Marcus
>>
 On 07/10/2010 20.38, Petite Abeille wrote:
 On Oct 7, 2010, at 6:05 PM, Michele Pradella wrote:

> I have a DB of about 3GB: the DB has about 23 million records. [..]
> the statement is trying to delete about 5 million records and it takes
> about 4-5 minutes.
> Is there a way to speed up the DELETE?
 Considering that you want to delete about a quarter of the records, 
 perhaps it would be more efficient to recreate that table altogether, no?

 Pseudocode:

 create table new as select * from current where condition = keep;
 create index on new;
 drop table current;
 alter table rename new to current;





Re: [sqlite] Speed up DELETE of a lot of records

2010-10-08 Thread Michele Pradella
  I'll try to increase the cache size, and I'll try the operation on my DB with 
3.7.3; anyway, I already ported the fix for the WAL issue from a recent 
snapshot. I'll try and let you know.

On 08/10/2010 9.52, Marcus Grimm wrote:
> Michele Pradella wrote:
>> As I explained in a previous email, I think that recreating the index is
>> the slowest operation I can do on my DB.
>> Anyway, in my first email I asked another question about the -wal file.
>> Trying to DELETE the (5 million) records with the SQLite shell interface,
>> I can see the -wal grow up to 600MB. I cannot understand why the -wal
>> is no longer reintegrated, and even when I close the connection (closing
>> the shell) the -wal file still exists.
>> Trying for example the statement "create table new as select * from
>> current where condition = keep;", the -wal file grows up to 1.5GB and
>> then the same: after closing the shell the -wal remains.
>>
>> Moreover, the operation above ("create new") terminates with "Error:
>> disk I/O error".
>> The hard disk I use has a lot of free space and it's a SATA2 hard disk, so
>> it is internal.
> You may try with the new 3.7.3 version, the 3.7.2 doesn't operate
> very well on win32 when doing huge transactions in wal mode.
>
> 2nd, when running sqlite with a gigabyte sized database it is useful
> to heavily increase the cache size, not sure if that helps for delete
> statements though, but it does in particular when creating indices.
>
> Marcus
>
>> On 07/10/2010 20.38, Petite Abeille wrote:
>>> On Oct 7, 2010, at 6:05 PM, Michele Pradella wrote:
>>>
 I have a DB of about 3GB: the DB has about 23 million records. [..]
 the statement is trying to delete about 5 million records and it takes
 about 4-5 minutes.
 Is there a way to speed up the DELETE?
>>> Considering that you want to delete about a quarter of the records, perhaps 
>>> it would be more efficient to recreate that table altogether, no?
>>>
>>> Pseudocode:
>>>
>>> create table new as select * from current where condition = keep;
>>> create index on new;
>>> drop table current;
>>> alter table rename new to current;
>>>




Re: [sqlite] Speed up DELETE of a lot of records

2010-10-08 Thread Marcus Grimm

Michele Pradella wrote:
>   As I explained in a previous email, I think that recreating the index is 
> the slowest operation I can do on my DB.
> Anyway, in my first email I asked another question about the -wal file.
> Trying to DELETE the (5 million) records with the SQLite shell interface, 
> I can see the -wal grow up to 600MB. I cannot understand why the -wal 
> is no longer reintegrated, and even when I close the connection (closing 
> the shell) the -wal file still exists.
> Trying for example the statement "create table new as select * from 
> current where condition = keep;", the -wal file grows up to 1.5GB and 
> then the same: after closing the shell the -wal remains.
> 
> Moreover, the operation above ("create new") terminates with "Error: 
> disk I/O error".
> The hard disk I use has a lot of free space and it's a SATA2 hard disk, so 
> it is internal.

You may try the new 3.7.3 version; 3.7.2 doesn't operate
very well on win32 when doing huge transactions in WAL mode.

Second, when running SQLite with a gigabyte-sized database it is useful
to increase the cache size heavily; not sure if that helps for DELETE
statements, but it does in particular when creating indices.
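
As a sketch of this cache-size advice: in the 3.7.x era PRAGMA cache_size
is measured in pages (1024 bytes each by default), applies per connection,
and is not persistent, so the value below is only an illustration:

  PRAGMA cache_size = 100000;   -- roughly 100 MB of page cache at 1 KB pages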

Marcus

> 
> On 07/10/2010 20.38, Petite Abeille wrote:
>> On Oct 7, 2010, at 6:05 PM, Michele Pradella wrote:
>>
>>> I have a DB of about 3GB: the DB has about 23 million records. [..]
>>> the statement is trying to delete about 5 million records and it takes
>>> about 4-5 minutes.
>>> Is there a way to try to speed up the DELETE?
>> Considering that you want to delete about a quarter of the records, perhaps 
>> it would be more efficient to recreate that table altogether, no?
>>
>> Pseudocode:
>>
>> create table new as select * from current where condition = keep;
>> create index on new;
>> drop table current;
>> alter table new rename to current;
>>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Speed up DELETE of a lot of records

2010-10-08 Thread Michele Pradella
  I was thinking this too, but I'm keeping it as a last resort: my hope is that 
I can delete 5 million records in a few seconds. Science fiction? :)

On 08/10/2010 9.00, Aldes Rossi wrote:
> On 10/08/2010 08:30 AM, Michele Pradella wrote:
>> I don't know if it could be faster to do more DELETEs of fewer records, or
>> perhaps to make a VIEW and then delete all the records matching the
>> VIEW. I'm thinking about this to find the fastest solution, because the
>> problem is that while SQLite is deleting the records I obviously cannot
>> access the DB to insert new records, and all the operations have to wait
>> for the delete to complete... and 4-5 minutes is too much time to wait.
> Maybe splitting this operation into many DELETEs of fewer records lengthens
> the total time, but results in shorter service interruptions, and may in
> fact be more acceptable.
>
> Aldes Rossi
>


-- 
Selea s.r.l.


Michele Pradella R&D


SELEA s.r.l.

Via Aldo Moro 69
Italy - 46019 Cicognara (MN)
Tel +39 0375 889091
Fax +39 0375 889080
*michele.prade...@selea.com* 
*http://www.selea.com*
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Speed up DELETE of a lot of records

2010-10-08 Thread Aldes Rossi
  On 10/08/2010 08:30 AM, Michele Pradella wrote:
> I don't know if it could be faster to do more DELETEs of fewer records, or
> perhaps to make a VIEW and then delete all the records matching the
> VIEW. I'm thinking about this to find the fastest solution, because the
> problem is that while SQLite is deleting the records I obviously cannot
> access the DB to insert new records, and all the operations have to wait
> for the delete to complete... and 4-5 minutes is too much time to wait.
Maybe splitting this operation into many DELETEs of fewer records lengthens the
total time, but results in shorter service interruptions, and may in fact be
more acceptable.

Aldes Rossi

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Speed up DELETE of a lot of records

2010-10-07 Thread Michele Pradella
  As I explained in a previous email, I think that recreating the index is 
the slowest operation I can do on my DB.
Anyway, in my first email I asked another question about the -wal file.
Trying to DELETE the (5 million) records with the SQLite shell interface 
I can see the -wal grow up to 600MB. I cannot understand why the -wal 
is no longer reintegrated, and even when I close the connection (closing 
the shell) the -wal file still exists.
Trying for example the statement "create table new as select * from 
current where condition = keep;", the -wal file grows up to 1.5GB and 
then the same: after closing the shell the -wal remains.

Moreover, the operation above ("create new") terminates with "Error: 
disk I/O error".
The hard disk I use has a lot of free space and it's a SATA2 hard disk, so 
it is internal.
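
For the -wal question above, a checkpoint can also be requested explicitly;
a minimal sketch, assuming the connection is in WAL mode:

  PRAGMA wal_checkpoint;   -- copies WAL frames back into the main database file

The -wal file itself is normally deleted only when the last connection to
the database closes cleanly.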

On 07/10/2010 20.38, Petite Abeille wrote:
> On Oct 7, 2010, at 6:05 PM, Michele Pradella wrote:
>
>> I have a DB of about 3GB: the DB has about 23 million records. [..]
>> the statement is trying to delete about 5 million records and it takes
>> about 4-5 minutes.
>> Is there a way to try to speed up the DELETE?
> Considering that you want to delete about a quarter of the records, perhaps 
> it would be more efficient to recreate that table altogether, no?
>
> Pseudocode:
>
> create table new as select * from current where condition = keep;
> create index on new;
> drop table current;
> alter table new rename to current;
>


-- 
Selea s.r.l.


Michele Pradella R&D


SELEA s.r.l.

Via Aldo Moro 69
Italy - 46019 Cicognara (MN)
Tel +39 0375 889091
Fax +39 0375 889080
*michele.prade...@selea.com* 
*http://www.selea.com*
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Speed up DELETE of a lot of records

2010-10-07 Thread Michele Pradella
  I think that if the total is 23 million records and 5 million are to be 
deleted, creating a temp table of 23-5=18 million records is slower than 
directly deleting the 5 million records.
Anyway, the table has only one index, on the DateTime column, which is 
INTEGER.
Dropping all indexes, deleting, and then recreating the indexes is the slowest 
operation I can do on the DB, so I cannot use it.
I could try to DELETE on the primary key instead of on the DateTime, but I 
think that the slowest operation is the deletion of the records, not the 
selection. Furthermore, I cannot be sure that the primary key is always 
incremental; usually it should be, but if, for example, the primary key wraps, 
it might not be the right thing to do.
I don't know if it could be faster to do more DELETEs of fewer records, or 
perhaps to make a VIEW and then delete all the records matching the 
VIEW. I'm thinking about this to find the fastest solution, because the 
problem is that while SQLite is deleting the records I obviously cannot 
access the DB to insert new records, and all the operations have to wait 
for the delete to complete... and 4-5 minutes is too much time to wait.
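
One way to split the DELETE as discussed above is a batched delete; a
minimal sketch with hypothetical names (my_table, an indexed DateTime
column, and an arbitrary cutoff value):

  -- repeat until the statement affects 0 rows (check sqlite3_changes())
  DELETE FROM my_table
   WHERE rowid IN (SELECT rowid FROM my_table
                    WHERE DateTime < 1286400000
                    LIMIT 10000);

Each run is its own short transaction, so inserts from other connections
only ever wait for one small batch instead of the whole 4-5 minute delete.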


On 07/10/2010 19.14, P Kishor wrote:
> On Thu, Oct 7, 2010 at 11:05 AM, Michele Pradella
>   wrote:
>>   Hi all, I have a question about how to speed up a DELETE statement.
>> I have a DB of about 3GB: the DB has about 23 million records.
>> The DB is indexed by a DateTime column (a 64 bit integer), and
>> suppose you want to delete all records before a date.
>> Now I'm using a syntax like this (I tried all the statements with the
>> sqlite shell):
>> suppose we use __int64 DateValue = the date limit you want to delete up to
>>
>> DELETE FROM table_name WHERE DateTime < DateValue;
> What is the speed of
>
> SELECT * FROM table WHERE DateTime >= DateValue;
>
> If the above speed is acceptable, then try the following
>
> CREATE TABLE tmp AS SELECT * FROM table WHERE DateTime >= DateValue;
> DROP TABLE table;
> ALTER TABLE tmp RENAME to table;
>
>
>> the statement is trying to delete about 5 million records and it takes
>> about 4-5 minutes.
>> Is there a way to try to speed up the DELETE? I already tried to put the
>> DELETE statement between a BEGIN; COMMIT; pair, but same result.
>> After the delete completes I have a -wal file of about 600MB: this file
>> is not deleted even if I disconnect from the database.
>> Is that the right behavior? I thought that when the last DB connection
>> terminates the -wal file is reintegrated into the DB, but it's not.
>>


-- 
Selea s.r.l.


Michele Pradella R&D


SELEA s.r.l.

Via Aldo Moro 69
Italy - 46019 Cicognara (MN)
Tel +39 0375 889091
Fax +39 0375 889080
*michele.prade...@selea.com* 
*http://www.selea.com*
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Speed up DELETE of a lot of records

2010-10-07 Thread Petite Abeille

On Oct 7, 2010, at 6:05 PM, Michele Pradella wrote:

> I have a DB of about 3GB: the DB has about 23 million records. [..]
> the statement is trying to delete about 5 million records and it takes 
> about 4-5 minutes.
> Is there a way to try to speed up the DELETE?

Considering that you want to delete about a quarter of the records, perhaps it 
would be more efficient to recreate that table altogether, no?

Pseudocode:

create table new as select * from current where condition = keep;
create index on new;
drop table current;
alter table new rename to current;
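
A concrete rendering of that pseudocode in SQLite syntax, as a sketch only;
the index definition and the cutoff standing in for the "keep" condition
are hypothetical placeholders:

  BEGIN;
  CREATE TABLE new AS SELECT * FROM current WHERE DateTime >= 1286400000;
  CREATE INDEX new_datetime ON new(DateTime);
  DROP TABLE current;
  ALTER TABLE new RENAME TO current;
  COMMIT;

Note that CREATE TABLE ... AS SELECT copies only the data, not primary keys
or other constraints, so a real rebuild would declare the new table
explicitly and then use INSERT ... SELECT.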

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Speed up DELETE of a lot of records

2010-10-07 Thread P Kishor
On Thu, Oct 7, 2010 at 11:05 AM, Michele Pradella
 wrote:
>  Hi all, I have a question about how to speed up a DELETE statement.
> I have a DB of about 3GB: the DB has about 23 million records.
> The DB is indexed by a DateTime column (a 64 bit integer), and
> suppose you want to delete all records before a date.
> Now I'm using a syntax like this (I tried all the statements with the
> sqlite shell):
> suppose we use __int64 DateValue = the date limit you want to delete up to
>
> DELETE FROM table_name WHERE DateTime < DateValue;

What is the speed of

SELECT * FROM table WHERE DateTime >= DateValue;

If the above speed is acceptable, then try the following

CREATE TABLE tmp AS SELECT * FROM table WHERE DateTime >= DateValue;
DROP TABLE table;
ALTER TABLE tmp RENAME to table;


> the statement is trying to delete about 5 million records and it takes
> about 4-5 minutes.
> Is there a way to try to speed up the DELETE? I already tried to put the
> DELETE statement between a BEGIN; COMMIT; pair, but same result.
> After the delete completes I have a -wal file of about 600MB: this file
> is not deleted even if I disconnect from the database.
> Is that the right behavior? I thought that when the last DB connection
> terminates the -wal file is reintegrated into the DB, but it's not.
>



-- 
Puneet Kishor http://www.punkish.org
Carbon Model http://carbonmodel.org
Charter Member, Open Source Geospatial Foundation http://www.osgeo.org
Science Commons Fellow, http://sciencecommons.org/about/whoweare/kishor
Nelson Institute, UW-Madison http://www.nelson.wisc.edu
---
Assertions are politics; backing up assertions with evidence is science
===
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Speed up DELETE of a lot of records

2010-10-07 Thread Pavel Ivanov
> But the rows he wants to delete are those with DateTime < DateValue, so 
> without an index on that column SQL can't find which rows to delete quickly !

"Quickly" is appropriate for one row. For several rows SQLite will
sequentially scan the index and for each rowid found there it will
traverse the table's b-tree structure from top to bottom searching for
the rowid and then delete that row. For 3 million rows it's quite
likely that this process is slower than sequentially scanning the
table and marking as deleted all rows satisfying the condition.
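
Pavel's point can be checked directly, since SQLite both reports its plan
and lets you disqualify a column from index use with a unary '+'; a sketch
with hypothetical names:

  EXPLAIN QUERY PLAN
  DELETE FROM my_table WHERE DateTime < 1286400000;    -- shows whether the index is used

  DELETE FROM my_table WHERE +DateTime < 1286400000;   -- '+' forces the sequential table scan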


Pavel

On Thu, Oct 7, 2010 at 12:52 PM, Simon Slavin  wrote:
>
> On 7 Oct 2010, at 5:36pm, Jay A. Kreibich wrote:
>
>> On Thu, Oct 07, 2010 at 05:22:19PM +0100, Simon Slavin scratched on the wall:
>>> On 7 Oct 2010, at 5:05pm, Michele Pradella wrote:
>>>
>>>> The DB is indexed by a DateTime column (a 64 bit integer)
>>>
>>> Do make sure that that column is declared as INTEGER and that there
>>> is an index on it.
>>
>>  When deleting 20 to 25% of the rows, an index is likely to slow
>>  things down.
>
> But the rows he wants to delete are those with DateTime < DateValue, so 
> without an index on that column SQL can't find which rows to delete quickly !
>
> Hmm.  If all rows are entered in DateTime order, and the table has an 
> AUTOINCREMENT primary key, then the values in that column will be in the same 
> order as the values in the primary key.  So perhaps he could do one SELECT to 
> find the appropriate primary key value, then use the primary key in his 
> DELETE command instead of the DateTime column.
>
> Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Speed up DELETE of a lot of records

2010-10-07 Thread Simon Slavin

On 7 Oct 2010, at 5:36pm, Jay A. Kreibich wrote:

> On Thu, Oct 07, 2010 at 05:22:19PM +0100, Simon Slavin scratched on the wall:
>> On 7 Oct 2010, at 5:05pm, Michele Pradella wrote:
>> 
>>> The DB is indexed by a DateTime column (a 64 bit integer)
>> 
>> Do make sure that that column is declared as INTEGER and that there
>> is an index on it.
> 
>  When deleting 20 to 25% of the rows, an index is likely to slow
>  things down.

But the rows he wants to delete are those with DateTime < DateValue, so without 
an index on that column SQL can't find which rows to delete quickly !

Hmm.  If all rows are entered in DateTime order, and the table has an 
AUTOINCREMENT primary key, then the values in that column will be in the same 
order as the values in the primary key.  So perhaps he could do one SELECT to 
find the appropriate primary key value, then use the primary key in his 
DELETE command instead of the DateTime column.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
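
A minimal sketch of this primary-key idea, with hypothetical names
(my_table, an AUTOINCREMENT column id, and an arbitrary cutoff):

  -- find the boundary key once, via the DateTime index
  SELECT max(id) FROM my_table WHERE DateTime < 1286400000;

  -- then delete by primary key; for a rowid table this needs no extra index
  DELETE FROM my_table WHERE id <= 42;   -- 42 stands in for the value returned above

This only behaves as intended while id order matches DateTime order, i.e.
as long as the key never wraps, which is the caveat raised earlier in the
thread.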


Re: [sqlite] Speed up DELETE of a lot of records

2010-10-07 Thread Jay A. Kreibich
On Thu, Oct 07, 2010 at 05:22:19PM +0100, Simon Slavin scratched on the wall:
> On 7 Oct 2010, at 5:05pm, Michele Pradella wrote:
> 
> > The DB is indexed by a DateTime column (a 64 bit integer)
> 
> Do make sure that that column is declared as INTEGER and that there
> is an index on it.

  When deleting 20 to 25% of the rows, an index is likely to slow
  things down.

   -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"Intelligence is like underwear: it is important that you have it,
 but showing it to the wrong people has the tendency to make them
 feel uncomfortable." -- Angela Johnson
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Speed up DELETE of a lot of records

2010-10-07 Thread Simon Slavin

On 7 Oct 2010, at 5:05pm, Michele Pradella wrote:

> The DB is indexed by a DateTime column (a 64 bit integer)

Do make sure that that column is declared as INTEGER and that there is an index 
on it.

> DELETE FROM table_name WHERE DateTime < DateValue;
> 
> the statement is trying to delete about 5 million records and it takes 
> about 4-5 minutes.
> Is there a way to try to speed up the DELETE?

Do you have many indexes on that table ?  Or any complicated ones ?  It might 
be worth doing something like

BEGIN
DROP all indexes on the table apart from the one on DateTime
DELETE the records
CREATE all the dropped indexes again
COMMIT

On the other hand you say you're deleting about 5 million records out of 23, so 
that might not help.
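
A concrete sketch of that pseudocode; the second index name and its column
are hypothetical, since the thread only mentions the index on DateTime:

  BEGIN;
  DROP INDEX some_other_index;                  -- every index except the DateTime one
  DELETE FROM table_name WHERE DateTime < DateValue;
  CREATE INDEX some_other_index ON table_name(other_column);
  COMMIT;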

> I already tried to put the 
> DELETE statement between a BEGIN; COMMIT; pair, but same result.

Without declaring transactions, one statement (i.e. one DELETE command) is one 
transaction, no matter how many records it affects.  As you've 
discovered.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users