Re: [sqlite] SQL approach to Import only new items, delete other items

2007-09-07 Thread Dwight Ingersoll
On 9/6/07, Andre du Plessis <[EMAIL PROTECTED]> wrote:


> Im importing data
>
> The data has a unique value, call it MD5 for now that could be a unique
> value for the data.
>
>
>
> Each record that gets imported is converted to MD5, a lookup is done on
> the table for that MD5,
>
> if found it must leave it alone, if not found it must insert a new
> record...


<>

A SQL approach to this would be to force a left inner join creating a result
set which would indicate which records currently exist/does not exist in the
primary table.  So in the result set, if both columns have values, then the
record exists in the primary table and needs to be updated, if the column
from the primary table is null, then it doesn't exist and must be inserted.
To determine which records need to be deleted, force a right outer join, and
use similar logic to determine which records need to be deleted.

Unfortunately the SQL dialects I learned were prior to SQL-92, and my
examples use the database engine proprietary syntax to indicate forced
inner/outer joins, so I can't give you an example.  I'm going to post
another message asking what the correct syntax would be in SQL-92 terms.


Re: [sqlite] SQL approach to Import only new items, delete other items

2007-09-06 Thread Scott Hess
You could invert your solution.  Create a temporary table which
contains all of the existing keys, and every time you insert a new
item, delete that item's key from the temporary table.  At the end, do
something like 'DELETE FROM main_table WHERE key IN (SELECT key FROM
tmp_table)'.

-scott

On 9/6/07, Andre du Plessis <[EMAIL PROTECTED]> wrote:
> Im importing data
>
> The data has a unique value, call it MD5 for now that could be a unique
> value for the data.
>
>
>
> Each record that gets imported is converted to MD5, a lookup is done on
> the table for that MD5,
>
> if found it must leave it alone, if not found it must insert a new
> record...
>
>
>
> All the items in the table that was not imported must be deleted...
>
>
>
> The only feasible approach I have is to add a column to the table, like
> UPDATE_FLAG for example,
>
> During the import update_flag gets set to 0,
>
> Once a record is found update_flag gets set to 1,
>
>
>
> At the end of the import all records with update_flag = 0 gets
> deleted...
>
>
>
> However I did not want to add a column to the table, REASON being, I'm
> also version controlling the DB, and when an import occurs and nothing
>
> Has been added or removed,  I don't want modifications to the DB, as the
> import can run many times over.
>
>
>
> I was considering the following:
>
> Create a temp table call it,
>
> Temp_ids for example
>
> Then insert into the temp table for each record that was found...
>
>
>
> At the end do something like
>
> Delete from imports where import_id not in (select id from temp_ids)
>
>
>
> But that might be horribly slow and memory expensive remembering that
> the import table may have millions of records..
>
>
>
> What could the people here suggest to me,
>
>
>
> Thanks.
>
>

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



RE: [sqlite] SQL approach to Import only new items, delete other items

2007-09-06 Thread Tom Briggs
 

> Because my concern is this, I don't know how SQLite will do
> 
> Delete from table where col not in (select from large temp dataset)
> 
> How the delete will actually be walked, if it will create a serverside
> cursor and walk the values in the in statement then it will 
> be fine and
> fast,
> If however it loads all the values into memory and then walk 
> the dataset
> it would require a large amount of memory,

   I believe that it will write the results of the subquery to a
temporary table, so I doubt that memory usage will be a problem.  I'm
only about 98% sure though, so no guarantees. :)  EXPLAIN will tell you
for sure though.

> Your suggestion to drop the table and recreate from temp 
> although a good
> idea will probably modify the database which means that if I version
> control it, it will create a large amount of changes each 
> time an import
> is run, even though nothing might have been changed.

   I think you have issues here either way though - one way or another
you need to know that nothing changed.  You may be able to determine
that for free, depending on your approach, or you may need to go out of
your way to determine that, it depends on your approach.  But I wouldn't
assume that you can know this for free and that any solution that
requires work to know that is inherently bad.

   -Tom

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



RE: [sqlite] SQL approach to Import only new items, delete other items

2007-09-06 Thread Andre du Plessis
Thank you for the suggestion,

Im about to set up a test as soon as I have the time where I want to
create a test table and populate it with a high number of rows, then
perform an update to a temp table, and then delete with an in statement
Because my concern is this, I don't know how SQLite will do

Delete from table where col not in (select from large temp dataset)

How the delete will actually be walked, if it will create a serverside
cursor and walk the values in the in statement then it will be fine and
fast,
If however it loads all the values into memory and then walk the dataset
it would require a large amount of memory,

Your suggestion to drop the table and recreate from temp although a good
idea will probably modify the database which means that if I version
control it, it will create a large amount of changes each time an import
is run, even though nothing might have been changed.

I guess I could write another framework for versioning the database,
.dump would probably work better but it requires that user intervention.

But thanks for the suggestion I might be able to implement something
that works.

-Original Message-
From: Tom Briggs [mailto:[EMAIL PROTECTED] 
Sent: 06 September 2007 03:33 PM
To: sqlite-users@sqlite.org
Subject: RE: [sqlite] SQL approach to Import only new items, delete
other items


   Your suggested temp table approach is how I would solve this; if
everything is properly indexed it won't be too bad.  Even if it is bad,
it'll be better than updating columns within the table and then deleting
rows based on that.

   Another potential alternative is to:

   1. Load all new rows into a temp table
   2. Select the old matching rows into a second temp table
   3. Insert all the remaining new rows to that second temp table
   4. Drop the original table and rename the second temp table

   That's likely to be slower on small data sets and faster on larger
datasets, I think.  Depends on how much data is already in the database
vs. the amount of data being loaded.

   -Tom  

> -Original Message-
> From: Andre du Plessis [mailto:[EMAIL PROTECTED] 
> Sent: Thursday, September 06, 2007 5:41 AM
> To: sqlite-users@sqlite.org
> Subject: [sqlite] SQL approach to Import only new items, 
> delete other items
> 
> Im importing data
> 
> The data has a unique value, call it MD5 for now that could 
> be a unique
> value for the data.
> 
>  
> 
> Each record that gets imported is converted to MD5, a lookup 
> is done on
> the table for that MD5, 
> 
> if found it must leave it alone, if not found it must insert a new
> record...
> 
>  
> 
> All the items in the table that was not imported must be deleted...
> 
>  
> 
> The only feasible approach I have is to add a column to the 
> table, like
> UPDATE_FLAG for example, 
> 
> During the import update_flag gets set to 0,
> 
> Once a record is found update_flag gets set to 1,
> 
>  
> 
> At the end of the import all records with update_flag = 0 gets
> deleted...
> 
>  
> 
> However I did not want to add a column to the table, REASON being, I'm
> also version controlling the DB, and when an import occurs and nothing
> 
> Has been added or removed,  I don't want modifications to the 
> DB, as the
> import can run many times over.
> 
>  
> 
> I was considering the following:
> 
> Create a temp table call it,
> 
> Temp_ids for example
> 
> Then insert into the temp table for each record that was found...
> 
>  
> 
> At the end do something like
> 
> Delete from imports where import_id not in (select id from temp_ids)
> 
>  
> 
> But that might be horribly slow and memory expensive remembering that
> the import table may have millions of records..
> 
>  
> 
> What could the people here suggest to me,
> 
>  
> 
> Thanks.
> 
> 


-
To unsubscribe, send email to [EMAIL PROTECTED]

-


-
To unsubscribe, send email to [EMAIL PROTECTED]
-



RE: [sqlite] SQL approach to Import only new items, delete other items

2007-09-06 Thread Tom Briggs

   Your suggested temp table approach is how I would solve this; if
everything is properly indexed it won't be too bad.  Even if it is bad,
it'll be better than updating columns within the table and then deleting
rows based on that.

   Another potential alternative is to:

   1. Load all new rows into a temp table
   2. Select the old matching rows into a second temp table
   3. Insert all the remaining new rows to that second temp table
   4. Drop the original table and rename the second temp table

   That's likely to be slower on small data sets and faster on larger
datasets, I think.  Depends on how much data is already in the database
vs. the amount of data being loaded.

   -Tom  

> -Original Message-
> From: Andre du Plessis [mailto:[EMAIL PROTECTED] 
> Sent: Thursday, September 06, 2007 5:41 AM
> To: sqlite-users@sqlite.org
> Subject: [sqlite] SQL approach to Import only new items, 
> delete other items
> 
> Im importing data
> 
> The data has a unique value, call it MD5 for now that could 
> be a unique
> value for the data.
> 
>  
> 
> Each record that gets imported is converted to MD5, a lookup 
> is done on
> the table for that MD5, 
> 
> if found it must leave it alone, if not found it must insert a new
> record...
> 
>  
> 
> All the items in the table that was not imported must be deleted...
> 
>  
> 
> The only feasible approach I have is to add a column to the 
> table, like
> UPDATE_FLAG for example, 
> 
> During the import update_flag gets set to 0,
> 
> Once a record is found update_flag gets set to 1,
> 
>  
> 
> At the end of the import all records with update_flag = 0 gets
> deleted...
> 
>  
> 
> However I did not want to add a column to the table, REASON being, I'm
> also version controlling the DB, and when an import occurs and nothing
> 
> Has been added or removed,  I don't want modifications to the 
> DB, as the
> import can run many times over.
> 
>  
> 
> I was considering the following:
> 
> Create a temp table call it,
> 
> Temp_ids for example
> 
> Then insert into the temp table for each record that was found...
> 
>  
> 
> At the end do something like
> 
> Delete from imports where import_id not in (select id from temp_ids)
> 
>  
> 
> But that might be horribly slow and memory expensive remembering that
> the import table may have millions of records..
> 
>  
> 
> What could the people here suggest to me,
> 
>  
> 
> Thanks.
> 
> 

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



[sqlite] SQL approach to Import only new items, delete other items

2007-09-06 Thread Andre du Plessis
Im importing data

The data has a unique value, call it MD5 for now that could be a unique
value for the data.

 

Each record that gets imported is converted to MD5, a lookup is done on
the table for that MD5, 

if found it must leave it alone, if not found it must insert a new
record...

 

All the items in the table that was not imported must be deleted...

 

The only feasible approach I have is to add a column to the table, like
UPDATE_FLAG for example, 

During the import update_flag gets set to 0,

Once a record is found update_flag gets set to 1,

 

At the end of the import all records with update_flag = 0 gets
deleted...

 

However I did not want to add a column to the table, REASON being, I'm
also version controlling the DB, and when an import occurs and nothing

Has been added or removed,  I don't want modifications to the DB, as the
import can run many times over.

 

I was considering the following:

Create a temp table call it,

Temp_ids for example

Then insert into the temp table for each record that was found...

 

At the end do something like

Delete from imports where import_id not in (select id from temp_ids)

 

But that might be horribly slow and memory expensive remembering that
the import table may have millions of records..

 

What could the people here suggest to me,

 

Thanks.