Re: [sqlite] Slow operation

2010-10-05 Thread Drake Wilson
Quoth Ian Hardingham , on 2010-10-05 11:52:36 +0100:
>   Hey guys.  My apologies in advance if this is a slightly mundane question.

(Please don't start new threads by replying to random messages.  The
resultant header information indicates falsely that your email is part
of the same thread.)

> I'm running this code from a scripting language bound to SQLite:
> 
>  %r = db.query("SELECT * FROM userTable", 0);
>  %i = 0;
> 
>  db.query("BEGIN TRANSACTION", 0);
>  while (%i < db.numRows(%r))
>  {
>  %username = db.getColumn(%r, name);
>  db.query("UPDATE userTable SET playedInfIds='' WHERE name LIKE 
> '?'", 0, %username);
>  %i ++;
>  }
>  db.query("END TRANSACTION", 0);

Ah-heh?

A number of points come to mind fairly immediately:

  - Don't keep a query from outside a transaction active inside it.

  - Don't SELECT * when all you need is one column.

  - You shouldn't have to iterate a result set by numerically iterating
until you hit the total number of rows, but I don't know what API
this is, so I don't know exactly how the replacement would look.

  - This whole loop looks like it could be replaced with the single
query « UPDATE userTable SET playedInfIds = '' » because you're
just targeting all the rows, unless there's something unobviously
different that I've missed.

Right now you're doing a full table scan to get each name, then doing
another full table scan for each name to update each row with a
similar name.  That's O(N^2) in the number of rows; with 3k rows, that
requires ~9M processing steps.
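
As a sketch of the single-statement replacement (using Python's sqlite3
module to stand in for the OP's unknown scripting API, and a pared-down
two-column stand-in for userTable):

```python
import sqlite3

# Minimal stand-in for userTable; only the columns relevant here.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE userTable (name TEXT PRIMARY KEY, playedInfIds TEXT DEFAULT '')")
db.executemany("INSERT INTO userTable (name, playedInfIds) VALUES (?, ?)",
               [("alice", "1\t2"), ("bob", "3")])

# One UPDATE touches every row; no SELECT loop, no per-row statements.
with db:
    db.execute("UPDATE userTable SET playedInfIds = ''")

cleared = db.execute(
    "SELECT count(*) FROM userTable WHERE playedInfIds = ''").fetchone()[0]
print(cleared)  # 2
```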

> Is there anything obvious I'm doing wrong?  I know using LIKE is not 
> ideal, but the scripting language does not guarantee case so it is 
> necessary here.

Store the name in a canonical form (e.g., all lowercase) in the
database, then query based on that form.  You can store the
non-canonical form next to it in a separate column if it's needed.
The fact that you are using LIKE suggests that 'ian' and 'Ian' should
be treated identically, but currently your primary key allows separate
rows to exist for each of those.
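
The canonical-form idea can be sketched like so (again with Python's
sqlite3; the name_canon column and add_user helper are invented for
illustration -- declaring the column with COLLATE NOCASE is another way
to get a similar effect in SQLite):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE userTable (
                  name_canon   TEXT PRIMARY KEY,  -- canonical, e.g. 'ian'
                  name         TEXT NOT NULL,     -- as typed, e.g. 'Ian'
                  playedInfIds TEXT DEFAULT '')""")

def add_user(db, name):
    # Canonicalize once, at insert time.
    db.execute("INSERT INTO userTable (name_canon, name) VALUES (?, ?)",
               (name.lower(), name))

add_user(db, "Ian")
try:
    add_user(db, "IAN")            # same canonical name...
except sqlite3.IntegrityError:
    print("duplicate rejected")    # ...so the primary key rejects it

# Lookups use plain '=' on the canonical column: indexed, no LIKE needed.
row = db.execute("SELECT name FROM userTable WHERE name_canon = ?",
                 ("ian",)).fetchone()
print(row[0])  # Ian
```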

Also, PRIMARY KEY UNIQUE is redundant.  A primary key is always
unique.

   ---> Drake Wilson
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Slow operation

2010-10-05 Thread Ian Hardingham
  Many thanks Drake, all of your points were highly pertinent.  I'll 
stop lazily replying to threads and changing the subject!

I indeed see that my approach was pretty bafflingly bad in hindsight.  I 
tend to do most "logic" in the scripting language as opposed to in 
SQLite commands as it's what I'm comfortable with, but I probably need 
to think more in terms of what SQLite can automate for me.

Your query,

UPDATE userTable SET playedInfIds = ''

Still took two seconds actually...  but significantly better than what I 
was doing.

Ian

On 05/10/2010 12:08, Drake Wilson wrote:
> Quoth Ian Hardingham, on 2010-10-05 11:52:36 +0100:
>>Hey guys.  My apologies in advance if this is a slightly mundane question.
> (Please don't start new threads by replying to random messages.  The
> resultant header information indicates falsely that your email is part
> of the same thread.)
>
>> I'm running this code from a scripting language bound to SQLite:
>>
>>   %r = db.query("SELECT * FROM userTable", 0);
>>   %i = 0;
>>
>>   db.query("BEGIN TRANSACTION", 0);
>>   while (%i < db.numRows(%r))
>>   {
>>   %username = db.getColumn(%r, name);
>>   db.query("UPDATE userTable SET playedInfIds='' WHERE name LIKE
>> '?'", 0, %username);
>>   %i ++;
>>   }
>>   db.query("END TRANSACTION", 0);
> Ah-heh?
>
> A number of points come to mind fairly immediately:
>
>- Don't keep a query from outside a transaction active inside it.
>
>- Don't SELECT * when all you need is one column.
>
>- You shouldn't have to iterate a result set by numerically iterating
>  until you hit the total number of rows, but I don't know what API
>  this is, so I don't know exactly how the replacement would look.
>
>- This whole loop looks like it could be replaced with the single
>  query « UPDATE userTable SET playedInfIds = '' » because you're
>  just targeting all the rows, unless there's something unobviously
>  different that I've missed.
>
> Right now you're doing a full table scan to get each name, then doing
> another full table scan for each name to update each row with a
> similar name.  That's O(N^2) in the number of rows; with 3k rows, that
> requires ~9M processing steps.
>
>> Is there anything obvious I'm doing wrong?  I know using LIKE is not
>> ideal, but the scripting language does not guarantee case so it is
>> necessary here.
> Store the name in a canonical form (e.g., all lowercase) in the
> database, then query based on that form.  You can store the
> non-canonical form next to it in a separate column if it's needed.
> The fact that you are using LIKE suggests that 'ian' and 'Ian' should
> be treated identically, but currently your primary key allows separate
> rows to exist for each of those.
>
> Also, PRIMARY KEY UNIQUE is redundant.  A primary key is always
> unique.
>
> --->  Drake Wilson



Re: [sqlite] Slow operation

2010-10-05 Thread Drake Wilson
Quoth Ian Hardingham , on 2010-10-05 12:16:11 +0100:
> Your query,
> 
> UPDATE userTable SET playedInfIds = ''
> 
> Still took two seconds actually...  but significantly better than what I 
> was doing.

You're doing this only once rather than once per row, right?  On a
table with around 3k rows, it seems a little odd that it would take
that long, even if updating every row tends to be expensive in
general.  What does your schema look like, if I might ask?  Is there
significant concurrent access with that giant update?

> Ian

   ---> Drake Wilson


Re: [sqlite] Slow operation

2010-10-05 Thread Ian Hardingham
  I typed the command into my console - not doing it once per row.  
Doing it again, it was more like one second.  No other SQLite commands 
should have been happening near the time of execution.

I'm not entirely sure what "schema" means in this context.  The 
definition of userTable is:

CREATE TABLE IF NOT EXISTS userTable (name TEXT PRIMARY KEY NOT NULL 
UNIQUE, password TEXT NOT NULL, email TEXT, key TEXT, status TEXT, date 
TEXT, playedFor INTEGER, totalScore FLOAT DEFAULT 0, totalRecord TEXT 
DEFAULT '0\t0', dailyScore FLOAT DEFAULT 0, dailyRecord TEXT DEFAULT 
'0\t0', dailyGameRecord TEXT DEFAULT '', dailyGamesPlayed INTEGER 
DEFAULT 0, scoreStreak TEXT DEFAULT '', scoreStreakNumber INT DEFAULT 0, 
noEmail INT DEFAULT 0, playedInfIds TEXT DEFAULT '')

I haven't changed any of the default PRAGMAs, although I suspect it 
might help me.  This is a game server which generally has 10 - 1000 
users, each sending requests every 5-10 seconds.  Every 30 minutes 
there's a big operation which re-calculates the leaderboards, and that 
can take up to 20 seconds.  I'm considering doing that on a separate 
thread, maybe even with a duplicate database.

Your help is really appreciated.

Ian

On 05/10/2010 12:22, Drake Wilson wrote:
> Quoth Ian Hardingham, on 2010-10-05 12:16:11 +0100:
>> Your query,
>>
>> UPDATE userTable SET playedInfIds = ''
>>
>> Still took two seconds actually...  but significantly better than what I
>> was doing.
> You're doing this only once rather than once per row, right?  On a
> table with around 3k rows, it seems a little odd that it would take
> that long, even if updating every row tends to be expensive in
> general.  What does your schema look like, if I might ask?  Is there
> significant concurrent access with that giant update?
>
>> Ian
> --->  Drake Wilson



Re: [sqlite] Slow operation

2010-10-05 Thread Drake Wilson
Quoth Ian Hardingham , on 2010-10-05 12:27:38 +0100:
> CREATE TABLE IF NOT EXISTS userTable (name TEXT PRIMARY KEY NOT NULL 
> UNIQUE, password TEXT NOT NULL, email TEXT, key TEXT, status TEXT, date 
> TEXT, playedFor INTEGER, totalScore FLOAT DEFAULT 0, totalRecord TEXT 
> DEFAULT '0\t0', dailyScore FLOAT DEFAULT 0, dailyRecord TEXT DEFAULT 
> '0\t0', dailyGameRecord TEXT DEFAULT '', dailyGamesPlayed INTEGER 
> DEFAULT 0, scoreStreak TEXT DEFAULT '', scoreStreakNumber INT DEFAULT 0, 
> noEmail INT DEFAULT 0, playedInfIds TEXT DEFAULT '')

Those *Record fields look like the sort of thing that will expand to
include large blobs later on.  If this is the case, possibly consider:

  - Moving the blobs into a separate table or tables, if they're
really best represented as singular blobs.  Frequently updating
large blobs isn't going to give you very good performance, but
keeping them in separate tables will help prevent them from
impacting smaller updates.

  - Using separate tables and then actually storing the data in
relational form, if it's suitably representable.  This could
result in much more efficient storage and access, because you'd be
using the SQLite components in a more natural way.  The presence
of that \t suggests that you might be storing sequences of records
in those fields to start with; those could well be separate rows
in a suitable secondary table.
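
As a sketch of that second option (table and column names are invented
here; Python's sqlite3 used for illustration), a field like
dailyRecord = '3\t7' becomes rows in a secondary table:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE user (name TEXT PRIMARY KEY);
    CREATE TABLE daily_record (
        user_name TEXT    NOT NULL REFERENCES user(name),
        position  INTEGER NOT NULL,   -- index within the old tab-separated list
        value     INTEGER NOT NULL,
        PRIMARY KEY (user_name, position));
""")
db.execute("INSERT INTO user VALUES ('ian')")
# The old string '3\t7' becomes two rows:
db.executemany("INSERT INTO daily_record VALUES ('ian', ?, ?)", [(0, 3), (1, 7)])

# Reassembling the old list for output is a simple ordered query.
values = [v for (v,) in db.execute(
    "SELECT value FROM daily_record WHERE user_name = 'ian' ORDER BY position")]
print(values)  # [3, 7]
```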

   ---> Drake Wilson


Re: [sqlite] Slow operation

2010-10-05 Thread Ian Hardingham
  Thanks again Drake, I'll investigate those alternatives.

On 05/10/2010 13:52, Drake Wilson wrote:
> Quoth Ian Hardingham, on 2010-10-05 12:27:38 +0100:
>> CREATE TABLE IF NOT EXISTS userTable (name TEXT PRIMARY KEY NOT NULL
>> UNIQUE, password TEXT NOT NULL, email TEXT, key TEXT, status TEXT, date
>> TEXT, playedFor INTEGER, totalScore FLOAT DEFAULT 0, totalRecord TEXT
>> DEFAULT '0\t0', dailyScore FLOAT DEFAULT 0, dailyRecord TEXT DEFAULT
>> '0\t0', dailyGameRecord TEXT DEFAULT '', dailyGamesPlayed INTEGER
>> DEFAULT 0, scoreStreak TEXT DEFAULT '', scoreStreakNumber INT DEFAULT 0,
>> noEmail INT DEFAULT 0, playedInfIds TEXT DEFAULT '')
> Those *Record fields look like the sort of thing that will expand to
> include large blobs later on.  If this is the case, possibly consider:
>
>- Moving the blobs into a separate table or tables, if they're
>  really best represented as singular blobs.  Frequently updating
>  large blobs isn't going to give you very good performance, but
>  keeping them in separate tables will help prevent them from
>  impacting smaller updates.
>
>- Using separate tables and then actually storing the data in
>  relational form, if it's suitably representable.  This could
>  result in much more efficient storage and access, because you'd be
>  using the SQLite components in a more natural way.  The presence
>  of that \t suggests that you might be storing sequences of records
>  in those fields to start with; those could well be separate rows
>  in a suitable secondary table.
>
> --->  Drake Wilson



Re: [sqlite] Slow operation

2010-10-05 Thread Jay A. Kreibich
On Tue, Oct 05, 2010 at 04:08:41AM -0700, Drake Wilson scratched on the wall:
> Quoth Ian Hardingham , on 2010-10-05 11:52:36 +0100:

> > I'm running this code from a scripting language bound to SQLite:
> > 
> >  %r = db.query("SELECT * FROM userTable", 0);
> >  %i = 0;
> > 
> >  db.query("BEGIN TRANSACTION", 0);
> >  while (%i < db.numRows(%r))
> >  {
> >  %username = db.getColumn(%r, name);
> >  db.query("UPDATE userTable SET playedInfIds='' WHERE name LIKE 
> > '?'", 0, %username);
> >  %i ++;
> >  }
> >  db.query("END TRANSACTION", 0);

> A number of points come to mind fairly immediately:
> 
>   - Don't keep a query from outside a transaction active inside it.

  While this is good advice, I don't think that's what is going on
  here.  The SELECT should definitely be inside the BEGIN/END, because
  otherwise the data may change between the SELECT and UPDATE loop, but 
  that's a different issue.  If I understand this scripting API, the
  query() function will run the SELECT to completion, so any auto-commit
  transaction that is open will be shut again by the time the code
  enters the loop.

  Not that it matters... If the SELECT did open an auto-commit
  transaction that was not closed, a transaction would already be open
  and the BEGIN/END wouldn't change that.  The whole collection of
  statements would be run inside the auto-commit transaction until
  all the statements were complete.

>   - This whole loop looks like it could be replaced with the single
> query « UPDATE userTable SET playedInfIds = '' » because you're
> just targeting all the rows, unless there's something unobviously
> different that I've missed.

  Yes... usually the mistake is to forget the WHERE, but in this case
  it looks like the OP actually wants to change every row in the table,
  and that can be done with a single UPDATE statement; just drop the
  WHERE altogether.


  That column name makes me suspicious, however... I have a bad feeling
  that's a text field that is made up of delimited values,
  essentially forming a list.  If true, that's a very poor design.

> Right now you're doing a full table scan to get each name, then doing
> another full table scan for each name to update each row with a
> similar name.  That's O(N^2) in the number of rows; with 3k rows, that
> requires ~9M processing steps.

  It isn't that bad.  He's looking up a PRIMARY KEY without wildcards
  (I assume), so each look-up is O(log N), making the total O(N log N).
  Each look-up is a bloated O(log N) because it depends on the LIKE
  optimizations, but it is still technically O(log N).  Not good for
  the desired result, which should more or less be O(N), but not
  nearly as bad as O(N^2).

> Also, PRIMARY KEY UNIQUE is redundant.  A primary key is always
> unique.

  True, but it does no harm.  SQLite is smart enough to only create one
  index.  The NOT NULL would be redundant in most database systems as
  well, but SQLite requires it for true PK behavior.
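
A quick demonstration of that quirk (Python's sqlite3, in-memory
database):

```python
import sqlite3

db = sqlite3.connect(":memory:")

# Without NOT NULL, SQLite accepts a NULL in a non-INTEGER PRIMARY KEY --
# a documented quirk kept for backwards compatibility.
db.execute("CREATE TABLE t1 (name TEXT PRIMARY KEY)")
db.execute("INSERT INTO t1 VALUES (NULL)")   # quietly allowed
print(db.execute("SELECT count(*) FROM t1").fetchone()[0])  # 1

# With NOT NULL, the insert is rejected, which is what most other
# database systems would do even without the explicit constraint.
db.execute("CREATE TABLE t2 (name TEXT PRIMARY KEY NOT NULL)")
try:
    db.execute("INSERT INTO t2 VALUES (NULL)")
except sqlite3.IntegrityError:
    print("rejected")
```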

   -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"Intelligence is like underwear: it is important that you have it,
 but showing it to the wrong people has the tendency to make them
 feel uncomfortable." -- Angela Johnson


Re: [sqlite] Slow operation

2010-10-05 Thread Ian Hardingham
Hey Jay, thanks for your feedback.

I am indeed using (several) delimited lists.  I would very much 
appreciate your input into how bad a decision this is.

So, I basically need to find entries of Table B that do not appear in 
that list.  Obviously, it would be better to have a playedInf table and 
do some kind of SELECT FROM B NOT IN A query.

However, I would rather not do the processing at all.  I'm pushing off 
the calculation to the client - and the client does not use a database 
at all. I simply send the list as a text field and get the client to 
sort it out, as he has plenty of processor time.  It seems to me that 
this is quicker than making the list on the fly by doing a SELECT and 
then a concatenation from another table...

Am I completely off-base on this?

Thanks,
Ian

>   That column name makes me suspicious, however... I have a bad feeling
>   that's a text field that is made up of delineated values,
>   essentially forming a list.  If true, that's a very poor design.



Re: [sqlite] Slow operation

2010-10-06 Thread Jay A. Kreibich
On Tue, Oct 05, 2010 at 03:44:38PM +0100, Ian Hardingham scratched on the wall:
> Hey Jay, thanks for your feedback.
> 
> I am indeed using (several) delineated lists.  I would very much 
> appreciate your input into how bad a decision this is.

  I hesitate to call it a bad decision, so much as it is rather
  un-database-like, and from that standpoint it is a poor database
  design.  It might be an acceptable application design, however.

  There are kind of two camps on this.  If you're more comfortable
  doing everything in a scripting language or client environment, and
  your data needs are fairly simple, then there isn't a big concern
  about the database design.  You're more or less using the database
  as nothing more than a fancy CSV container.  While this is, in a
  sense, missing the point on what a relational database gives you
  and throwing away a huge amount of expression and processing ability,
  sometimes that's all you need-- and if you need to make frequent
  updates and queries, there are still advantages to using a library
  like SQLite vs. an actual CSV or similar file.

  Then there is the feeling that if you're going to be using a database,
  you might as well "do it right" and actually use the database the way
  it was designed to be used.  I tend to agree with that idea,
  especially when the data needs and structures are anything other than
  the most trivial one-table design.  HOWEVER, I'm well aware that many
  programmers-- even seasoned, experienced developers-- don't have any
  kind of background with relational concepts and design.  Without a basic
  understanding of what a database is good at doing (and how to get the
  database to do such tricks), it is very difficult to design toward the
  strengths of a relational system.  You can't "do it right" if you're
  not aware what "right" is.

  I don't really have an answer to that, other than suggesting
  resources to help you learn.  Maybe it's worth your time, maybe it
  isn't.  There are many books and website tutorials on SQL and databases.
  Some of these are for the serious database professional, while some
  are very entry level.  You might enjoy something like "Using SQLite",
  which was specifically
  written for the experienced developer that, until this point, has
  never had a need to play around with a database.  While it covers the
  SQLite API and the SQL language, it also tries to provide a brief
  "get you started" introduction to database design and the basic
  "design patterns" used in most designs.  (I should also point out
  that I'm somewhat biased in my preference for this book, given
  that I wrote it.)

> So, I basically need to find entries of Table B that do not appear in 
> that list.  Obviously, it would be better to have a playedInf table and 
> do some kind of SELECT FROM B NOT IN A query.

  Returning to the problem at hand, one of the first principles of
  database design is that each attribute of each row (that is, a
  column/row "cell") should hold only one value.  (There are some that
  will argue that this "value" can be an array, but that is a point for
  the theory people to argue over, and I'm not interested in having that
  discussion.)  Having "singular" values is one of the rules for what
  is called the First Normal Form.  You can Google that for more
  information, if you don't already know what the Normal Forms are all
  about.

  A big reason for singular values is that all the built-in techniques
  for cross indexing data in one table to another table depend on the
  single values, as that allows the database to match things back up.
  This is one of the more difficult aspects of database design,
  however... The concept of JOINs and how to effectively use them tends
  to trip newcomers up a bit, and if you don't understand how to use
  JOINs to bring your data back together, it isn't a big surprise that
  you might not want to split your data up in the first place.


  So, in your case, if I can guess some of the details, you've got a
  Many-to-Many relationship (i.e. people in groups; people can belong
  to more than one group, groups have more than one member).  This is
  typically done with a people table, a group table, and a "membership"
  table, known as a "link table" or "bridge table" that connects people
  to groups.  You can then use JOINs to bring all that data back
  together and get the answers you need, even if you're just doing
  equi-JOIN filtering (filtering based off JOIN values that aren't
  actually returned).  To find rows that aren't there, you can often use
  OUTER JOINs and search for NULLs where there shouldn't be any, or you
  can use other operations, such as NOT EXISTS( ).
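
Both patterns can be sketched against a hypothetical game/played schema
(names invented here; Python's sqlite3 used for illustration):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE game   (id INTEGER PRIMARY KEY);
    CREATE TABLE played (user_name TEXT NOT NULL,
                         game_id   INTEGER NOT NULL REFERENCES game(id));
""")
db.executemany("INSERT INTO game VALUES (?)", [(1,), (2,), (3,)])
db.execute("INSERT INTO played VALUES ('ian', 2)")

# Games ian has NOT played, via an OUTER JOIN plus a NULL test...
q1 = [r[0] for r in db.execute("""
    SELECT g.id FROM game g
    LEFT JOIN played p ON p.game_id = g.id AND p.user_name = 'ian'
    WHERE p.game_id IS NULL ORDER BY g.id""")]

# ...or, equivalently, via NOT EXISTS.
q2 = [r[0] for r in db.execute("""
    SELECT g.id FROM game g
    WHERE NOT EXISTS (SELECT 1 FROM played p
                      WHERE p.game_id = g.id AND p.user_name = 'ian')
    ORDER BY g.id""")]

print(q1, q2)  # [1, 3] [1, 3]
```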

> However, I would rather not do the processing at all.  I'm pushing off 
> the calculation to the client - and the client does not use a database 
> at all. I simply send the list as a text field and get the client to 
> sort it out, as he has plenty of processor time.

Re: [sqlite] Slow operation

2010-10-06 Thread Simon Slavin

On 7 Oct 2010, at 1:46am, Jay A. Kreibich wrote:

> Much of the big "No-SQL" movement does exactly this.
>  The big secret is that most of these applications are still doing
>  relational data operations, they're just pushing the manipulations
>  (such as a JOIN) down into the client code, making the "database" a
>  more simple low-level storage engine.  Many of the end results are the
>  same, however.  It makes sense when performance is at an utmost
>  premium, and allows greater scalability, but it also increases
>  application complexity and tends to skirt some issues like
>  transactional safety.  Those types of systems also trade flexibility
>  in the general case for complexity and speed in the common case.

Which in itself is a result of Moore's Law.  SQL was invented back when DBMSs 
were needed because computers were slow and memory was expensive.  ORDER BY was 
invented because of the incredible convenience of being able to retrieve your 
records in the order you wanted to process them in.  Being able to create your 
records in one order and retrieve them in another was ridiculous luxury.

Nowadays, although INDEXes are still important to save you from having to grovel 
through the whole file, sorting a thousand records doesn't require writing a 
temporary file.  Read all the records from disk, then sort them in memory.  
Memory is cheap; processing is fast; who cares?

So nowadays it's possible to push some of those manipulations previously done at 
the database level to your application code.  Which means you need less stuff 
in your DBMS.  Which means it can be better optimised for general operations.  
Which is how you get No-SQL and similar unSQL things.

Simon.