Re: [sqlite] use of parentheses when not arithmetically or logically required

2009-08-18 Thread yaconsult

Simon,

Thank you for the optimization.  I'll switch to using BETWEEN.  This
particular database is used for generating statistics and reports, so
performance is not that big an issue.

When you talk about "clever use of an index", are you referring to a
combined index or merely indices on each column?
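
In case I've misunderstood, here is what I plan to try - the BETWEEN
form of the query plus, as one reading of "clever use of an index", a
combined index on both columns (the index name is made up):

select * from log
where response between 200 and 299   -- same range, assuming response holds only integers
  and port between 8444 and 8459;

create index idx_log_response_port on log (response, port);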

Thanks for your input!


[sqlite] use of parentheses when not arithmetically or logically required

2009-08-18 Thread yaconsult

I have searched the list and understand the need for parentheses when
required by logical or arithmetic operations or subselects.

My question is whether it makes a difference to use parentheses when they
are not logically required, as in the following example, where the only
logical operator used is "and":

select * from log
where
  (response >= 200 and response < 300)
  and
  (port >= 8444 and port <= 8459)

Is there any difference in execution or performance if the parentheses are
present or not?  I'm guessing not, but would like confirmation.

select * from log
where
  response >= 200
  and response < 300
  and port >= 8444
  and port <= 8459
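
One way to confirm this is to compare the plans SQLite produces for the
two forms with EXPLAIN QUERY PLAN; if both print the same plan, the
parentheses make no difference to execution:

explain query plan
select * from log
where
  (response >= 200 and response < 300)
  and
  (port >= 8444 and port <= 8459);

explain query plan
select * from log
where
  response >= 200
  and response < 300
  and port >= 8444
  and port <= 8459;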

Thanks!


[sqlite] 3 million rows, query speeds, and returning zero for rows that don't exist

2009-07-01 Thread yaconsult

I'm using sqlite to do some analysis on very large web application log
files - approaching 3 million lines per day.  And what a wonderful tool
it is!  It has saved me from writing lots of custom scripts.

I have a perl script that parses an rsynced copy of this huge log file,
munges, converts, and adds new entries to my database.

The schema currently looks something like this - some of the columns are
there as an aid to development/debugging so I can see what's going on:

CREATE TABLE log (
 ip VARCHAR,
 datetimestring VARCHAR COLLATE NOCASE,
 uid VARCHAR COLLATE NOCASE,
 timestamp INTEGER,
 operation VARCHAR COLLATE NOCASE,
 target VARCHAR,
 response INTEGER,
 client VARCHAR COLLATE NOCASE,
 completion VARCHAR COLLATE NOCASE,
 port INTEGER,
 connections INTEGER,
 date INTEGER,
 hour INTEGER,
 min INTEGER,
 sec INTEGER
);

I've been experimenting with indices, weighing the tradeoff in database
size and insert speed against how quickly I need the results.  For
testing purposes, I created indexes on all the columns used in the
selects below.
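
For example, something like this (the index names are made up):

CREATE INDEX idx_log_timestamp ON log (timestamp);
CREATE INDEX idx_log_uid ON log (uid);
CREATE INDEX idx_log_response ON log (response);
CREATE INDEX idx_log_port ON log (port);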

Most of the queries I've done so far have been pretty straightforward
and they've worked very well.  But now I need to do one that's taking
too long.  There's probably a better way than the one I'm using.

The problem is that I need to produce information for the timeslots
that are missing as well as those that are there.

For example, I need to be able to evaluate the effect of different
load balancers, so I want to compare the numbers of transactions on
all the ports.  I need to do so on a second-by-second basis so I can
see the most detail and not an average.

I thought one way to do this would be with a self-join, because
there are so many transactions that I'm pretty sure all seconds
will be present in the table.  I also considered creating a second
table, derived from the first whenever it's updated, that would have a
unix epoch entry for each second within the log file.

Here's what I tried for the first approach:

select
    t.timestamp,
    t.port,
    count(u.timestamp)              -- 0 for seconds with no matching rows
from log t
left join log u
    on t.timestamp = u.timestamp
    and u.uid != '-'                -- single quotes for string literals
    and (u.response = 207 or u.response = 200)
    and u.port = 8444
where t.timestamp >= 1246266000     -- filter on t belongs in WHERE, not ON
group by t.timestamp, t.port;       -- GROUP BY rather than DISTINCT with count()

The purpose of the join is to give me all of the timestamps - one for
each second - even those seconds that had no activity on that port -
and then use those for the query.  But this query is taking a very,
very, very long time.  I'm probably making some newbie mistake,
because that's what I am!

Is this a valid approach?  Is there a better solution?
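
For concreteness, here is a sketch of the variant I have in mind -
collapse the log to one row per second in a subselect first, then join
the matching rows onto that (untested):

select
    s.timestamp,
    count(u.timestamp)
from (select distinct timestamp
      from log
      where timestamp >= 1246266000) s
left join log u
    on u.timestamp = s.timestamp
    and u.uid != '-'
    and (u.response = 207 or u.response = 200)
    and u.port = 8444
group by s.timestamp;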

I am able to get results extremely quickly by using something like
this:

select
date,
hour,
min,
sec,
count(port)
from log
where
uid != '-'
and (response = 207
 or response = 200)
and port = 8444
group by
timestamp
order by
timestamp;

but the problem is that I also need to know when the ports are not
busy and there are no log entries.  Here are the last few lines of the
result: 

2009-06-29|17|42|0|2
2009-06-29|17|42|7|1
2009-06-29|17|42|8|4

What I need to have in the results are entries for seconds 1-6 with a
value of 0, like this:

2009-06-29|17|42|0|2
2009-06-29|17|42|1|0
2009-06-29|17|42|2|0
2009-06-29|17|42|3|0
2009-06-29|17|42|4|0
2009-06-29|17|42|5|0
2009-06-29|17|42|6|0
2009-06-29|17|42|7|1
2009-06-29|17|42|8|4

Am I on the right path with the self-join?  I also considered adding a
new table to the database that contained all the possible time values
in epoch seconds - I could easily do so with the script that creates
the database and adds the data.  I could then left join this table
with the data table.  But would doing so be substantially faster than
the self-join?
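
Something like the following is what I have in mind - the "seconds"
table name is made up, and the loading script would fill it with one
row per epoch second covered by the log:

create table seconds (ts integer primary key);

select
    s.ts,
    count(l.timestamp)
from seconds s
left join log l
    on l.timestamp = s.ts
    and l.uid != '-'
    and (l.response = 207 or l.response = 200)
    and l.port = 8444
group by s.ts
order by s.ts;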

The examples above are for only one port - I need to do the same for
16 ports.  I wrote a perl script that ran a separate query per port per
second, but it took about a minute to cover the 16 ports for a single
1-second interval - far too slow for producing results over 24-hour
periods.
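
One idea, not yet tried: keep a small 16-row table of port numbers and
cross it with the seconds table, so a single query covers every port
and every second (table names again made up):

create table ports (port integer primary key);   -- 16 rows: 8444 through 8459

select
    s.ts,
    p.port,
    count(l.timestamp)
from seconds s
cross join ports p
left join log l
    on l.timestamp = s.ts
    and l.port = p.port
    and l.uid != '-'
    and (l.response = 207 or l.response = 200)
group by s.ts, p.port
order by s.ts, p.port;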

Thanks for any advice!  I learn a lot from this list.
Leo


[sqlite] how do I use more than one column as a "combined key"?

2009-05-12 Thread yaconsult

SQL newbie here.  Sqlite has been a fantastic tool for analyzing, comparing
and correcting lots of account information.  But I've hit a roadblock
because I don't know how to treat multiple columns as a kind of combined
key.

I need to know how to relate two tables on multiple columns.  It's been
easy when I only had to relate using a single column.  The tables are
pretty big - 20,000+ entries.

User Accounts
name
uid
server
login
...

Calendar Accounts
server
login
firstname
lastname
...


What I need to be able to do is to check that the server and login
information in the first table matches one and only one of the accounts
in the second table.

What I'm having trouble figuring out is how to use the server and login
as a combined key.  When I have a single unique value that I can
relate, like a DS ID, I know how to do that.

How can I query to find out which user accounts don't match up with one
and only one calendar account?  And which calendar accounts are not
associated with a single user account?
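
A sketch of the kind of check I'm after, with made-up table names
"users" and "calendars" - I don't know whether this is the right
approach:

select u.name, u.server, u.login
from users u
where (select count(*)
       from calendars c
       where c.server = u.server
         and c.login = u.login) != 1;

-- and the reverse direction, for calendar accounts:
select c.server, c.login
from calendars c
where (select count(*)
       from users u
       where u.server = c.server
         and u.login = c.login) != 1;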

Thanks for any help you can provide.
Sqlite is the perfect tool for this kind of stuff!


[sqlite] How do I do this join on multiple columns?

2009-03-02 Thread yaconsult

SQL newbie here.  I have data in a couple of tables that I need to relate. 
But I don't know how to relate on more than one column.  I'll not go into
detail about why the data is formed the way it is - it comes from other
programs.

For example, let's say there are a couple of tables:

server login name last_login email ...

and we have a second table that contains columns for server and login:

server login

How can I, for example, select rows of the first table where their server
and login, combined, are NOT in the second table?
It's sorta like a composite key.  Do I do it with a couple of joins?  Could
I see an example?

If it were only a single column I could use a "not in" subselect, but I'm
not sure how to do it when it involves more than one column.
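
Here is a sketch of what I mean, with made-up table names "accounts"
(the full table) and "pairs" (the server/login table) - the first query
is the single-column case I know, the second is my guess for two
columns:

-- single column: easy with NOT IN
select * from accounts
where login not in (select login from pairs);

-- two columns: is a correlated NOT EXISTS the right idea?
select a.*
from accounts a
where not exists (
    select 1
    from pairs p
    where p.server = a.server
      and p.login = a.login
);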

Thanks for any help!

