Re: [sqlite] Query with UNION on large table

2011-06-24 Thread Jean-Christophe Deschamps
Hi Rense,


>Thanks for this idea. In fact, the purpose of my original query is
>exactly to reduce the database. The 800 mln rows were exported from
>another source, and I was hoping to be able to use sqlite to manage
>this massive amount of data (e.g., removing redundant information)
>before I proceed with my analyses.
>So, would your approach also work if I import the data, rather than 
>insert?

I'm afraid I'm not understanding your question correctly. What's the 
distinction between import (into SQLite) and insert (into SQLite)?

If 10 rows of your raw data looks like e.g.
111, 222(this couple is direct, n1 <= n2)
115, 227   "
143, 333   "
254, 555   "
567, 666   "
410, 189(this couple is reverse, n1 > n2)
333, 143   "
227, 115   "
555, 254   "
666, 567   "

where only (111, 222) and (189, 410) are not mirrored, you could end up 
with only 6 rows since all couple of pairs which also have a mirror 
will collapse into a single row (first value is the flags):
1, 111, 222
2, 189, 410
3, 115, 227
3, 143, 333
3, 254, 555
3, 567, 666

So yes, with a large ratio of mirrored pairs, your DB will shrink and 
more important will hold roughly half the number of rows (making 
queries slightly less than twice faster).

If your 800M rows have, say, 20M of pairs without mirror, your new 
Schema would make that:
20M orphan rows with flags in (1, 2)
(800 - 20) / 2 = 390M rows with flags = 3
All in all, that would be 410M rows.

The initial 800M-row DB holds (800M * 2) = 1600M integers, but the new 
DB would contains only (410M * 3) = 1230M integers, 410M of them taking 
only one byte of storage (flags).

If your DB is "build once, insert/update rarely and query mostly" you 
will certainly save much both in space and coffee ;-)

You can obviously apply the same schema to the alien DB as well.

--
j...@antichoc.net  

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query with UNION on large table

2011-06-24 Thread Rense Corten
Jean-Christophe,
Thanks for this idea. In fact, the purpose of my original query is
exactly to reduce the database. The 800 mln rows were exported from
another source, and I was hoping to be able to use sqlite to manage
this massive amount of data (e.g., removing redundant information)
before I proceed with my analyses.
So, would your approach also work if I import the data, rather than insert?

On Fri, Jun 24, 2011 at 5:45 AM, Jean-Christophe Deschamps
 wrote:
> Rense,
>
>>As for the ranges of n1 and n1: they are both roughly between 6
>>and 1200 .
>>
>>Here are the results of  EXPLAIN QUERY PLAN SELECT n1, n2 FROM table1
>>Where n1 < n2 INTERSECT SELECT n2, n1 FROM table1 Where n2 < n1;
>>
>>1|0|0|SCAN TABLE table1 (~437976176 rows)
>>2|0|0|SCAN TABLE table1 (~437976176 rows)
>>0|0|0|COMPOUND SUBQUERIES 1 AND 2 USING TEMP B-TREE (INTERSECT)
>
> If you can change your schema at all and your application supports
> taking a bit more time for inserts, I believe you can achieve much
> better query times.
>
> I tried the following schema:
> CREATE TABLE "T" (
>   "flags" INTEGER,
>   "n1" INTEGER DEFAULT (random() % 100),
>   "n2" INTEGER DEFAULT (random() % 100),
>   CONSTRAINT "T_order" CHECK(((n1 <= n2) and (flags between 1 and 3;
>
>>In my database, the mirrored pairs vastly outnumber the non-mirrored
>>ones, to the extent that the non-mirrored pairs are actually extremely
>>rare (but relevant).
>
> Given the fact that mirrored pairs dominate, if you store new pairs
>  (N1, N2)
> as
>  (flags, n1, n2)
> with n1 <= n2 and flags bit 0 set if (n1, n2) = (N1, N2)
>               and flags bit 1 set if (n1, n2) = (N2, N1)
> your query becomes:
>   select * from T where flags = 3;
>
> If you build an index on flags, e.g.
>   CREATE INDEX "ixFlags" ON "T" ("flags");
> then it will be used (but it is regularly pointed out that in cases
> where the distribution of values queried doesn't fall in small bins, an
> index may in fact slow things down).
>
> Since your have mostly mirrored pairs, using the schema above will
> decrease the size of your DB (or keep it about the same if you use the
> index) but and --I see that as a big bonus-- essentially half the
> number of rows.
>
> Note that, if it can be of any value to your application, you can even
> have the mirrored pairs sorted if you make the index
>   CREATE INDEX "ixFlags" ON "T" ("flags", "n1");
> Of course this compound index will take some more room, but depending
> on your needs it may prove useful.
>
> The price to pay is having to query first before insert (or have a
> [slow] trigger do the work for you).  Could your application support that?
>
> Sounds to me it could be worth trying on a reduced DB.
>
>
>
> --
> j...@antichoc.net
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query with UNION on large table

2011-06-24 Thread Jean-Christophe Deschamps
Rense,

>As for the ranges of n1 and n1: they are both roughly between 6
>and 1200 .
>
>Here are the results of  EXPLAIN QUERY PLAN SELECT n1, n2 FROM table1
>Where n1 < n2 INTERSECT SELECT n2, n1 FROM table1 Where n2 < n1;
>
>1|0|0|SCAN TABLE table1 (~437976176 rows)
>2|0|0|SCAN TABLE table1 (~437976176 rows)
>0|0|0|COMPOUND SUBQUERIES 1 AND 2 USING TEMP B-TREE (INTERSECT)

If you can change your schema at all and your application supports 
taking a bit more time for inserts, I believe you can achieve much 
better query times.

I tried the following schema:
CREATE TABLE "T" (
   "flags" INTEGER,
   "n1" INTEGER DEFAULT (random() % 100),
   "n2" INTEGER DEFAULT (random() % 100),
   CONSTRAINT "T_order" CHECK(((n1 <= n2) and (flags between 1 and 3;

>In my database, the mirrored pairs vastly outnumber the non-mirrored
>ones, to the extent that the non-mirrored pairs are actually extremely
>rare (but relevant).

Given the fact that mirrored pairs dominate, if you store new pairs
  (N1, N2)
as
  (flags, n1, n2)
with n1 <= n2 and flags bit 0 set if (n1, n2) = (N1, N2)
   and flags bit 1 set if (n1, n2) = (N2, N1)
your query becomes:
   select * from T where flags = 3;

If you build an index on flags, e.g.
   CREATE INDEX "ixFlags" ON "T" ("flags");
then it will be used (but it is regularly pointed out that in cases 
where the distribution of values queried doesn't fall in small bins, an 
index may in fact slow things down).

Since your have mostly mirrored pairs, using the schema above will 
decrease the size of your DB (or keep it about the same if you use the 
index) but and --I see that as a big bonus-- essentially half the 
number of rows.

Note that, if it can be of any value to your application, you can even 
have the mirrored pairs sorted if you make the index
   CREATE INDEX "ixFlags" ON "T" ("flags", "n1");
Of course this compound index will take some more room, but depending 
on your needs it may prove useful.

The price to pay is having to query first before insert (or have a 
[slow] trigger do the work for you).  Could your application support that?

Sounds to me it could be worth trying on a reduced DB.



--
j...@antichoc.net  

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query with UNION on large table

2011-06-23 Thread Max Vlasov
On Thu, Jun 23, 2011 at 10:20 PM, Rense Corten  wrote:

> Wow, I'm almost embarrassed by how helpful you are, gentlemen. Thanks a
> lot.
>
> As to RAM: I'm trying this on two different machines, one with 12 Gb
> and one with 32 Gb RAM. I won't be able to get more in the near
> future. Something that might be relevant is that the case of the 32Gb
> machine, the database is on an AFS.
>
>
Hmm, Jan suggested dedicating the memory to the cache, but if you can use
64-bit sqlite shell, and giving that you have 800 millions (or less) rows,
maybe it's reasonable to switch completely to memory temporary storage?
(PRAGMA temp_store=2;). In best scenario no disk will be involved until
sqlite will be merging two b-trees for your "create table as"

Max
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query with UNION on large table

2011-06-23 Thread Martin Gadbois
On Thu, Jun 23, 2011 at 5:59 PM, Simon Slavin  wrote:

>
> On 23 Jun 2011, at 10:56pm, Rense Corten wrote:
>
> > Simon: no, AFS=Andrew File System :). Both my servers run Ubuntu
> > (64-bit). In case of the 12 Gb RAM machine, everything is done
> > locally. On the 32 Gb machine, I'm afraid I can't do that. And in fact
> > I've been using the command-line tool all along.
>
> Oh that's a relief.  Okay, well that got rid of several possible sources of
> problems.
>
>
>From http://en.wikipedia.org/wiki/Andrew_File_System:
"The *Andrew File System* (*AFS*) is a distributed networked file
system
"


-- 
Martin
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query with UNION on large table

2011-06-23 Thread Simon Slavin

On 23 Jun 2011, at 10:56pm, Rense Corten wrote:

> Simon: no, AFS=Andrew File System :). Both my servers run Ubuntu
> (64-bit). In case of the 12 Gb RAM machine, everything is done
> locally. On the 32 Gb machine, I'm afraid I can't do that. And in fact
> I've been using the command-line tool all along.

Oh that's a relief.  Okay, well that got rid of several possible sources of 
problems.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query with UNION on large table

2011-06-23 Thread Rense Corten
Simon: no, AFS=Andrew File System :). Both my servers run Ubuntu
(64-bit). In case of the 12 Gb RAM machine, everything is done
locally. On the 32 Gb machine, I'm afraid I can't do that. And in fact
I've been using the command-line tool all along.

On Thu, Jun 23, 2011 at 2:45 PM, Simon Slavin  wrote:
>
> On 23 Jun 2011, at 7:20pm, Rense Corten wrote:
>
>> As to RAM: I'm trying this on two different machines, one with 12 Gb
>> and one with 32 Gb RAM. I won't be able to get more in the near
>> future. Something that might be relevant is that the case of the 32Gb
>> machine, the database is on an AFS.
>
> Wait … AFS == Apple File Service ?  You're doing this on Macintoshes 
> accessing storage over a network ?  Okay, then the cause of your speed 
> problems is the network.  Please try creating the INDEX locally on the 
> server.  You don't need to write a program to do this, you can use the 
> command-line tool which you'll find at
>
> /usr/bin/sqlite3
>
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query with UNION on large table

2011-06-23 Thread Simon Slavin

On 23 Jun 2011, at 7:20pm, Rense Corten wrote:

> As to RAM: I'm trying this on two different machines, one with 12 Gb
> and one with 32 Gb RAM. I won't be able to get more in the near
> future. Something that might be relevant is that the case of the 32Gb
> machine, the database is on an AFS.

Wait … AFS == Apple File Service ?  You're doing this on Macintoshes accessing 
storage over a network ?  Okay, then the cause of your speed problems is the 
network.  Please try creating the INDEX locally on the server.  You don't need 
to write a program to do this, you can use the command-line tool which you'll 
find at

/usr/bin/sqlite3

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query with UNION on large table

2011-06-23 Thread Simon Slavin

On 23 Jun 2011, at 9:31pm, Jan Hudec wrote:

> You also need to make sure you are using
> 64-bit build of SQLite (32-bit build can only use between 2 and 3 GB and
> that's not enough for such huge database).

If you try to use a 32-bit compilation of SQLite to open a bigger database, 
does sqlite3_open() issue an error message ?

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query with UNION on large table

2011-06-23 Thread Jan Hudec
On Thu, Jun 23, 2011 at 11:20:22 -0700, Rense Corten wrote:
> Wow, I'm almost embarrassed by how helpful you are, gentlemen. Thanks a lot.
> 
> As to RAM: I'm trying this on two different machines, one with 12 Gb
> and one with 32 Gb RAM. I won't be able to get more in the near
> future. Something that might be relevant is that the case of the 32Gb
> machine, the database is on an AFS.

I hope they are running 64-bit OS. You also need to make sure you are using
64-bit build of SQLite (32-bit build can only use between 2 and 3 GB and
that's not enough for such huge database).

Than you need to tell sqlite to use the memory for cache. The cache size is
specified in multiples of page size, so first ask sqlite to tell you what the
database's page size is by issuing

pragma page_size;

query and than set the cache_size by issuing

pragma cache_size = ;

where  is desired cache size divided by the page size. You need to
load about billion rows times two integers, so it will certainly have use for
8GB cache. Try giving it 8 GB on the 12 GB machine and perhaps 24 GB on the
32 GB one.

Since the setting is connection-local, you need to issue the pragma
cache_size command in the application before doing the big operation.


AFS (do I remember right that it's a network filesystem?) is likely slow for
this purpose. Trying on local disk may help.

> As for the ranges of n1 and n1: they are both roughly between 6
> and 1200 .
> 
> Here are the results of  EXPLAIN QUERY PLAN SELECT n1, n2 FROM table1
> Where n1 < n2 INTERSECT SELECT n2, n1 FROM table1 Where n2 < n1;
> 
> 1|0|0|SCAN TABLE table1 (~437976176 rows)
> 2|0|0|SCAN TABLE table1 (~437976176 rows)
> 0|0|0|COMPOUND SUBQUERIES 1 AND 2 USING TEMP B-TREE (INTERSECT)
> 
> I'm not sure what "reasonable" should look like :). I'm running this
> query right now, let's see what happens.

Well, it shows sqlite is indeed trying to create temporary index to do the
intersect, which can be reasonably fast if the data can fit in the cache
for sorting, but if it starts trashing the cache, it will be pretty slow.
8 GB would hopefully be enough, anything less definitely won't.

Reading such large tables in index order tends to be significantly slower
than reading them sequentially, but covering index (that is one containing
all columns needed, so in this case both n1 and n2) might help as that would
be read in sequential order. I am not sure how well the query planner will
manage to use it here though.

It's worth experimenting with different ways to write the query and indices,
perhaps on a smaller sample first. However creating index, temporary or
regular, will always need huge amount of cache and be unusably slow
otherwise, so setting the cache is probably the most important thing you have
to do.

> In my database, the mirrored pairs vastly outnumber the non-mirrored
> ones, to the extent that the non-mirrored pairs are actually extremely
> rare (but relevant).
> 
> Generally, is there a way to figure out if sqlite is still doing
> anything, or whether it got somehow stuck?

It's possible to install a hook using the
http://sqlite.org/c3ref/progress_handler.html API, which will be called every
N virtual machine instructions, but there is no estimate how many
instructions will be needed overall nor how many instructions are left until
query completion and different instructions (in different phases of the
query) may take different amount of time.

> One thing I noticed is that
> it is not writing to the database while my problematic queries are
> running. Should it be?

Not yet. According to the above query plan, it's first going to do the first
subselect and sort it's output and until that is done, it won't have any
output and thus won't write anything to the database. Than it would probably
be more efficient to sort the second subselect too and merge the two lists,
but I think it's actually not going to.

-- 
 Jan 'Bulb' Hudec 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query with UNION on large table

2011-06-23 Thread Black, Michael (IS)
Come to think of itwhy don't you just write code to walk through the table 
just once?

You can then print out your values just like you want.



Also...it sounds as though you might be able to insert your values "in order". 
i.e. always put the smaller value in the first field (I happen to have an 
application that does just that as order is not important).  Your query seems 
to indicate that order is not important.



Then it's just 1 simple query and 1 SCAN TABLE which would be equivalent to 
walking through the table once.

sqlite> explain query plan select n1,n2 from table2;
sele  order  from  deta
  -    
0 0  0 SCAN TABLE table2 (~100 rows)



Michael D. Black

Senior Scientist

NG Information Systems

Advanced Analytics Directorate




From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of Black, Michael (IS) [michael.bla...@ngc.com]
Sent: Thursday, June 23, 2011 2:03 PM
To: General Discussion of SQLite Database
Subject: EXT :Re: [sqlite] Query with UNION on large table

Any reason you can't add another field to your database?

0=equal

1=n1n2



The create an index on that field.





Then your query plan would look like this:

sqlite> explain query plan select n1,n2 from table2 where flag = 1 intersect 
select n2,n1 from table2 where flag = 2;
sele  order  from  deta
  -    
1 0  0 SEARCH TABLE table2 USING INDEX idxn4 (flag=?) (~10 
rows)
2 0  0 SEARCH TABLE table2 USING INDEX idxn4 (flag=?) (~10 
rows)
0 0  0 COMPOUND SUBQUERIES 1 AND 2 USING TEMP B-TREE 
(INTERSECT)



Michael D. Black

Senior Scientist

NG Information Systems

Advanced Analytics Directorate




From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of Rense Corten [rcor...@gmail.com]
Sent: Thursday, June 23, 2011 1:20 PM
To: General Discussion of SQLite Database
Subject: EXT :Re: [sqlite] Query with UNION on large table

Wow, I'm almost embarrassed by how helpful you are, gentlemen. Thanks a lot.

As to RAM: I'm trying this on two different machines, one with 12 Gb
and one with 32 Gb RAM. I won't be able to get more in the near
future. Something that might be relevant is that the case of the 32Gb
machine, the database is on an AFS.

As for the ranges of n1 and n1: they are both roughly between 6
and 1200 .

Here are the results of  EXPLAIN QUERY PLAN SELECT n1, n2 FROM table1
Where n1 < n2 INTERSECT SELECT n2, n1 FROM table1 Where n2 < n1;

1|0|0|SCAN TABLE table1 (~437976176 rows)
2|0|0|SCAN TABLE table1 (~437976176 rows)
0|0|0|COMPOUND SUBQUERIES 1 AND 2 USING TEMP B-TREE (INTERSECT)

I'm not sure what "reasonable" should look like :). I'm running this
query right now, let's see what happens.

In my database, the mirrored pairs vastly outnumber the non-mirrored
ones, to the extent that the non-mirrored pairs are actually extremely
rare (but relevant).

Generally, is there a way to figure out if sqlite is still doing
anything, or whether it got somehow stuck? One thing I noticed is that
it is not writing to the database while my problematic queries are
running. Should it be?

Rense

On Thu, Jun 23, 2011 at 9:33 AM, Jan Hudec <b...@ucw.cz> wrote:
> On Wed, Jun 22, 2011 at 10:25:52 -0700, Rense Corten wrote:
>> CREATE TABLE table2 AS SELECT n1,n2 FROM (SELECT n1, n2 FROM table1
>> UNION SELECT n2 AS n1, n1 AS n2 FROM table1) WHERE(n1<n2) GROUP BY
>> n1,n2;
>
> Have you tried explaining it (prefix the whole query with "explain query
> plan" and run it)? Does the result look reasonable?
>
>> This has the desired result on a small example, but when I try this on
>> my actual table which has about 800 million rows, the query never
>> seems to complete. It has been running for a couple of days now, and
>> it doesn't seem sqlite is still doing anything (cpu usage dropped to
>> almost zero), but I get no error messages.
>
> Is the disk busy? It would mean you are trashing the caches, which is quite
> likely. For this size of database, couple of gigabytes of cache would
> probably be in order. Try giving it as much cache as possible given your
> available memory using 'PRAGMA cache_size'.
>
> --
> Jan 'Bulb' Hudec <b...@ucw.cz>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@sqlite.org
http

Re: [sqlite] Query with UNION on large table

2011-06-23 Thread Black, Michael (IS)
Any reason you can't add another field to your database?

0=equal

1=n1n2



The create an index on that field.





Then your query plan would look like this:

sqlite> explain query plan select n1,n2 from table2 where flag = 1 intersect 
select n2,n1 from table2 where flag = 2;
sele  order  from  deta
  -    
1 0  0 SEARCH TABLE table2 USING INDEX idxn4 (flag=?) (~10 
rows)
2 0  0 SEARCH TABLE table2 USING INDEX idxn4 (flag=?) (~10 
rows)
0 0  0 COMPOUND SUBQUERIES 1 AND 2 USING TEMP B-TREE 
(INTERSECT)



Michael D. Black

Senior Scientist

NG Information Systems

Advanced Analytics Directorate




From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of Rense Corten [rcor...@gmail.com]
Sent: Thursday, June 23, 2011 1:20 PM
To: General Discussion of SQLite Database
Subject: EXT :Re: [sqlite] Query with UNION on large table

Wow, I'm almost embarrassed by how helpful you are, gentlemen. Thanks a lot.

As to RAM: I'm trying this on two different machines, one with 12 Gb
and one with 32 Gb RAM. I won't be able to get more in the near
future. Something that might be relevant is that the case of the 32Gb
machine, the database is on an AFS.

As for the ranges of n1 and n1: they are both roughly between 6
and 1200 .

Here are the results of  EXPLAIN QUERY PLAN SELECT n1, n2 FROM table1
Where n1 < n2 INTERSECT SELECT n2, n1 FROM table1 Where n2 < n1;

1|0|0|SCAN TABLE table1 (~437976176 rows)
2|0|0|SCAN TABLE table1 (~437976176 rows)
0|0|0|COMPOUND SUBQUERIES 1 AND 2 USING TEMP B-TREE (INTERSECT)

I'm not sure what "reasonable" should look like :). I'm running this
query right now, let's see what happens.

In my database, the mirrored pairs vastly outnumber the non-mirrored
ones, to the extent that the non-mirrored pairs are actually extremely
rare (but relevant).

Generally, is there a way to figure out if sqlite is still doing
anything, or whether it got somehow stuck? One thing I noticed is that
it is not writing to the database while my problematic queries are
running. Should it be?

Rense

On Thu, Jun 23, 2011 at 9:33 AM, Jan Hudec <b...@ucw.cz> wrote:
> On Wed, Jun 22, 2011 at 10:25:52 -0700, Rense Corten wrote:
>> CREATE TABLE table2 AS SELECT n1,n2 FROM (SELECT n1, n2 FROM table1
>> UNION SELECT n2 AS n1, n1 AS n2 FROM table1) WHERE(n1<n2) GROUP BY
>> n1,n2;
>
> Have you tried explaining it (prefix the whole query with "explain query
> plan" and run it)? Does the result look reasonable?
>
>> This has the desired result on a small example, but when I try this on
>> my actual table which has about 800 million rows, the query never
>> seems to complete. It has been running for a couple of days now, and
>> it doesn't seem sqlite is still doing anything (cpu usage dropped to
>> almost zero), but I get no error messages.
>
> Is the disk busy? It would mean you are trashing the caches, which is quite
> likely. For this size of database, couple of gigabytes of cache would
> probably be in order. Try giving it as much cache as possible given your
> available memory using 'PRAGMA cache_size'.
>
> --
> Jan 'Bulb' Hudec <b...@ucw.cz>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query with UNION on large table

2011-06-23 Thread Rense Corten
Yes, I have run ANALYZE.

On Thu, Jun 23, 2011 at 11:56 AM, Simon Slavin  wrote:
>
> On 23 Jun 2011, at 7:20pm, Rense Corten wrote:
>
>> Here are the results of  EXPLAIN QUERY PLAN SELECT n1, n2 FROM table1
>> Where n1 < n2 INTERSECT SELECT n2, n1 FROM table1 Where n2 < n1;
>>
>> 1|0|0|SCAN TABLE table1 (~437976176 rows)
>> 2|0|0|SCAN TABLE table1 (~437976176 rows)
>> 0|0|0|COMPOUND SUBQUERIES 1 AND 2 USING TEMP B-TREE (INTERSECT)
>
> Have you run ANALYZE ?
>
> http://www.sqlite.org/lang_analyze.html
>
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query with UNION on large table

2011-06-23 Thread Simon Slavin

On 23 Jun 2011, at 7:20pm, Rense Corten wrote:

> Here are the results of  EXPLAIN QUERY PLAN SELECT n1, n2 FROM table1
> Where n1 < n2 INTERSECT SELECT n2, n1 FROM table1 Where n2 < n1;
> 
> 1|0|0|SCAN TABLE table1 (~437976176 rows)
> 2|0|0|SCAN TABLE table1 (~437976176 rows)
> 0|0|0|COMPOUND SUBQUERIES 1 AND 2 USING TEMP B-TREE (INTERSECT)

Have you run ANALYZE ?

http://www.sqlite.org/lang_analyze.html

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query with UNION on large table

2011-06-23 Thread Rense Corten
Wow, I'm almost embarrassed by how helpful you are, gentlemen. Thanks a lot.

As to RAM: I'm trying this on two different machines, one with 12 Gb
and one with 32 Gb RAM. I won't be able to get more in the near
future. Something that might be relevant is that the case of the 32Gb
machine, the database is on an AFS.

As for the ranges of n1 and n1: they are both roughly between 6
and 1200 .

Here are the results of  EXPLAIN QUERY PLAN SELECT n1, n2 FROM table1
Where n1 < n2 INTERSECT SELECT n2, n1 FROM table1 Where n2 < n1;

1|0|0|SCAN TABLE table1 (~437976176 rows)
2|0|0|SCAN TABLE table1 (~437976176 rows)
0|0|0|COMPOUND SUBQUERIES 1 AND 2 USING TEMP B-TREE (INTERSECT)

I'm not sure what "reasonable" should look like :). I'm running this
query right now, let's see what happens.

In my database, the mirrored pairs vastly outnumber the non-mirrored
ones, to the extent that the non-mirrored pairs are actually extremely
rare (but relevant).

Generally, is there a way to figure out if sqlite is still doing
anything, or whether it got somehow stuck? One thing I noticed is that
it is not writing to the database while my problematic queries are
running. Should it be?

Rense

On Thu, Jun 23, 2011 at 9:33 AM, Jan Hudec  wrote:
> On Wed, Jun 22, 2011 at 10:25:52 -0700, Rense Corten wrote:
>> CREATE TABLE table2 AS SELECT n1,n2 FROM (SELECT n1, n2 FROM table1
>> UNION SELECT n2 AS n1, n1 AS n2 FROM table1) WHERE(n1> n1,n2;
>
> Have you tried explaining it (prefix the whole query with "explain query
> plan" and run it)? Does the result look reasonable?
>
>> This has the desired result on a small example, but when I try this on
>> my actual table which has about 800 million rows, the query never
>> seems to complete. It has been running for a couple of days now, and
>> it doesn't seem sqlite is still doing anything (cpu usage dropped to
>> almost zero), but I get no error messages.
>
> Is the disk busy? It would mean you are trashing the caches, which is quite
> likely. For this size of database, couple of gigabytes of cache would
> probably be in order. Try giving it as much cache as possible given your
> available memory using 'PRAGMA cache_size'.
>
> --
>                                                 Jan 'Bulb' Hudec 
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query with UNION on large table

2011-06-23 Thread Jan Hudec
On Wed, Jun 22, 2011 at 10:25:52 -0700, Rense Corten wrote:
> CREATE TABLE table2 AS SELECT n1,n2 FROM (SELECT n1, n2 FROM table1
> UNION SELECT n2 AS n1, n1 AS n2 FROM table1) WHERE(n1 n1,n2;

Have you tried explaining it (prefix the whole query with "explain query
plan" and run it)? Does the result look reasonable?

> This has the desired result on a small example, but when I try this on
> my actual table which has about 800 million rows, the query never
> seems to complete. It has been running for a couple of days now, and
> it doesn't seem sqlite is still doing anything (cpu usage dropped to
> almost zero), but I get no error messages.

Is the disk busy? It would mean you are trashing the caches, which is quite
likely. For this size of database, couple of gigabytes of cache would
probably be in order. Try giving it as much cache as possible given your
available memory using 'PRAGMA cache_size'.

-- 
 Jan 'Bulb' Hudec 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query with UNION on large table

2011-06-23 Thread Tom Holden
Is the Hard Drive thrashing? Could be that most everything is being done in 
swap files. Given the size of the table, a lot more RAM would help.

Tom 

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query with UNION on large table

2011-06-23 Thread Max Vlasov
On Wed, Jun 22, 2011 at 9:25 PM, Rense Corten  wrote:

>
> CREATE TABLE table2 AS SELECT n1,n2 FROM (SELECT n1, n2 FROM table1
> UNION SELECT n2 AS n1, n1 AS n2 FROM table1) WHERE(n1 n1,n2;
>
> This has the desired result on a small example, but when I try this on
> my actual table which has about 800 million rows, the query never
> seems to complete. It has been running for a couple of days now, and
> it doesn't seem sqlite is still doing anything (cpu usage dropped to
> almost zero)...
>

Rense, looks like an interesting task.

What are the ranges of n1 and n2? We can take sqlite logic into account, but
regardless of the one, if the range is small comaring to the row count, the
result set will be small, so there's always possible to do full scan
maintaining the result as a small table (or memory array). I suppose sqlite
sometimes goes this way after the some guessing. But if the range is large
(so also a large result set is expected), there's no way other then prior
sorting of both sub-tables (or probably by creating two indexes). Looking at
how your query executed against my test data and seeing I/i read and write
statistics, sqlite probably does these temporary tables creation.

I noticed that fewer reads/writes will be with the following variant

SELECT * FROM (SELECT n1, n2 FROM table1 Where n1 < n2)
UNION
SELECT * FROM (SELECT n2, n1 FROM table1 Where n2 < n1)

since this one decreases the sizes of the tables that should be ordered.

Max Vlasov
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query with UNION on large table

2011-06-22 Thread Simon Slavin

On 23 Jun 2011, at 1:00am, Rense Corten wrote:

> @Simon: I don't have a problem per se with things running overnight,
> as long as I can be sure that things will complete at some point...the
> result of "PRAGMA integrity_check;", which by the way took less then
> an hour, is "ok".  Any suggestions?

Well, that is a major hurdle cleared.  Sorry I don't have any better ideas.  I 
hope someone else on the list does.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query with UNION on large table

2011-06-22 Thread Rense Corten
Thanks Simon, Jim and Tom for your replies!

@Simon: I don't have a problem per se with things running overnight,
as long as I can be sure that things will complete at some point...the
result of "PRAGMA integrity_check;", which by the way took less then
an hour, is "ok".  Any suggestions?

@Jim: I'm running your alternative query right now, but so far (after
a couple of hours) seems to have the same symptoms: cpu jumps at
first, a journal file is created, but then cpu usage drops to almost
nothing and keeps decreasing, and no observable changes to the
database file.

@Tom: you are right, the version I posted removes only the "reverse
copies" but leaves the rest in, which is another operation I need to
do, and I mixed up the code when posting. To get the result I
initially described, I would indeed use an "intersect"  as you
suggested. Sorry for the confusion.

Rense
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query with UNION on large table

2011-06-22 Thread Rense Corten
Thanks Simon, Jim and Tom for your replies!

@Simon: I don't have a problem per se with things running overnight,
as long as I can be sure that things will complete at some point...the
result of "PRAGMA integrity_check;", which by the way took less then
an hour, is "ok".  Any suggestions?

@Jim: I'm running your alternative query right now, but so far (after
a couple of hours) seems to have the same symptoms: cpu jumps at
first, a journal file is created, but then cpu usage drops to almost
nothing and keeps decreasing, and no observable changes to the
database file.

@Tom: you are right, the version I posted removes only the "reverse
copies" but leaves the rest in, which is another operation I need to
do, and I mixed up the code when posting. To get the result I
initially described, I would indeed use an "intersect"  as you
suggested. Sorry for the confusion.

Rense
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query with UNION on large table

2011-06-22 Thread Tom Holden
This is even faster, on a tiny table:

SELECT n1, n2 FROM table1
  INTERSECT
   SELECT n2, n1  FROM table1
WHERE n2  1

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query with UNION on large table

2011-06-22 Thread Jim Morris
Did you try to time a simpler select:

SELECT min(n1, n2) as new1, max(n1,n2) as new2 FROM table1 group by new1, new2 
having count(*)>  1


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query with UNION on large table

2011-06-22 Thread Simon Slavin

On 22 Jun 2011, at 6:36pm, Simon Slavin wrote:

> With a billion rows,

I forgot to say that there shouldn't really be a problem with a database of 
this size.  If you have a single-user single-process use for a database of this 
size there's no reason SQLite shouldn't be a good solution for you.  But I am 
interested to know what's causing your problem.

Simon.

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query with UNION on large table

2011-06-22 Thread Simon Slavin

On 22 Jun 2011, at 6:25pm, Rense Corten wrote:

> This has the desired result on a small example, but when I try this on
> my actual table which has about 800 million rows, the query never
> seems to complete. It has been running for a couple of days now, and
> it doesn't seem sqlite is still doing anything (cpu usage dropped to
> almost zero), but I get no error messages. I should mention that
> table1 is not indexed, as indexing this large table also didn't work
> out yet

I'm actually more concerned about your problem with creating the index than 
about your SELECT.  In fact, it's possible that the SELECT optimiser inside 
SQLite has decided that the best way to perform your very complicated SELECT 
command is to make a temporary index.

Please run "PRAGMA integrity_check" on your database as in



With a billion rows, this will probably need at least an overnight run.  In 
fact the integrity check, creating the index, and your SELECT will all need an 
overnight run.  And I suspect that your are stressing some component of your 
setup, possibly your operating-system-level disk caching or your hard disk 
drive.

The integrity_check should be a good first check to see whether it's worth 
pursuing the other things.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users