Re: [sqlite] Query with UNION on large table

2011-06-24 Thread Rense Corten
Jean-Christophe,
Thanks for this idea. In fact, the purpose of my original query is
exactly to reduce the database. The 800 mln rows were exported from
another source, and I was hoping to be able to use sqlite to manage
this massive amount of data (e.g., removing redundant information)
before I proceed with my analyses.
So, would your approach also work if I import the data, rather than insert?
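For what it's worth, the query-before-insert scheme Jean-Christophe describes below can be sketched like this (a minimal sketch using Python's standard sqlite3 module; the insert_pair helper is hypothetical, not something from the thread):

```python
import sqlite3

# Store each pair (a, b) once as (flags, n1, n2) with n1 <= n2.
# flags bit 0: the pair was seen as (n1, n2); bit 1: seen as (n2, n1).
# A pair present in both orientations therefore ends up with flags = 3.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T (
        flags INTEGER,
        n1 INTEGER,
        n2 INTEGER,
        CONSTRAINT T_order CHECK ((n1 <= n2) AND (flags BETWEEN 1 AND 3))
    );
    CREATE INDEX ixFlags ON T (flags);
""")

def insert_pair(conn, a, b):
    """Hypothetical helper: query before insert, merging orientations."""
    n1, n2 = (a, b) if a <= b else (b, a)
    bit = 1 if (a, b) == (n1, n2) else 2
    row = conn.execute("SELECT rowid FROM T WHERE n1 = ? AND n2 = ?",
                       (n1, n2)).fetchone()
    if row is None:
        conn.execute("INSERT INTO T (flags, n1, n2) VALUES (?, ?, ?)",
                     (bit, n1, n2))
    else:
        conn.execute("UPDATE T SET flags = flags | ? WHERE rowid = ?",
                     (bit, row[0]))

for pair in [(1, 3), (3, 1), (2, 3), (3, 2), (2, 4)]:
    insert_pair(conn, *pair)
conn.commit()

# Pairs seen in both orientations: flags = 3.
both = conn.execute(
    "SELECT n1, n2 FROM T WHERE flags = 3 ORDER BY n1").fetchall()
print(both)  # [(1, 3), (2, 3)]
```

The same merging logic would apply to a bulk import: each incoming row either inserts a new half-pair or ORs the missing bit into an existing one.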

On Fri, Jun 24, 2011 at 5:45 AM, Jean-Christophe Deschamps
 wrote:
> Rense,
>
>>As for the ranges of n1 and n2: they are both roughly between 6
>>and 1200.
>>
>>Here are the results of  EXPLAIN QUERY PLAN SELECT n1, n2 FROM table1
>>Where n1 < n2 INTERSECT SELECT n2, n1 FROM table1 Where n2 < n1;
>>
>>1|0|0|SCAN TABLE table1 (~437976176 rows)
>>2|0|0|SCAN TABLE table1 (~437976176 rows)
>>0|0|0|COMPOUND SUBQUERIES 1 AND 2 USING TEMP B-TREE (INTERSECT)
>
> If you can change your schema at all and your application supports
> taking a bit more time for inserts, I believe you can achieve much
> better query times.
>
> I tried the following schema:
> CREATE TABLE "T" (
>   "flags" INTEGER,
>   "n1" INTEGER DEFAULT (random() % 100),
>   "n2" INTEGER DEFAULT (random() % 100),
>   CONSTRAINT "T_order" CHECK ((n1 <= n2) and (flags between 1 and 3)));
>
>>In my database, the mirrored pairs vastly outnumber the non-mirrored
>>ones, to the extent that the non-mirrored pairs are actually extremely
>>rare (but relevant).
>
> Given the fact that mirrored pairs dominate, if you store new pairs
>  (N1, N2)
> as
>  (flags, n1, n2)
> with n1 <= n2 and flags bit 0 set if (n1, n2) = (N1, N2)
>               and flags bit 1 set if (n1, n2) = (N2, N1)
> your query becomes:
>   select * from T where flags = 3;
>
> If you build an index on flags, e.g.
>   CREATE INDEX "ixFlags" ON "T" ("flags");
> then it will be used (but it is regularly pointed out that in cases
> where the distribution of values queried doesn't fall in small bins, an
> index may in fact slow things down).
>
> Since you have mostly mirrored pairs, using the schema above will
> decrease the size of your DB (or keep it about the same if you use the
> index) and (I see that as a big bonus) essentially halve the
> number of rows.
>
> Note that, if it can be of any value to your application, you can even
> have the mirrored pairs sorted if you make the index
>   CREATE INDEX "ixFlags" ON "T" ("flags", "n1");
> Of course this compound index will take some more room, but depending
> on your needs it may prove useful.
>
> The price to pay is having to query first before insert (or have a
> [slow] trigger do the work for you).  Could your application support that?
>
> Sounds to me it could be worth trying on a reduced DB.
>
>
>
> --
> j...@antichoc.net
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query with UNION on large table

2011-06-23 Thread Rense Corten
Simon: no, AFS=Andrew File System :). Both my servers run Ubuntu
(64-bit). In case of the 12 Gb RAM machine, everything is done
locally. On the 32 Gb machine, I'm afraid I can't do that. And in fact
I've been using the command-line tool all along.

On Thu, Jun 23, 2011 at 2:45 PM, Simon Slavin <slav...@bigfraud.org> wrote:
>
> On 23 Jun 2011, at 7:20pm, Rense Corten wrote:
>
>> As to RAM: I'm trying this on two different machines, one with 12 Gb
>> and one with 32 Gb RAM. I won't be able to get more in the near
>> future. Something that might be relevant is that in the case of the 32 Gb
>> machine, the database is on an AFS.
>
> Wait … AFS == Apple File Service ?  You're doing this on Macintoshes 
> accessing storage over a network ?  Okay, then the cause of your speed 
> problems is the network.  Please try creating the INDEX locally on the 
> server.  You don't need to write a program to do this, you can use the 
> command-line tool which you'll find at
>
> /usr/bin/sqlite3
>
> Simon.


Re: [sqlite] Query with UNION on large table

2011-06-23 Thread Rense Corten
Yes, I have run ANALYZE.

On Thu, Jun 23, 2011 at 11:56 AM, Simon Slavin <slav...@bigfraud.org> wrote:
>
> On 23 Jun 2011, at 7:20pm, Rense Corten wrote:
>
>> Here are the results of  EXPLAIN QUERY PLAN SELECT n1, n2 FROM table1
>> Where n1 < n2 INTERSECT SELECT n2, n1 FROM table1 Where n2 < n1;
>>
>> 1|0|0|SCAN TABLE table1 (~437976176 rows)
>> 2|0|0|SCAN TABLE table1 (~437976176 rows)
>> 0|0|0|COMPOUND SUBQUERIES 1 AND 2 USING TEMP B-TREE (INTERSECT)
>
> Have you run ANALYZE ?
>
> http://www.sqlite.org/lang_analyze.html
>
> Simon.


Re: [sqlite] Query with UNION on large table

2011-06-23 Thread Rense Corten
Wow, I'm almost embarrassed by how helpful you are, gentlemen. Thanks a lot.

As to RAM: I'm trying this on two different machines, one with 12 Gb
and one with 32 Gb RAM. I won't be able to get more in the near
future. Something that might be relevant is that in the case of the 32 Gb
machine, the database is on an AFS.

As for the ranges of n1 and n2: they are both roughly between 6
and 1200.

Here are the results of  EXPLAIN QUERY PLAN SELECT n1, n2 FROM table1
Where n1 < n2 INTERSECT SELECT n2, n1 FROM table1 Where n2 < n1;

1|0|0|SCAN TABLE table1 (~437976176 rows)
2|0|0|SCAN TABLE table1 (~437976176 rows)
0|0|0|COMPOUND SUBQUERIES 1 AND 2 USING TEMP B-TREE (INTERSECT)

I'm not sure what "reasonable" should look like :). I'm running this
query right now, let's see what happens.
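For reference, the plan above can be reproduced programmatically on a toy copy of the table (a sketch using Python's standard sqlite3 module; the exact wording of the plan rows varies between SQLite versions):

```python
import sqlite3

# Toy copy of table1, small enough to reason about.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (n1 INTEGER, n2 INTEGER)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)",
                 [(1, 3), (3, 1), (2, 3), (3, 2), (2, 4)])

# Ask SQLite how it would execute the INTERSECT query.
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT n1, n2 FROM table1 WHERE n1 < n2
    INTERSECT
    SELECT n2, n1 FROM table1 WHERE n2 < n1
""").fetchall()

for row in plan:
    print(row)

# With no index covering (n1, n2), both subqueries are full table scans,
# matching the two SCAN lines quoted in this message.
details = " ".join(str(row[-1]) for row in plan)
```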

In my database, the mirrored pairs vastly outnumber the non-mirrored
ones, to the extent that the non-mirrored pairs are actually extremely
rare (but relevant).

Generally, is there a way to figure out if sqlite is still doing
anything, or whether it got somehow stuck? One thing I noticed is that
it is not writing to the database while my problematic queries are
running. Should it be?

Rense

On Thu, Jun 23, 2011 at 9:33 AM, Jan Hudec <b...@ucw.cz> wrote:
> On Wed, Jun 22, 2011 at 10:25:52 -0700, Rense Corten wrote:
>> CREATE TABLE table2 AS SELECT n1,n2 FROM (SELECT n1, n2 FROM table1
>> UNION SELECT n2 AS n1, n1 AS n2 FROM table1) WHERE(n1<n2) GROUP BY
>> n1,n2;
>
> Have you tried explaining it (prefix the whole query with "explain query
> plan" and run it)? Does the result look reasonable?
>
>> This has the desired result on a small example, but when I try this on
>> my actual table which has about 800 million rows, the query never
>> seems to complete. It has been running for a couple of days now, and
>> it doesn't seem sqlite is still doing anything (cpu usage dropped to
>> almost zero), but I get no error messages.
>
> Is the disk busy? It would mean you are thrashing the caches, which is quite
> likely. For this size of database, couple of gigabytes of cache would
> probably be in order. Try giving it as much cache as possible given your
> available memory using 'PRAGMA cache_size'.
>
> --
>                                                 Jan 'Bulb' Hudec <b...@ucw.cz>


Re: [sqlite] Query with UNION on large table

2011-06-22 Thread Rense Corten
Thanks Simon, Jim and Tom for your replies!

@Simon: I don't have a problem per se with things running overnight,
as long as I can be sure that things will complete at some point...the
result of "PRAGMA integrity_check;", which by the way took less than
an hour, is "ok". Any suggestions?

@Jim: I'm running your alternative query right now, but so far (after
a couple of hours) seems to have the same symptoms: cpu jumps at
first, a journal file is created, but then cpu usage drops to almost
nothing and keeps decreasing, and no observable changes to the
database file.

@Tom: you are right, the version I posted removes only the "reverse
copies" but leaves the rest in, which is another operation I need to
do, and I mixed up the code when posting. To get the result I
initially described, I would indeed use an "intersect"  as you
suggested. Sorry for the confusion.

Rense


[sqlite] Query with UNION on large table

2011-06-22 Thread Rense Corten
Hi list,
I am rather new to sqlite, and I am trying to use it to manage a large
database. The problem I have is the following:
I Have a table "table1" that looks like this:

n1,n2
1,3
3,1
2,3
3,2
2,4

thus there exist "reverse copies" of (1,3) and (2,3), while there is
no such copy of (2,4). I want to reduce this to:

n1,n2
1,3
2,3

that is, keep only the rows of which there exists a reverse copy, and
then keep only one of those copies.

I tried the following query to achieve this:

CREATE TABLE table2 AS SELECT n1,n2 FROM (SELECT n1, n2 FROM table1
UNION SELECT n2 AS n1, n1 AS n2 FROM table1) WHERE(n1<n2) GROUP BY
n1,n2;

This has the desired result on a small example, but when I try this on
my actual table which has about 800 million rows, the query never
seems to complete. It has been running for a couple of days now, and
it doesn't seem sqlite is still doing anything (cpu usage dropped to
almost zero), but I get no error messages.
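On the toy table above, the intended reduction can be checked directly (a sketch using Python's standard sqlite3 module, with the INTERSECT formulation that also appears elsewhere in the thread):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (n1 INTEGER, n2 INTEGER)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)",
                 [(1, 3), (3, 1), (2, 3), (3, 2), (2, 4)])

# Keep only pairs that exist in both orientations, one copy each:
# every (n1, n2) with n1 < n2 that also appears reversed as (n2, n1).
conn.execute("""
    CREATE TABLE table2 AS
    SELECT n1, n2 FROM table1 WHERE n1 < n2
    INTERSECT
    SELECT n2, n1 FROM table1 WHERE n2 < n1
""")
rows = conn.execute(
    "SELECT n1, n2 FROM table2 ORDER BY n1, n2").fetchall()
print(rows)  # [(1, 3), (2, 3)]
```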

[sqlite] sqlite seems to hang while indexing

2011-06-14 Thread Rense Corten
Hi list,

I was trying to create an index on an integer column in a very large
table with over 400,000,000 rows on an Ubuntu server, 8-core opteron
2384, 32 Gb RAM. After a couple of days, the process was still running
and nothing seemed to happen. At first, sqlite used a lot of cpu and
more and more memory, until it reached about 90% memory use, after
which cpu usage dropped. Also, I don't see the size of the database file
increasing, which I guess should happen while the index is created. I
increased the cache size to 2 but to no avail. Can it be that
this table is somehow too large to index? Or should I just wait
longer? If so, how do I know? Any other ideas?
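The PRAGMA cache_size suggestion made elsewhere in the thread would be applied before the CREATE INDEX (a sketch using Python's standard sqlite3 module; the table, index name, and cache value here are illustrative and should be scaled to the RAM actually available):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a file path in practice
conn.execute("CREATE TABLE big (n1 INTEGER, n2 INTEGER)")
conn.executemany("INSERT INTO big VALUES (?, ?)",
                 [(i, i + 1) for i in range(1000)])

# A negative cache_size is interpreted as kibibytes:
# -2000000 asks for roughly 2 GB of page cache for this connection.
conn.execute("PRAGMA cache_size = -2000000")
conn.execute("CREATE INDEX ix_n1 ON big (n1)")

# The index is now usable; the query plan should mention it.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM big WHERE n1 = 42").fetchall()
detail = " ".join(str(r[-1]) for r in plan)
print(detail)
```

A large cache only helps while the index b-tree is being built; it does not change the total amount of data that must eventually be written to disk.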

Thanks!


Re: [sqlite] .import error: cannot open large file

2011-06-06 Thread Rense Corten
Just to report back on this issue: recompiling as per Nuno's
instructions indeed solved the problem. Still, if anyone can explain
to me why the original executable would work without problems on a
different machine, I would be grateful.

>
> On Fri, Jun 3, 2011 at 10:28 AM, Rense Corten <rcor...@gmail.com> wrote:
>> Thanks for your answer, Nuno.
>> However, the system I am using is already 64-bit (I should have
>> mentioned that) , and the same binary can do the job on another Ubuntu
>> 64-bit  system. I'll try your suggestion nevertheless, but can there
>> be other causes?
>> Rense
>>
>>> It's what the thread says. The SQLite shell on Ubuntu (on 11.04) isn't
>>> compiled with large file support on 32-bit systems, so while the
>>> SQLite library does work with 64-bit database, the shell doesn't.
>>> The easy solution is to either use a 64-bit Ubuntu system or compile
>>> the shell yourself with large file support.
>>>
>>> To compile it, download the sqlite amalgamation files and run:
>>>
>>> gcc -o sqli -O3 -DNDEBUG=1 -D_FILE_OFFSET_BITS=64 sqlite3.c shell.c
>>> -ldl -pthread
>>>
>>> The resulting binary (sqli) will be compiled with large file support
>>> (I verified it was using strace).
>>>
>>>
>>> Regards,
>>> ~Nuno Lucas
>>>
>>> P.S.- While this could be considered an Ubuntu bug, the truth is that
>>> the linux shell binary on the sqlite site also isn't compiled with
>>> large file support, so I would consider this an SQLite bug.
>>>
>>>
>>> --
>>
>


Re: [sqlite] .import error: cannot open large file

2011-06-03 Thread Rense Corten
Thanks for your answer, Nuno.
However, the system I am using is already 64-bit (I should have
mentioned that) , and the same binary can do the job on another Ubuntu
64-bit  system. I'll try your suggestion nevertheless, but can there
be other causes?
Rense

> It's what the thread says. The SQLite shell on Ubuntu (on 11.04) isn't
> compiled with large file support on 32-bit systems, so while the
> SQLite library does work with 64-bit database, the shell doesn't.
> The easy solution is to either use a 64-bit Ubuntu system or compile
> the shell yourself with large file support.
>
> To compile it, download the sqlite amalgamation files and run:
>
> gcc -o sqli -O3 -DNDEBUG=1 -D_FILE_OFFSET_BITS=64 sqlite3.c shell.c
> -ldl -pthread
>
> The resulting binary (sqli) will be compiled with large file support
> (I verified it was using strace).
>
>
> Regards,
> ~Nuno Lucas
>
> P.S.- While this could be considered an Ubuntu bug, the truth is that
> the linux shell binary on the sqlite site also isn't compiled with
> large file support, so I would consider this an SQLite bug.
>
>
> --


[sqlite] .import error: cannot open large file

2011-06-02 Thread Rense Corten
Hi all,

I'm trying to import a large file of about 13GB using SQLite 3.7.6.3
on Ubuntu. I use the precompiled Linux binary.

The commands are:
=
.separator ";"
.import largefile.csv mytable
=

but then I get: "Error: cannot open "largefile.csv" "

I can view the file with "head" or "less", so there seems to be no
problem with readability or permissions. Moreover, I can (partially)
import the same file on a different Ubuntu system using the exact same
commands (but run out of storage space before the import completes).

So I searched the archives of this list and found two threads on this:

http://www.mail-archive.com/sqlite-users@sqlite.org/msg51574.html
http://www.mail-archive.com/sqlite-users@sqlite.org/msg48649.html

The first thread got no answers, but the second suggest to either
split the file or recompile sqlite3 with the option for large file
support . Now I had understood that since version 3.5.9 large file
support is switched on by default so that should not be the problem
(http://www.sqlite.org/changes.html). Splitting the file, however,
seems to solve the problem. I would prefer not to have to split the
file first.

Any ideas on what causes this problem?