Re: [sqlite] faster query - help

2013-01-25 Thread Igor Tandetnik

On 1/25/2013 9:45 AM, moti lahiani wrote:

Hi

I have a data base with the following tables
1) files - each file in the system has its information in this table. The
table includes 12 columns, such as file_id (integer primary key
autoincrement), date_uploaded (integer), file_type (integer), and so on.
2) music_files - holds additional information when a file is a music file,
with columns such as file_id (integer), artist, genre, album_name, and more.

I need to query these two tables (returning all information from both)
according to the filters/rules I have, order the results by date_uploaded,
and return the top 10 as the final result.

My question is: which is faster and performs best? (The files table can
have more than 150 rows.)

Option 1: select file_id and date_uploaded, order by date_uploaded, take the
top 10 file_ids, and then select all information from both tables (files and
music_files) for those top 10 results.
Option 2: select all information from both tables (files + music_files),
order by date_uploaded, and return the top 10 as the result.


At which point do you plan to apply your filters? In option 1, if you
first select the top ten file IDs and *then* apply the filters to them, you
would likely end up with fewer than ten files. But if you apply the filters
at the same time you choose your ten IDs, then you need to read the
associated data anyway, and your option 1 becomes equivalent to option 2.


I'd say, just do

select * from files left join music_files using (file_id)
where YourFilterGoesHere
order by date_uploaded limit 10;

An index on files(date_uploaded) would likely prove beneficial.
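
A minimal sketch of that index (the index name is arbitrary):

  -- Lets ORDER BY date_uploaded LIMIT 10 walk the ten newest rows
  -- directly instead of sorting the whole files table.
  CREATE INDEX idx_files_date_uploaded ON files(date_uploaded);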
--
Igor Tandetnik



Re: [sqlite] Substring Search Across Many Columns

2013-01-25 Thread Roger Binns

On 25/01/13 12:59, Paul Vercellotti wrote:
> As I understand, it's tricky to get FTS to do substring matching, no?
> What's the best way to do that?

In what way is it tricky?  There are several examples of doing it in the
doc I pointed to.  Even when it has to do a full scan, scanning the list of
all words should be cheaper than visiting every source row.

I recommend you actually go ahead and use FTS before deciding it doesn't
work.  You'll be able to get accurate performance information for your
data set.

  http://c2.com/cgi/wiki?PrematureOptimization
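
As a concrete illustration of scanning the word list rather than the rows,
here is a minimal sketch using FTS4 plus its fts4aux term-listing table; the
table and column names are hypothetical:

  CREATE VIRTUAL TABLE docs USING fts4(name, title, body);
  CREATE VIRTUAL TABLE docs_terms USING fts4aux(docs);

  -- The term list is typically far smaller than the document set, so
  -- even a LIKE scan over it is comparatively cheap.
  SELECT DISTINCT term FROM docs_terms WHERE term LIKE '%fish%';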

If you want to do substring matching using an index then you need to use
n-grams.  This involves taking fragments from the text.  For example, if
your source text is "hi there" and you are doing n-grams between 2 and 4
letters, then you would index these:

  'hi' 'hi ' 'hi t' 'i ' 'i t' 'i th' ' t' ' th' ' the' 'th'
  'the' 'ther' 'he' 'her' 'here' 'er' 'ere' 're'

You can possibly also use an FTS tokenizer that produces n-grams.
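
A minimal sketch of the plain-table variant, with hypothetical names
throughout:

  -- One row per (fragment, document) pair, populated at insert time.
  CREATE TABLE grams(gram TEXT, docid INTEGER);
  CREATE INDEX grams_idx ON grams(gram);

  -- Any 2..4-letter substring then becomes a single indexed lookup.
  SELECT docid FROM grams WHERE gram = 'ther';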

Roger


[sqlite] sqlite-analyzer & largish db?

2013-01-25 Thread Petite Abeille
Hello,

[3.7.15.2 2013-01-09 11:53:05 c0e09560d26f0a6456be9dd3447f5311eb4f238f]
[sqlite-analyzer-osx-x86-3071502]

sqlite3_analyzer seems to, hmmm, overflow or something given a largish db.

For example, for a db of 8,434,233,344 bytes (8.43 GB on disk) :

--8<--

*** All tables and indices ***

Percentage of total database.......... 100.000%
Number of entries..................... 143344555
Bytes of storage consumed............. -155705344
Bytes used after compression.......... 8434229248   -5416.8%
Bytes of payload...................... 6972547011   -4478.0%

-->8--

Note the negative 'Bytes of storage consumed' and the associated negative
percentages. (The value looks like a 32-bit truncation: 8434229248 taken as
a signed 32-bit integer wraps to exactly -155705344.)

Bug or feature?

Thanks in advance.

Cheers,

PA.




Re: [sqlite] Substring Search Across Many Columns

2013-01-25 Thread Paul Vercellotti



As I understand it, it's tricky to get FTS to do substring matching, no?
What's the best way to do that?

Thanks!
Paul



From: Roger Binns
To: General Discussion of SQLite Database
Sent: Friday, January 25, 2013 12:01 PM
Subject: Re: [sqlite] Substring Search Across Many Columns

  [snip - Roger's message appears in full below]


[sqlite] identify virtual tables?

2013-01-25 Thread Petite Abeille
Hello,

What would be a reasonable way to programmatically identify all the virtual 
tables in a database?

Is there a structured way to do so? Short of scanning the DDLs, that is?

select    name
from      sqlite_master
where     type = 'table'
and       lower( sql ) like '% virtual %'
order by  name;
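
A possibly more structured check - if I recall correctly, sqlite_master
records no b-tree root for virtual tables, so their rootpage is 0 (or NULL):

  select    name
  from      sqlite_master
  where     type = 'table'
  and       coalesce( rootpage, 0 ) = 0
  order by  name;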

Thanks in advance.

Cheers,

PA.


Re: [sqlite] Substring Search Across Many Columns

2013-01-25 Thread Roger Binns

On 25/01/13 11:30, Paul Vercellotti wrote:
> I'm trying to match a substring (case-insensitive) across multiple
> columns and want it to go fast; it's very slow to do a query like '...
> WHERE name LIKE "%fish%" OR desc LIKE "%red%" OR title LIKE
> "%soup%"...'
> 
> Will creating a composite index ...

If you are concerned about performance then SQLite already has a solution
for you:

  http://www.sqlite.org/fts3.html
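
A minimal sketch of that route, with hypothetical table and column names:

  CREATE VIRTUAL TABLE docs USING fts4(name, title, body);

  -- MATCH uses the full-text index instead of scanning every row;
  -- a trailing * turns a term into a prefix query.
  SELECT * FROM docs WHERE docs MATCH 'fish OR red OR soup';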

Roger



[sqlite] Substring Search Across Many Columns

2013-01-25 Thread Paul Vercellotti


Hi there,

I'm trying to match a substring (case-insensitive) across multiple columns and 
want it to go fast; it's very slow to do a query like '... WHERE name LIKE 
"%fish%" OR desc LIKE "%red%" OR title LIKE "%soup%"...'

Will creating a composite index that has all those columns speed up a LIKE
query across all the columns in that index?  That is, we'd like it to do
only one index scan rather than multiple table scans - will a composite key
help with that?

Any other suggestions for doing this are appreciated too.

Thanks!
Paul


Re: [sqlite] using the same sqlite parameter more than once causes premature memory deallocation

2013-01-25 Thread Roger Binns

On 24/01/13 17:37, abbood wrote:
> Btw I'm curious how did you find out about this auto release thing? Can
> you add more detail about that?

I used a debugger to set a breakpoint in malloc_error_break, as the message
says.  That gave a stack trace of an autorelease pool being drained.

Since most frees happen when the autorelease pool drains, that doesn't
particularly help with telling you which item is the problem.  I asked on
the valgrind list and got a way of keeping the allocation stack trace
instead of the free one.  (The next valgrind release will allow you to
print both.)

  http://article.gmane.org/gmane.comp.debugging.valgrind/12755

The cause of that original issue in my code was NSInvocation and dealing
with returned objects.  The cause of the most recent issue was NSData
taking ownership of a buffer I passed to it when I didn't want it to.

> Ie I put a break point in malloc_error_break.. But then I just jump to
> the internals of sqlite.. And basically within the sqlite internals
> it's freeing an allocation that doesn't exist..

Using valgrind will narrow down the problem.  What you are seeing is the
consequence of earlier memory errors.

> But I'm not sure if it's a good idea to modify the guts of sqlite.. Is
> it?

SQLite is *extremely* unlikely to have a bug.  Some other piece of code
has the bug, and SQLite is the victim.  Remember that virtually every web
browser on virtually every platform is using SQLite - an error would show
up for someone.

  http://www.sqlite.org/testing.html

> And then I made the second part use in line variables as opposed to
> bindings.. Ie NSString stringwithformat..

Do remember sqlite3_mprintf for that sort of thing, especially if strings
are involved, to avoid SQL injection attacks/bugs:

  http://www.sqlite.org/c3ref/mprintf.html

> So I deleted that row from the dbase and it works fine.. I couldn't
> find anything different on that row..

Memory allocators typically have buckets for different allocation sizes,
often in powers of two.  It could be that this row had a string needing 33
bytes while the others needed less, which caused memory to come out of a
different bucket, which in turn changes where the victim of the actual bug
shows up.

Roger


Re: [sqlite] faster query - help

2013-01-25 Thread Michael Black
The real answer is to try both and see which is faster.

My guess is #1 is probably faster, since I don't think there's an easy way
to limit the left-hand side of a left join so that #2 avoids touching all
the music_files records.  The join has to match on file_id, so it would hit
the entire music_files table.  I'm sure somebody will correct me if I'm
wrong here... I'm not confident that the join isn't smarter than that.

Is the join done BEFORE the order by and limit, or after?  I would think
it's done before.  Or will the order by and limit apply to the left-hand
side first, before the join, when they can?
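
One way to check rather than guess: EXPLAIN QUERY PLAN reports the strategy
SQLite actually picks.  A sketch (table and column names follow the
original post):

  EXPLAIN QUERY PLAN
  SELECT * FROM files LEFT JOIN music_files USING (file_id)
  ORDER BY date_uploaded LIMIT 10;

  -- With an index on files(date_uploaded), the plan should show files
  -- being scanned via that index, in which case only about ten files
  -- rows (and their matching music_files rows) are ever touched.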

-Original Message-
From: sqlite-users-boun...@sqlite.org
[mailto:sqlite-users-boun...@sqlite.org] On Behalf Of moti lahiani
Sent: Friday, January 25, 2013 8:45 AM
To: sqlite-users@sqlite.org
Subject: [sqlite] faster query - help

  [snip - the original question appears in full below]


[sqlite] faster query - help

2013-01-25 Thread moti lahiani
Hi

I have a data base with the following tables
1) files - each file in the system has its information in this table. The
table includes 12 columns, such as file_id (integer primary key
autoincrement), date_uploaded (integer), file_type (integer), and so on.
2) music_files - holds additional information when a file is a music file,
with columns such as file_id (integer), artist, genre, album_name, and more.

I need to query these two tables (returning all information from both)
according to the filters/rules I have, order the results by date_uploaded,
and return the top 10 as the final result.

My question is: which is faster and performs best? (The files table can
have more than 150 rows.)

Option 1: select file_id and date_uploaded, order by date_uploaded, take the
top 10 file_ids, and then select all information from both tables (files and
music_files) for those top 10 results.
Option 2: select all information from both tables (files + music_files),
order by date_uploaded, and return the top 10 as the result.


Thanks


Re: [sqlite] bug report: out of memory error for most operations on a large database

2013-01-25 Thread Michael Black
How much free disk space do you have?  Your temp tables might be exceeding
capacity.
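
If temporary storage is the suspect, these are worth checking first - a
sketch, assuming default settings:

  PRAGMA temp_store;         -- 0 = default, 1 = file, 2 = memory
  PRAGMA temp_store = FILE;  -- force large sorts and group-bys to
                             -- spill to disk instead of RAM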

-Original Message-
From: sqlite-users-boun...@sqlite.org
[mailto:sqlite-users-boun...@sqlite.org] On Behalf Of ammon_lymphater
Sent: Thursday, January 24, 2013 1:41 PM
To: sqlite-users@sqlite.org
Subject: [sqlite] bug report: out of memory error for most operations on a
large database

  [snip - the original report appears in full below]

[sqlite] bug report: out of memory error for most operations on a large database

2013-01-25 Thread ammon_lymphater
Summary: except for "select count(*)", all operations on a large table
(500m rows / 80gb) result in an out-of-memory error.

Details:

a.  Using SQLite3 ver 3.7.15.2, Windows 7 64-bit AMD (but the error also
occurs in 3.6.xx and on other Windows platforms)

b.  Created a table (schema attached), imported data from a csv file
using sqlite3 (no problems)

c.  "select Name5, count(*) from LargeData group by name5 order by name5"
results in Error: out of memory (the cardinality of Name5 is 12)

d.  "sqlite> select count(*) from StorageHourlyBySIDCL2;" gives 587608348,
as it should

e.  The above is independent of cache_size (from 0.3 to 1.5 GB); happens
for page_size 32 KB and 64 KB (others not tested)

(Personally not urgent for me - just testing the limits - but may be useful
for others. A possible workaround is sketched below.)
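
For what it's worth, a sketch of a workaround that can sidestep the big
transient sort, assuming adding an index is acceptable - with an index on
Name5, SQLite can stream the GROUP BY in index order instead of building a
temporary b-tree:

  CREATE INDEX largedata_name5 ON largedata(Name5);
  SELECT Name5, count(*) FROM largedata GROUP BY Name5 ORDER BY Name5;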

 

-- table schema & sqlite3_analyzer output

CREATE TABLE largedata (
   name1 smalldatetime
 , Name2 uniqueidentifier
 , Name3 varchar (16)
 , Name4 varchar (8)
 , Name5 varchar (80)
 , Name6 real
 , Name7 real
 , Name8 real
 , Name9 real
 , Name10 real
 , Name11 real
 , Name12 real
 , Name13 real
 , Name14 smallint
 , Name15 tinyint
 , Name16 tinyint
 , Name17 smalldatetime
 , Name18 real
 , Name19 tinyint
);

 

-- SQLITE3_ANALYZER output

/** Disk-Space Utilization Report For h:\temp\convert\import2.db

Page size in bytes.................... 32768
Pages in the whole file (measured).... 2578119
Pages in the whole file (calculated).. 2578118
Pages that store data................. 2578118      100.000%
Pages on the freelist (per header).... 0              0.0%
Pages on the freelist (calculated).... 1              0.0%
Pages of auto-vacuum overhead......... 0              0.0%
Number of tables in the database...... 11
Number of indices..................... 0
Number of named indices............... 0
Automatically generated indices....... 0
Size of the file in bytes............. 84479803392
Bytes of user payload stored.......... 79293861071   93.9%

*** Page counts for all tables with their indices ***

LargeData............................. 2578108      100.000%
smalldata............................. 1              0.0%
(zero-length tables omitted)

*** All tables ***

Percentage of total database.......... 100.000%
Number of entries..................... 587608358
Bytes of storage consumed............. 84479770624
Bytes of payload...................... 79293871126   93.9%
Average payload per entry............. 134.94
Average unused bytes per entry........ 0.32
Average fanout........................ 2716.00
Fragmentation......................... 0.11%
Maximum payload per entry............. 1933
Entries that use overflow............. 0              0.0%
Index pages used...................... 949
Primary pages used.................... 2577169
Overflow pages used................... 0
Total pages used...................... 2578118
Unused bytes on index pages........... 3904523       12.6%
Unused bytes on primary pages......... 184122910      0.22%
Unused bytes on overflow pages........ 0
Unused bytes on all pages............. 188027433      0.22%

--$ removing unused tables

*** Table STORAGEHOURLYBYSIDCL2 ***

Percentage of total database.......... 100.000%
Number of entries..................... 587608348
Bytes of storage consumed............. 84479442944
Bytes of payload...................... 79293861071   93.9%
Average payload per entry............. 134.94
Average unused bytes per entry........ 0.32
Average fanout........................ 2716.00
Fragmentation......................... 0.11%
Maximum payload per entry............. 183
Entries that use overflow............. 0              0.0%
Index pages used...................... 949
Primary pages used.................... 2577159
Overflow pages used................... 0
Total pages used...................... 2578108
Unused bytes on index pages........... 3904523       12.6%
Unused bytes on primary pages......... 183805515      0.22%
Unused bytes on overflow pages........ 0
Unused bytes on all pages............. 187710038      0.22%

*** Table SQLITE_MASTER ***

Percentage of total database.......... 0.0%
Number of entries..................... 10
Bytes of storage consumed............. 32768
Bytes of payload...................... 10055         30.7%
Average payload per entry............. 1005.50
Average unused bytes per entry........ 2255.50
Maximum payload per entry............. 1933
Entries that use overflow............. 0              0.0%
Primary pages used.................... 1
Overflow pages used................... 0
Total pages used...................... 1
Unused bytes on primary pages......... 22555         68.8%
Unused bytes on overflow pages........ 0
Unused bytes on all pages.