Re: [sqlite] Sizeof tables

2010-08-20 Thread Lukas Haase
On 18.08.2010 16:09, Max Vlasov wrote:
> On Tue, Aug 17, 2010 at 4:28 PM, Lukas Haase  wrote:
>
>> Hi,
>>
>> My sqlite database is about 65 MB. The data is split into several tables.
>>
>> Is there a way to enumerate the space requirements for each table so
>> that I can see which tables are the memory consumers?
>>
>>
> Look at this discussion:
> http://www.mail-archive.com/sqlite-users@sqlite.org/msg53997.html
> There are at least two variants of the solution there.

Hi,

Thank you very much!

Do you have a different link?

This one is "Not Found" :-(

Regards,
Luke

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Can we get a pragma for SQLITE_FCNTL_CHUNKS_SIZE?

2010-08-20 Thread Shawn Wilsher
On Thu, Aug 19, 2010 at 2:56 PM, Simon Slavin  wrote:
> It might be worth noting that fragmentation is normally seen as an issue only 
> under Windows, which is very sensitive to it.  Other operating systems 
> use different ways of handling disk access; however, real figures from 
> real-world examples may disprove this classic view.  Also, many installations 
> of SQLite are on solid state devices where, of course, fragmentation has no 
> effect at all.
Really?  I can think of at least 350 million installations of SQLite
that very likely aren't on an SSD (hint: it's a web browser).

Cheers,

Shawn


Re: [sqlite] Can we get a pragma for SQLITE_FCNTL_CHUNKS_SIZE?

2010-08-20 Thread Max Vlasov
> In my case (which is certainly not typical), a (several GB) large
> database is built up in several batches, one table at a time, while in
> parallel many intermediate files on the disk are created. This resulted
> in a very fragmented database file. After that, also several times, the
> data is selected in a way that uses 80-90% of the data in the database,
> using joins of all tables and sorting.
>
> ...
>
> With the new feature available, I can remove my own workaround, which
> does not work so well anyway. Many thanks to the developers.
>
>
Martin, you gave a good example of a case where this really helps, although
I suppose you still need some tweaking. As Dan Kennedy wrote, you have to set
the parameter, and "From that point on, connection "db" extends and
truncates the db file in 1MB chunks". So, for example, if you have just
created a db, made only minor changes to it, and plan to extend it to a
larger size, you have to set SQLITE_FCNTL_CHUNK_SIZE with
sqlite3_file_control and also write something new -- and not only write, but
be sure it's not going to be written to a previously disposed page.

Since cases like yours are real and occur in real life, maybe a change to
the existing freelist_count pragma is possible? If it were writable
(PRAGMA freelist_count=1024;), sqlite would compare the value supplied with
the current count and, if it is bigger, allocate the necessary space. This
syntax seems straightforward and self-explanatory. What do you think?

Max
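As shipped, freelist_count is a read-only pragma; the writable form proposed above is hypothetical. A minimal sketch of reading it via Python's stdlib sqlite3 bindings, showing pages accumulating on the freelist after a mass delete:

```python
import sqlite3

# In-memory database: create a table, delete all rows, inspect the freelist.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x)")
con.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(10000)])
con.commit()

con.execute("DELETE FROM t")   # freed pages go onto the freelist
con.commit()

# freelist_count reports pages currently on the freelist
# (auto_vacuum is off by default, so the file is not shrunk).
(free_pages,) = con.execute("PRAGMA freelist_count").fetchone()
print(free_pages)  # > 0 after the mass delete
```

Any writable variant would presumably compare this count against the requested value before allocating, as described above.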


Re: [sqlite] partial index?

2010-08-20 Thread Tim Romano
Igor,
Here's the example where a partial index can "hide" rows.

>From the wikipedia article cited by the OP:


It is not necessary that the condition be the same as the index criterion;
Stonebraker's paper below presents a number of examples with indexes similar
to the following:

  create index partial_salary on employee(age) where salary > 2100;



What would happen if you issued these queries?

  select max(age) from employee
  select avg(age) from employee

Would the ages of employees earning <= 2100 be included?  Is the
partial-index used under those circumstances?

--
Tim Romano






On Thu, Aug 19, 2010 at 9:16 PM, Igor Tandetnik  wrote:

> Tim Romano  wrote:
> > How would you find a row whose column X contained value Y if the
> "partial"
> > index on column X specified that rows containing value Y in column X
> should
> > never be returned?
>
> No one suggests a partial index should be capable of hiding anything. The
> idea is that, when the query can be proven to only involve rows covered by
> the partial index, the index can be used to speed up the query. Otherwise,
> it simply won't be used.
> --
> Igor Tandetnik
>
>


Re: [sqlite] partial index?

2010-08-20 Thread Filip Navara
On Fri, Aug 20, 2010 at 1:47 PM, Tim Romano  wrote:
> Igor,
> Here's the example where a partial index can "hide" rows.
>
> From the wikipedia article cited by the OP:
>
> 
> It is not necessary that the condition be the same as the index criterion;
> Stonebraker's paper below presents a number of examples with indexes similar
> to the following:
>
>  create index partial_salary on employee(age) where salary > 2100;
>
> 
>
> What would happen if you issued these queries?
>
>                  select max(age) from employee
>                  select avg(age) from employee
>
> Would the ages of employees earning <= 2100 be included?

Yes

> Is the partial-index used under those circumstances?

No, it would change the outcome of the query. The partial index is used
only for optimizing queries that satisfy the index condition.

F.


Re: [sqlite] partial index?

2010-08-20 Thread Igor Tandetnik
Tim Romano  wrote:
> Igor,
> Here's the example where a partial index can "hide" rows.
> 
> From the wikipedia article cited by the OP:
> 
> 
> It is not necessary that the condition be the same as the index criterion;
> Stonebraker's paper below presents a number of examples with indexes similar
> to the following:
> 
>  create index partial_salary on employee(age) where salary > 2100;
> 
> 
> 
> What would happen if you issued these queries?
> 
>  select max(age) from employee
>  select avg(age) from employee
> 
> Would the ages of employees earning <= 2100 be included?

Of course. The presence or absence of an index never affects the meaning of a 
query - just its performance.

> Is the
> partial-index used under those circumstances?

No, I don't see how it could be beneficial for these queries.
-- 
Igor Tandetnik



Re: [sqlite] Can we get a pragma for SQLITE_FCNTL_CHUNKS_SIZE?

2010-08-20 Thread Simon Slavin

On 20 Aug 2010, at 7:58am, Martin Engelschalk wrote:

> Under Windows, the insert speed did not change measurably, but the speed 
> of the later selects increased by about 15-20%. Also, my customer was 
> happy.

Okay, this is real-world data.  In that case there may be some point to 
defragmentation.

Simon.


Re: [sqlite] Can we get a pragma for SQLITE_FCNTL_CHUNKS_SIZE?

2010-08-20 Thread Martin Engelschalk


On 20.08.2010 13:38, Max Vlasov wrote:
>> In my case (which is certainly not typical), a (several GB) large
>> database is built up in several batches, one table at a time, while in
>> parallel many intermediate files on the disk are created. This resulted
>> in a very fragmented database file. After that, also several times, the
>> data is selected in a way that uses 80-90% of the data in the database,
>> using joins of all tables and sorting.
>>
>> ...
>>
>> With the new feature available, I can remove my own workaround, which
>> does not work so well anyway. Many thanks to the developers.
>>
>>
> Martin, you gave a good example of a case where this really helps, although
> I suppose you still need some tweaking. As Dan Kennedy wrote, you have to set
> the parameter, and "From that point on, connection "db" extends and
> truncates the db file in 1MB chunks". So, for example, if you have just
> created a db, made only minor changes to it, and plan to extend it to a
> larger size, you have to set SQLITE_FCNTL_CHUNK_SIZE with
> sqlite3_file_control and also write something new -- and not only write,
> but be sure it's not going to be written to a previously disposed page.
>
> Since cases like yours are real and occur in real life, maybe a change to
> the existing freelist_count pragma is possible? If it were writable
> (PRAGMA freelist_count=1024;), sqlite would compare the value supplied with
> the current count and, if it is bigger, allocate the necessary space. This
> syntax seems straightforward and self-explanatory. What do you think?
>
> Max

Hello Max,

Personally, I use the C API and do not need any other interface to the 
functionality.

If you make PRAGMA freelist_count writable as you suggest, I would 
expect the database to reserve space once, and not use a larger piece of 
the disk every time it has to be expanded in the future.

Martin



Re: [sqlite] errors running test suite (couldn't execute "testfixture")

2010-08-20 Thread Dan Kennedy

On Aug 20, 2010, at 6:07 AM, Paweł Hajdan, Jr. wrote:

> I updated to latest fossil version, ran make distclean, ./configure,  
> make,
> make test and got this:

Thanks for this report. The test code was assuming that "."
was in your PATH variable. Fixed now.

> What should I do to make it pass?

Either add "." to the PATH variable or do another update.

Dan.



Re: [sqlite] Can we get a pragma for SQLITE_FCNTL_CHUNKS_SIZE?

2010-08-20 Thread Stephen Oberholtzer
> If you make PRAGMA freelist_count writable as you suggest, I would
> expect the database to reserve space once, and not use a larger piece of
> the disk every time it has to be expanded in the future.
>
> Martin


To throw in my $0.02, I would suggest a *different* name for the
pragma, something more like

PRAGMA reserve_space(N) where N is in pages or bytes or whatever.

That could call whatever functions are needed to force the database to
grow such that the minimum number of free pages/bytes in the database
is N.

-- 
-- Stevie-O
Real programmers use COPY CON PROGRAM.EXE


Re: [sqlite] partial index?

2010-08-20 Thread Cory Nelson
On Thu, Aug 19, 2010 at 1:30 PM, Eric Smith  wrote:
> Afaict sqlite doesn't support indices on subsets of rows in a table, à
> la http://en.wikipedia.org/wiki/Partial_index -- right?
>
> Any plans to implement that?

+1 for this feature request.  They've got a very specific and fairly
rare use case, but when opportunity strikes partial indexes are much
more convenient, straightforward, and efficient than the alternative.

-   If a table has 100,000,000 rows and each index page for it holds
100 rows, a full index will need 1,000,000 pages of storage and 4 page
reads to search.
-   With a partial index covering only 10,000 rows of the same table,
it will only need 100 pages of storage and 2 page reads to search.

Big improvement while keeping query syntax and results exactly the same!
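The page counts above follow from simple B-tree fanout arithmetic; a quick check (assuming, as in the example, that each page holds 100 entries):

```python
import math

def pages_and_depth(rows, fanout=100):
    """Leaf pages needed for an index, and page reads for one search."""
    # leaf pages needed to hold all index entries
    leaves = math.ceil(rows / fanout)
    # tree depth = root-to-leaf page reads; grow capacity by one level
    # at a time to avoid floating-point log errors
    depth, capacity = 1, fanout
    while capacity < rows:
        capacity *= fanout
        depth += 1
    return leaves, depth

print(pages_and_depth(100_000_000))  # full index: (1000000, 4)
print(pages_and_depth(10_000))       # partial index: (100, 2)
```

This reproduces the figures quoted: 1,000,000 pages and 4 reads for the full index versus 100 pages and 2 reads for the partial one.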

> Are there any known hacks to implement something similar?

There's not really any good solution.  You can create a separate table
with the subset in it, that's pretty much it.  But that has some
overhead too, and can complicate your queries.

-- 
Cory Nelson
http://int64.org


Re: [sqlite] partial index?

2010-08-20 Thread Jim Wilcoxson
On 8/20/10, Cory Nelson  wrote:
> +1 for this feature request.  They've got a very specific and fairly
> rare use case, but when opportunity strikes partial indexes are much
> more convenient, straightforward, and efficient than the alternative.
>
> -   If a table has 100,000,000 rows and each index page for it holds
> 100 rows, a full index will need 1,000,000 pages of storage and 4 page
> reads to search.
> -   With a partial index covering only 10,000 rows of the same table,
> it will only need 100 pages of storage and 2 page reads to search.
>
> Big improvement while keeping query syntax and results exactly the same!
>
>> Are there any known hacks to implement something similar?
>
> There's not really any good solution.  You can create a separate table
> with the subset in it, that's pretty much it.  But that has some
> overhead too, and can complicate your queries.

A while back I needed something like a partial index.  The backup
program I'm working on does block dedup, so I made an index on each
block's SHA1.  But then I wanted to only dedup some blocks, not all.
I needed to say "insert this block row and also index the sha in the
sha index", or "insert this block row but don't index the sha in the
sha index".  Then I wanted to be able to say "is X in the sha index?",
ie, only look in the index.  SQL doesn't really have a way to do that.
 It's a simple concept, but it doesn't seem to fit within the SQL
framework.

The best I could come up with is a separate table.  The problem is,
indexing the SHA1 normally means there is a copy in the row and a copy
in the index.  Using a separate table, which still has to be indexed,
means there is a copy in the row of the main table, a copy in the
separate table, and a copy in the separate table's index.

I guess one way to do this using the SQL partial index feature being
discussed would be to have an extra column in the row called
"isindexed", setting that to 1 or 0 depending on whether the SHA
should be included in the index, and using create index ... where
isindexed=1.  Then on the queries, say "select blockid from blocks
where isindexed=1 and sha=X".  And any query that didn't have
isindexed=1 wouldn't be able to use the index at all.

Is that how it would work?
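SQLite did not support this syntax when the thread was written, but partial indexes were later added (in SQLite 3.8.0, 2013), so the scheme described can now be sketched directly. Table and column names follow the message; the index name is made up:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE blocks (
                   blockid   INTEGER PRIMARY KEY,
                   sha       BLOB,
                   isindexed INTEGER)""")

# Partial index: only rows with isindexed=1 get an index entry.
con.execute("CREATE INDEX sha_idx ON blocks(sha) WHERE isindexed = 1")

con.executemany("INSERT INTO blocks VALUES (?, ?, ?)",
                [(1, b"aaa", 1), (2, b"bbb", 0), (3, b"aaa", 0)])

# A query repeating the index condition can use the partial index;
# block 3 has the same sha but isindexed=0, so it is not returned.
rows = con.execute("SELECT blockid FROM blocks "
                   "WHERE isindexed = 1 AND sha = ?", (b"aaa",)).fetchall()
print(rows)  # [(1,)]

# The query plan shows the partial index being used.
plan = con.execute("EXPLAIN QUERY PLAN SELECT blockid FROM blocks "
                   "WHERE isindexed = 1 AND sha = ?", (b"aaa",)).fetchall()
print(plan)
```

A query without `isindexed = 1` cannot use the index, exactly as anticipated in the message.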

Jim
--
HashBackup: easy onsite and offsite Unix backup
http://sites.google.com/site/hashbackup


[sqlite] Help with database corruption?

2010-08-20 Thread Filip Navara
Hello,

is there anybody willing to help analyze corrupted database for
possible bug in SQLite?

It is a database file taken from one of our test machines and it is
only a few days old at most. The database file was only ever accessed
with SQLite 3.7.0.1. It has a page size of 1024, WAL mode,
synchronous=FULL and incremental vacuum. No power failure happened on
that machine. Apparently somewhere during the course of execution of
our application the database free page list became corrupted. This is
the third time this week the same error has happened, but only this
time I have a copy of the database file. It is 887 MB (although the
real data consume less than 2 MB), so it is rather problematic
to provide the file. A copy is available at
http://www.emclient.com/temp/mail_data.zip.

The database was accessed on Windows machine using SQLite 3.7.0.1
64-bit build. The following statements are the only ones that were
executed against the database besides SELECTs and initial schema
definition:

INSERT INTO LocalMailContents (
   "id", "partName", "contentType", "contentId",
   "contentDescription", "contentTransferEncoding",
   "contentMD5", "contentDisposition", "contentLanguage",
   "contentLocation", "partHeader", "partBody",
   "synchronizationKey", "contentLength")
   VALUES
   (@id, @partName, @contentType, @contentId, @contentDescription,
@contentTransferEncoding, @contentMD5, @contentDisposition,
@contentLanguage, @contentLocation, @partHeader, @partBody,
@synchronizationKey, @contentLength)
UPDATE LocalMailContents SET synchronizationKey=@synchronizationKey
WHERE id=@id AND partName=@partName
UPDATE LocalMailContents SET partHeader=@partHeader WHERE id=@id AND
partName=@partName
UPDATE LocalMailContents SET partBody=@partContent WHERE id=@id AND
partName=@partName
DELETE FROM LocalMailContents WHERE id IN ();
PRAGMA freelist_count;
PRAGMA incremental_vacuum();

The error messages produced by "pragma integrity_check" are

*** in database main ***
Main freelist: invalid page number 866828
Page 396 is never used
Page 473 is never used
Page 14780 is never used
Page 14915 is never used
Page 153649 is never used
Page 210894 is never used
Page 319247 is never used
Page 397006 is never used
Page 416545 is never used
Page 416636 is never used
Page 416704 is never used
Page 416705 is never used
Page 416706 is never used
Page 416707 is never used
Page 416708 is never used
Page 416710 is never used
Page 416711 is never used
...

I tried to locate the missing freelist pages in the database file and
they definitely were there at some point, but I am not familiar enough
with the file format to track the whole freelist and find where
exactly the corruption happened. All I know is that page 388 is corrupted
and points to a location outside of the database file.

Thanks,
Filip Navara


Re: [sqlite] Can we get a pragma for SQLITE_FCNTL_CHUNKS_SIZE?

2010-08-20 Thread Jay A. Kreibich
On Thu, Aug 19, 2010 at 10:56:58PM +0100, Simon Slavin scratched on the wall:

> It might be worth noting that fragmentation is normally seen as an issue
> only under Windows, which is very sensitive to it.

  Most systems are sensitive to fragmentation.  It just happens that
  many other systems address the problem with more advanced filesystems
  and/or filesystem drivers.  HFS+, for example (the primary filesystem
  of Darwin and Mac OS X) will automatically defrag smaller files
  (< 20MB).  I'm told HFS+ will also tend to migrate frequently used
  files to the middle of the disk platters (smaller average seek time),
  but I haven't found definitive documentation on this. 
  
  A number of the more advanced filesystems commonly found in the UNIX
  world have similar features.

  I know Vista and Win7 introduced some auto-defrag features, but I
  don't have any significant personal experience with those systems.

> Also, many installations of SQLite are on solid state devices where,
> of course, fragmentation has no effect at all.

  Umm... no.  SSDs have very different access characteristics compared to
  spinning platters, but they most definitely do not allow uniform access
  times to the whole address space.  "Seek time" isn't exactly the right
  word, but they have read delays based off the last block accessed,
  and a whole slew of other factors, including how the individual
  storage chips are "stacked", how the chips are cut up into banks, how
  the controller is designed, and on and on.  Also, because of the way
  the SSD devices work at the chip level, they tend to "stream" data,
  rather than just access it, so there are some designs that are even
  more sensitive to fragmentation.

  In many ways they're much more challenging to write drivers for, as
  each device has different characteristics that cannot be accounted
  for by just looking at the device geometry.

  This is all made even more interesting by the fact that many SSDs
  utilize file systems that intentionally fragment and move blocks
  around with every write (including writes to existing space in
  existing files) to spread out the write cycles.

   -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"Intelligence is like underwear: it is important that you have it,
 but showing it to the wrong people has the tendency to make them
 feel uncomfortable." -- Angela Johnson


Re: [sqlite] partial index?

2010-08-20 Thread Cory Nelson
On Fri, Aug 20, 2010 at 7:39 AM, Jim Wilcoxson  wrote:
> ...
> The best I could come up with is a separate table.  The problem is,
> indexing the SHA1 normally means there is a copy in the row and a copy
> in the index.  Using a separate table, which still has to be indexed,
> means there is a copy in the row of the main table, a copy in the
> separate table, and a copy in the separate table's index.
>

You might want to index the "sha" column on the primary table, then
the secondary table can just be indexedblocks(blockid INTEGER PRIMARY
KEY).  No wasted space that way, but will require more I/Os to JOIN
the tables.

>
> I guess one way to do this using the SQL partial index feature being
> discussed would be to have an extra column in the row called
> "isindexed", setting that to 1 or 0 depending on whether the SHA
> should be included in the index, and using create index ... where
> isindexed=1.  Then on the queries, say "select blockid from blocks
> where isindexed=1 and sha=X".  And any query that didn't have
> isindexed=1 wouldn't be able to use the index at all.
>
> Is that how it would work?

Yes, that's exactly how it works.

Partial indexes are best used when:

-   You only need to search a specific subset of your rows.
"isindexed" is a good candidate for this.

-   You have no other indexes that can already narrow down searches efficiently.

-   You have existing queries that use rows from both inside and
outside of the partial index.  If you don't, then there's no reason to
group them together and you might as well put the subset in a separate
table if it's not too inconvenient.

-   The column is used outside of the index, or the storage used by it
is acceptable.  Ie. if 99% of your rows have "isindexed=1", you've got
a lot of redundant columns taking up space and might investigate
JOINing with another smaller table instead.

-- 
Cory Nelson
http://int64.org


Re: [sqlite] Can we get a pragma for SQLITE_FCNTL_CHUNKS_SIZE?

2010-08-20 Thread Jay A. Kreibich
On Thu, Aug 19, 2010 at 06:46:07PM -0400, Jim Wilcoxson scratched on the wall:
> On 8/19/10, Simon Slavin  wrote:
> >
> > On 19 Aug 2010, at 9:27pm, Taras Glek wrote:
> >
> >> I really appreciate that sqlite got this feature to reduce
> >> fragmentation, but why not expose this as a pragma?
> >
> > Do you have figures which suggest that reducing fragmentation leads to any
> > improvement in performance ?
> 
> Whether SQLITE_FCNTL_CHUNKS_SIZE is useful is a different discussion.
> I myself would love to see features exposed via pragmas whenever
> possible, for the simple reason that I don't use the C API and can't
> make use of the features otherwise.  I would assume that since the
> SQLite developers added the feature to the C API, there must be a use
> for it or they wouldn't have bothered.

  From discussions in the past, I've gotten the feeling that the SQLite
  team considers the C API and the SQL interfaces (including PRAGMA
  commands) to have different security requirements.  This is why you
  need to make an API call to enable extension loading-- because it is
  viewed as something "dangerous" you can do from the SQL level. 
  Similarly, this is why there is no SQL PRAGMA interface to such
  functions as sqlite3_limit().

  Personally, I've found very few situations where a user might have
  access to an SQL prompt when they would not have access to the database
  file (e.g. they can run sqlite3 or their own program and do whatever
  they want), but I suppose this may be true in some web environments
  or some scripting environments.

  Given how easy it is to build SQL wrappers in an extension (or
  built-in code), I don't see it as a major issue.  One of the "how
  to write a custom SQL function" examples in "Using SQLite" builds a
  wrapper around sqlite3_limit().  I know people that are not using the
  C interface are faced with additional challenges when it comes to
  loading extensions or modifying the core SQLite library, but if it is
  that important, it can usually be done.
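The "SQL wrapper" idea can be illustrated with any binding that lets you register SQL functions. A sketch using Python's stdlib sqlite3, exposing an application-side setting through a custom SQL function instead of a C-level call; the function name `app_setting` and the dictionary of settings are invented for the example:

```python
import sqlite3

settings = {"chunk_size": 0}   # stands in for state normally set via the C API

def app_setting(name, value=None):
    # read when called with one argument, write when called with two
    if value is not None:
        settings[name] = value
    return settings[name]

con = sqlite3.connect(":memory:")
# Register both arities so SQL can call app_setting(name) or app_setting(name, v);
# SQLite allows overloading a function name by argument count.
con.create_function("app_setting", 1, app_setting)
con.create_function("app_setting", 2, app_setting)

con.execute("SELECT app_setting('chunk_size', 1048576)").fetchone()  # "write"
(value,) = con.execute("SELECT app_setting('chunk_size')").fetchone()
print(value)  # 1048576
```

The same pattern works for wrapping sqlite3_limit() or any other C-only knob, which is the case the "Using SQLite" example covers.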

   -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"Intelligence is like underwear: it is important that you have it,
 but showing it to the wrong people has the tendency to make them
 feel uncomfortable." -- Angela Johnson


Re: [sqlite] partial index?

2010-08-20 Thread Jim Wilcoxson
On 8/20/10, Cory Nelson  wrote:
> On Fri, Aug 20, 2010 at 7:39 AM, Jim Wilcoxson  wrote:
>> ...
>> The best I could come up with is a separate table.  The problem is,
>> indexing the SHA1 normally means there is a copy in the row and a copy
>> in the index.  Using a separate table, which still has to be indexed,
>> means there is a copy in the row of the main table, a copy in the
>> separate table, and a copy in the separate table's index.
>>
>
> You might want to index the "sha" column on the primary table, then
> the secondary table can just be indexedblocks(blockid INTEGER PRIMARY
> KEY).  No wasted space that way, but will require more I/Os to JOIN
> the tables.

In my case, the point of a partial index would be to save space.  If
the sha column were indexed completely in the primary table, no other
tables or indexes would be necessary, but then no space is saved.  If I
only want to index 80% of the rows, I'd save 20 * 0.2n bytes with a
partial index, where n is the number of rows (SHA1 is 20 bytes).

Using a separate table, I'd need 40 bytes (roughly) for each sha
indexed: 20 bytes for the main table, 20 for the index; or 40m for m
entries.  This saves space if half or fewer of all blocks are indexed.
 If more than half are indexed, indexing all rows in the main
table uses less space.

Jim
--
HashBackup: easy onsite and offsite Unix backup
http://sites.google.com/site/hashbackup


Re: [sqlite] partial index?

2010-08-20 Thread Jay A. Kreibich
On Fri, Aug 20, 2010 at 06:52:40AM -0700, Cory Nelson scratched on the wall:
> On Thu, Aug 19, 2010 at 1:30 PM, Eric Smith  wrote:
> > Afaict sqlite doesn't support indices on subsets of rows in a table, à
> > la http://en.wikipedia.org/wiki/Partial_index -- right?
> >
> > Any plans to implement that?
> 
> +1 for this feature request.  They've got a very specific and fairly
> rare use case, but when opportunity strikes partial indexes are much
> more convenient, straightforward, and efficient than the alternative.

  This strikes me as more trouble than it is worth.

  You admit it has a very specific and rare use case, yet implementing
  it is going to add complexity and size to the query processor and
  optimizer.  This is not a "low cost" feature.  Any time a partial
  index comes into play, the optimizer needs to determine if the index
  can be used or not by verifying that the query domain is a subset of
  the index domain, based off the conditions on the query and on the
  partial index.  This may or may not involve some significant logic (or
  it will tend to be rather dumb).  It requires noticeable amounts of
  development and testing time-- especially testing, as it introduces
  the possibility of returning a looks-correct-but-is-wrong answer.  And,
  because you're talking about modifications to a very central part
  of the database engine, adding complexity adds upkeep costs from here
  on out.

  It is also likely you would need to bump the file format, since you
  now need the ability to attach conditions to an index and flag it
  as partial and unusable for the general case.

  I have no doubt that it is useful for those cases where it applies,
  but that seems like a high cost for such a narrow gain.

> > Are there any known hacks to implement something similar?
> 
> There's not really any good solution.  You can create a separate table
> with the subset in it, that's pretty much it.  

  Right.  Create a shadow table that has the fields you want to
  partially index.  Map the ROWID values one-to-one.  Index the shadow
  table (or not, depending on your needs).  Put triggers on the main
  table to keep the shadow table up to date on INSERT/UPDATE/DELETEs.
  If you index the shadow table, you now have a partial index.

  You have to make the call with each query about using the partial
  index or not, and you have to manually add an extra JOIN into your
  query, but for most of the situations described in the tread so far,
  that doesn't seem that difficult.  Performance-wise, queries should
  be the same as a native partial index, since the ROWID value can be
  pulled directly from the index and mapped back to the main table,
  exactly like a native index.
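The shadow-table workaround described above can be sketched with stdlib sqlite3; table and trigger names here are illustrative, reusing the employee/salary example from earlier in the thread:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE employee (id INTEGER PRIMARY KEY, age INTEGER, salary INTEGER);

    -- Shadow table holding only the rows we want "partially indexed",
    -- with ids mapped one-to-one to the main table.
    CREATE TABLE emp_shadow (id INTEGER PRIMARY KEY, age INTEGER);
    CREATE INDEX emp_shadow_age ON emp_shadow(age);

    -- Triggers keep the shadow table in sync with the main table.
    CREATE TRIGGER emp_ins AFTER INSERT ON employee WHEN NEW.salary > 2100
    BEGIN
        INSERT INTO emp_shadow VALUES (NEW.id, NEW.age);
    END;
    CREATE TRIGGER emp_del AFTER DELETE ON employee
    BEGIN
        DELETE FROM emp_shadow WHERE id = OLD.id;
    END;
    CREATE TRIGGER emp_upd AFTER UPDATE ON employee
    BEGIN
        DELETE FROM emp_shadow WHERE id = OLD.id;
        INSERT INTO emp_shadow SELECT NEW.id, NEW.age WHERE NEW.salary > 2100;
    END;
""")

con.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                [(1, 30, 2000), (2, 40, 2500), (3, 50, 3000)])

# "Partial index" query: search the shadow, join back to the main table.
rows = con.execute("""SELECT e.id, e.age FROM emp_shadow s
                      JOIN employee e ON e.id = s.id
                      WHERE s.age > 35""").fetchall()
print(sorted(rows))  # [(2, 40), (3, 50)] -- only salary > 2100 rows searched
```

As noted above, the extra JOIN is explicit here, but the search itself touches only the subset, just as a native partial index would.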

> But that has some overhead too, and can complicate your queries.

  The complexity is there no matter what.  The difference is whether the
  person writing the query sees it or not.
  
  My personal opinion is that this is a unique enough feature, and
  workarounds exist (even if they aren't exactly pretty), that it
  does not justify the long-term testing and upkeep costs.

   -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"Intelligence is like underwear: it is important that you have it,
 but showing it to the wrong people has the tendency to make them
 feel uncomfortable." -- Angela Johnson


Re: [sqlite] errors running test suite (couldn't execute "testfixture")

2010-08-20 Thread Paweł Hajdan, Jr.
After updating, the tests run fine now. Thanks for the quick update!

By the way, I usually avoid having "." in the PATH for security reasons.

On Fri, Aug 20, 2010 at 05:30, Dan Kennedy  wrote:

>
> On Aug 20, 2010, at 6:07 AM, Paweł Hajdan, Jr. wrote:
>
> > I updated to latest fossil version, ran make distclean, ./configure,
> > make,
> > make test and got this:
>
> Thanks for this report. The test code was assuming that "."
> was in your PATH variable. Fixed now.
>
> > What should I do to make it pass?
>
> Either add "." to the PATH variable or do another update.
>
> Dan.
>


Re: [sqlite] Can we get a pragma for SQLITE_FCNTL_CHUNKS_SIZE?

2010-08-20 Thread Taras Glek

 On 08/19/2010 01:27 PM, Taras Glek wrote:

 Hi,
I really appreciate that sqlite got this feature to reduce 
fragmentation, but why not expose this as a pragma? In many cases it 
is not feasible to pass the chunk size via a C API. For example with a 
pragma I could do fragmentation testing via an sqlite shell, now this 
option is out without modifying the shell source. I presume that a 
similar problem exists for users of scripting languages until every 
single scripting host provides support for this.


Perhaps this should even persist similar to how page_size persists 
across connections.


Taras
To the patronizing "I don't believe this is a problem" and "even if it 
is, Linux is superior to Windows" concerns:
Attached is my earlier email to the dev list showing that fragmented files 
can be 3x slower to access than non-fragmented ones.


Trivia:
XFS on Linux gets a prize for being the most fragmentation-prone 
filesystem in my survey. It is roughly 10-30x worse than everybody else. 
Ext4 is ok, not great. Windows has a builtin autodefrag that on some 
occasions ensures that our Firefox databases are not fragmented (i.e. it 
beats the crap out of Linux, contrary to the "classic" wisdom). Even 
Apple's horrific filesystem occasionally results in less fragmented 
files due to autodefrag.


For more details see

http://blog.mozilla.com/tglek/2010/07/22/file-fragmentation/

I already have a patch for Firefox that makes use of the new API, we 
don't require a new pragma. It just seems like an important feature for 
performance-tuning sqlite that isn't easy to get at for many users.


Taras


[sqlite] REPLACE syntax doesn't seem to work fts3 tables

2010-08-20 Thread Nasron Cheong
Using REPLACE on an fts3 table doesn't seem to work:

SQLite version 3.6.23.1
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> create virtual table fts using fts3(message);
sqlite> insert into fts (docid, message) values (1,'bob');
sqlite> insert into fts (docid, message) values (1,'bob');
Error: constraint failed
sqlite> replace into fts (docid, message) values (1,'bob');
Error: constraint failed

That should've worked I think.

sqlite>
sqlite> create table nonfts(docid INTEGER PRIMARY KEY, message TEXT);
sqlite> insert into nonfts (docid, message) values (1,'bob');
sqlite> insert into nonfts (docid, message) values (1,'bob');
Error: PRIMARY KEY must be unique
sqlite> replace into nonfts (docid, message) values (1,'bob');
sqlite>

Thanks,

- Nasron Cheong
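The contrasting plain-table behaviour is easy to reproduce from Python's stdlib sqlite3 module. This is a sketch of the non-FTS case only; whether REPLACE works on fts3 depends on the SQLite version compiled in:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE nonfts(docid INTEGER PRIMARY KEY, message TEXT)")
con.execute("INSERT INTO nonfts(docid, message) VALUES (1, 'bob')")

# A second plain INSERT with the same docid violates the primary key.
try:
    con.execute("INSERT INTO nonfts(docid, message) VALUES (1, 'bob')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# REPLACE deletes the conflicting row and inserts the new one.
con.execute("REPLACE INTO nonfts(docid, message) VALUES (1, 'alice')")
rows = con.execute("SELECT docid, message FROM nonfts").fetchall()

print(duplicate_rejected)  # True
print(rows)                # [(1, 'alice')]
```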


[sqlite] What is the most efficient way to get the close by numbers?

2010-08-20 Thread Peng Yu
Hi,

I have the following code to search for neighboring positions
(distance <=10). But it is slow for large data set. I'm wondering what
is the most efficient query for such a search. Note that I don't
create an index, as I'm not sure what index to create on table A.

$ cat main.sql
#!/usr/bin/env bash

rm -f main.db
sqlite3 main.db <<EOF
create table A (name text, position integer);
insert into A values('a', 1);
insert into A values('a', 5);
insert into A values('a', 21);
insert into A values('b', 3);
insert into A values('b', 15);
insert into A values('b', 19);

.mode column
.headers on
.echo on
select * from A as A1, A as A2 where A1.name=A2.name and
abs(A1.position - A2.position) <= 10 and A1.position != A2.position;

EOF

Re: [sqlite] What is the most efficient way to get the close by numbers?

2010-08-20 Thread Jim Morris
  If there is an index on (name, position) then a WHERE clause like the one 
below might use it.

A1.name=A2.name and A2.position between( A1.position - 10, A1.position + 10 )


On 8/20/2010 3:54 PM, Peng Yu wrote:
> Hi,
>
> I have the following code to search for neighboring positions
> (distance<=10). But it is slow for large data set. I'm wondering what
> is the most efficient query for such a search. Note that I don't
> create an index, as I'm not sure what index to create on table A.
>
> $ cat main.sql
> #!/usr/bin/env bash
>
> rm -f main.db
> sqlite3 main.db<
> create table A (name text, position integer);
> insert into A values('a', 1);
> insert into A values('a', 5);
> insert into A values('a', 21);
> insert into A values('b', 3);
> insert into A values('b', 15);
> insert into A values('b', 19);
>
> .mode column
> .headers on
> .echo on
> select * from A as A1, A as A2 where A1.name=A2.name and
> abs(A1.position - A2.position)<= 10 and A1.position != A2.position;
>
> EOF
>


Re: [sqlite] What is the most efficient way to get the close by numbers?

2010-08-20 Thread Simon Slavin

On 20 Aug 2010, at 11:54pm, Peng Yu wrote:

> select * from A as A1, A as A2 where A1.name=A2.name and
> abs(A1.position - A2.position) <= 10 and A1.position != A2.position;

If you're doing this a lot, work out which chunks of 20 each point is in.  You 
only need to compare a point with points in two chunks of 20: the one below and 
the one above.  So you can reject almost all the points immediately.

Simon.
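ve3meo mentions below not knowing how to implement this suggestion. One reading of it, sketched in plain Python against the thread's sample data: Simon says chunks of 20 for a distance of 10; the sketch instead uses a chunk width equal to the threshold and checks the two neighbouring chunks, which is my assumption but equivalent in spirit.

```python
from collections import defaultdict

points = [("a", 1), ("a", 5), ("a", 21), ("b", 3), ("b", 15), ("b", 19)]
WIDTH = 10  # chunk width == distance threshold (an assumption)

# Bucket positions by (name, position // WIDTH).
chunks = defaultdict(list)
for name, pos in points:
    chunks[(name, pos // WIDTH)].append(pos)

# A point only needs comparing against its own chunk and the two neighbours;
# everything else is rejected without computing a distance.
pairs = set()
for name, pos in points:
    c = pos // WIDTH
    for nc in (c - 1, c, c + 1):
        for other in chunks.get((name, nc), []):
            if other != pos and abs(other - pos) <= WIDTH:
                pairs.add((name, pos, other))

# Cross-check against the brute-force self-join.
brute = {(n1, p1, p2)
         for n1, p1 in points for n2, p2 in points
         if n1 == n2 and p1 != p2 and abs(p1 - p2) <= WIDTH}

print(pairs == brute)  # True
print(sorted(pairs))
```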


Re: [sqlite] What is the most efficient way to get the close bynumbers?

2010-08-20 Thread ve3meo
Without an index, EXPLAIN returns the following number of virtual machine 
instructions for each query:

41 - original query:
select * from A as A1, A as A2 where A1.name=A2.name and
abs(A1.position - A2.position) <= 10 and
A1.position != A2.position ;

39 - original query re-ordered:
select * from A as A1, A as A2 where A1.name=A2.name and
A1.position != A2.position and
abs(A1.position - A2.position) <= 10 ;

41 - Jim's query revamped to work:
select * from A as A1, A as A2 where A1.name=A2.name and
A1.position != A2.position and
A2.position between A1.position - 10 and A1.position + 10 ;

In all cases above, Table A2 is used with an automatic index.

create index idxNamePos ON A(name,position);

With this index present, the above queries all use it, and EXPLAIN reports 
these numbers of VM instructions, respectively:

32, 30, 36.

If the number of VM instructions is a good inverse indicator of efficiency, 
then the second query with the index appears to be the most efficient. I'm 
sure somebody will point out if it is not.

I didn't include Simon's suggestion as I did not know how to implement it.

Tom
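Opcode counts aside, the three formulations above can at least be confirmed to agree on the thread's sample data; a quick check via Python's stdlib sqlite3:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE A (name TEXT, position INTEGER)")
con.executemany("INSERT INTO A VALUES (?, ?)",
                [("a", 1), ("a", 5), ("a", 21), ("b", 3), ("b", 15), ("b", 19)])

queries = [
    # original
    """SELECT * FROM A AS A1, A AS A2 WHERE A1.name = A2.name
       AND abs(A1.position - A2.position) <= 10
       AND A1.position != A2.position""",
    # re-ordered
    """SELECT * FROM A AS A1, A AS A2 WHERE A1.name = A2.name
       AND A1.position != A2.position
       AND abs(A1.position - A2.position) <= 10""",
    # BETWEEN form
    """SELECT * FROM A AS A1, A AS A2 WHERE A1.name = A2.name
       AND A1.position != A2.position
       AND A2.position BETWEEN A1.position - 10 AND A1.position + 10""",
]

results = [sorted(con.execute(q).fetchall()) for q in queries]
print(results[0] == results[1] == results[2])  # True
print(len(results[0]))                         # 4
```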

"Jim Morris"  wrote in 
message news:4c6f0a83.4090...@bearriver.com...
>  If there is an index on (name, position) the a where like below might
> use it.
>
> A1.name=A2.name and A2.position between( A1.position - 10, A1.position + 
> 10 )
>
>
> On 8/20/2010 3:54 PM, Peng Yu wrote:
>> Hi,
>>
>> I have the following code to search for neighboring positions
>> (distance<=10). But it is slow for large data set. I'm wondering what
>> is the most efficient query for such a search. Note that I don't
>> create an index, as I'm not sure what index to create on table A.
>>
>> $ cat main.sql
>> #!/usr/bin/env bash
>>
>> rm -f main.db
>> sqlite3 main.db<>
>> create table A (name text, position integer);
>> insert into A values('a', 1);
>> insert into A values('a', 5);
>> insert into A values('a', 21);
>> insert into A values('b', 3);
>> insert into A values('b', 15);
>> insert into A values('b', 19);
>>
>> .mode column
>> .headers on
>> .echo on
>> select * from A as A1, A as A2 where A1.name=A2.name and
>> abs(A1.position - A2.position)<= 10 and A1.position != A2.position;
>>
>> EOF
>>





Re: [sqlite] What is the most efficient way to get the closebynumbers?

2010-08-20 Thread Igor Tandetnik
ve3meo  wrote:
> If the number of VM instructions is a good inverse indicator of efficiency,
> then the second query with the index appears to be the most efficient. I'm
> sure somebody will point out if it is not.

It is not. It may very well take fewer instructions to implement a 
straightforward linear scan than to join through the index.
-- 
Igor Tandetnik





Re: [sqlite] What is the most efficient way to get the closebynumbers?

2010-08-20 Thread Simon Slavin

On 21 Aug 2010, at 3:29am, Igor Tandetnik wrote:

> ve3meo  wrote:
>> If the number of VM instructions is a good inverse indicator of efficiency,
>> then the second query with the index appears to be the most efficient. I'm
>> sure somebody will point out if it is not.
> 
> It is not. It may very well take fewer instructions to implement a 
> straightforward linear scan than to join through the index.

Yeah.  I think your formulation like this:

> 41 - Jim's query revamped to work:
> select * from A as A1, A as A2 where A1.name=A2.name and
> A1.position != A2.position and
> A2.position between A1.position - 10 and A1.position + 10 ;

is going to be the fastest.  It involves more VM instructions but ends up 
searching fewer points.  You might want to time it as it is, then add an index 
(just for testing) on just (position) and see whether that improves matters.

Simon.


Re: [sqlite] What is the most efficient way to get the close by numbers?

2010-08-20 Thread Eric Smith
Peng Yu wrote:

> I have the following code to search for neighboring positions
> (distance <=10). But it is slow for large data set. I'm wondering what
> is the most efficient query for such a search. Note that I don't
> create an index, as I'm not sure what index to create on table A.

I haven't used it myself, but I'm pretty sure this is what the R*tree 
module was designed for:

http://www.sqlite.org/rtree.html

--
Eric A. Smith

Drill for oil? You mean drill into the ground to try and find oil? You're
crazy.
-- Drillers who Edwin L. Drake tried to enlist to his project 
   to drill for oil in 1859.


Re: [sqlite] partial index?

2010-08-20 Thread Eric Smith
Igor Tandetnik wrote: 

> > How would you find a row whose column X contained value Y if the 
> > "partial" index on column X specified that rows containing value Y 
> > in column X should never be returned?  
> 
> No one suggests partial index should be capable of hiding anything.  The 
> idea is that, when the query can be proven to only involve rows covered 
> by the partial index, the index can be used to speed up the query.  
> Otherwise, it simply won't be used.  

Right.

-- 
Eric A. Smith

Well then, let's go on.  Sorry, there's nothing to go on to.
Let's digress.
-- Tim Pulju, Winter 2005
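Igor's description — use the index only when the query provably touches covered rows — is exactly the semantics SQLite later shipped as partial indexes (CREATE INDEX ... WHERE, added in 3.8.0, well after this thread). A minimal sketch via Python's stdlib sqlite3, assuming a build new enough to support them:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE records(recno INTEGER PRIMARY KEY, fileId INTEGER, data TEXT)")
con.executemany("INSERT INTO records(fileId, data) VALUES (?, ?)",
                [(i % 10, "payload") for i in range(1000)])

# Partial index covering only the rows with fileId = 7
# (supported in SQLite 3.8.0 and later).
con.execute("CREATE INDEX idx_file7 ON records(fileId) WHERE fileId = 7")

# The planner can prove "fileId = 7" only involves covered rows,
# so it uses the partial index.
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT recno FROM records WHERE fileId = 7").fetchall()
print(plan)  # the plan detail should mention idx_file7
```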


Re: [sqlite] partial index?

2010-08-20 Thread Eric Smith
Stephen Oberholtzer wrote:

> I believe what he's getting at is this:
{snip explanation}

You exactly understand what I'm going for and my use case.

Is there a better way to implement it in sql itself than what I
outlined?  I.e. create my own index table that points to the proper
rows and keep it updated via triggers?

Eric

--
Eric A. Smith

Gnagloot, n.:
A person who leaves all his ski passes on his jacket just to
impress people.
-- Rich Hall, "Sniglets"


Re: [sqlite] partial index?

2010-08-20 Thread Eric Smith

Sorry, let me amend that:

> The schema is roughly
> 
> create table records(__recno INTEGER PRIMARY KEY, fileId, data);

Forget the INTEGER PRIMARY KEY.  My partial index would reference 
the _rowid_.  I don't permit vacuums on the database so, if I'm not
mistaken, this shouldn't be an issue.

--
Eric A. Smith

Parkinson's Fourth Law:
The number of people in any working group tends to increase
regardless of the amount of work to be done.


Re: [sqlite] partial index?

2010-08-20 Thread Eric Smith
Kees Nuyt wrote:

> You could add a "deleted" column with value range (0,1) and
> create an index on it if benchmarks show that makes it
> faster. As a bonus it is easier to code and maintain than a
> separate table with references and triggers.
> 
> Alternatively, you can create an composite index with the
> "deleted" column as one of the components.
> 
> From a theoretical view, if you care about the visibility of
> a row, you should express it as an attribute of the entity.
> The solutions above comply with that notion.

I think you misunderstand what I want.

I don't care about keeping the row around after it is deleted.  I don't
care about visibility.

I only want to find rows quickly in order to delete them.

--
Eric A. Smith

Bowlikinetics (boh lih kih neh' tiks), n.:
The act of trying to control a released bowling ball 
by twisting one's body in the direction one wants it to go.


Re: [sqlite] partial index?

2010-08-20 Thread Eric Smith
Simon Slavin wrote: 

> http://www.sqlite.org/lang_createview.html 
> 
> This is the SQL standard way to reduce your view of a table to just 
> certain rows.  If I understand your request, this feature should provide 
> exactly what you want.  Appropriate indexes will be used when consulting 
> any VIEW you've defined.  

I don't think that helps either.  A view in sqlite is just syntactic
sugar for a select statement.  I don't want to define any real indices 
-- they are a performance burden.

> > Something like DELETE FROM records WHERE __recno IN (SELECT __recno 
> > FROM idxTable), where __recno is the INTEGER PRIMARY KEY on records.  
> 
> I don't understand what you're looking up here.  If you have some 
> method of recognising which rows of a table should be deleted just use the 
> appropriate 
> 
> DELETE FROM ...  WHERE ...  
> 
> command.  No need for any sub-SELECT clause.  

Maybe it'll be clearer if I describe my (quite simple) use case.  Our
app is caching what are basically csv files.  Hundreds of files, about 2m 
records per file.  Sometimes we want to delete all the cache rows for one 
of the files.  We know ahead of time which file it will be -- let's say
it's file 7.

The schema is roughly

create table records(__recno INTEGER PRIMARY KEY, fileId, data);

So sometimes we would *like* to say "delete from records where
fileId=7".

But that is bad because it does a full table scan.

So the next cut is to say "create index recordsIdxFileId on
records(fileId)".

But that is bad because it is a huge burden during INSERT and is not
used often enough (or with enough values) to justify its existence.

What I really want is to be able to say "create index
recordsIdxFileIdOnFile3 on records(fileId) where fileId=7".  
But sqlite doesn't do that.

So let's assume SQLite is much faster at deleting rows by the INTEGER
PRIMARY KEY than it is by deleting rows by some other value.  Then we
can optimize by keeping track of which __recnos we will want to delete.

Hence my idea for having a separate table that maps fileId --> __recno,
but only for the fileId we care about.

Eric

-- 
Eric A. Smith

Keeping Young #6:
Don't look back. Something might be gaining on you.
-- Satchel Paige
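Eric's outlined workaround — a side table kept current by a trigger, then a delete via sub-select — can be sketched as follows; the table and trigger names are made up for illustration, and the hot fileId is hard-coded:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE records(recno INTEGER PRIMARY KEY, fileId INTEGER, data TEXT);

    -- Hand-rolled "partial index": rowids of records for the one hot fileId.
    CREATE TABLE file7_recnos(recno INTEGER PRIMARY KEY);

    CREATE TRIGGER records_ai AFTER INSERT ON records
    WHEN new.fileId = 7
    BEGIN
        INSERT INTO file7_recnos(recno) VALUES (new.recno);
    END;
""")

con.executemany("INSERT INTO records(fileId, data) VALUES (?, ?)",
                [(i % 10, "row") for i in range(1000)])

# Delete by INTEGER PRIMARY KEY instead of scanning the table for fileId = 7.
con.execute("DELETE FROM records WHERE recno IN (SELECT recno FROM file7_recnos)")
con.execute("DELETE FROM file7_recnos")

remaining = con.execute(
    "SELECT COUNT(*) FROM records WHERE fileId = 7").fetchone()[0]
print(remaining)  # 0
```

The side table carries the insert-time cost only for rows matching the hot fileId, rather than for every row as a full index on fileId would.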


Re: [sqlite] What is the most efficient way to get theclosebynumbers?

2010-08-20 Thread ve3meo
"Simon Slavin"  wrote ...
>
> On 21 Aug 2010, at 3:29am, Igor Tandetnik wrote:
>
>> ve3meo  wrote:
>>> If the number of VM instructions is a good inverse indicator of 
>>> efficiency,
>>> then the second query with the index appears to be the most efficient. 
>>> I'm
>>> sure somebody will point out if it is not.
>>
>> It is not. It may very well take fewer instructions to implement a 
>> straightforward linear scan than it is to join through the index.
>
> Yeah.  I think your formulation like this:
>
>> 41 - Jim's query revamped to work:
>> select * from A as A1, A as A2 where A1.name=A2.name and
>> A1.position != A2.position and
>> A2.position between A1.position - 10 and A1.position + 10 ;
>
> is going to be the fastest.  It involves more VM instructions but ends up 
> searching fewer points.  You might want to time it as it is, then add an 
> index (just for testing) on just (position) and see whether that improves 
> matters.

Thanks for the explanation. I thought I might be naively wishing for too 
much from the count of VM instructions when some may take much longer to 
execute than others. I was intrigued by Peng's question and wondered if I 
might be able to predict efficiency by examining Explain Query Plan and 
Explain. Apparently not.

I understand the principle of minimising the number of points to search and 
I think that this argues for the WHERE conditions to have the coarse 
resolution first and the finest resolution last. To pick up on Simon's 
suggestion to chunk, I added an integer field 'chunk' to the table:

UPDATE A SET chunk = (position/10);

I indexed a bunch of ways to see what the optimiser would do:

create index idxNameChunkPos ON A(name,chunk,position);
create index idxNamePos ON A(name,position);
create index idxNameChunk ON A(name,chunk);
create index idxName ON A(name);
create index idxPos ON A(position);
create index idxChunk ON A(chunk);


Here's how I have interpreted Simon's suggested chunky query:

select * from A as A1, A as A2 where A1.name=A2.name and
A1.position != A2.position and
A2.chunk between A1.chunk - 1 and A1.chunk + 1 and
A2.position between A1.position - 10 and A1.position + 10 ;

Explain Query Plan shows that this query uses table A2 with idxNameChunkPos. 
Is this liable to result in a minimum of points searched?

Peng's original and re-ordered queries also use idxNameChunkPos, even 
though chunk is not in the query. Despite the presence of all these other 
indexes, Jim's suggested query persists in using idxNamePos, as one would 
expect it to be the optimal index for that query and for the original and 
re-ordered.

I'm wary of what the optimiser does. On those first two queries, I have seen 
that it switches from using idxNamePos to idxName after the addition of the 
Chunk field when only those two indexes were present. I should add that the 
SQLite manager through which I play with SQLite uses a 3.7.0 DLL.

Tom 
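Tom's chunked formulation can be cross-checked for correctness against the original self-join (a stdlib sqlite3 sketch; this verifies equal results on the sample data, it is not a timing test):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE A (name TEXT, position INTEGER, chunk INTEGER)")
con.executemany("INSERT INTO A(name, position) VALUES (?, ?)",
                [("a", 1), ("a", 5), ("a", 21), ("b", 3), ("b", 15), ("b", 19)])
con.execute("UPDATE A SET chunk = position / 10")  # integer division in SQLite
con.execute("CREATE INDEX idxNameChunkPos ON A(name, chunk, position)")

chunky = sorted(con.execute("""
    SELECT A1.name, A1.position, A2.position FROM A AS A1, A AS A2
    WHERE A1.name = A2.name
      AND A1.position != A2.position
      AND A2.chunk BETWEEN A1.chunk - 1 AND A1.chunk + 1
      AND A2.position BETWEEN A1.position - 10 AND A1.position + 10
""").fetchall())

plain = sorted(con.execute("""
    SELECT A1.name, A1.position, A2.position FROM A AS A1, A AS A2
    WHERE A1.name = A2.name
      AND abs(A1.position - A2.position) <= 10
      AND A1.position != A2.position
""").fetchall())

print(chunky == plain)  # True
print(len(chunky))      # 4
```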





Re: [sqlite] partial index?

2010-08-20 Thread Simon Slavin

On 19 Aug 2010, at 11:18pm, Eric Smith wrote:

> The schema is roughly
> 
> create table records(__recno INTEGER PRIMARY KEY, fileId, data);
> 
> So sometimes we would *like* to say "delete from records where
> fileId=7".
> 
> But that is bad because it does a full table scan.
> 
> So the next cut is to say "create index recordsIdxFileId on
> records(fileId)".
> 
> But that is bad because it is a huge burden during INSERT and is not
> used often enough (or with enough values) to justify its existence.
> 
> What I really want is to be able to say "create index
> recordsIdxFileIdOnFile3 on records(fileId) where fileId=7".  
> But sqlite doesn't do that.

But in order to create that partial index, SQLite would have to scan the entire 
table anyway.  Otherwise how would it know which records had fileId=7 ?

Simon.