Re: [sqlite] Import at first unhappy - then happy after all

2012-09-21 Thread Udi Karni
Kevin - thanks. I've been experimenting - my code would also fail when, for
example, I spooled to a file called "c:\red.txt" where the \r was stripped
and used as an escape character.

Switching to linux notation even while on Windows - ".output c:/red.txt"
solved the problem !
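
For example (illustrative - the file name is made up; the first form is the one that mis-fires because the \r is taken as an escape):

sqlite> .output c:\red.txt
sqlite> .output c:/red.txt
sqlite> .output c:\\red.txt

The last two forms both name the intended file.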

Much appreciated.

On Sun, Aug 26, 2012 at 2:14 PM, Kevin Benson wrote:

> On Sun, Aug 26, 2012 at 11:30 AM, Udi Karni  wrote:
>
> > The following scenario happens occasionally.
> >
> > At first the import fails on mismatched columns - as if the "end-of-line"
> > marker is off.
> >
> > You run it again right away and everything completes successfully.
> >
> > ?
> >
> > Is there some kind of tracing that can be turned on so that when this
> > happens we can gather some clues?
> >
> > As an aside - not a big deal - this is on Windows - the file
> specifications
> > in the ".import" and ".read" commands require 2 back slashes in the
> syntax
> > - can that be fixed?
> >
> > Thanks !
> >
>
> Perhaps
> http://www.sqlite.org/uri.html  3.1 The URI Path
> will help you
>
> --
>--
>   --
>  --Ô¿Ô--
> K e V i N
>
>
>
> >
> > Z:\>sqlite3 z:\xyz.db3
> > SQLite version 3.7.11 2012-03-20 11:35:50
> > Enter ".help" for instructions
> > Enter SQL statements terminated with a ";"
> > sqlite> pragma page_size=4096;
> > sqlite> pragma journal_mode=off;
> > off
> > sqlite> pragma temp_store=memory;
> > sqlite> .timer on
> > sqlite> .separator ','
> > sqlite> .read c:\\create_table\\create_table_xyz.sql
> > CPU Time: user 0.00 sys 0.00
> > sqlite> .import c:\\data_csv\\xyz.csv xyz
> > Error: c:\data_csv\xyz.csv line 12: expected 59 columns of data but found
> > 99
> > sqlite> select count (*) from xyz;
> > 0
> > CPU Time: user 0.00 sys 0.00
> > sqlite> .import c:\\data_csv\\xyz.csv xyz
> > sqlite> select count (*) from xyz;
> > 97744059
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] How do you load a ".csv" and skip the first line?

2012-09-01 Thread Udi Karni
If all else fails - after the import, issue DELETE FROM XYZ WHERE ID = 'id' ;
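
For example - assuming (hypothetically) that the first column is named ID, so the header row comes in with the literal value 'ID':

sqlite> .separator ','
sqlite> .import c:/data_csv/xyz.csv xyz
sqlite> DELETE FROM xyz WHERE ID = 'ID';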

But - yes - adding the "skip=y" and  "load=x" parameters  (skip the first y
rows in the .csv file, and stop after loading x rows) - to the .import
utility would be most desirable.

On Sat, Sep 1, 2012 at 12:31 PM, Petite Abeille wrote:

>
> On Sep 1, 2012, at 9:19 PM, joe.fis...@tanguaylab.com <
> joe.fis...@tanguaylab.com> wrote:
>
> > Is there some way to load a ".csv" file into a SQLite database table
> when the first row (record) of the file contains headers?
>
> On *nix, any of tail +2, more +2, etc would do.
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Import at first unhappy - then happy after all

2012-08-26 Thread Udi Karni
Thanks for the explanation.

Is there a way to change the SQLite escape character globally? Choose some
out-of-the-way character so that backslashes could be used normally?

Something like -

.escape [
.read c:\path\file.sql

???

On Sun, Aug 26, 2012 at 9:41 AM, Kevin Benson wrote:

>  On Sun, Aug 26, 2012 at 11:30 AM, Udi Karni  wrote:
>
> ***SNIP***
>
>
> >
> > As an aside - not a big deal - this is on Windows - the file
> specifications
> > in the ".import" and ".read" commands require 2 back slashes in the
> syntax
> > - can that be fixed?
> >
> > Thanks !
> >
> >
> The explanation for this behavior is here
>
> http://www.sqlite.org/cvstrac/tktview?tn=1073
> --
>--
>  --
>--Ô¿Ô--
>   K e V i N
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Import at first unhappy - then happy after all

2012-08-26 Thread Udi Karni
The following scenario happens occasionally.

At first the import fails on mismatched columns - as if the "end-of-line"
marker is off.

You run it again right away and everything completes successfully.

?

Is there some kind of tracing that can be turned on so that when this
happens we can gather some clues?

As an aside - not a big deal - this is on Windows - the file specifications
in the ".import" and ".read" commands require 2 back slashes in the syntax
- can that be fixed?

Thanks !

Z:\>sqlite3 z:\xyz.db3
SQLite version 3.7.11 2012-03-20 11:35:50
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> pragma page_size=4096;
sqlite> pragma journal_mode=off;
off
sqlite> pragma temp_store=memory;
sqlite> .timer on
sqlite> .separator ','
sqlite> .read  c:\\create_table\\create_table_xyz.sql
CPU Time: user 0.00 sys 0.00
sqlite> .import c:\\data_csv\\xyz.csv  xyz
Error: c:\data_csv\xyz.csv line 12: expected 59 columns of data but found 99
sqlite> select count (*) from xyz;
0
CPU Time: user 0.00 sys 0.00
sqlite> .import c:\\data_csv\\xyz.csv  xyz
sqlite> select count (*) from xyz;
97744059
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Windows I/O (was: Initial read speed greater than subsequent)

2012-08-01 Thread Udi Karni
You are right. True Parallel Query can get very complicated. I was hoping
for something very limited for starters - for example -

- only 2 processes
- only for simple full scans where the block range can be divided in two
- only when there is no ORDER/GROUP BY where sub results from the 2 threads
have to be combined

Basically only for queries such as SELECT COUNT/MIN/MAX FROM TABLE WHERE

Sounds very limited / what's-the-point kind of thing - but it would
actually be very useful when working with large data where you find
yourself doing a lot of QA and study of the data - "how many rows have this
range of codes / are null", etc.
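
For example - a sketch of what I mean, run from 2 separate shells against the same read-only database (the table name, column, and rowid split point are made up, and it assumes rowids are spread fairly evenly):

-- shell #1
SELECT COUNT(*) FROM xyz WHERE rowid <= 50000000 AND code IS NULL;
-- shell #2
SELECT COUNT(*) FROM xyz WHERE rowid >  50000000 AND code IS NULL;
-- add the two counts at the end; MIN/MAX combine the same way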

Having 2 processes working simultaneously might cut run times in half - and
save many minutes.

Going higher than 2 might hit disk read limitations anyway - so 2 might be
plenty for version 1.

In other words - nothing grand - just a small optimization that will kick
in on simple stuff. Pick some low hanging fruit.

A "would be nice" if not too complicated.


On Wed, Aug 1, 2012 at 5:57 PM, Christian Smith <
csm...@thewrongchristian.org.uk> wrote:

> On Sat, Jul 14, 2012 at 03:17:07PM +0100, Simon Slavin wrote:
> >
> > On 14 Jul 2012, at 3:12pm, Udi Karni  wrote:
> >
> > > I know
> > > nothing about writing DB engines - so I don't know whether adding a 2nd
> > > parallel process adds 10K or 10M to the code base.
> >
> > You've reached the limit of what I know about parallelization.  I hope
> someone else can chime in.
>
>
> Using SQLite's VM architecture, I would guess that adding this sort of
> parallelization would be non-trivial. You need a parallel VM, significantly
> different to the current sequential VM, and at least a way of managing
> asynchronous IO, with perhaps a callback mechanism into the VM to handle IO
> completion. 
>
> While not certain, I guess other databases handle this by using tree based
> execution plans, where any single execution node can easily be split into
> branches to another thread/process/machine, then merged in the parent tree
> node, with each branch handling a certain key range.
>
> This might make sense, for example, with a partitioned table, where each
> partition is on its own spindle, so a full table scan can be executed in
> parallel on each spindle and merged as a final step. So, for a table scan
> between k0 and k3, find intermediate keys to split the query between
> spindles:
>
>             (k0-k3)
>            /   |   \
>           /    |    \
>          /     |     \
>         /      |      \
>        /       |       \
>  (k0-k1]    (k1-k2]    (k2-k3)
>     |          |          |
>   disk1      disk2      disk3
>
> I sat through an Oracle internals course once, and the instructor gave us
> an example of a setup such as this where data was partitioned across 24
> disks, and the resulting full table scans were in fact quicker than index
> based scans for the data set they were using.
>
> Of course, the above would be useless for SQLite anyway, being a single
> file database. And even with the likes of Oracle, Stripe And Mirror
> Everything (SAME) might also largely defeat parallel scans.
>
> All in all, the added bloat would be measured in MB, rather than KB.
>
> Christian
>
> disclaimer: Not a practical DB implementation expert.
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Windows I/O (was: Initial read speed greater than subsequent)

2012-07-14 Thread Udi Karni
You are right. Disk IO - even with SSD - is typically the bottleneck.
Running multiple threads would only make sense if all the data is in RAM -
:memory: DB or a giant RAM drive (remains to be tested of course and
pricey).

However - I noticed that when fully copying a master table into a :memory:
DB (and on my small machine this means no greater than 4-5GB - so not a
very conclusive test) - simple SELECT COUNT WHEREs go through the data at
about 250 MB/sec. IO is now not a factor anymore, and I doubt I am hitting
bus speed, so I suspect that this is truly a case of the CPU maxing out -
reading the pages from RAM and applying the WHERE criteria on the rows.
It's very linear. A 100GB table takes 400 seconds or 7.5 minutes. That's
where I am hoping a second concurrent thread could cut it down 50%.
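
For reference - the kind of session I mean (names made up):

C:\> sqlite3 :memory:
sqlite> ATTACH DATABASE 'z:/master.db3' AS m;
sqlite> CREATE TABLE xyz AS SELECT * FROM m.xyz;
sqlite> DETACH DATABASE m;
sqlite> SELECT COUNT(*) FROM xyz WHERE code IS NULL;

The last SELECT scans RAM only - no IO at all after the initial copy.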

But I understand this is completely not the core mission of Sqlite. I know
nothing about writing DB engines - so I don't know whether adding a 2nd
parallel process adds 10K or 10M to the code base. Just hoping that as
technology progresses and small embedded devices carry more and more RAM -
it would be OK to slightly expand the footprint of Sqlite and add some more
"desktop" features.

This would be so incredible. As it is - Sqlite is virtually identical to
Microsoft ACCESS without the Microsoft price tag and footprint.
Multi-threaded capability would actually surpass it...

On Sat, Jul 14, 2012 at 6:51 AM, Simon Slavin  wrote:

>
> On 14 Jul 2012, at 2:31pm, Udi Karni  wrote:
>
> > (4) Get enough RAM to contain all the data and swap
>
> This may be the best solution to the problem but it's disappointing.  You
> really shouldn't need to do this.  Computers and caching algorithms should
> be doing better to help you.
>
> > In light of that - is it completely out of line to consider "light
> > parallelization" for Sqlite? Even just for the simplest SELECTs (SELECT
> > COUNT (*) WHERE AGE = NNN) ? Have Sqlite spawn 2 processes - 1 counting
> > the 1st half of the blocks of a table - the second counting the 2nd half?
> > Limited only to reads where there are no locking issues? Even capped at 2
> > threads? Only for tables greater than xGB so as not to bother with the
> tiny
> > ones? Would that introduce bloat in the code and detract from Sqlite's
> > original mission?
>
> Parallelization is of great advantage when the problem is processing:
> maths, pattern-searching, and processing of lots of data already in memory.
>  But SQLite is very efficient at doing this, with a very simple very
> searchable file format.  I think your bottleneck here isn't processing,
> it's I/O, just as you put in the 'Subject' header.  I bet your limitation
> is in your bus width, throughput, or low-level file handling.  If you
> implement parallelization the way you describe, the result will just be the
> two commands constantly fighting over access to your datastore -- back to
> bandwidth and throughput again.  Just to give you an idea, in normal setups
> the normal bottleneck for SQLite speed is the rotational speed of a hard
> disk.
>
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Windows I/O (was: Initial read speed greater than subsequent)

2012-07-14 Thread Udi Karni
Thank you all for your advice. I use the standard Shell compiled for 64-bit
Windows - so it's single threaded - and the only way to multi-thread is to
run 2 concurrent processes on 2 concurrent Sqlite DBs - assuming there
would be no conflict with the master tables - since they are read only -
I'll need to test that.

Otherwise - seems like the options are -

(1) Windows compile optimizations
(2) OS optimizations
(3) Try Linux which is typically zippier
(4) Get enough RAM to contain all the data and swap

I completely understand why Sqlite needs to be "lite" - because it's meant
for small devices running in RAM. It's just so clever - nothing to install
- no threads, etc. - it's tempting to use on the desktop for conventional
DBs - where you really need a parallel engine if you want to cut through
large amounts of data quickly.

In light of that - is it completely out of line to consider "light
parallelization" for Sqlite? Even just for the simplest SELECTs (SELECT
COUNT (*) WHERE AGE = NNN) ? Have Sqlite spawn 2 processes - 1 counting
the 1st half of the blocks of a table - the second counting the 2nd half?
Limited only to reads where there are no locking issues? Even capped at 2
threads? Only for tables greater than xGB so as not to bother with the tiny
ones? Would that introduce bloat in the code and detract from Sqlite's
original mission?

Thanks.

On Sat, Jul 14, 2012 at 5:58 AM, Simon Slavin  wrote:

>
> On 14 Jul 2012, at 5:40am, Udi Karni  wrote:
>
> > It very easily handles billion row/100GB tables - multi-table joins,
> etc. -
> > it just chugs for a while because it's single threaded, and will gobble
> up
> > memory and swap - but it gets the job done.
>
> ... though you can, of course, do your SQLite call in a second thread and
> proceed as normal on your main thread.  Still single-threaded but it won't
> block.  I've noticed a lot of web pages which build the page, then fill in
> the contents of a table later.
>
> > It's quite amazing for a DB engine that's not even 1 MB.
>
> It's kept small partly because it's used in tiny single-core embedded
> systems: phone handsets, handheld control devices, and machine controllers.
>  I even found a SQLite header in the firmware from my TV recorder.  It has
> to run under tiny amounts of memory on simple hardware with little or no
> caching.  What's interesting is that somehow a SQL engine designed for
> hand-sized devices is so good on standard desktop computers that many
> people use it.
>
> > While it's happiest when it can do all the work in memory versus disk -
> it
> > reads fairly quickly from disk the first time around - but not
> subsequently
> > - which is why I posed the question.
>
> This characteristic is something to do with your hardware or OS, not
> something inherent in SQLite.  I bet if you tried the same thing on a Mac
> or Linux you wouldn't get the same behaviour.
>
> I seem to remember that Windows caches files with certain extensions
> specially, but I can't find any elucidation on the web.
>
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Windows I/O (was: Initial read speed greater than subsequent)

2012-07-13 Thread Udi Karni
Experimenting with Data Warehouse - which should really be run on a more
"mainstream" DB. Sqlite was supposed to be just for piloting and testing -
but it's such an incredible little database engine - it's hard to let it go
- so I try big things on it just for kicks - delaying the inevitable.

It very easily handles billion row/100GB tables - multi-table joins, etc. -
it just chugs for a while because it's single threaded, and will gobble up
memory and swap - but it gets the job done.

It's quite amazing for a DB engine that's not even 1 MB.

While it's happiest when it can do all the work in memory versus disk - it
reads fairly quickly from disk the first time around - but not subsequently
- which is why I posed the question.

For now - my workaround is to attach and detach for every SQL statement -
but a better solution is probably to get a server with big RAM and tune
some of those OS settings - and ultimately, maybe MySQL?

On Fri, Jul 13, 2012 at 9:16 PM, Keith Medcalf  wrote:

>
> I know the newer versions of Windows are fantastically bloated (and slower
> every version), but what are you running that uses more than 16 GB of
> committed memory?
>
> > Thanks. More RAM would clearly be helpful - but first I need a bigger
> > machine that can take it. For some reason - the "home" line of PC is
> > typically capped at 16GB or so. I'll need more of a workstation to go
> > higher and experiment with the settings you suggested.
> >
> > On Fri, Jul 13, 2012 at 8:35 PM, Keith Medcalf 
> wrote:
> >
> > >
> > > Windows is really atrociously bad at I/O.  Windows has the same basic
> > > model of how to perform I/O as a 6 year-old.  Scratch that, the six
> year
> > > old could probably understand I/O better than whoever wrote/designed
> the
> > > crap in Windows that passes for I/O routines.
> > >
> > > Anyway, make sure that you have all the fanciful magical features
> turned
> > > OFF (they make things slower, not faster).  That is all the various
> > > "SpeedBooster" crap and so forth that Microsoft crappifies their OS
> with to
> > > give that "gee wiz" wonderful warm and fuzzy feeling to the mass
> consumer
> > > market.
> > >
> > > Second, make sure you have turned off "Large System Cache".  Force
> Windows
> > > to forgo the magic, forgo the dreadful cache design, and do I/O
> properly.
> > >
> > > Disable "Magical Virtual Machine Resizing"-- set a fixed pagefile size
> --
> > > or better yet get sufficient RAM and disable swapping altogether -- it
> is
> > > pure bull droppings that you need a pagefile that is a percentage of
> RAM
> > > size.  If it works with 4GB of RAM and a 4GB swapfile, then it will
> work
> > > better with 8 GB of RAM and no pagefile.
> > >
> > > Then increase the IOPageLockLimit to something reasonable.
> > >
> > > And if your DASD driver supports it, enable block-level I/O
> optimization
> > > and/or caching.
>
> ---
> ()  ascii ribbon campaign against html e-mail
> /\  www.asciiribbon.org
>
>
>
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Initial read speed greater than subsequent

2012-07-13 Thread Udi Karni
Thanks.

Here is something that seems to work - I open a Sqlite "scratchpad" DB into
which I select subsets from master tables - which are the problem because
they are large.

Initially I put each master table in its own DB and attached them all in
the beginning of the script. The first read was fast and the subsequent
ones slow.

I now attach the master DBs I need for a given SELECT right before the
SELECT and then detach them. I repeat this on every SELECT statement where
I refer to the large external tables.

Definitely not pretty - but it works !  The master tables are now read fast
every time. Somehow this fools Windows into thinking it's the "first time"
every time. Sigh...
So this is bad -
==
attach TABLE1...
SELECT TABLE1  (fast)
SELECT TABLE1  (slow)
SELECT TABLE1  (slow)
detach TABLE1

And this is good -

attach TABLE1...
SELECT TABLE1 (fast)
detach TABLE1
attach TABLE1...
SELECT TABLE1 (fast)
detach TABLE1
attach TABLE1...
SELECT TABLE1 (fast)
detach TABLE1

Whatever works...
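
In concrete terms the "good" pattern looks something like this (paths and names made up):

sqlite> ATTACH DATABASE 'd:/master/table1.db3' AS m;
sqlite> CREATE TABLE result_2010 AS SELECT * FROM m.table1 WHERE year = 2010;
sqlite> DETACH DATABASE m;
sqlite> ATTACH DATABASE 'd:/master/table1.db3' AS m;
sqlite> CREATE TABLE result_2011 AS SELECT * FROM m.table1 WHERE year = 2011;
sqlite> DETACH DATABASE m;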


On Fri, Jul 13, 2012 at 7:11 PM, Simon Slavin  wrote:

>
> On 14 Jul 2012, at 3:07am, Udi Karni  wrote:
>
> > Thanks. I understand. I tried to set PRAGMA CACHE_SIZE=0; in the hope of
> > not caching anything and forcing Sqlite to always go to disk - but that
> > didn't help.
> >
> > I see some reads on pagefile.sys - but both the DB and the pagefile are
> on
> > SSD - so you would think it shouldn't be too costly to read one or both -
> > even in a more random fashion.
> >
> > The SSD disk read queue is just over 1 - so it doesn't quite fit the
> > typical scenario of IO bottlenecks - a high read queue and a disk arm
> > moving frantically.
>
> I don't think you can alter this inside SQLite.  The caching you have
> noticed is not being done by SQLite.  It's either a part of Windows'
> read-ahead caching, or something being done by your storage driver.
>
> Windows gets a lot of its speed by doing things like read-head caching.
>  It can lead to strange results like the ones you reported.
>
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Windows I/O (was: Initial read speed greater than subsequent)

2012-07-13 Thread Udi Karni
Thanks. More RAM would clearly be helpful - but first I need a bigger
machine that can take it. For some reason - the "home" line of PC is
typically capped at 16GB or so. I'll Need more of a workstation to go
higher and experiment with the settings you suggested.

On Fri, Jul 13, 2012 at 8:35 PM, Keith Medcalf  wrote:

>
> Windows is really atrociously bad at I/O.  Windows has the same basic
> model of how to perform I/O as a 6 year-old.  Scratch that, the six year
> old could probably understand I/O better than whoever wrote/designed the
> crap in Windows that passes for I/O routines.
>
> Anyway, make sure that you have all the fanciful magical features turned
> OFF (they make things slower, not faster).  That is all the various
> "SpeedBooster" crap and so forth that Microsoft crappifies their OS with to
> give that "gee wiz" wonderful warm and fuzzy feeling to the mass consumer
> market.
>
> Second, make sure you have turned off "Large System Cache".  Force Windows
> to forgo the magic, forgo the dreadful cache design, and do I/O properly.
>
> Disable "Magical Virtual Machine Resizing"-- set a fixed pagefile size --
> or better yet get sufficient RAM and disable swapping altogether -- it is
> pure bull droppings that you need a pagefile that is a percentage of RAM
> size.  If it works with 4GB of RAM and a 4GB swapfile, then it will work
> better with 8 GB of RAM and no pagefile.
>
> Then increase the IOPageLockLimit to something reasonable.
>
> And if your DASD driver supports it, enable block-level I/O optimization
> and/or caching.
>
> ---
> ()  ascii ribbon campaign against html e-mail
> /\  www.asciiribbon.org
>
>
>
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Initial read speed greater than subsequent

2012-07-13 Thread Udi Karni
Thanks. I understand. I tried to set PRAGMA CACHE_SIZE=0; in the hope of
not caching anything and forcing Sqlite to always go to disk - but that
didn't help.

I see some reads on pagefile.sys - but both the DB and the pagefile are on
SSD - so you would think it shouldn't be too costly to read one or both -
even in a more random fashion.

The SSD disk read queue is just over 1 - so it doesn't quite fit the
typical scenario of IO bottlenecks - a high read queue and a disk arm
moving frantically.

But who knows. More research I suppose.


On Fri, Jul 13, 2012 at 5:58 PM, Ryan Johnson
wrote:

> On 13/07/2012 5:37 PM, Udi Karni wrote:
>
>> Hello,
>>
>> Running on Windows 7 - I am noticing that tables in :memory: DBs are read
>> (SELECTED) at a constant rate. However - conventional DBs on disk - even
>> on
>> SSD - are read fast the first time, and much slower subsequently. Closing
>> and reopening a DB for every SQL statement seems to cure this - but
>> obviously is not a solution when you want to run a multi-step SQL script
>> on
>> a database.
>>
>> Is this a Windows "feature" - caching or otherwise? Is it Sqlite? Looking
>> at perfmon - the initial read "chunk" is > 100K while subsequently it's 4K
>> (the page size). Is there some prefetching taking place the first time
>> around? How do you make it permanent? How do you make Sqlite consistently
>> table-scan from disk?
>>
> This is a common problem with database buffer caches unless great pains
> are taken to avoid it (as in, not a "lite" product).
>
> What happens is that the first time through the data, it's read
> sequentially... but not all of it fits in the page cache. What gets evicted
> is *not* sequential, so on the second time around the disk requests are
> randomly scattered and  take about 100x longer to complete.
>
> Ryan
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Initial read speed greater than subsequent

2012-07-13 Thread Udi Karni
Hello,

Running on Windows 7 - I am noticing that tables in :memory: DBs are read
(SELECTED) at a constant rate. However - conventional DBs on disk - even on
SSD - are read fast the first time, and much slower subsequently. Closing
and reopening a DB for every SQL statement seems to cure this - but
obviously is not a solution when you want to run a multi-step SQL script on
a database.

Is this a Windows "feature" - caching or otherwise? Is it Sqlite? Looking
at perfmon - the initial read "chunk" is > 100K while subsequently it's 4K
(the page size). Is there some prefetching taking place the first time
around? How do you make it permanent? How do you make Sqlite consistently
table-scan from disk?

Thanks !
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Improving performance of GROUP BY

2012-06-17 Thread Udi Karni
Thanks. There are no indexes of any kind because I would need many and they
take very long to create.

I am experimenting with various GROUP-BY scenarios like the following where
all the work is done in TEMP.

When TEMP is on disk - it finishes but is slower - when it's in memory -
it's faster but I sometime get an "out of memory" error.

I didn't completely understand the ANALYZE output. Is there a way to get it
to estimate how many MB/GB of TEMP a SQL statement would need to
successfully complete?

CREATE TABLE X
AS
SELECT
   COL1,
   COL2,
   COUNT (*) CNT
FROM
   300K_ROW_TABLE X,
   100M_ROW_TABLE Y
WHERE
   Y.MM BETWEEN X.MM_START AND X.MM_END
GROUP BY
   COL1,
   COL2
HAVING
   COUNT (*) = 24
/


On Tue, May 29, 2012 at 3:04 PM, Nico Williams wrote:

> On Fri, May 25, 2012 at 1:38 PM, Udi Karni  wrote:
> > I am running the following query -
> >
> > CREATE TABLE XYZ AS
> > SELECT ID, MIN (DATE)
> > FROM SOURCE-TABLE
> > WHERE CRITERIA
> > GROUP BY ID ;
> >
> > SOURCE-TABLE has 600 million rows, 2 million meet the WHERE criteria, and
> > get grouped to 100,000 distinct IDs. There are no indexes as there is too
> > much variety in the WHERE clauses and index creation on this table takes
> > many hours. Temp_store=memory. When executed - the source table is read
> > very slowly, and runtime is 1/2 hour.
> >
> > I split this SQL into 2 steps - the first just extracting - the second
> just
> > grouping. The extract read the table extremely fast and finished in 3
> > minutes. The GROUP BY ran in a few seconds. So a 10X improvement - which
> is
> > pretty incredible speed.
>
> Meaning?  Something like this:
>
> CREATE TABLE FOO AS SELECT ID, DATE FROM SOURCE_TABLE;
> CREATE TABLE BAR AS SELECT ID, MIN(DATE) FROM FOO GROUP BY ID;
>
> ?
>
> Are you sure that there are no indexes?  Maybe there are indexes
> implied by UNIQUE constraints and maybe SQLite3 is incorrectly
> deciding to scan the index instead of the table?
>
> My guess is that you have a UNIQUE constraint on ID in SOURCE_TABLE
> and SQLite3 is scanning the index and then fetching the DATE from the
> table.  But this makes no sense because then it makes no sense to
> select the min() of DATE as there would be a single row per-ID.  My
> guess has to be wrong.  Maybe there's a UNIQUE constraint on several
> columns including ID?  That would not be incompatible with the
> semantics of your query.
>
> Can you post EXPLAIN QUERY PLAN output for your statements?  And/or your
> schema.
>
> Nico
> --
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Parameterized SQL

2012-06-15 Thread Udi Karni
Niall - thanks. If I understand correctly - you use bash to do the
preprocessing of the substitutions and submit the prepared statements to
Sqlite.
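
An alternative that stays entirely inside the sqlite3 shell would be something like this (table and column names made up) - keep the parameter in a temp table and reference it in the query:

sqlite> CREATE TEMP TABLE params (year INTEGER);
sqlite> INSERT INTO params VALUES (2010);
sqlite> SELECT COUNT(*) FROM my_table WHERE year = (SELECT year FROM params);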

On Fri, Jun 15, 2012 at 1:48 AM, Niall O'Reilly wrote:

>
> On 14 Jun 2012, at 22:16, Udi Karni wrote:
>
> > Is there a way to run SQL scripts in the Shell with parameters?
> >
> > Something like this?
> >
> > set &YEAR = 2010
> >
> > SELECT COUNT (*) FROM TABLE WHERE YEAR = &YEAR ;
> >
> > ???
>
> I use bash and sqlite3, as in the fragment below.
>
> #!/bin/bash
>
> # query-script for netdb/SQL
>
> qtype=${2-node_by_name_or_alias}
> dbfile=${3-default-network.db}
>
> case $qtype in
>object_by_property)
>qkey=${1-code=EE}
>echo "  Performing query '$qtype' for search argument '$qkey' in
> database '$dbfile'"
>echo
>tag=`echo $qkey | sed -e 's/=.*//'`
>val=`echo $qkey | sed -e 's/.*=//'`
>/usr/bin/time /usr/local/bin/sqlite3 "$dbfile" <<EOF
> -- tailor display
> -- .mode tabs
> .separator ' '
>
> -- select memory for temporary storage
> pragma temp_store = memory;
>
> create temporary table tmp_objects (object_ref integer);
>
> -- collect objects whose name or alias exactly matches the search key
> insert into tmp_objects
>select distinct object_ref from property where tag = '$tag' and
> value = '$val';
>
> -- show count
> select count(*), 'object(s) found' from (select distinct object_ref from
> tmp_objects);
>
> -- collect linked objects (ranges, interfaces ...)
> insert into tmp_objects
>select origin_ref from tie where target_ref in (select distinct
> object_ref from tmp_objects);
> select id, '', class, '' from object where id in (select distinct
> object_ref from tmp_objects)
>union all
>select origin_ref, '  ', class, target_ref from tie where
> target_ref in (select object_ref from tmp_objects)
>union all
>select object_ref, '  ', tag, value from property where object_ref
> in (select object_ref from tmp_objects)
>order by object_ref asc;
>
> EOF
>;;
>
> # Other cases omitted ...
>
>*)
>echo "  Unknown query: '$qtype'"
>;;
>esac
>
>
>I hope this helps.
>
>
>Best regards,
>
>Niall O'Reilly
>University College Dublin IT Services
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Parameterized SQL

2012-06-14 Thread Udi Karni
Is there a way to run SQL scripts in the Shell with parameters?

Something like this?

set &YEAR = 2010

SELECT COUNT (*) FROM TABLE WHERE YEAR = &YEAR ;

???
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Displaying currently running SQL

2012-06-06 Thread Udi Karni
When pasting multiple SQL statements into the Shell - you can see them
running one by one. However - when you put them in a file and run them via
".read" - the statements don't show on the screen. I have ".timer on" - so
all I see are the run times of each statement.

Is there a "verbose" setting that will cause sqlite to show the SQL
statement currently running?
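
Something along these lines would be ideal (assuming the ".echo" dot-command applies to statements pulled in via ".read" - illustrative path, I haven't verified):

sqlite> .echo on
sqlite> .timer on
sqlite> .read c:/scripts/report.sql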

Thanks !
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Improving performance of GROUP BY

2012-05-26 Thread Udi Karni
It certainly seems to improve things. I ran multiple tests because over
time it seems like portions of the data get cached and there is a difference
between an initial run and subsequent runs. I almost wish simple
table-scans would always go exclusively to disk - I am not sure the caching
always helps.

In any event - breaking the SQL up in 2 is still at least twice as fast.

Where SQLite seems to run super fast all the time is :memory: tables - if
you have the RAM. Maybe that's the real direction with large DBs. A giant
RAMDisk ?

On Fri, May 25, 2012 at 12:10 PM, Simon Slavin  wrote:

>
> On 25 May 2012, at 7:38pm, Udi Karni  wrote:
>
> > I am running the following query -
> >
> > CREATE TABLE XYZ AS
> > SELECT ID, MIN (DATE)
> > FROM SOURCE-TABLE
> > WHERE CRITERIA
> > GROUP BY ID ;
> >
> > SOURCE-TABLE has 600 million rows, 2 million meet the WHERE criteria, and
> > get grouped to 100,000 distinct IDs. There are no indexes as there is too
> > much variety in the WHERE clauses and index creation on this table takes
> > many hours. Temp_store=memory. When executed - the source table is read
> > very slowly, and runtime is 1/2 hour.
> >
> > I split this SQL into 2 steps - the first just extracting - the second
> just
> > grouping. The extract read the table extremely fast and finished in 3
> > minutes. The GROUP BY ran in a few seconds. So a 10X improvement - which
> is
> > pretty incredible speed.
>
> Just for testing purposes, change MIN(DATE) to DATE and check the timings.
>
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Improving performance of GROUP BY

2012-05-25 Thread Udi Karni
I am running the following query -

CREATE TABLE XYZ AS
SELECT ID, MIN (DATE)
FROM SOURCE-TABLE
WHERE CRITERIA
GROUP BY ID ;

SOURCE-TABLE has 600 million rows, 2 million meet the WHERE criteria, and
get grouped to 100,000 distinct IDs. There are no indexes as there is too
much variety in the WHERE clauses and index creation on this table takes
many hours. Temp_store=memory. When executed - the source table is read
very slowly, and runtime is 1/2 hour.

I split this SQL into 2 steps - the first just extracting - the second just
grouping. The extract read the table extremely fast and finished in 3
minutes. The GROUP BY ran in a few seconds. So a 10X improvement - which is
pretty incredible speed.
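
In other words, the 2-step version looks like this (same placeholders as the query above):

CREATE TABLE EXTRACT_TMP AS
SELECT ID, DATE
FROM SOURCE-TABLE
WHERE CRITERIA ;

CREATE TABLE XYZ AS
SELECT ID, MIN (DATE)
FROM EXTRACT_TMP
GROUP BY ID ;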

Why such a large disparity? Why wouldn't the single-step SQL read fast -
write the accepted values to TEMP - then sort and group - same as the
2-step?

Any way to influence the SQlite SQL processor to chose a more optimal
single-step approach?

Thanks !
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-06 Thread Udi Karni
Hi Michael,

May I make a request on behalf of the C++ challenged crowd?

Is it possible to fuse your contribution with the existing capabilities of
the ".import" command so that it can be activated via syntax similar to -
"RECORDS DELIMITED BY... FIELDS TERMINATED BY... OPTIONALLY ENCLOSED BY..."

It would make ".import" so much more powerful.

Thanks !

On Sun, May 6, 2012 at 5:59 AM, Black, Michael (IS)
wrote:

> I modified my csvimport to allow for quoted fields.  Still automatically
> detects the # of columns.
> If quoted fields also contain internal quotes they must be escaped by
> doubling them.
> e.g.
> col1,"this is, a ""quoted"" field",col3
> It's not fully RFC4180 compliant as it doesn't allow for CR/LF inside
> quotes -- does anybody really need/use that?
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include "sqlite3.h"
> #define BUFSIZE 1000000
> #define MOD 100000
> #define MAXTOK 100000
> char *nextToken(char *line) {
>  static char token[MAXTOK];
>  static char *p;
>  int n=0;
>  if (line) {
>p = line;
>  }
>  while(*p && *p != ',' && *p!='\r' && *p!= '\n') {
>if (*p == '"') { // quoted field
>  ++p; // move past first quote
>  while((*p && *p!='"') || *(p+1)=='"') {
>if (*p == '"' && *(p+1)=='"') { // escaped quote?
>  token[n++]='"';
>  p+=2;
>  continue;
>}
>token[n++] = *p;
>++p;
>  }
>  token[n++]=0;
>  ++p;
>  if (*p == ',') ++p;
>  return token;
>}
>token[n++]=*p;
>++p;
>  }
>  if (n > MAXTOK) {
>fprintf(stderr,"Token too big??\n");
>exit(1);
>  }
>  token[n]=0;
>  ++p;
>  if (*p == ',') ++p;
>  if (n > 0) {
>return token;
>  }
>  return NULL;
> }
> // Add comma delimited file to existing database/table
> // Quoted strings are accepted
> int main(int argc, char *argv[]) {
>  sqlite3 *db;
>  sqlite3_stmt *stmt;
>  int rc;
>  int ncol=0;
>  int nline=0;
>  char *buf=malloc(BUFSIZE);
>  char sql[8192];
>  FILE *fp;
>  char *filename;
>  char *databasename;
>  char *tablename;
>  if (argc != 4) {
>fprintf(stderr,"Usage: %s filename databasename tablename\n",argv[0]);
>exit(1);
>  }
>  filename = argv[1];
>  databasename = argv[2];
>  tablename = argv[3];
>  rc = sqlite3_open_v2(databasename,&db,SQLITE_OPEN_READWRITE,NULL);
>  if (rc) {
>fprintf(stderr,"Error opening database '%s':
> %s\n",databasename,sqlite3_errmsg(db));
>exit(1);
>  }
>  sprintf(sql,"insert into %s values (",tablename);
>  fp=fopen(filename,"r");
>  if (fp == NULL) {
>perror(filename);
>exit(1);
>  }
>  buf[BUFSIZE-1] = '*';
>  fgets(buf,BUFSIZE,fp);
>  if (buf[BUFSIZE-1] != '*') {
>fprintf(stderr,"BUFSIZE not big enough...aborting\n");
>exit(1);
>  }
>  // count the columns
>  char *p=nextToken(buf);
>  ncol=0;
>  while(p) {
>++ncol;
>strcat(sql,ncol==1?"":",");
>strcat(sql,"?");
>p=nextToken(NULL);
>  }
>  printf("%d columns detected\n",ncol);
>  strcat(sql,")");
>  rewind(fp);
>  // Let's wrap things in a transaction
>  rc = sqlite3_exec(db,"BEGIN",NULL,NULL,NULL);
>  if (rc) {
>fprintf(stderr,"BEGIN failed on '%s': %s\n",sql,sqlite3_errmsg(db));
>exit(1);
>  }
>  // prepare our statement
>  rc = sqlite3_prepare(db,sql,strlen(sql),&stmt,NULL);
>  if (rc) {
>fprintf(stderr,"Prepare failed on '%s': %s\n",sql,sqlite3_errmsg(db));
>exit(1);
>  }
>  // Loop over file file
>  while(fgets(buf,BUFSIZE,fp)) {
>char *p=nextToken(buf);
>int i=1;
>++nline;
>if ((nline % MOD)==0) {
>  printf("%d\r",nline);
>  fflush(stdout);
>}
>while(p) { // bind the columns as text, table will take care of
> conversion to column types
>  rc=sqlite3_bind_text(stmt,i,p,-1,SQLITE_TRANSIENT);
>  if (rc) {
>fprintf(stderr,"bind_text failed on '%s':
> %s\n",sql,sqlite3_errmsg(db));
>exit(1);
>  }
>  ++i;
>  p=nextToken(NULL);
>}
>if (--i != ncol) {
>  fprintf(stderr,"expected %d cols, got %d cols on
> line#%d\n",ncol,i,nline);
>} else {
>  rc = sqlite3_step(stmt);
>  if (rc != SQLITE_DONE) {
>fprintf(stderr,"Insert failed on '%s':
> %s\n",sql,sqlite3_errmsg(db));
>exit(1);
>  }
>  rc = sqlite3_reset(stmt);
>  if (rc) {
>fprintf(stderr,"Reset failed on '%s': %s\n",sql,sqlite3_errmsg(db));
>exit(1);
>  }
>}
>  }
>  rc=sqlite3_finalize(stmt);
>  if (rc) {
>fprintf(stderr,"Finalize failed: %s\n",sqlite3_errmsg(db));
>exit(1);
>  }
>  printf("%d inserts, committing...\n",nline);
>  rc = sqlite3_exec(db,"COMMIT",NULL,NULL,NULL);
>  if (rc) {
>fprintf(stderr,"COMMIT failed on '%s': %s\n",sql,sqlite3_errmsg(db));
>exit(1);
>  }
>  rc=sqlite3_close(db);
>  if (rc) {
>fprintf(stderr,"Close failed on '%s': %s\n",argv[2],sqlite3_errmsg(db));
>exit(1);
>  }
>  fclose(fp);
>  return 0;
> }
>
>
>
>
> Michael D. Black
> Senior Scientist
> Advanced Analytics Directorate
> Advanced GEOINT Solutions Operating Unit
> Northrop Grum

Re: [sqlite] 64-bit precompiled command-line shell binary for Windows 7

2012-05-01 Thread Udi Karni
Thank you Mr. Burstein. This is great !!!

Run times using memory for DB and Temp are much improved over SSD - let
alone conventional disk, "out of memory" messages are gone, count (*) is
sub-second.

Time to go shopping for more RAM.

Thanks again.

On Mon, Apr 30, 2012 at 9:23 AM, Baruch Burstein wrote:

> I haven't tested it (I only have a 32-bit system), but here you go. Just
> change the extension to .exe
>
> On Mon, Apr 30, 2012 at 4:28 PM, Udi Karni  wrote:
>
> > Dear Sqlite development team,
> >
> > I'd like to add a vote for requesting a 64-bit precompiled command-line
> > shell binary (sqlite3.exe) for Windows 7 - to be added to the list of
> files
> > downloadable from the page <http://www.sqlite.org/download.html> .
> >
> > It would be very helpful for those of us who are not versed with
> compiling
> > C++ code.
> >
> > I have seen a tremendous performance boost from the use of "memory
> > temp_store" and "memory database". The 64-bit version would allow them to
> > go beyond 2GB.
> >
> > Is that simple? Not so simple? If you could elaborate - it would be much
> > appreciated.
> >
> > Thank you !
> >
> > Udi
> > ___
> > sqlite-users mailing list
> > sqlite-users@sqlite.org
> > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> >
>
>
>
> --
> Programming today is a race between software engineers striving to build
> bigger and better idiot-proof programs, and the Universe trying to produce
> bigger and better idiots. So far, the Universe is winning.  - Rich Cook
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] 64-bit precompiled command-line shell binary for Windows 7

2012-04-30 Thread Udi Karni
Dear Sqlite development team,

I'd like to add a vote for requesting a 64-bit precompiled command-line
shell binary (sqlite3.exe) for Windows 7 - to be added to the list of files
downloadable from the page <http://www.sqlite.org/download.html> .

It would be very helpful for those of us who are not versed with compiling
C++ code.

I have seen a tremendous performance boost from the use of "memory
temp_store" and "memory database". The 64-bit version would allow them to
go beyond 2GB.
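
For reference - the two settings I mean: opening a pure in-memory database, and keeping temp space in RAM (illustrative session):

C:\> sqlite3 :memory:
sqlite> PRAGMA temp_store = MEMORY;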

Is that simple? Not so simple? If you could elaborate - it would be much
appreciated.

Thank you !

Udi
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] max size for ":memory:" database?

2012-04-15 Thread Udi Karni
I tried with the 32-bit version of sqlite-for-Windows on a Windows 7 Home
Premium desktop with 8GB - and the ":memory:" database couldn't exceed 2GB.

The advice was to get a 64-bit version of sqlite-for-Windows.

I asked - but didn't get a definitive answer - so excuse my asking again -
where can we find a 64-bit Windows version of 3.7.11? Can one be added to
the download page? I am not versed enough in compiling C code.

Thanks !

On Sun, Apr 15, 2012 at 8:13 AM, Simon Slavin  wrote:

>
> On 15 Apr 2012, at 3:48pm, niXman  wrote:
>
> > Tell me please, what is the maximum size allowed for a ":memory:"
> database?
> > 64 GB is allowed?
>
> If the question was answered directly, the answer would be here:
>
> 
>
> but it's not (though see answer 12).  However, the internal architecture
> of SQLite suggests that it's 64-bit clean, thus that the limit will be in
> the OS, not SQLite, so what you really need to know is the limits of your
> OS.  Some expert on page-handling in SQLite might have a specific answer.
>
> The assumption for the standard 64 bit OSes these days is 64GB.  Some
> versions of Windows 7 can handle 192 GB (plenty of test installations since
> it's actually useful to have that much memory in a Windows 7 server).  Red
> Hat Linux running on an AMD processor can handle 256 GB (proved by an
> actual installation, but not seen personally by me).  Apple's OS X 10.7 is
> designed to support 16 TB of memory but I don't know of any test
> installation.  The most I've seen personally is 192 GB, which worked, but
> was put together just for the lols and taken apart shortly afterwards once
> the benchmarks had been run.
>
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] ETL anomalies

2012-04-04 Thread Udi Karni
First - Dr. Hipp - thanks for the .import fix from 1/16 baked into 3.7.11 -
it now recognizes commas embedded within character strings in a csv +
strips the double-quotes. This makes the loading much smoother.

 ==

Now for 3 strange events I've been running into. Sorry - the data volumes
are large - so it's very time consuming to run many tests to pinpoint these
further. Just raising the flag in case any of this rings a bell with anyone.

==

(1) I import data into a table in one DB
(2) I then create an identical table in a 2nd DB
(3) I then copy the table from DB1 to DB2 - with some SQL ETL cleanups.
Surprisingly - the data gets appended to the initial table on DB1 instead
of the new table on DB2.

I tried to replicate it with a couple of rows - but it works fine. However
- with the real 100 million row table it happens.

My workaround for the moment is to give the 2 tables on the 2 DBs different
names and ALTER RENAME when all is said and done. That works fine.

sqlite3 orig.db3
create table xyz (.);
import  xyz.csv  xyz
.exit

sqlite3 final.db3
create table xyz (.);
attach database 'orig.db3' as indb;
insert into xyz select * from indb.xyz
.exit

Look at the size of orig.db3 - it's now double in size with twice as many
rows
Look at the size of final.db3 - 0K

 ==

The reason I use the 2 DB approach in leu of just UPDATEing the original DB
- is that on UPDATE sqlite seems to try and read the table first. At least
it seems to do that on large tables.

Apparently - if the table is large enough - memory usage slowly rises - and
then I get an "out of memory" message.

So UPDATE works on smaller tables (up to 10-20GB) but for larger ones I
need to create a whole new copy of the table for every ETL iteration.

Since I would like each large table to end up in its own DB, and I would
like to avoid putting multiple intermediate tables in the same DB, then
delete the unnecessary ones, then finally VACUUM - I use multiple physical
DBs - hence - the abovementioned attach issue.

 ==

If I run an .import and mistype something - like the path of the .csv file
- or the wrong .separator - the command obviously fails. I correct the
syntax and run again - I sometimes get an "expected x columns but found y
columns stopping at row nn" type of message.

Initially I very carefully looked at the data and tried to load "nn+1" rows.
Everything worked. I reran the .import and all was fine.

Then I learned to just close / reopen the DB - run the .import again -
without changing anything - and all works fine.

It's as if when .import runs up against a problem - it doesn't clean something
up - which leads to other messages. Starting from scratch - making sure the
syntax is correct the first time - works fine.

==

Again - sorry these sound so fuzzy. I'll try to provide more detail as I
run across it - but just in case these are simple to figure out.

Thanks !
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] 64-bit Windows Command Shell

2012-03-24 Thread Udi Karni
Very interesting. 2 questions -

(1) On the SQLite Download page - the Linux version just says "x86" while
the Windows version is "Win-32 x86". Does this mean that the Linux version
is a 64-bit version? In other words - if run on, say, RedHat 5.5 64-bit -
it will be able to use >4GB of RAM?

(2) On Windows - a RamDisk "drive letter" can be used beneficially in 2
ways -
(a) to contain the database itself
(b) to serve as a destination for the TEMP and TMP - which SQLite (through
Windows I suppose) uses often as a scratchpad.
Is it possible - good/bad idea - to point the Windows TMP and TEMP to a
"virtual" drive letter on top of a RamDisk? The RamDisk gets established as
Windows comes up - but the question is whether Windows might need the TMP /
TEMP prior to that point in time - when they are not yet going to be
available - causing startup problems?

On Sat, Mar 24, 2012 at 5:36 PM, Roger Binns  wrote:

> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
>
> On 24/03/12 13:48, Udi Karni wrote:
> > I tried the SQLite in-memory DB feature - but when it exceeded about
> > 2GB - I got the "out of memory" message.
>
> Technically it ran out of address space, not ram.
>
> > In my particular scenario - while the raw data being attached and read
> > is hundreds of GB - the result sets are only a few GB.
>
> 2GB is the threshold to switch to 64 bits.  Also note that SQLite does not
> calculate the entire result set in advance.  Instead when you call
> sqlite3_step() it does whatever work is necessary to find the next result
> row.  It is only if you do something like require sorting that cannot be
> satisfied by an index that the whole result set has to be calculated and
> then sorted.
>
> > A 64-bit version of SQLite that could handle an in-memory DB of 5-10
> > GB would be quite useful and interesting to test.
>
> You can also use a ramdisk for the files with journalling turned off.
> Microsoft used to have one, but now a search shows a number of companies.
>  I have no idea which ones are the most reputable. Under Linux you can use
> tmpfs which makes things very easy.
>
> > Unfortunately I am not versed enough in compiling my own version - so
> > hopefully this would be offered one of these days.
>
> Unfortunately making the free Microsoft dev tools generate 64 bit binaries
> is a PITA.  Another alternative is to use a 64 bit version Linux in which
> case all this stuff just works.
>
> > Whether a RAM-only version would truly run substantially faster than an
> > SSD-based DB - remains to be seen - but the <2GB experiments seem to
> > show that in-memory is quite promising.
>
> Any memory used to store databases is memory that cannot be used for I/O
> caching.
>
> Roger
> -BEGIN PGP SIGNATURE-
> Version: GnuPG v1.4.11 (GNU/Linux)
>
> iEYEARECAAYFAk9uaHgACgkQmOOfHg372QQhXQCg0c7MYT5s9WBSGcwyYWra7ZNx
> 3qsAmwRSECp48Z2KkhfkMGvyvR7X1LsJ
> =M4tq
> -END PGP SIGNATURE-
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] 64-bit Windows Command Shell

2012-03-24 Thread Udi Karni
I tried the SQLite in-memory DB feature - but when it exceeded about 2GB -
I got the "out of memory" message.

In my particular scenario - while the raw data being attached and read is
hundreds of GB - the result sets are only a few GB.

A 64-bit version of SQLite that could handle an in-memory DB of 5-10 GB
would be quite useful and interesting to test. Unfortunately I am not
versed enough in compiling my own version - so hopefully this would be
offered one of these days.

Whether a RAM-only version would truly run substantially faster than an
SSD-based DB - remains to be seen - but the <2GB experiments seem to show
that in-memory is quite promising.

On Thu, Mar 22, 2012 at 8:29 AM, Don V Nielsen wrote:

> There is a natural 5th extrapolation:
>
> 5) Could sqlite3 take advantage of multiple cpu's by parsing a single task
> into one thread per cpu and segment data to be worked by each thread?  Big
> league stuff.  But I don't think sqlite3 is meant to compete in that
> market.  It already exceeds expectations in its current market.
>
> dvn
>
> On Wed, Mar 21, 2012 at 6:57 PM, Udi Karni  wrote:
>
> > Frankly I don't know if a 64-bit version and Big RAM would make a
> > difference and if so - up to what point. With SQLite being a single
> process
> > - assigned for the most part to a single CPU - even if everything was
> done
> > in RAM - there is a limit to what 1 CPU can do.
> >
> > I am just noticing anecdotally that SQlite uses cache and dealing with
> > tables of a few hundred MB or less doesn't seem to generate IO. Also -
> when
> > there is IO - it often comes from the swap file (under Windows 7).
> >
> > So the questions are -
> >
> > (1) how much RAM is the point of diminishing returns on 32-bit
> > (2) is there value to going 64-bit
> > (3) if there was a 64-bit version - would it use more RAM more
> effectively?
> > (4) as a fallback - let's say the 32-bit version and 4GB are as good as
> you
> > can pretty much expect. Would getting a server with 4 CPUs and 16GB (a
> > high-end home-version PC) - reasonably enable me to run 3-4 SQLite jobs
> > concurrently? In other words - no great speed improvement per job - but
> in
> > aggregate more work could get done?
> >
> > Thanks !
> >
> > On Wed, Mar 21, 2012 at 12:26 PM, Roger Binns 
> > wrote:
> >
> > > -BEGIN PGP SIGNED MESSAGE-
> > > Hash: SHA1
> > >
> > > On 21/03/12 11:09, Black, Michael (IS) wrote:
> > > > Cache is the primary (and obvious) thing I can think of.
> > >
> > > With a 32 bit compilation you'll be able to bump it up to about 2GB.
> > > However by that point you will long have passed diminishing returns and
> > > can just let the OS do its own caching.
> > >
> > > Roger
> > > -BEGIN PGP SIGNATURE-
> > > Version: GnuPG v1.4.11 (GNU/Linux)
> > >
> > > iEYEARECAAYFAk9qK2IACgkQmOOfHg372QQVdwCfbJTAzhCPR4ARPxhYHewLvvcT
> > > 4lYAoI4QFXFfxILtsQGxVWm8BRM/mbIX
> > > =e0aW
> > >  -END PGP SIGNATURE-
> > > ___
> > > sqlite-users mailing list
> > > sqlite-users@sqlite.org
> > > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> > >
> > ___
> > sqlite-users mailing list
> > sqlite-users@sqlite.org
> > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> >
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] VACUUMing large DBs

2012-03-22 Thread Udi Karni
Very nice!  Thanks !

But then - can you turn journaling off and then run a VACUUM and have it
run as a 2-step instead of a 3-step?
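
In other words - something like this (not sure whether VACUUM would then really skip one of the copies):

sqlite> PRAGMA journal_mode = OFF;
sqlite> VACUUM;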

On Thu, Mar 22, 2012 at 3:25 PM, Petite Abeille wrote:

>
> On Mar 22, 2012, at 11:19 PM, Udi Karni wrote:
>
> > Is there a way to run NOLOGGING in SQlite syntax - which means that if
> > something in the destination table/DB fails - you are prepared to just
> drop
> > it and start over?
>
> PRAGMA journal_mode=off
>
> http://sqlite.org/pragma.html#pragma_journal_mode
>  ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] VACUUMing large DBs

2012-03-22 Thread Udi Karni
For the time being - I have been avoiding the VACUUM of very large DBs by
creating a new iteration of the table/DB for each transformation instead of
using UPDATE/DELETE (given that I only have 1 table per DB) -

(1) create new DB_V2 / Table_V2
(2) attach DB_V1 / Table_V1
(3) insert into Table_V2 select (column list with transformations) from
Table_V1
(4) drop DB_V1

If there are too many transformations - I just do it in a few iterations.

By using 2 separate disks for the 2 DBs/tables - one only reads - the other
only writes - rows don't grow/shrink - and especially if both disks are
SSDs - this works quite fast and no need to VACUUM the final version.
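
One iteration, concretely, looks something like this (names, paths and the column transformation are made up):

C:\> sqlite3 e:/xyz_v2.db3
sqlite> CREATE TABLE xyz_v2 (id INTEGER, code TEXT, amount REAL);
sqlite> ATTACH DATABASE 'd:/xyz_v1.db3' AS v1;
sqlite> INSERT INTO xyz_v2 SELECT id, UPPER(code), amount FROM v1.xyz_v1;
sqlite> DETACH DATABASE v1;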

What would make it even better would be the possibility of using CREATE
TABLE X NOLOGGING AS SELECT ... FROM Y;

Is there a way to run NOLOGGING in SQlite syntax - which means that if
something in the destination table/DB fails - you are prepared to just drop
it and start over?

Thanks !

On Thu, Mar 22, 2012 at 3:06 PM, Scott Hess  wrote:

> On Tue, Mar 20, 2012 at 8:25 PM, Jay A. Kreibich  wrote:
> > On Tue, Mar 20, 2012 at 01:59:59PM -0700, Udi Karni scratched on the
> wall:
> >> Is there a way to go directly from "original" to "journal/final" -
> >> skipping the creation of the Temp version?
> >
> >  No, it requires all three copies.
> <...>
> >  Almost exactly two years ago I proposed a "VACUUM TO <file>"
> >  version of the command that did the first copy and then quit.  Rather
> >  than building an optimized temp copy, VACUUM TO would copy the
> >  current database to an optimized named file (rather than a temp file),
> >  and then skip the copy-back stage.  This would allow a system admin
> >  to shut down all database users, VACUUM the database, swap files,
> >  and finally restart everything.  The process would require more
> >  manual work, but would only require 2x the drive space, rather than
> >  3x.  Nobody spoke up about the idea, however.
>
> I think you could manage 2x-the-drive-space without shutdown by
> writing a read-only VFS which treated the pages in the journal as its
> backing store, faulting misses through to the main file.  Then you
> could VACUUM from the database-in-the-journal to the
> database-in-the-database.  In case of failure, the journal rolls
> things back like you'd expect.
>
> I _think_ this would work.
>
> -scott
>  ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] 64-bit Windows Command Shell

2012-03-21 Thread Udi Karni
Frankly I don't know if a 64-bit version and Big RAM would make a
difference and if so - up to what point. With SQLite being a single process
- assigned for the most part to a single CPU - even if everything was done
in RAM - there is a limit to what 1 CPU can do.

I am just noticing anecdotally that SQLite uses cache and dealing with
tables of a few hundred MB or less doesn't seem to generate IO. Also - when
there is IO - it often comes from the swap file (under Windows 7).

So the questions are -

(1) how much RAM is the point of diminishing returns on 32-bit?
(2) is there value in going 64-bit?
(3) if there were a 64-bit version - would it use more RAM more effectively?
(4) as a fallback - let's say the 32-bit version and 4GB are as good as you
can pretty much expect. Would getting a server with 4 CPUs and 16GB (a
high-end home-version PC) - reasonably enable me to run 3-4 SQLite jobs
concurrently? In other words - no great speed improvement per job - but in
aggregate more work could get done?
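
Regarding (4) - a minimal sketch of what 3-4 concurrent jobs might look like
from a Windows command prompt, with hypothetical database files and scripts
(each job is its own sqlite3 process working on its own database):

C:\>start "job1" sqlite3 d:\db1.db3 ".read c:/jobs/job1.sql"
C:\>start "job2" sqlite3 e:\db2.db3 ".read c:/jobs/job2.sql"
C:\>start "job3" sqlite3 f:\db3.db3 ".read c:/jobs/job3.sql"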

Thanks !

On Wed, Mar 21, 2012 at 12:26 PM, Roger Binns  wrote:

> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
>
> On 21/03/12 11:09, Black, Michael (IS) wrote:
> > Cache is the primary (and obvious) thing I can think of.
>
> With a 32 bit compilation you'll be able to bump it up to about 2GB.
> However by that point you will long have passed diminishing returns and
> can just let the OS do its own caching.
>
> Roger
> -BEGIN PGP SIGNATURE-
> Version: GnuPG v1.4.11 (GNU/Linux)
>
> iEYEARECAAYFAk9qK2IACgkQmOOfHg372QQVdwCfbJTAzhCPR4ARPxhYHewLvvcT
> 4lYAoI4QFXFfxILtsQGxVWm8BRM/mbIX
> =e0aW
>  -END PGP SIGNATURE-
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] 64-bit Windows Command Shell

2012-03-21 Thread Udi Karni
Hello,

Is there - or can we kindly request that a 64-bit version of the Command
Shell be offered on the download page?

A version that could run on Windows 7 64-bit and effectively use large RAM?

Thanks !
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] VACUUMing large DBs

2012-03-20 Thread Udi Karni
Thanks!  I got one and tried - and it seems to improve overall performance
about 2X. Very cool.

The 240GB SSD drives are pretty reasonably priced and would suffice for
most tables. I'm just wondering how long before Flash Write Fatigue sets in
and you need a replacement. 

On Tue, Mar 20, 2012 at 2:27 PM, Roger Binns  wrote:

> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
>
> On 20/03/12 13:59, Udi Karni wrote:
> > And a more general question. My PC has 8GB of RAM. I am considering
> > getting a much larger machine that can take upwards of 100-200GB of
> > RAM.
>
> I'd recommend getting one or more SSDs instead (also a lot cheaper).  The
> reason is that during your vacuum most data is only read once so RAM won't
> really help - instead you want improved latency and throughput of reads.
>
> For RAM you should make it a little larger than your working set of data.
>  That will depend on your workload and app, and applies to any kind of
> database.  Again SSDs will help since the penalty of a RAM/cache miss is
> less than with spinning media.
>
> Roger
> -BEGIN PGP SIGNATURE-
> Version: GnuPG v1.4.11 (GNU/Linux)
>
> iEYEARECAAYFAk9o9ksACgkQmOOfHg372QRIygCgrCnBL5osiqWR+W1bHjLgDwZZ
> VAoAoN0gKsJU35myHrlFEerHwLnXjyjY
> =yAEn
> -END PGP SIGNATURE-
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Increasing the number of attached databases beyond 10

2012-03-15 Thread Udi Karni
Thanks. However - SQLITE_MAX_ATTACHED seems to only apply to a tool you are
compiling - whereas I just submit SQL scripts from the standard shell and
was hoping to be able to attach more than 10 DBs.

I looked and looked and so far have not found any way to do this.
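
For what it's worth, since the limit appears to be compile-time only, one
possible route is rebuilding the shell from the amalgamation sources with a
larger value - a sketch (gcc/MinGW assumed; 30 is an arbitrary example and the
option has its own hard upper bound, so check the limits page):

gcc -DSQLITE_MAX_ATTACHED=30 shell.c sqlite3.c -o sqlite3.exe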

Thanks again.

On Tue, Mar 13, 2012 at 3:13 PM, Petite Abeille wrote:

>
> On Mar 13, 2012, at 11:05 PM, Udi Karni wrote:
>
> > Is there a way to increase the number of attached databases beyond the
> > default of 10 - from the command shell - by issuing a command / pragma?
>
> Not from the shell, but check SQLITE_MAX_ATTACHED:
>
> http://sqlite.org/limits.html
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Increasing the number of attached databases beyond 10

2012-03-13 Thread Udi Karni
Hello,

Is there a way to increase the number of attached databases beyond the
default of 10 - from the command shell - by issuing a command / pragma?

I have 50+/- large tables (50GB - 100GB) and maintenance / transfer of them
is much easier if each one is in its own SQLite DB file.

I need to run SQL scripts that join groups of these tables. I was hoping to
be able to create a "work" database - issue 50 attach statements - then
refer to however many tables I need from the group using the DB.TABLE
syntax.
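
A minimal sketch of the idea, with hypothetical file, table, and column names:

sqlite3 z:/work.db3
sqlite> ATTACH DATABASE 'd:/claims.db3' AS claims;
sqlite> ATTACH DATABASE 'd:/members.db3' AS members;
sqlite> CREATE TABLE work_summary AS
   ...> SELECT m.state, COUNT(*) AS claim_count
   ...> FROM claims.claims c
   ...> JOIN members.members m ON m.member_id = c.member_id
   ...> GROUP BY m.state;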

Is this reasonable? doable? I am trying to avoid bunching them together
into 1/4 TB - 1 TB giant DBs.

Thanks !
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Speeding up Sqlite reads after DML

2012-02-05 Thread Udi Karni
Watching Windows 7 Resource Monitor (launched from a button on Windows Task
Manager) - I see that sqlite - directly - or through Windows - generates
quite a bit of activity on a temp file located on
C:\users\owner\AppData\Local\Temp - especially when running large joins,
etc. There are large read and write queues on this file - so it's
definitely slowing things down.

I suspect this is the sqlite equivalent of TempDB - a scratchpad where
sorting / grouping / joining takes place.

I am wondering about adding a Solid State Drive, and redirecting Windows
from the C:\ location to this SSD drive - where performance will hopefully be
better than on the C: drive. There seem to be some posts on how to do this.
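
A minimal sketch of the redirection, assuming a hypothetical SSD mounted as S:
(the environment variables steer anything that uses the Windows temp path; the
temp_store_directory pragma is the SQLite-specific knob for the same thing):

C:\>mkdir S:\sqlite_temp
C:\>set TMP=S:\sqlite_temp
C:\>set TEMP=S:\sqlite_temp
C:\>sqlite3 z:\xyz.db3
sqlite> PRAGMA temp_store_directory = 'S:\sqlite_temp';

(alternatively, PRAGMA temp_store = memory; keeps temp objects in RAM entirely,
at the cost of address space)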

Then again - Resource Monitor shows dozens of various files on the C: drive
being constantly read / written as Sqlite works - though to a much lesser
degree than the Temp location. Since Windows seems very tightly bound to
the C: drive - I am also wondering about getting a PC with an SSD C: drive
where all IO to all locations on the C: drive will hopefully be faster.

Lots to research, and a new concept to get used to - having the entire OS
on an SSD.

If anyone has tried any of this and would like to share their experience -
it would be much appreciated.



On Fri, Feb 3, 2012 at 10:48 AM, Alexey Pechnikov wrote:

> You can try page size 8192 and cache size 128000.
> Note: vacuuming is required after changing the page size on an existing
> database.
>
> Inserting/updating big indexed tables may be slow. In recent SQLite versions
> index creation performance has been optimized, so indexing a fully populated
> table is a good idea when possible.
>
> Don't use a b-tree index for text fields. Use the FTS4 extension or an
> integer hash value instead to index text fields. Note: index compression is
> not supported by SQLite.
>
> A 100+ GB table and database is not too much for SQLite. I think you have
> problems with big indexed tables, not with big tables. A big cache, a RAM
> drive, or an SSD may increase index updating speed. When an index is larger
> than the cache size (for the parameters above, the cache size will be
> 128,000 * 8,192 bytes) all index modifications are very disk-expensive
> operations.
>
> P.S. Do not use a cache size > 2 GB on 32-bit hosts.
>
> 2012/2/3 Udi Karni :
> > Marcus - thanks. I will experiment with those 2 PRAGMAs.
> >
> > Meanwhile - I was trying to update some columns in a table with 130
> million
> > rows and got this error -
> >
> > "Error: out of memory"
> >
> > I am not sure why. I thought the "UPDATE" just writes out new rows and
> > maybe a journal of the old rows for recoverability - but I am not sure
> why
> > it would need very much memory.
> >
> > Be that as it may - and with regard to your suggestion - and in light of
> > this error message - given that I only have 4GB of RAM on my PC - is this
> > really enough RAM to handle tables of this size ? Or am I giving Sqlite
> an
> > unreasonably small amount of resources and it's time for a serious
> hardware
> > upgrade?
> >
> > Thanks,
> >
> > Udi
> >
> > On Thu, Feb 2, 2012 at 10:03 PM, Marcus Grimm  >wrote:
> >
> >> > Given how clever and compelling Sqlite is - I am testing how it
> scales to
> >> > tables in the 100GB / 200 million row range. This is for a strictly
> "read
> >> > only" application - but first the tables must be populated in a one
> time
> >> > process. As is often the case with Big Data - the data is a little
> dirty
> >> -
> >> > so the process involves importing - selecting - counting - inspecting
> -
> >> > updating some rows - deleting some rows - selecting - counting -
> >> > inspecting, etc. until clean.
> >> >
> >> > Placing the Sqlite database on a traditional C: drive - IO was too
> slow.
> >> > At
> >> > 15 MB/sec - reading a 50GB table would take an hour. So I moved it to
> >> > external Raid array where I ran across an interesting find. IO wasn't
> >> that
> >> > much faster - until I vacuumed the database - which increased IO 10X
> to
> >> > 150
> >> > MB/sec - with the same CPU utilization.
> >> >
> >> > This is good news for the final implementation of this read-only
> database
> >> > -
> >> > but still a dilemma at the data load phase. After a ".vacuum" -
> issuing
> >> > a
> >> > single DML against a table - even a DELETE which deletes no rows at
> all -
> >> > causes IO to drop back down to 15 MB/sec - on the table I'm selecting /
> >> > DMLing - which makes the data loading / cleansing phase very long.

Re: [sqlite] Speeding up Sqlite reads after DML

2012-02-03 Thread Udi Karni
Thanks !  I will experiment.

I have Windows 7 (64 bit) with 4GB RAM - so I suspect that 2GB is the most
I can specify anyway.
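
For reference, a minimal sketch of the settings suggested in the reply quoted
below, applied to a hypothetical existing database (the VACUUM is what makes
the new page size take effect, and cache_size has to be set again on every new
connection):

sqlite3 z:\xyz.db3
sqlite> PRAGMA page_size = 8192;
sqlite> VACUUM;
sqlite> PRAGMA cache_size = 128000;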

Are there 32-bit versus 64-bit official releases of the command line shell?

On Fri, Feb 3, 2012 at 10:48 AM, Alexey Pechnikov wrote:

> You can try page size 8192 and cache size 128000.
> Note: vacuuming is required after changing the page size on an existing
> database.
>
> Inserting/updating big indexed tables may be slow. In recent SQLite versions
> index creation performance has been optimized, so indexing a fully populated
> table is a good idea when possible.
>
> Don't use a b-tree index for text fields. Use the FTS4 extension or an
> integer hash value instead to index text fields. Note: index compression is
> not supported by SQLite.
>
> A 100+ GB table and database is not too much for SQLite. I think you have
> problems with big indexed tables, not with big tables. A big cache, a RAM
> drive, or an SSD may increase index updating speed. When an index is larger
> than the cache size (for the parameters above, the cache size will be
> 128,000 * 8,192 bytes) all index modifications are very disk-expensive
> operations.
>
> P.S. Do not use a cache size > 2 GB on 32-bit hosts.
>
> 2012/2/3 Udi Karni :
>  > Marcus - thanks. I will experiment with those 2 PRAGMAs.
> >
> > Meanwhile - I was trying to update some columns in a table with 130
> million
> > rows and got this error -
> >
> > "Error: out of memory"
> >
> > I am not sure why. I thought the "UPDATE" just writes out new rows and
> > maybe a journal of the old rows for recoverability - but I am not sure
> why
> > it would need very much memory.
> >
> > Be that as it may - and with regard to your suggestion - and in light of
> > this error message - given that I only have 4GB of RAM on my PC - is this
> > really enough RAM to handle tables of this size ? Or am I giving Sqlite
> an
> > unreasonably small amount of resources and it's time for a serious
> hardware
> > upgrade?
> >
> > Thanks,
> >
> > Udi
> >
> > On Thu, Feb 2, 2012 at 10:03 PM, Marcus Grimm  >wrote:
> >
> >> > Given how clever and compelling Sqlite is - I am testing how it
> scales to
> >> > tables in the 100GB / 200 million row range. This is for a strictly
> "read
> >> > only" application - but first the tables must be populated in a one
> time
> >> > process. As is often the case with Big Data - the data is a little
> dirty
> >> -
> >> > so the process involves importing - selecting - counting - inspecting
> -
> >> > updating some rows - deleting some rows - selecting - counting -
> >> > inspecting, etc. until clean.
> >> >
> >> > Placing the Sqlite database on a traditional C: drive - IO was too
> slow.
> >> > At
> >> > 15 MB/sec - reading a 50GB table would take an hour. So I moved it to
> >> > external Raid array where I ran across an interesting find. IO wasn't
> >> that
> >> > much faster - until I vacuumed the database - which increased IO 10X
> to
> >> > 150
> >> > MB/sec - with the same CPU utilization.
> >> >
> >> > This is good news for the final implementation of this read-only
> database
> >> > -
> >> > but still a dilemma at the data load phase. After a ".vacuum" -
> issuing
> >> > a
> >> > single DML against a table - even a DELETE which deletes no rows at
> all -
> >> > causes IO to drop back down to 15 MB/sec - on the table I'm selecting
> /
> >> > DMLing - which makes the data loading / cleansing phase very long.
> >> >
> >> > So I have 2 questions -
> >> >
> >> > (1) Why would simple DML cause such an extreme slowdown as compared
> with
> >> > "post vaccuum" speeds ?
> >> >
> >> > (2) Any knobs to turn to try and maintain the higher speeds post DML -
> >> > without resorting to ".vacuum" ?
> >>
> >>
> >> You didn't say whether you have already applied the usual tricks to speed
> >> up your load phase. That would be to increase the page cache and try
> >> with reduced syncs.
> >> See
> >> PRAGMA cache_size
> >> PRAGMA synchronous
> >>
> >> In particular the page cache should be increased dramatically
> >> for huge DB files.
> >>
> >> Marcus
> >>
> >> >
> >> > Thanks,
> >> >
> >>

Re: [sqlite] Speeding up Sqlite reads after DML

2012-02-02 Thread Udi Karni
Marcus - thanks. I will experiment with those 2 PRAGMAs.

Meanwhile - I was trying to update some columns in a table with 130 million
rows and got this error -

"Error: out of memory"

I am not sure why. I thought the "UPDATE" just writes out new rows and
maybe a journal of the old rows for recoverability - but I am not sure why
it would need very much memory.

Be that as it may - and with regard to your suggestion - and in light of
this error message - given that I only have 4GB of RAM on my PC - is this
really enough RAM to handle tables of this size ? Or am I giving Sqlite an
unreasonably small amount of resources and it's time for a serious hardware
upgrade?
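
For the record, a minimal sketch of the knobs Marcus points at below, set just
before the big UPDATE (table and column names are made up; with the default
1 KB page, 500000 cache pages is roughly 0.5 GB, and synchronous=OFF accepts
the usual crash risk):

sqlite> PRAGMA cache_size = 500000;
sqlite> PRAGMA synchronous = OFF;
sqlite> UPDATE xyz SET region = TRIM(region);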

Thanks,

Udi

On Thu, Feb 2, 2012 at 10:03 PM, Marcus Grimm wrote:

> > Given how clever and compelling Sqlite is - I am testing how it scales to
> > tables in the 100GB / 200 million row range. This is for a strictly "read
> > only" application - but first the tables must be populated in a one time
> > process. As is often the case with Big Data - the data is a little dirty
> -
> > so the process involves importing - selecting - counting - inspecting -
> > updating some rows - deleting some rows - selecting - counting -
> > inspecting, etc. until clean.
> >
> > Placing the Sqlite database on a traditional C: drive - IO was too slow.
> > At
> > 15 MB/sec - reading a 50GB table would take an hour. So I moved it to
> > external Raid array where I ran across an interesting find. IO wasn't
> that
> > much faster - until I vacuumed the database - which increased IO 10X to
> > 150
> > MB/sec - with the same CPU utilization.
> >
> > This is good news for the final implementation of this read-only database
> > -
> > but still a dilemma at the data load phase. After a ".vacuum" - issuing
> > a
> > single DML against a table - even a DELETE which deletes no rows at all -
> > causes IO to drop back down to 15 MB/sec - on the table I'm selecting /
> > DMLing - which makes the data loading / cleansing phase very long.
> >
> > So I have 2 questions -
> >
> > (1) Why would simple DML cause such an extreme slowdown as compared with
> > "post vaccuum" speeds ?
> >
> > (2) Any knobs to turn to try and maintain the higher speeds post DML -
> > without resorting to ".vacuum" ?
>
>
> You didn't say whether you have already applied the usual tricks to speed
> up your load phase. That would be to increase the page cache and try
> with reduced syncs.
> See
> PRAGMA cache_size
> PRAGMA synchronous
>
> In particular the page cache should be increased dramatically
> for huge DB files.
>
> Marcus
>
> >
> > Thanks,
> >
> > Udi
> > ___
> > sqlite-users mailing list
> > sqlite-users@sqlite.org
> > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> >
>
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Speeding up Sqlite reads after DML

2012-02-02 Thread Udi Karni
Thanks !  Good ideas. SSD is still a little exotic price-wise, but closing
and defragging is easy to try.

Come to think of it - your suggestion to break the process into batches -
is probably the way to go. By placing each table in its own DB - not only
are they smaller and easier to manage - but the vacuum process becomes a
single-table vacuum instead of a multi-table vacuum - not having to redo
tables that are already streamlined. Very cool.
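
As a sketch of that, with hypothetical per-table database files, the one-line
SQL-argument form of the shell keeps each vacuum a small, independent step:

C:\>sqlite3 d:\claims.db3 "VACUUM;"
C:\>sqlite3 d:\members.db3 "VACUUM;"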

In which case - how about not combining them at all - and attaching?

In other words - is the performance of 1 large DB that includes all the
data tables plus the "work" tables - about equivalent to a "work only" DB
that attaches, as needed, to multiple external DBs - each containing 1
table? With some databases - cross-database communication is much slower
than keeping everything local. Is there substantial overhead to using
multiple "attach" statements in Sqlite?

Thanks,

Udi

On Thu, Feb 2, 2012 at 3:51 PM, Simon Slavin  wrote:

>
> On 2 Feb 2012, at 11:31pm, Udi Karni wrote:
>
> > Given how clever and compelling Sqlite is - I am testing how it scales to
> > tables in the 100GB / 200 million row range. This is for a strictly "read
> > only" application - but first the tables must be populated in a one time
> > process. As is often the case with Big Data - the data is a little dirty
> -
> > so the process involves importing - selecting - counting - inspecting -
> > updating some rows - deleting some rows - selecting - counting -
> > inspecting, etc. until clean.
> >
> > Placing the Sqlite database on a traditional C: drive - IO was too slow.
> At
> > 15 MB/sec - reading a 50GB table would take an hour. So I moved it to
> > external Raid array where I ran across an interesting find. IO wasn't
> that
> > much faster - until I vacuumed the database - which increased IO 10X to
> 150
> > MB/sec - with the same CPU utilization.
> >
> > This is good news for the final implementation of this read-only
> database -
> > but still a dilemma at the data load phase. After a ".vacuum" -
> issuing a
> > single DML against a table - even a DELETE which deletes no rows at all -
> > causes IO to drop back down to 15 MB/sec - on the table I'm selecting /
> > DMLing - which makes the data loading / cleansing phase very long.
>
> Nice description of your situation and requirements.  Makes it easier to
> answer your questions.
>
> > So I have 2 questions -
> >
> > (1) Why would simple DML cause such an extreme slowdown as compared with
> > "post vaccuum" speeds ?
>
> Spins.  Reading one area of the database file means waiting for rotations
> of the disk.  If the data you need is fragmented you end up wasting a lot
> of time.  You can't speed it up because you are just sitting there waiting
> for the disk to turn to be in the right place.  Instead of using a RAID try
> using a non-episodic medium like a solid-state storage instead.
>
> > (2) Any knobs to turn to try and maintain the higher speeds post DML -
> > without resorting to ".vacuum" ?
>
> There are a ton of things which will help but probably not enough to make
> it worthwhile for a one-shot job.  You can presort your data into batches.
>  Searching an index which is already in order is faster.  You can close the
> database, use your OS' facilities to defragment the file, then reopen the
> database.  (This helps a lot under Windows which is very sensitive to
> fragmentation, somewhat less for other OSen.)  You can do your initial
> insertions into smaller databases then merge them.
>
> But your telling of your situation suggests to me that this isn't worth
> doing.  You have a one-time-only activity.  It's probably gonna take you
> longer to do the programming than you'll save.
>
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Speeding up Sqlite reads after DML

2012-02-02 Thread Udi Karni
Given how clever and compelling Sqlite is - I am testing how it scales to
tables in the 100GB / 200 million row range. This is for a strictly "read
only" application - but first the tables must be populated in a one time
process. As is often the case with Big Data - the data is a little dirty -
so the process involves importing - selecting - counting - inspecting -
updating some rows - deleting some rows - selecting - counting -
inspecting, etc. until clean.

Placing the Sqlite database on a traditional C: drive - IO was too slow. At
15 MB/sec - reading a 50GB table would take an hour. So I moved it to an
external RAID array where I ran across an interesting find. IO wasn't that
much faster - until I vacuumed the database - which increased IO 10X to 150
MB/sec - with the same CPU utilization.

This is good news for the final implementation of this read-only database -
but still a dilemma at the data load phase. After a ".vacuum" - issuing a
single DML against a table - even a DELETE which deletes no rows at all -
causes IO to drop back down to 15 MB/sec - on the table I'm selecting /
DMLing - which makes the data loading / cleansing phase very long.

So I have 2 questions -

(1) Why would simple DML cause such an extreme slowdown as compared with
"post vaccuum" speeds ?

(2) Any knobs to turn to try and maintain the higher speeds post DML -
without resorting to ".vacuum" ?

Thanks,

Udi
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] 3.7.9 Import problem - a test case

2012-01-16 Thread Udi Karni
Thanks Dr. Hipp !

(1) Wow - what fast turnaround in the share-ware world  !!!

(2) I am new to share-ware. I don't compile my own code. How long between
"checked in" and "included in the shell-for-windows on the download page" ?

On Mon, Jan 16, 2012 at 8:57 AM, Richard Hipp  wrote:

> Changes to the command-line shell so that it understands quoted fields have
> now been checked in.
>
> On Mon, Jan 16, 2012 at 11:49 AM, Udi Karni  wrote:
>
> > Oliver - thanks !
> >
> > Hm so there is no way to specify (borrowing syntax from other
> DBs)
> > something like - "DELIMITED BY ',' ENCLOSED BY ' " ' ?
> >
> > On Mon, Jan 16, 2012 at 8:32 AM, Oliver Peters 
> wrote:
> >
> > > Am 16.01.2012 17:10, schrieb Udi Karni:
> > >
> > >  Simon - thanks - but this used to work on 3.7.7.1 (I believe was the
> > last
> > >> version I downloaded and unfortunately replaced with the latest...)
> > >>
> > >
> > >
> > > afaik this never worked with the original CLI (might be that you have
> > used
> > > a fork that can do this)
> > >
> > > I see 3 solutions:
> > >
> > > if you can influence the delimiter take a reliable delimiter (a string
> > > you're not using in your file - maybe a |)  and export without " as
> > markers
> > > for text columns
> > >
> > > if you can't influence the look of the source file and don't mind doing it
> > > manually you can import into a spreadsheet program and use a reliable
> > > delimiter without using "
> > >
> > > if you like scripting you can create a little program with your
> favorite
> > > language
> > >
> > > greetings
> > > Oliver
> > >
> > >
> > >
> > >> How, then, do you import character columns that contain commas (like
> > >> addresses) from a .csv which uses commas as a delimiter?
> > >>
> > >> Thanks !
> > >>
> > >> On Mon, Jan 16, 2012 at 7:43 AM, Simon Slavin
> > >>  wrote:
> > >>
> > >>
> > >>> On 16 Jan 2012, at 3:34pm, Udi Karni wrote:
> > >>>
> > >>>  "COL1","COL2"
> > >>>> "",""
> > >>>> "XX,X",""
> > >>>>
> > >>>
> > >>> Sorry, but the shell tool doesn't understand quotes in csv files.
> > >>>
> > >>> Simon.
> > >>>
> > >> __**_
> > > sqlite-users mailing list
> > > sqlite-users@sqlite.org
> > > http://sqlite.org:8080/cgi-**bin/mailman/listinfo/sqlite-**users<
> > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users>
> > >
> > ___
> > sqlite-users mailing list
> > sqlite-users@sqlite.org
> > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> >
>
>
>
> --
> D. Richard Hipp
> d...@sqlite.org
>  ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] 3.7.9 Import problem - a test case

2012-01-16 Thread Udi Karni
Oliver - thanks !

Hm so there is no way to specify (borrowing syntax from other DBs)
something like - "DELIMITED BY ',' ENCLOSED BY ' " ' ?
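
For what it's worth, a minimal sketch of the pipe-delimiter workaround Oliver
describes below, assuming the source can be re-exported with '|' and without
quote characters (file and table names are hypothetical):

sqlite> .separator '|'
sqlite> .import c:/data_csv/xyz.psv xyz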

On Mon, Jan 16, 2012 at 8:32 AM, Oliver Peters  wrote:

> Am 16.01.2012 17:10, schrieb Udi Karni:
>
>  Simon - thanks - but this used to work on 3.7.7.1 (I believe was the last
>> version I downloaded and unfortunately replaced with the latest...)
>>
>
>
> afaik this never worked with the original CLI (might be that you have used
> a fork that can do this)
>
> I see 3 solutions:
>
> if you can influence the delimiter take a reliable delimiter (a string
> you're not using in your file - maybe a |)  and export without " as markers
> for text columns
>
> if you can't influence the look of the source file and don't mind doing it
> manually you can import into a spreadsheet program and use a reliable
> delimiter without using "
>
> if you like scripting you can create a little program with your favorite
> language
>
> greetings
> Oliver
>
>
>
>> How, then, do you import character columns that contain commas (like
>> addresses) from a .csv which uses commas as a delimiter?
>>
>> Thanks !
>>
>> On Mon, Jan 16, 2012 at 7:43 AM, Simon Slavin
>>  wrote:
>>
>>
>>> On 16 Jan 2012, at 3:34pm, Udi Karni wrote:
>>>
>>>  "COL1","COL2"
>>>> "",""
>>>> "XX,X",""
>>>>
>>>
>>> Sorry, but the shell tool doesn't understand quotes in csv files.
>>>
>>> Simon.
>>>
>> __**_
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-**bin/mailman/listinfo/sqlite-**users<http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users>
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] 3.7.9 Import problem - a test case

2012-01-16 Thread Udi Karni
Simon - thanks - but this used to work on 3.7.7.1 (I believe it was the last
version I downloaded and unfortunately replaced with the latest...)

How, then, do you import character columns that contain commas (like
addresses) from a .csv which uses commas as a delimiter?

Thanks !

On Mon, Jan 16, 2012 at 7:43 AM, Simon Slavin  wrote:

>
> On 16 Jan 2012, at 3:34pm, Udi Karni wrote:
>
> > "COL1","COL2"
> > "",""
> > "XX,X",""
>
> Sorry, but the shell tool doesn't understand quotes in csv files.
>
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] 3.7.9 Import problem - a test case

2012-01-16 Thread Udi Karni
C:\>type k:\data\test.csv
"COL1","COL2"
"",""
"XX,X",""

C:\>sqlite3 k:\db\testdb

SQLite version 3.7.9 2011-11-01 00:52:41
Enter ".help" for instructions
Enter SQL statements terminated with a ";"

sqlite> .separator ','
sqlite>
sqlite> CREATE TABLE TEST (COL1 CHAR (4), COL2 CHAR (4));
sqlite>
sqlite> .import k:\\data\\test.csv TEST

Error: k:\data\test.csv line 3: expected 2 columns of data but found 3

===

The 1st line is the titles and is accepted.
The 2nd line has 2 simple character values and is accepted.
The 3rd line is rejected because sqlite seems to take the comma in the
first column as a delimiter, thinking there are 3 values in that line while
it expects only 2.

This used to work in 3.7.7.1.

??
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] 3.7.9 Import problem

2012-01-15 Thread Udi Karni
Hello,

I upgraded from 3.7.7 to 3.7.9 and suddenly the import of character columns
that contain commas from a comma delimited .csv file doesn't work.

The character columns are enclosed in double quotes and may contain commas
which are part of the data. Yet sqlite seems to take them as delimiters and
assumes I have an "extra" column in the .csv file.

What has changed? Any way to make sqlite ignore commas within the quoted
strings?

Thanks !
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users