Re: [sqlite] Countdown hold. Was: Testing begins on the SQLite "pi" release

2016-08-04 Thread Richard Hipp
On 8/2/16, Richard Hipp  wrote:
> Testing for the SQLite "pi" release is on hold pending the resolution
> of a design question.

The countdown to the "pi" release is now resuming.  The release
candidate is the current trunk version of Fossil, which is also in the
"Prerelease Snapshot" of the http://sqlite.org/download.html page.

The 3.14 release will occur when the checklist at
https://www.sqlite.org/checklists/314/index goes all-green.

If you have any issues or concerns with the code in the Prerelease
Snapshot, please speak up without delay!

-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Education Opportunity Alert: The SQL Primer

2016-08-04 Thread r . a . nagy
Thanks!

=)


Sent from my iPhone

> On Aug 4, 2016, at 5:38 PM, jungle Boogie  wrote:
> 
>> On 4 August 2016 at 08:14, R.A. Nagy  wrote:
>> Comments & suggestions for improvement are both respected and
>> appreciated here, as usual!
> 
> WOW! Such great energy and enthusiasm in the primer video!
> I'm looking forward to the rest.
> 
> Best,
> sean
> 
> 
> -- 
> ---
> inum: 883510009027723
> sip: jungleboo...@sip2sip.info


Re: [sqlite] Education Opportunity Alert: The SQL Primer

2016-08-04 Thread jungle Boogie
On 4 August 2016 at 08:14, R.A. Nagy  wrote:
> Comments & suggestions for improvement are both respected and
> appreciated here, as usual!

WOW! Such great energy and enthusiasm in the primer video!
I'm looking forward to the rest.

Best,
sean


-- 
---
inum: 883510009027723
sip: jungleboo...@sip2sip.info


Re: [sqlite] Education Opportunity Alert: The SQL Primer

2016-08-04 Thread r . a . nagy
Enjoy yourself - by all means, do!

Sent from my iPhone

> On Aug 4, 2016, at 1:37 PM, Stephen Chrzanowski  wrote:
> 
> *sits back and waits for the people to cringe about Sequal or Ess-Que-El
> and smirks*
> 
>> On Thu, Aug 4, 2016 at 11:14 AM, R.A. Nagy  wrote:
>> 
>> Hi all,
>> 
>> Let me begin by thanking everyone for the feedback on the YouTube video
>> effort!
>> 
>> For those who would like to revisit our relatively comprehensive update to
>> a professional introduction to SQL & SQLite, please feel free to share &
>> enjoy =) this latest:
>> 
>> https://www.youtube.com/playlist?list=PLItP5KoawLqkPV2jqAVCH79fZGO5k0Uzy
>> 
>> Comments & suggestions for improvement are both respected and
>> appreciated here, as usual!
>> 
>> 
>> Cheers,
>> 
>> Randall Nagy
>> President, Soft9000.com


Re: [sqlite] Education Opportunity Alert: The SQL Primer

2016-08-04 Thread Stephen Chrzanowski
*sits back and waits for the people to cringe about Sequal or Ess-Que-El
and smirks*

On Thu, Aug 4, 2016 at 11:14 AM, R.A. Nagy  wrote:

> Hi all,
>
> Let me begin by thanking everyone for the feedback on the YouTube video
> effort!
>
> For those who would like to revisit our relatively comprehensive update to
> a professional introduction to SQL & SQLite, please feel free to share &
> enjoy =) this latest:
>
> https://www.youtube.com/playlist?list=PLItP5KoawLqkPV2jqAVCH79fZGO5k0Uzy
>
> Comments & suggestions for improvement are both respected and
> appreciated here, as usual!
>
>
> Cheers,
>
> Randall Nagy
> President, Soft9000.com


Re: [sqlite] newbie has waited days for a DB build to complete. what's up with this.

2016-08-04 Thread Kevin O'Gorman
The metric for feasibility is coding ease, not runtime.  I'm the
bottleneck, not the machine, at least at this point.

As for adding rows, it will be about like this time: a billion or so at a
time.  But there's no need to save the old data.  Each round can be
separate except for a persistent "solutions" table of much more modest
size.  I've been doing this for a while now, and the solutions file has
only 10 million or so lines, each representing a game position for which
optimum moves are known.  Getting this file to include the starting
position is the point of the exercise.

If I ever get to anything like "production" in this project, I expect it to
run for maybe three years...  That's after I tweak it for speed.

Background: in production, this will be running on a dual-Xeon with 16
cores (32 hyperthreads) and 1/4 TiB RAM.  It has sequential file update
through Linux flock() calls at the moment.  The code is bash gluing
together a collection of UNIX utilities and some custom C code.  The C is
kept as simple as possible, to minimize errors.

As you may surmise, this "hobby" is important to me.


On Thu, Aug 4, 2016 at 9:09 AM, R Smith  wrote:

>
>
> On 2016/08/04 5:56 PM, Kevin O'Gorman wrote:
>
>> On Thu, Aug 4, 2016 at 8:29 AM, Dominique Devienne 
>> wrote:
>>
>>
>> It's even less dense than that.  Each character has only 3 possible
>> values,
>> and thus it's pretty easy to compress down to 2 bits each, for a 16 byte
>> blob.
>> It's just hard to do that without a bunch of SQLite code I'd have to learn
>> how to write.  The current effort amounts to a feasibility study, and I
>> want
>> to keep it as simple as possible.
>>
>
> A feasibility study using equipment that is hamstrung by weights it
> won't carry in the real situation is not an accurate study.
>
> It's like studying fuel consumption on a different kind of road surface,
> but for the test purposes, the cars had to tow caravans containing their
> testing equipment - the study will not look feasible at all.
>
> It might of course be that the feasibility you are studying is completely
> unrelated to the data handling - in which case the point is moot.
>
> Let us know how it goes :)
> Ryan
>



-- 
#define QUESTION ((bb) || (!bb)) /* Shakespeare */


Re: [sqlite] newbie has waited days for a DB build to complete. what's up with this.

2016-08-04 Thread R Smith



On 2016/08/04 5:56 PM, Kevin O'Gorman wrote:

On Thu, Aug 4, 2016 at 8:29 AM, Dominique Devienne 
wrote:


It's even less dense than that.  Each character has only 3 possible values,
and thus it's pretty easy to compress down to 2 bits each, for a 16 byte
blob.
It's just hard to do that without a bunch of SQLite code I'd have to learn
how to write.  The current effort amounts to a feasibility study, and I want
to keep it as simple as possible.


A feasibility study using equipment that is hamstrung by weights it
won't carry in the real situation is not an accurate study.


It's like studying fuel consumption on a different kind of road surface, 
but for the test purposes, the cars had to tow caravans containing their 
testing equipment - the study will not look feasible at all.


It might of course be that the feasibility you are studying is 
completely unrelated to the data handling - in which case the point is moot.


Let us know how it goes :)
Ryan


Re: [sqlite] newbie has waited days for a DB build to complete. what's up with this.

2016-08-04 Thread Kevin O'Gorman
On Thu, Aug 4, 2016 at 8:29 AM, Dominique Devienne 
wrote:

> On Thu, Aug 4, 2016 at 5:05 PM, Kevin O'Gorman 
> wrote:
>
> > 3. Positions are 64 bytes always, so your size guesses are right.  They
> are
> > in no particular order.  I like the suggestion of a separate position
> > table, because they're going to appear in multiple qmove records, with an
> > average of about 3 or 4 appearances I think.  Maybe more.
> >
>
> 3x or 4x duplication doesn't sound like a lot. What you'd gain in the moves
> table,
> you'd lose having to join to the positions table I suspect. Higher level
> duplicates, maybe.
>
> 64-bytes always? Is that a human readable string, or some 'code' in
> hexadecimal?
> If the latter, use a blob, which requires only 32-bytes to store the same
> info. You can
> use the blob literal notation x'abcdef01' (that's a 4-bytes blob).
>

It's even less dense than that.  Each character has only 3 possible values,
and thus it's pretty easy to compress down to 2 bits each, for a 16 byte
blob.
It's just hard to do that without a bunch of SQLite code I'd have to learn
how to write.  The current effort amounts to a feasibility study, and I want
to keep it as simple as possible.
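The 2-bits-per-character packing described above can be sketched briefly. This is only an illustration, not the poster's actual code; the alphabet ".XO" is a hypothetical stand-in for the three values each character can take:

```python
def pack_position(pos, alphabet=".XO"):
    """Pack a 64-char position over a 3-symbol alphabet into a 16-byte
    blob, 2 bits per character (the fourth 2-bit value is unused)."""
    assert len(pos) == 64
    bits = 0
    for ch in pos:
        bits = (bits << 2) | alphabet.index(ch)
    return bits.to_bytes(16, "big")

def unpack_position(blob, alphabet=".XO"):
    """Inverse of pack_position: recover the 64-character string."""
    bits = int.from_bytes(blob, "big")
    # The first character was shifted in first, so it sits in the
    # most significant 2 bits.
    return "".join(alphabet[(bits >> shift) & 3]
                   for shift in range(126, -2, -2))

p = "X" * 10 + "O" * 10 + "." * 44
blob = pack_position(p)
assert len(blob) == 16            # 64 chars * 2 bits = 128 bits = 16 bytes
assert unpack_position(blob) == p
```

Stored as a 16-byte BLOB column, this quarters the space of the 64-byte text form, at the cost of packing/unpacking at the application boundary.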


> Finally, note that if your program writes a huge text file with all your
> values, that you
> .import into sqlite3 as you showed, you're IMHO wasting time, since you
> can't use
> prepared statements and binds, and you also force SQLite's SQL parser to
> parse
> a huge amount of text. By embedding SQLite into your generator program, you
> remove all parsing except for a trivial "insert into qmoves values (:1, :2,
> ...)", and
> all the rest is sqlite_bind*() and co. calls. (and if blob situation for
> positions, then
> you can bind the 32-bytes blob directly, no need to convert/parse to/from
> hex).
>

I understand the concept of prepared statements in principle, sort of,
but I don't really know how binds work, so I'm not quite ready to write
the code you allude to.  And I'd no longer be able to use sqlite3 at all
for simple experiments -- I'd have to be blobbing and de-blobbing to
make sense of anything.


>
> My $0.02. --DD



-- 
#define QUESTION ((bb) || (!bb)) /* Shakespeare */


Re: [sqlite] newbie has waited days for a DB build to complete. what's up with this.

2016-08-04 Thread R Smith



On 2016/08/04 5:05 PM, Kevin O'Gorman wrote:

Lots of answers, so thanks all around.  Some more info:

1. All partitions have at least 3 GB free, and it's not changing.  /tmp is
3 TiB and empty.
2. I have a RAID partition, for size, but no RAID controller.  As a hobby
project, I don't have spare parts, and I fear the results of a failure of a
hardware RAID without a spare, so I use Linux mdadm to manage software RAID
across three 4-TB drives.
3. Positions are 64 bytes always, so your size guesses are right.  They are
in no particular order.  I like the suggestion of a separate position
table, because they're going to appear in multiple qmove records, with an
average of about 3 or 4 appearances I think.  Maybe more.

I'm going to retry, using some of the suggestions above: smaller
transactions, IGNORE, position table rowids in the moves table, smaller
subsets being loaded, developing ideas of how time depends on data size.

If it doesn't go well fairly quickly, I'll probably go back to flat files
and writing the queries the hard way.  At least I know what I'm dealing
with there.


Note that there is nothing about SQLite that isn't working extremely
well for your purpose - going back to flat files is akin to going back
to the dark ages.  Databases of this size (and even much larger) work
perfectly well all around, with some people here able to quote you even
more staggering figures - but it does require some special processing,
which might be thwarted by your current system layout or design choices.
However, if done right, it should not be significantly slower than
writing flat files - so try to persist in finding out what "done right" entails.


Remember that once this has worked - you can query the data with SQL... 
that is an amazingly powerful feature over flat files, and once an index 
exists (however slow it took to make), queries will be lightning fast, 
something a flat-file can never give you.




QUESTIONS:
If it's best in general to build indexes as a separate step, does this also
apply to primary indexes?  Can a table without a primary index have one
added later?  Isn't ROWID the real primary, presuming it has ROWIDs?  And
if so, then is a primary index on a ROWID table just for compliance with
standard SQL, and really no better than any other index?  Obviously, I'm a
bit confused about this.


A primary index is nothing other than a normal Unique index which has a 
guaranteed 1-to-1 key-to-row ratio and has look-up priority status. All 
SQL tables MUST have such a key to be able to guarantee access to any 
individual row, and if you omit the primary key bit, then some internal 
mechanism is used for it - in SQLite's case, this mechanism is called 
the row_id. A nice recent feature of SQLite allows you to get rid of 
this row_id overhead by explicitly specifying a Primary Key and then 
WITHOUT ROWID after the Table declaration.


The advantage of NOT using an explicit primary key from the start is
that the row_id can simply be incremented during insert statements,
since it is auto-supplied by the DB engine and is a plain INT.  A
primary key has to be sorted (or at least, the correct sort-order insert
position has to be determined) for every added row.  This costs a good
bit of time on really large insert operations.  The single bulk sort
while building the PK index after the fact takes far less time than the
repeated look-up-and-insert operations for the key while populating
the table.


I hope this answers the question, but feel free to ask more if anything 
remains unclear.




While I'm at it, I may as well ask if ROWID has any physical significance,
such that a VACUUM operation might change it.  Or is it just an arbitrary
ID inserted by SQLite and added to each record when they exist at all.


It is arbitrary, it is supplied by the DB engine in general, but it will
never be changed by anything! That would go against all SQL and RDBMS
premises.
You can supply it yourself too (and I often advocate this as the better
practice).


If you declare a table with a primary key specified /exactly/ like this:
CREATE TABLE myTable ("ID" INTEGER PRIMARY KEY,  )
Then the "ID" in this case becomes an alias for the row_id, and
reading or manipulating its value is in fact reading or manipulating
the actual row_id.


(The "ID" can be anything else you like, but the "INTEGER PRIMARY KEY" 
part needs to be exactly written like that).
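The rowid-aliasing behaviour described above is easy to observe. A minimal sketch using Python's stdlib sqlite3 module (the table and column names are made up for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# "ID" INTEGER PRIMARY KEY, spelled exactly so, becomes an alias for the rowid.
con.execute("CREATE TABLE myTable (ID INTEGER PRIMARY KEY, name TEXT)")
con.execute("INSERT INTO myTable (name) VALUES ('a')")        # auto-assigned ID
con.execute("INSERT INTO myTable (ID, name) VALUES (100, 'b')")  # explicit ID

rows = con.execute(
    "SELECT rowid, ID, name FROM myTable ORDER BY ID").fetchall()
# rowid and ID are the same value for every row: they are one column.
assert rows == [(1, 1, 'a'), (100, 100, 'b')]
```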




The current dataset is intended to solve one particular issue in the
overall project.  It looks like I'd want to build each such dataset
separately, as there will likely be a few hundred, and I gather that adding
to these tables will be pretty slow once the indexes have been built.  Or
is it sensible to drop indexes, add data and rebuild?


No this is not true - it will be really fast to add another few rows... 
it's just slow when you add the initial few zillion rows due to simple 
laws of quantity. If however you will be adding rows at an amazing rate, 
I 

Re: [sqlite] newbie has waited days for a DB build to complete. what's up with this.

2016-08-04 Thread Dominique Devienne
On Thu, Aug 4, 2016 at 5:29 PM, Dominique Devienne 
wrote:

> [...] you also force SQLite's SQL parser to parse a huge amount of text.
> [...]
>

OK, maybe not the SQL parser, depends what you write out and the .import
mode
(I guess, didn't look into the details). But for sure "some" parser (CSV,
SQL, or else...)
So I'm suggesting to eliminate the text "middle man", and use SQLite
directly with native values
(ints, strings, blobs, whatever). Of course I assume your program is C/C++,
which given
your sizes, it better be :). --DD


Re: [sqlite] newbie has waited days for a DB build to complete. what's up with this.

2016-08-04 Thread Dominique Devienne
On Thu, Aug 4, 2016 at 5:05 PM, Kevin O'Gorman 
wrote:

> 3. Positions are 64 bytes always, so your size guesses are right.  They are
> in no particular order.  I like the suggestion of a separate position
> table, because they're going to appear in multiple qmove records, with an
> average of about 3 or 4 appearances I think.  Maybe more.
>

3x or 4x duplication doesn't sound like a lot. What you'd gain in the moves
table,
you'd lose having to join to the positions table I suspect. Higher level
duplicates, maybe.

64-bytes always? Is that a human readable string, or some 'code' in
hexadecimal?
If the latter, use a blob, which requires only 32-bytes to store the same
info. You can
use the blob literal notation x'abcdef01' (that's a 4-bytes blob).

Finally, note that if your program writes a huge text file with all your
values, that you
.import into sqlite3 as you showed, you're IMHO wasting time, since you
can't use
prepared statements and binds, and you also force SQLite's SQL parser to
parse
a huge amount of text. By embedding SQLite into your generator program, you
remove all parsing except for a trivial "insert into qmoves values (:1, :2,
...)", and
all the rest is sqlite_bind*() and co. calls. (and if blob situation for
positions, then
you can bind the 32-bytes blob directly, no need to convert/parse to/from
hex).
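The bind-native-values approach Dominique describes can be sketched with Python's stdlib sqlite3 module (the C API's sqlite3_prepare_v2()/sqlite3_bind_*() calls work on the same principle; the blob contents here are illustrative, not the poster's actual data, and the schema follows the blob variant suggested above rather than the original CHAR(64) one):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE qmoves (qfrom BLOB, qmove INT, qto BLOB, qweight INT)")

rows = [
    (b"\x01" * 16, 7, b"\x02" * 16, 3),
    (b"\x03" * 16, 9, b"\x04" * 16, 5),
]
with con:
    # One prepared statement, bound repeatedly: no SQL or CSV text to
    # parse per row, and blobs are stored as-is rather than as hex text.
    con.executemany("INSERT INTO qmoves VALUES (?, ?, ?, ?)", rows)

n, = con.execute("SELECT count(*) FROM qmoves").fetchone()
assert n == 2
blob, = con.execute("SELECT qto FROM qmoves WHERE qmove = 7").fetchone()
assert blob == b"\x02" * 16   # round-trips byte-for-byte, no conversion
```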

My $0.02. --DD


[sqlite] Education Opportunity Alert: The SQL Primer

2016-08-04 Thread R.A. Nagy
Hi all,

Let me begin by thanking everyone for the feedback on the YouTube video
effort!

For those who would like to revisit our relatively comprehensive update to
a professional introduction to SQL & SQLite, please feel free to share &
enjoy =) this latest:

https://www.youtube.com/playlist?list=PLItP5KoawLqkPV2jqAVCH79fZGO5k0Uzy

Comments & suggestions for improvement are both respected and
appreciated here, as usual!


Cheers,

Randall Nagy
President, Soft9000.com


Re: [sqlite] newbie has waited days for a DB build to complete. what's up with this.

2016-08-04 Thread Kevin O'Gorman
Lots of answers, so thanks all around.  Some more info:

1. All partitions have at least 3 GB free, and it's not changing.  /tmp is
3 TiB and empty.
2. I have a RAID partition, for size, but no RAID controller.  As a hobby
project, I don't have spare parts, and I fear the results of a failure of a
hardware RAID without a spare, so I use Linux mdadm to manage software RAID
across three 4-TB drives.
3. Positions are 64 bytes always, so your size guesses are right.  They are
in no particular order.  I like the suggestion of a separate position
table, because they're going to appear in multiple qmove records, with an
average of about 3 or 4 appearances I think.  Maybe more.

I'm going to retry, using some of the suggestions above: smaller
transactions, IGNORE, position table rowids in the moves table, smaller
subsets being loaded, developing ideas of how time depends on data size.

If it doesn't go well fairly quickly, I'll probably go back to flat files
and writing the queries the hard way.  At least I know what I'm dealing
with there.

QUESTIONS:
If it's best in general to build indexes as a separate step, does this also
apply to primary indexes?  Can a table without a primary index have one
added later?  Isn't ROWID the real primary, presuming it has ROWIDs?  And
if so, then is a primary index on a ROWID table just for compliance with
standard SQL, and really no better than any other index?  Obviously, I'm a
bit confused about this.

While I'm at it, I may as well ask if ROWID has any physical significance,
such that a VACUUM operation might change it.  Or is it just an arbitrary
ID inserted by SQLite and added to each record when they exist at all.

The current dataset is intended to solve one particular issue in the
overall project.  It looks like I'd want to build each such dataset
separately, as there will likely be a few hundred, and I gather that adding
to these tables will be pretty slow once the indexes have been built.  Or
is it sensible to drop indexes, add data and rebuild?

On Thu, Aug 4, 2016 at 7:27 AM, Jim Callahan  wrote:

> Temp Files
> Have you checked how much storage is available to the temporary file
> locations?
> The temporary file locations are different depending on the OS, build, VFS
> and PRAGMA settings.
> See the last section "5.0 Temporary File Storage Locations" of:
> https://www.sqlite.org/tempfiles.html
>
>
> The database was growing for about 1-1/2 days.  Then its journal
> > disappeared, the file size dropped to zero, but sqlite3 is still running
> > 100% CPU time, now for a total of 3800+ minutes (63+ hours).  The
> database
> > is still locked, but I have no idea what sqlite3 is doing, or if it will
> > ever stop.  All partitions still have lots of space left (most of this is
> > running in a RAID partition of 11 TiB).  Here's what I gave to sqlite3 on
> > my Linux system:
>
>
> You might have a huge storage allocation for the main file and log, but
> some other temp file might be being dumped
> to a more constrained storage location.
>
> RAM
> Since you are using RAID disk controller; I assume you have 64 bit CPU and
> more than 8 GB of RAM?
> If you have 8 GB or more of RAM would it help to use an in memory database?
>
> Transactions
> Are you using explicit or implicit transactions?
> https://www.sqlite.org/lang_transaction.html
>
>
> Steps
> Agree with Darren Duncan and Dr. Hipp you may want to have at least 3
> separate steps
> (each step should be a separate transaction):
>
> 1. Simple load
> 2. Create additional column
> 3. Create index
>
> Have you pre-defined the table you are loading data into? (step 0 CREATE
> TABLE)
>
> If "Step 1 Simple Load" does not complete; then may want to load a fixed
> number of rows into separate tables (per Darren Duncan)  and then combine
> using an APPEND
> or a UNION query (doing so before steps 2 and 3).
>
> HTH
>
> Jim Callahan
> Data Scientist
> Orlando, FL
>
>
>
>
> On Wed, Aug 3, 2016 at 11:00 PM, Kevin O'Gorman 
> wrote:
>
> > I'm working on a hobby project, but the data has gotten a bit out of
> hand.
> > I thought I'd put it in a real database rather than flat ASCII files.
> >
> > I've got a problem set of about 1 billion game positions and 187GB to
> work
> > on (no, I won't have to solve them all) that took about 4 hours for a
> > generator program just to write.  I wrote code to turn them into
> something
> > SQLite could import.  Actually, it's import, build a non-primary index,
> and
> > alter table to add a column, all in sqlite3.
> >
> > The database was growing for about 1-1/2 days.  Then its journal
> > disappeared, the file size dropped to zero, but sqlite3 is still running
> > 100% CPU time, now for a total of 3800+ minutes (63+ hours).  The
> database
> > is still locked, but I have no idea what sqlite3 is doing, or if it will
> > ever stop.  All partitions still have lots of space left (most of this is
> > running in a RAID partition of 11 TiB).  

Re: [sqlite] newbie has waited days for a DB build to complete. what's up with this.

2016-08-04 Thread Jim Callahan
Temp Files
Have you checked how much storage is available to the temporary file
locations?
The temporary file locations are different depending on the OS, build, VFS
and PRAGMA settings.
See the last section "5.0 Temporary File Storage Locations" of:
https://www.sqlite.org/tempfiles.html


The database was growing for about 1-1/2 days.  Then its journal
> disappeared, the file size dropped to zero, but sqlite3 is still running
> 100% CPU time, now for a total of 3800+ minutes (63+ hours).  The database
> is still locked, but I have no idea what sqlite3 is doing, or if it will
> ever stop.  All partitions still have lots of space left (most of this is
> running in a RAID partition of 11 TiB).  Here's what I gave to sqlite3 on
> my Linux system:


You might have a huge storage allocation for the main file and log, but
some other temp file might be being dumped
to a more constrained storage location.

RAM
Since you are using RAID disk controller; I assume you have 64 bit CPU and
more than 8 GB of RAM?
If you have 8 GB or more of RAM would it help to use an in memory database?

Transactions
Are you using explicit or implicit transactions?
https://www.sqlite.org/lang_transaction.html


Steps
Agree with Darren Duncan and Dr. Hipp you may want to have at least 3
separate steps
(each step should be a separate transaction):

1. Simple load
2. Create additional column
3. Create index

Have you pre-defined the table you are loading data into? (step 0 CREATE
TABLE)

If "Step 1 Simple Load" does not complete; then may want to load a fixed
number of rows into separate tables (per Darren Duncan)  and then combine
using an APPEND
or a UNION query (doing so before steps 2 and 3).

HTH

Jim Callahan
Data Scientist
Orlando, FL




On Wed, Aug 3, 2016 at 11:00 PM, Kevin O'Gorman 
wrote:

> I'm working on a hobby project, but the data has gotten a bit out of hand.
> I thought I'd put it in a real database rather than flat ASCII files.
>
> I've got a problem set of about 1 billion game positions and 187GB to work
> on (no, I won't have to solve them all) that took about 4 hours for a
> generator program just to write.  I wrote code to turn them into something
> SQLite could import.  Actually, it's import, build a non-primary index, and
> alter table to add a column, all in sqlite3.
>
> The database was growing for about 1-1/2 days.  Then its journal
> disappeared, the file size dropped to zero, but sqlite3 is still running
> 100% CPU time, now for a total of 3800+ minutes (63+ hours).  The database
> is still locked, but I have no idea what sqlite3 is doing, or if it will
> ever stop.  All partitions still have lots of space left (most of this is
> running in a RAID partition of 11 TiB).  Here's what I gave to sqlite3 on
> my Linux system:
>
> time sqlite3 qubic.db <<EOF
> BEGIN EXCLUSIVE TRANSACTION;
> DROP TABLE IF EXISTS qmoves;
> CREATE TABLE qmoves (
>   qfrom CHAR(64),
>   qmove INT,
>   qto   CHAR(64),
>   qweight INT,
>   PRIMARY KEY (qfrom, qmove) ON CONFLICT ROLLBACK
> );
> CREATE INDEX IF NOT EXISTS qmoves_by_dest ON qmoves (
>   qto,
>   qweight
> );
> CREATE TABLE IF NOT EXISTS qposn (
>   qposn CHAR(64) PRIMARY KEY ON CONFLICT ROLLBACK,
>   qmaxval INT,
>   qmove INT,
>   qminval INT,
>   qstatus INT
> );
> .separator " "
> .import am.all qmoves
> ALTER TABLE qmoves ADD COLUMN qstatus INT DEFAULT NULL;
> .schema
> COMMIT TRANSACTION;
>
> EOF
>
> Any clues, hints, or advice?
>
>
> --
> #define QUESTION ((bb) || (!bb)) /* Shakespeare */


Re: [sqlite] newbie has waited days for a DB build to complete. what's up with this.

2016-08-04 Thread Richard Hipp
On 8/4/16, Wade, William  wrote:
>
> I believe that with SQLite, if you don't specify WITHOUT ROWID your "real"
> record order is based on rowid,

Correct

>
> In principle, indices can be created by writing the needed information
> (index key, record position) in the original order, and then sorting that
> into key-order. That can be done with many fewer random seeks (merge sorts
> involve mostly sequential reads and writes). I don't know if, or when,
> SQLite does that.
>

SQLite runs CREATE INDEX commands using an external merge sort, which
is what I think you are alluding to above.

But if the index already exists, and you are merely inserting new rows
into the table, then each index entry is inserted separately.  Each
such insert is an O(logN) operation.  Such isolated inserts typically
involve a lot of disk seeks and write amplification.  That is why we
recommend that you populate large tables first and then run CREATE
INDEX, rather than the other way around, whenever practical.
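The load-first-then-index recommendation looks like this in sketch form, again using Python's stdlib sqlite3 module with an illustrative table (not the poster's schema):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (a INT, b INT)")

with con:
    # Bulk-load with no secondary index in place: plain appends in
    # rowid order, no per-row B-tree maintenance for an index.
    con.executemany("INSERT INTO t VALUES (?, ?)",
                    ((i, i % 97) for i in range(10_000)))

# Build the index afterwards: one external merge sort instead of
# 10,000 separate O(log N) index inserts.
con.execute("CREATE INDEX t_by_b ON t (b)")

plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM t WHERE b = 42").fetchall()
# The query planner now searches via the index.
assert any("t_by_b" in row[-1] for row in plan)
```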

-- 
D. Richard Hipp
d...@sqlite.org


Re: [sqlite] newbie has waited days for a DB build to complete. what's up with this.

2016-08-04 Thread Wade, William
A lot of speculation here. I am certainly no SQLite expert.

Your input has 1g positions, taking 187gb, so averaging 187b/position. From 
your CREATE TABLE, it looks like to get that size most of your qfrom and qto 
are fairly long strings. I'm assuming there are a great many duplications in 
those positions. If so, put them into a separate position table { positionId 
INT, positionName TEXT }, with positionId as the primary key and positionName 
also being unique. This will be even more useful if you have a fixed set of 
possible positions, and you make it so that positionId is increasing whenever 
positionName is increasing. In your qmoves table, store positionId values, 
rather than positionName values. This saves a lot of space because no name is 
in the database more than once, and most of your space is related to names.
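The separate position table amounts to the following schema sketch, using Python's stdlib sqlite3 module; the table and column names follow the ones suggested above, and the helper function is a hypothetical illustration of the insert-or-reuse pattern:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE position (
        positionId   INTEGER PRIMARY KEY,      -- alias for the rowid
        positionName TEXT UNIQUE NOT NULL
    );
    CREATE TABLE qmove (
        qfromId INT REFERENCES position(positionId),
        qtoId   INT REFERENCES position(positionId),
        qweight INT
    );
""")

def position_id(name):
    # Insert the name at most once; later calls reuse the existing id.
    con.execute(
        "INSERT OR IGNORE INTO position (positionName) VALUES (?)", (name,))
    pid, = con.execute(
        "SELECT positionId FROM position WHERE positionName = ?",
        (name,)).fetchone()
    return pid

a = position_id("X" * 64)
b = position_id("O" * 64)
assert position_id("X" * 64) == a      # duplicate name reuses the same id
con.execute("INSERT INTO qmove VALUES (?, ?, ?)", (a, b, 1))
n, = con.execute("SELECT count(*) FROM position").fetchone()
assert n == 2                           # each 64-byte name stored only once
```

Each move row then carries two small integers instead of two 64-byte strings.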

Space is important, because at a few hundred GB, your database is probably 
bigger than all of your available fast caches, and you are probably storing 
your data on rotating storage. Writes to random positions might easily average 
10ms, and the import of most of your records may involve one or more such 
writes by the time indices are updated. Reducing sizes means fewer such writes, 
because things are more likely to fit in the various caches.

I believe that with SQLite, if you don't specify WITHOUT ROWID your "real" 
record order is based on rowid, so complete records are, perhaps, stored in 
import order. The PRIMARY KEY you specified is large (a name and an integer), 
so the index entries are likely about half as large as a complete record, and 
they don't fit in cache either. It is also likely that your input data was not 
in sorted order. That means that adding an entry to that index likely involves 
a write to a random position. 10 ms * 1e9 = 1e7 s, or nearly four months.

Your qmoves_by_dest index is about as large as your primary (it also has a 
position and an integer), so similar timings might apply.

I suggest timing your script on smaller inputs (1 GB, 10 GB, 20 GB) and seeing 
if there is a size where things suddenly get worse (available caches no longer 
big enough). See if my suggestions help those timings. See if WITHOUT ROWID 
helps those timings.

In principle, indices can be created by writing the needed information (index 
key, record position) in the original order, and then sorting that into 
key order. That can be done with far fewer random seeks (merge sorts involve 
mostly sequential reads and writes). I don't know if, or when, SQLite does that.

Regards,
Bill

-Original Message-
From: Kevin O'Gorman [mailto:kevinogorm...@gmail.com]
Sent: Wednesday, August 03, 2016 10:00 PM
To: sqlite-users
Subject: [sqlite] newbie has waited days for a DB build to complete. what's up 
with this.

I'm working on a hobby project, but the data has gotten a bit out of hand.
I thought I'd put it in a real database rather than flat ASCII files.

I've got a problem set of about 1 billion game positions and 187GB to work on 
(no, I won't have to solve them all) that took about 4 hours for a generator 
program just to write.  I wrote code to turn them into something SQLite could 
import.  Actually, it's import, build a non-primary index, and alter table to 
add a column, all in sqlite3.

The database was growing for about 1-1/2 days.  Then its journal disappeared, 
the file size dropped to zero, but sqlite3 is still running 100% CPU time, now 
for a total of 3800+ minutes (63+ hours).  The database is still locked, but I 
have no idea what sqlite3 is doing, or if it will ever stop.  All partitions 
still have lots of space left (most of this is running in a RAID partition of 
11 TiB).  Here's what I gave to sqlite3 on my Linux system:

time sqlite3 qubic.db 

Re: [sqlite] newbie has waited days for a DB build to complete. what's up with this.

2016-08-04 Thread Teg
Hello Kevin,

I'd write a utility to do it instead of using the command line tool, then
add logging to the program in order to note progress.

I like the idea of chopping it into smaller parts too.

"ON CONFLICT ROLLBACK"

You're doing one large transaction, and if it rolls back it'll have to undo
everything, right? I wonder if that's what you're seeing here. You might
want to try "IGNORE" to see if you can even run through all the data.

Writing a utility for this would let you manage the rollbacks too.

C


Wednesday, August 3, 2016, 11:00:12 PM, you wrote:

KOG> I'm working on a hobby project, but the data has gotten a bit out of hand.
KOG> I thought I'd put it in a real database rather than flat ASCII files.

KOG> I've got a problem set of about 1 billion game
KOG> positions and 187GB to work
KOG> on (no, I won't have to solve them all) that took about 4 hours for a
KOG> generator program just to write.  I wrote code to turn them into something
KOG> SQLite could import.  Actually, it's import, build a non-primary index, and
KOG> alter table to add a column, all in sqlite3.

KOG> The database was growing for about 1-1/2 days.  Then its journal
KOG> disappeared, the file size dropped to zero, but sqlite3 is still running
KOG> 100% CPU time, now for a total of 3800+ minutes (63+ hours).  The database
KOG> is still locked, but I have no idea what sqlite3 is doing, or if it will
KOG> ever stop.  All partitions still have lots of space left (most of this is
KOG> running in a RAID partition of 11 TiB).  Here's what I gave to sqlite3 on
KOG> my Linux system:

KOG> time sqlite3 qubic.db <<EOF
KOG> BEGIN EXCLUSIVE TRANSACTION;
KOG> DROP TABLE IF EXISTS qmoves;
KOG> CREATE TABLE qmoves (
KOG>   qfrom CHAR(64),
KOG>   qmove INT,
KOG>   qto   CHAR(64),
KOG>   qweight INT,
KOG>   PRIMARY KEY (qfrom, qmove) ON CONFLICT ROLLBACK
KOG> );
KOG> CREATE INDEX IF NOT EXISTS qmoves_by_dest ON qmoves (
KOG>   qto,
KOG>   qweight
KOG> );
KOG> CREATE TABLE IF NOT EXISTS qposn (
KOG>   qposn CHAR(64) PRIMARY KEY ON CONFLICT ROLLBACK,
KOG>   qmaxval INT,
KOG>   qmove INT,
KOG>   qminval INT,
KOG>   qstatus INT
KOG> );
KOG> .separator " "
KOG> .import am.all qmoves
KOG> ALTER TABLE qmoves ADD COLUMN qstatus INT DEFAULT NULL;
KOG> .schema
KOG> COMMIT TRANSACTION;

KOG> EOF

KOG> Any clues, hints, or advice?





-- 
 Tegmailto:t...@djii.com

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Tcl [db eval varname ...] non-array varname silently fails

2016-08-04 Thread Poor Yorick
If the array-name in a [db eval arrayname ...] command isn't actually an
array, sqlite silently fails.  The script is evaluated but the variable
doesn't contain values from the query:


package require sqlite3
sqlite3 db :memory:
db eval {
create table t (f)
; insert into t values ("h") , ("he")
}
set record 2
db eval {select * from t} record {
puts [list row [array get record]]
}

Of course, one reasonable answer is, "Then don't do that!", but it would be
less surprising if the underlying error propagated in this case.
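
In the meantime, one workaround (an untested sketch, but the usual Tcl idiom) is to make sure the name can actually be used as an array before the query runs:

```tcl
# Drop any existing scalar value so sqlite can treat the name as an array.
unset -nocomplain record
db eval {select * from t} record {
    puts [list row [array get record]]
}
```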


--
Poor Yorick


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] core dump when writing the DB in the middle of the long read

2016-08-04 Thread ChingChang Hsiao
More detailed core dump info.

(gdb) bt
#0  0x2c673d5c in memset () from /lib32/libc.so.6
#1  0x2c7b0fa8 in sqlite3VdbeHalt () from /ovn/lib/libsqlite3.so.3
#2  0x2c7b1904 in sqlite3VdbeReset () from /ovn/lib/libsqlite3.so.3
#3  0x2c7b55a8 in sqlite3_reset () from /ovn/lib/libsqlite3.so.3
#4  0x2c7b6888 in sqlite3Step () from /ovn/lib/libsqlite3.so.3
#5  0x2c7b6d54 in sqlite3_step () from /ovn/lib/libsqlite3.so.3
#6  0x2b3f91cc in SqlQuery::step(int) () from /ovn/lib/libplatform.so.1
#7  0x10189ea0 in printServiceConfig(void*, ArgumentArray*, unsigned char, 
char*, void**, char*) ()
#8  0x1018a3d4 in IsgShowCfgSvc ()
#9  0x2b8e1374 in OvnRcRunShowConfig () from /ovn/lib/libEngine.so.1
#10 0x101297c4 in IsgShowCfgMainConfig ()
#11 0x2b8e13f8 in OvnRcRunShowConfig () from /ovn/lib/libEngine.so.1
#12 0x10122b54 in IsgShowCfg ()
#13 0x2b8df200 in RcParseLine () from /ovn/lib/libEngine.so.1
#14 0x2b8da788 in RcFiniteStateMachine () from /ovn/lib/libEngine.so.1
#15 0x2b8d4298 in AllegroMainTask () from /ovn/lib/libEngine.so.1
#16 0x2b8d5b2c in c_main () from /ovn/lib/libEngine.so.1
#17 0x2b8e3af0 in app_main(int, char**) () from /ovn/lib/libEngine.so.1
#18 0x1004c198 in main ()

---

I need more debugging information.  Perhaps:

(1) Recompile libsqlite3.a from source code.  Using -O0 (not -O2) and -g.
(2) Rerun your program to crash
(3) Send me the new stack trace that shows exactly which line the error occurs 
on
(4) Also send the sqlite_source_id() for the specific version of SQLite you are 
using.

On 8/3/16, ChingChang Hsiao  wrote:
> Our sqlite version is "SQLite version 3.8.8.1".
> Modify a configuration data while reading a big configuration DB(show 
> configuration).  It goes to busyhandler and write is successful and 
> read configuration codes goes to core dump. Is there any idea why 
> going to core dump?  Is something to do with mutex handling? Thanks.
>
> ChingChang
>
> int32
> DbHandle::registerBusyHook( sqlite3* db, dbCallback_data_t *cbData ) {
>   sqlite3_busy_handler( db,
> ,
> (void*)cbData );
>   return 0;
> }
>
> (gdb) bt
> #0  0x2ce4f4fc in sqlite3_step () from /ovn/lib/libsqlite3.so.3
> #1  0x2ba471cc in SqlQuery::step(int) () from 
> /ovn/lib/libplatform.so.1
> #2  0x10189e50 in printServiceConfig(void*, ArgumentArray*, unsigned 
> char, char*, void**, char*) ()
> #3  0x1018a384 in IsgShowCfgSvc ()
> #4  0x2bf2f3f8 in OvnRcRunShowConfig () from /ovn/lib/libEngine.so.1
> #5  0x10129774 in IsgShowCfgMainConfig ()
> #6  0x2bf2f3f8 in OvnRcRunShowConfig () from /ovn/lib/libEngine.so.1
> #7  0x10122b04 in IsgShowCfg ()
> #8  0x2bf2d200 in RcParseLine () from /ovn/lib/libEngine.so.1
> #9  0x2bf28788 in RcFiniteStateMachine () from /ovn/lib/libEngine.so.1
> #10 0x2bf22298 in AllegroMainTask () from /ovn/lib/libEngine.so.1
> #11 0x2bf23b2c in c_main () from /ovn/lib/libEngine.so.1
> #12 0x2bf31af0 in app_main(int, char**) () from 
> /ovn/lib/libEngine.so.1
> #13 0x1004c148 in main ()
>
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>


--
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] newbie has waited days for a DB build to complete. what's up with this.

2016-08-04 Thread Simon Slavin

On 4 Aug 2016, at 4:00am, Kevin O'Gorman  wrote:

> I've got a problem set of about 1 billion game positions and 187GB to work
> on (no, I won't have to solve them all) that took about 4 hours for a
> generator program just to write.  I wrote code to turn them into something
> SQLite could import.  Actually, it's import, build a non-primary index, and
> alter table to add a column, all in sqlite3.

Hmm.  The closest I have to that is a 43 Gigabyte sqlite database which works 
fine.  In that case almost all the space is taken up by one thin table which 
has a ridiculous number of rows.

> The database was growing for about 1-1/2 days.  Then its journal
> disappeared, the file size dropped to zero, but sqlite3 is still running

I assume that those are what 'ls' is showing you.  Have you tried using 'lsof' 
or 'strace'?  Or monitoring the amount of free space on the disk to see if 
it's shrinking?

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users