Re: db new! performance

Joe Bogner Wed, 30 May 2012 09:33:49 -0700

Hi Alex,

Thanks for the reply. Just for reference, using seq is actually
considerably slower. It ran in 39 seconds vs. 4 seconds. I think it's
because it has to look up every object from disk to get the value of 'id
instead of using the index which is likely in memory. The index appears to
be stored as a simple list of external symbols and the index value.  I'm
just guessing, though.
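
For anyone who wants to repeat the comparison, here is roughly how the two
runs can be timed side by side. This is only a sketch: it assumes the
database from the quoted example is already open and populated, and that 'Z'
is reset before each run. The parameter names 'K' and 'V' are my own
placeholders; as far as I understand, 'scan' passes the key first and the
value second, and for a +Key index the key is the number itself.

   # Sum via the index only; the objects themselves are not fetched
   (zero Z)
   (bench
      (scan (tree 'id '+Invoice)
         '((K V) (inc 'Z K)) ) )

   # Sum by walking the objects in the file; each one is fetched
   (zero Z)
   (bench
      (for (This (seq (db: +Invoice)) This (seq This))
         (inc 'Z (: id)) ) )

Both runs should leave the same total in 'Z', provided the file holds only
+Invoice objects.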


Thanks,
Joe

On Wed, May 30, 2012 at 9:36 AM, Alexander Burger <a...@software-lab.de> wrote:

> Hi Joe,
>
> > Thank you. That sped it up. It's taking 69 seconds to insert 1M records
> >
> > (pool "foo.db")
> > (class +Invoice +Entity)
> > (rel id (+Key +Number))
> > (zero N)
> > (bench (do 1000000 (new (db: +Invoice) '(+Invoice) 'id (inc 'N)) ))
> > (commit)
>
> You can further speed it up if you distribute objects and indices across
> separate files. For the above example:
>
>   (class +Invoice +Entity)
>   (rel id (+Key +Number))
>
>   (dbs
>      (3 )                    # First file, 512 byte blocks
>      (2 +Invoice)            # Second file, 256 byte blocks
>      (4 (+Invoice id)) )     # Third file, 1024 byte blocks
>
> This puts the '+Invoice' objects into the second file (with a block size
> of 256), and the 'id' index into the third (with a block size of 1024).
>
> The first file (with a block size of 512) is not specified to hold any
> entities here, so it contains only the administrative data (root and
> base objects).
>
>
> Then you must pass a directory (instead of a file name) and the database
> size specifications to 'pool':
>
>   (pool "foo.db/" *Dbs)
>
> If you have really large indexes (more than, say, 10 or 100 million
> entries), then you might experiment with an even larger block size (e.g.
> 6, giving 4096 byte blocks). In my experience performance goes down
> again if you use too large block sizes.
>
>
>
> > I can work with that. Now I am testing out queries.
> >
> > ? (bench (iter (tree 'id '+Invoice) '((This) (inc 'Z (: id) )) )))
> > 11.822 sec
> >
> > ? (bench (scan (tree 'id '+Invoice) '((Val Key) (inc 'Z Val )) )))
> > 4.430 sec
> >
> > It makes sense that scan would be fastest because I can use the index
> > directly. Is that likely the fastest query to sum up a number relation?
>
> Yes, it is surely faster than a Pilog query (though less powerful).
>
> The absolutely fastest, though, would be to use 'seq', i.e. to avoid
> using an index altogether. This can be used occasionally, when
> (as in the above case) a file consists mainly of objects of a
> single type:
>
>   (bench
>      (for (This (seq (db: +Invoice)) This (seq This))
>          (inc 'Z (: id)) ) )
>
> If the file also might contain other objects, use this as the last line:
>
>         (and (isa '+Invoice This) (inc 'Z (: id))) ) )
>
> Cheers,
> - Alex
