Insertion time into the database seems to increase exponentially!

2012-07-31 Thread Laurent Artaud

Hello all!
As I like to do something useful when I learn a new language, I decided to do a 
backup system.

I obviously chose to use the integrated database engine to store the meta-data.
Unfortunately, the timings I get are starting to frighten me:

# Backup returns the number of +File insertions
# Before each run, I removed the database dir and exited pil, in order to
# ensure no interference.

: (bench (Backup "/tmp/pico"))
0.303 sec
-> 322
: (bench (Backup "/tmp/pico"))
2.252 sec
-> 644
: (bench (Backup "/tmp/pico"))
7.677 sec
-> 966
: (bench (Backup "/tmp/pico"))
18.238 sec
-> 1288
: (bench (Backup "/tmp/pico"))
35.535 sec
-> 1610
: (bench (Backup "/tmp/pico"))
64.718 sec
-> 1932
: (bench (Backup "/tmp/pico"))
107.136 sec
-> 2254
: (bench (Backup "/tmp/pico"))
163.182 sec
-> 2576
: (bench (Backup "/tmp/pico"))
227.789 sec
-> 2898
: (bench (Backup "/tmp/pico"))
316.216 sec
-> 3220

As you can see, given the progression, it will soon take more time to store the
meta-data than to process the data itself.

Does anybody see if I made a mistake in my code?

(class +Chunk +Entity)
(rel cs (+Need +Number))                     # checksum of this chunk
(rel files (+List +Joint) chunks (+File))

(class +File +Entity)
(rel pth (+Need +Ref +String))               # file path
(rel size (+Need +Number))                   # file size
(rel accessTime (+Need +Number))
(rel modifyTime (+Need +Number))
(rel changeTime (+Need +Number))
(rel inode (+Need +Ref +Number))
(rel links (+Need +Number))
(rel uid (+Need +Number))
(rel gid (+Need +Number))
(rel accessRights (+Need +Number))
(rel target (+String))                       # target of the softlink (if any)
(rel chunks (+List +Joint) files (+Chunk))   # list of chunks (if any)
(rel backups (+List +Joint) files (+Backup))

(class +Backup +Entity)
(rel name (+Ref +String))                    # name of this backup
(rel startDT (+Need +Ref +String))           # start datetime of this backup (stamp T)
(rel endDT (+String))                        # end datetime of this backup (empty if not finished)

(rel exclusions (+List +String))             # exclusion flags
(rel inclusions (+List +String))             # inclusion flags
(rel basePath (+Need +List +String))         # paths considered for this backup
(rel hostName (+Need +Ref +String))          # name of the host being backup'd
(rel files (+List +Joint) backups (+File))

(dbs
   (0)
   (0 +Chunk)
   (0 +File)
   (0 (+File pth inode))
   (4 +Backup)
   (0 (+Backup name startDT hostName)) )
(let pa "/tmp/test_db"
   (ifn (info pa)                # if it does not already exist...
      (call 'mkdir pa) )         # create the path to store the DB
   (pool (pack pa "/") *Dbs) )   # open the DB

# fake...
(de addFile (Bk P)
   (let o
      (request '(+File)
         'pth P
         'size 0
         'accessTime 1
         'modifyTime 2
         'changeTime 3
         'inode 4
         'links 1
         'uid 1001
         'gid 1002
         'accessRights 766 )
      (put!
         o
         'backups
         (append (; 'o backups) Bk) )
      o ) )

(de Backup (rootPath)
   (let obj1
      (request '(+Backup)
         'name (stamp)
         'startDT (stamp)
         'basePath rootPath
         'hostName (host "localhost") )
      (put! *DB 'currentBackup obj1)
      # now, walk the path
      (let Dir rootPath
         (recur (Dir)
            (for F (dir Dir)
               (let Path (pack Dir "/" F)
                  (addFile obj1 Path)
                  # note: change this test: it considers a link to a dir as a dir!
                  (if (=T (car (info Path)))
                     (recurse Path) ) ) ) ) )
      (length (get obj1 'files)) ) )


Thanks for your time.

Regards,
--
Laurent ARTAUD
--
UNSUBSCRIBE: mailto:picolisp@software-lab.de?subject=Unsubscribe


Re: Insertion time into the database seems to increase

2012-07-31 Thread Alexander Burger
Hi Laurent,

> As I like to do something useful when I learn a new language, I
> decided to do a backup system.

OK.

> As you can see, given the progression, it will soon take more time
> to store the meta-data than processing the data itself.
> Does anybody see if I made a mistake in my code?

Yes, the main problem is this +Joint:

> (class +File +Entity)
> ...
> (rel backups (+List +Joint) files (+Backup))
>
> (class +Backup +Entity)
> ...
> (rel files (+List +Joint) backups (+File))

Each time a +File is created, it causes the list of files in the
+Backup object to be extended:

> (de addFile (Bk P)
>    ...
>    (put!
>       o
>       'backups
>       (append (; 'o backups) Bk) )

Thus, the single +Backup object gets larger and larger.
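
To get a feeling for why this hurts, here is a rough sketch of the cost. It
assumes (this is a simplification, not a measurement) that every insertion
writes the whole list once, so the work per insertion is proportional to the
current list length. 'rewriteCost' is a hypothetical helper name used only
for this illustration:

```picolisp
# Hypothetical cost model: total number of list elements written when
# each of N insertions stores a list one element longer than before.
(de rewriteCost (N)
   (let Sum 0
      (for K N            # K runs from 1 to N
         (inc 'Sum K) )
      Sum ) )

(rewriteCost 322)    # -> 52003
(rewriteCost 3220)   # -> 5185810
```

Under this model, ten times as many files cost about a hundred times as much
work, i.e. the growth is at least quadratic rather than linear. The timings
reported above grow even faster than that (316.216 sec versus 0.303 sec), so
the model should be read as a lower bound only.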

As a general rule, you can always use '+Joint' and '+Ref +Link'
interchangeably. You want to use a list of '+Joint's only if the list
doesn't get too long (less than, say, 100).

So the most important change is to remove the line

   (rel files (+List +Joint) backups (+File))

from the +Backup class, and use

   (rel backups (+List +Ref +Link) NIL (+Backup))

in the +File class. This will increase the speed dramatically.


After that, there are a few more places that should be optimized. You'll
notice the differences only when you create a lot more objects.

In the specification of the database block sizes


> (dbs
>    (0)
>    (0 +Chunk)
>    (0 +File)
>    (0 (+File pth inode))
>    (4 +Backup)
>    (0 (+Backup name startDT hostName)) )

the '0's mean that the block size is 64 bytes. This is a bit too small for
'+File' and, more importantly, for the index trees. I would use '2' for
the '+File' and '+Backup' objects, and '4' for the indexes. This gives:

   (2 +File)
   (4 (+File pth inode))
   (2 +Backup)
   (4 (+Backup name startDT hostName)) )
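
For reference, the number in each 'dbs' entry is a scale factor: the block
size is 64 bytes shifted left by that factor. A small sketch to compute it
('blockSize' is a hypothetical helper, not part of the library):

```picolisp
# Block size in bytes for a given 'dbs' scale factor N (64 << N).
# '>>' shifts right, so a negative count shifts left.
(de blockSize (N)
   (>> (- N) 64) )

(blockSize 0)   # -> 64
(blockSize 2)   # -> 256
(blockSize 4)   # -> 1024
```

So the suggestion above amounts to 256-byte blocks for the objects and
1024-byte blocks for the index trees.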


Then, the usage of 'request' is not as intended

   (request '(+File)
   'pth P
   'size 0
   'accessTime 1
   'modifyTime 2
   'changeTime 3
   'inode 4
   'links 1
   'uid 1001
   'gid 1002
   'accessRights 766 )

'request' searches with the given keys for an object, before it decides
whether to use an existing object or to create a new one.

So, typically, if 'pth' is a characteristic key, it would be called as

   (let Obj (request '(+File) 'pth P)
      (put> Obj ..)
      ...

However, I suspect that 'request' is not needed here at all, as you
create new +File objects. So 'new' is the way:

   (de addFile (Bk P)
      (let Obj
         (new (db: +File) '(+File)
            'pth P
            'size 0
            'accessTime 1
            'modifyTime 2
            'changeTime 3
            'inode 4
            'links 1
            'uid 1001
            'gid 1002
            'accessRights 766 )
         ## Not necessary (put> Obj 'backups (append (; Obj backups) Bk))
         (put> Obj 'backups Bk)
         (at (0 . 1) (commit))
         Obj ) )

Note two other changes I made:

   - Because 'backups' is a +List relation, the explicit 'append' is not
 necessary. Just putting 'Bk' is enough, the list will be created
 automatically.

   - Calling 'new!', 'put!' etc., i.e. the functions which call
 (dbSync) and then (commit) each time they are called, is very
 expensive. For a large-volume input it is better to go into
 single-user mode of the DB, avoid (dbSync), and call 'commit' less
 often. In the example above, it is called only every 1th time.
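
How 'at' spaces out the commits can be tried without a database. In this
sketch a counter stands in for (commit), and the batch size of 1000 is an
assumption chosen for illustration, not a value taken from this thread:

```picolisp
# 'at' increments the CAR of its cell on every call and runs its body
# only when the count reaches the CDR, then resets the count to zero.
(let Commits 0
   (do 2500                   # pretend to insert 2500 objects
      (at (0 . 1000)          # assumed batch size: 1000
         (inc 'Commits) ) )   # stands in for (commit)
   Commits )                  # -> 2
```

The body runs after the 1000th and the 2000th iteration. The remaining 500
iterations are not covered, which is why a final (commit) after the loop is
still needed when importing real data.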

The same applies to the backup function


> (de Backup (rootPath)
>    (let obj1
>       (request '(+Backup)
>          'name (stamp)
>          'startDT (stamp)
>          'basePath rootPath
>          'hostName (host "localhost") )
>       (put! *DB 'currentBackup obj1)
>       # now, walk the path
>       (let Dir rootPath
>          (recur (Dir)
>             (for F (dir Dir)
>                (let Path (pack Dir "/" F)
>                   (addFile obj1 Path)
>                   # note: change this test: it considers a link to a dir as a dir!
>                   (if (=T (car (info Path)))
>                      (recurse Path) ) ) ) ) )
>       (length (get obj1 'files)) ) )

Avoiding 'request' and 'put!' gives basically

   (de backup (RootPath)
      (let Obj1
         (new (db: +Backup) '(+Backup)
            'name (stamp)
            'startDT (stamp)
            'basePath RootPath
            'hostName (host "localhost") )
         (put *DB 'currentBackup Obj1)
         (commit)
         # now, walk the path
         (let Dir RootPath
            (recur (Dir)
               (for F (dir Dir)
                  (let Path (pack Dir "/" F)
                     (addFile Obj1 Path)
                     # note: change this test: it considers a link to a dir as a dir!
                     (if (=T (car (info Path)))
                        (recurse Path) ) ) ) ) )
         (commit)
         (count (tree 'backups '+File)) ) )

: (bench (backup "/home"))
0.460 sec
-> 7125

Cheers,
- Alex

Re: Insertion time into the database seems to increase

2012-07-31 Thread Alexander Burger
Hi Laurent,

ha, our mails just crossed their ways :)

> What I intended for was to have an immediate list of the +Files for
> any +Backup, and to be able to know if any +File had no +Backup
> anymore for cleanup.
> I guess I'll have to query the database for all the +File linking to
> a given +Backup, here (I haven't yet done more than gloss over this
> part of the tutorial...)

Yes. You can get this list with 'collect':

   : (select +Backup)
   {5-1} (+Backup)
      hostName "software-lab"
      basePath ("erp")
      startDT "2012-07-31 21:20:57"
      name "2012-07-31 21:20:57"

   : (collect 'backups '+File (db 'startDT '+Backup "2012-07-31 21:20:57"))
   -> (...)


> I'll have to study this carefully, because I do need the 'request'
> concept, here: when doing a subsequent backup, I don't want to
> re-create all the +Files that are already in the database. The idea

Yes, that's what occurred to me, and I wrote it in my previous mail.

So, 'request' is correct in this case. Just, as I said, the difficult
point is to determine what the identity of a file means.

Cheers,
- Alex