Hi Ralph!

Gabriele wrote:
> > Hello [EMAIL PROTECTED]!
> >
> > On 21-Apr-00, you wrote:
> >
> >  r>     a: read/lines %auto.db
> >
> >
> >  r>     a: to-hash a
> >
> > You don't need a hash if you're just using functions like PICK and
> > not FIND, SELECT etc. This conversion could take some time if the
> > db is very big...
> >

You replied: 
> Thanks, Gabriele, but I am searching each line for a specific 
> string. I hope I am correct in that converting it to hash
> speeds up such searches.

Hashes only speed up searching if you arrange the data in pairs,
[key value key value ...]. It won't help your database directly,
but it would help if you used a hash as an index.
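For instance, with some made-up data, the pair arrangement and
lookup look like this:

    ; A hash of [key value ...] pairs; SELECT returns the item
    ; that follows the matched key
    h: make hash! ["alice" 1 "bob" 2]
    select h "bob"  ; == 2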

Say %auto.db is a flat file with tab-delimited fields and line-
delimited records. If you wanted a straight name index, and the
name was in the first field, you could do something like this:

    ; Split the database into records
    a: read/lines %auto.db
    ; Make index as block at first for insert speed
    i: make block! 2 * length? a
    foreach r a [
        ; Split record into fields
        r: parse/all r "^-"
        ; Insert first field and record into index
        i: insert insert i r/1 r
    ]
    ; Change index into hash for efficiency
    i: make hash! head i
    ; Now select the record as often as you want
    result: select i value

but that only makes sense if you are doing many searches in one
transaction, quite unlike what you are doing on your web site.

Instead, try this:

    value: whatever...
    lines: make list! 0  ; List for insertion speed/space
    use [a rb re] [
        ; Read the database in text mode
        a: read %auto.db
        ; Find the value in the database
        while [re: find a value] [ ; Use find/any here?
            ; Get the beginning of the record
            either rb: find/reverse re newline [rb: next rb] [rb: a]
            ; Get the end of the record
            if none? re: find rb newline [re: tail rb]
            ; Set the data to the next record
            a: next re
            ; Store a copy of the record as its own block,
            ; split into fields at tabs (parse/all, as above)
            lines: tail insert/only lines (parse/all (copy/part rb re) "^-")
        ]
    ]
    ; Here are the found record lines
    lines: head lines

It may not be as pretty, but it's much faster and uses less
memory as well. You can even add wildcard searching by using
find/any instead of find where marked above, but that might
find patterns that span records.
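For instance, with a made-up pattern (? matches any one character,
* matches any run of characters):

    ; Wildcard search; only this one line changes in the loop
    re: find/any a "Ford*199?"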

I suppose that parse could be used instead, but I don't feel
like figuring out how right now, and it would be about as fast.

This technique does require you to read the entire database
into memory for every transaction though, just like your first
solution did. If you wanted to use open/lines instead of read
you would be back to doing a find on each line in turn.
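If you did go that route, it might look something like this
untested sketch, treating the line port as a series of lines
(value and the tab-splitting as before):

    lines: copy []
    db: open/lines %auto.db
    while [not tail? db] [
        line: first db
        if find line value [
            ; Keep each matching record as its own block of fields
            append/only lines parse/all line "^-"
        ]
        db: next db
    ]
    close db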

Does this help?

Brian Hawley
