Hi Henrik -

Thanks for sharing. I used your approach and it ran quickly after I built
the index using balance.

(bench (setq  SL (by '((X) (get X 'CustNum)) sort L))) T)
(bench (setq SLC (mapcar '((This) (: CustNum)) SL)) T)
(off A) (bench (balance 'A SLC T))
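Here is a minimal sketch of the sort-then-balance step on toy data. It assumes a few anonymous objects in place of the +Invoice list, and Inv, SLX, Keys and Idx are hypothetical names. balance builds the tree from an already sorted list, which is why the keys are taken from the sorted objects:

```picolisp
# Hypothetical toy data: anonymous objects standing in for the invoices in L
(setq Inv
   (mapcar
      '((C) (let O (new) (put O 'CustNum C) O))
      '("c2" "c1" "c3") ) )
(setq SLX (by '((X) (get X 'CustNum)) sort Inv))   # objects ordered by CustNum
(setq Keys (mapcar '((This) (: CustNum)) SLX))     # their key symbols, sorted
(balance 'Idx Keys)                                # balanced idx tree in Idx
(idx 'Idx)                                         # keys back in sorted order
```

A lookup such as (idx 'Idx "c2") then returns the subtree whose car is the stored key symbol.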

I'm stumped on one piece. If I run the code below multiple times, my total
keeps increasing:

: (Sum)
4.466 sec
-> 495029119
: (Sum)
4.497 sec
-> 990058238
: (Sum)
4.507 sec
-> 1485087357

(de Sum ()
  (zero Amount)
  (bench
    (for This SL
      (let (Key (: CustNum) Amt (: Amount) Idx (idx 'A Key))
        (setq Amt (if Amt Amt 0))
        (inc 'Amount Amt) # check figure to make sure it sums up

        # the key symbol's value is initially the customer number string
        # itself, so set it to 0 if it's non-numeric
        (ifn (num? (val (car Idx))) (set (car Idx) 0))

        (set (car Idx) (+ (val (car Idx)) Amt)) ) ) )
  (sum '((X) (car X)) (idx 'A)) )


I don't know exactly how to phrase the question. I think I'm storing the
total in the val of the cell, and I would have thought that cell lived in
the index. However, even if I rebuild the index with

(off A) (bench (balance 'A SLC T))

the totals still keep accumulating.

If I run this first, it clears the totals out:

(for X (idx 'A) (set (car (idx 'A X)) 0))

Where is the value actually stored, such that I have to reset each value
to 0 even after rebuilding the index?


Here's a simple example that I used to understand the concept:

: (setq Z "abc")
-> "abc"
: (val Z)
-> "abc"
: (set Z 0)
-> 0
: (val Z)
-> 0
: (set Z (+ (val Z) 1))
-> 1
: (val Z)
-> 1
: Z
-> "abc"

Like your example, I think I'm storing the number in the val of the symbol
(cell).
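If I read the Sum code right, the keys in A are the very CustNum symbols held in SLC. That means (set (car Idx) ...) writes the total into the invoice's own transient symbol, not into the tree. (off A) only drops the tree, and balance re-inserts the same symbols with their values intact. Here is a minimal sketch of that effect, using hypothetical names K1, K2 and Tree:

```picolisp
# Hypothetical names: K1, K2 and the tree variable Tree are illustrative only
(setq K1 (pack "c1")  K2 (pack "c2"))   # two transient symbols used as keys
(balance 'Tree (list K1 K2))            # index them (the list is sorted)
(set (car (idx 'Tree "c1")) 10)         # store a total in the key symbol's value
(off Tree)                              # throw the tree away ...
(balance 'Tree (list K1 K2))            # ... and rebuild it from the same symbols
(val (car (idx 'Tree "c1")))            # still 10: the total lives in K1
```

Under that reading, the totals only go away when the symbols themselves are reset, which is what the (for X (idx 'A) ...) loop does.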

I apologize for the long-winded question.

Thanks
Joe




On Fri, Jun 1, 2012 at 1:38 AM, Henrik Sarvell <hsarv...@gmail.com> wrote:

> I noticed you were talking about idx.
>
> The below code is from vizreader and was part of a system that counted
> and stored all the non-common words in every article:
>
> # We extract all words from the article without special characters
> # and count them
> (dm words> (L)
>   (let Words NIL
>      (for W L
>         (and
>            (setq W (lowc (pack W)))
>            (not (common?> This W))
>            (if (idx 'Words W T)
>               (inc (car @))
>               (set W 1))))
>      (idx 'Words)))
>
> It uses idx to count the occurrences of each word, and it turned out to
> be the fastest way of solving that problem. Maybe it's helpful to you.
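Here is a standalone sketch of the same idx counting idiom. It assumes a plain list of words, with no article class and no common?> filter, and countWords is a hypothetical name. The count lives in the value of the word symbol stored in the tree:

```picolisp
# Hypothetical standalone version: no class, no common?> filter;
# countWords is an illustrative name, not part of vizreader
(de countWords (Lst)
   (let Words NIL
      (for W Lst
         (setq W (lowc (pack W)))   # fresh lower-case transient symbol per word
         (if (idx 'Words W T)
            (inc (car @))           # already indexed: bump the stored symbol's count
            (set W 1) ) )           # newly inserted: start its count at 1
      (mapcar '((S) (cons S (val S))) (idx 'Words)) ) )
```

For example, (countWords '("A" "b" "a")) gives each distinct word paired with its count, in sorted order.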
>
>
>
>
> On Fri, Jun 1, 2012 at 10:33 AM, Joe Bogner <joebog...@gmail.com> wrote:
> > Thanks Tomas, I've started using nil now.
> >
> > This is what I came up with to aggregate the data. It actually runs
> > reasonably well. I'm sharing because I always enjoy reading other
> > people's picoLisp code, so I figure others may as well.
> >
> > My source file has 4 million rows
> >
> > : (bench (pivot L 'CustNum))
> > 35.226 sec
> >
> > # outputs 31,000 rows.
> >
> > My approach is to load it in as follows:
> >
> > (class +Invoice)
> > (rel CustNum (+String))
> > (rel ProdNum (+String))
> > (rel Amount (+Number))
> > (rel Quantity (+Number))
> >
> > (de Load ()
> >   (zero N)
> >   (setq L
> >     (make
> >       (in "invoices.txt"
> >         (until (eof)
> >           # split one tab-separated line into a list of field strings
> >           (let D (mapcar pack (split (line) "^I"))
> >             (link
> >               (new '(+Invoice)
> >                 'CustNum (car (nth D 1))
> >                 'ProdNum (car (nth D 2))
> >                 'Amount (format (car (nth D 3)))
> >                 'Quantity (format (car (nth D 4))) ) ) ) ) ) ) )
> >   T )
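The helper below is hypothetical (parseLine is not from the thread). It applies the same split/pack/format steps that Load uses, but to a single string instead of invoices.txt, so the parsing is easy to try in the REPL:

```picolisp
# Hypothetical helper: parse one tab-separated line the way Load does,
# but from a string instead of a file
(de parseLine (Str)
   (let D (mapcar pack (split (chop Str) "^I"))   # list of field strings
      (list
         (car (nth D 1))                # CustNum
         (car (nth D 2))                # ProdNum
         (format (car (nth D 3)))       # Amount as a number
         (format (car (nth D 4))) ) ) ) # Quantity as a number
```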
> >
> >
> > I can probably clean this up. I tinkered around with various approaches
> > and this was the best I could come up with in a few hours. At first I was
> > using something like the group function from lib.l but found it to be too
> > slow. I think that's because my version relies on the list being sorted
> > instead of scanning the made list for a match:
> >
> > (de sortedGroup (List Fld)
> >   (make
> >     (let (Last NIL LastSym NIL)
> >       (for This List
> >         (let Key (get This Fld)
> >           (when (<> Last Key)          # key changed: close the previous group
> >             (if LastSym (link LastSym))
> >             (off LastSym)
> >             (push 'LastSym Key) )      # new group starts with its key
> >           (push 'LastSym This)
> >           (setq Last Key) ) )
> >       (link LastSym) ) ) )
> >
> > And here's the piece that ties it all together:
> >
> > (de pivot (L Fld)
> >   (let (SL (by '((X) (get X Fld)) sort L) SG (sortedGroup SL Fld))
> >     (out "pivot.txt"
> >       (for X SG
> >         (let (Amt 0  Key (get (car X) Fld))
> >           (mapc '((This) (inc 'Amt (: Amount))) (cdr (reverse X)))
> >           (prinl Key "^I" Amt) ) ) ) ) )
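For comparison, here is a sketch of the same sort-then-sum idea done in one pass. It assumes the data has already been reduced to (Key . Amount) pairs sorted by key, and sumSorted is a hypothetical name. Instead of building the groups first, it emits a (Key . Total) pair whenever the key changes:

```picolisp
# Hypothetical sketch: Pairs is a list of (Key . Amount), already sorted by Key
(de sumSorted (Pairs)
   (make
      (let (Last NIL  Tot 0  Started NIL)
         (for P Pairs
            (when (and Started (<> Last (car P)))
               (link (cons Last Tot))      # key changed: emit the finished run
               (zero Tot) )
            (setq Last (car P)  Started T)
            (inc 'Tot (or (cdr P) 0)) )    # missing amounts count as 0
         (and Started (link (cons Last Tot))) ) ) )   # emit the last run
```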
> >
> >
> > (Load)
> >
> > : (bench (pivot L 'CustNum))
> > 35.226 sec
> >
> > : (bench (pivot L 'ProdNum))
> > 40.945 sec
> >
> > The best performance seemed to come from sorting, then splitting, and then
> > summing the individual parts. It also makes for a nice report.
> >
> > Sidenote: At first I thought I was getting better performance by using a
> > modified version of quicksort from Rosetta Code, but when I switched to
> > the built-in sort I saw considerably better speed.
> >
> > Thanks for the help everyone
> >
> > On Thu, May 31, 2012 at 3:37 PM, Tomas Hlavaty <t...@logand.com> wrote:
> >>
> >> Hi Joe,
> >>
> >> > Sidebar: Is there a way to disable the interactive session from
> >> > printing the return of a statement? For example, if I do a (setq ABC
> >> > L) where L is a million items, I'd prefer the option of not having all
> >> > million items print on my console. I've worked around this by wrapping
> >> > it in a prog and returning NIL. Is there an easier way?
> >>
> >> you could also use http://software-lab.de/doc/refN.html#nil or
> >> http://software-lab.de/doc/refT.html#t
> >>
> >> Cheers,
> >>
> >> Tomas
> >> --
> >> UNSUBSCRIBE: mailto:picolisp@software-lab.de?subject=Unsubscribe
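Here is a short sketch of the two wrappers Tomas links to, using BigList as a hypothetical variable. Both evaluate their body for its side effects, so the REPL prints only NIL or T instead of the whole list:

```picolisp
# BigList is a hypothetical variable name
(nil (setq BigList (range 1 1000000)))   # runs the setq, REPL shows NIL
(t (setq BigList (range 1 1000000)))     # same, but REPL shows T
(length BigList)                         # the list itself is still there
```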
> >
> >
>
