Hi Henrik - Thanks for sharing. I used your approach and it ran quickly after I built the index using balance.

   (bench (setq SL (by '((X) (get X 'CustNum)) sort L)) T)
   (bench (setq SLC (mapcar '((This) (: CustNum)) SL)) T)
   (off A)
   (bench (balance 'A SLC T))

I'm stumped on one piece. If I run the code below multiple times, my
total increases:

   : (Sum)
   4.466 sec
   -> 495029119
   : (Sum)
   4.497 sec
   -> 990058238
   : (Sum)
   4.507 sec
   -> 1485087357

   (de Sum ()
      (zero Amount)
      (bench
         (for This SL
            (let (Key (: CustNum) Amt (: Amount) Idx (idx 'A Key))
               (setq Amt (if Amt Amt 0))
               (inc 'Amount Amt)  # check figure to make sure it sums up
               # the val of the cell is by default a customer number,
               # set it to be 0 if it's non-numeric
               (ifn (num? (val (car Idx))) (set (car Idx) 0))
               (set (car Idx) (+ (val (car Idx)) Amt)) ) ) )
      (sum '((X) (car X)) (idx 'A)) )

I don't know exactly how to phrase the question. I'm storing the total
in the val of the cell (I think). I would have thought it was in the val
of the cell stored in the index. However, if I run

   (off A)
   (bench (balance 'A SLA T))

it still duplicates. If I run this first, it clears it out:

   (for X (idx 'A) (set (car (idx 'A X)) 0))

Where is the value being stored, such that I need to set the value of
each cell to 0 even after rebuilding the index?

Here's a simple example that I used to understand the concept:

   : (setq Z "abc")
   -> "abc"
   : (val Z)
   -> "abc"
   : (set Z 0)
   -> 0
   : (val Z)
   -> 0
   : (set Z (+ (val Z) 1))
   -> 1
   : (val Z)
   -> 1
   : Z
   -> "abc"

Like your example, I think I'm storing the number in the val of the
symbol (cell).

I apologize for the long-winded question.

Thanks,
Joe

On Fri, Jun 1, 2012 at 1:38 AM, Henrik Sarvell <hsarv...@gmail.com> wrote:
> I noticed you were talking about idx.
>
> The below code is from vizreader and was part of a system that counted
> and stored all the non-common words in every article:
>
> # We extract all words from the article without special characters
> # and count them
> (dm words> (L)
>    (let Words NIL
>       (for W L
>          (and
>             (setq W (lowc (pack W)))
>             (not (common?> This W))
>             (if (idx 'Words W T)
>                (inc (car @))
>                (set W 1))))
>       (idx 'Words)))
>
> It uses idx to sum up the occurrences of each word, and it turned out
> to be the fastest way of solving that problem anyway; maybe it's
> helpful to you.
>
>
> On Fri, Jun 1, 2012 at 10:33 AM, Joe Bogner <joebog...@gmail.com> wrote:
> > Thanks Tomas, I've started using nil now.
> >
> > This is what I came up with to aggregate the data. It actually runs
> > reasonably well. I'm sharing because I always enjoy reading other
> > people's picoLisp code so I figure others may as well.
> >
> > My source file has 4 million rows
> >
> > : (bench (pivot L 'CustNum))
> > 35.226 sec
> >
> > # outputs 31,000 rows.
> >
> > My approach is to load it in as follows:
> >
> > (class +Invoice)
> > (rel CustNum (+String))
> > (rel ProdNum (+String))
> > (rel Amount (+Number))
> > (rel Quantity (+Number))
> >
> > (de Load ()
> >    (zero N)
> >    (setq L
> >       (make
> >          (in "invoices.txt"
> >             (until (eof)
> >                (setq Line (line))
> >                (setq D (mapcar pack (split Line "^I")))
> >                (link
> >                   (new '(+Invoice)
> >                      'CustNum (car (nth D 1))
> >                      'ProdNum (car (nth D 2))
> >                      'Amount (format (car (nth D 3)))
> >                      'Quantity (format (car (nth D 4))) ) ) ) ) ) )
> >    T )
> >
> >
> > I can probably clean this up. I tinkered around with various
> > approaches and this was the best I could come up with in a few
> > hours. At first I was using something like the group from lib.l but
> > found it to be too slow.
> > I think it was due to the fact that I optimize for a sorted list
> > instead of scanning for a match in the made list
> >
> > (de sortedGroup (List Fld)
> >    (make
> >       (let (Last NIL LastSym NIL)
> >          (for This List
> >             (let Key (get This Fld)
> >                (if (<> Last Key)
> >                   (prog
> >                      (if LastSym (link LastSym))
> >                      (off LastSym)
> >                      (push 'LastSym Key)) )
> >                (push 'LastSym This)
> >                (setq Last Key) ) )
> >          (link LastSym)) ) )
> >
> > And here's the piece that ties it all together:
> >
> > (de pivot (L Fld)
> >    (let (SL (by '((X) (get X Fld)) sort L) SG (sortedGroup SL Fld))
> >       (out "pivot.txt"
> >          (for X SG
> >             (let (Amt 0)
> >                (mapc '((This) (inc 'Amt (: Amount))) (cdr (reverse X)))
> >                (setq Key (get (car X) Fld))
> >                (prinl Key "^I" Amt) ) ) ) ) )
> >
> >
> > (Load)
> >
> > : (bench (pivot L 'CustNum))
> > 35.226 sec
> >
> > : (bench (pivot L 'ProdNum))
> > 40.945 sec
> >
> > It seems the best performance was by sorting, then splitting and
> > then summing the individual parts. It also makes for a nice report.
> >
> > Sidenote: At first I thought I was getting better performance by
> > using a modified version of quicksort off rosetta code, but then I
> > switched it to the built-in sort and saw considerably better speed.
> >
> > Thanks for the help everyone
> >
> > On Thu, May 31, 2012 at 3:37 PM, Tomas Hlavaty <t...@logand.com> wrote:
> >>
> >> Hi Joe,
> >>
> >> > Sidebar: Is there a way to disable the interactive session from
> >> > printing the return of a statement? For example, if I do a (setq ABC
> >> > L) where L is a million items, I'd prefer the option of not having all
> >> > million items print on my console. I've worked around this by wrapping
> >> > it in a prog and returning NIL. Is there an easier way?
> >>
> >> you could also use http://software-lab.de/doc/refN.html#nil or
> >> http://software-lab.de/doc/refT.html#t
> >>
> >> Cheers,
> >>
> >> Tomas
> >> --
> >> UNSUBSCRIBE: mailto:picolisp@software-lab.de?subject=Unsubscribe