I can throw together an example or two, but first allow me to make an
observation, and ask a question:

The observation is that the xml itself is a bit messy. Every data item you
have mentioned corresponds to a pair of consecutive xml elements (a key
element followed by its value element). And, as is usual for xml, the result
is somewhat overspecified. This means that we have a variety of ways of
extracting the data, and it's worth a few moments of thought about the
potential variations:

(1) We can rely on sequence for all elements in a dict - they will not
change, or
(2) We can only rely on sequence for the minimal pairs needed to identify
the data items of a dict, or
(3) We cannot rely on the sequence but instead must fail with an error
when it is violated.
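
To make the three variations concrete, here is a rough sketch. It assumes
the contents of one dict have already been flattened into a list of boxes
alternating key, value; the sample values and the names toks, kv, get and
expected are invented for illustration and are not taken from your data or
your code.

```j
NB. Hypothetical contents of one <dict>, flattened to alternating
NB. key and value boxes (sample values are invented).
toks=: 'date';'2011-01-25T10:00:00Z';'duration';'312';'gameNumber';'12345';'moves';'98';'result';'won'

NB. (1) Trust the whole sequence: the values are the odd-indexed items.
vals1=: (#~ 0 1 $~ #) toks

NB. (2) Trust only key/value adjacency: pair them up, look up by name.
kv=: _2 ]\ toks              NB. 2-column table: key , value
keys=: {."1 kv
vals=: {:"1 kv
get=: 3 : '(keys i. <y) { vals'   NB. index error if the key is absent

NB. (3) Trust nothing: fail loudly when the key sequence is not as expected.
expected=: ;:'date duration gameNumber moves result'
'unexpected key sequence' assert expected -: keys
```

With this sample, get 'moves' fetches the boxed moves value wherever that
pair sits in the dict, while vals1 depends on nothing having moved.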

And there's a similar line of questions about potential future extensions
of the underlying dataset: are they errors, are they to be ignored, or are
they to be handled according to some uniform rule? Or is it simply that you
would plan on fixing the code if the format changed?
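
As a sketch of what the "ignore" and "error" policies might look like,
assuming the keys of a dict are available as a list of boxed names (the
extra key 'hints' and the names known, found and check are hypothetical,
purely for illustration):

```j
known=: ;:'date duration gameNumber moves result'

NB. Hypothetical key list from a future file; 'hints' is an invented key.
found=: ;:'date duration gameNumber hints moves result'

extra=: found -. known       NB. keys the code does not know about
keep=: found e. known        NB. mask: 1 where the key is a known one

NB. Policy A, ignore: drop unknown keys (the same mask would filter values).
kept=: keep # found

NB. Policy B, error: refuse the data when anything unexpected appears.
check=: 3 : '''unknown key in dict'' assert 0 = # y -. known'
```

Here check known passes quietly, while check found signals an assertion
failure carrying the message text.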

(This could be simpler if the xml were structured differently.)

Since you have suggested that this represents a database, I thought I would
know the answers to these questions, but looking at your implementation I
see something different, and I would like to hear your thoughts before
tackling any implementation(s).



On Wed, Feb 12, 2014 at 3:47 PM, Brian Schott <schott.br...@gmail.com> wrote:

> http://www.jsoftware.com/pipermail/programming/2011-January/021609.html
> In the thread mentioned above I was trying to extract information
> corresponding to the following key tags from a file excerpted from there:
> date, duration, gameNumber, moves, and result. Each "dict" group represents
> a game of Freecell. I was able to do it with help from the forum, but it
> being such a simple database, I wonder if you would show how to do it with
> your 2 blog codes? Understand please that I may be asking much more than
> you can provide, do feel free to decline.
> Below is some code that I used on the data file. The code was only intended
> for my own use, but maybe a cursory look could clarify what I did. At the
> very bottom is a "demo" list of things I did with the dataset. I hope there
> is enough meat and not too much fat for you to see what I was doing.
> NB. freecellscore.ijs
> NB. 1/25/11
> NB. 10/3/13 revision
> require 'xml/sax format/datefmt types/datetime'
> saxclass 'pfreecell'
> noba=: -.&' '
> startDocument=: 3 : 0
>   S=: ''
>   Z=: i.0 2
> )
> endDocument=: 3 : 'Z'
> startElement=: 4 : 0
>   S=: S,<y
> )
> endElement=: 3 : 0
>   S=: }:S
> )
> characters=: 3 : 0
>   s2=. _2{.S
>   if. s2 -: ;:'dict key' do. Z=:Z,y
>   elseif. (1<#y) *. #noba y do. Z=:Z,  y
> end.
> )
> coclass'base'
> NB. td means time-date
> tdmask=: 1 1,,6($,:)1 1j1 0
> tdfromiso=: tdmask&(#!.' ')
> tfromiso=: (]- 0 0 0,~1800 1 1-~3&{.)&(".@tdfromiso)
> tdfmt=: ":!.13
> td=: tdfmt@tsrep@(".@tdfromiso)
> ti=: tdfmt@tsrep@tfromiso
> arrayed=: _5&(;/\)
> amend0=: 0&((".@tdfromiso each @{)`(,@[)`]}"0 1)
> amend1=: 1&((tfromiso each @{)`(,@[)`]}"0 1)
> clean=: }:@}.@('<' dropto&.|. '>' dropto ])
> NB. create verbs for processing 5 input types
> ti=: td=: ]
> date=: td
> duration=: ti
> gameNumber=: moves=: result=: ]
> Date=: tdfromiso@(0&pick"1)
> Duration=: tfromiso@(1&pick"1)
> Duration=: 24 60 60&#.@(_3&{.)@tfromiso@(1&pick"1)
> GameNumber=: 2&pick"1
> Moves=:  ".@(3&pick"1)
> Result=: 4&pick"1
> NB. execute verb,noun data pairs
> ex=:    (],~'~',~[)&('''',],''''"_)&dtb/
> freqcount=: (\: {:"1)@(~. ,. #/.~)
> midpt=: -:@<:@#
> median=: -:@(+/)@((<. , >.)@midpt { /:~)
> mean=:+/ % #
> Note 'demo'
> load'plot'
> load 'files'
> NB. next line wraps
> #indata=: fread
> '/Users/brian/Library/Preferences/org.wasters.Freecell.history.plist'
> indata=: '</dict>' dropto&.|. '<dict' dropto indata
> NB. next line wraps
> data=: }. _12 clean each@((0 0 1 0 1 0 1 0 1 0 1 0&#))\<;._2  indata,LF
> NB. get data only
> #u=. /:Date data      NB. sort order
> sdata=: u{data
> NB. next line wraps
> between=: 100000,~0 0 30 24 60 60 #.2 tsDiff/\&.|.". Date sdata
> NB. seconds between Dates
> threshold=: */2  60 60  NB. 2 hours
> outliers=: 3600>Duration sdata  NB. really, games of less than 1 hour
> freqcount #&>(between<:threshold)<;. 1 sdata
> NB. # games played per session?
> NB. next line wraps
> plot (#~0 1 $~ #)median @:Duration&>(outliers#between<:threshold)<;. 1
> outliers#sdata
> outliers=: 1000>Duration sdata  NB. really, games of less than 1000 secs
> 'dot' plot (Duration;~Moves)outliers#sdata
> )
> --
> (B=)
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm