On Apr 27, 11:12 pm, Jorge Godoy <[EMAIL PROTECTED]> wrote:
> bullockbefriending bard wrote:
> > A further complication is that at a later point, I will want to do
> > real-time time series prediction on all this data (viz. predicting
> > actual starting prices at post time x minutes in the future). Assuming
> > I can quickly (enough) retrieve the relevant last n tote data samples
> > from the database in order to do this, then it will indeed be much
> > simpler to make things much more DB-centric... as opposed to
> > maintaining all this state/history in program data structures and
> > updating it in real time.
>
> If instead of storing XML and YAML you store the data points, you can do
> everything from inside the database.
>
> PostgreSQL supports Python stored procedures / functions and also supports
> using R in the same way, for manipulating data. Then you can work with
> everything and just retrieve the resulting information.
>
> You might try storing the raw data and the XML / YAML, but I believe that
> keeping those sync'ed might cause you some extra work.
Tempting thought, but one of the problems with this kind of horse racing tote data is that much of it is for combinations of runners rather than single runners. Whilst there might be (say) 14 horses in a race, there are 91 quinella price combinations (1-2 through 13-14, i.e. the 2-subsets of range(1, 15)) and 364 trio price combinations (the 3-subsets). I suspect it is not really practical to have database tables with columns for that many combinations.

I certainly DO have a horror of my XML / whatever other formats getting out of sync, and I also have to worry about the tote company later changing their XML format. From that viewpoint, there is indeed a lot to be said for storing the tote data as numbers in tables.
--
http://mail.python.org/mailman/listinfo/python-list
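For what it's worth, the wide-table worry goes away if each combination is a row rather than a column: the table stays three or four columns wide no matter how many runners there are. A minimal sketch below, using itertools.combinations for the subsets and an in-memory sqlite3 database just for illustration; the table and column names (quinella_price, race_id, etc.) are made up for the example, not taken from any actual schema.

```python
import sqlite3
from itertools import combinations

runners = range(1, 15)  # 14 horses, numbered 1..14

# Enumerate the price combinations: 2-subsets for quinellas, 3-subsets for trios.
quinellas = list(combinations(runners, 2))  # 91 pairs
trios = list(combinations(runners, 3))      # 364 triples

# One row per (race, sample time, combination) instead of 91 columns.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE quinella_price (
        race_id     INTEGER,
        sample_time TEXT,
        runner_a    INTEGER,
        runner_b    INTEGER,
        price       REAL,
        PRIMARY KEY (race_id, sample_time, runner_a, runner_b)
    )
""")
conn.executemany(
    "INSERT INTO quinella_price VALUES (1, '2007-04-27T23:12', ?, ?, 0.0)",
    quinellas,
)

(count,) = conn.execute("SELECT COUNT(*) FROM quinella_price").fetchone()
print(len(quinellas), len(trios), count)  # 91 364 91
```

Retrieving the last n samples for the time-series work is then just an ORDER BY sample_time DESC LIMIT n query per race, and a format change upstream only touches the parser, not the tables.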