Re: design choice: multi-threaded / asynchronous wxpython client?
bullockbefriending bard wrote:

> 1) The data for the race about to start updates every (say) 15
> seconds, and the data for earlier and later races updates only
> every (say) 5 minutes. There is no point for me to be hammering the
> server with requests every 15 seconds for data for races after the
> upcoming race... I should query for this perhaps every 150s to be
> safe. But for the upcoming race, I must not miss any updates and
> should query every ~7s to be safe. So... in the middle of a race
> meeting the situation might be:

I don't fully understand this, but can't you design the server so that you can connect to it and it notifies you about important things? IMHO, polling isn't ideal.

> My initial thought was to have two threads for the different
> update polling cycles. In addition I would probably need another
> thread to handle UI stuff, and perhaps another for dealing with
> file/DB data write out.

No additional threads are needed. UI, networking and file I/O can operate asynchronously. Using wxPython's timers with callback functions, you should need only standard Python modules (apart from wx).

> But, I wonder if using Twisted is a better idea?

IMHO that's only advisable if you want to create your own protocols and reuse them in different apps, or need full-featured, customisable implementations of advanced protocols.

Additionally, you'd *have to* use multiple threads: one for the Twisted event loop and one for the wxPython one. There is a wxreactor in Twisted which integrates the wxPython event loop, but I stopped using it due to strange deadlock problems which began with some wxPython version. It also seems to be no longer in development. My alternative works perfectly, though: a main thread with Twisted and a GUI thread for wxPython, communicating over standard Python queues.

Beyond that, you'd only need additional threads if you did heavy number crunching inside the wxPython or Twisted thread. To keep the respective event loop from hanging, it's advisable to move long-running calculations to a separate thread.

> I have zero experience with these kinds of design choices and
> would be very happy if those with experience could point out the
> pros and cons of each (synchronous/multithreaded, or Twisted) for
> dealing with the two differing sample rates problem outlined
> above.

I'd favor an "as few threads as necessary" approach. In my experience this saves pain (i.e. deadlocks and boilerplate queueing code).

Regards,

Björn

--
BOFH excuse #27:

radiosity depletion

--
http://mail.python.org/mailman/listinfo/python-list
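A minimal sketch of the queue hand-off mentioned above, written for Python 3 with only the standard library. The function names (`crunch`, `drain`) and the sum-of-squares workload are made up for illustration; in a real client `drain` would be called from a timer callback in the event-loop thread rather than after a `join`.

```python
import queue
import threading


def crunch(numbers, results):
    """Long-running work done off the event-loop thread (hypothetical workload)."""
    results.put(sum(n * n for n in numbers))


def drain(results):
    """Collect whatever is waiting without blocking.

    Intended to be called periodically from the event loop,
    e.g. from a GUI timer callback.
    """
    items = []
    while True:
        try:
            items.append(results.get_nowait())
        except queue.Empty:
            return items


results = queue.Queue()
worker = threading.Thread(target=crunch, args=(range(1000), results))
worker.start()
worker.join()          # only for this demo; a GUI would keep calling drain() instead
print(drain(results))  # [332833500]
```

Because `queue.Queue` does its own locking, neither side needs explicit locks, which is most of the "boilerplate queueing code" in practice.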
Re: design choice: multi-threaded / asynchronous wxpython client?
> Tempting thought, but one of the problems with this kind of horse
> racing tote data is that a lot of it is for combinations of runners
> rather than single runners. Whilst there might be (say) 14 horses in a
> race, there are 91 quinella price combinations (1-2 through 13-14,
> i.e. the 2-subsets of range(1, 15)) and 364 trio price combinations.
> It is not really practical (I suspect) to have database tables with
> columns for that many combinations?

If you normalise your tables correctly, these will be represented as one-to-many or many-to-many relationships in your database.

Like the other poster I don't know the first thing about horses, and I may be misunderstanding something, but here is one (basic) normalised db schema:

tables & descriptions:

- horse - holds info about each horse
- race - one record per race. Has times, etc.
- race_horse - holds records linking horses and races together.

You can derive all possible horse combinations from the above info. You don't need to store them in the db unless you need to link something else to them (eg: betting data). In which case:

- combination - represents one combination of horses.
- combination_horse - links a combination to 1 horse. 1 of these per horse per combination.
- bet - represents a bet. Has foreign key relationships with combination (and other tables, eg: bettor, race)

With a structure like the above you don't need hundreds of database columns :-)

David.
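To make the row-per-combination idea concrete, here is a small Python 3 sketch using `itertools` and an in-memory SQLite database. The column names, the `kind` values and the race id 3 are assumptions for illustration only, and only the two combination tables from the schema above are shown.

```python
import sqlite3
from itertools import combinations

runners = range(1, 15)                        # 14 runners, numbered 1..14
quinellas = list(combinations(runners, 2))    # unordered pairs, 1-2 .. 13-14
trios = list(combinations(runners, 3))        # unordered triples
print(len(quinellas), len(trios))             # 91 364

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE combination (
        id      INTEGER PRIMARY KEY,
        race_id INTEGER NOT NULL,
        kind    TEXT NOT NULL
    );
    CREATE TABLE combination_horse (
        combination_id INTEGER NOT NULL REFERENCES combination(id),
        runner_no      INTEGER NOT NULL,
        PRIMARY KEY (combination_id, runner_no)
    );
""")


def store(db, race_id, kind, combos):
    """One combination row per combo, plus one link row per runner in it."""
    for combo in combos:
        cur = db.execute(
            "INSERT INTO combination (race_id, kind) VALUES (?, ?)",
            (race_id, kind),
        )
        db.executemany(
            "INSERT INTO combination_horse VALUES (?, ?)",
            [(cur.lastrowid, runner) for runner in combo],
        )


store(db, 3, "quinella", quinellas)
store(db, 3, "trio", trios)
```

The number of columns stays fixed however many runners a race has; only the row counts grow.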
Re: design choice: multi-threaded / asynchronous wxpython client?
> 3) I need to dump this data (for all races, not just current about to
> start race) to text files, store it as BLOBs in a DB *and* update real
> time display in a wxpython windowed client.

A few important questions:

1) How real-time must the display be? (Should it update immediately after you get new XML data, or is it ok to update a few seconds later?)

2) How much data is being processed at peak? (100 records a second, 1000?)

3) Does your app need to share fetched data with other apps? If so, how? (read from db, download HTML, RPC, etc).

4) Does your app need to use data from previous executions? (eg: if you restart it, does it need to have a fully populated UI, or can it start from an empty UI and fill it in as it downloads new XML updates?)

How you answer the above questions determines what kind of algorithm will work best.

David.

PS: I suggest that you contact the people you're downloading the XML from if you haven't already. eg: it might be against their TOS to constantly scrape data (I assume not, since they provide XML). You don't want them to black-list your IP address ;-). Also, maybe they have ideas for efficient data retrieval (eg: RSS feeds).
Re: design choice: multi-threaded / asynchronous wxpython client?
> Date is the time of the server response and not last data update. Date
> is definitely the time of server response to my request and bears no
> relation to when the live XML data was updated. I know this for a fact
> because right now there is no active race meeting and any data still
> available is static and many hours old. I would not feel confident
> rejecting incoming data as duplicate based only on same content length
> criterion. Am I missing something here?

It looks like the data is dynamically generated on the server, so the web server doesn't know if/when the data changed. Change information of that kind (e.g. a Last-Modified header) is usually only available for static content (images, html files, etc).

You could go by the Cache-Control line and only fetch data every 30 seconds, but it's possible for you to miss some updates this way.

Another thing you could try (only if necessary, since it is a bit of overkill): download just the first part of the XML (a GET request with a Range header) and check the timestamp you mentioned. If that has changed, re-request the whole doc (a download resume is risky, as the XML might change between your 2 requests).

David.
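The partial-download idea can be sketched in Python 3 with `urllib.request`. This is only an illustration: the helper names, the 512-byte default and the timeout are assumptions, and a dynamically generated page may well ignore the Range header and send the whole document, which the status code reveals.

```python
import urllib.request


def build_range_request(url, nbytes=512):
    """Build a GET request asking only for the first nbytes bytes."""
    return urllib.request.Request(
        url, headers={"Range": "bytes=0-%d" % (nbytes - 1)}
    )


def fetch_document_start(url, nbytes=512, timeout=10):
    """Return (status, data) for the start of the document.

    Status 206 means the server honoured the range; 200 means it
    ignored the header and is sending the full document.
    """
    request = build_range_request(url, nbytes)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read(nbytes)
```

If the timestamp found in the returned bytes differs from the last one seen, a plain GET for the whole document follows, as suggested above.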
Re: design choice: multi-threaded / asynchronous wxpython client?
bullockbefriending bard wrote:

> Tempting thought, but one of the problems with this kind of horse
> racing tote data is that a lot of it is for combinations of runners
> rather than single runners. Whilst there might be (say) 14 horses in a
> race, there are 91 quinella price combinations (1-2 through 13-14,
> i.e. the 2-subsets of range(1, 15)) and 364 trio price combinations.
> It is not really practical (I suspect) to have database tables with
> columns for that many combinations?
>
> I certainly DO have a horror of having my XML / whatever else formats
> getting out of sync. I also have to worry about the tote company later
> changing their XML format. From that viewpoint, there is indeed a lot
> to be said for storing the tote data as numbers in tables.

I don't understand anything about horse races... But it should be possible to normalize such information into some tables (not necessarily one). Then again, nothing prevents you from having dozens of columns on one table if it is needed (it might not be the most efficient solution performance and disk space-wise, depending on what you have, but it works).

Using things like that you can even enhance your system and provide more information about each horse, its race history, price history, etc.

I love working with data and statistics, so even though I don't know the rules and workings of horse racing, I can think of several things I'd like to track or extract from the information you seem to have :-)

How does that price thing work? Are these the payout ratios for bets? What is a quinella or a trio? Two or three horses in a defined order winning the race?
Re: design choice: multi-threaded / asynchronous wxpython client?
On Apr 27, 11:27 pm, "BJörn Lindqvist" <[EMAIL PROTECTED]> wrote:
> I think twisted is overkill for this problem. Threading, elementtree
> and urllib should more than suffice. One thread polling the server for
> each race with the desired polling interval. Each time some data is
> treated, that thread sends a signal containing information about what
> changed. The gui listens to the signal and will, if needed, update
> itself with the new information. The database handler also listens to
> the signal and updates the db.

So, if I understand you correctly:

Assuming 8 races, with race 1 just about to start, we would have 8 polling threads, with the race 1 thread polling at a faster rate than the others. After betting on race 1 has closed, I could dispense with that thread, change the race 2 thread to poll faster, and so on...?

I had been rather stupidly thinking of just two polling threads, one for the current race and one for races not yet run... but starting out with a thread for each extant race seems simpler, given that there is then no need to handle the mechanics of shifting the polling of races from the omnibus slow thread to the current race fast thread.

Having got my minidom parser working nicely, I'm inclined to stick with it for now while I get other parts of the problem licked into shape. However, I do take your point that it's probably overkill for this simple kind of structured, mostly numerical data, and I will try to find time to experiment with the elementtree approach later. No harm at all in shaving the odd second off document parse times.
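The hand-over of the fast polling rate from one race to the next can be expressed as a small pure function, which each per-race thread (or a timer) could consult before sleeping. A minimal Python 3 sketch: the 7 s and 150 s figures come from the thread, while the function name and the rule "lowest-numbered open race is the upcoming one" are assumptions.

```python
FAST, SLOW = 7.0, 150.0   # seconds, the polling intervals quoted in the thread


def polling_plan(race_numbers, closed):
    """Map each race still open for betting to its polling interval.

    Assumption: the lowest-numbered race that has not closed is the
    upcoming one and gets the fast interval; closed races are dropped.
    """
    open_races = sorted(r for r in race_numbers if r not in closed)
    return {race: (FAST if i == 0 else SLOW) for i, race in enumerate(open_races)}


# Races 1 and 2 are done, race 3 is about to start.
print(polling_plan(range(1, 9), closed={1, 2}))
# {3: 7.0, 4: 150.0, 5: 150.0, 6: 150.0, 7: 150.0, 8: 150.0}
```

When the XML attribute that signals betting cut-off appears for race 3, adding 3 to `closed` is enough for race 4 to pick up the fast interval on its next cycle.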
Re: design choice: multi-threaded / asynchronous wxpython client?
On Apr 27, 11:12 pm, Jorge Godoy <[EMAIL PROTECTED]> wrote: > bullockbefriending bard wrote: > > A further complication is that at a later point, I will want to do > > real-time time series prediction on all this data (viz. predicting > > actual starting prices at post time x minutes in the future). Assuming > > I can quickly (enough) retrieve the relevant last n tote data samples > > from the database in order to do this, then it will indeed be much > > simpler to make things much more DB-centric... as opposed to > > maintaining all this state/history in program data structures and > > updating it in real time. > > If instead of storing XML and YAML you store the data points, you can do > everything from inside the database. > > PostgreSQL supports Python stored procedures / functions and also support > using R in the same way, for manipulating data. Then you can work with > everything and just retrieve the resulting information. > > You might try storing the raw data and the XML / YAML, but I believe that > keeping those sync'ed might cause you some extra work. Tempting thought, but one of the problems with this kind of horse racing tote data is that a lot of it is for combinations of runners rather than single runners. Whilst there might be (say) 14 horses in a race, there are 91 quinella price combinations (1-2 through 13-14, i.e. the 2-subsets of range(1, 15)) and 364 trio price combinations. It is not really practical (I suspect) to have database tables with columns for that many combinations? I certainly DO have a horror of having my XML / whatever else formats getting out of sync. I also have to worry about the tote company later changing their XML format. From that viewpoint, there is indeed a lot to be said for storing the tote data as numbers in tables. -- http://mail.python.org/mailman/listinfo/python-list
Re: design choice: multi-threaded / asynchronous wxpython client?
On 2008-04-27, David <[EMAIL PROTECTED]> wrote:
>> 1) The data for the race about to start updates every (say) 15
>> seconds, and the data for earlier and later races updates only every
>> (say) 5 minutes. There is no point for me to be hammering the server
>> with requests every 15 seconds for data for races after the upcoming
>
> Try using an HTTP HEAD instruction instead to check if the data has
> changed since last time.

A GET with If-Modified-Since is better still
(http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html, section 14.25).

--
Jarkko Torppa
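A conditional GET along these lines might look as follows in Python 3. This is a sketch under assumptions: the helper names and timeout are invented, and it only helps if the server actually tracks modification times; a server that regenerates the XML on every request will most likely just answer 200 with the full body each time.

```python
import urllib.error
import urllib.request
from email.utils import formatdate


def conditional_request(url, last_fetch_epoch):
    """Build a GET carrying If-Modified-Since for the given Unix timestamp."""
    stamp = formatdate(last_fetch_epoch, usegmt=True)   # HTTP date format
    return urllib.request.Request(url, headers={"If-Modified-Since": stamp})


def fetch_if_changed(url, last_fetch_epoch, timeout=10):
    """Return the body, or None if the server answers 304 Not Modified."""
    request = conditional_request(url, last_fetch_epoch)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:      # urllib reports 304 as an HTTPError
            return None
        raise
```

Compared with HEAD, this needs only one round trip when the data has changed, because the new body arrives in the same response.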
Re: design choice: multi-threaded / asynchronous wxpython client?
I think twisted is overkill for this problem. Threading, elementtree and urllib should more than suffice. One thread polling the server for each race with the desired polling interval. Each time some data is treated, that thread sends a signal containing information about what changed. The gui listens to the signal and will, if needed, update itself with the new information. The database handler also listens to the signal and updates the db. 2008/4/27, bullockbefriending bard <[EMAIL PROTECTED]>: > I am a complete ignoramus and newbie when it comes to designing and > coding networked clients (or servers for that matter). I have a copy > of Goerzen (Foundations of Python Network Programming) and once > pointed in the best direction should be able to follow my nose and get > things sorted... but I am not quite sure which is the best path to > take and would be grateful for advice from networking gurus. > > I am writing a program to display horse racing tote odds in a desktop > client program. I have access to an HTTP (open one of several URLs, > and I get back an XML doc with some data... not XML-RPC.) source of > XML data which I am able to parse and munge with no difficulty at all. > I have written and successfully tested a simple command line program > which allows me to repeatedly poll the server and parse the XML. Easy > enough, but the real world production complications are: > > 1) The data for the race about to start updates every (say) 15 > seconds, and the data for earlier and later races updates only every > (say) 5 minutes. There is no point for me to be hammering the server > with requests every 15 seconds for data for races after the upcoming > race... I should query for this perhaps every 150s to be safe. But for > the upcoming race, I must not miss any updates and should query every > ~7s to be safe. So... 
in the middle of a race meeting the situation > might be: > race 1 (race done with, no-longer querying), race 2 (race done with, > no longer querying) race 3 (about to start, data on server for this > race updating every 15s, my client querying every 7s), races 4-8 (data > on server for these races updating every 5 mins, my client querying > every 2.5 mins) > > 2) After a race has started and betting is cut off and there are > consequently no more tote updates for that race (it is possible to > determine when this occurs precisely because of an attribute in the > XML data), I need to stop querying (say) race 3 every 7s and remove > race 4 from the 150s query group and begin querying its data every 7s. > > 3) I need to dump this data (for all races, not just current about to > start race) to text files, store it as BLOBs in a DB *and* update real > time display in a wxpython windowed client. > > My initial thought was to have two threads for the different update > polling cycles. In addition I would probably need another thread to > handle UI stuff, and perhaps another for dealing with file/DB data > write out. But, I wonder if using Twisted is a better idea? I will > still need to handle some threading myself, but (I think) only for > keeping wxpython happy by doing all this other stuff off the main > thread + perhaps also persisting received data in yet another thread. > > I have zero experience with these kinds of design choices and would be > very happy if those with experience could point out the pros and cons > of each (synchronous/multithreaded, or Twisted) for dealing with the > two differing sample rates problem outlined above. > > Many TIA! > > > > > -- > http://mail.python.org/mailman/listinfo/python-list > -- mvh Björn -- http://mail.python.org/mailman/listinfo/python-list
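The "polling thread sends a signal, GUI and database handler listen" design can be sketched with a tiny observer class in Python 3. Everything here is illustrative: the `Signal` class, the listener lists and the made-up odds stand in for real polling, display and storage code.

```python
import threading


class Signal:
    """Minimal thread-safe signal: listeners are called with keyword info."""

    def __init__(self):
        self._listeners = []
        self._lock = threading.Lock()

    def connect(self, listener):
        with self._lock:
            self._listeners.append(listener)

    def emit(self, **info):
        with self._lock:
            listeners = list(self._listeners)   # copy, then call outside the lock
        for listener in listeners:
            listener(**info)


race_updated = Signal()
gui_log, db_log = [], []                         # stand-ins for the GUI and the DB
race_updated.connect(lambda race, odds: gui_log.append((race, odds)))
race_updated.connect(lambda race, odds: db_log.append((race, odds)))


def poll_once(race):
    odds = {"1-2": 4.5}                          # made-up data instead of a real fetch
    race_updated.emit(race=race, odds=odds)


poller = threading.Thread(target=poll_once, args=(3,))
poller.start()
poller.join()
print(gui_log == db_log == [(3, {"1-2": 4.5})])  # True
```

Note that listeners run in the polling thread. In a wxPython client the GUI listener should therefore hand the update over to the GUI thread, for example with `wx.CallAfter`, rather than touching widgets directly.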
Re: design choice: multi-threaded / asynchronous wxpython client?
bullockbefriending bard wrote:

> 3) I need to dump this data (for all races, not just current about to
> start race) to text files, store it as BLOBs in a DB *and* update real
> time display in a wxpython windowed client.

Why in a BLOB? Why not into specific data types and normalized tables? You can also save the BLOB for backup or auditing, but on its own this won't allow you to use your DB to the best of its capabilities... It will just act as a data container, the same as a network share (and a network share would not penalize you much for opening/closing connections).
Re: design choice: multi-threaded / asynchronous wxpython client?
On Apr 27, 10:10 pm, David <[EMAIL PROTECTED]> wrote:
> > 1) The data for the race about to start updates every (say) 15
> > seconds, and the data for earlier and later races updates only every
> > (say) 5 minutes. There is no point for me to be hammering the server
> > with requests every 15 seconds for data for races after the upcoming
>
> Try using an HTTP HEAD instruction instead to check if the data has
> changed since last time.

Thanks for the suggestion... am I going about this the right way here?

    import urllib2

    request = urllib2.Request("http://get-rich.quick.com")
    # Override the method so that urlopen sends HEAD instead of GET.
    request.get_method = lambda: "HEAD"
    http_file = urllib2.urlopen(request)
    print http_file.headers

->>>
Age: 0
Date: Sun, 27 Apr 2008 16:07:11 GMT
Content-Length: 521
Content-Type: text/xml; charset=utf-8
Expires: Sun, 27 Apr 2008 16:07:41 GMT
Cache-Control: public, max-age=30, must-revalidate
Connection: close
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
X-AspNet-Version: 1.1.4322
Via: 1.1 jcbw-nc3 (NetCache NetApp/5.5R4D6)

Date is the time of the server response and not the last data update. Date is definitely the time of the server response to my request and bears no relation to when the live XML data was updated. I know this for a fact because right now there is no active race meeting and any data still available is static and many hours old. I would not feel confident rejecting incoming data as duplicate based only on the same-content-length criterion. Am I missing something here?

Actually there doesn't seem to be too much difficulty performance-wise in fetching and parsing (minidom) the XML data and checking the internal update time stamp (it's an attribute) in the parsed doc. If timings got really tight, presumably I could check each doc's time stamp more quickly with SAX (the time stamp comes early in the data, as one might reasonably expect) before deciding whether to go the whole hog with minidom if the time stamp has in fact changed since I last polled the server.

But if there is something I don't get about the HTTP HEAD approach, please let me know, as a simple check like this would obviously be a good thing for me.
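The "peek at the time stamp with SAX first" idea could be sketched like this in Python 3. The attribute name `updated` and the sample document are invented for illustration, since the real XML format is not shown in the thread; the technique is simply to abort parsing by raising an exception as soon as the attribute is seen.

```python
import xml.sax


class _Found(Exception):
    """Raised to stop parsing as soon as the time stamp is seen."""


class StampFinder(xml.sax.ContentHandler):
    def __init__(self, attribute):
        super().__init__()
        self.attribute = attribute

    def startElement(self, name, attrs):
        if self.attribute in attrs:
            raise _Found(attrs[self.attribute])


def first_stamp(xml_bytes, attribute="updated"):
    """Return the first value of `attribute` in the document, or None."""
    try:
        xml.sax.parseString(xml_bytes, StampFinder(attribute))
    except _Found as found:
        return found.args[0]
    return None


# Hypothetical document shape; only the early attribute matters here.
sample = b'<race id="3" updated="2008-04-27T16:07:11"><pool kind="quinella"/></race>'
print(first_stamp(sample))   # 2008-04-27T16:07:11
```

Only when the returned stamp differs from the previous poll would the document be handed to minidom (or ElementTree) for a full parse.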
Re: design choice: multi-threaded / asynchronous wxpython client?
bullockbefriending bard wrote: > A further complication is that at a later point, I will want to do > real-time time series prediction on all this data (viz. predicting > actual starting prices at post time x minutes in the future). Assuming > I can quickly (enough) retrieve the relevant last n tote data samples > from the database in order to do this, then it will indeed be much > simpler to make things much more DB-centric... as opposed to > maintaining all this state/history in program data structures and > updating it in real time. If instead of storing XML and YAML you store the data points, you can do everything from inside the database. PostgreSQL supports Python stored procedures / functions and also support using R in the same way, for manipulating data. Then you can work with everything and just retrieve the resulting information. You might try storing the raw data and the XML / YAML, but I believe that keeping those sync'ed might cause you some extra work. -- http://mail.python.org/mailman/listinfo/python-list
Re: design choice: multi-threaded / asynchronous wxpython client?
On Apr 27, 10:05 pm, "Eric Wertman" <[EMAIL PROTECTED]> wrote: > HI, that does look like a lot of fun... You might consider breaking > that into 2 separate programs. Write one that's threaded to keep a db > updated properly, and write a completely separate one to handle > displaying data from your db. This would allow you to later change or > add a web interface without having to muck with the code that handles > data. Thanks for the good point. It certainly is a lot of 'fun'. One of those jobs which at first looks easy (XML, very simple to parse data), but a few gotchas in the real-time nature of the beast. After thinking about your idea more, I am sure this decoupling of functions and making everything DB-centric can simplify a lot of issues. I quite like the idea of persisting pickled or YAML data along with the raw XML (for archival purposes + occurs to me I might be able to do something with XSLT to get it directly into screen viewable form without too much work) to a DB and then having a client program which queries most recent time-stamped data for display. A further complication is that at a later point, I will want to do real-time time series prediction on all this data (viz. predicting actual starting prices at post time x minutes in the future). Assuming I can quickly (enough) retrieve the relevant last n tote data samples from the database in order to do this, then it will indeed be much simpler to make things much more DB-centric... as opposed to maintaining all this state/history in program data structures and updating it in real time. -- http://mail.python.org/mailman/listinfo/python-list
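Retrieving "the relevant last n tote data samples" is cheap if each sample is stored as a time-stamped row. A minimal Python 3 sketch with the standard `sqlite3` module: the table layout, the column names and the generated prices are assumptions made up for the example.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE sample (race INTEGER, combo TEXT, taken_at REAL, price REAL)"
)
# The index lets the 'latest n' query avoid scanning the whole table.
db.execute("CREATE INDEX sample_lookup ON sample (race, combo, taken_at)")

# Ten fake samples for the 1-2 quinella in race 3, one per time unit.
rows = [(3, "1-2", float(t), 4.0 + t) for t in range(10)]
db.executemany("INSERT INTO sample VALUES (?, ?, ?, ?)", rows)


def last_n(db, race, combo, n):
    """Return the n most recent (taken_at, price) rows, oldest first."""
    cur = db.execute(
        "SELECT taken_at, price FROM sample "
        "WHERE race = ? AND combo = ? "
        "ORDER BY taken_at DESC LIMIT ?",
        (race, combo, n),
    )
    return cur.fetchall()[::-1]


print(last_n(db, 3, "1-2", 3))
```

A separate display program can run the same query against a shared database, which is what decoupling the fetcher from the client amounts to.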
Re: design choice: multi-threaded / asynchronous wxpython client?
> 1) The data for the race about to start updates every (say) 15
> seconds, and the data for earlier and later races updates only every
> (say) 5 minutes. There is no point for me to be hammering the server
> with requests every 15 seconds for data for races after the upcoming

Try using an HTTP HEAD instruction instead to check if the data has changed since last time.
Re: design choice: multi-threaded / asynchronous wxpython client?
Hi, that does look like a lot of fun... You might consider breaking that into 2 separate programs. Write one that's threaded to keep a db updated properly, and write a completely separate one to handle displaying data from your db. This would allow you to later change or add a web interface without having to muck with the code that handles data.
Re: design choice: multi-threaded / asynchronous wxpython client?
bullockbefriending bard wrote: I am a complete ignoramus and newbie when it comes to designing and coding networked clients (or servers for that matter). I have a copy of Goerzen (Foundations of Python Network Programming) and once pointed in the best direction should be able to follow my nose and get things sorted... but I am not quite sure which is the best path to take and would be grateful for advice from networking gurus. I am writing a program to display horse racing tote odds in a desktop client program. I have access to an HTTP (open one of several URLs, and I get back an XML doc with some data... not XML-RPC.) source of XML data which I am able to parse and munge with no difficulty at all. I have written and successfully tested a simple command line program which allows me to repeatedly poll the server and parse the XML. Easy enough, but the real world production complications are: 1) The data for the race about to start updates every (say) 15 seconds, and the data for earlier and later races updates only every (say) 5 minutes. There is no point for me to be hammering the server with requests every 15 seconds for data for races after the upcoming race... I should query for this perhaps every 150s to be safe. But for the upcoming race, I must not miss any updates and should query every ~7s to be safe. So... 
in the middle of a race meeting the situation might be: race 1 (race done with, no-longer querying), race 2 (race done with, no longer querying) race 3 (about to start, data on server for this race updating every 15s, my client querying every 7s), races 4-8 (data on server for these races updating every 5 mins, my client querying every 2.5 mins) 2) After a race has started and betting is cut off and there are consequently no more tote updates for that race (it is possible to determine when this occurs precisely because of an attribute in the XML data), I need to stop querying (say) race 3 every 7s and remove race 4 from the 150s query group and begin querying its data every 7s. 3) I need to dump this data (for all races, not just current about to start race) to text files, store it as BLOBs in a DB *and* update real time display in a wxpython windowed client. My initial thought was to have two threads for the different update polling cycles. In addition I would probably need another thread to handle UI stuff, and perhaps another for dealing with file/DB data write out. But, I wonder if using Twisted is a better idea? I will still need to handle some threading myself, but (I think) only for keeping wxpython happy by doing all this other stuff off the main thread + perhaps also persisting received data in yet another thread. I have zero experience with these kinds of design choices and would be very happy if those with experience could point out the pros and cons of each (synchronous/multithreaded, or Twisted) for dealing with the two differing sample rates problem outlined above. Many TIA! IMHO using twisted will give you the best performance and framework. Since it uses callbacks for every request, your machine could handle a LOT of different external queries and keep everything updated in WX. 
It might be a little tricky to get working with WX, but I recall Googling for something like this not long ago and there appeared to be sufficient information on how to get it working.

http://twistedmatrix.com/projects/core/documentation/howto/choosing-reactor.html

Twisted even automatically uses threads to keep SQL database storage routines from blocking (see Chapter 4 of Twisted Network Programming Essentials).

This is an ambitious project, good luck.

-Larry
design choice: multi-threaded / asynchronous wxpython client?
I am a complete ignoramus and newbie when it comes to designing and coding networked clients (or servers for that matter). I have a copy of Goerzen (Foundations of Python Network Programming) and once pointed in the best direction should be able to follow my nose and get things sorted... but I am not quite sure which is the best path to take and would be grateful for advice from networking gurus. I am writing a program to display horse racing tote odds in a desktop client program. I have access to an HTTP (open one of several URLs, and I get back an XML doc with some data... not XML-RPC.) source of XML data which I am able to parse and munge with no difficulty at all. I have written and successfully tested a simple command line program which allows me to repeatedly poll the server and parse the XML. Easy enough, but the real world production complications are: 1) The data for the race about to start updates every (say) 15 seconds, and the data for earlier and later races updates only every (say) 5 minutes. There is no point for me to be hammering the server with requests every 15 seconds for data for races after the upcoming race... I should query for this perhaps every 150s to be safe. But for the upcoming race, I must not miss any updates and should query every ~7s to be safe. So... in the middle of a race meeting the situation might be: race 1 (race done with, no-longer querying), race 2 (race done with, no longer querying) race 3 (about to start, data on server for this race updating every 15s, my client querying every 7s), races 4-8 (data on server for these races updating every 5 mins, my client querying every 2.5 mins) 2) After a race has started and betting is cut off and there are consequently no more tote updates for that race (it is possible to determine when this occurs precisely because of an attribute in the XML data), I need to stop querying (say) race 3 every 7s and remove race 4 from the 150s query group and begin querying its data every 7s. 
3) I need to dump this data (for all races, not just current about to start race) to text files, store it as BLOBs in a DB *and* update real time display in a wxpython windowed client. My initial thought was to have two threads for the different update polling cycles. In addition I would probably need another thread to handle UI stuff, and perhaps another for dealing with file/DB data write out. But, I wonder if using Twisted is a better idea? I will still need to handle some threading myself, but (I think) only for keeping wxpython happy by doing all this other stuff off the main thread + perhaps also persisting received data in yet another thread. I have zero experience with these kinds of design choices and would be very happy if those with experience could point out the pros and cons of each (synchronous/multithreaded, or Twisted) for dealing with the two differing sample rates problem outlined above. Many TIA! -- http://mail.python.org/mailman/listinfo/python-list