Re: Enumerating k-segmentations of a sequence
On Nov 26, 12:15 am, [EMAIL PROTECTED] wrote:

> bullockbefriending bard napisał(a):
>
> I'm not sure if my terminology is precise enough, but what I want to do
> is: Given an ordered sequence of n items, enumerate all its possible
> k-segmentations. This is *not* the same as enumerating the k set
> partitions of the n items, because I am only interested in those set
> partitions which preserve the order of the items. i.e. if the input
> sequence is (1 2 3 4 5), then ((1 4) (2 3) (5)) is unacceptable, whereas
> ((1 2) (3) (4 5)) is acceptable. Hence the term 'segmentations'.
>
> e.g., for input sequence (1 2 3 4 5) and k = 3, I need a function which
> enumerates the following 3-segmentations:
>
>     ((1 2 3) (4) (5))
>     ((1 2) (3 4) (5))
>     ((1 2) (3) (4 5))
>     ((1) (2 3 4) (5))
>     ((1) (2 3) (4 5))
>     ((1) (2) (3 4 5))
>
> [snip: clustering motivation, quoted in full in the original post below]

I'm not sure if I correctly understood the semantics of what you need.
Anyway, it looks like a nice exercise in nested generators:

    def segmentations(sequence, k):
        assert 1 <= k <= len(sequence)
        if k == 1:
            yield [sequence]
        else:
            for headlen in xrange(1, len(sequence) - k + 2):
                head, tail = sequence[:headlen], sequence[headlen:]
                for tail_seg in segmentations(tail, k - 1):
                    yield [head] + tail_seg

    for segmentation in segmentations((1, 2, 3, 4, 5), 3):
        print segmentation

Regards,
Marek

Exactly what I needed. Thank you all for the examples and also for explaining how to think about problems of this nature!
-- http://mail.python.org/mailman/listinfo/python-list
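For the archive: the clustering selection the original poster describes - pick the k-segmentation minimising the total sum of squared differences from each segment's mean - can be layered directly on top of Marek's generator. A sketch ported to modern Python 3 (`range`/`print()` instead of `xrange`/`print`); the `best_segmentation` helper and the sample data are my own additions, not from the thread:

```python
def segmentations(sequence, k):
    # Python 3 port of Marek's generator: split off every possible head,
    # then recursively segment the tail into k - 1 pieces.
    assert 1 <= k <= len(sequence)
    if k == 1:
        yield [sequence]
    else:
        for headlen in range(1, len(sequence) - k + 2):
            head, tail = sequence[:headlen], sequence[headlen:]
            for tail_seg in segmentations(tail, k - 1):
                yield [head] + tail_seg

def ssd(segment):
    # sum of squared differences from the segment mean
    mean = sum(segment) / len(segment)
    return sum((x - mean) ** 2 for x in segment)

def best_segmentation(sequence, k):
    # the k-segmentation minimising total within-segment SSD
    return min(segmentations(tuple(sequence), k),
               key=lambda segs: sum(ssd(s) for s in segs))

print(best_segmentation((1, 2, 9, 10, 35), 3))
# -> [(1, 2), (9, 10), (35,)]
```

This brute-forces all C(n-1, k-1) segmentations, which is fine for small n; for long sequences a dynamic-programming formulation would be the next step.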
Enumerating k-segmentations of a sequence
I'm not sure if my terminology is precise enough, but what I want to do is: Given an ordered sequence of n items, enumerate all its possible k-segmentations. This is *not* the same as enumerating the k set partitions of the n items, because I am only interested in those set partitions which preserve the order of the items. i.e. if the input sequence is (1 2 3 4 5), then ((1 4) (2 3) (5)) is unacceptable, whereas ((1 2) (3) (4 5)) is acceptable. Hence the term 'segmentations'.

e.g., for input sequence (1 2 3 4 5) and k = 3, I need a function which enumerates the following 3-segmentations:

    ((1 2 3) (4) (5))
    ((1 2) (3 4) (5))
    ((1 2) (3) (4 5))
    ((1) (2 3 4) (5))
    ((1) (2 3) (4 5))
    ((1) (2) (3 4 5))

The reason I want to do this is to use it in some simple one-dimensional clustering, i.e. each set item (won't be just integers as above) will have a value of interest, and I'll select the particular segmentation which minimises Sigma SSD (sum of squared differences from mean) of these values. It seems overkill to generate the full list of set partitions [including e.g. ((1 4) (2) (3 5))] because I intend to begin by sorting the input sequence such that element 1 <= element 2 <= ... <= element n.

Structural recursion not being my strong point, any ideas on how to go about this would be much appreciated!
design choice: multi-threaded / asynchronous wxpython client?
I am a complete ignoramus and newbie when it comes to designing and coding networked clients (or servers for that matter). I have a copy of Goerzen (Foundations of Python Network Programming) and once pointed in the best direction should be able to follow my nose and get things sorted... but I am not quite sure which is the best path to take and would be grateful for advice from networking gurus.

I am writing a program to display horse racing tote odds in a desktop client program. I have access to an HTTP source of XML data (open one of several URLs, and I get back an XML doc with some data... not XML-RPC) which I am able to parse and munge with no difficulty at all. I have written and successfully tested a simple command line program which allows me to repeatedly poll the server and parse the XML. Easy enough, but the real world production complications are:

1) The data for the race about to start updates every (say) 15 seconds, and the data for earlier and later races updates only every (say) 5 minutes. There is no point in my hammering the server with requests every 15 seconds for data for races after the upcoming race... I should query for those perhaps every 150s to be safe. But for the upcoming race, I must not miss any updates and should query every ~7s to be safe. So, in the middle of a race meeting the situation might be: race 1 (race done with, no longer querying), race 2 (race done with, no longer querying), race 3 (about to start, data on server for this race updating every 15s, my client querying every 7s), races 4-8 (data on server for these races updating every 5 mins, my client querying every 2.5 mins).

2) After a race has started and betting is cut off and there are consequently no more tote updates for that race (it is possible to determine precisely when this occurs because of an attribute in the XML data), I need to stop querying (say) race 3 every 7s, remove race 4 from the 150s query group, and begin querying its data every 7s.
3) I need to dump this data (for all races, not just the current about-to-start race) to text files, store it as BLOBs in a DB *and* update a real-time display in a wxPython windowed client.

My initial thought was to have two threads for the different update polling cycles. In addition I would probably need another thread to handle UI stuff, and perhaps another for dealing with file/DB data write-out. But I wonder if using Twisted is a better idea? I will still need to handle some threading myself, but (I think) only for keeping wxPython happy by doing all this other stuff off the main thread + perhaps also persisting received data in yet another thread.

I have zero experience with these kinds of design choices and would be very happy if those with experience could point out the pros and cons of each approach (synchronous/multithreaded, or Twisted) for dealing with the two differing sample rates problem outlined above. Many TIA!
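To make the two-rate requirement concrete, here is a small sketch of the scheduling logic alone - no networking, no threads: a heap of (next-poll-time, race) pairs, with the interval chosen per race. The 7s/150s figures come from the post; the `simulate_polls` name and the pure-simulation shape are mine. A real client would sleep until the head of the heap came due and fetch there:

```python
import heapq

FAST, SLOW = 7, 150   # assumed polling intervals, in seconds

def simulate_polls(races, upcoming, horizon):
    """Return (time, race) poll events up to `horizon` seconds: the
    upcoming race polls every FAST seconds, the rest every SLOW."""
    interval = {r: (FAST if r == upcoming else SLOW) for r in races}
    queue = [(0, r) for r in races]      # everything gets an initial poll
    heapq.heapify(queue)
    events = []
    while queue and queue[0][0] <= horizon:
        t, race = heapq.heappop(queue)
        events.append((t, race))         # a real client would fetch + parse here
        heapq.heappush(queue, (t + interval[race], race))
    return events

print(simulate_polls([3, 4, 5], upcoming=3, horizon=30))
# race 3 polled at t=0, 7, 14, 21, 28; races 4 and 5 only at t=0
```

Point 2) of the post then maps to rebuilding the `interval` table (or pushing new heap entries) with `upcoming` bumped to the next race once the XML flags betting as closed.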
Re: design choice: multi-threaded / asynchronous wxpython client?
On Apr 27, 10:05 pm, Eric Wertman [EMAIL PROTECTED] wrote:

> HI, that does look like a lot of fun... You might consider breaking that
> into 2 separate programs. Write one that's threaded to keep a db updated
> properly, and write a completely separate one to handle displaying data
> from your db. This would allow you to later change or add a web
> interface without having to muck with the code that handles data.

Thanks for the good point. It certainly is a lot of 'fun'. One of those jobs which at first looks easy (XML, very simple to parse data), but with a few gotchas in the real-time nature of the beast.

After thinking about your idea more, I am sure this decoupling of functions and making everything DB-centric can simplify a lot of issues. I quite like the idea of persisting pickled or YAML data along with the raw XML (for archival purposes + it occurs to me I might be able to do something with XSLT to get it directly into screen-viewable form without too much work) to a DB and then having a client program which queries the most recent time-stamped data for display.

A further complication is that at a later point I will want to do real-time time series prediction on all this data (viz. predicting actual starting prices at post time x minutes in the future). Assuming I can quickly (enough) retrieve the relevant last n tote data samples from the database in order to do this, then it will indeed be much simpler to make things much more DB-centric... as opposed to maintaining all this state/history in program data structures and updating it in real time.
Re: design choice: multi-threaded / asynchronous wxpython client?
On Apr 27, 10:10 pm, David [EMAIL PROTECTED] wrote:

> > 1) The data for the race about to start updates every (say) 15
> > seconds, and the data for earlier and later races updates only every
> > (say) 5 minutes. There is no point for me to be hammering the server
> > with requests every 15 seconds for data for races after the upcoming
>
> Try using an HTTP HEAD instruction instead to check if the data has
> changed since last time.

Thanks for the suggestion... am I going about this the right way here?

    import urllib2

    request = urllib2.Request("http://get-rich.quick.com")
    request.get_method = lambda: "HEAD"
    http_file = urllib2.urlopen(request)
    print http_file.headers

which gives:

    Age: 0
    Date: Sun, 27 Apr 2008 16:07:11 GMT
    Content-Length: 521
    Content-Type: text/xml; charset=utf-8
    Expires: Sun, 27 Apr 2008 16:07:41 GMT
    Cache-Control: public, max-age=30, must-revalidate
    Connection: close
    Server: Microsoft-IIS/6.0
    X-Powered-By: ASP.NET
    X-AspNet-Version: 1.1.4322
    Via: 1.1 jcbw-nc3 (NetCache NetApp/5.5R4D6)

Date is definitely the time of the server response to my request, not of the last data update, and bears no relation to when the live XML data was updated. I know this for a fact because right now there is no active race meeting and any data still available is static and many hours old. I would not feel confident rejecting incoming data as duplicate based only on a same-content-length criterion. Am I missing something here?

Actually there doesn't seem to be too much difficulty performance-wise in fetching and parsing (minidom) the XML data and checking the internal (it's an attribute) update time stamp in the parsed doc. If timings got really tight, presumably I could more quickly check each doc's time stamp with SAX (the time stamp comes early in the data, as one might reasonably expect) before deciding whether to go the whole hog with minidom if the time stamp has in fact changed since I last polled the server.
But if there is something I don't get about the HTTP HEAD approach, please let me know, as a simple check like this would obviously be a good thing for me.
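Since HEAD turned out not to expose the feed's own update time, one cheap alternative in the spirit of the poster's SAX idea is to always GET, but abort the parse as soon as the embedded timestamp attribute has been read, and only run the full minidom parse when it has changed. A sketch using the stdlib expat bindings; the `Pools` tag and `updated` attribute names are invented for illustration - the real feed's will differ:

```python
import xml.parsers.expat

class _Done(Exception):
    """Control-flow exception used to abort the parse early."""

def extract_timestamp(xml_bytes, tag="Pools", attr="updated"):
    """Pull one attribute from the first matching element, then stop -
    cheap enough to run on every poll before deciding on a full parse."""
    found = []

    def start(name, attrs):
        if name == tag and attr in attrs:
            found.append(attrs[attr])
            raise _Done          # exceptions in handlers propagate out of Parse

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    try:
        parser.Parse(xml_bytes, True)
    except _Done:
        pass
    return found[0] if found else None
```

Usage: keep the last timestamp seen per race and skip the expensive processing when `extract_timestamp(new_doc)` matches it.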
Re: design choice: multi-threaded / asynchronous wxpython client?
On Apr 27, 11:12 pm, Jorge Godoy [EMAIL PROTECTED] wrote:

> bullockbefriending bard wrote:
>
> > A further complication is that at a later point, I will want to do
> > real-time time series prediction on all this data (viz. predicting
> > actual starting prices at post time x minutes in the future). Assuming
> > I can quickly (enough) retrieve the relevant last n tote data samples
> > from the database in order to do this, then it will indeed be much
> > simpler to make things much more DB-centric... as opposed to
> > maintaining all this state/history in program data structures and
> > updating it in real time.
>
> If instead of storing XML and YAML you store the data points, you can do
> everything from inside the database. PostgreSQL supports Python stored
> procedures / functions and also supports using R in the same way, for
> manipulating data. Then you can work with everything and just retrieve
> the resulting information.
>
> You might try storing the raw data and the XML / YAML, but I believe
> that keeping those sync'ed might cause you some extra work.

Tempting thought, but one of the problems with this kind of horse racing tote data is that a lot of it is for combinations of runners rather than single runners. Whilst there might be (say) 14 horses in a race, there are 91 quinella price combinations (1-2 through 13-14, i.e. the 2-subsets of range(1, 15)) and 364 trio price combinations. It is not really practical (I suspect) to have database tables with columns for that many combinations?

I certainly DO have a horror of having my XML / whatever other formats getting out of sync. I also have to worry about the tote company later changing their XML format. From that viewpoint, there is indeed a lot to be said for storing the tote data as numbers in tables.
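The 91 and 364 figures are just C(14,2) and C(14,3), which is easy to sanity-check with itertools. It also suggests the usual relational answer to "columns for that many combinations": one *row* per combination (race, combination, price), rather than one column each. A quick check (the variable names are mine):

```python
from itertools import combinations
from math import comb  # math.comb needs Python 3.8+

runners = range(1, 15)                      # 14 runners
quinellas = list(combinations(runners, 2))  # unordered pairs, 1-2 .. 13-14
trios = list(combinations(runners, 3))

print(len(quinellas), comb(14, 2))  # 91 91
print(len(trios), comb(14, 3))      # 364 364
```

With rows keyed on (race_id, combination), 364 trio prices are just 364 rows per update, which any database handles comfortably.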
Re: design choice: multi-threaded / asynchronous wxpython client?
On Apr 27, 11:27 pm, Björn Lindqvist [EMAIL PROTECTED] wrote:

> I think twisted is overkill for this problem. Threading, elementtree and
> urllib should more than suffice. One thread polling the server for each
> race with the desired polling interval. Each time some data is treated,
> that thread sends a signal containing information about what changed.
> The gui listens to the signal and will, if needed, update itself with
> the new information. The database handler also listens to the signal and
> updates the db.

So, if I understand you correctly: assuming 8 races and we are just about to start race 1, we would have 8 polling threads, with the race 1 thread polling at a faster rate than the other ones. After race 1 betting closes, we could dispense with that thread, change the race 2 thread to poll faster, and so on...?

I had been rather stupidly thinking of just two polling threads, one for the current race and one for races not yet run... but starting out with a thread for each extant race seems simpler, given there is then no need to handle the mechanics of shifting the polling of races from the omnibus slow thread to the current race fast thread.

Having got my minidom parser working nicely, I'm inclined to stick with it for now while I get other parts of the problem licked into shape. However, I do take your point that it's probably overkill for this simple kind of structured, mostly numerical data, and I will try to find time to experiment with the elementtree approach later. No harm at all in shaving the odd second off document parse times.
Generator for k-permutations without repetition
I was able to google a recipe for a k_permutations generator, such that I can write:

    x = range(1, 4)  # (say)

    [combi for combi in k_permutations(x, 3)] ==
    [[1, 1, 1], [1, 1, 2], [1, 1, 3], [1, 2, 1], [1, 2, 2], [1, 2, 3],
     [1, 3, 1], [1, 3, 2], [1, 3, 3], [2, 1, 1], [2, 1, 2], [2, 1, 3],
     [2, 2, 1], [2, 2, 2], [2, 2, 3], [2, 3, 1], [2, 3, 2], [2, 3, 3],
     [3, 1, 1], [3, 1, 2], [3, 1, 3], [3, 2, 1], [3, 2, 2], [3, 2, 3],
     [3, 3, 1], [3, 3, 2], [3, 3, 3]]

but what I really want is the above without repetition, i.e.:

    [combi for combi in k_permutations_without_repetitions(x, 3)] ==
    [[1, 2, 3], [1, 3, 2], [2, 1, 3], [2, 3, 1], [3, 1, 2], [3, 2, 1]]

For smallish lists, it's no problem to generate the list as follows:

    no_reps_combis = [combi for combi in k_permutations(x, 3) if
                      len(set(combi)) == 3]

but I'm sure there must be a way to do this as a pure generator, just that I wouldn't have a clue how to go about it. Help please, Python Gods! :)

Apologies in advance if I have got my terminology wrong and have been googling the wrong stuff and therefore coming up blank.
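For later readers: what this post asks for was added to the standard library as `itertools.permutations` in Python 2.6, shortly after this thread - a true generator that never produces repeated elements, so no filtering is needed:

```python
from itertools import permutations

x = range(1, 4)
result = [list(p) for p in permutations(x, 3)]
print(result)
# [[1, 2, 3], [1, 3, 2], [2, 1, 3], [2, 3, 1], [3, 1, 2], [3, 2, 1]]
```

For sorted input it yields in lexicographic order, matching the list the poster wants, and `permutations(x, 2)` etc. gives the genuine k < n case as well.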
Re: Generator for k-permutations without repetition
On Jul 4, 7:09 pm, Nis Jørgensen [EMAIL PROTECTED] wrote:

> bullockbefriending bard skrev:
>
> > I was able to google a recipe for a k_permutations generator, such
> > that i can write: [snip: example and question as in the original post
> > above]

I was initially confused by your example, since the size of the underlying set is the same as that of the individual permutations. In this case, you are just asking for all permutations of the set. As far as I can tell from a quick googling, the relevant definition of k-permutation is an ordered list of k elements from the set. Thus your first example does not really generate k-permutations.
A quick solution, not extensively tested:

    def k_perm(base, k):
        # base needs to be an instance of the python builtin set
        for e in base:
            if k == 1:
                yield [e]
            else:
                for perm in k_perm(base - set([e]), k - 1):
                    yield [e] + perm

    >>> list(k_perm(set([1, 2, 3, 4]), 3))
    [[1, 2, 3], [1, 2, 4], [1, 3, 2], [1, 3, 4], [1, 4, 2], [1, 4, 3],
     [2, 1, 3], [2, 1, 4], [2, 3, 1], [2, 3, 4], [2, 4, 1], [2, 4, 3],
     [3, 1, 2], [3, 1, 4], [3, 2, 1], [3, 2, 4], [3, 4, 1], [3, 4, 2],
     [4, 1, 2], [4, 1, 3], [4, 2, 1], [4, 2, 3], [4, 3, 1], [4, 3, 2]]

Much to my initial surprise, it works for k < 1 as well:

    >>> list(k_perm(set([1, 2, 3, 4]), -1))
    []

Hope this helps
Nis

Thank you! I'll give this a try; it seems to do exactly what I need. Sorry about the ambiguity in my example - I chose 3 from 3 merely to keep the size of the example case small, without thinking enough about how it might cause confusion.
matching objects by a tuple field criterion
i have a large collection of python objects, each of which contains an integer 6-tuple as part of its data payload. what i need to be able to do is select only those objects which meet a simple tuple-element wildcard matching criterion. e.g. given the following python objects:

    object A includes tuple (1, 2, 3, 4, 5, 6)
    object B includes tuple (1, 4, 4, 4, 11, 1)
    object C includes tuple (1, 3, 9, 1, 1, 1)

all tuples are unique. for what it's worth, the values in each field are independent of the other fields and range from 1 to 14. although 'large', my collection is sparse cf. the 14^6 possible tuples.

i want to search on *one only* tuple field/value. if my search criterion is (*,*,*,4,*,*), then i want to find object A and object B. if it is (1,*,*,*,*,*), i want to find objects A, B, and C, etc. i will only ever specify an integer match for one tuple field.

i can think of some naive approaches, but is there an elegant way to do this?
Re: matching objects by a tuple field criterion
> Instead of passing a wild-card tuple like (*,*,*,4,*,*) simply pass the
> integer you want to match and the position you want to match it in.

for sure. that was more for expository purposes than how i was planning to go about it.

> As a generator expression:
>
>     (obj for obj in list_of_objects if obj.data[what] == where)

the above, or the equivalent list comprehension, was what i had in mind as far as linear search goes, and scanning the list like this will most likely be 'good enough' performance-wise. however, partly just out of curiosity, i was wondering if there is some kind of data structure which might let me find all the matches a bit faster.
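For the curiosity part - a data structure that beats the linear scan: index each object once under every (position, value) pair of its tuple, so a single-field query becomes one dict lookup. The class, the `Item` stand-in, and the `data` attribute name are my own sketch, not from the thread; the cost is six index entries per object:

```python
from collections import defaultdict

class TupleIndex:
    """Index objects by every (position, value) of their 6-tuple payload."""
    def __init__(self, objects):
        self._index = defaultdict(list)
        for obj in objects:
            for pos, value in enumerate(obj.data):
                self._index[pos, value].append(obj)

    def match(self, pos, value):
        # all objects whose tuple has `value` at 0-based position `pos`
        return self._index.get((pos, value), [])

class Item:                       # stand-in for the poster's objects
    def __init__(self, data):
        self.data = data

a = Item((1, 2, 3, 4, 5, 6))
b = Item((1, 4, 4, 4, 11, 1))
c = Item((1, 3, 9, 1, 1, 1))
idx = TupleIndex([a, b, c])
print(idx.match(3, 4) == [a, b])   # (*,*,*,4,*,*) -> objects A and B
```

Each lookup is O(1) plus the size of the result; the trade-off is building and storing the index up front, which only pays off if the collection is queried many times between updates.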
Re: matching objects by a tuple field criterion
quite so. i rephrased the docstring to be:

    criteria is an iterable containing either '*' instances or strings
    of comma-separated integers, e.g. ['*', '1,2,3', '11,12']

thanks very much for the idea! upon further reflection, this seems to be a more elegant solution for my case than the ad-hoc generator or list comprehension approach. this is because i *do* have to filter data based on multiple single-field criteria, and i know all of these criteria at the same time - so it makes a lot of sense to do as you have done and then only do one filter operation to pull out all the objects i am interested in.
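A minimal sketch of what such a criteria-driven filter might look like, assuming the ['*', '1,2,3', ...] convention from the post (one entry per tuple position); the `make_predicate` name and shape are mine, not the poster's actual code:

```python
def make_predicate(criteria):
    """criteria: one entry per tuple position, either '*' or a string of
    comma-separated acceptable integers, e.g. ['*', '1,2,3', '*', '4', '*', '*']."""
    allowed = [None if c == '*' else {int(s) for s in c.split(',')}
               for c in criteria]

    def predicate(tup):
        # a position passes if it is wildcarded or its value is allowed
        return all(a is None or v in a for v, a in zip(tup, allowed))
    return predicate

keep = make_predicate(['*', '1,2,3', '*', '4', '*', '*'])
data = [(1, 2, 3, 4, 5, 6), (1, 4, 4, 4, 11, 1), (1, 3, 9, 1, 1, 1)]
print(list(filter(keep, data)))   # [(1, 2, 3, 4, 5, 6)]
```

Building the `allowed` sets once up front means each tuple is tested with set membership only, so one pass over the data applies every single-field criterion at the same time.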
Re: matching objects by a tuple field criterion
> There are certainly cases where the speedup is tremendous - think of a
> single integer in the first criteria - but then the overall performance
> depends on the real-life queries. If lots of wildcards are used, you
> might end up slower if the tree-walk takes more time than the
> C-implemented list iteration will cost you.

hopefully i won't need to do this, then. i'll test the filter approach tonight. generally speaking, it's going to be mostly wildcards most of the time.
Re: speeding things up with C++
> Are you sure you want an STL container? Since the primary operator here
> is Python, the extra benefit from the STL container over plain C arrays
> isn't as evident. Pyrex is a good way to write the interface between
> your C++ code and the Python code - it handles the refcounting and
> boilerplate for you - and perhaps for writing the algorithms as well,
> depending on how complicated and performance sensitive they are.

good point. while i bow to the genius of the folks who invented template metaprogramming, the compiler error messages tend to be profoundly depressing :). one way or the other, pyrex is something i need to learn, since i'm now completely enamoured with python and had better develop an arsenal of tricks for the rare times when it's just not fast enough.

> Also, using numeric/Numarray can be a very big win. It can potentially
> save you a fair amount of marshalling overhead.

as i understand it, this is so for applying the likes of matrix operations, autocorrelations, FFTs, etc., where python essentially provides scripting glue to some highly optimised C functions. i'm assuming that the kind of algorithm i am looking at, which involves some set operations on list elements + copying between lists, isn't going to be helped so much by using numpy or similar.
Re: speeding things up with C++
Thanks, this is good news. I think my C/C++ background is sufficient to manage to figure things out if I RTFM carefully.

Basically I want to pass in a Python list of integer tuples, create an STL container full of equivalent tuples, apply some processor-intensive algorithm to said list of tuples, and finally poke the results back into another Python list of integer tuples and return it to the calling Python environment. Data structures are well-defined and simple, and the most complex case would be a 3-deep nested list, so I will seriously consider figuring out how to do it manually as you suggest.

On May 31, 3:04 am, Jorgen Grahn [EMAIL PROTECTED] wrote:

> On 26 May 2007 02:19:39 -0700, bullockbefriending bard
> [EMAIL PROTECTED] wrote:
> ...
> > Essentially, I need to pass a list of 6-tuples containing only
> > integers to my new sadly necessary super-fast compiled language
> > function which i am not looking forward to writing:
> >
> >     input: [(1,2,3,4,5,6), (7,8,9,10,11,12), ...]
> >
> > and after much thrashing of processor resources, return data which
> > looks like this to the Python calling environment:
> >
> >     output: [( (1, 2), (1,), (12,), (13,), (1, 7, 11), (9,) ),
> >              ( another nested tuple like the preceding one ), ...]
> ...
>
> You're talking about the actual conversion between Python data
> structures and C or C++ data structures? That is easy to do even
> manually, IMHO -- provided a decent C background. Have a look in the
> Python/C API Reference Manual, and the mapping becomes clear. The
> PyListObject stuff for example, where there's a C function for every
> basic operation on lists, and where the elements have the C type
> PyObject *. And so on. Mapping to C is just a matter of walking a nested
> data structure, where you have a good idea what it is supposed to look
> like (a list of six-tuples of some kind of numbers).
/Jorgen

--
  // Jorgen Grahn grahn@      Ph'nglui mglw'nafh Cthulhu
\X/  snipabacken.dyndns.org   R'lyeh wgah'nagl fhtagn!
Re: speeding things up with C++
thanks! i'll look into this.

On May 27, 5:35 am, Che Guevara [EMAIL PROTECTED] wrote:

> On May 26, 11:19 am, bullockbefriending bard [EMAIL PROTECTED] wrote:
> > However, I hope someone reading this will be able to tell me that I'm
> > being a total pessimist and that in fact it isn't very difficult to do
> > what I want to do using SWIG. I'm not asking for a complete solution,
> > more like some general pointers from someone who has actually done
> > something similar before.
>
> I do stuff like that with ctypes
> (http://docs.python.org/lib/module-ctypes.html)!
> I'm sure you can figure out some way to pass/return data to/from your
> C++ lib. The easiest way - use strings or arrays, but if you dig deeper
> into the ctypes documentation, I'm sure you will find a better solution!
>
> Good luck,
> --
> misreckoning
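A hedged sketch of the ctypes route for the input half of the problem: flatten the 6-tuples into a single C int array, which is trivial to walk on the C/C++ side. The shared-library name and `process` signature in the trailing comment are hypothetical:

```python
import ctypes

def pack_tuples(tuples):
    """Flatten a list of 6-tuples of ints into a C int array, ready to
    hand to a function with a signature like  void process(int *, int)."""
    flat = [v for t in tuples for v in t]
    return (ctypes.c_int * len(flat))(*flat)

arr = pack_tuples([(1, 2, 3, 4, 5, 6), (7, 8, 9, 10, 11, 12)])
print(len(arr), list(arr))

# a real call would then be something like (library name hypothetical):
#   lib = ctypes.CDLL('./fastalgo.so')
#   lib.process(arr, len(arr) // 6)
```

This is exactly the "big array of ints" worst case the original post describes, except it needs no SWIG at all: ctypes arrays are contiguous C memory, so nothing is copied when the library reads them.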
Re: speeding things up with C++
I wonder if Jython might be the answer? Java is going to be faster than Python for the time-critical part of my program. Does anybody have experience getting data structures like nested lists / tuples into a Java routine from a running Jython program (and then back again)?
Re: speeding things up with C++
thanks. i'll definitely look into this.

On May 28, 10:48 pm, Kay Schluehr [EMAIL PROTECTED] wrote:

> On May 26, 11:19 am, bullockbefriending bard [EMAIL PROTECTED] wrote:
> > I've done all the requisite profiling and thought fairly deeply about
> > the efficiency of my python code, but am still going to have to speed
> > up the innermost guts of what I am doing. Essentially, I need to pass
> > a list of 6-tuples containing only integers to my new sadly necessary
> > super-fast compiled language function which i am not looking forward
> > to writing:
> >
> >     input: [(1,2,3,4,5,6), (7,8,9,10,11,12), ...]
> >
> > and after much thrashing of processor resources, return data which
> > looks like this to the Python calling environment:
> >
> >     output: [( (1, 2), (1,), (12,), (13,), (1, 7, 11), (9,) ),
> >              ( another nested tuple like the preceding one ), ...]
> >
> > [snip: full description as in the original post above]
>
> A just too obvious recommendation is to drop C++ and use D together
> with the Pyd/celerid bridge. Yeah, it sounds a little esoteric, but D
> really is the Python among the statically typed imperative languages.
>
> http://www.digitalmars.com/d/
> http://pyd.dsource.org/index.html
speeding things up with C++
I've done all the requisite profiling and thought fairly deeply about the efficiency of my python code, but am still going to have to speed up the innermost guts of what I am doing. Essentially, I need to pass a list of 6-tuples containing only integers to my new sadly necessary super-fast compiled language function which i am not looking forward to writing:

    input: [(1,2,3,4,5,6), (7,8,9,10,11,12), ...]

and after much thrashing of processor resources, return data which looks like this to the Python calling environment:

    output: [( (1, 2), (1,), (12,), (13,), (1, 7, 11), (9,) ),
             ( another nested tuple like the preceding one ), ...]

Each member of the returned list is a tuple containing 6 tuples, and each of those 6 tuples has at least one integer member. It would also be acceptable for the return values to be entirely nested lists instead of having the two innermost sequence types as tuples.

I probably want to be using something like C++ because i'm going to need to use STL vectors and sets for what i'm doing in the algorithm i'm trying to speed up. IIRC Boost tuple is a bit painful for more than 10 elements + it isn't really essential that I use a non-mutable data type in what will be a small encapsulated bit of a much larger system.

Anyway, I can probably very quickly figure out some hacked way to get the data into my function, given that in the worst case I could take advantage of the knowledge that each input tuple always has 6 elements, and simply pass in a big array of ints. Yes, I know this is a bit retarded, but I'm talking worst case assuming a very tight schedule and no time to delve deeply into SWIG or whatever. Similarly, it wouldn't be too difficult to return the result as the mother of all strings which i could then parse fairly easily.

However, I hope someone reading this will be able to tell me that I'm being a total pessimist and that in fact it isn't very difficult to do what I want to do using SWIG.
I'm not asking for a complete solution, more like some general pointers from someone who has actually done something similar before. As an added bonus, if this is 'easily' doable using OCaml as the guts instead of C++, I'd be happy to know about it.
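For the return half, a cleaner alternative to "the mother of all strings" is to agree on a length-prefixed flat int array: for each record, six runs of [count, values...]. The encoding convention and the `decode` helper below are my own sketch of the idea, shown from the Python side; the C++ side would simply append counts and values to one output buffer:

```python
def decode(flat):
    """Decode a flat int stream of the form [len, v1..vlen, ...] - six
    length-prefixed runs per record - into the nested tuple structure
    ((..), (..), ..) the post describes."""
    out, i = [], 0
    while i < len(flat):
        record = []
        for _ in range(6):           # each record is exactly six inner tuples
            n = flat[i]; i += 1
            record.append(tuple(flat[i:i + n])); i += n
        out.append(tuple(record))
    return out

# the example record from the post, encoded then decoded:
flat = [2, 1, 2,  1, 1,  1, 12,  1, 13,  3, 1, 7, 11,  1, 9]
print(decode(flat))
# [((1, 2), (1,), (12,), (13,), (1, 7, 11), (9,))]
```

A flat int buffer like this crosses the language boundary in one piece (e.g. via ctypes or a single C API call), with no string formatting or parsing on either side.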
regex matching question
first, the regex part: I am new to regexes and have come up with the following expression:

    ((1[0-4]|[1-9]),(1[0-4]|[1-9])/){5}(1[0-4]|[1-9]),(1[0-4]|[1-9])

to exactly match strings which look like this:

    1,2/3,4/5,6/7,8/9,10/11,12

i.e. 6 comma-delimited pairs of integer numbers separated by the '/' character + the constraint that numbers must be in range 1-14. i should add that i am only interested in finding exact matches (doing some kind of command line validation).

this seems to work fine, although i would welcome any advice about how to shorten the above. it seems to me that there should exist some shorthand for (1[0-4]|[1-9]) once i have defined it once?

also (and this is where my total beginner status brings me here looking for help :)) i would like to add one more constraint to the above regex. i want to match strings *iff* each pair of numbers is different. e.g.:

    1,1/3,4/5,6/7,8/9,10/11,12    or
    1,2/3,4/5,6/7,8/9,10/12,12

should fail to be matched by my final regex, whereas

    1,2/3,4/5,6/7,8/9,10/11,12

should match OK. any tips would be much appreciated - especially regarding the preceding paragraph!

and now for the python part:

    results = "1,2/3,4/5,6/7,8/9,10/11,12"
    match = re.match(r"((1[0-4]|[1-9]),(1[0-4]|[1-9])/){5}(1[0-4]|[1-9]),(1[0-4]|[1-9])",
                     results)
    if match is None or match.group(0) != results:
        raise FormatError("Error in format of input string: %s" % (results,))
    results = [leg.split(',') for leg in results.split('/')]
    # => [['1', '2'], ['3', '4'], ['5', '6'], ['7', '8'], ['9', '10'], ['11', '12']]

the idea in the above code being that i want to use the regex match as a test of whether or not the input string (results) is correctly formatted. if the string is not exactly matched by the regex, i want my program to raise an exception and bail out. apart from whether or not the regex is good idiom, is my approach suitably pythonic?

TIA for any help here.
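On the "shorthand for (1[0-4]|[1-9]) once i have defined it once" question: ordinary string formatting is enough to define the sub-pattern once, and the pair-distinctness check then fits naturally in code. A sketch - the `parse_results` name and the use of ValueError in place of the poster's FormatError are mine:

```python
import re

NUM = r'(?:1[0-4]|[1-9])'                       # an integer in 1..14, defined once
PAIR = '{n},{n}'.format(n=NUM)
RESULTS_RE = re.compile(r'{p}(?:/{p}){{5}}\Z'.format(p=PAIR))  # \Z forces exact match

def parse_results(text):
    """Validate the '1,2/3,4/.../11,12' format, then the value constraint."""
    if not RESULTS_RE.match(text):
        raise ValueError('bad format: %r' % text)
    pairs = [tuple(map(int, leg.split(','))) for leg in text.split('/')]
    if any(a == b for a, b in pairs):
        raise ValueError('pair with equal numbers: %r' % text)
    return pairs

print(parse_results('1,2/3,4/5,6/7,8/9,10/11,12'))
# [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10), (11, 12)]
```

Anchoring with `\Z` also removes the need for the `match.group(0) != results` check, since a match is then by construction the whole string.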
Re: regex matching question
thanks for your suggestion. i had already implemented the all pairs different constraint in python code. even though i don't really need to give very explicit error messages about what might be wrong with my data (obviously easier to do if i do all constraint validation in code rather than in one regex), there is something to be said for your suggestion to simplify my regex further - it might be sensible from a maintainability/readability perspective to use the regex for *format* validation and then validate all *values* in code.

from my cursory skimming of friedl, i get the feeling that the all pairs different constraint would give rise to some kind of fairly baroque expression, perhaps likely to bring to mind the following quotation from samuel johnson: "Sir, a woman's preaching is like a dog's walking on his hind legs. It is not done well; but you are surprised to find it done at all."

however, being human, sometimes some things should be done just because they can :)... so if anyone knows how to do it, i'm still interested, even if just out of idle curiosity!

On May 20, 12:57 am, Marc 'BlackJack' Rintsch [EMAIL PROTECTED] wrote: In [EMAIL PROTECTED], bullockbefriending bard wrote: [original regex question quoted] [...] the idea in the above code being that i want to use the regex match as a test of whether or not the input string (results) is correctly formatted. if the string results is not exactly matched by the regex, i want my program to barf an exception and bail out.
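[Editorial sketch, purely for the idle curiosity mentioned above: the per-pair constraint *can* be done in one regex with a negative lookahead plus a backreference. `\1` refers to the first number of the current repetition, `\2` to the first number of the final pair. It is exactly the baroque expression the poster predicted, which is itself an argument for doing the check in plain Python.]

```python
import re

# Each pair rejects a second number equal to the first: after matching
# "n,", the lookahead (?!\1(?:/|$)) refuses a second number that is
# exactly n followed by a separator or end of string.
STRICT = re.compile(
    r'(?:(1[0-4]|[1-9]),(?!\1(?:/|$))(?:1[0-4]|[1-9])/){5}'
    r'(1[0-4]|[1-9]),(?!\2$)(?:1[0-4]|[1-9])$'
)

print(bool(STRICT.match("1,2/3,4/5,6/7,8/9,10/11,12")))  # True
print(bool(STRICT.match("1,1/3,4/5,6/7,8/9,10/11,12")))  # False: 1,1
print(bool(STRICT.match("1,2/3,4/5,6/7,8/9,10/12,12")))  # False: 12,12
```

Note that `(?!\1(?:/|$))` must check for a following separator so that "1,14" is not wrongly rejected just because "14" starts with "1".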
apart from whether or not the regex is good idiom, is my approach suitably pythonic?

I would use a simple regular expression to extract candidates and a Python function to split the candidate and check for the extra constraints. Especially the all pairs different constraint is something I would not even attempt to put in a regex. For searching candidates this should be good enough::

    r'(\d+,\d+/){5}\d+,\d+'

Ciao, Marc 'BlackJack' Rintsch
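[Editorial sketch of the approach Marc suggests: a loose regex for shape, then value checks in Python. The function and variable names here are invented for illustration.]

```python
import re

# Loose format check only; ranges and pair-difference are checked in code.
CANDIDATE = re.compile(r'(?:\d+,\d+/){5}\d+,\d+\Z')

def valid_results(s):
    if not CANDIDATE.match(s):
        return False
    pairs = [leg.split(',') for leg in s.split('/')]
    for a, b in pairs:
        a, b = int(a), int(b)
        if not (1 <= a <= 14 and 1 <= b <= 14):
            return False        # range constraint
        if a == b:
            return False        # the two numbers in a pair must differ
    return True

print(valid_results("1,2/3,4/5,6/7,8/9,10/11,12"))  # True
print(valid_results("1,1/3,4/5,6/7,8/9,10/11,12"))  # False: equal pair
print(valid_results("1,2/3,4/5,6/7,8/9,15/11,12"))  # False: out of range
```

This split also makes it easy to report *which* constraint failed, which a single regex cannot do.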
Re: regex matching question
"Backslash? Your example uses a [forward] slash." correct.. my mistake. i use forward slashes.

"Are you sure you don't want to allow for some spaces in the data, for the benefit of the humans, e.g. 1,2 / 3,4 / 5,6 / 7,8 / 9,10 / 11,12?" you are correct. however, i am using the string as a command line option and can get away without quoting it if there are no optional spaces.

"Always use raw strings for patterns, even if you don't have backslashes in them -- and this one needs a backslash; see below." knew this, but had not done so in my code because i wanted to use '\' as a line continuation character to keep everything within 80 columns. have adopted your advice regarding \Z below and am now using a raw string.

"For clarity, consider using mobj or even m instead of match to name the result of re.match." good point.

"Instead of if match == None use if mobj is None ... or if not mobj ... Instead of the or match.group(0) != results caper, put \Z (*not* $) at the end of your pattern:

    mobj = re.match(r"pattern\Z", results)
    if not mobj:

HTH, John" very helpful advice. thanks!
Re: regex matching question
"Instead of the or match.group(0) != results caper, put \Z (*not* $) at the end of your pattern: mobj = re.match(r"pattern\Z", results); if not mobj:"

as the string i am matching against comes from a command line argument to a script, is there any reason why i cannot get away with just $, given that this means there is no way a newline could find its way into my string? it certainly passes all my unit tests, as does \Z. or am i missing the point of \Z?
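[Editorial note: the difference John is pointing at is that, without `re.M`, `$` also matches just before a trailing newline, while `\Z` matches only at the true end of the string. A two-line demonstration:]

```python
import re

s = "1,2\n"                     # a string with a trailing newline
print(re.match(r'\d,\d$', s))   # matches, despite the newline
print(re.match(r'\d,\d\Z', s))  # None: \Z requires the true end
```

So `$` is only safe here for as long as the assumption "no newline can reach the string" holds.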
Re: regex matching question
"Here all pairs different means for each pair, both numbers must be different, but they may appear in another pair. That is, it won't flag 1,2/3,4/3,5/2,6/8,3/1,2 as invalid, but this wasn't clear from your original post. -- Gabriel Genellina"

thanks! you are correct that the 'all pairs different' nomenclature is ambiguous. i require that each pair have different values, but it is OK for different pairs to be identical... so exactly as per your code snippet.
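[Editorial sketch of the agreed-upon semantics; Gabriel's actual snippet is not preserved in this archive, so this is a reconstruction of the rule, not his code.]

```python
# Each pair must be internally different; repeats across pairs are fine.
s = "1,2/3,4/3,5/2,6/8,3/1,2"
pairs = [leg.split(',') for leg in s.split('/')]
print(all(a != b for a, b in pairs))  # True: valid under this rule
```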
Re: regex matching question
No way? Famous last words :-)

C:\junk> type showargs.py
import sys; print sys.argv

C:\junk> \python25\python
Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import subprocess
>>> subprocess.call('\\python25\\python showargs.py teehee\n')
['showargs.py', 'teehee\n']

can't argue with that :) back to \Z
Re: List comprehension returning subclassed list type?
Thanks! I went with extend and a generator expression as I *am* dealing with rather a lot of data. Now I think I'm going to go on a little hunt through my code looking for more places where I should replace list comprehensions with generator expressions - bit of a newbie here.

On Mar 25, 3:57 pm, Steven D'Aprano [EMAIL PROTECTED] wrote: On Sat, 24 Mar 2007 23:43:10 -0700, bullockbefriending bard wrote:

z_list = [Z(y.var1, y.var2, ...) for y in list_of_objects_of_class_Y]

"Of course this just gives me a plain list and no access to the methods of z_list." List comprehensions give you a list. If you want to convert that list into the type of z_list, you need to do it yourself. Since ZList sub-classes list, probably the easiest way is just:

z_list = ZList([some list comprehension here])

"I could, of course, go and write a static method in ZList which takes a plain list of Z objects and returns a ZList." Yes, that would be one such way. Another way is:

z_list.extend([some list comprehension here])

If you are using a recent enough version of Python, you probably don't even need the list comprehension. Just use a generator expression:

z_list.extend(Z(y.var1, y.var2, ...) for y in list_of_objects_of_class_Y)

That's especially useful if the list of objects is huge, because it avoids creating the list twice: once in the list comp, and once as z_list. -- Steven.
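[Editorial sketch of the two idioms discussed above; the bodies of Z, ZList and Y are invented for illustration, since the original post elides them.]

```python
class Z(object):
    def __init__(self, var1, var2):
        self.var1, self.var2 = var1, var2

class ZList(list):
    def sum_var1(self):               # a stand-in for "various defs, etc."
        return sum(z.var1 for z in self)

class Y(object):
    def __init__(self, var1, var2):
        self.var1, self.var2 = var1, var2

ys = [Y(1, 2), Y(3, 4)]

# Either feed a generator expression to extend() ...
z_list = ZList()
z_list.extend(Z(y.var1, y.var2) for y in ys)

# ... or pass the same generator expression to the ZList constructor,
# since list (and hence ZList) accepts any iterable.
z_list2 = ZList(Z(y.var1, y.var2) for y in ys)

print(isinstance(z_list, ZList), z_list.sum_var1())  # True 4
```

Either way no intermediate plain list is built, which is the point of preferring a generator expression for large data.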
List comprehension returning subclassed list type?
Given:

class Z(object):
    # various defs, etc.

class ZList(list):
    # various defs, etc.

i would like to be able to replace

z_list = ZList()
for y in list_of_objects_of_class_Y:
    z_list.append(y)

with something like this:

z_list = [Z(y.var1, y.var2, ...) for y in list_of_objects_of_class_Y]

Of course this just gives me a plain list and no access to the methods of z_list. I could, of course, go and write a static method in ZList which takes a plain list of Z objects and returns a ZList. Anyway, my question is whether or not this can be done more elegantly via a list comprehension?
6 Pick Bet Grouping
(I apologise in advance for posting something slightly OT, but plead in mitigation that I'm trying to re-write an old, suboptimal VB6 (erk) brute-force attack in a shiny, elegant, pythonic manner. I would really appreciate some ideas about an appropriate algorithmic approach to this + pointers to any relevant implementations in Python. Along the way, I'm also trying to get Python into my organisation through the back door - so a further plea for tolerance in posting this here! :))

A single 6 Pick bet looks like this:

RACE1 / RACE2 / RACE3 / RACE4 / RACE5 / RACE6
runner1 / runner2 / runner3 / runner4 / runner5 / runner6 - $amount

e.g. we might have:

5 / 3 / 11 / 7 / 1 / 9 - $50
(5 to come 1st or 2nd in Race 1, 3 to come 1st or 2nd in Race 2, etc.)
7 / 3 / 11 / 7 / 1 / 9 - $50
5 / 3 / 11 / 14 / 1 / 9 - $50
7 / 3 / 11 / 14 / 1 / 9 - $50

The totalizator system allows us to merge or group these four bets as follows:

5 + 7 / 3 / 11 / 7 + 14 / 1 / 9 - $50 ($200 total)

This method of expressing multiple bets in one line is advantageous because we may have many thousands of combinations we wish to bet and only a limited amount of time in which to bet them. There are up to 14 horses in each race, so 7,529,536 possible 6 Pick bets. In practice, one might wish to bet (say) 15,000 combinations out of these 7.5 million. However, it would be very nice to be able to *optimally* merge (as shown above) these 15,000 combinations so that they might fit on (say) 2,000 betting tickets instead of trying to write out 15,000 single tickets.

To keep things simple, let's assume that all single bets are for the same amount. (In practice, however, this is not so.)

Now, it's certainly possible to go about this via brute force iteration, but I would really appreciate it if anyone with a CS background could point me toward a smarter plan of attack. I have perused Skiena's Algorithm Handbook and various websites and can't seem to find an analogous problem.
I'm hoping this is just my ignorance and that my brief exposition rings a bell for someone here.
Re: 6 Pick Bet Grouping
Your explanation is correct. It's important to realise that, effectively, the input (many thousands of single bets) comes out of a black box, and it is then necessary for me to merge them into the grouped format to make things more manageable. Given an arbitrary bucket of single bets, it is by no means obvious how they should be grouped without doing something algorithmic. In grouping, it is also important not to inadvertently introduce any new single bets. For the purpose of this discussion, I am also assuming that all bets are the same size. That's not the case in practice, but we can leave that little twist for another day.
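[Editorial sketch, not a solution from the thread: one simple starting point is to represent each ticket line as a 6-tuple of runner sets and repeatedly merge lines that agree on five of the six legs. Because the other five legs are identical, the merged line covers exactly the union of the originals' combinations, so the "no new single bets" constraint is respected by construction. This greedy pass is not guaranteed optimal; names are invented.]

```python
from collections import defaultdict

def merge_pass(lines):
    """Merge ticket lines that agree on all legs but one.

    lines: iterable of 6-tuples of frozensets of runner numbers.
    Safe merge: with five legs shared, taking the union of the
    differing leg introduces no combinations beyond the originals.
    """
    lines = list(lines)
    for pos in range(6):
        buckets = defaultdict(list)
        for line in lines:
            # Key on the five legs other than `pos`.
            buckets[line[:pos] + line[pos + 1:]].append(line)
        merged = []
        for key, group in buckets.items():
            if len(group) > 1:
                union = frozenset().union(*(l[pos] for l in group))
                merged.append(key[:pos] + (union,) + key[pos:])
            else:
                merged.extend(group)
        lines = merged
    return lines

# The four example bets from the original post:
bets = [tuple(frozenset([r]) for r in b)
        for b in [(5, 3, 11, 7, 1, 9), (7, 3, 11, 7, 1, 9),
                  (5, 3, 11, 14, 1, 9), (7, 3, 11, 14, 1, 9)]]
result = merge_pass(bets)
print(len(result))  # 1 ticket: 5+7 / 3 / 11 / 7+14 / 1 / 9
```

The order in which legs are merged affects the outcome, which is why a truly *optimal* grouping needs something cleverer than this single greedy sweep.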
Re: 6 Pick Bet Grouping
Sorry, I perhaps didn't frame my initial post very well. I hope my reply to your other post below has answered these questions.