Re: Enumerating k-segmentations of a sequence

2008-11-26 Thread bullockbefriending bard
On Nov 26, 12:15 am, [EMAIL PROTECTED] wrote:
 bullockbefriending bard napisał(a):



  I'm not sure if my terminology is precise enough, but what I want to
  do is:

  Given an ordered sequence of n items, enumerate all its possible k-
  segmentations.

  This is *not* the same as enumerating the k set partitions of the n
  items because I am only interested in those set partitions which
  preserve the order of the items. i.e. if the input sequence is (1 2 3
  4 5), then ((1 4) (2 3) (5)) is unacceptable, whereas ((1 2) (3) (4
  5)) is acceptable. Hence use of term 'segmentations'.

  e.g., for input sequence (1 2 3 4 5) and k = 3, I need a function
  which enumerates the following 3-segmentations:

  ((1 2 3) (4) (5))
  ((1 2) (3 4) (5))
  ((1 2) (3) (4 5))
  ((1) (2 3 4) (5))
  ((1) (2 3) (4 5))
  ((1) (2) (3 4 5))

  The reason I want to do this is to use it in some simple one-
  dimensional clustering, i.e. each set item (won't be just integers as
  above) will have a value of interest, and i'll select the particular
  set partition which minimises Sigma SSD (sum of squared differences
  from mean) of these values.

  It seems overkill to generate the full list of set partitions
  [including e.g. ((1 4) (2) (3 5))] because I intend to begin by
  sorting the input sequence such that element 1 <= element 2 <= ... <=
  element n.

  Structural Recursion not being my strong point, any ideas on how to go
  about this would be much appreciated!

 I'm not sure if I correctly understood the semantics of what you need.
 Anyway it looks like a nice exercise in nested generators:

 def segmentations(sequence, k):
         assert 1 <= k <= len(sequence)
         if k == 1:
                 yield [sequence]
         else:
                 for headlen in xrange(1, len(sequence) - k + 2):
                         head, tail = sequence[:headlen], sequence[headlen:]
                         for tail_seg in segmentations(tail, k - 1):
                                 yield [head] + tail_seg

 for segmentation in segmentations((1, 2, 3, 4, 5), 3):
         print segmentation

 Regards,
 Marek

Exactly what I needed. Thank you all for examples and also for
explaining how to think about problems of this nature!
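
[A hedged sketch of the clustering step described in the original post: pick the k-segmentation minimising the total sum of squared differences from each segment's mean. The generator is Marek's recursive one; `ssd` and `best_segmentation` are illustrative names, not from the thread.]

```python
def segmentations(sequence, k):
    # Marek's recursive enumeration of ordered k-segmentations
    if k == 1:
        yield [sequence]
    else:
        for headlen in range(1, len(sequence) - k + 2):
            head, tail = sequence[:headlen], sequence[headlen:]
            for tail_seg in segmentations(tail, k - 1):
                yield [head] + tail_seg

def ssd(segment):
    # sum of squared differences from the segment mean
    mean = float(sum(segment)) / len(segment)
    return sum((x - mean) ** 2 for x in segment)

def best_segmentation(sequence, k):
    # assumes sequence is already sorted, as the original post intends
    return min(segmentations(sequence, k),
               key=lambda seg: sum(ssd(s) for s in seg))
```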

--
http://mail.python.org/mailman/listinfo/python-list


Enumerating k-segmentations of a sequence

2008-11-25 Thread bullockbefriending bard
I'm not sure if my terminology is precise enough, but what I want to
do is:

Given an ordered sequence of n items, enumerate all its possible k-
segmentations.

This is *not* the same as enumerating the k set partitions of the n
items because I am only interested in those set partitions which
preserve the order of the items. i.e. if the input sequence is (1 2 3
4 5), then ((1 4) (2 3) (5)) is unacceptable, whereas ((1 2) (3) (4
5)) is acceptable. Hence use of term 'segmentations'.

e.g., for input sequence (1 2 3 4 5) and k = 3, I need a function
which enumerates the following 3-segmentations:

((1 2 3) (4) (5))
((1 2) (3 4) (5))
((1 2) (3) (4 5))
((1) (2 3 4) (5))
((1) (2 3) (4 5))
((1) (2) (3 4 5))

The reason I want to do this is to use it in some simple one-
dimensional clustering, i.e. each set item (won't be just integers as
above) will have a value of interest, and i'll select the particular
set partition which minimises Sigma SSD (sum of squared differences
from mean) of these values.

It seems overkill to generate the full list of set partitions
[including e.g. ((1 4) (2) (3 5))] because I intend to begin by
sorting the input sequence such that element 1 <= element 2 <= ... <=
element n.

Structural Recursion not being my strong point, any ideas on how to go
about this would be much appreciated!
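
[One way to see the structure, sketched with my own function names: a k-segmentation of n items is exactly a choice of k-1 cut points among the n-1 gaps between consecutive items, so `itertools.combinations` enumerates them without recursion.]

```python
from itertools import combinations

def segmentations(seq, k):
    n = len(seq)
    # choose k-1 cut positions from the n-1 gaps between items
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0,) + cuts + (n,)
        yield [seq[bounds[i]:bounds[i + 1]] for i in range(k)]
```

This also shows there are C(n-1, k-1) such segmentations, e.g. C(4, 2) = 6 for the example above.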



design choice: multi-threaded / asynchronous wxpython client?

2008-04-27 Thread bullockbefriending bard
I am a complete ignoramus and newbie when it comes to designing and
coding networked clients (or servers for that matter). I have a copy
of Goerzen (Foundations of Python Network Programming) and once
pointed in the best direction should be able to follow my nose and get
things sorted... but I am not quite sure which is the best path to
take and would be grateful for advice from networking gurus.

I am writing a program to display horse racing tote odds in a desktop
client program. I have access to an HTTP (open one of several URLs,
and I get back an XML doc with some data... not XML-RPC.) source of
XML data which I am able to parse and munge with no difficulty at all.
I have written and successfully tested a simple command line program
which allows me to repeatedly poll the server and parse the XML. Easy
enough, but the real world production complications are:

1) The data for the race about to start updates every (say) 15
seconds, and the data for earlier and later races updates only every
(say) 5 minutes. There is  no point for me to be hammering the server
with requests every 15 seconds for data for races after the upcoming
race... I should query for this perhaps every 150s to be safe. But for
the upcoming race, I must not miss any updates and should query every
~7s to be safe. So... in the middle of  a race meeting the situation
might be:
race 1 (race done with, no-longer querying), race 2 (race done with,
no longer querying) race 3 (about to start, data on server for this
race updating every 15s, my client querying every 7s), races 4-8 (data
on server for these races updating every 5 mins, my client querying
every 2.5 mins)

2) After a race has started and betting is cut off and there are
consequently no more tote updates for that race (it is possible to
determine when this occurs precisely because of an attribute in the
XML data), I need to stop querying (say) race 3 every 7s and remove
race 4 from the 150s query group and begin querying its data every 7s.

3) I need to dump this data (for all races, not just current about to
start race) to text files, store it as BLOBs in a DB *and* update real
time display in a wxpython windowed client.

My initial thought was to have two threads for the different update
polling cycles. In addition I would probably need another thread to
handle UI stuff, and perhaps another for dealing with file/DB data
write out. But, I wonder if using Twisted is a better idea? I will
still need to handle some threading myself, but (I think) only for
keeping wxpython happy by doing all this other stuff off the main
thread + perhaps also persisting received data in yet another thread.

I have zero experience with these kinds of design choices and would be
very happy if those with experience could point out the pros and cons
of each (synchronous/multithreaded,  or Twisted) for dealing with the
two differing sample rates problem outlined above.

Many TIA!
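
[For what it's worth, the two-rate schedule in points 1) and 2) can be sketched without any threads at all, as a single priority queue of per-race next-due poll times. A hedged sketch only; the intervals come from the numbers in the post, and the names are my own.]

```python
import heapq

FAST, SLOW = 7, 150  # polling intervals in seconds, per the post

def poll_events(races, upcoming, horizon):
    """Yield (time, race) poll events up to `horizon` seconds."""
    heap = [(0, r) for r in races]
    heapq.heapify(heap)
    while heap:
        t, race = heapq.heappop(heap)
        if t > horizon:
            break  # the heap pops in time order, so nothing earlier remains
        yield t, race
        interval = FAST if race == upcoming else SLOW
        heapq.heappush(heap, (t + interval, race))
```

When a race's betting closes, it would simply not be re-pushed, and `upcoming` advances to the next race.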






Re: design choice: multi-threaded / asynchronous wxpython client?

2008-04-27 Thread bullockbefriending bard
On Apr 27, 10:05 pm, Eric Wertman [EMAIL PROTECTED] wrote:
 HI, that does look like a lot of fun... You might consider breaking
 that into 2 separate programs.  Write one that's threaded to keep a db
 updated properly, and write a completely separate one to handle
 displaying data from your db.  This would allow you to later change or
 add a web interface without having to muck with the code that handles
 data.

Thanks for the good point. It certainly is a lot of 'fun'. One of
those jobs which at first looks easy (XML, very simple to parse data),
but a few gotchas in the real-time nature of the beast.

After thinking about your idea more, I am sure this decoupling of
functions and making everything DB-centric can simplify a lot of
issues. I quite like the idea of persisting pickled or YAML data along
with the raw XML (for archival purposes + occurs to me I might be able
to do something with XSLT to get it directly into screen viewable form
without too much work) to a DB and then having a client program which
queries most recent time-stamped data for display.

A further complication is that at a later point, I will want to do
real-time time series prediction on all this data (viz. predicting
actual starting prices at post time x minutes in the future). Assuming
I can quickly (enough) retrieve the relevant last n tote data samples
from the database in order to do this, then it will indeed be much
simpler to make things much more DB-centric... as opposed to
maintaining all this state/history in program data structures and
updating it in real time.


Re: design choice: multi-threaded / asynchronous wxpython client?

2008-04-27 Thread bullockbefriending bard
On Apr 27, 10:10 pm, David [EMAIL PROTECTED] wrote:
   1) The data for the race about to start updates every (say) 15
   seconds, and the data for earlier and later races updates only every
   (say) 5 minutes. There is  no point for me to be hammering the server
   with requests every 15 seconds for data for races after the upcoming

 Try using an HTTP HEAD instruction instead to check if the data has
 changed since last time.

Thanks for the suggestion... am I going about this the right way here?

import urllib2
request = urllib2.Request("http://get-rich.quick.com")
request.get_method = lambda: "HEAD"
http_file = urllib2.urlopen(request)

print http_file.headers

-
Age: 0
Date: Sun, 27 Apr 2008 16:07:11 GMT
Content-Length: 521
Content-Type: text/xml; charset=utf-8
Expires: Sun, 27 Apr 2008 16:07:41 GMT
Cache-Control: public, max-age=30, must-revalidate
Connection: close
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
X-AspNet-Version: 1.1.4322
Via: 1.1 jcbw-nc3 (NetCache NetApp/5.5R4D6)

Date is the time of the server response, not of the last data update:
it is definitely the time of the server's response to my request and
bears no relation to when the live XML data was updated. I know this
for a fact
because right now there is no active race meeting and any data still
available is static and many hours old. I would not feel confident
rejecting incoming data as duplicate based only on same content length
criterion. Am I missing something here?

Actually there doesn't seem to be too much difficulty performance-wise
in fetching and parsing (minidom) the XML data and checking the
internal (it's an attribute) update time stamp in the parsed doc. If
timings  got really tight, presumably I could more quickly check each
doc's time stamp with SAX (time stamp comes early in data as one might
reasonably expect) before deciding whether to go the whole hog with
minidom if the time stamp has in fact changed since I last polled the
server.
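
[The "check the timestamp before a full parse" idea above can also be done with ElementTree's iterparse, stopping at the root element's start tag. A sketch only; the 'updated' attribute name is hypothetical, standing in for whatever the tote XML actually uses.]

```python
import xml.etree.ElementTree as ET
from io import BytesIO

def doc_timestamp(xml_bytes, attr='updated'):
    # read only as far as the root element's start tag, then stop
    for event, elem in ET.iterparse(BytesIO(xml_bytes), events=('start',)):
        return elem.get(attr)
```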

But if there is something I don't get about HTTP HEAD approach, please
let me know as a simple check like this would obviously be a good
thing for me.


Re: design choice: multi-threaded / asynchronous wxpython client?

2008-04-27 Thread bullockbefriending bard
On Apr 27, 11:12 pm, Jorge Godoy [EMAIL PROTECTED] wrote:
 bullockbefriending bard wrote:
  A further complication is that at a later point, I will want to do
  real-time time series prediction on all this data (viz. predicting
  actual starting prices at post time x minutes in the future). Assuming
  I can quickly (enough) retrieve the relevant last n tote data samples
  from the database in order to do this, then it will indeed be much
  simpler to make things much more DB-centric... as opposed to
  maintaining all this state/history in program data structures and
  updating it in real time.

 If instead of storing XML and YAML you store the data points, you can do
 everything from inside the database.

 PostgreSQL supports Python stored procedures / functions and also support
 using R in the same way, for manipulating data.  Then you can work with
 everything and just retrieve the resulting information.

 You might try storing the raw data and the XML / YAML, but I believe that
 keeping those sync'ed might cause you some extra work.

Tempting thought, but one of the problems with this kind of horse
racing tote data is that a lot of it is for combinations of runners
rather than single runners. Whilst there might be (say) 14 horses in a
race, there are 91 quinella price combinations (1-2 through 13-14,
i.e. the 2-subsets of range(1, 15)) and 364 trio price combinations.
It is not really practical (I suspect) to have database tables with
columns for that many combinations?

I certainly DO have a horror of having my XML / whatever else formats
getting out of sync. I also have to worry about the tote company later
changing their XML format. From that viewpoint, there is indeed a lot
to be said for storing the tote data as numbers in tables.
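
[A quick check on the combination counts above, plus the usual alternative to a wide table: one row per combination rather than one column per combination. A sketch; the tote data itself is not modelled here.]

```python
from itertools import combinations

runners = range(1, 15)                      # 14 horses
quinellas = list(combinations(runners, 2))  # unordered pairs, C(14,2)
trios = list(combinations(runners, 3))      # unordered triples, C(14,3)

# a row-per-combination layout stores (combination, price) rows,
# e.g. {(1, 2): 57.5, ...}, instead of 91 or 364 columns
```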


Re: design choice: multi-threaded / asynchronous wxpython client?

2008-04-27 Thread bullockbefriending bard
On Apr 27, 11:27 pm, BJörn Lindqvist [EMAIL PROTECTED] wrote:
 I think twisted is overkill for this problem. Threading, elementtree
 and urllib should more than suffice. One thread polling the server for
 each race with the desired polling interval. Each time some data is
 treated, that thread sends a signal containing information about what
 changed. The gui listens to the signal and will, if needed, update
 itself with the new information. The database handler also listens to
 the signal and updates the db.

So, if i understand you correctly:

Assuming 8 races and we are just about to start the race 1, we would
have 8 polling threads with the race 1 thread polling at faster rate
than the other ones. after race 1 betting closed, could dispense with
that thread, change race 2 thread to poll faster, and so on...? I had
been rather stupidly thinking of just two polling threads, one for the
current race and one for races not yet run... but starting out with a
thread for each extant race seems simpler given there then is no need
to handle the mechanics of shifting the polling of races from the
omnibus slow thread to the current race fast thread.

Having got my minidom parser working nicely, I'm inclined to stick
with it for now while I get other parts of the problem licked into
shape. However, I do take your point that it's probably overkill for
this simple kind of structured, mostly numerical data and will try to
find time to experiment with the elementtree approach later. No harm
at all in shaving the odd second off document parse times.


Generator for k-permutations without repetition

2007-07-04 Thread bullockbefriending bard
I was able to google a recipe for a k_permutations generator,  such
that i can write:

x = range(1, 4)  # (say)
[combi for combi in k_permutations(x, 3)] =

[[1, 1, 1], [1, 1, 2], [1, 1, 3], [1, 2, 1], [1, 2, 2], [1, 2, 3], [1,
3, 1], [1, 3, 2], [1, 3, 3], [2, 1, 1], [2, 1, 2], [2, 1, 3], [2, 2,
1], [2, 2, 2], [2, 2, 3], [2, 3, 1], [2, 3, 2], [2, 3, 3], [3, 1, 1],
[3, 1, 2], [3, 1, 3], [3, 2, 1], [3, 2, 2], [3, 2, 3], [3, 3, 1], [3,
3, 2], [3, 3, 3]]

but what i really want is the above without repetition, i.e.:

[combi for combi in k_permutations_without_repetitions(x, 3)] =

[[1, 2, 3], [1, 3, 2], [2, 1, 3], [2, 3, 1], [3, 1, 2], [3, 2, 1]]

For smallish lists, it's no problem to generate the list as follows:

no_reps_combis = [combi for combi in k_permutations(x, 3)
                  if len(set(combi)) == 3]

but i'm sure there must be a way to do this as a pure generator, just
that i wouldn't have a clue how to go about it.

Help please, Python Gods! :)

Apologies in advance if I have got my terminology wrong and have been
googling the wrong stuff and therefore coming up blank.
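
[For later readers: Python 2.6 added exactly this generator to the standard library as itertools.permutations, so on newer Pythons no hand-rolled version is needed.]

```python
from itertools import permutations

x = range(1, 4)
# k-permutations without repetition, yielded lazily
no_reps = [list(p) for p in permutations(x, 3)]
# no_reps == [[1, 2, 3], [1, 3, 2], [2, 1, 3],
#             [2, 3, 1], [3, 1, 2], [3, 2, 1]]
```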



Re: Generator for k-permutations without repetition

2007-07-04 Thread bullockbefriending bard
On Jul 4, 7:09 pm, Nis Jørgensen [EMAIL PROTECTED] wrote:
 bullockbefriending bard skrev:



  I was able to google a recipe for a k_permutations generator,  such
  that i can write:

  x = range(1, 4)  # (say)
  [combi for combi in k_permutations(x, 3)] =

  [[1, 1, 1], [1, 1, 2], [1, 1, 3], [1, 2, 1], [1, 2, 2], [1, 2, 3], [1,
  3, 1], [1, 3, 2], [1, 3, 3], [2, 1, 1], [2, 1, 2], [2, 1, 3], [2, 2,
  1], [2, 2, 2], [2, 2, 3], [2, 3, 1], [2, 3, 2], [2, 3, 3], [3, 1, 1],
  [3, 1, 2], [3, 1, 3], [3, 2, 1], [3, 2, 2], [3, 2, 3], [3, 3, 1], [3,
  3, 2], [3, 3, 3]]

  but what i really want is the above without repetition, i.e.:

  [combi for combi in k_permutations_without_repetitions(x, 3)] =

  [[1, 2, 3], [1, 3, 2], [2, 1, 3], [2, 3, 1], [3, 1, 2], [3, 2, 1]]

  For smallish lists, it's no problem to generate the list as follows:

  no_reps_combis = [combi for combi in k_permutations(x, 3)
                    if len(set(combi)) == 3]

  but i'm sure there must be a way to do this as a pure generator, just
  that i wouldn't have a clue how to go about it.

  Help please, Python Gods! :)

 I was initially confused by your example, since the size of the
 underlying set is the same as of the individual permutations. In this
 case, you are just asking for all permutations of the set.

 As far as I can tell from a quick googling, the relevant definition of
 k-permutation is an ordered list of k elements from the set. Thus your
 first example does not really generate k_permutations.

 A quick solution, not extensively tested

 # base needs to be an instance of the python builtin set

 def k_perm(base,k):
     for e in base:
         if k == 1:
             yield [e]
         else:
             for perm in k_perm(base-set([e]),k-1):
                 yield [e] + perm

  list(k_perm(set([1,2,3,4]),3))

 [[1, 2, 3], [1, 2, 4], [1, 3, 2], [1, 3, 4], [1, 4, 2], [1, 4, 3], [2,
 1, 3], [2, 1, 4], [2, 3, 1], [2, 3, 4], [2, 4, 1], [2, 4, 3], [3, 1, 2],
 [3, 1, 4], [3, 2, 1], [3, 2, 4], [3, 4, 1], [3, 4, 2], [4, 1, 2], [4, 1,
 3], [4, 2, 1], [4, 2, 3], [4, 3, 1], [4, 3, 2]]

 Much to my initial surprise, it works for k < 1 as well:

  list(k_perm(set([1,2,3,4]),-1))

 []

 Hope this helps

 Nis

thank you!. i'll give this a try. seems to do exactly what i need.
sorry about the ambiguity in my example. i chose 3 from 3 merely to
keep the size of the example case small, without thinking enough about
how it might cause confusion.


matching objects by a tuple field criterion

2007-06-10 Thread bullockbefriending bard
i have a large collection of python objects, each of which contains an
integer 6-tuple as part of its data payload. what i need to be able to
do is select only those objects which meet a simple tuple element
wildcard matching criterion. e.g. given the following python objects:

object A includes tuple (1,2,3,4,5,6)
object B includes tuple (1,4,4,4,11,1)
object C includes tuple (1,3,9,1,1,1)

all tuples are unique. for what it's worth, the values in each field
are independent of the other fields and range from 1 to 14. although
'large', my collection is sparse cf. the 14^6 possible tuples.

i want to search on *one only* tuple field/value. if my search
criterion is (*,*,*,4,*,*), then i want to find object A and object B.
if (1,*,*,*,*,*), i want to find objects A, B, and C, etc. i will only
ever specify an integer match for one tuple field.

i can think of some naive approaches, but is there an elegant way to
do this?



Re: matching objects by a tuple field criterion

2007-06-10 Thread bullockbefriending bard

 Instead of passing a wild-card tuple like (*,*,*,4,*,*) simply pass the
 integer you want to match and the position you want to match it in.

for sure. that was more for expository purpose rather than how i was
planning to go about it.


 As a generator expression:

 (obj for obj in list_of_objects if obj.data[what] == where)

above or equivalent list comprehension was what i had in mind as far
as linear search goes. and scanning the list like this will most
likely be 'good enough' performance-wise. however, partly just out of
curiosity, i was wondering if there is some kind of data structure
which might let me find all the matches a bit faster.
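
[One possible structure for the "faster than a linear scan" case, sketched with hypothetical names: an inverted index from (field position, value) to objects, built once and queried with a single dict lookup.]

```python
from collections import defaultdict

def build_index(objects):
    # maps (tuple position, value) -> list of objects matching there
    index = defaultdict(list)
    for obj in objects:
        for pos, value in enumerate(obj.data):
            index[(pos, value)].append(obj)
    return index

# a query like (*,*,*,4,*,*) is then:  matches = index[(3, 4)]
```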



Re: matching objects by a tuple field criterion

2007-06-10 Thread bullockbefriending bard
quite so, i rephrased docstring to be:

criteria is an iterable containing either '*' instances or  strings
of comma-separated integers. e.g.  ['*','1,2,3', '11,12']

thanks very much for the idea! upon further reflection, this seems to
be a more elegant solution for my case than the ad-hoc generator or
list comprehension approach. this is because i *do* have to filter
data based on multiple single field criteria and i know all of these
criteria at the same time - so it makes a lot of sense to do as you
have done and then only do one filter operation to pull out all the
objects i am interested in.



Re: matching objects by a tuple field criterion

2007-06-10 Thread bullockbefriending bard
 There are certainly cases where the speedup is tremendous - think of a
 single integer in the first criteria - but then the overall performance
 depends on the real-live queries. If lot's of wildcards are used, you
 might end up slower if the tree-walk takes more time than the
 C-implemented list-iteration will cost you.

hopefully, won't need to do this then. i'll test the filter approach
tonight. generally speaking, going to be mostly wildcards most of the
time.



Re: speeding things up with C++

2007-06-01 Thread bullockbefriending bard

 Are you sure you want an STL container? Since the primary operator
 here is Python, the extra benefits from the STL container over plain C
 arrays isn't as evident.

 Pyrex is a good way to write the interface between your C++ code and
 the Python code - it handles the refcounting and boilerplate for you -
 and perhaps for writing the algorithms as well, depending on how
 complicated and performance sensitive they are.

good point. while i bow to the genius of the folks who invented
template metaprogramming, the compiler error messages tend to be
profoundly depressing :). one way or the other, pyrex is something i
need to learn since i'm now completely enamoured with python and had
better develop an arsenal of tricks for the rare times when it's just
not fast enough.

 Also, using numeric/Numarray can be a very big win. It can potentially
 save you a fair amount of marshalling overhead.

as i understand it, this is so for applying the likes of matrix
operations, autocorrelations, FFTs, etc...where python essentially
provides scripting glue to some highly optimised C functions. i'm
assuming that the kind of algorithm i am looking at which involves
some set operations on list elements + copying between lists isn't
going to be helped so much by using numpy or similar.




Re: speeding things up with C++

2007-05-31 Thread bullockbefriending bard
Thanks this is good news. I think my C/C++ background is sufficient to
manage to figure things out if I RTFM carefully.

Basically I want to pass in a Python list of integer tuples, create an
STL container full of equivalent tuples, apply some processor-
intensive algorithm to said list of tuples, and finally poke the
results back into another Python list of integer tuples and return it
to the calling Python environment. Data structures are well-defind and
simple, and the most complex case would be 3-deep nested list, so I
will seriously consider figuring out how to do it manually as you
suggest.

On May 31, 3:04 am, Jorgen Grahn [EMAIL PROTECTED]
wrote:
 On 26 May 2007 02:19:39 -0700, bullockbefriending bard [EMAIL PROTECTED] 
 wrote:
 ...

  Essentially, I need to pass a list of 6-tuples containing only
  integers to my new sadly necessary super-fast compiled language
  function which i am not looking forward to writing:

  input: [(1,2,3,4,5,6), (7,8,9,10,11,12),...]

  and after much thrashing of processor resources, return data which
  looks like this to the Python calling environment:

  output: [( (1, 2), (1,), (12,), (13), (1, 7, 11), (9,) ), (  another
  nested tuple like preceding one  ),  ]
 ...
  However, I hope someone reading this will be able to tell me that I'm
  being a total pessimist and that in fact it isn't very difficult to do
  what I want to do using SWIG.

 You're talking about the actual conversion between Python data
 structures and C or C++ data structures?  That is easy to do even
 manually, IMHO -- provided a decent C background.

 Have a look in the Python/C API Reference Manual, and the mapping
 becomes clear. The PyListObject stuff for example, where there's a C
 function for every basic operation on lists, and where the elements
 have the C type PyObject. And so on.  Mapping to C is just a matter of
 walking a nested data structure, where you have a good idea what it is
 supposed to look like (a list of six-tuples of some kind of numbers).

 /Jorgen

 --
   // Jorgen Grahn grahn@Ph'nglui mglw'nafh Cthulhu
 \X/ snipabacken.dyndns.org  R'lyeh wgah'nagl fhtagn!




Re: speeding things up with C++

2007-05-28 Thread bullockbefriending bard
thanks! i'll look into this.

On May 27, 5:35 am, Che Guevara [EMAIL PROTECTED] wrote:
 On May 26, 11:19 am, bullockbefriending bard [EMAIL PROTECTED]
 wrote:

  However, I hope someone reading this will be able to tell me that I'm
  being a total pessimist and that in fact it isn't very difficult to do
  what I want to do using SWIG. I'm not asking for a complete solution,
  more like some general pointers from someone who has actually done
  something similar before.

 I do stuff like that with ctypes 
 (http://docs.python.org/lib/module-ctypes.html
 )!
 I'm sure you can figure out some way to pass/return data to/from your
 C++ lib.
 The easiest way - use strings or arrays, but if you dig deeper into
 ctypes documentation, I'm sure you will find a better solution!

 Good luck,
 -- misreckoning




Re: speeding things up with C++

2007-05-28 Thread bullockbefriending bard
I wonder if Jython might be the answer? Java is going to be faster
than Python for the time-critical part of my program. Does anybody
have experience getting data structures like nested lists / tuples
into a java routine from a running jython program (and then back
again)?



Re: speeding things up with C++

2007-05-28 Thread bullockbefriending bard
thanks. i'll definitely look into this.

On May 28, 10:48 pm, Kay Schluehr [EMAIL PROTECTED] wrote:
 On May 26, 11:19 am, bullockbefriending bard [EMAIL PROTECTED]
 wrote:



  I've done all the requisite profiling and thought fairly deeply about
  the efficiency of my python code, but am still going to have to speed
  up the innermost guts of what I am doing.

  Essentially, I need to pass a list of 6-tuples containing only
  integers to my new sadly necessary super-fast compiled language
  function which i am not looking forward to writing:

  input: [(1,2,3,4,5,6), (7,8,9,10,11,12),...]

  and after much thrashing of processor resources, return data which
  looks like this to the Python calling environment:

  output: [( (1, 2), (1,), (12,), (13), (1, 7, 11), (9,) ), (  another
  nested tuple like preceding one  ),  ]

  Each member of the returned list is a tuple containing 6 tuples, and
  each of those 6 tuples has at least one integer member. It would also
  be acceptable for the return values to be entirely nested lists
  instead of having the two innermost sequence types as tuples.

  I probably want to be using something like C++ because i'm going to
  need to use STL vectors and sets for what i'm doing in the algorithm
  i'm trying to speed up. IIRC Boost tuple is a bit painful for more
  than 10 elements + isn't really essential that I use a non-mutable
  data type in what will be a small encapsulated bit of a much larger
  system.

  Anyway, I can probably very quickly figure out some hacked way to get
  the data into my function given that in the worst case I could take
  advantage of the knowledge that each input tuple always has 6
  elements, and simply pass in a big array of ints. Yes, I know this is
  a bit retarded, but I'm talking worst case assuming on very tight
  schedule and no time to delve deeply into SWIG or whatever. Similarly
  it wouldn't be too difficult to return the result as the mother of
  all strings which i could then parse fairly easily.

  However, I hope someone reading this will be able to tell me that I'm
  being a total pessimist and that in fact it isn't very difficult to do
  what I want to do using SWIG. I'm not asking for a complete solution,
  more like some general pointers from someone who has actually done
  something similar before.

  As an added bonus, if this is 'easily' doable using OCaml
  as the guts instead of C++, I'd be happy to know about it.

 A just too obvious recommendation is to drop C++ and use D together
 with the Pyd/celerid bridge. Yeah, it sounds a little esoteric but D
 is really the Python among the statically typed imperative languages.

  http://www.digitalmars.com/d/
  http://pyd.dsource.org/index.html




speeding things up with C++

2007-05-26 Thread bullockbefriending bard
I've done all the requisite profiling and thought fairly deeply about
the efficiency of my python code, but am still going to have to speed
up the innermost guts of what I am doing.

Essentially, I need to pass a list of 6-tuples containing only
integers to my new sadly necessary super-fast compiled language
function which i am not looking forward to writing:

input: [(1,2,3,4,5,6), (7,8,9,10,11,12),...]

and after much thrashing of processor resources, return data which
looks like this to the Python calling environment:

output: [( (1, 2), (1,), (12,), (13), (1, 7, 11), (9,) ), (  another
nested tuple like preceding one  ),  ]

Each member of the returned list is a tuple containing 6 tuples, and
each of those 6 tuples has at least one integer member. It would also
be acceptable for the return values to be entirely nested lists
instead of having the two innermost sequence types as tuples.

I probably want to be using something like C++ because i'm going to
need to use STL vectors and sets for what i'm doing in the algorithm
i'm trying to speed up. IIRC Boost tuple is a bit painful for more
than 10 elements + isn't really essential that I use a non-mutable
data type in what will be a small encapsulated bit of a much larger
system.

Anyway, I can probably very quickly figure out some hacked way to get
the data into my function given that in the worst case I could take
advantage of the knowledge that each input tuple always has 6
elements, and simply pass in a big array of ints. Yes, I know this is
a bit retarded, but I'm talking worst case assuming on very tight
schedule and no time to delve deeply into SWIG or whatever. Similarly
it wouldn't be too difficult to return the result as the mother of
all strings which i could then parse fairly easily.

However, I hope someone reading this will be able to tell me that I'm
being a total pessimist and that in fact it isn't very difficult to do
what I want to do using SWIG. I'm not asking for a complete solution,
more like some general pointers from someone who has actually done
something similar before.

As an added bonus, if this is 'easily' doable using OCaml
as the guts instead of C++, I'd be happy to know about it.
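
[The worst-case "big array of ints" route can at least be demonstrated on the Python side with ctypes. A hedged sketch only: no C library is actually loaded or called here, and the function name is my own.]

```python
import ctypes

def flatten_to_c_array(tuples):
    # [(1,2,3,4,5,6), ...] -> one contiguous C int array, 6 ints per
    # tuple, suitable for a C function taking (int *data, int n)
    flat = [v for t in tuples for v in t]
    return (ctypes.c_int * len(flat))(*flat)

arr = flatten_to_c_array([(1, 2, 3, 4, 5, 6), (7, 8, 9, 10, 11, 12)])
```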



regex matching question

2007-05-19 Thread bullockbefriending bard
first, regex part:

I am new to regexes and have come up with the following expression:
 ((1[0-4]|[1-9]),(1[0-4]|[1-9])/){5}(1[0-4]|[1-9]),(1[0-4]|[1-9])

to exactly match strings which look like this:

 1,2/3,4/5,6/7,8/9,10/11,12

i.e. 6 comma-delimited pairs of integer numbers separated by the
backslash character + constraint that numbers must be in range 1-14.

i should add that i am only interested in finding exact matches (doing
some kind of command line validation).

this seems to work fine, although i would welcome any advice about how
to shorten the above. it seems to me that there should exist some
shorthand for (1[0-4]|[1-9]) once i have defined it once?
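one way to get that shorthand, sketched here: define the number
sub-pattern once as a string and build the full pattern from it
(re.VERBOSE with comments is another option):

```python
import re

NUM = r'(?:1[0-4]|[1-9])'          # the 1-14 alternation, written once
PAIR = NUM + ',' + NUM
PATTERN = r'(?:%s/){5}%s\Z' % (PAIR, PAIR)

# matches the example string, rejects an out-of-range number
assert re.match(PATTERN, '1,2/3,4/5,6/7,8/9,10/11,12')
assert not re.match(PATTERN, '1,2/3,4/5,6/7,8/9,15/11,12')
```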

also (and this is where my total beginner status brings me here
looking for help :)) i would like to add one more constraint to the
above regex. i want to match strings *iff* each pair of numbers is
different. e.g.: 1,1/3,4/5,6/7,8/9,10/11,12 or
1,2/3,4/5,6/7,8/9,10/12,12 should fail to be matched by my final
regex whereas 1,2/3,4/5,6/7,8/9,10/11,12 should match OK.

any tips would be much appreciated - especially regarding preceding
paragraph!

and now for the python part:

results = "1,2/3,4/5,6/7,8/9,10/11,12"
match = re.match("((1[0-4]|[1-9]),(1[0-4]|[1-9])/){5}(1[0-4]|[1-9])," \
                 "(1[0-4]|[1-9])", results)
if match == None or match.group(0) != results:
    raise FormatError("Error in format of input string: %s" %
                      (results))
results = [leg.split(',') for leg in results.split('/')]
# => [['1', '2'], ['3', '4'], ['5', '6'], ['7', '8'], ['9', '10'],
#     ['11', '12']]
.
.
.
the idea in the above code being that i want to use the regex match as
a test of whether or not the input string (results) is correctly
formatted. if the string results is not exactly matched by the regex,
i want my program to barf an exception and bail out. apart from
whether or not the regex is good idiom, is my approach suitably
pythonic?

TIA for any help here.



Re: regex matching question

2007-05-19 Thread bullockbefriending bard
thanks for your suggestion. i had already implemented the all pairs
different constraint in python code. even though i don't really need
to give very explicit error messages about what might be wrong with my
data (obviously easier to do if do all constraint validation in code
rather than one regex), there is something to be said for your
suggestion to simplify my regex further - it might be sensible from a
maintainability/readability perspective to use the regex for *format*
validation and then validate all *values* in code.
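that split might look something like this sketch (FORMAT is just the
regex from the original post, anchored with \Z; validate is my own
name for it):

```python
import re

# the regex from the original post, anchored at the true end of string
FORMAT = r'((1[0-4]|[1-9]),(1[0-4]|[1-9])/){5}(1[0-4]|[1-9]),(1[0-4]|[1-9])\Z'

def validate(results):
    # the regex checks the *format*; plain code checks the *values*
    if not re.match(FORMAT, results):
        return False
    pairs = [leg.split(',') for leg in results.split('/')]
    return all(a != b for a, b in pairs)

assert validate('1,2/3,4/5,6/7,8/9,10/11,12')
assert not validate('1,1/3,4/5,6/7,8/9,10/11,12')   # equal pair
assert not validate('1,2/3,4')                      # wrong format
```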

from my cursory skimming of friedl, i get the feeling that the all
pairs different constraint would give rise to some kind of fairly
baroque expression, perhaps likely to bring to mind the following
quotation from samuel johnson:

 "Sir, a woman's preaching is like a dog's walking on his hind legs.
It is not done well; but you are surprised to find it done at all."

however, being human, sometimes some things should be done, just
because they can :)... so if anyone knows how to do it, i'm still
interested, even if just out of idle curiosity!
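for the idly curious, here is one sketch that seems to work in
Python's re: a backreference to a group inside a repeated group refers
to that group's most recent match, so a negative lookahead can forbid
an equal second number within each pair (baroque, as predicted):

```python
import re

# inside the repeated group, \2 re-binds on every iteration, so the
# lookahead rejects a second number equal to the first in each pair;
# \4 does the same job for the final pair
N = r'(1[0-4]|[1-9])'
PAT = r'(%s,(?!\2(?:/|\Z))%s/){5}%s,(?!\4\Z)%s\Z' % (N, N, N, N)

assert re.match(PAT, '1,2/3,4/5,6/7,8/9,10/11,12')      # all pairs differ
assert not re.match(PAT, '1,1/3,4/5,6/7,8/9,10/11,12')  # 1,1 rejected
assert not re.match(PAT, '1,2/3,4/5,6/7,8/9,10/12,12')  # 12,12 rejected
assert re.match(PAT, '1,2/3,4/3,5/2,6/8,3/1,2')         # repeats across pairs OK
```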

On May 20, 12:57 am, Marc 'BlackJack' Rintsch [EMAIL PROTECTED] wrote:
 In [EMAIL PROTECTED],



 bullockbefriending bard wrote:
  first, regex part:

  I am new to regexes and have come up with the following expression:
   ((1[0-4]|[1-9]),(1[0-4]|[1-9])/){5}(1[0-4]|[1-9]),(1[0-4]|[1-9])

  to exactly match strings which look like this:

   1,2/3,4/5,6/7,8/9,10/11,12

  i.e. 6 comma-delimited pairs of integer numbers separated by the
  backslash character + constraint that numbers must be in range 1-14.

  i should add that i am only interested in finding exact matches (doing
  some kind of command line validation).

  [...]

  the idea in the above code being that i want to use the regex match as
  a test of whether or not the input string (results) is correctly
  formatted. if the string results is not exactly matched by the regex,
  i want my program to barf an exception and bail out. apart from
  whether or not the regex is good idiom, is my approach suitably
  pythonic?

 I would use a simple regular expression to extract candidates and a
 Python function to split the candidate and check for the extra
 constraints.  Especially the all pairs different constraint is something
 I would not even attempt to put in a regex.  For searching candidates this
 should be good enough::

   r'(\d+,\d+/){5}\d+,\d+'

 Ciao,
 Marc 'BlackJack' Rintsch




Re: regex matching question

2007-05-19 Thread bullockbefriending bard

 Backslash? Your example uses a [forward] slash.

correct.. my mistake. i use forward slashes.

 Are you sure you don't want to allow for some spaces in the data, for
 the benefit of the humans, e.g.
 1,2 / 3,4 / 5,6 / 7,8 / 9,10 / 11,12

you are correct. however, i am using the string as a command line
option and can get away without quoting it if there are no optional
spaces.

 Always use raw strings for patterns, even if you don't have
 backslashes in them -- and this one needs a backslash; see below.

knew this, but had not done so in my code because i wanted to use '\'
as a line continuation character to keep everything within 80 columns.
i have adopted your advice regarding \Z below and am now using a raw
string.

 For clarity, consider using mobj or even m instead of match to
 name the result of re.match.

good point.

  if match == None or match.group(0) != results:

 Instead of
 if mobj == None 
 use
 if mobj is None ...
 or
 if not mobj ...

 Instead of the or match.group(0) != results caper, put \Z (*not* $) at
 the end of your pattern:
 mobj = re.match(r"pattern\Z", results)
 if not mobj:

 HTH,
 John

very helpful advice. thanks!



Re: regex matching question

2007-05-19 Thread bullockbefriending bard
 Instead of the or match.group(0) != results caper, put \Z (*not* $) at
 the end of your pattern:
 mobj = re.match(r"pattern\Z", results)
 if not mobj:

as the string i am matching against is coming from a command line
argument to a script, is there any reason why i cannot get away with
just $ given that this means that there is no way a newline could find
its way into my string? certainly passes all my unit tests as well as
\Z. or am i missing the point of \Z ?
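the difference only shows up when the string ends in a newline, which
$ lets through; a quick sketch of the distinction:

```python
import re

PAT = r'(\d+,\d+/){5}\d+,\d+'
good = '1,2/3,4/5,6/7,8/9,10/11,12'
sneaky = good + '\n'

assert re.match(PAT + r'$', sneaky)       # $ also matches just before a trailing newline
assert not re.match(PAT + r'\Z', sneaky)  # \Z matches only at the true end of the string
assert re.match(PAT + r'\Z', good)
```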



Re: regex matching question

2007-05-19 Thread bullockbefriending bard
 Here all pairs different means for each pair, both numbers must be  
 different, but they may appear in another pair. That is, won't flag  
 1,2/3,4/3,5/2,6/8,3/1,2 as invalid, but this wasn't clear from your  
 original post.

 --
 Gabriel Genellina

thanks! you are correct that the 'all pairs different' nomenclature is
ambiguous. i require that each pair have different values, but it is
OK for different pairs to be identical... so exactly as per your code
snippet.



Re: regex matching question

2007-05-19 Thread bullockbefriending bard


 No way? Famous last words :-)

 C:\junk>type showargs.py
 import sys; print sys.argv

 C:\junk>\python25\python
 Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit
 (Intel)] on win32
 Type "help", "copyright", "credits" or "license" for more information.
 >>> import subprocess
 >>> subprocess.call('\\python25\\python showargs.py teehee\n')
 ['showargs.py', 'teehee\n']

can't argue with that :) back to \Z



Re: List comprehension returning subclassed list type?

2007-03-25 Thread bullockbefriending bard
Thanks! I went with extend and generator expression as I *am* dealing
with rather a lot of data. Now I think I'm going to go on a little
hunt through my code looking for more places where I should replace
list comprehensions with generator expressions - bit of a newbie here.

On Mar 25, 3:57 pm, Steven D'Aprano
[EMAIL PROTECTED] wrote:
 On Sat, 24 Mar 2007 23:43:10 -0700, bullockbefriending bard wrote:
  z_list = [Z(y.var1, y.var2,..) for y in list_of_objects_of_class_Y]

   Of course this just gives me a plain list and no access to the
   methods of z_list.

 List comprehensions give you a list. If you want to convert that list into
 the type of z_list, you need to do it yourself. Since ZList sub-classes
 from list, probably the easiest way is just:

 z_list = ZList([some list comprehension here])

  I could, of course go and write a static method in
  ZList which takes a plain list of Z objects and returns a ZList.

 Yes, that would be one such way. Another way is:

 z_list.extend([some list comprehension here])

 If you are using a recent enough version of Python, you probably don't
 even need the list comprehension. Just use a generator expression:

 z_list.extend(Z(y.var1, y.var2,..) for y in list_of_objects_of_class_Y)

 That's especially useful if the list of objects is huge, because it avoids
 creating the list twice: once in the list comp, and once as z_list.

 --
 Steven.




List comprehension returning subclassed list type?

2007-03-24 Thread bullockbefriending bard
Given:

class Z(object):
    # various defs, etc.

class ZList(list):
    # various defs, etc.

i would like to be able to replace

z_list = ZList()
for y in list_of_objects_of_class_Y:
    z_list.append(Z(y.var1, y.var2,..))


with something like this:

z_list = [Z(y.var1, y.var2,..) for y in list_of_objects_of_class_Y]

Of course this just gives me a plain list and no access to the
methods of z_list. I could, of course, go and write a static method in
ZList which takes a plain list of Z objects and returns a ZList.

Anyway, my question is whether or not this can be done more elegantly
via list comprehension?
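For the record, the simplest route is to feed the comprehension (or a
generator expression) straight to the subclass constructor, since
list() accepts any iterable. A sketch with made-up attributes and a
stand-in method:

```python
class Z(object):
    def __init__(self, var1, var2):
        self.var1, self.var2 = var1, var2

class ZList(list):
    def var1_total(self):                  # stands in for "various defs, etc."
        return sum(z.var1 for z in self)

class Y(object):
    def __init__(self, var1, var2):
        self.var1, self.var2 = var1, var2

list_of_objects_of_class_Y = [Y(1, 'a'), Y(2, 'b')]

# a generator expression fed straight to the subclass constructor:
z_list = ZList(Z(y.var1, y.var2) for y in list_of_objects_of_class_Y)
```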



6 Pick Bet Grouping

2006-11-28 Thread bullockbefriending bard
(I apologise in advance for posting something slightly OT, but plead in
mitigation that I'm trying to re-write an old, suboptimal VB6 (erk)
brute-force attack in a shiny, elegant, pythonic manner. I would really
appreciate some ideas about an appropriate algorithmic approach to this
+ pointers to any relevant implementations in Python. Along the way,
I'm also trying to get Python into my organisation through the back
door - so a further plea for tolerance in posting this here! :))

A single 6 Pick bet looks like this:
RACE1     RACE2     RACE3     RACE4     RACE5     RACE6
runner1 / runner2 / runner3 / runner4 / runner5 / runner6  - $amount

e.g. we might have:

5 / 3 / 11 / 7 / 1 / 9  - $50  (5 to come 1st or 2nd in Race 1, 3 to
come 1st or 2nd in Race 2, etc.)
7 / 3 / 11 / 7 / 1 / 9  - $50
5 / 3 / 11 / 14 / 1 / 9  - $50
7 / 3 / 11 / 14 / 1 / 9  - $50

The totalizator system allows us to merge or group these four bets as
follows:

5 + 7 / 3 / 11 / 7 + 14 / 1 / 9 - $50  ($200 total)

This method of expressing multiple bets in one line is advantageous
because we may have many thousands of combinations we wish to bet and
only a limited amount of time in which to bet them.

There are up to 14 horses in each race, so there are 14^6 = 7,529,536
possible 6 Pick bets. In practice, one might wish to bet (say) 15,000
combinations out of these 7.5 million. However, it would be very nice
to be able to *optimally* merge (as shown above) these 15,000
combinations so that they might fit on (say) 2,000 betting tickets
instead of trying to write out 15,000 single tickets.

To keep things simple, let's assume that all single bets are for the
same amount. (In practice, however, this is not so.)

Now, it's certainly possible to go about this via brute force
iteration, but I would really appreciate it if anyone with a CS
background could point me toward a smarter plan of attack. I have
perused Skiena's Algorithm Handbook and various websites and can't seem
to find an analogous problem. I'm hoping this is just my ignorance and
that my brief exposition rings a bell for someone here.
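Not an answer to the optimality question, but a naive greedy sketch of
the merge step itself, with tickets represented as 6-tuples of
frozensets (all names are mine). Merging two tickets that differ in
exactly one leg is exact: the cartesian product of the merged ticket
is precisely the union of the two products, so no new single bets are
introduced.

```python
from itertools import product

def expand(ticket):
    """All single bets covered by one grouped ticket (tuple of 6 frozensets)."""
    return set(product(*ticket))

def merge_once(tickets):
    # greedy pass: merge the first two tickets found that differ in
    # exactly one leg; returns (new tickets, whether anything changed)
    for i, a in enumerate(tickets):
        for j in range(i + 1, len(tickets)):
            b = tickets[j]
            diff = [k for k in range(6) if a[k] != b[k]]
            if len(diff) == 1:
                merged = list(a)
                merged[diff[0]] = a[diff[0]] | b[diff[0]]
                rest = [t for n, t in enumerate(tickets) if n not in (i, j)]
                return rest + [tuple(merged)], True
    return tickets, False

def group(singles):
    # lift each single bet to a one-runner-per-leg ticket, then merge
    # until no pair of tickets differs in exactly one leg
    tickets = [tuple(frozenset([r]) for r in s) for s in singles]
    changed = True
    while changed:
        tickets, changed = merge_once(tickets)
    return tickets
```

On the four example bets this collapses everything onto one ticket,
5 + 7 / 3 / 11 / 7 + 14 / 1 / 9, but being greedy it makes no claim
to minimise ticket count in general.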



Re: 6 Pick Bet Grouping

2006-11-28 Thread bullockbefriending bard
Your explanation is correct. It's important to realise that
effectively the input (many thousands of single bets) comes out of a
black box, and it is then necessary for me to merge them into the
grouped format to make things more manageable. Given an arbitrary
bucket of single bets, it is by no means obvious how they should be
grouped without doing something algorithmic. In grouping, it is also
important not to inadvertently introduce any new single bets.

For the purpose of this discussion, I am also assuming that all bets
are the same size. That's not the case, but can leave that little twist
for another day.



Re: 6 Pick Bet Grouping

2006-11-28 Thread bullockbefriending bard
Sorry, I perhaps didn't frame my initial post very well. I hope my
reply to your other post below has answered these questions.
