Re: [Tutor] Programming microsoft excel

2010-05-06 Thread wesley chun
> guys can i use python's win32com module to do the same tasks that i do with
> visual basic for applications (VBA)? I mean automating tasks for Excel etc.
> and accessing Access databases. If win32com doesn't, which module can i use?


that's definitely the right one, and yes, you can use VB/VBA examples
if you port them to Python.
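for example, here's a minimal sketch of driving Excel over COM (untested as
typed; it assumes the pywin32 package and an installed Excel, and the save
path is made up):

import win32com.client

xl = win32com.client.Dispatch("Excel.Application")
xl.Visible = True                       # watch it work while you experiment
wb = xl.Workbooks.Add()                 # new workbook
ws = wb.Worksheets(1)                   # 1-based index, just like VBA
ws.Cells(1, 1).Value = "Hello from Python"
wb.SaveAs(r"C:\temp\demo.xls")          # invented path
xl.Quit()

the same Dispatch() call reaches Access via "Access.Application", or you can
get at the data through ODBC/ADO instead.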

i wrote a good-sized section on how to do this in Chapter 23 of my
book (see below) with examples for Word, Excel, PowerPoint, and
Outlook.

hope this helps!
-- wesley
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
"Core Python Programming", Prentice Hall, (c)2007,2001
"Python Fundamentals", Prentice Hall, (c)2009
http://corepython.com

wesley.j.chun :: wescpy-at-gmail.com
python training and technical consulting
cyberweb.consulting : silicon valley, ca
http://cyberwebconsulting.com
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Newbie & Unittest ...

2010-05-06 Thread Steven D'Aprano
On Fri, 7 May 2010 03:53:08 am Damon Timm wrote:
> Hi Lie -
>
> Thanks for that idea -- I tried it but am getting an error.  I read a
> little about the __dict__ feature but couldn't figure it.  I am going
> to keep searching around for how to dynamically add methods to a
> class ... here is the error and then the code.

With respect to Lie, dynamically adding methods is an advanced technique 
that is overkill for what you seem to be doing, and the code he gave 
you can't work without major modification.

Tests are code too, and the more complicated you make your tests, the 
less confidence you should have in them. The more tricks you use 
(dynamic methods, metaclasses, complicated frameworks, etc.) the higher 
the chances that your test code itself will be buggy, and therefore 
your pass/fail results are meaningless. For example, some time ago I 
was testing some code I had written, and was very happy that all my 
tests were passing. Then I discovered that *dozens* of tests weren't 
even being called due to a bug in the framework. When I worked around 
that problem, I discovered that now my tests were failing. Because my 
framework was complicated, it had a bug in it, which meant my tests 
were buggy, which meant my code was buggy and I didn't know.

The lesson is, keep your test code simple. Don't play tricks or be too 
clever. Don't trust your test framework, not even well-known ones like 
Python's own doctest: you should double check your results, e.g. 
sometimes I will add a test I know will fail, just to make sure that 
the framework will see it.
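Something like this, as a sketch (the class and method names are arbitrary):

import unittest

class TestTheFramework(unittest.TestCase):
    def test_canary(self):
        # this MUST show up as a failure; if the run stays green,
        # the runner never actually called my tests
        self.fail("canary: the framework really is running me")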


-- 
Steven D'Aprano
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Newbie & Unittest ...

2010-05-06 Thread Steven D'Aprano
On Thu, 6 May 2010 10:37:20 am Damon Timm wrote:


> class TestFiles(unittest.TestCase):
>
>     # this is the basic test
>     def test_values(self):
>         '''see if values from my object match what they should
>         match'''
>         for file in FILES:
>             for k, v in TAG_VALUES:
>                 self.assertEqual(self.file.tags[k], v)
>
> This test works, however, it only runs as *one* test (which either
> fails or passes) 

That is correct, because you have written it as one test. In 
unit-testing, a single test can be arbitrarily complex. In your case, 
you've created a single test which makes 12 different comparisons, and 
fails if *any* of them fail.

Here is an analogy... suppose you are comparing two strings for 
equality. Python does this:

* If the lengths are different, return False (the test fails);
* If the first characters differ, return False;
* If the second characters differ, return False;
* If the third characters differ, return False;
* ... and so on ...
* return True (the test passes)

The data that you feed are the strings "abcd" and "abce". Is that five 
tests, with the first four passing and the last failing? Well, yes, but 
not in any meaningful sense. Even if it is useful to see *where* the 
strings differ, it would be silly to treat each comparison as a 
separate test.


> and I want it to run as 12 different tests (three 
> for each file type) and be able to see which key is failing for which
> file type.  I know I could write them all out individually but that
> seems unnecessary.

Unfortunately, if you really want them to be twelve individual tests, 
then you need to write them out individually as twelve separate tests.

As for the second question, to see where the failure is, you can pass an 
extra argument to assertEqual:

self.assertEqual(self.file.tags[k], v,
    "fails for file %s with tag %s and value %s" % (file, k, v))

Alternatively, you can take a separate approach. Since you have four 
different file types, I would test each type separately, using 
inheritance to reduce the amount of work needed.

# Untested.
class TestMP3(unittest.TestCase):
    filename = 'data/lossy/04 - Christmas Waltz (MP3-79).mp3'
    filetype = MP3File
    def setUp(self):
        # unittest runs this before each test; safer than overriding __init__
        self.file = self.filetype(self.filename)
    def test_title(self):
        self.assertEqual(self.file.tags['title'], 'Christmas Waltz')
    def test_artist(self):
        self.assertEqual(self.file.tags['artist'], 'Damon Timm')
    def test_album(self):
        self.assertEqual(self.file.tags['album'], 'Homemade')


class TestFLAC(TestMP3):
    filename = 'data/lossless/01 - Christmas Waltz.flac'
    filetype = FLACFile

class TestOGG(TestMP3):
    filetype = OGGFile
    filename = 'data/lossy/01 - Christmas Waltz (OGG-77).ogg'

class TestMP4(TestMP3):
    filetype = MP4File
    filename = 'data/lossy/06 - Christmas Waltz (M4A-64).m4a'


And you're done, 12 separate tests with hardly any extra typing. And now 
you can easily add specific tests, e.g. testing that FLAC actually is 
lossless:


class TestFLAC(TestMP3):
    filename = 'data/lossless/01 - Christmas Waltz.flac'
    filetype = FLACFile
    def test_lossless(self):
        # binary mode: WAV data is not text
        raw = open('raw sounds.wav', 'rb').read()
        data = self.file.convert_to_wav()
        self.assertEqual(raw, data)




-- 
Steven D'Aprano
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] portability of pickle and shelve across platforms and different python versions?

2010-05-06 Thread Steven D'Aprano
On Thu, 6 May 2010 09:12:24 am Garry Willgoose wrote:

> How portable are files containing pickle'd and shelve'd data? I'm
> thinking issues like simply O/S portability, through
> big-end/little-end hardware (for floats/integers which I use a lot),
> and then for unicode/non-unicode string,  64/32 bit and V2.6/V3.1
> implementations of python. Does the version of the encoder in pickle
> make any difference for this? One post I've seen suggests that as
> long as the file is opened binary (ie. 'b') all should be well for
> platform independence.

Technically, reading the file isn't a matter of pickle, but a matter of 
the operating system not mangling the contents of the file before it 
reaches pickle. I would expect that text pickles (protocol 0) would be 
just fine with opening the file in text mode, but I haven't tried it.
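Either way, the safe habit is to open the file in binary mode and name the 
protocol explicitly; a minimal sketch (the file name is invented):

import pickle

data = {"papers": 87, "weights": [1.5, 2.25]}
with open("data.pkl", "wb") as f:       # binary mode on every platform
    pickle.dump(data, f, 0)             # protocol 0: printable ASCII, most portable
with open("data.pkl", "rb") as f:
    restored = pickle.load(f)
assert restored == data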

Because Python makes no guarantees about floats, but is just a thin 
wrapper around your platform's C floating point library, you may find 
occasional differences when pickling floats. E.g. I wouldn't trust 
pickling NANs and INFs to be platform independent unless the 
documentation explicitly says so.

Unfortunately transferring floats from one platform to another is a hard 
problem: given a float x on platform A, there's no guarantee that x is 
even representable on platform B. You can make stronger promises about 
transferring floats if you know both platforms use IEEE floats, 
although the C maths libraries differ in their handling of subnormals, 
overflow, NANs, INFs, and signed zeroes. If these features are 
important to you, you've probably already discovered that your 
calculations differ on platforms A and B unless you're using a 
dedicated numeric library that doesn't rely on the native C maths 
routines. If this is gobbledygook to you, then don't worry about it, it 
should Just Work well enough that you won't notice the difference.



> My core question if I give a pickled file to somebody else can i
> guarantee they can read/load it OK. The other person will be using
> exactly the same python code to open it as used to create it.

By default, pickling uses protocol 0, a printable-ASCII format. 
Nobody can *guarantee* platform independence, because you might feed 
Python an object like this:

import sys

class Silly:
    def __init__(self, arg):
        self.arg = arg
    def __reduce__(self):
        # pickle calls this to learn how to recreate the object
        if sys.platform == 'posix':
            # I hate Linux.
            raise TypeError("Screw you hippies, I'm going home!")
        return (Silly, (self.arg,))


which will break pickling. Similarly if your class has a __setstate__ 
method which does something stupid. Python is a "consenting adults" 
language: if you want to shoot yourself in the foot by writing broken 
code, Python doesn't try to stop you.

But for built-ins, with the possible exception of floats depending on 
the specific platforms in question, you should be safe.



-- 
Steven D'Aprano
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Is the difference in outputs with different size input lists due to limits on memory with PYTHON?

2010-05-06 Thread Dave Angel

Art Kendall wrote:



On 5/6/2010 1:51 PM, Dave Angel wrote:

Art Kendall wrote:



On 5/6/2010 11:14 AM, Dave Angel wrote:

Art Kendall wrote:
I am running Windows 7 64bit Home Premium, with quad CPUs and 8G 
memory.  I am using Python 2.6.2.


I have all the Federalist Papers concatenated into one .txt file.
Which is how big?  Currently you (unnecessarily) load the entire 
thing into memory with readlines().  And then you do confusing work 
to split it apart again, into one list element per paper.   And for 
a while there, you have three copies of the entire text.  You're 
keeping two copies, in the form of alltext and papers.
You print out the len(papers).  What do you see there?  Is it 
correctly 87 ?  If it's not, you have to fix the problem here, 
before even going on.


  I want to prepare a file with a row for each paper and a column 
for each term. The cells would contain the count of a term in that 
paper.  In the original application in the 1950's 30 single word 
terms were used. I can now use NoteTab to get a list of all the 
8708 separate words in allWords.txt. I can then use that data in 
statistical exploration of the set of texts.


I have the python program(?) syntax(?) script(?) below that I am 
using to learn PYTHON. The comments starting with "later" are 
things I will try to do to make this more useful. I am getting one 
step at a time to work.


It works when the number of terms in the term list is small, e.g., 
10.  I get a file with the correct number of rows (87) and count 
columns (10) in termcounts.txt. The termcounts.txt file is not 
correct when I have a larger number of terms, e.g., 100. I get a 
file with only 40 rows and the correct number of columns.  With 
8700 terms I get only 40 rows. I need to be able to have about 8700 
terms. (If this were FORTRAN I would say that the subscript 
indices were getting scrambled.)  (As I develop this I would like 
to be open-ended with the number of input papers and open-ended 
with the number of words/terms.)




# word counts: Federalist papers

import re, textwrap
# read the combined file and split into individual papers
# later create a new version that deals with all files in a folder rather than having papers concatenated

alltext = file("C:/Users/Art/Desktop/fed/feder16v3.txt").readlines()
papers = re.split(r'FEDERALIST No\.', " ".join(alltext))
print len(papers)

countsfile = file("C:/Users/Art/desktop/fed/TermCounts.txt", "w")
syntaxfile = file("C:/Users/Art/desktop/fed/TermCounts.sps", "w")
# later create a python program that extracts all words instead of using NoteTab

termfile = open("C:/Users/Art/Desktop/fed/allWords.txt")
termlist = termfile.readlines()
termlist = [item.rstrip("\n") for item in termlist]
print len(termlist)
# check for SPSS reserved words
varnames = textwrap.wrap(" ".join([v.lower() in ['and', 'or', 'not', 'eq', 'ge', 'gt', 'le', 'lt', 'ne', 'all', 'by', 'to', 'with'] and (v + "_r") or v for v in termlist]))
syntaxfile.write("data list file= 'c:/users/Art/desktop/fed/termcounts.txt' free/docnumber\n")
syntaxfile.writelines([v + "\n" for v in varnames])
syntaxfile.write(".\n")
# before using the syntax, manually replace spaces internal to a string with underscore // replace(ltrim(rtrim(varname)), " ", "_"); replace any special characters with @ in variable names

for p in range(len(papers)):

range(len()) is un-pythonic.  Simply do
for paper in papers:

and of course use paper below instead of papers[p]

    counts = []
    for t in termlist:
        counts.append(len(re.findall(r"\b" + t + r"\b", papers[p], re.IGNORECASE)))

    if sum(counts) > 0:
        papernum = re.search("[0-9]+", papers[p]).group(0)
        countsfile.write(str(papernum) + " " + " ".join([str(s) for s in counts]) + "\n")



Art

If you're memory limited, you really should sequence through the 
files, only loading one at a time, rather than all at once.  It's 
no harder.  Use os.listdir() to make a list of files, then your loop 
becomes something like:

for infile in filelist:
    paper = " ".join(open(infile, "r").readlines())

Naturally, to do it right, you should use with ... or at least 
close each file when done.


DaveA




Thank you for getting back to me. I am trying to generalize a 
process that 50 years ago used 30 terms on the whole file, and I am 
using the task of generalizing the process to learn Python.  In the 
post I sent there were comments to myself about things that I would 
want to learn about.  One of the first is to learn about processing 
all files in a folder, so your reply will be very helpful.  It seems 
that os.listdir() should let me include the filespec in the output 
file, which would be very helpful.


to rephrase my questions.
Is there a way to tell python to use more RAM?

Does python use the same array space over as it counts the 
occurrences for each input document? Or does it keep every row of 
the output someplace even after it has written it to the output? If 
it does keep old arrays,

[Tutor] Programming microsoft excel

2010-05-06 Thread hbutau
Hi
guys can i use python's win32com module to do the same tasks that i do with 
visual basic for applications (VBA)? I mean automating tasks for Excel etc. 
and accessing Access databases. If win32com doesn't, which module can i use?

Thanks in advance
--
Ovi Mail: Simple and user-friendly interface
http://mail.ovi.com

___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Is the difference in outputs with different size input lists due to limits on memory with PYTHON?

2010-05-06 Thread Alan Gauld

"Art Kendall"  wrote


Is there a way to tell python to use more RAM?


Only in an arcane way you should never need. This is not Fortran 
and Python does all the memory management for you so you don't 
need to worry 99.9% of the time.


BTW is Python some kind of a grandchild to Algol which was around in the 
early 70's?  It seems reminiscent.


Yes it is a long way down the hierarchy but it is part of the Algol family 
of imperative languages, being descended via Pascal and B which 
became C. And they both influenced ABC which became Python...
The links are there but not too strong. There are equally strong links 
to functional languages like Lisp and OOP languages like Simula

(which itself has a link to Algol)

Pretty much every modern programming language traces its family 
tree back to either Algol or Lisp or both. Even VB has as much in 
common with Algol as it does with the original BASIC.


HTH,

--
Alan Gauld
Author of the Learn to Program web site
http://www.alan-g.me.uk/


___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] List comprehension + lambdas - strange behaviour

2010-05-06 Thread Alan Gauld

I found this strange behaviour of lambdas, closures and list
comprehensions:

  

>>> funs = [lambda: x for x in range(5)]
>>> [f() for f in funs]
[4, 4, 4, 4, 4]

Of course I was expecting the list [0, 1, 2, 3, 4] as the result. The
'x' was bound to the final value of 'range(5)' expression for ALL
defined functions. Can you explain this? Is this only counterintuitive
example or an error in CPython?


As others have pointed out you are returning a reference not a value.

You can do what you want by defining a local closure using:

funs = [lambda y = x: y for x in range(5)]

Now you can do

for f in funs:
    print f()

and get the answer you expect.

HTH,

--
Alan Gauld
Author of the Learn to Program web site
http://www.alan-g.me.uk/

___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Is the difference in outputs with different size input lists due to limits on memory with PYTHON?

2010-05-06 Thread Art Kendall



On 5/6/2010 1:51 PM, Dave Angel wrote:

Art Kendall wrote:



On 5/6/2010 11:14 AM, Dave Angel wrote:

Art Kendall wrote:
I am running Windows 7 64bit Home Premium, with quad CPUs and 8G 
memory.  I am using Python 2.6.2.


I have all the Federalist Papers concatenated into one .txt file.
Which is how big?  Currently you (unnecessarily) load the entire 
thing into memory with readlines().  And then you do confusing work 
to split it apart again, into one list element per paper.   And for 
a while there, you have three copies of the entire text.  You're 
keeping two copies, in the form of alltext and papers.
You print out the len(papers).  What do you see there?  Is it 
correctly 87 ?  If it's not, you have to fix the problem here, 
before even going on.


  I want to prepare a file with a row for each paper and a column 
for each term. The cells would contain the count of a term in that 
paper.  In the original application in the 1950's 30 single word 
terms were used. I can now use NoteTab to get a list of all the 
8708 separate words in allWords.txt. I can then use that data in 
statistical exploration of the set of texts.


I have the python program(?) syntax(?) script(?) below that I am 
using to learn PYTHON. The comments starting with "later" are 
things I will try to do to make this more useful. I am getting one 
step at a time to work.


It works when the number of terms in the term list is small, e.g., 
10.  I get a file with the correct number of rows (87) and count 
columns (10) in termcounts.txt. The termcounts.txt file is not 
correct when I have a larger number of terms, e.g., 100. I get a 
file with only 40 rows and the correct number of columns.  With 
8700 terms I get only 40 rows. I need to be able to have about 8700 
terms. (If this were FORTRAN I would say that the subscript indices 
were getting scrambled.)  (As I develop this I would like to be 
open-ended with the number of input papers and open-ended with the 
number of words/terms.)




# word counts: Federalist papers

import re, textwrap
# read the combined file and split into individual papers
# later create a new version that deals with all files in a folder rather than having papers concatenated

alltext = file("C:/Users/Art/Desktop/fed/feder16v3.txt").readlines()
papers = re.split(r'FEDERALIST No\.', " ".join(alltext))
print len(papers)

countsfile = file("C:/Users/Art/desktop/fed/TermCounts.txt", "w")
syntaxfile = file("C:/Users/Art/desktop/fed/TermCounts.sps", "w")
# later create a python program that extracts all words instead of using NoteTab

termfile = open("C:/Users/Art/Desktop/fed/allWords.txt")
termlist = termfile.readlines()
termlist = [item.rstrip("\n") for item in termlist]
print len(termlist)
# check for SPSS reserved words
varnames = textwrap.wrap(" ".join([v.lower() in ['and', 'or', 'not', 'eq', 'ge', 'gt', 'le', 'lt', 'ne', 'all', 'by', 'to', 'with'] and (v + "_r") or v for v in termlist]))
syntaxfile.write("data list file= 'c:/users/Art/desktop/fed/termcounts.txt' free/docnumber\n")
syntaxfile.writelines([v + "\n" for v in varnames])
syntaxfile.write(".\n")
# before using the syntax, manually replace spaces internal to a string with underscore // replace(ltrim(rtrim(varname)), " ", "_"); replace any special characters with @ in variable names

for p in range(len(papers)):

range(len()) is un-pythonic.  Simply do
for paper in papers:

and of course use paper below instead of papers[p]

    counts = []
    for t in termlist:
        counts.append(len(re.findall(r"\b" + t + r"\b", papers[p], re.IGNORECASE)))

    if sum(counts) > 0:
        papernum = re.search("[0-9]+", papers[p]).group(0)
        countsfile.write(str(papernum) + " " + " ".join([str(s) for s in counts]) + "\n")



Art

If you're memory limited, you really should sequence through the 
files, only loading one at a time, rather than all at once.  It's no 
harder.  Use os.listdir() to make a list of files, then your loop 
becomes something like:

for infile in filelist:
    paper = " ".join(open(infile, "r").readlines())

Naturally, to do it right, you should use with ... or at least 
close each file when done.


DaveA




Thank you for getting back to me. I am trying to generalize a process 
that 50 years ago used 30 terms on the whole file, and I am using the 
task of generalizing the process to learn Python.  In the post I 
sent there were comments to myself about things that I would want to 
learn about.  One of the first is to learn about processing all files 
in a folder, so your reply will be very helpful.  It seems that 
os.listdir() should let me include the filespec in the output file 
which would be very helpful.


to rephrase my questions.
Is there a way to tell python to use more RAM?

Does python use the same array space over as it counts the 
occurrences for each input document? Or does it keep every row of the 
output someplace even after it has written it to the output? If it 
does keep old arrays, is there a way to "

Re: [Tutor] List comprehension + lambdas - strange behaviour

2010-05-06 Thread spir ☣
On Thu, 06 May 2010 16:53:07 -0300
Ricardo Aráoz  wrote:

> So you see, your functions just return the value of x. That's because
> the lambda has no parameters, so x refers to the global name x.

In other words, the "upvalue" (the variable captured in the closure) is 
referenced. Meaning if you later change it, the closure sees the change. The 
same in other dynamic languages.
If you want the value to be captured in each func, use a second lambda to pass 
it:
>>> funcs = [(lambda a: (lambda: a))(x) for x in range(5)]
>>> [f() for f in funcs]
[0, 1, 2, 3, 4]

Or even ;-):
>>> [(lambda a: (lambda: a))(x)() for x in range(5)]
[0, 1, 2, 3, 4]

... but --> KISS principle http://en.wikipedia.org/wiki/Keep_it_simple_stupid
Such syntaxes are only good for creating problems, imo. Why not make your life 
simple? (exception: for exploring the language's guts)

Denis


vit esse estrany ☣

spir.wikidot.com
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] List comprehension + lambdas - strange behaviour

2010-05-06 Thread Ricardo Aráoz
Artur Siekielski wrote:
> Hello.
> I found this strange behaviour of lambdas, closures and list
> comprehensions:
>
> >>> funs = [lambda: x for x in range(5)]
> >>> [f() for f in funs]
> [4, 4, 4, 4, 4]
>
> Of course I was expecting the list [0, 1, 2, 3, 4] as the result. The
> 'x' was bound to the final value of 'range(5)' expression for ALL
> defined functions. Can you explain this? Is this only counterintuitive
> example or an error in CPython?
>
>
> Regards,
> Artur
>   
Check this :
>>> funs = [(lambda: x) for x in range(5)]
>>> funs[0]()
4
>>> x
4
>>> x = 3
>>> funs[0]()
3
>>> del x
>>> funs[0]()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <lambda>
NameError: global name 'x' is not defined

So you see, your functions just return the value of x. That's because
the lambda has no parameters, so x refers to the global name x.
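One way to freeze the value per function is a default argument (quick
check in the interpreter):

>>> funs = [lambda x=x: x for x in range(5)]
>>> [f() for f in funs]
[0, 1, 2, 3, 4]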

___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Newbie & Unittest ...

2010-05-06 Thread Vincent Davis
On Thu, May 6, 2010 at 1:15 PM, Steve Willoughby  wrote:

> The unit test methods all take message arguments so if you just
> want to customize the reported error, that's easily done.
>
> something like:
>  self.assertEqual(self.file.tags[k], v, "Failure with key "+k)
>
> That's easiest.  If you really want a separate test for each, you
> may want to create a factory function which will generate the individual
> test methods when the testcase object is created.
>
> --steve


Looks like Steve answered the question you had for me:
 "self.assertEqual(self.file.tags[k], v, "Failure with key "+k)". I think
this is the best (how I would do it) solution: one test for all files, with
a meaningful report as to which file is the problem.

  *Vincent Davis
720-301-3003 *
vinc...@vincentdavis.net
 my blog  |
LinkedIn

___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Newbie & Unittest ...

2010-05-06 Thread Steve Willoughby
The unit test methods all take message arguments so if you just
want to customize the reported error, that's easily done.

something like:
  self.assertEqual(self.file.tags[k], v, "Failure with key "+k)

That's easiest.  If you really want a separate test for each, you
may want to create a factory function which will generate the individual
test methods when the testcase object is created.

--steve
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Newbie & Unittest ...

2010-05-06 Thread Damon Timm
Sorry for the multiple posts ... I'll be quiet for a while until I
find a real answer!

What I wrote below doesn't actually work -- it appears to work because
all the functions have different names, but they all look up the loop
variables at call time, so every generated test checks the same (last)
file/key/value ... I should have looked more closely at my initial
output... I need a way to bind each function to its own data ...
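
Something like a little factory should do it (untested sketch, reusing the
names from my code below):

def make_test(file, key, value):
    # each call creates a fresh scope, so each test keeps its own data
    def test_func(self):
        self.assertEqual(file.tags[key], value)
    return test_func

for test_name, file, key, value in list_of_tests:
    setattr(TestFileTags, test_name, make_test(file, key, value))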

On Thu, May 6, 2010 at 2:04 PM, Damon Timm  wrote:
> class TestFileTags(unittest.TestCase):
>    pass
>
> for test_name, file, key, value in list_of_tests:
>    def test_func(self):
>        self.assertEqual(file.tags[key], value)
>
>    setattr(TestFileTags, test_name, test_func)
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Newbie & Unittest ...

2010-05-06 Thread Damon Timm
Ooh!  Wait!  I found another method that is similar in style and
appears to work ...

class TestFileTags(unittest.TestCase):
    pass

for test_name, file, key, value in list_of_tests:
    def test_func(self):
        self.assertEqual(file.tags[key], value)

    setattr(TestFileTags, test_name, test_func)

I'm not sure if it is the *best* or *right* way to do it, but it does the trick!

Damon

On Thu, May 6, 2010 at 1:53 PM, Damon Timm  wrote:
> Hi Lie -
>
> Thanks for that idea -- I tried it but am getting an error.  I read a
> little about the __dict__ feature but couldn't figure it.  I am going
> to keep searching around for how to dynamically add methods to a class
> ... here is the error and then the code.
>
> Thanks.
>
> # ERROR:
>
> $ python tests_tagging.py
> Traceback (most recent call last):
>   File "tests_tagging.py", line 25, in <module>
>     class TestFileTags(unittest.TestCase):
>   File "tests_tagging.py", line 31, in TestFileTags
>     __dict__[test] = new_test
> NameError: name '__dict__' is not defined
>
> # CODE:
>
> import unittest
> from mlc.filetypes import *
>
> TAG_VALUES = (
>    ('title', 'Christmas Waltz'),
>    ('artist', 'Damon Timm'),
>    ('album', 'Homemade'),
> )
>
> FILES = (
>    FLACFile('data/lossless/01 - Christmas Waltz.flac'),
>    MP3File('data/lossy/04 - Christmas Waltz (MP3-79).mp3'),
>    OGGFile('data/lossy/01 - Christmas Waltz (OGG-77).ogg'),
>    MP4File('data/lossy/06 - Christmas Waltz (M4A-64).m4a'),
> )
>
> list_of_tests = []
> for file in FILES:
>    for k, v in TAG_VALUES:
>        test_name = 'test_' + file.exts[0] + '_' + k
>        list_of_tests.append((test_name, file, k, v))
>
> class TestFileTags(unittest.TestCase):
>
>    for test in list_of_tests:
>        def new_test(self):
>            self.assertEqual(test[1].tags[test[2]],test[3])
>
>        __dict__[test] = new_test
>
> if __name__ == '__main__':
>    unittest.main()
>
>
> On Thu, May 6, 2010 at 12:26 PM, Lie Ryan  wrote:
>> On 05/06/10 10:37, Damon Timm wrote:
>>> Hi - am trying to write some unit tests for my little python project -
>>> I had been hard coding them when necessary here or there but I figured
>>> it was time to try and learn how to do it properly.
>>> 
>>> This test works, however, it only runs as *one* test (which either
>>> fails or passes) and I want it to run as 12 different tests (three for
>>> each file type) and be able to see which key is failing for which file
>>> type.  I know I could write them all out individually but that seems
>>> unnecessary.
>>
>> One way to do what you wanted is to harness python's dynamicity and
>> generate the methods by their names:
>>
>> class TestFiles(unittest.TestCase):
>>    for methname, case in somedict:
>>        def test(self):
>>             ...
>>        __dict__[methname] = test
>>
>> ___
>> Tutor maillist  -  tu...@python.org
>> To unsubscribe or change subscription options:
>> http://mail.python.org/mailman/listinfo/tutor
>>
>
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Newbie & Unittest ...

2010-05-06 Thread Damon Timm
Hi Lie -

Thanks for that idea -- I tried it but am getting an error.  I read a
little about the __dict__ feature but couldn't figure it.  I am going
to keep searching around for how to dynamically add methods to a class
... here is the error and then the code.

Thanks.

# ERROR:

$ python tests_tagging.py
Traceback (most recent call last):
  File "tests_tagging.py", line 25, in <module>
    class TestFileTags(unittest.TestCase):
  File "tests_tagging.py", line 31, in TestFileTags
    __dict__[test] = new_test
NameError: name '__dict__' is not defined

# CODE:

import unittest
from mlc.filetypes import *

TAG_VALUES = (
    ('title', 'Christmas Waltz'),
    ('artist', 'Damon Timm'),
    ('album', 'Homemade'),
)

FILES = (
    FLACFile('data/lossless/01 - Christmas Waltz.flac'),
    MP3File('data/lossy/04 - Christmas Waltz (MP3-79).mp3'),
    OGGFile('data/lossy/01 - Christmas Waltz (OGG-77).ogg'),
    MP4File('data/lossy/06 - Christmas Waltz (M4A-64).m4a'),
)

list_of_tests = []
for file in FILES:
    for k, v in TAG_VALUES:
        test_name = 'test_' + file.exts[0] + '_' + k
        list_of_tests.append((test_name, file, k, v))

class TestFileTags(unittest.TestCase):

    for test in list_of_tests:
        def new_test(self):
            self.assertEqual(test[1].tags[test[2]], test[3])

        __dict__[test] = new_test

if __name__ == '__main__':
    unittest.main()


On Thu, May 6, 2010 at 12:26 PM, Lie Ryan  wrote:
> On 05/06/10 10:37, Damon Timm wrote:
>> Hi - am trying to write some unit tests for my little python project -
>> I had been hard coding them when necessary here or there but I figured
>> it was time to try and learn how to do it properly.
>> 
>> This test works, however, it only runs as *one* test (which either
>> fails or passes) and I want it to run as 12 different tests (three for
>> each file type) and be able to see which key is failing for which file
>> type.  I know I could write them all out individually but that seems
>> unnecessary.
>
> One way to do what you wanted is to harness python's dynamicity and
> generate the methods by their names:
>
> class TestFiles(unittest.TestCase):
>    for methname, case in somedict:
>        def test(self):
>             ...
>        __dict__[methname] = test
>
> ___
> Tutor maillist  -  tu...@python.org
> To unsubscribe or change subscription options:
> http://mail.python.org/mailman/listinfo/tutor
>
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Is the difference in outputs with different size input lists due to limits on memory with PYTHON?

2010-05-06 Thread Dave Angel

Art Kendall wrote:



On 5/6/2010 11:14 AM, Dave Angel wrote:

Art Kendall wrote:
I am running Windows 7 64bit Home Premium, with quad CPUs and 8G 
memory.  I am using Python 2.6.2.


I have all the Federalist Papers concatenated into one .txt file.
Which is how big?  Currently you (unnecessarily) load the entire 
thing into memory with readlines().  And then you do confusing work 
to split it apart again, into one list element per paper.   And for a 
while there, you have three copies of the entire text.  You're 
keeping two copies, in the form of alltext and papers.
You print out the len(papers).  What do you see there?  Is it 
correctly 87 ?  If it's not, you have to fix the problem here, before 
even going on.


  I want to prepare a file with a row for each paper and a column 
for each term. The cells would contain the count of a term in that 
paper.  In the original application in the 1950's 30 single word 
terms were used. I can now use NoteTab to get a list of all the 8708 
separate words in allWords.txt. I can then use that data in 
statistical exploration of the set of texts.


I have the python program(?) syntax(?) script(?) below that I am 
using to learn PYTHON. The comments starting with "later" are things 
I will try to do to make this more useful. I am getting one step at 
a time to work.


It works when the number of terms in the term list is small, e.g., 
10.  I get a file with the correct number of rows (87) and count 
columns (10) in termcounts.txt. The termcounts.txt file is not 
correct when I have a larger number of terms, e.g., 100. I get a 
file with only 40 rows and the correct number of columns.  With 8700 
terms I get only 40 rows. I need to be able to have about 8700 terms. 
(If this were FORTRAN I would say that the subscript indices were 
getting scrambled.)  (As I develop this I would like to be 
open-ended with the number of input papers and open-ended with the 
number of words/terms.)




# word counts: Federalist papers

import re, textwrap
# read the combined file and split into individual papers
# later create a new version that deals with all files in a folder rather than having papers concatenated

alltext = file("C:/Users/Art/Desktop/fed/feder16v3.txt").readlines()
papers = re.split(r'FEDERALIST No\.', " ".join(alltext))
print len(papers)

countsfile = file("C:/Users/Art/desktop/fed/TermCounts.txt", "w")
syntaxfile = file("C:/Users/Art/desktop/fed/TermCounts.sps", "w")
# later create a python program that extracts all words instead of using NoteTab

termfile = open("C:/Users/Art/Desktop/fed/allWords.txt")
termlist = termfile.readlines()
termlist = [item.rstrip("\n") for item in termlist]
print len(termlist)
# check for SPSS reserved words
varnames = textwrap.wrap(" ".join([v.lower() in ['and', 'or', 'not', 'eq', 'ge', 'gt', 'le', 'lt', 'ne', 'all', 'by', 'to', 'with'] and (v + "_r") or v for v in termlist]))
syntaxfile.write("data list file= 'c:/users/Art/desktop/fed/termcounts.txt' free/docnumber\n")
syntaxfile.writelines([v + "\n" for v in varnames])
syntaxfile.write(".\n")
# before using the syntax, manually replace spaces internal to a string with underscore // replace(ltrim(rtrim(varname)), " ", "_"); replace any special characters with @ in variable names

for p in range(len(papers)):

range(len()) is un-pythonic.  Simply do
for paper in papers:

and of course use paper below instead of papers[p]

    counts = []
    for t in termlist:
        counts.append(len(re.findall(r"\b" + t + r"\b", papers[p], re.IGNORECASE)))

    if sum(counts) > 0:
        papernum = re.search("[0-9]+", papers[p]).group(0)
        countsfile.write(str(papernum) + " " + " ".join([str(s) for s in counts]) + "\n")



Art

If you're memory limited, you really should sequence through the 
files, only loading one at a time, rather than all at once.  It's 
no harder.  Use os.listdir() to make a list of files, then your loop 
becomes something like:

for infile in filelist:
    paper = " ".join(open(infile, "r").readlines())

Naturally, to do it right, you should use with ... or at least 
close each file when done.


DaveA




Thank you for getting back to me. I am trying to generalize a process 
that 50 years ago used 30 terms on the whole file, and I am using the 
task of generalizing the process to learn Python.  In the post I sent 
there were comments to myself about things that I would want to 
learn about.  One of the first is to learn about processing all files 
in a folder, so your reply will be very helpful.  It seems that 
os.listdir() should let me include the filespec in the output file 
which would be very helpful.


to rephrase my questions.
Is there a way to tell python to use more RAM?

Does python use the same array space over as it counts the occurrences 
for each input document? Or does it keep every row of the output 
someplace even after it has written it to the output? If it does keep 
old arrays, is there a way to "close" the output array in RAM between 
do

Re: [Tutor] Is the difference in outputs with different size input lists due to limits on memory with PYTHON?

2010-05-06 Thread Art Kendall



On 5/6/2010 11:14 AM, Dave Angel wrote:

Art Kendall wrote:
I am running Windows 7 64bit Home Premium, with quad CPUs and 8G 
memory.  I am using Python 2.6.2.


I have all the Federalist Papers concatenated into one .txt file.
Which is how big?  Currently you (unnecessarily) load the entire thing 
into memory with readlines().  And then you do confusing work to split 
it apart again, into one list element per paper.   And for a while 
there, you have three copies of the entire text.  You're keeping two 
copies, in the form of alltext and papers.
You print out the len(papers).  What do you see there?  Is it 
correctly 87 ?  If it's not, you have to fix the problem here, before 
even going on.


  I want to prepare a file with a row for each paper and a column for 
each term. The cells would contain the count of a term in that 
paper.  In the original application in the 1950's 30 single word 
terms were used. I can now use NoteTab to get a list of all the 8708 
separate words in allWords.txt. I can then use that data in 
statistical exploration of the set of texts.


I have the python program(?) syntax(?) script(?) below that I am 
using to learn PYTHON. The comments starting with "later" are things 
I will try to do to make this more useful. I am getting one step at 
a time to work.


It works when the number of terms in the term list is small, e.g., 
10.  I get a file with the correct number of rows (87) and count 
columns (10) in termcounts.txt. The termcounts.txt file is not 
correct when I have a larger number of terms, e.g., 100. I get a file 
with only 40 rows and the correct number of columns.  With 8700 terms 
I get only 40 rows. I need to be able to have about 8700 terms. (If 
this were FORTRAN I would say that the subscript indices were getting 
scrambled.)  (As I develop this I would like to be open-ended with 
the number of input papers and open-ended with the number of 
words/terms.)




# word counts: Federalist papers

import re, textwrap
# read the combined file and split into individual papers
# later create a new version that deals with all files in a folder rather than having papers concatenated

alltext = file("C:/Users/Art/Desktop/fed/feder16v3.txt").readlines()
papers = re.split(r'FEDERALIST No\.', " ".join(alltext))
print len(papers)

countsfile = file("C:/Users/Art/desktop/fed/TermCounts.txt", "w")
syntaxfile = file("C:/Users/Art/desktop/fed/TermCounts.sps", "w")
# later create a python program that extracts all words instead of using NoteTab

termfile = open("C:/Users/Art/Desktop/fed/allWords.txt")
termlist = termfile.readlines()
termlist = [item.rstrip("\n") for item in termlist]
print len(termlist)
# check for SPSS reserved words
varnames = textwrap.wrap(" ".join([v.lower() in ['and', 'or', 'not', 'eq', 'ge', 'gt', 'le', 'lt', 'ne', 'all', 'by', 'to', 'with'] and (v + "_r") or v for v in termlist]))
syntaxfile.write("data list file= 'c:/users/Art/desktop/fed/termcounts.txt' free/docnumber\n")
syntaxfile.writelines([v + "\n" for v in varnames])
syntaxfile.write(".\n")
# before using the syntax, manually replace spaces internal to a string with underscore // replace(ltrim(rtrim(varname)), " ", "_"); replace any special characters with @ in variable names

for p in range(len(papers)):

range(len()) is un-pythonic.  Simply do
for paper in papers:

and of course use paper below instead of papers[p]

    counts = []
    for t in termlist:
        counts.append(len(re.findall(r"\b" + t + r"\b", papers[p], re.IGNORECASE)))

    if sum(counts) > 0:
        papernum = re.search("[0-9]+", papers[p]).group(0)
        countsfile.write(str(papernum) + " " + " ".join([str(s) for s in counts]) + "\n")



Art

If you're memory limited, you really should sequence through the 
files, only loading one at a time, rather than all at once.  It's no 
harder.  Use os.listdir() to make a list of files, then your loop 
becomes something like:

for infile in filelist:
    paper = " ".join(open(infile, "r").readlines())

Naturally, to do it right, you should use with ... or at least 
close each file when done.


DaveA




Thank you for getting back to me. I am trying to generalize a process 
that 50 years ago used 30 terms on the whole file, and I am using the 
task of generalizing the process to learn Python.  In the post I sent 
there were comments to myself about things that I would want to learn 
about.  One of the first is to learn about processing all files in a 
folder, so your reply will be very helpful.  It seems that 
os.listdir() should let me include the filespec in the output file 
which would be very helpful.


to rephrase my questions.
Is there a way to tell python to use more RAM?

Does python use the same array space over as it counts the occurrences 
for each input document? Or does it keep every row of the output 
someplace even after it has written it to the output? If it does keep 
old arrays, is there a way to "close" the output array in RAM between 
documents


I narrowed

Re: [Tutor] Newbie & Unittest ...

2010-05-06 Thread Lie Ryan
On 05/06/10 10:37, Damon Timm wrote:
> Hi - am trying to write some unit tests for my little python project -
> I had been hard coding them when necessary here or there but I figured
> it was time to try and learn how to do it properly.
> 
> This test works, however, it only runs as *one* test (which either
> fails or passes) and I want it to run as 12 different tests (three for
> each file type) and be able to see which key is failing for which file
> type.  I know I could write them all out individually but that seems
> unnecessary.

One way to do what you wanted is to harness python's dynamicity and
generate the methods by their names:

class TestFiles(unittest.TestCase):
    for methname, case in somedict:
        def test(self):
            ...
        __dict__[methname] = test

___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Newbie & Unittest ...

2010-05-06 Thread Damon Timm
Hi Vincent - Thanks for your input.

Where would I put that string?  In the function's docstring?  Or
just as a print method?

I have been looking online some more and it appears there may be a way
to create some sort of generator ... it's still a little confusing to
me, though.  I was hoping there was an easier way.  I can't imagine I
am the first person with this task to accomplish ...

Thanks,
Damon



On Thu, May 6, 2010 at 9:46 AM, Vincent Davis  wrote:
> By the way, you shouldn't need to use str(file) as I did, unless it is
> not a string already. Bad habit. I am used to numbers.
> vincet
>
> On Thursday, May 6, 2010, Vincent Davis  wrote:
>> I can't think of a way to do what you ask, without defining a test for each. 
>> But I think what you might actually want is to define the error message to 
>> report which one failed. ie, it's one test with a meaningful error message:
>> 'Failed to load ' + str(file) + ' ' + str(k) + ', ' + str(v)
>> I am not an expert on unittests.
>>
>>
>>
>>
>>
>>   Vincent Davis
>>     720-301-3003
>>
>>     vinc...@vincentdavis.net
>>
>>   my blog  |
>>   LinkedIn 
>> On Wed, May 5, 2010 at 6:37 PM, Damon Timm  wrote:
>> Hi - am trying to write some unit tests for my little python project -
>> I had been hard coding them when necessary here or there but I figured
>> it was time to try and learn how to do it properly.
>>
>> I've read over Python's guide
>> (http://docs.python.org/library/unittest.html) but I am having a hard
>> time understanding how I can apply it *properly* to my first test case
>> ...
>>
>> What I am trying to do is straightforward, I am just not sure how to
>> populate the tests easily.  Here is what I want to accomplish:
>>
>> # code
>> import unittest
>> from mlc.filetypes import * # the module I am testing
>>
>> # here are the *correct* key, value pairs I am testing against
>> TAG_VALUES = (
>>     ('title', 'Christmas Waltz'),
>>     ('artist', 'Damon Timm'),
>>     ('album', 'Homemade'),
>> )
>>
>> # list of different file types that I want to test my tag grabbing 
>> capabilities
>> # the tags inside these files are set to match my TAG_VALUES
>> # I want to make sure my code is extracting them correctly
>> FILES = (
>>     FLACFile('data/lossless/01 - Christmas Waltz.flac'),
>>     MP3File('data/lossy/04 - Christmas Waltz (MP3-79).mp3'),
>>     OGGFile('data/lossy/01 - Christmas Waltz (OGG-77).ogg'),
>>     MP4File('data/lossy/06 - Christmas Waltz (M4A-64).m4a'),
>> )
>>
>> class TestFiles(unittest.TestCase):
>>
>>     # this is the basic test
>>     def test_values(self):
>>         '''see if values from my object match what they should match'''
>>         for file in FILES:
>>             for k, v in TAG_VALUES:
>>                 self.assertEqual(self.file.tags[k], v)
>>
>> This test works, however, it only runs as *one* test (which either
>> fails or passes) and I want it to run as 12 different tests (three for
>> each file type) and be able to see which key is failing for which file
>> type.  I know I could write them all out individually but that seems
>> unnecessary.
>>
>> I suspect my answer lies in the Suites but I can't wrap my head around it.
>>
>> Thanks!
>>
>> Damon
>> ___
>> Tutor maillist  -  tu...@python.org
>> To unsubscribe or change subscription options:
>> http://mail.python.org/mailman/listinfo/tutor
>>
>>
>>
> ___
> Tutor maillist  -  tu...@python.org
> To unsubscribe or change subscription options:
> http://mail.python.org/mailman/listinfo/tutor
>
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Is the difference in outputs with different size input lists due to limits on memory with PYTHON?

2010-05-06 Thread Dave Angel

Art Kendall wrote:
I am running Windows 7 64bit Home Premium, with quad CPUs and 8G 
memory.  I am using Python 2.6.2.


I have all the Federalist Papers concatenated into one .txt file.
Which is how big?  Currently you (unnecessarily) load the entire thing 
into memory with readlines().  And then you do confusing work to split 
it apart again, into one list element per paper.   And for a while 
there, you have three copies of the entire text.  You're keeping two 
copies, in the form of alltext and papers. 

You print out the len(papers).  What do you see there?  Is it correctly 
87 ?  If it's not, you have to fix the problem here, before even going on.


  I want to prepare a file with a row for each paper and a column for 
each term. The cells would contain the count of a term in that paper.  
In the original application in the 1950's 30 single word terms were 
used. I can now use NoteTab to get a list of all the 8708 separate 
words in allWords.txt. I can then use that data in statistical 
exploration of the set of texts.


I have the python program(?) syntax(?) script(?) below that I am using 
to learn PYTHON. The comments starting with "later" are things I will 
try to do to make this more useful. I am getting one step at a time 
to work.


It works when the number of terms in the term list is small, e.g., 10.  
I get a file with the correct number of rows (87) and count columns 
(10) in termcounts.txt. The termcounts.txt file is not correct when I 
have a larger number of terms, e.g., 100. I get a file with only 40 
rows and the correct number of columns.  With 8700 terms I get only 40 
rows. I need to be able to have about 8700 terms. (If this were FORTRAN 
I would say that the subscript indices were getting scrambled.)  (As I 
develop this I would like to be open-ended with the number of input 
papers and open-ended with the number of words/terms.)




# word counts: Federalist papers

import re, textwrap
# read the combined file and split into individual papers
# later create a new version that deals with all files in a folder rather than having papers concatenated

alltext = file("C:/Users/Art/Desktop/fed/feder16v3.txt").readlines()
papers = re.split(r'FEDERALIST No\.', " ".join(alltext))
print len(papers)

countsfile = file("C:/Users/Art/desktop/fed/TermCounts.txt", "w")
syntaxfile = file("C:/Users/Art/desktop/fed/TermCounts.sps", "w")
# later create a python program that extracts all words instead of using NoteTab

termfile = open("C:/Users/Art/Desktop/fed/allWords.txt")
termlist = termfile.readlines()
termlist = [item.rstrip("\n") for item in termlist]
print len(termlist)
# check for SPSS reserved words
varnames = textwrap.wrap(" ".join([v.lower() in ['and', 'or', 'not', 'eq', 'ge', 'gt', 'le', 'lt', 'ne', 'all', 'by', 'to', 'with'] and (v + "_r") or v for v in termlist]))
syntaxfile.write("data list file= 'c:/users/Art/desktop/fed/termcounts.txt' free/docnumber\n")
syntaxfile.writelines([v + "\n" for v in varnames])
syntaxfile.write(".\n")
# before using the syntax, manually replace spaces internal to a string with underscore // replace(ltrim(rtrim(varname)), " ", "_"); replace any special characters with @ in variable names

for p in range(len(papers)):

range(len()) is un-pythonic.  Simply do
for paper in papers:

and of course use paper below instead of papers[p]

    counts = []
    for t in termlist:
        counts.append(len(re.findall(r"\b" + t + r"\b", papers[p], re.IGNORECASE)))

    if sum(counts) > 0:
        papernum = re.search("[0-9]+", papers[p]).group(0)
        countsfile.write(str(papernum) + " " + " ".join([str(s) for s in counts]) + "\n")



Art

If you're memory limited, you really should sequence through the files, 
only loading one at a time, rather than all at once.  It's no harder.  
Use os.listdir() to make a list of files, then your loop becomes 
something like:

for infile in filelist:
    paper = " ".join(open(infile, "r").readlines())

Naturally, to do it right, you should use with ... or at least close 
each file when done.
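
For instance, a slightly fuller sketch of that loop (untested; the folder 
name is invented):

import glob

for infile in glob.glob("C:/Users/Art/Desktop/fed/papers/*.txt"):
    with open(infile) as f:      # 'with' closes the file for you
        paper = f.read()
    # ... count the terms in `paper` here, one paper in memory at a time ...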


DaveA

___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] portability of pickle and shelve across platforms and different python versions?

2010-05-06 Thread python
Garry,

I asked a similar question on Stackoverflow.com and got some
great responses including at least one from a member of the
Python development team.

Best way to save complex Python data structures across program
sessions (pickle, json, xml, database, other)
http://stackoverflow.com/questions/2003693/best-way-to-save-complex-python-data-structures-across-program-sessions-pickle

To cut-to-the-chase: I believe pickle files are portable across
platforms and versions. I do not know how portable shelve files
are.

Malcolm
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] portability of pickle and shelve across platforms and different python versions?

2010-05-06 Thread Bjorn Egil Ludvigsen
I do not know all the details, but I think it is NOT portable.

The reason I am saying this is from a comment in Mark Summerfield's excellent
book "Rapid GUI programming with Python and Qt" where he has an example that
uses several different load/save methods (chapter 8). He recommends using
Qt's QDataStream method to write/read binary fields over pickle/cpickle
because the Qt method is fully platform independent and it can handle Qt
classes more effortlessly.

He writes that files written with QDataStream are platform-independent; the
class automatically takes care of endianness and word size. Since he does not
say this about pickle, I infer that files written by pickle do not take care
of endianness or word size automatically.

Sorry for the guesswork as I might be wrong, but I hope this will guide you
in the right direction or that others will come up with more correct
answers.

Regards,
Bjorn

On Wed, May 5, 2010 at 6:12 PM, Garry Willgoose <
garry.willgo...@newcastle.edu.au> wrote:

> I have seen conflicting info on this on the web and in the tutor archive
> and the docs don't seem to address it explicitly. How portable are files
> containing pickle'd and shelve'd data? I'm thinking issues like simply O/S
> portability, through big-end/little-end hardware (for floats/integers which
> I use a lot), and then for unicode/non-unicode string,  64/32 bit and
> V2.6/V3.1 implementations of python. Does the version of the encoder in
> pickle make any difference for this? One post I've seen suggests that as
> long as the file is opened binary (ie. 'b') all should be well for platform
> independence.
>
> My core question if I give a pickled file to somebody else can i guarantee
> they can read/load it OK. The other person will be using exactly the same
> python code to open it as used to create it.
>
> 
> Prof Garry Willgoose,
> Australian Professorial Fellow in Environmental Engineering,
> Director, Centre for Climate Impact Management (C2IM),
> School of Engineering, The University of Newcastle,
> Callaghan, 2308
> Australia.
>
> Centre webpage: www.c2im.org.au
>
> Phone: (International) +61 2 4921 6050 (Tues-Fri AM); +61 2 6545 9574 (Fri
> PM-Mon)
> FAX: (International) +61 2 4921 6991 (Uni); +61 2 6545 9574 (personal and
> Telluric)
> Env. Engg. Secretary: (International) +61 2 4921 6042
>
> email:  garry.willgo...@newcastle.edu.au; g.willgo...@telluricresearch.com
> email-for-life: garry.willgo...@alum.mit.edu
> personal webpage: www.telluricresearch.com/garry
> 
> "Do not go where the path may lead, go instead where there is no path and
> leave a trail"
>  Ralph Waldo Emerson
> 
>
>
>
>
>
> ___
> Tutor maillist  -  Tutor@python.org
> To unsubscribe or change subscription options:
> http://mail.python.org/mailman/listinfo/tutor
>
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Newbie & Unittest ...

2010-05-06 Thread Vincent Davis
By the way, you shouldn't need to use str(file) as I did, unless it is
not a string already. Bad habit. I am used to numbers.
vincent

On Thursday, May 6, 2010, Vincent Davis  wrote:
> I can't think of a way to do what you ask, without defining a test for each. 
> But I think what you might actually want is to define the error message to 
> report which one failed. ie, it's one test with a meaningful error message:
> 'Failed to load ' + str(file) + ' ' + str(k) + ', ' + str(v)
> I am not an expert on unittests.
>
>
>
>
>
>   Vincent Davis
> 720-301-3003
>
> vinc...@vincentdavis.net
>
>   my blog  |
>   LinkedIn 
> On Wed, May 5, 2010 at 6:37 PM, Damon Timm  wrote:
> Hi - am trying to write some unit tests for my little python project -
> I had been hard coding them when necessary here or there but I figured
> it was time to try and learn how to do it properly.
>
> I've read over Python's guide
> (http://docs.python.org/library/unittest.html) but I am having a hard
> time understanding how I can apply it *properly* to my first test case
> ...
>
> What I am trying to do is straightforward, I am just not sure how to
> populate the tests easily.  Here is what I want to accomplish:
>
> # code
> import unittest
> from mlc.filetypes import * # the module I am testing
>
> # here are the *correct* key, value pairs I am testing against
> TAG_VALUES = (
>     ('title', 'Christmas Waltz'),
>     ('artist', 'Damon Timm'),
>     ('album', 'Homemade'),
> )
>
> # list of different file types that I want to test my tag grabbing 
> capabilities
> # the tags inside these files are set to match my TAG_VALUES
> # I want to make sure my code is extracting them correctly
> FILES = (
>     FLACFile('data/lossless/01 - Christmas Waltz.flac'),
>     MP3File('data/lossy/04 - Christmas Waltz (MP3-79).mp3'),
>     OGGFile('data/lossy/01 - Christmas Waltz (OGG-77).ogg'),
>     MP4File('data/lossy/06 - Christmas Waltz (M4A-64).m4a'),
> )
>
> class TestFiles(unittest.TestCase):
>
>     # this is the basic test
>     def test_values(self):
>         '''see if values from my object match what they should match'''
>         for file in FILES:
>             for k, v in TAG_VALUES:
>             self.assertEqual(file.tags[k], v)
>
> This test works, however, it only runs as *one* test (which either
> fails or passes) and I want it to run as 12 different tests (three for
> each file type) and be able to see which key is failing for which file
> type.  I know I could write them all out individually but that seems
> unnecessary.
>
> I suspect my answer lies in the Suites but I can't wrap my head around it.
>
> Thanks!
>
> Damon
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Newbie & Unittest ...

2010-05-06 Thread Vincent Davis
I can't think of a way to do what you ask without defining a test for each.
But I think what you might actually want is to define the error message to
report which one failed, i.e. it's one test with a meaningful error message:

'Failed to load ' + str(file) + ' ' + str(k) + ', ' + str(v)

I am not an expert on unittests.
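
Something like this sketch (assertEqual's optional msg argument is standard
unittest; the loop names are borrowed from your code):

class TestFiles(unittest.TestCase):

    def test_values(self):
        for f in FILES:
            for k, v in TAG_VALUES:
                self.assertEqual(f.tags[k], v,
                    'Failed to load %s: %s should be %r, got %r'
                    % (f, k, v, f.tags[k]))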

Vincent Davis
720-301-3003
vinc...@vincentdavis.net

On Wed, May 5, 2010 at 6:37 PM, Damon Timm  wrote:

> Hi - am trying to write some unit tests for my little python project -
> I had been hard coding them when necessary here or there but I figured
> it was time to try and learn how to do it properly.
>
> I've read over Python's guide
> (http://docs.python.org/library/unittest.html) but I am having a hard
> time understanding how I can apply it *properly* to my first test case
> ...
>
> What I am trying to do is straightforward, I am just not sure how to
> populate the tests easily.  Here is what I want to accomplish:
>
> # code
> import unittest
> from mlc.filetypes import * # the module I am testing
>
> # here are the *correct* key, value pairs I am testing against
> TAG_VALUES = (
>('title', 'Christmas Waltz'),
>('artist', 'Damon Timm'),
>('album', 'Homemade'),
> )
>
> # list of different file types that I want to test my tag grabbing
> capabilities
> # the tags inside these files are set to match my TAG_VALUES
> # I want to make sure my code is extracting them correctly
> FILES = (
>FLACFile('data/lossless/01 - Christmas Waltz.flac'),
>MP3File('data/lossy/04 - Christmas Waltz (MP3-79).mp3'),
>OGGFile('data/lossy/01 - Christmas Waltz (OGG-77).ogg'),
>MP4File('data/lossy/06 - Christmas Waltz (M4A-64).m4a'),
> )
>
> class TestFiles(unittest.TestCase):
>
># this is the basic test
>def test_values(self):
>'''see if values from my object match what they should match'''
>for file in FILES:
>for k, v in TAG_VALUES:
>                self.assertEqual(file.tags[k], v)
>
> This test works, however, it only runs as *one* test (which either
> fails or passes) and I want it to run as 12 different tests (three for
> each file type) and be able to see which key is failing for which file
> type.  I know I could write them all out individually but that seems
> unnecessary.
>
> I suspect my answer lies in the Suites but I can't wrap my head around it.
>
> Thanks!
>
> Damon
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


[Tutor] Is the difference in outputs with different size input lists due to limits on memory with PYTHON?

2010-05-06 Thread Art Kendall
I am running Windows 7 64-bit Home Premium, with quad CPUs and 8 GB of
memory. I am using Python 2.6.2.


I have all the Federalist Papers concatenated into one .txt file. I want to
prepare a file with a row for each paper and a column for each term; the
cells would contain the count of that term in that paper. In the original
application, in the 1950s, 30 single-word terms were used. I can now use
NoteTab to get a list of all 8708 separate words in allWords.txt, and can
then use that data in statistical exploration of the set of texts.
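
(In the output file each row would be the paper number followed by one count
per term, so a row might look like "10 0 3 1 ...".)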


I have the Python program(?) syntax(?) script(?) below that I am using to
learn Python. The comments starting with "later" are things I will try to do
to make this more useful; I am getting one step at a time to work.


It works when the number of terms in the term list is small, e.g. 10: I get
a file with the correct number of rows (87) and count columns (10) in
termcounts.txt. The termcounts.txt file is not correct when I have a larger
number of terms, e.g. 100: I get a file with only 40 rows and the correct
number of columns. With 8700 terms I also get only 40 rows, and I need to be
able to handle about 8700 terms. (If this were FORTRAN I would say that the
subscript indices were getting scrambled.) (As I develop this I would like
to be open-ended with the number of input papers and the number of
words/terms.)




# word counts: Federalist papers

import re, textwrap
# read the combined file and split into individual papers
# later create a new version that deals with all files in a folder
# rather than having papers concatenated

alltext = file("C:/Users/Art/Desktop/fed/feder16v3.txt").readlines()
papers = re.split(r'FEDERALIST No\.', " ".join(alltext))
print len(papers)

countsfile = file("C:/Users/Art/desktop/fed/TermCounts.txt", "w")
syntaxfile = file("C:/Users/Art/desktop/fed/TermCounts.sps", "w")
# later create a python program that extracts all words instead of using
# NoteTab

termfile   = open("C:/Users/Art/Desktop/fed/allWords.txt")
termlist = termfile.readlines()
termlist = [item.rstrip("\n") for item in termlist]
print len(termlist)
# check for SPSS reserved words
varnames = textwrap.wrap(" ".join(
    [v.lower() in ['and', 'or', 'not', 'eq', 'ge', 'gt', 'le', 'lt',
                   'ne', 'all', 'by', 'to', 'with'] and (v + "_r") or v
     for v in termlist]))
syntaxfile.write("data list file= 'c:/users/Art/desktop/fed/termcounts.txt' free/docnumber\n")

syntaxfile.writelines([v + "\n" for v in varnames])
syntaxfile.write(".\n")
# before using the syntax, manually replace spaces internal to a string
# with underscore // replace(ltrim(rtrim(varname)), " ", "_"); replace any
# special characters with @ in variable names



for p in range(len(papers)):
    counts = []
    for t in termlist:
        counts.append(len(re.findall(r"\b" + t + r"\b", papers[p],
                                     re.IGNORECASE)))

    if sum(counts) > 0:
        papernum = re.search("[0-9]+", papers[p]).group(0)
        countsfile.write(str(papernum) + " " +
                         " ".join([str(s) for s in counts]) + "\n")



Art
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor