Re: PyWhich
On 08/04/2011 10:03 PM, Chris Angelico wrote:
On Fri, Aug 5, 2011 at 1:34 AM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote:
Especially for a tool aimed at programmers (who else would be interested in PyWhich?)
The use that first springs to my mind is debugging import paths etc. If you have multiple pythons installed and aren't sure that they're finding the right modules, you could fire up PyWhich on an innocuous module like math or sys, and see if it's loading it from the right path. People doing this might not necessarily be programmers, they might be sysadmins; but you're right that it's most likely this will be used by competent Python programmers.
ChrisA
I am trying to do debugging. I have had some trouble with multiple python installs with virtualenv, and I was trying to see where given modules came from. I knew about the code execution, but I couldn't think of a clean way to just find out the location rather than load it. The reason I used stdout was because I was going to be using it in a tool chain where the stdout might need to be formatted for another program to read in. That's also why I was catching ImportError, since a later version of this script might need to do something special with it. This is also useful to see if python is really using the module you think it is.
-- Bill -- http://mail.python.org/mailman/listinfo/python-list
PyWhich
Hey c.l.p., I wrote a little python script that finds the file that a python module came from. Does anyone see anything wrong with this script?
    #!/usr/bin/python
    import sys
    if __name__ == '__main__':
        if len(sys.argv) > 1:
            try:
                m = __import__(sys.argv[1])
                sys.stdout.write(m.__file__ + '\n')
                sys.stdout.flush()
                sys.exit(0)
            except ImportError:
                sys.stderr.write("No such module '%s'\n" % sys.argv[1])
                sys.stderr.flush()
                sys.exit(1)
        else:
            sys.stderr.write("Usage: pywhich module\n")
            sys.stderr.flush()
            sys.exit(0)
-- Bill
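A minimal sketch of finding a module's location without running its code, assuming Python 3.4 or later, where importlib.util.find_spec is available (it did not exist when this thread was written). The name which_module is made up for illustration. Note that for a dotted name the parent packages are still imported.

```python
import importlib.util

def which_module(name):
    """Return where a module would be loaded from, or None if it is not found."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # Raised for a missing parent package or a module without a usable spec.
        return None
    if spec is None:
        return None
    # For built-in modules such as sys there is no file; origin is 'built-in'.
    return spec.origin

print(which_module("json"))
print(which_module("sys"))
```

Unlike `__import__`, this only asks the import machinery where the module lives, so a module with side effects at import time is not executed.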
Re: where the function has problem? n = 900 is OK , but n = 1000 is ERROR
On 08/01/2011 06:06 PM, Steven D'Aprano wrote:
Does your definition of fixed mean gives wrong results for n= 4 ?
    fibo(4) == 3
    False
Well, I don't know if you're trolling or just dumb: http://en.wikipedia.org/wiki/Fibonacci_number
    In [2]: for i in range(10):
       ...:     print fibo(i)
       ...:
       ...:
    0.0
    1.0
    1.0
    2.0
    3.0
    5.0
    8.0
    13.0
    21.0
    34.0
-- Bill
Re: where the function has problem? n = 900 is OK , but n = 1000 is ERROR
On 08/02/2011 08:45 AM, Alain Ketterlin wrote: produce integers. And it will fail with overflow for big values. If it would make you feel better I can use decimal. Also, perhaps I can name my function billy_fibo(n), which is defined as billy_fibo(n) +error(n) = fibo(n), where error(n) can be made arbitrarily small. This runs in constant time rather than linear (memoized) or exponential (fully recursive) at the cost of a minutia of accuracy. I find this tradeoff acceptable. -- Bill -- http://mail.python.org/mailman/listinfo/python-list
Re: where the function has problem? n = 900 is OK , but n = 1000 is ERROR
On 08/02/2011 10:15 AM, Steven D'Aprano wrote:
So you say, but I don't believe it. Given fibo, the function you provided earlier, the error increases with N:
    fibo(82) - fib(82)  # fib returns the accurate Fibonacci number
    160.0
    fibo(182) - fib(182)
    2.92786721937918e+23
Hardly arbitrarily small.
Perhaps the individual number is big, but compare that to:
    (fibo(n) - fib(n)) / fib(n)
The number is still quite close to the actual answer.
Your function also overflows for N = 1475:
    fibo(1475)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 3, in fibo
    OverflowError: (34, 'Numerical result out of range')
The correct value only has 307 digits, so it's not that large a number for integer math. I won't show them all, but it starts and ends like this: 8077637632...87040886025
Yes, I mentioned possibly using the decimal class, which I suppose does lose the constant access time depending on how it's implemented.
A good memoisation scheme will run in constant time (amortised).
Amortized perhaps, but this assumes the call happening a number of times. Also, this requires linear memory to store previous values.
Good heavens no. Only the most naive recursive algorithm is exponential. Good ones (note plural) are linear. Combine that with memoisation, and you have amortised constant time.
Not all recursive functions can be memoized (or they can but for practically no benefit). What I was getting at was that a closed form expression of a recurrence might be significantly faster at an acceptable loss in accuracy. For an example, see the Ackermann function.
Given that Fibonacci numbers are mostly of interest to number theorists, who care about the *actual* Fibonacci numbers and not almost-but-not-quite Fibonacci numbers, I'm having a lot of difficulty imagining what sort of application you have in mind that could legitimately make that trade-off.
I was trying to show that there is an alternate method of calculation. Accuracy losses are really a problem with the underlying machinery rather than the high level code. If the recursive form of fib() were written in C, the integers would have overflowed a long while ago compared to float. One other note: Fibonacci numbers grow exponentially fast (so the number of bits grows linearly with n), and python's integer multiplication is superlinear in the number of digits (Karatsuba, roughly O(n**1.585), rather than FFT-based). If we are going to discuss the behavior of python's numeric types, then let's talk about how slow python will become for the nth Fibonacci integer and how much space it will take compared to the short, concise, and almost as accurate closed form in floating point.
-- Bill
Re: where the function has problem? n = 900 is OK , but n = 1000 is ERROR
On 08/01/2011 05:11 AM, jc wrote:
    # Get Fibonacci Value
    # Fibonacci(N) = Fibonacci(N-1) + Fibonacci(N-2)
    #
    # n = 900 is OK
    # n = 1000 is ERROR , Why
    #
    # What Wrong?
    #
I have fixed the problem for you:
    def fibo(n):
        phi = (1 + 5**.5) / 2
        iphi = 1 - phi
        return (phi**n - iphi**n) / (5**.5)
-- Bill
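To see how far the closed form can be trusted, here is a small sketch (Python 3) that compares it with an exact iterative version. The names fib_exact, fib_binet and first_disagreement are made up for illustration; fib_binet is the same formula as fibo above.

```python
def fib_exact(n):
    """Exact n-th Fibonacci number using integer arithmetic and no recursion."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def fib_binet(n):
    """Closed-form (Binet) approximation using floats, as in fibo above."""
    sqrt5 = 5 ** 0.5
    phi = (1 + sqrt5) / 2
    return (phi ** n - (1 - phi) ** n) / sqrt5

def first_disagreement(limit=200):
    """Smallest n below limit where rounding the float result is no longer exact."""
    for n in range(limit):
        if round(fib_binet(n)) != fib_exact(n):
            return n
    return None

print(fib_exact(10), fib_binet(10))
print(first_disagreement())
```

The iterative version needs n additions on ever larger integers, while the float version is a handful of operations but only carries about 16 significant digits.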
What is xrange?
Is xrange not a generator? I know it doesn't return a tuple or list, so what exactly is it? Y doesn't ever complete, but x does.
    x = (i for i in range(10))
    y = xrange(10)
    print "===X==="
    while True:
        for i in x:
            print i
            break
        else:
            break
    print "===Y==="
    while True:
        for i in y:
            print i
            break
        else:
            break
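A short sketch of the difference, written for Python 3, where range behaves like Python 2's xrange: it is a re-iterable sequence, not an iterator. Every for loop asks it for a fresh iterator, so the second loop above restarts at 0 each time, while the generator resumes where it stopped.

```python
gen = (i for i in range(3))   # a generator: an iterator that is consumed once
seq = range(3)                # Python 3 range, like Python 2 xrange: re-iterable

assert iter(gen) is gen       # an iterator returns itself from iter()
assert iter(seq) is not seq   # a sequence hands out a fresh iterator each time

# Take "the first item" three times from each, as the while/for/break loop does.
first_from_gen = [next(iter(gen)) for _ in range(3)]
first_from_seq = [next(iter(seq)) for _ in range(3)]

print(first_from_gen)   # [0, 1, 2]
print(first_from_seq)   # [0, 0, 0]
```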
Re: What's in a name?
On 7/29/2011 11:25 PM, Andrew Berg wrote: In case you want to see the code (not complete by a long shot, and they need to be refactored): Module - http://elucidation.hg.sourceforge.net/hgweb/elucidation/elucidation/file/f8da0b15ecca/elucidation.py CLI app - http://disillusion-cli.hg.sourceforge.net/hgweb/disillusion-cli/disillusion-cli/file/947d230dbfc3/disillusion.py I have no code written for the GUI app yet. Any ideas? PyMetaMux ? -- http://mail.python.org/mailman/listinfo/python-list
Re: NoneType and new instances
On 07/28/2011 11:39 AM, Ethan Furman wrote: class 'NoneType' Traceback (most recent call last): File stdin, line 3, in module TypeError: cannot create 'NoneType' instances Why is NoneType unable to produce a None instance? I realise that None is a singleton, but so are True and False, and bool is able to handle returning them: -- bool(0) is bool(0) True This feels like a violation of 'Special cases aren't special enough to break the rules.' ~Ethan~ Probably for the same reason Ellipsis and NotImplemented also can't be instantiated. What that reason is I don't know. Related: http://bugs.python.org/issue6477#msg90641 -- Bill -- http://mail.python.org/mailman/listinfo/python-list
Re: How can I make a program automatically run once per day?
On 07/27/2011 08:35 AM, Chris Angelico wrote:
On Wed, Jul 27, 2011 at 10:27 PM, Dave Angel da...@ieee.org wrote:
As Chris pointed out, you probably aren't getting the script's directory right. After all, how can the scheduler guess where you put it? The obvious answer is to use a full path for the script's filename. Another alternative is to fill in the current directory in the appropriate field of the scheduler's entry. I would prefer setting the current directory, as that allows the script to find any data files it needs, but either works. I find it useful to only add batch files to the scheduler. Those batch files can do any setup and cleanup necessary. In this case, the batch file might simply set the current directory to the location of the script.
And that is an excellent idea. Definitely recommended.
ChrisA
If it hasn't been mentioned already:
    import time
    while True:
        t1 = time.time()
        #your code here
        t2 = time.time()
        time.sleep( 86400 - (t2 - t1) )
This doesn't take into account leap seconds, but it doesn't depend on a task scheduler. It is also independent of the time your code takes to execute. This is simpler, but it might drift slightly over time.
-- Bill
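One way to limit the drift, sketched under the assumption that runs should stay aligned to the moment the loop first started: compute each sleep from that fixed start time instead of from the previous iteration. The names seconds_until_next_run and run_daily are made up for illustration, and the clock and sleep parameters exist only to make the function easy to test.

```python
import time

PERIOD = 86400  # seconds in a day, ignoring leap seconds as in the post

def seconds_until_next_run(start, now, period=PERIOD):
    """Seconds to sleep so that runs stay aligned to start + k * period."""
    elapsed = now - start
    return period - (elapsed % period)

def run_daily(job, clock=time.time, sleep=time.sleep):
    """Call job() once per period forever; never returns."""
    start = clock()
    while True:
        job()
        # Always measured against the original start, so small delays
        # in one iteration do not accumulate into the next.
        sleep(seconds_until_next_run(start, clock()))

print(seconds_until_next_run(0, 10))         # 86390
print(seconds_until_next_run(0, 86400 + 5))  # 86395
```

Because of the modulo, a job that overruns a whole period simply waits for the next slot rather than passing a negative value to sleep.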
Re: Programming Python for Absolute Beginners
On 7/27/2011 11:50 PM, harrismh777 wrote: No one cares and don't spam the list. -- http://mail.python.org/mailman/listinfo/python-list
Re: Convolution of different sized arrays
On 07/26/2011 08:10 AM, Olenka Subota wrote: If anyone of you can help, please do it.. Thanks! You would probably get a better answer asking on one of the mailing lists here: http://new.scipy.org/mailing-lists.html -- http://mail.python.org/mailman/listinfo/python-list
Re: reading zipfile; problem using raw buffer
On 07/26/2011 08:42 AM, Sells, Fred wrote:
I'm trying to unzip a buffer that is uploaded to django/python. I can unzip the file in batch mode just fine, but when I get the buffer I get a BadZipfile exception. I wrote this snippet to try to isolate the issue but I don't understand what's going on. I'm guessing that I'm losing some header/trailer somewhere?
    def unittestZipfile(filename):
        buffer = ''
        f = open(filename)
        for i in range(22):
            block = f.read()
            if len(block) == 0:
                break
            else:
                buffer += block
        print len(buffer)
        tmp = open('tmp.zip', 'w')
        tmp.write(buffer)
        tmp.close()
        zf = zipfile.ZipFile('tmp.zip')
        print dir(zf)
        for name in zf.namelist():
            print name
            print zf.read(name)
    2.6.6 (r266:84297, Aug 24 2010, 18:46:32) [MSC v.1500 32 bit (Intel)]
    Traceback (most recent call last):
      File "C:\all\projects\AccMDS30Server\mds30\app\uploaders\xmitzipfile.py", line 162, in <module>
        unittestZipfile('wk1live7.8to7.11.zip')
      File "C:\all\projects\AccMDS30Server\mds30\app\uploaders\xmitzipfile.py", line 146, in unittestZipfile
        print zf.read(name)
      File "C:\alltools\python26\lib\zipfile.py", line 837, in read
        return self.open(name, "r", pwd).read()
      File "C:\alltools\python26\lib\zipfile.py", line 867, in open
        raise BadZipfile, "Bad magic number for file header"
    zipfile.BadZipfile: Bad magic number for file header
Judging from the traceback you are on Windows, so you need to open the files in binary mode:
    f = open(filename, 'rb')
and later:
    tmp = open('tmp.zip', 'wb')
-- Bill
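If the upload is already in memory, the temporary file can be skipped altogether. A minimal sketch for Python 3, assuming the uploaded data is available as a bytes object; list_zip_members is a made-up name, and the archive built here only stands in for the real upload.

```python
import io
import zipfile

def list_zip_members(data):
    """Return {name: bytes} for every member of a zip held in a bytes buffer."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return {name: zf.read(name) for name in zf.namelist()}

# Build a small archive in memory to stand in for the uploaded buffer.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("hello.txt", "hello\r\nworld\r\n")

members = list_zip_members(buf.getvalue())
print(members)
```

Everything stays as bytes from start to finish, so no newline translation can corrupt the archive.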
Re: Only Bytecode, No .py Files
On 07/26/2011 11:19 AM, Eldon Ziegler wrote: Is there a way to have the Python processor look only for bytecode files, not .py files? We are seeing huge numbers of Linux audit messages on production system on which only bytecode files are stored. The audit subsystem is recording each open failure. Thanks, Eldon Ziegler How are you opening your files? -- Bill -- http://mail.python.org/mailman/listinfo/python-list
Re: Pipe in the return statement
On 07/25/2011 10:16 AM, Archard Lias wrote:
On Jul 25, 2:03 pm, Ian Collins ian-n...@hotmail.com wrote:
On 07/26/11 12:00 AM, Archard Lias wrote:
Hi, Still I don't get how I am supposed to understand the pipe and its task/idea/influence on control flow, of: return statement | statement ??
It's simply a bitwise OR. -- Ian Collins
Yes, but how does it get determined, which one actually gets returned?
Neither one is picked. The return statement returns a single value from a function context. The pipe operator takes 2 values and bitwise ORs* them together. That result is then returned to the caller. The pipe character in this instance is not the same as in a shell.
* This is not exactly true, but don't worry about it.
-- Bill
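A small sketch of the evaluation order (Python 3). The expression on the right of return is evaluated first, and its single result is what comes back. The Tags class is hypothetical and shows one reason the operator is not always a bitwise OR: a class can define its own meaning for | through __or__.

```python
def combine_flags(a, b):
    # a | b is evaluated first; return hands back that one result.
    return a | b

class Tags:
    """Hypothetical class showing that | can be overloaded via __or__."""
    def __init__(self, *names):
        self.names = set(names)

    def __or__(self, other):
        return Tags(*(self.names | other.names))

print(combine_flags(0b0101, 0b0011))   # 7, i.e. 0b0111
print(combine_flags({1, 2}, {2, 3}))   # {1, 2, 3}: for sets, | is union
print((Tags("a") | Tags("b")).names)
```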
Re: Convert '165.0' to int
On 07/25/2011 05:48 AM, Steven D'Aprano wrote: But if you're calling a function in both cases: map(int, data) [int(x) for x in data] I am aware the premature optimization is a danger, but its also incorrect to ignore potential performance pitfalls. I would favor a generator expression here, if only because I think its easier to read. In addition, it properly handles large amounts of data by not duplicating the list. For very long input sequences, genexp would be the proper thing to do (assuming you don't need to index into results, in which case, its wrong.) I think the fastest way to solve the OP's problem is the following: ;) def convert_165_0_to_int(arg): return 165 -- Bill -- http://mail.python.org/mailman/listinfo/python-list
Re: Convert '165.0' to int
On 7/24/2011 2:27 PM, SigmundV wrote: On Jul 21, 10:31 am, Frank Millmanfr...@chagford.com wrote: Is there a short cut, or must I do this every time (I have lots of them!) ? I know I can write a function to do this, but is there anything built-in? I'd say that we have established that there is no shortcut, no built- in for this. You write you own function: string_to_int = lambda s: int(float(s)) Then you apply it to your list of strings: list_of_integers = map(string_to_int, list_of_strings) Of course, this will be horribly slow if you have thousands of strings. In such a case you should use an iterator (assuming you use python 2.7): import itertools as it iterator = it.imap(string_to_int, list_of_strings) Regards, Sigmund if the goal is speed, then you should use generator expressions: list_of_integers = (int(float(s)) for s in list_of_strings) -- http://mail.python.org/mailman/listinfo/python-list
Re: Convert '165.0' to int
On 7/23/2011 3:42 AM, Chris Angelico wrote:
    int(s.rstrip('0').rstrip('.'))
Also, it will (in?)correctly parse strings such as '16500' as 165.
-- Bill
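The pitfall is easy to reproduce. In this sketch (Python 3) strip_convert is the expression quoted above, and strip_convert_safe is a made-up variant that only strips when the string actually contains a decimal point.

```python
def strip_convert(s):
    """The expression from the thread: strips zeros even when there is no point."""
    return int(s.rstrip('0').rstrip('.'))

def strip_convert_safe(s):
    """Only strip trailing zeros after a decimal point (hypothetical variant)."""
    if '.' in s:
        s = s.rstrip('0').rstrip('.')
    return int(s)

print(strip_convert('165.0'))       # 165
print(strip_convert('16500'))       # 165, the trailing zeros are lost
print(strip_convert_safe('16500'))  # 16500
```

With the safe variant, a string such as '165.5' still reaches int() with its fractional part and raises ValueError, which is the warning the original poster asked for.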
Re: Convert '165.0' to int
On 7/23/2011 2:28 PM, rantingrick wrote:
On Jul 23, 1:53 am, Frank Millman fr...@chagford.com wrote:
-- The problem with that is that it will silently ignore any non-zero digits after the point. Of course int(float(x)) does the same, which I had overlooked.
-- Wait a minute; first you said all you wanted was to cast string floats to integers NOW you're changing the rules.
-- I do not expect any non-zero digits after the point, but if there are, I would want to be warned, as I should probably be treating it as a float, not an int.
-- Then the solution is a try:except.
    py> def castit(value):
    ...     try:
    ...         v = int(value)
    ...         return v
    ...     except ValueError:
    ...         return float(value)
    ...
    py> castit('165')
    165
    py> castit('165.0')
    165.0
    py> castit('165.333')
    165.333
    py> castit('3.3')
    3.3
-- To recap, the original problem is that it would appear that some third-party systems, when serialising int's into a string format, add a .0 to the end of the string. I am trying to get back to the original int safely.
-- But you also said you wanted floats too, i am confused??
-- The ideal solution is the one I sketched out earlier - modify python's 'int' function to accept strings such as '165.0'.
-- NO! You create your OWN casting function for special cases. PythonZEN: Special cases aren't special enough to break the rules.
I'll probably get flak for this, but damn the torpedoes:
    def my_int(num):
        import re
        try:
            # the dot is escaped so that it only matches a literal '.'
            m = re.match(r'^(-?[0-9]+)(\.0)?$', num)
            return int(m.group(1))
        except AttributeError:
            # no match, so m is None: raise your own error, or re-raise
            raise
-- Bill
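A sketch of the same idea that raises ValueError explicitly instead of relying on the AttributeError from a failed match. The names strict_int and _INT_LIKE are made up for illustration. The dot is escaped because an unescaped `.` in a pattern matches any character, and `0+` is an assumption that '165.00' should also be accepted.

```python
import re

# Optional sign, digits, then optionally a point followed only by zeros.
_INT_LIKE = re.compile(r'^(-?[0-9]+)(?:\.0+)?$')

def strict_int(text):
    """Convert '165' or '165.0' to 165; reject anything with a real fraction."""
    m = _INT_LIKE.match(text)
    if m is None:
        raise ValueError("not an integer-valued string: %r" % (text,))
    return int(m.group(1))

print(strict_int('165.0'))
print(strict_int('9007199254740993.0'))
```

Since the digits never pass through float(), long inputs keep every digit.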
Re: Convert '165.0' to int
On 07/22/2011 10:21 AM, Grant Edwards wrote:
While that may be clear to you, that's because you've made some assumptions. Convert a properly formatted string representation of a floating point number to an integer is not a rigorous definition. What does properly formatted mean? Who says that the character representing the radix is . rather than ,?
Properly formatted means that Python would accept the string as an argument to float() without raising an exception.
Notice the last digit switched from a 3 to a 2? Floats in python don't have arbitrary accuracy. You would need to import decimal and use it for rounding to work properly.
It should be floor() though, for that is what int() does.
Um, what?
The example given by the OP implied that int(float(s)) did what he wanted. That is _not_ rounding the float. It's the equivalent of using the floor() function.
int(float(s)) does the right thing for short strings. However, for longer strings it loses information due to the way floats are implemented in Python. Python uses the IEEE754 double precision datatype (double) to implement floating point numbers. The floats only have 53 bits in the mantissa portion of the number, which means python can only represent every integer up to 2**53 exactly as a float. Compare this to integers in Python, which are automatically upcast to longs if overflow would occur. The int() call will never lose accuracy when converting a properly formatted integer string. float() can lose accuracy, even if the float string is properly formatted. There is no floor() being called or used; this is simply the behavior of the float datatype. You seem to be worrying about python producing invalid output for invalid input (period separated numbers). You should be worrying if valid input (a very long float string) produces invalid output.
-- Bill
Re: Convert '165.0' to int
On 07/22/2011 10:58 AM, Grant Edwards wrote: On 2011-07-22, Billy Mays81282ed9a88799d21e77957df2d84bd6514d9...@myhashismyemail.com wrote: Properly formatted means that Python would accept the string as an argument to float() without raising an exception. Then you can't assume that '.' is the radix character. When you use radix, I assume you mean the grouping separator for large numbers, not the base correct? I have always heard radix used as the base (ie base 2) of the number, as in radix sort. No, I'm talking about the claim that you should use decmial so that you can use rounding when the OP's example showed that rounding was not what he wanted. Yes, you are right. I mistyped what I was thinking. Let me rephrase: decimal is needed to preserve the accuracy of the string to `number` conversion. -- Bill -- http://mail.python.org/mailman/listinfo/python-list
Re: Convert '165.0' to int
On 07/21/2011 08:46 AM, Web Dreamer wrote:
If you do not want to use 'float()' try:
    int(x.split('.')[0])
This is right.
But, the problem is the same as with int(float(x)), the integer number is still not as close as possible as the original float value. I would in fact consider doing this:
    int(round(float(x)))
This is wrong, since there is a loss of information in the float cast:
    float('9007199254740993.0')
    9007199254740992.0
Notice the last digit switched from a 3 to a 2? Floats in python don't have arbitrary accuracy. You would need to import decimal and use it for rounding to work properly.
-- Bill
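The difference can be shown side by side. A minimal sketch (Python 3) using the same input string as above; decimal.Decimal parses the string exactly, so nothing is lost before the conversion to int.

```python
from decimal import Decimal

s = '9007199254740993.0'       # 2**53 + 1, written as a float string

via_float = int(float(s))      # passes through a 53-bit mantissa
via_decimal = int(Decimal(s))  # exact decimal arithmetic, then truncation

print(via_float)    # 9007199254740992
print(via_decimal)  # 9007199254740993
```

int() on a Decimal truncates toward zero, so a string with a non-zero fraction would still be silently cut off; checking for that is a separate step.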
Re: Can someone help please
On 07/21/2011 01:02 PM, Gary wrote: Hi Can someone help me with this code below please, For some reason it will not send me the first text file in the directory. I made up an empty file a.txt file with nothing on it and it sends the files i need but would like to fix the code. Thanks total = ' ' os.chdir('/home/woodygar/Desktop/Docs') for i in os.listdir('.'): if '.txt' in i: f = open(i, 'r') total += f.read() f.close() message = \ Subject: %s %s % (SUBJECT,total) Does the file end with '.TXT' ? This might help: total = ' ' os.chdir('/home/woodygar/Desktop/Docs') txts = (nm for nm in os.listdir('.') if nm.lower().endswith('.txt') ) for nm in txts: f = open(nm, 'r') total += f.readlines() f.close() message = \ Subject: %s %s % (SUBJECT,total) I also changed read to readlines(). -- http://mail.python.org/mailman/listinfo/python-list
Re: Can someone help please
On 07/21/2011 01:41 PM, Gary Herron wrote: On 07/21/2011 10:23 AM, Billy Mays wrote: On 07/21/2011 01:02 PM, Gary wrote: Hi Can someone help me with this code below please, For some reason it will not send me the first text file in the directory. I made up an empty file a.txt file with nothing on it and it sends the files i need but would like to fix the code. Thanks total = ' ' os.chdir('/home/woodygar/Desktop/Docs') for i in os.listdir('.'): if '.txt' in i: f = open(i, 'r') total += f.read() f.close() message = \ Subject: %s %s % (SUBJECT,total) Does the file end with '.TXT' ? This might help: total = ' ' os.chdir('/home/woodygar/Desktop/Docs') txts = (nm for nm in os.listdir('.') if nm.lower().endswith('.txt') ) for nm in txts: f = open(nm, 'r') total += f.readlines() f.close() message = \ Subject: %s %s % (SUBJECT,total) I also changed read to readlines(). That won't work (You must not have even tried to run this.) The call f.readlines() returns a list which causes an error when added to a string: TypeError: cannot concatenate 'str' and 'list' objects Gary Herron You're right, I didn't. But thats not really the important part of the code. I believe the generator should work, (and in fact, should probably use os.path.walk instead of os.listdir ) -- Bill -- http://mail.python.org/mailman/listinfo/python-list
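A sketch of the walk-based version mentioned above, written for Python 3 with os.walk (os.path.walk was removed in Python 3). It uses read(), which returns a string, so the concatenation works. read_text_files is a made-up name, and the demo directory and file names exist only for illustration.

```python
import os
import shutil
import tempfile

def read_text_files(root):
    """Concatenate the contents of every *.txt file (any case) under root."""
    parts = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()                      # visit subdirectories in a stable order
        for name in sorted(filenames):
            if name.lower().endswith('.txt'):
                with open(os.path.join(dirpath, name), 'r') as f:
                    parts.append(f.read())   # read() returns a str, unlike readlines()
    return ''.join(parts)

# Demo on a throwaway directory.
demo_dir = tempfile.mkdtemp()
for fname, content in [("a.txt", "A"), ("b.TXT", "B"), ("c.dat", "C")]:
    with open(os.path.join(demo_dir, fname), "w") as f:
        f.write(content)
combined = read_text_files(demo_dir)
shutil.rmtree(demo_dir)
print(combined)
```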
Re: Convert '165.0' to int
On 7/21/2011 10:40 PM, Thomas 'PointedEars' Lahn wrote: Billy Mays wrote: On 07/21/2011 08:46 AM, Web Dreamer wrote: If you do not want to use 'float()' try: int(x.split('.')[0]) This is right. Assuming that the value of `x' is in the proper format, of course. Else you might easily cut to the first one to three digits of a string representation (if `.' is the thousands separator of the locale, e. g.) The point (which was clear to me) was to convert a properly formatted string representation of a floating point number to an integer. We might also assume the number could be a hex encoded float or be in scientific notation. If the input is not properly formatted, it is unreasonable for us to return a correct value. But, the problem is the same as with int(float(x)), the integer number is still not as close as possible as the original float value. I would in fact consider doing this: int(round(float(x))) This is wrong, since there is a loss of information in the float cast: float('9007199254740993.0') 9007199254740992.0 Notice the last digit switched from a 3 to a 2? Floats in python don't have arbitrary accuracy. You would need to import decimal and use it for rounding to work properly. It should be floor() though, for that is what int() does. Um, what? -- Bill -- http://mail.python.org/mailman/listinfo/python-list
Return and set
I have a method getToken() which checks to see if a value is set, and if so, return it. However, it doesn't feel pythonic to me:
    def getToken(self):
        if self.tok:
            t = self.tok
            self.tok = None
            return t
        # ...
Is there a way to trim the 'if' block to reset self.tok upon return?
-- Bill
Re: Return and set
On 07/19/2011 09:43 AM, Ben Finney wrote: Billy Mays 81282ed9a88799d21e77957df2d84bd6514d9...@myhashismyemail.com writes: I have a method getToken() which checks to see if a value is set, and if so, return it. However, it doesn't feel pythonic to me: Clearly that's because the function name is not Pythonic :-) I'll assume the name is a PEP-8 compatible ‘get_token’. def getToken(self): if self.tok: t = self.tok self.tok = None return t # ... Are you testing ‘self.tok’ in a boolean context because you don't care whether it it might be ‘’ or ‘0’ or ‘0.0’ or ‘[]’ or ‘False’ or lots of other things that evaluate false in a boolean context? If you want to test whether it is any value other than ‘None’, that's not the way to do it. Instead, use ‘if self.token is not None’. But I don't see why you test it at all, in that case, since you're immediately setting it to ‘None’ afterward. Also, the function name is quite misleading; the implication for a function named ‘get_foo’ is that it is a non-destructive read. I would expect the name of this function to indicate what's going on much more explicitly. My suggestion:: def get_and_reset_token(self): result = self.token self.token = None return result This function is used in a file parser. There are two methods, getToken() and peekToken(). getToken pops a token from the file, while peekToken keeps the token, but still returns it. Code: def getToken(self): if self.tok: t = self.tok self.tok = None return t try: t = self.gen.next() except StopIteration: return NULL else: return t def peekToken(self): if not self.tok: self.tok = self.getToken() return self.tok NULL is an enumerated value I have defined above. The idea is for peekToken to reuse getToken, but to keep the token still around. -- http://mail.python.org/mailman/listinfo/python-list
Re: Return and set
On 07/19/2011 01:00 PM, Micah wrote:
That sounds artificially backwards; why not let getToken() reuse peekToken()?
    def peek(self):
        if self.tok is None:
            try:
                self.tok = self.gen.next()
            except StopIteration:
                self.tok = NULL
        return self.tok
    def pop(self):
        token = self.peek()
        self.tok = None
        return token
I actually like this way better, thanks!
-- Bill
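A self-contained sketch of the same peek/pop design for Python 3, where generators use next(gen) rather than gen.next(). TokenStream, _EMPTY and END are made-up names: END stands in for the NULL value from the thread, and the private sentinel means that falsy tokens such as '' or 0 are still buffered correctly.

```python
_EMPTY = object()   # hypothetical sentinel meaning "nothing buffered"
END = None          # stand-in for the NULL value mentioned in the thread

class TokenStream:
    def __init__(self, iterable):
        self._gen = iter(iterable)
        self._tok = _EMPTY

    def peek(self):
        """Return the next token without consuming it."""
        if self._tok is _EMPTY:
            # next() with a default returns END instead of raising StopIteration.
            self._tok = next(self._gen, END)
        return self._tok

    def pop(self):
        """Return the next token and consume it."""
        token = self.peek()
        self._tok = _EMPTY
        return token

ts = TokenStream("ab")
print(ts.peek(), ts.peek(), ts.pop(), ts.pop(), ts.pop())
```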
Re: Return and set
On 07/19/2011 01:02 PM, Terry Reedy wrote:
You did not answer Ben's question about the allowed values of self.tok and whether you really want to clobber all 'false' values. The proper code depends on that answer.
NULL is an enumerated value I have defined above. The idea is for peekToken to reuse getToken, but to keep the token still around.
I think about reversing and have getToken use peekToken and then reset. But that depends on the exact logic which depends on the specs. I would more likely have just one function with a reset parameter defaulted to the more common value.
self.gen is a generator that filters single characters from a file. Values that come from self.gen.next() will always be string values since the file generator closes on EOF. I can be sure that I will either get a string from self.gen.next() or catch an exception, so it's okay to use NULL (which evaluates to 0). You are correct that it makes more sense to use peekToken() to do the lifting and getToken() to reuse the token. However, I am not sure what you mean by reset.
-- Bill
Re: a little parsing challenge ☺
On 07/19/2011 01:14 PM, Xah Lee wrote:
I added other unicode brackets to your list of brackets, but it seems your code still fail to catch a file that has mismatched curly quotes. (e.g. http://xahlee.org/p/time_machine/tm-ch04.html ) LOL Billy. Xah
I suspect it's due to the file being opened in 'rb' mode. Also, in the dictionary of characters at the top, the closing token is the key, while the opening one is the value. Not sure if that's obvious. Also, returning the position of the first mismatched pair is somewhat ambiguous. File systems store files as streams of octets (mine do anyways) rather than as characters. When you ask for the position of the first mismatched pair, do you mean the position as per file.tell() or do you mean the nth character in the utf-8 stream? Also, you may have answered this earlier but I'll ask again anyways: you ask for the first mismatched pair. Are you referring to the innermost mismatched, or the outermost? For example, suppose you have this file:
    foo[(])bar
Would the ( be the first mismatched character or would the ]?
-- Bill
Re: Saving changes to path
On 07/19/2011 02:24 PM, Chess Club wrote:
Hello, I used sys.path.append() to add to the path directory, but the changes made are not saved when I exit the compiler. Is there a way to save it? Thank you.
Since python is running in a child process, it only affects its own environment variables. I'm not sure that a process can change its parent's environment variables in a platform independent way. I also tried:
    $ python -c 'import os; os.system("export FOO=1")'
    $ echo $FOO

    $
but it doesn't seem to work.
-- Bill
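The same point applies to sys.path itself: it is ordinary in-process state and disappears with the interpreter. A small sketch (Python 3) that shows this with two fresh interpreters; MARKER is a made-up path used only as a marker, and run is a hypothetical helper.

```python
import subprocess
import sys

MARKER = "/hypothetical/extra/dir"   # made-up path, used only as a marker

def run(code):
    """Run code in a fresh interpreter and return its stripped stdout."""
    out = subprocess.check_output([sys.executable, "-c", code])
    return out.decode().strip()

# The first interpreter appends to its own sys.path and sees the change...
first = run("import sys; sys.path.append(%r); print(%r in sys.path)" % (MARKER, MARKER))
# ...but a second interpreter starts from scratch.
second = run("import sys; print(%r in sys.path)" % (MARKER,))

print(first, second)   # True False
```

To make a directory stick across sessions, the usual options are setting the PYTHONPATH environment variable in the shell's startup file or placing a .pth file in site-packages.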
Re: a little parsing challenge ☺
On 07/17/2011 03:47 AM, Xah Lee wrote: 2011-07-16 I gave it a shot. It doesn't do any of the Unicode delims, because let's face it, Unicode is for goobers. import sys, os pairs = {'}':'{', ')':'(', ']':'[', '':'', ':', '':''} valid = set( v for pair in pairs.items() for v in pair ) for dirpath, dirnames, filenames in os.walk(sys.argv[1]): for name in filenames: stack = [' '] with open(os.path.join(dirpath, name), 'rb') as f: chars = (c for line in f for c in line if c in valid) for c in chars: if c in pairs and stack[-1] == pairs[c]: stack.pop() else: stack.append(c) print (Good if len(stack) == 1 else Bad) + ': %s' % name -- Bill -- http://mail.python.org/mailman/listinfo/python-list
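A sketch of a variant that reports where the first problem is, for Python 3 and ASCII brackets only. It works on character indexes in an already decoded string and picks one possible definition of "first mismatch": the first closing bracket that does not match the most recent opener, or failing that the most recent opener left unclosed. PAIRS, OPENERS and first_mismatch are names made up for illustration.

```python
PAIRS = {')': '(', ']': '[', '}': '{'}   # closing -> opening, as in the post
OPENERS = set(PAIRS.values())

def first_mismatch(text):
    """Return the index of the first offending bracket, or None if balanced."""
    stack = []                            # (char, index) of unmatched openers
    for i, c in enumerate(text):
        if c in OPENERS:
            stack.append((c, i))
        elif c in PAIRS:
            if not stack or stack[-1][0] != PAIRS[c]:
                return i                  # closer with no matching opener
            stack.pop()
    # Nothing mismatched while scanning; report an unclosed opener if any.
    return stack[-1][1] if stack else None

print(first_mismatch('foo[(])bar'))   # 5, the ']'
print(first_mismatch('a(b)[c]{d}'))   # None
```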
Re: a little parsing challenge ☺
On 7/18/2011 7:56 PM, Steven D'Aprano wrote:
Billy Mays wrote:
On 07/17/2011 03:47 AM, Xah Lee wrote: 2011-07-16
I gave it a shot. It doesn't do any of the Unicode delims, because let's face it, Unicode is for goobers.
Goobers... that would be one of those new-fangled slang terms that the young kids today use to mean its opposite, like bad, wicked and sick, correct? I mention it only because some people might mistakenly interpret your words as a childish and feeble insult against the 98% of the world who want or need more than the 127 characters of ASCII, rather than understand you meant it as a sign of the utmost respect for the richness and diversity of human beings and their languages, cultures, maths and sciences.
(TL;DR version: international character sets are a problem, and Unicode is not the answer to that problem.)
As long as I have used python (which I admit has only been 3 years) Unicode has never appeared to be implemented correctly. I'm probably repeating old arguments here, but whatever. Unicode is a mess. When someone says ASCII, you know that they can only mean characters 0-127. When someone says Unicode, do they mean real Unicode (and is it 2 byte or 4 byte?) or UTF-32 or UTF-16 or UTF-8? When using the 'u' datatype with the array module, the docs don't even tell you if it's 2 bytes wide or 4 bytes. Which is it? I'm sure that all of these can be figured out, but the problem is now I have to ask every one of these questions whenever I want to use strings. Secondly, Python doesn't do Unicode exception handling correctly (but I suspect that it's a broader problem with languages). A good example of this is with UTF-8, where there are invalid bytes (such as 0xC0, 0xC1, 0xF5, 0xF6, 0xF7, 0xF8, ..., 0xFF, but you already knew that, as well as everyone else who wants to use strings for some reason). When embedding Python in a long running application where user input is received, it is very easy to make a mistake which brings down the whole program. 
If any user string isn't properly try/excepted, a user could craft a malformed string which a UTF-8 decoder would choke on. Using ASCII (or whatever 8 bit encoding) doesn't have these problems since all codepoints are valid. Another (this must have been a good laugh amongst the UniDevs) 'feature' of unicode is the zero width space (U+200B, encoded in UTF-8 as 0xE2 0x80 0x8B). Any string can masquerade as any other string by placing a few of these in a string. Any word filters you might have are now defeated by some cheesy Unicode nonsense character. Can you just check for these characters and strip them out? Yes. Should you have to? I would say no. Does it get better? Of course! International character sets used for domain name encoding use yet a different scheme (Punycode). Are the following two domain names the same: tést.com , xn--tst-bma.com ? Who knows! I suppose I can gloss over the pains of using Unicode in C with every string needing to be an LPS since 0x00 is now a valid code point in UTF-8 (0x for 2 byte Unicode) or suffer the O(n) look up time to do strlen or concatenation operations. Can it get even better? Yep. We also now need to have a Byte Order Mark (BOM) to determine the endianness of our characters. Are they little endian or big endian? (or perhaps one of the two possible middle endian encodings?) Who knows? String processing with unicode is unpleasant to say the least. I suppose that's what we get when things are designed by committee. But Hey! The great thing about standards is that there are so many to choose from.
-- Bill
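For the two practical worries above, malformed UTF-8 input and zero width spaces, here is a minimal sketch for Python 3, where bytes and text are separate types. safe_decode and strip_zero_width are made-up helper names; bytes.decode with errors="replace" is standard library behaviour.

```python
ZERO_WIDTH_SPACE = "\u200b"   # encoded in UTF-8 as the bytes E2 80 8B

def safe_decode(raw):
    """Decode bytes as UTF-8 without raising on malformed input."""
    # Invalid bytes become U+FFFD (the replacement character) instead of an error.
    return raw.decode("utf-8", errors="replace")

def strip_zero_width(text):
    """Remove zero width spaces so that word filters see the plain word."""
    return text.replace(ZERO_WIDTH_SPACE, "")

try:
    b"ab\xc0cd".decode("utf-8")          # strict decoding raises on 0xC0
except UnicodeDecodeError as exc:
    print("strict decode failed:", exc.reason)

print(safe_decode(b"ab\xc0cd"))
print(strip_zero_width("sp\u200bam") == "spam")
```

Decoding once at the boundary where user input enters the program keeps the try/except in a single place.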
Re: Possible File iteration bug
On 07/15/2011 04:01 AM, bruno.desthuilli...@gmail.com wrote: On Jul 14, 9:46 pm, Billy Mays no...@nohow.com wrote: I noticed that if a file is being continuously written to, the file generator does not notice it:

    def getLines(f):
        lines = []
        for line in f:
            lines.append(line)
        return lines

what's wrong with file.readlines() ?

Using that will read the entire file into memory, which may not be possible. In the library reference, it mentions that using the generator (which calls file.next()) uses a read-ahead buffer to efficiently loop over the file. If I call .readline() myself, I forfeit that performance gain.

I was thinking that a convenient solution to this problem would be to introduce a new Exception called PauseIteration, which would signal to the caller that there is no more data for now, but not to close down the generator entirely.

-- Bill
Re: Possible File iteration bug
On 07/15/2011 08:39 AM, Thomas Rachel wrote: Am 14.07.2011 21:46 schrieb Billy Mays: I noticed that if a file is being continuously written to, the file generator does not notice it: Yes. That's why there were alternative suggestions in your last thread How to write a file generator. To repeat mine: an object which is not an iterator, but an iterable.

    class Follower(object):
        def __init__(self, file):
            self.file = file
        def __iter__(self):
            while True:
                l = self.file.readline()
                if not l:
                    return
                yield l

    if __name__ == '__main__':
        import time
        f = Follower(open("/var/log/messages"))
        while True:
            for i in f:
                print i,
            print "all read, waiting..."
            time.sleep(4)

Here, you iterate over the object until it is exhausted, but you can iterate again to get the next entries. The difference to the file as iterator is, as you have noticed, that once an iterator is exhausted, it will be so forever. But if you have an iterable, like the Follower above, you can reuse it as you want.

I did see it, but it feels less pythonic than using a generator. I did end up using an extra class to get more data from the file, but it seems like overhead. Also, in the Python docs, file.next() mentions there being a performance gain for using the file generator (iterator?) over the readline function.

Really what would be useful is some sort of PauseIteration Exception which doesn't close the generator when raised, but indicates to the looping header that there is no more data for now.

-- Bill
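The iterable-versus-iterator distinction quoted above can be tried out with a small Python 3 sketch. It uses a temporary file in place of /var/log/messages; the file name demo.log and the variable names are invented for the demonstration, and the behaviour shown assumes an ordinary local file.

```python
import os
import tempfile

class Follower:
    """Iterable, not an iterator: each for-loop resumes where the last one stopped."""
    def __init__(self, fileobj):
        self.fileobj = fileobj

    def __iter__(self):
        while True:
            line = self.fileobj.readline()
            if not line:      # no more data for now; this pass ends, the file stays open
                return
            yield line

# A temporary file stands in for a log that keeps growing
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.log")
    with open(path, "w") as writer, open(path) as reader:
        follower = Follower(reader)
        writer.write("first\n")
        writer.flush()
        first_pass = list(follower)       # reads what is there so far
        writer.write("second\nthird\n")
        writer.flush()
        second_pass = list(follower)      # a fresh generator picks up the new lines

print(first_pass, second_pass)
```

Each call to `__iter__` creates a new generator, so exhausting one pass does not prevent the next pass from seeing data written in the meantime.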
Re: Possible File iteration bug
On 07/15/2011 10:28 AM, Thomas Rachel wrote: Am 15.07.2011 14:52 schrieb Billy Mays: Also, in the python docs, file.next() mentions there being a performance gain for using the file generator (iterator?) over the readline function. Here, the question is if this performance gain is really relevant AKA feelable. The file object seems to have another internal buffer distinct from the one used for iterating used for the readline() function. Why this is not the same buffer is unclear to me. Really what would be useful is some sort of PauseIteration Exception which doesn't close the generator when raised, but indicates to the looping header that there is no more data for now. a None or other sentinel value would do this as well (as ChrisA already said). Thomas

A sentinel does provide a workaround, but it also passes the problem on to the caller rather than the callee:

    def getLines(f):
        while True:
            yield f.readline()

    def bar(f):
        for line in getLines(f):
            if not line:  # I now have to check here instead of in getLines
                break
            foo(line)

    def baz(f):
        for line in getLines(f) if line:  # this would be nice for generators
            foo(line)

bar() is the correct way to do things, but I think baz() looks cleaner. I found myself writing baz() first, finding it wasn't syntactically correct, and then converting it to bar(). The if portion of the loop would be nice for generators, since it seems like the proper place for the sentinel to be matched. Also, with potentially infinite (but pauseable) data, there needs to be a nice way to catch stuff like this.

-- Bill
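One existing spelling that comes close to the baz() form is the two-argument built-in iter(callable, sentinel), which stops at the sentinel without the caller testing for it. The sketch below is Python 3 and uses an in-memory io.StringIO as a stand-in for the log file; the helper name available_lines is made up.

```python
import io

def available_lines(fileobj):
    """Iterate over lines until readline() returns the empty-string sentinel."""
    # iter(callable, sentinel) calls fileobj.readline repeatedly and stops,
    # without closing anything, the first time it returns "".
    return iter(fileobj.readline, "")

stream = io.StringIO("a\nb\n")
first = [line for line in available_lines(stream)]

# Simulate more data arriving, then restore the read position
position = stream.tell()
stream.seek(0, io.SEEK_END)
stream.write("c\n")
stream.seek(position)

second = [line for line in available_lines(stream)]
print(first, second)
```

The sentinel check still exists, but it lives inside the iterator rather than in every loop body, which is roughly what the proposed `for ... if ...:` syntax was after.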
Re: Looking for general advice on complex program
On 07/15/2011 03:47 PM, Josh English wrote: I remember reading that file locking doesn't work on network mounted drives (specifically nfs mounts), but you might be able to simply create a 'lock' (mydoc.xml.lock or the like) file for the XML doc in question. If that file exists you could either hang or silently give up. Not sure if that helps.

-- Bill
Re: String formatting - mysql insert
On 07/14/2011 11:00 AM, Christian wrote: Hi, I run into a problem when I try to set the table name dynamically. I'd appreciate any help. Christian

    ### works
    newcur.execute("INSERT INTO events (id1,id2) VALUES (%s,%s);", (rs[1], rs[2]))

    ### does not work
    newcur.execute("INSERT INTO %s_events (id1,id2) VALUES (%s, %s);", (table_name, rs[1], rs[2]))

    ### works, but is not really perfect: None from the rs list results in 'None' instead of NULL
    newcur.execute("INSERT INTO %s_events (id1,id2) VALUES ('%s','%s');" % (table_name, rs[1], rs[2]))

You shouldn't use the bottom form at all, since that is how injection attacks occur. The reason the second version doesn't work is that the execute command escapes all of the arguments before substituting them, so the table name is inserted as a quoted string rather than as an identifier. Example:

    sql = "SELECT * FROM table WHERE col = %s;"
    cur.execute(sql, ('name',))
    # The actual SQL statement that gets executed is:
    # SELECT * FROM table WHERE col = 'name';
    # Notice the single quotes.

-- Bill
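A common way to combine a dynamic table name with bound values is to check the name against a fixed list and let the driver bind only the values. The sketch below uses the standard-library sqlite3 module so that it runs anywhere; note that sqlite3 uses `?` placeholders where the MySQL driver in the thread uses `%s`, and that the prefixes in ALLOWED_PREFIXES are hypothetical.

```python
import sqlite3

ALLOWED_PREFIXES = {"web", "mail"}   # hypothetical table-name prefixes

def insert_event(conn, prefix, id1, id2):
    # Identifiers cannot be bound as parameters, so validate against a fixed set
    if prefix not in ALLOWED_PREFIXES:
        raise ValueError("unknown table prefix: %r" % (prefix,))
    sql = "INSERT INTO %s_events (id1, id2) VALUES (?, ?)" % prefix
    conn.execute(sql, (id1, id2))    # the values are still bound by the driver

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE web_events (id1, id2)")
insert_event(conn, "web", 1, None)   # None is stored as SQL NULL
rows = conn.execute("SELECT id1, id2 FROM web_events").fetchall()
print(rows)
```

Because the values go through the driver, None arrives as NULL rather than as the string 'None', which was the complaint about the third form in the question.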
Possible File iteration bug
I noticed that if a file is being continuously written to, the file generator does not notice it:

    def getLines(f):
        lines = []
        for line in f:
            lines.append(line)
        return lines

    with open('/var/log/syslog', 'rb') as f:
        lines = getLines(f)
        # do some processing with lines
        # /var/log/syslog gets updated in the mean time

        # always returns an empty list, even though f has more data
        lines = getLines(f)

I found a workaround by adding f.seek(0,1) directly before the last getLines() call, but is this the expected behavior? Calling f.tell() right after the first getLines() call shows that it isn't reset back to 0. Is this correct or a bug?

-- Bill
Re: Possible File iteration bug
On 07/14/2011 04:00 PM, Ian Kelly wrote: On Thu, Jul 14, 2011 at 1:46 PM, Billy Mays no...@nohow.com wrote:

    def getLines(f):
        lines = []
        for line in f:
            lines.append(line)
        return lines

    with open('/var/log/syslog', 'rb') as f:
        lines = getLines(f)
        # do some processing with lines
        # /var/log/syslog gets updated in the mean time

        # always returns an empty list, even though f has more data
        lines = getLines(f)

I found a workaround by adding f.seek(0,1) directly before the last getLines() call, but is this the expected behavior? Calling f.tell() right after the first getLines() call shows that it isn't reset back to 0. Is this correct or a bug?

This is expected. Part of the iterator protocol is that once an iterator raises StopIteration, it should continue to raise StopIteration on subsequent next() calls.

Is there any way to just create a new generator that clears its `closed` status?

-- Bill
How to write a file generator
I want to make a generator that will return lines from the tail of /var/log/syslog if there are any, but my function is reopening the file each call:

    def getLines():
        with open('/var/log/syslog', 'rb') as f:
            while True:
                line = f.readline()
                if line:
                    yield line
                else:
                    raise StopIteration

I know the problem lies with the StopIteration, but I'm not sure how to tell the caller that there are no more lines for now.

-- Bill
Re: How to write a file generator
On 07/12/2011 11:52 AM, Thomas Jollans wrote: On 07/12/2011 04:46 PM, Billy Mays wrote: I want to make a generator that will return lines from the tail of /var/log/syslog if there are any, but my function is reopening the file each call:

    def getLines():
        with open('/var/log/syslog', 'rb') as f:
            while True:
                line = f.readline()
                if line:
                    yield line
                else:
                    raise StopIteration

I know the problem lies with the StopIteration, but I'm not sure how to tell the caller that there are no more lines for now. -- Bill

http://stackoverflow.com/questions/1475950/tail-f-in-python-with-no-time-sleep

That was actually the behavior I was trying to avoid. If there is no data to be read, the call will hang. That function is actually called by a webserver (from wsgiref), so it cannot hang indefinitely.

-- Bill
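For a caller such as a web request handler that must not block, one option is to keep no generator at all and instead remember a byte offset between calls. The following Python 3 sketch assumes that approach; read_new_lines and demo.log are illustrative names, and the function reads everything new into memory, which may not suit very large bursts of log data.

```python
import os
import tempfile

def read_new_lines(path, offset=0):
    """Return (lines, new_offset) for data appended since offset; never waits."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Keep only complete lines; a trailing partial line is left for the next call
    end = data.rfind(b"\n") + 1
    lines = data[:end].splitlines(keepends=True)
    return lines, offset + end

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.log")
    with open(path, "wb") as f:
        f.write(b"a\nb\npart")
    first, offset = read_new_lines(path)
    with open(path, "ab") as f:
        f.write(b"ial\n")
    second, offset2 = read_new_lines(path, offset)

print(first, offset, second, offset2)
```

Reopening the file on each call is harmless here because the offset, not the file object, carries the state; an empty list simply means there is nothing new yet.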
Re: Why isn't there a good RAD Gui tool for python
On 07/11/2011 02:59 PM, Elias Fotinis wrote: On Mon, 11 Jul 2011 20:11:56 +0300, Stefan Behnel stefan...@behnel.de wrote: Just a quick suggestion regarding the way you posed your question. It's usually better to ask if anyone knows a good tool to do a specific job (which you would describe in your post), instead of complaining about there being none. Opinion is divided on this… http://bash.org/?152037

There is another way: http://bash.org/?684045
Re: Finding duplicated photo
On 07/08/2011 07:29 AM, TheSaint wrote: Hello, I came across the problem that Gwenview moves the photos from the camera memory by renaming them, but later I forgot which were moved. Then I thought about a small script in Python, but I stumbled upon my ignorance of how to do that. PIL can find similar pictures. I was thinking of reducing the photos to gray scale and resizing them to the same size; what algorithm should be used? Is PIL able to compare 2 images?

I recently wrote a program after reading an article ( http://www.hackerfactor.com/blog/index.php?/archives/432-Looks-Like-It.html ) using the DCT method he proposes. It worked surprisingly well even with just the 64-bit hash it produces.

-- Bill
Re: Finding duplicated photo
On 07/08/2011 10:14 AM, TheSaint wrote: Billy Mays wrote: It worked surprisingly well even with just the 64bit hash it produces. I'd say that comparing two images reduced to 32x32 seems too coarse to tell whether one of two portraits has a smile that the other lacks. I think my suggestion and yours are similar, but I'd like to scale pictures to no less than 256x256 pixels. I'd also like to cover the wider case where the comparison involves a rotated image.

Originally I thought the same thing. It turns out that doing a DCT on an image typically moves the more important data to the top left corner of the output. This means that most of the other data in the output can be thrown away, since most of it doesn't significantly affect the image. The 32x32 is an arbitrary size; you can make it any square block that you want.

Rotation is harder to find. You can always take a brute force approach by simply rotating the image a couple of times and running the algorithm on each of the rotated pics. Image matching is a difficult problem.

-- Bill
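To make the "keep only the top left corner of the DCT" idea concrete, here is a pure-Python sketch that loosely follows the description in this thread. It is not the linked article's exact procedure or the program mentioned above: it assumes the image has already been reduced to a small square grid of grayscale values, and the synthetic gradient image and all function names are invented for the demonstration.

```python
import math

def dct_1d(values):
    """Unnormalised DCT-II of a sequence of numbers."""
    n = len(values)
    return [sum(v * math.cos(math.pi * (i + 0.5) * k / n)
                for i, v in enumerate(values))
            for k in range(n)]

def dct_2d(matrix):
    """Separable 2-D DCT: transform the rows, then the columns."""
    rows = [dct_1d(row) for row in matrix]
    cols = [dct_1d(col) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]

def dct_hash(pixels, keep=8):
    """Build a keep*keep-bit hash from the low-frequency (top left) coefficients."""
    coeffs = dct_2d(pixels)
    block = [coeffs[r][c] for r in range(keep) for c in range(keep)]
    mean = sum(block[1:]) / (len(block) - 1)   # leave out the DC term block[0]
    bits = 0
    for value in block:
        bits = (bits << 1) | (value > mean)    # one bit per coefficient
    return bits

def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

# Synthetic 32x32 grayscale gradient and a uniformly brightened copy
image = [[3 * x + 5 * y for x in range(32)] for y in range(32)]
brighter = [[value + 5 for value in row] for row in image]
print(hamming(dct_hash(image), dct_hash(brighter)))
```

Uniform brightening changes only the DC coefficient, which is why the two hashes stay close; images are then treated as likely duplicates when the Hamming distance between their hashes is small.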
Re: String concatenation vs. string formatting
On 07/08/2011 04:18 PM, Andrew Berg wrote: Is it bad practice to use this

    logger.error(self.preset_file + ' could not be stored - ' + sys.exc_info()[1])

Instead of this?

    logger.error('{file} could not be stored - {error}'.format(file=self.preset_file, error=sys.exc_info()[1]))

Other than the case where a variable isn't a string (format() converts variables to strings, automatically, right?) and when a variable is used a bunch of times, concatenation is fine, but somehow, it seems wrong. Sorry if this seems a bit silly, but I'm a novice when it comes to design. Plus, there's not really supposed to be more than one way to do it in Python.

If it means anything, I think concatenation is faster.

    __TIMES__
    a() - 0.09s
    b() - 0.09s
    c() - 54.80s
    d() - 5.50s

Code is below:

    def a(v):
        out = ""
        for i in xrange(100):
            out += v
        return len(out)

    def b(v):
        out = ""
        for i in xrange(10):
            out += v+v+v+v+v+v+v+v+v+v
        return len(out)

    def c(v):
        out = ""
        for i in xrange(100):
            out = "%s%s" % (out, v)
        return len(out)

    def d(v):
        out = ""
        for i in xrange(10):
            out = "%s%s%s%s%s%s%s%s%s%s%s" % (out,v,v,v,v,v,v,v,v,v,v)
        return len(out)

    print "a", a('xx')
    print "b", b('xx')
    print "c", c('xx')
    print "d", d('xx')

    import profile
    profile.run("a('xx')")
    profile.run("b('xx')")
    profile.run("c('xx')")
    profile.run("d('xx')")
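A comparison like this can also be run with the standard timeit module, which measures wall-clock time without the per-call overhead a profiler adds. The Python 3 sketch below is a variation on the benchmark, not a reproduction of it: the function names, the repeat count and the extra str.join variant are choices made here, and no particular timings are claimed.

```python
import timeit

def concat(v, n=1000):
    out = ""
    for _ in range(n):
        out += v                      # repeated concatenation
    return len(out)

def percent_format(v, n=1000):
    out = ""
    for _ in range(n):
        out = "%s%s" % (out, v)       # rebuilds the string through formatting
    return len(out)

def join(v, n=1000):
    return len("".join(v for _ in range(n)))   # builds the string in one step

for func in (concat, percent_format, join):
    seconds = timeit.timeit(lambda: func("xx"), number=200)
    print("%-15s %.4fs" % (func.__name__, seconds))
```

For a single log message, as in the original question, the difference is unlikely to matter, so readability is usually the better criterion there.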
Large number multiplication
I was looking through the python source and noticed that long multiplication is done using the Karatsuba method (O(~n^1.5)) rather than using FFTs O(~n log n). I was wondering if there was a reason the Karatsuba method was chosen over the FFT convolution method?

-- Bill
Re: Large number multiplication
On 07/06/2011 04:05 PM, Christian Heimes wrote: Am 06.07.2011 21:30, schrieb Billy Mays: I was looking through the python source and noticed that long multiplication is done using the Karatsuba method (O(~n^1.5)) rather than using FFTs O(~n log n). I was wondering if there was a reason the Karatsuba method was chosen over the FFT convolution method? The Karatsuba algorithm uses just addition, subtraction and multiplication, so you don't need to resort to floats and have no rounding errors. On the other hand FFT are based on e, complex numbers or trigonometric functions (=floats), which means you'll get rounding errors. We don't want rounding errors for large long multiplication. Christian

I believe it is possible to do FFTs without significant rounding error. I know that the GIMPS's Prime95 does very large multiplications using FFTs (I don't know if they use the integer based or double based version). I also know they have guards to prevent rounding errors, so I don't think it would be impossible to implement.

-- Bill
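The point that Karatsuba needs only integer addition, subtraction, shifts and multiplication can be seen in a short sketch. This is an illustration of the textbook algorithm in Python 3, not CPython's C implementation; the cutoff value is an arbitrary assumption. The running time grows roughly as n^1.585 (n^log2(3)) in the number of digits.

```python
def karatsuba(x, y, cutoff=2 ** 64):
    """Multiply non-negative integers using three recursive products instead of four."""
    if x < cutoff or y < cutoff:
        return x * y                          # small operands: plain multiplication
    half = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << half) - 1
    x_hi, x_lo = x >> half, x & mask          # x = x_hi * 2**half + x_lo
    y_hi, y_lo = y >> half, y & mask          # y = y_hi * 2**half + y_lo
    hi = karatsuba(x_hi, y_hi, cutoff)
    lo = karatsuba(x_lo, y_lo, cutoff)
    # (x_hi + x_lo)(y_hi + y_lo) - hi - lo equals x_hi*y_lo + x_lo*y_hi
    mid = karatsuba(x_hi + x_lo, y_hi + y_lo, cutoff) - hi - lo
    return (hi << (2 * half)) + (mid << half) + lo

a, b = 3 ** 2000, 7 ** 1500
print(karatsuba(a, b) == a * b)
```

Every intermediate value is an exact integer, which is the property Christian's reply contrasts with floating-point FFT methods.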
Re: Large number multiplication
On 07/06/2011 04:02 PM, Ian Kelly wrote: On Wed, Jul 6, 2011 at 1:30 PM, Billy Mays no...@nohow.com wrote: I was looking through the python source and noticed that long multiplication is done using the Karatsuba method (O(~n^1.5)) rather than using FFTs O(~n log n). I was wondering if there was a reason the Karatsuba method was chosen over the FFT convolution method? According to Wikipedia: In practice the Schönhage–Strassen algorithm starts to outperform older methods such as Karatsuba and Toom–Cook multiplication for numbers beyond 2**2**15 to 2**2**17 (10,000 to 40,000 decimal digits). I think most Python users are probably not working with numbers that large, and if they are, they are probably using specialized numerical libraries anyway, so there would be little benefit in implementing it in core.

You are right that not many people would gain significant use of it. The reason I ask is because convolution has a better (best?) complexity class than the current multiplication algorithm. I do like the idea of minimizing reliance on external libraries, but only if the changes would be useful to all the regular users of Python. I was more interested in finding previous discussion (if any) on why Karatsuba was chosen than in trying to alter the current multiplication implementation.

Side note: Are Numpy/Scipy the libraries you are referring to?

-- Bill
Better way to iterate over indices?
I have always found that iterating over the indices of a list/tuple is not very clean:

    for i in range(len(myList)):
        doStuff(i, myList[i])

I know I could use enumerate:

    for i, v in enumerate(myList):
        doStuff(i, v)

...but that still seems clunky. Are there any better ways to iterate over the indices of a list/tuple?

--Bill
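As a runnable illustration of the two spellings in the question, the Python 3 sketch below uses a stand-in do_stuff function (hypothetical, returning a string so the result can be inspected).

```python
def do_stuff(index, value):
    # Stand-in for the real work; returns a string so results are visible
    return "%d:%s" % (index, value)

my_list = ["a", "b", "c"]

# enumerate yields (index, value) pairs, so the body never indexes my_list
results = [do_stuff(i, v) for i, v in enumerate(my_list)]

# When only the indices are wanted, range(len(...)) states exactly that
indices = list(range(len(my_list)))

print(results, indices)
```

enumerate also works on iterables that cannot be indexed, such as generators or open files, which range(len(...)) cannot handle.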
Standard Deviation One-liner
I'm trying to shorten a one-liner I have for calculating the standard deviation of a list of numbers. I have something so far, but I was wondering if it could be made any shorter (without imports). Here's my function:

    a=lambda d:(sum((x-1.*sum(d)/len(d))**2 for x in d)/(1.*(len(d)-1)))**.5

The function is invoked as follows:

    >>> a([1,2,3,4])
    1.2909944487358056
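One possible shortening uses the identity that the sum of squared deviations equals sum(x^2) - (sum x)^2 / n. The sketch below assumes Python 3, where / is true division so the 1.* factors are unnecessary; it trades some numerical stability for brevity, and the statistics module is imported only to cross-check the result.

```python
import statistics

# Original form: recomputes the mean for every element (quadratic time)
a = lambda d: (sum((x - 1. * sum(d) / len(d)) ** 2 for x in d) / (1. * (len(d) - 1))) ** .5

# Shorter form based on sum(x*x) - sum(x)**2 / n; single pass over the data
b = lambda d: ((sum(x * x for x in d) - sum(d) ** 2 / len(d)) / (len(d) - 1)) ** .5

data = [1, 2, 3, 4]
print(a(data), b(data), statistics.stdev(data))
```

For data with a large mean and small spread, the shorter form can lose precision to cancellation, so the original two-step form (or statistics.stdev) is the safer choice outside of code golf.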
Re: Updated blog post on how to use super()
On 5/31/2011 10:44 PM, Raymond Hettinger wrote: I've tightened the wording a bit, made much better use of keyword arguments instead of kwds.pop(arg), and added a section on defensive programming (protecting a subclass from inadvertently missing an MRO requirement). Also, there is an entry on how to use assertions to validate search order requirements and make them explicit. http://bit.ly/py_super or http://rhettinger.wordpress.com/2011/05/26/super-considered-super/ Any further suggestions are welcome. I'm expecting this to evolve into a how-to guide to be included in the regular Python standard documentation. The goal is to serve as a reliable guide to using super and how to design cooperative classes in a way that lets subclasses compose and extend them. Raymond Hettinger follow my python tips on twitter: @raymondh

I read this when it was on HN the other day, but I still don't see what is special about super(). It seems (from your post) to just be a stand-in for the super class name? Is there something special I missed?

-- Bill
Re: Updated blog post on how to use super()
On 6/1/2011 12:42 PM, Ian Kelly wrote: On Wed, Jun 1, 2011 at 7:03 AM, Billy Mays no...@nohow.com wrote: I read this when it was on HN the other day, but I still don't see what is special about super(). It seems (from your post) to just be a stand in for the super class name? Is there something special I missed? It's not a stand-in for the super-class name. It's a stand-in for whatever class is next in the Method Resolution Order (MRO), which is determined at run-time and can vary depending on what the actual class of the object is. For example, in this inheritance situation:

    class A(object): ...
    class B(object): ...
    class C(A, B): ...

    a = A()
    c = C()

The MRO of A is (A, object). The MRO of B is (B, object). The MRO of C is (C, A, B, object). Thus, super(A, a) is going to resolve to object, as you might expect. But super(A, c) is going to resolve to B, because the next class after A in the MRO for C instances is B. That's a pretty quick and dirty explanation. If it doesn't make sense, I suggest reading the article again.

What it does is clear to me, but why it is interesting or special isn't. This looks like a small feature that would be useful in a handful of cases.

-- Bill
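The MRO behaviour described in the quoted explanation can be observed directly. The following Python 3 sketch adds a made-up method who() to the A, B, C classes from the example so that the call chain becomes visible.

```python
class A:
    def who(self):
        # "Next class after A" depends on the instance's MRO, not on A's own bases
        return ["A"] + super().who()

class B:
    def who(self):
        return ["B"]

class C(A, B):
    def who(self):
        return ["C"] + super().who()

print(C.__mro__)    # C, then A, then B, then object
print(C().who())    # A's super() call reaches B although A never mentions B

try:
    A().who()       # on a plain A instance, the next class is object
except AttributeError as exc:
    print("AttributeError:", exc)
```

The same line of code in A calls B.who for a C instance and fails for a plain A instance, which is the case a hard-coded superclass name cannot express and the reason cooperative multiple inheritance relies on super().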