Re: Progress on the Gilectomy
On Jun 20, 2017, at 1:19 AM, Paul Rubin <no.email@nospam.invalid> wrote:
> Cem Karan <cfkar...@gmail.com> writes:
>> Can you give examples of how it's not reliable?
>
> Basically there's a chance of it leaking memory by mistaking a data word
> for a pointer. This is unlikely to happen by accident and usually
> inconsequential if it does happen, but maybe there could be malicious
> data that makes it happen

Got it, thank you. My processes will run for 1-2 weeks at a time, so I can handle minor memory leaks over that time without too much trouble.

> Also, it's a non-compacting gc that has to touch all the garbage as it
> sweeps, not a reliability issue per se, but not great for performance
> especially in large, long-running systems.

I'm not too sure how much of a performance impact that will have. My code generates a very large number of tiny, short-lived objects at a fairly high rate throughout its lifetime. At least in the last iteration of the code, garbage collection consumed less than 1% of the total runtime. Maybe this is something that needs to be done and profiled to see how well it works?

> It's brilliant though. It's one of those things that seemingly can't
> possibly work, but it turns out to be quite effective.

Agreed! I **still** can't figure out how they managed to do it, it really does look like it shouldn't work at all!

Thanks,
Cem Karan
--
https://mail.python.org/mailman/listinfo/python-list
Re: Progress on the Gilectomy
On Jun 19, 2017, at 6:19 PM, Gregory Ewing <greg.ew...@canterbury.ac.nz> wrote:
> Ethan Furman wrote:
>> Let me ask a different question: How much effort is required at the C level
>> when using tracing garbage collection?
>
> That depends on the details of the GC implementation, but often
> you end up swapping one form of boilerplate (maintaining ref
> counts) for another (such as making sure the GC system knows
> about all the temporary references you're using).
>
> Some, such as the Boehm collector, try to figure it all out
> automagically, but they rely on non-portable tricks and aren't
> totally reliable.

Can you give examples of how it's not reliable? I'm currently using it in one of my projects, so if it has problems, I need to know about them.

On the main topic: I think that a good tracing garbage collector would probably be a good idea. I've been having a real headache binding python to my C library via ctypes, and a large part of that problem is that I've got two different garbage collectors (python and bdwgc). I think I've got it worked out at this point, but it would have been convenient to get memory allocated from python's garbage-collected heap on the C side. A lot fewer headaches.

Thanks,
Cem Karan
--
https://mail.python.org/mailman/listinfo/python-list
Re: pip list --outdated gives all packages
On May 29, 2017, at 1:51 AM, Cecil Westerhof <ce...@decebal.nl> wrote:
> On Monday 29 May 2017 06:16 CEST, Cecil Westerhof wrote:
>
>>> I'm completely flummoxed then; on my machines I get the 'old'
>>> behavior. Can you try a completely clean Debian install somewhere
>>> (maybe on a virtual box) and see what happens? I'm wondering if
>>> there is something going on with your migration.
>>
>> I will do that. By the way, because of hardware I installed Stretch
>> which at the moment is still in testing.
>
> I tried it. (Where some problems. Looks like you can not do certain
> things in VirtualBox. But that is for another time.)
> Get the same result. So maybe I should put it on the Debian list.

Yeah, I have no idea what to tell you. Good luck!

Thanks,
Cem Karan
--
https://mail.python.org/mailman/listinfo/python-list
Re: pip list --outdated gives all packages
On May 27, 2017, at 11:10 AM, Cecil Westerhof <ce...@decebal.nl> wrote:
> On Saturday 27 May 2017 16:34 CEST, Cem Karan wrote:
>
>> On May 27, 2017, at 7:15 AM, Cecil Westerhof <ce...@decebal.nl> wrote:
>>
>>> On Saturday 27 May 2017 12:33 CEST, Cecil Westerhof wrote:
>>>
>>>> I wrote a script to run as a cron job to check if I need to update
>>>> my Python installations. I migrated from openSUSE to Debian and
>>>> that does not work anymore (pip2 and pip3): it displays the same
>>>> with and without --outdated. Anyone knows what the problem could
>>>> be?
>>>
>>> It does not exactly displays the same, but it displays all
>>> packages, while in the old version it only displayed the outdated
>>> versions. I already made a change with awk, but I would prefer the
>>> old functionality.
>>>
>>> By the way, the patch is:
>>>     pip2 list --outdated --format=legacy | awk '
>>>     {
>>>         if (substr($2, 2, length($2) - 2) != $5) {
>>>             print $0
>>>         }
>>>     }'
>>
>> Could you check the output of 'pip3 --version'? When I tested pip3
>> on my machine, 'pip3 list --outdated' only yielded the outdated
>> packages, not a list of everything out there.
>
> Both as normal user and root I get:
>     pip 9.0.1 from /usr/lib/python3/dist-packages (python 3.5)

I'm completely flummoxed then; on my machines I get the 'old' behavior. Can you try a completely clean Debian install somewhere (maybe on a virtual box) and see what happens? I'm wondering if there is something going on with your migration.

Thanks,
Cem Karan
--
https://mail.python.org/mailman/listinfo/python-list
Re: pip list --outdated gives all packages
On May 27, 2017, at 7:15 AM, Cecil Westerhof <ce...@decebal.nl> wrote:
> On Saturday 27 May 2017 12:33 CEST, Cecil Westerhof wrote:
>
>> I wrote a script to run as a cron job to check if I need to update
>> my Python installations. I migrated from openSUSE to Debian and that
>> does not work anymore (pip2 and pip3): it displays the same with and
>> without --outdated. Anyone knows what the problem could be?
>
> It does not exactly displays the same, but it displays all packages,
> while in the old version it only displayed the outdated versions. I
> already made a change with awk, but I would prefer the old
> functionality.
>
> By the way, the patch is:
>     pip2 list --outdated --format=legacy | awk '
>     {
>         if (substr($2, 2, length($2) - 2) != $5) {
>             print $0
>         }
>     }'

Could you check the output of 'pip3 --version'? When I tested pip3 on my machine, 'pip3 list --outdated' only yielded the outdated packages, not a list of everything out there.

I'm asking about 'pip3 --version' because I found that my PATH as an ordinary user and as root were different, so my scripts would work as an ordinary user and then fail as root.

Thanks,
Cem Karan
--
https://mail.python.org/mailman/listinfo/python-list
Re: Survey: improving the Python std lib docs
On May 16, 2017, at 12:36 PM, rzed <rzan...@gmail.com> wrote:
> On Friday, May 12, 2017 at 6:02:58 AM UTC-4, Steve D'Aprano wrote:
>> One of the more controversial aspects of the Python ecosystem is the Python
>> docs. Some people love them, and some people hate them and describe them as
>> horrible.
>
> [...]
>
> One thing I would love to see in any function or class docs is a few example
> invocations, preferably non-trivial. If I need to see more, I can read the
> entire doc, but most times I just want a refresher on how the function is
> called. Does it use keywords? Are there required nameless parameters? In what
> order? A line or two would immediately clarify that most of the time.
>
> Apart from that, links to docs for uncommon functions (or to the docs of the
> module, if there are many) would be at least somewhat useful.

I'd like to see complete signatures in the docstrings, so when I use help() on something that has *args or **kwargs I can see what the arguments actually are.

Thanks,
Cem Karan
--
https://mail.python.org/mailman/listinfo/python-list
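[The signature problem described above can be demonstrated with the stdlib `inspect` module: a plain `*args, **kwargs` wrapper hides the wrapped function's real signature, and `functools.wraps` restores it for `help()`/`inspect.signature`. A small sketch; the function names here are invented for illustration:]

```python
import functools
import inspect

def log_calls_opaque(fn):
    def wrapper(*args, **kwargs):       # signature information is lost here
        return fn(*args, **kwargs)
    return wrapper

def log_calls(fn):
    @functools.wraps(fn)                # sets __wrapped__, so inspect/help()
    def wrapper(*args, **kwargs):       # can recover the real signature
        return fn(*args, **kwargs)
    return wrapper

def area(width, height, *, scale=1.0):
    "Return width * height * scale."
    return width * height * scale

print(inspect.signature(log_calls_opaque(area)))  # (*args, **kwargs)
print(inspect.signature(log_calls(area)))         # (width, height, *, scale=1.0)
```

help() uses the same machinery, so the decorated-with-wraps version reads much better at the REPL.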
Re: Battle of the garbage collectors, or ARGGHHHHHH!!!!
On Apr 24, 2017, at 8:54 PM, Jon Ribbens <jon+use...@unequivocal.eu> wrote:
> On 2017-04-24, CFK <cfkar...@gmail.com> wrote:
>> Long version: I'm trying to write bindings for python via ctypes to control
>> a library written in C that uses the bdwgc garbage collector (
>> http://www.hboehm.info/gc/). The bindings mostly work, except for when
>> either bdwgc or python's garbage collector decide to get into an argument
>> over what is garbage and what isn't, in which case I get a segfault because
>> one or the other collector has already reaped the memory.
>
> Make your Python C objects contain a pointer to a
> GC_MALLOC_UNCOLLECTABLE block that contains a pointer to the
> bwdgc object it's an interface to? And GC_FREE it in tp_dealloc?
> Then bwdgc won't free any C memory that Python is referencing.

OK, I realized today that there was a miscommunication somewhere. My python code is all pure python, and the library is pure C; the library is not designed to be called from python specifically (it's intended to be language neutral, so if someone wants to call it from a different language, they can). That means that tp_dealloc (which is part of the python C API) is probably not going to work.

I got interrupted (again), so I didn't have a chance to try the next trick and register the ctypes objects as roots from which to scan in bdwgc, but I'm hoping that roots aren't removed. If that works, I'll post it to the list.

Thanks,
Cem Karan
--
https://mail.python.org/mailman/listinfo/python-list
Re: Battle of the garbage collectors, or ARGGHHHHHH!!!!
On Apr 24, 2017, at 8:54 PM, Jon Ribbens <jon+use...@unequivocal.eu> wrote:
> On 2017-04-24, CFK <cfkar...@gmail.com> wrote:
>> Long version: I'm trying to write bindings for python via ctypes to control
>> a library written in C that uses the bdwgc garbage collector (
>> http://www.hboehm.info/gc/). The bindings mostly work, except for when
>> either bdwgc or python's garbage collector decide to get into an argument
>> over what is garbage and what isn't, in which case I get a segfault because
>> one or the other collector has already reaped the memory.
>
> Make your Python C objects contain a pointer to a
> GC_MALLOC_UNCOLLECTABLE block that contains a pointer to the
> bwdgc object it's an interface to? And GC_FREE it in tp_dealloc?
> Then bwdgc won't free any C memory that Python is referencing.

That's a really clever idea… I'm not near the machine that I could test it on right now, but I'll give it a shot tomorrow and see how it works. I'll let everyone know what I find out.

Thanks,
Cem Karan
--
https://mail.python.org/mailman/listinfo/python-list
Re: Battle of the garbage collectors, or ARGGHHHHHH!!!!
On Apr 24, 2017, at 6:59 PM, Terry Reedy <tjre...@udel.edu> wrote:
> On 4/24/2017 6:24 PM, CFK wrote:
>> TLDR version: the bdwgc garbage collector (http://www.hboehm.info/gc/) and
>> python's collector are not playing nice with one another, and I need to
>> make them work with each other.
>>
>> Long version: I'm trying to write bindings for python via ctypes to control
>> a library written in C that uses the bdwgc garbage collector (
>> http://www.hboehm.info/gc/). The bindings mostly work, except for when
>> either bdwgc or python's garbage collector decide to get into an argument
>> over what is garbage and what isn't, in which case I get a segfault because
>> one or the other collector has already reaped the memory. I need the two
>> sides to play nice with one another. I can think of two solutions:
>>
>> First, I can replace Python's garbage collector via the functions described
>> at https://docs.python.org/3/c-api/memory.html#customize-memory-allocators
>> so that they use the bdwgc functions instead. However, this leads me to a
>> whole series of questions:
>>
>>    1. Has anyone done anything like this before?
>
> I know that experiments have been done.
> Have you tried searching 'Python bdwgc garbage collection' or similar?

I did google around a bit, but the results I found weren't relevant. I was hoping someone else on the list had tried, and simply hadn't gotten around to posting about it anywhere yet.

Thanks,
Cem Karan
--
https://mail.python.org/mailman/listinfo/python-list
Re: Python replace multiple strings (m*n) combination
Another possibility is to form a suffix array (https://en.wikipedia.org/wiki/Suffix_array#Applications) as an index for the string, and then search for patterns within the suffix array. The basic idea is that you index the string you're searching over once, and then look for patterns within it.

The main problem with this method is how you're doing the replacements. If your replacement text can create a new string that matches a different regex that occurs later on, then you really should use what INADA Naoki suggested.

Thanks,
Cem Karan

On Feb 25, 2017, at 2:08 PM, INADA Naoki <songofaca...@gmail.com> wrote:
> If you can use a third party library, I think you can use the Aho-Corasick
> algorithm.
>
> https://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_algorithm
>
> https://pypi.python.org/pypi/pyahocorasick/
>
> On Sat, Feb 25, 2017 at 3:54 AM, <kar6...@gmail.com> wrote:
>> I have a task to search for multiple patterns in an incoming string and
>> replace with matched patterns; I'm storing all patterns as keys in a dict
>> and replacements as values. I'm using regex for compiling all the patterns
>> and using the sub method on the pattern object for replacement. But the
>> problem is I have tens of millions of rows that I need to check for
>> patterns, of which there are about 1000, and this turns out to be a very
>> expensive operation.
>>
>> What can be done to optimize it? Also, I have special characters for
>> matching; where can I specify raw string combinations?
>>
>> For example, if the search string is not a variable we can say
>> re.search(r"\$%^search_text", "replace_text", "some_text"), but when I read
>> from the dict, where should I place the "r" prefix? Unfortunately putting
>> it inside the key doesn't work.
>>
>> Pseudo code:
>>
>> for string in genobj_of_million_strings:
>>     pattern = re.compile('|'.join(regex_map.keys()))
>>     return pattern.sub(lambda x: regex_map[x], string)
>> --
>> https://mail.python.org/mailman/listinfo/python-list
> --
> https://mail.python.org/mailman/listinfo/python-list
--
https://mail.python.org/mailman/listinfo/python-list
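[For reference, the quoted pseudo-code has three separate problems: it compiles the pattern once per input string, it passes a match object (not a string) as the dict key, and it doesn't escape special characters like `$%^`. A minimal corrected sketch, with made-up patterns:]

```python
import re

# Hypothetical pattern -> replacement table; keys are literal strings.
regex_map = {"$%^search_text": "replace_text", "foo": "bar"}

# Compile ONCE, outside the loop, escaping each key so characters
# like $ % ^ are matched literally (no "r" prefix gymnastics needed).
pattern = re.compile("|".join(re.escape(k) for k in regex_map))

def replace_all(strings):
    for s in strings:
        # m.group(0) is the matched text, which is a valid dict key.
        yield pattern.sub(lambda m: regex_map[m.group(0)], s)

print(list(replace_all(["some $%^search_text here", "foo fighters"])))
# ['some replace_text here', 'bar fighters']
```

One caveat: with overlapping keys, alternation picks the leftmost alternative, so sorting keys longest-first before joining is safer; for ~1000 patterns over millions of rows, the Aho-Corasick suggestion above will scale better still.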
Re: PTH files: Abs paths not working as expected. Symlinks needed?
On Feb 16, 2017, at 9:55 PM, Rustom Mody <rustompm...@gmail.com> wrote:
> On Friday, February 17, 2017 at 3:24:32 AM UTC+5:30, Terry Reedy wrote:
>> On 2/15/2017 7:42 AM, poseidon wrote:
>>
>>> what are pth files for?
>>
>> They are for extending (mainly) lib/site-packages.
>
> Hey Terry!
> This needs to get into more public docs than a one-off post on a newsgroup/ML

+1! This is the first I've heard of this, and it sounds INCREDIBLY useful!

Thanks,
Cem Karan
--
https://mail.python.org/mailman/listinfo/python-list
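[A quick way to see what a .pth file does: each non-comment, non-import line in it names a directory to append to sys.path. The stdlib `site.addsitedir` runs the same machinery that is applied to site-packages at startup, so it can demo the behavior without touching the real installation; the directory names below are invented:]

```python
import pathlib
import site
import sys
import tempfile

base = pathlib.Path(tempfile.mkdtemp())
extra = base / "my_vendored_libs"            # hypothetical package directory
extra.mkdir()

# A .pth file whose single line is the directory to add to sys.path.
(base / "example.pth").write_text(str(extra) + "\n")

site.addsitedir(str(base))                   # processes *.pth files in base
print(str(extra) in sys.path)                # True
```

Dropping the same one-line .pth file into the real lib/site-packages has the same effect on every interpreter startup.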
Who owns the memory in ctypes?
Hi all, I'm hoping that this will be an easy question. I have a pile of C code that I wrote that I want to interface to via the ctypes module (https://docs.python.org/3/library/ctypes.html). The C code uses the Boehm-Demers-Weiser garbage collector (http://www.hboehm.info/gc/) for all of its memory management.

What I want to know is, who owns allocated memory? That is, if my C code allocates memory via GC_MALLOC() (the standard call for allocating memory in the garbage collector), and I access some object via ctypes in python, will the python garbage collector assume that it owns it and attempt to dispose of it when it goes out of scope? Ideally, the memory is owned by the side that created it, with the other side simply referencing it, but I want to be sure before I invest a lot of time interfacing the two sides together.

Thanks,
Cem Karan
--
https://mail.python.org/mailman/listinfo/python-list
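[For what it's worth, ctypes itself follows the ownership rule being hoped for here: Python frees only buffers that ctypes allocated, and treats foreign pointers as plain addresses it will never deallocate. A small sketch of the two cases, using a ctypes-owned buffer to stand in for C-owned memory:]

```python
import ctypes

# Python-owned: ctypes allocated this buffer, so Python's GC frees it
# when the last reference goes away.
buf = ctypes.create_string_buffer(b"hello from python")
print(buf.value)                    # b'hello from python'

# "C-owned" view: casting to a pointer type just reinterprets the address.
# ctypes never calls free() (or GC_FREE) on memory behind a pointer it was
# handed; a pointer returned by a GC_MALLOC'd C object is treated the same way.
view = ctypes.cast(buf, ctypes.POINTER(ctypes.c_char * 5))
print(bytes(view.contents))         # b'hello' -- a view, not a new owner
```

The real hazard runs the other direction: a conservative collector like bdwgc can only keep C memory alive if it can see a reference to it, and addresses squirreled away inside Python objects live on Python's heap, which bdwgc does not scan unless told to (e.g. by registering roots).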
Re: Byte code descriptions somewhere?
On Oct 1, 2016, at 7:34 PM, breamore...@gmail.com wrote:
> On Saturday, October 1, 2016 at 11:57:17 PM UTC+1, Cem Karan wrote:
>> Hi all, I've all of a sudden gotten interested in the CPython interpreter,
>> and started trying to understand how it ingests and runs byte code. I found
>> Include/opcode.h in the python sources, and I found some basic documentation
>> on how to add in new opcodes online, but I haven't found the equivalent of
>> an assembly manual like you might for x86, etc. Is there something similar
>> to a manual dedicated to python byte code? Also, is there a manual for how
>> the interpreter expects the stack, etc. to be setup so that all interactions
>> go as expected (garbage collections works, exceptions work, etc.)?
>> Basically, I want a manual similar to what Intel or AMD might put out for
>> their chips so that all executables behave nicely with one another.
>>
>> Thanks,
>> Cem Karan
>
> Further to Ben Finney's answer this
> https://docs.python.org/devguide/compiler.html should help.
>
> Kindest regards.
>
> Mark Lawrence.
> --
> https://mail.python.org/mailman/listinfo/python-list

Thank you!

Thanks,
Cem Karan
--
https://mail.python.org/mailman/listinfo/python-list
Re: Byte code descriptions somewhere?
On Oct 1, 2016, at 8:30 PM, Ned Batchelder <n...@nedbatchelder.com> wrote:
> On Saturday, October 1, 2016 at 7:48:09 PM UTC-4, Cem Karan wrote:
>> Cool, thank you! Quick experimentation suggests that I don't need to worry
>> about marking anything for garbage collection, correct? The next question
>> is, how do I create a stream of byte codes that can be interpreted by
>> CPython directly? I don't mean 'use the compile module', I mean writing my
>> own byte array with bytes that CPython can directly interpret.
>
> In Python 2, you use new.code:
> https://docs.python.org/2/library/new.html#new.code
> It takes a bytestring of byte codes as one of its twelve (!) arguments.
>
> Something that might help (indirectly) with understanding bytecode:
> byterun (https://github.com/nedbat/byterun) is a pure-Python implementation
> of a Python bytecode VM.
>
> --Ned.

byterun seems like the perfect project to work through to understand things. Thank you for pointing it out!

Thanks,
Cem Karan
--
https://mail.python.org/mailman/listinfo/python-list
Re: Byte code descriptions somewhere?
On Oct 1, 2016, at 7:56 PM, Chris Angelico <ros...@gmail.com> wrote:
> On Sun, Oct 2, 2016 at 10:47 AM, Cem Karan <cfkar...@gmail.com> wrote:
>> Cool, thank you! Quick experimentation suggests that I don't need to worry
>> about marking anything for garbage collection, correct? The next question
>> is, how do I create a stream of byte codes that can be interpreted by
>> CPython directly? I don't mean 'use the compile module', I mean writing my
>> own byte array with bytes that CPython can directly interpret.
>
> "Marking for garbage collection" in CPython is done by refcounts; the
> bytecode is at a higher level than that.
>
>>>> dis.dis("x = y*2")
>   1           0 LOAD_NAME                0 (y)
>               3 LOAD_CONST               0 (2)
>               6 BINARY_MULTIPLY
>               7 STORE_NAME               1 (x)
>              10 LOAD_CONST               1 (None)
>              13 RETURN_VALUE
>
> A LOAD operation will increase the refcount (a ref is on the stack),
> BINARY_MULTIPLY dereferences the multiplicands and adds a ref to the
> product, STORE will deref whatever previously was stored, etc.
>
> To execute your own code, look at types.FunctionType and
> types.CodeType, particularly the latter's 'codestring' argument
> (stored as the co_code attribute). Be careful: you can easily crash
> CPython if you mess this stuff up :)

Ah, but crashing things is how we learn! :) That said, types.CodeType and types.FunctionType appear to be EXACTLY what I'm looking for! Thank you! Although I have to admit, the built-in docs for types.CodeType are concerning... "Create a code object. Not for the faint of heart." Maybe that should be updated to "Here there be dragons"?

I'll poke through python's sources to get an idea of how to use the codestring argument, but I'll probably be asking more questions on here about it.

Thanks,
Cem Karan
--
https://mail.python.org/mailman/listinfo/python-list
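[As a gentler starting point than hand-assembling co_code (whose constructor arguments and encoding change between CPython versions), an existing code object can be rebound with types.FunctionType; a small sketch:]

```python
import types

def template():
    return x * 2          # 'x' is a global here, resolved at call time

# Reuse the already-compiled code object, but supply our own globals dict
# and a new function name.
doubled = types.FunctionType(template.__code__, {"x": 21}, "doubled")
print(doubled())          # 42

# The raw bytecode stream itself is just an immutable bytes object:
print(type(template.__code__.co_code))   # <class 'bytes'>
```

This sidesteps the dozen-argument CodeType constructor entirely while still letting you poke at `co_code`, `co_consts`, and friends on real code objects.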
Re: Byte code descriptions somewhere?
I kind of got the feeling that was so from reading the docs in the source code. Too bad! :(

Cem

On Oct 1, 2016, at 7:53 PM, Paul Rubin <no.email@nospam.invalid> wrote:
> Cem Karan <cfkar...@gmail.com> writes:
>> how do I create a stream of byte codes that can be interpreted by
>> CPython directly?
>
> Basically, study the already existing code and do something similar.
> The CPython bytecode isn't standardized like JVM bytecode. It's
> designed for the interpreter's convenience, not officially documented,
> and (somewhat) subject to change between versions.
> --
> https://mail.python.org/mailman/listinfo/python-list
--
https://mail.python.org/mailman/listinfo/python-list
Re: Byte code descriptions somewhere?
Cool, thank you! Quick experimentation suggests that I don't need to worry about marking anything for garbage collection, correct? The next question is, how do I create a stream of byte codes that can be interpreted by CPython directly? I don't mean 'use the compile module', I mean writing my own byte array with bytes that CPython can directly interpret.

Thanks,
Cem Karan

On Oct 1, 2016, at 7:02 PM, Ben Finney <ben+pyt...@benfinney.id.au> wrote:
> Cem Karan <cfkar...@gmail.com> writes:
>
>> Hi all, I've all of a sudden gotten interested in the CPython
>> interpreter, and started trying to understand how it ingests and runs
>> byte code.
>
> That sounds like fun!
>
>> Is there something similar to a manual dedicated to python byte code?
>
> The Python documentation for the ‘dis’ module shows not only how to use
> that module for dis-assembly of Python byte code, but also a reference
> for the byte code.
>
>     32.12. dis — Disassembler for Python bytecode
>
>     <URL:https://docs.python.org/3/library/dis.html>
>
> --
>  \        “Skepticism is the highest duty and blind faith the one |
>   `\      unpardonable sin.” —Thomas Henry Huxley, _Essays on |
> _o__)     Controversial Questions_, 1889 |
> Ben Finney
>
> --
> https://mail.python.org/mailman/listinfo/python-list
--
https://mail.python.org/mailman/listinfo/python-list
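[The dis module mentioned above also has a programmatic interface, which is handy when poking at bytecode from a script rather than at the REPL. Note that opcode names shift between CPython versions (e.g. BINARY_ADD became the generic BINARY_OP in 3.11), which is itself evidence that the bytecode is not a stable, documented ISA:]

```python
import dis

def add(a, b):
    return a + b

# Structured access to each instruction, instead of printed disassembly.
names = [ins.opname for ins in dis.get_instructions(add)]
print(names)   # e.g. ['LOAD_FAST', 'LOAD_FAST', 'BINARY_ADD', 'RETURN_VALUE']
```

`dis.get_instructions` yields `Instruction` tuples carrying the opcode, argument, and argument interpretation, so it doubles as a quick reference while reading Include/opcode.h.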
Byte code descriptions somewhere?
Hi all, I've all of a sudden gotten interested in the CPython interpreter, and started trying to understand how it ingests and runs byte code. I found Include/opcode.h in the python sources, and I found some basic documentation on how to add in new opcodes online, but I haven't found the equivalent of an assembly manual like you might find for x86, etc. Is there something similar to a manual dedicated to python byte code? Also, is there a manual for how the interpreter expects the stack, etc. to be set up so that all interactions go as expected (garbage collection works, exceptions work, etc.)? Basically, I want a manual similar to what Intel or AMD might put out for their chips so that all executables behave nicely with one another.

Thanks,
Cem Karan
--
https://mail.python.org/mailman/listinfo/python-list
Re: Abusive Italian Spam
Honestly, I'm impressed by how little spam ever makes it onto the list. Considering the absolute flood of email the lists get, it's impressive work. Thank you for all the hard work you guys do for all the rest of us!

Thanks,
Cem Karan

On Sep 29, 2016, at 11:30 AM, Tim Golden <m...@timgolden.me.uk> wrote:
> You may have noticed one or two more of the abusive spam messages slip
> through onto the list. We do have traps for these but, as with most such
> things, they need tuning. (We've discarded many more than you've seen).
>
> As ever, kudos to Mark Sapiro of the Mailman team for tweaking our
> custom filters and sorting out the archives in a timely fashion.
>
> TJG
> --
> https://mail.python.org/mailman/listinfo/python-list
--
https://mail.python.org/mailman/listinfo/python-list
Re: The Joys Of Data-Driven Programming
On Aug 31, 2016, at 9:02 AM, Paul Moore <p.f.mo...@gmail.com> wrote:
> On 31 August 2016 at 13:49, Cem Karan <cfkar...@gmail.com> wrote:
>>> Has anyone else found this to be the case? Is there any "make replacement"
>>> out there that focuses more on named sets of actions (maybe with
>>> prerequisite/successor type interdependencies), and less on building file
>>> dependency graphs?
>>
>> Maybe Ninja (https://ninja-build.org/)? I personally like it because of how
>> simple it is, and the fact that it doesn't use leading tabs the way that
>> make does. It is intended to be the assembler for higher-level build
>> systems which are more like compilers. I personally use it as a make
>> replacement because it does what I tell it to do, and nothing else. It may
>> fit what you're after.
>
> It still seems focused on the file dependency graph (at least, from a
> quick look).
>
> I'm thinking more of the makefile pattern
>
> myproj.whl:
>     pip wheel .
>
> ve: build
>     virtualenv ve
>     ve/bin/python -m pip install ./*.whl
>
> test: ve
>     pushd ve
>     bin/python -m py.test
>     popd
>
> clean:
>     rm -rf ve
>
> Basically, a couple of "subcommands", one of which has 2 prerequisites
> that are run if needed. Little more in practice than 2 shell scripts
> with a bit of "if this is already done, skip" logic.
>
> Most makefiles I encounter or write are of this form, and make
> essentially no use of dependency rules or anything more complex than
> "does the target already exist" checks. Make would be fine for this
> except for the annoying "must use tabs" rule, and the need to rely on
> shell (= non-portable, generally unavailable on Windows) constructs
> for any non-trivial logic.
>
> In the days when make was invented, not compiling a source file whose
> object file was up to date was a worthwhile time saving. Now I'm more
> likely to just do "cc -c *.c" and not worry about it.

OK, I see what you're doing, and you're right, Ninja could be forced to do what you want, but it isn't the tool that you need.

Thanks,
Cem Karan
--
https://mail.python.org/mailman/listinfo/python-list
Re: The Joys Of Data-Driven Programming
On Aug 31, 2016, at 8:21 AM, Paul Moore <p.f.mo...@gmail.com> wrote:
> On Sunday, 21 August 2016 15:20:39 UTC+1, Marko Rauhamaa wrote:
>>> Aren’t makefiles data-driven?
>>
>> Yes, "make" should be added to my sin list.
>>
>>> [Personally Ive always believed that jam is better than make and is
>>> less used for entirely historical reasons; something like half the
>>> world eoling with crlf and half with lf. But maybe make is really a
>>> better design because more imperative?]
>>
>> Don't know jam, but can heartily recommend SCons.
>
> The data driven side of make is the target: sources part. But (particularly
> as a Python programmer, where build dependencies are less of an issue) a huge
> part of make usage is, in my experience, simply name: actions pairs (which is
> the less data driven aspect), maybe with an element of "always do X before Y".
>
> I've generally found "make successors" like SCons and waf to be less useful,
> precisely because they focus on the dependency graph (the data driven side)
> and less on the trigger-action aspect.
>
> Has anyone else found this to be the case? Is there any "make replacement"
> out there that focuses more on named sets of actions (maybe with
> prerequisite/successor type interdependencies), and less on building file
> dependency graphs?

Maybe Ninja (https://ninja-build.org/)? I personally like it because of how simple it is, and the fact that it doesn't use leading tabs the way that make does. It is intended to be the assembler for higher-level build systems which are more like compilers. I personally use it as a make replacement because it does what I tell it to do, and nothing else. It may fit what you're after.

Thanks,
Cem Karan
--
https://mail.python.org/mailman/listinfo/python-list
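[For concreteness, the "name: actions" style under discussion maps onto Ninja roughly like this. This is a hedged sketch with invented rule and target names; note that Ninja still models these as file targets (it reruns a rule when its output file is missing), which is exactly the file-graph mismatch raised in the follow-up:]

```ninja
rule venv
  command = virtualenv ve && ve/bin/python -m pip install ./*.whl

rule pytest
  command = ve/bin/python -m py.test

build ve: venv
build test: pytest | ve
```

Since "test" is never actually created as a file, Ninja treats it as perpetually out of date, which approximates make's phony-target behavior without quite being designed for it.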
Re: Quote of the day
On May 17, 2016, at 4:30 AM, Marko Rauhamaa <ma...@pacujo.net> wrote:
> Radek Holý <rad...@holych.org>:
>
>> 2016-05-17 9:50 GMT+02:00 Steven D'Aprano
>> <steve+comp.lang.pyt...@pearwood.info>:
>>
>>> Overheard in the office today:
>>>
>>> "I don't have time to learn an existing library - much faster to make
>>> my own mistakes!"
>>
>> *THUMBS UP* At least they are aware of that "own mistakes" part... Not
>> like my employer...
>
> Also:
>
>     With a third party solution I don't need to fix the bugs.
>
>     But with an in-house solution I at least *can* fix the bugs.
>
> The feeling of powerlessness can be crushing when you depend on a
> third-party component that is broken with no fix in sight.

+1000 on this one. Just downloaded and used a library that came with unit tests, which all passed. When I started using it, I kept getting odd errors. Digging into it, I discovered they had commented out the bodies of some of the unit tests... glad it was open source, at least I *could* dig into the code and figure out what was going on :/

Thanks,
Cem Karan
--
https://mail.python.org/mailman/listinfo/python-list
Re: Guido on python3 for beginners
On Feb 18, 2016, at 4:57 AM, Chris Angelico <ros...@gmail.com> wrote:
> On Thu, Feb 18, 2016 at 7:40 PM, Terry Reedy <tjre...@udel.edu> wrote:
>> To my mind, the numerous duplications and overlaps in 2.7 that are gone in
>> 3.x make 2.7 the worse version ever for beginners.
>
> Hmm. I was teaching on 2.7 up until last year, and for the most part,
> we taught a "compatible with Py3" subset of the language, without any
> significant cost. If you'd shown code saying "except ValueError, e:"
> to one of my Py2 students then, s/he would have been just as
> unfamiliar as one of my Py3 students would be today. That said,
> though, it's still that Py3 is no worse than Py2, and potentially
> better.
>
> The removal of L suffixes (and, similarly, the removal of u"..."
> prefixes on text strings) is a bigger deal to newbies than it is to
> experienced programmers, so that one definitely counts. "This is
> great, but how can I remove that u from the strings?" was a common
> question (eg when they're printing out a list of strings obtained from
> a database, or decoded from JSON).
>
> The removal of old-style classes is a definite improvement in Py3, as
> is the no-arg form of super(), which I'd consider a related change. So
> there's a bunch of tiny little "quality of life" improvements here.
>
> ChrisA

I agree with Chris on all his points. My personal feeling is that Py3 is the way to go for teaching in the future; it's just that little bit more consistent across the board, and the things that are confusing are not things that beginners will need to know about.

About the only place I've read that Py2 has a slight advantage is for scripts, where you're suddenly surprised by Py2 starting up when you've been using a Py3 interactive interpreter. For me, I'd probably give my students a block of code that they are asked to copy at the start of their files to test for Py2 or Py3, and to raise an exception on Py2. After that, I just wouldn't worry about it.

Thanks,
Cem Karan
--
https://mail.python.org/mailman/listinfo/python-list
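[The guard block described can be as small as this; a sketch of what such a handout snippet might contain:]

```python
import sys

# Fail fast if the script was accidentally started under Python 2.
if sys.version_info[0] < 3:
    raise RuntimeError(
        "This code requires Python 3; you started it with Python %d.%d"
        % sys.version_info[:2]
    )

print("running under Python %d.%d" % sys.version_info[:2])
```

Because the check uses only syntax and attributes common to both versions, it runs (and raises cleanly) even on Py2 instead of dying with a confusing SyntaxError further down the file.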
Re: Heap Implementation
On Feb 10, 2016, at 1:23 PM, "Sven R. Kunze" <srku...@mail.de> wrote: > Hi Cem, > > On 08.02.2016 02:37, Cem Karan wrote: >> My apologies for not writing sooner, but work has been quite busy lately >> (and likely will be for some time to come). > > no problem here. :) > >> I read your approach, and it looks pretty good, but there may be one issue >> with it; how do you handle the same item being pushed into the heap more >> than once? In my simple simulator, I'll push the same object into my event >> queue multiple times in a row. The priority is the moment in the future >> when the object will be called. As a result, items don't have unique >> priorities. I know that there are methods of handling this from the >> client-side (tuples with unique counters come to mind), but if your library >> can handle it directly, then that could be useful to others as well. > > I've pondered about that in the early design phase. I considered it a > slowdown for my use-case without benefit. > > Why? Because I always push a fresh object ALTHOUGH it might be equal > comparing attributes (priority, deadline, etc.). > > > That's the reason why I need to ask again: why pushing the same item on a > heap? > > > Are we talking about function objects? If so, then your concern is valid. > Would you accept a solution that would involve wrapping the function in > another object carrying the priority? Would you prefer a wrapper that's > defined by xheap itself so you can just use it? Yes. I use priority queues for event loops. The items I push in are callables (sometimes callbacks, sometimes objects with __call__()) and the priority is the simulation date that they should be called. I push the same item multiple times in a row because it will modify itself by the call (e.g., the location of an actor is calculated by its velocity and the date). 
There are certain calls that I tend to push in all at once because the math for calculating when the event should occur is somewhat expensive, and always returns multiple dates at once. That is also why deleting or changing events can be useful; I know that at least some of those events will be canceled in the future. Note that it is also possible to cancel an event by marking it as cancelled, and then simply not executing it when you pop it off the queue, but I've found that there are a few cases in my simulations where the number of dead events in the queue exceeds the number of live events, which does have an impact on memory and operational speed (maintaining the heap invariant). The difference isn't large, but I need FAST code to deal with the size of my simulations (thousands to tens of thousands of actors, over hundreds of millions of simulation runs), which is why I finally had to give up on Python and switch to pure C. Having a wrapper defined by xheap would be ideal; I suspect that I won't be the only one who needs to deal with this, so having it centrally located would be best. It may also make it possible for you to optimize xheap's behavior in some way.

Thanks, Cem Karan -- https://mail.python.org/mailman/listinfo/python-list
Re: Heap Implementation
On Feb 9, 2016, at 4:40 AM, Mark Lawrence <breamore...@yahoo.co.uk> wrote: > On 09/02/2016 04:25, Cem Karan wrote: >> >> No problem, that's what I thought happened. And you're right, I'm looking >> for a priority queue (not the only reason to use a heap, but a pretty >> important reason!) >> > > I'm assuming I've missed the explanation, so what is the problem again with > https://docs.python.org/3/library/queue.html#queue.PriorityQueue or even > https://docs.python.org/3/library/asyncio-queue.html#asyncio.PriorityQueue ?

Efficiently changing the priority of items already in the queue, or deleting items from the queue (not just the first item). This comes up a LOT in event-based simulators where it's easier to tentatively add an event knowing that you might need to delete it or change it later.

Thanks, Cem Karan -- https://mail.python.org/mailman/listinfo/python-list
Re: Heap Implementation
On Feb 9, 2016, at 9:27 AM, Mark Lawrence <breamore...@yahoo.co.uk> wrote: > On 09/02/2016 11:44, Cem Karan wrote: >> >> On Feb 9, 2016, at 4:40 AM, Mark Lawrence <breamore...@yahoo.co.uk> wrote: >> >>> On 09/02/2016 04:25, Cem Karan wrote: >>>> >>>> No problem, that's what I thought happened. And you're right, I'm looking >>>> for a priority queue (not the only reason to use a heap, but a pretty >>>> important reason!) >>>> >>> >>> I'm assuming I've missed the explanation, so what is the problem again with >>> https://docs.python.org/3/library/queue.html#queue.PriorityQueue or even >>> https://docs.python.org/3/library/asyncio-queue.html#asyncio.PriorityQueue ? >> >> Efficiently changing the priority of items already in the queue/deleting >> items in the queue (not the first item). This comes up a LOT in event-based >> simulators where it's easier to tentatively add an event knowing that you >> might need to delete it or change it later. >> >> Thanks, >> Cem Karan >> > > Thanks for that, but from the sounds of it sooner you than me :)

Eh, it's not too bad once you figure out how to do it. It's easier in C though; you can use pointer tricks that let you find the element in constant time, and then removal involves figuring out how to fix up your heap after you've removed the element.

Thanks, Cem Karan -- https://mail.python.org/mailman/listinfo/python-list
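The "find the element in constant time, then fix up the heap" trick works in Python too, by keeping a dict from item to heap index. A minimal sketch (class and method names invented for illustration; items are assumed unique and hashable):

```python
class RemovableHeap:
    """Min-heap that tracks each item's position so an arbitrary item
    can be removed in O(log n).  Illustrative sketch only."""

    def __init__(self):
        self._heap = []
        self._index = {}   # item -> position in self._heap

    def push(self, item):
        self._heap.append(item)
        self._index[item] = len(self._heap) - 1
        self._sift_up(len(self._heap) - 1)

    def pop(self):
        # The smallest item is at the root; remove it like any other.
        return self.remove(self._heap[0])

    def remove(self, item):
        # O(1) lookup via the index dict, then O(log n) fix-up.
        pos = self._index.pop(item)
        last = self._heap.pop()
        if pos < len(self._heap):        # removed item wasn't the tail
            self._heap[pos] = last
            self._index[last] = pos
            self._sift_down(pos)
            self._sift_up(pos)
        return item

    def _swap(self, i, j):
        h = self._heap
        h[i], h[j] = h[j], h[i]
        self._index[h[i]] = i
        self._index[h[j]] = j

    def _sift_up(self, pos):
        while pos > 0:
            parent = (pos - 1) // 2
            if self._heap[pos] < self._heap[parent]:
                self._swap(pos, parent)
                pos = parent
            else:
                break

    def _sift_down(self, pos):
        n = len(self._heap)
        while True:
            child = 2 * pos + 1
            if child >= n:
                break
            if child + 1 < n and self._heap[child + 1] < self._heap[child]:
                child += 1
            if self._heap[child] < self._heap[pos]:
                self._swap(pos, child)
                pos = child
            else:
                break
```

The index dict is what replaces the C pointer: it turns "search the whole heap for the element" into a hash lookup, after which the usual sift operations restore the invariant.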
Re: Heap Implementation
On Feb 9, 2016, at 8:27 PM, srinivas devaki <mr.eightnotei...@gmail.com> wrote: > > > On Feb 10, 2016 6:11 AM, "Cem Karan" <cfkar...@gmail.com> wrote: > > > > Eh, its not too bad once you figure out how to do it. It's easier in C > > though; you can use pointer tricks that let you find the element in > > constant time, and then removal will involve figuring out how to fix up > > your heap after you've removed the element. > > > > If you can do it with C pointers then you can do it with python's > references/mutable objects. :) > in case of immutable objects, use a light mutable wrapper or better use list > for performance.

I should have been clearer; it's easier to UNDERSTAND in C, but you can implement it in either language. C will still be faster, but only because it's compiled. It will also take a lot longer to code and to ensure that it's correct, but that is the tradeoff.

Thanks, Cem Karan -- https://mail.python.org/mailman/listinfo/python-list
Re: Heap Implementation
On Feb 7, 2016, at 10:15 PM, srinivas devaki <mr.eightnotei...@gmail.com> wrote: > On Feb 8, 2016 7:07 AM, "Cem Karan" <cfkar...@gmail.com> wrote: > > I know that there are methods of handling this from the client-side (tuples > > with unique counters come to mind), but if your library can handle it > > directly, then that could be useful to others as well. > > yeah it is a good idea to do at client side. > but if it should be introduced as feature into the library, instead of > tuples, we should just piggyback a single counter it to the self._indexes > dict, or better make another self._counts dict which will be light and fast. > and if you think again with this method you can easily subclass with just > using self._counts dict in your subclass. but still I think it is good to > introduce it as a feature in the library. > > Regards > Srinivas Devaki Just to be 100% sure, you do mean to use the counters as UUIDs, right? I don't mean that the elements in the heap get counted, I meant that the counter is a trick to separate different instances of (item, priority) pairs when you're pushing in the same item multiple times, but with different priorities. Thanks, Cem Karan -- https://mail.python.org/mailman/listinfo/python-list
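The counter trick under discussion can be sketched with the standard heapq module: pushing (priority, count, item) tuples means two entries never fall through to comparing the items themselves, so the same (possibly unorderable) item can be pushed many times. The helper names here are made up for illustration:

```python
import heapq
import itertools

_counter = itertools.count()

def push(heap, item, priority):
    # The unique, monotonically increasing count breaks ties between
    # equal priorities, so 'item' itself is never compared.
    heapq.heappush(heap, (priority, next(_counter), item))

def pop(heap):
    priority, _count, item = heapq.heappop(heap)
    return item, priority

events = []
actor = object()          # an unorderable object, pushed twice in a row
push(events, actor, 5.0)
push(events, actor, 5.0)  # same item, same priority: no comparison error
```

A side effect of the counter is that entries with equal priorities pop in insertion order, which is usually what an event loop wants anyway.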
Re: Heap Implementation
On Feb 8, 2016, at 10:12 PM, srinivas devaki <mr.eightnotei...@gmail.com> wrote: > > On Feb 8, 2016 5:17 PM, "Cem Karan" <cfkar...@gmail.com> wrote: > > > > On Feb 7, 2016, at 10:15 PM, srinivas devaki <mr.eightnotei...@gmail.com> > > wrote: > > > On Feb 8, 2016 7:07 AM, "Cem Karan" <cfkar...@gmail.com> wrote: > > > > I know that there are methods of handling this from the client-side > > > > (tuples with unique counters come to mind), but if your library can > > > > handle it directly, then that could be useful to others as well. > > > > > > yeah it is a good idea to do at client side. > > > but if it should be introduced as feature into the library, instead of > > > tuples, we should just piggyback a single counter it to the self._indexes > > > dict, or better make another self._counts dict which will be light and > > > fast. > > > and if you think again with this method you can easily subclass with just > > > using self._counts dict in your subclass. but still I think it is good > > > to introduce it as a feature in the library. > > > > > > Regards > > > Srinivas Devaki > > > > I meant that the counter is a trick to separate different instances of > > (item, priority) pairs when you're pushing in the same item multiple times, > > but with different priorities. > > oh okay, I'm way too off. > > what you are asking for is a Priority Queue like feature. > > but the emphasis is on providing extra features to heap data structure. > > and xheap doesn't support having duplicate items. > > and if you want to insert same items with distinct priorities, you can > provide the priority with key argument to the xheap. what xheap doesn't > support is having same keys/priorities. > So I got confused and proposed a method to have same keys. > > Regards > Srinivas Devaki No problem, that's what I thought happened. And you're right, I'm looking for a priority queue (not the only reason to use a heap, but a pretty important reason!) 
Thanks, Cem Karan -- https://mail.python.org/mailman/listinfo/python-list
Re: A sets algorithm
On Feb 7, 2016, at 4:46 PM, Paulo da Silva <p_s_d_a_s_i_l_v_a...@netcabo.pt> wrote: > Hello! > > This may not be a strict python question, but ... > > Suppose I have already a class MyFile that has an efficient method (or > operator) to compare two MyFile s for equality. > > What is the most efficient way to obtain all sets of equal files (of > course each set must have more than one file - all single files are > discarded)? > > Thanks for any suggestions.

If you're after strict equality (every byte in a pair of files is identical), then here are a few heuristics that may help you:

1) Test for file length, without reading in the whole file. You can use os.path.getsize() to do this (I hope that this is a constant-time operation, but I haven't tested it). As Oscar Benjamin suggested, you can create a defaultdict(list) which will make it possible to gather lists of files of equal size. This should help you gather your potentially identical files quickly.

2) Once you have your dictionary from above, you can iterate over its values, each of which will be a list. If a list has only one file in it, you know it's unique, and you don't have to do any more work on it. If there are two or more files in the list, then you have several different options:

a) Use Chris Angelico's suggestion and hash each of the files (use the standard library's 'hashlib' for this). Identical files will always have identical hashes, but there may be false positives, so you'll need to verify that files that have identical hashes are indeed identical.

b) If your files tend to have sections that are very different (e.g., the first 32 bytes tend to be different), then you can treat that section of the file as its hash. You can then do the same trick as above. (The advantage of this is that you will read in a lot less data than if you have to hash the entire file.)

c) You may be able to do something clever by reading portions of each file.
That is, use zip() combined with read(1024) to read each of the files in sections, while keeping hashes of the files. Or, maybe you'll be able to read portions of them and sort the list as you're reading. In either case, if any files are NOT identical, then you'll be able to stop work as soon as you figure this out, rather than having to read the entire file at once. The main purpose of these suggestions is to reduce the amount of reading you're doing. Storage tends to be slow, and any tricks that reduce the number of bytes you need to read in will be helpful to you. Good luck! Cem Karan -- https://mail.python.org/mailman/listinfo/python-list
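Heuristics 1) and 2a) can be sketched with the standard library alone (the function name is invented; a SHA-256 collision is astronomically unlikely, but add a final byte-for-byte comparison if you need absolute certainty):

```python
import hashlib
import os
from collections import defaultdict

def find_duplicate_sets(filenames):
    """Group candidate files by size, then by SHA-256 digest.
    Returns a list of groups, each containing 2+ equal files."""
    by_size = defaultdict(list)
    for name in filenames:
        by_size[os.path.getsize(name)].append(name)

    duplicate_sets = []
    for names in by_size.values():
        if len(names) < 2:
            continue                       # unique size -> unique file
        by_hash = defaultdict(list)
        for name in names:
            h = hashlib.sha256()
            with open(name, 'rb') as f:
                # Read in chunks so huge files don't exhaust memory.
                for chunk in iter(lambda: f.read(1 << 16), b''):
                    h.update(chunk)
            by_hash[h.hexdigest()].append(name)
        for group in by_hash.values():
            if len(group) > 1:
                duplicate_sets.append(group)
    return duplicate_sets
```

The size pass is the cheap filter: only files that share a length with another file ever get read, which is exactly the "reduce the amount of reading" goal described above.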
Re: Heap Implementation
On Jan 30, 2016, at 5:47 PM, Sven R. Kunze <srku...@mail.de> wrote: > Hi again, > > as the topic of the old thread actually was fully discussed, I dare to open a > new one. > > I finally managed to finish my heap implementation. You can find it at > https://pypi.python.org/pypi/xheap + https://github.com/srkunze/xheap. > > I described my motivations and design decisions at > http://srkunze.blogspot.com/2016/01/fast-object-oriented-heap-implementation.html > > > @Cem > You've been worried about a C implementation. I can assure you that I did not > intend to rewrite the incredibly fast and well-tested heapq implementation. I > just re-used it. > > I would really be grateful for your feedback as you have first-hand > experience with heaps.

My apologies for not writing sooner, but work has been quite busy lately (and likely will be for some time to come). I read your approach, and it looks pretty good, but there may be one issue with it; how do you handle the same item being pushed into the heap more than once? In my simple simulator, I'll push the same object into my event queue multiple times in a row. The priority is the moment in the future when the object will be called. As a result, items don't have unique priorities. I know that there are methods of handling this from the client-side (tuples with unique counters come to mind), but if your library can handle it directly, then that could be useful to others as well. Thanks, Cem Karan -- https://mail.python.org/mailman/listinfo/python-list
Re: me, my arm, my availability ...
On Jan 13, 2016, at 3:47 PM, Laura Creighton <l...@openend.se> wrote: > > I fell recently. Ought to be nothing, but a small chip of bone, either an > existing one or one I just made is nicely wedged in the joint taking away > a whole lot of the ability of my arm to rotate in the elbow joint. Or > hold my arm in a position that is usual for typing. Plus, now that the > sprain/swelling is more or less over, the pain, unfortunately is not. > > The real downside is that my typing speed is down from 135-140 wpm > to 5-10 wmp. At this rate, just getting my usual work done takes > overtime. > > Seems like surgery is needed to fix this. > > So I wanted you all to know, no, I haven't forgotten you and no haven't > stopped caring. I have just stopped being as __capable__ if you know > what I mean. > > Please take care of yourselves and each other. I will often be reading > even if typing is more than I can do right now. > > Laura > > ps -- (recent tutor discussion) I am with Alan and not with Mark. I > am happy as anything when people post their not-quite-working code for > homework assignments here to tutor. They aren't lazy bastards wanting > somebody to do their assignments for them, they want to learn why what > they are trying to do isn't working. Sounds perfect for tutor to me. Good luck healing! Hope you get better soon. Surgery has gotten a WHOLE lot better recently, they did wonders for my knee a few years back. With luck, it'll be more or less outpatient surgery. Good luck, Cem Karan -- https://mail.python.org/mailman/listinfo/python-list
Re: How to remove item from heap efficiently?
On Jan 13, 2016, at 2:08 PM, Sven R. Kunze <srku...@mail.de> wrote: > On 13.01.2016 12:20, Cem Karan wrote: >> On Jan 12, 2016, at 11:18 AM, "Sven R. Kunze" <srku...@mail.de> wrote: >> >>> Thanks for replying here. I've come across these types of >>> wrappers/re-implementations of heapq as well when researching this issue. :) >>> >>> Unfortunately, they don't solve the underlying issue at hand which is: >>> "remove item from heap with unknown index" and be efficient at it (by not >>> using _heapq C implementation). >>> >>> >>> So, I thought I did another wrapper. ;) It at least uses _heapq (if >>> available otherwise heapq) and lets you remove items without violating the >>> invariant in O(log n). I am going to make that open-source on pypi and see >>> what people think of it. >> Is that so? I'll be honest, I never tested its asymptotic performance, I >> just assumed that he had a dict coupled with a heap somehow, but I never >> looked into the code. > > My concern about that specific package is a missing C-implementation. I feel > that somewhat defeats the whole purpose of using a heap: performance. I agree with you that performance is less than that of using a C extension module, but there are other costs associated with developing a C module: 1) As the developer of the module, you must be very careful to ensure your code is portable. 2) Distribution becomes somewhat more difficult; you may need to distribute both source and compiled binaries for various platforms. This is somewhat more annoying than pure python scripts. 3) Debugging can become significantly more difficult. My current codebase is python+cython+c, and when something crashes, it is usually easier to use a bunch of printf() statements to figure out what is going on than to use a debugger (others may have different experiences, this is just mine). 4) Not everyone is familiar with C, so writing extensions may be more difficult. 
5) Will the extension module work on non-cpython platforms (iron python, jython, etc.)? Finally, without profiling the complete package it may be difficult to tell what impact your C module will have on overall performance. In my code, HeapDict had less than a 2% performance impact on what I was doing; even if I had replaced it with a pure C implementation, my code would not have run much faster. So, while I agree in principle to what you're saying, in practice there may be other factors to consider before rejecting the pure python approach. > Asymptotic performance is still O(log n). So, if the intent is to pop events more often than to peek at them, then in practice, HeapDict is about the same as some clever heap+dict method (which it might be, as I said, I haven't looked at the code). >> That said, IMHO using a dict interface is the way to go for priority queues; >> it really simplified my code using it! This is my not-so-subtle way of >> asking you to adopt the MutableMapping interface for your wrapper ;) > > Could you elaborate on this? What simplified you code so much? > > I have been using heaps for priority queues as well but haven't missed the > dict interface so far. Maybe, my use-case is different. I'm writing an event-based simulator, and as it turns out, it is much easier to tentatively add events than it is to figure out precisely which events will occur in the future. That means that on a regular basis I need to delete events as I determine that they are garbage. HeapDict did a good job of that for me (for completely unrelated reasons I decided to switch to a pure-C codebase, with python hooks to twiddle the simulator at a few, very rare, points in time; hence the python+cython+c comment above). Thanks, Cem Karan -- https://mail.python.org/mailman/listinfo/python-list
Re: How to remove item from heap efficiently?
On Jan 12, 2016, at 11:18 AM, "Sven R. Kunze" <srku...@mail.de> wrote: > On 12.01.2016 03:48, Cem Karan wrote: >> >> Jumping in late, but... >> >> If you want something that 'just works', you can use HeapDict: >> >> http://stutzbachenterprises.com/ >> >> I've used it in the past, and it works quite well. I haven't tested its >> asymptotic performance though, so you might want to check into that. > > Thanks for replying here. I've come across these types of > wrappers/re-implementations of heapq as well when researching this issue. :) > > Unfortunately, they don't solve the underlying issue at hand which is: > "remove item from heap with unknown index" and be efficient at it (by not > using _heapq C implementation). > > > So, I thought I did another wrapper. ;) It at least uses _heapq (if available > otherwise heapq) and lets you remove items without violating the invariant in > O(log n). I am going to make that open-source on pypi and see what people > think of it. Is that so? I'll be honest, I never tested its asymptotic performance, I just assumed that he had a dict coupled with a heap somehow, but I never looked into the code. That said, IMHO using a dict interface is the way to go for priority queues; it really simplified my code using it! This is my not-so-subtle way of asking you to adopt the MutableMapping interface for your wrapper ;) Thanks, Cem Karan -- https://mail.python.org/mailman/listinfo/python-list
Re: How to remove item from heap efficiently?
On Jan 11, 2016, at 9:53 AM, srinivas devaki <mr.eightnotei...@gmail.com> wrote: > On Jan 11, 2016 12:18 AM, "Sven R. Kunze" <srku...@mail.de> wrote: >> Indeed. I already do the sweep method as you suggested. ;) >> >> Additionally, you provided me with a reasonable condition when to do the > sweep in order to achieve O(log n). Thanks much for that. I currently used > a time-bases approached (sweep each 20 iterations). >> >> PS: Could you add a note on how you got to the condition ( > 2*self.useless_b > len(self.heap_b))? >> > > oh that's actually simple, > that condition checks if more than half of heap is useless items. > the sweep complexity is O(len(heap)), so to keep the extra amortized > complexity as O(1), we have to split that work(virtually) with O(len(heap)) > operations, so when our condition becomes true we have done len(heap) > operations, so doing a sweep at that time means we splitted that > work(O(len(heap))) with every operation. Jumping in late, but... If you want something that 'just works', you can use HeapDict: http://stutzbachenterprises.com/ I've used it in the past, and it works quite well. I haven't tested its asymptotic performance though, so you might want to check into that. Thanks, Cem Karan -- https://mail.python.org/mailman/listinfo/python-list
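The sweep condition discussed above (rebuild once useless items outnumber live ones) can be sketched as follows. The class and attribute names are invented for illustration, and (priority, item) entries are assumed unique:

```python
import heapq

class SweepingHeap:
    """Lazy deletion: removed entries are only marked dead, and the
    heap is rebuilt once more than half of it is dead.  Illustrative
    sketch of the amortized-O(1) sweep discussed above."""

    def __init__(self):
        self._heap = []
        self._dead = set()

    def push(self, priority, item):
        heapq.heappush(self._heap, (priority, item))

    def discard(self, priority, item):
        self._dead.add((priority, item))
        if 2 * len(self._dead) > len(self._heap):
            self._sweep()

    def pop(self):
        # Skip over dead entries as they surface at the root.
        while True:
            entry = heapq.heappop(self._heap)
            if entry in self._dead:
                self._dead.discard(entry)
            else:
                return entry

    def _sweep(self):
        # O(len(heap)), but triggered only after ~len(heap)/2 discards,
        # so the cost amortizes to O(1) per discard.
        self._heap = [e for e in self._heap if e not in self._dead]
        heapq.heapify(self._heap)
        self._dead.clear()
```

This is the memory/speed trade-off Cem mentions elsewhere in the thread: dead entries linger, but never more than half the heap.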
Re: How can I count word frequency in a web site?
You might want to look into Beautiful Soup (https://pypi.python.org/pypi/beautifulsoup4), which is an HTML screen-scraping tool. I've never used it, but I've heard good things about it. Good luck, Cem Karan On Nov 29, 2015, at 7:49 PM, ryguy7272 <ryanshu...@gmail.com> wrote: > I'm trying to figure out how to count words in a web site. Here is a sample > of the link I want to scrape data from and count specific words. > http://finance.yahoo.com/q/h?s=STRP+Headlines > > I only want to count certain words, like 'fraud', 'lawsuit', etc. I want to > have a way to control for specific words. I have a couple Python scripts > that do this for a text file, but not for a web site. I can post that, if > that's helpful. > > -- > https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
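For the counting itself, the standard library is enough once the HTML has been fetched (say, with urllib). A sketch of that part, with invented names (TextExtractor, count_words) and no Beautiful Soup dependency:

```python
import re
from collections import Counter
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect only the visible text, skipping script/style blocks."""
    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0
    def handle_starttag(self, tag, attrs):
        if tag in ('script', 'style'):
            self._skip += 1
    def handle_endtag(self, tag):
        if tag in ('script', 'style') and self._skip:
            self._skip -= 1
    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)

def count_words(html, words):
    # Count how often each word of interest occurs in the page text.
    parser = TextExtractor()
    parser.feed(html)
    text = ' '.join(parser.chunks).lower()
    counts = Counter(re.findall(r'[a-z]+', text))
    return {w: counts[w] for w in words}
```

The words-of-interest list is passed in explicitly, which matches the "I want to have a way to control for specific words" requirement; the existing text-file scripts could likely feed their word lists straight into count_words().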
Re: sys path modification
On Jul 27, 2015, at 1:24 PM, neubyr neu...@gmail.com wrote: I am trying to understand sys.path working and best practices for managing it within a program or script. Is it fine to modify sys.path using sys.path.insert(0, EXT_MODULES_DIR)? One stackoverflow answer - http://stackoverflow.com/a/10097543 - suggests that it may break external 3rd-party code since, by convention, the first item of the sys.path list, path[0], is the directory containing the script that was used to invoke the Python interpreter. So what are best practices to prepend sys.path in the program itself? Any further elaboration would be helpful.

Why are you trying to modify sys.path? I'm not judging, there are many good reasons to do so, but there may be safer ways of getting the effect you want that don't rely on modifying sys.path. One simple method is to modify PYTHONPATH (https://docs.python.org/3/using/cmdline.html#envvar-PYTHONPATH) instead. In order of preference:

1) Append to sys.path. This will cause you the fewest headaches.

2) If you absolutely have to insert into the list, insert after the first element. As you noted from SO, and noted in the docs (https://docs.python.org/3/library/sys.html#sys.path), the first element of sys.path is the path to the directory of the script itself. If you modify this, you **will** break third-party code at some point.

Thanks, Cem Karan -- https://mail.python.org/mailman/listinfo/python-list
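Concretely, the two options look like this (EXT_MODULES_DIR is a hypothetical path, standing in for whatever directory holds the extra modules):

```python
import sys

EXT_MODULES_DIR = '/opt/myapp/modules'   # hypothetical directory

# Option 1 (preferred): append, so the stdlib and the script's own
# directory still take precedence.
sys.path.append(EXT_MODULES_DIR)

# Option 2: if the directory must shadow other installed packages,
# insert after sys.path[0] (the script's directory), never before it.
sys.path.insert(1, EXT_MODULES_DIR)
```

Either way, sys.path[0] stays untouched, which is what keeps third-party code that relies on the convention from breaking.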
Re: Using Python instead of Bash
I help someone that has problems reading. For this I take photos of text, use convert from ImageMagick to make a good contrast (original paper is grey) and use lpr to print it a little bigger. Normally I would implement this in Bash, but I thought it a good idea to implement it in Python. This is my first try:

import glob
import subprocess

threshold = 66
count = 0
for input in sorted(glob.glob('*.JPG')):
    count += 1
    output = '{0:02d}.png'.format(count)
    print('Going to convert {0} to {1}'.format(input, output))
    p = subprocess.Popen(['convert', '-threshold', '{0}%'.format(threshold), input, output])
    p.wait()
    print('Going to print {0}'.format(output))
    p = subprocess.Popen(['lpr', '-o', 'fit-to-page', '-o', 'media=A4', output])
    p.wait()

There have to be some improvements: display before printing, possibility to change the threshold, … But is this a good start, or should I do it differently?

As a first try, I think it's pretty good, but to really answer your question, I think we could use a little more information.

- Are you using Python 2, or Python 3? There are slightly easier ways to do this using concurrent.futures objects, but they are only available under Python 3. (See https://docs.python.org/3/library/concurrent.futures.html)

- In either case, subprocess.call(), subprocess.check_call(), or subprocess.check_output() may be easier to use. That said, your code is perfectly fine! The only real difference is that subprocess.call() will automatically wait for the call to complete, so you don't need to use p.wait() from above. (See https://docs.python.org/2.7/library/subprocess.html, and https://docs.python.org/3/library/subprocess.html)

The following code does the conversion in parallel, and submits the jobs to the printer serially. That should ensure that the printed output is also in sorted order, but you might want to double check before relying on it too much.
The major problem with it is that you can't display the output before printing; since everything is running in parallel, you'll have race conditions if you try. **I DID NOT TEST THIS CODE, I JUST TYPED IT OUT IN MY MAIL CLIENT!** Please test it carefully before relying on it!

import subprocess
import concurrent.futures
import glob
import os.path

_THRESHOLD = 66

def _collect_filenames():
    files = glob.glob('*.JPG')
    # I build a set of the real paths so that if you have
    # symbolic links that all point to the same file, they
    # are automatically collapsed to a single file
    real_files = {os.path.realpath(x) for x in files}
    base_files = [os.path.splitext(x)[0] for x in real_files]
    return base_files

def _convert(base_file_name):
    """This code is slightly different from your code.  Instead of
    using numbers as names, I use the base name of the file and append
    '.png' to it.  You may need to adjust this to ensure you don't
    overwrite anything."""
    input = base_file_name + '.JPG'
    output = base_file_name + '.png'
    subprocess.call(['convert', '-threshold', '{0}%'.format(_THRESHOLD), input, output])

def _print_files_in_order(base_files):
    base_files.sort()
    for f in base_files:
        output = f + '.png'
        subprocess.call(['lpr', '-o', 'fit-to-page', '-o', 'media=A4', output])

def driver():
    base_files = _collect_filenames()
    # If you use an executor as a context manager, then the
    # executor will wait until all of the submitted jobs finish
    # before it returns.  The submitted jobs will execute in
    # parallel.
    with concurrent.futures.ProcessPoolExecutor() as executor:
        for f in base_files:
            executor.submit(_convert, f)
    _print_files_in_order(base_files)

Thanks, Cem Karan -- https://mail.python.org/mailman/listinfo/python-list
Re: Hello Group and how to practice?
On May 31, 2015, at 9:35 AM, Anders Johansen sko...@gmail.com wrote: Hi my name is Anders I am from Denmark, and I am new to programming and python. Currently, I am doing the codecademy.com python course, but sometime I feel that the course advances to fast and I lack repeating (practicing) some of the concepts, however I don't feel confident enough to start programming on my own. Do you guys have some advice to how I can practicing programming and get the concept in under the skin? Choose something that you think is small and easy to do, and try to do it. When you have trouble, read the docs you find online, and ask us questions; the python community is pretty friendly, and if you show us what you've already tried to do, someone is likely to try to help you out. The main thing is to not get discouraged. One of the hardest things to do is figuring out what you CAN do with a computer; some things that look like they should be easy, are actually major research questions. Just keep trying, and it will get easier over time. Good luck! Cem Karan -- https://mail.python.org/mailman/listinfo/python-list
Re: Hello Group and how to practice?
On May 31, 2015, at 10:51 AM, Anders Johansen sko...@gmail.com wrote: On Sunday, May 31, 2015 at 4:22:10 PM UTC+2, Cem Karan wrote: On May 31, 2015, at 9:35 AM, Anders Johansen sko...@gmail.com wrote: Hi my name is Anders I am from Denmark, and I am new to programming and python. Currently, I am doing the codecademy.com python course, but sometime I feel that the course advances to fast and I lack repeating (practicing) some of the concepts, however I don't feel confident enough to start programming on my own. Do you guys have some advice to how I can practicing programming and get the concept in under the skin? Choose something that you think is small and easy to do, and try to do it. When you have trouble, read the docs you find online, and ask us questions; the python community is pretty friendly, and if you show us what you've already tried to do, someone is likely to try to help you out. The main thing is to not get discouraged. One of the hardest things to do is figuring out what you CAN do with a computer; some things that look like they should be easy, are actually major research questions. Just keep trying, and it will get easier over time. Good luck! Cem Karan Thank you Cem Karan for your reply. I will try and follow your advice. I am yet to install python on my computer, do you know of any easy to follow instructions on how to do so? Where do I start?

Python 3 installers are here: https://www.python.org/downloads/release/python-343/ Choose the one appropriate for your system, and follow the instructions. If you have trouble, write to the list. Good luck, Cem Karan -- https://mail.python.org/mailman/listinfo/python-list
Re: Fixing Python install on the Mac after running 'CleanMyMac'
On May 28, 2015, at 11:47 PM, Laura Creighton l...@openend.se wrote: webmas...@python.org just got some mail from some poor embarrassed soul who ran this program and broke their Python install. They are running Mac OSX 10.7.5 They are getting: Utility has encountered a fatal error, and will now terminate. A Python runtime could not be located. You may need to install a framework build of Python or edit the PyRuntimeLocations array in this application's info.plist file. Then there are two oblong circles. One says Open Console. The other says Terminate. So https://docs.python.org/2/using/mac.html says: The Apple-provided build of Python is installed in /System/Library/Frameworks/Python.framework and /usr/bin/python, respectively. You should never modify or delete these, as they are Apple-controlled and are used by Apple- or third-party software. So, I assume this poor soul has done precisely that. What do I tell her to do now?

Does she have a recent Time Machine backup that she can restore from? Otherwise the solutions are all fairly painful:

1) Install Python 2.7 from scratch (easy). Then figure out where to put symlinks that point back to the install (mildly annoying/hard). Note that Python 3 won't work; none of the built-in scripts expect it.

2) OS X recovery - http://www.macworld.co.uk/how-to/mac/how-reinstall-mac-os-x-using-internet-recovery-3593641/ I've never had to do that, so I have no idea how easy/reliable it is. I **think** it's supposed to save all the data on the drive, but again, I've not done this, so I can't make any guarantees.

3) Wipe it clean and reinstall from scratch.

Honestly, I hope she has a Time Machine backup. I've had to do recoveries a couple of times, and it can really save you. Good luck, Cem Karan -- https://mail.python.org/mailman/listinfo/python-list
Re: Best approach to create humongous amount of files
On May 20, 2015, at 7:44 AM, Parul Mogra scoria@gmail.com wrote: Hello everyone, My objective is to create a large number of data files (say a million *.json files), using a pre-existing template file (*.json). Each file would have a unique name, possibly by incorporating time stamp information. The files have to be generated in a folder specified. What is the best strategy to achieve this task, so that the files will be generated in the shortest possible time? Say within an hour.

If you absolutely don't care about the name, then something like the following will work:

import uuid

for counter in range(1000000):
    with open(uuid.uuid1().hex.upper() + '.json', 'w') as f:
        f.write(templateString)

where templateString is the template you want to write to each file. The only problem is that the files won't be in any particular order; they'll just be uniquely named. As a test, I ran the code above, but I killed the loop after about 10 minutes, at which point about 500,000 files were created. Note that my laptop is about 6 years old, so you might get better performance on your machine.

Thanks, Cem Karan -- https://mail.python.org/mailman/listinfo/python-list
Re: To pickle or not to pickle
What are you using pickle for? If this is just for yourself, go for it. If you're planning on interchanging with different languages/platforms/etc., JSON or XML might be better. If you're after something that is smaller and faster, maybe MessagePack or Google Protocol Buffers. If you're after something that can hold a planet's worth of data, maybe HDF5. It really depends on your use-case. MessagePack - http://en.wikipedia.org/wiki/MessagePack Google Protocol Buffers - http://en.wikipedia.org/wiki/Protocol_Buffers HDF5 - http://en.wikipedia.org/wiki/Hierarchical_Data_Format Thanks, Cem Karan On May 8, 2015, at 5:58 AM, Cecil Westerhof ce...@decebal.nl wrote: I first used marshal in my filebasedMessages module. Then I read that you should not use it, because it changes per Python version and it was better to use pickle. So I did that and now I find: https://wiki.python.org/moin/Pickle Is it really that bad and should I change again? -- Cecil Westerhof Senior Software Engineer LinkedIn: http://www.linkedin.com/in/cecilwesterhof -- https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
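The portability trade-off above is easy to see with plain data, where pickle and JSON both round-trip; a sketch (sizes and speeds will of course vary with real data):

```python
import json
import pickle

record = {"name": "example", "values": [1, 2, 3]}

# pickle: Python-only format, but handles nearly arbitrary Python objects
as_pickle = pickle.dumps(record)
assert pickle.loads(as_pickle) == record

# JSON: readable from virtually any language, but limited to
# dicts/lists/strings/numbers/bools/None
as_json = json.dumps(record)
assert json.loads(as_json) == record
```

Once the data contains custom classes, the JSON version needs hand-written encoding/decoding while pickle keeps working, which is the crux of the "just for yourself vs. interchange" question.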
Re: Diff between object graphs?
On Apr 23, 2015, at 11:05 AM, Steve Smaldone smald...@gmail.com wrote:

On Thu, Apr 23, 2015 at 6:34 AM, Cem Karan cfkar...@gmail.com wrote:

On Apr 23, 2015, at 1:59 AM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote:

On Thursday 23 April 2015 11:53, Cem Karan wrote:

Precisely. In order to make my simulations more realistic, I use a lot of random numbers. I can fake things by keeping the seed to the generator, but if I want to do any sort of hardware in the loop simulations, then that approach won't work.

That's exactly why we have *pseudo* random number generators. They are statistically indistinguishable from real randomness, but repeatable when needed.

Which is why I mentioned keeping the seed above. The problem is that I eventually want to do hardware in the loop, which will involve IO between the simulation machine and the actual robots, and IO timing is imprecise and uncontrollable. That is where not recording something becomes lossy. That said, the mere act of trying to record everything is going to cause timing issues, so I guess I'm over-thinking things yet again. Thanks for the help everyone, it's helped me clarify what I need to do in my mind.

Well, you could achieve this on Linux by using the rdiff library. Not exactly a purely Python solution, but it would give you file-based diffs. Basically, what you could do is write the first file. Then for each subsequent save, write out the file (as a temp file) and issue shell commands (via the Python script) to calculate the diffs of the new file against the first (basis) file. Once you remove the temp files, you'd have a full first save and a set of diffs against that file. You could rehydrate any save you want by applying the diff to the basis. If you work on it a bit, you might even be able to avoid the temp file saves by using pipes in the shell command.
Of course, I haven't tested this so there may be non-obvious issues with diffing between subsequent pickled saves, but it seems that it should work on the surface. That might work... although I'm running on OS X right now, once I get to the hardware in the loop part, it's all going to be some flavor of Linux. I'll look into it... thanks! Thanks, Cem Karan -- https://mail.python.org/mailman/listinfo/python-list
Re: Diff between object graphs?
On Apr 23, 2015, at 1:59 AM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote:

On Thursday 23 April 2015 11:53, Cem Karan wrote:

Precisely. In order to make my simulations more realistic, I use a lot of random numbers. I can fake things by keeping the seed to the generator, but if I want to do any sort of hardware in the loop simulations, then that approach won't work.

That's exactly why we have *pseudo* random number generators. They are statistically indistinguishable from real randomness, but repeatable when needed.

Which is why I mentioned keeping the seed above. The problem is that I eventually want to do hardware in the loop, which will involve IO between the simulation machine and the actual robots, and IO timing is imprecise and uncontrollable. That is where not recording something becomes lossy. That said, the mere act of trying to record everything is going to cause timing issues, so I guess I'm over-thinking things yet again. Thanks for the help everyone, it's helped me clarify what I need to do in my mind.

Thanks,
Cem Karan
--
https://mail.python.org/mailman/listinfo/python-list
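The repeatability being discussed is easy to demonstrate; a sketch (this is exactly the part that hardware-in-the-loop I/O timing *cannot* capture, which is the point of the exchange):

```python
import random

# Two generators seeded identically produce identical streams, which is
# what makes a pure-software simulation replayable from just the seed.
rng_a = random.Random(1234)
rng_b = random.Random(1234)

run_a = [rng_a.random() for _ in range(5)]
run_b = [rng_b.random() for _ in range(5)]

assert run_a == run_b  # same seed, same sequence
```

Using per-simulation Random instances, rather than the module-level functions, also keeps unrelated code from perturbing the stream.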
Re: Diff between object graphs?
On Apr 22, 2015, at 8:53 AM, Peter Otten __pete...@web.de wrote: Cem Karan wrote: Hi all, I need some help. I'm working on a simple event-based simulator for my dissertation research. The simulator has state information that I want to analyze as a post-simulation step, so I currently save (pickle) the entire simulator every time an event occurs; this lets me analyze the simulation at any moment in time, and ask questions that I haven't thought of yet. The problem is that pickling this amount of data is both time-consuming and a space hog. This is true even when using bz2.open() to create a compressed file on the fly. This leaves me with two choices; first, pick the data I want to save, and second, find a way of generating diffs between object graphs. Since I don't yet know all the questions I want to ask, I don't want to throw away information prematurely, which is why I would prefer to avoid scenario 1. So that brings up possibility two; generating diffs between object graphs. I've searched around in the standard library and on pypi, but I haven't yet found a library that does what I want. Does anyone know of something that does? Basically, I want something with the following ability: Object_graph_2 - Object_graph_1 = diff_2_1 Object_graph_1 + diff_2_1 = Object_graph_2 The object graphs are already pickleable, and the diffs must be, or this won't work. I can use deepcopy to ensure the two object graphs are completely separate, so the diffing engine doesn't need to worry about that part. Anyone know of such a thing? A poor man's approach: Do not compress the pickled data, check it into version control. Getting the n-th state then becomes checking out the n-th revision of the file. I have no idea how much space you save that way, but it's simple enough to give it a try. Sounds like a good approach, I'll give it a shot in the morning. Another slightly more involved idea: Make the events pickleable, and save the simulator only for every 100th (for example) event. 
To restore the 7531th state load pickle 7500 and apply events 7501 to 7531. I was hoping to avoid doing this as I lose information. BUT, its likely that this will be the best approach regardless of what other methods I use; there is just too much data. Thanks, Cem Karan -- https://mail.python.org/mailman/listinfo/python-list
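Peter's checkpoint-and-replay idea can be sketched with a toy simulator (all names here are hypothetical, not the dissertation code). It recovers state exactly when event processing is deterministic, which is the caveat raised later in the thread; pickling the RNG along with the rest of the state handles the pseudo-random part of that:

```python
import pickle
import random

class Simulator(object):
    """Toy stand-in for the real event-based simulator."""

    def __init__(self, seed):
        self.rng = random.Random(seed)  # random.Random instances pickle cleanly
        self.total = 0

    def apply_event(self, event):
        # Deterministic given the stored RNG state
        self.total += event + self.rng.randrange(10)

events = list(range(50))
sim = Simulator(seed=42)

checkpoints = {}
history = []
for i, event in enumerate(events):
    if i % 10 == 0:
        checkpoints[i] = pickle.dumps(sim)  # full snapshot every 10th event
    sim.apply_event(event)
    history.append(sim.total)

# Rebuild the state just after event 23: load checkpoint 20, replay 20..23
replayed = pickle.loads(checkpoints[20])
for event in events[20:24]:
    replayed.apply_event(event)

assert replayed.total == history[23]
```

The storage cost drops by roughly the checkpoint interval, at the price of replaying up to that many events per query.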
Re: Diff between object graphs?
On Apr 22, 2015, at 9:56 PM, Dave Angel da...@davea.name wrote: On 04/22/2015 09:46 PM, Chris Angelico wrote: On Thu, Apr 23, 2015 at 11:37 AM, Dave Angel da...@davea.name wrote: On 04/22/2015 09:30 PM, Cem Karan wrote: On Apr 22, 2015, at 8:53 AM, Peter Otten __pete...@web.de wrote: Another slightly more involved idea: Make the events pickleable, and save the simulator only for every 100th (for example) event. To restore the 7531th state load pickle 7500 and apply events 7501 to 7531. I was hoping to avoid doing this as I lose information. BUT, its likely that this will be the best approach regardless of what other methods I use; there is just too much data. Why would that lose any information??? It loses information if event processing isn't perfectly deterministic. Quite right. But I hadn't seen anything in this thread to imply that. My apologies, that's my fault. I should have mentioned that in the first place. Thanks, Cem Karan -- https://mail.python.org/mailman/listinfo/python-list
Re: Diff between object graphs?
On Apr 22, 2015, at 9:46 PM, Chris Angelico ros...@gmail.com wrote: On Thu, Apr 23, 2015 at 11:37 AM, Dave Angel da...@davea.name wrote: On 04/22/2015 09:30 PM, Cem Karan wrote: On Apr 22, 2015, at 8:53 AM, Peter Otten __pete...@web.de wrote: Another slightly more involved idea: Make the events pickleable, and save the simulator only for every 100th (for example) event. To restore the 7531th state load pickle 7500 and apply events 7501 to 7531. I was hoping to avoid doing this as I lose information. BUT, its likely that this will be the best approach regardless of what other methods I use; there is just too much data. Why would that lose any information??? It loses information if event processing isn't perfectly deterministic. Precisely. In order to make my simulations more realistic, I use a lot of random numbers. I can fake things by keeping the seed to the generator, but if I want to do any sort of hardware in the loop simulations, then that approach won't work. Thanks, Cem Karan -- https://mail.python.org/mailman/listinfo/python-list
Diff between object graphs?
Hi all, I need some help. I'm working on a simple event-based simulator for my dissertation research. The simulator has state information that I want to analyze as a post-simulation step, so I currently save (pickle) the entire simulator every time an event occurs; this lets me analyze the simulation at any moment in time, and ask questions that I haven't thought of yet. The problem is that pickling this amount of data is both time-consuming and a space hog. This is true even when using bz2.open() to create a compressed file on the fly. This leaves me with two choices; first, pick the data I want to save, and second, find a way of generating diffs between object graphs. Since I don't yet know all the questions I want to ask, I don't want to throw away information prematurely, which is why I would prefer to avoid scenario 1. So that brings up possibility two; generating diffs between object graphs. I've searched around in the standard library and on pypi, but I haven't yet found a library that does what I want. Does anyone know of something that does? Basically, I want something with the following ability: Object_graph_2 - Object_graph_1 = diff_2_1 Object_graph_1 + diff_2_1 = Object_graph_2 The object graphs are already pickleable, and the diffs must be, or this won't work. I can use deepcopy to ensure the two object graphs are completely separate, so the diffing engine doesn't need to worry about that part. Anyone know of such a thing? Thanks, Cem Karan -- https://mail.python.org/mailman/listinfo/python-list
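For the diff itself, one poor man's pure-Python approach (a sketch, not a real object-graph differ): compress each new pickle using the previous pickle as a zlib preset dictionary, so the byte runs shared between consecutive saves become cheap back-references. Note the preset dictionary only covers the trailing 32 KB of the basis, so this degrades for very large states:

```python
import pickle
import zlib

def make_delta(basis, new):
    # Byte runs shared with `basis` compress to tiny back-references
    comp = zlib.compressobj(zdict=basis)
    return comp.compress(new) + comp.flush()

def apply_delta(basis, delta):
    decomp = zlib.decompressobj(zdict=basis)
    return decomp.decompress(delta)

state_1 = {"tick": 1, "robots": list(range(1000))}
state_2 = dict(state_1, tick=2)  # nearly identical to state_1

p1 = pickle.dumps(state_1)
p2 = pickle.dumps(state_2)

delta = make_delta(p1, p2)
assert pickle.loads(apply_delta(p1, delta)) == state_2
assert len(delta) < len(p2)  # the delta is far smaller than the full pickle
```

This satisfies both equations in the post (graph_2 - graph_1 = delta; graph_1 + delta = graph_2) at the byte level, and the deltas are themselves plain bytes, hence pickleable.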
Good PDF parser/form filler?
Hi all, I'm currently looking for a PDF parser/writer library so I can programmatically fill in some PDF forms. I've found PyPDF2 (https://pypi.python.org/pypi/PyPDF2/1.24) and reportlab (https://pypi.python.org/pypi/reportlab), and I can see that there are a LOT more PDF frameworks out there on pypi, but I wanted to know what kinds of experiences others have had with them so I can choose a reasonably good one. Note that I'm not creating brand-new PDF files, but filling in ones I've already gotten.

My requirements:

- Must work with python 3.4
- Must work on OS X (only a real problem for extension classes, etc.)
- Ideally pure python with few dependencies.
- NOT shoveling data out to the internet! MUST be wholly contained on my machine!

Thanks in advance for any help!

Thanks,
Cem Karan
--
https://mail.python.org/mailman/listinfo/python-list
Re: Installed Python 3 on Mac OS X Yosemite but its still Python 2.7
On Mar 7, 2015, at 6:39 PM, James Dekker james.dek...@gmail.com wrote: I am currently running OS X Yosemite (10.10.2) on my MacBook Pro... By default, Apple ships Python 2.7.6 on Yosemite. Just downloaded and ran this installer for Python 3: python-3.4.3-macosx10.6.pkg When I opened up my Terminal and typed in python, this is what came up: Python 2.7.6 (default, Sep 9 2014, 15:04:36) [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.39)] on darwin Type help, copyright, credits or license for more information. Sorry, I am very new to Python... Question(s): (1) Does anyone know where the Python 3.4.3 interpreter was installed? It should be installed as either python3 or python3.4. To figure out which, type 'python' in the terminal, and hit tab twice. It should bring up a list of python interpreters you have installed. (2) Do I need to uninstall Python 2.7.3 (if so, how do I go about doing this) before setting a global environmental variable such as PYTHON_HOME to the location of the installed Python 3.4.3? You don't need to uninstall python 2.7, and you shouldn't try. I tried it as an experiment at one time, and my system had various mysterious failures after that. It may be that Yosemite fixes those failures, but I wouldn't bet on it. Thanks, Cem Karan -- https://mail.python.org/mailman/listinfo/python-list
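A quick way to see what actually got installed (a sketch; the framework path assumes the python.org installer was used, as in the original question):

```shell
# List the python3 interpreter(s) the shell can see, and their version
command -v python3 && python3 --version

# Framework builds from the python.org installer live here on OS X
ls /Library/Frameworks/Python.framework/Versions/ 2>/dev/null || true
```

The installer normally adds its bin directory to PATH via ~/.bash_profile, so a new Terminal window may be needed before `python3` is found.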
Re: Design thought for callbacks
the comments above, I've decided to do the following for my API: - All callbacks will be strongly retained (no weakrefs). - Callbacks will be stored in a list, and the list will be exposed as a read-only property of the library. This will let users reorder callbacks as necessary, add them multiple times in a row, etc. I'm also hoping that by making it a list, it becomes obvious that the callback is strongly retained. - Finally, callbacks are not one-shots. This just happens to make sense for my code, but others may find other methods make more sense. Thanks again to everyone for providing so many comments on my question, and I apologize again for taking so long to wrap things up. Thanks, Cem Karan -- https://mail.python.org/mailman/listinfo/python-list
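A minimal sketch of what those three decisions might look like in code (class and method names here are hypothetical, not the actual library's API):

```python
class Library(object):
    def __init__(self):
        self._callbacks = []

    @property
    def callbacks(self):
        # Read-only property: users can reorder entries in place (or add
        # duplicates via register()), but cannot rebind the attribute,
        # and the list itself makes the strong retention obvious.
        return self._callbacks

    def register(self, callback):
        self._callbacks.append(callback)  # strong reference, kept until removed

    def fire(self, *args):
        for callback in list(self._callbacks):  # copy: callbacks may re-register
            callback(*args)

lib = Library()
seen = []
lib.register(lambda tag: seen.append("a" + tag))
lib.register(lambda tag: seen.append("b" + tag))
lib.callbacks.reverse()  # users may reorder via the exposed list
lib.fire("!")
# seen == ["b!", "a!"]
```

Because the callbacks are not one-shots, `fire` leaves the list intact; removal is the user's explicit responsibility.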
Re: Design thought for callbacks
On Feb 26, 2015, at 7:04 PM, Fabio Zadrozny fabi...@gmail.com wrote: On Wed, Feb 25, 2015 at 9:46 AM, Cem Karan cfkar...@gmail.com wrote: On Feb 24, 2015, at 8:23 AM, Fabio Zadrozny fabi...@gmail.com wrote: Hi Cem, I didn't read the whole long thread, but I thought I'd point you to what I'm using in PyVmMonitor (http://www.pyvmmonitor.com/) -- which may already cover your use-case. Take a look at the callback.py at https://github.com/fabioz/pyvmmonitor-core/blob/master/pyvmmonitor_core/callback.py And its related test (where you can see how to use it): https://github.com/fabioz/pyvmmonitor-core/blob/master/_pyvmmonitor_core_tests/test_callback.py (note that it falls back to a strong reference on simple functions -- i.e.: usually top-level methods or methods created inside a scope -- but otherwise uses weak references). That looks like a better version of what I was thinking about originally. However, various people on the list have convinced me to stick with strong references everywhere. I'm working out a possible API right now, once I have some code that I can use to illustrate what I'm thinking to everyone, I'll post it to the list. Thank you for showing me your code though, it is clever! Thanks, Cem Karan Hi Cem, Well, I decided to elaborate a bit on the use-case I have and how I use it (on a higher level): http://pydev.blogspot.com.br/2015/02/design-for-client-side-applications-in.html So, you can see if it may be worth for you or not (I agree that sometimes you should keep strong references, but for my use-cases, weak references usually work better -- with the only exception being closures, which is handled different anyways but with the gotcha of having to manually unregister it). As I mentioned in an earlier post, I've been quite busy at home, and expect to be for a few days to come, so I apologize both for being so late posting, and for not posting my own API plans. Your blog post has given me quite a bit to think about, thank you! 
Do you mind if I work up an API similar to yours? I'm planning on using a different license (not LGPL), which is why I ask. Thanks, Cem Karan -- https://mail.python.org/mailman/listinfo/python-list
Re: Design thought for callbacks
On Feb 26, 2015, at 2:54 PM, Ian Kelly ian.g.ke...@gmail.com wrote: On Feb 26, 2015 4:00 AM, Cem Karan cfkar...@gmail.com wrote: On Feb 26, 2015, at 12:36 AM, Gregory Ewing greg.ew...@canterbury.ac.nz wrote: Cem Karan wrote: I think I see what you're talking about now. Does WeakMethod (https://docs.python.org/3/library/weakref.html#weakref.WeakMethod) solve this problem? Yes, that looks like it would work. Cool! Sometimes I wonder whether anybody reads my posts. I suggested a solution involving WeakMethod four days ago that additionally extends the concept to non-method callbacks (requiring a small amount of extra effort from the client in those cases, but I think that is unavoidable. There is no way that the framework can determine the appropriate lifetime for a closure-based callback.) I apologize about taking so long to reply to everyone's posts, but I've been busy at home. Ian, it took me a while to do some research to understand WHY what you were suggesting was important; you're right about storing the object as well as the method/function separately, but I think that WeakMethod might solve that completely, correct? Are there any cases where WeakMethod wouldn't work? Thanks, Cem Karan -- https://mail.python.org/mailman/listinfo/python-list
Re: Design thought for callbacks
On Feb 26, 2015, at 3:00 PM, Ethan Furman et...@stoneleaf.us wrote: On 02/26/2015 11:54 AM, Ian Kelly wrote: Sometimes I wonder whether anybody reads my posts. It's entirely possible the OP wasn't ready to understand your solution four days ago, but two days later the OP was. Thank you Ethan, that was precisely my problem. Thanks, Cem Karan -- https://mail.python.org/mailman/listinfo/python-list
Re: Design thought for callbacks
On Feb 26, 2015, at 12:36 AM, Gregory Ewing greg.ew...@canterbury.ac.nz wrote: Cem Karan wrote: I think I see what you're talking about now. Does WeakMethod (https://docs.python.org/3/library/weakref.html#weakref.WeakMethod) solve this problem? Yes, that looks like it would work. Cool! Thanks, Cem Karan -- https://mail.python.org/mailman/listinfo/python-list
Re: Design thought for callbacks
On Feb 24, 2015, at 4:19 PM, Gregory Ewing greg.ew...@canterbury.ac.nz wrote: random...@fastmail.us wrote: On Tue, Feb 24, 2015, at 00:20, Gregory Ewing wrote: This is why I suggested registering a listener object plus a method name instead of a callback. It avoids that reference cycle, because there is no long-lived callback object keeping a reference to the listener. How does that help? Everywhere you would have had a reference to the callback object, you now have a reference to the listener object. The point is that the library can keep a weak reference to the listener object, whereas it can't reliably keep a weak reference to a bound method. I think I see what you're talking about now. Does WeakMethod (https://docs.python.org/3/library/weakref.html#weakref.WeakMethod) solve this problem? Note that I can force my users to use the latest stable version of python at all times, so WeakMethod IS available to me. Thanks, Cem Karan -- https://mail.python.org/mailman/listinfo/python-list
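A sketch of why WeakMethod answers the question: its lifetime tracks the listener object rather than the short-lived bound-method object:

```python
import gc
import weakref

class Listener(object):
    def on_event(self):
        return "handled"

listener = Listener()
wm = weakref.WeakMethod(listener.on_event)

# While the listener lives, calling the WeakMethod rebuilds the bound method
assert wm()() == "handled"

del listener
gc.collect()  # CPython frees it immediately; collect() for safety

# Once the listener is gone, the WeakMethod dereferences to None
assert wm() is None
```

The library can then skip (or prune) any registered WeakMethod that dereferences to None before calling it.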
Re: Design thought for callbacks
On Feb 24, 2015, at 8:23 AM, Fabio Zadrozny fabi...@gmail.com wrote: Hi Cem, I didn't read the whole long thread, but I thought I'd point you to what I'm using in PyVmMonitor (http://www.pyvmmonitor.com/) -- which may already cover your use-case. Take a look at the callback.py at https://github.com/fabioz/pyvmmonitor-core/blob/master/pyvmmonitor_core/callback.py And its related test (where you can see how to use it): https://github.com/fabioz/pyvmmonitor-core/blob/master/_pyvmmonitor_core_tests/test_callback.py (note that it falls back to a strong reference on simple functions -- i.e.: usually top-level methods or methods created inside a scope -- but otherwise uses weak references). That looks like a better version of what I was thinking about originally. However, various people on the list have convinced me to stick with strong references everywhere. I'm working out a possible API right now, once I have some code that I can use to illustrate what I'm thinking to everyone, I'll post it to the list. Thank you for showing me your code though, it is clever! Thanks, Cem Karan -- https://mail.python.org/mailman/listinfo/python-list
Re: Design thought for callbacks
I'm combining two messages into one, On Feb 24, 2015, at 12:29 AM, random...@fastmail.us wrote: On Tue, Feb 24, 2015, at 00:20, Gregory Ewing wrote: Cem Karan wrote: I tend to structure my code as a tree or DAG of objects. The owner refers to the owned object, but the owned object has no reference to its owner. With callbacks, you get cycles, where the owned owns the owner. This is why I suggested registering a listener object plus a method name instead of a callback. It avoids that reference cycle, because there is no long-lived callback object keeping a reference to the listener. How does that help? Everywhere you would have had a reference to the callback object, you now have a reference to the listener object. You're just shuffling deck chairs around: if B shouldn't reference A because A owns B, then removing C from the B-C-A reference chain does nothing to fix this. On Feb 24, 2015, at 12:45 AM, Gregory Ewing greg.ew...@canterbury.ac.nz wrote: Cem Karan wrote: On Feb 22, 2015, at 5:15 AM, Gregory Ewing greg.ew...@canterbury.ac.nz wrote: Perhaps instead of registering a callback function, you should be registering the listener object together with a method name. I see what you're saying, but I don't think it gains us too much. If I store an object and an unbound method of the object, or if I store the bound method directly, I suspect it will yield approximately the same results. It would be weird and unpythonic to have to register both an object and an unbound method, and if you use a bound method you can't keep a weak reference to it. Greg, random832 said what I was thinking earlier, that you've only increased the diameter of your cycle without actually fixing it. Can you give a code example where your method breaks the cycle entirely? Thanks, Cem Karan -- https://mail.python.org/mailman/listinfo/python-list
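Cycle or not, there is a concrete reason a library cannot simply take a weak reference to a bound method, which is what motivated both the listener-plus-method-name suggestion and WeakMethod. A sketch of the failure:

```python
import weakref

class Listener(object):
    def on_event(self):
        pass

listener = Listener()

# listener.on_event creates a NEW bound-method object on every attribute
# access; the weakref's referent is a temporary that dies immediately
# (in CPython, as soon as its refcount drops to zero).
dead_ref = weakref.ref(listener.on_event)
assert dead_ref() is None  # already dead, even though `listener` is alive
```

Keeping a weak reference to the *listener* (or using weakref.WeakMethod) sidesteps this, because the listener itself is long-lived.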
Re: Design thought for callbacks
On Feb 23, 2015, at 7:29 AM, Frank Millman fr...@chagford.com wrote: Cem Karan cfkar...@gmail.com wrote in message news:a3c11a70-5846-4915-bb26-b23793b65...@gmail.com... Good questions! That was why I was asking about 'gotchas' with WeakSets originally. Honestly, the only way to know for sure would be to write two APIs for doing similar things, and then see how people react to them. The problem is, how do you set up such a study so it is statistically valid? Just in case you missed Steven's comment on my 'gotcha', and my reply, it is worth repeating that what I reported as a gotcha was not what it seemed. If you set up the callback as a weakref, and the listening object goes out of scope, it will wait to be garbage collected. However, as far as I can tell, the weakref is removed at the same time as the object is gc'd, so there is no 'window' where the weakref exists but the object it is referencing does not exist. My problem was that I had performed a cleanup operation on the listening object before letting it go out of scope, and it was no longer in a valid state to deal with the callback, resulting in an error. If you do not have that situation, your original idea may well work. Thank you Frank, I did read Steve's comment to your reply earlier, but what you said in your original reply made sense to me. I don't have control over user code. That means that if someone wants to write code such that they perform some kind of cleanup and are no longer able to handle the callback, they are free to do so. While I can't prevent this from happening, I can make it as obvious as possible in my code that before you perform any cleanup, you also need to unregister from the library. That is my main goal in developing pythonic/obvious methods of registering callbacks. Thanks, Cem Karan -- https://mail.python.org/mailman/listinfo/python-list
Re: Design thought for callbacks
On Feb 22, 2015, at 5:29 PM, Laura Creighton l...@openend.se wrote: In a message of Sun, 22 Feb 2015 17:09:01 -0500, Cem Karan writes: Documentation is a given; it MUST be there. That said, documenting something, but still making it surprising, is a bad idea. For example, several people have been strongly against using a WeakSet to hold callbacks because they expect a library to hold onto callbacks. If I chose not to do that, and used a WeakSet, then even if I documented it, it would still end up surprising people (and from the sound of it, more people would be surprised than not). Thanks, Cem Karan No matter what you do, alas, will surprise the hell out of people because callbacks do not behave as people expect. Among people who have used callbacks, what you are polling is 'what are people familiar with', and it seems for the people around here, now, WeakSets are not what they are familiar with. And that's fine. I know that regardless of what I do, some people are going to be surprised. I'm trying to develop APIs that reduce that surprise as far as possible. That means I can spend more time coding and less time answering questions... :) But that is not so surprising. How many people use WeakSets for _anything_? I've never used them, aside from 'ooh! cool shiny new language feature! Let's kick it around the park!' That people aren't familiar with WeakSets doesn't mean all that much. Actually, I use them when building caches of stuff, and I use weak references when I have trees of stuff so the child nodes know of, but don't hold onto, their parents. But I agree with you, there aren't a huge number of use-cases. The question I have is does this architecture make things harder, easier or about the same to debug? To write tests for? to do Test Driven Design with? Good questions! That was why I was asking about 'gotchas' with WeakSets originally. 
Honestly, the only way to know for sure would be to write two APIs for doing similar things, and then see how people react to them. The problem is, how do you set up such a study so it is statistically valid? Cem -- https://mail.python.org/mailman/listinfo/python-list
Re: Design thought for callbacks
On Feb 21, 2015, at 12:08 PM, Marko Rauhamaa ma...@pacujo.net wrote: Steven D'Aprano steve+comp.lang.pyt...@pearwood.info: Other than that, I cannot see how calling a function which has *not* yet been garbage collected can fail, just because the only reference still existing is a weak reference. Maybe the logic of the receiving object isn't prepared for the callback anymore after an intervening event. The problem then, of course, is in the logic and not in the callbacks. This was PRECISELY the situation I was thinking about. My hope was to make the callback mechanism slightly less surprising by allowing the user to track them, releasing them when they aren't needed without having to figure out where the callbacks were registered. However, it appears I'm making things more surprising rather than less. Thanks, Cem Karan -- https://mail.python.org/mailman/listinfo/python-list
Re: Design thought for callbacks
On Feb 21, 2015, at 12:27 PM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote:

Cem Karan wrote:

On Feb 21, 2015, at 8:15 AM, Chris Angelico ros...@gmail.com wrote:

On Sun, Feb 22, 2015 at 12:13 AM, Cem Karan cfkar...@gmail.com wrote:

OK, so it would violate the principle of least surprise for you. Interesting. Is this a general pattern in python? That is, callbacks are owned by what they are registered with? In the end, I want to make a library that offers as few surprises to the user as possible, and no matter how I think about callbacks, they are surprising to me.

If callbacks are strongly-held, then calling 'del foo' on a callable object may not make it go away, which can lead to weird and nasty situations.

How? The whole point of callbacks is that you hand over responsibility to another piece of code, and then forget about your callback. The library will call it, when and if necessary, and when the library no longer needs your callback, it is free to throw it away. (If I wish the callback to survive beyond the lifetime of your library's use of it, I have to keep a reference to the function.)

Marko mentioned it earlier; if you think you've gotten rid of all references to some chunk of code, and it is still alive afterwards, that can be surprising. Weakly-held callbacks mean that I (as the programmer) know that objects will go away after the next garbage collection (see Frank's earlier message), so I don't get 'dead' callbacks coming back from the grave to haunt me.

I'm afraid this makes no sense to me. Can you explain, or better still demonstrate, a scenario where dead callbacks rise from the grave, so to speak?

#!/usr/bin/env python

class Callback_object(object):
    def __init__(self, msg):
        self._msg = msg

    def callback(self, stuff):
        print("From {0!s}: {1!s}".format(self._msg, stuff))


class Fake_library(object):
    def __init__(self):
        self._callbacks = list()

    def register_callback(self, callback):
        self._callbacks.append(callback)

    def execute_callbacks(self):
        for thing in self._callbacks:
            thing("Surprise!")


if __name__ == "__main__":
    foo = Callback_object("Evil Zombie")
    lib = Fake_library()
    lib.register_callback(foo.callback)

    # Way later, after the user forgot all about the callback above
    foo = Callback_object("Your Significant Other")
    lib.register_callback(foo.callback)

    # And finally getting around to running all those callbacks.
    lib.execute_callbacks()

Output:

From Evil Zombie: Surprise!
From Your Significant Other: Surprise!

In this case, the user made an error (just as Marko said in his earlier message), and forgot about the callback he registered with the library. The callback isn't really rising from the dead; as you say, either it's been garbage collected, or it hasn't been. However, you may not be ready for a callback to be called at that moment in time, which means you're surprised by unexpected behavior.

So, what's the consensus on the list, strongly-held callbacks, or weakly-held ones?

I don't know about Python specifically, but it's certainly a general pattern in other languages. They most definitely are owned, and it's the only model that makes sense when you use closures (which won't have any other references anywhere).

I agree about closures; it's the only way they could work.

*scratches head* There's nothing special about closures. You can assign them to a name like any other object.
def make_closure():
    x = 23
    def closure():
        return x + 1
    return closure

func = make_closure()

Now you can register func as a callback, and de-register it when you're done:

register(func)
unregister(func)

Of course, if you throw away your reference to func, you have no (easy) way of de-registering it. That's no different to any other object which is registered by identity. (Registering functions by name is a bad idea, since multiple functions can have the same name.) As an alternative, your callback registration function might return a ticket for the function:

ticket = register(func)
del func
unregister(ticket)

but that strikes me as over-kill. And of course, the simplest ticket is to return the function itself :-)

Agreed on all points; closures are just ordinary objects. The only difference (in my opinion) is that they are 'fire and forget'; if you are registering or tracking them then you've kind of defeated the purpose. THAT is what I meant about how you handle closures. When I was originally thinking about the library, I was trying to include all types of callbacks, including closures and callable objects. The callable objects may pass themselves, or one of their methods to the library, or may do something really weird.

I don't think they can do anything too weird. They have to pass a callable object. Your library just calls that object. You shouldn't
Re: Design thought for callbacks
On Feb 22, 2015, at 7:12 AM, Marko Rauhamaa ma...@pacujo.net wrote:

Cem Karan cfkar...@gmail.com:

On Feb 21, 2015, at 11:03 AM, Marko Rauhamaa ma...@pacujo.net wrote:

I use callbacks all the time but haven't had any problems with strong references. I am careful to move my objects to a zombie state after they're done so they can absorb any potential loose callbacks that are lingering in the system.

So, if I were designing a library for you, you would be willing to have a 'zombie' attribute on your callback, correct? This would allow the library to query its callbacks to ensure that only 'live' callbacks are called. How would you handle closures?

Sorry, don't understand the question.

You were saying that you move your objects into a zombie state. I assumed that you meant you marked them in some manner (e.g., setting 'is_zombie' to True), so that anything that has a strong reference to the object knows the object is not supposed to be used anymore. That way, regardless of where or how many times you've registered your object for callbacks, the library can do something like the following (banged out in my mail application, may have typos):

_CALLBACKS = []

def execute_callbacks():
    global _CALLBACKS
    _CALLBACKS = [x for x in _CALLBACKS if not x.is_zombie]
    for x in _CALLBACKS:
        x()

That will lazily unregister callbacks that are in the zombie state, which will eventually lead to their collection by the garbage collector. It won't work for anything that you don't have a reference for (lambdas, etc.), but it should work in a lot of cases. Is this what you meant?

Thanks,
Cem Karan
--
https://mail.python.org/mailman/listinfo/python-list
Re: Design thought for callbacks
On Feb 22, 2015, at 7:24 AM, Chris Angelico ros...@gmail.com wrote: On Sun, Feb 22, 2015 at 11:07 PM, Cem Karan cfkar...@gmail.com wrote: Correct. The GUI engine ultimately owns everything. Of course, this is a very simple case (imagine a little notification popup; you don't care about it, you don't need to know when it's been closed, the only event on it is hit Close to destroy the window), and most usage would have other complications, but it's not uncommon for me to build a GUI program that leaves everything owned by the GUI engine. Everything is done through callbacks. Destroy a window, clean up its callbacks. The main window will have an on-deletion callback that terminates the program, perhaps. It's pretty straight-forward. How do you handle returning information? E.g., the user types in a number and expects that to update the internal state of your code somewhere. Not sure what you mean by returning. If the user types in a number in a GUI widget, that would trigger some kind of on-change event, and either the new text would be a parameter to the callback function, or the callback could query the widget. In the latter case, I'd probably have the callback as a closure, and thus able to reference the object. We're thinking of the same thing. I try to structure what little GUI code I write using the MVP pattern (http://en.wikipedia.org/wiki/Model-view-presenter), so I have these hub and spoke patterns. But you're right, if you have a partially evaluated callback that has the presenter as one of the parameters, that would do it for a GUI. I was thinking more of a DAG of objects, but now that I think about it, callbacks wouldn't make sense in that case. Thanks, Cem Karan -- https://mail.python.org/mailman/listinfo/python-list
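For what it's worth, the "partially evaluated callback that has the presenter as one of the parameters" idea above can be sketched with functools.partial. The Presenter class and fake_widget_fire helper here are hypothetical stand-ins, not anyone's real API:

```python
from functools import partial

class Presenter:
    """Hypothetical MVP presenter that owns some model state."""
    def __init__(self):
        self.value = 0

    def on_number_changed(self, new_text):
        # The GUI's on-change event hands us the new text; we update
        # the model state that the rest of the program reads.
        self.value = int(new_text)

def fake_widget_fire(callback, text):
    # Stand-in for a GUI engine invoking an on-change callback.
    callback(text)

presenter = Presenter()
# Bind the presenter into the callback; the GUI engine only ever
# sees a plain callable that takes the new text.
callback = partial(Presenter.on_number_changed, presenter)
fake_widget_fire(callback, "42")
# presenter.value is now 42
```

The same effect falls out of just passing the bound method presenter.on_number_changed; partial only makes the "presenter is baked in" step explicit.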
Re: Design thought for callbacks
On Feb 22, 2015, at 7:46 AM, Marko Rauhamaa ma...@pacujo.net wrote: Cem Karan cfkar...@gmail.com: On Feb 21, 2015, at 12:08 PM, Marko Rauhamaa ma...@pacujo.net wrote: Maybe the logic of the receiving object isn't prepared for the callback anymore after an intervening event. The problem then, of course, is in the logic and not in the callbacks. This was PRECISELY the situation I was thinking about. My hope was to make the callback mechanism slightly less surprising by allowing the user to track them, releasing them when they aren't needed without having to figure out where the callbacks were registered. However, it appears I'm making things more surprising rather than less. When dealing with callbacks, my advice is to create your objects as explicit finite state machines. Don't try to encode the object state implicitly or indirectly. Rather, give each and every state a symbolic name and log the state transitions for troubleshooting. Your callbacks should then consider what to do in each state. There are different ways to express this in Python, but it always boils down to a state/transition matrix. Callbacks sometimes cannot be canceled after they have been committed to and have been shipped to the event pipeline. Then, the receiving object must brace itself for the impending spurious callback. Nononono, I'm NOT encoding anything implicitly! As Frank mentioned earlier, this is more of a pub/sub problem. E.g., 'USB dongle has gotten plugged in', or 'key has been pressed'. The user code needs to decide what to do next, the library code provides a nice, clean interface to some potentially weird hardware. Thanks, Cem Karan -- https://mail.python.org/mailman/listinfo/python-list
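Marko's advice above (symbolic state names, logged transitions, callbacks that consult the current state) might be sketched like this; the states and events are made up for illustration:

```python
import logging

logging.basicConfig(level=logging.DEBUG)

class Dongle:
    """Explicit finite state machine: every state has a symbolic name,
    and every transition is logged for troubleshooting."""
    # state -> {event -> next_state}
    TRANSITIONS = {
        "UNPLUGGED": {"plugged_in": "READY"},
        "READY":     {"unplugged": "UNPLUGGED", "key_pressed": "READY"},
        "ZOMBIE":    {},  # absorbs any lingering callbacks
    }

    def __init__(self):
        self.state = "UNPLUGGED"

    def handle(self, event):
        # A spurious callback in the wrong state is ignored, not fatal.
        next_state = self.TRANSITIONS[self.state].get(event)
        if next_state is None:
            logging.debug("ignoring %r in state %s", event, self.state)
            return
        logging.debug("%s --%s--> %s", self.state, event, next_state)
        self.state = next_state

d = Dongle()
d.handle("plugged_in")   # UNPLUGGED -> READY
d.handle("plugged_in")   # spurious in READY: ignored
d.handle("unplugged")    # READY -> UNPLUGGED
```

The dict-of-dicts is one way to write the state/transition matrix Marko mentions; a table of (state, event) tuples works just as well.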
Re: Design thought for callbacks
On Feb 22, 2015, at 5:15 AM, Gregory Ewing greg.ew...@canterbury.ac.nz wrote: Frank Millman wrote: In order to inform users that certain bits of state have changed, I require them to register a callback with my code. This sounds to me like a pub/sub scenario. When a 'listener' object comes into existence it is passed a reference to a 'controller' object that holds state. It wants to be informed when the state changes, so it registers a callback function with the controller. Perhaps instead of registering a callback function, you should be registering the listener object together with a method name. You can then keep a weak reference to the listener object, since if it is no longer referenced elsewhere, it presumably no longer needs to be notified of anything. I see what you're saying, but I don't think it gains us too much. If I store an object and an unbound method of the object, or if I store the bound method directly, I suspect it will yield approximately the same results. Thanks, Cem Karan -- https://mail.python.org/mailman/listinfo/python-list
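Gregory's suggestion (register the listener weakly rather than the callback strongly) is roughly what weakref.WeakMethod automates: it holds the underlying object weakly even though a bound method object itself is short-lived. A minimal sketch, with hypothetical Controller/Listener names:

```python
import weakref

class Controller:
    def __init__(self):
        self._listeners = []

    def register(self, bound_method):
        # WeakMethod holds the listener object weakly, so registration
        # alone does not keep the listener alive.
        self._listeners.append(weakref.WeakMethod(bound_method))

    def notify(self, state):
        live = []
        for ref in self._listeners:
            method = ref()          # None once the listener has died
            if method is not None:
                method(state)
                live.append(ref)
        self._listeners = live      # lazily drop dead registrations

class Listener:
    def __init__(self):
        self.seen = None
    def on_change(self, state):
        self.seen = state

controller = Controller()
listener = Listener()
controller.register(listener.on_change)
controller.notify("changed")        # listener.seen is now "changed"
```

Note that a plain weakref.ref(listener.on_change) would die immediately, because the bound method object is created fresh on each attribute access; WeakMethod exists precisely to paper over that.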
Re: Design thought for callbacks
On Feb 21, 2015, at 11:03 AM, Marko Rauhamaa ma...@pacujo.net wrote: Chris Angelico ros...@gmail.com: On Sat, Feb 21, 2015 at 1:44 PM, Cem Karan cfkar...@gmail.com wrote: In order to inform users that certain bits of state have changed, I require them to register a callback with my code. The problem is that when I store these callbacks, it naturally creates a strong reference to the objects, which means that if they are deleted without unregistering themselves first, my code will keep the callbacks alive. Since this could lead to really weird and nasty situations, [...] No, it's not. I would advise using strong references - if the callback is a closure, for instance, you need to hang onto it, because there are unlikely to be any other references to it. If I register a callback with you, I expect it to be called; I expect, in fact, that that *will* keep my object alive. I use callbacks all the time but haven't had any problems with strong references. I am careful to move my objects to a zombie state after they're done so they can absorb any potential loose callbacks that are lingering in the system. So, if I were designing a library for you, you would be willing to have a 'zombie' attribute on your callback, correct? This would allow the library to query its callbacks to ensure that only 'live' callbacks are called. How would you handle closures? Thanks, Cem Karan -- https://mail.python.org/mailman/listinfo/python-list
Re: Design thought for callbacks
On Feb 21, 2015, at 3:57 PM, Grant Edwards invalid@invalid.invalid wrote: On 2015-02-21, Cem Karan cfkar...@gmail.com wrote: On Feb 21, 2015, at 12:42 AM, Chris Angelico ros...@gmail.com wrote: On Sat, Feb 21, 2015 at 1:44 PM, Cem Karan cfkar...@gmail.com wrote: In order to inform users that certain bits of state have changed, I require them to register a callback with my code. The problem is that when I store these callbacks, it naturally creates a strong reference to the objects, which means that if they are deleted without unregistering themselves first, my code will keep the callbacks alive. Since this could lead to really weird and nasty situations, I would like to store all the callbacks in a WeakSet (https://docs.python.org/3/library/weakref.html#weakref.WeakSet). That way, my code isn't the reason why the objects are kept alive, and if they are no longer alive, they are automatically removed from the WeakSet, preventing me from accidentally calling them when they are dead. My question is simple; is this a good design? If not, why not? Are there any potential 'gotchas' I should be worried about? No, it's not. I would advise using strong references - if the callback is a closure, for instance, you need to hang onto it, because there are unlikely to be any other references to it. If I register a callback with you, I expect it to be called; I expect, in fact, that that *will* keep my object alive. OK, so it would violate the principle of least surprise for you. And me as well. I would expect to be able to pass a closure as a callback and not have to keep a reference to it. Perhaps that just a leftover from working with other languages (javascript, scheme, etc.). It doesn't matter if it's a string, a float, a callback, a graphic or whatever: if I pass your function/library an object, I expect _you_ to keep track of it until you're done with it. Interesting. Is this a general pattern in python? That is, callbacks are owned by what they are registered with? 
I'm not sure what you mean by owned or why it matters that it's a callback: it's an object that was passed to you: you need to hold onto a reference to it until you're done with it, and the polite thing to do is to delete references to it when you're done with it. I tend to structure my code as a tree or DAG of objects. The owner refers to the owned object, but the owned object has no reference to its owner. With callbacks, you get cycles, where the owned owns the owner. As a result, if you forget where your object has been registered, it may be kept alive when you aren't expecting it. My hope was that with WeakSets I could continue to preserve the DAG or tree while still having the benefits of callbacks. However, it looks like that is too surprising to most people. Thanks, Cem Karan -- https://mail.python.org/mailman/listinfo/python-list
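For reference, the WeakSet arrangement described above looks like this: the library never becomes an owner, so the user's tree/DAG stays free of cycles. The Library/Owner names are illustrative, and the timing of removal is CPython's refcounting behavior:

```python
import gc
import weakref

class Library:
    def __init__(self):
        # Weakly-held callables: registration does not create ownership,
        # so the user's tree/DAG of objects stays acyclic.
        self._callbacks = weakref.WeakSet()

    def register(self, callback):
        self._callbacks.add(callback)

    def fire(self):
        for cb in list(self._callbacks):
            cb()

class Owner:
    def __init__(self):
        self.count = 0
    def __call__(self):
        self.count += 1

lib = Library()
owner = Owner()
lib.register(owner)
lib.fire()
count_after_fire = owner.count   # 1: the callback ran
del owner                        # the library's weak ref does not keep it alive
gc.collect()
# lib._callbacks is now empty; a later fire() calls nothing
```

This is exactly the behavior several posters found surprising: whether the callback fires depends on whether anyone else still references the owner.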
Re: Design thought for callbacks
On Feb 21, 2015, at 10:55 AM, Chris Angelico ros...@gmail.com wrote: On Sun, Feb 22, 2015 at 2:45 AM, Cem Karan cfkar...@gmail.com wrote: OK, so if I'm reading your code correctly, you're breaking the cycle in your object graph by making the GUI the owner of the callback, correct? No other chunk of code has a reference to the callback, correct? Correct. The GUI engine ultimately owns everything. Of course, this is a very simple case (imagine a little notification popup; you don't care about it, you don't need to know when it's been closed, the only event on it is hit Close to destroy the window), and most usage would have other complications, but it's not uncommon for me to build a GUI program that leaves everything owned by the GUI engine. Everything is done through callbacks. Destroy a window, clean up its callbacks. The main window will have an on-deletion callback that terminates the program, perhaps. It's pretty straight-forward. How do you handle returning information? E.g., the user types in a number and expects that to update the internal state of your code somewhere. Thanks, Cem Karan -- https://mail.python.org/mailman/listinfo/python-list
Re: Design thought for callbacks
On Feb 22, 2015, at 7:52 AM, Laura Creighton l...@openend.se wrote:

> In a message of Sun, 22 Feb 2015 07:16:14 -0500, Cem Karan writes:
>> This was PRECISELY the situation I was thinking about. My hope was to make the callback mechanism slightly less surprising by allowing the user to track them, releasing them when they aren't needed without having to figure out where the callbacks were registered. However, it appears I'm making things more surprising rather than less.
>
> You may be able to accomplish your goal by using a Queue with a producer/consumer model. see:
> http://stackoverflow.com/questions/9968592/turn-functions-with-a-callback-into-python-generators
> especially the bottom of that. I haven't run the code, but it looks mostly reasonable, except that you do not want to rely on the Queue maxsize being 1 here, and indeed, I almost always want a bigger Queue in any case. Use Queue.task_done if blocking the producer features in your design.
>
> The problem that you are up against is that callbacks are inherently confusing, even to programmers who are learning about them for the first time. They don't fit people's internal model of 'how code works'. There isn't a whole lot one can do about that except to try to make the magic do as little as possible, so that more of the code works 'the way people expect'.

I think what you're suggesting is that library users register a Queue instead of a callback, correct? The problem is that I'll then have a strong reference to the Queue, which means I'll be pumping events into it after the user code has gone away. I was hoping to solve the problem of forgotten registrations in the library.

Thanks,
Cem Karan
--
https://mail.python.org/mailman/listinfo/python-list
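For reference, the producer/consumer shape Laura is pointing at might look like the following: the library puts events on a Queue and the user code pulls them at its own pace, so no user callable is ever held by the library. Event names and the 64-slot size are illustrative:

```python
import queue
import threading

def library_producer(events_out):
    # The library pushes events instead of calling back into user code.
    for event in ("plugged_in", "key_pressed", "unplugged"):
        events_out.put(event)
    events_out.put(None)            # sentinel: no more events

events = queue.Queue(maxsize=64)    # bigger than 1, per Laura's advice
producer = threading.Thread(target=library_producer, args=(events,))
producer.start()

received = []
while True:
    event = events.get()
    if event is None:
        break
    received.append(event)
    events.task_done()              # lets a blocking producer make progress

producer.join()
```

Cem's objection still applies: the library holds the Queue strongly, so something must eventually tell the producer to stop (here, the consumer simply drains to the sentinel).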
Re: Design thought for callbacks
On Feb 22, 2015, at 4:34 PM, Marko Rauhamaa ma...@pacujo.net wrote: Cem Karan cfkar...@gmail.com: My goal is to make things as pythonic (whatever that means in this case) and obvious as possible. Ideally, a novice can more or less guess what will happen with my API without really having to read the documentation on it. If you try to shield your user from the complexities of asynchronous programming, you will only cause confusion. You will definitely need to document all nooks and crannies of the semantics of the callback API and your user will have to pay attention to every detail of your spec. Your user, whether novice or an expert, will thank you for your unambiguous specification even if it is complicated. Documentation is a given; it MUST be there. That said, documenting something, but still making it surprising, is a bad idea. For example, several people have been strongly against using a WeakSet to hold callbacks because they expect a library to hold onto callbacks. If I chose not to do that, and used a WeakSet, then even if I documented it, it would still end up surprising people (and from the sound of it, more people would be surprised than not). Thanks, Cem Karan -- https://mail.python.org/mailman/listinfo/python-list
Re: Design thought for callbacks
On Feb 22, 2015, at 4:02 PM, Ethan Furman et...@stoneleaf.us wrote: On 02/22/2015 05:13 AM, Cem Karan wrote: Output: From Evil Zombie: Surprise! From Your Significant Other: Surprise! In this case, the user made an error (just as Marko said in his earlier message), and forgot about the callback he registered with the library. The callback isn't really rising from the dead; as you say, either its been garbage collected, or it hasn't been. However, you may not be ready for a callback to be called at that moment in time, which means you're surprised by unexpected behavior. But the unexpected behavior is not a problem with Python, nor with your library -- it's a bug in the fellow-programmer's code, and you can't (or at least shouldn't) try to prevent those kinds of bugs from manifesting -- they'll just get bitten somewhere else by the same bug. I agree with you, but until a relatively new programmer has gotten used to what callbacks are and what they imply, I want to make things easy. For example, if the API subclasses collections.abc.MutableSet, and the documentation states that you can only add callbacks to this particular type of set, then a new programmer will naturally decide that either a) they need to dispose of the set, and if that isn't possible, then b) they need to delete their callback from the set. It won't occur to them that their live object will just magically 'go away'; its a member of a set! My goal is to make things as pythonic (whatever that means in this case) and obvious as possible. Ideally, a novice can more or less guess what will happen with my API without really having to read the documentation on it. Thanks, Cem Karan -- https://mail.python.org/mailman/listinfo/python-list
Re: Design thought for callbacks
On Feb 21, 2015, at 9:36 AM, Chris Angelico ros...@gmail.com wrote:

> On Sun, Feb 22, 2015 at 1:07 AM, Cem Karan cfkar...@gmail.com wrote:
>> I agree about closures; it's the only way they could work. When I was originally thinking about the library, I was trying to include all types of callbacks, including closures and callable objects. The callable objects may pass themselves, or one of their methods to the library, or may do something really weird. Although I just realized that closures may cause another problem. In my code, I expect that many different callbacks can be registered for the same event. Unregistering means you request to be unregistered for the event. How do you do that with a closure? Aren't they anonymous?
>
> They're objects, same as any other, so the caller can hang onto a reference and then say "now remove this one". Simple example:
>
> callbacks = []
>
> def register_callback(f):
>     callbacks.append(f)
>
> def unregister_callback(f):
>     callbacks.remove(f)
>
> def do_callbacks():
>     for f in callbacks:
>         f()
>
> def make_callback(i):
>     def inner():
>         print("Callback! %d" % i)
>     register_callback(inner)
>     return inner
>
> make_callback(5)
> remove_me = make_callback(6)
> make_callback(7)
> unregister_callback(remove_me)
> do_callbacks()

Yeah, that's pretty much what I thought you'd have to do, which kind of defeats the purpose of closures (fire-and-forget things). BUT it does answer my question, so no complaints about it!

> So, either you keep a reference to your own closure, which means that the library doesn't really need to, or the library keeps hold of it for you, in which case you don't have a reasonable way of removing it. The other option is for your callback registration to return some kind of identifier, which can later be used to unregister the callback. This is a good way of avoiding reference cycles (the ID could be a simple integer - maybe the length of the list prior to the new callback being appended, and then the unregistration process is simply callbacks[id] = None, and you skip the Nones when iterating), and even allows you to register the exact same function more than once, for what that's worth.

That would work. In cases where someone might register/unregister many callbacks, you might use UUIDs as keys instead (avoids the ABA problem).

> When I do GUI programming, this is usually how things work. For instance, I use GTK2 (though usually with Pike rather than Python), and I can connect a signal to a callback function. Any given signal could have multiple callbacks attached to it, so it's similar to your case. I frequently depend on the GTK engine retaining a reference to my function (and thus to any data it requires), as I tend not to hang onto any inner objects that don't need retention. Once the parent object is destroyed, all its callbacks get dereferenced. Consider this simplified form:
>
> def popup_window():
>     w = Window()
>     # Add layout, info, whatever it takes
>     btn = Button("Close")
>     w.add(btn)  # actually it'd be added to a layout
>     btn.signal_connect("clicked", lambda *args: w.destroy())
>
> The GUI back end will hang onto a reference to the window, because it's currently on screen; to the button, because it's attached to the window; and to my function, because it's connected to a button signal. Then when you click the button, the window gets destroyed, which destroys the button, which unregisters all its callbacks. At that point, there are no refs to the function, so it can get disposed of. That button function was the last external reference to the window, and now that it's not on screen, its Python object can also be disposed of, as can the button inside. So it'll all clean up fairly nicely; as long as the callback gets explicitly deregistered, that's the end of everything.

OK, so if I'm reading your code correctly, you're breaking the cycle in your object graph by making the GUI the owner of the callback, correct? No other chunk of code has a reference to the callback, correct?

Thanks,
Cem Karan
--
https://mail.python.org/mailman/listinfo/python-list
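The integer-ticket scheme Chris describes (unregister by index, leave a hole, skip the Nones) could be sketched as follows; the function names are placeholders, not anyone's real API:

```python
callbacks = []

def register(func):
    # The ticket is just the callback's index in the list.
    callbacks.append(func)
    return len(callbacks) - 1

def unregister(ticket):
    # Leave a hole rather than shifting later entries, so every
    # previously issued ticket stays valid.
    callbacks[ticket] = None

def fire():
    for func in callbacks:
        if func is not None:        # skip unregistered slots
            func()

calls = []
register(lambda: calls.append("a"))
ticket = register(lambda: calls.append("b"))
register(lambda: calls.append("c"))
unregister(ticket)
fire()                              # only "a" and "c" run
```

Because the ticket is independent of the callable's identity, this also handles registering the same closure twice, which remove-by-value cannot.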
Re: Design thought for callbacks
On Feb 21, 2015, at 12:42 AM, Chris Angelico ros...@gmail.com wrote: On Sat, Feb 21, 2015 at 1:44 PM, Cem Karan cfkar...@gmail.com wrote: In order to inform users that certain bits of state have changed, I require them to register a callback with my code. The problem is that when I store these callbacks, it naturally creates a strong reference to the objects, which means that if they are deleted without unregistering themselves first, my code will keep the callbacks alive. Since this could lead to really weird and nasty situations, I would like to store all the callbacks in a WeakSet (https://docs.python.org/3/library/weakref.html#weakref.WeakSet). That way, my code isn't the reason why the objects are kept alive, and if they are no longer alive, they are automatically removed from the WeakSet, preventing me from accidentally calling them when they are dead. My question is simple; is this a good design? If not, why not? Are there any potential 'gotchas' I should be worried about? No, it's not. I would advise using strong references - if the callback is a closure, for instance, you need to hang onto it, because there are unlikely to be any other references to it. If I register a callback with you, I expect it to be called; I expect, in fact, that that *will* keep my object alive. OK, so it would violate the principle of least surprise for you. Interesting. Is this a general pattern in python? That is, callbacks are owned by what they are registered with? In the end, I want to make a library that offers as few surprises to the user as possible, and no matter how I think about callbacks, they are surprising to me. If callbacks are strongly-held, then calling 'del foo' on a callable object may not make it go away, which can lead to weird and nasty situations. 
Weakly-held callbacks mean that I (as the programmer) know that objects will go away after the next garbage collection (see Frank's earlier message), so I don't get 'dead' callbacks coming back from the grave to haunt me. So, what's the consensus on the list: strongly-held callbacks, or weakly-held ones?

Thanks,
Cem Karan
--
https://mail.python.org/mailman/listinfo/python-list
Re: Design thought for callbacks
On Feb 21, 2015, at 8:15 AM, Chris Angelico ros...@gmail.com wrote:

> On Sun, Feb 22, 2015 at 12:13 AM, Cem Karan cfkar...@gmail.com wrote:
>> OK, so it would violate the principle of least surprise for you. Interesting. Is this a general pattern in python? That is, callbacks are owned by what they are registered with? In the end, I want to make a library that offers as few surprises to the user as possible, and no matter how I think about callbacks, they are surprising to me. If callbacks are strongly-held, then calling 'del foo' on a callable object may not make it go away, which can lead to weird and nasty situations. Weakly-held callbacks mean that I (as the programmer) know that objects will go away after the next garbage collection (see Frank's earlier message), so I don't get 'dead' callbacks coming back from the grave to haunt me. So, what's the consensus on the list, strongly-held callbacks, or weakly-held ones?
>
> I don't know about Python specifically, but it's certainly a general pattern in other languages. They most definitely are owned, and it's the only model that makes sense when you use closures (which won't have any other references anywhere).

I agree about closures; it's the only way they could work. When I was originally thinking about the library, I was trying to include all types of callbacks, including closures and callable objects. The callable objects may pass themselves, or one of their methods to the library, or may do something really weird. Although I just realized that closures may cause another problem. In my code, I expect that many different callbacks can be registered for the same event. Unregistering means you request to be unregistered for the event. How do you do that with a closure? Aren't they anonymous?

> If you're expecting 'del foo' to destroy the object, then you have a bigger problem than callbacks, because that's simply not how Python works. You can't _ever_ assume that deleting something from your local namespace will destroy the object, because there can always be more references. So maybe you need a more clear way of saying "I'm done with this, get rid of it."

Agreed about 'del', and I don't assume that the object goes away at that point. The problem is debugging and determining WHY your object is still around. I know a combination of logging and gc.get_referrers() will probably help you figure out why something is still around, but I'm trying to avoid that headache.

I guess the real problem is how this creates cycles in the call graph. User code effectively owns the library code, which via callbacks owns the user code. I have no idea where the best point in the cycle is to break it without surprising someone down the road. The only idea I have is to redesign the library a little, and make anything that accepts a callback actually be a subclass of collections.abc.Container, or even collections.abc.MutableSet. That makes it very obvious that the object owns the callback, and that you will need to remove your object to unregister it. The only problem is how to handle closures; since they are anonymous, how do you decide which one to remove?

Thanks,
Cem Karan
--
https://mail.python.org/mailman/listinfo/python-list
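The collections.abc.MutableSet idea above might look like the following sketch: the registry is visibly a container, so ownership of callbacks is explicit. This is an illustration, not the library's real API:

```python
from collections.abc import MutableSet

class CallbackSet(MutableSet):
    """A set you can see and edit: adding a callback obviously stores
    it, and removing it is obviously the way to unregister."""
    def __init__(self):
        self._items = set()
    def __contains__(self, item):
        return item in self._items
    def __iter__(self):
        return iter(self._items)
    def __len__(self):
        return len(self._items)
    def add(self, item):
        self._items.add(item)
    def discard(self, item):
        self._items.discard(item)
    def fire(self):
        for callback in list(self._items):
            callback()

events = CallbackSet()
seen = []
def on_event():
    seen.append("event")

events.add(on_event)
events.fire()             # seen == ["event"]
events.discard(on_event)  # unregistering is just set removal
events.fire()             # nothing registered, nothing called
```

The closure problem Cem raises remains: you can only discard a closure if you kept the reference add() was given, which is the same requirement as any remove-by-identity scheme.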
Re: Design thought for callbacks
On Feb 21, 2015, at 12:41 AM, Frank Millman fr...@chagford.com wrote: Cem Karan cfkar...@gmail.com wrote in message news:33677ae8-b2fa-49f9-9304-c8d937842...@gmail.com... Hi all, I'm working on a project that will involve the use of callbacks, and I want to bounce an idea I had off of everyone to make sure I'm not developing a bad idea. Note that this is for python 3.4 code; I don't need to worry about any version of python earlier than that. In order to inform users that certain bits of state have changed, I require them to register a callback with my code. The problem is that when I store these callbacks, it naturally creates a strong reference to the objects, which means that if they are deleted without unregistering themselves first, my code will keep the callbacks alive. Since this could lead to really weird and nasty situations, I would like to store all the callbacks in a WeakSet (https://docs.python.org/3/library/weakref.html#weakref.WeakSet). That way, my code isn't the reason why the objects are kept alive, and if they are no longer alive, they are automatically removed from the WeakSet, preventing me from accidentally calling them when they are dead. My question is simple; is this a good design? If not, why not? Are there any potential 'gotchas' I should be worried about? I tried something similar a while ago, and I did find a gotcha. The problem lies in this phrase - if they are no longer alive, they are automatically removed from the WeakSet, preventing me from accidentally calling them when they are dead. I found that the reference was not removed immediately, but was waiting to be garbage collected. During that window, I could call the callback, which resulted in an error. There may have been a simple workaround. Perhaps someone else can comment. THAT would be one heck of a gotcha! Must have been fun debugging that one! Cem Karan -- https://mail.python.org/mailman/listinfo/python-list
Re: Design thought for callbacks
On Feb 21, 2015, at 8:37 AM, Mark Lawrence breamore...@yahoo.co.uk wrote: On 21/02/2015 05:41, Frank Millman wrote: Cem Karan cfkar...@gmail.com wrote in message news:33677ae8-b2fa-49f9-9304-c8d937842...@gmail.com... Hi all, I'm working on a project that will involve the use of callbacks, and I want to bounce an idea I had off of everyone to make sure I'm not developing a bad idea. Note that this is for python 3.4 code; I don't need to worry about any version of python earlier than that. In order to inform users that certain bits of state have changed, I require them to register a callback with my code. The problem is that when I store these callbacks, it naturally creates a strong reference to the objects, which means that if they are deleted without unregistering themselves first, my code will keep the callbacks alive. Since this could lead to really weird and nasty situations, I would like to store all the callbacks in a WeakSet (https://docs.python.org/3/library/weakref.html#weakref.WeakSet). That way, my code isn't the reason why the objects are kept alive, and if they are no longer alive, they are automatically removed from the WeakSet, preventing me from accidentally calling them when they are dead. My question is simple; is this a good design? If not, why not? Are there any potential 'gotchas' I should be worried about? I tried something similar a while ago, and I did find a gotcha. The problem lies in this phrase - if they are no longer alive, they are automatically removed from the WeakSet, preventing me from accidentally calling them when they are dead. I found that the reference was not removed immediately, but was waiting to be garbage collected. During that window, I could call the callback, which resulted in an error. There may have been a simple workaround. Perhaps someone else can comment. Frank Millman https://docs.python.org/3/library/gc.html has a collect function. 
That seems like a simple workaround, but whether or not it classifies as a good solution I'll leave to others, I'm not qualified to say. Unfortunately, depending on how many objects you have in your object graph, it can slow your code down a fair amount. I think Frank is right about how a WeakSet might be a bad idea in this case. You really need to know if an object is alive or dead, and not some indeterminate state. Thanks, Cem Karan -- https://mail.python.org/mailman/listinfo/python-list
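The window Frank describes shows up on CPython when the registered object sits in a reference cycle: its refcount never reaches zero at 'del', so it lingers in the WeakSet until the cyclic collector runs. A small demonstration of that CPython-specific behavior (gc is disabled during setup only to make the window deterministic):

```python
import gc
import weakref

gc.disable()                    # keep the cyclic collector from running early

class Listener:
    def __init__(self):
        self.cycle = self       # reference cycle: refcount can't hit zero
    def __call__(self):
        pass

callbacks = weakref.WeakSet()
obj = Listener()
callbacks.add(obj)
del obj

# The object is logically dead, but the cycle keeps it physically alive,
# so the WeakSet still holds it -- this is Frank's window.
in_window = len(callbacks)      # 1 on CPython

gc.collect()                    # the workaround Mark points at
after_gc = len(callbacks)       # 0: the weak reference is now cleared
gc.enable()
```

Without the cycle, CPython's refcounting would clear the weak reference the moment 'del obj' ran, which is why the gotcha is easy to miss in simple tests.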
Design thought for callbacks
Hi all, I'm working on a project that will involve the use of callbacks, and I want to bounce an idea I had off of everyone to make sure I'm not developing a bad idea. Note that this is for python 3.4 code; I don't need to worry about any version of python earlier than that. In order to inform users that certain bits of state have changed, I require them to register a callback with my code. The problem is that when I store these callbacks, it naturally creates a strong reference to the objects, which means that if they are deleted without unregistering themselves first, my code will keep the callbacks alive. Since this could lead to really weird and nasty situations, I would like to store all the callbacks in a WeakSet (https://docs.python.org/3/library/weakref.html#weakref.WeakSet). That way, my code isn't the reason why the objects are kept alive, and if they are no longer alive, they are automatically removed from the WeakSet, preventing me from accidentally calling them when they are dead. My question is simple; is this a good design? If not, why not? Are there any potential 'gotchas' I should be worried about? Thanks, Cem Karan -- https://mail.python.org/mailman/listinfo/python-list
Re: ANN: unpyc3 - a python bytecode decompiler for Python3
On Jan 28, 2015, at 5:02 PM, Chris Angelico ros...@gmail.com wrote:

> On Thu, Jan 29, 2015 at 8:52 AM, Devin Jeanpierre jeanpierr...@gmail.com wrote:
>> Git doesn't help if you lose your files in between commits, or if you lose the entire directory between pushes.
>
> So you commit often and push immediately. Solved.
>
> ChrisA

Just to expand on what Chris is saying, learn to use branches. I use git flow ([1][2]), but you don't need it, plain old branches are fine. Then you can have a feature branch like 'Joes_current', or something similar, which you and only you push/pull from. Whenever you're done with it, you can merge the changes back into whatever you or your group see as the real branch. That is the model I use at work, and it works fairly well; it's saved me once already when the laptop I was working on decided to die on me.

Thanks,
Cem Karan

[1] http://nvie.com/posts/a-successful-git-branching-model/
[2] https://github.com/nvie/gitflow
--
https://mail.python.org/mailman/listinfo/python-list
Re: Searching through more than one file.
On Dec 29, 2014, at 2:47 AM, Rick Johnson <rantingrickjohn...@gmail.com> wrote:

> On Sunday, December 28, 2014 11:29:48 AM UTC-6, Seymore4Head wrote:
>> I need to search through a directory of text files for a string.
>> Here is a short program I made in the past to search through a
>> single text file for a line of text.
>
> Step1: Search through a single file.
>     # Just a few more brush strokes...
> Step2: Search through all files in a directory.
>     # Time to go exploring!
> Step3: Option to filter by file extension.
>     # Waste not, want not!
> Step4: Option for recursing down sub-directories.
>     # Look out deeply nested structures, here i come!
>     # Look out deeply nested structures, here i come!
>     # Look out deeply nested structures, here i come!
>     # Look out deeply nested structures, here i come!
>     # Look out deeply nested structures, here i come!
>     # Look out deeply nested structures, here i come!
>     # Look out deeply nested structures, here i come!
>     [Oops, fell into a recursive black hole!]
>     # Look out deeply nested structures, here i come!
>     # Look out deeply nested structures, here i come!
>     # Look out deeply nested structures, here i come!
>     # Look out deeply nested structures, here i come!
>     [BREAK]
>     # Whew, no worries, MaximumRecursionError is my best friend! ;-)
>
> In addition to the other advice, you might want to check out os.walk()

DEFINITELY use os.walk() if you're going to recurse through a directory tree. Here is an untested program I wrote that should do what you want. Modify as needed:

# This is all Python 3 code, although I believe it will run under Python 2
# as well.
# os.path is documented at https://docs.python.org/3/library/os.path.html
# os.walk is documented at https://docs.python.org/3/library/os.html#os.walk
# logging is documented at https://docs.python.org/3/library/logging.html

import os
import os.path
import logging

# Logging messages can be filtered by level.  If you set the level really
# low, then low-level messages, and all higher-level messages, will be
# logged.  However, if you set the filtering level higher, then low-level
# messages will not be logged.  Debug messages are lower than info
# messages, so if you comment out the first line, and uncomment the
# second, you will only get info messages (right now you're getting both).
# If you look through the code, you'll see that I go up in levels as I
# work my way inward through the filters; this makes debugging really,
# really easy.  I'll start out with my level high, and if my code works,
# I'm done.  However, if there is a bug, I'll work my way downwards
# towards lower and lower debug levels, which gives me more and more
# information.  Eventually I'll hit a level where I know enough about
# what is going on that I can fix the problem.  By the way, if you
# comment out both lines, you shouldn't have any logging at all.
logging.basicConfig(level=logging.DEBUG)
##logging.basicConfig(level=logging.INFO)

EXTENSIONS = {'.txt'}

def do_something_useful(real_path):
    # I deleted the original message, so I have no idea what you were
    # trying to accomplish, so I'm punting the definition of this
    # function back to you.
    pass

for root, dirs, files in os.walk('/'):
    for f in files:
        # os.path.realpath() expands symbolic links, cleans up double
        # slashes, etc.  This can be useful when you're trying to debug
        # why something isn't working via logging.
        real_path = os.path.realpath(os.path.join(root, f))
        logging.debug("Operating on path '{0!s}'".format(real_path))
        (r, e) = os.path.splitext(real_path)
        if e in EXTENSIONS:
            # If we've made a mistake in our EXTENSIONS set, we might
            # never reach this point.
            logging.info("Selected path '{0!s}'".format(real_path))
            do_something_useful(real_path)

As a note, for the sake of speed and your own sanity, you probably want to do the easiest/computationally cheapest filtering first. That means selecting the files that match your extensions first, and then filtering those files by their contents second.
Finally, if you are planning on parsing command-line options, DON'T do it by hand! Use argparse (https://docs.python.org/3/library/argparse.html) instead. Thanks, Cem Karan -- https://mail.python.org/mailman/listinfo/python-list
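To make the argparse suggestion concrete, here is a minimal sketch of a command-line interface for the search tool described above. The option names and defaults are illustrative assumptions, not from the original post:

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(
        description="Search text files under a directory for a string.")
    # Positional arguments: what to search for, and where to start.
    parser.add_argument("pattern", help="string to search for")
    parser.add_argument("directory", nargs="?", default=".",
                        help="root directory to walk (default: current)")
    # action="append" lets the user pass -e multiple times to build a
    # list of extensions; None means "search every file".
    parser.add_argument("-e", "--extension", action="append",
                        default=None, metavar="EXT",
                        help="only search files with this extension; "
                             "may be given more than once")
    parser.add_argument("-r", "--recursive", action="store_true",
                        help="recurse into sub-directories")
    return parser

# Parsing an explicit argument list instead of sys.argv, so the sketch
# can be exercised directly:
args = build_parser().parse_args(["needle", "/tmp", "-e", ".txt", "-r"])
```

argparse generates `--help` output, validates arguments, and reports usage errors for you, which is exactly the hand-written boilerplate the advice above is warning against.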
Re: resource based job queue manager
On Dec 19, 2014, at 11:53 AM, Parthiban Ramachandran <rparthib...@gmail.com> wrote:

> can someone suggest a resource based job queue manager. for eg i have 3 resources and 10 jobs based on the resource busy/free we should start running the jobs. I can write the code but want to know if there is any established scheduler which can run the jobs from different servers too.

Try SCOOP: https://code.google.com/p/scoop/

Thanks, Cem Karan
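SCOOP handles the multi-server case the question asks about; for a single machine, the "3 resources, 10 jobs" pattern can be sketched with nothing but the standard library's concurrent.futures, where max_workers caps the number of busy resources. The job body here is a stand-in:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_job(job_id):
    # Stand-in for real work; replace with the actual job body.
    return job_id * job_id

# max_workers=3 models the 3 resources: at most 3 jobs run at once,
# and the remaining jobs wait in the executor's internal queue until
# a worker (resource) frees up.
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(run_job, i) for i in range(10)]  # 10 jobs
    # as_completed() yields futures as their jobs finish, in whatever
    # order the scheduler ran them, so we sort for a stable result.
    results = sorted(f.result() for f in as_completed(futures))
```

The same code runs unchanged with ProcessPoolExecutor for CPU-bound jobs; SCOOP's futures API is deliberately similar, but distributing across servers is its own setup.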
Re: Bug? Feature? setattr(foo, '3', 4) works!
On Dec 19, 2014, at 10:33 AM, random...@fastmail.us wrote:

> On Fri, Dec 19, 2014, at 07:23, Ben Finney wrote:
>> Cem Karan <cfkar...@gmail.com> writes:
>>> I'd like to suggest that getattr(), setattr(), and hasattr() all be
>>> modified so that syntactically invalid statements raise SyntaxErrors.
>>
>> What syntactically invalid statements? The only syntactically invalid
>> statements I see you presenting are ones that *already* raise
>> SyntaxError. I think you mean that setting an attribute on an object
>> should be a SyntaxError if the resulting attribute's name is not a
>> valid identifier. But why should a valid statement produce
>> SyntaxError? I'm −1 on such a change.
>
> And some APIs - ctypes, for example - actually require using getattr
> with an invalid identifier in some cases (where attribute access is
> used for an underlying concept with names that are usually, but not
> always, valid identifiers: in ctypes' case, looking up symbols from
> DLLs.)

This is the one part I didn't know of; if ctypes requires this behavior, then it can't be changed.

Dave Angel, the reason I wanted to raise a SyntaxError is that, from a user's point of view, the two failures look like the same type of error. That said, you're right that raising SyntaxError would make things confusing for anyone trying to debug the interpreter itself. Regardless, because ctypes requires the current behavior, it can't be changed. I'm dropping the suggestion.

Thanks, Cem Karan
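The ctypes point is worth illustrating: some APIs deliberately map attribute access onto a namespace whose keys need not be valid Python identifiers (for ctypes, DLL symbol names). A minimal stand-in class, not ctypes itself, showing why getattr() must accept arbitrary strings:

```python
class SymbolTable:
    """Maps attribute access onto an arbitrary string-keyed table,
    in the spirit of ctypes looking up symbols in a loaded DLL."""
    def __init__(self, symbols):
        self._symbols = symbols

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, so _symbols
        # itself is found without recursing into this method.
        try:
            return self._symbols[name]
        except KeyError:
            raise AttributeError(name)

table = SymbolTable({"valid_name": 1, "not-an-identifier": 2})
a = table.valid_name                      # reachable with dotted syntax
b = getattr(table, "not-an-identifier")   # only reachable via getattr()
```

If getattr() rejected non-identifier strings, the second lookup would be impossible, even though the underlying symbol is perfectly real.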
Bug? Feature? setattr(foo, '3', 4) works!
I'm bringing this discussion over from the python-ideas mailing list to see what people think. I accidentally discovered that the following works, at least in Python 3.4.2:

>>> class foo(object):
...     pass
...
>>> setattr(foo, '3', 4)
>>> dir(foo)
['3', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__']
>>> getattr(foo, '3')
4
>>> bar = foo()
>>> dir(bar)
['3', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__']
>>> getattr(bar, '3')
4
>>> hasattr(foo, '3')
True
>>> hasattr(bar, '3')
True

However, the following doesn't work:

>>> foo.3
  File "<stdin>", line 1
    foo.3
        ^
SyntaxError: invalid syntax
>>> bar.3
  File "<stdin>", line 1
    bar.3
        ^
SyntaxError: invalid syntax

I'd like to suggest that getattr(), setattr(), and hasattr() all be modified so that syntactically invalid statements raise SyntaxErrors. In messages on python-ideas, Nick Coghlan mentioned that since a namespace is just a dictionary, the normal error raised would be TypeError and not SyntaxError; I'd like to suggest special-casing this so that using getattr(), setattr(), and hasattr() in this way raises SyntaxError instead, as I think that will be less astonishing. Thoughts?

Thanks, Cem Karan
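A short illustration of Nick Coghlan's point that a namespace is just a dictionary: setattr() ends up writing an entry into the object's `__dict__`, so any string key works, while the dotted `foo.3` form is rejected by the parser before any attribute machinery ever runs.

```python
class Foo:
    pass

# setattr() is a plain dict write under the hood, so a non-identifier
# key like '3' is accepted without complaint.
setattr(Foo, '3', 4)
entry = Foo.__dict__['3']   # the attribute is just a dict entry
value = getattr(Foo, '3')

# The dotted form, by contrast, never reaches the attribute machinery:
# the parser itself rejects it at compile time.
parser_rejected = False
try:
    compile("Foo.3", "<test>", "eval")
except SyntaxError:
    parser_rejected = True
```

This is why the proposal would need special-casing: the SyntaxError for `foo.3` comes from the compiler, which getattr() and friends never involve.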
[issue4830] regrtest.py -u largefile test_io fails on OS X 10.5.6
New submission from Cem Karan <cfkaran2+pyt...@gmail.com>:

I'm running OS X 10.5.6 (uname -a == Darwin 9.6.0 Darwin Kernel Version 9.6.0: Mon Nov 24 17:37:00 PST 2008; root:xnu-1228.9.59~1/RELEASE_I386 i386). I get the following error after compiling Python 3.0. Note that I have NOT installed it; I'm just trying to run the regression tests on the build.

Python-3.0 cfkaran2$ ./Lib/test/regrtest.py -u largefile test_io
  File "./Lib/test/regrtest.py", line 183
    print(msg, file=sys.stderr)
                   ^
SyntaxError: invalid syntax

I suspect that the tester is not using the newly built Python 3.0, but is using whatever is installed on the system, though I have not checked this at all.

--
components: Tests
messages: 79044
nosy: ironsmith
severity: normal
status: open
title: regrtest.py -u largefile test_io fails on OS X 10.5.6
type: crash
versions: Python 3.0

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue4830>
___
___
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com