Re: Progress on the Gilectomy

2017-06-20 Thread Cem Karan
On Jun 20, 2017, at 1:19 AM, Paul Rubin <no.email@nospam.invalid> wrote:

> Cem Karan <cfkar...@gmail.com> writes:
>> Can you give examples of how it's not reliable?
> 
> Basically there's a chance of it leaking memory by mistaking a data word
> for a pointer.  This is unlikely to happen by accident and usually
> inconsequential if it does happen, but maybe there could be malicious
> data that makes it happen.

Got it, thank you.  My processes will run for 1-2 weeks at a time, so I can 
handle minor memory leaks over that time without too much trouble.
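
For what it's worth, the failure mode Paul describes can be sketched in pure
Python: a conservative collector scans raw machine words and must treat any
word that *looks like* the address of a live block as a pointer.  The toy
model below is entirely invented for illustration (it is not bdwgc's API);
it just shows how plain data can pin a block:

```python
# Toy model of conservative marking: "words" is a list of integer machine
# words (stack/register/heap contents), and "blocks" maps live block
# addresses to block names.  A conservative collector scans the same way.
def conservative_mark(words, blocks):
    reachable = set()
    for w in words:
        if w in blocks:       # any word matching a block address is
            reachable.add(w)  # treated as a pointer -- even plain data
    return reachable

blocks = {0x1000: "list", 0x2000: "dict"}
# 0x2000 below is just an integer that happens to equal a block address,
# so the "dict" block is retained even with no real pointer to it:
words = [0x1000, 0x2000, 42]
print(sorted(hex(a) for a in conservative_mark(words, blocks)))
```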

> Also, it's a non-compacting gc that has to touch all the garbage as it
> sweeps, not a reliability issue per se, but not great for performance
> especially in large, long-running systems.

I'm not too sure how much of a performance impact that will have.  My code 
generates a very large number of tiny, short-lived objects at a fairly high 
rate throughout its lifetime.  At least in the last iteration of the code, 
garbage collection consumed less than 1% of the total runtime.  Maybe this is 
something that needs to be profiled to see how well it works?

> It's brilliant though.  It's one of those things that seemingly can't
> possibly work, but it turns out to be quite effective.

Agreed!  I **still** can't figure out how they managed to do it; it really does 
look like it shouldn't work at all!

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Progress on the Gilectomy

2017-06-19 Thread Cem Karan

On Jun 19, 2017, at 6:19 PM, Gregory Ewing <greg.ew...@canterbury.ac.nz> wrote:

> Ethan Furman wrote:
>> Let me ask a different question:  How much effort is required at the C level 
>> when using tracing garbage collection?
> 
> That depends on the details of the GC implementation, but often
> you end up swapping one form of boilerplate (maintaining ref
> counts) for another (such as making sure the GC system knows
> about all the temporary references you're using).
> 
> Some, such as the Boehm collector, try to figure it all out
> automagically, but they rely on non-portable tricks and aren't
> totally reliable.

Can you give examples of how it's not reliable?  I'm currently using it in one 
of my projects, so if it has problems, I need to know about them.

On the main topic: I think that a good tracing garbage collector would probably 
be a good idea.  I've been having a real headache binding Python to my C 
library via ctypes, and a large part of that problem is that I've got two 
different garbage collectors (Python's and bdwgc).  I think I've got it worked 
out at this point, but it would have been convenient to get memory allocated 
from Python's garbage-collected heap on the C side.  A lot fewer headaches.

Thanks,
Cem Karan


Re: pip list --outdated gives all packages

2017-05-30 Thread Cem Karan

On May 29, 2017, at 1:51 AM, Cecil Westerhof <ce...@decebal.nl> wrote:

> On Monday 29 May 2017 06:16 CEST, Cecil Westerhof wrote:
> 
>>> I'm completely flummoxed then; on my machines I get the 'old'
>>> behavior. Can you try a completely clean Debian install somewhere
>>> (maybe on a virtual box) and see what happens? I'm wondering if
>>> there is something going on with your migration.
>> 
>> I will do that. By the way, because of hardware I installed Stretch
>> which at the moment is still in testing.
> 
> I tried it. (There were some problems. It looks like you can not do certain
> things in VirtualBox. But that is for another time.)
> I get the same result. So maybe I should put it on the Debian list.

Yeah, I have no idea what to tell you.  Good luck!

Thanks,
Cem Karan


Re: pip list --outdated gives all packages

2017-05-28 Thread Cem Karan

On May 27, 2017, at 11:10 AM, Cecil Westerhof <ce...@decebal.nl> wrote:

> On Saturday 27 May 2017 16:34 CEST, Cem Karan wrote:
> 
>> 
>> On May 27, 2017, at 7:15 AM, Cecil Westerhof <ce...@decebal.nl> wrote:
>> 
>>> On Saturday 27 May 2017 12:33 CEST, Cecil Westerhof wrote:
>>> 
>>>> I wrote a script to run as a cron job to check if I need to update
>>>> my Python installations. I migrated from openSUSE to Debian and
>>>> that does not work anymore (pip2 and pip3): it displays the same
>>>> with and without --outdated. Does anyone know what the problem
>>>> could be?
>>> 
>>> It does not display exactly the same thing: it displays all
>>> packages, while the old version displayed only the outdated
>>> ones. I already made a change with awk, but I would prefer the
>>> old functionality.
>>> 
>>> By the way, the patch is:
>>> pip2 list --outdated --format=legacy | awk '
>>> {
>>>     if (substr($2, 2, length($2) - 2) != $5) {
>>>         print $0
>>>     }
>>> }'
>> 
>> Could you check the output of 'pip3 --version'? When I tested pip3
>> on my machine, 'pip3 list --outdated' only yielded the outdated
>> packages, not a list of everything out there.
> 
> Both as normal user and root I get:
>    pip 9.0.1 from /usr/lib/python3/dist-packages (python 3.5)

I'm completely flummoxed then; on my machines I get the 'old' behavior.  Can 
you try a completely clean Debian install somewhere (maybe on a virtual box) 
and see what happens?  I'm wondering if there is something going on with your 
migration.

Thanks,
Cem Karan


Re: pip list --outdated gives all packages

2017-05-27 Thread Cem Karan

On May 27, 2017, at 7:15 AM, Cecil Westerhof <ce...@decebal.nl> wrote:

> On Saturday 27 May 2017 12:33 CEST, Cecil Westerhof wrote:
> 
>> I wrote a script to run as a cron job to check if I need to update
>> my Python installations. I migrated from openSUSE to Debian and that
>> does not work anymore (pip2 and pip3): it displays the same with and
>> without --outdated. Does anyone know what the problem could be?
> 
> It does not display exactly the same thing: it displays all packages,
> while the old version displayed only the outdated ones. I already
> made a change with awk, but I would prefer the old functionality.
> 
> By the way, the patch is:
>    pip2 list --outdated --format=legacy | awk '
>    {
>        if (substr($2, 2, length($2) - 2) != $5) {
>            print $0
>        }
>    }'

Could you check the output of 'pip3 --version'?  When I tested pip3 on my 
machine, 'pip3 list --outdated' only yielded the outdated packages, not a list 
of everything out there.

I'm asking about 'pip3 --version' because I found that my PATH as an ordinary 
user and as root were different, so my scripts would work as an ordinary user 
and then fail as root.
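
As an aside, newer pips can emit JSON, which sidesteps the awk parsing
entirely.  A sketch (hedged: --format=json exists in pip 9, but check your
version; the function and sample data below are invented for illustration):

```python
import json
import subprocess
import sys

def outdated_packages(pip=None):
    """Return [(name, installed, latest), ...] for outdated packages."""
    pip = pip or [sys.executable, "-m", "pip"]
    out = subprocess.check_output(pip + ["list", "--outdated",
                                         "--format=json"])
    return parse_outdated(out)

def parse_outdated(raw):
    # Keep only entries whose installed version differs from the latest,
    # mirroring what 'pip list --outdated' used to do by itself.
    return [(p["name"], p["version"], p["latest_version"])
            for p in json.loads(raw)
            if p["version"] != p["latest_version"]]

sample = b'[{"name": "pip", "version": "9.0.0", "latest_version": "9.0.1"}]'
print(parse_outdated(sample))  # [('pip', '9.0.0', '9.0.1')]
```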

Thanks,
Cem Karan


Re: Survey: improving the Python std lib docs

2017-05-17 Thread Cem Karan
On May 16, 2017, at 12:36 PM, rzed <rzan...@gmail.com> wrote:

> On Friday, May 12, 2017 at 6:02:58 AM UTC-4, Steve D'Aprano wrote:
>> One of the more controversial aspects of the Python ecosystem is the Python
>> docs. Some people love them, and some people hate them and describe them as
>> horrible.
>> 
> [...]
> 
> One thing I would love to see in any function or class docs is a few example 
> invocations, preferably non-trivial. If I need to see more, I can read the 
> entire doc, but most times I just want a refresher on how the function is 
> called. Does it use keywords? Are there required nameless parameters? In what 
> order? A line or two would immediately clarify that most of the time. 
> 
> Apart from that, links to docs for uncommon functions (or to the docs of the 
> module, if there are many) would be at least somewhat useful.

I'd like to see complete signatures in the docstrings, so when I use help() on 
something that has *args or **kwargs I can see what the arguments actually are.
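
The *args/**kwargs problem shows up with decorators too; functools.wraps at
least sets __wrapped__, which inspect.signature follows to recover the real
parameters.  A quick illustration (the names here are made up for the
example):

```python
import functools
import inspect

def logged(func):
    @functools.wraps(func)           # copies __doc__, __name__, __wrapped__
    def wrapper(*args, **kwargs):    # the opaque signature help() would show
        return func(*args, **kwargs)
    return wrapper

@logged
def connect(host, port=8080, *, timeout=30.0):
    """Open a connection."""

# inspect.signature follows __wrapped__, so the real parameters survive:
print(inspect.signature(connect))    # (host, port=8080, *, timeout=30.0)
```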

Thanks,
Cem Karan


Re: Battle of the garbage collectors, or ARGGHHHHHH!!!!

2017-04-26 Thread Cem Karan

On Apr 24, 2017, at 8:54 PM, Jon Ribbens <jon+use...@unequivocal.eu> wrote:

> On 2017-04-24, CFK <cfkar...@gmail.com> wrote:
>> Long version: I'm trying to write bindings for python via ctypes to control
>> a library written in C that uses the bdwgc garbage collector (
>> http://www.hboehm.info/gc/).  The bindings mostly work, except for when
>> either bdwgc or python's garbage collector decide to get into an argument
>> over what is garbage and what isn't, in which case I get a segfault because
>> one or the other collector has already reaped the memory.
> 
> Make your Python C objects contain a pointer to a
> GC_MALLOC_UNCOLLECTABLE block that contains a pointer to the
> bwdgc object it's an interface to? And GC_FREE it in tp_dealloc?
> Then bwdgc won't free any C memory that Python is referencing.

OK, I realized today that there was a miscommunication somewhere.  My Python 
code is all pure Python, and the library is pure C; it is not designed to be 
called from Python (it's intended to be language-neutral, so if someone wants 
to call it from a different language, they can).  That means that tp_dealloc 
(which is part of the Python C API) is probably not going to work.

I got interrupted (again), so I didn't have a chance to try the next trick: 
registering the ctypes objects as roots for bdwgc to scan from.  I'm hoping 
that roots aren't removed.  If that works, I'll post it to the list.

Thanks,
Cem Karan


Re: Battle of the garbage collectors, or ARGGHHHHHH!!!!

2017-04-25 Thread Cem Karan

On Apr 24, 2017, at 8:54 PM, Jon Ribbens <jon+use...@unequivocal.eu> wrote:

> On 2017-04-24, CFK <cfkar...@gmail.com> wrote:
>> Long version: I'm trying to write bindings for python via ctypes to control
>> a library written in C that uses the bdwgc garbage collector (
>> http://www.hboehm.info/gc/).  The bindings mostly work, except for when
>> either bdwgc or python's garbage collector decide to get into an argument
>> over what is garbage and what isn't, in which case I get a segfault because
>> one or the other collector has already reaped the memory.
> 
> Make your Python C objects contain a pointer to a
> GC_MALLOC_UNCOLLECTABLE block that contains a pointer to the
> bwdgc object it's an interface to? And GC_FREE it in tp_dealloc?
> Then bwdgc won't free any C memory that Python is referencing.

That's a really clever idea… I'm not near the machine that I could test it on 
right now, but I'll give it a shot tomorrow and see how it works.  I'll let 
everyone know what I find out.

Thanks,
Cem Karan


Re: Battle of the garbage collectors, or ARGGHHHHHH!!!!

2017-04-24 Thread Cem Karan

On Apr 24, 2017, at 6:59 PM, Terry Reedy <tjre...@udel.edu> wrote:

> On 4/24/2017 6:24 PM, CFK wrote:
>> TLDR version: the bdwgc garbage collector (http://www.hboehm.info/gc/) and
>> python's collector are not playing nice with one another, and I need to
>> make them work with each other.
>> Long version: I'm trying to write bindings for python via ctypes to control
>> a library written in C that uses the bdwgc garbage collector (
>> http://www.hboehm.info/gc/).  The bindings mostly work, except for when
>> either bdwgc or python's garbage collector decide to get into an argument
>> over what is garbage and what isn't, in which case I get a segfault because
>> one or the other collector has already reaped the memory.  I need the two
>> sides to play nice with one another.  I can think of two solutions:
>> First, I can replace Python's garbage collector via the functions described
>> at https://docs.python.org/3/c-api/memory.html#customize-memory-allocators
>> so that they use the bdwgc functions instead.  However, this leads me to a
>> whole series of questions:
>>1. Has anyone done anything like this before?
> 
> I know that experiments have been done.
> Have you tried searching 'Python bdwgc garbage collection' or similar?

I did google around a bit, but the results I found weren't relevant.  I was 
hoping someone else on the list had tried, and simply hadn't gotten around to 
posting about it anywhere yet.

Thanks,
Cem Karan


Re: Python replace multiple strings (m*n) combination

2017-02-25 Thread Cem Karan
Another possibility is to form a suffix array 
(https://en.wikipedia.org/wiki/Suffix_array#Applications) as an index for the 
string, and then search for patterns within the suffix array.  The basic idea 
is that you index the string you're searching over once, and then look for 
patterns within it.  
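
To make that concrete, here is a minimal sketch of the idea (the construction
below is deliberately naive, O(n^2 log n), fine for illustration but too slow
for huge texts):

```python
def build_suffix_array(text):
    """Indices of all suffixes of `text`, sorted lexicographically."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def find_all(text, sa, pat):
    """All start positions of `pat` in `text`, via binary search on `sa`."""
    n, m = len(sa), len(pat)
    lo, hi = 0, n
    while lo < hi:                   # leftmost suffix with prefix >= pat
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pat:
            lo = mid + 1
        else:
            hi = mid
    left, hi = lo, n
    while lo < hi:                   # leftmost suffix with prefix > pat
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pat:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[left:lo])       # every suffix in between starts with pat

text = "banana"
sa = build_suffix_array(text)        # index once ...
print(find_all(text, sa, "ana"))     # ... then search many patterns: [1, 3]
```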

The main problem with this method is how you're doing the replacements.  If 
your replacement text can create a new string that matches a different regex 
that occurs later on, then you really should use what INADA Naoki suggested.

Thanks,
Cem Karan

On Feb 25, 2017, at 2:08 PM, INADA Naoki <songofaca...@gmail.com> wrote:

> If you can use third party library, I think you can use Aho-Corasick 
> algorithm.
> 
> https://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_algorithm
> 
> https://pypi.python.org/pypi/pyahocorasick/
> 
> On Sat, Feb 25, 2017 at 3:54 AM,  <kar6...@gmail.com> wrote:
>> I have a task to search for multiple patterns in an incoming string and 
>> replace them with matched patterns. I'm storing all patterns as keys in a 
>> dict and replacements as values, compiling all the patterns with re and 
>> using the sub method on the pattern object for replacement. But the 
>> problem is that I have tens of millions of rows to check against roughly 
>> 1000 patterns, and this turns out to be a very expensive operation.
>> 
>> What can be done to optimize it? Also, I have special characters to 
>> match; where can I specify raw string combinations?
>> 
>> For example, if the search string is not a variable we can say
>> 
>> re.search(r"\$%^search_text", "replace_text", "some_text"), but when I 
>> read from the dict, where should I place the "r" prefix? Unfortunately, 
>> putting it inside the key doesn't work.
>> 
>> Pseudo code (pattern compiled once, outside the loop):
>> 
>> pattern = re.compile('|'.join(regex_map.keys()))
>> for string in genobj_of_million_strings:
>>     yield pattern.sub(lambda x: regex_map[x.group(0)], string)
>> --
>> https://mail.python.org/mailman/listinfo/python-list
> -- 
> https://mail.python.org/mailman/listinfo/python-list



Re: PTH files: Abs paths not working as expected. Symlinks needed?

2017-02-18 Thread Cem Karan

On Feb 16, 2017, at 9:55 PM, Rustom Mody <rustompm...@gmail.com> wrote:

> On Friday, February 17, 2017 at 3:24:32 AM UTC+5:30, Terry Reedy wrote:
>> On 2/15/2017 7:42 AM, poseidon wrote:
>> 
>>> what are pth files for?
>> 
>> They are for extending (mainly) lib/site-packages.  
> 
> 
> Hey Terry!
> This needs to get into more public docs than a one-off post on a newsgroup/ML

+1!

This is the first I've heard of this, and it sounds INCREDIBLY useful!
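
For anyone else who hadn't run into them: a .pth file is just a text file
listing directories, one per line, dropped into site-packages; each listed
directory is appended to sys.path at startup.  The sketch below fakes the
site-packages step with site.addsitedir (which processes .pth files the same
way), so the effect is visible without touching a real install; all the
paths are temporary and invented:

```python
import os
import site
import sys
import tempfile

base = tempfile.mkdtemp()                 # stand-in for site-packages
extra = os.path.join(base, "my_packages")
os.mkdir(extra)

# A .pth file: plain text, one directory per line.
with open(os.path.join(base, "extra.pth"), "w") as f:
    f.write(extra + "\n")

site.addsitedir(base)     # processes *.pth files, like site-packages does
print(extra in sys.path)  # True
```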

Thanks,
Cem Karan


Who owns the memory in ctypes?

2016-11-14 Thread Cem Karan
Hi all, I'm hoping that this will be an easy question.

I have a pile of C code that I wrote that I want to interface with via the 
ctypes module (https://docs.python.org/3/library/ctypes.html).  The C code 
uses the Boehm-Demers-Weiser garbage collector (http://www.hboehm.info/gc/) 
for all of its memory management.  What I want to know is: who owns allocated 
memory?  That is, if my C code allocates memory via GC_MALLOC() (the standard 
call for allocating memory in the garbage collector), and I access some 
object via ctypes in Python, will the Python garbage collector assume that it 
owns it and attempt to dispose of it when it goes out of scope?

Ideally, the memory is owned by the side that created it, with the other side 
simply referencing it, but I want to be sure before I invest a lot of time 
interfacing the two sides together.
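
From my reading so far, the short answer seems to be that ctypes only frees
memory that ctypes itself allocated (e.g. create_string_buffer); a raw
pointer handed over from C is treated as an opaque number and never freed by
Python, so each side keeps ownership of what it allocated.  A toy sketch of
the distinction (no C library involved, so this only demonstrates the Python
half):

```python
import ctypes

# Python-owned: ctypes allocated this buffer, and CPython frees it when
# the last reference to `buf` goes away.
buf = ctypes.create_string_buffer(b"owned by Python")

# C-owned (simulated): a bare address wrapped in c_void_p.  ctypes treats
# this as an opaque integer -- it will never free it, which is exactly
# what you want when C's collector (e.g. bdwgc) owns the block.
addr = ctypes.addressof(buf)
foreign = ctypes.c_void_p(addr)

print(ctypes.string_at(foreign, 15))  # b'owned by Python'
```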

Thanks,
Cem Karan


Re: Byte code descriptions somewhere?

2016-10-02 Thread Cem Karan

On Oct 1, 2016, at 7:34 PM, breamore...@gmail.com wrote:

> On Saturday, October 1, 2016 at 11:57:17 PM UTC+1, Cem Karan wrote:
>> Hi all, I've all of a sudden gotten interested in the CPython interpreter, 
>> and started trying to understand how it ingests and runs byte code.  I found 
>> Include/opcode.h in the python sources, and I found some basic documentation 
>> on how to add in new opcodes online, but I haven't found the equivalent of 
>> an assembly manual like you might for x86, etc.  Is there something similar 
>> to a manual dedicated to python byte code?  Also, is there a manual for how 
>> the interpreter expects the stack, etc. to be setup so that all interactions 
>> go as expected (garbage collections works, exceptions work, etc.)?  
>> Basically, I want a manual similar to what Intel or AMD might put out for 
>> their chips so that all executables behave nicely with one another.
>> 
>> Thanks,
>> Cem Karan
> 
> Further to Ben Finney's answer this 
> https://docs.python.org/devguide/compiler.html should help.
> 
> Kindest regards.
> 
> Mark Lawrence.
> -- 
> https://mail.python.org/mailman/listinfo/python-list

Thank you!

Thanks,
Cem Karan


Re: Byte code descriptions somewhere?

2016-10-02 Thread Cem Karan

On Oct 1, 2016, at 8:30 PM, Ned Batchelder <n...@nedbatchelder.com> wrote:

> On Saturday, October 1, 2016 at 7:48:09 PM UTC-4, Cem Karan wrote:
>> Cool, thank you!  Quick experimentation suggests that I don't need to worry 
>> about marking anything for garbage collection, correct?  The next question 
>> is, how do I create a stream of byte codes that can be interpreted by 
>> CPython directly?  I don't mean 'use the compile module', I mean writing my 
>> own byte array with bytes that CPython can directly interpret.
> 
> In Python 2, you use new.code: 
> https://docs.python.org/2/library/new.html#new.code  It takes a bytestring of 
> byte codes as one of its
> twelve (!) arguments.
> 
> Something that might help (indirectly) with understanding bytecode:
> byterun (https://github.com/nedbat/byterun) is a pure-Python implementation
> of a Python bytecode VM.
> 
> --Ned.

byterun seems like the perfect project to work through to understand things.  
Thank you for pointing it out!

Thanks,
Cem Karan


Re: Byte code descriptions somewhere?

2016-10-02 Thread Cem Karan
On Oct 1, 2016, at 7:56 PM, Chris Angelico <ros...@gmail.com> wrote:

> On Sun, Oct 2, 2016 at 10:47 AM, Cem Karan <cfkar...@gmail.com> wrote:
>> Cool, thank you!  Quick experimentation suggests that I don't need to worry 
>> about marking anything for garbage collection, correct?  The next question 
>> is, how do I create a stream of byte codes that can be interpreted by 
>> CPython directly?  I don't mean 'use the compile module', I mean writing my 
>> own byte array with bytes that CPython can directly interpret.
>> 
> 
> "Marking for garbage collection" in CPython is done by refcounts; the
> bytecode is at a higher level than that.
> 
>>>> dis.dis("x = y*2")
>  1           0 LOAD_NAME                0 (y)
>              3 LOAD_CONST               0 (2)
>              6 BINARY_MULTIPLY
>              7 STORE_NAME               1 (x)
>             10 LOAD_CONST               1 (None)
>             13 RETURN_VALUE
> 
> A LOAD operation will increase the refcount (a ref is on the stack),
> BINARY_MULTIPLY dereferences the multiplicands and adds a ref to the
> product, STORE will deref whatever previously was stored, etc.
> 
> To execute your own code, look at types.FunctionType and
> types.CodeType, particularly the latter's 'codestring' argument
> (stored as the co_code attribute). Be careful: you can easily crash
> CPython if you mess this stuff up :)

Ah, but crashing things is how we learn! :)  

That said, types.CodeType and types.FunctionType appear to be EXACTLY what I'm 
looking for!  Thank you!  Although I have to admit, the built-in docs for 
types.CodeType are concerning... "Create a code object.  Not for the faint of 
heart."  Maybe that should be updated to "Here there be dragons"?  I'll poke 
through Python's sources to get an idea of how to use the codestring argument, 
but I'll probably be asking more questions here about it.
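
For the archives: since Python 3.8 there is a gentler way in than calling the
types.CodeType constructor with its dozen positional arguments, namely
CodeType.replace(), which copies an existing code object with selected fields
swapped.  A small sketch (bytecode details are version-dependent, so treat
this as illustrative only):

```python
import types

def f():
    return 1

# Swap the constant 1 for 2 in f's code object, leaving everything else
# (stack size, flags, names, ...) untouched.
new_code = f.__code__.replace(
    co_consts=tuple(2 if c == 1 else c for c in f.__code__.co_consts))

g = types.FunctionType(new_code, f.__globals__, "g")
print(f(), g())  # 1 2
```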

Thanks,
Cem Karan


Re: Byte code descriptions somewhere?

2016-10-02 Thread Cem Karan
I kind of got the feeling that was so from reading the docs in the source 
code.  Too bad! :(

Cem

On Oct 1, 2016, at 7:53 PM, Paul Rubin <no.email@nospam.invalid> wrote:

> Cem Karan <cfkar...@gmail.com> writes:
>> how do I create a stream of byte codes that can be interpreted by
>> CPython directly?
> 
> Basically, study the already existing code and do something similar.
> The CPython bytecode isn't standardized like JVM bytecode.  It's
> designed for the interpreter's convenience, not officially documented,
> and (somewhat) subject to change between versions.
> -- 
> https://mail.python.org/mailman/listinfo/python-list



Re: Byte code descriptions somewhere?

2016-10-01 Thread Cem Karan
Cool, thank you!  Quick experimentation suggests that I don't need to worry 
about marking anything for garbage collection, correct?  The next question is, 
how do I create a stream of byte codes that can be interpreted by CPython 
directly?  I don't mean 'use the compile module', I mean writing my own byte 
array with bytes that CPython can directly interpret.

Thanks,
Cem Karan


On Oct 1, 2016, at 7:02 PM, Ben Finney <ben+pyt...@benfinney.id.au> wrote:

> Cem Karan <cfkar...@gmail.com> writes:
> 
>> Hi all, I've all of a sudden gotten interested in the CPython
>> interpreter, and started trying to understand how it ingests and runs
>> byte code.
> 
> That sounds like fun!
> 
>> Is there something similar to a manual dedicated to python byte code?
> 
> The Python documentation for the ‘dis’ module shows not only how to use
> that module for dis-assembly of Python byte code, but also a reference
> for the byte code.
> 
>    32.12. dis — Disassembler for Python bytecode
> 
>    <URL:https://docs.python.org/3/library/dis.html>
> 
> -- 
> \ “Skepticism is the highest duty and blind faith the one |
>  `\   unpardonable sin.” —Thomas Henry Huxley, _Essays on |
> _o__)   Controversial Questions_, 1889 |
> Ben Finney
> 
> -- 
> https://mail.python.org/mailman/listinfo/python-list



Byte code descriptions somewhere?

2016-10-01 Thread Cem Karan
Hi all, I've all of a sudden gotten interested in the CPython interpreter, and 
started trying to understand how it ingests and runs byte code.  I found 
Include/opcode.h in the Python sources, and I found some basic documentation 
on how to add new opcodes online, but I haven't found the equivalent of an 
assembly manual like you might find for x86, etc.  Is there something similar 
to a manual dedicated to Python byte code?  Also, is there a manual for how 
the interpreter expects the stack, etc. to be set up so that all interactions 
go as expected (garbage collection works, exceptions work, etc.)?  Basically, 
I want a manual similar to what Intel or AMD might put out for their chips so 
that all executables behave nicely with one another.

Thanks,
Cem Karan


Re: Abusive Italian Spam

2016-09-30 Thread Cem Karan
Honestly, I'm impressed by how little spam ever makes it onto the list.  
Considering the absolute flood of email the lists get, it's impressive work.  
Thank you for all the hard work you guys do for all the rest of us!

Thanks,
Cem Karan

On Sep 29, 2016, at 11:30 AM, Tim Golden <m...@timgolden.me.uk> wrote:

> You may have noticed one or two more of the abusive spam messages slip
> through onto the list. We do have traps for these but, as with most such
> things, they need tuning. (We've discarded many more than you've seen).
> 
> As ever, kudos to Mark Sapiro of the Mailman team for tweaking our
> custom filters and sorting out the archives in a timely fashion.
> 
> TJG
> -- 
> https://mail.python.org/mailman/listinfo/python-list



Re: The Joys Of Data-Driven Programming

2016-08-31 Thread Cem Karan

On Aug 31, 2016, at 9:02 AM, Paul Moore <p.f.mo...@gmail.com> wrote:

> On 31 August 2016 at 13:49, Cem Karan <cfkar...@gmail.com> wrote:
>>> Has anyone else found this to be the case? Is there any "make replacement" 
>>> out there that focuses more on named sets of actions (maybe with 
>>> prerequisite/successor type interdependencies), and less on building file 
>>> dependency graphs?
>> 
>> Maybe Ninja (https://ninja-build.org/)?  I personally like it because of how 
>> simple it is, and the fact that it doesn't use leading tabs the way that 
>> make does.  It is intended to be the assembler for higher-level build 
>> systems which are more like compilers.  I personally use it as a make 
>> replacement because it does what I tell it to do, and nothing else.  It may 
>> fit what you're after.
> 
> It still seems focused on the file dependency graph (at least, from a
> quick look).
> 
> I'm thinking more of the makefile pattern
> 
> myproj.whl:
>     pip wheel .
> ve: build
>     virtualenv ve
>     ve/bin/python -m pip install ./*.whl
> test: ve
>     pushd ve
>     bin/python -m py.test
>     popd
> clean:
>     rm -rf ve
> 
> Basically, a couple of "subcommands", one of which has 2 prerequisites
> that are run if needed. Little more in practice than 2 shell scripts
> with a bit of "if this is already done, skip" logic.
> 
> Most makefiles I encounter or write are of this form, and make
> essentially no use of dependency rules or anything more complex than
> "does the target already exist" checks. Make would be fine for this
> except for the annoying "must use tabs" rule, and the need to rely on
> shell (= non-portable, generally unavailable on Windows) constructs
> for any non-trivial logic.
> 
> In the days when make was invented, not compiling a source file whose
> object file was up to date was a worthwhile time saving. Now I'm more
> likely to just do "cc -c *.c" and not worry about it.

OK, I see what you're doing, and you're right: Ninja could be forced to do 
what you want, but it isn't the tool that you need.

Thanks,
Cem Karan


Re: The Joys Of Data-Driven Programming

2016-08-31 Thread Cem Karan

On Aug 31, 2016, at 8:21 AM, Paul  Moore <p.f.mo...@gmail.com> wrote:

> On Sunday, 21 August 2016 15:20:39 UTC+1, Marko Rauhamaa  wrote:
>>> Aren’t makefiles data-driven?
>> 
>> Yes, "make" should be added to my sin list.
>> 
>>> [Personally Ive always believed that jam is better than make and is
>>> less used for entirely historical reasons; something like half the
>>> world eoling with crlf and half with lf. But maybe make is really a
>>> better design because more imperative?]
>> 
>> Don't know jam, but can heartily recommend SCons.
> 
> The data driven side of make is the target: sources part. But (particularly 
> as a Python programmer, where build dependencies are less of an issue) a huge 
> part of make usage is in my experience, simply name: actions pairs (which is 
> the less data driven aspect), maybe with an element of "always do X before Y".
> 
> I've generally found "make successors" like SCons and waf to be less useful, 
> precisely because they focus on the dependency graph (the data driven side) 
> and less on the trigger-action aspect.
> 
> Has anyone else found this to be the case? Is there any "make replacement" 
> out there that focuses more on named sets of actions (maybe with 
> prerequisite/successor type interdependencies), and less on building file 
> dependency graphs?

Maybe Ninja (https://ninja-build.org/)?  I personally like it because of how 
simple it is, and the fact that it doesn't use leading tabs the way that make 
does.  It is intended to be the assembler for higher-level build systems which 
are more like compilers.  I personally use it as a make replacement because it 
does what I tell it to do, and nothing else.  It may fit what you're after.

Thanks,
Cem Karan


Re: Quote of the day

2016-05-17 Thread Cem Karan

On May 17, 2016, at 4:30 AM, Marko Rauhamaa <ma...@pacujo.net> wrote:

> Radek Holý <rad...@holych.org>:
> 
>> 2016-05-17 9:50 GMT+02:00 Steven D'Aprano <
>> steve+comp.lang.pyt...@pearwood.info>:
>> 
>>> Overheard in the office today:
>>> 
>>> "I don't have time to learn an existing library - much faster to make
>>> my own mistakes!"
>> 
>> *THUMBS UP* At least they are aware of that "own mistakes" part... Not
>> like my employer...
> 
> Also:
> 
>   With a third party solution I don't need to fix the bugs.
> 
>   But with an in-house solution I at least *can* fix the bugs.
> 
> The feeling of powerlessness can be crushing when you depend on a
> third-party component that is broken with no fix in sight.

+1000 on this one.  I just downloaded and used a library that came with unit 
tests, which all passed.  When I started using it, I kept getting odd errors.  
Digging into it, I discovered they had commented out the bodies of some of the 
unit tests... I'm glad it was open source; at least I *could* dig into the 
code and figure out what was going on :/

Thanks,
Cem Karan


Re: Guido on python3 for beginners

2016-02-18 Thread Cem Karan

On Feb 18, 2016, at 4:57 AM, Chris Angelico <ros...@gmail.com> wrote:

> On Thu, Feb 18, 2016 at 7:40 PM, Terry Reedy <tjre...@udel.edu> wrote:
>> To my mind, the numerous duplications and overlaps in 2.7 that are gone in
>> 3.x make 2.7 the worse version ever for beginners.
> 
> Hmm. I was teaching on 2.7 up until last year, and for the most part,
> we taught a "compatible with Py3" subset of the language, without any
> significant cost. If you'd shown code saying "except ValueError, e:"
> to one of my Py2 students then, s/he would have been just as
> unfamiliar as one of my Py3 students would be today. That said,
> though, it's still that Py3 is no worse than Py2, and potentially
> better.
> 
> The removal of L suffixes (and, similarly, the removal of u"..."
> prefixes on text strings) is a bigger deal to newbies than it is to
> experienced programmers, so that one definitely counts. "This is
> great, but how can I remove that u from the strings?" was a common
> question (eg when they're printing out a list of strings obtained from
> a database, or decoded from JSON).
> 
> The removal of old-style classes is a definite improvement in Py3, as
> is the no-arg form of super(), which I'd consider a related change. So
> there's a bunch of tiny little "quality of life" improvements here.
> 
> ChrisA

I agree with Chris on all his points.  My personal feeling is that Py3 is the 
way to go for teaching in the future; it's just that little bit more 
consistent across the board.  And the things that are confusing are not things 
that beginners will need to know about.

About the only place where Py2 has a slight advantage is scripts where you're 
suddenly surprised by Py2 starting up when you've been using a Py3 interactive 
interpreter.  For me, I'd probably give my students a block of code to copy at 
the start of their files that tests for Py2 or Py3 and raises an exception on 
Py2.  After that, I just wouldn't worry about it.
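
Such a guard block can be tiny; something like this sketch at the top of each
file (the wording of the error message is just my choice):

```python
import sys

# Fail fast if the file is accidentally run with a Python 2 interpreter.
# This snippet is valid syntax on both 2 and 3, so the error is clear.
if sys.version_info[0] < 3:
    raise RuntimeError("This script requires Python 3, but it was started "
                       "with Python %d.%d" % sys.version_info[:2])
```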

Thanks,
Cem Karan


Re: Heap Implementation

2016-02-11 Thread Cem Karan

On Feb 10, 2016, at 1:23 PM, "Sven R. Kunze" <srku...@mail.de> wrote:

> Hi Cem,
> 
> On 08.02.2016 02:37, Cem Karan wrote:
>> My apologies for not writing sooner, but work has been quite busy lately 
>> (and likely will be for some time to come).
> 
> no problem here. :)
> 
>> I read your approach, and it looks pretty good, but there may be one issue 
>> with it; how do you handle the same item being pushed into the heap more 
>> than once?  In my simple simulator, I'll push the same object into my event 
>> queue multiple times in a row.  The priority is the moment in the future 
>> when the object will be called.  As a result, items don't have unique 
>> priorities.  I know that there are methods of handling this from the 
>> client-side (tuples with unique counters come to mind), but if your library 
>> can handle it directly, then that could be useful to others as well.
> 
> I've pondered about that in the early design phase. I considered it a 
> slowdown for my use-case without benefit.
> 
> Why? Because I always push a fresh object ALTHOUGH it might be equal 
> comparing attributes (priority, deadline, etc.).
> 
> 
> That's the reason why I need to ask again: why pushing the same item on a 
> heap?
> 
> 
> Are we talking about function objects? If so, then your concern is valid. 
> Would you accept a solution that would involve wrapping the function in 
> another object carrying the priority? Would you prefer a wrapper that's 
> defined by xheap itself so you can just use it?

Yes.  I use priority queues for event loops.  The items I push in are callables 
(sometimes callbacks, sometimes objects with __call__()) and the priority is 
the simulation date at which they should be called.  I push the same item 
multiple times in a row because each call modifies the item (e.g., the location 
of an actor is calculated from its velocity and the date).  There are certain 
calls that I tend to push in all at once because the math for calculating when 
the event should occur is somewhat expensive, and always yields multiple dates 
at once.  

That is also why deleting or changing events is useful: I know in advance that 
at least some of those events will be canceled.  Note that it is also possible 
to cancel an event by marking it as cancelled, and then simply not executing it 
when you pop it off the queue, but I've found that there are a few cases in my 
simulations where the number of dead events in the queue exceeds the number of 
live events, which has an impact on memory and on operational speed 
(maintaining the heap invariant).  The difference isn't large, but I need FAST 
code to deal with the size of my simulations (thousands to tens of thousands of 
actors, over hundreds of millions of simulation runs, which is why I finally 
had to give up on Python and switch to pure C).
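For concreteness, here is a sketch of the mark-as-cancelled approach using the 
stdlib heapq; the class and method names are made up for illustration, and a 
real simulator would need more bookkeeping:

```python
import heapq

class EventQueue:
    """Min-heap of (date, seq, callback) entries with lazy cancellation.

    Cancelled events stay in the heap as tombstones and are skipped on
    pop; seq is a tie-breaker so equal dates never compare callbacks.
    """
    def __init__(self):
        self._heap = []
        self._seq = 0
        self._cancelled = set()   # seq numbers of cancelled events

    def push(self, date, callback):
        self._seq += 1
        heapq.heappush(self._heap, (date, self._seq, callback))
        return self._seq          # handle used to cancel later

    def cancel(self, handle):
        self._cancelled.add(handle)

    def pop(self):
        while self._heap:
            date, seq, cb = heapq.heappop(self._heap)
            if seq not in self._cancelled:
                return date, cb
            self._cancelled.discard(seq)   # drop the tombstone
        raise IndexError("pop from empty event queue")
```

The downside is exactly the one described above: tombstones occupy heap slots 
and memory until they bubble to the top.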

Having a wrapper defined by xheap would be ideal; I suspect that I won't be the 
only one that needs to deal with this, so having it centrally located would be 
best.  It may also make it possible for you to optimize xheap's behavior in 
some way.

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Heap Implementation

2016-02-09 Thread Cem Karan

On Feb 9, 2016, at 4:40 AM, Mark Lawrence <breamore...@yahoo.co.uk> wrote:

> On 09/02/2016 04:25, Cem Karan wrote:
>> 
>> No problem, that's what I thought happened.  And you're right, I'm looking 
>> for a priority queue (not the only reason to use a heap, but a pretty 
>> important reason!)
>> 
> 
> I'm assuming I've missed the explanation, so what is the problem again with 
> https://docs.python.org/3/library/queue.html#queue.PriorityQueue or even 
> https://docs.python.org/3/library/asyncio-queue.html#asyncio.PriorityQueue ?

Efficiently changing the priority of items already in the queue/deleting 
items in the queue (not the first item).  This comes up a LOT in event-based 
simulators where it's easier to tentatively add an event knowing that you might 
need to delete it or change it later.

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Heap Implementation

2016-02-09 Thread Cem Karan

On Feb 9, 2016, at 9:27 AM, Mark Lawrence <breamore...@yahoo.co.uk> wrote:

> On 09/02/2016 11:44, Cem Karan wrote:
>> 
>> On Feb 9, 2016, at 4:40 AM, Mark Lawrence <breamore...@yahoo.co.uk> wrote:
>> 
>>> On 09/02/2016 04:25, Cem Karan wrote:
>>>> 
>>>> No problem, that's what I thought happened.  And you're right, I'm looking 
>>>> for a priority queue (not the only reason to use a heap, but a pretty 
>>>> important reason!)
>>>> 
>>> 
>>> I'm assuming I've missed the explanation, so what is the problem again with 
>>> https://docs.python.org/3/library/queue.html#queue.PriorityQueue or even 
>>> https://docs.python.org/3/library/asyncio-queue.html#asyncio.PriorityQueue ?
>> 
>> Efficiently changing the priority of items already in the queue/deleting 
>> items in the queue (not the first item).  This comes up a LOT in event-based 
>> simulators where it's easier to tentatively add an event knowing that you 
>> might need to delete it or change it later.
>> 
>> Thanks,
>> Cem Karan
>> 
> 
> Thanks for that, but from the sounds of it sooner you than me :)

Eh, it's not too bad once you figure out how to do it.  It's easier in C 
though; you can use pointer tricks that let you find the element in constant 
time, and then removal involves figuring out how to fix up your heap after 
you've removed the element.
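In Python, the usual stand-in for those pointer tricks is a dict that maps each 
item to its current heap index, updated on every swap.  A sketch (illustrative 
only, not xheap's code), which assumes items are unique and comparable:

```python
class IndexedHeap:
    """Min-heap that tracks each item's position so removal is O(log n).

    The _pos dict plays the role of the C pointers: it locates an
    element in constant time; remove() then repairs the heap invariant.
    """
    def __init__(self):
        self._heap = []
        self._pos = {}        # item -> index in self._heap

    def push(self, item):
        self._heap.append(item)
        self._pos[item] = len(self._heap) - 1
        self._sift_up(len(self._heap) - 1)

    def remove(self, item):
        i = self._pos.pop(item)
        last = self._heap.pop()
        if i < len(self._heap):       # item wasn't the final slot
            self._heap[i] = last
            self._pos[last] = i
            self._sift_down(i)        # repair in whichever direction
            self._sift_up(i)

    def pop(self):
        item = self._heap[0]
        self.remove(item)
        return item

    def _sift_up(self, i):
        while i > 0:
            parent = (i - 1) // 2
            if self._heap[i] < self._heap[parent]:
                self._swap(i, parent)
                i = parent
            else:
                break

    def _sift_down(self, i):
        n = len(self._heap)
        while True:
            smallest = i
            for c in (2 * i + 1, 2 * i + 2):
                if c < n and self._heap[c] < self._heap[smallest]:
                    smallest = c
            if smallest == i:
                break
            self._swap(i, smallest)
            i = smallest

    def _swap(self, i, j):
        self._heap[i], self._heap[j] = self._heap[j], self._heap[i]
        self._pos[self._heap[i]] = i
        self._pos[self._heap[j]] = j
```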

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Heap Implementation

2016-02-09 Thread Cem Karan

On Feb 9, 2016, at 8:27 PM, srinivas devaki <mr.eightnotei...@gmail.com> wrote:

> 
> 
> On Feb 10, 2016 6:11 AM, "Cem Karan" <cfkar...@gmail.com> wrote:
> >
> > Eh, its not too bad once you figure out how to do it.  It's easier in C 
> > though; you can use pointer tricks that let you find the element in 
> > constant time, and then removal will involve figuring out how to fix up 
> > your heap after you've removed the element.
> >
> 
> If you can do it with C pointers then you can do it with python's 
> references/mutable objects. :)
> in case of immutable objects, use a light mutable wrapper or better use list 
> for performance.

I should have been clearer; it's easier to UNDERSTAND in C, but you can 
implement it in either language.  C will still be faster, but only because it's 
compiled.  It will also take a lot longer to code and to ensure that it's 
correct, but that is the tradeoff.

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Heap Implementation

2016-02-08 Thread Cem Karan
On Feb 7, 2016, at 10:15 PM, srinivas devaki <mr.eightnotei...@gmail.com> wrote:
> On Feb 8, 2016 7:07 AM, "Cem Karan" <cfkar...@gmail.com> wrote:
> > I know that there are methods of handling this from the client-side (tuples 
> > with unique counters come to mind), but if your library can handle it 
> > directly, then that could be useful to others as well.
> 
> yeah it is a good idea to do it at the client side.
> but if it should be introduced as a feature in the library, instead of 
> tuples, we should just piggyback a single counter onto the self._indexes 
> dict, or better make another self._counts dict which will be light and fast.
> and if you think again, with this method you can easily subclass by just 
> using the self._counts dict in your subclass. but still I think it is good to 
> introduce it as a feature in the library.
> 
> Regards
> Srinivas Devaki

Just to be 100% sure, you do mean to use the counters as UUIDs, right?  I don't 
mean that the elements in the heap get counted; I mean that the counter is a 
trick to separate different instances of (item, priority) pairs when you're 
pushing the same item multiple times, but with different priorities.
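The counter trick itself is only a few lines with the stdlib heapq; this is a 
generic sketch, not xheap's API:

```python
import heapq
import itertools

counter = itertools.count()    # unique, monotonically increasing
heap = []

def schedule(date, event):
    # The middle element makes every entry unique, so the same event
    # object can be pushed many times (even at the same date), and
    # ties on date never fall through to comparing the events.
    heapq.heappush(heap, (date, next(counter), event))

actor = object()               # stand-in for a simulation callable
schedule(2.0, actor)
schedule(1.0, actor)           # same object, earlier date
schedule(1.0, actor)           # an exact duplicate is fine too

date, _, ev = heapq.heappop(heap)
assert date == 1.0 and ev is actor
```

The counter also acts as a FIFO tie-breaker: entries with equal dates pop in 
insertion order.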

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Heap Implementation

2016-02-08 Thread Cem Karan

On Feb 8, 2016, at 10:12 PM, srinivas devaki <mr.eightnotei...@gmail.com> wrote:

> 
> On Feb 8, 2016 5:17 PM, "Cem Karan" <cfkar...@gmail.com> wrote:
> >
> > On Feb 7, 2016, at 10:15 PM, srinivas devaki <mr.eightnotei...@gmail.com> 
> > wrote:
> > > On Feb 8, 2016 7:07 AM, "Cem Karan" <cfkar...@gmail.com> wrote:
> > > > I know that there are methods of handling this from the client-side 
> > > > (tuples with unique counters come to mind), but if your library can 
> > > > handle it directly, then that could be useful to others as well.
> > >
> > > yeah it is a good idea to do at client side.
> > > but if it should be introduced as feature into the library, instead of 
> > > tuples, we should just piggyback a single counter it to the self._indexes 
> > > dict, or better make another self._counts dict which will be light and 
> > > fast.
> > > and if you think again with this method you can easily subclass with just 
> > > using self._counts dict  in your subclass. but still I think it is good 
> > > to introduce it as a feature in the library.
> > >
> > > Regards
> > > Srinivas Devaki
> >
> > I meant that the counter is a trick to separate different instances of 
> > (item, priority) pairs when you're pushing in the same item multiple times, 
> > but with different priorities.
> 
> oh okay, I'm way too off.
> 
> what you are asking for is a Priority Queue like feature.
> 
> but the emphasis is on providing extra features to heap data structure.
> 
> and xheap doesn't support having duplicate items.
> 
> and if you want to insert same items with distinct priorities, you can 
> provide the priority with key argument to the xheap. what xheap doesn't 
> support is having same keys/priorities.
> So I got confused and proposed a method to have same keys.
> 
> Regards
> Srinivas Devaki

No problem, that's what I thought happened.  And you're right, I'm looking for 
a priority queue (not the only reason to use a heap, but a pretty important 
reason!)

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: A sets algorithm

2016-02-07 Thread Cem Karan

On Feb 7, 2016, at 4:46 PM, Paulo da Silva <p_s_d_a_s_i_l_v_a...@netcabo.pt> 
wrote:

> Hello!
> 
> This may not be a strict python question, but ...
> 
> Suppose I have already a class MyFile that has an efficient method (or
> operator) to compare two MyFile s for equality.
> 
> What is the most efficient way to obtain all sets of equal files (of
> course each set must have more than one file - all single files are
> discarded)?
> 
> Thanks for any suggestions.

If you're after strict equality (every byte in a pair of files is identical), 
then here are a few heuristics that may help you:

1) Test for file length, without reading in the whole file.  You can use 
os.path.getsize() to do this (I hope that this is a constant-time operation, 
but I haven't tested it).  As Oscar Benjamin suggested, you can create a 
defaultdict(list) which will make it possible to gather lists of files of equal 
size.  This should help you gather your potentially identical files quickly.

2) Once you have your dictionary from above, you can iterate over its values, 
each of which will be a list.  If a list has only one file in it, you know it's 
unique, and you don't have to do any more work on it.  If there are two or more 
files in the list, then you have several different options:
a) Use Chris Angelico's suggestion and hash each of the files (use the 
standard library's 'hashlib' for this).  Identical files will always have 
identical hashes, but there may be false positives, so you'll need to verify 
that files that have identical hashes are indeed identical.
b) If your files tend to have sections that are very different (e.g., 
the first 32 bytes tend to be different), then you can treat that section of 
the file as its hash.  You can then do the same trick as above.  (The advantage 
of this is that you will read in a lot less data than if you have to hash the 
entire file.)
c) You may be able to do something clever by reading portions of each 
file.  That is, use zip() combined with read(1024) to read each of the files in 
sections, while keeping hashes of the files.  Or, maybe you'll be able to read 
portions of them and sort the list as you're reading.  In either case, if any 
files are NOT identical, then you'll be able to stop work as soon as you figure 
this out, rather than having to read the entire file at once.

The main purpose of these suggestions is to reduce the amount of reading you're 
doing.  Storage tends to be slow, and any tricks that reduce the number of 
bytes you need to read in will be helpful to you.
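Putting heuristics 1 and 2a together, a rough sketch (the function name and 
chunk size are arbitrary; a truly paranoid caller would still byte-compare 
files whose hashes collide):

```python
import hashlib
import os
from collections import defaultdict

def find_duplicate_sets(paths, chunk=1 << 16):
    """Group byte-identical files, reading as little data as possible.

    Pass 1 buckets by size (no file reads at all); pass 2 hashes only
    the files that share a size.  SHA-256 collisions are astronomically
    unlikely, but filecmp.cmp() can be added as a final check.
    """
    by_size = defaultdict(list)
    for p in paths:
        by_size[os.path.getsize(p)].append(p)

    dupes = []
    for group in by_size.values():
        if len(group) < 2:
            continue                      # unique size => unique file
        by_hash = defaultdict(list)
        for p in group:
            h = hashlib.sha256()
            with open(p, 'rb') as f:
                # Read in fixed-size chunks so huge files fit in memory.
                for block in iter(lambda: f.read(chunk), b''):
                    h.update(block)
            by_hash[h.hexdigest()].append(p)
        dupes.extend(g for g in by_hash.values() if len(g) > 1)
    return dupes
```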

Good luck!
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Heap Implementation

2016-02-07 Thread Cem Karan

On Jan 30, 2016, at 5:47 PM, Sven R. Kunze <srku...@mail.de> wrote:

> Hi again,
> 
> as the topic of the old thread actually was fully discussed, I dare to open a 
> new one.
> 
> I finally managed to finish my heap implementation. You can find it at 
> https://pypi.python.org/pypi/xheap + https://github.com/srkunze/xheap.
> 
> I described my motivations and design decisions at 
> http://srkunze.blogspot.com/2016/01/fast-object-oriented-heap-implementation.html
>  
> 
> @Cem
> You've been worried about a C implementation. I can assure you that I did not 
> intend to rewrite the incredibly fast and well-tested heapq implementation. I 
> just re-used it.
> 
> I would really be grateful for your feedback as you have first-hand 
> experience with heaps.


My apologies for not writing sooner, but work has been quite busy lately (and 
likely will be for some time to come).

I read your approach, and it looks pretty good, but there may be one issue with 
it; how do you handle the same item being pushed into the heap more than once?  
In my simple simulator, I'll push the same object into my event queue multiple 
times in a row.  The priority is the moment in the future when the object will 
be called.  As a result, items don't have unique priorities.  I know that there 
are methods of handling this from the client-side (tuples with unique counters 
come to mind), but if your library can handle it directly, then that could be 
useful to others as well.

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: me, my arm, my availability ...

2016-01-14 Thread Cem Karan

On Jan 13, 2016, at 3:47 PM, Laura Creighton <l...@openend.se> wrote:

> 
> I fell recently.  Ought to be nothing, but a small chip of bone, either an
> existing one or one I just made is nicely wedged in the joint taking away
> a whole lot of the ability of my arm to rotate in the elbow joint.  Or
> hold my arm in a position that is usual for typing.  Plus,  now that the
> sprain/swelling is more or less over, the pain, unfortunately is not.
> 
> The real downside is that my typing speed is down from 135-140 wpm
> to 5-10 wmp.  At this rate, just getting my usual work done takes
> overtime.
> 
> Seems like surgery is needed to fix this. 
> 
> So I wanted you all to know, no, I haven't forgotten you and no haven't
> stopped caring.  I have just stopped being as __capable__ if you know
> what I mean.
> 
> Please take care of yourselves and each other.  I will often be reading
> even if typing is more than I can do right now.
> 
> Laura
> 
> ps -- (recent tutor discussion) I am with Alan and not with Mark.  I
> am happy as anything when people post their not-quite-working code for
> homework assignments here to tutor.  They aren't lazy bastards wanting
> somebody to do their assignments for them, they want to learn why what
> they are trying to do isn't working.  Sounds perfect for tutor to me.

Good luck healing!  Hope you get better soon.  Surgery has gotten a WHOLE lot 
better recently, they did wonders for my knee a few years back.  With luck, 
it'll be more or less outpatient surgery.

Good luck,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to remove item from heap efficiently?

2016-01-14 Thread Cem Karan
On Jan 13, 2016, at 2:08 PM, Sven R. Kunze <srku...@mail.de> wrote:
> On 13.01.2016 12:20, Cem Karan wrote:
>> On Jan 12, 2016, at 11:18 AM, "Sven R. Kunze" <srku...@mail.de> wrote:
>> 
>>> Thanks for replying here. I've come across these types of 
>>> wrappers/re-implementations of heapq as well when researching this issue. :)
>>> 
>>> Unfortunately, they don't solve the underlying issue at hand which is: 
>>> "remove item from heap with unknown index" and be efficient at it (by not 
>>> using _heapq C implementation).
>>> 
>>> 
>>> So, I thought I did another wrapper. ;) It at least uses _heapq (if 
>>> available otherwise heapq) and lets you remove items without violating the 
>>> invariant in O(log n). I am going to make that open-source on pypi and see 
>>> what people think of it.
>> Is that so?  I'll be honest, I never tested its asymptotic performance, I 
>> just assumed that he had a dict coupled with a heap somehow, but I never 
>> looked into the code.
> 
> My concern about that specific package is a missing C-implementation. I feel 
> that somewhat defeats the whole purpose of using a heap: performance.

I agree with you that performance is less than that of using a C extension 
module, but there are other costs associated with developing a C module:

1) As the developer of the module, you must be very careful to ensure your code 
is portable.

2) Distribution becomes somewhat more difficult; you may need to distribute 
both source and compiled binaries for various platforms.  This is somewhat more 
annoying than pure python scripts.

3) Debugging can become significantly more difficult.  My current codebase is 
python+cython+c, and when something crashes, it is usually easier to use a 
bunch of printf() statements to figure out what is going on than to use a 
debugger (others may have different experiences, this is just mine).

4) Not everyone is familiar with C, so writing extensions may be more difficult.

5) Will the extension module work on non-cpython platforms (iron python, 
jython, etc.)?

Finally, without profiling the complete package it may be difficult to tell 
what impact your C module will have on overall performance.  In my code, 
HeapDict had less than a 2% performance impact on what I was doing; even if I 
had replaced it with a pure C implementation, my code would not have run much 
faster.

So, while I agree in principle to what you're saying, in practice there may be 
other factors to consider before rejecting the pure python approach.

> Asymptotic performance is still O(log n).

So, if the intent is to pop events more often than to peek at them, then in 
practice, HeapDict is about the same as some clever heap+dict method (which it 
might be, as I said, I haven't looked at the code).

>> That said, IMHO using a dict interface is the way to go for priority queues; 
>> it really simplified my code using it!  This is my not-so-subtle way of 
>> asking you to adopt the MutableMapping interface for your wrapper ;)
> 
> Could you elaborate on this? What simplified you code so much?
> 
> I have been using heaps for priority queues as well but haven't missed the 
> dict interface so far. Maybe, my use-case is different.


I'm writing an event-based simulator, and as it turns out, it is much easier to 
tentatively add events than it is to figure out precisely which events will 
occur in the future.  That means that on a regular basis I need to delete 
events as I determine that they are garbage.  HeapDict did a good job of that 
for me (for completely unrelated reasons I decided to switch to a pure-C 
codebase, with python hooks to twiddle the simulator at a few, very rare, 
points in time; hence the python+cython+c comment above).

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to remove item from heap efficiently?

2016-01-13 Thread Cem Karan

On Jan 12, 2016, at 11:18 AM, "Sven R. Kunze" <srku...@mail.de> wrote:

> On 12.01.2016 03:48, Cem Karan wrote:
>> 
>> Jumping in late, but...
>> 
>> If you want something that 'just works', you can use HeapDict:
>> 
>> http://stutzbachenterprises.com/
>> 
>> I've used it in the past, and it works quite well.  I haven't tested its 
>> asymptotic performance though, so you might want to check into that.
> 
> Thanks for replying here. I've come across these types of 
> wrappers/re-implementations of heapq as well when researching this issue. :)
> 
> Unfortunately, they don't solve the underlying issue at hand which is: 
> "remove item from heap with unknown index" and be efficient at it (by not 
> using _heapq C implementation).
> 
> 
> So, I thought I did another wrapper. ;) It at least uses _heapq (if available 
> otherwise heapq) and lets you remove items without violating the invariant in 
> O(log n). I am going to make that open-source on pypi and see what people 
> think of it.

Is that so?  I'll be honest, I never tested its asymptotic performance, I just 
assumed that he had a dict coupled with a heap somehow, but I never looked into 
the code.

That said, IMHO using a dict interface is the way to go for priority queues; it 
really simplified my code using it!  This is my not-so-subtle way of asking you 
to adopt the MutableMapping interface for your wrapper ;)

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to remove item from heap efficiently?

2016-01-11 Thread Cem Karan

On Jan 11, 2016, at 9:53 AM, srinivas devaki <mr.eightnotei...@gmail.com> wrote:

> On Jan 11, 2016 12:18 AM, "Sven R. Kunze" <srku...@mail.de> wrote:
>> Indeed. I already do the sweep method as you suggested. ;)
>> 
>> Additionally, you provided me with a reasonable condition when to do the
> sweep in order to achieve O(log n). Thanks much for that. I currently used
> a time-bases approached (sweep each 20 iterations).
>> 
>> PS: Could you add a note on how you got to the condition (
> 2*self.useless_b > len(self.heap_b))?
>> 
> 
> oh that's actually simple,
> that condition checks if more than half of the heap is useless items.
> the sweep complexity is O(len(heap)), so to keep the extra amortized
> complexity at O(1), we have to split that work (virtually) across
> O(len(heap)) operations; when our condition becomes true we have done
> len(heap) operations, so doing a sweep at that time means we have split
> that work (O(len(heap))) across every operation.

Jumping in late, but...

If you want something that 'just works', you can use HeapDict:

http://stutzbachenterprises.com/

I've used it in the past, and it works quite well.  I haven't tested its 
asymptotic performance though, so you might want to check into that.
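For reference, the sweep condition quoted above can be sketched like this 
(names are illustrative; this is not Sven's actual code):

```python
import heapq
import itertools

class SweepingHeap:
    """heapq wrapper with lazy deletion and the 2*useless > len sweep.

    Deleted entries are marked dead in place; once dead entries
    outnumber live ones, a single O(n) sweep rebuilds the heap, so the
    amortized extra cost per operation stays O(1).
    """
    def __init__(self):
        self._heap = []                 # entries: [key, seq, item, alive]
        self._entry = {}                # item -> its entry, O(1) lookup
        self._useless = 0
        self._count = itertools.count() # seq tie-breaker, items unordered

    def push(self, key, item):
        entry = [key, next(self._count), item, True]
        self._entry[item] = entry
        heapq.heappush(self._heap, entry)

    def delete(self, item):
        entry = self._entry.pop(item)
        entry[3] = False                # mark dead; position untouched
        self._useless += 1
        if 2 * self._useless > len(self._heap):
            self._sweep()

    def pop(self):
        while self._heap:
            key, _, item, alive = heapq.heappop(self._heap)
            if alive:
                del self._entry[item]
                return key, item
            self._useless -= 1          # a tombstone left the heap
        raise IndexError("pop from empty heap")

    def _sweep(self):
        # O(len(heap)), but amortized O(1) per operation by the
        # argument quoted above.
        self._heap = [e for e in self._heap if e[3]]
        heapq.heapify(self._heap)
        self._useless = 0
```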

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How can I count word frequency in a web site?

2015-11-29 Thread Cem Karan
You might want to look into Beautiful Soup 
(https://pypi.python.org/pypi/beautifulsoup4), which is an HTML screen-scraping 
tool.  I've never used it, but I've heard good things about it.

Good luck,
Cem Karan

On Nov 29, 2015, at 7:49 PM, ryguy7272 <ryanshu...@gmail.com> wrote:

> I'm trying to figure out how to count words in a web site.  Here is a sample 
> of the link I want to scrape data from and count specific words.
> http://finance.yahoo.com/q/h?s=STRP+Headlines
> 
> I only want to count certain words, like 'fraud', 'lawsuit', etc.  I want to 
> have a way to control for specific words.  I have a couple Python scripts 
> that do this for a text file, but not for a web site.  I can post that, if 
> that's helpful.
> 
> -- 
> https://mail.python.org/mailman/listinfo/python-list

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: sys path modification

2015-07-27 Thread Cem Karan
On Jul 27, 2015, at 1:24 PM, neubyr neu...@gmail.com wrote:

 
 I am trying to understand sys.path working and best practices for managing it 
 within a program or script. Is it fine to modify sys.path using 
 sys.path.insert(0, EXT_MODULES_DIR)? One stackoverflow answer - 
 http://stackoverflow.com/a/10097543 - suggests that it may break external 
 3'rd party code as by convention first item of sys.path list, path[0], is the 
 directory containing the script that was used to invoke the Python 
 interpreter. So what are best practices to prepend sys.path in the program 
 itself? Any further elaboration would be helpful. 


Why are you trying to modify sys.path?  I'm not judging, there are many good 
reasons to do so, but there may be safer ways of getting the effect you want 
that don't rely on modifying sys.path.  One simple method is to modify 
PYTHONPATH (https://docs.python.org/3/using/cmdline.html#envvar-PYTHONPATH) 
instead.

In order of preference:

1) Append to sys.path.  This will cause you the fewest headaches.

2) If you absolutely have to insert into the list, insert after the first 
element.  As you noted from SO, and noted in the docs 
(https://docs.python.org/3/library/sys.html#sys.path), the first element of 
sys.path is the path to the directory of the script itself.  If you modify 
this, you **will** break third-party code at some point.  
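Concretely (EXT_MODULES_DIR is a made-up placeholder here):

```python
import sys

EXT_MODULES_DIR = '/opt/myapp/ext_modules'   # hypothetical directory

# Option 1 (preferred): append, so the stdlib and the script's own
# directory keep priority over the extra modules.
sys.path.append(EXT_MODULES_DIR)

# Option 2: if the extra directory must shadow site-packages, insert
# AFTER element 0 -- sys.path[0] is the script's directory, and other
# code may rely on that convention.
sys.path.insert(1, EXT_MODULES_DIR)
```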

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Using Python instead of Bash

2015-05-31 Thread Cem Karan
 I help someone that has problems reading. For this I take photo's of
 text, use convert from ImageMagick to make a good contrast (original
 paper is grey) and use lpr to print it a little bigger.
 
 Normally I would implement this in Bash, but I thought it a good idea
 to implement it in Python. This is my first try:
import glob
import subprocess

threshold = 66
count = 0
for input in sorted(glob.glob('*.JPG')):
    count += 1
    output = '{0:02d}.png'.format(count)
    print('Going to convert {0} to {1}'.format(input, output))
    p = subprocess.Popen(['convert', '-threshold',
                          '{0}%'.format(threshold), input, output])
    p.wait()
    print('Going to print {0}'.format(output))
    p = subprocess.Popen(['lpr', '-o', 'fit-to-page', '-o', 'media=A4',
                          output])
    p.wait()
 
 There have to be some improvements: display before printing,
 possibility to change threshold, … But is this a good start, or should
 I do it differently?


As a first try, I think it's pretty good, but to really answer your question, I 
think we could use a little more information.  

- Are you using python 2, or python 3?  There are slightly easier ways to do 
this using concurrent.futures objects, but they are only available under python 
3. (See https://docs.python.org/3/library/concurrent.futures.html)

- In either case, subprocess.call(), subprocess.check_call(), or 
subprocess.check_output() may be easier to use.  That said, your code is 
perfectly fine!  The only real difference is that subprocess.call() will 
automatically wait for the call to complete, so you don't need to use p.wait() 
from above.  (See https://docs.python.org/2.7/library/subprocess.html, and 
https://docs.python.org/3/library/subprocess.html) 



The following code does the conversion in parallel, and submits the jobs to 
the printer serially.  That should ensure that the printed output is also in 
sorted order, but you might want to double-check before relying on it too much. 
 The major problem with it is that you can't display the output before 
printing; since everything is running in parallel, you'll have race conditions 
if you try.  **I DID NOT TEST THIS CODE, I JUST TYPED IT OUT IN MY MAIL 
CLIENT!**  Please test it carefully before relying on it!


import subprocess
import concurrent.futures
import glob
import os.path

_THRESHOLD = 66

def _collect_filenames():
    files = glob.glob('*.JPG')

    # I build a set of the real paths so that if you have
    # symbolic links that all point to the same file, they
    # are automatically collapsed to a single file
    real_files = {os.path.realpath(x) for x in files}
    base_files = [os.path.splitext(x)[0] for x in real_files]
    return base_files

def _convert(base_file_name):
    """
    This code is slightly different from your code.  Instead
    of using numbers as names, I use the base name of the file and
    append '.png' to it.  You may need to adjust this to ensure
    you don't overwrite anything.
    """
    input = base_file_name + ".JPG"
    output = base_file_name + ".png"
    subprocess.call(['convert', '-threshold', '{0}%'.format(_THRESHOLD),
                     input, output])

def _print_files_in_order(base_files):
    base_files.sort()
    for f in base_files:
        output = f + ".png"
        subprocess.call(['lpr', '-o', 'fit-to-page', '-o', 'media=A4',
                         output])

def driver():
    base_files = _collect_filenames()

    # If you use an executor as a context manager, then the
    # executor will wait until all of the submitted jobs finish
    # before it returns.  The submitted jobs will execute in
    # parallel.
    with concurrent.futures.ProcessPoolExecutor() as executor:
        for f in base_files:
            executor.submit(_convert, f)

    _print_files_in_order(base_files)


Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Hello Group and how to practice?

2015-05-31 Thread Cem Karan

On May 31, 2015, at 9:35 AM, Anders Johansen sko...@gmail.com wrote:

 Hi my name is Anders I am from Denmark, and I am new to programming and 
 python.
 
 Currently, I am doing the codecademy.com python course, but sometimes I feel 
 that the course advances too fast and I lack repetition (practice) of some of 
 the concepts; however, I don't feel confident enough to start programming on 
 my own.
 
 Do you guys have some advice to how I can practicing programming and get the 
 concept in under the skin?

Choose something that you think is small and easy to do, and try to do it.  
When you have trouble, read the docs you find online, and ask us questions; the 
python community is pretty friendly, and if you show us what you've already 
tried to do, someone is likely to try to help you out.

The main thing is to not get discouraged.  One of the hardest things to do is 
figuring out what you CAN do with a computer; some things that look like they 
should be easy, are actually major research questions.  Just keep trying, and 
it will get easier over time.

Good luck!
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Hello Group and how to practice?

2015-05-31 Thread Cem Karan

On May 31, 2015, at 10:51 AM, Anders Johansen sko...@gmail.com wrote:

 Den søndag den 31. maj 2015 kl. 16.22.10 UTC+2 skrev Cem Karan:
 On May 31, 2015, at 9:35 AM, Anders Johansen sko...@gmail.com wrote:
 
 Hi my name is Anders I am from Denmark, and I am new to programming and 
 python.
 
 Currently, I am doing the codecademy.com python course, but sometimes I feel 
 that the course advances too fast and I lack repetition (practice) of some of 
 the concepts; however, I don't feel confident enough to start programming on 
 my own.
 
 Do you guys have some advice to how I can practicing programming and get 
 the concept in under the skin?
 
 Choose something that you think is small and easy to do, and try to do it.  
 When you have trouble, read the docs you find online, and ask us questions; 
 the python community is pretty friendly, and if you show us what you've 
 already tried to do, someone is likely to try to help you out.
 
 The main thing is to not get discouraged.  One of the hardest things to do 
 is figuring out what you CAN do with a computer; some things that look like 
 they should be easy, are actually major research questions.  Just keep 
 trying, and it will get easier over time.
 
 Good luck!
 Cem Karan
 
 Thank you Cem Karan for your reply. I will try and follow your advice. I am 
 yet to install python on my computer, do you know of any easy to follow 
 instructions on how to do so? Where do I start?

Python 3 installers are here: 
https://www.python.org/downloads/release/python-343/  Choose the one 
appropriate for your system, and follow the instructions.  If you have trouble, 
write to the list.

Good luck,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Fixing Python install on the Mac after running 'CleanMyMac'

2015-05-29 Thread Cem Karan

On May 28, 2015, at 11:47 PM, Laura Creighton l...@openend.se wrote:

 webmas...@python.org just got some mail from some poor embarrased
 soul who ran this program and broke their Python install.
 
 They are running Mac OSX 10.7.5
 
 They are getting:
 
 Utility has encountered a fatal error, and will now terminate.  A
 Python runtime could not be located. You may need to install a
 framework build of Python or edit the PyRuntimeLocations array in this
 applications info.plist file.  Then there are two oblong circles. One
 says Open Console. The other says Terminate.
 
 So https://docs.python.org/2/using/mac.html says:
 
   The Apple-provided build of Python is installed in
   /System/Library/Frameworks/Python.framework and /usr/bin/python,
   respectively. You should never modify or delete these, as they are
   Apple-controlled and are used by Apple- or third-party software.
 
 So, I assume this poor soul has done precisely that.
 
 What do I tell her to do now?

Does she have a recent Time Machine backup that she can restore from?  
Otherwise the solutions are all fairly painful:

1) Install Python 2.7 from scratch (easy).  Then figure out where to put 
symlinks that point back to the install (mildly annoying/hard).  Note that 
Python 3 won't work; none of the built-in scripts expect it.

2) OS X recovery - 
http://www.macworld.co.uk/how-to/mac/how-reinstall-mac-os-x-using-internet-recovery-3593641/
 I've never had to do that, so I have no idea how easy/reliable it is.  I 
**think** it's supposed to save all the data on the drive, but again, I've not 
done this, so I can't make any guarantees.

3) Wipe it clean and reinstall from scratch.

Honestly, I hope she has a Time Machine backup.  I've had to do recoveries a 
couple of times, and it can really save you.

Good luck,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Best approach to create humongous amount of files

2015-05-21 Thread Cem Karan

On May 20, 2015, at 7:44 AM, Parul Mogra scoria@gmail.com wrote:

 Hello everyone,
 My objective is to create large amount of data files (say a million *.json 
 files), using a pre-existing template file (*.json). Each file would have a 
 unique name, possibly by incorporating time stamp information. The files have 
 to be generated in a folder specified.
 
 What is the best strategy to achieve this task, so that the files will be 
 generated in the shortest possible time? Say within an hour.

If you absolutely don't care about the name, then something like the following 
will work:

import uuid
for counter in range(1000000):
    with open(uuid.uuid1().hex.upper() + ".json", "w") as f:
        f.write(templateString)

where templateString is the template you want to write to each file.  The only 
problem is that the files won't be in any particular order; they'll just be 
uniquely named.  As a test, I ran the code above, but I killed the loop after 
about 10 minutes, at which point about 500,000 files were created.  Note that 
my laptop is about 6 years old, so you might get better performance on your 
machine.
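If the names do need to embed timestamp information, as the original post 
suggested, a variation along these lines would work (hypothetical helper name, 
untested at the million-file scale): take one timestamp per batch and append a 
zero-padded sequence number, so names stay unique and sort in creation order.

```python
import os
import time

def write_batch(template_string, out_dir, count):
    """Write `count` copies of template_string into out_dir, naming each
    file with a batch timestamp plus a zero-padded sequence number so the
    names are unique and sort in creation order."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    for i in range(count):
        path = os.path.join(out_dir, "{0}-{1:07d}.json".format(stamp, i))
        with open(path, "w") as f:
            f.write(template_string)

# Example (hypothetical): write_batch('{"key": "value"}', "output", 1000000)
```

The zero-padding width just has to cover the largest expected count; seven 
digits is enough for a million files.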

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: To pickle or not to pickle

2015-05-08 Thread Cem Karan
What are you using pickle for?  If this is just for yourself, go for it.  If 
you're planning on interchanging with different languages/platforms/etc., JSON 
or XML might be better.  If you're after something that is smaller and faster, 
maybe MessagePack or Google Protocol Buffers.  If you're after something that 
can hold a planet's worth of data, maybe HDF5.  It really depends on your 
use-case.

MessagePack - http://en.wikipedia.org/wiki/MessagePack
Google Protocol Buffers - http://en.wikipedia.org/wiki/Protocol_Buffers
HDF5 - http://en.wikipedia.org/wiki/Hierarchical_Data_Format
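As a rough illustration of the trade-off between the first two options: both 
stdlib modules round-trip basic Python data, but JSON produces portable, 
human-readable text while pickle produces Python-only bytes.

```python
import json
import pickle

data = {"name": "example", "values": [1, 2, 3], "nested": {"ok": True}}

# pickle: Python-only, handles most Python objects directly, emits bytes
pickled = pickle.dumps(data)
assert pickle.loads(pickled) == data

# JSON: readable text, portable across languages, but limited to basic
# types (dict, list, str, numbers, bool, None)
encoded = json.dumps(data)
assert json.loads(encoded) == data
```

For anything beyond those basic types (custom classes, sets, datetimes), JSON 
needs explicit conversion code, which is part of why the choice is so 
use-case dependent.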

Thanks,
Cem Karan

On May 8, 2015, at 5:58 AM, Cecil Westerhof ce...@decebal.nl wrote:

 I first used marshal in my filebasedMessages module. Then I read that
 you should not use it, because it changes per Python version and it
 was better to use pickle. So I did that and now I find:
https://wiki.python.org/moin/Pickle
 
 Is it really that bad and should I change again?
 
 -- 
 Cecil Westerhof
 Senior Software Engineer
 LinkedIn: http://www.linkedin.com/in/cecilwesterhof
 -- 
 https://mail.python.org/mailman/listinfo/python-list

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Diff between object graphs?

2015-04-24 Thread Cem Karan

On Apr 23, 2015, at 11:05 AM, Steve Smaldone smald...@gmail.com wrote:

 On Thu, Apr 23, 2015 at 6:34 AM, Cem Karan cfkar...@gmail.com wrote:
 
 On Apr 23, 2015, at 1:59 AM, Steven D'Aprano 
 steve+comp.lang.pyt...@pearwood.info wrote:
 
  On Thursday 23 April 2015 11:53, Cem Karan wrote:
 
  Precisely.  In order to make my simulations more realistic, I use a lot of
  random numbers.  I can fake things by keeping the seed to the generator,
  but if I want to do any sort of hardware in the loop simulations, then
  that approach won't work.
 
  That's exactly why we have *pseudo* random number generators. They are
  statistically indistinguishable from real randomness, but repeatable when
  needed.
 
  Which is why I mentioned keeping the seed above.  The problem is that 
  I eventually want to do hardware in the loop, which will involve IO between 
  the simulation machine and the actual robots, and IO timing is imprecise and 
  uncontrollable.  That is where not recording something becomes lossy.  That 
  said, the mere act of trying to record everything is going to cause timing 
  issues, so I guess I'm overthinking things yet again.
  
  Thanks for the help everyone; it's helped me clarify what I need to do in my 
  mind.
 
  
 Well, you could achieve this on Linux by using the rdiff library.  Not 
 exactly a purely Python solution, but it would give you file-based diffs.  
 
 Basically, what you could do is write the first file.  Then for each 
 subsequent saves, write out the file (as a temp file) and issue shell 
 commands (via the Python script) to calculate the diffs of the new file 
 against the first (basis) file.  Once you remove the temp files, you'd have a 
 full first save and a set of diffs against that file.  You could rehydrate 
 any save you want by applying the diff to the basis.
 
 If you work on it a bit, you might even be able to avoid the temp file saves 
 by using pipes in the shell command.
 
 Of course, I haven't tested this so there may be non-obvious issues with 
 diffing between subsequent pickled saves, but it seems that it should work on 
 the surface.

That might work... although I'm running on OS X right now, once I get to the 
hardware in the loop part, it's all going to be some flavor of Linux.  I'll 
look into it... thanks!
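While rdiff itself is an external tool, the store-a-basis-plus-diffs idea can 
be sketched dependency-free with the stdlib's difflib: restore() rebuilds 
either snapshot from an ndiff delta.  This is illustrative only — ndiff 
deltas carry both sides' lines, so they are nowhere near as compact as rdiff 
deltas — but it shows the rehydration step.

```python
import difflib

# Two hypothetical line-based snapshots of simulator state.
state_1 = ["node A: energy=10", "node B: energy=7", "node C: energy=3"]
state_2 = ["node A: energy=9", "node B: energy=7", "node D: energy=1"]

# The ndiff delta plays the role of diff_2_1 from the earlier thread.
delta = list(difflib.ndiff(state_1, state_2))

# restore() rebuilds either original sequence from the delta.
assert list(difflib.restore(delta, 1)) == state_1
assert list(difflib.restore(delta, 2)) == state_2
```

For real space savings the delta itself would need to be something like an 
rdiff/librsync delta, with only the basis file stored in full.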

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Diff between object graphs?

2015-04-23 Thread Cem Karan

On Apr 23, 2015, at 1:59 AM, Steven D'Aprano 
steve+comp.lang.pyt...@pearwood.info wrote:

 On Thursday 23 April 2015 11:53, Cem Karan wrote:
 
 Precisely.  In order to make my simulations more realistic, I use a lot of
 random numbers.  I can fake things by keeping the seed to the generator,
 but if I want to do any sort of hardware in the loop simulations, then
 that approach won't work.
 
 That's exactly why we have *pseudo* random number generators. They are 
 statistically indistinguishable from real randomness, but repeatable when 
 needed.

Which is why I mentioned keeping the seed above.  The problem is that I 
eventually want to do hardware in the loop, which will involve IO between the 
simulation machine and the actual robots, and IO timing is imprecise and 
uncontrollable.  That is where not recording something becomes lossy.  That 
said, the mere act of trying to record everything is going to cause timing 
issues, so I guess I'm overthinking things yet again.

Thanks for the help everyone; it's helped me clarify what I need to do in my 
mind.

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Diff between object graphs?

2015-04-22 Thread Cem Karan

On Apr 22, 2015, at 8:53 AM, Peter Otten __pete...@web.de wrote:

 Cem Karan wrote:
 
 Hi all, I need some help.  I'm working on a simple event-based simulator
 for my dissertation research. The simulator has state information that I
 want to analyze as a post-simulation step, so I currently save (pickle)
 the entire simulator every time an event occurs; this lets me analyze the
 simulation at any moment in time, and ask questions that I haven't thought
 of yet.  The problem is that pickling this amount of data is both
 time-consuming and a space hog.  This is true even when using bz2.open()
 to create a compressed file on the fly.
 
 This leaves me with two choices; first, pick the data I want to save, and
 second, find a way of generating diffs between object graphs.  Since I
 don't yet know all the questions I want to ask, I don't want to throw away
 information prematurely, which is why I would prefer to avoid scenario 1.
 
 So that brings up possibility two; generating diffs between object graphs.
 I've searched around in the standard library and on pypi, but I haven't
 yet found a library that does what I want.  Does anyone know of something
 that does?
 
 Basically, I want something with the following ability:
 
 Object_graph_2 - Object_graph_1 = diff_2_1
 Object_graph_1 + diff_2_1 = Object_graph_2
 
 The object graphs are already pickleable, and the diffs must be, or this
 won't work.  I can use deepcopy to ensure the two object graphs are
 completely separate, so the diffing engine doesn't need to worry about
 that part.
 
 Anyone know of such a thing?
 
 A poor man's approach:
 
 Do not compress the pickled data, check it into version control. Getting the 
 n-th state then becomes checking out the n-th revision of the file.
 
 I have no idea how much space you save that way, but it's simple enough to 
 give it a try.

Sounds like a good approach; I'll give it a shot in the morning.

 Another slightly more involved idea:
 
 Make the events pickleable, and save the simulator only for every 100th (for 
 example) event. To restore the 7531th state load pickle 7500 and apply 
 events 7501 to 7531.

I was hoping to avoid doing this, as I lose information.  BUT, it's likely that 
this will be the best approach regardless of what other methods I use; there is 
just too much data.
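A minimal sketch of the checkpoint-and-replay suggestion above (all names are 
hypothetical, and it only reproduces state exactly when event processing is 
deterministic, which is exactly the caveat discussed later in the thread):

```python
import pickle

class Simulator(object):
    """Toy stand-in for the real simulator: state is a single number."""
    def __init__(self):
        self.state = 0

    def apply(self, event):
        # Must be deterministic for replay to reproduce state exactly.
        self.state += event

CHECKPOINT_INTERVAL = 100

def record(events):
    """Run the simulation, pickling a snapshot every CHECKPOINT_INTERVAL
    events; returns {event_index: pickled_state_before_that_event}."""
    sim = Simulator()
    checkpoints = {}
    for i, event in enumerate(events):
        if i % CHECKPOINT_INTERVAL == 0:
            checkpoints[i] = pickle.dumps(sim)
        sim.apply(event)
    return checkpoints

def rehydrate(checkpoints, events, n):
    """Rebuild the state just after event n: load the nearest earlier
    checkpoint, then replay the intervening events."""
    base = (n // CHECKPOINT_INTERVAL) * CHECKPOINT_INTERVAL
    sim = pickle.loads(checkpoints[base])
    for event in events[base:n + 1]:
        sim.apply(event)
    return sim
```

So, restoring the 7531st state loads checkpoint 7500 and replays 31 events, 
trading a little CPU at analysis time for far less storage.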

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Diff between object graphs?

2015-04-22 Thread Cem Karan

On Apr 22, 2015, at 9:56 PM, Dave Angel da...@davea.name wrote:

 On 04/22/2015 09:46 PM, Chris Angelico wrote:
 On Thu, Apr 23, 2015 at 11:37 AM, Dave Angel da...@davea.name wrote:
 On 04/22/2015 09:30 PM, Cem Karan wrote:
 
 
 On Apr 22, 2015, at 8:53 AM, Peter Otten __pete...@web.de wrote:
 
 Another slightly more involved idea:
 
 Make the events pickleable, and save the simulator only for every 100th
 (for
 example) event. To restore the 7531th state load pickle 7500 and apply
 events 7501 to 7531.
 
 
 I was hoping to avoid doing this as I lose information.  BUT, its likely
 that this will be the best approach regardless of what other methods I use;
 there is just too much data.
 
 
 Why would that lose any information???
 
 It loses information if event processing isn't perfectly deterministic.
 
 Quite right.  But I hadn't seen anything in this thread to imply that.

My apologies, that's my fault.  I should have mentioned that in the first place.

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Diff between object graphs?

2015-04-22 Thread Cem Karan

On Apr 22, 2015, at 9:46 PM, Chris Angelico ros...@gmail.com wrote:

 On Thu, Apr 23, 2015 at 11:37 AM, Dave Angel da...@davea.name wrote:
 On 04/22/2015 09:30 PM, Cem Karan wrote:
 
 
 On Apr 22, 2015, at 8:53 AM, Peter Otten __pete...@web.de wrote:
 
 Another slightly more involved idea:
 
 Make the events pickleable, and save the simulator only for every 100th
 (for
 example) event. To restore the 7531th state load pickle 7500 and apply
 events 7501 to 7531.
 
 
 I was hoping to avoid doing this as I lose information.  BUT, its likely
 that this will be the best approach regardless of what other methods I use;
 there is just too much data.
 
 
 Why would that lose any information???
 
 It loses information if event processing isn't perfectly deterministic.

Precisely.  In order to make my simulations more realistic, I use a lot of 
random numbers.  I can fake things by keeping the seed to the generator, but if 
I want to do any sort of hardware in the loop simulations, then that approach 
won't work.
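For the pure-software case, the seed-keeping approach looks roughly like the 
following (hypothetical helper name); the point is that a recorded seed fully 
determines the run, whereas hardware IO timing has no seed to record.

```python
import random

def run_trial(seed):
    """Draw a short sequence from a privately seeded generator.  The same
    seed always yields the same sequence, which is what makes a pure-software
    simulation replayable from just the seed."""
    rng = random.Random(seed)  # private generator, so other code can't disturb it
    return [rng.randint(0, 100) for _ in range(5)]

# Replaying with the recorded seed reproduces the run exactly.
assert run_trial(42) == run_trial(42)
```

Using a private random.Random instance, rather than the module-level 
functions, keeps unrelated code from advancing the generator and silently 
breaking replay.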

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Diff between object graphs?

2015-04-22 Thread Cem Karan
Hi all, I need some help.  I'm working on a simple event-based simulator for my 
dissertation research. The simulator has state information that I want to 
analyze as a post-simulation step, so I currently save (pickle) the entire 
simulator every time an event occurs; this lets me analyze the simulation at 
any moment in time, and ask questions that I haven't thought of yet.  The 
problem is that pickling this amount of data is both time-consuming and a space 
hog.  This is true even when using bz2.open() to create a compressed file on 
the fly.  

This leaves me with two choices; first, pick the data I want to save, and 
second, find a way of generating diffs between object graphs.  Since I don't 
yet know all the questions I want to ask, I don't want to throw away 
information prematurely, which is why I would prefer to avoid scenario 1.  

So that brings up possibility two; generating diffs between object graphs.  
I've searched around in the standard library and on pypi, but I haven't yet 
found a library that does what I want.  Does anyone know of something that does?

Basically, I want something with the following ability:

Object_graph_2 - Object_graph_1 = diff_2_1
Object_graph_1 + diff_2_1 = Object_graph_2

The object graphs are already pickleable, and the diffs must be, or this won't 
work.  I can use deepcopy to ensure the two object graphs are completely 
separate, so the diffing engine doesn't need to worry about that part.

Anyone know of such a thing?

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Good PDF parser/form filler?

2015-03-20 Thread Cem Karan
Hi all, I'm currently looking for a PDF parser/writer library so I can 
programmatically fill in some PDF forms.  I've found PyPDF2 
(https://pypi.python.org/pypi/PyPDF2/1.24), and ReportLab 
(https://pypi.python.org/pypi/reportlab), and I can see that there are a LOT 
more PDF frameworks out there on pypi, but I wanted to know what kinds of 
experiences others have had with them so I can choose a reasonably good one.  
Note that I'm not creating brand-new PDF files, but filling in ones I've 
already gotten.

My requirements:
- Must work with python 3.4
- Must work on OS X (only a real problem for extension classes, etc.)
- Ideally pure python with few dependencies.
- NOT shoveling data out to the internet!  MUST be wholly contained on my 
machine!

Thanks in advance for any help!

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Installed Python 3 on Mac OS X Yosemite but its still Python 2.7

2015-03-09 Thread Cem Karan

On Mar 7, 2015, at 6:39 PM, James Dekker james.dek...@gmail.com wrote:

 I am currently running OS X Yosemite (10.10.2) on my MacBook Pro... By 
 default, Apple ships Python 2.7.6 on Yosemite.
 
 Just downloaded and ran this installer for Python 3:
 
 python-3.4.3-macosx10.6.pkg
 
 When I opened up my Terminal and typed in python, this is what came up:
 
 Python 2.7.6 (default, Sep  9 2014, 15:04:36)
  
 
 [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.39)]
  on darwin
 
 Type help, copyright, credits or license for more information.
 
 Sorry, I am very new to Python...
 
 Question(s):
 
 (1) Does anyone know where the Python 3.4.3 interpreter was installed?

It should be installed as either python3 or python3.4.  To figure out which, 
type 'python' in the terminal, and hit tab twice.  It should bring up a list of 
python interpreters you have installed.

 (2) Do I need to uninstall Python 2.7.3 (if so, how do I go about doing this) 
 before setting a global environmental variable such as PYTHON_HOME to the 
 location of the installed Python 3.4.3?

You don't need to uninstall python 2.7, and you shouldn't try.  I tried it as 
an experiment at one time, and my system had various mysterious failures after 
that. It may be that Yosemite fixes those failures, but I wouldn't bet on it.

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Design thought for callbacks

2015-03-08 Thread Cem Karan
Based on the comments above, I've decided to do the following for my API:

- All callbacks will be strongly retained (no weakrefs).
- Callbacks will be stored in a list, and the list will be exposed as a 
read-only property of the library.  This will let users reorder callbacks as 
necessary, add them multiple times in a row, etc.  I'm also hoping that by 
making it a list, it becomes obvious that the callback is strongly retained.
- Finally, callbacks are not one-shots.  This just happens to make sense for my 
code, but others may find other methods make more sense.
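A rough sketch of what that API might look like (class and method names are 
made up for illustration): the list is reachable through a read-only property, 
so callers can inspect and reorder it, but can't rebind the attribute itself.

```python
class EventSource(object):
    """Hypothetical event source: callbacks are strongly held in a list
    that callers may inspect and reorder, but not replace wholesale."""
    def __init__(self):
        self._callbacks = []

    @property
    def callbacks(self):
        # Read-only property: the attribute cannot be rebound, but the
        # returned list may be reordered or edited by the caller.
        return self._callbacks

    def register(self, callback):
        # Strong reference: the callback stays alive until removed.
        self._callbacks.append(callback)

    def fire(self, *args, **kwargs):
        # Iterate over a copy, so a callback may safely unregister itself.
        for callback in list(self._callbacks):
            callback(*args, **kwargs)
```

Exposing the list directly makes the strong-retention behavior visible: a 
callback is alive for exactly as long as it appears in `callbacks`.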


Thanks again to everyone for providing so many comments on my question, and I 
apologize again for taking so long to wrap things up.

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Design thought for callbacks

2015-03-02 Thread Cem Karan

On Feb 26, 2015, at 7:04 PM, Fabio Zadrozny fabi...@gmail.com wrote:

 
 On Wed, Feb 25, 2015 at 9:46 AM, Cem Karan cfkar...@gmail.com wrote:
 
 On Feb 24, 2015, at 8:23 AM, Fabio Zadrozny fabi...@gmail.com wrote:
 
  Hi Cem,
 
  I didn't read the whole long thread, but I thought I'd point you to what 
  I'm using in PyVmMonitor (http://www.pyvmmonitor.com/) -- which may already 
  cover your use-case.
 
  Take a look at the callback.py at 
  https://github.com/fabioz/pyvmmonitor-core/blob/master/pyvmmonitor_core/callback.py
 
  And its related test (where you can see how to use it): 
  https://github.com/fabioz/pyvmmonitor-core/blob/master/_pyvmmonitor_core_tests/test_callback.py
   (note that it falls back to a strong reference on simple functions -- 
  i.e.: usually top-level methods or methods created inside a scope -- but 
  otherwise uses weak references).
 
 That looks like a better version of what I was thinking about originally.  
 However, various people on the list have convinced me to stick with strong 
 references everywhere.  I'm working out a possible API right now, once I have 
 some code that I can use to illustrate what I'm thinking to everyone, I'll 
 post it to the list.
 
 Thank you for showing me your code though, it is clever!
 
 Thanks,
 Cem Karan
 
 ​Hi Cem,
 
 Well, I decided to elaborate a bit on the use-case I have and how I use it 
 (on a higher level): 
 http://pydev.blogspot.com.br/2015/02/design-for-client-side-applications-in.html
 
 So, you can see if it may be worth for you or not (I agree that sometimes you 
 should keep strong references, but for my use-cases, weak references usually 
 work better -- with the only exception being closures, which is handled 
 different anyways but with the gotcha of having to manually unregister it).

As I mentioned in an earlier post, I've been quite busy at home, and expect to 
be for a few days to come, so I apologize both for being so late posting, and 
for not posting my own API plans.

Your blog post has given me quite a bit to think about, thank you!  Do you mind 
if I work up an API similar to yours?  I'm planning on using a different 
license (not LGPL), which is why I ask.

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Design thought for callbacks

2015-03-02 Thread Cem Karan
On Feb 26, 2015, at 2:54 PM, Ian Kelly ian.g.ke...@gmail.com wrote:
 On Feb 26, 2015 4:00 AM, Cem Karan cfkar...@gmail.com wrote:
 
 
  On Feb 26, 2015, at 12:36 AM, Gregory Ewing greg.ew...@canterbury.ac.nz 
  wrote:
 
   Cem Karan wrote:
   I think I see what you're talking about now.  Does WeakMethod
   (https://docs.python.org/3/library/weakref.html#weakref.WeakMethod) solve
   this problem?
  
   Yes, that looks like it would work.
 
 
  Cool!
 
 Sometimes I wonder whether anybody reads my posts. I suggested a solution 
 involving WeakMethod four days ago that additionally extends the concept to 
 non-method callbacks (requiring a small amount of extra effort from the 
 client in those cases, but I think that is unavoidable. There is no way that 
 the framework can determine the appropriate lifetime for a closure-based 
 callback.)

I apologize for taking so long to reply to everyone's posts, but I've been 
busy at home.

Ian, it took me a while to do some research to understand WHY what you were 
suggesting was important; you're right about storing the object as well as the 
method/function separately, but I think that WeakMethod might solve that 
completely, correct?  Are there any cases where WeakMethod wouldn't work?

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Design thought for callbacks

2015-03-02 Thread Cem Karan

On Feb 26, 2015, at 3:00 PM, Ethan Furman et...@stoneleaf.us wrote:

 On 02/26/2015 11:54 AM, Ian Kelly wrote:
 
 Sometimes I wonder whether anybody reads my posts.
 
 It's entirely possible the OP wasn't ready to understand your solution four 
 days ago, but two days later the OP was.

Thank you Ethan, that was precisely my problem.

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Design thought for callbacks

2015-02-26 Thread Cem Karan

On Feb 26, 2015, at 12:36 AM, Gregory Ewing greg.ew...@canterbury.ac.nz wrote:

 Cem Karan wrote:
 I think I see what you're talking about now.  Does WeakMethod
 (https://docs.python.org/3/library/weakref.html#weakref.WeakMethod) solve
 this problem?
 
 Yes, that looks like it would work.


Cool!  

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Design thought for callbacks

2015-02-25 Thread Cem Karan

On Feb 24, 2015, at 4:19 PM, Gregory Ewing greg.ew...@canterbury.ac.nz wrote:

 random...@fastmail.us wrote:
 On Tue, Feb 24, 2015, at 00:20, Gregory Ewing wrote:
 This is why I suggested registering a listener object
 plus a method name instead of a callback. It avoids that
 reference cycle, because there is no long-lived callback
 object keeping a reference to the listener.
 How does that help? Everywhere you would have had a reference to the
 callback object, you now have a reference to the listener object.
 
 The point is that the library can keep a weak reference
 to the listener object, whereas it can't reliably keep
 a weak reference to a bound method.

I think I see what you're talking about now.  Does WeakMethod 
(https://docs.python.org/3/library/weakref.html#weakref.WeakMethod) solve this 
problem?  Note that I can force my users to use the latest stable version of 
python at all times, so WeakMethod IS available to me.
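The reason WeakMethod exists can be shown in a few lines (the prompt 
collection of the dead references below assumes CPython's reference counting):

```python
import weakref

class Listener(object):
    def on_event(self):
        return "handled"

listener = Listener()

# A plain weak reference to a bound method dies immediately: the bound
# method object is created fresh on each attribute access, and nothing
# else keeps that temporary object alive.
dead = weakref.ref(listener.on_event)
assert dead() is None

# WeakMethod re-creates the bound method on demand, for as long as the
# underlying object is alive.
live = weakref.WeakMethod(listener.on_event)
assert live()() == "handled"

del listener
assert live() is None  # object gone, so the method is too
```

Note the double call `live()()`: the first call dereferences the WeakMethod 
(possibly yielding None), the second invokes the recovered bound method.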

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Design thought for callbacks

2015-02-25 Thread Cem Karan

On Feb 24, 2015, at 8:23 AM, Fabio Zadrozny fabi...@gmail.com wrote:

 Hi Cem,
 
 I didn't read the whole long thread, but I thought I'd point you to what I'm 
 using in PyVmMonitor (http://www.pyvmmonitor.com/) -- which may already cover 
 your use-case.
 
 Take a look at the callback.py at 
 https://github.com/fabioz/pyvmmonitor-core/blob/master/pyvmmonitor_core/callback.py
 
 And its related test (where you can see how to use it): 
 https://github.com/fabioz/pyvmmonitor-core/blob/master/_pyvmmonitor_core_tests/test_callback.py
  (note that it falls back to a strong reference on simple functions -- i.e.: 
 usually top-level methods or methods created inside a scope -- but otherwise 
 uses weak references).

That looks like a better version of what I was thinking about originally.  
However, various people on the list have convinced me to stick with strong 
references everywhere.  I'm working out a possible API right now, once I have 
some code that I can use to illustrate what I'm thinking to everyone, I'll post 
it to the list.

Thank you for showing me your code though, it is clever!

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Design thought for callbacks

2015-02-24 Thread Cem Karan
I'm combining two messages into one.

On Feb 24, 2015, at 12:29 AM, random...@fastmail.us wrote:

 On Tue, Feb 24, 2015, at 00:20, Gregory Ewing wrote:
 Cem Karan wrote:
 I tend to structure my code as a tree or DAG of objects.  The owner refers 
 to
 the owned object, but the owned object has no reference to its owner.  With
 callbacks, you get cycles, where the owned owns the owner.
 
 This is why I suggested registering a listener object
 plus a method name instead of a callback. It avoids that
 reference cycle, because there is no long-lived callback
 object keeping a reference to the listener.
 
 How does that help? Everywhere you would have had a reference to the
 callback object, you now have a reference to the listener object.
 You're just shuffling deck chairs around: if B shouldn't reference A
 because A owns B, then removing C from the B-C-A reference chain does
 nothing to fix this.

On Feb 24, 2015, at 12:45 AM, Gregory Ewing greg.ew...@canterbury.ac.nz wrote:

 Cem Karan wrote:
 On Feb 22, 2015, at 5:15 AM, Gregory Ewing greg.ew...@canterbury.ac.nz
 wrote:
 Perhaps instead of registering a callback function, you should be
 registering the listener object together with a method name.
 I see what you're saying, but I don't think it gains us too much.  If I store
 an object and an unbound method of the object, or if I store the bound method
 directly, I suspect it will yield approximately the same results.
 
 It would be weird and unpythonic to have to register both
 an object and an unbound method, and if you use a bound
 method you can't keep a weak reference to it.


Greg, random832 said what I was thinking earlier, that you've only increased 
the diameter of your cycle without actually fixing it.  Can you give a code 
example where your method breaks the cycle entirely?

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Design thought for callbacks

2015-02-24 Thread Cem Karan

On Feb 23, 2015, at 7:29 AM, Frank Millman fr...@chagford.com wrote:

 
 Cem Karan cfkar...@gmail.com wrote in message 
 news:a3c11a70-5846-4915-bb26-b23793b65...@gmail.com...
 
 
 Good questions!  That was why I was asking about 'gotchas' with WeakSets 
 originally.  Honestly, the only way to know for sure would be to write two 
 APIs for doing similar things, and then see how people react to them.  The 
 problem is, how do you set up such a study so it is statistically valid?
 
 
 Just in case you missed Steven's comment on my 'gotcha', and my reply, it is 
 worth repeating that what I reported as a gotcha was not what it seemed.
 
 If you set up the callback as a weakref, and the listening object goes out 
 of scope, it will wait to be garbage collected. However, as far as I can 
 tell, the weakref is removed at the same time as the object is gc'd, so 
 there is no 'window' where the weakref exists but the object it is 
 referencing does not exist.
 
 My problem was that I had performed a cleanup operation on the listening 
 object before letting it go out of scope, and it was no longer in a valid 
 state to deal with the callback, resulting in an error. If you do not have 
 that situation, your original idea may well work.

Thank you Frank, I did read Steve's comment to your reply earlier, but what you 
said in your original reply made sense to me.  I don't have control over user 
code.  That means that if someone wants to write code such that they perform 
some kind of cleanup and are no longer able to handle the callback, they are 
free to do so.  While I can't prevent this from happening, I can make it as 
obvious as possible in my code that before you perform any cleanup, you also 
need to unregister from the library.  That is my main goal in developing 
pythonic/obvious methods of registering callbacks.
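The unregister-before-cleanup discipline described above might look like this 
(all names hypothetical): the listener removes itself from the library first, 
so the library can never call it in an invalid state.

```python
class Library(object):
    """Hypothetical library that strongly holds its callbacks."""
    def __init__(self):
        self._callbacks = []

    def register(self, callback):
        self._callbacks.append(callback)

    def unregister(self, callback):
        self._callbacks.remove(callback)

    def fire(self, msg):
        for callback in list(self._callbacks):
            callback(msg)

class Listener(object):
    def __init__(self, lib):
        self._lib = lib
        self.events = []
        lib.register(self.on_event)

    def on_event(self, msg):
        self.events.append(msg)

    def close(self):
        # Unregister FIRST, so the library can never call us after the
        # cleanup below has made this object invalid.
        self._lib.unregister(self.on_event)
        self.events = None
```

This works because two bound-method objects for the same method of the same 
instance compare equal, so `unregister(self.on_event)` finds the entry that 
`register` stored.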

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Design thought for callbacks

2015-02-23 Thread Cem Karan

On Feb 22, 2015, at 5:29 PM, Laura Creighton l...@openend.se wrote:

 In a message of Sun, 22 Feb 2015 17:09:01 -0500, Cem Karan writes:
 
 Documentation is a given; it MUST be there.  That said, documenting
 something, but still making it surprising, is a bad idea.  For
 example, several people have been strongly against using a WeakSet to
 hold callbacks because they expect a library to hold onto callbacks.
 If I chose not to do that, and used a WeakSet, then even if I
 documented it, it would still end up surprising people (and from the
 sound of it, more people would be surprised than not).
 
 Thanks, Cem Karan
 
 No matter what you do, alas, will surprise the hell out of people
 because callbacks do not behave as people expect.  Among people who
 have used callbacks, what you are polling is 'what are people
 familiar with', and it seems for the people around here, now,
 WeakSets are not what they are familiar with.

And that's fine.  I know that regardless of what I do, some people are going to 
be surprised.  I'm trying to develop APIs that reduce that surprise as far as 
possible.  That means I can spend more time coding and less time answering 
questions... :)

 But that is not so surprising.  How many people use WeakSets for
 _anything_?  I've never used them, aside from 'ooh! cool shiny
 new language feature!  Let's kick it around the park!'  That people
 aren't familiar with WeakSets doesn't mean all that much.

Actually, I use them when building caches of stuff, and I use weak references 
when I have trees of stuff so the child nodes know of, but don't hold onto, 
their parents.  But I agree with you, there aren't a huge number of use-cases.
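The parent-link use-case mentioned above might look like the following sketch 
(hypothetical Node class; the prompt collection after `del` assumes CPython's 
reference counting):

```python
import weakref

class Node(object):
    """Tree node whose children hold only a weak reference to their
    parent, so child links never keep a dead parent alive (and no
    parent<->child reference cycle is created)."""
    def __init__(self, name, parent=None):
        self.name = name
        self.children = []
        self._parent = weakref.ref(parent) if parent is not None else None
        if parent is not None:
            parent.children.append(self)

    @property
    def parent(self):
        return self._parent() if self._parent is not None else None

root = Node("root")
child = Node("child", root)
assert child.parent is root

del root
# The only remaining path to the parent was the child's weak link,
# so the parent has been collected.
assert child.parent is None
```

The strong references run only downward (parent to children), which matches 
the tree/DAG ownership structure discussed earlier in this thread.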

 The question I have is does this architecture make things harder,
 easier or about the same to debug?  To write tests for? to do Test
 Driven Design with?

Good questions!  That was why I was asking about 'gotchas' with WeakSets 
originally.  Honestly, the only way to know for sure would be to write two APIs 
for doing similar things, and then see how people react to them.  The problem 
is, how do you set up such a study so it is statistically valid?

Cem
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Design thought for callbacks

2015-02-22 Thread Cem Karan

On Feb 21, 2015, at 12:08 PM, Marko Rauhamaa ma...@pacujo.net wrote:

 Steven D'Aprano steve+comp.lang.pyt...@pearwood.info:
 
 Other than that, I cannot see how calling a function which has *not*
 yet been garbage collected can fail, just because the only reference
 still existing is a weak reference.
 
 Maybe the logic of the receiving object isn't prepared for the callback
 anymore after an intervening event.
 
 The problem then, of course, is in the logic and not in the callbacks.

This was PRECISELY the situation I was thinking about.  My hope was to make the 
callback mechanism slightly less surprising by allowing the user to track them, 
releasing them when they aren't needed without having to figure out where the 
callbacks were registered.  However, it appears I'm making things more 
surprising rather than less.

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Design thought for callbacks

2015-02-22 Thread Cem Karan

On Feb 21, 2015, at 12:27 PM, Steven D'Aprano 
steve+comp.lang.pyt...@pearwood.info wrote:

 Cem Karan wrote:
 
 
 On Feb 21, 2015, at 8:15 AM, Chris Angelico ros...@gmail.com wrote:
 
 On Sun, Feb 22, 2015 at 12:13 AM, Cem Karan cfkar...@gmail.com wrote:
 OK, so it would violate the principle of least surprise for you. 
 Interesting.  Is this a general pattern in python?  That is, callbacks
 are owned by what they are registered with?
 
 In the end, I want to make a library that offers as few surprises to the
 user as possible, and no matter how I think about callbacks, they are
 surprising to me.  If callbacks are strongly-held, then calling 'del
 foo' on a callable object may not make it go away, which can lead to
 weird and nasty situations.
 
 How?
 
 The whole point of callbacks is that you hand over responsibility to another
 piece of code, and then forget about your callback. The library will call
 it, when and if necessary, and when the library no longer needs your
 callback, it is free to throw it away. (If I wish the callback to survive
 beyond the lifetime of your library's use of it, I have to keep a reference
 to the function.)

Marko mentioned it earlier; if you think you've gotten rid of all references to 
some chunk of code, and it is still alive afterwards, that can be surprising.

 Weakly-held callbacks mean that I (as the 
 programmer), know that objects will go away after the next garbage
 collection (see Frank's earlier message), so I don't get 'dead'
 callbacks coming back from the grave to haunt me.
 
 I'm afraid this makes no sense to me. Can you explain, or better still
 demonstrate, a scenario where dead callbacks rise from the grave, so to
 speak?


#! /usr/bin/env python

class Callback_object(object):
    def __init__(self, msg):
        self._msg = msg
    def callback(self, stuff):
        print("From {0!s}: {1!s}".format(self._msg, stuff))

class Fake_library(object):
    def __init__(self):
        self._callbacks = list()
    def register_callback(self, callback):
        self._callbacks.append(callback)
    def execute_callbacks(self):
        for thing in self._callbacks:
            thing('Surprise!')

if __name__ == '__main__':
    foo = Callback_object('Evil Zombie')
    lib = Fake_library()
    lib.register_callback(foo.callback)

    # Way later, after the user forgot all about the callback above
    foo = Callback_object('Your Significant Other')
    lib.register_callback(foo.callback)

    # And finally getting around to running all those callbacks.
    lib.execute_callbacks()


Output:
From Evil Zombie: Surprise!
From Your Significant Other: Surprise!

In this case, the user made an error (just as Marko said in his earlier 
message), and forgot about the callback he registered with the library.  The 
callback isn't really rising from the dead; as you say, either it's been garbage 
collected, or it hasn't been.  However, you may not be ready for a callback to 
be called at that moment in time, which means you're surprised by unexpected 
behavior.
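For contrast, here's a minimal weakly-held variant of the same example (my own 
sketch, not code from the thread; the Weak_library name is made up).  Note that 
it registers the whole object rather than the bound method foo.callback, since 
a plain weak reference to a bound method dies immediately.  With that caveat, 
the forgotten callback silently vanishes instead of firing:

```python
import gc
import weakref

fired = []

class Callback_object(object):
    def __init__(self, msg):
        self._msg = msg
    def callback(self, stuff):
        fired.append("From {0!s}: {1!s}".format(self._msg, stuff))

class Weak_library(object):
    def __init__(self):
        # Entries vanish from the set as soon as their objects die.
        self._callbacks = weakref.WeakSet()
    def register_callback(self, obj):
        self._callbacks.add(obj)
    def execute_callbacks(self):
        for obj in list(self._callbacks):
            obj.callback('Surprise!')

foo = Callback_object('Evil Zombie')
lib = Weak_library()
lib.register_callback(foo)

# Rebinding foo drops the last strong reference to the first object.
foo = Callback_object('Your Significant Other')
lib.register_callback(foo)
gc.collect()  # make collection deterministic for the demo

lib.execute_callbacks()  # only the live object fires
```

The trade-off is exactly the one debated in this thread: no stale callback ever 
fires, but a callback you *meant* to keep alive can quietly disappear.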

 So, what's the consensus on the list, strongly-held callbacks, or
 weakly-held ones?
 
 I don't know about Python specifically, but it's certainly a general
 pattern in other languages. They most definitely are owned, and it's
 the only model that makes sense when you use closures (which won't
 have any other references anywhere).
 
 I agree about closures; it's the only way they could work.
 
 *scratches head* There's nothing special about closures. You can assign them
 to a name like any other object.
 
 def make_closure():
     x = 23
     def closure():
         return x + 1
     return closure
 
 func = make_closure()
 
 Now you can register func as a callback, and de-register it when you're done:
 
 register(func)
 unregister(func)
 
 
 Of course, if you throw away your reference to func, you have no (easy) way
 of de-registering it. That's no different to any other object which is
 registered by identity. (Registering functions by name is a bad idea, since
 multiple functions can have the same name.)
 
 As an alternative, your callback registration function might return a ticket
 for the function:
 
 ticket = register(func)
 del func
 unregister(ticket)
 
 but that strikes me as over-kill. And of course, the simplest ticket is to
 return the function itself :-)

Agreed on all points; closures are just ordinary objects.  The only difference 
(in my opinion) is that they are 'fire and forget'; if you are registering or 
tracking them then you've kind of defeated the purpose.  THAT is what I meant 
about how you handle closures.

 
 When I was 
 originally thinking about the library, I was trying to include all types
 of callbacks, including closures and callable objects.  The callable
 objects may pass themselves, or one of their methods to the library, or
 may do something really weird.
 
 I don't think they can do anything too weird. They have to pass a callable
 object. Your library just calls that object. You shouldn't

Re: Design thought for callbacks

2015-02-22 Thread Cem Karan

On Feb 22, 2015, at 7:12 AM, Marko Rauhamaa ma...@pacujo.net wrote:

 Cem Karan cfkar...@gmail.com:
 
 On Feb 21, 2015, at 11:03 AM, Marko Rauhamaa ma...@pacujo.net wrote:
 I use callbacks all the time but haven't had any problems with strong
 references.
 
 I am careful to move my objects to a zombie state after they're done so
 they can absorb any potential loose callbacks that are lingering in the
 system.
 
 So, if I were designing a library for you, you would be willing to have
 a 'zombie' attribute on your callback, correct? This would allow the
 library to query its callbacks to ensure that only 'live' callbacks are
 called. How would you handle closures?
 
 Sorry, don't understand the question.

You were saying that you move your objects into a zombie state.  I assumed that 
you meant you marked them in some manner (e.g., setting 'is_zombie' to True), 
so that anything that has a strong reference to the object knows the object is 
not supposed to be used anymore.  That way, regardless of where or how many 
times you've registered your object for callbacks, the library can do something 
like the following (banged out in my mail application, may have typos):


_CALLBACKS = []

def execute_callbacks():
    global _CALLBACKS
    _CALLBACKS = [x for x in _CALLBACKS if not x.is_zombie]
    for x in _CALLBACKS:
        x()


That will lazily unregister callbacks that are in the zombie state, which will 
eventually lead to their collection by the garbage collector.  It won't work 
for anything that you don't have a reference for (lambdas, etc.), but it should 
work in a lot of cases.

Is this what you meant?

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Design thought for callbacks

2015-02-22 Thread Cem Karan

On Feb 22, 2015, at 7:24 AM, Chris Angelico ros...@gmail.com wrote:

 On Sun, Feb 22, 2015 at 11:07 PM, Cem Karan cfkar...@gmail.com wrote:
 Correct. The GUI engine ultimately owns everything. Of course, this is
 a very simple case (imagine a little notification popup; you don't
 care about it, you don't need to know when it's been closed, the only
 event on it is hit Close to destroy the window), and most usage
 would have other complications, but it's not uncommon for me to build
 a GUI program that leaves everything owned by the GUI engine.
 Everything is done through callbacks. Destroy a window, clean up its
 callbacks. The main window will have an on-deletion callback that
 terminates the program, perhaps. It's pretty straight-forward.
 
 How do you handle returning information?  E.g., the user types in a number 
 and expects that to update the internal state of your code somewhere.
 
 Not sure what you mean by returning. If the user types in a number
 in a GUI widget, that would trigger some kind of on-change event, and
 either the new text would be a parameter to the callback function, or
 the callback could query the widget. In the latter case, I'd probably
 have the callback as a closure, and thus able to reference the object.

We're thinking of the same thing.  I try to structure what little GUI code I 
write using the MVP pattern 
(http://en.wikipedia.org/wiki/Model-view-presenter), so I end up with 
hub-and-spoke structures.  But you're right, if you have a partially applied 
callback that has the presenter as one of its parameters, that would do it for 
a GUI.  I was thinking more of a DAG of objects, but now that I think about it, 
callbacks wouldn't make sense in that case.

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Design thought for callbacks

2015-02-22 Thread Cem Karan

On Feb 22, 2015, at 7:46 AM, Marko Rauhamaa ma...@pacujo.net wrote:

 Cem Karan cfkar...@gmail.com:
 
 On Feb 21, 2015, at 12:08 PM, Marko Rauhamaa ma...@pacujo.net wrote:
 Maybe the logic of the receiving object isn't prepared for the callback
 anymore after an intervening event.
 
 The problem then, of course, is in the logic and not in the callbacks.
 
 This was PRECISELY the situation I was thinking about. My hope was to
 make the callback mechanism slightly less surprising by allowing the
 user to track them, releasing them when they aren't needed without
 having to figure out where the callbacks were registered. However, it
 appears I'm making things more surprising rather than less.
 
 When dealing with callbacks, my advice is to create your objects as
 explicit finite state machines. Don't try to encode the object state
 implicitly or indirectly. Rather, give each and every state a symbolic
 name and log the state transitions for troubleshooting.
 
 Your callbacks should then consider what to do in each state. There are
 different ways to express this in Python, but it always boils down to a
 state/transition matrix.
 
 Callbacks sometimes cannot be canceled after they have been committed to
 and have been shipped to the event pipeline. Then, the receiving object
 must brace itself for the impending spurious callback.

Nononono, I'm NOT encoding anything implicitly!  As Frank mentioned earlier, 
this is more of a pub/sub problem.  E.g., 'USB dongle has gotten plugged in', 
or 'key has been pressed'.  The user code needs to decide what to do next, the 
library code provides a nice, clean interface to some potentially weird 
hardware.

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Design thought for callbacks

2015-02-22 Thread Cem Karan

On Feb 22, 2015, at 5:15 AM, Gregory Ewing greg.ew...@canterbury.ac.nz wrote:

 Frank Millman wrote:
 In order to inform users that certain bits of state have changed, I require 
 them to register a callback with my code.
 This sounds to me like a pub/sub scenario. When a 'listener' object comes 
 into existence it is passed a reference to a 'controller' object that holds 
 state. It wants to be informed when the state changes, so it registers a 
 callback function with the controller.
 
 Perhaps instead of registering a callback function, you
 should be registering the listener object together with
 a method name.
 
 You can then keep a weak reference to the listener object,
 since if it is no longer referenced elsewhere, it presumably
 no longer needs to be notified of anything.

I see what you're saying, but I don't think it gains us too much.  If I store 
an object and an unbound method of the object, or if I store the bound method 
directly, I suspect it will yield approximately the same results.
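One wrinkle worth demonstrating here (a CPython-specific sketch of my own, not 
something from the thread): the two approaches are *not* equivalent once weak 
references are involved, because a bound method is a fresh object created on 
every attribute access.  weakref.WeakMethod (available since Python 3.4, the 
version this thread targets) exists for exactly this case:

```python
import gc
import weakref

received = []

class Listener(object):
    def on_change(self, value):
        received.append(value)

listener = Listener()

# A plain weakref to a bound method dies immediately in CPython: the
# bound-method object is created on the fly and discarded as soon as
# weakref.ref() returns.
dead = weakref.ref(listener.on_change)
assert dead() is None

# weakref.WeakMethod re-creates the bound method on demand, and stays
# alive exactly as long as the listener object does.
wm = weakref.WeakMethod(listener.on_change)
wm()("new state")   # still callable while listener is alive

del listener
gc.collect()
# The listener is gone, and its callback went with it: wm() is now None.
```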

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Design thought for callbacks

2015-02-22 Thread Cem Karan

On Feb 21, 2015, at 11:03 AM, Marko Rauhamaa ma...@pacujo.net wrote:

 Chris Angelico ros...@gmail.com:
 
 On Sat, Feb 21, 2015 at 1:44 PM, Cem Karan cfkar...@gmail.com wrote:
 
 In order to inform users that certain bits of state have changed, I
 require them to register a callback with my code. The problem is that
 when I store these callbacks, it naturally creates a strong reference
 to the objects, which means that if they are deleted without
 unregistering themselves first, my code will keep the callbacks
 alive. Since this could lead to really weird and nasty situations,
 [...]
 
 No, it's not. I would advise using strong references - if the callback
 is a closure, for instance, you need to hang onto it, because there
 are unlikely to be any other references to it. If I register a
 callback with you, I expect it to be called; I expect, in fact, that
 that *will* keep my object alive.
 
 I use callbacks all the time but haven't had any problems with strong
 references.
 
 I am careful to move my objects to a zombie state after they're done so
 they can absorb any potential loose callbacks that are lingering in the
 system.

So, if I were designing a library for you, you would be willing to have a 
'zombie' attribute on your callback, correct?  This would allow the library to 
query its callbacks to ensure that only 'live' callbacks are called.  How would 
you handle closures?  

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Design thought for callbacks

2015-02-22 Thread Cem Karan

On Feb 21, 2015, at 3:57 PM, Grant Edwards invalid@invalid.invalid wrote:

 On 2015-02-21, Cem Karan cfkar...@gmail.com wrote:
 
 On Feb 21, 2015, at 12:42 AM, Chris Angelico ros...@gmail.com wrote:
 
 On Sat, Feb 21, 2015 at 1:44 PM, Cem Karan cfkar...@gmail.com wrote:
 In order to inform users that certain bits of state have changed, I 
 require them to register a callback with my code.  The problem is that 
 when I store these callbacks, it naturally creates a strong reference to 
 the objects, which means that if they are deleted without unregistering 
 themselves first, my code will keep the callbacks alive.  Since this could 
 lead to really weird and nasty situations, I would like to store all the 
 callbacks in a WeakSet 
 (https://docs.python.org/3/library/weakref.html#weakref.WeakSet).  That 
 way, my code isn't the reason why the objects are kept alive, and if they 
 are no longer alive, they are automatically removed from the WeakSet, 
 preventing me from accidentally calling them when they are dead.  My 
 question is simple; is this a good design?  If not, why not?  Are there 
 any potential 'gotchas' I should be worried about?
 
 
 No, it's not. I would advise using strong references - if the callback
 is a closure, for instance, you need to hang onto it, because there
 are unlikely to be any other references to it. If I register a
 callback with you, I expect it to be called; I expect, in fact, that
 that *will* keep my object alive.
 
 OK, so it would violate the principle of least surprise for you.
 
 And me as well.  I would expect to be able to pass a closure as a
 callback and not have to keep a reference to it.  Perhaps that's just a
 leftover from working with other languages (javascript, scheme, etc.).
 It doesn't matter if it's a string, a float, a callback, a graphic or
 whatever: if I pass your function/library an object, I expect _you_ to
 keep track of it until you're done with it.
 
 Interesting.  Is this a general pattern in python?  That is,
 callbacks are owned by what they are registered with?
 
 I'm not sure what you mean by owned or why it matters that it's a
 callback: it's an object that was passed to you: you need to hold onto
 a reference to it until you're done with it, and the polite thing to
 do is to delete references to it when you're done with it.

I tend to structure my code as a tree or DAG of objects.  The owner refers to 
the owned object, but the owned object has no reference to its owner.  With 
callbacks, you get cycles, where the owned owns the owner.  As a result, if you 
forget where your object has been registered, it may be kept alive when you 
aren't expecting it.  My hope was that with WeakSets I could continue to 
preserve the DAG or tree while still having the benefits of callbacks.  
However, it looks like that is too surprising to most people.

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Design thought for callbacks

2015-02-22 Thread Cem Karan

On Feb 21, 2015, at 10:55 AM, Chris Angelico ros...@gmail.com wrote:

 On Sun, Feb 22, 2015 at 2:45 AM, Cem Karan cfkar...@gmail.com wrote:
 OK, so if I'm reading your code correctly, you're breaking the cycle in your 
 object graph by making the GUI the owner of the callback, correct?  No other 
 chunk of code has a reference to the callback, correct?
 
 Correct. The GUI engine ultimately owns everything. Of course, this is
 a very simple case (imagine a little notification popup; you don't
 care about it, you don't need to know when it's been closed, the only
 event on it is hit Close to destroy the window), and most usage
 would have other complications, but it's not uncommon for me to build
 a GUI program that leaves everything owned by the GUI engine.
 Everything is done through callbacks. Destroy a window, clean up its
 callbacks. The main window will have an on-deletion callback that
 terminates the program, perhaps. It's pretty straight-forward.

How do you handle returning information?  E.g., the user types in a number and 
expects that to update the internal state of your code somewhere.

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Design thought for callbacks

2015-02-22 Thread Cem Karan

On Feb 22, 2015, at 7:52 AM, Laura Creighton l...@openend.se wrote:

 In a message of Sun, 22 Feb 2015 07:16:14 -0500, Cem Karan writes:
 
 This was PRECISELY the situation I was thinking about.  My hope was
 to make the callback mechanism slightly less surprising by allowing
 the user to track them, releasing them when they aren't needed
 without having to figure out where the callbacks were registered.
 However, it appears I'm making things more surprising rather than
 less.
 
 You may be able to accomplish your goal by using a Queue with a
 producer/consumer model.
 see: 
 http://stackoverflow.com/questions/9968592/turn-functions-with-a-callback-into-python-generators
 
 especially the bottom of that.
 
 I haven't run the code, but it looks mostly reasonable, except that
 you do not want to rely on the Queue maxsize being 1 here, and
 indeed, I almost always want a bigger Queue  in any case.  Use
 Queue.task_done if blocking the producer features in your design.
 
 The problem that you are up against is that callbacks are inherently
 confusing, even to programmers who are learning about them for the
 first time.  They don't fit people's internal model of 'how code works'.
 There isn't a whole lot one can do about that except to
 try to make the magic do as little as possible, so that more of the
 code works 'the way people expect'.

I think what you're suggesting is that library users register a Queue instead 
of a callback, correct?  The problem is that I'll then have a strong reference 
to the Queue, which means I'll be pumping events into it after the user code 
has gone away.  I was hoping to solve the problem of forgotten registrations in 
the library.
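One way to get both properties (a sketch under my own assumptions, not a design 
anyone proposed in the thread): have the library hold only a *weak* reference 
to the user's Queue, so dropping the queue unregisters it automatically:

```python
import gc
import queue
import weakref

class EventSource(object):
    """Hypothetical library side: holds weakrefs to subscriber queues."""
    def __init__(self):
        self._sinks = []
    def register(self, q):
        self._sinks.append(weakref.ref(q))
    def publish(self, event):
        live = []
        for ref in self._sinks:
            q = ref()
            if q is not None:
                q.put(event)
                live.append(ref)
        # Prune queues whose owners went away.
        self._sinks = live

src = EventSource()
q = queue.Queue()
src.register(q)
src.publish("USB dongle plugged in")
assert q.get_nowait() == "USB dongle plugged in"

del q
gc.collect()
src.publish("unheard")  # dead queue is pruned; nothing is delivered
```

The usual caveat applies: events published in the window between the user 
dropping the queue and the library noticing are silently lost.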

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Design thought for callbacks

2015-02-22 Thread Cem Karan

On Feb 22, 2015, at 4:34 PM, Marko Rauhamaa ma...@pacujo.net wrote:

 Cem Karan cfkar...@gmail.com:
 
 My goal is to make things as pythonic (whatever that means in this
 case) and obvious as possible. Ideally, a novice can more or less
 guess what will happen with my API without really having to read the
 documentation on it.
 
 If you try to shield your user from the complexities of asynchronous
 programming, you will only cause confusion. You will definitely need to
 document all nooks and crannies of the semantics of the callback API and
 your user will have to pay attention to every detail of your spec.
 
 Your user, whether novice or an expert, will thank you for your
 unambiguous specification even if it is complicated.

Documentation is a given; it MUST be there.  That said, documenting something 
but still making it surprising is a bad idea.  For example, several people 
have been strongly against using a WeakSet to hold callbacks because they 
expect a library to hold onto its callbacks.  If I went against that 
expectation and used a WeakSet, then even if I documented it, it would still 
end up surprising people (and from the sound of it, more people would be 
surprised than not).

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Design thought for callbacks

2015-02-22 Thread Cem Karan

On Feb 22, 2015, at 4:02 PM, Ethan Furman et...@stoneleaf.us wrote:

 On 02/22/2015 05:13 AM, Cem Karan wrote:
 
 Output:
 From Evil Zombie: Surprise!
 From Your Significant Other: Surprise!
 
 In this case, the user made an error (just as Marko said in his earlier 
 message),
 and forgot about the callback he registered with the library.  The callback 
 isn't
 really rising from the dead; as you say, either it's been garbage collected, 
 or it
 hasn't been.  However, you may not be ready for a callback to be called at 
 that
 moment in time, which means you're surprised by unexpected behavior.
 
 But the unexpected behavior is not a problem with Python, nor with your 
 library -- it's a bug in the fellow-programmer's
 code, and you can't (or at least shouldn't) try to prevent those kinds of 
 bugs from manifesting -- they'll just get
 bitten somewhere else by the same bug.

I agree with you, but until a relatively new programmer has gotten used to what 
callbacks are and what they imply, I want to make things easy.  For example, if 
the API subclasses collections.abc.MutableSet, and the documentation states 
that you can only add callbacks to this particular type of set, then a new 
programmer will naturally decide that either a) they need to dispose of the 
set, and if that isn't possible, then b) they need to delete their callback 
from the set.  It won't occur to them that their live object will just 
magically 'go away'; it's a member of a set!
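A minimal sketch of that set-like API (entirely my own illustration; the class 
name and fire() method are hypothetical, not part of any real library):

```python
from collections.abc import MutableSet

class CallbackSet(MutableSet):
    """A registry that *is* a set of callbacks, so ownership is explicit:
    unregistering is just removing your callback from the set."""
    def __init__(self):
        self._items = set()
    def __contains__(self, cb):
        return cb in self._items
    def __iter__(self):
        return iter(self._items)
    def __len__(self):
        return len(self._items)
    def add(self, cb):
        self._items.add(cb)
    def discard(self, cb):
        self._items.discard(cb)
    def fire(self, *args):
        # Copy first so a callback may unregister itself mid-fire.
        for cb in list(self._items):
            cb(*args)

events = CallbackSet()
seen = []
def on_event(x):
    seen.append(x)

events.add(on_event)
events.fire("ping")
events.discard(on_event)  # unregistering is plain set removal
events.fire("pong")       # nothing left to call
```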

My goal is to make things as pythonic (whatever that means in this case) and 
obvious as possible.  Ideally, a novice can more or less guess what will happen 
with my API without really having to read the documentation on it.  

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Design thought for callbacks

2015-02-21 Thread Cem Karan

On Feb 21, 2015, at 9:36 AM, Chris Angelico ros...@gmail.com wrote:

 On Sun, Feb 22, 2015 at 1:07 AM, Cem Karan cfkar...@gmail.com wrote:
 I agree about closures; it's the only way they could work.  When I was 
 originally thinking about the library, I was trying to include all types of 
 callbacks, including closures and callable objects.  The callable objects 
 may pass themselves, or one of their methods to the library, or may do 
 something really weird.
 
 Although I just realized that closures may cause another problem.  In my 
 code, I expect that many different callbacks can be registered for the same 
 event.  Unregistering means you request to be unregistered for the event. 
 How do you do that with a closure?  Aren't they anonymous?
 
 
 They're objects, same as any other, so the caller can hang onto a
 reference and then say now remove this one. Simple example:
 
 callbacks = []
 def register_callback(f): callbacks.append(f)
 def unregister_callback(f): callbacks.remove(f)
 def do_callbacks():
     for f in callbacks:
         f()
 
 def make_callback(i):
     def inner():
         print("Callback! %d" % i)
     register_callback(inner)
     return inner
 
 make_callback(5)
 remove_me = make_callback(6)
 make_callback(7)
 unregister_callback(remove_me)
 do_callbacks()

Yeah, that's pretty much what I thought you'd have to do, which kind of defeats 
the purpose of closures (fire-and-forget things).  BUT it does answer my 
question, so no complaints about it!

So, either you keep a reference to your own closure, which means that the 
library doesn't really need to, or the library keeps hold of it for you, in 
which case you don't have a reasonable way of removing it.

 The other option is for your callback registration to return some kind
 of identifier, which can later be used to unregister the callback.
 This is a good way of avoiding reference cycles (the ID could be a
 simple integer - maybe the length of the list prior to the new
 callback being appended, and then the unregistration process is simply
 callbacks[id] = None, and you skip the Nones when iterating), and
 even allows you to register the exact same function more than once,
 for what that's worth.

That would work.  In the cases where someone might register and unregister many 
callbacks, you might use UUIDs as keys instead (avoids the ABA problem).
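A quick sketch of that ticket idea (my own illustration; the Registry class and 
its method names are hypothetical).  Because each ticket is a fresh UUID rather 
than a reused list index, an old ticket can never accidentally refer to a newer 
callback — that is the ABA problem the UUIDs avoid:

```python
import uuid

class Registry(object):
    def __init__(self):
        self._callbacks = {}
    def register(self, func):
        ticket = uuid.uuid4()  # never reused, so no ABA problem
        self._callbacks[ticket] = func
        return ticket
    def unregister(self, ticket):
        # pop() with a default makes double-unregister harmless
        self._callbacks.pop(ticket, None)
    def fire(self, *args):
        for func in list(self._callbacks.values()):
            func(*args)

reg = Registry()
seen = []
t1 = reg.register(lambda msg: seen.append(("first", msg)))
t2 = reg.register(lambda msg: seen.append(("second", msg)))
reg.unregister(t1)   # the closure needs no name; the ticket suffices
reg.fire("event")    # only the second callback is still registered
```

This also answers the closure question: the caller keeps the ticket, not the 
anonymous closure itself.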

 When I do GUI programming, this is usually how things work. For
 instance, I use GTK2 (though usually with Pike rather than Python),
 and I can connect a signal to a callback function. Any given signal
 could have multiple callbacks attached to it, so it's similar to your
 case. I frequently depend on the GTK engine retaining a reference to
 my function (and thus to any data it requires), as I tend not to hang
 onto any inner objects that don't need retention. Once the parent
 object is destroyed, all its callbacks get dereferenced. Consider this
 simplified form:
 
 def popup_window():
     w = Window()
     # Add layout, info, whatever it takes
     btn = Button("Close")
     w.add(btn)  # actually it'd be added to a layout
     btn.signal_connect("clicked", lambda *args: w.destroy())
 
 The GUI back end will hang onto a reference to the window, because
 it's currently on screen; to the button, because it's attached to the
 window; and to my function, because it's connected to a button signal.
 Then when you click the button, the window gets destroyed, which
 destroys the button, which unregisters all its callbacks. At that
 point, there are no refs to the function, so it can get disposed of.
 That button function was the last external reference to the window,
 and now that it's not on screen, its Python object can also be
 disposed of, as can the button inside. So it'll all clean up fairly
 nicely; as long as the callback gets explicitly deregistered, that's
 the end of everything.

OK, so if I'm reading your code correctly, you're breaking the cycle in your 
object graph by making the GUI the owner of the callback, correct?  No other 
chunk of code has a reference to the callback, correct?  

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Design thought for callbacks

2015-02-21 Thread Cem Karan

On Feb 21, 2015, at 12:42 AM, Chris Angelico ros...@gmail.com wrote:

 On Sat, Feb 21, 2015 at 1:44 PM, Cem Karan cfkar...@gmail.com wrote:
 In order to inform users that certain bits of state have changed, I require 
 them to register a callback with my code.  The problem is that when I store 
 these callbacks, it naturally creates a strong reference to the objects, 
 which means that if they are deleted without unregistering themselves first, 
 my code will keep the callbacks alive.  Since this could lead to really 
 weird and nasty situations, I would like to store all the callbacks in a 
 WeakSet (https://docs.python.org/3/library/weakref.html#weakref.WeakSet).  
 That way, my code isn't the reason why the objects are kept alive, and if 
 they are no longer alive, they are automatically removed from the WeakSet, 
 preventing me from accidentally calling them when they are dead.  My 
 question is simple; is this a good design?  If not, why not?  Are there any 
 potential 'gotchas' I should be worried about?
 
 
 No, it's not. I would advise using strong references - if the callback
 is a closure, for instance, you need to hang onto it, because there
 are unlikely to be any other references to it. If I register a
 callback with you, I expect it to be called; I expect, in fact, that
 that *will* keep my object alive.

OK, so it would violate the principle of least surprise for you.  Interesting.  
Is this a general pattern in python?  That is, callbacks are owned by what they 
are registered with?

In the end, I want to make a library that offers as few surprises to the user 
as possible, and no matter how I think about callbacks, they are surprising to 
me.  If callbacks are strongly-held, then calling 'del foo' on a callable 
object may not make it go away, which can lead to weird and nasty situations.  
Weakly-held callbacks mean that I (as the programmer), know that objects will 
go away after the next garbage collection (see Frank's earlier message), so I 
don't get 'dead' callbacks coming back from the grave to haunt me.

So, what's the consensus on the list, strongly-held callbacks, or weakly-held 
ones?

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Design thought for callbacks

2015-02-21 Thread Cem Karan

On Feb 21, 2015, at 8:15 AM, Chris Angelico ros...@gmail.com wrote:

 On Sun, Feb 22, 2015 at 12:13 AM, Cem Karan cfkar...@gmail.com wrote:
 OK, so it would violate the principle of least surprise for you.  
 Interesting.  Is this a general pattern in python?  That is, callbacks are 
 owned by what they are registered with?
 
 In the end, I want to make a library that offers as few surprises to the 
 user as possible, and no matter how I think about callbacks, they are 
 surprising to me.  If callbacks are strongly-held, then calling 'del foo' on 
 a callable object may not make it go away, which can lead to weird and nasty 
 situations.  Weakly-held callbacks mean that I (as the programmer), know 
 that objects will go away after the next garbage collection (see Frank's 
 earlier message), so I don't get 'dead' callbacks coming back from the grave 
 to haunt me.
 
 So, what's the consensus on the list, strongly-held callbacks, or 
 weakly-held ones?
 
 I don't know about Python specifically, but it's certainly a general
 pattern in other languages. They most definitely are owned, and it's
 the only model that makes sense when you use closures (which won't
 have any other references anywhere).

I agree about closures; it's the only way they could work.  When I was 
originally thinking about the library, I was trying to include all types of 
callbacks, including closures and callable objects.  The callable objects may 
pass themselves, or one of their methods to the library, or may do something 
really weird.  

Although I just realized that closures may cause another problem.  In my code, 
I expect that many different callbacks can be registered for the same event.  
Unregistering means you request to be unregistered for the event. How do you do 
that with a closure?  Aren't they anonymous?

 If you're expecting 'del foo' to destroy the object, then you have a
 bigger problem than callbacks, because that's simply not how Python
 works. You can't _ever_ assume that deleting something from your local
 namespace will destroy the object, because there can always be more
 references. So maybe you need a more clear way of saying I'm done
 with this, get rid of it.

Agreed about 'del', and I don't assume that the object goes away at that point.  
The problem is debugging and determining WHY your object is still around.  I 
know a combination of logging and gc.get_referrers() will probably help you 
figure out why something is still around, but I'm trying to avoid that 
headache.  

I guess the real problem is how this creates cycles in the call graph.  User 
code effectively owns the library code, which via callbacks owns the user code. 
 I have no idea what the best point the cycle is to break it, and not surprise 
someone down the road.  The only idea I have is to redesign the library a 
little, and make anything that accepts a callback actually be a subclass of 
collections.abc.Container, or even collections.abc.MutableSet.  That makes it 
very obvious that the object owns the callback, and that you will need to 
remove your object to unregister it.  The only problem is how to handle 
closures; since they are anonymous, how do you decide which one to remove?

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Design thought for callbacks

2015-02-21 Thread Cem Karan

On Feb 21, 2015, at 12:41 AM, Frank Millman fr...@chagford.com wrote:

 
 Cem Karan cfkar...@gmail.com wrote in message 
 news:33677ae8-b2fa-49f9-9304-c8d937842...@gmail.com...
 Hi all, I'm working on a project that will involve the use of callbacks, 
 and I want to bounce an idea I had off of everyone to make sure I'm not 
 developing a bad idea.  Note that this is for python 3.4 code; I don't 
 need to worry about any version of python earlier than that.
 
 In order to inform users that certain bits of state have changed, I 
 require them to register a callback with my code.  The problem is that 
 when I store these callbacks, it naturally creates a strong reference to 
 the objects, which means that if they are deleted without unregistering 
 themselves first, my code will keep the callbacks alive.  Since this could 
 lead to really weird and nasty situations, I would like to store all the 
 callbacks in a WeakSet 
 (https://docs.python.org/3/library/weakref.html#weakref.WeakSet).  That 
 way, my code isn't the reason why the objects are kept alive, and if they 
 are no longer alive, they are automatically removed from the WeakSet, 
 preventing me from accidentally calling them when they are dead.  My 
 question is simple; is this a good design?  If not, why not?
  Are there any potential 'gotchas' I should be worried about?
 
 
 I tried something similar a while ago, and I did find a gotcha.
 
 The problem lies in this phrase - "if they are no longer alive, they are 
 automatically removed from the WeakSet, preventing me from accidentally 
 calling them when they are dead".
 
 I found that the reference was not removed immediately, but was waiting to 
 be garbage collected. During that window, I could call the callback, which 
 resulted in an error.
 
 There may have been a simple workaround. Perhaps someone else can comment.

THAT would be one heck of a gotcha!  Must have been fun debugging that one!
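
For what it's worth, one defensive pattern is to dereference the weak reference
at call time and skip it once it has died; a minimal sketch, assuming Python
3.4's weakref.WeakMethod and with illustrative names throughout:

```python
import weakref

class Subscriber:
    def __init__(self):
        self.seen = []
    def on_event(self, data):
        self.seen.append(data)

registry = []

def register(bound_method):
    # WeakMethod holds the receiver weakly; a plain weakref to a bound
    # method would die immediately after this call returned.
    registry.append(weakref.WeakMethod(bound_method))

def fire(data):
    for ref in list(registry):
        cb = ref()               # None once the receiver is gone
        if cb is not None:
            cb(data)

sub = Subscriber()
register(sub.on_event)
fire("hello")                    # delivered: sub is still alive

del sub
fire("ignored")                  # dead entry skipped instead of erroring
```

The check-for-None step closes the window Frank describes: even if the dead
entry has not been purged yet, it is never actually called.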

Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Design thought for callbacks

2015-02-21 Thread Cem Karan

On Feb 21, 2015, at 8:37 AM, Mark Lawrence breamore...@yahoo.co.uk wrote:

 On 21/02/2015 05:41, Frank Millman wrote:
 
 Cem Karan cfkar...@gmail.com wrote in message
 news:33677ae8-b2fa-49f9-9304-c8d937842...@gmail.com...
 Hi all, I'm working on a project that will involve the use of callbacks,
 and I want to bounce an idea I had off of everyone to make sure I'm not
 developing a bad idea.  Note that this is for python 3.4 code; I don't
 need to worry about any version of python earlier than that.
 
 In order to inform users that certain bits of state have changed, I
 require them to register a callback with my code.  The problem is that
 when I store these callbacks, it naturally creates a strong reference to
 the objects, which means that if they are deleted without unregistering
 themselves first, my code will keep the callbacks alive.  Since this could
 lead to really weird and nasty situations, I would like to store all the
 callbacks in a WeakSet
 (https://docs.python.org/3/library/weakref.html#weakref.WeakSet).  That
 way, my code isn't the reason why the objects are kept alive, and if they
 are no longer alive, they are automatically removed from the WeakSet,
 preventing me from accidentally calling them when they are dead.  My
 question is simple; is this a good design?  If not, why not?
   Are there any potential 'gotchas' I should be worried about?
 
 
 I tried something similar a while ago, and I did find a gotcha.
 
  The problem lies in this phrase - "if they are no longer alive, they are
  automatically removed from the WeakSet, preventing me from accidentally
  calling them when they are dead".
 
 I found that the reference was not removed immediately, but was waiting to
 be garbage collected. During that window, I could call the callback, which
 resulted in an error.
 
 There may have been a simple workaround. Perhaps someone else can comment.
 
 Frank Millman
 
 
 https://docs.python.org/3/library/gc.html has a collect function.  That seems 
 like a simple workaround, but whether or not it classifies as a good solution 
 I'll leave to others, I'm not qualified to say.


Unfortunately, depending on how many objects you have in your object graph, it 
can slow your code down a fair amount.  I think Frank is right about how a 
WeakSet might be a bad idea in this case.  You really need to know if an object 
is alive or dead, and not some indeterminate state.
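
For reference, the workaround being discussed looks roughly like this (a
sketch; in CPython plain reference counting usually clears the entry on its
own, so the explicit collect mainly matters when the dead object sat in a
reference cycle):

```python
import gc
import weakref

class Handler:
    def __call__(self, msg):
        pass

callbacks = weakref.WeakSet()

h = Handler()
h.cycle = h      # reference cycle: refcounting alone won't reclaim it
callbacks.add(h)
del h            # object lingers in the cycle; the WeakSet entry lingers too

gc.collect()     # cycle collector runs; the stale entry drops out
assert len(callbacks) == 0
```

This also illustrates the cost: a full collect touches the whole object graph,
which is exactly the slowdown mentioned above.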

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Design thought for callbacks

2015-02-20 Thread Cem Karan
Hi all, I'm working on a project that will involve the use of callbacks, and I 
want to bounce an idea I had off of everyone to make sure I'm not developing a 
bad idea.  Note that this is for python 3.4 code; I don't need to worry about 
any version of python earlier than that.

In order to inform users that certain bits of state have changed, I require 
them to register a callback with my code.  The problem is that when I store 
these callbacks, it naturally creates a strong reference to the objects, which 
means that if they are deleted without unregistering themselves first, my code 
will keep the callbacks alive.  Since this could lead to really weird and nasty 
situations, I would like to store all the callbacks in a WeakSet 
(https://docs.python.org/3/library/weakref.html#weakref.WeakSet).  That way, my 
code isn't the reason why the objects are kept alive, and if they are no longer 
alive, they are automatically removed from the WeakSet, preventing me from 
accidentally calling them when they are dead.  My question is simple; is this a 
good design?  If not, why not?  Are there any potential 'gotchas' I should be 
worried about?
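
The behaviour I'm relying on can be sketched like this (CPython-specific:
refcounting reclaims the object as soon as its last strong reference goes
away; other implementations may delay it):

```python
import weakref

class Subscriber:
    """Stand-in for a user object that registers itself for events."""
    def on_event(self, data):
        print("got", data)

callbacks = weakref.WeakSet()

sub = Subscriber()
callbacks.add(sub)           # weak reference only: we don't keep sub alive
assert len(callbacks) == 1

del sub                      # last strong reference gone
assert len(callbacks) == 0   # entry vanished; no dead callback left to call

# One known gotcha: adding a bound method directly never works, because
# the method object itself is discarded immediately after add() returns.
owner = Subscriber()
callbacks.add(owner.on_event)
assert len(callbacks) == 0   # gone even though owner is still alive
```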

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: ANN: unpyc3 - a python bytecode decompiler for Python3

2015-01-29 Thread Cem Karan

On Jan 28, 2015, at 5:02 PM, Chris Angelico ros...@gmail.com wrote:

 On Thu, Jan 29, 2015 at 8:52 AM, Devin Jeanpierre
 jeanpierr...@gmail.com wrote:
 Git doesn't help if you lose your files in between commits, or if you
 lose the entire directory between pushes.
 
 So you commit often and push immediately. Solved.
 
 ChrisA

Just to expand on what Chris is saying, learn to use branches.  I use git flow 
([1][2]), but you don't need it, plain old branches are fine.  Then you can 
have a feature branch like 'Joes_current', or something similar which you and 
only you push/pull from.  Whenever you're done with it, you can merge the 
changes back into whatever you and your group see as the real branch.  That is 
the model I use at work; it works fairly well, and it's saved me once 
already when the laptop I was working on decided to die on me.
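
The workflow above, in plain git (branch and remote names are illustrative;
substitute whatever your group uses for the shared branch):

```shell
git checkout -b Joes_current        # personal feature branch, yours alone
# ...edit files, then checkpoint early and often:
git add -A
git commit -m "WIP: checkpoint"
git push -u origin Joes_current     # pushing immediately = off-machine backup

# when the feature is done, fold it back into the shared branch:
git checkout master                 # or whatever your "real" branch is named
git merge Joes_current
```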

Thanks,
Cem Karan

[1] http://nvie.com/posts/a-successful-git-branching-model/
[2] https://github.com/nvie/gitflow
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Searching through more than one file.

2014-12-29 Thread Cem Karan

On Dec 29, 2014, at 2:47 AM, Rick Johnson rantingrickjohn...@gmail.com wrote:

 On Sunday, December 28, 2014 11:29:48 AM UTC-6, Seymore4Head wrote:
 I need to search through a directory of text files for a string.
 Here is a short program I made in the past to search through a single
 text file for a line of text.
 
 Step1: Search through a single file. 
 # Just a few more brush strokes...
 
 Step2: Search through all files in a directory. 
 # Time to go exploring! 
 
 Step3: Option to filter by file extension. 
 # Waste not, want not!
 
 Step4: Option for recursing down sub-directories. 
 # Look out deeply nested structures, here i come!
 # Look out deeply nested structures, here i come!
 # Look out deeply nested structures, here i come!
 # Look out deeply nested structures, here i come!
 # Look out deeply nested structures, here i come!
 # Look out deeply nested structures, here i come!
 # Look out deeply nested structures, here i come!
 [Opps, fell into a recursive black hole!]
 # Look out deeply nested structures, here i come!
 # Look out deeply nested structures, here i come!
 # Look out deeply nested structures, here i come!
 # Look out deeply nested structures, here i come!
 [BREAK]
 # Whew, no worries, MaximumRecursionError is my best friend! 
 
 ;-)
 
 In addition to the other advice, you might want to check out os.walk()

DEFINITELY use os.walk() if you're going to recurse through a directory tree.  
Here is an untested program I wrote that should do what you want.  Modify as 
needed:


# This is all Python 3 code, although I believe it will run under Python 2
# as well.  

# os.path is documented at https://docs.python.org/3/library/os.path.html
# os.walk is documented at https://docs.python.org/3/library/os.html#os.walk
# logging is documented at https://docs.python.org/3/library/logging.html

import os
import os.path
import logging

# Logging messages can be filtered by level.  If you set the level really
# low, then low-level messages, and all higher-level messages, will be
# logged.  However, if you set the filtering level higher, then low-level
# messages will not be logged.  Debug messages are lower than info messages,
# so if you comment out the first line, and uncomment the second, you will
# only get info messages (right now you're getting both).  If you look
# through the code, you'll see that I go up in levels as I work my way 
# inward through the filters; this makes debugging really, really easy.
# I'll start out with my level high, and if my code works, I'm done. 
# However, if there is a bug, I'll work my way downwards towards lower and
# lower debug levels, which gives me more and more information.  Eventually
# I'll hit a level where I know enough about what is going on that I can 
# fix the problem.  By the way, if you comment out both lines, you shouldn't
# have any logging at all.
logging.basicConfig(level=logging.DEBUG)
##logging.basicConfig(level=logging.INFO)

EXTENSIONS = {".txt"}

def do_something_useful(real_path):
    # I deleted the original message, so I have no idea
    # what you were trying to accomplish, so I'm punting
    # the definition of this function back to you.
    pass

for root, dirs, files in os.walk('/'):
    for f in files:
        # This expands symbolic links, cleans up double slashes, etc.
        # This can be useful when you're trying to debug why something
        # isn't working via logging.
        real_path = os.path.realpath(os.path.join(root, f))
        logging.debug("operating on path '{0!s}'".format(real_path))
        (r, e) = os.path.splitext(real_path)
        if e in EXTENSIONS:
            # If we've made a mistake in our EXTENSIONS set, we might never
            # reach this point.
            logging.info("Selected path '{0!s}'".format(real_path))
            do_something_useful(real_path)


As a note, for the sake of speed and your own sanity, you probably want to do 
the easiest/computationally cheapest filtering first here.  That means 
selecting the files that match your extensions first, and then filtering those 
files by their contents second.

Finally, if you are planning on parsing command-line options, DON'T do it by 
hand!  Use argparse (https://docs.python.org/3/library/argparse.html) instead.
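
A minimal sketch of that recommendation (the option names here are
illustrative, not part of the program above):

```python
import argparse

parser = argparse.ArgumentParser(
    description="Search text files under a directory for a string")
parser.add_argument("pattern", help="string to search for")
parser.add_argument("--ext", default=".txt",
                    help="only look at files with this extension")
parser.add_argument("-r", "--recursive", action="store_true",
                    help="recurse into sub-directories")

# parse_args() with no argument reads sys.argv; passing a list is handy
# for testing.
args = parser.parse_args(["needle", "--ext", ".log", "-r"])
assert args.pattern == "needle"
assert args.ext == ".log"
assert args.recursive
```

You get usage messages, type conversion, and error handling for free, which is
exactly the drudgery hand-rolled parsers get wrong.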

Thanks,
Cem Karan

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: resource based job queue manager

2014-12-20 Thread Cem Karan

On Dec 19, 2014, at 11:53 AM, Parthiban Ramachandran rparthib...@gmail.com 
wrote:

 can someone suggest a resource based job queue manager. for eg i have 3 
 resources and 10 jobs based on the resource busy/free we should start running 
 the jobs. I can write the code but want to know if there is any established 
 scheduler which can run the jobs from different servers too.

Try SCOOP:
https://code.google.com/p/scoop/

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Bug? Feature? setattr(foo, '3', 4) works!

2014-12-20 Thread Cem Karan

On Dec 19, 2014, at 10:33 AM, random...@fastmail.us wrote:

 On Fri, Dec 19, 2014, at 07:23, Ben Finney wrote:
 Cem Karan cfkar...@gmail.com writes:
 I'd like to suggest that getattr(), setattr(), and hasattr() all be
 modified so that syntactically invalid statements raise SyntaxErrors.
 
 What syntactically invalid statements? The only syntactically invalid
 statements I see you presenting are ones that *already* raise
 SyntaxError.
 
 I think you mean that setting an attribute on an object should be a
 SyntaxError if the resulting attribute's name is not a valid identifier.
 But why should a valid statement produce SyntaxError?
 
 I'm −1 on such a change.
 
 And some APIs - ctypes, for example - actually require using getattr
 with an invalid identifier in some cases (where attribute access is used
 for an underlying concept with names that are usually, but not always,
 valid identifiers: in ctypes' case, looking up symbols from DLLs.)

This is the one part I didn't know of; if ctypes requires this behavior, then 
it can't be changed.

Dave Angel, the reason I wanted to raise a SyntaxError is because from a user's 
point of view they look like the same type of error.  That said, you're right 
that for anyone trying to debug the interpreter itself raising SyntaxError 
would make things confusing. 

Regardless, because ctypes requires it, it can't be changed.  I'm dropping the 
suggestion.

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Bug? Feature? setattr(foo, '3', 4) works!

2014-12-19 Thread Cem Karan
I'm bringing this discussion over from the python-ideas mailing list to see 
what people think. I accidentally discovered that the following works, at least 
in Python 3.4.2:

>>> class foo(object):
...     pass
... 
>>> setattr(foo, '3', 4)
>>> dir(foo)
['3', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', 
'__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', 
'__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', 
'__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', 
'__subclasshook__', '__weakref__']
>>> getattr(foo, '3')
4
>>> bar = foo()
>>> dir(bar)
['3', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', 
'__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', 
'__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', 
'__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', 
'__subclasshook__', '__weakref__']
>>> getattr(bar, '3')
4
>>> hasattr(foo, '3')
True
>>> hasattr(bar, '3')
True

However, the following doesn't work:

>>> foo.3
  File "<stdin>", line 1
    foo.3
        ^
SyntaxError: invalid syntax
>>> bar.3
  File "<stdin>", line 1
    bar.3
        ^
SyntaxError: invalid syntax

I'd like to suggest that getattr(), setattr(), and hasattr() all be modified so 
that syntactically invalid statements raise SyntaxErrors. In messages on 
python-ideas, Nick Coghlan mentioned that since a Namespace is just a 
dictionary, the normal error raised would be TypeError and not SyntaxError; I'd 
like to suggest special-casing this so that using getattr(), setattr(), and 
hasattr() in this way raise SyntaxError instead as I think that will be less 
astonishing.  
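
The check itself is easy to express in user code today; a hypothetical helper
using str.isidentifier(), which is essentially the test the proposal asks the
builtins to perform:

```python
def checked_setattr(obj, name, value):
    # Hypothetical wrapper: refuse attribute names that could never be
    # written with plain attribute-access syntax.
    if not isinstance(name, str) or not name.isidentifier():
        raise SyntaxError("invalid attribute name: {0!r}".format(name))
    setattr(obj, name, value)

class Foo:
    pass

checked_setattr(Foo, "x", 1)     # fine: 'x' is a valid identifier
assert Foo.x == 1

try:
    checked_setattr(Foo, "3", 4) # rejected: 'Foo.3' could never be written
except SyntaxError:
    pass
```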

Thoughts?

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


[issue4830] regrtest.py -u largefile test_io fails on OS X 10.5.6

2009-01-04 Thread Cem Karan

New submission from Cem Karan cfkaran2+pyt...@gmail.com:

I'm running OS X 10.5.6 (uname -a == Darwin 9.6.0 Darwin Kernel Version 9.6.0: 
Mon Nov 24 17:37:00 PST 2008; root:xnu-1228.9.59~1/RELEASE_I386 i386).  I get 
the following error after compiling Python 3.0.  Note that I have NOT installed 
it; I'm just trying to run the regression tests on the build.

Python-3.0 cfkaran2$ ./Lib/test/regrtest.py -u largefile test_io
  File "./Lib/test/regrtest.py", line 183
    print(msg, file=sys.stderr)
               ^
SyntaxError: invalid syntax

I suspect that the test runner is not using the newly built Python 3.0, but is 
using whatever is installed on the system, though I have not checked this at all.

--
components: Tests
messages: 79044
nosy: ironsmith
severity: normal
status: open
title: regrtest.py -u largefile test_io fails on OS X 10.5.6
type: crash
versions: Python 3.0

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4830
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com