Feature Requests item #1275677, was opened at 2005-08-29 14:49 Message generated for change (Comment added) made by mwh You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=355470&aid=1275677&group_id=5470
Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Python Interpreter Core Group: None Status: Open Resolution: None Priority: 5 Submitted By: Antoine Pitrou (pitrou) Assigned to: Raymond Hettinger (rhettinger) Summary: add a get() method to sets Initial Comment: Hi, I would like to propose a new method for the builtin set objects. Currently we have a pop() method which pops an element from the set. What I often need, though, is a method that gets an arbitrary element without removing it (because adding / removing stuff is dealt with in another part of the program). Right now the simplest way to do that is : value = iter(my_set).next() There are two problems with this: 1. it's ugly and not very intuitive 2. it is not atomic; this means if another thread updates the set, I can get a "RuntimeError: dictionary changed size during iteration" (btw, the message is slightly wrong, it should be "set" instead of "dictionary") Although the first problem is rather minor (but annoying nevertheless), the second one is a real showstopper in some cases - yes, I did encounter it for real. There is a way to avoid the second problem : value = list(my_set)[0] But of course, not only it is still ugly, but it is also highly inefficient when the set is big. So in the end I am forced to use an explicit lock around the aforementioned construct (value = iter(my_set).next()) as well as around any other piece of code that can update the set from another thread. This is tedious and error-prone, and probably a bit inefficient. So for the bottom line: it would be in some cases very useful to have an atomic get() method in addition to the pop() method on sets. (it could probably be applied to frozensets and dicts too) The implementation would probably not be very difficult, since it's the same as pop() with the removal feature removed. ;) But I'm not familiar with the Python internals. What do you think ? Regards Antoine. ---------------------------------------------------------------------- >Comment By: Michael Hudson (mwh) Date: 2005-08-30 08:03 Message: Logged In: YES user_id=6656 Mmm, the "v, = s" approach hadn't occurred to me and it seems ok. The others are all rather horribly indirect... (and, obviously, it's not about "need"). ---------------------------------------------------------------------- Comment By: Antoine Pitrou (pitrou) Date: 2005-08-30 00:09 Message: Logged In: YES user_id=133955 Hi again, > Antoine, yes a pop()/add() combination is efficient. Ok, thanks. > IMO, > it also clear in intent, easy to write, flexible enough for > a variety of applications, the pop() is atomic, Small correction: the pop() is atomic, but the pop/add sequence is not, AFAIU ;) > Question for Antoine: Have you ever had this need with > other containers? I think I had it for a dict sometime. For lists and tuples, you can just use my_container[0] of course. But sets are often used differently than dicts IMHO. Dicts are mappings: you give a precise key and get the exact value associated with it. Sets are bags: sometimes you just want to pick an item, as you would do in a real bag, without looking inside to choose a specific item. > Since choose() would not be a > mutating method, it would also apply to frozensets. Does it > have any use there? Any appearance in a loop would be farce > since it would always return the same value (the frozenset > never mutates). The variable to which you apply the method call could reference another frozenset on the next loop iteration... Yes, it doesn't sound very frequent. > Question for everyone: Is there any known application for > choose() that isn't met by pop()/add() irrespective of > whether it "feels right"? I don't think so indeed. (it would be the case if the API guaranteed atomicity in the case of single bultin method calls like choose()) > For applications other than Michael's, we won't know the > size of the set in advance. Are there any examples of using > choose() that won't have to have ugly EAFP or LBYL code to > handle the case where the set is empty? First, sometimes you know the set is non-empty without knowing its size in advance (for example you are inside a big block beginning with "if len(my_set)"). Second, error catching is still needed with other alternatives (you either have to catch KeyError when doing s.pop(), or StopIteration when doing iter(s).next()). > Rather than a method just for sets, is it a more appropriate > solution to have a generic function that takes any iterable > and returns its first element: Well, I thought global builtin functions were less favoured than methods. But this doesn't sound stupid. On the other hand, generic functions are less easy to find about (usually if I want to have the set API, I type help(set) in a Python shell which I always have open. But there is no quick way to have a list of the builtin functions that can apply to iterables (*)). In my experience, I often forget about the generic builtin functions that come with Python - maybe that's just me. (*) if there was a base type "iterable" with the generic method first() defined on it, as well as some of the other functions defined in the itertools module, it would probably be more elegant and more frequently used. Anyway, if the concern with builtin types is to avoid bloating the API, then I admit that it's a fair concern and that the motivation to add the choose() method might indeed not be convincing enough. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2005-08-29 23:30 Message: Logged In: YES user_id=80475 Exploratory questions: Michael, if you know there is only one element in a set, do you really need a custom method just to extract it? There are so many other ways. Given s=set([7]): 1) x = list(s)[0] 2) x = s.pop() 3) x, = s 4) x = max(s) yada yada yada Antoine, yes a pop()/add() combination is efficient. IMO, it also clear in intent, easy to write, flexible enough for a variety of applications, the pop() is atomic, and the approach also works with other mutable containers (dicts, lists, deques). Question for Antoine: Have you ever had this need with other containers? For instance, how do you find an arbitrary key in a dictionary without popping, iterating, or listing it? Question for everyone: Since choose() would not be a mutating method, it would also apply to frozensets. Does it have any use there? Any appearance in a loop would be farce since it would always return the same value (the frozenset never mutates). Question for everyone: Is there any known application for choose() that isn't met by pop()/add() irrespective of whether it "feels right"? For applications other than Michael's, we won't know the size of the set in advance. Are there any examples of using choose() that won't have to have ugly EAFP or LBYL code to handle the case where the set is empty? Rather than a method just for sets, is it a more appropriate solution to have a generic function that takes any iterable and returns its first element: def getfirst(it): for elem in it: return elem raise ValueError('iterator is empty') A function like this would work with almost anything: first(dict(a=1, b=2)) first(set([1, 2])) first([1,2]) first(open('myfile.txt')) . . . ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2005-08-29 21:42 Message: Logged In: YES user_id=6656 _My_ use case for something like this is applying a series of constraints as set intersections until the set has one element; then I want to know what that element *is*. I could probably use .pop(), but it feels wrong, and I know I can (and indeed do) do iter(theSet).next() but that's obscure. ---------------------------------------------------------------------- Comment By: Antoine Pitrou (pitrou) Date: 2005-08-29 20:16 Message: Logged In: YES user_id=133955 > When choosing a ready object, you pop it to unready > because you're using it -- put it back in if the current use > won't cause blocking, or when that use finishes. That's not exactly my semantics (objects remain ready until they explicitly tell the contrary: for example a queue remains ready until it becomes empty), but I can live with a pop() / add() sequence provided it is efficient. Is it ? Otherwise I may go for "elem = iter(my_set).next()". Thanks for the very prompt answers, btw :) ---------------------------------------------------------------------- Comment By: Jim Jewett (jimjjewett) Date: 2005-08-29 20:07 Message: Logged In: YES user_id=764593 This does look like pop might be a better choice. When choosing a ready object, you pop it to unready because you're using it -- put it back in if the current use won't cause blocking, or when that use finishes. When choosing a waiting ready thread, either the thread is no longer ready (so put it back in waiting, but you don't want it in ready), or it runs (so it should no longer be in waiting). ---------------------------------------------------------------------- Comment By: Antoine Pitrou (pitrou) Date: 2005-08-29 19:31 Message: Logged In: YES user_id=133955 Hi, Thanks for the detailed reply. So, atomicity cannot be guaranteed. I understand that (you might tell it to the Twisted folks by the way, because as far as I've seen some of their code relies on list operations being atomic in CPython ;-)). Remains the simplicity argument. As for the first objection: my set is mutated in the loop in ways that I cannot predict (because each element in the set points me in turn to a user-defined callback that will often alter the set ;-)). That explains why it *is* useful to get the "first" element repeatedly: the "first" element changes very often. As for the use case : I'm writing a small cooperative multithread package using generators (mostly for fun, but I'll be glad if it pleases others too): https://developer.berlios.de/projects/tasklets/ Scheduling is based on "wait objects": when a wait object becomes ready, it is added to the set of ready objects and the main loop takes an element from this set and asks it for one of the threads waiting on the object. It is the set I'm talking about ;) Sometimes the readiness of one of those objects can be changed from another thread (for now I'm using a helper thread for timers, and perhaps also for other things in the future - IO, wxWidgets integration, etc.). The main loop is in the Switcher.run() method towards the end of the following file: http://svn.berlios.de/viewcvs/tasklets/trunk/softlets/core.py?view=markup As you see, I actually do a "for" on the set, but I always break of the "for" loop after the first iteration... Which is not very elegant and understandable for the reader. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2005-08-29 18:10 Message: Logged In: YES user_id=80475 We've looked at a choose() method a couple of times and rejected it. Since the method gets an arbitrary element, it might as well get the first one it sees. But that is of little use in a loop (when do you ever need to get the same object over and over again?). Also, a choose method() would need to raise a KeyError if the set if emtpy and/or provide a default argument like dict.get() does. This complicates the heck out of using it. Put the two together and the idea loses any charm. Also, for mutable sets, a better approach is to pop() elements from the set and add them back when you're done with them. I'm not too concerned about atomicity. For one, that is almost never a good guide to API design. Second, it is implementation dependent (i.e. no guarantees for PyPy or Jython). Three, it generally indicates a problem in your design (if the set could mutate smaller during a thread accessing the set, then you have a race condition where the set could shrink to zero or not). Four, the right way to deal with atomicity issues is to use locks or control access via Queue. I do understand that basic motivation (I have a set, now how do I a representative element) but find the proposal lacking. It just doesn't do much for us. BTW, please post your use case (in a condensed form that gets to the essentials of why you think this method is needed).. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=355470&aid=1275677&group_id=5470 _______________________________________________ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com