Handling 2.7 and 3.0 Versions of Dict

2011-08-30 Thread Travis Parks
I am writing a simple algorithms library that I want to work for both
Python 2.7 and 3.x. I am writing some functions like distinct, which
work with dictionaries under the hood. The problem I ran into is that
I am calling itervalues or values depending on which version of the
language I am working in. Here is the code I wrote to overcome it:

import sys
def getDictValuesFoo():
    if sys.version_info < (3,):
        return dict.itervalues
    else:
        return dict.values

getValues = getDictValuesFoo()

def distinct(iterable, keySelector = (lambda x: x)):
    lookup = {}
    for item in iterable:
        key = keySelector(item)
        if key not in lookup:
            lookup[key] = item
    return getValues(lookup)

I was surprised to learn that getValues CANNOT be called as if it were
a member of dict. I figured it was more efficient to determine what
getValues was once rather than every time it was needed.

First, how can I make the method getValues "private" _and_ so it only
gets evaluated once? Secondly, will the body of the distinct method be
evaluated immediately? How can I delay building the dict until the
first value is requested?

I noticed that hashing is a lot different in Python than it is in .NET
languages. .NET supports custom "equality comparers" that can override
a type's Equals and GetHashCode functions. This is nice when you can't
change the class you are hashing. That is why I am using a key
selector in my code, here. Is there a better way of overriding the
default hashing of a type without actually modifying its definition? I
figured requesting a key was the easiest way.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Handling 2.7 and 3.0 Versions of Dict

2011-08-30 Thread Terry Reedy

On 8/30/2011 9:43 PM, Travis Parks wrote:

> I am writing a simple algorithms library that I want to work for both
> Python 2.7 and 3.x. I am writing some functions like distinct, which
> work with dictionaries under the hood. The problem I ran into is that
> I am calling itervalues or values depending on which version of the
> language I am working in. Here is the code I wrote to overcome it:
>
> import sys
> def getDictValuesFoo():
>     if sys.version_info < (3,):
>         return dict.itervalues
>     else:
>         return dict.values


One alternative is to use itervalues and have 2to3 translate for you.
--
Terry Jan Reedy



Re: Handling 2.7 and 3.0 Versions of Dict

2011-08-31 Thread Martin v. Loewis
On 31.08.2011 03:43, Travis Parks wrote:
> I am writing a simple algorithms library that I want to work for both
> Python 2.7 and 3.x. I am writing some functions like distinct, which
> work with dictionaries under the hood. The problem I ran into is that
> I am calling itervalues or values depending on which version of the
> language I am working in. Here is the code I wrote to overcome it:
> 
> import sys
> def getDictValuesFoo():
>     if sys.version_info < (3,):
>         return dict.itervalues
>     else:
>         return dict.values
>
> getValues = getDictValuesFoo()
>
> def distinct(iterable, keySelector = (lambda x: x)):
>     lookup = {}
>     for item in iterable:
>         key = keySelector(item)
>         if key not in lookup:
>             lookup[key] = item
>     return getValues(lookup)
> 
> I was surprised to learn that getValues CANNOT be called as if it were
> a member of dict. I figured it was more efficient to determine what
> getValues was once rather than every time it was needed.
> 
> First, how can I make the method getValues "private" _and_ so it only
> gets evaluated once?

Not sure what "private" means here. Having the logic selected only once
goes like this:

if sys.version_info < (3,):
    def getDictValues(dict):
        return dict.itervalues()
else:
    def getDictValues(dict):
        return dict.values()

> Secondly, will the body of the distinct method be
> evaluated immediately?

Yes.

> How can I delay building the dict until the first value is requested?

Make it a generator:

def distinct(iterable, keySelector = (lambda x: x)):
    lookup = {}
    for item in iterable:
        key = keySelector(item)
        if key not in lookup:
            lookup[key] = item
    for v in getValues(lookup):
        yield v

This delays *building* the dictionary until the *first* value is
requested. I.e. it completes building the dictionary before the first
value is returned.
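
One way to observe this behaviour, as a rough sketch only: the `tracked` helper and the `log` list are hypothetical names added for illustration, and `lookup.values()` stands in for `getValues(lookup)` so that the snippet runs on Python 3 without the version check.

```python
def distinct(iterable, keySelector=(lambda x: x)):
    lookup = {}
    for item in iterable:
        key = keySelector(item)
        if key not in lookup:
            lookup[key] = item
    for v in lookup.values():   # stand-in for getValues(lookup)
        yield v

def tracked(items, log):
    # Hypothetical helper: records each item as it is pulled from the input.
    for item in items:
        log.append(item)
        yield item

log = []
result = distinct(tracked([1, 2, 2, 3], log))
consumed_before = list(log)   # calling distinct() has not touched the input yet
first = next(result)
consumed_after = list(log)    # the first next() consumed the whole input
```

Nothing is read from the input until `next(result)` is called, and by the time the first value comes back the entire input has been consumed.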

If you also want to interleave iteration over iterable with fetching
distinct values, write it like this:

def distinct(iterable, keySelector = (lambda x: x)):
    seen = {}
    for item in iterable:
        key = keySelector(item)
        if key not in seen:
            yield item
            seen[key] = item
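
A practical consequence of interleaving, shown as a small sketch: this version can be used on an endless input, as long as the caller stops asking for values. The names `endless` and `first_three` and the sample data are assumptions made for the example.

```python
import itertools

def distinct(iterable, keySelector=(lambda x: x)):
    seen = {}
    for item in iterable:
        key = keySelector(item)
        if key not in seen:
            yield item
            seen[key] = item

# An input that never ends; only three distinct values ever occur in it.
endless = itertools.cycle([3, 1, 3, 2, 1])

# islice stops after three results, so the generator is never asked
# to search for a fourth distinct value (which would loop forever).
first_three = list(itertools.islice(distinct(endless), 3))
```

The dictionary-then-iterate version would never return here, because it has to exhaust the input before yielding anything.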

> I noticed that hashing is a lot different in Python than it is in .NET
> languages. .NET supports custom "equality comparers" that can override
> a type's Equals and GetHashCode functions. This is nice when you can't
> change the class you are hashing. That is why I am using a key
> selector in my code, here. Is there a better way of overriding the
> default hashing of a type without actually modifying its definition? I
> figured requesting a key was the easiest way.

You could provide a Key class that takes a value, a hash function and an
equality function:

class Key:
    def __init__(self, value, hash, eq):
        self.value, self.hash, self.eq = value, hash, eq
    def __hash__(self):
        return self.hash(self.value)
    def __eq__(self, other_key):
        return self.eq(self.value, other_key.value)

This class would then be used instead of your keySelector.

With that, you could change the dictionary to a set. Actually, you
could already do so in the second generator version:

def distinct(iterable, keySelector = (lambda x: x)):
    seen = set()
    for item in iterable:
        key = keySelector(item)
        if key not in seen:
            yield item
            seen.add(key) # item is not needed anymore
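
As a sketch of how the Key wrapper and the set-based generator could fit together: the case-insensitive comparison, the helper names `ci_hash` and `ci_eq`, and the sample words are all assumptions chosen for illustration.

```python
class Key:
    def __init__(self, value, hash, eq):
        self.value, self.hash, self.eq = value, hash, eq
    def __hash__(self):
        return self.hash(self.value)
    def __eq__(self, other_key):
        return self.eq(self.value, other_key.value)

def distinct(iterable, keySelector=(lambda x: x)):
    seen = set()
    for item in iterable:
        key = keySelector(item)
        if key not in seen:
            yield item
            seen.add(key)

def ci_hash(s):
    # Hypothetical hash function: ignore case when hashing.
    return hash(s.lower())

def ci_eq(a, b):
    # Hypothetical equality function: ignore case when comparing.
    return a.lower() == b.lower()

words = ["Spam", "spam", "Eggs", "EGGS", "ham"]
unique = list(distinct(words, lambda w: Key(w, ci_hash, ci_eq)))
```

The str class itself is untouched; only the wrapper decides which strings count as equal, and the first spelling seen is the one that gets yielded.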

HTH,
Martin


Re: Handling 2.7 and 3.0 Versions of Dict

2011-08-31 Thread Ian Kelly
On Wed, Aug 31, 2011 at 3:55 AM, Martin v. Loewis wrote:
> if sys.version_info < (3,):
>     def getDictValues(dict):
>         return dict.itervalues()
> else:
>     def getDictValues(dict):
>         return dict.values()

The extra level of function call indirection is unnecessary here.
Better to write it as:

if sys.version_info < (3,):
    getDictValues = dict.itervalues
else:
    getDictValues = dict.values

(which is basically what the OP was doing in the first place).
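
A variation that nobody in the thread proposed, offered only as a sketch: instead of checking the version number, ask dict whether it has itervalues and fall back to values otherwise. The name `getValues` follows the original post; the sample dictionary is made up.

```python
# Feature detection instead of a version check: dict.itervalues exists
# on Python 2 only, while dict.values exists on both, so it is a safe default.
getValues = getattr(dict, "itervalues", dict.values)

d = {1: 'a', 2: 'b'}

# The looked-up method is unbound, so the dict is passed as the argument.
values = sorted(getValues(d))
```

Like the if/else form, this runs once at import time, and `getValues(d)` is an ordinary call with the dictionary as its argument rather than `d.getValues()`.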

>> I noticed that hashing is a lot different in Python than it is in .NET
>> languages. .NET supports custom "equality comparers" that can override
>> a type's Equals and GetHashCode functions. This is nice when you can't
>> change the class you are hashing. That is why I am using a key
>> selector in my code, here. Is there a better way of overriding the
>> default hashing of a type without actually modifying its definition? I
>> figured requesting a key was the easiest way.
>
> You could provide a Key class that takes a value, a hash function and an
> equality function:
>
> class Key:
>     def __init__(self, value, hash, eq):
>         self.value, self.hash, self.eq = value, hash, eq
>     def __hash__(self):
>         return self.hash(self.value)
>     def __eq__(self, other_key):
>         return self.eq(self.value, other_key.value)
>
> This class would then be used instead of your keySelector.

For added value, you can make it a class factory so you don't have to
specify hash and eq over and over:

def Key(keyfunc):
    class Key:
        def __init__(self, value):
            self.value = value
        def __hash__(self):
            return hash(keyfunc(self.value))
        def __eq__(self, other):
            # compare the wrapped values, not the wrapper objects
            return keyfunc(self.value) == keyfunc(other.value)
    return Key

KeyTypeAlpha = Key(lambda x: x % 7)

items = set(KeyTypeAlpha(value) for value in sourceIterable)
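
A self-contained sketch of the factory in use. Two things here are assumptions rather than part of the message above: `sourceIterable` is given the sample value `range(20)`, and `__eq__` applies `keyfunc` to the wrapped `.value` of each side, since applying it to the wrapper objects themselves would fail for a key function such as `x % 7`.

```python
def Key(keyfunc):
    class Key(object):
        def __init__(self, value):
            self.value = value
        def __hash__(self):
            return hash(keyfunc(self.value))
        def __eq__(self, other):
            # Compare the key of each wrapped value.
            return keyfunc(self.value) == keyfunc(other.value)
    return Key

KeyTypeAlpha = Key(lambda x: x % 7)

sourceIterable = range(20)   # assumed sample input
items = set(KeyTypeAlpha(value) for value in sourceIterable)

# Unwrap the survivors: one representative per remainder modulo 7.
representatives = sorted(k.value for k in items)
```

A set keeps the element it already holds when an equal one is added, so each remainder class is represented by the first value that produced it.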

Cheers,
Ian


Re: Handling 2.7 and 3.0 Versions of Dict

2011-08-31 Thread Gregory Ewing

Ian Kelly wrote:
> if sys.version_info < (3,):
>     getDictValues = dict.itervalues
> else:
>     getDictValues = dict.values
>
> (which is basically what the OP was doing in the first place).


And which he seemed to think didn't work for some
reason, but it seems fine as far as I can tell:

Python 2.7 (r27:82500, Oct 15 2010, 21:14:33)
[GCC 4.2.1 (Apple Inc. build 5664)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> gv = dict.itervalues
>>> d = {1:'a', 2:'b'}
>>> gv(d)


% python3.1
Python 3.1.2 (r312:79147, Mar  2 2011, 17:43:12)
[GCC 4.2.1 (Apple Inc. build 5664)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> gv = dict.values
>>> d = {1:'a', 2:'b'}
>>> gv(d)
dict_values(['a', 'b'])

--
Greg


Re: Handling 2.7 and 3.0 Versions of Dict

2011-08-31 Thread Travis Parks
On Aug 31, 7:37 pm, Gregory Ewing wrote:
> Ian Kelly wrote:
> > if sys.version_info < (3,):
> >     getDictValues = dict.itervalues
> > else:
> >     getDictValues = dict.values
>
> > (which is basically what the OP was doing in the first place).
>
> And which he seemed to think didn't work for some
> reason, but it seems fine as far as I can tell:
>
> Python 2.7 (r27:82500, Oct 15 2010, 21:14:33)
> [GCC 4.2.1 (Apple Inc. build 5664)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
>  >>> gv = dict.itervalues
>  >>> d = {1:'a', 2:'b'}
>  >>> gv(d)
> 
>
> % python3.1
> Python 3.1.2 (r312:79147, Mar  2 2011, 17:43:12)
> [GCC 4.2.1 (Apple Inc. build 5664)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
>  >>> gv = dict.values
>  >>> d = {1:'a', 2:'b'}
>  >>> gv(d)
> dict_values(['a', 'b'])
>
> --
> Greg

My problem was that I didn't understand the scoping rules. It still
seems strange to me that the getValues variable remains in scope outside
the if/else branches.


Re: Handling 2.7 and 3.0 Versions of Dict

2011-09-02 Thread Gabriel Genellina
On Wed, 31 Aug 2011 22:28:09 -0300, Travis Parks wrote:

> On Aug 31, 7:37 pm, Gregory Ewing wrote:
>> Ian Kelly wrote:
>> > if sys.version_info < (3,):
>> >     getDictValues = dict.itervalues
>> > else:
>> >     getDictValues = dict.values
>>
>> > (which is basically what the OP was doing in the first place).
>
> My problem was that I didn't understand the scoping rules. It is still
> strange to me that the getValues variable is still in scope outside
> the if/else branches.


Those if/else are at global scope. An 'if' statement does not introduce a  
new scope; so getDictValues, despite being "indented", is defined at  
global scope, and may be used anywhere in the module.
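
A minimal sketch of that rule; the names `branch`, `i` and `last` are made up for the example.

```python
import sys

# Assignments inside an if/else (or any other block statement at module
# level) bind ordinary module-level names.
if sys.version_info < (3,):
    branch = "python2"
else:
    branch = "python3"

# The same holds for loops: the loop variable is still bound afterwards.
for i in range(3):
    pass
last = i

# Both names are usable here, outside the indented blocks.
summary = (branch, last)
```

Indentation groups statements under the if or for; it does not open a new namespace the way braces do in some other languages.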


--
Gabriel Genellina



Re: Handling 2.7 and 3.0 Versions of Dict

2011-09-02 Thread Travis Parks
On Sep 2, 12:36 pm, "Gabriel Genellina" wrote:
> On Wed, 31 Aug 2011 22:28:09 -0300, Travis Parks wrote:
>
> > On Aug 31, 7:37 pm, Gregory Ewing wrote:
> >> Ian Kelly wrote:
> >> > if sys.version_info < (3,):
> >> >     getDictValues = dict.itervalues
> >> > else:
> >> >     getDictValues = dict.values
>
> >> > (which is basically what the OP was doing in the first place).
>
> > My problem was that I didn't understand the scoping rules. It is still
> > strange to me that the getValues variable is still in scope outside
> > the if/else branches.
>
> Those if/else are at global scope. An 'if' statement does not introduce a  
> new scope; so getDictValues, despite being "indented", is defined at  
> global scope, and may be used anywhere in the module.
>
> --
> Gabriel Genellina
>
>

Does that mean the rules would be different inside a function?


Re: Handling 2.7 and 3.0 Versions of Dict

2011-09-02 Thread Terry Reedy

On 9/2/2011 12:53 PM, Travis Parks wrote:
> On Sep 2, 12:36 pm, "Gabriel Genellina" wrote:
>> Those if/else are at global scope. An 'if' statement does not introduce a
>> new scope; so getDictValues, despite being "indented", is defined at
>> global scope, and may be used anywhere in the module.
>
> Does that mean the rules would be different inside a function?


Yes. Inside a function, you would have to add
    global getDictValues
before the if statement in order for the assignments to have global effect.
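
A sketch of the difference, assuming two hypothetical functions `pick_locally` and `pick_globally` that contain the same if/else:

```python
import sys

def pick_locally():
    # Plain assignment inside a function binds a local name only.
    if sys.version_info < (3,):
        getDictValues = dict.itervalues
    else:
        getDictValues = dict.values
    return getDictValues

def pick_globally():
    # The global statement makes the assignments below bind the
    # module-level name instead of a local one.
    global getDictValues
    if sys.version_info < (3,):
        getDictValues = dict.itervalues
    else:
        getDictValues = dict.values

pick_locally()
defined_after_local = "getDictValues" in globals()
pick_globally()
defined_after_global = "getDictValues" in globals()
```

Only after `pick_globally()` has run does the module have a `getDictValues` of its own.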

--
Terry Jan Reedy



Re: Handling 2.7 and 3.0 Versions of Dict

2011-09-02 Thread Gabriel Genellina
On Fri, 02 Sep 2011 13:53:37 -0300, Travis Parks wrote:

> On Sep 2, 12:36 pm, "Gabriel Genellina" wrote:
>> On Wed, 31 Aug 2011 22:28:09 -0300, Travis Parks wrote:
>>
>> > On Aug 31, 7:37 pm, Gregory Ewing wrote:
>> >> Ian Kelly wrote:
>> >> > if sys.version_info < (3,):
>> >> >     getDictValues = dict.itervalues
>> >> > else:
>> >> >     getDictValues = dict.values
>> >>
>> >> > (which is basically what the OP was doing in the first place).
>> >
>> > My problem was that I didn't understand the scoping rules. It is still
>> > strange to me that the getValues variable is still in scope outside
>> > the if/else branches.
>>
>> Those if/else are at global scope. An 'if' statement does not introduce
>> a new scope; so getDictValues, despite being "indented", is defined at
>> global scope, and may be used anywhere in the module.
>
> Does that mean the rules would be different inside a function?


Yes: a function body *does* create a new scope, as does a class
statement. See

http://docs.python.org/reference/executionmodel.html
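
A small sketch of both cases; the names `configure`, `Config` and `setting` are hypothetical.

```python
def configure():
    # Bound in the function's own scope; gone once the call returns.
    setting = "local to configure"
    return setting

class Config:
    # Bound in the class namespace; reachable as Config.setting.
    setting = "class attribute"

returned = configure()
class_value = Config.setting

# Neither assignment created a module-level name called "setting".
module_has_setting = "setting" in globals()
```

The function hands its value back through the return statement and the class exposes its value as an attribute, but neither one leaks the name into the module.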

--
Gabriel Genellina
