Re: [Tutor] reducing lists within list to their set of unique values

2013-05-21 Thread Oscar Benjamin
On 21 May 2013 14:31, Treder, Robert  wrote:
> Steven wrote:
>>
>> py> L = ['b', 'd', 'c', 'a', 'b']
>> py> list(set(L))
>> ['a', 'c', 'b', 'd']
>>
>>
>> If keeping the order is important, you cannot use set, and you'll need 
>> another way to extract only the unique values. Ask if you need help on
>> that.
>
> Thanks, Steven. Very helpful. It looks like the order is only changed on the
> inner list that set() is applied to, not on the outer list, since the outer
> list order is controlled by index. For this application I don't care about
> the order of the inner lists. However, there are other applications where
> that will be important. Can you please describe the alternate method for
> extracting the unique values that maintains order?

There isn't necessarily a uniquely defined ordering. Here's a function
that preserves the order of the first occurrences of each element in
each list:

def uniquify(original):
    new = []
    seen = set()
    for item in original:
        if item not in seen:
            new.append(item)
            seen.add(item)
    return new

>>> uniquify([1, 2, 3, 1, 2, 5])
[1, 2, 3, 5]
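
As a rough sketch of how this could be applied to the list of lists from the
original question: the function below is the same uniquify as above, while
uniquify_od is a hypothetical alternative built on collections.OrderedDict
(available from Python 2.7), and the last inner list in tst is an extra entry
added here only to make the ordering visible.

```python
from collections import OrderedDict


def uniquify(original):
    """Keep only the first occurrence of each item, in original order."""
    new = []
    seen = set()
    for item in original:
        if item not in seen:
            new.append(item)
            seen.add(item)
    return new


def uniquify_od(original):
    """Hypothetical alternative: OrderedDict keys remember insertion order."""
    return list(OrderedDict.fromkeys(original))


# The last inner list is an assumption, added to show that order is kept.
tst = [[], ['test'], ['t1', 't2'], ['t1', 't1', 't2'], ['t2', 't1', 't2']]

# Empty inner lists become '', the rest are reduced to their unique values.
result = [uniquify(inner) if inner else '' for inner in tst]
print(result)
# ['', ['test'], ['t1', 't2'], ['t1', 't2'], ['t2', 't1']]
```

Both versions need the items to be hashable, which holds for the strings used
here.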


Oscar
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] reducing lists within list to their set of unique values

2013-05-21 Thread Treder, Robert
> Message: 6
> Date: Tue, 21 May 2013 09:45:17 +1000
> From: Steven D'Aprano 
> To: tutor@python.org
> Subject: Re: [Tutor] reducing lists within list to their set of unique
>   values
> Message-ID: <519ab58d.9020...@pearwood.info>
> Content-Type: text/plain; charset=UTF-8; format=flowed
>
> On 21/05/13 08:49, Treder, Robert wrote:
>> Hi python folks,
>>
>> I have a list of lists that looks something like this:
>>
>> tst = [ [], ['test'], ['t1', 't2'], ['t1', 't1', 't2'] ]
>>
>> I want to change the empty sets to a blank string, i.e., '' and the lists
>> with repeat values to the unique set of values. So I have done the
>> following:
>>
>> >>> for t in tst:
>>         if len(t) == 0:
>>             tst.__setitem__(tst.index(t), '')
>>         else:
>>             tst.__setitem__(tst.index(t), set(t))
>
>
> As a general rule, if you are writing double-underscore special methods like 
> __setitem__ directly, you're doing it wrong. (There are
> exceptions, but consider them "for experts".)
>
> So instead of tst.__setitem__(a, b) you should write tst[a] = b.
>
> But that's still the wrong way to do this! You're doing a lot of extra work
> with the calls to tst.index. You won't notice for a short list like the
> example above, but for a long list, this will get really, really slow.
>
> The way to do this is to keep track of the index as you walk over the list, 
> and not recalculate it by searching the list:
>
>
> for index, item in enumerate(tst):
>     if item == []:
>         item = ""
>     else:
>         item = list(set(item))
>     tst[index] = item
>
>
> Notice that I call set() to get the unique values, then list() again to turn 
> it back into a list. This does the job you want, but it is not 
> guaranteed to keep the order:
>
> py> L = ['b', 'd', 'c', 'a', 'b']
> py> list(set(L))
> ['a', 'c', 'b', 'd']
>
>
> If keeping the order is important, you cannot use set, and you'll need 
> another way to extract only the unique values. Ask if you need help on 
> that.

Thanks, Steven. Very helpful. It looks like the order is only changed on the
inner list that set() is applied to, not on the outer list, since the outer
list order is controlled by index. For this application I don't care about the
order of the inner lists. However, there are other applications where that
will be important. Can you please describe the alternate method for extracting
the unique values that maintains order?

Thanks, 
Bob

>
>
>
>> What I get in return is
>>
>> >>> tst
>> ['', set(['test']), set(['t2', 't1']), set(['t2', 't1'])]
>>
>> The empty list is fine but the other lists seem to be expressions rather
>> than values. What do I need to do to simply get the values back
>> like the following?
>>
>> ['', ['test'], ['t2', 't1'], ['t2', 't1']]
>
>
> They are values. It is just that they are *sets* rather than *lists*. When 
> printed, lists have a nice compact representation using square 
> brackets [], but unfortunately sets do not. However, if you upgrade to Python 
> 3, they have been upgraded to look a little nicer:
>
>
> # Python 2:
> set(['a', 'c', 'b', 'd'])
> 
> # Python 3
> {'d', 'b', 'c', 'a'}
>
>
> Notice that the order of the items is not guaranteed, but apart from that, 
> the two versions are the same despite the difference in print 
> representation.
>
> -- 
> Steven






NOTICE: Morgan Stanley is not acting as a municipal advisor and the opinions or 
views contained herein are not intended to be, and do not constitute, advice 
within the meaning of Section 975 of the Dodd-Frank Wall Street Reform and 
Consumer Protection Act. If you have received this communication in error, 
please destroy all electronic and paper copies and notify the sender 
immediately. Mistransmission is not intended to waive confidentiality or 
privilege. Morgan Stanley reserves the right, to the extent permitted under 
applicable law, to monitor electronic communications. This message is subject 
to terms available at the following link: 
http://www.morganstanley.com/disclaimers. If you cannot access these links, 
please notify us by reply message and we will send the contents to you. By 
messaging with Morgan Stanley you consent to the foregoing.


Re: [Tutor] reducing lists within list to their set of unique values

2013-05-20 Thread Steven D'Aprano

On 21/05/13 08:49, Treder, Robert wrote:

> Hi python folks,
>
> I have a list of lists that looks something like this:
>
> tst = [ [], ['test'], ['t1', 't2'], ['t1', 't1', 't2'] ]
>
> I want to change the empty sets to a blank string, i.e., '' and the lists with
> repeat values to the unique set of values. So I have done the following:
>
> >>> for t in tst:
>         if len(t) == 0:
>             tst.__setitem__(tst.index(t), '')
>         else:
>             tst.__setitem__(tst.index(t), set(t))



As a general rule, if you are writing double-underscore special methods like __setitem__ 
directly, you're doing it wrong. (There are exceptions, but consider them "for 
experts".)

So instead of tst.__setitem__(a, b) you should write tst[a] = b.

But that's still the wrong way to do this! You're doing a lot of extra work 
with the calls to tst.index. You won't notice for a short list like the example 
above, but for a long list, this will get really, really slow.
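
To make the cost concrete, here is a small sketch of what list.index does. It
scans from the front and returns the position of the first element equal to
its argument, so every call repeats a search that the loop has already done.
The list named data is made up for illustration.

```python
data = [['a'], ['b'], ['a']]

# list.index searches from the front and returns the first equal element,
# so looking up the third item reports position 0, not 2.
first_match = data.index(data[2])

# enumerate hands out each position directly, with no searching.
positions = [i for i, item in enumerate(data) if item == ['a']]

print(first_match)  # 0
print(positions)    # [0, 2]
```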

The way to do this is to keep track of the index as you walk over the list, and 
not recalculate it by searching the list:


for index, item in enumerate(tst):
    if item == []:
        item = ""
    else:
        item = list(set(item))
    tst[index] = item
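
If you would rather build a new list than modify tst in place, the same logic
can be sketched as a list comprehension. The function name dedupe_inner is
only an illustration, not something from the standard library.

```python
def dedupe_inner(lists):
    """Return a new list: empty lists become '', others lose duplicates."""
    return [list(set(item)) if item else "" for item in lists]


tst = [[], ['test'], ['t1', 't2'], ['t1', 't1', 't2']]
result = dedupe_inner(tst)

# The order inside each inner list is not guaranteed, so sort before printing.
normalised = [sorted(item) if item else item for item in result]
print(normalised)
# ['', ['test'], ['t1', 't2'], ['t1', 't2']]
```

The original tst is left untouched, which can be handy if you still need the
duplicates later.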


Notice that I call set() to get the unique values, then list() again to turn it 
back into a list. This does the job you want, but it is not guaranteed to keep 
the order:

py> L = ['b', 'd', 'c', 'a', 'b']
py> list(set(L))
['a', 'c', 'b', 'd']


If keeping the order is important, you cannot use set, and you'll need another 
way to extract only the unique values. Ask if you need help on that.




> What I get in return is
>
> >>> tst
> ['', set(['test']), set(['t2', 't1']), set(['t2', 't1'])]
>
> The empty list is fine but the other lists seem to be expressions rather than
> values. What do I need to do to simply get the values back like the following?
>
> ['', ['test'], ['t2', 't1'], ['t2', 't1']]



They are values. It is just that they are *sets* rather than *lists*. When 
printed, lists have a nice compact representation using square brackets [], but 
unfortunately sets do not. However, if you upgrade to Python 3, they have been 
upgraded to look a little nicer:


# Python 2:
set(['a', 'c', 'b', 'd'])

# Python 3
{'d', 'b', 'c', 'a'}


Notice that the order of the items is not guaranteed, but apart from that, the 
two versions are the same despite the difference in print representation.
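
A quick way to convince yourself of this is to compare the two spellings
directly. This is only a sketch; the names a and b are arbitrary. If you need
a predictable order for display, sorted() returns a list in sorted order.

```python
a = set(['a', 'c', 'b', 'd'])   # spelling that works in Python 2 and 3
b = {'d', 'b', 'c', 'a'}        # set literal, Python 2.7 and 3

# Sets compare equal when they hold the same items, whatever the order.
print(a == b)       # True

# sorted() gives a list with a predictable order for display.
print(sorted(a))    # ['a', 'b', 'c', 'd']
```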



--
Steven


[Tutor] reducing lists within list to their set of unique values

2013-05-20 Thread Treder, Robert
Hi python folks,
 
I have a list of lists that looks something like this: 
 
tst = [ [], ['test'], ['t1', 't2'], ['t1', 't1', 't2'] ]
 
I want to change the empty sets to a blank string, i.e., '' and the lists with 
repeat values to the unique set of values. So I have done the following: 
 
>>> for t in tst:
        if len(t) == 0:
            tst.__setitem__(tst.index(t), '')
        else:
            tst.__setitem__(tst.index(t), set(t))

What I get in return is 

>>> tst
['', set(['test']), set(['t2', 't1']), set(['t2', 't1'])]

The empty list is fine but the other lists seem to be expressions rather than
values. What do I need to do to simply get the values back like the following?

['', ['test'], ['t2', 't1'], ['t2', 't1']]


Thanks, 
Bob
 




