Re: how to sort a list of tuples with custom function

2017-08-04 Thread Ho Yeung Lee
I used 120 clusters:

https://gist.github.com/hoyeunglee/2475391ad554e3d2b2a40ec24ab47940
https://drive.google.com/file/d/0Bxs_ao6uuBDUZFByNVgzd0Jrdm8/view?usp=sharing

My previous approach is not suitable for English words. k-means works
better, but it still does not cluster the words perfectly: some words
are missing.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: how to sort a list of tuples with custom function

2017-08-04 Thread Ho Yeung Lee
I have switched to k-means:

https://gist.github.com/hoyeunglee/2475391ad554e3d2b2a40ec24ab47940

I do not know whether I wrote it correctly. It does seem to cluster the
pixels and find the words in a window, but the result is not perfect.




Re: how to sort a list of tuples with custom function

2017-08-03 Thread Ho Yeung Lee
I removed the red line and captured another version:

https://gist.github.com/hoyeunglee/99bbe7999bc489a79ffdf0277e80ecb6

It can capture the words in a window, but in most windows some text is
black, some is gray, and some is not exactly black, so I only tested
with Notepad, which uses black text.

However, some words are split apart, even though I have already sorted
by x[0] and x[1].

Can this be improved so that a few consecutive characters stay together
as one word?

"檔案" (File) succeeds,
but "另存新檔" (Save As) fails because the word is split.
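
One possible way to keep the pieces of a split word together is to merge
bounding boxes that sit on the same text line and are separated by only a
small horizontal gap. The sketch below is not taken from the gists above;
the box format (left, top, right, bottom), the name merge_boxes, and the
max_gap value are all assumptions made for illustration.

```python
def merge_boxes(boxes, max_gap=6):
    """Merge (left, top, right, bottom) boxes that overlap vertically
    and are at most max_gap pixels apart horizontally."""
    merged = []
    for box in sorted(boxes):  # process boxes left to right
        for i, other in enumerate(merged):
            same_row = box[1] <= other[3] and other[1] <= box[3]
            close = box[0] - other[2] <= max_gap
            if same_row and close:
                # grow the existing box so that it covers both
                merged[i] = (
                    min(other[0], box[0]), min(other[1], box[1]),
                    max(other[2], box[2]), max(other[3], box[3]),
                )
                break
        else:
            merged.append(box)
    return merged


# Hypothetical character boxes: three close together, one far right, one on a lower line.
chars = [(10, 5, 20, 15), (24, 5, 34, 15), (38, 6, 48, 16),
         (100, 5, 110, 15), (10, 40, 20, 50)]
words = merge_boxes(chars)
print(words)
```

With this sample data the first three boxes become one word box, while the
distant box and the box on the lower line stay separate. A suitable max_gap
probably depends on the font size in the screenshot.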



Re: how to sort a list of tuples with custom function

2017-08-03 Thread Ho Yeung Lee
https://gist.github.com/hoyeunglee/3d340ab4e9a3e2b7ad7307322055b550

I updated it again.

How can I improve it? Some words still end up stored in different files.



Re: how to sort a list of tuples with custom function

2017-08-02 Thread Ho Yeung Lee
https://gist.github.com/hoyeunglee/f371f66d55f90dda043f7e7fea38ffa2

I have nearly succeeded with another approach; please run the code above.

When there is a lot of black text it is very slow, so I only opened
Notepad, maximized it with no content, then captured the screen and
saved it as roster.png.

When I run it, though, it does not circle all the words with red
rectangles, only some of them.




Re: how to sort a list of tuples with custom function

2017-08-02 Thread Peter Otten
Glenn Linderman wrote:

> On 8/1/2017 2:10 PM, Piet van Oostrum wrote:
>> Ho Yeung Lee  writes:
>>
>>> def isneighborlocation(lo1, lo2):
>>>     if abs(lo1[0] - lo2[0]) < 7 and abs(lo1[1] - lo2[1]) < 7:
>>>         return 1
>>>     elif abs(lo1[0] - lo2[0]) == 1 and lo1[1] == lo2[1]:
>>>         return 1
>>>     elif abs(lo1[1] - lo2[1]) == 1 and lo1[0] == lo2[0]:
>>>         return 1
>>>     else:
>>>         return 0
>>>
>>>
>>> sorted(testing1, key=lambda x: (isneighborlocation.get(x[0]), x[1]))
>>>
>>> return something like
>>> [(1,2),(3,3),(2,5)]

>> I think you are trying to sort a list of two-dimensional points into a
>> one-dimensional list in such a way that points that are close together
>> in the two-dimensional sense will also be close together in the
>> one-dimensional list. But that is impossible.

> It's not impossible, it just requires an appropriate distance function
> used in the sort.

That's a grossly misleading addition. 

Once you have an appropriate clustering algorithm

clusters = split_into_clusters(items) # needs access to all items

you can devise a key function

def get_cluster(item, clusters=split_into_clusters(items)):
    return next(
        index for index, cluster in enumerate(clusters) if item in cluster
    )

such that

grouped_items = sorted(items, key=get_cluster)

but that's a roundabout way to write

grouped_items = sum(split_into_clusters(items), [])

In other words: sorting is useless, what you really need is a suitable 
approach to split the data into groups. 
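
To make the equivalence concrete, here is a small runnable sketch. The
split_into_clusters below is a hypothetical stand-in: it walks the points
in sorted order and starts a new cluster whenever x jumps by max_gap or
more. It is not an implementation from this thread, and any real clustering
function could take its place.

```python
def split_into_clusters(items, max_gap=7):
    """Toy clustering (an assumption for illustration): start a new
    cluster whenever the x coordinate jumps by max_gap or more."""
    clusters = []
    for item in sorted(items):
        if clusters and item[0] - clusters[-1][-1][0] < max_gap:
            clusters[-1].append(item)
        else:
            clusters.append([item])
    return clusters


items = [(30, 4), (1, 2), (33, 1), (3, 3), (2, 5)]
clusters = split_into_clusters(items)


def get_cluster(item, clusters=clusters):
    # index of the cluster that contains the item
    return next(
        index for index, cluster in enumerate(clusters) if item in cluster
    )


grouped_by_sort = sorted(items, key=get_cluster)
grouped_by_concat = sum(clusters, [])
print(grouped_by_sort)
print(grouped_by_concat)
```

Both lists contain the same groups in the same group order. Only the order
inside a group can differ, because sorted() is stable and keeps the input
order of items that share a key.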

One well-known algorithm is k-means clustering:

https://docs.scipy.org/doc/scipy/reference/generated/scipy.cluster.vq.kmeans.html

Here is an example with pictures:

https://dzone.com/articles/k-means-clustering-scipy
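
For readers who want to see the idea without installing SciPy, here is a
minimal pure-Python sketch of Lloyd's algorithm, the usual iteration behind
k-means. It is not the SciPy implementation linked above. The function name
kmeans, the naive choice of the first k points as starting centroids, and
the fixed iteration count are simplifying assumptions; library versions
pick starting centroids more carefully.

```python
def kmeans(points, k, iterations=10):
    """Minimal Lloyd's algorithm for 2-D points; returns a list of k clusters."""
    # naive initialisation (assumption): the first k points
    centroids = [(float(x), float(y)) for x, y in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for x, y in points:
            # assign each point to its nearest centroid
            nearest = min(
                range(k),
                key=lambda i: (x - centroids[i][0]) ** 2 + (y - centroids[i][1]) ** 2,
            )
            clusters[nearest].append((x, y))
        # move each centroid to the mean of its cluster (keep it if the cluster is empty)
        centroids = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
            if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return clusters


points = [(1, 2), (30, 4), (3, 3), (2, 5), (33, 1), (31, 2)]
clusters = kmeans(points, k=2)
grouped_items = sum(clusters, [])
print(clusters)
```

The result depends on k and on the starting centroids, so a poor choice of
either can split one group or merge two.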



Re: how to sort a list of tuples with custom function

2017-08-01 Thread Ho Yeung Lee
How do I write this distance function for the sort?
My attempt gives a syntax error.


On Wednesday, August 2, 2017 at 6:03:13 AM UTC+8, Glenn Linderman wrote:
> It's not impossible, it just requires an appropriate distance function 
> used in the sort.



Re: how to sort a list of tuples with custom function

2017-08-01 Thread Glenn Linderman

On 8/1/2017 2:10 PM, Piet van Oostrum wrote:
> Ho Yeung Lee writes:
>
>> def isneighborlocation(lo1, lo2):
>>     if abs(lo1[0] - lo2[0]) < 7 and abs(lo1[1] - lo2[1]) < 7:
>>         return 1
>>     elif abs(lo1[0] - lo2[0]) == 1 and lo1[1] == lo2[1]:
>>         return 1
>>     elif abs(lo1[1] - lo2[1]) == 1 and lo1[0] == lo2[0]:
>>         return 1
>>     else:
>>         return 0
>>
>>
>> sorted(testing1, key=lambda x: (isneighborlocation.get(x[0]), x[1]))
>>
>> return something like
>> [(1,2),(3,3),(2,5)]
>
> I think you are trying to sort a list of two-dimensional points into a
> one-dimensional list in such a way that points that are close together
> in the two-dimensional sense will also be close together in the
> one-dimensional list. But that is impossible.

It's not impossible, it just requires an appropriate distance function
used in the sort.
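
One reading of this suggestion is a key that measures the distance from
each point to a fixed reference point. The sketch below (the reference
point and the sample data are assumptions) shows that such a key is easy to
write, but that it orders points by their distance to the reference, not by
their distance to each other.

```python
from math import hypot

points = [(6, 0), (4, 0), (-5, 0)]
reference = (0, 0)  # arbitrary reference point (assumption)

by_distance = sorted(
    points, key=lambda p: hypot(p[0] - reference[0], p[1] - reference[1])
)
print(by_distance)  # [(4, 0), (-5, 0), (6, 0)]
```

Here (4, 0) and (6, 0) are only 2 apart in the plane, yet (-5, 0) lands
between them in the sorted list.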



Re: how to sort a list of tuples with custom function

2017-08-01 Thread Piet van Oostrum
Ho Yeung Lee  writes:

> def isneighborlocation(lo1, lo2):
>     if abs(lo1[0] - lo2[0]) < 7 and abs(lo1[1] - lo2[1]) < 7:
>         return 1
>     elif abs(lo1[0] - lo2[0]) == 1 and lo1[1] == lo2[1]:
>         return 1
>     elif abs(lo1[1] - lo2[1]) == 1 and lo1[0] == lo2[0]:
>         return 1
>     else:
>         return 0
>
>
> sorted(testing1, key=lambda x: (isneighborlocation.get(x[0]), x[1]))
>
> return something like
> [(1,2),(3,3),(2,5)]

I think you are trying to sort a list of two-dimensional points into a
one-dimensional list in such a way that points that are close together
in the two-dimensional sense will also be close together in the
one-dimensional list. But that is impossible.
-- 
Piet van Oostrum 
WWW: http://piet.vanoostrum.org/
PGP key: [8DAE142BE17999C4]
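
A small brute-force check can illustrate the impossibility argument in the
message above. The five sample points are an assumption chosen for
illustration: a center point with four neighbors at distance 1. In a list,
an element has at most two adjacent positions, so no ordering can place
all four neighbors next to the center.

```python
from itertools import permutations

center = (0, 0)
neighbors = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # all at distance 1 from center
points = [center] + neighbors


def adjacent_neighbors(order):
    """Count how many of the center's 2-D neighbors sit right beside it in the list."""
    i = order.index(center)
    beside = order[max(i - 1, 0):i] + order[i + 1:i + 2]
    return sum(p in neighbors for p in beside)


# try every possible ordering of the five points
best = max(adjacent_neighbors(order) for order in permutations(points))
print(best)  # 2
```

Whatever key or comparison function is used, the result of a sort is one of
these orderings, so at least two of the four neighbors always end up
separated from the center.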


Re: how to sort a list of tuples with custom function

2017-08-01 Thread Ho Yeung Lee
I tried
testing1.sort(key=lambda x: x[0])
but that only groups the tuples by their first element.

What I want is to sort with a custom function that puts two tuples together
when the difference between their first elements is less than some value,
and likewise for their second elements.

The goal is to segment the black words in an image.

from PIL import Image

ma = Image.open("roster.png")
color1 = ma.load()
print(ma.size)
print(color1[1, 1])

# map each pixel color to the list of (x, y) locations that have it
colortolocation = {}

def addtogroupkey(keyandmemory, key1, memorycontent):
    k = key1
    if k in keyandmemory:
        keyandmemory[k].append(memorycontent)
    else:
        keyandmemory[k] = [memorycontent]
    return keyandmemory

for ii in range(0, ma.size[0]):
    for jj in range(0, ma.size[1]):
        colortolocation = addtogroupkey(colortolocation, color1[ii, jj], (ii, jj))

def isneighborlocation(lo1, lo2):
    if abs(lo1[0] - lo2[0]) < 7 and abs(lo1[1] - lo2[1]) < 7:
        return 1
    elif abs(lo1[0] - lo2[0]) == 1 and lo1[1] == lo2[1]:
        return 1
    elif abs(lo1[1] - lo2[1]) == 1 and lo1[0] == lo2[0]:
        return 1
    else:
        return 0

# for each color, sort its locations and group consecutive neighbors
for eachcolor in colortolocation:
    testing1 = list(colortolocation[eachcolor])
    #testing1.sort(key=lambda x: x[1])
    #custom_list_indices = {v: i for i, v in enumerate(custom_list)}
    testing1.sort(key=lambda x: x[0]-x[1])
    locations = testing1
    locationsgroup = {}
    continueconnect = 0
    for ii in range(0, len(locations)-1):
        if isneighborlocation(locations[ii], locations[ii+1]) == 1:
            if continueconnect == 0:
                # start a new group
                keyone = len(locationsgroup)+1
            if keyone in locationsgroup:
                if locations[ii] not in locationsgroup[keyone]:
                    locationsgroup = addtogroupkey(locationsgroup, keyone,
                                                   locations[ii])
                if locations[ii+1] not in locationsgroup[keyone]:
                    locationsgroup = addtogroupkey(locationsgroup, keyone,
                                                   locations[ii+1])
            else:
                locationsgroup = addtogroupkey(locationsgroup, keyone,
                                               locations[ii])
                locationsgroup = addtogroupkey(locationsgroup, keyone,
                                               locations[ii+1])
            continueconnect = 1
        else:
            # not a neighbor of the next location: store it in a group of its own
            if len(locationsgroup) > 0:
                if locations[ii] not in locationsgroup[len(locationsgroup)]:
                    locationsgroup = addtogroupkey(locationsgroup,
                                                   len(locationsgroup)+1, locations[ii])
            else:
                locationsgroup = addtogroupkey(locationsgroup,
                                               len(locationsgroup)+1, locations[ii])
            continueconnect = 0
    colortolocation[eachcolor] = locationsgroup

# print the groups of black pixels that have more than 7 locations
for kk in colortolocation[(0,0,0)]:
    if len(colortolocation[(0,0,0)][kk]) > 7:
        print(kk)
        print(colortolocation[(0,0,0)][kk])
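
A sort followed by a comparison of adjacent list entries can miss pixels
that are neighbors in the image but far apart in the sorted list. An
alternative, given here as a sketch rather than as a fix of the script
above, is to group locations by breadth-first search (connected
components): two locations belong to the same group if a chain of
neighboring locations links them. The names is_neighbor and
group_locations and the max_dist parameter are assumptions; the neighbor
test mirrors the first condition of isneighborlocation.

```python
from collections import deque


def is_neighbor(a, b, max_dist=7):
    # same test as the first branch of isneighborlocation
    return abs(a[0] - b[0]) < max_dist and abs(a[1] - b[1]) < max_dist


def group_locations(locations, max_dist=7):
    """Group (x, y) locations into connected components by breadth-first search."""
    remaining = set(locations)
    groups = []
    while remaining:
        start = remaining.pop()
        group, queue = [start], deque([start])
        while queue:
            current = queue.popleft()
            # collect every unvisited location that neighbors the current one
            near = [p for p in remaining if is_neighbor(current, p, max_dist)]
            for p in near:
                remaining.remove(p)
                group.append(p)
                queue.append(p)
        groups.append(sorted(group))
    return sorted(groups)


locations = [(1, 2), (40, 40), (3, 3), (8, 5), (44, 41), (2, 5)]
groups = group_locations(locations)
print(groups)
```

Note that (8, 5) joins the first group through (3, 3) and (2, 5) even
though it is not a direct neighbor of (1, 2). This version compares every
pair of locations in the worst case, so it is likely to be slow on images
with many dark pixels.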




On Wednesday, August 2, 2017 at 3:50:52 AM UTC+8, Ho Yeung Lee wrote:
> def isneighborlocation(lo1, lo2):
>     if abs(lo1[0] - lo2[0]) < 7 and abs(lo1[1] - lo2[1]) < 7:
>         return 1
>     elif abs(lo1[0] - lo2[0]) == 1 and lo1[1] == lo2[1]:
>         return 1
>     elif abs(lo1[1] - lo2[1]) == 1 and lo1[0] == lo2[0]:
>         return 1
>     else:
>         return 0
> 
> 
> sorted(testing1, key=lambda x: (isneighborlocation.get(x[0]), x[1]))
> 
> return something like
> [(1,2),(3,3),(2,5)]
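
A note on why the quoted sorted() call cannot work as written: sorted()
calls the key function with one item at a time, whereas isneighborlocation
compares two locations, and a plain function has no .get method (that is a
dict method). The sketch below reproduces the error and shows a key that
sorts by both tuple elements. The sample data is made up, and the function
body is shortened to its first condition.

```python
def isneighborlocation(lo1, lo2):
    # shortened to the first condition of the original function
    return 1 if abs(lo1[0] - lo2[0]) < 7 and abs(lo1[1] - lo2[1]) < 7 else 0


testing1 = [(3, 3), (1, 2), (2, 5)]

error = None
try:
    sorted(testing1, key=lambda x: (isneighborlocation.get(x[0]), x[1]))
except AttributeError as exc:
    error = str(exc)  # a function object has no attribute 'get'
print(error)

# A key function receives a single item and returns the value to sort by.
by_both = sorted(testing1, key=lambda x: (x[0], x[1]))
print(by_both)  # [(1, 2), (2, 5), (3, 3)]
```

A key like this orders the tuples, but it cannot express "is a neighbor
of", which is a relation between two items rather than a property of one.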
