Re: how to sort a list of tuples with custom function
On Friday, August 4, 2017 at 10:08:56 PM UTC+8, Ho Yeung Lee wrote:
> i had changed to use kmeans
>
> https://gist.github.com/hoyeunglee/2475391ad554e3d2b2a40ec24ab47940
>
> i do not know whether write it correctly
> but it seems can cluster to find words in window, but not perfect

I used 120 as the number of clusters:

https://gist.github.com/hoyeunglee/2475391ad554e3d2b2a40ec24ab47940
https://drive.google.com/file/d/0Bxs_ao6uuBDUZFByNVgzd0Jrdm8/view?usp=sharing

My previous approach is not suitable for English words. k-means works better, but it still does not cluster the words perfectly: some words are missing.
--
https://mail.python.org/mailman/listinfo/python-list
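For readers who have not opened the gist, here is a minimal pure-Python sketch of what k-means does with a list of pixel locations. It is not the code from the gist and not SciPy's implementation. The function name kmeans_points, the deterministic choice of initial centers, and the sample points are all assumptions made for illustration, and the sketch requires k to be no larger than the number of points.

```python
def kmeans_points(points, k, iterations=20):
    """Group 2-D points into k clusters with Lloyd's algorithm (a sketch)."""
    # Assumption: use k evenly spaced input points as initial centers so the
    # result is reproducible; real implementations usually pick them randomly.
    step = max(1, len(points) // k)
    centers = [points[i * step] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for x, y in points:
            nearest = min(
                range(k),
                key=lambda c: (x - centers[c][0]) ** 2 + (y - centers[c][1]) ** 2,
            )
            clusters[nearest].append((x, y))
        # Update step: move each center to the mean of its cluster.
        new_centers = []
        for center, cluster in zip(centers, clusters):
            if cluster:
                new_centers.append((
                    sum(p[0] for p in cluster) / len(cluster),
                    sum(p[1] for p in cluster) / len(cluster),
                ))
            else:
                new_centers.append(center)  # keep the center of an empty cluster
        if new_centers == centers:
            break  # nothing moved, so the assignment is stable
        centers = new_centers
    return clusters


# Two well-separated blobs of pixel locations (made-up data).
pixels = [(1, 1), (2, 1), (1, 2), (50, 50), (51, 50), (50, 51)]
groups = kmeans_points(pixels, k=2)
```

k-means needs the number of clusters up front. With a fixed value such as 120, a screenshot that contains more words than that will have some of them merged, and one that contains fewer will have some of them split, which may be one reason the result is not perfect.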
Re: how to sort a list of tuples with custom function
I have switched to k-means:

https://gist.github.com/hoyeunglee/2475391ad554e3d2b2a40ec24ab47940

I do not know whether I wrote it correctly. It seems to be able to cluster and find the words in the window, but the result is not perfect.

On Wednesday, August 2, 2017 at 3:06:40 PM UTC+8, Peter Otten wrote:
> In other words: sorting is useless, what you really need is a suitable
> approach to split the data into groups.
>
> One well-known algorithm is k-means clustering:
>
> https://docs.scipy.org/doc/scipy/reference/generated/scipy.cluster.vq.kmeans.html
>
> Here is an example with pictures:
>
> https://dzone.com/articles/k-means-clustering-scipy
Re: how to sort a list of tuples with custom function
I removed the red line and captured another version:

https://gist.github.com/hoyeunglee/99bbe7999bc489a79ffdf0277e80ecb6

It can capture words in windows. However, the text in a window is not uniformly black: some of it is gray and some is not exactly black. So I chose Notepad only, since it uses black text.

Some words are still split, even though I have already sorted by x[0] and x[1]. Can this be improved so that a few consecutive characters are captured together as one word? "檔案" ("File") succeeds, but "另存新檔" ("Save As") fails because its characters are split.

On Thursday, August 3, 2017 at 3:54:13 PM UTC+8, Ho Yeung Lee wrote:
> https://gist.github.com/hoyeunglee/3d340ab4e9a3e2b7ad7307322055b550
>
> I updated again
>
> how to do better because some words are stored in different files
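One possible way to rejoin characters that were split, sketched under the assumption that every cluster has already been reduced to a bounding box (left, top, right, bottom): merge boxes that overlap vertically and whose horizontal gap is small. The function name merge_boxes_on_line, the max_gap threshold, and the sample boxes are hypothetical and do not come from the gists above.

```python
def merge_boxes_on_line(boxes, max_gap=10):
    """Merge (left, top, right, bottom) boxes that sit on the same text line
    and are at most max_gap pixels apart horizontally."""
    merged = []
    # Sorting by the left edge means each line is processed left to right.
    for box in sorted(boxes):
        for i, (left, top, right, bottom) in enumerate(merged):
            same_line = box[1] <= bottom and box[3] >= top  # vertical overlap
            close = box[0] - right <= max_gap               # small horizontal gap
            if same_line and close:
                # Grow the existing box so that it also covers the new one.
                merged[i] = (
                    left,
                    min(top, box[1]),
                    max(right, box[2]),
                    max(bottom, box[3]),
                )
                break
        else:
            merged.append(box)
    return merged


# Two characters of one word, a far-away word on the same line,
# and a box on a second line (made-up coordinates).
boxes = [(0, 0, 10, 10), (5, 30, 15, 40), (14, 0, 24, 10), (60, 0, 70, 10)]
words = merge_boxes_on_line(boxes, max_gap=10)
```

The threshold has to be tuned to the font: it should be larger than the gap between characters of one word and smaller than the gap between separate words.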
Re: how to sort a list of tuples with custom function
https://gist.github.com/hoyeunglee/3d340ab4e9a3e2b7ad7307322055b550

I have updated the code again. How can I do better? Some words are still stored in different files.

On Thursday, August 3, 2017 at 10:02:01 AM UTC+8, Ho Yeung Lee wrote:
> https://gist.github.com/hoyeunglee/f371f66d55f90dda043f7e7fea38ffa2
>
> I am near succeed in another way, please run above code
>
> when so much black words, it will be very slow
> so I only open notepad and maximum it without any content
> then capture screen and save as roster.png
>
> and run it, but I discover it can not circle all words with red rectangle
> and only part of words
Re: how to sort a list of tuples with custom function
https://gist.github.com/hoyeunglee/f371f66d55f90dda043f7e7fea38ffa2

I have nearly succeeded with another approach; please run the code above.

When there are many black words on screen it becomes very slow, so I opened only Notepad, maximized it with no content, captured the screen, and saved the capture as roster.png.

When I run the code on that image, I find that it cannot draw red rectangles around all the words, only around some of them.

On Wednesday, August 2, 2017 at 3:06:40 PM UTC+8, Peter Otten wrote:
> In other words: sorting is useless, what you really need is a suitable
> approach to split the data into groups.
>
> One well-known algorithm is k-means clustering:
>
> https://docs.scipy.org/doc/scipy/reference/generated/scipy.cluster.vq.kmeans.html
>
> Here is an example with pictures:
>
> https://dzone.com/articles/k-means-clustering-scipy
Re: how to sort a list of tuples with custom function
Glenn Linderman wrote:

> On 8/1/2017 2:10 PM, Piet van Oostrum wrote:
>> I think you are trying to sort a list of two-dimensional points into a
>> one-dimensional list in such a way that points that are close together
>> in the two-dimensional sense will also be close together in the
>> one-dimensional list. But that is impossible.
>
> It's not impossible, it just requires an appropriate distance function
> used in the sort.

That's a grossly misleading addition.

Once you have an appropriate clustering algorithm

    clusters = split_into_clusters(items)  # needs access to all items

you can devise a key function

    def get_cluster(item, clusters=split_into_clusters(items)):
        return next(
            index for index, cluster in enumerate(clusters) if item in cluster
        )

such that

    grouped_items = sorted(items, key=get_cluster)

but that's a roundabout way to write

    grouped_items = sum(split_into_clusters(items), [])

In other words: sorting is useless, what you really need is a suitable
approach to split the data into groups.

One well-known algorithm is k-means clustering:

https://docs.scipy.org/doc/scipy/reference/generated/scipy.cluster.vq.kmeans.html

Here is an example with pictures:

https://dzone.com/articles/k-means-clustering-scipy
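The message leaves split_into_clusters undefined. As one hedged illustration, and not the k-means approach linked above, the sketch below fills it in with a connected-components search that reuses the "less than 7 apart on both axes" test from the original question. The helper is_neighbor, the limit parameter, and the sample items are assumptions made for this example.

```python
def is_neighbor(a, b, limit=7):
    """True if a and b are less than `limit` apart on both axes."""
    return abs(a[0] - b[0]) < limit and abs(a[1] - b[1]) < limit


def split_into_clusters(items, limit=7):
    """Group points into connected components of the neighbor relation."""
    remaining = list(items)
    clusters = []
    while remaining:
        # Start a new cluster from any leftover point and grow it until
        # none of the remaining points is a neighbor of a cluster member.
        cluster = [remaining.pop()]
        frontier = list(cluster)
        while frontier:
            current = frontier.pop()
            near = [p for p in remaining if is_neighbor(current, p, limit)]
            remaining = [p for p in remaining if not is_neighbor(current, p, limit)]
            cluster.extend(near)
            frontier.extend(near)
        clusters.append(sorted(cluster))
    return clusters


items = [(1, 2), (40, 41), (3, 3), (42, 40), (2, 5)]
clusters = split_into_clusters(items)
# Flattening the clusters gives the grouped list directly; no sort is needed.
grouped_items = sum(clusters, [])
```

This version compares every point with every remaining point, so it is quadratic in the number of points and will be slow on a full screenshot. It is meant only to show that the grouping step needs access to all items at once, which a sort key cannot provide.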
Re: how to sort a list of tuples with custom function
How do I write this distance function for use in the sort? My attempt gives a syntax error.

On Wednesday, August 2, 2017 at 6:03:13 AM UTC+8, Glenn Linderman wrote:
> On 8/1/2017 2:10 PM, Piet van Oostrum wrote:
> > I think you are trying to sort a list of two-dimensional points into a
> > one-dimensional list in such a way that points that are close together
> > in the two-dimensional sense will also be close together in the
> > one-dimensional list. But that is impossible.
> It's not impossible, it just requires an appropriate distance function
> used in the sort.
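A note on why the attempted call fails. isneighborlocation is a function, and functions have no .get method, so the lambda raises AttributeError at run time rather than a syntax error. More fundamentally, a key function receives one item at a time, while isneighborlocation compares two items. The sketch below shows a key that does work with sorted: the distance from a fixed reference point. The names reference and distance_from_reference are made up for this example, and this ordering does not keep neighboring points together in general.

```python
import math

reference = (0, 0)  # hypothetical fixed point to measure from


def distance_from_reference(point):
    """Key function: takes ONE point and returns a sortable number."""
    return math.hypot(point[0] - reference[0], point[1] - reference[1])


testing1 = [(3, 3), (2, 5), (1, 2)]
ordered = sorted(testing1, key=distance_from_reference)
```

For this small input the result happens to be the [(1,2),(3,3),(2,5)] mentioned in the original question, but two points at the same distance from the reference can be far from each other, so the key only orders points; it does not cluster them.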
Re: how to sort a list of tuples with custom function
On 8/1/2017 2:10 PM, Piet van Oostrum wrote:
> I think you are trying to sort a list of two-dimensional points into a
> one-dimensional list in such a way that points that are close together
> in the two-dimensional sense will also be close together in the
> one-dimensional list. But that is impossible.

It's not impossible, it just requires an appropriate distance function
used in the sort.
Re: how to sort a list of tuples with custom function
Ho Yeung Lee writes:

> def isneighborlocation(lo1, lo2):
>     if abs(lo1[0] - lo2[0]) < 7 and abs(lo1[1] - lo2[1]) < 7:
>         return 1
>     elif abs(lo1[0] - lo2[0]) == 1 and lo1[1] == lo2[1]:
>         return 1
>     elif abs(lo1[1] - lo2[1]) == 1 and lo1[0] == lo2[0]:
>         return 1
>     else:
>         return 0
>
>
> sorted(testing1, key=lambda x: (isneighborlocation.get(x[0]), x[1]))
>
> return something like
> [(1,2),(3,3),(2,5)]

I think you are trying to sort a list of two-dimensional points into a
one-dimensional list in such a way that points that are close together
in the two-dimensional sense will also be close together in the
one-dimensional list. But that is impossible.
--
Piet van Oostrum
WWW: http://piet.vanoostrum.org/
PGP key: [8DAE142BE17999C4]
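A small illustration of the point, using made-up coordinates: with the default tuple ordering (by x, then by y), two points that touch in the plane can end up separated in the sorted list by a point that is far away from both.

```python
points = [(0, 0), (0, 100), (1, 0), (1, 100)]

# Tuples sort by their first element, then by their second.
ordered = sorted(points)

# (0, 0) and (1, 0) are one pixel apart in the plane, yet the far-away
# point (0, 100) is placed between them in the one-dimensional list.
gap = ordered.index((1, 0)) - ordered.index((0, 0))
between = ordered[ordered.index((0, 0)) + 1]
```

Sorting by y first only moves the problem: then (0, 0) and (0, 1) style neighbors stay together, while neighbors in the other direction are pulled apart.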
Re: how to sort a list of tuples with custom function
I tried testing1.sort(key=lambda x: x[0]), but that only groups the tuples by their first element.

What I want is a sort with a custom function that puts two tuples together if the difference between their first elements is less than some value, and likewise for their second elements. The goal is to segment the black words in a photo.

    from PIL import Image

    # Convert to RGB so every pixel is an (r, g, b) tuple and the
    # (0, 0, 0) lookup at the end also works for palette or RGBA images.
    ma = Image.open("roster.png").convert("RGB")
    color1 = ma.load()
    print(ma.size)
    print(color1[1, 1])

    colortolocation = {}

    def addtogroupkey(keyandmemory, key1, memorycontent):
        # Append memorycontent to the list stored under key1.
        if key1 in keyandmemory:
            keyandmemory[key1].append(memorycontent)
        else:
            keyandmemory[key1] = [memorycontent]
        return keyandmemory

    # Map every color to the list of (x, y) locations that have it.
    for ii in range(0, ma.size[0]):
        for jj in range(0, ma.size[1]):
            colortolocation = addtogroupkey(colortolocation, color1[ii, jj], (ii, jj))

    def isneighborlocation(lo1, lo2):
        # 1 if the locations are less than 7 apart on both axes.
        # The two elif branches are already covered by the first test.
        if abs(lo1[0] - lo2[0]) < 7 and abs(lo1[1] - lo2[1]) < 7:
            return 1
        elif abs(lo1[0] - lo2[0]) == 1 and lo1[1] == lo2[1]:
            return 1
        elif abs(lo1[1] - lo2[1]) == 1 and lo1[0] == lo2[0]:
            return 1
        else:
            return 0

    for eachcolor in colortolocation:
        testing1 = list(colortolocation[eachcolor])
        #testing1.sort(key=lambda x: x[1])
        #custom_list_indices = {v: i for i, v in enumerate(custom_list)}
        testing1.sort(key=lambda x: x[0]-x[1])
        locations = testing1
        locationsgroup = {}
        continueconnect = 0
        # Walk the sorted list and start a new group whenever two
        # consecutive locations are not neighbors.
        for ii in range(0, len(locations)-1):
            if isneighborlocation(locations[ii], locations[ii+1]) == 1:
                if continueconnect == 0:
                    keyone = len(locationsgroup)+1
                if keyone in locationsgroup:
                    if locations[ii] not in locationsgroup[keyone]:
                        locationsgroup = addtogroupkey(locationsgroup, keyone, locations[ii])
                    if locations[ii+1] not in locationsgroup[keyone]:
                        locationsgroup = addtogroupkey(locationsgroup, keyone, locations[ii+1])
                else:
                    locationsgroup = addtogroupkey(locationsgroup, keyone, locations[ii])
                    locationsgroup = addtogroupkey(locationsgroup, keyone, locations[ii+1])
                continueconnect = 1
            else:
                if len(locationsgroup) > 0:
                    if locations[ii] not in locationsgroup[len(locationsgroup)]:
                        locationsgroup = addtogroupkey(locationsgroup, len(locationsgroup)+1, locations[ii])
                else:
                    locationsgroup = addtogroupkey(locationsgroup, len(locationsgroup)+1, locations[ii])
                continueconnect = 0
        colortolocation[eachcolor] = locationsgroup

    # Print every group of black pixels that has more than 7 members.
    for kk in colortolocation[(0,0,0)]:
        if len(colortolocation[(0,0,0)][kk]) > 7:
            print(kk)
            print(colortolocation[(0,0,0)][kk])

On Wednesday, August 2, 2017 at 3:50:52 AM UTC+8, Ho Yeung Lee wrote:
> def isneighborlocation(lo1, lo2):
>     if abs(lo1[0] - lo2[0]) < 7 and abs(lo1[1] - lo2[1]) < 7:
>         return 1
>     elif abs(lo1[0] - lo2[0]) == 1 and lo1[1] == lo2[1]:
>         return 1
>     elif abs(lo1[1] - lo2[1]) == 1 and lo1[0] == lo2[0]:
>         return 1
>     else:
>         return 0
>
>
> sorted(testing1, key=lambda x: (isneighborlocation.get(x[0]), x[1]))
>
> return something like
> [(1,2),(3,3),(2,5)]
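The color-to-locations step in the code above can be written more compactly with collections.defaultdict. The sketch below assumes the pixels are available as a plain dict from (x, y) to color, which stands in for the PIL pixel access object; the function name group_locations_by_color and the 2x2 sample data are made up for illustration.

```python
from collections import defaultdict


def group_locations_by_color(pixels):
    """pixels maps (x, y) -> color; returns color -> list of (x, y)."""
    colortolocation = defaultdict(list)
    # Sorting the items makes the order of each list predictable.
    for location, color in sorted(pixels.items()):
        colortolocation[color].append(location)
    return dict(colortolocation)


# A made-up 2x2 "image": a black left-to-right row and a white row.
pixels = {
    (0, 0): (0, 0, 0), (0, 1): (255, 255, 255),
    (1, 0): (0, 0, 0), (1, 1): (255, 255, 255),
}
groups = group_locations_by_color(pixels)
black = groups[(0, 0, 0)]
```

defaultdict(list) creates the empty list on first access, so the explicit "is the key already present" test in addtogroupkey is no longer needed.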