Re: Data-structure for multiway associativity in Python

2018-01-28 Thread Steven D'Aprano
On Sun, 28 Jan 2018 14:48:02 -0800, qrious wrote:

> First list = { 1, 2, 3}
> Second list = { 4, 5, 6}
> Third list = { 7, 8, 9}
> 
> If I pass 9 as the argument, the return value of the function would be 
> {7, 8}.



subsets = [{1, 2, 3}, {4, 5, 6}, {7, 8, 9}]

data = {}
for subset in subsets:
for key in subset:
data[key] = subset

# At this point, your master data looks like this:
# 
# {1: {1, 2, 3},  2: {1, 2, 3},  3: {1, 2, 3},
#  4: {4, 5, 6},  5: {4, 5, 6},  6: {4, 5, 6}, 
#  7: {7, 8, 9},  8: {7, 8, 9},  9: {7, 8, 9}}
#
# but don't worry, that's not three COPIES of each set, just three
# references to the same one: each reference costs only a single
# pointer.

# Find a set associated with a key
subset = data[9]  # this is a lightning-fast lookup

# Get the items in the set, excluding the key itself
result = subset - {9}
assert result == {7, 8}

# Add a new data point to that set:
subset.add(10)
data[10] = subset

# Delete a data point:
del subset[9]
del data[9]



If you had started with a concrete example from the start, this would not 
have seemed like such a mysterious and difficult problem. Describing it 
as "multiway associativity" doesn't really help: as far as I can see, 
it's not multiway at all, it is just a straightforward case of one way 
associativity between each key and a set of values that contains that key.

Concrete examples are *really* useful for clarifying the requirements.



-- 
Steve

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Data-structure for multiway associativity in Python

2018-01-28 Thread qrious
On Sunday, January 28, 2018 at 7:00:38 AM UTC-8, Steven D'Aprano wrote:

> 
> Since you specified that there are no lists with shared members, why 
> bother returning a list of lists? There will only ever be a single 
> matching list.
> 

That's correct. It will be a single list. My mistake in typing. 

> 
> To speed it up more, we'd need to know more information about how you are 
> using this. For example, if the values c1, ... d1, ... etc have some sort 
> of relationship, 

No relationship. 

> you might be able to generate some kind of multiway tree 
> that avoids having to search all of the thousands of lists before giving 
> up.

Which brings us to the Subject line of this thread. Without any relationship 
among the members, could we solve this using clever data structure? 
> 
> Are searches going to typically hit the same set c1... over and over 
> again? If so, then after matching, bring it to the front of the master 
> list. (You might want to use a deque for that.)
> 
> 
> 
> > Here a hash may be a way to go, but need help in figuring it out. Also,
> > can there be a faster and more memory efficient solution other than
> > hashes?
> 
> Probably not, not unless you have some sort of internal structure you can 
> take advantage of. For example, if all the strings in any group start 
> with the same letter, then you can dispatch directly to the right list:
> 
> data = {'c': {c1, c2, c3, ..., cn},
> 'd': {d1, ... } # and so on...
> }
> 
> def getAssocList(element):
> if element in data[element[0]]:
> return L
> raise ValueError('not found')
> 
> 
> But if there is no structure at all, and the elements in each list are 
> completely arbitrary, then there is nothing better than a linear search 
> through the entire collection of lists, looking at every single element 
> until you've matched what you're after.

The motivation of posting this thread is to explore if the above is true. 

> 
> But your description is pretty vague and for all I know what you actually 
> want is already a solved problem. Can you give more detail and cut-down 
> example of your data set? Say, two or three values per list, and two or 
> three lists.
> 
> 

First list = { 1, 2, 3} 
Second list = { 4, 5, 6} 
Third list = { 7, 8, 9} 

If I pass 9 as the argument, the return value of the function would be {7, 8}.


> -- 
> Steve

Thanks very much for your help and thoughts. 
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Data-structure for multiway associativity in Python

2018-01-28 Thread Steven D'Aprano
On Sat, 27 Jan 2018 10:01:47 -0800, qrious wrote:

> I need a data structure and a corresponding (hopefully fast) mechanism
> associated with it to do the following. While I am looking for the
> concept first, my preference for implementation of this will be in
> Python.
> 
> [c1, c2,..., cn] is a list of strings (for my own implementation, but
> could be any other type for the generic problem). There will be many
> hundreds, if not thousands, of such lists with no shared member.
> 
> The method getAssocList(e) will return lists of the lists for which e is
> an element.

Since you specified that there are no lists with shared members, why 
bother returning a list of lists? There will only ever be a single 
matching list.

data = [
[c1, c2, ..., cn],
[d1, d2, ..., dn], 
# hundreds more...
[z1, z2, ..., zn],
]

def getAssocList(element):
for L in data:
if element in L:
return L
raise ValueError('not found')


For faster results, use sets {c1, c2, ..., cn} rather than lists. The 
outer list still has to be a list though.

To speed it up more, we'd need to know more information about how you are 
using this. For example, if the values c1, ... d1, ... etc have some sort 
of relationship, you might be able to generate some kind of multiway tree 
that avoids having to search all of the thousands of lists before giving 
up.

Are searches going to typically hit the same set c1... over and over 
again? If so, then after matching, bring it to the front of the master 
list. (You might want to use a deque for that.)



> Here a hash may be a way to go, but need help in figuring it out. Also,
> can there be a faster and more memory efficient solution other than
> hashes?

Probably not, not unless you have some sort of internal structure you can 
take advantage of. For example, if all the strings in any group start 
with the same letter, then you can dispatch directly to the right list:

data = {'c': {c1, c2, c3, ..., cn},
'd': {d1, ... } # and so on...
}

def getAssocList(element):
if element in data[element[0]]:
return L
raise ValueError('not found')


But if there is no structure at all, and the elements in each list are 
completely arbitrary, then there is nothing better than a linear search 
through the entire collection of lists, looking at every single element 
until you've matched what you're after.

But your description is pretty vague and for all I know what you actually 
want is already a solved problem. Can you give more detail and cut-down 
example of your data set? Say, two or three values per list, and two or 
three lists.


-- 
Steve

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Data-structure for multiway associativity in Python

2018-01-27 Thread MRAB

On 2018-01-27 18:01, qrious wrote:



I need a data structure and a corresponding (hopefully fast) mechanism 
associated with it to do the following. While I am looking for the concept 
first, my preference for implementation of this will be in Python.

[c1, c2,..., cn] is a list of strings (for my own implementation, but could be 
any other type for the generic problem). There will be many hundreds, if not 
thousands, of such lists with no shared member.

The method getAssocList(e) will return lists of the lists for which e is an 
element.

Here a hash may be a way to go, but need help in figuring it out. Also, can 
there be a faster and more memory efficient solution other than hashes?

You could build a dict where the key is the element and the value is a 
list of those lists that contain that element.


The 'defaultdict' class is useful for that.


from collections import defaultdict

assoc_dict = defaultdict(list)

for lst in list_of_lists:
for elem in lst:
assoc_dict[elem].append(lst)

# Optionally convert to a plain dict once it's built.
assoc_dict = dict(assoc_dict)
--
https://mail.python.org/mailman/listinfo/python-list


Re: Data-structure for multiway associativity in Python

2018-01-27 Thread Dan Stromberg
It's possible, but not common, to do association lists in Python.
They're pretty inefficient in just about any language.

I'm not totally clear on what you need, but it might be a good thing
to do a list of sets - if you're looking for an in-memory solution.

On Sat, Jan 27, 2018 at 10:33 AM, Dennis Lee Bieber
 wrote:
> On Sat, 27 Jan 2018 10:01:47 -0800 (PST), qrious 
> declaimed the following:
>
>>
>>
>>I need a data structure and a corresponding (hopefully fast) mechanism 
>>associated with it to do the following. While I am looking for the concept 
>>first, my preference for implementation of this will be in Python.
>>
>>[c1, c2,..., cn] is a list of strings (for my own implementation, but could 
>>be any other type for the generic problem). There will be many hundreds, if 
>>not thousands, of such lists with no shared member.
>>
>>The method getAssocList(e) will return lists of the lists for which e is an 
>>element.
>>
>>Here a hash may be a way to go, but need help in figuring it out. Also, can 
>>there be a faster and more memory efficient solution other than hashes?
>
>
> Don't know about speed or memory but...
>
> SQLite3 (*n* is primary key/auto increment; _n_ is foreign key)
>
> LoL(*ID*, description)
>
> anL(*ID*, _LoL_ID_, cx)
>
> select LoL.description, anL.cx from LoL
> inner join anL on anL.LoL_ID = LoL.ID
> where anL.cx like "%e%"
>
> {note: using like and wildcards means e is anywhere in the cx field;
> otherwise just use
>
> anL.cx = "e"
> }
> --
> Wulfraed Dennis Lee Bieber AF6VN
> wlfr...@ix.netcom.comHTTP://wlfraed.home.netcom.com/
>
> --
> https://mail.python.org/mailman/listinfo/python-list
-- 
https://mail.python.org/mailman/listinfo/python-list