This all sounds useful but doesn't address your question on chunking, right?

Musing a bit, it seems like ultimately what you're going to need is a
ChunkLink: an n-ary link for fairly large n, where a ChunkLink is
basically the same as a SetLink but with the added semantics that
"this set should be considered a chunk from the standpoint of
distributed processing or memory caching."

ChunkLinks could be formed via a heuristic algorithm that greedily
partitions the Atomspace into chunks of a desired size (measured in
terms of both nodes and links). Potentially, if the result-sets of
prior Pattern Matcher queries are saved, these could also be used to
heuristically guide the formation of chunks. (But note that, on my
current understanding, chunks should be disjoint, whereas the results
of PM queries need not be.) A minimal sketch of the greedy idea is
below.
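 
Just to make this concrete, here's a toy sketch in Python. Everything
in it is hypothetical -- the adjacency-dict representation of the
Atomspace, the single size budget, and the breadth-first growth rule
are illustrative stand-ins, not the actual Atomspace API -- and a real
implementation would wrap each returned set in a ChunkLink.

from collections import deque

def greedy_chunks(adjacency, max_chunk_size):
    """Greedily partition a graph into disjoint chunks.

    adjacency: dict mapping each atom-id to a set of neighbor atom-ids
               (a stand-in for following links in the Atomspace).
    max_chunk_size: desired upper bound on atoms per chunk.
    Returns a list of disjoint sets of atom-ids; each set would become
    the outgoing set of one ChunkLink.
    """
    assigned = set()
    chunks = []
    for seed in adjacency:
        if seed in assigned:
            continue
        # Grow a chunk outward from the seed, breadth-first, so that
        # tightly connected atoms tend to land in the same chunk.
        chunk = {seed}
        assigned.add(seed)
        frontier = deque([seed])
        while frontier and len(chunk) < max_chunk_size:
            current = frontier.popleft()
            for neighbor in adjacency[current]:
                if neighbor not in assigned and len(chunk) < max_chunk_size:
                    chunk.add(neighbor)
                    assigned.add(neighbor)
                    frontier.append(neighbor)
        chunks.append(chunk)
    return chunks

# Toy usage: two tightly connected clusters joined by one edge.
graph = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
print(greedy_chunks(graph, max_chunk_size=3))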

In a distributed system, each chunk (represented by a ChunkLink) has a
certain home local Atomspace, and one then has some variant of the
publish-subscribe dynamics Matt Chapman suggested, but at the chunk
rather than the Atom level. A toy sketch of chunk-level
publish-subscribe follows.
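 
As a rough illustration, a toy chunk-level broker in Python. The
ChunkBroker class, the chunk ids, and the dict payloads are all made
up for this sketch; a real system would be a distributed service
shipping serialized sub-metagraphs between Atomspaces.

from collections import defaultdict
from typing import Callable, Dict, List

class ChunkBroker:
    """Toy pub-sub over chunk ids rather than individual atoms.

    A node holding a replica of a chunk subscribes to that chunk's id;
    the chunk's home node publishes whenever any atom inside the chunk
    changes, so updates propagate per-chunk, not per-atom.
    """
    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable]] = defaultdict(list)

    def subscribe(self, chunk_id: str, callback: Callable) -> None:
        self._subscribers[chunk_id].append(callback)

    def publish(self, chunk_id: str, chunk_state: dict) -> None:
        # One message per chunk update, fanned out to all replicas.
        for callback in self._subscribers[chunk_id]:
            callback(chunk_id, chunk_state)

# Usage: a replica node caches the latest state of chunk "chunk-42".
broker = ChunkBroker()
cache = {}
broker.subscribe("chunk-42", lambda cid, state: cache.update({cid: state}))
broker.publish("chunk-42", {"atoms": ["a", "b", "c"], "version": 7})
print(cache)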

In a random Atomspace this could be horribly inefficient, because the
heuristic for ChunkLink formation would badly reflect Atom usage
patterns. However, if I look at the BioAtomspace -- the biggest useful
Atomspace we have around right now -- it seems like the heuristic
would work OK regarding both imported and inferred links...

An added heuristic would of course be needed to determine which
machine a given chunk lives on. But there are lots of cool algorithms
for this in the CS literature, e.g. Ja-Be-Ja, which is wholly
distributed/decentralized/localized in operation:

https://www.researchgate.net/publication/279230270_A_Distributed_Algorithm_for_Large-Scale_Graph_Partitioning

But in what I'm suggesting, something like Ja-Be-Ja would be carried
out on chunks rather than Atoms, to figure out on which machines in a
distributed-Atomspace network to place a given ChunkLink and its
targets. A toy version of the core swap step, applied to chunks, is
sketched below.
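 
For concreteness, a toy Python sketch of the core Ja-Be-Ja move on a
graph whose vertices are chunks rather than Atoms: pairwise swaps of
machine assignments that increase the number of same-machine
neighbors, and hence reduce cross-machine edges. The real algorithm is
fully decentralized and uses simulated annealing and random peer
sampling; this centralized double loop only shows the swap rule, and
all names here are invented.

def local_utility(chunk, placement, chunk_adj):
    # Count how many of a chunk's neighbor chunks share its machine.
    return sum(1 for nbr in chunk_adj[chunk]
               if placement[nbr] == placement[chunk])

def jabeja_swap_step(placement, chunk_adj):
    """One centralized pass of Ja-Be-Ja-style pairwise swaps.

    placement: dict chunk-id -> machine-id. Swapping two chunks'
               machines never changes per-machine load, so balance
               is preserved for free.
    chunk_adj: dict chunk-id -> set of neighbor chunk-ids (chunks
               are neighbors when atoms inside them are linked).
    """
    chunk_ids = list(placement)
    for a in chunk_ids:
        for b in chunk_ids:
            if placement[a] == placement[b]:
                continue
            before = (local_utility(a, placement, chunk_adj)
                      + local_utility(b, placement, chunk_adj))
            # Tentatively swap the two chunks' machines.
            placement[a], placement[b] = placement[b], placement[a]
            after = (local_utility(a, placement, chunk_adj)
                     + local_utility(b, placement, chunk_adj))
            if after <= before:
                # The swap didn't reduce cross-machine edges; undo it.
                placement[a], placement[b] = placement[b], placement[a]
    return placement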

This direction does not contradict what you just suggested with
timestamps and Bind/Get links etc.; I'm just trying to more directly
address your earlier question on chunking... (Though for concreteness,
a toy rendering of your timestamped-results idea is sketched below.)
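 
The sketch: cache query results keyed on the query, stamp them, and
re-run only when the cached copy is too stale. The class and the
run_query callable are inventions for illustration, not the actual
mechanism of attaching Values to Bind/Get links.

import time

class TimestampedQueryCache:
    """Toy stand-in for attaching timestamped results as a Value on a
    Bind/Get link: re-run a query only if the cached result is older
    than max_age_seconds."""

    def __init__(self, max_age_seconds: float) -> None:
        self.max_age = max_age_seconds
        self._cache = {}  # query-key -> (timestamp, results)

    def get(self, query_key, run_query):
        entry = self._cache.get(query_key)
        if entry is not None:
            stamped_at, results = entry
            if time.time() - stamped_at < self.max_age:
                # Fresh enough: use the published results as-is.
                return results
        # Stale or missing: re-run the search and publish anew.
        results = run_query()
        self._cache[query_key] = (time.time(), results)
        return results

# Usage: the "query" here is just a placeholder function.
cache = TimestampedQueryCache(max_age_seconds=60.0)
genes = cache.get("neighbors-of-GENE-X", lambda: ["GENE-Y", "GENE-Z"])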

Of course, the dilemma re chunking is that chunking a big knowledge
graph really effectively will require very expensive operations,
whereas we are talking here about something that -- in the context of
a dynamic knowledge graph -- needs to be done quite rapidly and
cheaply, as it needs to be redone over and over and updated as the
graph changes... So we are going to do chunking heuristically and
badly-ish, and rely on the rough overall semantic locality of the
graph to make it not-too-bad. There is nothing else to do, is there?
(One cheap incremental approach is sketched below.) But I guess you
already know this, Linas, and are somehow holding out for a miracle
genius insight to bypass this cruel reality? Not sure it exists in
this swath of the multiverse...
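 
One cheap way to handle the dynamics, sketched under entirely made-up
assumptions: instead of re-partitioning from scratch, assign each
newly added Atom to the chunk where most of its neighbors already
live, falling back to a fresh chunk when everything nearby is full.
This is merely the "heuristically and badly-ish" approach made
concrete, not a worked-out design.

def place_new_atom(atom, neighbors, atom_to_chunk, chunk_sizes, max_size):
    """Incrementally assign a newly added atom to a chunk.

    Picks the chunk containing the most of the atom's neighbors,
    skipping chunks that are already full, else opens a new chunk.
    Cost is proportional to the atom's degree, so maintenance stays
    cheap as the graph changes.
    """
    votes = {}
    for nbr in neighbors:
        chunk = atom_to_chunk.get(nbr)
        if chunk is not None and chunk_sizes[chunk] < max_size:
            votes[chunk] = votes.get(chunk, 0) + 1
    if votes:
        best = max(votes, key=votes.get)
    else:
        best = "chunk-%d" % len(chunk_sizes)  # open a fresh chunk
        chunk_sizes[best] = 0
    atom_to_chunk[atom] = best
    chunk_sizes[best] += 1
    return best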

ben
 
On Thu, Jul 23, 2020 at 9:26 PM Linas Vepstas <linasveps...@gmail.com> wrote:
>
> Took the dog for a walk, which helps w/ thinking. So .. a very short reply 
> (as short as I can make it) and will ponder the entire email chain tomorrow 
> morning.
>
> The meta-question I pondered during the walk was "what can be prototyped in a 
> few days/weeks?" and the answer becomes simpler (because the choices are 
> fewer). Two parts.
>
> Part one: create a custom atom (UpLink (Atom X) (Number N)) and it will 
> return the incoming set up to N steps upward from X. This atom has a unique 
> hash, and so can always be found.
>
> What should a remote server do, when asked for this? Well, it could just do 
> the look-up, then and there, and return the results.  Alternately, the remote 
> node can do the lookup, attach the results, together with a timestamp, as a 
> Value on that UpLink, and return that. That way, you know how old/stale your 
> results are.
>
> Part two: Oh wait, we already have UpLink. It's called JoinLink.
>
> Part three: Oh wait, we could do this with an arbitrary BindLink/GetLink. We 
> could take the most recent search results, attach a timestamp to the results 
> and attach that as a Value on some key.  That way, you can ask the network 
> for that Bind/Get, and if it comes back with a new-enough timestamp, you can 
> be happy and just use the results, and not re-perform the search. If you're 
> not happy with the results, you can re-run the pattern match, and publish 
> your latest/greatest results to the world. (Or rather, you attach the results 
> to the Bind/Get, and announce "I too have a copy of this atom")
>
> There are still a few holes in what I describe above, but maybe they're not 
> serious. Not sure. My sense is that some variant of this can be prototyped in 
> not much time at all. It might even be usable for the genomics work, where 
> the data is almost totally static, where many searches tend to be built on 
> sub-searches which can be cached, and do not have to be recomputed each time. 
>  Currently, we cache in a scheme wrapper, but caching search results as a 
> Value on the Get/BindLink itself makes more sense.  (FWIW this kind of 
> caching was briefly done for openpsi, many years ago, but fell into disuse. 
> Amen might remember details, I don't. I just remember that caching made 
> sense, at the time.) Habash was thinking about a server for the genomics 
> data, but I think he was going in a different direction. But maybe this works 
> for him?
>
> --linas
>
>
> On Thu, Jul 23, 2020 at 10:20 PM Ben Goertzel <b...@goertzel.org> wrote:
>>
>> Differently but indirectly relatedly, this caching system for graph
>> queries looks interesting,
>>
>> https://openproceedings.org/2017/conf/edbt/paper-119.pdf
>>
>> On Thu, Jul 23, 2020 at 8:15 PM Ben Goertzel <b...@goertzel.org> wrote:
>> >
>> > Matt,
>> >
>> > So regarding these requirements,
>> >
>> > > 1. Some cluster node will "own" each atom by assignment via some simple 
>> > > division of the hash address space.
>> > > 2. Each cluster node will also contain replicas of many other atoms, not 
>> > > only for disaster recovery purposes, but also because mind agents on 
>> > > that node will need in local memory many atoms "owned" by other nodes. 
>> > > Once we've obtained them from their owners, we might as well keep them 
>> > > around until we need to recover memory space for other "borrowed" atoms 
>> > > more urgently needed.
>> > > 3. A mind agent on a given node wants to be able to update atom 
>> > > properties (truth value, etc) locally, without having to talk to the 
>> > > "owner" node directly.
>> > > 4. Perfect consistency of atom state between different nodes is not a 
>> > > strict requirement, but it is desirable for a node to be able to 
>> > > identify the 'authoritative' source for a given atom, and that source 
>> > > should reflect a reasonably recent state of the atom as updated by any 
>> > > replica node.
>> > > 5. Relatively poor storage efficiency is acceptable. I.e., a single node 
>> > > may only be able to dedicate a relatively small portion of its memory to 
>> > > storing the atoms it owns; a majority of its space may go to replicated 
>> > > atoms. Nodes are cheap; we'll just buy more. :-)
>> > >
>> > > Given those design goals, I think we're looking at a publish-subscribe 
>> > > model for replicating updates to atoms.
>> >
>> >
>> > -- what Linas and Cassio and Senna have all posited, is that it may be
>> > more sensible to replace "Atom" with "Chunk" (i.e. sub-metagraph) in
>> > the above requirements..
>> >
>> > What the references I sent in my just-prior email suggest is that, for
>> > the sorts of graphs that tend to be created in real life, defining
>> > Chunks in a fairly simple heuristic way (i.e. each chunk is just a
>> > bunch of tightly-ish connected nodes and links) rather than via
>> > running an expensive partitioning algorithm will generally be
>> > adequate.
>> >
>> > The requirements you state are in my view correct as regards Atoms.
>> > However, the perspective being put forth is that handling these
>> > requirements explicitly on the level of Atoms rather than Chunks will
>> > become computationally intractable given the number of Atoms involved
>> > and the dynamic nature of the Atomspace.
>> >
>> > -- Ben
>>
>
>
>
> --
> Verbogeny is one of the pleasurettes of a creatific thinkerizer.
>         --Peter da Silva
>



-- 
Ben Goertzel, PhD
http://goertzel.org

“The only people for me are the mad ones, the ones who are mad to
live, mad to talk, mad to be saved, desirous of everything at the same
time, the ones who never yawn or say a commonplace thing, but burn,
burn, burn like fabulous yellow roman candles exploding like spiders
across the stars.” -- Jack Kerouac
