> If you want to have 2-3 or 5 or 10 or 20 atomspaces talk to each-other
about genomic data, that is fine. But don't ask those atomspaces to also
store robot data, and language-processing data, and face-recognition data.
Partition them off into a group of peers who are interested only in
genomics.  They don't need to process those atoms that say that Sophia just
moved her arm, or that Sophia heard a round of applause after her speech at
some conference at the opposite end of the planet. There is absolutely no
need for that.
> What there is a need for is to find out who's doing what. So, if I have a
new robot-dancing algorithm, I want to find those Sophias with the newer
shoulder-joint designs, and talk to those atomspaces.

I can think of two ways to handle this using the Cassandra Architecture:

1) Implement a custom org.apache.cassandra.locator.AbstractEndpointSnitch
subclass where you "misuse" the `rack` and `datacenter` designations to
instead refer to your groups of peers. I.e., you might have a 'rack' called
'genomics', and peers in that rack would use a query consistency of
"ONE" to ensure that they retrieve data only from the 'genomics' group if
it exists there (rough sketch below).

2) Extend your partition key to include grouping data in addition to the
hash of the chunk's central atom handle. I.e., the partition key could be
computed from an explicit grouping label, like a "group: genomics" value
attached to Atoms, or from a bitmap recording whether or not the chunk has
an outgoing link to any of a predefined set of "NetworkTopologyNodes," i.e.,
a special atom type introduced for this purpose (second sketch below).
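
For option 1, something like the following, written against the Cassandra
3.x snitch API (the signatures changed in 4.0, so treat this as a sketch
rather than drop-in code). The PeerGroupSnitch class and the group lookup
table it keeps are my own invention:

    package org.opencog.atomspace.locator;

    import java.net.InetAddress;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.cassandra.locator.AbstractEndpointSnitch;

    public class PeerGroupSnitch extends AbstractEndpointSnitch
    {
        // Hypothetical static map from peer address to its group label,
        // e.g. "genomics"; in practice this would be loaded from config.
        private final Map<InetAddress, String> groupOf = new HashMap<>();

        @Override
        public String getDatacenter(InetAddress endpoint)
        {
            // Treat the whole cluster as a single "datacenter".
            return "atomspace";
        }

        @Override
        public String getRack(InetAddress endpoint)
        {
            // "Rack" is repurposed to mean peer group.
            return groupOf.getOrDefault(endpoint, "default");
        }

        @Override
        public int compareEndpoints(InetAddress target, InetAddress a1,
                                    InetAddress a2)
        {
            // Prefer replicas that sit in the same group as the caller, so
            // that consistency ONE reads land in-group when possible.
            boolean sameGroup1 = getRack(a1).equals(getRack(target));
            boolean sameGroup2 = getRack(a2).equals(getRack(target));
            if (sameGroup1 && !sameGroup2) return -1;
            if (sameGroup2 && !sameGroup1) return 1;
            return 0;
        }
    }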
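
For option 2, the key computation itself is trivial; the interesting part
is just agreeing on what gets folded in. A toy version, where the group
label / topology bitmap and the central-handle hash stand in for whatever
the chunking layer actually produces:

    // Toy composite partition keys; all names here are placeholders.
    final class ChunkKeys
    {
        // Variant (a): explicit group label, e.g. "genomics".
        static String partitionKey(String groupLabel, long centralHandleHash)
        {
            return groupLabel + ":" + Long.toHexString(centralHandleHash);
        }

        // Variant (b): bitmap of outgoing links to the predefined
        // NetworkTopologyNode atoms, one bit per topology node.
        static String partitionKey(long topologyLinkBitmap,
                                   long centralHandleHash)
        {
            return Long.toHexString(topologyLinkBitmap) + ":"
                 + Long.toHexString(centralHandleHash);
        }
    }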

Using a pub-sub messaging paradigm, you could have nodes that update chunks
publish their update both to a "local" topic used only by other group
members for lower-latency updates, and to a "global" topic used by the
undifferentiated masses to replicate those updates for data resiliency and
for use by other peer groups. I'm sure clever topic partitioning (in the
Kafka sense) could also be used here for performance optimizations.
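
For concreteness, a sketch of that dual publish using the stock Kafka Java
producer; the topic naming scheme and the serialized-chunk payload are
assumptions, not anything that exists today:

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    final class ChunkUpdatePublisher
    {
        private final KafkaProducer<String, byte[]> producer;

        ChunkUpdatePublisher(String bootstrapServers)
        {
            Properties props = new Properties();
            props.put("bootstrap.servers", bootstrapServers);
            props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");
            producer = new KafkaProducer<>(props);
        }

        void publish(String group, String chunkKey, byte[] serializedChunk)
        {
            // Low-latency copy for the peer group...
            producer.send(new ProducerRecord<>(
                "chunk-updates-" + group, chunkKey, serializedChunk));
            // ...and a second copy on the global topic, for resiliency and
            // for consumption by other peer groups.
            producer.send(new ProducerRecord<>(
                "chunk-updates-global", chunkKey, serializedChunk));
        }
    }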

> The who-has-got-what-where info can be published in a global index, a
universally-shared lookup table. e.g. IPFS. So this is like bit-torrent --
whatever content you want, you look it up in the global DHT, but then, to
actually download on bit-torrent, you only talk to the local peers who
actually have that data,

Anything that is globally shared and frequently updated doesn't scale.
That's why it's better to use hash-based mappings to vnodes whose locations
only rarely change, combined with a local cache of which peers hold
replicas. If your DHT never contains more than a small multiple of the
total number of peers, and is only updated when a peer joins or leaves the
cluster (which I expect to be extremely rare in current opencog use cases,
i.e., only in case of hardware failure), then I don't think you'll run into
performance problems with your DHT.
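
To make that concrete, here is roughly what I have in mind; everything
here (the class, NUM_VNODES, addressing peers by plain strings) is invented
for illustration. The key-to-vnode mapping is fixed arithmetic, so the only
state that ever has to move is the small vnode-to-peers table:

    import java.util.Collections;
    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    final class VnodeDirectory
    {
        static final int NUM_VNODES = 1024;

        // vnode id -> peers holding replicas of that vnode. A few entries
        // per peer, updated only on the rare join/leave events; each node
        // can cache it locally.
        private final Map<Integer, List<String>> replicasOf =
            new ConcurrentHashMap<>();

        // Stable mapping: chunk key hash -> vnode. Never changes as peers
        // come and go.
        static int vnodeFor(long chunkKeyHash)
        {
            return (int) Math.floorMod(chunkKeyHash, (long) NUM_VNODES);
        }

        // Which peers should I ask for this chunk?
        List<String> peersFor(long chunkKeyHash)
        {
            return replicasOf.getOrDefault(vnodeFor(chunkKeyHash),
                                           Collections.emptyList());
        }

        // Called only when a peer joins or leaves, to reassign a vnode.
        void assign(int vnode, List<String> peers)
        {
            replicasOf.put(vnode, peers);
        }
    }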

> You don't talk to the entire universe.

I don't think anyone is suggesting this is a good idea.

> Pie-in-the-sky design is usually terrible.  If you have actual, specific
technical problems, if you have trouble accomplishing some specific task,
that is when you talk, think, invent, write new code.

You are absolutely correct; but pie-in-the-sky design is way more fun than
real work... :-)

Is there a public document somewhere describing actual, present use-cases
for a distributed atomspace? Ideally with some useful guesses at performance
requirements, in terms of updates per second to be processed on a single
node and across the cluster, and reasonable estimates of the hardware specs
(number of cores, RAM, disk) per peer?


All the Best,

Matt

--
Please interpret brevity as me valuing your time, and not as any negative
intention.


On Wed, Jul 29, 2020 at 1:21 PM Linas Vepstas <linasveps...@gmail.com>
wrote:

>
>
> On Wed, Jul 29, 2020 at 8:34 AM Abdulrahman Semrie <hsami...@gmail.com>
> wrote:
>
>>  > I think it's a mistake to try to think of a distributed atomspace as
>> one super-giant, universe-filling uniform, undifferentiated blob of storage.
>>
>> It is not clear to me why this is a mistake.
>>
>
> The mention of "quorum-sensing in bacteria" is not entirely spurious. The
> way that quorum sensing works is that a bacterium emits small polypeptides,
> which other bacteria can sense (i.e. can "smell"). The local strength of
> the "smell" provides feedback to bacteria as to how many neighbors it has.
> This can be used to perform computing; for example, in slime-molds, it is
> used to solve the two-armed bandit problem (the exploit-vs-explore problem
> - google it, it's fun). Turns out that tree-roots, shrubs, etc. use this
> technique as well.
>
> The problem with this technique is it's slow -- rate-limited by diffusion,
> and high-cross-talk -- everybody smells everything, messages interfere with
> one-another.  The jellyfish solves this problem by inventing the neuron.
> The neuron is a Star Trek teleporter, a Star Gate for small polypeptides.
> Except now we call them neuro-transmitters, not polypeptides. So, the
> polypeptide walks into a star-gate, and is instantly transported (about a
> millisecond) to a location a few inches, a few feet away -- this is 5-10
> orders of magnitude faster than diffusion.  And there's no cross-talk --
> the shielding around a neuron means that nothing leaks out of the axon --
> the star-gates are located at the dendrites, only. Message transmission is
> clean, no interference. High-fidelity.
>
> Neurons partition themselves off into local groupings. They only talk to
> peers. There is no every-to-every connection. Yes, sometimes neurons grow
> new connections, or drop old ones, but this is slow.  This partitioning has
> a huge data-processing advantage over the universe-filling undifferentiated
> blob of bacteria or slime-mold.
>
> If you want to have 2-3 or 5 or 10 or 20 atomspaces talk to each-other
> about genomic data, that is fine. But don't ask those atomspaces to also
> store robot data, and language-processing data, and face-recognition data.
> Partition them off into a group of peers who are interested only in
> genomics.  They don't need to process those atoms that say that Sophia just
> moved her arm, or that Sophia heard a round of applause after her speech at
> some conference at the opposite end of the planet. There is absolutely no
> need for that.
>
> What there is a need for is to find out who's doing what. So, if I have a
> new robot-dancing algorithm, I want to find those Sophias with the newer
> shoulder-joint designs, and talk to those atomspaces. The
> who-has-got-what-where info can be published in a global index, a
> universally-shared lookup table. e.g. IPFS. So this is like bit-torrent --
> whatever content you want, you look it up in the global DHT, but then, to
> actually download on bit-torrent, you only talk to the local peers who
> actually have that data, You don't talk to the entire universe.
>
> So here -- you look up to see who's got the genomic data you want, and you
> talk to just them.  And if, for example, you recently imported a newer
> cell-ontology, or a new copy of some other bio dataset, you publish that,
> like bit-torrent, so that other atomspaces can find it, and connect, and
> download.
>
> -- Linas
>
> --
> Verbogeny is one of the pleasurettes of a creatific thinkerizer.
>         --Peter da Silva
>

