Thanks for looking at this problem, David. I have some thoughts and ideas
on this, but I'll be in a better position to comment after the
holidays. Focusing on chess these days 😊

On Tue, 22 Dec, 2020, 10:17 pm Mike Drob, <[email protected]> wrote:

> Hi David,
>
> Thanks for sharing. I am sure I will have thoughts on this, but won’t be
> able to substantively comment until January. Just letting you know that
> there is interest and not to be discouraged if you get only silence for a
> while.
>
> Hopefully others will look and comment as well.
>
> Mike
>
> On Tue, Dec 22, 2020 at 10:00 AM David Smiley <[email protected]> wrote:
>
>> Hello,
>>
>> There's lots of exciting work going on in Solr at the moment, judging
>> from the SIPs & some JIRA issues.  I want to draw attention to my proposal
>> for shared storage in SolrCloud that I call "BlobDirectory" -- SOLR-15051
>> [1].  It has a linked proposal document[2] in Google Docs.  If any of you
>> have comments / concerns on the design, now is a good time to share them.
>> I expect to share a very early draft WIP PR today, containing an early form
>> of only some of the components.  I'll repeat the issue description here:
>>
>> [1] https://issues.apache.org/jira/browse/SOLR-15051
>> [2]
>> https://docs.google.com/document/d/1kjQPK80sLiZJyRjek_Edhokfc5q9S3ISvFRM2_YeL8M/edit?usp=sharing
>> ----
>>
>> This proposal is a way to accomplish shared storage in SolrCloud with a
>> few key characteristics: (A) it uses a Directory implementation, (B) it
>> delegates to a backing local file Directory as a kind of read/write cache,
>> (C) replicas have their own "space", (D) files are de-duplicated across
>> replicas via reference counting, and (E) it uses ZK, but separately from
>> the rest of SolrCloud's ZK usage.
>>
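>> To make (B) concrete, here is a hypothetical, simplified sketch of the
>> read path in plain Java: reads are served from the backing local cache
>> when possible, and otherwise pulled from shared storage and cached for
>> next time.  The class and field names are illustrative only, not actual
>> Solr or Lucene APIs:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of point (B): reads try a local "Directory" (cache)
// first, falling back to shared (blob) storage and caching the result.
// LocalCachingReader and blobStore are illustrative names, not Solr APIs.
public class LocalCachingReader {
    private final Map<String, byte[]> blobStore;   // stands in for shared storage
    private final Map<String, byte[]> localCache = new HashMap<>();

    public LocalCachingReader(Map<String, byte[]> blobStore) {
        this.blobStore = blobStore;
    }

    public byte[] readFile(String name) {
        // Fast path: serve from the backing local file Directory (cache).
        byte[] cached = localCache.get(name);
        if (cached != null) {
            return cached;
        }
        // Slow path: pull from shared storage, then cache locally so that
        // subsequent reads avoid the round trip.
        byte[] fetched = blobStore.get(name);
        if (fetched == null) {
            throw new IllegalArgumentException("No such file: " + name);
        }
        localCache.put(name, fetched);
        return fetched;
    }
}
```

>> A local-file fast path like this is what makes reads cheaper than going
>> through a block cache over remote storage on every access.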
>> The Directory abstraction is a good one, and helps isolate shared storage
>> from the rest of SolrCloud that doesn't care.  Using a backing normal file
>> Directory is faster for reads and is simpler than Solr's HDFSDirectory's
>> BlockCache.  Replicas having their own space solves the problem of multiple
>> writers (e.g. of the same shard) trying to own and write to the same space,
>> and it implies that any of Solr's replica types can be used, together with
>> the features that accompany them, such as peer-to-peer replication (sometimes
>> faster/cheaper than pulling from shared storage).  A de-duplication feature
>> solves needless duplication of files across replicas and from parent shards
>> (i.e. from shard splitting).  The de-duplication feature requires a place
>> to cache directory listings so that they can be shared across replicas and
>> atomically updated; this is handled via ZooKeeper.  Finally, some sort of
>> Solr daemon / auto-scaling code should be added to implement
>> "autoAddReplicas", especially to provide for a scenario where the leader is
>> gone and can't be replicated from directly but we can access shared storage.
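>> The reference-counting idea behind de-duplication can be sketched in a
>> few lines of plain Java.  This is a simplified illustration of the
>> concept, not code from the PR: each shared file maps to a count of the
>> replicas (and shard-split children) referencing it, and the blob only
>> becomes eligible for deletion when that count drops to zero.  (The real
>> proposal keeps this state in ZooKeeper so it can be shared across
>> replicas and atomically updated; that part is omitted here.)

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: reference-counted tracking of shared-storage files
// for de-duplication.  Class and method names are illustrative only.
public class BlobRefCounter {
    private final Map<String, Integer> refCounts = new ConcurrentHashMap<>();

    // A replica (or a child shard after a split) starts sharing a file.
    public void addRef(String fileName) {
        refCounts.merge(fileName, 1, Integer::sum);
    }

    // Returns true when the last reference is dropped, i.e. the blob can
    // actually be deleted from shared storage.  (A real implementation
    // would make this check-and-remove atomic, e.g. via a ZK version.)
    public boolean decRef(String fileName) {
        Integer remaining = refCounts.computeIfPresent(fileName, (k, v) -> v - 1);
        if (remaining != null && remaining == 0) {
            refCounts.remove(fileName);
            return true;
        }
        return false;
    }
}
```

>> With this scheme, a shard split can reference the parent's files at zero
>> copy cost, and physical deletion only happens once every referent is gone.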
>>
>> For more about shared storage concepts, consider looking at the
>> description in SOLR-13101
>> <https://issues.apache.org/jira/browse/SOLR-13101> and the linked Google
>> Doc.
>>
>> ~ David Smiley
>> Apache Lucene/Solr Search Developer
>> http://www.linkedin.com/in/davidwsmiley
>>
>
