Re: Shared Storage -- BlobDirectory, SOLR-15051

David Smiley Tue, 22 Dec 2020 10:51:32 -0800

Glad to hear there is interest!  Atri Sharma intends to start helping as
soon as there is code to show, which is today.  The part of it that I think
might be most subject to feedback is file listing tracking w/ dedupe... so
I'll go slow there knowing there is feedback pending.


Ishan: yeah I saw your update on Twitter about chess.  I recently finished
watching The Queen's Gambit on Netflix, and it was such a fantastic show
that it has gotten me a little more interested in chess too.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Tue, Dec 22, 2020 at 11:55 AM Ishan Chattopadhyaya <
[email protected]> wrote:

> Thanks for looking at this problem, David. I have some thoughts and ideas
> around the same, but I'll be in a better position to comment after the
> holidays. Focusing on chess these days 😊
>
> On Tue, 22 Dec, 2020, 10:17 pm Mike Drob, <[email protected]> wrote:
>
>> Hi David,
>>
>> Thanks for sharing. I am sure I will have thoughts on this, but won’t be
>> able to substantively comment until January. Just letting you know that
>> there is interest and not to be discouraged if you get only silence for a
>> while.
>>
>> Hopefully others will look and comment as well.
>>
>> Mike
>>
>> On Tue, Dec 22, 2020 at 10:00 AM David Smiley <[email protected]> wrote:
>>
>>> Hello,
>>>
>>> There's lots of exciting work going on in Solr at the moment, judging
>>> from the SIPs & some JIRA issues.  I want to draw attention to my proposal
>>> for shared storage in SolrCloud that I call "BlobDirectory" -- SOLR-15051
>>> [1].  It has a linked proposal document[2] in Google Docs.  If any of you
>>> have comments / concerns on the design, now is a good time to share them.
>>> I expect to share a very early draft WIP PR today, containing an early form
>>> of only some of the components.  I'll repeat the issue description here:
>>>
>>> [1] https://issues.apache.org/jira/browse/SOLR-15051
>>> [2]
>>> https://docs.google.com/document/d/1kjQPK80sLiZJyRjek_Edhokfc5q9S3ISvFRM2_YeL8M/edit?usp=sharing
>>> ----
>>>
>>> This proposal is a way to accomplish shared storage in SolrCloud with a
>>> few key characteristics: (A) using a Directory implementation, (B)
>>> delegating to a backing local file Directory as a kind of read/write cache,
>>> (C) replicas have their own "space", (D) de-duplication across replicas
>>> via reference counting, (E) uses ZK but separately from SolrCloud stuff.
>>>
>>> The Directory abstraction is a good one, and helps isolate shared
>>> storage from the rest of SolrCloud that doesn't care.  Using a backing
>>> normal file Directory is faster for reads and is simpler than Solr's
>>> HDFSDirectory's BlockCache.  Replicas having their own space solves the
>>> problem of multiple writers (e.g. of the same shard) trying to own and
>>> write to the same space, and it implies that any of Solr's replica types
>>> can be used, along with what accompanies them, such as peer-to-peer
>>> replication (sometimes faster/cheaper than pulling from shared storage).  A
>>> de-duplication feature solves needless duplication of files across replicas
>>> and from parent shards (i.e. from shard splitting).  The de-duplication
>>> feature requires a place to cache directory listings so that they can be
>>> shared across replicas and atomically updated; this is handled via
>>> ZooKeeper.  Finally, some sort of Solr daemon / auto-scaling code should be
>>> added to implement "autoAddReplicas", especially to provide for a scenario
>>> where the leader is gone and can't be replicated from directly but we can
>>> access shared storage.
>>>
>>> For more about shared storage concepts, consider looking at the
>>> description in SOLR-13101
>>> <https://issues.apache.org/jira/browse/SOLR-13101> and the linked
>>> Google Doc.
>>>
>>> ~ David Smiley
>>> Apache Lucene/Solr Search Developer
>>> http://www.linkedin.com/in/davidwsmiley
>>>
>>
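
For readers following the thread, the reference-counting idea behind point (D) in the quoted description can be sketched roughly as below. This is only an illustration and is not taken from SOLR-15051 or its PR: the class name SharedFileRefCounts, its methods, and the use of a file name as the key are all hypothetical, and the counts live in an in-memory map, whereas the proposal keeps the shared listing in ZooKeeper and updates it atomically.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of reference counting for files held in shared storage.
 * Not the SOLR-15051 implementation: counts are kept in memory here, while the
 * proposal stores the shared directory listing in ZooKeeper.
 */
class SharedFileRefCounts {
    // Assumed key: some identifier of a file in shared storage (e.g. its name).
    private final Map<String, Integer> counts = new HashMap<>();

    /** Adds a reference. Returns true if this is the first one, so the caller should upload the file. */
    synchronized boolean acquire(String fileKey) {
        int updated = counts.merge(fileKey, 1, Integer::sum);
        return updated == 1;
    }

    /** Drops a reference. Returns true if it was the last one, so the caller may delete the file. */
    synchronized boolean release(String fileKey) {
        Integer current = counts.get(fileKey);
        if (current == null) {
            throw new IllegalStateException("No references held for " + fileKey);
        }
        if (current == 1) {
            counts.remove(fileKey);
            return true;
        }
        counts.put(fileKey, current - 1);
        return false;
    }

    /** Current number of references, or 0 if the file is not tracked. */
    synchronized int count(String fileKey) {
        return counts.getOrDefault(fileKey, 0);
    }
}
```

In this sketch, a second replica that calls acquire for a file another replica already uploaded gets false back and can skip the upload, and the file only becomes deletable once every replica referencing it has called release.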
