Glad to hear there is interest! Atri Sharma intends to start helping as soon as there is code to show, which is today. The part of it that I think might be most subject to feedback is file listing tracking w/ dedupe... so I'll go slow there knowing there is feedback pending.
Ishan: yeah, I saw your update on Twitter about chess. I recently finished watching The Queen's Gambit on Netflix, and it was such a fantastic show that it has gotten me a little more interested in chess too.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley

On Tue, Dec 22, 2020 at 11:55 AM Ishan Chattopadhyaya <[email protected]> wrote:

> Thanks for looking at this problem, David. I have some thoughts and ideas
> around the same, but I'll be in a better position to comment after the
> holidays. Focusing on chess these days 😊
>
> On Tue, 22 Dec, 2020, 10:17 pm Mike Drob, <[email protected]> wrote:
>
>> Hi David,
>>
>> Thanks for sharing. I am sure I will have thoughts on this, but won’t be
>> able to substantively comment until January. Just letting you know that
>> there is interest and not to be discouraged if you get only silence for a
>> while.
>>
>> Hopefully others will look and comment as well.
>>
>> Mike
>>
>> On Tue, Dec 22, 2020 at 10:00 AM David Smiley <[email protected]> wrote:
>>
>>> Hello,
>>>
>>> There's lots of exciting work going on in Solr at the moment, judging
>>> from the SIPs & some JIRA issues. I want to draw attention to my proposal
>>> for shared storage in SolrCloud that I call "BlobDirectory" -- SOLR-15051
>>> [1]. It has a linked proposal document[2] in Google Docs. If any of you
>>> have comments / concerns on the design, now is a good time to share them.
>>> I expect to share a very early draft WIP PR today, containing an early form
>>> of only some of the components. I'll repeat the issue description here:
>>>
>>> [1] https://issues.apache.org/jira/browse/SOLR-15051
>>> [2]
>>> https://docs.google.com/document/d/1kjQPK80sLiZJyRjek_Edhokfc5q9S3ISvFRM2_YeL8M/edit?usp=sharing
>>> ----
>>>
>>> This proposal is a way to accomplish shared storage in SolrCloud with a
>>> few key characteristics: (A) using a Directory implementation, (B)
>>> delegates to a backing local file Directory as a kind of read/write cache,
>>> (C) replicas have their own "space", (D) de-duplication across replicas
>>> via reference counting, (E) uses ZK but separately from SolrCloud stuff.
>>>
>>> The Directory abstraction is a good one, and helps isolate shared
>>> storage from the rest of SolrCloud that doesn't care. Using a backing
>>> normal file Directory is faster for reads and is simpler than Solr's
>>> HDFSDirectory's BlockCache. Replicas having their own space solves the
>>> problem of multiple writers (e.g. of the same shard) trying to own and
>>> write to the same space, and it implies that any of Solr's replica types
>>> can be used along with what goes along with them like peer-to-peer
>>> replication (sometimes faster/cheaper than pulling from shared storage). A
>>> de-duplication feature solves needless duplication of files across replicas
>>> and from parent shards (i.e. from shard splitting). The de-duplication
>>> feature requires a place to cache directory listings so that they can be
>>> shared across replicas and atomically updated; this is handled via
>>> ZooKeeper. Finally, some sort of Solr daemon / auto-scaling code should be
>>> added to implement "autoAddReplicas", especially to provide for a scenario
>>> where the leader is gone and can't be replicated from directly but we can
>>> access shared storage.
>>>
>>> For more about shared storage concepts, consider looking at the
>>> description in SOLR-13101
>>> <https://issues.apache.org/jira/browse/SOLR-13101> and the linked
>>> Google Doc.
>>>
>>> ~ David Smiley
>>> Apache Lucene/Solr Search Developer
>>> http://www.linkedin.com/in/davidwsmiley
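
For readers following the thread, the "de-duplication across replicas via reference counting" idea from the quoted proposal could look roughly like the sketch below. This is only an illustration under assumptions, not code from SOLR-15051 or the WIP PR: the class name `SharedFileRefCounts`, its methods, and the choice to key files by a content identifier are all hypothetical, and a synchronized in-memory map stands in for the ZooKeeper-held listing that the proposal says must be atomically updated.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: reference counts for files held in shared storage,
 * keyed by an assumed content identifier (for example a checksum).
 * Not part of Solr or Lucene.
 */
public class SharedFileRefCounts {
    // Assumption: in the proposal this state would live in ZooKeeper and be
    // updated atomically; a synchronized in-memory map stands in for it here.
    private final Map<String, Integer> counts = new HashMap<>();

    /**
     * Records one more replica referencing the file.
     * Returns true if this is the first reference, i.e. the caller would
     * need to upload the file to shared storage.
     */
    public synchronized boolean acquire(String contentId) {
        int updated = counts.merge(contentId, 1, Integer::sum);
        return updated == 1;
    }

    /**
     * Drops one reference to the file.
     * Returns true if no references remain, i.e. the shared copy could be deleted.
     */
    public synchronized boolean release(String contentId) {
        Integer current = counts.get(contentId);
        if (current == null) {
            throw new IllegalStateException("No references held for " + contentId);
        }
        if (current == 1) {
            counts.remove(contentId);
            return true;
        }
        counts.put(contentId, current - 1);
        return false;
    }

    /** Returns the current number of references (0 if the file is unknown). */
    public synchronized int count(String contentId) {
        return counts.getOrDefault(contentId, 0);
    }
}
```

In this sketch, a second replica (or a sub-shard created by a split) that acquires an already-known file gets `false` back and can skip the upload, and the shared copy only becomes deletable once the last holder releases it.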
