[ https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16754041#comment-16754041 ]
Yonik Seeley commented on SOLR-13101:
-------------------------------------

Thinking about how to kick this off...

At the most basic level, looking at the HDFS layout scheme we see this ("test" is the name of the collection):
{code}
local_file_system://.../node1/test_shard1_replica_n1/core.properties
hdfs://.../data/test/core_node2/data/
{code}

And core.properties looks like:
{code}
numShards=1
collection.configName=conf1
name=test_shard1_replica_n1
replicaType=NRT
shard=shard1
collection=test
coreNodeName=core_node2
{code}

It seems like the most basic desirable change would be to the naming scheme for collections with shared storage.
Instead of .../<collection_name>/<core_node_name>/data
it should be .../<collection_name>/<shard_name>/data
since there is only one canonical index per shard.

> Shared storage support in SolrCloud
> -----------------------------------
>
>                 Key: SOLR-13101
>                 URL: https://issues.apache.org/jira/browse/SOLR-13101
>             Project: Solr
>          Issue Type: New Feature
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: SolrCloud
>            Reporter: Yonik Seeley
>            Priority: Major
>
> Solr should have first-class support for shared storage (blob/object stores
> like S3, Google Cloud Storage, etc. and shared filesystems like HDFS, NFS,
> etc).
> The key component will likely be a new replica type for shared storage. It
> would have many of the benefits of the current "pull" replicas (not indexing
> on all replicas, all shards identical with no shards getting out-of-sync,
> etc), but would have additional benefits:
> - Any shard could become leader (the blob store always has the index)
> - Better elasticity scaling down
> - durability not linked to number of replicas; a single replica could be
> common for write workloads
> - could drop to 0 replicas for a shard when not needed (blob store always
> has index)
> - Allow for higher performance write workloads by skipping the transaction
> log
> - don't pay for what you don't need
> - a commit will be necessary to flush to stable storage (blob store)
> - A lot of the complexity and failure modes go away
> An additional component is a Directory implementation that will work well with
> blob stores. We probably want one that treats local disk as a cache since
> the latency to remote storage is so large. I think there are still some
> "locking" issues to be solved here (ensuring that more than one writer to the
> same index won't corrupt it). This should probably be pulled out into a
> different JIRA issue.
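
To make the naming change proposed in the comment concrete, here is a minimal Python sketch (not Solr code) that derives both the current and the proposed data directory from the core.properties values shown above. The function names and the root argument are hypothetical, chosen only for illustration, and the "..." in the root is the elided path from the original, left as-is.

```python
# Illustrative sketch only: these helpers are hypothetical and not part of Solr.

CORE_PROPERTIES = """\
numShards=1
collection.configName=conf1
name=test_shard1_replica_n1
replicaType=NRT
shard=shard1
collection=test
coreNodeName=core_node2
"""


def parse_core_properties(text):
    """Parse simple key=value lines, skipping blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def current_data_path(root, props):
    """Current scheme: <root>/<collection_name>/<core_node_name>/data"""
    return f"{root}/{props['collection']}/{props['coreNodeName']}/data"


def proposed_data_path(root, props):
    """Proposed scheme: <root>/<collection_name>/<shard_name>/data"""
    return f"{root}/{props['collection']}/{props['shard']}/data"


props = parse_core_properties(CORE_PROPERTIES)
root = "hdfs://.../data"  # elided in the original comment; kept verbatim
print(current_data_path(root, props))
print(proposed_data_path(root, props))
```

Under the proposed scheme, every replica of shard1 would resolve to the same directory, which matches the point that there is only one canonical index per shard.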