[ 
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16737275#comment-16737275
 ] 

Olivér Szabó commented on SOLR-13101:
-------------------------------------

[~ysee...@gmail.com], FYI, I did a POC with these, see:
https://github.com/oleewere/solr-cloud-storage-poc
https://github.com/apache/ambari-infra/tree/cloud-storage-poc (custom Solr 
build based on the Solr tarball)
(It uses the HDFS client. It only worked against real environments, although 
I included localstack and a GCS emulator as containers. The s3a setup can 
work against localstack, but s3a itself is broken, see the notes below.)
Some notes:
- I replaced the Hadoop jars with custom hwx ones (the 2.7.x builds contain 
some classes that are not in the ones from the Apache Maven repo).
- s3n looked good. s3a seems to be broken, and fixing it would require some 
changes in aws-sdk (it needs to use a shared connection pool, which can be 
set in the HTTP client).
- wasb/wasbs looked good.
- adlsV2 had some SSL-related issues (although it did not use SSL): some 
cipher problems. I ran Solr with JDK 10 in Docker, which may have caused them.
- The GCS connector uses Guava 27, while Solr uses something like 14. That 
results in a NoClassDefFoundError while loading the GCS filesystem 
implementation. Maybe that can be solved by updating to a newer Guava, or by 
shading the gcs-connector jar together with its dependencies.
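
For the Guava conflict in the last note, it can help to check which jar a 
class is actually loaded from. A minimal diagnostic sketch, using only the 
JDK; the class name WhichJar and the method locationOf are made up for 
illustration and are not part of Solr or the GCS connector:

```java
import java.security.CodeSource;

public class WhichJar {
    // Returns the location (usually a jar URL) a class was loaded from,
    // or a short marker if it is a JDK class or cannot be loaded at all.
    static String locationOf(String className) {
        try {
            // initialize=false: only locate the class, do not run static init
            Class<?> c = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            return src == null ? "(bootstrap/JDK)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not loadable: " + e + ")";
        }
    }

    public static void main(String[] args) {
        // Example class name: a core Guava class; pass another name as an argument.
        String name = args.length > 0 ? args[0] : "com.google.common.base.Preconditions";
        System.out.println(name + " -> " + locationOf(name));
    }
}
```

Run with the same classpath as the Solr node, this shows whether the old or 
the new Guava jar wins, which is the case shading is meant to avoid.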

What I have mostly seen: I could create shards and then add documents as 
well. Interestingly, a simple delete query only deleted about 40% of the 
documents (then the request failed).
Also, after stopping the Solr containers, the write.lock files need to be 
deleted from cloud storage. It would be nice to have an option to delete 
those on startup (not sure whether Solr already has this or not).
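
A rough sketch of what such a startup cleanup could look like. This is only 
an illustration under assumptions: it walks a local directory with java.nio 
as a stand-in for the cloud storage, and the names StaleLockCleaner and 
deleteStaleLocks are hypothetical, not an existing Solr option. A real 
version would go through the storage client instead and would have to be 
sure that no other node is still writing to the index.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StaleLockCleaner {
    static final String LOCK_FILE_NAME = "write.lock";

    // Deletes every regular file named write.lock under root and
    // returns the paths that were found (and removed).
    static List<Path> deleteStaleLocks(Path root) throws IOException {
        List<Path> locks;
        // Collect first, so the directory walk is closed before deleting.
        try (Stream<Path> paths = Files.walk(root)) {
            locks = paths
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().equals(LOCK_FILE_NAME))
                .collect(Collectors.toList());
        }
        for (Path lock : locks) {
            Files.deleteIfExists(lock);
        }
        return locks;
    }

    public static void main(String[] args) throws IOException {
        // Demo on a throwaway directory that mimics one core's index dir.
        Path root = Files.createTempDirectory("solr-data");
        Path indexDir = Files.createDirectories(root.resolve("core1/data/index"));
        Files.createFile(indexDir.resolve(LOCK_FILE_NAME));
        Files.createFile(indexDir.resolve("segments_1"));
        List<Path> removed = deleteStaleLocks(root);
        System.out.println("removed " + removed.size() + " lock file(s)");
    }
}
```

Only files named exactly write.lock are touched; index files next to them 
are left alone.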

> Shared storage support in SolrCloud
> -----------------------------------
>
>                 Key: SOLR-13101
>                 URL: https://issues.apache.org/jira/browse/SOLR-13101
>             Project: Solr
>          Issue Type: New Feature
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCloud
>            Reporter: Yonik Seeley
>            Priority: Major
>
> Solr should have first-class support for shared storage (blob/object stores 
> like S3, google cloud storage, etc. and shared filesystems like HDFS, NFS, 
> etc).
> The key component will likely be a new replica type for shared storage.  It 
> would have many of the benefits of the current "pull" replicas (not indexing 
> on all replicas, all shards identical with no shards getting out-of-sync, 
> etc), but would have additional benefits:
>  - Any shard could become leader (the blob store always has the index)
>  - Better elasticity scaling down
>    - durability not linked to number of replicas.. a single replica could be 
> common for write workloads
>    - could drop to 0 replicas for a shard when not needed (blob store always 
> has index)
>  - Allow for higher performance write workloads by skipping the transaction 
> log
>    - don't pay for what you don't need
>    - a commit will be necessary to flush to stable storage (blob store)
>  - A lot of the complexity and failure modes go away
> An additional component is a Directory implementation that will work well with 
> blob stores.  We probably want one that treats local disk as a cache since 
> the latency to remote storage is so large.  I think there are still some 
> "locking" issues to be solved here (ensuring that more than one writer to the 
> same index won't corrupt it).  This should probably be pulled out into a 
> different JIRA issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
