[jira] [Commented] (SOLR-13101) Shared storage support in SolrCloud

2020-12-10 Thread Ilan Ginzburg (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17247510#comment-17247510
 ] 

Ilan Ginzburg commented on SOLR-13101:
--

I have no issue with "will not fix" this Jira.

From my perspective, the fundamental problem with this approach is not the 
introduction of a new replica type but the need to commit every batch in order 
to push segments, and then to wait for the push to complete and succeed 
before reporting the indexing itself as successful (there are a few possible 
optimizations, such as pushing files before the commit happens so they are 
already on the blob store by then, but the fundamental issues do not go away). 
That's a major performance degradation.

So yes, please close it. Thanks.

Looking forward to seeing a different approach that does not have the problems 
listed above! (or fewer of them :))

> Shared storage support in SolrCloud
> ---
>
> Key: SOLR-13101
> URL: https://issues.apache.org/jira/browse/SOLR-13101
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
> Reporter: Yonik Seeley
> Priority: Major
>  Time Spent: 15h 50m
>  Remaining Estimate: 0h
>
> Solr should have first-class support for shared storage (blob/object stores 
> like S3, Google Cloud Storage, etc. and shared filesystems like HDFS, NFS, 
> etc.).
> The key component will likely be a new replica type for shared storage.  It 
> would have many of the benefits of the current "pull" replicas (not indexing 
> on all replicas, all shards identical with no shards getting out-of-sync, 
> etc.), but would have additional benefits:
>  - Any shard could become leader (the blob store always has the index)
>  - Better elasticity scaling down
>    - durability not linked to number of replicas; a single replica could be 
>      common for write workloads
>    - could drop to 0 replicas for a shard when not needed (blob store always 
>      has index)
>  - Allow for higher performance write workloads by skipping the transaction 
>    log
>    - don't pay for what you don't need
>    - a commit will be necessary to flush to stable storage (blob store)
>  - A lot of the complexity and failure modes go away
> An additional component is a Directory implementation that will work well 
> with blob stores.  We probably want one that treats local disk as a cache 
> since the latency to remote storage is so large.  I think there are still some 
> "locking" issues to be solved here (ensuring that more than one writer to the 
> same index won't corrupt it).  This should probably be pulled out into a 
> different JIRA issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13101) Shared storage support in SolrCloud

2020-12-10 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17247258#comment-17247258
 ] 

David Smiley commented on SOLR-13101:
-

I would like to close this issue as won't-fix because the substance and feature 
branch (with linked PRs) pointing to this issue are dead in the water (they will 
not be merged or further publicly contributed to).  However, the issue title, 
"Shared storage support" (rather general), is not a "won't-fix"!  So with that, 
I propose re-titling the issue to "Shared storage via new SHARED replica type" 
because, in my mind, that's the most stand-out aspect of this PR compared to 
other alternatives.  WDYT [~ilan]?

That said, do not lose hope for a solution to come into being!  I've been 
excitedly working on a new plan, which I've been sharing internally, that 
addresses the contributability problems of the SHARED replica type 
implementation.  If things go well in the coming weeks, there will be a new Jira 
issue called "BlobDirectory, a shared storage approach" that will link 
here.




[jira] [Commented] (SOLR-13101) Shared storage support in SolrCloud

2020-12-09 Thread Nahian-Al Hasan (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17247028#comment-17247028
 ] 

Nahian-Al Hasan commented on SOLR-13101:


Hi, any updates on this issue? I would be keen to see a working version of this 
feature available in SolrCloud. We're having to scale our SolrCloud permanent 
storage environment in a GKE cluster on a regular basis, so it would be nice to 
have the data persisted in GCS so that we don't have to scale our storage every 
time. 




[jira] [Commented] (SOLR-13101) Shared storage support in SolrCloud

2020-07-12 Thread Noble Paul (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156224#comment-17156224
 ] 

Noble Paul commented on SOLR-13101:
---

Thanks [~ilan] for the clarification




[jira] [Commented] (SOLR-13101) Shared storage support in SolrCloud

2020-07-12 Thread Ilan Ginzburg (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156222#comment-17156222
 ] 

Ilan Ginzburg commented on SOLR-13101:
--

TBH, adding this to SolrCloud in its current form (of this work and of 
SolrCloud) is going to be very challenging, and as David said, this is not the 
kind of abstraction that can be pluggable.
But not to worry: there are no short-term intentions to merge anything here.




[jira] [Commented] (SOLR-13101) Shared storage support in SolrCloud

2020-07-11 Thread Noble Paul (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156184#comment-17156184
 ] 

Noble Paul commented on SOLR-13101:
---

[~dsmiley] I do not see any answers to [this 
question|https://issues.apache.org/jira/browse/SOLR-13101?focusedCommentId=17002508&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17002508]




[jira] [Commented] (SOLR-13101) Shared storage support in SolrCloud

2020-07-11 Thread Noble Paul (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156182#comment-17156182
 ] 

Noble Paul commented on SOLR-13101:
---

As Ishan mentioned, I see this as a problem of lack of modularity. We are not 
thinking in terms of separation of concerns. Why are we not able to define the 
interaction points with existing Solr? If this is not an integral part of Solr, 
it should have a set of interfaces through which it interacts with Solr. When 
people say that it has to be intertwined with Solr, the onus is on them to 
explain why that is so.




[jira] [Commented] (SOLR-13101) Shared storage support in SolrCloud

2020-07-11 Thread Ishan Chattopadhyaya (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156149#comment-17156149
 ] 

Ishan Chattopadhyaya commented on SOLR-13101:
-

How do we know that this support won't suffer the same neglect that HDFS and 
CDCR suffered? I haven't seen a design document for the proposed work here, so 
I can't comment on specifics, but in general we do not want such a massive 
patch going into Solr core for a non-essential feature such as this. We should 
start by defining the interfaces that can stay in core and the concrete 
implementations that should stay outside.
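
As a rough illustration of that split, core could own a small storage 
interface while each backend lives in its own module. The SharedStoreClient 
interface and InMemorySharedStore class below are hypothetical names invented 
for this sketch and are not part of Solr; they only show the shape such a 
boundary might take.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative only: an interface in "core" with the implementation elsewhere. */
public class PluggableStoreSketch {

    /** Hypothetical interface that would stay in core. */
    public interface SharedStoreClient {
        void write(String path, byte[] content);

        Optional<byte[]> read(String path);
    }

    /** Hypothetical backend that would live outside core (S3, GCS, ...). */
    public static class InMemorySharedStore implements SharedStoreClient {
        private final Map<String, byte[]> files = new ConcurrentHashMap<>();

        @Override
        public void write(String path, byte[] content) {
            files.put(path, content.clone()); // defensive copy
        }

        @Override
        public Optional<byte[]> read(String path) {
            return Optional.ofNullable(files.get(path)).map(b -> b.clone());
        }
    }

    public static void main(String[] args) {
        // Calling code depends only on the interface, never on the backend.
        SharedStoreClient store = new InMemorySharedStore();
        store.write("segments_1", new byte[] {1, 2, 3});
        System.out.println(store.read("segments_1").isPresent());
    }
}
```

With a boundary like this, swapping the backend would not require changes to 
the calling code, which is the property the comment above asks for.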




[jira] [Commented] (SOLR-13101) Shared storage support in SolrCloud

2020-07-11 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156146#comment-17156146
 ] 

David Smiley commented on SOLR-13101:
-

Noble, you asked on the 21st of December last year (13 comments up) and Yonik 
answered the next day.  In summary, the approach here is not at all pluggable, 
just as our other current replica types (NRT/PULL/TLOG) aren't pluggable either.




[jira] [Commented] (SOLR-13101) Shared storage support in SolrCloud

2020-07-10 Thread Noble Paul (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17155802#comment-17155802
 ] 

Noble Paul commented on SOLR-13101:
---

[~andy_vuong], is there anything that stops us from making this a plugin 
available through Solr's package system?

Making it a package would help many more users use it and would make the 
feedback loop faster.




[jira] [Commented] (SOLR-13101) Shared storage support in SolrCloud

2020-07-10 Thread Andy Vuong (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17155639#comment-17155639
 ] 

Andy Vuong commented on SOLR-13101:
---

Hey Solr Community!

I wanted to share an update on this JIRA. We've recently decided to continue 
work on this project internally, both for convenience and because we are 
working through new challenges and revisiting our design of shared storage. 
Some of these challenges include:
 * Forcing commits and pushes to Blob on each (sub) indexing batch makes things 
expensive (paying for traffic to S3) and less efficient from a SolrCloud 
perspective (too many small commits, merge cost),
 * Delaying the ack to the client on an indexing batch until the data is 
indexed, a segment is created, and that segment is pushed to S3 slows things 
down considerably,
 * Transaction logs are used heavily in SolrCloud code. Having nodes with 
non-persistent storage is challenging (for example, post-shard-split recovery 
mode).

Work is progressing, but we'll no longer use our feature branch for this work. 
We'll be sure to keep the community updated in the future as we progress on 
addressing these issues.
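
To make the first two challenges concrete, here is a minimal Java sketch of 
the synchronous flow described in the list above: each indexing batch is 
committed, the resulting segment is pushed, and only then is the client 
acknowledged. SharedStore, InMemoryStore, and indexBatch are hypothetical 
names invented for illustration, not classes from the feature branch, and the 
in-memory store merely stands in for S3.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: models "commit and push on every batch, ack afterwards". */
public class PerBatchPushSketch {

    /** Hypothetical stand-in for a blob store such as S3. */
    interface SharedStore {
        void push(List<String> segment);
    }

    /** In-memory store that records each pushed segment. */
    static class InMemoryStore implements SharedStore {
        final List<List<String>> segments = new ArrayList<>();

        @Override
        public void push(List<String> segment) {
            segments.add(new ArrayList<>(segment));
        }
    }

    /**
     * Indexes one batch. The "commit" turns the batch into a segment, which is
     * pushed before this method returns; returning is the ack to the client.
     */
    static boolean indexBatch(List<String> docs, SharedStore store) {
        List<String> segment = new ArrayList<>(docs); // commit: one segment per batch
        store.push(segment);                          // client waits for the push
        return true;                                  // ack only after the push
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        indexBatch(List.of("doc1", "doc2"), store);
        indexBatch(List.of("doc3"), store);
        indexBatch(List.of("doc4", "doc5"), store);
        // One push and one small segment per batch.
        System.out.println("segments pushed: " + store.segments.size());
    }
}
```

In this model the number of pushes and of small segments grows with the number 
of batches, and the client's wait always includes the push.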




[jira] [Commented] (SOLR-13101) Shared storage support in SolrCloud

2020-05-18 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17110552#comment-17110552
 ] 

ASF subversion and git services commented on SOLR-13101:


Commit ad3f0a30181aa4d7ba099ee67fd45fa98cf9c2c0 in lucene-solr's branch 
refs/heads/jira/SOLR-13101 from Bilal Waheed
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=ad3f0a3 ]

SOLR-13101: SHARED replica's distributed indexing (#1430)

* SOLR-13101: SHARED replica's distributed indexing
The basic purpose of this change was to refactor most of the SHARED replica 
logic out of DistributedZkUpdateProcessor. Along with that refactoring I came 
across a couple of functional issues in the logic that have been fixed too.

Functional fixes:
-If a replica loses its leadership in the middle of an indexing batch, it 
should still push its changes to the shared store.
-A SHARED replica does not need to process soft commits and does not need to 
broadcast hard commits to all the shards of a collection.
-Previously we would pull from the shared store even when the doc being 
added/deleted was not meant for the current core/shard.
-Previously DistributedZkUpdateProcessor#processDelete was running its pull 
logic even before the request had been set up 
(DistributedZkUpdateProcessor#setupRequest).
-DistributedZkUpdateProcessorTest has been deleted in favor of the new 
SharedStoreDistributedIndexingTest and SharedCoreIndexingBatchProcessorTest.

Refactoring:
-Most of DistributedZkUpdateProcessor's SHARED replica logic is moved into a 
new SharedCoreIndexingBatchProcessor. The purpose of this class is to pull from 
the shared store at the start of an indexing batch (if the core is stale) and 
push to the shared store at the end of a successfully committed indexing batch.
-CoreUpdateTracker has been deleted and its only method, 
persistShardIndexToSharedStore, has been renamed to pushCoreToSharedStore and 
moved to CorePusher.
-BlobStoreUtilsTest#syncLocalCoreWithSharedStore is renamed to 
pullCoreFromSharedStore and moved into a new CorePuller class and the tests to 
CorePullerTests.
-I did rename the phrase "blob store" to "shared store" in some places in the 
changed classes, but it was not meant to be an exhaustive attempt.

* -Throw error for pull request for an unknown.
-Log warning when indexing a non-active shard.

* Address CR feedback.
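
The pull-then-push flow that the commit message attributes to 
SharedCoreIndexingBatchProcessor can be sketched as follows. This is a 
simplified illustration, not the code from the branch: SharedStoreSketch, 
LocalCore, the integer generation counter, and processBatch are all 
assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch: pull if stale at batch start, push after a successful commit. */
public class BatchProcessorSketch {

    /** Hypothetical shared store holding the latest index generation and its docs. */
    static class SharedStoreSketch {
        int generation = 0;
        List<String> docs = new ArrayList<>();
    }

    /** Hypothetical local core that may lag behind the shared store. */
    static class LocalCore {
        int generation = 0;
        List<String> docs = new ArrayList<>();
        int pulls = 0;
    }

    static void processBatch(LocalCore core, SharedStoreSketch store, List<String> batch) {
        // Start of batch: pull only when the local core is stale.
        if (core.generation < store.generation) {
            core.docs = new ArrayList<>(store.docs);
            core.generation = store.generation;
            core.pulls++;
        }
        // Index and commit locally.
        core.docs.addAll(batch);
        core.generation++;
        // End of a successfully committed batch: push to the shared store.
        store.docs = new ArrayList<>(core.docs);
        store.generation = core.generation;
    }

    public static void main(String[] args) {
        SharedStoreSketch store = new SharedStoreSketch();
        LocalCore a = new LocalCore();
        LocalCore b = new LocalCore();
        processBatch(a, store, List.of("doc1"));
        processBatch(b, store, List.of("doc2")); // b is stale, so it pulls first
        System.out.println(store.docs + " pulls by b: " + b.pulls);
    }
}
```

The sketch ignores failure handling and concurrent writers, both of which the 
real processor would have to address.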




[jira] [Commented] (SOLR-13101) Shared storage support in SolrCloud

2020-05-18 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17110553#comment-17110553
 ] 

ASF subversion and git services commented on SOLR-13101:


Commit ad3f0a30181aa4d7ba099ee67fd45fa98cf9c2c0 in lucene-solr's branch 
refs/heads/jira/SOLR-13101 from Bilal Waheed
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=ad3f0a3 ]

SOLR-13101: SHARED replica's distributed indexing (#1430)

* SOLR-13101: SHARED replica's distributed indexing
The basic purpose of this change was to refactor most of the SHARED replica 
logic out of DistributedZkUpdateProcessor. Along with that refactoring I came 
across a couple of functional issues in the logic that have been fixed too.

Functional fixes:
-If a replica loses its leadership in the middle of an indexing batch, it 
should still push its changes to the shared store.
-A SHARED replica does not need to process soft commits and does not need to 
broadcast hard commits to all the shards of a collection.
-Previously we would pull from the shared store even when the doc being 
added/deleted is not meant for the current core/shard.
-Previously DistributedZkUpdateProcessor#processDelete was running its pull 
logic even before the request had been set up 
(DistributedZkUpdateProcessor#setupRequest).
-DistributedZkUpdateProcessorTest has been deleted in favor of the new 
SharedStoreDistributedIndexingTest and SharedCoreIndexingBatchProcessorTest.

Refactoring:
-Most of DistributedZkUpdateProcessor's SHARED replica logic is moved into a 
new SharedCoreIndexingBatchProcessor. The purpose of this class is to pull from 
the shared store at the start of an indexing batch (if the core is stale) and 
push to the shared store at the end of a successfully committed indexing batch.
-CoreUpdateTracker has been deleted and its only method, 
persistShardIndexToSharedStore, has been renamed to pushCoreToSharedStore and 
moved to CorePusher.
-BlobStoreUtilsTest#syncLocalCoreWithSharedStore is renamed to 
pullCoreFromSharedStore and moved into a new CorePuller class, and the tests 
are moved to CorePullerTests.
-I renamed the phrase "blob store" to "shared store" in some places in the 
changed classes, but this was not meant to be an exhaustive attempt.

* -Throw error for pull request for an unknown.
-Log warning when indexing a non-active shard.

* Address CR feedback.
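
The batch lifecycle described in the refactoring notes above (pull from the 
shared store at the start of an indexing batch if the core is stale, push at 
the end of a successfully committed batch) could be sketched roughly as below. 
This is an illustration only, not the code from the branch: SharedStoreClient, 
IndexingBatchSketch, RecordingStore and all of their methods are hypothetical 
names introduced for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for shared store interactions; not the Solr API. */
interface SharedStoreClient {
    boolean isLocalCoreStale(String coreName);
    void pullCore(String coreName);
    void pushCore(String coreName);
}

/** Illustrative batch lifecycle: pull if stale at the start, push after a successful commit. */
class IndexingBatchSketch {
    private final SharedStoreClient store;
    private final String coreName;
    private boolean started = false;

    IndexingBatchSketch(SharedStoreClient store, String coreName) {
        this.store = store;
        this.coreName = coreName;
    }

    /** Called before each document is indexed; only the first call may trigger a pull. */
    void beforeIndexing() {
        if (!started) {
            started = true;
            if (store.isLocalCoreStale(coreName)) {
                store.pullCore(coreName);
            }
        }
    }

    /** Called at the end of the batch; pushes only if the batch started and the commit succeeded. */
    void finish(boolean commitSucceeded) {
        if (started && commitSucceeded) {
            store.pushCore(coreName);
        }
    }
}

/** Test double that records the calls it receives. */
class RecordingStore implements SharedStoreClient {
    final List<String> calls = new ArrayList<>();
    boolean stale = true;

    public boolean isLocalCoreStale(String coreName) { return stale; }
    public void pullCore(String coreName) { calls.add("pull:" + coreName); stale = false; }
    public void pushCore(String coreName) { calls.add("push:" + coreName); }
}
```

Note that the sketch does not check leadership before pushing, in line with 
the functional fix above that a replica losing leadership mid-batch should 
still push its changes.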

> Shared storage support in SolrCloud
> ---
>
> Key: SOLR-13101
> URL: https://issues.apache.org/jira/browse/SOLR-13101
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Reporter: Yonik Seeley
>Priority: Major
>  Time Spent: 15h 50m
>  Remaining Estimate: 0h
>



[jira] [Commented] (SOLR-13101) Shared storage support in SolrCloud

2020-05-18 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17110541#comment-17110541
 ] 

ASF subversion and git services commented on SOLR-13101:


Commit 738dbb35c7c1db82141d90b200c0bc69c6fa1b9b in lucene-solr's branch 
refs/heads/jira/SOLR-13101 from Andy Vuong
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=738dbb3 ]

SOLR-13101: Enable shared store via system property only (#1424)



> Shared storage support in SolrCloud
> ---
>
> Key: SOLR-13101
> URL: https://issues.apache.org/jira/browse/SOLR-13101
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Reporter: Yonik Seeley
>Priority: Major
>  Time Spent: 15.5h
>  Remaining Estimate: 0h
>



[jira] [Commented] (SOLR-13101) Shared storage support in SolrCloud

2020-03-24 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17065823#comment-17065823
 ] 

ASF subversion and git services commented on SOLR-13101:


Commit ca6dc53c78b897a7f402fa1681d825c5cd7b412f in lucene-solr's branch 
refs/heads/jira/SOLR-13101 from Andy Vuong
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=ca6dc53 ]

SOLR-13101: Make dir hash computation optional and resilient (#1359)

* Make dir hash computation optional and resilient

* Throw IllegalStateException and update comments

* Delete unused method, fix typo

> Shared storage support in SolrCloud
> ---
>
> Key: SOLR-13101
> URL: https://issues.apache.org/jira/browse/SOLR-13101
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Reporter: Yonik Seeley
>Priority: Major
>  Time Spent: 13.5h
>  Remaining Estimate: 0h
>



[jira] [Commented] (SOLR-13101) Shared storage support in SolrCloud

2020-02-04 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17030317#comment-17030317
 ] 

ASF subversion and git services commented on SOLR-13101:


Commit 51bf98cf8367a67d5d17fec6e172625967221cb4 in lucene-solr's branch 
refs/heads/jira/SOLR-13101 from ebehrendt
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=51bf98c ]

SOLR-13101: Convert nanotime to ms (#1208)

* Convert nanoseconds to milliseconds

* Clean up from incorrect merge

* Incorporate CR feedback to move time to a utility function. Use 
java.util.concurrent.TimeUnit to convert from nanoseconds to milliseconds.

* Rename method from getCurrentNanoTimeInMs to getCurrentTimeMs
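
For reference, a nanosecond-to-millisecond conversion with TimeUnit might look 
like the sketch below. The method name getCurrentTimeMs comes from the commit 
message; the TimeSketch class and the nanosToMillis helper are hypothetical 
and only illustrate the conversion, not the actual utility in the branch.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical utility class; only the conversion itself is the point here. */
class TimeSketch {
    /** Converts a nanosecond value to whole milliseconds (the remainder is truncated). */
    static long nanosToMillis(long nanos) {
        return TimeUnit.NANOSECONDS.toMillis(nanos);
    }

    /**
     * Reads System.nanoTime() and expresses it in milliseconds.
     * The value is only meaningful for measuring elapsed time, not as wall-clock time.
     */
    static long getCurrentTimeMs() {
        return nanosToMillis(System.nanoTime());
    }
}
```

Elapsed time is then the difference between two getCurrentTimeMs() readings 
taken in the same JVM.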


> Shared storage support in SolrCloud
> ---
>
> Key: SOLR-13101
> URL: https://issues.apache.org/jira/browse/SOLR-13101
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Reporter: Yonik Seeley
>Priority: Major
>  Time Spent: 11h 40m
>  Remaining Estimate: 0h
>



[jira] [Commented] (SOLR-13101) Shared storage support in SolrCloud

2020-01-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17023628#comment-17023628
 ] 

ASF subversion and git services commented on SOLR-13101:


Commit 5c797bfa31063a6532e223a18db16c0a2dc8effe in lucene-solr's branch 
refs/heads/jira/SOLR-13101 from ebehrendt
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=5c797bf ]

SOLR-13101: Log accurate file counts for Push and Pull in CorePushPull (#1195)

* CorePushPull blob interaction log line inaccurate in case of failures. Fix 
logged file and bytes count to be accurate in both success and failure

* Move file transfer count data into inner class to preserve incremented values 
when exception is thrown
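
The idea of keeping transfer counts in a separate object, so that values 
incremented before a failure are still available for logging, can be sketched 
as follows. This is an assumption-laden illustration rather than the 
CorePushPull code: TransferSketch, Counters, transfer and transferAndLog are 
hypothetical names, and files are represented only by their sizes.

```java
/** Hypothetical sketch of counting transferred files and bytes across a failure. */
class TransferSketch {
    /** Mutable holder so counts survive an exception thrown mid-transfer. */
    static final class Counters {
        int files;
        long bytes;
    }

    /** "Transfers" each file, represented only by its size; a negative size simulates a failure. */
    static void transfer(long[] fileSizes, Counters counters) {
        for (long size : fileSizes) {
            if (size < 0) {
                throw new IllegalStateException("simulated transfer failure");
            }
            counters.files++;
            counters.bytes += size;
        }
    }

    /** Returns a log line that reflects the counts reached, whether or not the transfer failed. */
    static String transferAndLog(long[] fileSizes) {
        Counters counters = new Counters();
        boolean success = false;
        try {
            transfer(fileSizes, counters);
            success = true;
        } catch (IllegalStateException e) {
            // Swallowed to keep the sketch short; real code would likely rethrow after logging.
        }
        return "success=" + success + " files=" + counters.files + " bytes=" + counters.bytes;
    }
}
```

Because the counters live outside the method that throws, the log line reports 
the files and bytes actually transferred before the failure.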


> Shared storage support in SolrCloud
> ---
>
> Key: SOLR-13101
> URL: https://issues.apache.org/jira/browse/SOLR-13101
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Reporter: Yonik Seeley
>Priority: Major
>  Time Spent: 11h 10m
>  Remaining Estimate: 0h
>



[jira] [Commented] (SOLR-13101) Shared storage support in SolrCloud

2019-12-29 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17005160#comment-17005160
 ] 

David Smiley commented on SOLR-13101:
-

The issue definitely deserves a 
[SIP|https://cwiki.apache.org/confluence/display/SOLR/Solr+Improvement+Proposals]!

> Shared storage support in SolrCloud
> ---
>
> Key: SOLR-13101
> URL: https://issues.apache.org/jira/browse/SOLR-13101
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Reporter: Yonik Seeley
>Priority: Major
>  Time Spent: 8h 40m
>  Remaining Estimate: 0h
>



[jira] [Commented] (SOLR-13101) Shared storage support in SolrCloud

2019-12-26 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17003748#comment-17003748
 ] 

ASF subversion and git services commented on SOLR-13101:


Commit ef01979c6484012e1369cb2aedf927fd58152709 in lucene-solr's branch 
refs/heads/jira/SOLR-13101 from Megan Carey
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=ef01979 ]

SOLR-13101: ant precommit fixes (#1117)

* Fix the gson version reference

* Fixed all precommit failures

Co-authored-by: Andy Vuong 


> Shared storage support in SolrCloud
> ---
>
> Key: SOLR-13101
> URL: https://issues.apache.org/jira/browse/SOLR-13101
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Reporter: Yonik Seeley
>Priority: Major
>  Time Spent: 8h 40m
>  Remaining Estimate: 0h
>



[jira] [Commented] (SOLR-13101) Shared storage support in SolrCloud

2019-12-25 Thread Ishan Chattopadhyaya (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17003169#comment-17003169
 ] 

Ishan Chattopadhyaya commented on SOLR-13101:
-

Sorry, haven't looked into the branch yet.
bq. Yes, the fundamentals (the new SHARED replica type) need to be in core. 
Specific storage options (such as S3) are already designed to be pluggable.
This means that the fundamentals stay in solr-core and the specific pluggable 
storage options go into separate modules (contribs?), right?

If that's the case, would it be possible (for easier reviewing) to please split 
this up into two separate PRs, i.e. one for the solr-core changes, and another 
PR for the separate modules?

> Shared storage support in SolrCloud
> ---
>
> Key: SOLR-13101
> URL: https://issues.apache.org/jira/browse/SOLR-13101
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Reporter: Yonik Seeley
>Priority: Major
>  Time Spent: 8h 10m
>  Remaining Estimate: 0h
>



[jira] [Commented] (SOLR-13101) Shared storage support in SolrCloud

2019-12-23 Thread Noble Paul (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17002508#comment-17002508
 ] 

Noble Paul commented on SOLR-13101:
---

I would love to see a few more details.

Is it a standard Solr plugin that I can define in solrconfig.xml? Can it be 
configured through a remote API?
If yes, which one?
If not, let's have a separate discussion.

What are the public touch points? 
* remote APIs
* configurations
* files created/used in ZK/filesystem

We need to make every new addition to Solr easily digestible to a casual 
observer.



> Shared storage support in SolrCloud
> ---
>
> Key: SOLR-13101
> URL: https://issues.apache.org/jira/browse/SOLR-13101
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Reporter: Yonik Seeley
>Priority: Major
>  Time Spent: 8h
>  Remaining Estimate: 0h
>



[jira] [Commented] (SOLR-13101) Shared storage support in SolrCloud

2019-12-22 Thread Yonik Seeley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001982#comment-17001982
 ] 

Yonik Seeley commented on SOLR-13101:
-

bq. Is there any reason why we can't release this as a third-party plugin

Yes, the fundamentals (the new SHARED replica type) need to be in core.  
Specific storage options (such as S3) are already designed to be pluggable.
Another point is that deployment in public cloud environments is now the norm, 
as is separation of compute and storage.  It deserves first-class support.

> Shared storage support in SolrCloud
> ---
>
> Key: SOLR-13101
> URL: https://issues.apache.org/jira/browse/SOLR-13101
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Reporter: Yonik Seeley
>Priority: Major
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>



[jira] [Commented] (SOLR-13101) Shared storage support in SolrCloud

2019-12-21 Thread Noble Paul (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001807#comment-17001807
 ] 

Noble Paul commented on SOLR-13101:
---

Is there any reason why we can't release this as a third-party plugin using 
the new package manager instead of pushing this to the Solr codebase?

 

> Shared storage support in SolrCloud
> ---
>
> Key: SOLR-13101
> URL: https://issues.apache.org/jira/browse/SOLR-13101
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Reporter: Yonik Seeley
>Priority: Major
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>



[jira] [Commented] (SOLR-13101) Shared storage support in SolrCloud

2019-12-21 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001717#comment-17001717
 ] 

ASF subversion and git services commented on SOLR-13101:


Commit 036ce539525aa9c0eb0270d94e4105cd1a0313e9 in lucene-solr's branch 
refs/heads/jira/SOLR-13101 from Bilal Waheed
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=036ce53 ]

SOLR-13101: Concurrency tests for SHARED collection. (#1081)

* SOLR-13101: Concurrency tests for SHARED collection.

-Concurrency tests for SHARED collection.
-On core container shutdown, shutdown the background blob delete manager too.
-Use correct casing for SharedCoreStage enum.
-Added INDEXING_BATCH_FINISHED to indicate the end of a batch 
(BLOB_PUSH_FINISHED does not help if push itself runs into error)

* -CR feedback.
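
The reason for a separate end-of-batch stage can be illustrated with a small 
sketch: if the push fails, a push-finished notification never fires, whereas a 
batch-finished notification emitted from a finally block always does. The 
constant names BLOB_PUSH_FINISHED and INDEXING_BATCH_FINISHED come from the 
commit message; the Stage enum, StageSketch class and runBatch method are 
hypothetical and do not reproduce the SharedCoreStage code in the branch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical subset of stages; constant names taken from the commit message. */
enum Stage { BLOB_PUSH_FINISHED, INDEXING_BATCH_FINISHED }

class StageSketch {
    /** Runs a fake batch and returns the stages observed, in order. */
    static List<Stage> runBatch(boolean pushFails) {
        List<Stage> seen = new ArrayList<>();
        try {
            if (pushFails) {
                throw new IllegalStateException("simulated push failure");
            }
            seen.add(Stage.BLOB_PUSH_FINISHED);
        } catch (IllegalStateException e) {
            // Swallowed to keep the sketch short; real code would likely propagate the error.
        } finally {
            // Always recorded, so observers (for example tests) can detect the end of the batch.
            seen.add(Stage.INDEXING_BATCH_FINISHED);
        }
        return seen;
    }
}
```

A concurrency test waiting on the end of a batch can then wait for 
INDEXING_BATCH_FINISHED regardless of whether the push succeeded.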





[jira] [Commented] (SOLR-13101) Shared storage support in SolrCloud

2019-12-09 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991863#comment-16991863
 ] 

ASF subversion and git services commented on SOLR-13101:


Commit 1a6e1e6ebcddc73c488c1eaee0201cb2fa518109 in lucene-solr's branch 
refs/heads/jira/SOLR-13101 from Bilal Waheed
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=1a6e1e6 ]

SOLR-13101: Don't invoke push to shared store for isolated commits. (#1065)

-Disable the noisy "going back to task queue" log line.




[jira] [Commented] (SOLR-13101) Shared storage support in SolrCloud

2019-11-26 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982769#comment-16982769
 ] 

ASF subversion and git services commented on SOLR-13101:


Commit db72b8b2061875bd421e1a657caecbca0c922817 in lucene-solr's branch 
refs/heads/jira/SOLR-13101 from Andy Vuong
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=db72b8b ]

SOLR-13101: Address flakiness of tests using async pulls and handle interrupt 
properly (#1029)






[jira] [Commented] (SOLR-13101) Shared storage support in SolrCloud

2019-11-21 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979713#comment-16979713
 ] 

ASF subversion and git services commented on SOLR-13101:


Commit 9201c586ba05537a049e6406070361cdda561a4c in lucene-solr's branch 
refs/heads/jira/SOLR-13101 from Bilal Waheed
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=9201c58 ]

SOLR-13101: These are pretty much non-functional changes, related to SHARED 
replica concurrent updates (#1028)

https://github.com/apache/lucene-solr/pull/983/commits/581f468f9914ce2488201efbed42fd43dc4b481
- The pull read lock is only required before local indexing starts. If indexing 
has to pull, it can acquire and release the write lock on its own, without 
taking the read lock first. Removed the unneeded lock upgrade/downgrade logic 
to simplify things.
- Added a one-page explanation at the start of SharedCoreConcurrencyController 
covering the overall concurrency design.
- Added more comments around BlobCoreMetadata#generation number and its usage.
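
The locking order described in the first item might look roughly like the sketch below. It is a simplified illustration under assumptions, not the code in SharedCoreConcurrencyController: the class, the fields, and the freshness flag are hypothetical. It uses java.util.concurrent's ReentrantReadWriteLock, which does not support upgrading a held read lock to the write lock, one reason to avoid upgrade logic altogether.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of "pull under the write lock, then index under the read lock".
class PullLockSketch {
    private final ReentrantReadWriteLock pullLock = new ReentrantReadWriteLock();
    private volatile boolean localIndexFresh = false;
    final AtomicInteger pulls = new AtomicInteger();
    final AtomicInteger batchesIndexed = new AtomicInteger();

    void indexBatch() {
        if (!localIndexFresh) {
            // Pull phase: take and release the write lock without holding the read lock.
            pullLock.writeLock().lock();
            try {
                if (!localIndexFresh) {   // re-check: another thread may have pulled already
                    pulls.incrementAndGet();
                    localIndexFresh = true;
                }
            } finally {
                pullLock.writeLock().unlock();
            }
        }
        // Indexing phase: the read lock is only taken before local indexing starts.
        pullLock.readLock().lock();
        try {
            batchesIndexed.incrementAndGet();
        } finally {
            pullLock.readLock().unlock();
        }
    }
}
```

Because the two phases never overlap on one thread, no lock upgrade or downgrade is needed. The sketch leaves out how a real core would become stale again after the pull.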




[jira] [Commented] (SOLR-13101) Shared storage support in SolrCloud

2019-11-14 Thread Andy Vuong (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974515#comment-16974515
 ] 

Andy Vuong commented on SOLR-13101:
---

We still need to work on adding some documentation to the ref-guide on how to 
configure/use the feature from an end-user perspective. I suppose a doc 
covering public interfaces, additions to ZK, and the overall design would be 
useful for Solr developers as well.

 




[jira] [Commented] (SOLR-13101) Shared storage support in SolrCloud

2019-10-30 Thread Noble Paul (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963477#comment-16963477
 ] 

Noble Paul commented on SOLR-13101:
---

Is it possible to give an example of how one would use this new feature?

For example: the workflow, APIs, public interfaces, new additions to ZK, etc.




[jira] [Commented] (SOLR-13101) Shared storage support in SolrCloud

2019-10-30 Thread Yonik Seeley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963450#comment-16963450
 ] 

Yonik Seeley commented on SOLR-13101:
-

The branch https://github.com/apache/lucene-solr/tree/jira/SOLR-13101 has been 
brought up to date by this PR:
https://github.com/apache/lucene-solr/pull/983
None of the commits mentioned this JIRA, which is why I think this issue wasn't 
automatically updated.




[jira] [Commented] (SOLR-13101) Shared storage support in SolrCloud

2019-10-29 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16962203#comment-16962203
 ] 

ASF subversion and git services commented on SOLR-13101:


Commit 0e66beca46d193fed0fc864d55487961ce7f4cc5 in lucene-solr's branch 
refs/heads/jira/SOLR-13101 from Yonik Seeley
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=0e66bec ]

SOLR-13101: disable SharedStorageSplitTest.testLiveSplit





[jira] [Commented] (SOLR-13101) Shared storage support in SolrCloud

2019-10-29 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16962202#comment-16962202
 ] 

ASF subversion and git services commented on SOLR-13101:


Commit 619dde553a0510a5380691bbe6037d0dcdee46c3 in lucene-solr's branch 
refs/heads/jira/SOLR-13101 from Yonik Seeley
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=619dde5 ]

SOLR-13101: merge 8.3 branch





[jira] [Commented] (SOLR-13101) Shared storage support in SolrCloud

2019-10-29 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16962201#comment-16962201
 ] 

ASF subversion and git services commented on SOLR-13101:


Commit 57245e9c93caa2173deede6f28dc29b6105e60f5 in lucene-solr's branch 
refs/heads/jira/SOLR-13101 from Yonik Seeley
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=57245e9 ]

Merge branch 'branch_8_3' into jira/SOLR-13101





[jira] [Commented] (SOLR-13101) Shared storage support in SolrCloud

2019-10-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942991#comment-16942991
 ] 

ASF subversion and git services commented on SOLR-13101:


Commit 8a34ce0257cd48ad2c65a94ace2d9d3e8d102f60 in lucene-solr's branch 
refs/heads/jira/SOLR-13101 from Yonik Seeley
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=8a34ce0 ]

SOLR-13101: fix test compilation





[jira] [Commented] (SOLR-13101) Shared storage support in SolrCloud

2019-10-02 Thread Yonik Seeley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942890#comment-16942890
 ] 

Yonik Seeley commented on SOLR-13101:
-

bq. How far along do you think it is? Do you foresee a lot more work going in 
here? Or do you suggest we start reviewing it and attempt to merge it soon (in 
a week or so)?

I think it's got a bit more to go.  It would be nice if the behavior matched 
normal Solr semantics a little more closely; it would then be easier to get 
better test coverage by reusing existing tests and changing the replica type.  
Some things off the top of my head:
 - a commit doesn't cause the latest changes to be visible on replicas (a query 
on a non-leader replica actually causes an async pull from blob of the latest 
index)
 - there are currently some concurrency issues with index pushing
 - I 
I need to dig into the code in general more... as you can see from the commits 
on the branch, this work was all done by my colleagues, not me.  But we're 
working on encouraging more open development!
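
The first point in the list above (a query on a non-leader replica triggers an asynchronous pull) could be pictured roughly like this. It is a toy model under assumptions, not the branch's code: the class and field names are hypothetical, and plain long values stand in for index versions.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of a non-leader replica; longs stand in for index versions.
class NonLeaderReplicaSketch {
    private final AtomicLong sharedStoreVersion; // latest version the leader has pushed
    private volatile long localVersion = 0;      // version this replica can currently serve
    volatile CompletableFuture<Void> lastPull = CompletableFuture.completedFuture(null);

    NonLeaderReplicaSketch(AtomicLong sharedStoreVersion) {
        this.sharedStoreVersion = sharedStoreVersion;
    }

    // Answers from the local index right away, then starts a background pull.
    long query() {
        long served = localVersion;
        lastPull = CompletableFuture.runAsync(() -> localVersion = sharedStoreVersion.get());
        return served;
    }
}
```

In this model the first query after a push still sees the old version, and only a later query sees the new one. That gap is why a commit alone does not make the latest changes visible on replicas, unlike normal Solr semantics.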





[jira] [Commented] (SOLR-13101) Shared storage support in SolrCloud

2019-09-30 Thread Ishan Chattopadhyaya (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16941362#comment-16941362
 ] 

Ishan Chattopadhyaya commented on SOLR-13101:
-

Can we collaborate over the ASF Slack to discuss harmonizing the 3 blob 
stores? I am okay with having all three, if they serve different use cases; we 
just need a cohesive and consistent story around them in terms of 
documentation.

bq. I plan on creating a branch jira/SOLR-13101 soon for future work on this 
issue.
How far along do you think it is? Do you foresee a lot more work going in 
here? Or do you suggest we start reviewing it and attempt to merge it soon (in 
a week or so)?




[jira] [Commented] (SOLR-13101) Shared storage support in SolrCloud

2019-09-27 Thread Yonik Seeley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16939809#comment-16939809
 ] 

Yonik Seeley commented on SOLR-13101:
-

I plan on creating a branch jira/SOLR-13101 soon for future work on this issue.

> Shared storage support in SolrCloud
> ---
>
> Key: SOLR-13101
> URL: https://issues.apache.org/jira/browse/SOLR-13101
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Reporter: Yonik Seeley
>Priority: Major
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h






[jira] [Commented] (SOLR-13101) Shared storage support in SolrCloud

2019-09-25 Thread Noble Paul (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938084#comment-16938084
 ] 

Noble Paul commented on SOLR-13101:
---

We can have multiple file share systems. Just use a different name so that we 
don't confuse the users.

> Shared storage support in SolrCloud
> ---
>
> Key: SOLR-13101
> URL: https://issues.apache.org/jira/browse/SOLR-13101
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Reporter: Yonik Seeley
>Priority: Major
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h






[jira] [Commented] (SOLR-13101) Shared storage support in SolrCloud

2019-09-25 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937998#comment-16937998
 ] 

David Smiley commented on SOLR-13101:
-

Solr already has a [Blob 
Store|https://lucene.apache.org/solr/guide/8_0/blob-store-api.html] for jars 
(plugins).  And as Noble points out, SOLR-13710 introduces _a second_ Blob 
Store that appears duplicative of the former, committed to 8.x. 
 Eventually either could be used for not just plugins but resources (e.g. 
language models, etc.) generally.  Does it make sense to use the same name 
"blob store" for index data?  That would imply not just a common name but some 
common APIs as well that work seamlessly.  I'm not sure if these use cases fit 
well together or not.  If we separate them, I suggest we abandon this nebulous 
word "blob" and be more specific – a "Resource Store" and an "Index Store".  
What do others think?

> Shared storage support in SolrCloud
> ---
>
> Key: SOLR-13101
> URL: https://issues.apache.org/jira/browse/SOLR-13101
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Reporter: Yonik Seeley
>Priority: Major
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h






[jira] [Commented] (SOLR-13101) Shared storage support in SolrCloud

2019-09-19 Thread Noble Paul (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16933641#comment-16933641
 ] 

Noble Paul commented on SOLR-13101:
---

The new package management system already has a blob store. We can probably 
combine these efforts to avoid confusion.

> Shared storage support in SolrCloud
> ---
>
> Key: SOLR-13101
> URL: https://issues.apache.org/jira/browse/SOLR-13101
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Reporter: Yonik Seeley
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h


