[jira] [Commented] (SOLR-16153) Clone Collection

David Smiley (Jira) Tue, 12 Apr 2022 14:40:05 -0700


    [ https://issues.apache.org/jira/browse/SOLR-16153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521345#comment-17521345 ]


David Smiley commented on SOLR-16153:
-------------------------------------

I imagine a sub-task would be a core-level clone: a new CoreAdmin operation, 
or perhaps a modification/addition to the existing core creation.  Using the 
replication handler would work to get the index, but it would do lots of 
needless IO since it copies the index files rather than linking them.  
MERGEINDEXES would also work, but it merges segments as part of its job, 
which is slow and wasteful, so it is even worse.  The actual Lucene-level 
call there is IndexWriter.addIndexes(CodecReader), but there's an 
IndexWriter.addIndexes(Directory) that is much more appropriate: it doesn't 
do any merges, and we can pass the HardlinkCopyDirectoryWrapper to make it do 
the hard links (splits use this too).  Great!  Unfortunately, it acquires a 
write lock on the source directory, and that lock will already be held by the 
running SolrIndexWriter on the source core.  We could do something similar to 
what SolrIndexSplitter.split does for the LINK method: it engages 
UpdateLog.bufferUpdates and then closes the index writer, so the core is 
effectively in a read-only state for a time.  If for some reason there is no 
updateLog, we can just use SolrCore.readOnly/indexEnabled as a substitute.
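
To make the hard-link idea concrete, here is a minimal sketch of 
"link, else copy" using only java.nio.file.  It is not 
HardlinkCopyDirectoryWrapper and it does not call IndexWriter.addIndexes; the 
class name HardLinkClone and the method cloneFiles are hypothetical names made 
up for this illustration.  It assumes the source directory is quiescent (no 
writer is adding or deleting files while it runs) and it skips Lucene's 
write.lock file, which belongs to the source writer.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper for illustration only; not part of Solr or Lucene. */
final class HardLinkClone {
    private HardLinkClone() {}

    /**
     * Makes every regular file in sourceDir available in targetDir, preferring
     * a hard link and falling back to a plain copy when linking fails (for
     * example, across file systems or where hard links are unsupported).
     *
     * @return the number of files that were hard-linked rather than copied
     */
    static int cloneFiles(Path sourceDir, Path targetDir) throws IOException {
        Files.createDirectories(targetDir);
        int linked = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(sourceDir)) {
            for (Path source : files) {
                if (!Files.isRegularFile(source)) {
                    continue; // an index directory is flat; ignore anything else
                }
                String name = source.getFileName().toString();
                if (name.equals("write.lock")) {
                    continue; // the lock file belongs to the source writer
                }
                Path target = targetDir.resolve(name);
                try {
                    Files.createLink(target, source); // cheap: no data is copied
                    linked++;
                } catch (UnsupportedOperationException | IOException e) {
                    // Assumption: any link failure is handled by copying instead.
                    Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
        return linked;
    }
}
```

Since Lucene segment files are written once and never modified, sharing them 
through hard links is safe in principle; deleting a file in one directory 
leaves the other link intact.  The lock and read-only handling described above 
would still be needed around a step like this.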

> Clone Collection
> ----------------
>
>                 Key: SOLR-16153
>                 URL: https://issues.apache.org/jira/browse/SOLR-16153
>             Project: Solr
>          Issue Type: New Feature
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCloud
>            Reporter: David Smiley
>            Priority: Major
>
> It should be possible to "clone" a collection, and to do so cheaply.  This 
> might be its own command or be an option of collection creation; either way, 
> it'd be great to support the vast majority of the same options of collection 
> creation.  A cloned collection should be the same in every way (shard ranges, 
> collection properties, replica types and counts, etc.) unless configured to 
> be different (e.g. specify a different configSet).  Most importantly, a 
> cloned collection should have the same data, and this can be accomplished via 
> UNIX hard links (when supported) to the underlying files.  This would make 
> clones cheap!  A no-data option should be supported as well, useful when the 
> intended action is to re-index subsequently.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

