[ https://issues.apache.org/jira/browse/SOLR-9055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15269135#comment-15269135 ]

David Smiley commented on SOLR-9055:
------------------------------------

(p.s. use {{bq.}} to quote)

bq. (me) I have a general question about HDFS; I have no real experience with 
it: I wonder if Java's NIO file abstractions could be used so we don't have to 
have separate code? If so it would be wonderful – simpler and less code to 
maintain. See https://github.com/damiencarol/jsr203-hadoop. What do you think?

bq. (Gadre) Although integrating HDFS and the Java NIO API sounds interesting, I 
would prefer it be provided directly by the HDFS client library, as opposed to a 
third-party library which may or may not be supported in the future. Also, since 
Solr provides an HDFS-backed Directory implementation, it probably makes sense 
to reuse it.

Any thoughts on this one [[email protected]] or [~gchanan] perhaps?
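
For concreteness, here is a rough sketch (not from any patch; the namenode URI, 
paths, and provider setup are assumptions) of what a backup file copy could look 
like if an {{hdfs://}} NIO FileSystemProvider such as jsr203-hadoop were on the 
classpath. The point is that the same {{java.nio.file.Files}} calls would serve 
both local and HDFS locations:

{code:java}
import java.io.IOException;
import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Collections;

// Hypothetical sketch: assumes an hdfs:// FileSystemProvider (e.g. from
// jsr203-hadoop) is registered; the namenode URI and paths are made up.
public class NioCopySketch {
  public static void main(String[] args) throws IOException {
    URI hdfsRoot = URI.create("hdfs://namenode:8020/");
    // Non-default providers typically need the FileSystem opened explicitly.
    try (FileSystem hdfs = FileSystems.newFileSystem(hdfsRoot,
                                                     Collections.<String, Object>emptyMap())) {
      Path src = Paths.get("/var/solr/data/collection1/index/segments_1");
      Path dst = hdfs.getPath("/backups/collection1/segments_1");
      // Identical Files API regardless of the backing file system.
      Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
    }
  }
}
{code}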

bq. However, if we want to keep things simple, we can choose not to provide 
separate APIs to configure "repositories". Instead we can just pick the same 
file system used to store the indexed data. That means in the case of a local 
file system, the backup will be stored on a shared file system using the 
SimpleFSDirectory implementation, AND for HDFS we will use the HdfsDirectory 
impl. Make sense?

I understand what you mean, but it seems a shame, and it loses the extensibility 
we want.  I think what this comes down to is: should we reuse the Lucene 
Directory API for moving data in/out of the backup location, or should we use 
something else?
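
To make the alternative concrete, here is a hypothetical sketch (the interface 
name and method signatures are made up, not taken from the attached patch) of a 
pluggable repository abstraction that moves files between a live index 
{{Directory}} and an arbitrary backup location:

{code:java}
import java.io.IOException;
import java.net.URI;
import org.apache.lucene.store.Directory;

// Hypothetical sketch only: illustrates a pluggable backup abstraction
// that is not tied to the Directory impl of the live index.
public interface BackupRepository {

  /** Whether this repository handles the given backup location's scheme. */
  boolean supports(URI location);

  /** Copy one index file out of the live index into the backup location. */
  void copyFileFrom(Directory sourceDir, String fileName, URI dest)
      throws IOException;

  /** Copy one file from the backup location back into an index Directory. */
  void copyFileTo(URI source, String fileName, Directory destDir)
      throws IOException;
}
{code}

A local implementation could delegate to {{SimpleFSDirectory}} and an HDFS one 
to {{HdfsDirectory}}, while still leaving the door open for S3 or anything else.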

bq. I think the main problem here is identifying the type of file system used 
for a given collection at the Overseer. (The Solr core, on the other hand, 
already has a Directory factory reference, so we can instantiate the appropriate 
directory in the snapshooter.)

It was suggested early in SOLR-5750 that the location param should have a 
protocol/impl scheme URL prefix (assume {{file://}} if not specified).  That 
may help the Overseer?  Or, if you mean it needs to know the directory impl of 
the live indexes, I imagine it could look that up the same way Solr's admin 
screen does (it shows the impl factory).
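
A tiny sketch of that scheme-prefix idea (the repository names here are 
illustrative only, not real Solr classes):

{code:java}
import java.net.URI;

// Hypothetical sketch of dispatching on the location param's URL scheme,
// defaulting to file:// when none is given. Repository names are made up.
public class LocationSchemeSketch {

  public static String resolveRepository(String location) {
    URI uri = URI.create(location);
    String scheme = (uri.getScheme() == null) ? "file" : uri.getScheme();
    switch (scheme) {
      case "file": return "LocalFileSystemRepository"; // SimpleFSDirectory-backed
      case "hdfs": return "HdfsBackupRepository";      // HdfsDirectory-backed
      default:
        throw new IllegalArgumentException("No repository for scheme: " + scheme);
    }
  }

  public static void main(String[] args) {
    System.out.println(resolveRepository("/backups/coll1"));         // no scheme -> file
    System.out.println(resolveRepository("hdfs://nn:8020/backups")); // hdfs
  }
}
{code}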


I doubt I'll have time to help much more here... I'm a bit behind on my 
workload.



> Make collection backup/restore extensible
> -----------------------------------------
>
>                 Key: SOLR-9055
>                 URL: https://issues.apache.org/jira/browse/SOLR-9055
>             Project: Solr
>          Issue Type: Task
>            Reporter: Hrishikesh Gadre
>            Assignee: Mark Miller
>         Attachments: SOLR-9055.patch
>
>
> SOLR-5750 implemented a backup/restore API for Solr. This JIRA is to track the 
> code cleanup/refactoring. Specifically, the following improvements should be made:
> - Add the Solr/Lucene version to check the compatibility between the backup 
> version and the version of Solr on which it is being restored.
> - Add a backup implementation version to check the compatibility between the 
> "restore" implementation and the backup format.
> - Introduce a Strategy interface to define how the Solr index data is backed 
> up (e.g. using a file copy approach).
> - Introduce a Repository interface to define the file system used to store 
> the backup data. (Currently it works only with the local file system, but it 
> can be extended.) This should be enhanced to introduce support for 
> "registering" repositories (e.g. HDFS, S3 etc.)


