Timothy Potter created SOLR-6305:
------------------------------------

             Summary: Ability to set the replication factor for index files 
created by HDFSDirectoryFactory
                 Key: SOLR-6305
                 URL: https://issues.apache.org/jira/browse/SOLR-6305
             Project: Solr
          Issue Type: Improvement
          Components: hdfs
         Environment: hadoop-2.2.0
            Reporter: Timothy Potter


HdfsFileWriter doesn't allow us to create files in HDFS with a different 
replication factor than the configured DFS default because it uses:     
{{FsServerDefaults fsDefaults = fileSystem.getServerDefaults(path);}}

Since two forms of replication are in play when using 
HDFSDirectoryFactory (SolrCloud replicas and HDFS block replication), it would 
be nice to be able to set the HDFS replication factor for the Solr directories 
to a lower value than the default. I realize this might reduce the chance of 
data locality, but since each Solr core has its own path in HDFS, we should 
give operators the option to reduce it.
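
To make the request concrete, here is a minimal sketch of how an override could be resolved, assuming a hypothetical property name ({{solr.hdfs.replication}}) that does not exist in Solr today. It deliberately avoids Hadoop imports so it stands alone; the resolved value would then be handed to a call such as {{FileSystem.create(path, overwrite, bufferSize, replication, blockSize)}} instead of taking the replication from {{FsServerDefaults}}.

```java
import java.util.Properties;

/** Hypothetical helper: picks the replication factor for new index files. */
final class ReplicationResolver {
    // Assumed property name for illustration only; not an existing Solr setting.
    static final String REPLICATION_KEY = "solr.hdfs.replication";

    private ReplicationResolver() {}

    /**
     * Returns the operator-supplied factor when one is configured, otherwise
     * the server default (for example, FsServerDefaults.getReplication()).
     */
    static short resolve(Properties config, short serverDefault) {
        String raw = config.getProperty(REPLICATION_KEY);
        if (raw == null || raw.trim().isEmpty()) {
            return serverDefault; // no override: keep today's behavior
        }
        short value;
        try {
            value = Short.parseShort(raw.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                REPLICATION_KEY + " must be a small integer, got: " + raw, e);
        }
        if (value < 1) {
            throw new IllegalArgumentException(
                REPLICATION_KEY + " must be >= 1, got: " + value);
        }
        return value;
    }
}
```

With no property set this returns the DFS default, so existing deployments would be unaffected; only operators who opt in would see a different factor.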

My first thought was to use {{hadoop fs -setrep}} to customize the replication 
factor, but that is a one-time operation and does not affect files created 
afterwards. For instance, I ran:

{{hadoop fs -setrep -R 1 solr49/coll1}}

(My default DFS replication is 3; I set it to 1 above just as an example.)

Then I added some more docs to coll1 and ran:

{{hadoop fs -stat %r solr49/hdfs1/core_node1/data/index/segments_3}}

3 <-- should be 1

So it looks like new files don't inherit the replication factor from their 
parent directory.

Not sure if we need to go as far as allowing a different replication factor per 
collection, but that should be considered if possible.
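
If per-collection settings were supported, one possible lookup order is: collection-specific value, then a global override, then the DFS default. The sketch below illustrates that fallback chain only; the class and method names are hypothetical and nothing like this exists in Solr as of this report.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical policy: per-collection factor, then global override, then DFS default. */
final class CollectionReplicationPolicy {
    private final Map<String, Short> perCollection = new HashMap<>();
    private final Short globalOverride; // null means "no global override"

    CollectionReplicationPolicy(Short globalOverride) {
        if (globalOverride != null && globalOverride < 1) {
            throw new IllegalArgumentException("replication must be >= 1");
        }
        this.globalOverride = globalOverride;
    }

    /** Registers a factor for one collection; returns this for chaining. */
    CollectionReplicationPolicy set(String collection, short factor) {
        if (factor < 1) {
            throw new IllegalArgumentException("replication must be >= 1");
        }
        perCollection.put(collection, factor);
        return this;
    }

    /** Resolves the factor to use when creating files for the given collection. */
    short replicationFor(String collection, short dfsDefault) {
        Short specific = perCollection.get(collection);
        if (specific != null) {
            return specific;
        }
        return globalOverride != null ? globalOverride : dfsDefault;
    }
}
```

For example, {{new CollectionReplicationPolicy(null).set("coll1", (short) 1)}} would give coll1 a factor of 1 while every other collection keeps the DFS default.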

I looked at the Hadoop 2.2.0 code to see if there was a way to work through 
this using the Configuration object but nothing jumped out at me ... and the 
implementation for getServerDefaults(path) is just:

  public FsServerDefaults getServerDefaults(Path p) throws IOException {
    return getServerDefaults();
  }

Path is ignored ;-)


