Timothy Potter created SOLR-6305:
------------------------------------
Summary: Ability to set the replication factor for index files
created by HDFSDirectoryFactory
Key: SOLR-6305
URL: https://issues.apache.org/jira/browse/SOLR-6305
Project: Solr
Issue Type: Improvement
Components: hdfs
Environment: hadoop-2.2.0
Reporter: Timothy Potter
HdfsFileWriter doesn't allow us to create files in HDFS with a different
replication factor than the configured DFS default because it uses:
{{FsServerDefaults fsDefaults = fileSystem.getServerDefaults(path);}}
Since we have two forms of replication going on when using
HDFSDirectoryFactory, it would be nice to be able to set the HDFS replication
factor for the Solr directories to a lower value than the default. I realize
this might reduce the chance of data locality but since Solr cores each have
their own path in HDFS, we should give operators the option to reduce it.
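A minimal sketch of how the replication factor might be resolved, assuming an operator-supplied override falls back to the server default when unset. The class name {{HdfsReplicationResolver}} and the key {{solr.hdfs.replication}} are hypothetical, invented here for illustration; neither exists in Solr or Hadoop. The resolved value would presumably be passed as the {{short replication}} argument of one of the {{FileSystem.create(...)}} overloads, instead of {{fsDefaults.getReplication()}}.

```java
import java.util.Map;

/** Hypothetical helper for illustration only; not part of Solr or Hadoop. */
final class HdfsReplicationResolver {

  /** Hypothetical configuration key, made up for this sketch. */
  static final String REPLICATION_KEY = "solr.hdfs.replication";

  private HdfsReplicationResolver() {}

  /**
   * Returns the configured replication factor if one is set,
   * otherwise the default reported by the DFS server.
   *
   * @param config        operator-supplied settings (may lack the key)
   * @param serverDefault value that FsServerDefaults.getReplication() would return
   */
  static short resolve(Map<String, String> config, short serverDefault) {
    String raw = config.get(REPLICATION_KEY);
    if (raw == null || raw.trim().isEmpty()) {
      // No override configured: keep today's behavior.
      return serverDefault;
    }
    // Short.parseShort throws NumberFormatException on malformed input.
    short value = Short.parseShort(raw.trim());
    if (value < 1) {
      throw new IllegalArgumentException(
          REPLICATION_KEY + " must be >= 1, got " + value);
    }
    return value;
  }
}
```

Falling back to the server default keeps existing deployments unchanged unless an operator opts in. A per-collection setting could reuse the same logic with a different source for the override.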
My original thinking was to just use Hadoop setrep to customize the replication
factor, but that is a one-time operation and does not affect files created
afterwards. For instance, I did:
{{hadoop fs -setrep -R 1 solr49/coll1}}
My default dfs replication is 3; I set it to 1 above just as an example.
I then added some more docs to coll1 and ran:
{{hadoop fs -stat %r solr49/hdfs1/core_node1/data/index/segments_3}}
3 <-- should be 1
So it looks like new files don't inherit the replication factor from their
parent directory.
I'm not sure we need to go as far as allowing a different replication factor
per collection, but that should be considered if possible.
I looked at the Hadoop 2.2.0 code to see if there was a way to work through
this using the Configuration object but nothing jumped out at me ... and the
implementation for getServerDefaults(path) is just:
  public FsServerDefaults getServerDefaults(Path p) throws IOException {
    return getServerDefaults();
  }
Path is ignored ;-)