[ https://issues.apache.org/jira/browse/SOLR-6305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15996083#comment-15996083 ]
Harsh J commented on SOLR-6305:
-------------------------------

[~thelabdude] is right here in the description BTW. Hadoop APIs let you pass an arbitrary replication value via the FileSystem.create API; when passed, it overrides the local default (the dfs.replication config). In Solr, the current usage effectively asks the NameNode for its default replication factor and then creates the file with that value, ignoring the local configuration. As a result, you cannot control the replication factor of Solr's index files without changing the default for the whole HDFS cluster.

> Ability to set the replication factor for index files created by HDFSDirectoryFactory
> -------------------------------------------------------------------------------------
>
>                 Key: SOLR-6305
>                 URL: https://issues.apache.org/jira/browse/SOLR-6305
>             Project: Solr
>          Issue Type: Improvement
>          Components: hdfs
>         Environment: hadoop-2.2.0
>            Reporter: Timothy Potter
>
> HdfsFileWriter doesn't allow us to create files in HDFS with a different replication factor than the configured DFS default because it uses:
> {{FsServerDefaults fsDefaults = fileSystem.getServerDefaults(path);}}
> Since we have two forms of replication going on when using HDFSDirectoryFactory, it would be nice to be able to set the HDFS replication factor for the Solr directories to a lower value than the default. I realize this might reduce the chance of data locality, but since Solr cores each have their own path in HDFS, we should give operators the option to reduce it.
> My original thinking was to just use Hadoop setrep to customize the replication factor, but that's a one-time shot and doesn't affect new files created.
> For instance, I did:
> {{hadoop fs -setrep -R 1 solr49/coll1}}
> My default dfs replication is set to 3 ^^ I'm setting it to 1 just as an example
> Then added some more docs to coll1 and did:
> {{hadoop fs -stat %r solr49/hdfs1/core_node1/data/index/segments_3}}
> 3 <-- should be 1
> So it looks like new files don't inherit the replication factor from their parent directory.
> Not sure if we need to go as far as allowing a different replication factor per collection, but that should be considered if possible.
> I looked at the Hadoop 2.2.0 code to see if there was a way to work through this using the Configuration object, but nothing jumped out at me ... and the implementation for getServerDefaults(path) is just:
> public FsServerDefaults getServerDefaults(Path p) throws IOException {
>   return getServerDefaults();
> }
> Path is ignored ;-)
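A minimal, self-contained Java sketch of the distinction described in the comment, under stated assumptions: it does not touch Hadoop at all. `ReplicationChoice`, `fromServerDefaults`, and `fromClientConfig` are hypothetical names; a plain `Map` stands in for the client-side Hadoop `Configuration`, and a `short` stands in for the replication that `getServerDefaults(path)` reports. The first method mirrors the behavior described in the issue (path and local config ignored). The second honors a client-side `dfs.replication`, i.e. the kind of value one could pass as the replication argument of `FileSystem.create`. Falling back to the server default when the key is unset is an assumption of this sketch, not confirmed Solr or Hadoop behavior.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model (not Solr or Hadoop code) of two ways a writer could pick
 * the replication factor it hands to FileSystem.create(...).
 */
final class ReplicationChoice {

    // Stand-in for the client-side Hadoop Configuration (key -> value).
    private final Map<String, String> clientConf;
    // Stand-in for the replication reported by the NameNode's server defaults.
    private final short serverDefaultReplication;

    ReplicationChoice(Map<String, String> clientConf, short serverDefaultReplication) {
        this.clientConf = clientConf;
        this.serverDefaultReplication = serverDefaultReplication;
    }

    // Behavior described in the issue: the path and the client config are both ignored.
    short fromServerDefaults(String path) {
        return serverDefaultReplication;
    }

    // Alternative: honor a client-side dfs.replication if it is set; otherwise fall
    // back to the server default (the fallback is an assumption of this sketch).
    // The path is unused here as well.
    short fromClientConfig(String path) {
        String value = clientConf.get("dfs.replication");
        if (value == null || value.trim().isEmpty()) {
            return serverDefaultReplication;
        }
        short replication = Short.parseShort(value.trim());
        if (replication < 1) {
            throw new IllegalArgumentException("dfs.replication must be >= 1, got " + replication);
        }
        return replication;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("dfs.replication", "1"); // operator wants a single HDFS copy for index files
        ReplicationChoice choice = new ReplicationChoice(conf, (short) 3); // cluster default is 3

        String segment = "solr49/hdfs1/core_node1/data/index/segments_3";
        System.out.println("server defaults: " + choice.fromServerDefaults(segment)); // prints "server defaults: 3"
        System.out.println("client config:   " + choice.fromClientConfig(segment)); // prints "client config:   1"
    }
}
```

With a cluster default of 3 and a client-side value of 1, the server-defaults path yields 3, which matches the `hadoop fs -stat %r` result in the description, while the client-config path yields the 1 the reporter was after.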