Hi, FYI,
A relevant jira, HDFS-6133, tries to tell the Balancer not to move around the
blocks stored at the favored nodes that the application selected. I reviewed
the patch, and the latest one looks good to me. Hope some committers can pick
it up and push it forward.

Thanks.

--Yongjun

On Fri, Dec 19, 2014 at 1:52 PM, Ananth Gundabattula <agundabatt...@gmail.com>
wrote:

> Hello Zhe,
>
> Thanks a lot for the inputs. Storage policies are really what I was looking
> for, for one of the problems.
>
> @Nick: I agree that it would be a nice feature to have. Thanks for the
> info.
>
> Regards,
> Ananth
>
> On Fri, Dec 19, 2014 at 10:49 AM, Nick Dimiduk <ndimi...@gmail.com> wrote:
>
> > HBase would enjoy similar functionality. In our case, we'd like all
> > replicas for all files in a given HDFS path to land on the same set of
> > machines. That way, in the event of a failover, regions can be assigned
> > to one of these other machines that has local access to all blocks for
> > all region files.
> >
> > On Thu, Dec 18, 2014 at 3:36 PM, Zhe Zhang <zhe.zhang.resea...@gmail.com>
> > wrote:
> >
> > > > The second aspect is that our queries are time based and this time
> > > > window follows a familiar pattern of old data not being queried much.
> > > > Hence we would like to preserve the most recent data in the HDFS
> > > > cache (Impala is helping us manage this aspect via their command set),
> > > > but we would like the next most recent data chunks to land on an SSD
> > > > that is present on every datanode. The remaining set of blocks, which
> > > > are "very old but in large quantities", would land on spinning disks.
> > > > The decision to choose a given volume is based on the file name, as
> > > > we can control the filename that is used to generate the file.
> > >
> > > Have you tried the 'setStoragePolicy' command? It's part of the HDFS
> > > "Heterogeneous Storage Tiers" work and seems to address your scenario.
> > >
> > > > 1. Is there a way to control that all file blocks belonging to a
> > > > particular hdfs directory & file go to the same physical datanode
> > > > (and their corresponding replicas as well?)
> > >
> > > This seems inherently hard: the file/dir could have more data than a
> > > single DataNode can host. Implementation-wise, it requires some sort
> > > of a map in BlockPlacementPolicy from inode or file path to DataNode
> > > address.
> > >
> > > My 2 cents..
> > >
> > > --
> > > Zhe Zhang
> > > Software Engineer, Cloudera
> > > https://sites.google.com/site/zhezhangresearch/
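
For anyone trying Zhe's suggestion, below is a minimal sketch of setting
storage policies programmatically. It assumes the
DistributedFileSystem.setStoragePolicy() API and the built-in ONE_SSD and
COLD policies from the Heterogeneous Storage work (Hadoop 2.6+); the paths
are made up for illustration.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class StoragePolicyExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);

    // Hypothetical paths: recent, frequently queried data gets one replica
    // on SSD; old data keeps all replicas on archival/spinning disks.
    dfs.setStoragePolicy(new Path("/warehouse/events/recent"), "ONE_SSD");
    dfs.setStoragePolicy(new Path("/warehouse/events/archive"), "COLD");

    dfs.close();
  }
}

Note that a policy only influences where newly written blocks land; blocks
that already exist have to be migrated separately with the Mover tool that
ships with the same feature.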
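
On the favored-nodes side (the mechanism HDFS-6133 and Nick's HBase use case
rely on), here is a rough sketch of the client API, assuming the
DistributedFileSystem.create() overload that accepts an InetSocketAddress[]
placement hint; the hostnames, the 50010 transfer port and the file path are
hypothetical.

import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class FavoredNodesExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);

    // Hypothetical datanodes; 50010 is the default DataNode transfer port
    // in Hadoop 2.x deployments.
    InetSocketAddress[] favoredNodes = new InetSocketAddress[] {
        new InetSocketAddress("dn1.example.com", 50010),
        new InetSocketAddress("dn2.example.com", 50010),
        new InetSocketAddress("dn3.example.com", 50010)
    };

    // The favored nodes are a hint only: the NameNode tries to place all
    // replicas there but can fall back to normal placement, and the
    // Balancer may later move the blocks away, which is the behaviour
    // HDFS-6133 proposes to change.
    FSDataOutputStream out = dfs.create(
        new Path("/data/region-0001/file"),
        FsPermission.getFileDefault(),
        true,                                       // overwrite
        conf.getInt("io.file.buffer.size", 4096),   // buffer size
        (short) 3,                                  // replication
        dfs.getDefaultBlockSize(),                  // block size
        null,                                       // no progress callback
        favoredNodes);
    out.write("example payload".getBytes(StandardCharsets.UTF_8));
    out.close();
    dfs.close();
  }
}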