Hello Zhe,

Thanks a lot for the input. Storage policies are exactly what I was looking for to solve one of the problems.
@Nick: I agree that it would be a nice feature to have. Thanks for the info.

Regards,
Ananth

On Fri, Dec 19, 2014 at 10:49 AM, Nick Dimiduk <ndimi...@gmail.com> wrote:

> HBase would enjoy a similar functionality. In our case, we'd like all
> replicas for all files in a given HDFS path to land on the same set of
> machines. That way, in the event of a failover, regions can be assigned to
> one of these other machines that has local access to all blocks for all
> region files.
>
> On Thu, Dec 18, 2014 at 3:36 PM, Zhe Zhang <zhe.zhang.resea...@gmail.com>
> wrote:
>
> > > The second aspect is that our queries are time based and this time
> > > window follows a familiar pattern of old data not being queried much.
> > > Hence we would like to preserve the most recent data in the HDFS cache
> > > (Impala is helping us manage this aspect via their command set) but we
> > > would like the next recent amount of data chunks to land on an SSD that
> > > is present on every datanode. The remaining set of blocks which are
> > > "very old but in large quantities" would land on spinning disks. The
> > > decision to choose a given volume is based on the file name as we can
> > > control the filename that is being used to generate the file.
> >
> > Have you tried the 'setStoragePolicy' command? It's part of the HDFS
> > "Heterogeneous Storage Tiers" work and seems to address your scenario.
> >
> > > 1. Is there a way to control that all file blocks belonging to a
> > > particular hdfs directory & file go to the same physical datanode (and
> > > their corresponding replicas as well?)
> >
> > This seems inherently hard: the file/dir could have more data than a
> > single DataNode can host. Implementation wise, it requires some sort
> > of a map in BlockPlacementPolicy from inode or file path to DataNode
> > address.
> >
> > My 2 cents..
> >
> > --
> > Zhe Zhang
> > Software Engineer, Cloudera
> > https://sites.google.com/site/zhezhangresearch/
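
For the time-based tiering scenario quoted above, here is a minimal sketch of how a file name could be mapped to a storage policy. Everything specific in it is an assumption for illustration, not something from this thread: the file naming scheme (a YYYYMMDD date embedded in the name), the example paths, the 30-day threshold, and the helper names. The policy names ALL_SSD and HOT are standard HDFS storage policies (HOT keeps all replicas on DISK, i.e. spinning disks). The command shape uses the `hdfs storagepolicies` subcommand found in later Hadoop releases; the 2.6-era equivalent was under `hdfs dfsadmin`, so check the syntax for your version.

```python
import re
from datetime import date, datetime

# Hypothetical threshold: data newer than this many days is kept on SSD.
SSD_MAX_AGE_DAYS = 30


def file_date(path):
    """Extract a YYYYMMDD date from a file name (assumed naming scheme)."""
    match = re.search(r"(\d{8})", path.rsplit("/", 1)[-1])
    if match is None:
        raise ValueError("no YYYYMMDD date found in file name: " + path)
    return datetime.strptime(match.group(1), "%Y%m%d").date()


def policy_for_file(path, today):
    """Choose an HDFS storage policy name from the age of the data."""
    age_days = (today - file_date(path)).days
    # ALL_SSD: all replicas on SSD. HOT: all replicas on DISK (spinning disks).
    return "ALL_SSD" if age_days <= SSD_MAX_AGE_DAYS else "HOT"


def set_policy_command(path, policy):
    """Build the argument list for setting a storage policy on a path.

    The list is only built here, not executed; it could be passed to
    subprocess.run on a machine with an HDFS client installed.
    """
    return ["hdfs", "storagepolicies", "-setStoragePolicy",
            "-path", path, "-policy", policy]


if __name__ == "__main__":
    today = date.today()
    for p in ["/data/events/events_20141210.parquet",
              "/data/events/events_20140101.parquet"]:
        print(" ".join(set_policy_command(p, policy_for_file(p, today))))
```

Note that a storage policy governs where new blocks are placed. Blocks that already exist are not moved when the policy changes; as far as I know that needs a separate migration step (the HDFS mover tool), so a periodic job that re-tags aging files would have to run that as well.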