On Wed, Feb 20, 2013 at 4:29 PM, Chris Douglas <cdoug...@apache.org> wrote:
> The throughput on this thread is too high. > > Nicholas/Suresh: do either of you disagree with the general approach > in HDFS-347? Holding an inevitable branch open causes a lot of pain > (Suresh, you endured much of that personally with security, which had > a lot of follow-on work). While it makes sense to block HDFS-347 from > 2.x before all concerns are addressed, are most objections so > fundamental that the merge to trunk should be prevented (except for > HDFS-2246)? Is there related development that will be impacted? Colin > won't disappear after this is committed, so if the answers to these > questions are "no", then let's move forward. > Yes. I have already stated this earlier. "... if the changes are deemed to be trivial, we can do it post merge to trunk." > Todd: Yes, the idea that the project only allows optimizations that > work on all platforms is idiotic, which is why nobody has suggested > it. Given that HDFS-347 is a strictly better approach, once committed, > there will be ample motivation to add support for other OSes and > remove HDFS-2246 entirely. Nobody is confused about this. There's > ample precedent for retaining obscure, clumsy features as a temporary > stop-gap (e.g., service plugins, opaque blobs of bytes in Tasks, > configurable combiner semantics). What's the virtue of insisting on > removing this? Unless there was a lot of follow-on work, HDFS-2246 > doesn't look like a lot of code... -C > > On Wed, Feb 20, 2013 at 3:40 PM, Todd Lipcon <t...@cloudera.com> wrote: > > On Wed, Feb 20, 2013 at 3:31 PM, Suresh Srinivas <sur...@hortonworks.com> > wrote: > >>> Given that this is an optimization, and we have a ton of optimizations > >>> which don't yet run on Windows, I don't think that should be > >>> considered. Additionally, the Windows support has not yet been merged, > >>> nor is it in any release, so this isn't a regression. > >>> > >> > >> This is a critical functionality for HBase peformance and an > optimization > >> we consider > >> very important to have. > > > > Too bad it doesn't work in any of the normal installations... none of > > the packages for Hadoop would allow it to work, given that the data > > directories will be owned by HDFS and not world readable, and > > tasks/HBase would run as an "hbase" user, which wouldn't have direct > > access to the block files. > > > >>> > >>> I would be happy to review an addition to the HDFS-347 branch which > >>> addresses this issue. But I don't think we should be maintaining two > >>> codepaths for the sake of an optimization on a platform which is not > >>> yet officially supported on trunk, especially when the old code path > >>> is _insecure_ and thus unusable in most environments. > >> > >> > >> I have to disagree. No where in the jira or the design it is explicitly > >> stated that > >> the old short circuit functionality is being removed. My assumption has > been > >> that it will not be removed. > > > > I've tried this avenue in the past on other insecurities which were > > fixed. Sorry if you were depending on insecure behavior. The project > > should move on and not have 3+ ways of implementing the same thing. > > > >> As regards "officially supported", we have been doing > >> windows development for > >> more than a year. In fact branch-1-win is being used by a lot users. > Given > >> merging it to branch-1 requires first making it available in trunk, we > have > >> been doing > >> a lot of work in branch-trunk-win. It is almost ready to be merged as > well. > >> > >> I am -1 on removing existing short circuit until an alternative short > >> circuit similar > >> to HDFS-347 on all the platforms. > > > > Great -- are you committed to building this equivalent feature for > > Windows, then? On what timeline? From my viewpoint, Windows isn't a > > supported platform *right now*, so vetoing based on it seems > > meritless. > > > > BTW, the posix_fadvise based readahead is an important optimization > > for many workloads, too, but from what I can tell looking at the > > Windows branch, it doesn't support it. There are other places in the > > Windows branch where performance is going to be worse - eg disabling > > the pipelined crc32c implementation will be a 2-3x hit on that code > > path. > > > > No one has voted to merge Windows support, and if merging Windows > > support means that, from now on, every _optimization_ must work on > > Windows, I don't think I will be able to vote +1. The vast majority of > > the community is _not_ running Windows, and I don't want to block > > progress on the small number of developers who know how to program > > against that platform. > > > > If that's the axe hanging over our head with the Windows branch, then > > I'm all for saying "good, keep it on a branch and don't merge it to > > trunk". > > > > I was hoping we could all work together a bit better here... > > contentious merge votes like this just cause cases where different > > distros diverge by merging different branches way ahead of the > > upstream (eg yours with windows support, ours with 347, etc) > > > > -Todd > > -- > > Todd Lipcon > > Software Engineer, Cloudera > > On Wed, Feb 20, 2013 at 3:40 PM, Todd Lipcon <t...@cloudera.com> wrote: > > On Wed, Feb 20, 2013 at 3:31 PM, Suresh Srinivas <sur...@hortonworks.com> > wrote: > >>> Given that this is an optimization, and we have a ton of optimizations > >>> which don't yet run on Windows, I don't think that should be > >>> considered. Additionally, the Windows support has not yet been merged, > >>> nor is it in any release, so this isn't a regression. > >>> > >> > >> This is a critical functionality for HBase peformance and an > optimization > >> we consider > >> very important to have. > > > > Too bad it doesn't work in any of the normal installations... none of > > the packages for Hadoop would allow it to work, given that the data > > directories will be owned by HDFS and not world readable, and > > tasks/HBase would run as an "hbase" user, which wouldn't have direct > > access to the block files. > > > >>> > >>> I would be happy to review an addition to the HDFS-347 branch which > >>> addresses this issue. But I don't think we should be maintaining two > >>> codepaths for the sake of an optimization on a platform which is not > >>> yet officially supported on trunk, especially when the old code path > >>> is _insecure_ and thus unusable in most environments. > >> > >> > >> I have to disagree. No where in the jira or the design it is explicitly > >> stated that > >> the old short circuit functionality is being removed. My assumption has > been > >> that it will not be removed. > > > > I've tried this avenue in the past on other insecurities which were > > fixed. Sorry if you were depending on insecure behavior. The project > > should move on and not have 3+ ways of implementing the same thing. > > > >> As regards "officially supported", we have been doing > >> windows development for > >> more than a year. In fact branch-1-win is being used by a lot users. > Given > >> merging it to branch-1 requires first making it available in trunk, we > have > >> been doing > >> a lot of work in branch-trunk-win. It is almost ready to be merged as > well. > >> > >> I am -1 on removing existing short circuit until an alternative short > >> circuit similar > >> to HDFS-347 on all the platforms. > > > > Great -- are you committed to building this equivalent feature for > > Windows, then? On what timeline? From my viewpoint, Windows isn't a > > supported platform *right now*, so vetoing based on it seems > > meritless. > > > > BTW, the posix_fadvise based readahead is an important optimization > > for many workloads, too, but from what I can tell looking at the > > Windows branch, it doesn't support it. There are other places in the > > Windows branch where performance is going to be worse - eg disabling > > the pipelined crc32c implementation will be a 2-3x hit on that code > > path. > > > > No one has voted to merge Windows support, and if merging Windows > > support means that, from now on, every _optimization_ must work on > > Windows, I don't think I will be able to vote +1. The vast majority of > > the community is _not_ running Windows, and I don't want to block > > progress on the small number of developers who know how to program > > against that platform. > > > > If that's the axe hanging over our head with the Windows branch, then > > I'm all for saying "good, keep it on a branch and don't merge it to > > trunk". > > > > I was hoping we could all work together a bit better here... > > contentious merge votes like this just cause cases where different > > distros diverge by merging different branches way ahead of the > > upstream (eg yours with windows support, ours with 347, etc) > > > > -Todd > > -- > > Todd Lipcon > > Software Engineer, Cloudera > -- http://hortonworks.com/download/