[ https://issues.apache.org/jira/browse/HADOOP-13397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15388200#comment-15388200 ]
Allen Wittenauer commented on HADOOP-13397:
-------------------------------------------

A couple of things:

a) I, and I know others as well, have some rather large licensing questions around Docker images. They effectively act as a binary distribution, and it is very much against ASF rules to distribute GPL and other Category X components. It makes me extremely uncomfortable to move forward without some clarification from legal. (Yes, I know other ASF projects are publishing images on Docker Hub. Hopefully that means that there is a JIRA issue in the LEGAL project to point to.) This is a blocking issue that really needs to get clarified before further time investment.

b) I'm going to change the description in this issue from "Official image from Cloudera" to "Cloudera's image". Cloudera can't make an "official image" for Apache Hadoop, so let's clear up any potential confusion before it starts.

c) Is this actually useful in reality? The vast, vast, vast majority of Apache Hadoop deployments add a wide variety of additional components on top of Apache Hadoop, to the point that even a base image still seems like it wouldn't be particularly usable without downstream conflict resolution. It may be useful to make Dockerfile templates, but full-blown images? Hmm.. I'm going to need some convincing.

d) Upon working with the existing Dockerfile and porting it over to support the ASF PowerPC build machines (HADOOP-13329), we need to be aware that we're going to need more than one Dockerfile per hardware platform. We made that mistake with start-build-env.sh (which we'll fix as part of 13329), but we should avoid it here. (We've gotten some poking from the ARM64 folks as well.)

e) This is going to hit upon the larger issue of distributed configuration management, which is going to be extremely tricky to make consumable, never mind what types of configurations are actually supported: security? persistent storage? Then there are client configs--which, it's worthwhile pointing out, not even the vendor tools handle particularly well.

f) I think a much more attainable goal to start is making a single Dockerfile that runs all of the Apache Hadoop daemons as a single node configuration. That's a highly desirable thing to have for a variety of reasons. If there is still heavy interest in breaking it apart, it gives a base working example before proceeding further to tease out the various daemons.

> Add dockerfile for Hadoop
> -------------------------
>
>                 Key: HADOOP-13397
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13397
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Klaus Ma
>
> For now, there's no community version Dockerfile in Hadoop; most of docker
> images are provided by vendor, e.g.
> 1. Official image from Cloudera is the quickstart image:
> https://hub.docker.com/r/cloudera/quickstart/
> 2. From HortonWorks sequenceiq:
> https://hub.docker.com/r/sequenceiq/hadoop-docker/
> 3. MapR provides the mapr-sandbox-base:
> https://hub.docker.com/r/maprtech/mapr-sandbox-base/
> The proposal of this JIRA is to provide a community version Dockerfile in
> Hadoop, and here are some requirements:
> 1. Separate docker images for master & agents, e.g. resource manager & node
> manager
> 2. Default configuration to start master & agent instead of configuring
> manually
> 3. Start Hadoop process as no-daemon
> Here's my dockerfile to start master/agent:
> https://github.com/k82cn/outrider/tree/master/kubernetes/imgs/yarn
> I'd like to contribute it after polishing :).
> Email Thread :
> http://mail-archives.apache.org/mod_mbox/hadoop-user/201607.mbox/%3CSG2PR04MB162977CFE150444FA022510FB6370%40SG2PR04MB1629.apcprd04.prod.outlook.com%3E
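
Point f) in the comment proposes a single-node setup that runs all of the Apache Hadoop daemons together, and the issue description asks for processes started in the foreground ("no-daemon"). A minimal sketch of a container entrypoint along those lines, written in Python for illustration, might look like the following. It assumes the `hdfs` and `yarn` launchers are on PATH, that the Hadoop configuration is already in place, and that the NameNode directory has been formatted beforehand. The names `DAEMON_COMMANDS`, `missing_launchers`, `run_all` and `main` are hypothetical and come from neither the issue nor the linked Dockerfile.

```python
import shutil
import subprocess
import sys
import time

# Foreground commands for a single-node setup (assumption: the stock
# `hdfs` and `yarn` launchers of an Apache Hadoop install are on PATH).
DAEMON_COMMANDS = [
    ["hdfs", "namenode"],
    ["hdfs", "datanode"],
    ["yarn", "resourcemanager"],
    ["yarn", "nodemanager"],
]


def missing_launchers(commands):
    """Return the sorted launcher names that cannot be found on PATH."""
    return sorted({cmd[0] for cmd in commands if shutil.which(cmd[0]) is None})


def run_all(commands, poll_seconds=1.0):
    """Start every command as a child process, wait until the first one
    exits, stop the others, and return that first exit code."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    try:
        while True:
            for proc in procs:
                code = proc.poll()
                if code is not None:
                    return code
            time.sleep(poll_seconds)
    finally:
        # One daemon stopping takes the whole container down with it.
        for proc in procs:
            if proc.poll() is None:
                proc.terminate()
                proc.wait()


def main():
    """Check for the launchers, then run the daemons in the foreground."""
    missing = missing_launchers(DAEMON_COMMANDS)
    if missing:
        print("not on PATH: " + ", ".join(missing), file=sys.stderr)
        return 1
    return run_all(DAEMON_COMMANDS)
```

An entrypoint script would finish with `sys.exit(main())`. Exiting as soon as any one daemon stops is a design choice of this sketch, not something the issue prescribes; it lets the container runtime see the failure instead of leaving a half-working node running.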