Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86

2017-09-13 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/523/

[Sep 13, 2017 5:49:34 PM] (cliang) HADOOP-14804. correct wrong parameters 
format order in core-default.xml.
[Sep 13, 2017 5:59:04 PM] (wang) HADOOP-14857. Fix downstream shaded client 
integration test. Contributed
[Sep 13, 2017 6:06:47 PM] (rohithsharmaks) YARN-7157. Add admin configuration 
to filter per-user's apps in secure
[Sep 13, 2017 7:29:08 PM] (epayne) YARN-4727. Unable to override the 
/home/ericp/run/conf/ env variable for
[Sep 13, 2017 7:38:58 PM] (epayne) Revert 'YARN-4727. Unable to override the 
$HADOOP_CONF_DIR env variable
[Sep 13, 2017 7:41:55 PM] (epayne) YARN-4727. Unable to override the 
$HADOOP_CONF_DIR env variable for
[Sep 13, 2017 7:54:02 PM] (arp) HADOOP-14867. Update HDFS Federation setup 
document, for incorrect




-1 overall


The following subsystems voted -1:
findbugs unit


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

FindBugs :

   
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager

   Hard coded reference to an absolute pathname in org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.launchContainer(ContainerRuntimeContext) At DockerLinuxContainerRuntime.java:[line 490]
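
For context, this warning ("Hard coded reference to an absolute pathname",
FindBugs pattern DMI_HARDCODED_ABSOLUTE_FILENAME) flags filesystem paths
baked into source. A minimal sketch of the pattern and the usual
configuration-driven fix (the path and config key below are illustrative,
not the actual NodeManager code):

    import java.io.File;
    import org.apache.hadoop.conf.Configuration;

    public class HardCodedPathExample {
      public static void main(String[] args) {
        // What the checker flags: an absolute path baked into the source.
        File hardCoded = new File("/sys/fs/cgroup/cpu");

        // The usual remedy: read the location from configuration,
        // keeping the literal only as a default value.
        Configuration conf = new Configuration();
        File configurable = new File(
            conf.get("yarn.nodemanager.example.cgroup-root",  // hypothetical key
                     "/sys/fs/cgroup/cpu"));
        System.out.println(hardCoded + " vs " + configurable);
      }
    }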

Failed junit tests :

   hadoop.hdfs.TestDFSInotifyEventInputStream 
   hadoop.hdfs.server.namenode.TestReencryptionWithKMS 
   hadoop.hdfs.TestClientProtocolForPipelineRecovery 
   hadoop.hdfs.TestLeaseRecoveryStriped 
   hadoop.hdfs.TestReconstructStripedFile 
   hadoop.hdfs.TestDFSUpgradeFromImage 
   hadoop.hdfs.server.datanode.TestDirectoryScanner 
   hadoop.hdfs.TestFileAppendRestart 
   hadoop.hdfs.TestDFSStripedOutputStreamWithFailure040 
   hadoop.yarn.server.nodemanager.containermanager.TestContainerManager 
   hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation 
   hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer 
   hadoop.yarn.server.resourcemanager.scheduler.TestAbstractYarnScheduler 
   hadoop.yarn.client.api.impl.TestAMRMClientContainerRequest 
   hadoop.mapreduce.v2.hs.webapp.TestHSWebApp 
   hadoop.fs.azure.TestNativeAzureFileSystemConcurrency 
   hadoop.fs.azure.TestNativeAzureFileSystemOperationsMocked 
   hadoop.fs.azure.TestNativeAzureFileSystemFileNameCheck 
   hadoop.fs.azure.TestOutOfBandAzureBlobOperations 
   hadoop.fs.azure.TestNativeAzureFileSystemContractMocked 
   hadoop.fs.azure.TestNativeAzureFileSystemMocked 
   hadoop.fs.azure.TestWasbFsck 
   hadoop.yarn.sls.TestSLSRunner 
   hadoop.yarn.sls.TestReservationSystemInvariants 
   hadoop.yarn.sls.nodemanager.TestNMSimulator 

Timed out junit tests :

   org.apache.hadoop.hdfs.TestWriteReadStripedFile 
   org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA 

   cc:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/523/artifact/out/diff-compile-cc-root.txt
  [4.0K]

   javac:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/523/artifact/out/diff-compile-javac-root.txt
  [292K]

   checkstyle:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/523/artifact/out/diff-checkstyle-root.txt
  [17M]

   pylint:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/523/artifact/out/diff-patch-pylint.txt
  [20K]

   shellcheck:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/523/artifact/out/diff-patch-shellcheck.txt
  [20K]

   shelldocs:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/523/artifact/out/diff-patch-shelldocs.txt
  [12K]

   whitespace:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/523/artifact/out/whitespace-eol.txt
  [11M]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/523/artifact/out/whitespace-tabs.txt
  [1.2M]

   findbugs:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/523/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-warnings.html
  [8.0K]

   javadoc:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/523/artifact/out/diff-javadoc-javadoc-root.txt
  [1.9M]

   unit:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/523/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
  [1.2

Re: [DISCUSS] official docker image(s) for hadoop

2017-09-13 Thread Mingliang Liu
> It would be very helpful for testing the RC.
For testing and voting, I have been using docker containers for a while; see
the code at: https://github.com/weiqingy/caochong


> TL;DR: I propose to create official hadoop images and upload them to the 
> dockerhub
I’m +1 on this idea. An “official” docker image basically means a commitment
to maintain well-documented and broadly tested images, which does not seem
like a burden for us.

Ceph has a community docker project, https://github.com/ceph/ceph-docker,
and I think our scope here is similar.

Mingliang

> On Sep 13, 2017, at 11:39 AM, Yufei Gu  wrote:
> 
> It would be very helpful for testing the RC. To vote on an RC, committers and
> PMC members usually spend a lot of time compiling and deploying the RC and
> running several sanity tests before giving it a +1. A docker image could save
> the compilation and deployment time, so people can do more tests.
> 
> Best,
> 
> Yufei
> 
> On Wed, Sep 13, 2017 at 11:19 AM, Wangda Tan  wrote:
> 
>> +1 to adding a Hadoop docker image for easier testing / prototyping; it's going
>> to be super helpful!
>> 
>> Thanks,
>> Wangda
>> 
>> On Wed, Sep 13, 2017 at 10:48 AM, Miklos Szegedi <miklos.szeg...@cloudera.com> wrote:
>> 
>>> Marton, thank you for working on this. I think official Docker images for
>>> Hadoop would be very useful for a lot of reasons. I think that it is better
>>> to have a coordinated effort, with production-ready base images and
>>> dependent images for prototyping. Does anyone else have an opinion about
>>> this?
>>> 
>>> Thank you,
>>> Miklos
>>> 
>>> On Fri, Sep 8, 2017 at 5:45 AM, Marton, Elek  wrote:
>>> 
 
TL;DR: I propose to create official hadoop images and upload them to the
dockerhub.

GOAL/SCOPE: I would like to improve the existing documentation with
easy-to-use docker-based recipes to start hadoop clusters with various
configurations.

The images could also be used to test experimental features. For example,
ozone could be tested easily with this compose file and configuration:

https://gist.github.com/elek/1676a97b98f4ba561c9f51fce2ab2ea6

Or the configuration could even be included in the compose file:

https://github.com/elek/hadoop/blob/docker-2.8.0/example/docker-compose.yaml

I would like to create separate example compose files for federation, ha,
metrics usage, etc. to make it easier to try out and understand the
features.

CONTEXT: There is an existing Jira,
https://issues.apache.org/jira/browse/HADOOP-13397,
but it’s about a tool to generate production-quality docker images
(multiple types, in a flexible way). If there are no objections, I will
create a separate issue to create simplified docker images for rapid
prototyping and investigating new features, and register the branch with
the dockerhub to create the images automatically.

MY BACKGROUND: I have been working with docker-based hadoop/spark clusters
for quite a while and have run them successfully in different environments
(kubernetes, docker-swarm, nomad-based scheduling, etc.). My work is
available at https://github.com/flokkr, but those images handle more
complex use cases (e.g. instrumenting java processes with btrace, or
reading/reloading configuration from consul).
And IMHO in the official hadoop documentation it’s better to suggest
official apache docker images rather than external ones (which could
change).

Please let me know if you have any comments.

Marton
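
For a sense of what such a recipe could look like, here is a minimal sketch
of a compose file in the spirit of the above (the image name, port, and env
file are hypothetical placeholders, not the contents of the linked gist):

    version: "3"
    services:
      namenode:
        image: example/hadoop:3.0.0     # hypothetical image name
        command: hdfs namenode
        ports:
          - 9870:9870                   # NameNode web UI
        env_file: ./hadoop.env          # hypothetical config-as-env file
      datanode:
        image: example/hadoop:3.0.0
        command: hdfs datanode
        env_file: ./hadoop.env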
 



Re: [VOTE] Merge yarn-native-services branch into trunk

2017-09-13 Thread Jian He
Hi Allen,

Thanks for sharing the feedback. I opened YARN-7191 to address it.
We can move the discussions there.

Thanks,
Jian




Re: [DISCUSS] official docker image(s) for hadoop

2017-09-13 Thread Yufei Gu
It would be very helpful for testing the RC. To vote on an RC, committers and
PMC members usually spend a lot of time compiling and deploying the RC and
running several sanity tests before giving it a +1. A docker image could save
the compilation and deployment time, so people can do more tests.

Best,

Yufei
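
As a concrete (hypothetical) sketch of that workflow, assuming an RC image
were published, the manual compile-and-deploy step could shrink to
something like:

    # Image name and tag are hypothetical placeholders for a published RC image.
    docker run -it --rm example/hadoop:3.1.0-RC0 bash -c '
      hadoop version &&
      hdfs namenode -format -nonInteractive &&
      echo smoke test passed'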



Re: [DISCUSS] official docker image(s) for hadoop

2017-09-13 Thread Bharat Viswanadham
+1 (non-binding)
It would be really nice to have Docker images for trying different features of
Hadoop (like HA, federation, erasure coding…), which would be helpful for both
developers and users.


Thanks,
Bharat






Re: [DISCUSS] official docker image(s) for hadoop

2017-09-13 Thread Eric Badger
+1. I definitely think an official Hadoop docker image (possibly one per major
or minor release) would be a positive both for contributors and for users
of Hadoop.

Eric



[jira] [Created] (HADOOP-14867) Update HDFS Federation Document, for incorrect property name for secondary name node

2017-09-13 Thread Bharat Viswanadham (JIRA)
Bharat Viswanadham created HADOOP-14867:
---

 Summary: Update HDFS Federation Document, for incorrect property 
name for secondary name node
 Key: HADOOP-14867
 URL: https://issues.apache.org/jira/browse/HADOOP-14867
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Bharat Viswanadham


The HDFS Federation setup documentation has an incorrect property name for the
secondary namenode HTTP port.

It is mentioned as:

  <property>
    <name>dfs.namenode.secondaryhttp-address.ns1</name>
    <value>snn-host1:http-port</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns2</name>
    <value>nn-host2:rpc-port</value>
  </property>

The actual property should be dfs.namenode.secondary.http-address with the
nameservice suffix.
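
A corrected snippet for the first nameservice (host and port placeholders as
in the document):

  <property>
    <name>dfs.namenode.secondary.http-address.ns1</name>
    <value>snn-host1:http-port</value>
  </property>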

Because of this documentation error, when the document is followed and a user
tries to set up an HDFS federated cluster, the secondary namenode will not be
started, and hdfs getconf -secondarynamenodes will throw an exception.







Re: [DISCUSS] official docker image(s) for hadoop

2017-09-13 Thread Wangda Tan
+1 to adding a Hadoop docker image for easier testing / prototyping; it's going
to be super helpful!

Thanks,
Wangda



[jira] [Resolved] (HADOOP-14804) correct wrong parameters format order in core-default.xml

2017-09-13 Thread Chen Liang (JIRA)

[ https://issues.apache.org/jira/browse/HADOOP-14804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chen Liang resolved HADOOP-14804.
---------------------------------
Resolution: Fixed

> correct wrong parameters format order in core-default.xml
> -
>
> Key: HADOOP-14804
> URL: https://issues.apache.org/jira/browse/HADOOP-14804
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha4
>Reporter: Chen Hongfei
>Assignee: Chen Hongfei
>Priority: Trivial
> Fix For: 3.1.0
>
> Attachments: HADOOP-14804.001.patch, HADOOP-14804.002.patch, 
> HADOOP-14804.003.patch
>
>
> The descriptions of the "HTTP CORS" parameters come before the names:
>
>   <property>
>     <description>Comma separated list of headers that are allowed for web
>     services needing cross-origin (CORS) support.</description>
>     <name>hadoop.http.cross-origin.allowed-headers</name>
>     <value>X-Requested-With,Content-Type,Accept,Origin</value>
>   </property>
>
> ..
> but the description should follow the value, as in the other entries.
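>
> A sketch of the expected layout, matching the other core-default.xml entries:
>
>   <property>
>     <name>hadoop.http.cross-origin.allowed-headers</name>
>     <value>X-Requested-With,Content-Type,Accept,Origin</value>
>     <description>Comma separated list of headers that are allowed for web
>     services needing cross-origin (CORS) support.</description>
>   </property>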






Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86

2017-09-13 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/522/

[Sep 12, 2017 4:19:09 PM] (wangda) YARN-4081. Add support for multiple resource 
types in the Resource
[Sep 12, 2017 4:19:09 PM] (wangda) YARN-4172. Extend DominantResourceCalculator 
to account for all
[Sep 12, 2017 4:19:10 PM] (wangda) YARN-4715. Add support to read resource 
types from a config file.
[Sep 12, 2017 4:19:10 PM] (wangda) YARN-4829. Add support for binary units in 
Resource class.(vvasudev via
[Sep 12, 2017 4:19:10 PM] (wangda) YARN-4830. Add support for resource types in 
the nodemanager.
[Sep 12, 2017 4:19:10 PM] (wangda) YARN-5242. Update DominantResourceCalculator 
to consider all resource
[Sep 12, 2017 4:19:10 PM] (wangda) YARN-5586. Update the Resources class to 
consider all resource types.
[Sep 12, 2017 4:19:10 PM] (wangda) YARN-5707. Add manager class for resource 
profiles. Contributed by Varun
[Sep 12, 2017 4:19:10 PM] (wangda) YARN-5708. Implement APIs to get resource 
profiles from the RM.
[Sep 12, 2017 4:19:10 PM] (wangda) YARN-5587. Add support for resource 
profiles. (vvasudev via asuresh)
[Sep 12, 2017 4:19:11 PM] (wangda) YARN-5588. [YARN-3926] Add support for 
resource profiles in distributed
[Sep 12, 2017 4:19:11 PM] (wangda) YARN-6232. Update resource usage and 
preempted resource calculations to
[Sep 12, 2017 4:19:11 PM] (wangda) YARN-6445. [YARN-3926] Performance 
improvements in resource profile
[Sep 12, 2017 4:19:11 PM] (wangda) YARN-6761. Fix build for YARN-3926 branch. 
Contributed by Varun Vasudev.
[Sep 12, 2017 4:19:11 PM] (wangda) YARN-6786. [YARN-3926] ResourcePBImpl 
imports cleanup. Contributed by
[Sep 12, 2017 4:19:11 PM] (wangda) YARN-6788. [YARN-3926] Improve performance 
of resource profile branch
[Sep 12, 2017 4:19:11 PM] (wangda) YARN-6935. [YARN-3926] 
ResourceProfilesManagerImpl.parseResource() has
[Sep 12, 2017 4:19:11 PM] (wangda) YARN-6994. [YARN-3926] Remove last uses of 
Long from resource types
[Sep 12, 2017 4:19:12 PM] (wangda) YARN-6892. [YARN-3926] Improve API 
implementation in Resources and
[Sep 12, 2017 4:19:12 PM] (wangda) YARN-6908. ResourceProfilesManagerImpl is 
missing @Overrides on methods
[Sep 12, 2017 4:19:12 PM] (wangda) YARN-6610. [YARN-3926] 
DominantResourceCalculator#getResourceAsValue
[Sep 12, 2017 4:19:12 PM] (wangda) YARN-7030. [YARN-3926] Performance 
optimizations in Resource and
[Sep 12, 2017 4:19:12 PM] (wangda) YARN-7042. Clean up unit tests after 
YARN-6610. (Daniel Templeton via
[Sep 12, 2017 4:19:12 PM] (wangda) YARN-6789. Add Client API to get all 
supported resource types from RM.
[Sep 12, 2017 4:19:12 PM] (wangda) YARN-6781. [YARN-3926] 
ResourceUtils#initializeResourcesMap takes an
[Sep 12, 2017 4:19:12 PM] (wangda) YARN-7043. Cleanup ResourceProfileManager. 
(wangda)
[Sep 12, 2017 4:19:12 PM] (wangda) YARN-7067. [YARN-3926] Optimize ResourceType 
information display in UI.
[Sep 12, 2017 4:19:12 PM] (wangda) YARN-7039. Fix javac and javadoc errors in 
YARN-3926 branch. (Sunil G
[Sep 12, 2017 4:19:12 PM] (wangda) YARN-7093. Improve log message in 
ResourceUtils. (Sunil G via wangda)
[Sep 12, 2017 4:19:12 PM] (wangda) YARN-6933. [YARN-3926] 
ResourceUtils.DISALLOWED_NAMES check is
[Sep 12, 2017 4:19:12 PM] (wangda) YARN-7056. Document Resource Profiles 
feature. (Sunil G via wangda)
[Sep 12, 2017 4:19:12 PM] (wangda) YARN-7136. Additional Performance 
Improvement for Resource Profile
[Sep 12, 2017 4:19:12 PM] (wangda) YARN-7137. [YARN-3926] Move newly added APIs 
to unstable in YARN-3926
[Sep 12, 2017 5:04:22 PM] (rchiang) HADOOP-14798. Update sshd-core and related 
mina-core library versions.
[Sep 12, 2017 5:19:34 PM] (rchiang) HADOOP-14799. Update nimbus-jose-jwt to 
4.41.1. (rchiang)
[Sep 12, 2017 5:36:04 PM] (rchiang) HADOOP-14796. Update json-simple version to 
1.1.1. (rchiang)
[Sep 12, 2017 5:53:48 PM] (rchiang) HADOOP-14648. Bump commons-configuration2 
to 2.1.1. (rchiang)
[Sep 12, 2017 6:12:44 PM] (rchiang) HADOOP-14653. Update joda-time version to 
2.9.9. (rchiang)
[Sep 12, 2017 6:20:56 PM] (wang) HDFS-12417. Disable flaky 
TestDFSStripedOutputStreamWithFailure.
[Sep 12, 2017 6:35:21 PM] (rchiang) HADOOP-14797. Update re2j version to 1.1. 
(rchiang)
[Sep 12, 2017 8:37:38 PM] (rchiang) HADOOP-14856. Fix AWS, Jetty, HBase, 
Ehcache entries for NOTICE.txt.
[Sep 12, 2017 9:51:08 PM] (jlowe) HADOOP-14843. Improve FsPermission symbolic 
parsing unit test coverage.
[Sep 12, 2017 11:10:08 PM] (Arun Suresh) YARN-7185. ContainerScheduler should 
only look at availableResource for
[Sep 12, 2017 11:13:39 PM] (yufei) YARN-7057. FSAppAttempt#getResourceUsage 
doesn't need to consider
[Sep 12, 2017 11:18:41 PM] (arp) HDFS-12407. Journal node fails to shutdown 
cleanly if
[Sep 13, 2017 12:03:32 AM] (Arun Suresh) YARN-7185. [Addendum patch] Minor 
javadoc and checkstyle fix.
[Sep 13, 2017 12:35:30 AM] (wang) HDFS-1. Document and test BlockLocation 
for erasure-coded files.
[Sep 13, 2017 1:12:07 AM] (lei) HDFS-12412. Change 
Er

Re: [VOTE] Merge yarn-native-services branch into trunk

2017-09-13 Thread Allen Wittenauer

> On Sep 8, 2017, at 9:25 AM, Jian He  wrote:
> 
> Hi Allen,
> The documentations are committed. Please check QuickStart.md and others in 
> the same folder.
> YarnCommands.md doc is updated to include new commands.
> DNS default port is also documented. 
> Would you like to take a look and see if it addresses your concerns?

Somewhat. Greatly improved, but there’s still way too much “we’re 
working on this” and “here’s a link to a JIRA” and just general brokenness 
going on.

Here are some examples from concepts.  Concepts!  The document I’d expect 
to give me very basic “when we talk about X, we mean Y” definitions:

"A host of scheduling features are being developed to support long running 
services.”

Yeah, ok?  How is this a concept?

  or

"[YARN-3998](https://issues.apache.org/jira/browse/YARN-3998) 
implements a retry-policy to let NM re-launch a service container when it 
fails.”


The patch itself went through nine revisions and a long discussion. 
Would an end user care about the details in that JIRA?  

If the answer to the last question is YES, then the documentation has 
failed.  The whole point of documentation is so they don’t have to go digging 
into the details of the implementation, the decision process that got us there, 
etc.  If they care enough about the details, they’ll run through the changelog 
and click on the JIRA link there.  If the summary line of the changelog isn’t 
obvious, well… then we need better summaries.

etc, etc.

...

The sleep example is nice.  Now, let’s see a non-toy example:  multiple 
instances of Apache httpd or MariaDB or something real and not from the Hadoop 
echo chamber (e.g., non-JVM-based).  If this is for “native” services, this 
shouldn’t be a problem, right?  Give a real example and users will buy what 
you’re selling.  I also think writing the docs and providing an example of 
doing something big and outside the team’s comfort zone will clarify where end 
users are going to need more help than what’s being provided.  Getting a 
MariaDB instance or three up will help tremendously here.

Which reminds me: something the documentation doesn’t cover is storage. 
What happens to it, where does it come from, etc, etc.  That’s an important 
detail that I didn’t see covered.  (I may have missed it.)  

…

Why are there directions to enable other, partially unrelated services 
in here?  Shouldn’t there be pointers to their specific documentation?  Is the 
expectation that if the requirements for those other services change that 
contributors will need to update multiple documents?

"Start the DNS server”

Just… yikes.

a) yarn classname … This is not how we do user-facing things. 
The fact it’s not really possible for a *daemon* to be put in the 
YarnCommands.md doc should be a giant red flag that something isn’t going 
correctly here.
b) no jsvc support for something that it’s strongly hinted at 
wanting to run privileged = an instant -1 for failing basic security practices. 
 There’s zero reason for it to be running continually as root.
c) If this would have been hooked into the shell scripts 
appropriately, logs, user switching, etc would have been had for free.
d) Where’s stop?  Right. Since it’s outside the scripts, there 
is no pid support so one has to do all of that manually….
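
For contrast, a daemon wired into the Hadoop 3 shell framework would get its
lifecycle handling from the scripts (the subcommand name below is
hypothetical):

    yarn --daemon start registrydns    # pid file, logs, user switching for free
    yarn --daemon stop registrydns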


Given:

 "3. Supports reverse lookups (name based on IP). Note, this works only 
for Docker containers.”

then:

"It should not be used as a fully-functional corporate DNS.”

Scratch corporate.  It’s not a fully functional DNS server if it can’t do 
reverse lookups.  (Which, ironically, means it’s not suitable for use with 
Apache Hadoop, given it requires both fwd and rev DNS ...)






Re: impersonation in hadoop

2017-09-13 Thread Steve Loughran

On 13 Sep 2017, at 14:03, Srikrishan Malik wrote:

Hello,

I was trying to understand how impersonation works in a hadoop environment.
I found a few resources like:
About doAs and proxy users:
http://dewoods.com/blog/hadoop-kerberos-guide
and about tokens:
https://hortonworks.com/blog/the-role-of-delegation-tokens-in-apache-hadoop-security/
..


also https://www.gitbook.com/book/steveloughran/kerberos_and_hadoop/details

But I was not able to connect all the dots wrt the full flow of operations.
My current understanding is:
1. user does a kinit and executes an end-user-facing program like
beeline, spark-submit etc.

2. The program is app specific and gets service tickets for HDFS

yes

3. It then gets tokens for all the services it may need during the job
execution and saves the tokens in an HDFS directory.

yes, or includes them in IPC calls

4. The program then connects to a job executor (using a service ticket for
the job executor??), e.g. yarn, with the job info and the token path.


If you are using Hadoop RPC, you can include credentials (Kerberos tickets,
hadoop tokens) in the IPC call; these are encrypted if wire encryption is
turned on.



5. The job executor gets the token and initializes UGI, and all
communication with HDFS is done using the token; kerberos tickets
are not used.

if you want to act with the identity of the user from an RPC call, you just
use UGI.getCurrentUser().doAs(), as the RPC call will be mapped
to the caller before your code is invoked
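
As a minimal sketch of that pattern (the user name and filesystem action are
illustrative; createProxyUser additionally requires hadoop.proxyuser.* rules
on the cluster):

    import java.security.PrivilegedExceptionAction;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.security.UserGroupInformation;

    public class DoAsExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Inside an RPC handler, the caller is already mapped:
        //   UserGroupInformation.getCurrentUser().doAs(...)
        // Elsewhere, a service can impersonate an end user explicitly:
        UserGroupInformation proxy = UserGroupInformation.createProxyUser(
            "alice", UserGroupInformation.getLoginUser());

        // Everything inside doAs runs as the proxied user.
        FileSystem fs = proxy.doAs(
            (PrivilegedExceptionAction<FileSystem>) () -> FileSystem.get(conf));
        System.out.println(fs.getFileStatus(new Path("/")).getOwner());
      }
    }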


Is the above high-level understanding correct? (I have more follow-up queries.)

nobody really understands it. Nobody understands Kerberos either. Stepping 
through with a debugger always helps.

Can the token mechanism be skipped in favor of kerberos alone at each
layer? If so, any resources would help.

yes


My final aim is to write a spark connector with impersonation support
for a data storage system which does not use hadoop (tokens) but
supports kerberos.


Spark's yarn job submit takes a list of filesystems to get K-tickets for and
includes them in the job setup. It also looks for Hive and HBase if configured.

Long-lived spark jobs (streaming) take a keytab; the app master re-authenticates
with the KDC regularly, then pushes tokens out to the workers.
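
For illustration, a hedged sketch of such a submit (spark.yarn.access.namenodes
was the Spark 2.x property name and later releases renamed it; the principal,
keytab, and namenode URIs are placeholders):

    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --principal etl-user@EXAMPLE.COM \
      --keytab /etc/security/keytabs/etl-user.keytab \
      --conf spark.yarn.access.namenodes=hdfs://nn1:8020,hdfs://nn2:8020 \
      my-job.jar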





impersonation in hadoop

2017-09-13 Thread Srikrishan Malik
Hello,

I was trying to understand how impersonation works in a hadoop environment.
I found a few resources like:
About doAs and proxy users:
http://dewoods.com/blog/hadoop-kerberos-guide
and about tokens:
https://hortonworks.com/blog/the-role-of-delegation-tokens-in-apache-hadoop-security/
..

But I was not able to connect all the dots wrt the full flow of operations.
My current understanding is:
1. user does a kinit and executes an end-user-facing program like
beeline, spark-submit etc.
2. The program is app specific and gets service tickets for HDFS
3. It then gets tokens for all the services it may need during the job
execution and saves the tokens in an HDFS directory.
4. The program then connects to a job executor (using a service ticket for
the job executor??), e.g. yarn, with the job info and the token path.
5. The job executor gets the token and initializes UGI, and all
communication with HDFS is done using the token; kerberos tickets
are not used.

Is the above high-level understanding correct? (I have more follow-up queries.)
Can the token mechanism be skipped in favor of kerberos alone at each
layer? If so, any resources would help.

My final aim is to write a spark connector with impersonation support
for a data storage system which does not use hadoop (tokens) but
supports kerberos.

Thanks & regards
-Sri
