Re: Securely discovering Application Master's metadata or sending a secret to Application Master at submission

2016-06-10 Thread Mingyu Kim
Ah, I see. I thought the RPC port was used for YARN’s own communication, but 
that field in ApplicationReport is exactly there for exposing some interfaces 
to the client, which is what I was looking for. I’ll take a look at MRAppMaster 
for an example. Thanks a lot for help!

 

Mingyu

 

From: Sunil Govind 
Date: Friday, June 10, 2016 at 5:37 AM
To: Rohith Sharma K S, Mingyu Kim, "user@hadoop.apache.org"
Cc: Matt Cheah 
Subject: Re: Securely discovering Application Master's metadata or sending a 
secret to Application Master at submission

 

Hi Mike 

 

Adding to what Rohith mentioned, you can refer to the interface below to see 
what information you can get from YARN about an application. 
https://hadoop.apache.org/docs/r2.7.0/api/org/apache/hadoop/yarn/api/records/ApplicationReport.html

 

It includes the ApplicationMaster's RPC port, and you can try to interact with 
the AM through that. That said, it is up to the ApplicationMaster to expose 
the interfaces you are looking for, and YARN has no control over that, as 
Rohith mentioned.

 

- Sunil

 

 

On Fri, Jun 10, 2016 at 11:26 AM Rohith Sharma K S  
wrote:

Hi

 

Basically I see you have multiple questions

1.   How to get the AM RPC port?

>>> You can get this via YarnClient#getApplicationReport(), which returns the 
>>> common, generic details of an application. Note that the RM does not 
>>> maintain any custom details for applications.

2.   How can you get the metadata of the AM?

>>> The AM should be designed to bind a client interface to its RPC port. The 
>>> AM's RPC host and port can be obtained from the ResourceManager. From the 
>>> application submitter, connect to the AM at that host:port and get the 
>>> required details from the AM itself. YARN does not provide an interface for 
>>> this, since AMs are written by users. Essentially, users can design their 
>>> AM to expose a client interface to its clients. For a better understanding, 
>>> see MRAppMaster in the MapReduce framework.

3.   About the authenticity of the job submitter to the AM

>>> Use a secured Hadoop cluster with Kerberos enabled. Note that the AM must 
>>> also be implemented to handle Kerberos.
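
A minimal sketch of answer 2 above, in Java, using only the JDK's built-in
HTTP server rather than any YARN API. The AM side binds a small endpoint on
an OS-assigned port, and the client side connects to host:port and reads the
metadata. The class name, the /metadata path, and the sample payload are
hypothetical. In a real AM, the bound port would be the one reported to the
RM at registration, and the client would take host and port from
ApplicationReport (getHost() and getRpcPort()) instead of asking the server
object directly. The sketch uses plain HTTP with no authentication, so it
does not address the security questions in this thread.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

final class AmMetadataEndpointSketch {

    /** AM side: serve a fixed metadata string on an OS-assigned port. */
    static HttpServer startMetadataServer(String metadataJson) {
        try {
            // Port 0 lets the OS pick a free port; the AM would then report
            // server.getAddress().getPort() when it registers with the RM.
            HttpServer server =
                HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            byte[] body = metadataJson.getBytes(StandardCharsets.UTF_8);
            // "/metadata" is a hypothetical path chosen for this sketch.
            server.createContext("/metadata", exchange -> {
                exchange.getResponseHeaders().add("Content-Type", "application/json");
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Client side: connect to the AM's host:port and read the metadata. */
    static String fetchMetadata(String host, int port) {
        try {
            URL url = URI.create("http://" + host + ":" + port + "/metadata").toURL();
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setConnectTimeout(5000);
            conn.setReadTimeout(5000);
            try (InputStream in = conn.getInputStream()) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            } finally {
                conn.disconnect();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical payload: the AM advertising the port of its REST server.
        HttpServer server = startMetadataServer("{\"restPort\": 8443}");
        try {
            int port = server.getAddress().getPort();
            System.out.println(fetchMetadata("127.0.0.1", port));
        } finally {
            server.stop(0);
        }
    }
}
```

Binding to port 0 sidesteps port conflicts, at the cost of the client having
to look the port up each time the AM is restarted.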

 

Thanks & Regards

Rohith Sharma K S

 

From: Mingyu Kim [mailto:m...@palantir.com] 
Sent: 10 June 2016 03:47


To: Rohith Sharma K S; user@hadoop.apache.org
Cc: Matt Cheah
Subject: Re: Securely discovering Application Master's metadata or sending a 
secret to Application Master at submission

 

Hi Rohith,

 

Thanks for the pointers. I checked the Hadoop documentation you linked, but 
it’s not clear how I can expose a client interface for providing metadata. By 
“YARN internal communications”, I was referring to the endpoints that are 
exposed by the AM on the RPC port as reported in ApplicationReport. I assume 
either the RM or containers will communicate with the AM through these endpoints.

 

I believe your suggestion is to expose additional endpoints on the AM RPC port. 
Can you clarify how I can do that? Is there an interface/class I need to 
extend? How can I register the extra endpoints for providing metadata on the 
existing AM RPC port?

 

Mingyu

 

From: Rohith Sharma K S 
Date: Wednesday, June 8, 2016 at 11:15 PM
To: Mingyu Kim, "user@hadoop.apache.org"
Cc: Matt Cheah 
Subject: RE: Securely discovering Application Master's metadata or sending a 
secret to Application Master at submission

 

Hi

 

Do you know how I can extend the client interface of the RPC port?

>>> YARN provides the YarnClient library, which uses ApplicationClientProtocol. 
>>> For more details, refer to 
>>> https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html#Writing_a_simple_Client

 

I know AM has some endpoints exposed through the RPC port for internal YARN 
communications, but was not sure how I can extend it to expose a custom 
endpoint.

>>> I am not sure what you mean by internal YARN communication. The AM 
>>> connects to the RM only via the AM-RM interface for register/unregister 
>>> and heartbeat, and the details sent to the RM are limited. It is up to the 
>>> AM to expose a client interface for providing metadata.

Thanks & Regards

Rohith Sharma K S

From: Mingyu Kim [mailto:m...@palantir.com] 
Sent: 09 June 2016 11:21
To: Rohith Sharma K S; user@hadoop.apache.org
Cc: Matt Cheah
Subject: Re: Securely discovering Application Master's metadata or sending a 
secret to Application Master at submission

 

Hi Rohith,

 

Thanks for the quick response. That sounds promising. Do you know how I can 
extend the client interface of the RPC port? I know AM has some endpoints 
exposed through the RPC port for internal YARN communications, but was not sure 
how I can extend it to expose a custom endpoint. Any pointer would be 
appreciated!

 

Mingyu

 

From: Rohith Sharma K S 
Date: Wednesday, June 8, 2016 at 10:39 PM
To: Mingyu Kim, "user@hadoop.apache.org"
Cc: Matt Cheah 
Subject: RE: Securely discovering Application Master's metadata or sending a 
secret to Application Master at

view job counters in real time

2016-06-10 Thread Joseph Naegele
Hi all,

 

Is it possible to monitor MR Job counters while a job is running? I'm using
Hadoop 2.7.1 and YARN. I can view counters for past jobs using the history
server, but can't find a way to view them in real time.

 

Thanks



Re: Verifying the authenticity of submitted AM

2016-06-10 Thread Sunil Govind
Hi Mingyu,

Maybe you can take a look at the link below:
https://steveloughran.gitbooks.io/kerberos_and_hadoop/content/sections/yarn.html

It will give you a fair idea of the security you can get for an application.

- Sunil
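
One way to picture the one-time-secret idea from the quoted question below is
a challenge-response over a shared secret. This is a hedged sketch in Java,
not a YARN mechanism: it assumes the submitter and the AM already hold the
same secret (how to deliver it securely is exactly the open question), and
every name in it is hypothetical. The caller sends a fresh random nonce, the
AM answers with HMAC-SHA256(secret, nonce), and the caller recomputes the
value and compares.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

final class SharedSecretChallengeSketch {
    private static final String ALGORITHM = "HmacSHA256";

    /** Verifier side: create a fresh random challenge for each attempt. */
    static byte[] newNonce() {
        byte[] nonce = new byte[32];
        new SecureRandom().nextBytes(nonce);
        return nonce;
    }

    /** Prover side (the AM in this sketch): prove knowledge of the secret. */
    static byte[] sign(byte[] secret, byte[] nonce) {
        try {
            Mac mac = Mac.getInstance(ALGORITHM);
            mac.init(new SecretKeySpec(secret, ALGORITHM));
            return mac.doFinal(nonce);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Verifier side: recompute the HMAC and compare in constant time. */
    static boolean verify(byte[] secret, byte[] nonce, byte[] response) {
        return MessageDigest.isEqual(sign(secret, nonce), response);
    }

    public static void main(String[] args) {
        // Hypothetical secret; assumed to be shared out of band.
        byte[] secret = "example-secret".getBytes(StandardCharsets.UTF_8);
        byte[] nonce = newNonce();
        byte[] response = sign(secret, nonce);
        System.out.println("matching secret accepted: " + verify(secret, nonce, response));
        byte[] wrong = "other-secret".getBytes(StandardCharsets.UTF_8);
        System.out.println("wrong secret accepted: " + verify(wrong, nonce, response));
    }
}
```

Using a new nonce per attempt keeps a captured response from being replayed,
but the exchange still needs an encrypted channel if the metadata itself is
sensitive.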

On Fri, Jun 10, 2016 at 3:54 AM Mingyu Kim  wrote:

> // forking for clarity
>
>
>
> Related to the question I had below, I’m wondering how I can verify the
> authenticity of the submitted AM. (For example, when I’m making a call to
> the AM, I’d like to verify that I’m talking to the AM that I submitted, not
> someone else who hijacked my network traffic. Also, when the AM makes a
> callback to a server outside YARN, I’d like to verify that it’s the AM I
> submitted, not someone else who’s spoofing.) This can generally be achieved
> by sending a secret (whether that’s a one-time secret that the server
> outside YARN can verify or an SSL keystore) to the AM. Do you know how one
> can securely send the secret to the AM? Or, is there an existing YARN
> mechanism I can rely on to verify the authenticity? (I saw
> ApplicationReport.getClientToAMToken(), but that seems to be for the AM to
> verify the authenticity of the client.) Again, any pointer will be appreciated.
>
>
>
> Thanks,
>
> Mingyu
>
>
>
> *From: *Rohith Sharma K S 
> *Date: *Wednesday, June 8, 2016 at 11:15 PM
> *To: *Mingyu Kim , "user@hadoop.apache.org" <
> user@hadoop.apache.org>
> *Cc: *Matt Cheah 
> *Subject: *RE: Securely discovering Application Master's metadata or
> sending a secret to Application Master at submission
>
>
>
> Hi
>
>
>
> Do you know how I can extend the client interface of the RPC port?
>
> >>> YARN provides the YarnClient library, which uses
> ApplicationClientProtocol. For more details, refer to
> https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html#Writing_a_simple_Client
> 
>
>
>
> I know AM has some endpoints exposed through the RPC port for internal
> YARN communications, but was not sure how I can extend it to expose a
> custom endpoint.
>
> >>> I am not sure what you mean by internal YARN communication. The AM
> connects to the RM only via the AM-RM interface for register/unregister and
> heartbeat, and the details sent to the RM are limited. It is up to the AM to
> expose a client interface for providing metadata.
>
> Thanks & Regards
>
> Rohith Sharma K S
>
> *From:* Mingyu Kim [mailto:m...@palantir.com]
> *Sent:* 09 June 2016 11:21
> *To:* Rohith Sharma K S; user@hadoop.apache.org
> *Cc:* Matt Cheah
> *Subject:* Re: Securely discovering Application Master's metadata or
> sending a secret to Application Master at submission
>
>
>
> Hi Rohith,
>
>
>
> Thanks for the quick response. That sounds promising. Do you know how I
> can extend the client interface of the RPC port? I know AM has some
> endpoints exposed through the RPC port for internal YARN communications,
> but was not sure how I can extend it to expose a custom endpoint. Any
> pointer would be appreciated!
>
>
>
> Mingyu
>
>
>
> *From: *Rohith Sharma K S 
> *Date: *Wednesday, June 8, 2016 at 10:39 PM
> *To: *Mingyu Kim , "user@hadoop.apache.org" <
> user@hadoop.apache.org>
> *Cc: *Matt Cheah 
> *Subject: *RE: Securely discovering Application Master's metadata or
> sending a secret to Application Master at submission
>
>
>
> Hi
>
>
>
> Apart from the AM address and tracking URL, no other metadata of the
> ApplicationMaster is stored in YARN. Maybe the AM can expose a client
> interface so that AM clients can interact with the running AM to retrieve
> specific AM details.
>
>
>
> The RPC port of the AM can be obtained from the YARN client interface, via
> ApplicationClientProtocol#getApplicationReport() or
> ApplicationClientProtocol#getApplicationAttemptReport().
>
>
>
> Thanks & Regards
>
> Rohith Sharma K S
>
>
>
> *From:* Mingyu Kim [mailto:m...@palantir.com ]
> *Sent:* 09 June 2016 10:36
> *To:* user@hadoop.apache.org
> *Cc:* Matt Cheah
> *Subject:* Securely discovering Application Master's metadata or sending
> a secret to Application Master at submission
>
>
>
> Hi all,
>
>
>
> To provide a bit of background, I’m trying to deploy a REST server on
> Application Master and discover the randomly assigned port number securely.
> I can easily discover the host name of AM through YARN REST API, but the
> port number needs to be discovered separately. (Port number is assigned
> within a specified range with retries to avoid port conflicts) An easy
> solution would be to have Application Master make a callback with the port
> number, but I’d like to design it such that YARN nodes don’t talk back to
> the node that submitted the YARN application. So, this problem reduces to
> securely discovering a

Re: ResourceManager API

2016-06-10 Thread Sunil Govind
Hi Kishore

The command below may help you get some of the basic information you are
looking for.
https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YarnCommands.html#logs

Further to this, some more enhancements are happening as part of YARN-4904,
but it is not part of any release as of now.

- Sunil


On Fri, Jun 10, 2016 at 9:32 AM kishore alajangi 
wrote:

> Hi Experts,
>
> Is there a way to get the logs of a running job from the ResourceManager API?
> Please help me.
>
>
> --
> Sincere Regards,
> A.Kishore Kumar,
>
>


Re: Securely discovering Application Master's metadata or sending a secret to Application Master at submission

2016-06-10 Thread Sunil Govind
Hi Mike

Adding to what Rohith mentioned, you can refer to the interface below to
see what information you can get from YARN about an application.
https://hadoop.apache.org/docs/r2.7.0/api/org/apache/hadoop/yarn/api/records/ApplicationReport.html

It includes the ApplicationMaster's RPC port, and you can try to interact with
the AM through that. That said, it is up to the ApplicationMaster to expose
the interfaces you are looking for, and YARN has no control over that, as
Rohith mentioned.

- Sunil


On Fri, Jun 10, 2016 at 11:26 AM Rohith Sharma K S <
rohithsharm...@huawei.com> wrote:

> Hi
>
>
>
> Basically I see you have multiple questions
>
> 1.   How to get AM RPC port ?
>
> >>> You can get this via YarnClient#getApplicationReport(), which returns
> the common, generic details of an application. Note that the RM does not
> maintain any custom details for applications.
>
> 2.   How can you get metadata of AM?
>
> >>> The AM should be designed to bind a client interface to its RPC port.
> The AM's RPC host and port can be obtained from the ResourceManager. From
> the application submitter, connect to the AM at that host:port and get the
> required details from the AM itself. YARN does not provide an interface for
> this, since AMs are written by users. Essentially, users can design their AM
> to expose a client interface to its clients. For a better understanding, see
> MRAppMaster in the MapReduce framework.
>
> 3.   About the authenticity of job-submitter to AM
>
> >>> Use a secured Hadoop cluster with Kerberos enabled. Note that the AM
> must also be implemented to handle Kerberos.
>
>
>
> Thanks & Regards
>
> Rohith Sharma K S
>
>
>
> *From:* Mingyu Kim [mailto:m...@palantir.com]
> *Sent:* 10 June 2016 03:47
>
>
> *To:* Rohith Sharma K S; user@hadoop.apache.org
> *Cc:* Matt Cheah
> *Subject:* Re: Securely discovering Application Master's metadata or
> sending a secret to Application Master at submission
>
>
>
> Hi Rohith,
>
>
>
> Thanks for the pointers. I checked the Hadoop documentation you linked,
> but it’s not clear how I can expose a client interface for providing
> metadata. By “YARN internal communications”, I was referring to the
> endpoints that are exposed by the AM on the RPC port as reported in
> ApplicationReport. I assume either the RM or containers will communicate
> with the AM through these endpoints.
>
>
>
> I believe your suggestion is to expose additional endpoints on the AM RPC
> port. Can you clarify how I can do that? Is there an interface/class I need
> to extend? How can I register the extra endpoints for providing metadata on
> the existing AM RPC port?
>
>
>
> Mingyu
>
>
>
> *From: *Rohith Sharma K S 
> *Date: *Wednesday, June 8, 2016 at 11:15 PM
> *To: *Mingyu Kim , "user@hadoop.apache.org" <
> user@hadoop.apache.org>
> *Cc: *Matt Cheah 
> *Subject: *RE: Securely discovering Application Master's metadata or
> sending a secret to Application Master at submission
>
>
>
> Hi
>
>
>
> Do you know how I can extend the client interface of the RPC port?
>
> >>> YARN provides the YarnClient library, which uses
> ApplicationClientProtocol. For more details, refer to
> https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html#Writing_a_simple_Client
> 
>
>
>
> I know AM has some endpoints exposed through the RPC port for internal
> YARN communications, but was not sure how I can extend it to expose a
> custom endpoint.
>
> >>> I am not sure what you mean by internal YARN communication. The AM
> connects to the RM only via the AM-RM interface for register/unregister and
> heartbeat, and the details sent to the RM are limited. It is up to the AM to
> expose a client interface for providing metadata.
>
> Thanks & Regards
>
> Rohith Sharma K S
>
> *From:* Mingyu Kim [mailto:m...@palantir.com ]
> *Sent:* 09 June 2016 11:21
> *To:* Rohith Sharma K S; user@hadoop.apache.org
> *Cc:* Matt Cheah
> *Subject:* Re: Securely discovering Application Master's metadata or
> sending a secret to Application Master at submission
>
>
>
> Hi Rohith,
>
>
>
> Thanks for the quick response. That sounds promising. Do you know how I
> can extend the client interface of the RPC port? I know AM has some
> endpoints exposed through the RPC port for internal YARN communications,
> but was not sure how I can extend it to expose a custom endpoint. Any
> pointer would be appreciated!
>
>
>
> Mingyu
>
>
>
> *From: *Rohith Sharma K S 
> *Date: *Wednesday, June 8, 2016 at 10:39 PM
> *To: *Mingyu Kim , "user@hadoop.apache.org" <
> user@hadoop.apache.org>
> *Cc: *Matt Cheah 
> *Subject: *RE: Securely discovering Application Master's metadata or
> sending a sec

Re: Looking for documentation/guides on Hadoop 2.7.2

2016-06-10 Thread Mike Wenzel
Hi Anu and Johny,

first of all I want to thank both of you for replying. I didn't mean to come 
across as frustrated as it may have seemed. I'm pretty fine and my interest in 
Hadoop is huge :)

@Anu: I did a free online course by Cloudera / Udacity, which helped me a lot 
to make things clear regarding the Hadoop ecosystem. At least I think it's 
enough for me right now. Taking a second look in the book afterwards also 
helped me to make things clear. Yes, I got the 4th edition, and I will take a 
closer look at the specific chapters later. Thanks for the advice.

Regarding SSH: Yeah, I admit that this is a gap in my Linux knowledge. I'm 
sorry for that. Today I don't even know why I didn't research SSH to get a 
basic understanding of this keygen usage. Anyway, sorry, you're totally right here.

Regarding YARN and HDFS: It seems I didn't clearly show what I'm missing 
here. (I think) I have a basic knowledge of YARN, which is fine for now. In 
the future this will need to be extended and improved anyway.

When I ran through the guide "Hadoop: Setting up a Single Node Cluster", the 
guide says "Format the filesystem: $ bin/hdfs namenode -format". HDFS was 
installed and formatted and almost ready to start and work on.
> My thoughts: Where did the filesystem get installed?
> How can I change these settings to another location?
I think when I install a Hadoop cluster, I won't have HDFS installed in 
/tmp.
"Make the HDFS directories required to execute MapReduce jobs:"
I thought:
> Why are those required to run MapReduce jobs, and do I only need them for 
> this specific example/guide, or do I really need them even if I install 
> Hadoop differently from this guide?

I think that clarifying all those possible questions would make the guide 
worse, because most people want simple step-by-step guides. And that's totally 
fine; maybe it's just me who didn't find other documentation before doing 
this. But at the points shown above I had those questions, and I couldn't 
simply go on without keeping them in mind and thinking about them. I would 
have loved to see some links there like: "for further details check the 
in-depth configuration guide hdfs" and a specific guide for this.

For me this is the first time I got my hands on Hadoop. I did all this running 
in a VM just for some first testing purposes. I know that VMs shouldn't be used.

To answer your question:
"Please let us know what is challenging for you in the current set of 
instructions. Are you able to setup single instance, pseudo instance and then 
progress to a cluster setup?."
I did all 3 steps in the guide "Hadoop: Setting up a Single Node Cluster" and 
everything worked fine. For me the challenge is to answer people's questions 
about the system at this point. By following the guide I installed software on 
my PC, having no idea what components, how many components, and where they got 
installed.

Maybe I asked too early. For now, I'll try to set up multiple machines, each 
with its specific job, and ask/research my questions as soon as they come up.

Best Regards,
Mike.


From: johny casanova [mailto:pcgamer2...@outlook.com]
Sent: Thursday, June 9, 2016 20:38
To: Anu Engineer; Mike Wenzel; user@hadoop.apache.org
Subject: Re: Looking for documentation/guides on Hadoop 2.7.2


Mike,



Here is a guide on how to do some of the work, but it uses Ambari and not 
just the tar.gz. This can help you understand how to piece certain things 
together. 
https://cwiki.apache.org/confluence/display/AMBARI/Start+Guide+Using+Centos+6.x 
This helped me understand more when I was in the same position as you.


From: Anu Engineer <aengin...@hortonworks.com>
Sent: Thursday, June 9, 2016 2:33 PM
To: Mike Wenzel; user@hadoop.apache.org
Subject: Re: Looking for documentation/guides on Hadoop 2.7.2


Hi Mike,



I am sorry your experience with setting up Hadoop has been frustrating and 
mysterious. I will try to give partial answers / pointers to where you should 
be looking. Please be patient with me.



>  After reading the book I had a first idea of how components work together, 
> but for me the book didn't help me to understand what's going on

I generally recommend this book to anyone starting off with Hadoop, and IMHO it 
is the best book for an overview of Hadoop.



>  All my knowledge about installing Hadoop right now is: unpacking a 
> .tar.gz. I ran some shell scripts and everything was running fine

I would have presumed that the book tells you about the various components - 
HDFS, MapReduce, YARN, etc. If you are using edition 4 of the book, please 
look at Chapters 2, 3, and 4.



>  Furthermore, I'm missing all kinds of information about setting those up. 
> The Apache guide at some point says "Now check that you can ssh to the 
> localhost without a passphrase"

Thank you for the feedback. Hadoop relies on an underlying operating system