Re: Kerberos on YARN: delegation or proxying?

2016-03-06 Thread Stefano Baghino
Ok, thank you for the very detailed explanation!

-- 
BR,
Stefano Baghino

Software Engineer @ Radicalbit


Re: Kerberos on YARN: delegation or proxying?

2016-03-06 Thread Maximilian Michels
Hi Stefano,

That is currently a limitation of the Kerberos implementation. Kerberos
authentication is performed only once, when the Flink cluster is brought
up; the YARN session is then tied to that particular user's ticket. Note
that you need at least Hadoop version 2.6.1 to run long-running jobs,
because earlier versions have a bug in the Kerberos client that may let
the ticket expire.

The workaround you already mentioned is to use a per-job YARN cluster.
There is currently no plan to delegate the user token per job, but we
could certainly think about implementing this in the future.

https://ci.apache.org/projects/flink/flink-docs-master/setup/config.html#kerberos
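[Editorial sketch of the per-job workaround mentioned above, with hypothetical paths and a hypothetical Kerberos realm; it assumes a Kerberos-enabled Hadoop 2.6.1+ setup. The idea is to authenticate as the user who should own the job and launch a dedicated per-job YARN cluster with `-m yarn-cluster`, instead of attaching to a shared long-running session:]

```shell
# Authenticate as the user who should own the job's Kerberos ticket
# (EXAMPLE.COM is a placeholder realm)
kinit stefano.baghino@EXAMPLE.COM

# Launch a dedicated per-job YARN cluster; the job then runs with this
# user's credentials rather than the shared session's ticket
bin/flink run -m yarn-cluster -yn 2 -ytm 4096 -ys 3 \
  examples/batch/WordCount.jar \
  --input hdfs:///user/stefano.baghino/hamlet.txt \
  --output hdfs:///user/stefano.baghino/hamlet.out
```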

Cheers,
Max



Re: Kerberos on YARN: delegation or proxying?

2016-03-06 Thread Stefano Baghino
One last note: initially I tried to run the job as the same OS user,
running kdestroy and then kinit as the other user, and got this error.
Trying to run the job in a different OS session, authenticating with
Kerberos as the user who should run the job, I can't connect to the
JobManager. I've added a second log with this error to the gist.




-- 
BR,
Stefano Baghino

Software Engineer @ Radicalbit


Re: Kerberos on YARN: delegation or proxying?

2016-03-06 Thread Stefano Baghino
In the initial description, I meant "I'm trying to access a private folder
of the latter", so not the service account. Sorry for the mistake.




-- 
BR,
Stefano Baghino

Software Engineer @ Radicalbit


Kerberos on YARN: delegation or proxying?

2016-03-06 Thread Stefano Baghino
Hello everybody,

I'm running some tests on how Flink as a long-running YARN session handles
security with Kerberos. In particular, I'm running a test where I run Flink
on YARN with a service account and then deploy a job via CLI as another
user; in the job I'm trying to access a private folder of the former on
HDFS but the job fails due to permission issues (the user running the job
is actually the one who ran Flink on YARN in the first place — the service
account).

I'm running Flink 1.0.0-RC5, launching the long-running session with:

bin/yarn-session.sh -n 2 -tm 4096 -s 3

and then running the following command:

bin/flink run examples/batch/WordCount.jar \
--input hdfs:///user/stefano.baghino/hamlet.txt \
--output hdfs:///user/stefano.baghino/hamlet.out

Here are the logs:
https://gist.github.com/stefanobaghino/6605ec33a1c4b632fb78

It looks like the YARN session is acting as a proxy for the user instead of
receiving a delegation. Is there a way to change this behavior? Is this by
design? Is there an interest in implementing the delegation (if it's not
already implemented)? Otherwise, is there a workaround, apart from running
one-off jobs on YARN?

Thank you so much in advance.

-- 
BR,
Stefano Baghino

Software Engineer @ Radicalbit


Re: SourceFunction Scala

2016-03-06 Thread Márton Balassi
Hey Ankur,

Add the following line to your imports, and have a look at the referenced
FAQ. [1]

import org.apache.flink.streaming.api.scala._

[1]
https://flink.apache.org/faq.html#in-scala-api-i-get-an-error-about-implicit-values-and-evidence-parameters
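[Editorial sketch of what the error means, in plain Scala and independent of Flink: `addSource` declares implicit evidence parameters, and the wildcard import above brings Flink's implicit `TypeInformation` factories into scope. A minimal analogue, using only `ClassTag` as the evidence — names here are illustrative, not Flink's:]

```scala
import scala.reflect.ClassTag

// Mini analogue of Flink's addSource: the method asks the compiler for
// implicit evidence about T (here a ClassTag; in Flink, a TypeInformation).
// Without the right implicits in scope, compilation fails with the same
// kind of "could not find implicit value for evidence parameter" error
// that Ankur reported (here, toArray is what requires the ClassTag).
def addSourceLike[T: ClassTag](elems: T*): Array[T] = elems.toArray

val stream = addSourceLike(1, 2, 3) // compiles: ClassTag[Int] is in scope
```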

Best,

Marton



SourceFunction Scala

2016-03-06 Thread Ankur Sharma
Hello,

I am trying to use a custom source function (declaration given below) for a
DataStream. If I add the source to the stream using addSource:

val stream = env.addSource(new QueryOneSource(args))

I get the following error. Any explanations and help?

Error:(14, 31) could not find implicit value for evidence parameter of type 
org.apache.flink.api.common.typeinfo.TypeInformation[org.mpi.debs.Tuple]
val stream = env.addSource(new QueryOneSource(args))
  ^
Error:(14, 31) not enough arguments for method addSource: (implicit 
evidence$15: scala.reflect.ClassTag[org.mpi.debs.Tuple], implicit evidence$16: 
org.apache.flink.api.common.typeinfo.TypeInformation[org.mpi.debs.Tuple])org.apache.flink.streaming.api.scala.DataStream[org.mpi.debs.Tuple].
Unspecified value parameter evidence$16.
val stream = env.addSource(new QueryOneSource(args))
  ^

class QueryOneSource(filenames: Array[String]) extends SourceFunction[Tuple] {

  val nextTuple: Tuple // case class Tuple(id: Long, value: Int)

  override def run(ctx: SourceContext[Tuple]) = {
    while (true) {
      nextRecord()
      ctx.collect(this.nextTuple)
    }
  }

  override def cancel() = { }

  override def nextRecord() = {
  }
}

Best,
Ankur Sharma
Information Systems Group
3.15 E1.1 Universität des Saarlandes
66123, Saarbrücken Germany
Email: ankur.sha...@mpi-inf.mpg.de  
an...@stud.uni-saarland.de 




Re: Retrieve elements from the Dataset without using collect

2016-03-06 Thread subash basnet
Hello Konstantin,

Yup thanks.


Best Regards,
Subash Basnet



Re: Retrieve elements from the Dataset without using collect

2016-03-06 Thread Konstantin Knauf
Hi Subash,

I think DataSet.first(int n) is what you are looking for.
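[Editorial sketch of the idea in plain Scala, with made-up data and no Flink dependency: truncate the dataset to n elements instead of materializing all of it with collect. On a Flink DataSet, `first(n)` plays the analogous role and returns another DataSet:]

```scala
// Plain-Scala analogue of DataSet.first(n): take the first 10 elements
// lazily, without materializing the whole (here unbounded) dataset.
val counts = Iterator.from(1).map(i => (s"word$i", i)) // stand-in for a DataSet
val firstTen = counts.take(10).toList // only ten elements are ever computed
```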

Cheers,

Konstantin


-- 
Konstantin Knauf * konstantin.kn...@tngtech.com * +49-174-3413182
TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
Sitz: Unterföhring * Amtsgericht München * HRB 135082


Retrieve elements from the Dataset without using collect

2016-03-06 Thread subash basnet
Hello all,

My requirement is to get, say, the top 10 elements from a DataSet as
another DataSet. How would I do that without using collect?
Eg:
*DataSet<Tuple2<String, Integer>> counts =* *data.flatMap(new
Tokenizer());*

I want a new DataSet containing 10 elements of *counts*.

And what would be the way to retrieve individual elements of a DataSet
without going through a List via collect?


Best Regards,
Subash Basnet