Re: Flink parallel tasks, slots and vcores

2017-06-01 Thread Robert Metzger
Hi Sathi,

Are you seeing 8 slots in the JobManager UI?

How many shards do you have in your Kinesis stream?
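
If you're not sure, the AWS CLI can list them ("my-stream" is a placeholder
for your stream name):

aws kinesis describe-stream --stream-name my-stream --query 'StreamDescription.Shards[].ShardId'

As far as I know, consumer source subtasks beyond the shard count simply sit
idle, so the shard count caps how parallel the reading can be.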




Re: Flink parallel tasks, slots and vcores

2017-05-26 Thread Jason Brelloch
Can you give us more information about what your Flink job is doing and the
distribution of the Kinesis data/keys?  The distribution of work depends a
lot on that.  Example: if you are using a Kafka source with a single
partition (I think they are called shards in Kinesis) in the DataStream API
and do not do a keyBy or some other partitioning operation (rebalance,
broadcast, etc.) in the job, then all processing will happen in a single
task slot.  Similarly, if you do a keyBy operation and everything has the
same key, then you will get a very similar distribution of work.
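
To make that concrete, here is a minimal sketch (not your job; the stream
name, region, and processing step are all made up) of putting a
repartitioning step after a low-parallelism source:

import java.util.Properties;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kinesis.FlinkKinesisConsumer;
import org.apache.flink.streaming.connectors.kinesis.config.ConsumerConfigConstants;

public class DistributionSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(8);

        Properties props = new Properties();
        props.setProperty(ConsumerConfigConstants.AWS_REGION, "us-west-2");

        // A one-shard stream is effectively read by a single source subtask,
        // no matter how many slots the cluster has.
        DataStream<String> events = env.addSource(
                new FlinkKinesisConsumer<>("my-stream", new SimpleStringSchema(), props));

        // rebalance() round-robins records across all 8 downstream subtasks.
        // Without it (or a keyBy on well-distributed keys), forward
        // partitioning keeps everything in the one subtask fed by the shard.
        events.rebalance()
                .map(new MapFunction<String, String>() {
                    @Override
                    public String map(String value) {
                        return value.toUpperCase(); // stand-in for real processing
                    }
                })
                .print();

        env.execute("distribution sketch");
    }
}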




-- 
Jason Brelloch | Product Developer
3405 Piedmont Rd. NE, Suite 325, Atlanta, GA 30305



Flink parallel tasks, slots and vcores

2017-05-25 Thread Sathi Chowdhury
Hi Till/ flink-devs,
I am trying to understand why adding slots to the task manager has no
impact on the performance of my test pipeline.
Here is my flink-conf.yaml
jobmanager.rpc.address: localhost
jobmanager.rpc.port: 6123
jobmanager.heap.mb: 1024
taskmanager.memory.preallocate: false
taskmanager.numberOfTaskSlots: 8
parallelism.default: 8
akka.ask.timeout: 1 s
akka.lookup.timeout: 100 s

I have an EMR cluster that can run 8 task managers; for now I want to use
one task manager with multiple slots.
So I start the cluster with:
$FLINK_HOME/bin/yarn-session.sh -d  -n 1 -s 8 -tm 57344

Then I run my Flink job with -p 8.
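For reference, the submit command is roughly the following (the jar name is
a placeholder):

$FLINK_HOME/bin/flink run -p 8 my-pipeline.jar
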
I see only 1 CPU core being used on this task manager.

In the YARN web UI, I see the node that was chosen to run the task manager
(as I gave -n 1), ip-10-202-4-14.us-west-2.compute.internal:8041:
Memory avail: 56 GB
VCores used: 1
VCores avail: 31

I was expecting vcores used to be 8.

Any clue or hint as to why I am seeing this? I am also not seeing any
performance gain (I am using the Flink Kinesis connector to read a JSON
file and sink to S3); throughput is capped at the same level I saw with
one slot.
Thanks
Sathi
