Re: Capturing statistics for more than 5 minutes

2017-11-18 Thread Nomchin Banga
Hi

I have been able to run JMX with Flink with the following configuration
applied to the flink-conf.yaml file of all nodes in the cluster:

metrics.reporters: jmx
metrics.reporter.jmx.class: org.apache.flink.metrics.jmx.JMXReporter
metrics.reporter.jmx.port: 9020-9022

env.java.opts: -Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false

When I run JConsole and connect to master-IP:/slave-IP:9020, I am able to see
system metrics like CPU, memory, etc.

How can I access the task metrics and their respective graphs, like bytesRead,
latency, etc., which are collected for each subtask and shown in the GUI?
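For reference, here is roughly how I am querying the MBeans so far - a minimal
sketch using the standard javax.management API (the Flink domain pattern
"org.apache.flink*" is my guess, and slave-IP is a placeholder):

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class FlinkJmxDump {
    public static void main(String[] args) throws Exception {
        // Connect to one TaskManager's JMX port (9020-9022 per the config above).
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://slave-IP:9020/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            // List every Flink-registered MBean; task and operator metrics
            // should appear here once a job is running.
            for (ObjectName name : mbsc.queryNames(
                    new ObjectName("org.apache.flink*:*"), null)) {
                System.out.println(name);
            }
        }
    }
}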

Any help will be appreciated!

Thanks

On Wed, Nov 15, 2017 at 1:12 AM, Chesnay Schepler wrote:

> Hello,
>
> the Metric System should be exactly what you're looking for.
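>
> If you want to keep metrics for longer than the UI does, the metric system
> also lets you attach a reporter that ships metrics to external storage. A
> sketch for the Graphite reporter in flink-conf.yaml (my example; host/port
> are placeholders, and the flink-metrics-graphite jar needs to be on the
> classpath):
>
> metrics.reporters: grph
> metrics.reporter.grph.class: org.apache.flink.metrics.graphite.GraphiteReporter
> metrics.reporter.grph.host: graphite-host
> metrics.reporter.grph.port: 2003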
>
>
> On 15.11.2017 03:55, Nomchin Banga wrote:
>
> Hi
>
> We are a group of graduate students from Purdue University who are doing
> an experimental study to compare different data ingestion engines.
>
> For this purpose, we are trying to collect statistics of running jobs over
> a few days. However, Flink’s UI only shows statistics for the last 5 minutes.
> We tried the REST API, but it does not expose statistics other than bytes
> read/written.
>
> It would be very helpful if you could point us to a way to capture the entire
> history of metrics.
>
> Looking forward to hearing from you
>
> Regards
> Nomchin Banga
>
>
>


-- 
Nomchin Banga
Senior Year
B.E. (Hons.) Computer Sciences, M.Sc. (Hons.) Mathematics
Birla Institute of Technology and Sciences, Pilani
Alternative email: f2009...@pilani.bits-pilani.ac.in
Voice: +91-9829834948


Re: Apache Flink - Evictor interface clarification

2017-11-18 Thread Vishnu Viswanath
Hi Mans,

Have a look at this: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Enhance-Window-Evictor-in-Flink-tp12406p12442.html
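In short: evictAfter removes elements only after the window function has run,
e.g. to clear out elements that should not participate in the next firing while
still letting the current firing see them. A minimal sketch of what that could
look like (my own example, assuming Flink 1.3's Evictor interface; the
keep-only-the-last-element logic is purely illustrative):

import java.util.Iterator;

import org.apache.flink.streaming.api.windowing.evictors.Evictor;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.streaming.runtime.operators.windowing.TimestampedValue;

public class EvictAllButLastAfterFiring implements Evictor<String, TimeWindow> {

    @Override
    public void evictBefore(Iterable<TimestampedValue<String>> elements,
            int size, TimeWindow window, EvictorContext ctx) {
        // No-op: the window function sees every element in the window.
    }

    @Override
    public void evictAfter(Iterable<TimestampedValue<String>> elements,
            int size, TimeWindow window, EvictorContext ctx) {
        // After the function has fired, drop all but the most recent element,
        // so the next firing starts (almost) fresh.
        Iterator<TimestampedValue<String>> it = elements.iterator();
        for (int i = 0; it.hasNext(); i++) {
            it.next();
            if (i < size - 1) {
                it.remove();
            }
        }
    }
}

You would attach it with something like
stream.keyBy(...).window(...).evictor(new EvictAllButLastAfterFiring()).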

Thanks,
Vishnu 

On Sat, Nov 18, 2017 at 11:28 AM, M Singh  wrote:

> Hi:
>
> I am looking at the documentation and it states:
>
> 'The evictor has the ability to remove elements from a window *after* the
> trigger fires and *before and/or after* the window function is applied. '
>
> I understand that if we use the evictor's before method (evictBefore) it can
> remove items before the window computation is invoked, but I am not sure what
> the use of the evictor's after method (evictAfter) is.
>
> Please let me know where I can find information about the use cases for the
> evictor's after method (evictAfter).
>
> Thanks
>
> Mans
>


RE: Job Manager Configuration

2017-11-18 Thread Chan, Regina
Is your job running on a standalone cluster? I'm using a detached YARN session
in a multi-tenant environment.
And I'm guessing you haven't had to do anything special for the Akka
configurations.


From: Joshua Griffith [mailto:jgriff...@campuslabs.com]
Sent: Thursday, November 16, 2017 2:57 PM
To: Chan, Regina [Tech]
Cc: user@flink.apache.org
Subject: Re: Job Manager Configuration

I have an IO-dominated batch job with 471 distinct tasks (3786 tasks with 
parallelism) running on 8 nodes with 12 GiB of memory and 4 CPUs each. I 
haven’t had any problems adding additional tasks except for 1) tasks timing out 
the first time the cluster is started (I suppose the JVM needs to warm up), and 
2) the UI can’t really handle this many tasks, although using Firefox Quantum 
makes it possible to see what’s going on.

Joshua

On Oct 31, 2017, at 10:25 AM, Chan, Regina 
mailto:regina.c...@gs.com>> wrote:

Asking an additional question, what is the largest plan that the JobManager can 
handle? Is there a limit? My flows don’t need to run in parallel and can run 
independently. I wanted them to run in one single job because it’s part of one 
logical commit on my side.

Thanks,
Regina

From: Chan, Regina [Tech]
Sent: Monday, October 30, 2017 3:22 PM
To: 'user@flink.apache.org'
Subject: Job Manager Configuration

Flink Users,

I have about 300 parallel flows in one job, each with 2 inputs, 3 operators, and
1 sink, which makes for a large job. I keep getting the timeout exception below
even though I've already set a 30-minute timeout and a 6 GB heap on the
JobManager. Is there a heuristic for configuring the JobManager better?

Caused by: 
org.apache.flink.runtime.client.JobClientActorSubmissionTimeoutException: Job 
submission to the JobManager timed out. You may increase 'akka.client.timeout' 
in case the JobManager needs more time to configure and confirm the job 
submission.
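
For reference, the settings I have already raised in flink-conf.yaml (a sketch;
these reflect the timeout and heap mentioned above):

akka.client.timeout: 30 min
jobmanager.heap.mb: 6144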

Regina Chan
Goldman Sachs – Enterprise Platforms, Data Architecture
30 Hudson Street, 37th floor | Jersey City, NY 07302 •  (212) 902-5697



Apache Flink - Evictor interface clarification

2017-11-18 Thread M Singh
Hi:

I am looking at the documentation and it states:

'The evictor has the ability to remove elements from a window after the trigger
fires and before and/or after the window function is applied.'

I understand that if we use the evictor's before method (evictBefore) it can
remove items before the window computation is invoked, but I am not sure what
the use of the evictor's after method (evictAfter) is.

Please let me know where I can find information about the use cases for the
evictor's after method (evictAfter).

Thanks
Mans

Re: all task managers reading from all kafka partitions

2017-11-18 Thread r. r.
Gary, thanks a lot!
I completely forgot that parallelism extends over all slots visible to the 
JobManager!
So adding e.g. -p4 to 'flink run' should suit my use case just fine, I believe.
I'll look deeper into failure recovery with this scheme.
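Probably starting from the restart strategies - a sketch of the flink-conf.yaml
keys as I understand them from the docs (the values are just a first guess):

restart-strategy: fixed-delay
restart-strategy.fixed-delay.attempts: 3
restart-strategy.fixed-delay.delay: 10 s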

Have a great weekend!
-Robert

Re: all task managers reading from all kafka partitions

2017-11-18 Thread Gary Yao
Hi Robert,

Running a single job does not mean that you are limited to a single JVM.

For example, a job with parallelism 4 by default requires 4 task slots to run.
You can provision 4 single-slot TaskManagers on different hosts to connect to
the same JobManager. The JobManager can then take your job and distribute the
execution across the 4 slots. To learn more about the distributed runtime
environment:


https://ci.apache.org/projects/flink/flink-docs-release-1.4/concepts/runtime.html
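
A sketch of what that can look like on a standalone cluster (assuming
taskmanager.numberOfTaskSlots: 1 in each host's flink-conf.yaml, and the x.jar
from your example):

# on each of the 4 hosts
bin/taskmanager.sh start

# submit the job once, with parallelism 4
bin/flink run -p 4 x.jar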

Regarding your concerns about job failures, a failure in the JobManager or
one
of the TaskManagers can bring your job down but Flink has built-in
fault-tolerance on different levels. You may want to read up on the
following
topics:

- Data Streaming Fault Tolerance:
https://ci.apache.org/projects/flink/flink-docs-release-1.3/internals/stream_checkpointing.html
- Restart Strategies:
https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/restart_strategies.html
- JobManager High Availability:
https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/jobmanager_high_availability.html

Let me know if you have further questions.

Best,

Gary

On Fri, Nov 17, 2017 at 11:11 PM, r. r.  wrote:

> Hmm, but I want single slot task managers and multiple jobs so that if one
> job fails it doesn't bring the whole setup (for example 30+ parallel
> consumers) down.
> What setup would you advise? The job is quite heavy and might bring the VM
> down if run with such concurrency in one JVM.
>
> Thanks!
>
>  > Original message
>  > From: Gary Yao g...@data-artisans.com
>  > Subject: Re: all task managers reading from all kafka partitions
>  > To: "r. r."
>  > Sent: 17.11.2017 22:58
>
> > Forgot to hit "reply all" in my last email.
> >
> > On Fri, Nov 17, 2017 at 8:26 PM, Gary Yao wrote:
> >
> > Hi Robert,
> >
> > To get your desired behavior, you should start a single job with
> > parallelism set to 4.
> >
> > Flink does not rely on Kafka's consumer groups to distribute the
> > partitions to the parallel subtasks.
> > Instead, Flink does the assignment of partitions itself and also tracks
> > and checkpoints the offsets internally.
> > This is needed to achieve exactly-once semantics.
> >
> > The group.id that you are setting is used for different purposes, e.g.,
> > to track the consumer lag of a job.
> >
> > Best,
> > Gary
>
> > On Fri, Nov 17, 2017 at 7:54 PM, r. r. wrote:
> >
> > Hi, it's Flink 1.3.2, Kafka 0.10.2.0. I am starting 1 JM and 4 TM (with 1
> > task slot each). Then I deploy 4 times (via ./flink run -p1 x.jar); job
> > parallelism is set to 1. A new thing I just noticed: if I start, in
> > parallel to the Flink jobs, two kafka-console-consumer (with
> > --consumer-property group.id=TopicConsumers) and write a msg to Kafka,
> > then one of the console consumers receives the msg together with both
> > Flink jobs. I thought maybe the Flink consumers didn't receive the group
> > property passed via "flink run .. --group.id TopicConsumers", but no -
> > they do belong to the group as well:
> >
> > taskmanager_3  | 2017-11-17 18:29:00,750 INFO
> > org.apache.kafka.clients.consumer.ConsumerConfig - ConsumerConfig values:
> > taskmanager_3  | auto.commit.interval.ms = 5000
> > taskmanager_3  | auto.offset.reset = latest
> > taskmanager_3  | bootstrap.servers = [kafka:9092]
> > taskmanager_3  | check.crcs = true
> > taskmanager_3  | client.id =
> > taskmanager_3  | connections.max.idle.ms = 54
> > taskmanager_3  | enable.auto.commit = true
> > taskmanager_3  | exclude.internal.topics = true
> > taskmanager_3  | fetch.max.bytes = 52428800
> > taskmanager_3  | fetch.max.wait.ms = 500
> > taskmanager_3  | fetch.min.bytes = 1
> > taskmanager_3  | group.id = TopicConsumers
> > taskmanager_3  | heartbeat.interval.ms = 3000
> > taskmanager_3  | interceptor.classes = null
> > taskmanager_3  | key.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer
> > taskmanager_3  | max.partition.fetch.bytes = 1048576