Is Flink HIPAA certified

2020-06-27 Thread Prasanna kumar
Hi Community,

Could anyone let me know whether Flink is used in the US healthcare tech space?

Thanks,
Prasanna.


Distributed Anomaly Detection using MIDAS

2020-06-27 Thread Shivin Srivastava
Hi All,

I have recently been exploring MIDAS: an algorithm for Streaming Anomaly
Detection. A production level parallel and distributed implementation of
MIDAS should be quite useful to the industry. I feel that Flink is very
well suited for this, as MIDAS deals with streaming data. If anyone is
interested in contributing or collaborating, please let me know. Currently,
there exist C++, Python, Ruby, Rust, R, and Golang implementations.

MIDAS repository: https://github.com/bhatiasiddharth/MIDAS
MIDAS paper: https://www.comp.nus.edu.sg/~sbhatia/assets/pdf/midas.pdf

Thanks!
Shivin


Optimal Flink configuration for Standalone cluster.

2020-06-27 Thread Dimitris Vogiatzidakis
Hello,

I'm having a bit of trouble understanding the memory configuration on
flink.
I'm using Flink 1.10.0 to read some datasets of edges and extract features.
I run this on a cluster consisting of 4 nodes, with 32 cores and 252 GB RAM
each, and I hope to expand it by adding extra nodes to the cluster.

So, regarding the configuration file (flink-conf.yaml):
a) I can't understand when I should use process.size and when flink.size.

b) From the detailed memory model I understand that direct memory is
included in both flink.size and process.size; however, if I don't specify
off-heap.task.size I get
"OutOfMemoryError: Direct buffer memory". Also, should I change
off-heap.fraction as well?

c) When I fix this, I get a network buffers error, which, if I understand
correctly, means that flink.size * network fraction should be between min
and max.

I can't find the 'perfect' configuration regarding my setup. What is the
optimal way to use the system I have currently?

Thank you for your time.


Re: Question about Watermarks within a KeyedProcessFunction

2020-06-27 Thread David Anderson
With an AscendingTimestampExtractor, watermarks are not created for every
event, and as your job starts up, some events will be processed before the
first watermark is generated.

The impossible value you see is an initial value that's in place until the
first real watermark is available. On the other hand, onTimer cannot be
called until some timer is triggered by the arrival of a watermark, at
which point the watermark will have a reasonable value.
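
The date in the question is consistent with the smallest signed 64-bit value (Java's Long.MIN_VALUE) read as epoch milliseconds, which is the placeholder Flink reports before the first real watermark. A minimal sketch of that arithmetic follows, written in Python only for illustration; the average Gregorian year length is an approximation, and the function name is made up here.

```python
# Smallest signed 64-bit integer, i.e. Java's Long.MIN_VALUE.
LONG_MIN = -(2 ** 63)

# Average length of a Gregorian year in milliseconds (an approximation).
MS_PER_YEAR = 365.2425 * 24 * 60 * 60 * 1000


def approx_year(epoch_millis: int) -> int:
    """Approximate calendar year for a timestamp given in epoch milliseconds."""
    return round(1970 + epoch_millis / MS_PER_YEAR)


if __name__ == "__main__":
    # Lands roughly 292 million years before 1970, like the reported date.
    print(approx_year(LONG_MIN))
```

So a check such as "watermark == Long.MIN_VALUE" in processElement is one way to recognize that no real watermark has arrived yet.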

On Sat, Jun 27, 2020 at 2:37 AM Marco Villalobos 
wrote:

>
> My source is a Kafka topic.
> I am using Event Time.
> I assign the event time with an AscendingTimestampExtractor
>
> I noticed when debugging that in the KeyedProcessFunction that
> after my highest known event time of:  2020-06-23T00:46:30.000Z
>
> the processElement method had a watermark with an impossible date of:
> -292275055-05-16T16:47:04.192Z
>
> but in the onTimer method it had a more reasonable value that trails the
> highest known event time by 1 millisecond, which is this
> value:  2020-06-23T00:46:29.999Z
>
> I want to know, why does the processElement method have an impossible
> watermark value?
>
>
>


Re: Running Apache Flink on the GraalVM as a Native Image

2020-06-27 Thread Stephen Connolly
On Thu 25 Jun 2020 at 12:48, ivo.kn...@t-online.de 
wrote:

> What's up guys,
>
>
>
> I'm trying to run an Apache Flink Application with the GraalVM Native
> Image but I get the following error: (check attached file)
>
>
>
> I suppose this happens because Flink uses a lot of low-level code and is
> highly optimized.
>

Actually I suspect the reason is that Flink uses dynamic classloading.

GraalVM requires all the code available in order to produce a native image.

You’d need to pre-bind the topology you want Flink to run into the native
image.

More fun, you’ll actually need two images, one for the job manager and one
for the task manager.

And you’ll need to convince GraalVM that the entry point, your topology,
needs reflection support enabled... plus whatever other classes use
reflection in Flink.

Sounds rather complex to me. If native images are what is important to you,
there seemed to be a strong contender in the Rust language community. It
didn’t provide management as strong as Flink’s, and you’d probably have more
work managing things like checkpointing, but if native code is important
that’s where I’d be looking. Sadly I cannot remember the name and my
google-fu is weak tonight.


>
> When I googled the combination of GraalVM Native Image and Apache Flink I
> get no results.
>
>
>
> Has anyone ever succeeded in making it work, and if so, how?
>
>
>
> Best regards,
>
>
>
> Ivo
> 
>


Re: Heartbeat of TaskManager timed out.

2020-06-27 Thread Xintong Song
Hi Ori,

Here are some suggestions from my side.

   - Probably the most straightforward way is to try increasing the timeout
   to see if that helps. You can leverage the configuration option
   `heartbeat.timeout`[1]. The default is 50s.
   - It might be helpful to share your configuration setups (e.g., the TM
   resources, JVM parameters, timeout, etc.). Maybe the easiest way is to
   share the beginning part of your JM/TM logs, including the JVM parameters
   and all the loaded configurations.
   - You may want to look into the GC logs in addition to the metrics. During
   a CMS GC stop-the-world pause, you may not be able to see the most recent
   metrics because the process does not respond to the metric querying
   services.
   - You may also look into the status of the JM process. If the JM is under
   significant GC pressure, the heartbeat message from the TM might not be
   handled in time, before the timeout check.
   - Are there any metrics monitoring the network condition between the JM
   and the timed-out TM? Possibly any jitter?
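
As a rough illustration of the last few points, here is a minimal sketch of scanning heartbeat arrival times for silences longer than the timeout. It is not Flink code: the function name and the sample timestamps are hypothetical, and only the 50 s default comes from the option mentioned above.

```python
from typing import List, Tuple

HEARTBEAT_TIMEOUT_S = 50.0  # default of `heartbeat.timeout` mentioned above


def find_gaps(
    arrivals: List[float], timeout: float = HEARTBEAT_TIMEOUT_S
) -> List[Tuple[float, float]]:
    """Return (start, end) pairs of consecutive arrivals further apart than timeout."""
    ordered = sorted(arrivals)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > timeout]


if __name__ == "__main__":
    # Hypothetical arrival times in seconds, with a 70 s silence after t=30.
    sample = [0.0, 10.0, 20.0, 30.0, 100.0, 110.0]
    print(find_gaps(sample))
```

Lining up such gaps against GC pause timestamps on both the JM and the TM side would show whether a long pause on either end coincides with the timeout.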


Thank you~

Xintong Song


[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/config.html#heartbeat-timeout

On Thu, Jun 25, 2020 at 11:15 PM Ori Popowski  wrote:

> Hello,
>
> I'm running Flink 1.10 on EMR and reading from Kafka with 189 partitions
> and I have parallelism of 189.
>
> Currently running with RocksDB, with checkpointing disabled. My state size
> is approx. 500 GB.
>
> I'm getting sporadic "Heartbeat of TaskManager timed out" errors for no
> apparent reason.
>
> I check the container that gets the timeout for GC pauses, heap memory,
> direct memory, mapped memory, offheap memory, CPU load, network load, total
> out-records, total in-records, backpressure, and everything I can think of.
> But all those metrics show that there's nothing unusual, and it has around
> average values for all those metrics. There are a lot of other containers
> which score higher.
>
> All the metrics are very low because every TaskManager runs on a
> r5.2xlarge machine alone.
>
> I've been trying to debug this for days and I cannot find any explanation for it.
>
> Can someone explain why it's happening?
>
> java.util.concurrent.TimeoutException: Heartbeat of TaskManager with id container_1593074931633_0011_01_000127 timed out.
>     at org.apache.flink.runtime.jobmaster.JobMaster$TaskManagerHeartbeatListener.notifyHeartbeatTimeout(JobMaster.java:1147)
>     at org.apache.flink.runtime.heartbeat.HeartbeatMonitorImpl.run(HeartbeatMonitorImpl.java:109)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:397)
>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:190)
>     at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152)
>     at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
>     at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
>     at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>     at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
>     at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
>     at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
>     at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
>     at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
>     at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
>     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
>     at akka.actor.ActorCell.invoke(ActorCell.scala:561)
>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
>     at akka.dispatch.Mailbox.run(Mailbox.scala:225)
>     at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
>     at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>     at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>     at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>     at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>
> Thanks
>


Re: Optimal Flink configuration for Standalone cluster.

2020-06-27 Thread Xintong Song
Hi Dimitris,

Regarding your questions.
a) For standalone clusters, the recommended way is to use `.flink.size`
rather than `.process.size`. `.process.size` includes JVM metaspace and
overhead in addition to `.flink.size`, which usually do not really matter
for standalone clusters.
b) In case of direct OOMs, you should increase
`taskmanager.memory.task.off-heap.size`. There's no fraction for that.
c) Your understanding is correct. And you can also specify the absolute
network memory size by setting the min and max to the same value.
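
As a rough illustration of point c), the check can be sketched as below. This is plain arithmetic, not Flink code: the function names and all numbers are assumptions chosen for illustration, and how Flink reacts to an out-of-range value is not modeled.

```python
def derived_network_memory(flink_size_mb: float, fraction: float) -> float:
    """Network memory derived from the total Flink memory, in MB."""
    return flink_size_mb * fraction


def within_bounds(value_mb: float, min_mb: float, max_mb: float) -> bool:
    """True if the derived value lies inside the configured [min, max] range."""
    return min_mb <= value_mb <= max_mb


if __name__ == "__main__":
    # Hypothetical numbers, not a recommendation for this cluster.
    derived = derived_network_memory(20 * 1024, 0.1)
    print(derived, within_bounds(derived, 64, 1024))
    # Setting min == max leaves exactly one admissible size.
    print(within_bounds(2048, 2048, 2048))
```

With these assumed numbers the derived size is about 2 GB, which falls outside a 64 MB to 1 GB range; either the range or the fraction would have to change.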

Here are my suggestions according to what you described.

   1. Since both off-heap and network memory seem insufficient, I would
   suggest increasing `taskmanager.memory.flink.size` to give your task
   managers more memory in total.
   2. If 1) does not work, I would suggest not setting the total memory
   (i.e., configure neither `.flink.size` nor `.process.size`), and instead
   going for the fine-grained configuration, where you explicitly specify
   the individual memory components. Flink will automatically add them up to
   derive the total memory.
      1. In addition to `.task.off-heap.size` and `.network.[min|max]`, you
      will also need to set `.task.heap.size` and `.managed.size`.
      2. If you don't know how much heap/managed memory to configure, you
      can look for the configuration options at the beginning of the TM logs
      (`-Dkey=value`). Those are the values derived from your current
      configuration.
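
The fine-grained approach in 2) amounts to a simple sum, sketched below. The class and field names are hypothetical and every value is an assumption for illustration (including the framework heap/off-heap figures); the real numbers should come from the `-Dkey=value` lines in the TM logs.

```python
from dataclasses import dataclass


@dataclass
class TaskManagerMemory:
    """Memory components in MB; all values are illustrative assumptions."""

    task_heap: int
    task_off_heap: int
    managed: int
    network: int
    framework_heap: int = 128      # assumed value
    framework_off_heap: int = 128  # assumed value
    jvm_metaspace: int = 256       # assumed value
    jvm_overhead: int = 1024       # assumed value

    def flink_size(self) -> int:
        """Total Flink memory: the sum of the individual components."""
        return (
            self.framework_heap + self.framework_off_heap
            + self.task_heap + self.task_off_heap
            + self.managed + self.network
        )

    def process_size(self) -> int:
        """Total process memory: Flink memory plus JVM metaspace and overhead."""
        return self.flink_size() + self.jvm_metaspace + self.jvm_overhead


if __name__ == "__main__":
    tm = TaskManagerMemory(task_heap=4096, task_off_heap=1024, managed=8192, network=2048)
    print(tm.flink_size(), tm.process_size())
```

Raising one component, for example the task off-heap size after a direct-memory OOM, then increases the total instead of squeezing the other components.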


Thank you~

Xintong Song



On Sat, Jun 27, 2020 at 10:56 PM Dimitris Vogiatzidakis <
dimitrisvogiatzida...@gmail.com> wrote:

> Hello,
>
> I'm having a bit of trouble understanding the memory configuration on
> flink.
> I'm using Flink 1.10.0 to read some datasets of edges and extract features.
> I run this on a cluster consisting of 4 nodes, with 32 cores and 252 GB RAM
> each, and I hope to expand it by adding extra nodes to the cluster.
>
> So, regarding the configuration file (flink-conf.yaml):
> a) I can't understand when I should use process.size and when flink.size.
>
> b) From the detailed memory model I understand that direct memory is
> included in both flink.size and process.size; however, if I don't specify
> off-heap.task.size I get
> "OutOfMemoryError: Direct buffer memory". Also, should I change
> off-heap.fraction as well?
>
> c) When I fix this, I get a network buffers error, which, if I understand
> correctly, means that flink.size * network fraction should be between min
> and max.
>
> I can't find the 'perfect' configuration regarding my setup. What is the
> optimal way to use the system I have currently?
>
> Thank you for your time.
>
>
>


Re: CEP use case !

2020-06-27 Thread Benchao Li
Hi Aissa,

Flink CEP is an API for matching a pattern across multiple events,
like (START MIDDLE+ END).
If you can calculate the "sensor_status" from a single record, I think the
Flink DataStream API or the Table & SQL API could already satisfy your
requirement.
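
To make the distinction concrete, here is a minimal per-record sketch in plain Python (not Flink API). The function name is hypothetical. The threshold of 15 is taken from the question, mapping it to "Alert" is an assumption, and the humidity rule and the Warning level are left out because the question does not give their thresholds.

```python
from typing import Dict, List


def sensor_status(record: Dict[str, float], alert_temperature: float = 15.0) -> str:
    """Classify one reading using only that reading's own fields."""
    if record["temperature"] > alert_temperature:
        return "Alert"
    return "Normal"


def classify_all(records: List[Dict[str, float]]) -> List[str]:
    """Apply the rule to each record independently, like a map operator."""
    return [sensor_status(r) for r in records]


if __name__ == "__main__":
    readings = [{"temperature": 12.0}, {"temperature": 18.5}]
    print(classify_all(readings))
```

Because each status depends on one record only, this is a plain map step; CEP becomes relevant only once the status depends on a sequence of records.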

On Thu, Jun 25, 2020 at 11:35 PM Aissa Elaffani wrote:

> Hello Guys,
> I am asking whether the CEP API can resolve my use case. I have a lot
> of sensors generating data, and I want to apply a rules engine to
> that sensor data in order to define a "sensor_status": Normal,
> Alert, or Warning. For each record I want to apply some conditions (if
> temperature>15, humidity>...) and then define the status of the sensor:
> whether it is in Normal status, Alert status, ...
> And I am wondering if the CEP API can help me achieve that.
> Thank you guys for your time!
> Best,
> AISSA
>


-- 

Best,
Benchao Li


Re: Native K8S IAM Role?

2020-06-27 Thread Yang Wang
Hi kevin,

If you mean adding annotations to Flink native K8s session pods, you could
use "kubernetes.jobmanager.annotations"
and "kubernetes.taskmanager.annotations"[1]. However, they are only
supported from release-1.11 onwards. You may want to wait a little longer,
as 1.11 will be released soon. We are also adding more features for the
native K8s integration in 1.11 (e.g. application mode, labels, annotations,
tolerations, etc.).


[1].
https://ci.apache.org/projects/flink/flink-docs-master/ops/config.html#kubernetes

Best,
Yang

On Fri, Jun 26, 2020 at 3:09 AM Bohinski, Kevin wrote:

> Hi,
>
>
>
> How do we attach an IAM role to the native K8S sessions?
>
>
>
> Typically for our other pods we use the following in our yamls:
>
> spec:
>   template:
>     metadata:
>       annotations:
>         iam.amazonaws.com/role: ROLE_ARN
>
>
>
> Best
>
> kevin
>