Big data architecture

2021-07-15 Thread Aissa Elaffani
Hello Guys,

I'm sorry for asking this question here, as it is not directly related to
Apache Flink, but if someone can help I would be so grateful. I want to
build a big data architecture for batch processing. We have a lot of data
generated every day, coming from many sources, mainly two Oracle systems
plus other external data (scraping, external files, ...), and we want to
build an architecture for storing and processing this data; we usually
process the previous day's data (J-1). I've already worked on a big data
project, but it was for real-time streaming: I used Kafka as a distributed
messaging system and Apache Flink for consuming the data. So in this case
I'm kind of lost, especially since this is my first batch processing project.
If anyone can point me to the technologies I could use for storage and
processing, I would be so grateful, and I'm sorry again for asking a
question that is not related to Flink ... But I know that there are some
brilliant senior big data architects here who can help me build this
architecture.

Thank you so much, I wish you all LUCK & SUCCESS!


multiple Kafka topics

2020-08-09 Thread Aissa Elaffani
Hello Guys,
I am working on a Flink application in which I consume data from Apache
Kafka. The data is published to three topics of the cluster, and I need to
read from all of them, so I suppose I can create three
FlinkKafkaConsumer instances. The data I am consuming has the same overall
format {Id_sensor:, Id_equipement, Date:, Value{...}, ...}, but the
"Value" field changes from topic to topic: in the first topic it holds a
temperature, "Value":{"temperature":26}; the second topic contains oil data,
"Value":{"oil_data":26}; and in the third topic the value field is
"Value": {"Pitch":, "Roll", "Yaw"}.
So I created three FlinkKafkaConsumer instances and defined a
DeserializationSchema for each topic's data. The problem is that I want to
do some aggregations on all of this data together in order to apply a
function. So I am wondering whether it is a problem to join the three
streams into one stream, do my aggregation by a field, apply the function,
and finally sink the result. And if so, am I going to have a problem
sinking the data, given that, as I explained, the value field differs from
one topic to another? Can anyone give me an explanation please?
I would be so grateful. Thank you for your time!
Aissa
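
A possible shape for this, as a rough sketch only: give each topic its own
DeserializationSchema, normalize each stream into one shared POJO, and union
the three streams so that the aggregation and the sink see a single type.
Everything named below (SensorReading, the toReading converters, topic names,
schemas, props, and env, the StreamExecutionEnvironment) is an illustrative
assumption, not taken from this thread:

    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

    // Assumed common POJO: the shared fields plus a Map<String, Double>
    // holding whichever "Value" payload the topic carries.
    DataStream<SensorReading> temperature = env
            .addSource(new FlinkKafkaConsumer<>("temperature", temperatureSchema, props))
            .map(TemperatureEvent::toReading)
            .returns(TypeInformation.of(SensorReading.class));
    DataStream<SensorReading> oil = env
            .addSource(new FlinkKafkaConsumer<>("oil", oilSchema, props))
            .map(OilEvent::toReading)
            .returns(TypeInformation.of(SensorReading.class));
    DataStream<SensorReading> attitude = env
            .addSource(new FlinkKafkaConsumer<>("attitude", attitudeSchema, props))
            .map(AttitudeEvent::toReading)
            .returns(TypeInformation.of(SensorReading.class));

    // union() requires identical element types, hence the normalization above.
    DataStream<SensorReading> all = temperature.union(oil, attitude);

Because everything downstream of union() handles one uniform type, keying,
aggregating, and sinking no longer depend on which topic a record came from.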


CEP use case ?

2020-07-16 Thread Aissa Elaffani
Hello Guys,
I have some sensors generating data about their state (temperature,
humidity, positioning, ...), and I want to apply some rules (simple
conditions, e.g. if temperature > 25, ...) in order to decide whether a
sensor is in "Normal" status or "Alerte" status. Do I need to use Flink CEP,
or is the DataStream API sufficient to determine the status of each sensor?
Sorry for disturbing you! I hope someone can help me with that.
Thank you guys.
Best
Aissa
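
For stateless per-record rules like these, the DataStream API alone is
generally sufficient; CEP mainly pays off when a rule spans a sequence of
events (e.g. "three alerts within five minutes"). A minimal sketch, assuming
a SensorEvent POJO with the obvious accessors:

    import org.apache.flink.streaming.api.datastream.DataStream;

    // Each record is classified on its own, so a plain map() is enough.
    DataStream<SensorEvent> labeled = sensorStream.map(e -> {
        e.setStatus(e.getTemperature() > 25 ? "Alerte" : "Normal");
        return e;
    });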


ERROR submitting a Flink job

2020-07-14 Thread Aissa Elaffani
Hello Guys,
I am trying to launch a Flink app on a remote server, but I get this
error message:
org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: org.apache.flink.client.program.ProgramInvocationException: Job failed (JobID: 8bf7f299746e051ea7b94afd07e29d3d)
    at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:335)
    at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:205)
    at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:138)
    at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:662)
    at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:210)
    at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:893)
    at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:966)
    at org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
    at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:966)
Caused by: java.util.concurrent.ExecutionException: org.apache.flink.client.program.ProgramInvocationException: Job failed (JobID: 8bf7f299746e051ea7b94afd07e29d3d)
    at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
    at org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:83)
    at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1620)
    at sensors.StreamingJob.main(StreamingJob.java:145)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:321)
    ... 8 more
Caused by: org.apache.flink.client.program.ProgramInvocationException: Job failed (JobID: 8bf7f299746e051ea7b94afd07e29d3d)
    at org.apache.flink.client.deployment.ClusterClientJobClientAdapter.lambda$null$6(ClusterClientJobClientAdapter.java:112)
    at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
    at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
    at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
    at org.apache.flink.client.program.rest.RestClusterClient.lambda$pollResourceAsync$21(RestClusterClient.java:565)
    at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
    at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
    at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
    at org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$8(FutureUtils.java:291)
    at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
    at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
    at java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:575)
    at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:943)
    at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:456)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
    at org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:147)
    at org.apache.flink.client.deployment.ClusterClientJobClientAdapter.lambda$null$6(ClusterClientJobClientAdapter.java:110)
    ... 19 more
Caused by: org.apache.flink.runtime.JobException: Recovery is suppressed by NoRestartBackoffTimeStrategy
    at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:110)
    at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:76)
    at

deploying a Flink app on a server;

2020-07-13 Thread Aissa Elaffani
Hello Guys,
Can someone please explain to me how I can deploy a Flink app on a server,
and the steps I need to follow in order to achieve that?
Sorry for disturbing you guys.
Aissa
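
The broad steps, sketched with placeholder paths and host names (the entry
class sensors.StreamingJob is taken from the stack trace in the thread above):

    # 1. Build a fat jar of the job locally (Maven example)
    mvn clean package

    # 2. Copy the jar to the server
    scp target/my-job.jar user@server:/opt/jobs/

    # 3. On the server, start a Flink cluster if one is not already running
    ./bin/start-cluster.sh

    # 4. Submit the job through the Flink CLI, naming the entry class
    ./bin/flink run -c sensors.StreamingJob /opt/jobs/my-job.jar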


CEP use case !

2020-06-25 Thread Aissa Elaffani
Hello Guys,
I am asking whether the CEP API can solve my use case. I have a lot of
sensors generating data, and I want to apply a rules engine to that sensor
data, in order to set a "sensor_status" of Normal, Warning, or Alert. For
each record I want to apply some conditions (if temperature > 15,
humidity > ..., etc.) and then define the status of the sensor, whether it
is in Normal status or Alerte status ...
And I am wondering if the CEP API can help me achieve that.
Thank you guys for your time !
Best,
AISSA
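
CEP can express this, though for independent per-record conditions a plain
map (as sketched under the "CEP use case ?" thread above) is usually the
simpler tool. A hedged sketch of a single-event CEP pattern, assuming a
SensorEvent type:

    import org.apache.flink.cep.CEP;
    import org.apache.flink.cep.PatternStream;
    import org.apache.flink.cep.pattern.Pattern;
    import org.apache.flink.cep.pattern.conditions.SimpleCondition;

    // One-element pattern: matches any reading that crosses the threshold.
    Pattern<SensorEvent, ?> alertPattern = Pattern
            .<SensorEvent>begin("reading")
            .where(new SimpleCondition<SensorEvent>() {
                @Override
                public boolean filter(SensorEvent e) {
                    return e.getTemperature() > 15;
                }
            });

    PatternStream<SensorEvent> matches = CEP.pattern(sensorStream, alertPattern);

CEP becomes the better fit once a rule involves ordering or time, e.g.
"Alerte only if two high readings arrive within one minute".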


Window Function use case;

2020-06-04 Thread Aissa Elaffani
Hello guys,
I have a use case where I am receiving data from sensors about their
status (Normal or Alerte), e.g. {SensorID:"1", FactoryID:"1", Status:"Normal"
..}. A factory can contain many sensors, so what I want is: if the status
of one sensor in a factory is Alerte, I want to raise an alert for the
whole factory (the factory status must be Alerte) ... I did a
stream.keyBy("FactoryID").window(). Can you please suggest a window
function that can fulfill my use case (if one sensor of a factory raises
"Alerte", I want the factory status to be "Alerte") ... I hope someone
understands my use case!! Sorry for disturbing you, and thank you for your
time!
Best,
Aissa
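
One window function that fits this, as a sketch: after keying by FactoryID,
a reduce can propagate a single "Alerte" to the whole window, because the
reduce always keeps an alerting element whenever one exists. The window size
and the SensorEvent accessors are assumptions:

    import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
    import org.apache.flink.streaming.api.windowing.time.Time;

    // Per factory and per window: the result is "Alerte" if any sensor was.
    stream.keyBy(e -> e.getFactoryId())
          .window(TumblingProcessingTimeWindows.of(Time.minutes(1)))
          .reduce((a, b) -> "Alerte".equals(a.getStatus()) ? a : b);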


Data Stream Enrichment

2020-05-30 Thread Aissa Elaffani
Hello Guys,
I want to enrich a data stream with some MongoDB data, and I plan to use a
RichFlatMapFunction, but I am lost: I don't know where to configure the
connection to my MongoDB. Can anyone help me with this?
Best,
Aissa
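
The usual place is open(), which runs once per parallel task before any
records flow, so the client is created once and reused for every record. A
rough sketch; the connection string, database, collection, field names, and
the event types are all placeholders:

    import com.mongodb.client.MongoClient;
    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoCollection;
    import com.mongodb.client.model.Filters;
    import org.apache.flink.api.common.functions.RichFlatMapFunction;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.util.Collector;
    import org.bson.Document;

    public class MongoEnricher extends RichFlatMapFunction<SensorEvent, EnrichedEvent> {
        private transient MongoClient client;
        private transient MongoCollection<Document> collection;

        @Override
        public void open(Configuration parameters) {
            // One client per parallel task instance, not per record.
            client = MongoClients.create("mongodb://localhost:27017");
            collection = client.getDatabase("mydb").getCollection("sensors");
        }

        @Override
        public void flatMap(SensorEvent event, Collector<EnrichedEvent> out) {
            Document doc = collection.find(Filters.eq("sensorID", event.getSensorId())).first();
            if (doc != null) {
                out.collect(new EnrichedEvent(event, doc));
            }
        }

        @Override
        public void close() {
            if (client != null) {
                client.close();
            }
        }
    }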


multiple sources

2020-05-27 Thread Aissa Elaffani
Hello everyone,
I hope you are all doing well. I am reading from a Kafka topic some
real-time messages produced by sensors, and in order to do some
aggregations I need to enrich the stream with other data that is stored in
a MongoDB. So I want to know whether it is possible to work with two
sources in one job, and if yes, how to do so?
Best,
Aissa
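
Yes: a single job can register any number of sources on the same
environment, and the resulting streams can then be combined, e.g. with
union(), connect(), or a per-record lookup as in the enrichment thread
above. A sketch with assumed names (topic, schema, and a custom Mongo
SourceFunction):

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

    DataStream<SensorEvent> sensors =
            env.addSource(new FlinkKafkaConsumer<>("sensors", schema, kafkaProps));
    DataStream<Reference> reference =
            env.addSource(new MongoReferenceSource());  // assumed custom SourceFunction

    // e.g. pair every sensor event with the reference data via connect():
    sensors.connect(reference.broadcast())
           .flatMap(new EnrichmentCoFlatMap());         // assumed CoFlatMapFunction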


Flink suggestions;

2020-05-14 Thread Aissa Elaffani
Hello Guys,
I am a beginner in the field of real-time streaming, and I am working with
Apache Flink, many of whose features I still don't know. I am building an
application in which I receive sensor data in this format: {"status":
"Alerte", "classe": " ", "value": {"temperature": 15.7}, "mode": "ON",
"equipementID": 1, "date": "2019-03-20 22:00", "sensorID": 9157}. Each
sensor is installed on a piece of equipment in a workshop in a factory
somewhere. My goal is:
If one sensor of a factory goes down (status = "Alerte"), I want the status
of the whole factory to become Alerte. But since the stream does not
contain the factory ID, I need another, static data source that contains
the factories and the sensors belonging to each one.
So please, I would like to know the best way to do this, and the
aggregation I need to perform!
Sorry for disturbing you, I wish you all the best! And I hope you will
share some of your experience with me!
Best regards,
Aissa
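
If the sensor-to-factory mapping is small and rarely changes, one approach,
sketched here with a placeholder file path and field names, is to load it
once per task in open() and enrich each record from an in-memory map (for
reference data that changes at runtime, Flink's broadcast state pattern is
the usual alternative):

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.configuration.Configuration;

    public class FactoryIdEnricher extends RichMapFunction<SensorEvent, SensorEvent> {
        private transient Map<Integer, Integer> sensorToFactory;

        @Override
        public void open(Configuration parameters) throws IOException {
            sensorToFactory = new HashMap<>();
            // Load the sensorID -> factoryID mapping once per parallel task.
            for (String line : Files.readAllLines(Paths.get("/opt/factories.csv"))) {
                String[] parts = line.split(",");
                sensorToFactory.put(Integer.parseInt(parts[0]), Integer.parseInt(parts[1]));
            }
        }

        @Override
        public SensorEvent map(SensorEvent e) {
            e.setFactoryId(sensorToFactory.get(e.getSensorId()));
            return e;
        }
    }

Once the factory ID is attached, a keyBy(factoryID) plus a window reduce
(as in the "Window Function use case" thread above) yields the factory-wide
Alerte status.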


MongoDB sink;

2020-05-06 Thread Aissa Elaffani
Hello,
I want to sink my data to MongoDB, but as far as I know there is no sink
connector for MongoDB. How can I implement a MongoDB sink? If there are any
other solutions, I hope you can share them with me.
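
Before an official connector existed, the usual route was a hand-rolled
RichSinkFunction: open the client in open(), write in invoke(), close in
close(). A rough sketch with placeholder connection string and names:

    import com.mongodb.client.MongoClient;
    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoCollection;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
    import org.bson.Document;

    public class MongoSink extends RichSinkFunction<Document> {
        private transient MongoClient client;
        private transient MongoCollection<Document> collection;

        @Override
        public void open(Configuration parameters) {
            client = MongoClients.create("mongodb://localhost:27017");
            collection = client.getDatabase("mydb").getCollection("results");
        }

        @Override
        public void invoke(Document value, Context context) {
            collection.insertOne(value);  // one insert per record; no batching here
        }

        @Override
        public void close() {
            if (client != null) {
                client.close();
            }
        }
    }

Attached with stream.addSink(new MongoSink()); note this sketch gives
at-least-once behavior at best, since the writes are not transactional.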


MongoDB as a Sink;

2020-05-05 Thread Aissa Elaffani
Hello Guys,
I am looking for some help concerning my Flink sink: I want the output to
be stored in a MongoDB database. As far as I know, there is no sink
connector for MongoDB, so I need to implement one myself, and I don't know
how to do that. Can you please help me with this?


Flink pipeline;

2020-05-05 Thread Aissa Elaffani
Hello Guys,
I am new to the real-time streaming field, and I am trying to build a big
data architecture for processing real-time streams. I have some sensors
that generate data in JSON format; the data is sent to an Apache Kafka
cluster, and then I want to consume it with Apache Flink in order to do
some aggregation. The problem is that the data coming from Kafka contains
only "the sensor ID, the equipment ID it is installed on, and the status of
the equipment..", knowing that each sensor is installed on a piece of
equipment, and the equipment belongs to a workshop that itself belongs to a
factory. So I need another data source for the workshops and factories,
because I want to aggregate per factory, and the data sent by the sensors
contains just the sensorID and the equipementID...
Guys, I am new to this field and I am stuck on this. Can someone please
help me achieve my goal and explain how I can do that? How can I do this
complex aggregation? And is there any optimization to apply?
Sorry for disturbing you!!!
AISSA


Flink Deserialisation JSON to Java;

2020-05-04 Thread Aissa Elaffani
Hello,
Please can you share with me some demos or examples of deserialization
with Flink?
I need to consume some Kafka messages produced by sensors in JSON format.
Here is my JSON message:
{"date": "2018-05-31 15:10", "main": {"ph": 5.0, "whc": 60.0,
"temperature": 9.5, "humidity": 96}, "id": 2582, "coord": {"lat": 57.79,
"lon": -54.58}}