Big data architecture
Hello Guys, I'm sorry for asking this question here since it has no direct link with Apache Flink, but I would be grateful if someone can help. I want to build a big data architecture for batch processing. We generate a lot of data every day, and we receive it from many sources, mainly two Oracle systems plus other external data (scraping, external files, ...). We want to build a big data architecture for the storage and processing of this data; typically we process yesterday's data (J-1). I have already worked on a big data project, but it was for real-time streaming: I used Kafka as a distributed messaging system and Apache Flink for consuming the data. So in this case I'm kind of lost, especially since this is my first batch-processing project. If anyone can help me and orient me toward the technologies I could use for storage and processing, I would be so grateful, and I'm sorry again for asking a question that has no link with Flink... But I know there are some brilliant senior big data architects here who can help me build this architecture. Thank you so much, I wish you all LUCK & SUCCESS!
multiple kafka topics
Hello Guys, I am working on a Flink application in which I consume data from Apache Kafka. The data is published on three topics of the cluster, and I need to read from all of them, so I suppose I can create three FlinkKafkaConsumer instances. The data I am consuming has the same overall format {Id_sensor:, Id_equipement:, Date:, Value:{...}, ...}, but the "Value" field changes from topic to topic: in the first topic the value is a temperature, "Value":{"temperature":26}; the second topic contains oil data, "Value":{"oil_data":26}; and in the third topic the value field is "Value":{"Pitch":, "Roll":, "Yaw":}. So I created three FlinkKafkaConsumer instances and defined a DeserializationSchema for each topic's data. The problem is that I want to do some aggregations on all of this data together in order to apply a function. So I am wondering whether it is a problem to join the three streams into one stream, do my aggregation keyed by a field, apply the function, and finally sink the result. And if so, will I have a problem sinking the data, since, as I explained, the value field differs from one topic to another? Can anyone give me an explanation, please? I would be so grateful. Thank you for your time!! Aissa
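One common way to handle the varying "Value" payload is to have all three DeserializationSchemas target a single record type, where the fixed fields are typed and the variable measurements go into a generic map. The three streams then share one type, so they can be union()-ed, keyed, aggregated, and sunk together. A minimal sketch (class and field names are my own, not from the original messages):

```java
import java.util.HashMap;
import java.util.Map;

// One common record type for all three topics: fixed fields are typed,
// and the topic-specific "Value" payload is kept as a generic map.
// "temperature", "oil_data", and "Pitch"/"Roll"/"Yaw" all fit the same map.
class SensorReading {
    public String sensorId;
    public String equipmentId;
    public String date;
    public Map<String, Double> values = new HashMap<>();

    public SensorReading(String sensorId, String equipmentId, String date) {
        this.sensorId = sensorId;
        this.equipmentId = equipmentId;
        this.date = date;
    }
}
```

With this in place, each DeserializationSchema fills the map with its own keys, and in the job you can do something like `stream1.union(stream2, stream3).keyBy(...)` on the single SensorReading type, which also means the sink only ever sees one schema.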
CEP use case ?
Hello Guys, I have some sensors generating data (temperature, humidity, positioning, ...), and I want to apply some rules (simple conditions, e.g. if temperature > 25, ...) in order to decide whether each sensor is in "Normal" status or "Alert" status. Do I need to use Flink CEP, or is the DataStream API sufficient to define the status of each sensor? Sorry for disturbing you! I hope someone can help me with that. Thank you guys. Best, Aissa
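For stateless, per-record threshold rules like these, the DataStream API alone is typically enough (a simple map over the stream); CEP earns its keep when a rule spans a *sequence* of events, e.g. "temperature > 25 three times in a row". A sketch of the per-record rule as a plain function (the thresholds are made up for illustration):

```java
// Per-record rule evaluation: pure logic that a Flink MapFunction could
// call on each element, e.g. stream.map(r -> classify(r.temp, r.humidity)).
// Thresholds here are illustrative, not from the original question.
class SensorStatusRules {
    static String classify(double temperature, double humidity) {
        if (temperature > 25 || humidity > 90) {
            return "Alert";
        }
        return "Normal";
    }
}
```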
ERROR submitting a Flink job
Hello Guys, I am trying to launch a Flink app on a remote server, but I get this error message:

org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: org.apache.flink.client.program.ProgramInvocationException: Job failed (JobID: 8bf7f299746e051ea7b94afd07e29d3d)
    at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:335)
    at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:205)
    at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:138)
    at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:662)
    at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:210)
    at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:893)
    at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:966)
    at org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
    at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:966)
Caused by: java.util.concurrent.ExecutionException: org.apache.flink.client.program.ProgramInvocationException: Job failed (JobID: 8bf7f299746e051ea7b94afd07e29d3d)
    at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
    at org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:83)
    at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1620)
    at sensors.StreamingJob.main(StreamingJob.java:145)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:321)
    ... 8 more
Caused by: org.apache.flink.client.program.ProgramInvocationException: Job failed (JobID: 8bf7f299746e051ea7b94afd07e29d3d)
    at org.apache.flink.client.deployment.ClusterClientJobClientAdapter.lambda$null$6(ClusterClientJobClientAdapter.java:112)
    at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
    at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
    at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
    at org.apache.flink.client.program.rest.RestClusterClient.lambda$pollResourceAsync$21(RestClusterClient.java:565)
    at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
    at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
    at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
    at org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$8(FutureUtils.java:291)
    at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
    at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
    at java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:575)
    at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:943)
    at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:456)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
    at org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:147)
    at org.apache.flink.client.deployment.ClusterClientJobClientAdapter.lambda$null$6(ClusterClientJobClientAdapter.java:110)
    ... 19 more
Caused by: org.apache.flink.runtime.JobException: Recovery is suppressed by NoRestartBackoffTimeStrategy
    at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:110)
    at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:76)
    at
Deploying a Flink app on a server;
Hello Guys, can someone please explain to me how I can deploy a Flink app on a server, and the steps I need to follow in order to achieve that? Sorry for disturbing you guys. Aissa
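For a standalone Flink cluster, the usual flow is roughly: package the job as a fat jar, then submit it with the `flink` CLI against the cluster's JobManager. A sketch of the two commands (the host name, main class, and jar path are placeholders, not from the original post):

```shell
# 1. Build the job as a fat jar (assuming a Maven project).
mvn clean package

# 2. Submit it to the remote JobManager's REST endpoint.
#    -m  = JobManager address, -c = entry-point class.
flink run -m jobmanager-host:8081 -c sensors.StreamingJob target/my-job.jar
```

Alternatively, the jar can be uploaded and submitted through the Flink web UI on the same port.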
CEP use case !
Hello Guys, I am asking whether the CEP API can solve my use case. I have a lot of sensors generating data, and I want to apply a rules engine on the sensor data in order to set a "sensor_status" of Normal, Warning, or Alert. For each record I want to apply some conditions (if temperature > 15, humidity > ..., ...) and then define the status of the sensor. I am wondering if the CEP API can help me achieve that. Thank you guys for your time! Best, AISSA
Window Function use case;
Hello guys, I have a use case where I am receiving data from sensors about their status (Normal or Alert), e.g. {SensorID:"1", FactoryID:"1", Status:"Normal", ...}. A factory can contain a lot of sensors, and what I want is: if the status of one sensor in a factory is Alert, I want to raise an alert for the whole factory (the factory status must be Alert). I did stream.keyBy("FactoryID").window(...). Can you please suggest a window function that can fulfill my use case (if one sensor of a factory raises "Alert", the factory status becomes "Alert")? I hope someone understands my use case!! Sorry for disturbing you, and thank you for your time! Best, Aissa
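The core of this aggregation is an "any sensor Alert ⇒ factory Alert" combiner, which fits a ReduceFunction applied after keyBy("FactoryID").window(...), or a ProcessWindowFunction that iterates the window's elements. A sketch of just that combining logic (class and method names are my own):

```java
// Combiner for the per-factory window: the factory is "Alert" as soon as
// any sensor status in the window is "Alert". In Flink this is the body
// of a ReduceFunction<String> over the Status field of the keyed window.
class FactoryStatus {
    static String combine(String a, String b) {
        return ("Alert".equals(a) || "Alert".equals(b)) ? "Alert" : "Normal";
    }
}
```

Worth noting: if the factory status should flip immediately rather than once per window, a keyed ProcessFunction holding the status in keyed state may fit better than a window at all.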
Data Stream Enrichment
Hello Guys, I want to enrich a data stream with some MongoDB data, and I am planning to use a RichFlatMapFunction, but I am lost: I don't know where to configure the connection to my MongoDB. Can anyone help me with this? Best, Aissa
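The usual place for the connection is the rich function's lifecycle methods: open() creates the client once per parallel task instance (not per record), flatMap() reuses it, and close() releases it. A structural sketch of that lifecycle; `MongoStub` below is a made-up placeholder for the real MongoDB Java driver client, and in an actual job the class would extend Flink's RichFlatMapFunction:

```java
// Placeholder standing in for a real MongoDB client (e.g. MongoClient).
class MongoStub {
    String lookup(String sensorId) { return "factory-1"; } // fake enrichment
    void close() { /* release the connection */ }
}

// Mirrors the RichFlatMapFunction lifecycle without the Flink dependency.
class EnrichingFlatMap {
    private transient MongoStub client;

    // In Flink: open(Configuration parameters) -- called once per parallel
    // task before any element arrives; create the MongoDB connection here.
    public void open() {
        client = new MongoStub();
    }

    // In Flink: flatMap(IN value, Collector<OUT> out) -- one lookup per
    // record, reusing the connection created in open().
    public String flatMap(String sensorId) {
        return sensorId + "," + client.lookup(sensorId);
    }

    // In Flink: close() -- release the connection at task shutdown.
    public void close() {
        if (client != null) client.close();
    }
}
```

The `transient` modifier matters: the function object is serialized and shipped to the cluster, and the client must be created on the worker in open(), not on the client machine.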
multiple sources
Hello everyone, I hope you are all doing well. I am reading from a Kafka topic some real-time messages produced by sensors, and in order to do some aggregations I need to enrich the stream with other data that is stored in MongoDB. So I want to know whether it is possible to work with two sources in one job, and if yes, how to do so? Best, Aissa
Flink suggestions;
Hello Guys, I am a beginner in the field of real-time streaming and I am working with Apache Flink, so I am still unaware of many of its features. I am building an application in which I receive sensor data in this format: {"status": "Alerte", "classe": " ", "value": {"temperature": 15.7}, "mode": "ON", "equipementID": 1, "date": "2019-03-20 22:00", "sensorID": 9157}. Each sensor is installed on a piece of equipment in a workshop in a factory somewhere. My goal is: if one sensor of a factory goes down (status = "Alerte"), I want the status of the whole factory to become Alerte. But the stream does not contain the factory ID; I have another, static data source that contains the factories and the sensors that belong to each one. So please, guys, I want to know the best way to do this, and the aggregation I need to do! Sorry for disturbing you, I wish you all the best! And I hope you will share some of your experience with me! Best regards, Aissa
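The two pieces here are (1) enriching each reading with its factory ID via the static sensor→factory mapping, and (2) flipping the factory's status to Alert when any of its sensors alerts. In Flink, the mapping would typically be loaded in a rich function's open() or distributed via broadcast state, with the status kept in keyed state after keyBy on the factory ID. A sketch of just the combined logic, outside Flink (class, field, and ID values are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

// Enrichment + aggregation logic: the sensorToFactory map plays the role
// of the static data source; factoryStatus plays the role of keyed state.
class FactoryAlerts {
    final Map<Integer, Integer> sensorToFactory = new HashMap<>();
    final Map<Integer, String> factoryStatus = new HashMap<>();

    void onReading(int sensorId, String status) {
        Integer factoryId = sensorToFactory.get(sensorId);
        if (factoryId == null) return; // sensor not in the static mapping
        if ("Alerte".equals(status)) {
            factoryStatus.put(factoryId, "Alerte");   // one alert flips the factory
        } else {
            factoryStatus.putIfAbsent(factoryId, "Normal");
        }
    }
}
```

Note that with this rule a factory stays in Alerte once any sensor has alerted; if it should recover when all sensors return to Normal, the state would need to track per-sensor statuses instead of a single flag.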
MongoDB sink;
Hello, I want to sink my data to MongoDB, but as far as I know there is no sink connector for MongoDB. How can I implement a MongoDB sink? If there are any other solutions, I hope you can share them with me.
MongoDB as a Sink;
Hello Guys, I am looking for some help concerning my Flink sink: I want the output to be stored in a MongoDB database. As far as I know, there is no sink connector for MongoDB, so I need to implement one myself, and I don't know how to do that. Can you please help me with this?
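A custom sink follows the same lifecycle shape as other rich functions: in Flink you extend RichSinkFunction&lt;T&gt;, create the MongoDB client in open(), write each record in invoke(), and release the client in close(). A structural sketch; the in-memory List below is a made-up stand-in for the real MongoCollection so the shape is self-contained:

```java
import java.util.ArrayList;
import java.util.List;

// Mirrors a RichSinkFunction's lifecycle without the Flink/Mongo deps.
class MongoSinkSketch {
    private transient List<String> collection; // stand-in for MongoCollection

    // In Flink: open(Configuration) -- create the MongoClient once per task.
    public void open() {
        collection = new ArrayList<>();
    }

    // In Flink: invoke(T value, Context ctx) -- called for every record;
    // convert the record to a Document and insert it into the collection.
    public void invoke(String record) {
        collection.add(record);
    }

    // In Flink: close() -- close the MongoClient at task shutdown.
    public void close() {
        collection = null;
    }

    public int size() { return collection.size(); }
}
```

The sink is then attached with `stream.addSink(new MongoSinkSketch(...))`. For heavy write volume, batching several records per insert inside invoke() is a common refinement.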
Flink pipeline;
Hello Guys, I am new to the real-time streaming field, and I am trying to build a big data architecture for processing real-time streams. I have some sensors that generate data in JSON format; the data is sent to an Apache Kafka cluster, and then I want to consume it with Apache Flink in order to do some aggregations. The problem is that the data coming from Kafka contains only "the sensor ID, the equipment ID on which it is installed, and the status of the equipment...", knowing that each sensor is installed on a piece of equipment, and the equipment is linked to a workshop, which is itself linked to a factory. So I need another data source for the workshops and factories, because I want to aggregate per factory, and the data sent by the sensors contains just the sensorID and the equipementID... Guys, I am new to this field and I am stuck. Can someone please help me achieve my goal, explain how I can do this complex aggregation, and tell me whether there is any optimization to apply? Sorry for disturbing you!!! AISSA
Flink Deserialisation JSON to Java;
Hello, Please can you share with me, some demos or examples of deserialization with flink. I need to consume some kafka message produced by sensors in JSON format. here is my JSON message : {"date": "2018-05-31 15:10", "main": {"ph": 5.0, "whc": 60.0, "temperature": 9.5, "humidity": 96}, "id": 2582, "coord": {"lat": 57.79, "lon": -54.58}}
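A common pattern is to define POJOs that mirror the message structure and let a JSON library do the mapping inside a custom DeserializationSchema, e.g. `new ObjectMapper().readValue(bytes, SensorMessage.class)` with Jackson in the schema's deserialize() method. A sketch of POJOs matching the sample message above (the class name is my own; field names must match the JSON keys exactly for automatic binding):

```java
// POJOs mirroring the sample JSON message. With Jackson, a custom
// DeserializationSchema<SensorMessage> can map the raw Kafka bytes onto
// this class in deserialize(); fields are public so no setters are needed.
class SensorMessage {
    public String date;   // "2018-05-31 15:10"
    public int id;        // 2582
    public Main main;
    public Coord coord;

    public static class Main {
        public double ph;          // 5.0
        public double whc;         // 60.0
        public double temperature; // 9.5
        public int humidity;       // 96
    }

    public static class Coord {
        public double lat;         // 57.79
        public double lon;         // -54.58
    }
}
```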