Hi everyone, I would like some guidance on an idea for a log analyzer platform.
I have an Elasticsearch system that collects logs from different
platforms and creates alerts. The system writes the alerts to an index in
ES. The alerts are also stored in a folder as JSON (multi-line format).

The Goals:

   1. Read the JSON folder or the ES index as a stream (pick up new entries
   within 5 min)
   2. Select only the alerts that I want to work on (alert.id = 100,
   status = true, ...)
   3. Create a DataFrame + Window for a 10 min period
   4. Run a query on that DataFrame, grouping by IP (if the same IP gets 3
   alerts, show me the result)
   5. All the coding should be in Python
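
To make goals 2 to 4 concrete, here is a rough plain-Python sketch of the logic I have in mind. It is not Spark code, only an illustration of the filter + 10-minute window + group-by-IP step. The field names `src_ip` and `timestamp` and the helper names are placeholders I made up for this example, not my real alert schema, and I am assuming fixed (non-overlapping) 10-minute windows.

```python
from collections import Counter
from datetime import datetime, timezone

WINDOW_SECONDS = 10 * 60  # goal 3: 10-minute window (assumed non-overlapping)
THRESHOLD = 3             # goal 4: report an IP once it has 3 alerts in a window


def wanted(alert):
    """Goal 2: keep only the alerts I care about (example conditions)."""
    return alert.get("alert", {}).get("id") == 100 and alert.get("status") is True


def window_start(ts):
    """Round a timezone-aware datetime down to the start of its 10-minute window."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % WINDOW_SECONDS, tz=timezone.utc)


def noisy_ips(alerts):
    """Goal 4: count alerts per (window, IP) and keep those at or above the threshold."""
    counts = Counter(
        (window_start(a["timestamp"]), a["src_ip"])  # placeholder field names
        for a in alerts
        if wanted(a)
    )
    return {key: n for key, n in counts.items() if n >= THRESHOLD}
```

As far as I understand, in PySpark this would roughly correspond to a `filter` followed by a `groupBy` on `pyspark.sql.functions.window` plus the IP column and a `count`, but I have not tried that yet.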


The idea is something like this. My question is how I should proceed with
this task. Which technologies should I use?

Can *Apache Spark + Python + PySpark + Koalas* handle this?

-- 

Best regards,

*Suat Toksoz*
