Spark Streaming - DStreams - Removing RDD

2020-07-27 Thread forece85
We are using Spark Streaming (DStreams) with Kinesis and a 10-second batch interval. For random batches, the processing time is very long. While checking the logs, we found the following log lines whenever we get a spike in processing time:
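The log lines themselves are cut off in this digest. As a point of reference, here is a minimal sketch of the kind of job described (a DStream application on a 10-second batch interval); the application name, the queueStream stand-in for the Kinesis receiver, and the per-batch processing are assumptions for illustration only.

import java.util.Arrays;
import java.util.LinkedList;
import java.util.Queue;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class DStreamBatchSketch {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("dstream-10s-sketch").setMaster("local[2]");
        // 10-second batch interval, matching the interval mentioned in the post.
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        // Stand-in source; in the real job this would be the Kinesis DStream.
        Queue<JavaRDD<String>> queue = new LinkedList<>();
        queue.add(jssc.sparkContext().parallelize(Arrays.asList("record-1", "record-2")));
        JavaDStream<String> records = jssc.queueStream(queue);

        // Per-batch processing; in the reported case this step occasionally takes far
        // longer than the 10-second batch interval.
        records.foreachRDD(rdd -> System.out.println("batch size: " + rdd.count()));

        jssc.start();
        jssc.awaitTermination();
    }
}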

Re: Lazy Spark Structured Streaming

2020-07-27 Thread Jungtaek Lim
I'm not sure what exactly your problem is, but given you've mentioned window and OutputMode.Append, you may want to keep in mind that append mode doesn't produce the output of an aggregation until the watermark "passes by". It's expected behavior if you're seeing lazy outputs in OutputMode.Append compared
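A minimal, self-contained sketch of that behavior (not code from the thread): a windowed count over the built-in rate source written with OutputMode.Append. The source, window and watermark durations, and the console sink are assumptions chosen only to make the point visible: a window's row is emitted only after the watermark moves past the window's end.

import static org.apache.spark.sql.functions.*;

import java.util.concurrent.TimeoutException;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.OutputMode;
import org.apache.spark.sql.streaming.StreamingQuery;
import org.apache.spark.sql.streaming.StreamingQueryException;

public class AppendModeWatermarkSketch {
    public static void main(String[] args) throws TimeoutException, StreamingQueryException {
        SparkSession spark = SparkSession.builder()
                .appName("append-watermark-sketch").master("local[*]").getOrCreate();

        // Built-in test source; produces columns "timestamp" and "value".
        Dataset<Row> rate = spark.readStream().format("rate")
                .option("rowsPerSecond", 5).load();

        Dataset<Row> counts = rate
                .withWatermark("timestamp", "30 seconds")          // watermark drives when windows close
                .groupBy(window(col("timestamp"), "10 seconds"))   // 10-second tumbling windows
                .count();

        // In Append mode a window's count is only written once the watermark
        // passes the window end, which is why the output looks "lazy".
        StreamingQuery query = counts.writeStream()
                .outputMode(OutputMode.Append())
                .format("console")
                .start();
        query.awaitTermination();
    }
}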

How to map DataSet row to Struct in java?

2020-07-27 Thread anuragDada
When I try to use the code below, I get this error: Exception in thread "main" org.apache.spark.sql.AnalysisException: Generators are not supported when it's nested in expressions, but got: generatorouter(explode(generatorouter(explode(json; StructType schema = DataTypes.createStructType(
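That error appears when a generator such as explode is nested inside another expression. A minimal sketch of one way around it (the column names, schema, and sample data are assumptions, not the poster's code): parse the JSON with from_json first, then call explode at the top level of its own select.

import static org.apache.spark.sql.functions.*;

import java.util.Arrays;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class ExplodeJsonSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("explode-json-sketch").master("local[*]").getOrCreate();

        // Assumed shape of the JSON column: {"items": ["a", "b", ...]}
        StructType schema = DataTypes.createStructType(new StructField[]{
                DataTypes.createStructField("items",
                        DataTypes.createArrayType(DataTypes.StringType), true)
        });

        Dataset<Row> raw = spark.createDataset(
                Arrays.asList("{\"items\":[\"a\",\"b\"]}"), Encoders.STRING()).toDF("json");

        // Parse first, then explode in its own step, so the generator is never
        // nested inside another expression.
        Dataset<Row> parsed = raw.withColumn("parsed", from_json(col("json"), schema));
        Dataset<Row> exploded = parsed.select(explode(col("parsed.items")).alias("item"));

        exploded.show();
        spark.stop();
    }
}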

Spark memory distribution

2020-07-27 Thread dben
Hi, I'm running a computation on top of a big dynamic model that is constantly changing / receiving online updates. I therefore thought that working in batch mode (stateless), which requires sending the heavy model to Spark each time, would be less appropriate than working in stream mode. Therefore, I was able to

Secrets in Spark apps

2020-07-27 Thread Dávid Szakállas
Hi folks, do you know the best method for passing secrets to Spark operations, e.g. for doing encryption, salting with a secret before hashing, etc.? I have multiple ideas off the top of my head. The secret's source: - environment variable - config property - remote service accessed through an
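A minimal sketch of the first option listed (environment variable), applied to the salt-then-hash use case mentioned above; the variable name, data, and column names are placeholders, and this is only one of the candidates, not a recommendation from the thread. Note that a value passed through lit() ends up as a literal in the query plan, which may matter for a secret.

import static org.apache.spark.sql.functions.*;

import java.util.Arrays;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SecretSaltSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("secret-salt-sketch").master("local[*]").getOrCreate();

        // Hypothetical environment variable holding the salt; read on the driver.
        String salt = System.getenv().getOrDefault("HASH_SALT", "dev-only-salt");

        Dataset<Row> users = spark.createDataset(
                Arrays.asList("alice", "bob"), Encoders.STRING()).toDF("user");

        // Salt-then-hash with built-in functions; the salt becomes a literal in the plan.
        Dataset<Row> hashed = users.withColumn("user_hash",
                sha2(concat(lit(salt), col("user")), 256));

        hashed.show(false);
        spark.stop();
    }
}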

Re: Lazy Spark Structured Streaming

2020-07-27 Thread Phillip Henry
Sorry, should have mentioned that Spark only seems reluctant to take the last windowed, groupBy batch from Kafka when using OutputMode.Append. I've asked on StackOverflow: https://stackoverflow.com/questions/62915922/spark-structured-streaming-wont-pull-the-final-batch-from-kafka but am still

Re: test

2020-07-27 Thread Ashley Hoff
Yes, your emails are getting through. On Mon, Jul 27, 2020 at 6:31 PM Suat Toksöz wrote: > user@spark.apache.org

test

2020-07-27 Thread Suat Toksöz
user@spark.apache.org -- Best regards, Suat Toksoz

Re: Apache Spark- Help with email library

2020-07-27 Thread Suat Toksöz
Why am I not able to send my question to the Spark mailing list? Thanks. On Mon, Jul 27, 2020 at 10:31 AM tianlangstudio wrote: > I use Simple Java Mail (http://www.simplejavamail.org/#/features) to send email and parse email files. It is awesome and may help you.

Apache Spark + Python + PySpark + Koalas

2020-07-27 Thread Suat Toksöz
Hi everyone, I want to ask for guidance on my log analyzer platform idea. I have an Elasticsearch system that collects logs from different platforms and creates alerts. The system writes the alerts to an index on ES. My alerts are also stored in a folder as JSON (multi-line format). The
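A minimal sketch of loading that alert folder into Spark; the path is a placeholder, and the multiLine option is set because the alerts are described as multi-line JSON.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class AlertJsonSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("alert-json-sketch").master("local[*]").getOrCreate();

        // "multiLine" tells Spark each file may contain a JSON document spanning many lines.
        Dataset<Row> alerts = spark.read()
                .option("multiLine", true)
                .json("/path/to/alerts"); // hypothetical folder of alert JSON files

        alerts.printSchema();
        alerts.show(20, false);

        spark.stop();
    }
}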

Re: Apache Spark- Help with email library

2020-07-27 Thread tianlangstudio
I use Simple Java Mail (http://www.simplejavamail.org/#/features) to send email and parse email files. It is awesome and may help you.
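A minimal sketch of sending a message with Simple Java Mail, assuming the EmailBuilder/MailerBuilder API of recent releases; the host, port, credentials, and addresses are placeholders, and the exact package names may differ by library version, so check the docs for the release you use.

import org.simplejavamail.api.email.Email;
import org.simplejavamail.api.mailer.Mailer;
import org.simplejavamail.email.EmailBuilder;
import org.simplejavamail.mailer.MailerBuilder;

public class SimpleJavaMailSketch {
    public static void main(String[] args) {
        // Placeholder message; addresses are illustrative only.
        Email email = EmailBuilder.startingBlank()
                .from("Sender", "sender@example.com")
                .to("Recipient", "recipient@example.com")
                .withSubject("Hello from a Spark job")
                .withPlainText("Job finished.")
                .buildEmail();

        // Placeholder SMTP settings.
        Mailer mailer = MailerBuilder
                .withSMTPServer("smtp.example.com", 587, "user", "password")
                .buildMailer();

        mailer.sendMail(email);
    }
}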

Guidance

2020-07-27 Thread Suat Toksöz
Hi everyone, I want to ask for guidance on my log analyzer platform idea. I have an Elasticsearch system that collects logs from different platforms and creates alerts. The system writes the alerts to an index on ES. My alerts are also stored in a folder as JSON (multi-line format). The