Iterative Streaming with Spark

2019-10-29 Thread vibhatha
Hi, I am doing a benchmark of Flink, Storm, and Spark for an iterative streaming application. The goal is to make a window over a stream and do an iterative computation per window. Both Flink and Storm provide a window function with a list or iterator. But in Spark, I am not quite sure how to do
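Spark does not hand you a per-window iterator the way Flink's ProcessWindowFunction does; a common approach is to collect each window (e.g. via Structured Streaming's foreachBatch) and run the iterative loop over the collected values. As a minimal, framework-free sketch of that per-window iteration idea, assuming hypothetical (timestamp_ms, value) events and a toy fixed-point computation:

```python
from collections import defaultdict

def tumbling_windows(events, window_ms):
    """Group (timestamp_ms, value) events into tumbling windows."""
    windows = defaultdict(list)
    for ts, value in events:
        windows[ts // window_ms].append(value)
    return dict(windows)

def iterate_to_fixpoint(values, tol=1e-6, max_iter=100):
    """Toy iterative computation per window: repeatedly average each
    value with the window mean until the values stop changing."""
    current = list(values)
    for _ in range(max_iter):
        mean = sum(current) / len(current)
        updated = [(v + mean) / 2 for v in current]
        if max(abs(a - b) for a, b in zip(updated, current)) < tol:
            return updated
        current = updated
    return current

events = [(0, 1.0), (500, 3.0), (1200, 10.0)]  # hypothetical stream
for win, values in sorted(tumbling_windows(events, 1000).items()):
    print(win, iterate_to_fixpoint(values))
```

In an actual Spark job the `tumbling_windows` step would be Spark's own `window()` grouping, and the iterative loop would run inside the driver-side function passed to `foreachBatch`.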

Deleting columns within nested arrays/structs?

2019-10-29 Thread Jeff Evans
The starting point for the code is the various answers to this StackOverflow question. Fixing some of the issues there, I end up with the following: def dropColumn(df: DataFrame, colName: String):
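The Scala/PySpark solutions for this all rebuild the struct type with the target field removed, recursing through nested levels. The recursion itself can be sketched in plain Python over a schema represented as nested dicts (the schema and field names below are hypothetical, for illustration only):

```python
def drop_nested(schema, path):
    """Drop the field at dot-separated `path` from a nested schema,
    where a schema is {field_name: subschema_dict_or_None}."""
    head, _, rest = path.partition(".")
    if head not in schema:
        return dict(schema)  # nothing to drop, return a copy
    pruned = {}
    for name, sub in schema.items():
        if name == head:
            if not rest:
                continue  # this is the leaf to drop: skip it
            # recurse into the nested struct for the remaining path
            pruned[name] = drop_nested(sub, rest) if isinstance(sub, dict) else sub
        else:
            pruned[name] = sub
    return pruned

schema = {"id": None, "address": {"street": None, "zip": None}}
print(drop_nested(schema, "address.zip"))
# → {'id': None, 'address': {'street': None}}
```

A real Spark implementation additionally has to handle arrays of structs (ArrayType wrapping a StructType), which is where most of the answers on that StackOverflow question differ.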

MultiObjectDeleteException

2019-10-29 Thread Prudhvi Chennuru (CONT)
Hi, I am running Spark batch jobs on a Kubernetes cluster and intermittently I am seeing MultiObjectDeleteException. Spark version: 2.3.0, Kubernetes version: 1.11.5, aws-java-sdk: 1.7.4.jar, hadoop-aws: 2.7.3.jar. I even added the *spark.hadoop.fs.s3a.multiobjectdelete.enable=false* property to
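For the property to take effect it has to reach the Hadoop configuration used by the S3A filesystem, which from spark-submit means the `spark.hadoop.` prefix. A hedged sketch of how the flags are typically passed at launch (the jar name and extra tuning values below are placeholders, not from the original message):

```shell
spark-submit \
  --conf spark.hadoop.fs.s3a.multiobjectdelete.enable=false \
  --conf spark.hadoop.fs.s3a.attempts.maximum=10 \
  --conf spark.hadoop.fs.s3a.connection.maximum=100 \
  your-job.jar
```

With bulk delete disabled, S3A falls back to deleting objects one by one, which avoids MultiObjectDeleteException at the cost of slower cleanup of large output paths.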

Re: Recover RFormula Column Names

2019-10-29 Thread Andrew Redd
Thanks Alessandro! That did the trick. All of the indices and interactions are in the metadata. I also wanted to confirm that this solution works in PySpark, as the metadata is carried over. Andrew On Tue, Oct 29, 2019 at 5:26 AM Alessandro Solimando < alessandro.solima...@gmail.com> wrote: >
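Spark ML attaches attribute metadata to the assembled features column, reachable in PySpark as `df.schema["features"].metadata`, with per-group attribute lists under `ml_attr.attrs`. A sketch of extracting the index-to-name mapping from that structure, using a hypothetical metadata dict shaped the way PySpark exposes it:

```python
def feature_names(metadata):
    """Recover vector index -> original feature name from Spark ML
    column metadata (the dict from df.schema["features"].metadata)."""
    attrs = metadata["ml_attr"]["attrs"]
    pairs = []
    for group in attrs.values():  # groups such as "numeric", "binary"
        for attr in group:
            pairs.append((attr["idx"], attr["name"]))
    return [name for _, name in sorted(pairs)]

# Hypothetical metadata for a features vector with one numeric
# column and a one-hot-encoded categorical column:
meta = {"ml_attr": {"attrs": {
    "numeric": [{"idx": 0, "name": "age"}],
    "binary":  [{"idx": 1, "name": "country_US"},
                {"idx": 2, "name": "country_UK"}],
}}}
print(feature_names(meta))  # → ['age', 'country_US', 'country_UK']
```

Interaction terms produced by RFormula show up in the same lists under their generated names, so the recovered list lines up position-by-position with the feature vector.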

Re: Recover RFormula Column Names

2019-10-29 Thread Alessandro Solimando
Hello Andrew, a few years ago I had the same need and I found this SO answer to be the way to go. Here is an extract of my (Scala) code (which was doing other things on top); I have removed the irrelevant parts but without testing it, so it might not work out

Re: Spark - configuration setting doesn't work

2019-10-29 Thread Chetan Khatri
Ok, thanks. I wanted to confirm that. On Sun, Oct 27, 2019 at 12:55 PM hemant singh wrote: > You should add the configurations while creating the session, I don’t > think you can override it once the session is created. Few are though. > > Thanks, > Hemant > > On Sun, 27 Oct 2019 at 11:02 AM,
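As the thread notes, most settings must be fixed before the session (and for executor sizing, the JVM) starts; `spark.conf.set` after creation only works for a small set of runtime SQL properties. A hedged sketch of supplying such settings at launch time instead (property values are placeholders):

```shell
spark-submit \
  --conf spark.executor.memory=4g \
  --conf spark.executor.cores=2 \
  --conf spark.sql.shuffle.partitions=200 \
  your-job.jar
```

The same values can equivalently be passed to `SparkSession.builder.config(...)` before `getOrCreate()` is called.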

Re: Spark Cluster over yarn cluster monitoring

2019-10-29 Thread Chetan Khatri
Thanks Jörn On Sun, Oct 27, 2019 at 8:01 AM Jörn Franke wrote: > Use yarn queues: > > > https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html > > Am 27.10.2019 um 06:41 schrieb Chetan Khatri >: > >  > Could someone please help me to understand better.. > > On
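The FairScheduler link above refers to queues defined in YARN's allocation file (`fair-scheduler.xml`). A minimal sketch of what such a file can look like, with hypothetical queue names and resource figures:

```xml
<?xml version="1.0"?>
<allocations>
  <queue name="streaming">
    <weight>2.0</weight>
    <minResources>8192 mb,4 vcores</minResources>
  </queue>
  <queue name="batch">
    <weight>1.0</weight>
  </queue>
</allocations>
```

A Spark application is then directed to a queue at submission time with `spark-submit --queue streaming ...`, letting YARN arbitrate resources between concurrent jobs.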