Re: Spark structured streaming: periodically refresh static data frame

2018-02-14 Thread Appu K
latency SLAs of 1 second, you can periodically restart the query without restarting the process. Apologies for my misdirections in that earlier thread. Hope this helps. TD On Wed, Feb 14, 2018 at 2:57 AM, Appu K <kut...@gmail.com> wrote: > More specifically, > > Quoting TD from the
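
A minimal sketch of the restart approach TD describes: run the streaming query for a bounded interval, stop it, reload the static DataFrame, and start a new query in the same process. The rate source, the join column, the paths, and the one-hour interval are all assumptions for illustration, not details from the thread.

    import org.apache.spark.sql.{DataFrame, SparkSession}

    val spark = SparkSession.builder().appName("periodic-refresh").getOrCreate()

    // Hypothetical static side of the join; re-read on every restart.
    def loadStatic(): DataFrame = spark.read.parquet("/data/static-lookup")

    // Hypothetical streaming side; "rate" emits (timestamp, value) rows.
    val stream = spark.readStream.format("rate").load()

    while (true) {
      val static = loadStatic()                  // fresh snapshot of the static data
      val query = stream.join(static, "value")   // assumed join column
        .writeStream
        .format("console")
        .option("checkpointLocation", "/tmp/chk-refresh")
        .start()
      query.awaitTermination(60 * 60 * 1000L)    // run for ~1 hour, then
      query.stop()                               // restart with the reloaded static data
    }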

Re: Spark structured streaming: periodically refresh static data frame

2018-02-14 Thread Appu K
ymous/90dac8efadca3a69571e619943ddb2f6 My streaming dataframe is not using the updated data, even though the view is updated! Thank you On 14 February 2018 at 2:54:48 PM, Appu K (kut...@gmail.com) wrote: Hi, I had followed the instructions from the thread https://mail-archives.apache.org/mod_mbox/spark

Spark structured streaming: periodically refresh static data frame

2018-02-14 Thread Appu K
Hi, I had followed the instructions from the thread https://mail-archives.apache.org/mod_mbox/spark-user/201704.mbox/%3cd1315d33-41cd-4ba3-8b77-0879f3669...@qvantel.com%3E while trying to reload a static data frame periodically that gets joined to a structured streaming query. However, the

Re: Closing resources in the executor

2017-02-02 Thread Appu K
this in data-frames. Is a shutdown hook the only solution right now? Thanks, Sajith. On 2 February 2017 at 11:58:27 AM, Appu K (kut...@gmail.com) wrote: What would be the recommended way to close resources opened or shared by executors? A few use cases: #1) Let's say the enrichment process needs
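
A minimal sketch of the shutdown-hook approach being asked about: a per-executor singleton that opens the shared resource lazily and registers a JVM shutdown hook to close it when the executor exits. ConnectionPool here is a hypothetical stand-in for whatever the enrichment step actually opens.

    // Hypothetical shared resource with something to clean up.
    class ConnectionPool extends AutoCloseable {
      def lookup(key: String): String = s"enriched-$key"
      override def close(): Unit = println("pool closed")
    }

    object ExecutorResources {
      // Initialized at most once per executor JVM. The shutdown hook is the
      // only generic close signal available, since Spark exposes no
      // per-executor teardown callback in these versions.
      lazy val pool: ConnectionPool = {
        val p = new ConnectionPool
        sys.addShutdownHook(p.close())
        p
      }
    }

    // Inside a task, e.g. during enrichment:
    //   rdd.mapPartitions(_.map(key => ExecutorResources.pool.lookup(key)))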

log4j2 support in Spark

2017-01-15 Thread Appu K
Wondering whether it’ll be possible to do structured logging in Spark. Adding "org.apache.logging.log4j" % "log4j-slf4j-impl" % "2.6.2" makes it complain about multiple bindings for SLF4J. Cheers, Appu
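
A sketch of one common workaround, assuming an sbt build: exclude Spark's bundled log4j 1.x and slf4j-log4j12 binding so that log4j-slf4j-impl is the only SLF4J binding left on the classpath. The Spark version is illustrative.

    // build.sbt (sketch)
    libraryDependencies ++= Seq(
      ("org.apache.spark" %% "spark-core" % "2.1.0")
        .exclude("org.slf4j", "slf4j-log4j12")
        .exclude("log4j", "log4j"),
      "org.apache.logging.log4j" % "log4j-api"        % "2.6.2",
      "org.apache.logging.log4j" % "log4j-core"       % "2.6.2",
      "org.apache.logging.log4j" % "log4j-slf4j-impl" % "2.6.2"
    )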

Tuning spark.executor.cores

2017-01-09 Thread Appu K
Are there use cases for which it is advisable to give spark.executor.cores a value greater than the actual number of cores?
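
For reference, a sketch of how the setting is applied; the values are illustrative. spark.executor.cores only controls the number of concurrent task slots per executor, not OS-level CPU scheduling, which is why oversubscribing it is sometimes floated for I/O-bound workloads.

    val conf = new org.apache.spark.SparkConf()
      .setAppName("cores-example")
      .set("spark.executor.cores", "4") // concurrent task slots per executor
      .set("spark.task.cpus", "1")      // slots each task occupies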

Re: Unable to explain the job kicked off for spark.read.csv

2017-01-09 Thread Appu K
of fields if the schema is not specified manually. I believe that another job would not happen if the schema is explicitly given. I hope this is helpful. Thanks. 2017-01-09 0:11 GMT+09:00 Appu K <kut...@gmail.com>: > I was trying to create a base-data-frame in an EMR cluster from a csv f
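
A sketch of what the reply suggests: supply the schema explicitly so the preliminary pass over the file is unnecessary. The column names and the tab delimiter are assumptions, since the thread does not state them.

    import org.apache.spark.sql.types._

    // Hypothetical column layout for the pageviews-by-second data.
    val schema = StructType(Seq(
      StructField("timestamp", StringType),
      StructField("site",      StringType),
      StructField("requests",  LongType)
    ))

    val baseDF = spark.read
      .schema(schema) // no job is needed to figure out the number of fields
      .option("delimiter", "\t")
      .csv("s3://l4b-d4t4/wikipedia/pageviews-by-second-tsv")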

Re: Storage history in web UI

2017-01-08 Thread Appu K
@jacek - thanks a lot for the book. @joe - looks like the REST API also exposes a few things like /applications/[app-id]/storage/rdd and /applications/[app-id]/storage/rdd/[rdd-id] that might perhaps be of interest to you? http://spark.apache.org/docs/latest/monitoring.html On 9 January 2017 at
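
A quick sketch of hitting those endpoints from Scala; the host, port, and application id are hypothetical.

    import scala.io.Source

    val appId = "app-20170108123456-0001" // hypothetical
    val base  = s"http://localhost:4040/api/v1/applications/$appId"

    // Persisted RDDs for the application, returned as JSON.
    println(Source.fromURL(s"$base/storage/rdd").mkString)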

Spark UI - Puzzling “Input Size / Records” in Stage Details

2017-01-08 Thread Appu K
Was trying something basic to understand tasks, stages, and shuffles a bit better in Spark. The dataset is 256 MB. Tried this in Zeppelin: val tmpDF = spark.read .option("header", "true") .option("delimiter", ",") .option("inferSchema", "true")
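
For context, a completed form of the truncated snippet above; the input path and the final action are assumptions, since the archived message is cut off.

    val tmpDF = spark.read
      .option("header", "true")
      .option("delimiter", ",")
      .option("inferSchema", "true")
      .csv("s3://some-bucket/dataset.csv") // hypothetical path

    tmpDF.count() // the kind of action whose stage details were being inspected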

Unable to explain the job kicked off for spark.read.csv

2017-01-08 Thread Appu K
I was trying to create a base-data-frame in an EMR cluster from a csv file using val baseDF = spark.read.csv("s3://l4b-d4t4/wikipedia/pageviews-by-second-tsv") Omitted the options to infer the schema and specify the header, just to understand what happens behind the scenes. The Spark UI shows

groupByKey vs reduceByKey

2016-12-09 Thread Appu K
Hi, Read somewhere that groupByKey() in RDD disables map-side aggregation, as the aggregation function (appending to a list) does not save any space. However, from my understanding, using something like reduceByKey (or combineByKey with a combiner function) we could reduce the data shuffled
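
A small sketch of the contrast being asked about: reduceByKey (like combineByKey with a combiner) merges values map-side before the shuffle, while groupByKey ships every individual value across the network because grouping alone saves no space.

    val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))

    // Map-side combine: each partition pre-sums its values per key,
    // so only partial sums cross the shuffle.
    val summed = pairs.reduceByKey(_ + _)

    // No map-side combine: all three records are shuffled as-is and
    // only grouped on the reduce side.
    val grouped = pairs.groupByKey().mapValues(_.sum)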

Re: Managed memory leak : spark-2.0.2

2016-12-08 Thread Appu K
, 2016 at 8:10 PM, Appu K <kut...@gmail.com> wrote: > Hello, > > I’ve just run into an issue where the job is giving me "Managed memory > leak" with Spark version 2.0.2 > — > 2016-12-08 16:31:25,231 [Executor task launch worker-0] > (TaskM

Managed memory leak : spark-2.0.2

2016-12-08 Thread Appu K
Hello, I’ve just run into an issue where the job is giving me "Managed memory leak" with Spark version 2.0.2 — 2016-12-08 16:31:25,231 [Executor task launch worker-0] (TaskMemoryManager.java:381) WARN leak 46.2 MB memory from