latency SLAs of 1
second, you can periodically restart the query without restarting the
process.
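A sketch of that restart pattern, assuming a SparkSession `spark` already exists and with `buildStream()`, the path, the join key, and the hourly interval all invented for illustration (not from this thread):

```scala
import org.apache.spark.sql.streaming.StreamingQuery

// Sketch, not a drop-in: `buildStream()` stands in for however the
// streaming DataFrame is actually constructed.
def startQuery(): StreamingQuery = {
  val static = spark.read.parquet("/data/lookup")   // freshly re-read static side
  buildStream().join(static, Seq("key"))
    .writeStream
    .format("console")
    .start()
}

var query = startQuery()
// e.g. once an hour: stop the query (not the process) and start a new
// one, which re-reads the static side with its current contents
while (true) {
  Thread.sleep(60L * 60 * 1000)
  query.stop()
  query = startQuery()
}
```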
Apologies for my misdirections in that earlier thread. Hope this helps.
TD
On Wed, Feb 14, 2018 at 2:57 AM, Appu K <kut...@gmail.com> wrote:
> More specifically,
>
> Quoting TD from the
ymous/90dac8efadca3a69571e619943ddb2f6
My streaming dataframe is not using the updated data, even though the view
is updated!
Thank you
On 14 February 2018 at 2:54:48 PM, Appu K (kut...@gmail.com) wrote:
Hi,
I had followed the instructions from the thread
https://mail-archives.apache.org/mod_mbox/spark-user/201704.mbox/%3cd1315d33-41cd-4ba3-8b77-0879f3669...@qvantel.com%3E
while
trying to reload a static data frame periodically that gets joined to a
structured streaming query.
However, the
this in data-frames
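One pattern for the reload itself, as I understand the linked thread, is to re-register the static side under a fixed temp-view name on a schedule. A minimal sketch (view name and path are invented; assumes a SparkSession `spark`):

```scala
// Sketch only: periodically re-read the static data and replace the view.
def reloadStatic(): Unit = {
  val fresh = spark.read.parquet("/data/lookup")   // invented path
  fresh.createOrReplaceTempView("lookup")
}
```

The catch is that a structured streaming query resolves its plan when it starts, so an already-running query does not necessarily pick up a swapped view or variable afterwards, which would explain the behaviour reported here and why TD suggests periodically restarting the query.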
Is a shutdown hook the only solution right now?
thanks
sajith

On 2 February 2017 at 11:58:27 AM, Appu K (kut...@gmail.com) wrote:
What would be the recommended way to close resources opened or shared by
executors?
A few use cases
#1) Let's say the enrichment process needs
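For per-executor resources, one common pattern is to open the resource inside `mapPartitions` and close it when the partition's iterator is exhausted, so each task cleans up after itself. Below is a pure-Scala model of that wrapper (no Spark needed to run it; `Resource` is a stand-in for a connection or client, and all names are illustrative):

```scala
// Stand-in for a connection/client opened per partition.
class Resource {
  var closed = false
  def enrich(x: Int): Int = x + 1
  def close(): Unit = { closed = true }
}

// Wrap the partition's iterator: open the resource up front, close it
// once the underlying iterator reports it is exhausted.
def enrichPartition(it: Iterator[Int]): Iterator[Int] = {
  val res = new Resource
  new Iterator[Int] {
    def hasNext: Boolean = {
      val h = it.hasNext
      if (!h && !res.closed) res.close()  // close when fully consumed
      h
    }
    def next(): Int = res.enrich(it.next())
  }
}
// In Spark this would be used as rdd.mapPartitions(enrichPartition)
```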
Wondering whether it’ll be possible to do structured logging in Spark.
Adding "org.apache.logging.log4j" % "log4j-slf4j-impl" % "2.6.2" makes it
complain about multiple bindings for slf4j.
cheers
Appu
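On the binding clash: SLF4J warns whenever more than one binding is on the classpath, and Spark itself ships with slf4j-log4j12, so adding log4j-slf4j-impl gives you two. A hedged sbt sketch that keeps only one binding (version numbers are examples, and this is an assumption about the cause, not a verified fix):

```scala
// build.sbt fragment (sketch): exclude Spark's slf4j-log4j12 so that
// log4j-slf4j-impl is the only SLF4J binding on the classpath.
libraryDependencies ++= Seq(
  ("org.apache.spark" %% "spark-sql" % "2.2.1" % "provided")
    .exclude("org.slf4j", "slf4j-log4j12"),
  "org.apache.logging.log4j" % "log4j-slf4j-impl" % "2.6.2"
)
```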
Are there use-cases for which it is advisable to give a value greater than
the actual number of cores to spark.executor.cores ?
of fields if the schema is not specified manually.
I believe that no extra job would be triggered if the schema is explicitly given.
I hope this is helpful
Thanks.
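For reference, passing an explicit schema avoids the inference scan entirely. A sketch, assuming a SparkSession `spark`; the column names are guesses at the pageviews-by-second layout, not taken from the thread:

```scala
import org.apache.spark.sql.types._

// Assumed layout of the TSV; with .schema(...) Spark does not need to run
// the separate job that scans the data to infer the fields.
val schema = StructType(Seq(
  StructField("timestamp", StringType),
  StructField("site",      StringType),
  StructField("requests",  IntegerType)
))

val df = spark.read
  .option("header", "true")
  .option("delimiter", "\t")
  .schema(schema)
  .csv("s3://l4b-d4t4/wikipedia/pageviews-by-second-tsv")
```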
2017-01-09 0:11 GMT+09:00 Appu K <kut...@gmail.com>:
> I was trying to create a base-data-frame in an EMR cluster from a csv f
@jacek - thanks a lot for the book
@joe - looks like the rest api also exposes a few things like
/applications/[app-id]/storage/rdd
/applications/[app-id]/storage/rdd/[rdd-id]
that might be of interest to you?
http://spark.apache.org/docs/latest/monitoring.html
On 9 January 2017 at
Was trying something basic to understand tasks, stages, and shuffles a bit
better in Spark. The dataset is 256 MB.
Tried this in Zeppelin:
val tmpDF = spark.read
.option("header", "true")
.option("delimiter", ",")
.option("inferSchema", "true")
I was trying to create a base-data-frame in an EMR cluster from a csv file
using
val baseDF =
spark.read.csv("s3://l4b-d4t4/wikipedia/pageviews-by-second-tsv")
Omitted the options to infer the schema and specify the header, just to
understand what happens behind the scenes.
The Spark UI shows
Hi,
Read somewhere that
groupByKey() in RDD disables map-side aggregation as the aggregation
function (appending to a list) does not save any space.
However, from my understanding, using something like reduceByKey (or
combineByKey with a combiner function) we could reduce the data shuffled.
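That intuition can be checked with a pure-Scala model (no Spark) of map-side combining: count how many records would cross the shuffle if every record is sent (groupByKey-style) versus if each partition pre-aggregates by key first (reduceByKey-style). The two hard-coded "partitions" are invented toy data:

```scala
// Two simulated map-side partitions of (key, count) records.
val partitions = Seq(
  Seq("a" -> 1, "a" -> 1, "b" -> 1),
  Seq("a" -> 1, "b" -> 1, "b" -> 1)
)

// groupByKey-style: every record crosses the shuffle.
val shuffledGroup = partitions.map(_.size).sum                 // 6 records

// reduceByKey-style: combine within each partition first, then shuffle.
val combined = partitions.map(
  _.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).sum }
)
val shuffledReduce = combined.map(_.size).sum                  // 4 records

// Reduce-side merge of the pre-aggregated records.
val totals = combined.flatten
  .groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).sum }
// totals: Map(a -> 3, b -> 3)
```

With groupByKey, the appended lists carry the same number of elements either way, which is why Spark skips map-side aggregation there; with an associative reduce function the per-partition combine genuinely shrinks the shuffle.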
, 2016 at 8:10 PM, Appu K <kut...@gmail.com> wrote:
Hello,
I’ve just run into an issue where the job is giving me "Managed memory
leak" with spark version 2.0.2
—
2016-12-08 16:31:25,231 [Executor task launch worker-0]
(TaskMemoryManager.java:381) WARN leak 46.2 MB memory from