Re: Custom metrics sink

2018-03-16 Thread Silvio Fiorito
Just set your custom sink in the org.apache.spark.metrics.sink namespace and configure metrics.properties. Use ConsoleSink as an example. Obviously since it’s private the API may change, but in the meantime that should work…
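A minimal sketch of what that could look like, modeled on ConsoleSink as Silvio suggests, and assuming the Spark 2.x-era private Sink trait (start/stop/report) and the (Properties, MetricRegistry, SecurityManager) constructor that the built-in sinks are instantiated with; the sink name "chime" and class ChimeSink are illustrative only:

package org.apache.spark.metrics.sink

import java.util.Properties
import java.util.concurrent.TimeUnit

import com.codahale.metrics.{ConsoleReporter, MetricRegistry}

import org.apache.spark.SecurityManager

// Lives in Spark's own package so it can see the private[spark] Sink trait.
private[spark] class ChimeSink(
    val property: Properties,
    val registry: MetricRegistry,
    securityMgr: SecurityManager) extends Sink {

  // Poll period read from metrics.properties, defaulting to 10 seconds.
  private val pollPeriod =
    Option(property.getProperty("period")).map(_.toInt).getOrElse(10)

  // Stand-in reporter; a real "chime" sink would react to the metrics here.
  private val reporter = ConsoleReporter.forRegistry(registry)
    .convertRatesTo(TimeUnit.SECONDS)
    .convertDurationsTo(TimeUnit.MILLISECONDS)
    .build()

  override def start(): Unit = reporter.start(pollPeriod, TimeUnit.SECONDS)
  override def stop(): Unit = reporter.stop()
  override def report(): Unit = reporter.report()
}

It would then be wired up in metrics.properties with something like *.sink.chime.class=org.apache.spark.metrics.sink.ChimeSink and *.sink.chime.period=10.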

Re: Custom metrics sink

2018-03-16 Thread Felix Cheung
There is a proposal to expose them. See SPARK-14151. From: Christopher Piggott Sent: Friday, March 16, 2018 1:09:38 PM To: user@spark.apache.org Subject: Custom metrics sink Just for fun, i want to make a stupid program that makes different

change spark default for a setting without overriding user

2018-03-16 Thread Koert Kuipers
I would like to change some defaults in Spark without overriding the user if she/he wishes to change them. For example, spark.blacklist.enabled is currently false by default, which makes sense for backwards compatibility. I would like it to be true by default, but if the user provided --conf
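What Koert describes is, in effect, setIfMissing semantics for a default: flip the value only when the user has not supplied one. A minimal sketch at the application level, using the standard SparkConf API (it shows the desired behavior, not a way to change the default Spark itself ships with):

import org.apache.spark.SparkConf

// new SparkConf() picks up values passed via --conf and spark-defaults.conf,
// so setIfMissing leaves a user-provided spark.blacklist.enabled untouched
// and only applies "true" when the key is absent.
val conf = new SparkConf()
conf.setIfMissing("spark.blacklist.enabled", "true")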

Custom metrics sink

2018-03-16 Thread Christopher Piggott
Just for fun, I want to make a stupid program that makes different frequency chimes as each worker becomes active. That way you can 'hear' what the cluster is doing and how it's distributing work. To do this, I thought I would make a custom Sink, but the Sink and everything else in

Re: is it possible to use Spark 2.3.0 along with Kafka 0.9.0.1?

2018-03-16 Thread Cody Koeninger
Should be able to use the 0.8 Kafka DStreams with a Kafka 0.9 broker. On Fri, Mar 16, 2018 at 7:52 AM, kant kodali wrote: > Hi All, > > is it possible to use Spark 2.3.0 along with Kafka 0.9.0.1? > > Thanks, > kant
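For reference, a minimal sketch of the 0.8 direct-stream API Cody is referring to, from the spark-streaming-kafka-0-8 integration; the broker address and topic name are placeholders:

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val sparkConf = new SparkConf().setAppName("kafka-0-8-dstream")
val ssc = new StreamingContext(sparkConf, Seconds(5))

// The 0.8 integration speaks the old consumer protocol, which a 0.9 broker
// still accepts, per Cody's reply.
val kafkaParams = Map("metadata.broker.list" -> "broker:9092")
val topics = Set("events")

val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, topics)

stream.map(_._2).print()

ssc.start()
ssc.awaitTermination()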

GOTO Chicago Talk / Discount

2018-03-16 Thread Trevor Grant
Hey all, We (the ASF) are putting on a booth at GOTO Chicago April 24-27. There is a Spark talk by Kelly Robinson[1]. If anyone was already planning on going and can help with the booth, there is a signup on the comdev wiki[2] (you might need to join d...@community.apache.org and ask for write

Re: Multiple Kafka Spark Streaming Dataframe Join query

2018-03-16 Thread Aakash Basu
Hi all, from the queries at the bottom of the last mail, query 1's doubt has been resolved. As I had already guessed, I had resent the same columns from the Kafka producer multiple times, hence the join gave duplicates. Retested with a fresh Kafka feed and the problem was solved. But the other queries still

Spark 2.x Core: .setMaster(local[*]) output is different from spark-submit

2018-03-16 Thread klrmowse
When I run a job with .setMaster(local[*]), the output is as expected... but when I run it using YARN (single node, pseudo-distributed HDFS) via spark-submit, the output is fudged: instead of key-value pairs, it only shows one value preceded by a comma, and the rest are blank. What am I

NPE in Subexpression Elimination optimization

2018-03-16 Thread Jacek Laskowski
Hi, I'm working on a minimal test to reproduce the NPE that is thrown in the latest 2.3.0 and the earlier 2.2.1 by the subexpression elimination optimization, and am sending it to the mailing list hoping someone notices something familiar and can shed more light on what might be the root
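Not a fix, but while a repro is being narrowed down, a hedged workaround is to switch the optimization off for the affected session; this assumes the internal SQLConf flag spark.sql.subexpressionElimination.enabled controls it, so treat the key as an implementation detail that may change:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("subexpr-elim-off").getOrCreate()

// Internal flag; disables subexpression elimination for subsequent queries
// in this session.
spark.conf.set("spark.sql.subexpressionElimination.enabled", "false")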

Time delay in multiple predicate Filter

2018-03-16 Thread Nikodimos Nikolaidis
Hello, here's a behavior that I find strange: filtering with a single zero-selectivity predicate is much quicker than filtering with multiple predicates, even when the same zero-selectivity predicate comes first. For example, on Google's English One Million 1-grams dataset (Spark 2.2,
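A minimal sketch of the two shapes being compared, assuming a DataFrame with hypothetical columns word, year, and count loaded from the 1-grams data; the path and schema are placeholders, not the poster's setup:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().appName("filter-timing").getOrCreate()

// Hypothetical load of the tab-separated 1-grams files.
val grams = spark.read
  .option("sep", "\t")
  .csv("/data/eng-1M-1grams")
  .toDF("word", "year", "count", "volumes")

// Single predicate with zero selectivity: no row can pass.
val single = grams.filter(col("count").cast("long") > Long.MaxValue)

// Same zero-selectivity predicate first, plus extra predicates; the reported
// behavior is that this variant takes noticeably longer.
val multi = grams.filter(
  col("count").cast("long") > Long.MaxValue &&
  col("year").cast("int") > 1900 &&
  col("word").startsWith("a"))

single.count()
multi.count()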

is it possible to use Spark 2.3.0 along with Kafka 0.9.0.1?

2018-03-16 Thread kant kodali
Hi All, is it possible to use Spark 2.3.0 along with Kafka 0.9.0.1? Thanks, kant

Re: Multiple Kafka Spark Streaming Dataframe Join query

2018-03-16 Thread Aakash Basu
Hi all, the code was perfectly alright; just the package I was submitting had to be the updated one (marked green below). The join happened, but the output has many duplicates (even though the *how* parameter defaults to *inner*) - Spark Submit:

Re: Multiple Kafka Spark Streaming Dataframe Join query

2018-03-16 Thread Aakash Basu
Hi, *Thanks to Chris and TD* for perpetually supporting my endeavor. I ran the code with a few tweaks here and there, and *it worked well in Spark 2.2.1*, giving me the deserialized values (I used withColumn in the writeStream section to run the SQL functions split and cast). But, when I
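A minimal sketch of the split/cast deserialization described here, assuming a comma-separated Kafka value with two hypothetical fields (id, name); the topic, server, and column names are placeholders:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, split}

val spark = SparkSession.builder().appName("kafka-csv-deserialize").getOrCreate()

val raw = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "test-topic")
  .load()

// Cast the binary Kafka value to a string, then split on commas and cast the
// individual fields.
val parsed = raw
  .withColumn("value", col("value").cast("string"))
  .withColumn("id", split(col("value"), ",").getItem(0).cast("int"))
  .withColumn("name", split(col("value"), ",").getItem(1))

val query = parsed.select("id", "name").writeStream
  .format("console")
  .outputMode("append")
  .start()

query.awaitTermination()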

Re: Sparklyr and idle executors

2018-03-16 Thread Florian Dewes
I set this from within R:

config <- spark_config()
config$spark.shuffle.service.enabled = "true"
config$spark.dynamicAllocation.enabled = "true"
config$spark.dynamicAllocation.executorIdleTimeout = 120
config$spark.dynamicAllocation.maxExecutors = 80
sc <- spark_connect(master = "yarn_client",

Re: Multiple Kafka Spark Streaming Dataframe Join query

2018-03-16 Thread sagar grover
With regards, Sagar Grover Phone - 7022175584 On Fri, Mar 16, 2018 at 12:15 AM, Aakash Basu wrote: > Awesome, thanks for detailing! > > Was thinking the same, we've to split by comma for csv while casting > inside. > > Cool! Shall try it and revert back tomm. > >

Re: Sparklyr and idle executors

2018-03-16 Thread Femi Anthony
I assume you're setting these values in spark-defaults.conf. What happens if you pass them directly to spark-submit, as in --conf spark.dynamicAllocation.enabled=true? On Thu, Mar 15, 2018 at 1:47 PM, Florian Dewes wrote: > Hi all, > > I am currently trying to enable