Re: Kryo On Spark 1.6.0

2017-01-10 Thread Richard Startin
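Hi Enrico, only set spark.kryo.registrationRequired if you want to forbid any classes you have not explicitly registered: when it is true, Kryo throws an exception if an unregistered class is serialized, whereas by default it writes the unregistered class name along with each object. See http://spark.apache.org/docs/latest/configuration.html (Configuration - Spark 2.0.2 Documentation).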

Re: ToLocalIterator vs collect

2017-01-05 Thread Richard Startin
Why not do that with Spark SQL, so the executors do the work, rather than running a sequential filter on the driver? For example: SELECT * FROM A LEFT JOIN B ON A.fk = B.fk WHERE B.pk IS NULL LIMIT k. If you were only sorting so you could iterate in order, this might save you a couple of sorts too.
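The LEFT JOIN ... WHERE B.pk IS NULL pattern is a left anti-join: it keeps the rows of A whose fk has no match in B, assuming B.pk is never NULL (so a NULL there can only mean "no match"). Spark 2.0 and later also accept `A LEFT ANTI JOIN B ON A.fk = B.fk` directly. Below is a minimal plain-Python reference of those semantics, handy for checking the query against a small sample. The tables and the column names `fk`/`pk` are placeholders, and this is not how Spark executes the join:

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def left_anti_join(a: Iterable[Row], b: Iterable[Row], key: str, limit: int) -> List[Row]:
    """Return up to `limit` rows of `a` that have no row in `b` with an equal `key`.

    Reference semantics for:
        SELECT * FROM A LEFT JOIN B ON A.fk = B.fk WHERE B.pk IS NULL LIMIT k
    """
    # SQL equality never matches NULL, so NULL keys in B can never produce a match.
    b_keys = {row[key] for row in b if row[key] is not None}
    result: List[Row] = []
    for row in a:
        if len(result) >= limit:
            break
        # A row of A with a NULL key is never matched, so it survives the IS NULL filter.
        if row[key] is None or row[key] not in b_keys:
            result.append(row)
    return result


if __name__ == "__main__":
    A = [{"id": 1, "fk": 10}, {"id": 2, "fk": 20}, {"id": 3, "fk": 30}]
    B = [{"pk": 100, "fk": 20}]
    print(left_anti_join(A, B, "fk", limit=2))  # rows with id 1 and 3
```

Without an ORDER BY, LIMIT k in Spark makes no promise about which k rows come back; the sketch simply takes the first k in input order.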

Re: withColumn gives "Can only zip RDDs with same number of elements in each partition" but not with a LIMIT on the dataframe

2016-12-20 Thread Richard Startin
I think limit repartitions your data into a single partition when it is called as a non-terminal operator. Hence zip works after limit, because there is only one partition. In practice, I have found joins to be much more widely applicable than zip, because zip strictly requires identical partitioning: the same number of partitions and the same number of elements in each partition.
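To make the partitioning constraint concrete, here is a toy model in plain Python, not Spark code. A hypothetical partitioned dataset is just a list of partitions; `zip_partitioned` enforces the same preconditions as RDD.zip, and `join_by_index` pairs elements by global position, roughly what zipWithIndex on both sides followed by a join on the index achieves in Spark:

```python
from typing import Any, List, Tuple

# Toy model: a "partitioned dataset" is a list of partitions, each a list of elements.
Partitions = List[List[Any]]


def zip_partitioned(left: Partitions, right: Partitions) -> List[List[Tuple[Any, Any]]]:
    """Pair elements partition by partition, with the same preconditions as RDD.zip."""
    if len(left) != len(right):
        raise ValueError("Can only zip with the same number of partitions")
    zipped = []
    for lpart, rpart in zip(left, right):
        if len(lpart) != len(rpart):
            raise ValueError("Can only zip RDDs with same number of elements in each partition")
        zipped.append(list(zip(lpart, rpart)))
    return zipped


def join_by_index(left: Partitions, right: Partitions) -> List[Tuple[Any, Any]]:
    """Pair elements by global position, regardless of how either side is partitioned.

    Roughly what zipWithIndex on both sides followed by a join on the index achieves.
    """
    left_keyed = dict(enumerate(x for part in left for x in part))
    right_keyed = dict(enumerate(y for part in right for y in part))
    # Inner join on the index: only positions present on both sides are paired.
    common = sorted(left_keyed.keys() & right_keyed.keys())
    return [(left_keyed[i], right_keyed[i]) for i in common]


if __name__ == "__main__":
    left = [[1, 2], [3, 4]]
    right = [["a"], ["b", "c", "d"]]  # same element count, different partition sizes
    print(join_by_index(left, right))  # [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]
    # zip_partitioned(left, right) would raise ValueError here.
```

Pairing by index survives any repartitioning; on a real cluster the price is a shuffle for the join.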

Re: Spark streaming completed batches statistics

2016-12-07 Thread Richard Startin
OK, it looks like I could reconstruct the logic in the Spark UI from the /jobs resource of the monitoring REST API. Thanks. https://richardstartin.com/ From: map reduced <k3t.gi...@gmail.com> Sent: 07 December 2016 19:49 To: Richard Startin Cc: user@spark.apache.org Subject: Re:
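A minimal sketch of pulling that data out as CSV, assuming the driver UI is reachable on its default port 4040 and using the monitoring REST API's `/api/v1/applications/[app-id]/jobs` resource. The choice of columns is mine, and the URL and application id in the usage line are placeholders:

```python
import csv
import io
import json
import urllib.request
from typing import Any, Dict, List

# Columns copied from each job object returned by the /jobs resource (selection is illustrative).
FIELDS = ["jobId", "name", "status", "submissionTime", "completionTime",
          "numTasks", "numCompletedTasks", "numFailedTasks"]


def fetch_jobs(base_url: str, app_id: str) -> List[Dict[str, Any]]:
    """GET {base_url}/api/v1/applications/{app_id}/jobs and parse the JSON list."""
    url = f"{base_url}/api/v1/applications/{app_id}/jobs"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def jobs_to_csv(jobs: List[Dict[str, Any]]) -> str:
    """Flatten job records into CSV text.

    Keys not in FIELDS are ignored; fields absent from a record are written as empty cells.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for job in jobs:
        writer.writerow(job)
    return buf.getvalue()
```

Usage, with a hypothetical application id: `print(jobs_to_csv(fetch_jobs("http://localhost:4040", "app-20161207000000-0000")))`. Grouping jobs back into streaming batches is left out of this sketch.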

Re: Spark streaming completed batches statistics

2016-12-07 Thread Richard Startin
Is there any way to get this information as CSV/JSON? https://docs.databricks.com/_images/CompletedBatches.png https://richardstartin.com/ From: Richard Startin <richardstar...@outlook.com> Se

Re: Back-pressure to Spark Kafka Streaming?

2016-12-05 Thread Richard Startin
I've seen the feature work very well. For tuning, you've got: spark.streaming.backpressure.pid.proportional (defaults to 1, non-negative), the weight for the response to the "error" (the change between the last batch and this batch); spark.streaming.backpressure.pid.integral (defaults to 0.2, non-negative) -
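For intuition about what those weights do, here is a simplified Python transcription of my reading of Spark's PIDRateEstimator, the class behind back-pressure. The derivative weight and minimum rate correspond to spark.streaming.backpressure.pid.derived (default 0) and spark.streaming.backpressure.pid.minRate (default 100). The Python class and method names are illustrative, so check the Spark source for your version before relying on the exact arithmetic:

```python
from typing import Optional


class PIDRateEstimator:
    """Simplified sketch of the PID rate estimator used for Spark Streaming back-pressure.

    proportional, integral, derivative and min_rate mirror
    spark.streaming.backpressure.pid.{proportional, integral, derived, minRate}.
    """

    def __init__(self, batch_interval_ms: int, proportional: float = 1.0,
                 integral: float = 0.2, derivative: float = 0.0, min_rate: float = 100.0):
        self.batch_interval_ms = batch_interval_ms
        self.kp, self.ki, self.kd = proportional, integral, derivative
        self.min_rate = min_rate
        self.first_run = True
        self.latest_time = -1
        self.latest_rate = -1.0
        self.latest_error = -1.0

    def compute(self, time_ms: int, num_elements: int,
                processing_delay_ms: int, scheduling_delay_ms: int) -> Optional[float]:
        """Return a new ingestion rate (elements/second), or None if no estimate is made."""
        if time_ms <= self.latest_time or num_elements <= 0 or processing_delay_ms <= 0:
            return None  # nothing to learn from this batch
        delay_since_update = (time_ms - self.latest_time) / 1000.0  # seconds
        processing_rate = num_elements / processing_delay_ms * 1000.0  # elements/second
        # Proportional term: how far the last computed rate exceeded what was actually processed.
        error = self.latest_rate - processing_rate
        # Integral term: scheduling delay reflects backlog accumulated in earlier batches.
        historical_error = scheduling_delay_ms * processing_rate / self.batch_interval_ms
        # Derivative term: how quickly the error is changing.
        d_error = (error - self.latest_error) / delay_since_update
        new_rate = max(self.latest_rate - self.kp * error - self.ki * historical_error
                       - self.kd * d_error, self.min_rate)
        self.latest_time = time_ms
        if self.first_run:
            # The first batch only seeds the rate with the measured processing rate.
            self.latest_rate = processing_rate
            self.latest_error = 0.0
            self.first_run = False
            return None
        self.latest_rate = new_rate
        self.latest_error = error
        return new_rate
```

With the defaults, a batch processed at half the previous rate halves the next rate (proportional weight 1), while scheduling delay left over from earlier batches pulls it down further through the integral term.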

Spark streaming completed batches statistics

2016-12-05 Thread Richard Startin
Is there any way to get a more computer-friendly version of the Completed Batches section of the streaming page of the application master? I am very interested in the statistics and am currently screen-scraping... https://richardstartin.com

Re: Livy with Spark

2016-12-05 Thread Richard Startin
There is a great write-up on Livy at http://henning.kropponline.de/2016/11/06/ On 5 Dec 2016, at 14:34, Mich Talebzadeh wrote: Hi, Has there been any experience using Livy with Spark to share multiple Spark contexts? thanks Dr