Dataset - withColumn and withColumnRenamed that accept Column type

2018-07-13 Thread Nirav Patel
Is there a version of withColumn or withColumnRenamed that accepts a Column instead of a String? That way I could specify the fully qualified name (FQN) in cases where there are duplicate column names. I can drop a column based on a Column-type argument, so why can't I rename one based on the same type of argument? Use case is, I have
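A minimal sketch of the usual workaround, assuming hypothetical df1/df2 that share a "key" column: drop does accept a Column, and a rename-by-Column can be emulated with select plus alias, since withColumnRenamed currently only takes a String name.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("rename-by-column").master("local[*]").getOrCreate()
import spark.implicits._

val df1 = Seq((1, "a")).toDF("key", "v1")
val df2 = Seq((1, "b")).toDF("key", "v2")

// A join that produces two columns named "key"
val joined = df1.join(df2, df1("key") === df2("key"))

// drop has a Column overload, so the duplicate can be removed unambiguously
val deduped = joined.drop(df2("key"))

// "Rename by Column" emulated with select + alias on the fully qualified references
val renamed = joined.select(
  df1("key").alias("left_key"),
  df2("key").alias("right_key"),
  joined("v1"),
  joined("v2"))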

Spark on Mesos: Spark issuing hundreds of SUBSCRIBE requests / second and crashing Mesos

2018-07-13 Thread Nimi W
I've come across an issue with Mesos 1.4.1 and Spark 2.2.1. We launch Spark tasks using the MesosClusterDispatcher in cluster mode. On a couple of occasions, we have noticed that when the Spark Driver crashes (due to various causes - human error, network error), sometimes, when the Driver is

Re: Live Streamed Code Review today at 11am Pacific

2018-07-13 Thread Holden Karau
This afternoon @ 3pm Pacific I'll be looking at review tooling for Spark & Beam https://www.youtube.com/watch?v=ff8_jbzC8JI. Next week's regular Friday code review (this time July 20th @ 9:30am Pacific) will once again probably have more of an ML focus for folks interested in watching Spark ML

[ML] Linear regression with SGD

2018-07-13 Thread sandy
Hi, I would like to compare different implementations of linear regression (and possibly generalised linear regression) in Spark. I was wondering why the functions for linear regression (and GLM) with stochastic gradient descent have been deprecated. I have found some old posts of people having
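A minimal sketch of the DataFrame-based replacement, org.apache.spark.ml.regression.LinearRegression, on a tiny made-up dataset; unlike the deprecated RDD-based SGD API, it fits via L-BFGS/OWL-QN or a normal-equation solver.

import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("lr-compare").master("local[*]").getOrCreate()
import spark.implicits._

// Tiny synthetic dataset where label is roughly 2*x + 1
val training = Seq(
  (1.0, Vectors.dense(0.0)),
  (3.0, Vectors.dense(1.0)),
  (5.0, Vectors.dense(2.0))
).toDF("label", "features")

val lr = new LinearRegression()
  .setMaxIter(50)
  .setRegParam(0.0)

val model = lr.fit(training)
println(s"coefficients=${model.coefficients} intercept=${model.intercept}")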

spark rename or access columns which have special chars " ?:

2018-07-13 Thread Great Info
I have columns like the following:
root
 |-- metadata: struct (nullable = true)
 |    |-- "drop":{"dropPath":" https://dstpath.media27.ec2.st-av.net/drop?source_id: string (nullable = true)
 |    |-- "selection":{"AlllURL":" https://dstpath.media27.ec2.st-av.net/image?source_id: string
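A minimal sketch of two ways to handle such names, using a hypothetical flat column whose name contains quotes and a colon as a stand-in for the nested fields above: backtick-quoting in selectExpr, or renaming with withColumnRenamed / toDF so the special characters disappear.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("special-chars").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical column name containing quotes and a colon
val df = Seq(("x", "y")).toDF("\"drop\":dropPath", "normal")

// Refer to the awkward name with backtick quoting in SQL expressions
val selected = df.selectExpr("`\"drop\":dropPath` as dropPath", "normal")

// Or rename it away: withColumnRenamed matches the existing name literally
val renamed1 = df.withColumnRenamed("\"drop\":dropPath", "dropPath")

// Or rename every column by position
val renamed2 = df.toDF("dropPath", "normal")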

Re: spark sql data skew

2018-07-13 Thread Jean Georges Perrin
Just thinking out loud… Repartition by key? Create a composite key based on company and userId? How big is your dataset? > On Jul 13, 2018, at 06:20, 崔苗 wrote: > > Hi, > when I want to count(distinct userId) by company, I met the data skew and the > task takes too long time, how to count
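A minimal sketch of the composite-key idea on a hypothetical events table with (company, userId) columns: deduplicate on the pair first, so the per-company result becomes a plain count over already-distinct rows instead of funnelling each company's whole user list through a single task.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.count

val spark = SparkSession.builder().appName("skewed-count-distinct").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical skewed input
val events = Seq(("acme", "u1"), ("acme", "u1"), ("acme", "u2"), ("beta", "u3"))
  .toDF("company", "userId")

// Stage 1: distinct on the composite key (company, userId)
val distinctPairs = events.select("company", "userId").distinct()

// Stage 2: per-company count of the already-distinct pairs
val perCompany = distinctPairs
  .groupBy("company")
  .agg(count("userId").as("distinct_users"))

perCompany.show()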

spark sql data skew

2018-07-13 Thread 崔苗
Hi, when I want to count(distinct userId) by company, I run into data skew and the task takes too long. How do I count distinct by keys on skewed data in Spark SQL? Thanks for any reply.