Re: Best practices of maintaining a long-running SparkContext

2016-03-08 Thread Zhong Wang
…community. Thanks, Zhong

On Tue, Mar 8, 2016 at 11:13 AM, Zhong Wang <wangzhong@gmail.com> wrote:
> Thanks for your insights, Deenar. I think this is really helpful to users
> who want to run Zeppelin as a service.
>
> The caching issue we experienced seems to be a Spark…

Re: SparkSQL/DataFrame - Is `JOIN USING` syntax null-safe?

2016-02-15 Thread Zhong Wang
Just checked the code and wrote some tests. Seems it is not null-safe... Shall we consider providing a null-safe option for `JOIN USING` syntax?

Zhong

On Mon, Feb 15, 2016 at 7:25 PM, Zhong Wang <wangzhong@gmail.com> wrote:
> Is it null-safe when we use this interface?
> --…
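For illustration, a possible null-safe workaround, not taken from the thread: build the join condition with Spark's null-safe equality operator `<=>` instead of `JOIN USING`. The DataFrame names `df1`/`df2` and the common column `k` are hypothetical.

```scala
// Sketch: <=> compares NULLs as equal, unlike the plain equality that
// JOIN USING generates, so NULL keys on both sides match each other.
val nullSafe = df1.join(df2, df1("k") <=> df2("k"), "inner")
  .drop(df2("k"))  // keep a single copy of the join column, as JOIN USING would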

SparkSQL/DataFrame - Is `JOIN USING` syntax null-safe?

2016-02-15 Thread Zhong Wang
Is it null-safe when we use this interface?

```scala
def join(right: DataFrame, usingColumns: Seq[String], joinType: String): DataFrame
```

Thanks, Zhong
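To make the question concrete, a minimal sketch (assumes a spark-shell session where `sqlContext` is predefined; the column names `k`, `v1`, `v2` are made up) showing what is at stake when the join keys contain NULLs:

```scala
import sqlContext.implicits._

val l = Seq((Option(1), "a"), (Option.empty[Int], "b")).toDF("k", "v1")
val r = Seq((Option(1), "x"), (Option.empty[Int], "y")).toDF("k", "v2")

// Under plain-equality semantics, only the k = 1 row appears;
// the NULL-keyed rows on either side never match each other.
l.join(r, Seq("k"), "inner").show()
```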

First job is extremely slow due to executor heartbeat timeout (yarn-client)

2016-01-22 Thread Zhong Wang
Hi, I am deploying Spark 1.6.0 using yarn-client mode in our YARN cluster. Everything works fine, except that the first job is extremely slow due to an executor heartbeat RPC timeout:

```
WARN netty.NettyRpcEndpointRef: Error sending message [message = Heartbeat…
```

I think this might be related to our…
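Not part of the original message, but a common mitigation while the root cause is being diagnosed is to widen the timeouts involved in the heartbeat RPC via SparkConf. A sketch; the app name and the values shown are illustrative assumptions to tune, not recommendations:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Heartbeats are sent every spark.executor.heartbeatInterval and the
// RPC fails against spark.network.timeout; raising both gives a slow
// first job more headroom.
val conf = new SparkConf()
  .setAppName("heartbeat-timeout-workaround")       // hypothetical name
  .set("spark.executor.heartbeatInterval", "30s")   // default is 10s
  .set("spark.network.timeout", "600s")             // default is 120s
val sc = new SparkContext(conf)
```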

Redundant common columns of natural full outer join

2016-01-19 Thread Zhong Wang
Hi all, I am joining two tables with common columns using a full outer join. However, the current DataFrame API doesn't support natural joins, so the output contains redundant common columns from both tables. Is there any way to remove these redundant columns for a "natural" full outer join?
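One possible workaround, offered as a sketch rather than the thread's answer (the DataFrame names `lhs`/`rhs` and the common column `id` are hypothetical): join on explicit equality, then coalesce the two copies of the common column into one and drop the originals.

```scala
import org.apache.spark.sql.functions.coalesce

// Emulate a natural full outer join: coalesce picks the non-NULL side
// of the common column, which is exactly what a natural join would emit.
val joined = lhs.join(rhs, lhs("id") === rhs("id"), "fullouter")
  .withColumn("merged_id", coalesce(lhs("id"), rhs("id")))
  .drop(lhs("id"))
  .drop(rhs("id"))
```

With several common columns, the same coalesce-and-drop step would be repeated per column, e.g. by folding over the list of shared column names.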