Re: [VOTE] Release Apache Spark 1.6.1 (RC1)

2016-03-02 Thread Mark Hamstra
+1 On Wed, Mar 2, 2016 at 2:45 PM, Michael Armbrust wrote: > Please vote on releasing the following candidate as Apache Spark version > 1.6.1! > > The vote is open until Saturday, March 5, 2016 at 20:00 UTC and passes if > a majority of at least 3 +1 PMC votes are cast. >

About the exception "Received LaunchTask command but executor was null"

2016-03-02 Thread Sea
Hi, all: Sometimes a task will fail with the exception "Received LaunchTask command but executor was null", and I find it is a common problem: https://issues.apache.org/jira/browse/SPARK-13112 https://issues.apache.org/jira/browse/SPARK-13060 I have a

Re: SPARK-SQL: Pattern Detection on Live Event or Archived Event Data

2016-03-02 Thread Reynold Xin
SQL is very common, and even some business analysts learn it. Scala and Python are great, but the easiest language to use is often the language a user already knows. And for a lot of users, that is SQL. On Wednesday, March 2, 2016, Jerry Lam wrote: > Hi guys, > > FYI...

Re: HashedRelation Memory Pressure on Broadcast Joins

2016-03-02 Thread Davies Liu
I see, we could reduce the memory by moving the copy out of the HashedRelation, then we should do the copy before calling HashedRelation for shuffle hash join. Another thing is that when we do broadcasting, we will have another serialized copy of the hash table. For the table that's larger than 100M,

Re: Upgrading to Kafka 0.9.x

2016-03-02 Thread Cody Koeninger
Jay, thanks for the response. Regarding the new consumer API for 0.9, I've been reading through the code for it and thinking about how it fits into the existing Spark integration. So far I've seen some interesting challenges, and if you (or anyone else on the dev list) have time to provide some

Re: HashedRelation Memory Pressure on Broadcast Joins

2016-03-02 Thread Matt Cheah
I would expect the memory pressure to grow because not only are we storing the backing array to the iterator of the rows on the driver, but we’re also storing a copy of each of those rows in the hash table. Whereas if we didn’t do the copy on the driver side then the hash table would only have to

Re: HashedRelation Memory Pressure on Broadcast Joins

2016-03-02 Thread Davies Liu
UnsafeHashedRelation and HashedRelation could also be used in the Executor (for non-broadcast hash join), then the UnsafeRow could come from UnsafeProjection, so we should copy the rows for safety. We could have a smarter copy() for UnsafeRow (avoid the copy if it's already copied), but I don't think

Re: SPARK-SQL: Pattern Detection on Live Event or Archived Event Data

2016-03-02 Thread Jerry Lam
Hi guys, FYI... this wiki page (StreamSQL: https://en.wikipedia.org/wiki/StreamSQL) has some history related to Event Stream Processing and SQL. Hi Steve, It is difficult to ask your customers to learn a new language when they are not programmers :) I don't know where/why they

Re: Selecting column in dataframe created with incompatible schema causes AnalysisException

2016-03-02 Thread Michael Armbrust
-dev +user StructType(StructField(data,ArrayType(StructType(StructField( > *stuff,ArrayType(*StructType(StructField(onetype,ArrayType(StructType(StructField(id,LongType,true), > StructField(name,StringType,true)),true),true), StructField(othertype, >

Selecting column in dataframe created with incompatible schema causes AnalysisException

2016-03-02 Thread Ewan Leith
When you create a dataframe using the sqlContext.read.schema() API, if you pass in a schema that's compatible with some of the records but incompatible with others, it seems you can't do a .select on the problematic columns; instead you get an AnalysisException error. I know loading the wrong