Will Spark-SQL support vectorized query engine someday?

2015-01-19 Thread Xuelin Cao
Hi, correct me if I'm wrong. It looks like the current version of Spark-SQL uses a *tuple-at-a-time* model. Basically, each time the physical operator produces a tuple by recursively calling child.execute. There are papers that illustrate the benefits of a vectorized query engine. And
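The tuple-at-a-time vs. vectorized contrast can be sketched in plain Python. This is an illustrative model only, not Spark's actual operators; both function names are hypothetical:

```python
# Sketch of the two execution styles for a toy SUM over one column.

def tuple_at_a_time_sum(rows):
    # Volcano-style: each iteration materializes a single tuple,
    # paying per-tuple call/interpretation overhead.
    total = 0
    for row in iter(rows):
        total += row[0]
    return total

def vectorized_sum(rows, batch_size=1024):
    # Vectorized: operators exchange batches of column values,
    # amortizing per-call overhead into tight inner loops.
    total = 0
    for i in range(0, len(rows), batch_size):
        batch = [r[0] for r in rows[i:i + batch_size]]  # one column slice
        total += sum(batch)
    return total

rows = [(i,) for i in range(10)]
print(tuple_at_a_time_sum(rows), vectorized_sum(rows))  # 45 45
```

Both return the same answer; the point of the papers cited in the thread is that the batch-oriented loop is far friendlier to the CPU (fewer virtual calls, better cache behavior).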

GraphX ShortestPaths backwards?

2015-01-19 Thread Michael Malak
GraphX ShortestPaths seems to be following edges backwards instead of forwards: import org.apache.spark.graphx._ val g = Graph(sc.makeRDD(Array((1L,""), (2L,""), (3L,""))), sc.makeRDD(Array(Edge(1L,2L,""), Edge(2L,3L,"")))) lib.ShortestPaths.run(g,Array(3)).vertices.collect res1:
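The forward/backward distinction can be reproduced with a tiny BFS sketch in plain Python (illustrative only, not GraphX; `distances_to` is a hypothetical helper) on the same 1 → 2 → 3 graph:

```python
from collections import deque

edges = [(1, 2), (2, 3)]  # the graph from the message: 1 -> 2 -> 3

def distances_to(landmark, edge_list, reverse=False):
    # BFS from the landmark over the edges, optionally reversed.
    adj = {}
    for s, d in edge_list:
        if reverse:
            s, d = d, s
        adj.setdefault(s, []).append(d)
    dist = {landmark: 0}
    q = deque([landmark])
    while q:
        v = q.popleft()
        for w in adj.get(v, []):
            if w not in dist:
                dist[w] = dist[v] + 1
                q.append(w)
    return dist

# Traversing edges *backward* from landmark 3 reaches 2 and 1, giving
# each vertex its forward-path distance to 3:
print(distances_to(3, edges, reverse=True))   # {3: 0, 2: 1, 1: 2}
# Traversing them *forward* from 3 reaches nothing else:
print(distances_to(3, edges, reverse=False))  # {3: 0}
```

Which of the two results ShortestPaths returns determines whether it is measuring distance along forward edges or reversed ones, which is exactly the question raised here.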

Re: Join the developer community of spark

2015-01-19 Thread Alessandro Baretta
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark Enjoy! Alex On Mon, Jan 19, 2015 at 6:44 PM, Jeff Wang jingjingwang...@gmail.com wrote: Hi: I would like to contribute to the code of spark. Can I join the community? Thanks, Jeff

Is there any way to support multiple users executing SQL on thrift server?

2015-01-19 Thread Yi Tian
Is there any way to support multiple users executing SQL on one thrift server? I think there are some problems in Spark 1.2.0, for example: 1. Start the thrift server as user A. 2. Connect to the thrift server via beeline as user B. 3. Execute "insert into table dest select … from table src", then

Re: Optimize encoding/decoding strings when using Parquet

2015-01-19 Thread Mick Davies
Added a JIRA to track https://issues.apache.org/jira/browse/SPARK-5309

Re: Will Spark-SQL support vectorized query engine someday?

2015-01-19 Thread Reynold Xin
It will probably eventually make its way into part of the query engine, one way or another. Note that there is in general a lot of other lower-hanging fruit before you have to do vectorization. As far as I know, Hive doesn't really have vectorization because the vectorization in Hive is simply

Re: Memory config issues

2015-01-19 Thread Sean Owen
On Mon, Jan 19, 2015 at 6:29 AM, Akhil Das ak...@sigmoidanalytics.com wrote: Its the executor memory (spark.executor.memory) which you can set while creating the spark context. By default it uses 0.6% of the executor memory (it uses 0.6, i.e. 60%, not 0.6%)
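The arithmetic behind the correction, as a back-of-the-envelope sketch (the 4g executor size is an assumed example, not from the thread):

```python
# spark.storage.memoryFraction defaults to 0.6 in Spark 1.x: a
# *fraction* of executor memory, i.e. 60%, not 0.6%.
executor_memory_mb = 4096   # e.g. spark.executor.memory=4g (assumed)
storage_fraction = 0.6      # the default fraction
storage_mb = executor_memory_mb * storage_fraction
print(storage_mb)  # 2457.6 MB usable for cached blocks, not ~24.6 MB
```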

Re: Spark client reconnect to driver in yarn-cluster deployment mode

2015-01-19 Thread Romi Kuntsman
in yarn-client mode it only controls the environment of the executor launcher. So you either use yarn-client mode, and then your app keeps running and controlling the process; or you use yarn-cluster mode, and then you send a jar to YARN, and that jar should have code to report the result back to

Re: Optimize encoding/decoding strings when using Parquet

2015-01-19 Thread Mick Davies
Here are some timings showing the effect of caching the last Binary-to-String conversion. Query times are reduced significantly, and the variation in timings is much smaller due to the reduction in garbage. Set of sample queries selecting various columns, applying some filtering and then aggregating. Spark 1.2.0
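The "cache the last conversion" idea can be sketched in plain Python. This is an illustrative model, not the Spark/Parquet code from the thread; `CachingDecoder` and its fields are hypothetical names:

```python
# Cache the most recent Binary -> String conversion so that runs of
# repeated column values are decoded (and allocated) only once.

class CachingDecoder:
    def __init__(self):
        self._last_bytes = None
        self._last_str = None
        self.decodes = 0  # count actual UTF-8 decodes, for illustration

    def convert(self, raw):
        if raw != self._last_bytes:
            self._last_str = raw.decode("utf-8")
            self._last_bytes = raw
            self.decodes += 1
        return self._last_str  # reuse the previously built String

dec = CachingDecoder()
column = [b"US", b"US", b"US", b"UK", b"UK"]  # runs of repeated values
strings = [dec.convert(v) for v in column]
print(strings, dec.decodes)  # ['US', 'US', 'US', 'UK', 'UK'] 2
```

Five values, but only two decodes and two string allocations, which is where the reported garbage reduction comes from when columns contain long runs of duplicates.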

Re: RDD order guarantees

2015-01-19 Thread Ewan Higgs
Hi Reynold. I'll take a look. SPARK-5300 is open for this issue. -Ewan On 19/01/15 08:39, Reynold Xin wrote: Hi Ewan, Not sure if there is a JIRA ticket (there are too many that I lose track). I chatted briefly with Aaron on this. The way we can solve it is to create a new FileSystem

Re: Semantics of LGTM

2015-01-19 Thread Prashant Sharma
Patrick's original proposal LGTM :). However, until now I have been under the impression of LGTM with special emphasis on the TM part. That said, I will be okay/happy (or responsible) for the patch if it goes in. Prashant Sharma On Sun, Jan 18, 2015 at 2:33 PM, Reynold Xin r...@databricks.com

Re: Optimize encoding/decoding strings when using Parquet

2015-01-19 Thread Mick Davies
Looking at Parquet code - it looks like hooks are already in place to support this. In particular PrimitiveConverter has methods hasDictionarySupport and addValueFromDictionary for this purpose. These are not used by CatalystPrimitiveConverter. I think that it would be pretty straightforward to
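The dictionary mechanism the hooks enable can be sketched in plain Python (illustrative only; the class and method names below mirror but are not the actual Parquet API):

```python
# With a dictionary-encoded column, decode each distinct Binary value
# to a String once, then serve every row as an index lookup.

class DictionaryStringConverter:
    def __init__(self):
        self._decoded = None

    def set_dictionary(self, dictionary):
        # One decode per *distinct* value, analogous to what a converter
        # with hasDictionarySupport could do up front.
        self._decoded = [raw.decode("utf-8") for raw in dictionary]

    def add_value_from_dictionary(self, index):
        # Per-row path: no decoding, just an array lookup.
        return self._decoded[index]

conv = DictionaryStringConverter()
conv.set_dictionary([b"red", b"green", b"blue"])
ids = [0, 2, 2, 1, 0]  # the encoded column stores only these indexes
values = [conv.add_value_from_dictionary(i) for i in ids]
print(values)  # ['red', 'blue', 'blue', 'green', 'red']
```

However many rows the column has, the decoding cost is bounded by the dictionary size, which is why wiring this into CatalystPrimitiveConverter looks worthwhile.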

Re: Optimize encoding/decoding strings when using Parquet

2015-01-19 Thread Reynold Xin
Definitely go for a pull request! On Mon, Jan 19, 2015 at 10:10 AM, Mick Davies michael.belldav...@gmail.com wrote: Looking at Parquet code - it looks like hooks are already in place to support this. In particular PrimitiveConverter has methods hasDictionarySupport and

Re: GraphX vertex partition/location strategy

2015-01-19 Thread Ankur Dave
No - the vertices are hash-partitioned onto workers independently of the edges. It would be nice for each vertex to be on the worker with the most adjacent edges, but we haven't done this yet since it would add a lot of complexity to avoid load imbalance while reducing the overall communication by
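Hash-partitioning of vertex ids, independent of edge placement, can be sketched as follows (an illustrative model, not GraphX's actual partitioner; `vertex_partition` is a hypothetical name):

```python
# Vertices land on workers purely by a hash of their id; where their
# adjacent edges live plays no role in the assignment.

def vertex_partition(vertex_id, num_partitions):
    # Simple deterministic hash partitioner on the vertex id.
    return vertex_id % num_partitions

num_partitions = 3
vertices = [1, 2, 3, 4, 5, 6]
placement = {v: vertex_partition(v, num_partitions) for v in vertices}
print(placement)  # {1: 1, 2: 2, 3: 0, 4: 1, 5: 2, 6: 0}
```

An edge-aware placement would instead pick, for each vertex, the partition holding most of its adjacent edges; as the reply notes, doing that without creating load imbalance is the hard part.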

Re: GraphX vertex partition/location strategy

2015-01-19 Thread Michael Malak
But wouldn't the gain be greater under something similar to EdgePartition1D (but perhaps better load-balanced based on number of edges for each vertex) and an algorithm that primarily follows edges in the forward direction? From: Ankur Dave ankurd...@gmail.com To: Michael Malak

Re: Semantics of LGTM

2015-01-19 Thread Patrick Wendell
The wiki does not seem to be operational ATM, but I will do this when it is back up. On Mon, Jan 19, 2015 at 12:00 PM, Patrick Wendell pwend...@gmail.com wrote: Okay - so given all this I was going to put the following on the wiki tentatively: ## Reviewing Code Community code review is

Re: Semantics of LGTM

2015-01-19 Thread Patrick Wendell
Okay - so given all this I was going to put the following on the wiki tentatively: ## Reviewing Code Community code review is Spark's fundamental quality assurance process. When reviewing a patch, your goal should be to help streamline the committing process by giving committers confidence this

GraphX vertex partition/location strategy

2015-01-19 Thread Michael Malak
Does GraphX make an effort to co-locate vertices onto the same workers as the majority (or even some) of its edges?