Will Spark-SQL support vectorized query engine someday?

2015-01-19 Thread Xuelin Cao
Hi, correct me if I'm wrong. It looks like the current version of Spark-SQL uses a *tuple-at-a-time* model. Basically, each time the physical operator produces a tuple by recursively calling child.execute. There are papers that illustrate the benefits of a vectorized query engine. And
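The tuple-at-a-time vs. vectorized contrast can be sketched in plain Python. This is an illustrative model only, not Spark's actual operators; both function names are hypothetical:

```python
# Sketch of the two execution styles for a toy SUM over one column.

def tuple_at_a_time_sum(rows):
    # Volcano-style: each iteration materializes a single tuple,
    # paying per-tuple call/interpretation overhead.
    total = 0
    for row in iter(rows):
        total += row[0]
    return total

def vectorized_sum(rows, batch_size=1024):
    # Vectorized: operators exchange batches of column values,
    # amortizing per-call overhead into tight inner loops.
    total = 0
    for i in range(0, len(rows), batch_size):
        batch = [r[0] for r in rows[i:i + batch_size]]  # one column slice
        total += sum(batch)
    return total

rows = [(i,) for i in range(10)]
print(tuple_at_a_time_sum(rows), vectorized_sum(rows))  # 45 45
```

Both return the same answer; the point of the papers cited in the thread is that the batch-oriented loop is far friendlier to the CPU (fewer virtual calls, better cache behavior).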

GraphX ShortestPaths backwards?

2015-01-19 Thread Michael Malak
GraphX ShortestPaths seems to be following edges backwards instead of forwards: import org.apache.spark.graphx._ val g = Graph(sc.makeRDD(Array((1L,""), (2L,""), (3L,""))), sc.makeRDD(Array(Edge(1L,2L,""), Edge(2L,3L,"")))) lib.ShortestPaths.run(g,Array(3)).vertices.collect res1:
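The forward/backward distinction can be reproduced with a tiny BFS sketch in plain Python (illustrative only, not GraphX; `distances_to` is a hypothetical helper) on the same 1 → 2 → 3 graph:

```python
from collections import deque

edges = [(1, 2), (2, 3)]  # the graph from the message: 1 -> 2 -> 3

def distances_to(landmark, edge_list, reverse=False):
    # BFS from the landmark over the edges, optionally reversed.
    adj = {}
    for s, d in edge_list:
        if reverse:
            s, d = d, s
        adj.setdefault(s, []).append(d)
    dist = {landmark: 0}
    q = deque([landmark])
    while q:
        v = q.popleft()
        for w in adj.get(v, []):
            if w not in dist:
                dist[w] = dist[v] + 1
                q.append(w)
    return dist

# Traversing edges *backward* from landmark 3 reaches 2 and 1, giving
# each vertex its forward-path distance to 3:
print(distances_to(3, edges, reverse=True))   # {3: 0, 2: 1, 1: 2}
# Traversing them *forward* from 3 reaches nothing else:
print(distances_to(3, edges, reverse=False))  # {3: 0}
```

Which of the two results ShortestPaths returns determines whether it is measuring distance along forward edges or reversed ones, which is exactly the question raised here.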

Re: Join the developer community of spark

2015-01-19 Thread Alessandro Baretta
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark Enjoy! Alex On Mon, Jan 19, 2015 at 6:44 PM, Jeff Wang jingjingwang...@gmail.com wrote: Hi: I would like to contribute to the code of spark. Can I join the community? Thanks, Jeff

Is there any way to support multiple users executing SQL on thrift server?

2015-01-19 Thread Yi Tian
Is there any way to support multiple users executing SQL on one thrift server? I think there are some problems in Spark 1.2.0, for example: 1. Start the thrift server as user A. 2. Connect to the thrift server via beeline as user B. 3. Execute "insert into table dest select … from table src", then

Re: Optimize encoding/decoding strings when using Parquet

2015-01-19 Thread Mick Davies
Added a JIRA to track https://issues.apache.org/jira/browse/SPARK-5309

Re: Will Spark-SQL support vectorized query engine someday?

2015-01-19 Thread Reynold Xin
It will probably eventually make its way into part of the query engine, one way or another. Note that there is in general a lot of other lower-hanging fruit before you have to do vectorization. As far as I know, Hive doesn't really have vectorization because the vectorization in Hive is simply

Re: Memory config issues

2015-01-19 Thread Sean Owen
On Mon, Jan 19, 2015 at 6:29 AM, Akhil Das ak...@sigmoidanalytics.com wrote: Its the executor memory (spark.executor.memory) which you can set while creating the spark context. By default it uses 0.6% of the executor memory (it uses 0.6, i.e. 60%, not 0.6%)
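The arithmetic behind the correction, as a back-of-the-envelope sketch (the 4g executor size is an assumed example, not from the thread):

```python
# spark.storage.memoryFraction defaults to 0.6 in Spark 1.x: a
# *fraction* of executor memory, i.e. 60%, not 0.6%.
executor_memory_mb = 4096   # e.g. spark.executor.memory=4g (assumed)
storage_fraction = 0.6      # the default fraction
storage_mb = executor_memory_mb * storage_fraction
print(storage_mb)  # 2457.6 MB usable for cached blocks, not ~24.6 MB
```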

Re: Spark client reconnect to driver in yarn-cluster deployment mode

2015-01-19 Thread Romi Kuntsman
in yarn-client mode it only controls the environment of the executor launcher. So you either use yarn-client mode, and then your app keeps running and controlling the process; or you use yarn-cluster mode, and then you send a jar to YARN, and that jar should have code to report the result back to

Re: Optimize encoding/decoding strings when using Parquet

2015-01-19 Thread Mick Davies
Here are some timings showing the effect of caching the last Binary-to-String conversion. Query times are reduced significantly, and the variation in timings is much smaller due to the reduction in garbage. Set of sample queries selecting various columns, applying some filtering and then aggregating. Spark 1.2.0
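The "cache the last conversion" idea can be sketched in plain Python. This is an illustrative model, not the Spark/Parquet code from the thread; `CachingDecoder` and its fields are hypothetical names:

```python
# Cache the most recent Binary -> String conversion so that runs of
# repeated column values are decoded (and allocated) only once.

class CachingDecoder:
    def __init__(self):
        self._last_bytes = None
        self._last_str = None
        self.decodes = 0  # count actual UTF-8 decodes, for illustration

    def convert(self, raw):
        if raw != self._last_bytes:
            self._last_str = raw.decode("utf-8")
            self._last_bytes = raw
            self.decodes += 1
        return self._last_str  # reuse the previously built String

dec = CachingDecoder()
column = [b"US", b"US", b"US", b"UK", b"UK"]  # runs of repeated values
strings = [dec.convert(v) for v in column]
print(strings, dec.decodes)  # ['US', 'US', 'US', 'UK', 'UK'] 2
```

Five values, but only two decodes and two string allocations, which is where the reported garbage reduction comes from when columns contain long runs of duplicates.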

Re: RDD order guarantees

2015-01-19 Thread Ewan Higgs
Hi Reynold. I'll take a look. SPARK-5300 is open for this issue. -Ewan On 19/01/15 08:39, Reynold Xin wrote: Hi Ewan, Not sure if there is a JIRA ticket (there are too many that I lose track). I chatted briefly with Aaron on this. The way we can solve it is to create a new FileSystem

Re: Semantics of LGTM

2015-01-19 Thread Prashant Sharma
Patrick's original proposal LGTM :). However, until now I have been under the impression of LGTM with special emphasis on the TM part. That said, I will be okay/happy (or responsible) for the patch if it goes in. Prashant Sharma On Sun, Jan 18, 2015 at 2:33 PM, Reynold Xin r...@databricks.com

Re: Optimize encoding/decoding strings when using Parquet

2015-01-19 Thread Mick Davies
Looking at Parquet code - it looks like hooks are already in place to support this. In particular PrimitiveConverter has methods hasDictionarySupport and addValueFromDictionary for this purpose. These are not used by CatalystPrimitiveConverter. I think that it would be pretty straightforward to
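The dictionary mechanism the hooks enable can be sketched in plain Python (illustrative only; the class and method names below mirror but are not the actual Parquet API):

```python
# With a dictionary-encoded column, decode each distinct Binary value
# to a String once, then serve every row as an index lookup.

class DictionaryStringConverter:
    def __init__(self):
        self._decoded = None

    def set_dictionary(self, dictionary):
        # One decode per *distinct* value, analogous to what a converter
        # with hasDictionarySupport could do up front.
        self._decoded = [raw.decode("utf-8") for raw in dictionary]

    def add_value_from_dictionary(self, index):
        # Per-row path: no decoding, just an array lookup.
        return self._decoded[index]

conv = DictionaryStringConverter()
conv.set_dictionary([b"red", b"green", b"blue"])
ids = [0, 2, 2, 1, 0]  # the encoded column stores only these indexes
values = [conv.add_value_from_dictionary(i) for i in ids]
print(values)  # ['red', 'blue', 'blue', 'green', 'red']
```

However many rows the column has, the decoding cost is bounded by the dictionary size, which is why wiring this into CatalystPrimitiveConverter looks worthwhile.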

Re: Optimize encoding/decoding strings when using Parquet

2015-01-19 Thread Reynold Xin
Definitely go for a pull request! On Mon, Jan 19, 2015 at 10:10 AM, Mick Davies michael.belldav...@gmail.com wrote: Looking at Parquet code - it looks like hooks are already in place to support this. In particular PrimitiveConverter has methods hasDictionarySupport and

Re: GraphX vertex partition/location strategy

2015-01-19 Thread Ankur Dave
No - the vertices are hash-partitioned onto workers independently of the edges. It would be nice for each vertex to be on the worker with the most adjacent edges, but we haven't done this yet since it would add a lot of complexity to avoid load imbalance while reducing the overall communication by
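Hash-partitioning of vertex ids, independent of edge placement, can be sketched as follows (an illustrative model, not GraphX's actual partitioner; `vertex_partition` is a hypothetical name):

```python
# Vertices land on workers purely by a hash of their id; where their
# adjacent edges live plays no role in the assignment.

def vertex_partition(vertex_id, num_partitions):
    # Simple deterministic hash partitioner on the vertex id.
    return vertex_id % num_partitions

num_partitions = 3
vertices = [1, 2, 3, 4, 5, 6]
placement = {v: vertex_partition(v, num_partitions) for v in vertices}
print(placement)  # {1: 1, 2: 2, 3: 0, 4: 1, 5: 2, 6: 0}
```

An edge-aware placement would instead pick, for each vertex, the partition holding most of its adjacent edges; as the reply notes, doing that without creating load imbalance is the hard part.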

Re: GraphX vertex partition/location strategy

2015-01-19 Thread Michael Malak
But wouldn't the gain be greater under something similar to EdgePartition1D (but perhaps better load-balanced based on number of edges for each vertex) and an algorithm that primarily follows edges in the forward direction? From: Ankur Dave ankurd...@gmail.com To: Michael Malak

Re: Semantics of LGTM

2015-01-19 Thread Patrick Wendell
The wiki does not seem to be operational ATM, but I will do this when it is back up. On Mon, Jan 19, 2015 at 12:00 PM, Patrick Wendell pwend...@gmail.com wrote: Okay - so given all this I was going to put the following on the wiki tentatively: ## Reviewing Code Community code review is

Re: Semantics of LGTM

2015-01-19 Thread Patrick Wendell
Okay - so given all this I was going to put the following on the wiki tentatively: ## Reviewing Code Community code review is Spark's fundamental quality assurance process. When reviewing a patch, your goal should be to help streamline the committing process by giving committers confidence this

GraphX vertex partition/location strategy

2015-01-19 Thread Michael Malak
Does GraphX make an effort to co-locate vertices onto the same workers as the majority (or even some) of its edges?