Spark SQL: what does an exclamation mark mean in the plan?

2015-10-19 Thread Xiao Li
Hi, all, After turning on the trace, I saw a strange exclamation mark in the intermediate plans. This happened in the Catalyst analyzer. Join Inner, Some((col1#0 = col1#6)) Project [col1#0,col2#1,col3#2,col2_alias#24,col3#2 AS col3_alias#13] Project [col1#0,col2#1,col3#2,col2#1 AS col2_alias#24]

Guaranteed processing orders of each batch in Spark Streaming

2015-10-19 Thread Renjie Liu
Hi, all: I've read the source code, and it seems there is no guarantee on the order in which each RDD is processed, since jobs are just submitted to a thread pool. I believe this is quite important in streaming, since updates should be ordered.
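
The ordering concern can be illustrated outside Spark. The following is a minimal Python sketch, not Spark Streaming's scheduler: `run_batches`, `process_batch`, the batch ids, and the delays are all hypothetical. It suggests that a pool with a single worker finishes jobs in submission order, while a pool with several workers can finish them out of order when an earlier batch takes longer.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_batches(delays, workers):
    """Submit one job per batch and return batch ids in completion order."""
    completed = []  # list.append is atomic in CPython, so no extra lock here

    def process_batch(batch_id, delay):
        time.sleep(delay)  # stand-in for the work done on one batch
        completed.append(batch_id)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for batch_id, delay in enumerate(delays):
            pool.submit(process_batch, batch_id, delay)
    # Leaving the "with" block waits for all submitted jobs to finish.
    return completed


delays = [0.3, 0.0]  # hypothetical workload: batch 0 is slow, batch 1 is fast
serial_order = run_batches(delays, workers=1)
parallel_order = run_batches(delays, workers=2)
print("one worker: ", serial_order)
print("two workers:", parallel_order)
```

With one worker the second job cannot start until the first has finished, so ordering follows submission. With two workers nothing in the pool itself enforces that ordering.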

Re: Unable to run applications on spark in standalone cluster mode

2015-10-19 Thread Jean-Baptiste Onofré
Hi Rohith, Do you have multiple interfaces on the machine hosting the master? If so, can you try to force it to use the public interface with: sbin/start-master.sh --ip xxx.xxx.xxx.xxx Regards JB On 10/19/2015 02:05 PM, Rohith Parameshwara wrote: Hi all, I am doing some

failed mesos task loses executor

2015-10-19 Thread Adrian Bridgett
Just testing spark v1.5.0 (on mesos v0.23) and we saw something unexpected (according to the event timeline) - when a spark task failed (intermittent S3 connection failure), the whole executor was removed and was never recovered so the job proceeded slower than normal. Looking at the code I

Building Spark w/ 1.8 and binary incompatibilities

2015-10-19 Thread Iulian Dragoș
Hey all, tl;dr; I built Spark with Java 1.8 even though my JAVA_HOME pointed to 1.7. Then it failed with binary incompatibilities. I couldn’t find any mention of this in the docs, so it might be a known thing, but it’s definitely too easy to do the wrong thing. The problem is that Maven is

Re: BUILD SYSTEM: amp-jenkins-worker-05 offline

2015-10-19 Thread Patrick Wendell
This is what I'm looking at: https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/ On Mon, Oct 19, 2015 at 12:58 PM, shane knapp wrote: > all we did was reboot -05 and -03... i'm seeing a bunch of green > builds. could you provide me w/some

Re: BUILD SYSTEM: amp-jenkins-worker-05 offline

2015-10-19 Thread Patrick Wendell
I think many of them are coming from the Spark 1.4 builds: https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/Spark-1.4-Maven-pre-YARN/3900/console On Mon, Oct 19, 2015 at 1:44 PM, Patrick Wendell wrote: > This is what I'm looking at: > > >

Re: BUILD SYSTEM: amp-jenkins-worker-05 offline

2015-10-19 Thread shane knapp
++joshrosen some of those 1.4 builds were incorrectly configured and launching on a reserved executor... josh fixed them and we're looking a lot better (meaning that we're building and not failing at launch). shane On Mon, Oct 19, 2015 at 1:49 PM, Patrick Wendell wrote: >

Re: BUILD SYSTEM: amp-jenkins-worker-05 offline

2015-10-19 Thread shane knapp
all we did was reboot -05 and -03... i'm seeing a bunch of green builds. could you provide me w/some specific failures so i can look into them more closely? On Mon, Oct 19, 2015 at 12:27 PM, Patrick Wendell wrote: > Hey Shane, > > It also appears that every Spark build is

Re: ShuffledHashJoin Possible Issue

2015-10-19 Thread Davies Liu
Can you reproduce it on master? I can't reproduce it with the following code:
>>> t2 = sqlContext.range(50).selectExpr("concat('A', id) as id")
>>> t1 = sqlContext.range(10).selectExpr("concat('A', id) as id")
>>> t1.join(t2).where(t1.id == t2.id).explain()
ShuffledHashJoin [id#21], [id#19],

Re: BUILD SYSTEM: amp-jenkins-worker-05 offline

2015-10-19 Thread shane knapp
worker 05 is back up now... looks like the machine OOMed and needed to be kicked. On Mon, Oct 19, 2015 at 9:39 AM, shane knapp wrote: > i'll have to head down to the colo and see what's up with it... it > seems to be wedged (pings ok, can't ssh in) and i'll update the list

RE: Gradient Descent with large model size

2015-10-19 Thread Ulanov, Alexander
Evan, Joseph, Thank you for the valuable suggestions. It would be great to improve TreeAggregate (if possible). Making fewer updates would certainly make sense, though that would mean using a batch gradient method such as LBFGS. It seems that, as of today, it is the only viable option in Spark. I will also take a
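
For readers unfamiliar with the aggregation pattern mentioned here, a minimal plain-Python sketch of tree-style aggregation follows. It is not Spark's TreeAggregate implementation; `tree_aggregate`, `add_vectors`, and `fan_in` are hypothetical names chosen for illustration. The idea shown is that per-partition partial gradients are merged in levels, so no single step has to combine every partition's vector at once.

```python
from typing import List

Vector = List[float]


def add_vectors(a: Vector, b: Vector) -> Vector:
    """Element-wise sum of two equally sized vectors."""
    return [x + y for x, y in zip(a, b)]


def tree_aggregate(partials: List[Vector], fan_in: int = 2) -> Vector:
    """Sum vectors level by level, merging at most fan_in vectors per group."""
    if not partials:
        raise ValueError("need at least one partial result")
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")

    level = partials
    while len(level) > 1:
        next_level = []
        # Each group of up to fan_in vectors is reduced to a single vector.
        for start in range(0, len(level), fan_in):
            group = level[start:start + fan_in]
            merged = group[0]
            for vec in group[1:]:
                merged = add_vectors(merged, vec)
            next_level.append(merged)
        level = next_level
    return level[0]


# Hypothetical partial gradients, one per partition.
partial_gradients = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
total_gradient = tree_aggregate(partial_gradients, fan_in=2)
print(total_gradient)
```

A larger `fan_in` gives fewer levels but more work per merge step; which trade-off is better depends on the model size and cluster, which this sketch does not model.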

Re: Spark SQL: what does an exclamation mark mean in the plan?

2015-10-19 Thread Michael Armbrust
It means that there is an invalid attribute reference (i.e. a #n where the attribute is missing from the child operator). On Sun, Oct 18, 2015 at 11:38 PM, Xiao Li wrote: > Hi, all, > > After turning on the trace, I saw a strange exclamation mark in > the intermediate
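
The explanation above can be sketched with a toy model. This is a hedged Python illustration, not Catalyst code: `PlanNode`, `missing_input`, `state_prefix`, and `tree_string` are hypothetical names, and treating leaf nodes as always valid is an assumption of this sketch. An operator is printed with a leading "!" when it references an attribute that none of its children output.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class PlanNode:
    """Toy stand-in for a logical plan operator (not Spark's class)."""
    name: str
    references: Set[str]  # attributes this operator reads, e.g. "col3#2"
    output: Set[str]      # attributes this operator produces
    children: List["PlanNode"] = field(default_factory=list)

    def missing_input(self) -> Set[str]:
        # Attributes that are read here but produced by no child.
        available: Set[str] = set()
        for child in self.children:
            available |= child.output
        return self.references - available

    def state_prefix(self) -> str:
        # Assumption of this sketch: leaves have no children to check against.
        if self.children and self.missing_input():
            return "!"
        return ""

    def tree_string(self, depth: int = 0) -> str:
        line = " " * depth + self.state_prefix() + self.name
        return "\n".join([line] + [c.tree_string(depth + 1) for c in self.children])


# Hypothetical plans: the relation does not output col3#2.
relation = PlanNode("Relation", set(), {"col1#0", "col2#1"})
bad_project = PlanNode("Project", {"col1#0", "col3#2"}, {"col1#0", "col3_alias#13"}, [relation])
good_project = PlanNode("Project", {"col1#0"}, {"col1#0"}, [relation])

print(bad_project.tree_string())
print(good_project.tree_string())
```

In the first plan the projection reads `col3#2`, which the relation below it never produces, so the toy printer marks that operator as invalid.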

Re: BUILD SYSTEM: amp-jenkins-worker-05 offline

2015-10-19 Thread Patrick Wendell
Hey Shane, It also appears that every Spark build is failing right now. Could it be related to your changes? - Patrick On Mon, Oct 19, 2015 at 11:13 AM, shane knapp wrote: > worker 05 is back up now... looks like the machine OOMed and needed > to be kicked. > > On Mon,

Re: Spark SQL: what does an exclamation mark mean in the plan?

2015-10-19 Thread Xiao Li
Hi, Michael, Thank you again! Just found the functions that generate the ! mark
/**
 * A prefix string used when printing the plan.
 *
 * We use "!" to indicate an invalid plan, and "'" to indicate an unresolved plan.
 */
protected def statePrefix = if (missingInput.nonEmpty &&

Re: BUILD SYSTEM: amp-jenkins-worker-05 offline

2015-10-19 Thread shane knapp
things are green, nice catch on the job config, josh. On Mon, Oct 19, 2015 at 1:57 PM, shane knapp wrote: > ++joshrosen > > some of those 1.4 builds were incorrectly configured and launching on > a reserved executor... josh fixed them and we're looking a lot better >

Problem using User Defined Predicate pushdown with core RDD and parquet - UDP class not found

2015-10-19 Thread Vladimir Vladimirov
Hi all, I feel like this question is more Spark dev related than Spark user related. Please correct me if I'm wrong. My project's data flow involves sampling records from data stored as a Parquet dataset. I've checked the DataFrames API and it doesn't support user defined predicates projection

Re: Problem building Spark

2015-10-19 Thread Ted Yu
See this thread http://search-hadoop.com/m/q3RTtV3VFNdgNri2=Re+Build+spark+1+5+1+branch+fails > On Oct 19, 2015, at 6:59 PM, Annabel Melongo > wrote: > > I tried to build Spark according to the build directions and the it failed > due to the following error:

Re: Gradient Descent with large model size

2015-10-19 Thread Mike Hynes
Hi Alexander, Joseph, Evan, I just wanted to weigh in with an empirical result that we've had on a standalone cluster with 16 nodes and 256 cores. Typically we run optimization tasks with 256 partitions, i.e. 1 partition per core, and find that performance worsens with more partitions than physical

Problem building Spark

2015-10-19 Thread Annabel Melongo
I tried to build Spark according to the build directions and it failed due to the following error: Building Spark - Spark 1.5.1 Documentation Building Spark Building with build/mvn Building a Runnable Distribution Setting up Maven’s Memory Usage Specifying the

Re: Problem building Spark

2015-10-19 Thread Tathagata Das
Seems to be a heap space issue for Maven. Have you configured Maven's memory according to the instructions on the web page? export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m" On Mon, Oct 19, 2015 at 6:59 PM, Annabel Melongo < melongo_anna...@yahoo.com.invalid> wrote: >