You will need to change the sbt version to 0.13.2. I think Spark 0.9.1 was
released with sbt 0.13? In case not, it may not work with Java 8. Just
wait for the 1.0 release or give the 1.0 release candidate a try!
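(For reference, sbt normally reads its version from project/build.properties,
so the change would look something like the line below; whether 0.13.2 is the
exact version needed for Java 8 is an assumption:)

    sbt.version=0.13.2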
Thanks for your quick reply.
I tried with a fresh installation, but it downloads sbt 0.12.4 only (please
check the logs below). So it is not working. Can you tell me where this 1.0
release candidate is located so I can try it?
dhcp-173-39-68-28:spark-0.9.1 neravi$ ./sbt/sbt assembly
Attempting to fetch sbt
I have pasted the link in my previous post.
Prashant Sharma
On Fri, May 2, 2014 at 4:15 PM, N.Venkata Naga Ravi nvn_r...@hotmail.com wrote:
Thanks for your quick reply.
I tried with a fresh installation, but it downloads sbt 0.12.4 only (please
check the logs below). So it is not working. Can you
Not sure why applying concat to reference.conf didn't work for you. Since
it simply concatenates the files, the key akka.version should be preserved.
We had the same situation for a while without issues.
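For anyone who lands here later, a minimal sketch of the setting being
discussed (sbt-assembly 0.x syntax; adjust for your plugin version):

    import sbtassembly.Plugin._
    import AssemblyKeys._

    assemblySettings

    // concatenate all reference.conf files into one, so keys like
    // akka.version survive the assembly merge instead of being discarded
    mergeStrategy in assembly <<= (mergeStrategy in assembly) { old =>
      {
        case "reference.conf" => MergeStrategy.concat
        case x                => old(x)
      }
    }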
On May 1, 2014 8:46 PM, Shivani Rao raoshiv...@gmail.com wrote:
Hello Koert,
That did not
Spark would be much faster with process_local than node_local.
Node_local references data on the local hard disk, while process_local
references data already in memory in the same executor process.
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi https://twitter.com/mayur_rustagi
On Tue,
Anyone have any guidance on using a broadcast variable to ship data to
workers vs. an RDD?
Like, say I'm joining web logs in an RDD with user account data. I could
keep the account data in an RDD or, if it's small, a broadcast variable
instead. How small is small? Small enough that I know it
I'd like to be corrected on this, but I am just trying to say small enough:
on the order of a few hundred MBs. Bear in mind the data gets shipped to all
nodes, so it can be a GB but not several GBs, and then it depends on the
network too.
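As a rough sketch of the trade-off (all names here are made up for
illustration):

    // hypothetical map-side join: ship the small account table to every
    // executor once via a broadcast variable instead of shuffling two RDDs
    val accounts: Map[String, Account] = loadAccounts() // small enough for memory
    val accountsBc = sc.broadcast(accounts)

    val joined = webLogs.map { log =>
      // local hash lookup on each worker, no shuffle involved
      (log, accountsBc.value.get(log.userId))
    }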
Prashant Sharma
On Fri, May 2, 2014 at 6:42 PM, Diana Carroll
Thanks Prashant. The 1.0 RC version is working fine on my system.
Let me explore further and get back to you.
Thanks Again,
Ravi
From: scrapco...@gmail.com
Date: Fri, 2 May 2014 16:22:40 +0530
Subject: Re: Apache Spark is not building in Mac/Java 8
To: user@spark.apache.org
I have pasted the link
Great reference! I just skimmed through the results without reading
much of the methodology - but it looks like Spark outperforms
Stratosphere fairly consistently in the experiments. It's too bad the
data sources only range from 2GB to 8GB. Who knows if the apparent
pattern would extend out
looks like Spark outperforms Stratosphere fairly consistently in the
experiments
There was one exception the paper noted, which was when memory resources
were constrained. In that case, Stratosphere seemed to degrade more
gracefully than Spark, but the author did not explore it further.
Thank you very much. Making the trait serializable worked.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Task-not-serializable-collect-take-tp5193p5236.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Hi All,
I encountered this problem when the firewall is enabled between the spark-shell
and the Workers.
When I launch spark-shell in yarn-client mode, I notice that Workers on the
YARN containers are trying to talk to the driver (spark-shell), however, the
firewall is not opened and caused
Deenar,
I haven't heard of any activity to do partitioning in that way, but it does
seem more broadly valuable.
On Fri, May 2, 2014 at 10:15 AM, deenar.toraskar deenar.toras...@db.com wrote:
I have equal sized partitions now, but I want the RDD to be partitioned
such
that the partitions are
What is the most efficient way to get an RDD of GraphX vertices and their
connected edges? Initially I thought I could use mapReduceTriplets, but I
realized that would neglect vertices that aren't connected to anything.
Would I have to do a mapReduceTriplets and then do a join with all of the
vertices to
Hello Stephen,
My goal was to run spark on a cluster that already had spark and hadoop
installed. So the right thing to do was to remove these dependencies in my
spark build. I wrote a
blog
http://myresearchdiaries.blogspot.com/2014/05/building-apache-spark-jars.html
about it so that it might
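(In case the gist helps: the idea is to mark those artifacts as "provided"
in the build, shown here as a hypothetical build.sbt fragment with
illustrative versions:)

    // mark Spark and Hadoop as "provided" so the assembled jar does not
    // bundle classes the cluster already supplies
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "0.9.1" % "provided",
      "org.apache.hadoop" % "hadoop-client" % "2.2.0" % "provided"
    )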
I think what you want to do is set spark.driver.port to a fixed port.
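Something along these lines (a sketch; spark.driver.port is a real property,
the port number and app name are arbitrary):

    import org.apache.spark.{SparkConf, SparkContext}

    // pin the driver's listening port so a static firewall rule can cover it
    val conf = new SparkConf()
      .setAppName("my-app")               // illustrative name
      .set("spark.driver.port", "51000")  // any fixed, open port
    val sc = new SparkContext(conf)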
On Fri, May 2, 2014 at 1:52 PM, Andrew Lee alee...@hotmail.com wrote:
Hi All,
I encountered this problem when the firewall is enabled between the
spark-shell and the Workers.
When I launch spark-shell in yarn-client
I have mucked around this a little bit. The first step to make this happen
is to build a fat jar. I wrote a quick
bloghttp://myresearchdiaries.blogspot.com/2014/05/building-apache-spark-jars.htmldocumenting
my learning curve w.r.t that.
The next step is to schedule this as a java action. Since
Could be a bug. Can you share code and data that I can use to reproduce
this?
TD
On May 2, 2014 9:49 AM, Adrian Mocanu amoc...@verticalscope.com wrote:
Has anyone else noticed that *sometimes* the same tuple calls the update
state function twice?
I have 2 tuples with the same key in 1 RDD
I have opened a PR for discussion on the apache/spark repository
https://github.com/apache/spark/pull/620
There is certainly a classLoader problem in the way Mesos and Spark
operate; I'm not sure what caused it to suddenly stop working, so I'd like
to open the discussion there
Hi Yana,
I did. I configured the port in spark-env.sh; the problem is not the driver
port, which is fixed. It's the Workers' ports that are dynamic every time
they are launched in the YARN container. :-(
Any idea how to restrict the Workers' port range?
Date: Fri, 2 May 2014 14:49:23
If you end up with a really long dependency tree between RDDs (like 100+),
people have reported success using the .checkpoint() method. This
computes the RDD and then saves it, flattening the dependency tree. It
turns out that having a really long RDD dependency graph causes
serialization
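A minimal sketch of that pattern (the directory, names, and iteration count
are illustrative):

    // flatten a long lineage: checkpoint materializes the RDD to reliable
    // storage and truncates its dependency graph
    sc.setCheckpointDir("hdfs:///tmp/spark-checkpoints")

    var rdd = initialRdd                  // assumed: some RDD built earlier
    for (i <- 1 to 200) {
      rdd = rdd.map(step)                 // lineage grows by one per iteration
      if (i % 50 == 0) {
        rdd.checkpoint()                  // mark for checkpointing
        rdd.count()                       // force computation so it actually saves
      }
    }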
We have a spark server already running. When spark-shell is invoked, it
attempts to start a new HTTP server:
spark.HttpServer: Starting HTTP Server
But that attempt results in a BindException due to the preexisting
server:
java.net.BindException: Address already in use
What is the
Howdy Andrew,
I think I am running into the same issue [1] as you. It appears that Spark
opens up dynamic / ephemeral [2] ports for each job on the shell and the
workers. As you are finding out, this makes securing and managing the
network for Spark very difficult.
Any idea how to restrict
Do you mean you want to obtain a list of adjacent edges for every vertex? A
mapReduceTriplets followed by a join is the right way to do this. The join
will be cheap because the original and derived vertices will share indices.
There's a built-in function to do this for neighboring vertex
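A rough sketch of the mapReduceTriplets-plus-join route against the 0.9-era
API (assuming a graph: Graph[String, Int]):

    import org.apache.spark.graphx._

    // send each edge to both of its endpoints, then left-join with the
    // vertex set so vertices with no edges still appear (with an empty list)
    val adjacent = graph.mapReduceTriplets[List[Edge[Int]]](
      t => Iterator(
        (t.srcId, List(Edge(t.srcId, t.dstId, t.attr))),
        (t.dstId, List(Edge(t.srcId, t.dstId, t.attr)))),
      (a, b) => a ++ b)

    val verticesWithEdges = graph.vertices.leftJoin(adjacent) {
      (id, attr, edges) => (attr, edges.getOrElse(Nil))
    }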
Hi, I tried to build a docker image for Spark 0.9.1 but get the following
error.
Has anyone had experience resolving this issue?
The following packages have unmet dependencies:
tzdata-java : Depends: tzdata (= 2012b-1) but 2013g-0ubuntu0.12.04 is to
be installed
E: Unable to correct problems, you
Yes, the docker script is there inside the spark source package. It already
specifies that the master and worker run in different docker containers.
Mainly it is used for easy deployment and development in my scenario.
On Fri, May 2, 2014 at 2:30 PM, Nicholas Chammas
We’ve had some pretty awesome presentations at the Seattle Spark Meetup - here
are the links to the various slides:
Seattle Spark Meetup KickOff with DataBricks | Introduction to Spark with Matei
Zaharia and Pat McDonough
Learnings from Running Spark at Twitter sessions
Ben Hindman’s Mesos
I am using Spark 0.9.1 in standalone mode. In the
SPARK_HOME/examples/src/main/scala/org/apache/spark/ folder, I created a
directory called mycode in which I have placed some standalone scala code.
I was able to compile. I ran the code using:
./bin/run-example org.apache.spark.mycode.MyClass
ok, we figured it out. It is a bit weird, but for some reason, the
YARN_CONF_DIR and HADOOP_CONF_DIR did not propagate out. We do see it in
the build classpath, but the remote machines don't seem to get it. So we
added:
export SPARK_YARN_USER_ENV=CLASSPATH=/hadoop/var/hadoop/conf/
and it seems
You can drop the header in a csv by:

rddData.mapPartitionsWithIndex { (partitionIdx: Int, lines: Iterator[String]) =>
  // only the first partition contains the header line
  if (partitionIdx == 0) lines.drop(1) else lines
}
On May 2, 2014 6:02 PM, SK skrishna...@gmail.com wrote:
1) I have a csv file where one of the fields has integer data but it
Hi,
Hi,
Let's say I have millions of binary format files... Let's say I have this
java (or python) library which reads and parses these binary formatted
Say
import foo
f = foo.open(filename)
header = f.get_header()
and some other methods..
What I was thinking was to write a hadoop input
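One cheap sketch while you work that out: parallelize the file paths and
call the library inside a map (Foo and listBinaryFiles stand in for your
parser and file listing, and the paths must be readable from every worker):

    // hypothetical: shard the list of paths across the cluster and parse
    // each file locally with the existing library
    val paths: Seq[String] = listBinaryFiles()   // assumed helper
    val headers = sc.parallelize(paths, 64).map { path =>
      val f = Foo.open(path)                     // stand-in for the real library
      try f.getHeader() finally f.close()
    }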