Assembly settings have an option to exclude jars. You need something
similar to:
assemblyExcludedJars in assembly := (fullClasspath in assembly) map { cp =>
  val excludes = Set("minlog-1.2.jar")
  cp filter { jar => excludes(jar.data.getName) }
}
in your build file (may need to be
Appears to be a problem with boto. Make sure you have boto 2.34 on your
system.
On Wed, Apr 1, 2015 at 11:19 AM, Ganelin, Ilya ilya.gane...@capitalone.com
wrote:
Hi all – I’m trying to bring up a spark ec2 cluster with the script below
and see the following error. Can anyone please advise as
You're probably requesting more instances than allowed by your account, so
the error gets generated for the extra instances. Try launching a smaller
cluster.
On Wed, Apr 1, 2015 at 12:41 PM, Vadim Bichutskiy
vadim.bichuts...@gmail.com wrote:
Hi all,
I just tried launching a Spark cluster on
I don't think it's possible to access directly. What I've done before is to
send the current or next iteration index along with the message, where the
message is a case class.
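Roughly something like this (an untested sketch; the message and state
types are made up for illustration):

import org.apache.spark.graphx._

// Hypothetical message type that carries the sender's iteration index.
case class Msg(iteration: Int, value: Double)

// Hypothetical vertex state that tracks its own iteration counter.
case class VState(iteration: Int, value: Double)

def vprog(id: VertexId, state: VState, msg: Msg): VState =
  VState(msg.iteration, math.max(state.value, msg.value))

def sendMsg(t: EdgeTriplet[VState, Double]): Iterator[(VertexId, Msg)] =
  // Tag each message with the next iteration's index.
  Iterator((t.dstId, Msg(t.srcAttr.iteration + 1, t.srcAttr.value)))

def mergeMsg(a: Msg, b: Msg): Msg = if (a.value > b.value) a else b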
HTH
Dan
On Tue, Feb 3, 2015 at 10:20 AM, Matthew Cornell corn...@cs.umass.edu
wrote:
Hi Folks,
I'm new to GraphX and Scala and my
Can you verify that it's reading the entire file on each worker using
network monitoring stats? If it is, that would be a bug in my opinion.
On Mon, Nov 24, 2014 at 2:06 PM, Nitay Joffe ni...@actioniq.co wrote:
Andrei, Ashish,
To be clear, I don't think it's *counting* the entire file. It
It's not recommended to have multiple Spark contexts in one JVM, but you
could launch a separate JVM per context. How resources get allocated is
probably outside the scope of Spark, and more of a task for the cluster
manager.
On Fri, Nov 14, 2014 at 12:58 PM, Charles charles...@cenx.com wrote:
I
Hello,
I'm attempting to implement a clustering algorithm on top of the Pregel
implementation in GraphX; however, I'm hitting a wall. Ideally, I'd like to
be able to get all edges for a specific vertex, since they factor into the
calculation. My understanding was that the sendMsg function would receive
You should look at how fold is used in Scala in general; that should help. Here
is a blog post that may also give some guidance:
http://blog.madhukaraphatak.com/spark-rdd-fold
The zero value should be your bean, with the 4th field set to the
minimum value. Your fold function should compare the 4th field of the two
values and keep the larger one.
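Something along these lines (a minimal sketch; the bean and field names are
made up):

import org.apache.spark.rdd.RDD

// Hypothetical bean; the 4th field is the one being compared.
case class Record(a: String, b: String, c: String, score: Double)

def maxByScore(records: RDD[Record]): Record = {
  // Zero value: the bean with its 4th field at the minimum possible value.
  val zero = Record("", "", "", Double.MinValue)
  // The fold function compares the 4th field and keeps the larger record.
  records.fold(zero)((x, y) => if (x.score >= y.score) x else y)
}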
You can use the --spark-version argument to spark-ec2 to specify a Git hash
corresponding to the version you want to check out. If you made changes that
are not in the master repository, you can use --spark-git-repo to specify
the Git repository to pull Spark down from, which contains the specified
commit.
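For example (the key pair, repo, and hash here are hypothetical):
`./spark-ec2 -k my-keypair -i my-keypair.pem --spark-git-repo=https://github.com/yourname/spark --spark-version=<git-hash> launch my-cluster`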
How are you launching the cluster, and how are you submitting the job to
it? Can you list any Spark configuration parameters you provide?
On Mon, Oct 20, 2014 at 12:53 PM, Daniel Mahler dmah...@gmail.com wrote:
I am launching EC2 clusters using the spark-ec2 scripts.
My understanding is that
Not directly related, but FWIW, EMR seems to be backing away from s3n usage:
Previously, Amazon EMR used the S3 Native FileSystem with the URI scheme,
s3n. While this still works, we recommend that you use the s3 URI scheme
for the best performance, security, and reliability.
-hdfs/conf/core-site.xml
On Mon, Oct 13, 2014 at 2:56 PM, Ranga sra...@gmail.com wrote:
The cluster is deployed on EC2 and I am trying to access the S3 files from
within a spark-shell session.
On Mon, Oct 13, 2014 at 2:51 PM, Daniil Osipov daniil.osi...@shazam.com
wrote:
So is your cluster
Try using s3n:// instead of s3:// (for the credential configuration as well).
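In a spark-shell session that would look roughly like this (the keys and
bucket are placeholders):

// In spark-shell, sc is already defined. Set the s3n credential
// properties so they match the s3n:// URLs you read from.
sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "YOUR_ACCESS_KEY")
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "YOUR_SECRET_KEY")
val data = sc.textFile("s3n://your-bucket/path/to/file")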
On Tue, Oct 7, 2014 at 9:51 AM, Sunny Khatri sunny.k...@gmail.com wrote:
Not sure if it's supposed to work. Can you try newAPIHadoopFile(), passing
in the required configuration object?
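Roughly like this (a sketch using a plain text input format; substitute
whatever format and properties you actually need):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

// Hypothetical configuration; set whatever properties the format needs.
val conf = new Configuration()

val rdd = sc.newAPIHadoopFile(
  "s3n://your-bucket/path",
  classOf[TextInputFormat],
  classOf[LongWritable],
  classOf[Text],
  conf)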
On Tue, Oct 7, 2014 at 4:20 AM,
In the Spark source folder, execute `sbt/sbt assembly`.
On Thu, Sep 11, 2014 at 8:27 AM, rapelly kartheek kartheek.m...@gmail.com
wrote:
Hi,
Can someone please tell me how to compile the Spark source code so that my
changes take effect? I was trying to ship the jars to all the
Limited memory could also cause you some problems and limit usability. If
you're looking for a local testing environment, Vagrant boxes may serve you
much better.
On Thu, Sep 11, 2014 at 6:18 AM, Chen He airb...@gmail.com wrote:
Pi's bus speed, memory size and access speed, and processing
Try providing the full path to the file you want to write, and make sure the
directory exists and is writable by the Spark process.
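For example (hypothetical path): `rdd.saveAsTextFile("hdfs://namenode:8020/user/you/output")` rather than a bare relative path.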
On Wed, Sep 10, 2014 at 3:46 PM, Arun Luthra arun.lut...@gmail.com wrote:
I have a spark program that worked in local mode, but throws an error in
yarn-client mode on
Depending on what you want to do with the result of the scraping, Spark may
not be the best framework for your use case. Take a look at building a
general Akka application instead.
On Sun, Sep 7, 2014 at 12:15 AM, Sandeep Singh sand...@techaddict.me
wrote:
Hi all,
I am implementing a crawler and scraper. It
Make sure your key pair is configured to access whatever region you're
deploying to - it defaults to us-east-1, but you can provide a custom one
with the --region parameter.
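For example: `./spark-ec2 --region=us-west-2 -k my-west-keypair -i my-west-keypair.pem launch my-cluster` (the key pair name here is hypothetical; the key pair must exist in the target region).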
On Sat, Aug 30, 2014 at 12:53 AM, David Matheson david.j.mathe...@gmail.com
wrote:
I'm following the latest documentation
Hello,
I've been seeing the following errors when trying to save to S3:
Exception in thread "main" org.apache.spark.SparkException: Job aborted due
to stage failure: Task 4058 in stage 2.1 failed 4 times, most recent
failure: Lost task 4058.3 in stage 2.1 (TID 12572,
You could try to use foreachRDD on the result of countByWindow with a
function that performs the save operation.
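Something like this (a minimal sketch; the window sizes and output path are
made up):

import org.apache.spark.streaming.{Seconds, Time}

// Given a DStream called lines (hypothetical), countByWindow returns a
// DStream[Long] with one count per window. It requires checkpointing to
// be enabled (ssc.checkpoint(...)) because it uses an inverse reduce.
val counts = lines.countByWindow(Seconds(30), Seconds(10))

// Save each window's count; the batch time keeps output paths unique.
counts.foreachRDD { (rdd, time: Time) =>
  rdd.saveAsTextFile(s"hdfs:///counts/batch-${time.milliseconds}")
}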
On Fri, Aug 22, 2014 at 1:58 AM, Josh J joshjd...@gmail.com wrote:
Hi,
Hopefully a simple question, though: is there an example of where to save
the output of countByWindow? I
Hello,
My job keeps failing at the saveAsTextFile stage (frustrating after a 3-hour
run) with an OOM exception. The log is below. I'm running the job on an
input of ~8 TB of gzipped JSON files, executing on 15 m3.xlarge instances.
Each executor is given 13 GB of memory, and I'm setting two custom preferences in