Hi TD,
You can always run two jobs on the same cached RDD, and they can run in
parallel (assuming you launch the two jobs from two different threads).
Is this a correct way to launch jobs from two different threads?
val threadA = new Thread(new Runnable {
  def run() {
    for (i <- 0 until
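For what it's worth, here is a minimal, self-contained sketch of that pattern in plain Scala. The names (`TwoJobs`, `work`) are illustrative, not Spark API; `work()` stands in for the action each thread would call on the shared cached RDD:

```scala
import java.util.concurrent.ConcurrentLinkedQueue

// Two "jobs" launched from two threads. In a real Spark application each
// run() body would invoke an action on the shared cached RDD, e.g.
// cachedRdd.filter(...).count(); here work() is a placeholder for that.
object TwoJobs {
  val results = new ConcurrentLinkedQueue[Long]()

  def work(label: Long): Long = label * 2 // placeholder for a Spark action

  def launch(): Unit = {
    val threadA = new Thread(new Runnable {
      def run(): Unit = results.add(work(1))
    })
    val threadB = new Thread(new Runnable {
      def run(): Unit = results.add(work(2))
    })
    threadA.start(); threadB.start()
    threadA.join(); threadB.join() // wait for both jobs to finish
  }
}
```

As long as each thread only calls actions on the one shared SparkContext, this is the documented way to get concurrent jobs; the scheduler (FIFO or fair) then decides how the jobs share executors.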
Hi,
I just wrote an application that intends to submit its actions (jobs) via
independent threads, keeping in view this point: "Second, within each
Spark application, multiple “jobs” (Spark actions) may be running
concurrently if they were submitted by different threads", mentioned in:
Hi,
I run into a Task not serializable exception with the code below. When I
remove the threads and run, it works, but with threads I run into the Task
not serializable exception.
object SparkKart extends Serializable {
  def parseVector(line: String): Vector[Double] = {
    DenseVector(line.split('
I could trace where the problem is: if I run without any threads, it works
fine. When I allocate threads, I run into the Not serializable problem. But I
need to have threads in my code.
Any help please!
This is my code:
object SparkKart {
  def parseVector(line: String): Vector[Double] = {
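One likely cause, sketched below: a closure built inside an inline Runnable can capture the enclosing, non-serializable thread object. Keeping the functions in a standalone serializable object avoids that. The `isSerializable` helper here is illustrative (a rough stand-in for the closure check Spark performs before shipping a task), not a Spark API:

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Helpers live in a top-level object, so closures referring to them
// capture no surrounding thread/runnable state.
object KartHelpers extends Serializable {
  def parseVector(line: String): Array[Double] =
    line.split(' ').map(_.toDouble)

  // Rough stand-in for the serializability check Spark does on closures.
  def isSerializable(f: AnyRef): Boolean =
    try { new ObjectOutputStream(new ByteArrayOutputStream).writeObject(f); true }
    catch { case _: NotSerializableException => false }

  // Eta-expanding an object method yields a closure with no outer capture.
  val parse: String => Array[Double] = parseVector
}
```

By contrast, a lambda that reads a field of a non-serializable enclosing class (such as an anonymous Runnable) drags that whole instance into the closure and fails the same check.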
Hi,
I have a file containing data in the following way:
0.0 0.0 0.0
0.1 0.1 0.1
0.2 0.2 0.2
9.0 9.0 9.0
9.1 9.1 9.1
9.2 9.2 9.2
Now I do the following:
val kPoints = data.takeSample(withReplacement = false, 4, 42).toArray
val thread1 = new Thread(new Runnable {
  def run() {
Are you replicating any RDDs?
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/java-io-IOException-Filesystem-closed-tp20150p21749.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Hi,
I have an HDFS file of size 598 MB. I create an RDD over this file and cache
it in RAM on a 7-node cluster with 2 GB RAM each. I find that each partition
gets replicated three or even four times in the cluster, even without me
specifying it in code. Total partitions are 5 for the RDD created but cached
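For comparison, this is how replication is requested explicitly. It is a non-runnable fragment: it assumes an existing SparkContext `sc`, and the HDFS path is illustrative.

```scala
import org.apache.spark.storage.StorageLevel

val data = sc.textFile("hdfs://namenode:9000/user/karthik/points.txt")
// MEMORY_ONLY_2 keeps two in-memory copies of each partition; the default
// cache()/MEMORY_ONLY keeps exactly one. So extra copies appearing without
// being requested usually reflect something else, e.g. recomputation after
// failed or retried tasks rather than storage-level replication.
data.persist(StorageLevel.MEMORY_ONLY_2)
```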
Hi,
My Spark cluster contains heterogeneous machines: Pentium 4, dual-core, and
quad-core. I am trying to run a character frequency count application. The
application contains several threads, each submitting a job (action) that
counts the frequency of a single character. But my problem is, I get
Hi,
Can someone please suggest some real-life application implemented in Spark
(things like gene sequencing) that is of the type of the code below. Basically,
the application should have jobs submitted via as many threads as possible. I
need a similar kind of Spark application for benchmarking.
val
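The character-frequency pattern described above can be sketched in plain Scala as below, with one thread per character. The function name and structure are illustrative; in the Spark version, each thread's body would instead submit an action such as `lines.flatMap(_.toSeq).filter(_ == c).count()`:

```scala
import java.util.concurrent.ConcurrentHashMap
import scala.jdk.CollectionConverters._

object CharFreq {
  // One thread per character; each thread's body is one "job".
  def charFrequencies(text: String, chars: Seq[Char]): Map[Char, Long] = {
    val counts = new ConcurrentHashMap[Char, Long]()
    val threads = chars.map { c =>
      new Thread(new Runnable {
        // In Spark this would be an action on a shared cached RDD.
        def run(): Unit = counts.put(c, text.count(_ == c).toLong)
      })
    }
    threads.foreach(_.start())
    threads.foreach(_.join())
    counts.asScala.toMap
  }
}
```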
Hi,
I have this doubt: Assume that an rdd is stored across multiple nodes and
one of the nodes fails. So, a partition is lost. Now, I know that when this
node is back, it uses the lineage from its neighbours and recomputes that
partition alone.
1) How does it get the source data (original data
Hi,
I keep facing this error when I run my application:
java.io.IOException: Connection from s1/:43741 closed
at
When I increase the executor memory size, it runs smoothly without any
errors.
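For reference, the setting being increased can be raised at submit time like this (the class and jar names are illustrative, and the sizes are examples rather than recommendations):

```
spark-submit --executor-memory 2g --driver-memory 1g \
  --class SparkKart spark-kart.jar
```

The same value can also be set in code with `new SparkConf().set("spark.executor.memory", "2g")` before the SparkContext is created.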
On Sat, Jan 24, 2015 at 9:29 PM, Rapelly Kartheek kartheek.m...@gmail.com
wrote:
Hi,
While running spark application, I get the following Exception leading to
several failed stages.
Exception in thread
Hi,
While running spark application, I get the following Exception leading to
several failed stages.
Exception in thread Thread-46 org.apache.spark.SparkException: Job
aborted due to stage failure: Task 0 in stage 11.0 failed 4 times, most
recent failure: Lost task 0.3 in stage 11.0 (TID 262,
-- Forwarded message --
From: Rapelly Kartheek kartheek.m...@gmail.com
Date: Mon, Jan 19, 2015 at 3:03 PM
Subject: UnknownhostException : home
To: user@spark.apache.org
Hi,
I get the following exception when I run my application:
Hi,
This is what I am trying to do:
karthik@s4:~/spark-1.2.0$ SPARK_HADOOP_VERSION=2.3.0 sbt/sbt clean
Using /usr/lib/jvm/java-7-oracle as default JAVA_HOME.
Note, this will be overridden by -java-home if it is set.
[info] Loading project definition from
/home/karthik/spark-1.2.0/project/project
Hi,
I get the following error when I build spark-1.2.0 using sbt:
[error] Nonzero exit code (128): git clone
https://github.com/ScrapCodes/sbt-pom-reader.git
/home/karthik/.sbt/0.13/staging/ad8e8574a5bcb2d22d23/sbt-pom-reader
[error] Use 'last' for the full log.
Any help please?
Thanks
--
The problem is that my network is not able to access github.com for cloning
some dependencies, as GitHub is blocked in India. What are the possible
workarounds for this problem?
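Two common git-level workarounds can be sketched as below. The proxy and mirror hostnames are placeholders, not known-good endpoints; substitute ones you can actually reach:

```
# Option 1: route git traffic through an HTTPS proxy you can access
git config --global https.proxy http://myproxy.example.com:3128

# Option 2: transparently rewrite github.com URLs to a reachable mirror
git config --global url."https://mymirror.example.com/".insteadOf "https://github.com/"
```

With the second option, sbt's `git clone https://github.com/ScrapCodes/sbt-pom-reader.git` would be fetched from the mirror without changing the build definition.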
Thank you!
On Sun, Jan 4, 2015 at 9:45 PM, Rapelly Kartheek kartheek.m...@gmail.com
wrote:
Hi,
I get the following
Hi Sean,
I tried even with sc as: sc.parallelize(data). But I get the error: value
sc not found.
On Sun, Oct 12, 2014 at 1:47 PM, sowen [via Apache Spark User List]
ml-node+s1001560n16233...@n3.nabble.com wrote:
It is a method of the class, not a static method of the object. Since a
Does SparkContext exists when this part (AskDriverWithReply()) of the
scheduler code gets executed?
On Sun, Oct 12, 2014 at 1:54 PM, rapelly kartheek kartheek.m...@gmail.com
wrote:
Hi Sean,
I tried even with sc as: sc.parallelize(data). But I get the error: value
sc not found.
On Sun, Oct
Thank you yuanbosoft.
Thank you Raymond and Tobias.
Yeah, I am very clear about what I was asking: I was talking about the
replicated RDD only. Now that I've got my understanding about job and
application validated, I wanted to know if we can replicate an RDD and run
two jobs (that need the same RDD) of an application in
Thank you so much for the link, Sujeet.
regards
Karthik
Thank you Andrew for the updated link.
regards
Karthik