Hi all,
Sometimes you see an executor in Spark die with OutOfMemoryError: Java heap space, and there are many ideas about how to work around it.
My question is: how does an executor execute tasks, in terms of memory usage and parallelism?
The picture in my mind is this:
An executor is a JVM instance. Number
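As far as I understand it, each executor is a single JVM whose heap size is set by spark.executor.memory, and it runs up to spark.executor.cores tasks concurrently, all sharing that one heap. A minimal sketch of the relevant settings (the values are purely illustrative, not recommendations):

    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setAppName("executor-memory-sketch")
            .set("spark.executor.memory", "4g")   # heap of each executor JVM
            .set("spark.executor.cores", "2"))    # tasks running concurrently per executor
    sc = SparkContext(conf=conf)

So an OOM usually means the partitions being processed concurrently (plus cached blocks) do not fit in one executor heap; increasing the number of partitions or the executor memory are the usual knobs.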
Hi guys,
I don't have a clear picture of whether the ordering of an RDD's elements is preserved after operations are applied.
Which operations preserve it? (See the sketch below.)
1) map (yes?)
2) zipWithIndex (yes, or only sometimes?)
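A quick experiment (my own sketch, on a local master) suggests that map keeps elements in their existing partition order, and zipWithIndex assigns indices following that same order; it is the shuffling operations (repartition, groupByKey, ...) that do not preserve it:

    from pyspark import SparkContext

    sc = SparkContext("local[2]", "ordering-sketch")
    rdd = sc.parallelize(range(10), 2)
    print(rdd.map(lambda x: x * x).collect())  # squares, in the original order
    print(rdd.zipWithIndex().collect())        # (element, index) pairs, index follows RDD order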
Serg.
Is there a way to tunnel the Spark UI?
I tried to tunnel client-node:4040, but my browser was redirected from
localhost to some domain name that is only resolvable inside the cluster.
Maybe there is some startup option to make the Spark UI fully accessible
through a single endpoint (address:port)?
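One workaround I know of: instead of a plain port forward, open a SOCKS tunnel, so the browser can also reach the cluster-internal names the UI redirects to (host names here are placeholders):

    ssh -D 9999 user@client-node

Then point the browser's SOCKS proxy at localhost:9999 and open http://client-node:4040. A plain -L forward breaks exactly because the UI issues redirects to hostnames that only resolve inside the cluster.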
Serg.
Hi,
I ran a job on Spark on YARN and it failed.
All I see is an "executor lost" message from YARNClientScheduler, with no
further details.
(I read this error can be related to the spark.yarn.executor.memoryOverhead
setting, and I have already played with that parameter.)
How can I dig into the log files for more detail?
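What usually gets me further is pulling the aggregated container logs from YARN after the application finishes (assuming log aggregation is enabled; the application id below is a placeholder):

    yarn logs -applicationId application_1425460000000_0001

The executors' stderr in there typically says why the container died, e.g. whether YARN killed it for running beyond its memory limits, which is exactly the case where spark.yarn.executor.memoryOverhead matters.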
Hi,
I am trying to vectorize a corpus of texts on a YARN cluster (about 500K
texts in 13 files, 100GB in total) located in HDFS.
This has already taken about 20 hours on a 3-node cluster with 6 cores and
20GB RAM on each node.
In my opinion that's too long :-)
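For reference, the core of such a job is small; everything below, including the HDFS path and the tokenization, is a hypothetical sketch (assuming a Spark version where pyspark.mllib.feature is available), not my actual script:

    from pyspark import SparkContext
    from pyspark.mllib.feature import HashingTF, IDF

    sc = SparkContext(appName="vectorize-corpus")
    # one document per line, tokenized naively on whitespace
    docs = sc.textFile("hdfs:///corpus/*.txt").map(lambda line: line.split())
    tf = HashingTF().transform(docs)  # hashed term-frequency vectors
    tf.cache()                        # IDF fit and transform each pass over the data
    tfidf = IDF().fit(tf).transform(tf)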
I started the task with the following command:
Does map(...) preserve the ordering of the original RDD?
Which persistence level is better if the RDD to be cached is expensive to
recompute?
Am I right that it is MEMORY_AND_DISK?
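If the recomputation really is heavy, then in my understanding yes: with MEMORY_AND_DISK, partitions that do not fit in memory are spilled to local disk and reread, instead of being dropped and recomputed from the lineage as with MEMORY_ONLY. A sketch (path and parsing function are placeholders):

    from pyspark import SparkContext, StorageLevel

    def heavy_parse(line):
        # stand-in for an expensive per-record computation
        return line.split("\t")

    sc = SparkContext(appName="persist-sketch")
    rdd = sc.textFile("hdfs:///data/input").map(heavy_parse)
    rdd.persist(StorageLevel.MEMORY_AND_DISK)
    rdd.count()  # first action materializes the cache

The trade-off is simply whether rereading a spilled partition from local disk is cheaper than recomputing it; for expensive transformations it usually is.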
I have a 30GB gzip file in HDFS (originally a text file where each line
represents one text document), and Spark 1.2.0 on a YARN cluster with 3
worker nodes, each with 64GB RAM and 4 cores.
The replication factor for my file is 3.
I tried to implement a simple pyspark script to parse this file
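One likely culprit, in case it helps: gzip is not a splittable format, so sc.textFile over a single 30GB .gz produces a single partition, and one task decompresses and parses everything alone no matter how many cores the cluster has. Repartitioning right after the read restores parallelism (path and partition count are illustrative):

    from pyspark import SparkContext

    sc = SparkContext(appName="parse-gzip")
    lines = sc.textFile("hdfs:///data/docs.gz")  # a single gzip file => a single partition
    lines = lines.repartition(24)                # fan out before any expensive parsing
    parsed = lines.map(lambda line: line.strip().split("\t"))
    print(parsed.count())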
Has anybody used SVD from MLlib on a very large (say 10^6 x 10^7) sparse
matrix?
How long did it take?
Which implementation of SVD does MLlib use?
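On the implementation question: for large matrices, RowMatrix.computeSVD in MLlib computes only the top k singular values, via ARPACK (an iterative Lanczos-type solver) applied to the Gramian, so the full decomposition is never materialized. In Scala it has been available for a while; the Python binding only arrived in later releases, so the sketch below assumes a considerably newer PySpark than 1.2:

    from pyspark import SparkContext
    from pyspark.mllib.linalg import Vectors
    from pyspark.mllib.linalg.distributed import RowMatrix

    sc = SparkContext(appName="svd-sketch")
    rows = sc.parallelize([Vectors.sparse(5, {0: 1.0, 3: 7.0}),
                           Vectors.sparse(5, {1: 2.0, 4: 6.0}),
                           Vectors.sparse(5, {2: 3.0})])
    svd = RowMatrix(rows).computeSVD(k=2, computeU=True)
    print(svd.s)  # the top k singular values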
Hi!
I downloaded and extracted Spark to a local folder under Windows 7 and have
successfully played with it in the pyspark interactive shell.
BUT
When I try to use spark-submit (for example: spark-submit pi.py) I get:
C:\spark-1.2.1-bin-hadoop2.4\bin\spark-submit.cmd pi.py
Using Spark's default log4j