Could anyone figure out what is going on in my Spark cluster?
Thanks in advance
Paolo
Sent from my Windows Phone
From: Paolo Platter <paolo.plat...@agilelab.it>
Sent: 06/02/2015 10:48
To: user@spark.apache.org
Subject: spark
Hi,
You can make an image of the EC2 instance with all the Python libraries installed and
create a bash script under /etc/init.d/ that exports PYTHONPATH.
Then you can launch the cluster with this image and ec2.py.
Hope this can be helpful.
Cheers
Gen
On Sun, Feb 8, 2015 at 9:46 AM, Chengi Liu
Hi,
In fact, I met this problem before; it is a bug of AWS. Which type of
machine do you use?
If my guess is right, check the file /etc/fstab. There would be a double
mount of /dev/xvdb.
If yes, you should:
1. stop hdfs
2. umount /dev/xvdb at /
3. restart hdfs
Hope this could be helpful.
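A minimal way to spot the double mount Gen describes, assuming a Linux node where /proc/mounts is readable and /dev/xvdb is the device in question (adjust the device name for your instance type):

    # List every mount point reported for /dev/xvdb; more than one entry
    # indicates the double-mount situation described above.
    def mount_points(device="/dev/xvdb"):
        points = []
        with open("/proc/mounts") as mounts:
            for line in mounts:
                fields = line.split()
                if fields and fields[0] == device:
                    points.append(fields[1])
        return points

    points = mount_points()
    if len(points) > 1:
        print("double mount detected at: " + ", ".join(points))
    else:
        print("mounted at: " + (points[0] if points else "nowhere"))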
://upperpaste.com/spark-1.2.0-bin-hadoop2.4.tgz'
I0208 12:57:45.415575 25720 fetcher.cpp:126] Downloading
'http://upperpaste.com/spark-1.2.0-bin-hadoop2.4.tgz' to
'/local/vdbogert/var/lib/mesos//slaves/20150206-110658-16813322-5050-5515-S1/frameworks/20150208-125721-906005770-5050-32371-
Gen,
Thanks for your information. The content of /etc/fstab at the worker node
(r3.large) is:
#
LABEL=/    /          ext4    defaults,noatime  1 1
tmpfs      /dev/shm   tmpfs   defaults          0 0
devpts     /dev/pts   devpts  gid=5,mode=620    0 0
sysfs      /sys       sysfs
Hi,
In fact, /dev/sdb is /dev/xvdb, so it seems there is no problem with a
double mount. However, there is no information about /mnt2. You should
check whether /dev/sdc is mounted properly or not.
Michael's reply is a good solution to this type of problem; you can
check his suggestion.
Cheers
Gen
Thanks Gen. How can I check whether /dev/sdc is mounted properly or not? In general,
the problem shows up when I submit the second or third job. The first job I
submit will most likely succeed.
Ey-Chih Chow
Date: Sun, 8 Feb 2015 18:18:03 +0100
Subject: Re: no space left at worker node
From:
Thanks Michael. I didn't edit core-site.xml. We use the default one. I only
saw hadoop.tmp.dir in core-site.xml, pointing to /mnt/ephemeral-hdfs. How can
I edit the config file?
Best regards,
Ey-Chih
Date: Sun, 8 Feb 2015 16:51:32 +
From: m_albert...@yahoo.com
To: gen.tan...@gmail.com;
] Downloading
'http://upperpaste.com/spark-1.2.0-bin-hadoop2.4.tgz' to
'/local/vdbogert/var/lib/mesos//slaves/20150206-110658-16813322-5050-5515-S1/frameworks/20150208-125721-906005770-5050-32371-/executors/0/runs/cb525b32-387c-4698-a27e-8d4213080151/spark-1.2.0-bin-hadoop2.4.tgz'
I0208 12:58
/20150206-110658-16813322-5050-5515-S1/frameworks/20150208-125721-906005770-5050-32371-/executors/0/runs/cb525b32-387c-4698-a27e-8d4213080151/spark-1.2.0-bin-hadoop2.4.tgz'
I0208 12:58:09.146960 25720 fetcher.cpp:64] Extracted resource
'/local/vdbogert/var/lib/mesos//slaves/20150206-110658
I think I have this right:
You will run one executor per application per worker. Generally there
is one worker per machine, and it manages all of the machine's
resources. So if you want one app to use this whole machine you need
to ask for 48G and 24 cores. That's better than splitting up the
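For reference, a sketch of how such a request could look programmatically (shown here with PySpark's SparkConf; the master URL and app name are placeholders, and spark.executor.memory / spark.cores.max are the standard standalone settings):

    from pyspark import SparkConf, SparkContext

    # Placeholder master URL; a single executor asks for the whole machine.
    conf = (SparkConf()
            .setMaster("spark://master:7077")
            .setAppName("whole-machine-app")
            .set("spark.executor.memory", "48g")  # all of the worker's 48G
            .set("spark.cores.max", "24"))        # all 24 cores
    sc = SparkContext(conf=conf)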
Hi, I have some questions about how Spark runs jobs concurrently.
For example, I set up Spark on one standalone test box, which has 24 cores
and 64G of memory. I set the worker memory to 48G and the executor memory to 4G,
and use spark-shell to run some jobs. Here is something confusing
I changed the
curGraph = curGraph.outerJoinVertices(curMessages)(
  (vid, vertex, message) =>
    vertex.process(message.getOrElse(List[Message]()), ti)
).cache()
to
curGraph = curGraph.outerJoinVertices(curMessages)(
  (vid, vertex, message) => (vertex,
Hi,
I am sorry that I made a mistake. r3.large has only one SSD, which has been
mounted at /mnt. Therefore there is no /dev/sdc.
In fact, the problem is that there is no space left under the / directory. So
you should check whether your application writes data under this
directory (for instance, save
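One quick, illustrative way to see where the space is actually going is to compare the free space on / and /mnt (the paths are assumptions; os.statvfs is a standard-library call):

    import os

    # On r3.large the single SSD is mounted at /mnt, while the root
    # volume backing / is small, so /mnt may be nearly empty while /
    # fills up.
    for path in ("/", "/mnt"):
        stats = os.statvfs(path)
        free_gb = stats.f_bavail * stats.f_frsize / float(1 << 30)
        print("%-5s %6.1f GB free" % (path, free_gb))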
On Sun, Feb 8, 2015 at 10:26 PM, java8964 java8...@hotmail.com wrote:
standalone one-box environment, if I want to use all 48G of memory allocated to the
worker for my application, I should ask for 48G of memory for the executor in the
spark-shell, right? Because 48G is too big for a JVM heap in the normal case,
Hi Gen,
Thanks. I save my logs in a file under /var/log. This is the only place to
save data. Will the problem go away if I use a better machine?
Best regards,
Ey-Chih Chow
Date: Sun, 8 Feb 2015 23:32:27 +0100
Subject: Re: no space left at worker node
From: gen.tan...@gmail.com
To:
Hi Lian,
Will the latest 0.14.0 version of Hive, which is installed by Ambari 1.7.0 by
default, be supported by the next release of Spark?
Regards,
-- Original --
From: Cheng Lian <lian.cs@gmail.com>
Send time: Friday, Feb 6, 2015 9:02 AM
To:
Hi,
Problem still exists. Any experts would take a look at this?
Thanks,
Sun.
fightf...@163.com
From: fightf...@163.com
Date: 2015-02-06 17:54
To: user; dev
Subject: Sort Shuffle performance issues about using AppendOnlyMap for large
data sets
Hi, all
Recently we had caught performance
Traceback (most recent call last):
  File "pi.py", line 29, in <module>
    sc = SparkContext(appName="PythonPi")
  File "/home/ashish/Downloads/spark-1.1.0-bin-hadoop2.4/python/pyspark/context.py",
       line 104, in __init__
    SparkContext._ensure_initialized(self, gateway=gateway)
  File
Just to add why tunneling is sometimes not a good practice:
there could be other ports/apps depending on other processes running
on different ports. Let's say a web app running on port 8080 pulls info
from other processes through a REST API; that will fail here, since you only
tunnel port 8080.
Hi All,
I wonder if anyone else has some experience building a Gradient Boosted Trees
model using spark/mllib? I have noticed when building decent-size models that
the process slows down over time. We observe that the time to build tree n is
approximately a constant time longer than the time
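For context, a minimal sketch of what such a training run looks like with MLlib's Python API (available in later 1.x releases; the input path and parameters below are made up). Each boosting iteration adds one more tree to the ensemble, which is consistent with iteration n costing a roughly constant amount more than iteration n-1:

    from pyspark import SparkContext
    from pyspark.mllib.regression import LabeledPoint
    from pyspark.mllib.tree import GradientBoostedTrees

    sc = SparkContext(appName="GBTExample")

    # Toy input: one label followed by comma-separated features per line.
    def parse(line):
        values = [float(x) for x in line.split(",")]
        return LabeledPoint(values[0], values[1:])

    data = sc.textFile("data/points.csv").map(parse).cache()

    # Each of the numIterations boosting rounds trains one more tree.
    model = GradientBoostedTrees.trainClassifier(
        data, categoricalFeaturesInfo={}, numIterations=100, maxDepth=3)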
Is there any way we can disable Spark from copying the jar file to the corresponding
directory? I have a fat jar and it is already copied to the worker nodes using the
copydir command. Why does Spark need to save the jar to ./spark/work/appid each
time a job is started?
Ey-Chih Chow
Date: Sun, 8 Feb
Hi, I am very new to both Spark and AWS.
Say I want to install pandas on EC2 (pip install pandas).
How do I create the image with the above library so that it can be used from
pyspark?
Thanks
On Sun, Feb 8, 2015 at 3:03 AM, gen tang gen.tan...@gmail.com wrote:
Hi,
You can make an image of
You can basically add one function call to install the stuff you want. If
you look at the spark-ec2 script, there's a function which does all the
setup, named setup_cluster(..):
https://github.com/apache/spark/blob/master/ec2/spark_ec2.py#L625. Now,
if you want to install a python library (
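Very roughly, the idea could look like the sketch below. This is not the actual spark_ec2.py code: the ssh helper, the node objects, and the package manager are assumptions to adapt to the script and AMI you use.

    # Hypothetical helper to call from setup_cluster(..): pip-install
    # extra Python libraries on every node of the cluster.
    def install_python_libs(master_nodes, slave_nodes, opts, libs=("pandas",)):
        command = "yum install -y python-pip && pip install " + " ".join(libs)
        for node in master_nodes + slave_nodes:
            # An spark_ec2.py-style ssh(host, opts, command) call is assumed here.
            ssh(node.public_dns_name, opts, command)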
Maybe try local: URLs, described under the Advanced Dependency
Management heading here:
https://spark.apache.org/docs/1.1.0/submitting-applications.html
It seems this is what you want. Hope this helps.
Kelvin
On Sun, Feb 8, 2015 at 9:13 PM, ey-chih chow eyc...@hotmail.com wrote:
Is there any way we
I found the problem: for each application, the Spark worker node saves the
corresponding stdout and stderr under ./spark/work/appid, where appid is
the id of the application. If I run several applications in a row, it runs out
of space. In my case, the disk usage under ./spark/work/
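A throwaway script along these lines can show how much each finished application is holding under the worker's work directory (the path below is an assumption; adjust it to your layout). Newer standalone releases also have spark.worker.cleanup.enabled and related TTL settings that purge old application directories automatically, if your version supports them.

    import os

    # Assumed location of the standalone worker's work directory.
    WORK_DIR = "/root/spark/work"

    def dir_size(path):
        # Sum the sizes of all regular files under the given directory.
        total = 0
        for dirpath, _, filenames in os.walk(path):
            for name in filenames:
                full = os.path.join(dirpath, name)
                if os.path.isfile(full):
                    total += os.path.getsize(full)
        return total

    for app_id in sorted(os.listdir(WORK_DIR)):
        size_mb = dir_size(os.path.join(WORK_DIR, app_id)) / 1e6
        print("%s\t%.1f MB" % (app_id, size_mb))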
By the way, the input and output paths of the job are all in S3. I did not
use HDFS paths as input or output.
Best regards,
Ey-Chih Chow
From: eyc...@hotmail.com
To: gen.tan...@gmail.com
CC: user@spark.apache.org
Subject: RE: no space left at worker node
Date: Sun, 8 Feb 2015 14:57:15 -0800