Hi,
using pyspark 1.1.0 on YARN 2.5.0. all operations run nicely in parallel - I
can see multiple python processes spawned on each nodemanager, but for some
reason when running cartesian there is only a single python process running on
each node, even though the task indicates thousands of partitions - why?
thanks, Antony.
On Monday, 22 December 2014, 20:09, Ted Yu yuzhih...@gmail.com wrote:
Which HBase version are you using ?
Can you show the full stack trace ?
Cheers
On Mon, Dec 22, 2014 at 11:02 AM, Antony Mayi antonym...@yahoo.com.invalid
wrote:
Hi,
can anyone please give me
Hi,
have been using this without any issues with spark 1.1.0, but after upgrading to
1.2.0 saving an RDD from pyspark using saveAsNewAPIHadoopDataset into HBase just
hangs - even when testing with the example from the stock hbase_outputformat.py.
is anyone having the same issue? (and able to solve it?)
and pastebin it ?
Thanks
On Wed, Dec 24, 2014 at 4:49 AM, Antony Mayi antonym...@yahoo.com.invalid
wrote:
Hi,
have been using this without any issues with spark 1.1.0, but after upgrading to
1.2.0 saving an RDD from pyspark using saveAsNewAPIHadoopDataset into HBase just
hangs - even when testing
)
- at org.apache.hadoop.util.Shell.runCommand(Shell.java:524)
On Wed, Dec 24, 2014 at 3:11 PM, Antony Mayi antonym...@yahoo.com wrote:
this is it (jstack of particular yarn container) - http://pastebin.com/eAdiUYKK
thanks, Antony.
On Wednesday, 24 December 2014, 16:34, Ted Yu yuzhih...@gmail.com
24, 2014 at 4:11 PM, Antony Mayi antonym...@yahoo.com wrote:
I just ran it by hand from the pyspark shell. here are the steps:
pyspark --jars
/usr/lib/spark/lib/spark-examples-1.2.0-cdh5.3.0-hadoop2.5.0-cdh5.3.0.jar
conf = {"hbase.zookeeper.quorum": "localhost",
...         "hbase.mapred.outputtable": "test",
column=f1:asd, timestamp=1419463092904, value=456
testkey column=f1:testqual, timestamp=1419487275905, value=testval
2 row(s) in 0.0270 seconds
On Thursday, 25 December 2014, 6:58, Antony Mayi antonym
Hi,
I am running spark 1.1.0 on yarn. I have a custom set of modules installed under
the same location on each executor node and am wondering how I can pass the
executors the PYTHONPATH so that they can use the modules.
I've tried this:
spark-env.sh: export PYTHONPATH=/tmp/test/
ok, I see now what's happening - pkg.mod.test is serialized by reference
and there is nothing actually trying to import pkg.mod on the executors, so the
reference is broken.
so how can I get pkg.mod imported on the executors?
thanks, Antony.
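For the record, a sketch of the two usual ways to get pkg.mod importable on the executors - either shipping the package with the job, or pointing the executor python at the shared location via spark.executorEnv (paths illustrative; on YARN the latter may interact with what spark itself puts on the PYTHONPATH):

```shell
# Option 1: ship the package with the job (zip it first)
zip -r pkg.zip pkg/
spark-submit --py-files pkg.zip app.py

# Option 2: the modules already sit under the same path on every node,
# so point the executors' python at that path (illustrative)
spark-submit --conf "spark.executorEnv.PYTHONPATH=/tmp/test/" app.py
```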
On Friday, 2 January 2015, 13:49, Antony
Hi,
is it possible to mix hosts with (significantly) different specs within a
cluster (without wasting the extra resources)? for example, having 10 nodes with
36GB RAM/10CPUs and now trying to add 3 hosts with 128GB/10CPUs - is there a way
for the spark executors to utilize the extra memory (as my
is uniform across the cluster or for the machine where the
config file resides.)
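On the YARN side at least, the per-node capacity does not have to be uniform - each node advertises its own limits from its own yarn-site.xml. A sketch for one of the 128GB hosts (property names are standard YARN; the values are illustrative):

```xml
<!-- yarn-site.xml on a 128GB/10CPU node; values illustrative -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>122880</value> <!-- ~120GB offered to containers on this node -->
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>10</value>
</property>
```

Whether spark executors (sized uniformly per application) can actually soak up the extra room is the separate question raised above.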
On Mon Jan 26 2015 at 8:07:51 AM Antony Mayi antonym...@yahoo.com.invalid
wrote:
Hi,
is it possible to mix hosts with (significantly) different specs within a
cluster (without wasting the extra resources)? for example
Hi,
is there a way to use a custom python module that is available to all executors
under PYTHONPATH (without needing to upload it using sc.addPyFile())? it's a bit
weird that this module is on all nodes yet the spark tasks can't use it (references
to its objects are serialized and sent to all executors
Hi,
I believe this is some kind of timeout problem but can't figure out how to
increase it.
I am running spark 1.2.0 on yarn (all from cdh 5.3.0). I submit a python task
which first loads a big RDD from hbase - I can see in the screen output all
executors fire up, then there is no more logging output for
.
is there anything more I can do?
thanks, Antony.
On Monday, 12 January 2015, 8:21, Antony Mayi antonym...@yahoo.com wrote:
this seems to have sorted it, awesome, thanks for the great help. Antony.
On Sunday, 11 January 2015, 13:02, Sean Owen so...@cloudera.com wrote:
I would expect
have a look at
http://spark.apache.org/docs/latest/running-on-yarn.html
... --executor-memory 22g --conf
spark.yarn.executor.memoryOverhead=2g ... should do it, off the top
of my head. That should reserve 24g from YARN.
On Sat, Jan 17, 2015 at 5:29 AM, Antony Mayi antonym...@yahoo.com wrote
Hi,
when getting the model out of ALS.train it would be beneficial to store it (to
disk) so the model can be reused later for any following predictions. I am
using pyspark and I had no luck pickling it, either using the standard pickle
module or even dill.
does anyone have a solution for this (note
Hi,
I have 4 very powerful boxes (256GB RAM, 32 cores each). I am running spark
1.2.0 in yarn-client mode with the following layout:
spark.executor.cores=4
spark.executor.memory=28G
spark.yarn.executor.memoryOverhead=4096
I am submitting a bigger ALS trainImplicit task (rank=100, iters=15) on a
= 128GB
of RAM, not 1TB.
It still feels like this shouldn't be running out of memory - not by a
long shot. But I am just pointing out potential differences between
what you are expecting and what you are configuring.
On Thu, Feb 19, 2015 at 9:56 AM, Antony Mayi
antonym...@yahoo.com.invalid wrote:
Hi
rather
than split up your memory, but at some point it becomes
counter-productive. 32GB is a fine executor size.
So you have ~8GB available per task which seems like plenty. Something
else is at work here. Is this error from your code's stages or ALS?
On Thu, Feb 19, 2015 at 10:07 AM, Antony Mayi
container_1424204221358_0013_01_08 transitioned from RUNNING to
EXITED_WITH_FAILURE
2015-02-19 12:08:14,455 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
Cleaning up container container_1424204221358_0013_01_08
Antony.
On Thursday, 19 February 2015, 11:54, Antony Mayi
help you get rid of files
in the workers that are not needed.
TD
On Mon, Feb 16, 2015 at 12:27 AM, Antony Mayi antonym...@yahoo.com.invalid
wrote:
spark.cleaner.ttl is not the right way - it seems to be really designed for
streaming. although it keeps the disk usage under control, it also causes loss
The matrix model has two RDD vectors representing the decomposed matrix.
You can save these to disk and re-use them.
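A plain-python sketch of that workaround - persist the two factor sets and rebuild predictions from dot products. The dicts below are hypothetical stand-ins for the collected (id, vector) pairs that model.userFeatures() / model.productFeatures() would yield (driver-side, viable for small models only; for big ones, RDD.saveAsPickleFile on the factor RDDs would be the analogous route):

```python
import os
import pickle
import tempfile

# Hypothetical stand-ins for the collected (id, vector) pairs from
# model.userFeatures() / model.productFeatures() after ALS training.
user_features = {1: [0.5, 1.0], 2: [1.5, -0.5]}
product_features = {10: [2.0, 0.0], 20: [0.2, 0.8]}

# persist both factor sets to disk...
path = os.path.join(tempfile.mkdtemp(), "als_factors.pkl")
with open(path, "wb") as f:
    pickle.dump((user_features, product_features), f)

# ...and later reload them for predictions without retraining
with open(path, "rb") as f:
    users, products = pickle.load(f)

def predict(user_id, product_id):
    """Rating estimate = dot product of the two factor vectors."""
    return sum(u * p for u, p in zip(users[user_id], products[product_id]))

print(predict(1, 10))  # 0.5*2.0 + 1.0*0.0 = 1.0
```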
On Thu, Feb 19, 2015 at 2:19 AM Antony Mayi antonym...@yahoo.com.invalid
wrote:
Hi,
when getting the model out of ALS.train it would be beneficial to store it (to
disk) so the model
Hi,
I am using spark.yarn.executor.memoryOverhead=8192 yet my executors keep
crashing with this error.
does that mean I genuinely don't have enough RAM or is this a matter of config
tuning?
other config options used: spark.storage.memoryFraction=0.3,
SPARK_EXECUTOR_MEMORY=14G
running spark 1.2.0 as
Cc: Sandy Ryza sandy.r...@cloudera.com, Antony Mayi antonym...@yahoo.com,
user@spark.apache.org user@spark.apache.org
Subject: Re: java.lang.OutOfMemoryError: GC overhead limit exceeded
Since it's an executor running OOM, it doesn't look like a container being
killed by YARN to me.
Hi,
When running a big mapreduce operation with pyspark (in this particular case
using a lot of sets and set operations in the map tasks, so likely allocating
and freeing loads of pages) I eventually get the kernel error 'python: page
allocation failure: order:10, mode:0x2000d0' plus very
the memory requirements seem to be growing rapidly when using a higher rank... I
am unable to get over 20 without running out of memory. is this
expected? thanks, Antony.
container.
thanks for any ideas, Antony.
On Saturday, 10 January 2015, 10:11, Antony Mayi antonym...@yahoo.com
wrote:
the memory requirements seem to be growing rapidly when using a higher rank... I
am unable to get over 20 without running out of memory. is this
expected? thanks
input and parameters?
thanks, Antony.
On Saturday, 10 January 2015, 10:47, Antony Mayi antonym...@yahoo.com
wrote:
the actual case looks like this:
* spark 1.1.0 on yarn (cdh 5.2.1)
* ~8-10 executors, 36GB phys RAM per host
* input RDD is roughly 3GB containing ~150-200M items
) You can try
increasing it to a couple GB.
On Sun, Jan 11, 2015 at 9:43 AM, Antony Mayi
antonym...@yahoo.com.invalid wrote:
the question really is whether it is expected that the memory requirements
grow rapidly with the rank... as I would expect memory to be rather O(1)
problem with dependency
Hi,
I am using newAPIHadoopRDD to load an RDD from hbase (using pyspark running as
yarn-client) - pretty much the standard case demonstrated in
hbase_inputformat.py from the examples... the thing is that when trying the very
same code on spark 1.2 I am getting the error below which, based on
Hi,
running spark 1.1.0 in yarn-client mode (cdh 5.2.1) on a XEN-based cloud and
randomly getting my executors failing on errors like the one below. I suspect it
is some cloud networking issue (XEN driver bug?) but am wondering if there is
any spark/yarn workaround that I could use to mitigate it?
Hi,
I am running a bigger ALS job on spark 1.2.0 on yarn (cdh 5.3.0) - ALS is using
about 3 billion ratings and I am doing several trainImplicit() runs in a loop
within one spark session. I have a four-node cluster with 3TB of disk space on
each. before starting the job there is less than 8% of the disk
spark.cleaner.ttl ?
On Sunday, 15 February 2015, 18:23, Antony Mayi antonym...@yahoo.com
wrote:
Hi,
I am running a bigger ALS job on spark 1.2.0 on yarn (cdh 5.3.0) - ALS is using
about 3 billion ratings and I am doing several trainImplicit() runs in a loop
within one spark session
experience might be able to give more useful directions about how
to fix that.
On Wed, Jan 7, 2015 at 1:46 PM, Antony Mayi antonym...@yahoo.com wrote:
this is the official cloudera-compiled stack cdh 5.3.0 - nothing has been done
by me and I presume they are pretty good at building it so I still
/ Hadoop provided by the cluster. It could
also be that you're pairing Spark compiled for Hadoop 1.x with a 2.x cluster.
On Wed, Jan 7, 2015 at 9:38 AM, Antony Mayi antonym...@yahoo.com.invalid
wrote:
Hi,
I am using newAPIHadoopRDD to load RDD from hbase (using pyspark running as
yarn-client
19, 2015 at 5:10 AM Antony Mayi antonym...@yahoo.com.invalid
wrote:
now with spark.shuffle.io.preferDirectBufs reverted (to true) I am again getting
GC overhead limit exceeded:
=== spark stdout ===
15/02/19 12:08:08 WARN scheduler.TaskSetManager: Lost task
7.0 in stage 18.0 (TID 5329, 192.168.1.93
Hi,
This has already been briefly discussed here in the past but there seem to be
more questions...
I am running a bigger ALS task with input data of ~40GB (~3 billion ratings).
The data is partitioned into 512 partitions and I am also using the default
parallelism set to 512. The ALS runs with
Hi,
is it expected that I can't reference a column inside an IF statement like this:
sctx.sql("SELECT name, IF(ts > 0, price, 0) FROM table").collect()
I get an error:
org.apache.spark.sql.AnalysisException: unresolved operator 'Project [name#0,if
((CAST(ts#1, DoubleType) > CAST(0, DoubleType))) price#2
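If IF() is not resolvable in that setup (a plain SQLContext on 1.x registers fewer functions than a HiveContext), the equivalent CASE WHEN form may be worth a try - a sketch against the same hypothetical table and columns:

```sql
SELECT name,
       CASE WHEN ts > 0 THEN price ELSE 0 END
FROM table
```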
Hi,
I have two streams coming from two different kafka topics. the two topics
contain time-related events but are quite asymmetric in volume. I would
obviously need to process them in sync to get the time-related events together
but with the same processing rate; if the heavier stream starts
Hi,
using spark 1.5.2 on yarn (client mode) and was trying to use dynamic
resource allocation, but it seems once it is enabled by the first app, any
following application is managed that way even if it explicitly disables it.
example: 1) yarn configured with
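For context, a sketch of the per-application opt-out being attempted - in principle each submission can set the flag itself (whether it is honored once the first app has enabled it is exactly the question here; app name and executor count illustrative):

```shell
# second application, explicitly opting out of dynamic allocation
spark-submit \
  --conf spark.dynamicAllocation.enabled=false \
  --num-executors 4 \
  app.py
```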
just for reference in my case this problem is caused by this bug:
https://issues.apache.org/jira/browse/SPARK-12617
On Monday, 21 December 2015, 14:32, Antony Mayi <antonym...@yahoo.com>
wrote:
I noticed it might be related to longer GC pauses (1-2 sec) - the crash
usually
Hi,
can anyone please help me troubleshoot this problem: I have a streaming pyspark
application (spark 1.5.2 on yarn-client) which keeps crashing after a few hours.
It doesn't seem to be running out of memory on either the driver or the executors.
driver error:
py4j.protocol.Py4JJavaError: An error occurred
I noticed it might be related to longer GC pauses (1-2 sec) - the crash usually
occurs after such a pause. could that be causing the python-java gateway to time
out?
On Sunday, 20 December 2015, 23:05, Antony Mayi <antonym...@yahoo.com>
wrote:
Hi,
can anyone please h
I have a streaming app (pyspark 1.5.2 on yarn) that's crashing due to a driver
(jvm part, not python) OOM (no matter how big a heap is assigned, it eventually
runs out). When checking the heap, it is all taken by "byte" items of
io.netty.buffer.PoolThreadCache. The number of
-of-io-netty-buffer-poolthreadcachememoryregioncacheentry-instances
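Not from the thread itself, but one knob sometimes tried when netty's pooled allocator is suspected of unbounded growth is disabling buffer pooling via a netty (not spark) system property - a hypothetical mitigation that may not address this particular bug:

```shell
# io.netty.allocator.type=unpooled switches netty off the
# PooledByteBufAllocator (and its PoolThreadCache) on the driver JVM
spark-submit \
  --conf "spark.driver.extraJavaOptions=-Dio.netty.allocator.type=unpooled" \
  app.py
```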
On Tue, Dec 22, 2015 at 2:59 AM, Antony Mayi <antonym...@yahoo.com.invalid>
wrote:
I have a streaming app (pyspark 1.5.2 on yarn) that's crashing due to a driver
(jvm part, not python) OOM (no matter how big a heap is assigned
fyi, after further troubleshooting, I am logging this as
https://issues.apache.org/jira/browse/SPARK-12511
On Tuesday, 22 December 2015, 18:16, Antony Mayi <antonym...@yahoo.com>
wrote:
I narrowed it down to the problem described for example here:
https://bugs.openjdk.java.net/brow