Could you try it on AWS using EMR? That'd give you an exact replica of the
environment that causes the error.
Sent from my iPhone
On Jan 7, 2015, at 10:53 AM, Davies Liu dav...@databricks.com wrote:
Hey Sven,
I tried with all of your configurations (2 nodes with 2 executors each), but in standalone mode, and it worked fine.
Could you try to narrow down which configuration change triggers it?
Davies
On Tue, Jan 6, 2015 at 8:03 PM, Sven Krasser kras...@gmail.com wrote:
Hey Davies,
Here are some [...]
I ran your scripts on a 5-node (2 CPUs, 8 GB mem) cluster and cannot reproduce your failure. Should I test it with a big-memory node?
On Mon, Jan 5, 2015 at 4:00 PM, Sven Krasser kras...@gmail.com wrote:
Thanks for the input! I've managed to come up with a repro of the error with test data only [...]
I still cannot reproduce it with 2 nodes (4 CPUs).
Your repro.py could be faster (10 min) than before (22 min):

inpdata.map(lambda (pc, x): (x, pc == 'p' and 2 or 1)).reduceByKey(lambda x, y: x | y).filter(lambda (x, pc): pc == 3).collect()

(also, no cache needed anymore)
Davies
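(For readers on Python 3, where tuple-parameter lambdas like lambda (pc, x) are a syntax error, a self-contained sketch of the same pipeline follows; the tiny inpdata here is a hypothetical stand-in for the real input.)

from pyspark import SparkContext

sc = SparkContext(appName="repro-oneliner-sketch")

# Hypothetical stand-in for the thread's input: (pc, x) pairs.
inpdata = sc.parallelize([('p', 1), ('c', 1), ('p', 2)])

result = (inpdata
          .map(lambda kv: (kv[1], 2 if kv[0] == 'p' else 1))  # tag 'p' rows with bit 2, others with bit 1
          .reduceByKey(lambda a, b: a | b)                    # OR the tags together per key
          .filter(lambda kv: kv[1] == 3)                      # keep keys that occur with both tags
          .collect())
print(result)  # [(1, 3)]: key 1 appears with both a 'p' and a non-'p' record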
On Tue, Jan 6, 2015 at 8:03 PM, Sven Krasser kras...@gmail.com wrote:
Hey Davies,
Here are some more details on a configuration that causes this error for me. Launch an AWS Spark EMR cluster as follows:

aws emr create-cluster --region us-west-1 --no-auto-terminate \
  --ec2-attributes KeyName=your-key-here,SubnetId=your-subnet-here \
  --bootstrap-actions [...]

The issue has been sensitive to the number of executors and input data size. I'm using 2 executors with 4 cores each, 25GB of memory, and 3800MB of memory overhead for YARN. This will fit onto Amazon r3 instance types.
-Sven
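(For concreteness, the executor layout Sven describes can be sketched programmatically as below; this is an illustration of the settings only, not the actual submission command from the thread, which isn't shown in full.)

from pyspark import SparkConf, SparkContext

# Sketch of the layout described above: 2 executors with 4 cores each,
# 25 GB of executor memory, and 3800 MB of YARN memory overhead.
# On a real cluster these are typically passed as spark-submit flags.
conf = (SparkConf()
        .setAppName("shuffle-repro")
        .set("spark.executor.instances", "2")
        .set("spark.executor.cores", "4")
        .set("spark.executor.memory", "25g")
        .set("spark.yarn.executor.memoryOverhead", "3800"))
sc = SparkContext(conf=conf)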
On Tue, Jan 6, 2015 at 12:46 AM, Davies Liu dav...@databricks.com wrote:
I [...]
Thanks for the input! I've managed to come up with a repro of the error with test data only (and without any of the custom code in the original script); please see here:
https://gist.github.com/skrasser/4bd7b41550988c8f6071#file-gistfile1-md
The Gist contains a data generator and the script [...]
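(The generator itself is only in the Gist; purely as a hypothetical illustration of the shape such a repro takes, and NOT the Gist's actual code, it amounts to producing synthetic key/value pairs and forcing a shuffle.)

from pyspark import SparkContext
import random

# Hypothetical illustration only -- not the code from the Gist.
sc = SparkContext(appName="synthetic-shuffle-sketch")

def gen(seed):
    # Emit many (key, payload) pairs with ~1 KB byte-string payloads.
    rnd = random.Random(seed)
    for _ in range(100000):
        yield (rnd.randrange(1 << 20), b'x' * 1024)

pairs = sc.parallelize(range(64), 64).flatMap(gen)
pairs.reduceByKey(lambda a, b: a).count()  # the shuffle happens here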
It doesn't seem like there's a whole lot of clues to go on here without seeing the job code. The original "org.apache.spark.SparkException: PairwiseRDD: unexpected value: List([B@130dc7ad)" error suggests that maybe there's an issue with PySpark's serialization / tracking of types ([B@130dc7ad is the JVM's default toString for a byte[] array, which is what points at serialization), but it's [...]
Hey all,
Since upgrading to 1.2.0, a PySpark job that worked fine in 1.1.1 fails during the shuffle. I've tried reverting from the sort-based shuffle back to the hash-based one, and that fails as well. Does anyone see similar problems or have an idea of where to look next?
For the sort-based shuffle I get a [...]
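(Context for anyone reproducing this: Spark 1.2 switched the default shuffle from hash-based to sort-based, and the two can be toggled via spark.shuffle.manager, as in this sketch.)

from pyspark import SparkConf, SparkContext

# Select the pre-1.2 hash-based shuffle instead of the 1.2 default "sort".
conf = SparkConf().set("spark.shuffle.manager", "hash")
sc = SparkContext(conf=conf)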
Hi Sven,
Do you have a small example program that you can share which will allow me to reproduce this issue? If you have a workload that runs into this, you should be able to keep iteratively simplifying the job and reducing the data set size until you hit a fairly minimal reproduction (assuming [...]
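(One common way to do that shrinking in PySpark is RDD.sample, re-running the failing stage on progressively smaller fractions; the inpdata below is a toy stand-in for the real input.)

from pyspark import SparkContext

sc = SparkContext(appName="shrink-repro-sketch")
inpdata = sc.parallelize([(i % 100, i) for i in range(10000)])  # toy stand-in

for fraction in (0.5, 0.1, 0.01):
    subset = inpdata.sample(withReplacement=False, fraction=fraction, seed=42)
    # Re-run the failing stage on the smaller subset and watch for the error.
    subset.reduceByKey(lambda a, b: a).count()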
Hey Josh,
I am still trying to prune this to a minimal example, but it has been
tricky since scale seems to be a factor. The job runs over ~720GB of data
(the cluster's total RAM is around ~900GB, split across 32 executors). I've
managed to run it over a vastly smaller data set without issues.