--
Sincerely yours,
Team PRISMALYTICS
PRISMALYTICS, LLC. | http://www.prismalytics.com/
P: 212.882.1276 | subscripti...@prismalytics.io
Hello Friends:
I generated a Pair RDD with K/V pairs, like so:
rdd1.take(10) # Show a small sample.
[(u'2013-10-09', 7.60117302052786),
(u'2013-10-10', 9.322709163346612),
(u'2013-10-10', 28.264462809917358),
(u'2013-10-07', 9.664429530201343),
(u'2013-10-07', 12.461538461538463),
and count for each key.
Use mapValues so that the partitioning by key stays intact, which
avoids a full shuffle in downstream keyed operations. Here it only
computes the avg for each key.
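
For reference, here is a minimal sketch of one way to do this, not necessarily the exact code meant above. It assumes the pair RDD holds (key, number) tuples like rdd1, and it uses aggregateByKey to build a (sum, count) pair per key before mapValues turns each pair into an average. The helper names (seq_op, comb_op, to_mean, average_by_key) are made up for illustration.

```python
def seq_op(acc, value):
    """Fold one value into a per-partition (running_sum, count) pair."""
    total, count = acc
    return (total + value, count + 1)


def comb_op(left, right):
    """Merge two (running_sum, count) pairs coming from different partitions."""
    return (left[0] + right[0], left[1] + right[1])


def to_mean(sum_count):
    """Turn a (running_sum, count) pair into an average."""
    total, count = sum_count
    return total / count


def average_by_key(pair_rdd):
    """Return an RDD of (key, mean) built from an RDD of (key, number)."""
    # The zero value (0.0, 0) is the empty (sum, count) accumulator.
    sums_counts = pair_rdd.aggregateByKey((0.0, 0), seq_op, comb_op)
    # mapValues only touches the values, so the keys and the partitioner
    # produced by aggregateByKey are kept as they are.
    return sums_counts.mapValues(to_mean)
```

With the RDD from the question, average_by_key(rdd1).collect() should give one (date, average) pair per distinct date.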
From: Todd Nist
Date: Tuesday, April 28, 2015 at 10:20 AM
To: subscripti...@prismalytics.io
Hi Friends:
We noticed that the following happens in 'pyspark' when running in
distributed Standalone Mode (MASTER=spark://vps00:7077),
but not in Local Mode (MASTER=local[n]).
See the following, particularly what is highlighted in *Red* (again the
problem only happens in Standalone Mode).
Any
subscripti...@prismalytics.io wrote:
Hi Friends:
We noticed that the following happens in 'pyspark' when running in distributed
Standalone Mode (MASTER=spark://vps00:7077),
but not in Local Mode (MASTER=local[n]).
See the following, particularly what is highlighted in Red (again