Help with datetime comparison in SparkSQL statement ...

2015-05-05 Thread subscripti...@prismalytics.io
-- Sincerely yours, Team PRISMALYTICS | PRISMALYTICS, LLC. | http://www.prismalytics.com/ | P: 212.882.1276 | subscripti...@prismalytics.io

Calculating the averages for each KEY in a Pairwise (K,V) RDD ...

2015-04-28 Thread subscripti...@prismalytics.io
Hello Friends: I generated a Pair RDD with K/V pairs, like so:

    rdd1.take(10)  # Show a small sample.
    [(u'2013-10-09', 7.60117302052786),
     (u'2013-10-10', 9.322709163346612),
     (u'2013-10-10', 28.264462809917358),
     (u'2013-10-07', 9.664429530201343),
     (u'2013-10-07', 12.461538461538463),

Re: Calculating the averages for each KEY in a Pairwise (K,V) RDD ...

2015-04-28 Thread subscripti...@prismalytics.io
... and count for each key. Use mapValues to keep your partitioning by keys intact and minimize a full shuffle for downstream keyed operations. It just calculates the avg for each key.

From: Todd Nist
Date: Tuesday, April 28, 2015 at 10:20 AM
To: subscripti...@prismalytics.io
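The visible part of the reply describes keeping a running sum and count for each key and then using mapValues to turn them into an average. The full code from the thread is not shown here, so what follows is only a minimal sketch of that idea, not the poster's actual solution. The helper names (to_sum_count, merge_sum_count, to_average, average_by_key) are hypothetical, and the pipeline is simulated in plain Python on the five sample pairs from the question so that it runs without a Spark cluster.

```python
# Sample pairs copied from the question; the real RDD holds more data.
pairs = [
    (u'2013-10-09', 7.60117302052786),
    (u'2013-10-10', 9.322709163346612),
    (u'2013-10-10', 28.264462809917358),
    (u'2013-10-07', 9.664429530201343),
    (u'2013-10-07', 12.461538461538463),
]


def to_sum_count(value):
    """Wrap a single value as a (sum, count) pair."""
    return (value, 1)


def merge_sum_count(a, b):
    """Combine two (sum, count) pairs for the same key."""
    return (a[0] + b[0], a[1] + b[1])


def to_average(sum_count):
    """Turn a (sum, count) pair into the average."""
    return sum_count[0] / sum_count[1]


def average_by_key(kv_pairs):
    """Local stand-in for the Spark pipeline (hypothetical helper).

    It applies the same three functions that Spark would apply,
    but over an ordinary Python list instead of an RDD.
    """
    acc = {}
    for key, value in kv_pairs:
        sc = to_sum_count(value)
        acc[key] = merge_sum_count(acc[key], sc) if key in acc else sc
    return {key: to_average(sc) for key, sc in acc.items()}


averages = average_by_key(pairs)

# Assumed PySpark equivalent, given an existing pair RDD named rdd1:
#   rdd1.mapValues(to_sum_count) \
#       .reduceByKey(merge_sum_count) \
#       .mapValues(to_average)
```

Because mapValues never touches the keys, Spark can keep an existing partitioner in place, which is the point the reply makes about downstream keyed operations.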

ImportError: No module named iter ... (on CDH5 v1.2.0+cdh5.3.2+369-1.cdh5.3.2.p0.17.el6.noarch) ...

2015-03-03 Thread subscripti...@prismalytics.io
Hi Friends: We noticed that the following happens in 'pyspark' when running in distributed Standalone Mode (MASTER=spark://vps00:7077), but not in Local Mode (MASTER=local[n]). See the following, particularly what is highlighted in *Red* (again, the problem only happens in Standalone Mode). Any

Re: ImportError: No module named iter ... (on CDH5 v1.2.0+cdh5.3.2+369-1.cdh5.3.2.p0.17.el6.noarch) ...

2015-03-03 Thread subscripti...@prismalytics.io
subscripti...@prismalytics.io wrote: Hi Friends: We noticed that the following happens in 'pyspark' when running in distributed Standalone Mode (MASTER=spark://vps00:7077), but not in Local Mode (MASTER=local[n]). See the following, particularly what is highlighted in Red (again