SizeEstimator in Spark 1.1 and high load/object allocation when reading in data

2014-10-30 Thread Erik Freed
Hi All, We have recently moved to Spark 1.1 from 0.9 for an application handling a fair number of very large datasets partitioned across multiple nodes. About half of each of these large datasets is stored in off-heap byte arrays and about half in the standard Java heap. While these datasets are …
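The SizeEstimator in the subject line refers to org.apache.spark.util.SizeEstimator, which reflectively walks an object graph to approximate its on-heap footprint. A minimal sketch of the on-heap vs. off-heap distinction described above; the data shapes are illustrative assumptions, not from the thread, and note that estimate() was package-private in Spark 0.9/1.1 and only became a public @DeveloperApi in later releases:

    import org.apache.spark.util.SizeEstimator

    object SizeEstimatorSketch {
      def main(args: Array[String]): Unit = {
        // On-heap data: SizeEstimator traverses the object graph and
        // returns an approximate size in bytes.
        val onHeap = Array.fill(1000)(new Array[Byte](1024))
        println(s"on-heap estimate: ${SizeEstimator.estimate(onHeap)} bytes")

        // Off-heap data (e.g. a direct ByteBuffer) is invisible to the
        // reflective walk; only the small wrapper object is counted.
        val offHeap = java.nio.ByteBuffer.allocateDirect(1024 * 1024)
        println(s"direct-buffer estimate: ${SizeEstimator.estimate(offHeap)} bytes")
      }
    }

Because the reflective walk never sees the memory behind a direct buffer, datasets that are half off-heap, as described above, can appear much smaller to Spark's bookkeeping than they actually are.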

Hadoop 2.X Spark Client Jar 0.9.0 problem

2014-04-04 Thread Erik Freed
Hi All, I am not sure if this is a 0.9.0 problem to be fixed in 0.9.1, and so perhaps already being addressed, but I am having a devil of a time with a Spark 0.9.0 client jar for Hadoop 2.X. If I go to the site and download: - Download binaries for Hadoop 2 (HDP2, CDH5): find an Apache mirror …
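One quick diagnostic for this kind of jar mismatch (a sketch, not from the thread): put the candidate Spark client/assembly jar on the classpath and print which Hadoop version it actually bundles, using Hadoop's own VersionInfo utility.

    import org.apache.hadoop.util.VersionInfo

    object HadoopVersionCheck {
      def main(args: Array[String]): Unit = {
        // Prints the Hadoop version baked into whatever hadoop-common
        // classes were found on the classpath, e.g. "2.3.0" vs "1.0.4".
        println(s"Hadoop on classpath: ${VersionInfo.getVersion}")
      }
    }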

Re: Hadoop 2.X Spark Client Jar 0.9.0 problem

2014-04-04 Thread Erik Freed
… -Dhadoop.version=2.3.0 -Dyarn.version=2.3.0 -DskipTests clean package. And from http://spark.apache.org/docs/latest/running-on-yarn.html, for an sbt build you could try: SPARK_HADOOP_VERSION=2.3.0 SPARK_YARN=true sbt/sbt assembly. Thanks, Rahul Singhal. From: Erik Freed erikjfr…
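For an application-side build, the same idea can be expressed as dependencies rather than a source rebuild. A minimal build.sbt sketch, assuming the 0.9.0 artifact coordinates (spark-core was published as 0.9.0-incubating on Scala 2.10) and pinning hadoop-client to match the 2.x cluster:

    // build.sbt -- sketch only; artifact versions are assumptions, not from the thread
    scalaVersion := "2.10.3"

    libraryDependencies ++= Seq(
      // Spark 0.9.0 was published to Maven Central as 0.9.0-incubating
      "org.apache.spark" %% "spark-core" % "0.9.0-incubating",
      // Override the Hadoop client so the classpath matches the 2.x cluster
      "org.apache.hadoop" % "hadoop-client" % "2.3.0"
    )

Pinning hadoop-client explicitly avoids falling back to the Hadoop 1.x client that the published Spark pom of that era pulled in by default.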