When queried through HiveContext, does Hive execute these queries using its own execution engine (MapReduce by default), or does Spark just read the data and run the queries itself?

2016-06-08 Thread Himanshu Mehra
So what happens underneath when we query a Hive table using HiveContext? 1. Does Spark talk to the metastore to get the data location on HDFS and read the data from there to perform those queries? 2. Or does Spark pass those queries to Hive, so that Hive executes them on the table and returns the
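
A minimal sketch of the HiveContext path (Spark 1.x API; the table name web_logs and the query are invented for illustration). The short answer is the first option: HiveContext consults the Hive metastore only for metadata, i.e. the table schema and HDFS file locations, and Spark's own engine then reads the files and executes the query as ordinary Spark stages, without invoking Hive's MapReduce engine.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object HiveQuerySketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("HiveQuerySketch"))
        val hiveContext = new HiveContext(sc)

        // The metastore supplies only schema and file locations; the scan,
        // aggregation, and shuffle below all run on Spark's own engine.
        val df = hiveContext.sql(
          "SELECT status, COUNT(*) AS hits FROM web_logs GROUP BY status")
        df.show()
      }
    }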

Re: Spark Standalone Cluster - Slave not connecting to Master

2015-07-07 Thread Himanshu Mehra
Hi MorEru, The same problem occurred to me. I had to change the Maven dependency from spark-core_2.11 to spark-core_2.10 and it worked. Thanks, Himanshu
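
A minimal build.sbt sketch of the fix (the version numbers are illustrative assumptions). The point is that the artifact's Scala suffix must match the Scala version your Spark distribution was built with; prebuilt Spark 1.x binaries use Scala 2.10, so a _2.11 application jar is binary-incompatible with a 2.10 cluster.

    // build.sbt
    scalaVersion := "2.10.5"  // must match the cluster's Scala version (assumed here)

    libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "1.4.0" % "provided"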

Re: How to use caching in Spark Actions or Output operations?

2015-07-06 Thread Himanshu Mehra
Hi Sudarshan, As far as I understand your problem, you should take a look at broadcast variables in Spark. Here are the docs: https://spark.apache.org/docs/latest/programming-guide.html#broadcast-variables. Thanks, Himanshu
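
A minimal sketch of a broadcast variable (the lookup map and data are invented): the value is shipped once to each executor and cached there, instead of being serialized into every task closure.

    import org.apache.spark.{SparkConf, SparkContext}

    object BroadcastSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("BroadcastSketch"))

        val countryNames = Map("IN" -> "India", "US" -> "United States")
        val bcNames = sc.broadcast(countryNames)

        val codes = sc.parallelize(Seq("IN", "US", "IN"))
        // Each task reads the executor-local copy via .value.
        codes.map(code => bcNames.value.getOrElse(code, "unknown"))
             .collect()
             .foreach(println)
      }
    }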

Re: kmeans broadcast

2015-06-29 Thread Himanshu Mehra
Hi Haviv, Have you tried sc.broadcast(model)? The broadcast method is a member of the SparkContext class. Thanks, Himanshu
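
A minimal MLlib sketch (Spark 1.x; the training data is invented) of broadcasting a trained KMeansModel so that scoring tasks share one executor-local copy:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.clustering.KMeans
    import org.apache.spark.mllib.linalg.Vectors

    object KMeansBroadcastSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("KMeansBroadcastSketch"))

        val points = sc.parallelize(Seq(
          Vectors.dense(0.0, 0.0), Vectors.dense(0.1, 0.1),
          Vectors.dense(9.0, 9.0), Vectors.dense(9.1, 9.2)
        )).cache()

        val model = KMeans.train(points, 2, 20)  // k = 2, maxIterations = 20
        val bcModel = sc.broadcast(model)

        // Predict the cluster of each point against the broadcast model.
        points.map(p => bcModel.value.predict(p)).collect().foreach(println)
      }
    }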

Re: Best way to randomly distribute elements

2015-06-18 Thread Himanshu Mehra
Hi A. Bellet, You can try RDD.randomSplit(weightsArray), where the weights array holds the fraction of the data you want in each resulting RDD. For example, RDD.randomSplit(Array(0.7, 0.3)) will create two RDDs containing roughly 70% of the data in one and 30% in the other, randomly selecting the
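
A minimal sketch of randomSplit (data invented; the optional seed makes the split reproducible). Note that the result is an array of new RDDs, one per weight, not partitions of the original RDD:

    import org.apache.spark.{SparkConf, SparkContext}

    object RandomSplitSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("RandomSplitSketch"))

        val data = sc.parallelize(1 to 1000)
        val Array(bigger, smaller) = data.randomSplit(Array(0.7, 0.3), seed = 42L)

        println(s"bigger: ${bigger.count()}, smaller: ${smaller.count()}")  // ~700 / ~300
      }
    }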

Re: How does one decide no of executors/cores/memory allocation?

2015-06-16 Thread Himanshu Mehra
Hi Shreesh, You can definitely decide how many partitions your data should break into by passing a 'minPartitions' argument to sc.textFile(input/path, minPartitions) and a 'numSlices' argument to sc.parallelize(localCollection, numSlices). In fact there is always an option to specify
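
A minimal sketch of both creation-time knobs (the input path is hypothetical):

    import org.apache.spark.{SparkConf, SparkContext}

    object PartitionCountSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("PartitionCountSketch"))

        // Ask for at least 8 partitions when reading a file.
        val lines = sc.textFile("hdfs:///data/input.txt", minPartitions = 8)

        // Ask for exactly 4 partitions for a local collection.
        val nums = sc.parallelize(1 to 100, numSlices = 4)

        println(s"lines: ${lines.partitions.length}, nums: ${nums.partitions.length}")
      }
    }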

Re: Limit Spark Shuffle Disk Usage

2015-06-16 Thread Himanshu Mehra
You can try setting 'spark.shuffle.memoryFraction' to 0.4 (the default is 0.2); this should make a significant difference in the shuffle's disk usage. Thank you, Himanshu Mehra
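
A minimal sketch of the setting (Spark 1.x legacy memory manager; note the value must be set before the SparkContext is created):

    import org.apache.spark.{SparkConf, SparkContext}

    object ShuffleMemorySketch {
      def main(args: Array[String]): Unit = {
        // Give shuffle aggregation 40% of the heap before it spills to disk
        // (the default in Spark 1.x is 0.2).
        val conf = new SparkConf()
          .setAppName("ShuffleMemorySketch")
          .set("spark.shuffle.memoryFraction", "0.4")

        val sc = new SparkContext(conf)
        // ... a shuffle-heavy job here should spill to disk less often ...
      }
    }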

Re: How can I use Tachyon with SPARK?

2015-06-15 Thread Himanshu Mehra
Hi June, As I understand your problem, you are running Spark 1.3 and want to use Tachyon with it. What you need to do is simply build the latest Spark and Tachyon and set some configuration in Spark. In fact Spark 1.3 has spark/core/pom.xml; you have to find the core folder in your Spark home and
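
A minimal sketch of using Tachyon from Spark 1.x once both are built and Tachyon is running (the master URL shown is Tachyon's default and an assumption about the deployment): OFF_HEAP persistence places RDD blocks in Tachyon, outside the executor heap.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.storage.StorageLevel

    object TachyonSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("TachyonSketch")
          .set("spark.tachyonStore.url", "tachyon://localhost:19998")

        val sc = new SparkContext(conf)
        val rdd = sc.parallelize(1 to 1000000)

        rdd.persist(StorageLevel.OFF_HEAP)  // blocks stored in Tachyon
        println(rdd.count())
      }
    }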

Re: cannot access port 4040

2015-06-10 Thread Himanshu Mehra
Hi Maria, Have you tried port 8080 as well? Thanks, Himanshu

Re: Determining number of executors within RDD

2015-06-10 Thread Himanshu Mehra
Hi Akshat, I assume what you want is to control the number of partitions in your RDD, which is easily achievable by passing a numSlices or minPartitions argument at the time of RDD creation. Example: val someRDD = sc.parallelize(someCollection, numSlices) / val someRDD = sc.textFile(pathToFile,
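
A minimal sketch that sets the partition count at creation and then inspects the resulting parallelism from the driver (the file path is hypothetical):

    import org.apache.spark.{SparkConf, SparkContext}

    object ParallelismInspectSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("ParallelismInspectSketch"))

        val rdd = sc.textFile("hdfs:///data/input.txt", 8)

        println(s"partitions: ${rdd.partitions.length}")
        println(s"defaultParallelism: ${sc.defaultParallelism}")
        // Block-manager endpoints known to the driver (includes the driver itself).
        println(s"executors: ${sc.getExecutorMemoryStatus.keys.mkString(", ")}")
      }
    }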

Re: Monitoring Spark Jobs

2015-06-10 Thread Himanshu Mehra
Hi Sam, You might want to have a look at the Spark web UI: the standalone master UI runs by default at http://localhost:8080, and each running application serves its own UI on port 4040. You can also configure Apache Ganglia to monitor your cluster resources. Thank you, Regards, Himanshu Mehra

Re: Spark Streaming: JavaDStream compute method NPE

2015-04-28 Thread Himanshu Mehra
Hi Puneith, Please provide the code if you can; it will be helpful. Thank you,