So what happens underneath when we query a Hive table using hiveContext?
1. Does Spark talk to the metastore to get the data location on HDFS and read
the data from there to perform those queries?
2. Or does Spark pass those queries to Hive, so that Hive executes them on the
table and returns the results to Spark?
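For what it's worth, HiveContext takes the first route: Spark asks the Hive
metastore only for metadata (schema, data location), then reads the files from
HDFS and executes the query with its own engine rather than handing it to Hive.
A minimal sketch of such a query, assuming a Spark 1.x build with Hive support
(the table name is hypothetical):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf().setAppName("hive-query"))
val hiveContext = new HiveContext(sc)

// Metadata comes from the metastore; the scan and aggregation run as Spark jobs.
val df = hiveContext.sql("SELECT COUNT(*) FROM some_table")
df.show()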
Hi MorEru,
The same problem occurred to me. I had to change the version of the Maven
dependency from spark-core_2.11 to spark-core_2.10 and it worked.
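The point is to match the artifact's Scala suffix to the Scala version your
Spark build uses. A sketch of the changed dependency (the version number is
illustrative):

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <!-- version below is illustrative; use the one matching your cluster -->
  <version>1.3.1</version>
</dependency>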
Thanks
Himanshu
--
Hi Sudarshan,
As far as I understand your problem, you should take a look at broadcast
variables in Spark. Here are the docs:
https://spark.apache.org/docs/latest/programming-guide.html#broadcast-variables
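A minimal sketch of the pattern (the lookup table and its contents are
hypothetical):

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("broadcast-example"))

// Ship a read-only lookup table to every executor once,
// instead of serializing it into each task's closure.
val lookup = Map(1 -> "a", 2 -> "b", 3 -> "c")
val lookupBc = sc.broadcast(lookup)

val named = sc.parallelize(Seq(1, 2, 3))
  .map(i => lookupBc.value.getOrElse(i, "unknown"))
println(named.collect().mkString(", "))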
Thanks
Himanshu
--
Hi Haviv,
Have you tried sc.broadcast(model)? The broadcast method is a member of the
SparkContext class.
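A sketch of broadcasting a trained model, assuming MLlib's KMeans (the input
path, k, and iteration count are illustrative):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

val sc = new SparkContext(new SparkConf().setAppName("kmeans-broadcast"))

// Train a KMeansModel on the driver ("data/points.txt" is hypothetical).
val points = sc.textFile("data/points.txt")
  .map(line => Vectors.dense(line.split(' ').map(_.toDouble)))
  .cache()
val model = KMeans.train(points, 2, 20) // k = 2, maxIterations = 20

// Broadcast the model so each executor receives one copy,
// then call it from inside a transformation.
val modelBc = sc.broadcast(model)
val clusterIds = points.map(p => modelBc.value.predict(p))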
Thanks
Himanshu
--
Hi A Bellet,
You can try RDD.randomSplit(weights), where the weights array gives the
fraction of the data you want in each of the resulting RDDs. For example,
RDD.randomSplit(Array(0.7, 0.3)) will create two RDDs containing roughly 70%
of the data in one and 30% in the other, randomly selecting the elements.
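A minimal sketch (the split is random, so the proportions are approximate):

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("random-split"))
val rdd = sc.parallelize(1 to 1000)

// randomSplit returns an Array of RDDs; the seed makes the split reproducible.
val Array(bigger, smaller) = rdd.randomSplit(Array(0.7, 0.3), seed = 42L)
println(s"bigger: ${bigger.count()}, smaller: ${smaller.count()}")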
Hi Shreesh,
You can definitely decide how many partitions your data should break into, by
passing a 'minPartitions' argument to the method
sc.textFile(input/path, minPartitions) and a 'numSlices' argument to the method
sc.parallelize(localCollection, numSlices). In fact, there is also always the
option of raising 'spark.shuffle.memoryFraction' to 0.4 (the default is 0.2);
this should make a significant difference in the shuffle's disk usage.
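A sketch combining both knobs (the path, sizes, and the 0.4 value are
illustrative):

import org.apache.spark.{SparkConf, SparkContext}

// Give the shuffle more of the executor heap so it spills to disk less often
// (spark.shuffle.memoryFraction defaults to 0.2 in Spark 1.x).
val conf = new SparkConf()
  .setAppName("partition-control")
  .set("spark.shuffle.memoryFraction", "0.4")
val sc = new SparkContext(conf)

val fromFile  = sc.textFile("hdfs:///input/path", 8) // minPartitions = 8
val fromLocal = sc.parallelize(1 to 100, 4)          // numSlices = 4

println(fromFile.partitions.length)  // at least 8
println(fromLocal.partitions.length) // exactly 4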
Thank you
-
Himanshu Mehra
--
Hi June,
As I understand your problem, you are running Spark 1.3 and want to use
Tachyon with it. What you need to do is simply build the latest Spark and
Tachyon and set some configuration in Spark. In fact, Spark 1.3 has a
spark/core/pom.xml; you have to find the core folder in your Spark home and
update the Tachyon dependency version there before building.
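Once built, the Tachyon side is driven by Spark configuration; a sketch
assuming Spark 1.x's Tachyon-backed off-heap storage (the master address is
illustrative):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Point Spark at a running Tachyon master (address is illustrative).
val conf = new SparkConf()
  .setAppName("tachyon-offheap")
  .set("spark.tachyonStore.url", "tachyon://localhost:19998")
val sc = new SparkContext(conf)

// In Spark 1.x, OFF_HEAP persistence writes RDD blocks to Tachyon.
val rdd = sc.parallelize(1 to 1000000)
rdd.persist(StorageLevel.OFF_HEAP)
println(rdd.count())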
Hi Maria,
Have you tried port 8080 as well? 4040 serves the UI of a running application
only; the standalone master's web UI listens on 8080 by default.
Thanks
Himanshu
--
Hi Akshat,
I assume what you want is to control the number of partitions in your RDD,
which is easily achievable by passing the numSlices and minPartitions arguments
at the time of RDD creation. Example:
val someRDD      = sc.parallelize(someCollection, numSlices)
val someOtherRDD = sc.textFile(pathToFile, minPartitions)
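A concrete check that the argument took effect (the numbers are illustrative):

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("partition-count"))
val rdd = sc.parallelize(1 to 100, 4) // numSlices = 4
println(rdd.partitions.length)        // prints 4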
Hi Sam,
You might want to have a look at the Spark UI, which the standalone master
serves by default at http://localhost:8080 (each running application also gets
its own UI on port 4040). You can also configure Apache Ganglia to monitor your
cluster resources.
Thank you
Regards
Himanshu Mehra
--
Hi Puneith,
Please share the code if you can; it will be helpful.
Thank you,
--