I have tried to use the where and filter functions on a SchemaRDD.
I built a case class for the tuples/records in the table, like this:
case class Region(num: Int, str1: String, str2: String)
I also successfully created a SchemaRDD:
scala> val results = sqlContext.sql("select * from region")
results:
...
Thank you! I'm so stupid... This is the only thing I missed in the
tutorial... orz
Thanks,
Tim
2014-12-04 16:49 GMT-06:00 Michael Armbrust mich...@databricks.com:
You need to import sqlContext._
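For context, `import sqlContext._` works because Spark SQL 1.x puts its implicit conversions (e.g. turning an RDD of case classes into a SchemaRDD, and enabling the where/filter DSL) inside the SQLContext instance. A minimal plain-Scala sketch of that mechanism, with no Spark dependency; the names dslContext and QueryOps are made up for illustration, not Spark classes:

```scala
object ImplicitDemo {
  case class Region(num: Int, str1: String, str2: String)

  // Analogue of the implicits defined inside SQLContext.
  object dslContext {
    implicit class QueryOps(rows: Seq[Region]) {
      def where(p: Region => Boolean): Seq[Region] = rows.filter(p)
    }
  }

  def demo(): Seq[Int] = {
    val rows = Seq(Region(1, "a", "b"), Region(5, "c", "d"), Region(3, "e", "f"))
    import dslContext._ // without this import, rows.where(...) does not compile
    rows.where(_.num < 4).map(_.num)
  }

  def main(args: Array[String]): Unit =
    println(demo().mkString(",")) // prints "1,3"
}
```

The point is the same in both cases: the method is not defined on the receiver's type, so the compiler only finds it once the implicit conversion is in scope.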
On Thu, Dec 4, 2014 at 2:26 PM, Tim Chou timchou@gmail.com wrote:
Hi All,
My question is about the lazy execution mode of SchemaRDD, I guess. I know
lazy evaluation is good; however, I still have this demand.
For example, here is the first SchemaRDD, named results (select * from table
where num > 1 and num < 4):
results: org.apache.spark.sql.SchemaRDD =
SchemaRDD[59] at RDD
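If the goal is to force the query to actually run rather than just build a plan, the usual Spark approach is to call an action, e.g. results.count(), optionally after results.cache() so the materialized data is reused. The underlying idea (transformations are recorded lazily; nothing runs until something forces it) can be sketched with plain Scala views, as an analogy only, not Spark API:

```scala
object LazyDemo {
  def demo(): (Int, Int) = {
    var evaluated = 0 // counts how many elements were actually processed

    // .view makes the map lazy: the function below is recorded, not run.
    val data = (1 to 10).view.map { n => evaluated += 1; n * 2 }

    val before = evaluated // still 0: nothing has executed yet
    data.toList            // forcing the view runs the map over all elements
    (before, evaluated)
  }

  def main(args: Array[String]): Unit =
    println(demo()) // prints "(0,10)"
}
```

Spark RDDs behave analogously: the SchemaRDD above is a recorded plan, and only an action (count, collect, saveAsTextFile, ...) triggers execution.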
Hi All,
I'm learning the code of Spark SQL.
I'm confused about how a SchemaRDD executes each operator.
I'm tracing the code, and I found that the toRDD() function in
QueryExecution is the starting point for running a query. The toRDD
function will run the SparkPlan, which is a tree structure.
However, I didn't find any
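The general shape of that execution is a recursive walk over the physical-plan tree: each node's execute() pulls rows from its children. A toy sketch of that pattern; Scan, FilterNode, and ProjectNode are illustrative stand-ins, not Spark's actual SparkPlan classes, and real SparkPlan nodes produce RDDs rather than local iterators:

```scala
object PlanDemo {
  type Row = Seq[Any]

  // Each operator executes by recursively executing its child, like
  // SparkPlan.execute() does over the physical-plan tree.
  sealed trait PlanNode { def execute(): Iterator[Row] }

  case class Scan(rows: Seq[Row]) extends PlanNode {
    def execute(): Iterator[Row] = rows.iterator
  }
  case class FilterNode(pred: Row => Boolean, child: PlanNode) extends PlanNode {
    def execute(): Iterator[Row] = child.execute().filter(pred)
  }
  case class ProjectNode(cols: Seq[Int], child: PlanNode) extends PlanNode {
    def execute(): Iterator[Row] = child.execute().map(r => cols.map(r))
  }

  def demo(): Seq[Row] = {
    // Roughly: SELECT col0 FROM t WHERE col0 > 1
    val plan = ProjectNode(Seq(0),
      FilterNode(r => r(0).asInstanceOf[Int] > 1,
        Scan(Seq(Seq(1, "a"), Seq(2, "b"), Seq(3, "c")))))
    plan.execute().toSeq
  }

  def main(args: Array[String]): Unit =
    println(demo()) // prints "List(List(2), List(3))"
}
```

In spark-shell you can also inspect the planned tree for a real query via results.queryExecution, which shows the analyzed, optimized, and physical plans.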
Hi All,
I used textFile to create an RDD. However, I don't want to process the whole
dataset in this RDD. For example, maybe I only want to process the data in
the 3rd partition of the RDD.
How can I do it? Here are some possible solutions that I'm thinking:
1. Create multiple RDDs when reading the file
2.
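One option worth considering is RDD.mapPartitionsWithIndex, which exposes each partition's index and lets you drop every partition except the one you want, without creating multiple RDDs. In Spark that would look roughly like rdd.mapPartitionsWithIndex((idx, it) => if (idx == 2) it else Iterator.empty). A plain-Scala simulation of the same pattern, where each inner Vector stands in for one partition; keepPartition is a hypothetical helper, not a Spark API:

```scala
object PartitionDemo {
  // Keep only the elements of the partition at index `wanted`,
  // mirroring the mapPartitionsWithIndex idiom on local collections.
  def keepPartition[T](partitions: Vector[Vector[T]], wanted: Int): Vector[T] =
    partitions.zipWithIndex.flatMap { case (part, idx) =>
      if (idx == wanted) part else Vector.empty
    }

  def main(args: Array[String]): Unit = {
    val parts = Vector(Vector(1, 2), Vector(3, 4), Vector(5, 6), Vector(7))
    println(keepPartition(parts, 2)) // the 3rd partition (index 2): Vector(5, 6)
  }
}
```

Note that which lines of a text file land in which partition depends on the input splits, so selecting "partition 2" is only meaningful if you know how the file was split.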
Here is the code I run in spark-shell:
val table = sc.textFile(args(1))
val histMap = collection.mutable.Map[Int, Int]()
for (x <- table) {
  val tuple = x.split('|')
  histMap.put(tuple(0).toInt, 1)
}
Why is histMap still empty?
Is there something wrong with my code?
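The likely cause: the for loop desugars to table.foreach, and Spark serializes that closure and runs it on the executors. Each executor mutates its own copy of histMap; the driver's copy is never touched, so it stays empty. The usual fix is to express the aggregation with transformations instead of side effects, e.g. table.map(x => x.split('|')(0).toInt).countByValue() in Spark. A plain-Scala sketch of the same aggregation over a few made-up sample lines; histogram is an illustrative helper, not a Spark API:

```scala
object HistDemo {
  // Build the histogram functionally instead of mutating a shared map.
  // split('|') and toInt mirror the original spark-shell snippet.
  def histogram(lines: Seq[String]): Map[Int, Int] =
    lines
      .map(_.split('|')(0).toInt)  // first field of each '|'-delimited line
      .groupBy(identity)
      .map { case (k, vs) => k -> vs.size }

  def main(args: Array[String]): Unit = {
    val sample = Seq("1|a|b", "2|c|d", "1|e|f")
    println(histogram(sample)) // counts: 1 -> 2, 2 -> 1
  }
}
```

(The original put(tuple(0).toInt, 1) also overwrites rather than counts; if a per-key count is the goal, aggregate as above rather than fixing the put.)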
Thanks,