How to make symbol for one column in Spark SQL.

2014-12-04 Thread Tim Chou
I have tried to use the where and filter functions on a SchemaRDD. I have built a case class for the tuples/records in the table, like this: case class Region(num: Int, str1: String, str2: String) I have also successfully created a SchemaRDD: scala> val results = sqlContext.sql("select * from region") results:

Re: How to make symbol for one column in Spark SQL.

2014-12-04 Thread Tim Chou
... Thank you! I'm so stupid... This is the only thing I missed in the tutorial... orz Thanks, Tim 2014-12-04 16:49 GMT-06:00 Michael Armbrust mich...@databricks.com: You need to import sqlContext._ On Thu, Dec 4, 2014 at 2:26 PM, Tim Chou timchou@gmail.com wrote: I have tried to use
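Michael's one-line fix works because `import sqlContext._` brings in the implicit conversions that lift a Scala `Symbol` such as `'num` into a Spark SQL column expression. A minimal sketch of the full round trip, assuming the Spark 1.x `SchemaRDD` API from this thread (class and data names are illustrative):

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

// Spark 1.x-era sketch (SchemaRDD API, as in this thread).
case class Region(num: Int, str1: String, str2: String)

object SymbolDemo {
  def run(): Long = {
    val sc = new SparkContext("local[2]", "symbol-demo")
    val sqlContext = new SQLContext(sc)
    // The crucial line: pulls in createSchemaRDD and the implicit that
    // turns Scala symbols ('num) into Spark SQL column expressions.
    import sqlContext._

    val rdd = sc.parallelize(Seq(Region(1, "a", "b"), Region(5, "c", "d")))
    val regions = createSchemaRDD(rdd)
    regions.registerTempTable("region")

    // Without the import above, 'num > 2 does not compile.
    val big = regions.where('num > 2)
    val n = big.count()   // one row matches in this toy data
    sc.stop()
    n
  }
  def main(args: Array[String]): Unit = println(SymbolDemo.run())
}
```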

How to create a new SchemaRDD which is not based on original SparkPlan?

2014-12-03 Thread Tim Chou
Hi All, My question is about the lazy evaluation mode of SchemaRDD, I guess. I know lazy evaluation is good; however, I still have this need. For example, here is the first SchemaRDD, named results (select * from table where num > 1 and num < 4): results: org.apache.spark.sql.SchemaRDD = SchemaRDD[59] at RDD
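One way to get a SchemaRDD that no longer carries the original query's SparkPlan is to wrap the query's rows in a fresh leaf plan via applySchema. A hedged sketch, assuming Spark 1.1 (where SchemaRDD exposes .schema and SQLContext has applySchema); table and column names are illustrative:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

case class Rec(num: Int, str1: String)

object DetachDemo {
  def run(): Long = {
    val sc = new SparkContext("local[2]", "detach-demo")
    val sqlContext = new SQLContext(sc)
    import sqlContext._

    val data = Seq(Rec(1, "a"), Rec(2, "b"), Rec(3, "c"), Rec(5, "d"))
    createSchemaRDD(sc.parallelize(data)).registerTempTable("t")

    val results = sqlContext.sql("select * from t where num > 1 and num < 4")
    // A SchemaRDD is an RDD[Row], so applySchema can wrap the same rows in
    // a new leaf plan: `detached` no longer references the SELECT's SparkPlan.
    val detached = sqlContext.applySchema(results, results.schema)
    detached.registerTempTable("filtered")
    val n = detached.count()   // rows with num = 2 and num = 3
    sc.stop()
    n
  }
  def main(args: Array[String]): Unit = println(DetachDemo.run())
}
```

Caching `results` first (`results.cache()`) avoids re-running the original query when the detached table is used.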

How does Spark SQL traverse the physical tree?

2014-11-24 Thread Tim Chou
Hi All, I'm studying the Spark SQL code. I'm confused about how a SchemaRDD executes each operator. Tracing the code, I found that the toRDD() function in QueryExecution is the entry point for running a query. toRDD runs a SparkPlan, which is a tree structure. However, I didn't find any
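The "traversal" this thread is looking for is implicit: there is no external visitor walking the SparkPlan tree. Calling execute() on the root node recursively calls execute() on each child, composing the result bottom-up when toRdd is evaluated. A toy sketch of that pattern in plain Scala (illustrative class names, not Spark's real ones; Seq[Int] stands in for RDD[Row]):

```scala
// Each physical node knows only how to combine its children's output;
// recursion in execute() is what "walks" the tree.
sealed trait PhysicalPlan { def execute(): Seq[Int] }

case class Scan(rows: Seq[Int]) extends PhysicalPlan {
  def execute(): Seq[Int] = rows                          // leaf: emit rows
}
case class FilterNode(pred: Int => Boolean, child: PhysicalPlan) extends PhysicalPlan {
  def execute(): Seq[Int] = child.execute().filter(pred)  // recurse, then filter
}
case class ProjectNode(f: Int => Int, child: PhysicalPlan) extends PhysicalPlan {
  def execute(): Seq[Int] = child.execute().map(f)        // recurse, then project
}

object TraversalDemo {
  // Roughly: select num * 10 from t where num > 1, over rows 1, 2, 3
  val plan: PhysicalPlan = ProjectNode(_ * 10, FilterNode(_ > 1, Scan(Seq(1, 2, 3))))
  def run(): Seq[Int] = plan.execute()
  def main(args: Array[String]): Unit = println(run())
}
```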

Spark- How can I run MapReduce only on one partition in an RDD?

2014-11-13 Thread Tim Chou
Hi All, I use textFile to create an RDD. However, I don't want to process all the data in this RDD. For example, maybe I only want to process the data in the 3rd partition of the RDD. How can I do that? Here are some possible solutions I'm thinking of: 1. Create multiple RDDs when reading the file 2.
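A simpler route than splitting the input is mapPartitionsWithIndex, which hands each task its partition index so work can be restricted to one partition while the others emit nothing. A sketch (toy data in place of the thread's textFile input):

```scala
import org.apache.spark.SparkContext

object OnePartitionDemo {
  def run(): Array[Int] = {
    val sc = new SparkContext("local[4]", "one-partition-demo")
    // 12 elements over 4 partitions; with Spark's even slicing this gives
    // p0 = [1..3], p1 = [4..6], p2 = [7..9], p3 = [10..12].
    val rdd = sc.parallelize(1 to 12, numSlices = 4)
    // Only partition index 2 (the 3rd partition) does any work.
    val out = rdd.mapPartitionsWithIndex { (idx, iter) =>
      if (idx == 2) iter.map(_ * 100) else Iterator.empty
    }.collect()
    sc.stop()
    out
  }
  def main(args: Array[String]): Unit = println(run().mkString(","))
}
```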

How to add elements into map?

2014-11-07 Thread Tim Chou
Here is the code I run in spark-shell: val table = sc.textFile(args(1)) val histMap = collection.mutable.Map[Int,Int]() for (x <- table) { val tuple = x.split('|') histMap.put(tuple(0).toInt, 1) } Why is histMap still empty? Is there something wrong with my code?
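The map stays empty because the for-loop body is shipped to the executors as a serialized closure: each task mutates its own copy of histMap, and the driver's map is never touched. The idiomatic fix is to express the histogram as an RDD job and bring the result back to the driver. A sketch with toy data in place of the thread's textFile input:

```scala
import org.apache.spark.SparkContext

object HistDemo {
  def run(): Map[Int, Long] = {
    val sc = new SparkContext("local[2]", "hist-demo")
    val table = sc.parallelize(Seq("3|a|b", "3|c|d", "7|e|f"))
    // Extract the key column on the executors, then countByValue() runs the
    // job and returns a driver-side Map — no shared mutable state needed.
    val hist = table
      .map(line => line.split('|')(0).toInt)
      .countByValue()
    sc.stop()
    hist.toMap
  }
  def main(args: Array[String]): Unit = println(run())
}
```

If only key presence matters (as in the original put(..., 1)), `table.map(_.split('|')(0).toInt).distinct().collect()` would do the same job.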
