Re: SchemaRDD.sample problem

2014-12-23 Thread Hao Ren
Update: t1 is good. After collecting t1, I find that all rows are OK (is_new = 0). Only after sampling do some rows appear where is_new = 1, which should have been filtered out by the WHERE clause.
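For reference, the property being relied on here can be written down in plain Scala, without Spark. This is only a sketch of the expected semantics, not a reproduction of the problem: the `Product` case class, the `sample` helper, and the generated data are hypothetical stand-ins, and the helper assumes that sampling without replacement keeps each row independently with probability `fraction`.

```scala
import scala.util.Random

// Hypothetical stand-in for a row of the `product` table.
case class Product(id: Int, isNew: Int)

object SampleInvariant {
  // Assumed semantics of sampling without replacement:
  // keep each row independently with probability `fraction`.
  def sample[A](rows: Seq[A], fraction: Double, seed: Long): Seq[A] = {
    val rng = new Random(seed)
    rows.filter(_ => rng.nextDouble() < fraction)
  }

  def main(args: Array[String]): Unit = {
    // Made-up data: half of the rows have is_new = 1.
    val products = (1 to 10000).map(i => Product(i, i % 2))

    // Counterpart of: select * from product where is_new = 0
    val t1 = products.filter(_.isNew == 0)

    // Counterpart of: t1.sample(withReplacement = false, fraction = 0.05)
    val tb1 = sample(t1, fraction = 0.05, seed = 42L)

    // A sample of filtered rows should never contain a row the filter rejected.
    val leaked = tb1.filter(_.isNew != 0)
    assert(leaked.isEmpty, s"${leaked.size} rows with is_new = 1 survived the filter")
    println(s"sampled ${tb1.size} of ${t1.size} rows, leaked = ${leaked.size}")
  }
}
```

The same kind of check (collect the sampled table and count the rows that violate the predicate) is what exposes the rows with is_new = 1 described above.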

Re: SchemaRDD.sample problem

2014-12-23 Thread Cheng Lian
Here is a cleaned-up version that can be run in ./sbt/sbt hive/console to easily reproduce this issue:
sql("SELECT * FROM src WHERE key % 2 = 0").
  sample(withReplacement = false, fraction = 0.05).
  registerTempTable("sampled")
println(table("sampled").queryExecution)
val query = sql(SELECT

Re: SchemaRDD.sample problem

2014-12-18 Thread madhu phatak
Hi, can you clean up the code a little bit? It's hard to read what's going on. You can use pastebin or a gist to share the code. On Wed, Dec 17, 2014 at 3:58 PM, Hao Ren inv...@gmail.com wrote: Hi, I am using SparkSQL on 1.2.1 branch. The problem comes from the following 4-line code: *val

SchemaRDD.sample problem

2014-12-17 Thread Hao Ren
Hi, I am using SparkSQL on the 1.2.1 branch. The problem comes from the following 4-line code:
val t1: SchemaRDD = hiveContext hql "select * from product where is_new = 0"
val tb1: SchemaRDD = t1.sample(withReplacement = false, fraction = 0.05)
tb1.registerTempTable("t1_tmp")
(hiveContext sql select