Update:
t1 is good. After collecting t1, I find that all rows are OK (is_new = 0).
But right after sampling, there are some rows where is_new = 1, which should have
been filtered out by the WHERE clause.
Here is a cleaned-up version that can be run in |./sbt/sbt
hive/console| to easily reproduce this issue:
|sql("SELECT * FROM src WHERE key % 2 = 0").
sample(withReplacement = false, fraction = 0.05).
registerTempTable("sampled")
println(table("sampled").queryExecution)
val query = sql(SELECT
Hi,
Can you clean up the code a little bit? It's hard to read what's going
on. You can use pastebin or a gist to share the code.
On Wed, Dec 17, 2014 at 3:58 PM, Hao Ren inv...@gmail.com wrote:
Hi,
I am using SparkSQL on the 1.2.1 branch. The problem comes from the following
four lines of code:
*val
Hi,
I am using SparkSQL on the 1.2.1 branch. The problem comes from the following
four lines of code:
*val t1: SchemaRDD = hiveContext hql "select * from product where is_new = 0"
val tb1: SchemaRDD = t1.sample(withReplacement = false, fraction = 0.05)
tb1.registerTempTable("t1_tmp")
(hiveContext sql select