Tablesample doubling
Hello All,

Why does TABLESAMPLE(N ROWS) produce output with 2*N rows? I have the following script:

    DROP TABLE IF EXISTS sparse_features_small;
    CREATE TABLE sparse_features_small
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
    AS SELECT * FROM sparse_features TABLESAMPLE(5 ROWS)

After I execute this by sourcing the file, I can then execute:

--
https://github.com/bearrito
@deepbearrito
Re: Tablesample doubling
    SELECT COUNT(*) FROM sparse_features_small;

And I receive back:

    Total MapReduce CPU Time Spent: 3 seconds 330 msec
    OK
    10

rather than the expected 5. I am running Hive 11.2.

On Mon, Jul 29, 2013 at 9:51 PM, j.barrett Strausser j.barrett.straus...@gmail.com wrote:
> Why does TABLESAMPLE(N ROWS) produce output with 2*N rows? [...]
Re: Tablesample doubling
Never mind, I see in the docs that it is rows PER SPLIT.

-b

On Mon, Jul 29, 2013 at 9:52 PM, j.barrett Strausser j.barrett.straus...@gmail.com wrote:
> SELECT COUNT(*) FROM sparse_features_small; And I receive back [...] 10 rather than the expected 5 [...]
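
Because the row count in TABLESAMPLE(N ROWS) is applied to each input split, the count of 10 is consistent with sparse_features being read as two splits (an inference from the numbers, not something confirmed in this thread). A minimal sketch of one way to cap the result at exactly 5 rows, assuming any 5 rows will do, is to add a LIMIT clause to the same statement:

```sql
-- Sketch only: the script from the original post with a LIMIT added.
DROP TABLE IF EXISTS sparse_features_small;

-- TABLESAMPLE(5 ROWS) takes up to 5 rows from each input split;
-- LIMIT 5 then caps the total written to the new table.
CREATE TABLE sparse_features_small
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
AS
SELECT *
FROM sparse_features TABLESAMPLE(5 ROWS)
LIMIT 5;

-- Expected to report 5, provided sparse_features has at least 5 rows.
SELECT COUNT(*) FROM sparse_features_small;
```

LIMIT only truncates the per-split sample; it does not make the selection any more random, so this suits a quick small copy of the table rather than a statistically meaningful sample.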
Re: Tablesample doubling
+1 for documentation. Sometimes it surprises you. :)

On Mon, Jul 29, 2013 at 7:11 PM, j.barrett Strausser j.barrett.straus...@gmail.com wrote:
> Nevermind I see in the docs, it is rows PER SPLIT. [...]