Re: Tablesample doubling

2013-07-29 Thread j.barrett Strausser
SELECT COUNT(*) FROM sparse_features_small;

And I receive back:

Total MapReduce CPU Time Spent: 3 seconds 330 msec
OK
10

Rather than the expected 5

I am running hive 11.2




On Mon, Jul 29, 2013 at 9:51 PM, j.barrett Strausser 
j.barrett.straus...@gmail.com wrote:

 Hello All,

 Why does TABLESAMPLE(N ROWS) produce output with 2*N rows?


 I have the following script:

 DROP TABLE IF EXISTS sparse_features_small;

 CREATE TABLE sparse_features_small ROW FORMAT DELIMITED FIELDS TERMINATED
 BY ',' LINES TERMINATED BY '\n' as

 SELECT
 *
 FROM
 sparse_features
 TABLESAMPLE(5 ROWS)


 After I execute this by sourcing the file, I can then execute :

-- 


https://github.com/bearrito
@deepbearrito


Re: Tablesample doubling

2013-07-29 Thread j.barrett Strausser
Never mind, I see in the docs that it is rows PER SPLIT.
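
To make the arithmetic concrete, here is a minimal sketch in plain Python (not Hive itself) of what a per-split row sample does. It assumes the table was read as two input splits, which would explain 10 rows instead of 5; the split count, the row values, and the helper name sample_rows_per_split are all made up for illustration.

```python
from itertools import islice


def sample_rows_per_split(splits, n):
    """Take up to n rows from each split, mimicking a per-split row sample."""
    sampled = []
    for split in splits:
        # The limit n applies to each split independently, not to the total.
        sampled.extend(islice(split, n))
    return sampled


# Hypothetical table stored as two input splits of 100 rows each.
splits = [
    [f"split0_row{i}" for i in range(100)],
    [f"split1_row{i}" for i in range(100)],
]

sample = sample_rows_per_split(splits, 5)
print(len(sample))  # 10: 5 rows from each of the 2 splits
```

With a single split the same call returns 5 rows, so the total depends on how many splits the input happens to have.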

-b




Re: Tablesample doubling

2013-07-29 Thread Stephen Sprague
+1 for documentation. Sometimes it surprises you. :)

