Re: Tablesample doubling
+1 for documentation. Sometimes it surprises you. :)

On Mon, Jul 29, 2013 at 7:11 PM, j.barrett Strausser <j.barrett.straus...@gmail.com> wrote:

> Nevermind I see in the docs, it is rows PER SPLIT.
>
> -b
Re: Tablesample doubling
Nevermind I see in the docs, it is rows PER SPLIT.

-b

On Mon, Jul 29, 2013 at 9:52 PM, j.barrett Strausser <j.barrett.straus...@gmail.com> wrote:

> SELECT COUNT(*) FROM sparse_features_small;
>
> And I receive back :
>
> Total MapReduce CPU Time Spent: 3 seconds 330 msec
> OK
> 10
>
> Rather than the expected 5
>
> I am running hive 11.2

--
https://github.com/bearrito
@deepbearrito
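A minimal sketch of the "rows PER SPLIT" behaviour, reusing the script from this thread. It assumes sparse_features was read as two input splits; the thread does not say so, but that would account for 2 * 5 = 10 rows. The LIMIT clause is a suggested way to cap the total and was not proposed by anyone in the thread.

```sql
-- TABLESAMPLE(n ROWS) is applied to each input split, so the total
-- row count depends on how many splits the table is read as.
-- Assumption: two splits here, giving 2 * 5 = 10 rows.
DROP TABLE IF EXISTS sparse_features_small;

CREATE TABLE sparse_features_small
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
AS
SELECT *
FROM sparse_features TABLESAMPLE(5 ROWS)
LIMIT 5;  -- caps the overall result at 5 rows, whatever the split count
```

With the LIMIT in place, SELECT COUNT(*) FROM sparse_features_small; should return at most 5.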
Re: Tablesample doubling
SELECT COUNT(*) FROM sparse_features_small;

And I receive back:

Total MapReduce CPU Time Spent: 3 seconds 330 msec
OK
10

Rather than the expected 5.

I am running hive 11.2

On Mon, Jul 29, 2013 at 9:51 PM, j.barrett Strausser <j.barrett.straus...@gmail.com> wrote:

> Hello All,
>
> Why does TABLESAMPLE(N rows) produce output with 2*N rows?
>
> I have the following script:
>
> DROP TABLE IF EXISTS sparse_features_small;
>
> CREATE TABLE sparse_features_small ROW FORMAT DELIMITED FIELDS TERMINATED
> BY ',' LINES TERMINATED BY '\n' as
>
> SELECT
> *
> FROM
> sparse_features
> TABLESAMPLE(5 ROWS)
>
> After I execute this by sourcing the file, I can then execute :

--
https://github.com/bearrito
@deepbearrito