Hi All, Can custom storage handlers get information for queries like count, max, min etc. from hive directly so that for each of such queries RecordReader need not fetch all the records?
Regards, Amey On Tue, Mar 22, 2016 at 1:46 PM, Amey Barve <ameybarv...@gmail.com> wrote: > Thanks Nitin, Mich, > > if its just plain vanilla text file format, it needs to run a job to get > the count so the longest of all > --> Hive must be translating some operator like fetch (for count) into a > map-reduce job and getting the result? > Can a custom storage handler get information about the operator/s for > count(*) and then use it to retrieve the results. > > I want to know whether custom storage handler can get information about > operators that hive constructs for queries like count, max, min etc. so > that storage handler can map these to internal storage functions? > > Regards, > Amey > > On Tue, Mar 22, 2016 at 1:32 PM, Mich Talebzadeh < > mich.talebza...@gmail.com> wrote: > >> ORC file has the following stats levels for storage indexes >> >> >> 1. ORC File itself >> 2. Multiple stripes (chunks) within the ORC file >> 3. Multiple row groups (row batches) within each stripe >> >> Assuming that the underlying table has stats updated, count will be >> stored for each column >> >> So when we do something like below: >> >> select count(1) from orctest >> >> you can see stats collected if you do >> >> show create table orctest; >> >> TBLPROPERTIES ( | >> | 'COLUMN_STATS_ACCURATE'='true', | >> | 'numFiles'='31', | >> | *'numRows'='250000'*, | >> >> >> File statistics, Stripe statistics and row group statistics are kept. So >> ORC table will rely on those if needed >> >> >> HTH >> >> >> >> >> Dr Mich Talebzadeh >> >> >> >> LinkedIn * >> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw >> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* >> >> >> >> http://talebzadehmich.wordpress.com >> >> >> >> On 22 March 2016 at 07:14, Amey Barve <ameybarv...@gmail.com> wrote: >> >>> select count(*) from table; >>> >>> How does hive evaluate count(*) on a table? >>> >>> Does it return count by actually querying table, or directly return >>> count by consulting some statistics locally. >>> >>> For Hive's Text format it takes few seconds while Hive's Orc format >>> takes fraction of seconds. >>> >>> Regards, >>> Amey >>> >> >> >