[ 
https://issues.apache.org/jira/browse/TRAFODION-3101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16518651#comment-16518651
 ] 

Anoop Sharma commented on TRAFODION-3101:
-----------------------------------------

The row count for a Trafodion table is obtained by issuing a count(*) query.
This is needed because the count is not kept in any underlying structure or
repository. Count(*) can be a long-running query, depending on the number of
rows. In some cases the count can be pushed down to an HBase coprocessor, and
in other cases it can be run using ESP parallelism. But in all cases the time
taken to execute it is non-trivial.

If the count is done as part of the 'get tables' command, or with a new
option, it may cause 'get tables' to take a long time or seem to pause during
count execution, especially for large tables. It may also spawn ESPs or load a
coprocessor.

Also, 'get tables' is a metadata-only command and does not look at data. With
this enhancement, it would need to look at data.

Is this something that users have asked for? What is the need for it? Are they
looking for an exact count or a fuzzy/sampled count?

Can this be done by users themselves by running count on all tables? A script
could be generated by doing something similar to
"select 'select count(*) from ' || a from (get tables in schema) t(a)".
That script could then be obeyed to get the counts.
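
As a rough sketch of the script-generation idea, the same thing could be done
outside the engine. The Python below is an illustration only, not anything
that exists in Trafodion: the function names, the schema name SALES and the
table names are all made up, and it assumes the list of table names has
already been collected (for example from 'get tables' output). It only builds
the text of the count(*) statements; running them is left to the usual client.

```python
from typing import Iterable


def quote_ident(name: str) -> str:
    """Wrap a name as an SQL delimited identifier, doubling embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_count_script(schema: str, tables: Iterable[str]) -> str:
    """Return one count(*) statement per table, one statement per line.

    Hypothetical helper: it only produces text and never contacts a database.
    """
    lines = []
    for table in tables:
        label = table.replace("'", "''")  # escape for use in a string literal
        lines.append(
            f"select '{label}' as table_name, count(*) as row_count "
            f"from {quote_ident(schema)}.{quote_ident(table)};"
        )
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    # SALES, ORDERS and CUSTOMERS are placeholder names for illustration.
    print(build_count_script("SALES", ["ORDERS", "CUSTOMERS"]), end="")
```

The generated text could be saved to a file and obeyed. Each statement still
runs a full count(*), so the cost concerns above apply to every table in the
list.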

Or does Trafodion need to add rowcount computation infrastructure to the
engine?

Would this also be done for Hive tables?

If users are interested in a data size estimate, they can also use the region
stats command to get that for all tables in a schema.

 

> enhance the 'get table' utility to return number of rows
> --------------------------------------------------------
>
>                 Key: TRAFODION-3101
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-3101
>             Project: Apache Trafodion
>          Issue Type: Improvement
>            Reporter: liu ming
>            Assignee: liu ming
>            Priority: Major
>
> When running the 'get tables' command, it is desirable to show the number of rows.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
