Zoltan Haindrich created HIVE-18051:
---------------------------------------

             Summary: qfiles: dataset support
                 Key: HIVE-18051
                 URL: https://issues.apache.org/jira/browse/HIVE-18051
             Project: Hive
          Issue Type: Improvement
          Components: Testing Infrastructure
            Reporter: Zoltan Haindrich


It would be great to have some kind of test dataset support. Currently there is 
the {{q_test_init.sql}} script, which is quite large; I often override it with an 
invalid string because I write independent qtests most of the time, and 
loading {{src}} and the other tables is just a waste of time for me. Not to 
mention that loading those tables may also trigger breakpoints, which 
is a bit annoying.

Most of the tests use "only" the {{src}} table and possibly 2 others; 
however, the main init script contains a bunch of tables. Meanwhile, there are 
quite a few other tests which could also benefit from a more general 
feature; for example, the creation of {{bucket_small}} is present in 20 q files.

The proposal is to allow qfiles to be annotated with metadata such as 
datasets:
{code}
--! qt:dataset:src,bucket_small
{code}
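
As a rough illustration of how a test driver might read this annotation, here is a minimal Python sketch. Only the {{--! qt:dataset:}} prefix and the comma-separated names come from the proposal; the function name {{parse_datasets}}, the tolerance for extra whitespace, and the de-duplication are assumptions made for the example, not part of any existing Hive code.

```python
import re

# The annotation prefix is taken from the proposal; the regex details
# (optional whitespace after "--!") are an assumption of this sketch.
DATASET_ANNOTATION = re.compile(r"^--!\s*qt:dataset:(.*)$")


def parse_datasets(qfile_text):
    """Return the dataset names requested by a q file, in first-seen order."""
    datasets = []
    for line in qfile_text.splitlines():
        match = DATASET_ANNOTATION.match(line.strip())
        if not match:
            continue  # ordinary SQL or comment line
        for name in match.group(1).split(","):
            name = name.strip()
            # Skip empty entries and names that were already requested.
            if name and name not in datasets:
                datasets.append(name)
    return datasets


example = """--! qt:dataset:src,bucket_small
select count(*) from src;
"""
print(parse_datasets(example))  # ['src', 'bucket_small']
```

A q file without any annotation yields an empty list, so such a test would load no tables at all.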

Proposal for storing a dataset:

* the loader script would be at: {{data/datasets/__NAME__/load.hive.sql}}
* the table data could be stored under the same location
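
The layout above could be resolved along these lines. This is a sketch under stated assumptions: the path pattern comes from the proposal, while the helper name {{loader_script}}, the {{root}} parameter, and the name validation are hypothetical.

```python
from pathlib import Path


def loader_script(dataset, root=Path("data/datasets")):
    """Build the loader script path for a dataset name.

    The data/datasets/__NAME__/load.hive.sql pattern is from the proposal;
    rejecting names with path separators is an assumption of this sketch.
    """
    if not dataset or "/" in dataset or dataset in (".", ".."):
        raise ValueError("invalid dataset name: %r" % (dataset,))
    return root / dataset / "load.hive.sql"


for name in ["src", "bucket_small"]:
    print(loader_script(name).as_posix())
```

Keeping the script and the table data under one directory per dataset means a test only touches the directories it names.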


A draft about this and other qfile-related ideas:
https://docs.google.com/document/d/1KtcIx8ggL9LxDintFuJo8NQuvNWkmtvv_ekbWrTLNGc/edit?usp=sharing




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
