Joe McDonnell created IMPALA-6052:
-------------------------------------

Summary: Improve test data directory structure
Key: IMPALA-6052
URL: https://issues.apache.org/jira/browse/IMPALA-6052
Project: IMPALA
Issue Type: Improvement
Components: Infrastructure
Affects Versions: Impala 2.10.0
Reporter: Joe McDonnell
Dataload generates the HDFS location using this code:

    hdfs_location = '{0}.{1}{2}'.format(db_name, table_name, db_suffix)
    if data_set in ['hive-benchmark', 'functional']:
        hdfs_location = hdfs_location.split('.')[-1]

Here, db_suffix describes the file format and compression (for example, "_seq"). Some examples:

functional.alltypes is stored in /test-warehouse/alltypes/
functional.alltypesagg is stored in /test-warehouse/alltypesagg/
functional_seq.alltypes is stored in /test-warehouse/alltypes_seq/
functional_seq.alltypesagg is stored in /test-warehouse/alltypesagg_seq/

Tables from the same database are not grouped into a directory. Instead, almost every functional table gets its own top-level directory. After a normal dataload, "hdfs dfs -ls /test-warehouse" lists 998 directories. This makes our HDFS directory structure hard to browse. It also makes it hard to import or export a single database and its tables.

The tables for a database should live in a single directory for that database. The HDFS location should have the form "${db_name}${db_suffix}.db/${table_name}". For example, functional.alltypes should be stored in /test-warehouse/functional.db/alltypes. With the default dataload, the top-level directory should then contain about 50 items.

This will require changes in generate_schema_statement.py (in generate_statements(), where hdfs_location is generated). It will also require changes to the schema templates, such as testdata/datasets/functional/functional_schema_template.sql. It is also likely to require corresponding test changes.
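The sketch below contrasts the current location scheme, which is taken from the snippet quoted in this issue, with the proposed "${db_name}${db_suffix}.db/${table_name}" form. It also counts how many top-level directories each scheme produces for a few of the example tables. The function names current_hdfs_location and proposed_hdfs_location are hypothetical and exist only for illustration. This is not the actual generate_statements() code.

```python
def current_hdfs_location(data_set, db_name, table_name, db_suffix):
    """Current scheme, copied from the snippet quoted in this issue."""
    hdfs_location = '{0}.{1}{2}'.format(db_name, table_name, db_suffix)
    if data_set in ['hive-benchmark', 'functional']:
        # Drop the "db_name." prefix, e.g. "functional.alltypes_seq" -> "alltypes_seq".
        hdfs_location = hdfs_location.split('.')[-1]
    return hdfs_location


def proposed_hdfs_location(db_name, table_name, db_suffix):
    """Proposed scheme (hypothetical helper): ${db_name}${db_suffix}.db/${table_name}."""
    return '{0}{1}.db/{2}'.format(db_name, db_suffix, table_name)


def top_level_dirs(locations):
    """Distinct first path components, i.e. entries directly under /test-warehouse."""
    return {loc.split('/')[0] for loc in locations}


tables = ['alltypes', 'alltypesagg']
suffixes = ['', '_seq']  # '' = functional (text), '_seq' = functional_seq

old = [current_hdfs_location('functional', 'functional', t, s)
       for s in suffixes for t in tables]
new = [proposed_hdfs_location('functional', t, s)
       for s in suffixes for t in tables]

for o, n in zip(old, new):
    print('/test-warehouse/' + o, '->', '/test-warehouse/' + n)

# Old scheme: one top-level directory per table; new scheme: one per database.
print(len(top_level_dirs(old)), len(top_level_dirs(new)))  # 4 2
```

Under the new scheme, the number of top-level entries grows with the number of databases instead of the number of tables. That is why the issue expects the /test-warehouse listing to shrink from 998 entries to about 50.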