[ https://issues.apache.org/jira/browse/DRILL-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Rogers updated DRILL-6667:
-------------------------------
    Description: 
The Drill documentation provides the "Sample Datasets" section, which is very 
handy. However, this section does not discuss the two datasets provided with 
Drill itself.

* Julian Hyde's [FoodMart data set|https://github.com/julianhyde/foodmart-data-hsqldb], 
available on the class path.
* TPC-H data set, available on the class path in {{tpch}}.

The "FoodMart" data set is available directly under {{cp}}. In fact, the Drill 
sample query (see below) references a FoodMart table. To see the list of tables 
(at development time), find the {{foodmart-data-json-0.4.jar}} file in the 
Maven dependencies for {{drill-java-exec}}. The table names in this jar are 
simplified relative to those in the ER diagram at the link above. Perhaps 
include a simple table that lists the names and maps them to the original 
names, along with a link to (or an embedded copy of) the FoodMart ER image. The 
data is in JSON format.
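As a sketch of what such documentation could show, the Python snippet below lists the root-level JSON files in that jar; each one should then be queryable as cp.`<name>`. It assumes the caller supplies the path to the jar (no location is given in this issue) and that the FoodMart tables sit at the jar root, as cp.`employee.json` suggests. The function name {{list_foodmart_tables}} is hypothetical.

```python
import zipfile


def list_foodmart_tables(jar_path):
    """Return the sorted names of JSON files stored at the root of a jar.

    Assumption: files at the jar root are visible directly under the
    `cp` plugin, so each returned name can be queried as cp.`<name>`.
    """
    with zipfile.ZipFile(jar_path) as jar:
        names = [
            info.filename
            for info in jar.infolist()
            if not info.is_dir()
            and "/" not in info.filename  # root-level entries only
            and info.filename.endswith(".json")
        ]
    return sorted(names)
```

For example, calling {{list_foodmart_tables}} on the path to {{foodmart-data-json-0.4.jar}} should include {{employee.json}}, the file used by the Web Console sample query.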

TPC-H data is available in {{`cp`.`tpch/*.parquet`}}, in Parquet format. The 
schema is described in the 
[TPC-H specification|http://www.tpc.org/tpc_documents_current_versions/current_specifications.asp].
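A minimal sketch of how the {{tpch/*.parquet}} pattern maps to individual tables: it builds one sample query per Parquet file, in the same shape as the Web Console sample. This issue does not name the jar that carries the TPC-H files, so {{jar_path}} is a caller-supplied placeholder, and the function name {{tpch_sample_queries}} is hypothetical. It treats {{*}} as matching a single path component, so only files directly inside {{tpch/}} are kept; that reading of the pattern is an assumption.

```python
import zipfile


def tpch_sample_queries(jar_path, limit=20):
    """Build one sample query per Parquet file directly under tpch/ in a jar.

    Assumption: the jar holds the TPC-H files that Drill exposes as
    cp.`tpch/*.parquet`; nested subdirectories are ignored.
    """
    queries = []
    with zipfile.ZipFile(jar_path) as jar:
        for name in sorted(jar.namelist()):
            parts = name.split("/")
            # Keep only "tpch/<file>.parquet", one level deep.
            if len(parts) == 2 and parts[0] == "tpch" and parts[1].endswith(".parquet"):
                queries.append(f"SELECT * FROM cp.`{name}` LIMIT {limit}")
    return queries
```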

These data sets are very handy, but hard to find: I keep having to search the 
source code to remember file names and directory paths. End users won't have 
this luxury.

Suggestion: Describe the files available in the class path data source.

Along these same lines, in "Connect a Data Source", there is no mention of the 
class path data source. Yet, we reference that data source in the Web Console 
where we suggest a sample query to run:

{code}
Sample SQL query: SELECT * FROM cp.`employee.json` LIMIT 20
{code}

The above query refers to the FoodMart data set.

  was:
The Drill documentation provides the "Sample Datasets" section, which is very 
handy. However, this section does not discuss the two datasets provided with 
Drill itself.

* Julian Hyde's [FoodMart data 
set|https://github.com/julianhyde/foodmart-data-hsqldb], available on the class 
path.
* TPC-H data set, available on the class path in {{tpch}}

The "FoodMart" data set is available directly under {{cp}}. In fact, the Drill 
sample query (see below) references a FoodMart table. To see the list of tables 
(at development time), find the {{foodmart-data-json-0.4.jar}} file in the 
Maven dependencies for {{drill-java-exec}}. The table names in this jar are 
simplified relative to those in the ER diagram at the link above. Perhaps 
include a simple table that lists the names and maps them to the original 
names, along with a link to (or an embedded copy of) the FoodMart ER image. The 
data is in JSON format.

TPC-H data is available in {{`cp`.`tpch/*.parquet`}}, in Parquet format. The 
schema is described in the 
[TPC-H specification|http://www.tpc.org/tpc_documents_current_versions/current_specifications.asp].

Suggestion: Describe the files available in the class path data source.

Along these same lines, in "Connect a Data Source", there is no mention of the 
class path data source. Yet, we reference that data source in the Web Console 
where we suggest a sample query to run:

{code}
Sample SQL query: SELECT * FROM cp.`employee.json` LIMIT 20
{code}

The above query refers to the FoodMart data set.


> Include internal data sets in documentation Sample Datasets
> -----------------------------------------------------------
>
>                 Key: DRILL-6667
>                 URL: https://issues.apache.org/jira/browse/DRILL-6667
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Documentation
>    Affects Versions: 1.13.0
>            Reporter: Paul Rogers
>            Assignee: Bridget Bevens
>            Priority: Minor
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
