[jira] [Updated] (DRILL-6667) Include internal data sets in documentation Sample Datasets

2018-08-04 Thread Paul Rogers (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated DRILL-6667:
---
Description: 
The Drill documentation provides the "Sample Datasets" section, which is very 
handy. However, this section does not discuss the two datasets provided with 
Drill itself.

* Julian Hyde's [FoodMart data 
set|https://github.com/julianhyde/foodmart-data-hsqldb], available on the class 
path.
* TPC-H data set, available on the class path in {{tpch}}

The "FoodMart" data set is available directly under {{cp}}. In fact, the Drill 
sample query (see below) references a FoodMart table. To see the list of tables 
(at development time), find the {{foodmart-data-json-0.4.jar}} file in the 
Maven dependencies for {{drill-java-exec}}. The table names here are simplified 
relative to those in the ER diagram in the above link. Perhaps include a simple 
table that lists the names and their mapping to the original names, along with 
a link to the FoodMart ER image (or simply embed the image). The data is 
available in JSON format.
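As a sketch of how the class path contents could be enumerated for such a table, the following assumes Python 3 and relies only on the fact that a jar is a zip archive. {{list_classpath_tables}} is a hypothetical helper, not part of Drill. It returns a {{cp}} reference for each data file directly under a given directory of the jar. The printed {{SELECT ... LIMIT 20}} lines simply follow the pattern of the Web Console sample query quoted below.

{code:python}
import sys
import zipfile


def list_classpath_tables(jar_path, directory="", suffix=".json"):
    """Hypothetical helper: return Drill-style `cp` references for data files in a jar.

    Only entries located directly in `directory` (not in its subdirectories)
    and ending in `suffix` are returned, e.g. "employee.json" -> "cp.`employee.json`".
    `jar_path` may be a file path or a binary file-like object.
    """
    prefix = directory.rstrip("/") + "/" if directory else ""
    refs = []
    with zipfile.ZipFile(jar_path) as jar:
        for name in jar.namelist():
            # Skip entries outside the directory and directory entries themselves.
            if not name.startswith(prefix) or name.endswith("/"):
                continue
            rest = name[len(prefix):]
            # Skip files in nested subdirectories and files of other types.
            if "/" in rest or not rest.endswith(suffix):
                continue
            refs.append("cp.`" + name + "`")
    return sorted(refs)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python list_cp_tables.py <path-to-jar> [directory] [suffix]
    for ref in list_classpath_tables(*sys.argv[1:]):
        print("SELECT * FROM " + ref + " LIMIT 20")
{code}

Run against the FoodMart JSON jar, this should print one candidate query per table file. Passing {{directory="tpch"}} and {{suffix=".parquet"}} would do the same for whichever jar holds the TPC-H files, assuming they sit under {{tpch/}} as described above.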

TPCH data is available in {{`cp`.`tpch/*.parquet`}}, in Parquet format. The 
schema is described in the [TPC-H 
specification|http://www.tpc.org/tpc_documents_current_versions/current_specifications.asp].

These are very handy, but hard to find: I find I must keep searching the source 
code to remember file names and directory paths. End users won't have this 
luxury.

Suggestion: Describe the files available in the class path data source.

Along these same lines, in "Connect a Data Source", there is no mention of the 
class path data source. Yet, we reference that data source in the Web Console 
where we suggest a sample query to run:

{code}
Sample SQL query: SELECT * FROM cp.`employee.json` LIMIT 20
{code}

The above query refers to the FoodMart data set.

  was:
The Drill documentation provides the "Sample Datasets" section, which is very 
handy. However, this section does not discuss the two datasets provided with 
Drill itself.

* Julian Hyde's [FoodMart data 
set|https://github.com/julianhyde/foodmart-data-hsqldb], available on the class 
path.
* TPC-H data set, available on the class path in {{tpch}}

The "FoodMart" data set is available directly under {{cp}}. In fact, the Drill 
sample query (see below) references a FoodMart table. To see the list of tables 
(at development time), find the {{foodmart-data-json-0.4.jar}} file in the 
Maven dependencies for {{drill-java-exec}}. The table names here are simplified 
relative to those in the ER diagram in the above link. Perhaps include a simple 
table that lists the names and their mapping to the original names, along with 
a link to the FoodMart ER image (or simply embed the image). The data is 
available in JSON format.

TPCH data is available in {{`cp`.`tpch/*.parquet`}}, in Parquet format. The 
schema is described in the [TPC-H 
specification|http://www.tpc.org/tpc_documents_current_versions/current_specifications.asp].

Suggestion: Describe the files available in the class path data source.

Along these same lines, in "Connect a Data Source", there is no mention of the 
class path data source. Yet, we reference that data source in the Web Console 
where we suggest a sample query to run:

{code}
Sample SQL query: SELECT * FROM cp.`employee.json` LIMIT 20
{code}

The above query refers to the FoodMart data set.


> Include internal data sets in documentation Sample Datasets
> ---
>
> Key: DRILL-6667
> URL: https://issues.apache.org/jira/browse/DRILL-6667
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Assignee: Bridget Bevens
>Priority: Minor
>
> The Drill documentation provides the "Sample Datasets" section, which is very 
> handy. However, this section does not discuss the two datasets provided with 
> Drill itself.
> * Julian Hyde's [FoodMart data 
> set|https://github.com/julianhyde/foodmart-data-hsqldb], available on the 
> class path.
> * TPC-H data set, available on the class path in {{tpch}}
> The "FoodMart" data set is available directly under {{cp}}. In fact, the 
> Drill sample query (see below) references a FoodMart table. To see the list 
> of tables (at development time), find the {{foodmart-data-json-0.4.jar}} file 
> in the Maven dependencies for {{drill-java-exec}}. The table names here are 
> simplified relative to those in the ER diagram in the above link. Perhaps 
> include a simple table that lists the names and their mapping to the original 
> names, along with a link to the FoodMart ER image (or simply embed the image). 
> The data is available in JSON format.
> TPCH data is available in {{`cp`.`tpch/*.parquet`}}, in Parquet format. The 
> schema is described in the [TPC-H 
> specification|http://www.tpc.org/tpc_documents_current

[jira] [Updated] (DRILL-6667) Include internal data sets in documentation Sample Datasets

2018-08-04 Thread Paul Rogers (JIRA)



Paul Rogers updated DRILL-6667:
---
Description: 
The Drill documentation provides the "Sample Datasets" section, which is very 
handy. However, this section does not discuss the two datasets provided with 
Drill itself.

* Julian Hyde's [FoodMart data 
set|https://github.com/julianhyde/foodmart-data-hsqldb], available on the class 
path.
* TPC-H data set, available on the class path in {{tpch}}

The "FoodMart" data set is available directly under {{cp}}. In fact, the Drill 
sample query (see below) references a FoodMart table. To see the list of tables 
(at development time), find the {{foodmart-data-json-0.4.jar}} file in the 
Maven dependencies for {{drill-java-exec}}. The table names here are simplified 
relative to those in the ER diagram in the above link. Perhaps include a simple 
table that lists the names and their mapping to the original names, along with 
a link to the FoodMart ER image (or simply embed the image). The data is 
available in JSON format.

TPCH data is available in {{`cp`.`tpch/*.parquet`}}, in Parquet format. The 
schema is described in the [TPC-H 
specification|http://www.tpc.org/tpc_documents_current_versions/current_specifications.asp].

Suggestion: Describe the files available in the class path data source.

Along these same lines, in "Connect a Data Source", there is no mention of the 
class path data source. Yet, we reference that data source in the Web Console 
where we suggest a sample query to run:

{code}
Sample SQL query: SELECT * FROM cp.`employee.json` LIMIT 20
{code}

The above query refers to the FoodMart data set.

  was:
The Drill documentation provides the "Sample Datasets" section, which is very 
handy. However, this section does not discuss the two datasets provided with 
Drill itself.

* Julian Hyde's [FoodMart data 
set|https://github.com/julianhyde/foodmart-data-hsqldb], available on the class 
path.
* TPC-H data set, available on the class path in {{tpch}}

The "FoodMart" data set is available directly under {{cp}}. In fact, the Drill 
sample query (see below) references a FoodMart table. To see the list of tables 
(at development time), find the {{foodmart-data-json-0.4.jar}} file in the 
Maven dependencies for {{drill-java-exec}}. The table names here are simplified 
relative to those in the ER diagram in the above link. Perhaps include a simple 
table that lists the names and their mapping to the original names, along with 
a link to the FoodMart ER image (or simply embed the image). The data is 
available in JSON format.

TPCH data is available in {{`cp`.`tpch/*.parquet`}}, in Parquet format. The 
schema is described in the [TPC-H 
specification|http://www.tpc.org/tpc_documents_current_versions/current_specifications.asp].

Further, in the "Tutorials" section, "Analyzing the Yelp Academic Dataset", we 
mention the Yelp data set. But, we don't mention that in the "Sample Datasets" 
section. We should, just to be consistent and to save the reader time when 
going back and saying, "Hey, didn't Drill provide some kind of Yelp data? Let 
me look in Sample Datasets. Wait.. no Yelp?"

These are very handy, but hard to find: I find I must keep searching the source 
code to remember file names and directory paths. End users won't have this 
luxury.

Suggestion: Describe the files available in the class path data source.

Along these same lines, in "Connect a Data Source", there is no mention of the 
class path data source. Yet, we reference that data source in the Web Console 
where we suggest a sample query to run:

{code}
Sample SQL query: SELECT * FROM cp.`employee.json` LIMIT 20
{code}

The above query refers to the FoodMart data set.



[jira] [Updated] (DRILL-6667) Include internal data sets in documentation Sample Datasets

2018-08-04 Thread Paul Rogers (JIRA)



Paul Rogers updated DRILL-6667:
---
Summary: Include internal data sets in documentation Sample Datasets  (was: 
Include internal data sets in Documentation Sample Datasets)

> Include internal data sets in documentation Sample Datasets
> ---
>
> Key: DRILL-6667
> URL: https://issues.apache.org/jira/browse/DRILL-6667
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Assignee: Bridget Bevens
>Priority: Minor
>
> The Drill documentation provides the "Sample Datasets" section, which is very 
> handy. However, this section does not discuss the two datasets provided with 
> Drill itself.
> * Julian Hyde's [FoodMart data 
> set|https://github.com/julianhyde/foodmart-data-hsqldb], available on the 
> class path.
> * TPC-H data set, available on the class path in {{tpch}}
> The "FoodMart" data set is available directly under {{cp}}. In fact, the 
> Drill sample query (see below) references a FoodMart table. To see the list 
> of tables (at development time), find the {{foodmart-data-json-0.4.jar}} file 
> in the Maven dependencies for {{drill-java-exec}}. The table names here are 
> simplified relative to those in the ER diagram in the above link. Perhaps 
> include a simple table that lists the names and their mapping to the original 
> names, along with a link to the FoodMart ER image (or simply embed the image). 
> The data is available in JSON format.
> TPCH data is available in {{`cp`.`tpch/*.parquet`}}, in Parquet format. The 
> schema is described in the [TPC-H 
> specification|http://www.tpc.org/tpc_documents_current_versions/current_specifications.asp].
> Further, in the "Tutorials" section, "Analyzing the Yelp Academic Dataset", 
> we mention the Yelp data set. But, we don't mention that in the "Sample 
> Datasets" section. We should, just to be consistent and to save the reader 
> time when going back and saying, "Hey, didn't Drill provide some kind of Yelp 
> data? Let me look in Sample Datasets. Wait.. no Yelp?"
> These are very handy, but hard to find: I find I must keep searching the 
> source code to remember file names and directory paths. End users won't have 
> this luxury.
> Suggestion: Describe the files available in the class path data source.
> Along these same lines, in "Connect a Data Source", there is no mention of 
> the class path data source. Yet, we reference that data source in the Web 
> Console where we suggest a sample query to run:
> {code}
> Sample SQL query: SELECT * FROM cp.`employee.json` LIMIT 20
> {code}
> The above query refers to the FoodMart data set.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6667) Include internal data sets in Documentation Sample Datasets

2018-08-04 Thread Paul Rogers (JIRA)



Paul Rogers updated DRILL-6667:
---
Description: 
The Drill documentation provides the "Sample Datasets" section, which is very 
handy. However, this section does not discuss the two datasets provided with 
Drill itself.

* Julian Hyde's [FoodMart data 
set|https://github.com/julianhyde/foodmart-data-hsqldb], available on the class 
path.
* TPC-H data set, available on the class path in {{tpch}}

The "FoodMart" data set is available directly under {{cp}}. In fact, the Drill 
sample query (see below) references a FoodMart table. To see the list of tables 
(at development time), find the {{foodmart-data-json-0.4.jar}} file in the 
Maven dependencies for {{drill-java-exec}}. The table names here are simplified 
relative to those in the ER diagram in the above link. Perhaps include a simple 
table that lists the names and their mapping to the original names, along with 
a link to the FoodMart ER image (or simply embed the image). The data is 
available in JSON format.

TPCH data is available in {{`cp`.`tpch/*.parquet`}}, in Parquet format. The 
schema is described in the [TPC-H 
specification|http://www.tpc.org/tpc_documents_current_versions/current_specifications.asp].

Further, in the "Tutorials" section, "Analyzing the Yelp Academic Dataset", we 
mention the Yelp data set. But, we don't mention that in the "Sample Datasets" 
section. We should, just to be consistent and to save the reader time when 
going back and saying, "Hey, didn't Drill provide some kind of Yelp data? Let 
me look in Sample Datasets. Wait.. no Yelp?"

These are very handy, but hard to find: I find I must keep searching the source 
code to remember file names and directory paths. End users won't have this 
luxury.

Suggestion: Describe the files available in the class path data source.

Along these same lines, in "Connect a Data Source", there is no mention of the 
class path data source. Yet, we reference that data source in the Web Console 
where we suggest a sample query to run:

{code}
Sample SQL query: SELECT * FROM cp.`employee.json` LIMIT 20
{code}

The above query refers to the FoodMart data set.

  was:
The Drill documentation provides the "Sample Datasets" section, which is very 
handy. However, this section does not discuss the two datasets provided with 
Drill itself.

* Julian Hyde's [FoodMart data 
set|https://github.com/julianhyde/foodmart-data-hsqldb], available on the class 
path.
* TPC-H data set.

The "FoodMart" data set is available directly under {{cp}}. In fact, the Drill 
sample query (see below) references a FoodMart table. To see the list of tables 
(at development time), find the {{foodmart-data-json-0.4.jar}} file in the 
Maven dependencies for {{drill-java-exec}}. The table names here are simplified 
relative to those in the ER diagram in the above link. Perhaps include a simple 
table that lists the names and their mapping to the original names, along with 
a link to the FoodMart ER image (or simply embed the image). The data is 
available in JSON format.

TPCH data is available in `cp`.`tpch/*.parquet`, in Parquet format. The schema 
is described in the [TPC-H 
specification|http://www.tpc.org/tpc_documents_current_versions/current_specifications.asp].

Further, in the "Tutorials" section, "Analyzing the Yelp Academic Dataset", we 
mention the Yelp data set. But, we don't mention that in the "Sample Datasets" 
section. We should, just to be consistent and to save the reader time when 
going back and saying, "Hey, didn't Drill provide some kind of Yelp data? Let 
me look in Sample Datasets. Wait.. no Yelp?"

These are very handy, but hard to find: I find I must keep searching the source 
code to remember file names and directory paths. End users won't have this 
luxury.

Suggestion: Describe the files available in the class path data source.

Along these same lines, in "Connect a Data Source", there is no mention of the 
class path data source. Yet, we reference that data source in the Web Console 
where we suggest a sample query to run:

{code}
Sample SQL query: SELECT * FROM cp.`employee.json` LIMIT 20
{code}

The above query refers to the FoodMart data set.



[jira] [Updated] (DRILL-6667) Include internal data sets in Documentation Sample Datasets

2018-08-04 Thread Paul Rogers (JIRA)



Paul Rogers updated DRILL-6667:
---
Description: 
The Drill documentation provides the "Sample Datasets" section, which is very 
handy. However, this section does not discuss the two datasets provided with 
Drill itself.

* Julian Hyde's [FoodMart data 
set|https://github.com/julianhyde/foodmart-data-hsqldb], available on the class 
path.
* TPC-H data set.

The "FoodMart" data set is available directly under {{cp}}. In fact, the Drill 
sample query (see below) references a FoodMart table. To see the list of tables 
(at development time), find the {{foodmart-data-json-0.4.jar}} file in the 
Maven dependencies for {{drill-java-exec}}. The table names here are simplified 
relative to those in the ER diagram in the above link. Perhaps include a simple 
table that lists the names and their mapping to the original names, along with 
a link to the FoodMart ER image (or simply embed the image). The data is 
available in JSON format.

TPCH data is available in `cp`.`tpch/*.parquet`, in Parquet format. The schema 
is described in the [TPC-H 
specification|(http://www.tpc.org/tpc_documents_current_versions/current_specifications.asp].

Further, in the "Tutorials" section, "Analyzing the Yelp Academic Dataset", we 
mention the Yelp data set. But, we don't mention that in the "Sample Datasets" 
section. We should, just to be consistent and to save the reader time when 
going back and saying, "Hey, didn't Drill provide some kind of Yelp data? Let 
me look in Sample Datasets. Wait.. no Yelp?"

These are very handy, but hard to find: I find I must keep searching the source 
code to remember file names and directory paths. End users won't have this 
luxury.

Suggestion: Describe the files available in the class path data source.

Along these same lines, in "Connect a Data Source", there is no mention of the 
class path data source. Yet, we reference that data source in the Web Console 
where we suggest a sample query to run:

{code}
Sample SQL query: SELECT * FROM cp.`employee.json` LIMIT 20
{code}

The above query refers to the FoodMart data set.

  was:
The Drill documentation provides the "Sample Datasets" section, which is very 
handy. However, this section does not discuss the two datasets provided with 
Drill itself.

* Julian Hyde's [FoodMart data 
set|https://github.com/julianhyde/foodmart-data-hsqldb], available on the class 
path.
* TPC-H data set.

The "FoodMart" data set is available directly under {{cp}}. In fact, the Drill 
sample query (see below) references a FoodMart table. To see the list of tables 
(at development time), find the {{foodmart-data-json-0.4.jar}} file in the 
Maven dependencies for {{drill-java-exec}}. The table names here are simplified 
relative to those in the ER diagram in the above link. Perhaps include a simple 
table that lists the names and their mapping to the original names, along with 
a link to the FoodMart ER image (or simply embed the image). The data is 
available in JSON format.

TPCH data is available in `cp`.`tpch/*.parquet`, in Parquet format. The schema 
is described in the [TPC-H 
specification|(http://www.tpc.org/tpc_documents_current_versions/current_specifications.asp].

Further, in the "Tutorials" section, "Analyzing the Yelp Academic Dataset", we 
mention the Yelp data set. But, we don't mention that in the "Sample Datasets" 
section. We should, just to be consistent and to save the reader time when 
going back and saying, "Hey, didn't Drill provide some kind of Yelp data? Let 
me look in Sample Datasets. Wait.. no Yelp?"

These are very handy, but hard to find: I find I must keep searching the source 
code to remember file names and directory paths. End users won't have this 
luxury.

Suggestion: Describe the files available in the class path data source.

Along these same lines, in "Connect a Data Source", there is no mention of the 
class path data source. Yet, we reference that data source in the Web Console 
where we suggest a sample query to run:

{code}
Sample SQL query: SELECT * FROM cp.`employee.json` LIMIT 20
{code}



[jira] [Updated] (DRILL-6667) Include internal data sets in Documentation Sample Datasets

2018-08-04 Thread Paul Rogers (JIRA)



Paul Rogers updated DRILL-6667:
---
Description: 
The Drill documentation provides the "Sample Datasets" section, which is very 
handy. However, this section does not discuss the two datasets provided with 
Drill itself.

* Julian Hyde's [FoodMart data 
set|https://github.com/julianhyde/foodmart-data-hsqldb], available on the class 
path.
* TPC-H data set.

The "FoodMart" data set is available directly under {{cp}}. In fact, the Drill 
sample query (see below) references a FoodMart table. To see the list of tables 
(at development time), find the {{foodmart-data-json-0.4.jar}} file in the 
Maven dependencies for {{drill-java-exec}}. The table names here are simplified 
relative to those in the ER diagram in the above link. Perhaps include a simple 
table that lists the names and their mapping to the original names, along with 
a link to the FoodMart ER image (or simply embed the image). The data is 
available in JSON format.

TPCH data is available in `cp`.`tpch/*.parquet`, in Parquet format. The schema 
is described in the [TPC-H 
specification|http://www.tpc.org/tpc_documents_current_versions/current_specifications.asp].

Further, in the "Tutorials" section, "Analyzing the Yelp Academic Dataset", we 
mention the Yelp data set. But, we don't mention that in the "Sample Datasets" 
section. We should, just to be consistent and to save the reader time when 
going back and saying, "Hey, didn't Drill provide some kind of Yelp data? Let 
me look in Sample Datasets. Wait.. no Yelp?"

These are very handy, but hard to find: I find I must keep searching the source 
code to remember file names and directory paths. End users won't have this 
luxury.

Suggestion: Describe the files available in the class path data source.

Along these same lines, in "Connect a Data Source", there is no mention of the 
class path data source. Yet, we reference that data source in the Web Console 
where we suggest a sample query to run:

{code}
Sample SQL query: SELECT * FROM cp.`employee.json` LIMIT 20
{code}

The above query refers to the FoodMart data set.

  was:
The Drill documentation provides the "Sample Datasets" section, which is very 
handy. However, this section does not discuss the two datasets provided with 
Drill itself.

* Julian Hyde's [FoodMart data 
set|https://github.com/julianhyde/foodmart-data-hsqldb], available on the class 
path.
* TPC-H data set.

The "FoodMart" data set is available directly under {{cp}}. In fact, the Drill 
sample query (see below) references a FoodMart table. To see the list of tables 
(at development time), find the {{foodmart-data-json-0.4.jar}} file in the 
Maven dependencies for {{drill-java-exec}}. The table names here are simplified 
relative to those in the ER diagram in the above link. Perhaps include a simple 
table that lists the names and their mapping to the original names, along with 
a link to the FoodMart ER image (or simply embed the image). The data is 
available in JSON format.

TPCH data is available in `cp`.`tpch/*.parquet`, in Parquet format. The schema 
is described in the [TPC-H 
specification|(http://www.tpc.org/tpc_documents_current_versions/current_specifications.asp].

Further, in the "Tutorials" section, "Analyzing the Yelp Academic Dataset", we 
mention the Yelp data set. But, we don't mention that in the "Sample Datasets" 
section. We should, just to be consistent and to save the reader time when 
going back and saying, "Hey, didn't Drill provide some kind of Yelp data? Let 
me look in Sample Datasets. Wait.. no Yelp?"

These are very handy, but hard to find: I find I must keep searching the source 
code to remember file names and directory paths. End users won't have this 
luxury.

Suggestion: Describe the files available in the class path data source.

Along these same lines, in "Connect a Data Source", there is no mention of the 
class path data source. Yet, we reference that data source in the Web Console 
where we suggest a sample query to run:

{code}
Sample SQL query: SELECT * FROM cp.`employee.json` LIMIT 20
{code}

The above query refers to the FoodMart data set.



[jira] [Updated] (DRILL-6667) Include internal data sets in Documentation Sample Datasets

2018-08-04 Thread Paul Rogers (JIRA)



Paul Rogers updated DRILL-6667:
---
Description: 
The Drill documentation provides the "Sample Datasets" section, which is very 
handy. However, this section does not discuss the two datasets provided with 
Drill itself.

* Julian Hyde's [FoodMart data 
set|https://github.com/julianhyde/foodmart-data-hsqldb], available on the class 
path.
* TPC-H data set.

The "FoodMart" data set is available directly under {{cp}}. In fact, the Drill 
sample query (see below) references a FoodMart table. To see the list of tables 
(at development time), find the {{foodmart-data-json-0.4.jar}} file in the 
Maven dependencies for {{drill-java-exec}}. The table names here are simplified 
relative to those in the ER diagram in the above link. Perhaps include a simple 
table that lists the names and their mapping to the original names, along with 
a link to the FoodMart ER image (or simply embed the image). The data is 
available in JSON format.

TPCH data is available in `cp`.`tpch/*.parquet`, in Parquet format. The schema 
is described in the [TPC-H 
specification|(http://www.tpc.org/tpc_documents_current_versions/current_specifications.asp].

Further, in the "Tutorials" section, "Analyzing the Yelp Academic Dataset", we 
mention the Yelp data set. But, we don't mention that in the "Sample Datasets" 
section. We should, just to be consistent and to save the reader time when 
going back and saying, "Hey, didn't Drill provide some kind of Yelp data? Let 
me look in Sample Datasets. Wait.. no Yelp?"

These are very handy, but hard to find: I find I must keep searching the source 
code to remember file names and directory paths. End users won't have this 
luxury.

Suggestion: Describe the files available in the class path data source.

Along these same lines, in "Connect a Data Source", there is no mention of the 
class path data source. Yet, we reference that data source in the Web Console 
where we suggest a sample query to run:

{code}
Sample SQL query: SELECT * FROM cp.`employee.json` LIMIT 20
{code}

  was:
The Drill documentation provides the "Sample Datasets" section, which is very 
handy. However, this section does not discuss the two datasets provided with 
Drill itself.

* Julian Hyde's [FoodMart data 
set|https://github.com/julianhyde/foodmart-data-hsqldb], available on the class 
path.
* TPC-H data set.

The "FoodMart" data set is available directly under {{cp}}. In fact, the Drill 
sample query (see below) references a FoodMart table. To see the list of tables 
(at development time), find the {{foodmart-data-json-0.4.jar}} file in the 
Maven dependencies for {{drill-java-exec}}. The table names here are simplified 
relative to those in the ER diagram in the above link. Perhaps include a simple 
table that lists the names and their mapping to the original names, along with 
a link to the FoodMart ER image (or simply embed the image). The data is 
available in JSON format.

TPCH data is available in `cp`.`tpch/*.parquet`, in Parquet format. The schema 
is described in the [TPC-H 
specification](http://www.tpc.org/tpc_documents_current_versions/current_specifications.asp).

Further, in the "Tutorials" section, "Analyzing the Yelp Academic Dataset", we 
mention the Yelp data set. But, we don't mention that in the "Sample Datasets" 
section. We should, just to be consistent and to save the reader time when 
going back and saying, "Hey, didn't Drill provide some kind of Yelp data? Let 
me look in Sample Datasets. Wait.. no Yelp?"

These are very handy, but hard to find: I find I must keep searching the source 
code to remember file names and directory paths. End users won't have this 
luxury.

Suggestion: Describe the files available in the class path data source.

Along these same lines, in "Connect a Data Source", there is no mention of the 
class path data source. Yet, we reference that data source in the Web Console 
where we suggest a sample query to run:

{code}
Sample SQL query: SELECT * FROM cp.`employee.json` LIMIT 20
{code}

