[jira] [Updated] (DRILL-6669) List all sample data sets in Documentation "Sample Datasets" section

2018-08-04 Thread Paul Rogers (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated DRILL-6669:
---
Description: 
In the "Tutorials" section, "Analyzing the Yelp Academic Dataset", we mention 
the Yelp data set. But, we don't mention that in the "Sample Datasets" section. 
We should, just to be consistent and to save the reader time when going back 
and saying, "Hey, didn't Drill provide some kind of Yelp data? Let me look in 
Sample Datasets. Wait.. no Yelp?"

Also, it turns out that we mention other sample datasets in the 
documentation. In the [CTAS|http://drill.apache.org/docs/create-table-as-ctas/] 
page we mention the Google ngram data set.

Would be great if we could list all sample datasets, including the above, and 
those on the class path (See DRILL-6667) in the "Sample Datasets" section for 
easy reference.

  was:
In the "Tutorials" section, "Analyzing the Yelp Academic Dataset", we mention 
the Yelp data set. But, we don't mention that in the "Sample Datasets" section. 
We should, just to be consistent and to save the reader time when going back 
and saying, "Hey, didn't Drill provide some kind of Yelp data? Let me look in 
Sample Datasets. Wait.. no Yelp?"

Also, it turns out that we mention other sample datasets in the 
documentation. In the [CTAS|http://drill.apache.org/docs/create-table-as-ctas/] 
page we mention the Google ngram data set.

Would be great if we could list all sample datasets, including the above, and 
those on the class path (See ) in the "Sample Datasets" section for easy 
reference.


> List all sample data sets in Documentation "Sample Datasets" section
> 
>
> Key: DRILL-6669
> URL: https://issues.apache.org/jira/browse/DRILL-6669
> Project: Apache Drill
> Issue Type: Improvement
> Components: Documentation
> Affects Versions: 1.14.0
> Reporter: Paul Rogers
> Assignee: Bridget Bevens
> Priority: Minor
>
> In the "Tutorials" section, "Analyzing the Yelp Academic Dataset", we mention 
> the Yelp data set. But, we don't mention that in the "Sample Datasets" 
> section. We should, just to be consistent and to save the reader time when 
> going back and saying, "Hey, didn't Drill provide some kind of Yelp data? Let 
> me look in Sample Datasets. Wait.. no Yelp?"
> Also, it turns out that we mention other sample datasets in the 
> documentation. In the 
> [CTAS|http://drill.apache.org/docs/create-table-as-ctas/] page we mention the 
> Google ngram data set.
> Would be great if we could list all sample datasets, including the above, and 
> those on the class path (See DRILL-6667) in the "Sample Datasets" section for 
> easy reference.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6667) Include internal data sets in documentation Sample Datasets

2018-08-04 Thread Paul Rogers (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated DRILL-6667:
---
Description: 
The Drill documentation provides the "Sample Datasets" section, which is very 
handy. However, this section does not discuss the two datasets provided with 
Drill itself.

* Julian Hyde's [FoodMart data 
set|https://github.com/julianhyde/foodmart-data-hsqldb], available on the class 
path.
* TPC-H data set, available on the class path in {{tpch}}

The "FoodMart" data set is available directly under {{cp}}. In fact, the Drill 
sample query (see below) references a FoodMart table. To see the list of tables 
(at development time), find the {{foodmart-data-json-0.4.jar}} file in the 
Maven dependencies for {{drill-java-exec}}. The table names here are simplified 
relative to those in the ER diagram in the above link. Perhaps include a simple 
table of names with the mapping to the original names, and link to (or just 
embed) the FoodMart ER image. The data is available in JSON format.

TPC-H data is available in {{`cp`.`tpch/*.parquet`}}, in Parquet format. The 
schema is described in the [TPC-H 
specification|http://www.tpc.org/tpc_documents_current_versions/current_specifications.asp].

These are very handy, but hard to find: I find I must keep searching the source 
code to remember file names and directory paths. End users won't have this 
luxury.

Suggestion: Describe the files available in the class path data source.

Along these same lines, in "Connect a Data Source", there is no mention of the 
class path data source. Yet, we reference that data source in the Web Console 
where we suggest a sample query to run:

{code}
Sample SQL query: SELECT * FROM cp.`employee.json` LIMIT 20
{code}

The above query refers to the FoodMart data set.
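
As a rough sketch of what the "Sample Datasets" page could show for the class path data, the queries below read one table from each bundled data set. The {{employee.json}} table comes from the sample query above. The column names {{full_name}} and {{position_title}}, and the file name {{tpch/nation.parquet}} (one possible match for the {{tpch/*.parquet}} pattern, named after a table in the TPC-H specification), are assumptions to check against the jar contents.

```sql
-- FoodMart (JSON): the table used by the Web Console sample query.
-- Column names are assumed; fall back to SELECT * if they differ.
SELECT full_name, position_title
FROM cp.`employee.json`
LIMIT 5;

-- TPC-H (Parquet): assumes a file named nation.parquet under tpch/.
SELECT *
FROM cp.`tpch/nation.parquet`
LIMIT 5;
```

Each statement is meant to be run on its own.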

  was:
The Drill documentation provides the "Sample Datasets" section, which is very 
handy. However, this section does not discuss the two datasets provided with 
Drill itself.

* Julian Hyde's [FoodMart data 
set|https://github.com/julianhyde/foodmart-data-hsqldb], available on the class 
path.
* TPC-H data set, available on the class path in {{tpch}}

The "FoodMart" data set is available directly under {{cp}}. In fact, the Drill 
sample query (see below) references a FoodMart table. To see the list of tables 
(at development time), find the {{foodmart-data-json-0.4.jar}} file in the 
Maven dependencies for {{drill-java-exec}}. The table names here are simplified 
relative to those in the ER diagram in the above link. Perhaps include a simple 
table of names with the mapping to the original names, and link to (or just 
embed) the FoodMart ER image. The data is available in JSON format.

TPC-H data is available in {{`cp`.`tpch/*.parquet`}}, in Parquet format. The 
schema is described in the [TPC-H 
specification|http://www.tpc.org/tpc_documents_current_versions/current_specifications.asp].

Suggestion: Describe the files available in the class path data source.

Along these same lines, in "Connect a Data Source", there is no mention of the 
class path data source. Yet, we reference that data source in the Web Console 
where we suggest a sample query to run:

{code}
Sample SQL query: SELECT * FROM cp.`employee.json` LIMIT 20
{code}

The above query refers to the FoodMart data set.


> Include internal data sets in documentation Sample Datasets
> ---
>
> Key: DRILL-6667
> URL: https://issues.apache.org/jira/browse/DRILL-6667
> Project: Apache Drill
> Issue Type: Improvement
> Components: Documentation
> Affects Versions: 1.13.0
> Reporter: Paul Rogers
> Assignee: Bridget Bevens
> Priority: Minor
>
> The Drill documentation provides the "Sample Datasets" section, which is very 
> handy. However, this section does not discuss the two datasets provided with 
> Drill itself.
> * Julian Hyde's [FoodMart data 
> set|https://github.com/julianhyde/foodmart-data-hsqldb], available on the 
> class path.
> * TPC-H data set, available on the class path in {{tpch}}
> The "FoodMart" data set is available directly under {{cp}}. In fact, the 
> Drill sample query (see below) references a FoodMart table. To see the list 
> of tables (at development time), find the {{foodmart-data-json-0.4.jar}} file 
> in the Maven dependencies for {{drill-java-exec}}. The table names here are 
> simplified relative to those in the ER diagram in the above link. Perhaps 
> include a simple table of names with the mapping to the original names, and 
> link to (or just embed) the FoodMart ER image. The data is available in JSON 
> format.
> TPC-H data is available in {{`cp`.`tpch/*.parquet`}}, in Parquet format. The 
> schema is described in the [TPC-H 
> 

[jira] [Created] (DRILL-6669) List all sample data sets in Documentation "Sample Datasets" section

2018-08-04 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-6669:
--

Summary: List all sample data sets in Documentation "Sample Datasets" section
Key: DRILL-6669
URL: https://issues.apache.org/jira/browse/DRILL-6669
Project: Apache Drill
Issue Type: Improvement
Components: Documentation
Affects Versions: 1.14.0
Reporter: Paul Rogers
Assignee: Bridget Bevens


In the "Tutorials" section, "Analyzing the Yelp Academic Dataset", we mention 
the Yelp data set. But, we don't mention that in the "Sample Datasets" section. 
We should, just to be consistent and to save the reader time when going back 
and saying, "Hey, didn't Drill provide some kind of Yelp data? Let me look in 
Sample Datasets. Wait.. no Yelp?"

Also, it turns out that we mention other sample datasets in the 
documentation. In the [CTAS|http://drill.apache.org/docs/create-table-as-ctas/] 
page we mention the Google ngram data set.

Would be great if we could list all sample datasets, including the above, and 
those on the class path (See ) in the "Sample Datasets" section for easy 
reference.





[jira] [Updated] (DRILL-6667) Include internal data sets in documentation Sample Datasets

2018-08-04 Thread Paul Rogers (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated DRILL-6667:
---
Description: 
The Drill documentation provides the "Sample Datasets" section, which is very 
handy. However, this section does not discuss the two datasets provided with 
Drill itself.

* Julian Hyde's [FoodMart data 
set|https://github.com/julianhyde/foodmart-data-hsqldb], available on the class 
path.
* TPC-H data set, available on the class path in {{tpch}}

The "FoodMart" data set is available directly under {{cp}}. In fact, the Drill 
sample query (see below) references a FoodMart table. To see the list of tables 
(at development time), find the {{foodmart-data-json-0.4.jar}} file in the 
Maven dependencies for {{drill-java-exec}}. The table names here are simplified 
relative to those in the ER diagram in the above link. Perhaps include a simple 
table of names with the mapping to the original names, and link to (or just 
embed) the FoodMart ER image. The data is available in JSON format.

TPC-H data is available in {{`cp`.`tpch/*.parquet`}}, in Parquet format. The 
schema is described in the [TPC-H 
specification|http://www.tpc.org/tpc_documents_current_versions/current_specifications.asp].

Suggestion: Describe the files available in the class path data source.

Along these same lines, in "Connect a Data Source", there is no mention of the 
class path data source. Yet, we reference that data source in the Web Console 
where we suggest a sample query to run:

{code}
Sample SQL query: SELECT * FROM cp.`employee.json` LIMIT 20
{code}

The above query refers to the FoodMart data set.

  was:
The Drill documentation provides the "Sample Datasets" section, which is very 
handy. However, this section does not discuss the two datasets provided with 
Drill itself.

* Julian Hyde's [FoodMart data 
set|https://github.com/julianhyde/foodmart-data-hsqldb], available on the class 
path.
* TPC-H data set, available on the class path in {{tpch}}

The "FoodMart" data set is available directly under {{cp}}. In fact, the Drill 
sample query (see below) references a FoodMart table. To see the list of tables 
(at development time), find the {{foodmart-data-json-0.4.jar}} file in the 
Maven dependencies for {{drill-java-exec}}. The table names here are simplified 
relative to those in the ER diagram in the above link. Perhaps include a simple 
table of names with the mapping to the original names, and link to (or just 
embed) the FoodMart ER image. The data is available in JSON format.

TPC-H data is available in {{`cp`.`tpch/*.parquet`}}, in Parquet format. The 
schema is described in the [TPC-H 
specification|http://www.tpc.org/tpc_documents_current_versions/current_specifications.asp].

Further, in the "Tutorials" section, "Analyzing the Yelp Academic Dataset", we 
mention the Yelp data set. But, we don't mention that in the "Sample Datasets" 
section. We should, just to be consistent and to save the reader time when 
going back and saying, "Hey, didn't Drill provide some kind of Yelp data? Let 
me look in Sample Datasets. Wait.. no Yelp?"

These are very handy, but hard to find: I find I must keep searching the source 
code to remember file names and directory paths. End users won't have this 
luxury.

Suggestion: Describe the files available in the class path data source.

Along these same lines, in "Connect a Data Source", there is no mention of the 
class path data source. Yet, we reference that data source in the Web Console 
where we suggest a sample query to run:

{code}
Sample SQL query: SELECT * FROM cp.`employee.json` LIMIT 20
{code}

The above query refers to the FoodMart data set.


> Include internal data sets in documentation Sample Datasets
> ---
>
> Key: DRILL-6667
> URL: https://issues.apache.org/jira/browse/DRILL-6667
> Project: Apache Drill
> Issue Type: Improvement
> Components: Documentation
> Affects Versions: 1.13.0
> Reporter: Paul Rogers
> Assignee: Bridget Bevens
> Priority: Minor
>
> The Drill documentation provides the "Sample Datasets" section, which is very 
> handy. However, this section does not discuss the two datasets provided with 
> Drill itself.
> * Julian Hyde's [FoodMart data 
> set|https://github.com/julianhyde/foodmart-data-hsqldb], available on the 
> class path.
> * TPC-H data set, available on the class path in {{tpch}}
> The "FoodMart" data set is available directly under {{cp}}. In fact, the 
> Drill sample query (see below) references a FoodMart table. To see the list 
> of tables (at development time), find the {{foodmart-data-json-0.4.jar}} file 
> in the Maven dependencies for {{drill-java-exec}}. The table names here are 
> simplified relative to those in the ER diagram in the above link. 

[jira] [Updated] (DRILL-6668) In Web Console, highlight options that are different from default values

2018-08-04 Thread Paul Rogers (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated DRILL-6668:
---
Description: 
Suppose you inherit a Drill setup created by someone else (or by you, some time 
in the past). Or, suppose you are a support person. You want to know which 
Drill options have been changed from the defaults.

The Web UI conveniently displays all options. But, there is no indication of 
which might have non-default values.

After the improvements of the last year, the information needed to detect 
non-default values is now available. Would be great to mark these values. 
Perhaps using colors, perhaps with words.
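
One way to see that information from SQL is the {{sys.options}} system table. The sketch below assumes the table exposes a {{status}} column whose value is {{CHANGED}} for options that differ from their defaults; that column name and value are assumptions to verify against the Drill version in use.

```sql
-- List only the options whose value differs from the default.
-- Assumes sys.options has a status column with the value 'CHANGED'.
SELECT *
FROM sys.options
WHERE status = 'CHANGED';
```

The Web UI could apply the same test to decide which rows to mark.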

For example:

*planner.width.max_per_node*  200 \[Update]

Or

planner.width.max_per_node (system) 200 \[Update]

(The Web UI does not, I believe, show session settings, since the Web UI has no 
sessions. I believe the custom values are all set by {{ALTER SYSTEM}}. 
Otherwise, we could also have a "(session)" suffix above.)

Then, in addition to the {{[Update]}} button, for non-default values, also 
provide a {{[Reset]}} button that does the same as {{ALTER SESSION RESET}}.

planner.width.max_per_node (session) 200 \[Update] \[Reset]
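
For reference, the SQL that the proposed buttons would stand in for looks roughly like this. The option name and value are the ones from the example above; whether the Web UI would issue the system-level or the session-level form is left open here.

```sql
-- What [Update] amounts to: set a system-wide value.
ALTER SYSTEM SET `planner.width.max_per_node` = 200;

-- What [Reset] could amount to: return the option to its default,
-- at system scope or at session scope.
ALTER SYSTEM RESET `planner.width.max_per_node`;
ALTER SESSION RESET `planner.width.max_per_node`;
```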


  was:
Suppose you inherit a Drill setup created by someone else (or by you, some time 
in the past). Or, suppose you are a support person. You want to know which 
Drill options have been changed from the defaults.

The Web UI conveniently displays all options. But, there is no indication of 
which might have non-default values.

After the improvements of the last year, the information needed to detect 
non-default values is now available. Would be great to mark these values. 
Perhaps using colors, perhaps with words.

For example:

*planner.width.max_per_node*  200 \[Update]

Or

planner.width.max_per_node (session) 200 \[Update]
store.json.all_text_mode (system) true \[Update]


Then, in addition to the {{[Update]}} button, for non-default values, also 
provide a {{[Reset]}} button that does the same as {{ALTER SESSION RESET}}.

planner.width.max_per_node (session) 200 \[Update] \[Reset]



> In Web Console, highlight options that are different from default values
> 
>
> Key: DRILL-6668
> URL: https://issues.apache.org/jira/browse/DRILL-6668
> Project: Apache Drill
> Issue Type: Improvement
> Components: Web Server
> Affects Versions: 1.14.0
> Reporter: Paul Rogers
> Priority: Minor
>
> Suppose you inherit a Drill setup created by someone else (or by you, some 
> time in the past). Or, suppose you are a support person. You want to know 
> which Drill options have been changed from the defaults.
> The Web UI conveniently displays all options. But, there is no indication of 
> which might have non-default values.
> After the improvements of the last year, the information needed to detect 
> non-default values is now available. Would be great to mark these values. 
> Perhaps using colors, perhaps with words.
> For example:
> *planner.width.max_per_node*  200 \[Update]
> Or
> planner.width.max_per_node (system) 200 \[Update]
> (The Web UI does not, I believe, show session settings, since the Web UI has 
> no sessions. I believe the custom values are all set by {{ALTER SYSTEM}}. 
> Otherwise, we could also have a "(session)" suffix above.)
> Then, in addition to the {{[Update]}} button, for non-default values, also 
> provide a {{[Reset]}} button that does the same as {{ALTER SESSION RESET}}.
> planner.width.max_per_node (session) 200 \[Update] \[Reset]





[jira] [Created] (DRILL-6668) In Web Console, highlight options that are different from default values

2018-08-04 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-6668:
--

Summary: In Web Console, highlight options that are different from default values
Key: DRILL-6668
URL: https://issues.apache.org/jira/browse/DRILL-6668
Project: Apache Drill
Issue Type: Improvement
Components: Web Server
Affects Versions: 1.14.0
Reporter: Paul Rogers


Suppose you inherit a Drill setup created by someone else (or by you, some time 
in the past). Or, suppose you are a support person. You want to know which 
Drill options have been changed from the defaults.

The Web UI conveniently displays all options. But, there is no indication of 
which might have non-default values.

After the improvements of the last year, the information needed to detect 
non-default values is now available. Would be great to mark these values. 
Perhaps using colors, perhaps with words.

For example:

*planner.width.max_per_node*  200 \[Update]

Or

planner.width.max_per_node (session) 200 \[Update] 
store.json.all_text_mode (system) true \[Update]


Then, in addition to the {{[Update]}} button, for non-default values, also 
provide a {{[Reset]}} button that does the same as {{ALTER SESSION RESET}}.

planner.width.max_per_node (session) 200 \[Update] \[Reset]






[jira] [Updated] (DRILL-6668) In Web Console, highlight options that are different from default values

2018-08-04 Thread Paul Rogers (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated DRILL-6668:
---
Description: 
Suppose you inherit a Drill setup created by someone else (or by you, some time 
in the past). Or, suppose you are a support person. You want to know which 
Drill options have been changed from the defaults.

The Web UI conveniently displays all options. But, there is no indication of 
which might have non-default values.

After the improvements of the last year, the information needed to detect 
non-default values is now available. Would be great to mark these values. 
Perhaps using colors, perhaps with words.

For example:

*planner.width.max_per_node*  200 \[Update]

Or

planner.width.max_per_node (session) 200 \[Update]
store.json.all_text_mode (system) true \[Update]


Then, in addition to the {{[Update]}} button, for non-default values, also 
provide a {{[Reset]}} button that does the same as {{ALTER SESSION RESET}}.

planner.width.max_per_node (session) 200 \[Update] \[Reset]


  was:
Suppose you inherit a Drill setup created by someone else (or by you, some time 
in the past). Or, suppose you are a support person. You want to know which 
Drill options have been changed from the defaults.

The Web UI conveniently displays all options. But, there is no indication of 
which might have non-default values.

After the improvements of the last year, the information needed to detect 
non-default values is now available. Would be great to mark these values. 
Perhaps using colors, perhaps with words.

For example:

*planner.width.max_per_node*  200 \[Update]

Or

planner.width.max_per_node (session) 200 \[Update] 
store.json.all_text_mode (system) true \[Update]


Then, in addition to the {{[Update]}} button, for non-default values, also 
provide a {{[Reset]}} button that does the same as {{ALTER SESSION RESET}}.

planner.width.max_per_node (session) 200 \[Update] \[Reset]



> In Web Console, highlight options that are different from default values
> 
>
> Key: DRILL-6668
> URL: https://issues.apache.org/jira/browse/DRILL-6668
> Project: Apache Drill
> Issue Type: Improvement
> Components: Web Server
> Affects Versions: 1.14.0
> Reporter: Paul Rogers
> Priority: Minor
>
> Suppose you inherit a Drill setup created by someone else (or by you, some 
> time in the past). Or, suppose you are a support person. You want to know 
> which Drill options have been changed from the defaults.
> The Web UI conveniently displays all options. But, there is no indication of 
> which might have non-default values.
> After the improvements of the last year, the information needed to detect 
> non-default values is now available. Would be great to mark these values. 
> Perhaps using colors, perhaps with words.
> For example:
> *planner.width.max_per_node*  200 \[Update]
> Or
> planner.width.max_per_node (session) 200 \[Update]
> store.json.all_text_mode (system) true \[Update]
> Then, in addition to the {{[Update]}} button, for non-default values, also 
> provide a {{[Reset]}} button that does the same as {{ALTER SESSION RESET}}.
> planner.width.max_per_node (session) 200 \[Update] \[Reset]





[jira] [Updated] (DRILL-6667) Include internal data sets in documentation Sample Datasets

2018-08-04 Thread Paul Rogers (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated DRILL-6667:
---
Summary: Include internal data sets in documentation Sample Datasets  (was: 
Include internal data sets in Documentation Sample Datasets)

> Include internal data sets in documentation Sample Datasets
> ---
>
> Key: DRILL-6667
> URL: https://issues.apache.org/jira/browse/DRILL-6667
> Project: Apache Drill
> Issue Type: Improvement
> Components: Documentation
> Affects Versions: 1.13.0
> Reporter: Paul Rogers
> Assignee: Bridget Bevens
> Priority: Minor
>
> The Drill documentation provides the "Sample Datasets" section, which is very 
> handy. However, this section does not discuss the two datasets provided with 
> Drill itself.
> * Julian Hyde's [FoodMart data 
> set|https://github.com/julianhyde/foodmart-data-hsqldb], available on the 
> class path.
> * TPC-H data set, available on the class path in {{tpch}}
> The "FoodMart" data set is available directly under {{cp}}. In fact, the 
> Drill sample query (see below) references a FoodMart table. To see the list 
> of tables (at development time), find the {{foodmart-data-json-0.4.jar}} file 
> in the Maven dependencies for {{drill-java-exec}}. The table names here are 
> simplified relative to those in the ER diagram in the above link. Perhaps 
> include a simple table of names with the mapping to the original names, and 
> link to (or just embed) the FoodMart ER image. The data is available in JSON 
> format.
> TPC-H data is available in {{`cp`.`tpch/*.parquet`}}, in Parquet format. The 
> schema is described in the [TPC-H 
> specification|http://www.tpc.org/tpc_documents_current_versions/current_specifications.asp].
> Further, in the "Tutorials" section, "Analyzing the Yelp Academic Dataset", 
> we mention the Yelp data set. But, we don't mention that in the "Sample 
> Datasets" section. We should, just to be consistent and to save the reader 
> time when going back and saying, "Hey, didn't Drill provide some kind of Yelp 
> data? Let me look in Sample Datasets. Wait.. no Yelp?"
> These are very handy, but hard to find: I find I must keep searching the 
> source code to remember file names and directory paths. End users won't have 
> this luxury.
> Suggestion: Describe the files available in the class path data source.
> Along these same lines, in "Connect a Data Source", there is no mention of 
> the class path data source. Yet, we reference that data source in the Web 
> Console where we suggest a sample query to run:
> {code}
> Sample SQL query: SELECT * FROM cp.`employee.json` LIMIT 20
> {code}
> The above query refers to the FoodMart data set.





[jira] [Updated] (DRILL-6667) Include internal data sets in Documentation Sample Datasets

2018-08-04 Thread Paul Rogers (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated DRILL-6667:
---
Description: 
The Drill documentation provides the "Sample Datasets" section, which is very 
handy. However, this section does not discuss the two datasets provided with 
Drill itself.

* Julian Hyde's [FoodMart data 
set|https://github.com/julianhyde/foodmart-data-hsqldb], available on the class 
path.
* TPC-H data set, available on the class path in {{tpch}}

The "FoodMart" data set is available directly under {{cp}}. In fact, the Drill 
sample query (see below) references a FoodMart table. To see the list of tables 
(at development time), find the {{foodmart-data-json-0.4.jar}} file in the 
Maven dependencies for {{drill-java-exec}}. The table names here are simplified 
relative to those in the ER diagram in the above link. Perhaps include a simple 
table of names with the mapping to the original names, and link to (or just 
embed) the FoodMart ER image. The data is available in JSON format.

TPC-H data is available in {{`cp`.`tpch/*.parquet`}}, in Parquet format. The 
schema is described in the [TPC-H 
specification|http://www.tpc.org/tpc_documents_current_versions/current_specifications.asp].

Further, in the "Tutorials" section, "Analyzing the Yelp Academic Dataset", we 
mention the Yelp data set. But, we don't mention that in the "Sample Datasets" 
section. We should, just to be consistent and to save the reader time when 
going back and saying, "Hey, didn't Drill provide some kind of Yelp data? Let 
me look in Sample Datasets. Wait.. no Yelp?"

These are very handy, but hard to find: I find I must keep searching the source 
code to remember file names and directory paths. End users won't have this 
luxury.

Suggestion: Describe the files available in the class path data source.

Along these same lines, in "Connect a Data Source", there is no mention of the 
class path data source. Yet, we reference that data source in the Web Console 
where we suggest a sample query to run:

{code}
Sample SQL query: SELECT * FROM cp.`employee.json` LIMIT 20
{code}

The above query refers to the FoodMart data set.

  was:
The Drill documentation provides the "Sample Datasets" section, which is very 
handy. However, this section does not discuss the two datasets provided with 
Drill itself.

* Julian Hyde's [FoodMart data 
set|https://github.com/julianhyde/foodmart-data-hsqldb], available on the class 
path.
* TPC-H data set.

The "FoodMart" data set is available directly under {{cp}}. In fact, the Drill 
sample query (see below) references a FoodMart table. To see the list of tables 
(at development time), find the {{foodmart-data-json-0.4.jar}} file in the 
Maven dependencies for {{drill-java-exec}}. The table names here are simplified 
relative to those in the ER diagram in the above link. Perhaps include a simple 
table of names with the mapping to the original names, and link to (or just 
embed) the FoodMart ER image. The data is available in JSON format.

TPC-H data is available in `cp`.`tpch/*.parquet`, in Parquet format. The schema 
is described in the [TPC-H 
specification|http://www.tpc.org/tpc_documents_current_versions/current_specifications.asp].

Further, in the "Tutorials" section, "Analyzing the Yelp Academic Dataset", we 
mention the Yelp data set. But, we don't mention that in the "Sample Datasets" 
section. We should, just to be consistent and to save the reader time when 
going back and saying, "Hey, didn't Drill provide some kind of Yelp data? Let 
me look in Sample Datasets. Wait.. no Yelp?"

These are very handy, but hard to find: I find I must keep searching the source 
code to remember file names and directory paths. End users won't have this 
luxury.

Suggestion: Describe the files available in the class path data source.

Along these same lines, in "Connect a Data Source", there is no mention of the 
class path data source. Yet, we reference that data source in the Web Console 
where we suggest a sample query to run:

{code}
Sample SQL query: SELECT * FROM cp.`employee.json` LIMIT 20
{code}

The above query refers to the FoodMart data set.


> Include internal data sets in Documentation Sample Datasets
> ---
>
> Key: DRILL-6667
> URL: https://issues.apache.org/jira/browse/DRILL-6667
> Project: Apache Drill
> Issue Type: Improvement
> Components: Documentation
> Affects Versions: 1.13.0
> Reporter: Paul Rogers
> Assignee: Bridget Bevens
> Priority: Minor
>
> The Drill documentation provides the "Sample Datasets" section, which is very 
> handy. However, this section does not discuss the two datasets provided with 
> Drill itself.
> * Julian Hyde's [FoodMart data 
> 

[jira] [Updated] (DRILL-6667) Include internal data sets in Documentation Sample Datasets

2018-08-04 Thread Paul Rogers (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated DRILL-6667:
---
Description: 
The Drill documentation provides the "Sample Datasets" section, which is very 
handy. However, this section does not discuss the two datasets provided with 
Drill itself.

* Julian Hyde's [FoodMart data 
set|https://github.com/julianhyde/foodmart-data-hsqldb], available on the class 
path.
* TPC-H data set.

The "FoodMart" data set is available directly under {{cp}}. In fact, the Drill 
sample query (see below) references a FoodMart table. To see the list of tables 
(at development time), find the {{foodmart-data-json-0.4.jar}} file in the 
Maven dependencies for {{drill-java-exec}}. The table names here are simplified 
relative to those in the ER diagram in the above link. Perhaps include a simple 
table of names with the mapping to the original names, and link to (or just 
embed) the FoodMart ER image. The data is available in JSON format.

TPC-H data is available in `cp`.`tpch/*.parquet`, in Parquet format. The schema 
is described in the [TPC-H 
specification|http://www.tpc.org/tpc_documents_current_versions/current_specifications.asp].

Further, in the "Tutorials" section, "Analyzing the Yelp Academic Dataset", we 
mention the Yelp data set. But, we don't mention that in the "Sample Datasets" 
section. We should, just to be consistent and to save the reader time when 
going back and saying, "Hey, didn't Drill provide some kind of Yelp data? Let 
me look in Sample Datasets. Wait.. no Yelp?"

These are very handy, but hard to find: I find I must keep searching the source 
code to remember file names and directory paths. End users won't have this
luxury.

Suggestion: Describe the files available in the class path data source.

Along these same lines, in "Connect a Data Source", there is no mention of the 
class path data source. Yet, we reference that data source in the Web Console 
where we suggest a sample query to run:

{code}
Sample SQL query: SELECT * FROM cp.`employee.json` LIMIT 20
{code}

The above query refers to the FoodMart data set.

  was:
The Drill documentation provides the "Sample Datasets" section, which is very 
handy. However, this section does not discuss the two datasets provided with 
Drill itself.

* Julian Hyde's [FoodMart data 
set|https://github.com/julianhyde/foodmart-data-hsqldb], available on the class 
path.
* TPC-H data set.

The "FoodMart" data set is available directly under {{cp}}. In fact, the Drill 
sample query (see below) references a FoodMart table. To see the list of tables 
(at development time), find the {{foodmart-data-json-0.4.jar}} file in the
Maven dependencies for {{drill-java-exec}}. The table names here are simplified
relative to those in the ER diagram in the above link. Perhaps include a simple
table of names with the mapping to the original names, and a link to (or an
embedded copy of) the FoodMart ER image. The data is available in JSON format.

TPCH data is available in `cp`.`tpch/*.parquet`, in Parquet format. The schema 
is described in the [TPC-H 
specification|(http://www.tpc.org/tpc_documents_current_versions/current_specifications.asp].

Further, in the "Tutorials" section, "Analyzing the Yelp Academic Dataset", we 
mention the Yelp data set. But, we don't mention that in the "Sample Datasets" 
section. We should, just to be consistent and to save the reader time when 
going back and saying, "Hey, didn't Drill provide some kind of Yelp data? Let 
me look in Sample Datasets. Wait.. no Yelp?"

These are very handy, but hard to find: I find I must keep searching the source 
code to remember file names and directory paths. End users won't have this
luxury.

Suggestion: Describe the files available in the class path data source.

Along these same lines, in "Connect a Data Source", there is no mention of the 
class path data source. Yet, we reference that data source in the Web Console 
where we suggest a sample query to run:

{code}
Sample SQL query: SELECT * FROM cp.`employee.json` LIMIT 20
{code}


> Include internal data sets in Documentation Sample Datasets
> ---
>
> Key: DRILL-6667
> URL: https://issues.apache.org/jira/browse/DRILL-6667
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Assignee: Bridget Bevens
>Priority: Minor
>
> The Drill documentation provides the "Sample Datasets" section, which is very 
> handy. However, this section does not discuss the two datasets provided with 
> Drill itself.
> * Julian Hyde's [FoodMart data 
> set|https://github.com/julianhyde/foodmart-data-hsqldb], available on the 
> class path.
> * TPC-H data set.
> The "FoodMart" data set is 

[jira] [Updated] (DRILL-6667) Include internal data sets in Documentation Sample Datasets

2018-08-04 Thread Paul Rogers (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated DRILL-6667:
---
Description: 
The Drill documentation provides the "Sample Datasets" section, which is very 
handy. However, this section does not discuss the two datasets provided with 
Drill itself.

* Julian Hyde's [FoodMart data 
set|https://github.com/julianhyde/foodmart-data-hsqldb], available on the class 
path.
* TPC-H data set.

The "FoodMart" data set is available directly under {{cp}}. In fact, the Drill 
sample query (see below) references a FoodMart table. To see the list of tables 
(at development time), find the {{foodmart-data-json-0.4.jar}} file in the
Maven dependencies for {{drill-java-exec}}. The table names here are simplified
relative to those in the ER diagram in the above link. Perhaps include a simple
table of names with the mapping to the original names, and a link to (or an
embedded copy of) the FoodMart ER image. The data is available in JSON format.

TPCH data is available in `cp`.`tpch/*.parquet`, in Parquet format. The schema 
is described in the [TPC-H 
specification|http://www.tpc.org/tpc_documents_current_versions/current_specifications.asp].

Further, in the "Tutorials" section, "Analyzing the Yelp Academic Dataset", we 
mention the Yelp data set. But, we don't mention that in the "Sample Datasets" 
section. We should, just to be consistent and to save the reader time when 
going back and saying, "Hey, didn't Drill provide some kind of Yelp data? Let 
me look in Sample Datasets. Wait.. no Yelp?"

These are very handy, but hard to find: I find I must keep searching the source 
code to remember file names and directory paths. End users won't have this
luxury.

Suggestion: Describe the files available in the class path data source.

Along these same lines, in "Connect a Data Source", there is no mention of the 
class path data source. Yet, we reference that data source in the Web Console 
where we suggest a sample query to run:

{code}
Sample SQL query: SELECT * FROM cp.`employee.json` LIMIT 20
{code}

The above query refers to the FoodMart data set.

  was:
The Drill documentation provides the "Sample Datasets" section, which is very 
handy. However, this section does not discuss the two datasets provided with 
Drill itself.

* Julian Hyde's [FoodMart data 
set|https://github.com/julianhyde/foodmart-data-hsqldb], available on the class 
path.
* TPC-H data set.

The "FoodMart" data set is available directly under {{cp}}. In fact, the Drill 
sample query (see below) references a FoodMart table. To see the list of tables 
(at development time), find the {{foodmart-data-json-0.4.jar}} file in the
Maven dependencies for {{drill-java-exec}}. The table names here are simplified
relative to those in the ER diagram in the above link. Perhaps include a simple
table of names with the mapping to the original names, and a link to (or an
embedded copy of) the FoodMart ER image. The data is available in JSON format.

TPCH data is available in `cp`.`tpch/*.parquet`, in Parquet format. The schema 
is described in the [TPC-H 
specification|(http://www.tpc.org/tpc_documents_current_versions/current_specifications.asp].

Further, in the "Tutorials" section, "Analyzing the Yelp Academic Dataset", we 
mention the Yelp data set. But, we don't mention that in the "Sample Datasets" 
section. We should, just to be consistent and to save the reader time when 
going back and saying, "Hey, didn't Drill provide some kind of Yelp data? Let 
me look in Sample Datasets. Wait.. no Yelp?"

These are very handy, but hard to find: I find I must keep searching the source 
code to remember file names and directory paths. End users won't have this
luxury.

Suggestion: Describe the files available in the class path data source.

Along these same lines, in "Connect a Data Source", there is no mention of the 
class path data source. Yet, we reference that data source in the Web Console 
where we suggest a sample query to run:

{code}
Sample SQL query: SELECT * FROM cp.`employee.json` LIMIT 20
{code}

The above query refers to the FoodMart data set.


> Include internal data sets in Documentation Sample Datasets
> ---
>
> Key: DRILL-6667
> URL: https://issues.apache.org/jira/browse/DRILL-6667
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Assignee: Bridget Bevens
>Priority: Minor
>
> The Drill documentation provides the "Sample Datasets" section, which is very 
> handy. However, this section does not discuss the two datasets provided with 
> Drill itself.
> * Julian Hyde's [FoodMart data 
> set|https://github.com/julianhyde/foodmart-data-hsqldb], available on the 
> class path.
> * 

[jira] [Created] (DRILL-6667) Include internal data sets in Documentation Sample Datasets

2018-08-04 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-6667:
--

 Summary: Include internal data sets in Documentation Sample 
Datasets
 Key: DRILL-6667
 URL: https://issues.apache.org/jira/browse/DRILL-6667
 Project: Apache Drill
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 1.13.0
Reporter: Paul Rogers
Assignee: Bridget Bevens


The Drill documentation provides the "Sample Datasets" section, which is very 
handy. However, this section does not discuss the two datasets provided with 
Drill itself.

* Julian Hyde's [FoodMart data 
set|https://github.com/julianhyde/foodmart-data-hsqldb], available on the class 
path.
* TPC-H data set.

The "FoodMart" data set is available directly under {{cp}}. In fact, the Drill 
sample query (see below) references a FoodMart table. To see the list of tables 
(at development time), find the {{foodmart-data-json-0.4.jar}} file in the
Maven dependencies for {{drill-java-exec}}. The table names here are simplified
relative to those in the ER diagram in the above link. Perhaps include a simple
table of names with the mapping to the original names, and a link to (or an
embedded copy of) the FoodMart ER image. The data is available in JSON format.

TPCH data is available in `cp`.`tpch/*.parquet`, in Parquet format. The schema 
is described in the [TPC-H 
specification](http://www.tpc.org/tpc_documents_current_versions/current_specifications.asp).

Further, in the "Tutorials" section, "Analyzing the Yelp Academic Dataset", we 
mention the Yelp data set. But, we don't mention that in the "Sample Datasets" 
section. We should, just to be consistent and to save the reader time when 
going back and saying, "Hey, didn't Drill provide some kind of Yelp data? Let 
me look in Sample Datasets. Wait.. no Yelp?"

These are very handy, but hard to find: I find I must keep searching the source 
code to remember file names and directory paths. End users won't have this
luxury.

Suggestion: Describe the files available in the class path data source.

Along these same lines, in "Connect a Data Source", there is no mention of the 
class path data source. Yet, we reference that data source in the Web Console 
where we suggest a sample query to run:

{code}
Sample SQL query: SELECT * FROM cp.`employee.json` LIMIT 20
{code}
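
As a rough illustration of what a "files available in the class path data source" listing could look like, the sketch below builds one sample query per file. The only paths taken from this issue are {{employee.json}} and the {{tpch/*.parquet}} pattern. The eight table names are the standard TPC-H tables; that each one is packaged as {{tpch/<name>.parquet}} is an assumption to verify against the jar. {{ClasspathSampleQueries}}, {{sampleQuery}} and {{tpchQueries}} are hypothetical names, not part of Drill.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class ClasspathSampleQueries {
    // Standard TPC-H table names. Assumption: each ships as tpch/<name>.parquet under cp.
    static final List<String> TPCH_TABLES = Arrays.asList(
        "customer", "lineitem", "nation", "orders",
        "part", "partsupp", "region", "supplier");

    // Builds a LIMIT query against one file in the cp (class path) data source.
    static String sampleQuery(String path, int limit) {
        if (path.contains("`")) {
            throw new IllegalArgumentException("path must not contain a backtick: " + path);
        }
        return "SELECT * FROM cp.`" + path + "` LIMIT " + limit;
    }

    // One sample query per assumed TPC-H Parquet file.
    static List<String> tpchQueries(int limit) {
        return TPCH_TABLES.stream()
            .map(table -> sampleQuery("tpch/" + table + ".parquet", limit))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Same query as the Web Console suggestion quoted above.
        System.out.println(sampleQuery("employee.json", 20));
        tpchQueries(5).forEach(System.out::println);
    }
}
```

The first line printed matches the Web Console sample query; the remaining lines are candidates for the proposed documentation table, pending a check of the actual file names.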



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6667) Include internal data sets in Documentation Sample Datasets

2018-08-04 Thread Paul Rogers (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated DRILL-6667:
---
Description: 
The Drill documentation provides the "Sample Datasets" section, which is very 
handy. However, this section does not discuss the two datasets provided with 
Drill itself.

* Julian Hyde's [FoodMart data 
set|https://github.com/julianhyde/foodmart-data-hsqldb], available on the class 
path.
* TPC-H data set.

The "FoodMart" data set is available directly under {{cp}}. In fact, the Drill 
sample query (see below) references a FoodMart table. To see the list of tables 
(at development time), find the {{foodmart-data-json-0.4.jar}} file in the
Maven dependencies for {{drill-java-exec}}. The table names here are simplified
relative to those in the ER diagram in the above link. Perhaps include a simple
table of names with the mapping to the original names, and a link to (or an
embedded copy of) the FoodMart ER image. The data is available in JSON format.

TPCH data is available in `cp`.`tpch/*.parquet`, in Parquet format. The schema 
is described in the [TPC-H 
specification|(http://www.tpc.org/tpc_documents_current_versions/current_specifications.asp].

Further, in the "Tutorials" section, "Analyzing the Yelp Academic Dataset", we 
mention the Yelp data set. But, we don't mention that in the "Sample Datasets" 
section. We should, just to be consistent and to save the reader time when 
going back and saying, "Hey, didn't Drill provide some kind of Yelp data? Let 
me look in Sample Datasets. Wait.. no Yelp?"

These are very handy, but hard to find: I find I must keep searching the source 
code to remember file names and directory paths. End users won't have this
luxury.

Suggestion: Describe the files available in the class path data source.

Along these same lines, in "Connect a Data Source", there is no mention of the 
class path data source. Yet, we reference that data source in the Web Console 
where we suggest a sample query to run:

{code}
Sample SQL query: SELECT * FROM cp.`employee.json` LIMIT 20
{code}

  was:
The Drill documentation provides the "Sample Datasets" section, which is very 
handy. However, this section does not discuss the two datasets provided with 
Drill itself.

* Julian Hyde's [FoodMart data 
set|https://github.com/julianhyde/foodmart-data-hsqldb], available on the class 
path.
* TPC-H data set.

The "FoodMart" data set is available directly under {{cp}}. In fact, the Drill 
sample query (see below) references a FoodMart table. To see the list of tables 
(at development time), find the {{foodmart-data-json-0.4.jar}} file in the
Maven dependencies for {{drill-java-exec}}. The table names here are simplified
relative to those in the ER diagram in the above link. Perhaps include a simple
table of names with the mapping to the original names, and a link to (or an
embedded copy of) the FoodMart ER image. The data is available in JSON format.

TPCH data is available in `cp`.`tpch/*.parquet`, in Parquet format. The schema 
is described in the [TPC-H 
specification](http://www.tpc.org/tpc_documents_current_versions/current_specifications.asp).

Further, in the "Tutorials" section, "Analyzing the Yelp Academic Dataset", we 
mention the Yelp data set. But, we don't mention that in the "Sample Datasets" 
section. We should, just to be consistent and to save the reader time when 
going back and saying, "Hey, didn't Drill provide some kind of Yelp data? Let 
me look in Sample Datasets. Wait.. no Yelp?"

These are very handy, but hard to find: I find I must keep searching the source 
code to remember file names and directory paths. End users won't have this
luxury.

Suggestion: Describe the files available in the class path data source.

Along these same lines, in "Connect a Data Source", there is no mention of the 
class path data source. Yet, we reference that data source in the Web Console 
where we suggest a sample query to run:

{code}
Sample SQL query: SELECT * FROM cp.`employee.json` LIMIT 20
{code}


> Include internal data sets in Documentation Sample Datasets
> ---
>
> Key: DRILL-6667
> URL: https://issues.apache.org/jira/browse/DRILL-6667
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Assignee: Bridget Bevens
>Priority: Minor
>
> The Drill documentation provides the "Sample Datasets" section, which is very 
> handy. However, this section does not discuss the two datasets provided with 
> Drill itself.
> * Julian Hyde's [FoodMart data 
> set|https://github.com/julianhyde/foodmart-data-hsqldb], available on the 
> class path.
> * TPC-H data set.
> The "FoodMart" data set is available directly under {{cp}}. In fact, the 
> Drill 

[jira] [Created] (DRILL-6666) Doc link to AOL data set is broken

2018-08-04 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-6666:
--

 Summary: Doc link to AOL data set is broken
 Key: DRILL-6666
 URL: https://issues.apache.org/jira/browse/DRILL-6666
 Project: Apache Drill
  Issue Type: Bug
  Components: Documentation
Affects Versions: 1.13.0
Reporter: Paul Rogers
Assignee: Bridget Bevens


Drill provides links to sample data sets in the documentation. Look in the side 
bar, under Sample Datasets. Click on the "AOL Search" link. You'll get a 404 
error.





[jira] [Commented] (DRILL-6653) Unsupported Schema change exception where there is no schema change in lateral Unnest queries

2018-08-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16569271#comment-16569271
 ] 

ASF GitHub Bot commented on DRILL-6653:
---

sohami closed pull request #1422: DRILL-6653: Unsupported Schema change 
exception where there is no sch…
URL: https://github.com/apache/drill/pull/1422
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/svremover/RemovingRecordBatch.java b/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/svremover/RemovingRecordBatch.java
index 05a1f1267de..acfdc878aa6 100644
--- a/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/svremover/RemovingRecordBatch.java
+++ b/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/svremover/RemovingRecordBatch.java
@@ -51,7 +51,11 @@ public int getRecordCount() {
 
   @Override
   protected boolean setupNewSchema() throws SchemaChangeException {
-    container.clear();
+    // Don't clear off container just because an OK_NEW_SCHEMA was received from upstream. For cases when there is just
+    // change in container type but no actual schema change, RemovingRecordBatch should consume OK_NEW_SCHEMA and
+    // send OK to downstream instead. Since the output of RemovingRecordBatch is always going to be a regular container
+    // change in incoming container type is not actual schema change.
+    container.zeroVectors();
     switch(incoming.getSchema().getSelectionVectorMode()){
     case NONE:
       this.copier = getStraightCopier();
@@ -66,6 +70,8 @@ protected boolean setupNewSchema() throws SchemaChangeException {
       throw new UnsupportedOperationException();
     }
 
+    // If there is an actual schema change then below condition will be true and it will send OK_NEW_SCHEMA
+    // downstream too
     if (container.isSchemaChanged()) {
       container.buildSchema(SelectionVectorMode.NONE);
       return true;
diff --git a/exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/svremover/TestSVRemoverIterOutcome.java b/exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/svremover/TestSVRemoverIterOutcome.java
new file mode 100644
index 000..6613a71ca24
--- /dev/null
+++ b/exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/svremover/TestSVRemoverIterOutcome.java
@@ -0,0 +1,283 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.svremover;
+
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.config.SelectionVectorRemover;
+import org.apache.drill.exec.physical.impl.BaseTestOpBatchEmitOutcome;
+import org.apache.drill.exec.physical.impl.MockRecordBatch;
+import org.apache.drill.exec.record.RecordBatch;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.test.rowSet.DirectRowSet;
+import org.apache.drill.test.rowSet.RowSet;
+import org.apache.drill.test.rowSet.RowSetComparison;
+import org.apache.drill.test.rowSet.schema.SchemaBuilder;
+import org.junit.After;
+import org.junit.Test;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import static org.junit.Assert.assertEquals;
+
+public class TestSVRemoverIterOutcome extends BaseTestOpBatchEmitOutcome {
+  //private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(TestSVRemoverIterOutcome.class);
+
+  // Holds reference to actual operator instance created for each tests
+  private static RemovingRecordBatch removingRecordBatch;
+
+  // Lits of expected outcomes populated by each tests. Used to verify actual IterOutcome returned with next call on
+  // operator to expected outcome
+  private final List expectedOutcomes = new ArrayList<>();
+
+  // List of expected row counts populated by each tests. Used to verify actual output row count to expected row count
+  private final List 
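
The comments in the RemovingRecordBatch change above describe the intended behaviour: an upstream OK_NEW_SCHEMA caused only by a different container type (selection vector mode) should be reported downstream as OK, while a real change in fields is still reported as OK_NEW_SCHEMA. The toy model below sketches that decision only. It is not Drill's implementation: {{SchemaChangeSketch}}, {{Schema}}, {{Outcome}}, {{SvMode}} and {{downstreamOutcome}} are hypothetical stand-ins, and fields are reduced to plain strings.

```java
import java.util.Arrays;
import java.util.List;

final class SchemaChangeSketch {
    // Hypothetical stand-ins for the iteration outcomes and selection vector modes.
    enum Outcome { OK, OK_NEW_SCHEMA }
    enum SvMode { NONE, TWO_BYTE, FOUR_BYTE }

    // A batch schema reduced to its field descriptions plus the container type.
    static final class Schema {
        final List<String> fields;
        final SvMode mode;

        Schema(SvMode mode, String... fields) {
            this.mode = mode;
            this.fields = Arrays.asList(fields);
        }
    }

    // Decides what to report downstream after upstream sent OK_NEW_SCHEMA.
    // The output is always a plain container, so only a change in the fields counts;
    // the selection vector mode of the incoming batch is deliberately ignored.
    static Outcome downstreamOutcome(Schema previous, Schema incoming) {
        boolean fieldsChanged = previous == null || !previous.fields.equals(incoming.fields);
        return fieldsChanged ? Outcome.OK_NEW_SCHEMA : Outcome.OK;
    }

    public static void main(String[] args) {
        Schema plain = new Schema(SvMode.NONE, "c_custkey VARCHAR", "totalprice FLOAT8");
        Schema filtered = new Schema(SvMode.TWO_BYTE, "c_custkey VARCHAR", "totalprice FLOAT8");
        Schema widened = new Schema(SvMode.NONE, "c_custkey VARCHAR", "totalprice FLOAT8", "c_name VARCHAR");

        System.out.println(downstreamOutcome(plain, filtered)); // same fields, new container type
        System.out.println(downstreamOutcome(plain, widened));  // an extra field
    }
}
```

In this model the first call reports OK and the second OK_NEW_SCHEMA, which is the distinction the hash aggregate in DRILL-6653 needed in order to stop failing on identical prior and new schemas.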

[jira] [Updated] (DRILL-6653) Unsupported Schema change exception where there is no schema change in lateral Unnest queries

2018-08-04 Thread Sorabh Hamirwasia (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sorabh Hamirwasia updated DRILL-6653:
-
Labels: ready-to-commit  (was: )

> Unsupported Schema change exception where there is no schema change in 
> lateral Unnest queries
> -
>
> Key: DRILL-6653
> URL: https://issues.apache.org/jira/browse/DRILL-6653
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.14.0
>Reporter: Kedar Sankar Behera
>Assignee: Sorabh Hamirwasia
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.15.0
>
> Attachments: Plan2.pdf
>
>
> Unsupported Schema change exception where there is no schema change
> DataSet - A single json file(sf1)
> Query - 
> {code}
> select customer.c_custkey, customer.c_name, sum(orders.totalprice) totalprice 
> from customer, lateral (select t.o.o_totalprice as totalprice from 
> unnest(customer.c_orders) t(o) order by totalprice limit 10) orders group by 
> customer.c_custkey, customer.c_name order by customer.c_custkey limit 50;
> {code}
> Result - 
> {code}
> Exception:
> java.sql.SQLException: UNSUPPORTED_OPERATION ERROR: Hash aggregate does not 
> support schema change
> Prior schema : 
> BatchSchema [fields=[[`c_custkey` (VARCHAR:OPTIONAL)], [`c_name` 
> (VARCHAR:OPTIONAL)], [`totalprice` (FLOAT8:OPTIONAL)]], selectionVector=NONE]
> New schema : 
> BatchSchema [fields=[[`c_custkey` (VARCHAR:OPTIONAL)], [`c_name` 
> (VARCHAR:OPTIONAL)], [`totalprice` (FLOAT8:OPTIONAL)]], selectionVector=NONE]
> Fragment 0:0
> [Error Id: 21d4d646-4e6a-4e4a-ba75-60ba247ddabd on drill191:31010]
>  at 
> org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:528)
>  at org.apache.drill.jdbc.impl.DrillCursor.next(DrillCursor.java:632)
>  at 
> oadd.org.apache.calcite.avatica.AvaticaResultSet.next(AvaticaResultSet.java:207)
>  at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.next(DrillResultSetImpl.java:153)
>  at 
> org.apache.drill.test.framework.DrillTestJdbc.executeQuery(DrillTestJdbc.java:253)
>  at org.apache.drill.test.framework.DrillTestJdbc.run(DrillTestJdbc.java:115)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> Caused by: oadd.org.apache.drill.common.exceptions.UserRemoteException: 
> UNSUPPORTED_OPERATION ERROR: Hash aggregate does not support schema change
> Prior schema : 
> BatchSchema [fields=[[`c_custkey` (VARCHAR:OPTIONAL)], [`c_name` 
> (VARCHAR:OPTIONAL)], [`totalprice` (FLOAT8:OPTIONAL)]], selectionVector=NONE]
> New schema : 
> BatchSchema [fields=[[`c_custkey` (VARCHAR:OPTIONAL)], [`c_name` 
> (VARCHAR:OPTIONAL)], [`totalprice` (FLOAT8:OPTIONAL)]], selectionVector=NONE]
> Fragment 0:0
> [Error Id: 21d4d646-4e6a-4e4a-ba75-60ba247ddabd on drill191:31010]
>  at 
> oadd.org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:123)
>  at oadd.org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:422)
>  at oadd.org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:96)
>  at 
> oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:274)
>  at 
> oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:244)
>  at 
> oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:88)
>  at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>  at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>  at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
>  at 
> oadd.io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287)
>  at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>  at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>  at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
>  at 
> oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
>  at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>  at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>  at 
> 

[jira] [Commented] (DRILL-6629) BitVector split and transfer does not work correctly for transfer length < 8

2018-08-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16569245#comment-16569245
 ] 

ASF GitHub Bot commented on DRILL-6629:
---

ilooner closed pull request #1395: DRILL-6629 BitVector split and transfer does 
not work correctly for transfer length < 8
URL: https://github.com/apache/drill/pull/1395
 
 
   

diff --git a/exec/java-exec/src/test/java/org/apache/drill/exec/vector/TestSplitAndTransfer.java b/exec/java-exec/src/test/java/org/apache/drill/exec/vector/TestSplitAndTransfer.java
index 057fa131804..96dbd7caa2b 100644
--- a/exec/java-exec/src/test/java/org/apache/drill/exec/vector/TestSplitAndTransfer.java
+++ b/exec/java-exec/src/test/java/org/apache/drill/exec/vector/TestSplitAndTransfer.java
@@ -94,6 +94,21 @@ public void test() throws Exception {
   @Test
   public void testBitVectorUnalignedStart() throws Exception {
 
+    testBitVectorImpl(16, new int[][] {{2, 4}}, TestBitPattern.RANDOM);
+    testBitVectorImpl(16, new int[][] {{2, 4}}, TestBitPattern.ONE);
+    testBitVectorImpl(16, new int[][] {{2, 4}}, TestBitPattern.ZERO);
+    testBitVectorImpl(16, new int[][] {{2, 4}}, TestBitPattern.ALTERNATING);
+
+    testBitVectorImpl(4096, new int[][] {{4092, 4}}, TestBitPattern.ONE);
+    testBitVectorImpl(4096, new int[][] {{4092, 4}}, TestBitPattern.ZERO);
+    testBitVectorImpl(4096, new int[][] {{4092, 4}}, TestBitPattern.ALTERNATING);
+    testBitVectorImpl(4096, new int[][] {{4092, 4}}, TestBitPattern.RANDOM);
+
+    testBitVectorImpl(4096, new int[][] {{1020, 8}}, TestBitPattern.ONE);
+    testBitVectorImpl(4096, new int[][] {{1020, 8}}, TestBitPattern.ZERO);
+    testBitVectorImpl(4096, new int[][] {{1020, 8}}, TestBitPattern.ALTERNATING);
+    testBitVectorImpl(4096, new int[][] {{1020, 8}}, TestBitPattern.RANDOM);
+
     testBitVectorImpl(24, new int[][] {{5, 17}}, TestBitPattern.ONE);
     testBitVectorImpl(24, new int[][] {{5, 17}}, TestBitPattern.ZERO);
     testBitVectorImpl(24, new int[][] {{5, 17}}, TestBitPattern.ALTERNATING);
@@ -113,6 +128,17 @@ public void testBitVectorUnalignedStart() throws Exception {
   @Test
   public void testBitVectorAlignedStart() throws Exception {
 
+    testBitVectorImpl(32, new int[][] {{0, 4}}, TestBitPattern.RANDOM);
+    testBitVectorImpl(32, new int[][] {{0, 4}}, TestBitPattern.ONE);
+    testBitVectorImpl(32, new int[][] {{0, 4}}, TestBitPattern.ZERO);
+    testBitVectorImpl(32, new int[][] {{0, 4}}, TestBitPattern.ALTERNATING);
+
+
+    testBitVectorImpl(32, new int[][] {{0, 8}}, TestBitPattern.ONE);
+    testBitVectorImpl(32, new int[][] {{0, 8}}, TestBitPattern.ZERO);
+    testBitVectorImpl(32, new int[][] {{0, 8}}, TestBitPattern.ALTERNATING);
+    testBitVectorImpl(32, new int[][] {{0, 8}}, TestBitPattern.RANDOM);
+
     testBitVectorImpl(24, new int[][] {{0, 17}}, TestBitPattern.ONE);
     testBitVectorImpl(24, new int[][] {{0, 17}}, TestBitPattern.ZERO);
     testBitVectorImpl(24, new int[][] {{0, 17}}, TestBitPattern.ALTERNATING);
diff --git a/exec/vector/src/main/java/org/apache/drill/exec/vector/BitVector.java b/exec/vector/src/main/java/org/apache/drill/exec/vector/BitVector.java
index 0dd34f51ef0..a6b87378d75 100644
--- a/exec/vector/src/main/java/org/apache/drill/exec/vector/BitVector.java
+++ b/exec/vector/src/main/java/org/apache/drill/exec/vector/BitVector.java
@@ -323,8 +323,14 @@ public void splitAndTransferTo(int startIndex, int length, BitVector target) {
   if (length % 8 != 0) {
 // start is not byte aligned so we have to copy some bits from the last full byte read in the
 // previous loop
-byte lastButOneByte = byteIPlus1;
+// if numBytesHoldingSourceBits == 1, lastButOneByte is the first byte, but we have not read it yet, so read it
+byte lastButOneByte = (numBytesHoldingSourceBits == 1) ? this.data.getByte(firstByteIndex) : byteIPlus1;
 byte bitsFromLastButOneByte = (byte)((lastButOneByte & 0xFF) >>> firstBitOffset);
+// if last bit to be copied is before the end of the first byte, then mask of the trailing extra bits
+if (8 > (length + firstBitOffset)) {
+  byte mask = (byte)((0x1 << length) - 1);
+  bitsFromLastButOneByte = (byte)((bitsFromLastButOneByte & mask));
+}
 
 // If we have to read more bits than what we have already read, read it into lastByte otherwise set lastByte to 0.
 // (length % 8) is num of remaining bits to be read.
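
The masking step added in the hunk above can be tried in isolation. The sketch below is a standalone illustration, not the BitVector code: {{BitSliceSketch}} and {{sliceWithinByte}} are hypothetical names, and it covers only the case where the whole bit range lies inside a single source byte. It shifts the byte right by the start offset and then keeps only the low {{length}} bits.

```java
final class BitSliceSketch {
    // Returns `length` bits of `source`, starting at bit `firstBitOffset` (bit 0 is the
    // least significant), packed into the low bits of the result.
    static byte sliceWithinByte(byte source, int firstBitOffset, int length) {
        if (firstBitOffset < 0 || length < 0 || firstBitOffset + length > 8) {
            throw new IllegalArgumentException("bit range must lie within one byte");
        }
        // Treat the byte as unsigned before shifting so no sign bits are dragged in.
        int shifted = (source & 0xFF) >>> firstBitOffset;
        // Without this mask, bits beyond the requested range leak into the result.
        int mask = (1 << length) - 1;
        return (byte) (shifted & mask);
    }

    public static void main(String[] args) {
        byte allOnes = (byte) 0xFF;
        System.out.println((allOnes & 0xFF) >>> 2);           // shift only: 63
        System.out.println(sliceWithinByte(allOnes, 2, 4));   // shift and mask: 15
    }
}
```

With start index 2 and length 4, the same pair used in the new {{testBitVectorImpl(16, new int[][] {{2, 4}}, ...)}} cases, the shift alone yields 63 for an all-ones byte, while the masked result is 15, that is, exactly four set bits.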


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL 

[jira] [Commented] (DRILL-6653) Unsupported Schema change exception where there is no schema change in lateral Unnest queries

2018-08-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16569098#comment-16569098
 ] 

ASF GitHub Bot commented on DRILL-6653:
---

sohami opened a new pull request #1422: DRILL-6653: Unsupported Schema change 
exception where there is no sch…
URL: https://github.com/apache/drill/pull/1422
 
 
   …ema change


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Unsupported Schema change exception where there is no schema change in 
> lateral Unnest queries
> -
>
> Key: DRILL-6653
> URL: https://issues.apache.org/jira/browse/DRILL-6653
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.14.0
> Reporter: Kedar Sankar Behera
> Assignee: Sorabh Hamirwasia
> Priority: Major
> Fix For: 1.15.0
>
> Attachments: Plan2.pdf
>
>
> An "Unsupported Schema change" exception is thrown even though there is no 
> schema change.
> DataSet - A single JSON file (sf1)
> Query - 
> {code}
> select customer.c_custkey, customer.c_name, sum(orders.totalprice) totalprice 
> from customer, lateral (select t.o.o_totalprice as totalprice from 
> unnest(customer.c_orders) t(o) order by totalprice limit 10) orders group by 
> customer.c_custkey, customer.c_name order by customer.c_custkey limit 50;
> {code}
> Result - 
> {code}
> Exception:
> java.sql.SQLException: UNSUPPORTED_OPERATION ERROR: Hash aggregate does not 
> support schema change
> Prior schema : 
> BatchSchema [fields=[[`c_custkey` (VARCHAR:OPTIONAL)], [`c_name` 
> (VARCHAR:OPTIONAL)], [`totalprice` (FLOAT8:OPTIONAL)]], selectionVector=NONE]
> New schema : 
> BatchSchema [fields=[[`c_custkey` (VARCHAR:OPTIONAL)], [`c_name` 
> (VARCHAR:OPTIONAL)], [`totalprice` (FLOAT8:OPTIONAL)]], selectionVector=NONE]
> Fragment 0:0
> [Error Id: 21d4d646-4e6a-4e4a-ba75-60ba247ddabd on drill191:31010]
>  at 
> org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:528)
>  at org.apache.drill.jdbc.impl.DrillCursor.next(DrillCursor.java:632)
>  at 
> oadd.org.apache.calcite.avatica.AvaticaResultSet.next(AvaticaResultSet.java:207)
>  at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.next(DrillResultSetImpl.java:153)
>  at 
> org.apache.drill.test.framework.DrillTestJdbc.executeQuery(DrillTestJdbc.java:253)
>  at org.apache.drill.test.framework.DrillTestJdbc.run(DrillTestJdbc.java:115)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> Caused by: oadd.org.apache.drill.common.exceptions.UserRemoteException: 
> UNSUPPORTED_OPERATION ERROR: Hash aggregate does not support schema change
> Prior schema : 
> BatchSchema [fields=[[`c_custkey` (VARCHAR:OPTIONAL)], [`c_name` 
> (VARCHAR:OPTIONAL)], [`totalprice` (FLOAT8:OPTIONAL)]], selectionVector=NONE]
> New schema : 
> BatchSchema [fields=[[`c_custkey` (VARCHAR:OPTIONAL)], [`c_name` 
> (VARCHAR:OPTIONAL)], [`totalprice` (FLOAT8:OPTIONAL)]], selectionVector=NONE]
> Fragment 0:0
> [Error Id: 21d4d646-4e6a-4e4a-ba75-60ba247ddabd on drill191:31010]
>  at 
> oadd.org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:123)
>  at oadd.org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:422)
>  at oadd.org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:96)
>  at 
> oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:274)
>  at 
> oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:244)
>  at 
> oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:88)
>  at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>  at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>  at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
>  at 
> oadd.io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287)
>  at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>  at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>  at 
> 

[jira] [Updated] (DRILL-6664) Parquet reader should not allow batches with more than 64k rows

2018-08-04 Thread Timothy Farkas (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Farkas updated DRILL-6664:
--
Fix Version/s: 1.15.0

> Parquet reader should not allow batches with more than 64k rows
> ---
>
> Key: DRILL-6664
> URL: https://issues.apache.org/jira/browse/DRILL-6664
> Project: Apache Drill
> Issue Type: Improvement
> Reporter: salim achouche
> Assignee: salim achouche
> Priority: Major
> Fix For: 1.15.0
>
>
> The Drill configuration allows the Parquet reader to handle batches larger 
> than 64k rows. We should limit this setting to 64k as several operators assume a 
> maximum batch size of 64k.
> NOTE - This Jira is precautionary as the default is 32k rows maximum



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6664) Parquet reader should not allow batches with more than 64k rows

2018-08-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16569087#comment-16569087
 ] 

ASF GitHub Bot commented on DRILL-6664:
---

ilooner closed pull request #1420: DRILL-6664: Limit the maximum parquet reader 
batch rows to 64k
URL: https://github.com/apache/drill/pull/1420
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git 
a/exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java 
b/exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java
index 282ad30bd54..f5556cf9b63 100644
--- a/exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java
+++ b/exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java
@@ -34,6 +34,7 @@
 import org.apache.drill.exec.server.options.TypeValidators.AdminUsersValidator;
 import 
org.apache.drill.exec.server.options.TypeValidators.AdminUserGroupsValidator;
 import org.apache.drill.exec.testing.ExecutionControls;
+import org.apache.drill.exec.vector.ValueVector;
 
 public final class ExecConstants {
   private ExecConstants() {
@@ -322,7 +323,7 @@ private ExecConstants() {
 
   // Controls the flat parquet reader batching constraints (number of record 
and memory limit)
   public static final String PARQUET_FLAT_BATCH_NUM_RECORDS = 
"store.parquet.flat.batch.num_records";
-  public static final OptionValidator PARQUET_FLAT_BATCH_NUM_RECORDS_VALIDATOR 
= new RangeLongValidator(PARQUET_FLAT_BATCH_NUM_RECORDS, 1, Integer.MAX_VALUE);
+  public static final OptionValidator PARQUET_FLAT_BATCH_NUM_RECORDS_VALIDATOR 
= new RangeLongValidator(PARQUET_FLAT_BATCH_NUM_RECORDS, 1, 
ValueVector.MAX_ROW_COUNT);
   public static final String PARQUET_FLAT_BATCH_MEMORY_SIZE = 
"store.parquet.flat.batch.memory_size";
   // This configuration is used to overwrite the common memory batch sizing 
configuration property
   public static final OptionValidator PARQUET_FLAT_BATCH_MEMORY_SIZE_VALIDATOR 
= new RangeLongValidator(PARQUET_FLAT_BATCH_MEMORY_SIZE, 0, Integer.MAX_VALUE);
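
The effect of the changed upper bound can be illustrated with a small standalone sketch. Everything here is hypothetical: `BatchRowLimitSketch` and its nested `RangeValidator` are stand-ins written for this example, not Drill's `RangeLongValidator`, and the constant assumes that "64k" means 65,536 rows.

```java
// Hypothetical sketch: a range check whose upper bound is capped at 64k rows
// instead of Integer.MAX_VALUE, mirroring the intent of the diff above.
final class BatchRowLimitSketch {

    // Assumption for illustration: "64k" is taken to mean 65,536 rows.
    static final int MAX_ROW_COUNT = 65_536;

    /** Minimal inclusive range validator for a named numeric option. */
    static final class RangeValidator {
        private final String optionName;
        private final long min;
        private final long max;

        RangeValidator(String optionName, long min, long max) {
            this.optionName = optionName;
            this.min = min;
            this.max = max;
        }

        boolean isValid(long value) {
            return value >= min && value <= max;
        }

        void validate(long value) {
            if (!isValid(value)) {
                throw new IllegalArgumentException(
                    "Option " + optionName + " must be between " + min + " and " + max
                        + ", got " + value);
            }
        }
    }

    /** Validator for the batch row count option, bounded by MAX_ROW_COUNT. */
    static RangeValidator numRecordsValidator() {
        return new RangeValidator("store.parquet.flat.batch.num_records", 1, MAX_ROW_COUNT);
    }

    public static void main(String[] args) {
        RangeValidator validator = numRecordsValidator();
        validator.validate(32_768);                         // accepted
        System.out.println(validator.isValid(1_000_000));   // false
    }
}
```

With the old bound of `Integer.MAX_VALUE`, a value such as 1,000,000 would have passed validation even though downstream operators assume at most 64k rows per batch.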


 




> Parquet reader should not allow batches with more than 64k rows
> ---
>
> Key: DRILL-6664
> URL: https://issues.apache.org/jira/browse/DRILL-6664
> Project: Apache Drill
> Issue Type: Improvement
> Reporter: salim achouche
> Assignee: salim achouche
> Priority: Major
>
> The Drill configuration allows the Parquet reader to handle batches larger 
> than 64k rows. We should limit this setting to 64k as several operators assume a 
> maximum batch size of 64k.
> NOTE - This Jira is precautionary as the default is 32k rows maximum


