[jira] [Updated] (DRILL-6669) List all sample data sets in Documentation "Sample Datasets" section
[ https://issues.apache.org/jira/browse/DRILL-6669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Rogers updated DRILL-6669:
---
Description:
In the "Tutorials" section, "Analyzing the Yelp Academic Dataset", we mention the Yelp data set. But, we don't mention that in the "Sample Datasets" section. We should, just to be consistent and to save the reader time when going back and saying, "Hey, didn't Drill provide some kind of Yelp data? Let me look in Sample Datasets. Wait.. no Yelp?"

Also, it turns out that we mention other sample datasets in the Documentation. In the [CTAS|http://drill.apache.org/docs/create-table-as-ctas/] page we mention the Google ngram data set.

Would be great if we could list all sample datasets, including the above, and those on the class path (See DRILL-6667) in the "Sample Datasets" section for easy reference.

was:
In the "Tutorials" section, "Analyzing the Yelp Academic Dataset", we mention the Yelp data set. But, we don't mention that in the "Sample Datasets" section. We should, just to be consistent and to save the reader time when going back and saying, "Hey, didn't Drill provide some kind of Yelp data? Let me look in Sample Datasets. Wait.. no Yelp?"

Also, it turns out that we mention other sample datasets in the Documentation. In the [CTAS|http://drill.apache.org/docs/create-table-as-ctas/] page we mention the Google ngram data set.

Would be great if we could list all sample datasets, including the above, and those on the class path (See ) in the "Sample Datasets" section for easy reference.
> List all sample data sets in Documentation "Sample Datasets" section
>
>
> Key: DRILL-6669
> URL: https://issues.apache.org/jira/browse/DRILL-6669
> Project: Apache Drill
> Issue Type: Improvement
> Components: Documentation
> Affects Versions: 1.14.0
> Reporter: Paul Rogers
> Assignee: Bridget Bevens
> Priority: Minor
>
> In the "Tutorials" section, "Analyzing the Yelp Academic Dataset", we mention
> the Yelp data set. But, we don't mention that in the "Sample Datasets"
> section. We should, just to be consistent and to save the reader time when
> going back and saying, "Hey, didn't Drill provide some kind of Yelp data? Let
> me look in Sample Datasets. Wait.. no Yelp?"
> Also, it turns out that we mention other sample datasets in the
> Documentation. In the
> [CTAS|http://drill.apache.org/docs/create-table-as-ctas/] page we mention the
> Google ngram data set.
> Would be great if we could list all sample datasets, including the above, and
> those on the class path (See DRILL-6667) in the "Sample Datasets" section for
> easy reference.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Updated] (DRILL-6667) Include internal data sets in documentation Sample Datasets
[ https://issues.apache.org/jira/browse/DRILL-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Rogers updated DRILL-6667:
---
Description:
The Drill documentation provides the "Sample Datasets" section, which is very handy. However, this section does not discuss the two datasets provided with Drill itself.

* Julian Hyde's [FoodMart data set|https://github.com/julianhyde/foodmart-data-hsqldb], available on the class path.
* TPC-H data set, available on the class path in {{tpch}}

The "FoodMart" data set is available directly under {{cp}}. In fact, the Drill sample query (see below) references a FoodMart table. To see the list of tables (at development time), find the {{foodmart-data-json-0.4.jar}} file in the Maven dependencies for {{drill-java-exec}}. The table names here are simplified relative to those in the ER diagram in the above link. Perhaps include a simple table with names, and the mapping to the original names, and a link to (or just embed) the FoodMart ER image. The data is available in JSON format.

TPCH data is available in {{`cp`.`tpch/*.parquet`}}, in Parquet format. The schema is described in the [TPC-H specification|http://www.tpc.org/tpc_documents_current_versions/current_specifications.asp].

These are very handy, but hard to find: I find I must keep searching the source code to remember file names and directory paths. End users won't have this luxury.

Suggestion: Describe the files available in the class path data source.

Along these same lines, in "Connect a Data Source", there is no mention of the class path data source. Yet, we reference that data source in the Web Console where we suggest a sample query to run:

{code}
Sample SQL query: SELECT * FROM cp.`employee.json` LIMIT 20
{code}

The above query refers to the FoodMart data set.

was:
The Drill documentation provides the "Sample Datasets" section, which is very handy. However, this section does not discuss the two datasets provided with Drill itself.
* Julian Hyde's [FoodMart data set|https://github.com/julianhyde/foodmart-data-hsqldb], available on the class path. * TPC-H data set, available on the class path in {{tpch}} The "FoodMart" data set is available directly under {{cp}}. In fact, the Drill sample query (see below) references a FoodMart table. To see the list of tables (at development time), find the {{foodmark-data-json-0.4.jar}} file in the Maven dependencies for {{drill-java-exec}}. The table names here are simplified relative to those in the ER diagram in the above link. Perhaps include a simple table with names, and the mapping to the original names, and a link to (or just embed the link) to the FoodMart ER image. The data is available in JSON format. TPCH data is available in {{`cp`.`tpch/*.parquet`}}, in Parquet format. The schema is described in the [TPC-H specification|http://www.tpc.org/tpc_documents_current_versions/current_specifications.asp]. Suggestion: Describe the files available in the class path data source. Along these same lines, in "Connect a Data Source", there is no mention of the class path data source. Yet, we reference that data source in the Web Console where we suggest a sample query to run: {code} Sample SQL query: SELECT * FROM cp.`employee.json` LIMIT 20 {code} The above query refers to the FoodMart data set. > Include internal data sets in documentation Sample Datasets > --- > > Key: DRILL-6667 > URL: https://issues.apache.org/jira/browse/DRILL-6667 > Project: Apache Drill > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.13.0 >Reporter: Paul Rogers >Assignee: Bridget Bevens >Priority: Minor > > The Drill documentation provides the "Sample Datasets" section, which is very > handy. However, this section does not discuss the two datasets provided with > Drill itself. > * Julian Hyde's [FoodMart data > set|https://github.com/julianhyde/foodmart-data-hsqldb], available on the > class path. 
> * TPC-H data set, available on the class path in {{tpch}}
> The "FoodMart" data set is available directly under {{cp}}. In fact, the
> Drill sample query (see below) references a FoodMart table. To see the list
> of tables (at development time), find the {{foodmart-data-json-0.4.jar}} file
> in the Maven dependencies for {{drill-java-exec}}. The table names here are
> simplified relative to those in the ER diagram in the above link. Perhaps
> include a simple table with names, and the mapping to the original names, and
> a link to (or just embed) the FoodMart ER image. The data is
> available in JSON format.
> TPCH data is available in {{`cp`.`tpch/*.parquet`}}, in Parquet format. The
> schema is described in the [TPC-H
> specification|http://www.tpc.org/tpc_documents_current
[jira] [Created] (DRILL-6669) List all sample data sets in Documentation "Sample Datasets" section
Paul Rogers created DRILL-6669:
--
Summary: List all sample data sets in Documentation "Sample Datasets" section
Key: DRILL-6669
URL: https://issues.apache.org/jira/browse/DRILL-6669
Project: Apache Drill
Issue Type: Improvement
Components: Documentation
Affects Versions: 1.14.0
Reporter: Paul Rogers
Assignee: Bridget Bevens

In the "Tutorials" section, "Analyzing the Yelp Academic Dataset", we mention the Yelp data set. But, we don't mention that in the "Sample Datasets" section. We should, just to be consistent and to save the reader time when going back and saying, "Hey, didn't Drill provide some kind of Yelp data? Let me look in Sample Datasets. Wait.. no Yelp?"

Also, it turns out that we mention other sample datasets in the Documentation. In the [CTAS|http://drill.apache.org/docs/create-table-as-ctas/] page we mention the Google ngram data set.

Would be great if we could list all sample datasets, including the above, and those on the class path (See ) in the "Sample Datasets" section for easy reference.
[jira] [Updated] (DRILL-6667) Include internal data sets in documentation Sample Datasets
[ https://issues.apache.org/jira/browse/DRILL-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Rogers updated DRILL-6667: --- Description: The Drill documentation provides the "Sample Datasets" section, which is very handy. However, this section does not discuss the two datasets provided with Drill itself. * Julian Hyde's [FoodMart data set|https://github.com/julianhyde/foodmart-data-hsqldb], available on the class path. * TPC-H data set, available on the class path in {{tpch}} The "FoodMart" data set is available directly under {{cp}}. In fact, the Drill sample query (see below) references a FoodMart table. To see the list of tables (at development time), find the {{foodmark-data-json-0.4.jar}} file in the Maven dependencies for {{drill-java-exec}}. The table names here are simplified relative to those in the ER diagram in the above link. Perhaps include a simple table with names, and the mapping to the original names, and a link to (or just embed the link) to the FoodMart ER image. The data is available in JSON format. TPCH data is available in {{`cp`.`tpch/*.parquet`}}, in Parquet format. The schema is described in the [TPC-H specification|http://www.tpc.org/tpc_documents_current_versions/current_specifications.asp]. Suggestion: Describe the files available in the class path data source. Along these same lines, in "Connect a Data Source", there is no mention of the class path data source. Yet, we reference that data source in the Web Console where we suggest a sample query to run: {code} Sample SQL query: SELECT * FROM cp.`employee.json` LIMIT 20 {code} The above query refers to the FoodMart data set. was: The Drill documentation provides the "Sample Datasets" section, which is very handy. However, this section does not discuss the two datasets provided with Drill itself. * Julian Hyde's [FoodMart data set|https://github.com/julianhyde/foodmart-data-hsqldb], available on the class path. 
* TPC-H data set, available on the class path in {{tpch}} The "FoodMart" data set is available directly under {{cp}}. In fact, the Drill sample query (see below) references a FoodMart table. To see the list of tables (at development time), find the {{foodmark-data-json-0.4.jar}} file in the Maven dependencies for {{drill-java-exec}}. The table names here are simplified relative to those in the ER diagram in the above link. Perhaps include a simple table with names, and the mapping to the original names, and a link to (or just embed the link) to the FoodMart ER image. The data is available in JSON format. TPCH data is available in {{`cp`.`tpch/*.parquet`}}, in Parquet format. The schema is described in the [TPC-H specification|http://www.tpc.org/tpc_documents_current_versions/current_specifications.asp]. Further, in the "Tutorials" section, "Analyzing the Yelp Academic Dataset", we mention the Yelp data set. But, we don't mention that in the "Sample Datasets" section. We should, just to be consistent and to save the reader time when going back and saying, "Hey, didn't Drill provide some kind of Yelp data? Let me look in Sample Datasets. Wait.. no Yelp?" These are very handy, but hard to find: I find I must keep searching the source code to remember file names and directory paths. End uses won't have this luxury. Suggestion: Describe the files available in the class path data source. Along these same lines, in "Connect a Data Source", there is no mention of the class path data source. Yet, we reference that data source in the Web Console where we suggest a sample query to run: {code} Sample SQL query: SELECT * FROM cp.`employee.json` LIMIT 20 {code} The above query refers to the FoodMart data set. 
> Include internal data sets in documentation Sample Datasets
> ---
>
> Key: DRILL-6667
> URL: https://issues.apache.org/jira/browse/DRILL-6667
> Project: Apache Drill
> Issue Type: Improvement
> Components: Documentation
> Affects Versions: 1.13.0
> Reporter: Paul Rogers
> Assignee: Bridget Bevens
> Priority: Minor
>
> The Drill documentation provides the "Sample Datasets" section, which is very
> handy. However, this section does not discuss the two datasets provided with
> Drill itself.
> * Julian Hyde's [FoodMart data
> set|https://github.com/julianhyde/foodmart-data-hsqldb], available on the
> class path.
> * TPC-H data set, available on the class path in {{tpch}}
> The "FoodMart" data set is available directly under {{cp}}. In fact, the
> Drill sample query (see below) references a FoodMart table. To see the list
> of tables (at development time), find the {{foodmart-data-json-0.4.jar}} file
> in the Maven dependencies for {{drill-java-exec}}. The table names here are
> simplified relative to those in the ER diagram in the above link. Perha
[jira] [Updated] (DRILL-6668) In Web Console, highlight options that are different from default values
[ https://issues.apache.org/jira/browse/DRILL-6668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Rogers updated DRILL-6668: --- Description: Suppose you inherit a Drill setup created by someone else (or by you, some time in the past). Or, suppose you are a support person. You want to know which Drill options have been changed from the defaults. The Web UI conveniently displays all options. But, there is no indication of which might have non-default values. After the improvements of the last year, the information needed to detect non-default values is now available. Would be great to mark these values. Perhaps using colors, perhaps with words. For example: *planner.width.max_per_node* 200 \[Update] Or planner.width.max_per_node (system) 200 \[Update] (The Web UI does not, I believe, show session settings, since the Web UI has no sessions. I believe the custom values are all set by {{ALTER SYSTEM}}. Otherwise, we could also have a "(session)" suffix above.) Then, in addition to the {{[Update]}} button, for non default values, also provide a {{[Reset]}} button that does the same as {{ALTER SESSION RESET}}. planner.width.max_per_node (session) 200 \[Update] \[Reset] was: Suppose you inherit a Drill setup created by someone else (or by you, some time in the past). Or, suppose you are a support person. You want to know which Drill options have been changed from the defaults. The Web UI conveniently displays all options. But, there is no indication of which might have non-default values. After the improvements of the last year, the information needed to detect non-default values is now available. Would be great to mark these values. Perhaps using colors, perhaps with words. 
For example: *planner.width.max_per_node* 200 \[Update] Or planner.width.max_per_node (session) 200 \[Update] store.json.all_text_mode (system) true \[Update] Then, in addition to the {{[Update]}} button, for non default values, also provide a {{[Reset]}} button that does the same as {{ALTER SESSION RESET}}. planner.width.max_per_node (session) 200 \[Update] \[Reset] > In Web Console, highlight options that are different from default values > > > Key: DRILL-6668 > URL: https://issues.apache.org/jira/browse/DRILL-6668 > Project: Apache Drill > Issue Type: Improvement > Components: Web Server >Affects Versions: 1.14.0 >Reporter: Paul Rogers >Priority: Minor > > Suppose you inherit a Drill setup created by someone else (or by you, some > time in the past). Or, suppose you are a support person. You want to know > which Drill options have been changed from the defaults. > The Web UI conveniently displays all options. But, there is no indication of > which might have non-default values. > After the improvements of the last year, the information needed to detect > non-default values is now available. Would be great to mark these values. > Perhaps using colors, perhaps with words. > For example: > *planner.width.max_per_node* 200 \[Update] > Or > planner.width.max_per_node (system) 200 \[Update] > (The Web UI does not, I believe, show session settings, since the Web UI has > no sessions. I believe the custom values are all set by {{ALTER SYSTEM}}. > Otherwise, we could also have a "(session)" suffix above.) > Then, in addition to the {{[Update]}} button, for non default values, also > provide a {{[Reset]}} button that does the same as {{ALTER SESSION RESET}}. > planner.width.max_per_node (session) 200 \[Update] \[Reset] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
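For reference, the proposed {{[Reset]}} button corresponds to statements that can already be issued by hand. The following is a minimal sketch, assuming the option was changed at system scope (as the note above suggests is the case for the Web UI); the value 200 is only the illustrative value from the example.

```sql
-- Change an option at system scope (illustrative value from the example above).
ALTER SYSTEM SET `planner.width.max_per_node` = 200;

-- Restore the option to its default value. For an option changed at session
-- scope, the equivalent would be ALTER SESSION RESET `planner.width.max_per_node`.
ALTER SYSTEM RESET `planner.width.max_per_node`;
```

Because the Web UI changes are believed to be system-scoped, a reset button would most likely need the {{ALTER SYSTEM RESET}} form rather than the session form.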
[jira] [Created] (DRILL-6668) In Web Console, highlight options that are different from default values
Paul Rogers created DRILL-6668:
--
Summary: In Web Console, highlight options that are different from default values
Key: DRILL-6668
URL: https://issues.apache.org/jira/browse/DRILL-6668
Project: Apache Drill
Issue Type: Improvement
Components: Web Server
Affects Versions: 1.14.0
Reporter: Paul Rogers

Suppose you inherit a Drill setup created by someone else (or by you, some time in the past). Or, suppose you are a support person. You want to know which Drill options have been changed from the defaults.

The Web UI conveniently displays all options. But, there is no indication of which might have non-default values.

After the improvements of the last year, the information needed to detect non-default values is now available. Would be great to mark these values. Perhaps using colors, perhaps with words.

For example:

*planner.width.max_per_node* 200 \[Update]

Or

planner.width.max_per_node (session) 200 \[Update]
store.json.all_text_mode (system) true \[Update]

Then, in addition to the {{[Update]}} button, for non-default values, also provide a {{[Reset]}} button that does the same as {{ALTER SESSION RESET}}.

planner.width.max_per_node (session) 200 \[Update] \[Reset]
[jira] [Updated] (DRILL-6668) In Web Console, highlight options that are different from default values
[ https://issues.apache.org/jira/browse/DRILL-6668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Rogers updated DRILL-6668: --- Description: Suppose you inherit a Drill setup created by someone else (or by you, some time in the past). Or, suppose you are a support person. You want to know which Drill options have been changed from the defaults. The Web UI conveniently displays all options. But, there is no indication of which might have non-default values. After the improvements of the last year, the information needed to detect non-default values is now available. Would be great to mark these values. Perhaps using colors, perhaps with words. For example: *planner.width.max_per_node* 200 \[Update] Or planner.width.max_per_node (session) 200 \[Update] store.json.all_text_mode (system) true \[Update] Then, in addition to the {{[Update]}} button, for non default values, also provide a {{[Reset]}} button that does the same as {{ALTER SESSION RESET}}. planner.width.max_per_node (session) 200 \[Update] \[Reset] was: Suppose you inherit a Drill setup created by someone else (or by you, some time in the past). Or, suppose you are a support person. You want to know which Drill options have been changed from the defaults. The Web UI conveniently displays all options. But, there is no indication of which might have non-default values. After the improvements of the last year, the information needed to detect non-default values is now available. Would be great to mark these values. Perhaps using colors, perhaps with words. For example: *planner.width.max_per_node* 200 \[Update] Or planner.width.max_per_node (session) 200 \[Update] store.json.all_text_mode (system) true \[Update] Then, in addition to the {{[Update]}} button, for non default values, also provide a {{[Reset]}} button that does the same as {{ALTER SESSION RESET}}. 
planner.width.max_per_node (session) 200 \[Update] \[Reset] > In Web Console, highlight options that are different from default values > > > Key: DRILL-6668 > URL: https://issues.apache.org/jira/browse/DRILL-6668 > Project: Apache Drill > Issue Type: Improvement > Components: Web Server >Affects Versions: 1.14.0 >Reporter: Paul Rogers >Priority: Minor > > Suppose you inherit a Drill setup created by someone else (or by you, some > time in the past). Or, suppose you are a support person. You want to know > which Drill options have been changed from the defaults. > The Web UI conveniently displays all options. But, there is no indication of > which might have non-default values. > After the improvements of the last year, the information needed to detect > non-default values is now available. Would be great to mark these values. > Perhaps using colors, perhaps with words. > For example: > *planner.width.max_per_node* 200 \[Update] > Or > planner.width.max_per_node (session) 200 \[Update] > store.json.all_text_mode (system) true \[Update] > Then, in addition to the {{[Update]}} button, for non default values, also > provide a {{[Reset]}} button that does the same as {{ALTER SESSION RESET}}. > planner.width.max_per_node (session) 200 \[Update] \[Reset] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6667) Include internal data sets in documentation Sample Datasets
[ https://issues.apache.org/jira/browse/DRILL-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Rogers updated DRILL-6667:
---
Summary: Include internal data sets in documentation Sample Datasets (was: Include internal data sets in Documentation Sample Datasets)

> Include internal data sets in documentation Sample Datasets
> ---
>
> Key: DRILL-6667
> URL: https://issues.apache.org/jira/browse/DRILL-6667
> Project: Apache Drill
> Issue Type: Improvement
> Components: Documentation
> Affects Versions: 1.13.0
> Reporter: Paul Rogers
> Assignee: Bridget Bevens
> Priority: Minor
>
> The Drill documentation provides the "Sample Datasets" section, which is very
> handy. However, this section does not discuss the two datasets provided with
> Drill itself.
> * Julian Hyde's [FoodMart data
> set|https://github.com/julianhyde/foodmart-data-hsqldb], available on the
> class path.
> * TPC-H data set, available on the class path in {{tpch}}
> The "FoodMart" data set is available directly under {{cp}}. In fact, the
> Drill sample query (see below) references a FoodMart table. To see the list
> of tables (at development time), find the {{foodmart-data-json-0.4.jar}} file
> in the Maven dependencies for {{drill-java-exec}}. The table names here are
> simplified relative to those in the ER diagram in the above link. Perhaps
> include a simple table with names, and the mapping to the original names, and
> a link to (or just embed) the FoodMart ER image. The data is
> available in JSON format.
> TPCH data is available in {{`cp`.`tpch/*.parquet`}}, in Parquet format. The
> schema is described in the [TPC-H
> specification|http://www.tpc.org/tpc_documents_current_versions/current_specifications.asp].
> Further, in the "Tutorials" section, "Analyzing the Yelp Academic Dataset",
> we mention the Yelp data set. But, we don't mention that in the "Sample
> Datasets" section. We should, just to be consistent and to save the reader
> time when going back and saying, "Hey, didn't Drill provide some kind of Yelp
> data? Let me look in Sample Datasets. Wait.. no Yelp?"
> These are very handy, but hard to find: I find I must keep searching the
> source code to remember file names and directory paths. End users won't have
> this luxury.
> Suggestion: Describe the files available in the class path data source.
> Along these same lines, in "Connect a Data Source", there is no mention of
> the class path data source. Yet, we reference that data source in the Web
> Console where we suggest a sample query to run:
> {code}
> Sample SQL query: SELECT * FROM cp.`employee.json` LIMIT 20
> {code}
> The above query refers to the FoodMart data set.
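As an illustration of what a "Sample Datasets" entry for the class path data source might show, here is a minimal sketch of one query per bundled data set. The first is the Web Console sample query quoted above; the second assumes that {{nation.parquet}} is one of the files under {{tpch}} (named after the TPC-H NATION table), which should be checked against the actual class path contents.

```sql
-- FoodMart table (JSON), available directly under cp.
SELECT * FROM cp.`employee.json` LIMIT 20;

-- TPC-H table (Parquet) under cp.`tpch/`.
-- Assumption: nation.parquet is among the bundled TPC-H files.
SELECT * FROM cp.`tpch/nation.parquet` LIMIT 5;
```

Listing one such query per table would spare readers from searching the source code for file names and directory paths.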
[jira] [Updated] (DRILL-6667) Include internal data sets in Documentation Sample Datasets
[ https://issues.apache.org/jira/browse/DRILL-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Rogers updated DRILL-6667: --- Description: The Drill documentation provides the "Sample Datasets" section, which is very handy. However, this section does not discuss the two datasets provided with Drill itself. * Julian Hyde's [FoodMart data set|https://github.com/julianhyde/foodmart-data-hsqldb], available on the class path. * TPC-H data set, available on the class path in {{tpch}} The "FoodMart" data set is available directly under {{cp}}. In fact, the Drill sample query (see below) references a FoodMart table. To see the list of tables (at development time), find the {{foodmark-data-json-0.4.jar}} file in the Maven dependencies for {{drill-java-exec}}. The table names here are simplified relative to those in the ER diagram in the above link. Perhaps include a simple table with names, and the mapping to the original names, and a link to (or just embed the link) to the FoodMart ER image. The data is available in JSON format. TPCH data is available in {{`cp`.`tpch/*.parquet`}}, in Parquet format. The schema is described in the [TPC-H specification|http://www.tpc.org/tpc_documents_current_versions/current_specifications.asp]. Further, in the "Tutorials" section, "Analyzing the Yelp Academic Dataset", we mention the Yelp data set. But, we don't mention that in the "Sample Datasets" section. We should, just to be consistent and to save the reader time when going back and saying, "Hey, didn't Drill provide some kind of Yelp data? Let me look in Sample Datasets. Wait.. no Yelp?" These are very handy, but hard to find: I find I must keep searching the source code to remember file names and directory paths. End uses won't have this luxury. Suggestion: Describe the files available in the class path data source. Along these same lines, in "Connect a Data Source", there is no mention of the class path data source. 
Yet, we reference that data source in the Web Console where we suggest a sample query to run: {code} Sample SQL query: SELECT * FROM cp.`employee.json` LIMIT 20 {code} The above query refers to the FoodMart data set. was: The Drill documentation provides the "Sample Datasets" section, which is very handy. However, this section does not discuss the two datasets provided with Drill itself. * Julian Hyde's [FoodMart data set|https://github.com/julianhyde/foodmart-data-hsqldb], available on the class path. * TPC-H data set. The "FoodMart" data set is available directly under {{cp}}. In fact, the Drill sample query (see below) references a FoodMart table. To see the list of tables (at development time), find the {{foodmark-data-json-0.4.jar}} file in the Maven dependencies for {{drill-java-exec}}. The table names here are simplified relative to those in the ER diagram in the above link. Perhaps include a simple table with names, and the mapping to the original names, and a link to (or just embed the link) to the FoodMart ER image. The data is available in JSON format. TPCH data is available in `cp`.`tpch/*.parquet`, in Parquet format. The schema is described in the [TPC-H specification|http://www.tpc.org/tpc_documents_current_versions/current_specifications.asp]. Further, in the "Tutorials" section, "Analyzing the Yelp Academic Dataset", we mention the Yelp data set. But, we don't mention that in the "Sample Datasets" section. We should, just to be consistent and to save the reader time when going back and saying, "Hey, didn't Drill provide some kind of Yelp data? Let me look in Sample Datasets. Wait.. no Yelp?" These are very handy, but hard to find: I find I must keep searching the source code to remember file names and directory paths. End uses won't have this luxury. Suggestion: Describe the files available in the class path data source. Along these same lines, in "Connect a Data Source", there is no mention of the class path data source. 
Yet, we reference that data source in the Web Console where we suggest a sample query to run: {code} Sample SQL query: SELECT * FROM cp.`employee.json` LIMIT 20 {code} The above query refers to the FoodMart data set. > Include internal data sets in Documentation Sample Datasets > --- > > Key: DRILL-6667 > URL: https://issues.apache.org/jira/browse/DRILL-6667 > Project: Apache Drill > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.13.0 >Reporter: Paul Rogers >Assignee: Bridget Bevens >Priority: Minor > > The Drill documentation provides the "Sample Datasets" section, which is very > handy. However, this section does not discuss the two datasets provided with > Drill itself. > * Julian Hyde's [FoodMart data > set|https://github.com/julianhyde/foodmart-data-hsqldb]
[jira] [Updated] (DRILL-6667) Include internal data sets in Documentation Sample Datasets
[ https://issues.apache.org/jira/browse/DRILL-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Rogers updated DRILL-6667: --- Description: The Drill documentation provides the "Sample Datasets" section, which is very handy. However, this section does not discuss the two datasets provided with Drill itself. * Julian Hyde's [FoodMart data set|https://github.com/julianhyde/foodmart-data-hsqldb], available on the class path. * TPC-H data set. The "FoodMart" data set is available directly under {{cp}}. In fact, the Drill sample query (see below) references a FoodMart table. To see the list of tables (at development time), find the {{foodmark-data-json-0.4.jar}} file in the Maven dependencies for {{drill-java-exec}}. The table names here are simplified relative to those in the ER diagram in the above link. Perhaps include a simple table with names, and the mapping to the original names, and a link to (or just embed the link) to the FoodMart ER image. The data is available in JSON format. TPCH data is available in `cp`.`tpch/*.parquet`, in Parquet format. The schema is described in the [TPC-H specification|(http://www.tpc.org/tpc_documents_current_versions/current_specifications.asp]. Further, in the "Tutorials" section, "Analyzing the Yelp Academic Dataset", we mention the Yelp data set. But, we don't mention that in the "Sample Datasets" section. We should, just to be consistent and to save the reader time when going back and saying, "Hey, didn't Drill provide some kind of Yelp data? Let me look in Sample Datasets. Wait.. no Yelp?" These are very handy, but hard to find: I find I must keep searching the source code to remember file names and directory paths. End uses won't have this luxury. Suggestion: Describe the files available in the class path data source. Along these same lines, in "Connect a Data Source", there is no mention of the class path data source. 
Yet, we reference that data source in the Web Console where we suggest a sample query to run: {code} Sample SQL query: SELECT * FROM cp.`employee.json` LIMIT 20 {code} The above query refers to the FoodMart data set. was: The Drill documentation provides the "Sample Datasets" section, which is very handy. However, this section does not discuss the two datasets provided with Drill itself. * Julian Hyde's [FoodMart data set|https://github.com/julianhyde/foodmart-data-hsqldb], available on the class path. * TPC-H data set. The "FoodMart" data set is available directly under {{cp}}. In fact, the Drill sample query (see below) references a FoodMart table. To see the list of tables (at development time), find the {{foodmark-data-json-0.4.jar}} file in the Maven dependencies for {{drill-java-exec}}. The table names here are simplified relative to those in the ER diagram in the above link. Perhaps include a simple table with names, and the mapping to the original names, and a link to (or just embed the link) to the FoodMart ER image. The data is available in JSON format. TPCH data is available in `cp`.`tpch/*.parquet`, in Parquet format. The schema is described in the [TPC-H specification|(http://www.tpc.org/tpc_documents_current_versions/current_specifications.asp]. Further, in the "Tutorials" section, "Analyzing the Yelp Academic Dataset", we mention the Yelp data set. But, we don't mention that in the "Sample Datasets" section. We should, just to be consistent and to save the reader time when going back and saying, "Hey, didn't Drill provide some kind of Yelp data? Let me look in Sample Datasets. Wait.. no Yelp?" These are very handy, but hard to find: I find I must keep searching the source code to remember file names and directory paths. End uses won't have this luxury. Suggestion: Describe the files available in the class path data source. Along these same lines, in "Connect a Data Source", there is no mention of the class path data source. 
Yet, we reference that data source in the Web Console where we suggest a sample query to run: {code} Sample SQL query: SELECT * FROM cp.`employee.json` LIMIT 20 {code} > Include internal data sets in Documentation Sample Datasets > --- > > Key: DRILL-6667 > URL: https://issues.apache.org/jira/browse/DRILL-6667 > Project: Apache Drill > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.13.0 >Reporter: Paul Rogers >Assignee: Bridget Bevens >Priority: Minor > > The Drill documentation provides the "Sample Datasets" section, which is very > handy. However, this section does not discuss the two datasets provided with > Drill itself. > * Julian Hyde's [FoodMart data > set|https://github.com/julianhyde/foodmart-data-hsqldb], available on the > class path. > * TPC-H data set. > The "FoodMart" data set is available
[jira] [Updated] (DRILL-6667) Include internal data sets in Documentation Sample Datasets
[ https://issues.apache.org/jira/browse/DRILL-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Rogers updated DRILL-6667: --- Description: The Drill documentation provides the "Sample Datasets" section, which is very handy. However, this section does not discuss the two datasets provided with Drill itself. * Julian Hyde's [FoodMart data set|https://github.com/julianhyde/foodmart-data-hsqldb], available on the class path. * TPC-H data set. The "FoodMart" data set is available directly under {{cp}}. In fact, the Drill sample query (see below) references a FoodMart table. To see the list of tables (at development time), find the {{foodmart-data-json-0.4.jar}} file in the Maven dependencies for {{drill-java-exec}}. The table names here are simplified relative to those in the ER diagram in the above link. Perhaps include a simple table with names, and the mapping to the original names, and a link to (or just embed) the FoodMart ER image. The data is available in JSON format. TPC-H data is available in `cp`.`tpch/*.parquet`, in Parquet format. The schema is described in the [TPC-H specification|http://www.tpc.org/tpc_documents_current_versions/current_specifications.asp]. Further, in the "Tutorials" section, "Analyzing the Yelp Academic Dataset", we mention the Yelp data set. But, we don't mention that in the "Sample Datasets" section. We should, just to be consistent and to save the reader time when going back and saying, "Hey, didn't Drill provide some kind of Yelp data? Let me look in Sample Datasets. Wait.. no Yelp?" These are very handy, but hard to find: I find I must keep searching the source code to remember file names and directory paths. End users won't have this luxury. Suggestion: Describe the files available in the class path data source. Along these same lines, in "Connect a Data Source", there is no mention of the class path data source. 
Yet, we reference that data source in the Web Console where we suggest a sample query to run: {code} Sample SQL query: SELECT * FROM cp.`employee.json` LIMIT 20 {code} The above query refers to the FoodMart data set. was: The Drill documentation provides the "Sample Datasets" section, which is very handy. However, this section does not discuss the two datasets provided with Drill itself. * Julian Hyde's [FoodMart data set|https://github.com/julianhyde/foodmart-data-hsqldb], available on the class path. * TPC-H data set. The "FoodMart" data set is available directly under {{cp}}. In fact, the Drill sample query (see below) references a FoodMart table. To see the list of tables (at development time), find the {{foodmart-data-json-0.4.jar}} file in the Maven dependencies for {{drill-java-exec}}. The table names here are simplified relative to those in the ER diagram in the above link. Perhaps include a simple table with names, and the mapping to the original names, and a link to (or just embed) the FoodMart ER image. The data is available in JSON format. TPC-H data is available in `cp`.`tpch/*.parquet`, in Parquet format. The schema is described in the [TPC-H specification|(http://www.tpc.org/tpc_documents_current_versions/current_specifications.asp]. Further, in the "Tutorials" section, "Analyzing the Yelp Academic Dataset", we mention the Yelp data set. But, we don't mention that in the "Sample Datasets" section. We should, just to be consistent and to save the reader time when going back and saying, "Hey, didn't Drill provide some kind of Yelp data? Let me look in Sample Datasets. Wait.. no Yelp?" These are very handy, but hard to find: I find I must keep searching the source code to remember file names and directory paths. End users won't have this luxury. Suggestion: Describe the files available in the class path data source. Along these same lines, in "Connect a Data Source", there is no mention of the class path data source. 
Yet, we reference that data source in the Web Console where we suggest a sample query to run: {code} Sample SQL query: SELECT * FROM cp.`employee.json` LIMIT 20 {code} The above query refers to the FoodMart data set. > Include internal data sets in Documentation Sample Datasets > --- > > Key: DRILL-6667 > URL: https://issues.apache.org/jira/browse/DRILL-6667 > Project: Apache Drill > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.13.0 >Reporter: Paul Rogers >Assignee: Bridget Bevens >Priority: Minor > > The Drill documentation provides the "Sample Datasets" section, which is very > handy. However, this section does not discuss the two datasets provided with > Drill itself. > * Julian Hyde's [FoodMart data > set|https://github.com/julianhyde/foodmart-data-hsqldb], available on the > class path. > * TPC-H
[jira] [Created] (DRILL-6667) Include internal data sets in Documentation Sample Datasets
Paul Rogers created DRILL-6667: -- Summary: Include internal data sets in Documentation Sample Datasets Key: DRILL-6667 URL: https://issues.apache.org/jira/browse/DRILL-6667 Project: Apache Drill Issue Type: Improvement Components: Documentation Affects Versions: 1.13.0 Reporter: Paul Rogers Assignee: Bridget Bevens The Drill documentation provides the "Sample Datasets" section, which is very handy. However, this section does not discuss the two datasets provided with Drill itself. * Julian Hyde's [FoodMart data set|https://github.com/julianhyde/foodmart-data-hsqldb], available on the class path. * TPC-H data set. The "FoodMart" data set is available directly under {{cp}}. In fact, the Drill sample query (see below) references a FoodMart table. To see the list of tables (at development time), find the {{foodmart-data-json-0.4.jar}} file in the Maven dependencies for {{drill-java-exec}}. The table names here are simplified relative to those in the ER diagram in the above link. Perhaps include a simple table with names, and the mapping to the original names, and a link to (or just embed) the FoodMart ER image. The data is available in JSON format. TPC-H data is available in `cp`.`tpch/*.parquet`, in Parquet format. The schema is described in the [TPC-H specification](http://www.tpc.org/tpc_documents_current_versions/current_specifications.asp). Further, in the "Tutorials" section, "Analyzing the Yelp Academic Dataset", we mention the Yelp data set. But, we don't mention that in the "Sample Datasets" section. We should, just to be consistent and to save the reader time when going back and saying, "Hey, didn't Drill provide some kind of Yelp data? Let me look in Sample Datasets. Wait.. no Yelp?" These are very handy, but hard to find: I find I must keep searching the source code to remember file names and directory paths. End users won't have this luxury. Suggestion: Describe the files available in the class path data source. 
Along these same lines, in "Connect a Data Source", there is no mention of the class path data source. Yet, we reference that data source in the Web Console where we suggest a sample query to run: {code} Sample SQL query: SELECT * FROM cp.`employee.json` LIMIT 20 {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6667) Include internal data sets in Documentation Sample Datasets
[ https://issues.apache.org/jira/browse/DRILL-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Rogers updated DRILL-6667: --- Description: The Drill documentation provides the "Sample Datasets" section, which is very handy. However, this section does not discuss the two datasets provided with Drill itself. * Julian Hyde's [FoodMart data set|https://github.com/julianhyde/foodmart-data-hsqldb], available on the class path. * TPC-H data set. The "FoodMart" data set is available directly under {{cp}}. In fact, the Drill sample query (see below) references a FoodMart table. To see the list of tables (at development time), find the {{foodmart-data-json-0.4.jar}} file in the Maven dependencies for {{drill-java-exec}}. The table names here are simplified relative to those in the ER diagram in the above link. Perhaps include a simple table with names, and the mapping to the original names, and a link to (or just embed) the FoodMart ER image. The data is available in JSON format. TPC-H data is available in `cp`.`tpch/*.parquet`, in Parquet format. The schema is described in the [TPC-H specification|(http://www.tpc.org/tpc_documents_current_versions/current_specifications.asp]. Further, in the "Tutorials" section, "Analyzing the Yelp Academic Dataset", we mention the Yelp data set. But, we don't mention that in the "Sample Datasets" section. We should, just to be consistent and to save the reader time when going back and saying, "Hey, didn't Drill provide some kind of Yelp data? Let me look in Sample Datasets. Wait.. no Yelp?" These are very handy, but hard to find: I find I must keep searching the source code to remember file names and directory paths. End users won't have this luxury. Suggestion: Describe the files available in the class path data source. Along these same lines, in "Connect a Data Source", there is no mention of the class path data source. 
Yet, we reference that data source in the Web Console where we suggest a sample query to run: {code} Sample SQL query: SELECT * FROM cp.`employee.json` LIMIT 20 {code} was: The Drill documentation provides the "Sample Datasets" section, which is very handy. However, this section does not discuss the two datasets provided with Drill itself. * Julian Hyde's [FoodMart data set|https://github.com/julianhyde/foodmart-data-hsqldb], available on the class path. * TPC-H data set. The "FoodMart" data set is available directly under {{cp}}. In fact, the Drill sample query (see below) references a FoodMart table. To see the list of tables (at development time), find the {{foodmart-data-json-0.4.jar}} file in the Maven dependencies for {{drill-java-exec}}. The table names here are simplified relative to those in the ER diagram in the above link. Perhaps include a simple table with names, and the mapping to the original names, and a link to (or just embed) the FoodMart ER image. The data is available in JSON format. TPC-H data is available in `cp`.`tpch/*.parquet`, in Parquet format. The schema is described in the [TPC-H specification](http://www.tpc.org/tpc_documents_current_versions/current_specifications.asp). Further, in the "Tutorials" section, "Analyzing the Yelp Academic Dataset", we mention the Yelp data set. But, we don't mention that in the "Sample Datasets" section. We should, just to be consistent and to save the reader time when going back and saying, "Hey, didn't Drill provide some kind of Yelp data? Let me look in Sample Datasets. Wait.. no Yelp?" These are very handy, but hard to find: I find I must keep searching the source code to remember file names and directory paths. End users won't have this luxury. Suggestion: Describe the files available in the class path data source. Along these same lines, in "Connect a Data Source", there is no mention of the class path data source. 
Yet, we reference that data source in the Web Console where we suggest a sample query to run: {code} Sample SQL query: SELECT * FROM cp.`employee.json` LIMIT 20 {code} > Include internal data sets in Documentation Sample Datasets > --- > > Key: DRILL-6667 > URL: https://issues.apache.org/jira/browse/DRILL-6667 > Project: Apache Drill > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.13.0 >Reporter: Paul Rogers >Assignee: Bridget Bevens >Priority: Minor > > The Drill documentation provides the "Sample Datasets" section, which is very > handy. However, this section does not discuss the two datasets provided with > Drill itself. > * Julian Hyde's [FoodMart data > set|https://github.com/julianhyde/foodmart-data-hsqldb], available on the > class path. > * TPC-H data set. > The "FoodMart" data set is available directly under {{cp}}. In fact, the > Drill samp
[jira] [Created] (DRILL-6666) Doc link to AOL data set is broken
Paul Rogers created DRILL-: -- Summary: Doc link to AOL data set is broken Key: DRILL- URL: https://issues.apache.org/jira/browse/DRILL- Project: Apache Drill Issue Type: Bug Components: Documentation Affects Versions: 1.13.0 Reporter: Paul Rogers Assignee: Bridget Bevens Drill provides links to sample data sets in the documentation. Look in the side bar, under Sample Datasets. Click on the "AOL Search" link. You'll get a 404 error. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6653) Unsupported Schema change exception where there is no schema change in lateral Unnest queries
[ https://issues.apache.org/jira/browse/DRILL-6653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16569271#comment-16569271 ] ASF GitHub Bot commented on DRILL-6653: --- sohami closed pull request #1422: DRILL-6653: Unsupported Schema change exception where there is no sch… URL: https://github.com/apache/drill/pull/1422 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/svremover/RemovingRecordBatch.java b/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/svremover/RemovingRecordBatch.java index 05a1f1267de..acfdc878aa6 100644 --- a/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/svremover/RemovingRecordBatch.java +++ b/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/svremover/RemovingRecordBatch.java @@ -51,7 +51,11 @@ public int getRecordCount() { @Override protected boolean setupNewSchema() throws SchemaChangeException { -container.clear(); +// Don't clear off container just because an OK_NEW_SCHEMA was received from upstream. For cases when there is just +// change in container type but no actual schema change, RemovingRecordBatch should consume OK_NEW_SCHEMA and +// send OK to downstream instead. Since the output of RemovingRecordBatch is always going to be a regular container +// change in incoming container type is not actual schema change. 
+container.zeroVectors(); switch(incoming.getSchema().getSelectionVectorMode()){ case NONE: this.copier = getStraightCopier(); @@ -66,6 +70,8 @@ protected boolean setupNewSchema() throws SchemaChangeException { throw new UnsupportedOperationException(); } +// If there is an actual schema change then below condition will be true and it will send OK_NEW_SCHEMA +// downstream too if (container.isSchemaChanged()) { container.buildSchema(SelectionVectorMode.NONE); return true; diff --git a/exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/svremover/TestSVRemoverIterOutcome.java b/exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/svremover/TestSVRemoverIterOutcome.java new file mode 100644 index 000..6613a71ca24 --- /dev/null +++ b/exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/svremover/TestSVRemoverIterOutcome.java @@ -0,0 +1,283 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.exec.physical.impl.svremover; + +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.exec.physical.config.SelectionVectorRemover; +import org.apache.drill.exec.physical.impl.BaseTestOpBatchEmitOutcome; +import org.apache.drill.exec.physical.impl.MockRecordBatch; +import org.apache.drill.exec.record.RecordBatch; +import org.apache.drill.exec.record.metadata.TupleMetadata; +import org.apache.drill.test.rowSet.DirectRowSet; +import org.apache.drill.test.rowSet.RowSet; +import org.apache.drill.test.rowSet.RowSetComparison; +import org.apache.drill.test.rowSet.schema.SchemaBuilder; +import org.junit.After; +import org.junit.Test; + +import java.util.ArrayList; +import java.util.List; + +import static org.junit.Assert.assertEquals; + +public class TestSVRemoverIterOutcome extends BaseTestOpBatchEmitOutcome { + //private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(TestSVRemoverIterOutcome.class); + + // Holds reference to actual operator instance created for each test + private static RemovingRecordBatch removingRecordBatch; + + // List of expected outcomes populated by each test. Used to verify actual IterOutcome returned with next call on + // operator to expected outcome + private final List expectedOutcomes = new ArrayList<>(); + + // List of expected row counts populated by each test. Used to verify actual output row count to expected row count + private fin
[jira] [Updated] (DRILL-6653) Unsupported Schema change exception where there is no schema change in lateral Unnest queries
[ https://issues.apache.org/jira/browse/DRILL-6653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sorabh Hamirwasia updated DRILL-6653: - Labels: ready-to-commit (was: ) > Unsupported Schema change exception where there is no schema change in > lateral Unnest queries > - > > Key: DRILL-6653 > URL: https://issues.apache.org/jira/browse/DRILL-6653 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.14.0 >Reporter: Kedar Sankar Behera >Assignee: Sorabh Hamirwasia >Priority: Major > Labels: ready-to-commit > Fix For: 1.15.0 > > Attachments: Plan2.pdf > > > Unsupported Schema change exception where there is no schema change > DataSet - A single json file(sf1) > Query - > {code} > select customer.c_custkey, customer.c_name, sum(orders.totalprice) totalprice > from customer, lateral (select t.o.o_totalprice as totalprice from > unnest(customer.c_orders) t(o) order by totalprice limit 10) orders group by > customer.c_custkey, customer.c_name order by customer.c_custkey limit 50; > {code} > Result - > {code} > Exception: > java.sql.SQLException: UNSUPPORTED_OPERATION ERROR: Hash aggregate does not > support schema change > Prior schema : > BatchSchema [fields=[[`c_custkey` (VARCHAR:OPTIONAL)], [`c_name` > (VARCHAR:OPTIONAL)], [`totalprice` (FLOAT8:OPTIONAL)]], selectionVector=NONE] > New schema : > BatchSchema [fields=[[`c_custkey` (VARCHAR:OPTIONAL)], [`c_name` > (VARCHAR:OPTIONAL)], [`totalprice` (FLOAT8:OPTIONAL)]], selectionVector=NONE] > Fragment 0:0 > [Error Id: 21d4d646-4e6a-4e4a-ba75-60ba247ddabd on drill191:31010] > at > org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:528) > at org.apache.drill.jdbc.impl.DrillCursor.next(DrillCursor.java:632) > at > oadd.org.apache.calcite.avatica.AvaticaResultSet.next(AvaticaResultSet.java:207) > at > org.apache.drill.jdbc.impl.DrillResultSetImpl.next(DrillResultSetImpl.java:153) > at > org.apache.drill.test.framework.DrillTestJdbc.executeQuery(DrillTestJdbc.java:253) > at 
org.apache.drill.test.framework.DrillTestJdbc.run(DrillTestJdbc.java:115) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: oadd.org.apache.drill.common.exceptions.UserRemoteException: > UNSUPPORTED_OPERATION ERROR: Hash aggregate does not support schema change > Prior schema : > BatchSchema [fields=[[`c_custkey` (VARCHAR:OPTIONAL)], [`c_name` > (VARCHAR:OPTIONAL)], [`totalprice` (FLOAT8:OPTIONAL)]], selectionVector=NONE] > New schema : > BatchSchema [fields=[[`c_custkey` (VARCHAR:OPTIONAL)], [`c_name` > (VARCHAR:OPTIONAL)], [`totalprice` (FLOAT8:OPTIONAL)]], selectionVector=NONE] > Fragment 0:0 > [Error Id: 21d4d646-4e6a-4e4a-ba75-60ba247ddabd on drill191:31010] > at > oadd.org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:123) > at oadd.org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:422) > at oadd.org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:96) > at > oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:274) > at > oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:244) > at > oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:88) > at > oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356) > at > oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342) > at > oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335) > at > oadd.io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287) > at > 
oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356) > at > oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342) > at > oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335) > at > oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102) > at > oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356) > at > oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342) > at > o
[jira] [Commented] (DRILL-6629) BitVector split and transfer does not work correctly for transfer length < 8
[ https://issues.apache.org/jira/browse/DRILL-6629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16569245#comment-16569245 ] ASF GitHub Bot commented on DRILL-6629: --- ilooner closed pull request #1395: DRILL-6629 BitVector split and transfer does not work correctly for transfer length < 8 URL: https://github.com/apache/drill/pull/1395 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/exec/java-exec/src/test/java/org/apache/drill/exec/vector/TestSplitAndTransfer.java b/exec/java-exec/src/test/java/org/apache/drill/exec/vector/TestSplitAndTransfer.java index 057fa131804..96dbd7caa2b 100644 --- a/exec/java-exec/src/test/java/org/apache/drill/exec/vector/TestSplitAndTransfer.java +++ b/exec/java-exec/src/test/java/org/apache/drill/exec/vector/TestSplitAndTransfer.java @@ -94,6 +94,21 @@ public void test() throws Exception { @Test public void testBitVectorUnalignedStart() throws Exception { +testBitVectorImpl(16, new int[][] {{2, 4}}, TestBitPattern.RANDOM); +testBitVectorImpl(16, new int[][] {{2, 4}}, TestBitPattern.ONE); +testBitVectorImpl(16, new int[][] {{2, 4}}, TestBitPattern.ZERO); +testBitVectorImpl(16, new int[][] {{2, 4}}, TestBitPattern.ALTERNATING); + +testBitVectorImpl(4096, new int[][] {{4092, 4}}, TestBitPattern.ONE); +testBitVectorImpl(4096, new int[][] {{4092, 4}}, TestBitPattern.ZERO); +testBitVectorImpl(4096, new int[][] {{4092, 4}}, TestBitPattern.ALTERNATING); +testBitVectorImpl(4096, new int[][] {{4092, 4}}, TestBitPattern.RANDOM); + +testBitVectorImpl(4096, new int[][] {{1020, 8}}, TestBitPattern.ONE); +testBitVectorImpl(4096, new int[][] {{1020, 8}}, TestBitPattern.ZERO); +testBitVectorImpl(4096, new int[][] {{1020, 8}}, TestBitPattern.ALTERNATING); +testBitVectorImpl(4096, new 
int[][] {{1020, 8}}, TestBitPattern.RANDOM); + testBitVectorImpl(24, new int[][] {{5, 17}}, TestBitPattern.ONE); testBitVectorImpl(24, new int[][] {{5, 17}}, TestBitPattern.ZERO); testBitVectorImpl(24, new int[][] {{5, 17}}, TestBitPattern.ALTERNATING); @@ -113,6 +128,17 @@ public void testBitVectorUnalignedStart() throws Exception { @Test public void testBitVectorAlignedStart() throws Exception { +testBitVectorImpl(32, new int[][] {{0, 4}}, TestBitPattern.RANDOM); +testBitVectorImpl(32, new int[][] {{0, 4}}, TestBitPattern.ONE); +testBitVectorImpl(32, new int[][] {{0, 4}}, TestBitPattern.ZERO); +testBitVectorImpl(32, new int[][] {{0, 4}}, TestBitPattern.ALTERNATING); + + +testBitVectorImpl(32, new int[][] {{0, 8}}, TestBitPattern.ONE); +testBitVectorImpl(32, new int[][] {{0, 8}}, TestBitPattern.ZERO); +testBitVectorImpl(32, new int[][] {{0, 8}}, TestBitPattern.ALTERNATING); +testBitVectorImpl(32, new int[][] {{0, 8}}, TestBitPattern.RANDOM); + testBitVectorImpl(24, new int[][] {{0, 17}}, TestBitPattern.ONE); testBitVectorImpl(24, new int[][] {{0, 17}}, TestBitPattern.ZERO); testBitVectorImpl(24, new int[][] {{0, 17}}, TestBitPattern.ALTERNATING); diff --git a/exec/vector/src/main/java/org/apache/drill/exec/vector/BitVector.java b/exec/vector/src/main/java/org/apache/drill/exec/vector/BitVector.java index 0dd34f51ef0..a6b87378d75 100644 --- a/exec/vector/src/main/java/org/apache/drill/exec/vector/BitVector.java +++ b/exec/vector/src/main/java/org/apache/drill/exec/vector/BitVector.java @@ -323,8 +323,14 @@ public void splitAndTransferTo(int startIndex, int length, BitVector target) { if (length % 8 != 0) { // start is not byte aligned so we have to copy some bits from the last full byte read in the // previous loop -byte lastButOneByte = byteIPlus1; +// if numBytesHoldingSourceBits == 1, lastButOneByte is the first byte, but we have not read it yet, so read it +byte lastButOneByte = (numBytesHoldingSourceBits == 1) ? 
this.data.getByte(firstByteIndex) : byteIPlus1; byte bitsFromLastButOneByte = (byte)((lastButOneByte & 0xFF) >>> firstBitOffset); +// if last bit to be copied is before the end of the first byte, then mask off the trailing extra bits +if (8 > (length + firstBitOffset)) { + byte mask = (byte)((0x1 << length) - 1); + bitsFromLastButOneByte = (byte)((bitsFromLastButOneByte & mask)); +} // If we have to read more bits than what we have already read, read it into lastByte otherwise set lastByte to 0. // (length % 8) is num of remaining bits to be read. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub
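The masking step added in the BitVector diff above can be tried in isolation. The following is a minimal sketch, not Drill code: the class `BitSliceSketch` and the helper `sliceWithinByte` are hypothetical names introduced only for illustration, and the sketch assumes bit 0 is the least significant bit of the byte. It reuses the shift and mask expressions from the diff to show why a transfer such as start 2, length 4 (one of the new test cases) needs the extra mask.

```java
// Illustrative sketch only; BitSliceSketch and sliceWithinByte are hypothetical
// names, not part of Drill's BitVector API.
class BitSliceSketch {

    // Returns `length` bits of `source`, starting at bit `firstBitOffset`
    // (bit 0 = least significant), right-aligned in the result.
    static byte sliceWithinByte(byte source, int firstBitOffset, int length) {
        // Shift the wanted bits down to position 0; "& 0xFF" avoids sign extension.
        byte bits = (byte) ((source & 0xFF) >>> firstBitOffset);
        // If the slice ends before the end of the byte, the shifted value still
        // carries higher-order source bits that are not part of the slice.
        if (8 > (length + firstBitOffset)) {
            byte mask = (byte) ((0x1 << length) - 1);
            bits = (byte) (bits & mask);
        }
        return bits;
    }

    public static void main(String[] args) {
        byte source = (byte) 0b1011_0110;
        // Without the mask the shift alone yields 0b0010_1101 (45);
        // with the mask only the four requested bits 0b1101 remain.
        System.out.println(sliceWithinByte(source, 2, 4)); // prints 13
        // Slice reaches the end of the byte, so no mask is applied.
        System.out.println(sliceWithinByte(source, 4, 4)); // prints 11
    }
}
```

When `length + firstBitOffset` reaches 8 the shift alone already discards everything outside the slice, which is presumably why the fix only applies the mask in the shorter case.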