date:20190503

[jira] [Created] (DRILL-7235) Download page should link to verification instructions

2019-05-03 Thread Sebb (JIRA)

Sebb created DRILL-7235:
---

 Summary: Download page should link to verification instructions
 Key: DRILL-7235
 URL: https://issues.apache.org/jira/browse/DRILL-7235
 Project: Apache Drill
  Issue Type: Bug
Reporter: Sebb


The download page includes links to KEYS, signatures and hashes, but does not provide 
any information on why or how they should be used.

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (DRILL-7098) File Metadata Metastore Plugin

2019-05-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832395#comment-16832395
 ] 

ASF GitHub Bot commented on DRILL-7098:
---

vdiravka commented on issue #1754: DRILL-7098: File Metadata Metastore Plugin
URL: https://github.com/apache/drill/pull/1754#issuecomment-489035078
 
 
   The commits are squashed. The branch is rebased onto Drill master branch
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> File Metadata Metastore Plugin
> --
>
> Key: DRILL-7098
> URL: https://issues.apache.org/jira/browse/DRILL-7098
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components:  Server, Metadata
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Major
>  Labels: Metastore, ready-to-commit
> Fix For: 2.0.0
>
>
> DRILL-6852 introduces the Drill Metastore API. 
> The second step is to create an internal Drill Metastore mechanism (and a File 
> Metastore Plugin) that uses the Metastore API and can be extended for use by 
> other storage plugins.




[jira] [Updated] (DRILL-7233) Format Plugin for HDF5

2019-05-03 Thread Charles Givre (JIRA)



 [ 
https://issues.apache.org/jira/browse/DRILL-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Givre updated DRILL-7233:
-
Labels: doc-impacting  (was: )

> Format Plugin for HDF5
> --
>
> Key: DRILL-7233
> URL: https://issues.apache.org/jira/browse/DRILL-7233
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.17.0
>
>
> h2. Drill HDF5 Format Plugin
> Per Wikipedia, Hierarchical Data Format (HDF) is a set of file formats 
> designed to store and organize large amounts of data. Originally developed at 
> the National Center for Supercomputing Applications, it is supported by The 
> HDF Group, a non-profit corporation whose mission is to ensure continued 
> development of HDF5 technologies and the continued accessibility of data 
> stored in HDF.
> This plugin enables Apache Drill to query HDF5 files.
> h3. Configuration
> There are three configuration variables in this plugin:
> type: This should be set to hdf5.
> extensions: This is a list of the file extensions used to identify HDF5 
> files. Typically HDF5 uses .h5 or .hdf5 as file extensions. This defaults to 
> .h5.
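> defaultPath: The path of the dataset to query within the HDF5 file. When it 
> is set, Drill returns only that dataset's data rather than the file metadata.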
> h3. Example Configuration
> For most uses, the configuration below will suffice to enable Drill to query 
> HDF5 files.
> {{"hdf5": {
>   "type": "hdf5",
>   "extensions": [
>     "h5"
>   ],
>   "defaultPath": null
> }}}
> h3. Usage
> Since HDF5 can be viewed as a file system within a file, a single file can 
> contain many datasets. For instance, if you have a simple HDF5 file, a star 
> query will produce the following result:
> {{apache drill> select * from dfs.test.`dset.h5`;
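> +-------+-----------+-----------+--------------------------------------------------------------------------+
> | path  | data_type | file_name | int_data                                                                 |
> +-------+-----------+-----------+--------------------------------------------------------------------------+
> | /dset | DATASET   | dset.h5   | [[1,2,3,4,5,6],[7,8,9,10,11,12],[13,14,15,16,17,18],[19,20,21,22,23,24]] |
> +-------+-----------+-----------+--------------------------------------------------------------------------+}}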
> The actual data in this file is mapped to a column called int_data. In order 
> to effectively access the data, you should use Drill's FLATTEN() function on 
> the int_data column, which produces the following result.
> {{apache drill> select flatten(int_data) as int_data from dfs.test.`dset.h5`;
> +---------------------+
> | int_data            |
> +---------------------+
> | [1,2,3,4,5,6]       |
> | [7,8,9,10,11,12]    |
> | [13,14,15,16,17,18] |
> | [19,20,21,22,23,24] |
> +---------------------+}}
> Once you have the data in this form, you can access it similarly to how you 
> might access nested data in JSON or other files.
> {{apache drill> SELECT int_data[0] as col_0,
> . .semicolon> int_data[1] as col_1,
> . .semicolon> int_data[2] as col_2
> . .semicolon> FROM ( SELECT flatten(int_data) AS int_data
> . . . . . .)> FROM dfs.test.`dset.h5`
> . . . . . .)> );
> +-------+-------+-------+
> | col_0 | col_1 | col_2 |
> +-------+-------+-------+
> | 1     | 2     | 3     |
> | 7     | 8     | 9     |
> | 13    | 14    | 15    |
> | 19    | 20    | 21    |
> +-------+-------+-------+}}
> Alternatively, a better way to query the actual data in an HDF5 file is to 
> use the defaultPath field in your query. If the defaultPath field is defined 
> in the query, or via the plugin configuration, Drill will only return the 
> data, rather than the file metadata.
> ** Note: Once you have determined which data set you are querying, it is 
> advisable to use this method to query HDF5 data. **
> You can set the defaultPath variable in either the plugin configuration, or 
> at query time using the table() function as shown in the example below:
> {{SELECT * 
> FROM table(dfs.test.`dset.h5` (type => 'hdf5', defaultPath => '/dset'))}}
> This query will return the result below:
> {{apache drill> SELECT * FROM table(dfs.test.`dset.h5` (type => 'hdf5', 
> defaultPath => '/dset'));
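> +-----------+-----------+-----------+-----------+-----------+-----------+
> | int_col_0 | int_col_1 | int_col_2 | int_col_3 | int_col_4 | int_col_5 |
> +-----------+-----------+-----------+-----------+-----------+-----------+
> | 1         | 2         | 3         | 4         | 5         | 6         |
> | 7         | 8         | 9         | 10        | 11        | 12        |
> | 13        | 14        | 15        | 16        | 17        | 18        |
> | 19        | 20        | 21        | 22        | 23        | 24        |
> +-----------+-----------+-----------+-----------+-----------+-----------+}}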
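The FLATTEN-then-index usage described in the DRILL-7233 text can be mimicked outside Drill. The following is a minimal Java sketch, not Drill code: the class and method names (`FlattenSketch`, `flatten`, `project`) are hypothetical, and it only assumes the 4x6 `int_data` array from the example. It shows one row per inner array after flattening, then the first three elements of each row as `col_0` to `col_2`.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: imitates FLATTEN followed by int_data[0..2] on the
// example data. Names are hypothetical; this is not how Drill implements it.
class FlattenSketch {

    // FLATTEN: emit one output row per element of the outer array.
    static List<int[]> flatten(int[][] nested) {
        List<int[]> rows = new ArrayList<>();
        for (int[] inner : nested) {
            rows.add(inner);
        }
        return rows;
    }

    // int_data[0], int_data[1], ...: keep the first `count` values of each row.
    static List<int[]> project(List<int[]> rows, int count) {
        List<int[]> out = new ArrayList<>();
        for (int[] row : rows) {
            int[] cols = new int[count];
            System.arraycopy(row, 0, cols, 0, count);
            out.add(cols);
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] intData = {
            {1, 2, 3, 4, 5, 6},
            {7, 8, 9, 10, 11, 12},
            {13, 14, 15, 16, 17, 18},
            {19, 20, 21, 22, 23, 24}
        };
        for (int[] row : project(flatten(intData), 3)) {
            System.out.println(row[0] + " " + row[1] + " " + row[2]);
        }
    }
}
```

Running `main` prints four lines, one per flattened row, matching the `col_0`, `col_1`, `col_2` values in the query result above.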

[jira] [Commented] (DRILL-7222) Visualize estimated and actual row counts for a query

2019-05-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832482#comment-16832482
 ] 

ASF GitHub Bot commented on DRILL-7222:
---

arina-ielchiieva commented on issue #1779: DRILL-7222: Visualize estimated and 
actual row counts for a query
URL: https://github.com/apache/drill/pull/1779#issuecomment-489087756
 
 
   @kkhatua why is it false by default? Also, can we just show the estimated row 
count without a separate option for it? If the estimated row count is 
available, we show it; if not, we don't. Or maybe decide based on 
statistics availability: if statistics are available, show it; otherwise don't.
 



> Visualize estimated and actual row counts for a query
> -
>
> Key: DRILL-7222
> URL: https://issues.apache.org/jira/browse/DRILL-7222
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
>Affects Versions: 1.16.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Major
>  Labels: user-experience
> Fix For: 1.17.0
>
>
> With statistics in place, it would be useful to have the *estimated* rowcount 
> alongside the *actual* rowcount in the query profile's operator overview.
> We can extract this from the Physical Plan section of the profile.




[jira] [Commented] (DRILL-7222) Visualize estimated and actual row counts for a query

2019-05-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832484#comment-16832484
 ] 

ASF GitHub Bot commented on DRILL-7222:
---

arina-ielchiieva commented on issue #1779: DRILL-7222: Visualize estimated and 
actual row counts for a query
URL: https://github.com/apache/drill/pull/1779#issuecomment-489087756
 
 
   @kkhatua why is it false by default? Also, can we just show the estimated row 
count without a separate option for it? If the estimated row count is 
available, we show it; if not, we don't. Or maybe decide based on 
statistics availability: if statistics are available, show it; otherwise don't.
 


[jira] [Commented] (DRILL-7222) Visualize estimated and actual row counts for a query

2019-05-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832485#comment-16832485
 ] 

ASF GitHub Bot commented on DRILL-7222:
---

arina-ielchiieva commented on issue #1779: DRILL-7222: Visualize estimated and 
actual row counts for a query
URL: https://github.com/apache/drill/pull/1779#issuecomment-489087756
 
 
   @kkhatua why is it false by default? Also, can we just show the estimated row 
count without a separate option for it? If the estimated row count is 
available, we show it; if not, we don't. Or maybe decide based on 
statistics availability: if statistics are available, show it; otherwise don't.
   Also, it might be more obvious to have a separate column for the estimated row 
count, rather than explaining what the number in parentheses means.
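
The two presentation options raised in this comment can be compared side by side. This is a hedged Java sketch with hypothetical names (`RowCountCellSketch`, `combined`, `separate`); it does not reflect the code in the pull request. It assumes the estimate may be missing and models that with `OptionalLong`, so the estimate is shown only when it is available.

```java
import java.util.OptionalLong;

// Hypothetical sketch of the two display options discussed in the review.
class RowCountCellSketch {

    // One column: actual count, with the estimate in parentheses when known.
    static String combined(long actual, OptionalLong estimated) {
        return estimated.isPresent()
            ? actual + " (" + estimated.getAsLong() + ")"
            : Long.toString(actual);
    }

    // Two columns: the estimate gets its own cell, left blank when unavailable.
    static String[] separate(long actual, OptionalLong estimated) {
        String estimateCell = estimated.isPresent()
            ? Long.toString(estimated.getAsLong())
            : "";
        return new String[] {Long.toString(actual), estimateCell};
    }

    public static void main(String[] args) {
        System.out.println(combined(100, OptionalLong.of(120)));
        System.out.println(combined(100, OptionalLong.empty()));
        System.out.println(String.join(" | ", separate(100, OptionalLong.of(120))));
    }
}
```

With a separate column, no legend is needed to explain the parenthesized number, which is the point the reviewer makes.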
 


[jira] [Commented] (DRILL-6965) Adjust table function usage for all storage plugins and implement schema parameter

2019-05-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832548#comment-16832548
 ] 

ASF GitHub Bot commented on DRILL-6965:
---

arina-ielchiieva commented on issue #1777: DRILL-6965: Implement schema table 
function parameter
URL: https://github.com/apache/drill/pull/1777#issuecomment-489120732
 
 
   @vvysotskyi thanks for the code review. Besides the requested changes, I converted 
`SchemaParsingException` into `IOException` to ensure it is always handled in 
the code. I also added validation of the schema parameter before the query is 
executed.
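
Why a checked exception forces handling can be shown in a few lines. The sketch below is an assumption-laden illustration, not the code from the pull request: it reads "converted into `IOException`" as making the exception a subclass of `IOException`, and the parser, the class `CheckedSchemaErrorSketch` and the method names are hypothetical stand-ins.

```java
import java.io.IOException;

// Hypothetical illustration of checked-exception handling and up-front validation.
class CheckedSchemaErrorSketch {

    // Stand-in for the real exception: extending IOException makes it checked,
    // so callers must either catch it or declare it.
    static class SchemaParsingException extends IOException {
        SchemaParsingException(String message) {
            super(message);
        }
    }

    // Hypothetical parser: rejects empty schema text, otherwise returns it trimmed.
    static String parseSchema(String text) throws SchemaParsingException {
        if (text == null || text.trim().isEmpty()) {
            throw new SchemaParsingException("Schema text is empty");
        }
        return text.trim();
    }

    // Validation before execution: the compiler will not accept this method
    // without the catch block (or a throws clause).
    static boolean isValid(String text) {
        try {
            parseSchema(text);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("col1 date"));
        System.out.println(isValid("   "));
    }
}
```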
 



> Adjust table function usage for all storage plugins and implement schema 
> parameter
> --
>
> Key: DRILL-6965
> URL: https://issues.apache.org/jira/browse/DRILL-6965
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.17.0
>
>
> A schema can be applied when reading a table in two ways:
>  a. the schema is created in the table root folder using the CREATE SCHEMA 
> command and schema usage is enabled;
>  b. the schema is indicated in a table function.
>  This Jira implements point b.
> Indicating the schema in a table function is useful when the user does not want 
> to persist the schema in the table root location, or when reading from a file 
> rather than a folder.
> The schema parameter can be used on its own or together with format plugin 
> table properties.
> Usage examples:
> Pre-requisites: 
>  V3 reader must be enabled: {{set `exec.storage.enable_v3_text_reader` = 
> true;}}
> Query examples:
> 1. There is a folder with files, or just one file (e.g. dfs.tmp.text_table), and 
> the user wants to apply a schema to them:
>  a. indicate the schema inline:
> {noformat}
> select * from table(dfs.tmp.`text_table`(
> schema => 'inline=(col1 date properties {`drill.format` = `yyyy-MM-dd`}) 
> properties {`drill.strict` = `false`}'))
> {noformat}
> To indicate only table properties use the following syntax:
> {noformat}
> select * from table(dfs.tmp.`text_table`(
> schema => 'inline=() 
> properties {`drill.strict` = `false`}'))
> {noformat}
> b. indicate the schema using a path:
>  First, a schema is created in some location using the CREATE SCHEMA command. For 
> example:
> {noformat}
> create schema 
> (col int)
> path '/tmp/my_schema'
> {noformat}
> Now the user wants to apply this schema in a table function:
> {noformat}
> select * from table(dfs.tmp.`text_table`(schema => 'path=`/tmp/my_schema`'))
> {noformat}
> 2. The user wants to apply a schema along with format plugin table function 
> parameters.
>  Assume the user has a CSV file with headers whose extension does not match 
> the default extension for text files with headers (e.g. cars.csvh-test):
> {noformat}
> select * from table(dfs.tmp.`cars.csvh-test`(type => 'text', 
> fieldDelimiter => ',', extractHeader => true,
> schema => 'inline=(col1 date)'))
> {noformat}
> More details about syntax can be found in design document:
>  
> [https://docs.google.com/document/d/1mp4egSbNs8jFYRbPVbm_l0Y5GjH3HnoqCmOpMTR_g4w/edit]




[jira] [Assigned] (DRILL-7235) Download page should link to verification instructions

2019-05-03 Thread Pritesh Maker (JIRA)



 [ 
https://issues.apache.org/jira/browse/DRILL-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker reassigned DRILL-7235:


Assignee: Bridget Bevens

> Download page should link to verification instructions
> --
>
> Key: DRILL-7235
> URL: https://issues.apache.org/jira/browse/DRILL-7235
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Sebb
>Assignee: Bridget Bevens
>Priority: Major
>
> The download page includes links to KEYS, signatures and hashes, but does not 
> provide any information on why or how they should be used.
>  
>  




[jira] [Updated] (DRILL-7222) Visualize estimated and actual row counts for a query

2019-05-03 Thread Arina Ielchiieva (JIRA)



 [ 
https://issues.apache.org/jira/browse/DRILL-7222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7222:

Labels: doc-impacting user-experience  (was: user-experience)

> Visualize estimated and actual row counts for a query
> -
>
> Key: DRILL-7222
> URL: https://issues.apache.org/jira/browse/DRILL-7222
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
>Affects Versions: 1.16.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Major
>  Labels: doc-impacting, user-experience
> Fix For: 1.17.0
>
>
> With statistics in place, it would be useful to have the *estimated* rowcount 
> alongside the *actual* rowcount in the query profile's operator overview.
> We can extract this from the Physical Plan section of the profile.




[jira] [Commented] (DRILL-7030) Make format plugins fully pluggable

2019-05-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832600#comment-16832600
 ] 

ASF GitHub Bot commented on DRILL-7030:
---

agozhiy commented on pull request #1780: DRILL-7030: Make format plugins fully 
pluggable
URL: https://github.com/apache/drill/pull/1780
 
 
   - Bootstrap files for format plugins were introduced and added to the 
existing plugins in contrib.
   - Formats from these files are added dynamically to the corresponding 
storage plugins.
 



> Make format plugins fully pluggable
> ---
>
> Key: DRILL-7030
> URL: https://issues.apache.org/jira/browse/DRILL-7030
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.15.0
>Reporter: Arina Ielchiieva
>Assignee: Anton Gozhiy
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.17.0
>
>
> Discussion on the mailing list - 
> [https://lists.apache.org/thread.html/0c7de9c23ee9a8e18f8548ae0a323284cf1311b9570bd77ba544f63d@%3Cdev.drill.apache.org%3E]
> {noformat}
> In the past we added new formats / plugins to the exec module. Eventually 
> the exec package grew to the point where it was better to move plugin 
> and format contributions into a separate module.
> Now we have the contrib module for such contributions. Plugins are 
> pluggable: they are added automatically by means of a drill-module.conf 
> file which points to the packages to scan.
> Format plugins use the same approach; the only problem is that they are 
> not added to bootstrap-storage-plugins.json. So when adding a new format 
> plugin, in order for it to appear automatically in the Drill Web UI, the 
> developer has to update the bootstrap file, which is in the exec module.
> My suggestion is to implement functionality that merges the format config 
> with the bootstrap one. For example, each plugin would have a 
> bootstrap-format.json file stating which storage plugin the format should 
> be added to (same structure as bootstrap-storage-plugins.json):
> Example:
> {
>   "storage":{
>     dfs: {
>       formats: {
>         "psv" : {
>           type: "msgpack",
>           extensions: [ "mp" ]
>         }
>       }
>     }
>   }
> }
> Then during Drill start up such bootstrap-format.json files will be merged 
> with bootstrap-storage-plugins.json.{noformat}
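
The merge proposed in the description can be sketched with plain maps. This is a simplified Java illustration under stated assumptions: real Drill configs are typed objects, whereas here a storage plugin is modeled as a map from format name to format type, and `BootstrapFormatMergeSketch` and `mergeFormats` are hypothetical names. Formats are added to an existing storage plugin by name; formats aimed at an unknown storage plugin are skipped with a warning, and an existing format of the same name is kept.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified model: storage plugin name -> (format name -> format type).
class BootstrapFormatMergeSketch {

    static void mergeFormats(Map<String, Map<String, String>> bootstrap,
                             Map<String, Map<String, String>> extra) {
        extra.forEach((pluginName, formats) -> {
            Map<String, String> target = bootstrap.get(pluginName);
            if (target == null) {
                // No such storage plugin in the bootstrap config: nothing to attach to.
                System.err.println("No storage plugin named '" + pluginName + "', skipping");
                return;
            }
            // Keep formats that are already defined; add only the new ones.
            formats.forEach(target::putIfAbsent);
        });
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> bootstrap = new LinkedHashMap<>();
        Map<String, String> dfsFormats = new LinkedHashMap<>();
        dfsFormats.put("csv", "text");
        bootstrap.put("dfs", dfsFormats);

        Map<String, Map<String, String>> extra = new LinkedHashMap<>();
        Map<String, String> contributed = new LinkedHashMap<>();
        contributed.put("mp", "msgpack");
        extra.put("dfs", contributed);

        mergeFormats(bootstrap, extra);
        System.out.println(bootstrap);
    }
}
```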




[jira] [Commented] (DRILL-7030) Make format plugins fully pluggable

2019-05-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832619#comment-16832619
 ] 

ASF GitHub Bot commented on DRILL-7030:
---

arina-ielchiieva commented on pull request #1780: DRILL-7030: Make format 
plugins fully pluggable
URL: https://github.com/apache/drill/pull/1780#discussion_r280834454
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/StoragePluginRegistryImpl.java
 ##
 @@ -334,6 +331,48 @@ private StoragePlugins loadBootstrapPlugins(LogicalPlanPersistence lpPersistence
 }
   }
 
+  private void loadStoragePlugins(URL url, StoragePlugins bootstrapPlugins, Map pluginURLMap, LogicalPlanPersistence lpPersistence) throws IOException {
+    StoragePlugins plugins = getPluginsFromResource(url, lpPersistence);
+    plugins.forEach(plugin -> {
+      StoragePluginConfig oldPluginConfig = bootstrapPlugins.putIfAbsent(plugin.getKey(), plugin.getValue());
+      if (oldPluginConfig != null) {
+        logger.warn("Duplicate plugin instance '{}' defined in [{}, {}], ignoring the later one.",
+            plugin.getKey(), pluginURLMap.get(plugin.getKey()), url);
+      } else {
+        pluginURLMap.put(plugin.getKey(), url);
+      }
+    });
+  }
+
+  private void loadFormatPlugins(URL url, StoragePlugins bootstrapPlugins, Map pluginURLMap, LogicalPlanPersistence lpPersistence) throws IOException {
+    StoragePlugins plugins = getPluginsFromResource(url, lpPersistence);
+    plugins.forEach(formatPlugin -> {
+      String targetStoragePluginName = formatPlugin.getKey();
+      StoragePluginConfig storagePlugin = bootstrapPlugins.getConfig(targetStoragePluginName);
+      StoragePluginConfig formatPluginValue = formatPlugin.getValue();
+      if (storagePlugin == null) {
+        logger.warn("No storage plugins with the given name are registered: '{}'", targetStoragePluginName);
+      } else if (storagePlugin instanceof FileSystemConfig && formatPluginValue instanceof FileSystemConfig) {
+        FileSystemConfig targetPlugin = (FileSystemConfig) storagePlugin;
+        ((FileSystemConfig) formatPluginValue).getFormats().forEach((formatName, formatValue) -> {
+          if (targetPlugin.getFormats().containsKey(formatName)) {
 
 Review comment:
   Use putIfAbsent as in `loadStoragePlugins`
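
The idiom the reviewer points to can be shown in isolation. This is a minimal Java sketch using only `java.util.Map`; the class `PutIfAbsentSketch` and its method names are hypothetical and the maps hold plain strings rather than Drill format configs. It contrasts a `containsKey` check followed by `put` with a single `putIfAbsent` call whose return value signals a duplicate.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical comparison of check-then-put with Map.putIfAbsent.
class PutIfAbsentSketch {

    // Check-then-act, similar in shape to the containsKey test in the diff.
    static boolean addWithContainsKey(Map<String, String> formats, String name, String value) {
        if (formats.containsKey(name)) {
            return false; // duplicate: keep the existing entry
        }
        formats.put(name, value);
        return true;
    }

    // Single call: putIfAbsent returns the existing value, or null if it inserted.
    static boolean addWithPutIfAbsent(Map<String, String> formats, String name, String value) {
        return formats.putIfAbsent(name, value) == null;
    }

    public static void main(String[] args) {
        Map<String, String> formats = new LinkedHashMap<>();
        System.out.println(addWithPutIfAbsent(formats, "csv", "text"));
        System.out.println(addWithPutIfAbsent(formats, "csv", "other"));
        System.out.println(formats);
    }
}
```

One caveat worth keeping in mind: `putIfAbsent` treats a key that is mapped to `null` as absent and overwrites it, whereas `containsKey` reports such a key as present, so the two versions differ for maps that store `null` values.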
 


[jira] [Commented] (DRILL-7030) Make format plugins fully pluggable

2019-05-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832616#comment-16832616
 ] 

ASF GitHub Bot commented on DRILL-7030:
---

arina-ielchiieva commented on pull request #1780: DRILL-7030: Make format 
plugins fully pluggable
URL: https://github.com/apache/drill/pull/1780#discussion_r280833295
 
 

 ##
 File path: exec/java-exec/src/test/resources/bootstrap-format-plugins.json
 ##
 @@ -0,0 +1,31 @@
+{
 
 Review comment:
   Will the below formats be loaded for all tests since they are present in the 
classpath or only for the specific unit tests?
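
As background for this question: a resource lookup by name through a class loader sees every matching file on the classpath, regardless of which test is running. The Java sketch below demonstrates that general behavior with the standard `ClassLoader.getResources` call. Whether Drill locates its bootstrap files this way is an assumption not confirmed by the thread, and `ClasspathResourceSketch` and `findAll` are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Lists every classpath resource with the given name (standard JDK behavior).
class ClasspathResourceSketch {

    static List<URL> findAll(String resourceName) {
        ClassLoader loader = ClasspathResourceSketch.class.getClassLoader();
        try {
            // getResources returns one URL per classpath entry that contains the resource.
            return Collections.list(loader.getResources(resourceName));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // File name taken from the diff under review; it may be absent when run standalone.
        for (URL url : findAll("bootstrap-format-plugins.json")) {
            System.out.println(url);
        }
    }
}
```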
 


[jira] [Commented] (DRILL-7030) Make format plugins fully pluggable

2019-05-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832615#comment-16832615
 ] 

ASF GitHub Bot commented on DRILL-7030:
---

arina-ielchiieva commented on pull request #1780: DRILL-7030: Make format 
plugins fully pluggable
URL: https://github.com/apache/drill/pull/1780#discussion_r280832129
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemConfig.java
 ##
 @@ -48,7 +49,7 @@ public FileSystemConfig(@JsonProperty("connection") String connection,
     Map caseInsensitiveWorkspaces = CaseInsensitiveMap.newHashMap();
     Optional.ofNullable(workspaces).ifPresent(caseInsensitiveWorkspaces::putAll);
     this.workspaces = caseInsensitiveWorkspaces;
-    this.formats = formats;
+    this.formats = formats != null? formats : new LinkedHashMap<>();
 
 Review comment:
   ```suggestion
   this.formats = formats != null ? formats : new LinkedHashMap<>();
   ```
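
The effect of the null default in this change can be shown in a small, self-contained Java sketch. The class `NullSafeFormatsSketch` is a hypothetical stand-in for a config object and stores strings rather than real format configs. It shows that callers can iterate over the formats without a null check once the constructor substitutes an empty map, using either the ternary from the suggestion or an equivalent `Optional` form.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for a config class whose formats argument may be null.
class NullSafeFormatsSketch {

    private final Map<String, String> formats;

    NullSafeFormatsSketch(Map<String, String> formats) {
        // Same shape as the suggested line: fall back to an empty, ordered map.
        this.formats = formats != null ? formats : new LinkedHashMap<>();
    }

    // Equivalent alternative using Optional, shown for comparison only.
    static Map<String, String> orEmpty(Map<String, String> formats) {
        return Optional.ofNullable(formats).orElseGet(LinkedHashMap::new);
    }

    Map<String, String> getFormats() {
        return formats;
    }

    public static void main(String[] args) {
        NullSafeFormatsSketch config = new NullSafeFormatsSketch(null);
        // Safe even though null was passed in: the map is empty, not null.
        config.getFormats().forEach((name, type) -> System.out.println(name + " -> " + type));
        System.out.println(config.getFormats().size());
    }
}
```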
 


[jira] [Commented] (DRILL-7030) Make format plugins fully pluggable

2019-05-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832621#comment-16832621
 ] 

ASF GitHub Bot commented on DRILL-7030:
---

arina-ielchiieva commented on pull request #1780: DRILL-7030: Make format 
plugins fully pluggable
URL: https://github.com/apache/drill/pull/1780#discussion_r280834206
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/StoragePluginRegistryImpl.java
 ##
 @@ -334,6 +331,48 @@ private StoragePlugins loadBootstrapPlugins(LogicalPlanPersistence lpPersistence
 }
   }
 
+  private void loadStoragePlugins(URL url, StoragePlugins bootstrapPlugins, Map pluginURLMap, LogicalPlanPersistence lpPersistence) throws IOException {
 
 Review comment:
   Please add Javadoc.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Make format plugins fully pluggable
> ---
>
> Key: DRILL-7030
> URL: https://issues.apache.org/jira/browse/DRILL-7030
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.15.0
>Reporter: Arina Ielchiieva
>Assignee: Anton Gozhiy
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.17.0
>
>
> Discussion on the mailing list - 
> [https://lists.apache.org/thread.html/0c7de9c23ee9a8e18f8548ae0a323284cf1311b9570bd77ba544f63d@%3Cdev.drill.apache.org%3E]
> {noformat}
> Previously we added new formats / plugins to the exec module. Eventually
> the exec package grew to the point where it became better to keep plugin
> and format contributions in a separate module.
> Now we have the contrib module for such contributions. Plugins are
> pluggable: they are added automatically by means of a drill-module.conf
> file which points to the packages to scan.
> Format plugins use the same approach; the only problem is that they are
> not added to bootstrap-storage-plugins.json. So when adding a new format
> plugin, in order for it to appear automatically in the Drill Web UI, the
> developer has to update the bootstrap file, which is in the exec module.
> My suggestion is to implement functionality that merges the format config
> with the bootstrap one. For example, each plugin would have a
> bootstrap-format.json file stating which plugin the format should
> be added to (same structure as in bootstrap-storage-plugins.json):
> Example:
> {
>   "storage":{
>     dfs: {
>       formats: {
>         "psv" : {
>           type: "msgpack",
>           extensions: [ "mp" ]
>         }
>       }
>     }
>   }
> }
> Then during Drill startup such bootstrap-format.json files will be merged
> with bootstrap-storage-plugins.json.{noformat}
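
The merge step proposed in the description can be sketched in plain Java. This is only an illustration under stated assumptions: the class `BootstrapFormatMerge`, the method `mergeFormats`, and the plain string-keyed maps standing in for Drill's config classes are all hypothetical. The rule that an already-registered format name wins is also an assumption; the issue text does not specify it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the merge step; these are not Drill's classes. */
final class BootstrapFormatMerge {

  /**
   * Copies formats from a bootstrap-format fragment into the target plugin's
   * format map. A format name that already exists in the target is kept as is
   * (assumption), and its name is returned so that a caller could log it.
   */
  static <V> List<String> mergeFormats(Map<String, V> target, Map<String, V> extra) {
    List<String> skipped = new ArrayList<>();
    extra.forEach((name, config) -> {
      if (target.containsKey(name)) {
        skipped.add(name);        // duplicate: keep the earlier definition
      } else {
        target.put(name, config); // new format: add it to the target plugin
      }
    });
    return skipped;
  }

  public static void main(String[] args) {
    // Formats already present on the "dfs" plugin (format name -> type).
    Map<String, String> dfsFormats = new LinkedHashMap<>();
    dfsFormats.put("csv", "text");

    // Formats contributed by a hypothetical bootstrap-format.json fragment.
    Map<String, String> fragment = new LinkedHashMap<>();
    fragment.put("psv", "msgpack");
    fragment.put("csv", "other");

    List<String> skipped = mergeFormats(dfsFormats, fragment);
    System.out.println(dfsFormats.keySet()); // [csv, psv]
    System.out.println(skipped);             // [csv]
  }
}
```

Returning the skipped names rather than logging inside the method keeps the sketch free of any logging dependency; a real implementation would likely warn at that point.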




[jira] [Commented] (DRILL-7030) Make format plugins fully pluggable

2019-05-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832620#comment-16832620
 ] 

ASF GitHub Bot commented on DRILL-7030:
---

arina-ielchiieva commented on pull request #1780: DRILL-7030: Make format 
plugins fully pluggable
URL: https://github.com/apache/drill/pull/1780#discussion_r280834627
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/StoragePluginRegistryImpl.java
 ##
 @@ -334,6 +331,48 @@ private StoragePlugins loadBootstrapPlugins(LogicalPlanPersistence lpPersistence
      }
    }
 
+  private void loadStoragePlugins(URL url, StoragePlugins bootstrapPlugins, Map pluginURLMap, LogicalPlanPersistence lpPersistence) throws IOException {
+    StoragePlugins plugins = getPluginsFromResource(url, lpPersistence);
+    plugins.forEach(plugin -> {
+      StoragePluginConfig oldPluginConfig = bootstrapPlugins.putIfAbsent(plugin.getKey(), plugin.getValue());
+      if (oldPluginConfig != null) {
+        logger.warn("Duplicate plugin instance '{}' defined in [{}, {}], ignoring the later one.",
+            plugin.getKey(), pluginURLMap.get(plugin.getKey()), url);
+      } else {
+        pluginURLMap.put(plugin.getKey(), url);
+      }
+    });
+  }
+
+  private void loadFormatPlugins(URL url, StoragePlugins bootstrapPlugins, Map pluginURLMap, LogicalPlanPersistence lpPersistence) throws IOException {
+    StoragePlugins plugins = getPluginsFromResource(url, lpPersistence);
+    plugins.forEach(formatPlugin -> {
+      String targetStoragePluginName = formatPlugin.getKey();
+      StoragePluginConfig storagePlugin = bootstrapPlugins.getConfig(targetStoragePluginName);
+      StoragePluginConfig formatPluginValue = formatPlugin.getValue();
+      if (storagePlugin == null) {
+        logger.warn("No storage plugins with the given name are registered: '{}'", targetStoragePluginName);
+      } else if (storagePlugin instanceof FileSystemConfig && formatPluginValue instanceof FileSystemConfig) {
+        FileSystemConfig targetPlugin = (FileSystemConfig) storagePlugin;
+        ((FileSystemConfig) formatPluginValue).getFormats().forEach((formatName, formatValue) -> {
+          if (targetPlugin.getFormats().containsKey(formatName)) {
+            logger.warn("Duplicate format instance '{}' defined in [{}, {}], ignoring the later one.",
+                formatName, pluginURLMap.get(targetStoragePluginName), url);
+          } else {
+            targetPlugin.getFormats().put(formatName, formatValue);
+          }
+        });
+      } else {
+        logger.warn("Formats are only supported by File System plugin type: '{}'", targetStoragePluginName);
 
 Review comment:
   ```suggestion
   logger.warn("Formats are only supported by File System plugin type: [{}]", targetStoragePluginName);
   ```
 


[jira] [Commented] (DRILL-7030) Make format plugins fully pluggable

2019-05-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832618#comment-16832618
 ] 

ASF GitHub Bot commented on DRILL-7030:
---

arina-ielchiieva commented on pull request #1780: DRILL-7030: Make format 
plugins fully pluggable
URL: https://github.com/apache/drill/pull/1780#discussion_r280834695
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/StoragePluginRegistryImpl.java
 ##
 @@ -334,6 +331,48 @@ private StoragePlugins loadBootstrapPlugins(LogicalPlanPersistence lpPersistence
      }
    }
 
+  private void loadStoragePlugins(URL url, StoragePlugins bootstrapPlugins, Map pluginURLMap, LogicalPlanPersistence lpPersistence) throws IOException {
+    StoragePlugins plugins = getPluginsFromResource(url, lpPersistence);
+    plugins.forEach(plugin -> {
+      StoragePluginConfig oldPluginConfig = bootstrapPlugins.putIfAbsent(plugin.getKey(), plugin.getValue());
+      if (oldPluginConfig != null) {
+        logger.warn("Duplicate plugin instance '{}' defined in [{}, {}], ignoring the later one.",
+            plugin.getKey(), pluginURLMap.get(plugin.getKey()), url);
+      } else {
+        pluginURLMap.put(plugin.getKey(), url);
+      }
+    });
+  }
+
+  private void loadFormatPlugins(URL url, StoragePlugins bootstrapPlugins, Map pluginURLMap, LogicalPlanPersistence lpPersistence) throws IOException {
+    StoragePlugins plugins = getPluginsFromResource(url, lpPersistence);
+    plugins.forEach(formatPlugin -> {
+      String targetStoragePluginName = formatPlugin.getKey();
+      StoragePluginConfig storagePlugin = bootstrapPlugins.getConfig(targetStoragePluginName);
+      StoragePluginConfig formatPluginValue = formatPlugin.getValue();
+      if (storagePlugin == null) {
+        logger.warn("No storage plugins with the given name are registered: '{}'", targetStoragePluginName);
 
 Review comment:
   ```suggestion
   logger.warn("No storage plugins with the given name are registered: [{}]", targetStoragePluginName);
   ```
 


[jira] [Commented] (DRILL-7030) Make format plugins fully pluggable

2019-05-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832622#comment-16832622
 ] 

ASF GitHub Bot commented on DRILL-7030:
---

arina-ielchiieva commented on pull request #1780: DRILL-7030: Make format 
plugins fully pluggable
URL: https://github.com/apache/drill/pull/1780#discussion_r280834770
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/StoragePluginRegistryImpl.java
 ##
 @@ -334,6 +331,48 @@ private StoragePlugins loadBootstrapPlugins(LogicalPlanPersistence lpPersistence
      }
    }
 
+  private void loadStoragePlugins(URL url, StoragePlugins bootstrapPlugins, Map pluginURLMap, LogicalPlanPersistence lpPersistence) throws IOException {
+    StoragePlugins plugins = getPluginsFromResource(url, lpPersistence);
+    plugins.forEach(plugin -> {
+      StoragePluginConfig oldPluginConfig = bootstrapPlugins.putIfAbsent(plugin.getKey(), plugin.getValue());
+      if (oldPluginConfig != null) {
+        logger.warn("Duplicate plugin instance '{}' defined in [{}, {}], ignoring the later one.",
 
 Review comment:
   ```suggestion
   logger.warn("Duplicate plugin instance [{}] defined in [{}, {}], ignoring the later one.",
   ```
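
The duplicate check in the diff above relies on `Map.putIfAbsent`, which returns the existing value when the key is already present and `null` otherwise. A minimal sketch of that first-definition-wins behavior, using only the JDK. The class `FirstDefinitionWins`, the `register` method, and the string-valued maps are hypothetical stand-ins for illustration, not Drill code:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of first-definition-wins registration. */
final class FirstDefinitionWins {

  /**
   * Registers a plugin name unless it is already known.
   * Returns the source that defined the name first when this call is a
   * duplicate, or null when this call registered the name.
   */
  static String register(Map<String, String> configs, Map<String, String> sources,
                         String name, String config, String source) {
    String previous = configs.putIfAbsent(name, config);
    if (previous != null) {
      return sources.get(name); // duplicate: the earlier config is untouched
    }
    sources.put(name, source);  // remember where the first definition came from
    return null;
  }

  public static void main(String[] args) {
    Map<String, String> configs = new HashMap<>();
    Map<String, String> sources = new HashMap<>();
    System.out.println(register(configs, sources, "dfs", "configA", "first.json"));  // null
    System.out.println(register(configs, sources, "dfs", "configB", "second.json")); // first.json
    System.out.println(configs.get("dfs"));                                          // configA
  }
}
```

Tracking the source alongside the config is what lets a warning name both locations of a duplicated plugin.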
 


[jira] [Commented] (DRILL-7030) Make format plugins fully pluggable

2019-05-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832617#comment-16832617
 ] 

ASF GitHub Bot commented on DRILL-7030:
---

arina-ielchiieva commented on pull request #1780: DRILL-7030: Make format 
plugins fully pluggable
URL: https://github.com/apache/drill/pull/1780#discussion_r280834121
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/StoragePluginRegistryImpl.java
 ##
 @@ -310,22 +311,18 @@ public void addPluginToPersistentStoreIfAbsent(String name, StoragePluginConfig
   private StoragePlugins loadBootstrapPlugins(LogicalPlanPersistence lpPersistence) throws IOException {
     // bootstrap load the config since no plugins are stored.
     logger.info("No storage plugin instances configured in persistent store, loading bootstrap configuration.");
-    Set urls = ClassPathScanner.forResource(ExecConstants.BOOTSTRAP_STORAGE_PLUGINS_FILE, false);
-    if (urls != null && !urls.isEmpty()) {
-      logger.info("Loading the storage plugin configs from URLs {}.", urls);
+    Set storageUrls = ClassPathScanner.forResource(ExecConstants.BOOTSTRAP_STORAGE_PLUGINS_FILE, false);
+    Set formatUrls = ClassPathScanner.forResource(ExecConstants.BOOTSTRAP_FORMAT_PLUGINS_FILE, false);
+    if (storageUrls != null && !storageUrls.isEmpty()) {
+      logger.info("Loading the storage plugin configs from URLs {}.", storageUrls);
       StoragePlugins bootstrapPlugins = new StoragePlugins(new HashMap<>());
       Map pluginURLMap = new HashMap<>();
-      for (URL url : urls) {
-        String pluginsData = Resources.toString(url, Charsets.UTF_8);
-        StoragePlugins plugins = lpPersistence.getMapper().readValue(pluginsData, StoragePlugins.class);
-        for (Entry plugin : plugins) {
-          StoragePluginConfig oldPluginConfig = bootstrapPlugins.putIfAbsent(plugin.getKey(), plugin.getValue());
-          if (oldPluginConfig != null) {
-            logger.warn("Duplicate plugin instance '{}' defined in [{}, {}], ignoring the later one.",
-                plugin.getKey(), pluginURLMap.get(plugin.getKey()), url);
-          } else {
-            pluginURLMap.put(plugin.getKey(), url);
-          }
+      for (URL url : storageUrls) {
 
 Review comment:
   Add log info, similar to that for storage plugins.
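
The diff above collects every classpath copy of a bootstrap file before loading it. A rough JDK-only analogue of that lookup is sketched below, with the caveat that it uses `ClassLoader.getResources` rather than Drill's `ClassPathScanner`, whose behavior may differ. The class `ClasspathResources`, the method `find`, and the resource name passed in `main` are hypothetical and chosen only for illustration:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: lists every classpath location of a resource name. */
final class ClasspathResources {

  /** Returns all URLs on the classpath that match the given resource name. */
  static List<URL> find(String resourceName) {
    ClassLoader loader = Thread.currentThread().getContextClassLoader();
    if (loader == null) {
      loader = ClassLoader.getSystemClassLoader();
    }
    try {
      // getResources returns one URL per jar or directory containing the name.
      return Collections.list(loader.getResources(resourceName));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // The file name here is an assumption, used only as sample input.
    List<URL> urls = find("bootstrap-format-plugins.json");
    if (urls.isEmpty()) {
      System.out.println("No format plugin configs found on the classpath.");
    } else {
      System.out.println("Loading the format plugin configs from URLs " + urls);
    }
  }
}
```

Logging the resolved URLs for format files in the same way as for storage files makes it easier to see which jar contributed a given format.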
 


[jira] [Commented] (DRILL-7030) Make format plugins fully pluggable

2019-05-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832623#comment-16832623
 ] 

ASF GitHub Bot commented on DRILL-7030:
---

arina-ielchiieva commented on pull request #1780: DRILL-7030: Make format 
plugins fully pluggable
URL: https://github.com/apache/drill/pull/1780#discussion_r280835496
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/StoragePluginRegistryImpl.java
 ##
 @@ -334,6 +331,48 @@ private StoragePlugins loadBootstrapPlugins(LogicalPlanPersistence lpPersistence
      }
    }
 
+  private void loadStoragePlugins(URL url, StoragePlugins bootstrapPlugins, Map pluginURLMap, LogicalPlanPersistence lpPersistence) throws IOException {
+    StoragePlugins plugins = getPluginsFromResource(url, lpPersistence);
+    plugins.forEach(plugin -> {
+      StoragePluginConfig oldPluginConfig = bootstrapPlugins.putIfAbsent(plugin.getKey(), plugin.getValue());
+      if (oldPluginConfig != null) {
+        logger.warn("Duplicate plugin instance '{}' defined in [{}, {}], ignoring the later one.",
+            plugin.getKey(), pluginURLMap.get(plugin.getKey()), url);
+      } else {
+        pluginURLMap.put(plugin.getKey(), url);
+      }
+    });
+  }
+
+  private void loadFormatPlugins(URL url, StoragePlugins bootstrapPlugins, Map pluginURLMap, LogicalPlanPersistence lpPersistence) throws IOException {
 
 Review comment:
   Please add Javadoc.
 


[jira] [Updated] (DRILL-7030) Make format plugins fully pluggable

2019-05-03 Thread Arina Ielchiieva (JIRA)



 [ 
https://issues.apache.org/jira/browse/DRILL-7030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7030:

Reviewer: Arina Ielchiieva


[jira] [Assigned] (DRILL-7198) Issuing a control-C in Sqlline exits the session (it does cancel the query)

2019-05-03 Thread Arina Ielchiieva (JIRA)



 [ 
https://issues.apache.org/jira/browse/DRILL-7198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-7198:
---

Assignee: Volodymyr Vysotskyi

> Issuing a control-C in Sqlline exits the session (it does cancel the query)
> ---
>
> Key: DRILL-7198
> URL: https://issues.apache.org/jira/browse/DRILL-7198
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.15.0, 1.16.0
>Reporter: Aman Sinha
>Assignee: Volodymyr Vysotskyi
>Priority: Major
>
> This behavior is observed in both Drill 1.15.0 and RC1 of 1.16.0. Run a
> long-running query in sqlline and cancel it using control-c. It exits the
> sqlline session, although it does cancel the query. The behavior is seen in
> both embedded mode and distributed mode. If the query is submitted through
> sqlline and cancelled from the Web UI, it behaves correctly: the session
> is not killed and subsequent queries can be submitted in the same
> sqlline session.
> The same query in Drill 1.14.0 works correctly and returns the column headers
> while canceling the query.
> Since the query can be cancelled just fine through the Web UI, I am not
> considering this a blocker for 1.16. Very likely the sqlline upgrade in
> 1.15.0 changed the behavior.




[jira] [Commented] (DRILL-7198) Issuing a control-C in Sqlline exits the session (it does cancel the query)

2019-05-03 Thread Arina Ielchiieva (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832624#comment-16832624
 ] 

Arina Ielchiieva commented on DRILL-7198:
-

SqlLine issue - https://github.com/julianhyde/sqlline/issues/292


[jira] [Created] (DRILL-7236) SqlLine 1.8 upgrade

2019-05-03 Thread Arina Ielchiieva (JIRA)

Arina Ielchiieva created DRILL-7236:
---

 Summary: SqlLine 1.8 upgrade
 Key: DRILL-7236
 URL: https://issues.apache.org/jira/browse/DRILL-7236
 Project: Apache Drill
  Issue Type: Task
Reporter: Arina Ielchiieva
Assignee: Arina Ielchiieva


SqlLine 1.8 upgrade




[jira] [Updated] (DRILL-7158) null values for varchar, interval, boolean are displayed as empty string in SqlLine

2019-05-03 Thread Arina Ielchiieva (JIRA)



 [ 
https://issues.apache.org/jira/browse/DRILL-7158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7158:

Fix Version/s: (was: 1.17.0)

> null values for varchar, interval, boolean are displayed as empty string in 
> SqlLine
> ---
>
> Key: DRILL-7158
> URL: https://issues.apache.org/jira/browse/DRILL-7158
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Arina Ielchiieva
>Priority: Major
>
> Null values for varchar, interval, boolean are displayed as empty strings in
> SqlLine.
> Caused by SqlLine bug: [https://github.com/julianhyde/sqlline/issues/288]
> Possible workaround: set nullValue in a case other than lower case:
> {{!set nullValue Null}}.
> Should be fixed by the next SqlLine upgrade (to 1.8.0), once the bug is fixed
> in SqlLine.




[jira] [Assigned] (DRILL-7030) Make format plugins fully pluggable

2019-05-03 Thread Anton Gozhiy (JIRA)



 [ 
https://issues.apache.org/jira/browse/DRILL-7030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Gozhiy reassigned DRILL-7030:
---

Assignee: Anton Gozhiy  (was: Anton Gozhiy)


[jira] [Assigned] (DRILL-7204) Add proper validation when creating plugin

2019-05-03 Thread Anton Gozhiy (JIRA)



 [ 
https://issues.apache.org/jira/browse/DRILL-7204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Gozhiy reassigned DRILL-7204:
---

Assignee: Anton Gozhiy  (was: Anton Gozhiy)

> Add proper validation when creating plugin
> --
>
> Key: DRILL-7204
> URL: https://issues.apache.org/jira/browse/DRILL-7204
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Arina Ielchiieva
>Assignee: Anton Gozhiy
>Priority: Major
> Fix For: 1.17.0
>
> Attachments: alert.JPG, new_plugin.png
>
>
> 1. Currently there is no failure when a user attempts to create a plugin without
> a name. Screenshot attached. I think we need proper plugin name validation when
> creating a plugin.
> 2. When disabling and deleting a plugin, alerts are used. It's better to use
> a more user-friendly message window. Screenshot attached.




[jira] [Updated] (DRILL-7168) Implement ALTER SCHEMA ADD / REMOVE COLUMN / PROPERTY commands

2019-05-03 Thread Arina Ielchiieva (JIRA)



 [ 
https://issues.apache.org/jira/browse/DRILL-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7168:

Summary: Implement ALTER SCHEMA ADD / REMOVE COLUMN / PROPERTY commands  
(was: Implement ALTER TABLE SCHEMA ADD / REMOVE COLUMN / PROPERTY commands)

> Implement ALTER SCHEMA ADD / REMOVE COLUMN / PROPERTY commands
> --
>
> Key: DRILL-7168
> URL: https://issues.apache.org/jira/browse/DRILL-7168
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Arina Ielchiieva
>Priority: Major
> Fix For: 1.17.0
>
>
> By [~Paul.Rogers]:
> {quote}
> Sooner or later users are going to ask for a command to update just the 
> properties, or just add or remove a column, without having to spell out the 
> entire new schema. ALTER TABLE SCHEMA ADD/REMOVE COLUMN/PROPERTY ...
> {quote}




[jira] [Commented] (DRILL-7030) Make format plugins fully pluggable

2019-05-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832657#comment-16832657
 ] 

ASF GitHub Bot commented on DRILL-7030:
---

agozhiy commented on pull request #1780: DRILL-7030: Make format plugins fully 
pluggable
URL: https://github.com/apache/drill/pull/1780#discussion_r280851671
 
 

 ##
 File path: exec/java-exec/src/test/resources/bootstrap-format-plugins.json
 ##
 @@ -0,0 +1,31 @@
+{
 
 Review comment:
   They should be available for all tests.
 



> Make format plugins fully pluggable
> ---
>
> Key: DRILL-7030
> URL: https://issues.apache.org/jira/browse/DRILL-7030
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.15.0
>Reporter: Arina Ielchiieva
>Assignee: Anton Gozhiy
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.17.0
>
>
> Discussion on the mailing list - 
> [https://lists.apache.org/thread.html/0c7de9c23ee9a8e18f8548ae0a323284cf1311b9570bd77ba544f63d@%3Cdev.drill.apache.org%3E]
> {noformat}
> Previously we added new formats / plugins to the exec module. Eventually 
> the exec package grew to the point where it became better to keep plugin 
> and format contributions in a separate module.
> Now we have the contrib module where we add such contributions. Plugins are 
> pluggable: they are added automatically by means of a drill-module.conf 
> file which points to the packages to scan.
> Format plugins use the same approach; the only problem is that they are 
> not added to bootstrap-storage-plugins.json. So when adding a new format 
> plugin, in order for it to automatically appear in the Drill Web UI, the 
> developer has to update the bootstrap file, which is in the exec module.
> My suggestion is that we implement some functionality that would merge the 
> format config with the bootstrap one. For example, each plugin would have a 
> bootstrap-format.json file with information about which plugin the format 
> should be added to (same structure as in bootstrap-storage-plugins.json):
> Example:
> {
>   "storage": {
>     "dfs": {
>       "formats": {
>         "psv": {
>           "type": "msgpack",
>           "extensions": [ "mp" ]
>         }
>       }
>     }
>   }
> }
> Then during Drill start up such bootstrap-format.json files will be merged 
> with bootstrap-storage-plugins.json.
> {noformat}
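
The merge proposed in the description above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class BootstrapMerge, its merge and entry helpers, and the "csv" entry are hypothetical; configs are modelled as nested maps rather than parsed JSON files; and this is not the implementation from the pull request.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BootstrapMerge {

    // Recursively merges 'addition' into a copy of 'base'. Nested maps are merged
    // key by key; any other value from 'addition' replaces the value in 'base'.
    @SuppressWarnings("unchecked")
    public static Map<String, Object> merge(Map<String, Object> base,
                                            Map<String, Object> addition) {
        Map<String, Object> result = new LinkedHashMap<>(base);
        for (Map.Entry<String, Object> e : addition.entrySet()) {
            Object existing = result.get(e.getKey());
            if (existing instanceof Map && e.getValue() instanceof Map) {
                result.put(e.getKey(), merge((Map<String, Object>) existing,
                                             (Map<String, Object>) e.getValue()));
            } else {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }

    // Hypothetical helper that builds a single-entry map, to keep the example short.
    public static Map<String, Object> entry(String key, Object value) {
        Map<String, Object> m = new LinkedHashMap<>();
        m.put(key, value);
        return m;
    }

    public static void main(String[] args) {
        // Simplified stand-in for bootstrap-storage-plugins.json (assumed content).
        Map<String, Object> bootstrap = entry("storage",
            entry("dfs", entry("formats", entry("csv", entry("type", "text")))));
        // Simplified stand-in for a contributed bootstrap-format.json.
        Map<String, Object> contributed = entry("storage",
            entry("dfs", entry("formats", entry("psv", entry("type", "msgpack")))));
        // The merged config keeps the existing format and adds the contributed one.
        System.out.println(merge(bootstrap, contributed));
    }
}
```

A recursive merge is used here so that a contributed file only needs to spell out the path down to its own format entry; whether the real merge should overwrite or reject an already existing format name is a design decision this sketch does not settle.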




[jira] [Commented] (DRILL-7030) Make format plugins fully pluggable

2019-05-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832658#comment-16832658
 ] 

ASF GitHub Bot commented on DRILL-7030:
---

agozhiy commented on pull request #1780: DRILL-7030: Make format plugins fully 
pluggable
URL: https://github.com/apache/drill/pull/1780#discussion_r280851671
 
 

 ##
 File path: exec/java-exec/src/test/resources/bootstrap-format-plugins.json
 ##
 @@ -0,0 +1,31 @@
+{
 
 Review comment:
   They will be available for all tests.
 


[jira] [Updated] (DRILL-6965) Adjust table function usage for all storage plugins and implement schema parameter

2019-05-03 Thread Volodymyr Vysotskyi (JIRA)



 [ 
https://issues.apache.org/jira/browse/DRILL-6965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Volodymyr Vysotskyi updated DRILL-6965:
---
Labels: doc-impacting ready-to-commit  (was: doc-impacting)

> Adjust table function usage for all storage plugins and implement schema 
> parameter
> --
>
> Key: DRILL-6965
> URL: https://issues.apache.org/jira/browse/DRILL-6965
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.17.0
>
>
> A schema can be used while reading a table in two ways:
>  a. the schema is created in the table root folder using the CREATE SCHEMA 
> command and schema usage is enabled;
>  b. the schema is indicated in a table function.
>  This Jira implements point b.
> Indicating the schema in a table function is useful when the user does not 
> want to persist the schema in the table root location, or when reading from 
> a file rather than a folder.
> The schema parameter can be used on its own or together with format plugin 
> table properties.
> Usage examples:
> Pre-requisites: 
>  V3 reader must be enabled: {{set `exec.storage.enable_v3_text_reader` = 
> true;}}
> Query examples:
> 1. There is a folder with files or just one file (ex: dfs.tmp.text_table) and 
> the user wants to apply a schema to them:
>  a. indicate schema inline:
> {noformat}
> select * from table(dfs.tmp.`text_table`(
> schema => 'inline=(col1 date properties {`drill.format` = `-MM-dd`}) 
> properties {`drill.strict` = `false`}'))
> {noformat}
> To indicate only table properties use the following syntax:
> {noformat}
> select * from table(dfs.tmp.`text_table`(
> schema => 'inline=() 
> properties {`drill.strict` = `false`}'))
> {noformat}
> b. indicate schema using path:
>  First, the schema is created in some location using the CREATE SCHEMA 
> command. For example:
> {noformat}
> create schema 
> (col int)
> path '/tmp/my_schema'
> {noformat}
> Now user wants to apply this schema in table function:
> {noformat}
> select * from table(dfs.tmp.`text_table`(schema => 'path=`/tmp/my_schema`'))
> {noformat}
> 2. The user wants to apply a schema alongside format plugin table function 
> parameters.
>  Assume the user has a CSV file with headers whose extension does not 
> match the default extension for text files with headers (ex: cars.csvh-test):
> {noformat}
> select * from table(dfs.tmp.`cars.csvh-test`(type => 'text', 
> fieldDelimiter => ',', extractHeader => true,
> schema => 'inline=(col1 date)'))
> {noformat}
> More details about syntax can be found in design document:
>  
> [https://docs.google.com/document/d/1mp4egSbNs8jFYRbPVbm_l0Y5GjH3HnoqCmOpMTR_g4w/edit]




[jira] [Commented] (DRILL-7235) Download page should link to verification instructions

2019-05-03 Thread Bridget Bevens (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832689#comment-16832689
 ] 

Bridget Bevens commented on DRILL-7235:
---

I've updated the [Download page|https://drill.apache.org/download/] with a link 
to:
[How to Verify Downloaded Files|https://www.apache.org/info/verification.html]
You may need to refresh the page to see the update.

Thanks,
Bridget


> Download page should link to verification instructions
> --
>
> Key: DRILL-7235
> URL: https://issues.apache.org/jira/browse/DRILL-7235
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Sebb
>Assignee: Bridget Bevens
>Priority: Major
>
> The download page includes links to KEYS, sigs and hashes, but does not 
> provide any information on why or how they should be used.




[jira] [Resolved] (DRILL-7235) Download page should link to verification instructions

2019-05-03 Thread Bridget Bevens (JIRA)



 [ 
https://issues.apache.org/jira/browse/DRILL-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bridget Bevens resolved DRILL-7235.
---
Resolution: Fixed

Download page updated with a link to instructions to check downloaded files.


[jira] [Commented] (DRILL-7222) Visualize estimated and actual row counts for a query

2019-05-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832702#comment-16832702
 ] 

ASF GitHub Bot commented on DRILL-7222:
---

kkhatua commented on issue #1779: DRILL-7222: Visualize estimated and actual 
row counts for a query
URL: https://github.com/apache/drill/pull/1779#issuecomment-489185185
 
 
   @arina-ielchiieva 
   
   The motivation for this PR comes from the need for engineers to analyze 
queries as plans change due to the introduction of statistics. An initial 
thought was to add an additional column, but I think we already have a lot of 
columns. I've tried to figure out which columns to trim, but almost all seem 
relevant. I know we might come back to doing similar things with Resource 
Management as well, where we'll again need to work on estimates vs. actuals. So 
adding additional columns is not practical.
   
   Showing the estimates based on whether a planning decision was made using 
statistics is not possible unless the profile JSON itself carries some hint 
that statistics were used.
   
   Also, I added the toggle button to provide a mechanism to hide the estimates 
by default (another reason why not an additional column). I'm worried that 
users will get the impression that there are issues with Drill because of 
estimates being wildly off. Even if they are sufficiently accurate (like 
NDV-based estimates vs actual), most users don't have the insight into how the 
stats are being used.
   
   Users who have insight into such things can make use of the estimates to 
tune parameters (e.g. broadcast or selectivity thresholds) to force changes in 
plans that are sub-optimal. Based on this, I thought we should go with the 
option of showing the estimated row counts in parentheses.  
 



> Visualize estimated and actual row counts for a query
> -
>
> Key: DRILL-7222
> URL: https://issues.apache.org/jira/browse/DRILL-7222
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
>Affects Versions: 1.16.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Major
>  Labels: doc-impacting, user-experience
> Fix For: 1.17.0
>
>
> With statistics in place, it would be useful to have the *estimated* rowcount 
> alongside the *actual* rowcount in the query profile's operator overview.
> We can extract this from the Physical Plan section of the profile.




[jira] [Commented] (DRILL-7222) Visualize estimated and actual row counts for a query

2019-05-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832729#comment-16832729
 ] 

ASF GitHub Bot commented on DRILL-7222:
---

arina-ielchiieva commented on issue #1779: DRILL-7222: Visualize estimated and 
actual row counts for a query
URL: https://github.com/apache/drill/pull/1779#issuecomment-489196746
 
 
   Well, I am OK with leaving the estimates in parentheses, but it is still 
unclear to me why we need a config param if estimates are hidden by default.
 


[jira] [Commented] (DRILL-7222) Visualize estimated and actual row counts for a query

2019-05-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832732#comment-16832732
 ] 

ASF GitHub Bot commented on DRILL-7222:
---

kkhatua commented on issue #1779: DRILL-7222: Visualize estimated and actual 
row counts for a query
URL: https://github.com/apache/drill/pull/1779#issuecomment-489198464
 
 
   Because engineers are, by nature, lazy and would like to get as much insight 
as possible with minimal clicks. 
😇
   
   Changing the default on a development environment helps.  ( @amansinha100 , 
agree? )
 


[jira] [Commented] (DRILL-7222) Visualize estimated and actual row counts for a query

2019-05-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832734#comment-16832734
 ] 

ASF GitHub Bot commented on DRILL-7222:
---

arina-ielchiieva commented on issue #1779: DRILL-7222: Visualize estimated and 
actual row counts for a query
URL: https://github.com/apache/drill/pull/1779#issuecomment-489200426
 
 
   Oh, now I see where it comes from :). I'm just not sure this is worth adding 
a new config param. Won't devs be too lazy to update the Drill config and 
restart the drillbits? Wouldn't a session / system option be more convenient 
for devs?
 


[jira] [Commented] (DRILL-7222) Visualize estimated and actual row counts for a query

2019-05-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832737#comment-16832737
 ] 

ASF GitHub Bot commented on DRILL-7222:
---

arina-ielchiieva commented on issue #1779: DRILL-7222: Visualize estimated and 
actual row counts for a query
URL: https://github.com/apache/drill/pull/1779#issuecomment-489200426
 
 
   Oh, now I see where it comes from :). I'm just not sure this is worth adding 
a new config param. Won't devs be too lazy to update the Drill config and 
restart the drillbits? Wouldn't a session / system option be more convenient 
for devs? Anyway, I am fine with either choice.
 


[jira] [Commented] (DRILL-7222) Visualize estimated and actual row counts for a query

2019-05-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832736#comment-16832736
 ] 

ASF GitHub Bot commented on DRILL-7222:
---

arina-ielchiieva commented on issue #1779: DRILL-7222: Visualize estimated and 
actual row counts for a query
URL: https://github.com/apache/drill/pull/1779#issuecomment-489200426
 
 
   Oh, now I see where it comes from :). I'm just not sure this is worth adding 
a new config param. Won't devs be too lazy to update the Drill config and 
restart the drillbits? Wouldn't a session / system option be more convenient 
for devs?
 


[jira] [Commented] (DRILL-7222) Visualize estimated and actual row counts for a query

2019-05-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832740#comment-16832740
 ] 

ASF GitHub Bot commented on DRILL-7222:
---

arina-ielchiieva commented on pull request #1779: DRILL-7222: Visualize 
estimated and actual row counts for a query
URL: https://github.com/apache/drill/pull/1779#discussion_r280889338
 
 

 ##
 File path: exec/java-exec/src/main/resources/rest/profile/profile.ftl
 ##
 @@ -587,6 +622,49 @@
   if (e.target.form) 
 <#if 
model.isOnlyImpersonationEnabled()>doSubmitQueryWithUserName()<#else>doSubmitQueryWithAutoLimit();
 });
+
+// Convert scientific to Decimal [Ref: 
https://gist.github.com/jiggzson/b5f489af9ad931e3d186]
+function scientificToDecimal(num) {
 
 Review comment:
   Wouldn't it be more convenient to do this in Java and just pass the 
expected string representation to the UI?
 


[jira] [Commented] (DRILL-7222) Visualize estimated and actual row counts for a query

2019-05-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832741#comment-16832741
 ] 

ASF GitHub Bot commented on DRILL-7222:
---

arina-ielchiieva commented on pull request #1779: DRILL-7222: Visualize 
estimated and actual row counts for a query
URL: https://github.com/apache/drill/pull/1779#discussion_r280889338
 
 

 ##
 File path: exec/java-exec/src/main/resources/rest/profile/profile.ftl
 ##
 @@ -587,6 +622,49 @@
   if (e.target.form) 
 <#if 
model.isOnlyImpersonationEnabled()>doSubmitQueryWithUserName()<#else>doSubmitQueryWithAutoLimit();
 });
+
+// Convert scientific to Decimal [Ref: 
https://gist.github.com/jiggzson/b5f489af9ad931e3d186]
+function scientificToDecimal(num) {
 
 Review comment:
   Wouldn't it be more convenient to do this in Java and just pass the 
expected string representation to the UI?
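
For reference, a minimal sketch of what the Java-side conversion suggested in this review comment could look like, assuming the value arrives as a string in scientific notation. PlainNumberFormat and toPlainString are hypothetical names, not part of Drill; only java.math.BigDecimal from the JDK is used.

```java
import java.math.BigDecimal;

public class PlainNumberFormat {

    // Hypothetical helper: converts a numeric string that may be in scientific
    // notation (e.g. "1.5E7") into a plain decimal string without an exponent.
    public static String toPlainString(String value) {
        // stripTrailingZeros() drops insignificant zeros such as the one in "2.0E-3";
        // toPlainString() then renders the number without exponent notation.
        return new BigDecimal(value).stripTrailingZeros().toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(toPlainString("1.5E7"));   // large estimate
        System.out.println(toPlainString("2.0E-3"));  // small fractional value
    }
}
```

With this approach the server would hand the UI a ready-to-display string, at the cost of doing the formatting work on the web server rather than in the browser.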
 


[jira] [Commented] (DRILL-7222) Visualize estimated and actual row counts for a query

2019-05-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832770#comment-16832770
 ] 

ASF GitHub Bot commented on DRILL-7222:
---

kkhatua commented on pull request #1779: DRILL-7222: Visualize estimated and 
actual row counts for a query
URL: https://github.com/apache/drill/pull/1779#discussion_r280902167
 
 

 ##
 File path: exec/java-exec/src/main/resources/rest/profile/profile.ftl
 ##
 @@ -587,6 +622,49 @@
   if (e.target.form) 
 <#if 
model.isOnlyImpersonationEnabled()>doSubmitQueryWithUserName()<#else>doSubmitQueryWithAutoLimit();
 });
+
+// Convert scientific to Decimal [Ref: 
https://gist.github.com/jiggzson/b5f489af9ad931e3d186]
+function scientificToDecimal(num) {
 
 Review comment:
   Ideally, yes.
   
   But the way we construct the `Operators Overview` table is by only injecting 
values (which then get converted to human readable formats based on data 
types). That makes it difficult for me to inject these `` elements during 
construction.
   
   Also, with Javascript, I reduce the processing and construction overhead on 
the WebServer and have the client browsers do this formatting.
   
   In theory, the entire profile can be constructed by Javascript from the JSON 
profile, but basic summaries are easier to implement (and faster to calculate) 
in Java.
 


[jira] [Created] (DRILL-7237) IllegalStateException in aggregation function 'single_value' when there is a varchar datatype in the subquery results

2019-05-03 Thread Denys Ordynskiy (JIRA)

Denys Ordynskiy created DRILL-7237:
--

 Summary: IllegalStateException in aggregation function 
'single_value' when there is a varchar datatype in the subquery results
 Key: DRILL-7237
 URL: https://issues.apache.org/jira/browse/DRILL-7237
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.15.0, 1.14.0
Reporter: Denys Ordynskiy
Assignee: Volodymyr Vysotskyi
 Attachments: drillbit.log

*Description:*
The following issue can be reproduced on the EBF for the 
[DRILL-7050|https://issues.apache.org/jira/browse/DRILL-7050].

_For the query with > 1 row in subquery results where the data type of these 
results *is not varchar*:_
{noformat}
SELECT
  e.full_name,
  (
    SELECT
      ine.employee_id
    FROM
      cp.`employee.json` ine
    WHERE
      ine.position_id = e.position_id
  ) as empl_id
FROM
  cp.`employee.json` e
LIMIT 20
{noformat}

_We have the following correct and informative error:_
{noformat}
Query Failed: An Error Occurred
org.apache.drill.common.exceptions.UserRemoteException: FUNCTION ERROR: Input 
for single_value function has more than one row Fragment 0:0 [Error Id: 
b770098f-b1c7-4647-9f41-9e986a0e47b7 on maprhost:31010]
{noformat}

_But when in the result set of the subquery we have *a varchar data type*:_
{noformat}
SELECT
  e.full_name,
  (
    SELECT
      ine.first_name
    FROM
      cp.`employee.json` ine
    WHERE
      ine.position_id = e.position_id
  ) as empl_id
FROM
  cp.`employee.json` e
LIMIT 20
{noformat}

*Actual result:*
_Drill throws the following error:_
{noformat}
org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: 
IllegalStateException: Workspace variable 'value' in aggregation function 
'single_value' is not allowed to have variable length type. Fragment 0:0 
Please, refer to logs for more information. [Error Id: 
32325ba9-d2b3-4216-acf6-8e80dfe4a56a on maprhost:31010]
{noformat}
Log file is in the attachment "drillbit.log"

*Expected result:*
Drill should return the same informative error for any data type in the 
subquery result set.
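
The expected behaviour can be illustrated with a small type-agnostic sketch. It is not Drill's aggregate function implementation: the class SingleValue, the method singleValue and the sample values are hypothetical, and the error text is simply copied from the informative message quoted above.

```java
import java.util.Arrays;
import java.util.List;

public class SingleValue {

    // Returns the only element of 'rows', or null when there are no rows.
    // The same error is raised for every element type, fixed or variable length.
    public static <T> T singleValue(List<T> rows) {
        if (rows.size() > 1) {
            throw new IllegalArgumentException(
                "Input for single_value function has more than one row");
        }
        return rows.isEmpty() ? null : rows.get(0);
    }

    public static void main(String[] args) {
        // One row: the value is returned as is.
        Integer only = singleValue(Arrays.asList(42));
        System.out.println(only);

        // Several varchar-like rows (arbitrary sample values): same error as for numbers.
        try {
            singleValue(Arrays.asList("first", "second"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```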




[jira] [Updated] (DRILL-7237) IllegalStateException in aggregation function 'single_value' when there is a varchar datatype in the subquery results

2019-05-03 Thread Arina Ielchiieva (JIRA)



 [ 
https://issues.apache.org/jira/browse/DRILL-7237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7237:

Fix Version/s: 1.17.0

> IllegalStateException in aggregation function 'single_value' when there is a 
> varchar datatype in the subquery results
> -
>
> Key: DRILL-7237
> URL: https://issues.apache.org/jira/browse/DRILL-7237
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.14.0, 1.15.0
>Reporter: Denys Ordynskiy
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.17.0
>
> Attachments: drillbit.log
>
>
> *Description:*
> The following issue can be reproduced on the EBF for the 
> [DRILL-7050|https://issues.apache.org/jira/browse/DRILL-7050].
> _For the query with > 1 row in subquery results where the data type of these 
> results *is not varchar*:_
> {noformat}
> SELECT
>   e.full_name,
>   (
> SELECT
>   ine.employee_id
> FROM
>   cp.`employee.json` ine
> WHERE
>   ine.position_id = e.position_id
>   ) as empl_id
> FROM
>   cp.`employee.json` e
> LIMIT 20
> {noformat}
> _We have the following correct and informative error:_
> {noformat}
> Query Failed: An Error Occurred
> org.apache.drill.common.exceptions.UserRemoteException: FUNCTION ERROR: Input 
> for single_value function has more than one row Fragment 0:0 [Error Id: 
> b770098f-b1c7-4647-9f41-9e986a0e47b7 on maprhost:31010]
> {noformat}
> _But when the subquery result set has *a varchar data type*:_
> {noformat}
> SELECT
>   e.full_name,
>   (
>     SELECT
>       ine.first_name
>     FROM
>       cp.`employee.json` ine
>     WHERE
>       ine.position_id = e.position_id
>   ) as empl_id
> FROM
>   cp.`employee.json` e
> LIMIT 20
> {noformat}
> *Actual result:*
> _Drill throws the following error:_
> {noformat}
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: 
> IllegalStateException: Workspace variable 'value' in aggregation function 
> 'single_value' is not allowed to have variable length type. Fragment 0:0 
> Please, refer to logs for more information. [Error Id: 
> 32325ba9-d2b3-4216-acf6-8e80dfe4a56a on maprhost:31010]
> {noformat}
> The log file is attached as "drillbit.log".
> *Expected result:*
> Drill should return the same informative error for any data type in the 
> subquery result set.
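
The expected behaviour can be sketched independently of Drill's code generation. The following Python sketch is illustrative only: `single_value` and `SingleValueError` are hypothetical names, not Drill's implementation. It shows an aggregate that rejects a second input row before looking at the value's type, so fixed-width and variable-length inputs produce the same message.

```python
class SingleValueError(Exception):
    """Hypothetical stand-in for Drill's FUNCTION ERROR."""


def single_value(rows):
    """Return the only value in rows; fail the same way for every data type."""
    seen = False
    value = None
    for row in rows:
        if seen:
            # The row count is checked first, so the value type never matters.
            raise SingleValueError(
                "Input for single_value function has more than one row")
        seen = True
        value = row
    return value
```

With this shape, `single_value([1, 2])` and `single_value(["a", "b"])` raise the same error, while an empty input yields `None` (standing in for SQL NULL).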




[jira] [Updated] (DRILL-7148) TPCH query 17 increases execution time with Statistics enabled because join order is changed

2019-05-03 Thread Sorabh Hamirwasia (JIRA)



 [ 
https://issues.apache.org/jira/browse/DRILL-7148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sorabh Hamirwasia updated DRILL-7148:
-
Labels:   (was: ready-to-commit)

> TPCH query 17 increases execution time with Statistics enabled because join 
> order is changed
> 
>
> Key: DRILL-7148
> URL: https://issues.apache.org/jira/browse/DRILL-7148
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Gautam Parai
>Assignee: Gautam Parai
>Priority: Major
> Fix For: 1.17.0
>
>
> TPCH query 17 with sf 1000 runs 45% slower. One issue is that the join order 
> has flipped the build side and the probe side in Major Fragment 01.
> Here is the query:
> select
>  sum(l.l_extendedprice) / 7.0 as avg_yearly
> from
>  lineitem l,
>  part p
> where
>  p.p_partkey = l.l_partkey
>  and p.p_brand = 'Brand#13'
>  and p.p_container = 'JUMBO CAN'
>  and l.l_quantity < (
>  select
>  0.2 * avg(l2.l_quantity)
>  from
>  lineitem l2
>  where
>  l2.l_partkey = p.p_partkey
>  );
> Here is original plan:
> {noformat}
> 00-00 Screen : rowType = RecordType(ANY avg_yearly): rowcount = 1.0, 
> cumulative cost = \{7.853786601428E10 rows, 6.6179786770537E11 cpu, 
> 3.0599948545E10 io, 1.083019457355776E14 network, 1.17294998955024E11 
> memory}, id = 489493
> 00-01 Project(avg_yearly=[/($0, 7.0)]) : rowType = RecordType(ANY 
> avg_yearly): rowcount = 1.0, cumulative cost = \{7.853786601418E10 rows, 
> 6.6179786770527E11 cpu, 3.0599948545E10 io, 1.083019457355776E14 network, 
> 1.17294998955024E11 memory}, id = 489492
> 00-02 StreamAgg(group=[{}], agg#0=[SUM($0)]) : rowType = RecordType(ANY $f0): 
> rowcount = 1.0, cumulative cost = \{7.853786601318E10 rows, 
> 6.6179786770127E11 cpu, 3.0599948545E10 io, 1.083019457355776E14 network, 
> 1.17294998955024E11 memory}, id = 489491
> 00-03 UnionExchange : rowType = RecordType(ANY $f0): rowcount = 1.0, 
> cumulative cost = \{7.853786601218E10 rows, 6.6179786768927E11 cpu, 
> 3.0599948545E10 io, 1.083019457355776E14 network, 1.17294998955024E11 
> memory}, id = 489490
> 01-01 StreamAgg(group=[{}], agg#0=[SUM($0)]) : rowType = RecordType(ANY $f0): 
> rowcount = 1.0, cumulative cost = \{7.853786601118E10 rows, 
> 6.6179786768127E11 cpu, 3.0599948545E10 io, 1.083019457314816E14 network, 
> 1.17294998955024E11 memory}, id = 489489
> 01-02 Project(l_extendedprice=[$1]) : rowType = RecordType(ANY 
> l_extendedprice): rowcount = 2.948545E9, cumulative cost = 
> \{7.553787115668E10 rows, 6.2579792942727E11 cpu, 3.0599948545E10 io, 
> 1.083019457314816E14 network, 1.17294998955024E11 memory}, id = 489488
> 01-03 SelectionVectorRemover : rowType = RecordType(ANY l_quantity, ANY 
> l_extendedprice, ANY p_partkey, ANY l_partkey, ANY $f1): rowcount = 
> 2.948545E9, cumulative cost = \{7.253787630218E10 rows, 
> 6.2279793457277E11 cpu, 3.0599948545E10 io, 1.083019457314816E14 network, 
> 1.17294998955024E11 memory}, id = 489487
> 01-04 Filter(condition=[<($0, *(0.2, $4))]) : rowType = RecordType(ANY 
> l_quantity, ANY l_extendedprice, ANY p_partkey, ANY l_partkey, ANY $f1): 
> rowcount = 2.948545E9, cumulative cost = \{6.953788144768E10 rows, 
> 6.1979793971827E11 cpu, 3.0599948545E10 io, 1.083019457314816E14 network, 
> 1.17294998955024E11 memory}, id = 489486
> 01-05 HashJoin(condition=[=($2, $3)], joinType=[inner], semi-join: =[false]) 
> : rowType = RecordType(ANY l_quantity, ANY l_extendedprice, ANY p_partkey, 
> ANY l_partkey, ANY $f1): rowcount = 5.89709E9, cumulative cost = 
> \{6.353789173867999E10 rows, 5.8379800146427E11 cpu, 3.0599948545E10 io, 
> 1.083019457314816E14 network, 1.17294998955024E11 memory}, id = 489485
> 01-07 Project(l_quantity=[$0], l_extendedprice=[$1], p_partkey=[$2]) : 
> rowType = RecordType(ANY l_quantity, ANY l_extendedprice, ANY p_partkey): 
> rowcount = 5.89709E9, cumulative cost = \{4.2417927963E10 rows, 
> 2.71618536905E11 cpu, 1.8599969127E10 io, 9.8471562592256E13 network, 7.92E7 
> memory}, id = 489476
> 01-09 HashToRandomExchange(dist0=[[$2]]) : rowType = RecordType(ANY 
> l_quantity, ANY l_extendedprice, ANY p_partkey, ANY 
> E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 5.89709E9, cumulative cost = 
> \{3.6417938254E10 rows, 2.53618567778E11 cpu, 1.8599969127E10 io, 
> 9.8471562592256E13 network, 7.92E7 memory}, id = 489475
> 02-01 UnorderedMuxExchange : rowType = RecordType(ANY l_quantity, ANY 
> l_extendedprice, ANY p_partkey, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 
> 5.89709E9, cumulative cost = \{3.0417948545E10 rows, 1.57618732434E11 
> cpu, 1.8599969127E10 io, 1.677312E11 network, 7.92E7 memory}, id = 489474
> 04-01 Project(l_quantity=[$0], l_extendedprice=[$1], p_partkey=[$2], 
> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($2, 1301011)]) : rowType

[jira] [Commented] (DRILL-7148) TPCH query 17 increases execution time with Statistics enabled because join order is changed

2019-05-03 Thread Sorabh Hamirwasia (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832850#comment-16832850
 ] 

Sorabh Hamirwasia commented on DRILL-7148:
--

[~gparai] - With this commit I am seeing plan verification failures for 2 
queries in [Functional 
Tests|http://10.10.104.91:8080/job/Apache_Drill_Functional_Tests/1823/console]. 
After removing this commit from my merge branch the tests pass, so for now 
I am removing the ready-to-commit label from this JIRA. Can you please look into 
the failures in the meantime?
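
For context on why flipping the build and probe sides matters in this issue, here is a minimal Python sketch of an in-memory hash join. It is not Drill's HashJoin operator; `hash_join` and the sample rows are hypothetical. The hash table is built from one input and probed with the other, so memory use grows with the build side, and a planner normally wants the smaller input there.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Inner equi-join; returns (joined_rows, hash_table_row_count)."""
    table = defaultdict(list)
    for row in build_rows:                 # build phase: hold every build row in memory
        table[row[build_key]].append(row)
    joined = []
    for row in probe_rows:                 # probe phase: stream rows, look up matches
        for match in table.get(row[probe_key], []):
            joined.append({**match, **row})
    return joined, sum(len(v) for v in table.values())


# Made-up sample data: a small "part" input and a larger "lineitem" input.
part = [{"p_partkey": 1}, {"p_partkey": 2}]
lineitem = [{"l_partkey": k % 2 + 1, "l_quantity": k} for k in range(10)]

good, good_mem = hash_join(part, lineitem, "p_partkey", "l_partkey")
flipped, flipped_mem = hash_join(lineitem, part, "l_partkey", "p_partkey")
```

Both calls return the same number of joined rows, but the flipped order keeps the larger input in the hash table.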


[jira] [Created] (DRILL-7238) Drill does not use DirectScan for non-existent columns

2019-05-03 Thread Venkata Jyothsna Donapati (JIRA)

Venkata Jyothsna Donapati created DRILL-7238:


 Summary: Drill does not use DirectScan for non-existent columns
 Key: DRILL-7238
 URL: https://issues.apache.org/jira/browse/DRILL-7238
 Project: Apache Drill
  Issue Type: Bug
Reporter: Venkata Jyothsna Donapati
Assignee: Venkata Jyothsna Donapati


This query does not use the summary metadata cache file:
select count(int_nulls_id), count(int_id), count(ss_ticket_number), 
count(extra) from store_sales_null_blocks_int;

In this query, extra is a column that does not exist in the table.

Here is the explain plan:
{noformat}
00-00    Screen
00-01      Project(EXPR$0=[$0], EXPR$1=[$1], EXPR$2=[$2], EXPR$3=[$3])
00-02        StreamAgg(group=[{}], EXPR$0=[$SUM0($0)], EXPR$1=[$SUM0($1)], 
EXPR$2=[$SUM0($2)], EXPR$3=[$SUM0($3)])
00-03          UnionExchange
01-01            StreamAgg(group=[{}], EXPR$0=[COUNT($0)], EXPR$1=[COUNT($1)], 
EXPR$2=[COUNT($2)], EXPR$3=[COUNT($3)])
01-02              Scan(table=[[dfs, parquet_metadata_cache, 
store_sales_null_blocks_int]], groupscan=[ParquetGroupScan 
[entries=[ReadEntryWithPath 
[path=/drill/testdata/metadata_cache/store_sales_null_blocks_int]], 
selectionRoot=/drill/testdata/metadata_cache/store_sales_null_blocks_int, 
numFiles=1, numRowGroups=11, usedMetadataFile=true, 
cacheFileRoot=/drill/testdata/metadata_cache/store_sales_null_blocks_int, 
columns=[`int_nulls_id`, `int_id`, `ss_ticket_number`, `extra`]]])
{noformat}
This is a regression from Drill 1.15.
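
The optimization at stake can be sketched as follows. This is a Python illustration under assumptions, not Drill's planner rule: the `summary` dictionary layout, the numbers in it, and `counts_from_summary` are all hypothetical. The idea is that COUNT(col) equals the row count minus the column's null count, and that a column absent from every file is all NULL, so its count is 0 and no data scan is needed.

```python
def counts_from_summary(summary, columns):
    """Answer COUNT(col) for each column using only cached metadata."""
    total = summary["row_count"]
    result = {}
    for col in columns:
        nulls = summary["null_counts"].get(col)
        # A column missing from the metadata exists in no file: every value is NULL.
        result[col] = 0 if nulls is None else total - nulls
    return result


# Invented metadata values, for illustration only.
summary = {"row_count": 100, "null_counts": {"int_id": 0, "int_nulls_id": 25}}
counts = counts_from_summary(summary, ["int_nulls_id", "int_id", "extra"])
```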




[jira] [Commented] (DRILL-7228) Histogram end points show high deviation for a sample data set

2019-05-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832880#comment-16832880
 ] 

ASF GitHub Bot commented on DRILL-7228:
---

sohami commented on issue #1774: DRILL-7228: Upgrade to a newer version of 
t-digest to address inaccur…
URL: https://github.com/apache/drill/pull/1774#issuecomment-489258099
 
 
   +1
 



> Histogram end points show high deviation for a sample data set
> --
>
> Key: DRILL-7228
> URL: https://issues.apache.org/jira/browse/DRILL-7228
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Aman Sinha
>Assignee: Aman Sinha
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
>
> There are a couple of scenarios where the histogram bucket end points show high 
> deviation for the attached sample data set. 
> +Scenario 1:+
> There are 100 rows in total in the sample. Here are the first 10 values of the 
> c_float column, in ascending order.  
> {noformat}
> select c_float from `table_stats/alltypes_with_nulls` order by c_float;
> +--------------+
> |   c_float    |
> +--------------+
> | -4.6873795E9 |
> | 8.1855632E7  |
> | 2.65311632E8 |
> | 4.50677952E8 |
> | 4.6864464E8  |
> | 5.7848493E8  |
> | 6.6793114E8  |
> | 7.1175571E8  |
> | 9.0065581E8  |
> | 9.2245773E8  |
> ...
> ...
> <100 rows>
> {noformat}
> Here the minimum value is negative (-4.6873795E9).  Here's the output of the 
> histogram after running the ANALYZE command: 
> {noformat}
>  "buckets" : [ 8.1855488E7, 9.13736816E8, 1.720863011198E9, 
> 3.2401755232E9, 4.6546719328E9, 5.130497904E9, 5.9901393504E9, 6.779930992E9, 
> 7.998626672E9, 8.69159614398E9, 9.983783792E9 ]
> {noformat}
> Note that the starting end point of bucket 0 is approximately the 2nd value in 
> the ordered list, and the negative minimum is not represented in the 
> histogram at all. 
> +Scenario 2:+
> Histogram for the c_bigint column is as below: 
> {noformat}
> {
>   "column" : "`c_bigint`",
>   "majortype" : {
>     "type" : "BIGINT",
>     "mode" : "OPTIONAL"
>   },
>   "schema" : 1.0,
>   "rowcount" : 100.0,
>   "nonnullrowcount" : 87.0,
>   "ndv" : 46,
>   "avgwidth" : 8.0,
>   "histogram" : {
>     "category" : "numeric-equi-depth",
>     "numRowsPerBucket" : 8,
>     "buckets" : [ -8.6390506354062131E18, -7.679478802017577E18, 
> -5.8389791200382024E18, -2.9165328693138038E18, -1.77746633649836621E18, 
> 2.83467841536E11, 2.83467841536E11, 2.83467841536E11, 2.83467841536E11, 
> 8.848383132345303E17, 4.6441480083157811E18 ]
>   }
> }
> {noformat}
> This indicates that there are duplicate rows with a value close to 2.83E11, 
> which is not true when we analyze the source data.
> This is the output of the ntile function:
> {noformat}
> SELECT bucket_num,
>        min(c_bigint) as min_amount,
>        max(c_bigint) as max_amount,
>        count(*) as total_count
> FROM (
>   SELECT c_bigint,
>          NTILE(10) OVER (ORDER BY c_bigint) as bucket_num
>   FROM `table_stats/alltypes_with_nulls`
> )
> GROUP BY bucket_num
> ORDER BY bucket_num;
> +------------+----------------------+----------------------+-------------+
> | bucket_num |      min_amount      |      max_amount      | total_count |
> +------------+----------------------+----------------------+-------------+
> | 1          | -8804872880253829120 | -6983033704176156672 | 10          |
> | 2          | -6772904422084182016 | -5326061597989273600 | 10          |
> | 3          | -5111449881868763136 | -2561061038367703040 | 10          |
> | 4          | -2424523650070740992 | -449093763428515840  | 10          |
> | 5          | 0                    | 0                    | 10          |
> | 6          | 0                    | 0                    | 10          |
> | 7          | 0                    | 0                    | 10          |
> | 8          | 0                    | 884838034226544640   | 10          |
> | 9          | 884838034226544640   | 4644147690488201216  | 10          |
> | 10         | null                 | null                 | 10          |
> +------------+----------------------+----------------------+-------------+
> {noformat}
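
For comparison with the histogram output quoted in this issue, exact equi-depth boundaries can be computed directly on a small sample. This Python sketch is only a reference calculation, not Drill's histogram code; `equi_depth_boundaries` is a hypothetical helper and the nearest-rank rule is one of several possible choices. The sample reuses the first five c_float values from the issue, plus a `None` added here to stand for a NULL. By construction the first boundary is the minimum and the last is the maximum.

```python
def equi_depth_boundaries(values, num_buckets):
    """Return num_buckets + 1 boundaries over the non-null values."""
    data = sorted(v for v in values if v is not None)
    if not data:
        return []
    last = len(data) - 1
    # Boundary i sits at the i/num_buckets quantile (nearest rank, rounded down).
    return [data[(i * last) // num_buckets] for i in range(num_buckets + 1)]


sample = [-4.6873795E9, 8.1855632E7, 2.65311632E8, 4.50677952E8, None, 4.6864464E8]
bounds = equi_depth_boundaries(sample, 2)
```

A histogram whose first end point skips the minimum, as in Scenario 1, would differ from this reference at `bounds[0]`.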




[jira] [Commented] (DRILL-7225) Merging of columnTypeInfo for file with different schema throws NullPointerException during refresh metadata

2019-05-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832904#comment-16832904
 ] 

ASF GitHub Bot commented on DRILL-7225:
---

sohami commented on pull request #1773: DRILL-7225: Fixed merging 
ColumnTypeInfo for files with different schemas
URL: https://github.com/apache/drill/pull/1773
 
 
   
 



> Merging of columnTypeInfo for file with different schema throws 
> NullPointerException during refresh metadata
> 
>
> Key: DRILL-7225
> URL: https://issues.apache.org/jira/browse/DRILL-7225
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Venkata Jyothsna Donapati
>Assignee: Venkata Jyothsna Donapati
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
>
> Merging columnTypeInfo from two files with different schemas throws a 
> NullPointerException. For example, suppose a directory Orders has two files:
>  * orders.parquet (with columns order_id, order_name, order_date)
>  * orders_with_address.parquet (with columns order_id, order_name, address)
> When refresh table metadata is triggered, metadata such as total_null_count 
> for the columns in both files is aggregated and updated in the 
> ColumnTypeInfo. Initially, ColumnTypeInfo is initialized with the first file's 
> ColumnTypeInfo (i.e., order_id, order_name, order_date). While aggregating, 
> the existing ColumnTypeInfo is looked up for the columns in the second file, 
> and since some of them (here, address) don't exist in the ColumnTypeInfo, an 
> NPE is thrown. This can be fixed by initializing ColumnTypeInfo for columns 
> that are not yet present.
>  
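
The fix described in this issue amounts to initializing an entry before aggregating into it. A minimal Python sketch of that pattern follows; it is not the Drill patch. `merge_null_counts` is a hypothetical helper, the null counts are made-up values, and the sketch ignores how rows from files lacking a column should be counted.

```python
def merge_null_counts(merged, file_columns):
    """Fold one file's per-column null counts into the running totals."""
    for name, null_count in file_columns.items():
        # Initialize columns not seen in earlier files instead of assuming they exist.
        merged.setdefault(name, 0)
        merged[name] += null_count
    return merged


orders = {"order_id": 0, "order_name": 2, "order_date": 1}
orders_with_address = {"order_id": 0, "order_name": 1, "address": 4}

totals = merge_null_counts({}, orders)
totals = merge_null_counts(totals, orders_with_address)
```

Here `address` first appears in the second file and is added to the totals rather than causing a failed lookup.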




[jira] [Commented] (DRILL-6974) SET option command

2019-05-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832900#comment-16832900
 ] 

ASF GitHub Bot commented on DRILL-6974:
---

sohami commented on pull request #1763: DRILL-6974: Add possibility to view 
option value via SET command
URL: https://github.com/apache/drill/pull/1763
 
 
   
 



> SET option command
> --
>
> Key: DRILL-6974
> URL: https://issues.apache.org/jira/browse/DRILL-6974
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: SQL Parser
>Affects Versions: 1.15.0
>Reporter: benj
>Assignee: Dmytriy Grinchenko
>Priority: Major
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.17.0
>
>
> It's currently possible to define options with the SQL command SET
> {code:java}
> ALTER SESSION SET `drill.exec.hashagg.fallback.enabled` = true;
> {code}
> But it's not possible to simply view the current value of an option 
> with SHOW; we have to run a query like
> {code:java}
> SELECT * FROM sys.options WHERE `name` = 
> 'drill.exec.hashagg.fallback.enabled';
> {code}
> Why not allow a simple
> {code:java}
> SET `drill.exec.hashagg.fallback.enabled`;
> {code}




[jira] [Commented] (DRILL-7098) File Metadata Metastore Plugin

2019-05-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832902#comment-16832902
 ] 

ASF GitHub Bot commented on DRILL-7098:
---

sohami commented on pull request #1754: DRILL-7098: File Metadata Metastore 
Plugin
URL: https://github.com/apache/drill/pull/1754
 
 
   
 



> File Metadata Metastore Plugin
> --
>
> Key: DRILL-7098
> URL: https://issues.apache.org/jira/browse/DRILL-7098
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components:  Server, Metadata
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Major
>  Labels: Metastore, ready-to-commit
> Fix For: 2.0.0
>
>
> DRILL-6852 introduces the Drill Metastore API. 
> The second step is to create an internal Drill Metastore mechanism (and a File 
> Metastore Plugin) that uses the Metastore API and can be extended for 
> use by other Storage Plugins.




[jira] [Commented] (DRILL-7171) Count(*) query on leaf level directory is not reading summary cache file.

2019-05-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832899#comment-16832899
 ] 

ASF GitHub Bot commented on DRILL-7171:
---

sohami commented on pull request #1748: DRILL-7171: Create metadata directories 
cache file in the leaf level directories to support ConvertCountToDirectScan 
optimization.
URL: https://github.com/apache/drill/pull/1748
 
 
   
 



> Count(*) query on leaf level directory is not reading summary cache file.
> -
>
> Key: DRILL-7171
> URL: https://issues.apache.org/jira/browse/DRILL-7171
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Venkata Jyothsna Donapati
>Assignee: Venkata Jyothsna Donapati
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The leaf level directory doesn't store the metadata directories cache file. 
> When the summary is read and the directories cache file is not present, the 
> cache is assumed to be possibly corrupt and reading of the summary cache 
> file is skipped. The metadata directories cache file should therefore be 
> created at the leaf level.




[jira] [Commented] (DRILL-7167) DESCRIBE TABLE statement is not implemented

2019-05-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832898#comment-16832898
 ] 

ASF GitHub Bot commented on DRILL-7167:
---

sohami commented on pull request #1747: DRILL-7167: Implemented DESCRIBE TABLE 
statement
URL: https://github.com/apache/drill/pull/1747
 
 
   
 



> DESCRIBE TABLE statement is not implemented 
> 
>
> Key: DRILL-7167
> URL: https://issues.apache.org/jira/browse/DRILL-7167
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Dmytriy Grinchenko
>Assignee: Dmytriy Grinchenko
>Priority: Major
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.17.0
>
>
> DESCRIBE dfs.tmp.`table` - works fine 
> DESCRIBE TABLE dfs.tmp.`table` - fails with error:
> {code:java}
> Error: PARSE ERROR: org.apache.calcite.sql.SqlBasicCall cannot be cast to 
> org.apache.calcite.sql.SqlIdentifier
> {code}
> DESCRIBE TABLE should work the same as DESCRIBE.




[jira] [Commented] (DRILL-7062) Run-time row group pruning

2019-05-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832897#comment-16832897
 ] 

ASF GitHub Bot commented on DRILL-7062:
---

sohami commented on pull request #1738: DRILL-7062: Initial implementation of 
run-time row-group pruning
URL: https://github.com/apache/drill/pull/1738
 
 
   
 



> Run-time row group pruning
> --
>
> Key: DRILL-7062
> URL: https://issues.apache.org/jira/browse/DRILL-7062
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Metadata
>Reporter: Venkata Jyothsna Donapati
>Assignee: Boaz Ben-Zvi
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>





[jira] [Commented] (DRILL-7228) Histogram end points show high deviation for a sample data set

2019-05-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832903#comment-16832903
 ] 

ASF GitHub Bot commented on DRILL-7228:
---

sohami commented on pull request #1774: DRILL-7228: Upgrade to a newer version 
of t-digest to address inaccur…
URL: https://github.com/apache/drill/pull/1774
 
 
   
 







[jira] [Commented] (DRILL-6988) Utility of the too long error message when syntax error

2019-05-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832901#comment-16832901
 ] 

ASF GitHub Bot commented on DRILL-6988:
---

sohami commented on pull request #1753: DRILL-6988: Utility of the too long 
error message when syntax error
URL: https://github.com/apache/drill/pull/1753
 
 
   
 



> Utility of the too long error message when syntax error
> ---
>
> Key: DRILL-6988
> URL: https://issues.apache.org/jira/browse/DRILL-6988
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: SQL Parser
>Affects Versions: 1.15.0
>Reporter: benj
>Assignee: Dmytriy Grinchenko
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
> Attachments: error_picture_sample.png
>
>
> When a query with a syntax error is executed, the overly long error 
> message pollutes the log/screen and doesn't give any useful information.
> A more concise error message (1-3 lines max) should be enough.
> {code:java}
> SELECT FROM (VALUES(1));
> Error: PARSE ERROR: Encountered "FROM" at line 1, column 8.
> Was expecting one of:
>     "UNION" ...
>     "INTERSECT" ...
>     "EXCEPT" ...
>     "MINUS" ...
>     "ORDER" ...
>     "LIMIT" ...
>     "OFFSET" ...
>     "FETCH" ...
>     "STREAM" ...
>     "DISTINCT" ...
>     "ALL" ...
>     "*" ...
>     "+" ...
>     "-" ...
>     "NOT" ...
>     "EXISTS" ...
>      ...
>      ...
>      ...
>      ...
>      ...
>      ...
>      ...
>     "TRUE" ...
>     "FALSE" ...
>     "UNKNOWN" ...
>     "NULL" ...
>      ...
>      ...
>      ...
>     "DATE" ...
>     "TIME" ...
>     "TIMESTAMP" ...
>     "INTERVAL" ...
>     "?" ...
>     "CAST" ...
>     "EXTRACT" ...
>     "POSITION" ...
>     "CONVERT" ...
>     "TRANSLATE" ...
>     "OVERLAY" ...
>     "FLOOR" ...
>     "CEIL" ...
>     "CEILING" ...
>     "SUBSTRING" ...
>     "TRIM" ...
>     "CLASSIFIER" ...
>     "MATCH_NUMBER" ...
>     "RUNNING" ...
>     "PREV" ...
>     "NEXT" ...
>      ...
>     "MULTISET" ...
>     "ARRAY" ...
>     "PERIOD" ...
>     "SPECIFIC" ...
>      ...
>      ...
>      ...
>      ...
>      ...
>     "ABS" ...
>     "AVG" ...
>     "CARDINALITY" ...
>     "CHAR_LENGTH" ...
>     "CHARACTER_LENGTH" ...
>     "COALESCE" ...
>     "COLLECT" ...
>     "COVAR_POP" ...
>     "COVAR_SAMP" ...
>     "CUME_DIST" ...
>     "COUNT" ...
>     "CURRENT_DATE" ...
>     "CURRENT_TIME" ...
>     "CURRENT_TIMESTAMP" ...
>     "DENSE_RANK" ...
>     "ELEMENT" ...
>     "EXP" ...
>     "FIRST_VALUE" ...
>     "FUSION" ...
>     "GROUPING" ...
>     "HOUR" ...
>     "LAG" ...
>     "LEAD" ...
>     "LAST_VALUE" ...
>     "LN" ...
>     "LOCALTIME" ...
>     "LOCALTIMESTAMP" ...
>     "LOWER" ...
>     "MAX" ...
>     "MIN" ...
>     "MINUTE" ...
>     "MOD" ...
>     "MONTH" ...
>     "NTH_VALUE" ...
>     "NTILE" ...
>     "NULLIF" ...
>     "OCTET_LENGTH" ...
>     "PERCENT_RANK" ...
>     "POWER" ...
>     "RANK" ...
>     "REGR_SXX" ...
>     "REGR_SYY" ...
>     "ROW_NUMBER" ...
>     "SECOND" ...
>     "SQRT" ...
>     "STDDEV_POP" ...
>     "STDDEV_SAMP" ...
>     "SUM" ...
>     "UPPER" ...
>     "TRUNCATE" ...
>     "USER" ...
>     "VAR_POP" ...
>     "VAR_SAMP" ...
>     "YEAR" ...
>     "CURRENT_CATALOG" ...
>     "CURRENT_DEFAULT_TRANSFORM_GROUP" ...
>     "CURRENT_PATH" ...
>     "CURRENT_ROLE" ...
>     "CURRENT_SCHEMA" ...
>     "CURRENT_USER" ...
>     "SESSION_USER" ...
>     "SYSTEM_USER" ...
>     "NEW" ...
>     "CASE" ...
>     "CURRENT" ...
>     "CURSOR" ...
>     "ROW" ...
>     "(" ...
>     
> SQL Query SELECT FROM VALUES(1)
> {code}
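
One way the requested 1-3 line message could be produced is to keep only the first few lines of the parser output and summarize the rest. The following is a minimal sketch under that assumption; the class ConciseParseError and its truncate helper are hypothetical names used for illustration and are not part of Drill or Calcite.

```java
import java.util.ArrayList;
import java.util.List;

final class ConciseParseError {
    /** Keeps the first maxLines lines of a message and summarizes the rest. */
    static String truncate(String message, int maxLines) {
        String[] lines = message.split("\n");
        if (lines.length <= maxLines) {
            return message; // already short enough, leave untouched
        }
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < maxLines; i++) {
            kept.add(lines[i]);
        }
        // Replace the long "Was expecting one of" tail with a one-line summary.
        kept.add("... (" + (lines.length - maxLines) + " more lines omitted)");
        return String.join("\n", kept);
    }

    public static void main(String[] args) {
        String msg = "PARSE ERROR: Encountered \"FROM\" at line 1, column 8.\n"
            + "Was expecting one of:\n"
            + "    \"UNION\" ...\n"
            + "    \"INTERSECT\" ...\n"
            + "    \"EXCEPT\" ...";
        System.out.println(truncate(msg, 3));
    }
}
```

The full token list could still be written to a debug log, so nothing is lost for anyone who needs it.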




[jira] [Updated] (DRILL-7225) Merging of columnTypeInfo for file with different schema throws NullPointerException during refresh metadata

2019-05-03 Thread Sorabh Hamirwasia (JIRA)



 [ 
https://issues.apache.org/jira/browse/DRILL-7225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sorabh Hamirwasia updated DRILL-7225:
-
Component/s: Metadata

> Merging of columnTypeInfo for file with different schema throws 
> NullPointerException during refresh metadata
> 
>
> Key: DRILL-7225
> URL: https://issues.apache.org/jira/browse/DRILL-7225
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Affects Versions: 1.16.0
>Reporter: Venkata Jyothsna Donapati
>Assignee: Venkata Jyothsna Donapati
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
>
> Merging of columnTypeInfo from two files with different schemas throws a 
> NullPointerException. For example, if a directory Orders has two files:
>  * orders.parquet (with columns order_id, order_name, order_date)
>  * orders_with_address.parquet (with columns order_id, order_name, address)
> When refresh table metadata is triggered, metadata such as total_null_count 
> for columns in both the files is aggregated and updated in the 
> ColumnTypeInfo. Initially ColumnTypeInfo is initialized with the first file's 
> ColumnTypeInfo (i.e., order_id, order_name, order_date). While aggregating, 
> the existing ColumnTypeInfo is looked up for columns in the second file and, 
> since some of them don't exist in the ColumnTypeInfo, an NPE is thrown. This 
> can be fixed by initializing ColumnTypeInfo for columns that are not yet 
> present.
>  
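
The fix described above (initialize an entry for any column that is not yet present before aggregating) can be sketched as follows. This is an illustration only, not Drill's actual code: the class NullCountMerge is hypothetical, and a plain map from column name to null count stands in for ColumnTypeInfo.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class NullCountMerge {
    /** Adds each file's per-column null counts into a running total. */
    static Map<String, Long> merge(List<Map<String, Long>> perFileNullCounts) {
        Map<String, Long> totals = new LinkedHashMap<>();
        for (Map<String, Long> file : perFileNullCounts) {
            for (Map.Entry<String, Long> e : file.entrySet()) {
                // Map.merge creates the entry when the column is not yet present,
                // so a column seen only in a later file does not cause an NPE.
                totals.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        Map<String, Long> orders = new LinkedHashMap<>();
        orders.put("order_id", 0L);
        orders.put("order_date", 2L);
        Map<String, Long> ordersWithAddress = new LinkedHashMap<>();
        ordersWithAddress.put("order_id", 1L);
        ordersWithAddress.put("address", 3L);
        System.out.println(merge(Arrays.asList(orders, ordersWithAddress)));
    }
}
```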




[jira] [Updated] (DRILL-7225) Merging of columnTypeInfo for file with different schema throws NullPointerException during refresh metadata

2019-05-03 Thread Sorabh Hamirwasia (JIRA)



 [ 
https://issues.apache.org/jira/browse/DRILL-7225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sorabh Hamirwasia updated DRILL-7225:
-
Affects Version/s: 1.16.0

> Merging of columnTypeInfo for file with different schema throws 
> NullPointerException during refresh metadata
> 
>
> Key: DRILL-7225
> URL: https://issues.apache.org/jira/browse/DRILL-7225
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Venkata Jyothsna Donapati
>Assignee: Venkata Jyothsna Donapati
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
>
> Merging of columnTypeInfo from two files with different schemas throws a 
> NullPointerException. For example, if a directory Orders has two files:
>  * orders.parquet (with columns order_id, order_name, order_date)
>  * orders_with_address.parquet (with columns order_id, order_name, address)
> When refresh table metadata is triggered, metadata such as total_null_count 
> for columns in both the files is aggregated and updated in the 
> ColumnTypeInfo. Initially ColumnTypeInfo is initialized with the first file's 
> ColumnTypeInfo (i.e., order_id, order_name, order_date). While aggregating, 
> the existing ColumnTypeInfo is looked up for columns in the second file and, 
> since some of them don't exist in the ColumnTypeInfo, an NPE is thrown. This 
> can be fixed by initializing ColumnTypeInfo for columns that are not yet 
> present.
>  




[jira] [Updated] (DRILL-7171) Count(*) query on leaf level directory is not reading summary cache file.

2019-05-03 Thread Sorabh Hamirwasia (JIRA)



 [ 
https://issues.apache.org/jira/browse/DRILL-7171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sorabh Hamirwasia updated DRILL-7171:
-
Component/s: Metadata

> Count(*) query on leaf level directory is not reading summary cache file.
> -
>
> Key: DRILL-7171
> URL: https://issues.apache.org/jira/browse/DRILL-7171
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Affects Versions: 1.16.0
>Reporter: Venkata Jyothsna Donapati
>Assignee: Venkata Jyothsna Donapati
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Since the leaf level directory doesn't store the metadata directories file, 
> while reading summary if the directories cache file is not present, it is 
> assumed that the cache is possibly corrupt and reading of the summary cache 
> file is skipped. Metadata directories cache file should be created at the 
> leaf level.




[jira] [Updated] (DRILL-7171) Count(*) query on leaf level directory is not reading summary cache file.

2019-05-03 Thread Sorabh Hamirwasia (JIRA)



 [ 
https://issues.apache.org/jira/browse/DRILL-7171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sorabh Hamirwasia updated DRILL-7171:
-
Affects Version/s: 1.16.0

> Count(*) query on leaf level directory is not reading summary cache file.
> -
>
> Key: DRILL-7171
> URL: https://issues.apache.org/jira/browse/DRILL-7171
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Venkata Jyothsna Donapati
>Assignee: Venkata Jyothsna Donapati
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Since the leaf level directory doesn't store the metadata directories file, 
> while reading summary if the directories cache file is not present, it is 
> assumed that the cache is possibly corrupt and reading of the summary cache 
> file is skipped. Metadata directories cache file should be created at the 
> leaf level.




[jira] [Created] (DRILL-7239) Download page wrong date

2019-05-03 Thread Sebb (JIRA)

Sebb created DRILL-7239:
---

 Summary: Download page wrong date
 Key: DRILL-7239
 URL: https://issues.apache.org/jira/browse/DRILL-7239
 Project: Apache Drill
  Issue Type: Bug
Reporter: Sebb


[https://drill.apache.org/download/] says:

 

"Drill 1.16 was released on May 02, 2018."

I think that is wrong.

 

The page also says:

"Copyright © 2012-2014"

If the year is mentioned, it should be updated for the last substantive change




[jira] [Assigned] (DRILL-7239) Download page wrong date

2019-05-03 Thread Pritesh Maker (JIRA)



 [ 
https://issues.apache.org/jira/browse/DRILL-7239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker reassigned DRILL-7239:


Assignee: Bridget Bevens

> Download page wrong date
> 
>
> Key: DRILL-7239
> URL: https://issues.apache.org/jira/browse/DRILL-7239
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Sebb
>Assignee: Bridget Bevens
>Priority: Major
>
> [https://drill.apache.org/download/] says:
>  
> "Drill 1.16 was released on May 02, 2018."
> I think that is wrong.
>  
> The page also says:
> "Copyright © 2012-2014"
> If the year is mentioned, it should be updated for the last substantive change




[jira] [Commented] (DRILL-7199) Optimize the time taken to populate column statistics for non-interesting columns

2019-05-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832924#comment-16832924
 ] 

ASF GitHub Bot commented on DRILL-7199:
---

dvjyothsna commented on issue #1771: DRILL-7199: Optimize population of 
metadata for non-interesting columns
URL: https://github.com/apache/drill/pull/1771#issuecomment-489269917
 
 
   Rebased on master
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Optimize the time taken to populate column statistics for non-interesting 
> columns
> -
>
> Key: DRILL-7199
> URL: https://issues.apache.org/jira/browse/DRILL-7199
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Venkata Jyothsna Donapati
>Assignee: Venkata Jyothsna Donapati
>Priority: Minor
> Fix For: 1.17.0
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Currently, populating column statistics for non-interesting columns takes very 
> long since they are populated for every row group. Since non-interesting column 
> statistics are common for the table, they can be populated once and reused.
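
The "populate once and reuse" idea can be sketched as below. This is a simplified illustration, not the change made in the pull request: the class SharedColumnStats, its methods, and the placeholder "UNKNOWN" value are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SharedColumnStats {
    /** Counts how many times the (hypothetical) statistics were built. */
    static int builds = 0;

    /** Hypothetical stand-in for building statistics of non-interesting columns. */
    static Map<String, String> buildStats(List<String> columns) {
        builds++;
        Map<String, String> stats = new HashMap<>();
        for (String column : columns) {
            stats.put(column, "UNKNOWN"); // placeholder, not Drill's real format
        }
        return Collections.unmodifiableMap(stats);
    }

    /** Builds the statistics once and hands the same map to every row group. */
    static List<Map<String, String>> statsPerRowGroup(int rowGroups, List<String> columns) {
        Map<String, String> shared = buildStats(columns);
        List<Map<String, String>> result = new ArrayList<>();
        for (int i = 0; i < rowGroups; i++) {
            result.add(shared); // reuse, do not rebuild per row group
        }
        return result;
    }

    public static void main(String[] args) {
        statsPerRowGroup(11, Arrays.asList("a", "b"));
        System.out.println("builds = " + builds);
    }
}
```

Returning an unmodifiable map is a deliberate choice here: since every row group shares one instance, no caller should be able to mutate it.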




[jira] [Updated] (DRILL-7199) Optimize the time taken to populate column statistics for non-interesting columns

2019-05-03 Thread Venkata Jyothsna Donapati (JIRA)



 [ 
https://issues.apache.org/jira/browse/DRILL-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venkata Jyothsna Donapati updated DRILL-7199:
-
Labels: ready-to-commit  (was: )

> Optimize the time taken to populate column statistics for non-interesting 
> columns
> -
>
> Key: DRILL-7199
> URL: https://issues.apache.org/jira/browse/DRILL-7199
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Venkata Jyothsna Donapati
>Assignee: Venkata Jyothsna Donapati
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Currently, populating column statistics for non-interesting columns takes very 
> long since they are populated for every row group. Since non-interesting column 
> statistics are common for the table, they can be populated once and reused.




[jira] [Resolved] (DRILL-7239) Download page wrong date

2019-05-03 Thread Bridget Bevens (JIRA)



 [ 
https://issues.apache.org/jira/browse/DRILL-7239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bridget Bevens resolved DRILL-7239.
---
Resolution: Fixed

Fixed Date on Download page.
Will check on copyright date.


> Download page wrong date
> 
>
> Key: DRILL-7239
> URL: https://issues.apache.org/jira/browse/DRILL-7239
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Sebb
>Assignee: Bridget Bevens
>Priority: Major
>
> [https://drill.apache.org/download/] says:
>  
> "Drill 1.16 was released on May 02, 2018."
> I think that is wrong.
>  
> The page also says:
> "Copyright © 2012-2014"
> If the year is mentioned, it should be updated for the last substantive change




[jira] [Commented] (DRILL-7238) Drill does not use DirectScan for non-existent columns

2019-05-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832951#comment-16832951
 ] 

ASF GitHub Bot commented on DRILL-7238:
---

dvjyothsna commented on issue #1781: DRILL-7238: Fixed ConvertCountToDirectScan 
to handle non-existent columns
URL: https://github.com/apache/drill/pull/1781#issuecomment-489275905
 
 
   @amansinha100 Please review
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Drill does not use DirectScan for non-existent columns
> --
>
> Key: DRILL-7238
> URL: https://issues.apache.org/jira/browse/DRILL-7238
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Venkata Jyothsna Donapati
>Assignee: Venkata Jyothsna Donapati
>Priority: Major
>
> This query does not use the summary metadata cache file:
> select count(int_nulls_id), count(int_id), count(ss_ticket_number), 
> count(extra) from store_sales_null_blocks_int;
> In this query, extra is a column that does not exist (non-existent column).
> Here is the explain plan:
> {noformat}
> | 00-00Screen
> 00-01  Project(EXPR$0=[$0], EXPR$1=[$1], EXPR$2=[$2], EXPR$3=[$3])
> 00-02StreamAgg(group=[{}], EXPR$0=[$SUM0($0)], EXPR$1=[$SUM0($1)], 
> EXPR$2=[$SUM0($2)], EXPR$3=[$SUM0($3)])
> 00-03  UnionExchange
> 01-01StreamAgg(group=[{}], EXPR$0=[COUNT($0)], 
> EXPR$1=[COUNT($1)], EXPR$2=[COUNT($2)], EXPR$3=[COUNT($3)])
> 01-02  Scan(table=[[dfs, parquet_metadata_cache, 
> store_sales_null_blocks_int]], groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath 
> [path=/drill/testdata/metadata_cache/store_sales_null_blocks_int]], 
> selectionRoot=/drill/testdata/metadata_cache/store_sales_null_blocks_int, 
> numFiles=1, numRowGroups=11, usedMetadataFile=true, 
> cacheFileRoot=/drill/testdata/metadata_cache/store_sales_null_blocks_int, 
> columns=[`int_nulls_id`, `int_id`, `ss_ticket_number`, `extra`]]])
> {noformat}
> This is a regression from Drill 1.15.
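
For illustration, answering count(column) from cached metadata while tolerating a non-existent column might look like the sketch below. It assumes that a column absent from the metadata is read as all NULLs, so its count is 0; the class DirectCount and its signature are hypothetical and do not reflect the actual ConvertCountToDirectScan rule.

```java
import java.util.HashMap;
import java.util.Map;

final class DirectCount {
    /**
     * Returns count(column) from cached metadata. A column that is absent
     * from the metadata is treated as all-null, giving a count of 0.
     */
    static long count(long totalRows, Map<String, Long> nullCounts, String column) {
        Long nulls = nullCounts.get(column);
        if (nulls == null) {
            return 0L; // assumption: a non-existent column is read as all NULLs
        }
        return totalRows - nulls;
    }

    public static void main(String[] args) {
        Map<String, Long> nullCounts = new HashMap<>();
        nullCounts.put("int_id", 0L);
        nullCounts.put("int_nulls_id", 40L);
        System.out.println(count(100L, nullCounts, "int_nulls_id"));
        System.out.println(count(100L, nullCounts, "extra"));
    }
}
```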




[jira] [Commented] (DRILL-7238) Drill does not use DirectScan for non-existent columns

2019-05-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832950#comment-16832950
 ] 

ASF GitHub Bot commented on DRILL-7238:
---

dvjyothsna commented on pull request #1781: DRILL-7238: Fixed 
ConvertCountToDirectScan to handle non-existent columns
URL: https://github.com/apache/drill/pull/1781
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Drill does not use DirectScan for non-existent columns
> --
>
> Key: DRILL-7238
> URL: https://issues.apache.org/jira/browse/DRILL-7238
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Venkata Jyothsna Donapati
>Assignee: Venkata Jyothsna Donapati
>Priority: Major
>
> This query does not use the summary metadata cache file:
> select count(int_nulls_id), count(int_id), count(ss_ticket_number), 
> count(extra) from store_sales_null_blocks_int;
> In this query, extra is a column that does not exist (non-existent column).
> Here is the explain plan:
> {noformat}
> | 00-00Screen
> 00-01  Project(EXPR$0=[$0], EXPR$1=[$1], EXPR$2=[$2], EXPR$3=[$3])
> 00-02StreamAgg(group=[{}], EXPR$0=[$SUM0($0)], EXPR$1=[$SUM0($1)], 
> EXPR$2=[$SUM0($2)], EXPR$3=[$SUM0($3)])
> 00-03  UnionExchange
> 01-01StreamAgg(group=[{}], EXPR$0=[COUNT($0)], 
> EXPR$1=[COUNT($1)], EXPR$2=[COUNT($2)], EXPR$3=[COUNT($3)])
> 01-02  Scan(table=[[dfs, parquet_metadata_cache, 
> store_sales_null_blocks_int]], groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath 
> [path=/drill/testdata/metadata_cache/store_sales_null_blocks_int]], 
> selectionRoot=/drill/testdata/metadata_cache/store_sales_null_blocks_int, 
> numFiles=1, numRowGroups=11, usedMetadataFile=true, 
> cacheFileRoot=/drill/testdata/metadata_cache/store_sales_null_blocks_int, 
> columns=[`int_nulls_id`, `int_id`, `ss_ticket_number`, `extra`]]])
> {noformat}
> This is a regression from Drill 1.15.




[jira] [Created] (DRILL-7240) Run-time rowgroup pruning match() fails on casting a Long to an Integer

2019-05-03 Thread Boaz Ben-Zvi (JIRA)

Boaz Ben-Zvi created DRILL-7240:
---

 Summary: Run-time rowgroup pruning match() fails on casting a Long 
to an Integer
 Key: DRILL-7240
 URL: https://issues.apache.org/jira/browse/DRILL-7240
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Parquet
Affects Versions: 1.17.0
Reporter: Boaz Ben-Zvi
Assignee: Boaz Ben-Zvi
 Fix For: 1.17.0


After a Parquet table is refreshed with select "interesting" columns, a query 
whose WHERE clause contains a condition on a "non interesting" INT64 column 
fails during run-time pruning (calling match()) with:
{noformat}
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
ClassCastException: java.lang.Long cannot be cast to java.lang.Integer
{noformat}
 Near-term fix suggestion: Catch the match() exception error, and instead do 
not prune (i.e. run-time pruning would be disabled in such cases).
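
The near-term suggestion (catch the exception and skip pruning) can be sketched as follows. This is a minimal illustration under stated assumptions, not Drill's match() implementation: the class SafePruning and both methods are hypothetical, and the predicate is modeled on a filter such as sales > 10.

```java
final class SafePruning {
    /**
     * Hypothetical check for a predicate "col > literal": the row group can be
     * dropped when its maximum value is not greater than the literal.
     * The cast assumes an INT column and fails for INT64 statistics.
     */
    static boolean canDrop(Object maxStat, int literal) {
        Integer max = (Integer) maxStat; // ClassCastException if maxStat is a Long
        return max <= literal;
    }

    /** Falls back to "do not prune" when the statistic has an unexpected type. */
    static boolean canDropOrKeep(Object maxStat, int literal) {
        try {
            return canDrop(maxStat, literal);
        } catch (ClassCastException e) {
            return false; // keep the row group; correctness over optimization
        }
    }

    public static void main(String[] args) {
        System.out.println(canDropOrKeep(Integer.valueOf(5), 10));
        System.out.println(canDropOrKeep(Long.valueOf(22L), 10));
    }
}
```

Keeping the row group on failure is always safe, since pruning is only an optimization; the cost is that such row groups are scanned even when they could have been skipped.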




[jira] [Updated] (DRILL-7240) Run-time rowgroup pruning match() fails on casting a Long to an Integer

2019-05-03 Thread Boaz Ben-Zvi (JIRA)



 [ 
https://issues.apache.org/jira/browse/DRILL-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boaz Ben-Zvi updated DRILL-7240:

Issue Type: Sub-task  (was: Bug)
Parent: DRILL-7028

> Run-time rowgroup pruning match() fails on casting a Long to an Integer
> ---
>
> Key: DRILL-7240
> URL: https://issues.apache.org/jira/browse/DRILL-7240
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Storage - Parquet
>Affects Versions: 1.17.0
>Reporter: Boaz Ben-Zvi
>Assignee: Boaz Ben-Zvi
>Priority: Major
> Fix For: 1.17.0
>
>
> After a Parquet table is refreshed with select "interesting" columns, a query 
> whose WHERE clause contains a condition on a "non interesting" INT64 column 
> fails during run-time pruning (calling match()) with:
> {noformat}
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> ClassCastException: java.lang.Long cannot be cast to java.lang.Integer
> {noformat}
>  Near-term fix suggestion: Catch the match() exception error, and instead do 
> not prune (i.e. run-time pruning would be disabled in such cases).




[jira] [Updated] (DRILL-7240) Run-time rowgroup pruning match() fails on casting a Long to an Integer

2019-05-03 Thread Boaz Ben-Zvi (JIRA)



 [ 
https://issues.apache.org/jira/browse/DRILL-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boaz Ben-Zvi updated DRILL-7240:

Description: 
After a Parquet table is refreshed with selected "interesting" columns, a query 
whose WHERE clause contains a condition on a "non interesting" INT64 column 
fails during run-time pruning (calling match()) with:
{noformat}
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
ClassCastException: java.lang.Long cannot be cast to java.lang.Integer
{noformat}
 Near-term fix suggestion: Catch the match() exception error, and instead do 
not prune (i.e. run-time pruning would be disabled in such cases).

  was:
After a Parquet table is refreshed with select "interesting" columns, a query 
whose WHERE clause contains a condition on a "non interesting" INT64 column 
fails during run-time pruning (calling match()) with:
{noformat}
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
ClassCastException: java.lang.Long cannot be cast to java.lang.Integer
{noformat}
 Near-term fix suggestion: Catch the match() exception error, and instead do 
not prune (i.e. run-time pruning would be disabled in such cases).


> Run-time rowgroup pruning match() fails on casting a Long to an Integer
> ---
>
> Key: DRILL-7240
> URL: https://issues.apache.org/jira/browse/DRILL-7240
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Storage - Parquet
>Affects Versions: 1.17.0
>Reporter: Boaz Ben-Zvi
>Assignee: Boaz Ben-Zvi
>Priority: Major
> Fix For: 1.17.0
>
>
> After a Parquet table is refreshed with selected "interesting" columns, a 
> query whose WHERE clause contains a condition on a "non interesting" INT64 
> column fails during run-time pruning (calling match()) with:
> {noformat}
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> ClassCastException: java.lang.Long cannot be cast to java.lang.Integer
> {noformat}
>  Near-term fix suggestion: Catch the match() exception error, and instead do 
> not prune (i.e. run-time pruning would be disabled in such cases).




[jira] [Commented] (DRILL-7240) Run-time rowgroup pruning match() fails on casting a Long to an Integer

2019-05-03 Thread Boaz Ben-Zvi (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832964#comment-16832964
 ] 

Boaz Ben-Zvi commented on DRILL-7240:
-

Example recreating this bug - take this json file
{noformat}
{"key": "aa", "sales": 11}
{"key": "bb", "sales": 22}
{noformat}
And create two parquet tables/files by selecting from the json, first casting 
the "sales" to an INT, and the second to a BIGINT:
{noformat}
create table test_int as select key, cast(sales as int) sales from 
dfs.`/tmp/myfile.json`;
create table test_bigint as select key, cast(sales as bigint) sales from 
dfs.`/tmp/myfile.json`;
{noformat}
Then move the two files into a sub-directory, renaming the second:
{noformat}
$ > mv /tmp/test_int/0_0_0.parquet /tmp/test/sub
$ > mv /tmp/test_bigint/0_0_0.parquet /tmp/test/sub/0_0_1.parquet 
{noformat}
Last, refresh the metadata on only the first ("key") column, then run a query 
with a predicate on the "sales" column:
{noformat}
refresh table METADATA columns(key) dfs.`/tmp/test`;
select sales from dfs.`/tmp/test/` where sales > 10;
{noformat}


> Run-time rowgroup pruning match() fails on casting a Long to an Integer
> ---
>
> Key: DRILL-7240
> URL: https://issues.apache.org/jira/browse/DRILL-7240
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Storage - Parquet
>Affects Versions: 1.17.0
>Reporter: Boaz Ben-Zvi
>Assignee: Boaz Ben-Zvi
>Priority: Major
> Fix For: 1.17.0
>
>
> After a Parquet table is refreshed with selected "interesting" columns, a 
> query whose WHERE clause contains a condition on a "non interesting" INT64 
> column fails during run-time pruning (calling match()) with:
> {noformat}
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> ClassCastException: java.lang.Long cannot be cast to java.lang.Integer
> {noformat}
>  Near-term fix suggestion: Catch the match() exception error, and instead do 
> not prune (i.e. run-time pruning would be disabled in such cases).




[jira] [Commented] (DRILL-6965) Adjust table function usage for all storage plugins and implement schema parameter

2019-05-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832970#comment-16832970
 ] 

ASF GitHub Bot commented on DRILL-6965:
---

sohami commented on pull request #1777: DRILL-6965: Implement schema table 
function parameter
URL: https://github.com/apache/drill/pull/1777
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Adjust table function usage for all storage plugins and implement schema 
> parameter
> --
>
> Key: DRILL-6965
> URL: https://issues.apache.org/jira/browse/DRILL-6965
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.17.0
>
>
> A schema can be used while reading a table in two ways:
>  a. schema is created in the table root folder using CREATE SCHEMA command 
> and schema usage command is enabled;
>  b. schema indicated in table function.
>  This Jira implements point b.
> Schema indication using table function is useful when user does not want to 
> persist schema in table root location or when reading from file, not folder.
> The schema parameter can be used on its own or together with format plugin 
> table properties.
> Usage examples:
> Pre-requisites: 
>  V3 reader must be enabled: {{set `exec.storage.enable_v3_text_reader` = 
> true;}}
> Query examples:
> 1. There is folder with files or just one file (ex: dfs.tmp.text_table) and 
> user wants to apply schema to them:
>  a. indicate schema inline:
> {noformat}
> select * from table(dfs.tmp.`text_table`(
> schema => 'inline=(col1 date properties {`drill.format` = `-MM-dd`}) 
> properties {`drill.strict` = `false`}'))
> {noformat}
> To indicate only table properties use the following syntax:
> {noformat}
> select * from table(dfs.tmp.`text_table`(
> schema => 'inline=() 
> properties {`drill.strict` = `false`}'))
> {noformat}
> b. indicate schema using path:
>  First schema was created in some location using CREATE SCHEMA command. For 
> example:
> {noformat}
> create schema 
> (col int)
> path '/tmp/my_schema'
> {noformat}
> Now user wants to apply this schema in table function:
> {noformat}
> select * from table(dfs.tmp.`text_table`(schema => 'path=`/tmp/my_schema`'))
> {noformat}
> 2. User wants to apply a schema alongside format plugin table function 
> parameters.
>  Assuming that the user has a CSV file with headers whose extension does not 
> match the default extension for text files with headers (ex: cars.csvh-test):
> {noformat}
> select * from table(dfs.tmp.`cars.csvh-test`(type => 'text', 
> fieldDelimiter => ',', extractHeader => true,
> schema => 'inline=(col1 date)'))
> {noformat}
> More details about syntax can be found in design document:
>  
> [https://docs.google.com/document/d/1mp4egSbNs8jFYRbPVbm_l0Y5GjH3HnoqCmOpMTR_g4w/edit]




[jira] [Commented] (DRILL-7199) Optimize the time taken to populate column statistics for non-interesting columns

2019-05-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832972#comment-16832972
 ] 

ASF GitHub Bot commented on DRILL-7199:
---

sohami commented on pull request #1771: DRILL-7199: Optimize population of 
metadata for non-interesting columns
URL: https://github.com/apache/drill/pull/1771
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Optimize the time taken to populate column statistics for non-interesting 
> columns
> -
>
> Key: DRILL-7199
> URL: https://issues.apache.org/jira/browse/DRILL-7199
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Venkata Jyothsna Donapati
>Assignee: Venkata Jyothsna Donapati
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Currently, populating column statistics for non-interesting columns takes very 
> long since they are populated for every row group. Since non-interesting column 
> statistics are common for the table, they can be populated once and reused.




[jira] [Commented] (DRILL-7050) RexNode convert exception in subquery

2019-05-03 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-7050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832971#comment-16832971
 ] 

ASF GitHub Bot commented on DRILL-7050:
---

sohami commented on pull request #1770: DRILL-7050: RexNode convert exception 
in sub-query
URL: https://github.com/apache/drill/pull/1770
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> RexNode convert exception in subquery
> -
>
> Key: DRILL-7050
> URL: https://issues.apache.org/jira/browse/DRILL-7050
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.14.0, 1.15.0
>Reporter: Oleg Zinoviev
>Assignee: Volodymyr Vysotskyi
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
>
> If the query contains a subquery whose filters are associated with the main 
> query, an error occurs: *PLAN ERROR: Cannot convert RexNode to equivalent 
> Drill expression. RexNode Class: org.apache.calcite.rex.RexCorrelVariable*
> Steps to reproduce:
> 1) Create source table (or view, doesn't matter)
> {code:sql}
> create table dfs.root.source as  (
> select 1 as id union all select 2 as id
> )
> {code}
> 2) Execute query
> {code:sql}
> select t1.id,
>   (select count(t2.id) 
>   from dfs.root.source t2 where t2.id = t1.id)
> from  dfs.root.source t1
> {code}
> Reason: 
> Method 
> {code:java}org.apache.calcite.sql2rel.SqlToRelConverter.Blackboard.lookupExp{code}
>   calls {code:java}RexBuilder.makeCorrel{code} in some cases



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (DRILL-7199) Optimize the time taken to populate column statistics for non-interesting columns

2019-05-03 Thread Sorabh Hamirwasia (JIRA)



 [ 
https://issues.apache.org/jira/browse/DRILL-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sorabh Hamirwasia updated DRILL-7199:
-
Component/s: Metadata

> Optimize the time taken to populate column statistics for non-interesting 
> columns
> -
>
> Key: DRILL-7199
> URL: https://issues.apache.org/jira/browse/DRILL-7199
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Reporter: Venkata Jyothsna Donapati
>Assignee: Venkata Jyothsna Donapati
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Currently, populating column statistics for non-interesting columns takes very 
> long since they are populated for every row group. Since non-interesting column 
> statistics are common for the table, they can be populated once and reused.



