[jira] [Created] (DRILL-8501) Json Conversion UDF Not Respecting System JSON Options
Charles Givre created DRILL-8501: Summary: Json Conversion UDF Not Respecting System JSON Options Key: DRILL-8501 URL: https://issues.apache.org/jira/browse/DRILL-8501 Project: Apache Drill Issue Type: Bug Components: Storage - JSON Affects Versions: 1.21.2 Reporter: Charles Givre Assignee: Charles Givre Fix For: 1.22.0 The convert_fromJSON() UDF does not respect the system JSON options of allTextMode and readAllNumbersAsDouble. This PR fixes that. -- This message was sent by Atlassian Jira (v8.20.10#820010)
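As a minimal sketch of the semantics these two options are generally understood to have, the Java below shows how a numeric JSON token could be converted under each setting. The class and method names are hypothetical and this is not the code from the PR. It also ignores integers outside the `long` range.

```java
// Hypothetical sketch: how allTextMode and readAllNumbersAsDouble could shape
// the value produced for a numeric JSON token. Not Drill's actual reader code.
final class JsonScalarOptions {

  /**
   * @param rawToken               the number exactly as it appears in the JSON text
   * @param allTextMode            when true, every scalar is returned as a String
   * @param readAllNumbersAsDouble when true, every number is returned as a Double
   */
  static Object convertNumber(String rawToken, boolean allTextMode, boolean readAllNumbersAsDouble) {
    if (allTextMode) {
      return rawToken;                      // keep the original text
    }
    if (readAllNumbersAsDouble) {
      return Double.valueOf(rawToken);      // widen integers to double as well
    }
    if (rawToken.matches("-?\\d+")) {
      return Long.valueOf(rawToken);        // integral literal stays integral
    }
    return Double.valueOf(rawToken);        // fractional or exponent form
  }
}
```

Without the fix, a UDF that ignores the options would behave as if both flags were always false.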
[jira] [Created] (DRILL-8494) HTTP Caching Not Saving Pages
Charles Givre created DRILL-8494: Summary: HTTP Caching Not Saving Pages Key: DRILL-8494 URL: https://issues.apache.org/jira/browse/DRILL-8494 Project: Apache Drill Issue Type: Bug Components: Storage - HTTP Affects Versions: 1.21.1 Reporter: Charles Givre Assignee: Charles Givre Fix For: 1.21.2 A minor bug fix: the HTTP storage plugin was not actually caching results, even when caching was set to true. This bug was introduced in DRILL-8329.
[jira] [Created] (DRILL-8493) Drill Unable to Read XML Files with Namespaces
Charles Givre created DRILL-8493: Summary: Drill Unable to Read XML Files with Namespaces Key: DRILL-8493 URL: https://issues.apache.org/jira/browse/DRILL-8493 Project: Apache Drill Issue Type: Bug Components: Format - XML Affects Versions: 1.21.1 Reporter: Charles Givre Assignee: Charles Givre Fix For: 1.21.2 This PR fixes a bug whereby Drill ignored all data when an XML file had a namespace.
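A common way to make a streaming parser tolerant of namespaces is to match elements on their local name rather than their prefixed name. The following is a minimal sketch using the JDK's StAX API, assuming a namespace-aware parser. `NamespaceAwareFields` and its `name=value` output are hypothetical and are not Drill's XML reader.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch: collect leaf-element text keyed by local name, so a
// namespace prefix (e.g. "ns:color") does not change or hide the field.
final class NamespaceAwareFields {

  static List<String> leafFields(String xml) {
    try {
      XMLInputFactory factory = XMLInputFactory.newInstance();
      factory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, true);
      XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
      List<String> fields = new ArrayList<>();
      String currentName = null;
      StringBuilder text = new StringBuilder();
      while (reader.hasNext()) {
        int event = reader.next();
        if (event == XMLStreamConstants.START_ELEMENT) {
          currentName = reader.getLocalName();   // "color", not "ns:color"
          text.setLength(0);
        } else if (event == XMLStreamConstants.CHARACTERS) {
          text.append(reader.getText());         // text may arrive in several chunks
        } else if (event == XMLStreamConstants.END_ELEMENT) {
          String value = text.toString().trim();
          // Only the element that just closed, and only if it held text, is a leaf field.
          if (!value.isEmpty() && reader.getLocalName().equals(currentName)) {
            fields.add(currentName + "=" + value);
          }
          text.setLength(0);
        }
      }
      reader.close();
      return fields;
    } catch (XMLStreamException e) {
      throw new IllegalArgumentException("Malformed XML", e);
    }
  }
}
```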
[jira] [Updated] (DRILL-8481) Ability to query XML root attributes
[ https://issues.apache.org/jira/browse/DRILL-8481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Charles Givre updated DRILL-8481:
Fix Version/s: 1.21.2

> Ability to query XML root attributes
>
> Key: DRILL-8481
> URL: https://issues.apache.org/jira/browse/DRILL-8481
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - XML
> Affects Versions: 1.21.1
> Reporter: benj
> Assignee: Charles Givre
> Priority: Major
> Fix For: 1.21.2
>
> Hi,
> It is possible to retrieve the attributes of every field except those of the root.
> It would be useful to be able to retrieve the attributes found in the root node of XML files.
> In my common use cases, I have many XML files, each containing a single XML frame, often with one or more attributes in the root tag.
> To recover these values, I am currently forced to preprocess the files to "copy" the attributes into the fields of the XML record.
> Even with multiple XML records under the root, it would be useful for the root attributes to be accessible from each record.
> Example (file aaa.xml):
> {noformat}
>
>
> blue
>
> {noformat}
> With the query:
> {code:sql}
> SELECT * FROM (SELECT filename, * FROM TABLE(dfs.test.`/aaa.xml`(type=>'xml', dataLevel=>1)) AS xml) AS x;
> {code}
> I can access:
> * P1_SubVersion
> * P1_MID
> * P1_PN
> * P1_SL
> * P2_SubVersion
> * P2.Color
> But I can't access:
> * PPP_Version
> * PPP_TimeStamp
> and changing the DataLevel does not solve the problem.
> Regards,
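To illustrate the requested behavior, the sketch below copies the root element's attributes into every record found one level below the root. It names columns `<element>_<attribute>`, matching the column names listed in the report (for example `PPP_Version`). The original aaa.xml example did not survive, so the XML in the usage note is invented. The class is hypothetical and is not the change made in PR 2884.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch: every child of the root becomes a record that inherits
// the root's attributes. Not Drill's XML reader.
final class RootAttributeRecords {

  static List<Map<String, String>> read(String xml) {
    try {
      XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
      Map<String, String> rootAttributes = new LinkedHashMap<>();
      List<Map<String, String>> records = new ArrayList<>();
      int depth = 0;
      while (reader.hasNext()) {
        int event = reader.next();
        if (event == XMLStreamConstants.START_ELEMENT) {
          depth++;
          if (depth == 1) {
            putAttributes(reader, rootAttributes);          // remember root attributes once
          } else if (depth == 2) {
            Map<String, String> record = new LinkedHashMap<>(rootAttributes);
            putAttributes(reader, record);                  // add the record's own attributes
            records.add(record);
          }
        } else if (event == XMLStreamConstants.END_ELEMENT) {
          depth--;
        }
      }
      reader.close();
      return records;
    } catch (XMLStreamException e) {
      throw new IllegalArgumentException("Malformed XML", e);
    }
  }

  // Stores each attribute of the current element under "<element>_<attribute>".
  private static void putAttributes(XMLStreamReader reader, Map<String, String> target) {
    String element = reader.getLocalName();
    for (int i = 0; i < reader.getAttributeCount(); i++) {
      target.put(element + "_" + reader.getAttributeLocalName(i), reader.getAttributeValue(i));
    }
  }
}
```

For example, `RootAttributeRecords.read("<PPP Version=\"2\"><P1 SubVersion=\"1\"/></PPP>")` yields one record that contains both `PPP_Version` and `P1_SubVersion`.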
[jira] [Commented] (DRILL-8481) Ability to query XML root attributes
[ https://issues.apache.org/jira/browse/DRILL-8481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821723#comment-17821723 ] Charles Givre commented on DRILL-8481: [~benj641] I just submitted a bug fix. [https://github.com/apache/drill/pull/2884] If you can review and test it, I'd appreciate it.
[jira] [Assigned] (DRILL-8481) Ability to query XML root attributes
[ https://issues.apache.org/jira/browse/DRILL-8481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Givre reassigned DRILL-8481: Assignee: Charles Givre
[jira] [Commented] (DRILL-8481) Ability to query XML root attributes
[ https://issues.apache.org/jira/browse/DRILL-8481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821285#comment-17821285 ] Charles Givre commented on DRILL-8481: [~benj641] Thanks for submitting. Are you actively working on this, or is this just a bug report?
[jira] [Commented] (DRILL-8474) Add Daffodil Format Plugin
[ https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802191#comment-17802191 ]
Charles Givre commented on DRILL-8474:
[https://github.com/apache/drill/pull/2836]

> Add Daffodil Format Plugin
>
> Key: DRILL-8474
> URL: https://issues.apache.org/jira/browse/DRILL-8474
> Project: Apache Drill
> Issue Type: New Feature
> Affects Versions: 1.21.1
> Reporter: Charles Givre
> Priority: Major
> Fix For: 1.22.0
[jira] [Created] (DRILL-8474) Add Daffodil Format Plugin
Charles Givre created DRILL-8474: Summary: Add Daffodil Format Plugin Key: DRILL-8474 URL: https://issues.apache.org/jira/browse/DRILL-8474 Project: Apache Drill Issue Type: New Feature Affects Versions: 1.21.1 Reporter: Charles Givre Fix For: 1.22.0
[jira] [Created] (DRILL-8472) Bump Image Metadata Library to Latest Version
Charles Givre created DRILL-8472: Summary: Bump Image Metadata Library to Latest Version Key: DRILL-8472 URL: https://issues.apache.org/jira/browse/DRILL-8472 Project: Apache Drill Issue Type: Task Affects Versions: 1.21.1 Reporter: Charles Givre Assignee: Charles Givre Fix For: 1.21.2 Bump the Metadata Extractor dependency to the latest version.
[jira] [Created] (DRILL-8471) Bump DeltaLake Driver to Version 3.0.0
Charles Givre created DRILL-8471: Summary: Bump DeltaLake Driver to Version 3.0.0 Key: DRILL-8471 URL: https://issues.apache.org/jira/browse/DRILL-8471 Project: Apache Drill Issue Type: Task Components: Format - DeltaLake Reporter: Charles Givre Bump DeltaLake Driver to Version 3.0.0
[jira] [Created] (DRILL-8470) Bump MongoDB Driver to Latest Version
Charles Givre created DRILL-8470: Summary: Bump MongoDB Driver to Latest Version Key: DRILL-8470 URL: https://issues.apache.org/jira/browse/DRILL-8470 Project: Apache Drill Issue Type: Task Components: Storage - MongoDB Affects Versions: 1.21.1 Reporter: Charles Givre Assignee: Charles Givre Fix For: 1.21.2 Bump the MongoDB driver to the latest version.
[jira] [Created] (DRILL-8461) Prevent XXE Attacks in XML Format Plugin
Charles Givre created DRILL-8461: Summary: Prevent XXE Attacks in XML Format Plugin Key: DRILL-8461 URL: https://issues.apache.org/jira/browse/DRILL-8461 Project: Apache Drill Issue Type: Bug Components: Format - XML Affects Versions: 1.21.1 Reporter: Charles Givre Assignee: Charles Givre Fix For: 1.22.0 Drill's XML reader would allow a maliciously crafted XML file to perform an _XML eXternal Entity injection_ (XXE) attack. This fix disables DTD parsing in the XML format plugin and prevents XXE attacks.
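If the reader is built on the JDK's StAX (`javax.xml.stream`) API, disabling DTD processing usually comes down to two standard factory properties. The following is a minimal sketch of that hardening, not the patch itself. `SecureXmlFactory` is a hypothetical name.

```java
import javax.xml.stream.XMLInputFactory;

// Minimal sketch of XXE hardening for a StAX-based reader (hypothetical class).
final class SecureXmlFactory {

  static XMLInputFactory newSecureFactory() {
    XMLInputFactory factory = XMLInputFactory.newInstance();
    // Do not process DOCTYPE declarations at all.
    factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
    // Defense in depth: never resolve external entities.
    factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
    return factory;
  }
}
```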
[jira] [Created] (DRILL-8453) Add XSD Support to XML Reader (Part 1)
Charles Givre created DRILL-8453: Summary: Add XSD Support to XML Reader (Part 1) Key: DRILL-8453 URL: https://issues.apache.org/jira/browse/DRILL-8453 Project: Apache Drill Issue Type: Improvement Components: Format - XML Affects Versions: 1.21.1 Reporter: Charles Givre Assignee: Charles Givre Fix For: 1.21.2 This PR is part of a series to improve Drill's support for reading XML data. One of the main challenges is that XML data provides no way of inferring data types or detecting arrays; the only way to do this really well is with a schema. Some XML files link a schema definition (XSD) file to the data. This PR adds the capability for Drill to map XSD schema files to Drill schemas. The current plan is as follows:
* Part 1 (this PR) simply adds the reader, with no new user-visible functionality.
* Part 2 will include the actual integration with the XML reader.
* Part 3 will include the ability to read arrays.
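As a rough illustration of what mapping XSD types to Drill types involves, here is a lookup from XSD built-in simple types to Drill minor type names. The particular pairs are assumptions chosen for illustration, and the mapping in the PR may differ. Complex types and arrays, planned for Parts 2 and 3, are out of scope.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical mapping from XSD built-in simple types to Drill minor type names.
final class XsdTypeMapping {

  private static final Map<String, String> XSD_TO_DRILL = Map.of(
      "string", "VARCHAR",
      "boolean", "BIT",
      "int", "INT",
      "long", "BIGINT",
      "float", "FLOAT4",
      "double", "FLOAT8",
      "decimal", "VARDECIMAL",
      "date", "DATE",
      "time", "TIME",
      "dateTime", "TIMESTAMP");

  /** Maps an XSD type name such as "xs:int" to a Drill type name, if known. */
  static Optional<String> toDrillType(String xsdType) {
    // Drop any namespace prefix ("xs:", "xsd:", ...) before the lookup.
    String localName = xsdType.substring(xsdType.indexOf(':') + 1);
    return Optional.ofNullable(XSD_TO_DRILL.get(localName));
  }
}
```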
[jira] [Updated] (DRILL-8450) Add Data Type Inference to XML Format Plugin
[ https://issues.apache.org/jira/browse/DRILL-8450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Charles Givre updated DRILL-8450:
Description: This PR adds data type inference to the XML format plugin. As in other plugins, it adds a new configuration parameter, allTextMode, which, when set to true, reads all data as strings. The default is true. Note that inference is limited to doubles, dates, timestamps, booleans, and strings.

> Add Data Type Inference to XML Format Plugin
>
> Key: DRILL-8450
> URL: https://issues.apache.org/jira/browse/DRILL-8450
> Project: Apache Drill
> Issue Type: Improvement
> Components: Format - XML
> Affects Versions: 1.21.1
> Reporter: Charles Givre
> Assignee: Charles Givre
> Priority: Major
> Fix For: 1.22.0
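The following is a minimal sketch of per-value inference restricted to the five types named above. It assumes ISO-8601 date and timestamp strings and the check order boolean, double, date, timestamp. The class and enum names are hypothetical, and a real reader would also need to reconcile types across rows.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeParseException;

// Hypothetical sketch of value-level type inference; not the plugin's code.
final class XmlTypeInference {

  enum InferredType { BOOLEAN, DOUBLE, DATE, TIMESTAMP, STRING }

  static InferredType infer(String value, boolean allTextMode) {
    if (allTextMode) {
      return InferredType.STRING;                    // inference disabled
    }
    String v = value.trim();
    if (v.equalsIgnoreCase("true") || v.equalsIgnoreCase("false")) {
      return InferredType.BOOLEAN;
    }
    if (parses(() -> Double.parseDouble(v))) {
      return InferredType.DOUBLE;
    }
    if (parses(() -> LocalDate.parse(v))) {          // e.g. 2023-05-01
      return InferredType.DATE;
    }
    if (parses(() -> LocalDateTime.parse(v))) {      // e.g. 2023-05-01T10:15:30
      return InferredType.TIMESTAMP;
    }
    return InferredType.STRING;                      // fallback
  }

  // Returns true if the parse attempt completes without a parse exception.
  private static boolean parses(Runnable attempt) {
    try {
      attempt.run();
      return true;
    } catch (NumberFormatException | DateTimeParseException e) {
      return false;
    }
  }
}
```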
[jira] [Created] (DRILL-8450) Add Data Type Inference to XML Format Plugin
Charles Givre created DRILL-8450: Summary: Add Data Type Inference to XML Format Plugin Key: DRILL-8450 URL: https://issues.apache.org/jira/browse/DRILL-8450 Project: Apache Drill Issue Type: Improvement Components: Format - XML Affects Versions: 1.21.1 Reporter: Charles Givre Assignee: Charles Givre Fix For: 1.22.0
[jira] [Commented] (DRILL-8439) Getting col__ prefix for columns that are not special when extractHeader is enabled
[ https://issues.apache.org/jira/browse/DRILL-8439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17728079#comment-17728079 ]
Charles Givre commented on DRILL-8439:
Can you please verify in the CSV file that the affected column doesn't have any other leading characters? Please check for carriage returns and other invisible Unicode characters. The fact that Drill is inserting an extra underscore leads me to believe there could be some extra garbage in that field. In any event, can't you just query this by giving it an alias? For example: {{SELECT `col__PRODUCTID_` AS product_id ...}}

> Getting col__ prefix for columns that are not special when extractHeader is enabled
>
> Key: DRILL-8439
> URL: https://issues.apache.org/jira/browse/DRILL-8439
> Project: Apache Drill
> Issue Type: Bug
> Components: Metadata, SQL Parser
> Affects Versions: 1.21.0
> Environment: Enabled {{extractHeader}} in the CSV config of the dfs plugin.
> No. of drillbits: Single
> OS: Windows
> Reporter: Diksha Chaturvedi
> Priority: Major
> Labels: drill, extractHeader
>
> According to the documentation, Drill adds the col_ prefix to column names that start with a number or a special character.
> {code:java}
> /**
>  * Prefix used to replace non-alphabetic characters at the start of
>  * a column name. For example, $foo becomes col_foo. Used
>  * because SQL does not allow _foo.
>  */
> public static final String COLUMN_PREFIX = "col_";
> {code}
> But in my case, I'm getting it even for a purely alphabetic column name.
>
> I have the following data in the CSV file:
> ||PRODUCTID||PRODUCTNAME||SUPPLIERID||CATEGORYID||UNIT||PRICE||
> |1|Chais|1|1|10 boxes x 20 bags|18|
> |2|Chang|1|1|24 - 12 oz bottles|19|
> |3|Aniseed Syrup|1|2|12 - 550 ml bottles|10|
> |4|Chef Anton's Cajun Seasoning|2|2|48 - 6 oz jars|22|
> |5|Chef Anton's Gumbo Mix|2|2|36 boxes|21.35|
>
> When I query the CSV file with the following query:
> {code:sql}
> SELECT * FROM dfs.`/var/lib/PRODUCT.csv`
> {code}
> the output is
> [!https://i.stack.imgur.com/FBNmn.png|width=611,height=130!|https://i.stack.imgur.com/FBNmn.png]
>
> I know about other rules, such as:
> {{#UNITS}} is changed to {{col_UNITS}}
> {{FINANCIAL$RECORD}} is changed to {{FINANCIAL_RECORD}}
> But why is {{PRODUCTID}} changed to {{col___PRODUCTID__}}? In this case, extra underscores have been added as well.
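The rules quoted in this issue can be reproduced by a small header-cleaning function. The sketch below is hypothetical and is not Drill's implementation. It keeps letters, digits, and underscores, turns every other character into `_`, and gives the `col_` prefix to any name that does not start with a letter, absorbing a leading `_` into the prefix. Under these rules it yields `col_foo`, `col_UNITS`, and `FINANCIAL_RECORD` for the documented examples.

```java
// Hypothetical sketch of CSV header-name cleanup; not Drill's actual code.
final class HeaderNameCleaner {

  static final String COLUMN_PREFIX = "col_";

  static String clean(String header) {
    StringBuilder sb = new StringBuilder();
    for (char c : header.toCharArray()) {
      // Letters, digits and underscores survive; anything else becomes '_'.
      sb.append(Character.isLetterOrDigit(c) || c == '_' ? c : '_');
    }
    String name = sb.toString();
    if (name.isEmpty() || Character.isLetter(name.charAt(0))) {
      return name;
    }
    // A leading '_' (from a special character) merges into the prefix;
    // a leading digit gets the full prefix.
    return name.charAt(0) == '_' ? "col" + name : COLUMN_PREFIX + name;
  }
}
```

Under the same assumed rules, a header whose raw text is a byte-order mark followed by `"PRODUCTID"` in double quotes cleans to `col__PRODUCTID_`. This is consistent with the comment's suggestion that invisible characters in the header are the cause.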
[jira] [Created] (DRILL-8438) Bump YAUAA to 7.19.2
Charles Givre created DRILL-8438: Summary: Bump YAUAA to 7.19.2 Key: DRILL-8438 URL: https://issues.apache.org/jira/browse/DRILL-8438 Project: Apache Drill Issue Type: Task Components: Functions - Drill Reporter: Charles Givre Assignee: Niels Basjes Bump YAUAA to the latest version.
[jira] [Created] (DRILL-8437) Add Header Index Pagination
Charles Givre created DRILL-8437: Summary: Add Header Index Pagination Key: DRILL-8437 URL: https://issues.apache.org/jira/browse/DRILL-8437 Project: Apache Drill Issue Type: Improvement Components: Storage - HTTP Affects Versions: 1.21.1 Reporter: Charles Givre Assignee: Charles Givre Fix For: 1.22.0 Some APIs include pagination fields in the HTTP response headers. This PR adds a new pagination method called Header Index which supports that.
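The issue does not say which headers carry the pagination data. As one concrete example, the sketch below assumes an API that advertises the next page in an RFC 8288 `Link` header with `rel="next"`. Both the header and the class name are assumptions, and the Header Index paginator may be configured quite differently.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: find the next-page URL in a "Link" response header.
final class LinkHeaderPagination {

  // Matches one link-value such as: <https://api.example.com/items?page=3>; rel="next"
  private static final Pattern NEXT_LINK =
      Pattern.compile("<([^>]+)>\\s*;\\s*rel=\"?next\"?");

  static Optional<String> nextPageUrl(Map<String, List<String>> headers) {
    for (Map.Entry<String, List<String>> entry : headers.entrySet()) {
      if (!"link".equalsIgnoreCase(entry.getKey())) {
        continue;                                  // header names are case-insensitive
      }
      for (String value : entry.getValue()) {
        Matcher m = NEXT_LINK.matcher(value);
        if (m.find()) {
          return Optional.of(m.group(1));
        }
      }
    }
    return Optional.empty();                      // no next page: stop paginating
  }
}
```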
[jira] [Created] (DRILL-8434) Add Median Function
Charles Givre created DRILL-8434: Summary: Add Median Function Key: DRILL-8434 URL: https://issues.apache.org/jira/browse/DRILL-8434 Project: Apache Drill Issue Type: Improvement Components: Functions - Drill Affects Versions: 1.21.1 Reporter: Charles Givre Assignee: Charles Givre Fix For: 1.22.0 Adds a median function to Drill.
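For reference, here are median semantics in a minimal Java sketch: sort a copy of the values, then take the middle element, or the mean of the two middle elements when the count is even. This is not the aggregate function from the PR, and the empty-input behavior is an assumption.

```java
import java.util.Arrays;
import java.util.OptionalDouble;

// Minimal sketch of median semantics (hypothetical helper, not the Drill UDF).
final class Median {

  static OptionalDouble median(double[] values) {
    if (values.length == 0) {
      return OptionalDouble.empty();               // no rows: no median
    }
    double[] sorted = values.clone();              // leave the caller's array untouched
    Arrays.sort(sorted);
    int mid = sorted.length / 2;
    return OptionalDouble.of(sorted.length % 2 == 1
        ? sorted[mid]                              // odd count: middle value
        : (sorted[mid - 1] + sorted[mid]) / 2.0);  // even count: mean of the two middle values
  }
}
```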
[jira] [Created] (DRILL-8433) Add Percent Change UDF to Drill
Charles Givre created DRILL-8433: Summary: Add Percent Change UDF to Drill Key: DRILL-8433 URL: https://issues.apache.org/jira/browse/DRILL-8433 Project: Apache Drill Issue Type: Improvement Components: Functions - Drill Affects Versions: 1.21.1 Reporter: Charles Givre Assignee: Charles Givre Fix For: 1.22.0 Adds a function to calculate the percent change between two columns. Doing this without a custom function is cumbersome because you have to include a check for division by zero.
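Here is a minimal sketch of the calculation and the division-by-zero check it needs. The name, argument order, result scaling (a percentage rather than a fraction), and returning null for a zero base are all assumptions, not the UDF's documented behavior.

```java
// Hypothetical sketch of a percent-change calculation with a zero-base guard.
final class PercentChange {

  static Double percentChange(double oldValue, double newValue) {
    if (oldValue == 0.0) {
      return null;                                 // undefined: avoid dividing by zero
    }
    return (newValue - oldValue) / oldValue * 100.0;
  }
}
```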
[jira] [Created] (DRILL-8428) ElasticSearch Config Missing Getters
Charles Givre created DRILL-8428: Summary: ElasticSearch Config Missing Getters Key: DRILL-8428 URL: https://issues.apache.org/jira/browse/DRILL-8428 Project: Apache Drill Issue Type: Bug Reporter: Charles Givre Assignee: Charles Givre The ElasticSearch config was missing some getters, which prevented users from setting certain config variables. This PR adds the missing getters.
[jira] [Assigned] (DRILL-8385) Add support for disabling SSL certificate verification in the Elasticsearch plugin
[ https://issues.apache.org/jira/browse/DRILL-8385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Charles Givre reassigned DRILL-8385:
Assignee: Charles Givre

> Add support for disabling SSL certificate verification in the Elasticsearch plugin
>
> Key: DRILL-8385
> URL: https://issues.apache.org/jira/browse/DRILL-8385
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - ElasticSearch
> Affects Versions: 1.20.3
> Reporter: James Turton
> Assignee: Charles Givre
> Priority: Minor
> Fix For: Future
>
> In Calcite, provide a custom TrustManager that trusts every certificate to the ES RestClient builder in ElasticsearchSchemaFactory if a corresponding config option has been set by application code.
> In Drill, add a config option to the ES plugin allowing certificate verification to be toggled and pass it through to the Calcite option mentioned above.
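The custom TrustManager described above can be sketched with standard JSSE classes. The class name is hypothetical. Because this disables server certificate verification entirely, it should only ever be wired to an explicit opt-in option, as the issue proposes.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.security.cert.X509Certificate;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

// Sketch of a "trust every certificate" SSL setup (hypothetical class name).
final class TrustAllSsl {

  static final X509TrustManager TRUST_ALL = new X509TrustManager() {
    @Override
    public void checkClientTrusted(X509Certificate[] chain, String authType) {
      // Accept any client certificate.
    }

    @Override
    public void checkServerTrusted(X509Certificate[] chain, String authType) {
      // Accept any server certificate: no verification is performed.
    }

    @Override
    public X509Certificate[] getAcceptedIssuers() {
      return new X509Certificate[0];
    }
  };

  static SSLContext trustAllContext() {
    try {
      SSLContext context = SSLContext.getInstance("TLS");
      context.init(null, new TrustManager[] {TRUST_ALL}, new SecureRandom());
      return context;
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException("Could not create SSL context", e);
    }
  }
}
```

With the Elasticsearch low-level REST client, a context like this is typically supplied through `RestClientBuilder.setHttpClientConfigCallback`, by calling `setSSLContext` on the underlying `HttpAsyncClientBuilder`. Hostname verification is a separate check and would need its own relaxed setting.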
[jira] [Closed] (DRILL-4223) PIVOT and UNPIVOT to rotate table valued expressions
[ https://issues.apache.org/jira/browse/DRILL-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Charles Givre closed DRILL-4223.

> PIVOT and UNPIVOT to rotate table valued expressions
>
> Key: DRILL-4223
> URL: https://issues.apache.org/jira/browse/DRILL-4223
> Project: Apache Drill
> Issue Type: New Feature
> Components: Execution - Codegen, SQL Parser
> Reporter: Ashwin Aravind
> Priority: Major
> Fix For: 1.21.0
>
> Capability to PIVOT and UNPIVOT table-valued expressions which are the results of a SELECT query
[jira] [Updated] (DRILL-4223) PIVOT and UNPIVOT to rotate table valued expressions
[ https://issues.apache.org/jira/browse/DRILL-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Givre updated DRILL-4223: Fix Version/s: 1.21.0
[jira] [Resolved] (DRILL-4223) PIVOT and UNPIVOT to rotate table valued expressions
[ https://issues.apache.org/jira/browse/DRILL-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Givre resolved DRILL-4223. Resolution: Fixed. Added in Drill 1.21.
[jira] [Updated] (DRILL-8417) Allow Excel Reader to Ignore Formula Errors
[ https://issues.apache.org/jira/browse/DRILL-8417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Charles Givre updated DRILL-8417:
Description: If Drill encounters an invalid Excel formula, such as one that results in a DIV/0 error, it is unable to proceed and throws a number format exception. This PR adds a config parameter called ignoreErrors which, when set to true, allows Drill to skip such errors and return null for the affected cell. Drill will also log a warning. When set to false, the original behavior is retained.

> Allow Excel Reader to Ignore Formula Errors
>
> Key: DRILL-8417
> URL: https://issues.apache.org/jira/browse/DRILL-8417
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - Excel
> Affects Versions: 1.21.0
> Reporter: Charles Givre
> Assignee: Charles Givre
> Priority: Major
> Fix For: 1.21.1
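The ignoreErrors behavior can be sketched as follows. To keep the example self-contained, a cell is modeled as its text, so an Excel error such as `#DIV/0!` simply fails to parse as a number. This string-based model, the class name, and the log message are assumptions, not the plugin's code.

```java
import java.util.logging.Logger;

// Hypothetical sketch of ignoreErrors: an unparseable cell becomes null plus a
// logged warning instead of failing the query.
final class ExcelCellParser {

  private static final Logger LOG = Logger.getLogger(ExcelCellParser.class.getName());

  static Double numericValue(String cellText, boolean ignoreErrors) {
    try {
      return Double.valueOf(cellText);
    } catch (NumberFormatException e) {
      if (!ignoreErrors) {
        throw e;                                   // original behavior: fail the query
      }
      LOG.warning("Ignoring unparseable cell value: " + cellText);
      return null;                                 // ignoreErrors=true: null for this cell
    }
  }
}
```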
[jira] [Created] (DRILL-8417) Allow Excel Reader to Ignore Formula Errors
Charles Givre created DRILL-8417: Summary: Allow Excel Reader to Ignore Formula Errors Key: DRILL-8417 URL: https://issues.apache.org/jira/browse/DRILL-8417 Project: Apache Drill Issue Type: Improvement Components: Storage - Excel Affects Versions: 1.21.0 Reporter: Charles Givre Assignee: Charles Givre Fix For: 1.21.1
[jira] [Created] (DRILL-8414) Index Paginator Not Working When Provided URL
Charles Givre created DRILL-8414: Summary: Index Paginator Not Working When Provided URL Key: DRILL-8414 URL: https://issues.apache.org/jira/browse/DRILL-8414 Project: Apache Drill Issue Type: Bug Components: Storage - HTTP Affects Versions: 1.21.0 Reporter: Charles Givre Assignee: Charles Givre Fix For: 1.21.1 The index paginator offers two options: one where the API returns an index or offset, and one where it returns a URL. The second was not fully implemented. This PR also adds support for the case where the API returns a path rather than a full URL; in that case, the returned path replaces the existing path segments.
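If the API returns either a full URL or an absolute path, `java.net.URI.resolve` already provides the behavior described above: an absolute path replaces the base URL's path and query while keeping its scheme and host. The class name is hypothetical.

```java
import java.net.URI;

// Hypothetical sketch: turn the paginator's returned value into the next request URI.
final class NextPageResolver {

  static URI nextPage(String currentUrl, String returnedValue) {
    URI base = URI.create(currentUrl);
    URI next = URI.create(returnedValue);
    // A full URL is used as-is; an absolute path replaces the base's path and query.
    return next.isAbsolute() ? next : base.resolve(next);
  }
}
```

A relative path without a leading `/` (for example `items?offset=100`) would instead be resolved against the base URL's directory, so a real implementation may need to decide how to treat that case.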
[jira] [Created] (DRILL-8413) Add DNS Lookup Functions
Charles Givre created DRILL-8413: Summary: Add DNS Lookup Functions Key: DRILL-8413 URL: https://issues.apache.org/jira/browse/DRILL-8413 Project: Apache Drill Issue Type: New Feature Components: Functions - Drill Affects Versions: 1.21.0 Reporter: Charles Givre Assignee: Charles Givre Fix For: 1.22 This PR adds additional DNS lookup functions to Drill:
[jira] [Created] (DRILL-8411) GoogleSheets Reader Will Not Read More than 1K Rows
Charles Givre created DRILL-8411: Summary: GoogleSheets Reader Will Not Read More than 1K Rows Key: DRILL-8411 URL: https://issues.apache.org/jira/browse/DRILL-8411 Project: Apache Drill Issue Type: Bug Components: Storage - GoogleSheets Affects Versions: 1.21.0 Reporter: Charles Givre Assignee: Charles Givre Fix For: 1.21.1 The GoogleSheets reader hits the GoogleSheets SDK's batch limit of 1000 rows and stops. This PR fixes that. It also fixes a minor but annoying issue whereby the GoogleSheets reader infers that a column is a date/time but is then unable to parse it because the values are in a non-standard format.
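One way to avoid stopping at the first 1,000 rows is to request successive A1-notation ranges until a short batch comes back. In this sketch, the `fetch` function stands in for a Sheets API values request. Its signature, the class name, and the stop condition are assumptions, not the fix in the PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of batched sheet reading with A1-notation ranges.
final class SheetBatchReader {

  /** Builds a range such as "Sheet1!A1001:Z2000" for the given zero-based batch. */
  static String rangeFor(String tab, String lastColumn, int batchIndex, int batchSize) {
    int firstRow = batchIndex * batchSize + 1;     // A1 notation is 1-based
    int lastRow = firstRow + batchSize - 1;
    return tab + "!A" + firstRow + ":" + lastColumn + lastRow;
  }

  static List<List<Object>> readAll(String tab, String lastColumn, int batchSize,
                                    Function<String, List<List<Object>>> fetch) {
    List<List<Object>> rows = new ArrayList<>();
    for (int batch = 0; ; batch++) {
      List<List<Object>> page = fetch.apply(rangeFor(tab, lastColumn, batch, batchSize));
      rows.addAll(page);
      if (page.size() < batchSize) {
        return rows;                               // a short batch means the data has ended
      }
    }
  }
}
```

If the row count is an exact multiple of the batch size, the loop makes one extra request that returns no rows, and that empty batch ends the loop.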
[jira] [Created] (DRILL-8408) Allow Implicit Casts on Join
Charles Givre created DRILL-8408: Summary: Allow Implicit Casts on Join Key: DRILL-8408 URL: https://issues.apache.org/jira/browse/DRILL-8408 Project: Apache Drill Issue Type: Improvement Components: Execution - Data Types Affects Versions: 1.21.0 Reporter: Charles Givre Assignee: Charles Givre Fix For: 1.21.1 Currently, Drill does not allow implicit casts on joins. With DRILL-8136, implicit casting has been significantly improved, so it may now make sense to allow them.
[jira] [Created] (DRILL-8407) Add Support for SFTP File Systems
Charles Givre created DRILL-8407: Summary: Add Support for SFTP File Systems Key: DRILL-8407 URL: https://issues.apache.org/jira/browse/DRILL-8407 Project: Apache Drill Issue Type: Improvement Components: Storage - File Affects Versions: 1.20.0 Reporter: Charles Givre Assignee: Charles Givre Fix For: Future Add support for SFTP File Systems.
[jira] [Created] (DRILL-8402) Add REGEXP_EXTRACT Function
Charles Givre created DRILL-8402: Summary: Add REGEXP_EXTRACT Function Key: DRILL-8402 URL: https://issues.apache.org/jira/browse/DRILL-8402 Project: Apache Drill Issue Type: Improvement Components: Functions - Drill Affects Versions: 1.21.0 Reporter: Charles Givre Assignee: Charles Givre Fix For: 1.21.0 This PR adds two UDFs to Drill: regexp_extract(, ), which returns an array of the strings captured by the capturing groups in the regex, and regexp_extract(, , ), which returns the text captured by a specific capturing group.
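Both behaviors map directly onto `java.util.regex`. The sketch below uses only the first match. Returning an empty list or null when nothing matches is an assumption, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the two regexp_extract variants; not the Drill UDFs.
final class RegexpExtract {

  /** All capturing groups of the first match, or an empty list if nothing matches. */
  static List<String> extract(String input, String regex) {
    Matcher m = Pattern.compile(regex).matcher(input);
    List<String> groups = new ArrayList<>();
    if (m.find()) {
      for (int i = 1; i <= m.groupCount(); i++) {   // group 0 is the whole match; skip it
        groups.add(m.group(i));
      }
    }
    return groups;
  }

  /** The text captured by one group of the first match, or null if nothing matches. */
  static String extract(String input, String regex, int group) {
    Matcher m = Pattern.compile(regex).matcher(input);
    return m.find() ? m.group(group) : null;
  }
}
```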
[jira] [Created] (DRILL-8399) MS Access Reader Misinterprets Data Types
Charles Givre created DRILL-8399: Summary: MS Access Reader Misinterprets Data Types Key: DRILL-8399 URL: https://issues.apache.org/jira/browse/DRILL-8399 Project: Apache Drill Issue Type: Bug Components: Format - MS Access Affects Versions: 1.21.0 Reporter: Charles Givre Assignee: Charles Givre Fix For: 1.21.0 The MS Access reader was assigning certain data types incorrectly, resulting in various errors. This minor PR fixes that.
[jira] [Created] (DRILL-8395) Add Support for INSERT and Drop Table to GoogleSheets Plugin
Charles Givre created DRILL-8395: Summary: Add Support for INSERT and Drop Table to GoogleSheets Plugin Key: DRILL-8395 URL: https://issues.apache.org/jira/browse/DRILL-8395 Project: Apache Drill Issue Type: Improvement Components: Storage - GoogleSheets Affects Versions: 1.20.3 Reporter: Charles Givre Assignee: Charles Givre Fix For: 1.21.0 This PR adds support for INSERT queries, which allow a user to append data to an existing GoogleSheets tab. It also:
* Adds support for DROP TABLE queries, which were previously not implemented.
* Modifies CTAS queries so that if a user executes a CTAS query with a file token, Drill adds a new tab to an existing document, but if the user executes a CTAS query with a file name, Drill creates an entirely new document.
[jira] [Created] (DRILL-8392) Empty Tables Causes Index Out of Bounds Exception on PDF Reader
Charles Givre created DRILL-8392: Summary: Empty Tables Causes Index Out of Bounds Exception on PDF Reader Key: DRILL-8392 URL: https://issues.apache.org/jira/browse/DRILL-8392 Project: Apache Drill Issue Type: Bug Components: Format - PDF Affects Versions: 1.20.3 Reporter: Charles Givre Assignee: Charles Givre Fix For: 1.21.0
[jira] [Created] (DRILL-8390) Minor Improvements to PDF Reader
Charles Givre created DRILL-8390: Summary: Minor Improvements to PDF Reader Key: DRILL-8390 URL: https://issues.apache.org/jira/browse/DRILL-8390 Project: Apache Drill Issue Type: Improvement Components: Format - PDF Reporter: Charles Givre Assignee: Charles Givre This PR makes some minor improvements to the PDF reader, including:
* Fixes a minor bug where, in certain configurations, the first row of data was skipped.
* Fixes a minor bug where empty tables caused crashes when the spreadsheet extraction algorithm was used.
* Adds a table_count metadata field.
* Adds a table_index metadata field to reflect the current table.
[jira] [Created] (DRILL-8387) Add Support for User Translation to ElasticSearch Plugin
Charles Givre created DRILL-8387: Summary: Add Support for User Translation to ElasticSearch Plugin Key: DRILL-8387 URL: https://issues.apache.org/jira/browse/DRILL-8387 Project: Apache Drill Issue Type: Improvement Components: Storage - ElasticSearch Affects Versions: 1.20.3 Reporter: Charles Givre Assignee: Charles Givre Fix For: 1.21.0 Add support for user translation to ElasticSearch.
[jira] [Created] (DRILL-8386) Add Support for User Translation for Cassandra
Charles Givre created DRILL-8386: Summary: Add Support for User Translation for Cassandra Key: DRILL-8386 URL: https://issues.apache.org/jira/browse/DRILL-8386 Project: Apache Drill Issue Type: Improvement Components: Storage - Cassandra Affects Versions: 1.20.3 Reporter: Charles Givre Assignee: Charles Givre Fix For: 1.21.0 Adds support for user translation to the Cassandra plugin.
[jira] [Created] (DRILL-8384) Add Format Plugin for Microsoft Access
Charles Givre created DRILL-8384: Summary: Add Format Plugin for Microsoft Access Key: DRILL-8384 URL: https://issues.apache.org/jira/browse/DRILL-8384 Project: Apache Drill Issue Type: Improvement Components: Format - MS Access Affects Versions: 1.21.0 Reporter: Charles Givre Assignee: Charles Givre Fix For: 1.21.0 Shockingly, MS Access is still in widespread use. This plugin enables Drill to read MS Access files.
[jira] [Commented] (DRILL-5033) Query on JSON that has null as value for each key
[ https://issues.apache.org/jira/browse/DRILL-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17652916#comment-17652916 ] Charles Givre commented on DRILL-5033: -- [https://github.com/apache/drill/pull/2731] > Query on JSON that has null as value for each key > - > > Key: DRILL-5033 > URL: https://issues.apache.org/jira/browse/DRILL-5033 > Project: Apache Drill > Issue Type: Bug > Components: Storage - JSON >Affects Versions: 1.9.0 >Reporter: Khurram Faraaz >Priority: Major > > Drill 1.9.0 git commit ID : 83513daf > Drill returns same result with or without `store.json.all_text_mode`=true > Note that each key in the JSON has null as its value. > [root@cent01 null_eq_joins]# cat right_all_nulls.json > { > "intKey" : null, > "bgintKey": null, > "strKey": null, > "boolKey": null, > "fltKey": null, > "dblKey": null, > "timKey": null, > "dtKey": null, > "tmstmpKey": null, > "intrvldyKey": null, > "intrvlyrKey": null > } > [root@cent01 null_eq_joins]# > Querying the above JSON file results in null as query result. > - We should see each of the keys in the JSON as a column in query result. > - And in each column the value should be a null value. > Current behavior does not look right. > {noformat} > 0: jdbc:drill:schema=dfs.tmp> select * from `right_all_nulls.json`; > +---+ > | * | > +---+ > | null | > +---+ > 1 row selected (0.313 seconds) > {noformat} > Adding comment from [~julianhyde] > IMHO it is similar but not the same as DRILL-1256. Worth logging an issue and > let [~jnadeau] (or someone) put on the record what should be the behavior of > an empty record (empty JSON map) when it is top-level (as in this case) or in > a collection. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (DRILL-8376) Add Distribution UDFs
Charles Givre created DRILL-8376: Summary: Add Distribution UDFs Key: DRILL-8376 URL: https://issues.apache.org/jira/browse/DRILL-8376 Project: Apache Drill Issue Type: Improvement Components: Functions - Drill Affects Versions: 1.21 Reporter: Charles Givre Assignee: Charles Givre Add the `width_bucket`, `pearson_correlation`, and `kendall_correlation` functions to Drill. -- This message was sent by Atlassian Jira (v8.20.10#820010)
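A minimal sketch of what the three functions compute, in plain Java over double arrays. It assumes PostgreSQL-style `width_bucket(value, low, high, count)` semantics and uses Kendall's tau-a (tied pairs count as neither concordant nor discordant). The ticket does not say which variants Drill implements, and the class and method names are hypothetical, not Drill's UDF API.

```java
/** Hypothetical reference sketch of the three statistics; not Drill's UDF code. */
final class DistributionSketch {

    /**
     * Assumed PostgreSQL-style semantics: values below low go to bucket 0,
     * values at or above high go to bucket count + 1, and everything else
     * falls into one of count equal-width buckets numbered from 1.
     */
    static int widthBucket(double value, double low, double high, int count) {
        if (count <= 0 || !(low < high)) {
            throw new IllegalArgumentException("need count > 0 and low < high");
        }
        if (value < low) {
            return 0;
        }
        if (value >= high) {
            return count + 1;
        }
        int bucket = (int) Math.floor((value - low) / (high - low) * count) + 1;
        return Math.min(bucket, count); // guard against rounding just below high
    }

    /** Pearson's r: covariance divided by the product of the standard deviations. */
    static double pearson(double[] x, double[] y) {
        requireSameLength(x, y);
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double cov = 0, varX = 0, varY = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        return cov / Math.sqrt(varX * varY); // NaN if either input is constant
    }

    /** Kendall's tau-a: (concordant - discordant) / (n choose 2). */
    static double kendallTauA(double[] x, double[] y) {
        requireSameLength(x, y);
        int n = x.length;
        long concordant = 0, discordant = 0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double sign = Math.signum(x[i] - x[j]) * Math.signum(y[i] - y[j]);
                if (sign > 0) {
                    concordant++;
                } else if (sign < 0) {
                    discordant++;
                }
            }
        }
        long pairs = (long) n * (n - 1) / 2;
        return (double) (concordant - discordant) / pairs;
    }

    private static void requireSameLength(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two arrays of equal length >= 2");
        }
    }
}
```

Pearson's r is undefined (NaN here) when either column has zero variance; a real UDF has to decide how to report that case.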
[jira] [Closed] (DRILL-7554) Convert LTSV Format Plugin to EVF2
[ https://issues.apache.org/jira/browse/DRILL-7554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Givre closed DRILL-7554. Resolution: Duplicate > Convert LTSV Format Plugin to EVF2 > -- > > Key: DRILL-7554 > URL: https://issues.apache.org/jira/browse/DRILL-7554 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Text CSV >Affects Versions: 1.17.0 >Reporter: Charles Givre >Assignee: Charles Givre >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (DRILL-8179) Convert LTSV Format Plugin to EVF2
[ https://issues.apache.org/jira/browse/DRILL-8179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Givre reassigned DRILL-8179: Assignee: Charles Givre > Convert LTSV Format Plugin to EVF2 > -- > > Key: DRILL-8179 > URL: https://issues.apache.org/jira/browse/DRILL-8179 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.20.1 >Reporter: Jingchuan Hu >Assignee: Charles Givre >Priority: Major > Fix For: 2.0.0 > > > Authorized by Charles to continue the conversion of the LTSV plugin directly to EVF2. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (DRILL-8198) XML EVF2 reader provideSchema usage
[ https://issues.apache.org/jira/browse/DRILL-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Givre resolved DRILL-8198. -- Resolution: Fixed > XML EVF2 reader provideSchema usage > --- > > Key: DRILL-8198 > URL: https://issues.apache.org/jira/browse/DRILL-8198 > Project: Apache Drill > Issue Type: Sub-task > Components: Storage - XML >Affects Versions: 1.20.0 >Reporter: Vitalii Diravka >Assignee: Vitalii Diravka >Priority: Major > Fix For: 2.0.0 > > > XMLBatchReader has been converted to an EVF2 reader, but it does not use provideSchema for > the Schema Provisioning feature. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (DRILL-7554) Convert LTSV Format Plugin to EVF2
[ https://issues.apache.org/jira/browse/DRILL-7554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Givre updated DRILL-7554: - Summary: Convert LTSV Format Plugin to EVF2 (was: Convert LTSV Format Plugin to EVF) > Convert LTSV Format Plugin to EVF2 > -- > > Key: DRILL-7554 > URL: https://issues.apache.org/jira/browse/DRILL-7554 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Text CSV >Affects Versions: 1.17.0 >Reporter: Charles Givre >Assignee: Charles Givre >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (DRILL-8371) Add Write/Append Capability to Splunk Plugin
Charles Givre created DRILL-8371: Summary: Add Write/Append Capability to Splunk Plugin Key: DRILL-8371 URL: https://issues.apache.org/jira/browse/DRILL-8371 Project: Apache Drill Issue Type: Improvement Components: Storage - Splunk Affects Versions: 1.20.2 Reporter: Charles Givre Assignee: Charles Givre Fix For: 2.0.0 While Drill can currently read from Splunk indexes, it cannot write to them or create them. This proposed PR adds support for CTAS queries for Splunk as well as INSERT and DROP TABLE. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (DRILL-8365) HTTP Plugin Places Parameters in Wrong Place
Charles Givre created DRILL-8365: Summary: HTTP Plugin Places Parameters in Wrong Place Key: DRILL-8365 URL: https://issues.apache.org/jira/browse/DRILL-8365 Project: Apache Drill Issue Type: Bug Components: Storage - HTTP Affects Versions: 1.20.2 Reporter: Charles Givre Assignee: Charles Givre Fix For: 1.20.3 When the requireTail option is set to true, and pagination is enabled, the HTTP plugin puts the required parameters in the wrong place in the URL. This PR fixes that. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (DRILL-8364) Add Support for OAuth Enabled File Systems
Charles Givre created DRILL-8364: Summary: Add Support for OAuth Enabled File Systems Key: DRILL-8364 URL: https://issues.apache.org/jira/browse/DRILL-8364 Project: Apache Drill Issue Type: Improvement Components: Storage - File Affects Versions: 1.20.2 Reporter: Charles Givre Assignee: Charles Givre Fix For: 2.0.0 Currently, Drill supports reading from file systems such as HDFS, S3, and others that use token-based authentication. This PR extends Drill's plugin architecture so that Drill can connect to other file systems that use OAuth 2.0 for authentication. This PR also adds support for querying Box. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (DRILL-8360) Add Provided Schema for XML Reader
Charles Givre created DRILL-8360: Summary: Add Provided Schema for XML Reader Key: DRILL-8360 URL: https://issues.apache.org/jira/browse/DRILL-8360 Project: Apache Drill Issue Type: Improvement Components: Format - XML Affects Versions: 1.20.2 Reporter: Charles Givre Assignee: Charles Givre Fix For: 2.0.0 The XML reader does not support schema provisioning. This PR adds that support. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (DRILL-8356) Add File Name to GoogleSheets Plugin
Charles Givre created DRILL-8356: Summary: Add File Name to GoogleSheets Plugin Key: DRILL-8356 URL: https://issues.apache.org/jira/browse/DRILL-8356 Project: Apache Drill Issue Type: Improvement Components: Storage - GoogleSheets Affects Versions: 2.0.0 Reporter: Charles Givre Assignee: Charles Givre Fix For: 2.0.0 GoogleSheets uses tokens to identify individual files. These tokens are not human-readable, which makes it difficult for a user to know which file they are accessing. This PR adds a metadata field called `_title` which identifies the document being queried. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (DRILL-8354) Add IS_EMPTY Function.
Charles Givre created DRILL-8354: Summary: Add IS_EMPTY Function. Key: DRILL-8354 URL: https://issues.apache.org/jira/browse/DRILL-8354 Project: Apache Drill Issue Type: Improvement Components: Functions - Drill Affects Versions: 1.20.2 Reporter: Charles Givre Assignee: Charles Givre Fix For: 2.0.0 When analyzing data, there is currently no single function to evaluate whether a given field is empty. For scalar fields, this can be accomplished with the `IS NOT NULL` operator, but complex fields are more challenging because they are never null. This PR adds a UDF called IS_EMPTY() which accepts any type of field and returns true if the field does not contain data. For scalar fields, it returns true if the field is `null`. Complex fields can never be `null`: for lists, the function returns true if the list is empty, and for maps, it returns true if all of the map's fields are unpopulated. -- This message was sent by Atlassian Jira (v8.20.10#820010)
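The IS_EMPTY semantics can be modeled with ordinary Java objects, where `null` stands in for SQL NULL, a `List` for a Drill array and a `Map` for a Drill map. This is an illustrative sketch only: Drill UDFs operate on value vectors rather than Java collections, the class name is hypothetical, and applying the map rule recursively to nested fields is an assumption.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical model of IS_EMPTY semantics; not Drill's vector-based implementation. */
final class IsEmptySketch {

    static boolean isEmpty(Object value) {
        if (value == null) {
            return true; // scalar field that is NULL
        }
        if (value instanceof List) {
            return ((List<?>) value).isEmpty(); // list: empty means no elements
        }
        if (value instanceof Map) {
            // map: empty when every field is unpopulated (assumed to apply recursively)
            for (Object field : ((Map<?, ?>) value).values()) {
                if (!isEmpty(field)) {
                    return false;
                }
            }
            return true;
        }
        return false; // any non-null scalar holds data
    }
}
```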
[jira] [Created] (DRILL-8350) Convert PCAP Format Plugin to EVF2
Charles Givre created DRILL-8350: Summary: Convert PCAP Format Plugin to EVF2 Key: DRILL-8350 URL: https://issues.apache.org/jira/browse/DRILL-8350 Project: Apache Drill Issue Type: Task Components: Format - PCAP Affects Versions: 1.20.2 Reporter: Charles Givre Assignee: Charles Givre Fix For: 2.0.0 Convert the PCAP format plugin to EVF2 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (DRILL-8349) GoogleSheets Not Registering Schemas with Non Default Name
Charles Givre created DRILL-8349: Summary: GoogleSheets Not Registering Schemas with Non Default Name Key: DRILL-8349 URL: https://issues.apache.org/jira/browse/DRILL-8349 Project: Apache Drill Issue Type: Bug Components: Storage - GoogleSheets Affects Versions: 2.0.0 Reporter: Charles Givre Assignee: Charles Givre Fix For: 2.0.0 GoogleSheets plugin fails to register plugin instances with names other than `GoogleSheets`. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (DRILL-8342) Add Automatic Retry for Rate Limited APIs
Charles Givre created DRILL-8342: Summary: Add Automatic Retry for Rate Limited APIs Key: DRILL-8342 URL: https://issues.apache.org/jira/browse/DRILL-8342 Project: Apache Drill Issue Type: Improvement Components: Storage - HTTP Affects Versions: 1.20.2 Reporter: Charles Givre Assignee: Charles Givre Fix For: 2.0.0 Many APIs have a burst limit on the number of requests. This PR adds a retry capability to the HTTP Storage Plugin: if a 429 response code is received, Drill will wait a configurable amount of time and retry the request. To prevent runaway pagination, the retry happens only once per request. This PR adds a new configuration option called retryDelay, which is the number of milliseconds that Drill should wait before retrying. -- This message was sent by Atlassian Jira (v8.20.10#820010)
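The retry rule can be sketched independently of any HTTP client by modeling a request as an `IntSupplier` that returns a status code. The class and method names are hypothetical; only the behavior (wait `retryDelay` milliseconds after a 429, then retry exactly once) comes from the ticket.

```java
import java.util.function.IntSupplier;
import java.util.function.LongConsumer;

/** Hypothetical sketch of "retry once after retryDelay on HTTP 429"; not Drill's code. */
final class RetryOn429Sketch {
    static final int TOO_MANY_REQUESTS = 429;

    /** Sends the request; on a 429, waits retryDelayMs and retries exactly once. */
    static int sendWithSingleRetry(IntSupplier sendRequest, long retryDelayMs, LongConsumer sleeper) {
        int status = sendRequest.getAsInt();
        if (status == TOO_MANY_REQUESTS) {
            sleeper.accept(retryDelayMs);
            // A second 429 is returned to the caller rather than retried again.
            status = sendRequest.getAsInt();
        }
        return status;
    }

    /** Default sleeper: blocks the thread and restores the interrupt flag if interrupted. */
    static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Injecting the wait as a `LongConsumer` keeps the logic testable without real delays; in normal use it would be `RetryOn429Sketch::sleep`.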
[jira] [Created] (DRILL-8341) Add Scanned Plugin List to Sys Profiles Table
Charles Givre created DRILL-8341: Summary: Add Scanned Plugin List to Sys Profiles Table Key: DRILL-8341 URL: https://issues.apache.org/jira/browse/DRILL-8341 Project: Apache Drill Issue Type: Improvement Components: Execution - Monitoring Affects Versions: 1.20.2 Reporter: Charles Givre Assignee: Charles Givre Fix For: 2.0.0 In DRILL-8322, [~dzamo] added the list of scanned plugins to the query profiles. This information is extremely useful in query analysis. This minor PR adds this same information to the sys.profiles table. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (DRILL-8340) Add Additional Date Manipulation Functions (Part 1)
Charles Givre created DRILL-8340: Summary: Add Additional Date Manipulation Functions (Part 1) Key: DRILL-8340 URL: https://issues.apache.org/jira/browse/DRILL-8340 Project: Apache Drill Issue Type: Improvement Components: Functions - Drill Affects Versions: 1.20.2 Reporter: Charles Givre Assignee: Charles Givre Fix For: 2.0.0 This PR adds several utility functions to facilitate working with dates and times. These are modeled after the date/time functionality in MySQL. Specifically, this adds: * YEARWEEK(): Returns the year and week as an integer, e.g. 202002. * TIME_STAMP(): Converts almost anything that looks like a date string into a timestamp. -- This message was sent by Atlassian Jira (v8.20.10#820010)
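One way to produce a `202002`-style year-week value in Java is with `java.time.temporal.IsoFields`. This sketch assumes ISO-8601 week numbering (weeks start on Monday; week 1 contains the year's first Thursday). MySQL's `YEARWEEK()` uses a Sunday-based week mode by default, so results near year boundaries may differ from Drill's implementation; the class name is hypothetical.

```java
import java.time.LocalDate;
import java.time.temporal.IsoFields;

/** Hypothetical YEARWEEK sketch using ISO-8601 weeks; illustrates the output format only. */
final class YearWeekSketch {

    static int yearWeek(LocalDate date) {
        // The ISO week-based year can differ from the calendar year around January 1st.
        int weekYear = date.get(IsoFields.WEEK_BASED_YEAR);
        int week = date.get(IsoFields.WEEK_OF_WEEK_BASED_YEAR);
        return weekYear * 100 + week; // year 2020, week 2 -> 202002
    }
}
```

For example, 30 December 2019 falls in ISO week 1 of 2020, so this sketch returns 202001 for it.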
[jira] [Closed] (DRILL-8328) HTTP UDF Not Resolving Storage Aliases
[ https://issues.apache.org/jira/browse/DRILL-8328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Givre closed DRILL-8328. > HTTP UDF Not Resolving Storage Aliases > -- > > Key: DRILL-8328 > URL: https://issues.apache.org/jira/browse/DRILL-8328 > Project: Apache Drill > Issue Type: Bug > Components: Storage - HTTP >Affects Versions: 1.20.0 >Reporter: Charles Givre >Assignee: Charles Givre >Priority: Blocker > Fix For: 1.20.3 > > > The http_request function currently does not resolve plugin aliases > correctly. This PR fixes that issue. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (DRILL-8335) Add Ability to Query GoogleSheets Tabs by Index
Charles Givre created DRILL-8335: Summary: Add Ability to Query GoogleSheets Tabs by Index Key: DRILL-8335 URL: https://issues.apache.org/jira/browse/DRILL-8335 Project: Apache Drill Issue Type: Improvement Components: Storage - GoogleSheets Affects Versions: 1.20.2 Reporter: Charles Givre Assignee: Charles Givre Fix For: 2.0.0 The GoogleSheets plugin does not provide a way for users to query data if they do not know the available tab names. This PR adds the ability to query tabs by their index. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (DRILL-8333) Fix Resource Leaks in HTTP Plugin
Charles Givre created DRILL-8333: Summary: Fix Resource Leaks in HTTP Plugin Key: DRILL-8333 URL: https://issues.apache.org/jira/browse/DRILL-8333 Project: Apache Drill Issue Type: Bug Components: Storage - HTTP Affects Versions: 1.20.2 Reporter: Charles Givre Assignee: Charles Givre Fix For: 1.20.3 The HTTP plugin has several methods which obtain a `ResponseBody` object but do not close it. This causes a resource leak, which will cause Drill to fail when queries make many API calls. -- This message was sent by Atlassian Jira (v8.20.10#820010)
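In Java the usual fix for this kind of leak is try-with-resources. OkHttp's `ResponseBody` implements `Closeable`, so it can be declared in a `try (...)` header. The sketch below uses a hypothetical stand-in class that only counts open bodies, so the difference between the leaky and the fixed pattern is observable without a network; it is not the code changed in the PR.

```java
import java.io.Closeable;

/** Hypothetical stand-in for an HTTP response body that tracks how many are still open. */
final class ResponseBodySketch implements Closeable {
    static int openBodies = 0;

    private final String payload;
    private boolean closed = false;

    ResponseBodySketch(String payload) {
        this.payload = payload;
        openBodies++;
    }

    String string() {
        return payload;
    }

    @Override
    public void close() {
        if (!closed) { // make close() idempotent
            closed = true;
            openBodies--;
        }
    }

    /** Leaky pattern: the body is read but never closed. */
    static String readLeaky(String payload) {
        ResponseBodySketch body = new ResponseBodySketch(payload);
        return body.string();
    }

    /** Fixed pattern: try-with-resources closes the body even if reading throws. */
    static String readSafely(String payload) {
        try (ResponseBodySketch body = new ResponseBodySketch(payload)) {
            return body.string();
        }
    }
}
```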
[jira] [Created] (DRILL-8330) Convert ESRI Shape File Reader to EVF2
Charles Givre created DRILL-8330: Summary: Convert ESRI Shape File Reader to EVF2 Key: DRILL-8330 URL: https://issues.apache.org/jira/browse/DRILL-8330 Project: Apache Drill Issue Type: Task Components: Format - ESRI Affects Versions: 1.20.2 Reporter: Charles Givre Assignee: Charles Givre Fix For: 2.0.0 Converts the ESRI Shape File reader to EVF V2. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (DRILL-8329) Close HTTP Caching Resources
Charles Givre created DRILL-8329: Summary: Close HTTP Caching Resources Key: DRILL-8329 URL: https://issues.apache.org/jira/browse/DRILL-8329 Project: Apache Drill Issue Type: Bug Components: Storage - HTTP Affects Versions: 1.20.2 Reporter: Charles Givre Assignee: Charles Givre Fix For: 1.20.3 The HTTP plugin has the ability to cache API responses. However, the storage plugin was not closing the connection to the file cache. This minor PR fixes that. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (DRILL-8328) HTTP UDF Not Resolving Storage Aliases
Charles Givre created DRILL-8328: Summary: HTTP UDF Not Resolving Storage Aliases Key: DRILL-8328 URL: https://issues.apache.org/jira/browse/DRILL-8328 Project: Apache Drill Issue Type: Bug Components: Storage - HTTP Affects Versions: 1.20.0 Reporter: Charles Givre Assignee: Charles Givre Fix For: 1.20.3 The http_request function currently does not resolve plugin aliases correctly. This PR fixes that issue. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (DRILL-8327) GoogleSheets not Reporting Schemata to Info_Schema
Charles Givre created DRILL-8327: Summary: GoogleSheets not Reporting Schemata to Info_Schema Key: DRILL-8327 URL: https://issues.apache.org/jira/browse/DRILL-8327 Project: Apache Drill Issue Type: Bug Components: Storage - GoogleSheets Affects Versions: 2.0.0 Reporter: Charles Givre Assignee: Charles Givre Fix For: 2.0.0 The GoogleSheets (GS) plugin was not reporting the available documents to the info schema. This PR makes some modifications so that users can determine which documents are available via the information schema. The GS plugin does not report the tabs as tables to the information schema because that can cause Drill to exceed Google's rate quota. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (DRILL-8325) Convert PDF Format Plugin to EVF V2
Charles Givre created DRILL-8325: Summary: Convert PDF Format Plugin to EVF V2 Key: DRILL-8325 URL: https://issues.apache.org/jira/browse/DRILL-8325 Project: Apache Drill Issue Type: Task Components: Format - PDF Affects Versions: 1.20.2 Reporter: Charles Givre Assignee: Charles Givre Fix For: 2.0.0 Converts the PDF Format Reader to EVF V2. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (DRILL-8320) Prevent Infinite Pagination for Index Paginator
Charles Givre created DRILL-8320: Summary: Prevent Infinite Pagination for Index Paginator Key: DRILL-8320 URL: https://issues.apache.org/jira/browse/DRILL-8320 Project: Apache Drill Issue Type: Bug Components: Storage - HTTP Affects Versions: 1.20.2 Reporter: Charles Givre Assignee: Charles Givre Fix For: 2.0.0 In some cases that use keyset/index pagination, if the API does not have a boolean column that indicates when to stop, Drill will keep sending requests until the API stops returning data. This PR fixes this by making the boolean parameter optional. If that parameter is not present, pagination ends when the index value is blank or the same as in the previous request. Note: if the pagination parameters are nested inside objects, this cannot be configured with a dataPath; if the user sets a dataPath, pagination will stop at the first page. -- This message was sent by Atlassian Jira (v8.20.10#820010)
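The stop condition described in this ticket can be expressed as a small predicate. This is a hypothetical sketch, not Drill's paginator code: `hasMore` models the optional boolean field (null when the API has none), and index values are treated as strings.

```java
/** Hypothetical sketch of the index-pagination stop condition; not Drill's code. */
final class IndexPaginationSketch {

    /**
     * @param hasMore       the API's optional boolean "more pages" value, or null if absent
     * @param currentIndex  index/cursor value returned with the current page (may be null or blank)
     * @param previousIndex index/cursor value from the previous page, or null on the first page
     */
    static boolean shouldStop(Boolean hasMore, String currentIndex, String previousIndex) {
        if (hasMore != null) {
            return !hasMore; // the API states explicitly whether more pages exist
        }
        // No boolean field: stop on a blank index or one that repeats the previous request.
        return currentIndex == null
            || currentIndex.trim().isEmpty()
            || currentIndex.equals(previousIndex);
    }
}
```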
[jira] [Resolved] (DRILL-8317) Convert LogRegex Format Plugin to EVF V2
[ https://issues.apache.org/jira/browse/DRILL-8317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Givre resolved DRILL-8317. -- Resolution: Done > Convert LogRegex Format Plugin to EVF V2 > > > Key: DRILL-8317 > URL: https://issues.apache.org/jira/browse/DRILL-8317 > Project: Apache Drill > Issue Type: Task > Components: Format - Log Reader >Affects Versions: 1.20.2 >Reporter: Charles Givre >Assignee: Charles Givre >Priority: Major > Fix For: 2.0.0 > > > Converts the existing logRegex reader to EVF V2. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8318) httpd format parser throws exception on log item with malformed query string
[ https://issues.apache.org/jira/browse/DRILL-8318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609078#comment-17609078 ] Charles Givre commented on DRILL-8318: -- [~nielsbasjes], could you take a look? > httpd format parser throws exception on log item with malformed query string > > > Key: DRILL-8318 > URL: https://issues.apache.org/jira/browse/DRILL-8318 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.19.0 > Environment: drill-embedded > openjdk version "1.8.0_342" > OpenJDK Runtime Environment Corretto-8.342.07.1 (build 1.8.0_342-b07) > OpenJDK 64-Bit Server VM Corretto-8.342.07.1 (build 25.342-b07, mixed mode) > Ubuntu 20.04.4 LTS (Focal Fossa) > Running under WSL on Windows 11 >Reporter: Richard Downer >Priority: Major > Attachments: testcase > > > I am running Apache Drill over my httpd-style access logs. These are > collecting data from requests on the open Internet, which sometimes means > questionable requests made by remote Internet users (sometimes with hostile > intent). > One such style of request looks like this: > {{151.236.216.243 - - [15/Sep/2022:20:18:07 +] "GET > /?=PHPE9568F36-D428-11d2-A769-00AA001ACF42 HTTP/1.1" 301 178 "-" > "curl/7.54.0"}} > I have put this request into a new log file containing only this line, as a > test case. 
I initiate a query: > {{select request_receive_time, request_status_last, request_firstline_method, > request_firstline_uri from > table(dfs.`/home/richard/drill/access-logs/nginx/testcase`(type=>'httpd', > logFormat=>'combined')) where request_status_last = 404;}} > This produces this error: > {{Error: DATA_READ ERROR: Error reading HTTPD file at line number 0}} > {{Error occurred during setter call: null caused by > "java.lang.StringIndexOutOfBoundsException: String index out of range: -1" > when calling "public void > org.apache.drill.exec.store.httpd.HttpdLogRecord.setWildcard(java.lang.String,java.lang.String)" > for key = "STRING:request.firstline.uri.query.*" name = > "STRING:request.firstline.uri.query" value = "Value\{filled=STRING, > s='PHPE9568F36-D428-11d2-A769-00AA001ACF42', l=null, d=null}" castsTo = > "[STRING]"}} > {{Format plugin: httpd}} > {{Format plugin: HttpdLogFormatPlugin}} > {{Plugin config name: null}} > {{Fragment: 0:0}} > While I appreciate that the query string part of the request is probably > malformed according to a strict interpretation, this is a real request seen > "in the wild" and I would prefer that Drill is robust enough to deal with the > type of garbage requests frequently seen on real web server. > Thank you for your assistance - if I can provide any more information that > would help please let me know! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (DRILL-8317) Convert LogRegex Format Plugin to EVF V2
Charles Givre created DRILL-8317: Summary: Convert LogRegex Format Plugin to EVF V2 Key: DRILL-8317 URL: https://issues.apache.org/jira/browse/DRILL-8317 Project: Apache Drill Issue Type: Task Components: Format - Log Reader Affects Versions: 1.20.2 Reporter: Charles Givre Assignee: Charles Givre Fix For: 2.0.0 Converts the existing logRegex reader to EVF V2. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (DRILL-8241) Remove Deprecated JSON Reader
[ https://issues.apache.org/jira/browse/DRILL-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Givre updated DRILL-8241: - Description: This is a master ticket to remove the deprecated v1 JSON reader from Drill. This JSON reader is used in several places and removing it will ensure consistent behavior across all data sources. The V2, EVF based JSON reader has several advantages, including the possibility of schema provisioning, limit pushdowns and others. Here are the tasks which need to be completed to fully remove the v1 JSON reader. * Complete DRILL-5955 which adds support for the UNION vector to the EVF Json reader. * Convert the convert_fromJSON functions to V2 (DRILL-8239) * Convert the Druid Storage Plugin to V2 (DRILL-8316) * Convert MongoDB Storage Plugin to V2. (Note the MongoDB plugin uses an EVF-based BSON reader as well as the V1 JSON reader) * Remove all V1-based unit tests * Migrate the JsonOptions from the HTTP Storage Plugin to global location to allow other plugins and users of JSON to set JSON configuration at a more granular level. (DRILL-8243) * Remove extraneous configuration options. * Bug fix HTTP UDFs (DRILL-8242) was: This is a master ticket to remove the deprecated v1 JSON reader from Drill. This JSON reader is used in several places and removing it will ensure consistent behavior across all data sources. The V2, EVF based JSON reader has several advantages, including the possibility of schema provisioning, limit pushdowns and others. Here are the tasks which need to be completed to fully remove the v1 JSON reader. * Complete DRILL-5955 which adds support for the UNION vector to the EVF Json reader. * Convert the convert_fromJSON functions to V2 (DRILL-8239) * Convert the Druid Storage Plugin to V2 * Convert MongoDB Storage Plugin to V2. 
(Note the MongoDB plugin uses an EVF-based BSON reader as well as the V1 JSON reader) * Remove all V1-based unit tests * Migrate the JsonOptions from the HTTP Storage Plugin to global location to allow other plugins and users of JSON to set JSON configuration at a more granular level. (DRILL-8243) * Remove extraneous configuration options. * Bug fix HTTP UDFs (DRILL-8242) > Remove Deprecated JSON Reader > - > > Key: DRILL-8241 > URL: https://issues.apache.org/jira/browse/DRILL-8241 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - JSON >Affects Versions: 1.20.1 >Reporter: Charles Givre >Priority: Major > Fix For: 2.0.0 > > > This is a master ticket to remove the deprecated v1 JSON reader from Drill. > This JSON reader is used in several places and removing it will ensure > consistent behavior across all data sources. > The V2, EVF based JSON reader has several advantages, including the > possibility of schema provisioning, limit pushdowns and others. > Here are the tasks which need to be completed to fully remove the v1 JSON > reader. > * Complete DRILL-5955 which adds support for the UNION vector to the EVF > Json reader. > * Convert the convert_fromJSON functions to V2 (DRILL-8239) > * Convert the Druid Storage Plugin to V2 (DRILL-8316) > * Convert MongoDB Storage Plugin to V2. (Note the MongoDB plugin uses an > EVF-based BSON reader as well as the V1 JSON reader) > * Remove all V1-based unit tests > * Migrate the JsonOptions from the HTTP Storage Plugin to global location to > allow other plugins and users of JSON to set JSON configuration at a more > granular level. (DRILL-8243) > * Remove extraneous configuration options. > * Bug fix HTTP UDFs (DRILL-8242) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (DRILL-8316) Convert Druid Storage Plugin to EVF & V2 JSON Reader
Charles Givre created DRILL-8316: Summary: Convert Druid Storage Plugin to EVF & V2 JSON Reader Key: DRILL-8316 URL: https://issues.apache.org/jira/browse/DRILL-8316 Project: Apache Drill Issue Type: Improvement Components: Storage - Druid Affects Versions: 1.20.2 Reporter: Charles Givre Assignee: Charles Givre Fix For: 2.0.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (DRILL-8315) Convert SAS Format Plugin to EVF V2
Charles Givre created DRILL-8315: Summary: Convert SAS Format Plugin to EVF V2 Key: DRILL-8315 URL: https://issues.apache.org/jira/browse/DRILL-8315 Project: Apache Drill Issue Type: Improvement Components: Format - SAS Affects Versions: 1.20.2, 1.20.1 Reporter: Charles Givre Assignee: Charles Givre Fix For: 2.0.0 Convert the SAS Format Plugin to EVF V2. No user facing changes. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (DRILL-8312) Convert Format Plugins to EVF V2
Charles Givre created DRILL-8312: Summary: Convert Format Plugins to EVF V2 Key: DRILL-8312 URL: https://issues.apache.org/jira/browse/DRILL-8312 Project: Apache Drill Issue Type: Improvement Affects Versions: 1.20.2 Reporter: Charles Givre Fix For: 2.0.0 This is a blanket ticket to convert all format plugins to EVF V2. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (DRILL-8159) Upgrade HTTPD reader to use EVF V2
[ https://issues.apache.org/jira/browse/DRILL-8159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Givre resolved DRILL-8159. -- Resolution: Done > Upgrade HTTPD reader to use EVF V2 > -- > > Key: DRILL-8159 > URL: https://issues.apache.org/jira/browse/DRILL-8159 > Project: Apache Drill > Issue Type: New Feature >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > > Continuation of work originally in the DRILL-8085 PR. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (DRILL-8311) Convert SPSS Format Plugin to EVF V2
Charles Givre created DRILL-8311: Summary: Convert SPSS Format Plugin to EVF V2 Key: DRILL-8311 URL: https://issues.apache.org/jira/browse/DRILL-8311 Project: Apache Drill Issue Type: Improvement Components: Storage - SPSS Affects Versions: 1.20.2 Reporter: Charles Givre Assignee: Charles Givre Fix For: 2.0.0 This PR converts the SPSS format plugin to use EVF V2. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (DRILL-8310) Convert Syslog Format to EVF V2
Charles Givre created DRILL-8310: Summary: Convert Syslog Format to EVF V2 Key: DRILL-8310 URL: https://issues.apache.org/jira/browse/DRILL-8310 Project: Apache Drill Issue Type: Improvement Components: Storage - Syslog Affects Versions: 1.20.2 Reporter: Charles Givre Assignee: Charles Givre Fix For: 2.0.0 This PR proposes to convert the Syslog format plugin to use EVF V2. No user-facing changes. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8307) Druid storage plugin's use of Apache HttpClient is not thread safe
[ https://issues.apache.org/jira/browse/DRILL-8307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17605352#comment-17605352 ] Charles Givre commented on DRILL-8307: -- [~dzamo] I don't know if you're planning on taking this one, but I had two thoughts here: # Would it be worth looking to see where else in Drill we use the Apache httpclient and switch them all over to OkHttp? # I started a branch ([https://github.com/cgivre/drill/tree/druid_evf]) which converts the Druid plugin to EVF. It was almost done. The remaining parts to finish were all in the ScanBatchCreator. Would you want to incorporate this work as well? > Druid storage plugin's use of Apache HttpClient is not thread safe > -- > > Key: DRILL-8307 > URL: https://issues.apache.org/jira/browse/DRILL-8307 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Other >Affects Versions: 1.20.2 >Reporter: James Turton >Priority: Major > Fix For: 1.20.3 > > > When multiple concurrent queries are run against a single Druid storage > plugin then an error such as is shown below is reported by the Apache > HttpClient used in that plugin. The Druid storage plugin uses a single static > HttpClient instance which should be replaced with something like > PoolingHttpClientConnectionManager for multithreaded access. > [1cdd2b75-1310---5a638567ed07:foreman] INFO > o.a.d.e.s.d.s.DruidSchemaFactory > - User Error Occurred: Failure while loading druid datasources for database > 'druid-egsmd300'. (Invalid use of BasicClientConnManager: connection still > allocated. > Make sure to release the connection before allocating another one.) > org.apache.drill.common.exceptions.UserException: DATA_READ ERROR: Failure > while loading druid datasources for database ''. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (DRILL-8289) Add Threat Hunting Functions
[ https://issues.apache.org/jira/browse/DRILL-8289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Givre resolved DRILL-8289. -- Resolution: Done > Add Threat Hunting Functions > > > Key: DRILL-8289 > URL: https://issues.apache.org/jira/browse/DRILL-8289 > Project: Apache Drill > Issue Type: New Feature > Components: Functions - Drill >Affects Versions: 2.0.0 >Reporter: Charles Givre >Assignee: Charles Givre >Priority: Major > Fix For: 2.0.0 > > > # Threat Hunting Functions > These functions are useful for doing threat hunting with Apache Drill. These > were inspired by huntlib.[1] > The functions are: > * `punctuation_pattern()`: Extracts the pattern of punctuation in > text. > * `entropy()`: This function calculates the Shannon Entropy of a > given string of text. > * `entropyPerByte()`: This function calculates the Shannon Entropy of > a given string of text, normed for the string length. > [1]: https://github.com/target/huntlib -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (DRILL-8305) Add Implicit Fields to Google Sheets Reader
Charles Givre created DRILL-8305: Summary: Add Implicit Fields to Google Sheets Reader Key: DRILL-8305 URL: https://issues.apache.org/jira/browse/DRILL-8305 Project: Apache Drill Issue Type: Improvement Components: Storage - GoogleSheets Affects Versions: 2.0.0 Reporter: Charles Givre Assignee: Charles Givre Fix For: 2.0.0 GoogleSheets needs additional metadata fields to access the available data. This PR adds a framework for implicit metadata fields. This PR also adds the `_sheets` field, which lists the available tabs within a Google Sheets document. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (DRILL-8291) Allow case sensitive Filters in HTTP Plugin
Charles Givre created DRILL-8291: Summary: Allow case sensitive Filters in HTTP Plugin Key: DRILL-8291 URL: https://issues.apache.org/jira/browse/DRILL-8291 Project: Apache Drill Issue Type: Bug Components: Storage - HTTP Affects Versions: 1.20.2 Reporter: Charles Givre Assignee: Charles Givre Fix For: 1.20.3 Some APIs will reject pushed-down filters if they are not in the correct case. This PR adds a `caseSensitiveFilters` option to the API config; when it is set to true, Drill preserves the case of the filters it pushes down. -- This message was sent by Atlassian Jira (v8.20.10#820010)
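A rough illustration of what the `caseSensitiveFilters` option controls. This Python sketch is hypothetical: it assumes that, by default, pushed-down filter field names are lower-cased before being appended to the request URL as query parameters, and that the flag skips that step. The function name and URL are invented, and the real plugin may differ, for example in whether filter values as well as field names are affected.

```python
from urllib.parse import urlencode


def build_query_url(base_url: str, filters: dict, case_sensitive_filters: bool = False) -> str:
    """Append pushed-down filters to base_url as query parameters (hypothetical)."""
    if case_sensitive_filters:
        # Preserve the field names exactly as written in the query.
        params = filters
    else:
        # Assumed default behaviour: normalize field names to lower case.
        params = {name.lower(): value for name, value in filters.items()}
    return f"{base_url}?{urlencode(params)}"
```

Under these assumptions, a filter such as `WHERE userId = '42'` would reach an API that only recognizes `userId` as `userid=42` by default, which such an API could reject, and as `userId=42` when the option is enabled.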
[jira] [Created] (DRILL-8289) Add Threat Hunting Functions
Charles Givre created DRILL-8289: Summary: Add Threat Hunting Functions Key: DRILL-8289 URL: https://issues.apache.org/jira/browse/DRILL-8289 Project: Apache Drill Issue Type: New Feature Components: Functions - Drill Affects Versions: 2.0.0 Reporter: Charles Givre Assignee: Charles Givre Fix For: 2.0.0 # Threat Hunting Functions These functions are useful for doing threat hunting with Apache Drill. They were inspired by huntlib.[1] The functions are: * `punctuation_pattern()`: Extracts the pattern of punctuation in text. * `entropy()`: This function calculates the Shannon entropy of a given string of text. * `entropyPerByte()`: This function calculates the Shannon entropy of a given string of text, normalized by the string length. [1]: https://github.com/target/huntlib -- This message was sent by Atlassian Jira (v8.20.10#820010)
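`punctuation_pattern()` can be approximated as below. This is a hypothetical Python sketch that assumes, following huntlib's description of its function of the same name, that letters, digits, and whitespace are stripped and the remaining punctuation is returned in its original order. Drill's UDF may treat non-ASCII characters or quotes differently; in this sketch, non-ASCII letters such as "é" would be kept.

```python
import re

# Assumed definition: remove ASCII letters, digits, and all whitespace.
_NON_PUNCTUATION = re.compile(r"[A-Za-z0-9\s]")


def punctuation_pattern(text: str) -> str:
    """Return only the punctuation characters of `text`, in their original order."""
    return _NON_PUNCTUATION.sub("", text)
```

The resulting pattern lets structurally similar strings be grouped even when their contents differ; for instance, two HTTP request lines with different paths and IDs can reduce to the same punctuation string.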
[jira] [Created] (DRILL-8288) Null Columns not being Written to GoogleSheets
Charles Givre created DRILL-8288: Summary: Null Columns not being Written to GoogleSheets Key: DRILL-8288 URL: https://issues.apache.org/jira/browse/DRILL-8288 Project: Apache Drill Issue Type: Bug Components: Storage - GoogleSheets Affects Versions: 2.0.0 Reporter: Charles Givre Assignee: Charles Givre Fix For: 2.0.0 When writing to GoogleSheets, null columns are not written, which results in incorrect data in the sheet. -- This message was sent by Atlassian Jira (v8.20.10#820010)
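The DRILL-8288 report does not spell out why skipping nulls produces incorrect data. One plausible mechanism, sketched below in Python under that assumption, is that each row is sent to the sheet as a positional list of cell values: if null cells are dropped rather than written as empty values, every later value shifts one column to the left. Both function names are hypothetical.

```python
def row_values_buggy(row: dict, columns: list) -> list:
    """Drops nulls, so values after a null shift left (illustrates the bug)."""
    return [row[c] for c in columns if row.get(c) is not None]


def row_values_fixed(row: dict, columns: list) -> list:
    """Writes an empty cell for each null so every value stays in its column."""
    return ["" if row.get(c) is None else row[c] for c in columns]
```

For columns `name, age, city` and a row with a null `age`, the buggy version would place the city value under the `age` header.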
[jira] [Created] (DRILL-8287) Add Support for Keyset Based Pagination
Charles Givre created DRILL-8287: Summary: Add Support for Keyset Based Pagination Key: DRILL-8287 URL: https://issues.apache.org/jira/browse/DRILL-8287 Project: Apache Drill Issue Type: New Feature Components: Storage - HTTP Affects Versions: 1.20.2 Reporter: Charles Givre Assignee: Charles Givre Fix For: 2.0.0 Some APIs, such as HubSpot, use values in the result set to indicate whether there are additional pages. This PR adds support for this kind of pagination. Note that the current implementation only works for JSON-based APIs. -- This message was sent by Atlassian Jira (v8.20.10#820010)
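Keyset (cursor) pagination of the kind DRILL-8287 describes can be sketched as a loop that keeps requesting pages while the previous response carries a "next" key. In this Python sketch, `fetch_page` is a stand-in for the HTTP call, and the `results` / `paging.next.after` response shape is an assumption modelled on HubSpot-style APIs. It is not the Drill plugin's configuration format or implementation.

```python
from typing import Callable, Iterator, Optional


def paginate(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield records page by page, following the cursor found in each response."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        yield from page.get("results", [])
        # The next cursor is read from the result set itself; stop when it is absent.
        cursor = page.get("paging", {}).get("next", {}).get("after")
        if cursor is None:
            return
```

Unlike offset or page-number pagination, the client cannot compute the next request in advance; it must parse each response to learn the key for the following one, which is also why support is tied to a particular response format such as JSON.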
[jira] [Updated] (DRILL-8286) GoogleSheets StoragePlugin displaying ClientID and ClientSecret in Config
[ https://issues.apache.org/jira/browse/DRILL-8286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Givre updated DRILL-8286: - Component/s: Storage - GoogleSheets > GoogleSheets StoragePlugin displaying ClientID and ClientSecret in Config > - > > Key: DRILL-8286 > URL: https://issues.apache.org/jira/browse/DRILL-8286 > Project: Apache Drill > Issue Type: Bug > Components: Storage - GoogleSheets >Affects Versions: 2.0.0 >Reporter: Charles Givre >Assignee: Charles Givre >Priority: Major > Fix For: 2.0.0 > > > The GoogleSheets storage plugin is rendering the `clientID` and > `clientSecret` in the config body instead of in the credential provider. > This minor PR fixes that. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (DRILL-8286) GoogleSheets StoragePlugin displaying ClientID and ClientSecret in Config
[ https://issues.apache.org/jira/browse/DRILL-8286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Givre updated DRILL-8286: - Affects Version/s: 2.0.0 > GoogleSheets StoragePlugin displaying ClientID and ClientSecret in Config > - > > Key: DRILL-8286 > URL: https://issues.apache.org/jira/browse/DRILL-8286 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: Charles Givre >Assignee: Charles Givre >Priority: Major > Fix For: 2.0.0 > > > The GoogleSheets storage plugin is rendering the `clientID` and > `clientSecret` in the config body instead of in the credential provider. > This minor PR fixes that. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (DRILL-8286) GoogleSheets StoragePlugin displaying ClientID and ClientSecret in Config
[ https://issues.apache.org/jira/browse/DRILL-8286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Givre updated DRILL-8286: - Fix Version/s: 2.0.0 > GoogleSheets StoragePlugin displaying ClientID and ClientSecret in Config > - > > Key: DRILL-8286 > URL: https://issues.apache.org/jira/browse/DRILL-8286 > Project: Apache Drill > Issue Type: Bug >Reporter: Charles Givre >Assignee: Charles Givre >Priority: Major > Fix For: 2.0.0 > > > The GoogleSheets storage plugin is rendering the `clientID` and > `clientSecret` in the config body instead of in the credential provider. > This minor PR fixes that. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (DRILL-8286) GoogleSheets StoragePlugin displaying ClientID and ClientSecret in Config
Charles Givre created DRILL-8286: Summary: GoogleSheets StoragePlugin displaying ClientID and ClientSecret in Config Key: DRILL-8286 URL: https://issues.apache.org/jira/browse/DRILL-8286 Project: Apache Drill Issue Type: Bug Reporter: Charles Givre Assignee: Charles Givre The GoogleSheets storage plugin is rendering the `clientID` and `clientSecret` in the config body instead of in the credential provider. This minor PR fixes that. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (DRILL-8284) Apache SQL Query failing while accessing the Json with complex data model
[ https://issues.apache.org/jira/browse/DRILL-8284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Givre closed DRILL-8284. Resolution: Not A Bug > Apache SQL Query failing while accessing the Json with complex data model > - > > Key: DRILL-8284 > URL: https://issues.apache.org/jira/browse/DRILL-8284 > Project: Apache Drill > Issue Type: Bug >Reporter: SHUBHAM KUMAR >Priority: Major > > Apache SQL Query failing while accessing the Json with complex data model. > Complex Json: > Map object inside another map object then Array Object. > Case 1: Nested objects within an array of maps, and a map within a map. > {"attributes": [ > { > "name": "webBrandName", > "value": { > "en-US": "Smashbox" > } > }, > { > "name": "startDate", > "value": "2011-07-25T15:30:00.000Z" > } > ] > } > Case 2: An array with multiple map items whose values have different data types, e.g. > both String and Boolean. > {"attributes": [ > { > "name": "startDate", > "value": "2011-07-25T15:30:00.000Z" > }, > { > "name": "hasCBD", > "value": false > } > ] > } > Query: > select flatten(attributes) as Var from dfs.`/filepath/filename.json` > > Error: > org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: > IndexOutOfBoundsException: readerIndex: 0, writerIndex: 1764642048 (expected: > 0 <= readerIndex <= writerIndex <= capacity(0)) Fragment: 0:0 Please, refer > to logs for more information. [Error Id: c5a3b8fa-cad1-4c9a-8673-de5745e9170b > on GGNUWT461535L.ad.infosys.com:31010] > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8284) Apache SQL Query failing while accessing the Json with complex data model
[ https://issues.apache.org/jira/browse/DRILL-8284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17583946#comment-17583946 ] Charles Givre commented on DRILL-8284: -- [~shubhamsmvdu] This is normal behavior for Drill. The issue you are encountering is a schema change exception on the `value` field. In both cases, Drill first encounters one data type and creates a vector for it; then, in the next row, it encounters the same field with a different data type and throws an exception. There are a few options: # If you use the v1 JSON reader, you can enable the UNION data type, which allows heterogeneous data types. We are working on enabling this for the V2 JSON reader, but for the moment it is not available there. This is an option that must be set at the system level. # Provide a schema: You can provide a schema for the field `value` and set `mode` to JSON. I'd have to dig up the documentation for this, but what this does is force the field to a string. If JSON objects are encountered, they will be rendered as strings. I'm going to close this, as this is expected behavior. Please use GitHub issues or Slack to continue the conversation. > Apache SQL Query failing while accessing the Json with complex data model > - > > Key: DRILL-8284 > URL: https://issues.apache.org/jira/browse/DRILL-8284 > Project: Apache Drill > Issue Type: Bug >Reporter: SHUBHAM KUMAR >Priority: Major > > Apache SQL Query failing while accessing the Json with complex data model. > Complex Json: > Map object inside another map object then Array Object. > Case 1: Nested objects within an array of maps, and a map within a map. > {"attributes": [ > { > "name": "webBrandName", > "value": { > "en-US": "Smashbox" > } > }, > { > "name": "startDate", > "value": "2011-07-25T15:30:00.000Z" > } > ] > } > Case 2: An array with multiple map items whose values have different data types, e.g. > both String and Boolean. 
> {"attributes": [ > { > "name": "startDate", > "value": "2011-07-25T15:30:00.000Z" > }, > { > "name": "hasCBD", > "value": false > } > ] > } > Query: > select flatten(attributes) as Var from dfs.`/filepath/filename.json` > > Error: > org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: > IndexOutOfBoundsException: readerIndex: 0, writerIndex: 1764642048 (expected: > 0 <= readerIndex <= writerIndex <= capacity(0)) Fragment: 0:0 Please, refer > to logs for more information. [Error Id: c5a3b8fa-cad1-4c9a-8673-de5745e9170b > on GGNUWT461535L.ad.infosys.com:31010] > -- This message was sent by Atlassian Jira (v8.20.10#820010)
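The schema change described in the DRILL-8284 comment can be modelled with a toy column reader. This is a simplified, hypothetical Python sketch, not Drill's value-vector code: the reader fixes a column's type from the first value it sees and rejects a later value of a different type. Setting `json_mode=True` imitates the suggested workaround of treating the field as JSON text, so every value, including nested objects, becomes a string. The exact text Drill produces in its JSON mode may differ from `json.dumps` output.

```python
import json


class SchemaChangeError(Exception):
    """Raised when a column receives a value of a different type than its first value."""


def read_column(values, json_mode=False):
    """Materialize one column, fixing its type from the first value seen."""
    column_type = None
    out = []
    for value in values:
        if json_mode:
            # Workaround: render every value (scalars and objects) as JSON text.
            value = json.dumps(value)
        if column_type is None:
            column_type = type(value)
        elif type(value) is not column_type:
            raise SchemaChangeError(
                f"column started as {column_type.__name__}, got {type(value).__name__}")
        out.append(value)
    return out
```

Feeding it the `value` fields from Case 2 (a string, then `false`) raises the schema change error, while the same input with `json_mode=True` yields two strings. Case 1 (an object, then a string) behaves the same way.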
[jira] [Created] (DRILL-8276) Add Support for User Translation for Splunk
Charles Givre created DRILL-8276: Summary: Add Support for User Translation for Splunk Key: DRILL-8276 URL: https://issues.apache.org/jira/browse/DRILL-8276 Project: Apache Drill Issue Type: Task Components: Storage - Other Affects Versions: 1.20.2 Reporter: Charles Givre Assignee: Charles Givre Fix For: 2.0.0 This PR adds support for user translation to Splunk. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (DRILL-8271) Make Storage and Format Config Case Insensitive
Charles Givre created DRILL-8271: Summary: Make Storage and Format Config Case Insensitive Key: DRILL-8271 URL: https://issues.apache.org/jira/browse/DRILL-8271 Project: Apache Drill Issue Type: Task Reporter: Charles Givre -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (DRILL-8270) Delete obsolete zookeeper patch (tech debt)
[ https://issues.apache.org/jira/browse/DRILL-8270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Givre reassigned DRILL-8270: Assignee: Charles Givre > Delete obsolete zookeeper patch (tech debt) > --- > > Key: DRILL-8270 > URL: https://issues.apache.org/jira/browse/DRILL-8270 > Project: Apache Drill > Issue Type: Task >Affects Versions: 1.20.1 >Reporter: Charles Givre >Assignee: Charles Givre >Priority: Minor > Fix For: 2.0.0 > > > Patch files are in the `.gitignore` and yet a .patch file > ([contrib/native/client/patches/zookeeper-3.4.6-x64.patch|https://github.com/apache/drill/pull/2585/files/06625708f0419442d823d0025afa6e043fffcc4e#diff-0b6d0330fc567658b83263c83e902ec72dc0e95bb0ad0830736dc5cae8449168]) > somehow has been included in the Drill build. This PR removes it. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (DRILL-8270) Delete obsolete zookeeper patch (tech debt)
[ https://issues.apache.org/jira/browse/DRILL-8270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Givre updated DRILL-8270: - Affects Version/s: 1.20.1 > Delete obsolete zookeeper patch (tech debt) > --- > > Key: DRILL-8270 > URL: https://issues.apache.org/jira/browse/DRILL-8270 > Project: Apache Drill > Issue Type: Task >Affects Versions: 1.20.1 >Reporter: Charles Givre >Priority: Minor > > Patch files are in the `.gitignore` and yet a .patch file > ([contrib/native/client/patches/zookeeper-3.4.6-x64.patch|https://github.com/apache/drill/pull/2585/files/06625708f0419442d823d0025afa6e043fffcc4e#diff-0b6d0330fc567658b83263c83e902ec72dc0e95bb0ad0830736dc5cae8449168]) > somehow has been included in the Drill build. This PR removes it. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (DRILL-8270) Delete obsolete zookeeper patch (tech debt)
[ https://issues.apache.org/jira/browse/DRILL-8270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Givre updated DRILL-8270: - Description: Patch files are in the `.gitignore` and yet a .patch file ([contrib/native/client/patches/zookeeper-3.4.6-x64.patch|https://github.com/apache/drill/pull/2585/files/06625708f0419442d823d0025afa6e043fffcc4e#diff-0b6d0330fc567658b83263c83e902ec72dc0e95bb0ad0830736dc5cae8449168]) somehow has been included in the Drill build. This PR removes it. > Delete obsolete zookeeper patch (tech debt) > --- > > Key: DRILL-8270 > URL: https://issues.apache.org/jira/browse/DRILL-8270 > Project: Apache Drill > Issue Type: Task >Reporter: Charles Givre >Priority: Minor > > Patch files are in the `.gitignore` and yet a .patch file > ([contrib/native/client/patches/zookeeper-3.4.6-x64.patch|https://github.com/apache/drill/pull/2585/files/06625708f0419442d823d0025afa6e043fffcc4e#diff-0b6d0330fc567658b83263c83e902ec72dc0e95bb0ad0830736dc5cae8449168]) > somehow has been included in the Drill build. This PR removes it. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (DRILL-8270) Delete obsolete zookeeper patch (tech debt)
[ https://issues.apache.org/jira/browse/DRILL-8270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Givre updated DRILL-8270: - Fix Version/s: 2.0.0 > Delete obsolete zookeeper patch (tech debt) > --- > > Key: DRILL-8270 > URL: https://issues.apache.org/jira/browse/DRILL-8270 > Project: Apache Drill > Issue Type: Task >Affects Versions: 1.20.1 >Reporter: Charles Givre >Priority: Minor > Fix For: 2.0.0 > > > Patch files are in the `.gitignore` and yet a .patch file > ([contrib/native/client/patches/zookeeper-3.4.6-x64.patch|https://github.com/apache/drill/pull/2585/files/06625708f0419442d823d0025afa6e043fffcc4e#diff-0b6d0330fc567658b83263c83e902ec72dc0e95bb0ad0830736dc5cae8449168]) > somehow has been included in the Drill build. This PR removes it. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (DRILL-8270) Delete obsolete zookeeper patch (tech debt)
Charles Givre created DRILL-8270: Summary: Delete obsolete zookeeper patch (tech debt) Key: DRILL-8270 URL: https://issues.apache.org/jira/browse/DRILL-8270 Project: Apache Drill Issue Type: Task Reporter: Charles Givre -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8185) EVF 2 doesn't handle map arrays or nested maps
[ https://issues.apache.org/jira/browse/DRILL-8185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17568259#comment-17568259 ] Charles Givre commented on DRILL-8185: -- Hey [~Paul.Rogers], just as an FYSA, we've been doing some work to consolidate and improve the overall handling of JSON in Drill (DRILL-8241). The overarching goal is to remove all the deprecated JSON readers and use the EVF2 JSON reader throughout Drill. [~vitalii] has been working on DRILL-5955 with the goal of "re-enabling" union vectors in EVF2. He has a draft PR, but I'm not sure how close we are to completion. We've also done some work to make the JSON configuration more granular and introduced a JSONOptions class, which standardizes the JSON configuration for all plugins that use JSON (DRILL-8243). > EVF 2 doesn't handle map arrays or nested maps > - > > Key: DRILL-8185 > URL: https://issues.apache.org/jira/browse/DRILL-8185 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.20.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Fix For: 2.0.0 > > > When converting Avro, Luoc found two bugs in how EVF 2 (the projection > mechanism) handles map arrays and nested maps. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (DRILL-8244) HTTP_Request Not Passing Down Config Variable
Charles Givre created DRILL-8244: Summary: HTTP_Request Not Passing Down Config Variable Key: DRILL-8244 URL: https://issues.apache.org/jira/browse/DRILL-8244 Project: Apache Drill Issue Type: Bug Components: Storage - Other Affects Versions: 1.20.1 Reporter: Charles Givre Assignee: Charles Givre Fix For: 2.0.0 The http_request UDF was not passing the provided schema and other config parameters down to the jsonLoader. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (DRILL-8241) Remove Deprecated JSON Reader
[ https://issues.apache.org/jira/browse/DRILL-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Givre updated DRILL-8241: - Description: This is a master ticket to remove the deprecated v1 JSON reader from Drill. This JSON reader is used in several places, and removing it will ensure consistent behavior across all data sources. The V2, EVF-based JSON reader has several advantages, including the possibility of schema provisioning, limit pushdowns and others. Here are the tasks which need to be completed to fully remove the v1 JSON reader. * Complete DRILL-5955, which adds support for the UNION vector to the EVF JSON reader. * Convert the convert_fromJSON functions to V2 (DRILL-8239) * Convert the Druid Storage Plugin to V2 * Convert MongoDB Storage Plugin to V2. (Note that the MongoDB plugin uses an EVF-based BSON reader as well as the V1 JSON reader) * Remove all V1-based unit tests * Migrate the JsonOptions from the HTTP Storage Plugin to a global location to allow other plugins and users of JSON to set JSON configuration at a more granular level. (DRILL-8243) * Remove extraneous configuration options. * Bug fix HTTP UDFs (DRILL-8242) was: This is a master ticket to remove the deprecated v1 JSON reader from Drill. This JSON reader is used in several places, and removing it will ensure consistent behavior across all data sources. The V2, EVF-based JSON reader has several advantages, including the possibility of schema provisioning, limit pushdowns and others. Here are the tasks which need to be completed to fully remove the v1 JSON reader. * Complete DRILL-5955, which adds support for the UNION vector to the EVF JSON reader. * Convert the convert_fromJSON functions to V2 (DRILL-8239) * Convert the Druid Storage Plugin to V2 * Convert MongoDB Storage Plugin to V2. 
(Note that the MongoDB plugin uses an EVF-based BSON reader as well as the V1 JSON reader) * Remove all V1-based unit tests * Migrate the JsonOptions from the HTTP Storage Plugin to a global location to allow other plugins and users of JSON to set JSON configuration at a more granular level. * Remove extraneous configuration options. * Bug fix HTTP UDFs (DRILL-8242) > Remove Deprecated JSON Reader > - > > Key: DRILL-8241 > URL: https://issues.apache.org/jira/browse/DRILL-8241 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - JSON >Affects Versions: 1.20.1 >Reporter: Charles Givre >Priority: Major > Fix For: 2.0.0 > > > This is a master ticket to remove the deprecated v1 JSON reader from Drill. > This JSON reader is used in several places, and removing it will ensure > consistent behavior across all data sources. > The V2, EVF-based JSON reader has several advantages, including the > possibility of schema provisioning, limit pushdowns and others. > Here are the tasks which need to be completed to fully remove the v1 JSON > reader. > * Complete DRILL-5955, which adds support for the UNION vector to the EVF > JSON reader. > * Convert the convert_fromJSON functions to V2 (DRILL-8239) > * Convert the Druid Storage Plugin to V2 > * Convert MongoDB Storage Plugin to V2. (Note that the MongoDB plugin uses an > EVF-based BSON reader as well as the V1 JSON reader) > * Remove all V1-based unit tests > * Migrate the JsonOptions from the HTTP Storage Plugin to a global location to > allow other plugins and users of JSON to set JSON configuration at a more > granular level. (DRILL-8243) > * Remove extraneous configuration options. > * Bug fix HTTP UDFs (DRILL-8242) -- This message was sent by Atlassian Jira (v8.20.7#820007)