[jira] [Commented] (DRILL-4573) Zero copy LIKE, REGEXP_MATCHES, SUBSTR
[ https://issues.apache.org/jira/browse/DRILL-4573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15277518#comment-15277518 ]

ASF GitHub Bot commented on DRILL-4573:
---------------------------------------

Github user jcmcote commented on the pull request:

    https://github.com/apache/drill/pull/458#issuecomment-218046301

    @hsuanyi just following up to see if the patch was applied.

> Zero copy LIKE, REGEXP_MATCHES, SUBSTR
> --------------------------------------
>
>                 Key: DRILL-4573
>                 URL: https://issues.apache.org/jira/browse/DRILL-4573
>             Project: Apache Drill
>          Issue Type: Improvement
>            Reporter: jean-claude
>            Priority: Minor
>             Fix For: 1.7.0
>
>         Attachments: DRILL-4573-3.patch.txt, DRILL-4573.patch.txt
>
>
> All the functions using java.util.regex.Matcher currently create
> Java String objects to pass into matcher.reset().
> However, this creates an unnecessary copy of the bytes and a Java String object.
> The matcher uses a CharSequence, so instead of making a copy we can create an
> adapter from the DrillBuffer to the CharSequence interface.
> Gains of 25% in execution speed are possible when going over a VARCHAR of 36
> chars. The gain will be proportional to the size of the VARCHAR.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
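A minimal sketch of the adapter idea described in DRILL-4573, not the code from the attached patch. It assumes the column data is ASCII (one byte per char) and uses java.nio.ByteBuffer as a stand-in for Drill's buffer type; the class names AsciiBufferCharSequence and ZeroCopyRegexDemo are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical adapter: exposes a byte range as a CharSequence without copying it. */
final class AsciiBufferCharSequence implements CharSequence {
    private final ByteBuffer buffer; // stand-in for Drill's buffer type (assumption)
    private final int start;         // inclusive offset into the buffer
    private final int end;           // exclusive offset into the buffer

    AsciiBufferCharSequence(ByteBuffer buffer, int start, int end) {
        this.buffer = buffer;
        this.start = start;
        this.end = end;
    }

    @Override
    public int length() {
        return end - start;
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= length()) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        // Absolute get: reads in place and does not move the buffer position.
        // Treating one byte as one char is only valid for single-byte data.
        return (char) (buffer.get(start + index) & 0xFF);
    }

    @Override
    public CharSequence subSequence(int from, int to) {
        // Still no copy: the new view shares the same underlying buffer.
        return new AsciiBufferCharSequence(buffer, start + from, start + to);
    }

    @Override
    public String toString() {
        // This path does copy; it is only taken when a String is really needed.
        byte[] bytes = new byte[length()];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = buffer.get(start + i);
        }
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }
}

final class ZeroCopyRegexDemo {
    /** Reuses one Matcher and points it at a byte range instead of a new String. */
    static boolean matches(Matcher matcher, ByteBuffer buf, int start, int end) {
        matcher.reset(new AsciiBufferCharSequence(buf, start, end));
        return matcher.matches();
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.wrap(
            "123e4567-e89b-12d3-a456-426614174000".getBytes(StandardCharsets.US_ASCII));
        Matcher matcher = Pattern.compile("[0-9a-f-]{36}").matcher("");
        System.out.println(matches(matcher, buf, 0, buf.limit()));
    }
}
```

Multi-byte UTF-8 values would need a decoding fallback, which this sketch leaves out.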
[jira] [Commented] (DRILL-4644) Allow setting format options in CTAS
[ https://issues.apache.org/jira/browse/DRILL-4644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276633#comment-15276633 ]

Paul Rogers commented on DRILL-4644:
------------------------------------

Two suggestions:

CREATE TABLE (type => 'csv', fieldDelimiter => ',') AS ;

That is, if SQL does not currently have syntax that allows parens after the table name, then putting the options directly after the table would be the most intuitive. The parens act as delimiters for the list.

Else, if we use syntax as a delimiter, then the parens become unnecessary:

CREATE TABLE OPTION type => 'csv', fieldDelimiter => ',' AS ;

The material between OPTION and the next keyword must be key/value pairs separated by commas.

I personally prefer the first solution.

> Allow setting format options in CTAS
> ------------------------------------
>
>                 Key: DRILL-4644
>                 URL: https://issues.apache.org/jira/browse/DRILL-4644
>             Project: Apache Drill
>          Issue Type: Improvement
>            Reporter: amit hadke
>            Assignee: amit hadke
>            Priority: Minor
>
> The user has to set session options in order to specify how CTAS should store a
> table (store.format='json').
> Add a new option in CTAS, 'STORE AS', that takes optional parameters similar to
> the format attributes used in table functions.
> For example:
> CREATE TABLE STORE AS(type => 'csv',
> fieldDelimiter => ',') AS
[jira] [Commented] (DRILL-4659) Specify, as part of the query, table information: data format (CSV, parquet, JSON. etc.), field delimiter, etc.
[ https://issues.apache.org/jira/browse/DRILL-4659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276627#comment-15276627 ]

Jason Altekruse commented on DRILL-4659:
----------------------------------------

This feature was added last fall. I think we may want to duplicate this information in the section about "Querying Data" to make it easier to find, but the feature is documented here:

https://drill.apache.org/docs/plugin-configuration-basics/#using-the-formats-attributes-as-table-function-parameters

If you would like to see more examples of usage or information about the feature's development, this was the JIRA for the feature:

https://issues.apache.org/jira/browse/DRILL-4047

> Specify, as part of the query, table information: data format (CSV, parquet, JSON. etc.), field delimiter, etc.
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-4659
>                 URL: https://issues.apache.org/jira/browse/DRILL-4659
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Query Planning & Optimization, SQL Parser
>            Reporter: Roger Dielrton
>            Priority: Minor
>
> I have a file that I would like to use in a query, and it can have one or
> more of the following properties:
> * Has no extension ==> Drill is unable to handle it.
> * I know it contains data in CSV format, but the field separator is a
> non-standard character ==> Drill is unable to parse it (without modifying the
> storage plugin configuration).
> * Is located in an Amazon S3 bucket ==> I can't rename it.
> * Is large ==> It would be expensive to make a copy of it.
> It would be nice if you could specify, as part of the "select" query,
> relevant table information as metadata, such as:
> * Data format (CSV, parquet, JSON, etc.)
> * Field delimiter.
[jira] [Commented] (DRILL-3149) TextReader should support multibyte line delimiters
[ https://issues.apache.org/jira/browse/DRILL-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276572#comment-15276572 ]

ASF GitHub Bot commented on DRILL-3149:
---------------------------------------

GitHub user arina-ielchiieva opened a pull request:

    https://github.com/apache/drill/pull/500

    DRILL-3149: TextReader should support multibyte line delimiters

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/arina-ielchiieva/drill DRILL-3149

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/500.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #500

commit d697a377614e76ca2fc10f6f6913be23403f92e3
Author: Arina Ielchiieva
Date:   2016-04-25T16:15:02Z

    DRILL-3149: TextReader should support multibyte line delimiters

> TextReader should support multibyte line delimiters
> ---------------------------------------------------
>
>                 Key: DRILL-3149
>                 URL: https://issues.apache.org/jira/browse/DRILL-3149
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Text & CSV
>    Affects Versions: 1.0.0, 1.1.0
>            Reporter: Jim Scott
>            Assignee: Arina Ielchiieva
>            Priority: Minor
>             Fix For: Future
>
>
> lineDelimiter in the TextFormatConfig doesn't support \r\n for record
> delimiters.
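To illustrate what DRILL-3149 asks for, here is a small sketch of splitting a byte array into records on a delimiter that is longer than one byte, such as \r\n. It is not the code from pull request #500 and does not use Drill's TextReader; the class MultiByteLineSplitter and its methods are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits raw bytes into records on a multi-byte delimiter. */
final class MultiByteLineSplitter {

    /** Returns the index of the first full occurrence of delim at or after from, or -1. */
    static int indexOf(byte[] data, byte[] delim, int from) {
        outer:
        for (int i = from; i <= data.length - delim.length; i++) {
            for (int j = 0; j < delim.length; j++) {
                if (data[i + j] != delim[j]) {
                    continue outer; // partial match only, e.g. a lone \r
                }
            }
            return i;
        }
        return -1;
    }

    /** Splits data into records; a trailing record without a delimiter is kept. */
    static List<String> split(byte[] data, byte[] delim) {
        if (delim.length == 0) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> records = new ArrayList<>();
        int start = 0;
        int hit;
        while ((hit = indexOf(data, delim, start)) >= 0) {
            records.add(new String(data, start, hit - start, StandardCharsets.UTF_8));
            start = hit + delim.length; // skip the whole delimiter, not just one byte
        }
        if (start < data.length) {
            records.add(new String(data, start, data.length - start, StandardCharsets.UTF_8));
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] data = "a,b\r\nc,d\r\ne,f".getBytes(StandardCharsets.UTF_8);
        byte[] delim = "\r\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(split(data, delim));
    }
}
```

The point of comparing the whole delimiter is that a lone \r inside a field does not end the record.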
[jira] [Updated] (DRILL-4659) Specify, as part of the query, table information: data format (CSV, parquet, JSON. etc.), field delimiter, etc.
[ https://issues.apache.org/jira/browse/DRILL-4659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Roger Dielrton updated DRILL-4659:
----------------------------------
    Description:
I have a file that I would like to use in a query, and it can have one or more of the following properties:

* Has no extension ==> Drill is unable to handle it.
* I know it contains data in CSV format, but the field separator is a non-standard character ==> Drill is unable to parse it (without modifying the storage plugin configuration).
* Is located in an Amazon S3 bucket ==> I can't rename it.
* Is large ==> It would be expensive to make a copy of it.

It would be nice if you could specify, as part of the "select" query, relevant table information as metadata, such as:

* Data format (CSV, parquet, JSON, etc.)
* Field delimiter.

  was:
I have a file that I would like to use in a query, and it can have one or more of the following properties:

* Has no extension ==> Drill is unable to handle it.
* I know it contains data in CSV format, but with a non-standard character as field separator ==> Drill is unable to parse it (without modifying the storage plugin configuration).
* Is located in an Amazon S3 bucket ==> I can rename it.
* Is large ==> It would be expensive to make a copy of it.

It would be nice if you could specify, as part of the "select" query, relevant table information as metadata, such as:

* Data format (CSV, parquet, JSON, etc.)
* Field delimiter.

> Specify, as part of the query, table information: data format (CSV, parquet, JSON. etc.), field delimiter, etc.
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-4659
>                 URL: https://issues.apache.org/jira/browse/DRILL-4659
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Query Planning & Optimization, SQL Parser
>            Reporter: Roger Dielrton
>            Priority: Minor
>
> I have a file that I would like to use in a query, and it can have one or
> more of the following properties:
> * Has no extension ==> Drill is unable to handle it.
> * I know it contains data in CSV format, but the field separator is a
> non-standard character ==> Drill is unable to parse it (without modifying the
> storage plugin configuration).
> * Is located in an Amazon S3 bucket ==> I can't rename it.
> * Is large ==> It would be expensive to make a copy of it.
> It would be nice if you could specify, as part of the "select" query,
> relevant table information as metadata, such as:
> * Data format (CSV, parquet, JSON, etc.)
> * Field delimiter.
[jira] [Created] (DRILL-4659) Specify, as part of the query, table information: data format (CSV, parquet, JSON. etc.), field delimiter, etc.
Roger Dielrton created DRILL-4659:
-------------------------------------

             Summary: Specify, as part of the query, table information: data format (CSV, parquet, JSON. etc.), field delimiter, etc.
                 Key: DRILL-4659
                 URL: https://issues.apache.org/jira/browse/DRILL-4659
             Project: Apache Drill
          Issue Type: Improvement
          Components: Query Planning & Optimization, SQL Parser
            Reporter: Roger Dielrton
            Priority: Minor

I have a file that I would like to use in a query, and it can have one or more of the following properties:

* Has no extension ==> Drill is unable to handle it.
* I know it contains data in CSV format, but with a non-standard character as field separator ==> Drill is unable to parse it (without modifying the storage plugin configuration).
* Is located in an Amazon S3 bucket ==> I can rename it.
* Is large ==> It would be expensive to make a copy of it.

It would be nice if you could specify, as part of the "select" query, relevant table information as metadata, such as:

* Data format (CSV, parquet, JSON, etc.)
* Field delimiter.
[jira] [Commented] (DRILL-4658) cannot specify tab as a fieldDelimiter in table function
[ https://issues.apache.org/jira/browse/DRILL-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276428#comment-15276428 ]

Vince Gonzalez commented on DRILL-4658:
---------------------------------------

Tried, did not work:

{code}
0: jdbc:drill:zk=local> select * from table(dfs.`/Users/vince/data/nyc/mta/bustime/MTA-Bus-Time_.2014-10-31.tsvh`(type => 'text', fieldDelimiter => U&'\0009', extractHeader => true)) limit 10;
May 09, 2016 10:27:36 AM org.apache.calcite.sql.validate.SqlValidatorException
SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: No match found for function signature /Users/vince/data/nyc/mta/bustime/MTA-Bus-Time_.2014-10-31.tsvh(type => , fieldDelimiter => , extractHeader => )
May 09, 2016 10:27:36 AM org.apache.calcite.runtime.CalciteException
SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1, column 25 to line 1, column 157: No match found for function signature /Users/vince/data/nyc/mta/bustime/MTA-Bus-Time_.2014-10-31.tsvh(type => , fieldDelimiter => , extractHeader => )
Error: VALIDATION ERROR: From line 1, column 25 to line 1, column 157: No match found for function signature /Users/vince/data/nyc/mta/bustime/MTA-Bus-Time_.2014-10-31.tsvh(type => , fieldDelimiter => , extractHeader => )

SQL Query null

[Error Id: 4a26b35e-2235-43f4-859a-91711690d539 on ip-10-50-108-143.ec2.internal:31010] (state=,code=0)
{code}

In case anyone happens upon this before it's fixed, I am currently working around this by creating a workspace and applying the options to the workspace. Since table functions only work with individual files, this is necessary anyway when dealing with more than one file. This issue really only slows down initial exploration of a single file, as far as I can tell.

> cannot specify tab as a fieldDelimiter in table function
> --------------------------------------------------------
>
>                 Key: DRILL-4658
>                 URL: https://issues.apache.org/jira/browse/DRILL-4658
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: SQL Parser
>    Affects Versions: 1.6.0
>         Environment: Mac OS X, Java 8
>            Reporter: Vince Gonzalez
>
> I can't specify a tab delimiter in the table function because it maybe counts
> the characters rather than trying to interpret as a character escape code?
> {code}
> 0: jdbc:drill:zk=local> select columns[0] as a, cast(columns[1] as bigint) as
> b from table(dfs.tmp.`sample_cast.tsv`(type => 'text', fieldDelimiter =>
> '\t', skipFirstLine => true));
> Error: PARSE ERROR: Expected single character but was String: \t
> table sample_cast.tsv
> parameter fieldDelimiter
> SQL Query null
> [Error Id: 3efa82e1-3810-4d4a-b23c-32d6658dffcf on 172.30.1.144:31010]
> (state=,code=0)
> {code}
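The reporter's guess in DRILL-4658 is that the option value '\t' arrives as the two characters backslash and t and is rejected by a length check. A minimal sketch of how such a value could be decoded into a single delimiter character follows. It is an illustration only, not Drill's parser; the class DelimiterOption and its parse method are hypothetical, and the exception text simply mirrors the message quoted in the report.

```java
/** Hypothetical helper: turns a delimiter option value into a single character. */
final class DelimiterOption {

    /**
     * Accepts either one literal character or a two-character backslash escape.
     * Only a few common escapes are handled in this sketch.
     */
    static char parse(String raw) {
        if (raw.length() == 1) {
            return raw.charAt(0); // plain delimiter such as ',' or '|'
        }
        if (raw.length() == 2 && raw.charAt(0) == '\\') {
            switch (raw.charAt(1)) {
                case 't':
                    return '\t';
                case 'n':
                    return '\n';
                case 'r':
                    return '\r';
                case '\\':
                    return '\\';
                default:
                    break; // unknown escape: fall through to the error below
            }
        }
        throw new IllegalArgumentException(
            "Expected single character but was String: " + raw);
    }

    public static void main(String[] args) {
        // "\\t" in Java source is the two-character string backslash + t.
        char delimiter = parse("\\t");
        System.out.println((int) delimiter);
    }
}
```

With decoding along these lines, fieldDelimiter => '\t' would resolve to a tab instead of failing the single-character check.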