[jira] [Commented] (DRILL-4573) Zero copy LIKE, REGEXP_MATCHES, SUBSTR

2016-05-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15277518#comment-15277518
 ] 

ASF GitHub Bot commented on DRILL-4573:
---

Github user jcmcote commented on the pull request:

https://github.com/apache/drill/pull/458#issuecomment-218046301
  
@hsuanyi just following up to see if the patch was applied.


> Zero copy LIKE, REGEXP_MATCHES, SUBSTR
> --
>
> Key: DRILL-4573
> URL: https://issues.apache.org/jira/browse/DRILL-4573
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: jean-claude
>Priority: Minor
> Fix For: 1.7.0
>
> Attachments: DRILL-4573-3.patch.txt, DRILL-4573.patch.txt
>
>
> All the functions that use java.util.regex.Matcher currently create Java 
> String objects to pass into matcher.reset().
> However, this creates an unnecessary copy of the bytes and an extra Java String object.
> Matcher accepts a CharSequence, so instead of making a copy we can create an 
> adapter from the DrillBuffer to the CharSequence interface.
> Gains of 25% in execution speed are possible when processing VARCHAR values of 36 
> characters. The gain will be proportional to the size of the VARCHAR.
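The adapter idea above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the patch attached to the issue. It wraps a region of a java.nio.ByteBuffer as a CharSequence, with the ByteBuffer standing in for Drill's own buffer class (whose API is not used here), and passes it to Matcher.reset(CharSequence) without first building a String. The class name ByteCharSequence is invented, and the sketch assumes one byte per character (ASCII/Latin-1). Multi-byte UTF-8 data cannot be mapped byte-for-char this way.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical adapter: exposes bytes [start, end) of a ByteBuffer as a CharSequence.
// Assumes one byte per character (ASCII/Latin-1); UTF-8 multi-byte data would need decoding.
final class ByteCharSequence implements CharSequence {
    private ByteBuffer buf;
    private int start;
    private int end;

    // Re-point the adapter at a new region so one instance can be reused for every row.
    ByteCharSequence set(ByteBuffer buf, int start, int end) {
        this.buf = buf;
        this.start = start;
        this.end = end;
        return this;
    }

    @Override
    public int length() {
        return end - start;
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= length()) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        // Absolute get does not move the buffer position; the mask treats the byte as unsigned.
        return (char) (buf.get(start + index) & 0xFF);
    }

    @Override
    public CharSequence subSequence(int from, int to) {
        if (from < 0 || to > length() || from > to) {
            throw new IndexOutOfBoundsException(from + ".." + to);
        }
        return new ByteCharSequence().set(buf, start + from, start + to);
    }

    @Override
    public String toString() {
        // Only materializes a String when a caller explicitly asks for one.
        StringBuilder sb = new StringBuilder(length());
        for (int i = 0; i < length(); i++) {
            sb.append(charAt(i));
        }
        return sb.toString();
    }
}

public class ZeroCopyRegexDemo {
    public static void main(String[] args) {
        byte[] bytes = "id=abc-123;id=xyz".getBytes(StandardCharsets.US_ASCII);
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        Pattern pattern = Pattern.compile("[a-z]+-\\d+");
        Matcher matcher = pattern.matcher("");
        ByteCharSequence seq = new ByteCharSequence();
        // Match against bytes 3..10 ("abc-123") without creating a String first.
        boolean matched = matcher.reset(seq.set(buf, 3, 10)).matches();
        System.out.println(matched);
    }
}
```

Reusing one adapter and one Matcher across rows avoids allocating a String for each row. subSequence() and toString() only allocate when a caller such as Matcher.group() asks for the matched text.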



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4644) Allow setting format options in CTAS

2016-05-09 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276633#comment-15276633
 ] 

Paul Rogers commented on DRILL-4644:


Two suggestions:

CREATE TABLE  (type => 'csv', fieldDelimiter => ',') AS ;

That is, if SQL does not currently have syntax that allows parens after a table 
name, then putting the options directly after the table name would be the most 
intuitive. The parens act as delimiters for the list.

Otherwise, if we use a keyword as the delimiter, the parens become unnecessary:

CREATE TABLE  OPTION type => 'csv',
fieldDelimiter => ',' AS ;

The material between OPTION and the next keyword must be key/value pairs 
separated by commas.

I personally prefer the first solution.
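To make the key/value rule concrete, here is a hypothetical Java sketch that splits an option list such as type => 'csv', fieldDelimiter => ',' into an ordered map. It is not Drill's SQL parser, which would handle this in its grammar. It assumes each value is either a single-quoted string, with '' as an escaped quote, or a bare token such as true. The class and method names are invented for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CtasOptionParser {

    // Hypothetical helper: parses "key => value, key => 'quoted, value'" into an ordered map.
    // Single-quoted values may contain commas; '' inside quotes is an escaped single quote.
    static Map<String, String> parseOptions(String text) {
        Map<String, String> options = new LinkedHashMap<>();
        int n = text.length();
        int i = 0;
        while (true) {
            i = skipSpaces(text, i);
            if (i >= n) {
                break;
            }
            int arrow = text.indexOf("=>", i);
            if (arrow < 0) {
                throw new IllegalArgumentException("expected '=>' after position " + i);
            }
            String key = text.substring(i, arrow).trim();
            if (key.isEmpty()) {
                throw new IllegalArgumentException("missing key before position " + arrow);
            }
            i = skipSpaces(text, arrow + 2);

            String value;
            if (i < n && text.charAt(i) == '\'') {
                // Quoted value: read up to the closing quote, turning '' into '.
                StringBuilder sb = new StringBuilder();
                i++;
                while (true) {
                    if (i >= n) {
                        throw new IllegalArgumentException("unterminated string for " + key);
                    }
                    char c = text.charAt(i++);
                    if (c != '\'') {
                        sb.append(c);
                    } else if (i < n && text.charAt(i) == '\'') {
                        sb.append('\'');
                        i++;
                    } else {
                        break;
                    }
                }
                value = sb.toString(); // not trimmed, so ' ' stays a space
                i = skipSpaces(text, i);
                if (i < n && text.charAt(i) != ',') {
                    throw new IllegalArgumentException("expected ',' at position " + i);
                }
            } else {
                // Bare value such as true or 1: read up to the next comma.
                int comma = text.indexOf(',', i);
                int stop = comma < 0 ? n : comma;
                value = text.substring(i, stop).trim();
                i = stop;
            }
            options.put(key, value);
            if (i < n) {
                i++; // consume the ',' separating this pair from the next one
            }
        }
        return options;
    }

    private static int skipSpaces(String s, int i) {
        while (i < s.length() && Character.isWhitespace(s.charAt(i))) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        // The option list from the first proposal above.
        System.out.println(parseOptions("type => 'csv', fieldDelimiter => ','"));
    }
}
```

Keeping quoted values intact matters here because the delimiter in the example is itself a comma. A naive split on ',' would break the list.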

> Allow setting format options in CTAS
> 
>
> Key: DRILL-4644
> URL: https://issues.apache.org/jira/browse/DRILL-4644
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: amit hadke
>Assignee: amit hadke
>Priority: Minor
>
> The user has to set session options (store.format='json') in order to specify 
> how CTAS should store a table.
> Add a new 'STORE AS' option to CTAS that takes optional parameters similar to 
> the format attributes used in table functions.
> For example:
> CREATE TABLE  STORE AS(type => 'csv',
> fieldDelimiter => ',') AS 





[jira] [Commented] (DRILL-4659) Specify, as part of the query, table information: data format (CSV, parquet, JSON, etc.), field delimiter, etc.

2016-05-09 Thread Jason Altekruse (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276627#comment-15276627
 ] 

Jason Altekruse commented on DRILL-4659:


This feature was added last fall. I think we may want to duplicate this 
information in the "Querying Data" section to make it easier to find, but 
the feature is documented here:

https://drill.apache.org/docs/plugin-configuration-basics/#using-the-formats-attributes-as-table-function-parameters

If you would like to see more usage examples or information about the 
feature's development, this was the JIRA for the feature: 
https://issues.apache.org/jira/browse/DRILL-4047
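As a rough illustration of the documented table-function form, the hypothetical helper below assembles a query that sets the data format and field delimiter in the query itself instead of in the storage plugin configuration. The option names (type, fieldDelimiter, extractHeader) are the ones used in the table-function examples elsewhere in this thread. The class name, method name, and file path are placeholders. The helper only escapes single quotes in string values; it rejects paths containing backticks rather than guessing at identifier escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class TableFunctionQuery {

    // Hypothetical helper: builds "select * from table(<workspace>.`<path>`(key => value, ...))".
    // String values become SQL string literals ('' escapes a quote); Boolean/Number values stay bare.
    static String tableFunctionQuery(String workspace, String path, Map<String, Object> options) {
        if (path.indexOf('`') >= 0) {
            // Identifier escaping is out of scope for this sketch, so refuse rather than guess.
            throw new IllegalArgumentException("backticks in the path are not handled: " + path);
        }
        StringJoiner params = new StringJoiner(", ");
        for (Map.Entry<String, Object> e : options.entrySet()) {
            Object v = e.getValue();
            String literal = (v instanceof Boolean || v instanceof Number)
                    ? v.toString()
                    : "'" + v.toString().replace("'", "''") + "'";
            params.add(e.getKey() + " => " + literal);
        }
        return "select * from table(" + workspace + ".`" + path + "`(" + params + "))";
    }

    public static void main(String[] args) {
        // Placeholder path for a pipe-delimited file that has no extension.
        Map<String, Object> opts = new LinkedHashMap<>();
        opts.put("type", "text");
        opts.put("fieldDelimiter", "|");
        opts.put("extractHeader", true);
        System.out.println(tableFunctionQuery("dfs", "/data/no_extension_file", opts));
    }
}
```

The printed query has the same shape as the table-function queries quoted in DRILL-4658: select * from table(dfs.`/data/no_extension_file`(type => 'text', fieldDelimiter => '|', extractHeader => true)).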

> Specify, as part of the query, table information: data format (CSV, parquet, 
> JSON, etc.), field delimiter, etc.
> ---
>
> Key: DRILL-4659
> URL: https://issues.apache.org/jira/browse/DRILL-4659
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization, SQL Parser
>Reporter: Roger Dielrton
>Priority: Minor
>
> I have a file that I would like to use in a query, and it can have one or 
> more of the following properties:
> * Has no extension ==> Drill is unable to handle it.
> * I know it contains data in CSV format, but the field separator is a 
> non-standard character ==> Drill is unable to parse it (without modifying the 
> storage plugin configuration).
> * Is located in an Amazon S3 bucket ==> I can't rename it.
> * Is large ==> It would be expensive to make a copy of it.
> It would be nice to be able to specify relevant table information as metadata, 
> as part of the "select" query, such as:
> * Data format (CSV, parquet, JSON, etc.)
> * Field delimiter.





[jira] [Commented] (DRILL-3149) TextReader should support multibyte line delimiters

2016-05-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276572#comment-15276572
 ] 

ASF GitHub Bot commented on DRILL-3149:
---

GitHub user arina-ielchiieva opened a pull request:

https://github.com/apache/drill/pull/500

DRILL-3149: TextReader should support multibyte line delimiters



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/arina-ielchiieva/drill DRILL-3149

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/500.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #500


commit d697a377614e76ca2fc10f6f6913be23403f92e3
Author: Arina Ielchiieva 
Date:   2016-04-25T16:15:02Z

DRILL-3149: TextReader should support multibyte line delimiters




> TextReader should support multibyte line delimiters
> ---
>
> Key: DRILL-3149
> URL: https://issues.apache.org/jira/browse/DRILL-3149
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Text & CSV
>Affects Versions: 1.0.0, 1.1.0
>Reporter: Jim Scott
>Assignee: Arina Ielchiieva
>Priority: Minor
> Fix For: Future
>
>
> lineDelimiter in TextFormatConfig doesn't support \r\n as a record 
> delimiter.
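Supporting \r\n means the reader has to match a sequence of bytes instead of a single byte. The sketch below is a hypothetical in-memory illustration of that idea, not Drill's TextReader or the code in the pull request above. The name splitRecords is invented, and the sketch uses a naive byte-by-byte comparison, which is adequate for short delimiters.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MultiByteDelimiter {

    // Hypothetical helper: splits data into records on a delimiter of one or more bytes.
    // A trailing delimiter does not produce an empty final record.
    static List<byte[]> splitRecords(byte[] data, byte[] delimiter) {
        if (delimiter.length == 0) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<byte[]> records = new ArrayList<>();
        int recordStart = 0;
        int i = 0;
        while (i <= data.length - delimiter.length) {
            if (matchesAt(data, i, delimiter)) {
                records.add(Arrays.copyOfRange(data, recordStart, i));
                i += delimiter.length; // skip the whole delimiter, not just its first byte
                recordStart = i;
            } else {
                i++;
            }
        }
        if (recordStart < data.length) {
            records.add(Arrays.copyOfRange(data, recordStart, data.length));
        }
        return records;
    }

    // True if every delimiter byte matches the data starting at pos.
    private static boolean matchesAt(byte[] data, int pos, byte[] delimiter) {
        for (int k = 0; k < delimiter.length; k++) {
            if (data[pos + k] != delimiter[k]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] data = "a,b\r\nc,d\re\r\n".getBytes(StandardCharsets.US_ASCII);
        byte[] crlf = "\r\n".getBytes(StandardCharsets.US_ASCII);
        for (byte[] record : splitRecords(data, crlf)) {
            // A lone \r stays inside the record because only the full \r\n pair ends one.
            System.out.println(new String(record, StandardCharsets.US_ASCII).replace("\r", "\\r"));
        }
    }
}
```

A streaming reader must also handle a delimiter that straddles two buffer refills, for example \r at the end of one buffer and \n at the start of the next. This sketch sidesteps that by assuming the whole input is in memory.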





[jira] [Updated] (DRILL-4659) Specify, as part of the query, table information: data format (CSV, parquet, JSON, etc.), field delimiter, etc.

2016-05-09 Thread Roger Dielrton (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roger Dielrton updated DRILL-4659:
--
Description: 
I have a file that I would like to use in a query, and it can have one or more 
of the following properties:
* Has no extension ==> Drill is unable to handle it.
* I know it contains data in CSV format, but the field separator is a 
non-standard character ==> Drill is unable to parse it (without modifying the 
storage plugin configuration).
* Is located in an Amazon S3 bucket ==> I can't rename it.
* Is large ==> It would be expensive to make a copy of it.

It would be nice to be able to specify relevant table information as metadata, 
as part of the "select" query, such as:
* Data format (CSV, parquet, JSON, etc.)
* Field delimiter.

  was:
I have a file that I would like to use in a query, and it can have one or more 
of the following properties:
* Has no extension ==> Drill is unable to handle it.
* I know it contains data in CSV format, but with a non-standard character as 
the field separator ==> Drill is unable to parse it (without modifying the 
storage plugin configuration).
* Is located in an Amazon S3 bucket ==> I can rename it.
* Is large ==> It would be expensive to make a copy of it.

It would be nice to be able to specify relevant table information as metadata, 
as part of the "select" query, such as:
* Data format (CSV, parquet, JSON, etc.)
* Field delimiter.


> Specify, as part of the query, table information: data format (CSV, parquet, 
> JSON, etc.), field delimiter, etc.
> ---
>
> Key: DRILL-4659
> URL: https://issues.apache.org/jira/browse/DRILL-4659
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization, SQL Parser
>Reporter: Roger Dielrton
>Priority: Minor
>
> I have a file that I would like to use in a query, and it can have one or 
> more of the following properties:
> * Has no extension ==> Drill is unable to handle it.
> * I know it contains data in CSV format, but the field separator is a 
> non-standard character ==> Drill is unable to parse it (without modifying the 
> storage plugin configuration).
> * Is located in an Amazon S3 bucket ==> I can't rename it.
> * Is large ==> It would be expensive to make a copy of it.
> It would be nice to be able to specify relevant table information as metadata, 
> as part of the "select" query, such as:
> * Data format (CSV, parquet, JSON, etc.)
> * Field delimiter.





[jira] [Created] (DRILL-4659) Specify, as part of the query, table information: data format (CSV, parquet, JSON, etc.), field delimiter, etc.

2016-05-09 Thread Roger Dielrton (JIRA)
Roger Dielrton created DRILL-4659:
-

 Summary: Specify, as part of the query, table information: data 
format (CSV, parquet, JSON, etc.), field delimiter, etc.
 Key: DRILL-4659
 URL: https://issues.apache.org/jira/browse/DRILL-4659
 Project: Apache Drill
  Issue Type: Improvement
  Components: Query Planning & Optimization, SQL Parser
Reporter: Roger Dielrton
Priority: Minor


I have a file that I would like to use in a query, and it can have one or more 
of the following properties:
* Has no extension ==> Drill is unable to handle it.
* I know it contains data in CSV format, but with a non-standard character as 
the field separator ==> Drill is unable to parse it (without modifying the 
storage plugin configuration).
* Is located in an Amazon S3 bucket ==> I can rename it.
* Is large ==> It would be expensive to make a copy of it.

It would be nice to be able to specify relevant table information as metadata, 
as part of the "select" query, such as:
* Data format (CSV, parquet, JSON, etc.)
* Field delimiter.





[jira] [Commented] (DRILL-4658) cannot specify tab as a fieldDelimiter in table function

2016-05-09 Thread Vince Gonzalez (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276428#comment-15276428
 ] 

Vince Gonzalez commented on DRILL-4658:
---

Tried, did not work:

{code}
0: jdbc:drill:zk=local> select * from table(dfs.`/Users/vince/data/nyc/mta/bustime/MTA-Bus-Time_.2014-10-31.tsvh`(type => 'text', fieldDelimiter => U&'\0009', extractHeader => true)) limit 10;
May 09, 2016 10:27:36 AM org.apache.calcite.sql.validate.SqlValidatorException 
SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: No match found 
for function signature 
/Users/vince/data/nyc/mta/bustime/MTA-Bus-Time_.2014-10-31.tsvh(type => 
, fieldDelimiter => , extractHeader => )
May 09, 2016 10:27:36 AM org.apache.calcite.runtime.CalciteException 
SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1, column 
25 to line 1, column 157: No match found for function signature 
/Users/vince/data/nyc/mta/bustime/MTA-Bus-Time_.2014-10-31.tsvh(type => 
, fieldDelimiter => , extractHeader => )
Error: VALIDATION ERROR: From line 1, column 25 to line 1, column 157: No match 
found for function signature 
/Users/vince/data/nyc/mta/bustime/MTA-Bus-Time_.2014-10-31.tsvh(type => 
, fieldDelimiter => , extractHeader => )

SQL Query null

[Error Id: 4a26b35e-2235-43f4-859a-91711690d539 on 
ip-10-50-108-143.ec2.internal:31010] (state=,code=0)
{code}

In case anyone happens upon this before it's fixed, I am currently working 
around this by creating a workspace and applying the options to the workspace. 
Since table functions only work with individual files, this is necessary anyway 
when dealing with more than one file. This issue really only slows down initial 
exploration of a single file, as far as I can tell.



> cannot specify tab as a fieldDelimiter in table function
> 
>
> Key: DRILL-4658
> URL: https://issues.apache.org/jira/browse/DRILL-4658
> Project: Apache Drill
>  Issue Type: Bug
>  Components: SQL Parser
>Affects Versions: 1.6.0
> Environment: Mac OS X, Java 8
>Reporter: Vince Gonzalez
>
> I can't specify a tab delimiter in the table function, maybe because it counts 
> the characters rather than interpreting '\t' as a character escape code?
> {code}
> 0: jdbc:drill:zk=local> select columns[0] as a, cast(columns[1] as bigint) as 
> b from table(dfs.tmp.`sample_cast.tsv`(type => 'text', fieldDelimiter => 
> '\t', skipFirstLine => true));
> Error: PARSE ERROR: Expected single character but was String: \t
> table sample_cast.tsv
> parameter fieldDelimiter
> SQL Query null
> [Error Id: 3efa82e1-3810-4d4a-b23c-32d6658dffcf on 172.30.1.144:31010] 
> (state=,code=0)
> {code}
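In standard SQL string literals a backslash is not an escape character, so '\t' most likely reaches the table function as two characters (a backslash and a t), which matches the error message. The hypothetical sketch below shows one way a parameter validator could unescape a few common sequences before enforcing the single-character rule. The class name, the method name, and the set of supported escapes are assumptions, not Drill's behavior.

```java
public class DelimiterEscapes {

    // Hypothetical helper: turns a table-function parameter value into a single delimiter char.
    // Accepts one literal character, a two-character escape (backslash followed by t, n, r or
    // another backslash), or backslash-u followed by exactly four hex digits.
    static char toDelimiterChar(String value) {
        if (value.length() == 1) {
            return value.charAt(0);
        }
        if (value.length() == 2 && value.charAt(0) == '\\') {
            switch (value.charAt(1)) {
                case 't': return '\t';
                case 'n': return '\n';
                case 'r': return '\r';
                case '\\': return '\\';
                default: break;
            }
        }
        if (value.length() == 6 && value.startsWith("\\u")) {
            String hex = value.substring(2);
            // Reject signs and other non-hex characters that Integer.parseInt would otherwise accept.
            boolean allHex = hex.chars().allMatch(c -> Character.digit(c, 16) >= 0);
            if (allHex) {
                return (char) Integer.parseInt(hex, 16);
            }
        }
        throw new IllegalArgumentException("Expected single character but was String: " + value);
    }

    public static void main(String[] args) {
        // "\\t" in Java source is the two characters backslash and t, as typed in the SQL literal '\t'.
        System.out.println((int) toDelimiterChar("\\t"));      // 9, the tab character
        System.out.println((int) toDelimiterChar("\\u0009"));  // also 9
        System.out.println(toDelimiterChar("|"));
    }
}
```

This does not change how the SQL parser reads the literal. It only reinterprets the already-parsed two-character value.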


