[jira] [Resolved] (DRILL-4577) Improve performance for query on INFORMATION_SCHEMA when HIVE is plugged in

2016-05-02 Thread Sean Hsuan-Yi Chu (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Hsuan-Yi Chu resolved DRILL-4577.
--
Resolution: Fixed

Resolved in commit-id: b8f6ebc651445ccecd3e393250f6cd2781fc07e3

> Improve performance for query on INFORMATION_SCHEMA when HIVE is plugged in
> ---
>
> Key: DRILL-4577
> URL: https://issues.apache.org/jira/browse/DRILL-4577
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive
>Reporter: Sean Hsuan-Yi Chu
>Assignee: Sean Hsuan-Yi Chu
> Fix For: 1.7.0
>
>
> A query such as 
> {code}
> select * from INFORMATION_SCHEMA.`TABLES` 
> {code}
> is converted into calls to fetch all tables from storage plugins. 
> When users have Hive, the calls to Hive metadata storage would be: 
> 1) get_table
> 2) get_partitions
> However, the information regarding partitions is not used in this type of 
> query. Besides, a more efficient way to fetch tables is to use the 
> get_multi_table call.
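
The batching idea in the description can be sketched as follows. This is only an illustration, not the code from the patch: `TableBatcher`, `partition`, and the batch size are hypothetical names chosen here. In a real client, each batch would be handed to one bulk metadata call instead of issuing one call per table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of Drill or Hive): splits table names into
// fixed-size batches so each batch can be fetched with a single bulk call.
final class TableBatcher {
    static List<List<String>> partition(List<String> tableNames, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int start = 0; start < tableNames.size(); start += batchSize) {
            int end = Math.min(start + batchSize, tableNames.size());
            // Copy the sub-list so each batch is independent of the input list.
            batches.add(new ArrayList<>(tableNames.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("t1", "t2", "t3", "t4", "t5");
        // With a batch size of 2, five tables need three bulk calls
        // instead of five single-table calls.
        System.out.println(partition(names, 2).size() + " batches");
    }
}
```

The saving comes from the number of round trips: with N tables and a batch size of B, the client issues about N/B requests rather than N.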



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4577) Improve performance for query on INFORMATION_SCHEMA when HIVE is plugged in

2016-05-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268148#comment-15268148
 ] 

ASF GitHub Bot commented on DRILL-4577:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/461




[jira] [Commented] (DRILL-4577) Improve performance for query on INFORMATION_SCHEMA when HIVE is plugged in

2016-05-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267771#comment-15267771
 ] 

ASF GitHub Bot commented on DRILL-4577:
---

Github user jinfengni commented on the pull request:

https://github.com/apache/drill/pull/461#issuecomment-216399417
  
+1.

The patch looks good to me.

Internal performance measurements show an orders-of-magnitude improvement 
for a Hive schema with up to 32k tables.  

 




[jira] [Comment Edited] (DRILL-4577) Improve performance for query on INFORMATION_SCHEMA when HIVE is plugged in

2016-05-02 Thread Sean Hsuan-Yi Chu (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267726#comment-15267726
 ] 

Sean Hsuan-Yi Chu edited comment on DRILL-4577 at 5/2/16 11:34 PM:
---

[~vkorukanti],
Data points regarding Performance:
1. 1k tables in the DB:
Without this patch, it took around 550 seconds; 
With this patch, it took about 3.7 - 4.3 seconds;
2. 32k tables in the DB:
Without this patch, the result does not come back.
With this patch, it took about 83.1 seconds;

It seems to me that the current performance is not acceptable when the number 
of tables is beyond 1k.

With the points here, I think this patch, along with the option, is really 
needed. 
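
For the 1k-table case, the timings above work out to a speedup of roughly 128x to 149x. A small sketch of the arithmetic follows; the class and method names are illustrative only and are not taken from the patch.

```java
// Illustrative arithmetic only: speedup = time before / time after.
final class SpeedupEstimate {
    static double speedup(double beforeSeconds, double afterSeconds) {
        return beforeSeconds / afterSeconds;
    }

    public static void main(String[] args) {
        // 1k tables: about 550 s without the patch, 3.7 to 4.3 s with it.
        System.out.printf("slowest patched run: %.1fx%n", speedup(550.0, 4.3));
        System.out.printf("fastest patched run: %.1fx%n", speedup(550.0, 3.7));
    }
}
```

No ratio can be computed for the 32k-table case, since the unpatched query never returned.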


was (Author: seanhychu):
[~vkorukanti],
Data points regarding Performance:
1. 1k tables in the DB:
Without this patch, it took around 550 seconds; 
With this patch, it took about 3.7 - 4.3 seconds;
2. 32k tables in the DB:
Without this patch, the result does not come back.
With this patch, it took about 83.1 seconds;

With the points here, I think this patch, along with the option, is really 
needed. 



[jira] [Commented] (DRILL-4577) Improve performance for query on INFORMATION_SCHEMA when HIVE is plugged in

2016-05-02 Thread Sean Hsuan-Yi Chu (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267726#comment-15267726
 ] 

Sean Hsuan-Yi Chu commented on DRILL-4577:
--

[~vkorukanti],
Data points regarding Performance:
1. 1k tables in the DB:
Without this patch, it took around 550 seconds; 
With this patch, it took about 3.7 - 4.3 seconds;
2. 32k tables in the DB:
Without this patch, the result does not come back.
With this patch, it took about 83.1 seconds;

With the points here, I think this patch, along with the option, is really 
needed. 



[jira] [Commented] (DRILL-4641) Support for lzo compression

2016-05-02 Thread subbu srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267595#comment-15267595
 ] 

subbu srinivasan commented on DRILL-4641:
-

Jason,
You need to make the following config changes to get it working.

- Modify core-site.xml to include the following. The property specifies the 
list of codecs that will be exposed by the compression interface 
(org.apache.hadoop.io.compress.CompressionCodecFactory and 
org.apache.hadoop.io.compress.CompressionCodec)


<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,com.hadoop.compression.lzo.LzopCodec</value>
</property>



- Download the LZO Java compression jars: lzo-hadoop-1.0.5.jar and 
lzo-core-1.0.5.jar

- Define the extension appropriately in the storage plugin:
  "json": {
    "type": "json",
    "extensions": [
      "lzo"
    ]
  },

This got me going.





> Support for lzo compression
> ---
>
> Key: DRILL-4641
> URL: https://issues.apache.org/jira/browse/DRILL-4641
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Other
>Affects Versions: Future
> Environment: Not specific to platform
>Reporter: subbu srinivasan
>
> Would love support for querying lzo compressed files.





[jira] [Closed] (DRILL-4472) Pushing Filter past Union All fails: DRILL-3257 regressed DRILL-2746 but unit test update break test goal

2016-05-02 Thread Krystal (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krystal closed DRILL-4472.
--

Per Hsuan, this jira is to rewrite a unit test so there is nothing to verify.

> Pushing Filter past Union All fails: DRILL-3257 regressed DRILL-2746 but unit 
> test update break test goal
> -
>
> Key: DRILL-4472
> URL: https://issues.apache.org/jira/browse/DRILL-4472
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Reporter: Jacques Nadeau
>Assignee: Sean Hsuan-Yi Chu
> Fix For: 1.7.0
>
>
> While reviewing DRILL-4467, I discovered this test. 
> https://github.com/apache/drill/blame/master/exec/java-exec/src/test/java/org/apache/drill/TestUnionAll.java#L560
> As you can see, the test name says the test confirms that the filter is 
> pushed below union all. However, the expected result in 
> DRILL-3257 was updated to a plan which doesn't push the in clause below the 
> filter. I'm disabling the test since 4467 happens to remove what becomes a 
> trivial project. However, we really should fix the core problem (a regression 
> of DRILL-2746).





[jira] [Closed] (DRILL-3745) Hive CHAR not supported

2016-05-02 Thread Krystal (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krystal closed DRILL-3745.
--

git.commit.id.abbrev=5705d45

Able to run queries against hive tables with char data type.

> Hive CHAR not supported
> ---
>
> Key: DRILL-3745
> URL: https://issues.apache.org/jira/browse/DRILL-3745
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Nathaniel Auvil
>Assignee: Arina Ielchiieva
>  Labels: doc-impacting
> Fix For: 1.7.0
>
>
> It doesn’t look like Drill 1.1.0 supports the Hive CHAR type?
> In Hive:
> create table development.foo
> (
>   bad CHAR(10)
> );
> And then in sqlline:
> > use `hive.development`;
> > select * from foo;
> Error: PARSE ERROR: Unsupported Hive data type CHAR.
> Following Hive data types are supported in Drill INFORMATION_SCHEMA:
> BOOLEAN, BYTE, SHORT, INT, LONG, FLOAT, DOUBLE, DATE, TIMESTAMP,
> BINARY, DECIMAL, STRING, VARCHAR, LIST, MAP, STRUCT and UNION
> [Error Id: 58bf3940-3c09-4ad2-8f52-d052dffd4b17 on dtpg05:31010] 
> (state=,code=0)
> This was originally found when getting failures trying to connect via JDBC 
> using Squirrel.  We have the Hive plugin enabled with tables using CHAR.





[jira] [Closed] (DRILL-4529) SUM() with windows function result in mismatch nullability

2016-05-02 Thread Krystal (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krystal closed DRILL-4529.
--

git.commit.id.abbrev=5705d45

Verified that bug is fixed.

> SUM() with windows function result in mismatch nullability
> --
>
> Key: DRILL-4529
> URL: https://issues.apache.org/jira/browse/DRILL-4529
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Reporter: Krystal
>Assignee: Sean Hsuan-Yi Chu
>  Labels: limit0
> Fix For: 1.7.0
>
>
> git.commit.id.abbrev=cee5317
> select 
>   sum(1)  over w sum1, 
>   sum(5)  over w sum5,
>   sum(10) over w sum10
> from 
>   j1_v
> where 
>   c_date is not null
> window w as (partition by c_date);
> Output from test:
> limit 0: [columnNoNulls, columnNoNulls, columnNoNulls]
> regular: [columnNullable, columnNullable, columnNullable]





[jira] [Commented] (DRILL-3894) Directory functions (MaxDir, MinDir ..) should have optional filename parameter

2016-05-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267069#comment-15267069
 ] 

ASF GitHub Bot commented on DRILL-3894:
---

Github user parthchandra commented on the pull request:

https://github.com/apache/drill/pull/467#issuecomment-216307896
  
Looks good. +1


> Directory functions (MaxDir, MinDir ..) should have optional filename 
> parameter
> ---
>
> Key: DRILL-3894
> URL: https://issues.apache.org/jira/browse/DRILL-3894
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Affects Versions: 1.2.0
>Reporter: Neeraja
>Assignee: Vitalii Diravka
>
> https://drill.apache.org/docs/query-directory-functions/
> The directory functions documented above should allow the second parameter 
> (file name) to be optional.





[jira] [Commented] (DRILL-4573) Zero copy LIKE, REGEXP_MATCHES, SUBSTR

2016-05-02 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266840#comment-15266840
 ] 

Jacques Nadeau commented on DRILL-4573:
---

When we started working on functions, immutability wasn't a constraint, so we 
may have a few functions around that don't follow this rule. The goal is to go 
back and correct those, but make sure we don't add any new mutations.

> Zero copy LIKE, REGEXP_MATCHES, SUBSTR
> --
>
> Key: DRILL-4573
> URL: https://issues.apache.org/jira/browse/DRILL-4573
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: jean-claude
>Priority: Minor
> Fix For: 1.7.0
>
> Attachments: DRILL-4573-3.patch.txt, DRILL-4573.patch.txt
>
>
> All the functions using the java.util.regex.Matcher are currently creating 
> Java string objects to pass into the matcher.reset().
> However, this creates an unnecessary copy of the bytes and a Java string object.
> The matcher uses a CharSequence, so instead of making a copy we can create an 
> adapter from the DrillBuffer to the CharSequence interface.
> Gains of 25% in execution speed are possible when going over VARCHAR of 36 
> chars. The gain will be proportional to the size of the VARCHAR.
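
The adapter idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the attached patch: `java.nio.ByteBuffer` stands in for the Drill buffer, the class name `AsciiBufferCharSequence` is hypothetical, and the data is assumed to be single-byte (ASCII). Multi-byte UTF-8 would need real decoding.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical adapter: exposes a byte range as a CharSequence without copying.
// Assumes one byte per character (ASCII); ByteBuffer stands in for the Drill buffer.
final class AsciiBufferCharSequence implements CharSequence {
    private final ByteBuffer buffer;
    private final int start;
    private final int end;

    AsciiBufferCharSequence(ByteBuffer buffer, int start, int end) {
        this.buffer = buffer;
        this.start = start;
        this.end = end;
    }

    @Override
    public int length() {
        return end - start;
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= length()) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        // Absolute read from the underlying buffer; no intermediate String.
        return (char) (buffer.get(start + index) & 0xFF);
    }

    @Override
    public CharSequence subSequence(int from, int to) {
        if (from < 0 || to > length() || from > to) {
            throw new IndexOutOfBoundsException("range: " + from + ".." + to);
        }
        // A sub-range is just another view over the same bytes.
        return new AsciiBufferCharSequence(buffer, start + from, start + to);
    }

    @Override
    public String toString() {
        // A String is only materialized when a caller explicitly asks for one.
        StringBuilder sb = new StringBuilder(length());
        for (int i = 0; i < length(); i++) {
            sb.append(charAt(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.wrap("abc-123".getBytes(StandardCharsets.US_ASCII));
        Matcher matcher = Pattern.compile("[a-z]+-\\d+").matcher("");
        // reset(CharSequence) lets one Matcher be reused for every value.
        matcher.reset(new AsciiBufferCharSequence(buf, 0, buf.limit()));
        System.out.println(matcher.matches());
    }
}
```

Because `Matcher.reset` accepts any `CharSequence`, the matcher reads bytes directly through `charAt` and no per-row `String` is allocated.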





[jira] [Assigned] (DRILL-3149) TextReader should support multibyte line delimiters

2016-05-02 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-3149:
---

Assignee: Arina Ielchiieva

> TextReader should support multibyte line delimiters
> ---
>
> Key: DRILL-3149
> URL: https://issues.apache.org/jira/browse/DRILL-3149
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Text & CSV
>Affects Versions: 1.0.0, 1.1.0
>Reporter: Jim Scott
>Assignee: Arina Ielchiieva
>Priority: Minor
> Fix For: Future
>
>
> lineDelimiter in the TextFormatConfig doesn't support \r\n for record 
> delimiters.
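
To illustrate what multibyte delimiter support involves, here is a minimal sketch that splits a byte array on an arbitrary delimiter sequence such as \r\n. It is not Drill's TextReader; `MultiByteLineSplitter` and its methods are hypothetical names, and the whole input is assumed to fit in memory.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical record splitter: the delimiter is a byte sequence of any
// length (for example "\r\n"), not a single byte.
final class MultiByteLineSplitter {
    static List<String> split(byte[] data, byte[] delimiter) {
        if (delimiter.length == 0) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> records = new ArrayList<>();
        int recordStart = 0;
        int i = 0;
        while (i + delimiter.length <= data.length) {
            if (matchesAt(data, i, delimiter)) {
                records.add(new String(data, recordStart, i - recordStart, StandardCharsets.UTF_8));
                i += delimiter.length;   // skip the whole delimiter
                recordStart = i;
            } else {
                i++;
            }
        }
        // Bytes after the last delimiter form the final record.
        if (recordStart < data.length) {
            records.add(new String(data, recordStart, data.length - recordStart, StandardCharsets.UTF_8));
        }
        return records;
    }

    // True when every delimiter byte matches the data starting at offset.
    private static boolean matchesAt(byte[] data, int offset, byte[] delimiter) {
        for (int j = 0; j < delimiter.length; j++) {
            if (data[offset + j] != delimiter[j]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] data = "a,b\r\nc,d\r\ne".getBytes(StandardCharsets.UTF_8);
        byte[] delimiter = "\r\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(split(data, delimiter));
    }
}
```

A lone \r inside a record is kept as data, because only the full delimiter sequence ends a record.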





[jira] [Commented] (DRILL-3474) Add implicit file columns support

2016-05-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266717#comment-15266717
 ] 

ASF GitHub Bot commented on DRILL-3474:
---

GitHub user arina-ielchiieva opened a pull request:

https://github.com/apache/drill/pull/491

DRILL-3474: Add implicit file columns support



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/arina-ielchiieva/drill DRILL-3474

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/491.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #491


commit aff44f3fafa35d946115899f9e9941e2b2af22d6
Author: Arina Ielchiieva 
Date:   2016-04-18T16:36:52Z

DRILL-3474: Add implicit file columns support




> Add implicit file columns support
> -
>
> Key: DRILL-3474
> URL: https://issues.apache.org/jira/browse/DRILL-3474
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Affects Versions: 1.1.0
>Reporter: Jim Scott
>Assignee: Arina Ielchiieva
> Fix For: Future
>
>
> I could not find another ticket which talks about this ...
> The file name should be a column which can be selected or filtered when 
> querying a directory just like dir0, dir1 are available.





[jira] [Commented] (DRILL-4573) Zero copy LIKE, REGEXP_MATCHES, SUBSTR

2016-05-02 Thread jean-claude (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266546#comment-15266546
 ] 

jean-claude commented on DRILL-4573:


ok, I've made the change.



[jira] [Updated] (DRILL-4573) Zero copy LIKE, REGEXP_MATCHES, SUBSTR

2016-05-02 Thread jean-claude (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jean-claude updated DRILL-4573:
---
Attachment: DRILL-4573-3.patch.txt



[jira] [Updated] (DRILL-4573) Zero copy LIKE, REGEXP_MATCHES, SUBSTR

2016-05-02 Thread jean-claude (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jean-claude updated DRILL-4573:
---
Attachment: (was: DRILL-4573-3.patch.txt)



[jira] [Commented] (DRILL-4573) Zero copy LIKE, REGEXP_MATCHES, SUBSTR

2016-05-02 Thread jean-claude (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266437#comment-15266437
 ] 

jean-claude commented on DRILL-4573:


ok. I had it as a copy then saw the REPLACE function doing out.buffer = 
text.buffer, but now I see that text is a constant buffer. I'll change it back.



[jira] [Commented] (DRILL-3878) Support XML Querying (selects/projections, no writing)

2016-05-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266184#comment-15266184
 ] 

ASF GitHub Bot commented on DRILL-3878:
---

Github user magpierre closed the pull request at:

https://github.com/apache/drill/pull/451


> Support XML Querying (selects/projections, no writing)
> --
>
> Key: DRILL-3878
> URL: https://issues.apache.org/jira/browse/DRILL-3878
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: Future
>Reporter: Edmon Begoli
>  Labels: features
> Fix For: 1.7.0
>
>   Original Estimate: 3,360h
>  Remaining Estimate: 3,360h
>
> Support querying of XML documents as read-only selects. (Writing should be 
> implemented as a different feature that brings its own set of challenges.)
> We should consider reading trivial, schema-less XML documents, 
> DTD-oriented ones, and schema-defined ones.
> Also, we should consider direct querying vs. using converter tools to change 
> the representation from XML to JSON, CSV, etc.
> Design and Implementation discussion, notes, ideas and implementation 
> suggestions should be captured here:
> https://docs.google.com/document/d/1oS-cObSaTlAmuW_XghDLmHbBEorLl0z-axaHnjy7vg0/edit?usp=sharing
>  
> (no vandalism, please)





[jira] [Commented] (DRILL-3806) add metadata for untyped null and simple type promotion

2016-05-02 Thread Joris Gillis (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266171#comment-15266171
 ] 

Joris Gillis commented on DRILL-3806:
-

I am experiencing the same issue. Any progress in the meantime?

> add metadata for untyped null and simple type promotion
> ---
>
> Key: DRILL-3806
> URL: https://issues.apache.org/jira/browse/DRILL-3806
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Reporter: Julien Le Dem
> Fix For: Future
>
>
> Currently, when a field has literal null values in JSON, the type will be 
> assigned as BIGINT by default for lack of a better type.
> ```
> {
>   "a": null
> }
> ```
> If later on "a" is assigned a string value, the query will fail with a 
> schema change error.
> The idea is to capture the notion of "untyped null" and implement simple type 
> promotion from untyped null to the actual type.
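
The promotion rule proposed above can be sketched as a small merge function. This is an assumption-laden illustration, not Drill's type system: `NullPromotion`, `FieldType`, and `merge` are hypothetical names, and only three types are modeled.

```java
// Hypothetical model of the proposal: a field first seen as null gets the
// placeholder type UNTYPED_NULL, which is promoted to the first concrete type seen.
final class NullPromotion {
    enum FieldType { UNTYPED_NULL, BIGINT, VARCHAR }

    static FieldType merge(FieldType current, FieldType observed) {
        if (current == FieldType.UNTYPED_NULL) {
            return observed;            // promote: null carries no type constraint
        }
        if (observed == FieldType.UNTYPED_NULL || observed == current) {
            return current;             // a later null does not change the type
        }
        // Two different concrete types are still a genuine schema change.
        throw new IllegalStateException("schema change: " + current + " -> " + observed);
    }

    public static void main(String[] args) {
        // {"a": null} followed by {"a": "text"} resolves to VARCHAR instead of failing.
        System.out.println(merge(FieldType.UNTYPED_NULL, FieldType.VARCHAR));
    }
}
```

Under the current behavior described in the issue, the null would already have been fixed as BIGINT, so the later string value would fall into the schema change branch.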


