[GitHub] drill pull request #507: DRILL-4690: CORS in REST API

2016-07-22 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/507


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #545: DRILL-4746: Verification Failures (Decimal values) ...

2016-07-22 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/545




[GitHub] drill pull request #537: DRILL-4695: Log error thrown out of drillbit.run be...

2016-07-22 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/537




[GitHub] drill pull request #426: DRILL-4499: Remove 17 unused classes

2016-07-22 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/426




[GitHub] drill pull request #541: DRILL-4673: Implement "DROP TABLE IF EXISTS" for dr...

2016-07-22 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/541




[DISCUSS] Parquet file read performance

2016-07-22 Thread Parth Chandra
Hi Everyone,

  I posted a JIRA [1] about the Parquet file reader not reading from the
file system as fast as it potentially could.

  I also included a proposal [2] to address the issue. Feedback would be
highly appreciated.

Thanks

Parth

[1] https://issues.apache.org/jira/browse/DRILL-4800
[2]
https://docs.google.com/document/d/1FK2LWlazgSLWa_5_WDyt52lYATu8m6UWaWhr591R3ZI/edit?usp=sharing


Changes in Launch Scripts

2016-07-22 Thread Sudheesh Katkam
Hi all,

I just committed DRILL-4581 [1] that changes launch scripts.

The patch should be backward compatible. This email is just an FYI to start
using the new style of drill-env.sh file. The major usability change is
that Drill defaults have been moved from conf/drill-env.sh to
bin/drill-config.sh; changes to variables in drill-env.sh will override the
defaults.

See the ticket for the full list of changes.

Thank you,
Sudheesh

[1] https://issues.apache.org/jira/browse/DRILL-4581


[GitHub] drill pull request #547: Drill-4581: Extensive revisions to the Drill launch...

2016-07-22 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/547




[jira] [Resolved] (DRILL-4743) HashJoin's not fully parallelized in query plan

2016-07-22 Thread Aman Sinha (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aman Sinha resolved DRILL-4743.
---
   Resolution: Fixed
Fix Version/s: 1.8.0

Merged [~gparai]'s fix in commit #: 4dac103

> HashJoin's not fully parallelized in query plan
> ---
>
> Key: DRILL-4743
> URL: https://issues.apache.org/jira/browse/DRILL-4743
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.5.0
> Reporter: Gautam Kumar Parai
> Assignee: Gautam Kumar Parai
> Labels: doc-impacting
> Fix For: 1.8.0
>
>
> The underlying problem is filter selectivity under-estimate for a query with 
> complicated predicates e.g. deeply nested and/or predicates. This leads to 
> under parallelization of the major fragment doing the join. 
> To really resolve this problem we need table/column statistics to correctly 
> estimate the selectivity. However, in the absence of statistics OR even when 
> existing statistics are insufficient to get a correct estimate of selectivity 
> this will serve as a workaround.
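
To illustrate the under-estimate described in the ticket, here is a minimal Java sketch. It assumes, purely for illustration, that each conjunct gets a fixed default selectivity of 0.15 and that the planner multiplies the conjuncts as if they were independent. The class name, the constant, and the `minFactor` floor are hypothetical; they are not Drill's actual code or option names.

```java
// Hypothetical sketch: multiplying per-predicate selectivities can collapse
// the row estimate, and a lower bound on selectivity limits the damage.
final class SelectivitySketch {
    // Assumed default guess for one predicate when no statistics exist.
    static final double DEFAULT_PREDICATE_SELECTIVITY = 0.15;

    // Naive estimate: treat the conjuncts as independent and multiply.
    static double naiveSelectivity(int conjunctCount) {
        return Math.pow(DEFAULT_PREDICATE_SELECTIVITY, conjunctCount);
    }

    // Workaround: never let the estimate drop below a configured floor.
    static double boundedSelectivity(int conjunctCount, double minFactor) {
        return Math.max(naiveSelectivity(conjunctCount), minFactor);
    }

    // Estimated output rows of the filter, never less than one row.
    static long estimatedRows(long inputRows, double selectivity) {
        return Math.max(1L, Math.round(inputRows * selectivity));
    }

    public static void main(String[] args) {
        long inputRows = 100_000_000L;
        int conjuncts = 8;
        System.out.println("naive rows:   "
            + estimatedRows(inputRows, naiveSelectivity(conjuncts)));
        System.out.println("bounded rows: "
            + estimatedRows(inputRows, boundedSelectivity(conjuncts, 0.05)));
    }
}
```

A tiny row estimate feeding the join is what would lead a planner to assign few fragments to it; the floor keeps the estimate large enough to justify more parallelism, at the cost of sometimes over-estimating.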



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] drill pull request #534: [DRILL-4743] HashJoin's not fully parallelized in q...

2016-07-22 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/534




[jira] [Resolved] (DRILL-4783) Flatten on CONVERT_FROM fails with ClassCastException if resultset is empty

2016-07-22 Thread Jinfeng Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinfeng Ni resolved DRILL-4783.
---
   Resolution: Fixed
Fix Version/s: 1.8.0

merged in commit: 04964bbf816746ffaefbe58bf9ddedbff54e0f69

> Flatten on CONVERT_FROM fails with ClassCastException if resultset is empty
> ---
>
> Key: DRILL-4783
> URL: https://issues.apache.org/jira/browse/DRILL-4783
> Project: Apache Drill
> Issue Type: Bug
> Reporter: Chunhui Shi
> Assignee: Chunhui Shi
> Priority: Critical
> Fix For: 1.8.0
>
>
> Flatten failed to work on top of convert_from when the resultset is empty. 
> For a HBase table like this:
> 0: jdbc:drill:zk=localhost:5181> select convert_from(t.address.cities,'json') 
> from hbase.`/tmp/flattentest` t;
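> +---------------------------------------------------------------------------------+
> |                                     EXPR$0                                      |
> +---------------------------------------------------------------------------------+
> | {"list":[{"city":"SunnyVale"},{"city":"Palo Alto"},{"city":"Mountain View"}]}   |
> | {"list":[{"city":"Seattle"},{"city":"Bellevue"},{"city":"Renton"}]}             |
> | {"list":[{"city":"Minneapolis"},{"city":"Falcon Heights"},{"city":"San Paul"}]} |
> +---------------------------------------------------------------------------------+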
> Flatten works when row_key is in (1,2,3)
> 0: jdbc:drill:zk=localhost:5181> select flatten(t1.json.list) from (select 
> convert_from(t.address.cities,'json') json from hbase.`/tmp/flattentest` t 
> where row_key=1) t1;
> +--------------------------+
> |          EXPR$0          |
> +--------------------------+
> | {"city":"SunnyVale"}     |
> | {"city":"Palo Alto"}     |
> | {"city":"Mountain View"} |
> +--------------------------+
> But Flatten throws exception if the resultset is empty
> 0: jdbc:drill:zk=localhost:5181> select flatten(t1.json.list) from (select 
> convert_from(t.address.cities,'json') json from hbase.`/tmp/flattentest` t 
> where row_key=4) t1;
> Error: SYSTEM ERROR: ClassCastException: Cannot cast 
> org.apache.drill.exec.vector.NullableIntVector to 
> org.apache.drill.exec.vector.complex.RepeatedValueVector
> Fragment 0:0
> [Error Id: 07fd0cab-d1e6-4259-bfec-ad80f02d93a2 on atsqa4-127.qa.lab:31010] 
> (state=,code=0)





[jira] [Created] (DRILL-4801) Setting extractHeader attribute for CSV format does not propagate to all drillbits

2016-07-22 Thread Krystal (JIRA)
Krystal created DRILL-4801:
--

 Summary: Setting extractHeader attribute for CSV format does not 
propagate to all drillbits 
 Key: DRILL-4801
 URL: https://issues.apache.org/jira/browse/DRILL-4801
 Project: Apache Drill
  Issue Type: Bug
  Components: Client - CLI, Client - HTTP
Reporter: Krystal


I have multiple drillbits running. From the web UI of one drillbit, I added 
"extractHeader": true to the csv format. I logged in to the web UI of a different 
drillbit and did not see the added attribute.

I tried the same for the TSV format and that worked as expected: the change was 
propagated to all drillbits. 





[GitHub] drill issue #514: DRILL-4694: CTAS in JSON format produces extraneous NULL f...

2016-07-22 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/514
  
Merged in 6286c0a




[GitHub] drill pull request #514: DRILL-4694: CTAS in JSON format produces extraneous...

2016-07-22 Thread parthchandra
Github user parthchandra closed the pull request at:

https://github.com/apache/drill/pull/514




[GitHub] drill issue #534: [DRILL-4743] HashJoin's not fully parallelized in query pl...

2016-07-22 Thread amansinha100
Github user amansinha100 commented on the issue:

https://github.com/apache/drill/pull/534
  
Overall LGTM +1. 




[GitHub] drill pull request #534: [DRILL-4743] HashJoin's not fully parallelized in q...

2016-07-22 Thread amansinha100
Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/534#discussion_r71942925
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/compile/QueryClassLoader.java
 ---
@@ -44,8 +44,8 @@
   public static final String JAVA_COMPILER_OPTION = "exec.java_compiler";
   public static final StringValidator JAVA_COMPILER_VALIDATOR = new 
StringValidator(JAVA_COMPILER_OPTION, CompilerPolicy.DEFAULT.toString()) {
 @Override
-public void validate(OptionValue v) {
-  super.validate(v);
+public void validate(OptionValue v, OptionManager manager) {
--- End diff --

final ?




[GitHub] drill pull request #534: [DRILL-4743] HashJoin's not fully parallelized in q...

2016-07-22 Thread amansinha100
Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/534#discussion_r71942865
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/rpc/user/InboundImpersonationManager.java
 ---
@@ -90,8 +91,8 @@ public InboundImpersonationPolicyValidator(String name, 
String def) {
 }
 
 @Override
-public void validate(OptionValue v) {
-  super.validate(v);
+public void validate(OptionValue v, final OptionManager manager) {
--- End diff --

final OptionValue ?




[GitHub] drill pull request #552: [Drill-3710] New option for the IN LIST size to con...

2016-07-22 Thread gparai
Github user gparai commented on a diff in the pull request:

https://github.com/apache/drill/pull/552#discussion_r71934130
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/TestPartitionFilter.java ---
@@ -376,4 +376,14 @@ public void testPartitionFilterWithLike() throws 
Exception {
 testIncludeFilter(query4, 4, "Filter", 16);
   }
 
+  @Test //DRILL-3710 Partition pruning should occur with varying IN-LIST 
size
+  public void testPartitionFilterWithInSubquery() throws Exception {
+String query = String.format("select * from 
dfs_test.`%s/multilevel/parquet` where cast (dir0 as int) IN (1994, 1994, 1994, 
1994, 1994, 1994)", TEST_RES_PATH);
+/* In list size exceeds threshold - no partition pruning since 
predicate converted to join */
+test("alter session set `planner.in_subquery_threshold` = 2");
--- End diff --

A bug could cause us to not obey the option at all, i.e. we would always do 
partition pruning regardless of the option setting. This unit test checks that we do 
obey the option.
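
The behaviour this test guards can be sketched in a few lines of Java. This is an illustration only: the class, the enum, and the exact boundary comparison (`size < threshold` keeps the filter) are assumptions, not Drill or Calcite code. The only facts taken from the thread are that an IN list exceeding the threshold is converted to a join, that no partition pruning happens in that case, and that the default threshold is 20.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of how an IN-list threshold decides the plan shape.
final class InListThresholdSketch {
    enum Plan { FILTER, JOIN }

    // Assumption: lists shorter than the threshold stay as a filter
    // predicate; longer lists are rewritten into a join.
    static Plan planFor(List<Integer> inList, long threshold) {
        return inList.size() < threshold ? Plan.FILTER : Plan.JOIN;
    }

    // Per the review comment, pruning is only expected when the predicate
    // is still a filter.
    static boolean partitionPruningExpected(Plan plan) {
        return plan == Plan.FILTER;
    }

    public static void main(String[] args) {
        List<Integer> inList = Arrays.asList(1994, 1994, 1994, 1994, 1994, 1994);
        for (long threshold : new long[] {2L, 20L}) {
            Plan plan = planFor(inList, threshold);
            System.out.println("threshold=" + threshold + " plan=" + plan
                + " pruning=" + partitionPruningExpected(plan));
        }
    }
}
```

With the six-value list from the test, a threshold of 2 takes the join path and a threshold of 20 keeps the filter, which is the contrast the unit test relies on.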




[DISCUSS] New Feature: Kerberos Authentication

2016-07-22 Thread Sudheesh Katkam
Hi all,

I plan to work on DRILL-4280: Kerberos Authentication for Clients [1]. The
design document [2] is attached to the ticket. Please read and comment!

Thank you,
Sudheesh

[1] https://issues.apache.org/jira/browse/DRILL-4280
[2]
https://docs.google.com/document/d/1qSBV2Hi3KwaDFADZJm9me2Nq4RqnKBsyejxunFgmdDo


[jira] [Created] (DRILL-4800) Improve parquet reader performance

2016-07-22 Thread Parth Chandra (JIRA)
Parth Chandra created DRILL-4800:


 Summary: Improve parquet reader performance
 Key: DRILL-4800
 URL: https://issues.apache.org/jira/browse/DRILL-4800
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Parth Chandra


Reported by a user in the field - 

We're generally getting read speeds of about 100-150 MB/s/node on the PARQUET scan 
operator. This seems a little low given the number of drives on the node (24). 
We're looking for options to improve the performance of this operator, as 
most of our queries are I/O bound. 







[GitHub] drill issue #552: [Drill-3710] New option for the IN LIST size to convert in...

2016-07-22 Thread amansinha100
Github user amansinha100 commented on the issue:

https://github.com/apache/drill/pull/552
  
A couple of minor comments. Overall +1.




[GitHub] drill pull request #552: [Drill-3710] New option for the IN LIST size to con...

2016-07-22 Thread sudheeshkatkam
Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/552#discussion_r71925119
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PlannerSettings.java
 ---
@@ -216,6 +218,10 @@ public boolean isTypeInferenceEnabled() {
 return options.getOption(TYPE_INFERENCE);
   }
 
+  public long getInSubqueryThreshold() {
+return 
options.getOption(IN_SUBQUERY_THRESHOLD.getOptionName()).num_val;
--- End diff --

Change the declaration to: `public static final LongValidator 
IN_SUBQUERY_THRESHOLD ...`

and here, `return options.getOption(IN_SUBQUERY_THRESHOLD);`
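
A self-contained Java sketch of why the narrower declaration type helps. All classes below are hypothetical stand-ins written for this example; they only mimic the idea of an option lookup that is overloaded on the validator type, and they are not Drill's real `OptionManager` or validator classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical mini option framework illustrating a typed lookup.
final class TypedOptionSketch {
    static class OptionValidator {
        private final String name;
        OptionValidator(String name) { this.name = name; }
        String getOptionName() { return name; }
    }

    // Marker subtype: options validated by this class hold a long.
    static final class LongValidator extends OptionValidator {
        LongValidator(String name) { super(name); }
    }

    static final class OptionValue {
        final Long numVal;
        OptionValue(Long numVal) { this.numVal = numVal; }
    }

    static final class Options {
        private final Map<String, OptionValue> values = new HashMap<>();

        void set(String name, long value) { values.put(name, new OptionValue(value)); }

        // Untyped lookup: the caller must know which field to read.
        OptionValue getOption(String name) { return values.get(name); }

        // Typed lookup: the validator type selects the overload and the
        // return type, so call sites need no field access.
        long getOption(LongValidator validator) {
            return getOption(validator.getOptionName()).numVal;
        }
    }

    // Declared with the specific type so the typed overload applies.
    static final LongValidator IN_SUBQUERY_THRESHOLD =
        new LongValidator("planner.in_subquery_threshold");

    public static void main(String[] args) {
        Options options = new Options();
        options.set(IN_SUBQUERY_THRESHOLD.getOptionName(), 20L);
        long threshold = options.getOption(IN_SUBQUERY_THRESHOLD);
        System.out.println("threshold = " + threshold);
    }
}
```

If the constant were declared with the base validator type instead, the typed overload would not be selected at compile time and the call site would fall back to the name-based lookup plus a field read.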




[GitHub] drill pull request #549: DRILL-4682: Allow full schema identifier in SELECT ...

2016-07-22 Thread julianhyde
Github user julianhyde commented on a diff in the pull request:

https://github.com/apache/drill/pull/549#discussion_r71924583
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/parser/DrillCompoundIdentifier.java
 ---
@@ -69,31 +70,38 @@ public void addIndex(int index, SqlParserPos pos){
 }
   }
 
-  public SqlNode getAsSqlNode(){
-if(ids.size() == 1){
+  public SqlNode getAsSqlNode(Set fullSchemasSet) 
{
--- End diff --

Calcite has a concept of a "namespace" that abstracts what columns are 
available in a table or sub-query. I think you should be using that rather than 
looking at the structure of the parse tree.

There's a lot of code here, and it seems to duplicate (in a less general 
way) what is being done in Calcite. It's technical debt, and let me explain how 
it will bite Drill. I am working right now on 
https://issues.apache.org/jira/browse/CALCITE-1208 and making some significant 
changes to how name-resolution works. When I check in CALCITE-1208 there's a 
strong chance that the code in this PR will break, and as a result, it will 
take Drill even longer to get back onto Calcite master branch.




[GitHub] drill pull request #552: [Drill-3710] New option for the IN LIST size to con...

2016-07-22 Thread amansinha100
Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/552#discussion_r71924384
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PlannerSettings.java
 ---
@@ -83,6 +83,8 @@
 
   public static final String TYPE_INFERENCE_KEY = 
"planner.enable_type_inference";
   public static final BooleanValidator TYPE_INFERENCE = new 
BooleanValidator(TYPE_INFERENCE_KEY, true);
+  public static final OptionValidator IN_SUBQUERY_THRESHOLD =
+  new PositiveLongValidator("planner.in_subquery_threshold", 
Integer.MAX_VALUE, 20);
--- End diff --

You should add a comment here noting that the default value of 20 was chosen to 
match the default Calcite value.




[GitHub] drill pull request #552: [Drill-3710] New option for the IN LIST size to con...

2016-07-22 Thread amansinha100
Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/552#discussion_r71923747
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/TestPartitionFilter.java ---
@@ -376,4 +376,14 @@ public void testPartitionFilterWithLike() throws 
Exception {
 testIncludeFilter(query4, 4, "Filter", 16);
   }
 
+  @Test //DRILL-3710 Partition pruning should occur with varying IN-LIST 
size
+  public void testPartitionFilterWithInSubquery() throws Exception {
+String query = String.format("select * from 
dfs_test.`%s/multilevel/parquet` where cast (dir0 as int) IN (1994, 1994, 1994, 
1994, 1994, 1994)", TEST_RES_PATH);
+/* In list size exceeds threshold - no partition pruning since 
predicate converted to join */
+test("alter session set `planner.in_subquery_threshold` = 2");
--- End diff --

Not sure if it is necessary to check the no-partition-pruning case.  
Basically, the goal of the test is to see if partition pruning works with large 
IN lists. 




[GitHub] drill pull request #549: DRILL-4682: Allow full schema identifier in SELECT ...

2016-07-22 Thread vdiravka
Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/549#discussion_r71921608
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/parser/DrillCompoundIdentifier.java
 ---
@@ -69,31 +70,38 @@ public void addIndex(int index, SqlParserPos pos){
 }
   }
 
-  public SqlNode getAsSqlNode(){
-if(ids.size() == 1){
+  public SqlNode getAsSqlNode(Set fullSchemasSet) 
{
--- End diff --

Are you asking for a clarification in a GitHub comment, or for a code comment?
Calcite parses a query with the schema name in the column name correctly. Then 
Drill reconverts the SqlNode to support complex types 
[DRILL-932](https://issues.apache.org/jira/browse/DRILL-932). 
But that conversion was hardcoded: it handled everything after the first two 
identifiers as an item operator (for the column name 
`cp.employee.json.employee_id` in the select clause, the SqlNode was 
`cp.employee.json['department_id']` instead of 
`cp.employee.json.department_id`).
I added the case where a full schema identifier is used in the select clause, so 
that everything after the first two identifiers (not counting the schema 
identifier) is handled as an item operator. I did this by comparing the column 
name identifiers with the full schema identifiers from the `from` and `join` 
clauses.




[GitHub] drill pull request #549: DRILL-4682: Allow full schema identifier in SELECT ...

2016-07-22 Thread jinfengni
Github user jinfengni commented on a diff in the pull request:

https://github.com/apache/drill/pull/549#discussion_r71918817
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/parser/CompoundIdentifierConverter.java
 ---
@@ -115,6 +119,18 @@ public SqlNode visitChild(
   enableComplex = true;
 }
   }
+  if (expr.getKind() == SqlKind.SELECT) {
+if (((SqlSelect) expr).getFrom() instanceof 
DrillCompoundIdentifier) {
+  fullSchemasSet.add((DrillCompoundIdentifier) ((SqlSelect) 
expr).getFrom());
+} else if (((SqlSelect) expr).getFrom() instanceof SqlJoin) {
--- End diff --

It seems you are adding the schema-qualified table identifier. Will this logic 
work for nested subqueries? Normally, name resolution happens within a 
certain name scope. It's not clear to me that adding all the table identifiers 
to this "fullSchemasSet" will work for name-scope-based resolution.

Can you try some queries with nested subqueries that use different 
schema-qualified tables?




Re: Dynamic UDFs support

2016-07-22 Thread Keys Botzum
No disagreement on deferral, but I raised my initial concern precisely because 
I'm concerned about the practicality of the "restart the cluster" option. I 
cited my concerns about laptops and development clusters. I was wondering if 
there might be some small things Drill could do to help. If there is nothing 
that can be done to make this easier, so be it, but I think that's going to be 
a big impediment.

Keys
___
Keys Botzum 
Senior Principal Technologist
kbot...@maprtech.com 
443-718-0098
MapR Technologies 
http://www.mapr.com 
> On Jul 22, 2016, at 1:37 AM, Neeraja Rentachintala 
>  wrote:
> 
> It seems like we are reaching a conclusion here in terms of starting with a
> simpler implementation, i.e. being able to deploy UDFs dynamically without
> Drillbit restarts, based on jars in a DFS location. Dropping functions
> dynamically is out of scope for version 1 of this feature (we assume
> development of UDFs is happening on a user laptop or a dev cluster where it's
> OK to restart).
> 
> -Neeraja
> 
> On Thu, Jul 21, 2016 at 11:56 AM, Keys Botzum  wrote:
> 
>> Recognize the difficulty. Not suggesting this be addressed in the first
>> version. Just suggesting some thought about how a real user will
>> work around it. Maybe some doc and/or small changes can make this easier.
>> 
>> Keys
>> ___
>> Keys Botzum
>> Senior Principal Technologist
>> kbot...@maprtech.com
>> 443-718-0098
>> MapR Technologies
>> http://www.mapr.com
>> On Jul 21, 2016 1:45 PM, "Paul Rogers"  wrote:
>> 
>>> Hi All,
>>> 
>>> Adding a dynamic DROP would, of course, be a great addition! The reason
>>> for suggesting we skip that was to control project scope.
>>> 
>>> Dynamic DROP requires a synchronization step. Here’s the scenario:
>>> 
>>> * Foreman A starts a query using UDF U.
>>> * Foreman B receives a request to drop UDF U, followed by a request to add
>>> a new version of U, U’.
>>> 
>>> How do we drop a function that may be in use? There are some tricky bits
>>> to work out, which seemed too overwhelming to consider all in one go.
>>> 
>>> Clearly just dropping U and adding a new version of U with the same name
>>> leads to issues if not synchronized. If a Drillbit D is running a query
>>> with U when it receives notice to drop U, should D complete the query or
>>> fail it? If the query completes, then how does D deal with the request to
>>> register U’, which has the same name?
>>> 
>>> Do we globally synchronize function deletion? (The foreman B that receives
>>> the drop request waits for all queries using U to finish.) But, how do we
>>> know which queries use U?
>>> 
>>> An eventually consistent approach is to track the age of the oldest
>>> running query. Suppose B drops U at time T. Any query received after T that
>>> uses U will fail in planning. A new U’ can’t be registered until all
>>> queries that started before T complete.
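
The eventually consistent approach in the paragraph above can be modelled in a single process as a rough Java sketch. Everything here is hypothetical: the class and method names are invented for the example, and times are plain `long` values from one assumed shared clock, which glosses over exactly the coordination problem the thread is about.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical single-process model of the "drop at time T" rule.
final class UdfDropSketch {
    // Function name -> time T at which it was dropped.
    private final Map<String, Long> dropTimes = new ConcurrentHashMap<>();
    // Running query id -> time at which that query started.
    private final Map<Long, Long> runningQueries = new ConcurrentHashMap<>();

    void queryStarted(long queryId, long startTime) {
        runningQueries.put(queryId, startTime);
    }

    void queryFinished(long queryId) {
        runningQueries.remove(queryId);
    }

    void drop(String function, long time) {
        dropTimes.put(function, time);
    }

    // Queries planned at or after the drop time may no longer use the
    // function; earlier queries keep running with the old version.
    boolean usableInPlanning(String function, long planTime) {
        Long droppedAt = dropTimes.get(function);
        return droppedAt == null || planTime < droppedAt;
    }

    // A replacement may be registered only once every query that started
    // before the drop time has completed. With no recorded drop, this
    // sketch treats nothing as pending.
    boolean canRegisterReplacement(String function) {
        Long droppedAt = dropTimes.get(function);
        if (droppedAt == null) {
            return true;
        }
        for (long start : runningQueries.values()) {
            if (start < droppedAt) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        UdfDropSketch registry = new UdfDropSketch();
        registry.queryStarted(1L, 10L);   // Foreman A's query uses U
        registry.drop("U", 20L);          // Foreman B drops U at T = 20
        System.out.println(registry.canRegisterReplacement("U"));
        registry.queryFinished(1L);
        System.out.println(registry.canRegisterReplacement("U"));
    }
}
```

Tracking only start times avoids having to know which queries actually use U, at the cost of delaying the replacement behind unrelated long-running queries.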
>>> 
>>> The primary challenge we face in both the CREATE and DROP cases is that
>>> Drill is distributed with little central coordination. That’s great for
>>> scale, but makes it hard to design features that require coordination. Some
>>> other tools solve this problem with a data dictionary (or “metastore”).
>>> Alas, Drill does not have such a concept. So a seemingly simple feature
>>> like dynamic UDF becomes a major design challenge to get right.
>>> 
>>> Thanks,
>>> 
>>> - Paul
>>> 
>>>> On Jul 21, 2016, at 7:21 AM, Neeraja Rentachintala <
>>>> nrentachint...@maprtech.com> wrote:
>>>>
>>>> The whole point of this feature is to avoid Drill cluster restarts as the
>>>> name indicates 'Dynamic' UDFs.
>>>> So any design that requires restarts I would think would beat the purpose.
>>>>
>>>> I also think this is an example of a feature we start with a simple design
>>>> to serve the purpose, take feedback on how it is being deployed/used in
>>>> real user situations and improve it in subsequent releases.
>>>>
>>>> -thanks
>>>> Neeraja
>>>>
>>>> On Thu, Jul 21, 2016 at 6:32 AM, Keys Botzum wrote:
>>>>
>>>>> I think there are a lot of great ideas here. My one concern is the lack of
>>>>> unload and thus presumably replace functionality. I'm just thinking about
>>>>> typical actual usage.
>>>>>
>>>>> In a typical development cycle someone writes something, tries it, learns,
>>>>> changes it, and tries again. Assuming I understand the design that change
>>>>> step requires a full Drill cluster restart. That is going to be very
>>>>> disruptive and will make UDF work nearly impossible without a dedicated
>>>>> "private" cluster for Drill. I realize that people should have access to
>>>>> the data they need and Drill in a development cluster but even then
>>>>> restarts can be hard since development clusters are often shared - and
>>>>> that's assuming such a cluster exists. I realize of course Drill can be run
>>>>> as a standalone Dr

[GitHub] drill pull request #549: DRILL-4682: Allow full schema identifier in SELECT ...

2016-07-22 Thread jinfengni
Github user jinfengni commented on a diff in the pull request:

https://github.com/apache/drill/pull/549#discussion_r71903562
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/parser/DrillCompoundIdentifier.java
 ---
@@ -69,31 +70,38 @@ public void addIndex(int index, SqlParserPos pos){
 }
   }
 
-  public SqlNode getAsSqlNode(){
-if(ids.size() == 1){
+  public SqlNode getAsSqlNode(Set fullSchemasSet) 
{
--- End diff --

Can you add a comment explaining why you have to pass in the set of full 
schemas? Without it, why would the previous code treat "cp" in 
"cp.`employee.json`" as a table name rather than a schema name?





[GitHub] drill issue #541: DRILL-4673: Implement "DROP TABLE IF EXISTS" for drill to ...

2016-07-22 Thread vdiravka
Github user vdiravka commented on the issue:

https://github.com/apache/drill/pull/541
  
A ticket about `schema#getTable` has been opened: 
[DRILL-4799](https://issues.apache.org/jira/browse/DRILL-4799).
The two commits were squashed into one, and the branch was rebased onto the 
current master.




[jira] [Created] (DRILL-4799) Schema#getTable should return null when the table does not exist.

2016-07-22 Thread Vitalii Diravka (JIRA)
Vitalii Diravka created DRILL-4799:
--

 Summary: Schema#getTable should return null when the table does 
not exist.
 Key: DRILL-4799
 URL: https://issues.apache.org/jira/browse/DRILL-4799
 Project: Apache Drill
  Issue Type: Improvement
  Components: Execution - Flow, Storage - HBase
Affects Versions: 1.7.0
Reporter: Vitalii Diravka
 Fix For: Future


There is an unwritten rule: _schema#getTable_ should return null if the table 
does not exist (continuation of the conversation in the 
[DRILL-4673|https://issues.apache.org/jira/browse/DRILL-4673]).

1. That should be documented to ensure that all plugins follow this rule.

2. Accordingly, in the HBase plugin, _HBaseSchemaFactory#getTable_ should return null 
when the table is not found instead of throwing _TableNotFoundException_.
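
A minimal Java sketch of the rule, assuming a backend client that throws for unknown tables. The `Backend`, `Schema`, and `TableNotFoundException` classes below are hypothetical stand-ins written for this example and are not the Drill or HBase classes of the same or similar names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: a schema wrapper that maps "table not found"
// to a null return instead of letting the exception escape.
final class GetTableSketch {
    static final class TableNotFoundException extends RuntimeException {
        TableNotFoundException(String name) {
            super("Table not found: " + name);
        }
    }

    // Stand-in for a storage client that throws on unknown tables.
    static final class Backend {
        private final Map<String, String> tables = new HashMap<>();

        void create(String name) {
            tables.put(name, "descriptor:" + name);
        }

        String describe(String name) {
            String descriptor = tables.get(name);
            if (descriptor == null) {
                throw new TableNotFoundException(name);
            }
            return descriptor;
        }
    }

    // Follows the rule from the ticket: a missing table yields null.
    static final class Schema {
        private final Backend backend;

        Schema(Backend backend) {
            this.backend = backend;
        }

        String getTable(String name) {
            try {
                return backend.describe(name);
            } catch (TableNotFoundException e) {
                return null;
            }
        }
    }

    public static void main(String[] args) {
        Backend backend = new Backend();
        backend.create("flattentest");
        Schema schema = new Schema(backend);
        System.out.println(schema.getTable("flattentest"));
        System.out.println(schema.getTable("missing"));
    }
}
```

Callers such as an existence check for "DROP TABLE IF EXISTS" can then test for null uniformly instead of handling a plugin-specific exception.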





[GitHub] drill pull request #551: DRILL-4792: Include session options used for a quer...

2016-07-22 Thread arina-ielchiieva
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/551#discussion_r71869950
  
--- Diff: exec/java-exec/src/main/resources/rest/profile/profile.ftl ---
@@ -107,6 +107,42 @@
   FOREMAN: ${model.getProfile().getForeman().getAddress()}
   TOTAL FRAGMENTS: ${model.getProfile().getTotalFragments()}
 
+  <#if (model.getProfile().getOptionsList()?size > 0)>
--- End diff --

In this case getOptionsList returns an empty list.




[GitHub] drill pull request #551: DRILL-4792: Include session options used for a quer...

2016-07-22 Thread arina-ielchiieva
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/551#discussion_r71869875
  
--- Diff: protocol/src/main/protobuf/UserBitShared.proto ---
@@ -176,12 +176,18 @@ message QueryData {
   optional RecordBatchDef def = 3;
 }
 
+message Option {
+  required string name = 1;
--- End diff --

Done.




[GitHub] drill pull request #502: DRILL-4573 Fixed issue with regexp_replace function

2016-07-22 Thread jcmcote
Github user jcmcote closed the pull request at:

https://github.com/apache/drill/pull/502




[GitHub] drill pull request #552: [Drill-3710] New option for the IN LIST size to con...

2016-07-22 Thread gparai
GitHub user gparai opened a pull request:

https://github.com/apache/drill/pull/552

[Drill-3710] New option for the IN LIST size to convert into join

Add option planner.in_subquery_threshold to control the size of the IN list 
for converting to join

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gparai/drill MD-1030-ADM

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/552.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #552


commit 92f244c478f2c5d93b0153f87f15234137bfe0db
Author: Gautam Parai 
Date:   2016-07-19T22:21:59Z

[Drill-3710] New option for the IN LIST size to convert into join

Cosmetic changes



