Re: DRILL-4199: Add Support for HBase 1.X - planning to merge

2016-06-20 Thread qiang li
Another issue: sometimes when I restart the node, it cannot start up.

Here is the exception.
ache-drill-1.7.0/jars/drill-gis-1.7.0-SNAPSHOT.jar!/,
jar:file:/usr/lib/apache-drill-1.7.0/jars/drill-memory-base-1.7.0-SNAPSHOT.jar!/]
took 2800ms
2016-06-20 19:10:18,313 [main] INFO  o.a.d.e.s.s.PersistentStoreRegistry -
Using the configured PStoreProvider class:
'org.apache.drill.exec.store.sys.store.provider.ZookeeperPersistentStoreProvider'.
2016-06-20 19:10:19,221 [main] INFO  o.apache.drill.exec.server.Drillbit -
Construction completed (1529 ms).
2016-06-20 19:10:31,136 [main] WARN  o.apache.drill.exec.server.Drillbit -
Failure on close()
java.lang.NullPointerException: null
at
org.apache.drill.exec.work.WorkManager.close(WorkManager.java:153)
~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
at
org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:76)
~[drill-common-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
at
org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:64)
~[drill-common-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
at org.apache.drill.exec.server.Drillbit.close(Drillbit.java:159)
[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:293)
[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:271)
[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
at org.apache.drill.exec.server.Drillbit.main(Drillbit.java:267)
[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
2016-06-20 19:10:31,137 [main] INFO  o.apache.drill.exec.server.Drillbit -
Shutdown completed (1914 ms).

I did nothing; when I started it the next day, it came up fine.
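The NullPointerException above is thrown from WorkManager.close() while cleaning up a Drillbit that never finished starting, so some of its fields are still null when close() runs. Below is a minimal sketch (not Drill's actual code; the class and method names are illustrative) of a null-tolerant close helper in the spirit of AutoCloseables.close(), which skips uninitialized resources instead of dereferencing them:

```java
/** Sketch of a null-tolerant close helper: skips nulls so cleanup after a
 *  partially failed start does not itself throw NPE. Illustrative only. */
class SafeClose {
  static void closeAll(AutoCloseable... resources) throws Exception {
    Exception first = null;
    for (AutoCloseable c : resources) {
      if (c == null) {
        continue;               // field never initialized: nothing to close
      }
      try {
        c.close();
      } catch (Exception e) {
        // keep closing the rest; report the first failure, suppress later ones
        if (first == null) { first = e; } else { first.addSuppressed(e); }
      }
    }
    if (first != null) { throw first; }
  }

  public static void main(String[] args) {
    final boolean[] closed = {false};
    try {
      // A null resource (as in a partially constructed server) is skipped.
      closeAll(null, () -> closed[0] = true);
    } catch (Exception e) {
      throw new AssertionError(e);
    }
    if (!closed[0]) throw new AssertionError("resource not closed");
    System.out.println("closed without NPE");
  }
}
```

With this pattern, a failed start followed by close() reports the original startup failure rather than masking it with a secondary NPE.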

2016-06-21 9:48 GMT+08:00 qiang li :

> Hi Aman,
>
> I did not fully test with the old version.
>
> Could you please help me create the JIRA issue? I think my account does
> not have the privilege; my account is griffinli and I cannot find the place
> to create a new issue. Below are the EXPLAIN details for the same SQL on
> clusters with different numbers of nodes.
>
>
> This is the correct plan, from a cluster with only two nodes:
> 0: jdbc:drill:zk=xxx:> explain plan for select
> CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') as uid,
> convert_from(`ref0`.`v`.`v`,'UTF8') as v  from hbase.`offers_nation_idx` as
> `nation` join hbase.offers_ref0 as `ref0` on
>  BYTE_SUBSTR(`ref0`.row_key,-8,8) = nation.`v`.`v` where `nation`.row_key
>  > '0br' and `nation`.row_key  < '0bs' limit 10;
> +--+--+
> | text | json |
> +--+--+
> | 00-00    Screen
> 00-01      Project(uid=[$0], v=[$1])
> 00-02        SelectionVectorRemover
> 00-03          Limit(fetch=[10])
> 00-04            UnionExchange
> 01-01              SelectionVectorRemover
> 01-02                Limit(fetch=[10])
> 01-03                  Project(uid=[CONVERT_FROMBIGINT_BE(BYTE_SUBSTR($3, -8, 8))], v=[CONVERT_FROMUTF8(ITEM($4, 'v'))])
> 01-04                    Project(row_key=[$3], v=[$4], ITEM=[$5], row_key0=[$0], v0=[$1], $f2=[$2])
> 01-05                      HashJoin(condition=[=($2, $5)], joinType=[inner])
> 01-07                        Project(row_key=[$0], v=[$1], $f2=[BYTE_SUBSTR($0, -8, 8)])
> 01-09                          Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec [tableName=offers_ref0, startRow=null, stopRow=null, filter=null], columns=[`*`]]])
> 01-06                        Project(row_key0=[$0], v0=[$1], ITEM=[$2])
> 01-08                          *BroadcastExchange*
> 02-01                            Project(row_key=[$0], v=[$1], ITEM=[ITEM($1, 'v')])
> 02-02                              Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec [tableName=offers_nation_idx, startRow=0br\x00, stopRow=0bs, filter=FilterList AND (2/2): [RowFilter (GREATER, 0br), RowFilter (LESS, 0bs)]], columns=[`row_key`, `v`, `v`.`v`]]])
>
>
> This is the failing plan, from a cluster with more than 5 nodes:
> 0: jdbc:drill:zk=xxx:> explain plan for select
> CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') as uid,
> convert_from(`ref0`.`v`.`v`,'UTF8') as v  from hbase.`offers_nation_idx` as
> `nation` join hbase.offers_ref0 as `ref0` on
>  BYTE_SUBSTR(`ref0`.row_key,-8,8) = nation.`v`.`v` where `nation`.row_key
>  > '0br' and `nation`.row_key  < '0bs' limit 10;
> +--+--+
> | text | json |
> +--+--+
> | 00-00    Screen
> 00-01      Project(uid=[$0], v=[$1])
> 00-02        SelectionVectorRemover
> 00-03          Limit(fetch=[10])
> 00-04            UnionExchange
> 01-01              SelectionVectorRemover
> 01-02                Limit(fetch=[10])
> 01-03                  Project(uid=[CONVERT_FROMBIGINT_BE(BYTE_SUBSTR($3, -8, 8))], v=[CONVERT_FROMUTF8(ITEM($4, 'v'))])
> 01-04                    Project(row_key=[$3], v=[$4], ITEM=[$5], row_key0=[$0], v0=[$1], $f2=[$2])
> 01-05                      HashJoin(condition=[=($2, $5)], joinType=[inner])
> 01-07                        Pr

Re: Dynamic UDFs support

2016-06-20 Thread Paul Rogers
Hi Neeraja,

The proposal calls for the user to copy the jar file to each Drillbit node. The 
jar would go into a new $DRILL_HOME/jars/3rdparty/udf directory.

In Drill-on-YARN (DoY), YARN is responsible for copying Drill code to each node 
(which is good). YARN puts that code in a location known only to YARN. Since
the location is private to YARN, the user can’t easily hunt down the location 
in order to add the udf jar. Even if the user did find the location, the next 
Drillbit to start would create a new copy of the Drill software, without the 
udf jar.

Second, in DoY we have separated user files from Drill software. This makes it 
much easier to distribute the software to each node: we give the Drill 
distribution tar archive to YARN, and YARN copies it to each node and untars 
the Drill files. We make a separate copy of the (far smaller) set of user 
config files.

If the udf jar goes into a Drill folder ($DRILL_HOME/jars/3rdparty/udf), then 
the user would have to rebuild the Drill tar file each time they add a udf jar. 
When I tried this myself when building DoY, I found it to be slow and 
error-prone.

So, the solution is to place the udf code in the new “site” directory: 
$DRILL_SITE/jars. That’s what that is for. Then, let DoY automatically 
distribute the code to every node. Perfect! Except that it does not work to 
dynamically distribute code after Drill starts.

For DoY, the solution requirements are:

1. Distribute code using Drill itself, rather than manually copying jars to 
(unknown) Drill directories.
2. Ensure the solution works even if another Drillbit is spun up later, and 
uses the original Drill tar file.

I’m thinking we want to leverage DFS: place udf files into a well-known DFS 
directory. Register the udf into, say, ZK. When a new Drillbit starts, it looks 
for new udf jars in ZK, copies the file to a temporary location, and launches. 
An existing Drill is notified of the change and does the same download process. 
Clean-up is needed at some point to remove ZK entries if the udf jar becomes 
statically available on the next launch. That needs more thought.

We’d still need the phases mentioned earlier to ensure consistency.
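The DFS-plus-ZooKeeper flow above can be sketched as follows. This is only a model of the idea, not an implementation: in-memory maps stand in for ZooKeeper (the registry) and DFS (the jar store), and every name is hypothetical.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Model of the proposed scheme: upload the udf jar to a well-known DFS
 *  directory, register its location in ZK, and have every (re)starting
 *  Drillbit pull registered jars before serving queries. */
class DynamicUdfSketch {
  // "ZK": jar name -> DFS path of the registered udf jar
  static final Map<String, String> zkRegistry = new LinkedHashMap<>();
  // "DFS": path -> jar contents (a placeholder string here)
  static final Map<String, String> dfs = new HashMap<>();

  static void registerUdf(String jarName, String dfsPath, String jarBytes) {
    dfs.put(dfsPath, jarBytes);       // 1. upload jar to the well-known DFS dir
    zkRegistry.put(jarName, dfsPath); // 2. then publish its location in ZK
  }

  /** What a (re)starting Drillbit would do: localize every registered jar
   *  from DFS into a temporary directory before launching. */
  static Map<String, String> drillbitStartupSync() {
    Map<String, String> localJars = new HashMap<>();
    for (Map.Entry<String, String> e : zkRegistry.entrySet()) {
      localJars.put(e.getKey(), dfs.get(e.getValue()));
    }
    return localJars;
  }

  public static void main(String[] args) {
    registerUdf("geo-udfs.jar", "/drill/udf/geo-udfs.jar", "<jar bytes>");
    // A Drillbit started later still sees the jar: no manual copy, and no
    // dependence on YARN's private localization directories.
    Map<String, String> local = drillbitStartupSync();
    if (!local.containsKey("geo-udfs.jar")) throw new AssertionError();
    System.out.println("new drillbit localized " + local.size() + " udf jar(s)");
  }
}
```

The key property is that the registry, not the Drill install tree, is the source of truth, so a Drillbit spun up later from the original tar file still converges to the same set of udf jars.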

Suggestions anyone as to how to do this super simply & still get it to work 
with DoY?

Thanks,

- Paul
 
> On Jun 20, 2016, at 7:18 PM, Neeraja Rentachintala 
>  wrote:
> 
> This will need to work with YARN (once Drill is YARN-enabled, I would
> expect many users to use it in conjunction with YARN).
> Paul, I am not clear on why this wouldn't work with YARN. Can you elaborate?
> 
> -Neeraja
> 
> On Mon, Jun 20, 2016 at 7:01 PM, Paul Rogers  wrote:
> 
>> Good enough, as long as we document the limitation that this feature can’t
>> work with YARN deployment as users generally do not have access to the
>> temporary “localization” directories where the Drill code is placed by YARN.
>> 
>> Note that the jar distribution race condition issue occurs with the
>> proposed design: I believe I sketched out a scenario in one of the earlier
>> comments. Drillbit A receives the CREATE FUNCTION command. It tells
>> Drillbit B. While informing the other Drillbits, Drillbit B plans and
>> launches a query that uses the function. Drillbit Z starts execution of the
>> query before it learns from A about the new function. This will be rare —
>> just rare enough to create very hard to reproduce bugs.
>> 
>> The only reliable solution is to do the work in multiple passes:
>> 
>> Pass 1: Ask each node to load the function, but not make it available to
>> the planner. (it would be available to the execution engine.)
>> Pass 2: Await confirmation from each node that this is done.
>> Pass 3: Alert every node that it is now free to plan queries with the
>> function.
>> 
>> Finally, I wonder if we should design the SQL syntax based on a long-term
>> design, even if the feature itself is a short-term work-around. Changing
>> the syntax later might break scripts that users might write.
>> 
>> So, the question for the group is this: is the value of a semi-complete
>> feature sufficient to justify the potential problems?
>> 
>> - Paul
>> 
>>> On Jun 20, 2016, at 6:15 PM, Parth Chandra 
>> wrote:
>>> 
>>> Moving discussion to dev.
>>> 
>>> I believe the aim is to do a simple implementation without the complexity
>>> of distributing the UDF. I think the document should make this limitation
>>> clear.
>>> 
>>> Per Paul's point on there being a simpler solution of just having each
>>> drillbit detect if a UDF is present, I think the problem is if a UDF
>>> gets deployed to some but not all drillbits. A query can then start
>>> executing but not run successfully. The intent of the create commands
>>> would be to ensure that either all drillbits have the UDF or none do.
>>> 
>>> I think Jacques' point about ownership conflicts is not addressed
>> clearly.
>>> Also, the unloading is not clear. The delete command should probably
>> remove
>>> the UDF and unload it.
>>> 
>>> 
>>> On Fri, Jun 17

Re: Dynamic UDFs support

2016-06-20 Thread Neeraja Rentachintala
This will need to work with YARN (once Drill is YARN-enabled, I would
expect many users to use it in conjunction with YARN).
Paul, I am not clear on why this wouldn't work with YARN. Can you elaborate?

-Neeraja

On Mon, Jun 20, 2016 at 7:01 PM, Paul Rogers  wrote:

> Good enough, as long as we document the limitation that this feature can’t
> work with YARN deployment as users generally do not have access to the
> temporary “localization” directories where the Drill code is placed by YARN.
>
> Note that the jar distribution race condition issue occurs with the
> proposed design: I believe I sketched out a scenario in one of the earlier
> comments. Drillbit A receives the CREATE FUNCTION command. It tells
> Drillbit B. While informing the other Drillbits, Drillbit B plans and
> launches a query that uses the function. Drillbit Z starts execution of the
> query before it learns from A about the new function. This will be rare —
> just rare enough to create very hard to reproduce bugs.
>
> The only reliable solution is to do the work in multiple passes:
>
> Pass 1: Ask each node to load the function, but not make it available to
> the planner. (it would be available to the execution engine.)
> Pass 2: Await confirmation from each node that this is done.
> Pass 3: Alert every node that it is now free to plan queries with the
> function.
>
> Finally, I wonder if we should design the SQL syntax based on a long-term
> design, even if the feature itself is a short-term work-around. Changing
> the syntax later might break scripts that users might write.
>
> So, the question for the group is this: is the value of a semi-complete
> feature sufficient to justify the potential problems?
>
> - Paul
>
> > On Jun 20, 2016, at 6:15 PM, Parth Chandra 
> wrote:
> >
> > Moving discussion to dev.
> >
> > I believe the aim is to do a simple implementation without the complexity
> > of distributing the UDF. I think the document should make this limitation
> > clear.
> >
> > Per Paul's point on there being a simpler solution of just having each
> > drillbit detect if a UDF is present, I think the problem is if a UDF
> > gets deployed to some but not all drillbits. A query can then start
> > executing but not run successfully. The intent of the create commands
> > would be to ensure that either all drillbits have the UDF or none do.
> >
> > I think Jacques' point about ownership conflicts is not addressed
> clearly.
> > Also, the unloading is not clear. The delete command should probably
> remove
> > the UDF and unload it.
> >
> >
> > On Fri, Jun 17, 2016 at 11:19 AM, Paul Rogers 
> wrote:
> >
> >> Reviewed the spec; many comments posted. Three primary comments for the
> >> community to consider.
> >>
> >> 1. The design conflicts with the Drill-on-YARN project. Is this a
> specific
> >> fix for one unique problem, or is it worth expanding the solution to
> work
> >> with Drill-on-YARN deployments? Might be hard to make the two work
> together
> >> later. See comments in docs for details.
> >>
> >> 2. Have we, by chance, looked at how other projects handle code
> >> distribution? Spark, Storm and others automatically deploy code across
> the
> >> cluster; no manual distribution to each node. The key difference between
> >> Drill and others is that, for Storm, say, code is associated with a job
> >> (“topology” in Storm terms.) But, in Drill, functions are global and
> have
> >> no obvious life cycle that suggests when the code can be unloaded.
> >>
> >> 3. Have we considered the class loader, dependency, and namespace isolation
> >> issues addressed by such products as Tomcat (web apps) or Eclipse
> >> (plugins)? Putting user code in the same namespace as Drill code is quick
> >> & dirty. It turns out, however, that doing so leads to problems that
> >> require long, frustrating debugging sessions to resolve.
> >>
> >> Addressing item 1 might expand scope a bit. Addressing items 2 and 3
> are a
> >> big increase in scope, so I won’t be surprised if we leave those issues
> for
> >> later. (Though, addressing item 2 might be the best way to address item
> 1.)
> >>
> >> If we want a very simple solution that requires minimal change, perhaps we
> >> can use an even simpler solution. In the proposed design, the user still
> >> must distribute code to all the nodes. The primary change is to tell Drill
> >> to load (or unload) that code. Could we accomplish the same result more
> >> simply by having Drill periodically scan certain directories for new (or
> >> removed) jars? It still wouldn't work with YARN, or solve the namespace
> >> issues, but it would work for existing non-YARN Drill users without new
> >> SQL syntax.
> >>
> >> Thanks,
> >>
> >> - Paul
> >>
> >>> On Jun 16, 2016, at 2:07 PM, Jacques Nadeau 
> wrote:
> >>>
> >>> Two quick thoughts:
> >>>
> >>> - (user) In the design document I didn't see any discussion of
> >>> ownership/conflicts or unloading. Would be helpful to see the thinking
> >> there
> >>> - (dev) There is a 

Re: Dynamic UDFs support

2016-06-20 Thread Paul Rogers
Good enough, as long as we document the limitation that this feature can’t work 
with YARN deployment as users generally do not have access to the temporary 
“localization” directories where the Drill code is placed by YARN.

Note that the jar distribution race condition issue occurs with the proposed 
design: I believe I sketched out a scenario in one of the earlier comments. 
Drillbit A receives the CREATE FUNCTION command. It tells Drillbit B. While 
informing the other Drillbits, Drillbit B plans and launches a query that uses 
the function. Drillbit Z starts execution of the query before it learns from A 
about the new function. This will be rare — just rare enough to create very 
hard to reproduce bugs.

The only reliable solution is to do the work in multiple passes:

Pass 1: Ask each node to load the function, but not make it available to the 
planner. (it would be available to the execution engine.)
Pass 2: Await confirmation from each node that this is done.
Pass 3: Alert every node that it is now free to plan queries with the function.
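The three passes above can be modeled with a per-node registry split into an execution-visible set and a planner-visible set. This is a toy sequential sketch (all names hypothetical; a real version would use asynchronous acks and timeouts), but it shows the invariant the passes establish:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Model of the three-pass rollout: a function becomes executable on every
 *  node before any node is allowed to plan with it. */
class ThreePassRegistration {
  static class Node {
    final Set<String> executable = new HashSet<>(); // usable by execution engine
    final Set<String> plannable  = new HashSet<>(); // visible to the planner
  }

  static void register(String fn, List<Node> cluster) {
    // Pass 1: load on each node, execution-only; planners cannot see it yet.
    for (Node n : cluster) { n.executable.add(fn); }
    // Pass 2: await confirmation from every node (trivial in this sketch).
    for (Node n : cluster) {
      if (!n.executable.contains(fn)) throw new IllegalStateException("no ack");
    }
    // Pass 3: only now expose the function to planners.
    for (Node n : cluster) { n.plannable.add(fn); }
  }

  public static void main(String[] args) {
    List<Node> cluster = Arrays.asList(new Node(), new Node(), new Node());
    register("my_udf", cluster);
    for (Node n : cluster) {
      if (!n.plannable.contains("my_udf")) throw new AssertionError();
      // Invariant: any node that can plan the function can also execute it,
      // so a fragment can never arrive at a node that lacks the function.
      if (!n.executable.contains("my_udf")) throw new AssertionError();
    }
    System.out.println("my_udf plannable on all " + cluster.size() + " nodes");
  }
}
```

The race described earlier is ruled out because no Drillbit can plan a query using the function until Pass 2 has confirmed that every Drillbit can execute it.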

Finally, I wonder if we should design the SQL syntax based on a long-term 
design, even if the feature itself is a short-term work-around. Changing the 
syntax later might break scripts that users might write.

So, the question for the group is this: is the value of a semi-complete feature 
sufficient to justify the potential problems?

- Paul

> On Jun 20, 2016, at 6:15 PM, Parth Chandra  wrote:
> 
> Moving discussion to dev.
> 
> I believe the aim is to do a simple implementation without the complexity
> of distributing the UDF. I think the document should make this limitation
> clear.
> 
> Per Paul's point on there being a simpler solution of just having each
> drillbit detect if a UDF is present, I think the problem is if a UDF
> gets deployed to some but not all drillbits. A query can then start
> executing but not run successfully. The intent of the create commands
> would be to ensure that either all drillbits have the UDF or none do.
> 
> I think Jacques' point about ownership conflicts is not addressed clearly.
> Also, the unloading is not clear. The delete command should probably remove
> the UDF and unload it.
> 
> 
> On Fri, Jun 17, 2016 at 11:19 AM, Paul Rogers  wrote:
> 
>> Reviewed the spec; many comments posted. Three primary comments for the
>> community to consider.
>> 
>> 1. The design conflicts with the Drill-on-YARN project. Is this a specific
>> fix for one unique problem, or is it worth expanding the solution to work
>> with Drill-on-YARN deployments? Might be hard to make the two work together
>> later. See comments in docs for details.
>> 
>> 2. Have we, by chance, looked at how other projects handle code
>> distribution? Spark, Storm and others automatically deploy code across the
>> cluster; no manual distribution to each node. The key difference between
>> Drill and others is that, for Storm, say, code is associated with a job
>> (“topology” in Storm terms.) But, in Drill, functions are global and have
>> no obvious life cycle that suggests when the code can be unloaded.
>> 
>> 3. Have we considered the class loader, dependency, and namespace isolation
>> issues addressed by such products as Tomcat (web apps) or Eclipse
>> (plugins)? Putting user code in the same namespace as Drill code is quick
>> & dirty. It turns out, however, that doing so leads to problems that
>> require long, frustrating debugging sessions to resolve.
>> 
>> Addressing item 1 might expand scope a bit. Addressing items 2 and 3 are a
>> big increase in scope, so I won’t be surprised if we leave those issues for
>> later. (Though, addressing item 2 might be the best way to address item 1.)
>> 
>> If we want a very simple solution that requires minimal change, perhaps we
>> can use an even simpler solution. In the proposed design, the user still
>> must distribute code to all the nodes. The primary change is to tell Drill
>> to load (or unload) that code. Could we accomplish the same result more
>> simply by having Drill periodically scan certain directories for new (or
>> removed) jars? It still wouldn't work with YARN, or solve the namespace
>> issues, but it would work for existing non-YARN Drill users without new SQL
>> syntax.
>> 
>> Thanks,
>> 
>> - Paul
>> 
>>> On Jun 16, 2016, at 2:07 PM, Jacques Nadeau  wrote:
>>> 
>>> Two quick thoughts:
>>> 
>>> - (user) In the design document I didn't see any discussion of
>>> ownership/conflicts or unloading. Would be helpful to see the thinking
>> there
>>> - (dev) There is a row oriented facade via the
>>> FieldReader/FieldWriter/ComplexWriter classes. That would be a good place
>>> to start when trying to implement an alternative interface.
>>> 
>>> 
>>> --
>>> Jacques Nadeau
>>> CTO and Co-Founder, Dremio
>>> 
>>> On Thu, Jun 16, 2016 at 11:32 AM, John Omernik  wrote:
>>> 
 Honestly, I don't see it as a priority issue. I think some of the ideas
 around community java UDFs could be a better approach. I'd hate to take
>

Re: DRILL-4199: Add Support for HBase 1.X - planning to merge

2016-06-20 Thread qiang li
Hi Aman,

I did not fully test with the old version.

Could you please help me create the JIRA issue? I think my account does
not have the privilege; my account is griffinli and I cannot find the place
to create a new issue. Below are the EXPLAIN details for the same SQL on
clusters with different numbers of nodes.


This is the correct plan, from a cluster with only two nodes:
0: jdbc:drill:zk=xxx:> explain plan for select
CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') as uid,
convert_from(`ref0`.`v`.`v`,'UTF8') as v  from hbase.`offers_nation_idx` as
`nation` join hbase.offers_ref0 as `ref0` on
 BYTE_SUBSTR(`ref0`.row_key,-8,8) = nation.`v`.`v` where `nation`.row_key
 > '0br' and `nation`.row_key  < '0bs' limit 10;
+--+--+
| text | json |
+--+--+
| 00-00    Screen
00-01      Project(uid=[$0], v=[$1])
00-02        SelectionVectorRemover
00-03          Limit(fetch=[10])
00-04            UnionExchange
01-01              SelectionVectorRemover
01-02                Limit(fetch=[10])
01-03                  Project(uid=[CONVERT_FROMBIGINT_BE(BYTE_SUBSTR($3, -8, 8))], v=[CONVERT_FROMUTF8(ITEM($4, 'v'))])
01-04                    Project(row_key=[$3], v=[$4], ITEM=[$5], row_key0=[$0], v0=[$1], $f2=[$2])
01-05                      HashJoin(condition=[=($2, $5)], joinType=[inner])
01-07                        Project(row_key=[$0], v=[$1], $f2=[BYTE_SUBSTR($0, -8, 8)])
01-09                          Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec [tableName=offers_ref0, startRow=null, stopRow=null, filter=null], columns=[`*`]]])
01-06                        Project(row_key0=[$0], v0=[$1], ITEM=[$2])
01-08                          *BroadcastExchange*
02-01                            Project(row_key=[$0], v=[$1], ITEM=[ITEM($1, 'v')])
02-02                              Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec [tableName=offers_nation_idx, startRow=0br\x00, stopRow=0bs, filter=FilterList AND (2/2): [RowFilter (GREATER, 0br), RowFilter (LESS, 0bs)]], columns=[`row_key`, `v`, `v`.`v`]]])


This is the failing plan, from a cluster with more than 5 nodes:
0: jdbc:drill:zk=xxx:> explain plan for select
CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') as uid,
convert_from(`ref0`.`v`.`v`,'UTF8') as v  from hbase.`offers_nation_idx` as
`nation` join hbase.offers_ref0 as `ref0` on
 BYTE_SUBSTR(`ref0`.row_key,-8,8) = nation.`v`.`v` where `nation`.row_key
 > '0br' and `nation`.row_key  < '0bs' limit 10;
+--+--+
| text | json |
+--+--+
| 00-00    Screen
00-01      Project(uid=[$0], v=[$1])
00-02        SelectionVectorRemover
00-03          Limit(fetch=[10])
00-04            UnionExchange
01-01              SelectionVectorRemover
01-02                Limit(fetch=[10])
01-03                  Project(uid=[CONVERT_FROMBIGINT_BE(BYTE_SUBSTR($3, -8, 8))], v=[CONVERT_FROMUTF8(ITEM($4, 'v'))])
01-04                    Project(row_key=[$3], v=[$4], ITEM=[$5], row_key0=[$0], v0=[$1], $f2=[$2])
01-05                      HashJoin(condition=[=($2, $5)], joinType=[inner])
01-07                        Project(row_key=[$0], v=[$1], $f2=[$2])
01-09                          *HashToRandomExchange*(dist0=[[$2]])
02-01                            UnorderedMuxExchange
04-01                              Project(row_key=[$0], v=[$1], $f2=[$2], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($2)])
04-02                                Project(row_key=[$0], v=[$1], $f2=[BYTE_SUBSTR($0, -8, 8)])
04-03                                  Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec [tableName=offers_ref0, startRow=null, stopRow=null, filter=null], columns=[`*`]]])
01-06                        Project(row_key0=[$0], v0=[$1], ITEM=[$2])
01-08                          Project(row_key=[$0], v=[$1], ITEM=[$2])
01-10                            *HashToRandomExchange*(dist0=[[$2]])
03-01                              UnorderedMuxExchange
05-01                                Project(row_key=[$0], v=[$1], ITEM=[$2], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($2)])
05-02                                  Project(row_key=[$0], v=[$1], ITEM=[ITEM($1, 'v')])
05-03                                    Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec [tableName=offers_nation_idx, startRow=0br\x00, stopRow=0bs, filter=FilterList AND (2/2): [RowFilter (GREATER, 0br), RowFilter (LESS, 0bs)]], columns=[`row_key`, `v`, `v`.`v`]]])

The difference is the use of *BroadcastExchange* vs. *HashToRandomExchange*.

You can create the JIRA and send me the link.

Thanks.


2016-06-20 23:44 GMT+08:00 Aman Sinha :

> Hi Qiang,
> were you seeing this same issue with the prior HBase version also? (I
> would think this is not a regression).  It would be best to create a new
> JIRA and attach the EXPLAIN plans for the successful and failed runs.  With
> more nodes some minor fragments of the hash join may be getting empty input
> batches and I am guessing that has something to do with the
> SchemaChangeException.   Someone would need to debu

Re: Dynamic UDFs support

2016-06-20 Thread Parth Chandra
Moving discussion to dev.

I believe the aim is to do a simple implementation without the complexity
of distributing the UDF. I think the document should make this limitation
clear.

Per Paul's point on there being a simpler solution of just having each
drillbit detect if a UDF is present, I think the problem is if a UDF
gets deployed to some but not all drillbits. A query can then start
executing but not run successfully. The intent of the create commands
would be to ensure that either all drillbits have the UDF or none do.

I think Jacques' point about ownership conflicts is not addressed clearly.
Also, the unloading is not clear. The delete command should probably remove
the UDF and unload it.
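The "all drillbits or none" intent amounts to registration with rollback: if any node fails to load the UDF, undo the nodes that already did. The sketch below is purely illustrative (not the proposed implementation; a real version would also need to survive node failures during the rollback itself):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Model of all-or-nothing UDF deployment across drillbits. Each drillbit
 *  is represented by its set of loaded UDF names. */
class AllOrNothingRegister {
  static boolean register(String udf, List<Set<String>> drillbits, int failAt) {
    List<Set<String>> done = new ArrayList<>();
    for (int i = 0; i < drillbits.size(); i++) {
      if (i == failAt) {                              // simulated load failure
        for (Set<String> d : done) { d.remove(udf); } // undo the partial deploy
        return false;
      }
      drillbits.get(i).add(udf);
      done.add(drillbits.get(i));
    }
    return true;                                      // loaded everywhere
  }

  public static void main(String[] args) {
    List<Set<String>> bits =
        Arrays.asList(new HashSet<>(), new HashSet<>(), new HashSet<>());
    // Node 2 fails to load the jar: no node may keep the UDF.
    if (register("f", bits, 2)) throw new AssertionError();
    for (Set<String> d : bits) {
      if (d.contains("f")) throw new AssertionError("partial deploy left behind");
    }
    System.out.println("rollback left cluster consistent");
  }
}
```

A DROP/DELETE command would be the inverse: unload everywhere, and only remove the registry entry once every node has confirmed.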


On Fri, Jun 17, 2016 at 11:19 AM, Paul Rogers  wrote:

> Reviewed the spec; many comments posted. Three primary comments for the
> community to consider.
>
> 1. The design conflicts with the Drill-on-YARN project. Is this a specific
> fix for one unique problem, or is it worth expanding the solution to work
> with Drill-on-YARN deployments? Might be hard to make the two work together
> later. See comments in docs for details.
>
> 2. Have we, by chance, looked at how other projects handle code
> distribution? Spark, Storm and others automatically deploy code across the
> cluster; no manual distribution to each node. The key difference between
> Drill and others is that, for Storm, say, code is associated with a job
> (“topology” in Storm terms.) But, in Drill, functions are global and have
> no obvious life cycle that suggests when the code can be unloaded.
>
> 3. Have we considered the class loader, dependency, and namespace isolation
> issues addressed by such products as Tomcat (web apps) or Eclipse
> (plugins)? Putting user code in the same namespace as Drill code is quick
> & dirty. It turns out, however, that doing so leads to problems that
> require long, frustrating debugging sessions to resolve.
>
> Addressing item 1 might expand scope a bit. Addressing items 2 and 3 are a
> big increase in scope, so I won’t be surprised if we leave those issues for
> later. (Though, addressing item 2 might be the best way to address item 1.)
>
> If we want a very simple solution that requires minimal change, perhaps we
> can use an even simpler solution. In the proposed design, the user still
> must distribute code to all the nodes. The primary change is to tell Drill
> to load (or unload) that code. Could we accomplish the same result more
> simply by having Drill periodically scan certain directories for new (or
> removed) jars? It still wouldn't work with YARN, or solve the namespace
> issues, but it would work for existing non-YARN Drill users without new SQL
> syntax.
>
> Thanks,
>
> - Paul
>
> > On Jun 16, 2016, at 2:07 PM, Jacques Nadeau  wrote:
> >
> > Two quick thoughts:
> >
> > - (user) In the design document I didn't see any discussion of
> > ownership/conflicts or unloading. Would be helpful to see the thinking
> there
> > - (dev) There is a row oriented facade via the
> > FieldReader/FieldWriter/ComplexWriter classes. That would be a good place
> > to start when trying to implement an alternative interface.
> >
> >
> > --
> > Jacques Nadeau
> > CTO and Co-Founder, Dremio
> >
> > On Thu, Jun 16, 2016 at 11:32 AM, John Omernik  wrote:
> >
> >> Honestly, I don't see it as a priority issue. I think some of the ideas
> >> around community java UDFs could be a better approach. I'd hate to take
> >> away from other work to hack in something like this.
> >>
> >>
> >>
> >> On Thu, Jun 16, 2016 at 1:19 PM, Paul Rogers 
> wrote:
> >>
> >>> Ted refers to source code transformation. Drill gains its speed from
> >> value
> >>> vectors. However, VVs are a far cry from the row-based interface that
> >> most
> >>> mere mortals are accustomed to using. Since VVs are very type specific,
> >>> code is typically generated to handle the specifics of each type.
> >> Accessing
> >>> VVs in Jython may be a bit of a challenge because of the "impedence
> >>> mismatch" between how VVs work and the row-and-column view expected by
> >> most
> >>> (non-Drill) developers.
> >>>
> >>> I wonder if we've considered providing a row-oriented "facade" that can
> >> be
> >>> used by roll-your own data sources and user-defined row transforms?
> Might
> >>> be a hiccup in the fast VV pipeline, but might be handy for users
> willing
> >>> to trade a bit of speed for convenience. With such a facade, the Jython
> >> row
> >>> transforms that John mentions could be quite simple.
> >>>
> >>> On Thu, Jun 16, 2016 at 10:36 AM, Ted Dunning 
> >>> wrote:
> >>>
>  Since UDF's use source code transformation, using Jython would be
>  difficult.
> 
> 
> 
>  On Thu, Jun 16, 2016 at 9:42 AM, Arina Yelchiyeva <
>  arina.yelchiy...@gmail.com> wrote:
> 
> > Hi Charles,
> >
> > not that I am aware of. Proposed solution doesn't invent anything
> >> new,
>  just
> > adds possibility to add UDFs without drillbit restart. But

Re: Time for a 1.7 release

2016-06-20 Thread Aman Sinha
Quick update:  DRILL-4733 (https://issues.apache.org/jira/browse/DRILL-4733)
is a regression that Drill QA team found today, so I will have to wait to
have it resolved for 1.7.0 before creating a release candidate.


On Mon, Jun 20, 2016 at 1:52 PM, Johannes Schulte <
johannes.schu...@gmail.com> wrote:

> Speaking for DRILL-4574:
>
> I cannot get a simple mvn test to run, even on master. I always get
>
> [ERROR] Failed to execute goal
> org.apache.maven.plugins:maven-dependency-plugin:2.8:unpack
> (unpack-vector-types) on project drill-java-exec: Artifact has not been
> packaged yet. When used on reactor artifact, unpack should be executed
> after packaging: see MDEP-98. -> [Help 1]
>
> I tried some things but nothing so far has worked. It's not really the unit
> tests failing; it's the build. If somebody could check it out and run the
> tests I'd be really happy.
>
[jira] [Created] (DRILL-4733) max(dir0) reading more columns than necessary

2016-06-20 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-4733:


 Summary: max(dir0) reading more columns than necessary
 Key: DRILL-4733
 URL: https://issues.apache.org/jira/browse/DRILL-4733
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization, Storage - Parquet
Affects Versions: 1.7.0
Reporter: Rahul Challapalli
Priority: Critical
 Attachments: bug.tgz

The below query started to fail with this commit: 
3209886a8548eea4a2f74c059542672f8665b8d2

{code}
select max(dir0) from dfs.`/drill/testdata/bug/2016`;
Error: UNSUPPORTED_OPERATION ERROR: Streaming aggregate does not support schema 
changes

Fragment 0:0

[Error Id: b0060205-e9a6-428a-9803-7b4312b2c6f4 on qa-node190.qa.lab:31010] 
(state=,code=0)
{code}

The sub-folders contain files which have a schema change for one column, 
"contributions" (int32 vs double). However, prior to this commit we did not fail 
in this scenario. Log files and test data are attached.
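The int32 vs. double mismatch is a per-file schema change. For illustration only (this is not Drill's actual code; the class and enum names are made up), a sketch of how a type-widening merge could reconcile such a pair, whereas an operator that demands one fixed schema has to fail instead:

```java
// Hypothetical sketch: two Parquet files report different physical types for
// the same column. A type-widening merge can reconcile int32 vs. double,
// while an operator that requires a fixed schema must reject the change.
public class SchemaMerge {
    enum MinorType { INT, FLOAT8, VARCHAR }

    // Widen two per-file column types to a common type; null if incompatible.
    static MinorType widen(MinorType a, MinorType b) {
        if (a == b) return a;
        if ((a == MinorType.INT && b == MinorType.FLOAT8)
                || (a == MinorType.FLOAT8 && b == MinorType.INT)) {
            return MinorType.FLOAT8; // an int32 fits exactly in a double
        }
        return null; // e.g. INT vs. VARCHAR cannot be widened
    }

    public static void main(String[] args) {
        System.out.println(widen(MinorType.INT, MinorType.FLOAT8)); // FLOAT8
        System.out.println(widen(MinorType.INT, MinorType.VARCHAR)); // null
    }
}
```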





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Time for a 1.7 release

2016-06-20 Thread Johannes Schulte
Speaking for DRILL-4574:

I can't get a simple mvn test to run, even on master. I always get

[ERROR] Failed to execute goal
org.apache.maven.plugins:maven-dependency-plugin:2.8:unpack
(unpack-vector-types) on project drill-java-exec: Artifact has not been
packaged yet. When used on reactor artifact, unpack should be executed
after packaging: see MDEP-98. -> [Help 1]

I tried some things but nothing has worked so far. It's not really the unit
tests failing, it's the build. If somebody could check it out and run the
tests I'd be really happy.

On Mon, Jun 20, 2016 at 5:53 AM, Aman Sinha  wrote:

> For the fixes that were committed in the last few days, could the
> committers close the pull requests and update the JIRAs with 'Fixed' status
> for 1.7.
>
> For the remaining JIRAs mentioned in this thread, here are the status:
>
> 1. DRILL-4525 (BETWEEN clause on Date and Timestamp):  the right place to
> fix this would be an enhancement in Calcite.  In the meantime, a workaround
> is to do explicit CASTing as suggested in the JIRA.
> 2. DRILL-4653 (Skip malformed JSON):  mostly reviewed but needs some more
> review/testing.
> 3. DRILL-4704 (Decimal type):  unit test being added.  needs review.
> 4. DRILL-4574 (Avro):  Rebased but unit tests failing for some other
> reason.
>
> I would like to finalize the content by EOD tomorrow.  Clearly, 1 has been
> pushed out of 1.7 and I think it  would be quite a stretch to get the rest
> in, so I would be in favor of pushing 2, 3, 4 into the next release.
> However, since all 3 have good momentum going right now, let's try to get
> the pending issues resolved soon.
>
> Thanks !
> Aman
>
>
> On Thu, Jun 16, 2016 at 1:57 PM, Aman Sinha  wrote:
>
> > It does look like DRILL-4574 was previously reviewed and ready to be
> > merged.  Right now it will need to be rebased on master branch.  Since
> this
> > is in the Avro plugin,  I am unsure about the types of tests that need to
> > be run.. I would prefer if Jason Altekruse could take a quick look and
> > merge into master if everything looks ok.
> >
> > On Thu, Jun 16, 2016 at 1:22 PM, Johannes Schulte <
> > johannes.schu...@gmail.com> wrote:
> >
> >> Hi,
> >>
> >> https://github.com/apache/drill/pull/459 (
> >> https://issues.apache.org/jira/browse/DRILL-4574) is still not merged
> >> but i
> >> think it is ready for a merge. Are there any other actions necessary?
> >>
> >> Johannes
> >>
> >> On Thu, Jun 16, 2016 at 8:07 PM, Jinfeng Ni 
> >> wrote:
> >>
> >> > I will review the Sean's PR for DRILL-4525, since it's a regression
> from
> >> > 1.6.
> >> >
> >> >
> >> > On Thu, Jun 16, 2016 at 9:39 AM, rahul challapalli
> >> >  wrote:
> >> > > I would like to have DRILL-4525 as this is a regression (most likely
> >> from
> >> > > 1.6). Any takers for this?
> >> > >
> >> > > - Rahul
> >> > >
> >> > > On Wed, Jun 15, 2016 at 4:03 PM, Aman Sinha 
> >> > wrote:
> >> > >
> >> > >> I can take a look at DRILL-4653.
> >> > >>
> >> > >> Could someone familiar with the Decimal type take a look at
> >> DRILL-4704 ?
> >> > >> Agree with Dave that it is a simple case that should be fixed
> (note,
> >> > >> however, that decimal is disabled by default currently).
> >> > >>
> >> > >>
> >> > >> On Wed, Jun 15, 2016 at 3:12 PM, Subbu Srinivasan <
> >> > ssriniva...@zscaler.com
> >> > >> >
> >> > >> wrote:
> >> > >>
> >> > >> > Who can  review https://issues.apache.org/jira/browse/DRILL-4653
> ?
> >> > >> >
> >> > >> > On Wed, Jun 15, 2016 at 1:37 PM, Parth Chandra <
> >> pchan...@maprtech.com
> >> > >
> >> > >> > wrote:
> >> > >> >
> >> > >> > > +1 on the 1.7 release
> >> > >> > >
> >> > >> > > I'm reviewing the following and hope to get them in the release
> >> > before
> >> > >> > > cutoff:
> >> > >> > > https://issues.apache.org/jira/browse/DRILL-2593
> >> > >> > > https://issues.apache.org/jira/browse/DRILL-4309
> >> > >> > >
> >> > >> > >
> >> > >> > >
> >> > >> > > On Wed, Jun 15, 2016 at 1:20 PM, Jinfeng Ni <
> >> jinfengn...@gmail.com>
> >> > >> > wrote:
> >> > >> > >
> >> > >> > > > I'm reviewing a follow-up PR [1] for DRILL-4573. I think we
> >> need
> >> > get
> >> > >> > > > it merged in, since it's a regression in terms of query
> >> > correctness
> >> > >> > > > from release 1.6.
> >> > >> > > >
> >> > >> > > > [1] https://github.com/apache/drill/pull/512
> >> > >> > > >
> >> > >> > > > On Wed, Jun 15, 2016 at 12:21 PM, Dave Oshinsky <
> >> > >> > doshin...@commvault.com
> >> > >> > > >
> >> > >> > > > wrote:
> >> > >> > > > > This is a pretty basic bug affecting decimal values, with a
> >> > simple
> >> > >> > fix:
> >> > >> > > > > https://issues.apache.org/jira/browse/DRILL-4704
> >> > >> > > > >
> >> > >> > > > > It would be great if it could be reviewed.
> >> > >> > > > >
> >> > >> > > > > -Original Message-
> >> > >> > > > > From: Aman Sinha [mailto:amansi...@apache.org]
> >> > >> > > > > Sent: Wednesday, June 15, 2016 3:15 PM
> >> > >> > > > > To: dev
> >> > >> > > > > Subject: Time for a 1.7 release
> >> > >> > > > >
>

Re: DRILL-4199: Add Support for HBase 1.X - planning to merge

2016-06-20 Thread qiang li
Thanks Aditya.

By the way, I found another issue.

Let's say I have two tables:

offers_ref0: rowkey = salt (1 byte) + long uid (8 bytes), family: v, qualifier:
v (string)
offers_nation_idx: rowkey = salt (1 byte) + string, family: v, qualifier: v
(long, 8 bytes)

Here is the SQL:

select CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') as uid,
convert_from(`ref0`.`v`.`v`,'UTF8') as v  from hbase.`offers_nation_idx` as
`nation` join hbase.offers_ref0 as `ref0` on
CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') =
CONVERT_FROM(nation.`v`.`v`,'BIGINT_BE') where `nation`.row_key  > '0br'
and `nation`.row_key  < '0bs' limit 10

When I execute the query on a single node or on fewer than 5 nodes, it works
fine. But when I execute it on a cluster of about 14 nodes, it throws an
exception.

The first time, it throws this exception:
*Caused by: java.sql.SQLException: SYSTEM ERROR: SchemaChangeException:
Hash join does not support schema changes*

Then if I query again, it always throws the exception below:
*Query Failed: An Error Occurred*
*org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
IllegalStateException: Failure while reading vector. Expected vector class
of org.apache.drill.exec.vector.NullableIntVector but was holding vector
class org.apache.drill.exec.vector.complex.MapVector, field=
v(MAP:REQUIRED)[v(VARBINARY:OPTIONAL)[$bits$(UINT1:REQUIRED),
v(VARBINARY:OPTIONAL)[$offsets$(UINT4:REQUIRED)]]] Fragment 12:4 [Error Id:
06c6eae4-0822-4714-b0bf-a6e04ebfec79 on xxx:31010]*

It's very strange, and I do not know how to solve it.
I tried adding nodes to the cluster one by one; the problem reproduces once I
add the 5th node. Can anyone help me solve this issue?
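For reference, a minimal plain-Java sketch of what the
CONVERT_FROM(BYTE_SUBSTR(row_key,-8,8),'BIGINT_BE') pair in the query computes,
assuming the rowkey layout described above (one salt byte followed by a
big-endian long uid); the class name and sample values here are made up:

```java
import java.nio.ByteBuffer;

public class RowKeyDecode {
    // Decode the trailing 8 bytes of a salted rowkey as a big-endian long,
    // mirroring CONVERT_FROM(BYTE_SUBSTR(row_key, -8, 8), 'BIGINT_BE').
    static long uidFromRowKey(byte[] rowKey) {
        // ByteBuffer defaults to big-endian byte order, matching BIGINT_BE.
        return ByteBuffer.wrap(rowKey, rowKey.length - 8, 8).getLong();
    }

    public static void main(String[] args) {
        // Hypothetical rowkey: 1 salt byte followed by uid 42 in big-endian.
        byte[] rowKey = ByteBuffer.allocate(9).put((byte) 0x01).putLong(42L).array();
        System.out.println(uidFromRowKey(rowKey)); // prints 42
    }
}
```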




2016-06-17 4:39 GMT+08:00 Aditya :

> https://issues.apache.org/jira/browse/DRILL-4727
>
> On Thu, Jun 16, 2016 at 11:39 AM, Aman Sinha  wrote:
>
>> Qiang/Aditya can you create a JIRA for this and mark it for 1.7.  thanks.
>>
>> On Thu, Jun 16, 2016 at 11:25 AM, Aditya  wrote:
>>
>> > Thanks for reporting, I'm looking into it and will post a patch soon.
>> >
>> > On Wed, Jun 15, 2016 at 7:27 PM, qiang li  wrote:
>> >
>> > > Hi Aditya,
>> > >
>> > > I tested the latest version and got this exception, and the drillbit
>> > > fails to start up.
>> > >
>> > > Exception in thread "main" java.lang.NoSuchMethodError:
>> > > io.netty.util.UniqueName.(Ljava/lang/String;)V
>> > > at
>> io.netty.channel.ChannelOption.(ChannelOption.java:136)
>> > > at
>> io.netty.channel.ChannelOption.valueOf(ChannelOption.java:99)
>> > > at
>> io.netty.channel.ChannelOption.(ChannelOption.java:42)
>> > > at
>> > > org.apache.drill.exec.rpc.BasicServer.(BasicServer.java:63)
>> > > at
>> > > org.apache.drill.exec.rpc.user.UserServer.(UserServer.java:74)
>> > > at
>> > >
>> org.apache.drill.exec.service.ServiceEngine.(ServiceEngine.java:78)
>> > > at
>> > org.apache.drill.exec.server.Drillbit.(Drillbit.java:108)
>> > > at
>> org.apache.drill.exec.server.Drillbit.start(Drillbit.java:285)
>> > > at
>> org.apache.drill.exec.server.Drillbit.start(Drillbit.java:271)
>> > > at
>> org.apache.drill.exec.server.Drillbit.main(Drillbit.java:267)
>> > >
>> > > It works if I remove jars/3rdparty/netty-all-4.0.23.Final.jar; then
>> > > Drill can start up. I think there is a package dependency version
>> > > issue, don't you think?
>> > >
>> > >
>> > >
>> > > 2016-06-15 8:14 GMT+08:00 Aditya :
>> > >
>> > >> HBase 1.x support has been merged and is available in latest
>> > >> 1.7.0-SNAPSHOT
>> > >> builds.
>> > >>
>> > >> On Wed, Jun 1, 2016 at 1:23 PM, Aditya 
>> wrote:
>> > >>
>> > >> > Thanks Jacques for promptly reviewing my long series of patches!
>> > >> >
>> > >> > I'm planning to merge the HBase 1.x support some time in next 48
>> > hours.
>> > >> >
>> > >> > If anyone else is interested and willing, please review the latest
>> > patch
>> > >> > here[1].
>> > >> >
>> > >> > aditya...
>> > >> >
>> > >> > [1] https://github.com/apache/drill/pull/443/files
>> > >> >
>> > >>
>> > >
>> > >
>> >
>>
>
>


[jira] [Created] (DRILL-4732) Update JDBC driver to use the new prepared statement APIs on DrillClient

2016-06-20 Thread Venki Korukanti (JIRA)
Venki Korukanti created DRILL-4732:
--

 Summary: Update JDBC driver to use the new prepared statement APIs 
on DrillClient
 Key: DRILL-4732
 URL: https://issues.apache.org/jira/browse/DRILL-4732
 Project: Apache Drill
  Issue Type: Sub-task
Reporter: Venki Korukanti


DRILL-4729 adds a new prepared statement implementation on the server side. It 
provides APIs on DrillClient to create a new prepared statement, which returns 
metadata along with an opaque handle, and to submit the prepared statement for 
execution.





[GitHub] drill pull request #519: DRILL-4530: Optimize partition pruning with metadat...

2016-06-20 Thread jinfengni
Github user jinfengni commented on a diff in the pull request:

https://github.com/apache/drill/pull/519#discussion_r67716686
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSelection.java 
---
@@ -47,16 +47,25 @@
   private List statuses;
 
   public List files;
+  /**
+   * root path for the selections
+   */
   public final String selectionRoot;
+  /**
+   * root path for the metadata cache file (if any)
+   */
+  public final String cacheFileRoot;
--- End diff --

When singlePartitionOpt is applied, is it possible to update selectionRoot 
to be cacheFileRoot? That is, we would not maintain cacheFileRoot separately. 
Instead, a FileSelection with an updated selectionRoot would be used when 
singlePartitionOpt is applied. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #530: DRILL-4729: Add support for prepared statement impl...

2016-06-20 Thread vkorukanti
GitHub user vkorukanti opened a pull request:

https://github.com/apache/drill/pull/530

DRILL-4729: Add support for prepared statement implementation on server side

+ Add following APIs for Drill Java client
  - DrillRpcFuture 
createPreparedStatement(final String query)
  - void executePreparedStatement(final PreparedStatement 
preparedStatement, UserResultsListener resultsListener)
  - List executePreparedStatement(final PreparedStatement 
preparedStatement) (for testing purpose)

+ Separated out the interface from UserClientConnection. It makes it easy 
to have wrappers which need to
  tap the messages and data going to the actual client.

+ Implement CREATE_PREPARED_STATEMENT and handle RunQuery with 
PreparedStatement

+ Test changes to support prepared statement as query type

+ Add tests in TestPreparedStatementProvider

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vkorukanti/drill DRILL-4729

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/530.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #530


commit 32ba03c7abd9a3784c9a5376dd2835325fe8d5f9
Author: vkorukanti 
Date:   2016-06-09T23:03:06Z

DRILL-4728: Add support for new metadata fetch APIs

+ Protobuf messages
   - GetCatalogsReq -> GetCatalogsResp
   - GetSchemasReq -> GetSchemasResp
   - GetTablesReq -> GetTablesResp
   - GetColumnsReq -> GetColumnsResp

+ Java Drill client changes

+ Server side changes to handle the metadata API calls
  - Provide a self contained `Runnable` implementation for each metadata API
that process the requests and sends the response to client
  - In `UserWorker` override the `handle` method that takes the 
`ResponseSender` and
send the response from the `handle` method instead of returning it.
  - Add a method for each new API to UserWorker to submit the metadata work.
  - Add a method `addNewWork(Runnable runnable)` to `WorkerBee` to submit a 
generic
`Runnable` to `ExecutorService`.
  - Move out couple of methods from `QueryContext` into a separate interface
`SchemaConfigInfoProvider` to enable instantiating Schema trees without 
the
full `QueryContext`

+ New protobuf messages increased the `jdbc-all.jar` size. Up the limit to 
21MB.

Change-Id: I5a5e4b453caf912d832ff8547c5789c884195cc4

commit c520eda8a2169e173763e5f84d919c87de46e895
Author: vkorukanti 
Date:   2016-06-13T18:20:25Z

DRILL-4729: Add support for prepared statement implementation on server side

+ Add following APIs for Drill Java client
  - DrillRpcFuture 
createPreparedStatement(final String query)
  - void executePreparedStatement(final PreparedStatement 
preparedStatement, UserResultsListener resultsListener)
  - List executePreparedStatement(final PreparedStatement 
preparedStatement) (for testing purpose)

+ Separated out the interface from UserClientConnection. It makes it easy 
to have wrappers which need to
  tap the messages and data going to the actual client.

+ Implement CREATE_PREPARED_STATEMENT and handle RunQuery with 
PreparedStatement

+ Test changes to support prepared statement as query type

+ Add tests in TestPreparedStatementProvider

Change-Id: Id26cbb9ed809f0ab3c7530e6a5d8314d2e868b86






Re: DRILL-4199: Add Support for HBase 1.X - planning to merge

2016-06-20 Thread Aman Sinha
Hi Qiang,
were you seeing this same issue with the prior HBase version also ?  (I
would think this is not a regression).  It would be best to create a new
JIRA and attach the EXPLAIN plans for the successful and failed runs.  With
more nodes some minor fragments of the hash join may be getting empty input
batches and I am guessing that has something to do with the
SchemaChangeException.   Someone would need to debug once you create the
JIRA with relevant details.

-Aman



[GitHub] drill pull request #519: DRILL-4530: Optimize partition pruning with metadat...

2016-06-20 Thread jinfengni
Github user jinfengni commented on a diff in the pull request:

https://github.com/apache/drill/pull/519#discussion_r67711244
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/partition/PruneScanRule.java
 ---
@@ -269,13 +283,54 @@ protected void doOnMatch(RelOptRuleCall call, Filter 
filterRel, Project projectR
 int recordCount = 0;
 int qualifiedCount = 0;
 
-// Inner loop: within each batch iterate over the 
PartitionLocations
-for(PartitionLocation part: partitions){
-  if(!output.getAccessor().isNull(recordCount) && 
output.getAccessor().get(recordCount) == 1){
-newPartitions.add(part);
-qualifiedCount++;
+if (checkForSingle &&
+partitions.get(0).isCompositePartition() /* apply single 
partition check only for composite partitions */) {
+  // Inner loop: within each batch iterate over the 
PartitionLocations
+  for (PartitionLocation part : partitions) {
+assert part.isCompositePartition();
+if(!output.getAccessor().isNull(recordCount) && 
output.getAccessor().get(recordCount) == 1) {
+  newPartitions.add(part);
+  if (isSinglePartition) { // only need to do this if we are 
already single partition
+// compose the array of partition values for the 
directories that are referenced by filter:
+// e.g suppose the dir hierarchy is year/quarter/month and 
the query is:
+// SELECT * FROM T WHERE dir0=2015 AND dir1 = 'Q1',
+// then for 2015/Q1/Feb, this will have ['2015', 'Q1', 
null]
--- End diff --

For the WHERE condition dir0=2015 and dir2 = 'Jan': if the dataset happens to 
have only one 'Jan' under the '2015' directory, will this qualify for 
singlePartitionOpt?





[GitHub] drill pull request #519: DRILL-4530: Optimize partition pruning with metadat...

2016-06-20 Thread jinfengni
Github user jinfengni commented on a diff in the pull request:

https://github.com/apache/drill/pull/519#discussion_r67709299
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/partition/PruneScanRule.java
 ---
@@ -320,7 +377,17 @@ protected void doOnMatch(RelOptRuleCall call, Filter 
filterRel, Project projectR
   condition = condition.accept(reverseVisitor);
   pruneCondition = pruneCondition.accept(reverseVisitor);
 
-  RelNode inputRel = descriptor.createTableScan(newPartitions);
+  String cacheFileRoot = null;
+  if (checkForSingle && isSinglePartition) {
+// if metadata cache file could potentially be used, then assign a 
proper cacheFileRoot
+String path = "";
+for (int j = 0; j <= maxIndex; j++) {
+  path += "/" + spInfo[j];
--- End diff --

Related to Line 313, here we do not check spInfo[j] == null ?
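To make the question concrete, a hedged sketch (the compose/spInfo names are
illustrative, not the actual PruneScanRule code) of composing the path while
stopping at the first null partition value, so a root like /data/2015/null is
never produced:

```java
public class CacheFilePath {
    // Compose a cache-file root from leading partition values, stopping at
    // the first null (an unreferenced directory level) so "null" segments
    // never end up in the composed path.
    static String compose(String selectionRoot, String[] spInfo, int maxIndex) {
        StringBuilder path = new StringBuilder(selectionRoot);
        for (int j = 0; j <= maxIndex; j++) {
            if (spInfo[j] == null) {
                break;
            }
            path.append('/').append(spInfo[j]);
        }
        return path.toString();
    }

    public static void main(String[] args) {
        // e.g. WHERE dir0=2015 AND dir1='Q1' yields ['2015', 'Q1', null]
        System.out.println(compose("/data", new String[]{"2015", "Q1", null}, 2));
        // prints /data/2015/Q1
    }
}
```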




RE: [GitHub] drill issue #517: DRILL-4704 fix

2016-06-20 Thread Dave Oshinsky
I ran the Windows TestDecimal unit tests with ExecConstants.java modified as 
suggested. The failure looks as follows:

Operating system: Windows 7
Windows hack for parquet: setting hadoop.home.dir to c:\winutil\
Running org.apache.drill.exec.physical.impl.TestDecimal#testCastFromFloat
Query: {
  "head" : {
"version" : 1,
"generator" : {
  "type" : "org.apache.drill.exec.planner.logical.DrillImplementor",
  "info" : ""
},
"type" : "APACHE_DRILL_PHYSICAL",
"resultMode" : "EXEC"
  },
  graph:[
  {
  @id:1,
  pop:"fs-scan",
  format: {type: "json"},
  storage:{type: "file", connection: "classpath:///"},
  files:["/input_simple_decimal.json"]
  }, {
"pop" : "project",
"@id" : 2,
"exprs" : [ {
  "ref" : "F4",
  "expr" : " (cast(DEC9 as float4)) "
},
{ "ref" : "F8", "expr": "(cast(DEC18 as float8))" }
],

"child" : 1
  },
{
"pop" : "project",
"@id" : 4,
"exprs" : [ {
  "ref" : "DECIMAL_9",
  "expr" : " cast(F4 as decimal9(9, 4))  "
},
{"ref": "DECIMAL38", "expr" : "cast(F8 as decimal38sparse(38, 4))"}
],

"child" : 2
  },
{
"pop" : "screen",
"@id" : 5,
"child" : 4
  } ]
}
mapException1: java.util.concurrent.ExecutionException: 
org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: 
PatternSyntaxException: Unexpected internal error near index 1
\
 ^


[Error Id: f99ad9ee-bc5b-4001-9568-cb806a1a2875 on 
DaveOshinsky-PC.gp.cv.commvault.com:31010]

  (org.apache.drill.exec.work.foreman.ForemanSetupException) Failure while 
parsing physical plan.
org.apache.drill.exec.work.foreman.Foreman.parseAndRunPhysicalPlan():391
org.apache.drill.exec.work.foreman.Foreman.run():248
java.util.concurrent.ThreadPoolExecutor.runWorker():1142
java.util.concurrent.ThreadPoolExecutor$Worker.run():617
java.lang.Thread.run():745
  Caused By (com.fasterxml.jackson.databind.JsonMappingException) Instantiation 
of [simple type, class org.apache.drill.exec.store.dfs.easy.EasyGroupScan] 
value failed (java.util.regex.PatternSyntaxException): Unexpected internal 
error near index 1
\
 ^
 at [Source: {
  "head" : {
"version" : 1,
"generator" : {
  "type" : "org.apache.drill.exec.planner.logical.DrillImplementor",
  "info" : ""
},
"type" : "APACHE_DRILL_PHYSICAL",
"resultMode" : "EXEC"
  },
  graph:[
  {
  @id:1,
  pop:"fs-scan",
  format: {type: "json"},
  storage:{type: "file", connection: "classpath:///"},
  files:["/input_simple_decimal.json"]
  }, {
"pop" : "project",
"@id" : 2,
"exprs" : [ {
  "ref" : "F4",
  "expr" : " (cast(DEC9 as float4)) "
},
{ "ref" : "F8", "expr": "(cast(DEC18 as float8))" }
],

"child" : 1
  },
{
"pop" : "project",
"@id" : 4,
"exprs" : [ {
  "ref" : "DECIMAL_9",
  "expr" : " cast(F4 as decimal9(9, 4))  "
},
{"ref": "DECIMAL38", "expr" : "cast(F8 as decimal38sparse(38, 4))"}
],

"child" : 2
  },
{
"pop" : "screen",
"@id" : 5,
"child" : 4
  } ]
}; line: 18, column: 3] (through reference chain: 
org.apache.drill.exec.physical.PhysicalPlan["graph"]->java.util.ArrayList[0])
com.fasterxml.jackson.databind.JsonMappingException.from():223

com.fasterxml.jackson.databind.deser.std.StdValueInstantiator.wrapAsJsonMappingException():445

com.fasterxml.jackson.databind.deser.std.StdValueInstantiator.rewrapCtorProblem():464

com.fasterxml.jackson.databind.deser.std.StdValueInstantiator.createFromObjectWith():258
com.fasterxml.jackson.databind.deser.impl.PropertyBasedCreator.build():135

com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeUsingPropertyBased():444

com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeFromObjectUsingNonDefault():1123

com.fasterxml.jackson.databind.deser.BeanDeserializer.deserializeFromObject():298

com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeWithObjectId():1094

com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeOther():166
com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize():135

com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer._deserializeTypedForId():120

com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer.deserializeTypedFromObject():91

com.fasterxml.jackson.databind.deser.AbstractDeserializer.deserializeWithType():142

com.fasterxml.jackson.databind.deser.std.CollectionDeserializer.deserialize():279

com.fasterxml.jackson.databind.deser.std.CollectionDeserializer.deserialize():249

com.fasterxml.jackson.databind.deser.std.CollectionDeserializer.deserialize():26
com.fasterxml.jackson.databind.deser.SettableBeanProperty.deserialize():490

com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeWithErrorWrapping():465

com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializ