Re: DRILL-4199: Add Support for HBase 1.X - planning to merge
Another issue: sometimes when I restart the node, it cannot start up. Here is the exception:

ache-drill-1.7.0/jars/drill-gis-1.7.0-SNAPSHOT.jar!/, jar:file:/usr/lib/apache-drill-1.7.0/jars/drill-memory-base-1.7.0-SNAPSHOT.jar!/] took 2800ms
2016-06-20 19:10:18,313 [main] INFO o.a.d.e.s.s.PersistentStoreRegistry - Using the configured PStoreProvider class: 'org.apache.drill.exec.store.sys.store.provider.ZookeeperPersistentStoreProvider'.
2016-06-20 19:10:19,221 [main] INFO o.apache.drill.exec.server.Drillbit - Construction completed (1529 ms).
2016-06-20 19:10:31,136 [main] WARN o.apache.drill.exec.server.Drillbit - Failure on close()
java.lang.NullPointerException: null
        at org.apache.drill.exec.work.WorkManager.close(WorkManager.java:153) ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
        at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:76) ~[drill-common-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
        at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:64) ~[drill-common-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
        at org.apache.drill.exec.server.Drillbit.close(Drillbit.java:159) [drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
        at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:293) [drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
        at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:271) [drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
        at org.apache.drill.exec.server.Drillbit.main(Drillbit.java:267) [drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
2016-06-20 19:10:31,137 [main] INFO o.apache.drill.exec.server.Drillbit - Shutdown completed (1914 ms).

I did nothing, started it the next day, and then it came up fine.

2016-06-21 9:48 GMT+08:00 qiang li :
> Hi Aman,
>
> I did not fully test with the old version.
>
> Could you please help me create the JIRA issue? I think my account does not
> have the privilege; my account is griffinli and I cannot find the place to
> create a new issue.
> Below is the explain detail for the same SQL on different nodes of the cluster.
>
> This is the correct plan, which only has two nodes:
>
> 0: jdbc:drill:zk=xxx:> explain plan for select
> CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') as uid,
> convert_from(`ref0`.`v`.`v`,'UTF8') as v from hbase.`offers_nation_idx` as
> `nation` join hbase.offers_ref0 as `ref0` on
> BYTE_SUBSTR(`ref0`.row_key,-8,8) = nation.`v`.`v` where `nation`.row_key >
> '0br' and `nation`.row_key < '0bs' limit 10;
> +--+--+
> | text | json |
> +--+--+
> | 00-00    Screen
> 00-01      Project(uid=[$0], v=[$1])
> 00-02        SelectionVectorRemover
> 00-03          Limit(fetch=[10])
> 00-04            UnionExchange
> 01-01              SelectionVectorRemover
> 01-02                Limit(fetch=[10])
> 01-03                  Project(uid=[CONVERT_FROMBIGINT_BE(BYTE_SUBSTR($3, -8, 8))], v=[CONVERT_FROMUTF8(ITEM($4, 'v'))])
> 01-04                    Project(row_key=[$3], v=[$4], ITEM=[$5], row_key0=[$0], v0=[$1], $f2=[$2])
> 01-05                      HashJoin(condition=[=($2, $5)], joinType=[inner])
> 01-07                        Project(row_key=[$0], v=[$1], $f2=[BYTE_SUBSTR($0, -8, 8)])
> 01-09                          Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec [tableName=offers_ref0, startRow=null, stopRow=null, filter=null], columns=[`*`]]])
> 01-06                        Project(row_key0=[$0], v0=[$1], ITEM=[$2])
> 01-08                          *BroadcastExchange*
> 02-01                            Project(row_key=[$0], v=[$1], ITEM=[ITEM($1, 'v')])
> 02-02                              Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec [tableName=offers_nation_idx, startRow=0br\x00, stopRow=0bs, filter=FilterList AND (2/2): [RowFilter (GREATER, 0br), RowFilter (LESS, 0bs)]], columns=[`row_key`, `v`, `v`.`v`]]])
>
> This is the plan that fails, which has more than 5 nodes:
>
> 0: jdbc:drill:zk=xxx:> explain plan for select
> CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') as uid,
> convert_from(`ref0`.`v`.`v`,'UTF8') as v from hbase.`offers_nation_idx` as
> `nation` join hbase.offers_ref0 as `ref0` on
> BYTE_SUBSTR(`ref0`.row_key,-8,8) = nation.`v`.`v` where `nation`.row_key >
> '0br' and `nation`.row_key < '0bs' limit 10;
> +--+--+
> | text | json |
> +--+--+
> | 00-00    Screen
> 00-01      Project(uid=[$0], v=[$1])
> 00-02        SelectionVectorRemover
> 00-03          Limit(fetch=[10])
> 00-04            UnionExchange
> 01-01              SelectionVectorRemover
> 01-02                Limit(fetch=[10])
> 01-03                  Project(uid=[CONVERT_FROMBIGINT_BE(BYTE_SUBSTR($3, -8, 8))], v=[CONVERT_FROMUTF8(ITEM($4, 'v'))])
> 01-04                    Project(row_key=[$3], v=[$4], ITEM=[$5], row_key0=[$0], v0=[$1], $f2=[$2])
> 01-05                      HashJoin(condition=[=($2, $5)], joinType=[inner])
> 01-07                        Pr
Re: Dynamic UDFs support
Hi Neeraja,

The proposal calls for the user to copy the jar file to each Drillbit node. The jar would go into a new $DRILL_HOME/jars/3rdparty/udf directory. In Drill-on-YARN (DoY), YARN is responsible for copying Drill code to each node (which is good.) YARN puts that code in a location known only to YARN. Since the location is private to YARN, the user can’t easily hunt down the location in order to add the udf jar. Even if the user did find the location, the next Drillbit to start would create a new copy of the Drill software, without the udf jar.

Second, in DoY we have separated user files from Drill software. This makes it much easier to distribute the software to each node: we give the Drill distribution tar archive to YARN, and YARN copies it to each node and untars the Drill files. We make a separate copy of the (far smaller) set of user config files. If the udf jar goes into a Drill folder ($DRILL_HOME/jars/3rdparty/udf), then the user would have to rebuild the Drill tar file each time they add a udf jar. When I tried this myself when building DoY, I found it to be slow and error-prone.

So, the solution is to place the udf code in the new “site” directory: $DRILL_SITE/jars. That’s what that is for. Then, let DoY automatically distribute the code to every node. Perfect! Except that it does not work to dynamically distribute code after Drill starts.

For DoY, the solution requirements are:

1. Distribute code using Drill itself, rather than manually copying jars to (unknown) Drill directories.
2. Ensure the solution works even if another Drillbit is spun up later, and uses the original Drill tar file.

I’m thinking we want to leverage DFS: place udf files into a well-known DFS directory. Register the udf into, say, ZK. When a new Drillbit starts, it looks for new udf jars in ZK, copies the file to a temporary location, and launches. An existing Drill is notified of the change and does the same download process.
Clean-up is needed at some point to remove ZK entries if the udf jar becomes statically available on the next launch. That needs more thought. We’d still need the phases mentioned earlier to ensure consistency. Suggestions anyone as to how to do this super simply & still get it to work with DoY? Thanks, - Paul > On Jun 20, 2016, at 7:18 PM, Neeraja Rentachintala > wrote: > > This will need to work with YARN (Once Drill is YARN enabled, I would > expect a lot of users using it in conjunction with YARN). > Paul, I am not clear why this wouldn't work with YARN. Can you elaborate. > > -Neeraja > > On Mon, Jun 20, 2016 at 7:01 PM, Paul Rogers wrote: > >> Good enough, as long as we document the limitation that this feature can’t >> work with YARN deployment as users generally do not have access to the >> temporary “localization” directories where the Drill code is placed by YARN. >> >> Note that the jar distribution race condition issue occurs with the >> proposed design: I believe I sketched out a scenario in one of the earlier >> comments. Drillbit A receives the CREATE FUNCTION command. It tells >> Drillbit B. While informing the other Drillbits, Drillbit B plans and >> launches a query that uses the function. Drillbit Z starts execution of the >> query before it learns from A about the new function. This will be rare — >> just rare enough to create very hard to reproduce bugs. >> >> The only reliable solution is to do the work in multiple passes: >> >> Pass 1: Ask each node to load the function, but not make it available to >> the planner. (it would be available to the execution engine.) >> Pass 2: Await confirmation from each node that this is done. >> Pass 3: Alert every node that it is now free to plan queries with the >> function. >> >> Finally, I wonder if we should design the SQL syntax based on a long-term >> design, even if the feature itself is a short-term work-around. Changing >> the syntax later might break scripts that users might write. 
>> >> So, the question for the group is this: is the value of semi-complete >> feature sufficient to justify the potential problems? >> >> - Paul >> >>> On Jun 20, 2016, at 6:15 PM, Parth Chandra >> wrote: >>> >>> Moving discussion to dev. >>> >>> I believe the aim is to do a simple implementation without the complexity >>> of distributing the UDF. I think the document should make this limitation >>> clear. >>> >>> Per Paul's point on there being a simpler solution of just having each >>> drillbit detect the if a UDF is present, I think the problem is if a UDF >>> get's deployed to some but not all drillbits. A query can then start >>> executing but not run successfully. The intent of the create commands >> would >>> be to ensure that all drillbits have the UDF or none would. >>> >>> I think Jacques' point about ownership conflicts is not addressed >> clearly. >>> Also, the unloading is not clear. The delete command should probably >> remove >>> the UDF and unload it. >>> >>> >>> On Fri, Jun 17
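The DFS-based distribution Paul sketches above (jars in a well-known shared location, copied node-locally on Drillbit start) could look roughly like the helper below. This is a minimal sketch under assumptions: the shared DFS path is visible as an ordinary directory, the ZK registration/notification step is omitted, and `UdfJarSync` and its method names are invented for illustration, not Drill APIs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical "download on start" step: before building its classpath, a
// Drillbit copies any registered UDF jars from a well-known shared directory
// (standing in for the DFS location) into a node-local staging directory.
public class UdfJarSync {

  // Copies new .jar files from sharedUdfDir into localStageDir and returns
  // how many were copied. Jars staged on a previous start are skipped.
  public static int syncJars(Path sharedUdfDir, Path localStageDir) throws IOException {
    Files.createDirectories(localStageDir);
    int copied = 0;
    try (Stream<Path> entries = Files.list(sharedUdfDir)) {
      List<Path> jars = entries
          .filter(p -> p.getFileName().toString().endsWith(".jar"))
          .collect(Collectors.toList());
      for (Path jar : jars) {
        Path target = localStageDir.resolve(jar.getFileName());
        if (!Files.exists(target)) {
          Files.copy(jar, target);
          copied++;
        }
      }
    }
    return copied;
  }
}
```

Because the copy is idempotent, a Drillbit spun up later from the original tar file would pick up the same jars on its first start, which addresses requirement 2 above.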
Re: Dynamic UDFs support
This will need to work with YARN (Once Drill is YARN enabled, I would expect a lot of users using it in conjunction with YARN). Paul, I am not clear why this wouldn't work with YARN. Can you elaborate. -Neeraja On Mon, Jun 20, 2016 at 7:01 PM, Paul Rogers wrote: > Good enough, as long as we document the limitation that this feature can’t > work with YARN deployment as users generally do not have access to the > temporary “localization” directories where the Drill code is placed by YARN. > > Note that the jar distribution race condition issue occurs with the > proposed design: I believe I sketched out a scenario in one of the earlier > comments. Drillbit A receives the CREATE FUNCTION command. It tells > Drillbit B. While informing the other Drillbits, Drillbit B plans and > launches a query that uses the function. Drillbit Z starts execution of the > query before it learns from A about the new function. This will be rare — > just rare enough to create very hard to reproduce bugs. > > The only reliable solution is to do the work in multiple passes: > > Pass 1: Ask each node to load the function, but not make it available to > the planner. (it would be available to the execution engine.) > Pass 2: Await confirmation from each node that this is done. > Pass 3: Alert every node that it is now free to plan queries with the > function. > > Finally, I wonder if we should design the SQL syntax based on a long-term > design, even if the feature itself is a short-term work-around. Changing > the syntax later might break scripts that users might write. > > So, the question for the group is this: is the value of semi-complete > feature sufficient to justify the potential problems? > > - Paul > > > On Jun 20, 2016, at 6:15 PM, Parth Chandra > wrote: > > > > Moving discussion to dev. > > > > I believe the aim is to do a simple implementation without the complexity > > of distributing the UDF. I think the document should make this limitation > > clear. 
> > > > Per Paul's point on there being a simpler solution of just having each > > drillbit detect the if a UDF is present, I think the problem is if a UDF > > get's deployed to some but not all drillbits. A query can then start > > executing but not run successfully. The intent of the create commands > would > > be to ensure that all drillbits have the UDF or none would. > > > > I think Jacques' point about ownership conflicts is not addressed > clearly. > > Also, the unloading is not clear. The delete command should probably > remove > > the UDF and unload it. > > > > > > On Fri, Jun 17, 2016 at 11:19 AM, Paul Rogers > wrote: > > > >> Reviewed the spec; many comments posted. Three primary comments for the > >> community to consider. > >> > >> 1. The design conflicts with the Drill-on-YARN project. Is this a > specific > >> fix for one unique problem, or is it worth expanding the solution to > work > >> with Drill-on-YARN deployments? Might be hard to make the two work > together > >> later. See comments in docs for details. > >> > >> 2. Have we, by chance, looked at how other projects handle code > >> distribution? Spark, Storm and others automatically deploy code across > the > >> cluster; no manual distribution to each node. The key difference between > >> Drill and others is that, for Storm, say, code is associated with a job > >> (“topology” in Storm terms.) But, in Drill, functions are global and > have > >> no obvious life cycle that suggests when the code can be unloaded. > >> > >> 3. Have considered the class loader, dependency and name space isolation > >> issues addressed by such products as Tomcat (web apps) or Eclipse > >> (plugins)? Putting user code in the same namespace as Drill code is > quick > >> & dirty. It turns out, however, that doing so leads to problems that > >> require long, frustrating debugging sessions to resolve. > >> > >> Addressing item 1 might expand scope a bit. 
Addressing items 2 and 3 > are a > >> big increase in scope, so I won’t be surprised if we leave those issues > for > >> later. (Though, addressing item 2 might be the best way to address item > 1.) > >> > >> If we want a very simple solution that requires minimal change, perhaps > we > >> can use an even simpler solution. In the proposed design, the user still > >> must distribute code to all the nodes. The primary change is to tell > Drill > >> to load (or unload) that code. Can accomplish the same result easier > simply > >> by having Drill periodically scan certain directories looking for new > (or > >> removed) jars? Still won’t work with YARN, or solve the name space > issues, > >> but will work for existing non-YARN Drill users without new SQL syntax. > >> > >> Thanks, > >> > >> - Paul > >> > >>> On Jun 16, 2016, at 2:07 PM, Jacques Nadeau > wrote: > >>> > >>> Two quick thoughts: > >>> > >>> - (user) In the design document I didn't see any discussion of > >>> ownership/conflicts or unloading. Would be helpful to see the thinking > >> there > >>> - (dev) There is a
Re: Dynamic UDFs support
Good enough, as long as we document the limitation that this feature can’t work with YARN deployment as users generally do not have access to the temporary “localization” directories where the Drill code is placed by YARN. Note that the jar distribution race condition issue occurs with the proposed design: I believe I sketched out a scenario in one of the earlier comments. Drillbit A receives the CREATE FUNCTION command. It tells Drillbit B. While informing the other Drillbits, Drillbit B plans and launches a query that uses the function. Drillbit Z starts execution of the query before it learns from A about the new function. This will be rare — just rare enough to create very hard to reproduce bugs. The only reliable solution is to do the work in multiple passes: Pass 1: Ask each node to load the function, but not make it available to the planner. (it would be available to the execution engine.) Pass 2: Await confirmation from each node that this is done. Pass 3: Alert every node that it is now free to plan queries with the function. Finally, I wonder if we should design the SQL syntax based on a long-term design, even if the feature itself is a short-term work-around. Changing the syntax later might break scripts that users might write. So, the question for the group is this: is the value of semi-complete feature sufficient to justify the potential problems? - Paul > On Jun 20, 2016, at 6:15 PM, Parth Chandra wrote: > > Moving discussion to dev. > > I believe the aim is to do a simple implementation without the complexity > of distributing the UDF. I think the document should make this limitation > clear. > > Per Paul's point on there being a simpler solution of just having each > drillbit detect the if a UDF is present, I think the problem is if a UDF > get's deployed to some but not all drillbits. A query can then start > executing but not run successfully. The intent of the create commands would > be to ensure that all drillbits have the UDF or none would. 
> > I think Jacques' point about ownership conflicts is not addressed clearly. > Also, the unloading is not clear. The delete command should probably remove > the UDF and unload it. > > > On Fri, Jun 17, 2016 at 11:19 AM, Paul Rogers wrote: > >> Reviewed the spec; many comments posted. Three primary comments for the >> community to consider. >> >> 1. The design conflicts with the Drill-on-YARN project. Is this a specific >> fix for one unique problem, or is it worth expanding the solution to work >> with Drill-on-YARN deployments? Might be hard to make the two work together >> later. See comments in docs for details. >> >> 2. Have we, by chance, looked at how other projects handle code >> distribution? Spark, Storm and others automatically deploy code across the >> cluster; no manual distribution to each node. The key difference between >> Drill and others is that, for Storm, say, code is associated with a job >> (“topology” in Storm terms.) But, in Drill, functions are global and have >> no obvious life cycle that suggests when the code can be unloaded. >> >> 3. Have considered the class loader, dependency and name space isolation >> issues addressed by such products as Tomcat (web apps) or Eclipse >> (plugins)? Putting user code in the same namespace as Drill code is quick >> & dirty. It turns out, however, that doing so leads to problems that >> require long, frustrating debugging sessions to resolve. >> >> Addressing item 1 might expand scope a bit. Addressing items 2 and 3 are a >> big increase in scope, so I won’t be surprised if we leave those issues for >> later. (Though, addressing item 2 might be the best way to address item 1.) >> >> If we want a very simple solution that requires minimal change, perhaps we >> can use an even simpler solution. In the proposed design, the user still >> must distribute code to all the nodes. The primary change is to tell Drill >> to load (or unload) that code. 
Can accomplish the same result easier simply >> by having Drill periodically scan certain directories looking for new (or >> removed) jars? Still won’t work with YARN, or solve the name space issues, >> but will work for existing non-YARN Drill users without new SQL syntax. >> >> Thanks, >> >> - Paul >> >>> On Jun 16, 2016, at 2:07 PM, Jacques Nadeau wrote: >>> >>> Two quick thoughts: >>> >>> - (user) In the design document I didn't see any discussion of >>> ownership/conflicts or unloading. Would be helpful to see the thinking >> there >>> - (dev) There is a row oriented facade via the >>> FieldReader/FieldWriter/ComplexWriter classes. That would be a good place >>> to start when trying to implement an alternative interface. >>> >>> >>> -- >>> Jacques Nadeau >>> CTO and Co-Founder, Dremio >>> >>> On Thu, Jun 16, 2016 at 11:32 AM, John Omernik wrote: >>> Honestly, I don't see it as a priority issue. I think some of the ideas around community java UDFs could be a better approach. I'd hate to take >
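The three-pass protocol described in this thread (load everywhere for execution only, await confirmation from every node, then enable planning) can be modeled in-process as follows. This is a sketch only: `NodeRegistry` and `register` are invented stand-ins for the real RPC layer and function registry, not Drill code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the three-pass UDF registration protocol:
// pass 1 loads the function on every node (execution engine only),
// pass 2 awaits confirmation from all nodes, and only then does
// pass 3 make the function visible to the planner cluster-wide.
public class PhasedUdfRegistration {

  // Stand-in for one Drillbit's local function registry.
  static class NodeRegistry {
    boolean loadedForExecution;
    boolean visibleToPlanner;
    boolean load(String udfName) { loadedForExecution = true; return true; } // pass 1
    void enablePlanning() { visibleToPlanner = true; }                       // pass 3
  }

  public static boolean register(String udfName, List<NodeRegistry> nodes) {
    // Pass 1: ask each node to load the function for execution only.
    List<NodeRegistry> acked = new ArrayList<>();
    for (NodeRegistry node : nodes) {
      if (node.load(udfName)) acked.add(node);
    }
    // Pass 2: proceed only once every node has confirmed the load.
    if (acked.size() != nodes.size()) return false;
    // Pass 3: every node may now plan queries that use the function.
    for (NodeRegistry node : acked) node.enablePlanning();
    return true;
  }
}
```

The point of the ordering is that no node can plan a query using the function until every node can already execute it, which closes the race where Drillbit B plans a query before Drillbit Z has loaded the jar.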
Re: DRILL-4199: Add Support for HBase 1.X - planning to merge
Hi Aman,

I did not fully test with the old version.

Could you please help me create the JIRA issue? I think my account does not have the privilege; my account is griffinli and I cannot find the place to create a new issue.

Below is the explain detail for the same SQL on different nodes of the cluster.

This is the correct plan, which only has two nodes:

0: jdbc:drill:zk=xxx:> explain plan for select CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') as uid, convert_from(`ref0`.`v`.`v`,'UTF8') as v from hbase.`offers_nation_idx` as `nation` join hbase.offers_ref0 as `ref0` on BYTE_SUBSTR(`ref0`.row_key,-8,8) = nation.`v`.`v` where `nation`.row_key > '0br' and `nation`.row_key < '0bs' limit 10;
+--+--+
| text | json |
+--+--+
| 00-00    Screen
00-01      Project(uid=[$0], v=[$1])
00-02        SelectionVectorRemover
00-03          Limit(fetch=[10])
00-04            UnionExchange
01-01              SelectionVectorRemover
01-02                Limit(fetch=[10])
01-03                  Project(uid=[CONVERT_FROMBIGINT_BE(BYTE_SUBSTR($3, -8, 8))], v=[CONVERT_FROMUTF8(ITEM($4, 'v'))])
01-04                    Project(row_key=[$3], v=[$4], ITEM=[$5], row_key0=[$0], v0=[$1], $f2=[$2])
01-05                      HashJoin(condition=[=($2, $5)], joinType=[inner])
01-07                        Project(row_key=[$0], v=[$1], $f2=[BYTE_SUBSTR($0, -8, 8)])
01-09                          Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec [tableName=offers_ref0, startRow=null, stopRow=null, filter=null], columns=[`*`]]])
01-06                        Project(row_key0=[$0], v0=[$1], ITEM=[$2])
01-08                          *BroadcastExchange*
02-01                            Project(row_key=[$0], v=[$1], ITEM=[ITEM($1, 'v')])
02-02                              Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec [tableName=offers_nation_idx, startRow=0br\x00, stopRow=0bs, filter=FilterList AND (2/2): [RowFilter (GREATER, 0br), RowFilter (LESS, 0bs)]], columns=[`row_key`, `v`, `v`.`v`]]])

This is the plan that fails, which has more than 5 nodes:

0: jdbc:drill:zk=xxx:> explain plan for select CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') as uid, convert_from(`ref0`.`v`.`v`,'UTF8') as v from hbase.`offers_nation_idx` as `nation` join hbase.offers_ref0 as `ref0` on BYTE_SUBSTR(`ref0`.row_key,-8,8) = nation.`v`.`v` where `nation`.row_key > '0br' and `nation`.row_key < '0bs' limit 10;
+--+--+
| text | json |
+--+--+
| 00-00    Screen
00-01      Project(uid=[$0], v=[$1])
00-02        SelectionVectorRemover
00-03          Limit(fetch=[10])
00-04            UnionExchange
01-01              SelectionVectorRemover
01-02                Limit(fetch=[10])
01-03                  Project(uid=[CONVERT_FROMBIGINT_BE(BYTE_SUBSTR($3, -8, 8))], v=[CONVERT_FROMUTF8(ITEM($4, 'v'))])
01-04                    Project(row_key=[$3], v=[$4], ITEM=[$5], row_key0=[$0], v0=[$1], $f2=[$2])
01-05                      HashJoin(condition=[=($2, $5)], joinType=[inner])
01-07                        Project(row_key=[$0], v=[$1], $f2=[$2])
01-09                          *HashToRandomExchange*(dist0=[[$2]])
02-01                            UnorderedMuxExchange
04-01                              Project(row_key=[$0], v=[$1], $f2=[$2], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($2)])
04-02                                Project(row_key=[$0], v=[$1], $f2=[BYTE_SUBSTR($0, -8, 8)])
04-03                                  Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec [tableName=offers_ref0, startRow=null, stopRow=null, filter=null], columns=[`*`]]])
01-06                        Project(row_key0=[$0], v0=[$1], ITEM=[$2])
01-08                          Project(row_key=[$0], v=[$1], ITEM=[$2])
01-10                            *HashToRandomExchange*(dist0=[[$2]])
03-01                              UnorderedMuxExchange
05-01                                Project(row_key=[$0], v=[$1], ITEM=[$2], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($2)])
05-02                                  Project(row_key=[$0], v=[$1], ITEM=[ITEM($1, 'v')])
05-03                                    Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec [tableName=offers_nation_idx, startRow=0br\x00, stopRow=0bs, filter=FilterList AND (2/2): [RowFilter (GREATER, 0br), RowFilter (LESS, 0bs)]], columns=[`row_key`, `v`, `v`.`v`]]])

The difference is the use of *BroadcastExchange* versus *HashToRandomExchange*. You can create the JIRA and send me the link. Thanks.

2016-06-20 23:44 GMT+08:00 Aman Sinha :
> Hi Qiang,
> were you seeing this same issue with the prior HBase version also ? (I
> would think this is not a regression). It would be best to create a new
> JIRA and attach the EXPLAIN plans for the successful and failed runs. With
> more nodes some minor fragments of the hash join may be getting empty input
> batches and I am guessing that has something to do with the
> SchemaChangeException. Someone would need to debu
Re: Dynamic UDFs support
Moving discussion to dev. I believe the aim is to do a simple implementation without the complexity of distributing the UDF. I think the document should make this limitation clear. Per Paul's point on there being a simpler solution of just having each drillbit detect whether a UDF is present, I think the problem is if a UDF gets deployed to some but not all drillbits. A query can then start executing but not run successfully. The intent of the create commands would be to ensure that either all drillbits have the UDF or none do. I think Jacques' point about ownership conflicts is not addressed clearly. Also, the unloading is not clear. The delete command should probably remove the UDF and unload it. On Fri, Jun 17, 2016 at 11:19 AM, Paul Rogers wrote: > Reviewed the spec; many comments posted. Three primary comments for the > community to consider. > > 1. The design conflicts with the Drill-on-YARN project. Is this a specific > fix for one unique problem, or is it worth expanding the solution to work > with Drill-on-YARN deployments? Might be hard to make the two work together > later. See comments in docs for details. > > 2. Have we, by chance, looked at how other projects handle code > distribution? Spark, Storm and others automatically deploy code across the > cluster; no manual distribution to each node. The key difference between > Drill and others is that, for Storm, say, code is associated with a job > (“topology” in Storm terms.) But, in Drill, functions are global and have > no obvious life cycle that suggests when the code can be unloaded. > > 3. Have we considered the class loader, dependency and name space isolation > issues addressed by such products as Tomcat (web apps) or Eclipse > (plugins)? Putting user code in the same namespace as Drill code is quick > & dirty. It turns out, however, that doing so leads to problems that > require long, frustrating debugging sessions to resolve. > > Addressing item 1 might expand scope a bit. 
Addressing items 2 and 3 are a > big increase in scope, so I won’t be surprised if we leave those issues for > later. (Though, addressing item 2 might be the best way to address item 1.) > > If we want a very simple solution that requires minimal change, perhaps we > can use an even simpler solution. In the proposed design, the user still > must distribute code to all the nodes. The primary change is to tell Drill > to load (or unload) that code. Can accomplish the same result easier simply > by having Drill periodically scan certain directories looking for new (or > removed) jars? Still won’t work with YARN, or solve the name space issues, > but will work for existing non-YARN Drill users without new SQL syntax. > > Thanks, > > - Paul > > > On Jun 16, 2016, at 2:07 PM, Jacques Nadeau wrote: > > > > Two quick thoughts: > > > > - (user) In the design document I didn't see any discussion of > > ownership/conflicts or unloading. Would be helpful to see the thinking > there > > - (dev) There is a row oriented facade via the > > FieldReader/FieldWriter/ComplexWriter classes. That would be a good place > > to start when trying to implement an alternative interface. > > > > > > -- > > Jacques Nadeau > > CTO and Co-Founder, Dremio > > > > On Thu, Jun 16, 2016 at 11:32 AM, John Omernik wrote: > > > >> Honestly, I don't see it as a priority issue. I think some of the ideas > >> around community java UDFs could be a better approach. I'd hate to take > >> away from other work to hack in something like this. > >> > >> > >> > >> On Thu, Jun 16, 2016 at 1:19 PM, Paul Rogers > wrote: > >> > >>> Ted refers to source code transformation. Drill gains its speed from > >> value > >>> vectors. However, VVs are a far cry from the row-based interface that > >> most > >>> mere mortals are accustomed to using. Since VVs are very type specific, > >>> code is typically generated to handle the specifics of each type. 
> >> Accessing > >>> VVs in Jython may be a bit of a challenge because of the "impedence > >>> mismatch" between how VVs work and the row-and-column view expected by > >> most > >>> (non-Drill) developers. > >>> > >>> I wonder if we've considered providing a row-oriented "facade" that can > >> be > >>> used by roll-your own data sources and user-defined row transforms? > Might > >>> be a hiccup in the fast VV pipeline, but might be handy for users > willing > >>> to trade a bit of speed for convenience. With such a facade, the Jython > >> row > >>> transforms that John mentions could be quite simple. > >>> > >>> On Thu, Jun 16, 2016 at 10:36 AM, Ted Dunning > >>> wrote: > >>> > Since UDF's use source code transformation, using Jython would be > difficult. > > > > On Thu, Jun 16, 2016 at 9:42 AM, Arina Yelchiyeva < > arina.yelchiy...@gmail.com> wrote: > > > Hi Charles, > > > > not that I am aware of. Proposed solution doesn't invent anything > >> new, > just > > adds possibility to add UDFs without drillbit restart. But
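The row-oriented "facade" over value vectors discussed above could be as simple as a cursor that hides the column-major layout behind row-at-a-time access. A minimal sketch, with `Object[][]` columns standing in for value vectors and all names (`RowFacade`, `next`, `get`) invented for illustration, not drawn from Drill's actual FieldReader/FieldWriter interfaces:

```java
// Hypothetical row facade over columnar storage: each Object[] column stands
// in for a value vector, and RowFacade presents the data one row at a time,
// the way a row-oriented UDF or data-source author would expect to see it.
public class RowFacade {
  private final Object[][] columns;  // columns[c][r]: column-major, like value vectors
  private int row = -1;              // cursor starts before the first row

  public RowFacade(Object[][] columns) {
    this.columns = columns;
  }

  // Advance to the next row; returns false when the batch is exhausted.
  public boolean next() {
    return ++row < columns[0].length;
  }

  // Row-style access: fetch the value of one column for the current row.
  public Object get(int columnIndex) {
    return columns[columnIndex][row];
  }
}
```

As the thread notes, such a facade trades speed for convenience: each `get` call hides the type-specific generated-code path that value vectors exist to enable.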
Re: Time for a 1.7 release
Quick update: DRILL-4733 (https://issues.apache.org/jira/browse/DRILL-4733) is a regression that the Drill QA team found today, so I will have to wait to have it resolved for 1.7.0 before creating a release candidate. On Mon, Jun 20, 2016 at 1:52 PM, Johannes Schulte < johannes.schu...@gmail.com> wrote: > Speaking for DRILL-4574: > > I can't get a simple mvn test to run, even on the master. I always get > > [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-dependency-plugin:2.8:unpack > (unpack-vector-types) on project drill-java-exec: Artifact has not been > packaged yet. When used on reactor artifact, unpack should be executed > after packaging: see MDEP-98. -> [Help 1] > > I tried some things but nothing so far worked. It's not really the unit tests > failing, it's the build. If somebody could check it out and run the > tests I'd be really happy > > On Mon, Jun 20, 2016 at 5:53 AM, Aman Sinha wrote: > > > For the fixes that were committed in the last few days, could the > > committers close the pull requests and update the JIRAs with 'Fixed' > status > > for 1.7. > > > > For the remaining JIRAs mentioned in this thread, here is the status: > > > > 1. DRILL-4525 (BETWEEN clause on Date and Timestamp): the right place to > > fix this would be an enhancement in Calcite. In the meantime, a > workaround > > is to do explicit CASTing as suggested in the JIRA. > > 2. DRILL-4653 (Skip malformed JSON): mostly reviewed but needs some more > > review/testing. > > 3. DRILL-4704 (Decimal type): unit test being added. needs review. > > 4. DRILL-4574 (Avro): Rebased but unit tests failing for some other > > reason. > > > > I would like to finalize the content by EOD tomorrow. Clearly, item 1 has > been > > pushed out of 1.7 and I think it would be quite a stretch to get the > rest > > in, so I would be in favor of pushing items 2, 3, and 4 into the next > release. > > However, since all 3 have good momentum going right now, let's try to get > > the pending issues resolved soon. 
> > > > Thanks ! > > Aman > > > > > > On Thu, Jun 16, 2016 at 1:57 PM, Aman Sinha > wrote: > > > > > It does look like DRILL-4574 was previously reviewed and ready to be > > > merged. Right now it will need to be rebased on master branch. Since > > this > > > is in the Avro plugin, I am unsure about the types of tests that need > to > > > be run.. I would prefer if Jason Altekruse could take a quick look and > > > merge into master if everything looks ok. > > > > > > On Thu, Jun 16, 2016 at 1:22 PM, Johannes Schulte < > > > johannes.schu...@gmail.com> wrote: > > > > > >> Hi, > > >> > > >> https://github.com/apache/drill/pull/459 ( > > >> https://issues.apache.org/jira/browse/DRILL-4574) is still not merged > > >> but i > > >> think it is ready for a merge. Are there any other actions necessary? > > >> > > >> Johannes > > >> > > >> On Thu, Jun 16, 2016 at 8:07 PM, Jinfeng Ni > > >> wrote: > > >> > > >> > I will review the Sean's PR for DRILL-4525, since it's a regression > > from > > >> > 1.6. > > >> > > > >> > > > >> > On Thu, Jun 16, 2016 at 9:39 AM, rahul challapalli > > >> > wrote: > > >> > > I would like to have DRILL-4525 as this is a regression (most > likely > > >> from > > >> > > 1.6). Any takers for this? > > >> > > > > >> > > - Rahul > > >> > > > > >> > > On Wed, Jun 15, 2016 at 4:03 PM, Aman Sinha > > > >> > wrote: > > >> > > > > >> > >> I can take a look at DRILL-4653. > > >> > >> > > >> > >> Could someone familiar with the Decimal type take a look at > > >> DRILL-4704 ? > > >> > >> Agree with Dave that it is a simple case that should be fixed > > (note, > > >> > >> however, that decimal is disabled by default currently). > > >> > >> > > >> > >> > > >> > >> On Wed, Jun 15, 2016 at 3:12 PM, Subbu Srinivasan < > > >> > ssriniva...@zscaler.com > > >> > >> > > > >> > >> wrote: > > >> > >> > > >> > >> > Who can review > https://issues.apache.org/jira/browse/DRILL-4653 > > ? 
> > >> > >> > > > >> > >> > On Wed, Jun 15, 2016 at 1:37 PM, Parth Chandra < > > >> pchan...@maprtech.com > > >> > > > > >> > >> > wrote: > > >> > >> > > > >> > >> > > +1 on the 1.7 release > > >> > >> > > > > >> > >> > > I'm reviewing the following and hope to get them in the > release > > >> > before > > >> > >> > > cutoff: > > >> > >> > > https://issues.apache.org/jira/browse/DRILL-2593 > > >> > >> > > https://issues.apache.org/jira/browse/DRILL-4309 > > >> > >> > > > > >> > >> > > > > >> > >> > > > > >> > >> > > On Wed, Jun 15, 2016 at 1:20 PM, Jinfeng Ni < > > >> jinfengn...@gmail.com> > > >> > >> > wrote: > > >> > >> > > > > >> > >> > > > I'm reviewing a follow-up PR [1] for DRILL-4573. I think we > > >> need > > >> > get > > >> > >> > > > it merged in, since it's a regression in terms of query > > >> > correctness > > >> > >> > > > from release 1.6. > > >> > >> > > > > > >> > >> > > > [1] https://github.com/apache/drill/pull/512 > > >> > >> > > > > > >> > >> > > > On Wed, Jun 15, 2016 at 12:21 PM, Dave Oshinsky < > > >> > >> > doshin...@commvault.co
[jira] [Created] (DRILL-4733) max(dir0) reading more columns than necessary
Rahul Challapalli created DRILL-4733: Summary: max(dir0) reading more columns than necessary Key: DRILL-4733 URL: https://issues.apache.org/jira/browse/DRILL-4733 Project: Apache Drill Issue Type: Bug Components: Query Planning & Optimization, Storage - Parquet Affects Versions: 1.7.0 Reporter: Rahul Challapalli Priority: Critical Attachments: bug.tgz The query below started to fail with this commit: 3209886a8548eea4a2f74c059542672f8665b8d2 {code} select max(dir0) from dfs.`/drill/testdata/bug/2016`; Error: UNSUPPORTED_OPERATION ERROR: Streaming aggregate does not support schema changes Fragment 0:0 [Error Id: b0060205-e9a6-428a-9803-7b4312b2c6f4 on qa-node190.qa.lab:31010] (state=,code=0) {code} The sub-folders contain files which do have a schema change for one column "contributions" (int32 vs double). However, prior to this commit we did not fail in this scenario. Log files and test data are attached. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Time for a 1.7 release
Speaking for DRILL-4574: I can't get a simple mvn test to run, even on the master. I always get [ERROR] Failed to execute goal org.apache.maven.plugins:maven-dependency-plugin:2.8:unpack (unpack-vector-types) on project drill-java-exec: Artifact has not been packaged yet. When used on reactor artifact, unpack should be executed after packaging: see MDEP-98. -> [Help 1] I tried some things but nothing so far worked. It's not really unit tests failing, it's the building. If somebody could check it out and run the tests I'd be really happy On Mon, Jun 20, 2016 at 5:53 AM, Aman Sinha wrote: > For the fixes that were committed in the last few days, could the > committers close the pull requests and update the JIRAs with 'Fixed' status > for 1.7. > > For the remaining JIRAs mentioned in this thread, here are the status: > > 1. DRILL-4525 (BETWEEN clause on Date and Timestamp): the right place to > fix this would be an enhancement in Calcite. In the meantime, a workaround > is to do explicit CASTing as suggested in the JIRA. > 2. DRILL-4653 (Skip malformed JSON): mostly reviewed but needs some more > review/testing. > 3. DRILL-4704 (Decimal type): unit test being added. needs review. > 4. DRILL-4574 (Avro): Rebased but unit tests failing for some other > reason. > > I would like to finalize the content by EOD tomorrow. Clearly, 1 has been > pushed out of 1.7 and I think it would be quite a stretch to get the rest > in, so I would be in favor of pushing 2, 3, 4 into the next release. > However, since all 3 have good momentum going right now, let's try to get > the pending issues resolved soon. > > Thanks ! > Aman > > > On Thu, Jun 16, 2016 at 1:57 PM, Aman Sinha wrote: > > > It does look like DRILL-4574 was previously reviewed and ready to be > > merged. Right now it will need to be rebased on master branch. Since > this > > is in the Avro plugin, I am unsure about the types of tests that need to > > be run..
I would prefer if Jason Altekruse could take a quick look and > > merge into master if everything looks ok. > > > > On Thu, Jun 16, 2016 at 1:22 PM, Johannes Schulte < > > johannes.schu...@gmail.com> wrote: > > > >> Hi, > >> > >> https://github.com/apache/drill/pull/459 ( > >> https://issues.apache.org/jira/browse/DRILL-4574) is still not merged > >> but i > >> think it is ready for a merge. Are there any other actions necessary? > >> > >> Johannes > >> > >> On Thu, Jun 16, 2016 at 8:07 PM, Jinfeng Ni > >> wrote: > >> > >> > I will review the Sean's PR for DRILL-4525, since it's a regression > from > >> > 1.6. > >> > > >> > > >> > On Thu, Jun 16, 2016 at 9:39 AM, rahul challapalli > >> > wrote: > >> > > I would like to have DRILL-4525 as this is a regression (most likely > >> from > >> > > 1.6). Any takers for this? > >> > > > >> > > - Rahul > >> > > > >> > > On Wed, Jun 15, 2016 at 4:03 PM, Aman Sinha > >> > wrote: > >> > > > >> > >> I can take a look at DRILL-4653. > >> > >> > >> > >> Could someone familiar with the Decimal type take a look at > >> DRILL-4704 ? > >> > >> Agree with Dave that it is a simple case that should be fixed > (note, > >> > >> however, that decimal is disabled by default currently). > >> > >> > >> > >> > >> > >> On Wed, Jun 15, 2016 at 3:12 PM, Subbu Srinivasan < > >> > ssriniva...@zscaler.com > >> > >> > > >> > >> wrote: > >> > >> > >> > >> > Who can review https://issues.apache.org/jira/browse/DRILL-4653 > ? 
> >> > >> > > >> > >> > On Wed, Jun 15, 2016 at 1:37 PM, Parth Chandra < > >> pchan...@maprtech.com > >> > > > >> > >> > wrote: > >> > >> > > >> > >> > > +1 on the 1.7 release > >> > >> > > > >> > >> > > I'm reviewing the following and hope to get them in the release > >> > before > >> > >> > > cutoff: > >> > >> > > https://issues.apache.org/jira/browse/DRILL-2593 > >> > >> > > https://issues.apache.org/jira/browse/DRILL-4309 > >> > >> > > > >> > >> > > > >> > >> > > > >> > >> > > On Wed, Jun 15, 2016 at 1:20 PM, Jinfeng Ni < > >> jinfengn...@gmail.com> > >> > >> > wrote: > >> > >> > > > >> > >> > > > I'm reviewing a follow-up PR [1] for DRILL-4573. I think we > >> need > >> > get > >> > >> > > > it merged in, since it's a regression in terms of query > >> > correctness > >> > >> > > > from release 1.6. > >> > >> > > > > >> > >> > > > [1] https://github.com/apache/drill/pull/512 > >> > >> > > > > >> > >> > > > On Wed, Jun 15, 2016 at 12:21 PM, Dave Oshinsky < > >> > >> > doshin...@commvault.com > >> > >> > > > > >> > >> > > > wrote: > >> > >> > > > > This is a pretty basic bug affecting decimal values, with a > >> > simple > >> > >> > fix: > >> > >> > > > > https://issues.apache.org/jira/browse/DRILL-4704 > >> > >> > > > > > >> > >> > > > > It would be great if it could be reviewed. > >> > >> > > > > > >> > >> > > > > -Original Message- > >> > >> > > > > From: Aman Sinha [mailto:amansi...@apache.org] > >> > >> > > > > Sent: Wednesday, June 15, 2016 3:15 PM > >> > >> > > > > To: dev > >> > >> > > > > Subject: Time for a 1.7 release > >> > >> > > > > >
Re: DRILL-4199: Add Support for HBase 1.X - planning to merge
Thanks Aditya. By the way, I found another issue. Let's say I have two tables. offers_ref0 : rowkey salt(1byte)+long uid(8 byte ) , family: v, qualifier: v(string) offers_nation_idx: rowkey salt(1byte) + string, family:v, qualifier: v(long 8 byte) Here is the SQL: select CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') as uid, convert_from(`ref0`.`v`.`v`,'UTF8') as v from hbase.`offers_nation_idx` as `nation` join hbase.offers_ref0 as `ref0` on CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') = CONVERT_FROM(nation.`v`.`v`,'BIGINT_BE') where `nation`.row_key > '0br' and `nation`.row_key < '0bs' limit 10 When I execute the query on a single node or fewer than 5 nodes, it works fine. But when I execute it on a cluster which has about 14 nodes, it throws an exception: The first time it throws this exception: *Caused by: java.sql.SQLException: SYSTEM ERROR: SchemaChangeException: Hash join does not support schema changes* Then if I query again, it always throws the exception below: *Query Failed: An Error Occurred* *org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: IllegalStateException: Failure while reading vector. Expected vector class of org.apache.drill.exec.vector.NullableIntVector but was holding vector class org.apache.drill.exec.vector.complex.MapVector, field= v(MAP:REQUIRED)[v(VARBINARY:OPTIONAL)[$bits$(UINT1:REQUIRED), v(VARBINARY:OPTIONAL)[$offsets$(UINT4:REQUIRED)]]] Fragment 12:4 [Error Id: 06c6eae4-0822-4714-b0bf-a6e04ebfec79 on xxx:31010]* It's very strange, and I do not know how to solve it. I tried adding nodes to the cluster one by one; it reproduces once I have added 5 nodes. Can anyone help me solve this issue? 2016-06-17 4:39 GMT+08:00 Aditya : > https://issues.apache.org/jira/browse/DRILL-4727 > > On Thu, Jun 16, 2016 at 11:39 AM, Aman Sinha wrote: > >> Qiang/Aditya can you create a JIRA for this and mark it for 1.7. thanks.
>> >> On Thu, Jun 16, 2016 at 11:25 AM, Aditya wrote: >> >> > Thanks for reporting, I'm looking into it and will post a patch soon. >> > >> > On Wed, Jun 15, 2016 at 7:27 PM, qiang li wrote: >> > >> > > Hi Aditya, >> > > >> > > I tested the latest version and got this exception and the drillbit >> fail >> > > to startup . >> > > >> > > Exception in thread "main" java.lang.NoSuchMethodError: >> > > io.netty.util.UniqueName.(Ljava/lang/String;)V >> > > at >> io.netty.channel.ChannelOption.(ChannelOption.java:136) >> > > at >> io.netty.channel.ChannelOption.valueOf(ChannelOption.java:99) >> > > at >> io.netty.channel.ChannelOption.(ChannelOption.java:42) >> > > at >> > > org.apache.drill.exec.rpc.BasicServer.(BasicServer.java:63) >> > > at >> > > org.apache.drill.exec.rpc.user.UserServer.(UserServer.java:74) >> > > at >> > > >> org.apache.drill.exec.service.ServiceEngine.(ServiceEngine.java:78) >> > > at >> > org.apache.drill.exec.server.Drillbit.(Drillbit.java:108) >> > > at >> org.apache.drill.exec.server.Drillbit.start(Drillbit.java:285) >> > > at >> org.apache.drill.exec.server.Drillbit.start(Drillbit.java:271) >> > > at >> org.apache.drill.exec.server.Drillbit.main(Drillbit.java:267) >> > > >> > > It will working if I remove jars/3rdparty/netty-all-4.0.23.Final.jar, >> the >> > > drill can startup. I think there have some package dependency version >> > > issue, do you think so ? >> > > >> > > >> > > >> > > 2016-06-15 8:14 GMT+08:00 Aditya : >> > > >> > >> HBase 1.x support has been merged and is available in latest >> > >> 1.7.0-SNAPSHOT >> > >> builds. >> > >> >> > >> On Wed, Jun 1, 2016 at 1:23 PM, Aditya >> wrote: >> > >> >> > >> > Thanks Jacques for promptly reviewing my long series of patches! >> > >> > >> > >> > I'm planning to merge the HBase 1.x support some time in next 48 >> > hours. >> > >> > >> > >> > If anyone else is interested and willing, please review the latest >> > patch >> > >> > here[1]. >> > >> > >> > >> > aditya... 
>> > >> > >> > >> > [1] https://github.com/apache/drill/pull/443/files >> > >> > >> > >> >> > > >> > > >> > >> > >
[jira] [Created] (DRILL-4732) Update JDBC driver to use the new prepared statement APIs on DrillClient
Venki Korukanti created DRILL-4732: -- Summary: Update JDBC driver to use the new prepared statement APIs on DrillClient Key: DRILL-4732 URL: https://issues.apache.org/jira/browse/DRILL-4732 Project: Apache Drill Issue Type: Sub-task Reporter: Venki Korukanti DRILL-4729 is adding a new prepared statement implementation on the server side, and it provides APIs on DrillClient to create a new prepared statement (which returns metadata along with an opaque handle) and to submit the prepared statement for execution.
[GitHub] drill pull request #519: DRILL-4530: Optimize partition pruning with metadat...
Github user jinfengni commented on a diff in the pull request: https://github.com/apache/drill/pull/519#discussion_r67716686 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSelection.java --- @@ -47,16 +47,25 @@ private List statuses; public List files; + /** + * root path for the selections + */ public final String selectionRoot; + /** + * root path for the metadata cache file (if any) + */ + public final String cacheFileRoot; --- End diff -- When singlePartitionOpt is applied, is it possible to update selectionRoot to be cacheFileRoot? That is, we do not maintain cacheFileRoot separately. Instead, a FileSelection with an updated selectionRoot is used when singlePartitionOpt is applied. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill pull request #530: DRILL-4729: Add support for prepared statement impl...
GitHub user vkorukanti opened a pull request: https://github.com/apache/drill/pull/530 DRILL-4729: Add support for prepared statement implementation on server side + Add following APIs for Drill Java client - DrillRpcFuture createPreparedStatement(final String query) - void executePreparedStatement(final PreparedStatement preparedStatement, UserResultsListener resultsListener) - List executePreparedStatement(final PreparedStatement preparedStatement) (for testing purpose) + Separated out the interface from UserClientConnection. It makes it easy to have wrappers which need to tap the messages and data going to the actual client. + Implement CREATE_PREPARED_STATEMENT and handle RunQuery with PreparedStatement + Test changes to support prepared statement as query type + Add tests in TestPreparedStatementProvider You can merge this pull request into a Git repository by running: $ git pull https://github.com/vkorukanti/drill DRILL-4729 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/530.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #530 commit 32ba03c7abd9a3784c9a5376dd2835325fe8d5f9 Author: vkorukanti Date: 2016-06-09T23:03:06Z DRILL-4728: Add support for new metadata fetch APIs + Protobuf messages - GetCatalogsReq -> GetCatalogsResp - GetSchemasReq -> GetSchemasResp - GetTablesReq -> GetTablesResp - GetColumnsReq -> GetColumnsResp + Java Drill client changes + Server side changes to handle the metadata API calls - Provide a self contained `Runnable` implementation for each metadata API that process the requests and sends the response to client - In `UserWorker` override the `handle` method that takes the `ResponseSender` and send the response from the `handle` method instead of returning it. - Add a method for each new API to UserWorker to submit the metadata work. 
- Add a method `addNewWork(Runnable runnable)` to `WorkerBee` to submit a generic `Runnable` to `ExecutorService`. - Move out couple of methods from `QueryContext` into a separate interface `SchemaConfigInfoProvider` to enable instantiating Schema trees without the full `QueryContext` + New protobuf messages increased the `jdbc-all.jar` size. Up the limit to 21MB. Change-Id: I5a5e4b453caf912d832ff8547c5789c884195cc4 commit c520eda8a2169e173763e5f84d919c87de46e895 Author: vkorukanti Date: 2016-06-13T18:20:25Z DRILL-4729: Add support for prepared statement implementation on server side + Add following APIs for Drill Java client - DrillRpcFuture createPreparedStatement(final String query) - void executePreparedStatement(final PreparedStatement preparedStatement, UserResultsListener resultsListener) - List executePreparedStatement(final PreparedStatement preparedStatement) (for testing purpose) + Separated out the interface from UserClientConnection. It makes it easy to have wrappers which need to tap the messages and data going to the actual client. + Implement CREATE_PREPARED_STATEMENT and handle RunQuery with PreparedStatement + Test changes to support prepared statement as query type + Add tests in TestPreparedStatementProvider Change-Id: Id26cbb9ed809f0ab3c7530e6a5d8314d2e868b86
Re: DRILL-4199: Add Support for HBase 1.X - planning to merge
Hi Qiang, were you seeing this same issue with the prior HBase version also ? (I would think this is not a regression). It would be best to create a new JIRA and attach the EXPLAIN plans for the successful and failed runs. With more nodes some minor fragments of the hash join may be getting empty input batches and I am guessing that has something to do with the SchemaChangeException. Someone would need to debug once you create the JIRA with relevant details. -Aman On Mon, Jun 20, 2016 at 5:13 AM, qiang li wrote: > Thanks Aditya. > > By the way, I found another issue. > > Let say I have two tables. > > offers_ref0 : rowkey salt(1byte)+long uid(8 byte ) , family: v, qualifier: > v(string) > offers_nation_idx: rowkey salt(1byte) + string, family:v, qualifier: v(long > 8 byte) > > there is the SQL: > > select CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') as uid, > convert_from(`ref0`.`v`.`v`,'UTF8') as v from hbase.`offers_nation_idx` as > `nation` join hbase.offers_ref0 as `ref0` on > CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') = > CONVERT_FROM(nation.`v`.`v`,'BIGINT_BE') where `nation`.row_key > '0br' > and `nation`.row_key < '0bs' limit 10 > > When I execute the query with single node or less than 5 nodes, its working > good. But when I execute it in cluster which have about 14 nodes, its throw > a exception: > > First time will throw this exception: > *Caused by: java.sql.SQLException: SYSTEM ERROR: SchemaChangeException: > Hash join does not support schema changes* > > Then if I query again, it will always throw below exception: > *Query Failed: An Error Occurred* > *org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: > IllegalStateException: Failure while reading vector. 
Expected vector class > of org.apache.drill.exec.vector.NullableIntVector but was holding vector > class org.apache.drill.exec.vector.complex.MapVector, field= > v(MAP:REQUIRED)[v(VARBINARY:OPTIONAL)[$bits$(UINT1:REQUIRED), > v(VARBINARY:OPTIONAL)[$offsets$(UINT4:REQUIRED)]]] Fragment 12:4 [Error Id: > 06c6eae4-0822-4714-b0bf-a6e04ebfec79 on xxx:31010]* > > Its very strange, and I do not know how to solve it. > I tried add node to the cluster one by one, it will reproduce when I added > 5 nodes. Can anyone help me solve this issue? > > > > > 2016-06-17 4:39 GMT+08:00 Aditya : > > > https://issues.apache.org/jira/browse/DRILL-4727 > > > > On Thu, Jun 16, 2016 at 11:39 AM, Aman Sinha > wrote: > > > >> Qiang/Aditya can you create a JIRA for this and mark it for 1.7. > thanks. > >> > >> On Thu, Jun 16, 2016 at 11:25 AM, Aditya > wrote: > >> > >> > Thanks for reporting, I'm looking into it and will post a patch soon. > >> > > >> > On Wed, Jun 15, 2016 at 7:27 PM, qiang li > wrote: > >> > > >> > > Hi Aditya, > >> > > > >> > > I tested the latest version and got this exception and the drillbit > >> fail > >> > > to startup . 
> >> > > > >> > > Exception in thread "main" java.lang.NoSuchMethodError: > >> > > io.netty.util.UniqueName.(Ljava/lang/String;)V > >> > > at > >> io.netty.channel.ChannelOption.(ChannelOption.java:136) > >> > > at > >> io.netty.channel.ChannelOption.valueOf(ChannelOption.java:99) > >> > > at > >> io.netty.channel.ChannelOption.(ChannelOption.java:42) > >> > > at > >> > > org.apache.drill.exec.rpc.BasicServer.(BasicServer.java:63) > >> > > at > >> > > org.apache.drill.exec.rpc.user.UserServer.(UserServer.java:74) > >> > > at > >> > > > >> > org.apache.drill.exec.service.ServiceEngine.(ServiceEngine.java:78) > >> > > at > >> > org.apache.drill.exec.server.Drillbit.(Drillbit.java:108) > >> > > at > >> org.apache.drill.exec.server.Drillbit.start(Drillbit.java:285) > >> > > at > >> org.apache.drill.exec.server.Drillbit.start(Drillbit.java:271) > >> > > at > >> org.apache.drill.exec.server.Drillbit.main(Drillbit.java:267) > >> > > > >> > > It will working if I remove > jars/3rdparty/netty-all-4.0.23.Final.jar, > >> the > >> > > drill can startup. I think there have some package dependency > version > >> > > issue, do you think so ? > >> > > > >> > > > >> > > > >> > > 2016-06-15 8:14 GMT+08:00 Aditya : > >> > > > >> > >> HBase 1.x support has been merged and is available in latest > >> > >> 1.7.0-SNAPSHOT > >> > >> builds. > >> > >> > >> > >> On Wed, Jun 1, 2016 at 1:23 PM, Aditya > >> wrote: > >> > >> > >> > >> > Thanks Jacques for promptly reviewing my long series of patches! > >> > >> > > >> > >> > I'm planning to merge the HBase 1.x support some time in next 48 > >> > hours. > >> > >> > > >> > >> > If anyone else is interested and willing, please review the > latest > >> > patch > >> > >> > here[1]. > >> > >> > > >> > >> > aditya... > >> > >> > > >> > >> > [1] https://github.com/apache/drill/pull/443/files > >> > >> > > >> > >> > >> > > > >> > > > >> > > >> > > > > >
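[Editor's note] The NoSuchMethodError on io.netty.util.UniqueName reported above is the classic symptom of two incompatible versions of the same library on the classpath, which matches the observation that removing the duplicate jars/3rdparty/netty-all-4.0.23.Final.jar lets the Drillbit start. A quick, generic way to see which jar actually supplied a class is to ask its ProtectionDomain. This is a hedged debugging sketch (the WhichJar class name is just for illustration, not part of Drill):

```java
import java.security.CodeSource;

public class WhichJar {
    // Print the code source (jar or classes directory) that a class was
    // loaded from. Useful for diagnosing NoSuchMethodError caused by
    // duplicate jars, e.g. two Netty versions on the Drill classpath.
    static String locate(String className) throws ClassNotFoundException {
        Class<?> c = Class.forName(className);
        CodeSource src = c.getProtectionDomain().getCodeSource();
        // Bootstrap classes typically have no code source.
        return src == null ? "(bootstrap class path)" : src.getLocation().toString();
    }

    public static void main(String[] args) throws Exception {
        // A JDK core class usually resolves to the bootstrap class path.
        System.out.println(locate("java.lang.String"));
        // An application class resolves to its jar or classes directory.
        System.out.println(locate("WhichJar"));
    }
}
```

Running this with a conflicting class name (for instance io.netty.channel.ChannelOption) on the Drill classpath would show which of the duplicate Netty jars won the classpath race.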
[GitHub] drill pull request #519: DRILL-4530: Optimize partition pruning with metadat...
Github user jinfengni commented on a diff in the pull request: https://github.com/apache/drill/pull/519#discussion_r67711244 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/partition/PruneScanRule.java --- @@ -269,13 +283,54 @@ protected void doOnMatch(RelOptRuleCall call, Filter filterRel, Project projectR int recordCount = 0; int qualifiedCount = 0; -// Inner loop: within each batch iterate over the PartitionLocations -for(PartitionLocation part: partitions){ - if(!output.getAccessor().isNull(recordCount) && output.getAccessor().get(recordCount) == 1){ -newPartitions.add(part); -qualifiedCount++; +if (checkForSingle && +partitions.get(0).isCompositePartition() /* apply single partition check only for composite partitions */) { + // Inner loop: within each batch iterate over the PartitionLocations + for (PartitionLocation part : partitions) { +assert part.isCompositePartition(); +if(!output.getAccessor().isNull(recordCount) && output.getAccessor().get(recordCount) == 1) { + newPartitions.add(part); + if (isSinglePartition) { // only need to do this if we are already single partition +// compose the array of partition values for the directories that are referenced by filter: +// e.g suppose the dir hierarchy is year/quarter/month and the query is: +// SELECT * FROM T WHERE dir0=2015 AND dir1 = 'Q1', +// then for 2015/Q1/Feb, this will have ['2015', 'Q1', null] --- End diff -- For WHERE condition dir0=2015 and dir2 = 'Jan', if the dataset happens to have only one 'Jan' under '2015' directory, will this qualify for singlePartitionOpt? --- End diff --
[GitHub] drill pull request #519: DRILL-4530: Optimize partition pruning with metadat...
Github user jinfengni commented on a diff in the pull request: https://github.com/apache/drill/pull/519#discussion_r67709299 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/partition/PruneScanRule.java --- @@ -320,7 +377,17 @@ protected void doOnMatch(RelOptRuleCall call, Filter filterRel, Project projectR condition = condition.accept(reverseVisitor); pruneCondition = pruneCondition.accept(reverseVisitor); - RelNode inputRel = descriptor.createTableScan(newPartitions); + String cacheFileRoot = null; + if (checkForSingle && isSinglePartition) { +// if metadata cache file could potentially be used, then assign a proper cacheFileRoot +String path = ""; +for (int j = 0; j <= maxIndex; j++) { + path += "/" + spInfo[j]; --- End diff -- Related to Line 313, here we do not check spInfo[j] == null ? --- End diff --
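[Editor's note] To make the review question concrete, here is a small standalone sketch (not Drill's actual code) of the path composition shown in the diff, with the null check the comment asks about added. A null inside the referenced prefix of spInfo — e.g. a filter on dir0=2015 AND dir2='Jan' that leaves spInfo[1] unset — means there is no single directory that can serve as cacheFileRoot:

```java
public class CacheFileRootSketch {
    // Illustrative only: mimics composing a cacheFileRoot from the leading
    // partition values (spInfo) referenced by the filter. Returns null when
    // the referenced directories are not a contiguous prefix, i.e. when a
    // hole (null) appears before maxIndex.
    static String composeCacheFileRoot(String[] spInfo, int maxIndex) {
        StringBuilder path = new StringBuilder();
        for (int j = 0; j <= maxIndex; j++) {
            if (spInfo[j] == null) {
                return null; // hole in the prefix: no single cache file root
            }
            path.append('/').append(spInfo[j]);
        }
        return path.toString();
    }

    public static void main(String[] args) {
        // dir0=2015 AND dir1='Q1': contiguous prefix, usable root
        System.out.println(composeCacheFileRoot(new String[]{"2015", "Q1", null}, 1)); // prints /2015/Q1
        // dir0=2015 AND dir2='Jan': spInfo[1] is null, no usable root
        System.out.println(composeCacheFileRoot(new String[]{"2015", null, "Jan"}, 2)); // prints null
    }
}
```

Whether the actual patch should skip the optimization or fall back differently in the non-contiguous case is exactly what the review thread is discussing; this sketch only illustrates the shape of the problem.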
RE: [GitHub] drill issue #517: DRILL-4704 fix
I ran the Windows TestDecimal unit tests with ExecConstants.java modified as suggested. The failure looks like follows: Operating system: Windows 7 Windows hack for parquet: setting hadoop.home.dir to c:\winutil\ Running org.apache.drill.exec.physical.impl.TestDecimal#testCastFromFloat Query: { "head" : { "version" : 1, "generator" : { "type" : "org.apache.drill.exec.planner.logical.DrillImplementor", "info" : "" }, "type" : "APACHE_DRILL_PHYSICAL", "resultMode" : "EXEC" }, graph:[ { @id:1, pop:"fs-scan", format: {type: "json"}, storage:{type: "file", connection: "classpath:///"}, files:["/input_simple_decimal.json"] }, { "pop" : "project", "@id" : 2, "exprs" : [ { "ref" : "F4", "expr" : " (cast(DEC9 as float4)) " }, { "ref" : "F8", "expr": "(cast(DEC18 as float8))" } ], "child" : 1 }, { "pop" : "project", "@id" : 4, "exprs" : [ { "ref" : "DECIMAL_9", "expr" : " cast(F4 as decimal9(9, 4)) " }, {"ref": "DECIMAL38", "expr" : "cast(F8 as decimal38sparse(38, 4))"} ], "child" : 2 }, { "pop" : "screen", "@id" : 5, "child" : 4 } ] } mapException1: java.util.concurrent.ExecutionException: org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: PatternSyntaxException: Unexpected internal error near index 1 \ ^ [Error Id: f99ad9ee-bc5b-4001-9568-cb806a1a2875 on DaveOshinsky-PC.gp.cv.commvault.com:31010] (org.apache.drill.exec.work.foreman.ForemanSetupException) Failure while parsing physical plan. 
org.apache.drill.exec.work.foreman.Foreman.parseAndRunPhysicalPlan():391 org.apache.drill.exec.work.foreman.Foreman.run():248 java.util.concurrent.ThreadPoolExecutor.runWorker():1142 java.util.concurrent.ThreadPoolExecutor$Worker.run():617 java.lang.Thread.run():745 Caused By (com.fasterxml.jackson.databind.JsonMappingException) Instantiation of [simple type, class org.apache.drill.exec.store.dfs.easy.EasyGroupScan] value failed (java.util.regex.PatternSyntaxException): Unexpected internal error near index 1 \ ^ at [Source: { "head" : { "version" : 1, "generator" : { "type" : "org.apache.drill.exec.planner.logical.DrillImplementor", "info" : "" }, "type" : "APACHE_DRILL_PHYSICAL", "resultMode" : "EXEC" }, graph:[ { @id:1, pop:"fs-scan", format: {type: "json"}, storage:{type: "file", connection: "classpath:///"}, files:["/input_simple_decimal.json"] }, { "pop" : "project", "@id" : 2, "exprs" : [ { "ref" : "F4", "expr" : " (cast(DEC9 as float4)) " }, { "ref" : "F8", "expr": "(cast(DEC18 as float8))" } ], "child" : 1 }, { "pop" : "project", "@id" : 4, "exprs" : [ { "ref" : "DECIMAL_9", "expr" : " cast(F4 as decimal9(9, 4)) " }, {"ref": "DECIMAL38", "expr" : "cast(F8 as decimal38sparse(38, 4))"} ], "child" : 2 }, { "pop" : "screen", "@id" : 5, "child" : 4 } ] }; line: 18, column: 3] (through reference chain: org.apache.drill.exec.physical.PhysicalPlan["graph"]->java.util.ArrayList[0]) com.fasterxml.jackson.databind.JsonMappingException.from():223 com.fasterxml.jackson.databind.deser.std.StdValueInstantiator.wrapAsJsonMappingException():445 com.fasterxml.jackson.databind.deser.std.StdValueInstantiator.rewrapCtorProblem():464 com.fasterxml.jackson.databind.deser.std.StdValueInstantiator.createFromObjectWith():258 com.fasterxml.jackson.databind.deser.impl.PropertyBasedCreator.build():135 com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeUsingPropertyBased():444 
com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeFromObjectUsingNonDefault():1123 com.fasterxml.jackson.databind.deser.BeanDeserializer.deserializeFromObject():298 com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeWithObjectId():1094 com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeOther():166 com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize():135 com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer._deserializeTypedForId():120 com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer.deserializeTypedFromObject():91 com.fasterxml.jackson.databind.deser.AbstractDeserializer.deserializeWithType():142 com.fasterxml.jackson.databind.deser.std.CollectionDeserializer.deserialize():279 com.fasterxml.jackson.databind.deser.std.CollectionDeserializer.deserialize():249 com.fasterxml.jackson.databind.deser.std.CollectionDeserializer.deserialize():26 com.fasterxml.jackson.databind.deser.SettableBeanProperty.deserialize():490 com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeWithErrorWrapping():465 com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializ
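[Editor's note] The PatternSyntaxException in the trace above ("Unexpected internal error near index 1" with a lone backslash) is exactly what java.util.regex produces when a pattern ends in an unescaped backslash — consistent with a Windows path separator reaching the regex engine, as the "Windows hack for parquet" setup suggests. A minimal, self-contained reproduction, with Pattern.quote as the usual way to treat such a string as a literal:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class BackslashRegex {
    // Returns true if the string compiles as a regex unchanged.
    static boolean compilesAsRegex(String s) {
        try {
            Pattern.compile(s);
            return true;
        } catch (PatternSyntaxException e) {
            // For a trailing backslash the JDK reports
            // "Unexpected internal error near index 1", as in the trace.
            return false;
        }
    }

    public static void main(String[] args) {
        String sep = "\\"; // one literal backslash, e.g. a Windows path separator
        System.out.println(compilesAsRegex(sep)); // prints false
        // Quoting the string makes it safe to use as a pattern.
        Pattern safe = Pattern.compile(Pattern.quote(sep));
        System.out.println(safe.matcher(sep).matches()); // prints true
    }
}
```

This does not pinpoint where in the EasyGroupScan path construction the raw path is compiled, only that the reported message matches this failure mode.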