Re: [DISCUSS] Pull Request Cleanup
Hi Charles,

what process would you suggest? I would think some devs use a PR to keep their work open as a reminder, and so that others can discuss it, but of course if it's stale for months it may never make any more progress. Perhaps someone could trigger a comment and ask for further development, but who would be responsible for that trigger?

Regards
Christian

---- Original Message ----
On 3 March 2022 at 17:54, Charles Givre wrote:
> Hello all,
> I wanted to discuss the possibility of doing a cleanup of open and stale pull requests. There seem to be about 10 PRs that are actively being worked on, and then we have a bunch of PRs at various stages of staleness.
> What do you all think about having some sort of process for closing out old PRs that are not actively being worked on?
> Best,
> -- C
Re: New Committer: Tengfei Wang
Congratulations, Tengfei!

---- Original Message ----
On 3 March 2022 at 13:52, Charles Givre wrote:
> The Project Management Committee (PMC) for Apache Drill has invited Tengfei Wang to become a committer and we are pleased to announce that he has accepted.
> Being a committer enables easier contribution to the project since there is no need to go via the patch submission process. This should enable better productivity. A PMC member helps manage and guide the direction of the project.
> Please join me in congratulating Tengfei!
Re: thinking of our Ukrainian friends
Oh my goodness, I hope this will end soon. Stay safe!

---- Original Message ----
luoc wrote on Thursday, 24 February 2022 at 10:24:
> Vitalii and Vova are my Ukrainian friends, hopefully they will stay safe as well.
>> On Feb 24, 2022, at 14:39, Ted Dunning ted.dunn...@gmail.com wrote:
>> For commercial historical reasons many of the people who have contributed to Drill live in Ukraine.
>> My heart is with them tonight. I hope they stay safe.
AW: Re: WG: Superset Drill Time Range Filter
When I manually resend the query with TIMESTAMP:

WHERE `startTime` >= TIMESTAMP '2022-02-14 00:00:00.00'
AND `startTime` < TIMESTAMP '2022-02-21 00:00:00.00'
ORDER BY `startTime` DESC

everything is fine, but Superset doesn't create the query this way. FYI, I have already created an issue in Superset: https://github.com/apache/superset/issues/18869

---- Original Message ----
James Turton wrote on Wednesday, 23 February 2022 at 12:46:
> As a matter of interest, if you test directly against Drill with the following timestamp literal expressions, what happens?
> SELECT *
> FROM dfs.foo.bar
> WHERE `startTime` >= timestamp '2022-02-14 00:00:00.00'
> AND `startTime` < timestamp '2022-02-21 00:00:00.00'
> ORDER BY `startTime` DESC
> On 2022/02/23 11:56, Z0ltrix wrote:
>> Hi drill devs,
>> we have a problem with our superset -> drill connection with time range filters, as described below.
>> Superset sends the following to drill:
>> WHERE `startTime` >= '2022-02-14 00:00:00.00'
>> AND `startTime` < '2022-02-21 00:00:00.00'
>> ORDER BY `startTime` DESC
>> and I get the following error:
>> SYSTEM ERROR: ClassCastException: org.apache.drill.exec.expr.holders.NullableTimeStampHolder cannot be cast to org.apache.drill.exec.expr.holders.TimeStampHolder
>> Please refer to the logs for more information.
>> [long stack trace snipped]
AW: Re: Superset Drill Time Range Filter
I tested drill 1.16 and the 1.20 hadoop2 RC4, with the same behaviour.

---- Original Message ----
luoc wrote on Wednesday, 23 February 2022 at 11:27:
> Which Drill version are you running?
>> On Feb 23, 2022, at 17:57, Z0ltrix z0lt...@pm.me.invalid wrote:
>> Hi drill devs,
>> we have a problem with our superset -> drill connection with time range filters, as described below.
>> Superset sends the following to drill:
>> WHERE `startTime` >= '2022-02-14 00:00:00.00'
>> AND `startTime` < '2022-02-21 00:00:00.00'
>> ORDER BY `startTime` DESC
>> and I get the following error:
>> SYSTEM ERROR: ClassCastException: org.apache.drill.exec.expr.holders.NullableTimeStampHolder cannot be cast to org.apache.drill.exec.expr.holders.TimeStampHolder
>> [long stack trace snipped]
>> When I manually resend the query with TIMESTAMP as here:
>> WHERE `startTime` >= TIMESTAMP '2022-02-14 00:00:00.00'
>> AND `startTime` < TIMESTAMP '2022-02-21 00:00:00.00'
WG: Superset Drill Time Range Filter
Hi drill devs,

we have a problem with our superset -> drill connection with time range filters, as described below.

Superset sends the following to drill:

WHERE `startTime` >= '2022-02-14 00:00:00.00'
AND `startTime` < '2022-02-21 00:00:00.00'
ORDER BY `startTime` DESC

and I get the following error:

SYSTEM ERROR: ClassCastException: org.apache.drill.exec.expr.holders.NullableTimeStampHolder cannot be cast to org.apache.drill.exec.expr.holders.TimeStampHolder

Please refer to the logs for more information.

(org.apache.drill.exec.work.foreman.ForemanException) Unexpected exception during fragment initialization: org.apache.drill.exec.expr.holders.NullableTimeStampHolder cannot be cast to org.apache.drill.exec.expr.holders.TimeStampHolder
  org.apache.drill.exec.work.foreman.Foreman.run():305
  java.util.concurrent.ThreadPoolExecutor.runWorker():1149
  java.util.concurrent.ThreadPoolExecutor$Worker.run():624
  java.lang.Thread.run():748
Caused By (java.lang.ClassCastException) org.apache.drill.exec.expr.holders.NullableTimeStampHolder cannot be cast to org.apache.drill.exec.expr.holders.TimeStampHolder
  org.apache.drill.exec.expr.FilterBuilder.getValueExpressionFromConst():208
  org.apache.drill.exec.expr.FilterBuilder.visitFunctionHolderExpression():240
  org.apache.drill.exec.expr.FilterBuilder.visitFunctionHolderExpression():58
  org.apache.drill.common.expression.FunctionHolderExpression.accept():53
  org.apache.drill.exec.expr.FilterBuilder.generateNewExpressions():268
  org.apache.drill.exec.expr.FilterBuilder.handleCompareFunction():278
  org.apache.drill.exec.expr.FilterBuilder.visitFunctionHolderExpression():246
  org.apache.drill.exec.expr.FilterBuilder.visitFunctionHolderExpression():58
  org.apache.drill.common.expression.FunctionHolderExpression.accept():53
  org.apache.drill.exec.expr.FilterBuilder.buildFilterPredicate():80
  org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.getFilterPredicate():317
  org.apache.drill.exec.store.parquet.ParquetPushDownFilter.doOnMatch():150
  org.apache.drill.exec.store.parquet.ParquetPushDownFilter$2.onMatch():103
  org.apache.calcite.plan.AbstractRelOptPlanner.fireRule():319
  org.apache.calcite.plan.hep.HepPlanner.applyRule():561
  org.apache.calcite.plan.hep.HepPlanner.applyRules():420
  org.apache.calcite.plan.hep.HepPlanner.executeInstruction():257
  org.apache.calcite.plan.hep.HepInstruction$RuleInstance.execute():127
  org.apache.calcite.plan.hep.HepPlanner.executeProgram():216
  org.apache.calcite.plan.hep.HepPlanner.findBestExp():203
  org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform():419
  org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform():370
  org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform():353
  org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToPrel():536
  org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan():178
  org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan():216
  org.apache.drill.exec.planner.sql.DrillSqlWorker.convertPlan():121
  org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan():87
  org.apache.drill.exec.work.foreman.Foreman.runSQL():593
  org.apache.drill.exec.work.foreman.Foreman.run():276
  java.util.concurrent.ThreadPoolExecutor.runWorker():1149
  java.util.concurrent.ThreadPoolExecutor$Worker.run():624
  java.lang.Thread.run():748

When I manually resend the query with TIMESTAMP as here:

WHERE `startTime` >= TIMESTAMP '2022-02-14 00:00:00.00'
AND `startTime` < TIMESTAMP '2022-02-21 00:00:00.00'
ORDER BY `startTime` DESC

everything is fine, but Superset doesn't create the query this way.

So, now to my question: is this error message legitimate because of the missing TIMESTAMP keyword before the timestamp string, or do we have a problem here in Drill?

Regards
Christian

---- Original Message ----
Z0ltrix wrote on Wednesday, 23 February 2022 at 10:49:
> Hello superset devs,
> we have a problem with our superset -> drill connection with time range filters.
> When we filter a dashboard by time range (last week, month, etc.) I get a
> SYSTEM ERROR: ClassCastException: org.apache.drill.exec.expr.holders.NullableTimeStampHolder cannot be cast to org.apache.drill.exec.expr.holders.TimeStampHolder
> from drill.
> I don't want to talk here too much about the drill error, because that is a topic for the drill project, but I think we could also solve this by adding something to db_engine_specs/drill.py
> Superset sends the following to drill:
> WHERE `startTime` >= '2022-02-14 00:00:00.00'
> AND `startTime` < '2022-02-21 00:00:00.00'
> ORDER BY `startTime` DESC
> Superset should send the following filter:
> WHERE `startTime` >= TIMESTAMP '2022-02-14 00:00:00.00'
> AND `startTime` < TIMESTAMP '2022-02-21 00:00:00.00'
> ORDER BY `startTime` DESC
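The Superset-side fix hinted at above (something in db_engine_specs/drill.py) would be to render datetime filter values as Drill TIMESTAMP literals instead of bare strings, so that the Parquet filter pushdown compares TIMESTAMP to TIMESTAMP. A minimal sketch, assuming the hypothetical helper name `drill_timestamp_literal`; in Superset itself this logic would belong in the Drill engine spec's datetime-conversion hook:

```python
# Hypothetical sketch: format datetimes as Drill TIMESTAMP literals.
# The fractional-second precision of the original queries is omitted here.
from datetime import datetime

def drill_timestamp_literal(dttm: datetime) -> str:
    """Format a datetime as a Drill TIMESTAMP literal."""
    return f"TIMESTAMP '{dttm.strftime('%Y-%m-%d %H:%M:%S')}'"

# Build the range filter that Superset should emit:
start = drill_timestamp_literal(datetime(2022, 2, 14))
end = drill_timestamp_literal(datetime(2022, 2, 21))
print(f"WHERE `startTime` >= {start} AND `startTime` < {end}")
# → WHERE `startTime` >= TIMESTAMP '2022-02-14 00:00:00' AND `startTime` < TIMESTAMP '2022-02-21 00:00:00'
```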
AW: Re: [DISCUSS] Impersonation design patterns
Hi James,

I think most companies must use impersonation to use a system like Drill because of security restrictions. My customer, for example, is not allowed to use any system that cannot run queries as the end user against the storage system, because of audit-logging requirements. Impersonation in HDFS, HBase and Phoenix lets us see the real user in the Ranger audits, and only through that feature am I able to create complex ACLs in Ranger. We could never use, for example, a Cassandra backend with Drill, because I would not be able to impersonate through it :/

Nevertheless, this is not only a problem of Drill. Cassandra itself has no impersonation feature (correct me if I'm wrong), so Drill has no chance at the moment to do this. Maybe DRILL-7871 could solve this, but I'm not sure if this is the right approach... every user would have to be a mini-admin of the system, because managing storage plugins is not suitable for every user.

Regards
Christian

---- Original Message ----
James Turton wrote on Tuesday, 22 February 2022 at 08:55:
> Errata: an improvement of the working definition of impersonation inline below.
> On 2022/02/22 08:27, James Turton wrote:
>> Drill has supported impersonation, which I'll use here to mean any mechanism by which Drill accesses an external system as the end user rather than some system-wide identity to which it has access, e.g. the OS user, a service principal, or credentials in a storage configuration, env var or config file.
Re: [VOTE] Release Apache Drill 1.20.0 - RC4
+1 for release.

- Installed the Hadoop2 RC4 in our development environment on AWS EC2 (Ubuntu 18.04):
  - zookeeper 3.6.5
  - hadoop 2.9.2
  - hbase 1.5.0
  - phoenix 4.15.0
  - phoenix-queryserver 1.0.0
  - everything secured by Kerberos, TLS encrypted and impersonated
- Ran queries against Parquet files stored in HDFS (impersonated), including INT96 timestamps
- Ran queries against HBase (impersonated)
- Ran queries against Phoenix (impersonated)
- Ran UNION ALL queries against HBase + HDFS (Parquet) to simulate a Lambda dataset
- Ran UNION ALL queries against Phoenix + HDFS (Parquet) to simulate a Lambda dataset
- Ran ANALYZE TABLE COMPUTE STATISTICS on HDFS Parquet tables (Iceberg Metastore)
- Ran ANALYZE TABLE REFRESH METADATA on HDFS Parquet tables (Iceberg Metastore)
- Ran queries against the Iceberg Metastore to simulate Iceberg format plugin reads
- Tested some Superset and Tableau dashboards over an ODBC connection (impersonated)
- Tested some queries from NiFi over a JDBC connection

Regards
Christian

---- Original Message ----
James Turton wrote on Thursday, 17 February 2022 at 19:53:
> Hi all
> I'd like to propose the fifth release candidate (RC4) of Apache Drill, version 1.20.0, which differs from the previous RC in the following.
> DRILL-8139: Parquet CodecFactory thread safety bug (#2463)
> DRILL-8134: Cannot query Parquet INT96 columns as timestamps (#2460)
> DRILL-8122: Change kafka metadata obtaining due to KAFKA-5697 (#2456)
> DRILL-8137: Prevent reading union inputs after cancellation request (#2462)
> The release candidate covers a total of 117 resolved JIRAs [1]. Thanks to everyone who contributed to this release.
> The tarball artifacts are hosted at [2][3] and the maven artifacts are hosted at [4][5].
> This release candidate is based on commits 753bff39d8dd08eaa1273eadc20175d34a87e044 and 9955d082bcdba401666799f49a6cd3c3f996af97 located at [6][7].
> Please download and try out the release.
> [ ] +1
> [ ] +0
> [ ] -1
> [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12350301=12313820
> [2] https://dist.apache.org/repos/dist/dev/drill/drill-1.20.0-rc4/
> [3] https://dist.apache.org/repos/dist/dev/drill/drill-1.20.0-hadoop2-rc4/ (Hadoop 2 build)
> [4] https://repository.apache.org/content/repositories/orgapachedrill-1094/
> [5] https://repository.apache.org/content/repositories/orgapachedrill-1095/ (Hadoop 2 build)
> [6] https://github.com/jnturton/drill/commits/drill-1.20.0
> [7] https://github.com/jnturton/drill/commits/drill-1.20.0-hadoop2 (Hadoop 2 build)
AW: [VOTE] Release Apache Drill 1.20.0 - RC0
+1 for release.

- Installed RC0 in our testing environment on AWS EC2 (Ubuntu 18.04):
  - zookeeper 3.6.3
  - hadoop 3.2.1
  - hbase 2.4.8
  - phoenix 5.1.2
  - phoenix-queryserver 6.0.0
  - everything secured by Kerberos, TLS encrypted and impersonated
- Ran queries against Parquet files stored in HDFS (impersonated)
- Ran queries against HBase (impersonated)
- Ran queries against Phoenix (impersonated)
- Tested HBasePStoreProvider

Regards
Christian

---- Original Message ----
James Turton wrote on Saturday, 5 February 2022 at 10:11:
> Hi all
> Note from the release manager.
> The normal RC announcement follows below, but please take note that while you should test and try this Hadoop 3-based RC0 of Drill 1.20.0, there is likely to be another RC which ships both Hadoop 2 and Hadoop 3 builds, as soon as I have got some advice on the best way to incorporate this in our release process. However, that RC will be based on exactly the same commit as this one (assuming no issues are found), so please do test this one every bit as much as you would have.
> - Thanks, James
> I'd like to propose the first release candidate (RC0) of Apache Drill, version 1.20.0.
> The release candidate covers a total of 105 resolved JIRAs [1]. Thanks to everyone who contributed to this release.
> The tarball artifacts are hosted at [2] and the maven artifacts are hosted at [3].
> This release candidate is based on commit 556b972560911c20691d5b5de6c656d22c59ce0b located at [4].
> Please download and try out the release.
> The vote ends at 2022-02-08 10:00 UTC ≅ 3×24 hours after the timestamp on this email.
> [ ] +1
> [ ] +0
> [ ] -1
> [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12350301=12313820
> [2] https://dist.apache.org/repos/dist/dev/drill/drill-1.20.0/
> [3] https://repository.apache.org/content/repositories/orgapachedrill-1087/
> [4] https://github.com/jnturton/drill/commits/drill-1.20.0
AW: Re: using JDBC to connect to Drill
Sure, I'll create a PR in the next few days.

---- Original Message ----
luoc wrote on Thursday, 3 February 2022 at 11:55:
> Hi Christian,
> Is it possible to add these tips to the docs? It is recommended that we add it to the "Troubleshooting" section, thanks!
> https://drill.apache.org/docs/troubleshooting/
>> On Feb 3, 2022, at 18:46, Z0ltrix z0lt...@pm.me.invalid wrote:
>> Christian
AW: Re: using JDBC to connect to Drill
Hi Jorge,

it's all about DNS. One solution is to configure /etc/hosts on each machine correctly, so that every server can resolve the others' addresses. The other trick is to set the internal IP address of the Drillbit as the hostname in drill-env.sh:

export DRILL_HOST_NAME="10.x.x.x"

Drill uses this to register itself with ZooKeeper and hands this information back to the client.

Regards
Christian

---- Original Message ----
luoc wrote on Thursday, 3 February 2022 at 11:28:
> Hi Jorge,
> It seems that we have answered this question before. Let me find it first...
> https://github.com/apache/drill/issues/2415
>> On Feb 3, 2022, at 17:28, Jorge Alvarado alvarad...@live.com wrote:
>> Hi Drill community,
>> I'm trying to connect to drill 1.19 using JDBC.
>> For context: I have a VM running ZooKeeper and another VM running a Drillbit. The web UI is working fine, the queries are working fine.
>> In my Maven dependency I have:
>> <dependency>
>>   <groupId>org.apache.drill.exec</groupId>
>>   <artifactId>drill-jdbc-all</artifactId>
>>   <version>1.19.0</version>
>> </dependency>
>> In my code:
>> Connection conn = null;
>> String url = "jdbc:drill:zk=<zk host>:2181;schema=common1";
>> String query = "SELECT * FROM `common1`.`products.json` LIMIT 10";
>> Statement stmt = null;
>> ResultSet result = null;
>> conn = DriverManager.getConnection(url);
>> stmt = conn.createStatement();
>> result = stmt.executeQuery(query);
>> When I run the console app I get a bunch of errors, but the most prominent is:
>> CONNECTION : java.net.UnknownHostException: drill3.internal.cloudapp.net: Name or service not known
>> drill3.internal.cloudapp.net is exactly the name that appears on the drill web UI for my only Drillbit node.
>> It makes sense that it cannot resolve, as it looks like an internal address, so I updated my hosts file (on my dev PC) to resolve the public IP address of the drill node, but it still gives me the same error.
>> Do you have any ideas how to make my Java app resolve the internal address?
>> thanks in advance
>> Jorge
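The UnknownHostException above can be checked from the client machine before touching /etc/hosts or drill-env.sh. The sketch below is a small, hypothetical diagnostic (not part of Drill) that tests whether the hostname a Drillbit registered in ZooKeeper is resolvable from the client:

```python
import socket

def resolvable(hostname: str) -> bool:
    """True if this machine can resolve the hostname that the Drillbit
    registered in ZooKeeper (the name shown on the Drill web UI)."""
    try:
        socket.gethostbyname(hostname)
        return True
    except socket.gaierror:
        return False

print(resolvable("localhost"))  # a resolvable name returns True
# Returns False unless /etc/hosts or DNS maps the internal name:
print(resolvable("drill3.internal.cloudapp.net"))
```

If the second check returns False, the JDBC client will fail with exactly the UnknownHostException Jorge reports, regardless of whether the public IP is reachable.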
Re: [VOTE] Freeze for Drill 1.20
+1 from me

Regards
Christian

---- Original Message ----
On 2 February 2022 at 17:13, Vitalii Diravka wrote:
> +1
> Kind regards
> Vitalii
> On Wed, Feb 2, 2022 at 6:04 PM Vova Vysotskyi wrote:
>> +1
>> Kind regards,
>> Volodymyr Vysotskyi
>> On 2022/02/02 15:59:55 James Turton wrote:
>>> PR #2449 was merged and there are now zero Dependabot alerts against master.
>>> +1 for freezing from me.
>>> On 2022/02/02 16:36, Charles Givre wrote:
>>>> Assuming we pass dependabot, big +1 from me!! Great work everyone!
>>>> --C
>>>>> On Feb 2, 2022, at 9:35 AM, James Turton wrote:
>>>>> Please vote again on the assumption that the very minor Postgresql 42.3.1 -> 42.3.2 PR will be merged, clearing the last Dependabot alert. It passed local testing so it looks like a safe bet.
>>>>> On 2022/01/30 01:51, Charles Givre wrote:
>>>>>> Hey James,
>>>>>> Alas... I'm afraid I'd have to give a -1 on this. There are some Dependabot alerts at the moment, which we really should resolve (or at least look at) before cutting a release. One of them is linked to a severe CVE. Also, I just submitted a VERY minor bug fix which I'd love to squeak into this release, but that's not urgent.
>>>>>> Best,
>>>>>> --C
>>>>>>> On Jan 29, 2022, at 7:36 AM, James Turton wrote:
>>>>>>> Hello Dev community
>>>>>>> Not a moment too soon, we've finally dispatched the last issues holding back 1.20! Here's a big thank you from the release manager to everyone who helped to push us forward to this point. I'm sure I'm not the only one receiving the "When it's coming??" questions. As an interesting bit of trivia, there have been about 9 months separating recent releases and it has now been about 8 months since 1.19. Who knew we were so consistent ;-) ?
>>>>>>> Please vote for or against a feature freeze on the master branch. I assume only critical bug or vulnerability fixes get freeze immunity?
>>>>>>> Thank you
>>>>>>> James
Re: [VOTE] Freeze for Drill 1.20
I've been eagerly awaiting the release with the Phoenix connection, so I would like to see the feature freeze.

Regards
Christian

---- Original Message ----
On 29 January 2022 at 13:36, James Turton wrote:
> Hello Dev community
> Not a moment too soon, we've finally dispatched the last issues holding back 1.20! Here's a big thank you from the release manager to everyone who helped to push us forward to this point. I'm sure I'm not the only one receiving the "When it's coming??" questions. As an interesting bit of trivia, there have been about 9 months separating recent releases and it has now been about 8 months since 1.19. Who knew we were so consistent ;-) ?
> Please vote for or against a feature freeze on the master branch. I assume only critical bug or vulnerability fixes get freeze immunity?
> Thank you
> James
Re: [ANNOUNCE] New Committer: PJ Fanning
Congratulations, PJ!

Best Regards
Christian

---- Original Message ----
On 24 January 2022 at 18:15, Charles Givre wrote:
> The Project Management Committee (PMC) for Apache Drill is pleased to announce that we have invited PJ Fanning to join us as a committer to the Drill project. PJ is a committer and PMC member for the Apache POI project and the author of the Excel Streaming library which Drill uses for the Excel reader. He has contributed numerous fixes and assistance to Drill relating to Drill's Excel reader. Please join me in congratulating PJ and welcoming him as a committer!
> Best,
> Charles Givre
> PMC Chair, Apache Drill
Re: [ANNOUNCE] James Turton as PMC Member
Great to hear this. Congratulations, James!

Best Regards,
Christian

---- Original Message ----
On 24 January 2022 at 18:29, Charles Givre wrote:
> The Project Management Committee (PMC) for Apache Drill is pleased to announce that we have invited James Turton to join us as a PMC member of the Drill project and he has accepted. Please join me in congratulating James and welcoming him to the PMC!
> Best,
> Charles Givre
> PMC Chair, Apache Drill
AW: Re: [DISCUSS] Per User Access Controls
Hi all,

as someone who uses Drill with a kerberized Hadoop cluster and Ranger as the central access-control system, I would love to have a Ranger plugin for Drill, but I would assume a lot of Drill users just spin up a cluster in front of S3 or Azure.

So why not use a generic approach with GRANT and REVOKE for users and groups on specific workspaces, or at least storage plugins? With that, an admin can control which users and groups can access each of the storage plugins we have, no matter whether the underlying system offers such a mechanism itself. Maybe we could use the Metastore to store this information?

Regards,
Christian

---- Original Message ----
Paul Rogers wrote on Thursday, 13 January 2022 at 23:40:
> Hey All,
> Other members of the Hadoop ecosystem rely on external systems to handle permissions: Ranger or Sentry. There is probably something different in the AWS world.
> As you look into security, you'll see that you need to maintain permissions on many entities: files, connections, etc. You need different permissions: read, write, create, etc. In larger groups of people, you need roles: an admin role, a sales analyst role, a production engineer role. Users map to roles, and roles take permissions.
> Creating this just for Drill is not effective: no one wants to learn a Drill "security store" any more than folks want to learn the Drill Metastore. Drill is seldom the only tool in a shop: people want to set permissions in one place, not in each tool. So, we should integrate with existing tools.
> Drill should provide an API, and be prepared to enforce rules. Drill defines the entities that can be secured, and the available permissions. Then, it is up to an external system to provide user identity, take tuples of (user, resource, permission) and return a boolean of whether that user is authorized or not.
> MapR, PAM, Hadoop and other systems would be implemented on top of the Drill permissions API, as would whatever need you happen to have.
>
> Thanks,
>
> - Paul
>
> On Thu, Jan 13, 2022 at 12:32 PM Curtis Lambert cur...@datadistillr.com wrote:
>
>> This is what we are handling with Vault outside of Drill, combined with aliasing. James is tracking some of what you've been finding with the credential store, but even then we want the single source of auth. We can chat with James on the next Drill stand-up (and anyone else who wants to feel the pain).
>>
>> Curtis Lambert
>> CTO
>> Email: cur...@datadistillr.com
>> Phone: - 706-402-0249
>> LinkedIn https://www.linkedin.com/in/curtis-lambert-2009b2141/
>> Calendly https://calendly.com/curtis283/generic-zoom
>> https://www.datadistillr.com/
>>
>> On Thu, Jan 13, 2022 at 3:29 PM Charles Givre cgi...@gmail.com wrote:
>>
>>> Hello all,
>>>
>>> One of the issues we've been dancing around is having per-user access controls in Drill. As Drill was originally built around the Hadoop ecosystem, the Hadoop-based connections make use of user impersonation for per-user access controls. However, a rather glaring deficiency is the lack of per-user access controls for connections like JDBC, Mongo, Splunk etc. Recently, when I was working on the OAuth pull request, it occurred to me that we might be able to slightly extend the credential provider interface to allow for per-user credentials. Here's what I was thinking...
>>>
>>> A bit of background: the credential provider interface is really an abstraction for a HashMap.
Here's my proposal: the cred provider interface would store two hashmaps, one for per-user creds and one for global creds. When a user is authenticated to Drill, then when they create a storage plugin connection, the credential provider would associate the creds with their Drill username. The storage plugins that use the credential provider would thus get per-user credentials. If users did not want per-user credentials, they could simply use direct credentials OR specify that in the credential provider classes. What do you think? Best, -- C
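The two-map design sketched in this thread can be illustrated with a small mock. This is only a toy model, not Drill's actual credential provider interface (which is a Java API); all class and method names here are hypothetical:

```python
class CredentialProvider:
    """Toy model of the proposed two-map credential provider:
    global credentials shared by everyone, plus per-user overrides."""

    def __init__(self):
        self.global_creds = {}   # key -> secret, shared by all users
        self.user_creds = {}     # (username, key) -> secret

    def set_global(self, key, secret):
        self.global_creds[key] = secret

    def set_for_user(self, username, key, secret):
        self.user_creds[(username, key)] = secret

    def resolve(self, username, key):
        # Per-user credentials, when present, shadow the global ones.
        return self.user_creds.get((username, key), self.global_creds.get(key))


provider = CredentialProvider()
provider.set_global("splunk.password", "shared-secret")
provider.set_for_user("alice", "splunk.password", "alice-secret")

print(provider.resolve("alice", "splunk.password"))  # alice's own credential
print(provider.resolve("bob", "splunk.password"))    # falls back to the global one
```

The lookup order is the key design choice: a storage plugin asks for a credential under the authenticated Drill username, and users who never stored per-user creds transparently get the direct (global) credentials.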
Re: [DISCUSS] Restarting the Arrow Conversation
Hi Charles, Ted, and the others here, it is very interesting to hear about the evolution of Drill, Dremio, and Arrow in that context, and thank you, Charles, for restarting that discussion. I think, and James mentioned this in the PR as well, that Drill could benefit from the continuous progress the Arrow project has made since its separation from Drill. And the Arrow community seems to be large, so I assume this goes on and on with improvements, new features, etc. But I don't have enough experience with Drill internals to have an idea of what mass of refactoring this would lead to. In addition, I'm not aware of the current roadmap of Arrow and whether it would fit into Drill's roadmap. Maybe Arrow will go in a different direction than Drill, and what should we do if Drill is bound to Arrow then? On the other hand, Arrow could help Drill to a wider adoption with clients like pyarrow, arrow-flight, various other programming languages, etc. And (I'm not sure about this) maybe it's a performance benefit if Drill uses Arrow to read data from HDFS (for example), uses Arrow to work with it during execution, and hands the vectors directly to my Python (for example) program via arrow-flight so that I can play around with Pandas, etc. Just some thoughts I have, since I have used Dremio with pyarrow and Drill with ODBC connections. Regards Christian

‐‐‐ Original Message ‐‐‐ On Jan 3, 2022, 20:08, Charles Givre wrote:
> Thanks Ted for the perspective! I had always wished to be a "fly on the wall" in those conversations. :-)
> -- C
>
>> On Jan 3, 2022, at 11:00 AM, Charles Givre wrote:
>>
>> Hello all,
>> There was a discussion in a recently closed PR [1] between z0ltrix, James Turton and a few others about integrating Drill with Apache Arrow and wondering why it was never done. I'd like to share my perspective as someone who has been around Drill for some time but also as someone who never worked for MapR or Dremio.
>> This just represents my understanding of events as an outsider, and I could be wrong about some or all of this. Please forgive (or correct) any inaccuracies.
>>
>> When I first learned of Arrow and the idea of integrating Arrow with Drill, the thing that interested me the most was the ability to move data between platforms without having to serialize/deserialize the data. From my understanding, MapR did some research and didn't find a significant performance advantage and hence didn't really pursue the integration. The other side of it was that it would require a significant amount of work to refactor major parts of Drill.
>>
>> I don't know the internal politics, but this was one of the major points of divergence between Dremio and Drill.
>>
>> With that said, there was a renewed discussion on the list [2] where Paul Rogers proposed what he described as a "Crude but Effective" approach to an Arrow integration.
>>
>> This is in the email link, but here is part of Paul's email:
>>
>>> Charles, just brainstorming a bit, I think the easiest way to start is to create a simple, stand-alone server that speaks Arrow to the client, and uses the native Drill client to speak to Drill. The native Drill client exposes Drill value vectors. One trick would be to convert Drill vectors to the Arrow format. I think that data vectors are the same format. Possibly offset vectors. I think Arrow went its own way with null-value (Drill's is-set) vectors. So, some conversion might be a no-op, others might need to rewrite a vector. Good thing, this is purely at the vector level, so would be easy to write. The next issue is the one that Parth has long pointed out: Drill and Arrow each have their own memory allocators. How could we share a data vector between the two? The simplest initial solution is just to copy the data from Drill to Arrow.
>>> Slow, but transparent to the client. A crude first-approximation of the development steps:
>>>
>>> 1. Create the client shell server.
>>> 2. Implement the Arrow client protocol. Need some way to accept a query and return batches of results.
>>> 3. Forward the query to Drill using the native Drill client.
>>> 4. As a first pass, copy vectors from Drill to Arrow and return them to the client.
>>> 5. Then, solve that memory allocator problem to pass data without copying.
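Paul's point that "Arrow went its own way with null-value (Drill's is-set) vectors" refers to the two projects representing nullability differently: Drill's nullable vectors carry a one-byte-per-value is-set vector, while Arrow's columnar format uses a one-bit-per-value validity bitmap (LSB-first within each byte). The copy step in the plan above would therefore need a small rewrite for that vector, which can be sketched as follows (a simplification; real Drill/Arrow buffers live in off-heap memory managed by their allocators):

```python
def is_set_to_validity_bitmap(is_set):
    """Convert a Drill-style is-set vector (one byte per value, 1 = present)
    into an Arrow-style validity bitmap (one bit per value, LSB-first)."""
    bitmap = bytearray((len(is_set) + 7) // 8)  # round up to whole bytes
    for i, flag in enumerate(is_set):
        if flag:
            bitmap[i // 8] |= 1 << (i % 8)      # set bit i, LSB-first
    return bytes(bitmap)

# 5 values; positions 0, 2 and 4 are non-null -> bits 0b00010101 = 0x15.
print(is_set_to_validity_bitmap([1, 0, 1, 0, 1]))  # b'\x15'
```

For fixed-width data vectors the copy could be a straight memcpy, which is why Paul describes some conversions as a no-op; only the nullability (and possibly offset) vectors need this kind of repacking.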
Re: [LAZY VOTE] Drill 1.20 freeze delay
Hi James, what do you think about DRILL-8055? I finished the implementation, and if the review goes well, users will be able to use the newest Druid version with Drill from 1.20 on. For one of my customers it would be great to include this, because we want to add Druid 0.22 to our cluster next year, and without this ticket we would not be able to use Drill in front of Druid. Regards, Christian ‐‐‐ Original Message ‐‐‐ On Thursday, December 9, 2021 at 10:03, James Turton wrote:
> Apologies, I was aiming for next week Friday and had an off-by-one error. So we've agreed to delay until 2021-12-17 and we're tracking the following.
>
> Closed
> DRILL-1282 Parquet v2 read+write
> DRILL-7863 Phoenix storage
> DRILL-7983 Get running/completed profiles from REST API
> DRILL-8027 Iceberg format
> DRILL-8009 JDBC isValid()
> DRILL-8067 Web UI tries to fetch CSS from Google
> DRILL-8069 Remove use of excel sheet getLastRowNum
>
> Under review
> DRILL-8015 MongoDB Metastore
> DRILL-8028 PDF format
> DRILL-8073 Add support for persistent table and storage aliases
>
> Under development
> DRILL-7978 Fixed width format
> DRILL-8057 INFORMATION_SCHEMA filter push down is inefficient (feasibility not yet clear)
> DRILL-8061 Impersonation support for Phoenix plugin
>
> On 2021/12/08 17:40, James Turton wrote:
>
>> Dear dev community
>>
>> Please reply if you *object* to us pushing out the freeze date by one week to 2021-12-16. The motivation to delay is to try to include more of the open PRs that we are tracking below, a number of which are essentially dev-complete but not yet over the line.
>> Closed
>> DRILL-1282 Parquet v2 read+write
>> DRILL-7863 Phoenix storage
>> DRILL-8027 Iceberg format
>> DRILL-8009 JDBC isValid()
>>
>> Open
>> DRILL-7978 Fixed width format
>> DRILL-7983 Get running/completed profiles from REST API
>> DRILL-8015 MongoDB Metastore
>> DRILL-8028 PDF format
>> DRILL-8057 INFORMATION_SCHEMA filter push down is inefficient (feasibility not yet clear)
>>
>> Thank you
>>
>> James
Re: [DISCUSS] Refactoring Drill's CSV (Text) Reader
I would appreciate such a change. Each time I introduce Drill to users I start with a CSV example, and it's hard to explain why it has to be so difficult just to read a simple CSV file. Discovering data types would be cool, but it doesn't have the highest priority. Casting by users is fine as long as they have an intuitive way to query the strings. ‐‐‐ Original Message ‐‐‐ On Thursday, November 18, 2021 at 07:17, Ted Dunning wrote:
> I think that these would be significant improvements.
>
> The current behavior is pretty painful on average. Better defaults and just a bit of deduction could pay off big. I even think that the presence of headers might be pretty reliably inferred.
>
> On Wed, Nov 17, 2021 at 4:31 PM Charles Givre cgi...@gmail.com wrote:
>
>> Hello Drill Community,
>>
>> I would like to put forward some thoughts I've had relating to the CSV reader in Drill. I would like to propose a few changes which could actually be breaking changes, so I wanted to see if there are any strongly held opinions in the community. Here goes:
>>
>> The Problems:
>> 1. The default behavior for Drill is to leave the extractColumnHeaders option as false. When a user queries a CSV file this way, the results are returned in a list of columns called columns. Thus, if a user wants the first column, they would project columns[0]. I have never been a fan of this behavior. Even though Drill ships with the csvh file extension, which enables header extraction, this is not a commonly used file format. Furthermore, the returned results (the columns list) do not work well with BI tools.
>> 2. The CSV reader does not attempt to do any kind of data type discovery.
>>
>> Proposed Changes:
>> The overall goal is to make it easier to query CSV data and also to make the behavior more consistent across format plugins.
>> 1.
Change the default behavior and set extractHeaders to true.
>> 2. Other formats, like the Excel reader, read tables directly into columns. If the header is not known, Drill assigns a name of field_n. I would propose replacing the `columns` array with a model similar to the Excel reader.
>> 3. Implement schema discovery (data types) with an allTextMode option similar to the JSON reader. When allTextMode is disabled, the CSV reader would attempt to infer data types.
>>
>> Since there are some breaking changes here, I'd like to ask if people have any strong feelings on this topic or suggestions.
>>
>> Thanks!,
>> -- C
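The type discovery proposed in point 3 (infer data types when an all-text mode is disabled) can be sketched in a few lines. This is not Drill's implementation, just an illustration of the widening rule such a reader might apply per column: try the narrowest type first and fall back to text:

```python
def infer_type(values):
    """Infer a column type from string cell values: try INT, then FLOAT,
    else fall back to VARCHAR (the all-text behavior)."""
    for candidate, parse in (("INT", int), ("FLOAT", float)):
        try:
            for v in values:
                parse(v)        # every cell must parse for the type to hold
            return candidate
        except ValueError:
            continue            # at least one cell failed; try the wider type
    return "VARCHAR"

print(infer_type(["1", "2", "3"]))    # INT
print(infer_type(["1.5", "2"]))       # FLOAT
print(infer_type(["a", "2"]))         # VARCHAR
```

A real reader would also have to decide how many rows to sample and what to do when later rows contradict the inferred type, which is where the JSON reader's allTextMode escape hatch comes in.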