Re: [DISCUSS] Pull Request Cleanup

2022-03-03 Thread Z0ltrix
Hi Charles,

what process would you suggest?

I would think some devs use a PR to keep the work open as a reminder, and/or
so others can discuss it, but of course, if it has been stale for months it may
never make any more progress.
Perhaps someone could post a comment asking about further development, but
who would be responsible for that trigger?

Regards
Christian




--- Original Message ---
On 3 March 2022 at 17:54, Charles Givre wrote:

> Hello all,
> I wanted to discuss the possibility of doing a cleanup of open and stale pull 
> requests. There seem to be about 10 PRs that are actively being worked, then 
> we have a bunch of PRs of various stages of staleness.
>
> What do you all think about having some sort of process for closing out old 
> PRs that are not actively being worked?
> Best,
> \-- C

publickey - EmailAddress(s=z0ltrix@pm.me) - 0xF0E154C5.asc
Description: application/pgp-keys


signature.asc
Description: OpenPGP digital signature


Re: New Committer: Tengfei Wang

2022-03-03 Thread Z0ltrix
Congratulations, Tengfei!



--- Original Message ---
On 3 March 2022 at 13:52, Charles Givre wrote:

> The Project Management Committee (PMC) for Apache Drill
> has invited Tengfei Wang to become a committer and we are pleased
> to announce that he has accepted.
>
> Being a committer enables easier contribution to the
> project since there is no need to go via the patch
> submission process. This should enable better productivity.
> A PMC member helps manage and guide the direction of the project.
> Please join me in congratulating Tengfei!



Re: thinking of our Ukrainian friends

2022-02-24 Thread Z0ltrix
Oh my goodness, I hope this will end soon.
Stay safe!

--- Original Message ---

luoc wrote on Thursday, 24 February 2022 at 10:24:

> Vitalii and Vova are my Ukrainian friends, hopefully they will stay safe as
> well.
>
> > On Feb 24, 2022, at 14:39, Ted Dunning ted.dunn...@gmail.com wrote:
> >
> > For commercial historical reasons many of the people who have contributed
> > to Drill live in Ukraine.
> >
> > My heart is with them tonight. I hope they stay safe.



AW: Re: WG: Superset Drill Time Range Filter

2022-02-23 Thread Z0ltrix
When I manually resend the query with TIMESTAMP:
WHERE `startTime` >= TIMESTAMP '2022-02-14 00:00:00.00'
  AND `startTime` < TIMESTAMP '2022-02-21 00:00:00.00'
ORDER BY `startTime` DESC

Everything is fine, but Superset doesn't create the query this way.

FYI, I have already created an issue in Superset:
https://github.com/apache/superset/issues/18869


--- Original Message ---

James Turton wrote on Wednesday, 23 February 2022 at 12:46:

> As a matter of interest, if you test directly against Drill with the
> following timestamp literal expressions, what happens?
>
> SELECT *
> FROM dfs.foo.bar
> WHERE `startTime` >= timestamp '2022-02-14 00:00:00.00'
> AND `startTime` < timestamp '2022-02-21 00:00:00.00'
> ORDER BY `startTime` DESC
>
> On 2022/02/23 11:56, Z0ltrix wrote:
>
> > Hi drill devs,
> >
> > we have a problem with our superset -> drill connection with time
> > range filters, as described below.
> >
> > Superset sends the following to drill:
> >
> > WHERE `startTime` >= '2022-02-14 00:00:00.00'
> > AND `startTime` < '2022-02-21 00:00:00.00'
> > ORDER BY `startTime` DESC
> >
> > and i get the following error:
> >
> > SYSTEM ERROR: ClassCastException:
> > org.apache.drill.exec.expr.holders.NullableTimeStampHolder cannot be cast
> > to org.apache.drill.exec.expr.holders.TimeStampHolder
> >
> > Please, refer to logs for more information.
> >
> > (org.apache.drill.exec.work.foreman.ForemanException) Unexpected exception
> > during fragment initialization:
> > org.apache.drill.exec.expr.holders.NullableTimeStampHolder cannot be cast
> > to org.apache.drill.exec.expr.holders.TimeStampHolder
> >
> > org.apache.drill.exec.work.foreman.Foreman.run():305
> > java.util.concurrent.ThreadPoolExecutor.runWorker():1149
> > java.util.concurrent.ThreadPoolExecutor$Worker.run():624
> > java.lang.Thread.run():748
> >
> > Caused By (java.lang.ClassCastException)
> > org.apache.drill.exec.expr.holders.NullableTimeStampHolder cannot be cast
> > to org.apache.drill.exec.expr.holders.TimeStampHolder
> >
> > org.apache.drill.exec.expr.FilterBuilder.getValueExpressionFromConst():208
> > org.apache.drill.exec.expr.FilterBuilder.visitFunctionHolderExpression():240
> > org.apache.drill.exec.expr.FilterBuilder.visitFunctionHolderExpression():58
> > org.apache.drill.common.expression.FunctionHolderExpression.accept():53
> > org.apache.drill.exec.expr.FilterBuilder.generateNewExpressions():268
> > org.apache.drill.exec.expr.FilterBuilder.handleCompareFunction():278
> > org.apache.drill.exec.expr.FilterBuilder.visitFunctionHolderExpression():246
> > org.apache.drill.exec.expr.FilterBuilder.visitFunctionHolderExpression():58
> > org.apache.drill.common.expression.FunctionHolderExpression.accept():53
> > org.apache.drill.exec.expr.FilterBuilder.buildFilterPredicate():80
> > org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.getFilterPredicate():317
> > org.apache.drill.exec.store.parquet.ParquetPushDownFilter.doOnMatch():150
> > org.apache.drill.exec.store.parquet.ParquetPushDownFilter$2.onMatch():103
> > org.apache.calcite.plan.AbstractRelOptPlanner.fireRule():319
> > org.apache.calcite.plan.hep.HepPlanner.applyRule():561
> > org.apache.calcite.plan.hep.HepPlanner.applyRules():420
> > org.apache.calcite.plan.hep.HepPlanner.executeInstruction():257
> > org.apache.calcite.plan.hep.HepInstruction$RuleInstance.execute():127
> > org.apache.calcite.plan.hep.HepPlanner.executeProgram():216
> > org.apache.calcite.plan.hep.HepPlanner.findBestExp():203
> > org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform():419
> > org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform():370
> > org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform():353
> > org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToPrel():536
> > org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan():178
> > org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan():216
> > org.apache.drill.exec.planner.sql.DrillSqlWorker.convertPlan():121
> > org.apache.drill.exec.pla

AW: Re: Superset Drill Time Range Filter

2022-02-23 Thread Z0ltrix
I tested Drill 1.16 and 1.20 Hadoop 2 RC4 with the same behaviour.

--- Original Message ---

luoc wrote on Wednesday, 23 February 2022 at 11:27:

> Which Drill version are you running?
>
> > On Feb 23, 2022, at 17:57, Z0ltrix z0lt...@pm.me.invalid wrote:
> >
> > Hi drill devs,
> >
> > we have a problem with our superset -> drill connection with time range
> > filters, as described below.
> >
> > Superset sends the following to drill:
> >
> > WHERE `startTime` >= '2022-02-14 00:00:00.00'
> > AND `startTime` < '2022-02-21 00:00:00.00'
> > ORDER BY `startTime` DESC
> >
> > and i get the following error:
> >
> > SYSTEM ERROR: ClassCastException:
> > org.apache.drill.exec.expr.holders.NullableTimeStampHolder cannot be cast
> > to org.apache.drill.exec.expr.holders.TimeStampHolder
> >
> > Please, refer to logs for more information.
> >
> > (org.apache.drill.exec.work.foreman.ForemanException) Unexpected exception
> > during fragment initialization:
> > org.apache.drill.exec.expr.holders.NullableTimeStampHolder cannot be cast
> > to org.apache.drill.exec.expr.holders.TimeStampHolder
> >
> > org.apache.drill.exec.work.foreman.Foreman.run():305
> > java.util.concurrent.ThreadPoolExecutor.runWorker():1149
> > java.util.concurrent.ThreadPoolExecutor$Worker.run():624
> > java.lang.Thread.run():748
> >
> > Caused By (java.lang.ClassCastException)
> > org.apache.drill.exec.expr.holders.NullableTimeStampHolder cannot be cast
> > to org.apache.drill.exec.expr.holders.TimeStampHolder
> >
> > org.apache.drill.exec.expr.FilterBuilder.getValueExpressionFromConst():208
> > org.apache.drill.exec.expr.FilterBuilder.visitFunctionHolderExpression():240
> > org.apache.drill.exec.expr.FilterBuilder.visitFunctionHolderExpression():58
> > org.apache.drill.common.expression.FunctionHolderExpression.accept():53
> > org.apache.drill.exec.expr.FilterBuilder.generateNewExpressions():268
> > org.apache.drill.exec.expr.FilterBuilder.handleCompareFunction():278
> > org.apache.drill.exec.expr.FilterBuilder.visitFunctionHolderExpression():246
> > org.apache.drill.exec.expr.FilterBuilder.visitFunctionHolderExpression():58
> > org.apache.drill.common.expression.FunctionHolderExpression.accept():53
> > org.apache.drill.exec.expr.FilterBuilder.buildFilterPredicate():80
> > org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.getFilterPredicate():317
> > org.apache.drill.exec.store.parquet.ParquetPushDownFilter.doOnMatch():150
> > org.apache.drill.exec.store.parquet.ParquetPushDownFilter$2.onMatch():103
> > org.apache.calcite.plan.AbstractRelOptPlanner.fireRule():319
> > org.apache.calcite.plan.hep.HepPlanner.applyRule():561
> > org.apache.calcite.plan.hep.HepPlanner.applyRules():420
> > org.apache.calcite.plan.hep.HepPlanner.executeInstruction():257
> > org.apache.calcite.plan.hep.HepInstruction$RuleInstance.execute():127
> > org.apache.calcite.plan.hep.HepPlanner.executeProgram():216
> > org.apache.calcite.plan.hep.HepPlanner.findBestExp():203
> > org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform():419
> > org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform():370
> > org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform():353
> > org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToPrel():536
> > org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan():178
> > org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan():216
> > org.apache.drill.exec.planner.sql.DrillSqlWorker.convertPlan():121
> > org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan():87
> > org.apache.drill.exec.work.foreman.Foreman.runSQL():593
> > org.apache.drill.exec.work.foreman.Foreman.run():276
> > java.util.concurrent.ThreadPoolExecutor.runWorker():1149
> > java.util.concurrent.ThreadPoolExecutor$Worker.run():624
> > java.lang.Thread.run():748
> >
> > When i manually resend the query with TIMESTAMP as here:
> >
> > WHERE `startTime` >= TIMESTAMP '2022-02-14 00:00:00.00'
> > AND `startTime` < TIMESTAMP '2022-02-

WG: Superset Drill Time Range Filter

2022-02-23 Thread Z0ltrix
Hi drill devs,

we have a problem with our superset -> drill connection with time range 
filters, as described below.

Superset sends the following to drill:
WHERE `startTime` >= '2022-02-14 00:00:00.00'
  AND `startTime` < '2022-02-21 00:00:00.00'
ORDER BY `startTime` DESC

and I get the following error:

SYSTEM ERROR: ClassCastException: 
org.apache.drill.exec.expr.holders.NullableTimeStampHolder cannot be cast to 
org.apache.drill.exec.expr.holders.TimeStampHolder


Please, refer to logs for more information.


  (org.apache.drill.exec.work.foreman.ForemanException) Unexpected exception 
during fragment initialization: 
org.apache.drill.exec.expr.holders.NullableTimeStampHolder cannot be cast to 
org.apache.drill.exec.expr.holders.TimeStampHolder
org.apache.drill.exec.work.foreman.Foreman.run():305
java.util.concurrent.ThreadPoolExecutor.runWorker():1149
java.util.concurrent.ThreadPoolExecutor$Worker.run():624
java.lang.Thread.run():748
  Caused By (java.lang.ClassCastException) 
org.apache.drill.exec.expr.holders.NullableTimeStampHolder cannot be cast to 
org.apache.drill.exec.expr.holders.TimeStampHolder
org.apache.drill.exec.expr.FilterBuilder.getValueExpressionFromConst():208
org.apache.drill.exec.expr.FilterBuilder.visitFunctionHolderExpression():240
org.apache.drill.exec.expr.FilterBuilder.visitFunctionHolderExpression():58
org.apache.drill.common.expression.FunctionHolderExpression.accept():53
org.apache.drill.exec.expr.FilterBuilder.generateNewExpressions():268
org.apache.drill.exec.expr.FilterBuilder.handleCompareFunction():278
org.apache.drill.exec.expr.FilterBuilder.visitFunctionHolderExpression():246
org.apache.drill.exec.expr.FilterBuilder.visitFunctionHolderExpression():58
org.apache.drill.common.expression.FunctionHolderExpression.accept():53
org.apache.drill.exec.expr.FilterBuilder.buildFilterPredicate():80

org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.getFilterPredicate():317
org.apache.drill.exec.store.parquet.ParquetPushDownFilter.doOnMatch():150
org.apache.drill.exec.store.parquet.ParquetPushDownFilter$2.onMatch():103
org.apache.calcite.plan.AbstractRelOptPlanner.fireRule():319
org.apache.calcite.plan.hep.HepPlanner.applyRule():561
org.apache.calcite.plan.hep.HepPlanner.applyRules():420
org.apache.calcite.plan.hep.HepPlanner.executeInstruction():257
org.apache.calcite.plan.hep.HepInstruction$RuleInstance.execute():127
org.apache.calcite.plan.hep.HepPlanner.executeProgram():216
org.apache.calcite.plan.hep.HepPlanner.findBestExp():203
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform():419
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform():370
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform():353

org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToPrel():536
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan():178
org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan():216
org.apache.drill.exec.planner.sql.DrillSqlWorker.convertPlan():121
org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan():87
org.apache.drill.exec.work.foreman.Foreman.runSQL():593
org.apache.drill.exec.work.foreman.Foreman.run():276
java.util.concurrent.ThreadPoolExecutor.runWorker():1149
java.util.concurrent.ThreadPoolExecutor$Worker.run():624
java.lang.Thread.run():748

When I manually resend the query with TIMESTAMP as here:
WHERE `startTime` >= TIMESTAMP '2022-02-14 00:00:00.00'
  AND `startTime` < TIMESTAMP '2022-02-21 00:00:00.00'
ORDER BY `startTime` DESC

Everything is fine, but Superset doesn't create the query this way.

So, now to my question: is this error message legitimate because of the missing
TIMESTAMP keyword before the timestamp string, or do we have a problem here in Drill?

Regards 
Christian
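(A small sketch of the fix idea on the Superset side: render the datetime filter
bounds as explicit Drill TIMESTAMP literals instead of bare strings. The helper
name below is hypothetical, not an existing Superset or Drill API; it only
produces the literal form that worked when I resent the query by hand.)

```python
from datetime import datetime

def drill_timestamp_literal(dttm: datetime) -> str:
    """Render a datetime as an explicit Drill TIMESTAMP literal with two
    fractional digits, matching the hand-written query that succeeded.
    Helper name is illustrative, not part of Superset or Drill."""
    return f"TIMESTAMP '{dttm.strftime('%Y-%m-%d %H:%M:%S.%f')[:-4]}'"

# Building the working WHERE clause from the failing one:
start = drill_timestamp_literal(datetime(2022, 2, 14))
end = drill_timestamp_literal(datetime(2022, 2, 21))
where = f"WHERE `startTime` >= {start} AND `startTime` < {end}"
print(where)
```

Something of this shape could presumably live in Superset's
db_engine_specs/drill.py, where engine specs decide how datetimes are rendered
into SQL.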

--- Original Message ---
Z0ltrix wrote on Wednesday, 23 February 2022 at 10:49:

> Hello superset devs,
>
> we have a problem with our superset -> drill connection with time range
> filters.
>
> When we filter a dashboard by time range (last week, month, etc.) I get a
>
> SYSTEM ERROR: ClassCastException:
> org.apache.drill.exec.expr.holders.NullableTimeStampHolder cannot be cast to
> org.apache.drill.exec.expr.holders.TimeStampHolder
>
> from drill.
>
> I don't want to talk here too much about the drill error because this is a
> topic for the drill project, but I think we could solve this also by adding
> something to db_engine_specs/drill.py
>
> Superset sends the following to drill:
>
> WHERE `startTime` >= '2022-02-14 00:00:00.00'
>   AND `startTime` < '2022-02-21 00:00:00.00'
> ORDER BY `startTime` DESC
>
> Superset should send the following filter:
>
> WHERE `startTime` &

AW: Re: [DISCUSS] Impersonation design patterns

2022-02-22 Thread Z0ltrix
Hi James,

I think most companies must use impersonation to use a system like Drill
because of security restrictions. My customer, for example, is not allowed to
use any system that cannot run queries as the end user at the storage system,
because of audit logging requirements.

Impersonation in HDFS, HBase and Phoenix enables us to see the real user in
Ranger audits, and only through that feature am I able to create complex ACLs
in Ranger.

We could never use, for example, a Cassandra backend with Drill, because I
would not be able to impersonate through it. :/

Nevertheless, this is not only a problem of Drill. Cassandra itself has no
impersonation feature (correct me if I'm wrong), so Drill has no chance at the
moment to do this.

Maybe DRILL-7871 could solve this, but I'm not sure whether this is the right
approach: every user would have to be a mini-admin of the system, and managing
storage plugins is not suitable for every user.

Regards
Christian

--- Original Message ---

James Turton wrote on Tuesday, 22 February 2022 at 08:55:

> Errata: an improvement of the working definition of impersonation inline
> below.
>
> On 2022/02/22 08:27, James Turton wrote:
>
> > Drill has supported impersonation, which I'll use here to mean any
> > mechanism by which Drill accesses an external system as the end user
> > rather than some system-wide identity to which it has access e.g. the
> > OS user, a service principal or credentials in a storage
> > configuration, env var or config file.



Re: [VOTE] Release Apache Drill 1.20.0 - RC4

2022-02-18 Thread Z0ltrix
+1 for release.

 - Installed the Hadoop 2 RC4 in our development environment on AWS EC2 (Ubuntu 18.04)
   - ZooKeeper 3.6.5
   - Hadoop 2.9.2
   - HBase 1.5.0
   - Phoenix 4.15.0
   - Phoenix Query Server 1.0.0
   - everything secured by Kerberos
   - everything TLS encrypted
   - everything impersonated
 - Ran queries against Parquet files stored in HDFS (impersonated) + INT96 timestamps
 - Ran queries against HBase (impersonated)
 - Ran queries against Phoenix (impersonated)
 - Ran UNION ALL queries against HBase + HDFS (Parquet) to simulate a Lambda dataset
 - Ran UNION ALL queries against Phoenix + HDFS (Parquet) to simulate a Lambda dataset
 - Ran ANALYZE TABLE COMPUTE STATISTICS on HDFS Parquet tables (Iceberg Metastore)
 - Ran ANALYZE TABLE REFRESH METADATA on HDFS Parquet tables (Iceberg Metastore)
 - Ran queries against the Iceberg Metastore to simulate Iceberg format plugin reads
 - Tested some Superset and Tableau dashboards over an ODBC connection (impersonated)
 - Tested some queries from NiFi over a JDBC connection

Regards 


Christian

--- Original Message ---

James Turton wrote on Thursday, 17 February 2022 at 19:53:

> Hi all
>
> I'd like to propose the fifth release candidate (RC4) of Apache Drill,
> version 1.20.0 which differs from the previous RC in the following.
>
> DRILL-8139: Parquet CodecFactory thread safety bug (#2463)
> DRILL-8134: Cannot query Parquet INT96 columns as timestamps (#2460)
> DRILL-8122: Change kafka metadata obtaining due to KAFKA-5697 (#2456)
> DRILL-8137: Prevent reading union inputs after cancellation request (#2462)
>
> The release candidate covers a total of 117 resolved JIRAs [1]. Thanks
> to everyone who contributed to this release.
>
> The tarball artifacts are hosted at [2][3] and the maven artifacts are
> hosted at [4][5].
>
> This release candidate is based on commits
> 753bff39d8dd08eaa1273eadc20175d34a87e044 and
> 9955d082bcdba401666799f49a6cd3c3f996af97 located at [6][7].
>
> Please download and try out the release.
>
> [ ] +1
> [ ] +0
> [ ] -1
>
> [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12350301=12313820
> [2] https://dist.apache.org/repos/dist/dev/drill/drill-1.20.0-rc4/
> [3] https://dist.apache.org/repos/dist/dev/drill/drill-1.20.0-hadoop2-rc4/ (Hadoop 2 build)
> [4] https://repository.apache.org/content/repositories/orgapachedrill-1094/
> [5] https://repository.apache.org/content/repositories/orgapachedrill-1095/ (Hadoop 2 build)
> [6] https://github.com/jnturton/drill/commits/drill-1.20.0
> [7] https://github.com/jnturton/drill/commits/drill-1.20.0-hadoop2 (Hadoop 2 build)



AW: [VOTE] Release Apache Drill 1.20.0 - RC0

2022-02-07 Thread Z0ltrix
+1 for release.

 - Installed RC0 in our testing environment on AWS EC2 (Ubuntu 18.04)
   - ZooKeeper 3.6.3
   - Hadoop 3.2.1
   - HBase 2.4.8
   - Phoenix 5.1.2
   - Phoenix Query Server 6.0.0
   - everything secured by Kerberos
   - everything TLS encrypted
   - everything impersonated
 - Ran queries against Parquet files stored in HDFS (impersonated)
 - Ran queries against HBase (impersonated)
 - Ran queries against Phoenix (impersonated)
 - Tested the HBasePStoreProvider

Regards 

Christian

--- Original Message ---

James Turton wrote on Saturday, 5 February 2022 at 10:11:

> Hi all
>
> Note from the release manager:
>
> The normal RC announcement follows below, but please take note that while
> you should test and try this Hadoop 3-based RC 0 of Drill 1.20.0, there
> is likely to be another RC which ships both Hadoop 2 and Hadoop 3 builds
> as soon as I have got some advice on the best way to incorporate this in
> our release process. However, that RC will be based on exactly the same
> commit as this one is (assuming no issues are found), so please do test
> this one every bit as much as you would have.
>
> - Thanks, James
>
> I'd like to propose the first release candidate (RC0) of Apache Drill,
> version 1.20.0.
>
> The release candidate covers a total of 105 resolved JIRAs [1]. Thanks
> to everyone who contributed to this release.
>
> The tarball artifacts are hosted at [2] and the maven artifacts are
> hosted at [3].
>
> This release candidate is based on commit
> 556b972560911c20691d5b5de6c656d22c59ce0b located at [4].
>
> Please download and try out the release.
>
> The vote ends at 2022-02-08 10:00 UTC ≅ 3×24 hours after the timestamp
> on this email.
>
> [ ] +1
> [ ] +0
> [ ] -1
>
> [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12350301=12313820
> [2] https://dist.apache.org/repos/dist/dev/drill/drill-1.20.0/
> [3] https://repository.apache.org/content/repositories/orgapachedrill-1087/
> [4] https://github.com/jnturton/drill/commits/drill-1.20.0



AW: Re: using JDBC to connect to Drill

2022-02-03 Thread Z0ltrix
Sure, I'll create a PR in the next few days.

--- Original Message ---

luoc wrote on Thursday, 3 February 2022 at 11:55:

> Hi Christian,
>
> Is it possible to add these tips to the docs? It is recommended that we add
> them to the "Troubleshooting" section, thanks!
>
> https://drill.apache.org/docs/troubleshooting/
>
> > On Feb 3, 2022, at 18:46, Z0ltrix z0lt...@pm.me.invalid wrote:
> >
> > Christian



AW: Re: using JDBC to connect to Drill

2022-02-03 Thread Z0ltrix
Hi Jorge,

It's all about DNS.

One solution is to configure /etc/hosts on each machine correctly, so that each
server can reach the others by address resolution.

The other trick is to set the internal IP address of the drillbit as the
hostname in drill-env.sh:
export DRILL_HOST_NAME=10.x.x.x

Drill uses this to register itself in ZooKeeper, and this information is handed
back to the client.

Regards 

Christian
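(As a quick client-side sanity check before wiring up JDBC, you can verify that
the drillbit name advertised via ZooKeeper actually resolves and accepts TCP
connections. A hedged sketch: the hostname below is an example, and 31010 is
Drill's default user port.)

```python
import socket

def drillbit_reachable(host: str, port: int = 31010, timeout: float = 2.0) -> bool:
    """Return True if `host` resolves via DNS/hosts and accepts a TCP
    connection on the given port (31010 is Drill's default user port)."""
    try:
        # A resolution failure here corresponds to the client-side
        # java.net.UnknownHostException seen in the JDBC thread.
        addr = socket.gethostbyname(host)
        with socket.create_connection((addr, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example: a name that is guaranteed not to resolve (RFC 2606 reserved TLD)
print(drillbit_reachable("drillbit.invalid"))
```

If this returns False for the name shown in the Drill web UI, fix /etc/hosts or
DRILL_HOST_NAME before debugging the JDBC driver itself.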

‐‐‐ Original Message ‐‐‐

luoc wrote on Thursday, 3 February 2022 at 11:28:

> Hi Jorge,
>
> It seems that we have answered this question before. Let me find it first..
>
> https://github.com/apache/drill/issues/2415
>
> > On Feb 3, 2022, at 17:28, Jorge Alvarado alvarad...@live.com wrote:
> >
> > Hi Drill community,
> >
> > I'm trying to connect to drill 1.19 using JDBC,
> >
> > For context: I have a VM running zookeeper and another VM running drillbit.
> > The web UI is working fine, the queries are working fine.
> >
> > In my maven dependency I have:
> >
> > <dependency>
> >   <groupId>org.apache.drill.exec</groupId>
> >   <artifactId>drill-jdbc-all</artifactId>
> >   <version>1.19.0</version>
> > </dependency>
> >
> > In my code:
> >
> > Connection conn = null;
> > String url = "jdbc:drill:zk=<zk host in cloud>:2181;schema=common1";
> > String query = "SELECT * FROM `common1`.`products.json` LIMIT 10";
> > Statement stmt = null;
> > ResultSet result = null;
> > conn = DriverManager.getConnection(url);
> > stmt = conn.createStatement();
> > result = null;
> > String column1, column2;
> > result = stmt.executeQuery(query);
> >
> > When I run the console app I have a bunch of errors but the most prominent
> > is:
> >
> > CONNECTION : java.net.UnknownHostException: drill3.internal.cloudapp.net:
> > Name or service not known
> >
> > drill3.internal.cloudapp.net is exactly the name that appears on the drill
> > web UI for my only drillbit node.
> >
> > It makes sense that it cannot resolve as it looks like an internal address,
> > so I updated my hosts file (on my dev PC) to resolve the public IP
> > address of the drill node, but it still gives me the same error.
> >
> > Do you have any ideas how to make my Java app resolve the internal
> > address?
> >
> > thanks in advance
> >
> > Jorge



Re: [VOTE] Freeze for Drill 1.20

2022-02-02 Thread Z0ltrix
+1 from me

Regards
Christian



--- Original Message ---
On 2 February 2022 at 17:13, Vitalii Diravka wrote:

> +1
>
> Kind regards
> Vitalii
>
>
> On Wed, Feb 2, 2022 at 6:04 PM Vova Vysotskyi  wrote:
>
> > +1
> >
> > Kind regards,
> > Volodymyr Vysotskyi
> >
> > On 2022/02/02 15:59:55 James Turton wrote:
> > > PR #2449 was merged and there are now zero Dependabot alerts against
> > > master.
> > >
> > > +1 for freezing from me.
> > >
> > > On 2022/02/02 16:36, Charles Givre wrote:
> > > > Assuming we pass dependabot, big +1 from me!! Great work everyone!
> > > > --C
> > > >
> > > >> On Feb 2, 2022, at 9:35 AM, James Turton  wrote:
> > > >>
> > > >> Please vote again on the assumption that the very minor Postgresql
> > 42.3.1 -> 42.3.2 PR will be merged, clearing the last Dependabot alert. It
> > passed local testing so it looks like a safe bet.
> > > >>
> > > >>
> > > >> On 2022/01/30 01:51, Charles Givre wrote:
> > > >>> Hey James,
> > > >>> Alas... I'm afraid I'd have to give a -1 on this. There are some
> > dependabot alerts at the moment, which we really should resolve (or at
> > least look at) before cutting a release. One of which has is linked to a
> > severe CVE. Also, I just submitted a VERY minor bug fix which I'd love to
> > squeak into this release, but that's not urgent.
> > > >>> Best,
> > > >>> --C
> > > >>>
> > > >>>
> > > >>>> On Jan 29, 2022, at 7:36 AM, James Turton  wrote:
> > > >>>>
> > > >>>> Hello Dev community
> > > >>>>
> > > >>>> Not a moment too soon, we've finally dispatched the last issues
> > holding back 1.20! Here's a big thank you from the release manager to
> > everyone who helped to push us forward to this point. I'm sure I'm not the
> > only one receiving the "When it's coming??" questions. As an interesting
> > bit of trivia, there have been about 9 months separating recent releases
> > and it has now been about 8 months since 1.19. Who knew we were so
> > consistent ;-) ?
> > > >>>>
> > > >>>> Please vote for or against a feature freeze on the master branch.
> > I assume only critical bug or vulnerability fixes get freeze immunity?
> > > >>>>
> > > >>>> Thank you
> > > >>>> James
> > > >>>>
> > > >>
> > > >
> > >
> >
>



Re: [VOTE] Freeze for Drill 1.20

2022-01-29 Thread Z0ltrix
I've been eagerly awaiting the release with the Phoenix connection, so I would
like to see the feature freeze.

Regards
Christian



--- Original Message ---
On 29 January 2022 at 13:36, James Turton wrote:

> Hello Dev community
>
> Not a moment too soon, we've finally dispatched the last issues holding
> back 1.20! Here's a big thank you from the release manager to everyone
> who helped to push us forward to this point. I'm sure I'm not the only
> one receiving the "When it's coming??" questions. As an interesting bit
> of trivia, there have been about 9 months separating recent releases and
> it has now been about 8 months since 1.19. Who knew we were so
> consistent ;-) ?
>
> Please vote for or against a feature freeze on the master branch. I
> assume only critical bug or vulnerability fixes get freeze immunity?
>
> Thank you
> James



Re: [ANNOUNCE] New Committer: PJ Fanning

2022-01-24 Thread Z0ltrix
Congratulations PJ!

Best Regards
Christian

--- Original Message ---
On 24 January 2022 at 18:15, Charles Givre wrote:

> The Project Management Committee (PMC) for Apache Drill is pleased to 
> announce that we have invited PJ Fanning to join us as a committer to the 
> Drill project. PJ is a committer and PMC member for the Apache POI project 
> and author of the Excel Streaming library which Drill uses for the Excel 
> reader. He has contributed numerous fixes and assistance to Drill relating to 
> the Drill's Excel reader. Please join me in congratulating PJ and welcoming 
> him as a committer!
>
> Best,
> Charles Givre
> PMC Chair, Apache Drill



Re: [ANNOUNCE] James Turton as PMC Member

2022-01-24 Thread Z0ltrix
Great to hear this...
Congratulations James!

Best Regards,
Christian

--- Original Message ---
On 24 January 2022 at 18:29, Charles Givre wrote:

> The Project Management Committee (PMC) for Apache Drill is pleased to 
> announce that we have invited James Turton to join us as a PMC member of the 
> Drill project and he has accepted. Please join me in congratulating James and 
> welcoming him to the PMC!
>
>
> Best,
> Charles Givre
> PMC Chair, Apache Drill
>
>
>
>
> Charles Givre
> Founder, CEO DataDistillr
> Email: char...@datadistillr.com
> Phone: + 443-762-3286
> Book a Meeting 30 min <https://calendly.com/datadistillr-ceo/30min> • 60 min 
> <https://calendly.com/datadistillr-ceo/60min?month=2021-03>
> LinkedIn @cgivre <https://www.linkedin.com/in/cgivre/>
> GitHub @cgivre <https://github.com/cgivre>
> <https://www.datadistillr.com/>
>



AW: Re: [DISCUSS] Per User Access Controls

2022-01-13 Thread Z0ltrix
Hi @All,

For me, using Drill with a kerberized Hadoop cluster and Ranger as the central
access control system, I would love to have a Ranger plugin for Drill, but I
would assume a lot of Drill users just spin up a cluster in front of S3 or
Azure.

So why not use a generic approach with GRANT and REVOKE for users and groups
on specific workspaces, or at least storage plugins?

With that, an admin can control which users and groups can access the storage
plugins we have, no matter whether the underlying plugin has such a system
itself.

Maybe we could use the Metastore to store such information?

Regards,
Christian
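
To make the GRANT/REVOKE idea above concrete: it boils down to keeping (principal, object, privilege) grants in one central store (the Metastore could play that role) and checking them at query time. The following is only a toy sketch under that assumption; all names are hypothetical and none of this is a real Drill API.

```java
import java.util.HashSet;
import java.util.Set;

// Toy sketch of a central grant store for users/groups on storage plugins
// or workspaces. Hypothetical names; not Drill's actual API.
public class GrantStore {

  enum Privilege { SELECT, WRITE }

  private final Set<String> grants = new HashSet<>();

  private static String key(String principal, String object, Privilege p) {
    return principal + "|" + object + "|" + p;
  }

  public void grant(String principal, String object, Privilege p) {
    grants.add(key(principal, object, p));
  }

  public void revoke(String principal, String object, Privilege p) {
    grants.remove(key(principal, object, p));
  }

  public boolean isAllowed(String principal, String object, Privilege p) {
    return grants.contains(key(principal, object, p));
  }

  public static void main(String[] args) {
    GrantStore store = new GrantStore();
    store.grant("analysts", "dfs.sales", Privilege.SELECT);
    System.out.println(store.isAllowed("analysts", "dfs.sales", Privilege.SELECT)); // true
    store.revoke("analysts", "dfs.sales", Privilege.SELECT);
    System.out.println(store.isAllowed("analysts", "dfs.sales", Privilege.SELECT)); // false
  }
}
```

A real implementation would also have to resolve group membership for a user before checking grants, which is where an external system like Ranger usually comes in.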

‐‐‐ Original Message ‐‐‐

Paul Rogers wrote on Thursday, 13 January 2022 at 23:40:

> Hey All,
> 

> Other members of the Hadoop Ecosystem rely on external systems to handle
> 

> permissions: Ranger or Sentry. There is probably something different in the
> 

> AWS world.
> 

> As you look into security, you'll see that you need to maintain permissions
> 

> on many entities: files, connections, etc. You need different permissions:
> 

> read, write, create, etc. In larger groups of people, you need roles: admin
> 

> role, sales analyst role, production engineer role. Users map to roles, and
> 

> roles take permissions.
> 

> Creating this just for Drill is not effective: no one wants to learn a
> 

> Drill "Security Store" any more than folks want to learn the "Drill
> 

> metastore". Drill is seldom the only tool in a shop: people want to set
> 

> permissions in one place, not in each tool. So, we should integrate with
> 

> existing tools.
> 

> Drill should provide an API, and be prepared to enforce rules. Drill
> 

> defines the entities that can be secured, and the available permissions.
> 

> Then, it is up to an external system to provide user identity, take tuples
> 

> of (user, resource, permission) and return a boolean of whether that user
> 

> is authorized or not. MapR, Pam, Hadoop and other systems would be
> 

> implemented on top of the Drill permissions API, as would whatever need you
> 

> happen to have.
> 

> Thanks,
> 

> -   Paul
> 

> On Thu, Jan 13, 2022 at 12:32 PM Curtis Lambert cur...@datadistillr.com
> 

> wrote:
> 

> > This is what we are handling with Vault outside of Drill, combined with
> > 

> > aliasing. James is tracking some of what you've been finding with the
> > 

> > credential store but even then we want the single source of auth. We can
> > 

> > chat with James on the next Drill stand up (and anyone else who wants to
> > 

> > feel the pain).
> > 

> > [image: avatar]
> > 

> > Curtis Lambert
> > 

> > CTO
> > 

> > Email:
> > 

> > cur...@datadistillr.com cur...@datdistillr.com
> > 

> > Phone:
> > 

> > -   706-402-0249
> > 

> > [image: LinkedIn]LinkedIn
> > 

> > https://www.linkedin.com/in/curtis-lambert-2009b2141/ [image: Calendly]
> > 

> > Calendly https://calendly.com/curtis283/generic-zoom
> > 

> > [image: Data Distillr logo] https://www.datadistillr.com/
> > 

> > On Thu, Jan 13, 2022 at 3:29 PM Charles Givre cgi...@gmail.com wrote:
> > 

> > > Hello all,
> > > 

> > > One of the issues we've been dancing around is having per-user access
> > > 

> > > controls in Drill. As Drill was originally built around the Hadoop
> > > 

> > > ecosystem, the Hadoop based connections make use of user-impersonation
> > > 

> > > for
> > > 

> > > per user access controls. However, a rather glaring deficiency is the
> > > 

> > > lack
> > > 

> > > of per-user access controls for connections like JDBC, Mongo, Splunk etc.
> > > 

> > > Recently when I was working on OAuth pull request, it occurred to me that
> > > 

> > > we might be able to slightly extend the credential provider interface to
> > > 

> > > allow for per-user credentials. Here's what I was thinking...
> > > 

> > > A bit of background: The credential provider interface is really an
> > > 

> > > abstraction for a HashMap. Here's my proposal The cred provider
> > > 

> > > interface would store two hashmaps, one for per-user creds and one for
> > > 

> > > global creds. When a user is authenticated to Drill, when they create a
> > > 

> > > storage plugin connection, the credential provider would associate the
> > > 

> > > creds with their Drill username. The storage plugins that use credential
> > > 

> > > provider would thus get per-user credentials.
> > > 

> > > If users did not want per-user credentials, they could simply use direct
> > > 

> > > credentials OR use specify that in the credential provider classes. What
> > > 

> > > do you think?
> > > 

> > > Best,
> > > 

> > > -- C
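
The "two hashmaps" idea in the proposal above can be sketched very simply: one map of shared credentials, one map keyed by Drill username, with per-user entries taking precedence. This is only an illustrative sketch with hypothetical names, not Drill's actual credential provider interface.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of a credential provider backed by two maps:
// global (shared) credentials plus per-user overrides.
public class TwoLevelCredentialProvider {
  private final Map<String, String> globalCreds = new HashMap<>();
  // username -> (credential key -> value)
  private final Map<String, Map<String, String>> perUserCreds = new HashMap<>();

  public void setGlobal(String key, String value) {
    globalCreds.put(key, value);
  }

  public void setForUser(String username, String key, String value) {
    perUserCreds.computeIfAbsent(username, u -> new HashMap<>()).put(key, value);
  }

  // Per-user credentials win; fall back to the shared/global ones.
  public Optional<String> resolve(String username, String key) {
    Map<String, String> userMap = perUserCreds.get(username);
    if (userMap != null && userMap.containsKey(key)) {
      return Optional.of(userMap.get(key));
    }
    return Optional.ofNullable(globalCreds.get(key));
  }

  public static void main(String[] args) {
    TwoLevelCredentialProvider provider = new TwoLevelCredentialProvider();
    provider.setGlobal("splunk.password", "shared-secret");
    provider.setForUser("alice", "splunk.password", "alice-secret");
    System.out.println(provider.resolve("alice", "splunk.password").get()); // alice-secret
    System.out.println(provider.resolve("bob", "splunk.password").get());   // shared-secret
  }
}
```

Storage plugins that already go through a credential provider would then pick up per-user credentials transparently, while plugins configured with direct credentials would behave as before.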



Re: [DISCUSS] Restarting the Arrow Conversation

2022-01-03 Thread Z0ltrix
Hi Charles, Ted, and the others here,

it is very interesting to hear about the evolution of Drill, Dremio and Arrow in that context, and thank you, Charles, for restarting that discussion.

I think, and James mentioned this in the PR as well, that Drill could benefit from the continuous progress the Arrow project has made since its separation from Drill. The Arrow community seems to be large, so I assume the improvements, new features, etc. will keep coming, but I don't have enough experience with Drill internals to judge how much refactoring this would require.

In addition to that, I'm not aware of Arrow's current roadmap and whether it would fit Drill's. Maybe Arrow will go in a different direction than Drill, and what should we do then, if Drill is bound to Arrow?

On the other hand, Arrow could help Drill towards wider adoption with clients like pyarrow, Arrow Flight, various other programming languages, etc., and (I'm not sure about this) maybe there is a performance benefit if Drill uses Arrow to read data from HDFS (for example), works with it during execution, and hands the vectors directly to my Python (for example) program via Arrow Flight, so that I can play around with Pandas, etc.

Just some thoughts I have, since I have used Dremio with pyarrow and Drill with ODBC connections.

Regards
Christian
-------- Original Message --------
On 3 Jan. 2022, 20:08, Charles Givre wrote:

>
>
>
> Thanks Ted for the perspective! I had always wished to be a "fly on the wall" 
> in those conversations. :-)
> \-- C
>
> > On Jan 3, 2022, at 11:00 AM, Charles Givre  wrote:
> >
> > Hello all,
> > There was a discussion in a recently closed PR \[1\] with a discussion 
> > between z0ltrix, James Turton and a few others about integrating Drill with 
> > Apache Arrow and wondering why it was never done. I'd like to share my 
> > perspective as someone who has been around Drill for some time but also as 
> > someone who never worked for MapR or Dremio. This just represents my 
> > understanding of events as an outsider, and I could be wrong about some or 
> > all of this. Please forgive (or correct) any inaccuracies.
> >
> > When I first learned of Arrow and the idea of integrating Arrow with Drill, 
> > the thing that interested me the most was the ability to move data between 
> > platforms without having to serialize/deserialize the data. From my 
> > understanding, MapR did some research and didn't find a significant 
> > performance advantage and hence didn't really pursue the integration. The 
> > other side of it was that it would require a significant amount of work to 
> > refactor major parts of Drill.
> >
> > I don't know the internal politics, but this was one of the major points of 
> > diversion between Dremio and Drill.
> >
> > With that said, there was a renewed discussion on the list \[2\] where Paul 
> > Rogers proposed what he described as a "Crude but Effective" approach to an 
> > Arrow integration.
> >
> > This is in the email link but here was a part of Paul's email:
> >
> >> Charles, just brainstorming a bit, I think the easiest way to start is to 
> >> create a simple, stand-alone server that speaks Arrow to the client, and 
> >> uses the native Drill client to speak to Drill. The native Drill client 
> >> exposes Drill value vectors. One trick would be to convert Drill vectors 
> >> to the Arrow format. I think that data vectors are the same format. 
> >> Possibly offset vectors. I think Arrow went its own way with null-value 
> >> (Drill's is-set) vectors. So, some conversion might be a no-op, others 
> >> might need to rewrite a vector. Good thing, this is purely at the vector 
> >> level, so would be easy to write. The next issue is the one that Parth has 
> >> long pointed out: Drill and Arrow each have their own memory allocators. 
> >> How could we share a data vector between the two? The simplest initial 
> >> solution is just to copy the data from Drill to Arrow. Slow, but 
> >> transparent to the client. A crude first-approximation of the development 
> >> steps:
> >>
> >> A crude first-approximation of the development steps:
> >> 1. Create the client shell server.
> >> 2. Implement the Arrow client protocol. Need some way to accept a query 
> >> and return batches of results.
> >> 3. Forward the query to Drill using the native Drill client.
> >> 4. As a first pass, copy vectors from Drill to Arrow and return them to 
> >> the client.
> >> 5. Then, solve that memory allocator problem to pass data without copying.
> >
> > 
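
Step 4 in Paul's list, copying vectors from Drill to Arrow, is mostly mechanical once the layouts are known. One concrete difference he mentions is the null tracking: Drill keeps a one-byte-per-value is-set vector, while Arrow packs validity into a bitmap. The following is only a toy illustration of that conversion with stand-in types; it uses no real Drill or Arrow classes.

```java
import java.util.Arrays;

// Toy illustration of converting Drill-style null tracking (one byte per
// value, 1 = set) into an Arrow-style packed validity bitmap (one bit per
// value, little-endian bit order). Stand-in types only.
public class VectorCopy {

  static byte[] isSetToValidityBitmap(byte[] isSet) {
    byte[] bitmap = new byte[(isSet.length + 7) / 8];
    for (int i = 0; i < isSet.length; i++) {
      if (isSet[i] != 0) {
        bitmap[i / 8] |= (byte) (1 << (i % 8));
      }
    }
    return bitmap;
  }

  public static void main(String[] args) {
    // values: [10, null, 30, 40, null]
    byte[] isSet = {1, 0, 1, 1, 0};
    byte[] bitmap = isSetToValidityBitmap(isSet);
    // bits 0, 2, 3 set -> 0b00001101 = 13
    System.out.println(Arrays.toString(bitmap)); // [13]
  }
}
```

Data buffers that already share a layout could in principle be handed over without copying, which is where the shared memory allocator problem from step 5 comes in.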

AW: Re: [LAZY VOTE] Drill 1.20 freeze delay

2021-12-11 Thread Z0ltrix
Hi James,

what do you think about DRILL-8055?
I finished the implementation, and if the review goes well, users will be able to use the newest Druid version with Drill from 1.20 on.
For one of my customers it would be great to include this, because we want to add Druid 0.22 to our cluster next year, and without this ticket we would not be able to use Drill in front of Druid.

Regards,
Christian

‐‐‐ Original Message ‐‐‐
James Turton wrote on Thursday, 9 December 2021 at 10:03:

> Apologies, I was aiming for next week Friday and had an off-by-one error. So 
> we've agreed to delay until 2021-12-17 and we're tracking the following.
> 

> Closed
> 

> DRILL-1282 Parquet v2 read+write
> 

> DRILL-7863 Phoenix storage
> 

> DRILL-7983 Get running/completed profiles from REST API
> 

> DRILL-8027 Iceberg format
> 

> DRILL-8009 JDBC isValid()
> 

> DRILL-8067 Web UI tries to fetch CSS from Google
> 

> DRILL-8069 Remove use of excel sheet getLastRowNum
> 

> Under review
> 

> DRILL-8015 MongoDB Metastore
> 

> DRILL-8028 PDF format
> 

> DRILL-8073 Add support for persistent table and storage aliases
> 

> Under development
> 

> DRILL-7978 Fixed width format
> 

> DRILL-8057 INFORMATION_SCHEMA filter push down is inefficient (feasibility 
> not yet clear)
> 

> DRILL-8061 Impersonation support for Phoenix plugin
> 

> On 2021/12/08 17:40, James Turton wrote:
> 

> > Dear dev community
> > 

> > Please reply if you *object* to us pushing out the freeze date by one week 
> > to 2021-12-16. The motivation to delay is to try to include more of the 
> > open PRs that we are tracking below, a number of which are essentially 
> > dev-complete but not yet over the line.
> > 

> > Closed
> > 

> > DRILL-1282 Parquet v2 read+write
> > 

> > DRILL-7863 Phoenix storage
> > 

> > DRILL-8027 Iceberg format
> > 

> > DRILL-8009 JDBC isValid()
> > 

> > Open
> > 

> > DRILL-7978 Fixed width format
> > 

> > DRILL-7983 Get running/completed profiles from REST API
> > 

> > DRILL-8015 MongoDB Metastore
> > 

> > DRILL-8028 PDF format
> > 

> > DRILL-8057 INFORMATION_SCHEMA filter push down is inefficient (feasibility 
> > not yet clear)
> > 

> > Thank you
> > 

> > James



AW: Re: [DISCUSS] Refactoring Drill's CSV (Text) Reader

2021-11-17 Thread Z0ltrix
I would appreciate such a change.

Each time I introduce Drill to users I start with a CSV example, and it is hard to explain why it has to be so difficult just to read a simple CSV file.

Datatype discovery would be cool, but it does not have the highest priority. Casting by users is fine as long as they have an intuitive way to query the strings.

‐‐‐ Original Message ‐‐‐

Ted Dunning wrote on Thursday, 18 November 2021 at 07:17:

> I think that these would be significant improvements.
> 

> The current behavior is pretty painful on average. Better defaults and just
> 

> a bit of deduction could pay off big. I even think that the presence of
> 

> headers might be pretty reliably inferred.
> 

> On Wed, Nov 17, 2021 at 4:31 PM Charles Givre cgi...@gmail.com wrote:
> 

> > Hello Drill Community,
> > 

> > I would like to put forward some thoughts I've had relating to the CSV
> > 

> > reader in Drill. I would like to propose a few changes which could
> > 

> > actually be breaking changes, so I wanted to see if there are any strongly
> > 

> > held opinions in the community. Here goes:
> > 

> > The Problems:
> > 

> > 1.  The default behavior for Drill is to leave the extractColumnHeaders
> > 

> > option as false. When a user queries a CSV file this way, the results 
> > are
> > 

> > returned in a list of columns called columns. Thus if a user wants the
> > 

> > first column, they would project columns[0]. I have never been a fan of
> > 

> > this behavior. Even though Drill ships with the csvh file extension 
> > which
> > 

> > enables the header extraction, this is not a commonly used file format.
> > 

> > Furthermore, the returned results (the column list) does not work well 
> > with
> > 

> > BI tools.
> > 

> > 2.  The CSV reader does not attempt to do any kind of data type discovery.
> > 

> > 

> > Proposed Changes:
> > 

> > The overall goal is to make it easier to query CSV data and also to make
> > 

> > the behavior more consistent across format plugins.
> > 

> > 1.  Change the default behavior and set the extractHeaders to true.
> > 2.  Other formats, like the excel reader, read tables directly into
> > 

> > columns. If the header is not known, Drill assigns a name of field_n. I
> > 

> > would propose replacing the `columns` array with a model similar to the
> > 

> > Excel reader.
> > 3.  Implement schema discovery (data types) with an allTextMode option
> > 

> > similar to the JSON reader. When the allTextMode is disabled, the CSV
> > 

> > reader would attempt to infer data types.
> > 

> > Since there are some breaking changes here, I'd like to ask if people have
> > 

> > any strong feelings on this topic or suggestions.
> > 

> > Thanks!,
> > 

> > -- C
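
Point 3 of the proposal above, inferring data types when allTextMode is disabled, usually follows a try-narrowest-first approach per value. The sketch below illustrates one way that could look; it is only an assumption about the approach, not Drill's actual reader code, and a real reader would also widen the guessed type across rows.

```java
// Rough per-value type inference sketch for a CSV reader: try the
// narrowest type first and fall back to VARCHAR. Not Drill's actual code.
public class CsvTypeGuess {

  enum GuessedType { BIGINT, DOUBLE, BOOLEAN, VARCHAR }

  static GuessedType guess(String value) {
    if (value.equalsIgnoreCase("true") || value.equalsIgnoreCase("false")) {
      return GuessedType.BOOLEAN;
    }
    try {
      Long.parseLong(value);
      return GuessedType.BIGINT;
    } catch (NumberFormatException ignored) { }
    try {
      Double.parseDouble(value);
      return GuessedType.DOUBLE;
    } catch (NumberFormatException ignored) { }
    return GuessedType.VARCHAR;
  }

  public static void main(String[] args) {
    System.out.println(guess("42"));     // BIGINT
    System.out.println(guess("3.14"));   // DOUBLE
    System.out.println(guess("true"));   // BOOLEAN
    System.out.println(guess("hello"));  // VARCHAR
  }
}
```

With allTextMode enabled, the reader would skip this step entirely and return every column as VARCHAR, matching the JSON reader's behavior.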
