[jira] [Commented] (DRILL-8353) Format plugin for Delta Lake
[ https://issues.apache.org/jira/browse/DRILL-8353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17735764#comment-17735764 ] ASF GitHub Bot commented on DRILL-8353: --- kmatt commented on PR #2702: URL: https://github.com/apache/drill/pull/2702#issuecomment-1600977689 #2810, #2809 > Format plugin for Delta Lake > > > Key: DRILL-8353 > URL: https://issues.apache.org/jira/browse/DRILL-8353 > Project: Apache Drill > Issue Type: New Feature >Affects Versions: 1.20.2 >Reporter: Vova Vysotskyi >Assignee: Vova Vysotskyi >Priority: Major > Fix For: 1.21.0 > > > Implement format plugin for Delta Lake. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8353) Format plugin for Delta Lake
[ https://issues.apache.org/jira/browse/DRILL-8353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17735428#comment-17735428 ] ASF GitHub Bot commented on DRILL-8353: --- cgivre commented on PR #2702: URL: https://github.com/apache/drill/pull/2702#issuecomment-1599387786 @kmatt A github issue is good! Please be sure to tag @vvysotskyi in it as he was the original developer of this plugin.
[jira] [Commented] (DRILL-8353) Format plugin for Delta Lake
[ https://issues.apache.org/jira/browse/DRILL-8353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17735427#comment-17735427 ] ASF GitHub Bot commented on DRILL-8353: --- kmatt commented on PR #2702: URL: https://github.com/apache/drill/pull/2702#issuecomment-1599386040 @vvysotskyi https://issues.apache.org/jira/browse/DRILL-8442 Should this be a GitHub issue, or is Jira the correct place for it?
[jira] [Commented] (DRILL-8353) Format plugin for Delta Lake
[ https://issues.apache.org/jira/browse/DRILL-8353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17642681#comment-17642681 ] ASF GitHub Bot commented on DRILL-8353: --- kmatt commented on PR #2702: URL: https://github.com/apache/drill/pull/2702#issuecomment-1335803778 @cgivre @vvysotskyi Thanks, I missed the "will be" clause ;)
[jira] [Commented] (DRILL-8353) Format plugin for Delta Lake
[ https://issues.apache.org/jira/browse/DRILL-8353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17642591#comment-17642591 ] ASF GitHub Bot commented on DRILL-8353: --- cgivre commented on PR #2702: URL: https://github.com/apache/drill/pull/2702#issuecomment-1335460790 @kmatt This hasn't been implemented yet. That's why the query doesn't yet work. @vvysotskyi is working on that. :-)
[jira] [Commented] (DRILL-8353) Format plugin for Delta Lake
[ https://issues.apache.org/jira/browse/DRILL-8353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17642588#comment-17642588 ] ASF GitHub Bot commented on DRILL-8353: --- kmatt commented on PR #2702: URL: https://github.com/apache/drill/pull/2702#issuecomment-1335452201

The version function seems not to parse:

```
apache drill (dfs.delta)> select count(*) from table(dfs.delta.`delta_table`(type => 'delta'));
++
| EXPR$0 |
++
| 20 |
++
1 row selected (0.157 seconds)
apache drill (dfs.delta)> SELECT *
2..semicolon> FROM table(dfs.delta.`delta_table`(type => 'delta', version => 0));
Error: VALIDATION ERROR: From line 2, column 22 to line 2, column 75: No match found for function signature delta_table(type => , version => )
```
[jira] [Commented] (DRILL-8353) Format plugin for Delta Lake
[ https://issues.apache.org/jira/browse/DRILL-8353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17642321#comment-17642321 ] ASF GitHub Bot commented on DRILL-8353: --- vvysotskyi commented on PR #2702: URL: https://github.com/apache/drill/pull/2702#issuecomment-1334839612

Hi @kmatt, no, it is not supported yet, but will be added in the near future. The version will be specified using the table function. Here is the example query for it:

```sql
SELECT *
FROM table(dfs.delta.`/tmp/delta-table`(type => 'delta', version => 0));
```
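The table-function form described above can be assembled programmatically. The following is only a minimal sketch: `DeltaVersionQuery` and `versionedSelect` are hypothetical names, not part of Drill, and the `version` parameter is assumed to behave as proposed in the comment. Elsewhere in this thread it is reported as not yet implemented, so the generated query may fail validation on builds without it.

```java
// Hypothetical helper, not part of Drill: builds the table-function query
// shape proposed in the comment above.
final class DeltaVersionQuery {

  private DeltaVersionQuery() {
  }

  static String versionedSelect(String workspace, String tablePath, long version) {
    if (version < 0) {
      throw new IllegalArgumentException("version must be non-negative");
    }
    // Assumption: this sketch does not attempt to escape backticks in the path.
    if (tablePath.indexOf('`') >= 0) {
      throw new IllegalArgumentException("backticks in the table path are not handled by this sketch");
    }
    return "SELECT * FROM table(" + workspace + ".`" + tablePath
        + "`(type => 'delta', version => " + version + "))";
  }
}
```

Calling `DeltaVersionQuery.versionedSelect("dfs.delta", "/tmp/delta-table", 0)` yields the same statement as the example query, without the trailing semicolon.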
[jira] [Commented] (DRILL-8353) Format plugin for Delta Lake
[ https://issues.apache.org/jira/browse/DRILL-8353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17642248#comment-17642248 ] ASF GitHub Bot commented on DRILL-8353: --- kmatt commented on PR #2702: URL: https://github.com/apache/drill/pull/2702#issuecomment-1334708491 @vvysotskyi Does this support VERSION AS OF queries? https://docs.delta.io/latest/quick-start.html#read-older-versions-of-data-using-time-travel Ex: ``SELECT * FROM dfs.delta.`/tmp/delta-table` VERSION AS OF 0;``
[jira] [Commented] (DRILL-8353) Format plugin for Delta Lake
[ https://issues.apache.org/jira/browse/DRILL-8353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17641049#comment-17641049 ] ASF GitHub Bot commented on DRILL-8353: --- kmatt commented on PR #2702: URL: https://github.com/apache/drill/pull/2702#issuecomment-1331605371

On Windows 10 `git clone` fails due to a path length in this patch. Repo clones successfully on Debian 11.

```
git clone https://github.com/apache/drill.git
Cloning into 'drill'...
remote: Enumerating objects: 156537, done.
remote: Counting objects: 100% (1323/1323), done.
remote: Compressing objects: 100% (723/723), done.
remote: Total 156537 (delta 322), reused 1119 (delta 218), pack-reused 155214
Receiving objects: 100% (156537/156537), 65.97 MiB | 11.24 MiB/s, done.
Resolving deltas: 100% (79075/79075), done.
fatal: cannot create directory at 'contrib/format-deltalake/src/test/resources/data-reader-partition-values/as_int=0/as_long=0/as_byte=0/as_short=0/as_boolean=true/as_float=0.0/as_double=0.0/as_string=0/as_string_lit_null=null/as_date=2021-09-08/as_timestamp=2021-09-08 11%3A11%3A11': Filename too long
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry with 'git restore --source=HEAD :/'
```
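To see why the checkout above trips the classic Windows path limit, the reported directory can be checked against `MAX_PATH`. This is a rough sketch only: `WindowsPathCheck`, `exceedsMaxPath`, and the checkout directory in the usage note are hypothetical, and the check is an approximation of the traditional 260-character limit rather than of how Git or Windows actually validates paths.

```java
// Hypothetical helper: approximates the classic Windows MAX_PATH check.
final class WindowsPathCheck {

  // MAX_PATH on Windows is 260 characters, including the terminating NUL.
  static final int MAX_PATH = 260;

  private WindowsPathCheck() {
  }

  static boolean exceedsMaxPath(String checkoutDir, String relativePath) {
    // One extra character for the separator joining the two parts,
    // one for the terminating NUL.
    int needed = checkoutDir.length() + 1 + relativePath.length() + 1;
    return needed > MAX_PATH;
  }
}
```

With an assumed checkout directory such as `C:\Users\dev\projects\drill`, the partition directory from the error message exceeds the limit, while a short path such as `README.md` does not. Git for Windows also has a `core.longpaths` setting that is commonly used for such checkouts, though the thread does not say whether it was tried here.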
[jira] [Commented] (DRILL-8353) Format plugin for Delta Lake
[ https://issues.apache.org/jira/browse/DRILL-8353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17636733#comment-17636733 ] ASF GitHub Bot commented on DRILL-8353: --- cgivre merged PR #2702: URL: https://github.com/apache/drill/pull/2702
[jira] [Commented] (DRILL-8353) Format plugin for Delta Lake
[ https://issues.apache.org/jira/browse/DRILL-8353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17636136#comment-17636136 ] ASF GitHub Bot commented on DRILL-8353: --- vvysotskyi commented on code in PR #2702: URL: https://github.com/apache/drill/pull/2702#discussion_r1027065550

## contrib/format-deltalake/src/main/java/org/apache/drill/exec/store/delta/plan/DrillExprToDeltaTranslator.java: ##

@@ -0,0 +1,246 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.delta.plan;
+
+import io.delta.standalone.expressions.And;
+import io.delta.standalone.expressions.EqualTo;
+import io.delta.standalone.expressions.Expression;
+import io.delta.standalone.expressions.GreaterThan;
+import io.delta.standalone.expressions.GreaterThanOrEqual;
+import io.delta.standalone.expressions.IsNotNull;
+import io.delta.standalone.expressions.IsNull;
+import io.delta.standalone.expressions.LessThan;
+import io.delta.standalone.expressions.LessThanOrEqual;
+import io.delta.standalone.expressions.Literal;
+import io.delta.standalone.expressions.Not;
+import io.delta.standalone.expressions.Or;
+import io.delta.standalone.expressions.Predicate;
+import io.delta.standalone.types.StructType;
+import org.apache.drill.common.FunctionNames;
+import org.apache.drill.common.expression.FunctionCall;
+import org.apache.drill.common.expression.LogicalExpression;
+import org.apache.drill.common.expression.PathSegment;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.expression.ValueExpressions;
+import org.apache.drill.common.expression.visitors.AbstractExprVisitor;
+
+public class DrillExprToDeltaTranslator extends AbstractExprVisitor {
+
+  private final StructType structType;
+
+  public DrillExprToDeltaTranslator(StructType structType) {
+    this.structType = structType;
+  }
+
+  @Override
+  public Expression visitFunctionCall(FunctionCall call, Void value) {
+    try {
+      return visitFunctionCall(call);
+    } catch (Exception e) {
+      return null;
+    }
+  }
+
+  private Predicate visitFunctionCall(FunctionCall call) {
+    switch (call.getName()) {
+      case FunctionNames.AND: {
+        Expression left = call.arg(0).accept(this, null);
+        Expression right = call.arg(1).accept(this, null);
+        if (left != null && right != null) {
+          return new And(left, right);
+        }
+        return null;
+      }
+      case FunctionNames.OR: {
+        Expression left = call.arg(0).accept(this, null);
+        Expression right = call.arg(1).accept(this, null);
+        if (left != null && right != null) {
+          return new Or(left, right);
+        }
+        return null;
+      }
+      case FunctionNames.NOT: {
+        Expression expression = call.arg(0).accept(this, null);
+        if (expression != null) {
+          return new Not(expression);
+        }
+        return null;
+      }
+      case FunctionNames.IS_NULL: {
+        LogicalExpression arg = call.arg(0);
+        if (arg instanceof SchemaPath) {
+          String name = getPath((SchemaPath) arg);
+          return new IsNull(structType.column(name));
+        }
+        return null;
+      }
+      case FunctionNames.IS_NOT_NULL: {
+        LogicalExpression arg = call.arg(0);
+        if (arg instanceof SchemaPath) {
+          String name = getPath((SchemaPath) arg);
+          return new IsNotNull(structType.column(name));
+        }
+        return null;
+      }
+      case FunctionNames.LT: {
+        LogicalExpression nameRef = call.arg(0);
+        Expression expression = call.arg(1).accept(this, null);
+        if (nameRef instanceof SchemaPath) {
+          String name = getPath((SchemaPath) nameRef);
+          return new LessThan(structType.column(name), expression);
+        }
+        return null;
+      }
+      case FunctionNames.LE: {
+        LogicalExpression nameRef = call.arg(0);
+        Expression expression = call.arg(1).accept(this, null);
+        if (nameRef instanceof SchemaPath) {
+          String name = getPath((SchemaPath) nameRef);
+          return new LessThanOrEqual(structType.
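The translator quoted above follows a common filter-pushdown pattern: walk the expression tree and return `null` whenever any part cannot be expressed in the target predicate language, so that the whole filter falls back to normal evaluation. The sketch below illustrates that pattern only. It uses a toy expression tree and renders predicates as strings; `FilterPushdownSketch`, `Column`, `Const`, and `Call` are hypothetical stand-ins, not Drill or Delta Standalone classes, and the handling of constants is simplified compared with the quoted code.

```java
// Illustrative sketch only: a toy expression tree, not Drill's LogicalExpression API.
final class FilterPushdownSketch {

  interface Expr {
  }

  static final class Column implements Expr {
    final String name;

    Column(String name) {
      this.name = name;
    }
  }

  static final class Const implements Expr {
    final long value;

    Const(long value) {
      this.value = value;
    }
  }

  static final class Call implements Expr {
    final String fn;
    final Expr[] args;

    Call(String fn, Expr... args) {
      this.fn = fn;
      this.args = args;
    }
  }

  // Returns the predicate rendered as text, or null when any part is unsupported.
  static String translate(Expr e) {
    if (!(e instanceof Call)) {
      return null;
    }
    Call call = (Call) e;
    switch (call.fn) {
      case "and":
      case "or": {
        String left = translate(call.args[0]);
        String right = translate(call.args[1]);
        // As in the quoted code, both sides must translate or nothing is pushed down.
        if (left == null || right == null) {
          return null;
        }
        return "(" + left + ("and".equals(call.fn) ? " AND " : " OR ") + right + ")";
      }
      case "not": {
        String inner = translate(call.args[0]);
        return inner == null ? null : "NOT " + inner;
      }
      case "is_null":
        return call.args[0] instanceof Column
            ? ((Column) call.args[0]).name + " IS NULL"
            : null;
      case "lt":
        if (call.args[0] instanceof Column && call.args[1] instanceof Const) {
          return ((Column) call.args[0]).name + " < " + ((Const) call.args[1]).value;
        }
        return null;
      default:
        // Unknown functions are not pushed down.
        return null;
    }
  }
}
```

In this sketch a conjunction of a supported comparison and a supported null check translates as a whole, while a conjunction containing any unsupported function returns `null` for the entire expression, mirroring the conservative `AND` handling in the quoted diff.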
[jira] [Commented] (DRILL-8353) Format plugin for Delta Lake
[ https://issues.apache.org/jira/browse/DRILL-8353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17634334#comment-17634334 ] ASF GitHub Bot commented on DRILL-8353: --- jnturton commented on code in PR #2702: URL: https://github.com/apache/drill/pull/2702#discussion_r1017953161

## contrib/format-deltalake/README.md: ##

@@ -0,0 +1,36 @@
+# Delta Lake format plugin
+
+This format plugin enabled Drill to query Delta Lake tables.

Review Comment:

```suggestion
This format plugin enables Drill to query Delta Lake tables.
```

## contrib/format-deltalake/README.md: ##

@@ -0,0 +1,36 @@
+# Delta Lake format plugin
+
+This format plugin enabled Drill to query Delta Lake tables.
+
+## Supported optimizations and features
+
+### Project pushdown
+
+This format plugin supports project and filter pushdown optimizations.
+
+For the case of project pushdown, only columns specified in the query will be read, even they are nested columns.

Review Comment:

```suggestion
For the case of project pushdown, only columns specified in the query will be read, even when they are nested columns.
```

## contrib/format-deltalake/src/test/java/org/apache/drill/exec/store/delta/DeltaQueriesTest.java: ##

@@ -0,0 +1,195 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.delta;
+
+import org.apache.drill.common.logical.FormatPluginConfig;
+import org.apache.drill.common.logical.security.PlainCredentialsProvider;
+import org.apache.drill.exec.store.StoragePluginRegistry;
+import org.apache.drill.exec.store.delta.format.DeltaFormatPluginConfig;
+import org.apache.drill.exec.store.dfs.FileSystemConfig;
+import org.apache.drill.test.ClusterFixture;
+import org.apache.drill.test.ClusterTest;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+import java.math.BigDecimal;
+import java.nio.file.Paths;
+import java.util.HashMap;
+import java.util.Map;
+
+import static org.apache.drill.exec.util.StoragePluginTestUtils.DFS_PLUGIN_NAME;
+import static org.junit.Assert.assertEquals;
+
+public class DeltaQueriesTest extends ClusterTest {
+
+  @BeforeClass
+  public static void setUpBeforeClass() throws Exception {
+    startCluster(ClusterFixture.builder(dirTestWatcher));
+
+    StoragePluginRegistry pluginRegistry = cluster.drillbit().getContext().getStorage();
+    FileSystemConfig pluginConfig = (FileSystemConfig) pluginRegistry.getPlugin(DFS_PLUGIN_NAME).getConfig();
+    Map formats = new HashMap<>(pluginConfig.getFormats());
+    formats.put("delta", new DeltaFormatPluginConfig());
+    FileSystemConfig newPluginConfig = new FileSystemConfig(
+      pluginConfig.getConnection(),
+      pluginConfig.getConfig(),
+      pluginConfig.getWorkspaces(),
+      formats,
+      PlainCredentialsProvider.EMPTY_CREDENTIALS_PROVIDER);
+    newPluginConfig.setEnabled(pluginConfig.isEnabled());
+    pluginRegistry.put(DFS_PLUGIN_NAME, newPluginConfig);
+
+    dirTestWatcher.copyResourceToRoot(Paths.get("data-reader-primitives"));
+    dirTestWatcher.copyResourceToRoot(Paths.get("data-reader-partition-values"));
+    dirTestWatcher.copyResourceToRoot(Paths.get("data-reader-nested-struct"));
+  }
+
+  @Test
+  public void testSerDe() throws Exception {
+    String plan = queryBuilder().sql("select * from dfs.`data-reader-partition-values`").explainJson();
+    long count = queryBuilder().physical(plan).run().recordCount();
+    assertEquals(3, count);
+  }
+
+  @Test
+  public void testAllPrimitives() throws Exception {
+    testBuilder()
+      .sqlQuery("select * from dfs.`data-reader-primitives`")
+      .ordered()
+      .baselineColumns("as_int", "as_long", "as_byte", "as_short", "as_boolean", "as_float",
+        "as_double", "as_string", "as_binary", "as_big_decimal")
+      .baselineValues(null, null, null, null, null, null, null, null, null, null)
+      .baselineValues(0, 0L, 0, 0, true, 0.0f, 0.0, "0", new byte[]{0, 0}, BigDecimal.valueOf(0))
+      .baselineValues(1, 1L, 1, 1, false, 1.0f, 1.0, "1", new byte[]{1, 1}, BigDecimal.valueOf(1))
+      .baselineValues(2, 2L, 2, 2, true, 2
[jira] [Commented] (DRILL-8353) Format plugin for Delta Lake
[ https://issues.apache.org/jira/browse/DRILL-8353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630560#comment-17630560 ] ASF GitHub Bot commented on DRILL-8353: --- vvysotskyi opened a new pull request, #2702: URL: https://github.com/apache/drill/pull/2702

# [DRILL-8353](https://issues.apache.org/jira/browse/DRILL-8353): Format plugin for Delta Lake

## Description

This pull request adds support for reading delta lake tables.

## Documentation

See README.md

## Testing

Added unit tests.