[GitHub] drill pull request #796: DRILL-5365: DrillFileSystem setConf in constructor....
Github user chunhui-shi commented on a diff in the pull request: https://github.com/apache/drill/pull/796#discussion_r115107663 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/DrillFileSystem.java --- @@ -89,22 +89,36 @@ public DrillFileSystem(Configuration fsConf) throws IOException { } public DrillFileSystem(Configuration fsConf, OperatorStats operatorStats) throws IOException { -this.underlyingFs = FileSystem.get(fsConf); +this(fsConf, URI.create(fsConf.getRaw(FS_DEFAULT_NAME_KEY)), operatorStats); + } + + public DrillFileSystem(Configuration fsConf, URI Uri, OperatorStats operatorStats) throws IOException { --- End diff -- Yes, this can be removed --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill pull request #824: DRILL-3867: Store relative paths in metadata file
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/824#discussion_r115105784

--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestParquetMetadataCache.java --- @@ -398,6 +398,23 @@ public void testDrill4877() throws Exception { } + @Test // DRILL-3867 + public void testMoveCache() throws Exception { +String tableName = "nation_move"; +String newTableName = "nation_moved"; +test("use dfs_test.tmp"); +test("create table `%s/t1` as select * from cp.`tpch/nation.parquet`", tableName); +test("create table `%s/t2` as select * from cp.`tpch/nation.parquet`", tableName); +test(String.format("refresh table metadata %s", tableName)); +checkForMetadataFile(tableName); +File srcFile = new File(getDfsTestTmpSchemaLocation(), tableName); +File dstFile = new File(getDfsTestTmpSchemaLocation(), newTableName); +FileUtils.moveDirectory(srcFile, dstFile); +Assert.assertFalse("Cache file was not moved successfully", srcFile.exists()); +int rowCount = testSql(String.format("select * from %s", newTableName)); +Assert.assertEquals(50, rowCount); + } +

--- End diff --

The tests here don't verify opening a version 1.10 metadata file with this 1.11 change. What happens? Can we read old files?
[GitHub] drill pull request #824: DRILL-3867: Store relative paths in metadata file
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/824#discussion_r115105670

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java --- @@ -1029,14 +1099,14 @@ public ParquetTableMetadata_v2(String drillVersion) { } public ParquetTableMetadata_v2(ParquetTableMetadataBase parquetTable, -List files, List directories, String drillVersion) { +List files, List directories, String drillVersion) {

--- End diff --

If this is left as-is we need a new metadata file version. See above.
[GitHub] drill pull request #824: DRILL-3867: Store relative paths in metadata file
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/824#discussion_r115104954

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java --- @@ -526,6 +534,48 @@ private void writeFile(ParquetTableMetadataDirs parquetTableMetadataDirs, Path p } /** + * Serializer for ParquetPath. Writes the path relative to the root path + */

--- End diff --

Why compute the relative path during serialization? A more common approach is to store the relative paths in our internal data structures and serialize that relative value. Doing things this way is awkward: we need to define a serializer and deserializer unnecessarily. Also, see the later comment on backward compatibility.
[GitHub] drill pull request #824: DRILL-3867: Store relative paths in metadata file
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/824#discussion_r115105413

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java --- @@ -526,6 +534,48 @@ private void writeFile(ParquetTableMetadataDirs parquetTableMetadataDirs, Path p } /** + * Serializer for ParquetPath. Writes the path relative to the root path + */ + private static class ParquetPathSerializer extends StdSerializer { +private final String rootPath; + +ParquetPathSerializer(String rootPath) { + super(ParquetPath.class); + this.rootPath = rootPath; +} + +@Override +public void serialize(ParquetPath parquetPath, JsonGenerator jsonGenerator, SerializerProvider serializerProvider) throws IOException, JsonGenerationException { + Preconditions.checkState(parquetPath.getFullPath().startsWith(rootPath), String.format("Path %s is not a subpath of %s", parquetPath.getFullPath(), rootPath)); + String relativePath = parquetPath.getFullPath().replaceFirst(rootPath, "");

--- End diff --

Java defines a Path abstraction that will compute relative paths. An [old fashioned way](http://stackoverflow.com/questions/204784/how-to-construct-a-relative-path-in-java-from-two-absolute-paths-or-urls):

```
String relative = new File(base).toURI().relativize(new File(path).toURI()).getPath();
```

From the same post, the Java 7 way:

```
Path pathAbsolute = Paths.get("/var/data/stuff/xyz.dat");
Path pathBase = Paths.get("/var/data");
Path pathRelative = pathBase.relativize(pathAbsolute);
```

Hadoop's Path may have something similar.
[GitHub] drill pull request #824: DRILL-3867: Store relative paths in metadata file
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/824#discussion_r115105622

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java --- @@ -680,7 +731,7 @@ private boolean tableModified(List directories, Path metaFilePath, } public static abstract class ParquetFileMetadata { -@JsonIgnore public abstract String getPath(); +@JsonIgnore public abstract ParquetPath getParquetPath();

--- End diff --

Doing this changes the on-disk format for the metadata file, doesn't it? If we do that, we need to introduce a new file version. Since metadata is expensive, we'd have to be able to read the existing file format. I don't see code for any of that.

To address the issue, we can instead leave the field as a string. Treat the string as either relative or absolute. This should be easy to detect: "/this/is/absolute", "but/this/is/relative". Then, create a method to set the paths. When setting paths, convert them to relative. When retrieving them, give a base directory. Convert relative (new) paths to absolute, leave (old) absolute paths unchanged. This is exactly how browsers handle URLs, OS's handle paths and so on. Classic approach.
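The "store relative, resolve on read" scheme described above can be sketched as follows. This is only an illustration of the classic approach, not Drill's actual Metadata code; the class and method names are hypothetical:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper showing the backward-compatible path scheme:
// new metadata stores relative paths, old metadata keeps absolute ones.
public class MetadataPathCompat {

  // When setting a path, strip the base so only the relative part is stored.
  public static String toStored(String base, String fullPath) {
    Path p = Paths.get(fullPath);
    return p.isAbsolute() ? Paths.get(base).relativize(p).toString() : fullPath;
  }

  // When reading, a relative (new-format) path is resolved against the base;
  // an absolute (old-format) path is returned unchanged.
  public static String toAbsolute(String base, String stored) {
    Path p = Paths.get(stored);
    return p.isAbsolute() ? stored : Paths.get(base).resolve(p).toString();
  }

  public static void main(String[] args) {
    String base = "/var/data";
    String stored = toStored(base, "/var/data/stuff/xyz.dat");
    System.out.println(stored);                                 // relative form
    System.out.println(toAbsolute(base, stored));               // resolved back
    System.out.println(toAbsolute(base, "/old/abs/path.dat"));  // old absolute path kept
  }
}
```

Because absolute and relative strings are distinguishable, no new metadata file version is needed: old files round-trip through `toAbsolute` untouched.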
[GitHub] drill pull request #827: DRILL-5481: Persist profiles in-memory only with a ...
GitHub user kkhatua opened a pull request: https://github.com/apache/drill/pull/827

DRILL-5481: Persist profiles in-memory only with a max capacity

Refactored NoWriteLocalStore to EphemeralPersistentStore with the ability to maintain a max capacity. Updated default values, and added Zookeeper's property to ExecConstants.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kkhatua/drill DRILL-5481

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/827.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #827

commit ce9cb3a02cf8ad456827e41910639ece11ea4fc8
Author: Kunal Khatua
Date: 2017-05-05T23:37:30Z

DRILL-5481: Allow Drill to persist profiles in-memory only with a max capacity

Refactored NoWriteLocalStore to EphemeralPersistentStore with the ability to maintain a max capacity. Updated default values, and added Zookeeper's property to ExecConstants.
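An in-memory store with a max capacity, as the PR describes, can be sketched with a `LinkedHashMap` that evicts its eldest entry. This is an illustrative sketch only, not the EphemeralPersistentStore code from the PR; the class name is hypothetical:

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative capacity-bounded in-memory profile store (hypothetical class).
public class BoundedProfileStore<K, V> {
  private final Map<K, V> map;

  public BoundedProfileStore(final int maxCapacity) {
    // Insertion-ordered map; once size exceeds maxCapacity the oldest entry is dropped,
    // so only the most recent profiles survive for the life of the process.
    this.map = Collections.synchronizedMap(new LinkedHashMap<K, V>(16, 0.75f, false) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxCapacity;
      }
    });
  }

  public void put(K key, V value) { map.put(key, value); }
  public V get(K key)             { return map.get(key); }
  public int size()               { return map.size(); }
}
```

The eviction hook keeps the store bounded without any background cleanup thread, at the cost of losing the oldest profiles first.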
[jira] [Created] (DRILL-5482) Improve the error message when drill process does not have access to the spill directory
Rahul Challapalli created DRILL-5482: Summary: Improve the error message when drill process does not have access to the spill directory Key: DRILL-5482 URL: https://issues.apache.org/jira/browse/DRILL-5482 Project: Apache Drill Issue Type: Bug Components: Execution - Relational Operators Affects Versions: 1.10.0 Reporter: Rahul Challapalli Assignee: Paul Rogers git.commit.id.abbrev=1e0a14c When the drillbit process does not have write permissions to the spill directory, we get the below generic error message
{code}
Error: RESOURCE ERROR: External Sort encountered an error while spilling to disk
Fragment 0:0
[Error Id: 49addefc-2fb2-467c-a524-b09e7344e9f1 on qa-node190.qa.lab:31010] (state=,code=0)
{code}
We can certainly improve the error message so that the software becomes more self-serviceable. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
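One way to make such a failure self-explanatory is to validate the spill directory up front and report a message that names the directory and the process user. The following is a rough sketch of the idea, not Drill's actual external-sort code; the class and method names are hypothetical:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SpillDirCheck {

  // Returns null when the directory is usable, otherwise a human-readable reason
  // suitable for embedding in a RESOURCE ERROR message.
  public static String checkSpillDir(String dir) {
    Path p = Paths.get(dir);
    if (!Files.exists(p)) {
      return "Spill directory " + dir + " does not exist";
    }
    if (!Files.isDirectory(p)) {
      return "Spill path " + dir + " is not a directory";
    }
    if (!Files.isWritable(p)) {
      return "Drillbit process (user " + System.getProperty("user.name")
          + ") has no write permission on spill directory " + dir;
    }
    return null;
  }

  public static void main(String[] args) throws IOException {
    Path tmp = Files.createTempDirectory("spill");
    System.out.println(checkSpillDir(tmp.toString())); // usable: prints null
    System.out.println(checkSpillDir("/no/such/dir")); // explains what is wrong
  }
}
```

Running the check before the first spill attempt turns a generic "encountered an error while spilling to disk" into an actionable message.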
[GitHub] drill pull request #823: DRILL-5459: Extend physical operator test framework...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/823#discussion_r115103600

--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/unit/PhysicalOpUnitTestBase.java --- @@ -82,6 +81,8 @@ * Look! Doesn't extend BaseTestQuery!! */ public class PhysicalOpUnitTestBase extends ExecTest { + public static long INIT_ALLOCATION = 10_000_000l; + public static long MAX_ALLOCATION = 15_000_000L;

--- End diff --

Should we use the values already defined in `AbstractBase`? To do that, perhaps refactor the code from:

```
public abstract class AbstractBase implements PhysicalOperator {
  ...
  protected long initialAllocation = 1_000_000L;
  protected long maxAllocation = 10_000_000_000L;
```

To:

```
public abstract class AbstractBase implements PhysicalOperator {
  ...
  public static long INIT_ALLOCATION = 1_000_000L;
  public static long MAX_ALLOCATION = 10_000_000_000L;

  protected long initialAllocation = INIT_ALLOCATION;
  protected long maxAllocation = MAX_ALLOCATION;
```

Actually, I think I did that in my much-delayed external sort PR for unit testing...
[GitHub] drill pull request #823: DRILL-5459: Extend physical operator test framework...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/823#discussion_r115103872 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/unit/TestMiniPlan.java --- @@ -0,0 +1,200 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.drill.exec.physical.unit; + +import com.google.common.collect.Lists; +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.common.util.FileUtils; +import org.apache.drill.exec.physical.config.Filter; +import org.apache.drill.exec.physical.config.UnionAll; +import org.apache.drill.exec.record.BatchSchema; +import org.apache.drill.exec.record.RecordBatch; +import org.apache.drill.exec.store.dfs.DrillFileSystem; +import org.apache.drill.test.rowSet.SchemaBuilder; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.FileSystem; +import org.junit.BeforeClass; +import org.junit.Ignore; +import org.junit.Test; + +import java.util.Collections; +import java.util.List; + +public class TestMiniPlan extends MiniPlanUnitTestBase { + + protected static DrillFileSystem fs; + + @BeforeClass + public static void initFS() throws Exception { +Configuration conf = new Configuration(); +conf.set(FileSystem.FS_DEFAULT_NAME_KEY, "local"); +fs = new DrillFileSystem(conf); + } + + @Test + @Ignore("A bug in JsonRecordReader handling empty file")

--- End diff --

File a JIRA for that bug and put the JIRA entry here?
[GitHub] drill pull request #823: DRILL-5459: Extend physical operator test framework...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/823#discussion_r115103967 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/unit/TestMiniPlan.java --- @@ -0,0 +1,200 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.drill.exec.physical.unit; + +import com.google.common.collect.Lists; +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.common.util.FileUtils; +import org.apache.drill.exec.physical.config.Filter; +import org.apache.drill.exec.physical.config.UnionAll; +import org.apache.drill.exec.record.BatchSchema; +import org.apache.drill.exec.record.RecordBatch; +import org.apache.drill.exec.store.dfs.DrillFileSystem; +import org.apache.drill.test.rowSet.SchemaBuilder; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.FileSystem; +import org.junit.BeforeClass; +import org.junit.Ignore; +import org.junit.Test; + +import java.util.Collections; +import java.util.List; + +public class TestMiniPlan extends MiniPlanUnitTestBase { + + protected static DrillFileSystem fs; + + @BeforeClass + public static void initFS() throws Exception { +Configuration conf = new Configuration(); +conf.set(FileSystem.FS_DEFAULT_NAME_KEY, "local"); +fs = new DrillFileSystem(conf); + } + + @Test + @Ignore("A bug in JsonRecordReader handling empty file") + public void testEmptyInput() throws Exception { +String emptyFile = FileUtils.getResourceAsFile("/project/pushdown/empty.json").toURI().toString(); + +RecordBatch scanBatch = new JsonScanBuilder() +.fileSystem(fs) +.inputPaths(Lists.newArrayList(emptyFile)) +.build(); + +new MiniPlanTestBuilder() +.root(scanBatch) +.expectZeroBatch(true) +.go(); + } + + @Test + public void testSimpleParquetScan() throws Exception { +String file = FileUtils.getResourceAsFile("/tpchmulti/region/01.parquet").toURI().toString(); + +RecordBatch scanBatch = new ParquetScanBuilder() +.fileSystem(fs) +.columnsToRead("R_REGIONKEY") +.inputPaths(Lists.newArrayList(file)) +.build(); + +BatchSchema expectedSchema = new SchemaBuilder() +.add("R_REGIONKEY", TypeProtos.MinorType.BIGINT) +.build(); + +new MiniPlanTestBuilder() +.root(scanBatch) +.expectedSchema(expectedSchema) +.baselineValues(0L) 
+.baselineValues(1L) +.go(); + } + + @Test + public void testSimpleJson() throws Exception { +List jsonBatches = Lists.newArrayList( +"{\"a\":100}" +); + +RecordBatch scanBatch = new JsonScanBuilder() +.jsonBatches(jsonBatches) +.build(); + +BatchSchema expectedSchema = new SchemaBuilder() +.addNullable("a", TypeProtos.MinorType.BIGINT) +.build(); + +new MiniPlanTestBuilder() +.root(scanBatch) +.expectedSchema(expectedSchema) +.baselineValues(100L) +.go(); + } + + @Test + public void testUnionFilter() throws Exception { +List leftJsonBatches = Lists.newArrayList( +"[{\"a\": 5, \"b\" : 1 }]", +"[{\"a\": 5, \"b\" : 5},{\"a\": 3, \"b\" : 8}]", +"[{\"a\": 40, \"b\" : 3},{\"a\": 13, \"b\" : 100}]"); + +List rightJsonBatches = Lists.newArrayList( +"[{\"a\": 5, \"b\" : 10 }]", +"[{\"a\": 50, \"b\" : 100}]"); + +RecordBatch batch = new PopBuilder() +.physicalOperator(new UnionAll(Collections.EMPTY_LIST)) // Children list is provided through RecordBatch +.addInputAsChild() + .physicalOperator(new Filter(null, parseExpr("a=5"), 1.0f)) + .addJsonScanAsChild()
[GitHub] drill pull request #823: DRILL-5459: Extend physical operator test framework...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/823#discussion_r115102606

--- Diff: exec/java-exec/src/test/java/org/apache/drill/DrillTestWrapper.java --- @@ -307,19 +308,43 @@ public void close() throws Exception { } /** + * Iterate over batches, and combine the batches into a map, where key is schema path, and value is + * the list of column values across all the batches.

--- End diff --

Thanks for adding the Javadoc comments! Very helpful.
[GitHub] drill pull request #823: DRILL-5459: Extend physical operator test framework...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/823#discussion_r115103846 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/unit/TestMiniPlan.java --- @@ -0,0 +1,200 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.drill.exec.physical.unit; + +import com.google.common.collect.Lists; +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.common.util.FileUtils; +import org.apache.drill.exec.physical.config.Filter; +import org.apache.drill.exec.physical.config.UnionAll; +import org.apache.drill.exec.record.BatchSchema; +import org.apache.drill.exec.record.RecordBatch; +import org.apache.drill.exec.store.dfs.DrillFileSystem; +import org.apache.drill.test.rowSet.SchemaBuilder; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.FileSystem; +import org.junit.BeforeClass; +import org.junit.Ignore; +import org.junit.Test; + +import java.util.Collections; +import java.util.List; + +public class TestMiniPlan extends MiniPlanUnitTestBase { + + protected static DrillFileSystem fs; + + @BeforeClass + public static void initFS() throws Exception { +Configuration conf = new Configuration(); +conf.set(FileSystem.FS_DEFAULT_NAME_KEY, "local");

--- End diff --

I believe the modern way to specify the default file system is FileSystem.DEFAULT_FS, not "local".
[GitHub] drill pull request #823: DRILL-5459: Extend physical operator test framework...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/823#discussion_r115102970 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/unit/MiniPlanUnitTestBase.java --- @@ -0,0 +1,439 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.drill.exec.physical.unit; + +import com.google.common.base.Preconditions; +import com.google.common.collect.Lists; +import mockit.NonStrictExpectations; +import org.apache.drill.DrillTestWrapper; +import org.apache.drill.common.expression.SchemaPath; +import org.apache.drill.exec.physical.base.PhysicalOperator; +import org.apache.drill.exec.physical.impl.BatchCreator; +import org.apache.drill.exec.physical.impl.ScanBatch; +import org.apache.drill.exec.record.BatchSchema; +import org.apache.drill.exec.record.MaterializedField; +import org.apache.drill.exec.record.RecordBatch; +import org.apache.drill.exec.record.VectorAccessible; +import org.apache.drill.exec.rpc.NamedThreadFactory; +import org.apache.drill.exec.store.RecordReader; +import org.apache.drill.exec.store.dfs.DrillFileSystem; +import org.apache.drill.exec.store.parquet.ParquetDirectByteBufferAllocator; +import org.apache.drill.exec.store.parquet.ParquetReaderUtility; +import org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader; +import org.apache.drill.exec.util.TestUtilities; +import org.apache.hadoop.fs.Path; +import org.apache.parquet.hadoop.CodecFactory; +import org.apache.parquet.hadoop.ParquetFileReader; +import org.apache.parquet.hadoop.metadata.ParquetMetadata; + +import java.util.ArrayList; +import java.util.Collections; +import java.util.HashMap; +import java.util.Iterator; +import java.util.List; +import java.util.Map; +import java.util.concurrent.ExecutorService; +import java.util.concurrent.Executors; + +import static org.apache.drill.exec.physical.unit.TestMiniPlan.fs; + +/** + * A MiniPlanUnitTestBase extends PhysicalOpUnitTestBase, to construct MiniPlan (aka plan fragment). + * in the form of physical operator tree, and verify both the expected schema and output row results. + * Steps to construct a unit: + * 1. Call PopBuilder / ScanPopBuilder to construct the MiniPlan + * 2. 
Create a MiniPlanTestBuilder, and specify the expected schema and base line values, or if there + * is no batch expected. + */ + +public class MiniPlanUnitTestBase extends PhysicalOpUnitTestBase { + + private final ExecutorService scanExecutor = Executors.newFixedThreadPool(2, new NamedThreadFactory("scan-")); + + public class MiniPlanTestBuilder {

--- End diff --

Can this class be static so it can be used from other places rather than just subclasses of this class? Again, not critical for this PR, but a future improvement.
[GitHub] drill pull request #823: DRILL-5459: Extend physical operator test framework...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/823#discussion_r115102896 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/unit/MiniPlanUnitTestBase.java --- @@ -0,0 +1,439 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.drill.exec.physical.unit; + +import com.google.common.base.Preconditions; +import com.google.common.collect.Lists; +import mockit.NonStrictExpectations; +import org.apache.drill.DrillTestWrapper; +import org.apache.drill.common.expression.SchemaPath; +import org.apache.drill.exec.physical.base.PhysicalOperator; +import org.apache.drill.exec.physical.impl.BatchCreator; +import org.apache.drill.exec.physical.impl.ScanBatch; +import org.apache.drill.exec.record.BatchSchema; +import org.apache.drill.exec.record.MaterializedField; +import org.apache.drill.exec.record.RecordBatch; +import org.apache.drill.exec.record.VectorAccessible; +import org.apache.drill.exec.rpc.NamedThreadFactory; +import org.apache.drill.exec.store.RecordReader; +import org.apache.drill.exec.store.dfs.DrillFileSystem; +import org.apache.drill.exec.store.parquet.ParquetDirectByteBufferAllocator; +import org.apache.drill.exec.store.parquet.ParquetReaderUtility; +import org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader; +import org.apache.drill.exec.util.TestUtilities; +import org.apache.hadoop.fs.Path; +import org.apache.parquet.hadoop.CodecFactory; +import org.apache.parquet.hadoop.ParquetFileReader; +import org.apache.parquet.hadoop.metadata.ParquetMetadata; + +import java.util.ArrayList; +import java.util.Collections; +import java.util.HashMap; +import java.util.Iterator; +import java.util.List; +import java.util.Map; +import java.util.concurrent.ExecutorService; +import java.util.concurrent.Executors; + +import static org.apache.drill.exec.physical.unit.TestMiniPlan.fs; + +/** + * A MiniPlanUnitTestBase extends PhysicalOpUnitTestBase, to construct MiniPlan (aka plan fragment). + * in the form of physical operator tree, and verify both the expected schema and output row results. + * Steps to construct a unit: + * 1. Call PopBuilder / ScanPopBuilder to construct the MiniPlan + * 2. 
Create a MiniPlanTestBuilder, and specify the expected schema and base line values, or if there + * is no batch expected. + */ + +public class MiniPlanUnitTestBase extends PhysicalOpUnitTestBase {

--- End diff --

While there is nothing wrong with putting this stuff in a test base class, it is constraining. A test can have only one superclass. This makes it hard to create a hybrid test using different frameworks. Not necessary for this PR, but a future improvement would be to move this into a "fixture" that can be used in any test class.
[jira] [Created] (DRILL-5481) Allow Drill to persist profiles in-memory only with a max capacity
Kunal Khatua created DRILL-5481: --- Summary: Allow Drill to persist profiles in-memory only with a max capacity Key: DRILL-5481 URL: https://issues.apache.org/jira/browse/DRILL-5481 Project: Apache Drill Issue Type: Improvement Affects Versions: 1.10.0 Reporter: Kunal Khatua Assignee: Kunal Khatua To allow for fast persistence of profiles on a temporary basis (i.e. till the life of the Drillbit), an existing test class {{org.apache.drill.exec.testing.store.NoWriteLocalStore.java}} was refactored to {{org.apache.drill.exec.store.sys.store.EphemeralPersistentStore}} and given the ability to maintain a max capacity. This should allow query profiles to be available for as long as the Drillbit process' lifespan. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (DRILL-5480) Empty batch returning from HBase may cause SchemaChangeException or incorrect query result
Jinfeng Ni created DRILL-5480: - Summary: Empty batch returning from HBase may cause SchemaChangeException or incorrect query result Key: DRILL-5480 URL: https://issues.apache.org/jira/browse/DRILL-5480 Project: Apache Drill Issue Type: Bug Reporter: Jinfeng Ni The following repro was provided by [~haozhu]. 1. Create an HBase table with 4 regions
{code}
create 'myhbase', 'cf1','cf2', {SPLITS => ['a', 'b', 'c']}
put 'myhbase','a','cf1:col1','somedata'
put 'myhbase','b','cf1:col2','somedata'
put 'myhbase','c','cf2:col1','somedata'
{code}
One region has cf1.col1. One region has column family 'cf1', but does not have 'col1' under 'cf1'. One region has only column family 'cf2'. And the last region is completely empty. 2. Prepare a csv file.
{code}
select * from dfs.tmp.`joinhbase.csv`;
+-------------------+
|      columns      |
+-------------------+
| ["1","somedata"]  |
| ["2","somedata"]  |
| ["3","somedata"]  |
+-------------------+
{code}
Now run the following query on drill 1.11.0-SNAPSHOT:
{code}
select cast(H.row_key as varchar(10)) as keyCol, CONVERT_FROM(H.cf1.col1, 'UTF8') as col1
from hbase.myhbase H JOIN dfs.tmp.`joinhbase.csv` C
ON CONVERT_FROM(H.cf1.col1, 'UTF8') = C.columns[1];
{code}
The correct query result should be:
{code}
+---------+-----------+
| keyCol  |   col1    |
+---------+-----------+
| a       | somedata  |
| a       | somedata  |
| a       | somedata  |
+---------+-----------+
{code}
Turn off broadcast join, then we will see a SchemaChangeException, or an incorrect result, randomly.
{code}
alter session set `planner.enable_broadcast_join`=false;
{code}
Running the same query again then fails:
{code}
Error: SYSTEM ERROR: SchemaChangeException: Hash join does not support schema changes
{code}
or returns an incorrect (empty) result:
{code}
+---------+-------+
| keyCol  | col1  |
+---------+-------+
+---------+-------+
No rows selected (0.302 seconds)
{code}
-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (DRILL-5479) Document ANSI format for date/time functions
Bridget Bevens created DRILL-5479: - Summary: Document ANSI format for date/time functions Key: DRILL-5479 URL: https://issues.apache.org/jira/browse/DRILL-5479 Project: Apache Drill Issue Type: New Feature Components: Documentation Affects Versions: 1.10.0 Reporter: Bridget Bevens Assignee: Bridget Bevens See https://issues.apache.org/jira/browse/DRILL-4864
[GitHub] drill issue #819: DRILL-5419: Calculate return string length for literals & ...
Github user jinfengni commented on the issue: https://github.com/apache/drill/pull/819 Regarding Paul's suggestion of using sampling (either the 1st batch, or `n` batches): if the sampled length is returned to the client as metadata for the query result set, it could cause quite big problems for the client. If the sampled max length is 5, the client expects to see varchars up to 5 chars. If a new batch arrives with varchar(10), it would either crash the client or make the client show an incorrect result. AFAIK, that's exactly what happened when Sean was working on the 'LIMIT 0' optimization, when there was an inconsistency between the type returned from 'LIMIT 0' and the one returned from a regular query. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill issue #819: DRILL-5419: Calculate return string length for literals & ...
Github user jinfengni commented on the issue: https://github.com/apache/drill/pull/819 My understanding is Drill would hit a schema change between columns with types varchar(5) and varchar(10). However, the length calculated in this PR comes from the query metadata (function parameter, string literal), which will not change across different batches. In that sense, we should not see a schema change exception just because of varchar length. The following code shows that varchar(5) is different from varchar(10), and also different from varchar without a length set.
```java
TypeProtos.MajorType varchar5 = TypeProtos.MajorType.newBuilder()
    .setMinorType(TypeProtos.MinorType.VARCHAR)
    .setMode(TypeProtos.DataMode.OPTIONAL)
    .setPrecision(5)
    .build();
TypeProtos.MajorType varchar10 = TypeProtos.MajorType.newBuilder()
    .setMinorType(TypeProtos.MinorType.VARCHAR)
    .setMode(TypeProtos.DataMode.OPTIONAL)
    .setPrecision(10)
    .build();
TypeProtos.MajorType varcharNoLength = TypeProtos.MajorType.newBuilder()
    .setMinorType(TypeProtos.MinorType.VARCHAR)
    .setMode(TypeProtos.DataMode.OPTIONAL)
    .build();

System.out.println(varchar5.equals(varchar10) ? "varchar5 equals varchar10" : "varchar5 NOT equals varchar10");
System.out.println(varchar5.equals(varcharNoLength) ? "varchar5 equals varcharNoLength" : "varchar5 NOT equals varcharNoLength");
```
output:
```
varchar5 NOT equals varchar10
varchar5 NOT equals varcharNoLength
```
[GitHub] drill pull request #819: DRILL-5419: Calculate return string length for lite...
Github user jinfengni commented on a diff in the pull request: https://github.com/apache/drill/pull/819#discussion_r115064190 --- Diff: contrib/storage-hive/core/src/test/java/org/apache/drill/exec/hive/TestHiveStorage.java --- @@ -510,6 +517,38 @@ public void testIgnoreSkipHeaderFooterForSequencefile() throws Exception { .go(); } + @Test --- End diff -- I think it's fine to leave the Decimal-related issues alone, as long as this PR does not introduce new issues to decimal functions.
[jira] [Created] (DRILL-5478) Spill file size parameter is not honored by the managed external sort
Rahul Challapalli created DRILL-5478: Summary: Spill file size parameter is not honored by the managed external sort Key: DRILL-5478 URL: https://issues.apache.org/jira/browse/DRILL-5478 Project: Apache Drill Issue Type: Bug Components: Execution - Relational Operators Affects Versions: 1.10.0 Reporter: Rahul Challapalli Assignee: Paul Rogers git.commit.id.abbrev=1e0a14c Query:
{code}
ALTER SESSION SET `exec.sort.disable_managed` = false;
alter session set `planner.width.max_per_node` = 1;
alter session set `planner.disable_exchanges` = true;
alter session set `planner.width.max_per_query` = 1;
alter session set `planner.memory.max_query_memory_per_node` = 1052428800;
alter session set `planner.enable_decimal_data_type` = true;
select count(*) from (
  select * from dfs.`/drill/testdata/resource-manager/all_types_large` d1
  order by d1.map.missing
) d;
{code}
Boot options (the spill file size is set to 256M):
{code}
0: jdbc:drill:zk=10.10.100.190:5181> select * from sys.boot where name like '%spill%';
+--------------------------------------------------+---------+-------+---------+----------+--------------------------------------------+-----------+------------+
|                       name                       |  kind   | type  | status  | num_val  |                 string_val                 | bool_val  | float_val  |
+--------------------------------------------------+---------+-------+---------+----------+--------------------------------------------+-----------+------------+
| drill.exec.sort.external.spill.directories       | STRING  | BOOT  | BOOT    | null     | [ # drill-override.conf: 26 "/tmp/test" ]  | null      | null       |
| drill.exec.sort.external.spill.file_size         | STRING  | BOOT  | BOOT    | null     | "256M"                                     | null      | null       |
| drill.exec.sort.external.spill.fs                | STRING  | BOOT  | BOOT    | null     | "maprfs:///"                               | null      | null       |
| drill.exec.sort.external.spill.group.size        | LONG    | BOOT  | BOOT    | 4        | null                                       | null      | null       |
| drill.exec.sort.external.spill.merge_batch_size  | STRING  | BOOT  | BOOT    | null     | "16M"                                      | null      | null       |
| drill.exec.sort.external.spill.spill_batch_size  | STRING  | BOOT  | BOOT    | null     | "8M"                                       | null      | null       |
| drill.exec.sort.external.spill.threshold         | LONG    | BOOT  | BOOT    | 4        | null                                       | null      | null       |
+--------------------------------------------------+---------+-------+---------+----------+--------------------------------------------+-----------+------------+
{code}
Below are the spill files while the query is still executing. The size of the spill files is ~34 MB, well below the configured 256M:
{code}
-rwxr-xr-x 3 root root 34957815 2017-05-05 11:26 /tmp/test/26f33c36-4235-3531-aeaa-2c73dc4ddeb5_major0_minor0_op5_sort/run1
-rwxr-xr-x 3 root root 34957815 2017-05-05 11:27 /tmp/test/26f33c36-4235-3531-aeaa-2c73dc4ddeb5_major0_minor0_op5_sort/run2
-rwxr-xr-x 3 root root 0 2017-05-05 11:27 /tmp/test/26f33c36-4235-3531-aeaa-2c73dc4ddeb5_major0_minor0_op5_sort/run3
{code}
The data set is too large to attach here. Reach out to me if you need anything.
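The intended behavior of the file_size parameter — closing the current spill file and starting a new one once it reaches the configured limit — can be sketched as below. This is an illustration of the expected rollover logic only, not Drill's actual external-sort spill code; the class and method names are hypothetical.

```java
/** Illustrative spill-file rollover: start a new file once the configured size limit is reached. */
class SpillFileRoller {
  private final long fileSizeLimit;   // e.g. 256 MB from drill.exec.sort.external.spill.file_size
  private long bytesInCurrentFile;
  private int fileCount = 1;

  SpillFileRoller(long fileSizeLimit) { this.fileSizeLimit = fileSizeLimit; }

  /** Returns the 1-based index of the spill file this batch should be written to. */
  int write(long batchBytes) {
    if (bytesInCurrentFile > 0 && bytesInCurrentFile + batchBytes > fileSizeLimit) {
      fileCount++;                    // roll over to a new spill file
      bytesInCurrentFile = 0;
    }
    bytesInCurrentFile += batchBytes;
    return fileCount;
  }
}
```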
[GitHub] drill pull request #826: DRILL-5379: Set Hdfs Block Size based on Parquet Bl...
GitHub user ppadma opened a pull request: https://github.com/apache/drill/pull/826 DRILL-5379: Set Hdfs Block Size based on Parquet Block Size Provide an option to specify blocksize during file creation. This will help create parquet files with a single block on HDFS, helping improve performance when we read those files. See DRILL-5379 for details. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ppadma/drill DRILL-5379 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/826.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #826 commit ae77a26aa950e401f2ca40488021431ebfde7156 Author: Padma Penumarthy Date: 2017-04-20T00:25:20Z DRILL-5379: Set Hdfs Block Size based on Parquet Block Size Provide an option to specify blocksize during file creation. This will help create parquet files with a single block on HDFS, helping improve performance when we read those files. See DRILL-5379 for details.
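One detail behind "set HDFS block size to the Parquet block size": HDFS requires a block size that is a positive multiple of the checksum chunk size (512 bytes by default), so a requested Parquet block size has to be rounded up to a valid value. The helper below is a hypothetical sketch of that rounding, not the PR's actual code.

```java
/** Illustrative helper: round a desired Parquet block size up to a valid HDFS block size. */
class BlockSizeUtil {
  // HDFS block sizes must be a multiple of the checksum chunk (dfs.bytes-per-checksum, default 512).
  private static final long CHECKSUM_CHUNK = 512;

  static long toHdfsBlockSize(long parquetBlockSize) {
    long chunks = (parquetBlockSize + CHECKSUM_CHUNK - 1) / CHECKSUM_CHUNK;  // ceiling division
    return chunks * CHECKSUM_CHUNK;
  }
}
```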
[GitHub] drill pull request #773: DRILL-4335: Apache Drill should support network enc...
Github user sudheeshkatkam commented on a diff in the pull request: https://github.com/apache/drill/pull/773#discussion_r114895578 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/rpc/BitConnectionConfig.java --- @@ -46,16 +47,34 @@ protected BitConnectionConfig(BufferAllocator allocator, BootStrapContext contex super(allocator, context); final DrillConfig config = context.getConfig(); +final AuthenticatorProvider authProvider = getAuthProvider(); + if (config.getBoolean(ExecConstants.BIT_AUTHENTICATION_ENABLED)) { this.authMechanismToUse = config.getString(ExecConstants.BIT_AUTHENTICATION_MECHANISM); try { -getAuthProvider().getAuthenticatorFactory(authMechanismToUse); +authProvider.getAuthenticatorFactory(authMechanismToUse); } catch (final SaslException e) { throw new DrillbitStartupException(String.format( "'%s' mechanism not found for bit-to-bit authentication. Please check authentication configuration.", authMechanismToUse)); } - logger.info("Configured bit-to-bit connections to require authentication using: {}", authMechanismToUse); + + // Update encryption related configurations + encryptionContext.setEncryption(config.getBoolean(ExecConstants.BIT_ENCRYPTION_SASL_ENABLED)); + final int maxWrappedSize = config.getInt(ExecConstants.BIT_ENCRYPTION_SASL_MAX_WRAPPED_SIZE); + + if (maxWrappedSize <= 0 || maxWrappedSize > RpcConstants.MAX_WRAPPED_SIZE) { +throw new DrillbitStartupException("Invalid value configured for bit.encryption.sasl.encodesize." + --- End diff -- encodesize -> max_wrapped_size
[GitHub] drill pull request #773: DRILL-4335: Apache Drill should support network enc...
Github user sudheeshkatkam commented on a diff in the pull request: https://github.com/apache/drill/pull/773#discussion_r114897536 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/rpc/user/UserConnectionConfig.java --- @@ -35,24 +38,42 @@ private final UserServerRequestHandler handler; UserConnectionConfig(BufferAllocator allocator, BootStrapContext context, UserServerRequestHandler handler) - throws DrillbitStartupException { +throws DrillbitStartupException { super(allocator, context); this.handler = handler; -if (context.getConfig().getBoolean(ExecConstants.USER_AUTHENTICATION_ENABLED)) { - if (getAuthProvider().getAllFactoryNames().isEmpty()) { +final DrillConfig config = context.getConfig(); +final AuthenticatorProvider authProvider = getAuthProvider(); + +if (config.getBoolean(ExecConstants.USER_AUTHENTICATION_ENABLED)) { + if (authProvider.getAllFactoryNames().isEmpty()) { throw new DrillbitStartupException("Authentication enabled, but no mechanisms found. Please check " + "authentication configuration."); } authEnabled = true; - logger.info("Configured all user connections to require authentication using: {}", - getAuthProvider().getAllFactoryNames()); + + // Update encryption related parameters. + encryptionContext.setEncryption(config.getBoolean(ExecConstants.USER_ENCRYPTION_SASL_ENABLED)); + final int maxWrappedSize = config.getInt(ExecConstants.USER_ENCRYPTION_SASL_MAX_WRAPPED_SIZE); + + if (maxWrappedSize <= 0 || maxWrappedSize > RpcConstants.MAX_WRAPPED_SIZE) { +throw new DrillbitStartupException("Invalid value configured for user.encryption.sasl.encodesize." + --- End diff -- encodesize -> max_wrapped_size
[GitHub] drill pull request #773: DRILL-4335: Apache Drill should support network enc...
Github user sudheeshkatkam commented on a diff in the pull request: https://github.com/apache/drill/pull/773#discussion_r115038181 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/rpc/security/AuthenticationOutcomeListener.java --- @@ -217,8 +224,9 @@ public SaslMessage process(SaslChallengeContext context) throws Exception { private static class SaslFailedProcessor implements SaslChallengeProcessor { @Override -public SaslMessage process(SaslChallengeContext context) throws Exception { - throw new SaslException("Authentication failed. Incorrect credentials?"); +public SaslMessage process(SaslChallengeContext context) throws Exception { + throw new SaslException(String.format("Authentication with encryption context %s failed. Incorrect credentials?", --- End diff -- Fix error message format to match the format in UserClient.
[GitHub] drill pull request #773: DRILL-4335: Apache Drill should support network enc...
Github user sudheeshkatkam commented on a diff in the pull request: https://github.com/apache/drill/pull/773#discussion_r115037769 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/rpc/AbstractServerConnection.java --- @@ -110,7 +131,18 @@ public void changeHandlerTo(final RequestHandler handler) { } @Override - public void close() { + public void setEncryption(boolean encrypted) { +throw new UnsupportedOperationException("Changing encryption setting on server connection is not permitted."); + } + + @Override + public void setMaxWrappedSize(int maxWrappedChunkSize) { --- End diff -- maxWrappedChunkSize -> maxWrappedSize; in argument and message
[GitHub] drill pull request #773: DRILL-4335: Apache Drill should support network enc...
Github user sudheeshkatkam commented on a diff in the pull request: https://github.com/apache/drill/pull/773#discussion_r115039025 --- Diff: exec/rpc/src/main/java/org/apache/drill/exec/rpc/AbstractRemoteConnection.java --- @@ -20,27 +20,35 @@ import io.netty.channel.Channel; import io.netty.channel.ChannelHandlerContext; import io.netty.channel.ChannelInboundHandlerAdapter; +import io.netty.channel.ChannelPipeline; import io.netty.channel.socket.SocketChannel; +import io.netty.handler.codec.LengthFieldBasedFrameDecoder; import java.net.SocketAddress; +import java.nio.ByteOrder; --- End diff -- ??
[GitHub] drill pull request #773: DRILL-4335: Apache Drill should support network enc...
Github user sudheeshkatkam commented on a diff in the pull request: https://github.com/apache/drill/pull/773#discussion_r115038537 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/rpc/user/UserClient.java --- @@ -201,10 +223,12 @@ protected SaslException mapException(Exception e) { if (e instanceof ExecutionException) { final Throwable cause = Throwables.getRootCause(e); if (cause instanceof SaslException) { -return new SaslException("Authentication failed: " + cause.getMessage(), cause); +return new SaslException(String.format("Authentication failed. [Details: %s, Error %s]", +connection.getEncryptionCtxtString(), cause.getMessage()), cause); } } -return new SaslException("Authentication failed unexpectedly.", e); +return new SaslException(String.format("Authentication failed. [Details: %s, Error %s", --- End diff -- Two fixes: Authentication failed **unexpectedly.** [Details: %s, Error %s **]**
[GitHub] drill pull request #821: DRILL-5450: Fix initcap function to convert upper c...
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/821#discussion_r115026331 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/expr/fn/impl/TestStringFunctions.java --- @@ -308,4 +308,58 @@ public void testReverseLongVarChars() throws Exception { FileUtils.deleteQuietly(path); } } + + @Test + public void testLower() throws Exception { +testBuilder() +.sqlQuery("select\n" + +"lower('ABC') col_upper,\n" + +"lower('abc') col_lower,\n" + --- End diff -- Created Jira DRILL-5477 and added appropriate unit test which is ignored for now.
[jira] [Created] (DRILL-5477) String functions (lower, upper, initcap) should work for UTF-8
Arina Ielchiieva created DRILL-5477: --- Summary: String functions (lower, upper, initcap) should work for UTF-8 Key: DRILL-5477 URL: https://issues.apache.org/jira/browse/DRILL-5477 Project: Apache Drill Issue Type: Improvement Components: Functions - Drill Affects Versions: 1.10.0 Reporter: Arina Ielchiieva Drill string functions lower / upper / initcap work only for ASCII, not for UTF-8. UTF-8 is a multi-byte encoding that requires special encoding/decoding to convert to Unicode characters. Without that decoding, these functions won't work for Cyrillic, Greek, or any other character set with upper/lower distinctions. Currently, when a user applies these functions to UTF-8 input, Drill returns the same value it was given. Example: {noformat} select upper('привет') from (values(1)) -> привет {noformat} There is a disabled unit test in https://github.com/arina-ielchiieva/drill/blob/master/exec/java-exec/src/test/java/org/apache/drill/exec/expr/fn/impl/TestStringFunctions.java#L33 which should be enabled once the issue is fixed.
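The limitation is easy to reproduce in plain Java: a byte-wise ASCII-only uppercase (similar in spirit to what the current functions do) leaves Cyrillic untouched, while the Unicode-aware {{String.toUpperCase}} handles it. The {{asciiUpper}} helper below is illustrative, not Drill's implementation.

```java
class UpperCaseDemo {
  /** Naive ASCII-only uppercasing: only shifts 'a'..'z'; all other code points pass through. */
  static String asciiUpper(String s) {
    StringBuilder sb = new StringBuilder(s.length());
    for (char c : s.toCharArray()) {
      sb.append(c >= 'a' && c <= 'z' ? (char) (c - 32) : c);
    }
    return sb.toString();
  }
}
```

Here asciiUpper("привет") returns the input unchanged, while "привет".toUpperCase() returns "ПРИВЕТ".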
[GitHub] drill pull request #825: DRILL-4039:Query fails when non-ascii characters ar...
Github user asfgit closed the pull request at: https://github.com/apache/drill/pull/825
[GitHub] drill pull request #816: DRILL-5428: submit_plan fails after Drill 1.8 scrip...
Github user asfgit closed the pull request at: https://github.com/apache/drill/pull/816
[GitHub] drill pull request #800: DRILL-5385: Vector serializer fails to read saved S...
Github user asfgit closed the pull request at: https://github.com/apache/drill/pull/800
[GitHub] drill pull request #811: DRILL-5423: Refactor ScanBatch to allow unit testin...
Github user asfgit closed the pull request at: https://github.com/apache/drill/pull/811
[GitHub] drill pull request #778: DRILL-5344: Priority queue copier fails with an emp...
Github user asfgit closed the pull request at: https://github.com/apache/drill/pull/778
[GitHub] drill issue #819: DRILL-5419: Calculate return string length for literals & ...
Github user arina-ielchiieva commented on the issue: https://github.com/apache/drill/pull/819 @jinfengni, @paul-rogers made changes after CR. Please review when possible.
[GitHub] drill pull request #819: DRILL-5419: Calculate return string length for lite...
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/819#discussion_r114963380 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/DrillStringLeftRightFuncHolder.java --- @@ -0,0 +1,72 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to you under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.expr.fn; + +import org.apache.drill.common.expression.LogicalExpression; +import org.apache.drill.common.expression.ValueExpressions; +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.common.types.Types; +import org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers; + +import java.util.List; + +/** + * Function holder for functions with function scope set as + * {@link org.apache.drill.exec.expr.annotations.FunctionTemplate.FunctionScope#STRING_LEFT_RIGHT}. + */ +public class DrillStringLeftRightFuncHolder extends DrillSimpleFuncHolder { + + public DrillStringLeftRightFuncHolder(FunctionAttributes functionAttributes, FunctionInitializer initializer) { +super(functionAttributes, initializer); + } + + /** + * Defines function return type and calculates output precision. 
+ * + * Target length calculation logic for left and right functions is the same; + * they substring the same way, just from different sides of the string. + * + * left(source, length) + * right(source, length) + * + * If length is positive, target length is given length. + * If length is negative, target length is source length minus given length. + * If after subtraction length is negative, target length is 0. + * If length is 0, target length is 0. + * + * @param logicalExpressions logical expressions + * @return return type + */ + @Override + public TypeProtos.MajorType getReturnType(List logicalExpressions) { +TypeProtos.MajorType.Builder builder = TypeProtos.MajorType.newBuilder() +.setMinorType(getReturnType().getMinorType()) +.setMode(getReturnTypeDataMode(logicalExpressions)); + +int sourceLength = logicalExpressions.get(0).getMajorType().hasPrecision() ? --- End diff -- I have excluded substring / substr / left / right functions length calculation from this PR. JIRA DRILL-5476 has been created to address length calculation logic for these functions later.
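The left/right precision rules quoted in the javadoc above reduce to a small calculation. The sketch below expresses them over plain ints for illustration (the real holder works on MajorType builders, and this helper name is hypothetical):

```java
class LeftRightPrecision {
  /** Output precision for left(source, length) / right(source, length), per the quoted rules. */
  static int targetLength(int sourceLength, int length) {
    if (length > 0) {
      return length;                             // positive length: take it as-is
    }
    if (length < 0) {
      return Math.max(sourceLength + length, 0); // source length minus |length|, floored at 0
    }
    return 0;                                    // length == 0
  }
}
```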
[GitHub] drill pull request #819: DRILL-5419: Calculate return string length for lite...
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/819#discussion_r114958555 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/DrillConcatFuncHolder.java --- @@ -0,0 +1,62 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to you under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.expr.fn; + +import org.apache.drill.common.expression.LogicalExpression; +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.common.types.Types; + +import java.util.List; + +/** + * Function holder for functions with function scope set as + * {@link org.apache.drill.exec.expr.annotations.FunctionTemplate.FunctionScope#CONCAT}. + */ +public class DrillConcatFuncHolder extends DrillSimpleFuncHolder { --- End diff -- I have added a new annotation to `FunctionTemplate` - `ReturnType` - that indicates which return type strategy should be used.
[GitHub] drill pull request #819: DRILL-5419: Calculate return string length for lite...
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/819#discussion_r113906809 --- Diff: common/src/main/java/org/apache/drill/common/types/Types.java --- @@ -27,10 +27,14 @@ import org.apache.drill.common.types.TypeProtos.MinorType; import com.google.protobuf.TextFormat; +import org.apache.drill.common.util.CoreDecimalUtility; public class Types { static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(Types.class); + public static final int MAX_VARCHAR_LENGTH = 65536; --- End diff -- Changed to 65535.
[GitHub] drill pull request #819: DRILL-5419: Calculate return string length for lite...
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/819#discussion_r113930434 --- Diff: common/src/main/java/org/apache/drill/common/types/Types.java --- @@ -340,18 +344,18 @@ public static int getJdbcDisplaySize(MajorType type) { case INTERVALYEAR: return precision > 0 ? 5 + precision // P..Y12M - : 0; // if precision is not set, return 0 because there's not enough info + : UNDEFINED; // if precision is not set, return 0 because there's not enough info --- End diff -- Updated.
[GitHub] drill pull request #819: DRILL-5419: Calculate return string length for lite...
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/819#discussion_r114591502 --- Diff: common/src/main/java/org/apache/drill/common/types/Types.java --- @@ -636,43 +658,63 @@ public static String toString(final MajorType type) { /** * Get the precision of given type. - * @param majorType - * @return + * + * @param majorType major type + * @return precision value */ public static int getPrecision(MajorType majorType) { -MinorType type = majorType.getMinorType(); - -if (type == MinorType.VARBINARY || type == MinorType.VARCHAR) { - return 65536; -} - if (majorType.hasPrecision()) { return majorType.getPrecision(); } -return 0; +return isScalarStringType(majorType) ? MAX_VARCHAR_LENGTH : UNDEFINED; } /** * Get the scale of given type. - * @param majorType - * @return + * + * @param majorType major type + * @return scale value */ public static int getScale(MajorType majorType) { if (majorType.hasScale()) { return majorType.getScale(); } -return 0; +return UNDEFINED; } /** - * Is the given type column be used in ORDER BY clause? - * @param type - * @return + * Checks if the given type column be used in ORDER BY clause. + * + * @param type minor type + * @return true if type can be used in ORDER BY clause */ public static boolean isSortable(MinorType type) { // Currently only map and list columns are not sortable. return type != MinorType.MAP && type != MinorType.LIST; } + + /** + * Sets max precision from both types if these types are string scalar types. + * Sets max precision and scale from both types if these types are decimal types. 
+ * + * @param leftType type from left side + * @param rightType type from right side + * @param typeBuilder type builder + * @return type builder + */ + public static MajorType.Builder calculateTypePrecisionAndScale(MajorType leftType, MajorType rightType, MajorType.Builder typeBuilder) { +boolean isScalarString = Types.isScalarStringType(leftType) && Types.isScalarStringType(rightType); +boolean isDecimal = CoreDecimalUtility.isDecimalType(leftType); + +if ((isScalarString || isDecimal) && leftType.hasPrecision() && rightType.hasPrecision()) { + typeBuilder.setPrecision(Math.max(leftType.getPrecision(), rightType.getPrecision())); +} + +if (isDecimal && leftType.hasScale() && rightType.hasScale()) { + typeBuilder.setScale(Math.max(leftType.getScale(), rightType.getScale())); --- End diff -- This method is responsible for choosing the max precision and scale from two major types: `Decimal9(9,0) & Decimal9(9,9) -> Decimal9(9,9)`. I have added a check at the beginning of the method to ensure minor types are the same before choosing max precision and scale.
[GitHub] drill pull request #819: DRILL-5419: Calculate return string length for lite...
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/819#discussion_r113912681 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/DrillSubstringFuncHolder.java --- @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to you under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.expr.fn; + +import org.apache.drill.common.expression.LogicalExpression; +import org.apache.drill.common.expression.ValueExpressions; +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.common.types.Types; +import org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers; + +import java.util.List; + +/** + * Function holder for functions with function scope set as + * {@link org.apache.drill.exec.expr.annotations.FunctionTemplate.FunctionScope#SUBSTRING}. + */ +public class DrillSubstringFuncHolder extends DrillSimpleFuncHolder { + + public DrillSubstringFuncHolder(FunctionAttributes functionAttributes, FunctionInitializer initializer) { +super(functionAttributes, initializer); + } + + /** + * Defines function return type and calculates output precision. 
+ * + * substring(source, regexp) + * If input precision is known, output precision is max varchar value {@link Types#MAX_VARCHAR_LENGTH}. + * + * substring(source, offset) + * If input precision is unknown then output precision is max varchar value {@link Types#MAX_VARCHAR_LENGTH}. + * If input precision is known, output precision is input precision minus offset plus 1 + * since offset starts from 1. + * If offset value is greater than input precision or offset value is corrupted (less than or equal to zero), + * output precision is zero. + * + * substring(source, offset, length) + * If offset value is greater than input precision or offset or length values are corrupted (less than or equal to zero), + * output precision is zero. + * If source length (including offset) is less than substring length, output precision is source length (including offset). + * If source length (including offset) is greater than substring length, output precision is substring length. + * + * @param logicalExpressions logical expressions + * @return return type + */ + @Override + public TypeProtos.MajorType getReturnType(List logicalExpressions) { +TypeProtos.MajorType.Builder builder = TypeProtos.MajorType.newBuilder() +.setMinorType(getReturnType().getMinorType()) +.setMode(getReturnTypeDataMode(logicalExpressions)); + +int sourceLength = logicalExpressions.get(0).getMajorType().hasPrecision() ? +logicalExpressions.get(0).getMajorType().getPrecision() : Types.MAX_VARCHAR_LENGTH; --- End diff -- For limit 0 queries I can determine whether we know the precision or not. But for regular queries, it's not always possible. Example: `select substring(cast(id as int), 1, 5) from t` Before length calculation, Drill implicitly casts int to varchar using max varchar length [1]. Thus when it comes to substring length calculation, I receive varchar with max varchar precision and I don't know if this is the real precision or just the one used in the implicit cast.
I have treated unknown precision as max varchar length to sync return type output between limit 0 and regular queries. With this change we might not have good control over memory allocation for varchar columns, so I have excluded length calculation for the substring / substr / left / right functions from this PR. Jira DRILL-5476 has been created to address the length calculation logic for these functions later. [1] https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/ExpressionTreeMaterializer.java#L219
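The output-precision rules enumerated in the javadoc above can be sketched as a small standalone helper. This is a hedged illustration only — the names are hypothetical and this is not Drill's substring implementation:

```java
// Sketch of the documented output-precision rules for
// substring(source, offset) and substring(source, offset, length).
public class SubstringPrecision {

    // substring(source, offset): remaining characters from a 1-based offset.
    static int precision(int sourceLen, int offset) {
        if (offset <= 0 || offset > sourceLen) {
            return 0; // corrupted or out-of-range offset -> zero precision
        }
        return sourceLen - offset + 1;
    }

    // substring(source, offset, length): capped by both the requested length
    // and the characters remaining after the offset.
    static int precision(int sourceLen, int offset, int length) {
        if (offset <= 0 || length <= 0 || offset > sourceLen) {
            return 0;
        }
        return Math.min(length, sourceLen - offset + 1);
    }

    public static void main(String[] args) {
        System.out.println(precision(10, 3));    // 8
        System.out.println(precision(10, 3, 5)); // 5
        System.out.println(precision(10, 8, 5)); // 3
        System.out.println(precision(10, 0, 5)); // 0
    }
}
```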
[GitHub] drill pull request #819: DRILL-5419: Calculate return string length for lite...
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/819#discussion_r114590735 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/DrillStringLeftRightFuncHolder.java --- @@ -0,0 +1,72 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to you under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.expr.fn; + +import org.apache.drill.common.expression.LogicalExpression; +import org.apache.drill.common.expression.ValueExpressions; +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.common.types.Types; +import org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers; + +import java.util.List; + +/** + * Function holder for functions with function scope set as + * {@link org.apache.drill.exec.expr.annotations.FunctionTemplate.FunctionScope#STRING_LEFT_RIGHT}. + */ +public class DrillStringLeftRightFuncHolder extends DrillSimpleFuncHolder { + + public DrillStringLeftRightFuncHolder(FunctionAttributes functionAttributes, FunctionInitializer initializer) { +super(functionAttributes, initializer); + } + + /** + * Defines function return type and calculates output precision. --- End diff -- Agree. Updated the comments. 
[GitHub] drill pull request #819: DRILL-5419: Calculate return string length for lite...
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/819#discussion_r113906744 --- Diff: common/src/main/java/org/apache/drill/common/types/Types.java --- @@ -399,7 +403,13 @@ public static boolean isFixedWidthType(final MajorType type) { } - public static boolean isStringScalarType(final MajorType type) { + /** + * Checks is given major type is string scalar type. --- End diff -- Done.
[GitHub] drill pull request #819: DRILL-5419: Calculate return string length for lite...
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/819#discussion_r114958443 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/annotations/FunctionTemplate.java --- @@ -106,15 +106,21 @@ DECIMAL_ADD_SCALE, DECIMAL_SET_SCALE, DECIMAL_ZERO_SCALE, -SC_BOOLEAN_OPERATOR +SC_BOOLEAN_OPERATOR, --- End diff -- I have made some refactoring. Now `FunctionScope` will indicate only function usage in terms of the number of in / out rows (n -> 1, 1 -> 1, 1 -> n). I have added a new annotation to `FunctionTemplate` - `ReturnType` - that indicates which return type strategy should be used.
[GitHub] drill pull request #819: DRILL-5419: Calculate return string length for lite...
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/819#discussion_r113910020 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/DrillDecimalSumScaleFuncHolder.java --- @@ -36,17 +35,7 @@ public DrillDecimalSumScaleFuncHolder(FunctionAttributes functionAttributes, Fun @Override public MajorType getReturnType(List args) { --- End diff -- Done.
[GitHub] drill pull request #819: DRILL-5419: Calculate return string length for lite...
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/819#discussion_r113937186 --- Diff: common/src/main/java/org/apache/drill/common/types/Types.java --- @@ -340,18 +344,18 @@ public static int getJdbcDisplaySize(MajorType type) { case INTERVALYEAR: return precision > 0 ? 5 + precision // P..Y12M - : 0; // if precision is not set, return 0 because there's not enough info + : UNDEFINED; // if precision is not set, return 0 because there's not enough info case INTERVALDAY: return precision > 0 ? 12 + precision // P..DT12H60M60S assuming fractional seconds precision is not supported - : 0; // if precision is not set, return 0 because there's not enough info + : UNDEFINED; // if precision is not set, return 0 because there's not enough info --- End diff -- Done.
[GitHub] drill pull request #819: DRILL-5419: Calculate return string length for lite...
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/819#discussion_r114592925 --- Diff: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveUtilities.java --- @@ -294,10 +296,21 @@ public static MajorType getMajorTypeFromHiveTypeInfo(final TypeInfo typeInfo, fi MajorType.Builder typeBuilder = MajorType.newBuilder().setMinorType(minorType) .setMode(DataMode.OPTIONAL); // Hive columns (both regular and partition) could have null values -if (primitiveTypeInfo.getPrimitiveCategory() == PrimitiveCategory.DECIMAL) { - DecimalTypeInfo decimalTypeInfo = (DecimalTypeInfo) primitiveTypeInfo; - typeBuilder.setPrecision(decimalTypeInfo.precision()) - .setScale(decimalTypeInfo.scale()).build(); +switch (primitiveTypeInfo.getPrimitiveCategory()) { + case CHAR: + case VARCHAR: +BaseCharTypeInfo baseCharTypeInfo = (BaseCharTypeInfo) primitiveTypeInfo; +typeBuilder.setPrecision(baseCharTypeInfo.getLength()); +break; + case STRING: +typeBuilder.setPrecision(HiveVarchar.MAX_VARCHAR_LENGTH); +break; + case DECIMAL: +DecimalTypeInfo decimalTypeInfo = (DecimalTypeInfo) primitiveTypeInfo; + typeBuilder.setPrecision(decimalTypeInfo.getPrecision()).setScale(decimalTypeInfo.getScale()); --- End diff -- In this method we are getting the minor type from `HiveUtilities.getMinorTypeFromHivePrimitiveTypeInfo` [1], which in turn takes decimal precision into account when choosing the appropriate minor type [2]. [1] https://github.com/apache/drill/blob/master/contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveUtilities.java#L293 [2] https://github.com/apache/drill/blob/master/contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveUtilities.java#L332
[GitHub] drill pull request #819: DRILL-5419: Calculate return string length for lite...
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/819#discussion_r113910179 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/DrillStringLeftRightFuncHolder.java --- @@ -0,0 +1,72 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to you under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.expr.fn; + +import org.apache.drill.common.expression.LogicalExpression; +import org.apache.drill.common.expression.ValueExpressions; +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.common.types.Types; +import org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers; + +import java.util.List; + +/** + * Function holder for functions with function scope set as + * {@link org.apache.drill.exec.expr.annotations.FunctionTemplate.FunctionScope#STRING_LEFT_RIGHT}. --- End diff -- I have removed the calculation for the substring / left / right functions; this will be addressed in a separate Jira.
[GitHub] drill pull request #819: DRILL-5419: Calculate return string length for lite...
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/819#discussion_r113915317 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillOptiq.java --- @@ -599,10 +598,28 @@ public LogicalExpression visitLiteral(RexLiteral literal) { throw new UnsupportedOperationException(String.format("Unable to convert the value of %s and type %s to a Drill constant expression.", literal, literal.getType().getSqlTypeName())); } } - } - private static final TypedNullConstant createNullExpr(MinorType type) { -return new TypedNullConstant(Types.optional(type)); +/** + * Create nullable major type using given minor type + * and wraps it in typed null constant. + * + * @param type minor type + * @return typed null constant instance + */ +private TypedNullConstant createNullExpr(MinorType type) { + return new TypedNullConstant(Types.optional(type)); +} + +/** + * Create nullable varchar major type with given precision + * and wraps it in typed null constant. + * + * @param precision precision value + * @return typed null constant instance + */ +private TypedNullConstant createStringNullExpr(int precision) { + return new TypedNullConstant(Types.withPrecision(MinorType.VARCHAR, TypeProtos.DataMode.OPTIONAL, precision)); --- End diff -- 4. Expanded the reply in the previous comment.
[GitHub] drill pull request #819: DRILL-5419: Calculate return string length for lite...
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/819#discussion_r113908201 --- Diff: common/src/main/java/org/apache/drill/common/types/Types.java --- @@ -636,43 +658,63 @@ public static String toString(final MajorType type) { /** * Get the precision of given type. - * @param majorType - * @return + * + * @param majorType major type + * @return precision value */ public static int getPrecision(MajorType majorType) { -MinorType type = majorType.getMinorType(); - -if (type == MinorType.VARBINARY || type == MinorType.VARCHAR) { - return 65536; -} - if (majorType.hasPrecision()) { return majorType.getPrecision(); } -return 0; +return isScalarStringType(majorType) ? MAX_VARCHAR_LENGTH : UNDEFINED; --- End diff -- Currently Drill does not consider a length change in varchar as a schema change. Regarding sampling data, since Drill is schema-less, I think it's OK to return unknown or max length; sampling won't give an exact result anyway and will complicate query processing. This PR will help users who want the length to be specified (e.g. by means of the cast function).
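The revised fallback in the diff above — explicit precision wins, otherwise scalar string types report the max varchar length and everything else reports undefined — can be sketched standalone. The constants and parameters below are illustrative stand-ins, not Drill's real `Types` fields:

```java
// Hedged sketch of the getPrecision fallback shown in the diff above.
public class PrecisionFallback {
    static final int MAX_VARCHAR_LENGTH = 65536; // illustrative constant
    static final int UNDEFINED = 0;              // illustrative constant

    // hasPrecision / isScalarString stand in for MajorType queries.
    static int getPrecision(boolean hasPrecision, int precision, boolean isScalarString) {
        if (hasPrecision) {
            return precision; // explicitly set precision always wins
        }
        return isScalarString ? MAX_VARCHAR_LENGTH : UNDEFINED;
    }

    public static void main(String[] args) {
        System.out.println(getPrecision(true, 20, true));  // 20
        System.out.println(getPrecision(false, 0, true));  // 65536
        System.out.println(getPrecision(false, 0, false)); // 0
    }
}
```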
[GitHub] drill pull request #819: DRILL-5419: Calculate return string length for lite...
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/819#discussion_r113912267 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/DrillStringLeftRightFuncHolder.java --- @@ -0,0 +1,72 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to you under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.expr.fn; + +import org.apache.drill.common.expression.LogicalExpression; +import org.apache.drill.common.expression.ValueExpressions; +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.common.types.Types; +import org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers; + +import java.util.List; + +/** + * Function holder for functions with function scope set as + * {@link org.apache.drill.exec.expr.annotations.FunctionTemplate.FunctionScope#STRING_LEFT_RIGHT}. + */ +public class DrillStringLeftRightFuncHolder extends DrillSimpleFuncHolder { + + public DrillStringLeftRightFuncHolder(FunctionAttributes functionAttributes, FunctionInitializer initializer) { +super(functionAttributes, initializer); + } + + /** + * Defines function return type and calculates output precision. 
+ * + * Target length calculation logic for left and right functions is the same; + * they substring the string the same way, just from different sides of the string. + * + * left(source, length) + * right(source, length) + * + * If length is positive, target length is given length. + * If length is negative, target length is source length minus given length. + * If after subtraction length is negative, target length is 0. + * If length is 0, target length is 0. + * + * @param logicalExpressions logical expressions + * @return return type + */ + @Override + public TypeProtos.MajorType getReturnType(List logicalExpressions) { +TypeProtos.MajorType.Builder builder = TypeProtos.MajorType.newBuilder() +.setMinorType(getReturnType().getMinorType()) +.setMode(getReturnTypeDataMode(logicalExpressions)); + +int sourceLength = logicalExpressions.get(0).getMajorType().hasPrecision() ? --- End diff -- Done. When we do not know the precision, it will be left unset instead of setting max varchar length.
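The left/right target-length rules quoted in the javadoc above amount to a few lines of arithmetic. Here is a hypothetical standalone sketch (names are illustrative, not Drill's actual code):

```java
// Sketch of the target-length rules for left(source, length) and
// right(source, length) as documented above.
public class LeftRightLength {

    // Positive length -> that length; negative length -> source length minus
    // |length|, floored at 0; zero length -> 0.
    static int targetLength(int sourceLength, int length) {
        if (length == 0) {
            return 0;
        }
        if (length > 0) {
            return length;
        }
        return Math.max(sourceLength + length, 0); // length is negative here
    }

    public static void main(String[] args) {
        System.out.println(targetLength(10, 4));   // 4
        System.out.println(targetLength(10, -4));  // 6
        System.out.println(targetLength(10, -12)); // 0
        System.out.println(targetLength(10, 0));   // 0
    }
}
```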
[GitHub] drill pull request #819: DRILL-5419: Calculate return string length for lite...
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/819#discussion_r114957022 --- Diff: common/src/main/java/org/apache/drill/common/types/Types.java --- @@ -340,18 +344,18 @@ public static int getJdbcDisplaySize(MajorType type) { case INTERVALYEAR: return precision > 0 ? 5 + precision // P..Y12M - : 0; // if precision is not set, return 0 because there's not enough info + : UNDEFINED; // if precision is not set, return 0 because there's not enough info case INTERVALDAY: return precision > 0 ? 12 + precision // P..DT12H60M60S assuming fractional seconds precision is not supported - : 0; // if precision is not set, return 0 because there's not enough info + : UNDEFINED; // if precision is not set, return 0 because there's not enough info case INTERVAL: case MAP: case LATE: case NULL: -case UNION: return 0; +case UNION: return UNDEFINED; --- End diff -- If `MajorType` is `UNION`, its `getSubTypeList()` is not empty. But it contains only minor types, i.e. there is no precision to calculate the exact length, though an approximate length can be calculated. To change the union type calculation we need to make changes in `Types.class` and in `fieldmeta.cpp` (the C++ part). I have created a separate Jira for this change - DRILL-5475.
[GitHub] drill pull request #819: DRILL-5419: Calculate return string length for lite...
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/819#discussion_r113908238 --- Diff: common/src/main/java/org/apache/drill/common/types/Types.java --- @@ -636,43 +658,63 @@ public static String toString(final MajorType type) { /** * Get the precision of given type. - * @param majorType - * @return + * + * @param majorType major type + * @return precision value */ public static int getPrecision(MajorType majorType) { -MinorType type = majorType.getMinorType(); - -if (type == MinorType.VARBINARY || type == MinorType.VARCHAR) { - return 65536; -} - if (majorType.hasPrecision()) { return majorType.getPrecision(); } -return 0; +return isScalarStringType(majorType) ? MAX_VARCHAR_LENGTH : UNDEFINED; } /** * Get the scale of given type. - * @param majorType - * @return + * + * @param majorType major type + * @return scale value */ public static int getScale(MajorType majorType) { if (majorType.hasScale()) { return majorType.getScale(); } -return 0; +return UNDEFINED; } /** - * Is the given type column be used in ORDER BY clause? - * @param type - * @return + * Checks if the given type column be used in ORDER BY clause. --- End diff -- Done.
[GitHub] drill pull request #819: DRILL-5419: Calculate return string length for lite...
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/819#discussion_r114966942 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/TypeInferenceUtils.java --- @@ -394,31 +414,37 @@ public RelDataType inferReturnType(SqlOperatorBinding opBinding) { } private static class DrillConcatSqlReturnTypeInference implements SqlReturnTypeInference {-private static final DrillConcatSqlReturnTypeInference INSTANCE = new DrillConcatSqlReturnTypeInference(); --- End diff -- The logic is close but different. I think we can leave them separate for now.
[GitHub] drill pull request #819: DRILL-5419: Calculate return string length for lite...
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/819#discussion_r114593932 --- Diff: contrib/storage-hive/core/src/test/java/org/apache/drill/exec/hive/TestHiveStorage.java --- @@ -510,6 +517,38 @@ public void testIgnoreSkipHeaderFooterForSequencefile() throws Exception { .go(); } + @Test --- End diff -- As explained in the previous comment, the minor type for decimals is chosen according to precision, so there is no issue here. I agree that there is a lack of tests for decimals, but they can be added under a different Jira.
[GitHub] drill pull request #819: DRILL-5419: Calculate return string length for lite...
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/819#discussion_r113915450 --- Diff: protocol/src/main/protobuf/Types.proto --- @@ -70,8 +70,8 @@ enum MinorType { message MajorType { optional MinorType minor_type = 1; optional DataMode mode = 2; - optional int32 width = 3; // optional width for fixed size values. - optional int32 precision = 4; // used for decimal types + optional int32 width = 3; + optional int32 precision = 4; // used for decimal types or as optional length for fixed size values --- End diff -- Done.
[jira] [Created] (DRILL-5476) Calculate return type precision value for substring functions (substring, substr, left, right)
Arina Ielchiieva created DRILL-5476: --- Summary: Calculate return type precision value for substring functions (substring, substr, left, right) Key: DRILL-5476 URL: https://issues.apache.org/jira/browse/DRILL-5476 Project: Apache Drill Issue Type: Improvement Affects Versions: 1.10.0 Reporter: Arina Ielchiieva Priority: Minor Currently when applying the substring / substr / left / right functions Drill doesn't calculate the return length. Since we know the calculation rules for these functions, we can predict the return length. For example: substring(cast(col as varchar(10)), 1, 5) -> varchar(5) substring(cast(col as varchar(10)), 5) -> varchar(5) Similar changes have been done in DRILL-5419 for other string functions. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (DRILL-5475) Calculate union type precision and display size
Arina Ielchiieva created DRILL-5475: --- Summary: Calculate union type precision and display size Key: DRILL-5475 URL: https://issues.apache.org/jira/browse/DRILL-5475 Project: Apache Drill Issue Type: Improvement Affects Versions: 1.10.0 Reporter: Arina Ielchiieva Priority: Minor A union type is composed of other types. We can calculate its precision and display size by choosing the max from the available types. For example, if `MajorType` is `UNION`, its `getSubTypeList()` is not empty. But it contains only minor types, i.e. there is no precision to calculate the exact length, though an approximate length can be calculated. Changes should be done in: *Types.class* https://github.com/apache/drill/blob/master/common/src/main/java/org/apache/drill/common/types/Types.java#L354 *fieldmeta.cpp* https://github.com/apache/drill/blob/master/contrib/native/client/src/clientlib/fieldmeta.cpp#L257 https://github.com/apache/drill/blob/master/contrib/native/client/src/clientlib/fieldmeta.cpp#L346
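The proposed calculation — choose the max over the union's member types — might look like the sketch below. The method name and the example precision values are illustrative assumptions, not Drill's real constants or API:

```java
// Hypothetical sketch of the union-type precision proposal in DRILL-5475.
public class UnionPrecision {

    // Approximate precision for a union type: the max of its members'
    // precisions, since any member value must fit in the result column.
    static int unionPrecision(int... memberPrecisions) {
        int max = 0;
        for (int p : memberPrecisions) {
            max = Math.max(max, p);
        }
        return max;
    }

    public static void main(String[] args) {
        // e.g. a union of an INT-like member (display size ~11)
        // and a BIGINT-like member (~20) -> 20.
        System.out.println(unionPrecision(11, 20)); // 20
    }
}
```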