[jira] [Created] (DRILL-7660) Modify Drill Dockerfiles
Abhishek Girish created DRILL-7660:
--
Summary: Modify Drill Dockerfiles
Key: DRILL-7660
URL: https://issues.apache.org/jira/browse/DRILL-7660
Project: Apache Drill
Issue Type: Sub-task
Reporter: Abhishek Girish
Assignee: Abhishek Girish
--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (DRILL-7659) Add support for Helm Charts based deployment on Kubernetes
Abhishek Girish created DRILL-7659:
--
Summary: Add support for Helm Charts based deployment on Kubernetes
Key: DRILL-7659
URL: https://issues.apache.org/jira/browse/DRILL-7659
Project: Apache Drill
Issue Type: Sub-task
Reporter: Abhishek Girish
Assignee: Abhishek Girish
Re: Excessive Memory Use in Parquet Files (From Drill Slack Channel)
Hi Charles,

Thanks for forwarding this. Looks like Idan found the right answer. Still, I repeated the analysis and have some suggestions.

I looked at the code mentioned in the message chain. This is a place where our error handling could use work:

public void allocateNew() throws OutOfMemoryException {
  if (!allocateNewSafe()) {
    throw new OutOfMemoryException();
  }
}

Some default allocation failed, but we preserve none of the relevant information: nothing about the kind of vector, nothing about the cause of the failure. There are dozens of implementations of allocateNewSafe(); it is impossible to determine which was called. A typical implementation:

public boolean allocateNewSafe() {
  long curAllocationSize = ...
  try {
    allocateBytes(curAllocationSize);
  } catch (DrillRuntimeException ex) {
    return false;
  }
  return true;
}

We catch the exception, then ignore it. Sigh... We can fix all this, but it does not help with this specific issue. See DRILL-7658.

As it turns out, most of the implementations are in the generated vector classes. These classes, oddly, have their own redundant copy of allocateNewSafe(). Since we don't see those methods on the stack, we can quickly narrow the candidates down to:

* AbstractMapVector (any map vector)
* A few others that won't occur for Parquet

Given this, the allocation is failing when allocating a map. Idan mentions "one column is an array of a single element comprised of 15 columns". We can presume that the "element" is actually a map, and that the map has 15 columns.

So, it looks like the map allocation failed during a partition send (the next element on the stack). The partition sender takes incoming batches (presumably from the scan, though the stack trace does not say because we're at the root of the DAG) and splits them by key to destination nodes. Idan mentions the query runs on a single machine, so the partitions go only to threads on that same machine. Idan mentions a 16-core machine.
Since Drill parallelizes queries to 70% of the cores, we may be running 11 threads, so each partition sender tries to buffer data for 11 receivers. Each will buffer three batches of data, for a total of 33 batches.

Next we need to know how many records are in each batch. It seems we have two default values, defined in drill-override.conf:

store.parquet.flat.batch.num_records: 32767
store.parquet.complex.batch.num_records: 4000

If we think the record has a map, then perhaps Parquet chose the "complex" count of 4000 records? I think this can be checked by looking at the query profile which, if I recall, should be produced even for a failed query.

So, let's guess 4000 records * 33 buffered batches = 132K records. We don't know the size of each, however. (And note that Idan said he artificially increased parallelism, so the buffering need is greater than the above back-of-the-envelope calcs.)

We do know the size of the data: 15 files of 150K each. Let's assume that is compressed. So, if all files are in memory, that would be 15 * 150K * 10:1 compression ratio = ~22 MB, which is tiny. So, it is unlikely that Drill is actually buffering all 33 batches.

This tells us that something else is going wrong; we are not actually running out of memory for data. Just as Idan suggested, we are exhausting memory for some other reason.

Reading further, it looks like Idan found his own solution. He increased parallelism to the point where the internal buffering of each Parquet reader used up all available memory. This is probably a bug, but Parquet is a fiendishly complex beast. Over time, people threw all kinds of parallel readers, buffering, and other tricks at it to beat Impala in TPC benchmarks.

Since a query that finishes is faster than a highly-tuned query that crashes, I'd recommend throttling the slice count back. You really only need as many slices as there are cores. In fact, you need fewer.
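The back-of-the-envelope arithmetic above can be sketched directly. This is a rough estimate only: the 70% parallelism figure, the three buffered batches per receiver, and the 4000-record "complex" batch size are the values quoted in this message, not verified against Drill's code.

```java
// Rough buffering estimate for the partition sender, using the numbers from
// the analysis above: 16 cores, ~70% of cores used for minor fragments,
// 3 buffered batches per receiver, and the "complex" Parquet batch size.
public class BufferingEstimate {
    public static void main(String[] args) {
        int cores = 16;
        int senders = (int) (cores * 0.7);           // 70% of cores -> 11 threads
        int batchesPerReceiver = 3;                  // batches buffered per destination
        int bufferedBatches = senders * batchesPerReceiver;       // 33 batches
        int recordsPerBatch = 4000;   // store.parquet.complex.batch.num_records
        int bufferedRecords = bufferedBatches * recordsPerBatch;  // 132,000 records

        // The data itself: 15 files of ~150 KB, assuming ~10:1 compression.
        long dataMb = Math.round(15 * 150 * 10 / 1024.0);         // about 22 MB

        System.out.println(senders + " senders x " + batchesPerReceiver
            + " batches each = " + bufferedBatches + " batches ("
            + bufferedRecords + " records); data ~" + dataMb + " MB uncompressed");
    }
}
```

Even under these pessimistic assumptions the raw data is tiny compared to the configured direct memory, which supports the conclusion that the memory is going somewhere other than batch buffering.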
Unlike other readers, Parquet launches a bunch of its own parallel readers, so each single Parquet reader will have many (I don't recall the number) parallel column readers, each aggressively buffering everything it can. Since the data is small, there is no need for such heroics: Drill can read 20+ MB of data quite quickly, even with a few threads.

So, try that first and see if it works. Once the query works, study the query profile to determine the memory budget and CPU usage. Tune from there, keeping memory well within the available bounds.

Thanks,
- Paul

On Tuesday, March 24, 2020, 11:46:47 AM PDT, Charles Givre wrote:

Idan Sheinberg 8:21 AM
Hi there. I'm trying to run a simple offset query (ORDER BY timestamp LIMIT 500 OFFSET 1000) against rather complex Parquet files (say 4 columns, one being an array currently consisting of a single element comprised of 15 columns). All files share the same schema, of course.

User Error Occurred: One or more nodes ran out
[jira] [Created] (DRILL-7658) Vector allocateNew() has poor error reporting
Paul Rogers created DRILL-7658:
--
Summary: Vector allocateNew() has poor error reporting
Key: DRILL-7658
URL: https://issues.apache.org/jira/browse/DRILL-7658
Project: Apache Drill
Issue Type: Bug
Affects Versions: 1.17.0
Reporter: Paul Rogers

See the posting by Charles on 2020-03-24 on the user and dev lists, forwarding a message from another user whose query ran out of memory. Stack trace:

{noformat}
Caused by: org.apache.drill.exec.exception.OutOfMemoryException: null
  at org.apache.drill.exec.vector.complex.AbstractContainerVector.allocateNew(AbstractContainerVector.java:59)
  at org.apache.drill.exec.test.generated.PartitionerGen5$OutgoingRecordBatch.allocateOutgoingRecordBatch(PartitionerTemplate.
{noformat}

Notice the complete lack of context. The method in question:

{code:java}
public void allocateNew() throws OutOfMemoryException {
  if (!allocateNewSafe()) {
    throw new OutOfMemoryException();
  }
}
{code}

A generated implementation of the {{allocateNewSafe()}} method:

{code:java}
@Override
public boolean allocateNewSafe() {
  long curAllocationSize = allocationSizeInBytes;
  if (allocationMonitor > 10) {
    curAllocationSize = Math.max(8, curAllocationSize / 2);
    allocationMonitor = 0;
  } else if (allocationMonitor < -2) {
    curAllocationSize = allocationSizeInBytes * 2L;
    allocationMonitor = 0;
  }
  try {
    allocateBytes(curAllocationSize);
  } catch (DrillRuntimeException ex) {
    return false;
  }
  return true;
}
{code}

Note that the {{allocateNew()}} method is not "safe" (it throws an exception), but it does so by discarding the underlying exception. What should happen is that the "non-safe" {{allocateNew()}} should call the {{allocateBytes()}} method and simply forward the {{DrillRuntimeException}}. It probably does not do so because the author wanted to reuse the extra size calcs in {{allocateNewSafe()}}. The solution is to put the calcs and the call to {{allocateBytes()}} in a "non-safe" method, and call that method from both {{allocateNew()}} and {{allocateNewSafe()}}.
Or, better, generate {{allocateNew()}} using the above code, but have the base class define {{allocateNewSafe()}} as a wrapper. Note an extra complexity: although the base class provides the method shown above, each generated vector also provides:

{code:java}
@Override
public void allocateNew() {
  if (!allocateNewSafe()) {
    throw new OutOfMemoryException("Failure while allocating buffer.");
  }
}
{code}

This is both redundant and inconsistent (one has a message, the other does not).
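A minimal, self-contained sketch of the fix proposed above: keep the sizing logic and the call to {{allocateBytes()}} in a single "unsafe" method, derive the "safe" variant from it once in the base class, and preserve the failure cause. The class, field, and exception shapes here are simplified stand-ins for Drill's actual vector classes, not the real implementation.

```java
// Sketch: one allocation path that throws a descriptive exception with its
// cause attached; the "safe" wrapper is defined once in the base class so the
// generated vectors no longer need their own redundant copies.
public class AllocationSketch {

    static class OutOfMemoryException extends RuntimeException {
        OutOfMemoryException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    static abstract class BaseValueVector {
        long allocationSizeInBytes = 4096;
        int allocationMonitor = 0;

        // Single implementation: computes the size, allocates, and on failure
        // throws with the vector type, requested size, and underlying cause.
        public void allocateNew() {
            long curAllocationSize = allocationSizeInBytes;
            if (allocationMonitor > 10) {
                curAllocationSize = Math.max(8, curAllocationSize / 2);
                allocationMonitor = 0;
            } else if (allocationMonitor < -2) {
                curAllocationSize = allocationSizeInBytes * 2L;
                allocationMonitor = 0;
            }
            try {
                allocateBytes(curAllocationSize);
            } catch (RuntimeException ex) {
                throw new OutOfMemoryException("Failure allocating "
                    + curAllocationSize + " bytes for "
                    + getClass().getSimpleName(), ex);
            }
        }

        // The "safe" variant is now a thin wrapper defined once here.
        public boolean allocateNewSafe() {
            try {
                allocateNew();
                return true;
            } catch (OutOfMemoryException e) {
                return false;
            }
        }

        protected abstract void allocateBytes(long size);
    }

    // Stand-in for a vector whose underlying allocation fails.
    static class FailingVector extends BaseValueVector {
        @Override
        protected void allocateBytes(long size) {
            throw new RuntimeException("allocator exhausted");
        }
    }

    public static void main(String[] args) {
        BaseValueVector failing = new FailingVector();
        System.out.println("allocateNewSafe(): " + failing.allocateNewSafe());
        try {
            failing.allocateNew();
        } catch (OutOfMemoryException e) {
            System.out.println(e.getMessage() + " (cause: "
                + e.getCause().getMessage() + ")");
        }
    }
}
```

With this shape, the stack trace in the original report would have named the failing vector class, the requested allocation size, and the real cause, instead of a bare `OutOfMemoryException: null`.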
Excessive Memory Use in Parquet Files (From Drill Slack Channel)
Idan Sheinberg 8:21 AM
Hi there. I'm trying to run a simple offset query (ORDER BY timestamp LIMIT 500 OFFSET 1000) against rather complex Parquet files (say 4 columns, one being an array currently consisting of a single element comprised of 15 columns). All files share the same schema, of course.

User Error Occurred: One or more nodes ran out of memory while executing the query. (null)
org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: One or more nodes ran out of memory while executing the query. null
[Error Id: 67b61fc9-320f-47a1-8718-813843a10ecc ]
  at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:657)
  at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:338)
  at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.drill.exec.exception.OutOfMemoryException: null
  at org.apache.drill.exec.vector.complex.AbstractContainerVector.allocateNew(AbstractContainerVector.java:59)
  at org.apache.drill.exec.test.generated.PartitionerGen5$OutgoingRecordBatch.allocateOutgoingRecordBatch(PartitionerTemplate.java:380)
  at org.apache.drill.exec.test.generated.PartitionerGen5$OutgoingRecordBatch.initializeBatch(PartitionerTemplate.java:400)
  at org.apache.drill.exec.test.generated.PartitionerGen5.setup(PartitionerTemplate.java:126)
  at org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.createClassInstances(PartitionSenderRootExec.java:263)
  at org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.createPartitioner(PartitionSenderRootExec.java:218)
  at org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext(PartitionSenderRootExec.java:188)
  at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:93)
  at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:323)
  at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:310)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:422)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
  at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:310)
  ... 4 common frames omitted

Now, I'm running this query from a 16-core, 32 GB RAM machine, with the heap sized at 20 GB, Eden sized at 16 GB (added manually to JAVA_OPTS), and direct memory sized at 8 GB. By querying sys.memory I can confirm all limits apply. At no point throughout the query am I nearing the memory limit of the heap/direct memory or the OS itself.
8:25 However, due to the way org.apache.drill.exec.vector.complex.AbstractContainerVector.allocateNew is implemented
8:27
@Override
public void allocateNew() throws OutOfMemoryException {
  if (!allocateNewSafe()) {
    throw new OutOfMemoryException();
  }
}
8:27 The actual exception/error is swallowed, and I have no idea what's the cause of the failure
8:28 The data-set itself consists of say 15 parquet files, each one weighing at about 100kb
8:30 but as mentioned earlier, the parquet files are a bit more complex than the usual.
8:32 @cgivre @Vova Vysotskyi is there anything I can do or tweak to make this error go away?

cgivre 8:40 AM
Hmm...
8:40 This may be a bug. Can you create an issue on our JIRA board?

Idan Sheinberg 8:43 AM
Sure
8:43 I'll get to it

cgivre 8:44 AM
I'd like for Paul Rogers to see this as I think he was the author of some of this.

Idan Sheinberg 8:44 AM
Hmm. I'll keep that in mind

cgivre 8:47 AM
We've been refactoring some of the complex readers as well, so it's possible that caused this, but I'm not really sure.
8:47 What version of Drill?
cgivre 9:11 AM
This kind of info is super helpful as we're trying to work out all these details.
9:11 Reading schemas on the fly is not trivial, so when we find issues, we do like to resolve them

Idan Sheinberg 9:16 AM
This is Drill 0.18-SNAPSHOT as of last month
9:16 I do think I managed to resolve the issue however
9:16 I'm going to run some additional tests and let you know

cgivre 9:16 AM
What did you do?
9:17 You might want to rebase with today's build as well

Idan Sheinberg 9:21 AM
I'll come back with the details in a few moments

cgivre 9:38 AM
Thx

Idan Sheinberg 9:50 AM
Ok. See, it seems as though it's a combination of a few things. The data-set in question is still small (as mentioned before), but we are setting planner.slice_target to an extremely low value in order to trigger
[GitHub] [drill] paul-rogers commented on issue #2018: DRILL-7633: Fixes for union and repeated list accessors
paul-rogers commented on issue #2018: DRILL-7633: Fixes for union and repeated list accessors URL: https://github.com/apache/drill/pull/2018#issuecomment-603434987 @arina-ielchiieva, thanks for the suggestions, they solved the problem. I suspect we have some confusion with those methods based on earlier testing, but we'll fix those issues later. Squashed commits and reran all unit tests. We should be good to go. Thanks again for your help. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [drill] cgivre commented on issue #2038: DRILL-6604: Upgrade Drill Hive client to Hive3.1 version
cgivre commented on issue #2038: DRILL-6604: Upgrade Drill Hive client to Hive3.1 version URL: https://github.com/apache/drill/pull/2038#issuecomment-603251228 @vvysotskyi Thanks for the response. The reason I ask is that for many enterprises, they are at the mercy of IT teams and in many cases they are forced to use older versions of tools like Hive. Often, the upgrade schedules are years behind the most current version. (I'm speaking from experience here..;-)) Obviously we should do our best to support the most current version of common platforms like Hive, however, I think you would be surprised at how many large enterprises still use very old versions of these tools.
[GitHub] [drill] vvysotskyi commented on issue #2038: DRILL-6604: Upgrade Drill Hive client to Hive3.1 version
vvysotskyi commented on issue #2038: DRILL-6604: Upgrade Drill Hive client to Hive3.1 version URL: https://github.com/apache/drill/pull/2038#issuecomment-603237847 @cgivre,
1. The info about building Drill for different Hive versions is a workaround; we shouldn't recommend it to users if other alternatives are available, so I don't think we should document it. Also, when all supported profiles are moved to this version of Hive, the code will be cleaned up to remove the hacks which allowed using older versions.
2. I don't see a way to achieve this. The issue here is in the interaction between Hive jars of different versions. I assume that besides API changes, there are also thrift format updates.
3. I don't think so. For updating versions to 2.3.2, the code was changed, so I think it is still incompatible.
[GitHub] [drill] cgivre commented on issue #2038: DRILL-6604: Upgrade Drill Hive client to Hive3.1 version
cgivre commented on issue #2038: DRILL-6604: Upgrade Drill Hive client to Hive3.1 version URL: https://github.com/apache/drill/pull/2038#issuecomment-603223087 @vvysotskyi Thanks for submitting this. A few questions:
1. Can we please include the bit about building Drill for different Hive versions in the documentation?
2. Is there any way to write this such that users will not have to build a dedicated version of Drill for each version of Hive, or are the Hive APIs so different that it is not practical?
3. If we are requiring that Drill be built specially for a given version of Hive, is it possible to support versions older than `2.3.2`?
[GitHub] [drill] cgivre commented on issue #2038: DRILL-6604: Upgrade Drill Hive client to Hive3.1 version
cgivre commented on issue #2038: DRILL-6604: Upgrade Drill Hive client to Hive3.1 version URL: https://github.com/apache/drill/pull/2038#issuecomment-603219947 @vvysotskyi Thank you for submitting this! I am wondering whether we test for backwards compatibility with older versions of Hive as we upgrade? What is the earliest version we currently support? Thanks!
[GitHub] [drill] cgivre removed a comment on issue #2038: DRILL-6604: Upgrade Drill Hive client to Hive3.1 version
cgivre removed a comment on issue #2038: DRILL-6604: Upgrade Drill Hive client to Hive3.1 version URL: https://github.com/apache/drill/pull/2038#issuecomment-603219947 @vvysotskyi Thank you for submitting this! I am wondering whether we test for backwards compatibility with older versions of Hive as we upgrade? What is the earliest version we currently support? Thanks!
[GitHub] [drill] arina-ielchiieva edited a comment on issue #2018: DRILL-7633: Fixes for union and repeated list accessors
arina-ielchiieva edited a comment on issue #2018: DRILL-7633: Fixes for union and repeated list accessors URL: https://github.com/apache/drill/pull/2018#issuecomment-603206563 @paul-rogers thanks for addressing the code review comments. I have left comments on how to fix the unit test failures; please apply them and squash the commits.
[GitHub] [drill] arina-ielchiieva commented on issue #2018: DRILL-7633: Fixes for union and repeated list accessors
arina-ielchiieva commented on issue #2018: DRILL-7633: Fixes for union and repeated list accessors URL: https://github.com/apache/drill/pull/2018#issuecomment-603206563 @paul-rogers thanks for addressing the code review comments. I have left a comment on how to fix the unit test failures; please apply it and squash the commits.
[GitHub] [drill] arina-ielchiieva commented on a change in pull request #2018: DRILL-7633: Fixes for union and repeated list accessors
arina-ielchiieva commented on a change in pull request #2018: DRILL-7633: Fixes for union and repeated list accessors URL: https://github.com/apache/drill/pull/2018#discussion_r397109336

## File path: exec/vector/src/main/java/org/apache/drill/exec/record/metadata/VariantColumnMetadata.java

@@ -145,16 +147,28 @@ public ColumnMetadata cloneEmpty() {
   @Override
   public ColumnMetadata copy() {
-    // TODO Auto-generated method stub
-    assert false;
-    return null;
+    return new VariantColumnMetadata(name, type, mode, variantSchema.copy());
   }

   @Override
   public VariantMetadata variantSchema() {
     return variantSchema;
   }

+  @JsonProperty("type")

Review comment: Remove the json property annotation as it will be present in the parent class.
[GitHub] [drill] arina-ielchiieva commented on a change in pull request #2018: DRILL-7633: Fixes for union and repeated list accessors
arina-ielchiieva commented on a change in pull request #2018: DRILL-7633: Fixes for union and repeated list accessors URL: https://github.com/apache/drill/pull/2018#discussion_r397109112

## File path: exec/vector/src/main/java/org/apache/drill/exec/record/metadata/AbstractColumnMetadata.java

@@ -295,12 +295,6 @@ public String toString() {
       .toString();
   }

-  @JsonProperty("type")
-  @Override
-  public String typeString() {
-    return majorType().toString();
-  }
-

Review comment: Please don't remove this method but leave it as abstract with the json property annotation:
```
@JsonProperty("type")
@Override
public abstract String typeString();
```
[GitHub] [drill] vvysotskyi opened a new pull request #2038: DRILL-6604: Upgrade Drill Hive client to Hive3.1 version
vvysotskyi opened a new pull request #2038: DRILL-6604: Upgrade Drill Hive client to Hive3.1 version URL: https://github.com/apache/drill/pull/2038

# [DRILL-6604](https://issues.apache.org/jira/browse/DRILL-6604): Upgrade Drill Hive client to Hive3.1 version

## Description
One of the major changes in this PR is cleaning up the output for Hive tests. Now, almost all Hive-related output is no longer printed to stdout. Additionally, the Hive version was updated to the latest (at the current time) version, 3.1.2. The new Hive version introduced new `ObjectInspector` classes for date and timestamp values, so to be able to compile with other versions, the code was updated to use the correct class names (see changes in `tdd` files and related changes). As with any other Hive update, Drill won't be able to work with previous Hive versions, but users who still want to do so can compile Drill manually by setting the following properties: `hive.version=2.3.2` (or any other version in the range [2.3.2-3.1.2]) and `freemarker.conf.file=src/main/codegen/config.fmpp` (or `src/main/codegen/configHive3.fmpp` for versions where the new date/timestamp classes were introduced). Example of usage:
```
mvn clean install -DskipTests -Dhive.version=2.3.2 -Dfreemarker.conf.file=src/main/codegen/config.fmpp
```

## Documentation
The new supported Hive version should be documented.

## Testing
Ran the full test suite; checked manually that Drill is able to select from the new Hive.
[GitHub] [drill] arina-ielchiieva merged pull request #2037: DRILL-7648: Scrypt j_security_check works without security headers
arina-ielchiieva merged pull request #2037: DRILL-7648: Scrypt j_security_check works without security headers URL: https://github.com/apache/drill/pull/2037
[GitHub] [drill] paul-rogers commented on issue #2018: DRILL-7633: Fixes for union and repeated list accessors
paul-rogers commented on issue #2018: DRILL-7633: Fixes for union and repeated list accessors URL: https://github.com/apache/drill/pull/2018#issuecomment-603078801 After rebasing I'm seeing a number of metastore test failures:
```
[ERROR] Errors:
[ERROR]   TestTableMetadataUnitConversion.testBaseTableMetadata:145 » IllegalArgument Un...
[ERROR]   TestTableMetadataUnitConversion.testFileMetadata:280 » IllegalArgument Unable ...
[ERROR]   TestTableMetadataUnitConversion.testPartitionMetadata:383 » IllegalArgument Un...
[ERROR]   TestTableMetadataUnitConversion.testRowGroupMetadata:332 » IllegalArgument Una...
[ERROR]   TestTableMetadataUnitConversion.testSegmentMetadata:237 » IllegalArgument Unab...
```
It is not immediately obvious what went wrong; I'll poke around tomorrow to see if I can find the cause.
[GitHub] [drill] paul-rogers commented on issue #2018: DRILL-7633: Fixes for union and repeated list accessors
paul-rogers commented on issue #2018: DRILL-7633: Fixes for union and repeated list accessors URL: https://github.com/apache/drill/pull/2018#issuecomment-603075717 Squashed commits and rebased on the latest master.
[GitHub] [drill] paul-rogers commented on a change in pull request #2018: DRILL-7633: Fixes for union and repeated list accessors
paul-rogers commented on a change in pull request #2018: DRILL-7633: Fixes for union and repeated list accessors URL: https://github.com/apache/drill/pull/2018#discussion_r396947139

## File path: exec/vector/src/main/java/org/apache/drill/exec/record/metadata/VariantColumnMetadata.java

@@ -96,11 +134,13 @@ public StructureType structureType() {
   public boolean isVariant() { return true; }

   @Override
-  public boolean isArray() { return type() == MinorType.LIST; }
+  public boolean isArray() {
+    return super.isArray() || type() == MinorType.LIST;
+  }

   @Override
   public ColumnMetadata cloneEmpty() {
-    return new VariantColumnMetadata(name, type, variantSchema.cloneEmpty());
+    return new VariantColumnMetadata(name, type, mode, new VariantSchema());

Review comment: @arina-ielchiieva, thanks for the reminder; I did miss your response. I think I see where we had a misunderstanding. I thought the `typeString()` code already worked because I saw this in `AbstractColumnMetadata`:
```
public String typeString() {
  return majorType().toString();
}
```
`PrimitiveColumnMetadata` uses a different set of type names. It seems this area has gotten a bit muddled: the `AbstractColumnMetadata` version seems to never be called, except for `VariantColumnMetadata`. (This confused the heck out of me on an earlier attempt to clean up this area, but I didn't clue into the problem then.) So, now that I understand what you are asking for, I did this:
* Removed the `AbstractColumnMetadata` version.
* Added a `VariantColumnMetadata` version that produces either `UNION` or `ARRAY`.

Until I see a good reason otherwise, I think we should treat `UNION` (what I call a "variant") as an opaque type. I guess I don't see the contents of a variant as something we want to specify, either in the metastore or in a provided schema. For example, can the metastore compute an NDV across an `INT`, `VARCHAR` and `MAP`? Probably not. Nor do the min/max values or other stats make sense for a variant.
As a result, a variant is an opaque type for the metastore. Similarly, in a provided schema, I can't convince myself that the user wants not only to say "the type of this field can vary", but also that "the type can be an `INT`, `DOUBLE` or `VARCHAR`, but not a `BIGINT`." Just does not seem super-useful. An additional concern, which I think I mentioned somewhere else, is that as soon as we start serializing complex types, the SQL-like text format becomes far too cumbersome. We'd be much better off with a JSON format that can capture the complex tree structure. Compare:
```
(A UNION)>)
```
With something like:
```
{ schema: [
    {name: "a", type: {
      name: "UNION", nullable: true,
      subtypes: [
        {name: INT, nullable: false},
        {name: MAP, nullable: false},
          members: [ ...
```
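For illustration, here is a self-contained sketch of the kind of `typeString()` override discussed in this comment: the variant is rendered as an opaque type, wrapped in an array marker when the variant is a repeated LIST. The class shapes and the exact output strings are assumptions made for this example, not Drill's actual code.

```java
// Hypothetical illustration of an opaque typeString() for a variant (UNION)
// column. MinorType and VariantColumn are simplified stand-ins for Drill's
// TypeProtos.MinorType and VariantColumnMetadata.
public class VariantTypeString {
    enum MinorType { UNION, LIST }

    static class VariantColumn {
        final MinorType type;

        VariantColumn(MinorType type) { this.type = type; }

        // A LIST-typed variant is an array of unions.
        boolean isArray() { return type == MinorType.LIST; }

        // Treat the variant as opaque: no subtype list is serialized.
        String typeString() {
            return isArray() ? "ARRAY<UNION>" : "UNION";
        }
    }

    public static void main(String[] args) {
        System.out.println(new VariantColumn(MinorType.UNION).typeString());
        System.out.println(new VariantColumn(MinorType.LIST).typeString());
    }
}
```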
[jira] [Created] (DRILL-7657) Invalid internal link
Aaron-Mhs created DRILL-7657:
--
Summary: Invalid internal link
Key: DRILL-7657
URL: https://issues.apache.org/jira/browse/DRILL-7657
Project: Apache Drill
Issue Type: Bug
Components: Security
Affects Versions: 1.17.0
Reporter: Aaron-Mhs
Attachments: image-2020-03-24-14-57-53-568.png, image-2020-03-24-14-58-17-865.png

There is an invalid link (Prerequisites -> See Enabling Authentication and Encryption) in the Configuring Kerberos Security module of the documentation. After opening it, a 404 is displayed ("The requested URL was not found on this server."), and I hope it can be fixed as soon as possible. !image-2020-03-24-14-57-53-568.png! !image-2020-03-24-14-58-17-865.png!