[jira] [Created] (DRILL-7660) Modify Drill Dockerfiles

2020-03-24 Thread Abhishek Girish (Jira)
Abhishek Girish created DRILL-7660:
--

 Summary: Modify Drill Dockerfiles
 Key: DRILL-7660
 URL: https://issues.apache.org/jira/browse/DRILL-7660
 Project: Apache Drill
  Issue Type: Sub-task
Reporter: Abhishek Girish
Assignee: Abhishek Girish








[jira] [Created] (DRILL-7659) Add support for Helm Charts based deployment on Kubernetes

2020-03-24 Thread Abhishek Girish (Jira)
Abhishek Girish created DRILL-7659:
--

 Summary: Add support for Helm Charts based deployment on Kubernetes
 Key: DRILL-7659
 URL: https://issues.apache.org/jira/browse/DRILL-7659
 Project: Apache Drill
  Issue Type: Sub-task
Reporter: Abhishek Girish
Assignee: Abhishek Girish








Re: Excessive Memory Use in Parquet Files (From Drill Slack Channel)

2020-03-24 Thread Paul Rogers
Hi Charles,
Thanks for forwarding this. Looks like Idan found the right answer. Still, I 
repeated the analysis and have some suggestions.

I looked at the code mentioned in the message chain. This is a place where our 
error handling could use some work:

  public void allocateNew() throws OutOfMemoryException {
    if (!allocateNewSafe()) {
      throw new OutOfMemoryException();
    }
  }

Some default allocation failed, but we preserve none of the relevant 
information: nothing about the kind of vector, nothing about the cause of the 
failure. There are dozens of implementations of allocateNewSafe(); it is 
impossible to determine which was called.

A typical implementation:
  public boolean allocateNewSafe() {
    long curAllocationSize = ...
    try {
      allocateBytes(curAllocationSize);
    } catch (DrillRuntimeException ex) {
      return false;
    }
    return true;
  }


We catch the exception, then ignore it. Sigh... We can fix all this, but it 
does not help with this specific issue. See DRILL-7658.

As it turns out, most of the implementations are in the generated vector 
classes. These classes, oddly, have their own redundant copy of 
allocateNewSafe(). Since we don't see those methods on the stack, we can 
quickly narrow down the candidates to:

* AbstractMapVector (any map vector)
* A few others that won't occur for Parquet


Given this, the allocation must be failing while allocating a map. Idan 
mentions "one column is an array of a single element comprised of 15 columns". 
We can presume that the "element" is actually a map, and that the map has 15 
columns.

So, it looks like the map allocation failed in the partition sender (the next 
element on the stack). The partition sender takes incoming batches (presumably 
from the scan, though the stack trace does not say because we're at the root of 
the DAG) and splits them by key to destination nodes.

Idan mentions the query runs on a single machine, so the partitions go only to 
threads on that same machine. Idan also mentions a 16-core machine. Since Drill 
parallelizes queries to 70% of the cores, we may be running 11 threads, so each 
partition sender tries to buffer data for 11 receivers. Each will buffer three 
batches of data, for a total of 33 batches.

Next, we need to know how many records are in each batch. It seems we have two 
default values, defined in drill-override.conf:


    store.parquet.flat.batch.num_records: 32767,
    store.parquet.complex.batch.num_records: 4000,


If we think the record has a map, then perhaps Parquet chooses the "complex" 
count of 4000 records? I think this can be checked by looking at the query 
profile which, if I recall, should be produced even for a failed query.

So, let's guess 4000 records * 33 buffered batches = 132K records. We don't 
know the size of each, however. (And note that Idan said he artificially 
increased parallelism, so the buffering need is greater than the above 
back-of-the-envelope calcs.)


We do know the size of the data: 15 files of 150K each. Let's assume that is 
compressed. So, if all files are in memory, that would be 15 * 150K * 10:1 
compression ratio = roughly 22 MB, which is tiny. It is therefore unlikely that 
Drill is actually buffering all 33 batches. This tells us that something else 
is going wrong: we are not actually running out of memory for data; just as 
Idan suggested, we are exhausting memory for some other reason.


Reading further, it looks like Idan found his own solution. He increased 
parallelism to the point where the internal buffering of each Parquet reader 
used up all available memory. This is probably a bug, but Parquet is a 
fiendishly complex beast. Over time, people threw all kinds of parallel 
readers, buffering and other things at it to beat Impala in TPC benchmarks.

Since a query that finishes is faster than a highly-tuned query that crashes, 
I'd recommend throttling the slice count back. You really only need as many 
slices as there are cores; in fact, you need fewer. Unlike other readers, 
Parquet launches a bunch of its own parallel readers, so each single Parquet 
reader will have many parallel column readers (I don't recall the number), each 
aggressively buffering everything it can.

Since the data is small, there is no need for such heroics: Drill can read 20+ 
meg of data quite quickly, even with a few threads. So, try that first and see 
if that works.

Once the query works, study the query profile to determine the memory budget 
and CPU usage. Tune from there, keeping memory well within the available bounds.


Thanks,
- Paul

 

On Tuesday, March 24, 2020, 11:46:47 AM PDT, Charles Givre 
 wrote:  
 
 
Idan Sheinberg  8:21 AM
Hi there
I'm trying to run a simple offset query (ORDER BY timestamp LIMIT 500 OFFSET 1000) 
against rather complex parquet files (say 4 columns, one being an array 
currently consisting of a single element comprised of 15 columns).
All files share the same Schema, of course.
 User Error Occurred: One or more nodes ran out 

[jira] [Created] (DRILL-7658) Vector allocateNew() has poor error reporting

2020-03-24 Thread Paul Rogers (Jira)
Paul Rogers created DRILL-7658:
--

 Summary: Vector allocateNew() has poor error reporting
 Key: DRILL-7658
 URL: https://issues.apache.org/jira/browse/DRILL-7658
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.17.0
Reporter: Paul Rogers


See posting by Charles on 2020-03-24 on the user and dev lists of a message 
forwarded from another user where a query ran out of memory. Stack trace:

{noformat}
Caused by: org.apache.drill.exec.exception.OutOfMemoryException: null
    at 
org.apache.drill.exec.vector.complex.AbstractContainerVector.allocateNew(AbstractContainerVector.java:59)
    at 
org.apache.drill.exec.test.generated.PartitionerGen5$OutgoingRecordBatch.allocateOutgoingRecordBatch(PartitionerTemplate.
{noformat}

Notice the complete lack of context. The method in question:

{code:java}
  public void allocateNew() throws OutOfMemoryException {
    if (!allocateNewSafe()) {
      throw new OutOfMemoryException();
    }
  }
{code}

A generated implementation of the {{allocateNewSafe()}} method:

{code:java}
  @Override
  public boolean allocateNewSafe() {
    long curAllocationSize = allocationSizeInBytes;
    if (allocationMonitor > 10) {
      curAllocationSize = Math.max(8, curAllocationSize / 2);
      allocationMonitor = 0;
    } else if (allocationMonitor < -2) {
      curAllocationSize = allocationSizeInBytes * 2L;
      allocationMonitor = 0;
    }

    try {
      allocateBytes(curAllocationSize);
    } catch (DrillRuntimeException ex) {
      return false;
    }
    return true;
  }
{code}

Note that the {{allocateNew()}} method is not "safe" (it throws an exception), 
but it does so by discarding the underlying exception. Instead, the "non-safe" 
{{allocateNew()}} should call the {{allocateBytes()}} method and simply forward 
the {{DrillRuntimeException}}. It probably does not do so because the author 
wanted to reuse the extra size calcs in {{allocateNewSafe()}}.

The solution is to put the calcs and the call to {{allocateBytes()}} in a 
"non-safe" method, and call that method from both {{allocateNew()}} and 
{{allocateNewSafe()}}. Or, better, generate {{allocateNew()}} using the above 
code, but have the base class define {{allocateNewSafe()}} as a wrapper. A 
sketch of the first option follows.
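
Here is a minimal sketch of the first option. The helper name 
{{allocateNewUnsafe()}} is hypothetical, and the sketch assumes an 
{{OutOfMemoryException}} constructor that accepts both a message and a cause:

{code:java}
  // Hypothetical helper: performs the size calcs and lets the underlying
  // DrillRuntimeException escape rather than swallowing it.
  private void allocateNewUnsafe() {
    long curAllocationSize = allocationSizeInBytes;
    if (allocationMonitor > 10) {
      curAllocationSize = Math.max(8, curAllocationSize / 2);
      allocationMonitor = 0;
    } else if (allocationMonitor < -2) {
      curAllocationSize = allocationSizeInBytes * 2L;
      allocationMonitor = 0;
    }
    allocateBytes(curAllocationSize); // may throw DrillRuntimeException
  }

  @Override
  public void allocateNew() throws OutOfMemoryException {
    try {
      allocateNewUnsafe();
    } catch (DrillRuntimeException ex) {
      // Preserve the cause and identify the vector class in the message.
      throw new OutOfMemoryException(
          "Allocation failed for " + getClass().getSimpleName(), ex);
    }
  }

  @Override
  public boolean allocateNewSafe() {
    try {
      allocateNewUnsafe();
      return true;
    } catch (DrillRuntimeException ex) {
      return false;
    }
  }
{code}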

Note an extra complexity: although the base class provides the method shown 
above, each generated vector also provides:

{code:java}
  @Override
  public void allocateNew() {
    if (!allocateNewSafe()) {
      throw new OutOfMemoryException("Failure while allocating buffer.");
    }
  }
{code}

This is both redundant and inconsistent (one has a message, the other does 
not).





Excessive Memory Use in Parquet Files (From Drill Slack Channel)

2020-03-24 Thread Charles Givre


Idan Sheinberg  8:21 AM
Hi there
I'm trying to run a simple offset query (ORDER BY timestamp LIMIT 500 OFFSET 1000) 
against rather complex parquet files (say 4 columns, one being an array 
currently consisting of a single element comprised of 15 columns).
All files share the same Schema, of course.
 User Error Occurred: One or more nodes ran out of memory while executing the 
query. (null)
org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: One or more 
nodes ran out of memory while executing the query.
null
[Error Id: 67b61fc9-320f-47a1-8718-813843a10ecc ]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:657)
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:338)
at 
org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.drill.exec.exception.OutOfMemoryException: null
at 
org.apache.drill.exec.vector.complex.AbstractContainerVector.allocateNew(AbstractContainerVector.java:59)
at 
org.apache.drill.exec.test.generated.PartitionerGen5$OutgoingRecordBatch.allocateOutgoingRecordBatch(PartitionerTemplate.java:380)
at 
org.apache.drill.exec.test.generated.PartitionerGen5$OutgoingRecordBatch.initializeBatch(PartitionerTemplate.java:400)
at 
org.apache.drill.exec.test.generated.PartitionerGen5.setup(PartitionerTemplate.java:126)
at 
org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.createClassInstances(PartitionSenderRootExec.java:263)
at 
org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.createPartitioner(PartitionSenderRootExec.java:218)
at 
org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext(PartitionSenderRootExec.java:188)
at 
org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:93)
at 
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:323)
at 
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:310)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:310)
... 4 common frames omitted
Now, I'm running this query on a 16-core, 32 GB RAM machine, with the heap 
sized at 20 GB, Eden sized at 16 GB (added manually to JAVA_OPTS), and direct 
memory sized at 8 GB.
By querying sys.memory I can confirm all limits apply. At no point throughout 
the query am I nearing the memory limit of the heap, direct memory, or the OS 
itself





8:25
However, due to the way 
org.apache.drill.exec.vector.complex.AbstractContainerVector.allocateNew is 
implemented
8:27
  @Override
  public void allocateNew() throws OutOfMemoryException {
    if (!allocateNewSafe()) {
      throw new OutOfMemoryException();
    }
  }
8:27
The actual exception/error is swallowed, and I have no idea what the cause of 
the failure is
8:28
The data-set itself consists of, say, 15 parquet files, each one weighing in at 
about 100 KB
8:30
but as mentioned earlier, the parquet files are a bit more complex than usual.
8:32
@cgivre @Vova Vysotskyi is there anything I can do or tweak to make this error 
go away?

cgivre  8:40 AM
Hmm...
8:40
This may be a bug.  Can you create an issue on our JIRA board?

Idan Sheinberg  8:43 AM
Sure
8:43
I'll get to it

cgivre  8:44 AM
I'd like for Paul Rogers to see this as I think he was the author of some of 
this.

Idan Sheinberg  8:44 AM
Hmm. I'll keep that in mind

cgivre  8:47 AM
We've been refactoring some of the complex readers as well, so it's possible 
that's what caused this, but I'm not really sure.
8:47
What version of Drill?

cgivre  9:11 AM
This kind of info is super helpful as we're trying to work out all these 
details.
9:11
Reading schemas on the fly is not trivial, so when we find issues, we do like 
to resolve them

Idan Sheinberg  9:16 AM
This is Drill 0.18-SNAPSHOT as of last month
9:16
U
9:16
I do think I managed to resolve the issue however
9:16
I'm going to run some additional tests and let you know

cgivre  9:16 AM
What did you do?
9:17
You might want to rebase with today's build as well

Idan Sheinberg  9:21 AM
I'll come back with the details in a few moments

cgivre  9:38 AM
Thx

Idan Sheinberg  9:50 AM
Ok. So it seems as though it's a combination of a few things.
The data-set in question is still small (as mentioned before), but we are 
setting planner.slice_target to an extremely low value in order to trigger 

[GitHub] [drill] paul-rogers commented on issue #2018: DRILL-7633: Fixes for union and repeated list accessors

2020-03-24 Thread GitBox
paul-rogers commented on issue #2018: DRILL-7633: Fixes for union and repeated 
list accessors
URL: https://github.com/apache/drill/pull/2018#issuecomment-603434987
 
 
   @arina-ielchiieva, thanks for the suggestions, they solved the problem. I 
suspect we have some confusion with those methods based on earlier testing, but 
we'll fix those issues later.
   
   Squashed commits and reran all unit tests. We should be good to go. Thanks 
again for your help.




[GitHub] [drill] cgivre commented on issue #2038: DRILL-6604: Upgrade Drill Hive client to Hive3.1 version

2020-03-24 Thread GitBox
cgivre commented on issue #2038: DRILL-6604: Upgrade Drill Hive client to 
Hive3.1 version
URL: https://github.com/apache/drill/pull/2038#issuecomment-603251228
 
 
   @vvysotskyi 
   Thanks for the response. The reason I ask is that many enterprises are at 
the mercy of IT teams and in many cases are forced to use older versions of 
tools like Hive.  Often, upgrade schedules are years behind the most current 
version.  (I'm speaking from experience here... ;-))
   
   Obviously we should do our best to support the most current version of 
common platforms like Hive; however, I think you would be surprised at how many 
large enterprises still use very old versions of these tools.




[GitHub] [drill] vvysotskyi commented on issue #2038: DRILL-6604: Upgrade Drill Hive client to Hive3.1 version

2020-03-24 Thread GitBox
vvysotskyi commented on issue #2038: DRILL-6604: Upgrade Drill Hive client to 
Hive3.1 version
URL: https://github.com/apache/drill/pull/2038#issuecomment-603237847
 
 
   @cgivre,
   1. Building Drill for different Hive versions is a workaround; we shouldn't 
recommend it to users if other alternatives are available, so I don't think we 
should document it. Also, when all supported profiles are moved to this version 
of Hive, the code will be cleaned up to remove the hacks which allowed using 
older versions.
   2. I don't see a way to achieve this. The issue here is the interaction 
between Hive jars of different versions. I assume that besides API changes 
there are also Thrift format changes.
   3. I don't think so. The code was changed when updating the version to 
2.3.2, so I think it is still incompatible.
   




[GitHub] [drill] cgivre commented on issue #2038: DRILL-6604: Upgrade Drill Hive client to Hive3.1 version

2020-03-24 Thread GitBox
cgivre commented on issue #2038: DRILL-6604: Upgrade Drill Hive client to 
Hive3.1 version
URL: https://github.com/apache/drill/pull/2038#issuecomment-603223087
 
 
   @vvysotskyi 
   Thanks for submitting this.  A few questions:
   1.  Can we please include the bit about building Drill for different Hive 
versions in the documentation?
   2.  Is there any way to write this such that users will not have to build a 
dedicated version of Drill for each version of Hive, or are the Hive APIs so 
different that it is not practical?
   3.  If we require that Drill be built specifically for a given version of 
Hive, is it possible to support versions older than `2.3.2`?
   
   




[GitHub] [drill] cgivre commented on issue #2038: DRILL-6604: Upgrade Drill Hive client to Hive3.1 version

2020-03-24 Thread GitBox
cgivre commented on issue #2038: DRILL-6604: Upgrade Drill Hive client to 
Hive3.1 version
URL: https://github.com/apache/drill/pull/2038#issuecomment-603219947
 
 
   @vvysotskyi 
   Thank you for submitting this!  I am wondering whether we test for backwards 
compatibility with older versions of Hive as we upgrade?  What is the earliest 
version we currently support?
   Thanks!




[GitHub] [drill] cgivre removed a comment on issue #2038: DRILL-6604: Upgrade Drill Hive client to Hive3.1 version

2020-03-24 Thread GitBox
cgivre removed a comment on issue #2038: DRILL-6604: Upgrade Drill Hive client 
to Hive3.1 version
URL: https://github.com/apache/drill/pull/2038#issuecomment-603219947
 
 
   @vvysotskyi 
   Thank you for submitting this!  I am wondering whether we test for backwards 
compatibility with older versions of Hive as we upgrade?  What is the earliest 
version we currently support?
   Thanks!




[GitHub] [drill] arina-ielchiieva edited a comment on issue #2018: DRILL-7633: Fixes for union and repeated list accessors

2020-03-24 Thread GitBox
arina-ielchiieva edited a comment on issue #2018: DRILL-7633: Fixes for union 
and repeated list accessors
URL: https://github.com/apache/drill/pull/2018#issuecomment-603206563
 
 
   @paul-rogers thanks for addressing the code review comments. I have left 
comments on how to fix the unit test failures; please apply them and squash the 
commits.




[GitHub] [drill] arina-ielchiieva commented on issue #2018: DRILL-7633: Fixes for union and repeated list accessors

2020-03-24 Thread GitBox
arina-ielchiieva commented on issue #2018: DRILL-7633: Fixes for union and 
repeated list accessors
URL: https://github.com/apache/drill/pull/2018#issuecomment-603206563
 
 
   @paul-rogers thanks for addressing the code review comments. I have left 
comments on how to fix the unit test failures; please apply them and squash the 
commits.




[GitHub] [drill] arina-ielchiieva commented on a change in pull request #2018: DRILL-7633: Fixes for union and repeated list accessors

2020-03-24 Thread GitBox
arina-ielchiieva commented on a change in pull request #2018: DRILL-7633: Fixes 
for union and repeated list accessors
URL: https://github.com/apache/drill/pull/2018#discussion_r397109336
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/record/metadata/VariantColumnMetadata.java
 ##
 @@ -145,16 +147,28 @@ public ColumnMetadata cloneEmpty() {
 
   @Override
   public ColumnMetadata copy() {
-// TODO Auto-generated method stub
-assert false;
-return null;
+return new VariantColumnMetadata(name, type, mode, variantSchema.copy());
   }
 
   @Override
   public VariantMetadata variantSchema() {
 return variantSchema;
   }
 
+  @JsonProperty("type")
 
 Review comment:
   Remove the JSON property annotation, as it will be present in the parent 
class.




[GitHub] [drill] arina-ielchiieva commented on a change in pull request #2018: DRILL-7633: Fixes for union and repeated list accessors

2020-03-24 Thread GitBox
arina-ielchiieva commented on a change in pull request #2018: DRILL-7633: Fixes 
for union and repeated list accessors
URL: https://github.com/apache/drill/pull/2018#discussion_r397109112
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/record/metadata/AbstractColumnMetadata.java
 ##
 @@ -295,12 +295,6 @@ public String toString() {
 .toString();
   }
 
-  @JsonProperty("type")
-  @Override
-  public String typeString() {
-return majorType().toString();
-  }
-
 
 Review comment:
   Please don't remove this method; instead, leave it abstract with the JSON 
property annotation:
   ```
 @JsonProperty("type")
 @Override
 public abstract String typeString();
   ```




[GitHub] [drill] vvysotskyi opened a new pull request #2038: DRILL-6604: Upgrade Drill Hive client to Hive3.1 version

2020-03-24 Thread GitBox
vvysotskyi opened a new pull request #2038: DRILL-6604: Upgrade Drill Hive 
client to Hive3.1 version
URL: https://github.com/apache/drill/pull/2038
 
 
   # [DRILL-6604](https://issues.apache.org/jira/browse/DRILL-6604): Upgrade 
Drill Hive client to Hive3.1 version
   
   ## Description
   
   One of the major changes in this PR is cleaning up the output of the Hive 
tests. Now, almost no Hive-related output is printed to stdout.
   
   Additionally, the Hive version was updated to the latest (at the current 
time) version, 3.1.2.
   The new Hive version introduced new `ObjectInspector` classes for date and 
timestamp values, so to be able to compile with other versions, the code was 
updated to use the correct class names (see changes in the `tdd` files and 
related changes).
   
   As with any other Hive update, Drill won't be able to work with previous 
Hive versions out of the box, but users who still want to do so can compile 
Drill manually, setting the following properties:
   `hive.version=2.3.2` (or any other version in the range [2.3.2-3.1.2]) and 
`freemarker.conf.file=src/main/codegen/config.fmpp` (or 
`src/main/codegen/configHive3.fmpp` for versions where the new date/timestamp 
classes were introduced).
   Example of usage:
   ```
   mvn clean install -DskipTests -Dhive.version=2.3.2 
-Dfreemarker.conf.file=src/main/codegen/config.fmpp
   ```
   
   ## Documentation
   A new supported Hive version should be documented.
   
   ## Testing
   Ran the full test suite, and checked manually that Drill is able to select 
from the new Hive version.
   




[GitHub] [drill] arina-ielchiieva merged pull request #2037: DRILL-7648: Scrypt j_security_check works without security headers

2020-03-24 Thread GitBox
arina-ielchiieva merged pull request #2037: DRILL-7648: Scrypt j_security_check 
works without security headers
URL: https://github.com/apache/drill/pull/2037
 
 
   




[GitHub] [drill] paul-rogers commented on issue #2018: DRILL-7633: Fixes for union and repeated list accessors

2020-03-24 Thread GitBox
paul-rogers commented on issue #2018: DRILL-7633: Fixes for union and repeated 
list accessors
URL: https://github.com/apache/drill/pull/2018#issuecomment-603078801
 
 
   After rebasing I'm seeing a number of metastore test failures:
   ```
   [ERROR] Errors: 
   [ERROR]   TestTableMetadataUnitConversion.testBaseTableMetadata:145 » 
IllegalArgument Un...
   [ERROR]   TestTableMetadataUnitConversion.testFileMetadata:280 » 
IllegalArgument Unable ...
   [ERROR]   TestTableMetadataUnitConversion.testPartitionMetadata:383 » 
IllegalArgument Un...
   [ERROR]   TestTableMetadataUnitConversion.testRowGroupMetadata:332 » 
IllegalArgument Una...
   [ERROR]   TestTableMetadataUnitConversion.testSegmentMetadata:237 » 
IllegalArgument Unab...
   ```
   
   It is not immediately obvious what went wrong; I'll poke around tomorrow to 
see if I can find the cause.




[GitHub] [drill] paul-rogers commented on issue #2018: DRILL-7633: Fixes for union and repeated list accessors

2020-03-24 Thread GitBox
paul-rogers commented on issue #2018: DRILL-7633: Fixes for union and repeated 
list accessors
URL: https://github.com/apache/drill/pull/2018#issuecomment-603075717
 
 
   Squashed commits and rebased on the latest master.




[GitHub] [drill] paul-rogers commented on a change in pull request #2018: DRILL-7633: Fixes for union and repeated list accessors

2020-03-24 Thread GitBox
paul-rogers commented on a change in pull request #2018: DRILL-7633: Fixes for 
union and repeated list accessors
URL: https://github.com/apache/drill/pull/2018#discussion_r396947139
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/record/metadata/VariantColumnMetadata.java
 ##
 @@ -96,11 +134,13 @@ public StructureType structureType() {
   public boolean isVariant() { return true; }
 
   @Override
-  public boolean isArray() { return type() == MinorType.LIST; }
+  public boolean isArray() {
+return super.isArray() || type() == MinorType.LIST;
+  }
 
   @Override
   public ColumnMetadata cloneEmpty() {
-return new VariantColumnMetadata(name, type, variantSchema.cloneEmpty());
+return new VariantColumnMetadata(name, type, mode, new VariantSchema());
 
 Review comment:
   @arina-ielchiieva, thanks for the reminder; I did miss your response.
   
   I think I see where we had a misunderstanding. I thought the `typeString()` 
code already worked because I saw this in `AbstractColumnMetadata`:
   
   ```
 public String typeString() {
   return majorType().toString();
 }
   ```
   
   However, `PrimitiveColumnMetadata` uses a different set of type names.
   
   It seems this area has gotten a bit muddled: the  `AbstractColumnMetadata` 
version seems to never be called, except for `VariantColumnMetadata`. (This 
confused the heck out of me on an earlier attempt to clean up this area, but I 
didn't clue into the problem then.)
   
   So, now that I understand what you are asking for, I did this:
   
   * Removed the `AbstractColumnMetadata` version.
   * Added a `VariantColumnMetadata` version that produces either `UNION` or 
`ARRAY<UNION>` (see the sketch below).
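   
   For illustration only, the new override might look something like this 
sketch (not necessarily the committed code):
   
   ```
     @Override
     public String typeString() {
       // A variant is either a plain UNION or, for a repeated list, ARRAY<UNION>.
       return isArray() ? "ARRAY<UNION>" : "UNION";
     }
   ```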
   
   Until I see a good reason otherwise, I think we should treat `UNION` (what I 
call a "variant") as an opaque type. I guess I don't see the contents of a 
variant as something we want to specify, either in the metastore or in a 
provided schema. For example, can the metastore compute an NDV across an `INT`, 
`VARCHAR` and `MAP`? Probably not. Nor do the min/max values or other stats 
make sense for a variant. As a result, a variant is an opaque type for the 
metastore.
   
   Similarly, in a provided schema, I can't convince myself that the user wants 
not only to say "the type of this field can vary", but also that "the type can 
be an `INT`, `DOUBLE` or `VARCHAR`, but not a `BIGINT`". That just does not 
seem super-useful.
   
   An additional concern, which I think I mentioned somewhere else, is that as 
soon as we start serializing complex types, the SQL-like text format becomes 
far too cumbersome. We'd be much better off with a JSON format that can capture 
the complex tree structure. Compare:
   
   ```
   (A UNION<...>)
   ```
   With something like:
   ```
   { schema: [
     { name: "a",
       type: {
         name: "UNION",
         nullable: true,
         subtypes: [
           { name: INT, nullable: false },
           { name: MAP, nullable: false },
           members: [ ...
   ```




[jira] [Created] (DRILL-7657) Invalid internal link

2020-03-24 Thread Aaron-Mhs (Jira)
Aaron-Mhs created DRILL-7657:


 Summary: Invalid internal link
 Key: DRILL-7657
 URL: https://issues.apache.org/jira/browse/DRILL-7657
 Project: Apache Drill
  Issue Type: Bug
  Components: Security
Affects Versions: 1.17.0
Reporter: Aaron-Mhs
 Attachments: image-2020-03-24-14-57-53-568.png, 
image-2020-03-24-14-58-17-865.png

There is an invalid link (Prerequisites -> See Enabling Authentication and 
Encryption) in the Configuring Kerberos Security section of the documentation. 
After opening it, a 404 is displayed (The requested URL was not found on this 
server), and I hope it can be fixed as soon as possible.

!image-2020-03-24-14-57-53-568.png!

!image-2020-03-24-14-58-17-865.png!


