[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support
[ https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16536535#comment-16536535 ]

ASF GitHub Bot commented on DRILL-6373:
---------------------------------------

paul-rogers edited a comment on issue #1244: DRILL-6373: Refactor Result Set Loader for Union, List support
URL: https://github.com/apache/drill/pull/1244#issuecomment-403358757

Rebased on latest master. Cherry-picked the fix from DRILL-6585 to this branch.

@vrozov or @ppadma - can you kick off a pre-commit build to see if the DRILL-6585 fix resolves the failure we saw earlier? Thanks!

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> Refactor the Result Set Loader to prepare for Union, List support
> -----------------------------------------------------------------
>
>                 Key: DRILL-6373
>                 URL: https://issues.apache.org/jira/browse/DRILL-6373
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.13.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Major
>
> As the next step in merging the "batch sizing" enhancements, refactor the
> {{ResultSetLoader}} and related classes to prepare for Union and List
> support. This fix follows the refactoring of the column accessors for the
> same purpose. Actual Union and List support is to follow in a separate PR.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (DRILL-6585) PartitionSender clones vectors, but shares field metadata
[ https://issues.apache.org/jira/browse/DRILL-6585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16536533#comment-16536533 ]

ASF GitHub Bot commented on DRILL-6585:
---------------------------------------

paul-rogers commented on issue #1367: DRILL-6585: PartitionSender clones vectors, but shares field metadata
URL: https://github.com/apache/drill/pull/1367#issuecomment-403358142

@vrozov, please review. This is the fix for the issue we discussed a few weeks ago.

> PartitionSender clones vectors, but shares field metadata
> ---------------------------------------------------------
>
>                 Key: DRILL-6585
>                 URL: https://issues.apache.org/jira/browse/DRILL-6585
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.13.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Major
>
> See the discussion for [PR #1244 for
> DRILL-6373|https://github.com/apache/drill/pull/1244].
> The PartitionSender clones vectors. But, it does so by reusing the
> {{MaterializedField}} from the original vector. Though the original authors
> of {{MaterializedField}} apparently meant it to be immutable, later changes
> for maps and unions ended up changing it to add members.
> When cloning a map, we get the original map materialized field, then start
> doctoring it up as we add the cloned map members. This screws up the original
> map vector's metadata.
> The solution is to clone an empty version of the materialized field when
> creating a new vector.
> But, since much code creates vectors by giving a perfectly valid, unique
> materialized field, we want to add a new method for use by the ill-behaved
> uses, such as PartitionSender, that ask to create a new vector without
> cloning the materialized field.
[jira] [Commented] (DRILL-6585) PartitionSender clones vectors, but shares field metadata
[ https://issues.apache.org/jira/browse/DRILL-6585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16536528#comment-16536528 ]

ASF GitHub Bot commented on DRILL-6585:
---------------------------------------

paul-rogers opened a new pull request #1367: DRILL-6585: PartitionSender clones vectors, but shares field metadata
URL: https://github.com/apache/drill/pull/1367

See the discussion for [PR #1244 for DRILL-6373](https://github.com/apache/drill/pull/1244).

The PartitionSender clones vectors. But, it does so by reusing the MaterializedField from the original vector. Though the original authors of MaterializedField apparently meant it to be immutable, later changes for maps and unions ended up changing it to add members.

When cloning a map, we get the original map materialized field, then start doctoring it up as we add the cloned map members. This screws up the original map vector's metadata.

The solution is to clone an empty version of the materialized field when creating a new vector.

But, since much code creates vectors by giving a perfectly valid, unique materialized field, we want to add a new method for use by the ill-behaved uses, such as PartitionSender, that ask to create a new vector without cloning the materialized field.

The solution is to add a new method, `TypeHelper.getClonedVector()`, which handles the "new vector from the materialized field of an existing vector" case. Modified the partition sender to use this version. Moved an existing method to group the "new vector" functions.
[jira] [Created] (DRILL-6585) PartitionSender clones vectors, but shares field metadata
Paul Rogers created DRILL-6585:
----------------------------------

             Summary: PartitionSender clones vectors, but shares field metadata
                 Key: DRILL-6585
                 URL: https://issues.apache.org/jira/browse/DRILL-6585
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.13.0
            Reporter: Paul Rogers
            Assignee: Paul Rogers

See the discussion for [PR #1244 for DRILL-6373|https://github.com/apache/drill/pull/1244].

The PartitionSender clones vectors. But, it does so by reusing the {{MaterializedField}} from the original vector. Though the original authors of {{MaterializedField}} apparently meant it to be immutable, later changes for maps and unions ended up changing it to add members.

When cloning a map, we get the original map materialized field, then start doctoring it up as we add the cloned map members. This screws up the original map vector's metadata.

The solution is to clone an empty version of the materialized field when creating a new vector.

But, since much code creates vectors by giving a perfectly valid, unique materialized field, we want to add a new method for use by the ill-behaved uses, such as PartitionSender, that ask to create a new vector without cloning the materialized field.
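
The aliasing problem described above can be illustrated with a small, self-contained sketch. Note that `Field`, `MapVector`, `cloneSharingField()` and `cloneWithEmptyField()` are hypothetical stand-ins invented for this illustration; they are not Drill's `MaterializedField`, its vector classes, or the `TypeHelper` method from the PR. The sketch only shows why reusing a mutable metadata object corrupts the original, and how starting from an empty copy avoids it.

```java
import java.util.ArrayList;
import java.util.List;

public class FieldCloneSketch {

    /** Hypothetical mutable field descriptor; NOT Drill's MaterializedField. */
    static final class Field {
        final String name;
        final List<Field> children = new ArrayList<>();

        Field(String name) { this.name = name; }

        void addChild(Field child) { children.add(child); }

        /** Copies the name only; the copy starts with no children. */
        Field cloneEmpty() { return new Field(name); }
    }

    /** Hypothetical map vector that records its members in its metadata. */
    static final class MapVector {
        final Field field;

        MapVector(Field field) { this.field = field; }

        void addMember(String memberName) { field.addChild(new Field(memberName)); }
    }

    /** Ill-behaved clone: the new vector reuses the original's metadata object. */
    static MapVector cloneSharingField(MapVector original) {
        MapVector copy = new MapVector(original.field);
        // Iterate over a snapshot, because addMember() mutates the shared list.
        for (Field child : new ArrayList<>(original.field.children)) {
            copy.addMember(child.name);
        }
        return copy;
    }

    /** Well-behaved clone: start from an empty copy of the metadata. */
    static MapVector cloneWithEmptyField(MapVector original) {
        MapVector copy = new MapVector(original.field.cloneEmpty());
        for (Field child : original.field.children) {
            copy.addMember(child.name);
        }
        return copy;
    }

    public static void main(String[] args) {
        MapVector original = new MapVector(new Field("m"));
        original.addMember("a");
        original.addMember("b");

        MapVector safe = cloneWithEmptyField(original);
        System.out.println(original.field.children.size()); // 2: original untouched
        System.out.println(safe.field.children.size());     // 2: clone has its own members

        cloneSharingField(original);
        System.out.println(original.field.children.size()); // 4: original metadata corrupted
    }
}
```

In this sketch the shared-field clone doubles the original's member list, which mirrors the "doctoring up" of the original map's metadata described in the issue.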
[jira] [Commented] (DRILL-6346) Create an Official Drill Docker Container
[ https://issues.apache.org/jira/browse/DRILL-6346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16536465#comment-16536465 ]

ASF GitHub Bot commented on DRILL-6346:
---------------------------------------

paul-rogers commented on a change in pull request #1348: DRILL-6346: Create an Official Drill Docker Container
URL: https://github.com/apache/drill/pull/1348#discussion_r200860634

## File path: distribution/Dockerfile ##
@@ -0,0 +1,35 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+FROM centos:7
+
+# Project version defined in pom.xml is passed as an argument
+ARG VERSION
+
+# JDK 8 is a pre-requisite to run Drill ; 'which' package is needed for drill-config.sh
+RUN yum install -y java-1.8.0-openjdk-devel which ; yum clean all ; rm -rf /var/cache/yum
+
+# The drill tarball is generated upon building the Drill project
+COPY target/apache-drill-$VERSION.tar.gz /tmp
+
+# Drill binaries are extracted into the '/opt/drill' directory
+RUN mkdir /opt/drill
+RUN tar -xvzf /tmp/apache-drill-$VERSION.tar.gz --directory=/opt/drill --strip-components 1
+
+# Starts Drill in embedded mode and connects to Sqlline
+ENTRYPOINT /opt/drill/bin/drill-embedded

Review comment:
Wondering how best to configure a true Drill server? Would the user really want to bind the ZK address, say, at container build time (as we suggested above, copying in the user's `drill-override.conf` file)? Or, should we think about how to pass in config?

In K8s, config can be passed in via config maps and mounted into the proper location. Doing so, however, points out a flaw in the site directory structure: config files go into `$DRILL_SITE` but jars go into `$DRILL_SITE/jars`. If `$DRILL_SITE` is mounted from a config map, we hide the jars. We may need to modify the site directory so that the scripts first look in `$DRILL_SITE/conf`, then in `$DRILL_SITE`. In this case, the config map would be mounted into `$DRILL_SITE/conf`.

> Create an Official Drill Docker Container
> -----------------------------------------
>
>                 Key: DRILL-6346
>                 URL: https://issues.apache.org/jira/browse/DRILL-6346
>             Project: Apache Drill
>          Issue Type: Improvement
>            Reporter: Timothy Farkas
>            Assignee: Abhishek Girish
>            Priority: Major
>              Labels: doc-impacting, ready-to-commit
>             Fix For: 1.14.0
[jira] [Commented] (DRILL-6346) Create an Official Drill Docker Container
[ https://issues.apache.org/jira/browse/DRILL-6346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16536468#comment-16536468 ]

ASF GitHub Bot commented on DRILL-6346:
---------------------------------------

paul-rogers commented on a change in pull request #1348: DRILL-6346: Create an Official Drill Docker Container
URL: https://github.com/apache/drill/pull/1348#discussion_r200860402

## File path: distribution/Dockerfile ##
+# Starts Drill in embedded mode and connects to Sqlline
+ENTRYPOINT /opt/drill/bin/drill-embedded

Review comment:
We've all learned that Drill logs are essential to figure out what went wrong. As configured, these will end up in `/opt/drill/log` in the container's writeable layer. The data is lost when the container exits.

Might want to configure the container to write logs to, say, `/var/log/drill`, then encourage the user to do a bind or volume mount to that location in order to persist logs after the container exits.

Alternatively, for use in a system such as K8s, configure the logs to write to stdout so that the K8s log system can capture the log output, display it to the user (`kubectl logs`) or route it to the log aggregation system.
[jira] [Commented] (DRILL-6346) Create an Official Drill Docker Container
[ https://issues.apache.org/jira/browse/DRILL-6346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16536470#comment-16536470 ]

ASF GitHub Bot commented on DRILL-6346:
---------------------------------------

paul-rogers commented on a change in pull request #1348: DRILL-6346: Create an Official Drill Docker Container
URL: https://github.com/apache/drill/pull/1348#discussion_r200860792

## File path: distribution/Dockerfile ##
+# Starts Drill in embedded mode and connects to Sqlline
+ENTRYPOINT /opt/drill/bin/drill-embedded

Review comment:
The `Dockerfile` does not expose ports:

```
EXPOSE 8047/tcp
```

The above is for the web console port. Should we expose others?
[jira] [Commented] (DRILL-6346) Create an Official Drill Docker Container
[ https://issues.apache.org/jira/browse/DRILL-6346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16536472#comment-16536472 ]

ASF GitHub Bot commented on DRILL-6346:
---------------------------------------

paul-rogers commented on a change in pull request #1348: DRILL-6346: Create an Official Drill Docker Container
URL: https://github.com/apache/drill/pull/1348#discussion_r200860927

## File path: distribution/Dockerfile ##
+# Starts Drill in embedded mode and connects to Sqlline
+ENTRYPOINT /opt/drill/bin/drill-embedded

Review comment:
Should we provide a `README.md` file? Explain that this particular `Dockerfile`:

* Runs an embedded Drillbit
* Works only for data already on the class path
* Requires the user to ssh(?) into the container

Explain how to do additional configuration (as discussed in other comments). Explain how to create a production `Dockerfile`.

Else, the user must be a bit of a Docker and Drill expert to work out what would be required, and it won't be clear what goal this particular file is trying to achieve.
[jira] [Commented] (DRILL-6346) Create an Official Drill Docker Container
[ https://issues.apache.org/jira/browse/DRILL-6346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16536469#comment-16536469 ]

ASF GitHub Bot commented on DRILL-6346:
---------------------------------------

paul-rogers commented on a change in pull request #1348: DRILL-6346: Create an Official Drill Docker Container
URL: https://github.com/apache/drill/pull/1348#discussion_r200861000

## File path: distribution/Dockerfile ##
+# Starts Drill in embedded mode and connects to Sqlline
+ENTRYPOINT /opt/drill/bin/drill-embedded

Review comment:
The entry point launches an embedded Drillbit. That in turn fires up Sqlline, which has an interactive console. It would seem that the only way to use this image is to launch it in an interactive session:

```
docker run -it
```

This seems like a handy trick for trying Drill, but it is very limited for production. In any event, the `README.md` file should probably explain how to use the image.
[jira] [Commented] (DRILL-6346) Create an Official Drill Docker Container
[ https://issues.apache.org/jira/browse/DRILL-6346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16536466#comment-16536466 ]

ASF GitHub Bot commented on DRILL-6346:
---------------------------------------

paul-rogers commented on a change in pull request #1348: DRILL-6346: Create an Official Drill Docker Container
URL: https://github.com/apache/drill/pull/1348#discussion_r200860239

## File path: distribution/Dockerfile ##
+# Starts Drill in embedded mode and connects to Sqlline
+ENTRYPOINT /opt/drill/bin/drill-embedded

Review comment:
This `Dockerfile` is only for an embedded Drillbit, which is fine for playing around. Should we also offer an example production `Dockerfile`? In such a file, the following would also be needed:

* Copy in custom jars (UDFs, custom storage plugins).
* Copy in custom libraries (PAM, etc.)
* Specify custom config.

Doing the above will be easier if the user's files are not placed in the `$DRILL_HOME/conf` and other directories. That is, we want to use the site directory feature. Maybe allow a `/opt/drill-site` location and pass `--site /opt/drill-site` on the command line for the entry point.

Then, the `Dockerfile` can provide examples of how to copy the various kinds of files to the proper site directory locations:

* `/opt/drill-site/` - `drill-override.conf`, `core-site.xml`, `logback.xml`
* `/opt/drill-site/jars/` - UDF and storage-plugin jars
* `/opt/drill-site/lib/` - native libraries

Note that the functionality for Drill to find things in these site directory locations already exists. All we're doing here is showing the user how to use them.
[jira] [Commented] (DRILL-6346) Create an Official Drill Docker Container
[ https://issues.apache.org/jira/browse/DRILL-6346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16536471#comment-16536471 ]

ASF GitHub Bot commented on DRILL-6346:
---------------------------------------

paul-rogers commented on a change in pull request #1348: DRILL-6346: Create an Official Drill Docker Container
URL: https://github.com/apache/drill/pull/1348#discussion_r200860662

## File path: distribution/Dockerfile ##
+# Starts Drill in embedded mode and connects to Sqlline
+ENTRYPOINT /opt/drill/bin/drill-embedded

Review comment:
How will secrets be handled? MapR tickets or Kerberos certificates? What other configuration is needed?
[jira] [Commented] (DRILL-6346) Create an Official Drill Docker Container
[ https://issues.apache.org/jira/browse/DRILL-6346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16536467#comment-16536467 ]

ASF GitHub Bot commented on DRILL-6346:
---------------------------------------

paul-rogers commented on a change in pull request #1348: DRILL-6346: Create an Official Drill Docker Container
URL: https://github.com/apache/drill/pull/1348#discussion_r200860284

## File path: distribution/Dockerfile ##
+# Drill binaries are extracted into the '/opt/drill' directory
+RUN mkdir /opt/drill
+RUN tar -xvzf /tmp/apache-drill-$VERSION.tar.gz --directory=/opt/drill --strip-components 1

Review comment:
A handy trick I learned is to do the container setup in a script, say `setup.sh`, that is copied into the container and run. This avoids creating a bunch of extra Docker layers as the set of steps gets larger.
[jira] [Created] (DRILL-6584) Implementing bitmap Indexes
mehran created DRILL-6584:
-

Summary: Implementing bitmap Indexes
Key: DRILL-6584
URL: https://issues.apache.org/jira/browse/DRILL-6584
Project: Apache Drill
Issue Type: Improvement
Components: Query Planning Optimization
Reporter: mehran

I understand that you have your own development priorities, and the support for multiple plugins for Drill connections is appreciated. But your default storage engine is Parquet, which is very good for its kind of purposes. Would it be possible to bring forward the implementation of an index (roaring bitmap indexes, similar to Druid)? Or at least to write a guideline for developing an index on Drill? Full scans are one of Drill's big problems; if that were solved, Drill would be the best SQL engine and could replace databases in many use cases. Thank you in advance.
[jira] [Commented] (DRILL-6385) Support JPPD (Join Predicate Push Down)
[ https://issues.apache.org/jira/browse/DRILL-6385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16536086#comment-16536086 ]

ASF GitHub Bot commented on DRILL-6385:
---

weijietong commented on issue #1334: DRILL-6385: Support JPPD feature
URL: https://github.com/apache/drill/pull/1334#issuecomment-403284219

@amansinha100 RuntimeFilterManager has been changed to support the left-deep tree case that you mentioned in the JIRA. Please also see the JIRA reply and review the updates. Thanks.

> Support JPPD (Join Predicate Push Down)
> ---
>
> Key: DRILL-6385
> URL: https://issues.apache.org/jira/browse/DRILL-6385
> Project: Apache Drill
> Issue Type: New Feature
> Components: Server, Execution - Flow
> Affects Versions: 1.14.0
> Reporter: weijie.tong
> Assignee: weijie.tong
> Priority: Major
>
> This feature is to support JPPD (Join Predicate Push Down). It will benefit HashJoin and Broadcast HashJoin performance by reducing the number of rows sent across the network and the memory consumed. This feature is already supported by Impala, which calls it RuntimeFilter ([https://www.cloudera.com/documentation/enterprise/5-9-x/topics/impala_runtime_filtering.html]).
> The first PR will try to push down a bloom filter from the HashJoin node to Parquet’s scan node. The proposed basic procedure is as follows:
> 1. The HashJoin build side accumulates the rows of the equality join condition to construct a bloom filter. It then sends the bloom filter to the foreman node.
> 2. The foreman node passively accepts the bloom filters from all the fragments that have the HashJoin operator. It then aggregates the bloom filters to form a global bloom filter.
> 3. The foreman node broadcasts the global bloom filter to all the probe-side scan nodes, which may already have sent partial data to the hash join nodes (currently the hash join node prefetches one batch from both sides).
> 4. The scan node accepts the global bloom filter from the foreman node. It then filters the remaining rows, keeping only those that satisfy the bloom filter.
>
> To implement the above execution flow, the main new concepts are described below:
> 1. RuntimeFilter
> It’s a filter container which may contain a BloomFilter or a MinMaxFilter.
> 2. RuntimeFilterReporter
> It wraps the logic to send the hash join’s bloom filter to the foreman. The serialized bloom filter will be sent out through the data tunnel. This object will be instantiated by the FragmentExecutor and passed to the FragmentContext, so the HashJoin operator can obtain it through the FragmentContext.
> 3. RuntimeFilterRequestHandler
> It is responsible for accepting a SendRuntimeFilterRequest RPC and extracting the actual BloomFilter from the network. It then passes this filter to the WorkerBee’s new interface registerRuntimeFilter.
> Another RPC type is BroadcastRuntimeFilterRequest. It will register the accepted global bloom filter with the WorkerBee through the registerRuntimeFilter method; the filter is then propagated to the FragmentContext, through which the probe-side scan node can fetch the aggregated bloom filter.
> 4. RuntimeFilterManager
> The foreman will instantiate a RuntimeFilterManager. It will indirectly get every RuntimeFilter through the WorkerBee. Once all the BloomFilters have been accepted and aggregated, it will broadcast the aggregated bloom filter to all the probe-side scan nodes through the data tunnel with a BroadcastRuntimeFilterRequest RPC.
> 5. RuntimeFilterEnableOption
> A global option will be added to decide whether to enable this new feature.
>
> Suggestions and advice are welcome. The related PR will be presented as soon as possible.
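
The build, aggregate, broadcast, and probe flow described in the issue can be sketched in a few lines. This is only an illustrative, single-process sketch in Python, not Drill's implementation: the class `BloomFilter` and the helpers `build_side_filter`, `aggregate_filters`, and `probe_side_scan` are hypothetical names, the filter size and hash scheme are arbitrary assumptions, and the RPC and data-tunnel steps are replaced by plain function calls.

```python
import hashlib


class BloomFilter:
    """Toy bloom filter (hypothetical; sizes and hashing are assumptions)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit set

    def _positions(self, key):
        # Derive num_hashes bit positions by hashing the key with a seed prefix.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(key))

    def merge(self, other):
        # Two filters with identical parameters combine with a bitwise OR.
        if (self.num_bits, self.num_hashes) != (other.num_bits, other.num_hashes):
            raise ValueError("bloom filters must share size and hash count")
        merged = BloomFilter(self.num_bits, self.num_hashes)
        merged.bits = self.bits | other.bits
        return merged


def build_side_filter(join_keys):
    """Step 1: a build-side fragment summarizes its join keys."""
    bloom = BloomFilter()
    for key in join_keys:
        bloom.add(key)
    return bloom


def aggregate_filters(filters):
    """Step 2: the foreman ORs the per-fragment filters into a global one."""
    filters = list(filters)
    result = BloomFilter(filters[0].num_bits, filters[0].num_hashes)
    for bloom in filters:
        result = result.merge(bloom)
    return result


def probe_side_scan(rows, key_of, global_filter):
    """Steps 3-4: the scan keeps only rows whose key may match the build side."""
    return [row for row in rows if global_filter.might_contain(key_of(row))]


# Two build-side fragments, one foreman, one probe-side scan.
fragment_filters = [build_side_filter([1, 2, 3]), build_side_filter([3, 4])]
global_filter = aggregate_filters(fragment_filters)
probe_rows = [{"id": i} for i in range(100)]
survivors = probe_side_scan(probe_rows, lambda row: row["id"], global_filter)
```

A bloom filter never yields false negatives, so every probe row whose key exists on the build side survives the scan; a few non-matching rows may also pass as false positives, and the hash join itself still discards those.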