[GitHub] drill pull request #950: DRILL-5431: SSL Support
Github user sohami commented on a diff in the pull request: https://github.com/apache/drill/pull/950#discussion_r140914589 --- Diff: contrib/native/client/src/clientlib/zkCluster.cpp --- @@ -0,0 +1,175 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#include "drill/common.hpp" +#include +#ifdef _WIN32 +#include +#else +#include +#endif +#include "drill/drillConfig.hpp" +#include "drill/drillClient.hpp" +#include "errmsgs.hpp" +#include "logger.hpp" +#include "zkCluster.hpp" + +namespace Drill{ + +char ZkCluster::s_drillRoot[]="/drill/"; +char ZkCluster::s_defaultCluster[]="drillbits1"; + +ZkCluster::ZkCluster(){ +m_pDrillbits=new String_vector; +srand (time(NULL)); +m_bConnecting=true; +memset(&m_id, 0, sizeof(m_id)); +} + +ZkCluster::~ZkCluster(){ +delete m_pDrillbits; +} + +ZooLogLevel ZkCluster::getZkLogLevel(){ +//typedef enum {ZOO_LOG_LEVEL_ERROR=1, +//ZOO_LOG_LEVEL_WARN=2, +//ZOO_LOG_LEVEL_INFO=3, +//ZOO_LOG_LEVEL_DEBUG=4 +//} ZooLogLevel; +switch(DrillClientConfig::getLogLevel()){ +case LOG_TRACE: +case LOG_DEBUG: +return ZOO_LOG_LEVEL_DEBUG; +case LOG_INFO: +return ZOO_LOG_LEVEL_INFO; +case LOG_WARNING: +return ZOO_LOG_LEVEL_WARN; +case LOG_ERROR: +case LOG_FATAL: +default: +return ZOO_LOG_LEVEL_ERROR; +} +return ZOO_LOG_LEVEL_ERROR; +} + +int ZkCluster::connectToZookeeper(const char* connectStr, const char* pathToDrill){ +uint32_t waitTime=3; // 10 seconds --- End diff -- 10 seconds ? As discussed in person please remove these files as per your plan: `zkCluster.cpp and zkCluster.hpp` ---
[GitHub] drill pull request #950: DRILL-5431: SSL Support
Github user sohami commented on a diff in the pull request: https://github.com/apache/drill/pull/950#discussion_r140893808 --- Diff: contrib/native/client/src/include/drill/userProperties.hpp --- @@ -0,0 +1,73 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +#ifndef USER_PROPERTIES_H +#define USER_PROPERTIES_H + +#include +#include "drill/common.hpp" + +namespace Drill{ + +class DECLSPEC_DRILL_CLIENT DrillUserProperties{ +public: +static const std::map USER_PROPERTIES; + +DrillUserProperties(){}; + +void setProperty( const std::string& propName, const std::string& propValue){ +std::pair< std::string, std::string> in = make_pair(propName, propValue); +m_properties.insert(in); +} + +size_t size() const { return m_properties.size(); } + +const bool isPropSet(const std::string& key) const{ +bool isSet=true; +std::map::const_iterator f=m_properties.find(key); +if(f==m_properties.end()){ +isSet=false; +} +return isSet; +} + +const std::string& getProp(const std::string& key, std::string& value) const{ --- End diff -- this method is little confusing since it's returning value both in input parameter and as a return value. I think we should choose either of it NOT both. ---
[GitHub] drill pull request #950: DRILL-5431: SSL Support
Github user sohami commented on a diff in the pull request: https://github.com/apache/drill/pull/950#discussion_r141215252 --- Diff: contrib/native/client/src/clientlib/channel.cpp --- @@ -0,0 +1,452 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#include +#include + +#include "drill/drillConfig.hpp" +#include "drill/drillError.hpp" +#include "drill/userProperties.hpp" +#include "channel.hpp" +#include "errmsgs.hpp" +#include "logger.hpp" +#include "utils.hpp" +#include "zookeeperClient.hpp" + +#include "GeneralRPC.pb.h" + +namespace Drill{ + +ConnectionEndpoint::ConnectionEndpoint(const char* connStr){ +m_connectString=connStr; +m_pError=NULL; +} + +ConnectionEndpoint::ConnectionEndpoint(const char* host, const char* port){ +m_host=host; +m_port=port; +m_protocol="drillbit"; // direct connection +m_pError=NULL; +} + +ConnectionEndpoint::~ConnectionEndpoint(){ +if(m_pError!=NULL){ +delete m_pError; m_pError=NULL; +} +} + +connectionStatus_t ConnectionEndpoint::getDrillbitEndpoint(){ +connectionStatus_t ret=CONN_SUCCESS; +if(!m_connectString.empty()){ +parseConnectString(); +if(m_protocol.empty()){ +return handleError(CONN_INVALID_INPUT, getMessage(ERR_CONN_UNKPROTO, "")); +} +if(isZookeeperConnection()){ +if((ret=getDrillbitEndpointFromZk())!=CONN_SUCCESS){ +return ret; +} +}else if(!this->isDirectConnection()){ +return handleError(CONN_INVALID_INPUT, getMessage(ERR_CONN_UNKPROTO, this->getProtocol().c_str())); +} +}else{ +if(m_host.empty() || m_port.empty()){ +return handleError(CONN_INVALID_INPUT, getMessage(ERR_CONN_NOCONNSTR)); +} +} +return ret; +} + +void ConnectionEndpoint::parseConnectString(){ +boost::regex connStrExpr("(.*)=(.*):([0-9]+)(?:/(.+))?"); --- End diff -- Haven't reviewed this change based on regex change discussed in person. ---
[GitHub] drill pull request #950: DRILL-5431: SSL Support
Github user sohami commented on a diff in the pull request: https://github.com/apache/drill/pull/950#discussion_r140621499 --- Diff: exec/rpc/src/main/java/org/apache/drill/exec/rpc/ConnectionMultiListener.java --- @@ -0,0 +1,240 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.rpc; + +import com.google.protobuf.MessageLite; +import io.netty.buffer.ByteBuf; +import io.netty.channel.Channel; +import io.netty.channel.ChannelFuture; +import io.netty.util.concurrent.Future; +import io.netty.util.concurrent.GenericFutureListener; +import org.apache.drill.common.exceptions.DrillException; +import org.slf4j.Logger; + +import java.net.SocketAddress; +import java.util.concurrent.TimeUnit; + +/** + * @param Client Connection Listener + * @param Outbound handshake message type + * @param Inbound handshake message type + * @param BasicClient type + * + * Implements a wrapper class that allows a client connection to associate different behaviours after + * establishing a connection with the server. The client can choose to send an application handshake, or + * in the case of SSL, wait for a SSL handshake completion and then send an application handshake. + */ + +public class ConnectionMultiListener { --- End diff -- I am seeing lots of unchecked warning for this file. Please fix those. ---
[GitHub] drill pull request #950: DRILL-5431: SSL Support
Github user sohami commented on a diff in the pull request: https://github.com/apache/drill/pull/950#discussion_r140621635 --- Diff: exec/rpc/src/main/java/org/apache/drill/exec/rpc/BasicClient.java --- @@ -179,6 +203,13 @@ void send(RpcOutcomeListener listener, T rpcType, SEND protobufBody, return super.send(connection, rpcType, protobufBody, clazz, dataBodies); } + public void send( + RpcOutcomeListener listener, SEND protobufBody, boolean allowInEventLoop, + ByteBuf... dataBodies) { +super.send(listener, connection, handshakeType, protobufBody, (Class) responseClass, +allowInEventLoop, dataBodies); --- End diff -- Seeing unchecked warning for this function. Please resolve this. ---
[GitHub] drill pull request #950: DRILL-5431: SSL Support
Github user sohami commented on a diff in the pull request: https://github.com/apache/drill/pull/950#discussion_r141215109 --- Diff: contrib/native/client/src/clientlib/channel.cpp --- @@ -0,0 +1,452 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#include +#include + +#include "drill/drillConfig.hpp" +#include "drill/drillError.hpp" +#include "drill/userProperties.hpp" +#include "channel.hpp" +#include "errmsgs.hpp" +#include "logger.hpp" +#include "utils.hpp" +#include "zookeeperClient.hpp" + +#include "GeneralRPC.pb.h" + +namespace Drill{ + +ConnectionEndpoint::ConnectionEndpoint(const char* connStr){ +m_connectString=connStr; +m_pError=NULL; +} + +ConnectionEndpoint::ConnectionEndpoint(const char* host, const char* port){ +m_host=host; +m_port=port; +m_protocol="drillbit"; // direct connection +m_pError=NULL; +} + +ConnectionEndpoint::~ConnectionEndpoint(){ +if(m_pError!=NULL){ +delete m_pError; m_pError=NULL; +} +} + +connectionStatus_t ConnectionEndpoint::getDrillbitEndpoint(){ +connectionStatus_t ret=CONN_SUCCESS; +if(!m_connectString.empty()){ +parseConnectString(); +if(m_protocol.empty()){ +return handleError(CONN_INVALID_INPUT, getMessage(ERR_CONN_UNKPROTO, "")); +} +if(isZookeeperConnection()){ +if((ret=getDrillbitEndpointFromZk())!=CONN_SUCCESS){ +return ret; +} +}else if(!this->isDirectConnection()){ +return handleError(CONN_INVALID_INPUT, getMessage(ERR_CONN_UNKPROTO, this->getProtocol().c_str())); +} +}else{ +if(m_host.empty() || m_port.empty()){ +return handleError(CONN_INVALID_INPUT, getMessage(ERR_CONN_NOCONNSTR)); +} +} +return ret; +} + +void ConnectionEndpoint::parseConnectString(){ +boost::regex connStrExpr("(.*)=(.*):([0-9]+)(?:/(.+))?"); +boost::cmatch matched; + +if(boost::regex_match(m_connectString.c_str(), matched, connStrExpr)){ +m_protocol.assign(matched[1].first, matched[1].second); +std::string host, port; +host.assign(matched[2].first, matched[2].second); +port.assign(matched[3].first, matched[3].second); +if(isDirectConnection()){ +// if the connection is to a zookeeper, +// we will get the host and the port only after connecting to the Zookeeper +m_host=host; +m_port=port; +} +m_hostPortStr=host+std::string(":")+port; +std::string pathToDrill; +if(matched.size()==5){ +pathToDrill.assign(matched[4].first, matched[4].second); +if(!pathToDrill.empty()){ +m_pathToDrill=std::string("/")+pathToDrill; +} +} +DRILL_MT_LOG(DRILL_LOG(LOG_DEBUG) +<< "Conn str: "<< m_connectString +<< "; protocol: " << m_protocol +<< "; host: " << host +<< "; port: " << port +<< "; path to drill: " << m_pathToDrill +<< std::endl;) +} else { +DRILL_MT_LOG(DRILL_LOG(LOG_DEBUG) << "Invalid connect string. Regexp did not match" << std::endl;) +} + +return; +} + +bool ConnectionEndpoint::isDirectConnection(){ +assert(!m_protocol.empty()); +return (!strcmp(m_protocol.c_str(), "local") || !strcmp(m_protocol.c_str(), "drillbit")); +} + +bool ConnectionEndpoint::isZookeeperConnection(){ +assert(!m_protocol.empty()); +return (!strcmp(m_protocol.c_str(), "zk")); +} + +connectionStatus_t ConnectionEndpoint::getDrillbitEndpointFromZk(){ +ZookeeperClient zook(m_pathToDrill); +assert(!m_hostPortStr.empty
[GitHub] drill pull request #950: DRILL-5431: SSL Support
Github user sohami commented on a diff in the pull request: https://github.com/apache/drill/pull/950#discussion_r140623048 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/rpc/user/UserClient.java --- @@ -102,19 +115,78 @@ // these are used for authentication private volatile List serverAuthMechanisms = null; private volatile boolean authComplete = true; + private SSLConfig sslConfig; + private Channel sslChannel; --- End diff -- I don't think you have to store the sslChannel reference explicitly here to make sure it's closed. The connection wrapper like AbstractRemoteConnection will already have reference to channel object and will take care of closing it. Also that path is taking care of channel close both in graceful (explicitly close being called on client) and failure scenario (in which case Netty channelClosedHandler will be invoked). Same for server side. ---
[GitHub] drill pull request #950: DRILL-5431: SSL Support
Github user sohami commented on a diff in the pull request: https://github.com/apache/drill/pull/950#discussion_r141245539 --- Diff: contrib/native/client/src/clientlib/drillClientImpl.cpp --- @@ -65,108 +66,70 @@ struct ToRpcType: public std::unary_function(i); } }; -} -connectionStatus_t DrillClientImpl::connect(const char* connStr, DrillUserProperties* props){ -std::string pathToDrill, protocol, hostPortStr; -std::string host; -std::string port; +} // anonymous -if (this->m_bIsConnected) { -if(std::strcmp(connStr, m_connectStr.c_str())){ // trying to connect to a different address is not allowed if already connected +connectionStatus_t DrillClientImpl::connect(const char* connStr, DrillUserProperties* props){ +if (this->m_bIsConnected || (this->m_pChannelContext!=NULL && this->m_pChannel!=NULL)) { +if(!std::strcmp(connStr, m_connectStr.c_str())){ +// trying to connect to a different address is not allowed if already connected return handleConnError(CONN_ALREADYCONNECTED, getMessage(ERR_CONN_ALREADYCONN)); } return CONN_SUCCESS; } +std::string val; +channelType_t type = ( props->isPropSet(USERPROP_USESSL) && +props->getProp(USERPROP_USESSL, val) =="true") ? +CHANNEL_TYPE_SSLSTREAM : +CHANNEL_TYPE_SOCKET; -m_connectStr=connStr; -Utils::parseConnectStr(connStr, pathToDrill, protocol, hostPortStr); -if(protocol == "zk"){ -ZookeeperClient zook(pathToDrill); -std::vector drillbits; -int err = zook.getAllDrillbits(hostPortStr, drillbits); -if(!err){ -if (drillbits.empty()){ -return handleConnError(CONN_FAILURE, getMessage(ERR_CONN_ZKNODBIT)); -} -Utils::shuffle(drillbits); -exec::DrillbitEndpoint endpoint; -err = zook.getEndPoint(drillbits[drillbits.size() -1], endpoint);// get the last one in the list -if(!err){ -host=boost::lexical_cast(endpoint.address()); - port=boost::lexical_cast(endpoint.user_port()); -} -DRILL_MT_LOG(DRILL_LOG(LOG_TRACE) << "Choosing drillbit <" << (drillbits.size() - 1) << ">. Selected " << endpoint.DebugString() << std::endl;) - -} -if(err){ -return handleConnError(CONN_ZOOKEEPER_ERROR, getMessage(ERR_CONN_ZOOKEEPER, zook.getError().c_str())); -} -zook.close(); -m_bIsDirectConnection=true; -}else if(protocol == "local"){ -boost::lock_guard lock(m_dcMutex);//strtok is not reentrant -char tempStr[MAX_CONNECT_STR+1]; -strncpy(tempStr, hostPortStr.c_str(), MAX_CONNECT_STR); tempStr[MAX_CONNECT_STR]=0; -host=strtok(tempStr, ":"); -port=strtok(NULL, ""); -m_bIsDirectConnection=false; -}else{ -return handleConnError(CONN_INVALID_INPUT, getMessage(ERR_CONN_UNKPROTO, protocol.c_str())); -} -DRILL_MT_LOG(DRILL_LOG(LOG_TRACE) << "Connecting to endpoint: " << host << ":" << port << std::endl;) -std::string serviceHost; -for (size_t i = 0; i < props->size(); i++) { -if (props->keyAt(i) == USERPROP_SERVICE_HOST) { -serviceHost = props->valueAt(i); -} +connectionStatus_t ret = CONN_SUCCESS; +m_pChannelContext = ChannelContextFactory::getChannelContext(type, props); +m_pChannel= ChannelFactory::getChannel(type, m_io_service, connStr); +ret=m_pChannel->init(m_pChannelContext); +if(ret!=CONN_SUCCESS){ +handleConnError(m_pChannel->getError()); +return ret; } -if (serviceHost.empty()) { -props->setProperty(USERPROP_SERVICE_HOST, host); +ret= m_pChannel->connect(); +if(ret!=CONN_SUCCESS){ +handleConnError(m_pChannel->getError()); +return ret; } -connectionStatus_t ret = this->connect(host.c_str(), port.c_str()); +props->setProperty(USERPROP_SERVICE_HOST, m_pChannel->getEndpoint()->getHost()); return ret; } -connectionStatus_t DrillClientImpl::connect(const char* host, const char* port){ -using boost::asio::ip::tcp; -tcp::endpoint endpoint; -try{ -tcp::resolver resolver(m_io_service); -tcp::resolver::query query(tcp::v4(), host, port); -tcp::resolver::iterator iter = resolver.resolve(query); -tcp::resolver::iterator end; -while (iter != end){ -endpoint = *iter++; -DRILL_MT_LOG(DRILL_LOG(LOG_TRACE) << endpoint << std::endl;) -} -
[GitHub] drill pull request #950: DRILL-5431: SSL Support
Github user sohami commented on a diff in the pull request: https://github.com/apache/drill/pull/950#discussion_r141246382 --- Diff: contrib/native/client/src/clientlib/channel.hpp --- @@ -0,0 +1,237 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#ifndef CHANNEL_HPP +#define CHANNEL_HPP + +#include "drill/common.hpp" +#include "drill/drillClient.hpp" +#include "streamSocket.hpp" + +namespace Drill { + +class UserProperties; + +class ConnectionEndpoint{ +public: +ConnectionEndpoint(const char* connStr); +ConnectionEndpoint(const char* host, const char* port); +~ConnectionEndpoint(); + +//parse the connection string and set up the host and port to connect to +connectionStatus_t getDrillbitEndpoint(); + +std::string& getProtocol(){return m_protocol;} +std::string& getHost(){return m_host;} +std::string& getPort(){return m_port;} +DrillClientError* getError(){ return m_pError;}; + +private: +void parseConnectString(); +connectionStatus_t validateConnectionString(); +bool isDirectConnection(); +bool isZookeeperConnection(); +connectionStatus_t getDrillbitEndpointFromZk(); +connectionStatus_t handleError(connectionStatus_t status, std::string msg); + +std::string m_connectString; +std::string m_pathToDrill; +std::string m_protocol; +std::string m_hostPortStr; +std::string m_host; +std::string m_port; + +DrillClientError* m_pError; + +}; + +class ChannelContext{ +public: +ChannelContext(DrillUserProperties* props):m_properties(props){}; +virtual ~ChannelContext(){}; +const DrillUserProperties* getUserProperties() const { return m_properties;} +protected: +DrillUserProperties* m_properties; +}; + +class SSLChannelContext: public ChannelContext{ +public: +static boost::asio::ssl::context::method getTlsVersion(std::string version){ +if(version.empty()){ +return boost::asio::ssl::context::tlsv12; +} else if (version == "tlsv12") { +return boost::asio::ssl::context::tlsv12; +} else if (version == "tlsv11") { +return boost::asio::ssl::context::tlsv11; +} else if (version == "sslv23") { +return boost::asio::ssl::context::sslv23; +} else if (version == "tlsv1") { +return boost::asio::ssl::context::tlsv1; +} else if (version == "sslv3") { +return boost::asio::ssl::context::sslv3; +} else { +return boost::asio::ssl::context::tlsv12; +} +} + +SSLChannelContext(DrillUserProperties *props, boost::asio::ssl::context::method tlsVersion, boost::asio::ssl::verify_mode verifyMode) : +ChannelContext(props), +m_SSLContext(tlsVersion) { +m_SSLContext.set_default_verify_paths(); +m_SSLContext.set_options( +boost::asio::ssl::context::default_workarounds +| boost::asio::ssl::context::no_sslv2 +| boost::asio::ssl::context::single_dh_use +); +m_SSLContext.set_verify_mode(verifyMode); +}; +~SSLChannelContext(){}; +boost::asio::ssl::context& getSslContext(){ return m_SSLContext;} +private: +boost::asio::ssl::context m_SSLContext; +}; +
[GitHub] drill pull request #950: DRILL-5431: SSL Support
Github user sohami commented on a diff in the pull request: https://github.com/apache/drill/pull/950#discussion_r140973654 --- Diff: contrib/native/client/src/clientlib/streamSocket.hpp --- @@ -0,0 +1,212 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + + +#ifndef STREAMSOCKET_HPP +#define STREAMSOCKET_HPP + +#include "drill/common.hpp" +#include "env.h" +#include "wincert.ipp" + +#include +#include + +namespace Drill { + +typedef boost::asio::ip::tcp::socket::lowest_layer_type streamSocket_t; +typedef boost::asio::ssl::stream sslTCPSocket_t; +typedef boost::asio::ip::tcp::socket basicTCPSocket_t; + + +// Some helper typedefs to define the highly templatized boost::asio methods +typedef boost::asio::const_buffers_1 ConstBufferSequence; +typedef boost::asio::mutable_buffers_1 MutableBufferSequence; + +// ReadHandlers have different possible signatures. +// +// As a standard C-type callback +//typedef void (*ReadHandler)(const boost::system::error_code& ec, std::size_t bytes_transferred); +// +// Or as a C++ functor +//struct ReadHandler { +//virtual void operator()( +//const boost::system::error_code& ec, +//std::size_t bytes_transferred) = 0; +//}; +// +// We need a different signature though, since we need to pass in a member function of a drill client +// class (which is C++), as a functor generated by boost::bind as a ReadHandler +// +typedef boost::function ReadHandler; + +class AsioStreamSocket{ +public: +virtual ~AsioStreamSocket(){}; +virtual streamSocket_t& getInnerSocket() = 0; + +virtual std::size_t writeSome( +const ConstBufferSequence& buffers, +boost::system::error_code & ec) = 0; + +virtual std::size_t readSome( +const MutableBufferSequence& buffers, +boost::system::error_code & ec) = 0; + +// +// boost::asio::async_read has the signature +// template< +// typename AsyncReadStream, +// typename MutableBufferSequence, +// typename ReadHandler> +// void-or-deduced async_read( +// AsyncReadStream & s, +// const MutableBufferSequence & buffers, +// ReadHandler handler); +// +// For our use case, the derived class will have an instance of a concrete type for AsyncReadStream which +// will implement the requirements for the AsyncReadStream type. We need not pass that in as a parameter +// since the class already has the value +// The method is templatized since the ReadHandler type is dependent on the class implementing the read +// handler (basically the class using the asio stream) +// +virtual void asyncRead( const MutableBufferSequence & buffers, ReadHandler handler) = 0; + +// call the underlying protocol's handshake method. +// if the useSystemConfig flag is true, then use properties read +// from the underlying operating system +virtual void protocolHandshake(bool useSystemConfig) = 0; +virtual void protocolClose() = 0; +}; + +class Socket: +public AsioStreamSocket, +public basicTCPSocket_t{ + +public: +Socket(boost::asio::io_service& ioService) : basicTCPSocket_t(ioService) { +} + +~Socket(){ +boost::system::error_code ignorederr; +this->shutdown(boost::asio::ip::tcp::socket::shutdown_both, ignorederr); +this->close(); +}; + +basicTCPSocket_t& getSocketStream(){ return *this;} + +streamSocket_t& getInnerSocket(){ return this->lowest_layer();} + +std::size_t writeSome( +
[GitHub] drill pull request #950: DRILL-5431: SSL Support
Github user sohami commented on a diff in the pull request: https://github.com/apache/drill/pull/950#discussion_r141240324 --- Diff: contrib/native/client/src/clientlib/drillClientImpl.cpp --- @@ -65,108 +66,70 @@ struct ToRpcType: public std::unary_function(i); } }; -} -connectionStatus_t DrillClientImpl::connect(const char* connStr, DrillUserProperties* props){ -std::string pathToDrill, protocol, hostPortStr; -std::string host; -std::string port; +} // anonymous -if (this->m_bIsConnected) { -if(std::strcmp(connStr, m_connectStr.c_str())){ // trying to connect to a different address is not allowed if already connected +connectionStatus_t DrillClientImpl::connect(const char* connStr, DrillUserProperties* props){ +if (this->m_bIsConnected || (this->m_pChannelContext!=NULL && this->m_pChannel!=NULL)) { +if(!std::strcmp(connStr, m_connectStr.c_str())){ +// trying to connect to a different address is not allowed if already connected return handleConnError(CONN_ALREADYCONNECTED, getMessage(ERR_CONN_ALREADYCONN)); } return CONN_SUCCESS; } +std::string val; +channelType_t type = ( props->isPropSet(USERPROP_USESSL) && +props->getProp(USERPROP_USESSL, val) =="true") ? +CHANNEL_TYPE_SSLSTREAM : +CHANNEL_TYPE_SOCKET; -m_connectStr=connStr; -Utils::parseConnectStr(connStr, pathToDrill, protocol, hostPortStr); -if(protocol == "zk"){ -ZookeeperClient zook(pathToDrill); -std::vector drillbits; -int err = zook.getAllDrillbits(hostPortStr, drillbits); -if(!err){ -if (drillbits.empty()){ -return handleConnError(CONN_FAILURE, getMessage(ERR_CONN_ZKNODBIT)); -} -Utils::shuffle(drillbits); -exec::DrillbitEndpoint endpoint; -err = zook.getEndPoint(drillbits[drillbits.size() -1], endpoint);// get the last one in the list -if(!err){ -host=boost::lexical_cast(endpoint.address()); - port=boost::lexical_cast(endpoint.user_port()); -} -DRILL_MT_LOG(DRILL_LOG(LOG_TRACE) << "Choosing drillbit <" << (drillbits.size() - 1) << ">. Selected " << endpoint.DebugString() << std::endl;) - -} -if(err){ -return handleConnError(CONN_ZOOKEEPER_ERROR, getMessage(ERR_CONN_ZOOKEEPER, zook.getError().c_str())); -} -zook.close(); -m_bIsDirectConnection=true; -}else if(protocol == "local"){ -boost::lock_guard lock(m_dcMutex);//strtok is not reentrant -char tempStr[MAX_CONNECT_STR+1]; -strncpy(tempStr, hostPortStr.c_str(), MAX_CONNECT_STR); tempStr[MAX_CONNECT_STR]=0; -host=strtok(tempStr, ":"); -port=strtok(NULL, ""); -m_bIsDirectConnection=false; -}else{ -return handleConnError(CONN_INVALID_INPUT, getMessage(ERR_CONN_UNKPROTO, protocol.c_str())); -} -DRILL_MT_LOG(DRILL_LOG(LOG_TRACE) << "Connecting to endpoint: " << host << ":" << port << std::endl;) -std::string serviceHost; -for (size_t i = 0; i < props->size(); i++) { -if (props->keyAt(i) == USERPROP_SERVICE_HOST) { -serviceHost = props->valueAt(i); -} +connectionStatus_t ret = CONN_SUCCESS; +m_pChannelContext = ChannelContextFactory::getChannelContext(type, props); +m_pChannel= ChannelFactory::getChannel(type, m_io_service, connStr); +ret=m_pChannel->init(m_pChannelContext); +if(ret!=CONN_SUCCESS){ +handleConnError(m_pChannel->getError()); +return ret; } -if (serviceHost.empty()) { -props->setProperty(USERPROP_SERVICE_HOST, host); +ret= m_pChannel->connect(); +if(ret!=CONN_SUCCESS){ +handleConnError(m_pChannel->getError()); +return ret; } -connectionStatus_t ret = this->connect(host.c_str(), port.c_str()); +props->setProperty(USERPROP_SERVICE_HOST, m_pChannel->getEndpoint()->getHost()); --- End diff -- we should set `this->m_bIsConnected = true` here once connection is successful. ---
[GitHub] drill pull request #950: DRILL-5431: SSL Support
Github user sohami commented on a diff in the pull request: https://github.com/apache/drill/pull/950#discussion_r140971420 --- Diff: contrib/native/client/src/clientlib/streamSocket.hpp --- @@ -0,0 +1,212 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + + +#ifndef STREAMSOCKET_HPP +#define STREAMSOCKET_HPP + +#include "drill/common.hpp" +#include "env.h" +#include "wincert.ipp" + +#include +#include + +namespace Drill { + +typedef boost::asio::ip::tcp::socket::lowest_layer_type streamSocket_t; +typedef boost::asio::ssl::stream sslTCPSocket_t; +typedef boost::asio::ip::tcp::socket basicTCPSocket_t; + + +// Some helper typedefs to define the highly templatized boost::asio methods +typedef boost::asio::const_buffers_1 ConstBufferSequence; +typedef boost::asio::mutable_buffers_1 MutableBufferSequence; + +// ReadHandlers have different possible signatures. +// +// As a standard C-type callback +//typedef void (*ReadHandler)(const boost::system::error_code& ec, std::size_t bytes_transferred); +// +// Or as a C++ functor +//struct ReadHandler { +//virtual void operator()( +//const boost::system::error_code& ec, +//std::size_t bytes_transferred) = 0; +//}; +// +// We need a different signature though, since we need to pass in a member function of a drill client +// class (which is C++), as a functor generated by boost::bind as a ReadHandler +// +typedef boost::function ReadHandler; + +class AsioStreamSocket{ +public: +virtual ~AsioStreamSocket(){}; +virtual streamSocket_t& getInnerSocket() = 0; + +virtual std::size_t writeSome( +const ConstBufferSequence& buffers, +boost::system::error_code & ec) = 0; + +virtual std::size_t readSome( +const MutableBufferSequence& buffers, +boost::system::error_code & ec) = 0; + +// +// boost::asio::async_read has the signature +// template< +// typename AsyncReadStream, +// typename MutableBufferSequence, +// typename ReadHandler> +// void-or-deduced async_read( +// AsyncReadStream & s, +// const MutableBufferSequence & buffers, +// ReadHandler handler); +// +// For our use case, the derived class will have an instance of a concrete type for AsyncReadStream which +// will implement the requirements for the AsyncReadStream type. We need not pass that in as a parameter +// since the class already has the value +// The method is templatized since the ReadHandler type is dependent on the class implementing the read +// handler (basically the class using the asio stream) +// +virtual void asyncRead( const MutableBufferSequence & buffers, ReadHandler handler) = 0; + +// call the underlying protocol's handshake method. +// if the useSystemConfig flag is true, then use properties read +// from the underlying operating system +virtual void protocolHandshake(bool useSystemConfig) = 0; +virtual void protocolClose() = 0; +}; + +class Socket: +public AsioStreamSocket, +public basicTCPSocket_t{ + +public: +Socket(boost::asio::io_service& ioService) : basicTCPSocket_t(ioService) { +} + +~Socket(){ +boost::system::error_code ignorederr; +this->shutdown(boost::asio::ip::tcp::socket::shutdown_both, ignorederr); +this->close(); +}; + +basicTCPSocket_t& getSocketStream(){ return *this;} + +streamSocket_t& getInnerSocket(){ return this->lowest_layer();} + +std::size_t writeSome( +
[GitHub] drill pull request #950: DRILL-5431: SSL Support
Github user sohami commented on a diff in the pull request: https://github.com/apache/drill/pull/950#discussion_r140970043 --- Diff: contrib/native/client/src/clientlib/wincert.ipp --- @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#if defined(IS_SSL_ENABLED) + +#include +#include + +#if defined _WIN32 || defined _WIN64 + +#include +#include +#include +#include +#include +#include + + +#pragma comment (lib, "crypt32.lib") +#pragma comment (lib, "cryptui.lib") + +#define MY_ENCODING_TYPE (PKCS_7_ASN_ENCODING | X509_ASN_ENCODING) + +inline +int loadSystemTrustStore(const SSL *ssl) { --- End diff -- Can we update this method to take a second parameter like `string& store_name` which we can set in case of error while opening store. so the caller in case of error can also print the store name which caused the error. ---
[GitHub] drill pull request #950: DRILL-5431: SSL Support
Github user sohami commented on a diff in the pull request: https://github.com/apache/drill/pull/950#discussion_r140621394 --- Diff: exec/rpc/src/main/java/org/apache/drill/exec/rpc/ConnectionMultiListener.java --- @@ -0,0 +1,240 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.rpc; + +import com.google.protobuf.MessageLite; +import io.netty.buffer.ByteBuf; +import io.netty.channel.Channel; +import io.netty.channel.ChannelFuture; +import io.netty.util.concurrent.Future; +import io.netty.util.concurrent.GenericFutureListener; +import org.apache.drill.common.exceptions.DrillException; +import org.slf4j.Logger; + +import java.net.SocketAddress; +import java.util.concurrent.TimeUnit; + +/** + * @param Client Connection Listener + * @param Outbound handshake message type + * @param Inbound handshake message type + * @param BasicClient type + * + * Implements a wrapper class that allows a client connection to associate different behaviours after + * establishing a connection with the server. The client can choose to send an application handshake, or + * in the case of SSL, wait for a SSL handshake completion and then send an application handshake. + */ + +public class ConnectionMultiListener { + + private static final Logger logger = org.slf4j.LoggerFactory.getLogger(ConnectionMultiListener.class); + + private final RpcConnectionHandler connectionListener; + private final HS handshakeValue; + private final BC parent; + + private ConnectionMultiListener(RpcConnectionHandler connectionListener, HS handshakeValue, + BC basicClient) { +assert connectionListener != null; +assert handshakeValue != null; + +this.connectionListener = connectionListener; +this.handshakeValue = handshakeValue; +this.parent = basicClient; + } + + @SuppressWarnings("unchecked") + public static Builder + newBuilder(RpcConnectionHandler connectionListener, HS handshakeValue, + BC basicClient) { +return new Builder(connectionListener, handshakeValue, basicClient); + } + + public ConnectionHandler connectionHandler = null; + public HandshakeSendHandler handshakeSendHandler = null; + public SSLConnectionHandler sslConnectionHandler = null; + + /** + * Manages connection establishment outcomes. + */ + private class ConnectionHandler implements GenericFutureListener { + +@Override public void operationComplete(ChannelFuture future) throws Exception { + boolean isInterrupted = false; + + // We want to wait for at least 120 secs when interrupts occur. Establishing a connection fails/succeeds quickly, + // So there is no point propagating the interruption as failure immediately. + long remainingWaitTimeMills = 12; + long startTime = System.currentTimeMillis(); + // logger.debug("Connection operation finished. Success: {}", future.isSuccess()); + while (true) { +try { + future.get(remainingWaitTimeMills, TimeUnit.MILLISECONDS); + if (future.isSuccess()) { +SocketAddress remote = future.channel().remoteAddress(); +SocketAddress local = future.channel().localAddress(); +parent.setAddresses(remote, local); +// if SSL is enabled send the handshake after the ssl handshake is completed, otherwise send it +// now +if(!parent.isSslEnabled()) { + // send a handshake on the current thread. This is the only time we will send from within the event thread. + // We can do this because the connection will not be backed up. + parent.send(handshakeSendHandler, handshakeValue, true); +} + } else { + connectionListener.connectionFailed(RpcConn
[GitHub] drill pull request #950: DRILL-5431: SSL Support
Github user sohami commented on a diff in the pull request: https://github.com/apache/drill/pull/950#discussion_r140587424 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ssl/SSLConfig.java --- @@ -0,0 +1,325 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.ssl; + +import com.google.common.base.Preconditions; +import io.netty.handler.ssl.SslContext; +import io.netty.handler.ssl.SslProvider; +import io.netty.handler.ssl.util.InsecureTrustManagerFactory; +import org.apache.drill.common.config.DrillConfig; +import org.apache.drill.common.exceptions.DrillException; +import org.apache.drill.exec.ExecConstants; +import org.apache.drill.exec.memory.BufferAllocator; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.security.ssl.SSLFactory; + +import javax.net.ssl.KeyManagerFactory; +import javax.net.ssl.SSLContext; +import javax.net.ssl.SSLEngine; +import javax.net.ssl.TrustManagerFactory; +import java.io.FileInputStream; +import java.io.InputStream; +import java.security.KeyStore; +import java.text.MessageFormat; + +public abstract class SSLConfig { + + private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(SSLConfig.class); + + public static final String DEFAULT_SSL_PROVIDER = "JDK"; // JDK or OPENSSL + public static final String DEFAULT_SSL_PROTOCOL = "TLSv1.2"; + public static final int DEFAULT_SSL_HANDSHAKE_TIMEOUT_MS = 10 * 1000; // 10 seconds + + protected final boolean httpsEnabled; + protected final DrillConfig config; + protected final Configuration hadoopConfig; + + // Either the Netty SSL context or the JDK SSL context will be initialized + // The JDK SSL context is use iff the useSystemTrustStore setting is enabled. + protected SslContext nettySslContext; + protected SSLContext jdkSSlContext; + + private static final boolean isWindows = System.getProperty("os.name").toLowerCase().indexOf("win") >= 0; + private static final boolean isMacOs = System.getProperty("os.name").toLowerCase().indexOf("mac") >= 0; + + public static final String HADOOP_SSL_CONF_TPL_KEY = "hadoop.ssl.{0}.conf"; + public static final String HADOOP_SSL_KEYSTORE_LOCATION_TPL_KEY = "ssl.{0}.keystore.location"; + public static final String HADOOP_SSL_KEYSTORE_PASSWORD_TPL_KEY = "ssl.{0}.keystore.password"; + public static final String HADOOP_SSL_KEYSTORE_TYPE_TPL_KEY = "ssl.{0}.keystore.type"; + public static final String HADOOP_SSL_KEYSTORE_KEYPASSWORD_TPL_KEY = + "ssl.{0}.keystore.keypassword"; + public static final String HADOOP_SSL_TRUSTSTORE_LOCATION_TPL_KEY = "ssl.{0}.truststore.location"; + public static final String HADOOP_SSL_TRUSTSTORE_PASSWORD_TPL_KEY = "ssl.{0}.truststore.password"; + public static final String HADOOP_SSL_TRUSTSTORE_TYPE_TPL_KEY = "ssl.{0}.truststore.type"; + + public SSLConfig(DrillConfig config, Configuration hadoopConfig, SSLFactory.Mode mode) + throws DrillException { + +this.config = config; +httpsEnabled = +config.hasPath(ExecConstants.HTTP_ENABLE_SSL) && config.getBoolean(ExecConstants.HTTP_ENABLE_SSL); +// For testing we will mock up a hadoop configuration, however for regular use, we find the actual hadoop config. +boolean enableHadoopConfig = config.getBoolean(ExecConstants.SSL_USE_HADOOP_CONF); +if (enableHadoopConfig && this instanceof SSLConfigServer) { + if (hadoopConfig == null) { +this.hadoopConfig = new Configuration(); // get hadoop configuration + } else { +this.hadoopConfig = hadoopConfig; + } + String hadoopSSLConfigFile = + this.hadoopConfig.get(resolveHadoopPropertyName(HADOOP_SSL_CONF_TPL_KEY, mode)); + logger.debug("Using Hadoop configuration for SSL"); + logger.debug("Hadoop SSL configuration file: {}", hadoopSSLConfigFile); + this.
[GitHub] drill pull request #950: DRILL-5431: SSL Support
Github user sohami commented on a diff in the pull request: https://github.com/apache/drill/pull/950#discussion_r141218943 --- Diff: contrib/native/client/src/clientlib/channel.hpp --- @@ -0,0 +1,237 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#ifndef CHANNEL_HPP +#define CHANNEL_HPP + +#include "drill/common.hpp" +#include "drill/drillClient.hpp" +#include "streamSocket.hpp" + +namespace Drill { + +class UserProperties; + +class ConnectionEndpoint{ +public: +ConnectionEndpoint(const char* connStr); +ConnectionEndpoint(const char* host, const char* port); +~ConnectionEndpoint(); + +//parse the connection string and set up the host and port to connect to +connectionStatus_t getDrillbitEndpoint(); + +std::string& getProtocol(){return m_protocol;} +std::string& getHost(){return m_host;} +std::string& getPort(){return m_port;} +DrillClientError* getError(){ return m_pError;}; + +private: +void parseConnectString(); +connectionStatus_t validateConnectionString(); +bool isDirectConnection(); +bool isZookeeperConnection(); +connectionStatus_t getDrillbitEndpointFromZk(); +connectionStatus_t handleError(connectionStatus_t status, std::string msg); + +std::string m_connectString; +std::string m_pathToDrill; +std::string m_protocol; +std::string m_hostPortStr; +std::string m_host; +std::string m_port; + +DrillClientError* m_pError; + +}; + +class ChannelContext{ +public: +ChannelContext(DrillUserProperties* props):m_properties(props){}; +virtual ~ChannelContext(){}; +const DrillUserProperties* getUserProperties() const { return m_properties;} +protected: +DrillUserProperties* m_properties; +}; + +class SSLChannelContext: public ChannelContext{ +public: +static boost::asio::ssl::context::method getTlsVersion(std::string version){ +if(version.empty()){ +return boost::asio::ssl::context::tlsv12; +} else if (version == "tlsv12") { +return boost::asio::ssl::context::tlsv12; +} else if (version == "tlsv11") { +return boost::asio::ssl::context::tlsv11; +} else if (version == "sslv23") { +return boost::asio::ssl::context::sslv23; +} else if (version == "tlsv1") { +return boost::asio::ssl::context::tlsv1; +} else if (version == "sslv3") { +return boost::asio::ssl::context::sslv3; +} else { +return boost::asio::ssl::context::tlsv12; +} +} + +SSLChannelContext(DrillUserProperties *props, boost::asio::ssl::context::method tlsVersion, boost::asio::ssl::verify_mode verifyMode) : +ChannelContext(props), +m_SSLContext(tlsVersion) { +m_SSLContext.set_default_verify_paths(); +m_SSLContext.set_options( +boost::asio::ssl::context::default_workarounds +| boost::asio::ssl::context::no_sslv2 +| boost::asio::ssl::context::single_dh_use +); +m_SSLContext.set_verify_mode(verifyMode); +}; +~SSLChannelContext(){}; +boost::asio::ssl::context& getSslContext(){ return m_SSLContext;} +private: +boost::asio::ssl::context m_SSLContext; +}; +
[GitHub] drill pull request #950: DRILL-5431: SSL Support
Github user sohami commented on a diff in the pull request: https://github.com/apache/drill/pull/950#discussion_r140588134 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/TestSSLConfig.java --- @@ -91,10 +123,35 @@ public void testForTrustStore() throws Exception { ConfigBuilder config = new ConfigBuilder(); config.put(ExecConstants.HTTP_TRUSTSTORE_PATH, "/root"); config.put(ExecConstants.HTTP_TRUSTSTORE_PASSWORD, "root"); -SSLConfig sslv = new SSLConfig(config.build()); +config.put(ExecConstants.SSL_USE_HADOOP_CONF, false); +SSLConfig sslv = new SSLConfigBuilder() +.config(config.build()) +.mode(SSLFactory.Mode.SERVER) +.initializeSSLContext(false) +.validateKeyStore(true) +.build(); assertEquals(true, sslv.hasTrustStorePath()); assertEquals(true,sslv.hasTrustStorePassword()); assertEquals("/root",sslv.getTrustStorePath()); assertEquals("root",sslv.getTrustStorePassword()); } -} \ No newline at end of file + + @Test + public void testInvalidHadoopKeystore() throws Exception { +Configuration hadoopConfig = new Configuration(); +String hadoopSSLFileProp = MessageFormat +.format(HADOOP_SSL_CONF_TPL_KEY, SSLFactory.Mode.SERVER.toString().toLowerCase()); +hadoopConfig.set(hadoopSSLFileProp, "ssl-server-invalid.xml"); +ConfigBuilder config = new ConfigBuilder(); +config.put(ExecConstants.SSL_USE_HADOOP_CONF, true); +SSLConfig sslv = new SSLConfigBuilder() +.config(config.build()) +.mode(SSLFactory.Mode.SERVER) +.initializeSSLContext(false) +.validateKeyStore(true) +.hadoopConfig(hadoopConfig) +.build(); +assertEquals("FAIL", sslv.getKeyStorePassword()); --- End diff -- Shouldn't this test fail while doing `validateKeyStore` since there is no keystore path but only password in the hadoop config file ? ---
[GitHub] drill pull request #950: DRILL-5431: SSL Support
Github user sohami commented on a diff in the pull request: https://github.com/apache/drill/pull/950#discussion_r141228991 --- Diff: contrib/native/client/src/include/drill/userProperties.hpp --- @@ -0,0 +1,73 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +#ifndef USER_PROPERTIES_H +#define USER_PROPERTIES_H + +#include +#include "drill/common.hpp" + +namespace Drill{ + +class DECLSPEC_DRILL_CLIENT DrillUserProperties{ +public: +static const std::map USER_PROPERTIES; + +DrillUserProperties(){}; + +void setProperty( const std::string& propName, const std::string& propValue){ --- End diff -- so for each `propName` we have defined if it's `string/boolean/etc`. How about checking here if `propName` is boolean then make sure to set the value in lower case? So when we retrieve those values and compare with string "true" or "false", we don't have to do the translation. ---
[GitHub] drill pull request #950: DRILL-5431: SSL Support
Github user sohami commented on a diff in the pull request: https://github.com/apache/drill/pull/950#discussion_r141247355 --- Diff: contrib/native/client/src/clientlib/wincert.ipp --- @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#if defined(IS_SSL_ENABLED) + +#include +#include + +#if defined _WIN32 || defined _WIN64 + +#include +#include +#include +#include +#include +#include + + +#pragma comment (lib, "crypt32.lib") +#pragma comment (lib, "cryptui.lib") + +#define MY_ENCODING_TYPE (PKCS_7_ASN_ENCODING | X509_ASN_ENCODING) + +inline +int loadSystemTrustStore(const SSL *ssl) { +HCERTSTORE hStore; +PCCERT_CONTEXT pContext = NULL; +X509 *x509; + char* stores[] = { + "CA", + "MY", + "ROOT", + "SPC" + }; + +SSL_CTX * ctx = SSL_get_SSL_CTX(ssl); +X509_STORE *store = SSL_CTX_get_cert_store(ctx); + + for(int i=0; i<4; i++){ +hStore = CertOpenSystemStore(NULL, stores[i]); + +if (!hStore) +return 1; --- End diff -- This means we will return with failure while opening any of the 4 system store. Should we instead try all 4 system stores and log the ones for which failure happened (by appending the names to string param suggested in above comment) but still succeed if anyone store was successfully opened ? But then I think we should also check if there is atleast one certificate which was added to X509 store out of these system store ? ---
[GitHub] drill pull request #950: DRILL-5431: SSL Support
Github user sohami commented on a diff in the pull request: https://github.com/apache/drill/pull/950#discussion_r141245863 --- Diff: contrib/native/client/src/clientlib/drillClientImpl.cpp --- @@ -65,108 +66,70 @@ struct ToRpcType: public std::unary_function(i); } }; -} -connectionStatus_t DrillClientImpl::connect(const char* connStr, DrillUserProperties* props){ -std::string pathToDrill, protocol, hostPortStr; -std::string host; -std::string port; +} // anonymous -if (this->m_bIsConnected) { -if(std::strcmp(connStr, m_connectStr.c_str())){ // trying to connect to a different address is not allowed if already connected +connectionStatus_t DrillClientImpl::connect(const char* connStr, DrillUserProperties* props){ +if (this->m_bIsConnected || (this->m_pChannelContext!=NULL && this->m_pChannel!=NULL)) { +if(!std::strcmp(connStr, m_connectStr.c_str())){ +// trying to connect to a different address is not allowed if already connected return handleConnError(CONN_ALREADYCONNECTED, getMessage(ERR_CONN_ALREADYCONN)); } return CONN_SUCCESS; } +std::string val; +channelType_t type = ( props->isPropSet(USERPROP_USESSL) && +props->getProp(USERPROP_USESSL, val) =="true") ? +CHANNEL_TYPE_SSLSTREAM : +CHANNEL_TYPE_SOCKET; -m_connectStr=connStr; -Utils::parseConnectStr(connStr, pathToDrill, protocol, hostPortStr); -if(protocol == "zk"){ -ZookeeperClient zook(pathToDrill); -std::vector drillbits; -int err = zook.getAllDrillbits(hostPortStr, drillbits); -if(!err){ -if (drillbits.empty()){ -return handleConnError(CONN_FAILURE, getMessage(ERR_CONN_ZKNODBIT)); -} -Utils::shuffle(drillbits); -exec::DrillbitEndpoint endpoint; -err = zook.getEndPoint(drillbits[drillbits.size() -1], endpoint);// get the last one in the list -if(!err){ -host=boost::lexical_cast(endpoint.address()); - port=boost::lexical_cast(endpoint.user_port()); -} -DRILL_MT_LOG(DRILL_LOG(LOG_TRACE) << "Choosing drillbit <" << (drillbits.size() - 1) << ">. Selected " << endpoint.DebugString() << std::endl;) - -} -if(err){ -return handleConnError(CONN_ZOOKEEPER_ERROR, getMessage(ERR_CONN_ZOOKEEPER, zook.getError().c_str())); -} -zook.close(); -m_bIsDirectConnection=true; -}else if(protocol == "local"){ -boost::lock_guard lock(m_dcMutex);//strtok is not reentrant -char tempStr[MAX_CONNECT_STR+1]; -strncpy(tempStr, hostPortStr.c_str(), MAX_CONNECT_STR); tempStr[MAX_CONNECT_STR]=0; -host=strtok(tempStr, ":"); -port=strtok(NULL, ""); -m_bIsDirectConnection=false; -}else{ -return handleConnError(CONN_INVALID_INPUT, getMessage(ERR_CONN_UNKPROTO, protocol.c_str())); -} -DRILL_MT_LOG(DRILL_LOG(LOG_TRACE) << "Connecting to endpoint: " << host << ":" << port << std::endl;) -std::string serviceHost; -for (size_t i = 0; i < props->size(); i++) { -if (props->keyAt(i) == USERPROP_SERVICE_HOST) { -serviceHost = props->valueAt(i); -} +connectionStatus_t ret = CONN_SUCCESS; +m_pChannelContext = ChannelContextFactory::getChannelContext(type, props); +m_pChannel= ChannelFactory::getChannel(type, m_io_service, connStr); +ret=m_pChannel->init(m_pChannelContext); +if(ret!=CONN_SUCCESS){ +handleConnError(m_pChannel->getError()); +return ret; } -if (serviceHost.empty()) { -props->setProperty(USERPROP_SERVICE_HOST, host); +ret= m_pChannel->connect(); +if(ret!=CONN_SUCCESS){ +handleConnError(m_pChannel->getError()); +return ret; } -connectionStatus_t ret = this->connect(host.c_str(), port.c_str()); +props->setProperty(USERPROP_SERVICE_HOST, m_pChannel->getEndpoint()->getHost()); return ret; } -connectionStatus_t DrillClientImpl::connect(const char* host, const char* port){ -using boost::asio::ip::tcp; -tcp::endpoint endpoint; -try{ -tcp::resolver resolver(m_io_service); -tcp::resolver::query query(tcp::v4(), host, port); -tcp::resolver::iterator iter = resolver.resolve(query); -tcp::resolver::iterator end; -while (iter != end){ -endpoint = *iter++; -DRILL_MT_LOG(DRILL_LOG(LOG_TRACE) << endpoint << std::endl;) -} -
[GitHub] drill pull request #950: DRILL-5431: SSL Support
Github user sohami commented on a diff in the pull request: https://github.com/apache/drill/pull/950#discussion_r140593112 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/rpc/user/security/TestUserBitSSLServer.java --- @@ -0,0 +1,126 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.rpc.user.security; + +import com.typesafe.config.ConfigValueFactory; +import junit.framework.TestCase; +import org.apache.drill.BaseTestQuery; +import org.apache.drill.common.config.DrillConfig; +import org.apache.drill.common.config.DrillProperties; +import org.apache.drill.exec.ExecConstants; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.security.ssl.SSLFactory; +import org.junit.AfterClass; +import org.junit.BeforeClass; +import org.junit.Test; + +import java.io.File; +import java.text.MessageFormat; +import java.util.Properties; + +import static org.apache.drill.exec.ssl.SSLConfig.HADOOP_SSL_CONF_TPL_KEY; +import static org.junit.Assert.assertEquals; + +public class TestUserBitSSLServer extends BaseTestQuery { + private static final org.slf4j.Logger logger = + org.slf4j.LoggerFactory.getLogger(TestUserBitSSLServer.class); + + private static DrillConfig sslConfig; + private static Properties initProps; // initial client properties + private static ClassLoader classLoader; + private static String ksPath; + private static String tsPath; + private static String emptyTSPath; + + @BeforeClass + public static void setupTest() throws Exception { + +classLoader = TestUserBitSSLServer.class.getClassLoader(); +ksPath = new File(classLoader.getResource("ssl/keystore.ks").getFile()).getAbsolutePath(); +tsPath = new File(classLoader.getResource("ssl/truststore.ks").getFile()).getAbsolutePath(); +emptyTSPath = new File(classLoader.getResource("ssl/emptytruststore.ks").getFile()).getAbsolutePath(); +sslConfig = new DrillConfig(DrillConfig.create(cloneDefaultTestConfigProperties()) +.withValue(ExecConstants.USER_SSL_ENABLED, ConfigValueFactory.fromAnyRef(true)) +.withValue(ExecConstants.SSL_KEYSTORE_TYPE, ConfigValueFactory.fromAnyRef("JKS")) +.withValue(ExecConstants.SSL_KEYSTORE_PATH, ConfigValueFactory.fromAnyRef(ksPath)) +.withValue(ExecConstants.SSL_KEYSTORE_PASSWORD, ConfigValueFactory.fromAnyRef("drill123")) +.withValue(ExecConstants.SSL_KEY_PASSWORD, ConfigValueFactory.fromAnyRef("drill123")) +.withValue(ExecConstants.SSL_PROTOCOL, ConfigValueFactory.fromAnyRef("TLSv1.2")), false); +initProps = new Properties(); +initProps.setProperty(DrillProperties.ENABLE_TLS, "true"); +initProps.setProperty(DrillProperties.TRUSTSTORE_PATH, tsPath); +initProps.setProperty(DrillProperties.TRUSTSTORE_PASSWORD, "drill123"); +initProps.setProperty(DrillProperties.DISABLE_HOST_VERIFICATION, "true"); + } + + @AfterClass + public static void cleanTest() throws Exception { +DrillConfig restoreConfig = +new DrillConfig(DrillConfig.create(cloneDefaultTestConfigProperties()), false); +updateTestCluster(1, restoreConfig); + } + + @Test + public void testInvalidKeystorePath() throws Exception { +DrillConfig testConfig = new DrillConfig(DrillConfig.create(sslConfig) +.withValue(ExecConstants.SSL_KEYSTORE_PATH, ConfigValueFactory.fromAnyRef("/bad/path")), +false); + +// Start an SSL enabled cluster +boolean failureCaught = false; +try { + updateTestCluster(1, testConfig, initProps); +} catch (Exception e) { + failureCaught = true; +} +assertEquals(failureCaught, true); + } + + @Test + public void testInvalidKeystorePassword() throws Exception { +DrillConfig testConfig = new DrillConfig(DrillConfig.create(sslConfig) +.withValue(ExecCo
[GitHub] drill pull request #950: DRILL-5431: SSL Support
Github user sohami commented on a diff in the pull request: https://github.com/apache/drill/pull/950#discussion_r141230974 --- Diff: contrib/native/client/src/clientlib/channel.cpp --- @@ -0,0 +1,452 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#include +#include + +#include "drill/drillConfig.hpp" +#include "drill/drillError.hpp" +#include "drill/userProperties.hpp" +#include "channel.hpp" +#include "errmsgs.hpp" +#include "logger.hpp" +#include "utils.hpp" +#include "zookeeperClient.hpp" + +#include "GeneralRPC.pb.h" + +namespace Drill{ + +ConnectionEndpoint::ConnectionEndpoint(const char* connStr){ +m_connectString=connStr; +m_pError=NULL; +} + +ConnectionEndpoint::ConnectionEndpoint(const char* host, const char* port){ +m_host=host; +m_port=port; +m_protocol="drillbit"; // direct connection +m_pError=NULL; +} + +ConnectionEndpoint::~ConnectionEndpoint(){ +if(m_pError!=NULL){ +delete m_pError; m_pError=NULL; +} +} + +connectionStatus_t ConnectionEndpoint::getDrillbitEndpoint(){ +connectionStatus_t ret=CONN_SUCCESS; +if(!m_connectString.empty()){ +parseConnectString(); +if(m_protocol.empty()){ +return handleError(CONN_INVALID_INPUT, getMessage(ERR_CONN_UNKPROTO, "")); +} +if(isZookeeperConnection()){ +if((ret=getDrillbitEndpointFromZk())!=CONN_SUCCESS){ +return ret; +} +}else if(!this->isDirectConnection()){ +return handleError(CONN_INVALID_INPUT, getMessage(ERR_CONN_UNKPROTO, this->getProtocol().c_str())); +} +}else{ +if(m_host.empty() || m_port.empty()){ +return handleError(CONN_INVALID_INPUT, getMessage(ERR_CONN_NOCONNSTR)); +} +} +return ret; +} + +void ConnectionEndpoint::parseConnectString(){ +boost::regex connStrExpr("(.*)=(.*):([0-9]+)(?:/(.+))?"); +boost::cmatch matched; + +if(boost::regex_match(m_connectString.c_str(), matched, connStrExpr)){ +m_protocol.assign(matched[1].first, matched[1].second); +std::string host, port; +host.assign(matched[2].first, matched[2].second); +port.assign(matched[3].first, matched[3].second); +if(isDirectConnection()){ +// if the connection is to a zookeeper, +// we will get the host and the port only after connecting to the Zookeeper +m_host=host; +m_port=port; +} +m_hostPortStr=host+std::string(":")+port; +std::string pathToDrill; +if(matched.size()==5){ +pathToDrill.assign(matched[4].first, matched[4].second); +if(!pathToDrill.empty()){ +m_pathToDrill=std::string("/")+pathToDrill; +} +} +DRILL_MT_LOG(DRILL_LOG(LOG_DEBUG) +<< "Conn str: "<< m_connectString +<< "; protocol: " << m_protocol +<< "; host: " << host +<< "; port: " << port +<< "; path to drill: " << m_pathToDrill +<< std::endl;) +} else { +DRILL_MT_LOG(DRILL_LOG(LOG_DEBUG) << "Invalid connect string. Regexp did not match" << std::endl;) +} + +return; +} + +bool ConnectionEndpoint::isDirectConnection(){ +assert(!m_protocol.empty()); +return (!strcmp(m_protocol.c_str(), "local") || !strcmp(m_protocol.c_str(), "drillbit")); +} + +bool ConnectionEndpoint::isZookeeperConnection(){ +assert(!m_protocol.empty()); +return (!strcmp(m_protocol.c_str(), "zk")); +} + +connectionStatus_t ConnectionEndpoint::getDrillbitEndpointFromZk(){ +ZookeeperClient zook(m_pathToDrill); +assert(!m_hostPortStr.empty
[GitHub] drill pull request #950: DRILL-5431: SSL Support
Github user sohami commented on a diff in the pull request: https://github.com/apache/drill/pull/950#discussion_r141215569 --- Diff: contrib/native/client/src/clientlib/channel.hpp --- @@ -0,0 +1,237 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#ifndef CHANNEL_HPP +#define CHANNEL_HPP + +#include "drill/common.hpp" +#include "drill/drillClient.hpp" +#include "streamSocket.hpp" + +namespace Drill { + +class UserProperties; + +class ConnectionEndpoint{ +public: +ConnectionEndpoint(const char* connStr); +ConnectionEndpoint(const char* host, const char* port); +~ConnectionEndpoint(); + +//parse the connection string and set up the host and port to connect to +connectionStatus_t getDrillbitEndpoint(); + +std::string& getProtocol(){return m_protocol;} +std::string& getHost(){return m_host;} +std::string& getPort(){return m_port;} +DrillClientError* getError(){ return m_pError;}; + +private: +void parseConnectString(); +connectionStatus_t validateConnectionString(); --- End diff -- Not seeing any implementation of this function: `validateConnectionString` ---
[GitHub] drill pull request #950: DRILL-5431: SSL Support
Github user sohami commented on a diff in the pull request: https://github.com/apache/drill/pull/950#discussion_r140590439 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/rpc/user/security/TestUserBitSSL.java --- @@ -0,0 +1,338 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.rpc.user.security; + +import com.typesafe.config.ConfigValueFactory; +import io.netty.handler.ssl.util.SelfSignedCertificate; +import junit.framework.TestCase; +import org.apache.drill.BaseTestQuery; +import org.apache.drill.common.config.DrillConfig; +import org.apache.drill.common.config.DrillProperties; +import org.apache.drill.exec.ExecConstants; +import org.junit.AfterClass; +import org.junit.BeforeClass; +import org.junit.Test; + +import java.io.File; +import java.io.FileOutputStream; +import java.net.InetAddress; +import java.security.KeyStore; +import java.util.Properties; + +import static junit.framework.TestCase.fail; +import static org.junit.Assert.assertEquals; + +public class TestUserBitSSL extends BaseTestQuery { + private static final org.slf4j.Logger logger = + org.slf4j.LoggerFactory.getLogger(TestUserBitSSL.class); + + private static DrillConfig newConfig; + private static Properties initProps; // initial client properties + private static ClassLoader classLoader; + private static String ksPath; + private static String tsPath; + private static String emptyTSPath; + private static String unknownKsPath; + + @BeforeClass + public static void setupTest() throws Exception { + +// Create a new DrillConfig +classLoader = TestUserBitSSL.class.getClassLoader(); +ksPath = new File(classLoader.getResource("ssl/keystore.ks").getFile()).getAbsolutePath(); +unknownKsPath = new File(classLoader.getResource("ssl/unknownkeystore.ks").getFile()).getAbsolutePath(); +tsPath = new File(classLoader.getResource("ssl/truststore.ks").getFile()).getAbsolutePath(); +emptyTSPath = new File(classLoader.getResource("ssl/emptytruststore.ks").getFile()).getAbsolutePath(); +newConfig = new DrillConfig(DrillConfig.create(cloneDefaultTestConfigProperties()) +.withValue(ExecConstants.SSL_USE_HADOOP_CONF, +ConfigValueFactory.fromAnyRef(false)) +.withValue(ExecConstants.USER_SSL_ENABLED, +ConfigValueFactory.fromAnyRef(true)) +.withValue(ExecConstants.SSL_KEYSTORE_TYPE, +ConfigValueFactory.fromAnyRef("JKS")) +.withValue(ExecConstants.SSL_KEYSTORE_PATH, +ConfigValueFactory.fromAnyRef(ksPath)) +.withValue(ExecConstants.SSL_KEYSTORE_PASSWORD, +ConfigValueFactory.fromAnyRef("drill123")) +.withValue(ExecConstants.SSL_KEY_PASSWORD, +ConfigValueFactory.fromAnyRef("drill123")) +.withValue(ExecConstants.SSL_TRUSTSTORE_TYPE, +ConfigValueFactory.fromAnyRef("JKS")) +.withValue(ExecConstants.SSL_TRUSTSTORE_PATH, +ConfigValueFactory.fromAnyRef(tsPath)) +.withValue(ExecConstants.SSL_TRUSTSTORE_PASSWORD, +ConfigValueFactory.fromAnyRef("drill123")) +.withValue(ExecConstants.SSL_PROTOCOL, +ConfigValueFactory.fromAnyRef("TLSv1.2")), + false); + +initProps = new Properties(); +initProps.setProperty(DrillProperties.ENABLE_TLS, "true"); +initProps.setProperty(DrillProperties.TRUSTSTORE_PATH, tsPath); +initProps.setProperty(DrillProperties.TRUSTSTORE_PASSWORD, "drill123"); +initProps.setProperty(DrillProperties.DISABLE_HOST_VERIFICATION, "true"); + +// Start an SSL enabled cluster +updateTestCluster(1, newConfig, initProps); + } + + @AfterClass + public static void cleanTest() throws Exception { +DrillConfig restoreConfig = +new DrillConfig(DrillConfig.create(cloneDefaultTestConfigPr
Assign a JIRA
Hello All, I would like to work on this JIRA DRILL-5773. Can you please assign this JIRA to me. my user-name : hanu.ncr Thanks, -Hanu
[jira] [Created] (DRILL-5821) In the Drill web UI, display configured direct, heap memory
Paul Rogers created DRILL-5821: -- Summary: In the Drill web UI, display configured direct, heap memory Key: DRILL-5821 URL: https://issues.apache.org/jira/browse/DRILL-5821 Project: Apache Drill Issue Type: Improvement Affects Versions: 1.11.0 Reporter: Paul Rogers Assignee: Paul Rogers Fix For: 1.12.0 When diagnosing memory-related issues, the first question we always ask is, "how much memory has Drill been given." This turns out to be rather tedious: we have to log onto the node running Drill, track down the config value, and check the configured amount. Since there are multiple possible config files, we have to figure out which was actually used. An alternative is to do `ps aux | grep Drill` to look at the command line, but this also requires access to the node itself. As an easier route, just as we display encryption information on the web UI main page, display the following: {noformat} Resources Direct Memory: ## GB Heap Memory: # GB Cores: ## {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] drill issue #963: DRILL-5259: Allow listing a user-defined number of profile...
Github user kkhatua commented on the issue: https://github.com/apache/drill/pull/963 @paul-rogers please bless this - Checks in place for non-numeric or out-of-range numeric inputs. - Auto fill if the max requested range exceeds the actual number of profiles available. User would not need to re-enter a value, just hit `Go` ---
[GitHub] drill issue #964: DRILL-5755 Reduced default number of batches kept in memor...
Github user ilooner commented on the issue: https://github.com/apache/drill/pull/964 @Ben-Zvi @paul-rogers ---
[GitHub] drill pull request #964: DRILL-5755 Reduced default number of batches kept i...
GitHub user ilooner opened a pull request: https://github.com/apache/drill/pull/964 DRILL-5755 Reduced default number of batches kept in memory by TopN You can merge this pull request into a Git repository by running: $ git pull https://github.com/ilooner/drill DRILL-5755 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/964.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #964 commit 80ff848a419b65ae578a382a24971a01d97f2eb0 Author: Timothy Farkas Date: 2017-09-21T22:59:58Z DRILL-5755 Reduced default number of batches kept in memory by the TopN operator. ---
[GitHub] drill issue #914: DRILL-5657: Size-aware vector writer structure
Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/914 Added final development stage to this PR. This is a minor implementation tweak. The result set loader internals for maps held redundant map vectors. In the final vector container, each map must be represented by a map (or repeated map) vector. But, because the result set loader handles projection and overflow, the set of columns that a map writer works with is a superset of those that appear in the output container. For this reason, there turns out to be no reason to maintain these "redundant" map vectors. For repeated map vectors, we must maintain the offset vector. The code already did this; we just strip off the enclosing repeated map vector. ---
[GitHub] drill issue #954: DRILL-5803: Show the hostname for each minor fragment in o...
Github user kkhatua commented on the issue: https://github.com/apache/drill/pull/954 @paul-rogers Made the changes ---
[GitHub] drill pull request #962: DRILL-5820: Add support for libpam4j Pam Authentica...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/962#discussion_r141196893 --- Diff: distribution/src/resources/drill-override-example.conf --- @@ -133,6 +133,8 @@ drill.exec: { security.user.auth { enabled: false, packages += "org.apache.drill.exec.rpc.user.security", +# There are 2 implementations available "pam" using JPAM +# and "pam4j" using libpam4j --- End diff -- As discussed in person, perhaps explain how the `impl` property here related to the implementation class. (Evidently via an annotation. So, maybe just name the annotation used.) ---
[GitHub] drill pull request #962: DRILL-5820: Add support for libpam4j Pam Authentica...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/962#discussion_r141199041 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/rpc/user/security/PamUserAuthenticator.java --- @@ -59,7 +59,8 @@ public void authenticate(String user, String password) throws UserAuthentication for (String pamProfile : profiles) { Pam pam = new Pam(pamProfile); if (!pam.authenticateSuccessful(user, password)) { -throw new UserAuthenticationException(String.format("PAM profile '%s' validation failed", pamProfile)); +throw new UserAuthenticationException(String.format("PAM profile '%s' validation failed for user %s", --- End diff -- The module above throws an exception. Here we get a `boolean`. Does this mean that the exception was mapped to `false`, then mapped back to an exception? ---
[GitHub] drill pull request #962: DRILL-5820: Add support for libpam4j Pam Authentica...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/962#discussion_r141198653 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/rpc/user/security/Pam4jUserAuthenticator.java --- @@ -0,0 +1,81 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.rpc.user.security; + +import org.apache.drill.common.config.DrillConfig; +import org.apache.drill.exec.ExecConstants; +import org.apache.drill.exec.exception.DrillbitStartupException; +import org.jvnet.libpam.PAM; +import org.jvnet.libpam.PAMException; +import org.jvnet.libpam.UnixUser; + +import java.io.IOException; +import java.util.List; + +/** + * Implement {@link org.apache.drill.exec.rpc.user.security.UserAuthenticator} based on Pluggable Authentication + * Module (PAM) configuration. Configure the PAM profiles using "drill.exec.security.user.auth.pam_profiles" BOOT + * option. Ex. value [ "login", "sudo" ] (value is an array of strings). + */ +@UserAuthenticatorTemplate(type = "pam4j") +public class Pam4jUserAuthenticator implements UserAuthenticator { + private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(Pam4jUserAuthenticator.class); + + private List profiles; + + @Override + public void setup(DrillConfig drillConfig) throws DrillbitStartupException { +profiles = drillConfig.getStringList(ExecConstants.PAM_AUTHENTICATOR_PROFILES); + } + + @Override + public void authenticate(String user, String password) throws UserAuthenticationException { +for (String profile : profiles) { + PAM pam = null; + UnixUser unixUser; + try { +pam = new PAM(profile); +unixUser = pam.authenticate(user, password); + } catch (PAMException ex) { +logger.error("PAM auth failed for user: {} against {} profile. Exception: {}", user, profile, ex.getMessage()); +throw new UserAuthenticationException(String.format("PAM auth failed for user: %s using profile: %s", +user, profile)); + } finally { +if (pam != null) { + pam.dispose(); +} + } + + if (!user.equals(unixUser.getUserName())) { --- End diff -- `equals` or `equalsIgnoreCase`? That is, should "Fred" match "fred"? ---
[GitHub] drill pull request #962: DRILL-5820: Add support for libpam4j Pam Authentica...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/962#discussion_r141197203 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/rpc/user/security/Pam4jUserAuthenticator.java --- @@ -0,0 +1,81 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.rpc.user.security; + +import org.apache.drill.common.config.DrillConfig; +import org.apache.drill.exec.ExecConstants; +import org.apache.drill.exec.exception.DrillbitStartupException; +import org.jvnet.libpam.PAM; +import org.jvnet.libpam.PAMException; +import org.jvnet.libpam.UnixUser; + +import java.io.IOException; +import java.util.List; + +/** + * Implement {@link org.apache.drill.exec.rpc.user.security.UserAuthenticator} based on Pluggable Authentication + * Module (PAM) configuration. Configure the PAM profiles using "drill.exec.security.user.auth.pam_profiles" BOOT + * option. Ex. value [ "login", "sudo" ] (value is an array of strings). + */ +@UserAuthenticatorTemplate(type = "pam4j") +public class Pam4jUserAuthenticator implements UserAuthenticator { + private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(Pam4jUserAuthenticator.class); + + private List profiles; --- End diff -- `final`? ---
[GitHub] drill pull request #962: DRILL-5820: Add support for libpam4j Pam Authentica...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/962#discussion_r141198807 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/rpc/user/security/Pam4jUserAuthenticator.java --- @@ -0,0 +1,81 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.rpc.user.security; + +import org.apache.drill.common.config.DrillConfig; +import org.apache.drill.exec.ExecConstants; +import org.apache.drill.exec.exception.DrillbitStartupException; +import org.jvnet.libpam.PAM; +import org.jvnet.libpam.PAMException; +import org.jvnet.libpam.UnixUser; + +import java.io.IOException; +import java.util.List; + +/** + * Implement {@link org.apache.drill.exec.rpc.user.security.UserAuthenticator} based on Pluggable Authentication + * Module (PAM) configuration. Configure the PAM profiles using "drill.exec.security.user.auth.pam_profiles" BOOT + * option. Ex. value [ "login", "sudo" ] (value is an array of strings). + */ +@UserAuthenticatorTemplate(type = "pam4j") +public class Pam4jUserAuthenticator implements UserAuthenticator { + private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(Pam4jUserAuthenticator.class); + + private List profiles; + + @Override + public void setup(DrillConfig drillConfig) throws DrillbitStartupException { +profiles = drillConfig.getStringList(ExecConstants.PAM_AUTHENTICATOR_PROFILES); + } + + @Override + public void authenticate(String user, String password) throws UserAuthenticationException { +for (String profile : profiles) { + PAM pam = null; + UnixUser unixUser; + try { +pam = new PAM(profile); +unixUser = pam.authenticate(user, password); + } catch (PAMException ex) { +logger.error("PAM auth failed for user: {} against {} profile. Exception: {}", user, profile, ex.getMessage()); +throw new UserAuthenticationException(String.format("PAM auth failed for user: %s using profile: %s", +user, profile)); + } finally { +if (pam != null) { + pam.dispose(); +} + } + + if (!user.equals(unixUser.getUserName())) { +throw new UserAuthenticationException(String.format("Unexpected error from pam module. Input user %s is " + +"different from authenticated output user %s of pam module libpam4j", user, unixUser.getUserName())); + } + + if (logger.isTraceEnabled()) { --- End diff -- Can omit this if statement, `logger.trace()` will do teh right thing. ---
[GitHub] drill pull request #963: DRILL-5259: Allow listing a user-defined number of ...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/963#discussion_r141191837 --- Diff: exec/java-exec/src/main/resources/rest/profile/list.ftl --- @@ -37,7 +37,15 @@ No running queries. - Completed Queries + + + Completed Queries + +Show + --- End diff -- Better! But, the number to show is the *target* number of profiles, not the *actual* number. If we show the actual number, then we get this: * Show profiles with the target of 100. * We actually have 10 profiles, so that is displayed in the "max" control. * Two more profiles come in, now we have 12. * Hit the "Go" button. Only 10 profiles appear because that was the prior count. * Must manually type a larger number to see more. What we want: * Show profiles with the target of 100. * We actually have 10 profiles, but the target is 100, so 100 is displayed in the "max" control. * Two more profiles come in, now we have 12. * Hit the "Go" button. All 12 profiles appear. To make this work: * Add to the model the *target* number: either the default of 100 or the number received via the max parameter. * Display that model number in the "max" control. ---
[GitHub] drill pull request #963: DRILL-5259: Allow listing a user-defined number of ...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/963#discussion_r141192014 --- Diff: exec/java-exec/src/main/resources/rest/profile/list.ftl --- @@ -37,7 +37,15 @@ No running queries. - Completed Queries + + + Completed Queries + +Show + + + + --- End diff -- BTW: we should validate the max parameter in the Java code. Force max into the range: 1 ... 500 (say). That is, don't allow -1, 0 or 100 profiles. ---
[GitHub] drill issue #932: DRILL-5758: Fix for repeated columns; enable managed sort ...
Github user amansinha100 commented on the issue: https://github.com/apache/drill/pull/932 Changes LGTM. +1 ---
[GitHub] drill pull request #932: DRILL-5758: Fix for repeated columns; enable manage...
Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/932#discussion_r141191139 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/spill/RecordBatchSizer.java --- @@ -74,53 +74,52 @@ public final int estSize; /** - * Number of times the value here (possibly repeated) appears in - * the record batch. + * Number of occurrences of the value in the batch. This is trivial + * for top-level scalars: it is the record count. For a top-level + * repeated vector, this is the number of arrays, also the record + * count. For a value nested inside a repeated map, it is the + * total number of values across all maps, and may be less than, + * greater than (but unlikely) same as the row count. */ public final int valueCount; /** - * The number of elements in the value vector. Consider two cases. - * A required or nullable vector has one element per row, so the - * entryCount is the same as the valueCount (which, - * in turn, is the same as the row count.) But, if this vector is an - * array, then the valueCount is the number of columns, while - * entryCount is the total number of elements in all the arrays - * that make up the columns, so entryCount will be different than - * the valueCount (normally larger, but possibly smaller if most - * arrays are empty. - * - * Finally, the column may be part of another list. In this case, the above - * logic still applies, but the valueCount is the number of entries - * in the outer array, not the row count. + * Total number of elements for a repeated type, or 1 if this is + * a non-repeated type. That is, a batch of 100 rows may have an + * array with 10 elements per row. In this case, the element count + * is 1000. */ -public int entryCount; +public final int elementCount; --- End diff -- Not related to elementCount per-se but I see that netBatchSize and accountedMemorySize are integers. These could overflow depending on number of columns. Should they be longs ? ---
[GitHub] drill issue #963: DRILL-5259: Allow listing a user-defined number of profile...
Github user kkhatua commented on the issue: https://github.com/apache/drill/pull/963 @paul-rogers Opened a new PR as I accidentally closed #953 . This is ready for review. ---
[GitHub] drill issue #953: DRILL-5259: Allow listing a user-defined number of profile...
Github user kkhatua commented on the issue: https://github.com/apache/drill/pull/953 Closed the original PR accidentally. Have filed a new one #963 ---
[GitHub] drill pull request #963: DRILL-5259: Allow listing a user-defined number of ...
GitHub user kkhatua opened a pull request: https://github.com/apache/drill/pull/963 DRILL-5259: Allow listing a user-defined number of profiles Added an additional field in the UI allowing a user to specify the max number of profiles to display You can merge this pull request into a Git repository by running: $ git pull https://github.com/kkhatua/drill DRILL-5259 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/963.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #963 commit f4a538361fe2494627d5425de345f525471c245c Author: Kunal Khatua Date: 2017-09-26T20:55:40Z DRILL-5259: Allow listing a user-defined number of profiles Added an additional field in the UI allowing a user to specify the max number of profiles to display ---
[GitHub] drill issue #962: DRILL-5820: Add support for libpam4j Pam Authenticator
Github user sohami commented on the issue: https://github.com/apache/drill/pull/962 @parthchandra - Please help to review ---
[GitHub] drill pull request #962: DRILL-5820: Add support for libpam4j Pam Authentica...
GitHub user sohami opened a pull request: https://github.com/apache/drill/pull/962 DRILL-5820: Add support for libpam4j Pam Authenticator You can merge this pull request into a Git repository by running: $ git pull https://github.com/sohami/drill DRILL-5820 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/962.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #962 commit 6efa28f63e56e7996d76cba19c577fadca3d9d1f Author: Sorabh Hamirwasia Date: 2017-09-21T00:38:08Z DRILL-5820: Add support for libpam4j Pam Authenticator ---
[GitHub] drill pull request #953: DRILL-5259: Allow listing a user-defined number of ...
Github user kkhatua closed the pull request at: https://github.com/apache/drill/pull/953 ---
[jira] [Created] (DRILL-5820) Add support for libpam4j Pam Authenticator
Sorabh Hamirwasia created DRILL-5820: Summary: Add support for libpam4j Pam Authenticator Key: DRILL-5820 URL: https://issues.apache.org/jira/browse/DRILL-5820 Project: Apache Drill Issue Type: Task Reporter: Sorabh Hamirwasia Assignee: Sorabh Hamirwasia Fix For: 1.12.0 Drill uses JPAM as the PAM authenticator module for username/password verification for PLAIN mechanism. There are some known issues with JPAM which leads to JVM crash and memory leaks. JPAM also requires a manual step in copying the native library. Also based on the [HIVE-16529|https://issues.apache.org/jira/browse/HIVE-16529] there have been mention of these issues with JPAM which is resolved in the libpam4j. Also libpam4j avoids the need to install native library explicitly. It would be good to provide support for libpam4j in Drill to avoid these issues. Some other reported problems with JPAM: * https://wiki.dlib.indiana.edu/display/V3/Pam+Authentication+through+JPam * https://bugzilla.redhat.com/show_bug.cgi?id=860119#c12 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] drill issue #959: DRILL-5816: Hash function produces skewed results on Strin...
Github user amansinha100 commented on the issue: https://github.com/apache/drill/pull/959 lgtm. +1. @sohami could you pls check with @chunhui-shi if unit tests were added previously for DRILL-4237 (another skew issue)? If not, could you create a new JIRA to add such tests, which should include the data set for DRILL-4237, DRILL-4119, DRILL-5816 among others. ---
Drill Views getting Connection Closed error after adding more columns
Hi , I have modified my existing drill View and added new columns. It includes few placeholder attributes also where I am passing ‘null’ as String(also tried with blank ‘’) but its impacting performance. Drill Views are getting connection timeout very frequently. If I remove additional and new columns added, it works fine. Please suggest Regards, Sanchita
[GitHub] drill pull request #961: DRILL-5792: CONVERT_FROM_JSON on an empty file thro...
GitHub user vdiravka opened a pull request: https://github.com/apache/drill/pull/961 DRILL-5792: CONVERT_FROM_JSON on an empty file throws runtime exception Easyfix - using GenericAccessor for UntypedNullVector. You can merge this pull request into a Git repository by running: $ git pull https://github.com/vdiravka/drill DRILL-5792 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/961.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #961 commit 9d3a92b963271897565cfd81098050b702fcdac5 Author: Vitalii Diravka Date: 2017-09-26T11:06:45Z DRILL-5792: CONVERT_FROM_JSON on an empty file throws runtime exception ---
[GitHub] drill issue #904: DRILL-5717: change some date time test cases with specific...
Github user weijietong commented on the issue: https://github.com/apache/drill/pull/904 @vvysotskyi thanks for your patient review . As you pointed out , the timezone would not take effect when test cases run in parallel. I also tried the mock strategy. It will also fail in parallel. So I moved out that part of code to a new class called `TestCastFunctionsSpecTZCases` by specifying the timezone before the cluster starts up. It would be clear to the codes. ---
Re: [EXT] Re: Food for thought about intra-document operation
Hello Aman, AsterixDb seems to follow the standard SQL with a few minor modifications and add functions to ease aggregations (array_count, array_avg…) That would tend to confirm at least that the support of unnest is a good idea to improve Drill. Best regards Damien ** On 09/25/2017 07:53 PM, Aman Sinha wrote: Damien, thanks for initiating the discussion..indeed this would be a very useful enhancement. Currently, Drill provides repeated_contains() for filtering and repeated_count() for count aggregates on arrays but not the general purpose intra-document operations that you need based on your example. I haven't gone through all the alternatives but in addition to what you have described, you might also want to look at SQL++ ( https://ci.apache.org/projects/asterixdb/sqlpp/manual.html) which has been adopted by AsterixDB and has syntax extensions to SQL for unstructured data. -Aman On Mon, Sep 25, 2017 at 6:10 AM, Damien Profeta wrote: Hello, A few format handled by Drill enable to work with document, meaning nested and repeated structure instead of just tables. Json and Parquet are the two that come to my mind right now. Document modeling is a great way to express complex object and is used a lot in my company. Drill is able to handle them but unfortunately, it cannot make much computation on it. By computation I mean, filtering branches of the document, computing statistics (avg, min, max) on part of the document … That would be very useful as an analytic tools. _What can be done_ The question then is how to express the computation we want to do on the document. I have found multiple ways to handle that and I don't really know which one is the best hence the mail to expose what I have found to initiate discussion, maybe. First, in we look back at the Dremel paper which is the base of the parquet format and also one of the example for drill, dremel is adding the special keyword "WITHIN" to SQL to specify that the computation has to be done within a document. What is very powerful with this keyword is that it allows you to generate document and doesn't force you to flatten everything. You can find exemple of it usage in the google successor of Dremel: BigQuery and its documentation : https://cloud.google.com/bigqu ery/docs/legacy-nested-repeated. But it seems that it was problematic for Google, because they now propose a SQL that seems to be compliant with SQL 2011 for Bigquery to handle such computation. I am not familiar with SQL 2011 but it is told in BigQuery documentation to integrated the keywords for nested and repeated structure. You can have a view about how this is done in BigQuery here: https://cloud.google.com/bigquery/docs/reference/standard-sql/arrays . Basically, what I have seen is that they leverage UNNEST and ARRAY keyword and then are able to use JOIN or CROSS JOIN to describe the aggregation. In Impala, they have added a way to add a subquery on a complex type in such a way that the subquery only act intra-document. I have no idea if this is standard SQL or not. In page https://www.cloudera.com/docum entation/enterprise/5-5-x/topics/impala_complex_types.html#complex_types look at the phrase: “The subquery labelled SUBQ1 is correlated:” for example. In Presto, you can apply lambda function to map/array to transform the structure and apply filter on it. So you have filter, map_filter function to filter array and map respectively. (cf https://prestodb.io/docs/curre nt/functions/lambda.html#filter) _Example_ If I want to make a short example, let’s say we have a flight with a group of passengers in it. A document would be : { “flightnb”:1234, “group”:[{“age”:30,”gender”:”M”},{“age”:15,”gender”:”F”}, {“age”:10,”gender”:”F”},{“age”:30,”gender”:”F”}]} The database would be millions of such document and I want to know the average age of the male passenger for every flight. In Dremel, the query would be something like: select flightnb, avg(male_age) within record from (select groups.age as male_age from flight where group.gender = "M") With sql, it would be something like: select flightnb, avg(male_age) from (array(select g.age as male_age from unnest(group)as g where g.gender = "M") as male_age) With impala it would be something like: select flightnb, avg(male) from flight, select g.age from groups as g where g.gender = “M” as male With presto, it would be something like: select flightnb, avg(male) from flight, filter(group,x->x.gender = "M")as male I am not sure at all about my SQL queries but it should give you a rough idea about the different ways to express the inital query. So many different ways to express the same query… I would personally go for the SQL way of expressing things to implement it in Drill, especially because calcite is already able to parse unnest, array, but that’s only my first thought. Best regards, Damien
[GitHub] drill issue #959: DRILL-5816: Hash function produces skewed results on Strin...
Github user sohami commented on the issue: https://github.com/apache/drill/pull/959 @paul-rogers - I have added few tests and findings using hash32 and hash64 to compute the 32 bit hash codes in JIRA. ---