[jira] [Commented] (DRILL-5541) C++ Client Crashes During Simple "Man in the Middle" Attack Test with Exploitable Write AV

2017-06-05 Thread Rob Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16037936#comment-16037936
 ] 

Rob Wu commented on DRILL-5541:
---

I set up a proxy server that randomly tampers with the incoming data before 
returning it, to see whether the C++ client handles invalid data gracefully.

DrillClient         <-->   Proxy                                    <-->   Server
connect()           O--->
select * from Tab   O--->
                    <---   Flip random bits (do work on the data)   <---   Data
Process             X
Crash               X
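
The corruption step of such a proxy can be sketched as follows. This is only an illustration of the idea of flipping random bits in a response buffer; the class and method names (`BitFlipper`, `flipRandomBits`) are hypothetical and are not taken from the actual test setup, which is not shown in this thread.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical helper: corrupts a copy of a buffer the way a tampering proxy might. */
final class BitFlipper {
    private BitFlipper() {}

    /**
     * Returns a copy of data with up to `flips` randomly chosen bits inverted.
     * The same bit may be picked more than once, so fewer than `flips` bits
     * may end up differing from the input. The input array is never modified.
     */
    static byte[] flipRandomBits(byte[] data, int flips, Random rng) {
        byte[] out = Arrays.copyOf(data, data.length);
        if (out.length == 0) {
            return out; // nothing to corrupt
        }
        for (int i = 0; i < flips; i++) {
            int byteIndex = rng.nextInt(out.length);
            int bitIndex = rng.nextInt(8);
            out[byteIndex] ^= (byte) (1 << bitIndex);
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed sample payload; a real proxy would apply this to server responses.
        byte[] original = {0x10, 0x20, 0x30, 0x40};
        byte[] corrupted = flipRandomBits(original, 1, new Random(42));
        System.out.println(Arrays.toString(original));
        System.out.println(Arrays.toString(corrupted));
    }
}
```

Seeding the `Random` makes a crashing input reproducible, which helps when reporting a failure like the one in this issue.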

> C++ Client Crashes During Simple "Man in the Middle" Attack Test with 
> Exploitable Write AV
> --
>
> Key: DRILL-5541
> URL: https://issues.apache.org/jira/browse/DRILL-5541
> Project: Apache Drill
> Issue Type: Bug
> Components: Client - C++
> Affects Versions: 1.10.0
> Reporter: Rob Wu
> Priority: Minor
>
> drillClient!boost_sb::shared_ptr::reset+0xa7:
> 07fe`c292f827 f0ff4b08 lock dec dword ptr [rbx+8] ds:07fe`c2b3de78=c29e6060
> Exploitability Classification: EXPLOITABLE
> Recommended Bug Title: Exploitable - User Mode Write AV starting at 
> drillClient!boost_sb::shared_ptr::reset+0x00a7
>  (Hash=0x4ae7fdff.0xb15af658)
> User mode write access violations that are not near NULL are exploitable.
> ==
> Stack Trace:
> Child-SP  RetAddr   Call Site
> `030df630 07fe`c295bca1 
> drillClient!boost_sb::shared_ptr::reset+0xa7
>  
> [c:\users\bamboo\desktop\make_win_drill\sb_boost\include\boost-1_57\boost\smart_ptr\shared_ptr.hpp
>  @ 620]
> `030df680 07fe`c295433c 
> drillClient!Drill::DrillClientImpl::processSchemasResult+0x281 
> [c:\users\bamboo\desktop\make_win_drill\drill-1.10.0.1\drill-1.10.0.1\contrib\native\client\src\clientlib\drillclientimpl.cpp
>  @ 1227]
> `030df7a0 07fe`c294cbf6 
> drillClient!Drill::DrillClientImpl::handleRead+0x75c 
> [c:\users\bamboo\desktop\make_win_drill\drill-1.10.0.1\drill-1.10.0.1\contrib\native\client\src\clientlib\drillclientimpl.cpp
>  @ 1555]
> `030df9c0 07fe`c294ce9f 
> drillClient!boost_sb::asio::detail::win_iocp_socket_recv_op
>  
> >,boost_sb::asio::mutable_buffers_1,boost_sb::asio::detail::transfer_all_t,boost_sb::_bi::bind_t  char * __ptr64,boost_sb::system::error_code const & __ptr64,unsigned 
> __int64>,boost_sb::_bi::list4 __ptr64>,boost_sb::_bi::value __ptr64>,boost_sb::arg<1>,boost_sb::arg<2> > > > >::do_complete+0x166 
> [c:\users\bamboo\desktop\make_win_drill\sb_boost\include\boost-1_57\boost\asio\detail\win_iocp_socket_recv_op.hpp
>  @ 97]
> `030dfa90 07fe`c296009d 
> drillClient!boost_sb::asio::detail::win_iocp_io_service::do_one+0x27f 
> [c:\users\bamboo\desktop\make_win_drill\sb_boost\include\boost-1_57\boost\asio\detail\impl\win_iocp_io_service.ipp
>  @ 406]
> `030dfb70 07fe`c295ffc9 
> drillClient!boost_sb::asio::detail::win_iocp_io_service::run+0xad 
> [c:\users\bamboo\desktop\make_win_drill\sb_boost\include\boost-1_57\boost\asio\detail\impl\win_iocp_io_service.ipp
>  @ 164]
> `030dfbd0 07fe`c2aa5b53 
> drillClient!boost_sb::asio::io_service::run+0x29 
> [c:\users\bamboo\desktop\make_win_drill\sb_boost\include\boost-1_57\boost\asio\impl\io_service.ipp
>  @ 60]
> `030dfc10 07fe`c2ad3e03 drillClient!boost_sb::`anonymous 
> namespace'::thread_start_function+0x43
> `030dfc50 07fe`c2ad404e drillClient!_callthreadstartex+0x17 
> [f:\dd\vctools\crt\crtw32\startup\threadex.c @ 376]
> `030dfc80 `779e59cd drillClient!_threadstartex+0x102 
> [f:\dd\vctools\crt\crtw32\startup\threadex.c @ 354]
> `030dfcb0 `77c1a561 kernel32!BaseThreadInitThunk+0xd
> `030dfce0 ` ntdll!RtlUserThreadStart+0x1d
> ==
> Register:
> rax=0284bae0 rbx=07fec2b3de70 rcx=027ec210
> rdx=027ec210 rsi=027f2638 rdi=027f25d0
> rip=07fec292f827 rsp=030df630 rbp=027ec210
>  r8=027ec210  r9= r10=027d32fc
> r11=27eb001b0003 r12= r13=028035a0
> r14=027ec210 r15=
> iopl=0 nv up ei pl nz na pe nc
> cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b efl=00010200
> drillClient!boost_sb::shared_ptr::reset+0xa7:
> 07fe`c292f827 f0ff4b08 lock dec dword ptr [rbx+8] ds:07fe`c2b3de78=c29e6060



--
This message was sent by Atlassian JIRA

[jira] [Issue Comment Deleted] (DRILL-5541) C++ Client Crashes During Simple "Man in the Middle" Attack Test with Exploitable Write AV

2017-06-05 Thread Rob Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rob Wu updated DRILL-5541:
--
Comment: was deleted

(was: I set up a server)




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5541) C++ Client Crashes During Simple "Man in the Middle" Attack Test with Exploitable Write AV

2017-06-05 Thread Rob Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16037935#comment-16037935
 ] 

Rob Wu commented on DRILL-5541:
---

I set up a server



[jira] [Commented] (DRILL-5541) C++ Client Crashes During Simple "Man in the Middle" Attack Test with Exploitable Write AV

2017-06-05 Thread Parth Chandra (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16037912#comment-16037912
 ] 

Parth Chandra commented on DRILL-5541:
--

Curious to know how you created this issue. 



[jira] [Commented] (DRILL-5541) C++ Client Crashes During Simple "Man in the Middle" Attack Test with Exploitable Write AV

2017-06-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16037601#comment-16037601
 ] 

ASF GitHub Bot commented on DRILL-5541:
---

GitHub user superbstreak opened a pull request:

https://github.com/apache/drill/pull/850

DRILL-5541: C++ Client Crashes During Simple "Man in the Middle" Attack Test with Exploitable Write AV

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/superbstreak/drill DRILL-5541

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/850.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #850


commit 716db51df61d0ee47804217a6a133d1d1152b64a
Author: Rob Wu 
Date:   2017-06-05T21:06:33Z

DRILL-5541: C++ Client Crashes During Simple "Man in the Middle" Attack 
Test with Exploitable Write AV





[jira] [Updated] (DRILL-5568) Include hadoop-common jars inside drill-jdbc-all.jar

2017-06-05 Thread Sorabh Hamirwasia (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sorabh Hamirwasia updated DRILL-5568:
-
Reviewer: Paul Rogers

> Include hadoop-common jars inside drill-jdbc-all.jar
> 
>
> Key: DRILL-5568
> URL: https://issues.apache.org/jira/browse/DRILL-5568
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Reporter: Sorabh Hamirwasia
>Assignee: Sorabh Hamirwasia
>
> With SASL support in 1.10, authentication using username/password was 
> moved to the Plain mechanism of the SASL framework. A couple of Hadoop 
> classes, such as Configuration.java and UserGroupInformation.java, defined in 
> the hadoop-common package, are used in DrillClient for security mechanisms 
> such as Plain and Kerberos. Because of this, we need to add the Hadoop 
> dependency inside _drill-jdbc-all.jar_. Without it, an application using this 
> driver will fail to connect to Drill with authentication enabled.
> Today this jar (which is the JDBC driver for Drill) already bundles many other 
> dependencies that DrillClient relies on, such as Netty. These dependencies 
> are added under the *oadd* namespace so that an application using this driver 
> won't end up in conflict with its own version of the same dependencies. As 
> part of this JIRA, the hadoop-common dependencies will be included under the 
> same namespace. This will allow an application to connect to Drill 
> using this driver with security enabled.
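
One way to check whether a given driver jar on the classpath contains the relocated Hadoop classes is a small reflective probe, sketched below. It assumes that relocation simply prefixes the original package name with `oadd.`; the description above only says the dependencies live under the *oadd* namespace, so the exact layout is an assumption, and the `ShadedClassCheck` class is hypothetical.

```java
/** Hypothetical probe for relocated classes; the "oadd." prefix layout is an assumption. */
final class ShadedClassCheck {
    // Assumed relocation prefix, based on the "oadd" namespace named in the issue.
    static final String SHADE_PREFIX = "oadd.";

    private ShadedClassCheck() {}

    /** Maps an original class name to the name it would have after relocation. */
    static String relocatedName(String className) {
        return SHADE_PREFIX + className;
    }

    /** Returns true if the class can be found by this class's loader (without initializing it). */
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ShadedClassCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The two hadoop-common classes named in the issue description.
        String[] needed = {
            "org.apache.hadoop.conf.Configuration",
            "org.apache.hadoop.security.UserGroupInformation"
        };
        for (String name : needed) {
            String shaded = relocatedName(name);
            System.out.println(shaded + " loadable: " + isLoadable(shaded));
        }
    }
}
```

Run with the driver jar on the classpath; a `false` result for either class would be consistent with the connection failure described in the issue, though it does not by itself prove the cause.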





[jira] [Commented] (DRILL-5568) Include hadoop-common jars inside drill-jdbc-all.jar

2017-06-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16037583#comment-16037583
 ] 

ASF GitHub Bot commented on DRILL-5568:
---

GitHub user sohami opened a pull request:

https://github.com/apache/drill/pull/849

DRILL-5568: Include hadoop-common jars inside drill-jdbc-all.jar

More details on this PR are in 
[JIRA](https://issues.apache.org/jira/browse/DRILL-5568)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sohami/drill DRILL-5568

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/849.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #849


commit e84ce5bb6317e7a8caa50c7ffc85dfc416616596
Author: Sorabh Hamirwasia 
Date:   2017-06-05T20:45:27Z

DRILL-5568: Include hadoop-common jars inside drill-jdbc-all.jar






[jira] [Resolved] (DRILL-5567) Review changes for DRILL 5514

2017-06-05 Thread Karthikeyan Manivannan (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthikeyan Manivannan resolved DRILL-5567.
---
Resolution: Done

> Review changes for DRILL 5514
> -
>
> Key: DRILL-5567
> URL: https://issues.apache.org/jira/browse/DRILL-5567
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Karthikeyan Manivannan
>Assignee: Karthikeyan Manivannan
> Fix For: 1.11.0
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>






[jira] [Commented] (DRILL-5514) Enhance VectorContainer to merge two row sets

2017-06-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16037580#comment-16037580
 ] 

ASF GitHub Bot commented on DRILL-5514:
---

Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/837#discussion_r118797793
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/BatchSchema.java ---
@@ -157,4 +158,26 @@ private boolean majorTypeEqual(MajorType t1, MajorType t2) {
     return true;
   }
 
+  /**
+   * Merge two schema to produce a new, merged schema. The caller is responsible
+   * for ensuring that column names are unique. The order of the fields in the
+   * new schema is the same as that of this schema, with the other schema's fields
+   * appended in the order defined in the other schema. The resulting selection
+   * vector mode is the same as this schema. (That is, this schema is assumed to
+   * be the main part of the batch, possibly with a selection vector, with the
+   * other schema representing additional, new columns.)
+   * @param otherSchema the schema to merge with this one
+   * @return the new, merged, schema
+   */
+
+  public BatchSchema merge(BatchSchema otherSchema) {
+    if (otherSchema.selectionVectorMode != SelectionVectorMode.NONE &&
+        selectionVectorMode != otherSchema.selectionVectorMode) {
+      throw new IllegalArgumentException("Left schema must carry the selection vector mode");
--- End diff --

"Left schema must carry the same selection vector mode"  + "as the right 
schema"?


> Enhance VectorContainer to merge two row sets
> -
>
> Key: DRILL-5514
> URL: https://issues.apache.org/jira/browse/DRILL-5514
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.11.0
>
>
> Consider the concept of a "record batch" in Drill. On the one hand, one can 
> envision a record batch as a stack of records:
> {code}
> | a1 | b1 | c1 |
> 
> | a2 | b2 | c2 |
> {code}
> But, Drill is columnar. So a record batch is really a "bundle" of vectors:
> {code}
> | a1 | | b1 | | c1 |
> | a2 | | b2 | | c2 |
> {code}
> There are times when it is handy to build up a record batch as a merge of two 
> different vector bundles:
> {code}
> -- bundle 1 --    -- bundle 2 --
> | a1 | | b1 |     | c1 |
> | a2 | | b2 |     | c2 |
> {code}
> For example, consider a reader. The reader implementation might read columns 
> (a, b) from a file, say. Then, the "{{ScanBatch}}" might add (c) as an 
> implicit vector (the file name, say.) The merged set of vectors comprises the 
> final schema: (a, b, c).
> This ticket asks for the code to do the merge:
> * Merge two schemas A = (a, b), B = (c) to create schema C = (a, b, c).
> * Merge two vector containers C1 and C2 to create a new container, C3, that 
> holds the merger of the vectors from the first two.
> Clearly, the merge only makes sense if:
> * The two input containers have the same row count, and
> * The columns in each input container are distinct.
> Because this feature is also useful for tests, add the merge to the "row set" 
> tools also.
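
The two merges and their preconditions can be sketched in a simplified form. The example below is not Drill's `BatchSchema` or `VectorContainer` API: a schema is modeled as a list of column names and a container as an ordered map from column name to a list of values, and the `MergeSketch` class is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-ins: a schema is a list of names, a container maps names to columns. */
final class MergeSketch {
    private MergeSketch() {}

    /** Appends the columns of right to those of left; column names must be distinct. */
    static List<String> mergeSchemas(List<String> left, List<String> right) {
        List<String> merged = new ArrayList<>(left.size() + right.size());
        merged.addAll(left);
        for (String name : right) {
            if (merged.contains(name)) {
                throw new IllegalArgumentException("Duplicate column: " + name);
            }
            merged.add(name);
        }
        return merged;
    }

    /** Row count of a container: the length of its first column, or 0 if it has no columns. */
    static int rowCount(Map<String, List<Object>> container) {
        return container.isEmpty() ? 0 : container.values().iterator().next().size();
    }

    /** Builds a new container holding the columns of c1 followed by those of c2. */
    static Map<String, List<Object>> mergeContainers(
            Map<String, List<Object>> c1, Map<String, List<Object>> c2) {
        if (rowCount(c1) != rowCount(c2)) {
            throw new IllegalArgumentException("Row counts differ");
        }
        Map<String, List<Object>> c3 = new LinkedHashMap<>(c1);
        for (Map.Entry<String, List<Object>> column : c2.entrySet()) {
            if (c3.containsKey(column.getKey())) {
                throw new IllegalArgumentException("Duplicate column: " + column.getKey());
            }
            c3.put(column.getKey(), column.getValue());
        }
        return c3;
    }

    public static void main(String[] args) {
        // Schemas A = (a, b) and B = (c), as in the ticket description.
        System.out.println(mergeSchemas(Arrays.asList("a", "b"), Arrays.asList("c")));
    }
}
```

The sketch shares the column lists between the inputs and the result rather than copying them, loosely mirroring the idea that the merged container holds the same vectors as the first two; whether the real implementation shares or transfers vectors is not stated in this thread.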





[jira] [Commented] (DRILL-5514) Enhance VectorContainer to merge two row sets

2017-06-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16037581#comment-16037581
 ] 

ASF GitHub Bot commented on DRILL-5514:
---

Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/837#discussion_r120198724
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/BatchSchema.java ---
@@ -157,4 +158,26 @@ private boolean majorTypeEqual(MajorType t1, MajorType t2) {
     return true;
   }
 
+  /**
+   * Merge two schema to produce a new, merged schema. The caller is responsible
+   * for ensuring that column names are unique. The order of the fields in the
+   * new schema is the same as that of this schema, with the other schema's fields
+   * appended in the order defined in the other schema. The resulting selection
+   * vector mode is the same as this schema. (That is, this schema is assumed to
+   * be the main part of the batch, possibly with a selection vector, with the
+   * other schema representing additional, new columns.)
+   * @param otherSchema the schema to merge with this one
+   * @return the new, merged, schema
+   */
+
+  public BatchSchema merge(BatchSchema otherSchema) {
+    if (otherSchema.selectionVectorMode != SelectionVectorMode.NONE &&
+        selectionVectorMode != otherSchema.selectionVectorMode) {
+      throw new IllegalArgumentException("Left schema must carry the selection vector mode");
+    }
+    List mergedFields = new ArrayList<>();
--- End diff --

List mergedFields = new ArrayList(this.fields.size() +  
otherSchema.fields.size()) would avoid having to potentially grow the ArrayList 
twice.
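
The reviewer's point can be illustrated with a generic sketch: passing the combined size as the initial capacity means the backing array is allocated once, up front. The `PresizedMerge` class and `concat` method are hypothetical and only stand in for the field-list merge in the diff.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of presizing an ArrayList before appending two lists. */
final class PresizedMerge {
    private PresizedMerge() {}

    static <T> List<T> concat(List<T> first, List<T> second) {
        // The initial capacity covers both inputs, so neither addAll has to grow the list.
        List<T> merged = new ArrayList<>(first.size() + second.size());
        merged.addAll(first);
        merged.addAll(second);
        return merged;
    }

    public static void main(String[] args) {
        System.out.println(concat(Arrays.asList("a", "b"), Arrays.asList("c")));
    }
}
```

For the handful of fields in a typical schema the saving is small, so this is a tidiness improvement more than a performance fix.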




[jira] [Updated] (DRILL-5568) Include hadoop-common jars inside drill-jdbc-all.jar

2017-06-05 Thread Sorabh Hamirwasia (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sorabh Hamirwasia updated DRILL-5568:
-
Summary: Include hadoop-common jars inside drill-jdbc-all.jar  (was: 
Include Hadoop dependency jars inside drill-jdbc-all.jar)

> Include hadoop-common jars inside drill-jdbc-all.jar
> 
>
> Key: DRILL-5568
> URL: https://issues.apache.org/jira/browse/DRILL-5568
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Reporter: Sorabh Hamirwasia
>Assignee: Sorabh Hamirwasia
>
> With SASL support in 1.10, username/password authentication was moved to the 
> Plain mechanism of the SASL framework. A couple of Hadoop classes defined in 
> the hadoop-common package, such as Configuration.java and 
> UserGroupInformation.java, are used in DrillClient for security mechanisms 
> such as Plain and Kerberos. Because of this, we need to add the hadoop-common 
> dependency inside _drill-jdbc-all.jar_. Without it, an application using this 
> driver will fail to connect to Drill with authentication enabled.
> Today this jar (which is the JDBC driver for Drill) already bundles many other 
> dependencies that DrillClient relies on, such as Netty. These dependencies are 
> added under the *oadd* namespace so that an application using this driver 
> does not end up in conflict with its own version of the same dependencies. As 
> part of this JIRA, the hadoop-common dependencies will be included under the 
> same namespace. This will allow an application to connect to Drill using this 
> driver with security enabled. 





[jira] [Created] (DRILL-5568) Include Hadoop dependency jars inside drill-jdbc-all.jar

2017-06-05 Thread Sorabh Hamirwasia (JIRA)
Sorabh Hamirwasia created DRILL-5568:


 Summary: Include Hadoop dependency jars inside drill-jdbc-all.jar
 Key: DRILL-5568
 URL: https://issues.apache.org/jira/browse/DRILL-5568
 Project: Apache Drill
  Issue Type: Bug
  Components: Client - JDBC
Reporter: Sorabh Hamirwasia
Assignee: Sorabh Hamirwasia


With SASL support in 1.10, username/password authentication was moved to the 
Plain mechanism of the SASL framework. A couple of Hadoop classes defined in the 
hadoop-common package, such as Configuration.java and 
UserGroupInformation.java, are used in DrillClient for security mechanisms such 
as Plain and Kerberos. Because of this, we need to add the hadoop-common 
dependency inside _drill-jdbc-all.jar_. Without it, an application using this 
driver will fail to connect to Drill with authentication enabled.

Today this jar (which is the JDBC driver for Drill) already bundles many other 
dependencies that DrillClient relies on, such as Netty. These dependencies are 
added under the *oadd* namespace so that an application using this driver does 
not end up in conflict with its own version of the same dependencies. As part 
of this JIRA, the hadoop-common dependencies will be included under the same 
namespace. This will allow an application to connect to Drill using this driver 
with security enabled. 





[jira] [Updated] (DRILL-5567) Review changes for DRILL 5514

2017-06-05 Thread Karthikeyan Manivannan (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthikeyan Manivannan updated DRILL-5567:
--
Remaining Estimate: 2h
 Original Estimate: 2h

> Review changes for DRILL 5514
> -
>
> Key: DRILL-5567
> URL: https://issues.apache.org/jira/browse/DRILL-5567
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Karthikeyan Manivannan
>Assignee: Karthikeyan Manivannan
> Fix For: 1.11.0
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>






[jira] [Updated] (DRILL-5514) Enhance VectorContainer to merge two row sets

2017-06-05 Thread Karthikeyan Manivannan (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthikeyan Manivannan updated DRILL-5514:
--
Reviewer: Karthikeyan Manivannan  (was: Sorabh Hamirwasia)

> Enhance VectorContainer to merge two row sets
> -
>
> Key: DRILL-5514
> URL: https://issues.apache.org/jira/browse/DRILL-5514
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.11.0
>
>
> Consider the concept of a "record batch" in Drill. On the one hand, one can 
> envision a record batch as a stack of records:
> {code}
> | a1 | b1 | c1 |
> 
> | a2 | b2 | c2 |
> {code}
> But, Drill is columnar. So a record batch is really a "bundle" of vectors:
> {code}
> | a1 || b1 || c1 |
> | a2 || b2 || c2 |
> {code}
> There are times when it is handy to build up a record batch as a merge of two 
> different vector bundles:
> {code}
> -- bundle 1 ---- bundle 2 --
> | a1 || b1 || c1 |
> | a2 || b2 || c2 |
> {code}
> For example, consider a reader. The reader implementation might read columns 
> (a, b) from a file, say. Then, the "{{ScanBatch}}" might add (c) as an 
> implicit vector (the file name, say.) The merged set of vectors comprises the 
> final schema: (a, b, c).
> This ticket asks for the code to do the merge:
> * Merge two schemas A = (a, b), B = (c) to create schema C = (a, b, c).
> * Merge two vector containers C1 and C2 to create a new container, C3, that 
> holds the merger of the vectors from the first two.
> Clearly, the merge only makes sense if:
> * The two input containers have the same row count, and
> * The columns in each input container are distinct.
> Because this feature is also useful for tests, add the merge to the "row set" 
> tools also.
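
The merge rules in the description can be sketched in a few lines of Java. This 
is only an illustration under stated assumptions: RowSetMergeSketch, Bundle and 
merge are hypothetical names, a bundle is reduced to its column names plus a 
row count, and none of this is Drill's actual VectorContainer API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class RowSetMergeSketch {
  /** Hypothetical stand-in for a vector bundle: column names plus a row count. */
  static final class Bundle {
    final List<String> columns;
    final int rowCount;

    Bundle(List<String> columns, int rowCount) {
      this.columns = columns;
      this.rowCount = rowCount;
    }
  }

  /** Merges two bundles, enforcing the two preconditions from the ticket. */
  static Bundle merge(Bundle left, Bundle right) {
    if (left.rowCount != right.rowCount) {
      throw new IllegalArgumentException("Row counts differ");
    }
    // Reject a right-hand column whose name was already seen, either on the
    // left or earlier on the right.
    Set<String> seen = new HashSet<>(left.columns);
    for (String name : right.columns) {
      if (!seen.add(name)) {
        throw new IllegalArgumentException("Duplicate column: " + name);
      }
    }
    List<String> merged = new ArrayList<>(left.columns.size() + right.columns.size());
    merged.addAll(left.columns);
    merged.addAll(right.columns);
    return new Bundle(merged, left.rowCount);
  }

  public static void main(String[] args) {
    Bundle a = new Bundle(Arrays.asList("a", "b"), 2);
    Bundle b = new Bundle(Arrays.asList("c"), 2);
    System.out.println(merge(a, b).columns);
  }
}
```

In a real implementation the merged container would share the underlying 
vectors rather than copy names, but the two checks would stay the same.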





[jira] [Created] (DRILL-5567) Review changes for DRILL 5514

2017-06-05 Thread Karthikeyan Manivannan (JIRA)
Karthikeyan Manivannan created DRILL-5567:
-

 Summary: Review changes for DRILL 5514
 Key: DRILL-5567
 URL: https://issues.apache.org/jira/browse/DRILL-5567
 Project: Apache Drill
  Issue Type: Sub-task
Reporter: Karthikeyan Manivannan
Assignee: Karthikeyan Manivannan








[jira] [Updated] (DRILL-5565) Directory Query fails with Permission denied: access=EXECUTE if dirN name is 'year=2017' or 'month=201704'

2017-06-05 Thread ehur (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ehur updated DRILL-5565:

Environment: CentOS release 6.8

> Directory Query fails with Permission denied: access=EXECUTE if dirN name is 
> 'year=2017' or 'month=201704'
> --
>
> Key: DRILL-5565
> URL: https://issues.apache.org/jira/browse/DRILL-5565
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill, SQL Parser
>Affects Versions: 1.6.0
> Environment: CentOS release 6.8
>Reporter: ehur
>
> Running a query like this works fine when the dir0 name contains only digits:
> select * from all.my.records
> where dir0 >= '20170322'
> limit 10;
> If the dirN directory is named according to the convention year=2017, we get 
> one of the following problems:
> 1. Either "system error permission denied" in:
> select * from all.my.records
> where dir0 >= 'year=2017'
> limit 10;
>  SYSTEM ERROR: RemoteException: Permission denied: user=myuser, 
> access=EXECUTE,
> inode: 
> /user/myuser/all/my/records/year=2017/month=201701/day=20170101/application_1485464650247_1917/part-r-0.gz.parquet":myuser:supergroup:-rw-r--r--
>   at 
> org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257)
>   at 
> org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238)
>   at 
> org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkTraverse(DefaultAuthorizationProvider.java:180)
>   at 
> org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:137)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:138)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6609)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:4223)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:894)
>   at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getFileInfo(AuthorizationProviderProxyClientProtocol.java:526)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:822)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
> 2. Or, if the WHERE clause specifies only the numeric part of the directory 
> name, the query does not fail, but it does not return the relevant data 
> either, since that value does not match the path to our data:
> select * from all.my.records
> where dir0 >= '2017'
> limit 10;





[jira] [Commented] (DRILL-5546) Schema change problems caused by empty batch

2017-06-05 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16037481#comment-16037481
 ] 

Paul Rogers commented on DRILL-5546:


Agreed. Just to clarify, the idea of a "non-existent batch" is confusing. Yes, 
NONE means that there are no more batches. If the only outcome is NONE ("fast 
NONE"), then that means that there is no output: not a batch with empty schema, 
rather that there is no batch at all (a null batch.) That's what the ∧ is 
supposed to mean...

> Schema change problems caused by empty batch
> 
>
> Key: DRILL-5546
> URL: https://issues.apache.org/jira/browse/DRILL-5546
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jinfeng Ni
>Assignee: Jinfeng Ni
>
> There have been a few JIRAs opened related to schema change failures caused by 
> empty batches. This JIRA is opened as an umbrella for all those related JIRAs 
> (such as DRILL-4686, DRILL-4734, DRILL-4476, DRILL-4255, etc.).
>  





[jira] [Created] (DRILL-5566) AssertionError: Internal error: invariant violated: call to wrong operator

2017-06-05 Thread Khurram Faraaz (JIRA)
Khurram Faraaz created DRILL-5566:
-

 Summary: AssertionError: Internal error: invariant violated: call 
to wrong operator
 Key: DRILL-5566
 URL: https://issues.apache.org/jira/browse/DRILL-5566
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow
Affects Versions: 1.11.0
Reporter: Khurram Faraaz


CHARACTER_LENGTH is a non-reserved keyword as per the SQL specification. It is 
a monadic function that accepts exactly one operand or parameter.

{noformat}
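<numeric value function> ::=
    <position expression>
  | <regex occurrences function>
  | <regex position expression>
  | <extract expression>
  | <length expression>
  ...
  ...

<length expression> ::=
    <char length expression>
  | <octet length expression>
<char length expression> ::=
  { CHAR_LENGTH | CHARACTER_LENGTH } <left paren> <character value expression>
  [ USING <char length units> ] <right paren>
...
...
<char length units> ::=
    CHARACTERS
  | OCTETS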
{noformat}

Drill reports an assertion error in drillbit.log when the character_length 
function is used in a SQL query.
{noformat}
0: jdbc:drill:schema=dfs.tmp> select character_length(cast('hello' as 
varchar(10))) col1 from (values(1));
Error: SYSTEM ERROR: AssertionError: Internal error: invariant violated: call 
to wrong operator


[Error Id: 49198839-5a1b-4786-9257-59739b27d2a8 on centos-01.qa.lab:31010]

  (org.apache.drill.exec.work.foreman.ForemanException) Unexpected exception 
during fragment initialization: Internal error: invariant violated: call to 
wrong operator
org.apache.drill.exec.work.foreman.Foreman.run():297
java.util.concurrent.ThreadPoolExecutor.runWorker():1145
java.util.concurrent.ThreadPoolExecutor$Worker.run():615
java.lang.Thread.run():745
Caused By (java.lang.AssertionError) Internal error: invariant violated: call 
to wrong operator
org.apache.calcite.util.Util.newInternal():777
org.apache.calcite.util.Util.permAssert():885
org.apache.calcite.sql2rel.ReflectiveConvertletTable$3.convertCall():219
org.apache.calcite.sql2rel.SqlNodeToRexConverterImpl.convertCall():59
org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit():4148
org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit():3581
org.apache.calcite.sql.SqlCall.accept():130

org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.convertExpression():4040
org.apache.calcite.sql2rel.StandardConvertletTable$8.convertCall():185
org.apache.calcite.sql2rel.SqlNodeToRexConverterImpl.convertCall():59
org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit():4148
org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit():3581
org.apache.calcite.sql.SqlCall.accept():130

org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.convertExpression():4040
org.apache.calcite.sql2rel.SqlToRelConverter.convertSelectList():3411
org.apache.calcite.sql2rel.SqlToRelConverter.convertSelectImpl():612
org.apache.calcite.sql2rel.SqlToRelConverter.convertSelect():568
org.apache.calcite.sql2rel.SqlToRelConverter.convertQueryRecursive():2773
org.apache.calcite.sql2rel.SqlToRelConverter.convertQuery():522
org.apache.drill.exec.planner.sql.SqlConverter.toRel():269

org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToRel():623

org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateAndConvert():195
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan():164
org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan():131
org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan():79
org.apache.drill.exec.work.foreman.Foreman.runSQL():1050
org.apache.drill.exec.work.foreman.Foreman.run():280
java.util.concurrent.ThreadPoolExecutor.runWorker():1145
java.util.concurrent.ThreadPoolExecutor$Worker.run():615
java.lang.Thread.run():745 (state=,code=0)
{noformat}

Calcite supports the character_length function:
{noformat}
[root@centos-0170 csv]# ./sqlline
sqlline version 1.1.9
sqlline> !connect jdbc:calcite:model=target/test-classes/model.json admin admin
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
details.
0: jdbc:calcite:model=target/test-classes/mod> select 
character_length(cast('hello' as varchar(10))) col1 from (values(1));
+------+
| COL1 |
+------+
| 5    |
+------+
1 row selected (1.379 seconds)
{noformat}

Postgres 9.3 also supports the character_length function:
{noformat}
postgres=# select character_length(cast('hello' as varchar(10))) col1 from 
(values(1)) foo;
 col1 
------
    5
(1 row)
{noformat}





[jira] [Created] (DRILL-5565) Directory Query fails with Permission denied: access=EXECUTE if dirN name is 'year=2017' or 'month=201704'

2017-06-05 Thread ehur (JIRA)
ehur created DRILL-5565:
---

 Summary: Directory Query fails with Permission denied: 
access=EXECUTE if dirN name is 'year=2017' or 'month=201704'
 Key: DRILL-5565
 URL: https://issues.apache.org/jira/browse/DRILL-5565
 Project: Apache Drill
  Issue Type: Bug
  Components: Functions - Drill, SQL Parser
Affects Versions: 1.6.0
Reporter: ehur


Running a query like this works fine when the dir0 name contains only digits:
select * from all.my.records
where dir0 >= '20170322'
limit 10;

If the dirN directory is named according to the convention year=2017, we get 
one of the following problems:

1. Either "system error permission denied" in:
select * from all.my.records
where dir0 >= 'year=2017'
limit 10;

 SYSTEM ERROR: RemoteException: Permission denied: user=myuser, access=EXECUTE,
inode: 
/user/myuser/all/my/records/year=2017/month=201701/day=20170101/application_1485464650247_1917/part-r-0.gz.parquet":myuser:supergroup:-rw-r--r--

at 
org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257)
at 
org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238)
at 
org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkTraverse(DefaultAuthorizationProvider.java:180)
at 
org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:137)
at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:138)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6609)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:4223)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:894)
at 
org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getFileInfo(AuthorizationProviderProxyClientProtocol.java:526)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:822)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)

2. Or, if the WHERE clause specifies only the numeric part of the directory 
name, the query does not fail, but it does not return the relevant data either, 
since that value does not match the path to our data:
select * from all.my.records
where dir0 >= '2017'
limit 10;





[jira] [Created] (DRILL-5564) IllegalStateException: allocator[op:21:1:5:HashJoinPOP]: buffer space (16674816) + prealloc space (0) + child space (0) != allocated (16740352)

2017-06-05 Thread Khurram Faraaz (JIRA)
Khurram Faraaz created DRILL-5564:
-

 Summary: IllegalStateException: allocator[op:21:1:5:HashJoinPOP]: 
buffer space (16674816) + prealloc space (0) + child space (0) != allocated 
(16740352)
 Key: DRILL-5564
 URL: https://issues.apache.org/jira/browse/DRILL-5564
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow
Affects Versions: 1.11.0
 Environment: 3 node CentOS cluster
Reporter: Khurram Faraaz


Steps to reproduce:
1. Run a concurrent Java program that executes TPC-DS query 11.
2. While that program is running, stop the foreman Drillbit from another shell:
./bin/drillbit.sh stop

The drillbit.log then shows the IllegalStateException from the summary 
(allocator[op:21:1:5:HashJoinPOP]) and a second assertion error:
AssertionError: Failure while stopping processing for operator id 10. Currently 
have states of processing:false, setup:false, waiting:true.

Drill 1.11.0 git commit ID: d11aba2 (with assertions enabled)
 
details from drillbit.log from the foreman Drillbit node.
{noformat}
2017-06-05 18:38:33,838 [26ca5afa-7f6d-991b-1fdf-6196faddc229:frag:23:1] INFO  
o.a.d.e.w.fragment.FragmentExecutor - 
26ca5afa-7f6d-991b-1fdf-6196faddc229:23:1: State change requested RUNNING --> 
FAILED
2017-06-05 18:38:33,849 [26ca5afa-7f6d-991b-1fdf-6196faddc229:frag:23:1] INFO  
o.a.d.e.w.fragment.FragmentExecutor - 
26ca5afa-7f6d-991b-1fdf-6196faddc229:23:1: State change requested FAILED --> 
FINISHED
2017-06-05 18:38:33,852 [26ca5afa-7f6d-991b-1fdf-6196faddc229:frag:23:1] ERROR 
o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: AssertionError: Failure 
while stopping processing for operator id 10. Currently have states of 
processing:false, setup:false, waiting:true.

Fragment 23:1

[Error Id: a116b326-43ed-4569-a20e-a10ba03d215e on centos-01.qa.lab:31010]
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: AssertionError: 
Failure while stopping processing for operator id 10. Currently have states of 
processing:false, setup:false, waiting:true.

Fragment 23:1

[Error Id: a116b326-43ed-4569-a20e-a10ba03d215e on centos-01.qa.lab:31010]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544)
 ~[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:295)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:160)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:264)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) 
[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
[na:1.8.0_91]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
[na:1.8.0_91]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
Caused by: java.lang.RuntimeException: java.lang.AssertionError: Failure while 
stopping processing for operator id 10. Currently have states of 
processing:false, setup:false, waiting:true.
at 
org.apache.drill.common.DeferredException.addThrowable(DeferredException.java:101)
 ~[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.fail(FragmentExecutor.java:409)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:250)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
... 4 common frames omitted
Caused by: java.lang.AssertionError: Failure while stopping processing for 
operator id 10. Currently have states of processing:false, setup:false, 
waiting:true.
at 
org.apache.drill.exec.ops.OperatorStats.stopProcessing(OperatorStats.java:167) 
~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:255) 
~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215)
 ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
 ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
 ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleR

[jira] [Comment Edited] (DRILL-5546) Schema change problems caused by empty batch

2017-06-05 Thread Jinfeng Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16037290#comment-16037290
 ] 

Jinfeng Ni edited comment on DRILL-5546 at 6/5/17 6:40 PM:
---

Thanks for putting together a set of formal definitions of terms, which should 
clear up confusion in further discussion.

In current Drill execution, as well as in this proposal, *NONE* simply means 
the end of input; there are no more batches coming. It should not be used to 
represent any batch. Drill's iterator framework already has the code to handle 
'NONE'.  This proposal just suggests that we return *NONE* directly if the 
data source does not have any schema/data.  This is different from what 
Drill currently does: return an *OK_NEW_SCHEMA* with a trivial result set 
(injected with a nullable-int column), followed by a 'NONE'.  

Using the notation you defined, previously Drill had 

{{protocol : (OK_NEW_SCHEMA OK\*)\+ NONE}}

Now we have:

{{protocol : (OK_NEW_SCHEMA OK\*)\* NONE}}

Some operators seem to work fine under the protocol change; others, such as 
Join and UnionAll, may not.



was (Author: jni):
Thanks for putting a set of form definitions of terms, which would clear 
confusion.

In current Drill execution and as well as in this proposal, *NONE* simply means 
the end of input; there is no more batch coming. It should not be used to 
represent any batch. The Drill's iterator framework has the code to handle 
'NONE'.  This proposal is just to suggest we return *NONE* directly, if the 
data source does not have any schema/data.  This is different from what 
currently Drill is doing: return a *OK_NEW_SCHEMA* with a trivial result set 
(injected with nullable-int column), followed by a 'NONE'.  

Using the notation you defined, previously Drill has 

{{protocol : (OK_NEW_SCHEMA OK\*)\+ NONE}}

Now we have:

{{protocol : (OK_NEW_SCHEMA OK\*)\* NONE}}

Some operators seems to work fine under the protocol change, some operators 
such as Join, UnionAll may not, due to the above protocol changes.   


> Schema change problems caused by empty batch
> 
>
> Key: DRILL-5546
> URL: https://issues.apache.org/jira/browse/DRILL-5546
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jinfeng Ni
>Assignee: Jinfeng Ni
>
> There have been a few JIRAs opened related to schema change failures caused by 
> empty batches. This JIRA is opened as an umbrella for all those related JIRAs 
> (such as DRILL-4686, DRILL-4734, DRILL-4476, DRILL-4255, etc.).
>  





[jira] [Commented] (DRILL-5546) Schema change problems caused by empty batch

2017-06-05 Thread Jinfeng Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16037290#comment-16037290
 ] 

Jinfeng Ni commented on DRILL-5546:
---

Thanks for putting together a set of formal definitions of terms, which should 
clear up confusion.

In current Drill execution, as well as in this proposal, *NONE* simply means 
the end of input; there are no more batches coming. It should not be used to 
represent any batch. Drill's iterator framework already has the code to handle 
'NONE'.  This proposal just suggests that we return *NONE* directly if the 
data source does not have any schema/data.  This is different from what 
Drill currently does: return an *OK_NEW_SCHEMA* with a trivial result set 
(injected with a nullable-int column), followed by a 'NONE'.  

Using the notation you defined, previously Drill had 

{{protocol : (OK_NEW_SCHEMA OK\*)\+ NONE}}

Now we have:

{{protocol : (OK_NEW_SCHEMA OK\*)\* NONE}}

Some operators seem to work fine under the protocol change; others, such as 
Join and UnionAll, may not.
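
The difference between the two protocol forms can be made concrete with a small 
checker. This is a sketch under stated assumptions: ProtocolSketch, Outcome and 
conforms are hypothetical names, and the enum is a local stand-in that only 
mirrors the three outcome names used above; it is not Drill's iterator code.

```java
import java.util.Arrays;
import java.util.List;

final class ProtocolSketch {
  /** Local stand-in for the three outcomes named in the comment. */
  enum Outcome { OK_NEW_SCHEMA, OK, NONE }

  /**
   * Checks a complete outcome sequence against (OK_NEW_SCHEMA OK*)* NONE,
   * or against (OK_NEW_SCHEMA OK*)+ NONE when requireSchema is true.
   */
  static boolean conforms(List<Outcome> outcomes, boolean requireSchema) {
    boolean sawSchema = false;
    for (int i = 0; i < outcomes.size(); i++) {
      switch (outcomes.get(i)) {
        case OK_NEW_SCHEMA:
          sawSchema = true;
          break;
        case OK:
          if (!sawSchema) {
            return false; // an OK batch must follow some OK_NEW_SCHEMA
          }
          break;
        case NONE:
          // NONE must be the last outcome; the "+" form also needs a schema.
          return i == outcomes.size() - 1 && (sawSchema || !requireSchema);
      }
    }
    return false; // the sequence never ended with NONE
  }

  public static void main(String[] args) {
    List<Outcome> fastNone = Arrays.asList(Outcome.NONE);
    System.out.println(conforms(fastNone, true));  // old protocol
    System.out.println(conforms(fastNone, false)); // proposed protocol
  }
}
```

A lone NONE ("fast NONE") is rejected by the old form and accepted by the new 
one, which is exactly the case downstream operators such as Join and UnionAll 
would have to learn to handle.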


> Schema change problems caused by empty batch
> 
>
> Key: DRILL-5546
> URL: https://issues.apache.org/jira/browse/DRILL-5546
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jinfeng Ni
>Assignee: Jinfeng Ni
>
> There have been a few JIRAs opened related to schema change failures caused by 
> empty batches. This JIRA is opened as an umbrella for all those related JIRAs 
> (such as DRILL-4686, DRILL-4734, DRILL-4476, DRILL-4255, etc.).
>  





[jira] [Comment Edited] (DRILL-5546) Schema change problems caused by empty batch

2017-06-05 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16036141#comment-16036141
 ] 

Paul Rogers edited comment on DRILL-5546 at 6/5/17 3:57 PM:


I believe we are saying basically the same thing. To be certain, [watch out, 
we're gonna try some theory|https://store.xkcd.com/products/try-science].

h3. Basics

We'll need some terminology, defined in the usual way:

* (a, b, c) is a set that contains a, b, c
* \{a:b} is a map from a to b
* \[a, b, c] is an ordered list of a, b, c, where each element has an index i = 
0, 1, 2...
* empty = () or {} or \[] is the empty collection (of the proper type)
* null = SQL null: we don't know what the value is
* ∧ = Java, C, etc. null: we know the value, and it is nothing

Drill is, at its core, relational. A relation R can be defined as:

{{R = (S, T)}}

Where:

* S is the schema
* T is a set of tuples (t ~1~, t ~2~, t ~3~) (AKA a table)

{{S = (C, N)}}

Where:

* C is the list of column schemas:

{{C = \[ c ~0~, c ~1~, c ~2~, ...]}}
{{c = (name, type)}}

And:

* N is a map from name to column index:

{{N = \{name : i\} }} where i is the index of column c ∈ C

Drill defines the idea of _compatible_ schemas. To test compatibility, we 
redefine the schema as a set of columns:

{{S’ = (c ~0~, c ~1~, …)}}

Two schemas are compatible iff their column sets are identical (same name and 
type for each column). This is a bit more forgiving than the traditional 
relational model, which requires that the ordered lists of column schemas be 
identical. Let's assume this rule as we discuss schemas below.
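
As a rough illustration of this rule, compatibility can be written as set 
equality over (name, type) pairs. The names SchemaCompatSketch, Column and 
compatible are hypothetical, and types are plain strings here; this is a 
sketch of the definition, not Drill's schema classes.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;

final class SchemaCompatSketch {
  /** Column schema c = (name, type); both are plain strings in this sketch. */
  static final class Column {
    final String name;
    final String type;

    Column(String name, String type) {
      this.name = name;
      this.type = type;
    }

    @Override
    public boolean equals(Object other) {
      if (!(other instanceof Column)) {
        return false;
      }
      Column that = (Column) other;
      return name.equals(that.name) && type.equals(that.type);
    }

    @Override
    public int hashCode() {
      return Objects.hash(name, type);
    }
  }

  /** Compatible iff the two column sets are equal; column order is ignored. */
  static boolean compatible(List<Column> left, List<Column> right) {
    return new HashSet<>(left).equals(new HashSet<>(right));
  }

  public static void main(String[] args) {
    List<Column> ab = Arrays.asList(new Column("a", "INT"), new Column("b", "VARCHAR"));
    List<Column> ba = Arrays.asList(new Column("b", "VARCHAR"), new Column("a", "INT"));
    System.out.println(compatible(ab, ba));
  }
}
```

Comparing the ordered lists directly with List.equals would give the stricter, 
traditional relational rule instead.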

We'll also need the idea of _cardinality_:

{{\|S| = n}}

Says that the cardinality (number of items) of S is _n_. Later, we'll just use 
_n_ to mean a schema (or relation or whatever) that has n items.

h3. Relations and Multi-Relations

A relation can be:

{{R : ∧}} (programming null) -- the relation simply does not exist.
{{  | (0,0)}} -- the trivial relation of no columns and (by 
definition), no rows.
{{  | (s,0)}} -- a relation with some schema s, |s| ≠ 0, and no rows
{{  | (s,n)}} -- a "normal" relation with schema and n tuples (rows) 
of data, \|s| ≠ 0, n ≠ 0

It is helpful to remember some basic relational algebra:

{{R(s,0) ⋃ R(s,n) = R(s,n)}}
{{R(s,n) ⋃ R(s,m) = R(s,n+m)}}

Drill works not just with relations R, but also with "multi-relations" (to make 
up a term). That is, each schema change in Drill introduces a new relation, so 
that the whole result set from a query is a series of relations. Let's define a 
"multi-relation" M using semi-BNF as:

{{M : ∧}} -- undefined result set
{{  | R(0,0)}} -- trivial result set
{{  | R(s,0)}} -- empty result set, \|s| ≠ 0
{{  | R(s,n)}} -- normal, single result set, n  ≠ 0
{{  | R ~1~(s ~1~,n), R ~2~(s ~2~,m), ...}} -- multi-relation if s 
~i~ ≠ s ~j~

Normally when we say multi-relation, we mean the last case: two or more 
relations with distinct schemas. The condition above says that two adjacent 
relations R ~i~ and R ~j~ must have distinct schemas. (By the rules above, two 
adjacent relations with the same schema just collapse into a single relation 
with that schema. A schema can repeat, but a different schema must occur 
between repetitions.)

h3. Multi-Relations in Drill

Now let's get to Drill. Drill uses the term "schema change" to mean the 
transition from s ~i~ to s ~j~, s ~i~ ≠ s ~j~.

The essence of the proposal here, as I understand it, is to update the 
implementation to fully support the first three definitions of a multi-relation 
(undefined, trivial and empty), assuming that Drill already supports the other 
two definitions (single relation and multi-relation.)

In Drill, relations are broken into batches, which are, themselves, just 
relations. Thus a batch, B, can be:

{{B : ∧}} -- undefined batch
{{  | R(0,0)}} -- trivial batch
{{  | R(s,0)}} -- empty batch, \|s| ≠ 0
{{  | R(s,n)}} -- normal batch, \|s| ≠ 0, n ≠ 0

And the whole result set D (for Drill) is an ordered list of batches:

{{D = \[B ~1~, B ~2~, ..., B ~n~]}}

Where

{{B ~i~ = (s ~i~, t ~i~)}}

As noted above, if adjacent batches have the same schema, then they are just 
sub-relations within a single larger (logical) relation. But, if the schemas 
differ, then the adjacent batches are the last and first of two distinct 
relations within a larger (logical) multi-relation. Said another way:

{{D = R ~1~, R ~2~}}
{{R ~i~ = B ~i,1~, B ~i,2~, ...}}

To clarify, let’s visualize the schema changes as ∆~i~:

{{D = \[B ~0~(s ~0~,t), B ~1~(s ~0~,t), … ∆~1~, B ~i~(s ~1~,t), B ~i+1~(s 
~1~,t), …]}}

The above sequence can describe any series of batches. As I understand the 
proposal, we want to put some constraints on the sequence.
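
To make the batch-to-relation grouping in this section concrete, here is a minimal Python sketch. It assumes a batch is a (schema, rows) pair and treats two schemas as equal when their column sets match; {{to_relations}} is a name invented for this example, not anything in Drill.

```python
from itertools import groupby
from typing import List, Tuple

Column = Tuple[str, str]             # (name, type)
Batch = Tuple[List[Column], list]    # hypothetical representation: (schema, rows)


def to_relations(batches: List[Batch]) -> List[Batch]:
    """Merge runs of adjacent batches that share a compatible schema."""
    relations = []
    # groupby only groups *adjacent* items with the same key.
    for _, run in groupby(batches, key=lambda b: frozenset(b[0])):
        run = list(run)
        schema = run[0][0]
        rows = [row for _, batch_rows in run for row in batch_rows]
        relations.append((schema, rows))
    return relations


s0 = [("a", "INT")]
s1 = [("a", "INT"), ("b", "VARCHAR")]
D = [(s0, [(1,), (2,)]), (s0, [(3,)]), (s1, [(4, "x")]), (s0, [(5,)])]
R = to_relations(D)
schema_changes = max(len(R) - 1, 0)
```

Here the four batches collapse into three relations with two schema changes: s0 repeats, but only after s1 has occurred in between. Note that in this sketch an empty batch with a different schema would still start a new relation.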

h3. Results of a Drill Query

Let’s start by defining how we should present the multi-relation to the client. 
The client also receives batches, but, following the rules in the proposal, we 
wan

[jira] [Created] (DRILL-5563) Stop non foreman Drillbit results in IllegalStateException: Allocator[ROOT] closed with outstanding child allocators.

2017-06-05 Thread Khurram Faraaz (JIRA)
Khurram Faraaz created DRILL-5563:
-

 Summary: Stop non foreman Drillbit results in 
IllegalStateException: Allocator[ROOT] closed with outstanding child allocators.
 Key: DRILL-5563
 URL: https://issues.apache.org/jira/browse/DRILL-5563
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow
Affects Versions: 1.11.0
 Environment: 3 node CentOS cluster
Reporter: Khurram Faraaz


Stopping the non-foreman Drillbit normally (as shown below) results in 
IllegalStateException: Allocator[ROOT] closed with outstanding child allocators.

/opt/mapr/drill/drill-1.11.0/bin/drillbit.sh stop

Drill 1.11.0 commit ID: d11aba2

Details from drillbit.log
{noformat}
Mon Jun  5 09:29:09 UTC 2017 Terminating drillbit pid 28182
2017-06-05 09:29:09,651 [Drillbit-ShutdownHook#0] INFO  
o.apache.drill.exec.server.Drillbit - Received shutdown request.
2017-06-05 09:29:11,691 [pool-6-thread-1] INFO  
o.a.drill.exec.rpc.user.UserServer - closed eventLoopGroup 
io.netty.channel.nio.NioEventLoopGroup@55511dc2 in 1004 ms
2017-06-05 09:29:11,691 [pool-6-thread-2] INFO  
o.a.drill.exec.rpc.data.DataServer - closed eventLoopGroup 
io.netty.channel.nio.NioEventLoopGroup@4078d750 in 1004 ms
2017-06-05 09:29:11,692 [pool-6-thread-1] INFO  
o.a.drill.exec.service.ServiceEngine - closed userServer in 1005 ms
2017-06-05 09:29:11,692 [pool-6-thread-2] INFO  
o.a.drill.exec.service.ServiceEngine - closed dataPool in 1005 ms
2017-06-05 09:29:11,701 [Drillbit-ShutdownHook#0] INFO  
o.a.drill.exec.compile.CodeCompiler - Stats: code gen count: 21, cache miss 
count: 7, hit rate: 67%
2017-06-05 09:29:11,709 [Drillbit-ShutdownHook#0] ERROR 
o.a.d.exec.server.BootStrapContext - Error while closing
java.lang.IllegalStateException: Allocator[ROOT] closed with outstanding child 
allocators.
Allocator(ROOT) 0/800/201359872/17179869184 (res/actual/peak/limit)
  child allocators: 4
Allocator(frag:3:2) 200/0/0/200 (res/actual/peak/limit)
  child allocators: 0
  ledgers: 0
  reservations: 0
Allocator(frag:4:2) 200/0/0/200 (res/actual/peak/limit)
  child allocators: 0
  ledgers: 0
  reservations: 0
Allocator(frag:1:2) 200/0/0/200 (res/actual/peak/limit)
  child allocators: 0
  ledgers: 0
  reservations: 0
Allocator(frag:2:2) 200/0/0/200 (res/actual/peak/limit)
  child allocators: 0
  ledgers: 0
  reservations: 0
  ledgers: 0
  reservations: 0

at 
org.apache.drill.exec.memory.BaseAllocator.close(BaseAllocator.java:492) 
~[drill-memory-base-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:76) 
[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:64) 
[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.server.BootStrapContext.close(BootStrapContext.java:247) 
~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:76) 
[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:64) 
[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at org.apache.drill.exec.server.Drillbit.close(Drillbit.java:159) 
[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.server.Drillbit$ShutdownThread.run(Drillbit.java:253) 
[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
2017-06-05 09:29:11,709 [Drillbit-ShutdownHook#0] INFO  
o.apache.drill.exec.server.Drillbit - Shutdown completed (2057 ms).
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (DRILL-5546) Schema change problems caused by empty batch

2017-06-05 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16036141#comment-16036141
 ] 

Paul Rogers edited comment on DRILL-5546 at 6/5/17 7:31 AM:


I believe we are saying basically the same thing. To be certain, [watch out, 
we're gonna try some theory|https://store.xkcd.com/products/try-science].

h3. Basics

We'll need some terminology, defined in the usual way:

* (a, b, c) is a set that contains a, b, c
* \{a:b} is a map from a to b
* \[a, b, c] is an ordered list of a, b, c, where each element has an index i = 
0, 1, 2...
* empty = () or {} or \[] is the empty collection (of the proper type)
* null = SQL null: we don't know what the value is
* ∧ = Java, C, etc. null: we know the value, and it is nothing

Drill is, at its core, relational. A relation R can be defined as:

{{R = (S, T)}}

Where:

* S is the schema
* T is a set of tuples (t ~1~, t ~2~, t ~3~) (AKA a table)

{{S = (C, N)}}

Where:

* C is the list of column schemas:

{{C = \[ c ~0~, c ~1~, c ~2~, ...]}}
{{c = (name, type)}}

And:

* N is a map from name to column index:

{{N = \{name : i\} }} where i is the index of column c ∈ C

Drill defines the idea of _compatible_ schemas. To define compatibility, we 
first redefine the schema as a set of columns:

{{S’ = (c ~0~, c ~1~, …)}}

Two schemas are compatible iff the column sets are identical (same name and 
type for each column). This is a bit more forgiving than the traditional 
relational model which requires that the ordered list of column schemas be 
identical. Let's assume this rule as we discuss schema below.

We'll also need the idea of _cardinality_:

{{\|S| = n}}

Says that the cardinality (number of items) in S is _n_. Later, we'll just use 
_n_ to mean a schema (or relation or whatever) that has n items.

h3. Relations and Multi-Relations

A relation can be:

{{R : ∧}} (programming null) -- the relation simply does not exist.
{{  | (0,0)}} -- the trivial relation of no columns and (by 
definition), no rows.
{{  | (s,0)}} -- a relation with some schema s, |s| ≠ 0, and no rows
{{  | (s,n)}} -- a "normal" relation with schema and n tuples (rows) 
of data, \|s| ≠ 0, n ≠ 0

It is helpful to remember some basic relational algebra:

{{R(s,0) ⋃ R(s,n) = R(s,n)}}
{{R(s,n) ⋃ R(s,m) = R(s,n+m)}}

Drill works not just with relations R, but also "multi-relations" M. That is, a 
multi-relation (the entire result set from a single query) can be defined (using 
semi-BNF) as:

{{M : ∧}} -- undefined result set
{{  | R(0,0)}} -- trivial result set
{{  | R(s,0)}} -- empty result set, \|s| ≠ 0
{{  | R(s,n)}} -- normal, single result set, n ≠ 0
{{  | R ~1~(s ~1~,n), R ~2~(s ~2~,m), ...}} -- multi-relation if s ~i~ ≠ s ~j~

Normally when we say multi-relation, we mean the last case: two or more 
relations with distinct schemas. The condition above says that two adjacent 
relations R ~i~ and R ~j~ must have distinct schemas. (By the rules above, two 
adjacent relations with the same schema just collapse into a single relation 
with that schema. A schema can repeat, but a different schema must occur 
between repetitions.)

h3. Multi-Relations in Drill

Now let's get to Drill. Drill uses the term "schema change" to mean the 
transition from s ~i~ to s ~j~, s ~i~ ≠ s ~j~.

The essence of the proposal here, as I understand it, is to update the 
implementation to fully support the first three definitions of a multi-relation 
(undefined, trivial and empty), assuming that Drill already supports the other 
two definitions (single relation and multi-relation.)

In Drill, relations are broken into batches, which are, themselves, just 
relations. Thus a batch, B, can be:

{{B : ∧}} -- undefined batch
{{  | R(0,0)}} -- trivial batch
{{  | R(s,0)}} -- empty batch, \|s| ≠ 0
{{  | R(s,n)}} -- normal batch, \|s| ≠ 0, n ≠ 0

And the whole result set D (for Drill) is an ordered list of batches:

{{D = \[B ~1~, B ~2~, ..., B ~n~]}}

Where

{{B ~i~ = (s ~i~, t ~i~)}}

As noted above, if adjacent batches have the same schema, then they are just 
sub-relations within a single larger (logical) relation. But, if the schemas 
differ, then the adjacent batches are the last and first of two distinct 
relations within a larger (logical) multi-relation. Said another way:

{{D = R ~1~, R ~2~}}
{{R ~i~ = B ~i,1~, B ~i,2~, ...}}

To clarify, let’s visualize the schema changes as ∆~i~:

{{D = \[B ~0~(s ~0~,t), B ~1~(s ~0~,t), … ∆~1~, B ~i~(s ~1~,t), B ~i+1~(s 
~1~,t), …]}}

The above sequence can describe any series of batches. As I understand the 
proposal, we want to put some constraints on the sequence.

h3. Results of a Drill Query

Let’s start by defining how we should present the multi-relation to the client. 
The client also receives batches, but, following the rules in the proposal, we 
want to constrain the output:

{{D ~e~ : B(0,0)}} -- Null result
{{  | B(s,0)}} -- Empty result
{{  | B

[jira] [Comment Edited] (DRILL-5546) Schema change problems caused by empty batch

2017-06-05 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16036141#comment-16036141
 ] 

Paul Rogers edited comment on DRILL-5546 at 6/5/17 7:30 AM:


I believe we are saying basically the same thing. To be certain, [watch out, 
we're gonna try some theory|https://store.xkcd.com/products/try-science].

h3. Basics

We'll need some terminology, defined in the usual way:

* (a, b, c) is a set that contains a, b, c
* \{a:b} is a map from a to b
* \[a, b, c] is an ordered list of a, b, c, where each element has an index i = 
0, 1, 2...
* empty = () or {} or \[] is the empty collection (of the proper type)
* null = SQL null: we don't know what the value is
* ∧ = Java, C, etc. null: we know the value, and it is nothing

Drill is, at its core, relational. A relation R can be defined as:

{{R = (S, T)}}

Where:

* S is the schema
* T is a set of tuples (t ~1~, t ~2~, t ~3~) (AKA a table)

{{S = (C, N)}}

Where:

* C is the list of column schemas:

{{C = \[ c ~0~, c ~1~, c ~2~, ...]}}
{{c = (name, type)}}

And:

* N is a map from name to column index:

{{N = \{name : i\} }} where i is the index of column c ∈ C

Drill defines the idea of _compatible_ schemas. To define compatibility, we 
first redefine the schema as a set of columns:

{{S’ = (c ~0~, c ~1~, …)}}

Two schemas are compatible iff the column sets are identical (same name and 
type for each column). This is a bit more forgiving than the traditional 
relational model which requires that the ordered list of column schemas be 
identical. Let's assume this rule as we discuss schema below.

We'll also need the idea of _cardinality_:

{{\|S| = n}}

Says that the cardinality (number of items) in S is _n_. Later, we'll just use 
_n_ to mean a schema (or relation or whatever) that has n items.

h3. Relations and Multi-Relations

A relation can be:

{{R : ∧}} (programming null) -- the relation simply does not exist.
{{  | (0,0)}} -- the trivial relation of no columns and (by 
definition), no rows.
{{  | (s,0)}} -- a relation with some schema s, |s| ≠ 0, and no rows
{{  | (s,n)}} -- a "normal" relation with schema and n tuples (rows) 
of data, \|s| ≠ 0, n ≠ 0

It is helpful to remember some basic relational algebra:

{{R(s,0) ⋃ R(s,n) = R(s,n)}}
{{R(s,n) ⋃ R(s,m) = R(s,n+m)}}

Drill works not just with relations R, but also "multi-relations" M. That is, a 
multi-relation (the entire result set from a single query) can be defined (using 
semi-BNF) as:

{{M : ∧}} -- undefined result set
{{  | R(0,0)}} -- trivial result set
{{  | R(s,0)}} -- empty result set, \|s| ≠ 0
{{  | R(s,n)}} -- normal, single result set, n ≠ 0
{{  | R ~1~(s ~1~,n), R ~2~(s ~2~,m), ...}} -- multi-relation if s ~i~ ≠ s ~j~

Normally when we say multi-relation, we mean the last case: two or more 
relations with distinct schemas. The condition above says that two adjacent 
relations R ~i~ and R ~j~ must have distinct schemas. (By the rules above, two 
adjacent relations with the same schema just collapse into a single relation 
with that schema. A schema can repeat, but a different schema must occur 
between repetitions.)

h3. Multi-Relations in Drill

Now let's get to Drill. Drill uses the term "schema change" to mean the 
transition from s ~i~ to s ~j~, s ~i~ ≠ s ~j~.

The essence of the proposal here, as I understand it, is to update the 
implementation to fully support the first three definitions of a multi-relation 
(undefined, trivial and empty), assuming that Drill already supports the other 
two definitions (single relation and multi-relation.)

In Drill, relations are broken into batches, which are, themselves, just 
relations. Thus a batch, B, can be:

{{B : ∧}} -- undefined batch
{{  | R(0,0)}} -- trivial batch
{{  | R(s,0)}} -- empty batch, \|s| ≠ 0
{{  | R(s,n)}} -- normal batch, \|s| ≠ 0, n ≠ 0

And the whole result set D (for Drill) is an ordered list of batches:

{{D = \[B ~1~, B ~2~, ..., B ~n~]}}

Where

{{B ~i~ = (s ~i~, t ~i~)}}

As noted above, if adjacent batches have the same schema, then they are just 
sub-relations within a single larger (logical) relation. But, if the schemas 
differ, then the adjacent batches are the last and first of two distinct 
relations within a larger (logical) multi-relation. Said another way:

{{D = R ~1~, R ~2~}}
{{R ~i~ = B ~i,1~, B ~i,2~, ...}}

To clarify, let’s visualize the schema changes as ∆~i~:

{{D = \[B ~0~(s ~0~,t), B ~1~(s ~0~,t), … ∆~1~, B ~i~(s ~1~,t), B ~i+1~(s 
~1~,t), …]}}

The above sequence can describe any series of batches. As I understand the 
proposal, we want to put some constraints on the sequence.

h3. Results of a Drill Query

Let’s start by defining how we should present the multi-relation to the client. 
The client also receives batches, but, following the rules in the proposal, we 
want to constrain the output:

{{D ~e~ : B(0,0)}} -- Null result
{{  | B(s,0)}} -- Empty result
{{  | B(s,

[jira] [Comment Edited] (DRILL-5546) Schema change problems caused by empty batch

2017-06-05 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16036141#comment-16036141
 ] 

Paul Rogers edited comment on DRILL-5546 at 6/5/17 7:28 AM:


I believe we are saying basically the same thing. To be certain, [watch out, 
we're gonna try some theory|https://store.xkcd.com/products/try-science].

h3. Basics

We'll need some terminology, defined in the usual way:

* (a, b, c) is a set that contains a, b, c
* \{a:b} is a map from a to b
* \[a, b, c] is an ordered list of a, b, c, where each element has an index i = 
0, 1, 2...
* empty = () or {} or \[] is the empty collection (of the proper type)
* null = SQL null: we don't know what the value is
* ∧ = Java, C, etc. null: we know the value, and it is nothing

Drill is, at its core, relational. A relation R can be defined as:

{{R = (S, T)}}

Where:

* S is the schema
* T is a set of tuples (t ~1~, t ~2~, t ~3~) (AKA a table)

{{S = (C, N)}}

Where:

* C is the list of column schemas:

{{C = \[ c ~0~, c ~1~, c ~2~, ...]}}
{{c = (name, type)}}

And:

* N is a map from name to column index:

{{N = \{name : i\} }} where i is the index of column c ∈ C

Drill defines the idea of _compatible_ schemas. To define compatibility, we 
first redefine the schema as a set of columns:

{{S’ = (c ~0~, c ~1~, …)}}

Two schemas are compatible iff the column sets are identical (same name and 
type for each column). This is a bit more forgiving than the traditional 
relational model which requires that the ordered list of column schemas be 
identical. Let's assume this rule as we discuss schema below.

We'll also need the idea of _cardinality_:

{{\|S| = n}}

Says that the cardinality (number of items) in S is _n_. Later, we'll just use 
_n_ to mean a schema (or relation or whatever) that has n items.

h3. Relations and Multi-Relations

A relation can be:

{{R : ∧}} (programming null) -- the relation simply does not exist.
{{  | (0,0)}} -- the trivial relation of no columns and (by 
definition), no rows.
{{  | (s,0)}} -- a relation with some schema s, |s| ≠ 0, and no rows
{{  | (s,n)}} -- a "normal" relation with schema and n tuples (rows) 
of data, \|s| ≠ 0, n ≠ 0

It is helpful to remember some basic relational algebra:

{{R(s,0) ⋃ R(s,n) = R(s,n)}}
{{R(s,n) ⋃ R(s,m) = R(s,n+m)}}

Drill works not just with relations R, but also "multi-relations" M. That is, a 
multi-relation (the entire result set from a single query) can be defined (using 
semi-BNF) as:

{{M : ∧}} -- undefined result set
{{  | R(0,0)}} -- trivial result set
{{  | R(c,0)}} -- empty result set, \|c| ≠ 0
{{  | R(c,n)}} -- normal, single result set, n ≠ 0
{{  | R ~1~(s ~1~,n), R ~2~(s ~2~,m), ...}} -- multi-relation if s ~i~ ≠ s ~j~

Normally when we say multi-relation, we mean the last case: two or more 
relations with distinct schemas. The condition above says that two adjacent 
relations R ~i~ and R ~j~ must have distinct schemas. (By the rules above, two 
adjacent relations with the same schema just collapse into a single relation 
with that schema. A schema can repeat, but a different schema must occur 
between repetitions.)

h3. Multi-Relations in Drill

Now let's get to Drill. Drill uses the term "schema change" to mean the 
transition from s ~i~ to s ~j~, s ~i~ ≠ s ~j~.

The essence of the proposal here, as I understand it, is to update the 
implementation to fully support the first three definitions of a multi-relation 
(undefined, trivial and empty), assuming that Drill already supports the other 
two definitions (single relation and multi-relation.)

In Drill, relations are broken into batches, which are, themselves, just 
relations. Thus a batch, B, can be:

{{B : ∧}} -- undefined batch
{{  | R(0,0)}} -- trivial batch
{{  | R(c,0)}} -- empty batch, \|c| ≠ 0
{{  | R(c,n)}} -- normal batch, \|c| ≠ 0, n ≠ 0

And the whole result set D (for Drill) is an ordered list of batches:

{{D = \[B ~1~, B ~2~, ..., B ~n~]}}

Where

{{B ~i~ = (s ~i~, t ~i~)}}

As noted above, if adjacent batches have the same schema, then they are just 
sub-relations within a single larger (logical) relation. But, if the schemas 
differ, then the adjacent batches are the last and first of two distinct 
relations within a larger (logical) multi-relation. Said another way:

{{D = R ~1~, R ~2~}}
{{R ~i~ = B ~i,1~, B ~i,2~, ...}}

To clarify, let’s visualize the schema changes as ∆~i~:

{{D = \[B ~0~(s ~0~,t), B ~1~(s ~0~,t), … ∆~1~, B ~i~(s ~1~,t), B ~i+1~(s 
~1~,t), …]}}

The above sequence can describe any series of batches. As I understand the 
proposal, we want to put some constraints on the sequence.

h3. Results of a Drill Query

Let’s start by defining how we should present the multi-relation to the client. 
The client also receives batches, but, following the rules in the proposal, we 
want to constrain the output:

{{D ~e~ : B(0,0)}} -- Null result
{{  | B(s,0)}} -- Empty result
{{  | B(s,

[jira] [Comment Edited] (DRILL-5546) Schema change problems caused by empty batch

2017-06-05 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16036141#comment-16036141
 ] 

Paul Rogers edited comment on DRILL-5546 at 6/5/17 7:25 AM:


I believe we are saying basically the same thing. To be certain, [watch out, 
we're gonna try some theory|https://store.xkcd.com/products/try-science].

h3. Basics

We'll need some terminology, defined in the usual way:

* (a, b, c) is a set that contains a, b, c
* \{a:b} is a map from a to b
* \[a, b, c] is an ordered list of a, b, c, where each element has an index i = 
0, 1, 2...
* empty = () or {} or \[] is the empty collection (of the proper type)
* null = SQL null: we don't know what the value is
* ∧ = Java, C, etc. null: we know the value, and it is nothing

Drill is, at its core, relational. A relation R can be defined as:

{{R = (S, T)}}

Where:

* S is the schema
* T is a set of tuples (t ~1~, t ~2~, t ~3~) (AKA a table)

{{S = (C, N)}}

Where:

* C is the list of column schemas:

{{C = \[ c ~0~, c ~1~, c ~2~, ...]}}
{{c = (name, type)}}

And:

* N is a map from name to column index:

{{N = \{name : i\} }} where i is the index of column c ∈ C

Drill defines the idea of _compatible_ schemas. To define compatibility, we 
first redefine the schema as a set of columns:

{{S’ = (c ~0~, c ~1~, …)}}

Two schemas are compatible iff the column sets are identical (same name and 
type for each column). This is a bit more forgiving than the traditional 
relational model which requires that the ordered list of column schemas be 
identical. Let's assume this rule as we discuss schema below.

We'll also need the idea of _cardinality_:

{{\|S| = n}}

Says that the cardinality (number of items) in S is _n_. Later, we'll just use 
_n_ to mean a schema (or relation or whatever) that has n items.

h3. Relations and Multi-Relations

A relation can be:

{{R : ∧}} (programming null) -- the relation simply does not exist.
{{  | (0,0)}} -- the trivial relation of no columns and (by 
definition), no rows.
{{  | (s,0)}} -- a relation with some schema s, |s| ≠ 0, and no rows
{{  | (s,n)}} -- a "normal" relation with schema and n tuples (rows) 
of data, \|s| ≠ 0, n ≠ 0

It is helpful to remember some basic relational algebra:

{{R(s,0) ⋃ R(s,n) = R(s,n)}}
{{R(s,n) ⋃ R(s,m) = R(s,n+m)}}

Drill works not just with relations R, but also "multi-relations" M. That is, a 
multi-relation (the entire result set from a single query) can be defined (using 
semi-BNF) as:

{{M : ∧}} -- undefined result set
{{  | R(0,0)}} -- trivial result set
{{  | R(c,0)}} -- empty result set, \|c| ≠ 0
{{  | R(c,n)}} -- normal, single result set, n ≠ 0
{{  | R ~1~(s ~1~,n), R ~2~(s ~2~,m), ...}} -- multi-relation if s ~i~ ≠ s ~j~

Normally when we say multi-relation, we mean the last case: two or more 
relations with distinct schemas. The condition above says that two adjacent 
relations R ~i~ and R ~j~ must have distinct schemas. (By the rules above, two 
adjacent relations with the same schema just collapse into a single relation 
with that schema. A schema can repeat, but a different schema must occur 
between repetitions.)

h3. Multi-Relations in Drill

Now let's get to Drill. Drill uses the term "schema change" to mean the 
transition from s ~i~ to s ~j~, s ~i~ ≠ s ~j~.

The essence of the proposal here, as I understand it, is to update the 
implementation to fully support the first three definitions of a multi-relation 
(undefined, trivial and empty), assuming that Drill already supports the other 
two definitions (single relation and multi-relation.)

In Drill, relations are broken into batches, which are, themselves, just 
relations. Thus a batch, B, can be:

{{B : ∧}} -- undefined batch
{{  | R(0,0)}} -- trivial batch
{{  | R(c,0)}} -- empty batch, \|c| ≠ 0
{{  | R}} -- normal batch

And the whole result set D (for Drill) is an ordered list of batches:

{{D = \[B ~1~, B ~2~, ..., B ~n~]}}

Where

{{B ~i~ = (s ~i~, t ~i~)}}

As noted above, if adjacent batches have the same schema, then they are just 
sub-relations within a single larger (logical) relation. But, if the schemas 
differ, then the adjacent batches are the last and first of two distinct 
relations within a larger (logical) multi-relation. Said another way:

{{D = R ~1~, R ~2~}}
{{R ~i~ = B ~i,1~, B ~i,2~, ...}}

To clarify, let’s visualize the schema changes as ∆~i~:

{{D = \[B ~0~(s ~0~,t), B ~1~(s ~0~,t), … ∆~1~, B ~i~(s ~1~,t), B ~i+1~(s 
~1~,t), …]}}

The above sequence can describe any series of batches. As I understand the 
proposal, we want to put some constraints on the sequence.

h3. Results of a Drill Query

Let’s start by defining how we should present the multi-relation to the client. 
The client also receives batches, but, following the rules in the proposal, we 
want to constrain the output:

{{D ~e~ : B(0,0)}} -- Null result
{{  | B(s,0)}} -- Empty result
{{  | B(s,n)\+}} -- “Classic” O/

[jira] [Comment Edited] (DRILL-5546) Schema change problems caused by empty batch

2017-06-05 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16036141#comment-16036141
 ] 

Paul Rogers edited comment on DRILL-5546 at 6/5/17 7:10 AM:


I believe we are saying basically the same thing. To be certain, [watch out, 
we're gonna try some theory|https://store.xkcd.com/products/try-science].

h3. Basics

We'll need some terminology, defined in the usual way:

* (a, b, c) is a set that contains a, b, c
* \{a:b} is a map from a to b
* \[a, b, c] is an ordered list of a, b, c, where each element has an index i = 
0, 1, 2...
* empty = () or {} or \[] is the empty collection (of the proper type)
* null = SQL null: we don't know what the value is
* ∧ = Java, C, etc. null: we know the value, and it is nothing

Drill is, at its core, relational. A relation R can be defined as:

{{R = (S, T)}}

Where:

* S is the schema
* T is a set of tuples (t ~1~, t ~2~, t ~3~) (AKA a table)

{{S = (C, N)}}

Where:

* C is the list of column schemas:

{{C = \[ c ~0~, c ~1~, c ~2~, ...]}}
{{c = (name, type)}}

And:

* N is a map from name to column index:

{{N = \{name : i\} }} where i is the index of column c ∈ C

Drill defines the idea of _compatible_ schemas. To define compatibility, we 
first redefine the schema as a set of columns:

{{S’ = (c ~0~, c ~1~, …)}}

Two schemas are compatible iff the column sets are identical (same name and 
type for each column). This is a bit more forgiving than the traditional 
relational model which requires that the ordered list of column schemas be 
identical. Let's assume this rule as we discuss schema below.

We'll also need the idea of _cardinality_:

{{\|S| = n}}

Says that the cardinality (number of items) in S is _n_. Later, we'll just use 
_n_ to mean a schema (or relation or whatever) that has n items.

h3. Relations and Multi-Relations

A relation can be:

{{R : ∧}} (programming null) -- the relation simply does not exist.
{{  | (0,0)}} -- the trivial relation of no columns and (by 
definition), no rows.
{{  | (s,0)}} -- a relation with some schema s, |s| ≠ 0, and no rows
{{  | (s,n)}} -- a "normal" relation with schema and n tuples (rows) 
of data, \|s| ≠ 0, n ≠ 0

It is helpful to remember some basic relational algebra:

{{R(s,0) ⋃ R(s,n) = R(s,n)}}
{{R(s,n) ⋃ R(s,m) = R(s,n+m)}}

Drill works not just with relations R, but also "multi-relations" M. That is, a 
multi-relation (the entire result set from a single query) can be defined (using 
semi-BNF) as:

{{M : ∧}} -- undefined result set
{{  | R(0,0)}} -- trivial result set
{{  | R(c,0)}} -- empty result set, \|c| ≠ 0
{{  | R(c,n)}} -- normal, single result set, n ≠ 0
{{  | R ~1~(s ~1~,n), R ~2~(s ~2~,m), ...}} -- multi-relation if s ~i~ ≠ s ~j~

Normally when we say multi-relation, we mean the last case: two or more 
relations with distinct schemas. The condition above says that two adjacent 
relations R ~i~ and R ~j~ must have distinct schemas. (By the rules above, two 
adjacent relations with the same schema just collapse into a single relation 
with that schema. A schema can repeat, but a different schema must occur 
between repetitions.)

h3. Multi-Relations in Drill

Now let's get to Drill. Drill uses the term "schema change" to mean the 
transition from s ~i~ to s ~j~, s ~i~ ≠ s ~j~.

The essence of the proposal here, as I understand it, is to update the 
implementation to fully support the first three definitions of a multi-relation 
(undefined, trivial and empty), assuming that Drill already supports the other 
two definitions (single relation and multi-relation.)

In Drill, relations are broken into batches, which are, themselves, just 
relations. Thus a batch, B, can be:

{{B : ∧}} -- undefined batch
{{  | R(0,0)}} -- trivial batch
{{  | R(c,0)}} -- empty batch, \|c| ≠ 0
{{  | R}} -- normal batch

And the whole result set D (for Drill) is an ordered list of batches:

{{D = \[B ~1~, B ~2~, ..., B ~n~]}}

Where

{{B ~i~ = (s ~i~, t ~i~)}}

As noted above, if adjacent batches have the same schema, then they are just 
sub-relations within a single larger (logical) relation. But, if the schemas 
differ, then the adjacent batches are the last and first of two distinct 
relations within a larger (logical) multi-relation. Said another way:

{{D = R ~1~, R ~2~}}
{{R ~i~ = B ~i,1~, B ~i,2~, ...}}

To clarify, let’s visualize the schema changes as ∆~i~:

{{D = \[B ~0~(s ~0~,t), B ~1~(s ~0~,t), … ∆~1~, B ~i~(s ~1~,t), B ~i+1~(s 
~1~,t), …]}}

The above sequence can describe any series of batches. As I understand the 
proposal, we want to put some constraints on the sequence.

h3. Results of a Drill Query

Let’s start by defining how we should present the multi-relation to the client. 
The client also receives batches, but, following the rules in the proposal, we 
want to constrain the output:

{{D ~e~ : B(0,0)}} -- Null result
{{  | B(s,0)}} -- Empty result
{{  | B(s,n)\+}} -- “Classic” O