[jira] [Resolved] (IMPALA-9229) Link failed and retried runtime profiles

2020-09-18 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-9229.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

Marking as resolved. The Web UI improvements are tracked in a separate JIRA.

> Link failed and retried runtime profiles
> 
>
> Key: IMPALA-9229
> URL: https://issues.apache.org/jira/browse/IMPALA-9229
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Critical
> Fix For: Impala 4.0
>
>
> There should be a way for clients to link the runtime profiles from failed 
> queries to all retry attempts (whether successful or not), and vice versa.
> There are a few ways to do this:
>  * The simplest way would be to include the query id of the retried query in 
> the runtime profile of the failed query, and vice versa; users could then 
> manually follow the chain of runtime profiles to fetch all failed and 
> successful attempts (see the sketch after this list)
>  * Extend TGetRuntimeProfileReq to include an option to fetch all runtime 
> profiles for the given query id + all retry attempts (or add a new Thrift 
> call TGetRetryQueryIds(TQueryId) which returns a list of retried ids for a 
> given query id)
>  * The Impala debug UI should include a simple way to view all the runtime 
> profiles of a query (the failed attempts + all retry attempts) side by side 
> (perhaps the query_profile?query_id profile should include tabs to easily 
> switch between the runtime profiles of each attempt)
> These are not mutually exclusive, and it might be good to stage these changes.
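>
> A minimal sketch (in C++) of the chain-walking idea from the first option,
> assuming a hypothetical {{retried_query_id}} field in the profile and a
> stand-in {{GetProfile()}} lookup; neither is an existing Impala API:
> {code}
> #include <map>
> #include <optional>
> #include <string>
> #include <vector>
>
> // Hypothetical profile record: a failed attempt carries the id of the query
> // that retried it (and vice versa), per the first option above.
> struct Profile {
>   std::string query_id;
>   std::optional<std::string> retried_query_id;  // unset on the final attempt
> };
>
> // Stand-in for a TGetRuntimeProfileReq round trip to the server.
> std::map<std::string, Profile> profile_store;
> Profile GetProfile(const std::string& id) { return profile_store.at(id); }
>
> // Follow the chain from the original attempt to the final one.
> std::vector<Profile> GetAllAttempts(const std::string& first_query_id) {
>   std::vector<Profile> attempts;
>   std::optional<std::string> next = first_query_id;
>   while (next) {
>     attempts.push_back(GetProfile(*next));
>     next = attempts.back().retried_query_id;
>   }
>   return attempts;
> }
>
> int main() {
>   profile_store["q1"] = {"q1", "q2"};          // failed attempt
>   profile_store["q2"] = {"q2", std::nullopt};  // successful retry
>   return GetAllAttempts("q1").size() == 2 ? 0 : 1;
> }
> {code}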



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (IMPALA-9046) Profile counter that indicates if a process or JVM pause occurred

2020-09-22 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-9046.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Profile counter that indicates if a process or JVM pause occurred
> -
>
> Key: IMPALA-9046
> URL: https://issues.apache.org/jira/browse/IMPALA-9046
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 3.3.0
>Reporter: Tim Armstrong
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 4.0
>
>
> We currently log a message if a process or JVM pause is detected, but there 
> is no indication in the query profile of whether the query was affected. I 
> suggest that we:
> * Add metrics that indicate the number and duration of detected pauses
> * Add counters to the backend profile for the deltas in those metrics (a 
> sketch of the detection loop follows)
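>
> A minimal sketch of the pause-detection loop (interval and threshold values
> are illustrative, not Impala's actual settings): sleep for a fixed interval
> and treat any extra elapsed wall-clock time as a process or GC pause.
> {code}
> #include <chrono>
> #include <cstdint>
> #include <iostream>
> #include <thread>
>
> int main() {
>   using namespace std::chrono;
>   const auto kInterval = milliseconds(500);     // assumed check interval
>   const auto kThreshold = milliseconds(100);    // assumed pause threshold
>   int64_t pause_count = 0, pause_total_ms = 0;  // would back the new metrics
>   while (true) {
>     auto start = steady_clock::now();
>     std::this_thread::sleep_for(kInterval);
>     auto elapsed = duration_cast<milliseconds>(steady_clock::now() - start);
>     auto extra = elapsed - kInterval;  // time the thread was not scheduled
>     if (extra > kThreshold) {
>       ++pause_count;
>       pause_total_ms += extra.count();
>       std::cout << "pause of " << extra.count() << " ms detected ("
>                 << pause_count << " pauses, " << pause_total_ms
>                 << " ms total)\n";
>     }
>   }
> }
> {code}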



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (IMPALA-10170) Data race on Webserver::UrlHandler::is_on_nav_bar_

2020-09-24 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-10170.
---
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Data race on Webserver::UrlHandler::is_on_nav_bar_
> --
>
> Key: IMPALA-10170
> URL: https://issues.apache.org/jira/browse/IMPALA-10170
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 4.0
>
>
> {code}
> WARNING: ThreadSanitizer: data race (pid=31102)
>   Read of size 1 at 0x7b2c0006e3b0 by thread T42:
> #0 impala::Webserver::UrlHandler::is_on_nav_bar() const 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/webserver.h:152:41
>  (impalad+0x256ff39)
> #1 
> impala::Webserver::GetCommonJson(rapidjson::GenericDocument,
>  rapidjson::MemoryPoolAllocator, 
> rapidjson::CrtAllocator>*, sq_connection const*, 
> kudu::WebCallbackRegistry::WebRequest const&) 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/webserver.cc:527:24
>  (impalad+0x256be13)
> #2 impala::Webserver::RenderUrlWithTemplate(sq_connection const*, 
> kudu::WebCallbackRegistry::WebRequest const&, impala::Webserver::UrlHandler 
> const&, std::__cxx11::basic_stringstream, 
> std::allocator >*, impala::ContentType*) 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/webserver.cc:816:3
>  (impalad+0x256e882)
> #3 impala::Webserver::BeginRequestCallback(sq_connection*, 
> sq_request_info*) 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/webserver.cc:714:5
>  (impalad+0x256cfbb)
> #4 impala::Webserver::BeginRequestCallbackStatic(sq_connection*) 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/webserver.cc:556:20
>  (impalad+0x256ba98)
> #5 handle_request  (impalad+0x2582d59)
>   Previous write of size 2 at 0x7b2c0006e3b0 by main thread:
> #0 
> impala::Webserver::UrlHandler::UrlHandler(impala::Webserver::UrlHandler&&) 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/webserver.h:141:9
>  (impalad+0x2570dbc)
> #1 std::pair, 
> std::allocator > const, 
> impala::Webserver::UrlHandler>::pair std::char_traits, std::allocator >, 
> impala::Webserver::UrlHandler, 
> true>(std::pair, 
> std::allocator >, impala::Webserver::UrlHandler>&&) 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/stl_pair.h:362:4
>  (impalad+0x25738b3)
> #2 void 
> __gnu_cxx::new_allocator  std::char_traits, std::allocator > const, 
> impala::Webserver::UrlHandler> > 
> >::construct std::char_traits, std::allocator > const, 
> impala::Webserver::UrlHandler>, std::pair std::char_traits, std::allocator >, 
> impala::Webserver::UrlHandler> >(std::pair std::char_traits, std::allocator > const, 
> impala::Webserver::UrlHandler>*, std::pair std::char_traits, std::allocator >, 
> impala::Webserver::UrlHandler>&&) 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/ext/new_allocator.h:136:23
>  (impalad+0x2573848)
> #3 void 
> std::allocator_traits  std::char_traits, std::allocator > const, 
> impala::Webserver::UrlHandler> > > 
> >::construct std::char_traits, std::allocator > const, 
> impala::Webserver::UrlHandler>, std::pair std::char_traits, std::allocator >, 
> impala::Webserver::UrlHandler> 
> >(std::allocator  std::char_traits, std::allocator > const, 
> impala::Webserver::UrlHandler> > >&, 
> std::pair, 
> std::allocator > const, impala::Webserver::UrlHandler>*, 
> std::pair, 
> std::allocator >, impala::Webserver::UrlHandler>&&) 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/alloc_traits.h:475:8
>  (impalad+0x25737f1)
> #4 void std::_Rb_tree std::char_traits, std::allocator >, 
> std::pair, 
> std::allocator > const, impala::Webserver::UrlHandler>, 
> std::_Select1st std::char_traits, std::allocator > const, 
> impala::Webserver::UrlHandler> >, std::less std::char_traits, std::allocator > >, 
> std::allocator std::char_traits, std::allocator > const, 
> impala::Webserver::UrlHandler> > 
> >::_M_construct_node std::char_traits, std::allocator >, 
> impala::Webserver::UrlHandler> 
> >(std::_Rb_tree_node std::char_traits, std::allocator > const, 
> impala::Webserver::UrlHandler> >*, std::pair std::char_traits, std::allocator >, 
> impala::Webserver::UrlHandler>&&) 
> /data/j
> {code}
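>
> The race: request-serving threads read {{is_on_nav_bar_}} while the main
> thread is still move-constructing {{UrlHandler}} objects into the handler
> map. A minimal sketch of one way to remove such a race (illustrative only,
> not the actual Impala fix) is to guard the handler map with a reader-writer
> lock:
> {code}
> #include <map>
> #include <shared_mutex>
> #include <string>
>
> class UrlHandlers {
>  public:
>   // Called by the registering (main) thread.
>   void Register(const std::string& path, bool on_nav_bar) {
>     std::unique_lock<std::shared_mutex> l(lock_);
>     handlers_[path] = Handler{on_nav_bar};
>   }
>   // Called concurrently by request-serving threads.
>   bool IsOnNavBar(const std::string& path) const {
>     std::shared_lock<std::shared_mutex> l(lock_);
>     auto it = handlers_.find(path);
>     return it != handlers_.end() && it->second.is_on_nav_bar;
>   }
>  private:
>   struct Handler { bool is_on_nav_bar = false; };
>   mutable std::shared_mutex lock_;
>   std::map<std::string, Handler> handlers_;
> };
> {code}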

[jira] [Created] (IMPALA-10190) Remove impalad_coord_exec Dockerfile

2020-09-24 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10190:
-

 Summary: Remove impalad_coord_exec Dockerfile
 Key: IMPALA-10190
 URL: https://issues.apache.org/jira/browse/IMPALA-10190
 Project: IMPALA
  Issue Type: Improvement
Reporter: Sahil Takiar


The impalad_coord_exec Dockerfile is largely redundant because it contains 
essentially the same dependencies as the impalad_coordinator Dockerfile. The 
only difference between the two files is that the startup flags for 
impalad_coordinator contain {{is_executor=false}}. We should find a way to 
remove the {{impalad_coord_exec}} Dockerfile altogether.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IMPALA-10191) Test impalad_coordinator and impalad_executor in Dockerized tests

2020-09-24 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10191:
-

 Summary: Test impalad_coordinator and impalad_executor in 
Dockerized tests
 Key: IMPALA-10191
 URL: https://issues.apache.org/jira/browse/IMPALA-10191
 Project: IMPALA
  Issue Type: Improvement
Reporter: Sahil Takiar


Currently only the impalad_coord_exec images are tested in the Dockerized 
tests; it would be nice to get test coverage for the other images as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (IMPALA-8577) Crash during OpenSSLSocket.read

2020-09-28 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-8577.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

This was fixed a while ago. Impala has been using wildfly-openssl for 
communication with S3 since then, and everything seems stable.

> Crash during OpenSSLSocket.read
> ---
>
> Key: IMPALA-8577
> URL: https://issues.apache.org/jira/browse/IMPALA-8577
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.3.0
>Reporter: David Rorke
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 4.0
>
> Attachments: 5ca78771-ad78-4a29-31f88aa6-9bfac38c.dmp, 
> hs_err_pid6313.log, 
> impalad.drorke-impala-r5d2xl2-30w-17.vpc.cloudera.com.impala.log.ERROR.20190521-103105.6313,
>  
> impalad.drorke-impala-r5d2xl2-30w-17.vpc.cloudera.com.impala.log.INFO.20190521-103105.6313
>
>
> Impalad crashed while running a TPC-DS 10 TB run against S3.   Excerpt from 
> the stack trace (hs_err log file attached with more complete stack):
> {noformat}
> Stack: [0x7f3d095bc000,0x7f3d09dbc000],  sp=0x7f3d09db9050,  free 
> space=8180k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native 
> code)
> C  [impalad+0x2528a33]  
> tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*,
>  unsigned long, int)+0x133
> C  [impalad+0x2528e0f]  tcmalloc::ThreadCache::Scavenge()+0x3f
> C  [impalad+0x266468a]  operator delete(void*)+0x32a
> C  [libcrypto.so.10+0x6e70d]  CRYPTO_free+0x1d
> J 5709  org.wildfly.openssl.SSLImpl.freeBIO0(J)V (0 bytes) @ 
> 0x7f3d4dadf9f9 [0x7f3d4dadf940+0xb9]
> J 5708 C1 org.wildfly.openssl.SSLImpl.freeBIO(J)V (5 bytes) @ 
> 0x7f3d4dfd0dfc [0x7f3d4dfd0d80+0x7c]
> J 5158 C1 org.wildfly.openssl.OpenSSLEngine.shutdown()V (78 bytes) @ 
> 0x7f3d4de4fe2c [0x7f3d4de4f720+0x70c]
> J 5758 C1 org.wildfly.openssl.OpenSSLEngine.closeInbound()V (51 bytes) @ 
> 0x7f3d4de419cc [0x7f3d4de417c0+0x20c]
> J 2994 C2 
> org.wildfly.openssl.OpenSSLEngine.unwrap(Ljava/nio/ByteBuffer;[Ljava/nio/ByteBuffer;II)Ljavax/net/ssl/SSLEngineResult;
>  (892 bytes) @ 0x7f3d4db8da34 [0x7f3d4db8c900+0x1134]
> J 3161 C2 org.wildfly.openssl.OpenSSLSocket.read([BII)I (810 bytes) @ 
> 0x7f3d4dd64cb0 [0x7f3d4dd646c0+0x5f0]
> J 5090 C2 
> com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.fillBuffer()I
>  (97 bytes) @ 0x7f3d4ddd9ee0 [0x7f3d4ddd9e40+0xa0]
> J 5846 C1 
> com.amazonaws.thirdparty.apache.http.impl.BHttpConnectionBase.fillInputBuffer(I)I
>  (48 bytes) @ 0x7f3d4d7acb24 [0x7f3d4d7ac7a0+0x384]
> J 5845 C1 
> com.amazonaws.thirdparty.apache.http.impl.BHttpConnectionBase.isStale()Z (31 
> bytes) @ 0x7f3d4d7ad49c [0x7f3d4d7ad220+0x27c]
> {noformat}
> The crash may not be easy to reproduce.  I've run this test multiple times 
> and only crashed once.   I have a core file if needed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IMPALA-10202) Enable file handle cache for ABFS files

2020-10-01 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10202:
-

 Summary: Enable file handle cache for ABFS files
 Key: IMPALA-10202
 URL: https://issues.apache.org/jira/browse/IMPALA-10202
 Project: IMPALA
  Issue Type: Improvement
Reporter: Sahil Takiar
Assignee: Sahil Takiar


We should enable the file handle cache for ABFS; we have already seen it 
benefit jobs that read data from S3A.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (IMPALA-3335) Allow single-node optimization with joins.

2020-10-01 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-3335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-3335.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Allow single-node optimization with joins.
> --
>
> Key: IMPALA-3335
> URL: https://issues.apache.org/jira/browse/IMPALA-3335
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 2.5.0
>Reporter: Alexander Behm
>Assignee: Sahil Takiar
>Priority: Minor
>  Labels: ramp-up
> Fix For: Impala 4.0
>
>
> Now that IMPALA-561 has been fixed, we can remove the workaround that 
> disables our single-node optimization for any plan with joins. See 
> MaxRowsProcessedVisitor.java:
> {code}
> } else if (caller instanceof HashJoinNode || caller instanceof NestedLoopJoinNode) {
>   // Revisit when multiple scan nodes can be executed in a single fragment, IMPALA-561
>   abort_ = true;
>   return;
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (IMPALA-9606) ABFS reads should use hdfsPreadFully

2020-10-01 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-9606.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

> ABFS reads should use hdfsPreadFully
> 
>
> Key: IMPALA-9606
> URL: https://issues.apache.org/jira/browse/IMPALA-9606
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 4.0
>
>
> In IMPALA-8525, hdfs preads were enabled by default when reading data from 
> S3. IMPALA-8525 deferred enabling preads for ABFS because they didn't 
> significantly improve performance. After some more investigation into the 
> ABFS input streams, I think it is safe to use {{hdfsPreadFully}} for ABFS 
> reads.
> The ABFS client uses a different model for fetching data compared to S3A. 
> Details are beyond the scope of this JIRA, but it is related to a feature in 
> ABFS called "read-aheads". ABFS has logic to pre-fetch data it *thinks* will 
> be required by the client. By default, it pre-fetches # cores * 4 MB of data. 
> If the requested data exists in the client cache, it is read from the cache.
> However, there is no real drawback to using {{hdfsPreadFully}} for ABFS 
> reads. It's definitely safer, because while the current implementation of 
> ABFS always returns the amount of requested data, only the {{hdfsPreadFully}} 
> API makes that guarantee.
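>
> A short sketch of the contract difference, assuming the {{hdfsPreadFully}}
> libhdfs API added by HDFS-14564 (the path and buffer size are illustrative,
> and error handling is omitted):
> {code}
> #include <fcntl.h>
> #include <hdfs/hdfs.h>  // libhdfs; header location varies by install
> #include <cstdio>
> #include <vector>
>
> int main() {
>   hdfsFS fs = hdfsConnect("default", 0);
>   hdfsFile f = hdfsOpenFile(fs, "/tmp/example.parquet", O_RDONLY, 0, 0, 0);
>   std::vector<char> buf(1 << 20);
>   // hdfsPread may legally return fewer bytes than requested, so callers
>   // must loop until the buffer is full:
>   tSize n = hdfsPread(fs, f, /*position=*/0, buf.data(),
>                       static_cast<tSize>(buf.size()));
>   // hdfsPreadFully either fills the whole buffer or fails, which is the
>   // guarantee described above:
>   int rc = hdfsPreadFully(fs, f, /*position=*/0, buf.data(),
>                           static_cast<tSize>(buf.size()));
>   printf("pread returned %d bytes, preadFully rc=%d\n", (int)n, rc);
>   hdfsCloseFile(fs, f);
>   hdfsDisconnect(fs);
>   return 0;
> }
> {code}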



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (IMPALA-10202) Enable file handle cache for ABFS files

2020-10-02 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-10202.
---
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Enable file handle cache for ABFS files
> ---
>
> Key: IMPALA-10202
> URL: https://issues.apache.org/jira/browse/IMPALA-10202
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 4.0
>
>
> We should enable the file handle cache for ABFS; we have already seen it 
> benefit jobs that read data from S3A.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IMPALA-10214) Ozone support for file handle cache

2020-10-05 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10214:
-

 Summary: Ozone support for file handle cache
 Key: IMPALA-10214
 URL: https://issues.apache.org/jira/browse/IMPALA-10214
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Sahil Takiar


This is dependent on the Ozone input streams supporting the {{CanUnbuffer}} 
interface first (last I checked, the input streams don't implement the 
interface).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IMPALA-10216) BufferPoolTest.WriteErrorBlacklistCompression is flaky on UBSAN builds

2020-10-05 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10216:
-

 Summary: BufferPoolTest.WriteErrorBlacklistCompression is flaky on 
UBSAN builds
 Key: IMPALA-10216
 URL: https://issues.apache.org/jira/browse/IMPALA-10216
 Project: IMPALA
  Issue Type: Test
Reporter: Sahil Takiar


Only seen this once so far:

{code}
BufferPoolTest.WriteErrorBlacklistCompression

Error Message
Value of: FindPageInDir(pages[NO_ERROR_QUERY], error_dir) != NULL
  Actual: false
Expected: true

Stacktrace

Impala/be/src/runtime/bufferpool/buffer-pool-test.cc:1764
Value of: FindPageInDir(pages[NO_ERROR_QUERY], error_dir) != NULL
  Actual: false
Expected: true
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IMPALA-10217) test_runtime_filters.TestMinMaxFilters.test_decimal_min_max_filters is flaky

2020-10-05 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10217:
-

 Summary: 
test_runtime_filters.TestMinMaxFilters.test_decimal_min_max_filters is flaky
 Key: IMPALA-10217
 URL: https://issues.apache.org/jira/browse/IMPALA-10217
 Project: IMPALA
  Issue Type: Test
Reporter: Sahil Takiar


Seen this a few times in exhaustive builds:
{code}
query_test.test_runtime_filters.TestMinMaxFilters.test_decimal_min_max_filters[protocol:
 beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 'abort_on_error': 
1, 'exec_single_node_rows_threshold': 0} | table_format: kudu/none] (from 
pytest)

query_test/test_runtime_filters.py:231: in test_decimal_min_max_filters
test_file_vars={'$RUNTIME_FILTER_WAIT_TIME_MS': str(WAIT_TIME_MS)})
common/impala_test_suite.py:718: in run_test_case
update_section=pytest.config.option.update_results)
common/test_result_verifier.py:627: in verify_runtime_profile
% (function, field, expected_value, actual_value, actual))
E   AssertionError: Aggregation of SUM over ProbeRows did not match expected 
results.
E   EXPECTED VALUE:
E   102
E   
E   ACTUAL VALUE:
E   38
E   
{code}





--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (IMPALA-10016) Split jars for Impala executor and coordinator Docker images

2020-10-08 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar closed IMPALA-10016.
-
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Split jars for Impala executor and coordinator Docker images
> 
>
> Key: IMPALA-10016
> URL: https://issues.apache.org/jira/browse/IMPALA-10016
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 4.0
>
>
> Impala executors and coordinators currently share a common base image. The 
> base image defines a set of jar files needed by either the coordinator or the 
> executor. We should split the jars into two categories, those necessary for 
> the coordinator and those necessary for the executor, which should help 
> reduce overall image size.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (IMPALA-10028) Additional optimizations of Impala docker container sizes

2020-10-08 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-10028.
---
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Additional optimizations of Impala docker container sizes
> -
>
> Key: IMPALA-10028
> URL: https://issues.apache.org/jira/browse/IMPALA-10028
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 4.0
>
>
> There are some more optimizations we can make to get the images to be even 
> smaller. It also looks like we may have regressed with regard to image size: 
> IMPALA-8425 reports the images at ~700 MB, but I just checked on a release 
> build and they are currently 1.01 GB.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (IMPALA-9485) Enable file handle cache for EC files

2020-10-09 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-9485.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Enable file handle cache for EC files
> -
>
> Key: IMPALA-9485
> URL: https://issues.apache.org/jira/browse/IMPALA-9485
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 4.0
>
>
> Now that HDFS-14308 has been fixed, we can re-enable the file handle cache 
> for EC files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (IMPALA-8925) Consider replacing ClientRequestState ResultCache with result spooling

2020-10-12 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-8925.
--
Resolution: Later

This would be nice to have, but I'm not seeing a strong reason to do it at the 
moment, so I'm closing this as "Later".

> Consider replacing ClientRequestState ResultCache with result spooling
> --
>
> Key: IMPALA-8925
> URL: https://issues.apache.org/jira/browse/IMPALA-8925
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend, Clients
>Reporter: Sahil Takiar
>Priority: Minor
>
> The {{ClientRequestState}} maintains an internal results cache (which is 
> really just a {{QueryResultSet}}) in order to provide support for the 
> {{TFetchOrientation.FETCH_FIRST}} fetch orientation (used by Hue - see 
> [https://github.com/apache/impala/commit/6b769d011d2016a73483f63b311e108d17d9a083]).
> The cache itself has some limitations:
>  * It caches all results in a {{QueryResultSet}} with limited admission 
> control integration
>  * It has a max size, if the size is exceeded the cache is emptied
>  * It cannot spill to disk
> Result spooling could potentially replace the query result cache and provide 
> a few benefits; it should be able to fit more rows since it can spill to 
> disk. The memory is better tracked as well since it integrates with both 
> admitted and reserved memory. Hue currently sets the max result set fetch 
> size (see 
> [https://github.com/cloudera/hue/blob/master/apps/impala/src/impala/impala_flags.py#L61]); 
> it would be good to check how well that value works for Hue users so we can 
> decide whether replacing the current result cache with result spooling makes 
> sense.
> This would require some changes to result spooling as well: currently it 
> discards rows as soon as they are read from the underlying 
> {{BufferedTupleStream}}. It would need the ability to reset the read cursor, 
> which would also require changes to the {{PlanRootSink}} interface (a rough 
> sketch follows).
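>
> A rough sketch of that capability (the data types are illustrative; this is
> not the actual {{PlanRootSink}} interface):
> {code}
> #include <cstddef>
> #include <deque>
> #include <vector>
>
> class SpoolingResultSink {
>  public:
>   void Add(std::vector<char> batch) { spool_.push_back(std::move(batch)); }
>   // Returns the next batch, or nullptr once the spool is exhausted. Today
>   // result spooling discards rows as they are read; here they are kept.
>   const std::vector<char>* GetNext() {
>     if (read_pos_ >= spool_.size()) return nullptr;
>     return &spool_[read_pos_++];
>   }
>   // The new capability FETCH_FIRST needs: rewind instead of re-executing.
>   void ResetReadCursor() { read_pos_ = 0; }
>  private:
>   std::deque<std::vector<char>> spool_;  // stand-in for BufferedTupleStream
>   size_t read_pos_ = 0;
> };
> {code}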



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IMPALA-10235) Averaged timer profile counters can be negative for trivial queries

2020-10-13 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10235:
-

 Summary: Averaged timer profile counters can be negative for 
trivial queries
 Key: IMPALA-10235
 URL: https://issues.apache.org/jira/browse/IMPALA-10235
 Project: IMPALA
  Issue Type: Bug
Reporter: Sahil Takiar
 Attachments: profile-output.txt

Steps to reproduce on master:
{code}
stakiar @ stakiar-desktop -bash ~/Impala 2020-10-13 11:13:02 master
 [74] → ./bin/impala-shell.sh -q "select sleep(100) from functional.alltypes 
limit 25" -p > profile-output.txt
...
Query: select sleep(100) from functional.alltypes limit 25
Query submitted at: 2020-10-13 11:13:07 (Coordinator: 
http://stakiar-desktop:25000)
Query progress can be monitored at: 
http://stakiar-desktop:25000/query_plan?query_id=694f94671571d4d1:cdec9db9
Fetched 25 row(s) in 2.64s
{code}

Attached the contents of {{profile-output.txt}}.

Relevant portion of the profile:

{code}
Averaged Fragment F00:(Total: 2s603ms, non-child: 272.519us, % non-child: 
0.01%)
...
   - CompletionTime: -1665218428.000ns
...
   - TotalThreadsTotalWallClockTime: -1686005515.000ns
 - TotalThreadsSysTime: 0.000ns
 - TotalThreadsUserTime: 2.151ms
...
   - TotalTime: -1691524485.000ns
{code}

For whatever reason, this only affects the averaged fragment profile. For this 
query, there was only one coordinator fragment and thus only one fragment 
instance. It showed normal values:

{code}
Coordinator Fragment F00:
...
 - CompletionTime: 2s629ms
...
 - TotalThreadsTotalWallClockTime: 2s608ms
   - TotalThreadsSysTime: 0.000ns
   - TotalThreadsUserTime: 2.151ms
...
 - TotalTime: 2s603ms
{code}
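
The negative values are consistent with a 64-bit nanosecond count being 
narrowed to 32 bits somewhere in the averaging path: the coordinator's 
CompletionTime of 2s629ms is 2,629,748,868 ns, which wraps to exactly the 
-1,665,218,428 shown above when truncated to a signed 32-bit integer (the 
same holds for TotalTime). A minimal illustration:

{code}
#include <cstdint>
#include <cstdio>

int main() {
  int64_t completion_ns = 2629748868;  // 2s629ms from the instance profile
  int32_t truncated = static_cast<int32_t>(completion_ns);
  printf("%d\n", truncated);  // prints -1665218428, as in the averaged profile
  return 0;
}
{code}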



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IMPALA-10238) Add fault tolerance docs

2020-10-14 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10238:
-

 Summary: Add fault tolerance docs
 Key: IMPALA-10238
 URL: https://issues.apache.org/jira/browse/IMPALA-10238
 Project: IMPALA
  Issue Type: Task
  Components: Docs
Reporter: Sahil Takiar
Assignee: Sahil Takiar


Impala docs currently don't have much information about any of our fault 
tolerance features. We should add a dedicated section with several sub-topics 
to address this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IMPALA-10239) Docs: Add docs for node blacklisting

2020-10-14 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10239:
-

 Summary: Docs: Add docs for node blacklisting
 Key: IMPALA-10239
 URL: https://issues.apache.org/jira/browse/IMPALA-10239
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Sahil Takiar
Assignee: Sahil Takiar


We should add some docs for node blacklisting explaining what it is, how it 
works at a high level, what errors it captures, how to debug it, etc.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IMPALA-10240) Impala Doc: Add docs for cluster membership statestore heartbeats

2020-10-14 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10240:
-

 Summary: Impala Doc: Add docs for cluster membership statestore 
heartbeats
 Key: IMPALA-10240
 URL: https://issues.apache.org/jira/browse/IMPALA-10240
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Sahil Takiar


I don't see many docs explaining how the current cluster membership logic works 
(e.g. via the statestored heartbeats). It would be nice to include a high-level 
explanation along with how to configure the heartbeat threshold.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IMPALA-10241) Impala Doc: RPC troubleshooting guide

2020-10-14 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10241:
-

 Summary: Impala Doc: RPC troubleshooting guide
 Key: IMPALA-10241
 URL: https://issues.apache.org/jira/browse/IMPALA-10241
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Sahil Takiar


There have been several diagnostic improvements that make RPCs easier to debug. 
We should document them along with the associated options for configuring them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (IMPALA-9954) RpcRecvrTime can be negative

2020-10-19 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-9954.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

> RpcRecvrTime can be negative
> 
>
> Key: IMPALA-9954
> URL: https://issues.apache.org/jira/browse/IMPALA-9954
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Riza Suminto
>Priority: Major
> Fix For: Impala 4.0
>
> Attachments: profile_034e7209bd98c96c_9a448dfc.txt
>
>
> Saw this on a recent version of master. Attached the full runtime profile.
> {code:java}
> KrpcDataStreamSender (dst_id=2):(Total: 9.863ms, non-child: 3.185ms, 
> % non-child: 32.30%)
>   ExecOption: Unpartitioned Sender Codegen Disabled: not needed
>- BytesSent (500.000ms): 0, 0
>- NetworkThroughput: (Avg: 4.34 MB/sec ; Min: 4.34 MB/sec ; Max: 
> 4.34 MB/sec ; Number of samples: 1)
>- RpcNetworkTime: (Avg: 3.562ms ; Min: 679.676us ; Max: 6.445ms ; 
> Number of samples: 2)
>- RpcRecvrTime: (Avg: -151281.000ns ; Min: -231485.000ns ; Max: 
> -71077.000ns ; Number of samples: 2)
>- EosSent: 1 (1)
>- PeakMemoryUsage: 416.00 B (416)
>- RowsSent: 100 (100)
>- RpcFailure: 0 (0)
>- RpcRetry: 0 (0)
>- SerializeBatchTime: 2.880ms
>- TotalBytesSent: 28.67 KB (29355)
>- UncompressedRowBatchSize: 69.29 KB (70950) {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IMPALA-7625) test_web_pages.py backend tests are failing

2018-09-25 Thread Sahil Takiar (JIRA)
Sahil Takiar created IMPALA-7625:


 Summary: test_web_pages.py backend tests are failing
 Key: IMPALA-7625
 URL: https://issues.apache.org/jira/browse/IMPALA-7625
 Project: IMPALA
  Issue Type: Test
  Components: Infrastructure
Reporter: Sahil Takiar
Assignee: Sahil Takiar


While working on IMPALA-6249, we found that the tests under 
{{webserver/test_web_pages.py}} were not being run by Jenkins. We re-enabled 
the tests; however, a few of the backend-specific tests were failing, so 
IMPALA-6249 disabled them. This JIRA is to follow up on those tests and fix 
them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IMPALA-7776) Fail queries where the sum of offset and limit exceed the max value of int64

2018-10-29 Thread Sahil Takiar (JIRA)
Sahil Takiar created IMPALA-7776:


 Summary: Fail queries where the sum of offset and limit exceed the 
max value of int64
 Key: IMPALA-7776
 URL: https://issues.apache.org/jira/browse/IMPALA-7776
 Project: IMPALA
  Issue Type: Improvement
Reporter: Sahil Takiar
Assignee: Sahil Takiar


A follow up to IMPALA-5004. We should prevent users from running queries where 
the sum of the offset and limit exceeds some threshold (e.g. 
{{Long.MAX_VALUE}}). Today, running such a query crashes the impalad, so we 
should reject these queries instead.
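
A sketch of the proposed guard (the real check would live in the planner; 
shown in C++ for illustration): reject the query when limit + offset would 
overflow a signed 64-bit value.

{code}
#include <cstdint>
#include <limits>

// Assumes limit and offset have already been validated as non-negative.
bool LimitPlusOffsetOverflows(int64_t limit, int64_t offset) {
  return limit > std::numeric_limits<int64_t>::max() - offset;
}
{code}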



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IMPALA-7777) Fail queries where the sum of offset and limit exceed the max value of int64

2018-10-29 Thread Sahil Takiar (JIRA)
Sahil Takiar created IMPALA-7777:


 Summary: Fail queries where the sum of offset and limit exceed the 
max value of int64
 Key: IMPALA-7777
 URL: https://issues.apache.org/jira/browse/IMPALA-7777
 Project: IMPALA
  Issue Type: Improvement
Reporter: Sahil Takiar
Assignee: Sahil Takiar


A follow up to IMPALA-5004. We should prevent users from running queries where 
the sum of the offset and limit exceeds some threshold (e.g. 
{{Long.MAX_VALUE}}). Today, running such a query crashes the impalad, so we 
should reject these queries instead.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (IMPALA-7776) Fail queries where the sum of offset and limit exceed the max value of int64

2018-10-29 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-7776.
--
Resolution: Duplicate

> Fail queries where the sum of offset and limit exceed the max value of int64
> 
>
> Key: IMPALA-7776
> URL: https://issues.apache.org/jira/browse/IMPALA-7776
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> A follow up to IMPALA-5004. We should prevent users from running queries 
> where the sum of the offset and limit exceeds some threshold (e.g. 
> {{Long.MAX_VALUE}}). Today, running such a query crashes the impalad, so we 
> should reject these queries instead.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (IMPALA-7777) Fix crash due to arithmetic overflows in Exchange Node

2018-11-05 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-7777.
--
   Resolution: Fixed
Fix Version/s: Impala 3.1.0

> Fix crash due to arithmetic overflows in Exchange Node
> --
>
> Key: IMPALA-7777
> URL: https://issues.apache.org/jira/browse/IMPALA-7777
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.10.0, Impala 2.11.0, Impala 3.0, Impala 2.12.0
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 3.1.0
>
>
> A follow up to IMPALA-5004. Impala allows LIMIT and OFFSET values up to 
> 2^63 - 1. However, if a user runs a query with a large offset (e.g. slightly 
> below 2^63), the query will crash the impalad due to a {{DCHECK_LE}} in 
> {{row-batch.h}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IMPALA-7816) Race condition in HdfsScanNodeBase::StopAndFinalizeCounters

2018-11-06 Thread Sahil Takiar (JIRA)
Sahil Takiar created IMPALA-7816:


 Summary: Race condition in 
HdfsScanNodeBase::StopAndFinalizeCounters
 Key: IMPALA-7816
 URL: https://issues.apache.org/jira/browse/IMPALA-7816
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Affects Versions: Impala 3.1.0
Reporter: Sahil Takiar
Assignee: Sahil Takiar


While working on IMPALA-6964, I noticed that sometimes the runtime profile for 
a {{HDFS_SCAN_NODE}} will include {{File Formats: PARQUET/NONE:2}} and 
sometimes it won't (depending on the query). However, looking at the code, any 
scan of Parquet files should include this line.

I debugged the code and there seems to be a race condition where 
{{HdfsScanNodeBase::StopAndFinalizeCounters}} can be called before 
{{HdfsParquetScanner::Close}} is called for all the scan ranges. This causes 
the {{File Formats}} issue above because {{HdfsParquetScanner::Close}} calls 
{{HdfsScanNodeBase::RangeComplete}} which updates the shared object 
{{file_type_counts_}}, which is read in {{StopAndFinalizeCounters}} (so 
{{StopAndFinalizeCounters}} will write out the contents of 
{{file_type_counts_}} before all scanners can update it).

{{StopAndFinalizeCounters}} can be called in two places: 
{{HdfsScanNodeBase::Close}} and in {{HdfsScanNode::GetNext}}. It can be called 
in {{GetNext}} when {{GetNextInternal}} reads enough rows to cross the 
query-defined limit. So {{GetNext}} will call {{StopAndFinalizeCounters}} once 
the limit is reached, but not necessarily before the scanners are closed.

I'm able to reproduce this locally using the following queries:
{code:java}
 select * from functional_parquet.lineitem_sixblocks limit 10 {code}
The runtime profile does not include {{File Formats}}
{code:java}
 select * from functional_parquet.lineitem_sixblocks order by l_orderkey limit 
10 {code}
The runtime profile does include {{File Formats}}. I tried simply removing the 
call to {{StopAndFinalizeCounters}} from {{GetNext}}, but that doesn't seem to 
work: it actually caused several other runtime profile messages to get deleted 
(I'm not entirely sure why).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (IMPALA-6249) Expose several build flags via web UI

2018-11-07 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-6249.
--
   Resolution: Fixed
Fix Version/s: Impala 3.1.0

> Expose several build flags via web UI
> -
>
> Key: IMPALA-6249
> URL: https://issues.apache.org/jira/browse/IMPALA-6249
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Reporter: Tim Armstrong
>Assignee: Sahil Takiar
>Priority: Minor
> Fix For: Impala 3.1.0
>
> Attachments: Screen Shot 2018-09-06 at 11.47.45 AM.png
>
>
> IMPALA-6241 added a .cmake_build_type file with the CMAKE_BUILD_TYPE value 
> for the last build. The file is used to detect the type of the build that the 
> python tests are running against. However, this assumes that the tests are 
> running from the same directory that the Impala cluster under test was built 
> from, which isn't necessarily true for all dev workflows and for remote 
> cluster tests.
> It would be convenient if CMAKE_BUILD_TYPE was exposed from the Impalad web 
> UI. Currently we expose DEBUG/RELEASE depending on the value of NDEBUG - see 
> GetVersionString() and impalad-host:25000/?json=true, but we could expose the 
> precise build type, then allow the python tests to parse it from the web UI.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (IMPALA-7691) test_web_pages not being run

2018-11-07 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-7691.
--
   Resolution: Fixed
Fix Version/s: Impala 3.1.0

> test_web_pages not being run
> 
>
> Key: IMPALA-7691
> URL: https://issues.apache.org/jira/browse/IMPALA-7691
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Reporter: Thomas Tauber-Marshall
>Assignee: Sahil Takiar
>Priority: Blocker
> Fix For: Impala 3.1.0
>
>
> test_web_pages.py is not being run by test/run-tests.py because the 
> 'webserver' directory is missing from VALID_TEST_DIRS



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IMPALA-7836) Impala 3.1 Doc: New query option 'topn_bytes_limit' for TopN to Sort conversion

2018-11-08 Thread Sahil Takiar (JIRA)
Sahil Takiar created IMPALA-7836:


 Summary: Impala 3.1 Doc: New query option 'topn_bytes_limit' for 
TopN to Sort conversion
 Key: IMPALA-7836
 URL: https://issues.apache.org/jira/browse/IMPALA-7836
 Project: IMPALA
  Issue Type: Sub-task
  Components: Frontend
Affects Versions: Impala 2.9.0
Reporter: Sahil Takiar
Assignee: Alex Rodoni


IMPALA-5004 adds a new query-level option called 'topn_bytes_limit' that we 
should document. The changes in IMPALA-5004 work by estimating the amount of 
memory required to run a TopN operator. The memory estimate is based on the 
size of the individual tuples that need to be processed by the TopN operator, 
as well as the sum of the limit and offset in the query. TopN operators don't 
spill to disk, so they have to keep all rows they process in memory.

If the estimated size of the working set of the TopN operator exceeds the 
threshold of 'topn_bytes_limit', the TopN operator is replaced with a Sort 
operator. The Sort operator can spill to disk, but it processes all the data 
(the limit and offset have no effect). So switching to Sort might incur a 
performance penalty, but it will require less memory.
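
A sketch of the decision described above (the actual logic lives in the Java 
planner; the names here are illustrative):

{code}
#include <cstdint>

enum class Operator { kTopN, kSort };

// Estimate the TopN working set from the per-row tuple size and the number of
// rows it must hold (limit + offset), and fall back to a spillable Sort once
// the estimate exceeds topn_bytes_limit. Assumes the multiplication itself
// does not overflow.
Operator ChooseOperator(int64_t tuple_bytes, int64_t limit, int64_t offset,
                        int64_t topn_bytes_limit) {
  int64_t estimated_bytes = tuple_bytes * (limit + offset);
  return estimated_bytes > topn_bytes_limit ? Operator::kSort : Operator::kTopN;
}
{code}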



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (IMPALA-5004) Switch to sorting node for large TopN queries

2018-11-08 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-5004.
--
   Resolution: Fixed
Fix Version/s: Impala 3.1.0

> Switch to sorting node for large TopN queries
> -
>
> Key: IMPALA-5004
> URL: https://issues.apache.org/jira/browse/IMPALA-5004
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 2.9.0
>Reporter: Lars Volker
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 3.1.0
>
>
> As explained by [~tarmstrong] in IMPALA-4995:
> bq. We should also consider switching to the sort operator for large limits. 
> This allows it to spill. The memory requirements for TopN also are 
> problematic for large limits, since it would allocate large vectors that are 
> untracked and also require a large amount of contiguous memory.
> There's already logic to select TopN vs. Sort: 
> [planner/SingleNodePlanner.java#L289|https://github.com/apache/incubator-impala/blob/master/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java#L289]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IMPALA-7924) Generate Thrift 11 Python Code

2018-12-04 Thread Sahil Takiar (JIRA)
Sahil Takiar created IMPALA-7924:


 Summary: Generate Thrift 11 Python Code
 Key: IMPALA-7924
 URL: https://issues.apache.org/jira/browse/IMPALA-7924
 Project: IMPALA
  Issue Type: Task
  Components: Infrastructure
Reporter: Sahil Takiar
Assignee: Sahil Takiar


Until IMPALA-7825 has been completed, it would be good to add the ability to 
generate Python code using Thrift 11. As stated in IMPALA-7825, Thrift has 
added performance improvements to its Python deserialization code.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (IMPALA-7625) test_web_pages.py backend tests are failing

2019-01-07 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-7625.
--
   Resolution: Fixed
Fix Version/s: Impala 3.2.0

> test_web_pages.py backend tests are failing
> ---
>
> Key: IMPALA-7625
> URL: https://issues.apache.org/jira/browse/IMPALA-7625
> Project: IMPALA
>  Issue Type: Test
>  Components: Infrastructure
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 3.2.0
>
>
> While working on IMPALA-6249, we found that the tests under 
> {{webserver/test_web_pages.py}} are not being run by Jenkins. We re-enabled 
> the tests, however, a few of the backend specific tests are failing. 
> IMPALA-6249 disabled these tests. This JIRA is to follow up on these tests 
> and fix them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (IMPALA-6964) Track stats about column and page sizes in Parquet reader

2019-01-17 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-6964.
--
   Resolution: Fixed
Fix Version/s: Impala 3.2.0

> Track stats about column and page sizes in Parquet reader
> -
>
> Key: IMPALA-6964
> URL: https://issues.apache.org/jira/browse/IMPALA-6964
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Sahil Takiar
>Priority: Major
>  Labels: observability, parquet, ramp-up
> Fix For: Impala 3.2.0
>
>
> It would be good to have stats for scanned parquet data about page sizes. We 
> currently can't tell much about the "shape" of the parquet pages from the 
> profile. Some questions that are interesting:
> * How big is each column? I.e. total compressed and decompressed size read.
> * How big are pages on average? Either compressed or decompressed size
> * What is the compression ratio for pages? Could be inferred from the above 
> two.
> I think storing all the stats in the profile per-column would be too much 
> data, but we could probably infer most useful things from higher-level 
> aggregates.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (IMPALA-7924) Generate Thrift 11 Python Code

2019-01-17 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-7924.
--
   Resolution: Fixed
Fix Version/s: Impala 3.2.0

> Generate Thrift 11 Python Code
> --
>
> Key: IMPALA-7924
> URL: https://issues.apache.org/jira/browse/IMPALA-7924
> Project: IMPALA
>  Issue Type: Task
>  Components: Infrastructure
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 3.2.0
>
>
> Until IMPALA-7825 has been completed, it would be good to add the ability to 
> generate Python code using Thrift 11. As stated in IMPALA-7825, Thrift has 
> added performance improvements to its Python deserialization code.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IMPALA-8101) Thrift 11 compilation and Thrift ext-data-source compilation are always run

2019-01-23 Thread Sahil Takiar (JIRA)
Sahil Takiar created IMPALA-8101:


 Summary: Thrift 11 compilation and Thrift ext-data-source 
compilation are always run
 Key: IMPALA-8101
 URL: https://issues.apache.org/jira/browse/IMPALA-8101
 Project: IMPALA
  Issue Type: Task
Reporter: Sahil Takiar
Assignee: Sahil Takiar


[~tarmstrong] pointed out that after IMPALA-7924 the build output started 
displaying lines such as: "Running thrift 11 compiler on..." even during builds 
when Thrift files were not modified.

I dug a bit deeper and found the following:
 * This seems to be happening for Thrift compilation of {{ext-data-source}} 
files as well (e.g. ExternalDataSource.thrift, Types.thrift, etc.); "Running 
thrift compiler for ext-data-source on..." is always printed
 * The issue is that the [custom 
command|https://cmake.org/cmake/help/v3.8/command/add_custom_command.html] for 
ext-data-source and Thrift 11 compilation specify an {{OUTPUT}} file that does 
not exist (and is not generated by Thrift)
 * According to the CMake docs "if the command does not actually create the 
{{OUTPUT}} then the rule will always run" - so Thrift compilation will run 
during every build
 * The issue is that you don't really know what files Thrift is going to 
generate without actually looking into the Thrift file and understanding Thrift 
internals
 * For C++ and Python there is a workaround; for C++ Thrift always generates a 
file \{THRIFT_FILE_NAME}_types.h (similar situation for Python); however, for 
Java no such file necessarily exists (ext-data-source only does Java gen)
 ** This is how regular Thrift compilation works (e.g. compilation of 
beeswax.thrift, ImpalaService.thrift, etc.); which is why we don't see the 
issue for regular Thrift compilation

A solution for Thrift 11 compilation is to just add generated Python files to 
the {{OUTPUT}} for the custom_command.

A solution for Thrift compilation of ext-data-source seems trickier, so open to 
suggestions.

Ideally, Thrift would provide a way to return the list of files generated 
from a .thrift file without actually generating the files, but I don't see a 
way to do that.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IMPALA-8117) KuduCatalogOpExecutor.validateKuduTblExists does not check if Kudu table exists

2019-01-25 Thread Sahil Takiar (JIRA)
Sahil Takiar created IMPALA-8117:


 Summary: KuduCatalogOpExecutor.validateKuduTblExists does not 
check if Kudu table exists
 Key: IMPALA-8117
 URL: https://issues.apache.org/jira/browse/IMPALA-8117
 Project: IMPALA
  Issue Type: Bug
  Components: Catalog
Reporter: Sahil Takiar
Assignee: Sahil Takiar






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IMPALA-8131) Impala is unable to read Parquet decimal columns with higher scale than table metadata

2019-01-28 Thread Sahil Takiar (JIRA)
Sahil Takiar created IMPALA-8131:


 Summary: Impala is unable to read Parquet decimal columns with 
higher scale than table metadata
 Key: IMPALA-8131
 URL: https://issues.apache.org/jira/browse/IMPALA-8131
 Project: IMPALA
  Issue Type: Sub-task
  Components: Backend
Reporter: Sahil Takiar


Similar to IMPALA-7087, except we should allow Impala to read Parquet data 
stored with a higher scale into a table with lower scale. The SQL Standard 
allows for this behavior, and several other databases do this as well.

More information on this can be found in a comment on IMPALA-7087.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IMPALA-8166) ParquetBytesReadPerColumn is displayed for non-Parquet scans

2019-02-06 Thread Sahil Takiar (JIRA)
Sahil Takiar created IMPALA-8166:


 Summary: ParquetBytesReadPerColumn is displayed for non-Parquet 
scans
 Key: IMPALA-8166
 URL: https://issues.apache.org/jira/browse/IMPALA-8166
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Sahil Takiar
Assignee: Sahil Takiar


The issue is that these counters are added in {{hdfs-scan-node-base.h}}.

These counters are only updated for Parquet, so we should only display them if 
the Scan Node is scanning Parquet data.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IMPALA-8232) Custom cluster tests should allow setting dfs.client settings for impalads

2019-02-20 Thread Sahil Takiar (JIRA)
Sahil Takiar created IMPALA-8232:


 Summary: Custom cluster tests should allow setting dfs.client 
settings for impalads
 Key: IMPALA-8232
 URL: https://issues.apache.org/jira/browse/IMPALA-8232
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend
Reporter: Sahil Takiar
Assignee: Sahil Takiar


Right now, custom cluster tests only allow specifying impalad startup options; 
however, it would be nice if the tests could specify arbitrary HDFS client 
configs as well (e.g. {{dfs.client}} options). This would allow us to increase 
our integration test coverage of different HDFS client setups, such as (1) 
disabling short-circuit reads, which triggers the code path for a remote read 
(requires setting {{dfs.client.read.shortcircuit}} to false), and (2) enabling 
hedged reads (requires setting {{dfs.client.hedged.read.threadpool.size}} to a 
value greater than 0).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IMPALA-8237) Enabling preads always fetches hedged reads metrics

2019-02-21 Thread Sahil Takiar (JIRA)
Sahil Takiar created IMPALA-8237:


 Summary: Enabling preads always fetches hedged reads metrics
 Key: IMPALA-8237
 URL: https://issues.apache.org/jira/browse/IMPALA-8237
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Reporter: Sahil Takiar
Assignee: Sahil Takiar


In {{HdfsFileReader}}, if preads are enabled, we assume that hedged reads are 
enabled as well, so whenever we close a file we make a libhdfs call to collect 
a few hedged read metrics from the underlying {{FileSystem}} object. However, 
as part of IMPALA-5212 we may want to enable preads even when hedged reads are 
disabled, in which case the libhdfs call to fetch hedged read metrics would be 
wasted work.

Digging through the HDFS code, it seems the HDFS client triggers hedged reads 
only if {{dfs.client.hedged.read.threadpool.size}} is greater than 0. We can 
use the same check in {{HdfsFileReader}} to trigger the fetch of hedged read 
metrics. The problem is that libhdfs currently does not provide a good way of 
getting the value of {{dfs.client.hedged.read.threadpool.size}}. It provides a 
method called {{hdfsConfGetInt}}, but that method simply calls {{new 
Configuration()}} and fetches the value of 
{{dfs.client.hedged.read.threadpool.size}} from it. Calling {{new 
Configuration}} simply loads the current {{hdfs-site.xml}}, {{core-site.xml}}, 
etc., which does not account for the scenario where the default configuration 
has been modified for specific filesystem objects - e.g. using {{hdfsBuilder}} 
to set non-default configuration parameters (see HDFS-14301 for more details).
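
A short sketch of the mismatch using the libhdfs builder APIs (the override 
value is illustrative):

{code}
#include <hdfs/hdfs.h>  // libhdfs; header location varies by install
#include <cstdio>

int main() {
  struct hdfsBuilder* bld = hdfsNewBuilder();
  hdfsBuilderSetNameNode(bld, "default");
  // A per-connection override, the kind of non-default setting hdfsBuilder
  // allows:
  hdfsBuilderConfSetStr(bld, "dfs.client.hedged.read.threadpool.size", "16");
  hdfsFS fs = hdfsBuilderConnect(bld);  // consumes the builder

  int32_t pool_size = 0;
  // Reads a fresh default Configuration (hdfs-site.xml, core-site.xml, etc.),
  // NOT the builder override above, so this can report 0 even though 'fs' was
  // configured with a hedged read thread pool.
  hdfsConfGetInt("dfs.client.hedged.read.threadpool.size", &pool_size);
  printf("threadpool size per hdfsConfGetInt: %d\n", pool_size);

  hdfsDisconnect(fs);
  return 0;
}
{code}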



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IMPALA-8342) TestAdmissionControllerStress test_mem_limit run_admission_test failure

2019-03-25 Thread Sahil Takiar (JIRA)
Sahil Takiar created IMPALA-8342:


 Summary: TestAdmissionControllerStress test_mem_limit 
run_admission_test failure
 Key: IMPALA-8342
 URL: https://issues.apache.org/jira/browse/IMPALA-8342
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Reporter: Sahil Takiar
Assignee: Bikramjeet Vig


{{TestAdmissionControllerStress.test_mem_limit}} can fail with:
{code:java}
custom_cluster/test_admission_controller.py:960: in test_mem_limit
{'request_pool': self.pool_name, 'mem_limit': query_mem_limit})
custom_cluster/test_admission_controller.py:823: in run_admission_test
assert metric_deltas['dequeued'] == 0,\
E   AssertionError: Queued queries should not run until others are made to 
finish
E   assert 5 == 0{code}

The full test configuration is:

{code}
custom_cluster.test_admission_controller.TestAdmissionControllerStress.test_mem_limit[num_queries:
 30 | submission_delay_ms: 150 | exec_option: {'batch_size': 0, 'num_nodes': 0, 
'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, 
'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
text/none | round_robin_submission: True]
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IMPALA-8343) TestParquetArrayEncodings parquet-ambiguous-list-modern.test failure

2019-03-25 Thread Sahil Takiar (JIRA)
Sahil Takiar created IMPALA-8343:


 Summary: TestParquetArrayEncodings 
parquet-ambiguous-list-modern.test failure
 Key: IMPALA-8343
 URL: https://issues.apache.org/jira/browse/IMPALA-8343
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Reporter: Sahil Takiar


The following query block in {{parquet-ambiguous-list-modern.test}} failed:

{code}
---- QUERY
# 'f21' does not resolve with the 2-level encoding because it matches
# a Parquet group in the schema.
set parquet_fallback_schema_resolution=position;
set parquet_array_resolution=two_level;
select s2.f21 from ambig_modern.ambigarray;
---- RESULTS
---- CATCH
has an incompatible Parquet schema
---- TYPES
int
{code}

With the error:

{code}
query_test/test_nested_types.py:556: in test_ambiguous_list
vector, unique_database)
common/impala_test_suite.py:415: in run_test_case
assert False, "Expected exception: %s" % expected_str
E   AssertionError: Expected exception: has an incompatible Parquet schema
{code}

The full pytest configuration was:

{code}
query_test.test_nested_types.TestParquetArrayEncodings.test_ambiguous_list[exec_option:
 {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 
'disable_codegen': True, 'abort_on_error': 1, 
'exec_single_node_rows_threshold': 0} | table_format: parquet/none]
{code}

Seen once on CentOS 6.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IMPALA-8360) SynchronousThreadPoolTest ASSERT_TRUE(*no_sleep_destroyed) failed

2019-03-26 Thread Sahil Takiar (JIRA)
Sahil Takiar created IMPALA-8360:


 Summary: SynchronousThreadPoolTest 
ASSERT_TRUE(*no_sleep_destroyed) failed
 Key: IMPALA-8360
 URL: https://issues.apache.org/jira/browse/IMPALA-8360
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Reporter: Sahil Takiar


Jenkins output:

{code}
Error Message
Value of: *no_sleep_destroyed   Actual: false Expected: true
Stacktrace
/data/jenkins/workspace/impala-cdh6.x-core-data-load/repos/Impala/be/src/util/thread-pool-test.cc:112
Value of: *no_sleep_destroyed
  Actual: false
Expected: true
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IMPALA-8391) Impala Doc

2019-04-05 Thread Sahil Takiar (JIRA)
Sahil Takiar created IMPALA-8391:


 Summary: Impala Doc
 Key: IMPALA-8391
 URL: https://issues.apache.org/jira/browse/IMPALA-8391
 Project: IMPALA
  Issue Type: Sub-task
  Components: Docs
Reporter: Sahil Takiar
Assignee: Alex Rodoni


The Impala-Kudu docs:

[http://impala.apache.org/docs/build/html/topics/impala_kudu.html]
[http://impala.apache.org/docs/build/html/topics/impala_tables.html]

Need to be updated after IMPALA-7640 is merged.

Specifically this part of the docs will no longer be accurate:

{quote}
When you create a Kudu table through Impala, it is assigned an internal Kudu 
table name of the form {{impala::db_name.table_name}}. You can see the 
Kudu-assigned name in the output of {{DESCRIBE FORMATTED}}, in the 
{{kudu.table_name}} field of the table properties. The Kudu-assigned name 
remains the same even if you use {{ALTER TABLE}} to rename the Impala table or 
> move it to a different Impala database. You can issue the statement {{ALTER 
> TABLE impala_name SET TBLPROPERTIES('kudu.table_name' = 
> 'different_kudu_table_name')}} for the external tables created with the 
> {{CREATE EXTERNAL TABLE}} statement. Changing the {{kudu.table_name}} property 
of an external table switches which underlying Kudu table the Impala table 
refers to. The underlying Kudu table must already exist.
{quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (IMPALA-8101) Thrift 11 compilation and Thrift ext-data-source compilation are always run

2019-04-07 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-8101.
--
   Resolution: Fixed
Fix Version/s: Impala 3.3.0

> Thrift 11 compilation and Thrift ext-data-source compilation are always run
> ---
>
> Key: IMPALA-8101
> URL: https://issues.apache.org/jira/browse/IMPALA-8101
> Project: IMPALA
>  Issue Type: Task
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 3.3.0
>
>
> [~tarmstrong] pointed out that after IMPALA-7924 the build output started 
> displaying lines such as: "Running thrift 11 compiler on..." even during 
> builds when Thrift files were not modified.
> I dug a bit deeper and found the following:
>  * This seems to be happening for Thrift compilation of {{ext-data-source}} 
> files as well (e.g. ExternalDataSource.thrift, Types.thrift, etc.); "Running 
> thrift compiler for ext-data-source on..." is always printed
>  * The issue is that the [custom 
> command|https://cmake.org/cmake/help/v3.8/command/add_custom_command.html] 
> for ext-data-source and Thrift 11 compilation specify an {{OUTPUT}} file that 
> does not exist (and is not generated by Thrift)
>  * According to the CMake docs "if the command does not actually create the 
> {{OUTPUT}} then the rule will always run" - so Thrift compilation will run 
> during every build
>  * The issue is that you don't really know what files Thrift is going to 
> generate without actually looking into the Thrift file and understanding 
> Thrift internals
>  * For C++ and Python there is a workaround; for C++ Thrift always generates 
> a file {{THRIFT_FILE_NAME}}_types.h (similar situation for Python); however, 
> for Java no such file necessarily exists (ext-data-source only does Java gen)
>  ** This is how regular Thrift compilation works (e.g. compilation of 
> beeswax.thrift, ImpalaService.thrift, etc.); which is why we don't see the 
> issue for regular Thrift compilation
> A solution for Thrift 11 compilation is to just add generated Python files to 
> the {{OUTPUT}} for the custom_command.
> A solution for Thrift compilation of ext-data-source seems trickier, so open 
> to suggestions.
> Ideally, Thrift would provide a way to return the list of files generated 
> from a .thrift file, without actually generating the files, but I don't see a 
> way to do that.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (IMPALA-7640) ALTER TABLE RENAME on managed Kudu table should rename underlying Kudu table

2019-04-07 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-7640.
--
   Resolution: Fixed
Fix Version/s: Impala 3.3.0

> ALTER TABLE RENAME on managed Kudu table should rename underlying Kudu table
> 
>
> Key: IMPALA-7640
> URL: https://issues.apache.org/jira/browse/IMPALA-7640
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Affects Versions: Impala 2.12.0
>Reporter: Mike Percy
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 3.3.0
>
>
> Currently, when I execute ALTER TABLE RENAME on a managed Kudu table it will 
> not rename the underlying Kudu table. Because of IMPALA-5654 it becomes 
> nearly impossible to rename the underlying Kudu table, which is confusing and 
> makes the Kudu tables harder to identify and manage.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (IMPALA-6050) Query profiles should clearly indicate storage layer(s) used

2019-04-12 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-6050.
--
   Resolution: Fixed
Fix Version/s: Impala 3.3.0

> Query profiles should clearly indicate storage layer(s) used
> 
>
> Key: IMPALA-6050
> URL: https://issues.apache.org/jira/browse/IMPALA-6050
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Sailesh Mukil
>Assignee: Sahil Takiar
>Priority: Major
>  Labels: adls, profile, s3, supportability
> Fix For: Impala 3.3.0
>
>
> Currently, the query profile doesn't have the location of tables and 
> partitions, which makes it hard to figure out what storage layer a 
> table/partition that was queried was on.
> As we're seeing more users run Impala workloads against cloud based storage 
> like S3 and ADLS, we should have the query profiles show this information.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IMPALA-8490) Impala Doc: the file handle cache now supports S3

2019-05-03 Thread Sahil Takiar (JIRA)
Sahil Takiar created IMPALA-8490:


 Summary: Impala Doc: the file handle cache now supports S3
 Key: IMPALA-8490
 URL: https://issues.apache.org/jira/browse/IMPALA-8490
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Sahil Takiar
Assignee: Alex Rodoni


https://impala.apache.org/docs/build/html/topics/impala_scalability.html states:

{quote}
Because this feature only involves HDFS data files, it does not apply to 
non-HDFS tables, such as Kudu or HBase tables, or tables that store their data 
on cloud services such as S3 or ADLS.
{quote}

This section should be updated because the file handle cache now supports S3 
files.

We should add a section to the docs similar to what we added when support for 
remote HDFS files was added to the file handle cache:

{quote}
In Impala 3.2 and higher, file handle caching also applies to remote HDFS file 
handles. This is controlled by the cache_remote_file_handles flag for an 
impalad. It is recommended that you use the default value of true as this 
caching prevents your NameNode from overloading when your cluster has many 
remote HDFS reads.
{quote}

Like {{cache_remote_file_handles}}, the flag {{cache_s3_file_handles}} has been 
added as an impalad startup option (the flag is enabled by default).

Unlike HDFS though, S3 has no NameNode; the benefit is that it eliminates a call 
to {{getFileStatus()}} on the target S3 file. So "prevents your NameNode from 
overloading when your cluster has many remote HDFS reads" should be changed to 
something like "avoids an unnecessary call to S3AFileSystem#getFileStatus(), 
which reduces the number of API calls made to S3."



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IMPALA-8523) Migrate hdfsOpen to builder-based openFile API

2019-05-08 Thread Sahil Takiar (JIRA)
Sahil Takiar created IMPALA-8523:


 Summary: Migrate hdfsOpen to builder-based openFile API
 Key: IMPALA-8523
 URL: https://issues.apache.org/jira/browse/IMPALA-8523
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend
Reporter: Sahil Takiar
Assignee: Sahil Takiar


When opening files via libhdfs we call {{hdfsOpen}} which ultimately calls 
{{FileSystem#open(Path f, int bufferSize)}}. As of HADOOP-15229, the 
HDFS-client now exposes a new API for opening files called {{openFile}}. The 
new API has a few advantages: (1) it is capable of specifying file-specific 
configuration values in a builder-based manner (see {{o.a.h.fs.FSBuilder}} for 
details), and (2) it can open files asynchronously (e.g. see 
{{o.a.h.fs.FutureDataInputStreamBuilder}} for details).

The async file opens are similar to IMPALA-7738 (Implement timeouts for HDFS 
open calls). To avoid overlap between IMPALA-7738 and the async file opens in 
{{openFile}}, HADOOP-15691 can be used to check which filesystems open files 
asynchronously and which ones don't (currently only S3A opens files 
asynchronously).

The main use case for the new {{openFile}} API is Impala-S3 performance. 
Performance benchmarks have shown that setting 
{{fs.s3a.experimental.input.fadvise}} to {{RANDOM}} for Parquet files can 
significantly improve performance, however, this setting also adversely affects 
scans of non-splittable file formats such as gzipped files (see HADOOP-13203). 
One solution to this issue is to just document that setting 
{{fs.s3a.experimental.input.fadvise}} to {{RANDOM}} for Parquet improves 
performance, however, a better solution would be to use the new {{openFile}} 
API to specify different values of fadvise depending on the file type.

This work is dependent on exposing the new {{openFile}} API via libhdfs 
(HDFS-14478).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IMPALA-8525) preads should use hdfsPreadFully rather than hdfsPread

2019-05-08 Thread Sahil Takiar (JIRA)
Sahil Takiar created IMPALA-8525:


 Summary: preads should use hdfsPreadFully rather than hdfsPread
 Key: IMPALA-8525
 URL: https://issues.apache.org/jira/browse/IMPALA-8525
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend
Reporter: Sahil Takiar
Assignee: Sahil Takiar


Impala preads (only enabled if {{use_hdfs_pread}} is true) use the 
{{hdfsPread}} API from libhdfs, which ultimately invokes 
{{PositionedReadable#read(long position, byte[] buffer, int offset, int 
length)}} in the HDFS-client.

{{PositionedReadable}} also exposes the method {{readFully(long position, 
byte[] buffer, int offset, int length)}}. The difference is that {{#read}} will 
"Read up to the specified number of bytes" whereas {{#readFully}} will "Read 
the specified number of bytes". So there is no guarantee that {{#read}} will 
read *all* of the request bytes.

Impala calls {{hdfsPread}} inside {{hdfs-file-reader.cc}} and invokes it inside 
a while loop until all the requested bytes have been read from the file. This 
can cause a few performance issues:

(1) if the underlying {{FileSystem}} does not support ByteBuffer reads 
(HDFS-2834) (e.g. S3A does not support this feature) then {{hdfsPread}} will 
allocate a Java array equal in size to the specified length of the buffer; the call 
to {{PositionedReadable#read}} may only fill up the buffer partially; Impala 
will repeat the call to {{hdfsPread}} since the buffer was not filled, which 
will cause another large array allocation; this can result in a lot of wasted 
time doing unnecessary array allocations

(2) given that Impala calls {{hdfsPread}} in a while loop, there is no point in 
continuously calling {{hdfsPread}} when a single call to {{hdfsPreadFully}} 
will achieve the same thing (this doesn't actually affect performance much, but 
is unnecessary)

Prior solutions to this problem have been to introduce a "chunk-size" to Impala 
reads (https://gerrit.cloudera.org/#/c/63/ - S3: DiskIoMgr related changes for 
S3). However, with the migration to {{hdfsPreadFully}} the chunk-size is no 
longer necessary.

Furthermore, preads are most effective when the data is read all at once (e.g. 
in 8 MB chunks as specified by {{read_size}}) rather than in smaller chunks 
(typically 128K). For example, {{DFSInputStream#read(long position, byte[] 
buffer, int offset, int length)}} opens up remote block readers with a byte 
range determined by the value of {{length}} passed into the {{#read}} call. 
Similarly, {{S3AInputStream#readFully}} will issue an HTTP GET request with the 
size of the read specified by the given {{length}} (although fadvise must be 
set to RANDOM for this to work).

This work is dependent on exposing {{readFully}} via libhdfs first: HDFS-14478
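
As a rough illustration, here is the current loop next to the proposed single 
call. {{hdfsPread}} is the existing libhdfs API; the {{hdfsPreadFully}} 
contract shown (0 on success, -1 on failure) is an assumption pending 
HDFS-14478:

{code}
// Sketch only; integer-width details are simplified for brevity.
#include <hdfs/hdfs.h>
#include <stdint.h>

// Assumed contract pending HDFS-14478: read exactly 'length' bytes or fail.
extern int hdfsPreadFully(hdfsFS fs, hdfsFile file, tOffset position,
                          void* buffer, tSize length);

// Today: loop until the buffer is full. On filesystems without ByteBuffer
// read support (e.g. S3A), each hdfsPread call can allocate a Java array
// sized to the remaining length, even when it only partially fills it.
bool PreadLoop(hdfsFS fs, hdfsFile f, int64_t off, uint8_t* buf, int64_t len) {
  int64_t total = 0;
  while (total < len) {
    tSize n = hdfsPread(fs, f, off + total, buf + total,
                        static_cast<tSize>(len - total));
    if (n <= 0) return false;  // error or unexpected EOF
    total += n;
  }
  return true;
}

// Proposed: one call that fills the whole buffer, or fails.
bool PreadFully(hdfsFS fs, hdfsFile f, int64_t off, uint8_t* buf, int64_t len) {
  return hdfsPreadFully(fs, f, off, buf, static_cast<tSize>(len)) == 0;
}
{code}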



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IMPALA-8544) Expose additional S3A / S3Guard metrics

2019-05-13 Thread Sahil Takiar (JIRA)
Sahil Takiar created IMPALA-8544:


 Summary: Expose additional S3A / S3Guard metrics
 Key: IMPALA-8544
 URL: https://issues.apache.org/jira/browse/IMPALA-8544
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend
Reporter: Sahil Takiar
Assignee: Sahil Takiar


S3A / S3Guard internally collects several useful metrics that we should 
consider exposing to Impala users. The full list of statistics can be found in 
{{o.a.h.fs.s3a.Statistic}}. The stats include: the number of S3 operations 
performed (put, get, etc.), invocation counts for various {{FileSystem}} 
methods, stream statistics (bytes read, written, etc.), etc.

Some interesting stats that stand out:
 * "stream_aborted": "Count of times the TCP stream was aborted" - the number 
of TCP connection aborts, a high value would indicate performance issues
 * "stream_read_exceptions" : "Number of exceptions invoked on input streams" - 
incremented whenever an {{IOException}} is caught while reading (these 
exceptions don't always get propagated to Impala because they trigger a retry)
 * "store_io_throttled": "Requests throttled and retried" - looks like it 
tracks the number of times the fs retries an operation because the original 
request hit a throttling exception
 * "s3guard_metadatastore_retry": "S3Guard metadata store retry events" - looks 
like it tracks the number of times the fs retries S3Guard operations
 * "s3guard_metadatastore_throttled" : "S3Guard metadata store throttled 
events" - similar to "store_io_throttled" but looks like it is specific to 
S3Guard

We should consider how to expose these metrics via Impala logs / runtime 
profiles.

There are a few options:
 * {{S3AFileSystem}} exposes {{StorageStatistics}} specific to S3A / S3Guard 
via the {{FileSystem#getStorageStatistics}} method; the 
{{S3AStorageStatistics}} seems to include all the S3A / S3Guard metrics, 
however, I think the stats might be aggregated globally, which would make it 
hard to create per-query specific metrics
 * {{S3AInstrumentation}} exposes all the metrics as well, and looks like it is 
per-fs instance, so it is not aggregated globally; {{S3AInstrumentation}} 
extends {{o.a.h.metrics2.MetricsSource}} so perhaps it is exposed via some API 
(haven't looked into this yet)
 * {{S3AInputStream#toString}} dumps the statistics from 
{{o.a.h.fs.s3a.S3AInstrumentation.InputStreamStatistics}} and 
{{S3AFileSystem#toString}} dumps them all as well
 * {{S3AFileSystem}} updates the stats in 
{{o.a.h.fs.Statistics.StatisticsData}} as well (e.g. bytesRead, bytesWritten, 
etc.)

Impala has a {{hdfs-fs-cache}} as well, so {{hdfsFs}} objects get shared across 
threads.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (IMPALA-8428) Add support for caching file handles on s3

2019-05-14 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-8428.
--
   Resolution: Fixed
Fix Version/s: Impala 3.3.0

> Add support for caching file handles on s3
> --
>
> Key: IMPALA-8428
> URL: https://issues.apache.org/jira/browse/IMPALA-8428
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 3.3.0
>Reporter: Joe McDonnell
>Assignee: Sahil Takiar
>Priority: Critical
> Fix For: Impala 3.3.0
>
>
> The file handle cache is currently disabled for S3, as the S3 connector 
> needed to implement proper unbuffer support. Now that 
> https://issues.apache.org/jira/browse/HADOOP-14747 is fixed, Impala should 
> provide an option to cache S3 file handles.
> This is particularly important for data caching, as accessing the data cache 
> happens after obtaining a file handle. If getting a file handle is slow, the 
> caching will be less effective.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IMPALA-8549) Add support for scanning DEFLATE text files

2019-05-14 Thread Sahil Takiar (JIRA)
Sahil Takiar created IMPALA-8549:


 Summary: Add support for scanning DEFLATE text files
 Key: IMPALA-8549
 URL: https://issues.apache.org/jira/browse/IMPALA-8549
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend
Reporter: Sahil Takiar


Several Hadoop tools (e.g. Hive, MapReduce, etc.) support reading and writing 
text files stored using zlib / deflate (results in files such as 
{{00_0.deflate}}). Impala currently does not support reading {{.deflate}} 
files and returns errors such as: {{ERROR: Scanner plugin 'DEFLATE' is not one 
of the enabled plugins: 'LZO'}}.

Moreover, the default compression codec in Hadoop is zlib / deflate (see 
{{o.a.h.io.compress.DefaultCodec}}). So when writing to a text table in Hive, 
if users set {{hive.exec.compress.output}} to true, then {{.deflate}} files 
will be written by default.

Impala does support zlib / deflate with other file formats though: Avro, 
RCFiles, SequenceFiles (see 
https://impala.apache.org/docs/build/html/topics/impala_file_formats.html).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (IMPALA-8250) Impala crashes with -Xcheck:jni

2019-05-21 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-8250.
--
Resolution: Fixed

I'm closing this. I re-ran exhaustive tests against Impala with -Xcheck:jni and 
Impala no longer crashes; all tests pass. There are still a ton of JNI 
warnings, so I'm going to file a follow-up JIRA to fix them.

> Impala crashes with -Xcheck:jni
> ---
>
> Key: IMPALA-8250
> URL: https://issues.apache.org/jira/browse/IMPALA-8250
> Project: IMPALA
>  Issue Type: Task
>Reporter: Philip Zeyliger
>Priority: Major
> Fix For: Impala 3.2.0
>
>
> The JVM has a checker for JNI usage, and Impala (and libhdfs) have some 
> violations. This ticket captures figuring that out. At least one of the 
> issues can crash Impala.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IMPALA-8568) Fix Impala JNI warnings when -Xcheck:jni is enabled

2019-05-21 Thread Sahil Takiar (JIRA)
Sahil Takiar created IMPALA-8568:


 Summary: Fix Impala JNI warnings when -Xcheck:jni is enabled
 Key: IMPALA-8568
 URL: https://issues.apache.org/jira/browse/IMPALA-8568
 Project: IMPALA
  Issue Type: Improvement
Reporter: Sahil Takiar
Assignee: Sahil Takiar


IMPALA-8250 made a lot of improvements to our usage of the JNI. Impala no 
longer crashes when running exhaustive tests with -Xcheck:jni enabled. We made 
some progress in cleaning up libhdfs JNI usage in HDFS-14321 and HDFS-14348 as 
well.

However, re-running exhaustive tests with -Xcheck:jni still shows a lot of 
warnings. It's not clear if these warnings are from libhdfs or Impala, but 
either way we should drive a fix.

The most concerning of the current list of JNI warnings produced by Impala are 
the "JNI call made without checking exceptions when required to from ..." 
warnings. Essentially, this means that when making a JNI call, we are not 
properly checking for exceptions. This can be problematic because a JNI call 
may throw an exception, and we end up swallowing it.
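
For context, the discipline that -Xcheck:jni enforces looks roughly like the 
following sketch; the surrounding function and variable names are illustrative 
(not Impala's actual helpers), while the JNI calls themselves are the standard 
ones:

{code}
// Sketch: after any JNI call that can raise a Java exception, check for it
// before issuing the next call.
#include <jni.h>

bool ReadViaJni(JNIEnv* env, jclass cls, jobject stream, jbyteArray buf,
                jint len) {
  jmethodID mid = env->GetMethodID(cls, "read", "([BII)I");
  if (env->ExceptionCheck()) {   // pending exception: no further JNI calls
    env->ExceptionDescribe();    // log it instead of silently swallowing it
    env->ExceptionClear();
    return false;
  }
  jint n = env->CallIntMethod(stream, mid, buf, 0, len);
  if (env->ExceptionCheck()) {   // the Java method itself may have thrown
    env->ExceptionDescribe();
    env->ExceptionClear();
    return false;
  }
  return n >= 0;
}
{code}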

There are lots of warnings about "WARNING: JNI local refs: [x], exceeds 
capacity: [y]". Based on some digging (e.g. 
https://community.oracle.com/message/13290783) it looks like these warnings 
aren't fatal, but are just bad practice. I think we can fix the most egregious 
offenders (looks like the HBase code is one of them), and hopefully live with 
the rest (a lot of the warnings are thrown by internal Java code as well).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IMPALA-8760) TestAdmissionControllerStress.test_mem_limit timed out waiting for query to end

2019-07-12 Thread Sahil Takiar (JIRA)
Sahil Takiar created IMPALA-8760:


 Summary: TestAdmissionControllerStress.test_mem_limit timed out 
waiting for query to end
 Key: IMPALA-8760
 URL: https://issues.apache.org/jira/browse/IMPALA-8760
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Reporter: Sahil Takiar
Assignee: Bikramjeet Vig


{code}
custom_cluster.test_admission_controller.TestAdmissionControllerStress.test_mem_limit[num_queries:
 30 | protocol: beeswax | table_format: text/none | exec_option: {'batch_size': 
0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 5000, 'disable_codegen': 
False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | 
submission_delay_ms: 150 | round_robin_submission: True]{code}

Is failing with the exception:

{code}
Error Message
AssertionError: Timed out waiting 90 seconds for query end assert 
(1562916293.1308379 - 1562916203.0256219) < 90  +  where 1562916293.1308379 = 
time()
Stacktrace
custom_cluster/test_admission_controller.py:1649: in test_mem_limit
{'request_pool': self.pool_name, 'mem_limit': query_mem_limit})
custom_cluster/test_admission_controller.py:1541: in run_admission_test
self.end_admitted_queries(num_to_end)
custom_cluster/test_admission_controller.py:1320: in end_admitted_queries
assert (time() - start_time < STRESS_TIMEOUT),\
E   AssertionError: Timed out waiting 90 seconds for query end
E   assert (1562916293.1308379 - 1562916203.0256219) < 90
E+  where 1562916293.1308379 = time()
{code}

Looks like the timeout of 90 seconds isn't enough. Looks similar to IMPALA-8295



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (IMPALA-8764) Kudu data load failures due to "Clock considered unsynchronized"

2019-07-15 Thread Sahil Takiar (JIRA)
Sahil Takiar created IMPALA-8764:


 Summary: Kudu data load failures due to "Clock considered 
unsynchronized"
 Key: IMPALA-8764
 URL: https://issues.apache.org/jira/browse/IMPALA-8764
 Project: IMPALA
  Issue Type: Bug
  Components: Infrastructure
Affects Versions: Impala 3.3.0
Reporter: Sahil Takiar


Dataload error:

{code}
03:08:38 03:08:38 Error executing impala SQL: 
Impala/logs/data_loading/sql/functional/create-functional-query-exhaustive-impala-generated-kudu-none-none.sql
 See: 
Impala/logs/data_loading/sql/functional/create-functional-query-exhaustive-impala-generated-kudu-none-none.sql.log
{code}

Digging through the mini-cluster logs, I see that the Kudu tservers crashed 
with this error:

{code}
F0715 02:58:43.202059   649 hybrid_clock.cc:339] Check failed: _s.ok() unable 
to get current time with error bound: Service unavailable: could not read 
system time source: Error reading clock. Clock considered unsynchronized
*** Check failure stack trace: ***
Wrote minidump to 
Impala/testdata/cluster/cdh6/node-3/var/log/kudu/ts/minidumps/kudu-tserver/395e6bb9-9b2f-468e-4d37d898-74b96d61.dmp
Wrote minidump to 
Impala/testdata/cluster/cdh6/node-3/var/log/kudu/ts/minidumps/kudu-tserver/395e6bb9-9b2f-468e-4d37d898-74b96d61.dmp
*** Aborted at 1563184723 (unix time) try "date -d @1563184723" if you are 
using GNU date ***
PC: @ 0x7ff75ed631f7 __GI_raise
*** SIGABRT (@0x7d10232) received by PID 562 (TID 0x7ff756c1e700) from PID 
562; stack trace: ***
@ 0x7ff760b545e0 (unknown)
@ 0x7ff75ed631f7 __GI_raise
@ 0x7ff75ed648e8 __GI_abort
@  0x1fb7309 kudu::AbortFailureFunction()
@   0x9c054d google::LogMessage::Fail()
@   0x9c240d google::LogMessage::SendToLog()
@   0x9c0089 google::LogMessage::Flush()
@   0x9c2eaf google::LogMessageFatal::~LogMessageFatal()
@   0xc0c60e kudu::clock::HybridClock::WalltimeWithErrorOrDie()
@   0xc0c67e kudu::clock::HybridClock::NowWithError()
@   0xc0d4aa kudu::clock::HybridClock::NowForMetrics()
@   0x9a29c0 kudu::FunctionGauge<>::WriteValue()
@  0x1fb0dc0 kudu::Gauge::WriteAsJson()
@  0x1fb3212 kudu::MetricEntity::WriteAsJson()
@  0x1fb390e kudu::MetricRegistry::WriteAsJson()
@   0xa856a3 kudu::server::DiagnosticsLog::LogMetrics()
@   0xa8789a kudu::server::DiagnosticsLog::RunThread()
@  0x1ff44d7 kudu::Thread::SuperviseThread()
@ 0x7ff760b4ce25 start_thread
@ 0x7ff75ee2634d __clone
{code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (IMPALA-8767) Jenkins failures due to "Could not get lock /var/lib/dpkg/lock"

2019-07-16 Thread Sahil Takiar (JIRA)
Sahil Takiar created IMPALA-8767:


 Summary: Jenkins failures due to "Could not get lock 
/var/lib/dpkg/lock"
 Key: IMPALA-8767
 URL: https://issues.apache.org/jira/browse/IMPALA-8767
 Project: IMPALA
  Issue Type: Bug
Reporter: Sahil Takiar






--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (IMPALA-8779) Add RowBatchQueue interface with an implementation backed by a std::queue

2019-07-22 Thread Sahil Takiar (JIRA)
Sahil Takiar created IMPALA-8779:


 Summary: Add RowBatchQueue interface with an implementation backed 
by a std::queue
 Key: IMPALA-8779
 URL: https://issues.apache.org/jira/browse/IMPALA-8779
 Project: IMPALA
  Issue Type: Sub-task
  Components: Backend
Reporter: Sahil Takiar
Assignee: Sahil Takiar


Add a {{RowBatchQueue}} interface with an implementation backed by a 
{{std::queue}}. Introducing a generic queue that can buffer {{RowBatch}}es will 
help with the implementation of {{BufferedTupleSink}}. Rather than tie the 
{{BufferedTupleSink}} to a specific method of queuing row batches, we can use 
an interface. In future patches, a {{RowBatchQueue}} backed by a 
{{BufferedTupleStream}} can easily be switched out in {{BufferedTupleSink}}.

We should consider re-factoring the existing {{RowBatchQueue}} to use the new 
interface as well.
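
A minimal sketch of what such an interface could look like; all names and 
signatures here are assumptions, not the eventual Impala API:

{code}
// Sketch only: decouple the sink from the buffering strategy.
#include <memory>
#include <queue>

struct RowBatch {};  // stand-in for Impala's RowBatch

class RowBatchQueue {
 public:
  virtual ~RowBatchQueue() {}
  // Returns false if the batch could not be buffered (e.g. queue full).
  virtual bool AddBatch(std::unique_ptr<RowBatch> batch) = 0;
  // Returns nullptr if the queue is empty.
  virtual std::unique_ptr<RowBatch> GetBatch() = 0;
  virtual bool IsEmpty() const = 0;
};

// std::queue-backed implementation; a BufferedTupleStream-backed one could
// later be swapped in behind the same interface.
class DequeRowBatchQueue : public RowBatchQueue {
 public:
  bool AddBatch(std::unique_ptr<RowBatch> batch) override {
    queue_.push(std::move(batch));
    return true;
  }
  std::unique_ptr<RowBatch> GetBatch() override {
    if (queue_.empty()) return nullptr;
    std::unique_ptr<RowBatch> b = std::move(queue_.front());
    queue_.pop();
    return b;
  }
  bool IsEmpty() const override { return queue_.empty(); }

 private:
  std::queue<std::unique_ptr<RowBatch>> queue_;
};
{code}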



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (IMPALA-8780) Implementation of BufferedPlanRootSink where FlushFinal blocks until all rows are fetched

2019-07-22 Thread Sahil Takiar (JIRA)
Sahil Takiar created IMPALA-8780:


 Summary: Implementation of BufferedPlanRootSink where FlushFinal 
blocks until all rows are fetched
 Key: IMPALA-8780
 URL: https://issues.apache.org/jira/browse/IMPALA-8780
 Project: IMPALA
  Issue Type: Sub-task
  Components: Backend
Reporter: Sahil Takiar
Assignee: Sahil Takiar


Implement {{BufferedPlanRootSink}} so that {{FlushFinal}} blocks until all rows 
are fetched. The implementation should use the {{RowBatchQueue}} introduced by 
IMPALA-8779. By blocking in {{FlushFinal}} all non-coordinator fragments will 
be closed if all results fit in the {{RowBatchQueue}}.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (IMPALA-8781) Add stress tests in test_result_spooling.py and validate cancellation logic

2019-07-22 Thread Sahil Takiar (JIRA)
Sahil Takiar created IMPALA-8781:


 Summary: Add stress tests in test_result_spooling.py and validate 
cancellation logic
 Key: IMPALA-8781
 URL: https://issues.apache.org/jira/browse/IMPALA-8781
 Project: IMPALA
  Issue Type: Sub-task
  Components: Backend
Reporter: Sahil Takiar
Assignee: Sahil Takiar


{{test_result_spooling.py}} currently runs a few basic tests with result 
spooling enabled. We should add some more and validate the cancellation logic 
in {{PlanRootSink}}.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (IMPALA-8784) Implement a RowBatchQueue backed by a BufferedTupleStream

2019-07-23 Thread Sahil Takiar (JIRA)
Sahil Takiar created IMPALA-8784:


 Summary: Implement a RowBatchQueue backed by a BufferedTupleStream
 Key: IMPALA-8784
 URL: https://issues.apache.org/jira/browse/IMPALA-8784
 Project: IMPALA
  Issue Type: Sub-task
  Components: Backend
Reporter: Sahil Takiar
Assignee: Sahil Takiar


The {{BufferedPlanRootSink}} should use a {{RowBatchQueue}} backed by a 
{{BufferedTupleStream}}. This requires the following changes:
 * Creating a new {{SpillableRowBatchQueue}} that implements {{RowBatchQueue}} 
and internally uses a {{BufferedTupleStream}}
 * Changing the implementation of {{RowBatchQueue}} used by 
{{BufferedPlanRootSink}} to {{SpillableRowBatchQueue}}
 * Update {{PlanRootSink.java}} so that it sets a {{ResourceProfile}} that 
should be used by the {{BufferedPlanRootSink}}
 * Update {{DataSinks.thrift}} so that it passes {{ResourceProfile}}-s from the 
fe/ to the be/
 * {{BufferedPlanRootSink}} should initialize and close a 
{{ReservationManager}} to be used by the {{BufferedTupleStream}}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (IMPALA-8786) BufferedPlanRootSink should directly write to a QueryResultSet if one is available

2019-07-23 Thread Sahil Takiar (JIRA)
Sahil Takiar created IMPALA-8786:


 Summary: BufferedPlanRootSink should directly write to a 
QueryResultSet if one is available
 Key: IMPALA-8786
 URL: https://issues.apache.org/jira/browse/IMPALA-8786
 Project: IMPALA
  Issue Type: Sub-task
  Components: Backend
Reporter: Sahil Takiar
Assignee: Sahil Takiar


{{BufferedPlanRootSink}} uses a {{RowBatchQueue}} to buffer {{RowBatch}}-es and 
then the consumer thread reads them and writes them to a given 
{{QueryResultSet}}. Implementations of {{RowBatchQueue}} might end up copying 
the buffered {{RowBatch}}-es (e.g. if the queue is backed by a 
{{BufferedTupleStream}}). An optimization would be for the producer thread to 
directly write to the consumer {{QueryResultSet}}. This optimization would only 
be triggered if (1) the queue is empty, and (2) the consumer thread has a 
{{QueryResultSet}} available for writing.

This "fast path" is useful in a few different scenarios:
 * If the consumer is faster at reading rows than the producer is at 
sending them; in this case, the overhead of buffering rows in a 
{{RowBatchQueue}} can be completely avoided
 * For queries that return under 1024 rows, it's likely that the consumer will produce 
a {{QueryResultSet}} before the first {{RowBatch}} is returned (except perhaps 
for very trivial queries)
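
A self-contained sketch of the proposed fast path follows; every name here is 
illustrative, and the real sink would also need condition variables and proper 
shutdown handling:

{code}
// Sketch: the producer writes straight into a waiting result set when the
// queue is empty, and only buffers otherwise.
#include <mutex>
#include <queue>
#include <string>
#include <utility>
#include <vector>

struct RowBatch { std::vector<std::string> rows; };
struct QueryResultSet { std::vector<std::string> rows; };

class ProducerSink {
 public:
  // Called by the producer (fragment) thread for each batch.
  void Send(RowBatch batch) {
    std::lock_guard<std::mutex> l(lock_);
    if (queue_.empty() && pending_result_set_ != nullptr) {
      // Fast path: skip the queue (and any copy into a buffered stream).
      for (const auto& r : batch.rows) pending_result_set_->rows.push_back(r);
      pending_result_set_ = nullptr;  // hand the filled set to the consumer
    } else {
      queue_.push(std::move(batch));  // slow path: buffer for a later fetch
    }
  }

  // Called by the consumer (client fetch) thread to offer a result set.
  void OfferResultSet(QueryResultSet* rs) {
    std::lock_guard<std::mutex> l(lock_);
    pending_result_set_ = rs;
  }

 private:
  std::mutex lock_;
  std::queue<RowBatch> queue_;
  QueryResultSet* pending_result_set_ = nullptr;
};
{code}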



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (IMPALA-8803) Coordinator should release admitted memory per-backend rather than per-query

2019-07-29 Thread Sahil Takiar (JIRA)
Sahil Takiar created IMPALA-8803:


 Summary: Coordinator should release admitted memory per-backend 
rather than per-query
 Key: IMPALA-8803
 URL: https://issues.apache.org/jira/browse/IMPALA-8803
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Sahil Takiar
Assignee: Sahil Takiar


When {{SPOOL_QUERY_RESULTS}} is true, the coordinator backend may be long 
lived, even though all other backends for the query have completed. Currently, 
the Coordinator only releases admitted memory when the entire query has 
completed (including the coordinator fragment) - 
https://github.com/apache/impala/blob/72c9370856d7436885adbee3e8da7e7d9336df15/be/src/runtime/coordinator.cc#L562

In order to more aggressively return admitted memory, the coordinator should 
release memory when each backend for a query completes, rather than waiting for 
the entire query to complete.

Releasing memory per backend should be batched because releasing admitted 
memory in the admission controller requires obtaining a global lock and 
refreshing the internal stats of the admission controller. Batching will help 
mitigate any additional overhead from releasing admitted memory per backend.
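
A rough sketch of the batching idea, with illustrative names and a made-up 
batch size (the real accounting is more involved):

{code}
// Sketch: completed backends accumulate, and admitted memory is released
// through one admission-controller call per batch, so the global lock is
// taken once per batch rather than once per backend.
#include <cstddef>
#include <mutex>
#include <vector>

class AdmissionController {
 public:
  void ReleaseQueryBackends(const std::vector<int>& backend_ids) {
    std::lock_guard<std::mutex> l(global_lock_);  // expensive: global stats
    // ... subtract each backend's admitted memory from the pool usage ...
  }

 private:
  std::mutex global_lock_;
};

class Coordinator {
 public:
  explicit Coordinator(AdmissionController* ac) : ac_(ac) {}

  // Called as each backend reports completion.
  void BackendCompleted(int backend_id) {
    pending_.push_back(backend_id);
    if (pending_.size() >= kBatchSize) {  // amortize the global lock
      ac_->ReleaseQueryBackends(pending_);
      pending_.clear();
    }
  }

 private:
  static const std::size_t kBatchSize = 4;  // illustrative batch size
  AdmissionController* ac_;
  std::vector<int> pending_;
};
{code}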



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (IMPALA-8818) Replace deque queue with spillable queue in BufferedPlanRootSink

2019-07-31 Thread Sahil Takiar (JIRA)
Sahil Takiar created IMPALA-8818:


 Summary: Replace deque queue with spillable queue in 
BufferedPlanRootSink
 Key: IMPALA-8818
 URL: https://issues.apache.org/jira/browse/IMPALA-8818
 Project: IMPALA
  Issue Type: Sub-task
  Components: Backend
Reporter: Sahil Takiar
Assignee: Sahil Takiar


Add a {{SpillableRowBatchQueue}} to replace the {{DequeRowBatchQueue}} in 
{{BufferedPlanRootSink}}. The {{SpillableRowBatchQueue}} will wrap a 
{{BufferedTupleStream}} and take in a {{TBackendResourceProfile}} created by 
{{PlanRootSink#computeResourceProfile}}.

*BufferedTupleStream Usage*:

The wrapped {{BufferedTupleStream}} should be created in 'attach_on_read' mode 
so that pages are attached to the output {{RowBatch}} in 
{{BufferedTupleStream::GetNext}}. The BTS should start off as pinned (e.g. all 
pages are pinned). If a call to {{BufferedTupleStream::AddRow}} returns false 
(it returns false if "the unused reservation was not sufficient to add a new 
page to the stream large enough to fit 'row' and the stream could not increase 
the reservation to get enough unused reservation"), it should unpin the stream 
({{BufferedTupleStream::UnpinStream}}) and then add the row (if the row still 
could not be added, then an error must have occurred, perhaps an IO error, in 
which case return the error and fail the query).
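
The add-with-fallback logic described in the paragraph above, as a 
self-contained sketch; {{MockStream}} stands in for {{BufferedTupleStream}}, 
whose real signatures may differ:

{code}
// Sketch of the fallback: try to add a row while pinned, unpin (allowing
// pages to spill) on reservation failure, then retry once.
#include <string>

struct MockStream {
  bool pinned = true;
  bool AddRow(const std::string& row) {
    // Pretend the pinned reservation is exhausted, so the row only fits
    // once the stream has been unpinned.
    return !pinned;
  }
  void UnpinStream() { pinned = false; }  // pages may now spill to disk
};

// Returns true if the row was buffered; false means a genuine error
// (e.g. an IO error), in which case the query should fail.
bool AddRowWithSpillFallback(MockStream* stream, const std::string& row) {
  if (stream->AddRow(row)) return true;  // fit within pinned reservation
  stream->UnpinStream();                 // out of reservation: start spilling
  return stream->AddRow(row);
}
{code}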

*Constraining Resources*:

When result spooling is disabled, a user can run a {{select * from 
[massive-fact-table]}} and scroll through the results without affecting the 
health of the Impala cluster (assuming they close the query promptly). Impala 
will stream the results one batch at a time to the user.

With result spooling, a naive implementation might try and buffer the entire 
fact table, and end up spilling all the contents to disk, which can potentially 
take up a large amount of space. So there needs to be restrictions on the 
memory and disk space used by the {{BufferedTupleStream}} in order to ensure a 
scan of a massive table does not consume all the memory or disk space of the 
Impala coordinator.

This problem can be solved by placing a max size on the amount of unpinned 
memory (perhaps through a new config option 
{{MAX_PINNED_RESULT_SPOOLING_MEMORY}}, maybe set to a few GBs by default). The 
max amount of pinned memory should already be constrained by the reservation 
(see next paragraph). NUM_ROWS_PRODUCED_LIMIT already limits the number of rows 
returned by a query, and so it should limit the number of rows buffered by the 
BTS as well (although it is set to 0 by default). SCRATCH_LIMIT already limits 
the amount of disk space used for spilling (although it is set to -1 by 
default).

The {{PlanRootSink}} should attempt to accurately estimate how much memory it 
needs to buffer all results in memory. This requires setting an accurate value 
of {{ResourceProfile#memEstimateBytes_}} in 
{{PlanRootSink#computeResourceProfile}}. If statistics are available, the 
estimate can be based on the number of estimated rows returned multiplied by 
the size of the rows returned. The min reservation should account for a read 
and write page for the {{BufferedTupleStream}}.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (IMPALA-8819) BufferedPlanRootSink should handle non-default fetch sizes

2019-07-31 Thread Sahil Takiar (JIRA)
Sahil Takiar created IMPALA-8819:


 Summary: BufferedPlanRootSink should handle non-default fetch sizes
 Key: IMPALA-8819
 URL: https://issues.apache.org/jira/browse/IMPALA-8819
 Project: IMPALA
  Issue Type: Sub-task
  Components: Backend
Reporter: Sahil Takiar
Assignee: Sahil Takiar


As of IMPALA-8780, the {{BufferedPlanRootSink}} returns an error whenever a 
client sets the fetch size to a value lower than the {{BATCH_SIZE}}. The issue 
is that when reading a {{RowBatch}} from the queue, the batch might 
contain more rows than the number requested by the client. So the 
{{BufferedPlanRootSink}} needs to be able to partially read a {{RowBatch}} and 
remember the index of the rows it read. Furthermore, {{num_results}} in 
{{BufferedPlanRootSink::GetNext}} could be lower than {{BATCH_SIZE}} if the 
query results cache in {{ClientRequestState}} has a cache hit (only happens if 
the client cursor is reset).
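
One way to picture the fix, as a self-contained sketch with illustrative names 
(Impala's actual {{RowBatch}} and {{QueryResultSet}} types differ):

{code}
// Sketch: remember how far into the current batch the client has read, so
// a fetch smaller than BATCH_SIZE resumes mid-batch on the next call.
#include <algorithm>
#include <vector>

struct Cursor {
  std::vector<int> batch;  // stand-in for the dequeued RowBatch
  size_t next_row = 0;     // index of the first row not yet returned
};

// Copies up to 'num_results' rows into 'out'; returns the number written.
size_t GetNext(Cursor* c, size_t num_results, std::vector<int>* out) {
  size_t n = std::min(num_results, c->batch.size() - c->next_row);
  out->insert(out->end(), c->batch.begin() + c->next_row,
              c->batch.begin() + c->next_row + n);
  c->next_row += n;  // the batch is discarded only once next_row == size()
  return n;
}
{code}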

Another issue is that the {{BufferedPlanRootSink}} can only read up to a single 
{{RowBatch}} at a time. So if a fetch size larger than {{BATCH_SIZE}} is 
specified, only {{BATCH_SIZE}} rows will be written to the given 
{{QueryResultSet}}. This is consistent with the legacy behavior of 
{{PlanRootSink}} (now {{BlockingPlanRootSink}}), but is not ideal because that 
means clients can only read {{BATCH_SIZE}} rows at a time. A higher fetch size 
would potentially reduce the number of round-trips necessary between the client 
and the coordinator, which could improve fetch performance (but only if the 
{{BlockingPlanRootSink}} is capable of filling all the requested rows).



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Resolved] (IMPALA-8784) Implement a RowBatchQueue backed by a BufferedTupleStream

2019-07-31 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-8784.
--
Resolution: Duplicate

> Implement a RowBatchQueue backed by a BufferedTupleStream
> -
>
> Key: IMPALA-8784
> URL: https://issues.apache.org/jira/browse/IMPALA-8784
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> The {{BufferedPlanRootSink}} should use a {{RowBatchQueue}} backed by a 
> {{BufferedTupleStream}}. This requires the following changes:
>  * Creating a new {{SpillableRowBatchQueue}} that implements 
> {{RowBatchQueue}} and internally uses a {{BufferedTupleStream}}
>  * Changing the implementation of {{RowBatchQueue}} used by 
> {{BufferedPlanRootSink}} to {{SpillableRowBatchQueue}}
>  * Update {{PlanRootSink.java}} so that it sets a {{ResourceProfile}} that 
> should be used by the {{BufferedPlanRootSink}}
>  * Update {{DataSinks.thrift}} so that it passes {{ResourceProfile}}-s from 
> the fe/ to the be/
>  * {{BufferedPlanRootSink}} should initialize and close a 
> {{ReservationManager}} to be used by the {{BufferedTupleStream}}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Resolved] (IMPALA-8780) Implementation of BufferedPlanRootSink where FlushFinal blocks until all rows are fetched

2019-08-01 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-8780.
--
   Resolution: Fixed
Fix Version/s: Impala 3.3.0

Closing as fixed. We ended up deferring the complete re-factoring of 
IMPALA-8779 to a later patch.

> Implementation of BufferedPlanRootSink where FlushFinal blocks until all rows 
> are fetched
> -
>
> Key: IMPALA-8780
> URL: https://issues.apache.org/jira/browse/IMPALA-8780
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 3.3.0
>
>
> Implement {{BufferedPlanRootSink}} so that {{FlushFinal}} blocks until all 
> rows are fetched. The implementation should use the {{RowBatchQueue}} 
> introduced by IMPALA-8779. By blocking in {{FlushFinal}} all non-coordinator 
> fragments will be closed if all results fit in the {{RowBatchQueue}}. 
> {{BufferedPlanRootSink::Send}} should enqueue each given {{RowBatch}} onto 
> the queue and then return. If the queue is full, it should block until there 
> is more space left in the queue. {{BufferedPlanRootSink::GetNext}} reads from 
> the queue and then fills in the given {{QueryResultSet}} by using the 
> {{DataSink}} {{ScalarExprEvaluator}}-s. Since the producer thread can call 
> {{BufferedPlanRootSink::Close}} while the consumer is calling 
> {{BufferedPlanRootSink::GetNext}} the two methods need to be synchronized so 
> that the {{DataSink}} {{MemTracker}}-s are not closed while {{GetNext}} is 
> running.
> The implementation of {{BufferedPlanRootSink}} should remain the same 
> regardless of whether a {{std::queue}} backed {{RowBatchQueue}} or a 
> {{BufferedTupleStream}} backed {{RowBatchQueue}} is used.
> {{BufferedPlanRootSink}} and {{BlockingPlanRootSink}} are similar in the 
> sense that {{BlockingPlanRootSink}} buffers one {{RowBatch}}, so for queries 
> that return under 1024 rows, all non-coordinator fragments are closed 
> immediately as well. The advantage of {{BufferedPlanRootSink}} is that it 
> allows buffering of 1+ {{RowBatch}}-es.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (IMPALA-8825) Add additional counters to PlanRootSink

2019-08-02 Thread Sahil Takiar (JIRA)
Sahil Takiar created IMPALA-8825:


 Summary: Add additional counters to PlanRootSink
 Key: IMPALA-8825
 URL: https://issues.apache.org/jira/browse/IMPALA-8825
 Project: IMPALA
  Issue Type: Sub-task
  Components: Backend
Reporter: Sahil Takiar
Assignee: Sahil Takiar


The current entry in the runtime profile for {{PLAN_ROOT_SINK}} does not 
contain much useful information:
{code:java}
PLAN_ROOT_SINK:(Total: 234.996ms, non-child: 234.996ms, % non-child: 100.00%)
- PeakMemoryUsage: 0{code}
There are several additional counters we could add to the {{PlanRootSink}} 
(either the {{BufferedPlanRootSink}} or {{BlockingPlanRootSink}}):
 * Amount of time spent blocking inside the {{PlanRootSink}} - both the time 
spent by the client thread waiting for rows to become available and the time 
spent by the impala thread waiting for the client to consume rows

 ** So similar to the {{RowBatchQueueGetWaitTime}} and 
{{RowBatchQueuePutWaitTime}} inside the scan nodes
 ** The difference between these counters and the ones in 
{{ClientRequestState}} (e.g. {{ClientFetchWaitTimer}} and 
{{RowMaterializationTimer}}) should be documented
 * For {{BufferedPlanRootSink}} there are already several {{Buffer pool}} 
counters, we should make sure they are exposed in the {{PLAN_ROOT_SINK}} section
 * Track the number of rows sent (e.g. rows sent to {{PlanRootSink::Send}}) and 
the number of rows fetched (might need to be tracked in the 
{{ClientRequestState}})
 ** For {{BlockingPlanRootSink}} the sent and fetched values should be pretty 
much the same, but for {{BufferedPlanRootSink}} this is more useful
 ** Similar to {{RowsReturned}} in each exec node
 * The rate at which rows are sent and fetched
 ** Should be useful when attempting to debug perf of the fetching rows (e.g. 
if the send rate is much higher than the fetch rate, then maybe there is 
something wrong with the client)
 ** Similar to {{RowsReturnedRate}} in each exec node

Open to other suggestions for counters that folks think are useful.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (IMPALA-8826) Impala Doc: Add docs for PLAN_ROOT_SINK

2019-08-02 Thread Sahil Takiar (JIRA)
Sahil Takiar created IMPALA-8826:


 Summary: Impala Doc: Add docs for PLAN_ROOT_SINK
 Key: IMPALA-8826
 URL: https://issues.apache.org/jira/browse/IMPALA-8826
 Project: IMPALA
  Issue Type: Sub-task
  Components: Docs
Reporter: Sahil Takiar


Currently, I don't see many docs explaining what a {{PLAN_ROOT_SINK}} is, even 
though it shows up in explain plans and runtime profiles. After more of the 
changes in IMPALA-8656 are merged, understanding what {{PLAN_ROOT_SINK}} is 
will be more important, because it will start taking up a memory reservation 
and possibly spilling to disk.

I don't see any docs on data sinks in general, so perhaps it would be useful to 
create a dedicated page for explaining data sinks and how they work. We can 
start by documenting the {{PLAN_ROOT_SINK}} as that may be the most commonly 
used one.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Resolved] (IMPALA-8781) Add additional tests in test_result_spooling.py and validate cancellation logic

2019-08-06 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-8781.
--
   Resolution: Fixed
Fix Version/s: Impala 3.3.0

Commit Hash: bbec8fa74961755269298706302477780019e7d5

IMPALA-8781: Result spooling tests to cover edge cases and cancellation

Adds additional tests to test_result_spooling.py to cover various edge
cases when fetching query results (ensure all Impala types are returned
properly, UDFs are evaluated correctly, etc.). A new QueryTest file
result-spooling.test is added to encapsulate all these tests. Tests with
a decreased ROW_BATCH_SIZE are added as well to validate that
BufferedPlanRootSink buffers row batches correctly.

BufferedPlanRootSink requires careful synchronization of the producer
and consumer threads, especially when queries are cancelled. The
TestResultSpoolingCancellation class is dedicated to running
cancellation tests with SPOOL_QUERY_RESULTS = true. The implementation
is heavily borrowed from test_cancellation.py and some of the logic is
re-factored into a new utility class called cancel_utils.py to avoid
code duplication between test_cancellation.py and
test_result_spooling.py.

Testing:
* Looped test_result_spooling.py overnight with no failures
* Core tests passed

Change-Id: Ib3b3a1539c4a5fa9b43c8ca315cea16c9701e283
Reviewed-on: http://gerrit.cloudera.org:8080/13907
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 

> Add additional tests in test_result_spooling.py and validate cancellation 
> logic
> ---
>
> Key: IMPALA-8781
> URL: https://issues.apache.org/jira/browse/IMPALA-8781
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 3.3.0
>
>
> {{test_result_spooling.py}} currently runs a few basic tests with result 
> spooling enabled. We should add some more to cover all necessary edge cases 
> (ensure all Impala types are returned correctly, UDFs are evaluated 
> correctly, etc.) and add tests to validate the cancellation logic in 
> {{PlanRootSink}}.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (IMPALA-8845) Close ExecNode tree prior to calling FlushFinal in FragmentInstanceState

2019-08-07 Thread Sahil Takiar (JIRA)
Sahil Takiar created IMPALA-8845:


 Summary: Close ExecNode tree prior to calling FlushFinal in 
FragmentInstanceState
 Key: IMPALA-8845
 URL: https://issues.apache.org/jira/browse/IMPALA-8845
 Project: IMPALA
  Issue Type: Sub-task
  Components: Backend
Reporter: Sahil Takiar
Assignee: Sahil Takiar


While testing IMPALA-8818, I found that IMPALA-8780 does not always cause all 
non-coordinator fragments to shutdown. In certain setups, TopN queries 
({{select * from [table] order by [col] limit [limit]}}) where all results are 
successfully spooled, still keep non-coordinator fragments alive.

The issue is that sometimes the {{DATASTREAM SINK}} for the TopN <-- Scan Node 
fragment ends up blocking waiting for a response to a {{TransmitData()}} RPC. 
This prevents the fragment from shutting down.

I haven't traced the issue exactly, but what I *think* is happening is that the 
{{MERGING-EXCHANGE}} operator in the coordinator fragment hits {{eos}} whenever 
it has received enough rows to reach the limit defined in the query, which 
could occur before the {{DATASTREAM SINK}} sends all the rows from the TopN / 
Scan Node fragment.

So the TopN / Scan Node fragments end up hanging until they are explicitly 
closed.

The fix is to close the {{ExecNode}} tree in {{FragmentInstanceState}} as 
eagerly as possible. Moving the close call to before the call to 
{{DataSink::FlushFinal}} fixes the issue. It has the added benefit that it 
shuts down and releases all {{ExecNode}} resources as soon as it can. When 
result spooling is enabled, this is particularly important because 
{{FlushFinal}} might block until the consumer reads all rows.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (IMPALA-8871) Upgrade Thrift version in fe

2019-08-16 Thread Sahil Takiar (JIRA)
Sahil Takiar created IMPALA-8871:


 Summary: Upgrade Thrift version in fe
 Key: IMPALA-8871
 URL: https://issues.apache.org/jira/browse/IMPALA-8871
 Project: IMPALA
  Issue Type: Improvement
  Components: Frontend
Reporter: Sahil Takiar
Assignee: Sahil Takiar


We should upgrade the Thrift version in the frontend to 0.9.3-1. Since this is 
just a fe/ change, it does not require upgrading the toolchain.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (IMPALA-8888) Profile fetch performance when result spooling is enabled

2019-08-23 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-:


 Summary: Profile fetch performance when result spooling is enabled
 Key: IMPALA-
 URL: https://issues.apache.org/jira/browse/IMPALA-
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Sahil Takiar
Assignee: Sahil Takiar


Profile the performance of fetching rows when result spooling is enabled. There 
are a few queries that can be used to benchmark the performance:

{{time ./bin/impala-shell.sh -B -q "select l_orderkey from 
tpch_parquet.lineitem" > /dev/null}}

{{time ./bin/impala-shell.sh -B -q "select * from tpch_parquet.orders" > 
/dev/null}}

The first fetches one column and 6,001,215 rows; the second fetches 9 columns 
and 1,500,000 rows - so a mix of rows fetched vs. columns fetched.

The base line for the benchmark should be the commit prior to IMPALA-8780.

The benchmark should check for both latency and CPU usage (to see if the copy 
into {{BufferedTupleStream}} has a significant overhead).

Various fetch sizes should be used in the benchmark as well, to see if 
increasing the fetch size for result spooling improves performance (ideally it 
should). It would also be nice to run some fetches between machines, as that 
will better reflect network round-trip latencies.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (IMPALA-8890) DCHECK(!page->attached_to_output_batch) in SpillableRowBatchQueue::AddBatch

2019-08-23 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-8890:


 Summary: DCHECK(!page->attached_to_output_batch) in 
SpillableRowBatchQueue::AddBatch 
 Key: IMPALA-8890
 URL: https://issues.apache.org/jira/browse/IMPALA-8890
 Project: IMPALA
  Issue Type: Sub-task
  Components: Backend
Affects Versions: Impala 3.4.0
Reporter: Sahil Takiar
Assignee: Sahil Takiar
 Attachments: impalad.INFO, resolved.txt

Full stack:

{code}
F0823 13:19:42.262142 60340 buffered-tuple-stream.cc:291] 
6a4941285b46788d:68021ec6] Check failed: !page->attached_to_output_batch
*** Check failure stack trace: ***
@  0x4c987cc  google::LogMessage::Fail()
@  0x4c9a071  google::LogMessage::SendToLog()
@  0x4c981a6  google::LogMessage::Flush()
@  0x4c9b76d  google::LogMessageFatal::~LogMessageFatal()
@  0x2917f78  impala::BufferedTupleStream::ExpectedPinCount()
@  0x29181ec  impala::BufferedTupleStream::UnpinPageIfNeeded()
@  0x291b27b  impala::BufferedTupleStream::UnpinStream()
@  0x297d429  impala::SpillableRowBatchQueue::AddBatch()
@  0x25d5537  impala::BufferedPlanRootSink::Send()
@  0x207e94c  impala::FragmentInstanceState::ExecInternal()
@  0x207afac  impala::FragmentInstanceState::Exec()
@  0x208e854  impala::QueryState::ExecFInstance()
@  0x208cb21  _ZZN6impala10QueryState15StartFInstancesEvENKUlvE_clEv
@  0x2090536  
_ZN5boost6detail8function26void_function_obj_invoker0IZN6impala10QueryState15StartFInstancesEvEUlvE_vE6invokeERNS1_15function_bufferE
@  0x1e9830b  boost::function0<>::operator()()
@  0x23e2d38  impala::Thread::SuperviseThread()
@  0x23eb0bc  boost::_bi::list5<>::operator()<>()
@  0x23eafe0  boost::_bi::bind_t<>::operator()()
@  0x23eafa3  boost::detail::thread_data<>::run()
@  0x3bc1629  thread_proxy
@ 0x7f920a3786b9  start_thread
@ 0x7f9206b5741c  clone
{code}

Happened once while I was running a full table scan of {{tpch_parquet.orders}} 
via JDBC (client was running on another EC2 machine). This was running on top 
of IMPALA-8819 with a fetch size of 32768.

Attached full logs and mini-dump stack.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Closed] (IMPALA-8818) Replace deque queue with spillable queue in BufferedPlanRootSink

2019-08-23 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar closed IMPALA-8818.

Fix Version/s: Impala 3.4.0
   Resolution: Fixed

> Replace deque queue with spillable queue in BufferedPlanRootSink
> 
>
> Key: IMPALA-8818
> URL: https://issues.apache.org/jira/browse/IMPALA-8818
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 3.4.0
>
>
> Add a {{SpillableRowBatchQueue}} to replace the {{DequeRowBatchQueue}} in 
> {{BufferedPlanRootSink}}. The {{SpillableRowBatchQueue}} will wrap a 
> {{BufferedTupleStream}} and take in a {{TBackendResourceProfile}} created by 
> {{PlanRootSink#computeResourceProfile}}.
> *BufferedTupleStream Usage*:
> The wrapped {{BufferedTupleStream}} should be created in 'attach_on_read' 
> mode so that pages are attached to the output {{RowBatch}} in 
> {{BufferedTupleStream::GetNext}}. The BTS should start off as pinned (e.g. 
> all pages are pinned). If a call to {{BufferedTupleStream::AddRow}} returns 
> false (it returns false if "the unused reservation was not sufficient to add 
> a new page to the stream large enough to fit 'row' and the stream could not 
> increase the reservation to get enough unused reservation"), it should unpin 
> the stream ({{BufferedTupleStream::UnpinStream}}) and then add the row (if 
> the row still could not be added, then an error must have occurred, perhaps 
> an IO error, in which case return the error and fail the query).
> *Constraining Resources*:
> When result spooling is disabled, a user can run a {{select * from 
> [massive-fact-table]}} and scroll through the results without affecting the 
> health of the Impala cluster (assuming they close the query promptly). 
> Impala will stream the results one batch at a time to the user.
> With result spooling, a naive implementation might try and buffer the entire 
> fact table, and end up spilling all the contents to disk, which can 
> potentially take up a large amount of space. So there needs to be 
> restrictions on the memory and disk space used by the {{BufferedTupleStream}} 
> in order to ensure a scan of a massive table does not consume all the memory 
> or disk space of the Impala coordinator.
> This problem can be solved by placing a max size on the amount of unpinned 
> memory (perhaps through a new config option 
> {{MAX_PINNED_RESULT_SPOOLING_MEMORY}}, maybe set to a few GBs by default). 
> The max amount of pinned memory should already be constrained by the 
> reservation (see next paragraph). NUM_ROWS_PRODUCED_LIMIT already limits the 
> number of rows returned by a query, and so it should limit the number of rows 
> buffered by the BTS as well (although it is set to 0 by default). 
> SCRATCH_LIMIT already limits the amount of disk space used for spilling 
> (although it is set to -1 by default).
> The {{PlanRootSink}} should attempt to accurately estimate how much memory it 
> needs to buffer all results in memory. This requires setting an accurate 
> value of {{ResourceProfile#memEstimateBytes_}} in 
> {{PlanRootSink#computeResourceProfile}}. If statistics are available, the 
> estimate can be based on the number of estimated rows returned multiplied by 
> the size of the rows returned. The min reservation should account for a read 
> and write page for the {{BufferedTupleStream}}.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Resolved] (IMPALA-8845) Close ExecNode tree prior to calling FlushFinal in FragmentInstanceState

2019-08-27 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-8845.
--
Fix Version/s: Impala 3.4.0
   Resolution: Fixed

> Close ExecNode tree prior to calling FlushFinal in FragmentInstanceState
> 
>
> Key: IMPALA-8845
> URL: https://issues.apache.org/jira/browse/IMPALA-8845
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Michael Ho
>Priority: Major
> Fix For: Impala 3.4.0
>
>
> While testing IMPALA-8818, I found that IMPALA-8780 does not always cause all 
> non-coordinator fragments to shutdown. In certain setups, TopN queries 
> ({{select * from [table] order by [col] limit [limit]}}) where all results 
> are successfully spooled, still keep non-coordinator fragments alive.
> The issue is that sometimes the {{DATASTREAM SINK}} for the TopN <-- Scan 
> Node fragment ends up blocking waiting for a response to a {{TransmitData()}} 
> RPC. This prevents the fragment from shutting down.
> I haven't traced the issue exactly, but what I *think* is happening is that 
> the {{MERGING-EXCHANGE}} operator in the coordinator fragment hits {{eos}} 
> whenever it has received enough rows to reach the limit defined in the query, 
> which could occur before the {{DATASTREAM SINK}} sends all the rows from the 
> TopN / Scan Node fragment.
> So the TopN / Scan Node fragments end up hanging until they are explicitly 
> closed.
> The fix is to close the {{ExecNode}} tree in {{FragmentInstanceState}} as 
> eagerly as possible. Moving the close call to before the call to 
> {{DataSink::FlushFinal}} fixes the issue. It has the added benefit that it 
> shuts down and releases all {{ExecNode}} resources as soon as it can. When 
> result spooling is enabled, this is particularly important because 
> {{FlushFinal}} might block until the consumer reads all rows.
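
A minimal sketch of the reordering described above, with stand-in types (illustration only; the real change is in {{FragmentInstanceState::ExecInternal}}):

{code}
#include <iostream>

// Stand-ins for the real Impala classes; illustration only.
struct ExecNode { void Close() { std::cout << "exec tree closed\n"; } };
struct DataSink { void FlushFinal() { std::cout << "final flush done\n"; } };

// The fix: close the ExecNode tree *before* FlushFinal, so exec resources
// are released even when FlushFinal blocks on a slow consumer, as it can
// with result spooling enabled.
void ExecInternal(ExecNode& exec_tree, DataSink& sink) {
  // ... all row batches have already been sent to the sink ...
  exec_tree.Close();  // moved up: previously ran after FlushFinal()
  sink.FlushFinal();  // may block until the client fetches all rows
}

int main() {
  ExecNode tree;
  DataSink sink;
  ExecInternal(tree, sink);
  return 0;
}
{code}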



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Resolved] (IMPALA-8890) DCHECK(!page->attached_to_output_batch) in SpillableRowBatchQueue::AddBatch

2019-08-27 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-8890.
--
Fix Version/s: Impala 3.4.0
   Resolution: Fixed

> DCHECK(!page->attached_to_output_batch) in SpillableRowBatchQueue::AddBatch 
> 
>
> Key: IMPALA-8890
> URL: https://issues.apache.org/jira/browse/IMPALA-8890
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Affects Versions: Impala 3.4.0
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Blocker
> Fix For: Impala 3.4.0
>
> Attachments: impalad.INFO, resolved.txt
>
>
> Full stack:
> {code}
> F0823 13:19:42.262142 60340 buffered-tuple-stream.cc:291] 
> 6a4941285b46788d:68021ec6] Check failed: 
> !page->attached_to_output_batch
> *** Check failure stack trace: ***
> @  0x4c987cc  google::LogMessage::Fail()
> @  0x4c9a071  google::LogMessage::SendToLog()
> @  0x4c981a6  google::LogMessage::Flush()
> @  0x4c9b76d  google::LogMessageFatal::~LogMessageFatal()
> @  0x2917f78  impala::BufferedTupleStream::ExpectedPinCount()
> @  0x29181ec  impala::BufferedTupleStream::UnpinPageIfNeeded()
> @  0x291b27b  impala::BufferedTupleStream::UnpinStream()
> @  0x297d429  impala::SpillableRowBatchQueue::AddBatch()
> @  0x25d5537  impala::BufferedPlanRootSink::Send()
> @  0x207e94c  impala::FragmentInstanceState::ExecInternal()
> @  0x207afac  impala::FragmentInstanceState::Exec()
> @  0x208e854  impala::QueryState::ExecFInstance()
> @  0x208cb21  
> _ZZN6impala10QueryState15StartFInstancesEvENKUlvE_clEv
> @  0x2090536  
> _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala10QueryState15StartFInstancesEvEUlvE_vE6invokeERNS1_15function_bufferE
> @  0x1e9830b  boost::function0<>::operator()()
> @  0x23e2d38  impala::Thread::SuperviseThread()
> @  0x23eb0bc  boost::_bi::list5<>::operator()<>()
> @  0x23eafe0  boost::_bi::bind_t<>::operator()()
> @  0x23eafa3  boost::detail::thread_data<>::run()
> @  0x3bc1629  thread_proxy
> @ 0x7f920a3786b9  start_thread
> @ 0x7f9206b5741c  clone
> {code}
> Happened once while I was running a full table scan of 
> {{tpch_parquet.orders}} via JDBC (client was running on another EC2 machine). 
> This was running on top of IMPALA-8819 with a fetch size of 32768.
> Attached full logs and mini-dump stack.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (IMPALA-8906) TestObservability.test_query_profile_contains_query_compilation_metadata_load_events is flaky

2019-08-28 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-8906:


 Summary: 
TestObservability.test_query_profile_contains_query_compilation_metadata_load_events
 is flaky
 Key: IMPALA-8906
 URL: https://issues.apache.org/jira/browse/IMPALA-8906
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Reporter: Sahil Takiar
Assignee: Tamas Mate


This test failed in a recent run of ubuntu-16.04-dockerised-tests: 
[https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/1100/testReport/junit/query_test.test_observability/TestObservability/test_query_profile_contains_query_compilation_metadata_load_events/]

Error Message:
{code:java}
query_test/test_observability.py:340: in test_query_profile_contains_query_compilation_metadata_load_events
    self.__verify_profile_event_sequence(load_event_regexes, runtime_profile)
query_test/test_observability.py:432: in __verify_profile_event_sequence
    assert event_regex_index == 0, \
E   AssertionError: CatalogFetch.PartitionLists.Misses not in
E   - CatalogFetch.PartitionLists.Hits: 1
E   Query (id=56480a470616cf3c:7cfadfbe):
E     DEBUG MODE WARNING: Query profile created while running a DEBUG build of Impala. Use RELEASE builds to measure query performance.
E     Summary:
E       Session ID: 854d1d6ab3cb65b7:9ba11e621c088385
E       Session Type: BEESWAX
E       Start Time: 2019-08-28 20:01:05.725329000
E       End Time: 2019-08-28 20:01:07.305869000
E       Query Type: QUERY
E       Query State: FINISHED
E       Query Status: OK
E       Impala Version: impalad version 3.4.0-SNAPSHOT DEBUG (build 207b1443ff1b116d2d031dc5325ce971af80c4a6)
E       User: ubuntu
E       Connected User: ubuntu
E       Delegated User:
E       Network Address: 172.18.0.1:44044
E       Default Db: default
E       Sql Statement: select * from functional.alltypes
E       Coordinator: f6d78aab23cf:22000
E       Query Options (set by configuration): DEBUG_ACTION=CRS_BEFORE_ADMISSION:SLEEP@1000,TIMEZONE=Zulu,CLIENT_IDENTIFIER=query_test/test_observability.py::TestObservability::()::test_exec_summary_in_runtime_profile
E       Query Options (set by configuration and planner): DEBUG_ACTION=CRS_BEFORE_ADMISSION:SLEEP@1000,MT_DOP=0,TIMEZONE=Zulu,CLIENT_IDENTIFIER=query_test/test_observability.py::TestObservability::()::test_exec_summary_in_runtime_profile
E       Plan:
E
E   Max Per-Host Resource Reservation: Memory=32.00KB Threads=3
E   Per-Host Resource Estimates: Memory=160MB
E   Codegen disabled by planner
E   Analyzed query: SELECT * FROM functional.alltypes
E
E   F01:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
E   |  Per-Host Resources: mem-estimate=490.49KB mem-reservation=0B thread-reservation=1
E   PLAN-ROOT SINK
E   |  mem-estimate=0B mem-reservation=0B thread-reservation=0
E   |
E   01:EXCHANGE [UNPARTITIONED]
E   |  mem-estimate=490.49KB mem-reservation=0B thread-reservation=0
E   |  tuple-ids=0 row-size=89B cardinality=7.30K
E   |  in pipelines: 00(GETNEXT)
E   |
E   F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
E   Per-Host Resources: mem-estimate=160.00MB mem-reservation=32.00KB thread-reservation=2
E   00:SCAN HDFS [functional.alltypes, RANDOM]
E      HDFS partitions=24/24 files=24 size=478.45KB
E      stored statistics:
E        table: rows=7.30K size=478.45KB
E        partitions: 24/24 rows=7.30K
E        columns: all
E      extrapolated-rows=disabled max-scan-range-rows=310
E      mem-estimate=160.00MB mem-reservation=32.00KB thread-reservation=1
E      tuple-ids=0 row-size=89B cardinality=7.30K
E      in pipelines: 00(GETNEXT)
E
E       Estimated Per-Host Mem: 168274422
E       Request Pool: default-pool
E       Per Host Min Memory Reservation: 6db176633e3a:22000(32.00 KB) bf5c6b4d70c3:22000(32.00 KB) f6d78aab23cf:22000(32.00 KB)
E       Per Host Number of Fragment Instances: 6db176633e3a:22000(1) bf5c6b4d70c3:22000(1) f6d78aab23cf:22000(2)
E       Admission result: Admitted immediately
E       Cluster Memory Admitted: 481.44 MB
E       Executor Group: default
E       ExecSummary:
E   Operator             #Hosts   Avg Time   Max Time  #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
E   ------------------------------------------------------------------------------------------------------
E   F01:ROOT                  1  323.998ms  323.998ms                             0              0
E   01:EXCHANGE               1    3.999ms    3.999ms  7.30K       7.30K  776.00 KB      490.49 KB  UNPARTITIONED
E   F00:EXCHANGE SENDER       3    7.999ms   19.999ms                       1.55 KB              0
E   00:SCAN HDFS              3   66.666ms  163.999ms

[jira] [Created] (IMPALA-8907) TestResultSpooling.test_slow_query is flaky

2019-08-28 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-8907:


 Summary: TestResultSpooling.test_slow_query is flaky
 Key: IMPALA-8907
 URL: https://issues.apache.org/jira/browse/IMPALA-8907
 Project: IMPALA
  Issue Type: Bug
Reporter: Sahil Takiar
Assignee: Sahil Takiar


Recently failed in an ubuntu-16.04-dockerised-tests job: 
[https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/1102/testReport/junit/query_test.test_result_spooling/TestResultSpooling/test_slow_query_protocol__beeswax___exec_optionbatch_size___0___num_nodes___0___disable_codegen_rows_threshold___0___disable_codegen___False___abort_on_error___1___exec_single_node_rows_threshold___0table_format__parquet_none_/]

Error Message:
{code:java}
query_test/test_result_spooling.py:172: in test_slow_query
    assert re.search(get_wait_time_regex, self.client.get_runtime_profile(handle)) \
E   assert None is not None
E    +  where None = <function search at 0x7f0da4115c08>('RowBatchGetWaitTime: [1-9]', 'Query (id=7f47e1d6a1a1c804:492214eb):\n  DEBUG MODE WARNING: Query profile created while running a DEBUG buil...  - OptimizationTime: 331.998ms\n   - PeakMemoryUsage: 1.09 MB (1144320)\n   - PrepareTime: 31.999ms\n')
E    +    where <function search at 0x7f0da4115c08> = re.search
E    +    and   'Query (id=7f47e1d6a1a1c804:492214eb):\n  DEBUG MODE WARNING: Query profile created while running a DEBUG buil...  - OptimizationTime: 331.998ms\n   - PeakMemoryUsage: 1.09 MB (1144320)\n   - PrepareTime: 31.999ms\n' = <bound method BeeswaxConnection.get_runtime_profile of <BeeswaxConnection object at 0x7f0d94afa7d0>>(<QueryHandle object at 0x7f0d94afffd0>)
E    +      where <bound method BeeswaxConnection.get_runtime_profile of <BeeswaxConnection object at 0x7f0d94afa7d0>> = <BeeswaxConnection object at 0x7f0d94afa7d0>.get_runtime_profile
E    +        where <BeeswaxConnection object at 0x7f0d94afa7d0> = <TestResultSpooling object at 0x7f0d94af3d50>.client
{code}
Stacktrace:
{code:java}
query_test/test_result_spooling.py:172: in test_slow_query
    assert re.search(get_wait_time_regex, self.client.get_runtime_profile(handle)) \
E   assert None is not None
E    +  where None = <function search at 0x7f0da4115c08>('RowBatchGetWaitTime: [1-9]', 'Query (id=7f47e1d6a1a1c804:492214eb):\n  DEBUG MODE WARNING: Query profile created while running a DEBUG buil...  - OptimizationTime: 331.998ms\n   - PeakMemoryUsage: 1.09 MB (1144320)\n   - PrepareTime: 31.999ms\n')
E    +    where <function search at 0x7f0da4115c08> = re.search
E    +    and   'Query (id=7f47e1d6a1a1c804:492214eb):\n  DEBUG MODE WARNING: Query profile created while running a DEBUG buil...  - OptimizationTime: 331.998ms\n   - PeakMemoryUsage: 1.09 MB (1144320)\n   - PrepareTime: 31.999ms\n' = <bound method BeeswaxConnection.get_runtime_profile of <BeeswaxConnection object at 0x7f0d94afa7d0>>(<QueryHandle object at 0x7f0d94afffd0>)
E    +      where <bound method BeeswaxConnection.get_runtime_profile of <BeeswaxConnection object at 0x7f0d94afa7d0>> = <BeeswaxConnection object at 0x7f0d94afa7d0>.get_runtime_profile
E    +        where <BeeswaxConnection object at 0x7f0d94afa7d0> = <TestResultSpooling object at 0x7f0d94af3d50>.client
{code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Resolved] (IMPALA-8902) TestResultSpooling.test_spilling is flaky

2019-08-29 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-8902.
--
Fix Version/s: Impala 3.4.0
   Resolution: Fixed

Bumped up the timeout in the test; the flakiness should be resolved.

> TestResultSpooling.test_spilling is flaky
> -
>
> Key: IMPALA-8902
> URL: https://issues.apache.org/jira/browse/IMPALA-8902
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 3.4.0
>Reporter: Attila Jeges
>Assignee: Sahil Takiar
>Priority: Critical
> Fix For: Impala 3.4.0
>
>
> Error: 
> {code:java}
> 17:45:10 FAIL 
> query_test/test_result_spooling.py::TestResultSpooling::()::test_spilling[protocol:
>  beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> parquet/none]
> 17:45:10 === FAILURES 
> ===
> 17:45:10  TestResultSpooling.test_spilling[protocol: beeswax | exec_option: 
> {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 
> 'disable_codegen': False, 'abort_on_error': 1, 
> 'exec_single_node_rows_threshold': 0} | table_format: parquet/none] 
> 17:45:10 [gw1] linux2 -- Python 2.7.5 
> /data/jenkins/workspace/impala-cdpd-master-core-asan/repos/Impala/bin/../infra/python/env/bin/python
> 17:45:10 query_test/test_result_spooling.py:104: in test_spilling
> 17:45:10 .format(query, timeout))
> 17:45:10 E   Timeout: Query select * from functional.alltypes order by id 
> limit 1500 did not spill spooled results within the timeout 10
> 17:45:10 - Captured stderr call 
> -
> 17:45:10 SET 
> client_identifier=query_test/test_result_spooling.py::TestResultSpooling::()::test_spilling[protocol:beeswax|exec_option:{'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':False;'abort_on_error':1;'exec_single_node_rows_threshold':0}|table_f;
> 17:45:10 SET min_spillable_buffer_size=8192;
> 17:45:10 SET batch_size=0;
> 17:45:10 SET num_nodes=0;
> 17:45:10 SET disable_codegen_rows_threshold=0;
> 17:45:10 SET disable_codegen=False;
> 17:45:10 SET abort_on_error=1;
> 17:45:10 SET default_spillable_buffer_size=8192;
> 17:45:10 SET max_result_spooling_mem=32768;
> 17:45:10 SET exec_single_node_rows_threshold=0;
> 17:45:10 -- executing against localhost:21000
> 17:45:10 
> 17:45:10 select * from functional.alltypes order by id limit 1500;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Resolved] (IMPALA-8819) BufferedPlanRootSink should handle non-default fetch sizes

2019-08-29 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-8819.
--
Fix Version/s: Impala 3.4.0
   Resolution: Fixed

> BufferedPlanRootSink should handle non-default fetch sizes
> --
>
> Key: IMPALA-8819
> URL: https://issues.apache.org/jira/browse/IMPALA-8819
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 3.4.0
>
>
> As of IMPALA-8780, the {{BufferedPlanRootSink}} returns an error whenever a 
> client sets the fetch size to a value lower than the {{BATCH_SIZE}}. The 
> issue is that when reading from a {{RowBatch}} from the queue, the batch 
> might contain more rows than the number requested by the client. So the 
> {{BufferedPlanRootSink}} needs to be able to partially read a {{RowBatch}} 
> and remember the index of the rows it read. Furthermore, {{num_results}} in 
> {{BufferedPlanRootSink::GetNext}} could be lower than {{BATCH_SIZE}} if the 
> query results cache in {{ClientRequestState}} has a cache hit (only happens 
> if the client cursor is reset).
> Another issue is that the {{BufferedPlanRootSink}} can only read up to a 
> single {{RowBatch}} at a time. So if a fetch size larger than {{BATCH_SIZE}} 
> is specified, only {{BATCH_SIZE}} rows will be written to the given 
> {{QueryResultSet}}. This is consistent with the legacy behavior of 
> {{PlanRootSink}} (now {{BlockingPlanRootSink}}), but is not ideal because 
> that means clients can only read {{BATCH_SIZE}} rows at a time. A higher 
> fetch size would potentially reduce the number of round-trips necessary 
> between the client and the coordinator, which could improve fetch performance 
> (but only if the {{BlockingPlanRootSink}} is capable of filling all the 
> requested rows).
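
A toy sketch of the partial-read bookkeeping described above (stand-in types, not the actual {{BufferedPlanRootSink}} code):

{code}
#include <cstddef>
#include <vector>

// Stand-ins; in Impala these would be RowBatch / QueryResultSet.
using TupleRow = int;
struct RowBatch { std::vector<TupleRow> rows; };

// Remembers how far into the current batch the client has read, so a fetch
// smaller than BATCH_SIZE consumes part of a batch and resumes later.
struct BatchReadCursor {
  const RowBatch* batch = nullptr;
  size_t next_row = 0;  // index of the first unread row in 'batch'

  // Appends up to 'num_results' rows to 'result_set'; returns rows copied.
  size_t Read(size_t num_results, std::vector<TupleRow>* result_set) {
    size_t copied = 0;
    while (batch != nullptr && next_row < batch->rows.size() &&
           copied < num_results) {
      result_set->push_back(batch->rows[next_row++]);
      ++copied;
    }
    return copied;
  }

  bool Exhausted() const {
    return batch == nullptr || next_row >= batch->rows.size();
  }
};
{code}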



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Resolved] (IMPALA-1618) Impala server should always try to fulfill requested fetch size

2019-08-29 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-1618.
--
Fix Version/s: Impala 3.4.0
   Resolution: Fixed

> Impala server should always try to fulfill requested fetch size
> ---
>
> Key: IMPALA-1618
> URL: https://issues.apache.org/jira/browse/IMPALA-1618
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Affects Versions: Impala 2.0.1
>Reporter: casey
>Priority: Minor
>  Labels: usability
> Fix For: Impala 3.4.0
>
>
> The thrift fetch request specifies the number of rows that it would like but 
> the Impala server may return fewer even though more results are available.
> For example, using the default row_batch size of 1024, if the client requests 
> 1023 rows, the first response contains 1023 rows but the second response 
> contains only 1 row. This is because the server internally uses row_batch 
> (1024), returns the requested count (1023) and caches the remaining row, then 
> the next time around only uses the cache.
> In general the end user should set both the row batch size and the thrift 
> request size. In practice, the query writer (who sets row_batch) and the 
> driver programmer (who sets the fetch size) are often different people.
> There is one case that works fine today: setting the batch size to less 
> than the thrift request size. In that case the thrift response size always 
> matches the batch size.
> Code example:
> {noformat}
> dev@localhost:~/impyla$ git diff
> diff --git a/impala/_rpc/hiveserver2.py b/impala/_rpc/hiveserver2.py
> index 6139002..31fdab7 100644
> --- a/impala/_rpc/hiveserver2.py
> +++ b/impala/_rpc/hiveserver2.py
> @@ -265,6 +265,7 @@ def fetch_results(service, operation_handle, 
> hs2_protocol_version, schema=None,
>  req = TFetchResultsReq(operationHandle=operation_handle,
> orientation=orientation,
> maxRows=max_rows)
> +print("req: " + str(max_rows))
>  resp = service.FetchResults(req)
>  err_if_rpc_not_ok(resp)
>  
> @@ -273,6 +274,7 @@ def fetch_results(service, operation_handle, 
> hs2_protocol_version, schema=None,
>   for (i, col) in enumerate(resp.results.columns)]
>  num_cols = len(tcols)
>  num_rows = len(tcols[0].values)
> +print("rec: " + str(num_rows))
>  rows = []
>  for i in xrange(num_rows):
>  row = []
> dev@localhost:~/impyla$ cat test.py 
> from impala.dbapi import connect
> conn = connect()
> cur = conn.cursor()
> cur.set_arraysize(1024)
> cur.execute("set batch_size=1025")
> cur.execute("select * from tpch.lineitem")
> while True:
> rows = cur.fetchmany()
> if not rows:
> break
> cur.close()
> conn.close()
> dev@localhost:~/impyla$ python test.py | head
> Failed to import pandas
> req: 1024
> rec: 1024
> req: 1024
> rec: 1
> req: 1024
> rec: 1024
> req: 1024
> rec: 1
> req: 1024
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Resolved] (IMPALA-558) HS2::FetchResults sets hasMoreRows in many cases where no more rows are to be returned

2019-09-03 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-558.
-
Resolution: Won't Fix

Marking this as "Won't Fix" since it is not a major issue, and fixing this 
require a decent amount of code re-factoring. Furthermore, clients that enable 
result spooling should see this issue significantly less.

> HS2::FetchResults sets hasMoreRows in many cases where no more rows are to be 
> returned
> --
>
> Key: IMPALA-558
> URL: https://issues.apache.org/jira/browse/IMPALA-558
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Clients
>Affects Versions: Impala 1.1
>Reporter: Henry Robinson
>Priority: Minor
>  Labels: query-lifecycle
>
> The first call to {{FetchResults}} always sets {{hasMoreRows}} even when 0 
> rows should be returned. The next call correctly sets {{hasMoreRows == 
> False}}. The upshot is there's always an extra round-trip, although 
> correctness isn't affected.
> {code}
> execute_statement_req = TCLIService.TExecuteStatementReq()
> execute_statement_req.sessionHandle = resp.sessionHandle
> execute_statement_req.statement = "SELECT COUNT(*) FROM 
> functional.alltypes WHERE 1 = 2"
> execute_statement_resp = 
> self.hs2_client.ExecuteStatement(execute_statement_req)
> 
> fetch_results_req = TCLIService.TFetchResultsReq()
> fetch_results_req.operationHandle = execute_statement_resp.operationHandle
> fetch_results_req.maxRows = 100
> fetch_results_resp = self.hs2_client.FetchResults(fetch_results_req)
> 
> assert not fetch_results_resp.hasMoreRows # Fails
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Resolved] (IMPALA-8907) TestResultSpooling.test_slow_query is flaky

2019-09-05 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-8907.
--
Fix Version/s: Impala 3.4.0
   Resolution: Fixed

> TestResultSpooling.test_slow_query is flaky
> ---
>
> Key: IMPALA-8907
> URL: https://issues.apache.org/jira/browse/IMPALA-8907
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 3.4.0
>
>
> Recently failed in an ubuntu-16.04-dockerised-tests job: 
> [https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/1102/testReport/junit/query_test.test_result_spooling/TestResultSpooling/test_slow_query_protocol__beeswax___exec_optionbatch_size___0___num_nodes___0___disable_codegen_rows_threshold___0___disable_codegen___False___abort_on_error___1___exec_single_node_rows_threshold___0table_format__parquet_none_/]
> Error Message:
> {code:java}
> query_test/test_result_spooling.py:172: in test_slow_query
>     assert re.search(get_wait_time_regex, self.client.get_runtime_profile(handle)) \
> E   assert None is not None
> E    +  where None = <function search at 0x7f0da4115c08>('RowBatchGetWaitTime: [1-9]', 'Query (id=7f47e1d6a1a1c804:492214eb):\n  DEBUG MODE WARNING: Query profile created while running a DEBUG buil...  - OptimizationTime: 331.998ms\n   - PeakMemoryUsage: 1.09 MB (1144320)\n   - PrepareTime: 31.999ms\n')
> E    +    where <function search at 0x7f0da4115c08> = re.search
> E    +    and   'Query (id=7f47e1d6a1a1c804:492214eb):\n  DEBUG MODE WARNING: Query profile created while running a DEBUG buil...  - OptimizationTime: 331.998ms\n   - PeakMemoryUsage: 1.09 MB (1144320)\n   - PrepareTime: 31.999ms\n' = <bound method BeeswaxConnection.get_runtime_profile of <BeeswaxConnection object at 0x7f0d94afa7d0>>(<QueryHandle object at 0x7f0d94afffd0>)
> E    +      where <bound method BeeswaxConnection.get_runtime_profile of <BeeswaxConnection object at 0x7f0d94afa7d0>> = <BeeswaxConnection object at 0x7f0d94afa7d0>.get_runtime_profile
> E    +        where <BeeswaxConnection object at 0x7f0d94afa7d0> = <TestResultSpooling object at 0x7f0d94af3d50>.client
> {code}
> Stacktrace:
> {code:java}
> query_test/test_result_spooling.py:172: in test_slow_query
>     assert re.search(get_wait_time_regex, self.client.get_runtime_profile(handle)) \
> E   assert None is not None
> E    +  where None = <function search at 0x7f0da4115c08>('RowBatchGetWaitTime: [1-9]', 'Query (id=7f47e1d6a1a1c804:492214eb):\n  DEBUG MODE WARNING: Query profile created while running a DEBUG buil...  - OptimizationTime: 331.998ms\n   - PeakMemoryUsage: 1.09 MB (1144320)\n   - PrepareTime: 31.999ms\n')
> E    +    where <function search at 0x7f0da4115c08> = re.search
> E    +    and   'Query (id=7f47e1d6a1a1c804:492214eb):\n  DEBUG MODE WARNING: Query profile created while running a DEBUG buil...  - OptimizationTime: 331.998ms\n   - PeakMemoryUsage: 1.09 MB (1144320)\n   - PrepareTime: 31.999ms\n' = <bound method BeeswaxConnection.get_runtime_profile of <BeeswaxConnection object at 0x7f0d94afa7d0>>(<QueryHandle object at 0x7f0d94afffd0>)
> E    +      where <bound method BeeswaxConnection.get_runtime_profile of <BeeswaxConnection object at 0x7f0d94afa7d0>> = <BeeswaxConnection object at 0x7f0d94afa7d0>.get_runtime_profile
> E    +        where <BeeswaxConnection object at 0x7f0d94afa7d0> = <TestResultSpooling object at 0x7f0d94af3d50>.client {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (IMPALA-8924) DCHECK(!closed_) in SpillableRowBatchQueue::IsEmpty

2019-09-06 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-8924:


 Summary: DCHECK(!closed_) in SpillableRowBatchQueue::IsEmpty
 Key: IMPALA-8924
 URL: https://issues.apache.org/jira/browse/IMPALA-8924
 Project: IMPALA
  Issue Type: Sub-task
  Components: Backend
Affects Versions: Impala 3.4.0
Reporter: Sahil Takiar
Assignee: Sahil Takiar


When running exhaustive tests with result spooling enabled, there are several 
impalad crashes with the following stack:
{code:java}
#0  0x7f5e797541f7 in raise () from /lib64/libc.so.6
#1  0x7f5e797558e8 in abort () from /lib64/libc.so.6
#2  0x04cc5834 in google::DumpStackTraceAndExit() ()
#3  0x04cbc28d in google::LogMessage::Fail() ()
#4  0x04cbdb32 in google::LogMessage::SendToLog() ()
#5  0x04cbbc67 in google::LogMessage::Flush() ()
#6  0x04cbf22e in google::LogMessageFatal::~LogMessageFatal() ()
#7  0x029a16cd in impala::SpillableRowBatchQueue::IsEmpty 
(this=0x13d504e0) at 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/runtime/spillable-row-batch-queue.cc:128
#8  0x025f5610 in impala::BufferedPlanRootSink::IsQueueEmpty 
(this=0x13943000) at 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/exec/buffered-plan-root-sink.h:147
#9  0x025f4e81 in impala::BufferedPlanRootSink::GetNext 
(this=0x13943000, state=0x13d2a1c0, results=0x173c8520, num_results=-1, 
eos=0xd30cde1) at 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/exec/buffered-plan-root-sink.cc:158
#10 0x0294ef4d in impala::Coordinator::GetNext (this=0xe4ed180, 
results=0x173c8520, max_rows=-1, eos=0xd30cde1) at 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/runtime/coordinator.cc:683
#11 0x02251043 in impala::ClientRequestState::FetchRowsInternal 
(this=0xd30c800, max_rows=-1, fetched_rows=0x173c8520) at 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/service/client-request-state.cc:959
#12 0x022503e7 in impala::ClientRequestState::FetchRows 
(this=0xd30c800, max_rows=-1, fetched_rows=0x173c8520) at 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/service/client-request-state.cc:851
#13 0x0226a36d in impala::ImpalaServer::FetchInternal (this=0x12d14800, 
request_state=0xd30c800, start_over=false, fetch_size=-1, 
query_results=0x7f5daf861138) at 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/service/impala-beeswax-server.cc:582
#14 0x02264970 in impala::ImpalaServer::fetch (this=0x12d14800, 
query_results=..., query_handle=..., start_over=false, fetch_size=-1) at 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/service/impala-beeswax-server.cc:188
#15 0x027caf09 in beeswax::BeeswaxServiceProcessor::process_fetch 
(this=0x12d6fc20, seqid=0, iprot=0x119f5780, oprot=0x119f56c0, 
callContext=0xdf92060) at 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/generated-sources/gen-cpp/BeeswaxService.cpp:3398
#16 0x027c94e6 in beeswax::BeeswaxServiceProcessor::dispatchCall 
(this=0x12d6fc20, iprot=0x119f5780, oprot=0x119f56c0, fname=..., seqid=0, 
callContext=0xdf92060) at 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/generated-sources/gen-cpp/BeeswaxService.cpp:3200
#17 0x02796f13 in impala::ImpalaServiceProcessor::dispatchCall 
(this=0x12d6fc20, iprot=0x119f5780, oprot=0x119f56c0, fname=..., seqid=0, 
callContext=0xdf92060) at 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/generated-sources/gen-cpp/ImpalaService.cpp:1824
#18 0x01b3cee4 in apache::thrift::TDispatchProcessor::process 
(this=0x12d6fc20, in=..., out=..., connectionContext=0xdf92060) at 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/thrift-0.9.3-p7/include/thrift/TDispatchProcessor.h:121
#19 0x01f9bf28 in apache::thrift::server::TAcceptQueueServer::Task::run 
(this=0xdf92000) at 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/rpc/TAcceptQueueServer.cpp:84
#20 0x01f9166d in impala::ThriftThread::RunRunnable (this=0x116ddfc0, 
runnable=..., promise=0x7f5db0862e90) at 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/rpc/thrift-thread.cc:74
#21 0x01f92d93 in boost::_mfi::mf2, 
impala::Promise*>::operator() 
(this=0x121e7800, p=0x116ddfc0, a1=..., a2=0x7f5db0862e90) at 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.57.0-p3/include/boost/bind/mem_fn_template.hpp:280
#22 0x01f92c29 in 
boost::_bi::list3, 
boost::_bi::value >, 
boost::_bi::value*> 
>::operator(), 
impala::Promise*>, boost::_bi::list0> 
(this=0x121e7810, f=..., a=...) at 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1

[jira] [Created] (IMPALA-8925) Consider replacing ClientRequestState ResultCache with result spooling

2019-09-06 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-8925:


 Summary: Consider replacing ClientRequestState ResultCache with 
result spooling
 Key: IMPALA-8925
 URL: https://issues.apache.org/jira/browse/IMPALA-8925
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend
Reporter: Sahil Takiar


The {{ClientRequestState}} maintains an internal results cache (which is really 
just a {{QueryResultSet}}) in order to provide support for the 
{{TFetchOrientation.FETCH_FIRST}} fetch orientation (used by Hue - see 
[https://github.com/apache/impala/commit/6b769d011d2016a73483f63b311e108d17d9a083]).

The cache itself has some limitations:
 * It caches all results in a {{QueryResultSet}} with limited admission control 
integration
 * It has a max size, if the size is exceeded the cache is emptied
 * It cannot spill to disk

Result spooling could potentially replace the query result cache and provide a 
few benefits; it should be able to fit more rows since it can spill to disk. 
The memory is better tracked as well since it integrates with both admitted and 
reserved memory. Hue currently sets the max result set fetch size (see 
[https://github.com/cloudera/hue/blob/master/apps/impala/src/impala/impala_flags.py#L61]);
it would be good to check how well that value works for Hue users so we can 
decide whether replacing the current result cache with result spooling makes sense.

This would require some changes to result spooling as well; currently it 
discards rows whenever it reads them from the underlying 
{{BufferedTupleStream}}. It would need the ability to reset the read cursor, 
which would in turn require some changes to the {{PlanRootSink}} interface.
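
A toy sketch of the read-cursor reset this would require (stand-in types; the real change would touch {{BufferedTupleStream}} and the {{PlanRootSink}} interface):

{code}
#include <cstddef>
#include <vector>

// Stand-in for spooled results; in Impala the rows would live in a
// (possibly spilled) BufferedTupleStream rather than a vector.
struct SpooledResults {
  std::vector<int> rows;
  size_t read_pos = 0;

  bool GetNext(int* row) {
    if (read_pos >= rows.size()) return false;
    *row = rows[read_pos++];
    return true;
  }

  // The capability FETCH_FIRST needs: rewind the cursor instead of
  // discarding rows as they are read (today result spooling discards them).
  void ResetReadCursor() { read_pos = 0; }
};
{code}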



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (IMPALA-8926) TestResultSpooling::_test_full_queue is flaky

2019-09-06 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-8926:


 Summary: TestResultSpooling::_test_full_queue is flaky
 Key: IMPALA-8926
 URL: https://issues.apache.org/jira/browse/IMPALA-8926
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Affects Versions: Impala 3.4.0
Reporter: Sahil Takiar
Assignee: Sahil Takiar


Has happened a few times, error message is:
{code:java}
query_test/test_result_spooling.py:116: in test_full_queue_large_fetch
    self._test_full_queue(vector, query, fetch_size=num_rows)
query_test/test_result_spooling.py:148: in _test_full_queue
    assert re.search(send_wait_time_regex, self.client.get_runtime_profile(handle)) \
E   assert None is not None
E    +  where None = <function search>('RowBatchSendWaitTime: [1-9]', 'Query (id=e948cdd2bbde9430:082830be):\n  DEBUG MODE WARNING: Query profile created while running a DEBUG buil...: 0.000ns\n - WriteIoBytes: 0\n - WriteIoOps: 0 (0)\n - WriteIoWaitTime: 0.000ns\n')
E    +    where <function search> = re.search
E    +    and   'Query (id=e948cdd2bbde9430:082830be):\n  DEBUG MODE WARNING: Query profile created while running a DEBUG buil...: 0.000ns\n - WriteIoBytes: 0\n - WriteIoOps: 0 (0)\n - WriteIoWaitTime: 0.000ns\n' = <bound method get_runtime_profile>()
E    +      where <bound method get_runtime_profile> = <connection>.get_runtime_profile
E    +        where <connection> = <TestResultSpooling>.client
{code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Resolved] (IMPALA-8803) Coordinator should release admitted memory per-backend rather than per-query

2019-09-08 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-8803.
--
Fix Version/s: Impala 3.4.0
   Resolution: Fixed

> Coordinator should release admitted memory per-backend rather than per-query
> 
>
> Key: IMPALA-8803
> URL: https://issues.apache.org/jira/browse/IMPALA-8803
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 3.4.0
>
>
> When {{SPOOL_QUERY_RESULTS}} is true, the coordinator backend may be 
> long-lived, even though all other backends for the query have completed. 
> Currently, the Coordinator only releases admitted memory when the entire 
> query has completed (including the coordinator fragment) - 
> https://github.com/apache/impala/blob/72c9370856d7436885adbee3e8da7e7d9336df15/be/src/runtime/coordinator.cc#L562
> In order to more aggressively return admitted memory, the coordinator should 
> release memory when each backend for a query completes, rather than waiting 
> for the entire query to complete.
> Releasing memory per backend should be batched because releasing admitted 
> memory in the admission controller requires obtaining a global lock and 
> refreshing the internal stats of the admission controller. Batching will help 
> mitigate any additional overhead from releasing admitted memory per backend.
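
An illustrative sketch of the batching idea (hypothetical names; not the actual coordinator or admission controller code):

{code}
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <vector>

// Illustration only: accumulate completed backends and release their
// admitted memory in batches, so the admission controller's global lock is
// taken once per batch instead of once per backend.
class BatchedAdmissionRelease {
 public:
  explicit BatchedAdmissionRelease(size_t batch_size) : batch_size_(batch_size) {}

  void BackendCompleted(int64_t admitted_bytes) {
    pending_bytes_.push_back(admitted_bytes);
    if (pending_bytes_.size() >= batch_size_) Flush();
  }

  // Also called once when the whole query completes.
  void Flush() {
    if (pending_bytes_.empty()) return;
    int64_t total = 0;
    for (int64_t bytes : pending_bytes_) total += bytes;
    // One lock acquisition + stats refresh for the whole batch.
    std::cout << "releasing " << total << " admitted bytes for "
              << pending_bytes_.size() << " backends\n";
    pending_bytes_.clear();
  }

 private:
  size_t batch_size_;
  std::vector<int64_t> pending_bytes_;
};
{code}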



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Resolved] (IMPALA-8818) Replace deque queue with spillable queue in BufferedPlanRootSink

2019-09-09 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-8818.
--
Resolution: Fixed

> Replace deque queue with spillable queue in BufferedPlanRootSink
> 
>
> Key: IMPALA-8818
> URL: https://issues.apache.org/jira/browse/IMPALA-8818
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 3.4.0
>
>
> Add a {{SpillableRowBatchQueue}} to replace the {{DequeRowBatchQueue}} in 
> {{BufferedPlanRootSink}}. The {{SpillableRowBatchQueue}} will wrap a 
> {{BufferedTupleStream}} and take in a {{TBackendResourceProfile}} created by 
> {{PlanRootSink#computeResourceProfile}}.
> *BufferedTupleStream Usage*:
> The wrapped {{BufferedTupleStream}} should be created in 'attach_on_read' 
> mode so that pages are attached to the output {{RowBatch}} in 
> {{BufferedTupleStream::GetNext}}. The BTS should start off as pinned (e.g. 
> all pages are pinned). If a call to {{BufferedTupleStream::AddRow}} returns 
> false (it returns false if "the unused reservation was not sufficient to add 
> a new page to the stream large enough to fit 'row' and the stream could not 
> increase the reservation to get enough unused reservation"), it should unpin 
> the stream ({{BufferedTupleStream::UnpinStream}}) and then add the row (if 
> the row still could not be added, then an error must have occurred, perhaps 
> an IO error, in which case return the error and fail the query).
> *Constraining Resources*:
> When result spooling is disabled, a user can run a {{select * from 
> [massive-fact-table]}} and scroll through the results without affecting the 
> health of the Impala cluster (assuming they close the query promptly). 
> Impala will stream the results one batch at a time to the user.
> With result spooling, a naive implementation might try to buffer the entire 
> fact table, and end up spilling all the contents to disk, which can 
> potentially take up a large amount of space. So there need to be 
> restrictions on the memory and disk space used by the {{BufferedTupleStream}} 
> in order to ensure a scan of a massive table does not consume all the memory 
> or disk space of the Impala coordinator.
> This problem can be solved by placing a max size on the amount of unpinned 
> memory (perhaps through a new config option 
> {{MAX_PINNED_RESULT_SPOOLING_MEMORY}}, maybe set to a few GBs by default). 
> The max amount of pinned memory should already be constrained by the 
> reservation (see next paragraph). NUM_ROWS_PRODUCED_LIMIT already limits the 
> number of rows returned by a query, and so it should limit the number of rows 
> buffered by the BTS as well (although it is set to 0 by default). 
> SCRATCH_LIMIT already limits the amount of disk space used for spilling 
> (although it is set to -1 by default).
> The {{PlanRootSink}} should attempt to accurately estimate how much memory it 
> needs to buffer all results in memory. This requires setting an accurate 
> value of {{ResourceProfile#memEstimateBytes_}} in 
> {{PlanRootSink#computeResourceProfile}}. If statistics are available, the 
> estimate can be based on the number of estimated rows returned multiplied by 
> the size of the rows returned. The min reservation should account for a read 
> and write page for the {{BufferedTupleStream}}.
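
A compilable toy of the add-then-unpin-then-retry rule described above (the stream here is a fake that fails while pinned, purely to exercise the retry path; not the actual {{SpillableRowBatchQueue}} code):

{code}
#include <string>

// Minimal Status stand-in.
struct Status {
  std::string msg;
  bool ok() const { return msg.empty(); }
  static Status OK() { return Status{}; }
};

// Fake stream: rejects rows while pinned so the retry path below runs.
struct FakeTupleStream {
  bool pinned = true;
  bool AddRow(int /*row*/, Status* status) {
    *status = Status::OK();  // false + OK status == out of reservation
    return !pinned;
  }
  Status UnpinStream() { pinned = false; return Status::OK(); }
};

// The queueing rule: try pinned, unpin and retry once, then treat any
// remaining failure as a real error (e.g. an I/O error) and fail the query.
Status AddRowWithUnpinFallback(FakeTupleStream& stream, int row) {
  Status status;
  if (stream.AddRow(row, &status)) return Status::OK();
  if (!status.ok()) return status;  // hard error on the first attempt
  Status unpin_status = stream.UnpinStream();
  if (!unpin_status.ok()) return unpin_status;
  if (stream.AddRow(row, &status)) return Status::OK();
  return status.ok() ? Status{"AddRow failed after unpin"} : status;
}
{code}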



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Resolved] (IMPALA-8779) Add RowBatchQueue interface with an implementation backed by a std::queue

2019-09-09 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-8779.
--
Resolution: Won't Fix

Marking this as 'Won't Fix' for now. There does not seem to be a strong need to 
add this right now, given that there is no other use case for a generic 
{{RowBatch}} queue. The one used in the scan nodes has some unique requirements 
and re-factoring it to use a generic interface does not seem worth it. We can 
re-visit this later if we find a stronger use case for it.

> Add RowBatchQueue interface with an implementation backed by a std::queue
> -
>
> Key: IMPALA-8779
> URL: https://issues.apache.org/jira/browse/IMPALA-8779
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> Add a {{RowBatchQueue}} interface with an implementation backed by a 
> {{std::queue}}. Introducing a generic queue that can buffer {{RowBatch}}-es 
> will help with the implementation of {{BufferedPlanRootSink}}. Rather than 
> tie the {{BufferedPlanRootSink}} to a specific method of queuing row batches, 
> we can use an interface. In future patches, a {{RowBatchQueue}} backed by a 
> {{BufferedTupleStream}} can easily be switched out in 
> {{BufferedPlanRootSink}}.
> We should consider re-factoring the existing {{RowBatchQueue}} to use the new 
> interface. The KRPC receiver does some buffering of {{RowBatch}}-es as well 
> which might benefit from the new RowBatchQueue interface, and some more KRPC 
> buffering might be added in IMPALA-6692.
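
A sketch of roughly what such an interface could have looked like (hypothetical; the JIRA was ultimately closed without adding it):

{code}
#include <memory>
#include <queue>
#include <utility>

struct RowBatch {};  // stand-in for impala::RowBatch

// The kind of interface proposed above, plus a std::queue-backed impl.
class RowBatchQueue {
 public:
  virtual ~RowBatchQueue() = default;
  virtual bool AddBatch(std::unique_ptr<RowBatch> batch) = 0;
  virtual std::unique_ptr<RowBatch> GetBatch() = 0;
  virtual bool IsEmpty() const = 0;
};

class DequeRowBatchQueue : public RowBatchQueue {
 public:
  bool AddBatch(std::unique_ptr<RowBatch> batch) override {
    queue_.push(std::move(batch));
    return true;
  }
  std::unique_ptr<RowBatch> GetBatch() override {
    if (queue_.empty()) return nullptr;
    std::unique_ptr<RowBatch> batch = std::move(queue_.front());
    queue_.pop();
    return batch;
  }
  bool IsEmpty() const override { return queue_.empty(); }

 private:
  std::queue<std::unique_ptr<RowBatch>> queue_;
};
{code}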



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Resolved] (IMPALA-7312) Non-blocking mode for Fetch() RPC

2019-09-10 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-7312.
--
Fix Version/s: Impala 3.4.0
   Resolution: Fixed

> Non-blocking mode for Fetch() RPC
> -
>
> Key: IMPALA-7312
> URL: https://issues.apache.org/jira/browse/IMPALA-7312
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Clients
>Reporter: Tim Armstrong
>Assignee: Sahil Takiar
>Priority: Major
>  Labels: resource-management
> Fix For: Impala 3.4.0
>
>
> Currently Fetch() can block for an arbitrary amount of time until a batch of 
> rows is produced. It might be helpful to have a mode where it returns quickly 
> when there is no data available, so that threads and RPC slots are not tied 
> up.
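
One way to picture the mode being asked for, as a toy C++ sketch (not Impala's RPC code): wait up to a deadline, then return empty instead of blocking indefinitely.

{code}
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <queue>

// Illustration only: a fetch that waits up to 'timeout_ms' and then returns
// "no rows yet" so the serving thread and RPC slot are freed.
class ResultQueue {
 public:
  void Add(int row) {
    { std::lock_guard<std::mutex> l(mu_); rows_.push(row); }
    cv_.notify_one();
  }

  // Returns true and sets *row if data arrived in time; false otherwise
  // (the caller responds "try again" instead of holding the RPC open).
  bool FetchWithTimeout(int64_t timeout_ms, int* row) {
    std::unique_lock<std::mutex> l(mu_);
    if (!cv_.wait_for(l, std::chrono::milliseconds(timeout_ms),
                      [this] { return !rows_.empty(); })) {
      return false;
    }
    *row = rows_.front();
    rows_.pop();
    return true;
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::queue<int> rows_;
};
{code}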



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (IMPALA-8934) Add failpoint tests to result spooling code

2019-09-10 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-8934:


 Summary: Add failpoint tests to result spooling code
 Key: IMPALA-8934
 URL: https://issues.apache.org/jira/browse/IMPALA-8934
 Project: IMPALA
  Issue Type: Sub-task
Affects Versions: Impala 3.2.0
Reporter: Sahil Takiar
Assignee: Sahil Takiar


IMPALA-8924 was discovered while running {{test_failpoints.py}} with result 
spooling enabled. The goal of this JIRA is to add similar failpoint coverage to 
{{test_result_spooling.py}} so that we have sufficient coverage for the various 
failure paths when result spooling is enabled.

The failure paths that should be covered include:
* Failures while executing the exec tree



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (IMPALA-8942) Set file format specific values for split sizes on non-block stores

2019-09-11 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-8942:


 Summary: Set file format specific values for split sizes on 
non-block stores
 Key: IMPALA-8942
 URL: https://issues.apache.org/jira/browse/IMPALA-8942
 Project: IMPALA
  Issue Type: Improvement
  Components: Frontend
Reporter: Sahil Takiar
Assignee: Sahil Takiar


Parquet scans on non-block based storage systems (e.g. S3, ADLS, etc.) can 
suffer from uneven scan range assignment due to the behavior described in 
IMPALA-3453. The frontend should set different split sizes depending on the 
file type and file system.
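
An illustrative sketch of the idea (the format names and sizes here are made up, not what the frontend actually chooses):

{code}
#include <cstdint>

enum class FileFormat { PARQUET, TEXT, AVRO };

// Sketch only: block stores report real block boundaries, so use them.
// Object stores (S3, ADLS) have no blocks, so pick a synthetic,
// format-specific split size, e.g. larger splits for Parquet to even out
// scan range assignment.
int64_t ChooseSplitSize(FileFormat format, bool is_object_store,
                        int64_t reported_block_size) {
  if (!is_object_store) return reported_block_size;
  return format == FileFormat::PARQUET ? (512LL << 20)   // hypothetical 512MB
                                       : (128LL << 20);  // hypothetical 128MB
}
{code}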



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (IMPALA-8944) Update and re-enable S3PlannerTest

2019-09-13 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-8944:


 Summary: Update and re-enable S3PlannerTest
 Key: IMPALA-8944
 URL: https://issues.apache.org/jira/browse/IMPALA-8944
 Project: IMPALA
  Issue Type: Test
Reporter: Sahil Takiar
Assignee: Sahil Takiar


It looks like we don't run {{S3PlannerTest}} in our regular Jenkins jobs. When 
run against an HDFS mini-cluster, the tests are skipped because the 
{{TARGET_FILESYSTEM}} is not S3. On our S3 jobs, they don't run either because 
we skip all fe/ tests (most of them don't work against S3 / assume they are 
running on HDFS).

A few things need to be fixed to get this working:
* The test cases in {{S3PlannerTest}} need to be fixed
* The Jenkins job that runs the S3 tests needs the ability to run specific fe/ 
tests (e.g. just the {{S3PlannerTest}}) and to skip the rest



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Resolved] (IMPALA-8825) Add additional counters to PlanRootSink

2019-09-16 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-8825.
--
Fix Version/s: Impala 3.4.0
   Resolution: Fixed

> Add additional counters to PlanRootSink
> ---
>
> Key: IMPALA-8825
> URL: https://issues.apache.org/jira/browse/IMPALA-8825
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 3.4.0
>
>
> The current entry in the runtime profile for {{PLAN_ROOT_SINK}} does not 
> contain much useful information:
> {code:java}
> PLAN_ROOT_SINK:(Total: 234.996ms, non-child: 234.996ms, % non-child: 100.00%)
> - PeakMemoryUsage: 0{code}
> There are several additional counters we could add to the {{PlanRootSink}} 
> (either the {{BufferedPlanRootSink}} or {{BlockingPlanRootSink}}):
>  * Amount of time spent blocking inside the {{PlanRootSink}} - both the time 
> spent by the client thread waiting for rows to become available and the time 
> spent by the impala thread waiting for the client to consume rows
>  ** So similar to the {{RowBatchQueueGetWaitTime}} and 
> {{RowBatchQueuePutWaitTime}} inside the scan nodes
>  ** The difference between these counters and the ones in 
> {{ClientRequestState}} (e.g. {{ClientFetchWaitTimer}} and 
> {{RowMaterializationTimer}}) should be documented
>  * For {{BufferedPlanRootSink}} there are already several {{Buffer pool}} 
> counters, we should make sure they are exposed in the {{PLAN_ROOT_SINK}} 
> section
>  * Track the number of rows sent (e.g. rows sent to {{PlanRootSink::Send}}) 
> and the number of rows fetched (might need to be tracked in the 
> {{ClientRequestState}})
>  ** For {{BlockingPlanRootSink}} the sent and fetched values should be pretty 
> much the same, but for {{BufferedPlanRootSink}} this is more useful
>  ** Similar to {{RowsReturned}} in each exec node
>  * The rate at which rows are sent and fetched
>  ** Should be useful when attempting to debug perf of the fetching rows (e.g. 
> if the send rate is much higher than the fetch rate, then maybe there is 
> something wrong with the client)
>  ** Similar to {{RowsReturnedRate}} in each exec node
> Open to other suggestions for counters that folks think are useful.
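
A rough sketch of the wait-time bookkeeping these counters imply (a plain struct stands in for RuntimeProfile counters; names are hypothetical):

{code}
#include <chrono>
#include <cstdint>
#include <utility>

// Stand-in for RuntimeProfile counters; illustration only.
struct PlanRootSinkCounters {
  int64_t row_batch_get_wait_ns = 0;   // client thread waiting for rows
  int64_t row_batch_send_wait_ns = 0;  // impalad thread waiting for client
  int64_t rows_sent = 0;               // rows passed to Send()
  int64_t rows_fetched = 0;            // rows handed back to the client
};

// Time a blocking region and charge it to one of the wait counters.
template <typename Fn>
void TimedWait(int64_t* counter_ns, Fn&& blocking_call) {
  auto start = std::chrono::steady_clock::now();
  std::forward<Fn>(blocking_call)();
  *counter_ns += std::chrono::duration_cast<std::chrono::nanoseconds>(
                     std::chrono::steady_clock::now() - start)
                     .count();
}
{code}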



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (IMPALA-8949) PlannerTest differences when running on S3 vs HDFS

2019-09-17 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-8949:


 Summary: PlannerTest differences when running on S3 vs HDFS
 Key: IMPALA-8949
 URL: https://issues.apache.org/jira/browse/IMPALA-8949
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Reporter: Sahil Takiar


While re-enabling the {{S3PlannerTest}} in IMPALA-8944, there are several tests 
that are consistently failing due to actual diffs in the explain plan:
* org.apache.impala.planner.S3PlannerTest.testTpcds
* org.apache.impala.planner.S3PlannerTest.testTpch
* org.apache.impala.planner.S3PlannerTest.testJoinOrder
* org.apache.impala.planner.S3PlannerTest.testSubqueryRewrite

All are failing for non-trivial reasons - e.g. differences in memory estimates, 
join orders, etc.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

