[jira] [Resolved] (IMPALA-9954) RpcRecvrTime can be negative
[ https://issues.apache.org/jira/browse/IMPALA-9954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sahil Takiar resolved IMPALA-9954.
----------------------------------
Fix Version/s: Impala 4.0
Resolution: Fixed

RpcRecvrTime can be negative

Key: IMPALA-9954
URL: https://issues.apache.org/jira/browse/IMPALA-9954
Project: IMPALA
Issue Type: Sub-task
Components: Backend
Reporter: Sahil Takiar
Assignee: Riza Suminto
Priority: Major
Fix For: Impala 4.0
Attachments: profile_034e7209bd98c96c_9a448dfc.txt

Saw this on a recent version of master. Attached the full runtime profile.

{code:java}
KrpcDataStreamSender (dst_id=2):(Total: 9.863ms, non-child: 3.185ms, % non-child: 32.30%)
  ExecOption: Unpartitioned Sender Codegen Disabled: not needed
   - BytesSent (500.000ms): 0, 0
   - NetworkThroughput: (Avg: 4.34 MB/sec ; Min: 4.34 MB/sec ; Max: 4.34 MB/sec ; Number of samples: 1)
   - RpcNetworkTime: (Avg: 3.562ms ; Min: 679.676us ; Max: 6.445ms ; Number of samples: 2)
   - RpcRecvrTime: (Avg: -151281.000ns ; Min: -231485.000ns ; Max: -71077.000ns ; Number of samples: 2)
   - EosSent: 1 (1)
   - PeakMemoryUsage: 416.00 B (416)
   - RowsSent: 100 (100)
   - RpcFailure: 0 (0)
   - RpcRetry: 0 (0)
   - SerializeBatchTime: 2.880ms
   - TotalBytesSent: 28.67 KB (29355)
   - UncompressedRowBatchSize: 69.29 KB (70950)
{code}

--
This message was sent by Atlassian Jira (v8.3.4#803005)
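One plausible way a per-RPC timer like this goes negative is subtracting timestamps taken from two clocks that do not agree (different hosts, or wall clock vs. monotonic clock). This is a hypothetical sketch, not Impala's actual computation; `recvr_time_ns` and its skew parameter are invented for illustration:

```python
def recvr_time_ns(send_ts_ns, recv_ts_ns, recv_clock_skew_ns=0):
    # Elapsed time computed by subtracting a timestamp from one clock from a
    # timestamp taken on another. If the second clock runs behind the first
    # (negative skew), the "duration" comes out negative even though real
    # time always moves forward.
    return (recv_ts_ns + recv_clock_skew_ns) - send_ts_ns

# 50,000 ns of real elapsed time, but the receiving clock is 200,000 ns behind:
elapsed = recvr_time_ns(1_000_000, 1_050_000, recv_clock_skew_ns=-200_000)
assert elapsed == -150_000  # negative, much like the -151281 ns average above
```

The fix for such counters is typically to derive both timestamps from the same monotonic clock, or to clamp the difference at zero.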
[jira] [Created] (IMPALA-10241) Impala Doc: RPC troubleshooting guide
Sahil Takiar created IMPALA-10241:
----------------------------------
Summary: Impala Doc: RPC troubleshooting guide
Key: IMPALA-10241
URL: https://issues.apache.org/jira/browse/IMPALA-10241
Project: IMPALA
Issue Type: Sub-task
Reporter: Sahil Takiar

There have been several diagnostic improvements to how RPCs can be debugged. We should document them along with the associated options for configuring them.
[jira] [Created] (IMPALA-10240) Impala Doc: Add docs for cluster membership statestore heartbeats
Sahil Takiar created IMPALA-10240:
----------------------------------
Summary: Impala Doc: Add docs for cluster membership statestore heartbeats
Key: IMPALA-10240
URL: https://issues.apache.org/jira/browse/IMPALA-10240
Project: IMPALA
Issue Type: Sub-task
Reporter: Sahil Takiar

I don't see many docs explaining how the current cluster membership logic works (e.g. via the statestored heartbeats). Would be nice to include a high level explanation along with how to configure the heartbeat threshold.
[jira] [Created] (IMPALA-10239) Docs: Add docs for node blacklisting
Sahil Takiar created IMPALA-10239:
----------------------------------
Summary: Docs: Add docs for node blacklisting
Key: IMPALA-10239
URL: https://issues.apache.org/jira/browse/IMPALA-10239
Project: IMPALA
Issue Type: Sub-task
Reporter: Sahil Takiar
Assignee: Sahil Takiar

We should add some docs for node blacklisting explaining what it is, how it works at a high level, what errors it captures, how to debug it, etc.
[jira] [Created] (IMPALA-10238) Add fault tolerance docs
Sahil Takiar created IMPALA-10238:
----------------------------------
Summary: Add fault tolerance docs
Key: IMPALA-10238
URL: https://issues.apache.org/jira/browse/IMPALA-10238
Project: IMPALA
Issue Type: Task
Components: Docs
Reporter: Sahil Takiar
Assignee: Sahil Takiar

Impala docs currently don't have much information about any of our fault tolerance features. We should add a dedicated section with several sub-topics to address this.
[jira] [Created] (IMPALA-10235) Averaged timer profile counters can be negative for trivial queries
Sahil Takiar created IMPALA-10235:
----------------------------------
Summary: Averaged timer profile counters can be negative for trivial queries
Key: IMPALA-10235
URL: https://issues.apache.org/jira/browse/IMPALA-10235
Project: IMPALA
Issue Type: Bug
Reporter: Sahil Takiar
Attachments: profile-output.txt

Steps to reproduce on master:

{code}
./bin/impala-shell.sh -q "select sleep(100) from functional.alltypes limit 25" -p > profile-output.txt
...
Query: select sleep(100) from functional.alltypes limit 25
Query submitted at: 2020-10-13 11:13:07 (Coordinator: http://stakiar-desktop:25000)
Query progress can be monitored at: http://stakiar-desktop:25000/query_plan?query_id=694f94671571d4d1:cdec9db9
Fetched 25 row(s) in 2.64s
{code}

Attached the contents of {{profile-output.txt}}. Relevant portion of the profile:

{code}
Averaged Fragment F00:(Total: 2s603ms, non-child: 272.519us, % non-child: 0.01%)
...
 - CompletionTime: -1665218428.000ns
...
 - TotalThreadsTotalWallClockTime: -1686005515.000ns
 - TotalThreadsSysTime: 0.000ns
 - TotalThreadsUserTime: 2.151ms
...
 - TotalTime: -1691524485.000ns
{code}

For whatever reason, this only affects the averaged fragment profile. For this query, there was only one coordinator fragment and thus only one fragment instance. It showed normal values:

{code}
Coordinator Fragment F00:
...
 - CompletionTime: 2s629ms
...
 - TotalThreadsTotalWallClockTime: 2s608ms
 - TotalThreadsSysTime: 0.000ns
 - TotalThreadsUserTime: 2.151ms
...
 - TotalTime: 2s603ms
{code}
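Without claiming this is the root cause, the magnitudes are suggestive of 32-bit wraparound: a ~2.6 s timer stored in nanoseconds (~2.6e9) just exceeds INT32_MAX (2,147,483,647), and narrowing it to a signed 32-bit value wraps to almost exactly the negative numbers in the profile. A sketch (`wrap_int32` is a hypothetical helper mimicking a narrowing cast in C++):

```python
INT32_MAX = 2**31 - 1

def wrap_int32(value):
    # Reinterpret an arbitrarily large integer as a signed 32-bit value,
    # the way a narrowing cast to int32 behaves.
    value &= 0xFFFFFFFF
    return value - 2**32 if value > INT32_MAX else value

# TotalTime of 2s603ms is 2,603,000,000 ns, just past INT32_MAX:
assert wrap_int32(2_603_000_000) == -1_691_967_296   # profile shows -1691524485 ns
# CompletionTime of 2s629ms wraps similarly:
assert wrap_int32(2_629_000_000) == -1_665_967_296   # profile shows -1665218428 ns
```

The wrapped values differ from the profile's only by the small run-to-run timer variation, which is consistent with the single-instance "average" being computed through a 32-bit intermediate somewhere.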
[jira] [Resolved] (IMPALA-8925) Consider replacing ClientRequestState ResultCache with result spooling
[ https://issues.apache.org/jira/browse/IMPALA-8925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sahil Takiar resolved IMPALA-8925.
----------------------------------
Resolution: Later

This would be nice to have, but there is no strong reason to do it at the moment, so closing as "Later".

Consider replacing ClientRequestState ResultCache with result spooling

Key: IMPALA-8925
URL: https://issues.apache.org/jira/browse/IMPALA-8925
Project: IMPALA
Issue Type: Improvement
Components: Backend, Clients
Reporter: Sahil Takiar
Priority: Minor

The {{ClientRequestState}} maintains an internal results cache (which is really just a {{QueryResultSet}}) in order to support the {{TFetchOrientation.FETCH_FIRST}} fetch orientation (used by Hue - see https://github.com/apache/impala/commit/6b769d011d2016a73483f63b311e108d17d9a083).

The cache itself has some limitations:
* It caches all results in a {{QueryResultSet}} with limited admission control integration
* It has a max size; if the size is exceeded, the cache is emptied
* It cannot spill to disk

Result spooling could potentially replace the query result cache and provide a few benefits: it should be able to fit more rows since it can spill to disk, and its memory is better tracked since it integrates with both admitted and reserved memory. Hue currently sets the max result set fetch size (see https://github.com/cloudera/hue/blob/master/apps/impala/src/impala/impala_flags.py#L61); it would be good to check how well that value works for Hue users so we can decide if replacing the current result cache with result spooling makes sense.

This would require some changes to result spooling as well; currently it discards rows whenever it reads them from the underlying {{BufferedTupleStream}}. It would need the ability to reset the read cursor, which would require some changes to the {{PlanRootSink}} interface as well.
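The resettable read cursor described in IMPALA-8925 can be sketched in a few lines. This is a hypothetical illustration of the idea, not Impala's actual {{PlanRootSink}} or {{QueryResultSet}} API; all names here are invented:

```python
class SpooledResultSet:
    """Keep fetched row batches instead of discarding them on read, so a
    FETCH_FIRST request can rewind the read cursor to the start."""

    def __init__(self):
        self._batches = []   # spooled batches (a real impl could spill to disk)
        self._cursor = 0     # index of the next batch to hand to the client

    def add_batch(self, rows):
        self._batches.append(list(rows))

    def fetch_next(self):
        if self._cursor >= len(self._batches):
            return None      # no more batches
        batch = self._batches[self._cursor]
        self._cursor += 1
        return batch

    def reset_cursor(self):  # what FETCH_FIRST needs
        self._cursor = 0

rs = SpooledResultSet()
rs.add_batch([1, 2])
rs.add_batch([3])
assert rs.fetch_next() == [1, 2] and rs.fetch_next() == [3]
rs.reset_cursor()
assert rs.fetch_next() == [1, 2]  # rows are replayable after a reset
```

The design cost the issue calls out is exactly the cursor: today rows are dropped as they are read, so rewinding requires retaining (or re-materializing) everything already returned.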
[jira] [Resolved] (IMPALA-9485) Enable file handle cache for EC files
[ https://issues.apache.org/jira/browse/IMPALA-9485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sahil Takiar resolved IMPALA-9485.
----------------------------------
Fix Version/s: Impala 4.0
Resolution: Fixed

Enable file handle cache for EC files

Key: IMPALA-9485
URL: https://issues.apache.org/jira/browse/IMPALA-9485
Project: IMPALA
Issue Type: Sub-task
Components: Backend
Reporter: Sahil Takiar
Assignee: Sahil Takiar
Priority: Major
Fix For: Impala 4.0

Now that HDFS-14308 has been fixed, we can re-enable the file handle cache for EC files.
[jira] [Resolved] (IMPALA-10028) Additional optimizations of Impala docker container sizes
[ https://issues.apache.org/jira/browse/IMPALA-10028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sahil Takiar resolved IMPALA-10028.
-----------------------------------
Fix Version/s: Impala 4.0
Resolution: Fixed

Additional optimizations of Impala docker container sizes

Key: IMPALA-10028
URL: https://issues.apache.org/jira/browse/IMPALA-10028
Project: IMPALA
Issue Type: Improvement
Reporter: Sahil Takiar
Assignee: Sahil Takiar
Priority: Major
Fix For: Impala 4.0

There are some more optimizations we can make to get the images to be even smaller. It also looks like we may have regressed with regard to image size: IMPALA-8425 reports the images at ~700 MB, but I just checked on a release build and they are currently 1.01 GB.
[jira] [Closed] (IMPALA-10016) Split jars for Impala executor and coordinator Docker images
[ https://issues.apache.org/jira/browse/IMPALA-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sahil Takiar closed IMPALA-10016.
---------------------------------
Fix Version/s: Impala 4.0
Resolution: Fixed

Split jars for Impala executor and coordinator Docker images

Key: IMPALA-10016
URL: https://issues.apache.org/jira/browse/IMPALA-10016
Project: IMPALA
Issue Type: Sub-task
Reporter: Sahil Takiar
Assignee: Sahil Takiar
Priority: Major
Fix For: Impala 4.0

Impala executors and coordinators currently have a common base image. The base image defines a set of jar files needed by either the coordinator or the executor. In order to reduce the image size, we should split the jars into two categories: those necessary for the coordinator and those necessary for the executor. This should help reduce overall image size.
[jira] [Created] (IMPALA-10217) test_runtime_filters.TestMinMaxFilters.test_decimal_min_max_filters is flaky
Sahil Takiar created IMPALA-10217:
----------------------------------
Summary: test_runtime_filters.TestMinMaxFilters.test_decimal_min_max_filters is flaky
Key: IMPALA-10217
URL: https://issues.apache.org/jira/browse/IMPALA-10217
Project: IMPALA
Issue Type: Test
Reporter: Sahil Takiar

Seen this a few times in exhaustive builds:

{code}
query_test.test_runtime_filters.TestMinMaxFilters.test_decimal_min_max_filters[protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: kudu/none] (from pytest)
query_test/test_runtime_filters.py:231: in test_decimal_min_max_filters
    test_file_vars={'$RUNTIME_FILTER_WAIT_TIME_MS': str(WAIT_TIME_MS)})
common/impala_test_suite.py:718: in run_test_case
    update_section=pytest.config.option.update_results)
common/test_result_verifier.py:627: in verify_runtime_profile
    % (function, field, expected_value, actual_value, actual))
E   AssertionError: Aggregation of SUM over ProbeRows did not match expected results.
E   EXPECTED VALUE:
E   102
E
E   ACTUAL VALUE:
E   38
{code}
[jira] [Created] (IMPALA-10216) BufferPoolTest.WriteErrorBlacklistCompression is flaky on UBSAN builds
Sahil Takiar created IMPALA-10216:
----------------------------------
Summary: BufferPoolTest.WriteErrorBlacklistCompression is flaky on UBSAN builds
Key: IMPALA-10216
URL: https://issues.apache.org/jira/browse/IMPALA-10216
Project: IMPALA
Issue Type: Test
Reporter: Sahil Takiar

Only seen this once so far:

{code}
BufferPoolTest.WriteErrorBlacklistCompression
Error Message
Value of: FindPageInDir(pages[NO_ERROR_QUERY], error_dir) != NULL
  Actual: false
Expected: true
Stacktrace
Impala/be/src/runtime/bufferpool/buffer-pool-test.cc:1764
Value of: FindPageInDir(pages[NO_ERROR_QUERY], error_dir) != NULL
  Actual: false
Expected: true
{code}
[jira] [Created] (IMPALA-10214) Ozone support for file handle cache
Sahil Takiar created IMPALA-10214:
----------------------------------
Summary: Ozone support for file handle cache
Key: IMPALA-10214
URL: https://issues.apache.org/jira/browse/IMPALA-10214
Project: IMPALA
Issue Type: Sub-task
Reporter: Sahil Takiar

This is dependent on the Ozone input streams supporting the {{CanUnbuffer}} interface first (last I checked, the input streams don't implement the interface).
[jira] [Resolved] (IMPALA-10202) Enable file handle cache for ABFS files
[ https://issues.apache.org/jira/browse/IMPALA-10202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sahil Takiar resolved IMPALA-10202.
-----------------------------------
Fix Version/s: Impala 4.0
Resolution: Fixed

Enable file handle cache for ABFS files

Key: IMPALA-10202
URL: https://issues.apache.org/jira/browse/IMPALA-10202
Project: IMPALA
Issue Type: Improvement
Reporter: Sahil Takiar
Assignee: Sahil Takiar
Priority: Major
Fix For: Impala 4.0

We should enable the file handle cache for ABFS; we have already seen it benefit jobs that read data from S3A.
[jira] [Resolved] (IMPALA-9606) ABFS reads should use hdfsPreadFully
[ https://issues.apache.org/jira/browse/IMPALA-9606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sahil Takiar resolved IMPALA-9606.
----------------------------------
Fix Version/s: Impala 4.0
Resolution: Fixed

ABFS reads should use hdfsPreadFully

Key: IMPALA-9606
URL: https://issues.apache.org/jira/browse/IMPALA-9606
Project: IMPALA
Issue Type: Bug
Components: Backend
Reporter: Sahil Takiar
Assignee: Sahil Takiar
Priority: Major
Fix For: Impala 4.0

In IMPALA-8525, hdfs preads were enabled by default when reading data from S3. IMPALA-8525 deferred enabling preads for ABFS because they didn't significantly improve performance. After some more investigation into the ABFS input streams, I think it is safe to use {{hdfsPreadFully}} for ABFS reads.

The ABFS client uses a different model for fetching data compared to S3A. Details are beyond the scope of this JIRA, but it is related to a feature in ABFS called "read-aheads". ABFS has logic to pre-fetch data it *thinks* will be required by the client. By default, it pre-fetches # cores * 4 MB of data. If the requested data exists in the client cache, it is read from the cache.

However, there is no real drawback to using {{hdfsPreadFully}} for ABFS reads. It's definitely safer, because while the current implementation of ABFS always returns the amount of requested data, only the {{hdfsPreadFully}} API makes that guarantee.
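The contract difference matters: a plain positional read may legally return fewer bytes than requested, so callers must loop, while a "pread fully" call either returns every requested byte or fails. A sketch of that looping contract in Python (a stand-in for any short-read-capable API, not libhdfs's actual implementation):

```python
def pread_fully(pread, offset, length):
    """Build the 'read fully' guarantee on top of a positional read that may
    return short. `pread(offset, length)` returns up to `length` bytes, or
    b"" at EOF."""
    chunks = []
    remaining = length
    while remaining > 0:
        chunk = pread(offset, remaining)
        if not chunk:  # hit EOF with bytes still outstanding: fail loudly
            raise EOFError("EOF with %d bytes still unread" % remaining)
        chunks.append(chunk)
        offset += len(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

data = b"0123456789"
short_pread = lambda off, n: data[off:off + min(n, 3)]  # at most 3 bytes per call
assert pread_fully(short_pread, 2, 6) == b"234567"
```

A client that assumes plain pread always returns the full amount works only by accident of the current stream implementation; the looping version is correct regardless.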
[jira] [Resolved] (IMPALA-3335) Allow single-node optimization with joins.
[ https://issues.apache.org/jira/browse/IMPALA-3335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sahil Takiar resolved IMPALA-3335.
----------------------------------
Fix Version/s: Impala 4.0
Resolution: Fixed

Allow single-node optimization with joins.

Key: IMPALA-3335
URL: https://issues.apache.org/jira/browse/IMPALA-3335
Project: IMPALA
Issue Type: Improvement
Components: Frontend
Affects Versions: Impala 2.5.0
Reporter: Alexander Behm
Assignee: Sahil Takiar
Priority: Minor
Labels: ramp-up
Fix For: Impala 4.0

Now that IMPALA-561 has been fixed, we can remove the workaround that disables our single-node optimization for any plan with joins. See MaxRowsProcessedVisitor.java:

{code}
} else if (caller instanceof HashJoinNode || caller instanceof NestedLoopJoinNode) {
  // Revisit when multiple scan nodes can be executed in a single fragment, IMPALA-561
  abort_ = true;
  return;
}
{code}
[jira] [Created] (IMPALA-10202) Enable file handle cache for ABFS files
Sahil Takiar created IMPALA-10202:
----------------------------------
Summary: Enable file handle cache for ABFS files
Key: IMPALA-10202
URL: https://issues.apache.org/jira/browse/IMPALA-10202
Project: IMPALA
Issue Type: Improvement
Reporter: Sahil Takiar
Assignee: Sahil Takiar

We should enable the file handle cache for ABFS; we have already seen it benefit jobs that read data from S3A.
[jira] [Resolved] (IMPALA-8577) Crash during OpenSSLSocket.read
[ https://issues.apache.org/jira/browse/IMPALA-8577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sahil Takiar resolved IMPALA-8577.
----------------------------------
Fix Version/s: Impala 4.0
Resolution: Fixed

This was fixed a while ago. Impala has been using wildfly for communication with S3 for a while now and everything seems stable.

Crash during OpenSSLSocket.read

Key: IMPALA-8577
URL: https://issues.apache.org/jira/browse/IMPALA-8577
Project: IMPALA
Issue Type: Bug
Components: Backend
Affects Versions: Impala 3.3.0
Reporter: David Rorke
Assignee: Sahil Takiar
Priority: Major
Fix For: Impala 4.0
Attachments: 5ca78771-ad78-4a29-31f88aa6-9bfac38c.dmp, hs_err_pid6313.log, impalad.drorke-impala-r5d2xl2-30w-17.vpc.cloudera.com.impala.log.ERROR.20190521-103105.6313, impalad.drorke-impala-r5d2xl2-30w-17.vpc.cloudera.com.impala.log.INFO.20190521-103105.6313

Impalad crashed while running a TPC-DS 10 TB run against S3. Excerpt from the stack trace (hs_err log file attached with more complete stack):

{noformat}
Stack: [0x7f3d095bc000,0x7f3d09dbc000], sp=0x7f3d09db9050, free space=8180k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C [impalad+0x2528a33] tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int)+0x133
C [impalad+0x2528e0f] tcmalloc::ThreadCache::Scavenge()+0x3f
C [impalad+0x266468a] operator delete(void*)+0x32a
C [libcrypto.so.10+0x6e70d] CRYPTO_free+0x1d
J 5709 org.wildfly.openssl.SSLImpl.freeBIO0(J)V (0 bytes) @ 0x7f3d4dadf9f9 [0x7f3d4dadf940+0xb9]
J 5708 C1 org.wildfly.openssl.SSLImpl.freeBIO(J)V (5 bytes) @ 0x7f3d4dfd0dfc [0x7f3d4dfd0d80+0x7c]
J 5158 C1 org.wildfly.openssl.OpenSSLEngine.shutdown()V (78 bytes) @ 0x7f3d4de4fe2c [0x7f3d4de4f720+0x70c]
J 5758 C1 org.wildfly.openssl.OpenSSLEngine.closeInbound()V (51 bytes) @ 0x7f3d4de419cc [0x7f3d4de417c0+0x20c]
J 2994 C2 org.wildfly.openssl.OpenSSLEngine.unwrap(Ljava/nio/ByteBuffer;[Ljava/nio/ByteBuffer;II)Ljavax/net/ssl/SSLEngineResult; (892 bytes) @ 0x7f3d4db8da34 [0x7f3d4db8c900+0x1134]
J 3161 C2 org.wildfly.openssl.OpenSSLSocket.read([BII)I (810 bytes) @ 0x7f3d4dd64cb0 [0x7f3d4dd646c0+0x5f0]
J 5090 C2 com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.fillBuffer()I (97 bytes) @ 0x7f3d4ddd9ee0 [0x7f3d4ddd9e40+0xa0]
J 5846 C1 com.amazonaws.thirdparty.apache.http.impl.BHttpConnectionBase.fillInputBuffer(I)I (48 bytes) @ 0x7f3d4d7acb24 [0x7f3d4d7ac7a0+0x384]
J 5845 C1 com.amazonaws.thirdparty.apache.http.impl.BHttpConnectionBase.isStale()Z (31 bytes) @ 0x7f3d4d7ad49c [0x7f3d4d7ad220+0x27c]
{noformat}

The crash may not be easy to reproduce. I've run this test multiple times and only crashed once. I have a core file if needed.
[jira] [Created] (IMPALA-10191) Test impalad_coordinator and impalad_executor in Dockerized tests
Sahil Takiar created IMPALA-10191:
----------------------------------
Summary: Test impalad_coordinator and impalad_executor in Dockerized tests
Key: IMPALA-10191
URL: https://issues.apache.org/jira/browse/IMPALA-10191
Project: IMPALA
Issue Type: Improvement
Reporter: Sahil Takiar

Currently only the impalad_coord_exec images are tested in the Dockerized tests; it would be nice to get test coverage for the other images as well.
[jira] [Created] (IMPALA-10190) Remove impalad_coord_exec Dockerfile
Sahil Takiar created IMPALA-10190:
----------------------------------
Summary: Remove impalad_coord_exec Dockerfile
Key: IMPALA-10190
URL: https://issues.apache.org/jira/browse/IMPALA-10190
Project: IMPALA
Issue Type: Improvement
Reporter: Sahil Takiar

The impalad_coord_exec Dockerfile is a bit redundant because it basically contains all the same dependencies as the impalad_coordinator Dockerfile. The only difference between the two files is that the startup flags for impalad_coordinator contain {{is_executor=false}}. We should find a way to remove {{impalad_coord_exec}} altogether.
[jira] [Resolved] (IMPALA-10170) Data race on Webserver::UrlHandler::is_on_nav_bar_
[ https://issues.apache.org/jira/browse/IMPALA-10170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sahil Takiar resolved IMPALA-10170.
-----------------------------------
Fix Version/s: Impala 4.0
Resolution: Fixed

Data race on Webserver::UrlHandler::is_on_nav_bar_

Key: IMPALA-10170
URL: https://issues.apache.org/jira/browse/IMPALA-10170
Project: IMPALA
Issue Type: Sub-task
Components: Backend
Reporter: Sahil Takiar
Assignee: Sahil Takiar
Priority: Major
Fix For: Impala 4.0

{code}
WARNING: ThreadSanitizer: data race (pid=31102)
  Read of size 1 at 0x7b2c0006e3b0 by thread T42:
    #0 impala::Webserver::UrlHandler::is_on_nav_bar() const /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/webserver.h:152:41 (impalad+0x256ff39)
    #1 impala::Webserver::GetCommonJson(rapidjson::GenericDocument, rapidjson::MemoryPoolAllocator, rapidjson::CrtAllocator>*, sq_connection const*, kudu::WebCallbackRegistry::WebRequest const&) /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/webserver.cc:527:24 (impalad+0x256be13)
    #2 impala::Webserver::RenderUrlWithTemplate(sq_connection const*, kudu::WebCallbackRegistry::WebRequest const&, impala::Webserver::UrlHandler const&, std::__cxx11::basic_stringstream, std::allocator >*, impala::ContentType*) /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/webserver.cc:816:3 (impalad+0x256e882)
    #3 impala::Webserver::BeginRequestCallback(sq_connection*, sq_request_info*) /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/webserver.cc:714:5 (impalad+0x256cfbb)
    #4 impala::Webserver::BeginRequestCallbackStatic(sq_connection*) /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/webserver.cc:556:20 (impalad+0x256ba98)
    #5 handle_request (impalad+0x2582d59)

  Previous write of size 2 at 0x7b2c0006e3b0 by main thread:
    #0 impala::Webserver::UrlHandler::UrlHandler(impala::Webserver::UrlHandler&&) /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/webserver.h:141:9 (impalad+0x2570dbc)
    #1 std::pair, std::allocator > const, impala::Webserver::UrlHandler>::pair std::char_traits, std::allocator >, impala::Webserver::UrlHandler, true>(std::pair, std::allocator >, impala::Webserver::UrlHandler>&&) /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/stl_pair.h:362:4 (impalad+0x25738b3)
    #2 void __gnu_cxx::new_allocator std::char_traits, std::allocator > const, impala::Webserver::UrlHandler> > >::construct std::char_traits, std::allocator > const, impala::Webserver::UrlHandler>, std::pair std::char_traits, std::allocator >, impala::Webserver::UrlHandler> >(std::pair std::char_traits, std::allocator > const, impala::Webserver::UrlHandler>*, std::pair std::char_traits, std::allocator >, impala::Webserver::UrlHandler>&&) /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/ext/new_allocator.h:136:23 (impalad+0x2573848)
    #3 void std::allocator_traits std::char_traits, std::allocator > const, impala::Webserver::UrlHandler> > > >::construct std::char_traits, std::allocator > const, impala::Webserver::UrlHandler>, std::pair std::char_traits, std::allocator >, impala::Webserver::UrlHandler> > >(std::allocator std::char_traits, std::allocator > const, impala::Webserver::UrlHandler> > >&, std::pair, std::allocator > const, impala::Webserver::UrlHandler>*, std::pair, std::allocator >, impala::Webserver::UrlHandler>&&) /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/alloc_traits.h:475:8 (impalad+0x25737f1)
    #4 void std::_Rb_tree std::char_traits, std::allocator >, std::pair, std::allocator > const, impala::Webserver::UrlHandler>, std::_Select1st std::char_traits, std::allocator > const, impala::Webserver::UrlHandler> >, std::less std::char_traits, std::allocator > >, std::allocator std::char_traits, std::allocator > const, impala::Webserver::UrlHandler> > > >::_M_construct_node std::char_traits, std::allocator >, impala::Webserver::UrlHandler> > >(std::_Rb_tree_node std::char_traits, std::allocator > const, impala::Webserver::UrlHandler> >*, std::pair std::char_traits, std::allocator >, impala::Webserver::UrlHandler>&&) /data/j
{code}
[jira] [Resolved] (IMPALA-9046) Profile counter that indicates if a process or JVM pause occurred
[ https://issues.apache.org/jira/browse/IMPALA-9046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sahil Takiar resolved IMPALA-9046.
----------------------------------
Fix Version/s: Impala 4.0
Resolution: Fixed

Profile counter that indicates if a process or JVM pause occurred

Key: IMPALA-9046
URL: https://issues.apache.org/jira/browse/IMPALA-9046
Project: IMPALA
Issue Type: Improvement
Components: Backend
Affects Versions: Impala 3.3.0
Reporter: Tim Armstrong
Assignee: Sahil Takiar
Priority: Major
Fix For: Impala 4.0

We currently log a message if a process or JVM pause is detected, but there's no indication in the query profile if it got affected. I suggest that we should:
* Add metrics that indicate the number and duration of detected pauses
* Add counters to the backend profile for the deltas in those metrics
[jira] [Resolved] (IMPALA-9229) Link failed and retried runtime profiles
[ https://issues.apache.org/jira/browse/IMPALA-9229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sahil Takiar resolved IMPALA-9229.
----------------------------------
Fix Version/s: Impala 4.0
Resolution: Fixed

Marking as resolved. The Web UI improvements are tracked in a separate JIRA.

Link failed and retried runtime profiles

Key: IMPALA-9229
URL: https://issues.apache.org/jira/browse/IMPALA-9229
Project: IMPALA
Issue Type: Sub-task
Reporter: Sahil Takiar
Assignee: Sahil Takiar
Priority: Critical
Fix For: Impala 4.0

There should be a way for clients to link the runtime profiles from failed queries to all retry attempts (whether successful or not), and vice versa. There are a few ways to do this:
* The simplest way would be to include the query id of the retried query in the runtime profile of the failed query, and vice versa; users could then manually create a chain of runtime profiles in order to fetch all failed / successful attempts
* Extend TGetRuntimeProfileReq to include an option to fetch all runtime profiles for the given query id + all retry attempts (or add a new Thrift call TGetRetryQueryIds(TQueryId) which returns a list of retried ids for a given query id)
* The Impala debug UI should include a simple way to view all the runtime profiles of a query (the failed attempts + all retry attempts) side by side (perhaps the query_profile?query_id profile should include tabs to easily switch between the runtime profiles of each attempt)

These are not mutually exclusive, and it might be good to stage these changes.
[jira] [Created] (IMPALA-10180) Add average size of fetch requests in runtime profile
Sahil Takiar created IMPALA-10180:
----------------------------------
Summary: Add average size of fetch requests in runtime profile
Key: IMPALA-10180
URL: https://issues.apache.org/jira/browse/IMPALA-10180
Project: IMPALA
Issue Type: Improvement
Components: Clients
Reporter: Sahil Takiar

For queries with a high {{ClientFetchWaitTimer}}, it would be useful to know the average number of rows requested by the client per fetch request. This can help determine if setting a higher fetch size would improve fetch performance in cases where the network RTT between the client and Impala is high.
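The proposed counter is simple arithmetic over values the coordinator already tracks. A minimal sketch, assuming hypothetical counter names (these are not actual Impala profile counters):

```python
def avg_rows_per_fetch(rows_fetched, fetch_requests):
    # Average number of rows returned per client fetch RPC; with a high
    # network RTT, a low average means most of the wall time is round trips.
    if fetch_requests == 0:
        return 0.0
    return rows_fetched / fetch_requests

# A client that pulls 100,000 rows in 1,000 RPCs averages 100 rows per request;
# raising the client fetch size cuts the number of round trips proportionally.
assert avg_rows_per_fetch(100_000, 1_000) == 100.0
```

Exposing the ratio directly in the profile saves the user from having to correlate rows-returned and RPC-count counters by hand.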
[jira] [Created] (IMPALA-10170) Data race on Webserver::UrlHandler::is_on_nav_bar_
Sahil Takiar created IMPALA-10170:
----------------------------------
Summary: Data race on Webserver::UrlHandler::is_on_nav_bar_
Key: IMPALA-10170
URL: https://issues.apache.org/jira/browse/IMPALA-10170
Project: IMPALA
Issue Type: Sub-task
Components: Backend
Reporter: Sahil Takiar
Assignee: Sahil Takiar

{code}
WARNING: ThreadSanitizer: data race (pid=31102)
  Read of size 1 at 0x7b2c0006e3b0 by thread T42:
    #0 impala::Webserver::UrlHandler::is_on_nav_bar() const /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/webserver.h:152:41 (impalad+0x256ff39)
    #1 impala::Webserver::GetCommonJson(rapidjson::GenericDocument, rapidjson::MemoryPoolAllocator, rapidjson::CrtAllocator>*, sq_connection const*, kudu::WebCallbackRegistry::WebRequest const&) /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/webserver.cc:527:24 (impalad+0x256be13)
    #2 impala::Webserver::RenderUrlWithTemplate(sq_connection const*, kudu::WebCallbackRegistry::WebRequest const&, impala::Webserver::UrlHandler const&, std::__cxx11::basic_stringstream, std::allocator >*, impala::ContentType*) /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/webserver.cc:816:3 (impalad+0x256e882)
    #3 impala::Webserver::BeginRequestCallback(sq_connection*, sq_request_info*) /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/webserver.cc:714:5 (impalad+0x256cfbb)
    #4 impala::Webserver::BeginRequestCallbackStatic(sq_connection*) /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/webserver.cc:556:20 (impalad+0x256ba98)
    #5 handle_request (impalad+0x2582d59)

  Previous write of size 2 at 0x7b2c0006e3b0 by main thread:
    #0 impala::Webserver::UrlHandler::UrlHandler(impala::Webserver::UrlHandler&&) /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/webserver.h:141:9 (impalad+0x2570dbc)
    #1 std::pair, std::allocator > const, impala::Webserver::UrlHandler>::pair, std::allocator >, impala::Webserver::UrlHandler, true>(std::pair, std::allocator >, impala::Webserver::UrlHandler>&&) /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/stl_pair.h:362:4 (impalad+0x25738b3)
    #2 void __gnu_cxx::new_allocator, std::allocator > const, impala::Webserver::UrlHandler> > >::construct, std::allocator > const, impala::Webserver::UrlHandler>, std::pair, std::allocator >, impala::Webserver::UrlHandler> >(std::pair, std::allocator > const, impala::Webserver::UrlHandler>*, std::pair, std::allocator >, impala::Webserver::UrlHandler>&&) /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/ext/new_allocator.h:136:23 (impalad+0x2573848)
    #3 void std::allocator_traits, std::allocator > const, impala::Webserver::UrlHandler> > > >::construct, std::allocator > const, impala::Webserver::UrlHandler>, std::pair, std::allocator >, impala::Webserver::UrlHandler> >(std::allocator, std::allocator > const, impala::Webserver::UrlHandler> > >&, std::pair, std::allocator > const, impala::Webserver::UrlHandler>*, std::pair, std::allocator >, impala::Webserver::UrlHandler>&&) /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/alloc_traits.h:475:8 (impalad+0x25737f1)
    #4 void std::_Rb_tree, std::allocator >, std::pair, std::allocator > const, impala::Webserver::UrlHandler>, std::_Select1st, std::allocator > const, impala::Webserver::UrlHandler> >, std::less, std::allocator > >, std::allocator, std::allocator > const, impala::Webserver::UrlHandler> > >::_M_construct_node, std::allocator >, impala::Webserver::UrlHandler> >(std::_Rb_tree_node, std::allocator > const, impala::Webserver::UrlHandler> >*, std::pair, std::allocator >, impala::Webserver::UrlHandler>&&) /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/stl_tree.h:626:8 (impalad+0x257369b)
    #5 std::_Rb_tree_node, std::allocator > const, impala::Webserver::UrlHandler> >* std::_Rb_tree, std::allocator >, std::pair, std::allocator > const, impala::Webserver::UrlHandler>, std::_Select1st, std::allocator > const, impala::Webserver::UrlHandler> >, std::less, std::allocator > >, std::allocator, std::allocator > const, impala::Webserver::UrlHandler> > >::_M_create_node, std::allocator >, impala::Webserver::UrlHandler> >(std::pair, std::allocator >, impala::Webserver::UrlHandler>&&) /data/jenkins/worksp
{code}
[jira] [Resolved] (IMPALA-9740) TSAN data race in hdfs-bulk-ops
[ https://issues.apache.org/jira/browse/IMPALA-9740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-9740. -- Fix Version/s: Impala 4.0 Resolution: Fixed > TSAN data race in hdfs-bulk-ops > --- > > Key: IMPALA-9740 > URL: https://issues.apache.org/jira/browse/IMPALA-9740 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > Fix For: Impala 4.0 > > > hdfs-bulk-ops usage of a local connection cache (HdfsFsCache::HdfsFsMap) has > a data race: > {code:java} > WARNING: ThreadSanitizer: data race (pid=23205) > Write of size 8 at 0x7b24005642d8 by thread T47: > #0 > boost::unordered::detail::table_impl const, hdfs_internal*> >, std::string, hdfs_internal*, > boost::hash, std::equal_to > > >::add_node(boost::unordered::detail::node_constructor const, hdfs_internal*> > > >&, unsigned long) > /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/unordered/detail/unique.hpp:329:26 > (impalad+0x1f93832) > #1 > std::pair const, hdfs_internal*> > >, bool> > boost::unordered::detail::table_impl const, hdfs_internal*> >, std::string, hdfs_internal*, > boost::hash, std::equal_to > > >::emplace_impl >(std::string > const&, std::pair&&) > /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/unordered/detail/unique.hpp:420:41 > (impalad+0x1f933ed) > #2 > std::pair const, hdfs_internal*> > >, bool> > boost::unordered::detail::table_impl const, hdfs_internal*> >, std::string, hdfs_internal*, > boost::hash, std::equal_to > > >::emplace > >(std::pair&&) > /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/unordered/detail/unique.hpp:384:20 > (impalad+0x1f932d1) > #3 > std::pair const, hdfs_internal*> > >, bool> > boost::unordered::unordered_map boost::hash, std::equal_to, > std::allocator > > >::emplace > >(std::pair&&) > 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/unordered/unordered_map.hpp:241:27 > (impalad+0x1f93238) > #4 boost::unordered::unordered_map boost::hash, std::equal_to, > std::allocator > > >::insert(std::pair&&) > /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/unordered/unordered_map.hpp:390:26 > (impalad+0x1f92038) > #5 impala::HdfsFsCache::GetConnection(std::string const&, > hdfs_internal**, boost::unordered::unordered_map boost::hash, std::equal_to, > std::allocator > >*) > /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/runtime/hdfs-fs-cache.cc:115:18 > (impalad+0x1f916b3) > #6 impala::HdfsOp::Execute() const > /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/hdfs-bulk-ops.cc:84:55 > (impalad+0x23444d5) > #7 HdfsThreadPoolHelper(int, impala::HdfsOp const&) > /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/hdfs-bulk-ops.cc:137:6 > (impalad+0x2344ea9) > #8 boost::detail::function::void_function_invoker2 impala::HdfsOp const&), void, int, impala::HdfsOp > const&>::invoke(boost::detail::function::function_buffer&, int, > impala::HdfsOp const&) > /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/function/function_template.hpp:118:11 > (impalad+0x2345e80) > #9 boost::function2::operator()(int, > impala::HdfsOp const&) const > /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/function/function_template.hpp:770:14 > (impalad+0x1f883be) > #10 impala::ThreadPool::WorkerThread(int) > /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/thread-pool.h:166:9 > (impalad+0x1f874e5) > #11 boost::_mfi::mf1, > int>::operator()(impala::ThreadPool*, int) const > 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/mem_fn_template.hpp:165:29 > (impalad+0x1f87b7d) > #12 void > boost::_bi::list2*>, > boost::_bi::value >::operator() impala::ThreadPool, int>, > boost::_bi::list0>(boost::_bi::type, boost::_mfi::mf1 impala::ThreadPool, int>&, boost::_bi::list0&, int) > /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/bind.hpp:319:9 > (impalad+0x1f87abc) > #13 boost::_bi::bind_t impala::ThreadPool, int>, > boost::_bi::list2*>, > boost::_bi::value > >::operator()() > /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/bind.hpp:1222:16 > (impalad+0x1f87a23) > #14 > boost::detail::function::voi
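The trace above shows multiple thread-pool workers inserting into a shared connection map with no synchronization. A minimal sketch of the obvious remedy, serializing access to the cache with a mutex, is below; the names are illustrative (a plain `int` stands in for the `hdfsFS` handle) and this is not Impala's actual API:

```cpp
#include <mutex>
#include <string>
#include <unordered_map>

using Connection = int;  // stand-in for hdfsFS

class FsConnectionCache {
 public:
  // Returns the cached connection for `namenode`, creating one on a miss.
  // All map reads and writes happen under the lock, so concurrent workers
  // no longer race on the bucket array.
  Connection GetConnection(const std::string& namenode) {
    std::lock_guard<std::mutex> lock(mu_);
    auto it = cache_.find(namenode);
    if (it != cache_.end()) return it->second;
    Connection conn = Connect(namenode);
    cache_.emplace(namenode, conn);
    return conn;
  }

 private:
  // Placeholder for the real hdfsConnect() call.
  Connection Connect(const std::string&) { return ++next_handle_; }

  std::mutex mu_;
  std::unordered_map<std::string, Connection> cache_;
  Connection next_handle_ = 0;
};
```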
[jira] [Created] (IMPALA-10160) kernel_stack_watchdog cannot print user stack
Sahil Takiar created IMPALA-10160: - Summary: kernel_stack_watchdog cannot print user stack Key: IMPALA-10160 URL: https://issues.apache.org/jira/browse/IMPALA-10160 Project: IMPALA Issue Type: Bug Components: Backend Reporter: Sahil Takiar I've seen this a few times now. The kernel_stack_watchdog is used in a few places in the KRPC code; it prints out the kernel and user stacks whenever a thread is stuck in some method call for too long. The issue is that the user stack does not get printed: {code} W0908 17:15:00.365721 6605 kernel_stack_watchdog.cc:198] Thread 6612 stuck at outbound_call.cc:273 for 120ms: Kernel stack: [] futex_wait_queue_me+0xc6/0x130 [] futex_wait+0x17b/0x280 [] do_futex+0x106/0x5a0 [] SyS_futex+0x80/0x180 [] system_call_fastpath+0x16/0x1b [] 0x User stack: {code} The log indicates that the signal handler responsible for capturing the thread's user stack is unavailable. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IMPALA-10154) Data race on coord_backend_id
Sahil Takiar created IMPALA-10154: - Summary: Data race on coord_backend_id Key: IMPALA-10154 URL: https://issues.apache.org/jira/browse/IMPALA-10154 Project: IMPALA Issue Type: Bug Reporter: Sahil Takiar Assignee: Wenzhe Zhou TSAN is reporting a data race on {{ExecQueryFInstancesRequestPB#coord_backend_id}} {code:java} WARNING: ThreadSanitizer: data race (pid=15392) Write of size 8 at 0x7b74001104a8 by thread T83 (mutexes: write M871582266043729400): #0 impala::ExecQueryFInstancesRequestPB::mutable_coord_backend_id() /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/generated-sources/gen-cpp/control_service.pb.h:6625:23 (impalad+0x20c03ed) #1 impala::QueryState::Init(impala::ExecQueryFInstancesRequestPB const*, impala::TExecPlanFragmentInfo const&) /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/runtime/query-state.cc:216:21 (impalad+0x20b8b29) #2 impala::QueryExecMgr::StartQuery(impala::ExecQueryFInstancesRequestPB const*, impala::TQueryCtx const&, impala::TExecPlanFragmentInfo const&) /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/runtime/query-exec-mgr.cc:80:23 (impalad+0x20acb59) #3 impala::ControlService::ExecQueryFInstances(impala::ExecQueryFInstancesRequestPB const*, impala::ExecQueryFInstancesResponsePB*, kudu::rpc::RpcContext*) /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/service/control-service.cc:157:66 (impalad+0x22a621d) #4 impala::ControlServiceIf::ControlServiceIf(scoped_refptr const&, scoped_refptr const&)::$_1::operator()(google::protobuf::Message const*, google::protobuf::Message*, kudu::rpc::RpcContext*) const /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/generated-sources/gen-cpp/control_service.service.cc:70:13 (impalad+0x23622a4) #5 std::_Function_handler const&, scoped_refptr const&)::$_1>::_M_invoke(std::_Any_data const&, google::protobuf::Message const*&&, google::protobuf::Message*&&, kudu::rpc::RpcContext*&&) 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/std_function.h:316:2 (impalad+0x23620ed) #6 std::function::operator()(google::protobuf::Message const*, google::protobuf::Message*, kudu::rpc::RpcContext*) const /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/std_function.h:706:14 (impalad+0x2a4a453) #7 kudu::rpc::GeneratedServiceIf::Handle(kudu::rpc::InboundCall*) /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/kudu/rpc/service_if.cc:139:3 (impalad+0x2a49efe) #8 impala::ImpalaServicePool::RunThread() /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/rpc/impala-service-pool.cc:272:15 (impalad+0x2011a12) #9 boost::_mfi::mf0::operator()(impala::ImpalaServicePool*) const /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/bind/mem_fn_template.hpp:49:29 (impalad+0x2017a16) #10 void boost::_bi::list1 >::operator(), boost::_bi::list0>(boost::_bi::type, boost::_mfi::mf0&, boost::_bi::list0&, int) /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/bind/bind.hpp:259:9 (impalad+0x201796a) #11 boost::_bi::bind_t, boost::_bi::list1 > >::operator()() /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/bind/bind.hpp:1222:16 (impalad+0x20178f3) #12 boost::detail::function::void_function_obj_invoker0, boost::_bi::list1 > >, void>::invoke(boost::detail::function::function_buffer&) /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:159:11 
(impalad+0x20176e9) #13 boost::function0::operator()() const /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:770:14 (impalad+0x1f666f1) #14 impala::Thread::SuperviseThread(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&, boost::function, impala::ThreadDebugInfo const*, impala::Promise*) /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/thread.cc:360:3 (impalad+0x252644b) #15 void boost::_bi::list5, std::allocator > >, boost::_bi::value, std::allocator > >, boost::_bi::value >, boost::_bi::value, boost::_bi::value*> >::operator(), std::allocator > const&, std::__c
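The write in the trace comes from {{QueryState::Init}} calling {{mutable_coord_backend_id()}} on a request protobuf that another thread may still be reading. One way to remove the shared write, sketched below, is to copy the field into state owned by the query and treat the request itself as read-only; the struct names are stand-ins for the generated protobuf types, and this is an illustration of the idea rather than the actual patch:

```cpp
#include <cstdint>

// Stand-ins for the protobuf-generated types.
struct UniqueIdPB {
  int64_t hi = 0;
  int64_t lo = 0;
};

struct ExecQueryFInstancesRequest {
  UniqueIdPB coord_backend_id;
};

class QueryState {
 public:
  // Takes a private copy of the field instead of mutating the shared
  // request, so no cross-thread write on the request remains.
  void Init(const ExecQueryFInstancesRequest& req) {
    coord_backend_id_ = req.coord_backend_id;
  }
  const UniqueIdPB& coord_backend_id() const { return coord_backend_id_; }

 private:
  UniqueIdPB coord_backend_id_;
};
```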
[jira] [Created] (IMPALA-10142) Add RPC sender tracing
Sahil Takiar created IMPALA-10142: - Summary: Add RPC sender tracing Key: IMPALA-10142 URL: https://issues.apache.org/jira/browse/IMPALA-10142 Project: IMPALA Issue Type: Improvement Reporter: Sahil Takiar We currently have RPC tracing on the receiver side, but not on the sender side. For slow RPCs, the logs print out the total amount of time spent sending the RPC plus the network time. Adding sender-side tracing will make this more granular and help determine where exactly in the stack the time is spent when sending RPCs. Combined with the trace logs on the receiver side, it should be much easier to reconstruct the timeline of a given slow RPC. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IMPALA-10141) Include aggregate TCP metrics in per-node profiles
Sahil Takiar created IMPALA-10141: - Summary: Include aggregate TCP metrics in per-node profiles Key: IMPALA-10141 URL: https://issues.apache.org/jira/browse/IMPALA-10141 Project: IMPALA Issue Type: Improvement Reporter: Sahil Takiar The /rpcz endpoint in the debug web UI includes a ton of useful TCP-level metrics per kRPC connection for all inbound / outbound connections. It would be useful to aggregate some of these metrics and put them in the per-node profiles. Since it is not currently possible to split these metrics out per query, they should be added at the per-host level. Furthermore, only metrics that can be sanely aggregated across all connections should be included. For example, tracking the number of retransmitted TCP packets across all connections for the duration of the query would be useful. TCP retransmissions should be rare and are typically indicative of network hardware issues or network congestion; having at least a high-level idea of the number of TCP retransmissions that occur during a query can drastically help determine whether the network is to blame for query slowness. -- This message was sent by Atlassian Jira (v8.3.4#803005)
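The "sanely aggregated" criterion can be shown with a small sketch: monotonic per-connection counters like retransmits sum cleanly across connections, while latency-style metrics such as RTT do not. The struct and field names below are illustrative assumptions, not kRPC's actual metric names:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical per-connection snapshot taken from /rpcz-style metrics.
struct ConnMetrics {
  int64_t tcp_retransmits;  // monotonic counter: safe to sum
  int64_t rtt_us;           // latency: summing would be meaningless
};

// Aggregate only the counters that sum sensibly across connections.
int64_t TotalRetransmits(const std::vector<ConnMetrics>& conns) {
  int64_t total = 0;
  for (const auto& c : conns) total += c.tcp_retransmits;
  return total;
}
```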
[jira] [Created] (IMPALA-10139) Slow RPC logs can be misleading
Sahil Takiar created IMPALA-10139: - Summary: Slow RPC logs can be misleading Key: IMPALA-10139 URL: https://issues.apache.org/jira/browse/IMPALA-10139 Project: IMPALA Issue Type: Improvement Reporter: Sahil Takiar The slow RPC logs added in IMPALA-9128 are based on the total time taken to successfully complete an RPC. The issue is that there are many reasons why an RPC might take a long time to complete. An RPC is considered complete only when the receiver has processed it. The problem is that, due to the client-driven back-pressure mechanism, it is entirely possible that the receiver does not process an RPC simply because {{KrpcDataStreamRecvr::SenderQueue::GetBatch}} just hasn't been called yet (it is called indirectly by {{ExchangeNode::GetNext}}). This can lead to a flood of slow RPC logs, even though the RPCs might not actually be slow themselves. What is worse, because of the back-pressure mechanism, slowness from the client (e.g. Hue users) will propagate across all nodes involved in the query. -- This message was sent by Atlassian Jira (v8.3.4#803005)
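The distinction drawn in IMPALA-10139, RPCs that are slow on the wire versus RPCs that merely waited for the consumer, can be sketched as a classification step that subtracts receiver-side queueing time before logging. The field names and thresholds here are illustrative assumptions, not Impala's actual counters:

```cpp
#include <cstdint>

// Hypothetical timing breakdown for one completed RPC.
struct RpcTiming {
  int64_t total_ms = 0;       // from send start until the RPC completed
  int64_t recvr_wait_ms = 0;  // time the batch sat waiting for GetBatch()
};

enum class RpcVerdict { kFast, kSlowNetwork, kBackPressured };

RpcVerdict ClassifyRpc(const RpcTiming& t, int64_t slow_threshold_ms) {
  if (t.total_ms < slow_threshold_ms) return RpcVerdict::kFast;
  // Subtract time spent waiting on the consumer: that delay is driven by
  // client-side back-pressure, not by the network or the RPC layer.
  if (t.total_ms - t.recvr_wait_ms < slow_threshold_ms) {
    return RpcVerdict::kBackPressured;
  }
  return RpcVerdict::kSlowNetwork;
}
```

Only the `kSlowNetwork` case would warrant a "slow RPC" log line; back-pressured RPCs could be logged separately or rate-limited.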
[jira] [Created] (IMPALA-10138) Add fragment instance id to RPC trace output
Sahil Takiar created IMPALA-10138: - Summary: Add fragment instance id to RPC trace output Key: IMPALA-10138 URL: https://issues.apache.org/jira/browse/IMPALA-10138 Project: IMPALA Issue Type: Improvement Reporter: Sahil Takiar The RPC traces added in IMPALA-9128 are hard to correlate to specific queries because the output does not include the fragment instance id. I'm not sure if this is actually possible in the current kRPC code, but it would be nice if the tracing output included the fragment instance id. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IMPALA-10137) Network Debugging / Supportability Improvements
Sahil Takiar created IMPALA-10137: - Summary: Network Debugging / Supportability Improvements Key: IMPALA-10137 URL: https://issues.apache.org/jira/browse/IMPALA-10137 Project: IMPALA Issue Type: Epic Reporter: Sahil Takiar There are various improvements Impala should make to improve debugging of network issues (e.g. slow RPCs, TCP retransmissions, etc.). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (IMPALA-10126) asf-master-core-s3 test_aggregation.TestWideAggregationQueries.test_many_grouping_columns failed
[ https://issues.apache.org/jira/browse/IMPALA-10126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-10126. --- Resolution: Duplicate Duplicate of IMPALA-9058 > asf-master-core-s3 > test_aggregation.TestWideAggregationQueries.test_many_grouping_columns failed > > > Key: IMPALA-10126 > URL: https://issues.apache.org/jira/browse/IMPALA-10126 > Project: IMPALA > Issue Type: Bug > Components: Backend >Reporter: Yongzhi Chen >Priority: Major > > query_test.test_aggregation.TestWideAggregationQueries.test_many_grouping_columns[protocol: > beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, > 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, > 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: > parquet/none] (from pytest) > {noformat} > Error Message > query_test/test_aggregation.py:453: in test_many_grouping_columns result > = self.execute_query(query, exec_option, table_format=table_format) > common/impala_test_suite.py:811: in wrapper return function(*args, > **kwargs) common/impala_test_suite.py:843: in execute_query return > self.__execute_query(self.client, query, query_options) > common/impala_test_suite.py:909: in __execute_query return > impalad_client.execute(query, user=user) common/impala_connection.py:205: in > execute return self.__beeswax_client.execute(sql_stmt, user=user) > beeswax/impala_beeswax.py:187: in execute handle = > self.__execute_query(query_string.strip(), user=user) > beeswax/impala_beeswax.py:365: in __execute_query > self.wait_for_finished(handle) beeswax/impala_beeswax.py:386: in > wait_for_finished raise ImpalaBeeswaxException("Query aborted:" + > error_log, None) E ImpalaBeeswaxException: ImpalaBeeswaxException: E > Query aborted:Disk I/O error on > impala-ec2-centos74-m5-4xlarge-ondemand-1129.vpc.cloudera.com:22001: Failed > to open HDFS file > 
s3a://impala-test-uswest2-1/test-warehouse/widetable_1000_cols_parquet/1f4ec08992b6e3f9-6fd9a17d_1482052561_data.0.parq > E Error(2): No such file or directory E Root cause: > ResourceNotFoundException: Requested resource not found (Service: > AmazonDynamoDBv2; Status Code: 400; Error Code: ResourceNotFoundException; > Request ID: 1HMMG39MJ9GP2JEENAUFVFDVA3VV4KQNSO5AEMVJF66Q9ASUAAJG) > Stacktrace > query_test/test_aggregation.py:453: in test_many_grouping_columns > result = self.execute_query(query, exec_option, table_format=table_format) > common/impala_test_suite.py:811: in wrapper > return function(*args, **kwargs) > common/impala_test_suite.py:843: in execute_query > return self.__execute_query(self.client, query, query_options) > common/impala_test_suite.py:909: in __execute_query > return impalad_client.execute(query, user=user) > common/impala_connection.py:205: in execute > return self.__beeswax_client.execute(sql_stmt, user=user) > beeswax/impala_beeswax.py:187: in execute > handle = self.__execute_query(query_string.strip(), user=user) > beeswax/impala_beeswax.py:365: in __execute_query > self.wait_for_finished(handle) > beeswax/impala_beeswax.py:386: in wait_for_finished > raise ImpalaBeeswaxException("Query aborted:" + error_log, None) > E ImpalaBeeswaxException: ImpalaBeeswaxException: > EQuery aborted:Disk I/O error on > impala-ec2-centos74-m5-4xlarge-ondemand-1129.vpc.cloudera.com:22001: Failed > to open HDFS file > s3a://impala-test-uswest2-1/test-warehouse/widetable_1000_cols_parquet/1f4ec08992b6e3f9-6fd9a17d_1482052561_data.0.parq > E Error(2): No such file or directory > E Root cause: ResourceNotFoundException: Requested resource not found > (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: > ResourceNotFoundException; Request ID: > 1HMMG39MJ9GP2JEENAUFVFDVA3VV4KQNSO5AEMVJF66Q9ASUAAJG) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (IMPALA-10128) AnalyzeDDLTest.TestCreateTableLikeFileOrc failed
[ https://issues.apache.org/jira/browse/IMPALA-10128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-10128. --- Resolution: Duplicate Looks like a duplicate of IMPALA-9351 > AnalyzeDDLTest.TestCreateTableLikeFileOrc failed > > > Key: IMPALA-10128 > URL: https://issues.apache.org/jira/browse/IMPALA-10128 > Project: IMPALA > Issue Type: Bug >Reporter: Yongzhi Chen >Priority: Major > > Parallel-all-tests: > In ubuntu-16.04-from-scratch, > org.apache.impala.analysis.AnalyzeDDLTest.TestCreateTableLikeFileOrc > failed with > Error during analysis: > org.apache.impala.common.AnalysisException: Cannot infer schema, path does > not exist: > hdfs://localhost:20500/test-warehouse/managed/complextypestbl_orc_def/base_001/bucket_0_0 > sql: > create table if not exists newtbl_DNE like orc > '/test-warehouse/managed/complextypestbl_orc_def/base_001/bucket_0_0' > Stacktrace > java.lang.AssertionError: > Error during analysis: > org.apache.impala.common.AnalysisException: Cannot infer schema, path does > not exist: > hdfs://localhost:20500/test-warehouse/managed/complextypestbl_orc_def/base_001/bucket_0_0 > sql: > create table if not exists newtbl_DNE like orc > '/test-warehouse/managed/complextypestbl_orc_def/base_001/bucket_0_0' > at org.junit.Assert.fail(Assert.java:88) > at > org.apache.impala.common.FrontendFixture.analyzeStmt(FrontendFixture.java:397) > at > org.apache.impala.common.FrontendTestBase.AnalyzesOk(FrontendTestBase.java:246) > at > org.apache.impala.common.FrontendTestBase.AnalyzesOk(FrontendTestBase.java:186) > at > org.apache.impala.analysis.AnalyzeDDLTest.TestCreateTableLikeFileOrc(AnalyzeDDLTest.java:2027) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:272) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:236) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:386) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:323) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:143) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (IMPALA-10123) asf-master-core-tsan load data error
[ https://issues.apache.org/jira/browse/IMPALA-10123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar closed IMPALA-10123. - Resolution: Duplicate I think this is a duplicate of IMPALA-10129. The underlying error was in the impalad.ERROR logs for data load. > asf-master-core-tsan load data error > > > Key: IMPALA-10123 > URL: https://issues.apache.org/jira/browse/IMPALA-10123 > Project: IMPALA > Issue Type: Bug >Reporter: Yongzhi Chen >Priority: Major > > The load data failed in asf-master-core-tsan two builds in a row: > 19:32:54 16:32:54 Error executing impala SQL: > /data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/logs/data_loading/sql/functional/invalidate-functional-query-exhaustive-impala-generated.sql > See: > /data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/logs/data_loading/sql/functional/invalidate-functional-query-exhaustive-impala-generated.sql.log > In the log, it shows: > Encounter errors before parsing any queries. > Traceback (most recent call last): > File > "/data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/bin/load-data.py", > line 202, in exec_impala_query_from_file > impala_client.connect() > File > "/data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/tests/beeswax/impala_beeswax.py", > line 162, in connect > raise ImpalaBeeswaxException(self.__build_error_message(e), e) > ImpalaBeeswaxException: ImpalaBeeswaxException: > INNER EXCEPTION: > MESSAGE: Could not connect to localhost:21000 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IMPALA-10129) Data race in MemTracker::GetTopNQueriesAndUpdatePoolStats
Sahil Takiar created IMPALA-10129: - Summary: Data race in MemTracker::GetTopNQueriesAndUpdatePoolStats Key: IMPALA-10129 URL: https://issues.apache.org/jira/browse/IMPALA-10129 Project: IMPALA Issue Type: Bug Components: Backend Reporter: Sahil Takiar Assignee: Qifan Chen TSAN is reporting a data race in {{MemTracker::GetTopNQueriesAndUpdatePoolStats}} {code} WARNING: ThreadSanitizer: data race (pid=6436) Read of size 1 at 0x7b480017aaa8 by thread T320 (mutexes: write M861448892003377216, write M862574791910219632, write M623321199144890016, write M1054540811927503496): #0 impala::MemTracker::GetTopNQueriesAndUpdatePoolStats(std::priority_queue >, std::greater >&, int, impala::TPoolStats&) /data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/be/src/runtime/mem-tracker.cc:453:19 (impalad+0x20b13b1) #1 impala::MemTracker::UpdatePoolStatsForQueries(int, impala::TPoolStats&) /data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/be/src/runtime/mem-tracker.cc:432:3 (impalad+0x20b123d) #2 impala::AdmissionController::PoolStats::UpdateMemTrackerStats() /data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/be/src/scheduling/admission-controller.cc:1642:14 (impalad+0x21c9d10) #3 impala::AdmissionController::AddPoolUpdates(std::vector >*) /data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/be/src/scheduling/admission-controller.cc:1662:18 (impalad+0x21c7053) #4 impala::AdmissionController::UpdatePoolStats(std::map, std::allocator >, impala::TTopicDelta, std::less, std::allocator > >, std::allocator, std::allocator > const, impala::TTopicDelta> > > const&, std::vector >*) /data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/be/src/scheduling/admission-controller.cc:1355:5 (impalad+0x21c6d7d) #5 impala::AdmissionController::Init()::$_4::operator()(std::map, std::allocator >, impala::TTopicDelta, std::less, std::allocator > >, std::allocator, std::allocator > const, impala::TTopicDelta> > > const&, std::vector >*) const 
/data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/be/src/scheduling/admission-controller.cc:643:45 (impalad+0x21ce0e1) #6 boost::detail::function::void_function_obj_invoker2, std::allocator >, impala::TTopicDelta, std::less, std::allocator > >, std::allocator, std::allocator > const, impala::TTopicDelta> > > const&, std::vector >*>::invoke(boost::detail::function::function_buffer&, std::map, std::allocator >, impala::TTopicDelta, std::less, std::allocator > >, std::allocator, std::allocator > const, impala::TTopicDelta> > > const&, std::vector >*) /data/jenkins/workspace/impala-asf-master-core-tsan/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:159:11 (impalad+0x21cdf2c) #7 boost::function2, std::allocator >, impala::TTopicDelta, std::less, std::allocator > >, std::allocator, std::allocator > const, impala::TTopicDelta> > > const&, std::vector >*>::operator()(std::map, std::allocator >, impala::TTopicDelta, std::less, std::allocator > >, std::allocator, std::allocator > const, impala::TTopicDelta> > > const&, std::vector >*) const /data/jenkins/workspace/impala-asf-master-core-tsan/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:770:14 (impalad+0x23fa960) #8 impala::StatestoreSubscriber::UpdateState(std::map, std::allocator >, impala::TTopicDelta, std::less, std::allocator > >, std::allocator, std::allocator > const, impala::TTopicDelta> > > const&, impala::TUniqueId const&, std::vector >*, bool*) /data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/be/src/statestore/statestore-subscriber.cc:471:7 (impalad+0x23f7899) #9 impala::StatestoreSubscriberThriftIf::UpdateState(impala::TUpdateStateResponse&, impala::TUpdateStateRequest const&) /data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/be/src/statestore/statestore-subscriber.cc:110:18 (impalad+0x23fabbf) #10 
impala::StatestoreSubscriberProcessor::process_UpdateState(int, apache::thrift::protocol::TProtocol*, apache::thrift::protocol::TProtocol*, void*) /data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/be/generated-sources/gen-cpp/StatestoreSubscriber.cpp:543:13 (impalad+0x29adba4) #11 impala::StatestoreSubscriberProcessor::dispatchCall(apache::thrift::protocol::TProtocol*, apache::thrift::protocol::TProtocol*, std::__cxx11::basic_string, std::allocator > const&, int, void*) /data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/be/generated-sources/gen-cpp/StatestoreSubscriber.cpp:516:3 (impalad+0x29ad982) #12 apache::thrift::TDispatchProcessor::process(boost::shared_ptr, boost::shared_ptr, void*) /data/jenkins/workspace/impala-asf-master-core-
[jira] [Resolved] (IMPALA-10030) Remove unneeded jars from fe/pom.xml
[ https://issues.apache.org/jira/browse/IMPALA-10030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-10030. --- Fix Version/s: Impala 4.0 Resolution: Fixed > Remove unneeded jars from fe/pom.xml > > > Key: IMPALA-10030 > URL: https://issues.apache.org/jira/browse/IMPALA-10030 > Project: IMPALA > Issue Type: Sub-task >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > Fix For: Impala 4.0 > > > There are several jars dependencies that are (1) not needed, (2) can easily > be removed, (3) can be converted to test dependencies, or (4) pull in > unnecessary transitive dependencies. > Removing all these jar dependencies can help decrease the size of Impala > Docker images. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IMPALA-10117) Skip calls to FsPermissionCache for blob stores
Sahil Takiar created IMPALA-10117: - Summary: Skip calls to FsPermissionCache for blob stores Key: IMPALA-10117 URL: https://issues.apache.org/jira/browse/IMPALA-10117 Project: IMPALA Issue Type: Improvement Reporter: Sahil Takiar The {{FsPermissionCache}} is described as: {code:java} /** * Simple non-thread-safe cache for resolved file permissions. This allows * pre-caching permissions by listing the status of all files within a directory, * and then using that cache to avoid round trips to the FileSystem for later * queries of those paths. */ {code} I confirmed, and {{FsPermissionCache#precacheChildrenOf}} is actually called for data stored on S3. The issue is that {{FsPermissionCache#getPermissions}} is called inside {{HdfsTable#getAvailableAccessLevel}}, which is skipped for S3. So all the cached metadata is not used. The problem is that {{precacheChildrenOf}} calls {{getFileStatus}} for all files, which results in a bunch of unnecessary metadata operations to S3 + a bunch of cached metadata that is never used. {{precacheChildrenOf}} is actually only invoked in the specific scenario described below: {code} // Only preload permissions if the number of partitions to be added is // large (3x) relative to the number of existing partitions. This covers // two common cases: // // 1) initial load of a table (no existing partition metadata) // 2) ALTER TABLE RECOVER PARTITIONS after creating a table pointing to // an already-existing partition directory tree // // Without this heuristic, we would end up using a "listStatus" call to // potentially fetch a bunch of irrelevant information about existing // partitions when we only want to know about a small number of newly-added // partitions. {code} Regardless, skipping the call to {{precacheChildrenOf}} for blob stores should (1) improve table loading time for S3 backed tables, and (2) decrease catalogd memory requirements when loading a bunch of tables stored on S3. 
-- This message was sent by Atlassian Jira (v8.3.4#803005)
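As a rough illustration of the change proposed above, here is a minimal Python sketch of a permission cache that skips pre-caching for blob-store paths. All names here ({{is_blob_store}}, {{precache_children_of}}) are hypothetical stand-ins for the Java code, not Impala's actual API:

```python
# Hypothetical sketch: skip permission pre-caching for blob stores, since the
# cached entries are never consulted for those paths anyway.
BLOB_STORE_SCHEMES = {"s3a", "abfs", "gs"}

def is_blob_store(path):
    """Return True if the path's scheme belongs to an object store."""
    scheme = path.split("://", 1)[0] if "://" in path else "hdfs"
    return scheme in BLOB_STORE_SCHEMES

class FsPermissionCache:
    def __init__(self):
        self.cache = {}
        self.list_calls = 0

    def precache_children_of(self, dir_path, list_status_fn):
        # The proposed change: do nothing for blob stores, avoiding the
        # getFileStatus round trips whose results would never be read.
        if is_blob_store(dir_path):
            return
        self.list_calls += 1
        for child, perms in list_status_fn(dir_path):
            self.cache[child] = perms
```

With this guard, an S3 table load performs no listing work and caches nothing, while an HDFS path behaves as before.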
[jira] [Resolved] (IMPALA-10073) Create shaded dependency for S3A and aws-java-sdk-bundle
[ https://issues.apache.org/jira/browse/IMPALA-10073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-10073. --- Fix Version/s: Impala 4.0 Resolution: Fixed > Create shaded dependency for S3A and aws-java-sdk-bundle > > > Key: IMPALA-10073 > URL: https://issues.apache.org/jira/browse/IMPALA-10073 > Project: IMPALA > Issue Type: Sub-task >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > Fix For: Impala 4.0 > > > One of the largest dependencies in Impala Docker containers is the > aws-java-sdk-bundle jar. One way to decrease the size of this dependency is > to apply a similar technique used for the hive-exec shaded jar: > [https://github.com/apache/impala/blob/master/shaded-deps/pom.xml] > The aws-java-sdk-bundle contains SDKs for all AWS services, even though > Impala-S3A only requires a few of the more basic SDKs. > IMPALA-10028 and HADOOP-17197 both discuss this a bit as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (IMPALA-8547) get_json_object fails to get value for numeric key
[ https://issues.apache.org/jira/browse/IMPALA-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-8547. -- Fix Version/s: Impala 4.0 Resolution: Fixed > get_json_object fails to get value for numeric key > -- > > Key: IMPALA-8547 > URL: https://issues.apache.org/jira/browse/IMPALA-8547 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 3.1.0 >Reporter: Eugene Zimichev >Assignee: Eugene Zimichev >Priority: Minor > Labels: built-in-function > Fix For: Impala 4.0 > > > {code:java} > select get_json_object('{"1": 5}', '$.1'); > {code} > returns error: > > {code:java} > "Expected key at position 2" > {code} > > I guess it's caused by using function FindEndOfIdentifier that expects first > symbol of key to be a letter. > Hive version of get_json_object works fine in this case. -- This message was sent by Atlassian Jira (v8.3.4#803005)
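A toy model of the fix in Python: the path scanner should accept keys whose first character is a digit, rather than requiring an identifier start. This is an illustrative re-implementation, not Impala's actual C++ parser:

```python
import json
import re

def get_json_object(doc, path):
    """Extract a value from a JSON document using a $.a.b style path.

    Accepts any run of characters other than '.' or '[' as a key, so a
    numeric key such as '1' in '$.1' parses fine.
    """
    if not path.startswith("$"):
        raise ValueError("path must start with $")
    obj = json.loads(doc)
    for key in re.findall(r"\.([^.\[]+)", path):
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj
```

Under this key-matching rule the failing query from the report, {{get_json_object('{"1": 5}', '$.1')}}, resolves to 5 instead of raising an error.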
[jira] [Created] (IMPALA-10085) Table level stats are not honored when partition has corrupt stats
Sahil Takiar created IMPALA-10085: - Summary: Table level stats are not honored when partition has corrupt stats Key: IMPALA-10085 URL: https://issues.apache.org/jira/browse/IMPALA-10085 Project: IMPALA Issue Type: Sub-task Reporter: Sahil Takiar This is more of an edge case of IMPALA-9744, but when any partition in a table has corrupt stats, the table-level stats will not be honored. On the other hand, if a table just has missing stats, the table-level stats will be honored. Given a partitioned table with the following partitions and their row counts:
{code:java}
[localhost:21000] default> show partitions part_test;
Query: show partitions part_test
+---------+---------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
| partcol | #Rows   | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats | Location                                                  |
+---------+---------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
| 1       | -1      | 1      | 10B  | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/part_test/partcol=1 |
| 2       | -438290 | 1      | 6B   | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/part_test/partcol=2 |
| 3       | 3       | 1      | 6B   | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/part_test/partcol=3 |
| Total   | 100100  | 3      | 22B  | 0B           |                   |        |                   |                                                           |
+---------+---------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
{code}
The query {{explain select * from part_test order by col limit 10}} will cause {{HdfsScanNode#getStatsNumRows}} to return 5. 
Given the following set of partitions with different row counts than above:
{code}
+---------+--------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
| partcol | #Rows  | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats | Location                                                  |
+---------+--------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
| 1       | -1     | 1      | 10B  | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/part_test/partcol=1 |
| 2       | -1     | 1      | 6B   | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/part_test/partcol=2 |
| 3       | 3      | 1      | 6B   | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/part_test/partcol=3 |
| Total   | 100100 | 3      | 22B  | 0B           |                   |        |                   |                                                           |
+---------+--------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
{code}
The same method returns 100100. -- This message was sent by Atlassian Jira (v8.3.4#803005)
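A hedged sketch of the decision logic described above: table-level stats are honored only when no partition has a corrupt row count (negative and not the -1 "missing" sentinel); otherwise the planner falls back to per-partition numbers. The {{estimate_fn}} callback standing in for file-size-based estimation is hypothetical, not Impala's actual code:

```python
MISSING = -1  # sentinel for "no stats"; any other negative value is corrupt

def stats_num_rows(partition_rows, table_rows, estimate_fn):
    """Mimic the honored/not-honored behavior for table-level row counts."""
    has_corrupt = any(r < 0 and r != MISSING for r in partition_rows)
    if not has_corrupt and table_rows >= 0:
        return table_rows  # only missing stats: table-level count is trusted
    # Corrupt stats present: ignore the table-level count and estimate each
    # bad partition individually (e.g. from file sizes).
    return sum(r if r >= 0 else estimate_fn(i)
               for i, r in enumerate(partition_rows))
```

With a corrupt partition the table-level 100100 is dropped in favor of per-partition values; with only missing stats it is returned unchanged.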
[jira] [Created] (IMPALA-10084) Display the number of estimated rows for a table
Sahil Takiar created IMPALA-10084: - Summary: Display the number of estimated rows for a table Key: IMPALA-10084 URL: https://issues.apache.org/jira/browse/IMPALA-10084 Project: IMPALA Issue Type: Sub-task Reporter: Sahil Takiar AFAICT, there is no way to determine the number of rows estimated for a table when row counts have been estimated via file size:
{code:java}
[localhost:21000] default> create table test (col int);
[localhost:21000] default> insert into table test values (1), (2), (3), (4), (5);
[localhost:21000] default> show table stats test;
+-------+--------+------+--------------+-------------------+--------+-------------------+--------------------------------------------+
| #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats | Location                                   |
+-------+--------+------+--------------+-------------------+--------+-------------------+--------------------------------------------+
| -1    | 1      | 10B  | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/test |
+-------+--------+------+--------------+-------------------+--------+-------------------+--------------------------------------------+
[localhost:21000] default> explain select * from test order by col limit 10;
+------------------------------------------------------------------------------------+
| Explain String                                                                     |
+------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=8.00KB Threads=3                         |
| Per-Host Resource Estimates: Memory=32MB                                           |
| WARNING: The following tables are missing relevant table and/or column statistics. |
| default.test                                                                       |
|                                                                                    |
| PLAN-ROOT SINK                                                                     |
| |                                                                                  |
| 02:MERGING-EXCHANGE [UNPARTITIONED]                                                |
| |  order by: col ASC                                                               |
| |  limit: 10                                                                       |
| |                                                                                  |
| 01:TOP-N [LIMIT=10]                                                                |
| |  order by: col ASC                                                               |
| |  row-size=4B cardinality=3                                                       |
| |                                                                                  |
| 00:SCAN HDFS [default.test]                                                        |
|    HDFS partitions=1/1 files=1 size=10B                                            |
|    row-size=4B cardinality=3                                                       |
+------------------------------------------------------------------------------------+
[localhost:21000] default> set explain_level=3;
[localhost:21000] default> explain select * from test order by col limit 10;
+------------------------------------------------------------------------------------+
| Explain String                                                                     |
+------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=8.00KB Threads=3                         |
| Per-Host Resource Estimates: Memory=32MB                                           |
| WARNING: The following tables are missing relevant table and/or column statistics. |
| default.test                                                                       |
| Analyzed query: SELECT * FROM `default`.test ORDER BY col ASC LIMIT CAST(10 AS     |
| TINYINT)                                                                           |
|                                                                                    |
| F01:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1                              |
| Per-Host Resources: mem-estimate=16.00KB mem-reservation=0B thread-reservation=1   |
| PLAN-ROOT SINK                                                                     |
| |  output exprs: col                                                               |
| |  mem-estimate=0B mem-reservation=0B thread-reservation=0                         |
| |                                                                                  |
| 02:MERGING-EXCHANGE [UNPARTITIONED]
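The {{cardinality=3}} shown for a 10-byte file with {{row-size=4B}} in the explain output above is consistent with a simple size-based guess. A minimal sketch of such an estimate (not necessarily Impala's exact formula):

```python
import math

def estimate_rows_from_size(file_bytes, row_size_bytes):
    """Guess a row count from total file size and estimated row width."""
    if row_size_bytes <= 0:
        return -1  # unknown: cannot estimate without a row width
    return max(1, math.ceil(file_bytes / row_size_bytes))
```

Surfacing whether a displayed row count came from such an estimate (versus computed stats) is exactly the gap this JIRA describes.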
[jira] [Created] (IMPALA-10083) Improve row count estimates when stats are not available
Sahil Takiar created IMPALA-10083: - Summary: Improve row count estimates when stats are not available Key: IMPALA-10083 URL: https://issues.apache.org/jira/browse/IMPALA-10083 Project: IMPALA Issue Type: Improvement Components: Frontend Reporter: Sahil Takiar There are various improvements that we can make to estimate row count stats even if stats are not available for a table. There are various factors to consider here:
* Handling for partitioned vs. non-partitioned tables
** Handling for partitioned tables can be a bit tricky if the table is in a mixed state - some partitions have row counts while others don't
* Interoperability with other systems such as Hive and Spark
* Users can run alter table statements to manually set the value of the row count
This JIRA will be used to track the various improvements via sub-tasks. -- This message was sent by Atlassian Jira (v8.3.4#803005)
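For the mixed-state case above, one plausible approach is to extrapolate missing partition row counts from the rows-per-byte ratio of the partitions that do have stats. A hypothetical sketch, not a committed design:

```python
def extrapolate_row_count(partitions):
    """partitions: list of (num_rows, size_bytes); num_rows == -1 if missing.

    Returns an estimated total row count, or -1 if nothing to extrapolate from.
    """
    known = [(r, s) for r, s in partitions if r >= 0]
    known_rows = sum(r for r, _ in known)
    known_bytes = sum(s for _, s in known)
    if not known or known_bytes == 0:
        return -1
    rows_per_byte = known_rows / known_bytes
    missing_bytes = sum(s for r, s in partitions if r < 0)
    return known_rows + round(rows_per_byte * missing_bytes)
```

This would interact with the interoperability concern: other engines may write partial stats, so the extrapolation must tolerate any mix of known and missing partitions.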
[jira] [Resolved] (IMPALA-10029) Strip debug symbols from libkudu_client and libstdc++ binaries
[ https://issues.apache.org/jira/browse/IMPALA-10029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-10029. --- Fix Version/s: Impala 4.0 Resolution: Fixed > Strip debug symbols from libkudu_client and libstdc++ binaries > -- > > Key: IMPALA-10029 > URL: https://issues.apache.org/jira/browse/IMPALA-10029 > Project: IMPALA > Issue Type: Sub-task >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > Fix For: Impala 4.0 > > > IMPALA-8425 strips the debug symbols of the impalad binary. libkudu_client.so > and libstdc++ also take up a non-trivial amount of space in the Docker > containers, so we should strip debug symbols from them as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IMPALA-10073) Create shaded dependency for S3A and aws-java-sdk-bundle
Sahil Takiar created IMPALA-10073: - Summary: Create shaded dependency for S3A and aws-java-sdk-bundle Key: IMPALA-10073 URL: https://issues.apache.org/jira/browse/IMPALA-10073 Project: IMPALA Issue Type: Sub-task Reporter: Sahil Takiar Assignee: Sahil Takiar One of the largest dependencies in Impala Docker containers is the aws-java-sdk-bundle jar. One way to decrease the size of this dependency is to apply a similar technique used for the hive-exec shaded jar: [https://github.com/apache/impala/blob/master/shaded-deps/pom.xml] The aws-java-sdk-bundle contains SDKs for all AWS services, even though Impala-S3A only requires a few of the more basic SDKs. IMPALA-10028 and HADOOP-17197 both discuss this a bit as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IMPALA-10072) Data load failures in ubuntu-16.04-from-scratch
Sahil Takiar created IMPALA-10072: - Summary: Data load failures in ubuntu-16.04-from-scratch Key: IMPALA-10072 URL: https://issues.apache.org/jira/browse/IMPALA-10072 Project: IMPALA Issue Type: Bug Reporter: Sahil Takiar Seems like there are consistent data load failures on several unrelated patches: [https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/11627/] [https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/11629/|https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/11629/#showFailuresLink] [https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/11631/|https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/11631/#showFailuresLink] [https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/11633/|https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/11633/#showFailuresLink] [https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/11635/] Almost all seem to be failing with an error like this: {code:java} 02:06:32 Loading nested parquet data (logging to /home/ubuntu/Impala/logs/data_loading/load-nested.log)... 02:08:06 FAILED (Took: 1 min 34 sec) 02:08:06 '/home/ubuntu/Impala/testdata/bin/load_nested.py -t tpch_nested_parquet -f parquet/none' failed. 
Tail of log: 02:08:06at javax.security.auth.Subject.doAs(Subject.java:422) 02:08:06at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) 02:08:06at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2882) 02:08:06 02:08:06at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:220) 02:08:06at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:1361) 02:08:06at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:732) 02:08:06at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:756) 02:08:06at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:756) 02:08:06at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:756) 02:08:06at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:471) 02:08:06... 17 more 02:08:06 Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /test-warehouse/tpch_nested_parquet.db/.hive-staging_hive_2020-08-11_02-07-45_902_3668710725192096563-193/_task_tmp.-ext-10004/_tmp.00_3 could only be written to 0 of the 1 minReplication nodes. There are 3 datanode(s) running and 3 node(s) are excluded in this operation. 
02:08:06at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2259) 02:08:06at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294) 02:08:06at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2773) 02:08:06at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:879) 02:08:06at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:583) 02:08:06at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) 02:08:06at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528) 02:08:06at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070) 02:08:06at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:985) 02:08:06at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:913) 02:08:06at java.security.AccessController.doPrivileged(Native Method) 02:08:06at javax.security.auth.Subject.doAs(Subject.java:422) 02:08:06at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) 02:08:06at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2882) 02:08:06 02:08:06at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1553) 02:08:06at org.apache.hadoop.ipc.Client.call(Client.java:1499) 02:08:06at org.apache.hadoop.ipc.Client.call(Client.java:1396) 02:08:06at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233) 02:08:06at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118) 02:08:06at com.sun.proxy.$Proxy15.addBlock(Unknown Source) 02:08:06at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:520) 02:08:06at 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 02:08:06at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 02:08:06at sun.
[jira] [Created] (IMPALA-10068) Split out jars for catalog Docker images
Sahil Takiar created IMPALA-10068: - Summary: Split out jars for catalog Docker images Key: IMPALA-10068 URL: https://issues.apache.org/jira/browse/IMPALA-10068 Project: IMPALA Issue Type: Sub-task Reporter: Sahil Takiar One way to decrease the size of the catalogd images is to only include jar files necessary to run the catalogd. Currently, all Impala coordinator / executor jars are included in the catalogd images, which is not necessary. This can be fixed by splitting the fe/ Java code into fe/ and catalogd/ folders (and perhaps a java-common/ folder). This is probably a nice improvement to make regardless, because the fe and catalogd code should really be in separate Maven modules. By separating all catalogd code into a separate Maven module, it should be easy to modify the Docker build scripts to only copy in the catalogd jars for the catalogd Impala image. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IMPALA-10067) TestImpalaShell.test_large_sql is flaky
Sahil Takiar created IMPALA-10067: - Summary: TestImpalaShell.test_large_sql is flaky Key: IMPALA-10067 URL: https://issues.apache.org/jira/browse/IMPALA-10067 Project: IMPALA Issue Type: Test Components: Clients Reporter: Sahil Takiar {code:java} shell.test_shell_commandline.TestImpalaShell.test_large_sql[table_format_and_file_extension: ('textfile', '.txt') | protocol: hs2-http] {code} This test failed recently in a pre-commit job: https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/2920/testReport/junit/shell.test_shell_commandline/TestImpalaShell/test_large_sql_table_format_and_file_extensiontextfile_txt_protocol__hs2_http_/ {code} Error Message shell/test_shell_commandline.py:882: in test_large_sql assert actual_time_s <= time_limit_s, ( E AssertionError: It took 20.2972311974 seconds to execute the query. Time limit is 20 seconds. E assert 20.297231197357178 <= 20 Stacktrace shell/test_shell_commandline.py:882: in test_large_sql assert actual_time_s <= time_limit_s, ( E AssertionError: It took 20.2972311974 seconds to execute the query. Time limit is 20 seconds. E assert 20.297231197357178 <= 20 {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
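The failure above is a hard wall-clock limit missed by about 0.3 seconds. One common way to de-flake such assertions is to time several attempts and assert on the best run, so a single scheduling hiccup cannot fail the test. A generic sketch, not the actual test's fix:

```python
import time

def best_elapsed(fn, attempts=3):
    """Run fn several times and return the fastest wall-clock duration."""
    best = float("inf")
    for _ in range(attempts):
        start = time.monotonic()
        fn()
        best = min(best, time.monotonic() - start)
    return best
```

An assertion like `assert best_elapsed(run_query) <= time_limit_s` tolerates one slow run out of three, at the cost of a longer test.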
[jira] [Resolved] (IMPALA-9478) Runtime profiles should indicate if custom UDFs are being used
[ https://issues.apache.org/jira/browse/IMPALA-9478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-9478. -- Fix Version/s: Impala 4.0 Resolution: Fixed > Runtime profiles should indicate if custom UDFs are being used > -- > > Key: IMPALA-9478 > URL: https://issues.apache.org/jira/browse/IMPALA-9478 > Project: IMPALA > Issue Type: Task >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > Fix For: Impala 4.0 > > > Custom UDFs can include arbitrary user code that can cause query slowdown. In > order to better diagnose queries with UDF issues, it is first important to > know when a query is even using a UDF. > Runtime profiles should list out any custom UDFs used by the query, as well > as the library the UDF is loaded from. > For Java UDFs, the full classname of the UDF would be good as well. > Any other metadata associated with the UDF might be useful as well. There are > a few things that are printed by {{show functions}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IMPALA-10049) Include RPC call_id in slow RPC logs
Sahil Takiar created IMPALA-10049: - Summary: Include RPC call_id in slow RPC logs Key: IMPALA-10049 URL: https://issues.apache.org/jira/browse/IMPALA-10049 Project: IMPALA Issue Type: Improvement Reporter: Sahil Takiar The current code for logging slow RPCs on the sender side looks something like this:
{code:java}
template <typename ResponsePBType>
void KrpcDataStreamSender::Channel::LogSlowRpc(
    const char* rpc_name, int64_t total_time_ns, const ResponsePBType& resp) {
  int64_t network_time_ns = total_time_ns - resp.receiver_latency_ns();
  LOG(INFO) << "Slow " << rpc_name << " RPC to " << address_
            << " (fragment_instance_id=" << PrintId(fragment_instance_id_) << "): "
            << "took " << PrettyPrinter::Print(total_time_ns, TUnit::TIME_NS) << ". "
            << "Receiver time: "
            << PrettyPrinter::Print(resp.receiver_latency_ns(), TUnit::TIME_NS)
            << " Network time: " << PrettyPrinter::Print(network_time_ns, TUnit::TIME_NS);
}

void KrpcDataStreamSender::Channel::LogSlowFailedRpc(
    const char* rpc_name, int64_t total_time_ns, const kudu::Status& err) {
  LOG(INFO) << "Slow " << rpc_name << " RPC to " << address_
            << " (fragment_instance_id=" << PrintId(fragment_instance_id_) << "): "
            << "took " << PrettyPrinter::Print(total_time_ns, TUnit::TIME_NS) << ". "
            << "Error: " << err.ToString();
}
{code}
It would be nice to include the call_id in the logs as well so that RPCs can more easily be traced. The RPC call_id is dumped in RPC traces on the receiver side, as well as in the /rpcz output on the debug ui. -- This message was sent by Atlassian Jira (v8.3.4#803005)
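A Python sketch of the requested log format: thread the RPC call_id into the slow-RPC line so a sender-side entry can be matched against the receiver's /rpcz trace. Field names mirror the C++ snippet; the {{call_id}} parameter is the proposed addition, and this is an illustration, not the actual patch:

```python
def format_slow_rpc(rpc_name, address, fragment_instance_id, call_id,
                    total_time_ns, receiver_latency_ns):
    """Build a slow-RPC log line that includes the RPC call_id."""
    network_time_ns = total_time_ns - receiver_latency_ns
    return ("Slow %s RPC to %s (fragment_instance_id=%s, call_id=%s): "
            "took %dns. Receiver time: %dns Network time: %dns"
            % (rpc_name, address, fragment_instance_id, call_id,
               total_time_ns, receiver_latency_ns, network_time_ns))
```

With call_id present, a single slow RPC can be traced end to end across the sender log, the receiver trace, and /rpcz.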
[jira] [Created] (IMPALA-10035) send_bytes_per_sec in /rpcz json stats can be negative
Sahil Takiar created IMPALA-10035: - Summary: send_bytes_per_sec in /rpcz json stats can be negative Key: IMPALA-10035 URL: https://issues.apache.org/jira/browse/IMPALA-10035 Project: IMPALA Issue Type: Bug Reporter: Sahil Takiar {code:java} { "remote_ip": "10.196.10.165:27000", "num_calls_in_flight": 0, "outbound_queue_size": 0, "socket_stats": { "rtt": 91, "rttvar": 9, "snd_cwnd": 10, "total_retrans": 0, "pacing_rate": 4294967295, "max_pacing_rate": 4294967295, "bytes_acked": 7995867431, "bytes_received": 17908351, "segs_out": 1186603, "segs_in": 927339, "send_queue_bytes": 0, "receive_queue_bytes": 0, "send_bytes_per_sec": -694198066 }, "calls_in_flight": [] }, {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
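A negative {{send_bytes_per_sec}} like the one above typically comes from rate math over counters that can reset, wrap, or be sampled out of order. A hedged sketch of a guarded rate computation (illustrative, not the kernel's or KRPC's actual calculation):

```python
def bytes_per_sec(prev_bytes, cur_bytes, elapsed_sec):
    """Compute a send rate, clamping anomalies to zero instead of negative."""
    if elapsed_sec <= 0:
        return 0.0
    delta = cur_bytes - prev_bytes
    if delta < 0:  # counter reset or wrap: not a real negative throughput
        return 0.0
    return delta / elapsed_sec
```

Clamping keeps /rpcz-style JSON monotone-friendly: a reset sample reports 0 rather than a nonsense negative rate.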
[jira] [Created] (IMPALA-10030) Remove unneeded jars from fe/pom.xml
Sahil Takiar created IMPALA-10030: - Summary: Remove unneeded jars from fe/pom.xml Key: IMPALA-10030 URL: https://issues.apache.org/jira/browse/IMPALA-10030 Project: IMPALA Issue Type: Sub-task Reporter: Sahil Takiar Assignee: Sahil Takiar There are several jar dependencies that (1) are not needed, (2) can easily be removed, (3) can be converted to test dependencies, or (4) pull in unnecessary transitive dependencies. Removing all these jar dependencies can help decrease the size of Impala Docker images. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IMPALA-10029) Strip debug symbols from libkudu_client and libstdc++ binaries
Sahil Takiar created IMPALA-10029: - Summary: Strip debug symbols from libkudu_client and libstdc++ binaries Key: IMPALA-10029 URL: https://issues.apache.org/jira/browse/IMPALA-10029 Project: IMPALA Issue Type: Sub-task Reporter: Sahil Takiar Assignee: Sahil Takiar IMPALA-8425 strips the debug symbols of the impalad binary. libkudu_client.so and libstdc++ also take up a non-trivial amount of space in the Docker containers, so we should strip debug symbols from them as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IMPALA-10028) Additional optimizations of Impala docker container sizes
Sahil Takiar created IMPALA-10028: - Summary: Additional optimizations of Impala docker container sizes Key: IMPALA-10028 URL: https://issues.apache.org/jira/browse/IMPALA-10028 Project: IMPALA Issue Type: Improvement Reporter: Sahil Takiar Assignee: Sahil Takiar There are some more optimizations we can make to get the images to be even smaller. It looks like we may have regressed with regards to image size as well. IMPALA-8425 reports the images at ~700 MB. I just checked on a release build and they are currently 1.01 GB. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IMPALA-10016) Split jars for Impala executors and coordinators Docker images
Sahil Takiar created IMPALA-10016: - Summary: Split jars for Impala executors and coordinators Docker images Key: IMPALA-10016 URL: https://issues.apache.org/jira/browse/IMPALA-10016 Project: IMPALA Issue Type: Improvement Reporter: Sahil Takiar Assignee: Sahil Takiar Impala executors and coordinators currently share a common base image. The base image defines a set of jar files needed by either the coordinator or the executor. In order to reduce the image size, we should split out the jars into two categories: those necessary for the coordinator and those necessary for the executor. This should help reduce overall image size. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (IMPALA-9479) Include GC time in runtime profiles
[ https://issues.apache.org/jira/browse/IMPALA-9479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-9479. -- Resolution: Duplicate Closing as duplicate of IMPALA-9046 > Include GC time in runtime profiles > --- > > Key: IMPALA-9479 > URL: https://issues.apache.org/jira/browse/IMPALA-9479 > Project: IMPALA > Issue Type: Task >Reporter: Sahil Takiar >Priority: Major > > The JvmPauseMonitor prints out logs whenever it detects an excessive amount > of time being spent in GC. However, these log lines can often go unnoticed, so > it would be useful to include some GC-related information in the runtime > profiles. > This is useful for diagnosing: > * Issues with Java UDFs that spend a lot of time in GC > * GC issues on the Coordinator from the fe/ code > * Some S3 operations could potentially be GC intensive - e.g. S3A block > output stream > I'm not sure there is a way to track GC per query, since GC happens globally > inside the JVM. There are a few ways to get GC information into the profile: > * If the JvmPauseMonitor detects a GC pause it can insert a warning in the > profiles of all running queries > * JMX metrics can be used to detect how much time was spent in GC from when > a fragment began to when it ended -- This message was sent by Atlassian Jira (v8.3.4#803005)
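The second idea above (delta-sampling cumulative GC time around a fragment's lifetime) can be sketched generically. The sampler here is a hypothetical stand-in for a real JMX query such as a GarbageCollectorMXBean's cumulative collection time:

```python
def gc_time_for_fragment(sample_gc_millis, run_fragment):
    """Report GC time attributable to a fragment's execution window.

    sample_gc_millis: callable returning cumulative JVM GC time in ms
    (a stand-in for a JMX lookup). Note this is a global counter, so the
    delta over-attributes GC caused by concurrent queries.
    """
    start_gc = sample_gc_millis()
    run_fragment()
    return sample_gc_millis() - start_gc
```

As the report notes, GC is global to the JVM, so such a delta is an upper bound on the fragment's own GC cost, not an exact attribution.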
[jira] [Resolved] (IMPALA-8754) S3 with S3Guard tests encounter "ResourceNotFoundException" from DynamoDB
[ https://issues.apache.org/jira/browse/IMPALA-8754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-8754. -- Resolution: Duplicate Closing as a duplicate of IMPALA-9058. > S3 with S3Guard tests encounter "ResourceNotFoundException" from DynamoDB > - > > Key: IMPALA-8754 > URL: https://issues.apache.org/jira/browse/IMPALA-8754 > Project: IMPALA > Issue Type: Bug > Components: Infrastructure >Affects Versions: Impala 3.3.0 >Reporter: Joe McDonnell >Assignee: Joe McDonnell >Priority: Critical > Labels: broken-build, flaky > Attachments: load-tpch-core-impala-generated-kudu-none-none.sql.log > > > When running tests on s3 with s3guard, various tests can encounter the > following error coming from the DynamoDB: > {noformat} > EQuery aborted:Disk I/O error on > impala-ec2-centos74-m5-4xlarge-ondemand-02c8.vpc.cloudera.com:22002: Failed > to open HDFS file > s3a://impala-test-uswest2-1/test-warehouse/tpcds.store_sales_parquet/ss_sold_date_sk=2451718/6843d8a91fc5ae1d-88b2af4b0004_156969840_data.0.parq > E Error(2): No such file or directory > E Root cause: ResourceNotFoundException: Requested resource not found > (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: > ResourceNotFoundException; Request ID: > XXX){noformat} > Tests that have seen this (this is flaky): > * TestTpcdsQuery.test_tpcds_count > * TestHdfsFdCaching.test_caching_disabled_by_param > * TestMtDop.test_compute_stats > * TestScanRangeLengths.test_scan_ranges -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (IMPALA-9996) An S3 test failing with ResourceNotFoundException
[ https://issues.apache.org/jira/browse/IMPALA-9996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-9996. -- Resolution: Duplicate Looks like a duplicate of IMPALA-8754 and IMPALA-9058 > An S3 test failing with ResourceNotFoundException > - > > Key: IMPALA-9996 > URL: https://issues.apache.org/jira/browse/IMPALA-9996 > Project: IMPALA > Issue Type: Bug >Reporter: Fang-Yu Rao >Assignee: Sahil Takiar >Priority: Critical > Labels: broken-build, flaky > > In a recent S3 build, we have seen that > [test_tpcds_count|https://github.com/apache/impala/blob/master/tests/query_test/test_tpcds_queries.py#L52-L53] > > ([https://github.com/apache/impala/blob/master/testdata/workloads/tpcds/queries/count.test#L114-L119]) > failed with {{ResourceNotFoundException}}. > The issue may be related to IMPALA-9058. > The error message in {{impalad.INFO}} (under the directory of {{ee_tests}}) > is as follows. > {code:java} > I0722 10:31:44.524209 13047 coordinator.cc:684] ExecState: query > id=7d4f684028848784:ad2f6f0e > finstance=7d4f684028848784:ad2f6f0e0001 on > host=impala-ec2-centos74-m5-4xlarge-ondemand-1230.vpc.cloudera.com:22002 > (EXECUTING -> ERROR) status=Disk I/O error on > impala-ec2-centos74-m5-4xlarge-ondemand-1230.vpc.cloudera.com:22002: Failed > to open HDFS file > s3a://impala-test-uswest2-1/test-warehouse/tpcds.store_sales_parquet/ss_sold_date_sk=2451752/d245f3a054fd2c66-f7f705220004_1984874558_data.0.parq > Error(2): No such file or directory > Root cause: ResourceNotFoundException: Requested resource not found (Service: > AmazonDynamoDBv2; Status Code: 400; Error Code: ResourceNotFoundException; > Request ID: G9QMA17VTSIKOQK33V9TDGMF13VV4KQNSO5AEMVJF66Q9ASUAAJG) > {code} > Maybe [~stakiar] and [~joemcdonnell] could offer some insight into it. Thanks! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (IMPALA-9799) Flakiness in TestFetchFirst due to wrong results of get_num_in_flight_queries
[ https://issues.apache.org/jira/browse/IMPALA-9799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-9799. -- Fix Version/s: Impala 4.0 Resolution: Fixed > Flakiness in TestFetchFirst due to wrong results of get_num_in_flight_queries > - > > Key: IMPALA-9799 > URL: https://issues.apache.org/jira/browse/IMPALA-9799 > Project: IMPALA > Issue Type: Bug >Affects Versions: Impala 4.0 >Reporter: Quanlong Huang >Assignee: Sahil Takiar >Priority: Critical > Labels: broken-build > Fix For: Impala 4.0 > > > Saw two failures for this test in different jenkins jobs: > hs2.test_fetch_first.TestFetchFirst.test_query_stmts_v6 (from pytest) > Stacktrace: > {code:java} > /data/jenkins/workspace/impala-cdpd-master-core-asan/repos/Impala/tests/hs2/hs2_test_suite.py:63: > in add_session > lambda: fn(self)) > /data/jenkins/workspace/impala-cdpd-master-core-asan/repos/Impala/tests/hs2/hs2_test_suite.py:44: > in add_session_helper > fn() > /data/jenkins/workspace/impala-cdpd-master-core-asan/repos/Impala/tests/hs2/hs2_test_suite.py:63: > in > lambda: fn(self)) > /data/jenkins/workspace/impala-cdpd-master-core-asan/repos/Impala/tests/hs2/test_fetch_first.py:110: > in test_query_stmts_v6 > self.run_query_stmts_test() > /data/jenkins/workspace/impala-cdpd-master-core-asan/repos/Impala/tests/hs2/test_fetch_first.py:181: > in run_query_stmts_test > self.__test_invalid_result_caching("SELECT COUNT(*) FROM > functional.alltypes") > /data/jenkins/workspace/impala-cdpd-master-core-asan/repos/Impala/tests/hs2/test_fetch_first.py:63: > in __test_invalid_result_caching > assert 0 == impalad.get_num_in_flight_queries() > E assert 0 == 1 > E+ where 1 = >() > E+where > = > 0x6d25d10>.get_num_in_flight_queries{code} > hs2.test_fetch_first.TestFetchFirst.test_query_stmts_v6_with_result_spooling > (from pytest) > Stacktrace: > {code:java} > /data/jenkins/workspace/impala-cdpd-master-core-asan/repos/Impala/tests/hs2/hs2_test_suite.py:63: > in add_session > 
lambda: fn(self)) > /data/jenkins/workspace/impala-cdpd-master-core-asan/repos/Impala/tests/hs2/hs2_test_suite.py:44: > in add_session_helper > fn() > /data/jenkins/workspace/impala-cdpd-master-core-asan/repos/Impala/tests/hs2/hs2_test_suite.py:63: > in > lambda: fn(self)) > /data/jenkins/workspace/impala-cdpd-master-core-asan/repos/Impala/tests/hs2/test_fetch_first.py:120: > in test_query_stmts_v6_with_result_spooling > self.run_query_stmts_test({'spool_query_results': 'true'}) > /data/jenkins/workspace/impala-cdpd-master-core-asan/repos/Impala/tests/hs2/test_fetch_first.py:181: > in run_query_stmts_test > self.__test_invalid_result_caching("SELECT COUNT(*) FROM > functional.alltypes") > /data/jenkins/workspace/impala-cdpd-master-core-asan/repos/Impala/tests/hs2/test_fetch_first.py:63: > in __test_invalid_result_caching > assert 0 == impalad.get_num_in_flight_queries() > E assert 0 == 1 > E+ where 1 = >() > E+where > = > 0x81d4990>.get_num_in_flight_queries{code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (IMPALA-9953) Shell does not return all rows if a fetch times out in FINISHED state
[ https://issues.apache.org/jira/browse/IMPALA-9953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-9953. -- Fix Version/s: Impala 4.0 Resolution: Fixed > Shell does not return all rows if a fetch times out in FINISHED state > - > > Key: IMPALA-9953 > URL: https://issues.apache.org/jira/browse/IMPALA-9953 > Project: IMPALA > Issue Type: Bug > Components: Clients >Reporter: Tim Armstrong >Assignee: Sahil Takiar >Priority: Blocker > Labels: correctness > Fix For: Impala 4.0 > > > I noticed that if a fetch times out, impala-shell will stop returning rows > and close the query. It looks like this happens if the query transitions to > FINISHED state, then the fetch times out > I ran into this on an experimental branch where a sort deadlocked. I haven't > been able to repro on master yet but I thought I should report it. > The bug is here: > {noformat} > diff --git a/shell/impala_shell.py b/shell/impala_shell.py > index e0d802626..323aee6c9 100755 > --- a/shell/impala_shell.py > +++ b/shell/impala_shell.py > @@ -1182,8 +1182,7 @@ class ImpalaShell(cmd.Cmd, object): > > for rows in rows_fetched: ># IMPALA-4418: Break out of the loop to prevent printing an > unnecessary empty line. > - if len(rows) == 0: > -break > + if len(rows) == 0: continue >self.output_stream.write(rows) >num_rows += len(rows) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
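The diff above changes a `break` on an empty batch into `continue`. A toy Python model of why that matters: a fetch generator can legitimately yield an empty batch when a fetch times out while the query is in FINISHED state, with more rows still to come, so an empty batch must not be treated as end-of-stream. Illustrative, simplified from the shell's actual loop:

```python
def drain(batches):
    """Collect all rows from an iterable of row batches.

    An empty batch models a fetch timeout (no rows yet), not end-of-stream;
    the generator itself ends when the query is fully drained.
    """
    rows = []
    for batch in batches:
        if len(batch) == 0:
            continue  # timeout: keep fetching instead of closing the query
        rows.extend(batch)
    return rows
```

With `break` instead of `continue`, the rows after the empty batch would be silently dropped, which is the correctness bug the fix addresses.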
[jira] [Created] (IMPALA-9993) Improve get_json_object path specification format
Sahil Takiar created IMPALA-9993: Summary: Improve get_json_object path specification format Key: IMPALA-9993 URL: https://issues.apache.org/jira/browse/IMPALA-9993 Project: IMPALA Issue Type: Improvement Components: Backend Reporter: Sahil Takiar Filing as a follow up to IMPALA-8547 based on some of the discussion in [https://gerrit.cloudera.org/#/c/14905/] It seems most databases have a slightly different way of handling JSON data. The Hive / Impala behavior seems similar to MySQL in syntax (e.g. JSON_EXTRACT), although MySQL is much more restrictive about the path specification format. Postgres on the other hand has a slightly different syntax for path specification compared to MySQL / Hive / Impala, and is more permissive in what formats it allows. -- This message was sent by Atlassian Jira (v8.3.4#803005)
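To make the path-specification discussion concrete, here is a toy evaluator for the MySQL/Hive-style `$.key[index]` syntax mentioned above. This is purely illustrative (it is not Impala's get_json_object implementation, and real engines differ in exactly the ways the issue describes).

```python
import json
import re

def get_json_object(doc, path):
    """Toy evaluator for '$.key[index]' paths (illustrative only)."""
    if not path.startswith('$'):
        return None  # paths must be rooted at '$' in this syntax family
    cur = json.loads(doc)
    # Tokenize the path into .name and [index] steps.
    for name, idx in re.findall(r'\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]', path):
        try:
            cur = cur[name] if name else cur[int(idx)]
        except (KeyError, IndexError, TypeError):
            return None  # missing key / out-of-range index / wrong type
    return cur

assert get_json_object('{"a": {"b": [1, 2, 3]}}', '$.a.b[1]') == 2
assert get_json_object('{"a": 1}', '$.missing') is None
```

A stricter grammar (as in MySQL) would reject malformed paths up front instead of silently ignoring unmatched characters, which is one of the trade-offs under discussion.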
[jira] [Created] (IMPALA-9991) TestShellClient.test_fetch_size_result_spooling is flaky
Sahil Takiar created IMPALA-9991: Summary: TestShellClient.test_fetch_size_result_spooling is flaky Key: IMPALA-9991 URL: https://issues.apache.org/jira/browse/IMPALA-9991 Project: IMPALA Issue Type: Test Reporter: Sahil Takiar Assignee: Sahil Takiar shell.test_shell_client.TestShellClient.test_fetch_size_result_spooling[table_format_and_file_extension: ('parquet', '.parq') | protocol: hs2] (from pytest) h3. Error Message shell/test_shell_client.py:70: in test_fetch_size_result_spooling self.__fetch_rows(client.fetch(handle), num_rows / fetch_size, num_rows) shell/test_shell_client.py:80: in __fetch_rows for fetch_batch in fetch_batches: ../shell/impala_client.py:787: in fetch yield self._transpose(col_value_converters, resp.results.columns) E AttributeError: 'NoneType' object has no attribute 'columns' h3. Stacktrace shell/test_shell_client.py:70: in test_fetch_size_result_spooling self.__fetch_rows(client.fetch(handle), num_rows / fetch_size, num_rows) shell/test_shell_client.py:80: in __fetch_rows for fetch_batch in fetch_batches: ../shell/impala_client.py:787: in fetch yield self._transpose(col_value_converters, resp.results.columns) E AttributeError: 'NoneType' object has no attribute 'columns' h3. Standard Error Opened TCP connection to localhost:21050 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (IMPALA-9833) query_test.test_observability.TestQueryStates.test_error_query_state is flaky
[ https://issues.apache.org/jira/browse/IMPALA-9833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-9833. -- Fix Version/s: Impala 4.0 Resolution: Fixed Closing for now. We can re-open if the issue occurs again. > query_test.test_observability.TestQueryStates.test_error_query_state is flaky > - > > Key: IMPALA-9833 > URL: https://issues.apache.org/jira/browse/IMPALA-9833 > Project: IMPALA > Issue Type: Bug >Affects Versions: Impala 4.0 >Reporter: Xiaomeng Zhang >Assignee: Sahil Takiar >Priority: Major > Fix For: Impala 4.0 > > > [https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/2521/testReport/junit/query_test.test_observability/TestQueryStates/test_error_query_state/] > It seems the test could not get query profile after retries in 30s. > {code:java} > Stacktracequery_test/test_observability.py:777: in test_error_query_state > lambda: self.client.get_runtime_profile(handle)) > common/impala_test_suite.py:1120: in assert_eventually > count, timeout_s, error_msg_str)) > E Timeout: Check failed to return True after 30 tries and 30 seconds error > message: Query (id=fe45e8bfd138acd3:c67a3796) > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
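The `assert_eventually` referenced in the stack trace above is, in essence, a bounded polling loop. A hypothetical reimplementation (the real helper lives in common/impala_test_suite.py; this signature is an assumption):

```python
import time

def assert_eventually(check, tries=30, sleep_s=1.0, error_msg=None):
    """Poll check() until it returns True, else fail (hypothetical sketch)."""
    for _ in range(tries):
        if check():
            return
        time.sleep(sleep_s)
    suffix = " error message: %s" % error_msg if error_msg else ""
    raise AssertionError("Check failed to return True after %d tries and %d seconds%s"
                         % (tries, int(tries * sleep_s), suffix))

# Usage: wait for a condition without a fixed sleep.
state = {"n": 0}
def ready():
    state["n"] += 1
    return state["n"] >= 3

assert_eventually(ready, tries=5, sleep_s=0)
assert state["n"] == 3
```

The flakiness above is the failure branch of such a loop: the profile never became available within the 30-try / 30-second budget.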
[jira] [Created] (IMPALA-9954) RpcRecvrTime can be negative
Sahil Takiar created IMPALA-9954: Summary: RpcRecvrTime can be negative Key: IMPALA-9954 URL: https://issues.apache.org/jira/browse/IMPALA-9954 Project: IMPALA Issue Type: Bug Components: Backend Reporter: Sahil Takiar Attachments: profile_034e7209bd98c96c_9a448dfc.txt Saw this on a recent version of master. Attached the full runtime profile. {code:java}
KrpcDataStreamSender (dst_id=2):(Total: 9.863ms, non-child: 3.185ms, % non-child: 32.30%)
  ExecOption: Unpartitioned Sender Codegen Disabled: not needed
   - BytesSent (500.000ms): 0, 0
   - NetworkThroughput: (Avg: 4.34 MB/sec ; Min: 4.34 MB/sec ; Max: 4.34 MB/sec ; Number of samples: 1)
   - RpcNetworkTime: (Avg: 3.562ms ; Min: 679.676us ; Max: 6.445ms ; Number of samples: 2)
   - RpcRecvrTime: (Avg: -151281.000ns ; Min: -231485.000ns ; Max: -71077.000ns ; Number of samples: 2)
   - EosSent: 1 (1)
   - PeakMemoryUsage: 416.00 B (416)
   - RowsSent: 100 (100)
   - RpcFailure: 0 (0)
   - RpcRetry: 0 (0)
   - SerializeBatchTime: 2.880ms
   - TotalBytesSent: 28.67 KB (29355)
   - UncompressedRowBatchSize: 69.29 KB (70950)
{code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
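One plausible mechanism for a receiver-side timer going negative (offered as illustration only, not necessarily the root cause this JIRA fixed): the duration is computed as the difference of timestamps taken on clocks that are not mutually synchronized, so a small skew can exceed the real duration.

```python
# Illustrative sketch: a duration derived from two timestamp sources can go
# negative when the sources are skewed. Function and parameter names are
# invented for the example; this is not Impala's timing code.
def recvr_time_ns(sender_send_ns, recvr_done_ns, skew_ns=0):
    """Naive duration as a timestamp difference; skew_ns models clock offset."""
    return (recvr_done_ns + skew_ns) - sender_send_ns

# A 150us real duration measured with 300us of skew in the wrong direction:
assert recvr_time_ns(1_000_000, 1_150_000) == 150_000            # no skew: correct
assert recvr_time_ns(1_000_000, 1_150_000, skew_ns=-300_000) < 0  # skewed: negative
# A common mitigation is clamping to zero (or using a monotonic clock throughout):
assert max(0, recvr_time_ns(1_000_000, 1_150_000, skew_ns=-300_000)) == 0
```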
[jira] [Resolved] (IMPALA-5534) Fix and re-enable run-process-failure-tests.sh
[ https://issues.apache.org/jira/browse/IMPALA-5534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-5534. -- Fix Version/s: Impala 4.0 Resolution: Fixed > Fix and re-enable run-process-failure-tests.sh > -- > > Key: IMPALA-5534 > URL: https://issues.apache.org/jira/browse/IMPALA-5534 > Project: IMPALA > Issue Type: Bug > Components: Infrastructure >Affects Versions: Impala 2.5.0, Impala 2.4.0, Impala 2.6.0, Impala 2.7.0, > Impala 2.8.0 >Reporter: Alexander Behm >Assignee: Sahil Takiar >Priority: Major > Labels: test > Fix For: Impala 4.0 > > > See bin/run-all-tests.sh: > {code} > ... > # Finally, run the process failure tests. > # Disabled temporarily until we figure out the proper timeouts required to > make the test > # succeed. > # ${IMPALA_HOME}/tests/run-process-failure-tests.sh > ... > {code} > We should fix and re-enable these tests or alternatively re-implement the > tests in a different way to get the same coverage. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (IMPALA-9834) test_query_retries.TestQueryRetries is flaky on erasure coding configurations
[ https://issues.apache.org/jira/browse/IMPALA-9834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-9834. -- Fix Version/s: Impala 4.0 Resolution: Fixed We disabled all these tests on EC builds (see commit message in previous comment), so this shouldn't be an issue anymore. > test_query_retries.TestQueryRetries is flaky on erasure coding configurations > - > > Key: IMPALA-9834 > URL: https://issues.apache.org/jira/browse/IMPALA-9834 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 4.0 >Reporter: Joe McDonnell >Assignee: Sahil Takiar >Priority: Blocker > Labels: broken-build, flaky > Fix For: Impala 4.0 > > > Multiple tests from test_query_retries.TestQueryRetries hit errors like this > (test_retry_query_cancel): > {noformat} > custom_cluster/test_query_retries.py:321: in test_retry_query_cancel > self.__validate_runtime_profiles_from_service(impalad_service, handle) > custom_cluster/test_query_retries.py:435: in > __validate_runtime_profiles_from_service > self.__validate_runtime_profiles(retried_profile, handle.get_handle().id) > custom_cluster/test_query_retries.py:503: in __validate_runtime_profiles > retried_query_id = > self.__get_query_id_from_profile(retried_runtime_profile) > custom_cluster/test_query_retries.py:474: in __get_query_id_from_profile > assert query_id_search, "Invalid query profile, has no query id" > E AssertionError: Invalid query profile, has no query id > E assert None{noformat} > Or this (test_kill_impalad_expect_retries, test_kill_impalad_expect_retry, > test_retry_query_hs2): > {noformat} > custom_cluster/test_query_retries.py:424: in test_retry_query_hs2 > self.hs2_client.get_query_id(handle)) > custom_cluster/test_query_retries.py:508: in __validate_runtime_profiles > original_query_id) > custom_cluster/test_query_retries.py:489: in __validate_original_id_in_profile > assert original_id_search, \ > E AssertionError: Could not find original id pattern 
'Original Query Id: > (.*)' in profile: > ...{noformat} > I have only seen these errors on erasure coding so far, and it isn't > deterministic. -- This message was sent by Atlassian Jira (v8.3.4#803005)
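The two failing assertions quoted above boil down to regex probes over the runtime-profile text. A self-contained sketch (helper names are invented; the `Original Query Id` pattern is taken from the error message, the query-id pattern is an assumption):

```python
import re

def get_query_id(profile):
    """Pull the query id out of profile text (pattern is an assumption)."""
    m = re.search(r'Query \(id=(\S+)\)', profile)
    assert m, "Invalid query profile, has no query id"
    return m.group(1)

def get_original_query_id(retried_profile):
    """Pull the original query id out of a retried query's profile."""
    m = re.search(r'Original Query Id: (.*)', retried_profile)
    assert m, "Could not find original id pattern in profile"
    return m.group(1)

profile = "Query (id=abc123:456)\nOriginal Query Id: def789:012\n"
assert get_query_id(profile) == "abc123:456"
assert get_original_query_id(profile) == "def789:012"
```

Both flaky failures correspond to the `assert m` branch: the profile text the test fetched did not (yet) contain the expected line.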
[jira] [Resolved] (IMPALA-3380) Add TCP timeouts to all RPCs that don't block
[ https://issues.apache.org/jira/browse/IMPALA-3380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-3380. -- Fix Version/s: Impala 4.0 Resolution: Fixed > Add TCP timeouts to all RPCs that don't block > - > > Key: IMPALA-3380 > URL: https://issues.apache.org/jira/browse/IMPALA-3380 > Project: IMPALA > Issue Type: Sub-task > Components: Distributed Exec >Affects Versions: Impala 2.5.0 >Reporter: Henry Robinson >Assignee: Sahil Takiar >Priority: Minor > Labels: observability, supportability > Fix For: Impala 4.0 > > > Most RPCs should not take an unbounded amount of time to complete (the > exception is {{TransmitData()}}, but that may also change). To handle hang > failures on the remote machine, we should add timeouts to every RPC (so, > really, every RPC client), and handle the timeout failure. -- This message was sent by Atlassian Jira (v8.3.4#803005)
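At the lowest level, the timeout this issue asks for corresponds to a bounded socket operation, so a hung remote machine produces a fast error instead of an indefinitely blocked RPC. An illustrative sketch only (this is not Impala's RPC stack; the function and its signature are invented for the example):

```python
import socket

def call_with_timeout(host, port, payload, timeout_s=5.0):
    """Send payload and read one reply, failing fast if the peer stops responding."""
    with socket.create_connection((host, port), timeout=timeout_s) as sock:
        sock.settimeout(timeout_s)  # bounds recv() as well, not just connect()
        sock.sendall(payload)
        return sock.recv(4096)
```

With a timeout set, a hang on the remote side surfaces as `socket.timeout`, which the caller can handle (retry, fail the query, mark the node unhealthy) instead of blocking forever.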
[jira] [Resolved] (IMPALA-9734) ACID-query retry integration
[ https://issues.apache.org/jira/browse/IMPALA-9734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-9734. -- Resolution: Not A Problem After talking with several folks offline, this does not seem to be an issue. Impala currently does not open a transaction for read-only queries (although Hive does, and perhaps Impala will at some point in the future). Transactions are only opened for write-only queries. Transparent query retries currently don't support write queries (and there are no current plans to implement this in the near-term). The only ACID consideration is that the snapshot view of the data from the original query should be the same view of the data in the retried query. e.g. the set of files and version of the tables scanned in the original query should be the same for the retried query. The current transparent query logic already handles this because the TExecRequest is simply copied from the original query to the retried query. The planning phase will be skipped, so the set of files to be scanned will be the same. > ACID-query retry integration > > > Key: IMPALA-9734 > URL: https://issues.apache.org/jira/browse/IMPALA-9734 > Project: IMPALA > Issue Type: Sub-task >Reporter: Sahil Takiar >Priority: Major > > We need to consider how query retries interact with ACID transactions. As of > IMPALA-9199, Impala will create new ClientRequestStates for each query retry > and will cache the TExecRequest between ClientRequestStates. This might not > be safe for ACID transactions. If the first query attempt fails, then the > transaction will fail and a new one will be required. However, the query > retry will use the transaction id / info from the original query attempt. > I think the semantics are not entirely clear here, and we don't have any > tests for this. 
So the goal of this JIRA is to (1) identify if there are any > issues with the current approach, (2) fix any issues with transactions during > query retries, and (3) add some query retry tests that enable transactions. > We might want to consider whether a query and its retry should be in the > same or different transactions. Keeping them in the same transaction should > allow us to cache the TExecRequest. If they are in separate transactions, then > Impala might need to create a new TExecRequest for each retry. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (IMPALA-9502) Avoid copying TExecRequest when retrying queries
[ https://issues.apache.org/jira/browse/IMPALA-9502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-9502. -- Resolution: Later Closing as 'Later'. We can revisit this later if we think it is actually an issue. > Avoid copying TExecRequest when retrying queries > > > Key: IMPALA-9502 > URL: https://issues.apache.org/jira/browse/IMPALA-9502 > Project: IMPALA > Issue Type: Sub-task >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > Fix For: Impala 4.0 > > > There are a few issues that occur when re-using a {{TExecRequest}} across > query retries. We should investigate if there is a way to work around those > issues so that the {{TExecRequest}} does not need to be copied when retrying > a query. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (IMPALA-9854) TSAN data race in QueryDriver::CreateRetriedClientRequestState
[ https://issues.apache.org/jira/browse/IMPALA-9854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-9854. -- Fix Version/s: Impala 4.0 Resolution: Fixed > TSAN data race in QueryDriver::CreateRetriedClientRequestState > -- > > Key: IMPALA-9854 > URL: https://issues.apache.org/jira/browse/IMPALA-9854 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > Fix For: Impala 4.0 > > > Seeing the following data race in {{test_query_retries.py}} > {code:java} > WARNING: ThreadSanitizer: data race (pid=5460) > Write of size 8 at 0x7b8c00261510 by thread T38: > #0 impala::TUniqueId::operator=(impala::TUniqueId&&) > /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/generated-sources/gen-cpp/Types_types.cpp:967:6 > (impalad+0x1de1968) > #1 impala::ImpalaServer::PrepareQueryContext(impala::TNetworkAddress > const&, impala::TNetworkAddress const&, impala::TQueryCtx*) > /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/service/impala-server.cc:1069:23 > (impalad+0x2210dbf) > #2 impala::ImpalaServer::PrepareQueryContext(impala::TQueryCtx*) > /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/service/impala-server.cc:1024:3 > (impalad+0x220f3c1) > #3 > impala::QueryDriver::CreateRetriedClientRequestState(impala::ClientRequestState*, > std::unique_ptr std::default_delete >*, > std::shared_ptr*) > /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/runtime/query-driver.cc:302:19 > (impalad+0x29de3ec) > #4 impala::QueryDriver::RetryQueryFromThread(impala::Status const&, > std::shared_ptr) > /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/runtime/query-driver.cc:203:3 > (impalad+0x29dd01f) > #5 boost::_mfi::mf2 std::shared_ptr >::operator()(impala::QueryDriver*, > impala::Status const&, std::shared_ptr) const > 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/mem_fn_template.hpp:280:29 > (impalad+0x29e1669) > #6 void boost::_bi::list3, > boost::_bi::value, > boost::_bi::value > > >::operator() const&, std::shared_ptr >, > boost::_bi::list0>(boost::_bi::type, boost::_mfi::mf2 impala::QueryDriver, impala::Status const&, > std::shared_ptr >&, boost::_bi::list0&, int) > /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/bind.hpp:398:9 > (impalad+0x29e1578) > #7 boost::_bi::bind_t impala::Status const&, std::shared_ptr >, > boost::_bi::list3, > boost::_bi::value, > boost::_bi::value > > >::operator()() > /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/bind.hpp:1222:16 > (impalad+0x29e14c3) > #8 > boost::detail::function::void_function_obj_invoker0 boost::_mfi::mf2 std::shared_ptr >, > boost::_bi::list3, > boost::_bi::value, > boost::_bi::value > > >, > void>::invoke(boost::detail::function::function_buffer&) > /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/function/function_template.hpp:159:11 > (impalad+0x29e1221) > #9 boost::function0::operator()() const > /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/function/function_template.hpp:770:14 > (impalad+0x1e5ba81) > #10 impala::Thread::SuperviseThread(std::string const&, std::string > const&, boost::function, impala::ThreadDebugInfo const*, > impala::Promise*) > /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/thread.cc:360:3 > (impalad+0x2453776) > #11 void boost::_bi::list5, > boost::_bi::value, boost::_bi::value >, > boost::_bi::value, > boost::_bi::value*> > >::operator() boost::function, impala::ThreadDebugInfo const*, > impala::Promise*), > boost::_bi::list0>(boost::_bi::type, void (*&)(std::string const&, > std::string 
const&, boost::function, impala::ThreadDebugInfo const*, > impala::Promise*), boost::_bi::list0&, int) > /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/bind.hpp:531:9 > (impalad+0x245b93c) > #12 boost::_bi::bind_t const&, boost::function, impala::ThreadDebugInfo const*, > impala::Promise*), > boost::_bi::list5, > boost::_bi::value, boost::_bi::value >, > boost::_bi::value, > boost::_bi::value*> > > >::operator()() > /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/bind.hpp:1222:16 > (impalad+0x245b853) > #13 boost::detail::thread_data (*)(std::string const&, std::string const&, boost::fu
[jira] [Resolved] (IMPALA-9855) TSAN lock-order-inversion warning in QueryDriver::RetryQueryFromThread
[ https://issues.apache.org/jira/browse/IMPALA-9855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-9855. -- Fix Version/s: Impala 4.0 Resolution: Fixed > TSAN lock-order-inversion warning in QueryDriver::RetryQueryFromThread > -- > > Key: IMPALA-9855 > URL: https://issues.apache.org/jira/browse/IMPALA-9855 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > Fix For: Impala 4.0 > > > TSAN reports the following error in {{test_query_retries.py}}. > {code:java} > WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) (pid=3786) > Cycle in lock order graph: M17348 (0x7b140035d2d8) => M804309746609755832 > (0x) => M17348 Mutex M804309746609755832 acquired here while > holding mutex M17348 in thread T370: > #0 AnnotateRWLockAcquired > /mnt/source/llvm/llvm-5.0.1.src-p2/projects/compiler-rt/lib/tsan/rtl/tsan_interface_ann.cc:271 > (impalad+0x19bafcc) > #1 base::SpinLock::Lock() > /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/gutil/spinlock.h:77:5 > (impalad+0x1a11585) > #2 impala::SpinLock::lock() > /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/spinlock.h:34:8 > (impalad+0x1a11519) > #3 impala::ScopedShardedMapRef > >::ScopedShardedMapRef(impala::TUniqueId const&, > impala::ShardedQueryMap >*) > /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/sharded-query-map-util.h:98:23 > (impalad+0x2220661) > #4 impala::ImpalaServer::GetQueryDriver(impala::TUniqueId const&, bool) > /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/service/impala-server.cc:1296:53 > (impalad+0x22124ba) > #5 impala::QueryDriver::RetryQueryFromThread(impala::Status const&, > std::shared_ptr) > /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/runtime/query-driver.cc:279:25 > (impalad+0x29dd92c) > #6 boost::_mfi::mf2 std::shared_ptr 
>::operator()(impala::QueryDriver*, > impala::Status const&, std::shared_ptr) const > /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/mem_fn_template.hpp:280:29 > (impalad+0x29e1669) > #7 void boost::_bi::list3, > boost::_bi::value, > boost::_bi::value > > >::operator() const&, std::shared_ptr >, > boost::_bi::list0>(boost::_bi::type, boost::_mfi::mf2 impala::QueryDriver, impala::Status const&, > std::shared_ptr >&, boost::_bi::list0&, int) > /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/bind.hpp:398:9 > (impalad+0x29e1578) > #8 boost::_bi::bind_t impala::Status const&, std::shared_ptr >, > boost::_bi::list3, > boost::_bi::value, > boost::_bi::value > > >::operator()() > /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/bind.hpp:1222:16 > (impalad+0x29e14c3) > #9 > boost::detail::function::void_function_obj_invoker0 boost::_mfi::mf2 std::shared_ptr >, > boost::_bi::list3, > boost::_bi::value, > boost::_bi::value > > >, > void>::invoke(boost::detail::function::function_buffer&) > /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/function/function_template.hpp:159:11 > (impalad+0x29e1221) > #10 boost::function0::operator()() const > /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/function/function_template.hpp:770:14 > (impalad+0x1e5ba81) > #11 impala::Thread::SuperviseThread(std::string const&, std::string > const&, boost::function, impala::ThreadDebugInfo const*, > impala::Promise*) > /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/thread.cc:360:3 > (impalad+0x2453776) > #12 void boost::_bi::list5, > boost::_bi::value, boost::_bi::value >, > boost::_bi::value, > boost::_bi::value*> > >::operator() boost::function, impala::ThreadDebugInfo const*, > impala::Promise*), > 
boost::_bi::list0>(boost::_bi::type, void (*&)(std::string const&, > std::string const&, boost::function, impala::ThreadDebugInfo const*, > impala::Promise*), boost::_bi::list0&, int) > /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/bind.hpp:531:9 > (impalad+0x245b93c) > #13 boost::_bi::bind_t const&, boost::function, impala::ThreadDebugInfo const*, > impala::Promise*), > boost::_bi::list5, > boost::_bi::value, boost::_bi::value >, > boost::_bi::value, > boost::_bi::value*> > > >::operator()() > /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1
[jira] [Created] (IMPALA-9910) Impala Doc: Add docs for transparent query retries
Sahil Takiar created IMPALA-9910: Summary: Impala Doc: Add docs for transparent query retries Key: IMPALA-9910 URL: https://issues.apache.org/jira/browse/IMPALA-9910 Project: IMPALA Issue Type: Sub-task Components: Docs Reporter: Sahil Takiar Add docs for transparent query retries (IMPALA-9124). The parent JIRA has a design doc describing the feature. The commit message for IMPALA-9199 should pretty helpful as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (IMPALA-9849) Set halt_on_error=1 for TSAN builds
[ https://issues.apache.org/jira/browse/IMPALA-9849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-9849. -- Fix Version/s: Impala 4.0 Resolution: Fixed > Set halt_on_error=1 for TSAN builds > --- > > Key: IMPALA-9849 > URL: https://issues.apache.org/jira/browse/IMPALA-9849 > Project: IMPALA > Issue Type: Sub-task >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > Fix For: Impala 4.0 > > > IMPALA-9568 mistakenly removed the halt_on_error flag from TSAN builds. The > intention in IMPALA-9568 was to make sure that Impala crashes when a TSAN bug > is detected; Impala already does this for ASAN builds. The confusing part > about halt_on_error is that by default it is true in ASAN builds, but by > default it is false in TSAN builds. So halt_on_error needs to explicitly be > set to true for TSAN builds (but not for ASAN builds). -- This message was sent by Atlassian Jira (v8.3.4#803005)
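The asymmetry above is easy to state concretely: the sanitizer runtimes read their options from the `ASAN_OPTIONS` / `TSAN_OPTIONS` environment variables, and `halt_on_error` defaults to true under ASAN but false under TSAN. A sketch of making it explicit when launching an instrumented binary (the binary path is a placeholder):

```python
import os

# Build the environment for a TSAN-instrumented process so that the first
# detected race aborts it. TSAN defaults halt_on_error to 0 (report and keep
# running); ASAN defaults it to 1, which is why only TSAN needs this.
env = dict(os.environ)
env["TSAN_OPTIONS"] = "halt_on_error=1"    # required: TSAN default is 0
# env["ASAN_OPTIONS"] = "halt_on_error=1"  # redundant: ASAN default is already 1
# subprocess.run(["./impalad"], env=env)   # placeholder invocation

assert env["TSAN_OPTIONS"] == "halt_on_error=1"
```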
[jira] [Resolved] (IMPALA-9844) Ozone support for load data inpath
[ https://issues.apache.org/jira/browse/IMPALA-9844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-9844. -- Fix Version/s: Impala 4.0 Resolution: Fixed > Ozone support for load data inpath > -- > > Key: IMPALA-9844 > URL: https://issues.apache.org/jira/browse/IMPALA-9844 > Project: IMPALA > Issue Type: Sub-task >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > Fix For: Impala 4.0 > > > Currently, attempts to run {{load data inpath}} against Ozone tables fail: > {code} > default> CREATE EXTERNAL TABLE o3_tab1 (id INT, col_1 string) ROW FORMAT > DELIMITED FIELDS TERMINATED BY ' ' STORED AS TEXTFILE LOCATION > 'o3fs://bucket1.volume1.ozone1/o3_tab1'; > Query: CREATE EXTERNAL TABLE o3_tab1 (id INT, col_1 string) ROW FORMAT > DELIMITED FIELDS TERMINATED BY ' ' STORED AS TEXTFILE LOCATION > 'o3fs://bucket1.volume1.ozone1/o3_tab1' > +-+ > | summary | > +-+ > | Table has been created. | > +-+ > Fetched 1 row(s) in 0.36s > default> load data inpath 'o3fs://bucket1.volume1.ozone1/file' into table > ozone_test_table2; > Query: load data inpath 'o3fs://bucket1.volume1.ozone1/file' into table > ozone_test_table2 > ERROR: AnalysisException: INPATH location > 'o3fs://bucket1.volume1.ozone1/file' must point to an HDFS, S3A, ADL or ABFS > filesystem. > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
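The AnalysisException quoted above comes from an analysis-time whitelist of filesystem schemes for `LOAD DATA INPATH`. A toy model of the check and of what the fix amounts to (the function, the set contents, and the error wording are illustrative, not Impala's actual code):

```python
from urllib.parse import urlparse

# Illustrative whitelist: the pre-fix list covered HDFS, S3A, ADL, and ABFS;
# the fix amounts to also accepting Ozone's scheme ('o3fs' here, per the URIs above).
SUPPORTED_SCHEMES = {'hdfs', 's3a', 'adl', 'abfs', 'o3fs'}

def check_inpath(uri):
    """Reject INPATH URIs whose scheme is not a supported filesystem."""
    scheme = urlparse(uri).scheme
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(
            "INPATH location '%s' must point to a supported filesystem." % uri)
    return scheme

assert check_inpath('o3fs://bucket1.volume1.ozone1/file') == 'o3fs'
```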
[jira] [Created] (IMPALA-9856) Enable result spooling by default
Sahil Takiar created IMPALA-9856: Summary: Enable result spooling by default Key: IMPALA-9856 URL: https://issues.apache.org/jira/browse/IMPALA-9856 Project: IMPALA Issue Type: Task Components: Backend Reporter: Sahil Takiar Result spooling has been relatively stable since it was introduced, and it has several benefits described in IMPALA-8656. It would be good to enable it by default. I looked into doing this a while ago, and there are a bunch of tests that rely on the "fetch one row batch at a time" behavior. Those tests fail when result spooling is enabled. The remaining linked tasks in IMPALA-8656 should be completed as well before enabling result spooling by default. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IMPALA-9855) TSAN lock-order-inversion warning in QueryDriver::RetryQueryFromThread
Sahil Takiar created IMPALA-9855: Summary: TSAN lock-order-inversion warning in QueryDriver::RetryQueryFromThread Key: IMPALA-9855 URL: https://issues.apache.org/jira/browse/IMPALA-9855 Project: IMPALA Issue Type: Sub-task Components: Backend Reporter: Sahil Takiar Assignee: Sahil Takiar TSAN reports the following error in {{test_query_retries.py}}. {code:java} WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) (pid=3786) Cycle in lock order graph: M17348 (0x7b140035d2d8) => M804309746609755832 (0x) => M17348 Mutex M804309746609755832 acquired here while holding mutex M17348 in thread T370: #0 AnnotateRWLockAcquired /mnt/source/llvm/llvm-5.0.1.src-p2/projects/compiler-rt/lib/tsan/rtl/tsan_interface_ann.cc:271 (impalad+0x19bafcc) #1 base::SpinLock::Lock() /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/gutil/spinlock.h:77:5 (impalad+0x1a11585) #2 impala::SpinLock::lock() /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/spinlock.h:34:8 (impalad+0x1a11519) #3 impala::ScopedShardedMapRef >::ScopedShardedMapRef(impala::TUniqueId const&, impala::ShardedQueryMap >*) /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/sharded-query-map-util.h:98:23 (impalad+0x2220661) #4 impala::ImpalaServer::GetQueryDriver(impala::TUniqueId const&, bool) /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/service/impala-server.cc:1296:53 (impalad+0x22124ba) #5 impala::QueryDriver::RetryQueryFromThread(impala::Status const&, std::shared_ptr) /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/runtime/query-driver.cc:279:25 (impalad+0x29dd92c) #6 boost::_mfi::mf2 >::operator()(impala::QueryDriver*, impala::Status const&, std::shared_ptr) const /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/mem_fn_template.hpp:280:29 (impalad+0x29e1669) #7 void boost::_bi::list3, boost::_bi::value, 
boost::_bi::value > >::operator() >, boost::_bi::list0>(boost::_bi::type, boost::_mfi::mf2 >&, boost::_bi::list0&, int) /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/bind.hpp:398:9 (impalad+0x29e1578) #8 boost::_bi::bind_t >, boost::_bi::list3, boost::_bi::value, boost::_bi::value > > >::operator()() /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/bind.hpp:1222:16 (impalad+0x29e14c3) #9 boost::detail::function::void_function_obj_invoker0 >, boost::_bi::list3, boost::_bi::value, boost::_bi::value > > >, void>::invoke(boost::detail::function::function_buffer&) /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/function/function_template.hpp:159:11 (impalad+0x29e1221) #10 boost::function0::operator()() const /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/function/function_template.hpp:770:14 (impalad+0x1e5ba81) #11 impala::Thread::SuperviseThread(std::string const&, std::string const&, boost::function, impala::ThreadDebugInfo const*, impala::Promise*) /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/thread.cc:360:3 (impalad+0x2453776) #12 void boost::_bi::list5, boost::_bi::value, boost::_bi::value >, boost::_bi::value, boost::_bi::value*> >::operator(), impala::ThreadDebugInfo const*, impala::Promise*), boost::_bi::list0>(boost::_bi::type, void (*&)(std::string const&, std::string const&, boost::function, impala::ThreadDebugInfo const*, impala::Promise*), boost::_bi::list0&, int) /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/bind.hpp:531:9 (impalad+0x245b93c) #13 boost::_bi::bind_t, impala::ThreadDebugInfo const*, impala::Promise*), boost::_bi::list5, boost::_bi::value, boost::_bi::value >, boost::_bi::value, boost::_bi::value*> > >::operator()() 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/bind.hpp:1222:16 (impalad+0x245b853) #14 boost::detail::thread_data, impala::ThreadDebugInfo const*, impala::Promise*), boost::_bi::list5, boost::_bi::value, boost::_bi::value >, boost::_bi::value, boost::_bi::value*> > > >::run() /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/thread/detail/thread.hpp:116:17 (impalad+0x245b540) #15 thread_proxy (impalad+0x3171659)Hint: use TSAN_OPTIONS=second_deadlock_stack=1 to get more informative warning message Mutex M17348 acquired here while holding mutex M804309746609755832 in thread T392: #0 AnnotateRWLockAcquired /mnt/source/
[jira] [Created] (IMPALA-9854) TSAN data race in QueryDriver::CreateRetriedClientRequestState
Sahil Takiar created IMPALA-9854: Summary: TSAN data race in QueryDriver::CreateRetriedClientRequestState Key: IMPALA-9854 URL: https://issues.apache.org/jira/browse/IMPALA-9854 Project: IMPALA Issue Type: Sub-task Components: Backend Reporter: Sahil Takiar Assignee: Sahil Takiar Seeing the following data race in {{test_query_retries.py}} {code:java} WARNING: ThreadSanitizer: data race (pid=5460) Write of size 8 at 0x7b8c00261510 by thread T38: #0 impala::TUniqueId::operator=(impala::TUniqueId&&) /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/generated-sources/gen-cpp/Types_types.cpp:967:6 (impalad+0x1de1968) #1 impala::ImpalaServer::PrepareQueryContext(impala::TNetworkAddress const&, impala::TNetworkAddress const&, impala::TQueryCtx*) /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/service/impala-server.cc:1069:23 (impalad+0x2210dbf) #2 impala::ImpalaServer::PrepareQueryContext(impala::TQueryCtx*) /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/service/impala-server.cc:1024:3 (impalad+0x220f3c1) #3 impala::QueryDriver::CreateRetriedClientRequestState(impala::ClientRequestState*, std::unique_ptr >*, std::shared_ptr*) /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/runtime/query-driver.cc:302:19 (impalad+0x29de3ec) #4 impala::QueryDriver::RetryQueryFromThread(impala::Status const&, std::shared_ptr) /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/runtime/query-driver.cc:203:3 (impalad+0x29dd01f) #5 boost::_mfi::mf2 >::operator()(impala::QueryDriver*, impala::Status const&, std::shared_ptr) const /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/mem_fn_template.hpp:280:29 (impalad+0x29e1669) #6 void boost::_bi::list3, boost::_bi::value, boost::_bi::value > >::operator() >, boost::_bi::list0>(boost::_bi::type, boost::_mfi::mf2 >&, boost::_bi::list0&, int) 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/bind.hpp:398:9 (impalad+0x29e1578) #7 boost::_bi::bind_t >, boost::_bi::list3, boost::_bi::value, boost::_bi::value > > >::operator()() /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/bind.hpp:1222:16 (impalad+0x29e14c3) #8 boost::detail::function::void_function_obj_invoker0 >, boost::_bi::list3, boost::_bi::value, boost::_bi::value > > >, void>::invoke(boost::detail::function::function_buffer&) /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/function/function_template.hpp:159:11 (impalad+0x29e1221) #9 boost::function0::operator()() const /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/function/function_template.hpp:770:14 (impalad+0x1e5ba81) #10 impala::Thread::SuperviseThread(std::string const&, std::string const&, boost::function, impala::ThreadDebugInfo const*, impala::Promise*) /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/thread.cc:360:3 (impalad+0x2453776) #11 void boost::_bi::list5, boost::_bi::value, boost::_bi::value >, boost::_bi::value, boost::_bi::value*> >::operator(), impala::ThreadDebugInfo const*, impala::Promise*), boost::_bi::list0>(boost::_bi::type, void (*&)(std::string const&, std::string const&, boost::function, impala::ThreadDebugInfo const*, impala::Promise*), boost::_bi::list0&, int) /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/bind.hpp:531:9 (impalad+0x245b93c) #12 boost::_bi::bind_t, impala::ThreadDebugInfo const*, impala::Promise*), boost::_bi::list5, boost::_bi::value, boost::_bi::value >, boost::_bi::value, boost::_bi::value*> > >::operator()() /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/bind.hpp:1222:16 (impalad+0x245b853) 
#13 boost::detail::thread_data, impala::ThreadDebugInfo const*, impala::Promise*), boost::_bi::list5, boost::_bi::value, boost::_bi::value >, boost::_bi::value, boost::_bi::value*> > > >::run() /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/thread/detail/thread.hpp:116:17 (impalad+0x245b540) #14 thread_proxy (impalad+0x3171659) Previous read of size 8 at 0x7b8c00261510 by thread T100: #0 impala::PrintId(impala::TUniqueId const&, std::string const&) /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/debug-util.cc:108:48 (impalad+0x237557f) #1 impala::Coordinator::ReleaseQueryAdmissionControlResources() /data/jenkins/workspace/impala-private-parameterized/repos/Impala/
[jira] [Created] (IMPALA-9849) Set halt_on_error=1 for TSAN builds
Sahil Takiar created IMPALA-9849: Summary: Set halt_on_error=1 for TSAN builds Key: IMPALA-9849 URL: https://issues.apache.org/jira/browse/IMPALA-9849 Project: IMPALA Issue Type: Sub-task Reporter: Sahil Takiar Assignee: Sahil Takiar IMPALA-9568 mistakenly removed the halt_on_error flag from TSAN builds. The intention in IMPALA-9568 was to make sure that Impala crashes when a TSAN bug is detected; Impala already does this for ASAN builds. The confusing part about halt_on_error is that it defaults to true in ASAN builds but to false in TSAN builds, so it needs to be set to true explicitly for TSAN builds (but not for ASAN builds). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IMPALA-9848) Coordinator unnecessarily invalidating locally cached table metadata
Sahil Takiar created IMPALA-9848: Summary: Coordinator unnecessarily invalidating locally cached table metadata Key: IMPALA-9848 URL: https://issues.apache.org/jira/browse/IMPALA-9848 Project: IMPALA Issue Type: Improvement Components: Catalog, Frontend Reporter: Sahil Takiar The following fails when run locally on master: {code:java} ./bin/start-impala-cluster.py --catalogd_args='--catalog_topic_mode=minimal' --impalad_args='--use_local_catalog' ./bin/impala-shell.sh [localhost:21000] default> select count(l_comment) from tpch.lineitem; <--- THIS WORKS # kill the catalogd process [localhost:21000] default> select count(l_comment) from tpch.lineitem; <--- THIS FAILS ERROR: AnalysisException: Failed to load metadata for table: 'tpch.lineitem' CAUSED BY: TableLoadingException: Could not load table tpch.lineitem from catalog CAUSED BY: TException: org.apache.impala.common.InternalException: Couldn't open transport for localhost:26000 (connect() failed: Connection refused)CAUSED BY: InternalException: Couldn't open transport for localhost:26000 (connect() failed: Connection refused {code} The above experiment works with catalog v1 - i.e. if you remove the startup flags from the {{./bin/start-impala-cluster.py}} invocation, everything works.
[jira] [Resolved] (IMPALA-9818) Add fetch size as option to impala shell
[ https://issues.apache.org/jira/browse/IMPALA-9818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-9818. -- Fix Version/s: Impala 4.0 Resolution: Fixed > Add fetch size as option to impala shell > > > Key: IMPALA-9818 > URL: https://issues.apache.org/jira/browse/IMPALA-9818 > Project: IMPALA > Issue Type: Improvement > Components: Clients >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > Fix For: Impala 4.0 > > > The impala shell should have an option to control the fetch size (e.g. the > number of rows fetched at a time). Currently the value is hard-coded to 1024. > Other clients (e.g. JDBC) have similar options (e.g. Statement#setFetchSize). > When result spooling is enabled, setting a higher fetch size can improve > performance for clients with a high RTT to/from the Impala coordinator.
[jira] [Created] (IMPALA-9844) Ozone support for load data inpath
Sahil Takiar created IMPALA-9844: Summary: Ozone support for load data inpath Key: IMPALA-9844 URL: https://issues.apache.org/jira/browse/IMPALA-9844 Project: IMPALA Issue Type: Sub-task Reporter: Sahil Takiar Assignee: Sahil Takiar Currently, attempts to run {{load data inpath}} against Ozone tables fail: {code} default> CREATE EXTERNAL TABLE o3_tab1 (id INT, col_1 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' STORED AS TEXTFILE LOCATION 'o3fs://bucket1.volume1.ozone1/o3_tab1'; Query: CREATE EXTERNAL TABLE o3_tab1 (id INT, col_1 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' STORED AS TEXTFILE LOCATION 'o3fs://bucket1.volume1.ozone1/o3_tab1' +-+ | summary | +-+ | Table has been created. | +-+ Fetched 1 row(s) in 0.36s default> load data inpath 'o3fs://bucket1.volume1.ozone1/file' into table ozone_test_table2; Query: load data inpath 'o3fs://bucket1.volume1.ozone1/file' into table ozone_test_table2 ERROR: AnalysisException: INPATH location 'o3fs://bucket1.volume1.ozone1/file' must point to an HDFS, S3A, ADL or ABFS filesystem. {code}
[jira] [Created] (IMPALA-9843) Add ability to run schematool against HMS in minicluster
Sahil Takiar created IMPALA-9843: Summary: Add ability to run schematool against HMS in minicluster Key: IMPALA-9843 URL: https://issues.apache.org/jira/browse/IMPALA-9843 Project: IMPALA Issue Type: Improvement Reporter: Sahil Takiar When the CDP version is bumped, we often need to re-format the HMS postgres database because the HMS schema needs updating. Hive provides a standalone tool for performing schema updates: [https://cwiki.apache.org/confluence/display/Hive/Hive+Schema+Tool] Impala should be able to integrate with this tool, so that developers don't have to blow away their HMS database every time the CDP version is bumped. Even worse, blowing away the HMS data requires performing a full data load. It would be great to have a wrapper around the schematool that can easily be invoked by developers.
[jira] [Created] (IMPALA-9840) ThreadSanitizer: data race internal-queue.h in InternalQueueBase::Enqueue
Sahil Takiar created IMPALA-9840: Summary: ThreadSanitizer: data race internal-queue.h in InternalQueueBase::Enqueue Key: IMPALA-9840 URL: https://issues.apache.org/jira/browse/IMPALA-9840 Project: IMPALA Issue Type: Bug Reporter: Sahil Takiar Assignee: Bikramjeet Vig Seems like this was introduced in IMPALA-9655. On my TSAN build, the error occurred during data-load. {code:java} WARNING: ThreadSanitizer: data race (pid=24164) Write of size 8 at 0x7b6f9bb0 by thread T394 (mutexes: write M443436): #0 impala::InternalQueueBase::Enqueue(impala::io::ScanRange*) /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/internal-queue.h:108:35 (impalad+0x24fdd19) #1 impala::ScanRangeSharedState::EnqueueScanRange(std::vector > const&, bool) /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/exec/hdfs-scan-node-base.cc:1220:25 (impalad+0x24f860a) #2 impala::HdfsScanNodeMt::AddDiskIoRanges(std::vector > const&, impala::EnqueueLocation) /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/exec/hdfs-scan-node-mt.cc:157:18 (impalad+0x251934c) #3 impala::HdfsScanNodeBase::AddDiskIoRanges(impala::HdfsFileDesc const*, impala::EnqueueLocation) /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/exec/hdfs-scan-node-base.h:516:12 (impalad+0x255b3ab) #4 impala::HdfsTextScanner::IssueInitialRanges(impala::HdfsScanNodeBase*, std::vector > const&) /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/exec/hdfs-text-scanner.cc:116:9 (impalad+0x255441b) #5 impala::HdfsScanNodeBase::IssueInitialScanRanges(impala::RuntimeState*) /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/exec/hdfs-scan-node-base.cc:680:9 (impalad+0x24f5b14) #6 impala::HdfsScanNodeMt::Open(impala::RuntimeState*) /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/exec/hdfs-scan-node-mt.cc:58:3 (impalad+0x2518819) #7 impala::AggregationNode::Open(impala::RuntimeState*) 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/exec/aggregation-node.cc:48:3 (impalad+0x266726a) #8 impala::FragmentInstanceState::Open() /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/runtime/fragment-instance-state.cc:348:5 (impalad+0x206f037) #9 impala::FragmentInstanceState::Exec() /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/runtime/fragment-instance-state.cc:93:12 (impalad+0x206d53b) #10 impala::QueryState::ExecFInstance(impala::FragmentInstanceState*) /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/runtime/query-state.cc:763:24 (impalad+0x2081f33) #11 impala::QueryState::StartFInstances()::$_7::operator()() const /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/runtime/query-state.cc:671:37 (impalad+0x20840f2) #12 boost::detail::function::void_function_obj_invoker0::invoke(boost::detail::function::function_buffer&) /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/function/function_template.hpp:159:11 (impalad+0x2083f19) #13 boost::function0::operator()() const /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/function/function_template.hpp:770:14 (impalad+0x1e41101) #14 impala::Thread::SuperviseThread(std::string const&, std::string const&, boost::function, impala::ThreadDebugInfo const*, impala::Promise*) /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/thread.cc:360:3 (impalad+0x2438056) #15 void boost::_bi::list5, boost::_bi::value, boost::_bi::value >, boost::_bi::value, boost::_bi::value*> >::operator(), impala::ThreadDebugInfo const*, impala::Promise*), boost::_bi::list0>(boost::_bi::type, void (*&)(std::string const&, std::string const&, boost::function, impala::ThreadDebugInfo const*, impala::Promise*), boost::_bi::list0&, int) 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/bind.hpp:531:9 (impalad+0x244021c) #16 boost::_bi::bind_t, impala::ThreadDebugInfo const*, impala::Promise*), boost::_bi::list5, boost::_bi::value, boost::_bi::value >, boost::_bi::value, boost::_bi::value*> > >::operator()() /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/bind.hpp:1222:16 (impalad+0x2440133) #17 boost::detail::thread_data, impala::ThreadDebugInfo const*, impala::Promise*), boost::_bi::list5, boost::_bi::value, boost::_bi::value >, boost::_bi::value, boost::_bi::value*> > > >::run() /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/thread/detail/thread.hpp:116:17 (impalad+0x243fe20) #18 thread_proxy (impalad+0x314f369)
[jira] [Created] (IMPALA-9819) Separate data cache and HDFS scan node runtime profile metrics
Sahil Takiar created IMPALA-9819: Summary: Separate data cache and HDFS scan node runtime profile metrics Key: IMPALA-9819 URL: https://issues.apache.org/jira/browse/IMPALA-9819 Project: IMPALA Issue Type: Improvement Reporter: Sahil Takiar Assignee: Joe McDonnell When a query reads data from both a remote storage system (e.g. S3) and the data cache, the HDFS_SCAN_NODE runtime profiles are hard to reason about. For example, in the following runtime profile snippet: {code:java} HDFS_SCAN_NODE (id=0):(Total: 59s374ms, non-child: 0.000ns, % non-child: 0.00%) - AverageHdfsReadThreadConcurrency: 0.62 - AverageScannerThreadConcurrency: 0.91 - BytesRead: 587.97 MB (616533483) - BytesReadDataNodeCache: 0 - BytesReadLocal: 0 - BytesReadRemoteUnexpected: 0 - BytesReadShortCircuit: 0 - CachedFileHandlesHitCount: 323 (323) - CachedFileHandlesMissCount: 94 (94) - CollectionItemsRead: 0 (0) - DataCacheHitBytes: 212.00 MB (94996) - DataCacheHitCount: 107 (107) - DataCacheMissBytes: 375.98 MB (394238486) - DataCacheMissCount: 310 (310) - DataCachePartialHitCount: 0 (0) - DecompressionTime: 2s428ms - MaterializeTupleTime: 19s444ms - MaxCompressedTextFileLength: 0 - NumColumns: 3 (3) - NumDictFilteredRowGroups: 0 (0) - NumDisksAccessed: 1 (1) - NumPages: 53.30K (53300) - NumRowGroups: 83 (83) - NumRowGroupsWithPageIndex: 83 (83) - NumScannerThreadMemUnavailable: 0 (0) - NumScannerThreadReservationsDenied: 0 (0) - NumScannerThreadsStarted: 1 (1) - NumScannersWithNoReads: 0 (0) - NumStatsFilteredPages: 0 (0) - NumStatsFilteredRowGroups: 0 (0) - PeakMemoryUsage: 16.00 MB (16781312) - PeakScannerThreadConcurrency: 1 (1) - PerReadThreadRawHdfsThroughput: 15.11 MB/sec - RemoteScanRanges: 0 (0) - RowBatchBytesEnqueued: 670.68 MB (703260541) - RowBatchQueueGetWaitTime: 59s368ms - RowBatchQueuePeakMemoryUsage: 4.17 MB (4368285) - RowBatchQueuePutWaitTime: 0.000ns - RowBatchesEnqueued: 915 (915) - RowsRead: 413.47M (413466507) - RowsReturned: 722.27K (722275) - RowsReturnedRate: 12.17 K/sec - 
ScanRangesComplete: 83 (83) - ScannerIoWaitTime: 33s454ms - ScannerThreadWorklessLoops: 0 (0) - ScannerThreadsInvoluntaryContextSwitches: 1.94K (1940) - ScannerThreadsTotalWallClockTime: 1m - ScannerThreadsSysTime: 1s181ms - ScannerThreadsUserTime: 20s581ms - ScannerThreadsVoluntaryContextSwitches: 770 (770) - TotalRawHdfsOpenFileTime: 3s396ms - TotalRawHdfsReadTime: 38s940ms - TotalReadThroughput: 8.86 MB/sec {code} The query scanned part of the data from S3 and part of the data from the data cache. The confusing part is that metrics such as PerReadThreadRawHdfsThroughput are measured across S3 and data cache reads. So there is no straightforward way to determine the throughput for *just* S3 reads. Users might want this value to determine if S3 was particularly slow for their query. It would be nice if the scan node metrics more clearly differentiate between reads from S3 vs. the data cache. The aggregate metrics (*Total* metrics) are still useful, but it would be useful to have fine-grained metrics that are specific to a data storage system (e.g. either the data cache or S3).
[jira] [Created] (IMPALA-9818) Add fetch size as option to impala shell
Sahil Takiar created IMPALA-9818: Summary: Add fetch size as option to impala shell Key: IMPALA-9818 URL: https://issues.apache.org/jira/browse/IMPALA-9818 Project: IMPALA Issue Type: Improvement Components: Clients Reporter: Sahil Takiar Assignee: Sahil Takiar The impala shell should have an option to control the fetch size (e.g. the number of rows fetched at a time). Currently the value is hard-coded to 1024. Other clients (e.g. JDBC) have similar options (e.g. Statement#setFetchSize). When result spooling is enabled, setting a higher fetch size can improve performance for clients with a high RTT to/from the Impala coordinator.
[jira] [Resolved] (IMPALA-9794) OutOfMemoryError when loading tpcds text data via Hive
[ https://issues.apache.org/jira/browse/IMPALA-9794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-9794. -- Fix Version/s: Impala 4.0 Resolution: Fixed This was fixed in IMPALA-9777. Impala-EC data-loading is passing now. > OutOfMemoryError when loading tpcds text data via Hive > -- > > Key: IMPALA-9794 > URL: https://issues.apache.org/jira/browse/IMPALA-9794 > Project: IMPALA > Issue Type: Bug > Components: Infrastructure >Affects Versions: Impala 4.0 >Reporter: Quanlong Huang >Assignee: Sahil Takiar >Priority: Blocker > Labels: broken-build > Fix For: Impala 4.0 > > Attachments: load-tpcds-core-hive-generated-text-none-none.sql.log > > > Saw a data loading failure caused by OutOfMemoryError in a test with erasure > coding. The impacted query is inserting data into the store_sales table and > fails: > {code} > Getting log thread is interrupted, since query is done! > ERROR : Status: Failed > ERROR : Vertex failed, vertexName=Reducer 2, > vertexId=vertex_1590450092775_0009_3_01, diagnostics=[Task failed, > taskId=task_1590450092775_0009_3_01_01, diagnostics=[TaskAttempt 0 > failed, info=[Container container_1590450092775_0009_01_03 finished with > diagnostics set to [Container failed, exitCode=-104. [2020-05-25 > 16:49:18.814]Container > [pid=14180,containerID=container_1590450092775_0009_01_03] is running > 44290048B beyond the 'PHYSICAL' memory limit. Current usage: 1.0 GB of 1 GB > physical memory used; 3.3 GB of 2.1 GB virtual memory used. Killing container.
> Dump of the process-tree for container_1590450092775_0009_01_03 : > |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) > SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE > |- 14180 14176 14180 14180 (bash) 0 0 115851264 352 /bin/bash -c > /usr/java/jdk1.8.0_144/bin/java -Xmx819m -server > -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN > -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator > -Dlog4j.configuration=tez-container-log4j.properties > -Dyarn.app.container.log.dir=/data0/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1590450092775_0009/container_1590450092775_0009_01_03 > -Dtez.root.logger=INFO,CLA > -Djava.io.tmpdir=/data0/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/testdata/cluster/cdh7/node-1/var/lib/hadoop-yarn/cache/jenkins/nm-local-dir/usercache/jenkins/appcache/application_1590450092775_0009/container_1590450092775_0009_01_03/tmp > org.apache.tez.runtime.task.TezChild localhost 43422 > container_1590450092775_0009_01_03 application_1590450092775_0009 1 > 1>/data0/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1590450092775_0009/container_1590450092775_0009_01_03/stdout > > 2>/data0/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1590450092775_0009/container_1590450092775_0009_01_03/stderr > |- 14191 14180 14180 14180 (java) 3167 127 3468886016 272605 > /usr/java/jdk1.8.0_144/bin/java -Xmx819m -server > -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN > -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator > -Dlog4j.configuration=tez-container-log4j.properties > 
-Dyarn.app.container.log.dir=/data0/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1590450092775_0009/container_1590450092775_0009_01_03 > -Dtez.root.logger=INFO,CLA > -Djava.io.tmpdir=/data0/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/testdata/cluster/cdh7/node-1/var/lib/hadoop-yarn/cache/jenkins/nm-local-dir/usercache/jenkins/appcache/application_1590450092775_0009/container_1590450092775_0009_01_03/tmp > org.apache.tez.runtime.task.TezChild localhost 43422 > container_1590450092775_0009_01_03 application_1590450092775_0009 1 > [2020-05-25 16:49:18.884]Container killed on request. Exit code is 143 > [2020-05-25 16:49:18.887]Container exited with a non-zero exit code 143. > ]], TaskAttempt 1 failed, info=[Error: Error while running task ( failure ) : > java.lang.OutOfMemoryError: Java heap space > at java.nio.HeapByteBuffer.(HeapByteBuffer.java:57) > at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) > at > org.apache.hadoop.io.ElasticByteBufferPool.getBuffer(ElasticByteBufferPool.java:96) > at > org.apache.hadoop.hdfs.DFSStripedOutputStre
[jira] [Resolved] (IMPALA-9777) Reduce the diskspace requirements of loading the text version of tpcds.store_sales
[ https://issues.apache.org/jira/browse/IMPALA-9777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-9777. -- Fix Version/s: Impala 4.0 Resolution: Fixed Fixed. Looks like Impala-EC data loading is passing now. > Reduce the diskspace requirements of loading the text version of > tpcds.store_sales > -- > > Key: IMPALA-9777 > URL: https://issues.apache.org/jira/browse/IMPALA-9777 > Project: IMPALA > Issue Type: Improvement > Components: Infrastructure >Affects Versions: Impala 4.0 >Reporter: Joe McDonnell >Assignee: Sahil Takiar >Priority: Major > Fix For: Impala 4.0 > > Attachments: namenodeparse.py > > > Currently, dataload for the Impala development environment uses Hive to > populate tpcds.store_sales. We use several insert statements that select from > tpcds.stores_sales_unpartitioned, which is loaded from text files. The > inserts have this form: > {noformat} > insert overwrite table {table_name} partition(ss_sold_date_sk) > select ss_sold_time_sk, > ss_item_sk, > ss_customer_sk, > ss_cdemo_sk, > ss_hdemo_sk, > ss_addr_sk, > ss_store_sk, > ss_promo_sk, > ss_ticket_number, > ss_quantity, > ss_wholesale_cost, > ss_list_price, > ss_sales_price, > ss_ext_discount_amt, > ss_ext_sales_price, > ss_ext_wholesale_cost, > ss_ext_list_price, > ss_ext_tax, > ss_coupon_amt, > ss_net_paid, > ss_net_paid_inc_tax, > ss_net_profit, > ss_sold_date_sk > from store_sales_unpartitioned > WHERE ss_sold_date_sk < 2451272 > distribute by ss_sold_date_sk;{noformat} > Since this is inserting into a partitioned table, it is creating a file per > partition. Each statement manipulates hundreds of partitions. With the > current settings, the Hive implementation of this insert opens several > hundred files simultaneously (by my measurement, ~450). HDFS reserves a whole > block for each file (even though the resulting files are not large), and if > there isn't enough disk space for all of the reservations, then these inserts > can fail. 
This is a common problem in development environments. This is > currently failing for erasure coding tests. > Impala uses clustered inserts where the input is sorted and files are written > one at a time (per backend). This limits the number of simultaneously open > files, eliminating the corresponding disk space reservation. Switching > the population of tpcds.store_sales to Impala would reduce the diskspace > requirement for an Impala developer environment. Alternatively, there is > likely equivalent Hive functionality for doing an initial sort so that only > one partition needs to be written at a time. > This only applies to the text version of store_sales, which is created from > store_sales_unpartitioned. All other formats are created from the text > version of store_sales. Since the text store_sales is already partitioned in > the same way as the destination store_sales, Hive can be more efficient, > processing a small number of partitions at a time.
[jira] [Resolved] (IMPALA-9757) Test failures with HiveServer2Error: Invalid session id
[ https://issues.apache.org/jira/browse/IMPALA-9757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-9757. -- Fix Version/s: Impala 4.0 Resolution: Fixed > Test failures with HiveServer2Error: Invalid session id > --- > > Key: IMPALA-9757 > URL: https://issues.apache.org/jira/browse/IMPALA-9757 > Project: IMPALA > Issue Type: Test >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Critical > Labels: broken-build, flaky > Fix For: Impala 4.0 > > > Only seen once so far on an exhaustive build. It's not clear if the > "HiveServer2Error: Invalid session id" error is specific to this test or not. > {code:java} > query_test.test_queries.TestQueries.test_inline_view[protocol: hs2-http | > exec_option: {'batch_size': 0, 'num_nodes': 0, > 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, > 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: > parquet/none] (from pytest) > Error Message > query_test/test_queries.py:104: in test_inline_view > self.run_test_case('QueryTest/inline-view', vector) > common/impala_test_suite.py:567: in run_test_case table_format_info, > use_db, pytest.config.option.scale_factor) common/impala_test_suite.py:782: > in change_database impala_client.execute(query) > common/impala_connection.py:331: in execute handle = > self.execute_async(sql_stmt, user) common/impala_connection.py:354: in > execute_async self.__cursor.execute_async(sql_stmt, > configuration=self.__query_options) > /data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/infra/python/env/lib/python2.7/site-packages/impala/hiveserver2.py:375: > in execute_async self._execute_async(op) > /data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/infra/python/env/lib/python2.7/site-packages/impala/hiveserver2.py:394: > in _execute_async operation_fn() > 
/data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/infra/python/env/lib/python2.7/site-packages/impala/hiveserver2.py:372: > in op run_async=True) > /data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/infra/python/env/lib/python2.7/site-packages/impala/hiveserver2.py:1096: > in execute return self._operation('ExecuteStatement', req) > /data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/infra/python/env/lib/python2.7/site-packages/impala/hiveserver2.py:1026: > in _operation resp = self._rpc(kind, request) > /data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/infra/python/env/lib/python2.7/site-packages/impala/hiveserver2.py:994: > in _rpc err_if_rpc_not_ok(response) > /data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/infra/python/env/lib/python2.7/site-packages/impala/hiveserver2.py:748: > in err_if_rpc_not_ok raise HiveServer2Error(resp.status.errorMessage) E > HiveServer2Error: Invalid session id: 3345279d9b2e75ab:3aef93f7a80d7d8a > Stacktrace > query_test/test_queries.py:104: in test_inline_view > self.run_test_case('QueryTest/inline-view', vector) > common/impala_test_suite.py:567: in run_test_case > table_format_info, use_db, pytest.config.option.scale_factor) > common/impala_test_suite.py:782: in change_database > impala_client.execute(query) > common/impala_connection.py:331: in execute > handle = self.execute_async(sql_stmt, user) > common/impala_connection.py:354: in execute_async > self.__cursor.execute_async(sql_stmt, configuration=self.__query_options) > /data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/infra/python/env/lib/python2.7/site-packages/impala/hiveserver2.py:375: > in execute_async > self._execute_async(op) > /data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/infra/python/env/lib/python2.7/site-packages/impala/hiveserver2.py:394: > in _execute_async > operation_fn() > 
/data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/infra/python/env/lib/python2.7/site-packages/impala/hiveserver2.py:372: > in op > run_async=True) > /data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/infra/python/env/lib/python2.7/site-packages/impala/hiveserver2.py:1096: > in execute > return self._operation('ExecuteStatement', req) > /data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/infra/python/env/lib/python2.7/site-packages/impala/hiveserver2.py:1026: > in _operation > resp = self._rpc(kind, request) > /data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/infra/python/env/lib/python2.7/site-packages/impala/hiveserver2.py:994: > in _rpc > err_if_rpc_not_ok(response) > /data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/infra/python/env/lib/python2.7/site-packages/impal
[jira] [Resolved] (IMPALA-9806) Multiple data load failures on HDFS errors for erasure coding builds
[ https://issues.apache.org/jira/browse/IMPALA-9806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-9806. -- Resolution: Duplicate Closing as dup of IMPALA-9794 and IMPALA-9777 > Multiple data load failures on HDFS errors for erasure coding builds > > > Key: IMPALA-9806 > URL: https://issues.apache.org/jira/browse/IMPALA-9806 > Project: IMPALA > Issue Type: Bug > Components: Infrastructure >Affects Versions: Impala 4.0 >Reporter: Laszlo Gaal >Priority: Blocker > > Erasure coding build shows data load failures for TPC-H, TPC-DS and > functional-query data sets, all on HDFS errors. Errors are triggered both > from Hive and Impala. Pasting the failure log section for TPC-H as it is a > lot shorter, but the Java backtrace for functional-query (breaking in > Hive/Tez) eventually runs into the same HDFS log pattern: > {code} > INSERT OVERWRITE TABLE tpch_parquet.region SELECT * FROM tpch.region > Summary: Inserted 5 rows > Success: True > Took: 0.264951944351(s) > Data: > : 5 > ERROR: INSERT OVERWRITE TABLE tpch_parquet.orders SELECT * FROM tpch.orders > Traceback (most recent call last): > File > "/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/bin/load-data.py", > line 208, in exec_impala_query_from_file > result = impala_client.execute(query) > File > "/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/tests/beeswax/impala_beeswax.py", > line 187, in execute > handle = self.__execute_query(query_string.strip(), user=user) > File > "/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/tests/beeswax/impala_beeswax.py", > line 365, in __execute_query > self.wait_for_finished(handle) > File > "/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/tests/beeswax/impala_beeswax.py", > line 386, in wait_for_finished > raise ImpalaBeeswaxException("Query aborted:" + error_log, None) > ImpalaBeeswaxException: ImpalaBeeswaxException: > 
Query aborted:Failed to write data (length: 159515) to Hdfs file: > hdfs://localhost:20500/test-warehouse/tpch.orders_parquet/_impala_insert_staging/7c411965970f926e_f61b13b7/.7c411965970f926e-f61b13b7_2077531399_dir/7c411965970f926e-f61b13b7_1445532249_data.0.parq > > Error(255): Unknown error 255 > Root cause: RemoteException: File > /test-warehouse/tpch.orders_parquet/_impala_insert_staging/7c411965970f926e_f61b13b7/.7c411965970f926e-f61b13b7_2077531399_dir/7c411965970f926e-f61b13b7_1445532249_data.0.parq > could only be written to 0 of the 3 required nodes for RS-3-2-1024k. There > are 5 datanode(s) running and 5 node(s) are excluded in this operation. > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2266) > at > org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2773) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:879) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:583) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:985) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:913) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2882) > Failed to close HDFS file: > 
hdfs://localhost:20500/test-warehouse/tpch.orders_parquet/_impala_insert_staging/7c411965970f926e_f61b13b7/.7c411965970f926e-f61b13b7_2077531399_dir/7c411965970f926e-f61b13b7_1445532249_data.0.parq > Error(255): Unknown error 255 > Root cause: RemoteException: File > /test-warehouse/tpch.orders_parquet/_impala_insert_staging/7c411965970f926e_f61b13b7/.7c411965970f926e-f61b13b7_2077531399_dir/7c411965970f926e-f61b13b7_1445532249_data.0.parq > could only be written to 0 of the 3 required nodes for RS-3-2-1024k. There > are 5 datanode(s) running and 5 node(s) ar
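The arithmetic behind the HDFS error above is worth spelling out: an RS-3-2 erasure-coding policy stripes each block group across 3 data units plus 2 parity units, so a healthy write wants 5 distinct datanodes and needs at least 3 (the "3 required nodes" in the message); with all 5 datanodes excluded, allocation fails. A rough sketch of that arithmetic, with an illustrative helper name (not HDFS code):

```python
import re

def ec_policy_requirements(policy):
    """Parse an HDFS EC policy name like 'RS-3-2-1024k' and return
    (data_units, parity_units, nodes_for_full_stripe, min_nodes_to_write).

    A full-width write needs one datanode per stripe unit (data + parity);
    the minimum to write at all is the number of data units.
    """
    m = re.match(r"RS-(\d+)-(\d+)-", policy)
    if not m:
        raise ValueError("unrecognized EC policy: %s" % policy)
    data, parity = int(m.group(1)), int(m.group(2))
    return data, parity, data + parity, data

# RS-3-2-1024k -> wants 5 datanodes, requires at least 3, matching the error.
data, parity, full, minimum = ec_policy_requirements("RS-3-2-1024k")
```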
[jira] [Created] (IMPALA-9767) ASAN crash during coordinator runtime filter updates
Sahil Takiar created IMPALA-9767: Summary: ASAN crash during coordinator runtime filter updates Key: IMPALA-9767 URL: https://issues.apache.org/jira/browse/IMPALA-9767 Project: IMPALA Issue Type: Test Reporter: Sahil Takiar Assignee: Fang-Yu Rao ASAN crash output: {code:java} Error MessageAddress Sanitizer message detected in /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/ee_tests/impalad.ERRORStandard Error==4808==ERROR: AddressSanitizer: heap-use-after-free on address 0x7f6288cbe818 at pc 0x0199f6fe bp 0x7f63c1a8b270 sp 0x7f63c1a8aa20 READ of size 1048576 at 0x7f6288cbe818 thread T73 (rpc reactor-552) #0 0x199f6fd in read_iovec(void*, __sanitizer::__sanitizer_iovec*, unsigned long, unsigned long) /mnt/source/llvm/llvm-5.0.1.src-p2/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:904 #1 0x19a1f57 in read_msghdr(void*, __sanitizer::__sanitizer_msghdr*, long) /mnt/source/llvm/llvm-5.0.1.src-p2/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:2781 #2 0x19a46c3 in __interceptor_sendmsg /mnt/source/llvm/llvm-5.0.1.src-p2/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:2796 #3 0x372034d in kudu::Socket::Writev(iovec const*, int, long*) /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/kudu/util/net/socket.cc:447:3 #4 0x331c095 in kudu::rpc::OutboundTransfer::SendBuffer(kudu::Socket&) /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/kudu/rpc/transfer.cc:227:26 #5 0x3324da1 in kudu::rpc::Connection::WriteHandler(ev::io&, int) /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/kudu/rpc/connection.cc:802:31 #6 0x52ca4e2 in ev_invoke_pending (/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/service/impalad+0x52ca4e2) #7 0x32aeadc in kudu::rpc::ReactorThread::InvokePendingCb(ev_loop*) 
/data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/kudu/rpc/reactor.cc:196:3 #8 0x52cdb03 in ev_run (/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/service/impalad+0x52cdb03) #9 0x32aecd1 in kudu::rpc::ReactorThread::RunThread() /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/kudu/rpc/reactor.cc:497:9 #10 0x32c08db in boost::_bi::bind_t, boost::_bi::list1 > >::operator()() /data/jenkins/workspace/impala-asf-master-core-asan/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/bind.hpp:1222:16 #11 0x2148c26 in boost::function0::operator()() const /data/jenkins/workspace/impala-asf-master-core-asan/Impala-Toolchain/boost-1.61.0-p2/include/boost/function/function_template.hpp:770:14 #12 0x2144b29 in kudu::Thread::SuperviseThread(void*) /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/kudu/util/thread.cc:675:3 #13 0x7f6c0bcf4e24 in start_thread (/lib64/libpthread.so.0+0x7e24) #14 0x7f6c0885834c in __clone (/lib64/libc.so.6+0xf834c) 0x7f6288cbe818 is located 24 bytes inside of 1052640-byte region [0x7f6288cbe800,0x7f6288dbf7e0) freed by thread T114 here: #0 0x1a773e0 in operator delete(void*) /mnt/source/llvm/llvm-5.0.1.src-p2/projects/compiler-rt/lib/asan/asan_new_delete.cc:137 #1 0x7f6c090faed3 in __gnu_cxx::new_allocator::deallocate(char*, unsigned long) /mnt/source/gcc/build-4.9.2/x86_64-unknown-linux-gnu/libstdc++-v3/include/ext/new_allocator.h:110 #2 0x7f6c090faed3 in std::string::_Rep::_M_destroy(std::allocator const&) /mnt/source/gcc/build-4.9.2/x86_64-unknown-linux-gnu/libstdc++-v3/include/bits/basic_string.tcc:449 #3 0x7f6c090faed3 in std::string::_Rep::_M_dispose(std::allocator const&) /mnt/source/gcc/build-4.9.2/x86_64-unknown-linux-gnu/libstdc++-v3/include/bits/basic_string.h:249 #4 0x7f6c090faed3 in std::string::reserve(unsigned long) /mnt/source/gcc/build-4.9.2/x86_64-unknown-linux-gnu/libstdc++-v3/include/bits/basic_string.tcc:511 #5 0x2781865 in 
impala::ClientRequestState::UpdateFilter(impala::UpdateFilterParamsPB const&, kudu::rpc::RpcContext*) /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/service/client-request-state.cc:1451:11 #6 0x26d57d5 in impala::ImpalaServer::UpdateFilter(impala::UpdateFilterResultPB*, impala::UpdateFilterParamsPB const&, kudu::rpc::RpcContext*) /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/service/impala-server.cc:2694:19 #7 0x266bd65 in impala::DataStreamService::UpdateFilter(impala::UpdateFilterParamsPB const*, impala::UpdateFilterResultPB*, kudu::rpc::RpcContext*) /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/service/data-stream-service.cc:119:44 #8 0x27a1eed in std::_Function_handler const&, scoped_refptr const&)::$_5>::_M_invok
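The ASAN report above is a classic ownership bug: the RPC reactor thread is still reading a buffer while `std::string::reserve` in `ClientRequestState::UpdateFilter` reallocates and frees it on another thread. The general rule, independent of C++: never let an async sender hold a raw view into memory another thread may reallocate; copy the bytes or share ownership. As a loose illustration of the same shape of bug, CPython's buffer protocol refuses exactly this pattern:

```python
buf = bytearray(b"filter-payload")
view = memoryview(buf)      # an "async sender" keeps a view into the buffer

# The owner tries to grow (i.e. reallocate) the buffer while the view is
# live -- the same shape as the use-after-free above. CPython refuses:
try:
    buf.extend(b" more")
    resized_while_exported = True
except BufferError:
    resized_while_exported = False

view.release()              # sender is done; ownership is unambiguous again
buf.extend(b" more")        # now resizing is safe
```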
[jira] [Resolved] (IMPALA-9755) Flaky test: test_global_exchange_counters
[ https://issues.apache.org/jira/browse/IMPALA-9755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-9755. -- Fix Version/s: Impala 4.0 Resolution: Fixed > Flaky test: test_global_exchange_counters > - > > Key: IMPALA-9755 > URL: https://issues.apache.org/jira/browse/IMPALA-9755 > Project: IMPALA > Issue Type: Bug > Components: Infrastructure >Reporter: Tim Armstrong >Assignee: Sahil Takiar >Priority: Blocker > Labels: flaky > Fix For: Impala 4.0 > > > {noformat} > query_test.test_observability.TestObservability.test_global_exchange_counters > (from pytest) > Failing for the past 1 build (Since Failed#10637 ) > Took 22 sec. > add description > Error Message > query_test/test_observability.py:504: in test_global_exchange_counters > assert m, "Cannot match pattern for key %s in line '%s'" % (key, line) E > AssertionError: Cannot match pattern for key TotalBytesSent in line ' > - TotalBytesSent: 0' E assert None > Stacktrace > query_test/test_observability.py:504: in test_global_exchange_counters > assert m, "Cannot match pattern for key %s in line '%s'" % (key, line) > E AssertionError: Cannot match pattern for key TotalBytesSent in line ' > - TotalBytesSent: 0' > E assert None > {noformat} > https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/10637/testReport/junit/query_test.test_observability/TestObservability/test_global_exchange_counters/ > Filing in case it reoccurs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
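The failure mode here is visible in the assertion text: the test's pattern for `TotalBytesSent` assumed a value with a unit suffix (e.g. `28.67 KB (29355)`), but a zero counter is printed as a bare `0`. A hypothetical reconstruction of the pattern and its fix (the actual regex in test_observability.py may differ):

```python
import re

# Assumed strict pattern: value always followed by '<unit> (<raw>)'.
strict = re.compile(r"TotalBytesSent: \d+(\.\d+)? [KMG]?B \(\d+\)")

# Fixed pattern: the unit/raw suffix is optional, so a bare '0' matches.
lenient = re.compile(r"TotalBytesSent: \d+(\.\d+)?( [KMG]?B \(\d+\))?")

assert strict.search("TotalBytesSent: 28.67 KB (29355)")
assert not strict.search("TotalBytesSent: 0")      # the flaky case
assert lenient.search("TotalBytesSent: 0")
```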
[jira] [Resolved] (IMPALA-9534) Kudu show create table tests fail due to case difference for external.table.purge
[ https://issues.apache.org/jira/browse/IMPALA-9534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-9534. -- Fix Version/s: Impala 4.0 Resolution: Fixed This was fixed a while ago by a bug fix on the Hive side. > Kudu show create table tests fail due to case difference for > external.table.purge > - > > Key: IMPALA-9534 > URL: https://issues.apache.org/jira/browse/IMPALA-9534 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Affects Versions: Impala 4.0 >Reporter: Joe McDonnell >Assignee: Sahil Takiar >Priority: Blocker > Labels: broken-build > Fix For: Impala 4.0 > > > When updating to the latest CDP GBN, there are test failures due to our tests > expecting external.table.purge=TRUE (upper case) whereas it is actually > external.table.purge=true (lower case): > > {noformat} > query_test/test_kudu.py:862: in test_primary_key_and_distribution > db=cursor.conn.db_name, kudu_addr=KUDU_MASTER_HOSTS)) > query_test/test_kudu.py:836: in assert_show_create_equals > assert "TBLPROPERTIES ('external.table.purge'='TRUE', " in output > E assert "TBLPROPERTIES ('external.table.purge'='TRUE', " in "CREATE > EXTERNAL TABLE testshowcreatetable_6928_i0obd1.jlxsrpzmcu (\n c INT NOT NULL > ENCODING AUTO_ENCODING COMPRESSI...H (c) PARTITIONS 3\nSTORED AS > KUDU\nTBLPROPERTIES ('external.table.purge'='true', > 'kudu.master_addresses'='localhost')"{noformat} > This impacts the following tests: > > > {noformat} > metadata.test_ddl.TestDdlStatements.test_create_alter_tbl_properties > metadata.test_show_create_table.TestShowCreateTable.test_show_create_table > query_test.test_kudu.TestShowCreateTable.test_primary_key_and_distribution > query_test.test_kudu.TestShowCreateTable.test_timestamp_default_value > query_test.test_kudu.TestShowCreateTable.test_managed_kudu_table_name_with_show_create > org.apache.impala.catalog.local.LocalCatalogTest.testKuduTable{noformat} > I think we can just make these case insensitive. 
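The case-insensitive comparison suggested above is a one-line fix on the test side. A hypothetical helper (not Impala's actual test code) showing the idea:

```python
def has_tblproperty(show_create_output, key, value):
    """Check SHOW CREATE TABLE output for a table property, ignoring the
    case of the value ('TRUE' vs 'true'). Illustrative helper only."""
    needle = "'%s'='%s'" % (key, value)
    return needle.lower() in show_create_output.lower()

out = ("TBLPROPERTIES ('external.table.purge'='true', "
       "'kudu.master_addresses'='localhost')")
# Matches whether Hive emits 'true' or 'TRUE'.
assert has_tblproperty(out, "external.table.purge", "TRUE")
```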
[jira] [Resolved] (IMPALA-9608) Multiple query tests failure due to org.apache.hadoop.hive.ql.exec.tez.TezTask execution error
[ https://issues.apache.org/jira/browse/IMPALA-9608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-9608. -- Fix Version/s: Impala 4.0 Resolution: Fixed Resolved since IMPALA-9365 disabled all these tests on non-HDFS filesystems. > Multiple query tests failure due to > org.apache.hadoop.hive.ql.exec.tez.TezTask execution error > -- > > Key: IMPALA-9608 > URL: https://issues.apache.org/jira/browse/IMPALA-9608 > Project: IMPALA > Issue Type: Bug > Components: Infrastructure >Affects Versions: Impala 3.4.0 >Reporter: Alice Fan >Priority: Blocker > Labels: broken-build > Fix For: Impala 4.0 > > > Multiple query tests failure due to > org.apache.hadoop.hive.ql.exec.tez.TezTask execution error > at impala-cdpd-master-core-s3 build > {code:java} > query_test.test_acid.TestAcid.test_acid_negative > query_test.test_mt_dop.TestMtDop.test_compute_stats > query_test.test_nested_types.TestNestedTypesNoMtDop.test_partitioned_table_acid > query_test.test_mt_dop.TestMtDop.test_compute_stats > query_test.test_scanners.TestUnmatchedSchema.test_unmatched_schema > query_test.test_mt_dop.TestMtDop.test_compute_stats > query_test.test_scanners_fuzz.TestScannersFuzzing.test_fuzz_decimal_tbl > query_test.test_scanners_fuzz.TestScannersFuzzing.test_fuzz_uncompressed_parquet_orc > query_test.test_mt_dop.TestMtDop.test_compute_stats > query_test.test_scanners_fuzz.TestScannersFuzzing.test_fuzz_decimal_tbl > query_test.test_scanners_fuzz.TestScannersFuzzing.test_fuzz_uncompressed_parquet_orc > query_test.test_scanners_fuzz.TestScannersFuzzing.test_fuzz_alltypes > query_test.test_scanners_fuzz.TestScannersFuzzing.test_fuzz_decimal_tbl > query_test.test_scanners_fuzz.TestScannersFuzzing.test_fuzz_nested_types > query_test.test_scanners_fuzz.TestScannersFuzzing.test_fuzz_alltypes > query_test.test_scanners_fuzz.TestScannersFuzzing.test_fuzz_nested_types > query_test.test_scanners_fuzz.TestScannersFuzzing.test_fuzz_alltypes > 
query_test.test_scanners_fuzz.TestScannersFuzzing.test_fuzz_nested_types > query_test.test_scanners_fuzz.TestScannersFuzzing.test_fuzz_uncompressed_parquet_orc > {code} > For example: > Error Message > query_test/test_acid.py:65: in test_acid_negative > self.run_test_case('QueryTest/acid-negative', vector, use_db=unique_database) > common/impala_test_suite.py:659: in run_test_case result = exec_fn(query, > user=test_section.get('USER', '').strip() or None) > common/impala_test_suite.py:610: in __exec_in_hive result = > h.execute(query, user=user) common/impala_connection.py:334: in execute r > = self.__fetch_results(handle, profile_format=profile_format) > common/impala_connection.py:441: in __fetch_results > cursor._wait_to_finish() > /data/jenkins/workspace/impala-cdpd-master-core-s3/repos/Impala/infra/python/env/lib/python2.7/site-packages/impala/hiveserver2.py:412: > in _wait_to_finish raise OperationalError(resp.errorMessage) E > OperationalError: Error while compiling statement: FAILED: Execution Error, > return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask > Stacktrace > query_test/test_acid.py:65: in test_acid_negative > self.run_test_case('QueryTest/acid-negative', vector, > use_db=unique_database) > common/impala_test_suite.py:659: in run_test_case > result = exec_fn(query, user=test_section.get('USER', '').strip() or None) > common/impala_test_suite.py:610: in __exec_in_hive > result = h.execute(query, user=user) > common/impala_connection.py:334: in execute > r = self.__fetch_results(handle, profile_format=profile_format) > common/impala_connection.py:441: in __fetch_results > cursor._wait_to_finish() > /data/jenkins/workspace/impala-cdpd-master-core-s3/repos/Impala/infra/python/env/lib/python2.7/site-packages/impala/hiveserver2.py:412: > in _wait_to_finish > raise OperationalError(resp.errorMessage) > E OperationalError: Error while compiling statement: FAILED: Execution > Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask -- 
[jira] [Resolved] (IMPALA-9758) TestImpalaShell.test_summary consistently failing
[ https://issues.apache.org/jira/browse/IMPALA-9758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-9758. -- Resolution: Duplicate > TestImpalaShell.test_summary consistently failing > - > > Key: IMPALA-9758 > URL: https://issues.apache.org/jira/browse/IMPALA-9758 > Project: IMPALA > Issue Type: Test > Components: Backend >Reporter: Sahil Takiar >Assignee: Tim Armstrong >Priority: Major > > TestImpalaShell.test_summary[table_format_and_file_extension: ('textfile', > '.txt') | protocol: beeswax] is consistently failing: > {code:java} > shell.test_shell_commandline.TestImpalaShell.test_summary[table_format_and_file_extension: > ('textfile', '.txt') | protocol: beeswax] (from pytest) > Error Message > /data/jenkins/workspace/impala-cdpd-master-core/repos/Impala/tests/shell/test_shell_commandline.py:345: > in test_summary result_set = run_impala_shell_cmd(vector, args) > shell/util.py:172: in run_impala_shell_cmd result.stderr) E > AssertionError: Cmd ['-q', 'show tables; summary;'] was expected to succeed: > Server version: impalad version 4.0.0-SNAPSHOT DEBUG (build > 03391ec2b4649f02307a4a89a504bc8394007158) E Query: show tables E Fetched > 3 row(s) in 0.02s E ERROR: Query id 544943184e4d6a8f:8cdea0fe not > found. EE Could not execute command: summary > Stacktrace > /data/jenkins/workspace/impala-cdpd-master-core/repos/Impala/tests/shell/test_shell_commandline.py:345: > in test_summary > result_set = run_impala_shell_cmd(vector, args) > shell/util.py:172: in run_impala_shell_cmd > result.stderr) > E AssertionError: Cmd ['-q', 'show tables; summary;'] was expected to > succeed: Server version: impalad version 4.0.0-SNAPSHOT DEBUG (build > 03391ec2b4649f02307a4a89a504bc8394007158) > E Query: show tables > E Fetched 3 row(s) in 0.02s > E ERROR: Query id 544943184e4d6a8f:8cdea0fe not found. > E > E Could not execute command: summary{code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IMPALA-9758) TestImpalaShell.test_summary consistently failing
Sahil Takiar created IMPALA-9758: Summary: TestImpalaShell.test_summary consistently failing Key: IMPALA-9758 URL: https://issues.apache.org/jira/browse/IMPALA-9758 Project: IMPALA Issue Type: Test Components: Backend Reporter: Sahil Takiar Assignee: Tim Armstrong TestImpalaShell.test_summary[table_format_and_file_extension: ('textfile', '.txt') | protocol: beeswax] is consistently failing: {code:java} shell.test_shell_commandline.TestImpalaShell.test_summary[table_format_and_file_extension: ('textfile', '.txt') | protocol: beeswax] (from pytest) Error Message /data/jenkins/workspace/impala-cdpd-master-core/repos/Impala/tests/shell/test_shell_commandline.py:345: in test_summary result_set = run_impala_shell_cmd(vector, args) shell/util.py:172: in run_impala_shell_cmd result.stderr) E AssertionError: Cmd ['-q', 'show tables; summary;'] was expected to succeed: Server version: impalad version 4.0.0-SNAPSHOT DEBUG (build 03391ec2b4649f02307a4a89a504bc8394007158) E Query: show tables E Fetched 3 row(s) in 0.02s E ERROR: Query id 544943184e4d6a8f:8cdea0fe not found. EE Could not execute command: summary Stacktrace /data/jenkins/workspace/impala-cdpd-master-core/repos/Impala/tests/shell/test_shell_commandline.py:345: in test_summary result_set = run_impala_shell_cmd(vector, args) shell/util.py:172: in run_impala_shell_cmd result.stderr) E AssertionError: Cmd ['-q', 'show tables; summary;'] was expected to succeed: Server version: impalad version 4.0.0-SNAPSHOT DEBUG (build 03391ec2b4649f02307a4a89a504bc8394007158) E Query: show tables E Fetched 3 row(s) in 0.02s E ERROR: Query id 544943184e4d6a8f:8cdea0fe not found. E E Could not execute command: summary{code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IMPALA-9757) TestQueries.test_inline_view fails with HiveServer2Error: Invalid session id
Sahil Takiar created IMPALA-9757: Summary: TestQueries.test_inline_view fails with HiveServer2Error: Invalid session id Key: IMPALA-9757 URL: https://issues.apache.org/jira/browse/IMPALA-9757 Project: IMPALA Issue Type: Test Reporter: Sahil Takiar Only seen once so far on an exhaustive build. It's not clear if the "HiveServer2Error: Invalid session id" error is specific to this test or not. {code:java} query_test.test_queries.TestQueries.test_inline_view[protocol: hs2-http | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: parquet/none] (from pytest) Error Message query_test/test_queries.py:104: in test_inline_view self.run_test_case('QueryTest/inline-view', vector) common/impala_test_suite.py:567: in run_test_case table_format_info, use_db, pytest.config.option.scale_factor) common/impala_test_suite.py:782: in change_database impala_client.execute(query) common/impala_connection.py:331: in execute handle = self.execute_async(sql_stmt, user) common/impala_connection.py:354: in execute_async self.__cursor.execute_async(sql_stmt, configuration=self.__query_options) /data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/infra/python/env/lib/python2.7/site-packages/impala/hiveserver2.py:375: in execute_async self._execute_async(op) /data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/infra/python/env/lib/python2.7/site-packages/impala/hiveserver2.py:394: in _execute_async operation_fn() /data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/infra/python/env/lib/python2.7/site-packages/impala/hiveserver2.py:372: in op run_async=True) /data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/infra/python/env/lib/python2.7/site-packages/impala/hiveserver2.py:1096: in execute return self._operation('ExecuteStatement', req) 
/data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/infra/python/env/lib/python2.7/site-packages/impala/hiveserver2.py:1026: in _operation resp = self._rpc(kind, request) /data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/infra/python/env/lib/python2.7/site-packages/impala/hiveserver2.py:994: in _rpc err_if_rpc_not_ok(response) /data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/infra/python/env/lib/python2.7/site-packages/impala/hiveserver2.py:748: in err_if_rpc_not_ok raise HiveServer2Error(resp.status.errorMessage) E HiveServer2Error: Invalid session id: 3345279d9b2e75ab:3aef93f7a80d7d8a Stacktrace query_test/test_queries.py:104: in test_inline_view self.run_test_case('QueryTest/inline-view', vector) common/impala_test_suite.py:567: in run_test_case table_format_info, use_db, pytest.config.option.scale_factor) common/impala_test_suite.py:782: in change_database impala_client.execute(query) common/impala_connection.py:331: in execute handle = self.execute_async(sql_stmt, user) common/impala_connection.py:354: in execute_async self.__cursor.execute_async(sql_stmt, configuration=self.__query_options) /data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/infra/python/env/lib/python2.7/site-packages/impala/hiveserver2.py:375: in execute_async self._execute_async(op) /data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/infra/python/env/lib/python2.7/site-packages/impala/hiveserver2.py:394: in _execute_async operation_fn() /data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/infra/python/env/lib/python2.7/site-packages/impala/hiveserver2.py:372: in op run_async=True) /data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/infra/python/env/lib/python2.7/site-packages/impala/hiveserver2.py:1096: in execute return self._operation('ExecuteStatement', req) 
/data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/infra/python/env/lib/python2.7/site-packages/impala/hiveserver2.py:1026: in _operation resp = self._rpc(kind, request) /data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/infra/python/env/lib/python2.7/site-packages/impala/hiveserver2.py:994: in _rpc err_if_rpc_not_ok(response) /data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/infra/python/env/lib/python2.7/site-packages/impala/hiveserver2.py:748: in err_if_rpc_not_ok raise HiveServer2Error(resp.status.errorMessage) E HiveServer2Error: Invalid session id: 3345279d9b2e75ab:3aef93f7a80d7d8a {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (IMPALA-3717) Additional s3 setting to allow encryption algorithm
[ https://issues.apache.org/jira/browse/IMPALA-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-3717. -- Fix Version/s: Not Applicable Resolution: Fixed > Additional s3 setting to allow encryption algorithm > --- > > Key: IMPALA-3717 > URL: https://issues.apache.org/jira/browse/IMPALA-3717 > Project: IMPALA > Issue Type: New Feature > Components: Infrastructure >Affects Versions: Impala 2.6.0 >Reporter: Pavas Garg >Priority: Minor > Labels: s3 > Fix For: Not Applicable > > > distcp and impala require an additional s3 setting on the configuration > 1. To allow not only the selection of encryption algorithm but > 2. Also the master key name (which will be held within the AWS KMS). > The S3 API has the following option on the rest service to achieve this > "x-amz-server-side-encryption-aws-kms-key-id". > This should just be a case of adding the config option and passing this on to > the S3 call. > Please see Server-Side Encryption Specific Request Headers on - > http://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPUT.html. -- This message was sent by Atlassian Jira (v8.3.4#803005)
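For reference, the config option requested above would translate into the SSE request headers named in the issue on each S3 REST PUT. A minimal sketch building those headers; the helper name is hypothetical:

```python
def sse_kms_headers(kms_key_id=None):
    """Build S3 server-side-encryption request headers for a REST PUT.

    'aws:kms' selects KMS-managed encryption; the key-id header is optional
    and, when omitted, S3 falls back to the account/bucket default KMS key.
    Illustrative helper, not an actual Impala or distcp API.
    """
    headers = {"x-amz-server-side-encryption": "aws:kms"}
    if kms_key_id:
        headers["x-amz-server-side-encryption-aws-kms-key-id"] = kms_key_id
    return headers
```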
[jira] [Resolved] (IMPALA-2638) Retry queries that fail during scheduling
[ https://issues.apache.org/jira/browse/IMPALA-2638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-2638. -- Resolution: Duplicate Closing as a duplicate because this use case is handled by Node Blacklisting (IMPALA-9299) and Transparent Query Retries (IMPALA-9124). > Retry queries that fail during scheduling > - > > Key: IMPALA-2638 > URL: https://issues.apache.org/jira/browse/IMPALA-2638 > Project: IMPALA > Issue Type: Improvement > Components: Distributed Exec >Affects Versions: Impala 2.3.0 >Reporter: Henry Robinson >Assignee: Sahil Takiar >Priority: Minor > Labels: scalability > > An important building block for node-decommissioning is the ability to retry > queries if they fail during scheduling for some recoverable reason (e.g. RPC > failed due to unreachable host, fragment could not be started due to memory > pressure). > To do this we can detect failures during {{Coordinator::Exec()}}, cancel the > running query and then re-start from somewhere in > {{QueryExecState::ExecQueryOrDmlRequest()}} - updating a local blacklist of > nodes so that we know to avoid those that have caused failures. > There are some subtleties though: > * Queries shouldn't be retried more than a small number of times, in case > they *cause* the outage (there might be a good way to figure that out at the > time) > * If the query is restarted from the scheduling step (rather than completely > restarting), some care will have to be taken to ensure that none of the old > query's fragments that are being cancelled can affect the new query's > operation in any way (there are several ways to do this). > Eventually the failures will propagate to the rest of the cluster via the > statestore - this mechanism allows queries to recover and continue while the > statestore detects the failure. 
> This JIRA doesn't address restarting queries that have suffered failures > part-way through execution, because that's strictly harder and not (as) > needed for decommissioning. -- This message was sent by Atlassian Jira (v8.3.4#803005)
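The retry-with-blacklist loop described in the issue can be sketched as follows. All names here are illustrative, not Impala's actual API; the retry cap reflects the issue's caution that a query shouldn't be retried more than a small number of times in case it causes the outage:

```python
class NodeFailure(Exception):
    """Raised by execution when a specific node caused the failure."""
    def __init__(self, node):
        super().__init__(node)
        self.node = node

MAX_RETRIES = 3  # assumed small cap, per the issue's warning

def run_with_retry(schedule, execute, all_nodes):
    """On a recoverable failure, blacklist the offending node and
    reschedule on the remaining nodes, up to MAX_RETRIES extra attempts."""
    blacklist = set()
    last_err = None
    for _ in range(MAX_RETRIES + 1):
        plan = schedule([n for n in all_nodes if n not in blacklist])
        try:
            return execute(plan)
        except NodeFailure as err:
            blacklist.add(err.node)
            last_err = err
    raise RuntimeError("giving up after %d retries" % MAX_RETRIES) from last_err
```

Usage: `schedule` picks fragment placements from healthy nodes and `execute` raises `NodeFailure(node)` on an unreachable host, so the next attempt avoids it.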
[jira] [Resolved] (IMPALA-9502) Avoid copying TExecRequest when retrying queries
[ https://issues.apache.org/jira/browse/IMPALA-9502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-9502. -- Fix Version/s: Impala 4.0 Resolution: Fixed This was fixed in the initial implementation in IMPALA-9199 > Avoid copying TExecRequest when retrying queries > > > Key: IMPALA-9502 > URL: https://issues.apache.org/jira/browse/IMPALA-9502 > Project: IMPALA > Issue Type: Sub-task >Reporter: Sahil Takiar >Priority: Major > Fix For: Impala 4.0 > > > There are a few issues that occur when re-using a {{TExecRequest}} across > query retries. We should investigate if there is a way to work around those > issues so that the {{TExecRequest}} does not need to be copied when retrying > a query. -- This message was sent by Atlassian Jira (v8.3.4#803005)
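The trade-off in the issue above (copying the TExecRequest per retry versus sharing it) comes down to isolation: a deep copy guarantees the retried query can't observe state the failed attempt mutated. A loose sketch of the copy-per-retry approach, with illustrative names only:

```python
import copy

class ExecRequest:
    """Stand-in for a compiled request; not Impala's TExecRequest struct."""
    def __init__(self, query_id, sql):
        self.query_id = query_id
        self.sql = sql

def request_for_retry(original, new_query_id):
    """Deep-copy the request so the retry is fully isolated from any state
    the failed attempt left behind, then assign the new query id."""
    retried = copy.deepcopy(original)
    retried.query_id = new_query_id
    return retried
```

Avoiding the copy would require guaranteeing the request is never mutated after planning, which is the investigation the issue describes.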