[jira] [Resolved] (IMPALA-9954) RpcRecvrTime can be negative

2020-10-19 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-9954.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

> RpcRecvrTime can be negative
> 
>
> Key: IMPALA-9954
> URL: https://issues.apache.org/jira/browse/IMPALA-9954
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Riza Suminto
>Priority: Major
> Fix For: Impala 4.0
>
> Attachments: profile_034e7209bd98c96c_9a448dfc.txt
>
>
> Saw this on a recent version of master. Attached the full runtime profile.
> {code:java}
> KrpcDataStreamSender (dst_id=2):(Total: 9.863ms, non-child: 3.185ms, 
> % non-child: 32.30%)
>   ExecOption: Unpartitioned Sender Codegen Disabled: not needed
>- BytesSent (500.000ms): 0, 0
>- NetworkThroughput: (Avg: 4.34 MB/sec ; Min: 4.34 MB/sec ; Max: 
> 4.34 MB/sec ; Number of samples: 1)
>- RpcNetworkTime: (Avg: 3.562ms ; Min: 679.676us ; Max: 6.445ms ; 
> Number of samples: 2)
>- RpcRecvrTime: (Avg: -151281.000ns ; Min: -231485.000ns ; Max: 
> -71077.000ns ; Number of samples: 2)
>- EosSent: 1 (1)
>- PeakMemoryUsage: 416.00 B (416)
>- RowsSent: 100 (100)
>- RpcFailure: 0 (0)
>- RpcRetry: 0 (0)
>- SerializeBatchTime: 2.880ms
>- TotalBytesSent: 28.67 KB (29355)
>- UncompressedRowBatchSize: 69.29 KB (70950) {code}
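A derived timer such as RpcRecvrTime can go negative when it is computed as the difference of two separately measured durations and the subtracted one overshoots. The sketch below is an illustration of that failure mode only (not Impala's actual implementation); the input values are fabricated so the difference reproduces the -151281ns average reported above.

```python
# Illustration only: a "receiver time" derived as (total - network) can go
# negative when the two durations come from separate measurements that are
# not guaranteed to nest. Inputs are fabricated to match the profile above.

def derived_recvr_time_ns(total_rpc_ns: int, network_ns: int) -> int:
    """Naive derivation: receiver time = total - network; can underflow."""
    return total_rpc_ns - network_ns

def derived_recvr_time_safe_ns(total_rpc_ns: int, network_ns: int) -> int:
    """Guarded derivation: clamp at zero so the counter stays non-negative."""
    return max(0, total_rpc_ns - network_ns)

# Network time over-measured relative to the total RPC time:
print(derived_recvr_time_ns(3_562_000, 3_713_281))       # -151281
print(derived_recvr_time_safe_ns(3_562_000, 3_713_281))  # 0
```

Clamping only hides the symptom; the real fix is to ensure the subtracted duration is measured strictly inside the total it is subtracted from.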



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IMPALA-10241) Impala Doc: RPC troubleshooting guide

2020-10-14 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10241:
-

 Summary: Impala Doc: RPC troubleshooting guide
 Key: IMPALA-10241
 URL: https://issues.apache.org/jira/browse/IMPALA-10241
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Sahil Takiar


There have been several diagnostic improvements that make RPCs easier to debug. We 
should document them, along with the associated options for configuring 
them.






[jira] [Created] (IMPALA-10240) Impala Doc: Add docs for cluster membership statestore heartbeats

2020-10-14 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10240:
-

 Summary: Impala Doc: Add docs for cluster membership statestore 
heartbeats
 Key: IMPALA-10240
 URL: https://issues.apache.org/jira/browse/IMPALA-10240
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Sahil Takiar


I don't see many docs explaining how the current cluster membership logic works 
(e.g., via statestored heartbeats). It would be nice to include a high-level 
explanation along with how to configure the heartbeat threshold.





[jira] [Created] (IMPALA-10239) Docs: Add docs for node blacklisting

2020-10-14 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10239:
-

 Summary: Docs: Add docs for node blacklisting
 Key: IMPALA-10239
 URL: https://issues.apache.org/jira/browse/IMPALA-10239
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Sahil Takiar
Assignee: Sahil Takiar


We should add some docs for node blacklisting explaining what it is, how it 
works at a high level, what errors it captures, how to debug it, etc.





[jira] [Created] (IMPALA-10238) Add fault tolerance docs

2020-10-14 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10238:
-

 Summary: Add fault tolerance docs
 Key: IMPALA-10238
 URL: https://issues.apache.org/jira/browse/IMPALA-10238
 Project: IMPALA
  Issue Type: Task
  Components: Docs
Reporter: Sahil Takiar
Assignee: Sahil Takiar


Impala docs currently don't have much information about any of our fault 
tolerance features. We should add a dedicated section with several sub-topics 
to address this.





[jira] [Created] (IMPALA-10235) Averaged timer profile counters can be negative for trivial queries

2020-10-13 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10235:
-

 Summary: Averaged timer profile counters can be negative for 
trivial queries
 Key: IMPALA-10235
 URL: https://issues.apache.org/jira/browse/IMPALA-10235
 Project: IMPALA
  Issue Type: Bug
Reporter: Sahil Takiar
 Attachments: profile-output.txt

Steps to reproduce on master:
{code}
stakiar @ stakiar-desktop -bash ~/Impala 2020-10-13 11:13:02 master
 [74] → ./bin/impala-shell.sh -q "select sleep(100) from functional.alltypes 
limit 25" -p > profile-output.txt
...
Query: select sleep(100) from functional.alltypes limit 25
Query submitted at: 2020-10-13 11:13:07 (Coordinator: 
http://stakiar-desktop:25000)
Query progress can be monitored at: 
http://stakiar-desktop:25000/query_plan?query_id=694f94671571d4d1:cdec9db9
Fetched 25 row(s) in 2.64s
{code}

Attached the contents of {{profile-output.txt}}

Relevant portion of the profile:

{code}
Averaged Fragment F00:(Total: 2s603ms, non-child: 272.519us, % non-child: 
0.01%)
...
   - CompletionTime: -1665218428.000ns
...
   - TotalThreadsTotalWallClockTime: -1686005515.000ns
 - TotalThreadsSysTime: 0.000ns
 - TotalThreadsUserTime: 2.151ms
...
   - TotalTime: -1691524485.000ns
{code}

For whatever reason, this only affects the averaged fragment profile. For this 
query, there was only one coordinator fragment and thus only one fragment 
instance. It showed normal values:

{code}
Coordinator Fragment F00:
...
 - CompletionTime: 2s629ms
...
 - TotalThreadsTotalWallClockTime: 2s608ms
   - TotalThreadsSysTime: 0.000ns
   - TotalThreadsUserTime: 2.151ms
...
 - TotalTime: 2s603ms
{code}
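The negative averaged values are numerically consistent with a nanosecond counter being truncated to a signed 32-bit integer somewhere in the averaging path. This is a hypothesis from the reported numbers, not a confirmed diagnosis: 2s603ms is about 2.6e9 ns, which exceeds INT32_MAX (2147483647) and wraps negative. A quick check:

```python
# Hypothesis (inferred from the numbers, not confirmed from the source):
# the averaged counters look like nanosecond values truncated to a signed
# 32-bit integer. A TotalTime of 2,603,442,811 ns (~ the reported 2s603ms)
# wraps to exactly the -1691524485 shown in the averaged profile.

def wrap_int32(n: int) -> int:
    """Truncate an integer to signed 32-bit two's-complement."""
    return (n + 2**31) % 2**32 - 2**31

total_time_ns = 2_603_442_811  # assumed exact value behind "2s603ms"
print(wrap_int32(total_time_ns))  # -1691524485
```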





[jira] [Resolved] (IMPALA-8925) Consider replacing ClientRequestState ResultCache with result spooling

2020-10-12 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-8925.
--
Resolution: Later

This would be nice to have, but there is no strong reason to do it at the 
moment, so closing as "Later".

> Consider replacing ClientRequestState ResultCache with result spooling
> --
>
> Key: IMPALA-8925
> URL: https://issues.apache.org/jira/browse/IMPALA-8925
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend, Clients
>Reporter: Sahil Takiar
>Priority: Minor
>
> The {{ClientRequestState}} maintains an internal results cache (which is 
> really just a {{QueryResultSet}}) in order to provide support for the 
> {{TFetchOrientation.FETCH_FIRST}} fetch orientation (used by Hue - see 
> [https://github.com/apache/impala/commit/6b769d011d2016a73483f63b311e108d17d9a083]).
> The cache itself has some limitations:
>  * It caches all results in a {{QueryResultSet}} with limited admission 
> control integration
> * It has a max size; if the size is exceeded, the cache is emptied
>  * It cannot spill to disk
> Result spooling could potentially replace the query result cache and provide 
> a few benefits; it should be able to fit more rows since it can spill to 
> disk. The memory is better tracked as well since it integrates with both 
> admitted and reserved memory. Hue currently sets the max result set fetch 
> size to 
> [https://github.com/cloudera/hue/blob/master/apps/impala/src/impala/impala_flags.py#L61];
>  it would be good to check how well that value works for Hue users so we can 
> decide if replacing the current result cache with result spooling makes sense.
> This would require some changes to result spooling as well, currently it 
> discards rows whenever it reads them from the underlying 
> {{BufferedTupleStream}}. It would need the ability to reset the read cursor, 
> which would require some changes to the {{PlanRootSink}} interface as well.
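The read-cursor reset described in the last paragraph can be sketched as follows. Class and method names here are hypothetical, chosen only to illustrate the capability; they are not Impala's actual PlanRootSink/BufferedTupleStream API:

```python
# Hypothetical sketch: a spooled result stream whose read cursor can be
# reset, so FETCH_FIRST can replay rows instead of the stream discarding
# them after the first read. Names are illustrative, not Impala's API.

class SpooledResultStream:
    def __init__(self):
        self._rows = []       # stands in for rows spooled to memory/disk
        self._read_pos = 0

    def add_rows(self, rows):
        self._rows.extend(rows)

    def fetch(self, n):
        """Return up to n rows, advancing the read cursor (rows are kept)."""
        batch = self._rows[self._read_pos:self._read_pos + n]
        self._read_pos += len(batch)
        return batch

    def reset_read_cursor(self):
        """Support FETCH_FIRST: rewind to the start of the result set."""
        self._read_pos = 0

s = SpooledResultStream()
s.add_rows([1, 2, 3])
first = s.fetch(2)       # [1, 2]
s.reset_read_cursor()
again = s.fetch(3)       # [1, 2, 3] -- same rows replayed
```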





[jira] [Resolved] (IMPALA-9485) Enable file handle cache for EC files

2020-10-09 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-9485.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Enable file handle cache for EC files
> -
>
> Key: IMPALA-9485
> URL: https://issues.apache.org/jira/browse/IMPALA-9485
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 4.0
>
>
> Now that HDFS-14308 has been fixed, we can re-enable the file handle cache 
> for EC files.





[jira] [Resolved] (IMPALA-10028) Additional optimizations of Impala docker container sizes

2020-10-08 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-10028.
---
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Additional optimizations of Impala docker container sizes
> -
>
> Key: IMPALA-10028
> URL: https://issues.apache.org/jira/browse/IMPALA-10028
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 4.0
>
>
> There are some more optimizations we can make to get the images to be even 
> smaller. It looks like we may have regressed with regard to image size as 
> well. IMPALA-8425 reports the images at ~700 MB. I just checked on a release 
> build and they are currently 1.01 GB.





[jira] [Closed] (IMPALA-10016) Split jars for Impala executor and coordinator Docker images

2020-10-08 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar closed IMPALA-10016.
-
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Split jars for Impala executor and coordinator Docker images
> 
>
> Key: IMPALA-10016
> URL: https://issues.apache.org/jira/browse/IMPALA-10016
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 4.0
>
>
> Impala executors and coordinators currently have a common base image. The 
> base image defines a set of jar files needed by either the coordinator or the 
> executor. In order to reduce the image size, we should split out the jars 
> into two categories: those necessary for the coordinator and those necessary 
> for the executor. This should help reduce overall image size.





[jira] [Created] (IMPALA-10217) test_runtime_filters.TestMinMaxFilters.test_decimal_min_max_filters is flaky

2020-10-05 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10217:
-

 Summary: 
test_runtime_filters.TestMinMaxFilters.test_decimal_min_max_filters is flaky
 Key: IMPALA-10217
 URL: https://issues.apache.org/jira/browse/IMPALA-10217
 Project: IMPALA
  Issue Type: Test
Reporter: Sahil Takiar


Seen this a few times in exhaustive builds:
{code}
query_test.test_runtime_filters.TestMinMaxFilters.test_decimal_min_max_filters[protocol:
 beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 'abort_on_error': 
1, 'exec_single_node_rows_threshold': 0} | table_format: kudu/none] (from 
pytest)

query_test/test_runtime_filters.py:231: in test_decimal_min_max_filters
test_file_vars={'$RUNTIME_FILTER_WAIT_TIME_MS': str(WAIT_TIME_MS)})
common/impala_test_suite.py:718: in run_test_case
update_section=pytest.config.option.update_results)
common/test_result_verifier.py:627: in verify_runtime_profile
% (function, field, expected_value, actual_value, actual))
E   AssertionError: Aggregation of SUM over ProbeRows did not match expected 
results.
E   EXPECTED VALUE:
E   102
E   
E   ACTUAL VALUE:
E   38
E   
{code}







[jira] [Created] (IMPALA-10216) BufferPoolTest.WriteErrorBlacklistCompression is flaky on UBSAN builds

2020-10-05 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10216:
-

 Summary: BufferPoolTest.WriteErrorBlacklistCompression is flaky on 
UBSAN builds
 Key: IMPALA-10216
 URL: https://issues.apache.org/jira/browse/IMPALA-10216
 Project: IMPALA
  Issue Type: Test
Reporter: Sahil Takiar


Only seen this once so far:

{code}
BufferPoolTest.WriteErrorBlacklistCompression

Error Message
Value of: FindPageInDir(pages[NO_ERROR_QUERY], error_dir) != NULL
  Actual: false
Expected: true

Stacktrace

Impala/be/src/runtime/bufferpool/buffer-pool-test.cc:1764
Value of: FindPageInDir(pages[NO_ERROR_QUERY], error_dir) != NULL
  Actual: false
Expected: true
{code}





[jira] [Created] (IMPALA-10214) Ozone support for file handle cache

2020-10-05 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10214:
-

 Summary: Ozone support for file handle cache
 Key: IMPALA-10214
 URL: https://issues.apache.org/jira/browse/IMPALA-10214
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Sahil Takiar


This is dependent on the Ozone input streams supporting the {{CanUnbuffer}} 
interface first (last I checked, the input streams don't implement the 
interface).





[jira] [Resolved] (IMPALA-10202) Enable file handle cache for ABFS files

2020-10-02 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-10202.
---
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Enable file handle cache for ABFS files
> ---
>
> Key: IMPALA-10202
> URL: https://issues.apache.org/jira/browse/IMPALA-10202
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 4.0
>
>
> We should enable the file handle cache for ABFS; we have already seen it 
> benefit jobs that read data from S3A.





[jira] [Resolved] (IMPALA-9606) ABFS reads should use hdfsPreadFully

2020-10-01 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-9606.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

> ABFS reads should use hdfsPreadFully
> 
>
> Key: IMPALA-9606
> URL: https://issues.apache.org/jira/browse/IMPALA-9606
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 4.0
>
>
> In IMPALA-8525, hdfs preads were enabled by default when reading data from 
> S3. IMPALA-8525 deferred enabling preads for ABFS because they didn't 
> significantly improve performance. After some more investigation into the 
> ABFS input streams, I think it is safe to use {{hdfsPreadFully}} for ABFS 
> reads.
> The ABFS client uses a different model for fetching data compared to S3A. 
> Details are beyond the scope of this JIRA, but it is related to a feature in 
> ABFS called "read-aheads". ABFS has logic to pre-fetch data it *thinks* will 
> be required by the client. By default, it pre-fetches # cores * 4 MB of data. 
> If the requested data exists in the client cache, it is read from the cache.
> However, there is no real drawback to using {{hdfsPreadFully}} for ABFS 
> reads. It's definitely safer, because while the current implementation of 
> ABFS always returns the amount of requested data, only the {{hdfsPreadFully}} 
> API makes that guarantee.
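The guarantee difference can be sketched in Python terms. The real hdfsPread/hdfsPreadFully are libhdfs C APIs; this sketch only mimics their contracts under that assumption: a plain positional read may return fewer bytes than requested, while a "fully" variant either fills the buffer or fails.

```python
# Sketch of the semantic difference: pread may return a short result, so
# callers must loop; pread_fully fills exactly `length` bytes or raises.
# Modeled on libhdfs' hdfsPread / hdfsPreadFully contracts (assumption).

import io

def pread(f, offset, length):
    """Positional read: may return fewer than `length` bytes."""
    f.seek(offset)
    return f.read(length)

def pread_fully(f, offset, length):
    """Fill exactly `length` bytes starting at `offset`, or raise."""
    out = bytearray()
    while len(out) < length:
        chunk = pread(f, offset + len(out), length - len(out))
        if not chunk:
            raise EOFError("short read: wanted %d bytes, got %d"
                           % (length, len(out)))
        out.extend(chunk)
    return bytes(out)

f = io.BytesIO(b"abcdefgh")
print(pread_fully(f, 2, 4))  # b'cdef'
```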





[jira] [Resolved] (IMPALA-3335) Allow single-node optimization with joins.

2020-10-01 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-3335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-3335.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Allow single-node optimization with joins.
> --
>
> Key: IMPALA-3335
> URL: https://issues.apache.org/jira/browse/IMPALA-3335
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 2.5.0
>Reporter: Alexander Behm
>Assignee: Sahil Takiar
>Priority: Minor
>  Labels: ramp-up
> Fix For: Impala 4.0
>
>
> Now that IMPALA-561 has been fixed, we can remove the workaround that 
> disables our single-node optimization for any plan with joins. See 
> MaxRowsProcessedVisitor.java:
> {code}
> } else if (caller instanceof HashJoinNode || caller instanceof 
> NestedLoopJoinNode) {
>   // Revisit when multiple scan nodes can be executed in a single fragment, 
> IMPALA-561
>   abort_ = true;
>   return;
> }
> {code}





[jira] [Created] (IMPALA-10202) Enable file handle cache for ABFS files

2020-10-01 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10202:
-

 Summary: Enable file handle cache for ABFS files
 Key: IMPALA-10202
 URL: https://issues.apache.org/jira/browse/IMPALA-10202
 Project: IMPALA
  Issue Type: Improvement
Reporter: Sahil Takiar
Assignee: Sahil Takiar


We should enable the file handle cache for ABFS; we have already seen it 
benefit jobs that read data from S3A.





[jira] [Resolved] (IMPALA-8577) Crash during OpenSSLSocket.read

2020-09-28 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-8577.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

This was fixed a while ago. Impala has been using wildfly for communication 
with S3 for a while now and everything seems stable.

> Crash during OpenSSLSocket.read
> ---
>
> Key: IMPALA-8577
> URL: https://issues.apache.org/jira/browse/IMPALA-8577
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.3.0
>Reporter: David Rorke
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 4.0
>
> Attachments: 5ca78771-ad78-4a29-31f88aa6-9bfac38c.dmp, 
> hs_err_pid6313.log, 
> impalad.drorke-impala-r5d2xl2-30w-17.vpc.cloudera.com.impala.log.ERROR.20190521-103105.6313,
>  
> impalad.drorke-impala-r5d2xl2-30w-17.vpc.cloudera.com.impala.log.INFO.20190521-103105.6313
>
>
> Impalad crashed while running a TPC-DS 10 TB run against S3.   Excerpt from 
> the stack trace (hs_err log file attached with more complete stack):
> {noformat}
> Stack: [0x7f3d095bc000,0x7f3d09dbc000],  sp=0x7f3d09db9050,  free 
> space=8180k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native 
> code)
> C  [impalad+0x2528a33]  
> tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*,
>  unsigned long, int)+0x133
> C  [impalad+0x2528e0f]  tcmalloc::ThreadCache::Scavenge()+0x3f
> C  [impalad+0x266468a]  operator delete(void*)+0x32a
> C  [libcrypto.so.10+0x6e70d]  CRYPTO_free+0x1d
> J 5709  org.wildfly.openssl.SSLImpl.freeBIO0(J)V (0 bytes) @ 
> 0x7f3d4dadf9f9 [0x7f3d4dadf940+0xb9]
> J 5708 C1 org.wildfly.openssl.SSLImpl.freeBIO(J)V (5 bytes) @ 
> 0x7f3d4dfd0dfc [0x7f3d4dfd0d80+0x7c]
> J 5158 C1 org.wildfly.openssl.OpenSSLEngine.shutdown()V (78 bytes) @ 
> 0x7f3d4de4fe2c [0x7f3d4de4f720+0x70c]
> J 5758 C1 org.wildfly.openssl.OpenSSLEngine.closeInbound()V (51 bytes) @ 
> 0x7f3d4de419cc [0x7f3d4de417c0+0x20c]
> J 2994 C2 
> org.wildfly.openssl.OpenSSLEngine.unwrap(Ljava/nio/ByteBuffer;[Ljava/nio/ByteBuffer;II)Ljavax/net/ssl/SSLEngineResult;
>  (892 bytes) @ 0x7f3d4db8da34 [0x7f3d4db8c900+0x1134]
> J 3161 C2 org.wildfly.openssl.OpenSSLSocket.read([BII)I (810 bytes) @ 
> 0x7f3d4dd64cb0 [0x7f3d4dd646c0+0x5f0]
> J 5090 C2 
> com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.fillBuffer()I
>  (97 bytes) @ 0x7f3d4ddd9ee0 [0x7f3d4ddd9e40+0xa0]
> J 5846 C1 
> com.amazonaws.thirdparty.apache.http.impl.BHttpConnectionBase.fillInputBuffer(I)I
>  (48 bytes) @ 0x7f3d4d7acb24 [0x7f3d4d7ac7a0+0x384]
> J 5845 C1 
> com.amazonaws.thirdparty.apache.http.impl.BHttpConnectionBase.isStale()Z (31 
> bytes) @ 0x7f3d4d7ad49c [0x7f3d4d7ad220+0x27c]
> {noformat}
> The crash may not be easy to reproduce.  I've run this test multiple times 
> and only crashed once.   I have a core file if needed.





[jira] [Created] (IMPALA-10191) Test impalad_coordinator and impalad_executor in Dockerized tests

2020-09-24 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10191:
-

 Summary: Test impalad_coordinator and impalad_executor in 
Dockerized tests
 Key: IMPALA-10191
 URL: https://issues.apache.org/jira/browse/IMPALA-10191
 Project: IMPALA
  Issue Type: Improvement
Reporter: Sahil Takiar


Currently only the impalad_coord_exec images are tested in the Dockerized 
tests; it would be nice to get test coverage for the other images as well.





[jira] [Created] (IMPALA-10190) Remove impalad_coord_exec Dockerfile

2020-09-24 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10190:
-

 Summary: Remove impalad_coord_exec Dockerfile
 Key: IMPALA-10190
 URL: https://issues.apache.org/jira/browse/IMPALA-10190
 Project: IMPALA
  Issue Type: Improvement
Reporter: Sahil Takiar


The impalad_coord_exec Dockerfile is a bit redundant because it basically 
contains all the same dependencies as the impalad_coordinator Dockerfile. The 
only difference between the two files is that the startup flags for 
impalad_coordinator contain {{is_executor=false}}. We should find a way to 
remove the {{impalad_coord_exec}} Dockerfile altogether.





[jira] [Resolved] (IMPALA-10170) Data race on Webserver::UrlHandler::is_on_nav_bar_

2020-09-24 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-10170.
---
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Data race on Webserver::UrlHandler::is_on_nav_bar_
> --
>
> Key: IMPALA-10170
> URL: https://issues.apache.org/jira/browse/IMPALA-10170
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 4.0
>
>
> {code}
> WARNING: ThreadSanitizer: data race (pid=31102)
>   Read of size 1 at 0x7b2c0006e3b0 by thread T42:
>     #0 impala::Webserver::UrlHandler::is_on_nav_bar() const /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/webserver.h:152:41 (impalad+0x256ff39)
>     #1 impala::Webserver::GetCommonJson(rapidjson::GenericDocument<rapidjson::UTF8<char>, rapidjson::MemoryPoolAllocator<rapidjson::CrtAllocator>, rapidjson::CrtAllocator>*, sq_connection const*, kudu::WebCallbackRegistry::WebRequest const&) /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/webserver.cc:527:24 (impalad+0x256be13)
>     #2 impala::Webserver::RenderUrlWithTemplate(sq_connection const*, kudu::WebCallbackRegistry::WebRequest const&, impala::Webserver::UrlHandler const&, std::__cxx11::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >*, impala::ContentType*) /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/webserver.cc:816:3 (impalad+0x256e882)
>     #3 impala::Webserver::BeginRequestCallback(sq_connection*, sq_request_info*) /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/webserver.cc:714:5 (impalad+0x256cfbb)
>     #4 impala::Webserver::BeginRequestCallbackStatic(sq_connection*) /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/webserver.cc:556:20 (impalad+0x256ba98)
>     #5 handle_request (impalad+0x2582d59)
>   Previous write of size 2 at 0x7b2c0006e3b0 by main thread:
>     #0 impala::Webserver::UrlHandler::UrlHandler(impala::Webserver::UrlHandler&&) /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/webserver.h:141:9 (impalad+0x2570dbc)
>     #1 std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, impala::Webserver::UrlHandler>::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, impala::Webserver::UrlHandler, true>(std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, impala::Webserver::UrlHandler>&&) /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/stl_pair.h:362:4 (impalad+0x25738b3)
>     #2 void __gnu_cxx::new_allocator<std::_Rb_tree_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, impala::Webserver::UrlHandler> > >::construct<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, impala::Webserver::UrlHandler>, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, impala::Webserver::UrlHandler> >(std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, impala::Webserver::UrlHandler>*, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, impala::Webserver::UrlHandler>&&) /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/ext/new_allocator.h:136:23 (impalad+0x2573848)
>     #3 void std::allocator_traits<std::allocator<std::_Rb_tree_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, impala::Webserver::UrlHandler> > > >::construct<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, impala::Webserver::UrlHandler>, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, impala::Webserver::UrlHandler> >(std::allocator<std::_Rb_tree_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, impala::Webserver::UrlHandler> > >&, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, impala::Webserver::UrlHandler>*, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, impala::Webserver::UrlHandler>&&) /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/alloc_traits.h:475:8 (impalad+0x25737f1)
>     #4 void std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, impala::Webserver::UrlHandler>, std::_Select1st<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, impala::Webserver::UrlHandler> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, impala::Webserver::UrlHandler> > >::_M_construct_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, impala::Webserver::UrlHandler> >(std::_Rb_tree_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, impala::Webserver::UrlHandler> >*, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, impala::Webserver::UrlHandler>&&) /data/j

[jira] [Resolved] (IMPALA-9046) Profile counter that indicates if a process or JVM pause occurred

2020-09-22 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-9046.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Profile counter that indicates if a process or JVM pause occurred
> -
>
> Key: IMPALA-9046
> URL: https://issues.apache.org/jira/browse/IMPALA-9046
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 3.3.0
>Reporter: Tim Armstrong
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 4.0
>
>
> We currently log a message if a process or JVM pause is detected, but there 
> is no indication in the query profile that the query was affected. I suggest that we 
> should:
> * Add metrics that indicate the number and duration of detected pauses
> * Add counters to the backend profile for the deltas in those metrics
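The suggested metrics can be driven by a pause-detection loop of the kind commonly used for this (modeled loosely on Hadoop's JvmPauseMonitor; this is a sketch, not Impala's implementation): sleep for a fixed interval and treat any elapsed time beyond a threshold as a detected pause.

```python
# Sketch of pause detection (loosely modeled on Hadoop's JvmPauseMonitor;
# not Impala's code): each cycle sleeps for `interval_s`, and any extra
# elapsed time beyond `threshold_s` is counted as a pause, feeding the
# count/duration metrics proposed above.

class PauseMonitor:
    def __init__(self, interval_s=0.5, threshold_s=0.2):
        self.interval_s = interval_s
        self.threshold_s = threshold_s
        self.pause_count = 0      # metric: number of detected pauses
        self.total_pause_s = 0.0  # metric: total detected pause duration

    def tick(self, elapsed_s):
        """Record one sleep cycle that actually took `elapsed_s` seconds."""
        extra = elapsed_s - self.interval_s
        if extra > self.threshold_s:
            self.pause_count += 1
            self.total_pause_s += extra

m = PauseMonitor()
m.tick(0.51)  # normal cycle: no pause detected
m.tick(1.9)   # cycle ran ~1.4s long: counted as a pause
print(m.pause_count)  # 1
```

Per-query profile counters would then be deltas of these metrics between query start and end.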





[jira] [Resolved] (IMPALA-9229) Link failed and retried runtime profiles

2020-09-18 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-9229.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

Marking as resolved. The Web UI improvements are tracked in a separate JIRA.

> Link failed and retried runtime profiles
> 
>
> Key: IMPALA-9229
> URL: https://issues.apache.org/jira/browse/IMPALA-9229
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Critical
> Fix For: Impala 4.0
>
>
> There should be a way for clients to link the runtime profiles from failed 
> queries to all retry attempts (whether successful or not), and vice versa.
> There are a few ways to do this:
>  * The simplest way would be to include the query id of the retried query in 
> the runtime profile of the failed query, and vice versa; users could then 
> manually create a chain of runtime profiles in order to fetch all failed / 
> successful attempts
>  * Extend TGetRuntimeProfileReq to include an option to fetch all runtime 
> profiles for the given query id + all retry attempts (or add a new Thrift 
> call TGetRetryQueryIds(TQueryId) which returns a list of retried ids for a 
> given query id)
>  * The Impala debug UI should include a simple way to view all the runtime 
> profiles of a query (the failed attempts + all retry attempts) side by side 
> (perhaps the query_profile?query_id profile should include tabs to easily 
> switch between the runtime profiles of each attempt)
> These are not mutually exclusive, and it might be good to stage these changes.





[jira] [Created] (IMPALA-10180) Add average size of fetch requests in runtime profile

2020-09-18 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10180:
-

 Summary: Add average size of fetch requests in runtime profile
 Key: IMPALA-10180
 URL: https://issues.apache.org/jira/browse/IMPALA-10180
 Project: IMPALA
  Issue Type: Improvement
  Components: Clients
Reporter: Sahil Takiar


For queries with a high {{ClientFetchWaitTimer}}, it would be useful to know 
the average number of rows requested by the client per fetch request. This can 
help determine whether a higher fetch size would improve fetch performance 
when the network RTT between the client and Impala is high.
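The proposed counter is simple arithmetic over values the coordinator already sees per query; a sketch with illustrative names (not actual profile counter names):

```python
# Illustrative computation of the proposed counter; the function and
# argument names are hypothetical, not real Impala profile counters.

def avg_rows_per_fetch(rows_fetched: int, num_fetch_requests: int) -> float:
    """Average number of rows returned per client fetch request."""
    if num_fetch_requests == 0:
        return 0.0
    return rows_fetched / num_fetch_requests

# e.g. 10000 rows over 100 fetch calls -> 100 rows per round trip; with a
# high client-Impala RTT, a larger fetch size cuts the number of round trips.
print(avg_rows_per_fetch(10_000, 100))  # 100.0
```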





[jira] [Created] (IMPALA-10170) Data race on Webserver::UrlHandler::is_on_nav_bar_

2020-09-16 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10170:
-

 Summary: Data race on Webserver::UrlHandler::is_on_nav_bar_
 Key: IMPALA-10170
 URL: https://issues.apache.org/jira/browse/IMPALA-10170
 Project: IMPALA
  Issue Type: Sub-task
  Components: Backend
Reporter: Sahil Takiar
Assignee: Sahil Takiar


{code}
WARNING: ThreadSanitizer: data race (pid=31102)
  Read of size 1 at 0x7b2c0006e3b0 by thread T42:
    #0 impala::Webserver::UrlHandler::is_on_nav_bar() const /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/webserver.h:152:41 (impalad+0x256ff39)
    #1 impala::Webserver::GetCommonJson(rapidjson::GenericDocument<rapidjson::UTF8<char>, rapidjson::MemoryPoolAllocator<rapidjson::CrtAllocator>, rapidjson::CrtAllocator>*, sq_connection const*, kudu::WebCallbackRegistry::WebRequest const&) /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/webserver.cc:527:24 (impalad+0x256be13)
    #2 impala::Webserver::RenderUrlWithTemplate(sq_connection const*, kudu::WebCallbackRegistry::WebRequest const&, impala::Webserver::UrlHandler const&, std::__cxx11::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >*, impala::ContentType*) /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/webserver.cc:816:3 (impalad+0x256e882)
    #3 impala::Webserver::BeginRequestCallback(sq_connection*, sq_request_info*) /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/webserver.cc:714:5 (impalad+0x256cfbb)
    #4 impala::Webserver::BeginRequestCallbackStatic(sq_connection*) /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/webserver.cc:556:20 (impalad+0x256ba98)
    #5 handle_request (impalad+0x2582d59)

  Previous write of size 2 at 0x7b2c0006e3b0 by main thread:
    #0 impala::Webserver::UrlHandler::UrlHandler(impala::Webserver::UrlHandler&&) /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/webserver.h:141:9 (impalad+0x2570dbc)
    #1 std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, impala::Webserver::UrlHandler>::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, impala::Webserver::UrlHandler, true>(std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, impala::Webserver::UrlHandler>&&) /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/stl_pair.h:362:4 (impalad+0x25738b3)
    #2 void __gnu_cxx::new_allocator<std::_Rb_tree_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, impala::Webserver::UrlHandler> > >::construct<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, impala::Webserver::UrlHandler>, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, impala::Webserver::UrlHandler> >(std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, impala::Webserver::UrlHandler>*, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, impala::Webserver::UrlHandler>&&) /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/ext/new_allocator.h:136:23 (impalad+0x2573848)
    #3 void std::allocator_traits<std::allocator<std::_Rb_tree_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, impala::Webserver::UrlHandler> > > >::construct<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, impala::Webserver::UrlHandler>, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, impala::Webserver::UrlHandler> >(std::allocator<std::_Rb_tree_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, impala::Webserver::UrlHandler> > >&, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, impala::Webserver::UrlHandler>*, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, impala::Webserver::UrlHandler>&&) /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/alloc_traits.h:475:8
 (impalad+0x25737f1)
#4 void std::_Rb_tree, std::allocator >, 
std::pair, 
std::allocator > const, impala::Webserver::UrlHandler>, 
std::_Select1st, std::allocator > const, 
impala::Webserver::UrlHandler> >, std::less, std::allocator > >, 
std::allocator, std::allocator > const, 
impala::Webserver::UrlHandler> > 
>::_M_construct_node, std::allocator >, impala::Webserver::UrlHandler> 
>(std::_Rb_tree_node, std::allocator > const, 
impala::Webserver::UrlHandler> >*, std::pair, std::allocator >, 
impala::Webserver::UrlHandler>&&) 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/stl_tree.h:626:8
 (impalad+0x257369b)
#5 std::_Rb_tree_node, std::allocator > const, 
impala::Webserver::UrlHandler> >* 
std::_Rb_tree, 
std::allocator >, std::pair, std::allocator > const, 
impala::Webserver::UrlHandler>, 
std::_Select1st, std::allocator > const, 
impala::Webserver::UrlHandler> >, std::less, std::allocator > >, 
std::allocator, std::allocator > const, 
impala::Webserver::UrlHandler> > 
>::_M_create_node, std::allocator >, impala::Webserver::UrlHandler> 
>(std::pair, 
std::allocator >, impala::Webserver::UrlHandler>&&) 
/data/jenkins/worksp

[jira] [Resolved] (IMPALA-9740) TSAN data race in hdfs-bulk-ops

2020-09-10 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-9740.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

> TSAN data race in hdfs-bulk-ops
> ---
>
> Key: IMPALA-9740
> URL: https://issues.apache.org/jira/browse/IMPALA-9740
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 4.0
>
>
> hdfs-bulk-ops usage of a local connection cache (HdfsFsCache::HdfsFsMap) has 
> a data race:
> {code:java}
>  WARNING: ThreadSanitizer: data race (pid=23205)
>   Write of size 8 at 0x7b24005642d8 by thread T47:
> #0 
> boost::unordered::detail::table_impl  const, hdfs_internal*> >, std::string, hdfs_internal*, 
> boost::hash, std::equal_to > 
> >::add_node(boost::unordered::detail::node_constructor  const, hdfs_internal*> > > >&, unsigned long) 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/unordered/detail/unique.hpp:329:26
>  (impalad+0x1f93832)
> #1 
> std::pair  const, hdfs_internal*> > >, bool> 
> boost::unordered::detail::table_impl  const, hdfs_internal*> >, std::string, hdfs_internal*, 
> boost::hash, std::equal_to > 
> >::emplace_impl >(std::string 
> const&, std::pair&&) 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/unordered/detail/unique.hpp:420:41
>  (impalad+0x1f933ed)
> #2 
> std::pair  const, hdfs_internal*> > >, bool> 
> boost::unordered::detail::table_impl  const, hdfs_internal*> >, std::string, hdfs_internal*, 
> boost::hash, std::equal_to > 
> >::emplace 
> >(std::pair&&) 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/unordered/detail/unique.hpp:384:20
>  (impalad+0x1f932d1)
> #3 
> std::pair  const, hdfs_internal*> > >, bool> 
> boost::unordered::unordered_map boost::hash, std::equal_to, 
> std::allocator > 
> >::emplace 
> >(std::pair&&) 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/unordered/unordered_map.hpp:241:27
>  (impalad+0x1f93238)
> #4 boost::unordered::unordered_map boost::hash, std::equal_to, 
> std::allocator > 
> >::insert(std::pair&&) 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/unordered/unordered_map.hpp:390:26
>  (impalad+0x1f92038)
> #5 impala::HdfsFsCache::GetConnection(std::string const&, 
> hdfs_internal**, boost::unordered::unordered_map boost::hash, std::equal_to, 
> std::allocator > >*) 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/runtime/hdfs-fs-cache.cc:115:18
>  (impalad+0x1f916b3)
> #6 impala::HdfsOp::Execute() const 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/hdfs-bulk-ops.cc:84:55
>  (impalad+0x23444d5)
> #7 HdfsThreadPoolHelper(int, impala::HdfsOp const&) 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/hdfs-bulk-ops.cc:137:6
>  (impalad+0x2344ea9)
> #8 boost::detail::function::void_function_invoker2 impala::HdfsOp const&), void, int, impala::HdfsOp 
> const&>::invoke(boost::detail::function::function_buffer&, int, 
> impala::HdfsOp const&) 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/function/function_template.hpp:118:11
>  (impalad+0x2345e80)
> #9 boost::function2::operator()(int, 
> impala::HdfsOp const&) const 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/function/function_template.hpp:770:14
>  (impalad+0x1f883be)
> #10 impala::ThreadPool::WorkerThread(int) 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/thread-pool.h:166:9
>  (impalad+0x1f874e5)
> #11 boost::_mfi::mf1, 
> int>::operator()(impala::ThreadPool*, int) const 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/mem_fn_template.hpp:165:29
>  (impalad+0x1f87b7d)
> #12 void 
> boost::_bi::list2*>, 
> boost::_bi::value >::operator() impala::ThreadPool, int>, 
> boost::_bi::list0>(boost::_bi::type, boost::_mfi::mf1 impala::ThreadPool, int>&, boost::_bi::list0&, int) 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/bind.hpp:319:9
>  (impalad+0x1f87abc)
> #13 boost::_bi::bind_t impala::ThreadPool, int>, 
> boost::_bi::list2*>, 
> boost::_bi::value > >::operator()() 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/bind.hpp:1222:16
>  (impalad+0x1f87a23)
> #14 
> boost::detail::function::voi

[jira] [Created] (IMPALA-10160) kernel_stack_watchdog cannot print user stack

2020-09-09 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10160:
-

 Summary: kernel_stack_watchdog cannot print user stack
 Key: IMPALA-10160
 URL: https://issues.apache.org/jira/browse/IMPALA-10160
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Reporter: Sahil Takiar


I've seen this a few times now. The kernel_stack_watchdog is used in a few 
places in the KRPC code; it prints out the kernel and user stacks whenever a 
thread is stuck in a method call for too long. The issue is that the user 
stack does not get printed:

{code}
W0908 17:15:00.365721  6605 kernel_stack_watchdog.cc:198] Thread 6612 stuck at 
outbound_call.cc:273 for 120ms:
Kernel stack:
[] futex_wait_queue_me+0xc6/0x130
[] futex_wait+0x17b/0x280
[] do_futex+0x106/0x5a0
[] SyS_futex+0x80/0x180
[] system_call_fastpath+0x16/0x1b
[] 0x

User stack:

{code}

It appears the signal handler used to capture the thread's user stack is unavailable.





[jira] [Created] (IMPALA-10154) Data race on coord_backend_id

2020-09-08 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10154:
-

 Summary: Data race on coord_backend_id
 Key: IMPALA-10154
 URL: https://issues.apache.org/jira/browse/IMPALA-10154
 Project: IMPALA
  Issue Type: Bug
Reporter: Sahil Takiar
Assignee: Wenzhe Zhou


TSAN is reporting a data race on 
{{ExecQueryFInstancesRequestPB#coord_backend_id}}
{code:java}
WARNING: ThreadSanitizer: data race (pid=15392)
  Write of size 8 at 0x7b74001104a8 by thread T83 (mutexes: write 
M871582266043729400):
#0 impala::ExecQueryFInstancesRequestPB::mutable_coord_backend_id() 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/generated-sources/gen-cpp/control_service.pb.h:6625:23
 (impalad+0x20c03ed)
#1 impala::QueryState::Init(impala::ExecQueryFInstancesRequestPB const*, 
impala::TExecPlanFragmentInfo const&) 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/runtime/query-state.cc:216:21
 (impalad+0x20b8b29)
#2 impala::QueryExecMgr::StartQuery(impala::ExecQueryFInstancesRequestPB 
const*, impala::TQueryCtx const&, impala::TExecPlanFragmentInfo const&) 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/runtime/query-exec-mgr.cc:80:23
 (impalad+0x20acb59)
#3 
impala::ControlService::ExecQueryFInstances(impala::ExecQueryFInstancesRequestPB
 const*, impala::ExecQueryFInstancesResponsePB*, kudu::rpc::RpcContext*) 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/service/control-service.cc:157:66
 (impalad+0x22a621d)
#4 
impala::ControlServiceIf::ControlServiceIf(scoped_refptr 
const&, scoped_refptr 
const&)::$_1::operator()(google::protobuf::Message const*, 
google::protobuf::Message*, kudu::rpc::RpcContext*) const 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/generated-sources/gen-cpp/control_service.service.cc:70:13
 (impalad+0x23622a4)
#5 std::_Function_handler 
const&, scoped_refptr 
const&)::$_1>::_M_invoke(std::_Any_data const&, google::protobuf::Message 
const*&&, google::protobuf::Message*&&, kudu::rpc::RpcContext*&&) 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/std_function.h:316:2
 (impalad+0x23620ed)
#6 std::function::operator()(google::protobuf::Message const*, 
google::protobuf::Message*, kudu::rpc::RpcContext*) const 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/std_function.h:706:14
 (impalad+0x2a4a453)
#7 kudu::rpc::GeneratedServiceIf::Handle(kudu::rpc::InboundCall*) 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/kudu/rpc/service_if.cc:139:3
 (impalad+0x2a49efe)
#8 impala::ImpalaServicePool::RunThread() 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/rpc/impala-service-pool.cc:272:15
 (impalad+0x2011a12)
#9 boost::_mfi::mf0::operator()(impala::ImpalaServicePool*) const 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/bind/mem_fn_template.hpp:49:29
 (impalad+0x2017a16)
#10 void boost::_bi::list1 
>::operator(), 
boost::_bi::list0>(boost::_bi::type, boost::_mfi::mf0&, boost::_bi::list0&, int) 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/bind/bind.hpp:259:9
 (impalad+0x201796a)
#11 boost::_bi::bind_t, 
boost::_bi::list1 > 
>::operator()() 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/bind/bind.hpp:1222:16
 (impalad+0x20178f3)
#12 
boost::detail::function::void_function_obj_invoker0, 
boost::_bi::list1 > >, 
void>::invoke(boost::detail::function::function_buffer&) 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:159:11
 (impalad+0x20176e9)
#13 boost::function0::operator()() const 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:770:14
 (impalad+0x1f666f1)
#14 impala::Thread::SuperviseThread(std::__cxx11::basic_string, std::allocator > const&, 
std::__cxx11::basic_string, std::allocator > 
const&, boost::function, impala::ThreadDebugInfo const*, 
impala::Promise*) 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/thread.cc:360:3
 (impalad+0x252644b)
#15 void 
boost::_bi::list5, std::allocator > >, 
boost::_bi::value, 
std::allocator > >, boost::_bi::value >, 
boost::_bi::value, 
boost::_bi::value*> 
>::operator(), 
std::allocator > const&, std::__c

[jira] [Created] (IMPALA-10142) Add RPC sender tracing

2020-09-03 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10142:
-

 Summary: Add RPC sender tracing
 Key: IMPALA-10142
 URL: https://issues.apache.org/jira/browse/IMPALA-10142
 Project: IMPALA
  Issue Type: Improvement
Reporter: Sahil Takiar


We currently have RPC tracing on the receiver side, but not on the sender 
side. For slow RPCs, the logs print out the total amount of time spent sending 
the RPC plus the network time. Adding sender-side tracing will make this more 
granular: it will help determine where exactly in the stack the time was spent 
when sending RPCs.

Combined with the trace logs in the receiver, it should be much easier to 
determine the timeline of a given slow RPC.





[jira] [Created] (IMPALA-10141) Include aggregate TCP metrics in per-node profiles

2020-09-02 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10141:
-

 Summary: Include aggregate TCP metrics in per-node profiles
 Key: IMPALA-10141
 URL: https://issues.apache.org/jira/browse/IMPALA-10141
 Project: IMPALA
  Issue Type: Improvement
Reporter: Sahil Takiar


The /rpcz endpoint in the debug web UI includes many useful TCP-level 
metrics per kRPC connection for all inbound / outbound connections. It would be 
useful to aggregate some of these metrics and put them in the per-node 
profiles. Since it is not currently possible to split these metrics out per 
query, they should be added at the per-host level. Furthermore, only metrics 
that can be sanely aggregated across all connections should be included. For 
example, tracking the number of retransmitted TCP packets across all 
connections for the duration of the query would be useful. TCP retransmissions 
should be rare and are typically indicative of network hardware issues or 
network congestion; having even a high-level count of the TCP retransmissions 
that occur during a query can drastically help determine whether the network is 
to blame for query slowness.





[jira] [Created] (IMPALA-10139) Slow RPC logs can be misleading

2020-09-02 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10139:
-

 Summary: Slow RPC logs can be misleading
 Key: IMPALA-10139
 URL: https://issues.apache.org/jira/browse/IMPALA-10139
 Project: IMPALA
  Issue Type: Improvement
Reporter: Sahil Takiar


The slow RPC logs added in IMPALA-9128 are based on the total time taken to 
successfully complete an RPC. The issue is that there are many reasons why an 
RPC might take a long time to complete: an RPC is considered complete only when 
the receiver has processed it.

The problem is that, due to the client-driven back-pressure mechanism, it is 
entirely possible that the receiver does not process an RPC simply because 
{{KrpcDataStreamRecvr::SenderQueue::GetBatch}} just hasn't been called yet 
(it is called indirectly by {{ExchangeNode::GetNext}}).

This can lead to a flood of slow RPC logs, even though the RPCs might not 
actually be slow themselves. Worse, because of the back-pressure mechanism, 
slowness from the client (e.g. Hue users) will propagate across all nodes 
involved in the query.





[jira] [Created] (IMPALA-10138) Add fragment instance id to RPC trace output

2020-09-02 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10138:
-

 Summary: Add fragment instance id to RPC trace output
 Key: IMPALA-10138
 URL: https://issues.apache.org/jira/browse/IMPALA-10138
 Project: IMPALA
  Issue Type: Improvement
Reporter: Sahil Takiar


The RPC traces added in IMPALA-9128 are hard to correlate with specific queries 
because the output does not include the fragment instance id. I'm not sure if 
this is actually possible in the current kRPC code, but it would be nice if the 
tracing output included the fragment instance id.





[jira] [Created] (IMPALA-10137) Network Debugging / Supportability Improvements

2020-09-02 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10137:
-

 Summary: Network Debugging / Supportability Improvements
 Key: IMPALA-10137
 URL: https://issues.apache.org/jira/browse/IMPALA-10137
 Project: IMPALA
  Issue Type: Epic
Reporter: Sahil Takiar


There are various improvements Impala should make to improve debugging of 
network issues (e.g. slow RPCs, TCP retransmissions, etc.).





[jira] [Resolved] (IMPALA-10126) asf-master-core-s3 test_aggregation.TestWideAggregationQueries.test_many_grouping_columns failed

2020-09-01 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-10126.
---
Resolution: Duplicate

Duplicate of IMPALA-9058

> asf-master-core-s3 
> test_aggregation.TestWideAggregationQueries.test_many_grouping_columns failed
> 
>
> Key: IMPALA-10126
> URL: https://issues.apache.org/jira/browse/IMPALA-10126
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Yongzhi Chen
>Priority: Major
>
> query_test.test_aggregation.TestWideAggregationQueries.test_many_grouping_columns[protocol:
>  beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> parquet/none] (from pytest)
> {noformat}
> Error Message
> query_test/test_aggregation.py:453: in test_many_grouping_columns result 
> = self.execute_query(query, exec_option, table_format=table_format) 
> common/impala_test_suite.py:811: in wrapper return function(*args, 
> **kwargs) common/impala_test_suite.py:843: in execute_query return 
> self.__execute_query(self.client, query, query_options) 
> common/impala_test_suite.py:909: in __execute_query return 
> impalad_client.execute(query, user=user) common/impala_connection.py:205: in 
> execute return self.__beeswax_client.execute(sql_stmt, user=user) 
> beeswax/impala_beeswax.py:187: in execute handle = 
> self.__execute_query(query_string.strip(), user=user) 
> beeswax/impala_beeswax.py:365: in __execute_query 
> self.wait_for_finished(handle) beeswax/impala_beeswax.py:386: in 
> wait_for_finished raise ImpalaBeeswaxException("Query aborted:" + 
> error_log, None) E   ImpalaBeeswaxException: ImpalaBeeswaxException: E
> Query aborted:Disk I/O error on 
> impala-ec2-centos74-m5-4xlarge-ondemand-1129.vpc.cloudera.com:22001: Failed 
> to open HDFS file 
> s3a://impala-test-uswest2-1/test-warehouse/widetable_1000_cols_parquet/1f4ec08992b6e3f9-6fd9a17d_1482052561_data.0.parq
>  E   Error(2): No such file or directory E   Root cause: 
> ResourceNotFoundException: Requested resource not found (Service: 
> AmazonDynamoDBv2; Status Code: 400; Error Code: ResourceNotFoundException; 
> Request ID: 1HMMG39MJ9GP2JEENAUFVFDVA3VV4KQNSO5AEMVJF66Q9ASUAAJG)
> Stacktrace
> query_test/test_aggregation.py:453: in test_many_grouping_columns
> result = self.execute_query(query, exec_option, table_format=table_format)
> common/impala_test_suite.py:811: in wrapper
> return function(*args, **kwargs)
> common/impala_test_suite.py:843: in execute_query
> return self.__execute_query(self.client, query, query_options)
> common/impala_test_suite.py:909: in __execute_query
> return impalad_client.execute(query, user=user)
> common/impala_connection.py:205: in execute
> return self.__beeswax_client.execute(sql_stmt, user=user)
> beeswax/impala_beeswax.py:187: in execute
> handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:365: in __execute_query
> self.wait_for_finished(handle)
> beeswax/impala_beeswax.py:386: in wait_for_finished
> raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> EQuery aborted:Disk I/O error on 
> impala-ec2-centos74-m5-4xlarge-ondemand-1129.vpc.cloudera.com:22001: Failed 
> to open HDFS file 
> s3a://impala-test-uswest2-1/test-warehouse/widetable_1000_cols_parquet/1f4ec08992b6e3f9-6fd9a17d_1482052561_data.0.parq
> E   Error(2): No such file or directory
> E   Root cause: ResourceNotFoundException: Requested resource not found 
> (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: 
> ResourceNotFoundException; Request ID: 
> 1HMMG39MJ9GP2JEENAUFVFDVA3VV4KQNSO5AEMVJF66Q9ASUAAJG)
> {noformat}





[jira] [Resolved] (IMPALA-10128) AnalyzeDDLTest.TestCreateTableLikeFileOrc failed

2020-09-01 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-10128.
---
Resolution: Duplicate

Looks like a duplicate of IMPALA-9351

> AnalyzeDDLTest.TestCreateTableLikeFileOrc failed
> 
>
> Key: IMPALA-10128
> URL: https://issues.apache.org/jira/browse/IMPALA-10128
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Yongzhi Chen
>Priority: Major
>
> Parallel-all-tests:
> In ubuntu-16.04-from-scratch, 
> org.apache.impala.analysis.AnalyzeDDLTest.TestCreateTableLikeFileOrc
> failed with
> Error during analysis:
> org.apache.impala.common.AnalysisException: Cannot infer schema, path does 
> not exist: 
> hdfs://localhost:20500/test-warehouse/managed/complextypestbl_orc_def/base_001/bucket_0_0
> sql:
> create table if not exists newtbl_DNE like orc 
> '/test-warehouse/managed/complextypestbl_orc_def/base_001/bucket_0_0'
> Stacktrace
> java.lang.AssertionError: 
> Error during analysis:
> org.apache.impala.common.AnalysisException: Cannot infer schema, path does 
> not exist: 
> hdfs://localhost:20500/test-warehouse/managed/complextypestbl_orc_def/base_001/bucket_0_0
> sql:
> create table if not exists newtbl_DNE like orc 
> '/test-warehouse/managed/complextypestbl_orc_def/base_001/bucket_0_0'
>   at org.junit.Assert.fail(Assert.java:88)
>   at 
> org.apache.impala.common.FrontendFixture.analyzeStmt(FrontendFixture.java:397)
>   at 
> org.apache.impala.common.FrontendTestBase.AnalyzesOk(FrontendTestBase.java:246)
>   at 
> org.apache.impala.common.FrontendTestBase.AnalyzesOk(FrontendTestBase.java:186)
>   at 
> org.apache.impala.analysis.AnalyzeDDLTest.TestCreateTableLikeFileOrc(AnalyzeDDLTest.java:2027)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:272)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:236)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:386)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:323)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:143)





[jira] [Closed] (IMPALA-10123) asf-master-core-tsan load data error

2020-09-01 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar closed IMPALA-10123.
-
Resolution: Duplicate

I think this is a duplicate of IMPALA-10129; the underlying error was in the 
impalad.ERROR logs from data load.

> asf-master-core-tsan load data error
> 
>
> Key: IMPALA-10123
> URL: https://issues.apache.org/jira/browse/IMPALA-10123
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Yongzhi Chen
>Priority: Major
>
> The load data failed in asf-master-core-tsan two builds in a row:
> 19:32:54 16:32:54 Error executing impala SQL: 
> /data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/logs/data_loading/sql/functional/invalidate-functional-query-exhaustive-impala-generated.sql
>  See: 
> /data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/logs/data_loading/sql/functional/invalidate-functional-query-exhaustive-impala-generated.sql.log
> In the log, it shows:
> Encounter errors before parsing any queries.
> Traceback (most recent call last):
>   File 
> "/data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/bin/load-data.py",
>  line 202, in exec_impala_query_from_file
> impala_client.connect()
>   File 
> "/data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/tests/beeswax/impala_beeswax.py",
>  line 162, in connect
> raise ImpalaBeeswaxException(self.__build_error_message(e), e)
> ImpalaBeeswaxException: ImpalaBeeswaxException:
>  INNER EXCEPTION: 
>  MESSAGE: Could not connect to localhost:21000





[jira] [Created] (IMPALA-10129) Data race in MemTracker::GetTopNQueriesAndUpdatePoolStats

2020-09-01 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10129:
-

 Summary: Data race in MemTracker::GetTopNQueriesAndUpdatePoolStats
 Key: IMPALA-10129
 URL: https://issues.apache.org/jira/browse/IMPALA-10129
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Reporter: Sahil Takiar
Assignee: Qifan Chen


TSAN is reporting a data race in 
{{MemTracker::GetTopNQueriesAndUpdatePoolStats}}

{code}
WARNING: ThreadSanitizer: data race (pid=6436)
  Read of size 1 at 0x7b480017aaa8 by thread T320 (mutexes: write 
M861448892003377216, write M862574791910219632, write M623321199144890016, 
write M1054540811927503496):
#0 
impala::MemTracker::GetTopNQueriesAndUpdatePoolStats(std::priority_queue >, 
std::greater >&, int, impala::TPoolStats&) 
/data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/be/src/runtime/mem-tracker.cc:453:19
 (impalad+0x20b13b1)
#1 impala::MemTracker::UpdatePoolStatsForQueries(int, impala::TPoolStats&) 
/data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/be/src/runtime/mem-tracker.cc:432:3
 (impalad+0x20b123d)
#2 impala::AdmissionController::PoolStats::UpdateMemTrackerStats() 
/data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/be/src/scheduling/admission-controller.cc:1642:14
 (impalad+0x21c9d10)
#3 
impala::AdmissionController::AddPoolUpdates(std::vector >*) 
/data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/be/src/scheduling/admission-controller.cc:1662:18
 (impalad+0x21c7053)
#4 
impala::AdmissionController::UpdatePoolStats(std::map, std::allocator >, impala::TTopicDelta, 
std::less, 
std::allocator > >, 
std::allocator, std::allocator > const, impala::TTopicDelta> > > 
const&, std::vector 
>*) 
/data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/be/src/scheduling/admission-controller.cc:1355:5
 (impalad+0x21c6d7d)
#5 
impala::AdmissionController::Init()::$_4::operator()(std::map, std::allocator >, impala::TTopicDelta, 
std::less, 
std::allocator > >, 
std::allocator, std::allocator > const, impala::TTopicDelta> > > 
const&, std::vector 
>*) const 
/data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/be/src/scheduling/admission-controller.cc:643:45
 (impalad+0x21ce0e1)
#6 
boost::detail::function::void_function_obj_invoker2, 
std::allocator >, impala::TTopicDelta, 
std::less, 
std::allocator > >, 
std::allocator, std::allocator > const, impala::TTopicDelta> > > 
const&, std::vector 
>*>::invoke(boost::detail::function::function_buffer&, 
std::map, 
std::allocator >, impala::TTopicDelta, 
std::less, 
std::allocator > >, 
std::allocator, std::allocator > const, impala::TTopicDelta> > > 
const&, std::vector 
>*) 
/data/jenkins/workspace/impala-asf-master-core-tsan/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:159:11
 (impalad+0x21cdf2c)
#7 boost::function2, std::allocator >, impala::TTopicDelta, 
std::less, 
std::allocator > >, 
std::allocator, std::allocator > const, impala::TTopicDelta> > > 
const&, std::vector 
>*>::operator()(std::map, std::allocator >, impala::TTopicDelta, 
std::less, 
std::allocator > >, 
std::allocator, std::allocator > const, impala::TTopicDelta> > > 
const&, std::vector 
>*) const 
/data/jenkins/workspace/impala-asf-master-core-tsan/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:770:14
 (impalad+0x23fa960)
#8 
impala::StatestoreSubscriber::UpdateState(std::map, std::allocator >, impala::TTopicDelta, 
std::less, 
std::allocator > >, 
std::allocator, std::allocator > const, impala::TTopicDelta> > > 
const&, impala::TUniqueId const&, std::vector >*, bool*) 
/data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/be/src/statestore/statestore-subscriber.cc:471:7
 (impalad+0x23f7899)
#9 
impala::StatestoreSubscriberThriftIf::UpdateState(impala::TUpdateStateResponse&,
 impala::TUpdateStateRequest const&) 
/data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/be/src/statestore/statestore-subscriber.cc:110:18
 (impalad+0x23fabbf)
#10 impala::StatestoreSubscriberProcessor::process_UpdateState(int, 
apache::thrift::protocol::TProtocol*, apache::thrift::protocol::TProtocol*, 
void*) 
/data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/be/generated-sources/gen-cpp/StatestoreSubscriber.cpp:543:13
 (impalad+0x29adba4)
#11 
impala::StatestoreSubscriberProcessor::dispatchCall(apache::thrift::protocol::TProtocol*,
 apache::thrift::protocol::TProtocol*, std::__cxx11::basic_string, std::allocator > const&, int, void*) 
/data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/be/generated-sources/gen-cpp/StatestoreSubscriber.cpp:516:3
 (impalad+0x29ad982)
#12 
apache::thrift::TDispatchProcessor::process(boost::shared_ptr,
 boost::shared_ptr, void*) 
/data/jenkins/workspace/impala-asf-master-core-

[jira] [Resolved] (IMPALA-10030) Remove unneeded jars from fe/pom.xml

2020-08-31 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-10030.
---
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Remove unneeded jars from fe/pom.xml
> 
>
> Key: IMPALA-10030
> URL: https://issues.apache.org/jira/browse/IMPALA-10030
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 4.0
>
>
> There are several jar dependencies that are (1) not needed, (2) can easily 
> be removed, (3) can be converted to test dependencies, or (4) pull in 
> unnecessary transitive dependencies.
> Removing all these jar dependencies can help decrease the size of Impala 
> Docker images.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IMPALA-10117) Skip calls to FsPermissionCache for blob stores

2020-08-31 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10117:
-

 Summary: Skip calls to FsPermissionCache for blob stores
 Key: IMPALA-10117
 URL: https://issues.apache.org/jira/browse/IMPALA-10117
 Project: IMPALA
  Issue Type: Improvement
Reporter: Sahil Takiar


The {{FsPermissionCache}} is described as:
{code:java}
/**
 * Simple non-thread-safe cache for resolved file permissions. This allows
 * pre-caching permissions by listing the status of all files within a 
directory,
 * and then using that cache to avoid round trips to the FileSystem for later
 * queries of those paths.
 */ {code}

I confirmed that {{FsPermissionCache#precacheChildrenOf}} is actually called 
for data stored on S3. The issue is that {{FsPermissionCache#getPermissions}} 
is called inside {{HdfsTable#getAvailableAccessLevel}}, which is skipped for 
S3, so none of the cached metadata is ever used. The problem is that 
{{precacheChildrenOf}} calls {{getFileStatus}} for all files, which results in 
a number of unnecessary metadata operations against S3, plus a pile of cached 
metadata that is never read.

{{precacheChildrenOf}} is actually only invoked in the specific scenario 
described below:
{code}
// Only preload permissions if the number of partitions to be added is
// large (3x) relative to the number of existing partitions. This covers
// two common cases:
//
// 1) initial load of a table (no existing partition metadata)
// 2) ALTER TABLE RECOVER PARTITIONS after creating a table pointing to
// an already-existing partition directory tree
//
// Without this heuristic, we would end up using a "listStatus" call to
// potentially fetch a bunch of irrelevant information about existing
// partitions when we only want to know about a small number of newly-added
// partitions.
{code}

Regardless, skipping the call to {{precacheChildrenOf}} for blob stores should 
(1) improve table loading time for S3 backed tables, and (2) decrease catalogd 
memory requirements when loading a bunch of tables stored on S3.
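The preload heuristic quoted above, together with the proposed blob-store skip, can be sketched as follows. This is an illustrative Python sketch, not Impala's actual (Java) code; the function names and the scheme list are assumptions.

```python
# Illustrative sketch of the 3x preload heuristic plus the proposed
# blob-store skip. Function names and the scheme list are assumptions,
# not Impala's actual API.

BLOB_STORE_SCHEMES = {"s3a", "s3n", "abfs", "adl", "gs"}

def is_blob_store(path):
    """True if the path lives on an object store, where permission checks
    are skipped anyway and the permission cache would go unused."""
    return "://" in path and path.split("://", 1)[0] in BLOB_STORE_SCHEMES

def should_precache_permissions(path, num_new_parts, num_existing_parts):
    """Only preload permissions when the number of partitions being added
    is large (3x) relative to the number of existing partitions, and never
    for blob stores."""
    if is_blob_store(path):
        return False
    return num_new_parts > 3 * num_existing_parts
```

With this guard, an initial load of an S3-backed table never pays the `getFileStatus` cost for a cache that `getAvailableAccessLevel` will never consult.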



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (IMPALA-10073) Create shaded dependency for S3A and aws-java-sdk-bundle

2020-08-31 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-10073.
---
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Create shaded dependency for S3A and aws-java-sdk-bundle
> 
>
> Key: IMPALA-10073
> URL: https://issues.apache.org/jira/browse/IMPALA-10073
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 4.0
>
>
> One of the largest dependencies in Impala Docker containers is the 
> aws-java-sdk-bundle jar. One way to decrease the size of this dependency is 
> to apply a similar technique used for the hive-exec shaded jar: 
> [https://github.com/apache/impala/blob/master/shaded-deps/pom.xml]
> The aws-java-sdk-bundle contains SDKs for all AWS services, even though 
> Impala-S3A only requires a few of the more basic SDKs.
> IMPALA-10028 and HADOOP-17197 both discuss this a bit as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (IMPALA-8547) get_json_object fails to get value for numeric key

2020-08-25 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-8547.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

> get_json_object fails to get value for numeric key
> --
>
> Key: IMPALA-8547
> URL: https://issues.apache.org/jira/browse/IMPALA-8547
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.1.0
>Reporter: Eugene Zimichev
>Assignee: Eugene Zimichev
>Priority: Minor
>  Labels: built-in-function
> Fix For: Impala 4.0
>
>
> {code:java}
> select get_json_object('{"1": 5}', '$.1');
> {code}
> returns error:
>  
> {code:java}
> "Expected key at position 2"
> {code}
>  
> I guess it's caused by the function FindEndOfIdentifier, which expects the 
> first symbol of a key to be a letter.
> Hive version of get_json_object works fine in this case.
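A toy Python reimplementation makes the bug class concrete. This is a sketch under the assumption that the C++ path parser requires keys to begin with a letter (as the mention of {{FindEndOfIdentifier}} suggests); it is not Impala's actual code, and only bare `$.key` paths are handled.

```python
import json
import re

# STRICT_KEY mimics a FindEndOfIdentifier-style rule (keys must start with
# a letter); LENIENT_KEY accepts any non-delimiter run, so numeric keys
# such as "1" resolve, matching Hive's behavior.
STRICT_KEY = re.compile(r"\.([A-Za-z][A-Za-z0-9_]*)")
LENIENT_KEY = re.compile(r"\.([^.\[\]]+)")

def get_json_object(json_str, path, key_re=LENIENT_KEY):
    """Resolve a $.a.b-style path against a JSON document."""
    if not path.startswith("$"):
        raise ValueError("path must start with $")
    obj = json.loads(json_str)
    pos = 1
    while pos < len(path) and obj is not None:
        m = key_re.match(path, pos)
        if m is None:
            raise ValueError("Expected key at position %d" % pos)
        obj = obj.get(m.group(1)) if isinstance(obj, dict) else None
        pos = m.end()
    return obj
```

With `LENIENT_KEY`, the query from the report resolves (`get_json_object('{"1": 5}', '$.1')` returns 5); with `STRICT_KEY`, the parse fails with an "Expected key" error, mirroring the reported failure.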



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IMPALA-10085) Table level stats are not honored when partition has corrupt stats

2020-08-13 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10085:
-

 Summary: Table level stats are not honored when partition has 
corrupt stats
 Key: IMPALA-10085
 URL: https://issues.apache.org/jira/browse/IMPALA-10085
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Sahil Takiar


This is more of an edge case of IMPALA-9744, but when any partition in a table 
has corrupt stats, the table-level stats will not be honored. On the other 
hand, if a table just has missing stats, the table-level stats will be honored.

Given a partitioned table with the following partitions and their row 
counts:

{code:java}
[localhost:21000] default> show partitions part_test;
Query: show partitions part_test
+---------+---------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
| partcol | #Rows   | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats | Location                                                  |
+---------+---------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
| 1       | -1      | 1      | 10B  | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/part_test/partcol=1 |
| 2       | -438290 | 1      | 6B   | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/part_test/partcol=2 |
| 3       | 3       | 1      | 6B   | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/part_test/partcol=3 |
| Total   | 100100  | 3      | 22B  | 0B           |                   |        |                   |                                                           |
+---------+---------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
 {code}

The query {{explain select * from part_test order by col limit 10}} will cause 
{{HdfsScanNode#getStatsNumRows}} to return 5.

Given the following set of partitions with different row counts than above:

{code}
+---------+---------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
| partcol | #Rows   | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats | Location                                                  |
+---------+---------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
| 1       | -1      | 1      | 10B  | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/part_test/partcol=1 |
| 2       | -1      | 1      | 6B   | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/part_test/partcol=2 |
| 3       | 3       | 1      | 6B   | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/part_test/partcol=3 |
| Total   | 100100  | 3      | 22B  | 0B           |                   |        |                   |                                                           |
+---------+---------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
{code}

The same method returns 100100.
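The missing-versus-corrupt distinction can be sketched as follows, assuming -1 is the "missing stats" marker (as shown in the #Rows column) and any other negative value counts as corrupt. This is an illustration of the reported behavior, not Impala's implementation:

```python
MISSING = -1  # the marker shown in the #Rows column for missing stats

def honors_table_level_stats(partition_row_counts):
    """Sketch of the reported behavior: table-level stats are honored only
    when no partition has *corrupt* stats. A count of -1 means 'missing'
    and does not count as corrupt; any other negative value does."""
    return not any(n < 0 and n != MISSING for n in partition_row_counts)
```

For the first set of partitions above, `honors_table_level_stats([-1, -438290, 3])` is False, so the table-level count of 100100 is ignored; for the second, `honors_table_level_stats([-1, -1, 3])` is True and 100100 is used.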



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IMPALA-10084) Display the number of estimated rows for a table

2020-08-13 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10084:
-

 Summary: Display the number of estimated rows for a table
 Key: IMPALA-10084
 URL: https://issues.apache.org/jira/browse/IMPALA-10084
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Sahil Takiar


AFAICT, there is no way to determine the number of rows estimated for a table 
when row counts have been estimated via file size:
{code:java}
[localhost:21000] default> create table test (col int);
[localhost:21000] default> insert into table test values (1), (2), (3), (4), 
(5);
[localhost:21000] default> show table stats test;
+-------+--------+------+--------------+-------------------+--------+-------------------+--------------------------------------------+
| #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats | Location                                   |
+-------+--------+------+--------------+-------------------+--------+-------------------+--------------------------------------------+
| -1    | 1      | 10B  | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/test |
+-------+--------+------+--------------+-------------------+--------+-------------------+--------------------------------------------+
[localhost:21000] default> explain select * from test order by col limit 10;
+--------------------------------------------------------------------------------------+
| Explain String                                                                       |
+--------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=8.00KB Threads=3                           |
| Per-Host Resource Estimates: Memory=32MB                                             |
| WARNING: The following tables are missing relevant table and/or column statistics.   |
| default.test                                                                         |
|                                                                                      |
| PLAN-ROOT SINK                                                                       |
| |                                                                                    |
| 02:MERGING-EXCHANGE [UNPARTITIONED]                                                  |
| |  order by: col ASC                                                                 |
| |  limit: 10                                                                         |
| |                                                                                    |
| 01:TOP-N [LIMIT=10]                                                                  |
| |  order by: col ASC                                                                 |
| |  row-size=4B cardinality=3                                                         |
| |                                                                                    |
| 00:SCAN HDFS [default.test]                                                          |
|    HDFS partitions=1/1 files=1 size=10B                                              |
|    row-size=4B cardinality=3                                                         |
+--------------------------------------------------------------------------------------+
[localhost:21000] default> set explain_level=3;
[localhost:21000] default> explain select * from test order by col limit 10;
+--------------------------------------------------------------------------------------+
| Explain String                                                                       |
+--------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=8.00KB Threads=3                           |
| Per-Host Resource Estimates: Memory=32MB                                             |
| WARNING: The following tables are missing relevant table and/or column statistics.   |
| default.test                                                                         |
| Analyzed query: SELECT * FROM `default`.test ORDER BY col ASC LIMIT CAST(10 AS       |
| TINYINT)                                                                             |
|                                                                                      |
| F01:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1                                |
| Per-Host Resources: mem-estimate=16.00KB mem-reservation=0B thread-reservation=1     |
|   PLAN-ROOT SINK                                                                     |
|   |  output exprs: col                                                               |
|   |  mem-estimate=0B mem-reservation=0B thread-reservation=0                         |
|   |                                                                                  |
|   02:MERGING-EXCHANGE [UNPARTITIONED]

[jira] [Created] (IMPALA-10083) Improve row count estimates when stats are not available

2020-08-13 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10083:
-

 Summary: Improve row count estimates when stats are not available
 Key: IMPALA-10083
 URL: https://issues.apache.org/jira/browse/IMPALA-10083
 Project: IMPALA
  Issue Type: Improvement
  Components: Frontend
Reporter: Sahil Takiar


There are various improvements we can make to estimate row counts even when 
statistics are not available for a table.

There are various factors to consider here:
 * Handling for partitioned vs. non-partitioned tables
 ** Handling for partitioned tables can be a bit tricky if the table is in a 
mixed state - some partitions have row counts while others don't
 * Interoperability with other systems such as Hive and Spark
 * Users can run alter table statements to manually set the value of the row 
count

This JIRA will be used to track the various improvements via sub-tasks.
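For the mixed-state case, one possible approach (an illustration, not a committed design) is to extrapolate row counts for stats-less partitions from the observed rows-per-byte of the partitions that do have stats:

```python
def estimate_total_rows(partitions):
    """One possible mixed-state estimate (illustrative, not Impala's
    implementation). `partitions` is a list of (row_count, size_bytes)
    pairs, with a negative row_count meaning 'no stats'. Known counts are
    summed; unknown partitions are extrapolated from the observed
    rows-per-byte of the known ones."""
    known = [(rows, size) for rows, size in partitions if rows >= 0]
    unknown_bytes = sum(size for rows, size in partitions if rows < 0)
    known_rows = sum(rows for rows, _ in known)
    known_bytes = sum(size for _, size in known)
    if known_bytes == 0:
        return known_rows  # nothing to extrapolate from
    rows_per_byte = known_rows / known_bytes
    return known_rows + int(round(rows_per_byte * unknown_bytes))
```

For example, a table with one partition of 100 rows in 1000 bytes and one stats-less partition of 500 bytes would be estimated at 150 rows.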



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (IMPALA-10029) Strip debug symbols from libkudu_client and libstdc++ binaries

2020-08-12 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-10029.
---
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Strip debug symbols from libkudu_client and libstdc++ binaries
> --
>
> Key: IMPALA-10029
> URL: https://issues.apache.org/jira/browse/IMPALA-10029
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 4.0
>
>
> IMPALA-8425 strips the debug symbols of the impalad binary. libkudu_client.so 
> and libstdc++ also take up a non-trivial amount of space in the Docker 
> containers, so we should strip debug symbols from them as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IMPALA-10073) Create shaded dependency for S3A and aws-java-sdk-bundle

2020-08-11 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10073:
-

 Summary: Create shaded dependency for S3A and aws-java-sdk-bundle
 Key: IMPALA-10073
 URL: https://issues.apache.org/jira/browse/IMPALA-10073
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Sahil Takiar
Assignee: Sahil Takiar


One of the largest dependencies in Impala Docker containers is the 
aws-java-sdk-bundle jar. One way to decrease the size of this dependency is to 
apply a similar technique used for the hive-exec shaded jar: 
[https://github.com/apache/impala/blob/master/shaded-deps/pom.xml]

The aws-java-sdk-bundle contains SDKs for all AWS services, even though 
Impala-S3A only requires a few of the more basic SDKs.

IMPALA-10028 and HADOOP-17197 both discuss this a bit as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IMPALA-10072) Data load failures in ubuntu-16.04-from-scratch

2020-08-11 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10072:
-

 Summary: Data load failures in ubuntu-16.04-from-scratch
 Key: IMPALA-10072
 URL: https://issues.apache.org/jira/browse/IMPALA-10072
 Project: IMPALA
  Issue Type: Bug
Reporter: Sahil Takiar


Seems like there are consistent data load failures on several unrelated patches:

[https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/11627/]

[https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/11629/|https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/11629/#showFailuresLink]

[https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/11631/|https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/11631/#showFailuresLink]

[https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/11633/|https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/11633/#showFailuresLink]

[https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/11635/]

Almost all seem to be failing with an error like this:
{code:java}
02:06:32 Loading nested parquet data (logging to 
/home/ubuntu/Impala/logs/data_loading/load-nested.log)... 
02:08:06 FAILED (Took: 1 min 34 sec)
02:08:06 '/home/ubuntu/Impala/testdata/bin/load_nested.py -t 
tpch_nested_parquet -f parquet/none' failed. Tail of log:
02:08:06at javax.security.auth.Subject.doAs(Subject.java:422)
02:08:06at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
02:08:06at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2882)
02:08:06 
02:08:06at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:220)
02:08:06at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:1361)
02:08:06at 
org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:732)
02:08:06at 
org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:756)
02:08:06at 
org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:756)
02:08:06at 
org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:756)
02:08:06at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:471)
02:08:06... 17 more
02:08:06 Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): 
File 
/test-warehouse/tpch_nested_parquet.db/.hive-staging_hive_2020-08-11_02-07-45_902_3668710725192096563-193/_task_tmp.-ext-10004/_tmp.00_3
 could only be written to 0 of the 1 minReplication nodes. There are 3 
datanode(s) running and 3 node(s) are excluded in this operation.
02:08:06at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2259)
02:08:06at 
org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294)
02:08:06at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2773)
02:08:06at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:879)
02:08:06at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:583)
02:08:06at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
02:08:06at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
02:08:06at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
02:08:06at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:985)
02:08:06at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:913)
02:08:06at java.security.AccessController.doPrivileged(Native Method)
02:08:06at javax.security.auth.Subject.doAs(Subject.java:422)
02:08:06at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
02:08:06at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2882)
02:08:06 
02:08:06at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1553)
02:08:06at org.apache.hadoop.ipc.Client.call(Client.java:1499)
02:08:06at org.apache.hadoop.ipc.Client.call(Client.java:1396)
02:08:06at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
02:08:06at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
02:08:06at com.sun.proxy.$Proxy15.addBlock(Unknown Source)
02:08:06at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:520)
02:08:06at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
02:08:06at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
02:08:06at 
sun.

[jira] [Created] (IMPALA-10068) Split out jars for catalog Docker images

2020-08-10 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10068:
-

 Summary: Split out jars for catalog Docker images
 Key: IMPALA-10068
 URL: https://issues.apache.org/jira/browse/IMPALA-10068
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Sahil Takiar


One way to decrease the size of the catalogd images is to only include jar 
files necessary to run the catalogd. Currently, all Impala coordinator / 
executor jars are included in the catalogd images, which is not necessary.

This can be fixed by splitting the fe/ Java code into fe/ and catalogd/ folders 
(and perhaps a java-common/ folder). This is probably a nice improvement to 
make regardless, because the fe and catalogd code should really be in separate 
Maven modules. By separating all catalogd code into a separate Maven module, it 
should be easy to modify the Docker build scripts to only copy the catalogd 
jars into the catalogd Impala image.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IMPALA-10067) TestImpalaShell.test_large_sql is flaky

2020-08-10 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10067:
-

 Summary: TestImpalaShell.test_large_sql is flaky
 Key: IMPALA-10067
 URL: https://issues.apache.org/jira/browse/IMPALA-10067
 Project: IMPALA
  Issue Type: Test
  Components: Clients
Reporter: Sahil Takiar


{code:java}
shell.test_shell_commandline.TestImpalaShell.test_large_sql[table_format_and_file_extension:
 ('textfile', '.txt') | protocol: hs2-http] {code}

This test failed recently in a pre-commit job: 
https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/2920/testReport/junit/shell.test_shell_commandline/TestImpalaShell/test_large_sql_table_format_and_file_extensiontextfile_txt_protocol__hs2_http_/

{code}
Error Message

shell/test_shell_commandline.py:882: in test_large_sql assert actual_time_s 
<= time_limit_s, ( E   AssertionError: It took 20.2972311974 seconds to execute 
the query. Time limit is 20 seconds. E   assert 20.297231197357178 <= 20

Stacktrace

shell/test_shell_commandline.py:882: in test_large_sql
assert actual_time_s <= time_limit_s, (
E   AssertionError: It took 20.2972311974 seconds to execute the query. Time 
limit is 20 seconds.
E   assert 20.297231197357178 <= 20
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (IMPALA-9478) Runtime profiles should indicate if custom UDFs are being used

2020-08-07 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-9478.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Runtime profiles should indicate if custom UDFs are being used
> --
>
> Key: IMPALA-9478
> URL: https://issues.apache.org/jira/browse/IMPALA-9478
> Project: IMPALA
>  Issue Type: Task
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 4.0
>
>
> Custom UDFs can include arbitrary user code that can cause query slowdown. In 
> order to better diagnose queries with UDF issues, it is first important to 
> know when a query is even using a UDF.
> Runtime profiles should list out any custom UDFs used by the query, as well 
> as the library the UDF is loaded from.
> For Java UDFs, the full classname of the UDF would be good as well.
> Any other metadata associated with the UDF might be useful as well. There are 
> a few things that are printed by {{show functions}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IMPALA-10049) Include RPC call_id in slow RPC logs

2020-08-05 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10049:
-

 Summary: Include RPC call_id in slow RPC logs
 Key: IMPALA-10049
 URL: https://issues.apache.org/jira/browse/IMPALA-10049
 Project: IMPALA
  Issue Type: Improvement
Reporter: Sahil Takiar


The current code for logging slow RPCs on the sender side looks something like 
this:
{code:java}
template <typename ResponsePBType>
void KrpcDataStreamSender::Channel::LogSlowRpc(
    const char* rpc_name, int64_t total_time_ns, const ResponsePBType& resp) {
  int64_t network_time_ns = total_time_ns - resp_.receiver_latency_ns();
  LOG(INFO) << "Slow " << rpc_name << " RPC to " << address_
            << " (fragment_instance_id=" << PrintId(fragment_instance_id_) << "): "
            << "took " << PrettyPrinter::Print(total_time_ns, TUnit::TIME_NS) << ". "
            << "Receiver time: "
            << PrettyPrinter::Print(resp_.receiver_latency_ns(), TUnit::TIME_NS)
            << " Network time: " << PrettyPrinter::Print(network_time_ns, TUnit::TIME_NS);
}

void KrpcDataStreamSender::Channel::LogSlowFailedRpc(
    const char* rpc_name, int64_t total_time_ns, const kudu::Status& err) {
  LOG(INFO) << "Slow " << rpc_name << " RPC to " << address_
            << " (fragment_instance_id=" << PrintId(fragment_instance_id_) << "): "
            << "took " << PrettyPrinter::Print(total_time_ns, TUnit::TIME_NS) << ". "
            << "Error: " << err.ToString();
}
{code}

It would be nice to include the call_id in the logs as well so that RPCs can 
more easily be traced. The RPC call_id is dumped in RPC traces on the receiver 
side, as well as in the /rpcz output on the debug ui.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IMPALA-10035) send_bytes_per_sec in /rpcz json stats can be negative

2020-07-31 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10035:
-

 Summary: send_bytes_per_sec in /rpcz json stats can be negative
 Key: IMPALA-10035
 URL: https://issues.apache.org/jira/browse/IMPALA-10035
 Project: IMPALA
  Issue Type: Bug
Reporter: Sahil Takiar


{code:java}
{
"remote_ip": "10.196.10.165:27000",
"num_calls_in_flight": 0,
"outbound_queue_size": 0,
"socket_stats": {
"rtt": 91,
"rttvar": 9,
"snd_cwnd": 10,
"total_retrans": 0,
"pacing_rate": 4294967295,
"max_pacing_rate": 4294967295,
"bytes_acked": 7995867431,
"bytes_received": 17908351,
"segs_out": 1186603,
"segs_in": 927339,
"send_queue_bytes": 0,
"receive_queue_bytes": 0,
"send_bytes_per_sec": -694198066
},
"calls_in_flight": []
}, {code}
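One plausible explanation, offered here as an assumption rather than a confirmed diagnosis, is that a byte rate too large for a signed 32-bit field wraps negative; the value reported above corresponds to an unsigned 32-bit rate of roughly 3.6 GB/s:

```python
def to_int32(value):
    """Reinterpret an integer modulo 2**32 as a signed 32-bit value,
    the way storing it into an int32 field would."""
    value %= 2 ** 32
    return value - 2 ** 32 if value >= 2 ** 31 else value

# ~3.6 GB/s does not fit in a signed 32-bit field and wraps to the
# negative value reported in the /rpcz output above.
print(to_int32(3600769230))  # -694198066
```

Widening the field to int64 (or clamping before conversion) would avoid the wrap regardless of the measured rate.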



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IMPALA-10030) Remove unneeded jars from fe/pom.xml

2020-07-30 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10030:
-

 Summary: Remove unneeded jars from fe/pom.xml
 Key: IMPALA-10030
 URL: https://issues.apache.org/jira/browse/IMPALA-10030
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Sahil Takiar
Assignee: Sahil Takiar


There are several jar dependencies that are (1) not needed, (2) can easily be 
removed, (3) can be converted to test dependencies, or (4) pull in unnecessary 
transitive dependencies.

Removing all these jar dependencies can help decrease the size of Impala Docker 
images.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IMPALA-10029) Strip debug symbols from libkudu_client and libstdc++ binaries

2020-07-30 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10029:
-

 Summary: Strip debug symbols from libkudu_client and libstdc++ 
binaries
 Key: IMPALA-10029
 URL: https://issues.apache.org/jira/browse/IMPALA-10029
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Sahil Takiar
Assignee: Sahil Takiar


IMPALA-8425 strips the debug symbols of the impalad binary. libkudu_client.so 
and libstdc++ also take up a non-trivial amount of space in the Docker 
containers, so we should strip debug symbols from them as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IMPALA-10028) Additional optimizations of Impala docker container sizes

2020-07-30 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10028:
-

 Summary: Additional optimizations of Impala docker container sizes
 Key: IMPALA-10028
 URL: https://issues.apache.org/jira/browse/IMPALA-10028
 Project: IMPALA
  Issue Type: Improvement
Reporter: Sahil Takiar
Assignee: Sahil Takiar


There are some more optimizations we can make to get the images even smaller. 
It also looks like we may have regressed on image size: IMPALA-8425 reports the 
images at ~700 MB, but I just checked a release build and they are currently 
1.01 GB.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IMPALA-10016) Split jars for Impala executors and coordinators Docker images

2020-07-28 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10016:
-

 Summary: Split jars for Impala executors and coordinators Docker 
images
 Key: IMPALA-10016
 URL: https://issues.apache.org/jira/browse/IMPALA-10016
 Project: IMPALA
  Issue Type: Improvement
Reporter: Sahil Takiar
Assignee: Sahil Takiar


Impala executors and coordinators currently share a common base image. The base 
image defines a set of jar files needed by either the coordinator or the 
executor. In order to reduce the image size, we should split out the jars into 
two categories: those necessary for the coordinator and those necessary for the 
executor. This should help reduce overall image size.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (IMPALA-9479) Include GC time in runtime profiles

2020-07-27 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-9479.
--
Resolution: Duplicate

Closing as duplicate of IMPALA-9046

> Include GC time in runtime profiles
> ---
>
> Key: IMPALA-9479
> URL: https://issues.apache.org/jira/browse/IMPALA-9479
> Project: IMPALA
>  Issue Type: Task
>Reporter: Sahil Takiar
>Priority: Major
>
> The JvmPauseMonitor prints out logs whenever it detects an excessive amount 
> of time being spent in GC. However, these log lines can often go unnoticed, 
> it would be useful to include some GC related information in the runtime 
> profiles.
> This is useful for diagnosing:
>  * Issues with Java UDFs that spend a lot of time in GC
>  * GC issues on the Coordinator from the fe/ code
>  * Some S3 operations could potentially be GC intensive - e.g. S3A block 
> output stream
> I'm not sure there is a way to track GC per query, since GC happens globally 
> inside the JVM. There are a few ways to get GC information into the profile:
>  * If the JvmPauseMonitor detects a GC pause it can insert a warning in the 
> profiles of all running queries
>  * JMX metrics can be used to detect how much time was spent in GC from when 
> a fragment began to when it ended



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (IMPALA-8754) S3 with S3Guard tests encounter "ResourceNotFoundException" from DynamoDB

2020-07-23 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-8754.
--
Resolution: Duplicate

Closing as a duplicate of IMPALA-9058.

> S3 with S3Guard tests encounter "ResourceNotFoundException" from DynamoDB
> -
>
> Key: IMPALA-8754
> URL: https://issues.apache.org/jira/browse/IMPALA-8754
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.3.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Critical
>  Labels: broken-build, flaky
> Attachments: load-tpch-core-impala-generated-kudu-none-none.sql.log
>
>
> When running tests on s3 with s3guard, various tests can encounter the 
> following error coming from the DynamoDB:
> {noformat}
> EQuery aborted:Disk I/O error on 
> impala-ec2-centos74-m5-4xlarge-ondemand-02c8.vpc.cloudera.com:22002: Failed 
> to open HDFS file 
> s3a://impala-test-uswest2-1/test-warehouse/tpcds.store_sales_parquet/ss_sold_date_sk=2451718/6843d8a91fc5ae1d-88b2af4b0004_156969840_data.0.parq
> E   Error(2): No such file or directory
> E   Root cause: ResourceNotFoundException: Requested resource not found 
> (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: 
> ResourceNotFoundException; Request ID: 
> XXX){noformat}
> Tests that have seen this (this is flaky):
>  * TestTpcdsQuery.test_tpcds_count
>  * TestHdfsFdCaching.test_caching_disabled_by_param
>  * TestMtDop.test_compute_stats
>  * TestScanRangeLengths.test_scan_ranges



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (IMPALA-9996) An S3 test failing with ResourceNotFoundException

2020-07-23 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-9996.
--
Resolution: Duplicate

Looks like a duplicate of IMPALA-8754 and IMPALA-9058

> An S3 test failing with ResourceNotFoundException
> -
>
> Key: IMPALA-9996
> URL: https://issues.apache.org/jira/browse/IMPALA-9996
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Fang-Yu Rao
>Assignee: Sahil Takiar
>Priority: Critical
>  Labels: broken-build, flaky
>
> In a recent S3 build, we have seen that 
> [test_tpcds_count|https://github.com/apache/impala/blob/master/tests/query_test/test_tpcds_queries.py#L52-L53]
>  
> ([https://github.com/apache/impala/blob/master/testdata/workloads/tpcds/queries/count.test#L114-L119])
>  failed with {{ResourceNotFoundException}}.
> The issue may be related to IMPALA-9058.
> The error message in {{impalad.INFO}} (under the directory of {{ee_tests}}) 
> is as follows.
> {code:java}
> I0722 10:31:44.524209 13047 coordinator.cc:684] ExecState: query 
> id=7d4f684028848784:ad2f6f0e 
> finstance=7d4f684028848784:ad2f6f0e0001 on 
> host=impala-ec2-centos74-m5-4xlarge-ondemand-1230.vpc.cloudera.com:22002 
> (EXECUTING -> ERROR) status=Disk I/O error on 
> impala-ec2-centos74-m5-4xlarge-ondemand-1230.vpc.cloudera.com:22002: Failed 
> to open HDFS file 
> s3a://impala-test-uswest2-1/test-warehouse/tpcds.store_sales_parquet/ss_sold_date_sk=2451752/d245f3a054fd2c66-f7f705220004_1984874558_data.0.parq
> Error(2): No such file or directory
> Root cause: ResourceNotFoundException: Requested resource not found (Service: 
> AmazonDynamoDBv2; Status Code: 400; Error Code: ResourceNotFoundException; 
> Request ID: G9QMA17VTSIKOQK33V9TDGMF13VV4KQNSO5AEMVJF66Q9ASUAAJG)
> {code}
> Maybe [~stakiar] and [~joemcdonnell] could offer some insight into it. Thanks!





[jira] [Resolved] (IMPALA-9799) Flakiness in TestFetchFirst due to wrong results of get_num_in_flight_queries

2020-07-22 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-9799.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Flakiness in TestFetchFirst due to wrong results of get_num_in_flight_queries
> -
>
> Key: IMPALA-9799
> URL: https://issues.apache.org/jira/browse/IMPALA-9799
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 4.0
>Reporter: Quanlong Huang
>Assignee: Sahil Takiar
>Priority: Critical
>  Labels: broken-build
> Fix For: Impala 4.0
>
>
> Saw two failures for this test in different jenkins jobs:
> hs2.test_fetch_first.TestFetchFirst.test_query_stmts_v6 (from pytest)
>  Stacktrace:
> {code:java}
> /data/jenkins/workspace/impala-cdpd-master-core-asan/repos/Impala/tests/hs2/hs2_test_suite.py:63:
>  in add_session
> lambda: fn(self))
> /data/jenkins/workspace/impala-cdpd-master-core-asan/repos/Impala/tests/hs2/hs2_test_suite.py:44:
>  in add_session_helper
> fn()
> /data/jenkins/workspace/impala-cdpd-master-core-asan/repos/Impala/tests/hs2/hs2_test_suite.py:63:
>  in 
> lambda: fn(self))
> /data/jenkins/workspace/impala-cdpd-master-core-asan/repos/Impala/tests/hs2/test_fetch_first.py:110:
>  in test_query_stmts_v6
> self.run_query_stmts_test()
> /data/jenkins/workspace/impala-cdpd-master-core-asan/repos/Impala/tests/hs2/test_fetch_first.py:181:
>  in run_query_stmts_test
> self.__test_invalid_result_caching("SELECT COUNT(*) FROM 
> functional.alltypes")
> /data/jenkins/workspace/impala-cdpd-master-core-asan/repos/Impala/tests/hs2/test_fetch_first.py:63:
>  in __test_invalid_result_caching
> assert 0 == impalad.get_num_in_flight_queries()
> E   assert 0 == 1
> E+  where 1 =  >()
> E+where  > = 
>  0x6d25d10>.get_num_in_flight_queries{code}
> hs2.test_fetch_first.TestFetchFirst.test_query_stmts_v6_with_result_spooling 
> (from pytest)
>  Stacktrace:
> {code:java}
> /data/jenkins/workspace/impala-cdpd-master-core-asan/repos/Impala/tests/hs2/hs2_test_suite.py:63:
>  in add_session
> lambda: fn(self))
> /data/jenkins/workspace/impala-cdpd-master-core-asan/repos/Impala/tests/hs2/hs2_test_suite.py:44:
>  in add_session_helper
> fn()
> /data/jenkins/workspace/impala-cdpd-master-core-asan/repos/Impala/tests/hs2/hs2_test_suite.py:63:
>  in 
> lambda: fn(self))
> /data/jenkins/workspace/impala-cdpd-master-core-asan/repos/Impala/tests/hs2/test_fetch_first.py:120:
>  in test_query_stmts_v6_with_result_spooling
> self.run_query_stmts_test({'spool_query_results': 'true'})
> /data/jenkins/workspace/impala-cdpd-master-core-asan/repos/Impala/tests/hs2/test_fetch_first.py:181:
>  in run_query_stmts_test
> self.__test_invalid_result_caching("SELECT COUNT(*) FROM 
> functional.alltypes")
> /data/jenkins/workspace/impala-cdpd-master-core-asan/repos/Impala/tests/hs2/test_fetch_first.py:63:
>  in __test_invalid_result_caching
> assert 0 == impalad.get_num_in_flight_queries()
> E   assert 0 == 1
> E+  where 1 =  >()
> E+where  > = 
>  0x81d4990>.get_num_in_flight_queries{code}





[jira] [Resolved] (IMPALA-9953) Shell does not return all rows if a fetch times out in FINISHED state

2020-07-22 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-9953.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Shell does not return all rows if a fetch times out in FINISHED state
> -
>
> Key: IMPALA-9953
> URL: https://issues.apache.org/jira/browse/IMPALA-9953
> Project: IMPALA
>  Issue Type: Bug
>  Components: Clients
>Reporter: Tim Armstrong
>Assignee: Sahil Takiar
>Priority: Blocker
>  Labels: correctness
> Fix For: Impala 4.0
>
>
> I noticed that if a fetch times out, impala-shell will stop returning rows 
> and close the query. It looks like this happens if the query transitions to 
> the FINISHED state and then the fetch times out.
> I ran into this on an experimental branch where a sort deadlocked. I haven't 
> been able to repro on master yet but I thought I should report it.
> The bug is here:
> {noformat}
> diff --git a/shell/impala_shell.py b/shell/impala_shell.py
> index e0d802626..323aee6c9 100755
> --- a/shell/impala_shell.py
> +++ b/shell/impala_shell.py
> @@ -1182,8 +1182,7 @@ class ImpalaShell(cmd.Cmd, object):
>  
>  for rows in rows_fetched:
># IMPALA-4418: Break out of the loop to prevent printing an 
> unnecessary empty line.
> -  if len(rows) == 0:
> -break
> +  if len(rows) == 0: continue
>self.output_stream.write(rows)
>num_rows += len(rows)
> {noformat}
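As a sketch of the behavior the diff above changes (hypothetical generator names, not the actual impala-shell code): an empty batch can mean "the fetch timed out, poll again" rather than "end of results", so breaking on it silently drops the remaining rows, while continuing keeps polling:

```python
def fetch_batches():
    """Simulates a server fetch stream: a timeout yields an empty batch
    mid-stream, before the query has actually produced all of its rows."""
    yield ["row1", "row2"]
    yield []              # fetch timed out while the query was FINISHED
    yield ["row3"]        # rows that breaking early would have dropped

def drain(batches, skip_empty_with_continue):
    """Drain all batches; the flag toggles between the old and new behavior."""
    rows = []
    for batch in batches:
        if len(batch) == 0:
            if skip_empty_with_continue:
                continue  # fix: an empty batch is not end-of-stream, keep polling
            break         # old behavior: stop early, losing later rows
        rows.extend(batch)
    return rows

print(drain(fetch_batches(), False))  # ['row1', 'row2']
print(drain(fetch_batches(), True))   # ['row1', 'row2', 'row3']
```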





[jira] [Created] (IMPALA-9993) Improve get_json_object path specification format

2020-07-22 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-9993:


 Summary: Improve get_json_object path specification format
 Key: IMPALA-9993
 URL: https://issues.apache.org/jira/browse/IMPALA-9993
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend
Reporter: Sahil Takiar


Filing as a follow up to IMPALA-8547 based on some of the discussion in 
[https://gerrit.cloudera.org/#/c/14905/]

It seems most databases have a slightly different way of handling JSON data. 
The Hive / Impala behavior seems similar to MySQL in syntax (e.g. 
JSON_EXTRACT), although MySQL is much more restrictive about the path 
specification format. Postgres on the other hand has a slightly different 
syntax for path specification compared to MySQL / Hive / Impala, and is more 
permissive in what formats it allows.
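For illustration, here is a toy Python evaluator for the MySQL/Hive-style path format (`$.key.key[index]`) under discussion. This is only a sketch of the syntax, not Impala's, Hive's, or MySQL's actual implementation:

```python
import json
import re

def get_json_object(doc, path):
    """Toy evaluator for MySQL/Hive-style paths like '$.store.book[0].title'.
    Real implementations differ in how strict the path grammar is."""
    if not path.startswith("$"):
        return None
    obj = json.loads(doc)
    # Split '$.a.b[0]' into tokens: ('a', ''), ('b', ''), ('', '0')
    for key, idx in re.findall(r"\.([A-Za-z_]\w*)|\[(\d+)\]", path):
        try:
            obj = obj[key] if key else obj[int(idx)]
        except (KeyError, IndexError, TypeError):
            return None  # path does not resolve in this document
    return obj

doc = '{"store": {"book": [{"title": "Moby Dick"}, {"title": "Emma"}]}}'
print(get_json_object(doc, "$.store.book[1].title"))  # Emma
```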





[jira] [Created] (IMPALA-9991) TestShellClient.test_fetch_size_result_spooling is flaky

2020-07-22 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-9991:


 Summary: TestShellClient.test_fetch_size_result_spooling is flaky
 Key: IMPALA-9991
 URL: https://issues.apache.org/jira/browse/IMPALA-9991
 Project: IMPALA
  Issue Type: Test
Reporter: Sahil Takiar
Assignee: Sahil Takiar


shell.test_shell_client.TestShellClient.test_fetch_size_result_spooling[table_format_and_file_extension:
 ('parquet', '.parq') | protocol: hs2] (from pytest)
h3. Error Message

shell/test_shell_client.py:70: in test_fetch_size_result_spooling 
self.__fetch_rows(client.fetch(handle), num_rows / fetch_size, num_rows) 
shell/test_shell_client.py:80: in __fetch_rows for fetch_batch in 
fetch_batches: ../shell/impala_client.py:787: in fetch yield 
self._transpose(col_value_converters, resp.results.columns) E AttributeError: 
'NoneType' object has no attribute 'columns'
h3. Stacktrace

shell/test_shell_client.py:70: in test_fetch_size_result_spooling 
self.__fetch_rows(client.fetch(handle), num_rows / fetch_size, num_rows) 
shell/test_shell_client.py:80: in __fetch_rows for fetch_batch in 
fetch_batches: ../shell/impala_client.py:787: in fetch yield 
self._transpose(col_value_converters, resp.results.columns) E AttributeError: 
'NoneType' object has no attribute 'columns'
h3. Standard Error

Opened TCP connection to localhost:21050





[jira] [Resolved] (IMPALA-9833) query_test.test_observability.TestQueryStates.test_error_query_state is flaky

2020-07-20 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-9833.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

Closing for now. We can re-open if the issue occurs again.

> query_test.test_observability.TestQueryStates.test_error_query_state is flaky
> -
>
> Key: IMPALA-9833
> URL: https://issues.apache.org/jira/browse/IMPALA-9833
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 4.0
>Reporter: Xiaomeng Zhang
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 4.0
>
>
> [https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/2521/testReport/junit/query_test.test_observability/TestQueryStates/test_error_query_state/]
> It seems the test could not get the query profile after retrying for 30 seconds.
> {code:java}
> Stacktracequery_test/test_observability.py:777: in test_error_query_state
> lambda: self.client.get_runtime_profile(handle))
> common/impala_test_suite.py:1120: in assert_eventually
> count, timeout_s, error_msg_str))
> E   Timeout: Check failed to return True after 30 tries and 30 seconds error 
> message: Query (id=fe45e8bfd138acd3:c67a3796)
> {code}
>  
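The failing check above polls for a condition with a bounded number of retries. A minimal sketch of that pattern (hypothetical helper, not the actual impala_test_suite code):

```python
import time

def assert_eventually(check, tries=30, sleep_s=1.0, error_msg=""):
    """Poll 'check' until it returns True, failing after 'tries' attempts.
    Sketch of the polling pattern used by the test above."""
    for _ in range(tries):
        if check():
            return
        time.sleep(sleep_s)
    raise AssertionError(
        "Check failed to return True after %d tries (%s)" % (tries, error_msg))

# Usage: succeeds once the polled state flips.
state = {"n": 0}
def bump():
    state["n"] += 1
    return state["n"] >= 3

assert_eventually(bump, tries=5, sleep_s=0)
print(state["n"])  # 3
```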





[jira] [Created] (IMPALA-9954) RpcRecvrTime can be negative

2020-07-14 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-9954:


 Summary: RpcRecvrTime can be negative
 Key: IMPALA-9954
 URL: https://issues.apache.org/jira/browse/IMPALA-9954
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Reporter: Sahil Takiar
 Attachments: profile_034e7209bd98c96c_9a448dfc.txt

Saw this on a recent version of master. Attached the full runtime profile.
{code:java}
KrpcDataStreamSender (dst_id=2):(Total: 9.863ms, non-child: 3.185ms, % 
non-child: 32.30%)
  ExecOption: Unpartitioned Sender Codegen Disabled: not needed
   - BytesSent (500.000ms): 0, 0
   - NetworkThroughput: (Avg: 4.34 MB/sec ; Min: 4.34 MB/sec ; Max: 
4.34 MB/sec ; Number of samples: 1)
   - RpcNetworkTime: (Avg: 3.562ms ; Min: 679.676us ; Max: 6.445ms ; 
Number of samples: 2)
   - RpcRecvrTime: (Avg: -151281.000ns ; Min: -231485.000ns ; Max: 
-71077.000ns ; Number of samples: 2)
   - EosSent: 1 (1)
   - PeakMemoryUsage: 416.00 B (416)
   - RowsSent: 100 (100)
   - RpcFailure: 0 (0)
   - RpcRetry: 0 (0)
   - SerializeBatchTime: 2.880ms
   - TotalBytesSent: 28.67 KB (29355)
   - UncompressedRowBatchSize: 69.29 KB (70950) {code}
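One plausible way a negative duration like this arises is computing a receiver-side time by subtracting a measured network time from a delta of timestamps taken on clocks that are not synchronized. This is a toy illustration of that mechanism only, an assumption for explanatory purposes, not Impala's actual timer code:

```python
def recvr_time_ns(send_wall_ns, recv_wall_ns, network_time_ns):
    """Toy model: receiver-side time derived by subtracting a measured
    network time from a wall-clock delta. If the two hosts' clocks are
    skewed, the result can go negative, as in the profile above."""
    return (recv_wall_ns - send_wall_ns) - network_time_ns

# Receiver clock runs 500us behind the sender's, so the apparent
# wall-clock delta is smaller than the true network time.
send_ts = 1_000_000_000
recv_ts = send_ts + 3_562_000 - 500_000   # true flight ~3.562ms, skew -0.5ms
print(recvr_time_ns(send_ts, recv_ts, 3_562_000))  # -500000
```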





[jira] [Resolved] (IMPALA-5534) Fix and re-enable run-process-failure-tests.sh

2020-07-13 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-5534.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Fix and re-enable run-process-failure-tests.sh
> --
>
> Key: IMPALA-5534
> URL: https://issues.apache.org/jira/browse/IMPALA-5534
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 2.5.0, Impala 2.4.0, Impala 2.6.0, Impala 2.7.0, 
> Impala 2.8.0
>Reporter: Alexander Behm
>Assignee: Sahil Takiar
>Priority: Major
>  Labels: test
> Fix For: Impala 4.0
>
>
> See bin/run-all-tests.sh:
> {code}
> ...
>   # Finally, run the process failure tests.
>   # Disabled temporarily until we figure out the proper timeouts required to 
> make the test
>   # succeed.
>   # ${IMPALA_HOME}/tests/run-process-failure-tests.sh
> ...
> {code}
> We should fix and re-enable these tests or alternatively re-implement the 
> tests in a different way to get the same coverage.





[jira] [Resolved] (IMPALA-9834) test_query_retries.TestQueryRetries is flaky on erasure coding configurations

2020-07-10 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-9834.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

We disabled all these tests on EC builds (see commit message in previous 
comment), so this shouldn't be an issue anymore.

> test_query_retries.TestQueryRetries is flaky on erasure coding configurations
> -
>
> Key: IMPALA-9834
> URL: https://issues.apache.org/jira/browse/IMPALA-9834
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.0
>Reporter: Joe McDonnell
>Assignee: Sahil Takiar
>Priority: Blocker
>  Labels: broken-build, flaky
> Fix For: Impala 4.0
>
>
> Multiple tests from test_query_retries.TestQueryRetries hit errors like this 
> (test_retry_query_cancel):
> {noformat}
> custom_cluster/test_query_retries.py:321: in test_retry_query_cancel
> self.__validate_runtime_profiles_from_service(impalad_service, handle)
> custom_cluster/test_query_retries.py:435: in 
> __validate_runtime_profiles_from_service
> self.__validate_runtime_profiles(retried_profile, handle.get_handle().id)
> custom_cluster/test_query_retries.py:503: in __validate_runtime_profiles
> retried_query_id = 
> self.__get_query_id_from_profile(retried_runtime_profile)
> custom_cluster/test_query_retries.py:474: in __get_query_id_from_profile
> assert query_id_search, "Invalid query profile, has no query id"
> E   AssertionError: Invalid query profile, has no query id
> E   assert None{noformat}
> Or this (test_kill_impalad_expect_retries, test_kill_impalad_expect_retry, 
> test_retry_query_hs2):
> {noformat}
> custom_cluster/test_query_retries.py:424: in test_retry_query_hs2
> self.hs2_client.get_query_id(handle))
> custom_cluster/test_query_retries.py:508: in __validate_runtime_profiles
> original_query_id)
> custom_cluster/test_query_retries.py:489: in __validate_original_id_in_profile
> assert original_id_search, \
> E   AssertionError: Could not find original id pattern 'Original Query Id: 
> (.*)' in profile:
> ...{noformat}
> I have only seen these errors on erasure coding so far, and it isn't 
> deterministic.





[jira] [Resolved] (IMPALA-3380) Add TCP timeouts to all RPCs that don't block

2020-07-10 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-3380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-3380.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Add TCP timeouts to all RPCs that don't block
> -
>
> Key: IMPALA-3380
> URL: https://issues.apache.org/jira/browse/IMPALA-3380
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Distributed Exec
>Affects Versions: Impala 2.5.0
>Reporter: Henry Robinson
>Assignee: Sahil Takiar
>Priority: Minor
>  Labels: observability, supportability
> Fix For: Impala 4.0
>
>
> Most RPCs should not take an unbounded amount of time to complete (the 
> exception is {{TransmitData()}}, but that may also change). To handle hang 
> failures on the remote machine, we should add timeouts to every RPC (so, 
> really, every RPC client), and handle the timeout failure.





[jira] [Resolved] (IMPALA-9734) ACID-query retry integration

2020-07-09 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-9734.
--
Resolution: Not A Problem

After talking with several folks offline, this does not seem to be an issue. 
Impala currently does not open a transaction for read-only queries (although 
Hive does, and perhaps Impala will at some point in the future). Transactions 
are only opened for write queries. Transparent query retries currently 
don't support write queries (and there are no current plans to implement this 
in the near-term).

The only ACID consideration is that the snapshot view of the data from the 
original query should be the same view of the data in the retried query, e.g. 
the set of files and the versions of the tables scanned in the original query 
should be the same for the retried query. The current transparent query retry 
logic already handles this because the TExecRequest is simply copied from the 
original query to the retried query. The planning phase will be skipped, so 
the set of files to be scanned will be the same.
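A toy model of why copying the cached request preserves the snapshot: the retry deep-copies the planner's output instead of re-planning, so it scans exactly the files the original attempt saw even if the table changed in between (hypothetical field names, not the real TExecRequest):

```python
import copy

class ExecRequest(object):
    """Stand-in for TExecRequest: the planner's output, including the
    concrete files chosen for each scan (hypothetical fields)."""
    def __init__(self, files):
        self.scan_files = files

def plan(table_files):
    """Stand-in for the planning phase: snapshots the current file set."""
    return ExecRequest(list(table_files))

# The original attempt plans against the current snapshot of the table.
original = plan(["f1.parq", "f2.parq"])

# A compaction replaces the table's files before the retry runs...
current_table_files = ["base_00001.parq"]

# ...but the retry copies the cached request instead of re-planning,
# so it scans exactly the snapshot the original query saw.
retried = copy.deepcopy(original)
print(retried.scan_files)  # ['f1.parq', 'f2.parq']
```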

> ACID-query retry integration
> 
>
> Key: IMPALA-9734
> URL: https://issues.apache.org/jira/browse/IMPALA-9734
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Priority: Major
>
> We need to consider how query retries interact with ACID transactions. As of 
> IMPALA-9199, Impala will create new ClientRequestStates for each query retry 
> and will cache the TExecRequest between ClientRequestStates. This might not 
> be safe for ACID transactions. If the first query attempt fails, then the 
> transaction will fail and a new one will be required. However, the query 
> retry will use the transaction id / info from the original query attempt.
> I think the semantics are not entirely clear here, and we don't have any 
> tests for this. So the goal of this JIRA is to (1) identify if there are any 
> issues with the current approach, (2) fix any issues with transactions during 
> query retries, and (3) add some query retry tests that enable transactions.
> We might want to consider whether a query and its retry should be in the 
> same, or different, transactions. Keeping them in the same transaction should 
> allow us to cache the TExecRequest. If they are in separate transactions, then 
> Impala might need to create a new TExecRequest for each retry.





[jira] [Resolved] (IMPALA-9502) Avoid copying TExecRequest when retrying queries

2020-07-09 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-9502.
--
Resolution: Later

Closing as 'Later'. We can revisit this later if we think it is actually an 
issue.

> Avoid copying TExecRequest when retrying queries
> 
>
> Key: IMPALA-9502
> URL: https://issues.apache.org/jira/browse/IMPALA-9502
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 4.0
>
>
> There are a few issues that occur when re-using a {{TExecRequest}} across 
> query retries. We should investigate if there is a way to work around those 
> issues so that the {{TExecRequest}} does not need to be copied when retrying 
> a query.





[jira] [Resolved] (IMPALA-9854) TSAN data race in QueryDriver::CreateRetriedClientRequestState

2020-07-09 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-9854.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

> TSAN data race in QueryDriver::CreateRetriedClientRequestState
> --
>
> Key: IMPALA-9854
> URL: https://issues.apache.org/jira/browse/IMPALA-9854
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 4.0
>
>
> Seeing the following data race in {{test_query_retries.py}}
> {code:java}
> WARNING: ThreadSanitizer: data race (pid=5460)
>   Write of size 8 at 0x7b8c00261510 by thread T38:
> #0 impala::TUniqueId::operator=(impala::TUniqueId&&) 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/generated-sources/gen-cpp/Types_types.cpp:967:6
>  (impalad+0x1de1968)
> #1 impala::ImpalaServer::PrepareQueryContext(impala::TNetworkAddress 
> const&, impala::TNetworkAddress const&, impala::TQueryCtx*) 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/service/impala-server.cc:1069:23
>  (impalad+0x2210dbf)
> #2 impala::ImpalaServer::PrepareQueryContext(impala::TQueryCtx*) 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/service/impala-server.cc:1024:3
>  (impalad+0x220f3c1)
> #3 
> impala::QueryDriver::CreateRetriedClientRequestState(impala::ClientRequestState*,
>  std::unique_ptr std::default_delete >*, 
> std::shared_ptr*) 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/runtime/query-driver.cc:302:19
>  (impalad+0x29de3ec)
> #4 impala::QueryDriver::RetryQueryFromThread(impala::Status const&, 
> std::shared_ptr) 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/runtime/query-driver.cc:203:3
>  (impalad+0x29dd01f)
> #5 boost::_mfi::mf2 std::shared_ptr >::operator()(impala::QueryDriver*, 
> impala::Status const&, std::shared_ptr) const 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/mem_fn_template.hpp:280:29
>  (impalad+0x29e1669)
> #6 void boost::_bi::list3, 
> boost::_bi::value, 
> boost::_bi::value > 
> >::operator() const&, std::shared_ptr >, 
> boost::_bi::list0>(boost::_bi::type, boost::_mfi::mf2 impala::QueryDriver, impala::Status const&, 
> std::shared_ptr >&, boost::_bi::list0&, int) 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/bind.hpp:398:9
>  (impalad+0x29e1578)
> #7 boost::_bi::bind_t impala::Status const&, std::shared_ptr >, 
> boost::_bi::list3, 
> boost::_bi::value, 
> boost::_bi::value > > >::operator()() 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/bind.hpp:1222:16
>  (impalad+0x29e14c3)
> #8 
> boost::detail::function::void_function_obj_invoker0 boost::_mfi::mf2 std::shared_ptr >, 
> boost::_bi::list3, 
> boost::_bi::value, 
> boost::_bi::value > > >, 
> void>::invoke(boost::detail::function::function_buffer&) 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/function/function_template.hpp:159:11
>  (impalad+0x29e1221)
> #9 boost::function0::operator()() const 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/function/function_template.hpp:770:14
>  (impalad+0x1e5ba81)
> #10 impala::Thread::SuperviseThread(std::string const&, std::string 
> const&, boost::function, impala::ThreadDebugInfo const*, 
> impala::Promise*) 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/thread.cc:360:3
>  (impalad+0x2453776)
> #11 void boost::_bi::list5, 
> boost::_bi::value, boost::_bi::value >, 
> boost::_bi::value, 
> boost::_bi::value*> 
> >::operator() boost::function, impala::ThreadDebugInfo const*, 
> impala::Promise*), 
> boost::_bi::list0>(boost::_bi::type, void (*&)(std::string const&, 
> std::string const&, boost::function, impala::ThreadDebugInfo const*, 
> impala::Promise*), boost::_bi::list0&, int) 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/bind.hpp:531:9
>  (impalad+0x245b93c)
> #12 boost::_bi::bind_t const&, boost::function, impala::ThreadDebugInfo const*, 
> impala::Promise*), 
> boost::_bi::list5, 
> boost::_bi::value, boost::_bi::value >, 
> boost::_bi::value, 
> boost::_bi::value*> > 
> >::operator()() 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/bind.hpp:1222:16
>  (impalad+0x245b853)
> #13 boost::detail::thread_data (*)(std::string const&, std::string const&, boost::fu

[jira] [Resolved] (IMPALA-9855) TSAN lock-order-inversion warning in QueryDriver::RetryQueryFromThread

2020-07-09 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-9855.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

> TSAN lock-order-inversion warning in QueryDriver::RetryQueryFromThread
> --
>
> Key: IMPALA-9855
> URL: https://issues.apache.org/jira/browse/IMPALA-9855
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 4.0
>
>
> TSAN reports the following error in {{test_query_retries.py}}.
> {code:java}
> WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) (pid=3786)
>   Cycle in lock order graph: M17348 (0x7b140035d2d8) => M804309746609755832 
> (0x) => M17348  Mutex M804309746609755832 acquired here while 
> holding mutex M17348 in thread T370:
> #0 AnnotateRWLockAcquired 
> /mnt/source/llvm/llvm-5.0.1.src-p2/projects/compiler-rt/lib/tsan/rtl/tsan_interface_ann.cc:271
>  (impalad+0x19bafcc)
> #1 base::SpinLock::Lock() 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/gutil/spinlock.h:77:5
>  (impalad+0x1a11585)
> #2 impala::SpinLock::lock() 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/spinlock.h:34:8
>  (impalad+0x1a11519)
> #3 impala::ScopedShardedMapRef 
> >::ScopedShardedMapRef(impala::TUniqueId const&, 
> impala::ShardedQueryMap >*) 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/sharded-query-map-util.h:98:23
>  (impalad+0x2220661)
> #4 impala::ImpalaServer::GetQueryDriver(impala::TUniqueId const&, bool) 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/service/impala-server.cc:1296:53
>  (impalad+0x22124ba)
> #5 impala::QueryDriver::RetryQueryFromThread(impala::Status const&, 
> std::shared_ptr) 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/runtime/query-driver.cc:279:25
>  (impalad+0x29dd92c)
> #6 boost::_mfi::mf2 std::shared_ptr >::operator()(impala::QueryDriver*, 
> impala::Status const&, std::shared_ptr) const 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/mem_fn_template.hpp:280:29
>  (impalad+0x29e1669)
> #7 void boost::_bi::list3, 
> boost::_bi::value, 
> boost::_bi::value > 
> >::operator() const&, std::shared_ptr >, 
> boost::_bi::list0>(boost::_bi::type, boost::_mfi::mf2 impala::QueryDriver, impala::Status const&, 
> std::shared_ptr >&, boost::_bi::list0&, int) 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/bind.hpp:398:9
>  (impalad+0x29e1578)
> #8 boost::_bi::bind_t impala::Status const&, std::shared_ptr >, 
> boost::_bi::list3, 
> boost::_bi::value, 
> boost::_bi::value > > >::operator()() 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/bind.hpp:1222:16
>  (impalad+0x29e14c3)
> #9 
> boost::detail::function::void_function_obj_invoker0 boost::_mfi::mf2 std::shared_ptr >, 
> boost::_bi::list3, 
> boost::_bi::value, 
> boost::_bi::value > > >, 
> void>::invoke(boost::detail::function::function_buffer&) 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/function/function_template.hpp:159:11
>  (impalad+0x29e1221)
> #10 boost::function0::operator()() const 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/function/function_template.hpp:770:14
>  (impalad+0x1e5ba81)
> #11 impala::Thread::SuperviseThread(std::string const&, std::string 
> const&, boost::function, impala::ThreadDebugInfo const*, 
> impala::Promise*) 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/thread.cc:360:3
>  (impalad+0x2453776)
> #12 void boost::_bi::list5, 
> boost::_bi::value, boost::_bi::value >, 
> boost::_bi::value, 
> boost::_bi::value*> 
> >::operator() boost::function, impala::ThreadDebugInfo const*, 
> impala::Promise*), 
> boost::_bi::list0>(boost::_bi::type, void (*&)(std::string const&, 
> std::string const&, boost::function, impala::ThreadDebugInfo const*, 
> impala::Promise*), boost::_bi::list0&, int) 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/bind.hpp:531:9
>  (impalad+0x245b93c)
> #13 boost::_bi::bind_t const&, boost::function, impala::ThreadDebugInfo const*, 
> impala::Promise*), 
> boost::_bi::list5, 
> boost::_bi::value, boost::_bi::value >, 
> boost::_bi::value, 
> boost::_bi::value*> > 
> >::operator()() 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1
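The report above describes a cycle in the lock-order graph. A minimal, generic illustration of the pattern and of the standard fix of acquiring locks in one global order (unrelated to Impala's actual mutexes):

```python
import threading

lock_a, lock_b = threading.Lock(), threading.Lock()

# Lock-order inversion: one code path takes A then B, another takes B then A.
# Run concurrently, each can hold its first lock and block forever on the
# second. TSAN flags the cycle even if a given run happens not to deadlock.

def path_one():
    with lock_a:
        with lock_b:
            pass

def path_two_inverted():
    with lock_b:          # inverted order -> potential deadlock with path_one
        with lock_a:
            pass

def path_two_fixed():
    with lock_a:          # fix: always acquire in one global order (A before B)
        with lock_b:
            pass

# With the consistent order, concurrent execution cannot form a cycle.
threads = [threading.Thread(target=f) for f in (path_one, path_two_fixed)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("no deadlock with consistent lock order")
```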

[jira] [Created] (IMPALA-9910) Impala Doc: Add docs for transparent query retries

2020-06-29 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-9910:


 Summary: Impala Doc: Add docs for transparent query retries
 Key: IMPALA-9910
 URL: https://issues.apache.org/jira/browse/IMPALA-9910
 Project: IMPALA
  Issue Type: Sub-task
  Components: Docs
Reporter: Sahil Takiar


Add docs for transparent query retries (IMPALA-9124). The parent JIRA has a 
design doc describing the feature. The commit message for IMPALA-9199 should 
be pretty helpful as well.





[jira] [Resolved] (IMPALA-9849) Set halt_on_error=1 for TSAN builds

2020-06-22 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-9849.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Set halt_on_error=1 for TSAN builds
> ---
>
> Key: IMPALA-9849
> URL: https://issues.apache.org/jira/browse/IMPALA-9849
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 4.0
>
>
> IMPALA-9568 mistakenly removed the halt_on_error flag from TSAN builds. The 
> intention in IMPALA-9568 was to make sure that Impala crashes when a TSAN bug 
> is detected; Impala already does this for ASAN builds. The confusing part 
> about halt_on_error is that it defaults to true in ASAN builds but to 
> false in TSAN builds, so halt_on_error needs to be explicitly 
> set to true for TSAN builds (but not for ASAN builds).





[jira] [Resolved] (IMPALA-9844) Ozone support for load data inpath

2020-06-17 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-9844.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Ozone support for load data inpath
> --
>
> Key: IMPALA-9844
> URL: https://issues.apache.org/jira/browse/IMPALA-9844
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 4.0
>
>
> Currently, attempts to run {{load data inpath}} against Ozone tables fail:
> {code}
> default> CREATE EXTERNAL TABLE o3_tab1 (id INT, col_1 string) ROW FORMAT 
> DELIMITED FIELDS TERMINATED BY ' ' STORED AS TEXTFILE LOCATION 
> 'o3fs://bucket1.volume1.ozone1/o3_tab1';
> Query: CREATE EXTERNAL TABLE o3_tab1 (id INT, col_1 string) ROW FORMAT 
> DELIMITED FIELDS TERMINATED BY ' ' STORED AS TEXTFILE LOCATION 
> 'o3fs://bucket1.volume1.ozone1/o3_tab1'
> +-------------------------+
> | summary                 |
> +-------------------------+
> | Table has been created. |
> +-------------------------+
> Fetched 1 row(s) in 0.36s
> default> load data inpath 'o3fs://bucket1.volume1.ozone1/file' into table 
> ozone_test_table2;
> Query: load data inpath 'o3fs://bucket1.volume1.ozone1/file' into table 
> ozone_test_table2
> ERROR: AnalysisException: INPATH location 
> 'o3fs://bucket1.volume1.ozone1/file' must point to an HDFS, S3A, ADL or ABFS 
> filesystem.
> {code}





[jira] [Created] (IMPALA-9856) Enable result spooling by default

2020-06-14 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-9856:


 Summary: Enable result spooling by default
 Key: IMPALA-9856
 URL: https://issues.apache.org/jira/browse/IMPALA-9856
 Project: IMPALA
  Issue Type: Task
  Components: Backend
Reporter: Sahil Takiar


Result spooling has been relatively stable since it was introduced, and it has 
several benefits described in IMPALA-8656. It would be good to enable it by 
default.

I looked into doing this a while ago, and there are a bunch of tests that rely 
on the "fetch one row batch at a time" behavior. Those tests fail when result 
spooling is enabled.

The remaining linked tasks in IMPALA-8656 should be completed as well before 
enabling result spooling by default.
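Conceptually, the difference is who drives the pace of row-batch production. A minimal Python sketch of the spooling idea (illustrative only, not Impala's implementation; the real feature buffers serialized row batches subject to memory limits):

```python
from queue import Queue

def produce_results(row_batches, spool):
    # With result spooling, the query materializes its row batches
    # eagerly into a buffer instead of pausing after each batch until
    # the client fetches it.
    for batch in row_batches:
        spool.put(batch)

def fetch(spool, fetch_size):
    # The client drains the spool at its own pace, roughly fetch_size
    # rows per round trip.
    rows = []
    while not spool.empty() and len(rows) < fetch_size:
        rows.extend(spool.get())
    return rows

spool = Queue()
produce_results([[1, 2], [3, 4], [5]], spool)
first = fetch(spool, 4)
print(first)  # [1, 2, 3, 4]
```

With the old "fetch one row batch at a time" behavior, production could not run ahead of the client; with spooling, execution resources can be released once the spool is filled.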





[jira] [Created] (IMPALA-9855) TSAN lock-order-inversion warning in QueryDriver::RetryQueryFromThread

2020-06-12 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-9855:


 Summary: TSAN lock-order-inversion warning in 
QueryDriver::RetryQueryFromThread
 Key: IMPALA-9855
 URL: https://issues.apache.org/jira/browse/IMPALA-9855
 Project: IMPALA
  Issue Type: Sub-task
  Components: Backend
Reporter: Sahil Takiar
Assignee: Sahil Takiar


TSAN reports the following error in {{test_query_retries.py}}.
{code:java}
WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) (pid=3786)
  Cycle in lock order graph: M17348 (0x7b140035d2d8) => M804309746609755832 
(0x) => M17348  Mutex M804309746609755832 acquired here while 
holding mutex M17348 in thread T370:
#0 AnnotateRWLockAcquired 
/mnt/source/llvm/llvm-5.0.1.src-p2/projects/compiler-rt/lib/tsan/rtl/tsan_interface_ann.cc:271
 (impalad+0x19bafcc)
#1 base::SpinLock::Lock() 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/gutil/spinlock.h:77:5
 (impalad+0x1a11585)
#2 impala::SpinLock::lock() 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/spinlock.h:34:8
 (impalad+0x1a11519)
#3 impala::ScopedShardedMapRef 
>::ScopedShardedMapRef(impala::TUniqueId const&, 
impala::ShardedQueryMap >*) 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/sharded-query-map-util.h:98:23
 (impalad+0x2220661)
#4 impala::ImpalaServer::GetQueryDriver(impala::TUniqueId const&, bool) 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/service/impala-server.cc:1296:53
 (impalad+0x22124ba)
#5 impala::QueryDriver::RetryQueryFromThread(impala::Status const&, 
std::shared_ptr) 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/runtime/query-driver.cc:279:25
 (impalad+0x29dd92c)
#6 boost::_mfi::mf2 >::operator()(impala::QueryDriver*, 
impala::Status const&, std::shared_ptr) const 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/mem_fn_template.hpp:280:29
 (impalad+0x29e1669)
#7 void boost::_bi::list3, 
boost::_bi::value, 
boost::_bi::value > 
>::operator() >, 
boost::_bi::list0>(boost::_bi::type, boost::_mfi::mf2 >&, boost::_bi::list0&, int) 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/bind.hpp:398:9
 (impalad+0x29e1578)
#8 boost::_bi::bind_t >, 
boost::_bi::list3, 
boost::_bi::value, 
boost::_bi::value > > >::operator()() 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/bind.hpp:1222:16
 (impalad+0x29e14c3)
#9 
boost::detail::function::void_function_obj_invoker0 >, 
boost::_bi::list3, 
boost::_bi::value, 
boost::_bi::value > > >, 
void>::invoke(boost::detail::function::function_buffer&) 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/function/function_template.hpp:159:11
 (impalad+0x29e1221)
#10 boost::function0::operator()() const 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/function/function_template.hpp:770:14
 (impalad+0x1e5ba81)
#11 impala::Thread::SuperviseThread(std::string const&, std::string const&, 
boost::function, impala::ThreadDebugInfo const*, impala::Promise*) 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/thread.cc:360:3
 (impalad+0x2453776)
#12 void boost::_bi::list5, 
boost::_bi::value, boost::_bi::value >, 
boost::_bi::value, 
boost::_bi::value*> 
>::operator(), impala::ThreadDebugInfo const*, impala::Promise*), boost::_bi::list0>(boost::_bi::type, void 
(*&)(std::string const&, std::string const&, boost::function, 
impala::ThreadDebugInfo const*, impala::Promise*), boost::_bi::list0&, int) 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/bind.hpp:531:9
 (impalad+0x245b93c)
#13 boost::_bi::bind_t, impala::ThreadDebugInfo const*, 
impala::Promise*), 
boost::_bi::list5, 
boost::_bi::value, boost::_bi::value >, 
boost::_bi::value, 
boost::_bi::value*> > 
>::operator()() 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/bind.hpp:1222:16
 (impalad+0x245b853)
#14 boost::detail::thread_data, 
impala::ThreadDebugInfo const*, impala::Promise*), boost::_bi::list5, 
boost::_bi::value, boost::_bi::value >, 
boost::_bi::value, 
boost::_bi::value*> > > >::run() 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/thread/detail/thread.hpp:116:17
 (impalad+0x245b540)
#15 thread_proxy  (impalad+0x3171659)

Hint: use TSAN_OPTIONS=second_deadlock_stack=1 to get more informative warning message


  Mutex M17348 acquired here while holding mutex M804309746609755832 in thread 
T392:
#0 AnnotateRWLockAcquired 
/mnt/source/
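The report is the classic two-mutex cycle: one thread takes the QueryDriver's lock and then the sharded query-map lock, while another takes them in the opposite order. A minimal Python sketch of the pattern (hypothetical lock names, run sequentially here; run concurrently, the opposite acquisition orders can deadlock, which is the cycle TSAN's lock-order graph flags):

```python
import threading

# Two locks standing in for the mutexes in the report (names are
# illustrative, not Impala's actual members).
query_driver_lock = threading.Lock()
query_map_lock = threading.Lock()

acquisition_order = []

def retry_path():
    # Mirrors RetryQueryFromThread: driver lock first, then the map lock.
    with query_driver_lock:
        acquisition_order.append(("retry", "driver"))
        with query_map_lock:
            acquisition_order.append(("retry", "map"))

def lookup_path():
    # The opposite order: map lock first, then the driver lock.
    with query_map_lock:
        acquisition_order.append(("lookup", "map"))
        with query_driver_lock:
            acquisition_order.append(("lookup", "driver"))

# Sequential here, so it completes; interleaved across threads, each
# thread can end up holding one lock while waiting on the other.
retry_path()
lookup_path()
```

TSAN reports the potential deadlock even when the interleaving never actually occurs, which is why the warning fires on an otherwise passing test run.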

[jira] [Created] (IMPALA-9854) TSAN data race in QueryDriver::CreateRetriedClientRequestState

2020-06-12 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-9854:


 Summary: TSAN data race in 
QueryDriver::CreateRetriedClientRequestState
 Key: IMPALA-9854
 URL: https://issues.apache.org/jira/browse/IMPALA-9854
 Project: IMPALA
  Issue Type: Sub-task
  Components: Backend
Reporter: Sahil Takiar
Assignee: Sahil Takiar


Seeing the following data race in {{test_query_retries.py}}
{code:java}
WARNING: ThreadSanitizer: data race (pid=5460)
  Write of size 8 at 0x7b8c00261510 by thread T38:
#0 impala::TUniqueId::operator=(impala::TUniqueId&&) 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/generated-sources/gen-cpp/Types_types.cpp:967:6
 (impalad+0x1de1968)
#1 impala::ImpalaServer::PrepareQueryContext(impala::TNetworkAddress 
const&, impala::TNetworkAddress const&, impala::TQueryCtx*) 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/service/impala-server.cc:1069:23
 (impalad+0x2210dbf)
#2 impala::ImpalaServer::PrepareQueryContext(impala::TQueryCtx*) 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/service/impala-server.cc:1024:3
 (impalad+0x220f3c1)
#3 
impala::QueryDriver::CreateRetriedClientRequestState(impala::ClientRequestState*,
 std::unique_ptr >*, 
std::shared_ptr*) 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/runtime/query-driver.cc:302:19
 (impalad+0x29de3ec)
#4 impala::QueryDriver::RetryQueryFromThread(impala::Status const&, 
std::shared_ptr) 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/runtime/query-driver.cc:203:3
 (impalad+0x29dd01f)
#5 boost::_mfi::mf2 >::operator()(impala::QueryDriver*, 
impala::Status const&, std::shared_ptr) const 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/mem_fn_template.hpp:280:29
 (impalad+0x29e1669)
#6 void boost::_bi::list3, 
boost::_bi::value, 
boost::_bi::value > 
>::operator() >, 
boost::_bi::list0>(boost::_bi::type, boost::_mfi::mf2 >&, boost::_bi::list0&, int) 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/bind.hpp:398:9
 (impalad+0x29e1578)
#7 boost::_bi::bind_t >, 
boost::_bi::list3, 
boost::_bi::value, 
boost::_bi::value > > >::operator()() 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/bind.hpp:1222:16
 (impalad+0x29e14c3)
#8 
boost::detail::function::void_function_obj_invoker0 >, 
boost::_bi::list3, 
boost::_bi::value, 
boost::_bi::value > > >, 
void>::invoke(boost::detail::function::function_buffer&) 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/function/function_template.hpp:159:11
 (impalad+0x29e1221)
#9 boost::function0::operator()() const 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/function/function_template.hpp:770:14
 (impalad+0x1e5ba81)
#10 impala::Thread::SuperviseThread(std::string const&, std::string const&, 
boost::function, impala::ThreadDebugInfo const*, impala::Promise*) 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/thread.cc:360:3
 (impalad+0x2453776)
#11 void boost::_bi::list5, 
boost::_bi::value, boost::_bi::value >, 
boost::_bi::value, 
boost::_bi::value*> 
>::operator(), impala::ThreadDebugInfo const*, impala::Promise*), boost::_bi::list0>(boost::_bi::type, void 
(*&)(std::string const&, std::string const&, boost::function, 
impala::ThreadDebugInfo const*, impala::Promise*), boost::_bi::list0&, int) 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/bind.hpp:531:9
 (impalad+0x245b93c)
#12 boost::_bi::bind_t, impala::ThreadDebugInfo const*, 
impala::Promise*), 
boost::_bi::list5, 
boost::_bi::value, boost::_bi::value >, 
boost::_bi::value, 
boost::_bi::value*> > 
>::operator()() 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/bind.hpp:1222:16
 (impalad+0x245b853)
#13 boost::detail::thread_data, 
impala::ThreadDebugInfo const*, impala::Promise*), boost::_bi::list5, 
boost::_bi::value, boost::_bi::value >, 
boost::_bi::value, 
boost::_bi::value*> > > >::run() 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/thread/detail/thread.hpp:116:17
 (impalad+0x245b540)
#14 thread_proxy  (impalad+0x3171659)

  Previous read of size 8 at 0x7b8c00261510 by thread T100:
#0 impala::PrintId(impala::TUniqueId const&, std::string const&) 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/debug-util.cc:108:48
 (impalad+0x237557f)
#1 impala::Coordinator::ReleaseQueryAdmissionControlResources() 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/

[jira] [Created] (IMPALA-9849) Set halt_on_error=1 for TSAN builds

2020-06-11 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-9849:


 Summary: Set halt_on_error=1 for TSAN builds
 Key: IMPALA-9849
 URL: https://issues.apache.org/jira/browse/IMPALA-9849
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Sahil Takiar
Assignee: Sahil Takiar


IMPALA-9568 mistakenly removed the halt_on_error flag from TSAN builds. The 
intention in IMPALA-9568 was to make sure that Impala crashes when a TSAN bug 
is detected, as it already does for ASAN builds. The confusing part about 
halt_on_error is that it defaults to true in ASAN builds but to false in TSAN 
builds, so halt_on_error needs to be explicitly set to true for TSAN builds 
(but not for ASAN builds).





[jira] [Created] (IMPALA-9848) Coordinator unnecessarily invalidating locally cached table metadata

2020-06-11 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-9848:


 Summary: Coordinator unnecessarily invalidating locally cached 
table metadata
 Key: IMPALA-9848
 URL: https://issues.apache.org/jira/browse/IMPALA-9848
 Project: IMPALA
  Issue Type: Improvement
  Components: Catalog, Frontend
Reporter: Sahil Takiar


The following fails when run locally on master:
{code:java}
./bin/start-impala-cluster.py --catalogd_args='--catalog_topic_mode=minimal' 
--impalad_args='--use_local_catalog'
./bin/impala-shell.sh
[localhost:21000] default> select count(l_comment) from tpch.lineitem; <--- 
THIS WORKS
# kill the catalogd process
[localhost:21000] default> select count(l_comment) from tpch.lineitem; <--- 
THIS FAILS
ERROR: AnalysisException: Failed to load metadata for table: 'tpch.lineitem'
CAUSED BY: TableLoadingException: Could not load table tpch.lineitem from 
catalog
CAUSED BY: TException: org.apache.impala.common.InternalException: Couldn't 
open transport for localhost:26000 (connect() failed: Connection refused)CAUSED 
BY: InternalException: Couldn't open transport for localhost:26000 (connect() 
failed: Connection refused {code}
The above experiment works with catalog v1 - e.g. if you remove the startup 
flags in the {{./bin/start-impala-cluster.py}} everything works.





[jira] [Resolved] (IMPALA-9818) Add fetch size as option to impala shell

2020-06-10 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-9818.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Add fetch size as option to impala shell
> 
>
> Key: IMPALA-9818
> URL: https://issues.apache.org/jira/browse/IMPALA-9818
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Clients
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 4.0
>
>
> The impala shell should have an option to control the fetch size (e.g. the 
> number of rows fetched at a time). Currently the value is hard-coded to 1024. 
> Other clients (e.g. JDBC) have similar options (e.g. Statement#setFetchSize).
> When result spooling is enabled, setting a higher fetch size can improve 
> performance for clients with a high RTT to/from the Impala coordinator.





[jira] [Created] (IMPALA-9844) Ozone support for load data inpath

2020-06-09 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-9844:


 Summary: Ozone support for load data inpath
 Key: IMPALA-9844
 URL: https://issues.apache.org/jira/browse/IMPALA-9844
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Sahil Takiar
Assignee: Sahil Takiar


Currently, attempts to run {{load data inpath}} against Ozone tables fail:

{code}
default> CREATE EXTERNAL TABLE o3_tab1 (id INT, col_1 string) ROW FORMAT 
DELIMITED FIELDS TERMINATED BY ' ' STORED AS TEXTFILE LOCATION 
'o3fs://bucket1.volume1.ozone1/o3_tab1';
Query: CREATE EXTERNAL TABLE o3_tab1 (id INT, col_1 string) ROW FORMAT 
DELIMITED FIELDS TERMINATED BY ' ' STORED AS TEXTFILE LOCATION 
'o3fs://bucket1.volume1.ozone1/o3_tab1'
+-------------------------+
| summary                 |
+-------------------------+
| Table has been created. |
+-------------------------+
Fetched 1 row(s) in 0.36s
default> load data inpath 'o3fs://bucket1.volume1.ozone1/file' into table 
ozone_test_table2;
Query: load data inpath 'o3fs://bucket1.volume1.ozone1/file' into table 
ozone_test_table2
ERROR: AnalysisException: INPATH location 'o3fs://bucket1.volume1.ozone1/file' 
must point to an HDFS, S3A, ADL or ABFS filesystem.
{code}
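The failure is an analysis-time filesystem-scheme check. A hypothetical Python sketch of that kind of whitelist (Impala's actual check lives in the Java frontend; the scheme set here is inferred from the error message, and adding Ozone's scheme to it is what this ticket asks for):

```python
# Schemes the error message says INPATH currently accepts; 'o3fs'
# (Ozone) is missing, which is what produces the AnalysisException.
SUPPORTED_INPATH_SCHEMES = {"hdfs", "s3a", "adl", "abfs"}

def inpath_scheme_supported(uri):
    # Compare the URI's scheme against the whitelist.
    scheme = uri.split("://", 1)[0]
    return scheme in SUPPORTED_INPATH_SCHEMES

print(inpath_scheme_supported("o3fs://bucket1.volume1.ozone1/file"))  # False
```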





[jira] [Created] (IMPALA-9843) Add ability to run schematool against HMS in minicluster

2020-06-09 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-9843:


 Summary: Add ability to run schematool against HMS in minicluster
 Key: IMPALA-9843
 URL: https://issues.apache.org/jira/browse/IMPALA-9843
 Project: IMPALA
  Issue Type: Improvement
Reporter: Sahil Takiar


When the CDP version is bumped, we often need to re-format the HMS postgres 
database because the HMS schema needs updating. Hive provides a standalone tool 
for performing schema updates: 
[https://cwiki.apache.org/confluence/display/Hive/Hive+Schema+Tool]

Impala should be able to integrate with this tool, so that developers don't 
have to blow away their HMS database every time the CDP version is bumped up. 
Even worse, blowing away the HMS data requires performing a full data load.

It would be great to have a wrapper around the schematool that can easily be 
invoked by developers.
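Such a wrapper could be as thin as assembling the documented schematool invocation. A hypothetical Python sketch (the -dbType and -upgradeSchema flags come from Hive's Schema Tool documentation; everything else, including how the wrapper would point at the minicluster's HMS config, is assumed):

```python
import subprocess

def build_schematool_cmd(db_type="postgres", action="upgradeSchema"):
    # Hive's schematool takes the metastore database type plus one
    # action flag, e.g. -info, -initSchema, or -upgradeSchema.
    return ["schematool", "-dbType", db_type, f"-{action}"]

def upgrade_hms_schema(dry_run=True):
    cmd = build_schematool_cmd()
    if dry_run:
        # Return the command for inspection instead of running it.
        return " ".join(cmd)
    # A real wrapper would set HIVE_CONF_DIR to the minicluster's HMS
    # config before invoking the tool (assumption).
    subprocess.run(cmd, check=True)

print(upgrade_hms_schema())  # schematool -dbType postgres -upgradeSchema
```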





[jira] [Created] (IMPALA-9840) ThreadSanitizer: data race internal-queue.h in InternalQueueBase::Enqueue

2020-06-08 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-9840:


 Summary: ThreadSanitizer: data race internal-queue.h in 
InternalQueueBase::Enqueue
 Key: IMPALA-9840
 URL: https://issues.apache.org/jira/browse/IMPALA-9840
 Project: IMPALA
  Issue Type: Bug
Reporter: Sahil Takiar
Assignee: Bikramjeet Vig


Seems like this was introduced in IMPALA-9655. On my TSAN build, the error 
occurred during data-load.
{code:java}
 WARNING: ThreadSanitizer: data race (pid=24164)
 Write of size 8 at 0x7b6f9bb0 by thread T394 (mutexes: write M443436):
 #0 impala::InternalQueueBase::Enqueue(impala::io::ScanRange*) 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/internal-queue.h:108:35
 (impalad+0x24fdd19)
 #1 
impala::ScanRangeSharedState::EnqueueScanRange(std::vector > const&, bool) 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/exec/hdfs-scan-node-base.cc:1220:25
 (impalad+0x24f860a)
 #2 impala::HdfsScanNodeMt::AddDiskIoRanges(std::vector > const&, impala::EnqueueLocation) 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/exec/hdfs-scan-node-mt.cc:157:18
 (impalad+0x251934c)
 #3 impala::HdfsScanNodeBase::AddDiskIoRanges(impala::HdfsFileDesc const*, 
impala::EnqueueLocation) 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/exec/hdfs-scan-node-base.h:516:12
 (impalad+0x255b3ab)
 #4 impala::HdfsTextScanner::IssueInitialRanges(impala::HdfsScanNodeBase*, 
std::vector > 
const&) 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/exec/hdfs-text-scanner.cc:116:9
 (impalad+0x255441b)
 #5 impala::HdfsScanNodeBase::IssueInitialScanRanges(impala::RuntimeState*) 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/exec/hdfs-scan-node-base.cc:680:9
 (impalad+0x24f5b14)
 #6 impala::HdfsScanNodeMt::Open(impala::RuntimeState*) 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/exec/hdfs-scan-node-mt.cc:58:3
 (impalad+0x2518819)
 #7 impala::AggregationNode::Open(impala::RuntimeState*) 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/exec/aggregation-node.cc:48:3
 (impalad+0x266726a)
 #8 impala::FragmentInstanceState::Open() 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/runtime/fragment-instance-state.cc:348:5
 (impalad+0x206f037)
 #9 impala::FragmentInstanceState::Exec() 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/runtime/fragment-instance-state.cc:93:12
 (impalad+0x206d53b)
 #10 impala::QueryState::ExecFInstance(impala::FragmentInstanceState*) 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/runtime/query-state.cc:763:24
 (impalad+0x2081f33)
 #11 impala::QueryState::StartFInstances()::$_7::operator()() const 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/runtime/query-state.cc:671:37
 (impalad+0x20840f2)
 #12 
boost::detail::function::void_function_obj_invoker0::invoke(boost::detail::function::function_buffer&) 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/function/function_template.hpp:159:11
 (impalad+0x2083f19)
 #13 boost::function0::operator()() const 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/function/function_template.hpp:770:14
 (impalad+0x1e41101)
 #14 impala::Thread::SuperviseThread(std::string const&, std::string const&, 
boost::function, impala::ThreadDebugInfo const*, impala::Promise*) 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/thread.cc:360:3
 (impalad+0x2438056)
 #15 void boost::_bi::list5, 
boost::_bi::value, boost::_bi::value >, 
boost::_bi::value, 
boost::_bi::value*> 
>::operator(), impala::ThreadDebugInfo const*, impala::Promise*), boost::_bi::list0>(boost::_bi::type, void 
(*&)(std::string const&, std::string const&, boost::function, 
impala::ThreadDebugInfo const*, impala::Promise*), boost::_bi::list0&, int) 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/bind.hpp:531:9
 (impalad+0x244021c)
 #16 boost::_bi::bind_t, impala::ThreadDebugInfo const*, impala::Promise*), boost::_bi::list5, 
boost::_bi::value, boost::_bi::value >, 
boost::_bi::value, 
boost::_bi::value*> > 
>::operator()() 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/bind.hpp:1222:16
 (impalad+0x2440133)
 #17 boost::detail::thread_data, impala::ThreadDebugInfo 
const*, impala::Promise*), 
boost::_bi::list5, 
boost::_bi::value, boost::_bi::value >, 
boost::_bi::value, 
boost::_bi::value*> > > >::run() 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/thread/detail/thread.hpp:116:17
 (impalad+0x243fe20)
 #18 thread_proxy  (impalad+0x314f369)


[jira] [Created] (IMPALA-9819) Separate data cache and HDFS scan node runtime profile metrics

2020-06-02 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-9819:


 Summary: Separate data cache and HDFS scan node runtime profile 
metrics
 Key: IMPALA-9819
 URL: https://issues.apache.org/jira/browse/IMPALA-9819
 Project: IMPALA
  Issue Type: Improvement
Reporter: Sahil Takiar
Assignee: Joe McDonnell


When a query reads data from both a remote storage system (e.g. S3) and the 
data cache, the HDFS_SCAN_NODE runtime profiles are hard to reason about.

For example, in the following runtime profile snippet:
{code:java}
HDFS_SCAN_NODE (id=0):(Total: 59s374ms, non-child: 0.000ns, % non-child: 0.00%)
 - AverageHdfsReadThreadConcurrency: 0.62 
 - AverageScannerThreadConcurrency: 0.91 
 - BytesRead: 587.97 MB (616533483)
 - BytesReadDataNodeCache: 0
 - BytesReadLocal: 0
 - BytesReadRemoteUnexpected: 0
 - BytesReadShortCircuit: 0
 - CachedFileHandlesHitCount: 323 (323)
 - CachedFileHandlesMissCount: 94 (94)
 - CollectionItemsRead: 0 (0)
 - DataCacheHitBytes: 212.00 MB (94996)
 - DataCacheHitCount: 107 (107)
 - DataCacheMissBytes: 375.98 MB (394238486)
 - DataCacheMissCount: 310 (310)
 - DataCachePartialHitCount: 0 (0)
 - DecompressionTime: 2s428ms
 - MaterializeTupleTime: 19s444ms
 - MaxCompressedTextFileLength: 0
 - NumColumns: 3 (3)
 - NumDictFilteredRowGroups: 0 (0)
 - NumDisksAccessed: 1 (1)
 - NumPages: 53.30K (53300)
 - NumRowGroups: 83 (83)
 - NumRowGroupsWithPageIndex: 83 (83)
 - NumScannerThreadMemUnavailable: 0 (0)
 - NumScannerThreadReservationsDenied: 0 (0)
 - NumScannerThreadsStarted: 1 (1)
 - NumScannersWithNoReads: 0 (0)
 - NumStatsFilteredPages: 0 (0)
 - NumStatsFilteredRowGroups: 0 (0)
 - PeakMemoryUsage: 16.00 MB (16781312)
 - PeakScannerThreadConcurrency: 1 (1)
 - PerReadThreadRawHdfsThroughput: 15.11 MB/sec
 - RemoteScanRanges: 0 (0)
 - RowBatchBytesEnqueued: 670.68 MB (703260541)
 - RowBatchQueueGetWaitTime: 59s368ms
 - RowBatchQueuePeakMemoryUsage: 4.17 MB (4368285)
 - RowBatchQueuePutWaitTime: 0.000ns
 - RowBatchesEnqueued: 915 (915)
 - RowsRead: 413.47M (413466507)
 - RowsReturned: 722.27K (722275)
 - RowsReturnedRate: 12.17 K/sec
 - ScanRangesComplete: 83 (83)
 - ScannerIoWaitTime: 33s454ms
 - ScannerThreadWorklessLoops: 0 (0)
 - ScannerThreadsInvoluntaryContextSwitches: 1.94K (1940)
 - ScannerThreadsTotalWallClockTime: 1m
   - ScannerThreadsSysTime: 1s181ms
   - ScannerThreadsUserTime: 20s581ms
 - ScannerThreadsVoluntaryContextSwitches: 770 (770)
 - TotalRawHdfsOpenFileTime: 3s396ms
 - TotalRawHdfsReadTime: 38s940ms
 - TotalReadThroughput: 8.86 MB/sec {code}
The query scanned part of the data from S3 and part of the data from the data 
cache.

The confusing part is that metrics such as PerReadThreadRawHdfsThroughput are 
measured across S3 and data cache reads. So there is no straightforward way to 
determine the throughput for *just* S3 reads. Users might want this value to 
determine if S3 was particularly slow for their query.

It would be nice if the scan node metrics more clearly differentiate between 
reads from S3 vs. the data cache. The aggregate metrics (*Total* metrics) are 
still useful, but it would be useful to have fine-grained metrics that are 
specific to a data storage system (e.g. either the data cache or S3).
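The blended nature of the counter can be checked from the numbers above. A quick Python sanity check (values copied from the profile snippet; treating DataCacheMissBytes as the volume actually read from S3 is an assumption):

```python
# Counters from the HDFS_SCAN_NODE profile above (MB and seconds).
bytes_read_mb = 587.97       # BytesRead: S3 + data cache combined
data_cache_hit_mb = 212.00   # DataCacheHitBytes
data_cache_miss_mb = 375.98  # DataCacheMissBytes, i.e. read from S3
raw_read_time_s = 38.94      # TotalRawHdfsReadTime

# The reported PerReadThreadRawHdfsThroughput is the blended ratio
# over all reads, cache hits included:
blended = bytes_read_mb / raw_read_time_s
print(round(blended, 2))  # 15.1 -- matching the profile's 15.11 MB/sec

# An S3-only figure would need the read time spent on just the
# 375.98 MB of cache misses, which the profile does not report
# separately -- the gap this ticket describes.
```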





[jira] [Created] (IMPALA-9818) Add fetch size as option to impala shell

2020-06-02 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-9818:


 Summary: Add fetch size as option to impala shell
 Key: IMPALA-9818
 URL: https://issues.apache.org/jira/browse/IMPALA-9818
 Project: IMPALA
  Issue Type: Improvement
  Components: Clients
Reporter: Sahil Takiar
Assignee: Sahil Takiar


The impala shell should have an option to control the fetch size (e.g. the 
number of rows fetched at a time). Currently the value is hard-coded to 1024. 
Other clients (e.g. JDBC) have similar options (e.g. Statement#setFetchSize).

When result spooling is enabled, setting a higher fetch size can improve 
performance for clients with a high RTT to/from the Impala coordinator.
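The RTT effect is easy to estimate. A back-of-the-envelope Python sketch (the 1024-row default comes from this ticket; the 50 ms RTT and the 10x fetch size are assumed for illustration):

```python
def fetch_round_trips(total_rows, fetch_size):
    # One fetch RPC returns at most fetch_size rows, so a result set of
    # total_rows costs ceil(total_rows / fetch_size) round trips.
    return -(-total_rows // fetch_size)  # ceiling division

rtt_s = 0.050  # assumed 50 ms coordinator round-trip time
default_trips = fetch_round_trips(10_000_000, 1024)   # hard-coded default
larger_trips = fetch_round_trips(10_000_000, 10240)   # 10x fetch size

# Minimum time spent purely on round trips in each case.
print(default_trips * rtt_s, larger_trips * rtt_s)
```

With result spooling the server can keep materializing results between fetches, so the larger batches mostly just cut the per-RPC overhead rather than stalling execution.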





[jira] [Resolved] (IMPALA-9794) OutOfMemoryError when loading tpcds text data via Hive

2020-06-01 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-9794.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

This was fixed in IMPALA-9777. Impala-EC data-loading is passing now.

> OutOfMemoryError when loading tpcds text data via Hive
> --
>
> Key: IMPALA-9794
> URL: https://issues.apache.org/jira/browse/IMPALA-9794
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 4.0
>Reporter: Quanlong Huang
>Assignee: Sahil Takiar
>Priority: Blocker
>  Labels: broken-build
> Fix For: Impala 4.0
>
> Attachments: load-tpcds-core-hive-generated-text-none-none.sql.log
>
>
> Saw a data loading failure caused by OutOfMemoryError in a test with erasure 
> coding. The impacted query inserts data into the store_sales table and fails:
> {code}
> Getting log thread is interrupted, since query is done!
> ERROR : Status: Failed
> ERROR : Vertex failed, vertexName=Reducer 2, 
> vertexId=vertex_1590450092775_0009_3_01, diagnostics=[Task failed, 
> taskId=task_1590450092775_0009_3_01_01, diagnostics=[TaskAttempt 0 
> failed, info=[Container container_1590450092775_0009_01_03 finished with 
> diagnostics set to [Container failed, exitCode=-104. [2020-05-25 
> 16:49:18.814]Container 
> [pid=14180,containerID=container_1590450092775_0009_01_03] is running 
> 44290048B beyond the 'PHYSICAL' memory limit. Current usage: 1.0 GB of 1 GB 
> physical memory used; 3.3 GB of 2.1 GB virtual memory used. Killing container.
> Dump of the process-tree for container_1590450092775_0009_01_03 :
> |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) 
> SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
> |- 14180 14176 14180 14180 (bash) 0 0 115851264 352 /bin/bash -c 
> /usr/java/jdk1.8.0_144/bin/java  -Xmx819m -server 
> -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN  
> -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator 
> -Dlog4j.configuration=tez-container-log4j.properties 
> -Dyarn.app.container.log.dir=/data0/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1590450092775_0009/container_1590450092775_0009_01_03
>  -Dtez.root.logger=INFO,CLA  
> -Djava.io.tmpdir=/data0/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/testdata/cluster/cdh7/node-1/var/lib/hadoop-yarn/cache/jenkins/nm-local-dir/usercache/jenkins/appcache/application_1590450092775_0009/container_1590450092775_0009_01_03/tmp
>  org.apache.tez.runtime.task.TezChild localhost 43422 
> container_1590450092775_0009_01_03 application_1590450092775_0009 1 
> 1>/data0/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1590450092775_0009/container_1590450092775_0009_01_03/stdout
>  
> 2>/data0/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1590450092775_0009/container_1590450092775_0009_01_03/stderr
> |- 14191 14180 14180 14180 (java) 3167 127 3468886016 272605 
> /usr/java/jdk1.8.0_144/bin/java -Xmx819m -server 
> -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN 
> -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator 
> -Dlog4j.configuration=tez-container-log4j.properties 
> -Dyarn.app.container.log.dir=/data0/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1590450092775_0009/container_1590450092775_0009_01_03
>  -Dtez.root.logger=INFO,CLA 
> -Djava.io.tmpdir=/data0/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/testdata/cluster/cdh7/node-1/var/lib/hadoop-yarn/cache/jenkins/nm-local-dir/usercache/jenkins/appcache/application_1590450092775_0009/container_1590450092775_0009_01_03/tmp
>  org.apache.tez.runtime.task.TezChild localhost 43422 
> container_1590450092775_0009_01_03 application_1590450092775_0009 1
> [2020-05-25 16:49:18.884]Container killed on request. Exit code is 143
> [2020-05-25 16:49:18.887]Container exited with a non-zero exit code 143.
> ]], TaskAttempt 1 failed, info=[Error: Error while running task ( failure ) : 
> java.lang.OutOfMemoryError: Java heap space
> at java.nio.HeapByteBuffer.(HeapByteBuffer.java:57)
> at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
> at 
> org.apache.hadoop.io.ElasticByteBufferPool.getBuffer(ElasticByteBufferPool.java:96)
> at 
> org.apache.hadoop.hdfs.DFSStripedOutputStre

[jira] [Resolved] (IMPALA-9777) Reduce the diskspace requirements of loading the text version of tpcds.store_sales

2020-06-01 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-9777.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

Fixed. Looks like Impala-EC data loading is passing now.

> Reduce the diskspace requirements of loading the text version of 
> tpcds.store_sales
> --
>
> Key: IMPALA-9777
> URL: https://issues.apache.org/jira/browse/IMPALA-9777
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Affects Versions: Impala 4.0
>Reporter: Joe McDonnell
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 4.0
>
> Attachments: namenodeparse.py
>
>
> Currently, dataload for the Impala development environment uses Hive to 
> populate tpcds.store_sales. We use several insert statements that select from 
> tpcds.store_sales_unpartitioned, which is loaded from text files. The 
> inserts have this form:
> {noformat}
> insert overwrite table {table_name} partition(ss_sold_date_sk)
> select ss_sold_time_sk,
>   ss_item_sk,
>   ss_customer_sk,
>   ss_cdemo_sk,
>   ss_hdemo_sk,
>   ss_addr_sk,
>   ss_store_sk,
>   ss_promo_sk,
>   ss_ticket_number,
>   ss_quantity,
>   ss_wholesale_cost,
>   ss_list_price,
>   ss_sales_price,
>   ss_ext_discount_amt,
>   ss_ext_sales_price,
>   ss_ext_wholesale_cost,
>   ss_ext_list_price,
>   ss_ext_tax,
>   ss_coupon_amt,
>   ss_net_paid,
>   ss_net_paid_inc_tax,
>   ss_net_profit,
>   ss_sold_date_sk
> from store_sales_unpartitioned
> WHERE ss_sold_date_sk < 2451272
> distribute by ss_sold_date_sk;{noformat}
> Since this is inserting into a partitioned table, it is creating a file per 
> partition. Each statement manipulates hundreds of partitions. With the 
> current settings, the Hive implementation of this insert opens several 
> hundred files simultaneously (by my measurement, ~450). HDFS reserves a whole 
> block for each file (even though the resulting files are not large), and if 
> there isn't enough disk space for all of the reservations, then these inserts 
> can fail. This is a common problem on development environments. This is 
> currently failing for erasure coding tests.
> Impala uses clustered inserts where the input is sorted and files are written 
> one at a time (per backend). This limits the number of simultaneously open 
> files, eliminating the corresponding disk space reservation. Switching 
> populating tpcds.store_sales to use Impala would reduce the diskspace 
> requirement for an Impala developer environment. Alternatively, there is 
> likely equivalent Hive functionality for doing an initial sort so that only 
> one partition needs to be written at a time.
> This only applies to the text version of store_sales, which is created from 
> store_sales_unpartitioned. All other formats are created from the text 
> version of store_sales. Since the text store_sales is already partitioned in 
> the same way as the destination store_sales, Hive can be more efficient, 
> processing a small number of partitions at a time.
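
The disk-reservation arithmetic behind this failure can be sketched roughly as follows. This is a back-of-envelope illustration, not a figure from the report: the 128 MiB block size and the 5-cell RS-3-2 stripe width are assumed typical values, and `reserved_mb` is a hypothetical helper name.

```python
# Back-of-envelope illustration of the reservation problem described above:
# HDFS reserves a whole block per open file, so hundreds of simultaneously
# open partition files reserve far more space than the small files they
# eventually hold. BLOCK_SIZE_MB and the RS-3-2 stripe width are assumed
# typical values, not figures from the report.

BLOCK_SIZE_MB = 128   # assumed dfs.blocksize of 128 MiB
OPEN_FILES = 450      # simultaneously open files, per the measurement above

def reserved_mb(open_files, block_size_mb, stripe_cells=1):
    """Space reserved while files are open: one full block (or one EC
    stripe group of `stripe_cells` cells) per open file."""
    return open_files * block_size_mb * stripe_cells

print(reserved_mb(OPEN_FILES, BLOCK_SIZE_MB))                  # 57600 MB, ~56 GB
print(reserved_mb(OPEN_FILES, BLOCK_SIZE_MB, stripe_cells=5))  # 288000 MB under RS-3-2
```

With numbers like these, writing partitions one at a time (so only a handful of files are open per backend) shrinks the peak reservation by two orders of magnitude.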



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (IMPALA-9757) Test failures with HiveServer2Error: Invalid session id

2020-06-01 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-9757.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Test failures with HiveServer2Error: Invalid session id
> ---
>
> Key: IMPALA-9757
> URL: https://issues.apache.org/jira/browse/IMPALA-9757
> Project: IMPALA
>  Issue Type: Test
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Critical
>  Labels: broken-build, flaky
> Fix For: Impala 4.0
>
>
> Only seen once so far on an exhaustive build. It's not clear if the 
> "HiveServer2Error: Invalid session id" error is specific to this test or not.
> {code:java}
> query_test.test_queries.TestQueries.test_inline_view[protocol: hs2-http | 
> exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> parquet/none] (from pytest)
> Error Message
> query_test/test_queries.py:104: in test_inline_view 
> self.run_test_case('QueryTest/inline-view', vector) 
> common/impala_test_suite.py:567: in run_test_case table_format_info, 
> use_db, pytest.config.option.scale_factor) common/impala_test_suite.py:782: 
> in change_database impala_client.execute(query) 
> common/impala_connection.py:331: in execute handle = 
> self.execute_async(sql_stmt, user) common/impala_connection.py:354: in 
> execute_async self.__cursor.execute_async(sql_stmt, 
> configuration=self.__query_options) 
> /data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/infra/python/env/lib/python2.7/site-packages/impala/hiveserver2.py:375:
>  in execute_async self._execute_async(op) 
> /data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/infra/python/env/lib/python2.7/site-packages/impala/hiveserver2.py:394:
>  in _execute_async operation_fn() 
> /data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/infra/python/env/lib/python2.7/site-packages/impala/hiveserver2.py:372:
>  in op run_async=True) 
> /data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/infra/python/env/lib/python2.7/site-packages/impala/hiveserver2.py:1096:
>  in execute return self._operation('ExecuteStatement', req) 
> /data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/infra/python/env/lib/python2.7/site-packages/impala/hiveserver2.py:1026:
>  in _operation resp = self._rpc(kind, request) 
> /data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/infra/python/env/lib/python2.7/site-packages/impala/hiveserver2.py:994:
>  in _rpc err_if_rpc_not_ok(response) 
> /data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/infra/python/env/lib/python2.7/site-packages/impala/hiveserver2.py:748:
>  in err_if_rpc_not_ok raise HiveServer2Error(resp.status.errorMessage) E  
>  HiveServer2Error: Invalid session id: 3345279d9b2e75ab:3aef93f7a80d7d8a
> Stacktrace
> query_test/test_queries.py:104: in test_inline_view
> self.run_test_case('QueryTest/inline-view', vector)
> common/impala_test_suite.py:567: in run_test_case
> table_format_info, use_db, pytest.config.option.scale_factor)
> common/impala_test_suite.py:782: in change_database
> impala_client.execute(query)
> common/impala_connection.py:331: in execute
> handle = self.execute_async(sql_stmt, user)
> common/impala_connection.py:354: in execute_async
> self.__cursor.execute_async(sql_stmt, configuration=self.__query_options)
> /data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/infra/python/env/lib/python2.7/site-packages/impala/hiveserver2.py:375:
>  in execute_async
> self._execute_async(op)
> /data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/infra/python/env/lib/python2.7/site-packages/impala/hiveserver2.py:394:
>  in _execute_async
> operation_fn()
> /data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/infra/python/env/lib/python2.7/site-packages/impala/hiveserver2.py:372:
>  in op
> run_async=True)
> /data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/infra/python/env/lib/python2.7/site-packages/impala/hiveserver2.py:1096:
>  in execute
> return self._operation('ExecuteStatement', req)
> /data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/infra/python/env/lib/python2.7/site-packages/impala/hiveserver2.py:1026:
>  in _operation
> resp = self._rpc(kind, request)
> /data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/infra/python/env/lib/python2.7/site-packages/impala/hiveserver2.py:994:
>  in _rpc
> err_if_rpc_not_ok(response)
> /data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/infra/python/env/lib/python2.7/site-packages/impal

[jira] [Resolved] (IMPALA-9806) Multiple data load failures on HDFS errors for erasure coding builds

2020-06-01 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-9806.
--
Resolution: Duplicate

Closing as dup of IMPALA-9794 and IMPALA-9777

> Multiple data load failures on HDFS errors for erasure coding builds
> 
>
> Key: IMPALA-9806
> URL: https://issues.apache.org/jira/browse/IMPALA-9806
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 4.0
>Reporter: Laszlo Gaal
>Priority: Blocker
>
> Erasure coding build shows data load failures for TPC-H, TPC-DS and 
> functional-query data sets, all on HDFS errors. Errors are triggered both 
> from Hive and Impala. Pasting the failure log section for TPC-H as it is a 
> lot shorter, but the Java backtrace for functional-query (breaking in 
> Hive/Tez) eventually runs into the same HDFS log pattern:
> {code}
> INSERT OVERWRITE TABLE tpch_parquet.region SELECT * FROM tpch.region
> Summary: Inserted 5 rows
> Success: True
> Took: 0.264951944351(s)
> Data:
> : 5
> ERROR: INSERT OVERWRITE TABLE tpch_parquet.orders SELECT * FROM tpch.orders
> Traceback (most recent call last):
>   File 
> "/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/bin/load-data.py",
>  line 208, in exec_impala_query_from_file
> result = impala_client.execute(query)
>   File 
> "/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/tests/beeswax/impala_beeswax.py",
>  line 187, in execute
> handle = self.__execute_query(query_string.strip(), user=user)
>   File 
> "/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/tests/beeswax/impala_beeswax.py",
>  line 365, in __execute_query
> self.wait_for_finished(handle)
>   File 
> "/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/tests/beeswax/impala_beeswax.py",
>  line 386, in wait_for_finished
> raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
> ImpalaBeeswaxException: ImpalaBeeswaxException:
>  Query aborted:Failed to write data (length: 159515) to Hdfs file: 
> hdfs://localhost:20500/test-warehouse/tpch.orders_parquet/_impala_insert_staging/7c411965970f926e_f61b13b7/.7c411965970f926e-f61b13b7_2077531399_dir/7c411965970f926e-f61b13b7_1445532249_data.0.parq
>  
> Error(255): Unknown error 255
> Root cause: RemoteException: File 
> /test-warehouse/tpch.orders_parquet/_impala_insert_staging/7c411965970f926e_f61b13b7/.7c411965970f926e-f61b13b7_2077531399_dir/7c411965970f926e-f61b13b7_1445532249_data.0.parq
>  could only be written to 0 of the 3 required nodes for RS-3-2-1024k. There 
> are 5 datanode(s) running and 5 node(s) are excluded in this operation.
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2266)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2773)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:879)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:583)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:985)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:913)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2882)
> Failed to close HDFS file: 
> hdfs://localhost:20500/test-warehouse/tpch.orders_parquet/_impala_insert_staging/7c411965970f926e_f61b13b7/.7c411965970f926e-f61b13b7_2077531399_dir/7c411965970f926e-f61b13b7_1445532249_data.0.parq
> Error(255): Unknown error 255
> Root cause: RemoteException: File 
> /test-warehouse/tpch.orders_parquet/_impala_insert_staging/7c411965970f926e_f61b13b7/.7c411965970f926e-f61b13b7_2077531399_dir/7c411965970f926e-f61b13b7_1445532249_data.0.parq
>  could only be written to 0 of the 3 required nodes for RS-3-2-1024k. There 
> are 5 datanode(s) running and 5 node(s) ar
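
The placement check behind the "could only be written to 0 of the 3 required nodes for RS-3-2-1024k" error above can be sketched like this. It is a simplified model, not NameNode code: `can_allocate_ec_block` is a hypothetical name, and the only assumption taken from the log is that an RS(3,2) write needs at least 3 eligible datanodes.

```python
# Simplified model of the erasure-coded block placement failure above:
# an RS(k=3, m=2) write needs at least k datanodes willing to accept
# stripe cells. Once every running node is excluded (e.g. for lack of
# disk space), zero eligible nodes remain and the write fails.

def can_allocate_ec_block(running_nodes, excluded_nodes, data_units=3):
    """Return (ok, usable): whether an EC stripe group can be placed,
    and how many datanodes are still eligible to host its cells."""
    usable = len(running_nodes - excluded_nodes)
    return usable >= data_units, usable

nodes = {"dn1", "dn2", "dn3", "dn4", "dn5"}   # "5 datanode(s) running"
print(can_allocate_ec_block(nodes, excluded_nodes=nodes))  # all 5 excluded: (False, 0)
print(can_allocate_ec_block(nodes, excluded_nodes=set()))  # nothing excluded: (True, 5)
```

This is why the fix was about reducing simultaneous reservations (IMPALA-9777) rather than adding datanodes: the nodes existed but were all excluded.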

[jira] [Created] (IMPALA-9767) ASAN crash during coordinator runtime filter updates

2020-05-20 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-9767:


 Summary: ASAN crash during coordinator runtime filter updates
 Key: IMPALA-9767
 URL: https://issues.apache.org/jira/browse/IMPALA-9767
 Project: IMPALA
  Issue Type: Test
Reporter: Sahil Takiar
Assignee: Fang-Yu Rao


ASAN crash output:
{code:java}
Error Message
Address Sanitizer message detected in 
/data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/ee_tests/impalad.ERROR
Standard Error
==4808==ERROR: AddressSanitizer: heap-use-after-free on address 
0x7f6288cbe818 at pc 0x0199f6fe bp 0x7f63c1a8b270 sp 0x7f63c1a8aa20
READ of size 1048576 at 0x7f6288cbe818 thread T73 (rpc reactor-552)
#0 0x199f6fd in read_iovec(void*, __sanitizer::__sanitizer_iovec*, unsigned 
long, unsigned long) 
/mnt/source/llvm/llvm-5.0.1.src-p2/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:904
#1 0x19a1f57 in read_msghdr(void*, __sanitizer::__sanitizer_msghdr*, long) 
/mnt/source/llvm/llvm-5.0.1.src-p2/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:2781
#2 0x19a46c3 in __interceptor_sendmsg 
/mnt/source/llvm/llvm-5.0.1.src-p2/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:2796
#3 0x372034d in kudu::Socket::Writev(iovec const*, int, long*) 
/data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/kudu/util/net/socket.cc:447:3
#4 0x331c095 in kudu::rpc::OutboundTransfer::SendBuffer(kudu::Socket&) 
/data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/kudu/rpc/transfer.cc:227:26
#5 0x3324da1 in kudu::rpc::Connection::WriteHandler(ev::io&, int) 
/data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/kudu/rpc/connection.cc:802:31
#6 0x52ca4e2 in ev_invoke_pending 
(/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/service/impalad+0x52ca4e2)
#7 0x32aeadc in kudu::rpc::ReactorThread::InvokePendingCb(ev_loop*) 
/data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/kudu/rpc/reactor.cc:196:3
#8 0x52cdb03 in ev_run 
(/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/service/impalad+0x52cdb03)
#9 0x32aecd1 in kudu::rpc::ReactorThread::RunThread() 
/data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/kudu/rpc/reactor.cc:497:9
#10 0x32c08db in boost::_bi::bind_t, 
boost::_bi::list1 > 
>::operator()() 
/data/jenkins/workspace/impala-asf-master-core-asan/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/bind.hpp:1222:16
#11 0x2148c26 in boost::function0::operator()() const 
/data/jenkins/workspace/impala-asf-master-core-asan/Impala-Toolchain/boost-1.61.0-p2/include/boost/function/function_template.hpp:770:14
#12 0x2144b29 in kudu::Thread::SuperviseThread(void*) 
/data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/kudu/util/thread.cc:675:3
#13 0x7f6c0bcf4e24 in start_thread (/lib64/libpthread.so.0+0x7e24)
#14 0x7f6c0885834c in __clone (/lib64/libc.so.6+0xf834c)

0x7f6288cbe818 is located 24 bytes inside of 1052640-byte region 
[0x7f6288cbe800,0x7f6288dbf7e0)
freed by thread T114 here:
#0 0x1a773e0 in operator delete(void*) 
/mnt/source/llvm/llvm-5.0.1.src-p2/projects/compiler-rt/lib/asan/asan_new_delete.cc:137
#1 0x7f6c090faed3 in __gnu_cxx::new_allocator::deallocate(char*, 
unsigned long) 
/mnt/source/gcc/build-4.9.2/x86_64-unknown-linux-gnu/libstdc++-v3/include/ext/new_allocator.h:110
#2 0x7f6c090faed3 in std::string::_Rep::_M_destroy(std::allocator 
const&) 
/mnt/source/gcc/build-4.9.2/x86_64-unknown-linux-gnu/libstdc++-v3/include/bits/basic_string.tcc:449
#3 0x7f6c090faed3 in std::string::_Rep::_M_dispose(std::allocator 
const&) 
/mnt/source/gcc/build-4.9.2/x86_64-unknown-linux-gnu/libstdc++-v3/include/bits/basic_string.h:249
#4 0x7f6c090faed3 in std::string::reserve(unsigned long) 
/mnt/source/gcc/build-4.9.2/x86_64-unknown-linux-gnu/libstdc++-v3/include/bits/basic_string.tcc:511
#5 0x2781865 in 
impala::ClientRequestState::UpdateFilter(impala::UpdateFilterParamsPB const&, 
kudu::rpc::RpcContext*) 
/data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/service/client-request-state.cc:1451:11
#6 0x26d57d5 in 
impala::ImpalaServer::UpdateFilter(impala::UpdateFilterResultPB*, 
impala::UpdateFilterParamsPB const&, kudu::rpc::RpcContext*) 
/data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/service/impala-server.cc:2694:19
#7 0x266bd65 in 
impala::DataStreamService::UpdateFilter(impala::UpdateFilterParamsPB const*, 
impala::UpdateFilterResultPB*, kudu::rpc::RpcContext*) 
/data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/service/data-stream-service.cc:119:44
#8 0x27a1eed in std::_Function_handler
 const&, scoped_refptr 
const&)::$_5>::_M_invok

[jira] [Resolved] (IMPALA-9755) Flaky test: test_global_exchange_counters

2020-05-20 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-9755.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Flaky test: test_global_exchange_counters
> -
>
> Key: IMPALA-9755
> URL: https://issues.apache.org/jira/browse/IMPALA-9755
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Reporter: Tim Armstrong
>Assignee: Sahil Takiar
>Priority: Blocker
>  Labels: flaky
> Fix For: Impala 4.0
>
>
> {noformat}
> query_test.test_observability.TestObservability.test_global_exchange_counters 
> (from pytest)
> Failing for the past 1 build (Since Failed#10637 )
> Took 22 sec.
> Error Message
> query_test/test_observability.py:504: in test_global_exchange_counters 
> assert m, "Cannot match pattern for key %s in line '%s'" % (key, line) E   
> AssertionError: Cannot match pattern for key TotalBytesSent in line ' 
>   - TotalBytesSent: 0' E   assert None
> Stacktrace
> query_test/test_observability.py:504: in test_global_exchange_counters
> assert m, "Cannot match pattern for key %s in line '%s'" % (key, line)
> E   AssertionError: Cannot match pattern for key TotalBytesSent in line ' 
>   - TotalBytesSent: 0'
> E   assert None
> {noformat}
> https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/10637/testReport/junit/query_test.test_observability/TestObservability/test_global_exchange_counters/
> Filing in case it reoccurs.
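
The failure mode is that the counter-matching pattern expects a formatted byte value with a unit suffix, which a bare `0` never carries. A hypothetical reconstruction (the real pattern lives in `test_observability.py` and may differ):

```python
import re

# Hypothetical reconstruction of the flaky assertion: a pattern that
# requires a unit suffix, e.g. "TotalBytesSent: 28.67 KB (29355)", cannot
# match the bare "0" the profile emits when nothing was sent over the
# exchange. PATTERN is an assumed shape, not the test's actual regex.

PATTERN = r"TotalBytesSent: \d+(\.\d+)? [KMG]?B"

print(bool(re.search(PATTERN, "- TotalBytesSent: 28.67 KB (29355)")))  # True
print(bool(re.search(PATTERN, "- TotalBytesSent: 0")))                 # False: the flaky case
```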





[jira] [Resolved] (IMPALA-9534) Kudu show create table tests fail due to case difference for external.table.purge

2020-05-19 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-9534.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

This was fixed a while ago by a bug fix on the Hive side.

> Kudu show create table tests fail due to case difference for 
> external.table.purge
> -
>
> Key: IMPALA-9534
> URL: https://issues.apache.org/jira/browse/IMPALA-9534
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 4.0
>Reporter: Joe McDonnell
>Assignee: Sahil Takiar
>Priority: Blocker
>  Labels: broken-build
> Fix For: Impala 4.0
>
>
> When updating to the latest CDP GBN, there are test failures due to our tests 
> expecting external.table.purge=TRUE (upper case) whereas it is actually 
> external.table.purge=true (lower case):
>  
> {noformat}
> query_test/test_kudu.py:862: in test_primary_key_and_distribution
> db=cursor.conn.db_name, kudu_addr=KUDU_MASTER_HOSTS))
> query_test/test_kudu.py:836: in assert_show_create_equals
> assert "TBLPROPERTIES ('external.table.purge'='TRUE', " in output
> E   assert "TBLPROPERTIES ('external.table.purge'='TRUE', " in "CREATE 
> EXTERNAL TABLE testshowcreatetable_6928_i0obd1.jlxsrpzmcu (\n  c INT NOT NULL 
> ENCODING AUTO_ENCODING COMPRESSI...H (c) PARTITIONS 3\nSTORED AS 
> KUDU\nTBLPROPERTIES ('external.table.purge'='true', 
> 'kudu.master_addresses'='localhost')"{noformat}
> This impacts the following tests:
>  
>  
> {noformat}
> metadata.test_ddl.TestDdlStatements.test_create_alter_tbl_properties
> metadata.test_show_create_table.TestShowCreateTable.test_show_create_table
> query_test.test_kudu.TestShowCreateTable.test_primary_key_and_distribution
> query_test.test_kudu.TestShowCreateTable.test_timestamp_default_value
> query_test.test_kudu.TestShowCreateTable.test_managed_kudu_table_name_with_show_create
> org.apache.impala.catalog.local.LocalCatalogTest.testKuduTable{noformat}
> I think we can just make these case insensitive.
>  
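
A case-insensitive comparison along the lines suggested above could look like this. This is a sketch of the suggestion, not the committed fix, and `has_purge_property` is a hypothetical helper name:

```python
# Sketch of a case-insensitive check for the external.table.purge
# property: lowercase the SHOW CREATE TABLE output before comparing,
# so 'TRUE' (old expectation) and 'true' (what Hive now writes) both pass.

def has_purge_property(show_create_output):
    return "'external.table.purge'='true'" in show_create_output.lower()

print(has_purge_property("TBLPROPERTIES ('external.table.purge'='true', ...)"))  # True
print(has_purge_property("TBLPROPERTIES ('external.table.purge'='TRUE', ...)"))  # True
```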





[jira] [Resolved] (IMPALA-9608) Multiple query tests failure due to org.apache.hadoop.hive.ql.exec.tez.TezTask execution error

2020-05-19 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-9608.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

Resolved since IMPALA-9365 disabled all these tests on non-HDFS filesystems.

> Multiple query tests failure due to 
> org.apache.hadoop.hive.ql.exec.tez.TezTask execution error
> --
>
> Key: IMPALA-9608
> URL: https://issues.apache.org/jira/browse/IMPALA-9608
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.4.0
>Reporter: Alice Fan
>Priority: Blocker
>  Labels: broken-build
> Fix For: Impala 4.0
>
>
> Multiple query tests failure due to 
> org.apache.hadoop.hive.ql.exec.tez.TezTask execution error
> at impala-cdpd-master-core-s3 build
> {code:java}
> query_test.test_acid.TestAcid.test_acid_negative
> query_test.test_mt_dop.TestMtDop.test_compute_stats
> query_test.test_nested_types.TestNestedTypesNoMtDop.test_partitioned_table_acid
> query_test.test_mt_dop.TestMtDop.test_compute_stats
> query_test.test_scanners.TestUnmatchedSchema.test_unmatched_schema
> query_test.test_mt_dop.TestMtDop.test_compute_stats
> query_test.test_scanners_fuzz.TestScannersFuzzing.test_fuzz_decimal_tbl
> query_test.test_scanners_fuzz.TestScannersFuzzing.test_fuzz_uncompressed_parquet_orc
> query_test.test_mt_dop.TestMtDop.test_compute_stats
> query_test.test_scanners_fuzz.TestScannersFuzzing.test_fuzz_decimal_tbl
> query_test.test_scanners_fuzz.TestScannersFuzzing.test_fuzz_uncompressed_parquet_orc
> query_test.test_scanners_fuzz.TestScannersFuzzing.test_fuzz_alltypes
> query_test.test_scanners_fuzz.TestScannersFuzzing.test_fuzz_decimal_tbl
> query_test.test_scanners_fuzz.TestScannersFuzzing.test_fuzz_nested_types
> query_test.test_scanners_fuzz.TestScannersFuzzing.test_fuzz_alltypes
> query_test.test_scanners_fuzz.TestScannersFuzzing.test_fuzz_nested_types
> query_test.test_scanners_fuzz.TestScannersFuzzing.test_fuzz_alltypes
> query_test.test_scanners_fuzz.TestScannersFuzzing.test_fuzz_nested_types
> query_test.test_scanners_fuzz.TestScannersFuzzing.test_fuzz_uncompressed_parquet_orc
> {code}
> For example:
> Error Message
> query_test/test_acid.py:65: in test_acid_negative 
> self.run_test_case('QueryTest/acid-negative', vector, use_db=unique_database) 
> common/impala_test_suite.py:659: in run_test_case result = exec_fn(query, 
> user=test_section.get('USER', '').strip() or None) 
> common/impala_test_suite.py:610: in __exec_in_hive result = 
> h.execute(query, user=user) common/impala_connection.py:334: in execute r 
> = self.__fetch_results(handle, profile_format=profile_format) 
> common/impala_connection.py:441: in __fetch_results 
> cursor._wait_to_finish() 
> /data/jenkins/workspace/impala-cdpd-master-core-s3/repos/Impala/infra/python/env/lib/python2.7/site-packages/impala/hiveserver2.py:412:
>  in _wait_to_finish raise OperationalError(resp.errorMessage) E   
> OperationalError: Error while compiling statement: FAILED: Execution Error, 
> return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask
> Stacktrace
> query_test/test_acid.py:65: in test_acid_negative
> self.run_test_case('QueryTest/acid-negative', vector, 
> use_db=unique_database)
> common/impala_test_suite.py:659: in run_test_case
> result = exec_fn(query, user=test_section.get('USER', '').strip() or None)
> common/impala_test_suite.py:610: in __exec_in_hive
> result = h.execute(query, user=user)
> common/impala_connection.py:334: in execute
> r = self.__fetch_results(handle, profile_format=profile_format)
> common/impala_connection.py:441: in __fetch_results
> cursor._wait_to_finish()
> /data/jenkins/workspace/impala-cdpd-master-core-s3/repos/Impala/infra/python/env/lib/python2.7/site-packages/impala/hiveserver2.py:412:
>  in _wait_to_finish
> raise OperationalError(resp.errorMessage)
> E   OperationalError: Error while compiling statement: FAILED: Execution 
> Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask





[jira] [Resolved] (IMPALA-9758) TestImpalaShell.test_summary consistently failing

2020-05-18 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-9758.
--
Resolution: Duplicate

> TestImpalaShell.test_summary consistently failing
> -
>
> Key: IMPALA-9758
> URL: https://issues.apache.org/jira/browse/IMPALA-9758
> Project: IMPALA
>  Issue Type: Test
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Tim Armstrong
>Priority: Major
>
> TestImpalaShell.test_summary[table_format_and_file_extension: ('textfile', 
> '.txt') | protocol: beeswax] is consistently failing:
> {code:java}
> shell.test_shell_commandline.TestImpalaShell.test_summary[table_format_and_file_extension:
>  ('textfile', '.txt') | protocol: beeswax] (from pytest)
> Error Message
> /data/jenkins/workspace/impala-cdpd-master-core/repos/Impala/tests/shell/test_shell_commandline.py:345:
>  in test_summary result_set = run_impala_shell_cmd(vector, args) 
> shell/util.py:172: in run_impala_shell_cmd result.stderr) E   
> AssertionError: Cmd ['-q', 'show tables; summary;'] was expected to succeed: 
> Server version: impalad version 4.0.0-SNAPSHOT DEBUG (build 
> 03391ec2b4649f02307a4a89a504bc8394007158) E   Query: show tables E   Fetched 
> 3 row(s) in 0.02s E   ERROR: Query id 544943184e4d6a8f:8cdea0fe not 
> found. EE   Could not execute command: summary
> Stacktrace
> /data/jenkins/workspace/impala-cdpd-master-core/repos/Impala/tests/shell/test_shell_commandline.py:345:
>  in test_summary
> result_set = run_impala_shell_cmd(vector, args)
> shell/util.py:172: in run_impala_shell_cmd
> result.stderr)
> E   AssertionError: Cmd ['-q', 'show tables; summary;'] was expected to 
> succeed: Server version: impalad version 4.0.0-SNAPSHOT DEBUG (build 
> 03391ec2b4649f02307a4a89a504bc8394007158)
> E   Query: show tables
> E   Fetched 3 row(s) in 0.02s
> E   ERROR: Query id 544943184e4d6a8f:8cdea0fe not found.
> E   
> E   Could not execute command: summary{code}





[jira] [Created] (IMPALA-9758) TestImpalaShell.test_summary consistently failing

2020-05-18 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-9758:


 Summary: TestImpalaShell.test_summary consistently failing
 Key: IMPALA-9758
 URL: https://issues.apache.org/jira/browse/IMPALA-9758
 Project: IMPALA
  Issue Type: Test
  Components: Backend
Reporter: Sahil Takiar
Assignee: Tim Armstrong


TestImpalaShell.test_summary[table_format_and_file_extension: ('textfile', 
'.txt') | protocol: beeswax] is consistently failing:
{code:java}
shell.test_shell_commandline.TestImpalaShell.test_summary[table_format_and_file_extension:
 ('textfile', '.txt') | protocol: beeswax] (from pytest)

Error Message

/data/jenkins/workspace/impala-cdpd-master-core/repos/Impala/tests/shell/test_shell_commandline.py:345:
 in test_summary result_set = run_impala_shell_cmd(vector, args) 
shell/util.py:172: in run_impala_shell_cmd result.stderr) E   
AssertionError: Cmd ['-q', 'show tables; summary;'] was expected to succeed: 
Server version: impalad version 4.0.0-SNAPSHOT DEBUG (build 
03391ec2b4649f02307a4a89a504bc8394007158) E   Query: show tables E   Fetched 3 
row(s) in 0.02s E   ERROR: Query id 544943184e4d6a8f:8cdea0fe not 
found. EE   Could not execute command: summary

Stacktrace

/data/jenkins/workspace/impala-cdpd-master-core/repos/Impala/tests/shell/test_shell_commandline.py:345:
 in test_summary
result_set = run_impala_shell_cmd(vector, args)
shell/util.py:172: in run_impala_shell_cmd
result.stderr)
E   AssertionError: Cmd ['-q', 'show tables; summary;'] was expected to 
succeed: Server version: impalad version 4.0.0-SNAPSHOT DEBUG (build 
03391ec2b4649f02307a4a89a504bc8394007158)
E   Query: show tables
E   Fetched 3 row(s) in 0.02s
E   ERROR: Query id 544943184e4d6a8f:8cdea0fe not found.
E   
E   Could not execute command: summary{code}





[jira] [Created] (IMPALA-9757) TestQueries.test_inline_view fails with HiveServer2Error: Invalid session id

2020-05-18 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-9757:


 Summary: TestQueries.test_inline_view fails with HiveServer2Error: 
Invalid session id
 Key: IMPALA-9757
 URL: https://issues.apache.org/jira/browse/IMPALA-9757
 Project: IMPALA
  Issue Type: Test
Reporter: Sahil Takiar


Only seen once so far on an exhaustive build. It's not clear if the 
"HiveServer2Error: Invalid session id" error is specific to this test or not.
{code:java}
query_test.test_queries.TestQueries.test_inline_view[protocol: hs2-http | 
exec_option: {'batch_size': 0, 'num_nodes': 0, 
'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
parquet/none] (from pytest)

Error Message

HiveServer2Error: Invalid session id: 3345279d9b2e75ab:3aef93f7a80d7d8a

Stacktrace

query_test/test_queries.py:104: in test_inline_view
self.run_test_case('QueryTest/inline-view', vector)
common/impala_test_suite.py:567: in run_test_case
table_format_info, use_db, pytest.config.option.scale_factor)
common/impala_test_suite.py:782: in change_database
impala_client.execute(query)
common/impala_connection.py:331: in execute
handle = self.execute_async(sql_stmt, user)
common/impala_connection.py:354: in execute_async
self.__cursor.execute_async(sql_stmt, configuration=self.__query_options)
/data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/infra/python/env/lib/python2.7/site-packages/impala/hiveserver2.py:375:
 in execute_async
self._execute_async(op)
/data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/infra/python/env/lib/python2.7/site-packages/impala/hiveserver2.py:394:
 in _execute_async
operation_fn()
/data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/infra/python/env/lib/python2.7/site-packages/impala/hiveserver2.py:372:
 in op
run_async=True)
/data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/infra/python/env/lib/python2.7/site-packages/impala/hiveserver2.py:1096:
 in execute
return self._operation('ExecuteStatement', req)
/data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/infra/python/env/lib/python2.7/site-packages/impala/hiveserver2.py:1026:
 in _operation
resp = self._rpc(kind, request)
/data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/infra/python/env/lib/python2.7/site-packages/impala/hiveserver2.py:994:
 in _rpc
err_if_rpc_not_ok(response)
/data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/infra/python/env/lib/python2.7/site-packages/impala/hiveserver2.py:748:
 in err_if_rpc_not_ok
raise HiveServer2Error(resp.status.errorMessage)
E   HiveServer2Error: Invalid session id: 3345279d9b2e75ab:3aef93f7a80d7d8a 
{code}
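The failure above is raised by the client's RPC status check (err_if_rpc_not_ok in impyla's hiveserver2.py). As a minimal sketch of that pattern (hypothetical names and simplified status handling, not impyla's actual code):

```python
class HiveServer2Error(Exception):
    """Raised when a HiveServer2 RPC returns a non-OK status."""

# Thrift TStatusCode values: 0 = SUCCESS, 1 = SUCCESS_WITH_INFO are OK;
# anything else (e.g. 3 = ERROR for an invalid session id) is a failure.
_OK_STATUSES = {0, 1}

def err_if_rpc_not_ok(status_code, error_message):
    """Raise HiveServer2Error if the RPC response status is not OK.

    Sketches the check every client-side RPC goes through: a non-success
    status aborts the caller with the server's error message.
    """
    if status_code not in _OK_STATUSES:
        raise HiveServer2Error(error_message)
```

With this shape, `err_if_rpc_not_ok(3, "Invalid session id: ...")` surfaces exactly the kind of HiveServer2Error seen in the stacktrace.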



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (IMPALA-3717) Additional s3 setting to allow encryption algorithm

2020-05-18 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-3717.
--
Fix Version/s: Not Applicable
   Resolution: Fixed

> Additional s3 setting to allow encryption algorithm
> ---
>
> Key: IMPALA-3717
> URL: https://issues.apache.org/jira/browse/IMPALA-3717
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Infrastructure
>Affects Versions: Impala 2.6.0
>Reporter: Pavas Garg
>Priority: Minor
>  Labels: s3
> Fix For: Not Applicable
>
>
> distcp and Impala require an additional S3 setting in the configuration 
> 1. to allow not only the selection of the encryption algorithm, but 
> 2. also the master key name (which is held within AWS KMS).
> The S3 REST API has the following request header to achieve this: 
> "x-amz-server-side-encryption-aws-kms-key-id". 
> This should just be a case of adding the config option and passing it on to 
> the S3 call.
> Please see "Server-Side Encryption-Specific Request Headers" at
> http://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPUT.html.
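For illustration only (the helper name and key ARN below are hypothetical, not Impala configuration keys), the headers the issue refers to would be attached to an S3 PUT roughly like this:

```python
def sse_kms_headers(kms_key_id, algorithm="aws:kms"):
    """Build server-side-encryption request headers for an S3 PUT.

    x-amz-server-side-encryption selects the encryption algorithm;
    x-amz-server-side-encryption-aws-kms-key-id names the master key
    held within AWS KMS, as requested in this issue.
    """
    headers = {"x-amz-server-side-encryption": algorithm}
    if kms_key_id:
        headers["x-amz-server-side-encryption-aws-kms-key-id"] = kms_key_id
    return headers

# Hypothetical key id for illustration.
hdrs = sse_kms_headers("arn:aws:kms:us-east-1:111122223333:key/example-key")
```

Passing such a header map through to the underlying S3 PUT is the "just a case of adding the config option" step described above.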



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (IMPALA-2638) Retry queries that fail during scheduling

2020-05-15 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-2638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-2638.
--
Resolution: Duplicate

Closing as a duplicate because this use case is handled by Node Blacklisting 
(IMPALA-9299) and Transparent Query Retries (IMPALA-9124).

> Retry queries that fail during scheduling
> -
>
> Key: IMPALA-2638
> URL: https://issues.apache.org/jira/browse/IMPALA-2638
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Distributed Exec
>Affects Versions: Impala 2.3.0
>Reporter: Henry Robinson
>Assignee: Sahil Takiar
>Priority: Minor
>  Labels: scalability
>
> An important building block for node-decommissioning is the ability to retry 
> queries if they fail during scheduling for some recoverable reason (e.g. RPC 
> failed due to unreachable host, fragment could not be started due to memory 
> pressure). 
> To do this we can detect failures during {{Coordinator::Exec()}}, cancel the 
> running query and then re-start from somewhere in 
> {{QueryExecState::ExecQueryOrDmlRequest()}} - updating a local blacklist of 
> nodes so that we know to avoid those that have caused failures.
> There are some subtleties though:
> * Queries shouldn't be retried more than a small number of times, in case 
> they *cause* the outage (there might be a good way to figure that out at the 
> time)
> * If the query is restarted from the scheduling step (rather than completely 
> restarting), some care will have to be taken to ensure that none of the old 
> query's fragments that are being cancelled can affect the new query's 
> operation in any way (there are several ways to do this). 
> Eventually the failures will propagate to the rest of the cluster via the 
> statestore - this mechanism allows queries to recover and continue while the 
> statestore detects the failure. 
> This JIRA doesn't address restarting queries that have suffered failures 
> part-way through execution, because that's strictly harder and not (as) 
> needed for decommissioning.
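The retry-with-blacklist idea described above can be sketched as follows (hypothetical names; Impala's actual scheduler is C++ and far more involved):

```python
class SchedulingError(Exception):
    """Raised when scheduling fails on a specific host."""
    def __init__(self, failed_host):
        super(SchedulingError, self).__init__("failed on %s" % failed_host)
        self.failed_host = failed_host

def exec_with_retry(schedule_fn, hosts, max_retries=3):
    """Try to schedule a query, blacklisting hosts that cause failures.

    schedule_fn(available_hosts) returns a result or raises
    SchedulingError(failed_host). Retries are capped with max_retries so
    a query that itself causes the outage cannot loop forever, matching
    the first subtlety noted in the issue.
    """
    blacklist = set()
    for _ in range(max_retries + 1):
        available = [h for h in hosts if h not in blacklist]
        if not available:
            raise RuntimeError("no healthy hosts left to schedule on")
        try:
            return schedule_fn(available)
        except SchedulingError as e:
            blacklist.add(e.failed_host)  # avoid this node on the next attempt
    raise RuntimeError("query failed after %d retries" % max_retries)
```

The local blacklist bridges the window before the statestore propagates the failure to the rest of the cluster.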



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (IMPALA-9502) Avoid copying TExecRequest when retrying queries

2020-05-15 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-9502.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

This was fixed by the initial implementation in IMPALA-9199.

> Avoid copying TExecRequest when retrying queries
> 
>
> Key: IMPALA-9502
> URL: https://issues.apache.org/jira/browse/IMPALA-9502
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Priority: Major
> Fix For: Impala 4.0
>
>
> There are a few issues that occur when re-using a {{TExecRequest}} across 
> query retries. We should investigate if there is a way to work around those 
> issues so that the {{TExecRequest}} does not need to be copied when retrying 
> a query.
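The trade-off is analogous to the following sketch (a hypothetical Python stand-in for the C++ TExecRequest handling): sharing one request across retries instead of deep-copying it per attempt, which is safe only if nothing mutates the request after planning.

```python
import copy

class ExecRequest(object):
    """Stand-in for TExecRequest: treated as read-only once planned."""
    def __init__(self, stmt, plan):
        self.stmt = stmt
        self.plan = plan

def request_for_retry_copying(request):
    # Current workaround: every retry deep-copies the (potentially large)
    # request to avoid the issues caused by re-use.
    return copy.deepcopy(request)

def request_for_retry_shared(request):
    # Desired end state: retries reference the same immutable object,
    # avoiding the copy entirely.
    return request

req = ExecRequest("select 1", plan=list(range(1000)))
```

`request_for_retry_shared(req) is req` holds while the copying variant produces a full duplicate; the issue asks whether the shared form can be made safe.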



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

