[jira] [Commented] (IMPALA-6957) Include number of required threads in explain plan

2018-05-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-6957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472883#comment-16472883
 ] 

ASF subversion and git services commented on IMPALA-6957:
-

Commit e12ee485cf4c77203b144c053ee167509cc39374 in impala's branch 
refs/heads/master from [~tarmstr...@cloudera.com]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=e12ee48 ]

IMPALA-6957: calc thread resource requirement in planner

This only factors in fragment execution threads. E.g. this does *not*
try to account for the number of threads on the old Thrift RPC
code path if that is enabled.

This is loosely related to the old VCores estimate, but is different in
that it:
* Directly ties into the notion of required threads in
  ThreadResourceMgr.
* Is a strict upper bound on the number of such threads, rather than
  an estimate.

Does not include "optional" threads. ThreadResourceMgr in the backend
bounds the number of "optional" threads per impalad, so the number of
execution threads on a backend is limited by

  sum(required threads per query) +
  CpuInfo::num_cores() * FLAGS_num_threads_per_core

DCHECKs in the backend enforce that the calculation is correct. They
were actually hit in KuduScanNode because of races in thread
management that led to multiple "required" threads running. Now the
first thread in the multithreaded scans never exits, so it is always
safe for any of the other threads to exit early, which simplifies
the logic a lot.
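The bound above is simple enough to sanity-check numerically. A minimal
sketch, in Python purely for illustration (the function name and inputs are
hypothetical; the real calculation lives in the planner and ThreadResourceMgr):
{code:python}
import multiprocessing

def max_execution_threads(required_threads_per_query, num_threads_per_core=3):
    # required_threads_per_query: hypothetical per-query "required" counts,
    # as computed by the planner for the queries running on this backend.
    # num_threads_per_core mirrors the FLAGS_num_threads_per_core gflag
    # (the default used here is illustrative, not authoritative).
    optional_cap = multiprocessing.cpu_count() * num_threads_per_core
    return sum(required_threads_per_query) + optional_cap

# e.g. three queries with 4, 2 and 6 required threads on this backend:
print(max_execution_threads([4, 2, 6]))
{code}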

Testing:
Updated planner tests.

Ran core tests.

Change-Id: I982837ef883457fa4d2adc3bdbdc727353469140
Reviewed-on: http://gerrit.cloudera.org:8080/10256
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins 


> Include number of required threads in explain plan
> --
>
> Key: IMPALA-6957
> URL: https://issues.apache.org/jira/browse/IMPALA-6957
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Not Applicable
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
>  Labels: resource-management
>
> Impala has an internal notion of "required threads" to execute a fragment, 
> e.g. the fragment execution thread and the first scanner thread. It's 
> possible to compute the number of required threads per fragment instance 
> based on the plan.
> We should include this in the resource profile and expose it in the explain 
> plan. This could then be a step toward implementing something like 
> IMPALA-6035.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-5384) Simplify coordinator locking protocol

2018-05-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-5384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472882#comment-16472882
 ] 

ASF subversion and git services commented on IMPALA-5384:
-

Commit 6ca87e46736a1e591ed7d7d5fee05b4b4d2fbb50 in impala's branch 
refs/heads/master from [~dhecht]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=6ca87e4 ]

IMPALA-5384, part 2: Simplify Coordinator locking and clarify state

This is the final change to clarify and break up the Coordinator's lock.
The state machine for the coordinator is made explicit, distinguishing
between executing state and multiple terminal states. Logic to
transition into a terminal state is centralized in one location and
executes exactly once for each coordinator object.
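As a rough illustration of that pattern, here is a hedged Python sketch
(names invented, not Impala's actual classes): a single guarded check ensures
exactly one thread performs the executing-to-terminal transition and teardown:
{code:python}
import threading
from enum import Enum

class ExecState(Enum):
    EXECUTING = 0
    RETURNED_RESULTS = 1  # terminal: all rows returned (eos)
    CANCELLED = 2         # terminal: client requested cancellation
    ERROR = 3             # terminal: a backend reported an error

class CoordinatorSketch:
    def __init__(self):
        self._lock = threading.Lock()
        self._state = ExecState.EXECUTING

    def try_transition_to_terminal(self, new_state):
        """Returns True for exactly one caller, no matter how many race."""
        assert new_state is not ExecState.EXECUTING
        with self._lock:
            if self._state is not ExecState.EXECUTING:
                return False  # another thread already picked a terminal state
            self._state = new_state
        # Centralized teardown (cancel backends, release resources) goes
        # here, outside the lock, and runs exactly once per coordinator.
        return True
{code}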

Derived from a patch for IMPALA-5384 by Marcel Kornacker.

Testing:
- exhaustive functional tests
- stress test on minicluster with memory overcommitment. Verified from
  the logs that this exercises all these paths:
  - successful queries
  - client requested cancellation
  - error from exec FInstances RPC
  - error reported asynchronously via report status RPC
  - eos before backend execution completed

Change-Id: I1abdfd02163f9356c59d470fe1c64ebe012a9e8e
Reviewed-on: http://gerrit.cloudera.org:8080/10158
Reviewed-by: Dan Hecht 
Tested-by: Impala Public Jenkins 


> Simplify coordinator locking protocol
> -
>
> Key: IMPALA-5384
> URL: https://issues.apache.org/jira/browse/IMPALA-5384
> Project: IMPALA
>  Issue Type: Improvement
>Affects Versions: Impala 2.9.0
>Reporter: Marcel Kornacker
>Assignee: Dan Hecht
>Priority: Major
>
> The coordinator has a central lock (lock_) which is used very liberally to 
> synchronize state changes that don't need to be synchronized, creating a 
> concurrency bottleneck.
> Also, the coordinator contains a number of data structures related to INSERT 
> finalization that don't need to be part of and synchronized with the rest of 
> the coordinator state.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-6999) Upgrade to sqlparse 0.1.19 in Impala Shell

2018-05-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472879#comment-16472879
 ] 

ASF subversion and git services commented on IMPALA-6999:
-

Commit 417bc8c802bee7d789394570a671fddd9baa8fe2 in impala's branch 
refs/heads/2.x from [~fredyw]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=417bc8c ]

IMPALA-6999: Upgrade to sqlparse-0.1.19 for Impala shell

sqlparse-0.1.19 is the last version of sqlparse that supports Python
2.6.

Testing:
- Ran all end-to-end tests

Change-Id: Ide51ef3ac52d25a96b0fa832e29b6535197d23cb
Reviewed-on: http://gerrit.cloudera.org:8080/10354
Reviewed-by: David Knupp 
Tested-by: Impala Public Jenkins 


> Upgrade to sqlparse 0.1.19 in Impala Shell
> --
>
> Key: IMPALA-6999
> URL: https://issues.apache.org/jira/browse/IMPALA-6999
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Clients
>Reporter: Fredy Wijaya
>Assignee: Fredy Wijaya
>Priority: Minor
> Fix For: Impala 2.13.0, Impala 3.1.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-6966) Estimated Memory in Catalogd webpage is not sorted correctly

2018-05-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472880#comment-16472880
 ] 

ASF subversion and git services commented on IMPALA-6966:
-

Commit 7b8bd6a190cd3070527baf6507b58f03bc6ee2e5 in impala's branch 
refs/heads/2.x from stiga-huang
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=7b8bd6a ]

IMPALA-6966: sort table memory by size in catalogd web UI

This patch fixes the sorting order of the "Top-K Tables with Highest
Memory Requirements" table, in which the "Estimated Memory" column was
sorted as strings.

Values returned by the catalog server are changed from pretty-printed
strings to byte counts, so the web UI can sort and render them
correctly.
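The underlying problem is easy to demonstrate. A small illustrative Python
snippet (not Impala code) showing why lexicographic sorting misorders
pretty-printed sizes, and how raw byte counts sort correctly:
{code:python}
pretty = ["1.20 GB", "900.00 MB", "10.50 MB"]
print(sorted(pretty))  # ['1.20 GB', '10.50 MB', '900.00 MB'] -- wrong order

# Shipping byte counts lets the UI sort numerically and pretty-print late:
UNITS = {"B": 1, "KB": 2**10, "MB": 2**20, "GB": 2**30}

def to_bytes(s):
    value, unit = s.split()
    return float(value) * UNITS[unit]

print(sorted(pretty, key=to_bytes))  # ['10.50 MB', '900.00 MB', '1.20 GB']
{code}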

Change-Id: I60dc253f862f5fde6fa96147f114d8765bb31a85
Reviewed-on: http://gerrit.cloudera.org:8080/10292
Reviewed-by: Dimitris Tsirogiannis 
Tested-by: Impala Public Jenkins 


> Estimated Memory in Catalogd webpage is not sorted correctly
> 
>
> Key: IMPALA-6966
> URL: https://issues.apache.org/jira/browse/IMPALA-6966
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 3.0, Impala 2.12.0
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Major
>  Labels: newbie
> Fix For: Impala 2.13.0, Impala 3.1.0
>
> Attachments: Screen Shot 2018-05-03 at 9.38.45 PM.png
>
>
> The "Top-N Tables with Highest Memory Requirements" in Catalogd webpage 
> doesn't sort "Estimated Memory" correctly. In fact, it sorts them as strings 
> instead of size. This is confusing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-7019) Discard block locations and schedule as remote read with erasure coding

2018-05-11 Thread Tianyi Wang (JIRA)
Tianyi Wang created IMPALA-7019:
---

 Summary: Discard block locations and schedule as remote read with 
erasure coding
 Key: IMPALA-7019
 URL: https://issues.apache.org/jira/browse/IMPALA-7019
 Project: IMPALA
  Issue Type: Sub-task
  Components: Frontend
Affects Versions: Impala 3.1.0
Reporter: Tianyi Wang
Assignee: Tianyi Wang


Currently Impala schedules an erasure-coded scan the same way as a regular 
HDFS scan: it tries to place the scan on a datanode hosting the block. This 
makes little sense with erasure coding, so we should schedule it as if the 
block were remote.
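A hedged sketch of the proposed behaviour (all names hypothetical, Python for
illustration only): treat erasure-coded ranges as remote instead of chasing
block locality:
{code:python}
def schedule_scan_range(scan_range, scheduler):
    if scan_range.is_erasure_coded:
        # No single datanode holds the whole logical block, so locality is
        # meaningless: pick an executor as if the data were remote.
        return scheduler.pick_remote_executor(scan_range)
    # Regular replicated block: prefer an executor co-located with a replica.
    return scheduler.pick_local_executor(scan_range)
{code}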



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (IMPALA-7010) Multiple flaky tests failing with MemLimitExceeded on S3

2018-05-11 Thread Tim Armstrong (JIRA)

 [ 
https://issues.apache.org/jira/browse/IMPALA-7010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-7010.
---
   Resolution: Fixed
Fix Version/s: Impala 3.1.0
   Impala 2.13.0

> Multiple flaky tests failing with MemLimitExceeded on S3
> 
>
> Key: IMPALA-7010
> URL: https://issues.apache.org/jira/browse/IMPALA-7010
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.0, Impala 2.13.0
>Reporter: Sailesh Mukil
>Assignee: Tim Armstrong
>Priority: Blocker
>  Labels: flaky
> Fix For: Impala 2.13.0, Impala 3.1.0
>
>
> *test_low_mem_limit_orderby_all*
> {code:java}
> Error Message
> query_test/test_mem_usage_scaling.py:272: in test_low_mem_limit_orderby_all   
>   self.run_primitive_query(vector, 'primitive_orderby_all') 
> query_test/test_mem_usage_scaling.py:260: in run_primitive_query 
> self.low_memory_limit_test(vector, query_name, self.MIN_MEM[query_name]) 
> query_test/test_mem_usage_scaling.py:114: in low_memory_limit_test 
> self.run_test_case(tpch_query, new_vector) common/impala_test_suite.py:405: 
> in run_test_case result = self.__execute_query(target_impalad_client, 
> query, user=user) common/impala_test_suite.py:620: in __execute_query 
> return impalad_client.execute(query, user=user) 
> common/impala_connection.py:160: in execute return 
> self.__beeswax_client.execute(sql_stmt, user=user) 
> beeswax/impala_beeswax.py:173: in execute handle = 
> self.__execute_query(query_string.strip(), user=user) 
> beeswax/impala_beeswax.py:341: in __execute_query 
> self.wait_for_completion(handle) beeswax/impala_beeswax.py:361: in 
> wait_for_completion raise ImpalaBeeswaxException("Query aborted:" + 
> error_log, None) E   ImpalaBeeswaxException: ImpalaBeeswaxException: E
> Query aborted:Memory limit exceeded: Failed to allocate tuple buffer E   
> HDFS_SCAN_NODE (id=0) could not allocate 190.00 KB without exceeding limit. E 
>   Error occurred on backend 
> ec2-m2-4xlarge-centos-6-4-0e8b.vpc.cloudera.com:22001 by fragment 
> db44c56dcd2fce95:7d746e080003 E   Memory left in process limit: 11.40 GB 
> E   Memory left in query limit: 51.61 KB E   
> Query(db44c56dcd2fce95:7d746e08): Limit=200.00 MB Reservation=158.50 
> MB ReservationLimit=160.00 MB OtherMemory=41.45 MB Total=199.95 MB 
> Peak=199.95 MB E Fragment db44c56dcd2fce95:7d746e080003: 
> Reservation=158.50 MB OtherMemory=41.45 MB Total=199.95 MB Peak=199.95 MB E   
> SORT_NODE (id=1): Reservation=9.00 MB OtherMemory=8.00 KB Total=9.01 MB 
> Peak=22.31 MB E   HDFS_SCAN_NODE (id=0): Reservation=149.50 MB 
> OtherMemory=41.43 MB Total=190.93 MB Peak=192.13 MB E Exprs: 
> Total=4.00 KB Peak=4.00 KB E   KrpcDataStreamSender (dst_id=4): 
> Total=688.00 B Peak=688.00 B E   CodeGen: Total=7.72 KB Peak=973.50 KB E  
>   E   Memory limit exceeded: Failed to allocate tuple buffer E   
> HDFS_SCAN_NODE (id=0) could not allocate 190.00 KB without exceeding limit. E 
>   Error occurred on backend 
> ec2-m2-4xlarge-centos-6-4-0e8b.vpc.cloudera.com:22001 by fragment 
> db44c56dcd2fce95:7d746e080003 E   Memory left in process limit: 11.40 GB 
> E   Memory left in query limit: 51.61 KB E   
> Query(db44c56dcd2fce95:7d746e08): Limit=200.00 MB Reservation=158.50 
> MB ReservationLimit=160.00 MB OtherMemory=41.45 MB Total=199.95 MB 
> Peak=199.95 MB E Fragment db44c56dcd2fce95:7d746e080003: 
> Reservation=158.50 MB OtherMemory=41.45 MB Total=199.95 MB Peak=199.95 MB E   
> SORT_NODE (id=1): Reservation=9.00 MB OtherMemory=8.00 KB Total=9.01 MB 
> Peak=22.31 MB E   HDFS_SCAN_NODE (id=0): Reservation=149.50 MB 
> OtherMemory=41.43 MB Total=190.93 MB Peak=192.13 MB E Exprs: 
> Total=4.00 KB Peak=4.00 KB E   KrpcDataStreamSender (dst_id=4): 
> Total=688.00 B Peak=688.00 B E   CodeGen: Total=7.72 KB Peak=973.50 KB (1 
> of 3 similar)
> Stacktrace
> query_test/test_mem_usage_scaling.py:272: in test_low_mem_limit_orderby_all
> self.run_primitive_query(vector, 'primitive_orderby_all')
> query_test/test_mem_usage_scaling.py:260: in run_primitive_query
> self.low_memory_limit_test(vector, query_name, self.MIN_MEM[query_name])
> query_test/test_mem_usage_scaling.py:114: in low_memory_limit_test
> self.run_test_case(tpch_query, new_vector)
> common/impala_test_suite.py:405: in run_test_case
> result = self.__execute_query(target_impalad_client, query, user=user)
> common/impala_test_suite.py:620: in __execute_query
> return impalad_client.execute(query, user=user)
> common/impala_connection.py:160: in execute
> return self.__beeswax_client.execute(sql_stmt, user=user)
> beeswax/impala_beeswax.py:173: in 

[jira] [Updated] (IMPALA-6966) Estimated Memory in Catalogd webpage is not sorted correctly

2018-05-11 Thread Quanlong Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/IMPALA-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-6966:
---
Fix Version/s: Impala 3.1.0
   Impala 2.13.0

> Estimated Memory in Catalogd webpage is not sorted correctly
> 
>
> Key: IMPALA-6966
> URL: https://issues.apache.org/jira/browse/IMPALA-6966
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 3.0, Impala 2.12.0
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Major
>  Labels: newbie
> Fix For: Impala 2.13.0, Impala 3.1.0
>
> Attachments: Screen Shot 2018-05-03 at 9.38.45 PM.png
>
>
> The "Top-N Tables with Highest Memory Requirements" in Catalogd webpage 
> doesn't sort "Estimated Memory" correctly. In fact, it sorts them as strings 
> instead of size. This is confusing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7015) Insert into Kudu table returns with Status OK even if there are Kudu errors

2018-05-11 Thread Greg Rahn (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-7015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472689#comment-16472689
 ] 

Greg Rahn commented on IMPALA-7015:
---

IIRC the current behavior was chosen to make the query run to completion 
despite hitting errors.  This is mainly because of the lack of atomicity in 
multi-row transactions.  For example, when a bulk insert contains duplicate 
keys, it would be impossible to have the command succeed for all 
non-violating records unless one removed the duplicates from the source/input 
set.  The current behavior at least lets the command work on as many tuples 
as possible without adjusting the input.  I'm all for better error message 
propagation, but AFAIK this was also a limitation of the current protocols, 
as mentioned in IMPALA-4416 and IMPALA-1789.  If there is a way to provide a 
better UX, I'm all for it.

> Insert into Kudu table returns with Status OK even if there are Kudu errors
> ---
>
> Key: IMPALA-7015
> URL: https://issues.apache.org/jira/browse/IMPALA-7015
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.12.0
>Reporter: Mostafa Mokhtar
>Priority: Major
> Attachments: Insert into kudu profile with errors.txt
>
>
> DML statements against Kudu tables return status OK even if there are Kudu 
> errors.
> This behavior is misleading. 
> {code}
>   Summary:
> Session ID: 18430b000e5dd8dc:e3e5dadb4a15d4b4
> Session Type: BEESWAX
> Start Time: 2018-05-11 10:10:07.314218000
> End Time: 2018-05-11 10:10:07.434017000
> Query Type: DML
> Query State: FINISHED
> Query Status: OK
> Impala Version: impalad version 2.12.0-cdh5.15.0 RELEASE (build 
> 2f9498d5c2f980aa7ff9505c56654c8e59e026ca)
> User: mmokhtar
> Connected User: mmokhtar
> Delegated User: 
> Network Address: :::10.17.234.27:60760
> Default Db: tpcds_1000_kudu
> Sql Statement: insert into store_2 select * from store
> Coordinator: vd1317.foo:22000
> Query Options (set by configuration): 
> Query Options (set by configuration and planner): MT_DOP=0
> Plan: 
> {code}
> {code}
> Operator          #Hosts   Avg Time  Max Time  #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
> ---------------------------------------------------------------------------------------------------
> 02:PARTIAL SORT        5  909.030us   1.025ms  1.00K       1.00K    6.14 MB        4.00 MB
> 01:EXCHANGE            5    6.262ms   7.232ms  1.00K       1.00K   75.50 KB              0  KUDU(KuduPartition(tpcds_1000_kudu.store.s_store_sk))
> 00:SCAN KUDU           5    3.694ms   4.137ms  1.00K       1.00K    4.34 MB              0  tpcds_1000_kudu.store
> Errors: Key already present in Kudu table 'impala::tpcds_1000_kudu.store_2'. (1 of 1002 similar)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-4268) Allow PlanRootSink to buffer more than a batch of rows

2018-05-11 Thread Tim Armstrong (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-4268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472642#comment-16472642
 ] 

Tim Armstrong commented on IMPALA-4268:
---

I'm going to steal this JIRA and extend it slightly. I think we should do this 
in a way that execution resources can be released without the client fetching 
all the results. That is, if the result set is reasonably small, we should 
buffer all the results in the ClientRequestState and release all of the query 
execution resources. This would make Impala's resource consumption less 
hostage to the behaviour of clients.
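A minimal sketch of that idea, assuming invented names (this is not Impala's
ClientRequestState API): buffer a bounded amount of results, then release
execution resources at eos rather than waiting on client fetches:
{code:python}
RESULT_BUFFER_CAP_BYTES = 10 * 2**20  # illustrative cap only

class ClientRequestStateSketch:
    def __init__(self, coordinator):
        self.coordinator = coordinator
        self.rows = []
        self.nbytes = 0

    def buffer_rows(self, rows, nbytes):
        self.rows.extend(rows)
        self.nbytes += nbytes

    def on_eos(self):
        # The whole result set fit in the buffer, so backends, memory and
        # fragment threads can be released before the client fetches a row.
        if self.nbytes <= RESULT_BUFFER_CAP_BYTES:
            self.coordinator.release_exec_resources()

    def fetch(self, n):
        out, self.rows = self.rows[:n], self.rows[n:]
        return out
{code}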

> Allow PlanRootSink to buffer more than a batch of rows
> --
>
> Key: IMPALA-4268
> URL: https://issues.apache.org/jira/browse/IMPALA-4268
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 2.8.0
>Reporter: Henry Robinson
>Priority: Major
>  Labels: resource-management
>
> In IMPALA-2905, we are introducing a {{PlanRootSink}} that handles the 
> production of output rows at the root of a plan.
> The implementation in IMPALA-2905 has the plan execute in a separate thread 
> to the consumer, which calls {{GetNext()}} to retrieve the rows. However, the 
> sender thread will block until {{GetNext()}} is called, so that there are no 
> complications about memory usage and ownership due to having several batches 
> in flight at one time.
> However, this also leads to many context switches, as each {{GetNext()}} call 
> yields to the sender to produce the rows. If the sender was to fill a buffer 
> asynchronously, the consumer could pull out of that buffer without taking a 
> context switch in many cases (and the extra buffering might smooth out any 
> performance spikes due to client delays, which currently directly affect plan 
> execution).
> The tricky part is managing the mismatch between the size of the row batches 
> processed in {{Send()}} and the size of the fetch result asked for by the 
> client. The sender materializes output rows in a {{QueryResultSet}} that is 
> owned by the coordinator. That is not, currently, a splittable object - 
> instead it contains the actual RPC response struct that will hit the wire 
> when the RPC completes. An asynchronous sender cannot know the fetch size, 
> which may change on every fetch call, so the {{GetNext()}} implementation 
> would need to be able to split out the {{QueryResultSet}} to match the 
> requested fetch size, and handle stitching together other {{QueryResultSets}} 
> - without doing extra copies.
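To make the mismatch concrete, here is an illustrative Python sketch (not the
coordinator's real data structures) of stitching producer batches to an
arbitrary per-call fetch size without copying whole batches:
{code:python}
from collections import deque

class ResultBuffer:
    def __init__(self):
        self._batches = deque()  # each batch is a list of materialized rows
        self._offset = 0         # rows already consumed from the head batch

    def add_batch(self, batch):  # producer side, analogous to Send()
        self._batches.append(batch)

    def get_next(self, fetch_size):  # consumer side, analogous to GetNext()
        out = []
        while self._batches and len(out) < fetch_size:
            head = self._batches[0]
            take = min(fetch_size - len(out), len(head) - self._offset)
            out.extend(head[self._offset:self._offset + take])
            self._offset += take
            if self._offset == len(head):  # head batch fully consumed
                self._batches.popleft()
                self._offset = 0
        return out
{code}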



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-7017) TestMetadataReplicas.test_catalog_restart fails with exception

2018-05-11 Thread Joe McDonnell (JIRA)
Joe McDonnell created IMPALA-7017:
-

 Summary: TestMetadataReplicas.test_catalog_restart fails with 
exception
 Key: IMPALA-7017
 URL: https://issues.apache.org/jira/browse/IMPALA-7017
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Affects Versions: Impala 2.13.0
Reporter: Joe McDonnell


An exhaustive build with Thrift RPC on the 2.x branch encountered an error on 
custom_cluster.test_metadata_replicas.TestMetadataReplicas.test_catalog_restart:
{noformat}
custom_cluster/test_metadata_replicas.py:71: in test_catalog_restart
assert False, "Unexpected exception: " + str(e)
E   AssertionError: Unexpected exception: 'version'
E   assert False{noformat}
This has happened once. I will attach more log information below.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-6948) Coordinators don't detect the deletion of tables that occurred outside of impala after catalog restart

2018-05-11 Thread Dimitris Tsirogiannis (JIRA)

 [ 
https://issues.apache.org/jira/browse/IMPALA-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dimitris Tsirogiannis resolved IMPALA-6948.
---
Resolution: Fixed

> Coordinators don't detect the deletion of tables that occurred outside of 
> impala after catalog restart
> --
>
> Key: IMPALA-6948
> URL: https://issues.apache.org/jira/browse/IMPALA-6948
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 3.0, Impala 2.12.0
>Reporter: Dimitris Tsirogiannis
>Assignee: Dimitris Tsirogiannis
>Priority: Blocker
>  Labels: catalog-server
>
> Upon catalog restart the coordinators detect this event and request a full 
> topic update from the statestore. In certain cases, the topic update protocol 
> executed between the statestore and the catalog fails to detect catalog 
> objects that were deleted from the Metastore externally (e.g. via HIVE), thus 
> causing these objects to show up again in each coordinator's catalog cache. 
> The end result is that the catalog server and the coordinator's cache are out 
> of sync and in some cases the only solution is to restart both the catalog 
> and the statestore. 
> The following sequence can reproduce this issue:
> {code:java}
> impala> create table lala (a int);
> bash> kill -9 `pidof catalogd`
> hive> drop table lala;
> bash> restart catalogd 
> impala> show tables;
> --- lala shows up in the list of tables;{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IMPALA-6983) stress test binary search exits if process mem_limit is too low

2018-05-11 Thread Michael Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-6983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472414#comment-16472414
 ] 

Michael Brown commented on IMPALA-6983:
---

Reproduction with TPCH SF=1, the default. Use Impala with 
{{--mem_limit=196433879}}.

> stress test binary search exits if process mem_limit is too low
> ---
>
> Key: IMPALA-6983
> URL: https://issues.apache.org/jira/browse/IMPALA-6983
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Reporter: Dan Hecht
>Assignee: Michael Brown
>Priority: Major
>
> This was running stress test on tpch SF=20 and minicluster process 
> mem_limit=7857355161.
> {code:java}
> 2018-05-04 18:25:03,800 18531 MainThread 
> INFO:concurrent_select[1303]:Collecting runtime info for query q5:
> select
> n_name,
> sum(l_extendedprice * (1 - l_discount)) as revenue
> from
> customer,
> orders,
> lineitem,
> supplier,
> nation,
> region
> where
> c_custkey = o_custkey
> and l_orderkey = o_orderkey
> and l_suppkey = s_suppkey
> and c_nationkey = s_nationkey
> and s_nationkey = n_nationkey
> and n_regionkey = r_regionkey
> and r_name = 'ASIA'
> and o_orderdate >= '1994-01-01'
> and o_orderdate < '1995-01-01'
> group by
> n_name
> order by
> revenue desc
> 2018-05-04 18:25:07,790 18531 MainThread INFO:concurrent_select[1406]:Finding 
> a starting point for binary search
> 2018-05-04 18:25:07,790 18531 MainThread INFO:concurrent_select[1409]:Next 
> mem_limit: 7493
> 2018-05-04 18:28:06,380 18531 MainThread 
> WARNING:concurrent_select[1416]:Query couldn't be run even when using all 
> available memory
> select
> n_name,
> sum(l_extendedprice * (1 - l_discount)) as revenue
> from
> customer,
> orders,
> lineitem,
> supplier,
> nation,
> region
> where
> c_custkey = o_custkey
> and l_orderkey = o_orderkey
> and l_suppkey = s_suppkey
> and c_nationkey = s_nationkey
> and s_nationkey = n_nationkey
> and n_regionkey = r_regionkey
> and r_name = 'ASIA'
> and o_orderdate >= '1994-01-01'
> and o_orderdate < '1995-01-01'
> group by
> n_name
> order by
> revenue desc
> Traceback (most recent call last):
> File "./tests/stress/concurrent_select.py", line 2265, in 
> main()
> File "./tests/stress/concurrent_select.py", line 2162, in main
> queries, impala, converted_args, 
> queries_with_runtime_info_by_db_sql_and_options)
> File "./tests/stress/concurrent_select.py", line 1879, in populate_all_queries
> os.path.join(converted_args.results_dir, PROFILES_DIR))
> File "./tests/stress/concurrent_select.py", line 964, in 
> write_runtime_info_profiles
> fh.write(profile)
> TypeError: expected a string or other character buffer object{code}
> I don't understand the details of {{concurrent_select.py}} control flow, but 
> it looks like in this case {{update_runtime_info()}} won't get called, 
> leading to this issue.
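The TypeError above is consistent with {{fh.write()}} receiving None because
the profile was never populated. A hedged sketch of a guard (attribute names
are hypothetical, not the script's actual structure):
{code:python}
import os

def write_runtime_info_profiles(queries, profiles_dir):
    for query in queries:
        if query.profile is None:
            # Query aborted before update_runtime_info() attached a profile;
            # skip it instead of crashing the whole stress run.
            continue
        path = os.path.join(profiles_dir, query.name + ".txt")
        with open(path, "w") as fh:
            fh.write(query.profile)
{code}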



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-7016) Statement to allow setting ownership for database

2018-05-11 Thread Adam Holley (JIRA)
Adam Holley created IMPALA-7016:
---

 Summary: Statement to allow setting ownership for database
 Key: IMPALA-7016
 URL: https://issues.apache.org/jira/browse/IMPALA-7016
 Project: IMPALA
  Issue Type: Sub-task
  Components: Frontend
Affects Versions: Impala 3.0, Impala 2.13.0
Reporter: Adam Holley


Create statement to allow setting the owner of a database.

{{ALTER DATABASE database_name SET OWNER [USER|ROLE] user_or_role;}}

examples:

ALTER DATABASE <database> SET OWNER USER <user>

ALTER DATABASE <database> SET OWNER ROLE <role>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IMPALA-6988) Statement to allow setting ownership

2018-05-11 Thread Adam Holley (JIRA)

 [ 
https://issues.apache.org/jira/browse/IMPALA-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Holley updated IMPALA-6988:

Description: 
Create statement to allow setting owner.

{{ALTER (DATABASE|TABLE) database_name.table_name SET OWNER [USER|ROLE] 
user_or_role;}}

examples:

ALTER DATABASE <database> SET OWNER USER <user>

ALTER DATABASE <database> SET OWNER ROLE <role>

ALTER TABLE <database>.<table> SET OWNER USER <user>

ALTER TABLE <table> SET OWNER ROLE <role>

  was:
Create statement to allow setting owner.

ALTER DATABASE <database> SET OWNER="<user>"

ALTER TABLE <table> SET OWNER="<user>"


> Statement to allow setting ownership
> 
>
> Key: IMPALA-6988
> URL: https://issues.apache.org/jira/browse/IMPALA-6988
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Frontend
>Affects Versions: Impala 3.0, Impala 2.13.0
>Reporter: Adam Holley
>Assignee: Adam Holley
>Priority: Major
>
> Create statement to allow setting owner.
> {{ALTER (DATABASE|TABLE) database_name.table_name SET OWNER [USER|ROLE] 
> user_or_role;}}
> examples:
> ALTER DATABASE <database> SET OWNER USER <user>
> ALTER DATABASE <database> SET OWNER ROLE <role>
> ALTER TABLE <database>.<table> SET OWNER USER <user>
> ALTER TABLE <table> SET OWNER ROLE <role>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7015) Insert into Kudu table returns with Status OK even if there are Kudu errors

2018-05-11 Thread Mostafa Mokhtar (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-7015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472302#comment-16472302
 ] 

Mostafa Mokhtar commented on IMPALA-7015:
-


[~tmarsh] FYI

> Insert into Kudu table returns with Status OK even if there are Kudu errors
> ---
>
> Key: IMPALA-7015
> URL: https://issues.apache.org/jira/browse/IMPALA-7015
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 2.12.0
>Reporter: Mostafa Mokhtar
>Priority: Major
> Attachments: Insert into kudu profile with errors.txt
>
>
> DML statements against Kudu tables return status OK even if there are Kudu 
> errors.
> This behavior is misleading. 
> {code}
>   Summary:
> Session ID: 18430b000e5dd8dc:e3e5dadb4a15d4b4
> Session Type: BEESWAX
> Start Time: 2018-05-11 10:10:07.314218000
> End Time: 2018-05-11 10:10:07.434017000
> Query Type: DML
> Query State: FINISHED
> Query Status: OK
> Impala Version: impalad version 2.12.0-cdh5.15.0 RELEASE (build 
> 2f9498d5c2f980aa7ff9505c56654c8e59e026ca)
> User: mmokhtar
> Connected User: mmokhtar
> Delegated User: 
> Network Address: :::10.17.234.27:60760
> Default Db: tpcds_1000_kudu
> Sql Statement: insert into store_2 select * from store
> Coordinator: vd1317.foo:22000
> Query Options (set by configuration): 
> Query Options (set by configuration and planner): MT_DOP=0
> Plan: 
> {code}
> {code}
> Operator          #Hosts   Avg Time  Max Time  #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
> ---------------------------------------------------------------------------------------------------
> 02:PARTIAL SORT        5  909.030us   1.025ms  1.00K       1.00K    6.14 MB        4.00 MB
> 01:EXCHANGE            5    6.262ms   7.232ms  1.00K       1.00K   75.50 KB              0  KUDU(KuduPartition(tpcds_1000_kudu.store.s_store_sk))
> 00:SCAN KUDU           5    3.694ms   4.137ms  1.00K       1.00K    4.34 MB              0  tpcds_1000_kudu.store
> Errors: Key already present in Kudu table 'impala::tpcds_1000_kudu.store_2'. (1 of 1002 similar)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-7015) Insert into Kudu table returns with Status OK even if there are Kudu errors

2018-05-11 Thread Mostafa Mokhtar (JIRA)
Mostafa Mokhtar created IMPALA-7015:
---

 Summary: Insert into Kudu table returns with Status OK even if 
there are Kudu errors
 Key: IMPALA-7015
 URL: https://issues.apache.org/jira/browse/IMPALA-7015
 Project: IMPALA
  Issue Type: Bug
Reporter: Mostafa Mokhtar
 Attachments: Insert into kudu profile with errors.txt

DML statements against Kudu tables return status OK even if there are Kudu 
errors.
This behavior is misleading. 

{code}
  Summary:
Session ID: 18430b000e5dd8dc:e3e5dadb4a15d4b4
Session Type: BEESWAX
Start Time: 2018-05-11 10:10:07.314218000
End Time: 2018-05-11 10:10:07.434017000
Query Type: DML
Query State: FINISHED
Query Status: OK
Impala Version: impalad version 2.12.0-cdh5.15.0 RELEASE (build 
2f9498d5c2f980aa7ff9505c56654c8e59e026ca)
User: mmokhtar
Connected User: mmokhtar
Delegated User: 
Network Address: :::10.17.234.27:60760
Default Db: tpcds_1000_kudu
Sql Statement: insert into store_2 select * from store
Coordinator: vd1317.foo:22000
Query Options (set by configuration): 
Query Options (set by configuration and planner): MT_DOP=0
Plan: 
{code}

{code}
Operator          #Hosts   Avg Time  Max Time  #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
---------------------------------------------------------------------------------------------------
02:PARTIAL SORT        5  909.030us   1.025ms  1.00K       1.00K    6.14 MB        4.00 MB
01:EXCHANGE            5    6.262ms   7.232ms  1.00K       1.00K   75.50 KB              0  KUDU(KuduPartition(tpcds_1000_kudu.store.s_store_sk))
00:SCAN KUDU           5    3.694ms   4.137ms  1.00K       1.00K    4.34 MB              0  tpcds_1000_kudu.store
Errors: Key already present in Kudu table 'impala::tpcds_1000_kudu.store_2'. (1 of 1002 similar)
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IMPALA-7014) Disable stacktrace symbolisation by default

2018-05-11 Thread Tim Armstrong (JIRA)
Tim Armstrong created IMPALA-7014:
-

 Summary: Disable stacktrace symbolisation by default
 Key: IMPALA-7014
 URL: https://issues.apache.org/jira/browse/IMPALA-7014
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend
Affects Versions: Not Applicable
Reporter: Tim Armstrong
Assignee: Joe McDonnell


We got burned by the cost of producing stacktraces again with IMPALA-6996. I 
did a quick investigation into this, based on the hypothesis that the 
symbolisation was the expensive part, rather than getting the addresses. I 
added a stopwatch to GetStackTrace() to measure the time in nanoseconds and 
ran a test that produces a backtrace.

The first experiment was with symbolisation enabled:
{noformat}
$ start-impala-cluster.py --impalad_args='--symbolize_stacktrace=true' && 
impala-py.test tests/query_test/test_scanners.py -k codec

I0511 09:45:11.897944 30904 debug-util.cc:283] stacktrace time: 75175573
I0511 09:45:11.897956 30904 status.cc:125] File 
'hdfs://localhost:20500/test-warehouse/test_bad_compression_codec_308108.db/bad_codec/bad_codec.parquet'
 uses an unsupported compression: 5000 for column 'id'.
@  0x18782ef  impala::Status::Status()
@  0x2cbe96f  impala::ParquetMetadataUtils::ValidateRowGroupColumn()
@  0x205f597  impala::BaseScalarColumnReader::Reset()
@  0x1feebe6  impala::HdfsParquetScanner::InitScalarColumns()
@  0x1fe6ff3  impala::HdfsParquetScanner::NextRowGroup()
@  0x1fe58d8  impala::HdfsParquetScanner::GetNextInternal()
@  0x1fe3eea  impala::HdfsParquetScanner::ProcessSplit()
@  0x1f6ba36  impala::HdfsScanNode::ProcessSplit()
@  0x1f6adc4  impala::HdfsScanNode::ScannerThread()
@  0x1f6a1c4  
_ZZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS_18ThreadResourcePoolEENKUlvE_clEv
@  0x1f6c2a6  
_ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE
@  0x1bd3b1a  boost::function0<>::operator()()
@  0x1ebecd5  impala::Thread::SuperviseThread()
@  0x1ec6e71  boost::_bi::list5<>::operator()<>()
@  0x1ec6d95  boost::_bi::bind_t<>::operator()()
@  0x1ec6d58  boost::detail::thread_data<>::run()
@  0x31b3ada  thread_proxy
@ 0x7f9be67d36ba  start_thread
@ 0x7f9be650941d  clone

{noformat}
The stacktrace took 75ms, which is pretty bad! It would be worse on a 
production system with more memory maps.

The next experiment was to disable it:
{noformat}
start-impala-cluster.py --impalad_args='--symbolize_stacktrace=false' && 
impala-py.test tests/query_test/test_scanners.py -k codec

I0511 09:43:47.574185 29514 debug-util.cc:283] stacktrace time: 29528
I0511 09:43:47.574193 29514 status.cc:125] File 
'hdfs://localhost:20500/test-warehouse/test_bad_compression_codec_cb5d0225.db/bad_codec/bad_codec.parquet'
 uses an unsupported compression: 5000 for column 'id'.
@  0x18782ef
@  0x2cbe96f
@  0x205f597
@  0x1feebe6
@  0x1fe6ff3
@  0x1fe58d8
@  0x1fe3eea
@  0x1f6ba36
@  0x1f6adc4
@  0x1f6a1c4
@  0x1f6c2a6
@  0x1bd3b1a
@  0x1ebecd5
@  0x1ec6e71
@  0x1ec6d95
@  0x1ec6d58
@  0x31b3ada
@ 0x7fbdcbdef6ba
@ 0x7fbdcbb2541d
{noformat}
That's 2545x faster! If the addresses are in the statically linked binary, we 
can use addr2line to get back the line numbers:
{noformat}
$ addr2line -e be/build/latest/service/impalad 0x2cbe96f
/home/tarmstrong/Impala/incubator-impala/be/src/exec/parquet-metadata-utils.cc:166
{noformat}
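That last step generalizes: with symbolisation off, the logged addresses can
be resolved offline. A small illustrative Python wrapper around addr2line
(assuming an unstripped binary; this script is not part of Impala):
{code:python}
import subprocess

def symbolize(binary, addresses):
    # -f prints the function name, -e selects the binary; addr2line emits
    # two lines (function, then file:line) per input address.
    out = subprocess.run(["addr2line", "-f", "-e", binary] + addresses,
                         capture_output=True, text=True, check=True)
    return out.stdout.splitlines()

# e.g., using the path and address from the example above:
# symbolize("be/build/latest/service/impalad", ["0x2cbe96f"])
{code}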



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (IMPALA-6988) Statement to allow setting ownership

2018-05-11 Thread Adam Holley (JIRA)

 [ 
https://issues.apache.org/jira/browse/IMPALA-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Holley reassigned IMPALA-6988:
---

Assignee: Adam Holley

> Statement to allow setting ownership
> 
>
> Key: IMPALA-6988
> URL: https://issues.apache.org/jira/browse/IMPALA-6988
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Frontend
>Affects Versions: Impala 3.0, Impala 2.13.0
>Reporter: Adam Holley
>Assignee: Adam Holley
>Priority: Major
>
> Create statement to allow setting owner.
> ALTER DATABASE <database> SET OWNER="<user>"
> ALTER TABLE <table> SET OWNER="<user>"



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org