[jira] [Commented] (IMPALA-6957) Include number of required threads in explain plan
[ https://issues.apache.org/jira/browse/IMPALA-6957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16472883#comment-16472883 ] ASF subversion and git services commented on IMPALA-6957: - Commit e12ee485cf4c77203b144c053ee167509cc39374 in impala's branch refs/heads/master from [~tarmstr...@cloudera.com] [ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=e12ee48 ] IMPALA-6957: calc thread resource requirement in planner This only factors in fragment execution threads. E.g. this does *not* try to account for the number of threads on the old Thrift RPC code path if that is enabled. This is loosely related to the old VCores estimate, but is different in that it: * Directly ties into the notion of required threads in ThreadResourceMgr. * Is a strict upper bound on the number of such threads, rather than an estimate. Does not include "optional" threads. ThreadResourceMgr in the backend bounds the number of "optional" threads per impalad, so the number of execution threads on a backend is limited by sum(required threads per query) + CpuInfo::num_cores() * FLAGS_num_threads_per_core. DCHECKs in the backend enforce that the calculation is correct. They were actually hit in KuduScanNode because of some races in thread management leading to multiple "required" threads running. Now the first thread in the multithreaded scans never exits, which means that it's always safe for any of the other threads to exit early, which simplifies the logic a lot. Testing: Updated planner tests. Ran core tests. 
Change-Id: I982837ef883457fa4d2adc3bdbdc727353469140 Reviewed-on: http://gerrit.cloudera.org:8080/10256 Reviewed-by: Tim Armstrong Tested-by: Impala Public Jenkins > Include number of required threads in explain plan > -- > > Key: IMPALA-6957 > URL: https://issues.apache.org/jira/browse/IMPALA-6957 > Project: IMPALA > Issue Type: Improvement > Components: Frontend > Affects Versions: Not Applicable > Reporter: Tim Armstrong > Assignee: Tim Armstrong > Priority: Major > Labels: resource-management > > Impala has an internal notion of "required threads" to execute a fragment, > e.g. the fragment execution thread and the first scanner thread. It's > possible to compute the number of required threads per fragment instance > based on the plan. > We should include this in the resource profile and expose it in the explain > plan. This could then be a step toward implementing something like > IMPALA-6035. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
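The per-backend bound stated in the commit message can be illustrated with a short sketch. This is not Impala code: the function name and the per-query thread counts are hypothetical, and the 3-threads-per-core figure is assumed for illustration.

```python
# Illustrative sketch of the thread bound from the commit message above:
# execution threads on one impalad <= sum(required threads per query)
#                                     + num_cores * threads_per_core.
def max_execution_threads(required_per_query, num_cores, threads_per_core):
    """Upper bound on execution threads for one backend: all "required"
    threads across running queries, plus the "optional" thread pool that
    ThreadResourceMgr caps at num_cores * threads_per_core."""
    return sum(required_per_query) + num_cores * threads_per_core

# Three concurrent queries needing 2, 4, and 1 required threads on a
# 16-core host, assuming 3 threads per core: 7 + 48 = 55.
print(max_execution_threads([2, 4, 1], num_cores=16, threads_per_core=3))  # 55
```

The point of the change is that the first term is now a strict upper bound computed by the planner, so these DCHECKs can hold at runtime.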
[jira] [Commented] (IMPALA-5384) Simplify coordinator locking protocol
[ https://issues.apache.org/jira/browse/IMPALA-5384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16472882#comment-16472882 ] ASF subversion and git services commented on IMPALA-5384: - Commit 6ca87e46736a1e591ed7d7d5fee05b4b4d2fbb50 in impala's branch refs/heads/master from [~dhecht] [ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=6ca87e4 ] IMPALA-5384, part 2: Simplify Coordinator locking and clarify state This is the final change to clarify and break up the Coordinator's lock. The state machine for the coordinator is made explicit, distinguishing between executing state and multiple terminal states. Logic to transition into a terminal state is centralized in one location and executes exactly once for each coordinator object. Derived from a patch for IMPALA-5384 by Marcel Kornacker. Testing: - exhaustive functional tests - stress test on minicluster with memory overcommitment. Verified from the logs that this exercises all these paths: - successful queries - client requested cancellation - error from exec FInstances RPC - error reported asynchronously via report status RPC - eos before backend execution completed Change-Id: I1abdfd02163f9356c59d470fe1c64ebe012a9e8e Reviewed-on: http://gerrit.cloudera.org:8080/10158 Reviewed-by: Dan Hecht Tested-by: Impala Public Jenkins > Simplify coordinator locking protocol > - > > Key: IMPALA-5384 > URL: https://issues.apache.org/jira/browse/IMPALA-5384 > Project: IMPALA > Issue Type: Improvement > Affects Versions: Impala 2.9.0 > Reporter: Marcel Kornacker > Assignee: Dan Hecht > Priority: Major > > The coordinator has a central lock (lock_) which is used very liberally to > synchronize state changes that don't need to be synchronized, creating a > concurrency bottleneck. > Also, the coordinator contains a number of data structures related to INSERT > finalization that don't need to be part of and synchronized with the rest of > the coordinator state. 
[jira] [Commented] (IMPALA-6999) Upgrade to sqlparse 0.1.19 in Impala Shell
[ https://issues.apache.org/jira/browse/IMPALA-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16472879#comment-16472879 ] ASF subversion and git services commented on IMPALA-6999: - Commit 417bc8c802bee7d789394570a671fddd9baa8fe2 in impala's branch refs/heads/2.x from [~fredyw] [ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=417bc8c ] IMPALA-6999: Upgrade to sqlparse-0.1.19 for Impala shell sqlparse-0.1.19 is the last version of sqlparse that supports Python 2.6. Testing: - Ran all end-to-end tests Change-Id: Ide51ef3ac52d25a96b0fa832e29b6535197d23cb Reviewed-on: http://gerrit.cloudera.org:8080/10354 Reviewed-by: David Knupp Tested-by: Impala Public Jenkins > Upgrade to sqlparse 0.1.19 in Impala Shell > -- > > Key: IMPALA-6999 > URL: https://issues.apache.org/jira/browse/IMPALA-6999 > Project: IMPALA > Issue Type: Improvement > Components: Clients > Reporter: Fredy Wijaya > Assignee: Fredy Wijaya > Priority: Minor > Fix For: Impala 2.13.0, Impala 3.1.0 > >
[jira] [Commented] (IMPALA-6966) Estimated Memory in Catalogd webpage is not sorted correctly
[ https://issues.apache.org/jira/browse/IMPALA-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16472880#comment-16472880 ] ASF subversion and git services commented on IMPALA-6966: - Commit 7b8bd6a190cd3070527baf6507b58f03bc6ee2e5 in impala's branch refs/heads/2.x from stiga-huang [ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=7b8bd6a ] IMPALA-6966: sort table memory by size in catalogd web UI This patch fixes the sorting order in "Top-K Tables with Highest Memory Requirements", in which the "Estimated memory" column was sorted as strings. Values received from the catalog-server are changed from pretty-printed strings to byte counts, so the web UI is able to sort and render them correctly. Change-Id: I60dc253f862f5fde6fa96147f114d8765bb31a85 Reviewed-on: http://gerrit.cloudera.org:8080/10292 Reviewed-by: Dimitris Tsirogiannis Tested-by: Impala Public Jenkins > Estimated Memory in Catalogd webpage is not sorted correctly > > > Key: IMPALA-6966 > URL: https://issues.apache.org/jira/browse/IMPALA-6966 > Project: IMPALA > Issue Type: Bug > Affects Versions: Impala 3.0, Impala 2.12.0 > Reporter: Quanlong Huang > Assignee: Quanlong Huang > Priority: Major > Labels: newbie > Fix For: Impala 2.13.0, Impala 3.1.0 > > Attachments: Screen Shot 2018-05-03 at 9.38.45 PM.png > > > The "Top-N Tables with Highest Memory Requirements" in Catalogd webpage > doesn't sort "Estimated Memory" correctly. In fact, it sorts them as strings > instead of size. This is confusing.
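The bug and the fix are easy to demonstrate: pretty-printed sizes compare lexicographically, while raw byte counts compare numerically. A minimal sketch (the unit table and parser are illustrative, not the patch's actual code):

```python
# Why the column sorted wrong: pretty-printed sizes compare as strings.
sizes = ["1.5GB", "900MB", "12KB"]
print(sorted(sizes))  # ['1.5GB', '12KB', '900MB'] -- lexicographic, not by size

# The fix ships raw byte counts and pretty-prints only for display;
# equivalently, sort on a bytes-valued key.
UNITS = {"KB": 2**10, "MB": 2**20, "GB": 2**30}

def to_bytes(pretty):
    """Parse strings like '900MB' into a byte count (illustrative only)."""
    return float(pretty[:-2]) * UNITS[pretty[-2:]]

print(sorted(sizes, key=to_bytes))  # ['12KB', '900MB', '1.5GB']
```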
[jira] [Created] (IMPALA-7019) Discard block locations and schedule as remote read with erasure coding
Tianyi Wang created IMPALA-7019: --- Summary: Discard block locations and schedule as remote read with erasure coding Key: IMPALA-7019 URL: https://issues.apache.org/jira/browse/IMPALA-7019 Project: IMPALA Issue Type: Sub-task Components: Frontend Affects Versions: Impala 3.1.0 Reporter: Tianyi Wang Assignee: Tianyi Wang Currently Impala schedules erasure-coded scans the same way as regular HDFS scans: it tries to schedule the scan on a datanode hosting the block. This makes little sense with erasure coding, so we should schedule it as if the block were remote.
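The proposed change amounts to skipping the locality preference when a block is erasure coded. A hypothetical sketch of that policy, not Impala's scheduler code (all names invented):

```python
import itertools

# Illustrative scheduling policy: prefer a datanode holding the block for a
# normally replicated file, but treat an erasure-coded block as a remote
# read and assign it to any executor (round-robin here).
def pick_executor(replica_hosts, executors, is_erasure_coded, rr_counter):
    if not is_erasure_coded:
        for host in replica_hosts:
            if host in executors:
                return host  # local read: schedule next to the data
    # Erasure-coded (or no local replica): schedule as a remote read.
    return executors[next(rr_counter) % len(executors)]

executors = ["host1", "host2", "host3"]
rr = itertools.count()
print(pick_executor(["host2"], executors, False, rr))  # host2 (local read)
print(pick_executor(["host2"], executors, True, rr))   # host1 (remote, round-robin)
```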
[jira] [Resolved] (IMPALA-7010) Multiple flaky tests failing with MemLimitExceeded on S3
[ https://issues.apache.org/jira/browse/IMPALA-7010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-7010. --- Resolution: Fixed Fix Version/s: Impala 3.1.0 Impala 2.13.0 > Multiple flaky tests failing with MemLimitExceeded on S3 > > > Key: IMPALA-7010 > URL: https://issues.apache.org/jira/browse/IMPALA-7010 > Project: IMPALA > Issue Type: Bug > Components: Infrastructure >Affects Versions: Impala 3.0, Impala 2.13.0 >Reporter: Sailesh Mukil >Assignee: Tim Armstrong >Priority: Blocker > Labels: flaky > Fix For: Impala 2.13.0, Impala 3.1.0 > > > *test_low_mem_limit_orderby_all* > {code:java} > Error Message > query_test/test_mem_usage_scaling.py:272: in test_low_mem_limit_orderby_all > self.run_primitive_query(vector, 'primitive_orderby_all') > query_test/test_mem_usage_scaling.py:260: in run_primitive_query > self.low_memory_limit_test(vector, query_name, self.MIN_MEM[query_name]) > query_test/test_mem_usage_scaling.py:114: in low_memory_limit_test > self.run_test_case(tpch_query, new_vector) common/impala_test_suite.py:405: > in run_test_case result = self.__execute_query(target_impalad_client, > query, user=user) common/impala_test_suite.py:620: in __execute_query > return impalad_client.execute(query, user=user) > common/impala_connection.py:160: in execute return > self.__beeswax_client.execute(sql_stmt, user=user) > beeswax/impala_beeswax.py:173: in execute handle = > self.__execute_query(query_string.strip(), user=user) > beeswax/impala_beeswax.py:341: in __execute_query > self.wait_for_completion(handle) beeswax/impala_beeswax.py:361: in > wait_for_completion raise ImpalaBeeswaxException("Query aborted:" + > error_log, None) E ImpalaBeeswaxException: ImpalaBeeswaxException: E > Query aborted:Memory limit exceeded: Failed to allocate tuple buffer E > HDFS_SCAN_NODE (id=0) could not allocate 190.00 KB without exceeding limit. 
E > Error occurred on backend > ec2-m2-4xlarge-centos-6-4-0e8b.vpc.cloudera.com:22001 by fragment > db44c56dcd2fce95:7d746e080003 E Memory left in process limit: 11.40 GB > E Memory left in query limit: 51.61 KB E > Query(db44c56dcd2fce95:7d746e08): Limit=200.00 MB Reservation=158.50 > MB ReservationLimit=160.00 MB OtherMemory=41.45 MB Total=199.95 MB > Peak=199.95 MB E Fragment db44c56dcd2fce95:7d746e080003: > Reservation=158.50 MB OtherMemory=41.45 MB Total=199.95 MB Peak=199.95 MB E > SORT_NODE (id=1): Reservation=9.00 MB OtherMemory=8.00 KB Total=9.01 MB > Peak=22.31 MB E HDFS_SCAN_NODE (id=0): Reservation=149.50 MB > OtherMemory=41.43 MB Total=190.93 MB Peak=192.13 MB E Exprs: > Total=4.00 KB Peak=4.00 KB E KrpcDataStreamSender (dst_id=4): > Total=688.00 B Peak=688.00 B E CodeGen: Total=7.72 KB Peak=973.50 KB E > E Memory limit exceeded: Failed to allocate tuple buffer E > HDFS_SCAN_NODE (id=0) could not allocate 190.00 KB without exceeding limit. E > Error occurred on backend > ec2-m2-4xlarge-centos-6-4-0e8b.vpc.cloudera.com:22001 by fragment > db44c56dcd2fce95:7d746e080003 E Memory left in process limit: 11.40 GB > E Memory left in query limit: 51.61 KB E > Query(db44c56dcd2fce95:7d746e08): Limit=200.00 MB Reservation=158.50 > MB ReservationLimit=160.00 MB OtherMemory=41.45 MB Total=199.95 MB > Peak=199.95 MB E Fragment db44c56dcd2fce95:7d746e080003: > Reservation=158.50 MB OtherMemory=41.45 MB Total=199.95 MB Peak=199.95 MB E > SORT_NODE (id=1): Reservation=9.00 MB OtherMemory=8.00 KB Total=9.01 MB > Peak=22.31 MB E HDFS_SCAN_NODE (id=0): Reservation=149.50 MB > OtherMemory=41.43 MB Total=190.93 MB Peak=192.13 MB E Exprs: > Total=4.00 KB Peak=4.00 KB E KrpcDataStreamSender (dst_id=4): > Total=688.00 B Peak=688.00 B E CodeGen: Total=7.72 KB Peak=973.50 KB (1 > of 3 similar) > Stacktrace > query_test/test_mem_usage_scaling.py:272: in test_low_mem_limit_orderby_all > self.run_primitive_query(vector, 'primitive_orderby_all') > 
query_test/test_mem_usage_scaling.py:260: in run_primitive_query > self.low_memory_limit_test(vector, query_name, self.MIN_MEM[query_name]) > query_test/test_mem_usage_scaling.py:114: in low_memory_limit_test > self.run_test_case(tpch_query, new_vector) > common/impala_test_suite.py:405: in run_test_case > result = self.__execute_query(target_impalad_client, query, user=user) > common/impala_test_suite.py:620: in __execute_query > return impalad_client.execute(query, user=user) > common/impala_connection.py:160: in execute > return self.__beeswax_client.execute(sql_stmt, user=user) > beeswax/impala_beeswax.py:173: in
[jira] [Updated] (IMPALA-6966) Estimated Memory in Catalogd webpage is not sorted correctly
[ https://issues.apache.org/jira/browse/IMPALA-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-6966: --- Fix Version/s: Impala 3.1.0 Impala 2.13.0 > Estimated Memory in Catalogd webpage is not sorted correctly > > > Key: IMPALA-6966 > URL: https://issues.apache.org/jira/browse/IMPALA-6966 > Project: IMPALA > Issue Type: Bug >Affects Versions: Impala 3.0, Impala 2.12.0 >Reporter: Quanlong Huang >Assignee: Quanlong Huang >Priority: Major > Labels: newbie > Fix For: Impala 2.13.0, Impala 3.1.0 > > Attachments: Screen Shot 2018-05-03 at 9.38.45 PM.png > > > The "Top-N Tables with Highest Memory Requirements" in Catalogd webpage > doesn't sort "Estimated Memory" correctly. In fact, it sorts them as strings > instead of size. This is confusing.
[jira] [Commented] (IMPALA-7015) Insert into Kudu table returns with Status OK even if there are Kudu errors
[ https://issues.apache.org/jira/browse/IMPALA-7015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16472689#comment-16472689 ] Greg Rahn commented on IMPALA-7015: --- IIRC the current behavior was chosen to make the query run to completion, despite hitting errors. This is mainly done because of the lack of atomicity with multi-row txns. For example, when doing a bulk insert containing duplicate keys, it would be impossible to have the command run for all non-violating records unless one removed them in the source/input set. The current behavior at least lets the command work on as many tuples as possible without adjusting the input. I'm all for better error message propagation but AFAIK this was also a limitation of the current protocols as mentioned in IMPALA-4416 and IMPALA-1789. If there is a way to provide a better UX I'm all for it. > Insert into Kudu table returns with Status OK even if there are Kudu errors > --- > > Key: IMPALA-7015 > URL: https://issues.apache.org/jira/browse/IMPALA-7015 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 2.12.0 >Reporter: Mostafa Mokhtar >Priority: Major > Attachments: Insert into kudu profile with errors.txt > > > DML statements against Kudu tables return status OK even if there are Kudu > errors. > This behavior is misleading. 
> {code} > Summary: > Session ID: 18430b000e5dd8dc:e3e5dadb4a15d4b4 > Session Type: BEESWAX > Start Time: 2018-05-11 10:10:07.314218000 > End Time: 2018-05-11 10:10:07.434017000 > Query Type: DML > Query State: FINISHED > Query Status: OK > Impala Version: impalad version 2.12.0-cdh5.15.0 RELEASE (build > 2f9498d5c2f980aa7ff9505c56654c8e59e026ca) > User: mmokhtar > Connected User: mmokhtar > Delegated User: > Network Address: :::10.17.234.27:60760 > Default Db: tpcds_1000_kudu > Sql Statement: insert into store_2 select * from store > Coordinator: vd1317.foo:22000 > Query Options (set by configuration): > Query Options (set by configuration and planner): MT_DOP=0 > Plan: > {code} > {code} > Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem > Est. Peak Mem Detail > - > 02:PARTIAL SORT5 909.030us 1.025ms 1.00K 1.00K 6.14 MB > 4.00 MB > 01:EXCHANGE56.262ms 7.232ms 1.00K 1.00K 75.50 KB > 0 KUDU(KuduPartition(tpcds_1000_kudu.store.s_store_sk)) > 00:SCAN KUDU 53.694ms 4.137ms 1.00K 1.00K 4.34 MB > 0 tpcds_1000_kudu.store > Errors: Key already present in Kudu table > 'impala::tpcds_1000_kudu.store_2'. (1 of 1002 similar) > {code}
[jira] [Commented] (IMPALA-4268) Allow PlanRootSink to buffer more than a batch of rows
[ https://issues.apache.org/jira/browse/IMPALA-4268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16472642#comment-16472642 ] Tim Armstrong commented on IMPALA-4268: --- I'm going to steal this JIRA and extend it slightly. I think we should do this in a way that execution resources can be released without the client fetching all the results. I.e. if the result set is reasonably small, we should buffer all the results in the ClientRequestState and release all of the query execution resources. This would make Impala's resource consumption less hostage to the behaviour of clients. > Allow PlanRootSink to buffer more than a batch of rows > -- > > Key: IMPALA-4268 > URL: https://issues.apache.org/jira/browse/IMPALA-4268 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Affects Versions: Impala 2.8.0 >Reporter: Henry Robinson >Priority: Major > Labels: resource-management > > In IMPALA-2905, we are introducing a {{PlanRootSink}} that handles the > production of output rows at the root of a plan. > The implementation in IMPALA-2905 has the plan execute in a separate thread > to the consumer, which calls {{GetNext()}} to retrieve the rows. However, the > sender thread will block until {{GetNext()}} is called, so that there are no > complications about memory usage and ownership due to having several batches > in flight at one time. > However, this also leads to many context switches, as each {{GetNext()}} call > yields to the sender to produce the rows. If the sender was to fill a buffer > asynchronously, the consumer could pull out of that buffer without taking a > context switch in many cases (and the extra buffering might smooth out any > performance spikes due to client delays, which currently directly affect plan > execution). > The tricky part is managing the mismatch between the size of the row batches > processed in {{Send()}} and the size of the fetch result asked for by the > client. 
The sender materializes output rows in a {{QueryResultSet}} that is > owned by the coordinator. That is not, currently, a splittable object - > instead it contains the actual RPC response struct that will hit the wire > when the RPC completes. An asynchronous sender cannot know the fetch size, > which may change on every fetch call. So the {{GetNext()}} implementation > would need to be able to split out the {{QueryResultSet}} to match the > correct fetch size, and handle stitching together other {{QueryResultSets}} - > without doing extra copies.
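The batch-vs-fetch-size mismatch described above can be sketched with a toy buffer that splits and stitches batches to match each requested fetch size. The names are illustrative, not Impala's classes, and a real implementation would also need synchronization, memory accounting, and the copy-avoidance the JIRA calls out:

```python
from collections import deque

# Toy model of an asynchronously filled result buffer: the sender appends
# fixed-size row batches; the consumer fetches arbitrary row counts.
class RowBuffer:
    def __init__(self):
        self._batches = deque()

    def send(self, batch):
        """Called by the producer (fragment execution) side."""
        self._batches.append(list(batch))

    def fetch(self, num_rows):
        """Called by the client fetch path: stitch together up to num_rows
        rows, splitting a batch when the fetch boundary falls inside it."""
        out = []
        while self._batches and len(out) < num_rows:
            batch = self._batches[0]
            take = min(num_rows - len(out), len(batch))
            out.extend(batch[:take])
            if take == len(batch):
                self._batches.popleft()
            else:
                # Keep the unconsumed remainder for the next fetch.
                self._batches[0] = batch[take:]
        return out

buf = RowBuffer()
buf.send([1, 2, 3])
buf.send([4, 5])
print(buf.fetch(4))  # [1, 2, 3, 4] -- spans two batches, splits the second
print(buf.fetch(4))  # [5]
```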
[jira] [Created] (IMPALA-7017) TestMetadataReplicas.test_catalog_restart fails with exception
Joe McDonnell created IMPALA-7017: - Summary: TestMetadataReplicas.test_catalog_restart fails with exception Key: IMPALA-7017 URL: https://issues.apache.org/jira/browse/IMPALA-7017 Project: IMPALA Issue Type: Bug Components: Frontend Affects Versions: Impala 2.13.0 Reporter: Joe McDonnell An exhaustive build with Thrift RPC on the 2.x branch encountered an error on custom_cluster.test_metadata_replicas.TestMetadataReplicas.test_catalog_restart: {noformat} custom_cluster/test_metadata_replicas.py:71: in test_catalog_restart assert False, "Unexpected exception: " + str(e) E AssertionError: Unexpected exception: 'version' E assert False{noformat} This has happened once. I will attach more log information below.
[jira] [Resolved] (IMPALA-6948) Coordinators don't detect the deletion of tables that occurred outside of impala after catalog restart
[ https://issues.apache.org/jira/browse/IMPALA-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dimitris Tsirogiannis resolved IMPALA-6948. --- Resolution: Fixed > Coordinators don't detect the deletion of tables that occurred outside of > impala after catalog restart > -- > > Key: IMPALA-6948 > URL: https://issues.apache.org/jira/browse/IMPALA-6948 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Affects Versions: Impala 3.0, Impala 2.12.0 >Reporter: Dimitris Tsirogiannis >Assignee: Dimitris Tsirogiannis >Priority: Blocker > Labels: catalog-server > > Upon catalog restart the coordinators detect this event and request a full > topic update from the statestore. In certain cases, the topic update protocol > executed between the statestore and the catalog fails to detect catalog > objects that were deleted from the Metastore externally (e.g. via HIVE), thus > causing these objects to show up again in each coordinator's catalog cache. > The end result is that the catalog server and the coordinator's cache are out > of sync and in some cases the only solution is to restart both the catalog > and the statestore. > The following sequence can reproduce this issue: > {code:java} > impala> create table lala(a int); > bash> kill -9 `pidof catalogd` > hive> drop table lala; > bash> restart catalogd > impala> show tables; > --- lala shows up in the list of tables;{code}
[jira] [Commented] (IMPALA-6983) stress test binary search exits if process mem_limit is too low
[ https://issues.apache.org/jira/browse/IMPALA-6983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16472414#comment-16472414 ] Michael Brown commented on IMPALA-6983: --- Reproduction with TPCH SF=1, the default. Use Impala with {{--mem_limit=196433879}}. > stress test binary search exits if process mem_limit is too low > --- > > Key: IMPALA-6983 > URL: https://issues.apache.org/jira/browse/IMPALA-6983 > Project: IMPALA > Issue Type: Bug > Components: Infrastructure >Reporter: Dan Hecht >Assignee: Michael Brown >Priority: Major > > This was running stress test on tpch SF=20 and minicluster process > mem_limit=7857355161. > {code:java} > 2018-05-04 18:25:03,800 18531 MainThread > INFO:concurrent_select[1303]:Collecting runtime info for query q5: > select > n_name, > sum(l_extendedprice * (1 - l_discount)) as revenue > from > customer, > orders, > lineitem, > supplier, > nation, > region > where > c_custkey = o_custkey > and l_orderkey = o_orderkey > and l_suppkey = s_suppkey > and c_nationkey = s_nationkey > and s_nationkey = n_nationkey > and n_regionkey = r_regionkey > and r_name = 'ASIA' > and o_orderdate >= '1994-01-01' > and o_orderdate < '1995-01-01' > group by > n_name > order by > revenue desc > 2018-05-04 18:25:07,790 18531 MainThread INFO:concurrent_select[1406]:Finding > a starting point for binary search > 2018-05-04 18:25:07,790 18531 MainThread INFO:concurrent_select[1409]:Next > mem_limit: 7493 > 2018-05-04 18:28:06,380 18531 MainThread > WARNING:concurrent_select[1416]:Query couldn't be run even when using all > available memory > select > n_name, > sum(l_extendedprice * (1 - l_discount)) as revenue > from > customer, > orders, > lineitem, > supplier, > nation, > region > where > c_custkey = o_custkey > and l_orderkey = o_orderkey > and l_suppkey = s_suppkey > and c_nationkey = s_nationkey > and s_nationkey = n_nationkey > and n_regionkey = r_regionkey > and r_name = 'ASIA' > and o_orderdate >= '1994-01-01' > and o_orderdate < 
'1995-01-01' > group by > n_name > order by > revenue desc > Traceback (most recent call last): > File "./tests/stress/concurrent_select.py", line 2265, in <module> > main() > File "./tests/stress/concurrent_select.py", line 2162, in main > queries, impala, converted_args, > queries_with_runtime_info_by_db_sql_and_options) > File "./tests/stress/concurrent_select.py", line 1879, in populate_all_queries > os.path.join(converted_args.results_dir, PROFILES_DIR)) > File "./tests/stress/concurrent_select.py", line 964, in > write_runtime_info_profiles > fh.write(profile) > TypeError: expected a string or other character buffer object{code} > I don't understand the details of {{concurrent_select.py}} control flow, but > it looks like in this case {{update_runtime_info()}} won't get called leading > to this issue.
[jira] [Created] (IMPALA-7016) Statement to allow setting ownership for database
Adam Holley created IMPALA-7016: --- Summary: Statement to allow setting ownership for database Key: IMPALA-7016 URL: https://issues.apache.org/jira/browse/IMPALA-7016 Project: IMPALA Issue Type: Sub-task Components: Frontend Affects Versions: Impala 3.0, Impala 2.13.0 Reporter: Adam Holley Create statement to allow setting the owner on a database: {{ALTER DATABASE database_name SET OWNER [USER|ROLE] user_or_role;}} examples: ALTER DATABASE SET OWNER USER ALTER DATABASE SET OWNER ROLE
[jira] [Updated] (IMPALA-6988) Statement to allow setting ownership
[ https://issues.apache.org/jira/browse/IMPALA-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Holley updated IMPALA-6988: Description: Create statement to allow setting owner. {{ALTER (DATABASE|TABLE) database_name.table_name SET OWNER [USER|ROLE] user_or_role;}} examples: ALTER DATABASE SET OWNER USER ALTER DATABASE SET OWNER ROLE ALTER TABLE . SET OWNER USER ALTER TABLE SET OWNER ROLE was: Create statement to allow setting owner. ALTER DATABASE SET OWNER="" ALTER TABLE SET OWNER="" > Statement to allow setting ownership > > > Key: IMPALA-6988 > URL: https://issues.apache.org/jira/browse/IMPALA-6988 > Project: IMPALA > Issue Type: Sub-task > Components: Frontend >Affects Versions: Impala 3.0, Impala 2.13.0 >Reporter: Adam Holley >Assignee: Adam Holley >Priority: Major > > Create statement to allow setting owner. > {{ALTER (DATABASE|TABLE) database_name.table_name SET OWNER [USER|ROLE] > user_or_role;}} > examples: > ALTER DATABASE SET OWNER USER > ALTER DATABASE SET OWNER ROLE > ALTER TABLE . SET OWNER USER > ALTER TABLE SET OWNER ROLE
[jira] [Commented] (IMPALA-7015) Insert into Kudu table returns with Status OK even if there are Kudu errors
[ https://issues.apache.org/jira/browse/IMPALA-7015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16472302#comment-16472302 ] Mostafa Mokhtar commented on IMPALA-7015: - [~tmarsh] FYI > Insert into Kudu table returns with Status OK even if there are Kudu errors > --- > > Key: IMPALA-7015 > URL: https://issues.apache.org/jira/browse/IMPALA-7015 > Project: IMPALA > Issue Type: Bug >Affects Versions: Impala 2.12.0 >Reporter: Mostafa Mokhtar >Priority: Major > Attachments: Insert into kudu profile with errors.txt > > > DML statements against Kudu tables return status OK even if there are Kudu > errors. > This behavior is misleading. > {code} > Summary: > Session ID: 18430b000e5dd8dc:e3e5dadb4a15d4b4 > Session Type: BEESWAX > Start Time: 2018-05-11 10:10:07.314218000 > End Time: 2018-05-11 10:10:07.434017000 > Query Type: DML > Query State: FINISHED > Query Status: OK > Impala Version: impalad version 2.12.0-cdh5.15.0 RELEASE (build > 2f9498d5c2f980aa7ff9505c56654c8e59e026ca) > User: mmokhtar > Connected User: mmokhtar > Delegated User: > Network Address: :::10.17.234.27:60760 > Default Db: tpcds_1000_kudu > Sql Statement: insert into store_2 select * from store > Coordinator: vd1317.foo:22000 > Query Options (set by configuration): > Query Options (set by configuration and planner): MT_DOP=0 > Plan: > {code} > {code} > Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem > Est. Peak Mem Detail > - > 02:PARTIAL SORT5 909.030us 1.025ms 1.00K 1.00K 6.14 MB > 4.00 MB > 01:EXCHANGE56.262ms 7.232ms 1.00K 1.00K 75.50 KB > 0 KUDU(KuduPartition(tpcds_1000_kudu.store.s_store_sk)) > 00:SCAN KUDU 53.694ms 4.137ms 1.00K 1.00K 4.34 MB > 0 tpcds_1000_kudu.store > Errors: Key already present in Kudu table > 'impala::tpcds_1000_kudu.store_2'. 
(1 of 1002 similar) > {code}
[jira] [Created] (IMPALA-7015) Insert into Kudu table returns with Status OK even if there are Kudu errors
Mostafa Mokhtar created IMPALA-7015: --- Summary: Insert into Kudu table returns with Status OK even if there are Kudu errors Key: IMPALA-7015 URL: https://issues.apache.org/jira/browse/IMPALA-7015 Project: IMPALA Issue Type: Bug Reporter: Mostafa Mokhtar Attachments: Insert into kudu profile with errors.txt DML statements against Kudu tables return status OK even if there are Kudu errors. This behavior is misleading. {code} Summary: Session ID: 18430b000e5dd8dc:e3e5dadb4a15d4b4 Session Type: BEESWAX Start Time: 2018-05-11 10:10:07.314218000 End Time: 2018-05-11 10:10:07.434017000 Query Type: DML Query State: FINISHED Query Status: OK Impala Version: impalad version 2.12.0-cdh5.15.0 RELEASE (build 2f9498d5c2f980aa7ff9505c56654c8e59e026ca) User: mmokhtar Connected User: mmokhtar Delegated User: Network Address: :::10.17.234.27:60760 Default Db: tpcds_1000_kudu Sql Statement: insert into store_2 select * from store Coordinator: vd1317.foo:22000 Query Options (set by configuration): Query Options (set by configuration and planner): MT_DOP=0 Plan: {code} {code} Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail - 02:PARTIAL SORT5 909.030us 1.025ms 1.00K 1.00K 6.14 MB 4.00 MB 01:EXCHANGE56.262ms 7.232ms 1.00K 1.00K 75.50 KB 0 KUDU(KuduPartition(tpcds_1000_kudu.store.s_store_sk)) 00:SCAN KUDU 53.694ms 4.137ms 1.00K 1.00K 4.34 MB 0 tpcds_1000_kudu.store Errors: Key already present in Kudu table 'impala::tpcds_1000_kudu.store_2'. (1 of 1002 similar) {code}
[jira] [Created] (IMPALA-7014) Disable stacktrace symbolisation by default
Tim Armstrong created IMPALA-7014: - Summary: Disable stacktrace symbolisation by default Key: IMPALA-7014 URL: https://issues.apache.org/jira/browse/IMPALA-7014 Project: IMPALA Issue Type: Improvement Components: Backend Affects Versions: Not Applicable Reporter: Tim Armstrong Assignee: Joe McDonnell We got burned by the cost of producing stacktraces again with IMPALA-6996. I did a quick investigation into this, based on the hypothesis that the symbolisation was the expensive part, rather than getting the addresses. I added a stopwatch to GetStackTrace() to measure the time in nanoseconds and ran a test that produces a backtrace. The first experiment was {noformat} $ start-impala-cluster.py --impalad_args='--symbolize_stacktrace=true' && impala-py.test tests/query_test/test_scanners.py -k codec I0511 09:45:11.897944 30904 debug-util.cc:283] stacktrace time: 75175573 I0511 09:45:11.897956 30904 status.cc:125] File 'hdfs://localhost:20500/test-warehouse/test_bad_compression_codec_308108.db/bad_codec/bad_codec.parquet' uses an unsupported compression: 5000 for column 'id'. 
@ 0x18782ef impala::Status::Status() @ 0x2cbe96f impala::ParquetMetadataUtils::ValidateRowGroupColumn() @ 0x205f597 impala::BaseScalarColumnReader::Reset() @ 0x1feebe6 impala::HdfsParquetScanner::InitScalarColumns() @ 0x1fe6ff3 impala::HdfsParquetScanner::NextRowGroup() @ 0x1fe58d8 impala::HdfsParquetScanner::GetNextInternal() @ 0x1fe3eea impala::HdfsParquetScanner::ProcessSplit() @ 0x1f6ba36 impala::HdfsScanNode::ProcessSplit() @ 0x1f6adc4 impala::HdfsScanNode::ScannerThread() @ 0x1f6a1c4 _ZZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS_18ThreadResourcePoolEENKUlvE_clEv @ 0x1f6c2a6 _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE @ 0x1bd3b1a boost::function0<>::operator()() @ 0x1ebecd5 impala::Thread::SuperviseThread() @ 0x1ec6e71 boost::_bi::list5<>::operator()<>() @ 0x1ec6d95 boost::_bi::bind_t<>::operator()() @ 0x1ec6d58 boost::detail::thread_data<>::run() @ 0x31b3ada thread_proxy @ 0x7f9be67d36ba start_thread @ 0x7f9be650941d clone {noformat} The stacktrace took 75ms, which is pretty bad! It would be worse on a production system with more memory maps. The next experiment was to disable it: {noformat} start-impala-cluster.py --impalad_args='--symbolize_stacktrace=false' && impala-py.test tests/query_test/test_scanners.py -k codec I0511 09:43:47.574185 29514 debug-util.cc:283] stacktrace time: 29528 I0511 09:43:47.574193 29514 status.cc:125] File 'hdfs://localhost:20500/test-warehouse/test_bad_compression_codec_cb5d0225.db/bad_codec/bad_codec.parquet' uses an unsupported compression: 5000 for column 'id'. @ 0x18782ef @ 0x2cbe96f @ 0x205f597 @ 0x1feebe6 @ 0x1fe6ff3 @ 0x1fe58d8 @ 0x1fe3eea @ 0x1f6ba36 @ 0x1f6adc4 @ 0x1f6a1c4 @ 0x1f6c2a6 @ 0x1bd3b1a @ 0x1ebecd5 @ 0x1ec6e71 @ 0x1ec6d95 @ 0x1ec6d58 @ 0x31b3ada @ 0x7fbdcbdef6ba @ 0x7fbdcbb2541d {noformat} That's 2545x faster! 
If the addresses are in the statically linked binary, we can use addr2line to get back the line numbers: {noformat} $ addr2line -e be/build/latest/service/impalad 0x2cbe96f /home/tarmstrong/Impala/incubator-impala/be/src/exec/parquet-metadata-utils.cc:166 {noformat}
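With symbolisation disabled, the raw frame addresses can still be resolved offline. A small post-processing sketch (the log snippet is abbreviated from the trace above; the exact addr2line flags used are a suggestion, not from the JIRA):

```python
import re

# Pull the raw frame addresses out of an unsymbolized stacktrace so they can
# be fed to addr2line against the matching unstripped binary.
log = """
@ 0x18782ef
@ 0x2cbe96f
@ 0x205f597
"""
addrs = re.findall(r"0x[0-9a-f]+", log)
print(addrs)  # ['0x18782ef', '0x2cbe96f', '0x205f597']

# Then, offline (requires the binary the trace came from), e.g.:
#   addr2line -f -C -e be/build/latest/service/impalad 0x18782ef 0x2cbe96f ...
# -f prints the function name, -C demangles C++ symbols.
```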
[jira] [Assigned] (IMPALA-6988) Statement to allow setting ownership
[ https://issues.apache.org/jira/browse/IMPALA-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Holley reassigned IMPALA-6988: --- Assignee: Adam Holley > Statement to allow setting ownership > > > Key: IMPALA-6988 > URL: https://issues.apache.org/jira/browse/IMPALA-6988 > Project: IMPALA > Issue Type: Sub-task > Components: Frontend >Affects Versions: Impala 3.0, Impala 2.13.0 >Reporter: Adam Holley >Assignee: Adam Holley >Priority: Major > > Create statement to allow setting owner. > ALTER DATABASE SET OWNER="" > ALTER TABLE SET OWNER=""