[jira] [Created] (IMPALA-8529) ccache is ignored when using ninja generator
Todd Lipcon created IMPALA-8529: --- Summary: ccache is ignored when using ninja generator Key: IMPALA-8529 URL: https://issues.apache.org/jira/browse/IMPALA-8529 Project: IMPALA Issue Type: Task Reporter: Todd Lipcon The CMakeLists.txt sets up ccache by using RULE_LAUNCH_PREFIX, which is only respected by the Makefile generator. So, if we use ninja (which is generally better at parallelism) then ccache won't kick in. Newer versions of cmake have more explicit support for ccache that ought to also work with the ninja generator. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
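The newer CMake support mentioned here is presumably the `CMAKE_<LANG>_COMPILER_LAUNCHER` variables (CMake 3.4 and later), which both the Makefile and the Ninja generators honor. Below is a minimal Python sketch, assuming the build is configured by a wrapper script and that `ccache` is on the PATH. `cmake_configure_args` is a hypothetical helper and is not part of Impala's build scripts.

```python
import shutil


def cmake_configure_args(source_dir, generator="Ninja", use_ccache=True):
    """Build a cmake configure command line (hypothetical helper).

    CMAKE_<LANG>_COMPILER_LAUNCHER (CMake >= 3.4) prefixes each compile
    command with the launcher and is honored by both the Makefile and
    the Ninja generators.
    """
    args = ["cmake", "-G", generator]
    if use_ccache:
        args += [
            "-DCMAKE_C_COMPILER_LAUNCHER=ccache",
            "-DCMAKE_CXX_COMPILER_LAUNCHER=ccache",
        ]
    args.append(source_dir)
    return args


if __name__ == "__main__":
    # Only ask for ccache if it is actually installed on this machine.
    have_ccache = shutil.which("ccache") is not None
    print(" ".join(cmake_configure_args(".", use_ccache=have_ccache)))
```

Unlike a global launch-rule property, these variables are set per language at configure time, so switching between `-G Ninja` and `-G "Unix Makefiles"` keeps ccache in the compile path.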
[jira] [Updated] (IMPALA-6746) Reduce the number of comparison for analytical functions with partitioning when incoming data is clustered
[ https://issues.apache.org/jira/browse/IMPALA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Ho updated IMPALA-6746: --- Labels: performance (was: ) > Reduce the number of comparison for analytical functions with partitioning > when incoming data is clustered > -- > > Key: IMPALA-6746 > URL: https://issues.apache.org/jira/browse/IMPALA-6746 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Affects Versions: Impala 2.13.0 >Reporter: Mostafa Mokhtar >Assignee: Adrian Ng >Priority: Major > Labels: performance > Attachments: percentile query profile 2.txt > > > Checking whether the current row belongs to the same partition in ANALYTIC is very > expensive, as it does N comparisons, where N is the number of rows. In cases where > the cardinality of the partition column(s) is relatively small, the values > will be clustered. > One optimization, as proposed by [~alex.behm], is to check the first and last > tuples in the batch and, if they match, avoid calling > AnalyticEvalNode::PrevRowCompare for the entire batch. > For the attached query, which is a common pattern, the expected speedup is > 20-30%. 
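In the plan below, the SORT node under the ANALYTIC nodes orders rows by the partition column, so the rows of each partition are contiguous. If the first and last rows of a batch share a partition key, every row between them does too, and the per-row comparisons can be skipped. The Python sketch below illustrates that idea. It assumes a batch is a list of rows and that `key` extracts the partition columns. `partition_starts` is a hypothetical name and is not the AnalyticEvalNode code.

```python
def partition_starts(batch, key, prev_key=None):
    """Return (indices where a new partition begins, key comparisons done).

    Assumes rows arrive sorted on key(row), so each partition is contiguous.
    prev_key is the key of the last row of the previous batch (None if this
    is the first batch).
    """
    if not batch:
        return [], 0
    first_key = key(batch[0])
    starts = []
    comparisons = 0
    # Row 0 starts a partition unless it continues the previous batch's one.
    if prev_key is None:
        starts.append(0)
    else:
        comparisons += 1
        if first_key != prev_key:
            starts.append(0)
    # Fast path: sorted input and equal first/last keys mean no boundary inside.
    comparisons += 1
    if first_key == key(batch[-1]):
        return starts, comparisons
    # Slow path: compare each row with its predecessor.
    prev = first_key
    for i in range(1, len(batch)):
        cur = key(batch[i])
        comparisons += 1
        if cur != prev:
            starts.append(i)
        prev = cur
    return starts, comparisons
```

When the data is clustered, a batch of N rows costs at most two comparisons instead of N. A batch that spans a partition boundary falls back to the unchanged row-by-row path.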
> Query > {code} > select l_commitdate > ,avg(l_extendedprice) as avg_perc > ,percentile_cont (.25) within group (order by l_extendedprice asc) as > perc_25 > ,percentile_cont (.5) within group (order by l_extendedprice asc) as > perc_50 > ,percentile_cont (.75) within group (order by l_extendedprice asc) as > perc_75 > ,percentile_cont (.90) within group (order by l_extendedprice asc) as > perc_90 > from lineitem > group by l_commitdate > order by l_commitdate > {code} > Plan > {code} > F03:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1 > | Per-Host Resources: mem-estimate=0B mem-reservation=0B > PLAN-ROOT SINK > | mem-estimate=0B mem-reservation=0B > | > 09:MERGING-EXCHANGE [UNPARTITIONED] > | order by: l_commitdate ASC > | mem-estimate=0B mem-reservation=0B > | tuple-ids=5 row-size=66B cardinality=2559 > | > F02:PLAN FRAGMENT [HASH(l_commitdate)] hosts=1 instances=1 > Per-Host Resources: mem-estimate=22.00MB mem-reservation=13.94MB > 05:SORT > | order by: l_commitdate ASC > | mem-estimate=12.00MB mem-reservation=12.00MB spill-buffer=2.00MB > | tuple-ids=5 row-size=66B cardinality=2559 > | > 08:AGGREGATE [FINALIZE] > | output: avg:merge(l_extendedprice), > _percentile_cont_interpolation:merge(l_extendedprice, > `_percentile_row_number_diff_0`), > _percentile_cont_interpolation:merge(l_extendedprice, > `_percentile_row_number_diff_1`), > _percentile_cont_interpolation:merge(l_extendedprice, > `_percentile_row_number_diff_2`), > _percentile_cont_interpolation:merge(l_extendedprice, > `_percentile_row_number_diff_3`) > | group by: l_commitdate > | mem-estimate=10.00MB mem-reservation=1.94MB spill-buffer=64.00KB > | tuple-ids=4 row-size=66B cardinality=2559 > | > 07:EXCHANGE [HASH(l_commitdate)] > | mem-estimate=0B mem-reservation=0B > | tuple-ids=3 row-size=66B cardinality=2559 > | > F01:PLAN FRAGMENT [HASH(l_commitdate)] hosts=1 instances=1 > Per-Host Resources: mem-estimate=64.00MB mem-reservation=22.00MB > 04:AGGREGATE [STREAMING] > | output: 
avg(l_extendedprice), > _percentile_cont_interpolation(l_extendedprice, row_number() - 1 - > count(l_extendedprice) - 1 * 0.25), > _percentile_cont_interpolation(l_extendedprice, row_number() - 1 - > count(l_extendedprice) - 1 * 0.5), > _percentile_cont_interpolation(l_extendedprice, row_number() - 1 - > count(l_extendedprice) - 1 * 0.75), > _percentile_cont_interpolation(l_extendedprice, row_number() - 1 - > count(l_extendedprice) - 1 * 0.90) > | group by: l_commitdate > | mem-estimate=10.00MB mem-reservation=2.00MB spill-buffer=64.00KB > | tuple-ids=3 row-size=66B cardinality=2559 > | > 03:ANALYTIC > | functions: count(l_extendedprice) > | partition by: l_commitdate > | mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB > | tuple-ids=9,7,8 row-size=50B cardinality=59986052 > | > 02:ANALYTIC > | functions: row_number() > | partition by: l_commitdate > | order by: l_extendedprice ASC > | window: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW > | mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB > | tuple-ids=9,7 row-size=42B cardinality=59986052 > | > 01:SORT > | order by: l_commitdate ASC NULLS FIRST, l_extendedprice ASC NULLS LAST > | mem-estimate=46.00MB mem-reservation=12.00MB spill-buffer=2.00MB > | tuple-ids=9 row-size=34B cardinality=59986052 > | > 06:EXCHANGE [HASH(l_commitdate)] > | mem-estimate=0B mem-reservation=0B > | tuple-ids=0 row-size=34B cardinality=59986052 > | > F00:PLAN FRAGMENT [RANDOM] hosts=1 instances=1 > Per-Host Resources:
[jira] [Assigned] (IMPALA-3566) Refactor HashTable/HashTableCtx parameters into a parameter class
[ https://issues.apache.org/jira/browse/IMPALA-3566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Ho reassigned IMPALA-3566: -- Assignee: (was: Michael Ho) > Refactor HashTable/HashTableCtx parameters into a parameter class > - > > Key: IMPALA-3566 > URL: https://issues.apache.org/jira/browse/IMPALA-3566 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Affects Versions: Impala 2.6.0 >Reporter: Tim Armstrong >Priority: Minor > > The HashTable and HashTableCtx both have a number of instantiation-time > parameters. These parameters are split between the two classes and the caller > is expected to thread them through to the appropriate places. > It would be cleaner to store all HashTable params in a struct that can be > used to instantiate both HashTable and HashTableCtx, and also to do codegen > constant replacement.
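Below is a Python sketch of the parameter-object idea. The real classes are C++, and every field and method name here is hypothetical. Both objects are built from a single immutable params value, which could also serve as the single source for codegen constant replacement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HashTableParams:
    """Hypothetical single home for instantiation-time parameters.

    Field names are illustrative only, not Impala's actual ones.
    """
    num_key_exprs: int
    stores_nulls: bool
    finds_nulls: tuple  # one flag per key expression
    initial_seed: int


class HashTableCtx:
    def __init__(self, params: HashTableParams):
        self.params = params  # nothing threaded through separately


class HashTable:
    def __init__(self, params: HashTableParams, num_buckets: int):
        self.params = params
        self.num_buckets = num_buckets

    def make_ctx(self) -> HashTableCtx:
        # Both objects share one params value, so they cannot disagree.
        return HashTableCtx(self.params)


def codegen_constants(params: HashTableParams) -> dict:
    """Values codegen could substitute as constants (illustrative)."""
    return {
        "num_key_exprs": params.num_key_exprs,
        "stores_nulls": params.stores_nulls,
    }
```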
[jira] [Updated] (IMPALA-3566) Refactor HashTable/HashTableCtx parameters into a parameter class
[ https://issues.apache.org/jira/browse/IMPALA-3566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Ho updated IMPALA-3566: --- Labels: cleanup (was: ) > Refactor HashTable/HashTableCtx parameters into a parameter class > - > > Key: IMPALA-3566 > URL: https://issues.apache.org/jira/browse/IMPALA-3566 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Affects Versions: Impala 2.6.0 >Reporter: Tim Armstrong >Priority: Minor > Labels: cleanup > > The HashTable and HashTableCtx both have a number of instantiation-time > parameters. These parameters are split between the two classes and the caller > is expected to thread them through to the appropriate places. > It would be cleaner to store all HashTable params in a struct that can be > used to instantiate both HashTable and HashTableCtx, and also to do codegen > constant replacement.
[jira] [Commented] (IMPALA-5746) Remote fragments continue to hold onto memory after stopping the coordinator daemon
[ https://issues.apache.org/jira/browse/IMPALA-5746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836010#comment-16836010 ] Michael Ho commented on IMPALA-5746: FYI, [~stakiar], [~joemcdonnell], [~twm378]: this is probably a case we want to cover with fault injection testing. > Remote fragments continue to hold onto memory after stopping the coordinator > daemon > --- > > Key: IMPALA-5746 > URL: https://issues.apache.org/jira/browse/IMPALA-5746 > Project: IMPALA > Issue Type: Bug > Components: Distributed Exec >Affects Versions: Impala 2.10.0 >Reporter: Mostafa Mokhtar >Assignee: Michael Ho >Priority: Critical > Attachments: remote_fragments_holding_memory.txt > > > Repro > # Start running queries > # Kill the coordinator node > # On the running Impalad, check the memz tab: remote fragments continue to run > and hold on to resources > Remote fragments held on to memory for more than 30 minutes after stopping the coordinator > service. > Attached is a thread dump from an Impalad running remote fragments. 
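When reproducing this, it helps to list which queries still have memory tracked on an executor long after the coordinator was killed. The rough Python sketch below extracts the `Query(...): Total=...` lines from memz text like the snapshot in this report. It assumes the unit suffixes are 1024-based and ignores the per-fragment breakdown. The helper names are hypothetical.

```python
import re

# Assumed 1024-based multipliers for the suffixes seen in memz output.
_UNITS = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3, "TB": 1024 ** 4}
_QUERY_RE = re.compile(
    r"Query\((?P<id>[0-9a-f]+:[0-9a-f]+)\): "
    r"Total=(?P<total>[\d.]+)(?: (?P<unit>[KMGT]?B))?"
)


def to_bytes(value, unit):
    """Convert '2.64' plus 'MB' to bytes; a bare '0' carries no unit."""
    return int(float(value) * _UNITS[unit or "B"])


def queries_holding_memory(memz_text):
    """Map query id -> bytes currently tracked, one entry per Query(...) line."""
    held = {}
    for m in _QUERY_RE.finditer(memz_text):
        held[m.group("id")] = to_bytes(m.group("total"), m.group("unit"))
    return held
```

Taking two such snapshots some minutes apart and comparing the query ids would show which fragments are never released.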
> Snapshot of memz tab 30 minutes after killing the coordinator > {code} > Process: Limit=201.73 GB Total=5.32 GB Peak=179.36 GB > Free Disk IO Buffers: Total=1.87 GB Peak=1.87 GB > RequestPool=root.default: Total=1.35 GB Peak=178.51 GB > Query(f64169d4bb3c901c:3a21d8ae): Total=2.64 MB Peak=104.73 MB > Fragment f64169d4bb3c901c:3a21d8ae0051: Total=2.64 MB Peak=2.67 MB > AGGREGATION_NODE (id=15): Total=2.54 MB Peak=2.57 MB > Exprs: Total=30.12 KB Peak=30.12 KB > EXCHANGE_NODE (id=14): Total=0 Peak=0 > DataStreamRecvr: Total=0 Peak=12.29 KB > DataStreamSender (dst_id=17): Total=85.31 KB Peak=85.31 KB > CodeGen: Total=1.53 KB Peak=374.50 KB > Block Manager: Limit=161.39 GB Total=512.00 KB Peak=1.54 MB > Query(2a4f12b3b4b1dc8c:db7e8cf2): Total=258.29 MB Peak=412.98 MB > Fragment 2a4f12b3b4b1dc8c:db7e8cf2008c: Total=2.29 MB Peak=2.29 MB > SORT_NODE (id=11): Total=4.00 KB Peak=4.00 KB > AGGREGATION_NODE (id=20): Total=2.27 MB Peak=2.27 MB > Exprs: Total=25.12 KB Peak=25.12 KB > EXCHANGE_NODE (id=19): Total=0 Peak=0 > DataStreamRecvr: Total=0 Peak=0 > DataStreamSender (dst_id=21): Total=3.88 KB Peak=3.88 KB > CodeGen: Total=4.17 KB Peak=1.05 MB > Block Manager: Limit=161.39 GB Total=256.25 MB Peak=321.66 MB > Query(68421d2a5dea0775:83f5d972): Total=282.77 MB Peak=443.53 MB > Fragment 68421d2a5dea0775:83f5d972004a: Total=26.77 MB Peak=26.92 MB > SORT_NODE (id=8): Total=8.00 KB Peak=8.00 KB > Exprs: Total=4.00 KB Peak=4.00 KB > ANALYTIC_EVAL_NODE (id=7): Total=4.00 KB Peak=4.00 KB > Exprs: Total=4.00 KB Peak=4.00 KB > SORT_NODE (id=6): Total=24.00 MB Peak=24.00 MB > AGGREGATION_NODE (id=12): Total=2.72 MB Peak=2.83 MB > Exprs: Total=85.12 KB Peak=85.12 KB > EXCHANGE_NODE (id=11): Total=0 Peak=0 > DataStreamRecvr: Total=0 Peak=84.80 KB > DataStreamSender (dst_id=13): Total=1.27 KB Peak=1.27 KB > CodeGen: Total=24.80 KB Peak=4.13 MB > Block Manager: Limit=161.39 GB Total=280.50 MB Peak=286.52 MB > Query(e94c89fa89a74d27:82812bf9): Total=258.29 MB Peak=436.85 MB > Fragment 
e94c89fa89a74d27:82812bf9008e: Total=2.29 MB Peak=2.29 MB > SORT_NODE (id=11): Total=4.00 KB Peak=4.00 KB > AGGREGATION_NODE (id=20): Total=2.27 MB Peak=2.27 MB > Exprs: Total=25.12 KB Peak=25.12 KB > EXCHANGE_NODE (id=19): Total=0 Peak=0 > DataStreamRecvr: Total=0 Peak=0 > DataStreamSender (dst_id=21): Total=3.88 KB Peak=3.88 KB > CodeGen: Total=4.17 KB Peak=1.05 MB > Block Manager: Limit=161.39 GB Total=256.25 MB Peak=321.62 MB > Query(4e43dad3bdc935d8:938b8b7e): Total=2.65 MB Peak=105.60 MB > Fragment 4e43dad3bdc935d8:938b8b7e0052: Total=2.65 MB Peak=2.68 MB > AGGREGATION_NODE (id=15): Total=2.55 MB Peak=2.57 MB > Exprs: Total=30.12 KB Peak=30.12 KB > EXCHANGE_NODE (id=14): Total=0 Peak=0 > DataStreamRecvr: Total=0 Peak=13.68 KB > DataStreamSender (dst_id=17): Total=91.41 KB Peak=91.41 KB > CodeGen: Total=1.53 KB Peak=374.50 KB > Block Manager: Limit=161.39 GB Total=512.00 KB Peak=1.30 MB > Query(b34bdd65f1ed017e:5a0291bd): Total=2.37 MB Peak=106.56 MB > Fragment b34bdd65f1ed017e:5a0291bd004b: Total=2.37 MB Peak=2.37 MB > SORT_NODE (id=6): Total=4.00 KB Peak=4.00 KB > AGGREGATION_NODE (id=10): Total=2.35 MB Peak=2.35 MB > Exprs: Total=34.12 KB Peak=34.12 KB > EXCHANGE_NODE (id=9): Total=0
[jira] [Commented] (IMPALA-8527) Maven hangs on jenkins.impala.io talking to repository.apache.org
[ https://issues.apache.org/jira/browse/IMPALA-8527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835984#comment-16835984 ] Tim Armstrong commented on IMPALA-8527: --- It looks like there may have been some infra flakiness: INFRA-18347 also reported hangs. In the #asfinfra Slack channel, one of the admins mentioned some iptables rules they had applied to the repo to prevent abuse, which might have caused problems. Anyway, I think it's good for us to reduce our reliance on that repository for now so that we are not accidentally or deliberately blacklisted. > Maven hangs on jenkins.impala.io talking to repository.apache.org > - > > Key: IMPALA-8527 > URL: https://issues.apache.org/jira/browse/IMPALA-8527 > Project: IMPALA > Issue Type: Bug > Components: Infrastructure >Affects Versions: Impala 3.3.0 >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Blocker > Labels: broken-build > Fix For: Impala 3.3.0 > > > We're seeing most precommit builds failing because mvn gets stuck talking to > repository.apache.org. See IMPALA-8516. > I'm going to see if we can avoid it by pruning down our Maven repository > dependencies - we should be able to get all the artifacts from other mirrors.
[jira] [Assigned] (IMPALA-6042) Allow Impala shell to also use a global impalarc configuration
[ https://issues.apache.org/jira/browse/IMPALA-6042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikramjeet Vig reassigned IMPALA-6042: -- Assignee: Ethan > Allow Impala shell to also use a global impalarc configuration > -- > > Key: IMPALA-6042 > URL: https://issues.apache.org/jira/browse/IMPALA-6042 > Project: IMPALA > Issue Type: Improvement > Components: Clients >Reporter: Balazs Jeszenszky >Assignee: Ethan >Priority: Minor > Labels: newbie, shell, usability > > Currently, impalarc files can be specified on a per-user basis (stored in > ~/.impalarc), and they aren't created by default. > The Impala shell should pick up /etc/impalarc as well, in addition to the > user-specific configurations. > The intent here is to allow a "global" configuration of the shell by a system > administrator with common options like: > {code} > --ssl > -l > -k > -u > -i > {code}
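Below is a Python sketch of one way to get the requested layering (impala-shell is itself written in Python). It assumes both files use the same INI layout with an `[impala]` section. The approach is to read `/etc/impalarc` first and `~/.impalarc` second, so that per-user values override global ones. `load_shell_options` is a hypothetical helper, not the shell's actual loader, and the option names in the usage test are illustrative.

```python
import configparser
import os


def load_shell_options(global_path="/etc/impalarc", user_path=None,
                       section="impala"):
    """Merge a global and a per-user rc file; user values win.

    ConfigParser.read() silently skips files that do not exist, and values
    from later files override earlier ones. The [impala] section name is
    an assumption about the rc-file layout.
    """
    if user_path is None:
        user_path = os.path.expanduser("~/.impalarc")
    parser = configparser.ConfigParser()
    parser.read([global_path, user_path])
    if not parser.has_section(section):
        return {}
    return dict(parser.items(section))
```

Because missing files are skipped, a host with only a global file, only a user file, or neither still gets a usable result.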
[jira] [Closed] (IMPALA-6777) Whitespace inconsistencies in pretty printer across units
[ https://issues.apache.org/jira/browse/IMPALA-6777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikramjeet Vig closed IMPALA-6777. -- Resolution: Not A Bug > Whitespace inconsistencies in pretty printer across units > - > > Key: IMPALA-6777 > URL: https://issues.apache.org/jira/browse/IMPALA-6777 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 2.12.0 >Reporter: Lars Volker >Priority: Trivial > Labels: newbie > > Depending on the unit we sometimes print a whitespace between a value and its > unit and sometimes we don't: > > {noformat} > "human_readable": "Count: 9, min / max: 13.000us / 22.000us, 25th %-ile: > 13.000us, 50th %-ile: 16.000us, 75th %-ile: 17.000us, 90th %-ile: 18.000us, > 95th %-ile: 22.000us, 99.9th %-ile: 22.000us", > "human_readable": "Count: 9, min / max: 80.00 B / 80.00 B, 25th %-ile: 80.00 > B, 50th %-ile: 80.00 B, 75th %-ile: 80.00 B, 90th %-ile: 80.00 B, 95th %-ile: > 80.00 B, 99.9th %-ile: 80.00 B", > {noformat}
[jira] [Commented] (IMPALA-6777) Whitespace inconsistencies in pretty printer across units
[ https://issues.apache.org/jira/browse/IMPALA-6777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835952#comment-16835952 ] Bikramjeet Vig commented on IMPALA-6777: I think the intention there was that memory is always printed as a single unit, e.g. 1.23 KB, 10.1 MB, 12.00 GB, so the space looks OK there. Time, on the other hand, is often printed as multiple units stacked together, like 12s123ms; if we added a space there, we would have to print 12 s 123 ms to keep the spacing consistent. Also, I am a little wary that this might break external parsers that monitor Impala metrics or read profiles. Closing this since there is no strong incentive to change the representation. > Whitespace inconsistencies in pretty printer across units > - > > Key: IMPALA-6777 > URL: https://issues.apache.org/jira/browse/IMPALA-6777 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 2.12.0 >Reporter: Lars Volker >Priority: Trivial > Labels: newbie > > Depending on the unit we sometimes print a whitespace between a value and its > unit and sometimes we don't: > > {noformat} > "human_readable": "Count: 9, min / max: 13.000us / 22.000us, 25th %-ile: > 13.000us, 50th %-ile: 16.000us, 75th %-ile: 17.000us, 90th %-ile: 18.000us, > 95th %-ile: 22.000us, 99.9th %-ile: 22.000us", > "human_readable": "Count: 9, min / max: 80.00 B / 80.00 B, 25th %-ile: 80.00 > B, 50th %-ile: 80.00 B, 75th %-ile: 80.00 B, 90th %-ile: 80.00 B, 95th %-ile: > 80.00 B, 99.9th %-ile: 80.00 B", > {noformat}
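Below is a Python sketch of the two styles the comment contrasts. It is not Impala's PrettyPrinter, whose exact rules (precision, microsecond output, and so on) differ. Memory gets a single scaled unit with a space. Time stacks several units with no space, and passing `sep=" "` shows what the "consistent" spaced variant would look like. The function names are hypothetical.

```python
def format_bytes(num_bytes):
    """Single-unit memory style with a space, e.g. 80 -> '80.00 B'."""
    value = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB"):
        if value < 1024:
            return f"{value:.2f} {unit}"
        value /= 1024
    return f"{value:.2f} TB"


def format_duration_ms(ms, sep=""):
    """Stacked time style: 12123 -> '12s123ms'; sep=' ' -> '12 s 123 ms'."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    # Keep only the non-zero components, largest first.
    parts = [f"{value}{sep}{unit}"
             for value, unit in ((hours, "h"), (minutes, "m"),
                                 (seconds, "s"), (millis, "ms"))
             if value]
    return sep.join(parts) if parts else f"0{sep}ms"
```

With `sep=" "`, the space appears both inside and between components. A reader has to cope with that in `12 s 123 ms`, and existing profile parsers would likely not expect it.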
[jira] [Created] (IMPALA-8528) Refactor authorization code from AnalysisContext to AuthorizationChecker
Fredy Wijaya created IMPALA-8528: Summary: Refactor authorization code from AnalysisContext to AuthorizationChecker Key: IMPALA-8528 URL: https://issues.apache.org/jira/browse/IMPALA-8528 Project: IMPALA Issue Type: Sub-task Components: Frontend Reporter: Fredy Wijaya Assignee: Fredy Wijaya Currently the authorization code is scattered across a few places, such as AnalysisContext and AuthorizationChecker. This makes it difficult to add features such as pre- and post-authorization checks for audit logging. We need to consolidate the authorization code into a single place, and perhaps make AuthorizationChecker an interface and create a BaseAuthorizationChecker that contains many useful authorization methods.
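Below is a Python sketch of the proposed shape. The frontend is Java, and all method names here are hypothetical. `AuthorizationChecker` is the interface, and `BaseAuthorizationChecker` owns the common flow, so the pre- and post-authorization hooks used for audit logging live in one place. Subclasses only decide whether a single privilege request is allowed.

```python
from abc import ABC, abstractmethod


class AuthorizationChecker(ABC):
    """Hypothetical interface: one entry point for all authorization."""

    @abstractmethod
    def authorize(self, user, privilege_requests):
        ...


class BaseAuthorizationChecker(AuthorizationChecker):
    """Shared flow with pre/post hooks, e.g. for audit logging."""

    def authorize(self, user, privilege_requests):
        self.pre_authorize(user, privilege_requests)
        allowed = False
        try:
            allowed = all(self.has_access(user, r) for r in privilege_requests)
        finally:
            # Runs even if a check raises, so the audit record is not lost.
            self.post_authorize(user, privilege_requests, allowed)
        return allowed

    @abstractmethod
    def has_access(self, user, request):
        ...

    def pre_authorize(self, user, requests):
        pass  # default no-op hook

    def post_authorize(self, user, requests, allowed):
        pass  # default no-op hook


class AuditingChecker(BaseAuthorizationChecker):
    """Toy implementation backed by a set of (user, privilege) grants."""

    def __init__(self, grants):
        self.grants = grants
        self.audit_log = []

    def has_access(self, user, request):
        return (user, request) in self.grants

    def post_authorize(self, user, requests, allowed):
        self.audit_log.append((user, tuple(requests), allowed))
```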
[jira] [Updated] (IMPALA-8528) Refactor authorization code from AnalysisContext to AuthorizationChecker
[ https://issues.apache.org/jira/browse/IMPALA-8528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fredy Wijaya updated IMPALA-8528: - Priority: Critical (was: Major) > Refactor authorization code from AnalysisContext to AuthorizationChecker > > > Key: IMPALA-8528 > URL: https://issues.apache.org/jira/browse/IMPALA-8528 > Project: IMPALA > Issue Type: Sub-task > Components: Frontend >Reporter: Fredy Wijaya >Assignee: Fredy Wijaya >Priority: Critical > > Currently the authorization code is scattered across a few places, such as > AnalysisContext and AuthorizationChecker. This makes it difficult to add > features such as pre- and post-authorization checks for audit logging. > We need to consolidate the authorization code into a single place, and perhaps > make AuthorizationChecker an interface and create a > BaseAuthorizationChecker that contains many useful authorization methods.
[jira] [Work started] (IMPALA-8528) Refactor authorization code from AnalysisContext to AuthorizationChecker
[ https://issues.apache.org/jira/browse/IMPALA-8528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-8528 started by Fredy Wijaya. > Refactor authorization code from AnalysisContext to AuthorizationChecker > > > Key: IMPALA-8528 > URL: https://issues.apache.org/jira/browse/IMPALA-8528 > Project: IMPALA > Issue Type: Sub-task > Components: Frontend >Reporter: Fredy Wijaya >Assignee: Fredy Wijaya >Priority: Critical > > Currently the authorization code is scattered across a few places, such as > AnalysisContext and AuthorizationChecker. This makes it difficult to add > features such as pre- and post-authorization checks for audit logging. > We need to consolidate the authorization code into a single place, and perhaps > make AuthorizationChecker an interface and create a > BaseAuthorizationChecker that contains many useful authorization methods.
[jira] [Resolved] (IMPALA-8526) Maven hangs on jenkins.impala.io talking to repository.apache.org
[ https://issues.apache.org/jira/browse/IMPALA-8526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-8526. --- Resolution: Duplicate > Maven hangs on jenkins.impala.io talking to repository.apache.org > - > > Key: IMPALA-8526 > URL: https://issues.apache.org/jira/browse/IMPALA-8526 > Project: IMPALA > Issue Type: Bug > Components: Infrastructure >Affects Versions: Impala 3.3.0 >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Blocker > Labels: broken-build > > We're seeing most precommit builds failing because mvn gets stuck talking to > repository.apache.org. See IMPALA-8516. > I'm going to see if we can avoid it by pruning down our Maven repository > dependencies - we should be able to get all the artifacts from other mirrors.
[jira] [Resolved] (IMPALA-8527) Maven hangs on jenkins.impala.io talking to repository.apache.org
[ https://issues.apache.org/jira/browse/IMPALA-8527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-8527. --- Resolution: Fixed Fix Version/s: Impala 3.3.0 > Maven hangs on jenkins.impala.io talking to repository.apache.org > - > > Key: IMPALA-8527 > URL: https://issues.apache.org/jira/browse/IMPALA-8527 > Project: IMPALA > Issue Type: Bug > Components: Infrastructure >Affects Versions: Impala 3.3.0 >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Blocker > Labels: broken-build > Fix For: Impala 3.3.0 > > > We're seeing most precommit builds failing because mvn gets stuck talking to > repository.apache.org. See IMPALA-8516. > I'm going to see if we can avoid it by pruning down our Maven repository > dependencies - we should be able to get all the artifacts from other mirrors.
[jira] [Reopened] (IMPALA-8527) Maven hangs on jenkins.impala.io talking to repository.apache.org
[ https://issues.apache.org/jira/browse/IMPALA-8527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong reopened IMPALA-8527: --- > Maven hangs on jenkins.impala.io talking to repository.apache.org > - > > Key: IMPALA-8527 > URL: https://issues.apache.org/jira/browse/IMPALA-8527 > Project: IMPALA > Issue Type: Bug > Components: Infrastructure >Affects Versions: Impala 3.3.0 >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Blocker > Labels: broken-build > > We're seeing most precommit builds failing because mvn gets stuck talking to > repository.apache.org. See IMPALA-8516. > I'm going to see if we can avoid it by pruning down our Maven repository > dependencies - we should be able to get all the artifacts from other mirrors.
[jira] [Resolved] (IMPALA-8527) Maven hangs on jenkins.impala.io talking to repository.apache.org
[ https://issues.apache.org/jira/browse/IMPALA-8527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-8527. --- Resolution: Fixed Fix Version/s: Impala 3.3.0 > Maven hangs on jenkins.impala.io talking to repository.apache.org > - > > Key: IMPALA-8527 > URL: https://issues.apache.org/jira/browse/IMPALA-8527 > Project: IMPALA > Issue Type: Bug > Components: Infrastructure >Affects Versions: Impala 3.3.0 >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Blocker > Labels: broken-build > Fix For: Impala 3.3.0 > > > We're seeing most precommit builds failing because mvn gets stuck talking to > repository.apache.org. See IMPALA-8516. > I'm going to see if we can avoid it by pruning down our Maven repository > dependencies - we should be able to get all the artifacts from other mirrors.
[jira] [Commented] (IMPALA-3418) The Impala FE project relies on Z-tools snapshot builds
[ https://issues.apache.org/jira/browse/IMPALA-3418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835876#comment-16835876 ] ASF subversion and git services commented on IMPALA-3418: - Commit 4c6ac151ef2c6efac8a8a3d02c342334bf9d688c in impala's branch refs/heads/master from Tim Armstrong [ https://gitbox.apache.org/repos/asf?p=impala.git;h=4c6ac15 ] IMPALA-8527: prune maven repo dependencies We transitively pull in references to repository.apache.org, which in turn means we'll go looking for most of our dependencies there. Downloading from repository.apache.org occasionally hangs, so there's a high probability of a build getting stuck. I was able to disable repository.apache.org entirely - the same packages are available from other repositories that we don't see the same issues with. Locating snapshot versions is very chatty - we reach out to mvnrepository.com and apache.org repeatedly, but I don't think we actually need to consume any snapshots from them. So I tried to minimise the number of repositories that we'll consume snapshots from - I think we only intend to download snapshots from the CDH repo. Also remove the plugin snapshots repository. We historically needed it because we used a snapshot version of the cup plugin, but that was fixed by IMPALA-3418. Otherwise depending on plugin snapshots seems like a bad idea. Change-Id: I08e1f1b7d7742edd61179ee52b5e268c3b4dc61d Reviewed-on: http://gerrit.cloudera.org:8080/13279 Reviewed-by: Fredy Wijaya Reviewed-by: Todd Lipcon Tested-by: Tim Armstrong > The Impala FE project relies on Z-tools snapshot builds > --- > > Key: IMPALA-3418 > URL: https://issues.apache.org/jira/browse/IMPALA-3418 > Project: IMPALA > Issue Type: Task > Components: Frontend >Affects Versions: Impala 2.0.2 >Reporter: Charlie Helin >Assignee: Charlie Helin >Priority: Major > Fix For: Impala 2.7.0 > > > The FE project relies on the CUP-Maven plugin which is a maven snapshot. 
This > means that its dependents, like CUP-Java and the Java Runtime, which are in > turn referenced as Maven snapshots, will be updated as soon as a new build is > available. > The suggestion is to make a release build of the current bits of Z-tools and > assign it a proper version string, then modify the FE pom to depend on this > version.
[jira] [Commented] (IMPALA-8527) Maven hangs on jenkins.impala.io talking to repository.apache.org
[ https://issues.apache.org/jira/browse/IMPALA-8527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835875#comment-16835875 ] ASF subversion and git services commented on IMPALA-8527: - Commit 4c6ac151ef2c6efac8a8a3d02c342334bf9d688c in impala's branch refs/heads/master from Tim Armstrong [ https://gitbox.apache.org/repos/asf?p=impala.git;h=4c6ac15 ] IMPALA-8527: prune maven repo dependencies We transitively pull in references to repository.apache.org, which in turn means we'll go looking for most of our dependencies there. Downloading from repository.apache.org occasionally hangs, so there's a high probability of a build getting stuck. I was able to disable repository.apache.org entirely - the same packages are available from other repositories that we don't see the same issues with. Locating snapshot versions is very chatty - we reach out to mvnrepository.com and apache.org repeatedly, but I don't think we actually need to consume any snapshots from them. So I tried to minimise the number of repositories that we'll consume snapshots from - I think we only intend to download snapshots from the CDH repo. Also remove the plugin snapshots repository. We historically needed it because we used a snapshot version of the cup plugin, but that was fixed by IMPALA-3418. Otherwise depending on plugin snapshots seems like a bad idea. Change-Id: I08e1f1b7d7742edd61179ee52b5e268c3b4dc61d Reviewed-on: http://gerrit.cloudera.org:8080/13279 Reviewed-by: Fredy Wijaya Reviewed-by: Todd Lipcon Tested-by: Tim Armstrong > Maven hangs on jenkins.impala.io talking to repository.apache.org > - > > Key: IMPALA-8527 > URL: https://issues.apache.org/jira/browse/IMPALA-8527 > Project: IMPALA > Issue Type: Bug > Components: Infrastructure >Affects Versions: Impala 3.3.0 >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Blocker > Labels: broken-build > > We're seeing most precommit builds failing because mvn gets stuck talking to > repository.apache.org. 
See IMPALA-8516. > I'm going to see if we can avoid it by pruning down our Maven repository > dependencies - we should be able to get all the artifacts from other mirrors.
[jira] [Commented] (IMPALA-3381) Impala to support AM/PM format in unix_timestamp and from_unixtime
[ https://issues.apache.org/jira/browse/IMPALA-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835815#comment-16835815 ] Greg Rahn commented on IMPALA-3381: --- I'm good with supporting this use case in the SQL:2016 stuff (IMPALA-4018) and recommending its use vs trying to add the functionality to the legacy methods. > Impala to support AM/PM format in unix_timestamp and from_unixtime > -- > > Key: IMPALA-3381 > URL: https://issues.apache.org/jira/browse/IMPALA-3381 > Project: IMPALA > Issue Type: New Feature > Components: Backend >Affects Versions: Impala 2.5.0 >Reporter: Yibing Shi >Assignee: Attila Jeges >Priority: Minor > Labels: built-in-function > > Need a way to support the format below in {{unix_timestamp}}, and > {{from_unixtime}} if possible: > {noformat} > 20151214 05:15:00.1234 AM > 20151214 05:15:00.1234 PM > {noformat} > An example: > {noformat} > unix_timestamp('20151214 05:15:00.1234 PM', 'MMdd HH:mm:ss.SSS PP') > {noformat} > Of course we can choose a different format, as long as we can interpret the > AM/PM part of a date time string. > *The function needs to support up to nanosecond precision*
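For illustration only, a minimal Python sketch of interpreting the requested `20151214 05:15:00.1234 PM` layout: the AM/PM marker maps the 12-hour clock onto hours 0-23, and the fractional seconds are right-padded to nine digits so up to nanosecond precision survives. The helper `parse_ampm_nanos` is hypothetical, the wall-clock time is assumed to be UTC, `%p` matching assumes the default C locale, and none of this is Impala's `unix_timestamp`/`from_unixtime` implementation or pattern syntax.

```python
from datetime import datetime, timezone


def parse_ampm_nanos(text):
    """Parse 'yyyyMMdd hh:mm:ss[.fraction] AM|PM' into (unix_seconds, nanos).

    Hypothetical helper for illustration; the wall-clock time is treated as UTC.
    """
    date_part, time_part, meridiem = text.split()
    hms, _, frac = time_part.partition(".")
    if len(frac) > 9 or (frac and not frac.isdigit()):
        raise ValueError("fractional seconds must be 1-9 digits: %r" % frac)
    # Right-pad the fraction to 9 digits: '1234' -> 0.1234 s -> 123400000 ns.
    nanos = int(frac.ljust(9, "0")) if frac else 0
    # %I is the 12-hour clock and %p applies AM/PM (12 AM -> 00, 12 PM -> 12).
    whole = datetime.strptime(
        "%s %s %s" % (date_part, hms, meridiem.upper()), "%Y%m%d %I:%M:%S %p")
    seconds = int(whole.replace(tzinfo=timezone.utc).timestamp())
    return seconds, nanos
```

For example, `parse_ampm_nanos("20151214 05:15:00.1234 PM")` returns `(1450113300, 123400000)`, i.e. 2015-12-14 17:15:00 UTC plus 123400000 ns. Keeping seconds and nanoseconds separate sidesteps the microsecond limit of Python's `datetime`.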
[jira] [Resolved] (IMPALA-8324) Cleanup for test_ddl.TestDdlStatements.test_alter_table fails on s3
[ https://issues.apache.org/jira/browse/IMPALA-8324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-8324. --- Resolution: Duplicate This looks like the same general issue > Cleanup for test_ddl.TestDdlStatements.test_alter_table fails on s3 > --- > > Key: IMPALA-8324 > URL: https://issues.apache.org/jira/browse/IMPALA-8324 > Project: IMPALA > Issue Type: Bug > Components: Frontend, Infrastructure >Affects Versions: Impala 3.3.0 >Reporter: Joe McDonnell >Priority: Critical > Labels: broken-build > > test_ddl.TestDdlStatements.test_alter_table has failed twice on s3 with the > following: > {noformat} > conftest.py:315: in cleanup > {'sync_ddl': sync_ddl}) > common/impala_test_suite.py:601: in wrapper > return function(*args, **kwargs) > common/impala_test_suite.py:609: in execute_query_expect_success > result = cls.__execute_query(impalad_client, query, query_options, user) > common/impala_test_suite.py:699: in __execute_query > return impalad_client.execute(query, user=user) > common/impala_connection.py:174: in execute > return self.__beeswax_client.execute(sql_stmt, user=user) > beeswax/impala_beeswax.py:183: in execute > handle = self.__execute_query(query_string.strip(), user=user) > beeswax/impala_beeswax.py:358: in __execute_query > handle = self.execute_query_async(query_string, user=user) > beeswax/impala_beeswax.py:352: in execute_query_async > handle = self.__do_rpc(lambda: self.imp_service.query(query,)) > beeswax/impala_beeswax.py:512: in __do_rpc > raise ImpalaBeeswaxException(self.__build_error_message(b), b) > E ImpalaBeeswaxException: ImpalaBeeswaxException: > EINNER EXCEPTION: > EMESSAGE: ImpalaRuntimeException: Error making 'dropDatabase' RPC to Hive > Metastore: > E CAUSED BY: NoSuchObjectException: test_alter_table_db234c3f{noformat} > This is the unique database created/destroyed by the unique_database test > fixture. In looking at the logs, I don't see any duplicate dropping of this > database. 
[jira] [Commented] (IMPALA-7154) Error making 'dropDatabase' RPC to Hive Metastore. NoSuchObjectException thrown
[ https://issues.apache.org/jira/browse/IMPALA-7154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835811#comment-16835811 ] Tim Armstrong commented on IMPALA-7154: --- Failed again: {noformat} 19:18:01 ERRORS 19:18:01 ERROR at teardown of TestMtDop.test_compute_stats[mt_dop: 2 | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: text/none] 19:18:01 [gw1] linux2 -- Python 2.7.5 /data/jenkins/workspace/impala-cdh5-trunk-core-s3/repos/Impala/bin/../infra/python/env/bin/python 19:18:01 conftest.py:297: in cleanup 19:18:01 {'sync_ddl': sync_ddl}) 19:18:01 common/impala_test_suite.py:523: in wrapper 19:18:01 return function(*args, **kwargs) 19:18:01 common/impala_test_suite.py:531: in execute_query_expect_success 19:18:01 result = cls.__execute_query(impalad_client, query, query_options, user) 19:18:01 common/impala_test_suite.py:621: in __execute_query 19:18:01 return impalad_client.execute(query, user=user) 19:18:01 common/impala_connection.py:160: in execute 19:18:01 return self.__beeswax_client.execute(sql_stmt, user=user) 19:18:01 beeswax/impala_beeswax.py:173: in execute 19:18:01 handle = self.__execute_query(query_string.strip(), user=user) 19:18:01 beeswax/impala_beeswax.py:339: in __execute_query 19:18:01 handle = self.execute_query_async(query_string, user=user) 19:18:01 beeswax/impala_beeswax.py:335: in execute_query_async 19:18:01 return self.__do_rpc(lambda: self.imp_service.query(query,)) 19:18:01 beeswax/impala_beeswax.py:460: in __do_rpc 19:18:01 raise ImpalaBeeswaxException(self.__build_error_message(b), b) 19:18:01 E ImpalaBeeswaxException: ImpalaBeeswaxException: 19:18:01 EINNER EXCEPTION: 19:18:01 EMESSAGE: ImpalaRuntimeException: Error making 'dropDatabase' RPC to Hive Metastore: 19:18:01 E CAUSED BY: NoSuchObjectException: test_compute_stats_2b9fb4b2 19:18:01 Captured stderr setup - 19:18:01 
SET sync_ddl=False; 19:18:01 -- executing against localhost:21000 19:18:01 DROP DATABASE IF EXISTS `test_compute_stats_2b9fb4b2` CASCADE; 19:18:01 19:18:01 SET sync_ddl=False; 19:18:01 -- executing against localhost:21000 19:18:01 CREATE DATABASE `test_compute_stats_2b9fb4b2`; 19:18:01 19:18:01 MainThread: Created database "test_compute_stats_2b9fb4b2" for test ID "query_test/test_mt_dop.py::TestMtDop::()::test_compute_stats[mt_dop: 2 | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: text/none]" 19:18:01 - Captured stderr call - 19:18:01 -- executing against localhost:21000 19:18:01 use functional; 19:18:01 19:18:01 -- executing against localhost:21000 19:18:01 describe formatted alltypes; 19:18:01 19:18:01 -- executing against localhost:21000 19:18:01 use functional; 19:18:01 19:18:01 -- executing against localhost:21000 19:18:01 create external table test_compute_stats_2b9fb4b2.mt_dop like alltypes location 's3a://impala-test-uswest2-2/test-warehouse/alltypes'; 19:18:01 19:18:01 -- executing against localhost:21000 19:18:01 alter table test_compute_stats_2b9fb4b2.mt_dop recover partitions; 19:18:01 19:18:01 SET mt_dop=2; 19:18:01 SET batch_size=0; 19:18:01 SET num_nodes=0; 19:18:01 SET disable_codegen_rows_threshold=0; 19:18:01 SET disable_codegen=False; 19:18:01 SET abort_on_error=1; 19:18:01 SET exec_single_node_rows_threshold=0; 19:18:01 -- executing against localhost:21000 19:18:01 compute stats test_compute_stats_2b9fb4b2.mt_dop; 19:18:01 19:18:01 --- Captured stderr teardown --- 19:18:01 -- executing against localhost:21000 19:18:01 use default; 19:18:01 19:18:01 SET sync_ddl=False; 19:18:01 -- executing against localhost:21000 19:18:01 DROP DATABASE `test_compute_stats_2b9fb4b2` CASCADE; 19:18:01 19:18:01 ERROR at teardown of TestMtDop.test_compute_stats[mt_dop: 8 | exec_option: {'batch_size': 0, 'num_nodes': 0, 
'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: kudu/none] 19:18:01 [gw6] linux2 -- Python 2.7.5 /data/jenkins/workspace/impala-cdh5-trunk-core-s3/repos/Impala/bin/../infra/python/env/bin/python 19:18:01 conftest.py:297: in cleanup 19:18:01 {'sync_ddl': sync_ddl}) 19:18:01 common/impala_test_suite.py:523: in wrapper 19:18:01 return function(*args, **kwargs) 19:18:01 common/impala_test_suite.py:531: in execute_query_expect_success 19:18:01 result = cls.__execute_query(impalad_client, query, query_options, user) 19:18:01
[jira] [Resolved] (IMPALA-8527) Maven hangs on jenkins.impala.io talking to repository.apache.org
[ https://issues.apache.org/jira/browse/IMPALA-8527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-8527. --- Resolution: Duplicate > Maven hangs on jenkins.impala.io talking to repository.apache.org > - > > Key: IMPALA-8527 > URL: https://issues.apache.org/jira/browse/IMPALA-8527 > Project: IMPALA > Issue Type: Bug > Components: Infrastructure >Affects Versions: Impala 3.3.0 >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Blocker > Labels: broken-build > > We're seeing most precommit builds failing because mvn gets stuck talking to > repository.apache.org. See IMPALA-8516. > I'm going to see if we can avoid it by pruning down our Maven repository > dependencies - we should be able to get all the artifacts from other mirrors.
[jira] [Commented] (IMPALA-8512) Data cache tests failing on older CentOS 6 versions
[ https://issues.apache.org/jira/browse/IMPALA-8512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835797#comment-16835797 ] ASF subversion and git services commented on IMPALA-8512: - Commit 460aef657a7afdb0aa3bf91ac0f92c0496624379 in impala's branch refs/heads/master from Michael Ho [ https://gitbox.apache.org/repos/asf?p=impala.git;h=460aef6 ] IMPALA-8512: Disable certain tests on Centos6 The data cache related tests rely on data cache files being created successfully on the local filesystem. The cache initialization may fail if the cache directory resides on an ext filesystem which is affected by KUDU-1508 (metadata corruption after hole punching in some files). On some older versions of Centos6, the tests fail as a result of this bug. This change skips these tests if they detect that they are running on an old system affected by KUDU-1508. This patch also disables a filesystem-util test which relies on readdir() returning the correct entries' types. On some older platforms such as Centos6, this feature may not be fully supported on all filesystems. Change-Id: Ifbff15415bc690f779a09ec93a7ded8b394eca10 Reviewed-on: http://gerrit.cloudera.org:8080/13271 Reviewed-by: Impala Public Jenkins Tested-by: Tim Armstrong > Data cache tests failing on older CentOS 6 versions > --- > > Key: IMPALA-8512 > URL: https://issues.apache.org/jira/browse/IMPALA-8512 > Project: IMPALA > Issue Type: Improvement > Components: Infrastructure >Reporter: Tim Armstrong >Assignee: Michael Ho >Priority: Blocker > Labels: broken-build > > They are failing with errors like: > {noformat} > Error: Data dir /tmp/data-cache-test.0 is on an ext4 filesystem vulnerable to > KUDU-1508.
> {noformat} > {noformat} > custom_cluster.test_data_cache.TestDataCache.test_data_cache_deterministic[protocol: > beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, > 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, > 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: > text/none] > DataCacheTest.TestBasics > DataCacheTest.RotateFiles > DataCacheTest.RotateAndDeleteFiles > DataCacheTest.Eviction > DataCacheTest.MultiThreadedNoMisses > DataCacheTest.MultiThreadedWithMisses > DataCacheTest.MultiPartitions > DataCacheTest.LargeFootprint > FilesystemUtil.DirEntryTypes > custom_cluster.test_data_cache.TestDataCache.test_data_cache[protocol: > beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, > 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, > 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: > text/none] > {noformat} > Can we disable these tests on affected systems?
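The commit above skips tests when it detects an affected system, and one input to such a check is the filesystem type backing the cache directory. A minimal Python sketch, assuming the Linux `/proc/mounts` layout (device, mount point, type, options, ...) and ignoring escaped characters in mount points: choose the longest mount point that contains the path. `fs_type_for_path` and `read_proc_mounts` are hypothetical helpers, not Impala's code, and the sketch does not decide whether a given ext4 system is actually vulnerable to KUDU-1508.

```python
import os


def fs_type_for_path(path, mounts_text):
    """Return the filesystem type of the mount containing `path`, or None.

    `mounts_text` uses the /proc/mounts layout, one mount per line:
    "<device> <mount point> <fs type> <options> <dump> <pass>".
    """
    path = os.path.abspath(path)
    best_mount, best_type = None, None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        # The mount contains `path` if it is the path itself or a parent dir.
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # The deepest (longest) matching mount point wins.
            if best_mount is None or len(mount_point) > len(best_mount):
                best_mount, best_type = mount_point, fs_type
    return best_type


def read_proc_mounts():
    """Read the current mount table on a Linux host."""
    with open("/proc/mounts") as f:
        return f.read()
```

On a live Linux host this would be called as `fs_type_for_path("/tmp/data-cache-test.0", read_proc_mounts())`; a test could then call `pytest.skip(...)` only when this result and a separate vulnerability check both point to an affected system.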
[jira] [Commented] (IMPALA-8517) bootstrap_toolchain.py failed: 'ascii' codec can't encode character u'\u2018' in position 742: ordinal not in range(128)
[ https://issues.apache.org/jira/browse/IMPALA-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835798#comment-16835798 ] ASF subversion and git services commented on IMPALA-8517: - Commit 9a216f1de96722a43056a724ea22071383b58360 in impala's branch refs/heads/master from Tim Armstrong [ https://gitbox.apache.org/repos/asf?p=impala.git;h=9a216f1 ] IMPALA-8517: print backtrace to debug bootstrap_toolchain This should help track down the source of the exception if the flakiness reoccurs. Change-Id: Ia6205d024c67c6c70ec49e4e65967d5c91b48428 Reviewed-on: http://gerrit.cloudera.org:8080/13270 Tested-by: Tim Armstrong Reviewed-by: Tim Armstrong > bootstrap_toolchain.py failed: 'ascii' codec can't encode character u'\u2018' > in position 742: ordinal not in range(128) > > > Key: IMPALA-8517 > URL: https://issues.apache.org/jira/browse/IMPALA-8517 > Project: IMPALA > Issue Type: Bug >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Critical > Labels: flaky > > {noformat} > 18:38:17 2019-05-06 18:38:17,093 Thread-203 INFO: Downloading > https://native-toolchain.s3.amazonaws.com/build/cdh_components/1055188/tarballs/hadoop-3.0.0-cdh6.x-SNAPSHOT.tar.gz > to > /data/jenkins/workspace/impala-asf-master-exhaustive-release/Impala-Toolchain/cdh_components-1055188/hadoop-3.0.0-cdh6.x-SNAPSHOT.tar.gz > (attempt 1) > 18:38:17 2019-05-06 18:38:17,093 Thread-204 INFO: Downloading > https://native-toolchain.s3.amazonaws.com/build/cdh_components/1055188/tarballs/sentry-2.1.0-cdh6.x-SNAPSHOT.tar.gz > to > /data/jenkins/workspace/impala-asf-master-exhaustive-release/Impala-Toolchain/cdh_components-1055188/sentry-2.1.0-cdh6.x-SNAPSHOT.tar.gz > (attempt 1) > 18:38:17 2019-05-06 18:38:17,093 Thread-202 INFO: Downloading > https://native-toolchain.s3.amazonaws.com/build/cdh_components/1055188/tarballs/hbase-2.1.0-cdh6.x-SNAPSHOT.tar.gz > to > 
/data/jenkins/workspace/impala-asf-master-exhaustive-release/Impala-Toolchain/cdh_components-1055188/hbase-2.1.0-cdh6.x-SNAPSHOT.tar.gz > (attempt 1) > 18:38:17 2019-05-06 18:38:17,094 Thread-205 INFO: Downloading > https://native-toolchain.s3.amazonaws.com/build/cdh_components/1055188/tarballs/hive-2.1.1-cdh6.x-SNAPSHOT.tar.gz > to > /data/jenkins/workspace/impala-asf-master-exhaustive-release/Impala-Toolchain/cdh_components-1055188/hive-2.1.1-cdh6.x-SNAPSHOT.tar.gz > (attempt 1) > 18:38:21 2019-05-06 18:38:21,430 Thread-205 INFO: Extracting > hive-2.1.1-cdh6.x-SNAPSHOT.tar.gz > 18:38:23 2019-05-06 18:38:23,031 Thread-203 INFO: Extracting > hadoop-3.0.0-cdh6.x-SNAPSHOT.tar.gz > 18:38:24 2019-05-06 18:38:24,012 Thread-205 INFO: Downloading > https://native-toolchain.s3.amazonaws.com/build/cdh_components/1055188/tarballs/kudu-1.10.0-cdh6.x-SNAPSHOT-redhat7.tar.gz > to > /data/jenkins/workspace/impala-asf-master-exhaustive-release/Impala-Toolchain/cdh_components-1055188/kudu-1.10.0-cdh6.x-SNAPSHOT-redhat7.tar.gz > (attempt 1) > 18:38:37 2019-05-06 18:38:37,805 Thread-202 INFO: Extracting > hbase-2.1.0-cdh6.x-SNAPSHOT.tar.gz > 18:38:38 2019-05-06 18:38:38,786 Thread-205 INFO: Extracting > kudu-1.10.0-cdh6.x-SNAPSHOT-redhat7.tar.gz > 18:43:33 Traceback (most recent call last): > 18:43:33 File > "/data/jenkins/workspace/impala-asf-master-exhaustive-release/repos/Impala/bin/bootstrap_toolchain.py", > line 564, in > 18:43:33 download_cdh_components(toolchain_root, cdh_components, > download_path_prefix) > 18:43:33 File > "/data/jenkins/workspace/impala-asf-master-exhaustive-release/repos/Impala/bin/bootstrap_toolchain.py", > line 437, in download_cdh_components > 18:43:33 execute_many(download, cdh_components) > 18:43:33 File > "/data/jenkins/workspace/impala-asf-master-exhaustive-release/repos/Impala/bin/bootstrap_toolchain.py", > line 399, in execute_many > 18:43:33 return pool.map(f, args, 1) > 18:43:33 File "/usr/lib64/python2.7/multiprocessing/pool.py", line 250, in 
> map > 18:43:33 return self.map_async(func, iterable, chunksize).get() > 18:43:33 File "/usr/lib64/python2.7/multiprocessing/pool.py", line 554, in > get > 18:43:33 raise self._value > 18:43:33 UnicodeEncodeError: 'ascii' codec can't encode character u'\u2018' > in position 742: ordinal not in range(128) > {noformat} > Unfortunately I can't see where the original exception came from. I'll see if I > can add something that would print the backtrace.
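The traceback ends at `raise self._value` inside `multiprocessing/pool.py`: on Python 2.7, `Pool.map` re-raises the worker's exception in the parent without the worker's own stack, which is why the origin of the `UnicodeEncodeError` is not visible. A minimal sketch of one way to surface it, not necessarily what the commit does: format the traceback inside the worker with `traceback.format_exc()` and write it to stderr before re-raising. `log_worker_exceptions` and `map_with_tracebacks` are hypothetical names, and `ThreadPool` (which shares `Pool`'s `map` interface) is used so the wrapping closure never has to be pickled.

```python
import functools
import sys
import traceback
from multiprocessing.pool import ThreadPool


def log_worker_exceptions(fn):
    """Wrap `fn` so an exception's full traceback is written to stderr from
    inside the worker, before the pool re-raises it in the caller."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except Exception:
            sys.stderr.write(traceback.format_exc())
            raise
    return wrapper


def map_with_tracebacks(fn, items, processes=4):
    """Like pool.map(fn, items, 1), but logs worker-side tracebacks."""
    pool = ThreadPool(processes)
    try:
        return pool.map(log_worker_exceptions(fn), items, 1)
    finally:
        pool.close()
        pool.join()
```

With a process-based `Pool`, the wrapper would have to be a module-level, picklable callable (for example a small class defining `__call__`) rather than a closure.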
[jira] [Resolved] (IMPALA-8520) Add Solr into the Impala minicluster
[ https://issues.apache.org/jira/browse/IMPALA-8520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fredy Wijaya resolved IMPALA-8520. -- Resolution: Won't Fix I decided not to introduce another heavyweight dependency since Impala already has many components in its mini cluster. I'll figure out a way to test the Ranger audit log without having to spin up Solr. > Add Solr into the Impala minicluster > > > Key: IMPALA-8520 > URL: https://issues.apache.org/jira/browse/IMPALA-8520 > Project: IMPALA > Issue Type: Sub-task > Components: Infrastructure >Reporter: Fredy Wijaya >Assignee: Fredy Wijaya >Priority: Critical > > Solr is needed by Ranger for audit log. In order to test Impala/Ranger audit > log integration, we need to download and configure Solr with Ranger.
[jira] [Created] (IMPALA-8526) Maven hangs on jenkins.impala.io talking to repository.apache.org
Tim Armstrong created IMPALA-8526: - Summary: Maven hangs on jenkins.impala.io talking to repository.apache.org Key: IMPALA-8526 URL: https://issues.apache.org/jira/browse/IMPALA-8526 Project: IMPALA Issue Type: Bug Components: Infrastructure Affects Versions: Impala 3.3.0 Reporter: Tim Armstrong Assignee: Tim Armstrong We're seeing most precommit builds failing because mvn gets stuck talking to repository.apache.org. See IMPALA-8516. I'm going to see if we can avoid it by pruning down our Maven repository dependencies - we should be able to get all the artifacts from other mirrors.
[jira] [Created] (IMPALA-8527) Maven hangs on jenkins.impala.io talking to repository.apache.org
Tim Armstrong created IMPALA-8527: - Summary: Maven hangs on jenkins.impala.io talking to repository.apache.org Key: IMPALA-8527 URL: https://issues.apache.org/jira/browse/IMPALA-8527 Project: IMPALA Issue Type: Bug Components: Infrastructure Affects Versions: Impala 3.3.0 Reporter: Tim Armstrong Assignee: Tim Armstrong We're seeing most precommit builds failing because mvn gets stuck talking to repository.apache.org. See IMPALA-8516. I'm going to see if we can avoid it by pruning down our Maven repository dependencies - we should be able to get all the artifacts from other mirrors.
[jira] [Commented] (IMPALA-8516) Update maven on Jenkins Ubuntu build slaves
[ https://issues.apache.org/jira/browse/IMPALA-8516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835711#comment-16835711 ] Tim Armstrong commented on IMPALA-8516: --- Until it gets stuck again downloading from repository.apache.org: {noformat} Downloading: https://repository.apache.org/content/repositories/org/apache/kerby/kerb-crypto/1.0.0/kerb-crypto-1.0.0.pom {noformat} > Update maven on Jenkins Ubuntu build slaves > --- > > Key: IMPALA-8516 > URL: https://issues.apache.org/jira/browse/IMPALA-8516 > Project: IMPALA > Issue Type: Task >Reporter: Todd Lipcon >Assignee: Todd Lipcon >Priority: Major > > Currently we're installing maven from an apt repository, which ends up giving > us a relatively old version. It seems we might be hitting HTTPCLIENT-1478 in > the version of httpclient that ends up getting bundled into that package. We > should update to an explicitly downloaded maven tarball and see if it fixes > the hangs in SSL connections.
[jira] [Commented] (IMPALA-8516) Update maven on Jenkins Ubuntu build slaves
[ https://issues.apache.org/jira/browse/IMPALA-8516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835708#comment-16835708 ] Tim Armstrong commented on IMPALA-8516: --- I ran locally on ubuntu 16.04 after blowing away ~/.m2/repository and I see it get stuck at: {noformat} Downloading: https://repository.apache.org/content/repositories/com/github/stephenc/jcip/jcip-annotations/1.0-1/jcip-annotations-1.0-1.pom {noformat} When I retry it's fine: {noformat} Downloading: http://nexus-private.hortonworks.com/nexus/content/groups/public/com/github/stephenc/jcip/jcip-annotations/1.0-1/jcip-annotations-1.0-1.pom Downloaded: http://nexus-private.hortonworks.com/nexus/content/groups/public/com/github/stephenc/jcip/jcip-annotations/1.0-1/jcip-annotations-1.0-1.pom (6 KB at 23.7 KB/sec) [DEBUG] Writing tracking file /home/tarmstrong/.m2/repository/com/github/stephenc/jcip/jcip-annotations/1.0-1/_remote.repositories [DEBUG] Writing tracking file /home/tarmstrong/.m2/repository/com/github/stephenc/jcip/jcip-annotations/1.0-1/jcip-annotations-1.0-1.pom.lastUpdated {noformat} > Update maven on Jenkins Ubuntu build slaves > --- > > Key: IMPALA-8516 > URL: https://issues.apache.org/jira/browse/IMPALA-8516 > Project: IMPALA > Issue Type: Task >Reporter: Todd Lipcon >Assignee: Todd Lipcon >Priority: Major > > Currently we're installing maven from an apt repository, which ends up giving > us a relatively old version. It seems we might be hitting HTTPCLIENT-1478 in > the version of httpclient that ends up getting bundled into that package. We > should update to an explicitly downloaded maven tarball and see if it fixes > the hangs in SSL connections.
[jira] [Commented] (IMPALA-8521) Lots of "unreleased ByteBuffers allocated by read()" errors from HDFS client
[ https://issues.apache.org/jira/browse/IMPALA-8521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835706#comment-16835706 ] Tim Armstrong commented on IMPALA-8521: --- I looked through the job logs and the query is: {noformat} select count(*) from test_hdfs_caching_fallback_path_a02e3860.cached_nation {noformat} So it must be test_hdfs_caching_fallback_path in tests/query_test/test_hdfs_caching.py. I don't have a working minicluster right this second, but I'd bet that running that test is sufficient to reproduce. I'll also send you links to a couple of jenkins jobs offline. > Lots of "unreleased ByteBuffers allocated by read()" errors from HDFS client > > > Key: IMPALA-8521 > URL: https://issues.apache.org/jira/browse/IMPALA-8521 > Project: IMPALA > Issue Type: Bug > Components: Backend >Reporter: Tim Armstrong >Assignee: Sahil Takiar >Priority: Critical > > I'm looking at some job logs and seeing a bunch of errors like this. I don't > know if it's benign or if it's something more serious. > {noformat} > I0507 07:34:53.934693 20195 scan-range.cc:607] > dd4d6eb8d2ad9587:6b44fe1b0002] Cache read failed for scan range: > file=hdfs://localhost:20500/test-warehouse/f861f1a3/nation.tbl disk_id=0 > offset=1024 exclusive_hdfs_fh=0xec09220 num_remote_bytes=0 cancel_status= > buffer_queue=0 num_buffers_in_readers=0 unused_iomgr_buffers=0 > unused_iomgr_buffer_bytes=0 blocked_on_buffer=0. Switching to disk read path. > W0507 07:34:53.934787 20195 DFSInputStream.java:668] > dd4d6eb8d2ad9587:6b44fe1b0002] closing file > /test-warehouse/f861f1a3/nation.tbl, but there are still unreleased > ByteBuffers allocated by read(). Please release > java.nio.DirectByteBufferR[pos=1024 lim=2048 cap=2199]. > {noformat}
[jira] [Commented] (IMPALA-2990) Coordinator should timeout and cancel queries with unresponsive / stuck executors
[ https://issues.apache.org/jira/browse/IMPALA-2990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835705#comment-16835705 ] Michael Brown commented on IMPALA-2990: --- Congrats on resolving this! > Coordinator should timeout and cancel queries with unresponsive / stuck > executors > - > > Key: IMPALA-2990 > URL: https://issues.apache.org/jira/browse/IMPALA-2990 > Project: IMPALA > Issue Type: Bug > Components: Distributed Exec >Affects Versions: Impala 2.3.0 >Reporter: Sailesh Mukil >Assignee: Thomas Tauber-Marshall >Priority: Critical > Labels: hang, observability, supportability > Fix For: Impala 3.3.0 > > > The coordinator currently waits indefinitely if it does not hear back from a > backend. This could cause a query to hang indefinitely in case of a network > error, etc. > We should add logic for determining when a backend is unresponsive and kill > the query. The logic should mostly revolve around Coordinator::Wait() and > Coordinator::UpdateFragmentExecStatus() based on whether it receives periodic > updates from a backend (via FragmentExecState::ReportStatusCb()).
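A minimal Python sketch of the detection logic the description outlines, assuming each backend is expected to send periodic status reports: record when each backend last reported and flag any that has been silent for longer than a timeout, at which point the coordinator could cancel the query. `ReportWatchdog`, its methods, and the injectable clock are hypothetical; this is not the change that resolved IMPALA-2990.

```python
import time


class ReportWatchdog:
    """Tracks the last status report per backend and flags silent ones."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock          # injectable for testing
        self.last_report = {}       # backend -> time of last report

    def register(self, backend):
        # Start timing when the backend's fragments are started.
        self.last_report[backend] = self.clock()

    def on_report(self, backend):
        # Any periodic status update counts as proof of liveness.
        self.last_report[backend] = self.clock()

    def on_done(self, backend):
        # A backend that has finished no longer needs to report.
        self.last_report.pop(backend, None)

    def unresponsive(self):
        """Return backends silent for longer than timeout_s, sorted."""
        now = self.clock()
        return sorted(b for b, t in self.last_report.items()
                      if now - t > self.timeout_s)
```

A coordinator loop could poll `unresponsive()` and cancel the query when it returns anything; the main design decision is choosing the timeout as a comfortable multiple of the report interval so that a single delayed report does not kill a healthy query.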
[jira] [Created] (IMPALA-8525) preads should use hdfsPreadFully rather than hdfsPread
Sahil Takiar created IMPALA-8525: Summary: preads should use hdfsPreadFully rather than hdfsPread Key: IMPALA-8525 URL: https://issues.apache.org/jira/browse/IMPALA-8525 Project: IMPALA Issue Type: Improvement Components: Backend Reporter: Sahil Takiar Assignee: Sahil Takiar Impala preads (only enabled if {{use_hdfs_pread}} is true) use the {{hdfsPread}} API from libhdfs, which ultimately invokes {{PositionedReadable#read(long position, byte[] buffer, int offset, int length)}} in the HDFS client. {{PositionedReadable}} also exposes the method {{readFully(long position, byte[] buffer, int offset, int length)}}. The difference is that {{#read}} will "Read up to the specified number of bytes" whereas {{#readFully}} will "Read the specified number of bytes", so there is no guarantee that {{#read}} will read *all* of the requested bytes. Impala calls {{hdfsPread}} inside {{hdfs-file-reader.cc}}, invoking it in a while loop until all the requested bytes have been read from the file. This can cause a few performance issues: (1) if the underlying {{FileSystem}} does not support ByteBuffer reads (HDFS-2834) (e.g. S3A does not support this feature), then {{hdfsPread}} will allocate a Java array equal in size to the specified length of the buffer; the call to {{PositionedReadable#read}} may fill the buffer only partially; Impala will then repeat the call to {{hdfsPread}} since the buffer was not filled, which causes another large array allocation; this can waste a lot of time on unnecessary array allocations. (2) Given that Impala calls {{hdfsPread}} in a while loop, there is no point in repeatedly calling {{hdfsPread}} when a single call to {{hdfsPreadFully}} will achieve the same thing (this doesn't affect performance much, but is unnecessary). Prior solutions to this problem have been to introduce a "chunk-size" to Impala reads (https://gerrit.cloudera.org/#/c/63/ - S3: DiskIoMgr related changes for S3).
However, with the migration to {{hdfsPreadFully}} the chunk-size is no longer necessary. Furthermore, preads are most effective when the data is read all at once (e.g. in 8 MB chunks as specified by {{read_size}}) rather than in smaller chunks (typically 128K). For example, {{DFSInputStream#read(long position, byte[] buffer, int offset, int length)}} opens up remote block readers with a byte range determined by the value of {{length}} passed into the {{#read}} call. Similarly, {{S3AInputStream#readFully}} will issue an HTTP GET request with the size of the read specified by the given {{length}} (although fadvise must be set to RANDOM for this to work). This work is dependent on exposing {{readFully}} via libhdfs first: HDFS-14478
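The cost of looping over a "read up to n bytes" call can be modeled in a few lines of Python. This is a toy sketch, not libhdfs: `FakeStream` and `read_loop` are hypothetical names, and the assumption (taken from point (1) of the issue) is that every partial-read call allocates a temporary array sized to the whole remaining request even though it may return only part of it.

```python
class FakeStream:
    """Toy positioned-read stream (hypothetical; not libhdfs).

    Every pread() call allocates a temporary buffer sized to the full request,
    mimicking the copy path used when ByteBuffer reads are unavailable, but
    returns at most `max_chunk` bytes.
    """

    def __init__(self, data: bytes, max_chunk: int):
        self.data = data
        self.max_chunk = max_chunk
        self.allocated_bytes = 0  # running total of temporary buffer sizes

    def pread(self, position: int, length: int) -> bytes:
        # "Read up to the specified number of bytes": may return fewer.
        self.allocated_bytes += length
        end = min(position + min(length, self.max_chunk), len(self.data))
        return self.data[position:end]


def read_loop(stream: FakeStream, position: int, length: int) -> bytes:
    """Call pread() repeatedly until `length` bytes arrive (the while-loop pattern)."""
    out = bytearray()
    while len(out) < length:
        chunk = stream.pread(position + len(out), length - len(out))
        if not chunk:  # EOF before the request was satisfied
            raise EOFError(f"wanted {length} bytes, got {len(out)}")
        out += chunk
    return bytes(out)


data = bytes(range(256)) * 4          # 1024 bytes
s = FakeStream(data, max_chunk=100)
assert read_loop(s, 0, 1000) == data[:1000]
print(s.allocated_bytes)  # 5500: ten calls, sizes 1000, 900, ..., 100
```

Reading 1000 bytes this way allocates 5500 bytes of temporary buffers. The argument in the issue is that a single "read fully" call moves the loop inside the client, so (assuming it allocates its temporary array once) only one request-sized allocation is needed.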
[jira] [Created] (IMPALA-8524) Avoid calling "hive" via command line in EE tests
Csaba Ringhofer created IMPALA-8524: --- Summary: Avoid calling "hive" via command line in EE tests Key: IMPALA-8524 URL: https://issues.apache.org/jira/browse/IMPALA-8524 Project: IMPALA Issue Type: Bug Components: Infrastructure Reporter: Csaba Ringhofer "hive -e SQL..." without further parameters no longer works when USE_CDP_HIVE=true (it doesn't establish a connection). Some tests used this to load data. These calls can be replaced with ImpalaTestSuite.run_stmt_in_hive() which seems like a good idea regardless of the Hive 3 efforts.
[jira] [Created] (IMPALA-8523) Migrate hdfsOpen to builder-based openFile API
Sahil Takiar created IMPALA-8523: Summary: Migrate hdfsOpen to builder-based openFile API Key: IMPALA-8523 URL: https://issues.apache.org/jira/browse/IMPALA-8523 Project: IMPALA Issue Type: Improvement Components: Backend Reporter: Sahil Takiar Assignee: Sahil Takiar When opening files via libhdfs we call {{hdfsOpen}}, which ultimately calls {{FileSystem#open(Path f, int bufferSize)}}. As of HADOOP-15229, the HDFS client now exposes a new API for opening files called {{openFile}}. The new API has a few advantages: (1) it can specify file-specific configuration values in a builder-based manner (see {{o.a.h.fs.FSBuilder}} for details), and (2) it can open files asynchronously (e.g. see {{o.a.h.fs.FutureDataInputStreamBuilder}} for details). The async file opens are similar to IMPALA-7738 (Implement timeouts for HDFS open calls). To avoid overlap between IMPALA-7738 and the async file opens in {{openFile}}, HADOOP-15691 can be used to check which filesystems open files asynchronously and which ones don't (currently only S3A opens files asynchronously). The main use case for the new {{openFile}} API is Impala-S3 performance. Performance benchmarks have shown that setting {{fs.s3a.experimental.input.fadvise}} to {{RANDOM}} for Parquet files can significantly improve performance; however, this setting also adversely affects scans of non-splittable file formats such as gzipped files (see HADOOP-13203). One solution is simply to document that setting {{fs.s3a.experimental.input.fadvise}} to {{RANDOM}} improves Parquet performance; a better solution would be to use the new {{openFile}} API to specify different fadvise values depending on the file type. This work is dependent on exposing the new {{openFile}} API via libhdfs (HDFS-14478).
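The "different fadvise values depending on the file type" idea can be sketched as a small policy function that produces per-file options, which an {{openFile}} builder could then receive (the issue points to {{o.a.h.fs.FSBuilder}} for how such options are passed). This is a hypothetical Python sketch: `choose_fadvise` and `open_options` are invented names, detecting the format from the file extension is a simplifying assumption (Impala would more likely use the table's declared file format), and only the key {{fs.s3a.experimental.input.fadvise}} and the S3A values `normal`, `sequential`, and `random` come from outside this sketch.

```python
import os

FADVISE_KEY = "fs.s3a.experimental.input.fadvise"


def choose_fadvise(path: str) -> str:
    """Pick an S3A fadvise policy from the file type (hypothetical policy).

    Parquet readers jump around the file (footer first, then column chunks),
    so random-access reads suit them; non-splittable gzip files are consumed
    front to back, so sequential reads suit them better.
    """
    ext = os.path.splitext(path.lower())[1]  # assumption: format from extension
    if ext in (".parq", ".parquet"):
        return "random"
    if ext in (".gz", ".gzip"):
        return "sequential"
    return "normal"


def open_options(path: str) -> dict:
    """Per-file options that could be handed to an openFile() builder."""
    return {FADVISE_KEY: choose_fadvise(path)}
```

Keeping the choice per file, rather than in a cluster-wide setting, is what avoids the gzip regression described in HADOOP-13203 while keeping the Parquet speedup.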
[jira] [Commented] (IMPALA-8521) Lots of "unreleased ByteBuffers allocated by read()" errors from HDFS client
[ https://issues.apache.org/jira/browse/IMPALA-8521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835635#comment-16835635 ] Sahil Takiar commented on IMPALA-8521: -- [~tarmstrong] do you have an easy way to reproduce this? Do you have the stderr / stdout of the impalads? libhdfs prints directly to stderr / stdout, so it may have printed some error messages there. I'm not sure what is causing the cache read to fail; I'm hoping the stderr / stdout shows something. {{but there are still unreleased ByteBuffers allocated by read()}} is printed by the HDFS client when we do a zero-copy read (HDFS-4953) but don't call {{releaseBuffer}} on the returned {{ByteBuffer}}. It looks like there are two issues: (1) cache reads are failing for some unknown reason, and (2) {{hadoopReadZero}} in libhdfs can apparently succeed in calling {{HasEnhancedByteBufferAccess#read}} but fail afterwards, and then forget to call {{releaseBuffer}} on the buffer returned by {{#read}}. > Lots of "unreleased ByteBuffers allocated by read()" errors from HDFS client > > > Key: IMPALA-8521 > URL: https://issues.apache.org/jira/browse/IMPALA-8521 > Project: IMPALA > Issue Type: Bug > Components: Backend >Reporter: Tim Armstrong >Assignee: Sahil Takiar >Priority: Critical > > I'm looking at some job logs and seeing a bunch of errors like this. I don't > know if it's benign or if it's something more serious. > {noformat} > I0507 07:34:53.934693 20195 scan-range.cc:607] > dd4d6eb8d2ad9587:6b44fe1b0002] Cache read failed for scan range: > file=hdfs://localhost:20500/test-warehouse/f861f1a3/nation.tbl disk_id=0 > offset=1024 exclusive_hdfs_fh=0xec09220 num_remote_bytes=0 cancel_status= > buffer_queue=0 num_buffers_in_readers=0 unused_iomgr_buffers=0 > unused_iomgr_buffer_bytes=0 blocked_on_buffer=0. Switching to disk read path.
> W0507 07:34:53.934787 20195 DFSInputStream.java:668] > dd4d6eb8d2ad9587:6b44fe1b0002] closing file > /test-warehouse/f861f1a3/nation.tbl, but there are still unreleased > ByteBuffers allocated by read(). Please release > java.nio.DirectByteBufferR[pos=1024 lim=2048 cap=2199]. > {noformat}
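Issue (2) in the comment is a resource leak on an error path: a buffer obtained from a successful zero-copy read is never released if a later step fails. A minimal Python sketch of the fix pattern follows. `ZeroCopyStream`, `read_zero`, and `post_read` are hypothetical stand-ins, not the libhdfs `hadoopReadZero` code; the stream counts unreleased buffers at close time, loosely mirroring the client-side warning in the log above.

```python
class ZeroCopyStream:
    """Toy stream that tracks buffers handed out by read() (hypothetical)."""

    def __init__(self, data: bytes):
        self.data = data
        self.outstanding = []  # buffers returned by read() but not released

    def read(self, offset: int, length: int) -> memoryview:
        buf = memoryview(self.data)[offset:offset + length]
        self.outstanding.append(buf)
        return buf

    def release_buffer(self, buf: memoryview) -> None:
        # Compare by identity: each read() returns a distinct view object.
        self.outstanding = [b for b in self.outstanding if b is not buf]

    def close(self) -> int:
        # Count buffers that were never released (the warning case).
        leaked = len(self.outstanding)
        self.outstanding = []
        return leaked


def read_zero(stream: ZeroCopyStream, offset: int, length: int,
              post_read) -> memoryview:
    """Zero-copy read that releases the buffer if a post-read step fails."""
    buf = stream.read(offset, length)
    try:
        post_read(buf)  # any step after read() that can raise
    except Exception:
        # Without this release, a failure after a successful read() leaves
        # the buffer outstanding until close().
        stream.release_buffer(buf)
        raise
    return buf  # on success the caller owns buf and must release it
```

The key design point is that ownership transfers to the caller only on success; on any failure between the read and the return, the function itself releases the buffer.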
[jira] [Commented] (IMPALA-4018) Add support for SQL:2016 datetime templates/patterns/masks to CAST(... AS ... FORMAT )
[ https://issues.apache.org/jira/browse/IMPALA-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835630#comment-16835630 ] Matthias commented on IMPALA-4018: -- [~grahn], [~gaborkaszab]: Thanks for the good offline discussion and the clarification on why you would like to have _only_ the SQL-style date pattern for the `CAST` function. In short: `TO_TIMESTAMP` and `FROM_TIMESTAMP` should be considered the "old" APIs (in the future we might either deprecate them or update them to support the SQL-style patterns), and the `CAST` API should be considered the "new" one. Having both at the same time allows people to migrate seamlessly from one to the other. In addition, we have to consider tool vendors that have already implemented the "old" API (and might be unwilling to change it) and ensure that their tools continue to work properly. > Add support for SQL:2016 datetime templates/patterns/masks to CAST(... AS ... > FORMAT ) > > > Key: IMPALA-4018 > URL: https://issues.apache.org/jira/browse/IMPALA-4018 > Project: IMPALA > Issue Type: New Feature > Components: Frontend >Affects Versions: Impala 2.2.4 >Reporter: Greg Rahn >Assignee: Gabor Kaszab >Priority: Critical > Labels: ansi-sql, compatibility, sql-language > > *Summary* > The format masks/templates for currently are implemented using the [Java > SimpleDateFormat > patterns|http://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html], > and although this is what Hive has implemented, it is not what most standard > SQL systems implement.
For example, see > [Vertica|https://my.vertica.com/docs/7.2.x/HTML/Content/Authoring/SQLReferenceManual/Functions/Formatting/TemplatePatternsForDateTimeFormatting.htm], > > [Netezza|http://www.ibm.com/support/knowledgecenter/SSULQD_7.2.1/com.ibm.nz.dbu.doc/r_dbuser_ntz_sql_extns_templ_patterns_date_time_conv.html], > > [Oracle|https://docs.oracle.com/database/121/SQLRF/sql_elements004.htm#SQLRF00212], > and > [PostgreSQL|https://www.postgresql.org/docs/9.5/static/functions-formatting.html#FUNCTIONS-FORMATTING-DATETIME-TABLE]. > > *Examples of incompatibilities* > {noformat} > -- PostgreSQL/Netezza/Vertica/Oracle > select to_timestamp('May 15, 2015 12:00:00', 'mon dd, hh:mi:ss'); > -- Impala > select to_timestamp('May 15, 2015 12:00:00', 'MMM dd, HH:mm:ss'); > -- PostgreSQL/Netezza/Vertica/Oracle > select to_timestamp('2015-02-14 20:19:07','-mm-dd hh24:mi:ss'); > -- Impala > select to_timestamp('2015-02-14 20:19:07','-MM-dd HH:mm:ss'); > -- Vertica/Oracle > select to_timestamp('2015-02-14 20:19:07.123456','-mm-dd hh24:mi:ss.ff'); > -- Impala > select to_timestamp('2015-02-14 20:19:07.123456','-MM-dd > HH:mm:ss.SS'); > {noformat} > *Considerations* > Because this is a change in default behavior for to_timestamp(), if possible, > having a feature flag to revert to the legacy Java SimpleDateFormat patterns > should be strongly considered. This would allow users to choose the behavior > they desire and scope it to a session if need be. > SQL:2016 defines the following datetime templates > {noformat} > ::= > { }... > ::= > > | > ::= > > | > | > | > | > | > | > | > | > | > | > | > | > | > ::= > > | > | > | > | > | > | > | > ::= > | YYY | YY | Y > ::= > | RR > ::= > MM > ::= > DD > ::= > DDD > ::= > HH | HH12 > ::= > HH24 > ::= > MI > ::= > SS > ::= > S > ::= > FF1 | FF2 | FF3 | FF4 | FF5 | FF6 | FF7 | FF8 | FF9 > ::= > A.M. | P.M.
> ::= > TZH > ::= > TZM > {noformat} > SQL:2016 also introduced the FORMAT clause for CAST which is the standard way > to do string <> datetime conversions > {noformat} > ::= > CAST >AS > [ FORMAT ] > > ::= > > | > ::= > > | > ::= > > {noformat} > For example: > {noformat} > CAST( AS [FORMAT ]) > CAST( AS [FORMAT ]) > cast(dt as string format 'DD-MM-') > cast('01-05-2017' as date format 'DD-MM-') > {noformat} > *Update* > Here is the proposal for the new datetime patterns and their semantics: > https://docs.google.com/document/d/1V7k6-lrPGW7_uhqM-FhKl3QsxwCRy69v2KIxPsGjc1k/
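To make the incompatibility concrete, here is a small Python sketch that translates a handful of SQL:2016-style template tokens into Java SimpleDateFormat letters. It is illustrative only: the function name `sql_to_simpledateformat` and the token table are assumptions covering a subset (no MON, FF1-FF9, A.M./P.M., RR, or TZH/TZM), tokens are matched case-insensitively and longest-first, and the authoritative token list and semantics are in the linked proposal document, not here.

```python
# Subset of SQL:2016 datetime template tokens mapped to SimpleDateFormat.
# Ordered longest-first so that e.g. "HH24" wins over "HH" and "DDD" over "DD".
_TOKENS = [
    ("YYYY", "yyyy"),
    ("HH24", "HH"),   # hour of day, 0-23
    ("HH12", "hh"),   # hour of AM/PM, 1-12
    ("DDD", "DDD"),   # day of year
    ("YY", "yy"),
    ("MM", "MM"),     # month (note: not minute)
    ("DD", "dd"),     # day of month
    ("HH", "hh"),     # SQL:2016 treats HH like HH12
    ("MI", "mm"),     # minute
    ("SS", "ss"),     # second
]


def sql_to_simpledateformat(pattern: str) -> str:
    """Translate a SQL-style datetime template into a SimpleDateFormat pattern."""
    upper = pattern.upper()  # tokens are matched case-insensitively
    out = []
    i = 0
    while i < len(pattern):
        for sql_tok, java_tok in _TOKENS:
            if upper.startswith(sql_tok, i):
                out.append(java_tok)
                i += len(sql_tok)
                break
        else:
            ch = pattern[i]
            if ch.isalpha():
                raise ValueError(f"unsupported token at {i}: {pattern[i:]!r}")
            out.append(ch)  # separators such as '-', ':', ' ' pass through
            i += 1
    return "".join(out)
```

For example, `sql_to_simpledateformat("dd-mm-yyyy")` yields `"dd-MM-yyyy"`: in the SQL style a lowercase "mm" still means month, which is exactly the kind of silent difference the examples of incompatibilities above point at.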
[jira] [Commented] (IMPALA-3381) Impala to support AM/PM format in unix_timestamp and from_unixtime
[ https://issues.apache.org/jira/browse/IMPALA-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835475#comment-16835475 ] Gabor Kaszab commented on IMPALA-3381: -- Note: https://issues.apache.org/jira/browse/IMPALA-4018, which I'm currently actively working on, introduces ISO SQL:2016-compliant datetime patterns that would also include AM/PM. The new handling won't be the default for to_timestamp() and similar functions; instead, there will be a feature flag to switch the pattern handling of currently existing functions to the new approach. Note 2: users might have to rewrite their patterns to use the new handling (which includes AM/PM). Just a few examples: - the minute token will be the case-insensitive "MI" instead of lowercase "mm" - "hh24" will accept hours 0-23, while the case-insensitive "hh" will accept values 1-12 - for the full list of tokens, see the proposal doc under the linked Jira. However, I feel there is no point in implementing this Jira to extend the old way of pattern handling, as we would like everyone to move to the ISO SQL-compliant handling eventually. Are all the participants comfortable with closing this as a duplicate? > Impala to support AM/PM format in unix_timestamp and from_unixtime > -- > > Key: IMPALA-3381 > URL: https://issues.apache.org/jira/browse/IMPALA-3381 > Project: IMPALA > Issue Type: New Feature > Components: Backend >Affects Versions: Impala 2.5.0 >Reporter: Yibing Shi >Assignee: Attila Jeges >Priority: Minor > Labels: built-in-function > > Need a way to support below format in {{unix_timestamp}}, and > {{from_unixtime}} if possible: > {noformat} > 20151214 05:15:00.1234 AM > 20151214 05:15:00.1234 PM > {noformat} > An example: > {noformat} > unix_timestamp('20151214 05:15:00.1234 PM', 'MMdd HH:mm:ss.SSS PP') > {noformat} > Of course we can choose a different format, only if we can interpret the > AM/PM part of a date time string.
> *The function needs to support up to nanosecond precision*
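Since "hh" accepts 1-12 and "hh24" accepts 0-23, AM/PM support comes down to mapping the 12-hour clock onto 0-23, where the easy cases to get wrong are 12 AM (midnight, hour 0) and 12 PM (noon, hour 12). A minimal Python sketch follows; `to_24_hour` is a hypothetical helper, and accepting the dotted "A.M."/"P.M." spelling from the SQL:2016 grammar is an assumption of the sketch.

```python
def to_24_hour(hour12: int, meridiem: str) -> int:
    """Convert an hour on the 1-12 clock plus an AM/PM marker to 0-23."""
    if not 1 <= hour12 <= 12:
        raise ValueError(f"hour must be in 1..12, got {hour12}")
    marker = meridiem.strip().upper().replace(".", "")  # AM, am, A.M. all match
    if marker not in ("AM", "PM"):
        raise ValueError(f"expected AM or PM, got {meridiem!r}")
    hour = hour12 % 12  # 12 maps to 0 before the PM offset is applied
    return hour + 12 if marker == "PM" else hour
```

With this mapping, the example value '20151214 05:15:00.1234 PM' corresponds to hour 17 on the 24-hour clock.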