[jira] [Created] (IMPALA-5627) Various dropped statuses in HDFS writers
Tim Armstrong created IMPALA-5627:

Summary: Various dropped statuses in HDFS writers
Key: IMPALA-5627
URL: https://issues.apache.org/jira/browse/IMPALA-5627
Project: IMPALA
Issue Type: Bug
Components: Backend
Affects Versions: Impala 2.10.0
Reporter: Tim Armstrong
Priority: Critical

As part of IMPALA-2615 I found various places where the return values of these functions were dropped:
* Flush()
* WriteFileHeader()
* CreateCompressor()
* AppendRow()
* MaterializeStatsValues()

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (IMPALA-5626) Consider allowing users to set starting day of week to Sunday for date/time functions
Tim Armstrong created IMPALA-5626:

Summary: Consider allowing users to set starting day of week to Sunday for date/time functions
Key: IMPALA-5626
URL: https://issues.apache.org/jira/browse/IMPALA-5626
Project: IMPALA
Issue Type: Improvement
Components: Backend
Reporter: Tim Armstrong
Assignee: Greg Rahn
Priority: Minor

Some timestamp functions assume that the starting day of the week is Monday, but that may not be true according to convention in all locales.

This came up on a user forum:
http://community.cloudera.com/t5/Interactive-Short-cycle-SQL/Is-there-an-option-to-set-stating-day-of-the-week-to-be-Sunday/m-p/56859#M3173
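To make the request concrete, here is a minimal Python sketch of truncating a date to the start of its week with a configurable first day. It is not Impala code: `start_of_week` and its `first_day` parameter are hypothetical names chosen for illustration, and the issue does not say how such a setting would be exposed.

```python
from datetime import date, timedelta

# Python's date.weekday() numbers the days Monday=0 .. Sunday=6.
MONDAY, SUNDAY = 0, 6


def start_of_week(d: date, first_day: int = MONDAY) -> date:
    """Return the first day of the week that contains d.

    first_day is a hypothetical setting (not an existing Impala option)
    that selects which weekday starts the week.
    """
    # Days elapsed since the most recent occurrence of first_day.
    offset = (d.weekday() - first_day) % 7
    return d - timedelta(days=offset)


thursday = date(2017, 7, 6)
print(start_of_week(thursday))          # week assumed to start on Monday
print(start_of_week(thursday, SUNDAY))  # week assumed to start on Sunday
```

With the default, the result is the preceding Monday; passing `SUNDAY` shifts the boundary back by one day, which is the behaviour the forum user asked about.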
[jira] [Resolved] (IMPALA-3504) function for current timestamp in UTC, i.e. utc_timestamp()
[ https://issues.apache.org/jira/browse/IMPALA-3504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bikramjeet Vig resolved IMPALA-3504.

Resolution: Fixed

> function for current timestamp in UTC, i.e. utc_timestamp()
>
> Key: IMPALA-3504
> URL: https://issues.apache.org/jira/browse/IMPALA-3504
> Project: IMPALA
> Issue Type: New Feature
> Components: Backend
> Affects Versions: Impala 2.5.0
> Reporter: kuduser
> Assignee: Bikramjeet Vig
> Priority: Minor
> Labels: built-in-function, ramp-up
>
> Impala badly needs a way to generate a UTC timestamp.
> Currently there does not appear to be such a way.
> unix_timestamp() does not actually return a timestamp, but an epoch time as an
> integer.
> Trying to convert this to a timestamp using cast() or from_unixtime() fails
> because they both convert to local time.
> This could be implemented either as a timezone argument to now() or
> from_unixtime(), so we can ask for a timestamp in UTC.
> Yes, there is a to_utc_timestamp() function, but that requires you to specify
> the timezone of the timestamp you are converting from.
> I just want something like current_utctimestamp().
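The distinction the reporter draws, between converting an epoch value to local time and converting it to UTC, can be shown outside Impala. The following is a Python analogy only, using the standard `datetime` module; the two function names are made up for this sketch and say nothing about how Impala implements utc_timestamp().

```python
from datetime import datetime, timezone


def local_timestamp_from_epoch(epoch_seconds: int) -> datetime:
    """Convert epoch seconds to a naive timestamp in the machine's local zone.

    The result depends on where the code runs, which is the behaviour the
    reporter objects to.
    """
    return datetime.fromtimestamp(epoch_seconds)


def utc_timestamp_from_epoch(epoch_seconds: int) -> datetime:
    """Convert epoch seconds to a timezone-aware timestamp in UTC."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


epoch = 1499349384
print(local_timestamp_from_epoch(epoch))  # varies with the local time zone
print(utc_timestamp_from_epoch(epoch))    # the same on every machine
```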
[jira] [Created] (IMPALA-5625) stress test: collect profiles for timed out or errored queries
Matthew Mulder created IMPALA-5625:

Summary: stress test: collect profiles for timed out or errored queries
Key: IMPALA-5625
URL: https://issues.apache.org/jira/browse/IMPALA-5625
Project: IMPALA
Issue Type: Improvement
Components: Infrastructure
Affects Versions: Impala 2.9.0
Reporter: Matthew Mulder
Assignee: Matthew Mulder
Priority: Minor

The stress test currently collects the profile for queries that exceed memory limits. It would be useful to also collect profiles for queries that time out or get an error.
[jira] [Created] (IMPALA-5624) ProcessStateInfo::ReadProcFileDescriptorInfo() should not fork a process
Tim Armstrong created IMPALA-5624:

Summary: ProcessStateInfo::ReadProcFileDescriptorInfo() should not fork a process
Key: IMPALA-5624
URL: https://issues.apache.org/jira/browse/IMPALA-5624
Project: IMPALA
Issue Type: Bug
Components: Backend
Affects Versions: Impala 2.10.0
Reporter: Tim Armstrong

Forking processes from the Impala daemon after startup is problematic because of the spike in virtual memory it causes (see IMPALA-2294). We should avoid doing this in ProcessStateInfo::ReadProcFileDescriptorInfo(), which is invoked from the web server debug pages.
[jira] [Resolved] (IMPALA-5611) KuduPartitionExpr holds onto memory unnecessarily
[ https://issues.apache.org/jira/browse/IMPALA-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Tauber-Marshall resolved IMPALA-5611.

Resolution: Fixed
Fix Version/s: Impala 2.10.0

commit 4e17839033f931f98e0c3ec46d99b250b0bb4660
Author: Thomas Tauber-Marshall
Date: Fri Jun 30 12:00:08 2017 -0700

IMPALA-5611: KuduPartitionExpr holds onto memory unnecessarily

IMPALA-3742 introduced KuduPartitionExpr, which takes a row and passes it to the Kudu client to determine what partition it belongs to. The DataStreamSender never frees the local allocations for the Kudu partition exprs, causing it to hang on to memory longer than it needs to.

This patch also fixes two other related issues:
- DataStreamSender was dropping the Status from AddRow in the Kudu branch. Adds 'RETURN_IF_ERROR' and 'WARN_UNUSED_RESULT'.
- Changes the HASH case in DataStreamSender to call FreeLocalAllocations on a per-batch basis, instead of a per-row basis.

Testing:
- Added an e2e test that runs a large insert with a mem limit that failed with oom previously.

Change-Id: Ia661eb8bed114070728a1497ccf7ed6893237e5e
Reviewed-on: http://gerrit.cloudera.org:8080/7346
Reviewed-by: Dan Hecht
Reviewed-by: Michael Ho
Tested-by: Impala Public Jenkins

> KuduPartitionExpr holds onto memory unnecessarily
>
> Key: IMPALA-5611
> URL: https://issues.apache.org/jira/browse/IMPALA-5611
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 2.9.0
> Reporter: Thomas Tauber-Marshall
> Assignee: Thomas Tauber-Marshall
> Priority: Critical
> Fix For: Impala 2.10.0
>
> IMPALA-3742 introduced KuduPartitionExpr, which takes a row and passes it to
> the Kudu client to determine what partition it belongs to.
> KuduPartitionExpr never calls ScalarExprEvaluator::FreeLocalAllocations,
> causing it to hang on to memory longer than it needs to.
> Since we only need the value of the row for the call into the Kudu client, we
> can call FreeLocalAllocations after that.
[jira] [Created] (IMPALA-5623) lag() on STRING cols may hold memory until query end
Matthew Jacobs created IMPALA-5623:

Summary: lag() on STRING cols may hold memory until query end
Key: IMPALA-5623
URL: https://issues.apache.org/jira/browse/IMPALA-5623
Project: IMPALA
Issue Type: Bug
Components: Backend
Affects Versions: Impala 2.8.0
Reporter: Matthew Jacobs
Assignee: Matthew Jacobs

IMPALA-4120 fixed an issue where lead/lag was potentially operating on memory that the UDA didn't own, resulting in potentially wrong results. As part of that fix, lead and lag started allocating memory in Init() which needs to be freed in Serialize() or Finalize(), but only lead was updated to free the memory. This memory is eventually freed when the fragment is torn down, but as a result of not freeing the memory in Serialize() or Finalize(), the memory may be allocated longer than necessary.

A warning is printed when this happens:
{quote}
[localhost:21000] > select concat(' foo ', lag(string_col,1,NULL) over (partition by bool_col order by id)) from functional.alltypestiny order by id;
Query: select concat(' foo ', lag(string_col,1,NULL) over (partition by bool_col order by id)) from functional.alltypestiny order by id
Query submitted at: 2017-07-06 13:56:24 (Coordinator: http://mj-desktop.ca.cloudera.com:25000)
Query progress can be monitored at: http://mj-desktop.ca.cloudera.com:25000/query_plan?query_id=124dfe18a6cee76a:fafdea40
+----------------------------------------------------------------------------------------+
| concat(' foo ', lag(string_col, 1, null) over (partition by bool_col order by id asc)) |
+----------------------------------------------------------------------------------------+
| NULL                                                                                   |
| NULL                                                                                   |
|  foo 0                                                                                 |
|  foo 1                                                                                 |
|  foo 0                                                                                 |
|  foo 1                                                                                 |
|  foo 0                                                                                 |
|  foo 1                                                                                 |
+----------------------------------------------------------------------------------------+
WARNINGS: UDF WARNING: Memory leaked via FunctionContext::Allocate() or FunctionContext::AllocateLocal()

Fetched 8 row(s) in 0.12s
{quote}
[jira] [Resolved] (IMPALA-4687) Get Impala working against HBase 2.0 APIs
[ https://issues.apache.org/jira/browse/IMPALA-4687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joe McDonnell resolved IMPALA-4687.

Resolution: Fixed
Fix Version/s: Impala 2.10.0

commit 50071597a61eefc5afc7a5ce4db56a343172ea44
Author: Joe McDonnell
Date: Thu Jun 22 17:06:49 2017 -0700

IMPALA-4687: Get Impala working against HBase 2.0

This changes Impala code to tolerate the API differences between HBase 1.0 and HBase 2.0. It also drops compatibility code for older HBase versions.

Specific changes:
1. Tolerate return value of Scan for Scan.setCaching() and Scan.setCacheBlocks(). This has no impact on our code.
2. HBase 2.0 eliminates the ScannerTimeoutException. The case that previously generated the exception will now recreate the scanner, so it is not necessary for our code to recreate the scanner. Short-circuit HandleResultScannerTimeout on HBase 2.0.
3. HBase 2.0 eliminates Put.add(), which has been replaced with Put.addColumn(). This API exists in HBase 1.0, so it is safe to switch this completely.

This was tested by verifying that an HBase 2.0 cluster starts up.

Change-Id: I87610e25c01b3547ec332c6975b61284b6837d27
Reviewed-on: http://gerrit.cloudera.org:8080/7277
Reviewed-by: Dan Hecht
Tested-by: Impala Public Jenkins

> Get Impala working against HBase 2.0 APIs
>
> Key: IMPALA-4687
> URL: https://issues.apache.org/jira/browse/IMPALA-4687
> Project: IMPALA
> Issue Type: Sub-task
> Components: Backend
> Affects Versions: Impala 2.8.0
> Reporter: Tim Armstrong
> Assignee: Joe McDonnell
> Fix For: Impala 2.10.0
>
> Currently Impala builds against the HBase 2.0 APIs but crashes
> immediately upon startup when returning an error from
> HBaseTableScanner::Init() because it tries to find some methods that aren't
> present in the APIs.
[jira] [Created] (IMPALA-5622) Ensure test coverage for spilling disabled for all spilling operators
Tim Armstrong created IMPALA-5622:

Summary: Ensure test coverage for spilling disabled for all spilling operators
Key: IMPALA-5622
URL: https://issues.apache.org/jira/browse/IMPALA-5622
Project: IMPALA
Issue Type: Test
Components: Infrastructure
Reporter: Tim Armstrong
Assignee: Tim Armstrong

We should ensure that we have test coverage for disabling spilling. There are two dimensions. I don't think we need coverage of the whole matrix, but we need coverage along both dimensions.

Query operators:
* Agg - NOT COVERED
* Hash join - NOT COVERED (although will be with IMPALA-5570)
* Sort - covered by TestScratchLimit and TestScratchDir
* Analytic - NOT COVERED

Means of disabling:
* scratch_limit = 0 - covered by TestScratchLimit
* disable_unsafe_spills = true + a table missing stats - NOT COVERED
* starting up with no scratch disks - covered by TestScratchDir
[jira] [Resolved] (IMPALA-5036) Improve COUNT(*) performance of Parquet scans.
[ https://issues.apache.org/jira/browse/IMPALA-5036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Taras Bobrovytsky resolved IMPALA-5036.

Resolution: Fixed
Fix Version/s: Impala 2.10.0

> Improve COUNT(*) performance of Parquet scans.
>
> Key: IMPALA-5036
> URL: https://issues.apache.org/jira/browse/IMPALA-5036
> Project: IMPALA
> Issue Type: Sub-task
> Components: Backend
> Affects Versions: Impala 2.5.0, Impala 2.6.0, Impala 2.7.0, Impala 2.8.0
> Reporter: Alexander Behm
> Assignee: Taras Bobrovytsky
> Labels: parquet, performance, ramp-up
> Fix For: Impala 2.10.0
>
> {code}
> select count(*) from parquet_table;
> select count(*) from parquet_table group by partition_col;
> {code}
> Impala already has a special code path for fast Parquet scans when no columns
> are scanned and materialized, but the performance can be significantly
> improved with a plan+execution change, as follows:
> *Execution change*
> Instead of returning empty batches until num_rows have been returned, the
> Parquet scanner can populate a single slot with the num_rows from the Parquet
> row groups.
> *Plan change*
> The count(*) local aggregation needs to be changed to a sum(num_rows_slot)
> aggregation.
> The final distributed plan will be:
> scan -> local agg with sum(num_rows_slot) -> merge agg sum(sum(num_rows_slot))
> This optimization is applicable where there is only a count(*) and there are no
> scan predicates.
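The plan described above (scan -> local sum -> merge sum) can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Impala's implementation: `RowGroup`, `ParquetFile`, `scan`, `local_agg` and `merge_agg` are hypothetical names, and the row counts stand in for the num_rows values stored in Parquet row-group metadata.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Tuple


@dataclass
class RowGroup:
    num_rows: int  # row count taken from row-group metadata, no data pages read


@dataclass
class ParquetFile:
    partition: str
    row_groups: List[RowGroup]


def scan(f: ParquetFile) -> Iterator[Tuple[str, int]]:
    """Scanner: emit one (partition, num_rows) value per row group
    instead of returning empty batches until num_rows rows are produced."""
    for rg in f.row_groups:
        yield f.partition, rg.num_rows


def local_agg(files: Iterable[ParquetFile]) -> Dict[str, int]:
    """Per-node aggregation: sum(num_rows_slot) grouped by partition."""
    totals: Dict[str, int] = {}
    for f in files:
        for partition, num_rows in scan(f):
            totals[partition] = totals.get(partition, 0) + num_rows
    return totals


def merge_agg(partials: Iterable[Dict[str, int]]) -> Dict[str, int]:
    """Final aggregation: sum(sum(num_rows_slot)) across nodes."""
    merged: Dict[str, int] = {}
    for partial in partials:
        for partition, subtotal in partial.items():
            merged[partition] = merged.get(partition, 0) + subtotal
    return merged


# Two hypothetical nodes, each holding some files of the same table.
node1 = [ParquetFile("a", [RowGroup(100), RowGroup(50)]), ParquetFile("b", [RowGroup(7)])]
node2 = [ParquetFile("a", [RowGroup(3)])]

per_partition = merge_agg([local_agg(node1), local_agg(node2)])
print(per_partition)                # count(*) ... group by partition_col
print(sum(per_partition.values()))  # count(*) over the whole table
```

The work is proportional to the number of row groups rather than the number of rows, which is where the improvement would come from when there are no scan predicates.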
[jira] [Created] (IMPALA-5621) Apply Parquet stats optimizations in conjunction with predicates against Parquet stats
Mostafa Mokhtar created IMPALA-5621:

Summary: Apply Parquet stats optimizations in conjunction with predicates against Parquet stats
Key: IMPALA-5621
URL: https://issues.apache.org/jira/browse/IMPALA-5621
Project: IMPALA
Issue Type: Sub-task
Components: Backend
Reporter: Mostafa Mokhtar
Assignee: Taras Bobrovytsky

Impala can skip processing blocks based on predicates against Parquet statistics. For RowGroups that satisfy the predicates, it should use the data stored in the Parquet statistics to speed up the query.
{code}
select count(*), max(ss_item_sk) from store_sales where ss_item_sk > 10 and ss_item_sk < 99;
{code}
For RowGroups that have min(ss_item_sk) > 10 and max(ss_item_sk) the scanner should use the count stored in the stats as opposed to evaluating each row in the RowGroup. The same applies to min/max values.
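One way to read this proposal is as a three-way classification of row groups: skipped, answered from statistics, or scanned row by row. The Python sketch below is an illustration under stated assumptions, not the scanner's actual logic: all names are hypothetical, the column is assumed to contain no NULLs, and the upper bound for the stats-only case is taken from the query's predicate (ss_item_sk < 99), since the description above does not spell it out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class RowGroupStats:
    min_val: int       # column minimum from the row-group statistics
    max_val: int       # column maximum from the row-group statistics
    num_rows: int      # row count from the row-group metadata
    values: List[int]  # stand-in for the column data (assumed NULL-free)


def count_and_max(row_groups: List[RowGroupStats], lo: int, hi: int
                  ) -> Tuple[int, Optional[int], int]:
    """Evaluate count(*), max(col) for lo < col < hi.

    Also returns how many rows had to be evaluated individually.
    """
    count, best, rows_evaluated = 0, None, 0
    for rg in row_groups:
        if rg.max_val <= lo or rg.min_val >= hi:
            continue  # no row can satisfy the predicate: skip the row group
        if rg.min_val > lo and rg.max_val < hi:
            # Every row satisfies the predicate: answer from stats alone.
            count += rg.num_rows
            candidate: Optional[int] = rg.max_val
        else:
            # Range only partially overlaps: fall back to evaluating each row.
            matching = [v for v in rg.values if lo < v < hi]
            rows_evaluated += len(rg.values)
            count += len(matching)
            candidate = max(matching, default=None)
        if candidate is not None and (best is None or candidate > best):
            best = candidate
    return count, best, rows_evaluated


groups = [
    RowGroupStats(1, 9, 3, [1, 5, 9]),            # skipped entirely
    RowGroupStats(20, 50, 3, [20, 35, 50]),       # answered from stats
    RowGroupStats(5, 120, 4, [5, 11, 98, 120]),   # scanned row by row
]
print(count_and_max(groups, 10, 99))
```

In this example only the third row group contributes to `rows_evaluated`; the second is answered from `num_rows` and `max_val` without touching its values.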
[jira] [Resolved] (IMPALA-5470) Download page must not link to dist.apache.org
[ https://issues.apache.org/jira/browse/IMPALA-5470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Apple resolved IMPALA-5470.

Resolution: Fixed

Fixed in https://git-wip-us.apache.org/repos/asf?p=incubator-impala.git;a=commit;h=3d1c7a510c4470db65c2b08b066261f396809406

> Download page must not link to dist.apache.org
>
> Key: IMPALA-5470
> URL: https://issues.apache.org/jira/browse/IMPALA-5470
> Project: IMPALA
> Issue Type: Bug
> Reporter: Sebb
> Assignee: Jim Apple
>
> The dist.apache.org SVN area is only intended as the staging area for the ASF
> mirror service.
> Please do not link to files on it from download pages.
> Sigs and hashes should link to:
> https://www.apache.org/dist/incubator/impala/
> instead. And later to
> https://www.apache.org/dist/impala/
> Also the download page needs some instructions on how to check sigs or hashes.
> Please see:
> http://www.apache.org/dev/release-publishing.html#distribution_dist
[jira] [Created] (IMPALA-5620) Add restriction/authorization for query cancel option on the Web UI
Matyas Orhidi created IMPALA-5620:

Summary: Add restriction/authorization for query cancel option on the Web UI
Key: IMPALA-5620
URL: https://issues.apache.org/jira/browse/IMPALA-5620
Project: IMPALA
Issue Type: Bug
Reporter: Matyas Orhidi

Currently anyone can cancel any query on the Impala Web UIs. It would be great to see some authorization logic for functions available on the Web UI, e.g. query cancellation.
[jira] [Created] (IMPALA-5619) Allow WITH clauses with UPDATEs on Kudu tables
Balazs Jeszenszky created IMPALA-5619:

Summary: Allow WITH clauses with UPDATEs on Kudu tables
Key: IMPALA-5619
URL: https://issues.apache.org/jira/browse/IMPALA-5619
Project: IMPALA
Issue Type: Improvement
Reporter: Balazs Jeszenszky
Priority: Minor

Currently Impala does not allow UPDATE after a WITH clause, though it would be a valid use case since Kudu. UPSERT seems to be supported already.