[jira] [Created] (IMPALA-5627) Various dropped statuses in HDFS writers

2017-07-06 Thread Tim Armstrong (JIRA)
Tim Armstrong created IMPALA-5627:
-

 Summary: Various dropped statuses in HDFS writers
 Key: IMPALA-5627
 URL: https://issues.apache.org/jira/browse/IMPALA-5627
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Affects Versions: Impala 2.10.0
Reporter: Tim Armstrong
Priority: Critical


As part of IMPALA-2615 I found various places where the return values of these 
functions were dropped:
Flush()
WriteFileHeader()
CreateCompressor()
AppendRow()
MaterializeStatsValues()
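Below is a C++17 sketch of one common guard against this class of bug. The Status type and the WriteFileHeader()/Flush() stand-ins are hypothetical simplifications, not Impala's classes. Marking the status type [[nodiscard]], which is comparable to the WARN_UNUSED_RESULT annotation mentioned in IMPALA-5611, makes the compiler warn when a caller drops the result. A RETURN_IF_ERROR-style macro then propagates the first failure.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical, simplified status type. [[nodiscard]] on the class makes the
// compiler warn whenever a function returning Status has its result ignored.
class [[nodiscard]] Status {
 public:
  static Status OK() { return Status(true, ""); }
  static Status Error(std::string msg) { return Status(false, std::move(msg)); }
  bool ok() const { return ok_; }
  const std::string& msg() const { return msg_; }

 private:
  Status(bool ok, std::string msg) : ok_(ok), msg_(std::move(msg)) {}
  bool ok_;
  std::string msg_;
};

// Sketch of a RETURN_IF_ERROR-style macro: return a non-OK status to the caller.
#define RETURN_IF_ERROR(expr)                              \
  do {                                                     \
    Status status_from_macro = (expr);                     \
    if (!status_from_macro.ok()) return status_from_macro; \
  } while (false)

// Hypothetical writer steps standing in for WriteFileHeader() and Flush().
Status WriteFileHeader(bool fail) {
  return fail ? Status::Error("header write failed") : Status::OK();
}
Status Flush(bool fail) {
  return fail ? Status::Error("flush failed") : Status::OK();
}

Status FinalizeFile(bool header_fails, bool flush_fails) {
  // A bare `WriteFileHeader(header_fails);` statement would now draw a warning.
  RETURN_IF_ERROR(WriteFileHeader(header_fails));
  RETURN_IF_ERROR(Flush(flush_fails));
  return Status::OK();
}
```

Putting the attribute on the type, not on each function, means every new function that returns Status gets the check automatically.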



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IMPALA-5626) Consider allowing users to set starting day of week to Sunday for date/time functions

2017-07-06 Thread Tim Armstrong (JIRA)
Tim Armstrong created IMPALA-5626:
-

 Summary: Consider allowing users to set starting day of week to 
Sunday for date/time functions
 Key: IMPALA-5626
 URL: https://issues.apache.org/jira/browse/IMPALA-5626
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend
Reporter: Tim Armstrong
Assignee: Greg Rahn
Priority: Minor


Some timestamp functions assume that the week starts on Monday, but that is
not the convention in every locale (in the United States, for example, weeks
are conventionally counted from Sunday).
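Here is a small sketch of what a configurable week start would change. It assumes a hypothetical WeekStart parameter, because the issue does not say which functions or which option name would be involved. The weekday comes from Sakamoto's algorithm, swapped in here for illustration. The result is then re-based so that either Monday or Sunday is day 1.

```cpp
#include <cassert>

// Sakamoto's algorithm: 0 = Sunday, 1 = Monday, ..., 6 = Saturday,
// for a date (year >= 1) in the proleptic Gregorian calendar.
int DayOfWeekSundayZero(int y, int m, int d) {
  assert(m >= 1 && m <= 12 && d >= 1 && d <= 31);
  static const int t[] = {0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4};
  if (m < 3) y -= 1;
  return (y + y / 4 - y / 100 + y / 400 + t[m - 1] + d) % 7;
}

// Hypothetical option: which day counts as the first day of the week.
enum class WeekStart { kMonday, kSunday };

// 1-based position of the date within its week.
// kMonday: Monday -> 1 ... Sunday -> 7; kSunday: Sunday -> 1 ... Saturday -> 7.
int DayOfWeekIndex(int y, int m, int d, WeekStart start) {
  int sunday_zero = DayOfWeekSundayZero(y, m, d);
  int first = (start == WeekStart::kSunday) ? 0 : 1;  // 0 = Sunday, 1 = Monday
  return (sunday_zero - first + 7) % 7 + 1;
}
```

For example, 2017-07-06 (a Thursday) is day 4 with a Monday start and day 5 with a Sunday start. 2017-07-09 (a Sunday) moves from day 7 to day 1.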

This came up on a user forum: 
http://community.cloudera.com/t5/Interactive-Short-cycle-SQL/Is-there-an-option-to-set-stating-day-of-the-week-to-be-Sunday/m-p/56859#M3173





[jira] [Resolved] (IMPALA-3504) function for current timestamp in UTC, i.e. utc_timestamp()

2017-07-06 Thread Bikramjeet Vig (JIRA)

 [ 
https://issues.apache.org/jira/browse/IMPALA-3504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikramjeet Vig resolved IMPALA-3504.

Resolution: Fixed

> function for current timestamp in UTC, i.e. utc_timestamp()
> ---
>
> Key: IMPALA-3504
> URL: https://issues.apache.org/jira/browse/IMPALA-3504
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend
>Affects Versions: Impala 2.5.0
>Reporter: kuduser
>Assignee: Bikramjeet Vig
>Priority: Minor
>  Labels: built-in-function, ramp-up
>
> Impala badly needs a way to generate a UTC timestamp.
> Currently there does not appear to be such a way.
> unix_timestamp() does not actually return a timestamp, but an epoch time as an 
> integer.
> Trying to convert this to a timestamp using cast() or from_unixtime() fails 
> because they both convert to local time.
> This could be implemented either as a timezone argument to now() or 
> from_unixtime(), so we can ask for a timestamp in UTC.
> Yes, there is a to_utc_timestamp() function, but that requires you to specify 
> the timezone of the timestamp you are converting from. 
> I just want something like current_utctimestamp().
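The reporter is distinguishing between two ways of converting epoch seconds: with the local time zone, or as UTC. Below is a minimal C++ sketch of the UTC path using POSIX gmtime_r() and strftime(). CurrentUtcTimestamp() is a hypothetical name. This is not the implementation behind the utc_timestamp() function this issue added, since the notification does not include the patch.

```cpp
#include <cassert>
#include <ctime>
#include <string>

// Format seconds since the Unix epoch as "YYYY-MM-DD HH:MM:SS" in UTC.
// gmtime_r() (POSIX) ignores the local time zone; localtime_r() would apply
// it, which is the conversion the reporter wants to avoid.
std::string FormatEpochAsUtc(std::time_t epoch_seconds) {
  std::tm parts{};
  if (gmtime_r(&epoch_seconds, &parts) == nullptr) return "";
  char buf[32];
  std::size_t n = std::strftime(buf, sizeof(buf), "%Y-%m-%d %H:%M:%S", &parts);
  return std::string(buf, n);
}

// Hypothetical utc_timestamp()-style helper: the current time, in UTC.
std::string CurrentUtcTimestamp() { return FormatEpochAsUtc(std::time(nullptr)); }
```

FormatEpochAsUtc(0) yields "1970-01-01 00:00:00" whatever the server's TZ setting is.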





[jira] [Created] (IMPALA-5625) stress test: collect profiles for timed out or errored queries

2017-07-06 Thread Matthew Mulder (JIRA)
Matthew Mulder created IMPALA-5625:
--

 Summary: stress test: collect profiles for timed out or errored 
queries
 Key: IMPALA-5625
 URL: https://issues.apache.org/jira/browse/IMPALA-5625
 Project: IMPALA
  Issue Type: Improvement
  Components: Infrastructure
Affects Versions: Impala 2.9.0
Reporter: Matthew Mulder
Assignee: Matthew Mulder
Priority: Minor


The stress test currently collects the profile for queries that exceed memory 
limits. It would be useful to also collect profiles for queries that time out 
or get an error.





[jira] [Created] (IMPALA-5624) ProcessStateInfo::ReadProcFileDescriptorInfo() should not fork a process

2017-07-06 Thread Tim Armstrong (JIRA)
Tim Armstrong created IMPALA-5624:
-

 Summary: ProcessStateInfo::ReadProcFileDescriptorInfo() should not 
fork a process
 Key: IMPALA-5624
 URL: https://issues.apache.org/jira/browse/IMPALA-5624
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Affects Versions: Impala 2.10.0
Reporter: Tim Armstrong


Forking processes from the Impala daemon after startup is problematic because 
of the spike in virtual memory it causes (see IMPALA-2294). We should avoid 
doing this in ProcessStateInfo::ReadProcFileDescriptorInfo(), which is invoked 
from the web server debug pages.
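The issue does not say what ReadProcFileDescriptorInfo() currently runs in the child process. On Linux, one fork-free alternative is to enumerate /proc/self/fd in-process. That approach is sketched here as an assumption, not as the planned fix, and CountOpenFileDescriptors() is a hypothetical helper.

```cpp
#include <cassert>
#include <filesystem>
#include <system_error>

// Count this process's open file descriptors by listing /proc/self/fd
// directly (Linux only), without spawning a child process.
// The count includes the descriptor opened to read the directory itself.
// Returns -1 if the directory cannot be read.
int CountOpenFileDescriptors() {
  namespace fs = std::filesystem;
  std::error_code ec;
  fs::directory_iterator it("/proc/self/fd", ec);
  if (ec) return -1;
  int count = 0;
  const fs::directory_iterator end;
  while (it != end) {
    ++count;
    it.increment(ec);
    if (ec) return -1;
  }
  return count;
}
```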





[jira] [Resolved] (IMPALA-5611) KuduPartitionExpr holds onto memory unnecessarily

2017-07-06 Thread Thomas Tauber-Marshall (JIRA)

 [ 
https://issues.apache.org/jira/browse/IMPALA-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Tauber-Marshall resolved IMPALA-5611.

   Resolution: Fixed
Fix Version/s: Impala 2.10.0

commit 4e17839033f931f98e0c3ec46d99b250b0bb4660
Author: Thomas Tauber-Marshall 
Date:   Fri Jun 30 12:00:08 2017 -0700

IMPALA-5611: KuduPartitionExpr holds onto memory unnecessarily

IMPALA-3742 introduced KuduPartitionExpr, which takes a row and passes
it to the Kudu client to determine what partition it belongs to.

The DataStreamSender never frees the local allocations for the Kudu
partition exprs, causing it to hang on to memory longer than it needs to.

This patch also fixes two other related issues:
- DataStreamSender was dropping the Status from AddRow in the Kudu
  branch. Adds 'RETURN_IF_ERROR' and 'WARN_UNUSED_RESULT'
- Changes the HASH case in DataStreamSender to call FreeLocalAllocations
  on a per-batch basis, instead of a per-row basis.

Testing:
- Added an e2e test that runs a large insert with a mem limit that
  previously failed with OOM.

Change-Id: Ia661eb8bed114070728a1497ccf7ed6893237e5e
Reviewed-on: http://gerrit.cloudera.org:8080/7346
Reviewed-by: Dan Hecht 
Reviewed-by: Michael Ho 
Tested-by: Impala Public Jenkins

> KuduPartitionExpr holds onto memory unnecessarily
> -
>
> Key: IMPALA-5611
> URL: https://issues.apache.org/jira/browse/IMPALA-5611
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.9.0
>Reporter: Thomas Tauber-Marshall
>Assignee: Thomas Tauber-Marshall
>Priority: Critical
> Fix For: Impala 2.10.0
>
>
> IMPALA-3742 introduced KuduPartitionExpr, which takes a row and passes it to 
> the Kudu client to determine what partition it belongs to.
> KuduPartitionExpr never calls ScalarExprEvaluator::FreeLocalAllocations, 
> causing it to hang on to memory longer than it needs to.
> Since we only need the value of the row for the call into the Kudu client, we 
> can call FreeLocalAllocations after that.
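The per-batch pattern from the commit message can be sketched as follows. PartitionExprEvaluator and its modulo partitioning are hypothetical stand-ins for ScalarExprEvaluator and the Kudu client call. The sketch only shows that local allocations are released once per batch, so peak usage is bounded by the batch size instead of growing for the lifetime of the sender.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical evaluator whose per-row evaluation makes temporary ("local")
// allocations that persist until FreeLocalAllocations() is called.
class PartitionExprEvaluator {
 public:
  int EvalPartition(int row_value) {
    // Simulate a temporary buffer needed only to compute the partition.
    local_allocs_.push_back(std::make_unique<char[]>(64));
    peak_ = std::max(peak_, local_allocs_.size());
    return row_value % 4;  // hypothetical partitioning function
  }
  void FreeLocalAllocations() { local_allocs_.clear(); }
  std::size_t peak_local_allocs() const { return peak_; }

 private:
  std::vector<std::unique_ptr<char[]>> local_allocs_;
  std::size_t peak_ = 0;
};

// Route one batch of rows, then free the evaluator's local allocations once
// for the whole batch (rather than after every row, or never).
std::vector<int> RouteBatch(PartitionExprEvaluator& eval, const std::vector<int>& batch) {
  std::vector<int> partitions;
  partitions.reserve(batch.size());
  for (int row : batch) partitions.push_back(eval.EvalPartition(row));
  eval.FreeLocalAllocations();
  return partitions;
}
```

Freeing after every row would also bound memory, but it pays the freeing cost per row. That is the overhead the HASH-case change in the commit avoids.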





[jira] [Created] (IMPALA-5623) lag() on STRING cols may hold memory until query end

2017-07-06 Thread Matthew Jacobs (JIRA)
Matthew Jacobs created IMPALA-5623:
--

 Summary: lag() on STRING cols may hold memory until query end
 Key: IMPALA-5623
 URL: https://issues.apache.org/jira/browse/IMPALA-5623
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Affects Versions: Impala 2.8.0
Reporter: Matthew Jacobs
Assignee: Matthew Jacobs


IMPALA-4120 fixed an issue where lead/lag could operate on memory that the UDA
didn't own, potentially producing wrong results. As part of that fix, lead and
lag started allocating memory in Init() which needs to be freed in Serialize()
or Finalize(), but only lead was updated to free the memory. This memory is
eventually freed when the fragment is torn down, but because lag does not free
it in Serialize() or Finalize(), it may stay allocated longer than necessary.

A warning is printed when this happens:
{quote}
[localhost:21000] > select concat(' foo ', lag(string_col,1,NULL) over (partition by bool_col order by id)) from functional.alltypestiny order by id;
Query: select concat(' foo ', lag(string_col,1,NULL) over (partition by bool_col order by id)) from functional.alltypestiny order by id
Query submitted at: 2017-07-06 13:56:24 (Coordinator: http://mj-desktop.ca.cloudera.com:25000)
Query progress can be monitored at: http://mj-desktop.ca.cloudera.com:25000/query_plan?query_id=124dfe18a6cee76a:fafdea40
+----------------------------------------------------------------------------------------+
| concat(' foo ', lag(string_col, 1, null) over (partition by bool_col order by id asc)) |
+----------------------------------------------------------------------------------------+
| NULL                                                                                   |
| NULL                                                                                   |
|  foo 0                                                                                 |
|  foo 1                                                                                 |
|  foo 0                                                                                 |
|  foo 1                                                                                 |
|  foo 0                                                                                 |
|  foo 1                                                                                 |
+----------------------------------------------------------------------------------------+
WARNINGS: UDF WARNING: Memory leaked via FunctionContext::Allocate() or FunctionContext::AllocateLocal()

Fetched 8 row(s) in 0.12s
{quote}
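Below is a minimal sketch of the ownership rule involved. It assumes a hypothetical TrackingContext in place of FunctionContext and simplified LagInit()/LagFinalize() signatures, so the real UDA interface differs. Whatever Init() allocates must be released in Finalize(), and likewise in Serialize(). Otherwise the context still counts the allocation as outstanding, which is the condition behind the warning above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical stand-in for a UDF/UDA function context that tracks
// outstanding allocations so leaks can be reported at query end.
struct TrackingContext {
  int outstanding = 0;
  void* Allocate(std::size_t n) {
    ++outstanding;
    return std::malloc(n);
  }
  void Free(void* p) {
    if (p == nullptr) return;
    --outstanding;
    std::free(p);
  }
};

// Hypothetical lag()-style state: an owned copy of a STRING value.
struct LagState {
  char* ptr = nullptr;
  std::size_t len = 0;
};

// Init: copy the default value so the state never points at memory it
// does not own (the IMPALA-4120 concern).
void LagInit(TrackingContext* ctx, LagState* st, const std::string& default_value) {
  st->len = default_value.size();
  st->ptr = static_cast<char*>(ctx->Allocate(st->len + 1));
  assert(st->ptr != nullptr);
  std::memcpy(st->ptr, default_value.data(), st->len);
}

// Finalize: copy out the result, then free the buffer allocated in Init.
// Omitting the Free() leaves the allocation outstanding until teardown.
std::string LagFinalize(TrackingContext* ctx, LagState* st) {
  std::string result(st->ptr, st->len);
  ctx->Free(st->ptr);
  st->ptr = nullptr;
  st->len = 0;
  return result;
}
```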





[jira] [Resolved] (IMPALA-4687) Get Impala working against HBase 2.0 APIs

2017-07-06 Thread Joe McDonnell (JIRA)

 [ 
https://issues.apache.org/jira/browse/IMPALA-4687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-4687.
---
   Resolution: Fixed
Fix Version/s: Impala 2.10.0

commit 50071597a61eefc5afc7a5ce4db56a343172ea44
Author: Joe McDonnell 
Date:   Thu Jun 22 17:06:49 2017 -0700

IMPALA-4687: Get Impala working against HBase 2.0

This changes Impala code to tolerate the API differences
between HBase 1.0 and HBase 2.0. It also drops
compatibility code for older HBase versions.

Specific changes:
1. Tolerate return value of Scan for Scan.setCaching()
and Scan.setCacheBlocks(). This has no impact on our code.
2. HBase 2.0 eliminates the ScannerTimeoutException. The
case that previously generated the exception will now
recreate the scanner, so it is not necessary for our code
to recreate the scanner. Short-circuit
HandleResultScannerTimeout on HBase 2.0.
3. HBase 2.0 eliminates Put.add(), which has been
replaced with Put.addColumn(). This API exists in
HBase 1.0, so it is safe to switch this completely.

This was tested by verifying that an HBase 2.0 cluster
starts up.

Change-Id: I87610e25c01b3547ec332c6975b61284b6837d27
Reviewed-on: http://gerrit.cloudera.org:8080/7277
Reviewed-by: Dan Hecht 
Tested-by: Impala Public Jenkins


> Get Impala working against HBase 2.0 APIs
> -
>
> Key: IMPALA-4687
> URL: https://issues.apache.org/jira/browse/IMPALA-4687
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Affects Versions: Impala 2.8.0
>Reporter: Tim Armstrong
>Assignee: Joe McDonnell
> Fix For: Impala 2.10.0
>
>
> Currently Impala builds against the HBase 2.0 APIs but crashes 
> immediately upon startup when returning an error from 
> HBaseTableScanner::Init() because it tries to find some methods that aren't 
> present in the APIs.





[jira] [Created] (IMPALA-5622) Ensure test coverage for spilling disabled for all spilling operators

2017-07-06 Thread Tim Armstrong (JIRA)
Tim Armstrong created IMPALA-5622:
-

 Summary: Ensure test coverage for spilling disabled for all 
spilling operators
 Key: IMPALA-5622
 URL: https://issues.apache.org/jira/browse/IMPALA-5622
 Project: IMPALA
  Issue Type: Test
  Components: Infrastructure
Reporter: Tim Armstrong
Assignee: Tim Armstrong


We should ensure that we have test coverage for disabling spilling. There are 
two dimensions. I don't think we need coverage of the whole matrix, but we need 
coverage along both dimensions.

Query operators:
* Agg - NOT COVERED
* Hash join - NOT COVERED (although will be with IMPALA-5570)
* Sort - covered by TestScratchLimit and TestScratchDir
* Analytic - NOT COVERED

Means of disabling:
* scratch_limit = 0 - covered by TestScratchLimit
* disable_unsafe_spills = true + a table missing stats - NOT COVERED
* starting up with no scratch disks - covered by TestScratchDir





[jira] [Resolved] (IMPALA-5036) Improve COUNT(*) performance of Parquet scans.

2017-07-06 Thread Taras Bobrovytsky (JIRA)

 [ 
https://issues.apache.org/jira/browse/IMPALA-5036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Taras Bobrovytsky resolved IMPALA-5036.
---
   Resolution: Fixed
Fix Version/s: Impala 2.10.0

> Improve COUNT(*) performance of Parquet scans.
> --
>
> Key: IMPALA-5036
> URL: https://issues.apache.org/jira/browse/IMPALA-5036
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Affects Versions: Impala 2.5.0, Impala 2.6.0, Impala 2.7.0, Impala 2.8.0
>Reporter: Alexander Behm
>Assignee: Taras Bobrovytsky
>  Labels: parquet, performance, ramp-up
> Fix For: Impala 2.10.0
>
>
> {code}
> select count(*) from parquet_table;
> select count(*) from parquet_table group by partition_col;
> {code}
> Impala already has a special code path for fast Parquet scans when no columns 
> are scanned and materialized, but the performance can be significantly 
> improved with a plan+execution change, as follows:
> *Execution change*
> Instead of returning empty batches until num_rows have been returned, the 
> Parquet scanner can populate a single slot with the num_rows from the Parquet 
> row groups.
> *Plan change*
> The count(*) local aggregation needs to be changed to a sum(num_rows_slot) 
> aggregation.
> The final distributed plan will be:
> scan -> local agg with sum(num_rows_slot) -> merge agg sum(sum(num_rows_slot))
> This optimization is applicable when there is only a count(*) and there are no 
> scan predicates.
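Reduced to a C++ sketch, the proposed plan's data flow works like this: each row group contributes one num_rows value, each scanner's local aggregation sums those values, and the merge aggregation sums the partial sums. RowGroup and the three functions are hypothetical illustrations, not Impala's planner or scanner code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical row-group metadata: for count(*) only the row count
// recorded in the Parquet footer is needed.
struct RowGroup {
  int64_t num_rows;
};

// Scan: instead of emitting num_rows empty rows, emit one value per row
// group carrying its num_rows (the "num_rows slot").
std::vector<int64_t> ScanNumRowsSlots(const std::vector<RowGroup>& row_groups) {
  std::vector<int64_t> slots;
  for (const RowGroup& rg : row_groups) slots.push_back(rg.num_rows);
  return slots;
}

// Local aggregation: sum(num_rows_slot) over one scanner's output.
int64_t LocalAgg(const std::vector<int64_t>& slots) {
  int64_t sum = 0;
  for (int64_t v : slots) sum += v;
  return sum;
}

// Merge aggregation: sum(sum(num_rows_slot)) over all partial sums.
int64_t MergeAgg(const std::vector<int64_t>& partial_sums) {
  return LocalAgg(partial_sums);
}
```

The local aggregation becomes a sum rather than a count because each input value now stands for many rows.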





[jira] [Created] (IMPALA-5621) Apply Parquet stats optimizations in conjunction with predicates against Parquet stats

2017-07-06 Thread Mostafa Mokhtar (JIRA)
Mostafa Mokhtar created IMPALA-5621:
---

 Summary: Apply Parquet stats optimizations in conjunction with 
predicates against Parquet stats
 Key: IMPALA-5621
 URL: https://issues.apache.org/jira/browse/IMPALA-5621
 Project: IMPALA
  Issue Type: Sub-task
  Components: Backend
Reporter: Mostafa Mokhtar
Assignee: Taras Bobrovytsky


Impala can already skip processing row groups based on predicates evaluated
against Parquet statistics. For row groups whose statistics show that every row
satisfies the predicates, the scanner could also use the data stored in the
Parquet statistics to speed up the query.

{code}
select count(*), max(ss_item_sk) from store_sales where ss_item_sk > 10 
and ss_item_sk < 99; 
{code}

For row groups that have min(ss_item_sk) > 10 and max(ss_item_sk) < 99, the
scanner should use the count stored in the stats as opposed to evaluating each
row in the row group; the same applies to min/max values.
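Here is a sketch of the per-row-group decision for the predicate ss_item_sk > 10 and ss_item_sk < 99. It assumes hypothetical RowGroupStats with exact min/max values and a null count. The description leaves one caveat implicit: rows whose ss_item_sk is NULL fail the predicate. For a fully covered row group, the qualifying count is therefore num_rows minus the null count, not num_rows alone.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-row-group statistics for one column, as read from a
// Parquet footer. Assumes at least one non-NULL value, so min/max are set.
struct RowGroupStats {
  int64_t min;
  int64_t max;
  int64_t num_rows;
  int64_t null_count;
};

enum class Action { kSkip, kAnswerFromStats, kScanRows };

// Decide how to handle a row group for the predicate lo < col AND col < hi.
Action Classify(const RowGroupStats& s, int64_t lo, int64_t hi) {
  // No value in [min, max] can satisfy the predicate: skip the row group.
  if (s.max <= lo || s.min >= hi) return Action::kSkip;
  // Every non-NULL value satisfies the predicate: answer from the stats.
  if (s.min > lo && s.max < hi) return Action::kAnswerFromStats;
  // Partial overlap: rows must be evaluated individually.
  return Action::kScanRows;
}

// count(*) contribution of a fully qualifying row group: NULLs never
// satisfy a comparison, so they are excluded.
int64_t QualifyingRows(const RowGroupStats& s) { return s.num_rows - s.null_count; }
```

In the kAnswerFromStats case, max(ss_item_sk) for that row group is simply s.max, given the exactness assumption above.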






[jira] [Resolved] (IMPALA-5470) Download page must not link to dist.apache.org

2017-07-06 Thread Jim Apple (JIRA)

 [ 
https://issues.apache.org/jira/browse/IMPALA-5470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Apple resolved IMPALA-5470.
---
Resolution: Fixed

Fixed in 
https://git-wip-us.apache.org/repos/asf?p=incubator-impala.git;a=commit;h=3d1c7a510c4470db65c2b08b066261f396809406

> Download page must not link to dist.apache.org
> --
>
> Key: IMPALA-5470
> URL: https://issues.apache.org/jira/browse/IMPALA-5470
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Sebb
>Assignee: Jim Apple
>
> The dist.apache.org SVN area is only intended as the staging area for the ASF 
> mirror service.
> Please do not link to files on it from download pages.
> Sigs and hashes should link to:
> https://www.apache.org/dist/incubator/impala/
> instead. And later to
> https://www.apache.org/dist/impala/
> Also the download page needs some instructions on how to check sigs or hashes.
> Please see:
> http://www.apache.org/dev/release-publishing.html#distribution_dist





[jira] [Created] (IMPALA-5620) Add restriction/authorization for query cancel option on the Web UI

2017-07-06 Thread Matyas Orhidi (JIRA)
Matyas Orhidi created IMPALA-5620:
-

 Summary: Add restriction/authorization for query cancel option on 
the Web UI
 Key: IMPALA-5620
 URL: https://issues.apache.org/jira/browse/IMPALA-5620
 Project: IMPALA
  Issue Type: Bug
Reporter: Matyas Orhidi


Currently anyone can cancel any query from the Impala Web UI. It would be
useful to add authorization checks for actions available on the Web UI, e.g.
query cancellation.





[jira] [Created] (IMPALA-5619) Allow WITH clauses with UPDATEs on Kudu tables

2017-07-06 Thread Balazs Jeszenszky (JIRA)
Balazs Jeszenszky created IMPALA-5619:
-

 Summary: Allow WITH clauses with UPDATEs on Kudu tables
 Key: IMPALA-5619
 URL: https://issues.apache.org/jira/browse/IMPALA-5619
 Project: IMPALA
  Issue Type: Improvement
Reporter: Balazs Jeszenszky
Priority: Minor


Currently Impala does not allow UPDATE after a WITH clause, although this has
been a valid use case since UPDATE was introduced for Kudu tables. UPSERT
already seems to be supported.


