[jira] [Resolved] (IMPALA-966) Type errors are attributed to wrong expression with insert

2019-05-14 Thread Alice Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alice Fan resolved IMPALA-966.
--
   Resolution: Fixed
Fix Version/s: Impala 3.2.0

> Type errors are attributed to wrong expression with insert
> --
>
> Key: IMPALA-966
> URL: https://issues.apache.org/jira/browse/IMPALA-966
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 1.3
>Reporter: Henry Robinson
>Assignee: Alice Fan
>Priority: Minor
> Fix For: Impala 3.2.0
>
>
> The type error below belongs to the second row to be inserted ({{sqrt()}} 
> returns {{DOUBLE}}). But the obviously {{FLOAT}} first expression gets blamed 
> for the error.
> {code}
> [localhost:21000] > insert overwrite alltypesnopart_insert(float_col) 
> values(CAST(1.0 AS FLOAT)), (sqrt(-1));
> Query: insert overwrite alltypesnopart_insert(float_col) values(CAST(1.0 AS 
> FLOAT)), (sqrt(-1))
> ERROR: AnalysisException: Possible loss of precision for target table 
> 'functional.alltypesnopart_insert'.
> Expression 'cast(1.0 as float)' (type: DOUBLE) would need to be cast to FLOAT 
> for column 'float_col'
> {code}
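
The attribution the report asks for can be sketched as follows. This is a minimal illustration only: `TYPE_RANK`, `find_offending_expr`, and the `(sql_text, type)` tuples are hypothetical names, not part of Impala's frontend, and the widening order is assumed just for these two types.

```python
# Hypothetical sketch: blame the expression that actually causes the
# precision loss, not the first expression in the VALUES list.
# None of these names come from Impala's frontend.

# Widening order assumed for illustration only.
TYPE_RANK = {"FLOAT": 1, "DOUBLE": 2}


def find_offending_expr(rows, column_type):
    """Return (sql_text, expr_type) of the first expression wider than the column."""
    for sql_text, expr_type in rows:
        if TYPE_RANK[expr_type] > TYPE_RANK[column_type]:
            return sql_text, expr_type
    return None


# The two rows from the report: the first is FLOAT, the second is DOUBLE.
rows = [("cast(1.0 as float)", "FLOAT"), ("sqrt(-1)", "DOUBLE")]
offender = find_offending_expr(rows, "FLOAT")
print(offender)
```

Checking each row's own type, rather than the unified type of the whole column, is what keeps the message pointed at `sqrt(-1)` in this sketch.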



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8473) Refactor lineage publication mechanism to allow for different consumers

2019-05-14 Thread radford nguyen (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16840006#comment-16840006
 ] 

radford nguyen commented on IMPALA-8473:


[~LinaAtAustin], any reason you've changed your mind?  The Impala team and I 
actually prefer the interface approach.

> Refactor lineage publication mechanism to allow for different consumers
> ---
>
> Key: IMPALA-8473
> URL: https://issues.apache.org/jira/browse/IMPALA-8473
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend, Frontend
>Reporter: radford nguyen
>Assignee: radford nguyen
>Priority: Critical
> Attachments: ImpalaPostExecHook-infra.patch
>
>
> The impetus for this change is to allow lineage to be consumed by Atlas via Kafka.
> h3. Design Proposal
> Move lineage logging from the backend (be) to the frontend (fe), where we can 
> use the same plugin approach as {{authorization_provider}} to allow a 
> downstream user to provide their own lineage consumers as runtime dependencies.
> [~mad...@apache.org] has provided an fe patch (attached) with a suggested 
> mechanism for allowing multiple hooks to be registered with the fe.  Hooks 
> would be invoked from the be at appropriate places, e.g. 
> [https://github.com/apache/impala/blob/c1b0a073938c144e9bf33901bd4df6dcda0f09ec/be/src/service/impala-server.cc#L466].
>   The hooks should all be executed asynchronously, so the current thinking is 
> that this execution should happen in the fe, since the be does not know 
> which hooks are registered.  In other words, the 
> {{ImpalaPostExecHookFactory.executeHooks}} method (see patch) should probably 
> use a thread-pool executor service (or something similar) to execute all 
> hooks in parallel and in a non-blocking manner, returning to the be as soon 
> as possible.
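
The non-blocking hook execution proposed above can be sketched with a thread pool. This is a rough illustration under stated assumptions, not the attached patch: `HookRunner`, `register`, and `execute_hooks` are hypothetical names, and Python's `concurrent.futures` stands in for whatever executor service the fe would use.

```python
from concurrent.futures import ThreadPoolExecutor, wait


class HookRunner:
    """Hypothetical stand-in for the hook factory described in the proposal."""

    def __init__(self, max_workers=4):
        self._hooks = []
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def register(self, hook):
        self._hooks.append(hook)

    def execute_hooks(self, lineage):
        # Submit every hook and return at once; the caller is not blocked
        # while the hooks run on the pool's worker threads.
        return [self._pool.submit(self._safe_call, hook, lineage)
                for hook in self._hooks]

    @staticmethod
    def _safe_call(hook, lineage):
        # Capture a hook's exception so one failing consumer does not
        # disturb the others.
        try:
            return hook(lineage)
        except Exception as exc:
            return exc

    def shutdown(self):
        self._pool.shutdown(wait=True)


seen = []
runner = HookRunner()
runner.register(lambda lineage: seen.append(("kafka", lineage)))
runner.register(lambda lineage: seen.append(("file", lineage)))
futures = runner.execute_hooks({"query": "select 1"})
wait(futures)  # only this demo waits; a real caller would return right away
runner.shutdown()
```

Returning the futures leaves it to the caller to decide whether to wait, log failures, or ignore the results entirely.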






[jira] [Updated] (IMPALA-8550) Sentry refresh privileges has race conditions

2019-05-14 Thread Vihang Karajgaonkar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated IMPALA-8550:

Description: 
Recently, I encountered a race condition in {{SentryProxy}}'s 
refreshSentryAuthorization loop. The race happens when Sentry server is slow to 
update its information based on changes in HMS. Consider the following scenario:
 # Impala session from user A creates a database/table.
 # AuthorizationManager will updateDatabaseOwnerPrivilege 
[here|[https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L1159]]
 Note that this adds the user privilege to Catalog's cache out-of-band 
(without confirming that Sentry has added this privilege to its database).
 # Assume that Sentry is slow to update its database of roles/privileges. 
(Depending on the timing of these events, this is not strictly required, but 
the likelihood of the issue increases if Sentry is slow.)
 # The refreshSentryAuthorization loop is triggered based on a configured 
interval 
[here|[https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/authorization/sentry/SentryProxy.java#L174]].
 Since Sentry has not yet updated its database with the owner information, this 
loop will remove the privilege from Catalog. Any subsequent SQL that requires 
the privilege will fail until Sentry is synced and the refresh loop adds the 
privilege to the catalog cache again.

  was:
Recently, I encountered a race condition in \{{SentryProxy}}'s 
refreshSentryAuthorization loop. The race happens when Sentry server is slow to 
update its information based on changes in HMS. Consider the following scenario:
 # Impala session from user A creates a database/table.
 # AuthorizationManager will updateDatabaseOwnerPrivilege 
[here|[https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L1159]]
 Note that this add adds the user privilege in Catalog's cache out-of-band 
(without confirming that Sentry has added this privilege in its database)
 # Assume that Sentry is slow to update its database of roles/privileges. 
(Actually depending on the timing of these events, it doesn't really matter but 
likely increases if Sentry is slow.
 # The refreshSentryAuthorization loop is triggered based on a configured 
interval 
[here|[https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/authorization/sentry/SentryProxy.java#L174]].
 Since Sentry has not yet updated its database of the owner information, this 
loop will remove the privilege from Catalog. Any subsequent SQL which requires 
privileges will fail until Sentry is synced and refresh loop adds this 
privilege again the catalog cache.


> Sentry refresh privileges has race conditions
> -
>
> Key: IMPALA-8550
> URL: https://issues.apache.org/jira/browse/IMPALA-8550
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Vihang Karajgaonkar
>Priority: Major
>
> Recently, I encountered a race condition in {{SentryProxy}}'s 
> refreshSentryAuthorization loop. The race happens when Sentry server is slow 
> to update its information based on changes in HMS. Consider the following 
> scenario:
>  # Impala session from user A creates a database/table.
>  # AuthorizationManager will updateDatabaseOwnerPrivilege 
> [here|[https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L1159]]
>  Note that this adds the user privilege to Catalog's cache out-of-band 
> (without confirming that Sentry has added this privilege to its database).
>  # Assume that Sentry is slow to update its database of roles/privileges. 
> (Depending on the timing of these events, this is not strictly required, but 
> the likelihood of the issue increases if Sentry is slow.)
>  # The refreshSentryAuthorization loop is triggered based on a configured 
> interval 
> [here|[https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/authorization/sentry/SentryProxy.java#L174]].
>  Since Sentry has not yet updated its database with the owner information, this 
> loop will remove the privilege from Catalog. Any subsequent SQL that 
> requires the privilege will fail until Sentry is synced and the refresh loop 
> adds the privilege to the catalog cache again.






[jira] [Commented] (IMPALA-8550) Sentry refresh privileges has race conditions

2019-05-14 Thread Fredy Wijaya (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839908#comment-16839908
 ] 

Fredy Wijaya commented on IMPALA-8550:
--

Yeah, this is a known issue with the Sentry object ownership implementation.

> Sentry refresh privileges has race conditions
> -
>
> Key: IMPALA-8550
> URL: https://issues.apache.org/jira/browse/IMPALA-8550
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Vihang Karajgaonkar
>Priority: Major
>
> Recently, I encountered a race condition in {{SentryProxy}}'s 
> refreshSentryAuthorization loop. The race happens when Sentry server is slow 
> to update its information based on changes in HMS. Consider the following 
> scenario:
>  # Impala session from user A creates a database/table.
>  # AuthorizationManager will updateDatabaseOwnerPrivilege 
> [here|[https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L1159]]
>  Note that this adds the user privilege to Catalog's cache out-of-band 
> (without confirming that Sentry has added this privilege to its database).
>  # Assume that Sentry is slow to update its database of roles/privileges. 
> (Depending on the timing of these events, this is not strictly required, but 
> the likelihood of the issue increases if Sentry is slow.)
>  # The refreshSentryAuthorization loop is triggered based on a configured 
> interval 
> [here|[https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/authorization/sentry/SentryProxy.java#L174]].
>  Since Sentry has not yet updated its database with the owner information, this 
> loop will remove the privilege from Catalog. Any subsequent SQL that 
> requires the privilege will fail until Sentry is synced and the refresh loop 
> adds the privilege to the catalog cache again.






[jira] [Commented] (IMPALA-8550) Sentry refresh privileges has race conditions

2019-05-14 Thread Vihang Karajgaonkar (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839901#comment-16839901
 ] 

Vihang Karajgaonkar commented on IMPALA-8550:
-

The easiest way to reproduce this race is to enable {{test_owner_privileges}} 
in an HMS-3 environment.

> Sentry refresh privileges has race conditions
> -
>
> Key: IMPALA-8550
> URL: https://issues.apache.org/jira/browse/IMPALA-8550
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Vihang Karajgaonkar
>Priority: Major
>
> Recently, I encountered a race condition in {{SentryProxy}}'s 
> refreshSentryAuthorization loop. The race happens when Sentry server is slow 
> to update its information based on changes in HMS. Consider the following 
> scenario:
>  # Impala session from user A creates a database/table.
>  # AuthorizationManager will updateDatabaseOwnerPrivilege 
> [here|[https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L1159]]
>  Note that this adds the user privilege to Catalog's cache out-of-band 
> (without confirming that Sentry has added this privilege to its database).
>  # Assume that Sentry is slow to update its database of roles/privileges. 
> (Depending on the timing of these events, this is not strictly required, but 
> the likelihood of the issue increases if Sentry is slow.)
>  # The refreshSentryAuthorization loop is triggered based on a configured 
> interval 
> [here|[https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/authorization/sentry/SentryProxy.java#L174]].
>  Since Sentry has not yet updated its database with the owner information, this 
> loop will remove the privilege from Catalog. Any subsequent SQL that 
> requires the privilege will fail until Sentry is synced and the refresh loop 
> adds the privilege to the catalog cache again.






[jira] [Created] (IMPALA-8550) Sentry refresh privileges has race conditions

2019-05-14 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created IMPALA-8550:
---

 Summary: Sentry refresh privileges has race conditions
 Key: IMPALA-8550
 URL: https://issues.apache.org/jira/browse/IMPALA-8550
 Project: IMPALA
  Issue Type: Bug
Reporter: Vihang Karajgaonkar


Recently, I encountered a race condition in {{SentryProxy}}'s 
refreshSentryAuthorization loop. The race happens when Sentry server is slow to 
update its information based on changes in HMS. Consider the following scenario:
 # Impala session from user A creates a database/table.
 # AuthorizationManager will updateDatabaseOwnerPrivilege 
[here|[https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L1159]]
 Note that this adds the user privilege to Catalog's cache out-of-band 
(without confirming that Sentry has added this privilege to its database).
 # Assume that Sentry is slow to update its database of roles/privileges. 
(Depending on the timing of these events, this is not strictly required, but 
the likelihood of the issue increases if Sentry is slow.)
 # The refreshSentryAuthorization loop is triggered based on a configured 
interval 
[here|[https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/authorization/sentry/SentryProxy.java#L174]].
 Since Sentry has not yet updated its database with the owner information, this 
loop will remove the privilege from Catalog. Any subsequent SQL that requires 
the privilege will fail until Sentry is synced and the refresh loop adds the 
privilege to the catalog cache again.
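
The interleaving in the scenario above can be replayed deterministically. This is a simplified model under stated assumptions: `SlowSentry`, `CatalogCache`, and their methods are hypothetical stand-ins, not Impala or Sentry classes, and Sentry's lag is modelled as a write that stays invisible until `sync()` is called.

```python
# Deterministic replay of the race; all classes here are hypothetical
# stand-ins and do not mirror the real SentryProxy code.

class SlowSentry:
    """Sentry stand-in whose writes become visible only after sync()."""

    def __init__(self):
        self._visible = set()
        self._pending = set()

    def grant(self, privilege):
        self._pending.add(privilege)

    def sync(self):
        self._visible |= self._pending
        self._pending.clear()

    def list_privileges(self):
        return set(self._visible)


class CatalogCache:
    def __init__(self):
        self.privileges = set()

    def add_owner_privilege(self, privilege):
        # Out-of-band add: no confirmation from Sentry.
        self.privileges.add(privilege)

    def refresh_from(self, sentry):
        # The refresh treats Sentry as the source of truth and drops
        # anything Sentry does not report yet.
        self.privileges = sentry.list_privileges()


sentry = SlowSentry()
catalog = CatalogCache()
owner = ("user_a", "OWNER", "db1")

sentry.grant(owner)                 # Sentry will learn of the new object, but lags
catalog.add_owner_privilege(owner)  # catalog cache is updated out-of-band
catalog.refresh_from(sentry)        # refresh fires before Sentry has caught up
lost = owner not in catalog.privileges

sentry.sync()                       # Sentry finally catches up
catalog.refresh_from(sentry)        # the next refresh restores the privilege
restored = owner in catalog.privileges
print(lost, restored)
```

In this model the privilege is missing between the two refreshes, which is the window in which subsequent SQL would fail.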







[jira] [Commented] (IMPALA-7288) Codegen crash in FinalizeModule()

2019-05-14 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839842#comment-16839842
 ] 

ASF subversion and git services commented on IMPALA-7288:
-

Commit aea18dd08f34caf5c659c4b71f7bc4d70d743739 in impala's branch 
refs/heads/2.x from Bikramjeet Vig
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=aea18dd ]

IMPALA-7288: Fix Codegen Crash in FinalizeModule() (Addendum)

In addition to previous fix for IMPALA-7288, this patch would prevent
impala from crashing in case a code-path generates a malformed
handcrafted function which it then tries to finalize. Ideally this
would never happen since the code paths for generating handcrafted IRs
would never generate a malformed function.

Change-Id: Id09c6f59f677ba30145fb2081715f1a7d89fe20b
Reviewed-on: http://gerrit.cloudera.org:8080/10944
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Codegen crash in FinalizeModule()
> -
>
> Key: IMPALA-7288
> URL: https://issues.apache.org/jira/browse/IMPALA-7288
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.12.0, Impala 3.1.0
>Reporter: Balazs Jeszenszky
>Assignee: Bikramjeet Vig
>Priority: Blocker
> Fix For: Impala 2.13.0, Impala 3.1.0
>
>
> The following sequence crashes Impala 2.12 reliably:
> {code}
> CREATE TABLE test (c1 CHAR(6),c2 CHAR(6));
> select 1 from test t1, test t2
> where t1.c1 = FROM_TIMESTAMP(cast(t2.c2 as string), 'MMdd');
> {code}
> hs_err_pid has:
> {code}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x03b36ce4, pid=28459, tid=0x7f2c49685700
> #
> # JRE version: Java(TM) SE Runtime Environment (8.0_162-b12) (build 
> 1.8.0_162-b12)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.162-b12 mixed mode 
> linux-amd64 compressed oops)
> # Problematic frame:
> # C  [impalad+0x3736ce4]  llvm::Value::getContext() const+0x4
> {code}
> Backtrace is:
> {code}
> #0  0x7f2cb217a5f7 in raise () from /lib64/libc.so.6
> #1  0x7f2cb217bce8 in abort () from /lib64/libc.so.6
> #2  0x7f2cb4de2f35 in os::abort(bool) () from 
> /usr/java/latest/jre/lib/amd64/server/libjvm.so
> #3  0x7f2cb4f86f33 in VMError::report_and_die() () from 
> /usr/java/latest/jre/lib/amd64/server/libjvm.so
> #4  0x7f2cb4de922f in JVM_handle_linux_signal () from 
> /usr/java/latest/jre/lib/amd64/server/libjvm.so
> #5  0x7f2cb4ddf253 in signalHandler(int, siginfo*, void*) () from 
> /usr/java/latest/jre/lib/amd64/server/libjvm.so
> #6  
> #7  0x03b36ce4 in llvm::Value::getContext() const ()
> #8  0x03b36cff in llvm::Value::getValueName() const ()
> #9  0x03b36de9 in llvm::Value::getName() const ()
> #10 0x01ba6bb2 in impala::LlvmCodeGen::FinalizeModule (this=0x9b53980)
> at 
> /usr/src/debug/impala-2.12.0-cdh5.15.0/be/src/codegen/llvm-codegen.cc:1076
> #11 0x018f5c0f in impala::FragmentInstanceState::Open (this=0xac0b400)
> at 
> /usr/src/debug/impala-2.12.0-cdh5.15.0/be/src/runtime/fragment-instance-state.cc:255
> #12 0x018f3699 in impala::FragmentInstanceState::Exec (this=0xac0b400)
> at 
> /usr/src/debug/impala-2.12.0-cdh5.15.0/be/src/runtime/fragment-instance-state.cc:80
> #13 0x019028c3 in impala::QueryState::ExecFInstance (this=0x9c6ad00, 
> fis=0xac0b400)
> at 
> /usr/src/debug/impala-2.12.0-cdh5.15.0/be/src/runtime/query-state.cc:410
> #14 0x0190113c in impala::QueryStateoperator()(void) 
> const (__closure=0x7f2c49684be8)
> at 
> /usr/src/debug/impala-2.12.0-cdh5.15.0/be/src/runtime/query-state.cc:350
> #15 0x019034dd in 
> boost::detail::function::void_function_obj_invoker0,
>  void>::invoke(boost::detail::function::function_buffer &) 
> (function_obj_ptr=...)
> at 
> /usr/src/debug/impala-2.12.0-cdh5.15.0/toolchain/boost-1.57.0-p3/include/boost/function/function_template.hpp:153
> {code}
> Crash is at 
> https://github.com/cloudera/Impala/blob/cdh5-2.12.0_5.15.0/be/src/codegen/llvm-codegen.cc#L1070-L1079.
> The repro steps seem to be quite specific.






[jira] [Commented] (IMPALA-6086) Use of permanent function should require SELECT privilege on DB

2019-05-14 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-6086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839838#comment-16839838
 ] 

ASF subversion and git services commented on IMPALA-6086:
-

Commit 2e720ace8b285ae6a3b6b5ebc63dcfd04a763ca1 in impala's branch 
refs/heads/2.x from Zoram Thanga
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=2e720ac ]

IMPALA-6086: Use of permanent function should require SELECT privilege
on DB

Using a permanent UDF should require at least the SELECT privilege on the
database. Functions that have constant arguments get constant-folded
into string literals, losing their privilege requests in the process.

This patch saves the privilege requests found during the first phase
of query analysis, where all the objects and the privileges required
to access them are identified. The requests are added back to the
new analyzer created for re-analysis post expression rewrite.

Testing:
New FE test cases have been added to AuthorizationStmtTest.

Manual tests were also done to identify the bug, as well as to test
the fix.

Ran exhaustive and covering tests.

Change-Id: Iee70f15e4c04f7daaed9cac2400ec626e1fb0e57
Reviewed-on: http://gerrit.cloudera.org:8080/10850
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
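
The mechanism described in the commit message above can be sketched roughly as follows. This is an illustration only: `Analyzer`, `constant_fold`, and the dictionary-shaped expressions are hypothetical and do not mirror Impala's frontend classes.

```python
# Hypothetical sketch of carrying privilege requests across re-analysis.

class Analyzer:
    """Records privilege requests while analyzing expressions."""

    def __init__(self, saved_requests=()):
        # Requests saved from an earlier analysis pass are seeded here.
        self.privilege_requests = set(saved_requests)

    def analyze(self, expr):
        # A function call registers a SELECT request on its database;
        # a literal registers nothing.
        if expr["kind"] == "function_call":
            self.privilege_requests.add((expr["db"], "SELECT"))


def constant_fold(expr):
    # Folding replaces a constant-argument call with a literal, which
    # no longer carries any database information.
    if expr["kind"] == "function_call" and expr["constant_args"]:
        return {"kind": "literal", "value": "folded"}
    return expr


expr = {"kind": "function_call", "db": "udf_db", "constant_args": True}

first = Analyzer()
first.analyze(expr)              # first pass sees the function call
rewritten = constant_fold(expr)  # the rewrite turns it into a literal

naive = Analyzer()               # re-analysis without the saved requests
naive.analyze(rewritten)

fixed = Analyzer(saved_requests=first.privilege_requests)
fixed.analyze(rewritten)         # re-analysis seeded with the saved requests
print(naive.privilege_requests, fixed.privilege_requests)
```

In the naive path the SELECT request disappears together with the folded call; seeding the second analyzer keeps it available for the authorization check.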


> Use of permanent function should require SELECT privilege on DB
> ---
>
> Key: IMPALA-6086
> URL: https://issues.apache.org/jira/browse/IMPALA-6086
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog, Security
>Affects Versions: Impala 2.9.0, Impala 3.1.0
>Reporter: Zoram Thanga
>Assignee: Zoram Thanga
>Priority: Minor
>  Labels: security
> Fix For: Impala 3.1.0
>
>
> A user that has no privilege on a database should not be able to execute any 
> permanent functions in that database. This is currently possible, and should 
> be fixed, so that the user must have SELECT privilege to execute permanent 
> functions.






[jira] [Commented] (IMPALA-8072) Clean up config files in docker containers

2019-05-14 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839844#comment-16839844
 ] 

ASF subversion and git services commented on IMPALA-8072:
-

Commit d12675af59f2ac74db4fae09d41b720bfd72fe4b in impala's branch 
refs/heads/master from Tim Armstrong
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=d12675a ]

IMPALA-8072: addendum: don't require fe rebuild for config

Previously config changes wouldn't be picked up by containers until
maven copied the files from fe/src/test/resources to
fe/target/test-classes. This makes it more convenient - after running
./bin/create-test-configuration.sh new configs are picked up by
any newly-run containers.

Change-Id: I18f9f90667b1d16cf97d3e3f9fac400980d5b733
Reviewed-on: http://gerrit.cloudera.org:8080/13288
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Clean up config files in docker containers
> --
>
> Key: IMPALA-8072
> URL: https://issues.apache.org/jira/browse/IMPALA-8072
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Infrastructure
>Affects Versions: Impala 3.2.0
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
>  Labels: docker
> Fix For: Impala 3.3.0
>
>
> Currently the docker containers include a bunch of config files copied 
> indiscriminately from the dev environment. Mostly these aren't valid for a 
> production container and it's expected that the real config files will be 
> mounted at /opt/impala/conf.
> We should instead include a more reasonable set of default configs (e.g. for 
> admission control), plus placeholders for other config files that may need to 
> be overridden with site-specific configs.






[jira] [Commented] (IMPALA-7201) Support DDL in LocalCatalog using existing catalogd

2019-05-14 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839837#comment-16839837
 ] 

ASF subversion and git services commented on IMPALA-7201:
-

Commit 3afde5d99e7bc434358b813457d83cac4a6f086c in impala's branch 
refs/heads/2.x from Todd Lipcon
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=3afde5d ]

IMPALA-7201. Support DDL with LocalCatalog enabled

This fixes a couple issues with DDL commands when LocalCatalog is
enabled:

- updateCatalogCache() gets called after any DDL. Instead of throwing an
  exception, we can just no-op this by returning some fake result.

- In order to support 'drop database' we need to properly implement the
  various function-related calls such that they don't throw exceptions.
  This changes them to be stubbed out as having no functions.

- Fixes for 'alter view' and 'drop view' so that the underlying target
  table gets loaded by the catalogd before attempting the operation.
  Without this, in the LocalCatalog case, the catalogd would only have
  an IncompleteTable and these operations would fail with "unexpected
  table type" errors.

With this patch I was able to run 'run-tests.py -k views' and 3/4
passed. The one that failed depends on HBase tables, not yet
implemented.

Change-Id: Ic39c97a5f5ad145e03b96d1a470dc2dfa6ec71a5
Reviewed-on: http://gerrit.cloudera.org:8080/10806
Reviewed-by: Todd Lipcon 
Tested-by: Todd Lipcon 


> Support DDL in LocalCatalog using existing catalogd
> ---
>
> Key: IMPALA-7201
> URL: https://issues.apache.org/jira/browse/IMPALA-7201
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Major
> Fix For: Impala 3.1.0
>
>
> Need some changes to ensure that create table, create view, drop view, etc, 
> can work. The initial implementation will still RPC out to catalogd, which 
> will perform the mutations. At some point we may want to move this work to 
> the impalad itself, but for now keeping the code with as little change as 
> possible is preferred.






[jira] [Commented] (IMPALA-7228) Add tpcds-unmodified to single-node-perf-run

2019-05-14 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839839#comment-16839839
 ] 

ASF subversion and git services commented on IMPALA-7228:
-

Commit 43e54501cece5e4d4a2f8c483465dd81d8b6a115 in impala's branch 
refs/heads/2.x from njanarthanan
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=43e5450 ]

IMPALA-7228: Add tpcds-unmodified to single-node-perf-run

Description:
tpcds-unmodified workload was added as a part of IMPALA-6819.
This change allows tpcds-unmodified workload to be available
for the single node perf run.

Testing:
Ran single node perf run using the following parameters and the
test run was successful

--iterations 2 --scale 2 --table_formats "parquet/none" \
--num_impalads 1 --workload "tpcds-unmodified" \
--load --query_names "TPCDS-Q17.*" --start_minicluster

Change-Id: I511661c586cd55e3240ccbea9c499b9c3fc98440
Reviewed-on: http://gerrit.cloudera.org:8080/10931
Reviewed-by: Impala Public Jenkins 
Reviewed-by: Jim Apple 
Tested-by: Impala Public Jenkins 


> Add tpcds-unmodified to single-node-perf-run
> 
>
> Key: IMPALA-7228
> URL: https://issues.apache.org/jira/browse/IMPALA-7228
> Project: IMPALA
>  Issue Type: Task
>  Components: Perf Investigation
>Affects Versions: Impala 3.1.0
>Reporter: Jim Apple
>Assignee: nithya
>Priority: Minor
>  Labels: newbie
>
> IMPALA-6819 added the tpcds-unmodified workload. This doesn't work with 
> single-node-perf-run yet:
> {noformat}
> Traceback (most recent call last):
>   File "./bin/single_node_perf_run.py", line 334, in <module>
>     main()
>   File "./bin/single_node_perf_run.py", line 324, in main
>     perf_ab_test(options, args)
>   File "./bin/single_node_perf_run.py", line 231, in perf_ab_test
>     datasets = set([WORKLOAD_TO_DATASET[workload] for workload in workloads])
> KeyError: 'tpcds-unmodified'
> {noformat}
> cc: [~njanarthanan]
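
The failure in the traceback above is an ordinary missing-key lookup, which can be reproduced in isolation. In this sketch the contents of `WORKLOAD_TO_DATASET` are assumed (the real mapping is not shown in the report), `datasets_for` is a hypothetical helper, and `PLACEHOLDER_DATASET` is a placeholder rather than the dataset name used by the actual fix.

```python
# Assumed contents; the real WORKLOAD_TO_DATASET is not shown in the report.
WORKLOAD_TO_DATASET = {"tpch": "tpch", "tpcds": "tpcds"}


def datasets_for(workloads, mapping):
    """Same lookup shape as the failing line in the traceback."""
    return set(mapping[workload] for workload in workloads)


# Reproduce the failure: the workload has no entry in the mapping.
try:
    datasets_for(["tpcds-unmodified"], WORKLOAD_TO_DATASET)
    failed = False
    missing = None
except KeyError as exc:
    failed = True
    missing = exc.args[0]

# Assumed shape of a fix: add an entry for the new workload.
# "PLACEHOLDER_DATASET" is a placeholder, not the real dataset name.
patched = dict(WORKLOAD_TO_DATASET)
patched["tpcds-unmodified"] = "PLACEHOLDER_DATASET"
print(failed, missing, datasets_for(["tpcds-unmodified"], patched))
```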






[jira] [Commented] (IMPALA-7140) Build out support for HDFS tables and views in LocalCatalog

2019-05-14 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839836#comment-16839836
 ] 

ASF subversion and git services commented on IMPALA-7140:
-

Commit cb4755421b3437808037feca5c29d95f446aab93 in impala's branch 
refs/heads/2.x from Todd Lipcon
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=cb47554 ]

IMPALA-7140 (part 8): support views in LocalCatalog

This adds basic support for loading views in LocalCatalog. Tested with a
small unit test and also verified from the shell that I can select from
a view.

Change-Id: Ib3516b9ceff6dce12ded68d93afde09728627e08
Reviewed-on: http://gerrit.cloudera.org:8080/10805
Tested-by: Impala Public Jenkins 
Reviewed-by: Todd Lipcon 


> Build out support for HDFS tables and views in LocalCatalog
> ---
>
> Key: IMPALA-7140
> URL: https://issues.apache.org/jira/browse/IMPALA-7140
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Catalog, Frontend
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Major
> Fix For: Impala 3.1.0
>
>
> This subtask tracks the work to build out basic read-only support for HDFS 
> tables and views in the LocalCatalog implementation:
> - loading table schemas
> - loading partitions
> - loading file information from HDFS
> This work will be broken up into a number of patches to keep each piece 
> reviewable. Once this subtask is complete we should be able to plan most 
> simple read-only queries.






[jira] [Commented] (IMPALA-6819) Add new performance test workloads

2019-05-14 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-6819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839840#comment-16839840
 ] 

ASF subversion and git services commented on IMPALA-6819:
-

Commit 43e54501cece5e4d4a2f8c483465dd81d8b6a115 in impala's branch 
refs/heads/2.x from njanarthanan
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=43e5450 ]

IMPALA-7228: Add tpcds-unmodified to single-node-perf-run

Description:
tpcds-unmodified workload was added as a part of IMPALA-6819.
This change allows tpcds-unmodified workload to be available
for the single node perf run.

Testing:
Ran single node perf run using the following parameters and the
test run was successful

--iterations 2 --scale 2 --table_formats "parquet/none" \
--num_impalads 1 --workload "tpcds-unmodified" \
--load --query_names "TPCDS-Q17.*" --start_minicluster

Change-Id: I511661c586cd55e3240ccbea9c499b9c3fc98440
Reviewed-on: http://gerrit.cloudera.org:8080/10931
Reviewed-by: Impala Public Jenkins 
Reviewed-by: Jim Apple 
Tested-by: Impala Public Jenkins 


> Add new performance test workloads 
> ---
>
> Key: IMPALA-6819
> URL: https://issues.apache.org/jira/browse/IMPALA-6819
> Project: IMPALA
>  Issue Type: Task
>  Components: Infrastructure
>Reporter: nithya
>Assignee: nithya
>Priority: Major
>
> Add additional workloads to the impala-asf repo.
> Workloads that will be added:
> {code:java}
> [targeted-perf]
> [tpcds-unmodified]
> {code}






[jira] [Commented] (IMPALA-6810) query_test::test_runtime_filters.py::test_row_filters fails when run against an external cluster

2019-05-14 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-6810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839841#comment-16839841
 ] 

ASF subversion and git services commented on IMPALA-6810:
-

Commit d0a6239be30b64f1193394236de0564bebe9f696 in impala's branch 
refs/heads/2.x from Michael Brown
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=d0a6239 ]

IMPALA-6810: runtime_row_filters.test: omit pool name in pattern

Some downstream tests run this with a fair-scheduler.xml set that, while
not changing admission control behavior, does change the name of the
pool. Omit the pool name to permit that downstream test to succeed.

Testing:
- local with change in minicluster
- downstream in environment as well

Change-Id: I3fe6beb169dc6bfefabde9dc7a4632c1a5e63fa7
Reviewed-on: http://gerrit.cloudera.org:8080/10942
Reviewed-by: Michael Brown 
Tested-by: Impala Public Jenkins 
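
As an illustration of why omitting the pool name helps: the failure output in the issue ("Not found in actual") suggests the test checks that the expected text occurs inside the actual exception string. The sketch below assumes that substring behavior; the helper name and the shortened strings are hypothetical and are not taken from impala_test_suite.py.

```python
# Shortened version of the message an external cluster produced (pool renamed).
ACTUAL = (
    "ImpalaBeeswaxException: MESSAGE: Rejected query from pool root.jenkins: "
    "minimum memory reservation is greater than memory available to the query "
    "for buffer reservations."
)

# Old-style pattern: hard-codes the minicluster pool name.
WITH_POOL = "Rejected query from pool default-pool: minimum memory reservation"

# New-style pattern: starts after the pool name, so any pool matches.
WITHOUT_POOL = "minimum memory reservation is greater than memory available"


def exception_matches(expected_fragment, actual):
    """Assumed check: the expected text must occur somewhere in the actual text."""
    return expected_fragment in actual


print(exception_matches(WITH_POOL, ACTUAL))     # pool-specific pattern
print(exception_matches(WITHOUT_POOL, ACTUAL))  # pool-agnostic pattern
```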


> query_test::test_runtime_filters.py::test_row_filters fails when run against 
> an external cluster
> 
>
> Key: IMPALA-6810
> URL: https://issues.apache.org/jira/browse/IMPALA-6810
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 2.12.0
>Reporter: David Knupp
>Assignee: Michael Brown
>Priority: Critical
>  Labels: admission-control, resource-management
> Fix For: Impala 3.1.0
>
>
> Presumably this test has been passing when run against the local 
> mini-cluster. When run against an external cluster, however, the test fails 
> with an AssertionError because the exception string is different than 
> expected.
> The expected string is:
> _ImpalaBeeswaxException: INNER EXCEPTION: <class 'beeswaxd.ttypes.BeeswaxException'> MESSAGE: Rejected query from pool 
> {color:red}*default-pool*{color}: minimum memory reservation is greater than 
> memory available to the query for buffer reservations. Increase the 
> buffer_pool_limit to 290.00 MB. See the query profile for more information 
> about the per-node memory requirements._
> The actual string is:
> _ImpalaBeeswaxException: INNER EXCEPTION: <class 'beeswaxd.ttypes.BeeswaxException'> MESSAGE: Rejected query from pool 
> {color:red}*root.jenkins*{color}: minimum memory reservation is greater than 
> memory available to the query for buffer reservations. Increase the 
> buffer_pool_limit to 290.00 MB. See the query profile for more information 
> about the per-node memory requirements._
> {noformat}
> Stacktrace
> query_test/test_runtime_filters.py:168: in test_row_filters
> test_file_vars={'$RUNTIME_FILTER_WAIT_TIME_MS' : str(WAIT_TIME_MS)})
> common/impala_test_suite.py:401: in run_test_case
> self.__verify_exceptions(test_section['CATCH'], str(e), use_db)
> common/impala_test_suite.py:279: in __verify_exceptions
> (expected_str, actual_str)
> E   AssertionError: Unexpected exception string. Expected: 
> ImpalaBeeswaxException: INNER EXCEPTION: <class 'beeswaxd.ttypes.BeeswaxException'> MESSAGE: Rejected query from pool 
> default-pool: minimum memory reservation is greater than memory available to 
> the query for buffer reservations. Increase the buffer_pool_limit to 290.00 
> MB. See the query profile for more information about the per-node memory 
> requirements.
> E   Not found in actual: ImpalaBeeswaxException: INNER EXCEPTION: <class 'beeswaxd.ttypes.BeeswaxException'> MESSAGE: Rejected query from pool 
> root.jenkins: minimum memory reservation is greater than memory available to 
> the query for buffer reservations. Increase the buffer_pool_limit to 290.00 
> MB. See the query profile for more information about the per-node memory 
> requirements.
> {noformat}






[jira] [Updated] (IMPALA-7734) Catalog memz page shows useless memory breakdown

2019-05-14 Thread Bikramjeet Vig (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikramjeet Vig updated IMPALA-7734:
---
Description: 
If you look at catalogd memz, e.g. at localhost:25020/memz, it has a bogus 
breakdown. The catalogd does not use the MemTracker infrastructure since the 
vast majority of its memory consumption is in the JVM.
{noformat}
Breakdown

: Total=0 Peak=0
  Untracked Memory: Total=0
{noformat}
Reported by [~alanj_impala_5a78]

Update: The same applies to statestored. Apart from the "Breakdown" 
section, the "Memory consumption / limit" part is also redundant. Both parts 
should be removed from the memz pages of both daemons.

 

  was:
If you look at catalogd memz, e.g. at localhost:25020/memz, it has a bogus 
breakdown. The catalogd does not use the MemTracker infrastructure since the 
vast majority of its memory consumption is in the JVM.

{noformat}
Breakdown

: Total=0 Peak=0
  Untracked Memory: Total=0
{noformat}

Reported by [~alanj_impala_5a78]


> Catalog memz page shows useless memory breakdown
> 
>
> Key: IMPALA-7734
> URL: https://issues.apache.org/jira/browse/IMPALA-7734
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 3.0, Impala 2.12.0, Impala 3.1.0
>Reporter: Tim Armstrong
>Priority: Minor
>  Labels: newbie, observability, ramp-up, supportability
>
> If you look at catalogd memz, e.g. at localhost:25020/memz, it has a bogus 
> breakdown. The catalogd does not use the MemTracker infrastructure since the 
> vast majority of its memory consumption is in the JVM.
> {noformat}
> Breakdown
> : Total=0 Peak=0
>   Untracked Memory: Total=0
> {noformat}
> Reported by [~alanj_impala_5a78]
> Update: The same applies to statestored. Apart from the "Breakdown" 
> section, the "Memory consumption / limit" part is also redundant. Both parts 
> should be removed from the memz pages of both daemons.
>  






[jira] [Updated] (IMPALA-7734) Catalog and Statestore memz page shows useless memory breakdown

2019-05-14 Thread Bikramjeet Vig (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikramjeet Vig updated IMPALA-7734:
---
Summary: Catalog and Statestore memz page shows useless memory breakdown  
(was: Catalog memz page shows useless memory breakdown)

> Catalog and Statestore memz page shows useless memory breakdown
> ---
>
> Key: IMPALA-7734
> URL: https://issues.apache.org/jira/browse/IMPALA-7734
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 3.0, Impala 2.12.0, Impala 3.1.0
>Reporter: Tim Armstrong
>Priority: Minor
>  Labels: newbie, observability, ramp-up, supportability
>
> If you look at catalogd memz, e.g. at localhost:25020/memz, it has a bogus 
> breakdown. The catalogd does not use the MemTracker infrastructure since the 
> vast majority of its memory consumption is in the JVM.
> {noformat}
> Breakdown
> : Total=0 Peak=0
>   Untracked Memory: Total=0
> {noformat}
> Reported by [~alanj_impala_5a78]
> Update: The same applies to statestored. Apart from the "Breakdown" 
> section, the "Memory consumption / limit" part is also redundant. Both parts 
> should be removed from the memz pages of both daemons.
>  






[jira] [Updated] (IMPALA-8400) Implement Ranger audit event handler

2019-05-14 Thread Fredy Wijaya (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fredy Wijaya updated IMPALA-8400:
-
Summary: Implement Ranger audit event handler  (was: Ranger audit log 
should be done atomically)

> Implement Ranger audit event handler
> 
>
> Key: IMPALA-8400
> URL: https://issues.apache.org/jira/browse/IMPALA-8400
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Catalog, Frontend
>Reporter: Fredy Wijaya
>Assignee: Fredy Wijaya
>Priority: Critical
>
> The current implementation logs the audit log per request. We should consider 
> doing the audit log atomically.






[jira] [Resolved] (IMPALA-8371) Unified backend tests need to return appropriate return code

2019-05-14 Thread Joe McDonnell (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-8371.
---
   Resolution: Fixed
Fix Version/s: Impala 3.3.0

> Unified backend tests need to return appropriate return code
> 
>
> Key: IMPALA-8371
> URL: https://issues.apache.org/jira/browse/IMPALA-8371
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.3.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Blocker
>  Labels: broken-build
> Fix For: Impala 3.3.0
>
>
> The scripts generated by bin/gen-backend-test-script.sh need to return the 
> return code from the call to the unified backend executable. The JUnitXML 
> contains a failure, which Jenkins and other tools can process, but the return 
> code must match up for scripts to be able to loop the test, etc.
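
To illustrate why the exit status matters for looping: a wrapper that swallows the child's status makes every run look successful, so a loop-until-failure driver never stops. The generated scripts are shell scripts; the sketch below is only a Python analogue with hypothetical helper names.

```python
import subprocess
import sys


def run_test(cmd):
    """Run the test command and return its exit status unchanged."""
    return subprocess.call(cmd)


def loop_until_failure(cmd, max_runs):
    """Re-run cmd until it fails; return (runs_completed, last_exit_status)."""
    status = 0
    for run in range(1, max_runs + 1):
        status = run_test(cmd)
        if status != 0:
            return run, status
    return max_runs, status


if __name__ == "__main__":
    # Stand-in for a test binary that fails with status 3.
    failing = [sys.executable, "-c", "import sys; sys.exit(3)"]
    print(loop_until_failure(failing, 5))
```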






[jira] [Commented] (IMPALA-2658) Extend the NDV function to accept a precision

2019-05-14 Thread Peter Ebert (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-2658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839721#comment-16839721
 ] 

Peter Ebert commented on IMPALA-2658:
-

I took a second look and I don't think it makes sense to go lower than a precision 
of 6 (2^6 = 64); that's only 64 bytes of memory for the registers.

I'm not confident that going above 16 was tested much in the research, but I 
think it's reasonable to allow users to try to go higher (I don't recall much 
precision improvement above 16, but it may vary depending on dataset and hash 
quality).  
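
For reference, the trade-off can be tabulated with a few lines of Python. This is a sketch under two assumptions: one byte per register (as the 64-byte figure above implies), and the standard HyperLogLog estimate of relative standard error, about 1.04 / sqrt(m) for m registers. The function names are made up for the example.

```python
import math


def hll_registers(precision):
    """Number of registers for a given precision: m = 2^precision."""
    return 1 << precision


def hll_relative_error(precision):
    """Approximate relative standard error of HyperLogLog: 1.04 / sqrt(m)."""
    return 1.04 / math.sqrt(hll_registers(precision))


for p in (6, 9, 10, 16, 18):
    m = hll_registers(p)
    print("precision=%2d registers=%6d error=%.2f%%"
          % (p, m, 100 * hll_relative_error(p)))
```

Each extra bit of precision doubles the register count but shrinks the error only by a factor of sqrt(2), which is consistent with the diminishing returns mentioned above.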

> Extend the NDV function to accept a precision
> -
>
> Key: IMPALA-2658
> URL: https://issues.apache.org/jira/browse/IMPALA-2658
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 2.2.4
>Reporter: Peter Ebert
>Priority: Minor
>  Labels: ramp-up
> Attachments: Comparison of HLL Memory usage, Query Duration and 
> Accuracy.jpg
>
>
> The HyperLogLog algorithm used by NDV defaults to a precision of 10.  Being able 
> to set this precision would have two benefits:
> # Lower precision can speed up queries, as a precision of 9 has 
> half as many registers as a precision of 10 (the count is exponential in the 
> precision) and may be just as accurate depending on expected cardinality.
> # Higher precision can help with very large cardinalities (100 million to 
> billion range) and will typically provide more accurate estimates.  Those who are 
> presenting estimates to end users will likely be willing to trade some 
> performance cost for more accuracy, while still outperforming the naive 
> approach by a large margin.
> Propose adding the overloaded function NDV(expression, int precision)
> with accepted range between 4 and 18 inclusive.






[jira] [Commented] (IMPALA-7734) Catalog memz page shows useless memory breakdown

2019-05-14 Thread Bikramjeet Vig (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839697#comment-16839697
 ] 

Bikramjeet Vig commented on IMPALA-7734:


[~ngangam], will do. Thanks

> Catalog memz page shows useless memory breakdown
> 
>
> Key: IMPALA-7734
> URL: https://issues.apache.org/jira/browse/IMPALA-7734
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 3.0, Impala 2.12.0, Impala 3.1.0
>Reporter: Tim Armstrong
>Priority: Minor
>  Labels: newbie, observability, ramp-up, supportability
>
> If you look at catalogd memz, e.g. at localhost:25020/memz, it has a bogus 
> breakdown. The catalogd does not use the MemTracker infrastructure since the 
> vast majority of its memory consumption is in the JVM.
> {noformat}
> Breakdown
> : Total=0 Peak=0
>   Untracked Memory: Total=0
> {noformat}
> Reported by [~alanj_impala_5a78]






[jira] [Commented] (IMPALA-7734) Catalog memz page shows useless memory breakdown

2019-05-14 Thread Naveen Gangam (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839686#comment-16839686
 ] 

Naveen Gangam commented on IMPALA-7734:
---

[~bikramjeet.vig] Sorry, this was meant to be a ramp-up JIRA for me in Impala, 
but I have been reassigned back to Hive. If there are any takers, could you 
please reassign it? Thanks

> Catalog memz page shows useless memory breakdown
> 
>
> Key: IMPALA-7734
> URL: https://issues.apache.org/jira/browse/IMPALA-7734
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 3.0, Impala 2.12.0, Impala 3.1.0
>Reporter: Tim Armstrong
>Priority: Minor
>  Labels: newbie, observability, ramp-up, supportability
>
> If you look at catalogd memz, e.g. at localhost:25020/memz, it has a bogus 
> breakdown. The catalogd does not use the MemTracker infrastructure since the 
> vast majority of its memory consumption is in the JVM.
> {noformat}
> Breakdown
> : Total=0 Peak=0
>   Untracked Memory: Total=0
> {noformat}
> Reported by [~alanj_impala_5a78]






[jira] [Commented] (IMPALA-2658) Extend the NDV function to accept a precision

2019-05-14 Thread Bikramjeet Vig (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-2658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839687#comment-16839687
 ] 

Bikramjeet Vig commented on IMPALA-2658:


Thanks [~PeterEbert]

> Extend the NDV function to accept a precision
> -
>
> Key: IMPALA-2658
> URL: https://issues.apache.org/jira/browse/IMPALA-2658
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 2.2.4
>Reporter: Peter Ebert
>Priority: Minor
>  Labels: ramp-up
> Attachments: Comparison of HLL Memory usage, Query Duration and 
> Accuracy.jpg
>
>
> The HyperLogLog algorithm used by NDV defaults to a precision of 10.  Being able 
> to set this precision would have two benefits:
> # Lower precision can speed up queries, as a precision of 9 has 
> half as many registers as a precision of 10 (the count is exponential in the 
> precision) and may be just as accurate depending on expected cardinality.
> # Higher precision can help with very large cardinalities (100 million to 
> billion range) and will typically provide more accurate estimates.  Those who are 
> presenting estimates to end users will likely be willing to trade some 
> performance cost for more accuracy, while still outperforming the naive 
> approach by a large margin.
> Propose adding the overloaded function NDV(expression, int precision)
> with accepted range between 4 and 18 inclusive.






[jira] [Commented] (IMPALA-8528) Refactor authorization code from AnalysisContext to AuthorizationChecker

2019-05-14 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839685#comment-16839685
 ] 

ASF subversion and git services commented on IMPALA-8528:
-

Commit 5a23bacdba9f199948b6a971aebca30586c360a5 in impala's branch 
refs/heads/master from Fredy Wijaya
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=5a23bac ]

IMPALA-8528: Refactor authorization check in AnalysisContext

This patch moves the authorization check logic from AnalysisContext
into BaseAuthorizationChecker to consolidate the logic into a single
place. This patch also converts AuthorizationChecker into an interface
The existing implementation code of AuthorizationChecker is now moved to
BaseAuthorizationChecker.

This patch has no functionality change.

Testing:
- Ran FE tests
- Ran E2E authorization tests

Change-Id: I3bc3a11220dae0f49ef3e73d9ff27a90e9d4a71c
Reviewed-on: http://gerrit.cloudera.org:8080/13285
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Refactor authorization code from AnalysisContext to AuthorizationChecker
> 
>
> Key: IMPALA-8528
> URL: https://issues.apache.org/jira/browse/IMPALA-8528
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Frontend
>Reporter: Fredy Wijaya
>Assignee: Fredy Wijaya
>Priority: Critical
> Fix For: Impala 3.3.0
>
>
> Currently the authorization code is scattered in few places, such as 
> AnalysisContext and AuthorizationChecker. This makes it difficult to add 
> things such as doing pre and post authorization check for audit logging, etc. 
> We need to consolidate the authorization code into a single place and perhaps 
> make AuthorizationChecker as an interface and create a 
> BaseAuthorizationChecker that contains many useful authorization methods.
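
The shape proposed above (an interface plus a base class holding the shared logic and pre/post hooks) can be sketched as follows. The real code is Java in the Impala frontend; this is only a Python illustration of the pattern, and every method name here, as well as the AllowListChecker provider, is hypothetical.

```python
from abc import ABC, abstractmethod


class AuthorizationChecker(ABC):
    """Interface: the only contract callers depend on."""

    @abstractmethod
    def authorize(self, user, requests):
        """Raise PermissionError if any request is denied."""


class BaseAuthorizationChecker(AuthorizationChecker):
    """Shared flow with overridable pre/post hooks (e.g. for audit logging)."""

    def authorize(self, user, requests):
        self.pre_authorize(user, requests)
        denied = [r for r in requests if not self.has_access(user, r)]
        self.post_authorize(user, requests, denied)
        if denied:
            raise PermissionError("%s denied: %s" % (user, ", ".join(denied)))

    def pre_authorize(self, user, requests):
        pass

    def post_authorize(self, user, requests, denied):
        pass

    @abstractmethod
    def has_access(self, user, request):
        """Return True if the user may perform the request."""


class AllowListChecker(BaseAuthorizationChecker):
    """Hypothetical provider backed by an in-memory allow list."""

    def __init__(self, allowed):
        self._allowed = allowed
        self.audit_log = []

    def has_access(self, user, request):
        return request in self._allowed.get(user, set())

    def post_authorize(self, user, requests, denied):
        # Audit both successful and denied checks in one place.
        self.audit_log.append((user, tuple(requests), tuple(denied)))
```

With the flow in one base class, a provider only has to answer has_access, and audit logging attaches to the hooks rather than to every call site.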






[jira] [Commented] (IMPALA-7734) Catalog memz page shows useless memory breakdown

2019-05-14 Thread Bikramjeet Vig (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839670#comment-16839670
 ] 

Bikramjeet Vig commented on IMPALA-7734:


[~ngangam] [~ychena] Is either of you working on this?

> Catalog memz page shows useless memory breakdown
> 
>
> Key: IMPALA-7734
> URL: https://issues.apache.org/jira/browse/IMPALA-7734
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 3.0, Impala 2.12.0, Impala 3.1.0
>Reporter: Tim Armstrong
>Priority: Minor
>  Labels: newbie, observability, ramp-up, supportability
>
> If you look at catalogd memz, e.g. at localhost:25020/memz, it has a bogus 
> breakdown. The catalogd does not use the MemTracker infrastructure since the 
> vast majority of its memory consumption is in the JVM.
> {noformat}
> Breakdown
> : Total=0 Peak=0
>   Untracked Memory: Total=0
> {noformat}
> Reported by [~alanj_impala_5a78]






[jira] [Updated] (IMPALA-8549) Add support for scanning DEFLATE text files

2019-05-14 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated IMPALA-8549:
-
Labels: ramp-up  (was: )

> Add support for scanning DEFLATE text files
> ---
>
> Key: IMPALA-8549
> URL: https://issues.apache.org/jira/browse/IMPALA-8549
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Sahil Takiar
>Priority: Minor
>  Labels: ramp-up
>
> Several Hadoop tools (e.g. Hive, MapReduce, etc.) support reading and writing 
> text files stored using zlib / deflate (results in files such as 
> {{00_0.deflate}}). Impala currently does not support reading {{.deflate}} 
> files and returns errors such as: {{ERROR: Scanner plugin 'DEFLATE' is not 
> one of the enabled plugins: 'LZO'}}.
> Moreover, the default compression codec in Hadoop is zlib / deflate (see 
> {{o.a.h.io.compress.DefaultCodec}}). So when writing to a text table in Hive, 
> if users set {{hive.exec.compress.output}} to true, then {{.deflate}} files 
> will be written by default.
> Impala does support zlib / deflate with other file formats though: Avro, 
> RCFiles, SequenceFiles (see 
> https://impala.apache.org/docs/build/html/topics/impala_file_formats.html).






[jira] [Commented] (IMPALA-8549) Add support for scanning DEFLATE text files

2019-05-14 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839657#comment-16839657
 ] 

Sahil Takiar commented on IMPALA-8549:
--

Thanks Tim.

Looking through {{be/src/util/codec.cc}} and {{be/src/util/decompress.cc}} it 
seems we already have support for creating {{.deflate}} files; and we have test 
data for {{.deflate}} Avro and Sequence files already.

For reference, as part of adding support for {{.deflate}} files, the following 
test changes need to be made:
* {{TestCompressedFormats}} in {{test_compressed_formats.py}} needs to be 
updated to test text {{.deflate}} files; right now the test says: "# 
Deflate-compressed (['def']) text files (or at least text files with a 
compressed extension) have not been tested yet."
** I think getting this test to work requires adding the database 
{{functional_text_def}} (similar to {{functional_seq_def}}) to the dataload as 
well

> Add support for scanning DEFLATE text files
> ---
>
> Key: IMPALA-8549
> URL: https://issues.apache.org/jira/browse/IMPALA-8549
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Sahil Takiar
>Priority: Minor
>
> Several Hadoop tools (e.g. Hive, MapReduce, etc.) support reading and writing 
> text files stored using zlib / deflate (results in files such as 
> {{00_0.deflate}}). Impala currently does not support reading {{.deflate}} 
> files and returns errors such as: {{ERROR: Scanner plugin 'DEFLATE' is not 
> one of the enabled plugins: 'LZO'}}.
> Moreover, the default compression codec in Hadoop is zlib / deflate (see 
> {{o.a.h.io.compress.DefaultCodec}}). So when writing to a text table in Hive, 
> if users set {{hive.exec.compress.output}} to true, then {{.deflate}} files 
> will be written by default.
> Impala does support zlib / deflate with other file formats though: Avro, 
> RCFiles, SequenceFiles (see 
> https://impala.apache.org/docs/build/html/topics/impala_file_formats.html).






[jira] [Commented] (IMPALA-8473) Refactor lineage publication mechanism to allow for different consumers

2019-05-14 Thread Na Li (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839623#comment-16839623
 ] 

Na Li commented on IMPALA-8473:
---

[~radford-nguyen] please keep your current approach and use an abstract class for 
"ImpalaPostExecHook".

> Refactor lineage publication mechanism to allow for different consumers
> ---
>
> Key: IMPALA-8473
> URL: https://issues.apache.org/jira/browse/IMPALA-8473
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend, Frontend
>Reporter: radford nguyen
>Assignee: radford nguyen
>Priority: Critical
> Attachments: ImpalaPostExecHook-infra.patch
>
>
> Impetus for this change is to allow lineage to be consumed by Atlas via Kafka.
> h3. Design Proposal
> Move lineage logging from be to fe, where we can make use of the same plugin 
> approach as {{authorization_provider}} to allow a downstream user to provide 
> their own lineage consumers as runtime dependencies.
> [~mad...@apache.org] has provided a fe patch (attached) with suggested 
> mechanism for allowing multiple hooks to be registered with the fe.  Hooks 
> would be invoked from the be at appropriate places, e.g. 
> [https://github.com/apache/impala/blob/c1b0a073938c144e9bf33901bd4df6dcda0f09ec/be/src/service/impala-server.cc#L466].
>   The hooks should all be executed asynchronously, so the current thinking is 
> that this execution should happen in the fe, since the be does not know about 
> what hooks are registered.  IOW, the 
> {{ImpalaPostExecHookFactory.executeHooks}} method (see patch) should probably 
> make use of a thread-pool executor service (or something similar) in order to 
> execute all hooks in parallel and in a non-blocking manner, returning to the 
> be asap.
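
The non-blocking execution described above can be sketched with a thread-pool executor. The proposal targets the Java frontend, so this Python version only illustrates the pattern; the HookRunner class and its method names are hypothetical and are not taken from the attached patch.

```python
from concurrent.futures import ThreadPoolExecutor


class HookRunner:
    """Runs registered post-execution hooks on a pool without blocking the caller."""

    def __init__(self, hooks, max_workers=4):
        self._hooks = list(hooks)
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def execute_hooks(self, event):
        """Submit every hook and return immediately with the futures."""
        return [self._pool.submit(self._safe_call, hook, event)
                for hook in self._hooks]

    @staticmethod
    def _safe_call(hook, event):
        # A failing hook must not affect the query or the other hooks.
        try:
            hook(event)
            return True
        except Exception:
            return False

    def shutdown(self):
        """Wait for in-flight hooks to finish and release the threads."""
        self._pool.shutdown(wait=True)
```

The caller gets the futures back right away, so it can return to the backend without waiting, while each hook's failure stays isolated inside its own task.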






[jira] [Updated] (IMPALA-8376) Add per-directory limits for scratch disk usage

2019-05-14 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-8376:
--
Description: I think we'd want to use a similar syntax to the cache sizes 
specified for the data cache.

> Add per-directory limits for scratch disk usage
> ---
>
> Key: IMPALA-8376
> URL: https://issues.apache.org/jira/browse/IMPALA-8376
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Tim Armstrong
>Priority: Major
>  Labels: resource-management
>
> I think we'd want to use a similar syntax to the cache sizes specified for 
> the data cache.






[jira] [Resolved] (IMPALA-8414) Warning caused by not skipping header of /proc/net/dev

2019-05-14 Thread Lars Volker (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Volker resolved IMPALA-8414.
-
   Resolution: Fixed
Fix Version/s: Impala 3.3.0

> Warning caused by not skipping header of /proc/net/dev
> --
>
> Key: IMPALA-8414
> URL: https://issues.apache.org/jira/browse/IMPALA-8414
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.3.0
>Reporter: Lars Volker
>Assignee: Lars Volker
>Priority: Major
> Fix For: Impala 3.3.0
>
>
> [This fix|https://gerrit.cloudera.org/#/c/12954/] for IMPALA-8395 does not 
> skip the first two header lines of /proc/net/dev, causing warnings like this:
> {noformat}
> W0414 17:58:49.836887 32683 system-state-info.cc:192] Failed to parse 
> interface name in line: Inter-|   Receive |  Transmit
> W0414 17:59:49.940279 32683 system-state-info.cc:192] Failed to parse 
> interface name in line: Inter-|   Receive |  Transmit
> W0414 18:00:50.077952 32683 system-state-info.cc:192] Failed to parse 
> interface name in line: Inter-|   Receive |  Transmit
> {noformat}
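
For illustration, /proc/net/dev starts with two header lines, followed by one "name: counters" line per interface. The actual fix lives in the C++ file system-state-info.cc; the sketch below is a Python analogue with a made-up sample and a hypothetical function name, showing the header skip.

```python
SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo: 1000 10 0 0 0 0 0 0 1000 10 0 0 0 0 0 0
  eth0: 5000 50 0 0 0 0 0 0 7000 70 0 0 0 0 0 0
"""


def parse_net_dev(text):
    """Return {interface: (rx_bytes, tx_bytes)}, skipping the two header lines."""
    stats = {}
    for line in text.splitlines()[2:]:
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not an interface line
        fields = rest.split()
        if len(fields) < 16:
            continue  # unexpected layout; ignore rather than warn
        # Field 0 is received bytes, field 8 is transmitted bytes.
        stats[name.strip()] = (int(fields[0]), int(fields[8]))
    return stats


print(parse_net_dev(SAMPLE))
```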








[jira] [Assigned] (IMPALA-8484) Add support to run queries on disjoint executor groups

2019-05-14 Thread Lars Volker (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Volker reassigned IMPALA-8484:
---

Assignee: Lars Volker

> Add support to run queries on disjoint executor groups
> --
>
> Key: IMPALA-8484
> URL: https://issues.apache.org/jira/browse/IMPALA-8484
> Project: IMPALA
>  Issue Type: New Feature
>Affects Versions: Impala 3.3.0
>Reporter: Lars Volker
>Assignee: Lars Volker
>Priority: Major
>  Labels: scalability
>







[jira] [Commented] (IMPALA-8549) Add support for scanning DEFLATE text files

2019-05-14 Thread Tim Armstrong (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839608#comment-16839608
 ] 

Tim Armstrong commented on IMPALA-8549:
---

[~stakiar] the code that marks it as unsupported predates my involvement, so I'm 
not sure why it wasn't supported (I stopped short of shaving that particular 
yak). I believe deflate is a variant of gzip. Looking at the Hadoop code, the 
implementations differ only in headers: 
https://github.com/apache/hadoop/tree/a55d6bba71c81c1c4e9d8cd11f55c78f10a548b0/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress.
 It would make sense to just implement it.
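
To make the "differ only in headers" point concrete, here is a small Python sketch using the standard zlib module. It assumes (my understanding, not something confirmed in this thread) that {{.deflate}} files carry the zlib wrapper while {{.gz}} files carry the gzip wrapper. The helper name deflate_with and the payload are made up for the example.

```python
import zlib


def deflate_with(wbits, payload):
    """Compress payload with DEFLATE, choosing the wrapper via wbits."""
    comp = zlib.compressobj(level=6, method=zlib.DEFLATED, wbits=wbits)
    return comp.compress(payload) + comp.flush()


payload = b"a,b,c\n1,2,3\n" * 100

zlib_bytes = deflate_with(15, payload)  # zlib wrapper (RFC 1950)
gzip_bytes = deflate_with(31, payload)  # gzip wrapper (RFC 1952)

# The leading bytes identify the wrapper: 0x78 for zlib, 0x1f 0x8b for gzip.
print(zlib_bytes[:2].hex(), gzip_bytes[:2].hex())

# wbits=47 (32 + 15) asks zlib to auto-detect either wrapper, so one
# decompression path can handle both kinds of file.
assert zlib.decompress(zlib_bytes, wbits=47) == payload
assert zlib.decompress(gzip_bytes, wbits=47) == payload
```

If the assumption holds, a text scanner that already handles gzip would mostly need to accept the other wrapper rather than a new compression algorithm.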

> Add support for scanning DEFLATE text files
> ---
>
> Key: IMPALA-8549
> URL: https://issues.apache.org/jira/browse/IMPALA-8549
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Sahil Takiar
>Priority: Minor
>
> Several Hadoop tools (e.g. Hive, MapReduce, etc.) support reading and writing 
> text files stored using zlib / deflate (results in files such as 
> {{00_0.deflate}}). Impala currently does not support reading {{.deflate}} 
> files and returns errors such as: {{ERROR: Scanner plugin 'DEFLATE' is not 
> one of the enabled plugins: 'LZO'}}.
> Moreover, the default compression codec in Hadoop is zlib / deflate (see 
> {{o.a.h.io.compress.DefaultCodec}}). So when writing to a text table in Hive, 
> if users set {{hive.exec.compress.output}} to true, then {{.deflate}} files 
> will be written by default.
> Impala does support zlib / deflate with other file formats though: Avro, 
> RCFiles, SequenceFiles (see 
> https://impala.apache.org/docs/build/html/topics/impala_file_formats.html).






[jira] [Work stopped] (IMPALA-7613) Support round(DECIMAL) with non-constant second argument

2019-05-14 Thread Abhishek Rawat (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-7613 stopped by Abhishek Rawat.
--
> Support round(DECIMAL) with non-constant second argument
> 
>
> Key: IMPALA-7613
> URL: https://issues.apache.org/jira/browse/IMPALA-7613
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 3.0, Impala 2.12.0
>Reporter: Tim Armstrong
>Assignee: Abhishek Rawat
>Priority: Major
>  Labels: decimal, ramp-up
>
> Sometimes users want to round to a precision that is data-driven (e.g. using 
> a lookup table). They can't currently do this with decimal. I think we could 
> support this by just using the input decimal type as the output type when the 
> second argument is non-constant.
> {noformat}
> select round(l_tax, l_linenumber) from tpch.lineitem limit 5;
> Query: select round(l_tax, l_linenumber) from tpch.lineitem limit 5
> Query submitted at: 2018-09-24 11:03:10 (Coordinator: 
> http://tarmstrong-box:25000)
> ERROR: AnalysisException: round() must be called with a constant second 
> argument.
> {noformat}
> Motivated by a user trying to do something like this: 
> http://community.cloudera.com/t5/Interactive-Short-cycle-SQL/Impala-round-function-does-not-return-expected-result/m-p/80200#M4906
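
A rough Python sketch of the proposal, for discussion only: round to a data-driven number of digits, then express the result at the input's own scale so the output type does not depend on the second argument. The function name round_keep_type, the explicit scale parameter, and the half-up rounding mode are assumptions for this example, not Impala's implementation.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_keep_type(value, digits, scale):
    """Round value to `digits` fractional digits, keeping `scale` digits."""
    # Round to the requested (possibly per-row) number of digits.
    rounded = value.quantize(Decimal(1).scaleb(-digits), rounding=ROUND_HALF_UP)
    # Re-express the result at the input scale so every row has one type.
    return rounded.quantize(Decimal(1).scaleb(-scale))


# Hypothetical rows of (value, digits); the input scale is assumed to be 2.
rows = [(Decimal("0.07"), 1), (Decimal("0.04"), 1), (Decimal("0.07"), 2)]
for value, digits in rows:
    print(value, digits, round_keep_type(value, digits, scale=2))
```

The trade-off is that rounding to fewer digits leaves trailing zeros in the result, which seems acceptable if the alternative is rejecting the query.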






[jira] [Work started] (IMPALA-8537) Negative values reported for tmp-file-mgr.scratch-space-bytes-used under heavy spilling load

2019-05-14 Thread Abhishek Rawat (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-8537 started by Abhishek Rawat.
--
> Negative values reported for tmp-file-mgr.scratch-space-bytes-used under 
> heavy spilling load
> 
>
> Key: IMPALA-8537
> URL: https://issues.apache.org/jira/browse/IMPALA-8537
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.3.0
>Reporter: David Rorke
>Assignee: Abhishek Rawat
>Priority: Major
> Attachments: bad_spill_metrics.json
>
>
> I'm running a workload that does a lot of spilling and noticed the value 
> reported for tmp-file-mgr.scratch-space-bytes-used is negative on all nodes.
> Some details of the workload and cluster configuration:
>  * Generating a 10 TB TPC-DS partitioned parquet data set (very large sort).
>  * 30 impalads, each with 48 GB RAM and 14 scratch directories (each on a 
> separate drive)
>  * Rough estimate (based on query metrics) of total cumulative aggregate 
> memory spilled across the cluster since restart is 6.5 TB.
> Snapshot of the bad metrics attached.






[jira] [Resolved] (IMPALA-8527) Maven hangs on jenkins.impala.io talking to repository.apache.org

2019-05-14 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-8527.
---
Resolution: Fixed

> Maven hangs on jenkins.impala.io talking to repository.apache.org
> -
>
> Key: IMPALA-8527
> URL: https://issues.apache.org/jira/browse/IMPALA-8527
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.3.0
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Blocker
>  Labels: broken-build
> Fix For: Impala 3.3.0
>
>
> We're seeing most precommit builds failing because mvn gets stuck talking to 
> repository.apache.org. See IMPALA-8516.
> I'm going to see if we can avoid it by pruning down our Maven repository 
> dependencies - we should be able to get all the artifacts from other mirrors.






[jira] [Work started] (IMPALA-8450) Add support for zstd and lz4 in parquet

2019-05-14 Thread Abhishek Rawat (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-8450 started by Abhishek Rawat.
--
> Add support for zstd and lz4 in parquet
> ---
>
> Key: IMPALA-8450
> URL: https://issues.apache.org/jira/browse/IMPALA-8450
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Abhishek Rawat
>Priority: Major
>  Labels: parquet
>
> PARQUET-970 added these codecs to the format. We have LZ4 in the toolchain 
> already and I just added zstd: https://gerrit.cloudera.org/#/c/13079/
> These codecs probably offer a better trade-off of density and speed than 
> snappy or gzip.
> https://github.com/apache/arrow/pull/807/files might be a useful crib sheet 
> for how to add a compressor.






[jira] [Assigned] (IMPALA-8548) Include Documentation About Ordinal Substitution

2019-05-14 Thread Alex Rodoni (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Rodoni reassigned IMPALA-8548:
---

Assignee: Alex Rodoni

> Include Documentation About Ordinal Substitution 
> -
>
> Key: IMPALA-8548
> URL: https://issues.apache.org/jira/browse/IMPALA-8548
> Project: IMPALA
>  Issue Type: Documentation
>  Components: Docs
>Affects Versions: Impala 2.0, Impala 3.0
>Reporter: David Mollitor
>Assignee: Alex Rodoni
>Priority: Minor
>
> Update Impala docs to include information on the 'ordinal substitution' 
> feature.
>  
> [https://github.com/apache/impala/blob/master/docs/shared/impala_common.xml#L1104]






[jira] [Commented] (IMPALA-8549) Add support for scanning DEFLATE text files

2019-05-14 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839558#comment-16839558
 ] 

Sahil Takiar commented on IMPALA-8549:
--

CC: [~tarmstr...@cloudera.com] I think you might be more familiar with this 
area than I am, so just wondering if I am missing something here.

> Add support for scanning DEFLATE text files
> ---
>
> Key: IMPALA-8549
> URL: https://issues.apache.org/jira/browse/IMPALA-8549
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Sahil Takiar
>Priority: Minor
>
> Several Hadoop tools (e.g. Hive, MapReduce, etc.) support reading and writing 
> text files stored using zlib / deflate (results in files such as 
> {{00_0.deflate}}). Impala currently does not support reading {{.deflate}} 
> files and returns errors such as: {{ERROR: Scanner plugin 'DEFLATE' is not 
> one of the enabled plugins: 'LZO'}}.
> Moreover, the default compression codec in Hadoop is zlib / deflate (see 
> {{o.a.h.io.compress.DefaultCodec}}). So when writing to a text table in Hive, 
> if users set {{hive.exec.compress.output}} to true, then {{.deflate}} files 
> will be written by default.
> Impala does support zlib / deflate with other file formats though: Avro, 
> RCFiles, SequenceFiles (see 
> https://impala.apache.org/docs/build/html/topics/impala_file_formats.html).






[jira] [Updated] (IMPALA-8549) Add support for scanning DEFLATE text files

2019-05-14 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated IMPALA-8549:
-
Priority: Minor  (was: Major)

> Add support for scanning DEFLATE text files
> ---
>
> Key: IMPALA-8549
> URL: https://issues.apache.org/jira/browse/IMPALA-8549
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Sahil Takiar
>Priority: Minor
>
> Several Hadoop tools (e.g. Hive, MapReduce, etc.) support reading and writing 
> text files stored using zlib / deflate (results in files such as 
> {{00_0.deflate}}). Impala currently does not support reading {{.deflate}} 
> files and returns errors such as: {{ERROR: Scanner plugin 'DEFLATE' is not 
> one of the enabled plugins: 'LZO'}}.
> Moreover, the default compression codec in Hadoop is zlib / deflate (see 
> {{o.a.h.io.compress.DefaultCodec}}). So when writing to a text table in Hive, 
> if users set {{hive.exec.compress.output}} to true, then {{.deflate}} files 
> will be written by default.
> Impala does support zlib / deflate with other file formats though: Avro, 
> RCFiles, SequenceFiles (see 
> https://impala.apache.org/docs/build/html/topics/impala_file_formats.html).






[jira] [Resolved] (IMPALA-8528) Refactor authorization code from AnalysisContext to AuthorizationChecker

2019-05-14 Thread Fredy Wijaya (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fredy Wijaya resolved IMPALA-8528.
--
   Resolution: Fixed
Fix Version/s: Impala 3.3.0

> Refactor authorization code from AnalysisContext to AuthorizationChecker
> 
>
> Key: IMPALA-8528
> URL: https://issues.apache.org/jira/browse/IMPALA-8528
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Frontend
>Reporter: Fredy Wijaya
>Assignee: Fredy Wijaya
>Priority: Critical
> Fix For: Impala 3.3.0
>
>
> Currently the authorization code is scattered in a few places, such as 
> AnalysisContext and AuthorizationChecker. This makes it difficult to add 
> things such as pre- and post-authorization checks for audit logging, etc. 
> We need to consolidate the authorization code into a single place and perhaps 
> make AuthorizationChecker an interface and create a 
> BaseAuthorizationChecker that contains many useful authorization methods.
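
As a sketch of the shape described above (an interface plus a base class with shared methods and pre/post hooks), here is a small Python illustration. Impala's frontend is Java and this is not its code; the method names authorize, pre_authorize, post_authorize and has_access, and the AllowListChecker class, are all hypothetical.

```python
from abc import ABC, abstractmethod


class AuthorizationChecker(ABC):
    """The interface: callers only depend on authorize()."""

    @abstractmethod
    def authorize(self, user, privilege, resource):
        ...


class BaseAuthorizationChecker(AuthorizationChecker):
    """Shared logic: runs pre/post hooks around the actual access check."""

    def __init__(self):
        self.audit_log = []

    def authorize(self, user, privilege, resource):
        self.pre_authorize(user, privilege, resource)
        allowed = self.has_access(user, privilege, resource)
        self.post_authorize(user, privilege, resource, allowed)
        return allowed

    def pre_authorize(self, user, privilege, resource):
        # Hook for audit logging before the check.
        self.audit_log.append(("pre", user, privilege, resource))

    def post_authorize(self, user, privilege, resource, allowed):
        # Hook for audit logging after the check, including the outcome.
        self.audit_log.append(("post", user, privilege, resource, allowed))

    @abstractmethod
    def has_access(self, user, privilege, resource):
        ...


class AllowListChecker(BaseAuthorizationChecker):
    """Toy provider backed by an in-memory set of grants."""

    def __init__(self, grants):
        super().__init__()
        self.grants = set(grants)

    def has_access(self, user, privilege, resource):
        return (user, privilege, resource) in self.grants


checker = AllowListChecker({("alice", "SELECT", "db.t")})
print(checker.authorize("alice", "SELECT", "db.t"))
print(checker.authorize("bob", "SELECT", "db.t"))
```

Keeping the hooks in the base class means each provider only supplies has_access, and audit logging is written once.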






[jira] [Created] (IMPALA-8549) Add support for scanning DEFLATE text files

2019-05-14 Thread Sahil Takiar (JIRA)
Sahil Takiar created IMPALA-8549:


 Summary: Add support for scanning DEFLATE text files
 Key: IMPALA-8549
 URL: https://issues.apache.org/jira/browse/IMPALA-8549
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend
Reporter: Sahil Takiar


Several Hadoop tools (e.g. Hive, MapReduce, etc.) support reading and writing 
text files stored using zlib / deflate (results in files such as 
{{00_0.deflate}}). Impala currently does not support reading {{.deflate}} 
files and returns errors such as: {{ERROR: Scanner plugin 'DEFLATE' is not one 
of the enabled plugins: 'LZO'}}.

Moreover, the default compression codec in Hadoop is zlib / deflate (see 
{{o.a.h.io.compress.DefaultCodec}}). So when writing to a text table in Hive, 
if users set {{hive.exec.compress.output}} to true, then {{.deflate}} files 
will be written by default.

Impala does support zlib / deflate with other file formats though: Avro, 
RCFiles, SequenceFiles (see 
https://impala.apache.org/docs/build/html/topics/impala_file_formats.html).






[jira] [Created] (IMPALA-8548) Include Documentation About Ordinal Substitution

2019-05-14 Thread David Mollitor (JIRA)
David Mollitor created IMPALA-8548:
--

 Summary: Include Documentation About Ordinal Substitution 
 Key: IMPALA-8548
 URL: https://issues.apache.org/jira/browse/IMPALA-8548
 Project: IMPALA
  Issue Type: Documentation
  Components: Docs
Affects Versions: Impala 3.0, Impala 2.0
Reporter: David Mollitor


Update Impala docs to include information on the 'ordinal substitution' feature.

 

[https://github.com/apache/impala/blob/master/docs/shared/impala_common.xml#L1104]






[jira] [Commented] (IMPALA-7107) [DOCS] Review docs for storage formats impala cannot insert into

2019-05-14 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839526#comment-16839526
 ] 

Sahil Takiar commented on IMPALA-7107:
--

[~arodoni_cloudera] I suggest we add it back in for now. You can't actually 
query {{.deflate}} text files in Impala.

> [DOCS] Review docs for storage formats impala cannot insert into
> 
>
> Key: IMPALA-7107
> URL: https://issues.apache.org/jira/browse/IMPALA-7107
> Project: IMPALA
>  Issue Type: Bug
>  Components: Docs
>Affects Versions: Impala 2.12.0
>Reporter: Balazs Jeszenszky
>Assignee: Alex Rodoni
>Priority: Minor
> Fix For: Impala 3.2.0
>
>
> There are several points to clear up or improve across these pages:
> * I'd refer to the Hive documentation on how to set compression codecs 
> instead of documenting Hive's behaviour for file formats Impala cannot write
> * Add 'Ingesting file formats Impala can't write' section to 'How Impala 
> Works with Hadoop File Formats' page, link that central location from 
> wherever applicable. Unify the recommendation on data loading (usage of LOAD 
> DATA or hive or manual copy).
> * add a compatibility matrix for compressions and file formats, clear up 
> compatibility on 'How Impala Works with Hadoop File Formats' (the page is 
> inconsistent even within itself, e.g. bzip2).
> * Remove references to Impala versions <2.0






[jira] [Updated] (IMPALA-8548) Include Documentation About Ordinal Substitution

2019-05-14 Thread David Mollitor (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated IMPALA-8548:
---
Priority: Minor  (was: Major)

> Include Documentation About Ordinal Substitution 
> -
>
> Key: IMPALA-8548
> URL: https://issues.apache.org/jira/browse/IMPALA-8548
> Project: IMPALA
>  Issue Type: Documentation
>  Components: Docs
>Affects Versions: Impala 2.0, Impala 3.0
>Reporter: David Mollitor
>Priority: Minor
>
> Update Impala docs to include information on the 'ordinal substitution' 
> feature.
>  
> [https://github.com/apache/impala/blob/master/docs/shared/impala_common.xml#L1104]






[jira] [Commented] (IMPALA-7107) [DOCS] Review docs for storage formats impala cannot insert into

2019-05-14 Thread Balazs Jeszenszky (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839489#comment-16839489
 ] 

Balazs Jeszenszky commented on IMPALA-7107:
---

That's right. Sorry, I misunderstood your original comment. Disregard.

> [DOCS] Review docs for storage formats impala cannot insert into
> 
>
> Key: IMPALA-7107
> URL: https://issues.apache.org/jira/browse/IMPALA-7107
> Project: IMPALA
>  Issue Type: Bug
>  Components: Docs
>Affects Versions: Impala 2.12.0
>Reporter: Balazs Jeszenszky
>Assignee: Alex Rodoni
>Priority: Minor
> Fix For: Impala 3.2.0
>
>
> There are several points to clear up or improve across these pages:
> * I'd refer to the Hive documentation on how to set compression codecs 
> instead of documenting Hive's behaviour for file formats Impala cannot write
> * Add 'Ingesting file formats Impala can't write' section to 'How Impala 
> Works with Hadoop File Formats' page, link that central location from 
> wherever applicable. Unify the recommendation on data loading (usage of LOAD 
> DATA or hive or manual copy).
> * add a compatibility matrix for compressions and file formats, clear up 
> compatibility on 'How Impala Works with Hadoop File Formats' (the page is 
> inconsistent even within itself, e.g. bzip2).
> * Remove references to Impala versions <2.0






[jira] [Commented] (IMPALA-8490) Impala Doc: the file handle cache now supports S3

2019-05-14 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839475#comment-16839475
 ] 

Sahil Takiar commented on IMPALA-8490:
--

[~arodoni_cloudera] IMPALA-8428 has been merged now, so this should be 
unblocked.

> Impala Doc: the file handle cache now supports S3
> -
>
> Key: IMPALA-8490
> URL: https://issues.apache.org/jira/browse/IMPALA-8490
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: Sahil Takiar
>Assignee: Alex Rodoni
>Priority: Major
>  Labels: future_release_doc, in_33
>
> https://impala.apache.org/docs/build/html/topics/impala_scalability.html 
> states:
> {quote}
> Because this feature only involves HDFS data files, it does not apply to 
> non-HDFS tables, such as Kudu or HBase tables, or tables that store their 
> data on cloud services such as S3 or ADLS.
> {quote}
> This section should be updated because the file handle cache now supports S3 
> files.
> We should add a section to the docs similar to what we added when support for 
> remote HDFS files was added to the file handle cache:
> {quote}
> In Impala 3.2 and higher, file handle caching also applies to remote HDFS 
> file handles. This is controlled by the cache_remote_file_handles flag for an 
> impalad. It is recommended that you use the default value of true as this 
> caching prevents your NameNode from overloading when your cluster has many 
> remote HDFS reads.
> {quote}
> Like {{cache_remote_file_handles}}, the flag {{cache_s3_file_handles}} has 
> been added as an impalad startup option (the flag is enabled by default).
> Unlike HDFS, though, S3 has no NameNode; the benefit of caching is that it 
> eliminates a call to {{getFileStatus()}} on the target S3 file. So "prevents 
> your NameNode from overloading when your cluster has many remote HDFS reads" 
> should be changed to something like "avoids an unnecessary call to 
> S3AFileSystem#getFileStatus(), which reduces the number of API calls made to 
> S3."






[jira] [Resolved] (IMPALA-8428) Add support for caching file handles on s3

2019-05-14 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-8428.
--
   Resolution: Fixed
Fix Version/s: Impala 3.3.0

> Add support for caching file handles on s3
> --
>
> Key: IMPALA-8428
> URL: https://issues.apache.org/jira/browse/IMPALA-8428
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 3.3.0
>Reporter: Joe McDonnell
>Assignee: Sahil Takiar
>Priority: Critical
> Fix For: Impala 3.3.0
>
>
> The file handle cache is currently disabled for S3, as the S3 connector 
> needed to implement proper unbuffer support. Now that 
> https://issues.apache.org/jira/browse/HADOOP-14747 is fixed, Impala should 
> provide an option to cache S3 file handles.
> This is particularly important for data caching, as accessing the data cache 
> happens after obtaining a file handle. If getting a file handle is slow, the 
> caching will be less effective.






[jira] [Commented] (IMPALA-2658) Extend the NDV function to accept a precision

2019-05-14 Thread Peter Ebert (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-2658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839440#comment-16839440
 ] 

Peter Ebert commented on IMPALA-2658:
-

Based on the research papers 
[http://algo.inria.fr/flajolet/Publications/FlMa85.pdf] and 
[https://ai.google/research/pubs/pub40671] 

It has been some time since I read them, but I do not recall seeing precisions 
outside of that range in the research.  Above 18 the register size becomes 
quite large, and below 4 the accuracy was very low.

> Extend the NDV function to accept a precision
> -
>
> Key: IMPALA-2658
> URL: https://issues.apache.org/jira/browse/IMPALA-2658
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 2.2.4
>Reporter: Peter Ebert
>Priority: Minor
>  Labels: ramp-up
> Attachments: Comparison of HLL Memory usage, Query Duration and 
> Accuracy.jpg
>
>
> The HyperLogLog algorithm used by NDV defaults to a precision of 10.  Being able 
> to set this precision would have two benefits:
> # Lower precision can speed up performance, as a precision of 9 has 
> 1/2 the number of registers as 10 (exponential) and may be just as accurate 
> depending on expected cardinality.
> # Higher precision can help with very large cardinalities (100 million to 
> billion range) and will typically provide more accurate data.  Those who are 
> presenting estimates to end users will likely be willing to trade some 
> performance cost for more accuracy, while still outperforming the naive 
> approach by a large margin.
> Propose adding the overloaded function NDV(expression, int precision)
> with accepted range between 4 and 18 inclusive.
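
To put numbers on the trade-off, here is a short Python sketch. It assumes the usual HyperLogLog relationships, which are not stated in this issue: a precision p uses 2^p registers, and the relative standard error is roughly 1.04 / sqrt(2^p). The function names are made up for the example, and Impala's NDV may differ in its details.

```python
import math


def hll_registers(precision):
    """Number of registers for a given precision (2**precision)."""
    return 1 << precision


def hll_std_error(precision):
    """Approximate relative standard error, 1.04 / sqrt(registers)."""
    return 1.04 / math.sqrt(hll_registers(precision))


# Walk the proposed accepted range, 4 to 18 inclusive.
for p in range(4, 19):
    print(p, hll_registers(p), round(hll_std_error(p) * 100, 3))
```

Each extra bit of precision doubles the register count but only improves the error by a factor of about 1.41, which is why the useful range is fairly narrow.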






[jira] [Updated] (IMPALA-8547) get_json_object fails to get value for numeric key

2019-05-14 Thread Eugene Zimichev (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Zimichev updated IMPALA-8547:

Labels: built-in-function  (was: )

> get_json_object fails to get value for numeric key
> --
>
> Key: IMPALA-8547
> URL: https://issues.apache.org/jira/browse/IMPALA-8547
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.1.0
>Reporter: Eugene Zimichev
>Priority: Minor
>  Labels: built-in-function
>
> {code:java}
> select get_json_object('{"1": 5}', '$.1');
> {code}
> returns error:
>  
> {code:java}
> "Expected key at position 2"
> {code}
>  
> I guess it's caused by using the function FindEndOfIdentifier, which expects 
> the first symbol of the key to be a letter.
> The Hive version of get_json_object works fine in this case.






[jira] [Updated] (IMPALA-8547) get_json_object fails to get value for numeric key

2019-05-14 Thread Eugene Zimichev (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Zimichev updated IMPALA-8547:

Component/s: Backend

> get_json_object fails to get value for numeric key
> --
>
> Key: IMPALA-8547
> URL: https://issues.apache.org/jira/browse/IMPALA-8547
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.1.0
>Reporter: Eugene Zimichev
>Priority: Minor
>
> {code:java}
> select get_json_object('{"1": 5}', '$.1');
> {code}
> returns error:
>  
> {code:java}
> "Expected key at position 2"
> {code}
>  
> I guess it's caused by using the function FindEndOfIdentifier, which expects 
> the first symbol of the key to be a letter.
> The Hive version of get_json_object works fine in this case.






[jira] [Created] (IMPALA-8547) get_json_object fails to get value for numeric key

2019-05-14 Thread Eugene Zimichev (JIRA)
Eugene Zimichev created IMPALA-8547:
---

 Summary: get_json_object fails to get value for numeric key
 Key: IMPALA-8547
 URL: https://issues.apache.org/jira/browse/IMPALA-8547
 Project: IMPALA
  Issue Type: Bug
Affects Versions: Impala 3.1.0
Reporter: Eugene Zimichev


{code:java}
select get_json_object('{"1": 5}', '$.1');
{code}

returns error:
 
{code:java}
"Expected key at position 2"
{code}
 
I guess it's caused by using the function FindEndOfIdentifier, which expects 
the first symbol of the key to be a letter.
The Hive version of get_json_object works fine in this case.
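
To illustrate the guess above, here is a small Python sketch of a key scanner that accepts a digit as the first character. It is not Impala's FindEndOfIdentifier; find_end_of_key and first_key are hypothetical names, and the real path grammar (wildcards, array indexes, quoting) is ignored here.

```python
import json


def find_end_of_key(path, start):
    """Return the index just past the key that begins at `start`."""
    end = start
    # Letters, digits and underscore are all allowed, including as the
    # first character, so a key such as "1" is accepted.
    while end < len(path) and (path[end].isalnum() or path[end] == "_"):
        end += 1
    return end


def first_key(path):
    """Extract the first key from a path of the form '$.key...'."""
    if not path.startswith("$."):
        raise ValueError("path must start with '$.'")
    end = find_end_of_key(path, 2)
    if end == 2:
        raise ValueError("Expected key at position 2")
    return path[2:end]


doc = json.loads('{"1": 5}')
print(doc[first_key("$.1")])
```

A scanner that insisted on a leading letter would stop at position 2 for '$.1' and report the error quoted above.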



[jira] [Created] (IMPALA-8546) Collect logs from docker containers in tests

2019-05-14 Thread Tim Armstrong (JIRA)
Tim Armstrong created IMPALA-8546:
-

 Summary: Collect logs from docker containers in tests
 Key: IMPALA-8546
 URL: https://issues.apache.org/jira/browse/IMPALA-8546
 Project: IMPALA
  Issue Type: Sub-task
  Components: Infrastructure
Affects Versions: Impala 3.3.0
Reporter: Tim Armstrong
Assignee: Tim Armstrong


We should collect the logs from the cluster processes into the logs/ 
subdirectory for debugging purposes.
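
One possible shape for this, as a rough Python sketch under stated assumptions: collect_container_logs is a hypothetical helper, the container names are supplied by the caller, and the run parameter exists only so the Docker call can be swapped out in tests. It relies on the standard "docker logs <container>" command and writes one file per container.

```python
import os
import subprocess


def collect_container_logs(container_names, log_dir, run=subprocess.run):
    """Write the output of `docker logs` for each container into log_dir.

    Hypothetical helper for illustration. Returns the list of files written.
    """
    os.makedirs(log_dir, exist_ok=True)
    written = []
    for name in container_names:
        path = os.path.join(log_dir, name + ".log")
        with open(path, "wb") as out:
            # Capture both stdout and stderr of the container in one file.
            # check=False: a missing container should not abort the rest.
            run(["docker", "logs", name],
                stdout=out, stderr=subprocess.STDOUT, check=False)
        written.append(path)
    return written
```

A caller would pass something like collect_container_logs(names, "logs/docker"), where the subdirectory name is again only an example.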



[jira] [Resolved] (IMPALA-8515) Test impala-shell distribution instead of special dev environment version

2019-05-14 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-8515.
---
   Resolution: Fixed
Fix Version/s: Impala 3.3.0

> Test impala-shell distribution instead of special dev environment version
> -
>
> Key: IMPALA-8515
> URL: https://issues.apache.org/jira/browse/IMPALA-8515
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Infrastructure
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
> Fix For: Impala 3.3.0
>
>
> Impala shell tests use bin/impala-shell.sh, which uses impala-python and 
> various dev-environment specific infrastructure to run impala-shell. We also 
> build a shell tarball, which is meant to be a self-contained version of the 
> shell with all dependencies.
> In principle it's better to test the build artifacts rather than the 
> development environment. Therefore for full builds, where we build the 
> tarball, we should test the contents of the tarball including the bundled 
> libraries.
> For remote cluster tests, we can continue to use the dev environment (since 
> we don't necessarily build the shell tarball there).



[jira] [Commented] (IMPALA-8515) Test impala-shell distribution instead of special dev environment version

2019-05-14 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839115#comment-16839115
 ] 

ASF subversion and git services commented on IMPALA-8515:
-

Commit b55d905322db017a11b5424da9c26c8d43aebb4c in impala's branch 
refs/heads/master from Tim Armstrong
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=b55d905 ]

IMPALA-8515: port shell tests to use shell build

shell/make_shell_tarball.sh builds a tarball with all the
shell dependencies bundled. We should test the contents of
that tarball in the shell tests instead of using infra/python/env
and the libraries bundled there.

This tarball is one of the default targets (e.g. run by buildall.sh) so
this should not affect any typical development workflows.

Note that this means the shell tests now require the shell tarball to
be built locally, which doesn't necessarily happen for remote cluster
tests, so we preserve the old behaviour in that case.

Testing:
Ran core tests on CentOS 6 and CentOS 7.

Change-Id: I581363639b279a9c2ff1fd982bdb140260b24baa
Reviewed-on: http://gerrit.cloudera.org:8080/13267
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
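
The selection described in this commit message could look roughly like the following Python sketch. pick_shell_executable and its parameters are hypothetical and are not taken from the actual change; the caller passes in both paths, so no particular tarball layout is assumed.

```python
import os


def pick_shell_executable(dev_shell, built_shell, remote_cluster):
    """Choose which impala-shell launcher the tests should invoke.

    dev_shell:      path to the dev-environment wrapper (bin/impala-shell.sh).
    built_shell:    path to the launcher unpacked from the shell tarball.
    remote_cluster: True when running remote cluster tests, where the
                    tarball is not necessarily built.
    """
    if remote_cluster:
        # Preserve the old behaviour: use the dev environment.
        return dev_shell
    if not os.path.isfile(built_shell):
        # Local runs now depend on the tarball having been built.
        raise RuntimeError("shell build not found: %s" % built_shell)
    return built_shell
```

Failing loudly when the build artifact is missing keeps a local run from silently falling back to the dev environment and testing the wrong thing.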


> Test impala-shell distribution instead of special dev environment version
> -
>
> Key: IMPALA-8515
> URL: https://issues.apache.org/jira/browse/IMPALA-8515
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Infrastructure
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
>
> Impala shell tests use bin/impala-shell.sh, which uses impala-python and 
> various dev-environment specific infrastructure to run impala-shell. We also 
> build a shell tarball, which is meant to be a self-contained version of the 
> shell with all dependencies.
> In principle it's better to test the build artifacts rather than the 
> development environment. Therefore for full builds, where we build the 
> tarball, we should test the contents of the tarball including the bundled 
> libraries.
> For remote cluster tests, we can continue to use the dev environment (since 
> we don't necessarily build the shell tarball there).


