[jira] [Resolved] (IMPALA-7934) Switch to using Java 8's Base64 impl for incremental stats encoding

2019-01-31 Thread Fredy Wijaya (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fredy Wijaya resolved IMPALA-7934.
--
   Resolution: Fixed
Fix Version/s: Impala 3.2.0

> Switch to using Java 8's Base64 impl for incremental stats encoding
> ---
>
> Key: IMPALA-7934
> URL: https://issues.apache.org/jira/browse/IMPALA-7934
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 3.1.0
>Reporter: bharath v
>Assignee: Fredy Wijaya
>Priority: Major
>  Labels: ramp-up
> Fix For: Impala 3.2.0
>
> Attachments: base64.png
>
>
> Incremental stats are compressed and Base64 encoded before they are chunked 
> and written to the HMS' partition parameters map. When they are read back, we 
> need to Base64 decode and decompress. 
> For certain incremental-stats-heavy tables, we noticed that a significant 
> amount of time is spent in these Base64 classes (see the attached image for 
> the stack; unfortunately, I don't have a text version of it).
> Java 8 ships its own Base64 implementation, which has shown much better 
> performance [1] than Apache commons-codec's, so we should consider 
> switching to it.
>  [1] http://java-performance.info/base64-encoding-and-decoding-performance/
>  
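
A minimal sketch of the proposed switch (helper names are illustrative, not the
actual catalog code paths); Java 8's {{java.util.Base64}} covers both directions
used here:
{code:java}
import java.util.Base64;

public class IncrementalStatsCodecSketch {
  // commons-codec equivalent: Base64.encodeBase64String(compressedStats)
  static String encode(byte[] compressedStats) {
    return Base64.getEncoder().encodeToString(compressedStats);
  }

  // commons-codec equivalent: Base64.decodeBase64(encodedChunk)
  static byte[] decode(String encodedChunk) {
    return Base64.getDecoder().decode(encodedChunk);
  }
}
{code}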



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Closed] (IMPALA-6632) Document compatibility of table and column stats between Impala and Hive

2019-01-31 Thread Alex Rodoni (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Rodoni closed IMPALA-6632.
---
Resolution: Won't Fix

> Document compatibility of table and column stats between Impala and Hive
> 
>
> Key: IMPALA-6632
> URL: https://issues.apache.org/jira/browse/IMPALA-6632
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Docs
>Reporter: Alexander Behm
>Assignee: Alex Rodoni
>Priority: Major
>
> The question of compatibility of table and column stats between Hive and 
> Impala comes up quite often, so it is worth documenting explicitly.
> Quoting myself from a recent discussion thread to get the docs effort started:
> Commonalities:
> - Hive and Impala both store row counts at the table level and partition 
> level. Hive also computes and stores additional stats like file counts which 
> Impala does not need or use.
> Differences:
> - Impala computes and stores column-level stats like the number of distinct 
> values (NDV) only at the table level, and not at the partition level.
> - Hive computes and stores column-level stats at the partition level. Impala 
> does not follow this approach because the per-partition NDVs cannot be 
> sensibly combined for queries that access multiple partitions. In short, the 
> column stats for partitioned tables are not compatible between Impala and 
> Hive (because imo Hive's approach does not make sense).
> - Impala uses a more modern and tuned algorithm (HyperLogLog++) for 
> estimating the number of distinct values, so they tend to be more accurate 
> than Hive's. Your mileage may vary.
> - For unpartitioned tables, the Hive and Impala column stats are compatible.
> For partitioned tables, the table-level column stats that Impala writes in 
> the Metastore are stored just like for unpartitioned tables. These statistics 
> are "available" to Hive in the sense that the standard retrieval APIs will 
> work as expected.  My understanding is that for partitioned tables, Hive does 
> not use the table-level column stats, but instead expects partition-level 
> column stats. As I've said before, these partition-level column stats do not 
> make any sense because it is not possible to sensibly combine them for 
> multiple partitions.
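
A toy illustration of the overcounting mentioned above (not Impala code; it
only shows why per-partition NDVs cannot simply be summed):
{code:java}
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class NdvCombineExample {
  public static void main(String[] args) {
    // Two partitions of a hypothetical column; both contain the value 3.
    List<Integer> part1 = Arrays.asList(1, 2, 3);
    List<Integer> part2 = Arrays.asList(3, 4, 5);

    int ndv1 = new HashSet<>(part1).size();  // 3
    int ndv2 = new HashSet<>(part2).size();  // 3

    Set<Integer> all = new HashSet<>(part1);
    all.addAll(part2);

    // Summing the per-partition NDVs gives 6, but the true table-level NDV
    // is 5; the counts alone carry no information about the overlap.
    System.out.printf("sum=%d true=%d%n", ndv1 + ndv2, all.size());
  }
}
{code}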



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org




[jira] [Commented] (IMPALA-8137) Order by docs incorrectly state that order by happens on one node

2019-01-31 Thread Alex Rodoni (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16757847#comment-16757847
 ] 

Alex Rodoni commented on IMPALA-8137:
-

https://gerrit.cloudera.org/#/c/12330/

> Order by docs incorrectly state that order by happens on one node
> -
>
> Key: IMPALA-8137
> URL: https://issues.apache.org/jira/browse/IMPALA-8137
> Project: IMPALA
>  Issue Type: Bug
>  Components: Docs
>Reporter: Tim Armstrong
>Assignee: Alex Rodoni
>Priority: Major
>
> https://impala.apache.org/docs/build/html/topics/impala_order_by.html
> "because the entire result set must be produced and transferred to one node 
> before the sorting can happen." is incorrect. If there is an "ORDER BY" 
> clause in a select block, then first data is sorted locally by each impala 
> daemon, then streamed to the coordinator, which merges the sorted result sets.
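
Conceptually the coordinator performs a k-way merge of the pre-sorted
per-daemon streams rather than a full sort; a small sketch of that merge step
(illustrative only, not Impala's actual merger):
{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class KWayMergeSketch {
  // Merge already-sorted streams, as the coordinator conceptually does for a
  // distributed ORDER BY: repeatedly pull the smallest head element until all
  // streams are exhausted.
  static List<Integer> merge(List<List<Integer>> sortedStreams) {
    PriorityQueue<int[]> heap =  // entries are {value, streamIndex}
        new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
    List<Iterator<Integer>> iters = new ArrayList<>();
    for (int i = 0; i < sortedStreams.size(); i++) {
      Iterator<Integer> it = sortedStreams.get(i).iterator();
      iters.add(it);
      if (it.hasNext()) heap.add(new int[] {it.next(), i});
    }
    List<Integer> out = new ArrayList<>();
    while (!heap.isEmpty()) {
      int[] top = heap.poll();
      out.add(top[0]);
      Iterator<Integer> it = iters.get(top[1]);
      if (it.hasNext()) heap.add(new int[] {it.next(), top[1]});
    }
    return out;
  }

  public static void main(String[] args) {
    // Each inner list stands for one daemon's locally sorted result set.
    System.out.println(merge(Arrays.asList(
        Arrays.asList(1, 4, 7), Arrays.asList(2, 5), Arrays.asList(3, 6))));
    // prints [1, 2, 3, 4, 5, 6, 7]
  }
}
{code}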



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Work started] (IMPALA-8137) Order by docs incorrectly state that order by happens on one node

2019-01-31 Thread Alex Rodoni (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-8137 started by Alex Rodoni.
---
> Order by docs incorrectly state that order by happens on one node
> -
>
> Key: IMPALA-8137
> URL: https://issues.apache.org/jira/browse/IMPALA-8137
> Project: IMPALA
>  Issue Type: Bug
>  Components: Docs
>Reporter: Tim Armstrong
>Assignee: Alex Rodoni
>Priority: Major
>
> https://impala.apache.org/docs/build/html/topics/impala_order_by.html
> "because the entire result set must be produced and transferred to one node 
> before the sorting can happen." is incorrect. If there is an "ORDER BY" 
> clause in a select block, then first data is sorted locally by each impala 
> daemon, then streamed to the coordinator, which merges the sorted result sets.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8128) Build failure: Kudu functional data load - Kudu crashed

2019-01-31 Thread Michael Ho (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16757840#comment-16757840
 ] 

Michael Ho commented on IMPALA-8128:


There is a Kudu troubleshooting guide here: 
https://kudu.apache.org/docs/troubleshooting.html#ntp

{quote}
For the master and tablet server daemons, the server’s clock must be 
synchronized using NTP. In addition, the maximum clock error (not to be 
mistaken with the estimated error) must be below a configurable threshold. The 
default value is 10 seconds, but it can be set with the flag 
--max_clock_sync_error_usec.
{quote}


> Build failure: Kudu functional data load - Kudu crashed
> ---
>
> Key: IMPALA-8128
> URL: https://issues.apache.org/jira/browse/IMPALA-8128
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.1.0
>Reporter: Paul Rogers
>Assignee: Lenisha Gandhi
>Priority: Blocker
>  Labels: broken-build
> Fix For: Impala 3.1.0
>
>
> ASAN and core builds failed, it seems due to a Kudu crash when loading data. 
> It looks like tpcds, tpch, and functional all failed.
> {noformat}
> 20:28:01 + msg='Loading functional-query data'
> ...
> 20:31:42 20:31:41 Error executing impala SQL: 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/data_loading/sql/tpch/create-tpch-core-impala-generated-kudu-none-none.sql
>  See: 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/data_loading/sql/tpch/create-tpch-core-impala-generated-kudu-none-none.sql.log
> ...
> 20:40:52 FAILED (Took: 12 min 51 sec)
> ...
> 20:40:52 20:40:52 Error executing impala SQL: 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/data_loading/sql/functional/create-functional-query-exhaustive-impala-generated-kudu-none-none.sql
>  See: 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/data_loading/sql/functional/create-functional-query-exhaustive-impala-generated-kudu-none-none.sql.log
> ...
> 20:41:16 2019-01-25 20:41:16,863 - archive_core_dumps - INFO - Found binary 
> path through GDB: 
> /data/jenkins/workspace/impala-asf-master-core-asan/Impala-Toolchain/cdh_compon
> 20:41:17 2019-01-25 20:41:17,043 - archive_core_dumps - WARNING - Failed to 
> determine binary because multiple candidate binaries were found and none of 
> their paths contained 'latest' to disambiguate:
> 20:41:17 Core:./core.1548476819.29942.kudu-tserver
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8128) Build failure: Kudu functional data load - Kudu crashed

2019-01-31 Thread Michael Ho (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16757833#comment-16757833
 ] 

Michael Ho commented on IMPALA-8128:


Just hit a similar case. It seems to be an unsynchronized clock in the VM:
 
{noformat}
Log line format: [IWEF]mmdd hh:mm:ss.uu threadid file:line] msg
F0131 13:19:06.405735 14545 tablet_server_main.cc:80] Check failed: _s.ok() Bad 
status: Service unavailable: Cannot initialize clock: Error reading clock. 
Clock considered unsynchronized
Log file created at: 2019/01/31 13:19:06
Running on machine: 
impala-ec2-centos74-m5-4xlarge-ondemand-12f6.vpc.cloudera.com
Log line format: [IWEF]mmdd hh:mm:ss.uu threadid file:line] msg
F0131 13:19:06.405642 14570 tablet_server_main.cc:80] Check failed: _s.ok() Bad 
status: Service unavailable: Cannot initialize clock: Error reading clock. 
Clock considered unsynchronized
Log file created at: 2019/01/31 13:19:06
Running on machine: 
impala-ec2-centos74-m5-4xlarge-ondemand-12f6.vpc.cloudera.com
Log line format: [IWEF]mmdd hh:mm:ss.uu threadid file:line] msg
F0131 13:19:06.405911 14529 tablet_server_main.cc:80] Check failed: _s.ok() Bad 
status: Service unavailable: Cannot initialize clock: Error reading clock. 
Clock considered unsynchronized
{noformat}


> Build failure: Kudu functional data load - Kudu crashed
> ---
>
> Key: IMPALA-8128
> URL: https://issues.apache.org/jira/browse/IMPALA-8128
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.1.0
>Reporter: Paul Rogers
>Assignee: Lenisha Gandhi
>Priority: Blocker
>  Labels: broken-build
> Fix For: Impala 3.1.0
>
>
> ASAN and core builds failed, it seems due to a Kudu crash when loading data. 
> It looks like tpcds, tpch, and functional all failed.
> {noformat}
> 20:28:01 + msg='Loading functional-query data'
> ...
> 20:31:42 20:31:41 Error executing impala SQL: 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/data_loading/sql/tpch/create-tpch-core-impala-generated-kudu-none-none.sql
>  See: 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/data_loading/sql/tpch/create-tpch-core-impala-generated-kudu-none-none.sql.log
> ...
> 20:40:52 FAILED (Took: 12 min 51 sec)
> ...
> 20:40:52 20:40:52 Error executing impala SQL: 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/data_loading/sql/functional/create-functional-query-exhaustive-impala-generated-kudu-none-none.sql
>  See: 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/data_loading/sql/functional/create-functional-query-exhaustive-impala-generated-kudu-none-none.sql.log
> ...
> 20:41:16 2019-01-25 20:41:16,863 - archive_core_dumps - INFO - Found binary 
> path through GDB: 
> /data/jenkins/workspace/impala-asf-master-core-asan/Impala-Toolchain/cdh_compon
> 20:41:17 2019-01-25 20:41:17,043 - archive_core_dumps - WARNING - Failed to 
> determine binary because multiple candidate binaries were found and none of 
> their paths contained 'latest' to disambiguate:
> 20:41:17 Core:./core.1548476819.29942.kudu-tserver
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-8150) AuditingTest.TestAccessEventsOnAuthFailure

2019-01-31 Thread Michael Ho (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho updated IMPALA-8150:
---
Description: 
{{org.apache.impala.analysis.AuditingTest.TestAccessEventsOnAuthFailure}} 
started to fail recently with the following backtrace.

[~fredyw], would you mind taking a look, as you seem to have touched this test 
recently?

{noformat}
java.lang.IllegalStateException: Error refreshing authorization policy: at 
org.apache.impala.analysis.AuditingTest.TestAccessEventsOnAuthFailure(AuditingTest.java:373)
Caused by: org.apache.impala.catalog.CatalogException: Error refreshing 
authorization policy: at 

org.apache.impala.analysis.AuditingTest.TestAccessEventsOnAuthFailure(AuditingTest.java:373)
 
Caused by: org.apache.impala.common.ImpalaRuntimeException: Error refreshing 
authorization policy, current policy state may be inconsistent. Running 
'invalidate metadata' may resolve this problem: at 

org.apache.impala.analysis.AuditingTest.TestAccessEventsOnAuthFailure(AuditingTest.java:373)
Caused by: java.util.concurrent.ExecutionException: 
org.apache.impala.common.SentryPolicyReaderException: 
org.apache.impala.common.InternalException: Error creating Sentry Service 
client: at 

org.apache.impala.analysis.AuditingTest.TestAccessEventsOnAuthFailure(AuditingTest.java:373)
Caused by: org.apache.impala.common.SentryPolicyReaderException: 
org.apache.impala.common.InternalException: Error creating Sentry Service 
client: 

Caused by: org.apache.impala.common.InternalException: Error creating Sentry 
Service client:

Caused by: 
org.apache.sentry.core.common.exception.MissingConfigurationException: Property 
'sentry.service.server.principal' is missing in configuration
{noformat}

  was:
{{org.apache.impala.analysis.AuditingTest.TestAccessEventsOnAuthFailure}} 
started to fail recently with the following backtrace.

[~fredyw], would you mind taking a look, as you seem to have touched this test 
recently?

{noformat}
java.lang.IllegalStateException: Error refreshing authorization policy: at 
org.apache.impala.analysis.AuditingTest.TestAccessEventsOnAuthFailure(AuditingTest.java:373)
 Caused by: org.apache.impala.catalog.CatalogException: Error refreshing 
authorization policy: at 
org.apache.impala.analysis.AuditingTest.TestAccessEventsOnAuthFailure(AuditingTest.java:373)
 Caused by: org.apache.impala.common.ImpalaRuntimeException: Error refreshing 
authorization policy, current policy state may be inconsistent. Running 
'invalidate metadata' may resolve this problem: at 
org.apache.impala.analysis.AuditingTest.TestAccessEventsOnAuthFailure(AuditingTest.java:373)
 Caused by: java.util.concurrent.ExecutionException: 
org.apache.impala.common.SentryPolicyReaderException: 
org.apache.impala.common.InternalException: Error creating Sentry Service 
client: at 
org.apache.impala.analysis.AuditingTest.TestAccessEventsOnAuthFailure(AuditingTest.java:373)
 Caused by: org.apache.impala.common.SentryPolicyReaderException: 
org.apache.impala.common.InternalException: Error creating Sentry Service 
client: Caused by: org.apache.impala.common.InternalException: Error creating 
Sentry Service client: Caused by: 
org.apache.sentry.core.common.exception.MissingConfigurationException: Property 
'sentry.service.server.principal' is missing in configuration
h3. Standard Output
{noformat}


> AuditingTest.TestAccessEventsOnAuthFailure
> --
>
> Key: IMPALA-8150
> URL: https://issues.apache.org/jira/browse/IMPALA-8150
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 3.2.0
>Reporter: Michael Ho
>Assignee: Fredy Wijaya
>Priority: Blocker
>  Labels: broken-build
>
> {{org.apache.impala.analysis.AuditingTest.TestAccessEventsOnAuthFailure}} 
> started to fail recently with the following backtrace.
> [~fredyw], would you mind taking a look, as you seem to have touched this test 
> recently?
> {noformat}
> java.lang.IllegalStateException: Error refreshing authorization policy: at 
> org.apache.impala.analysis.AuditingTest.TestAccessEventsOnAuthFailure(AuditingTest.java:373)
> Caused by: org.apache.impala.catalog.CatalogException: Error refreshing 
> authorization policy: at 
> org.apache.impala.analysis.AuditingTest.TestAccessEventsOnAuthFailure(AuditingTest.java:373)
>  
> Caused by: org.apache.impala.common.ImpalaRuntimeException: Error refreshing 
> authorization policy, current policy state may be inconsistent. Running 
> 'invalidate metadata' may resolve this problem: at 
> org.apache.impala.analysis.AuditingTest.TestAccessEventsOnAuthFailure(AuditingTest.java:373)
> Caused by: java.util.concurrent.ExecutionException: 
> org.apache.impala.common.SentryPolicyReaderException: 
> org.apache.impala.common.InternalException: Error creating Sentry Service 
> 

[jira] [Created] (IMPALA-8150) AuditingTest.TestAccessEventsOnAuthFailure

2019-01-31 Thread Michael Ho (JIRA)
Michael Ho created IMPALA-8150:
--

 Summary: AuditingTest.TestAccessEventsOnAuthFailure
 Key: IMPALA-8150
 URL: https://issues.apache.org/jira/browse/IMPALA-8150
 Project: IMPALA
  Issue Type: Bug
  Components: Catalog
Affects Versions: Impala 3.2.0
Reporter: Michael Ho
Assignee: Fredy Wijaya


{{org.apache.impala.analysis.AuditingTest.TestAccessEventsOnAuthFailure}} 
started to fail recently with the following backtrace.

[~fredyw], would you mind taking a look, as you seem to have touched this test 
recently?

{noformat}
java.lang.IllegalStateException: Error refreshing authorization policy: at 
org.apache.impala.analysis.AuditingTest.TestAccessEventsOnAuthFailure(AuditingTest.java:373)
 Caused by: org.apache.impala.catalog.CatalogException: Error refreshing 
authorization policy: at 
org.apache.impala.analysis.AuditingTest.TestAccessEventsOnAuthFailure(AuditingTest.java:373)
 Caused by: org.apache.impala.common.ImpalaRuntimeException: Error refreshing 
authorization policy, current policy state may be inconsistent. Running 
'invalidate metadata' may resolve this problem: at 
org.apache.impala.analysis.AuditingTest.TestAccessEventsOnAuthFailure(AuditingTest.java:373)
 Caused by: java.util.concurrent.ExecutionException: 
org.apache.impala.common.SentryPolicyReaderException: 
org.apache.impala.common.InternalException: Error creating Sentry Service 
client: at 
org.apache.impala.analysis.AuditingTest.TestAccessEventsOnAuthFailure(AuditingTest.java:373)
 Caused by: org.apache.impala.common.SentryPolicyReaderException: 
org.apache.impala.common.InternalException: Error creating Sentry Service 
client: Caused by: org.apache.impala.common.InternalException: Error creating 
Sentry Service client: Caused by: 
org.apache.sentry.core.common.exception.MissingConfigurationException: Property 
'sentry.service.server.principal' is missing in configuration
h3. Standard Output
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-7972) Detect self-events to avoid unnecessary invalidates

2019-01-31 Thread Bharathkrishna Guruvayoor Murali (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharathkrishna Guruvayoor Murali reassigned IMPALA-7972:


Assignee: Bharathkrishna Guruvayoor Murali

> Detect self-events to avoid unnecessary invalidates
> ---
>
> Key: IMPALA-7972
> URL: https://issues.apache.org/jira/browse/IMPALA-7972
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Vihang Karajgaonkar
>Assignee: Bharathkrishna Guruvayoor Murali
>Priority: Major
>
> When metastore objects are created, altered, or dropped, an event is 
> generated and later polled by the same Catalog server. Such self-events 
> should be detected so that we avoid invalidating the tables when they are 
> received. See the design doc attached to the main JIRA for more details.
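
One possible shape for the detection, assuming the catalog can stamp a marker
onto the objects it writes (the property name below is hypothetical; the real
design is in the doc attached to the main JIRA):
{code:java}
import java.util.Map;

public class SelfEventFilterSketch {
  // Hypothetical parameter the catalog would add to HMS objects it modifies.
  static final String SELF_MARKER_KEY = "impala.catalogServiceId";

  private final String catalogServiceId;

  SelfEventFilterSketch(String catalogServiceId) {
    this.catalogServiceId = catalogServiceId;
  }

  // A polled event carrying our own marker originated from this catalog, so
  // the table it refers to should not be invalidated.
  boolean isSelfEvent(Map<String, String> eventParameters) {
    return catalogServiceId.equals(eventParameters.get(SELF_MARKER_KEY));
  }
}
{code}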



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-6932) Simple LIMIT 1 query can be really slow on many-filed sequence datasets

2019-01-31 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar resolved IMPALA-6932.
--
   Resolution: Fixed
Fix Version/s: Impala 3.2.0

> Simple LIMIT 1 query can be really slow on many-filed sequence datasets
> ---
>
> Key: IMPALA-6932
> URL: https://issues.apache.org/jira/browse/IMPALA-6932
> Project: IMPALA
>  Issue Type: Task
>  Components: Backend
>Reporter: Philip Zeyliger
>Assignee: Pooja Nilangekar
>Priority: Critical
> Fix For: Impala 3.2.0
>
>
> I recently ran across really slow behavior with the trivial {{SELECT * FROM 
> table LIMIT 1}} query. The table used Avro as a file format and had about 
> 45,000 files across about 250 partitions. An optimization kicked in to set 
> NUM_NODES to 1.
> The query ran for about an hour, and the profile indicated that it was 
> opening files:
>   - TotalRawHdfsOpenFileTime(*): 1.0h (3622833666032)
> I took a single minidump while this query was running, and I suspect the 
> query was here:
> {code:java}
> 1 impalad!impala::ScannerContext::Stream::GetNextBuffer(long) 
> [scanner-context.cc : 115 + 0x13]
> 2 impalad!impala::ScannerContext::Stream::GetBytesInternal(long, unsigned 
> char**, bool, long*) [scanner-context.cc : 241 + 0x5]
> 3 impalad!impala::HdfsAvroScanner::ReadFileHeader() [scanner-context.inline.h 
> : 54 + 0x1f]
> 4 impalad!impala::BaseSequenceScanner::GetNextInternal(impala::RowBatch*) 
> [base-sequence-scanner.cc : 157 + 0x13]
> 5 impalad!impala::HdfsScanner::ProcessSplit() [hdfs-scanner.cc : 129 + 0xc]
> 6 
> impalad!impala::HdfsScanNode::ProcessSplit(std::vector std::allocator > const&, impala::MemPool*, 
> impala::io::ScanRange*) [hdfs-scan-node.cc : 527 + 0x17]
> 7 impalad!impala::HdfsScanNode::ScannerThread() [hdfs-scan-node.cc : 437 + 
> 0x1c]
> 8 impalad!impala::Thread::SuperviseThread(std::string const&, std::string 
> const&, boost::function, impala::ThreadDebugInfo const*, 
> impala::Promise*) [function_template.hpp : 767 + 0x7]{code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)





[jira] [Commented] (IMPALA-7992) test_decimal_fuzz.py/test_decimal_ops failing in exhaustive runs

2019-01-31 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16757634#comment-16757634
 ] 

ASF subversion and git services commented on IMPALA-7992:
-

Commit cd25af360f8f7583582cb239353dd15f572fe211 in impala's branch 
refs/heads/master from Philip Zeyliger
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=cd25af3 ]

Revert "IMPALA-7992: Revert "Symbolize stacktraces in debug builds.""

I believe IMPALA-7992 is addressed with "IMPALA-7992: Reduce iterations
for test_decimal_fuzz." which reduces the number of iterations.

Bringing back debug symbols allows us to go back to symbolizing debug
traces, which is very handy for logs that are unattached to the binary
that generated them.

This reverts commit 5bf81cdc2797f986189aec4e78ebff2c2d1ed1b6.

Change-Id: Idd9c444da4016eb6935119d03fec6b07c17dda53
Reviewed-on: http://gerrit.cloudera.org:8080/12300
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> test_decimal_fuzz.py/test_decimal_ops failing in exhaustive runs
> 
>
> Key: IMPALA-7992
> URL: https://issues.apache.org/jira/browse/IMPALA-7992
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.2.0
>Reporter: bharath v
>Assignee: Csaba Ringhofer
>Priority: Blocker
>  Labels: broken-build
>
> Error Message
> {noformat}
> query_test/test_decimal_fuzz.py:251: in test_decimal_ops 
> self.execute_one_decimal_op() query_test/test_decimal_fuzz.py:247: in 
> execute_one_decimal_op assert self.result_equals(expected_result, result) E 
> assert  >(Decimal('-0.80'), 
> None) E + where  > = 
> .result_equals
> {noformat}
> Stacktrace
> {noformat}
> query_test/test_decimal_fuzz.py:251: in test_decimal_ops 
> self.execute_one_decimal_op() query_test/test_decimal_fuzz.py:247: in 
> execute_one_decimal_op assert self.result_equals(expected_result, result) E 
> assert  >(Decimal('-0.80'), 
> None) E + where  > = 
> .result_equals
> {noformat}
> stderr
> {noformat}
> -- 2018-12-16 00:10:48,905 INFO MainThread: Started query 
> aa4b44ad5b34c3fb:24d18385
> SET decimal_v2=true;
> -- executing against localhost:21000
> select cast(-879550566.24 as decimal(11,2)) % 
> cast(-100.000 as decimal(28,5));
> -- 2018-12-16 00:10:48,979 INFO MainThread: Started query 
> b24acf22b1607dc6:4f287530
> SET decimal_v2=true;
> -- executing against localhost:21000
> select cast(17179869.184 as decimal(19,7)) / 
> cast(-87808593158000679814.7939232649738916 as decimal(38,17));
> -- 2018-12-16 00:10:49,054 INFO MainThread: Started query 
> 38435f02022e590a:18f7e97
> SET decimal_v2=true;
> -- executing against localhost:21000
> select cast(99 as decimal(32,2)) - 
> cast(-519203.671959101313 as decimal(18,12));
> -- 2018-12-16 00:10:49,132 INFO MainThread: Started query 
> 504edbac7ecb32ce:bfbbbe93
> ~ Stack of  (140061483271936) 
> ~
>   File 
> "/data/jenkins/workspace/impala-asf-master-exhaustive-centos6/repos/Impala/infra/python/env/lib/python2.6/site-packages/execnet/gateway_base.py",
>  line 277, in _perform_spawn
> reply.run()
>   File 
> "/data/jenkins/workspace/impala-asf-master-exhaustive-centos6/repos/Impala/infra/python/env/lib/python2.6/site-packages/execnet/gateway_base.py",
>  line 213, in run
> self._result = func(*args, **kwargs)
>   File 
> "/data/jenkins/workspace/impala-asf-master-exhaustive-centos6/repos/Impala/infra/python/env/lib/python2.6/site-packages/execnet/gateway_base.py",
>  line 954, in _thread_receiver
> msg = Message.from_io(io)
>   File 
> "/data/jenkins/workspace/impala-asf-master-exhaustive-centos6/repos/Impala/infra/python/env/lib/python2.6/site-packages/execnet/gateway_base.py",
>  line 418, in from_io
> header = io.read(9)  # type 1, channel 4, payload 4
>   File 
> "/data/jenkins/workspace/impala-asf-master-exhaustive-centos6/repos/Impala/infra/python/env/lib/python2.6/site-packages/execnet/gateway_base.py",
>  line 386, in read
> data = self._read(numbytes-len(buf))
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7992) test_decimal_fuzz.py/test_decimal_ops failing in exhaustive runs

2019-01-31 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16757635#comment-16757635
 ] 

ASF subversion and git services commented on IMPALA-7992:
-

Commit cd25af360f8f7583582cb239353dd15f572fe211 in impala's branch 
refs/heads/master from Philip Zeyliger
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=cd25af3 ]

Revert "IMPALA-7992: Revert "Symbolize stacktraces in debug builds.""

I believe IMPALA-7992 is addressed with "IMPALA-7992: Reduce iterations
for test_decimal_fuzz." which reduces the number of iterations.

Bringing back debug symbols allows us to go back to symbolizing debug
traces, which is very handy for logs that are unattached to the binary
that generated them.

This reverts commit 5bf81cdc2797f986189aec4e78ebff2c2d1ed1b6.

Change-Id: Idd9c444da4016eb6935119d03fec6b07c17dda53
Reviewed-on: http://gerrit.cloudera.org:8080/12300
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> test_decimal_fuzz.py/test_decimal_ops failing in exhaustive runs
> 
>
> Key: IMPALA-7992
> URL: https://issues.apache.org/jira/browse/IMPALA-7992
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.2.0
>Reporter: bharath v
>Assignee: Csaba Ringhofer
>Priority: Blocker
>  Labels: broken-build
>
> Error Message
> {noformat}
> query_test/test_decimal_fuzz.py:251: in test_decimal_ops 
> self.execute_one_decimal_op() query_test/test_decimal_fuzz.py:247: in 
> execute_one_decimal_op assert self.result_equals(expected_result, result) E 
> assert  >(Decimal('-0.80'), 
> None) E + where  > = 
> .result_equals
> {noformat}
> Stacktrace
> {noformat}
> query_test/test_decimal_fuzz.py:251: in test_decimal_ops 
> self.execute_one_decimal_op() query_test/test_decimal_fuzz.py:247: in 
> execute_one_decimal_op assert self.result_equals(expected_result, result) E 
> assert  >(Decimal('-0.80'), 
> None) E + where  > = 
> .result_equals
> {noformat}
> stderr
> {noformat}
> -- 2018-12-16 00:10:48,905 INFO MainThread: Started query 
> aa4b44ad5b34c3fb:24d18385
> SET decimal_v2=true;
> -- executing against localhost:21000
> select cast(-879550566.24 as decimal(11,2)) % 
> cast(-100.000 as decimal(28,5));
> -- 2018-12-16 00:10:48,979 INFO MainThread: Started query 
> b24acf22b1607dc6:4f287530
> SET decimal_v2=true;
> -- executing against localhost:21000
> select cast(17179869.184 as decimal(19,7)) / 
> cast(-87808593158000679814.7939232649738916 as decimal(38,17));
> -- 2018-12-16 00:10:49,054 INFO MainThread: Started query 
> 38435f02022e590a:18f7e97
> SET decimal_v2=true;
> -- executing against localhost:21000
> select cast(99 as decimal(32,2)) - 
> cast(-519203.671959101313 as decimal(18,12));
> -- 2018-12-16 00:10:49,132 INFO MainThread: Started query 
> 504edbac7ecb32ce:bfbbbe93
> ~ Stack of  (140061483271936) 
> ~
>   File 
> "/data/jenkins/workspace/impala-asf-master-exhaustive-centos6/repos/Impala/infra/python/env/lib/python2.6/site-packages/execnet/gateway_base.py",
>  line 277, in _perform_spawn
> reply.run()
>   File 
> "/data/jenkins/workspace/impala-asf-master-exhaustive-centos6/repos/Impala/infra/python/env/lib/python2.6/site-packages/execnet/gateway_base.py",
>  line 213, in run
> self._result = func(*args, **kwargs)
>   File 
> "/data/jenkins/workspace/impala-asf-master-exhaustive-centos6/repos/Impala/infra/python/env/lib/python2.6/site-packages/execnet/gateway_base.py",
>  line 954, in _thread_receiver
> msg = Message.from_io(io)
>   File 
> "/data/jenkins/workspace/impala-asf-master-exhaustive-centos6/repos/Impala/infra/python/env/lib/python2.6/site-packages/execnet/gateway_base.py",
>  line 418, in from_io
> header = io.read(9)  # type 1, channel 4, payload 4
>   File 
> "/data/jenkins/workspace/impala-asf-master-exhaustive-centos6/repos/Impala/infra/python/env/lib/python2.6/site-packages/execnet/gateway_base.py",
>  line 386, in read
> data = self._read(numbytes-len(buf))
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-6932) Simple LIMIT 1 query can be really slow on many-filed sequence datasets

2019-01-31 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-6932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16757637#comment-16757637
 ] 

ASF subversion and git services commented on IMPALA-6932:
-

Commit 653ff1585daf1ae0f1c914a3d03581e6ca80c47f in impala's branch 
refs/heads/master from poojanilangekar
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=653ff15 ]

IMPALA-6932: Speed up scans for sequence datasets with many files

This change addresses the slow scans of sequence datasets with
many files by enqueueing the scan ranges to the head of the disk
IO queue instead of the tail. This ensures that the data ranges
get priority over headers of other files. Hence it produces
results earlier for limit queries.

Testing:
Added a unit test to verify that the expected elements are
dequeued from the front.

Tested the performance of this patch on S3 to emulate remote reads.
The following query was executed several times:
"SELECT * FROM TPCH_AVRO.LINEITEM LIMIT 1;"
The average timeline difference was 8.66s vs 5.87s. The scanner I/O
wait time went down from 2.37s to 9.85s.

Tested the patch with backend and end-to-end tests.
Single node performance test results:
+----------+--------------------+---------+------------+------------+----------------+
| Workload | File Format        | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+----------+--------------------+---------+------------+------------+----------------+
| TPCH(50) | avro / none / none | 65.62   | -0.38%     | 43.51      | -0.79%         |
+----------+--------------------+---------+------------+------------+----------------+

Change-Id: I211e2511ea3bb5edea29f1bd63e6b1fa4c4b1965
Reviewed-on: http://gerrit.cloudera.org:8080/11517
Reviewed-by: Philip Zeyliger 
Tested-by: Philip Zeyliger 
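
The effect of the change is easy to picture with a plain deque: data ranges of
files already being scanned go to the front of the queue, newly issued ranges
to the back (a toy model, not the backend's actual disk IO manager):
{code:java}
import java.util.ArrayDeque;
import java.util.Deque;

public class ScanRangeQueueSketch {
  private final Deque<String> queue = new ArrayDeque<>();

  // Before the fix: every range was appended at the tail.
  void enqueueAtTail(String range) {
    queue.addLast(range);
  }

  // After the fix: data ranges of in-progress files are enqueued at the head,
  // so they are served before the headers of the remaining files and a LIMIT
  // query can produce rows without opening every file first.
  void enqueueAtHead(String range) {
    queue.addFirst(range);
  }

  String nextRange() {
    return queue.pollFirst();
  }
}
{code}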


> Simple LIMIT 1 query can be really slow on many-filed sequence datasets
> ---
>
> Key: IMPALA-6932
> URL: https://issues.apache.org/jira/browse/IMPALA-6932
> Project: IMPALA
>  Issue Type: Task
>  Components: Backend
>Reporter: Philip Zeyliger
>Assignee: Pooja Nilangekar
>Priority: Critical
>
> I recently ran across really slow behavior with the trivial {{SELECT * FROM 
> table LIMIT 1}} query. The table used Avro as a file format and had about 
> 45,000 files across about 250 partitions. An optimization kicked in to set 
> NUM_NODES to 1.
> The query ran for about an hour, and the profile indicated that it was 
> opening files:
>   - TotalRawHdfsOpenFileTime(*): 1.0h (3622833666032)
> I took a single minidump while this query was running, and I suspect the 
> query was here:
> {code:java}
> 1 impalad!impala::ScannerContext::Stream::GetNextBuffer(long) 
> [scanner-context.cc : 115 + 0x13]
> 2 impalad!impala::ScannerContext::Stream::GetBytesInternal(long, unsigned 
> char**, bool, long*) [scanner-context.cc : 241 + 0x5]
> 3 impalad!impala::HdfsAvroScanner::ReadFileHeader() [scanner-context.inline.h 
> : 54 + 0x1f]
> 4 impalad!impala::BaseSequenceScanner::GetNextInternal(impala::RowBatch*) 
> [base-sequence-scanner.cc : 157 + 0x13]
> 5 impalad!impala::HdfsScanner::ProcessSplit() [hdfs-scanner.cc : 129 + 0xc]
> 6 
> impalad!impala::HdfsScanNode::ProcessSplit(std::vector std::allocator > const&, impala::MemPool*, 
> impala::io::ScanRange*) [hdfs-scan-node.cc : 527 + 0x17]
> 7 impalad!impala::HdfsScanNode::ScannerThread() [hdfs-scan-node.cc : 437 + 
> 0x1c]
> 8 impalad!impala::Thread::SuperviseThread(std::string const&, std::string 
> const&, boost::function, impala::ThreadDebugInfo const*, 
> impala::Promise*) [function_template.hpp : 767 + 0x7]{code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7540) Intern common strings in catalog

2019-01-31 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16757633#comment-16757633
 ] 

ASF subversion and git services commented on IMPALA-7540:
-

Commit f20a03a7b1bc2a9bb6cd8b54b8afb9ce384538f1 in impala's branch 
refs/heads/master from Todd Lipcon
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=f20a03a ]

IMPALA-7540. Intern most repetitive strings and network addresses in catalog

This adds interning to a bunch of repeated strings in catalog objects,
including:
- table name
- DB name
- owner
- column names
- input/output formats
- parameter keys
- common parameter values ("true", "false", etc)
- HBase column family names

Additionally, it interns TNetworkAddresses, so that each datanode host
is only stored once rather than having its own copy in each table.

I verified this patch using jxray on the development catalogd and
impalad. The following lines are removed entirely from the "duplicate
strings" report:

 Overhead   # char[]s # objects  Value
 164K (0.3%) 2,635   2,635  "127.0.0.1"
 97K (0.2%)  1,038   1,038  "__HIVE_DEFAULT_PARTITION__"
 95K (0.2%)  1,111   1,111  "transient_lastDdlTime"
 92K (0.1%)  1,975   1,975  "d"
 70K (0.1%)  997 997"EXTERNAL_TABLE"
 56K (< 0.1%)1,201   1,201  "todd"
 54K (< 0.1%)998 998"EXTERNAL"
 46K (< 0.1%)998 998"TRUE"
 44K (< 0.1%)567 567"numFilesErasureCoded"
 38K (< 0.1%)612 612"totalSize"
 30K (< 0.1%)567 567"numFiles"

The following are reduced substantially:

Before: 72K (0.1%)  1,543   1,543  "1"
After:  47K (< 0.1%)1,009   1,009  "1"

A few large strings remain in the report that may be worth addressing, depending
on whether we think production catalogs exhibit the same repetitions:

1) Avro schemas, eg:
 204K (0.3%) 3   3  "{"fields": [{"type": ["boolean", "null"], 
"name": "bool_col1"}, {"type": ["int", "null"], "name": "tinyint_col1"}, 
{"type": ...[length 52429]"

(in the development catalog there are multiple tables with the same Avro
schema)

2) Partition location suffixes, eg:
 144K (0.2%) 1,234   1,234  "many_blocks_num_blocks_per_partition_1"
 17K (< 0.1%)230 230"year=2009/month=2"
 17K (< 0.1%)230 230"year=2009/month=3"
 17K (< 0.1%)230 230"year=2009/month=1"

(in the development catalog lots of tables have the same partitioning
layout)

3) Unsure (jxray isn't reporting the reference chain, but seems likely
   to be partition values):
 49K (< 0.1%)1,058   1,058  "2010"
 28K (< 0.1%)612 612"2009"
 27K (< 0.1%)585 585"0"
 22K (< 0.1%)71  899""

Change-Id: Ib3121aefa4391bcb1477d9dba0a49440d7000d26
Reviewed-on: http://gerrit.cloudera.org:8080/11158
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Intern common strings in catalog
> 
>
> Key: IMPALA-7540
> URL: https://issues.apache.org/jira/browse/IMPALA-7540
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 3.1.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Major
>
> Using jxray shows that there are many common duplicate strings in the 
> catalog. For example, each table repeats the database name, and metadata like 
> the HMS parameter maps reuse a lot of common strings like "EXTERNAL" or 
> "transient_lastDdlTime". We should intern these to save memory.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8111) Document workaround for some authentication issues with KRPC

2019-01-31 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16757642#comment-16757642
 ] 

ASF subversion and git services commented on IMPALA-8111:
-

Commit 1c2778a9a371a0cb21bfb4afeeb6e50e3c94e469 in impala's branch 
refs/heads/master from Alex Rodoni
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=1c2778a ]

IMPALA-8111: [DOCS] Take 2: Removed the Fix Version for KUDU-2198

Change-Id: I299eeda31de3cea768c1a1627702527501240a8b
Reviewed-on: http://gerrit.cloudera.org:8080/12311
Tested-by: Alex Rodoni 
Reviewed-by: Alex Rodoni 


> Document workaround for some authentication issues with KRPC
> 
>
> Key: IMPALA-8111
> URL: https://issues.apache.org/jira/browse/IMPALA-8111
> Project: IMPALA
>  Issue Type: Task
>  Components: Docs
>Affects Versions: Impala 2.12.0, Impala 3.1.0
>Reporter: Michael Ho
>Assignee: Alex Rodoni
>Priority: Major
>  Labels: future_release_doc, in_32
> Fix For: Impala 3.2.0
>
>
> There have been complaints from users about not being able to use Impala 
> after upgrading to an Impala version with KRPC enabled, due to authentication 
> issues. Please document them in the known issues or best practices guide.
> 1. https://issues.apache.org/jira/browse/IMPALA-7585:
>  *Symptoms*: When using Impala with LDAP enabled, a user may hit the 
> following:
> {noformat}
> Not authorized: Client connection negotiation failed: client connection to 
> 127.0.0.1:27000: SASL(-1): generic failure: All-whitespace username.
> {noformat}
> *Root cause*: The following sequence can lead to the user "impala" not being 
> created in /etc/passwd.
> {quote}time 1: no impala in LDAP; things get installed; impala created in 
> /etc/passwd
>  time 2: impala added to LDAP
>  time 3: new machine added
> {quote}
> *Workaround*:
>  - Manually edit /etc/passwd to add the impala user
>  - Upgrade to a version of Impala with the patch IMPALA-7585
> 2. https://issues.apache.org/jira/browse/IMPALA-7298
>  *Symptoms*: When running with Kerberos enabled, a user may hit the following 
> error:
> {noformat}
> WARNINGS: TransmitData() to X.X.X.X:27000 failed: Not authorized: Client 
> connection negotiation failed: client connection to X.X.X.X:27000: Server 
> impala/x.x@vpc.cloudera.com not found in Kerberos database
> {noformat}
> *Root cause*:
>  KrpcDataStreamSender passes a resolved IP address when creating a proxy. 
> Instead, we should pass both the resolved address and the hostname when 
> creating the proxy so that we won't end up using the IP address as the 
> hostname in the Kerberos principal.
> *Workaround*:
>  - Set rdns=true in /etc/krb5.conf
>  - Upgrade to a version of Impala with the fix of IMPALA-7298
> 3. https://issues.apache.org/jira/browse/KUDU-2198
>  *Symptoms*: When running with Kerberos enabled, a user may hit the following 
> error message where  is some random string which doesn't match 
> the primary in the Kerberos principal
> {noformat}
> WARNINGS: TransmitData() to X.X.X.X:27000 failed: Remote error: Not 
> authorized: {username='', principal='impala/redacted'} is not 
> allowed to access DataStreamService
> {noformat}
> *Root cause*:
>  Due to system "auth_to_local" mapping, the principal may be mapped to some 
> local name.
> *Workaround*:
>  - Start Impala with the flag {{--use_system_auth_to_local=false}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8129) Build failure: query_test/test_observability.py

2019-01-31 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16757641#comment-16757641
 ] 

ASF subversion and git services commented on IMPALA-8129:
-

Commit ee491bb67da2d62b2b801d6ff0de40fc286411b9 in impala's branch 
refs/heads/master from Lars Volker
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=ee491bb ]

IMPALA-8129: Don't test exact value of ExchangeScanRatio on S3 and EC

Running against S3 and erasure coded HDFS causes slight changes in the
observed ExchangeScanRatio and breaks such tests. This change limits
TestObservability::test_global_exchange_counters to HDFS local
minicluster runs without EC.

Change-Id: I6cf58113e092d43f5444120040aa49f90cdb91fb
Reviewed-on: http://gerrit.cloudera.org:8080/12288
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Build failure: query_test/test_observability.py
> ---
>
> Key: IMPALA-8129
> URL: https://issues.apache.org/jira/browse/IMPALA-8129
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.1.0
>Reporter: Paul Rogers
>Assignee: Lars Volker
>Priority: Blocker
>  Labels: broken-build
> Fix For: Impala 3.1.0
>
>
> {{query_test/test_observability.py}} failed in multiple builds:
> Erasure-coding build:
> {noformat}
> 18:49:01 === FAILURES 
> ===
> 18:49:01 ___ TestObservability.test_global_exchange_counters 
> 
> 18:49:01 [gw0] linux2 -- Python 2.7.5 
> /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/bin/../infra/python/env/bin/python
> 18:49:01 query_test/test_observability.py:400: in 
> test_global_exchange_counters
> 18:49:01 assert "ExchangeScanRatio: 3.19" in profile
> 18:49:01 E   assert 'ExchangeScanRatio: 3.19' in 'Query 
> (id=704d1f6b09400fba:b91dc70):\n  DEBUG MODE WARNING: Query profile 
> created while running a DEBUG build...  - OptimizationTime: 32.000ms\n
>- PeakMemoryUsage: 220.00 KB (225280)\n   - PrepareTime: 
> 26.000ms\n'
> {noformat}
> Core build:
> {noformat}
> 07:36:43 FAIL 
> query_test/test_observability.py::TestObservability::()::test_global_exchange_counters
> 07:36:43 === FAILURES 
> ===
> 07:36:43 ___ TestObservability.test_global_exchange_counters 
> 
> 07:36:43 [gw2] linux2 -- Python 2.7.5 
> /data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/bin/../infra/python/env/bin/python
> 07:36:43 query_test/test_observability.py:400: in 
> test_global_exchange_counters
> 07:36:43 assert "ExchangeScanRatio: 3.19" in profile
> 07:36:43 E   assert 'ExchangeScanRatio: 3.19' in 'Query 
> (id=b546ddcfab65e431:471aa218):\n  DEBUG MODE WARNING: Query profile 
> created while running a DEBUG buil...  - OptimizationTime: 32.000ms\n 
>   - PeakMemoryUsage: 220.00 KB (225280)\n   - PrepareTime: 32.000ms\n'
> {noformat}
> Assigning to Lars since it may be related to the patch for IMPALA-7731: Add 
> Read/Exchange counters to profile



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7333) Remove MarkNeedsDeepCopy from Aggregation and Hash Join Nodes

2019-01-31 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16757640#comment-16757640
 ] 

ASF subversion and git services commented on IMPALA-7333:
-

Commit 6214d2d81236695a9e606634dbeebdbbbccf3847 in impala's branch 
refs/heads/master from Lars Volker
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=6214d2d ]

IMPALA-8140: Fix use-after-poison in grouping aggregator

IMPALA-7333 changed the memory transfer semantics to move memory when
attaching it to the output batch instead of copying it. This caused a
use-after-poison when cleaning up the hash tables during the call to
Close(). To fix this, we now clean up the hash table before closing the
output row stream.

Testing: added a test to aggregation.test

Change-Id: Id23cd1e2fc5e003e3c9e3503436621a76d49559d
Reviewed-on: http://gerrit.cloudera.org:8080/12298
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Remove MarkNeedsDeepCopy from Aggregation and Hash Join Nodes
> -
>
> Key: IMPALA-7333
> URL: https://issues.apache.org/jira/browse/IMPALA-7333
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
>  Labels: resource-management
> Fix For: Impala 3.1.0
>
>
> The main part of this is fixing BufferedTupleStream.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8140) Grouping aggregation with limit breaks asan build

2019-01-31 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16757639#comment-16757639
 ] 

ASF subversion and git services commented on IMPALA-8140:
-

Commit 6214d2d81236695a9e606634dbeebdbbbccf3847 in impala's branch 
refs/heads/master from Lars Volker
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=6214d2d ]

IMPALA-8140: Fix use-after-poison in grouping aggregator

IMPALA-7333 changed the memory transfer semantics to move memory when
attaching it to the output batch instead of copying it. This caused a
use-after-poison when cleaning up the hash tables during the call to
Close(). To fix this, we now clean up the hash table before closing the
output row stream.

Testing: added a test to aggregation.test

Change-Id: Id23cd1e2fc5e003e3c9e3503436621a76d49559d
Reviewed-on: http://gerrit.cloudera.org:8080/12298
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Grouping aggregation with limit breaks asan build
> -
>
> Key: IMPALA-8140
> URL: https://issues.apache.org/jira/browse/IMPALA-8140
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.1.0, Impala 3.2.0
>Reporter: Lars Volker
>Assignee: Lars Volker
>Priority: Blocker
>  Labels: asan, crash
>
> Commit 4af3a7853e9 for IMPALA-7333 breaks the following query on ASAN:
> {code:sql}
> select count(*) from tpch_parquet.orders o group by o.o_clerk limit 10;
> {code}
> {noformat}
> ==30219==ERROR: AddressSanitizer: use-after-poison on address 0x631000c4569c 
> at pc 0x020163cc bp 0x7f73a12a5700 sp 0x7f73a12a56f8
> READ of size 1 at 0x631000c4569c thread T276
> #0 0x20163cb in impala::Tuple::IsNull(impala::NullIndicatorOffset const&) 
> const /tmp/be/src/runtime/tuple.h:241:13
> #1 0x280c3d1 in 
> impala::AggFnEvaluator::SerializeOrFinalize(impala::Tuple*, 
> impala::SlotDescriptor const&, impala::Tuple*, void*) 
> /tmp/be/src/exprs/agg-fn-evaluator.cc:393:29
> #2 0x2777bc8 in 
> impala::AggFnEvaluator::Finalize(std::vector std::allocator > const&, impala::Tuple*, 
> impala::Tuple*) /tmp/be/src/exprs/agg-fn-evaluator.h:307:15
> #3 0x27add96 in 
> impala::GroupingAggregator::CleanupHashTbl(std::vector  std::allocator > const&, 
> impala::HashTable::Iterator) /tmp/be/src/exec/grouping-aggregator.cc:351:7
> #4 0x27ae2b2 in impala::GroupingAggregator::ClosePartitions() 
> /tmp/be/src/exec/grouping-aggregator.cc:930:5
> #5 0x27ae5f4 in impala::GroupingAggregator::Close(impala::RuntimeState*) 
> /tmp/be/src/exec/grouping-aggregator.cc:383:3
> #6 0x27637f7 in impala::AggregationNode::Close(impala::RuntimeState*) 
> /tmp/be/src/exec/aggregation-node.cc:139:32
> #7 0x206b7e9 in impala::FragmentInstanceState::Close() 
> /tmp/be/src/runtime/fragment-instance-state.cc:368:42
> #8 0x2066b1a in impala::FragmentInstanceState::Exec() 
> /tmp/be/src/runtime/fragment-instance-state.cc:99:3
> #9 0x2080e12 in 
> impala::QueryState::ExecFInstance(impala::FragmentInstanceState*) 
> /tmp/be/src/runtime/query-state.cc:584:24
> #10 0x1d79036 in boost::function0::operator()() const 
> /opt/Impala-Toolchain/boost-1.57.0-p3/include/boost/function/function_template.hpp:766:14
> #11 0x24bbe06 in impala::Thread::SuperviseThread(std::string const&, 
> std::string const&, boost::function, impala::ThreadDebugInfo const*, 
> impala::Promise*) 
> /tmp/be/src/util/thread.cc:359:3
> #12 0x24c72f8 in void boost::_bi::list5, 
> boost::_bi::value, boost::_bi::value >, 
> boost::_bi::value, 
> boost::_bi::value*> 
> >::operator() boost::function, impala::ThreadDebugInfo const*, 
> impala::Promise*), 
> boost::_bi::list0>(boost::_bi::type, void (*&)(std::string const&, 
> std::string const&, boost::function, impala::ThreadDebugInfo const*, 
> impala::Promise*), boost::_bi::list0&, int) 
> /opt/Impala-Toolchain/boost-1.57.0-p3/include/boost/bind/bind.hpp:525:9
> #13 0x24c714b in boost::_bi::bind_t std::string const&, boost::function, impala::ThreadDebugInfo const*, 
> impala::Promise*), 
> boost::_bi::list5, 
> boost::_bi::value, boost::_bi::value >, 
> boost::_bi::value, 
> boost::_bi::value*> > 
> >::operator()() 
> /opt/Impala-Toolchain/boost-1.57.0-p3/include/boost/bind/bind_template.hpp:20:16
> #14 0x3c83949 in thread_proxy 
> (/home/lv/i4/be/build/debug/service/impalad+0x3c83949)
> #15 0x7f768ce73183 in start_thread 
> /build/eglibc-ripdx6/eglibc-2.19/nptl/pthread_create.c:312
> #16 0x7f768c98a03c in clone 
> /build/eglibc-ripdx6/eglibc-2.19/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:111
> {noformat}
> The problem seems to be that we call 
> {{output_partition_->aggregated_row_stream->Close()}} in 
> be/src/exec/grouping-aggregator.cc:284 when hitting the limit, and then later 
> read that memory again while cleaning up the hash tables in Close().

[jira] [Commented] (IMPALA-7992) test_decimal_fuzz.py/test_decimal_ops failing in exhaustive runs

2019-01-31 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16757636#comment-16757636
 ] 

ASF subversion and git services commented on IMPALA-7992:
-

Commit cd25af360f8f7583582cb239353dd15f572fe211 in impala's branch 
refs/heads/master from Philip Zeyliger
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=cd25af3 ]

Revert "IMPALA-7992: Revert "Symbolize stacktraces in debug builds.""

I believe IMPALA-7992 is addressed with "IMPALA-7992: Reduce iterations
for test_decimal_fuzz." which reduces the number of iterations.

Bringing back debug symbols allows us to go back to symbolizing debug
traces, which is very handy for logs that are unattached to the binary
that generated them.

This reverts commit 5bf81cdc2797f986189aec4e78ebff2c2d1ed1b6.

Change-Id: Idd9c444da4016eb6935119d03fec6b07c17dda53
Reviewed-on: http://gerrit.cloudera.org:8080/12300
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> test_decimal_fuzz.py/test_decimal_ops failing in exhaustive runs
> 
>
> Key: IMPALA-7992
> URL: https://issues.apache.org/jira/browse/IMPALA-7992
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.2.0
>Reporter: bharath v
>Assignee: Csaba Ringhofer
>Priority: Blocker
>  Labels: broken-build
>
> Error Message
> {noformat}
> query_test/test_decimal_fuzz.py:251: in test_decimal_ops 
> self.execute_one_decimal_op() query_test/test_decimal_fuzz.py:247: in 
> execute_one_decimal_op assert self.result_equals(expected_result, result) E 
> assert  >(Decimal('-0.80'), 
> None) E + where  > = 
> .result_equals
> {noformat}
> Stacktrace
> {noformat}
> query_test/test_decimal_fuzz.py:251: in test_decimal_ops 
> self.execute_one_decimal_op() query_test/test_decimal_fuzz.py:247: in 
> execute_one_decimal_op assert self.result_equals(expected_result, result) E 
> assert  >(Decimal('-0.80'), 
> None) E + where  > = 
> .result_equals
> {noformat}
> stderr
> {noformat}
> -- 2018-12-16 00:10:48,905 INFO MainThread: Started query 
> aa4b44ad5b34c3fb:24d18385
> SET decimal_v2=true;
> -- executing against localhost:21000
> select cast(-879550566.24 as decimal(11,2)) % 
> cast(-100.000 as decimal(28,5));
> -- 2018-12-16 00:10:48,979 INFO MainThread: Started query 
> b24acf22b1607dc6:4f287530
> SET decimal_v2=true;
> -- executing against localhost:21000
> select cast(17179869.184 as decimal(19,7)) / 
> cast(-87808593158000679814.7939232649738916 as decimal(38,17));
> -- 2018-12-16 00:10:49,054 INFO MainThread: Started query 
> 38435f02022e590a:18f7e97
> SET decimal_v2=true;
> -- executing against localhost:21000
> select cast(99 as decimal(32,2)) - 
> cast(-519203.671959101313 as decimal(18,12));
> -- 2018-12-16 00:10:49,132 INFO MainThread: Started query 
> 504edbac7ecb32ce:bfbbbe93
> ~ Stack of  (140061483271936) 
> ~
>   File 
> "/data/jenkins/workspace/impala-asf-master-exhaustive-centos6/repos/Impala/infra/python/env/lib/python2.6/site-packages/execnet/gateway_base.py",
>  line 277, in _perform_spawn
> reply.run()
>   File 
> "/data/jenkins/workspace/impala-asf-master-exhaustive-centos6/repos/Impala/infra/python/env/lib/python2.6/site-packages/execnet/gateway_base.py",
>  line 213, in run
> self._result = func(*args, **kwargs)
>   File 
> "/data/jenkins/workspace/impala-asf-master-exhaustive-centos6/repos/Impala/infra/python/env/lib/python2.6/site-packages/execnet/gateway_base.py",
>  line 954, in _thread_receiver
> msg = Message.from_io(io)
>   File 
> "/data/jenkins/workspace/impala-asf-master-exhaustive-centos6/repos/Impala/infra/python/env/lib/python2.6/site-packages/execnet/gateway_base.py",
>  line 418, in from_io
> header = io.read(9)  # type 1, channel 4, payload 4
>   File 
> "/data/jenkins/workspace/impala-asf-master-exhaustive-centos6/repos/Impala/infra/python/env/lib/python2.6/site-packages/execnet/gateway_base.py",
>  line 386, in read
> data = self._read(numbytes-len(buf))
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8146) Remove make_{debug,release,asan}.sh

2019-01-31 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16757638#comment-16757638
 ] 

ASF subversion and git services commented on IMPALA-8146:
-

Commit e8ea4b0525fb3223da9291775eff4b7fbd15c547 in impala's branch 
refs/heads/master from Tim Armstrong
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=e8ea4b0 ]

IMPALA-8146: Remove make_{debug,release,asan}.sh

Change-Id: Iac997917463b871769112834835cc3f99cff5954
Reviewed-on: http://gerrit.cloudera.org:8080/12306
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Remove make_{debug,release,asan}.sh
> ---
>
> Key: IMPALA-8146
> URL: https://issues.apache.org/jira/browse/IMPALA-8146
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Minor
>
> These scripts are thin wrappers around make_impala.sh and don't add much 
> value. We should remove them. I think there are a couple of references in the 
> wiki that we can update to show alternative invocations.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-8149) Add support for alter_database events

2019-01-31 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created IMPALA-8149:
---

 Summary: Add support for alter_database events
 Key: IMPALA-8149
 URL: https://issues.apache.org/jira/browse/IMPALA-8149
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Comment Edited] (IMPALA-4018) Add support for SQL:2016 datetime templates/patterns/masks to CAST(... AS ... FORMAT )

2019-01-31 Thread Greg Rahn (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16757472#comment-16757472
 ] 

Greg Rahn edited comment on IMPALA-4018 at 1/31/19 4:59 PM:


I am strongly opposed to introducing SQL functionality that does not follow the 
ISO standard, including the date patterns.

One of the major reasons for introducing something new is that it allows a 
clean break from mistakes of the past -- and the Java date patterns are a 
mistake per the SQL standard. This new ISO cast/format functionality allows 
Impala to fully support the ISO SQL patterns while not changing any legacy 
behavior as both the syntax and the formats are new.

I would be in favor of a session level feature flag that enables the ISO SQL 
date patterns for legacy functions, which would bring them into ISO SQL 
compliance, but I see zero benefits to introducing the cast/format with non-ISO 
SQL patterns.






was (Author: grahn):
I am strongly opposed to introducing SQL functionality that does not follow the 
ISO standard, including the date patterns.

One of the major reasons for introducing something new is that it allows a 
clean break from mistakes of the past -- and the Java date patterns are a 
mistake per the SQL standard. This new ISO cast/format functionality allows 
Impala to fully support the ISO SQL patterns while not changing any legacy 
behavior as both the syntax and the formats are new.

I would be in favor of a session level feature flag that enables the ISO SQL 
date patterns for legacy functions, which would bring them into ISO SQL 
compliance, but I zero benefits to introducing the cast/format with non-ISO SQL 
patterns.





> Add support for SQL:2016 datetime templates/patterns/masks to CAST(... AS ... 
> FORMAT )
> 
>
> Key: IMPALA-4018
> URL: https://issues.apache.org/jira/browse/IMPALA-4018
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Frontend
>Affects Versions: Impala 2.2.4
>Reporter: Greg Rahn
>Assignee: Gabor Kaszab
>Priority: Critical
>  Labels: ansi-sql, compatibility, sql-language
>
> *Summary*
> The format masks/templates for currently are implemented using the [Java 
> SimpleDateFormat 
> patterns|http://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html],
>  and although this is what Hive has implemented, it is not what most standard 
> SQL systems implement.  For example see 
> [Vertica|https://my.vertica.com/docs/7.2.x/HTML/Content/Authoring/SQLReferenceManual/Functions/Formatting/TemplatePatternsForDateTimeFormatting.htm],
>  
> [Netezza|http://www.ibm.com/support/knowledgecenter/SSULQD_7.2.1/com.ibm.nz.dbu.doc/r_dbuser_ntz_sql_extns_templ_patterns_date_time_conv.html],
>   
> [Oracle|https://docs.oracle.com/database/121/SQLRF/sql_elements004.htm#SQLRF00212],
>  and 
> [PostgreSQL|https://www.postgresql.org/docs/9.5/static/functions-formatting.html#FUNCTIONS-FORMATTING-DATETIME-TABLE].
>  
> *Examples of incompatibilities*
> {noformat}
> -- PostgreSQL/Netezza/Vertica/Oracle
> select to_timestamp('May 15, 2015 12:00:00', 'mon dd,  hh:mi:ss');
> -- Impala
> select to_timestamp('May 15, 2015 12:00:00', 'MMM dd,  HH:mm:ss');
> -- PostgreSQL/Netezza/Vertica/Oracle
> select to_timestamp('2015-02-14 20:19:07','-mm-dd hh24:mi:ss');
> -- Impala
> select to_timestamp('2015-02-14 20:19:07','-MM-dd HH:mm:ss');
> -- Vertica/Oracle
> select to_timestamp('2015-02-14 20:19:07.123456','-mm-dd hh24:mi:ss.ff');
> -- Impala
> select to_timestamp('2015-02-14 20:19:07.123456','-MM-dd 
> HH:mm:ss.SS');
> {noformat}
> *Considerations*
> Because this is a change in default behavior for to_timestamp(), if possible, 
> having a feature flag to revert to the legacy Java SimpleDateFormat patterns 
> should be strongly considered.  This would allow users to choose the behavior 
> they desire and scope it to a session if need be.
> SQL:2016 defines the following datetime templates
> {noformat}
>  ::=
>   {  }...
>  ::=
> 
>   | 
>  ::=
> 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
>  ::=
> 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
> | 
>  ::=
>    | YYY | YY | Y
>  ::=
>    | RR
>  ::=
>   MM
>  ::=
>   DD
>  ::=
>   DDD
>  ::=
>   HH | HH12
>  ::=
>   HH24
>  ::=
>   MI
>  ::=
>   SS
>  ::=
>   S
>  ::=
>   FF1 | FF2 | FF3 | FF4 | FF5 | FF6 | FF7 | FF8 | FF9
>  ::=
>   A.M. | P.M.
>  ::=
>   TZH
>  ::=
>   TZM
> {noformat}
> SQL:2016 also introduced the FORMAT clause for CAST which is the standard way 
> to do string <> datetime conversions
> {noformat}
>  ::=
>   CAST 
>AS 
>   [ FORMAT  ]
>   
>  ::=
> 
>   | 
>  ::=
> 
> | 
>  ::=
>   
> {noformat}

[jira] [Commented] (IMPALA-4018) Add support for SQL:2016 datetime templates/patterns/masks to CAST(... AS ... FORMAT )

2019-01-31 Thread Greg Rahn (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16757472#comment-16757472
 ] 

Greg Rahn commented on IMPALA-4018:
---

I am strongly opposed to introducing SQL functionality that does not follow the 
ISO standard, including the date patterns.

One of the major reasons for introducing something new is that it allows a 
clean break from mistakes of the past -- and the Java date patterns are a 
mistake per the SQL standard. This new ISO cast/format functionality allows 
Impala to fully support the ISO SQL patterns while not changing any legacy 
behavior as both the syntax and the formats are new.

I would be in favor of a session level feature flag that enables the ISO SQL 
date patterns for legacy functions, which would bring them into ISO SQL 
compliance, but I see zero benefits to introducing the cast/format with non-ISO SQL 
patterns.
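
To make the contrast concrete, here is a sketch of what the legacy and the ISO SQL 
styles could look like side by side; the session option in the last statement is 
purely hypothetical and only illustrates the feature-flag idea, it is not an 
existing Impala query option:

{code:sql}
-- Legacy behavior: Java SimpleDateFormat-style pattern in to_timestamp().
select to_timestamp('2015-02-14 20:19:07', 'yyyy-MM-dd HH:mm:ss');

-- ISO SQL:2016 style via the proposed FORMAT clause, using the standard's tokens.
select cast('2015-02-14 20:19:07' as timestamp format 'YYYY-MM-DD HH24:MI:SS');

-- Hypothetical session flag (name made up for illustration) that would switch
-- the legacy functions to the ISO SQL patterns.
set use_iso_sql_datetime_patterns=true;
{code}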





> Add support for SQL:2016 datetime templates/patterns/masks to CAST(... AS ... 
> FORMAT )
> 
>
> Key: IMPALA-4018
> URL: https://issues.apache.org/jira/browse/IMPALA-4018
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Frontend
>Affects Versions: Impala 2.2.4
>Reporter: Greg Rahn
>Assignee: Gabor Kaszab
>Priority: Critical
>  Labels: ansi-sql, compatibility, sql-language
>
> *Summary*
> The format masks/templates for currently are implemented using the [Java 
> SimpleDateFormat 
> patterns|http://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html],
>  and although this is what Hive has implemented, it is not what most standard 
> SQL systems implement.  For example see 
> [Vertica|https://my.vertica.com/docs/7.2.x/HTML/Content/Authoring/SQLReferenceManual/Functions/Formatting/TemplatePatternsForDateTimeFormatting.htm],
>  
> [Netezza|http://www.ibm.com/support/knowledgecenter/SSULQD_7.2.1/com.ibm.nz.dbu.doc/r_dbuser_ntz_sql_extns_templ_patterns_date_time_conv.html],
>   
> [Oracle|https://docs.oracle.com/database/121/SQLRF/sql_elements004.htm#SQLRF00212],
>  and 
> [PostgreSQL|https://www.postgresql.org/docs/9.5/static/functions-formatting.html#FUNCTIONS-FORMATTING-DATETIME-TABLE].
>  
> *Examples of incompatibilities*
> {noformat}
> -- PostgreSQL/Netezza/Vertica/Oracle
> select to_timestamp('May 15, 2015 12:00:00', 'mon dd,  hh:mi:ss');
> -- Impala
> select to_timestamp('May 15, 2015 12:00:00', 'MMM dd,  HH:mm:ss');
> -- PostgreSQL/Netezza/Vertica/Oracle
> select to_timestamp('2015-02-14 20:19:07','-mm-dd hh24:mi:ss');
> -- Impala
> select to_timestamp('2015-02-14 20:19:07','-MM-dd HH:mm:ss');
> -- Vertica/Oracle
> select to_timestamp('2015-02-14 20:19:07.123456','-mm-dd hh24:mi:ss.ff');
> -- Impala
> select to_timestamp('2015-02-14 20:19:07.123456','-MM-dd 
> HH:mm:ss.SS');
> {noformat}
> *Considerations*
> Because this is a change in default behavior for to_timestamp(), if possible, 
> having a feature flag to revert to the legacy Java SimpleDateFormat patterns 
> should be strongly considered.  This would allow users to choose the behavior 
> they desire and scope it to a session if need be.
> SQL:2016 defines the following datetime templates
> {noformat}
>  ::=
>   {  }...
>  ::=
> 
>   | 
>  ::=
> 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
>  ::=
> 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
> | 
>  ::=
>    | YYY | YY | Y
>  ::=
>    | RR
>  ::=
>   MM
>  ::=
>   DD
>  ::=
>   DDD
>  ::=
>   HH | HH12
>  ::=
>   HH24
>  ::=
>   MI
>  ::=
>   SS
>  ::=
>   S
>  ::=
>   FF1 | FF2 | FF3 | FF4 | FF5 | FF6 | FF7 | FF8 | FF9
>  ::=
>   A.M. | P.M.
>  ::=
>   TZH
>  ::=
>   TZM
> {noformat}
> SQL:2016 also introduced the FORMAT clause for CAST which is the standard way 
> to do string <> datetime conversions
> {noformat}
>  ::=
>   CAST 
>AS 
>   [ FORMAT  ]
>   
>  ::=
> 
>   | 
>  ::=
> 
> | 
>  ::=
>   
> {noformat}
> For example:
> {noformat}
> CAST( AS  [FORMAT ])
> CAST( AS  [FORMAT ])
> cast(dt as string format 'DD-MM-')
> cast('01-05-2017' as date format 'DD-MM-')
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-4018) Add support for SQL:2016 datetime templates/patterns/masks to CAST(... AS ... FORMAT )

2019-01-31 Thread Gabor Kaszab (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16757273#comment-16757273
 ] 

Gabor Kaszab commented on IMPALA-4018:
--

Hey,

I have responded to some of the points above on the code review but will cover 
them here as well (https://gerrit.cloudera.org/#/c/12267/).

About the technical details of the current solution:
- We currently have to_timestamp() and from_timestamp() to cover both directions 
of the "string/varchar/char vs. timestamp" conversion.
- to_char() mentioned above sounds similar to from_timestamp() to me.
- In theory these functions use the Java pattern for conversion, but this can be 
misleading: the actual parsing of the pattern and the conversion of values happen 
in the C++ backend. So there is an Impala-specific implementation that is meant 
to mirror the Java pattern; we don't reuse a built-in Java library for this 
purpose.
- Having a parsing algorithm in the FE is feasible, but since the actual 
formatting has to happen in the BE, we would need a very similar implementation 
there as well and would have to maintain both. In addition, we would have to ship 
the parsed tokens from the FE to the BE if the parsing were done in the FE.

I still feel that delivering CAST(..FORMAT..) with the Java pattern first makes 
sense. It was mentioned as a requirement that the new SQL pattern should be 
hidden behind a feature flag because it changes how from_timestamp() and 
to_timestamp() work. It would then be reasonable to have both the Java and the 
SQL pattern available for CAST(..FORMAT..) as well. A situation where a feature 
flag changes to_timestamp() and from_timestamp() behaviour but does not change 
CAST(..FORMAT..) would, I imagine, lead to more misunderstandings of how Impala 
handles these patterns.
In my opinion the cleaner approach is to have CAST(..FORMAT..) be in line with 
the other two conversion functions now, and then introduce the new SQL pattern in 
one go for every function that uses these datetime patterns. (Introducing 
CAST(..FORMAT..) with the Java pattern is already under review.)
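
As a rough illustration of that sequencing, assuming the initial CAST(..FORMAT..) 
accepts the same Java-style pattern as the existing functions (this reflects the 
change still under review, not shipped behavior):

{code:sql}
-- Existing conversion functions already use the Java-style pattern.
select to_timestamp('2019-01-31 16:59:00', 'yyyy-MM-dd HH:mm:ss');
select from_timestamp(now(), 'yyyy-MM-dd');

-- A first iteration of CAST(... FORMAT ...) would take the same Java-style
-- pattern, staying consistent with to_timestamp()/from_timestamp() until the
-- SQL:2016 patterns are introduced for all of them in one go.
select cast('2019-01-31' as timestamp format 'yyyy-MM-dd');
{code}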


> Add support for SQL:2016 datetime templates/patterns/masks to CAST(... AS ... 
> FORMAT )
> 
>
> Key: IMPALA-4018
> URL: https://issues.apache.org/jira/browse/IMPALA-4018
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Frontend
>Affects Versions: Impala 2.2.4
>Reporter: Greg Rahn
>Assignee: Gabor Kaszab
>Priority: Critical
>  Labels: ansi-sql, compatibility, sql-language
>
> *Summary*
> The format masks/templates for currently are implemented using the [Java 
> SimpleDateFormat 
> patterns|http://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html],
>  and although this is what Hive has implemented, it is not what most standard 
> SQL systems implement.  For example see 
> [Vertica|https://my.vertica.com/docs/7.2.x/HTML/Content/Authoring/SQLReferenceManual/Functions/Formatting/TemplatePatternsForDateTimeFormatting.htm],
>  
> [Netezza|http://www.ibm.com/support/knowledgecenter/SSULQD_7.2.1/com.ibm.nz.dbu.doc/r_dbuser_ntz_sql_extns_templ_patterns_date_time_conv.html],
>   
> [Oracle|https://docs.oracle.com/database/121/SQLRF/sql_elements004.htm#SQLRF00212],
>  and 
> [PostgreSQL|https://www.postgresql.org/docs/9.5/static/functions-formatting.html#FUNCTIONS-FORMATTING-DATETIME-TABLE].
>  
> *Examples of incompatibilities*
> {noformat}
> -- PostgreSQL/Netezza/Vertica/Oracle
> select to_timestamp('May 15, 2015 12:00:00', 'mon dd,  hh:mi:ss');
> -- Impala
> select to_timestamp('May 15, 2015 12:00:00', 'MMM dd,  HH:mm:ss');
> -- PostgreSQL/Netezza/Vertica/Oracle
> select to_timestamp('2015-02-14 20:19:07','-mm-dd hh24:mi:ss');
> -- Impala
> select to_timestamp('2015-02-14 20:19:07','-MM-dd HH:mm:ss');
> -- Vertica/Oracle
> select to_timestamp('2015-02-14 20:19:07.123456','-mm-dd hh24:mi:ss.ff');
> -- Impala
> select to_timestamp('2015-02-14 20:19:07.123456','-MM-dd 
> HH:mm:ss.SS');
> {noformat}
> *Considerations*
> Because this is a change in default behavior for to_timestamp(), if possible, 
> having a feature flag to revert to the legacy Java SimpleDateFormat patterns 
> should be strongly considered.  This would allow users to choose the behavior 
> they desire and scope it to a session if need be.
> SQL:2016 defines the following datetime templates
> {noformat}
>  ::=
>   {  }...
>  ::=
> 
>   | 
>  ::=
> 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
>  ::=
> 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
> | 
>  ::=
>    | YYY | YY | Y
>  ::=
>    | RR
>  ::=
>   MM
>  ::=
>   DD
>  ::=
>   DDD
>  ::=
>   HH | HH12
>  ::=
>   HH24
>  ::=
>   MI
>