[jira] [Created] (IMPALA-8765) Document JSON Udfs

2019-07-15 Thread Tim Armstrong (JIRA)
Tim Armstrong created IMPALA-8765:
-

 Summary: Document JSON Udfs
 Key: IMPALA-8765
 URL: https://issues.apache.org/jira/browse/IMPALA-8765
 Project: IMPALA
  Issue Type: Sub-task
  Components: Docs
Reporter: Tim Armstrong
Assignee: Alex Rodoni


It looks like we missed documenting the new built-in from the parent JIRA.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8606) GET_TABLES performance in local catalog mode

2019-07-15 Thread Quanlong Huang (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16885708#comment-16885708
 ] 

Quanlong Huang commented on IMPALA-8606:


I think we can leverage the new getTableMeta API (HIVE-7575) introduced in Hive 
2.x. It fetches all table names and comments in a single round trip to HMS: 
[https://github.com/apache/hive/blob/release-2.0.0/metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java#L167]

> GET_TABLES performance in local catalog mode
> 
>
> Key: IMPALA-8606
> URL: https://issues.apache.org/jira/browse/IMPALA-8606
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 3.2.0
>Reporter: Balazs Jeszenszky
>Assignee: Quanlong Huang
>Priority: Blocker
>  Labels: catalog-v2
>
> With local catalog mode enabled, GET_TABLES JDBC requests will return more 
> than the always available table information. Any request for more metadata 
> about a table will trigger a full load of that table on the catalogd side, 
> meaning that GET_TABLES triggers the load of the entire catalog. Also, as far 
> as I can see, the requests for more metadata are made one table at a time. 
> Once the tables are loaded on the catalogd-side, a coordinator needs 3 
> roundtrips to the catalog to fetch all the details about a single table. My 
> test case had around 57k tables, 1700 DBs, and ~120k partitions. 
> GET_TABLES on a cold catalog takes 18 minutes. With a warm catalog, but cold 
> impalad, it still takes ~70 seconds.
> Many tools use GET_TABLES to populate dropdowns, etc. so this is bad for both 
> end user experience and catalog memory usage.
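
As a rough back-of-envelope check on the figures in the description above, the following sketch estimates how many coordinator-to-catalogd round trips one GET_TABLES call implies, and the average cost per round trip in the warm-catalog case. It assumes the whole 70 seconds is spent on those round trips, which is a simplification, and the function name is made up for illustration.

```python
def get_tables_round_trips(num_tables, trips_per_table=3):
    """Round trips needed when every table is fetched individually."""
    return num_tables * trips_per_table


tables = 57_000          # "around 57k tables" from the description
warm_catalog_secs = 70   # "~70 seconds" with a warm catalog and cold impalad

trips = get_tables_round_trips(tables)
avg_ms = warm_catalog_secs / trips * 1000  # assumes all time goes to round trips

print(f"{trips} round trips, ~{avg_ms:.2f} ms each on average")
```

Even at well under a millisecond per round trip, the per-table pattern adds up, which is why a call that returns names and comments for many tables at once looks attractive.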






[jira] [Commented] (IMPALA-8763) Dedicated executors not invalidating stale LibCache entries

2019-07-15 Thread Quanlong Huang (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16885467#comment-16885467
 ] 

Quanlong Huang commented on IMPALA-8763:


I haven't gone through the SYNC_DDL implementation in LocalCatalog mode in 
detail. I think the races can be fixed if executing DROP FUNCTION with 
SYNC_DDL=1 also waits until all the executors have invalidated their libCache 
entries via this new topic.

> Dedicated executors not invalidating stale LibCache entries
> ---
>
> Key: IMPALA-8763
> URL: https://issues.apache.org/jira/browse/IMPALA-8763
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Quanlong Huang
>Priority: Major
>
> This is a follow-up Jira for IMPALA-8486. Dedicated executors won't receive 
> any catalog updates since they don't subscribe to the catalog-update topic. 
> This leaves them unable to notice whether a libCache entry is stale.
> We may need to introduce a new topic for the dedicated executors to subscribe 
> to invalidate the stale libCache entries.






[jira] [Created] (IMPALA-8764) Kudu data load failures due to "Clock considered unsynchronized"

2019-07-15 Thread Sahil Takiar (JIRA)
Sahil Takiar created IMPALA-8764:


 Summary: Kudu data load failures due to "Clock considered 
unsynchronized"
 Key: IMPALA-8764
 URL: https://issues.apache.org/jira/browse/IMPALA-8764
 Project: IMPALA
  Issue Type: Bug
  Components: Infrastructure
Affects Versions: Impala 3.3.0
Reporter: Sahil Takiar


Dataload error:

{code}
03:08:38 03:08:38 Error executing impala SQL: 
Impala/logs/data_loading/sql/functional/create-functional-query-exhaustive-impala-generated-kudu-none-none.sql
 See: 
Impala/logs/data_loading/sql/functional/create-functional-query-exhaustive-impala-generated-kudu-none-none.sql.log
{code}

Digging through the mini-cluster logs, I see that the Kudu tservers crashed 
with this error:

{code}
F0715 02:58:43.202059   649 hybrid_clock.cc:339] Check failed: _s.ok() unable 
to get current time with error bound: Service unavailable: could not read 
system time source: Error reading clock. Clock considered unsynchronized
*** Check failure stack trace: ***
Wrote minidump to 
Impala/testdata/cluster/cdh6/node-3/var/log/kudu/ts/minidumps/kudu-tserver/395e6bb9-9b2f-468e-4d37d898-74b96d61.dmp
Wrote minidump to 
Impala/testdata/cluster/cdh6/node-3/var/log/kudu/ts/minidumps/kudu-tserver/395e6bb9-9b2f-468e-4d37d898-74b96d61.dmp
*** Aborted at 1563184723 (unix time) try "date -d @1563184723" if you are 
using GNU date ***
PC: @ 0x7ff75ed631f7 __GI_raise
*** SIGABRT (@0x7d10232) received by PID 562 (TID 0x7ff756c1e700) from PID 
562; stack trace: ***
@ 0x7ff760b545e0 (unknown)
@ 0x7ff75ed631f7 __GI_raise
@ 0x7ff75ed648e8 __GI_abort
@  0x1fb7309 kudu::AbortFailureFunction()
@   0x9c054d google::LogMessage::Fail()
@   0x9c240d google::LogMessage::SendToLog()
@   0x9c0089 google::LogMessage::Flush()
@   0x9c2eaf google::LogMessageFatal::~LogMessageFatal()
@   0xc0c60e kudu::clock::HybridClock::WalltimeWithErrorOrDie()
@   0xc0c67e kudu::clock::HybridClock::NowWithError()
@   0xc0d4aa kudu::clock::HybridClock::NowForMetrics()
@   0x9a29c0 kudu::FunctionGauge<>::WriteValue()
@  0x1fb0dc0 kudu::Gauge::WriteAsJson()
@  0x1fb3212 kudu::MetricEntity::WriteAsJson()
@  0x1fb390e kudu::MetricRegistry::WriteAsJson()
@   0xa856a3 kudu::server::DiagnosticsLog::LogMetrics()
@   0xa8789a kudu::server::DiagnosticsLog::RunThread()
@  0x1ff44d7 kudu::Thread::SuperviseThread()
@ 0x7ff760b4ce25 start_thread
@ 0x7ff75ee2634d __clone
{code}








[jira] [Updated] (IMPALA-8692) INSERT ... () VALUES () Crashes for parquet tables.

2019-07-15 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-8692:
--
Labels: analysis crash front-end parquet  (was: analysis front-end parquet)

> INSERT ...  () VALUES () Crashes for 
> parquet tables.
> --
>
> Key: IMPALA-8692
> URL: https://issues.apache.org/jira/browse/IMPALA-8692
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Abhishek Rawat
>Assignee: Abhishek Rawat
>Priority: Critical
>  Labels: analysis, crash, front-end, parquet
>
> Block such INSERT statements in the analysis phase.






[jira] [Updated] (IMPALA-8692) INSERT ... () VALUES () Crashes for parquet tables.

2019-07-15 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-8692:
--
Priority: Blocker  (was: Critical)

> INSERT ...  () VALUES () Crashes for 
> parquet tables.
> --
>
> Key: IMPALA-8692
> URL: https://issues.apache.org/jira/browse/IMPALA-8692
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Abhishek Rawat
>Assignee: Abhishek Rawat
>Priority: Blocker
>  Labels: analysis, crash, front-end, parquet
>
> Block such INSERT statements in the analysis phase.






[jira] [Updated] (IMPALA-8692) INSERT ... () VALUES () Crashes for parquet tables.

2019-07-15 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-8692:
--
Priority: Critical  (was: Major)

> INSERT ...  () VALUES () Crashes for 
> parquet tables.
> --
>
> Key: IMPALA-8692
> URL: https://issues.apache.org/jira/browse/IMPALA-8692
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Abhishek Rawat
>Assignee: Abhishek Rawat
>Priority: Critical
>  Labels: analysis, front-end, parquet
>
> Block such INSERT statements in the analysis phase.






[jira] [Updated] (IMPALA-8692) INSERT ... () VALUES () Crashes for parquet tables.

2019-07-15 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-8692:
--
Target Version: Impala 3.3.0

> INSERT ...  () VALUES () Crashes for 
> parquet tables.
> --
>
> Key: IMPALA-8692
> URL: https://issues.apache.org/jira/browse/IMPALA-8692
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Abhishek Rawat
>Assignee: Abhishek Rawat
>Priority: Blocker
>  Labels: analysis, crash, front-end, parquet
>
> Block such INSERT statements in the analysis phase.






[jira] [Commented] (IMPALA-8763) Dedicated executors not invalidating stale LibCache entries

2019-07-15 Thread Tim Armstrong (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16885454#comment-16885454
 ] 

Tim Armstrong commented on IMPALA-8763:
---

I think adding a separate topic would solve the worst part of this - 
permanently stale functions. It still doesn't provide strong consistency - 
there are still races with query execution, but that should be very rare.

> Dedicated executors not invalidating stale LibCache entries
> ---
>
> Key: IMPALA-8763
> URL: https://issues.apache.org/jira/browse/IMPALA-8763
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Quanlong Huang
>Priority: Major
>
> This is a follow-up Jira for IMPALA-8486. Dedicated executors won't receive 
> any catalog updates since they don't subscribe to the catalog-update topic. 
> This leaves them unable to notice whether a libCache entry is stale.
> We may need to introduce a new topic for the dedicated executors to subscribe 
> to invalidate the stale libCache entries.






[jira] [Commented] (IMPALA-8486) test_udf_update_via_drop and test_udf_update_via_create fail on local catalog

2019-07-15 Thread Quanlong Huang (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16885442#comment-16885442
 ] 

Quanlong Huang commented on IMPALA-8486:


I just realized the dedicated executors problem and created a follow-up JIRA: 
IMPALA-8763

> test_udf_update_via_drop and test_udf_update_via_create fail on local catalog
> -
>
> Key: IMPALA-8486
> URL: https://issues.apache.org/jira/browse/IMPALA-8486
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Affects Versions: Impala 3.3.0
>Reporter: Tim Armstrong
>Assignee: Quanlong Huang
>Priority: Critical
>  Labels: catalog-v2
>
> {noformat}
>  TestUdfTargeted.test_udf_update_via_drop[protocol: beeswax | exec_option: 
> {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 
> 'disable_codegen': False, 'abort_on_error': 1, 
> 'exec_single_node_rows_threshold': 0} | table_format: text/none] 
> tests/query_test/test_udfs.py:541: in test_udf_update_via_drop
> self._run_query_all_impalads(exec_options, query_stmt, ["New UDF"])
> tests/query_test/test_udfs.py:52: in _run_query_all_impalads
> assert result.data == expected
> E   assert ['Old UDF'] == ['New UDF']
> E At index 0 diff: 'Old UDF' != 'New UDF'
> E Full diff:
> E - ['Old UDF']
> E + ['New UDF']
> 
> {noformat}
> The tests are checking that the local UDF caches on each impalad get 
> invalidated by a drop/create of a function referencing the HDFS file 
> containing the UDF. The test fails because the local catalog, unlike the 
> regular catalog, doesn't invalidate LibCache entries upon receiving a catalog 
> update.
> I looked at this for long enough to realise that the invalidation mechanism 
> is fundamentally broken - it doesn't work with dedicated executors. It also 
> creates a race between the statestore updates and queries referencing the 
> UDFs - if the queries win the race, then they can incorrectly use the old 
> version that should have been invalidated.
> I think this is a potentially problematic issue because old JAR/SO versions 
> could persist in the cache indefinitely if old versions are overwritten in 
> place.






[jira] [Created] (IMPALA-8763) Dedicated executors not invalidating stale LibCache entries

2019-07-15 Thread Quanlong Huang (JIRA)
Quanlong Huang created IMPALA-8763:
--

 Summary: Dedicated executors not invalidating stale LibCache 
entries
 Key: IMPALA-8763
 URL: https://issues.apache.org/jira/browse/IMPALA-8763
 Project: IMPALA
  Issue Type: Bug
Reporter: Quanlong Huang


This is a follow-up Jira for IMPALA-8486. Dedicated executors won't receive any 
catalog updates since they don't subscribe to the catalog-update topic. This 
leaves them unable to notice whether a libCache entry is stale.

We may need to introduce a new topic for the dedicated executors to subscribe 
to invalidate the stale libCache entries.
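
The proposal above can be pictured with a small publish/subscribe sketch. Everything here is hypothetical: the class names, the topic name, and the version-number scheme are assumptions made for illustration and do not reflect Impala's actual statestore or LibCache code.

```python
class Topic:
    """Hypothetical stand-in for a statestore topic that fans out updates."""

    def __init__(self, name):
        self.name = name
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, update):
        for callback in self.subscribers:
            callback(update)


class Executor:
    """Hypothetical executor with a local cache of library path -> version."""

    def __init__(self):
        self.lib_cache = {}

    def on_lib_update(self, update):
        path, new_version = update
        # Drop the cached entry only if it is older than the announced version.
        if self.lib_cache.get(path, new_version) < new_version:
            del self.lib_cache[path]


lib_topic = Topic("lib-cache-invalidation")  # illustrative name only
executor = Executor()
lib_topic.subscribe(executor.on_lib_update)

executor.lib_cache["/udfs/my_udf.so"] = 1   # stale copy cached locally
lib_topic.publish(("/udfs/my_udf.so", 2))   # function dropped and re-created
print(executor.lib_cache)
```

In this sketch the executor never sees catalog updates; it only needs the much smaller stream of library invalidations.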








[jira] [Commented] (IMPALA-8641) Document compression codec zstd in Parquet

2019-07-15 Thread Abhishek Rawat (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16885424#comment-16885424
 ] 

Abhishek Rawat commented on IMPALA-8641:


[~arodoni_cloudera] Yes, we should include this in 3.3 along with the new 
feature. This kind of fell through the cracks. I will work on it today and send 
out a review.

> Document compression codec zstd in Parquet
> --
>
> Key: IMPALA-8641
> URL: https://issues.apache.org/jira/browse/IMPALA-8641
> Project: IMPALA
>  Issue Type: Task
>  Components: Docs
>Reporter: Abhishek Rawat
>Assignee: Abhishek Rawat
>Priority: Minor
>  Labels: future_release_doc, in_33
>







[jira] [Commented] (IMPALA-7973) Add support for fine-grained updates at partition level

2019-07-15 Thread Vihang Karajgaonkar (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16885410#comment-16885410
 ] 

Vihang Karajgaonkar commented on IMPALA-7973:
-

That's right. This does not add new configurations. In case of partition events, 
if the documentation says it invalidates the table, with this patch it should 
say it refreshes the partitions from the event. This only applies to the alter, 
add and drop partition events. Hope that helps.
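
To make the behavior change concrete, here is a minimal sketch of the dispatch described above. The event type names and the function are illustrative assumptions, not the identifiers used by HMS or by Impala's event processor.

```python
# Illustrative event type names; the real identifiers may differ.
PARTITION_EVENTS = {"ADD_PARTITION", "ALTER_PARTITION", "DROP_PARTITION"}


def plan_action(event_type, partitions):
    """Return (action, affected partitions) for one metastore event."""
    if event_type in PARTITION_EVENTS:
        # With this patch: touch only the partitions named in the event
        # instead of invalidating the whole table.
        return ("refresh_partitions", list(partitions))
    # Other event types are outside the scope of this change.
    return ("existing_behavior", [])


print(plan_action("ADD_PARTITION", ["year=2019/month=7"]))
print(plan_action("ALTER_TABLE", []))
```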

> Add support for fine-grained updates at partition level
> ---
>
> Key: IMPALA-7973
> URL: https://issues.apache.org/jira/browse/IMPALA-7973
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Vihang Karajgaonkar
>Assignee: Anurag Mantripragada
>Priority: Major
> Fix For: Impala 3.3.0
>
>
> When data is inserted into a partition or a new partition is created in a 
> large table, we should not be invalidating the whole table. Instead it should 
> be possible to refresh/add/drop certain partitions on the table directly 
> based on the event information. This would help with the performance of 
> subsequent access to the table by avoiding reloading the large table.






[jira] [Commented] (IMPALA-1638) Investigate using c++ templates option when generating thrift and increasing the transport buffer

2019-07-15 Thread Tim Armstrong (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16885364#comment-16885364
 ] 

Tim Armstrong commented on IMPALA-1638:
---

Enabling the templated code generation requires this change. Then there are 
some compile errors that I assume require using the new templated types.
{noformat}
diff --git a/common/thrift/CMakeLists.txt b/common/thrift/CMakeLists.txt
index 958cfac..2723c0c 100644
--- a/common/thrift/CMakeLists.txt
+++ b/common/thrift/CMakeLists.txt
@@ -59,7 +59,7 @@ function(THRIFT_GEN VAR)
 # The java dependency is handled by maven.
 # We need to generate C++ src file for the parent dependencies using the "-r" option.
 set(CPP_ARGS ${THRIFT_INCLUDE_DIR_OPTION}
---gen cpp:moveable_types,no_default_operators -o ${BE_OUTPUT_DIR})
+--gen cpp:moveable_types,no_default_operators,templates -o ${BE_OUTPUT_DIR})
 IF (THRIFT_FILE STREQUAL "beeswax.thrift")
   set(CPP_ARGS -r ${CPP_ARGS})
 ENDIF(THRIFT_FILE STREQUAL "beeswax.thrift")
{noformat}

> Investigate using c++ templates option when generating thrift and increasing 
> the transport buffer
> -
>
> Key: IMPALA-1638
> URL: https://issues.apache.org/jira/browse/IMPALA-1638
> Project: IMPALA
>  Issue Type: Task
>  Components: Distributed Exec
>Affects Versions: Impala 2.2
>Reporter: casey
>Priority: Minor
>  Labels: performance
>
> While investigating the performance of "select * from tpch.lineitem" thrift 
> seemed to be very slow. I did a benchmark comparing thrift and Cap'n Proto to 
> transfer a total of 1 GB using 1 MB responses over the loopback. Cap'n Proto 
> took ~0.5 seconds where thrift took ~6 seconds. Thrift was set up similarly to 
> how it's used in Impala. Todd was able to get the thrift timing down to ~1.7 
> seconds with a few simple tweaks that aren't used by Impala. The improvements 
> are:
> 1) Generate code using templates. Without this, thrift generates inheritance 
> style code which results in a virtual call to read and write every data point 
> (such as an int).
> 2) Use a framed transport. The problem was that even when using the buffered 
> transport, the default buffer was too small (Impala also uses the default 
> buffer size). It's possible all we need to do is increase the buffer size. 
> Testing should be done on a real cluster since the loopback could give very 
> different results.
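
The buffer-size point in item 2 can be illustrated with simple arithmetic: the smaller the write buffer, the more often it fills while sending one response. The buffer sizes below are picked for illustration only and are not claimed to be Thrift's or Impala's actual defaults.

```python
import math


def flushes_per_response(response_bytes, buffer_bytes):
    """How many times a write buffer of this size fills for one response."""
    return math.ceil(response_bytes / buffer_bytes)


response = 1 << 20  # 1 MB response, as in the benchmark described above

# Candidate buffer sizes, chosen only to show the trend.
for buf in (512, 64 * 1024, 1 << 20):
    print(f"buffer={buf:>8} bytes -> {flushes_per_response(response, buf)} flushes")
```

This only counts buffer fills; it says nothing about the per-field virtual-call overhead from item 1, which would need a real benchmark to measure.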






[jira] [Updated] (IMPALA-8759) Use double precision for HLL

2019-07-15 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-8759:
--
Labels: perf ramp-up  (was: )

> Use double precision for HLL
> 
>
> Key: IMPALA-8759
> URL: https://issues.apache.org/jira/browse/IMPALA-8759
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 3.2.0
>Reporter: Peter Ebert
>Priority: Major
>  Labels: perf, ramp-up
>
> For /be/src/exprs/aggregate-functions-ir.cc the finalize function uses a 
> float which is only capable of 6-9 digits of precision.  More accurate 
> estimates for larger cardinalities (beyond 999,999) should be possible with 
> double precision.  Another c++ implementation uses double as well 
> [https://github.com/dialtr/libcount]






[jira] [Commented] (IMPALA-8759) Use double precision for HLL

2019-07-15 Thread Tim Armstrong (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16885351#comment-16885351
 ] 

Tim Armstrong commented on IMPALA-8759:
---

[~peter.ebert] it looks like using single precision was a deliberate decision 
but I don't see a justification for why it was done that way. Nong Li, the 
original author of that code, generally did things like that for a reason. I 
looked at the original code review and there's no explanation unfortunately.

Maybe there was some idea that the exponentiation would be cheaper with single 
precision or something like that. It seems like that cost is probably 
negligible in the scheme of things so we could switch and just do a quick 
benchmark to see if there's any real difference.
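
As a quick illustration of what single precision does to large values, the sketch below rounds a few whole numbers through a 32-bit float using Python's `struct` module. It only shows the representation limit; it does not reproduce the arithmetic in Impala's finalize function, and the sample values are arbitrary.

```python
import struct


def to_float32(x):
    """Round a value to the nearest single-precision float and back."""
    return struct.unpack("f", struct.pack("f", float(x)))[0]


for estimate in (999_999, 16_777_217, 123_456_789):
    rounded = to_float32(estimate)
    print(f"{estimate:>11} -> {rounded:.1f} (error {rounded - estimate:+.0f})")
```

Whole numbers survive the round trip exactly up to 2^24 (16,777,216) and are rounded beyond that; intermediate fractional values in an estimate can lose precision earlier, which a benchmark plus an accuracy comparison would confirm or rule out.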

> Use double precision for HLL
> 
>
> Key: IMPALA-8759
> URL: https://issues.apache.org/jira/browse/IMPALA-8759
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Peter Ebert
>Priority: Major
>
> For /be/src/exprs/aggregate-functions-ir.cc the finalize function uses a 
> float which is only capable of 6-9 digits of precision.  More accurate 
> estimates for larger cardinalities (beyond 999,999) should be possible with 
> double precision.  Another c++ implementation uses double as well 
> [https://github.com/dialtr/libcount]






[jira] [Updated] (IMPALA-8759) Use double precision for HLL

2019-07-15 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-8759:
--
Affects Version/s: Impala 3.2.0

> Use double precision for HLL
> 
>
> Key: IMPALA-8759
> URL: https://issues.apache.org/jira/browse/IMPALA-8759
> Project: IMPALA
>  Issue Type: Improvement
>Affects Versions: Impala 3.2.0
>Reporter: Peter Ebert
>Priority: Major
>
> For /be/src/exprs/aggregate-functions-ir.cc the finalize function uses a 
> float which is only capable of 6-9 digits of precision.  More accurate 
> estimates for larger cardinalities (beyond 999,999) should be possible with 
> double precision.  Another c++ implementation uses double as well 
> [https://github.com/dialtr/libcount]






[jira] [Updated] (IMPALA-8759) Use double precision for HLL

2019-07-15 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-8759:
--
Component/s: Backend

> Use double precision for HLL
> 
>
> Key: IMPALA-8759
> URL: https://issues.apache.org/jira/browse/IMPALA-8759
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 3.2.0
>Reporter: Peter Ebert
>Priority: Major
>
> For /be/src/exprs/aggregate-functions-ir.cc the finalize function uses a 
> float which is only capable of 6-9 digits of precision.  More accurate 
> estimates for larger cardinalities (beyond 999,999) should be possible with 
> double precision.  Another c++ implementation uses double as well 
> [https://github.com/dialtr/libcount]


