[jira] [Created] (IMPALA-8765) Document JSON Udfs
Tim Armstrong created IMPALA-8765: - Summary: Document JSON Udfs Key: IMPALA-8765 URL: https://issues.apache.org/jira/browse/IMPALA-8765 Project: IMPALA Issue Type: Sub-task Components: Docs Reporter: Tim Armstrong Assignee: Alex Rodoni It looks like we missed documenting the new builtin from the parent JIRA -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-8606) GET_TABLES performance in local catalog mode
[ https://issues.apache.org/jira/browse/IMPALA-8606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16885708#comment-16885708 ]

Quanlong Huang commented on IMPALA-8606:

I think we can leverage the new getTableMeta API (HIVE-7575) introduced in Hive 2.x. It can get all table names and comments in a round trip to HMS: [https://github.com/apache/hive/blob/release-2.0.0/metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java#L167]

> GET_TABLES performance in local catalog mode
>
> Key: IMPALA-8606
> URL: https://issues.apache.org/jira/browse/IMPALA-8606
> Project: IMPALA
> Issue Type: Bug
> Components: Catalog
> Affects Versions: Impala 3.2.0
> Reporter: Balazs Jeszenszky
> Assignee: Quanlong Huang
> Priority: Blocker
> Labels: catalog-v2
>
> With local catalog mode enabled, GET_TABLES JDBC requests will return more
> than the always available table information. Any request for more metadata
> about a table will trigger a full load of that table on the catalogd side,
> meaning that GET_TABLES triggers the load of the entire catalog. Also, as far
> as I can see, the requests for more metadata are made one table at a time.
> Once the tables are loaded on the catalogd-side, a coordinator needs 3
> roundtrips to the catalog to fetch all the details about a single table. My
> test case had around 57k tables, 1700 DBs, and ~120k partitions.
> GET_TABLES on a cold catalog takes 18 minutes. With a warm catalog, but cold
> impalad, it still takes ~70 seconds.
> Many tools use GET_TABLES to populate dropdowns, etc. so this is bad for both
> end user experience and catalog memory usage.
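To put the numbers from the IMPALA-8606 description in perspective, the following is a rough back-of-the-envelope sketch, not a measurement. It assumes exactly 57,000 tables, 3 coordinator-to-catalog round trips per table, and (a simplification of mine) that the whole ~70 second warm-catalog time is spent on those round trips. The variable names are made up for illustration.

```python
# Figures taken from the issue description; "57k" is rounded to 57,000.
TABLES = 57_000
ROUNDTRIPS_PER_TABLE = 3
WARM_CATALOG_SECONDS = 70

# Per-table fetching: every table costs its own round trips.
total_roundtrips = TABLES * ROUNDTRIPS_PER_TABLE

# Assumption: all of the ~70 s is round-trip time, so this is an upper bound
# on the average cost of one round trip.
implied_ms_per_roundtrip = WARM_CATALOG_SECONDS / total_roundtrips * 1000

print(total_roundtrips)                    # 171000
print(round(implied_ms_per_roundtrip, 2))  # 0.41
```

Even at well under a millisecond per round trip, the per-table pattern adds up to over a minute, which is why a bulk call that returns all table names and comments at once is attractive here.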
[jira] [Commented] (IMPALA-8763) Dedicated executors not invalidating stale LibCache entries
[ https://issues.apache.org/jira/browse/IMPALA-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16885467#comment-16885467 ]

Quanlong Huang commented on IMPALA-8763:

I haven't gone through the SYNC_DDL implementation in LocalCatalog mode in detail. I think the races can be fixed if the execution of DROP FUNCTION with SYNC_DDL=1 also waits until all the executors invalidate their libCache entries for this new topic.

> Dedicated executors not invalidating stale LibCache entries
>
> Key: IMPALA-8763
> URL: https://issues.apache.org/jira/browse/IMPALA-8763
> Project: IMPALA
> Issue Type: Bug
> Reporter: Quanlong Huang
> Priority: Major
>
> This is a follow-up Jira for IMPALA-8486. Dedicated executors won't receive
> any catalog updates since they don't subscribe to the catalog-update topic.
> This causes them unable to notice whether a libCache entry is stale.
> We may need to introduce a new topic for the dedicated executors to subscribe
> to invalidate the stale libCache entries.
[jira] [Created] (IMPALA-8764) Kudu data load failures due to "Clock considered unsynchronized"
Sahil Takiar created IMPALA-8764:

Summary: Kudu data load failures due to "Clock considered unsynchronized"
Key: IMPALA-8764
URL: https://issues.apache.org/jira/browse/IMPALA-8764
Project: IMPALA
Issue Type: Bug
Components: Infrastructure
Affects Versions: Impala 3.3.0
Reporter: Sahil Takiar

Dataload error:
{code}
03:08:38
03:08:38 Error executing impala SQL: Impala/logs/data_loading/sql/functional/create-functional-query-exhaustive-impala-generated-kudu-none-none.sql See: Impala/logs/data_loading/sql/functional/create-functional-query-exhaustive-impala-generated-kudu-none-none.sql.log
{code}

Digging through the mini-cluster logs, I see that the Kudu tservers crashed with this error:
{code}
F0715 02:58:43.202059 649 hybrid_clock.cc:339] Check failed: _s.ok() unable to get current time with error bound: Service unavailable: could not read system time source: Error reading clock. Clock considered unsynchronized
*** Check failure stack trace: ***
Wrote minidump to Impala/testdata/cluster/cdh6/node-3/var/log/kudu/ts/minidumps/kudu-tserver/395e6bb9-9b2f-468e-4d37d898-74b96d61.dmp
Wrote minidump to Impala/testdata/cluster/cdh6/node-3/var/log/kudu/ts/minidumps/kudu-tserver/395e6bb9-9b2f-468e-4d37d898-74b96d61.dmp
*** Aborted at 1563184723 (unix time) try "date -d @1563184723" if you are using GNU date ***
PC: @ 0x7ff75ed631f7 __GI_raise
*** SIGABRT (@0x7d10232) received by PID 562 (TID 0x7ff756c1e700) from PID 562; stack trace: ***
    @ 0x7ff760b545e0 (unknown)
    @ 0x7ff75ed631f7 __GI_raise
    @ 0x7ff75ed648e8 __GI_abort
    @ 0x1fb7309 kudu::AbortFailureFunction()
    @ 0x9c054d google::LogMessage::Fail()
    @ 0x9c240d google::LogMessage::SendToLog()
    @ 0x9c0089 google::LogMessage::Flush()
    @ 0x9c2eaf google::LogMessageFatal::~LogMessageFatal()
    @ 0xc0c60e kudu::clock::HybridClock::WalltimeWithErrorOrDie()
    @ 0xc0c67e kudu::clock::HybridClock::NowWithError()
    @ 0xc0d4aa kudu::clock::HybridClock::NowForMetrics()
    @ 0x9a29c0 kudu::FunctionGauge<>::WriteValue()
    @ 0x1fb0dc0 kudu::Gauge::WriteAsJson()
    @ 0x1fb3212 kudu::MetricEntity::WriteAsJson()
    @ 0x1fb390e kudu::MetricRegistry::WriteAsJson()
    @ 0xa856a3 kudu::server::DiagnosticsLog::LogMetrics()
    @ 0xa8789a kudu::server::DiagnosticsLog::RunThread()
    @ 0x1ff44d7 kudu::Thread::SuperviseThread()
    @ 0x7ff760b4ce25 start_thread
    @ 0x7ff75ee2634d __clone
{code}
[jira] [Updated] (IMPALA-8692) INSERT ... () VALUES () Crashes for parquet tables.
[ https://issues.apache.org/jira/browse/IMPALA-8692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-8692: -- Labels: analysis crash front-end parquet (was: analysis front-end parquet) > INSERT ... () VALUES () Crashes for > parquet tables. > -- > > Key: IMPALA-8692 > URL: https://issues.apache.org/jira/browse/IMPALA-8692 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Reporter: Abhishek Rawat >Assignee: Abhishek Rawat >Priority: Critical > Labels: analysis, crash, front-end, parquet > > Block such insert statement in analysis phase. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-8692) INSERT ... () VALUES () Crashes for parquet tables.
[ https://issues.apache.org/jira/browse/IMPALA-8692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-8692: -- Priority: Blocker (was: Critical) > INSERT ... () VALUES () Crashes for > parquet tables. > -- > > Key: IMPALA-8692 > URL: https://issues.apache.org/jira/browse/IMPALA-8692 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Reporter: Abhishek Rawat >Assignee: Abhishek Rawat >Priority: Blocker > Labels: analysis, crash, front-end, parquet > > Block such insert statement in analysis phase. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-8692) INSERT ... () VALUES () Crashes for parquet tables.
[ https://issues.apache.org/jira/browse/IMPALA-8692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-8692: -- Priority: Critical (was: Major) > INSERT ... () VALUES () Crashes for > parquet tables. > -- > > Key: IMPALA-8692 > URL: https://issues.apache.org/jira/browse/IMPALA-8692 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Reporter: Abhishek Rawat >Assignee: Abhishek Rawat >Priority: Critical > Labels: analysis, front-end, parquet > > Block such insert statement in analysis phase. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-8692) INSERT ... () VALUES () Crashes for parquet tables.
[ https://issues.apache.org/jira/browse/IMPALA-8692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-8692: -- Target Version: Impala 3.3.0 > INSERT ... () VALUES () Crashes for > parquet tables. > -- > > Key: IMPALA-8692 > URL: https://issues.apache.org/jira/browse/IMPALA-8692 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Reporter: Abhishek Rawat >Assignee: Abhishek Rawat >Priority: Blocker > Labels: analysis, crash, front-end, parquet > > Block such insert statement in analysis phase. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-8763) Dedicated executors not invalidating stale LibCache entries
[ https://issues.apache.org/jira/browse/IMPALA-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16885454#comment-16885454 ] Tim Armstrong commented on IMPALA-8763: --- I think adding a separate topic would solve the worst part of this - permanently stale functions. It still doesn't provide strong consistency - there are still races with query execution, but that should be very rare. > Dedicated executors not invalidating stale LibCache entries > --- > > Key: IMPALA-8763 > URL: https://issues.apache.org/jira/browse/IMPALA-8763 > Project: IMPALA > Issue Type: Bug >Reporter: Quanlong Huang >Priority: Major > > This is a follow-up Jira for IMPALA-8486. Dedicated executors won't receive > any catalog updates since they don't subscribe to the catalog-update topic. > This causes them unable to notice whether a libCache entry is stale. > We may need to introduce a new topic for the dedicated executors to subscribe > to invalidate the stale libCache entries. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-8486) test_udf_update_via_drop and test_udf_update_via_create fail on local catalog
[ https://issues.apache.org/jira/browse/IMPALA-8486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16885442#comment-16885442 ]

Quanlong Huang commented on IMPALA-8486:

Just realized the dedicated executors problem. Created a follow-up JIRA: IMPALA-8763

> test_udf_update_via_drop and test_udf_update_via_create fail on local catalog
>
> Key: IMPALA-8486
> URL: https://issues.apache.org/jira/browse/IMPALA-8486
> Project: IMPALA
> Issue Type: Improvement
> Components: Catalog
> Affects Versions: Impala 3.3.0
> Reporter: Tim Armstrong
> Assignee: Quanlong Huang
> Priority: Critical
> Labels: catalog-v2
>
> {noformat}
> TestUdfTargeted.test_udf_update_via_drop[protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: text/none]
> tests/query_test/test_udfs.py:541: in test_udf_update_via_drop
>     self._run_query_all_impalads(exec_options, query_stmt, ["New UDF"])
> tests/query_test/test_udfs.py:52: in _run_query_all_impalads
>     assert result.data == expected
> E   assert ['Old UDF'] == ['New UDF']
> E     At index 0 diff: 'Old UDF' != 'New UDF'
> E     Full diff:
> E     - ['Old UDF']
> E     + ['New UDF']
> {noformat}
> The tests are checking that the local UDF caches on each impalad get
> invalidated by a drop/create of a function referencing the HDFS file
> containing the UDF. The test fails because the local catalog, unlike the
> regular catalog, doesn't invalidate LibCache entries upon receiving a catalog
> update.
> I looked at this for long enough to realise that the invalidation mechanism
> is fundamentally broken - it doesn't work with dedicated executors. It also
> creates a race between the statestore updates and queries referencing the
> UDFs - if the queries win the race, then they can incorrectly use the old
> version that should have been invalidated.
> I think this is a potentially problematic issue because old JAR/SO versions
> could persist in the cache indefinitely if old versions are overwritten in
> place.
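The staleness problem described in IMPALA-8486 and IMPALA-8763 can be illustrated with a toy model. This is a minimal sketch, not Impala code: `ToyLibCache`, `ToyNode`, the `storage` dict standing in for HDFS, and the path `/udfs/lib.so` are all hypothetical names invented for the example. The only behavior it models is that a node which never hears about catalog updates never drops its cached copy.

```python
class ToyLibCache:
    """Hypothetical per-node cache: library path -> contents loaded earlier."""

    def __init__(self):
        self._entries = {}

    def get(self, path, storage):
        # Load from storage only on a cache miss; hits are served as-is.
        if path not in self._entries:
            self._entries[path] = storage[path]
        return self._entries[path]

    def invalidate(self, path):
        self._entries.pop(path, None)


class ToyNode:
    """Hypothetical node; only subscribers react to catalog updates."""

    def __init__(self, subscribes_to_catalog_updates):
        self.cache = ToyLibCache()
        self.subscribes = subscribes_to_catalog_updates

    def on_catalog_update(self, path):
        if self.subscribes:
            self.cache.invalidate(path)

    def run_udf(self, path, storage):
        return self.cache.get(path, storage)


path = "/udfs/lib.so"            # made-up path
storage = {path: "Old UDF"}      # stands in for the file on HDFS

coordinator = ToyNode(subscribes_to_catalog_updates=True)
dedicated_executor = ToyNode(subscribes_to_catalog_updates=False)
nodes = [coordinator, dedicated_executor]

for node in nodes:               # warm both caches with the old version
    node.run_udf(path, storage)

storage[path] = "New UDF"        # library overwritten in place
for node in nodes:               # drop/create broadcasts an update
    node.on_catalog_update(path)

results = [node.run_udf(path, storage) for node in nodes]
print(results)                   # ['New UDF', 'Old UDF']
```

In this model the non-subscribing node keeps returning the old version indefinitely, which mirrors the "permanently stale" case the comments are concerned about. It does not model the separate race between statestore updates and in-flight queries.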
[jira] [Created] (IMPALA-8763) Dedicated executors not invalidating stale LibCache entries
Quanlong Huang created IMPALA-8763: -- Summary: Dedicated executors not invalidating stale LibCache entries Key: IMPALA-8763 URL: https://issues.apache.org/jira/browse/IMPALA-8763 Project: IMPALA Issue Type: Bug Reporter: Quanlong Huang This is a follow-up Jira for IMPALA-8486. Dedicated executors won't receive any catalog updates since they don't subscribe to the catalog-update topic. This causes them unable to notice whether a libCache entry is stale. We may need to introduce a new topic for the dedicated executors to subscribe to invalidate the stale libCache entries. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (IMPALA-8641) Document compression codec zstd in Parquet
[ https://issues.apache.org/jira/browse/IMPALA-8641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16885424#comment-16885424 ] Abhishek Rawat commented on IMPALA-8641: [~arodoni_cloudera] Yes, we should include this in 3.3 along with the new feature. This kind of fell through the cracks. I will work on it today and send out a review. > Document compression codec zstd in Parquet > -- > > Key: IMPALA-8641 > URL: https://issues.apache.org/jira/browse/IMPALA-8641 > Project: IMPALA > Issue Type: Task > Components: Docs >Reporter: Abhishek Rawat >Assignee: Abhishek Rawat >Priority: Minor > Labels: future_release_doc, in_33 > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-7973) Add support for fine-grained updates at partition level
[ https://issues.apache.org/jira/browse/IMPALA-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16885410#comment-16885410 ]

Vihang Karajgaonkar commented on IMPALA-7973:

That's right. This does not add new configurations. For partition events, wherever the documentation says that the event invalidates the table, with this patch it should say that it refreshes the partitions from the event. This only applies to the alter, add, and drop partition events. Hope that helps.

> Add support for fine-grained updates at partition level
>
> Key: IMPALA-7973
> URL: https://issues.apache.org/jira/browse/IMPALA-7973
> Project: IMPALA
> Issue Type: Sub-task
> Reporter: Vihang Karajgaonkar
> Assignee: Anurag Mantripragada
> Priority: Major
> Fix For: Impala 3.3.0
>
> When data is inserted into a partition or a new partition is created in a
> large table, we should not be invalidating the whole table. Instead it should
> be possible to refresh/add/drop certain partitions on the table directly
> based on the event information. This would help with the performance of
> subsequent access to the table by avoiding reloading the large table.
[jira] [Commented] (IMPALA-1638) Investigate using c++ templates option when generating thrift and increasing the transport buffer
[ https://issues.apache.org/jira/browse/IMPALA-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16885364#comment-16885364 ]

Tim Armstrong commented on IMPALA-1638:

Enabling the templated code generation requires this change. Then there are some compile errors that I assume require using the new templated types.
{noformat}
diff --git a/common/thrift/CMakeLists.txt b/common/thrift/CMakeLists.txt
index 958cfac..2723c0c 100644
--- a/common/thrift/CMakeLists.txt
+++ b/common/thrift/CMakeLists.txt
@@ -59,7 +59,7 @@ function(THRIFT_GEN VAR)
   # The java dependency is handled by maven.
   # We need to generate C++ src file for the parent dependencies using the "-r" option.
   set(CPP_ARGS ${THRIFT_INCLUDE_DIR_OPTION}
-    --gen cpp:moveable_types,no_default_operators -o ${BE_OUTPUT_DIR})
+    --gen cpp:moveable_types,no_default_operators,templates -o ${BE_OUTPUT_DIR})
   IF (THRIFT_FILE STREQUAL "beeswax.thrift")
     set(CPP_ARGS -r ${CPP_ARGS})
   ENDIF(THRIFT_FILE STREQUAL "beeswax.thrift")
{noformat}

> Investigate using c++ templates option when generating thrift and increasing
> the transport buffer
>
> Key: IMPALA-1638
> URL: https://issues.apache.org/jira/browse/IMPALA-1638
> Project: IMPALA
> Issue Type: Task
> Components: Distributed Exec
> Affects Versions: Impala 2.2
> Reporter: casey
> Priority: Minor
> Labels: performance
>
> While investigating the performance of "select * from tpch.lineitem", Thrift
> seemed to be very slow. I did a benchmark comparing Thrift and Cap'n Proto to
> transfer a total of 1 GB using 1 MB responses over the loopback. Cap'n Proto
> took ~0.5 seconds whereas Thrift took ~6 seconds. Thrift was set up similarly
> to how it's used in Impala. Todd was able to get the Thrift timing down to
> ~1.7 seconds with a few simple tweaks that aren't used by Impala. The
> improvements are:
> 1) Generate code using templates. Without this, Thrift generates inheritance
> style code which results in a virtual call to read and write every data point
> (such as an int).
> 2) Use a framed transport. The problem was that even when using the buffered
> transport, the default buffer was too small (Impala also uses the default
> buffer size). It's possible all we need to do is increase the buffer size.
> Testing should be done on a real cluster since the loopback could give very
> different results.
[jira] [Updated] (IMPALA-8759) Use double precision for HLL
[ https://issues.apache.org/jira/browse/IMPALA-8759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-8759: -- Labels: perf ramp-up (was: ) > Use double precision for HLL > > > Key: IMPALA-8759 > URL: https://issues.apache.org/jira/browse/IMPALA-8759 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Affects Versions: Impala 3.2.0 >Reporter: Peter Ebert >Priority: Major > Labels: perf, ramp-up > > For /be/src/exprs/aggregate-functions-ir.cc the finalize function uses a > float which is only capable of 6-9 digits of precision. More accurate > estimates for larger cardinalities (beyond 999,999) should be possible with > double precision. Another c++ implementation uses double as well > [https://github.com/dialtr/libcount] -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-8759) Use double precision for HLL
[ https://issues.apache.org/jira/browse/IMPALA-8759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16885351#comment-16885351 ] Tim Armstrong commented on IMPALA-8759: --- [~peter.ebert] it looks like using single precision was a deliberate decision but I don't see a justification for why it was done that way. Nong Li, the original author of that code, generally did things like that for a reason. I looked at the original code review and there's no explanation unfortunately. Maybe there was some idea that the exponentiation would be cheaper with single precision or something like that. It seems like that cost is probably negligible in the scheme of things so we could switch and just do a quick benchmark to see if there's any real difference. > Use double precision for HLL > > > Key: IMPALA-8759 > URL: https://issues.apache.org/jira/browse/IMPALA-8759 > Project: IMPALA > Issue Type: Improvement >Reporter: Peter Ebert >Priority: Major > > For /be/src/exprs/aggregate-functions-ir.cc the finalize function uses a > float which is only capable of 6-9 digits of precision. More accurate > estimates for larger cardinalities (beyond 999,999) should be possible with > double precision. Another c++ implementation uses double as well > [https://github.com/dialtr/libcount] -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
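The precision concern in IMPALA-8759 can be made concrete with a small sketch. It is not the Impala finalize code; it only round-trips values through IEEE 754 single precision using Python's standard `struct` module. The helper name `to_float32` is made up for illustration. Single precision has a 24-bit significand, so whole numbers are exact only up to 2**24 = 16,777,216, and fractional intermediate values lose digits earlier than that.

```python
import struct


def to_float32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Whole numbers up to 2**24 survive the round trip unchanged.
print(to_float32(999_999.0))       # 999999.0
print(to_float32(16_777_216.0))    # 16777216.0

# Above 2**24 neighbouring integers collapse onto the same float.
print(to_float32(16_777_217.0))    # 16777216.0
print(to_float32(123_456_789.0))   # 123456792.0
```

A double carries a 53-bit significand, so the same round trip through `"<d"` would leave all of these values unchanged. Whether that changes the final estimates in practice is what the quick benchmark suggested in the comment would have to show.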
[jira] [Updated] (IMPALA-8759) Use double precision for HLL
[ https://issues.apache.org/jira/browse/IMPALA-8759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-8759: -- Affects Version/s: Impala 3.2.0 > Use double precision for HLL > > > Key: IMPALA-8759 > URL: https://issues.apache.org/jira/browse/IMPALA-8759 > Project: IMPALA > Issue Type: Improvement >Affects Versions: Impala 3.2.0 >Reporter: Peter Ebert >Priority: Major > > For /be/src/exprs/aggregate-functions-ir.cc the finalize function uses a > float which is only capable of 6-9 digits of precision. More accurate > estimates for larger cardinalities (beyond 999,999) should be possible with > double precision. Another c++ implementation uses double as well > [https://github.com/dialtr/libcount] -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-8759) Use double precision for HLL
[ https://issues.apache.org/jira/browse/IMPALA-8759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-8759: -- Component/s: Backend > Use double precision for HLL > > > Key: IMPALA-8759 > URL: https://issues.apache.org/jira/browse/IMPALA-8759 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Affects Versions: Impala 3.2.0 >Reporter: Peter Ebert >Priority: Major > > For /be/src/exprs/aggregate-functions-ir.cc the finalize function uses a > float which is only capable of 6-9 digits of precision. More accurate > estimates for larger cardinalities (beyond 999,999) should be possible with > double precision. Another c++ implementation uses double as well > [https://github.com/dialtr/libcount] -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org