[
https://issues.apache.org/jira/browse/IMPALA-7085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Csaba Ringhofer updated IMPALA-7085:
------------------------------------
Labels: Timestamp performance (was: Timestamp perfomance)
> Consider patching Google/CCTZ for Impala's needs
> ------------------------------------------------
>
> Key: IMPALA-7085
> URL: https://issues.apache.org/jira/browse/IMPALA-7085
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Reporter: Csaba Ringhofer
> Priority: Major
> Labels: Timestamp, performance
>
> Google/CCTZ was chosen as Impala's new timezone database in IMPALA-3307 and
> benchmarks have shown significant improvements compared to the previous libc
> / boost solution (see convert-timestamp-benchmark.cc in
> https://gerrit.cloudera.org/#/c/9986/). If further optimizations are needed,
> it may become necessary to modify Google/CCTZ to serve Impala's needs with
> less overhead.
> I see 3 points where avoidable overhead is added to UTC<->local conversions:
> 1. Timezone conversions use atomic integer hints that speed up rule lookup if
> the timestamp is close to the one in the previous call (see
> https://github.com/google/cctz/blob/a2dd3d0fbc811fe0a1d4d2dbb0341f1a3d28cb2a/src/time_zone_info.cc#L828
> ). This may be detrimental if the function is called from different threads
> with timestamps that are far apart, because the threads keep overwriting each
> other's hints. As the class that contains the hints also contains the vector
> of rule transitions (which can be large), creating a copy for every thread
> would be expensive.
> The solution would be to move the hints to a new class whose instances would
> not be shared between threads.
> 2. CCTZ handles local time in civil_second, which contains
> year/month/day/hour/minute/second in separate fields, while Impala stores
> timestamps as "days since some epoch" (boost::gregorian::date) plus
> "nanoseconds since midnight" (boost::posix_time::time_duration). A modified
> version of TimeZoneInfo::MakeTime() could skip this back-and-forth conversion.
> 3. Timezone conversions use a virtual function, while the backing class is
> always TimeZoneInfo (see
> https://github.com/google/cctz/blob/master/src/time_zone_info.h).
> This could probably be avoided without modifying CCTZ by including some of
> its internal headers from https://github.com/google/cctz/tree/master/src.
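
The split proposed in point 1 could look roughly like the sketch below. It is not CCTZ code: Transition, SharedZoneRules and ZoneLookupHint are hypothetical names, and the rule table is reduced to a sorted list of (instant, UTC offset) pairs. The large, immutable table is shared by all threads, while the small hint object is meant to be owned by a single thread, so it needs no atomics.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for one entry of a zone's rule table.
struct Transition {
  std::int64_t unix_time;   // first instant at which utc_offset applies
  std::int32_t utc_offset;  // seconds east of UTC
};

// Immutable after construction, so a single instance can be shared by all
// threads without synchronization.
class SharedZoneRules {
 public:
  explicit SharedZoneRules(std::vector<Transition> transitions)
      : transitions_(std::move(transitions)) {
    assert(!transitions_.empty());  // assumption: at least one rule exists
  }
  const std::vector<Transition>& transitions() const { return transitions_; }

 private:
  std::vector<Transition> transitions_;  // assumed sorted by unix_time
};

// Small object holding only the lookup hint. Each thread owns its own
// instance, so the hint is a plain integer rather than an atomic.
class ZoneLookupHint {
 public:
  std::int32_t OffsetAt(const SharedZoneRules& rules, std::int64_t unix_time) {
    const std::vector<Transition>& tr = rules.transitions();
    // Fast path: the timestamp falls into the interval found last time.
    if (hint_ < tr.size() && tr[hint_].unix_time <= unix_time &&
        (hint_ + 1 == tr.size() || unix_time < tr[hint_ + 1].unix_time)) {
      ++hits_;
      return tr[hint_].utc_offset;
    }
    // Slow path: binary search for the last transition <= unix_time.
    // Timestamps before the first transition fall back to the first entry.
    auto it = std::upper_bound(
        tr.begin(), tr.end(), unix_time,
        [](std::int64_t t, const Transition& x) { return t < x.unix_time; });
    hint_ = (it == tr.begin())
                ? 0
                : static_cast<std::size_t>(it - tr.begin()) - 1;
    return tr[hint_].utc_offset;
  }

  // Number of lookups answered by the fast path (for illustration only).
  std::size_t hits() const { return hits_; }

 private:
  std::size_t hint_ = 0;
  std::size_t hits_ = 0;
};
```

A caller could keep one ZoneLookupHint per thread (for example as a thread_local variable) and pass the shared rules object to every lookup. Whether this actually beats the existing atomic hints would have to be measured, for example with convert-timestamp-benchmark.cc.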