[jira] [Assigned] (IMPALA-10947) SQL support for querying Iceberg metadata
[ https://issues.apache.org/jira/browse/IMPALA-10947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boglarka Egyed reassigned IMPALA-10947: --- Assignee: Daniel Becker (was: Tamas Mate) > SQL support for querying Iceberg metadata > - > > Key: IMPALA-10947 > URL: https://issues.apache.org/jira/browse/IMPALA-10947 > Project: IMPALA > Issue Type: Epic > Components: Frontend >Reporter: Zoltán Borók-Nagy >Assignee: Daniel Becker >Priority: Major > Labels: impala-iceberg > > HIVE-25457 added support for querying Iceberg table metadata to Hive. > Hive supports the following syntax: > SELECT * FROM default.iceberg_table.history; > Spark uses the same syntax: https://iceberg.apache.org/spark-queries/#history > The following metadata tables are available in Iceberg: > * ENTRIES, > * FILES, > * HISTORY, > * SNAPSHOTS, > * MANIFESTS, > * PARTITIONS, > * ALL_DATA_FILES, > * ALL_MANIFESTS, > * ALL_ENTRIES > Impala currently only supports "DESCRIBE HISTORY". The above SELECT > syntax would be more convenient for users and more flexible, > as users could easily define filters in WHERE clauses. It would also make Impala > consistent with other engines. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Work started] (IMPALA-12860) Invoke validateDataFilesExist for RowDelta operations
[ https://issues.apache.org/jira/browse/IMPALA-12860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-12860 started by Boglarka Egyed. --- > Invoke validateDataFilesExist for RowDelta operations > - > > Key: IMPALA-12860 > URL: https://issues.apache.org/jira/browse/IMPALA-12860 > Project: IMPALA > Issue Type: Bug >Reporter: Zoltán Borók-Nagy >Assignee: Boglarka Egyed >Priority: Major > Labels: impala-iceberg > > We must invoke validateDataFilesExist for RowDelta operations > (DELETE/UPDATE/MERGE). > Without this a concurrent RewriteFiles (compaction) and RowDelta can corrupt > a table.
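The reasoning behind the check can be illustrated with a small sketch. This is not Iceberg's or Impala's implementation; the function names and the file names in the usage note are hypothetical. It only models the idea that a row-level delete references data files by path, so the commit must fail if a concurrent compaction has already replaced one of those files.

```cpp
#include <set>
#include <string>
#include <vector>

// Returns the referenced data files that are no longer live in the table.
// `live_files` stands for the table's current data files at commit time
// (hypothetical representation; real metadata is far richer).
std::vector<std::string> MissingDataFiles(
    const std::set<std::string>& live_files,
    const std::vector<std::string>& referenced_files) {
  std::vector<std::string> missing;
  for (const std::string& file : referenced_files) {
    if (live_files.count(file) == 0) missing.push_back(file);
  }
  return missing;
}

// A row-delta commit is considered safe only if every data file that its
// delete records point at still exists in the table.
bool CanCommitRowDelta(const std::set<std::string>& live_files,
                       const std::vector<std::string>& referenced_files) {
  return MissingDataFiles(live_files, referenced_files).empty();
}
```

For example, if a compaction rewrote `b.parquet` into `c.parquet` while a DELETE was preparing position deletes against `b.parquet`, `CanCommitRowDelta` returns false and the DELETE would have to be retried against the new file set.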
[jira] [Assigned] (IMPALA-12860) Invoke validateDataFilesExist for RowDelta operations
[ https://issues.apache.org/jira/browse/IMPALA-12860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boglarka Egyed reassigned IMPALA-12860: --- Assignee: Zoltán Borók-Nagy (was: Boglarka Egyed) > Invoke validateDataFilesExist for RowDelta operations > - > > Key: IMPALA-12860 > URL: https://issues.apache.org/jira/browse/IMPALA-12860 > Project: IMPALA > Issue Type: Bug >Reporter: Zoltán Borók-Nagy >Assignee: Zoltán Borók-Nagy >Priority: Major > Labels: impala-iceberg > > We must invoke validateDataFilesExist for RowDelta operations > (DELETE/UPDATE/MERGE). > Without this a concurrent RewriteFiles (compaction) and RowDelta can corrupt > a table.
[jira] [Assigned] (IMPALA-12609) Implement SHOW TABLES IN statement to list Iceberg Metadata tables
[ https://issues.apache.org/jira/browse/IMPALA-12609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boglarka Egyed reassigned IMPALA-12609: --- Assignee: Daniel Becker (was: Tamas Mate) > Implement SHOW TABLES IN statement to list Iceberg Metadata tables > -- > > Key: IMPALA-12609 > URL: https://issues.apache.org/jira/browse/IMPALA-12609 > Project: IMPALA > Issue Type: Sub-task > Components: Frontend >Affects Versions: Impala 4.4.0 >Reporter: Tamas Mate >Assignee: Daniel Becker >Priority: Minor > Labels: impala-iceberg > > {{SHOW TABLES IN}} statement could be used to list all the available metadata > tables of an Iceberg table.
[jira] [Updated] (IMPALA-7471) Impala can hit dcheck in corrupted Parquet files
[ https://issues.apache.org/jira/browse/IMPALA-7471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boglarka Egyed updated IMPALA-7471: --- Summary: Impala can hit dcheck in corrupted Parquet files (was: Impala crashes or returns incorrect results when querying parquet nested types) > Impala can hit dcheck in corrupted Parquet files > > > Key: IMPALA-7471 > URL: https://issues.apache.org/jira/browse/IMPALA-7471 > Project: IMPALA > Issue Type: Bug > Components: Backend >Reporter: Tim Armstrong >Assignee: Csaba Ringhofer >Priority: Critical > Labels: complextype, correctness, crash, parquet > Attachments: test_users_131786401297925138_0.parquet > > > From > http://community.cloudera.com/t5/Interactive-Short-cycle-SQL/Impala-bug-with-nested-arrays-of-structures-where-some-of/m-p/78507/highlight/false#M4779 > {quote}We found a case where Impala returns incorrect values from a simple > query. Our data contains a nested array of structures, and the structures contain > other structures. > We generated minimal sample data that reproduces the issue. > > SQL to create a table: > {quote} > {code} > CREATE TABLE plat_test.test_users ( > id INT, > name STRING, > devices ARRAY<STRUCT<id:STRING, device_info:STRUCT<model:STRING>>> > ) > STORED AS PARQUET > {code} > {quote} > Please put the attached parquet file in the location of the table and refresh the > table. > In the sample data we have 2 users, one with 2 devices, the second one with 3. Some > of the devices.device_info.model fields are NULL. > > When I issue a query: > {quote} > {code} > SELECT u.name, d.device_info.model as model > FROM test_users u, > u.devices d; > {code} > {quote} > I'm expecting to get 5 records in the results, but I get only one (1.png). > If I change the query to: > {quote} > {code} > SELECT u.name, d.device_info.model as model > FROM test_users u > LEFT OUTER JOIN u.devices d; > {code} > {quote} > I get two records in the results, which is still not what it should be.
> We found a workaround for this problem. If we add device.id to the result columns > we get all records from the parquet file: > {quote} > {code} > SELECT u.name, d.id, d.device_info.model as model > FROM test_users u > , u.devices d > {code} > {quote} > And the result is 3.png > > But we can't rely on this workaround, because we don't need device.id in all > queries and Impala optimizes it, and as a result we get unpredictable > results. > > I tested a Hive query on this table and it returns the expected results: > {quote} > {code} > SELECT u.name, d.device_info.model > FROM test_users u > lateral view outer inline (u.devices) d; > {code} > {quote} > results: > 4.png > Please advise whether this is a problem in the Impala engine or we made a mistake in > our query. > > Best regards, > Come2Play team. > {quote}
[jira] [Assigned] (IMPALA-1766) Misc. statistical functions
[ https://issues.apache.org/jira/browse/IMPALA-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boglarka Egyed reassigned IMPALA-1766: -- Assignee: Pranav Yogi Lodha (was: Peter Rozsa) > Misc. statistical functions > --- > > Key: IMPALA-1766 > URL: https://issues.apache.org/jira/browse/IMPALA-1766 > Project: IMPALA > Issue Type: New Feature > Components: Backend >Affects Versions: Impala 2.1.1 >Reporter: Henry Robinson >Assignee: Pranav Yogi Lodha >Priority: Minor > Labels: 2023Q1, built-in-function, ramp-up > > Some useful statistical functions for BI integration: > * {{-median()- part of IMPALA-4025}} > * {{corr()}} > * {{covar_pop()}} > * {{regr_intercept()}} > * {{regr_slope()}} > * {{regr_r2()}} (see [http://psoug.org/definition/REGR_R2.htm])
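As a rough illustration of what the listed functions compute, the sketch below derives them from six running sums over (x, y) pairs, using the usual population definitions. It is not Impala's implementation; `PairSums` and the free functions are hypothetical names, and a real aggregate would also have to return NULL for empty input or zero variance.

```cpp
#include <cassert>
#include <cmath>

// Running sums over (x, y) pairs; enough state for all functions below.
struct PairSums {
  double n = 0, sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
  void Add(double x, double y) {
    n += 1; sx += x; sy += y;
    sxx += x * x; syy += y * y; sxy += x * y;
  }
};

// Population covariance: E[xy] - E[x]E[y].
double CovarPop(const PairSums& s) {
  assert(s.n > 0);  // this sketch does not model the NULL result for empty input
  return (s.sxy - s.sx * s.sy / s.n) / s.n;
}

// Population variances of x and y.
double VarPopX(const PairSums& s) { return (s.sxx - s.sx * s.sx / s.n) / s.n; }
double VarPopY(const PairSums& s) { return (s.syy - s.sy * s.sy / s.n) / s.n; }

// Pearson correlation coefficient.
double Corr(const PairSums& s) {
  return CovarPop(s) / std::sqrt(VarPopX(s) * VarPopY(s));
}

// Least-squares line y = intercept + slope * x.
double RegrSlope(const PairSums& s) { return CovarPop(s) / VarPopX(s); }
double RegrIntercept(const PairSums& s) {
  return s.sy / s.n - RegrSlope(s) * s.sx / s.n;
}

// Coefficient of determination: the square of the correlation.
double RegrR2(const PairSums& s) {
  const double c = Corr(s);
  return c * c;
}
```

The sum-based formulas keep the state small and mergeable across nodes, but they can lose precision on large, nearly constant inputs; a numerically stable update (for example Welford-style) would be a reasonable alternative.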
[jira] [Created] (IMPALA-9594) Implement percentile function
Boglarka Egyed created IMPALA-9594: -- Summary: Implement percentile function Key: IMPALA-9594 URL: https://issues.apache.org/jira/browse/IMPALA-9594 Project: IMPALA Issue Type: New Feature Reporter: Boglarka Egyed Implement the percentile function from the DataSketches library in C++.
[jira] [Created] (IMPALA-9593) Implement count(distinct) function
Boglarka Egyed created IMPALA-9593: -- Summary: Implement count(distinct) function Key: IMPALA-9593 URL: https://issues.apache.org/jira/browse/IMPALA-9593 Project: IMPALA Issue Type: New Feature Reporter: Boglarka Egyed Implement the count(distinct) function from the DataSketches library for HLL in C++.
[jira] [Updated] (IMPALA-9592) DataSketches support
[ https://issues.apache.org/jira/browse/IMPALA-9592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boglarka Egyed updated IMPALA-9592: --- Description: The goal is to integrate with the [DataSketches|https://datasketches.apache.org/] library more closely to utilize its estimation algorithms for BI acceleration purposes. (was: The goal is to integrate with the [DataSketches|[https://datasketches.apache.org/]] library more closely to utilize its estimation algorithms for BI acceleration purposes.) > DataSketches support > > > Key: IMPALA-9592 > URL: https://issues.apache.org/jira/browse/IMPALA-9592 > Project: IMPALA > Issue Type: Epic >Reporter: Boglarka Egyed >Priority: Major > > The goal is to integrate with the > [DataSketches|https://datasketches.apache.org/] library more closely to > utilize its estimation algorithms for BI acceleration purposes.
[jira] [Created] (IMPALA-9592) DataSketches support
Boglarka Egyed created IMPALA-9592: -- Summary: DataSketches support Key: IMPALA-9592 URL: https://issues.apache.org/jira/browse/IMPALA-9592 Project: IMPALA Issue Type: Epic Reporter: Boglarka Egyed The goal is to integrate with the [DataSketches|[https://datasketches.apache.org/]] library more closely to utilize its estimation algorithms for BI acceleration purposes.
[jira] [Assigned] (IMPALA-9499) Display support for all complex types in a SELECT * query
[ https://issues.apache.org/jira/browse/IMPALA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boglarka Egyed reassigned IMPALA-9499: -- Assignee: Adam Tamas > Display support for all complex types in a SELECT * query > - > > Key: IMPALA-9499 > URL: https://issues.apache.org/jira/browse/IMPALA-9499 > Project: IMPALA > Issue Type: New Feature >Reporter: Gabor Kaszab >Assignee: Adam Tamas >Priority: Major > Labels: complextype > > Covers all complex types (Struct, Array, Map) for both Parquet and ORC file > formats.
[jira] [Assigned] (IMPALA-9557) Implement to_json() for complex types
[ https://issues.apache.org/jira/browse/IMPALA-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boglarka Egyed reassigned IMPALA-9557: -- Assignee: Tamas Mate > Implement to_json() for complex types > - > > Key: IMPALA-9557 > URL: https://issues.apache.org/jira/browse/IMPALA-9557 > Project: IMPALA > Issue Type: New Feature >Reporter: Gabor Kaszab >Assignee: Tamas Mate >Priority: Major > Labels: complextype > > This built-in function should accept a complex type as its parameter and return a > string containing that particular complex type in JSON format. Check Hive for > what each complex type looks like in JSON.
[jira] [Assigned] (IMPALA-9277) Crash due to unhandled exception thrown from orc::ColumnSelector::updateSelectedByTypeId
[ https://issues.apache.org/jira/browse/IMPALA-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boglarka Egyed reassigned IMPALA-9277: -- Assignee: Zoltán Borók-Nagy > Crash due to unhandled exception thrown from > orc::ColumnSelector::updateSelectedByTypeId > > > Key: IMPALA-9277 > URL: https://issues.apache.org/jira/browse/IMPALA-9277 > Project: IMPALA > Issue Type: Bug >Reporter: Quanlong Huang >Assignee: Zoltán Borók-Nagy >Priority: Blocker > Attachments: copy7_nullable.orc > > > Build latest Impala with latest ORC lib and run test_fuzz_scanner for ORC > format: > * Impala git hash: 497a17dbdc0669abd47c2360b8ca94de8b54d413 > * ORC git hash: c26ff4c351d7c34c4272442a6874703f510282a8 > Found the crash: > {code:java} > Operating system: Linux > 0.0.0 Linux 4.15.0-72-generic #81~16.04.1-Ubuntu SMP Tue > Nov 26 16:34:21 UTC 2019 x86_64 > CPU: amd64 > family 6 model 158 stepping 10 > 1 CPU > GPU: UNKNOWN > Crash reason: SIGABRT > Crash address: 0x3e848f0 > Process uptime: not available > Thread 319 (crashed) > 0 libc-2.23.so + 0x35428 > 1 libc-2.23.so + 0x3702a > 2 impalad!_fini + 0x15bae90 > 3 libc-2.23.so + 0x79242 > 4 libc-2.23.so + 0x79242 > 5 libstdc++.so.6.0.21 + 0x8c880 > 6 libstdc++.so.6.0.21 + 0x8f84d > 7 impalad!_fini + 0x15baeb0 > 8 impalad + 0x4b984e0 > 9 libstdc++.so.6.0.21 + 0x8d6b6 > 10 libstdc++.so.6.0.21 + 0x8d701 > 11 libstdc++.so.6.0.21 + 0x8d919 > 12 impalad!orc::ColumnSelector::updateSelectedByTypeId(std::vector std::allocator >&, unsigned long) [Reader.cc : 166 + 0x12] > 13 impalad!orc::ColumnSelector::updateSelected(std::vector std::allocator >&, orc::RowReaderOptions const&) [Reader.cc : 136 + 0xf] > 14 > impalad!orc::RowReaderImpl::RowReaderImpl(std::shared_ptr, > orc::RowReaderOptions const&) [Reader.cc : 229 + 0x11] > 15 impalad!orc::ReaderImpl::createRowReader(orc::RowReaderOptions const&) > const [Reader.cc : 725 + 0x1b] > 16 impalad!impala::HdfsOrcScanner::Open(impala::ScannerContext*) > [hdfs-orc-scanner.cc : 198 
+ 0x3c] > 17 > impalad!impala::HdfsScanNodeBase::CreateAndOpenScannerHelper(impala::HdfsPartitionDescriptor*, > impala::ScannerContext*, boost::scoped_ptr*) > [hdfs-scan-node-base.cc : 819 + 0x29] > 18 > impalad!impala::HdfsScanNode::ProcessSplit(std::vector std::allocator > const&, impala::MemPool*, > impala::io::ScanRange*, long*) [hdfs-scan-node.cc : 494 + 0x2b] > 19 impalad!impala::HdfsScanNode::ScannerThread(bool, long) > [hdfs-scan-node.cc : 416 + 0x2a] > 20 > impalad!impala::HdfsScanNode::ThreadTokenAvailableCb(impala::ThreadResourcePool*)::{lambda()#1}::operator()() > const + 0x30 > 21 > impalad!boost::detail::function::void_function_obj_invoker0, > void>::invoke [function_template.hpp : 153 + 0xc] > 22 impalad!boost::function0::operator()() const [function_template.hpp > : 767 + 0x11] > 23 impalad!impala::Thread::SuperviseThread(std::string const&, std::string > const&, boost::function, impala::ThreadDebugInfo const*, > impala::Promise*) [thread.cc : 360 + 0xf] > 24 impalad!void boost::_bi::list5, > boost::_bi::value, boost::_bi::value >, > boost::_bi::value, > boost::_bi::value*> > >::operator() boost::function, impala::ThreadDebugInfo const*, > impala::Promise*), > boost::_bi::list0>(boost::_bi::type, void (*&)(std::string const&, > std::string const&, boost::function, impala::ThreadDebugInfo const*, > impala::Promise*), boost::_bi::list0&, int) > [bind.hpp : 525 + 0x15] > 25 impalad!boost::_bi::bind_t const&, boost::function, impala::ThreadDebugInfo const*, > impala::Promise*), > boost::_bi::list5, > boost::_bi::value, boost::_bi::value >, > boost::_bi::value, > boost::_bi::value*> > > >::operator()() [bind_template.hpp : 20 + 0x22] > 26 impalad!boost::detail::thread_data (*)(std::string const&, std::string const&, boost::function, > impala::ThreadDebugInfo const*, impala::Promise (impala::PromiseMode)0>*), boost::_bi::list5, > boost::_bi::value, boost::_bi::value >, > boost::_bi::value, > boost::_bi::value*> > > > >::run() [thread.hpp : 116 + 0x12] > 
27 impalad!thread_proxy + 0xda > 28 libpthread-2.23.so + 0x76ba > 29 libc-2.23.so + 0x10741d > {code} > Code snippet for orc Reader.cc:166 > {code:c++} > 158 void ColumnSelector::updateSelectedByTypeId(std::vector<bool>& > selectedColumns, uint64_t typeId) { > 159 if (typeId < selectedColumns.size()) { > 160 const Type& type = *idTypeMap[typeId]; > 161 selectChildren(selectedColumns, type); > 162 } else { > 163 std::stringstream buffer; > 164 buffer << "Invalid type id selected " << typeId << " out of " > 165 << selectedColumns.size(); > 166 throw ParseError(buffer.str()); > 167 } > 168 } > {code}
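One way to read the stack trace: the exception thrown at Reader.cc:166 is never caught on the scanner side, so it reaches the runtime's terminate handler and aborts the process. The sketch below shows the general pattern of catching at the scanner boundary and turning the exception into an error status. It is only an illustration under stated assumptions: `ParseError` here is a local stand-in for the ORC exception, and `ScanStatus` and `OpenScanner` are hypothetical names, not Impala's actual classes or the actual fix.

```cpp
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Local stand-in for the ORC library's ParseError (assumption for this sketch).
struct ParseError : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Hypothetical status type; not Impala's Status class.
struct ScanStatus {
  bool ok;
  std::string msg;
};

// Mirrors the bounds check in the snippet above: an out-of-range type id throws.
void UpdateSelectedByTypeId(std::vector<bool>& selected, uint64_t type_id) {
  if (type_id < selected.size()) {
    selected[type_id] = true;
  } else {
    std::ostringstream buffer;
    buffer << "Invalid type id selected " << type_id << " out of "
           << selected.size();
    throw ParseError(buffer.str());
  }
}

// Catches library exceptions at the scanner boundary so that a corrupt file
// fails the query with an error instead of aborting the whole process.
ScanStatus OpenScanner(std::vector<bool>& selected, uint64_t type_id) {
  try {
    UpdateSelectedByTypeId(selected, type_id);
  } catch (const std::exception& e) {
    return ScanStatus{false, std::string("ORC error: ") + e.what()};
  }
  return ScanStatus{true, ""};
}
```

With three selectable columns, `OpenScanner(selected, 7)` returns a failed status carrying the library's message, while a valid id marks the column as selected and returns success.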
[jira] [Assigned] (IMPALA-9042) Support reading full-ACID ORC tables
[ https://issues.apache.org/jira/browse/IMPALA-9042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boglarka Egyed reassigned IMPALA-9042: -- Assignee: Zoltán Borók-Nagy > Support reading full-ACID ORC tables > > > Key: IMPALA-9042 > URL: https://issues.apache.org/jira/browse/IMPALA-9042 > Project: IMPALA > Issue Type: New Feature >Reporter: Quanlong Huang >Assignee: Zoltán Borók-Nagy >Priority: Major
[jira] [Work started] (IMPALA-8943) When USE_CDP_HIVE=true, Impala should use the CDP version of Kudu
[ https://issues.apache.org/jira/browse/IMPALA-8943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-8943 started by Boglarka Egyed. -- > When USE_CDP_HIVE=true, Impala should use the CDP version of Kudu > - > > Key: IMPALA-8943 > URL: https://issues.apache.org/jira/browse/IMPALA-8943 > Project: IMPALA > Issue Type: Bug > Components: Infrastructure >Affects Versions: Impala 3.4.0 >Reporter: Joe McDonnell >Assignee: Boglarka Egyed >Priority: Critical > > Currently, Impala uses the version of Kudu that comes with the > CDH_BUILD_NUMBER, even when USE_CDP_HIVE=true. This is incorrect. The > USE_CDP_HIVE=true build of Impala should use the Kudu version from the > CDP_BUILD_NUMBER. > To avoid any cross-version issues, this Kudu will need to be built using the > native toolchain.
[jira] [Assigned] (IMPALA-8943) When USE_CDP_HIVE=true, Impala should use the CDP version of Kudu
[ https://issues.apache.org/jira/browse/IMPALA-8943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boglarka Egyed reassigned IMPALA-8943: -- Assignee: Attila Jeges (was: Boglarka Egyed) > When USE_CDP_HIVE=true, Impala should use the CDP version of Kudu > - > > Key: IMPALA-8943 > URL: https://issues.apache.org/jira/browse/IMPALA-8943 > Project: IMPALA > Issue Type: Bug > Components: Infrastructure >Affects Versions: Impala 3.4.0 >Reporter: Joe McDonnell >Assignee: Attila Jeges >Priority: Critical > > Currently, Impala uses the version of Kudu that comes with the > CDH_BUILD_NUMBER, even when USE_CDP_HIVE=true. This is incorrect. The > USE_CDP_HIVE=true build of Impala should use the Kudu version from the > CDP_BUILD_NUMBER. > To avoid any cross-version issues, this Kudu will need to be built using the > native toolchain.
[jira] [Assigned] (IMPALA-9175) Revisit the error handling logics in ORC scanner
[ https://issues.apache.org/jira/browse/IMPALA-9175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boglarka Egyed reassigned IMPALA-9175: -- Assignee: Norbert Luksa > Revisit the error handling logics in ORC scanner > > > Key: IMPALA-9175 > URL: https://issues.apache.org/jira/browse/IMPALA-9175 > Project: IMPALA > Issue Type: Task >Reporter: Quanlong Huang >Assignee: Norbert Luksa >Priority: Blocker
[jira] [Assigned] (IMPALA-8184) Add timestamp validation to Orc scanner
[ https://issues.apache.org/jira/browse/IMPALA-8184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boglarka Egyed reassigned IMPALA-8184: -- Assignee: Csaba Ringhofer > Add timestamp validation to Orc scanner > --- > > Key: IMPALA-8184 > URL: https://issues.apache.org/jira/browse/IMPALA-8184 > Project: IMPALA > Issue Type: Bug > Components: Backend >Reporter: Csaba Ringhofer >Assignee: Csaba Ringhofer >Priority: Critical > > Similarly to Parquet, Orc can also contain timestamps that are not valid in > Impala, e.g. Hive can insert timestamps before 1400 while these are invalid > in Impala. These invalid timestamps are often handled similarly to NULL, but > are actually not "real" NULLs, which can lead to some weird behavior: > Hive: > create table orcts (ts timestamp) stored as orc; > insert into orcts values ("1200-01-01"); > Impala: > select * from orcts where ts is not null; > Returns 1 row: > NULL
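The behavior above suggests that range validation has to happen when the value is read, so that an out-of-range timestamp becomes a genuine NULL before any predicate sees it. The sketch below models that idea only by year. The names `TimestampSlot`, `ValidateTimestamp` and `PassesIsNotNull` are hypothetical, and the upper bound of 9999 is an assumption taken from Impala's documented TIMESTAMP range rather than from this issue.

```cpp
// Hypothetical decoded slot; not Impala's actual TimestampValue class.
struct TimestampSlot {
  bool is_null;
  int year;
};

const int kMinYear = 1400;  // lower bound mentioned in the issue
const int kMaxYear = 9999;  // assumed upper bound (Impala TIMESTAMP docs)

// Converts a value read from the file into a slot, mapping out-of-range
// years to a real NULL instead of an invalid non-NULL value.
TimestampSlot ValidateTimestamp(bool file_says_null, int year) {
  if (file_says_null || year < kMinYear || year > kMaxYear) {
    return TimestampSlot{true, 0};
  }
  return TimestampSlot{false, year};
}

// Models the `ts IS NOT NULL` predicate from the example query.
bool PassesIsNotNull(const TimestampSlot& ts) { return !ts.is_null; }
```

With this ordering, the row inserted by Hive with year 1200 is filtered out by `ts IS NOT NULL` instead of being returned as a NULL-looking row.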
[jira] [Assigned] (IMPALA-7730) Improve ORC File Format Timezone issues
[ https://issues.apache.org/jira/browse/IMPALA-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boglarka Egyed reassigned IMPALA-7730: -- Assignee: Csaba Ringhofer > Improve ORC File Format Timezone issues > --- > > Key: IMPALA-7730 > URL: https://issues.apache.org/jira/browse/IMPALA-7730 > Project: IMPALA > Issue Type: Task > Components: Backend >Affects Versions: Impala 3.0 >Reporter: Philip Martin >Assignee: Csaba Ringhofer >Priority: Major > Attachments: orc.zip > > > As pointed out in https://gerrit.cloudera.org/#/c/11731 by [~csringhofer], > our support for the ORC file format doesn't follow the same timezone > conventions as the rest of Impala. > {quote} > tldr: ORC's timezone handling is likely to be broken in Impala so we should > patch it in the toolchain > The ORC library implements its own IANA timezone handling to convert stored > timestamps from UTC to local time + do something similar for min/max stats. > The writer's timezone can be also stored in .orc files and used instead of > local timezone. > Impala's and ORC library's timezone can be different because of several > reasons: > ORC's timezone is not overridden by env var TZ and query option timezone > ORC uses a simpler way to detect the local timezone which may not work on > some Linux distros (see TimezoneDatabase::LocalZoneName in Impala vs > LOCAL_TIMEZONE in Orc) > .orc files can use any time zone as writer's timezone and we cannot be sure > that it will exist on the reader machine > My suggestion is to patch the ORC library in the toolchain and remove > timezone handling (e.g. by always using UTC, maybe depending on a flag), as > the way it is currently working is likely to be broken and is surely not > consistent with the rest of Impala. > I am not sure how timezones could be handled correctly in Orc + Impala. If > someone plans to work on it, I would gladly help in the integration to Impala. 
> {quote}
[jira] [Assigned] (IMPALA-8801) Add DATE type support to ORC scanner
[ https://issues.apache.org/jira/browse/IMPALA-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boglarka Egyed reassigned IMPALA-8801: -- Assignee: Gabor Kaszab (was: Quanlong Huang) > Add DATE type support to ORC scanner > > > Key: IMPALA-8801 > URL: https://issues.apache.org/jira/browse/IMPALA-8801 > Project: IMPALA > Issue Type: Sub-task >Reporter: Attila Jeges >Assignee: Gabor Kaszab >Priority: Critical
[jira] [Assigned] (IMPALA-9130) Upgrade external non-ACID table to ACID from Impala
[ https://issues.apache.org/jira/browse/IMPALA-9130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boglarka Egyed reassigned IMPALA-9130: -- Assignee: Csaba Ringhofer > Upgrade external non-ACID table to ACID from Impala > --- > > Key: IMPALA-9130 > URL: https://issues.apache.org/jira/browse/IMPALA-9130 > Project: IMPALA > Issue Type: Bug > Components: Catalog, Frontend >Affects Versions: Impala 3.3.0 >Reporter: Gabor Kaszab >Assignee: Csaba Ringhofer >Priority: Major > Labels: impala-acid > > If you have an external, non-ACID table and try to upgrade it to an > ACID table, you get an error message saying that an external table is not allowed to > be promoted to ACID. This is fine; however, if you also set > 'EXTERNAL' = 'FALSE' in the table properties in the very same step, you still get the same error, while > Hive is able to execute it. > Steps to repro: > 1) Create a non-ACID external table. (or a single non-ACID table if you use > Hive that contains HIVE-22158) > 2) Upgrade the table > {code:java} > alter table tbl set tblproperties ('transactional'='true', > 'transactional_properties'='insert_only', 'EXTERNAL'='FALSE'); > {code} > Step 2) fails in Impala but succeeds in Hive
[jira] [Commented] (IMPALA-8648) Impala ACID read stress tests
[ https://issues.apache.org/jira/browse/IMPALA-8648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963017#comment-16963017 ] Boglarka Egyed commented on IMPALA-8648: [https://gerrit.cloudera.org/#/c/1/] > Impala ACID read stress tests > - > > Key: IMPALA-8648 > URL: https://issues.apache.org/jira/browse/IMPALA-8648 > Project: IMPALA > Issue Type: Test >Reporter: Dinesh Garg >Assignee: Zoltán Borók-Nagy >Priority: Critical > Labels: impala-acid
[jira] [Assigned] (IMPALA-8631) Ensure that cached data is always up to date to avoid reads based on stale metadata for transactional read only tables
[ https://issues.apache.org/jira/browse/IMPALA-8631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boglarka Egyed reassigned IMPALA-8631: -- Assignee: Gabor Kaszab (was: Boglarka Egyed) > Ensure that cached data is always up to date to avoid reads based on stale > metadata for transactional read only tables > --- > > Key: IMPALA-8631 > URL: https://issues.apache.org/jira/browse/IMPALA-8631 > Project: IMPALA > Issue Type: Improvement >Reporter: Dinesh Garg >Assignee: Gabor Kaszab >Priority: Major > Labels: impala-acid > > Acquire latest validWriteIdList in the coordinator and validate that the > cached data is up to date. Automatically force refresh with query if it’s not.
[jira] [Assigned] (IMPALA-8631) Ensure that cached data is always up to date to avoid reads based on stale metadata for transactional read only tables
[ https://issues.apache.org/jira/browse/IMPALA-8631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boglarka Egyed reassigned IMPALA-8631: -- Assignee: Boglarka Egyed (was: Gabor Kaszab) > Ensure that cached data is always up to date to avoid reads based on stale > metadata for transactional read only tables > --- > > Key: IMPALA-8631 > URL: https://issues.apache.org/jira/browse/IMPALA-8631 > Project: IMPALA > Issue Type: Improvement >Reporter: Dinesh Garg >Assignee: Boglarka Egyed >Priority: Major > Labels: impala-acid > > Acquire latest validWriteIdList in the coordinator and validate that the > cached data is up to date. Automatically force refresh with query if it’s not.