[jira] [Assigned] (IMPALA-10947) SQL support for querying Iceberg metadata

2024-05-09 Thread Boglarka Egyed (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boglarka Egyed reassigned IMPALA-10947:
---

Assignee: Daniel Becker  (was: Tamas Mate)

> SQL support for querying Iceberg metadata
> -
>
> Key: IMPALA-10947
> URL: https://issues.apache.org/jira/browse/IMPALA-10947
> Project: IMPALA
>  Issue Type: Epic
>  Components: Frontend
>Reporter: Zoltán Borók-Nagy
>Assignee: Daniel Becker
>Priority: Major
>  Labels: impala-iceberg
>
> HIVE-25457 added support for querying Iceberg table metadata to Hive.
> They support the following syntax:
> SELECT * FROM default.iceberg_table.history;
> Spark uses the same syntax: https://iceberg.apache.org/spark-queries/#history
> The following metadata tables are available in Iceberg:
> * ENTRIES,
> * FILES,
> * HISTORY,
> * SNAPSHOTS,
> * MANIFESTS,
> * PARTITIONS,
> * ALL_DATA_FILES,
> * ALL_MANIFESTS,
> * ALL_ENTRIES
> Impala currently only supports "DESCRIBE HISTORY ". The above SELECT 
> syntax would be more convenient for users and more flexible, since users 
> could easily define filters in WHERE clauses. It would also keep Impala 
> consistent with other engines.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Work started] (IMPALA-12860) Invoke validateDataFilesExist for RowDelta operations

2024-03-04 Thread Boglarka Egyed (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-12860 started by Boglarka Egyed.
---
> Invoke validateDataFilesExist for RowDelta operations
> -
>
> Key: IMPALA-12860
> URL: https://issues.apache.org/jira/browse/IMPALA-12860
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Boglarka Egyed
>Priority: Major
>  Labels: impala-iceberg
>
> We must invoke validateDataFilesExist for RowDelta operations 
> (DELETE/UPDATE/MERGE).
> Without this a concurrent RewriteFiles (compaction) and RowDelta can corrupt 
> a table.
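
The check described above can be illustrated with a small, language-neutral sketch. Iceberg's real API is Java; the C++ below only models the idea, and every name in it (FindMissingDataFiles, CanCommitRowDelta, the file paths) is hypothetical. The assumption is that a row-level delete records the data files it refers to, and that the commit must be rejected if a concurrent compaction has already removed any of them.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: returns the referenced data files that are no longer
// live in the current table snapshot.
std::vector<std::string> FindMissingDataFiles(
    const std::unordered_set<std::string>& live_data_files,
    const std::vector<std::string>& referenced_data_files) {
  std::vector<std::string> missing;
  for (const std::string& path : referenced_data_files) {
    if (live_data_files.count(path) == 0) missing.push_back(path);
  }
  return missing;
}

// Hypothetical commit-time check: a row delta (DELETE/UPDATE/MERGE) may only
// be committed if every data file it references still exists.
bool CanCommitRowDelta(const std::unordered_set<std::string>& live_data_files,
                       const std::vector<std::string>& referenced_data_files) {
  return FindMissingDataFiles(live_data_files, referenced_data_files).empty();
}
```

If a compaction rewrote a.parquet into c.parquet between the start of the DELETE and its commit, CanCommitRowDelta returns false for a delete that still references a.parquet, so the operation can be retried instead of committing deletes that point at a file that no longer exists.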






[jira] [Assigned] (IMPALA-12860) Invoke validateDataFilesExist for RowDelta operations

2024-03-04 Thread Boglarka Egyed (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boglarka Egyed reassigned IMPALA-12860:
---

Assignee: Zoltán Borók-Nagy  (was: Boglarka Egyed)

> Invoke validateDataFilesExist for RowDelta operations
> -
>
> Key: IMPALA-12860
> URL: https://issues.apache.org/jira/browse/IMPALA-12860
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-iceberg
>
> We must invoke validateDataFilesExist for RowDelta operations 
> (DELETE/UPDATE/MERGE).
> Without this a concurrent RewriteFiles (compaction) and RowDelta can corrupt 
> a table.






[jira] [Assigned] (IMPALA-12609) Implement SHOW TABLES IN statement to list Iceberg Metadata tables

2024-01-08 Thread Boglarka Egyed (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boglarka Egyed reassigned IMPALA-12609:
---

Assignee: Daniel Becker  (was: Tamas Mate)

> Implement SHOW TABLES IN statement to list Iceberg Metadata tables
> --
>
> Key: IMPALA-12609
> URL: https://issues.apache.org/jira/browse/IMPALA-12609
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Frontend
>Affects Versions: Impala 4.4.0
>Reporter: Tamas Mate
>Assignee: Daniel Becker
>Priority: Minor
>  Labels: impala-iceberg
>
> The {{SHOW TABLES IN}} statement could be used to list all the available 
> metadata tables of an Iceberg table.






[jira] [Updated] (IMPALA-7471) Impala can hit dcheck in corrupted Parquet files

2023-04-06 Thread Boglarka Egyed (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boglarka Egyed updated IMPALA-7471:
---
Summary: Impala can hit dcheck in corrupted Parquet files  (was: Impala 
crashes or returns incorrect results when querying parquet nested types)

> Impala can hit dcheck in corrupted Parquet files
> 
>
> Key: IMPALA-7471
> URL: https://issues.apache.org/jira/browse/IMPALA-7471
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Csaba Ringhofer
>Priority: Critical
>  Labels: complextype, correctness, crash, parquet
> Attachments: test_users_131786401297925138_0.parquet
>
>
> From 
> http://community.cloudera.com/t5/Interactive-Short-cycle-SQL/Impala-bug-with-nested-arrays-of-structures-where-some-of/m-p/78507/highlight/false#M4779
> {quote}We found a case where Impala returns incorrect values from a simple 
> query. Our data contains a nested array of structures, and the structures 
> contain other structures.
> We generated minimal sample data that reproduces the issue.
>  
> SQL to create a table:
> {quote}
> {code}
> CREATE TABLE plat_test.test_users (
>   id INT,
>   name STRING,   
>   devices ARRAY<
> STRUCT<
>   id:STRING,
>   device_info:STRUCT<
> model:STRING
>   >
> >
>   >
> )
> STORED AS PARQUET
> {code}
> {quote}
> Please put the attached parquet file in the location of the table and refresh 
> the table.
> In the sample data we have 2 users, one with 2 devices and the second with 3. 
> Some of the devices.device_info.model fields are NULL.
>  
> When I issue a query:
> {quote}
> {code}
> SELECT u.name, d.device_info.model as model
> FROM test_users u,
> u.devices d;
> {code}
>  {quote}
> I'm expecting to get 5 records in the results, but I'm getting only one: 1.png
> If I change the query to:
>  {quote}
> {code}
> SELECT u.name, d.device_info.model as model
> FROM test_users u
> LEFT OUTER JOIN u.devices d;
>  {code}
> {quote}
> I'm getting two records in the results, which is still not what it should be.
> We found a workaround for this problem. If we add device.id to the result 
> columns, we get all records from the parquet file:
> {quote}
> {code}
> SELECT u.name, d.id, d.device_info.model as model
> FROM test_users u
> , u.devices d
>  {code}
> {quote}
> And result is 3.png
>  
> But we can't rely on this workaround, because we don't need device.id in all 
> queries and Impala optimizes it, and as a result we are getting unpredictable 
> results.
>  
> I tested Hive query on this table and it returns expected results:
> {quote}
> {code}
> SELECT u.name, d.device_info.model
> FROM test_users u
> lateral view outer inline (u.devices) d;
>  {code}
> {quote}
> results:
> 4.png
> Please advise whether this is a problem in the Impala engine or whether we 
> made a mistake in our query.
>  
> Best regards,
> Come2Play team.
> {quote}






[jira] [Assigned] (IMPALA-1766) Misc. statistical functions

2023-01-31 Thread Boglarka Egyed (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boglarka Egyed reassigned IMPALA-1766:
--

Assignee: Pranav Yogi Lodha  (was: Peter Rozsa)

> Misc. statistical functions
> ---
>
> Key: IMPALA-1766
> URL: https://issues.apache.org/jira/browse/IMPALA-1766
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend
>Affects Versions: Impala 2.1.1
>Reporter: Henry Robinson
>Assignee: Pranav Yogi Lodha
>Priority: Minor
>  Labels: 2023Q1, built-in-function, ramp-up
>
> Some useful statistical functions for BI integration:
>  * {{-median()- part of IMPALA-4025}}
>  * {{corr()}}
>  * {{covar_pop()}}
>  * {{regr_intercept()}}
>  * {{regr_slope()}}
>  * {{regr_r2()}} (see [http://psoug.org/definition/REGR_R2.htm])






[jira] [Created] (IMPALA-9594) Implement percentile function

2020-04-02 Thread Boglarka Egyed (Jira)
Boglarka Egyed created IMPALA-9594:
--

 Summary: Implement percentile function
 Key: IMPALA-9594
 URL: https://issues.apache.org/jira/browse/IMPALA-9594
 Project: IMPALA
  Issue Type: New Feature
Reporter: Boglarka Egyed


Implement the percentile function from the DataSketches library in C++.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-9593) Implement count(distinct) function

2020-04-02 Thread Boglarka Egyed (Jira)
Boglarka Egyed created IMPALA-9593:
--

 Summary: Implement count(distinct) function
 Key: IMPALA-9593
 URL: https://issues.apache.org/jira/browse/IMPALA-9593
 Project: IMPALA
  Issue Type: New Feature
Reporter: Boglarka Egyed


Implement the count(distinct) function from the DataSketches library for HLL in 
C++.






[jira] [Updated] (IMPALA-9592) DataSketches support

2020-04-02 Thread Boglarka Egyed (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boglarka Egyed updated IMPALA-9592:
---
Description: The goal is to integrate with the 
[DataSketches|https://datasketches.apache.org/] library more closely to utilize 
its estimation algorithms for BI acceleration purposes.  (was: The goal is to 
integrate with the [DataSketches|[https://datasketches.apache.org/]] library 
more closely to utilize its estimation algorithms for BI acceleration purposes.)

> DataSketches support
> 
>
> Key: IMPALA-9592
> URL: https://issues.apache.org/jira/browse/IMPALA-9592
> Project: IMPALA
>  Issue Type: Epic
>Reporter: Boglarka Egyed
>Priority: Major
>
> The goal is to integrate with the 
> [DataSketches|https://datasketches.apache.org/] library more closely to 
> utilize its estimation algorithms for BI acceleration purposes.






[jira] [Created] (IMPALA-9592) DataSketches support

2020-04-02 Thread Boglarka Egyed (Jira)
Boglarka Egyed created IMPALA-9592:
--

 Summary: DataSketches support
 Key: IMPALA-9592
 URL: https://issues.apache.org/jira/browse/IMPALA-9592
 Project: IMPALA
  Issue Type: Epic
Reporter: Boglarka Egyed


The goal is to integrate with the 
[DataSketches|[https://datasketches.apache.org/]] library more closely to 
utilize its estimation algorithms for BI acceleration purposes.






[jira] [Assigned] (IMPALA-9499) Display support for all complex types in a SELECT * query

2020-03-30 Thread Boglarka Egyed (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boglarka Egyed reassigned IMPALA-9499:
--

Assignee: Adam Tamas

> Display support for all complex types in a SELECT * query
> -
>
> Key: IMPALA-9499
> URL: https://issues.apache.org/jira/browse/IMPALA-9499
> Project: IMPALA
>  Issue Type: New Feature
>Reporter: Gabor Kaszab
>Assignee: Adam Tamas
>Priority: Major
>  Labels: complextype
>
> Covers all complex types (Struct, Array, Map) for both Parquet and ORC file 
> formats.






[jira] [Assigned] (IMPALA-9557) Implement to_json() for complex types

2020-03-30 Thread Boglarka Egyed (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boglarka Egyed reassigned IMPALA-9557:
--

Assignee: Tamas Mate

> Implement to_json() for complex types
> -
>
> Key: IMPALA-9557
> URL: https://issues.apache.org/jira/browse/IMPALA-9557
> Project: IMPALA
>  Issue Type: New Feature
>Reporter: Gabor Kaszab
>Assignee: Tamas Mate
>Priority: Major
>  Labels: complextype
>
> This built-in function should accept a complex type as its parameter and 
> return a string containing that complex type in JSON format. Check Hive for 
> how each complex type looks in JSON.






[jira] [Assigned] (IMPALA-9277) Crash due to unhandled exception thrown from orc::ColumnSelector::updateSelectedByTypeId

2020-01-06 Thread Boglarka Egyed (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boglarka Egyed reassigned IMPALA-9277:
--

Assignee: Zoltán Borók-Nagy

> Crash due to unhandled exception thrown from 
> orc::ColumnSelector::updateSelectedByTypeId
> 
>
> Key: IMPALA-9277
> URL: https://issues.apache.org/jira/browse/IMPALA-9277
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Quanlong Huang
>Assignee: Zoltán Borók-Nagy
>Priority: Blocker
> Attachments: copy7_nullable.orc
>
>
> Build the latest Impala with the latest ORC lib and run test_fuzz_scanner for 
> the ORC format:
>  * Impala git hash: 497a17dbdc0669abd47c2360b8ca94de8b54d413
>  * ORC git hash: c26ff4c351d7c34c4272442a6874703f510282a8
> Found the crash:
> {code:java}
> Operating system: Linux
>   0.0.0 Linux 4.15.0-72-generic #81~16.04.1-Ubuntu SMP Tue 
> Nov 26 16:34:21 UTC 2019 x86_64
> CPU: amd64
>  family 6 model 158 stepping 10
>  1 CPU
> GPU: UNKNOWN
> Crash reason:  SIGABRT
> Crash address: 0x3e848f0
> Process uptime: not available
> Thread 319 (crashed)
>  0  libc-2.23.so + 0x35428
>  1  libc-2.23.so + 0x3702a
>  2  impalad!_fini + 0x15bae90
>  3  libc-2.23.so + 0x79242
>  4  libc-2.23.so + 0x79242
>  5  libstdc++.so.6.0.21 + 0x8c880
>  6  libstdc++.so.6.0.21 + 0x8f84d
>  7  impalad!_fini + 0x15baeb0
>  8  impalad + 0x4b984e0
>  9  libstdc++.so.6.0.21 + 0x8d6b6
> 10  libstdc++.so.6.0.21 + 0x8d701
> 11  libstdc++.so.6.0.21 + 0x8d919
> 12  impalad!orc::ColumnSelector::updateSelectedByTypeId(std::vector std::allocator >&, unsigned long) [Reader.cc : 166 + 0x12]
> 13  impalad!orc::ColumnSelector::updateSelected(std::vector std::allocator >&, orc::RowReaderOptions const&) [Reader.cc : 136 + 0xf]
> 14  
> impalad!orc::RowReaderImpl::RowReaderImpl(std::shared_ptr, 
> orc::RowReaderOptions const&) [Reader.cc : 229 + 0x11]
> 15  impalad!orc::ReaderImpl::createRowReader(orc::RowReaderOptions const&) 
> const [Reader.cc : 725 + 0x1b]
> 16  impalad!impala::HdfsOrcScanner::Open(impala::ScannerContext*) 
> [hdfs-orc-scanner.cc : 198 + 0x3c]
> 17  
> impalad!impala::HdfsScanNodeBase::CreateAndOpenScannerHelper(impala::HdfsPartitionDescriptor*,
>  impala::ScannerContext*, boost::scoped_ptr*) 
> [hdfs-scan-node-base.cc : 819 + 0x29]
> 18  
> impalad!impala::HdfsScanNode::ProcessSplit(std::vector std::allocator > const&, impala::MemPool*, 
> impala::io::ScanRange*, long*) [hdfs-scan-node.cc : 494 + 0x2b]
> 19  impalad!impala::HdfsScanNode::ScannerThread(bool, long) 
> [hdfs-scan-node.cc : 416 + 0x2a]
> 20  
> impalad!impala::HdfsScanNode::ThreadTokenAvailableCb(impala::ThreadResourcePool*)::{lambda()#1}::operator()()
>  const + 0x30
> 21  
> impalad!boost::detail::function::void_function_obj_invoker0,
>  void>::invoke [function_template.hpp : 153 + 0xc]
> 22  impalad!boost::function0::operator()() const [function_template.hpp 
> : 767 + 0x11]
> 23  impalad!impala::Thread::SuperviseThread(std::string const&, std::string 
> const&, boost::function, impala::ThreadDebugInfo const*, 
> impala::Promise*) [thread.cc : 360 + 0xf]
> 24  impalad!void boost::_bi::list5, 
> boost::_bi::value, boost::_bi::value >, 
> boost::_bi::value, 
> boost::_bi::value*> 
> >::operator() boost::function, impala::ThreadDebugInfo const*, 
> impala::Promise*), 
> boost::_bi::list0>(boost::_bi::type, void (*&)(std::string const&, 
> std::string const&, boost::function, impala::ThreadDebugInfo const*, 
> impala::Promise*), boost::_bi::list0&, int) 
> [bind.hpp : 525 + 0x15]
> 25  impalad!boost::_bi::bind_t const&, boost::function, impala::ThreadDebugInfo const*, 
> impala::Promise*), 
> boost::_bi::list5, 
> boost::_bi::value, boost::_bi::value >, 
> boost::_bi::value, 
> boost::_bi::value*> > 
> >::operator()() [bind_template.hpp : 20 + 0x22]
> 26  impalad!boost::detail::thread_data (*)(std::string const&, std::string const&, boost::function, 
> impala::ThreadDebugInfo const*, impala::Promise (impala::PromiseMode)0>*), boost::_bi::list5, 
> boost::_bi::value, boost::_bi::value >, 
> boost::_bi::value, 
> boost::_bi::value*> > > 
> >::run() [thread.hpp : 116 + 0x12]
> 27  impalad!thread_proxy + 0xda
> 28  libpthread-2.23.so + 0x76ba
> 29  libc-2.23.so + 0x10741d
> {code}
> Code snippet for ORC Reader.cc:166
> {code:c++}
> 158  void ColumnSelector::updateSelectedByTypeId(std::vector<bool>& selectedColumns, uint64_t typeId) {
> 159    if (typeId < selectedColumns.size()) {
> 160      const Type& type = *idTypeMap[typeId];
> 161      selectChildren(selectedColumns, type);
> 162    } else {
> 163      std::stringstream buffer;
> 164      buffer << "Invalid type id selected " << typeId << " out of "
> 165             << selectedColumns.size();
> 166      throw ParseError(buffer.str());
> 167    }
> 168  }
> {code}
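
The crash above comes from an exception that leaves the ORC library and is never caught on the scanner thread. A minimal sketch of the general remedy follows, assuming the caller wants an error value rather than process termination. It is not Impala's actual fix: ParseError and UpdateSelectedByTypeId here are hypothetical stand-ins that only mimic the quoted snippet, and TrySelectColumn is an invented wrapper name.

```cpp
#include <cstdint>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the library's ParseError exception type.
class ParseError : public std::runtime_error {
 public:
  explicit ParseError(const std::string& what) : std::runtime_error(what) {}
};

// Hypothetical stand-in for the library call: throws on an out-of-range id,
// mirroring the control flow of the snippet quoted above.
void UpdateSelectedByTypeId(std::vector<bool>& selected, uint64_t type_id) {
  if (type_id < selected.size()) {
    selected[type_id] = true;
  } else {
    std::stringstream buffer;
    buffer << "Invalid type id selected " << type_id << " out of "
           << selected.size();
    throw ParseError(buffer.str());
  }
}

// Returns std::nullopt on success, or an error message that the caller can
// report as a query error instead of letting the exception escape the thread.
std::optional<std::string> TrySelectColumn(std::vector<bool>& selected,
                                           uint64_t type_id) {
  try {
    UpdateSelectedByTypeId(selected, type_id);
    return std::nullopt;
  } catch (const std::exception& e) {
    return std::string("ORC reader error: ") + e.what();
  }
}
```

Catching std::exception at the library boundary is a deliberate choice in this sketch: it also covers exception types other than ParseError that a third-party reader might throw on corrupt input.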

[jira] [Assigned] (IMPALA-9042) Support reading full-ACID ORC tables

2019-12-06 Thread Boglarka Egyed (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boglarka Egyed reassigned IMPALA-9042:
--

Assignee: Zoltán Borók-Nagy

> Support reading full-ACID ORC tables
> 
>
> Key: IMPALA-9042
> URL: https://issues.apache.org/jira/browse/IMPALA-9042
> Project: IMPALA
>  Issue Type: New Feature
>Reporter: Quanlong Huang
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>







[jira] [Work started] (IMPALA-8943) When USE_CDP_HIVE=true, Impala should use the CDP version of Kudu

2019-11-29 Thread Boglarka Egyed (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-8943 started by Boglarka Egyed.
--
> When USE_CDP_HIVE=true, Impala should use the CDP version of Kudu
> -
>
> Key: IMPALA-8943
> URL: https://issues.apache.org/jira/browse/IMPALA-8943
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.4.0
>Reporter: Joe McDonnell
>Assignee: Boglarka Egyed
>Priority: Critical
>
> Currently, Impala uses the version of Kudu that comes with the 
> CDH_BUILD_NUMBER, even when USE_CDP_HIVE=true. This is incorrect. The 
> USE_CDP_HIVE=true build of Impala should use the Kudu version from the 
> CDP_BUILD_NUMBER. 
> To avoid any cross-version issues, this Kudu will need to be built using the 
> native toolchain.






[jira] [Assigned] (IMPALA-8943) When USE_CDP_HIVE=true, Impala should use the CDP version of Kudu

2019-11-29 Thread Boglarka Egyed (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boglarka Egyed reassigned IMPALA-8943:
--

Assignee: Attila Jeges  (was: Boglarka Egyed)

> When USE_CDP_HIVE=true, Impala should use the CDP version of Kudu
> -
>
> Key: IMPALA-8943
> URL: https://issues.apache.org/jira/browse/IMPALA-8943
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.4.0
>Reporter: Joe McDonnell
>Assignee: Attila Jeges
>Priority: Critical
>
> Currently, Impala uses the version of Kudu that comes with the 
> CDH_BUILD_NUMBER, even when USE_CDP_HIVE=true. This is incorrect. The 
> USE_CDP_HIVE=true build of Impala should use the Kudu version from the 
> CDP_BUILD_NUMBER. 
> To avoid any cross-version issues, this Kudu will need to be built using the 
> native toolchain.






[jira] [Assigned] (IMPALA-9175) Revisit the error handling logics in ORC scanner

2019-11-29 Thread Boglarka Egyed (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boglarka Egyed reassigned IMPALA-9175:
--

Assignee: Norbert Luksa

> Revisit the error handling logics in ORC scanner
> 
>
> Key: IMPALA-9175
> URL: https://issues.apache.org/jira/browse/IMPALA-9175
> Project: IMPALA
>  Issue Type: Task
>Reporter: Quanlong Huang
>Assignee: Norbert Luksa
>Priority: Blocker
>







[jira] [Assigned] (IMPALA-8184) Add timestamp validation to Orc scanner

2019-11-29 Thread Boglarka Egyed (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boglarka Egyed reassigned IMPALA-8184:
--

Assignee: Csaba Ringhofer

> Add timestamp validation to Orc scanner
> ---
>
> Key: IMPALA-8184
> URL: https://issues.apache.org/jira/browse/IMPALA-8184
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Csaba Ringhofer
>Assignee: Csaba Ringhofer
>Priority: Critical
>
> Similarly to Parquet, ORC can also contain timestamps that are not valid in 
> Impala, e.g. Hive can insert timestamps before 1400, which are invalid in 
> Impala. These invalid timestamps are often handled similarly to NULL, but 
> are not actually "real" NULLs, which can lead to some weird behavior:
> Hive:
> create table orcts (ts timestamp) stored as orc;
> insert into orcts values ("1200-01-01");
> Impala:
> select * from orcts where ts is not null;
> Returns 1 row:
> NULL
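
One way to picture the validation is a range check at scan time that turns out-of-range values into real NULLs, so that a predicate such as "ts is not null" filters them out. The sketch below is only an illustration under stated assumptions: timestamps are represented as seconds since the Unix epoch in the proleptic Gregorian calendar, the valid range is taken to be years 1400 through 9999, and the names DaysFromCivil, ValidateTimestamp and the constants are hypothetical, not Impala's scanner code.

```cpp
#include <cstdint>
#include <optional>

// Days since 1970-01-01 for a proleptic Gregorian date, using Howard
// Hinnant's well-known days_from_civil algorithm.
constexpr int64_t DaysFromCivil(int64_t y, unsigned m, unsigned d) {
  y -= m <= 2;
  const int64_t era = (y >= 0 ? y : y - 399) / 400;
  const unsigned yoe = static_cast<unsigned>(y - era * 400);
  const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;
  const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
  return era * 146097 + static_cast<int64_t>(doe) - 719468;
}

constexpr int64_t kSecondsPerDay = 86400;
// Assumed valid range: 1400-01-01 (inclusive) to 10000-01-01 (exclusive).
constexpr int64_t kMinValidSeconds = DaysFromCivil(1400, 1, 1) * kSecondsPerDay;
constexpr int64_t kMaxValidSeconds = DaysFromCivil(10000, 1, 1) * kSecondsPerDay;

// Returns the timestamp unchanged if it is in range, or std::nullopt so the
// caller can set the slot to a real NULL.
std::optional<int64_t> ValidateTimestamp(int64_t epoch_seconds) {
  if (epoch_seconds < kMinValidSeconds || epoch_seconds >= kMaxValidSeconds) {
    return std::nullopt;
  }
  return epoch_seconds;
}
```

With a check like this, the value written by Hive for "1200-01-01" would map to std::nullopt, and the row would no longer satisfy "ts is not null".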






[jira] [Assigned] (IMPALA-7730) Improve ORC File Format Timezone issues

2019-11-29 Thread Boglarka Egyed (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boglarka Egyed reassigned IMPALA-7730:
--

Assignee: Csaba Ringhofer

> Improve ORC File Format Timezone issues
> ---
>
> Key: IMPALA-7730
> URL: https://issues.apache.org/jira/browse/IMPALA-7730
> Project: IMPALA
>  Issue Type: Task
>  Components: Backend
>Affects Versions: Impala 3.0
>Reporter: Philip Martin
>Assignee: Csaba Ringhofer
>Priority: Major
> Attachments: orc.zip
>
>
> As pointed out in https://gerrit.cloudera.org/#/c/11731 by [~csringhofer], 
> our support for the ORC file format doesn't follow the same timezone 
> conventions as the rest of Impala.
> {quote}
> tldr: ORC's timezone handling is likely to be broken in Impala so we should 
> patch it in the toolchain
> The ORC library implements its own IANA timezone handling to convert stored 
> timestamps from UTC to local time + do something similar for min/max stats. 
> The writer's timezone can be also stored in .orc files and used instead of 
> local timezone.
> Impala's and the ORC library's timezones can differ for several reasons:
>  * ORC's timezone is not overridden by env var TZ and query option timezone
>  * ORC uses a simpler way to detect the local timezone which may not work on 
> some Linux distros (see TimezoneDatabase::LocalZoneName in Impala vs 
> LOCAL_TIMEZONE in Orc)
>  * .orc files can use any time zone as writer's timezone and we cannot be sure 
> that it will exist on the reader machine
> My suggestion is to patch the ORC library in the toolchain and remove 
> timezone handling (e.g. by always using UTC, maybe depending on a flag), as 
> the way it is currently working is likely to be broken and is surely not 
> consistent with the rest of Impala.
> I am not sure how timezones could be handled correctly in Orc + Impala. If 
> someone plans to work on it, I would gladly help in the integration to Impala.
> {quote}






[jira] [Assigned] (IMPALA-8801) Add DATE type support to ORC scanner

2019-11-29 Thread Boglarka Egyed (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boglarka Egyed reassigned IMPALA-8801:
--

Assignee: Gabor Kaszab  (was: Quanlong Huang)

> Add DATE type support to ORC scanner
> 
>
> Key: IMPALA-8801
> URL: https://issues.apache.org/jira/browse/IMPALA-8801
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Attila Jeges
>Assignee: Gabor Kaszab
>Priority: Critical
>







[jira] [Assigned] (IMPALA-9130) Upgrade external non-ACID table to ACID from Impala

2019-11-08 Thread Boglarka Egyed (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boglarka Egyed reassigned IMPALA-9130:
--

Assignee: Csaba Ringhofer

> Upgrade external non-ACID table to ACID from Impala
> ---
>
> Key: IMPALA-9130
> URL: https://issues.apache.org/jira/browse/IMPALA-9130
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog, Frontend
>Affects Versions: Impala 3.3.0
>Reporter: Gabor Kaszab
>Assignee: Csaba Ringhofer
>Priority: Major
>  Labels: impala-acid
>
> If you have an external, non-ACID table and try to upgrade it to an ACID 
> table, you get an error message saying that an external table is not allowed 
> to be promoted to ACID. This is fine. However, if in the very same step you 
> set 'EXTERNAL' = 'FALSE' in the table properties, you still get the same 
> error, while Hive is able to execute the statement.
> Steps to repro:
> 1) Create a non-ACID external table. (or a single non-ACID table if you use 
> Hive that contains HIVE-22158)
> 2) Upgrade the table
> {code:java}
> alter table tbl set tblproperties ('transactional'='true', 
> 'transactional_properties'='insert_only', 'EXTERNAL'='FALSE');
> {code}
> Step 2) fails in Impala but succeeds in Hive






[jira] [Commented] (IMPALA-8648) Impala ACID read stress tests

2019-10-30 Thread Boglarka Egyed (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-8648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963017#comment-16963017
 ] 

Boglarka Egyed commented on IMPALA-8648:


[https://gerrit.cloudera.org/#/c/1/]

> Impala ACID read stress tests
> -
>
> Key: IMPALA-8648
> URL: https://issues.apache.org/jira/browse/IMPALA-8648
> Project: IMPALA
>  Issue Type: Test
>Reporter: Dinesh Garg
>Assignee: Zoltán Borók-Nagy
>Priority: Critical
>  Labels: impala-acid
>







[jira] [Assigned] (IMPALA-8631) Ensure that cached data is always up to date to avoid reads based on stale metadata for transactional read only tables

2019-10-04 Thread Boglarka Egyed (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boglarka Egyed reassigned IMPALA-8631:
--

Assignee: Gabor Kaszab  (was: Boglarka Egyed)

> Ensure that cached data is always up to date to avoid reads based on stale 
> metadata for transactional read only tables 
> ---
>
> Key: IMPALA-8631
> URL: https://issues.apache.org/jira/browse/IMPALA-8631
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Dinesh Garg
>Assignee: Gabor Kaszab
>Priority: Major
>  Labels: impala-acid
>
> Acquire latest validWriteIdList in the coordinator and validate that the 
> cached data is up to date. Automatically force refresh with query if it’s not.






[jira] [Assigned] (IMPALA-8631) Ensure that cached data is always up to date to avoid reads based on stale metadata for transactional read only tables

2019-10-04 Thread Boglarka Egyed (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boglarka Egyed reassigned IMPALA-8631:
--

Assignee: Boglarka Egyed  (was: Gabor Kaszab)

> Ensure that cached data is always up to date to avoid reads based on stale 
> metadata for transactional read only tables 
> ---
>
> Key: IMPALA-8631
> URL: https://issues.apache.org/jira/browse/IMPALA-8631
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Dinesh Garg
>Assignee: Boglarka Egyed
>Priority: Major
>  Labels: impala-acid
>
> Acquire latest validWriteIdList in the coordinator and validate that the 
> cached data is up to date. Automatically force refresh with query if it’s not.


