[jira] [Created] (IMPALA-9484) Milestone 1: properly scan files that have full ACID schema

2020-03-10 Thread Jira
Zoltán Borók-Nagy created IMPALA-9484:
-

 Summary: Milestone 1: properly scan files that have full ACID schema
 Key: IMPALA-9484
 URL: https://issues.apache.org/jira/browse/IMPALA-9484
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Zoltán Borók-Nagy


 

Full ACID row format looks like this:
{
 "operation": 0,
 "originalTransaction": 1,
 "bucket": 536870912,
 "rowId": 0,
 "currentTransaction": 1,
 "row": {"i": 1}
}

User columns are nested under "row". The frontend should create proper tuples 
and slot descriptors for the scan nodes to read the files correctly.
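
To illustrate the nesting, here is a minimal Python sketch that splits one 
record of the shape shown above into its ACID metadata and its user columns. 
The helper name split_acid_row and the ACID_META_FIELDS tuple are hypothetical 
and exist only for this example; this is not how the Impala frontend or 
scanners are implemented.

```python
import json

# Top-level ACID fields, taken from the sample record in this issue.
ACID_META_FIELDS = (
    "operation",
    "originalTransaction",
    "bucket",
    "rowId",
    "currentTransaction",
)


def split_acid_row(record):
    """Split one full ACID record into (acid_meta, user_columns).

    Hypothetical helper for illustration only.
    """
    acid_meta = {name: record[name] for name in ACID_META_FIELDS}
    # User columns are nested one level down, under "row".
    user_columns = dict(record["row"])
    return acid_meta, user_columns


sample = json.loads(
    '{"operation": 0, "originalTransaction": 1, "bucket": 536870912,'
    ' "rowId": 0, "currentTransaction": 1, "row": {"i": 1}}'
)
acid_meta, user_columns = split_acid_row(sample)
```

A query such as "select i from t" only needs user_columns, while the 
debugging/testing access mentioned in this issue would surface acid_meta.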

We should be able to query the ACID columns, at least for debugging/testing. 
Hive uses the special “row__id” identifier for that.

Impala should raise an error if there are delete deltas. Directory filtering 
should also exclude minor-compacted directories, since their records would 
need validation.
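
The filtering rule can be sketched in Python as follows. The sketch assumes 
Hive-style directory names (base_<writeId>, delta_<min>_<max>[_<suffix>], 
delete_delta_<min>_<max>[_<suffix>]) and treats a delta directory whose 
minimum and maximum write IDs differ as minor-compacted. Both the naming 
pattern and that heuristic are assumptions made for this example, and 
filter_acid_dirs and DeleteDeltaError are hypothetical names, not Impala APIs.

```python
import re

# Assumed naming: delta_<min>_<max> or delete_delta_<min>_<max>,
# optionally followed by further "_"-separated suffixes.
_DELTA_RE = re.compile(r"^(delete_delta|delta)_(\d+)_(\d+)(?:_.*)?$")


class DeleteDeltaError(Exception):
    """Raised when a delete delta directory is found (unsupported here)."""


def filter_acid_dirs(dir_names):
    """Return the directories a scan would read under the rule above."""
    kept = []
    for name in dir_names:
        match = _DELTA_RE.match(name)
        if match is None:
            # base_<writeId> and unrecognized names pass through unchanged.
            kept.append(name)
            continue
        kind = match.group(1)
        min_id = int(match.group(2))
        max_id = int(match.group(3))
        if kind == "delete_delta":
            raise DeleteDeltaError("delete deltas are not supported: " + name)
        if min_id != max_id:
            # Assumption: a delta spanning several write IDs is the output
            # of minor compaction, so its records would need validation.
            continue
        kept.append(name)
    return kept
```

For example, given ["base_5", "delta_6_6", "delta_7_9"] the sketch keeps 
base_5 and delta_6_6 and drops delta_7_9, while any delete_delta_* entry 
raises DeleteDeltaError.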

Non-goals in this sub-task:
 * row validation against validWriteIdList
 * reading "original files" (files in non-ACID format)
 * reading delete deltas



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IMPALA-9485) Enable file handle cache for EC files

2020-03-10 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-9485:


 Summary: Enable file handle cache for EC files
 Key: IMPALA-9485
 URL: https://issues.apache.org/jira/browse/IMPALA-9485
 Project: IMPALA
  Issue Type: Task
  Components: Backend
Reporter: Sahil Takiar


Now that HDFS-14308 has been fixed, we can re-enable the file handle cache for 
EC files.


