[
https://issues.apache.org/jira/browse/ORC-991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17413292#comment-17413292
]
Yiqun Zhang edited comment on ORC-991 at 9/10/21, 5:05 PM:
-----------------------------------------------------------
[~hgs19921112] I was able to verify my point by hard-coding a read of your ORC file.
The Row Index stream of the id column is written without encryption, so the row
index read is ignored. I will provide a PR to fix this bug later.
"Index is not populated for 1" is thrown because indexStreams never adds a row
index stream for that column.
{code:java}
private long handleStream(long offset,
                          boolean[] columnInclude,
                          OrcProto.Stream stream,
                          ReaderEncryptionVariant variant) {
  .....
  // For the encrypted id column: encryption.getVariant(column) != null,
  // but the unencrypted ROW_INDEX stream comes in with variant == null,
  // so this condition is false and the index stream is skipped.
  if (columnInclude[column] && encryption.getVariant(column) == variant) {
    .......
      case INDEX:
        indexStreams.add(info);
        break;
    ........
  }
{code}
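As a minimal, self-contained illustration (not the actual ORC code; the class and
variable names below are made up for the example), the variant mismatch plays out
like this:
{code:java}
// Hypothetical demo, not ORC source: the reader keeps an index stream only when
// the stream's encryption variant matches the column's variant.
public class DroppedIndexStreamDemo {
  public static void main(String[] args) {
    boolean columnIncluded = true;        // column 1 ("id") is referenced by the filter
    Object columnVariant = new Object();  // id is encrypted, so encryption.getVariant(1) != null
    Object streamVariant = null;          // its ROW_INDEX stream was written unencrypted

    if (columnIncluded && columnVariant == streamVariant) {
      System.out.println("index stream kept");
    } else {
      // This branch is taken: the row index is never loaded, and the SARG
      // evaluation later fails with "Index is not populated for 1".
      System.out.println("index stream skipped");
    }
  }
}
{code}
The output of the hard-coded read of the attached file looks like this: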
{code:java}
File schema: struct<id:int,name:string>
Row count: 1000
id: 2482
name: Sylvia Gleason MD
id: 3342
name: Vernie Stamm MD
id: 7275
name: Mrs. Donnie Klocko
id: 917
name: Jermaine Abshire
id: 4142
name: Min Kutch
......
{code}
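For context, here is a simplified, self-contained sketch (assumed, not the actual
ORC source) of the kind of check in RecordReaderImpl$SargApplier.pickRowGroups that
raises the assertion once the index stream has been skipped:
{code:java}
// Hypothetical, simplified demo: every column referenced by the pushed-down filter
// must have its row index loaded, otherwise the reader fails exactly as in the
// stack trace quoted below.
public class PickRowGroupsCheckDemo {
  public static void main(String[] args) {
    Object[] rowIndexes = new Object[3];  // per-column row indexes; entry 1 ("id") was never loaded
    int[] filterColumns = {1};            // the filter "id = 1" references column 1

    try {
      for (int columnId : filterColumns) {
        if (rowIndexes[columnId] == null) {
          throw new AssertionError("Index is not populated for " + columnId);
        }
      }
    } catch (AssertionError e) {
      System.out.println(e.getMessage()); // prints: Index is not populated for 1
    }
  }
}
{code}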
> Encrypted data throws exception with a SQL filter push down
> ------------------------------------------------------------
>
> Key: ORC-991
> URL: https://issues.apache.org/jira/browse/ORC-991
> Project: ORC
> Issue Type: Bug
> Components: Java
> Affects Versions: 1.6.8, 1.6.9, 1.6.10
> Environment: 1. ORC 1.6.8+
> 2. SparkSQL 2.4.7
> 3. JDK 1.8
> Reporter: hgs
> Priority: Major
> Attachments: files.zip, image-2021-09-11-00-41-53-176.png
>
>
> 1. Create a table:
> CREATE TABLE `itmp8888`(`id` INT, `name` STRING)
> ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
> WITH SERDEPROPERTIES (
> 'serialization.format' = '1'
> )
> STORED AS
> INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
> OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
> TBLPROPERTIES (
> 'transient_lastDdlTime' = '1631174384',
> 'orc.encrypt' = 'AES_CTR_128:id,name',
> 'orc.mask' = 'sha256:id,name',
> 'orc.encrypt.ezk' = 'jNCeDBtNfT8wPaTpR34JHA=='
> )
> 2. Insert data.
> 3. A SELECT statement with no filters works fine:
> select * from itmp8888
> 4. A SELECT statement with a filter on an encrypted column throws an
> exception:
> select * from itmp8888 where id = 1
>
> 5. The stack trace:
> Caused by: java.lang.AssertionError: Index is not populated for 1
> at org.apache.orc.impl.RecordReaderImpl$SargApplier.pickRowGroups(RecordReaderImpl.java:995)
> at org.apache.orc.impl.RecordReaderImpl.pickRowGroups(RecordReaderImpl.java:1083)
> at org.apache.orc.impl.RecordReaderImpl.readStripe(RecordReaderImpl.java:1101)
> at org.apache.orc.impl.RecordReaderImpl.advanceStripe(RecordReaderImpl.java:1151)
> at org.apache.orc.impl.RecordReaderImpl.advanceToNextRow(RecordReaderImpl.java:1186)
> at org.apache.orc.impl.RecordReaderImpl.<init>(RecordReaderImpl.java:248)
> at org.apache.orc.impl.ReaderImpl.rows(ReaderImpl.java:864)
> at org.apache.spark.sql.execution.datasources.orc.OrcColumnarBatchReader.initialize(OrcColumnarBatchReader.java:142)
> at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$$anonfun$buildReaderWithPartitionValues$2.apply(OrcFileFormat.scala:211)
> at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$$anonfun$buildReaderWithPartitionValues$2.apply(OrcFileFormat.scala:175)
> at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:124)
> at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:177)
> at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101)
> 6. I debugged the code and found that the RowIndex is null for all the
> encrypted columns.
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)