[ https://issues.apache.org/jira/browse/TEZ-4397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17682391#comment-17682391 ]
Syed Shameerur Rahman commented on TEZ-4397:
--------------------------------------------

[~abstractdog] On debugging, I found that this was due to Hive’s operator initialization, which is designed so that only one split can be open at a time: the operators share the [IOContext|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/IOContext.java] variable, so when multiple splits are opened at the same time, the shared variable gets overwritten and no longer matches the correct context.

I will take this up separately in Hive, but for Tez I have created a new PR with the following changes:

1. Disable the feature by default (no. of readers = 1)

> Open Tez Input splits asynchronously
> ------------------------------------
>
>                 Key: TEZ-4397
>                 URL: https://issues.apache.org/jira/browse/TEZ-4397
>             Project: Apache Tez
>          Issue Type: Task
>            Reporter: Ramesh Kumar Thangarajan
>            Assignee: Ramesh Kumar Thangarajan
>            Priority: Major
>          Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Tez input splits can be opened asynchronously. This will reduce the amount of
> time spent for s3 to prepare the connection and opening the object.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)