[jira] [Updated] (HIVE-21458) ACID: Optimize AcidUtils$MetaDataFile.isRawFormat
[ https://issues.apache.org/jira/browse/HIVE-21458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vaibhav Gumashta updated HIVE-21458:

Labels: Transactions-Performance (was: )

> ACID: Optimize AcidUtils$MetaDataFile.isRawFormat
> --
>
> Key: HIVE-21458
> URL: https://issues.apache.org/jira/browse/HIVE-21458
> Project: Hive
> Issue Type: Bug
> Components: Transactions
> Affects Versions: 3.1.1
> Reporter: Vaibhav Gumashta
> Priority: Major
> Labels: Transactions-Performance
> Attachments: async-prof-pid-1-cpu-1.svg
>
>
> In the transactional subsystems, in several places we check to see if a data
> file has ROW__ID fields or not. Every time we do that (even within the
> context of the same query), we open a Reader for that file/split. We could
> optimize this by caching or perhaps checking once, and saving our result for
> later. Also, perhaps we don't need to do this for every split. An example
> call stack:
> {code}
> OrcFile.createReader(Path, OrcFile$ReaderOptions) line: 105
> AcidUtils$MetaDataFile.isRawFormatFile(Path, FileSystem) line: 2026
> AcidUtils$MetaDataFile.isRawFormat(Path, FileSystem) line: 2022
> AcidUtils.parsedDelta(Path, String, FileSystem) line: 1007
> OrcRawRecordMerger$TransactionMetaData.findWriteIDForSynthetcRowIDs(Path, Path, Configuration) line: 1231
> OrcRawRecordMerger.discoverOriginalKeyBounds(Reader, int, Reader$Options, Configuration, OrcRawRecordMerger$Options) line: 722
> OrcRawRecordMerger.<init>(Configuration, boolean, Reader, boolean, int, ValidWriteIdList, Reader$Options, Path[], OrcRawRecordMerger$Options) line: 1022
> OrcInputFormat.getReader(InputSplit, Options) line: 2108
> OrcInputFormat.getRecordReader(InputSplit, JobConf, Reporter) line: 2006
> FetchOperator$FetchInputFormatSplit.getRecordReader(JobConf) line: 776
> FetchOperator.getRecordReader() line: 344
> FetchOperator.getNextRow() line: 540
> FetchOperator.pushRow() line: 509
> FetchTask.fetch(List) line: 146
> {code}
> Here, for each split we'll make that check.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
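The description proposes caching the result of the ROW__ID check, or checking once per file instead of once per split. A minimal Java sketch of that idea follows, assuming a cache whose lifetime is a single query and which is keyed by file path. `RawFormatCache`, `isRawFormat(String)`, and the stand-in check are hypothetical illustrations, not Hive's API and not the change made for this issue.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

/**
 * Hypothetical per-query cache for the "does this file lack ROW__ID fields?" check.
 * Keys are file paths (plain strings here, standing in for Hadoop Path objects);
 * the expensive check, e.g. opening a reader and inspecting the file schema,
 * is supplied by the caller.
 */
final class RawFormatCache {
    // Concurrent map so splits processed on different threads can share results.
    private final Map<String, Boolean> results = new ConcurrentHashMap<>();
    private final Predicate<String> expensiveCheck;

    RawFormatCache(Predicate<String> expensiveCheck) {
        this.expensiveCheck = expensiveCheck;
    }

    /** Runs the expensive check at most once per file; later calls reuse the cached answer. */
    boolean isRawFormat(String filePath) {
        return results.computeIfAbsent(filePath, expensiveCheck::test);
    }

    public static void main(String[] args) {
        AtomicInteger readerOpens = new AtomicInteger();
        // Stand-in for the real check: count each "reader open" and pretend that
        // files outside delta_ directories are raw. A real check would read the file footer.
        RawFormatCache cache = new RawFormatCache(path -> {
            readerOpens.incrementAndGet();
            return !path.contains("/delta_");
        });
        // Three splits of the same original file, plus one split of a delta file.
        String[] splitFiles = {
            "/warehouse/t/000000_0",
            "/warehouse/t/000000_0",
            "/warehouse/t/000000_0",
            "/warehouse/t/delta_0000001_0000001_0000/bucket_00000"
        };
        for (String file : splitFiles) {
            System.out.println(file + " raw=" + cache.isRawFormat(file));
        }
        System.out.println("reader opens: " + readerOpens.get());
    }
}
```

Running `main` prints one line per split and then `reader opens: 2`, where repeating the check for every split would open a reader four times. Keying by file rather than by split is what covers the "not for every split" point; the sketch assumes a file's contents do not change while the cache is alive, which is why a per-query lifetime is assumed rather than a process-wide cache.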
[jira] [Updated] (HIVE-21458) ACID: Optimize AcidUtils$MetaDataFile.isRawFormat
Prasanth Jayachandran updated HIVE-21458:

Attachment: async-prof-pid-1-cpu-1.svg

> ACID: Optimize AcidUtils$MetaDataFile.isRawFormat
> --
>
> Key: HIVE-21458
> URL: https://issues.apache.org/jira/browse/HIVE-21458
> Project: Hive
> Issue Type: Bug
> Components: Transactions
> Affects Versions: 3.1.1
> Reporter: Vaibhav Gumashta
> Assignee: Prasanth Jayachandran
> Priority: Major
> Attachments: async-prof-pid-1-cpu-1.svg
[jira] [Updated] (HIVE-21458) ACID: Optimize AcidUtils$MetaDataFile.isRawFormat
Vaibhav Gumashta updated HIVE-21458:

Description:
In the transactional subsystems, in several places we check to see if a data file has ROW__ID fields or not. Every time we do that (even within the context of the same query), we open a Reader for that file/split. We could optimize this by caching or perhaps checking once, and saving our result for later. Also, perhaps we don't need to do this for every split. An example call stack:
{code}
OrcFile.createReader(Path, OrcFile$ReaderOptions) line: 105
AcidUtils$MetaDataFile.isRawFormatFile(Path, FileSystem) line: 2026
AcidUtils$MetaDataFile.isRawFormat(Path, FileSystem) line: 2022
AcidUtils.parsedDelta(Path, String, FileSystem) line: 1007
OrcRawRecordMerger$TransactionMetaData.findWriteIDForSynthetcRowIDs(Path, Path, Configuration) line: 1231
OrcRawRecordMerger.discoverOriginalKeyBounds(Reader, int, Reader$Options, Configuration, OrcRawRecordMerger$Options) line: 722
OrcRawRecordMerger.<init>(Configuration, boolean, Reader, boolean, int, ValidWriteIdList, Reader$Options, Path[], OrcRawRecordMerger$Options) line: 1022
OrcInputFormat.getReader(InputSplit, Options) line: 2108
OrcInputFormat.getRecordReader(InputSplit, JobConf, Reporter) line: 2006
FetchOperator$FetchInputFormatSplit.getRecordReader(JobConf) line: 776
FetchOperator.getRecordReader() line: 344
FetchOperator.getNextRow() line: 540
FetchOperator.pushRow() line: 509
FetchTask.fetch(List) line: 146
{code}
Here, for each split we'll make that check.

was:
In the transactional subsystems, in several places we check to see if a data file has ROW__ID fields or not. Every time we do that (even within the context of the same query), we open a Reader for that file/split. We could optimize this by caching. Also, perhaps we don't need to do this for every split. An example call stack:
{code}
OrcFile.createReader(Path, OrcFile$ReaderOptions) line: 105
AcidUtils$MetaDataFile.isRawFormatFile(Path, FileSystem) line: 2026
AcidUtils$MetaDataFile.isRawFormat(Path, FileSystem) line: 2022
AcidUtils.parsedDelta(Path, String, FileSystem) line: 1007
OrcRawRecordMerger$TransactionMetaData.findWriteIDForSynthetcRowIDs(Path, Path, Configuration) line: 1231
OrcRawRecordMerger.discoverOriginalKeyBounds(Reader, int, Reader$Options, Configuration, OrcRawRecordMerger$Options) line: 722
OrcRawRecordMerger.<init>(Configuration, boolean, Reader, boolean, int, ValidWriteIdList, Reader$Options, Path[], OrcRawRecordMerger$Options) line: 1022
OrcInputFormat.getReader(InputSplit, Options) line: 2108
OrcInputFormat.getRecordReader(InputSplit, JobConf, Reporter) line: 2006
FetchOperator$FetchInputFormatSplit.getRecordReader(JobConf) line: 776
FetchOperator.getRecordReader() line: 344
FetchOperator.getNextRow() line: 540
FetchOperator.pushRow() line: 509
FetchTask.fetch(List) line: 146
{code}
Here, for each split we'll make that check.
[jira] [Updated] (HIVE-21458) ACID: Optimize AcidUtils$MetaDataFile.isRawFormat
Vaibhav Gumashta updated HIVE-21458:

Summary: ACID: Optimize AcidUtils$MetaDataFile.isRawFormat (was: ACID: Optimize AcidUtils$MetaDataFile.isRawFormat check by caching the split reader)

> ACID: Optimize AcidUtils$MetaDataFile.isRawFormat
> --
>
> Key: HIVE-21458
> URL: https://issues.apache.org/jira/browse/HIVE-21458
> Project: Hive
> Issue Type: Bug
> Components: Transactions
> Affects Versions: 3.1.1
> Reporter: Vaibhav Gumashta
> Priority: Major