[ 
https://issues.apache.org/jira/browse/HIVE-22964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17054325#comment-17054325
 ] 

Peter Vary commented on HIVE-22964:
-----------------------------------

Hi [~aditya-shah],
 * I have found this for renaming the configuration key: 
[https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/conf/Configuration.html#addDeprecation-java.lang.String-java.lang.String-java.lang.String-]
 We should verify that this works as advertised, and then we can go ahead 
and rename the configuration key.
 * HIVE-13120: The last comment states:
{quote}Since the ORCInputformat is cached in `FetchOperator.java`, the UGI in 
`Context.threadpool` thread will be userA always.
{quote}
This suggests to me that the problem was that we cached the ORCInputFormat. Do 
we have any such problem here?

 * MMPathInfo: We might just pass two {{Collections.synchronizedList}} instances, 
or some other concurrent implementation, as the {{finalPaths}} and 
{{pathsWithFileOriginals}} parameters of the processPathsForMmRead method, and 
get away without extra objects. Or did you see serious performance degradation 
there because of the synchronization?
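
To make the MMPathInfo suggestion concrete, here is a rough sketch of what I mean. It is not the Hive code: {{processPaths}} is a hypothetical stand-in for processPathsForMmRead, the paths are plain strings, and the "original" name check is only a placeholder for the real classification logic. The only point is that two synchronized lists can be shared between the worker threads without a wrapper object.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SynchronizedPathLists {

  // Hypothetical stand-in for processPathsForMmRead: each directory is
  // classified on a worker thread and appended to one of two shared lists.
  static void processPaths(List<String> dirs,
                           List<String> finalPaths,
                           List<String> pathsWithFileOriginals) {
    ExecutorService pool = Executors.newFixedThreadPool(4);
    try {
      for (String dir : dirs) {
        pool.execute(() -> {
          // Placeholder rule, for illustration only.
          if (dir.contains("original")) {
            pathsWithFileOriginals.add(dir);
          } else {
            finalPaths.add(dir);
          }
        });
      }
    } finally {
      pool.shutdown();
    }
    try {
      // Wait for all submitted tasks to finish before returning.
      if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
        throw new IllegalStateException("path processing timed out");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while processing paths", e);
    }
  }

  public static void main(String[] args) {
    List<String> dirs = new ArrayList<>();
    for (int i = 0; i < 100; i++) {
      dirs.add(i % 2 == 0 ? "delta_" + i : "original_" + i);
    }
    // Individual add() calls on these lists are thread-safe.
    List<String> finalPaths = Collections.synchronizedList(new ArrayList<>());
    List<String> pathsWithFileOriginals = Collections.synchronizedList(new ArrayList<>());

    processPaths(dirs, finalPaths, pathsWithFileOriginals);
    System.out.println(finalPaths.size() + " final, "
        + pathsWithFileOriginals.size() + " with originals");
  }
}
```

Note that iterating over a synchronizedList still needs manual synchronization on the list, and the insertion order depends on thread scheduling, so this only fits if the caller does not rely on the order of the paths.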

Thanks for taking care of this!
 Peter

> MM table split computation is very slow
> ---------------------------------------
>
>                 Key: HIVE-22964
>                 URL: https://issues.apache.org/jira/browse/HIVE-22964
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Aditya Shah
>            Assignee: Aditya Shah
>            Priority: Major
>         Attachments: HIVE-22964.patch
>
>
> Since for MM table we process the paths prior to inputFormat.getSplits() we 
> end up doing listing on the whole table at once. This could be optimized.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
