[ https://issues.apache.org/jira/browse/HIVE-17291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16125439#comment-16125439 ]
Peter Vary commented on HIVE-17291:
-----------------------------------

[~lirui]: You might be using black magic, or something similar :) In the middle of the night I woke up with the feeling that something was not right with the patch - some of the query outputs should have changed as a result of the change - so I decided to check a few things, and saw your comment, which immediately answered my question. What I am not sure of is whether you used black magic to wake me up because my patch was not good, or white magic to answer my unasked question :) :) :)

I found this problem with the help of the qtest outputs, but I think it signals a more important usability problem. Maybe you have more experience with how users use HoS and how they tune their queries, but when my wife optimizes an Oracle query she often iterates through explain / hint change / explain loops. I imagine with Hive it looks like this: explain / change config / explain. When we change the Spark configuration, the RpcServer is killed and a new one is started (I might be wrong here - tell me if it is not like this). The result is that the first explain is different from the next one, which defeats the whole purpose of the optimization. What happens when a query uses a wrong number of reducers? The query will run, but will it result in slower execution? Out of memory?

Also, the test results of HIVE-17292 showed me that with multiple executors configured, the output of {{hiveSparkClient.getExecutorCount()}} is even less reliable. The possible outcomes are:
- *0* - if only the Application Master is started
- *1* - if the AM and 1 executor are started
- *2* - if the AM and both executors are started

So we should not base any decision on it, neither in QTestUtil nor in {{sparkSession.getMemoryAndCores()}}, unless we have made sure that all of the executors which will be running at the time of the query have already started.
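To illustrate the race described above, here is a minimal hypothetical sketch (the method and parameter names are illustrative, not Hive's actual API): an executor count sampled during startup can read 0, 1, or 2 depending on timing, so a consumer that must not underestimate falls back to the configured instance count whenever the sample is lower.

```java
// Hypothetical sketch: do not trust an executor count sampled during startup.
// reliableExecutorCount() stands in for logic around
// hiveSparkClient.getExecutorCount(); names are illustrative only.
public class ExecutorCountSketch {

    static int reliableExecutorCount(int reportedCount, int configuredInstances) {
        // With 2 executors configured, the sampled value may be 0 (only the AM
        // is up), 1, or 2 depending on timing. When the sample is lower than
        // the configuration, the cluster has likely not finished starting all
        // executors, so prefer the static configuration over the racy sample.
        if (reportedCount < configuredInstances) {
            return configuredInstances;
        }
        return reportedCount;
    }

    public static void main(String[] args) {
        // All three racy outcomes, with spark.executor.instances = 2,
        // yield the same stable answer:
        System.out.println(reliableExecutorCount(0, 2)); // prints 2
        System.out.println(reliableExecutorCount(1, 2)); // prints 2
        System.out.println(reliableExecutorCount(2, 2)); // prints 2
    }
}
```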
I'm starting to understand the wisdom of [~xuefuz]'s statement: ??I also had some doubts on the original idea which was to automatically and dynamically deciding the parallelism based on available memory/cores. Maybe we should back to the basis, where the number of reducers is solely determined (statically) by the total shuffled data size (stats) divided by the configuration "bytes per reducer". I'm open to all proposals, including doing this for dynamic allocation and using spark.executor.instances for static allocation.??

In light of these findings, I think we should calculate the parallelism with the algorithm [~xuefuz] proposed:
- for dynamic allocation - the number of reducers is determined solely (statically) by the total shuffled data size (stats) divided by the configuration "bytes per reducer"
- for static allocation - use spark.executor.instances

What do you think? Sorry for dragging you through the whole thinking process :(

Thanks,
Peter

> Set the number of executors based on config if client does not provide information
> ----------------------------------------------------------------------------------
>
>                 Key: HIVE-17291
>                 URL: https://issues.apache.org/jira/browse/HIVE-17291
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>    Affects Versions: 3.0.0
>            Reporter: Peter Vary
>            Assignee: Peter Vary
>         Attachments: HIVE-17291.1.patch
>
>
> When calculating the memory and cores, and the client does not provide the information, we should try to use the configured defaults. This can happen on startup, when {{spark.dynamicAllocation.enabled}} is not enabled

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
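The two-branch proposal in the comment above can be sketched as follows. This is a hedged illustration only - the method and parameter names are hypothetical and do not come from Hive's actual implementation; "bytes per reducer" plays the role of a configuration like {{hive.exec.reducers.bytes.per.reducer}}.

```java
// Hypothetical sketch of the proposed parallelism rule; names are
// illustrative, not Hive's actual code.
public class ReducerParallelismSketch {

    static int estimateReducers(boolean dynamicAllocation,
                                long totalShuffledBytes,
                                long bytesPerReducer,
                                int executorInstances) {
        if (dynamicAllocation) {
            // Dynamic allocation: size-based. Number of reducers is the total
            // shuffled data size (from stats) divided by "bytes per reducer",
            // rounded up, with a floor of 1.
            return (int) Math.max(1,
                    (totalShuffledBytes + bytesPerReducer - 1) / bytesPerReducer);
        }
        // Static allocation: derive parallelism from spark.executor.instances.
        return Math.max(1, executorInstances);
    }

    public static void main(String[] args) {
        // 10 GB shuffled with 256 MB per reducer -> 40 reducers:
        System.out.println(estimateReducers(true, 10L << 30, 256L << 20, 0)); // prints 40
        // Static allocation with 4 configured executor instances -> 4:
        System.out.println(estimateReducers(false, 0L, 256L << 20, 4)); // prints 4
    }
}
```

The appeal of this rule, as the comment argues, is that neither branch depends on the racy runtime executor count, so explain plans are stable across config-change/explain iterations.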