[jira] [Updated] (HIVE-17292) Change TestMiniSparkOnYarnCliDriver test configuration to use the configured cores
[ https://issues.apache.org/jira/browse/HIVE-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-17292:
--
    Resolution: Fixed
    Fix Version/s: 3.0.0
    Status: Resolved (was: Patch Available)

Pushed to master. Thanks [~pvary]!

> Change TestMiniSparkOnYarnCliDriver test configuration to use the configured
> cores
> --
>
> Key: HIVE-17292
> URL: https://issues.apache.org/jira/browse/HIVE-17292
> Project: Hive
> Issue Type: Sub-task
> Components: Spark, Test
> Affects Versions: 3.0.0
> Reporter: Peter Vary
> Assignee: Peter Vary
> Fix For: 3.0.0
>
> Attachments: HIVE-17292.1.patch, HIVE-17292.2.patch, HIVE-17292.3.patch,
> HIVE-17292.5.patch, HIVE-17292.6.patch, HIVE-17292.7.patch
>
> Currently the {{hive-site.xml}} for the {{TestMiniSparkOnYarnCliDriver}} test
> defines 2 cores and 2 executors, but only 1 is used, because the MiniCluster
> does not allow the creation of the 3rd container.
> The FairScheduler uses 1GB increments for memory, but the containers would
> like to use only 512MB. We should change the FairScheduler configuration to
> use only the requested 512MB.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
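The FairScheduler change described above could look something like the following sketch of the YARN scheduler settings. The property names come from YARN's scheduler configuration; the 512MB values are illustrative and not necessarily what the committed patch uses:

```xml
<!-- Sketch only: lower the scheduler's memory floor and rounding increment
     so a 512MB container request is granted as 512MB instead of 1GB. -->
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>512</value>
</property>
<!-- FairScheduler-specific rounding increment for memory requests. -->
<property>
  <name>yarn.scheduler.increment-allocation-mb</name>
  <value>512</value>
</property>
```

With a 1024MB increment, two 512MB containers cost as much as two 1GB ones, which is why the MiniCluster could not fit the extra container.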
[ https://issues.apache.org/jira/browse/HIVE-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary updated HIVE-17292:
--
    Attachment: HIVE-17292.7.patch

Rebased the patch.
[ https://issues.apache.org/jira/browse/HIVE-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary updated HIVE-17292:
--
    Attachment: HIVE-17292.6.patch

The patch contains the following changes:
- Changing Hadoop23Shims.java, so the MiniSparkShim will be able to provide the requested 2 executors.
- Changing QTestUtil.setSparkSession, so we wait until every executor is available, not only the 1st.
- Changing SparkSessionImpl.getMemoryAndCores, so we use the client-provided parallelism in the case of a local spark.master too.
- Regenerating golden files (numReducers and the number of files changed in the explain plans).

The change contains 2 golden file changes (spark_dynamic_partition_pruning_mapjoin_only.q.out, spark_dynamic_partition_pruning.q.out) which contain other changes necessary for a green run, so this patch should be regenerated after their corresponding jiras are resolved (HIVE-17347, HIVE-17346).
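The "wait until every executor is available" step can be sketched as a simple poll-with-timeout loop. This is a minimal illustration, not the actual QTestUtil code; the `ExecutorWait` class, method names, and the 50ms poll interval are all made up for the example:

```java
import java.util.concurrent.TimeUnit;
import java.util.function.IntSupplier;

/**
 * Sketch of waiting for a cluster to register the expected number of
 * executors before letting tests run: poll an executor-count supplier
 * until the target is reached or the timeout expires.
 */
public class ExecutorWait {
  public static boolean waitForExecutors(IntSupplier currentCount,
                                         int expected,
                                         long timeoutMs) throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (System.currentTimeMillis() < deadline) {
      if (currentCount.getAsInt() >= expected) {
        return true;                     // every requested executor registered
      }
      TimeUnit.MILLISECONDS.sleep(50);   // back off before polling again
    }
    return false;                        // timed out before all executors came up
  }

  public static void main(String[] args) throws InterruptedException {
    // Simulated cluster where the second executor shows up after a few polls.
    final int[] registered = {0};
    boolean ok = waitForExecutors(() -> ++registered[0], 2, 5000);
    System.out.println(ok ? "all executors up" : "timed out");
  }
}
```

Waiting for all executors (rather than just the first) matters here because the number of reducers in the explain plans depends on how many executors were visible when the query was planned.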
[ https://issues.apache.org/jira/browse/HIVE-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary updated HIVE-17292:
--
    Attachment: HIVE-17292.5.patch

Well, this patch became huge :(
The actual code/configuration change is minimal:
- QTestUtil.java - check for 4 cores before allowing a query to run
- SparkSessionImpl.java - use the same method for calculating cores as with spark.master="spark.\*"
- Hadoop23Shims.java - change the scheduler allocation minimum, this way allowing the MiniCluster to create 2 nodes
- The others are only q.out changes
-- Number of executors 2->4
-- The number of result files is higher because the executor number is higher
-- When there is no order by in the query, the resulting lines are mixed in some cases (union.q.out, union11.q.out, union14.q.out, union15.q.out, union7.q.out, union_null.q.out) - we might have to apply {{-- SORT_QUERY_RESULTS}} if they become flaky
-- The overall size of the result files becomes bigger (union_remove_10.q.out, union_remove_13.q.out, union_remove_15.q.out, union_remove_16.q.out, union_remove_7.q.out, union_remove_8.q.out, union_remove_9.q.out) - I think the number of files and the overhead of the RCFileOutputFormat cause this
-- spark_dynamic_partition_pruning_mapjoin_only.q.out is changed - see: HIVE-16948

What do you think about this change [~lirui]? Shall we bite the bullet and review/commit it - do we have a good way to validate the changes? Or shall we chicken out and change the configuration back to using only 1 executor with 2 cores, so that only a configuration change is needed?
[ https://issues.apache.org/jira/browse/HIVE-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary updated HIVE-17292:
--
    Attachment: HIVE-17292.3.patch

Changed so that we expect a different number of executors on yarn and in standalone mode.
[ https://issues.apache.org/jira/browse/HIVE-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary updated HIVE-17292:
--
    Attachment: HIVE-17292.2.patch

Moved the config change to the Shim, as suggested by [~lirui]. Also updated QTestUtil, so it will wait until all of the executors are ready. There might still be a problem if the configuration is changed with a set command inside the test file; we will see the results.
Updated the necessary q.out files.
[ https://issues.apache.org/jira/browse/HIVE-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary updated HIVE-17292:
--
    Attachment: HIVE-17292.1.patch

Changed the hive-site.xml, so we will have 2 executors.
Regenerated the out files; there are some interesting changes, so I am curious what Jenkins says about them - I can rationalize most of them, but still.
Also, I was not able to regenerate {{spark_vectorized_dynamic_partition_pruning}} with or without the patch. Interested whether it works on the pre-commit run.
[ https://issues.apache.org/jira/browse/HIVE-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary updated HIVE-17292:
--
    Status: Patch Available (was: Open)