[jira] [Updated] (SPARK-6464) Add a new transformation of rdd named processCoalesce which was particularly to deal with the small and cached rdd
[ https://issues.apache.org/jira/browse/SPARK-6464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

SaintBacchus updated SPARK-6464:
Affects Version/s: 1.4.0 (was: 1.3.0)

Add a new transformation of rdd named processCoalesce which was particularly to deal with the small and cached rdd
---
Key: SPARK-6464
URL: https://issues.apache.org/jira/browse/SPARK-6464
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 1.4.0
Reporter: SaintBacchus
Attachments: screenshot-1.png

Currently, the *coalesce* transformation is commonly used to increase or reduce the number of partitions in order to improve performance. But *coalesce* cannot guarantee that a child partition will be executed on the same executor as its parent partitions, which can lead to a large amount of network transfer.

In some scenarios, such as the one mentioned in the title (+small and cached rdd+), we want to coalesce all the partitions on the same executor into one partition and make sure the child partition is executed on that executor. This avoids network transfer, reduces task scheduling overhead, and allows the CPU cores to be reused for other jobs. In this scenario, our performance improved by 20%.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6464) Add a new transformation of rdd named processCoalesce which was particularly to deal with the small and cached rdd
[ https://issues.apache.org/jira/browse/SPARK-6464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

SaintBacchus updated SPARK-6464:
Description:
Currently, the *coalesce* transformation is commonly used to increase or reduce the number of partitions in order to improve performance. But *coalesce* cannot guarantee that a child partition will be executed on the same executor as its parent partitions, which can lead to a large amount of network transfer.

In some scenarios, such as the one mentioned in the title (+small and cached rdd+), we want to coalesce all the partitions on the same executor into one partition and make sure the child partition is executed on that executor. This avoids network transfer, reduces task scheduling overhead, and allows the CPU cores to be reused for other jobs. In this scenario, our performance improved by 20%.

was:
Currently, the *coalesce* transformation is commonly used to increase or reduce the number of partitions in order to improve performance. But *coalesce* cannot guarantee that a child partition will be executed on the same executor as its parent partitions, which can lead to a large amount of network transfer.

In some scenarios, such as the one mentioned in the title

Add a new transformation of rdd named processCoalesce which was particularly to deal with the small and cached rdd
---
Key: SPARK-6464
URL: https://issues.apache.org/jira/browse/SPARK-6464
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 1.3.0
Reporter: SaintBacchus
[jira] [Updated] (SPARK-6464) Add a new transformation of rdd named processCoalesce which was particularly to deal with the small and cached rdd
[ https://issues.apache.org/jira/browse/SPARK-6464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

SaintBacchus updated SPARK-6464:
Description:
Currently, the *coalesce* transformation is commonly used to increase or reduce the number of partitions in order to improve performance. But *coalesce* cannot guarantee that a child partition will be executed on the same executor as its parent partitions, which can lead to a large amount of network transfer.

In some scenarios, such as the one mentioned in the title

Add a new transformation of rdd named processCoalesce which was particularly to deal with the small and cached rdd
---
Key: SPARK-6464
URL: https://issues.apache.org/jira/browse/SPARK-6464
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 1.3.0
Reporter: SaintBacchus
[jira] [Updated] (SPARK-6464) Add a new transformation of rdd named processCoalesce which was particularly to deal with the small and cached rdd
[ https://issues.apache.org/jira/browse/SPARK-6464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

SaintBacchus updated SPARK-6464:
Description:
Currently, the *coalesce* transformation is commonly used to increase or reduce the number of partitions in order to improve performance. But *coalesce* cannot guarantee that a child partition will be executed on the same executor as its parent partitions, which can lead to a large amount of network transfer.

In some scenarios, such as the one mentioned in the title (+small and cached rdd+), we want to coalesce all the partitions on the same executor into one partition and make sure the child partition is executed on that executor. This avoids network transfer, reduces task scheduling overhead, and allows the CPU cores to be reused for other jobs. In this scenario, our performance improved by 20%.

was:
Currently, the *coalesce* transformation is commonly used to increase or reduce the number of partitions in order to improve performance. But *coalesce* cannot guarantee that a child partition will be executed on the same executor as its parent partitions, which can lead to a large amount of network transfer.

In some scenarios, such as the one mentioned in the title (+small and cached rdd+), we want to coalesce all the partitions on the same executor into one partition and make sure the child partition is executed on that executor. This avoids network transfer, reduces task scheduling overhead, and allows the CPU cores to be reused for other jobs. In this scenario, our performance improved by 20%.

Add a new transformation of rdd named processCoalesce which was particularly to deal with the small and cached rdd
---
Key: SPARK-6464
URL: https://issues.apache.org/jira/browse/SPARK-6464
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 1.3.0
Reporter: SaintBacchus
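
The grouping idea in the SPARK-6464 description (merge all partitions cached on one executor into a single child partition that runs on that executor) can be illustrated with a small, dependency-free sketch. This is not Spark code and not the reporter's implementation of processCoalesce: the function name `group_partitions_by_executor`, the executor ids, and the partition layout are all hypothetical, chosen only to show the mapping from parent partitions to per-executor child partitions.

```python
from collections import defaultdict
from typing import Dict, List


def group_partitions_by_executor(locations: Dict[int, str]) -> Dict[str, List[int]]:
    """Group parent partition indices by the executor that caches them.

    `locations` maps a parent partition index to a (hypothetical) executor id.
    Each resulting group stands for one coalesced child partition that would
    be scheduled on that executor, so no cached block has to cross the network.
    """
    groups: Dict[str, List[int]] = defaultdict(list)
    for partition in sorted(locations):
        groups[locations[partition]].append(partition)
    return dict(groups)


# Six small cached partitions spread over three executors (made-up layout).
cached_locations = {
    0: "exec-1",
    1: "exec-2",
    2: "exec-1",
    3: "exec-3",
    4: "exec-2",
    5: "exec-1",
}

child_partitions = group_partitions_by_executor(cached_locations)
print(child_partitions)
# {'exec-1': [0, 2, 5], 'exec-2': [1, 4], 'exec-3': [3]}
```

In this sketch six tasks collapse into three, one per executor, which is the source of the reduced scheduling overhead the description mentions. How a real implementation would obtain cache locations and pin the child task to an executor is outside what the issue text specifies.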