[jira] [Updated] (SPARK-6464) Add a new transformation of rdd named processCoalesce which was particularly to deal with the small and cached rdd

2015-03-28 Thread SaintBacchus (JIRA)

 [https://issues.apache.org/jira/browse/SPARK-6464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

SaintBacchus updated SPARK-6464:

Affects Version/s: 1.4.0 (was: 1.3.0)

 Add a new transformation of rdd named processCoalesce which was particularly to deal with the small and cached rdd
 ---

 Key: SPARK-6464
 URL: https://issues.apache.org/jira/browse/SPARK-6464
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 1.4.0
Reporter: SaintBacchus
 Attachments: screenshot-1.png


 Currently, the transformation *coalesce* is commonly used to increase or reduce 
 the number of partitions in order to improve performance.
 However, *coalesce* cannot guarantee that a child partition will be executed on 
 the same executor as its parent partitions, which can lead to heavy network 
 transfer.
 In some scenarios, such as the +small and cached rdd+ case mentioned in the 
 title, we want to coalesce all the partitions on the same executor into one 
 partition and make sure that the child partition is executed on that executor. 
 This avoids network transfer, reduces the number of tasks to schedule, and 
 frees CPU cores for other jobs.
 In this scenario, our performance improved by 20%.
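To make the proposed grouping concrete, here is a minimal Scala sketch of the idea as described above. It is not the actual patch and does not use Spark's API. It models each cached parent partition as an index plus the executor holding its block, builds one child group per executor pinned to that executor, and counts how many parent partitions each child would have to fetch from another executor. The names CachedPartition, CoalescedGroup, ProcessCoalesceSketch, groupByExecutor, groupByIndex and remoteReads are hypothetical. The index-based grouping is only a locality-unaware contrast; it is not how Spark's built-in *coalesce* chooses groups.

```scala
// Hypothetical model: a cached parent partition and the executor holding its block.
final case class CachedPartition(index: Int, executorId: String)

// Hypothetical model: one coalesced child partition, the parent partitions it reads,
// and the executor it should be scheduled on (None means no locality preference).
final case class CoalescedGroup(preferredExecutor: Option[String], parents: Seq[Int])

object ProcessCoalesceSketch {

  // Proposed behaviour as described in the issue: one child per executor, holding
  // every cached parent partition on that executor and pinned to that executor.
  def groupByExecutor(parents: Seq[CachedPartition]): Seq[CoalescedGroup] =
    parents
      .groupBy(_.executorId)
      .toSeq
      .sortBy(_._1) // sort by executor id for a deterministic output order
      .map { case (executorId, ps) =>
        CoalescedGroup(Some(executorId), ps.map(_.index).sorted)
      }

  // Locality-unaware baseline for contrast only (NOT Spark's coalesce algorithm):
  // split the parents, ordered by index, into at most n consecutive groups.
  def groupByIndex(parents: Seq[CachedPartition], n: Int): Seq[CoalescedGroup] = {
    require(n > 0, "n must be positive")
    if (parents.isEmpty) Seq.empty
    else {
      val sorted = parents.sortBy(_.index)
      val size = math.ceil(sorted.size.toDouble / n).toInt
      sorted.grouped(size).map(g => CoalescedGroup(None, g.map(_.index))).toList
    }
  }

  // Number of parent partitions that a child must fetch from another executor.
  // Assumption: a group without a preference runs where its first parent is cached.
  def remoteReads(groups: Seq[CoalescedGroup], parents: Seq[CachedPartition]): Int = {
    val location = parents.map(p => p.index -> p.executorId).toMap
    groups.map { g =>
      val runsOn = g.preferredExecutor.getOrElse(location(g.parents.head))
      g.parents.count(i => location(i) != runsOn)
    }.sum
  }
}
```

For example, with six cached partitions alternating between two executors, groupByExecutor yields two child partitions (one task per executor) with no remote reads, while splitting the same partitions into two groups by index needs two remote fetches.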



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6464) Add a new transformation of rdd named processCoalesce which was particularly to deal with the small and cached rdd

2015-03-23 Thread SaintBacchus (JIRA)


SaintBacchus updated SPARK-6464:

Description: 
Currently, the transformation *coalesce* is commonly used to increase or reduce the 
number of partitions in order to improve performance.
However, *coalesce* cannot guarantee that a child partition will be executed on the 
same executor as its parent partitions, which can lead to heavy network transfer.
In some scenarios, such as the +small and cached rdd+ case mentioned in the title, we 
want to coalesce all the partitions on the same executor into one partition and 
make sure that the child partition is executed on that executor. This avoids 
network transfer, reduces the number of tasks to schedule, and frees CPU cores 
for other jobs.
In this scenario, our performance improved by 20%.

  was:
Currently, the transformation *coalesce* is commonly used to increase or reduce the 
number of partitions in order to improve performance.
However, *coalesce* cannot guarantee that a child partition will be executed on the 
same executor as its parent partitions, which can lead to heavy network transfer.
In some scenarios, such as the one I mentioned in the title 


 Add a new transformation of rdd named processCoalesce which was particularly to deal with the small and cached rdd
 ---

 Key: SPARK-6464
 URL: https://issues.apache.org/jira/browse/SPARK-6464
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 1.3.0
Reporter: SaintBacchus







[jira] [Updated] (SPARK-6464) Add a new transformation of rdd named processCoalesce which was particularly to deal with the small and cached rdd

2015-03-23 Thread SaintBacchus (JIRA)


SaintBacchus updated SPARK-6464:

Description: 
Currently, the transformation *coalesce* is commonly used to increase or reduce the 
number of partitions in order to improve performance.
However, *coalesce* cannot guarantee that a child partition will be executed on the 
same executor as its parent partitions, which can lead to heavy network transfer.
In some scenarios, such as the one I mentioned in the title 

 Add a new transformation of rdd named processCoalesce which was particularly to deal with the small and cached rdd
 ---

 Key: SPARK-6464
 URL: https://issues.apache.org/jira/browse/SPARK-6464
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 1.3.0
Reporter: SaintBacchus







[jira] [Updated] (SPARK-6464) Add a new transformation of rdd named processCoalesce which was particularly to deal with the small and cached rdd

2015-03-23 Thread SaintBacchus (JIRA)


SaintBacchus updated SPARK-6464:

Description: 
Currently, the transformation *coalesce* is commonly used to increase or reduce the 
number of partitions in order to improve performance.
However, *coalesce* cannot guarantee that a child partition will be executed on the 
same executor as its parent partitions, which can lead to heavy network transfer.
In some scenarios, such as the +small and cached rdd+ case mentioned in the title, we 
want to coalesce all the partitions on the same executor into one partition and 
make sure that the child partition is executed on that executor. This avoids 
network transfer, reduces the number of tasks to schedule, and frees CPU cores 
for other jobs.
In this scenario, our performance improved by 20%.


  was:
Currently, the transformation *coalesce* is commonly used to increase or reduce the 
number of partitions in order to improve performance.
However, *coalesce* cannot guarantee that a child partition will be executed on the 
same executor as its parent partitions, which can lead to heavy network transfer.
In some scenarios, such as the +small and cached rdd+ case mentioned in the title, we 
want to coalesce all the partitions on the same executor into one partition and 
make sure that the child partition is executed on that executor. This avoids 
network transfer, reduces the number of tasks to schedule, and frees CPU cores 
for other jobs.
In this scenario, our performance improved by 20%.


 Add a new transformation of rdd named processCoalesce which was particularly to deal with the small and cached rdd
 ---

 Key: SPARK-6464
 URL: https://issues.apache.org/jira/browse/SPARK-6464
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 1.3.0
Reporter: SaintBacchus



