[ https://issues.apache.org/jira/browse/SPARK-10629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenmin Wu updated SPARK-10629:
------------------------------
    Description: 
First of all, I think my problem is quite different from
https://issues.apache.org/jira/browse/SPARK-10433, which reports that the input
size increases at each iteration.

My problem is that the mapPartitions input size increases within a single
iteration. My training samples have 2958359 features in total. Within one
iteration, collectAsMap is called 3 times. Here is a summary of each call:

| Stage Id | Description | Duration | Input | Shuffle Read | Shuffle Write |
|:---:|:---:|:---:|:---:|:---:|:---:|
| 4 | mapPartitions at DecisionTree.scala:613 | 1.6 h | 710.2 MB | | 2.8 GB |
| 5 | collectAsMap at DecisionTree.scala:642 | 1.8 min | | 2.8 GB | |
| 6 | mapPartitions at DecisionTree.scala:613 | 1.2 h | 27.0 GB | | 5.6 GB |
| 7 | collectAsMap at DecisionTree.scala:642 | 2.0 min | | 5.6 GB | |
| 8 | mapPartitions at DecisionTree.scala:613 | 1.2 h | 26.5 GB | | 11.1 GB |
| 9 | collectAsMap at DecisionTree.scala:642 | 2.0 min | | 8.3 GB | |

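The mapPartitions stages take far too long (1.2 h to 1.6 h each, versus about
2 min for each collectAsMap stage), and their input jumps from 710.2 MB in
stage 4 to 27.0 GB and 26.5 GB in stages 6 and 8. This is very strange to me.
Could this be a bug?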



> Gradient boosted trees: mapPartitions input size increasing 
> ------------------------------------------------------------
>
>                 Key: SPARK-10629
>                 URL: https://issues.apache.org/jira/browse/SPARK-10629
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>    Affects Versions: 1.4.1
>            Reporter: Wenmin Wu
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
