[jira] [Assigned] (SPARK-11316) isEmpty before coalesce seems to cause huge performance issue in setupGroups

2016-02-03 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-11316:


Assignee: Apache Spark

> isEmpty before coalesce seems to cause huge performance issue in setupGroups
> 
>
> Key: SPARK-11316
> URL: https://issues.apache.org/jira/browse/SPARK-11316
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.5.1
>Reporter: Thomas Graves
>Assignee: Apache Spark
>Priority: Critical
>
> So I haven't fully debugged this yet but reporting what I'm seeing and think 
> might be going on.
> I have a graph processing job that is seeing huge slow down in setupGroups in 
> the location iterator where its getting the preferred locations for the 
> coalesce.  They are coalescing from 2400 down to 1200 and its taking 17+ 
> hours to do the calculation.  Killed it at this point so don't know total 
> time.
> It appears that the job is doing an isEmpty call, a bunch of other 
> transformation, then a coalesce (where it takes so long), other 
> transformations, then finally a count to trigger it.   
> It appears that there is only one node that its finding in the setupGroup 
> call and to get to that node it has to first to through the while loop:
> while (numCreated < targetLen && tries < expectedCoupons2) {
> where expectedCoupons2 is around 19000.  It finds very few or none in this 
> loop.  
> Then it does the second loop:
> while (numCreated < targetLen) {  // if we don't have enough partition 
> groups, create duplicates
>   var (nxt_replica, nxt_part) = rotIt.next()
>   val pgroup = PartitionGroup(nxt_replica)
>   groupArr += pgroup
>   groupHash.getOrElseUpdate(nxt_replica, ArrayBuffer()) += pgroup
>   var tries = 0
>   while (!addPartToPGroup(nxt_part, pgroup) && tries < targetLen) { // 
> ensure at least one part
> nxt_part = rotIt.next()._2
> tries += 1
>   }
>   numCreated += 1
> }
> Where it has an inner while loop and both of those are going 1200 times.  
> 1200*1200 loops.  This is taking a very long time.
> The user can work around the issue by adding in a count() call very close to 
> after the isEmpty call before the coalesce is called.  I also tried putting 
> in a take(1)  right before the isEmpty call and it seems to work around 
> the issue, took 1 hours with the take vs a few minutes with the count().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-11316) isEmpty before coalesce seems to cause huge performance issue in setupGroups

2016-02-03 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-11316:


Assignee: (was: Apache Spark)

> isEmpty before coalesce seems to cause huge performance issue in setupGroups
> 
>
> Key: SPARK-11316
> URL: https://issues.apache.org/jira/browse/SPARK-11316
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.5.1
>Reporter: Thomas Graves
>Priority: Critical
>
> So I haven't fully debugged this yet but reporting what I'm seeing and think 
> might be going on.
> I have a graph processing job that is seeing huge slow down in setupGroups in 
> the location iterator where its getting the preferred locations for the 
> coalesce.  They are coalescing from 2400 down to 1200 and its taking 17+ 
> hours to do the calculation.  Killed it at this point so don't know total 
> time.
> It appears that the job is doing an isEmpty call, a bunch of other 
> transformation, then a coalesce (where it takes so long), other 
> transformations, then finally a count to trigger it.   
> It appears that there is only one node that its finding in the setupGroup 
> call and to get to that node it has to first to through the while loop:
> while (numCreated < targetLen && tries < expectedCoupons2) {
> where expectedCoupons2 is around 19000.  It finds very few or none in this 
> loop.  
> Then it does the second loop:
> while (numCreated < targetLen) {  // if we don't have enough partition 
> groups, create duplicates
>   var (nxt_replica, nxt_part) = rotIt.next()
>   val pgroup = PartitionGroup(nxt_replica)
>   groupArr += pgroup
>   groupHash.getOrElseUpdate(nxt_replica, ArrayBuffer()) += pgroup
>   var tries = 0
>   while (!addPartToPGroup(nxt_part, pgroup) && tries < targetLen) { // 
> ensure at least one part
> nxt_part = rotIt.next()._2
> tries += 1
>   }
>   numCreated += 1
> }
> Where it has an inner while loop and both of those are going 1200 times.  
> 1200*1200 loops.  This is taking a very long time.
> The user can work around the issue by adding in a count() call very close to 
> after the isEmpty call before the coalesce is called.  I also tried putting 
> in a take(1)  right before the isEmpty call and it seems to work around 
> the issue, took 1 hours with the take vs a few minutes with the count().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org