[jira] [Comment Edited] (SPARK-16473) BisectingKMeans Algorithm failing with java.util.NoSuchElementException: key not found

2016-12-19 Thread Alok Bhandari (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15763324#comment-15763324
 ] 

Alok Bhandari edited comment on SPARK-16473 at 12/20/16 5:38 AM:
-

[~imatiach], thanks for showing interest in this issue. I will try to share 
the dataset with you. Could you suggest where I should share it? Would GitHub 
be fine?

Also, I have tried to diagnose this issue on my own. From my analysis, it 
fails when it tries to bisect a node that does not have any children. I have 
also added a code fix, but I am not sure whether it is the correct solution:

*Suggested solution*
{code:title=BisectingKMeans.scala}
  private def updateAssignments(
      assignments: RDD[(Long, VectorWithNorm)],
      divisibleIndices: Set[Long],
      newClusterCenters: Map[Long, VectorWithNorm]): RDD[(Long, VectorWithNorm)] = {
    assignments.map { case (index, v) =>
      if (divisibleIndices.contains(index)) {
        // Keep only the children that actually received a center. A bare
        // length check on this Seq always passes, since it has two elements.
        val children = Seq(leftChildIndex(index), rightChildIndex(index))
          .filter(newClusterCenters.contains(_))
        if (children.nonEmpty) {
          val selected = children.minBy { child =>
            KMeans.fastSquaredDistance(newClusterCenters(child), v)
          }
          (selected, v)
        } else {
          // Neither child exists, so the point stays on the parent node.
          (index, v)
        }
      } else {
        (index, v)
      }
    }
  }
{code}

*Original code*
{code:title=BisectingKMeans.scala}
  private def updateAssignments(
      assignments: RDD[(Long, VectorWithNorm)],
      divisibleIndices: Set[Long],
      newClusterCenters: Map[Long, VectorWithNorm]): RDD[(Long, VectorWithNorm)] = {
    assignments.map { case (index, v) =>
      if (divisibleIndices.contains(index)) {
        val children = Seq(leftChildIndex(index), rightChildIndex(index))
        val selected = children.minBy { child =>
          KMeans.fastSquaredDistance(newClusterCenters(child), v)
        }
        (selected, v)
      } else {
        (index, v)
      }
    }
  }
{code}
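
The guard idea can be tried without a Spark cluster. The following is only a minimal sketch on plain Scala collections, not the Spark implementation: {{GuardedAssignment}}, {{reassign}} and {{squaredDistance}} are hypothetical names, {{Array[Double]}} stands in for {{VectorWithNorm}}, and the child indexing (left = 2 * index, right = 2 * index + 1) is an assumption about the private helpers. It filters the two child indices down to those present in {{newClusterCenters}} before calling {{minBy}}, because the lookup {{newClusterCenters(child)}} is the call that throws when a child has no center.

{code:scala}
object GuardedAssignment {
  // Assumed heap-style child indexing (hypothetical stand-ins for the private helpers).
  def leftChildIndex(index: Long): Long = 2 * index
  def rightChildIndex(index: Long): Long = 2 * index + 1

  // Plain squared Euclidean distance; stands in for KMeans.fastSquaredDistance.
  def squaredDistance(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Reassign one point: move it to the nearest child that actually has a center,
  // or keep it on the parent when neither child was created.
  def reassign(
      index: Long,
      v: Array[Double],
      divisibleIndices: Set[Long],
      newClusterCenters: Map[Long, Array[Double]]): (Long, Array[Double]) = {
    if (divisibleIndices.contains(index)) {
      val children = Seq(leftChildIndex(index), rightChildIndex(index))
        .filter(newClusterCenters.contains(_))
      if (children.nonEmpty) {
        val selected = children.minBy { child =>
          squaredDistance(newClusterCenters(child), v)
        }
        (selected, v)
      } else {
        (index, v)
      }
    } else {
      (index, v)
    }
  }
}
{code}

With centers defined only for indices 2 and 3, a point on divisible node 1 moves to its nearest child, while a point on divisible node 83 stays on 83 instead of failing on the missing keys 166 and 167.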




[jira] [Commented] (SPARK-16473) BisectingKMeans Algorithm failing with java.util.NoSuchElementException: key not found

2016-11-08 Thread Alok Bhandari (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15647165#comment-15647165
 ] 

Alok Bhandari commented on SPARK-16473:
---

[~josephkb], I have just found that you have worked on MLlib. Could you please 
help me get some input on this issue?




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16473) BisectingKMeans Algorithm failing with java.util.NoSuchElementException: key not found

2016-10-27 Thread Alok Bhandari (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15611340#comment-15611340
 ] 

Alok Bhandari commented on SPARK-16473:
---

This issue continues to exist in the Spark 2.0 "ml" library. Is this feature 
going to get any support?




[jira] [Updated] (SPARK-16473) BisectingKMeans Algorithm failing with java.util.NoSuchElementException: key not found

2016-10-27 Thread Alok Bhandari (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alok Bhandari updated SPARK-16473:
--
Priority: Blocker  (was: Major)




[jira] [Updated] (SPARK-16473) BisectingKMeans Algorithm failing with java.util.NoSuchElementException: key not found

2016-10-27 Thread Alok Bhandari (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alok Bhandari updated SPARK-16473:
--
Affects Version/s: 2.0.0




[jira] [Updated] (SPARK-16473) BisectingKMeans Algorithm failing with java.util.NoSuchElementException: key not found

2016-10-27 Thread Alok Bhandari (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alok Bhandari updated SPARK-16473:
--
Component/s: ML




[jira] [Commented] (SPARK-16473) BisectingKMeans Algorithm failing with java.util.NoSuchElementException: key not found

2016-07-25 Thread Alok Bhandari (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15391713#comment-15391713
 ] 

Alok Bhandari commented on SPARK-16473:
---

After reducing maxIterations for BisectingKMeans, the job finished 
successfully. Does that mean maxIterations is dataset-specific?




[jira] [Created] (SPARK-16473) BisectingKMeans Algorithm failing with java.util.NoSuchElementException: key not found

2016-07-10 Thread Alok Bhandari (JIRA)
Alok Bhandari created SPARK-16473:
-

 Summary: BisectingKMeans Algorithm failing with 
java.util.NoSuchElementException: key not found
 Key: SPARK-16473
 URL: https://issues.apache.org/jira/browse/SPARK-16473
 Project: Spark
  Issue Type: Bug
  Components: MLlib
Affects Versions: 1.6.1
 Environment: AWS EC2 linux instance. 
Reporter: Alok Bhandari


Hello,

I am using Apache Spark 1.6.1 and running the bisecting k-means algorithm on 
a specific dataset.

Dataset and cluster details:
K = 100
input vector = 100K*100K
memory assigned = 16 GB per node
number of nodes = 2

Up to K=75 it works fine, but when I set K=100 it fails with 
java.util.NoSuchElementException: key not found.

*I suspect it is failing because of a lack of some resource, but the 
exception does not convey why this Spark job failed.*

Could someone please point me to the root cause of this exception?

This is the exception stack trace:
{code}
java.util.NoSuchElementException: key not found: 166
    at scala.collection.MapLike$class.default(MapLike.scala:228)
    at scala.collection.AbstractMap.default(Map.scala:58)
    at scala.collection.MapLike$class.apply(MapLike.scala:141)
    at scala.collection.AbstractMap.apply(Map.scala:58)
    at org.apache.spark.mllib.clustering.BisectingKMeans$$anonfun$org$apache$spark$mllib$clustering$BisectingKMeans$$updateAssignments$1$$anonfun$2.apply$mcDJ$sp(BisectingKMeans.scala:338)
    at org.apache.spark.mllib.clustering.BisectingKMeans$$anonfun$org$apache$spark$mllib$clustering$BisectingKMeans$$updateAssignments$1$$anonfun$2.apply(BisectingKMeans.scala:337)
    at org.apache.spark.mllib.clustering.BisectingKMeans$$anonfun$org$apache$spark$mllib$clustering$BisectingKMeans$$updateAssignments$1$$anonfun$2.apply(BisectingKMeans.scala:337)
    at scala.collection.TraversableOnce$$anonfun$minBy$1.apply(TraversableOnce.scala:231)
    at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:111)
    at scala.collection.immutable.List.foldLeft(List.scala:84)
    at scala.collection.LinearSeqOptimized$class.reduceLeft(LinearSeqOptimized.scala:125)
    at scala.collection.immutable.List.reduceLeft(List.scala:84)
    at scala.collection.TraversableOnce$class.minBy(TraversableOnce.scala:231)
    at scala.collection.AbstractTraversable.minBy(Traversable.scala:105)
    at org.apache.spark.mllib.clustering.BisectingKMeans$$anonfun$org$apache$spark$mllib$clustering$BisectingKMeans$$updateAssignments$1.apply(BisectingKMeans.scala:337)
    at org.apache.spark.mllib.clustering.BisectingKMeans$$anonfun$org$apache$spark$mllib$clustering$BisectingKMeans$$updateAssignments$1.apply(BisectingKMeans.scala:334)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:389)
{code}

The issue is that the job fails without giving any explicit message as to why 
it failed.
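
The top frames of the trace show {{Map.apply}} being called from inside {{minBy}} in {{updateAssignments}}, so the message is the generic one Scala produces for any lookup of an absent key rather than a resource error. The following is a small sketch that reproduces the same message on a plain {{Map}}; the object name {{KeyNotFoundDemo}}, the string values and the {{parentIndex}} helper are hypothetical, and the parent = child / 2 relation assumes heap-style cluster indexing, which is not confirmed in this report.

{code:scala}
import scala.util.Try

object KeyNotFoundDemo {
  // Centers exist only for clusters 2 and 3; cluster 166 was never created.
  val centers: Map[Long, String] = Map(2L -> "left", 3L -> "right")

  // Map.apply throws on a missing key; Try captures the exception for inspection.
  val failure: Try[String] = Try(centers(166L))

  // Map.get returns an Option instead of throwing.
  val safe: Option[String] = centers.get(166L)

  // Assumed heap-style indexing: a child maps back to its parent by integer division.
  def parentIndex(child: Long): Long = child / 2
}
{code}

Under that indexing assumption, the missing key 166 would be the left child of cluster 83, which suggests that cluster 83 was marked divisible while no center was produced for at least one of its children.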


