[jira] [Comment Edited] (SPARK-16473) BisectingKMeans Algorithm failing with java.util.NoSuchElementException: key not found
[ https://issues.apache.org/jira/browse/SPARK-16473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15763324#comment-15763324 ] Alok Bhandari edited comment on SPARK-16473 at 12/20/16 5:38 AM:
[~imatiach], thanks for showing interest in this issue. I will try to share the dataset with you; can you suggest where I should share it? Is GitHub fine?
Also, I have tried to diagnose this issue on my own. From my analysis, it looks like it fails when it tries to bisect a node which does not have any children. I have also written a code fix, but I am not sure this is the correct solution:

*Suggested solution*
{code:title=BisectingKMeans.scala}
private def updateAssignments(
    assignments: RDD[(Long, VectorWithNorm)],
    divisibleIndices: Set[Long],
    newClusterCenters: Map[Long, VectorWithNorm]): RDD[(Long, VectorWithNorm)] = {
  assignments.map { case (index, v) =>
    if (divisibleIndices.contains(index)) {
      val children = Seq(leftChildIndex(index), rightChildIndex(index))
      if (children.length > 0) {
        val selected = children.minBy { child =>
          KMeans.fastSquaredDistance(newClusterCenters(child), v)
        }
        (selected, v)
      } else {
        (index, v)
      }
    } else {
      (index, v)
    }
  }
}
{code}

*Original code*
{code:title=BisectingKMeans.scala}
private def updateAssignments(
    assignments: RDD[(Long, VectorWithNorm)],
    divisibleIndices: Set[Long],
    newClusterCenters: Map[Long, VectorWithNorm]): RDD[(Long, VectorWithNorm)] = {
  assignments.map { case (index, v) =>
    if (divisibleIndices.contains(index)) {
      val children = Seq(leftChildIndex(index), rightChildIndex(index))
      val selected = children.minBy { child =>
        KMeans.fastSquaredDistance(newClusterCenters(child), v)
      }
      (selected, v)
    } else {
      (index, v)
    }
  }
}
{code}
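A side note on the suggested fix: `Seq(leftChildIndex(index), rightChildIndex(index))` always has exactly two elements, so the `children.length > 0` guard can never be false. One way to actually avoid the `NoSuchElementException` would be to consider only the children whose centers exist in `newClusterCenters`. Below is a minimal, Spark-free sketch of that idea; the object name, the `Double` stand-in for cluster centers/distances, and the index values are all hypothetical, for illustration only:

```scala
// Minimal, Spark-free sketch (hypothetical names and values; the real code
// uses Map[Long, VectorWithNorm] and KMeans.fastSquaredDistance).
object KeyNotFoundSketch {
  // Child indexing of the bisecting tree, as in BisectingKMeans.
  def leftChildIndex(index: Long): Long = 2 * index
  def rightChildIndex(index: Long): Long = 2 * index + 1

  // Guarded assignment: only consider children whose centers actually exist,
  // falling back to the current index when neither child has a center.
  def selectChild(index: Long, centers: Map[Long, Double]): Long = {
    val children = Seq(leftChildIndex(index), rightChildIndex(index))
    val present = children.filter(centers.contains)
    if (present.nonEmpty) present.minBy(centers) else index
  }

  def main(args: Array[String]): Unit = {
    // Only the left child of node 1 (index 2) has a center; index 3 does not.
    val centers = Map(2L -> 0.5)

    // The unguarded minBy over both children, like the original code, throws
    // NoSuchElementException when it looks up the missing key 3.
    val crashed =
      try { Seq(2L, 3L).minBy(centers); false }
      catch { case _: java.util.NoSuchElementException => true }
    println(crashed)                  // prints true

    println(selectChild(1L, centers)) // prints 2: falls back to the child that exists
    println(selectChild(5L, centers)) // prints 5: no child centers, keep assignment
  }
}
```

This is only an illustration of the failure mode and one possible guard, not the committed Spark fix.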
[jira] [Commented] (SPARK-16473) BisectingKMeans Algorithm failing with java.util.NoSuchElementException: key not found
[ https://issues.apache.org/jira/browse/SPARK-16473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15647165#comment-15647165 ] Alok Bhandari commented on SPARK-16473:
[~josephkb], I have just found that you have worked on MLlib; could you please help me get input on this issue?
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16473) BisectingKMeans Algorithm failing with java.util.NoSuchElementException: key not found
[ https://issues.apache.org/jira/browse/SPARK-16473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15611340#comment-15611340 ] Alok Bhandari commented on SPARK-16473:
This issue continues to exist for the Spark 2.0 "ml" library. Is this feature going to get any support?
[jira] [Updated] (SPARK-16473) BisectingKMeans Algorithm failing with java.util.NoSuchElementException: key not found
[ https://issues.apache.org/jira/browse/SPARK-16473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alok Bhandari updated SPARK-16473:
Priority: Blocker (was: Major)
[jira] [Updated] (SPARK-16473) BisectingKMeans Algorithm failing with java.util.NoSuchElementException: key not found
[ https://issues.apache.org/jira/browse/SPARK-16473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alok Bhandari updated SPARK-16473:
Affects Version/s: 2.0.0
[jira] [Updated] (SPARK-16473) BisectingKMeans Algorithm failing with java.util.NoSuchElementException: key not found
[ https://issues.apache.org/jira/browse/SPARK-16473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alok Bhandari updated SPARK-16473:
Component/s: ML
[jira] [Commented] (SPARK-16473) BisectingKMeans Algorithm failing with java.util.NoSuchElementException: key not found
[ https://issues.apache.org/jira/browse/SPARK-16473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15391713#comment-15391713 ] Alok Bhandari commented on SPARK-16473:
After reducing maxIterations for BisectingKMeans, it finished successfully. Does that mean maxIterations is data-set specific?
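On the maxIterations question: in the MLlib API, maxIterations controls the number of k-means iterations used when splitting each cluster (default 20), so tuning it per dataset is plausible. A usage sketch follows; it assumes Spark is on the classpath, and the application name, local master, and toy vectors are made up for illustration:

{code:title=BisectingKMeansSketch.scala}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.BisectingKMeans
import org.apache.spark.mllib.linalg.Vectors

// Sketch only: a local SparkContext and a tiny toy dataset.
val sc = new SparkContext(new SparkConf().setAppName("bkm-sketch").setMaster("local[2]"))
val vectors = sc.parallelize(Seq(
  Vectors.dense(0.0, 0.0), Vectors.dense(1.0, 1.0),
  Vectors.dense(9.0, 8.0), Vectors.dense(8.0, 9.0)))

val model = new BisectingKMeans()
  .setK(2)               // number of leaf clusters to aim for
  .setMaxIterations(20)  // k-means iterations per bisecting step
  .run(vectors)

println(model.clusterCenters.length)
{code}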
[jira] [Created] (SPARK-16473) BisectingKMeans Algorithm failing with java.util.NoSuchElementException: key not found
Alok Bhandari created SPARK-16473:
Summary: BisectingKMeans Algorithm failing with java.util.NoSuchElementException: key not found
Key: SPARK-16473
URL: https://issues.apache.org/jira/browse/SPARK-16473
Project: Spark
Issue Type: Bug
Components: MLlib
Affects Versions: 1.6.1
Environment: AWS EC2 linux instance.
Reporter: Alok Bhandari

Hello,
I am using Apache Spark 1.6.1 and executing the bisecting k-means algorithm on a specific dataset.
Dataset details:
K=100,
input vector =100K*100k
Memory assigned: 16GB per node,
number of nodes = 2.
Up to K=75 it is working fine, but when I set k=100 it fails with java.util.NoSuchElementException: key not found.
*I suspect it is failing because of a lack of some resource, but the exception does not convey why this Spark job failed.*
Can someone please point me to the root cause of this exception and why it is failing?
This is the exception stack trace:
{code}
java.util.NoSuchElementException: key not found: 166
	at scala.collection.MapLike$class.default(MapLike.scala:228)
	at scala.collection.AbstractMap.default(Map.scala:58)
	at scala.collection.MapLike$class.apply(MapLike.scala:141)
	at scala.collection.AbstractMap.apply(Map.scala:58)
	at org.apache.spark.mllib.clustering.BisectingKMeans$$anonfun$org$apache$spark$mllib$clustering$BisectingKMeans$$updateAssignments$1$$anonfun$2.apply$mcDJ$sp(BisectingKMeans.scala:338)
	at org.apache.spark.mllib.clustering.BisectingKMeans$$anonfun$org$apache$spark$mllib$clustering$BisectingKMeans$$updateAssignments$1$$anonfun$2.apply(BisectingKMeans.scala:337)
	at org.apache.spark.mllib.clustering.BisectingKMeans$$anonfun$org$apache$spark$mllib$clustering$BisectingKMeans$$updateAssignments$1$$anonfun$2.apply(BisectingKMeans.scala:337)
	at scala.collection.TraversableOnce$$anonfun$minBy$1.apply(TraversableOnce.scala:231)
	at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:111)
	at scala.collection.immutable.List.foldLeft(List.scala:84)
	at scala.collection.LinearSeqOptimized$class.reduceLeft(LinearSeqOptimized.scala:125)
	at scala.collection.immutable.List.reduceLeft(List.scala:84)
	at scala.collection.TraversableOnce$class.minBy(TraversableOnce.scala:231)
	at scala.collection.AbstractTraversable.minBy(Traversable.scala:105)
	at org.apache.spark.mllib.clustering.BisectingKMeans$$anonfun$org$apache$spark$mllib$clustering$BisectingKMeans$$updateAssignments$1.apply(BisectingKMeans.scala:337)
	at org.apache.spark.mllib.clustering.BisectingKMeans$$anonfun$org$apache$spark$mllib$clustering$BisectingKMeans$$updateAssignments$1.apply(BisectingKMeans.scala:334)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
	at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:389)
{code}
Issue is that it is failing but not giving any explicit message as to why it failed.