[jira] [Updated] (SPARK-49804) Incorrect exit code on Kubernetes when deploying with sidecars
[ https://issues.apache.org/jira/browse/SPARK-49804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Oleksiy Dyagilev updated SPARK-49804:
-------------------------------------
    Description: 
When deploying Spark pods on Kubernetes with sidecars, the reported executor's exit code may be incorrect. For example, the reported executor's exit code is 0, but the actual is 52 (OOM).

{code:java}
2024-09-25 02:35:29,383 ERROR TaskSchedulerImpl: org.apache.spark.scheduler.TaskSchedulerImpl.logExecutorLoss(TaskSchedulerImpl.scala:972) - Lost executor 1 on X: The executor with id 1 exited with exit code 0(success).

The API gave the following container statuses:

container name: fluentd
container image: docker-images-release.X.com/X/fluentd:X
container state: terminated
container started at: 2024-09-25T02:32:17Z
container finished at: 2024-09-25T02:34:52Z
exit code: 0
termination reason: Completed

container name: istio-proxy
container image: docker-images-release.X.com/X-istio/proxyv2:X
container state: running
container started at: 2024-09-25T02:32:16Z

container name: spark-kubernetes-executor
container image: docker-dev-artifactory.X.com/X/spark-X:X
container state: terminated
container started at: 2024-09-25T02:32:17Z
container finished at: 2024-09-25T02:35:28Z
exit code: 52
termination reason: Error
{code}

  was:
When deploying Spark pods on Kubernetes with sidecars, the reported executor's exit code may be incorrect. For example, the reported executor's exit code is 0, but the actual is 52.

{code:java}
2024-09-25 02:35:29,383 ERROR TaskSchedulerImpl: org.apache.spark.scheduler.TaskSchedulerImpl.logExecutorLoss(TaskSchedulerImpl.scala:972) - Lost executor 1 on X: The executor with id 1 exited with exit code 0(success).

The API gave the following container statuses:

container name: fluentd
container image: docker-images-release.X.com/X/fluentd:X
container state: terminated
container started at: 2024-09-25T02:32:17Z
container finished at: 2024-09-25T02:34:52Z
exit code: 0
termination reason: Completed

container name: istio-proxy
container image: docker-images-release.X.com/X-istio/proxyv2:X
container state: running
container started at: 2024-09-25T02:32:16Z

container name: spark-kubernetes-executor
container image: docker-dev-artifactory.X.com/X/spark-X:X
container state: terminated
container started at: 2024-09-25T02:32:17Z
container finished at: 2024-09-25T02:35:28Z
exit code: 52
termination reason: Error
{code}

> Incorrect exit code on Kubernetes when deploying with sidecars
> ---------------------------------------------------------------
>
>                 Key: SPARK-49804
>                 URL: https://issues.apache.org/jira/browse/SPARK-49804
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes
>    Affects Versions: 3.1.1, 3.4.3, 3.5.3
>            Reporter: Oleksiy Dyagilev
>            Priority: Minor
>
> When deploying Spark pods on Kubernetes with sidecars, the reported
> executor's exit code may be incorrect.
> For example, the reported executor's exit code is 0, but the actual is 52
> (OOM).
> {code:java}
> 2024-09-25 02:35:29,383 ERROR TaskSchedulerImpl:
> org.apache.spark.scheduler.TaskSchedulerImpl.logExecutorLoss(TaskSchedulerImpl.scala:972)
> - Lost executor 1 on X: The executor with id 1 exited with exit code
> 0(success).
>
> The API gave the following container statuses:
>
> container name: fluentd
> container image: docker-images-release.X.com/X/fluentd:X
> container state: terminated
> container started at: 2024-09-25T02:32:17Z
> container finished at: 2024-09-25T02:34:52Z
> exit code: 0
> termination reason: Completed
>
> container name: istio-proxy
> container image: docker-images-release.X.com/X-istio/proxyv2:X
> container state: running
> container started at: 2024-09-25T02:32:16Z
>
> container name: spark-kubernetes-executor
> container image: docker-dev-artifactory.X.com/X/spark-X:X
> container state: terminated
> container started at: 2024-09-25T02:32:17Z
> container finished at: 2024-09-25T02:35:28Z
> exit code: 52
> termination reason: Error {code}
[jira] [Created] (SPARK-49804) Incorrect exit code on Kubernetes when deploying with sidecars
Oleksiy Dyagilev created SPARK-49804:
----------------------------------------

             Summary: Incorrect exit code on Kubernetes when deploying with sidecars
                 Key: SPARK-49804
                 URL: https://issues.apache.org/jira/browse/SPARK-49804
             Project: Spark
          Issue Type: Bug
          Components: Kubernetes
    Affects Versions: 3.5.3, 3.4.3, 3.1.1
            Reporter: Oleksiy Dyagilev


When deploying Spark pods on Kubernetes with sidecars, the reported executor's exit code may be incorrect. For example, the reported executor's exit code is 0, but the actual is 52.

{code:java}
2024-09-25 02:35:29,383 ERROR TaskSchedulerImpl: org.apache.spark.scheduler.TaskSchedulerImpl.logExecutorLoss(TaskSchedulerImpl.scala:972) - Lost executor 1 on X: The executor with id 1 exited with exit code 0(success).

The API gave the following container statuses:

container name: fluentd
container image: docker-images-release.X.com/X/fluentd:X
container state: terminated
container started at: 2024-09-25T02:32:17Z
container finished at: 2024-09-25T02:34:52Z
exit code: 0
termination reason: Completed

container name: istio-proxy
container image: docker-images-release.X.com/X-istio/proxyv2:X
container state: running
container started at: 2024-09-25T02:32:16Z

container name: spark-kubernetes-executor
container image: docker-dev-artifactory.X.com/X/spark-X:X
container state: terminated
container started at: 2024-09-25T02:32:17Z
container finished at: 2024-09-25T02:35:28Z
exit code: 52
termination reason: Error
{code}
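The failure mode in the log is that the first terminated container in the pod status (here, the fluentd sidecar) supplies the reported exit code. A minimal sketch of the distinction, using a simplified {{ContainerStatus}} stand-in rather than the real Kubernetes client types (only the container names and exit codes are taken from the log above; everything else is illustrative, not the actual Spark fix):

{code:scala}
// Simplified stand-in for the Kubernetes container status model.
case class ContainerStatus(name: String, exitCode: Option[Int])

val statuses = Seq(
  ContainerStatus("fluentd", Some(0)),                    // sidecar, terminated first with 0
  ContainerStatus("istio-proxy", None),                   // sidecar, still running
  ContainerStatus("spark-kubernetes-executor", Some(52))  // executor container, exited 52
)

// Buggy selection as described in the report: the first terminated container wins.
val firstTerminated = statuses.flatMap(_.exitCode).headOption  // Some(0), wrong

// Selecting the executor container by name yields the real exit code.
val executorExit = statuses
  .find(_.name == "spark-kubernetes-executor")
  .flatMap(_.exitCode)                                         // Some(52)
{code}

Whether the actual fix keys on the container name or some other marker is not shown here; the point is that sidecar statuses must be excluded when deriving the executor's exit code.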
[jira] [Created] (SPARK-41554) Decimal.changePrecision produces ArrayIndexOutOfBoundsException
Oleksiy Dyagilev created SPARK-41554:
----------------------------------------

             Summary: Decimal.changePrecision produces ArrayIndexOutOfBoundsException
                 Key: SPARK-41554
                 URL: https://issues.apache.org/jira/browse/SPARK-41554
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.3.1
            Reporter: Oleksiy Dyagilev


Reducing the {{Decimal}} scale by more than 18 produces an exception:

{code:java}
Decimal(1, 38, 19).changePrecision(38, 0){code}
{code:java}
java.lang.ArrayIndexOutOfBoundsException: 19
  at org.apache.spark.sql.types.Decimal.changePrecision(Decimal.scala:377)
  at org.apache.spark.sql.types.Decimal.changePrecision(Decimal.scala:328){code}

Reproducing with a SQL query:

{code:java}
sql("select cast(cast(cast(cast(id as decimal(38,15)) as decimal(38,30)) as decimal(38,37)) as decimal(38,17)) from range(3)").show{code}

The bug exists only for a {{Decimal}} stored as a compact long; it works fine when the {{Decimal}} uses {{scala.math.BigDecimal}} internally.
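For context, a hedged sketch of the likely failure mode on the compact-long path (illustrative, not the actual Spark source): a long holds values only up to about 9.2 * 10^18, so a power-of-ten lookup table sized for longs has 19 entries, 10^0 through 10^18, and a scale reduction greater than 18 indexes past its end:

{code:scala}
// A long fits at most ~9.2e18, so a power table for longs stops at 10^18 (19 entries).
val POW_10: Array[Long] = Array.iterate(1L, 19)(_ * 10L)

// Scaling a compact (long-backed) unscaled value down by `diff` digits:
def scaleDownCompact(unscaled: Long, oldScale: Int, newScale: Int): Long = {
  val diff = oldScale - newScale
  // With oldScale = 19 and newScale = 0, diff = 19 and POW_10(19) throws
  // ArrayIndexOutOfBoundsException: 19, matching the reported stack trace.
  unscaled / POW_10(diff)
}
{code}

On the {{BigDecimal}}-backed path no such table is needed, which is consistent with the report that only compact-long decimals are affected.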
[jira] [Commented] (SPARK-1199) Type mismatch in Spark shell when using case class defined in shell
[ https://issues.apache.org/jira/browse/SPARK-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163000#comment-15163000 ]

Oleksiy Dyagilev commented on SPARK-1199:
-----------------------------------------

Yes, I did. It doesn't help; the inner class still doesn't have a no-arg constructor visible via reflection.

> Type mismatch in Spark shell when using case class defined in shell
> --------------------------------------------------------------------
>
>                 Key: SPARK-1199
>                 URL: https://issues.apache.org/jira/browse/SPARK-1199
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 0.9.0
>            Reporter: Andrew Kerr
>            Assignee: Prashant Sharma
>            Priority: Blocker
>             Fix For: 1.1.0
>
>
> *NOTE: This issue was fixed in 1.0.1, but the fix was reverted in Spark 1.0.2
> pending further testing. The final fix will be in Spark 1.1.0.*
> Define a class in the shell:
> {code}
> case class TestClass(a:String)
> {code}
> and an RDD
> {code}
> val data = sc.parallelize(Seq("a")).map(TestClass(_))
> {code}
> define a function on it and map over the RDD
> {code}
> def itemFunc(a:TestClass):TestClass = a
> data.map(itemFunc)
> {code}
> Error:
> {code}
> <console>:19: error: type mismatch;
>  found   : TestClass => TestClass
>  required: TestClass => ?
>        data.map(itemFunc)
> {code}
> Similarly with a mapPartitions:
> {code}
> def partitionFunc(a:Iterator[TestClass]):Iterator[TestClass] = a
> data.mapPartitions(partitionFunc)
> {code}
> {code}
> <console>:19: error: type mismatch;
>  found   : Iterator[TestClass] => Iterator[TestClass]
>  required: Iterator[TestClass] => Iterator[?]
> Error occurred in an application involving default arguments.
>        data.mapPartitions(partitionFunc)
> {code}
> The behavior is the same whether in local mode or on a cluster.
> This isn't specific to RDDs. A Scala collection in the Spark shell has the
> same problem.
> {code}
> scala> Seq(TestClass("foo")).map(itemFunc)
> <console>:15: error: type mismatch;
>  found   : TestClass => TestClass
>  required: TestClass => ?
>               Seq(TestClass("foo")).map(itemFunc)
>                                         ^
> {code}
> When run in the Scala console (not the Spark shell) there are no type
> mismatch errors.
[jira] [Commented] (SPARK-1199) Type mismatch in Spark shell when using case class defined in shell
[ https://issues.apache.org/jira/browse/SPARK-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15159448#comment-15159448 ]

Oleksiy Dyagilev commented on SPARK-1199:
-----------------------------------------

Thanks, any other options? I want to be able to define classes in the REPL.
[jira] [Commented] (SPARK-1199) Type mismatch in Spark shell when using case class defined in shell
[ https://issues.apache.org/jira/browse/SPARK-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15159432#comment-15159432 ]

Oleksiy Dyagilev commented on SPARK-1199:
-----------------------------------------

Michael Armbrust, in my use case I have a library that relies on having a default constructor, and I want to use this library in the REPL. Any workaround for that?
[jira] [Commented] (SPARK-1199) Type mismatch in Spark shell when using case class defined in shell
[ https://issues.apache.org/jira/browse/SPARK-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15159338#comment-15159338 ]

Oleksiy Dyagilev commented on SPARK-1199:
-----------------------------------------

I have problems with declaring case classes in the shell, Spark 1.6. This doesn't work for me:

{code}
scala> case class ABCD()
defined class ABCD

scala> new ABCD()
res33: ABCD = ABCD()

scala> classOf[ABCD].getConstructor()
java.lang.NoSuchMethodException: $iwC$$iwC$ABCD.<init>()
  at java.lang.Class.getConstructor0(Class.java:3074)
  at java.lang.Class.getConstructor(Class.java:1817)

scala> classOf[ABCD].getConstructors()
res31: Array[java.lang.reflect.Constructor[_]] = Array(public $iwC$$iwC$ABCD($iwC$$iwC))
{code}
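The constructor signature in that last line is the crux: the REPL compiles each input line into nested wrapper classes (the {{$iwC}} names are REPL-generated), so a case class defined in the shell is really an inner class whose every constructor takes the enclosing instance. A standalone sketch of the same effect outside the REPL (the names {{Wrapper}} and {{Demo}} are illustrative):

{code:scala}
// A nested case class behaves like a shell-defined one: its constructor
// secretly takes the enclosing instance, so there is no true no-arg constructor.
class Wrapper {
  case class ABCD()
}

object Demo extends App {
  val w = new Wrapper
  val cls = w.ABCD().getClass

  // Throws NoSuchMethodException, just like in the Spark shell:
  // cls.getConstructor()

  // The only constructor takes the enclosing Wrapper instance:
  cls.getConstructors.foreach(println)  // e.g. public Wrapper$ABCD(Wrapper)
}
{code}

Libraries that instantiate classes reflectively through a no-arg constructor therefore cannot construct shell-defined case classes; compiling the class into a jar on the classpath, so it is genuinely top-level, sidesteps the wrapping.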
[jira] [Commented] (SPARK-8525) Bug in Streaming k-means documentation
[ https://issues.apache.org/jira/browse/SPARK-8525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597798#comment-14597798 ]

Oleksiy Dyagilev commented on SPARK-8525:
-----------------------------------------

Makes sense: https://github.com/apache/spark/pull/6954

> Bug in Streaming k-means documentation
> --------------------------------------
>
>                 Key: SPARK-8525
>                 URL: https://issues.apache.org/jira/browse/SPARK-8525
>             Project: Spark
>          Issue Type: Bug
>          Components: Documentation, MLlib
>    Affects Versions: 1.4.0
>            Reporter: Oleksiy Dyagilev
>            Priority: Minor
>
> The expected input format is wrong in the Streaming K-means documentation:
> https://spark.apache.org/docs/latest/mllib-clustering.html#streaming-k-means
> It might be a bug in the implementation though, not sure.
> There shouldn't be any spaces in test data points, i.e. instead of
> (y, [x1, x2, x3]) it should be
> (y,[x1,x2,x3])
> The exception thrown:
> org.apache.spark.SparkException: Cannot parse a double from:
>   at org.apache.spark.mllib.util.NumericParser$.parseDouble(NumericParser.scala:118)
>   at org.apache.spark.mllib.util.NumericParser$.parseTuple(NumericParser.scala:103)
>   at org.apache.spark.mllib.util.NumericParser$.parse(NumericParser.scala:41)
>   at org.apache.spark.mllib.regression.LabeledPoint$.parse(LabeledPoint.scala:49)
> Also I would improve the documentation by saying explicitly that the expected
> data type for both 'x' and 'y' is Double. At the moment it's not obvious,
> especially for 'y'.
[jira] [Created] (SPARK-8558) Script /dev/run-tests fails when _JAVA_OPTIONS env var set
Oleksiy Dyagilev created SPARK-8558:
---------------------------------------

             Summary: Script /dev/run-tests fails when _JAVA_OPTIONS env var set
                 Key: SPARK-8558
                 URL: https://issues.apache.org/jira/browse/SPARK-8558
             Project: Spark
          Issue Type: Bug
          Components: Build, Tests
    Affects Versions: 1.4.0
         Environment: Centos 6
            Reporter: Oleksiy Dyagilev
            Priority: Minor


The script dev/run-tests.py fails when the _JAVA_OPTIONS env var is set.

Steps to reproduce on Linux:

1. export _JAVA_OPTIONS="-Xmx2048M"
2. ./dev/run-tests

[pivot@fe2s spark]$ ./dev/run-tests
Traceback (most recent call last):
  File "./dev/run-tests.py", line 793, in <module>
    main()
  File "./dev/run-tests.py", line 722, in main
    java_version = determine_java_version(java_exe)
  File "./dev/run-tests.py", line 484, in determine_java_version
    version, update = version_str.split('_') # eg ['1.8.0', '25']
ValueError: need more than 1 value to unpack

The problem is in the 'determine_java_version' function in run-tests.py. It runs 'java' and extracts the version from the output. However, when _JAVA_OPTIONS is set, the output of the 'java' command is different and breaks the parser. See the first line:

[pivot@fe2s spark]$ java -version
Picked up _JAVA_OPTIONS: -Xmx2048M
java version "1.8.0_31"
Java(TM) SE Runtime Environment (build 1.8.0_31-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.31-b07, mixed mode)
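The fragile step is treating the first line of `java -version` output as the version string. A sketch of a more tolerant selection (written in Scala for consistency with the other examples here; the actual script is Python, and only the function behavior described by the traceback is assumed):

{code:scala}
// `java -version` output with _JAVA_OPTIONS set: a banner line precedes the version.
val output = Seq(
  "Picked up _JAVA_OPTIONS: -Xmx2048M",
  "java version \"1.8.0_31\"",
  "Java(TM) SE Runtime Environment (build 1.8.0_31-b13)"
)

// Instead of output(0), select the line that actually carries the version.
val versionLine = output
  .find(_.contains("version"))
  .getOrElse(sys.error("cannot find java version line"))

// Extract the quoted version and split on '_': yields ("1.8.0", "31").
val Array(version, update) = versionLine.split('"')(1).split('_')
{code}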
[jira] [Updated] (SPARK-8525) Bug in Streaming k-means documentation
[ https://issues.apache.org/jira/browse/SPARK-8525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Oleksiy Dyagilev updated SPARK-8525:
------------------------------------
    Description: 
The expected input format is wrong in the Streaming K-means documentation:
https://spark.apache.org/docs/latest/mllib-clustering.html#streaming-k-means

It might be a bug in the implementation though, not sure.

There shouldn't be any spaces in test data points, i.e. instead of
(y, [x1, x2, x3]) it should be
(y,[x1,x2,x3])

The exception thrown:

org.apache.spark.SparkException: Cannot parse a double from:
  at org.apache.spark.mllib.util.NumericParser$.parseDouble(NumericParser.scala:118)
  at org.apache.spark.mllib.util.NumericParser$.parseTuple(NumericParser.scala:103)
  at org.apache.spark.mllib.util.NumericParser$.parse(NumericParser.scala:41)
  at org.apache.spark.mllib.regression.LabeledPoint$.parse(LabeledPoint.scala:49)

Also I would improve the documentation by saying explicitly that the expected data type for both 'x' and 'y' is Double. At the moment it's not obvious, especially for 'y'.

  was:
The expected input format is wrong in the Streaming K-means documentation:
https://spark.apache.org/docs/latest/mllib-clustering.html#streaming-k-means

There shouldn't be any spaces in test data points, i.e. instead of
(y, [x1, x2, x3]) it should be
(y,[x1,x2,x3])

The exception thrown:

org.apache.spark.SparkException: Cannot parse a double from:
  at org.apache.spark.mllib.util.NumericParser$.parseDouble(NumericParser.scala:118)
  at org.apache.spark.mllib.util.NumericParser$.parseTuple(NumericParser.scala:103)
  at org.apache.spark.mllib.util.NumericParser$.parse(NumericParser.scala:41)
  at org.apache.spark.mllib.regression.LabeledPoint$.parse(LabeledPoint.scala:49)

Also I would improve the documentation by saying explicitly that the expected data type for both 'x' and 'y' is Double. At the moment it's not obvious, especially for 'y'.

> Bug in Streaming k-means documentation
> --------------------------------------
>
>                 Key: SPARK-8525
>                 URL: https://issues.apache.org/jira/browse/SPARK-8525
>             Project: Spark
>          Issue Type: Bug
>          Components: Documentation, MLlib
>    Affects Versions: 1.4.0
>            Reporter: Oleksiy Dyagilev
>            Priority: Critical
>
> The expected input format is wrong in the Streaming K-means documentation:
> https://spark.apache.org/docs/latest/mllib-clustering.html#streaming-k-means
> It might be a bug in the implementation though, not sure.
> There shouldn't be any spaces in test data points, i.e. instead of
> (y, [x1, x2, x3]) it should be
> (y,[x1,x2,x3])
> The exception thrown:
> org.apache.spark.SparkException: Cannot parse a double from:
>   at org.apache.spark.mllib.util.NumericParser$.parseDouble(NumericParser.scala:118)
>   at org.apache.spark.mllib.util.NumericParser$.parseTuple(NumericParser.scala:103)
>   at org.apache.spark.mllib.util.NumericParser$.parse(NumericParser.scala:41)
>   at org.apache.spark.mllib.regression.LabeledPoint$.parse(LabeledPoint.scala:49)
> Also I would improve the documentation by saying explicitly that the expected
> data type for both 'x' and 'y' is Double. At the moment it's not obvious,
> especially for 'y'.
[jira] [Created] (SPARK-8525) Bug in Streaming k-means documentation
Oleksiy Dyagilev created SPARK-8525:
---------------------------------------

             Summary: Bug in Streaming k-means documentation
                 Key: SPARK-8525
                 URL: https://issues.apache.org/jira/browse/SPARK-8525
             Project: Spark
          Issue Type: Bug
          Components: Documentation, MLlib
    Affects Versions: 1.4.0
            Reporter: Oleksiy Dyagilev
            Priority: Critical


The expected input format is wrong in the Streaming K-means documentation:
https://spark.apache.org/docs/latest/mllib-clustering.html#streaming-k-means

There shouldn't be any spaces in test data points, i.e. instead of
(y, [x1, x2, x3]) it should be
(y,[x1,x2,x3])

The exception thrown:

org.apache.spark.SparkException: Cannot parse a double from:
  at org.apache.spark.mllib.util.NumericParser$.parseDouble(NumericParser.scala:118)
  at org.apache.spark.mllib.util.NumericParser$.parseTuple(NumericParser.scala:103)
  at org.apache.spark.mllib.util.NumericParser$.parse(NumericParser.scala:41)
  at org.apache.spark.mllib.regression.LabeledPoint$.parse(LabeledPoint.scala:49)

Also I would improve the documentation by saying explicitly that the expected data type for both 'x' and 'y' is Double. At the moment it's not obvious, especially for 'y'.
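The parse path in the stack trace is {{LabeledPoint.parse}}, so the format constraint can be checked directly. A small sketch of the reported behavior (assumes spark-mllib 1.4.0 on the classpath; the exact failure with spaces is as described above, not re-verified here):

{code:scala}
import org.apache.spark.mllib.regression.LabeledPoint

// No spaces: parses into label 1.0 and features [2.0, 3.0].
val ok = LabeledPoint.parse("(1.0,[2.0,3.0])")

// With spaces, NumericParser fails as in the stack trace above:
// org.apache.spark.SparkException: Cannot parse a double from: ...
// val bad = LabeledPoint.parse("(1.0, [2.0, 3.0])")
{code}

Both the label and the features must parse as Double, which is exactly the documentation clarification the report asks for.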