[jira] [Updated] (SPARK-49804) Incorrect exit code on Kubernetes when deploying with sidecars

2024-09-26 Thread Oleksiy Dyagilev (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oleksiy Dyagilev updated SPARK-49804:
-
Description: 
When deploying Spark pods on Kubernetes with sidecars, the reported executor's 
exit code may be incorrect.

For example, the reported executor's exit code is 0, but the actual exit code is 52 (OOM).
{code:java}
2024-09-25 02:35:29,383 ERROR TaskSchedulerImpl: 
org.apache.spark.scheduler.TaskSchedulerImpl.logExecutorLoss(TaskSchedulerImpl.scala:972)
 - Lost executor 1 on X: The executor with id 1 exited with exit code 
0(success).
  
The API gave the following container statuses:
 
     container name: fluentd
     container image: docker-images-release.X.com/X/fluentd:X
     container state: terminated
     container started at: 2024-09-25T02:32:17Z
     container finished at: 2024-09-25T02:34:52Z
     exit code: 0
     termination reason: Completed
 
     container name: istio-proxy
     container image: docker-images-release.X.com/X-istio/proxyv2:X
     container state: running
     container started at: 2024-09-25T02:32:16Z
 
     container name: spark-kubernetes-executor
     container image: docker-dev-artifactory.X.com/X/spark-X:X
     container state: terminated
     container started at: 2024-09-25T02:32:17Z
     container finished at: 2024-09-25T02:35:28Z
     exit code: 52
     termination reason: Error {code}
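
The status dump above shows the reported code (0) matching the terminated fluentd sidecar rather than the spark-kubernetes-executor container (52). A minimal sketch of the kind of selection that avoids this, using the fabric8 Kubernetes client model classes (the container name constant and the fallback value below are assumptions for illustration, not the actual Spark code):
{code:scala}
import io.fabric8.kubernetes.api.model.Pod
import scala.jdk.CollectionConverters._

object ExecutorExitCodeSketch {
  // Assumed names, for illustration only.
  val ExecutorContainerName = "spark-kubernetes-executor"
  val UnknownExitCode = -1

  // Take the exit code from the executor container itself, not from whichever
  // container happens to have terminated first (above, the fluentd sidecar).
  def executorExitCode(pod: Pod): Int =
    pod.getStatus.getContainerStatuses.asScala
      .filter(_.getName == ExecutorContainerName)                            // skip sidecars
      .flatMap(s => Option(s.getState).flatMap(st => Option(st.getTerminated)))
      .map(_.getExitCode.toInt)                                              // 52 in the dump above
      .headOption
      .getOrElse(UnknownExitCode)
}
{code}
For the pod status shown above, this returns 52 instead of 0.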

  was:
When deploying Spark pods on Kubernetes with sidecars, the reported executor's 
exit code may be incorrect.

For example, the reported executor's exit code is 0, but the actual is 52.
{code:java}
2024-09-25 02:35:29,383 ERROR TaskSchedulerImpl: 
org.apache.spark.scheduler.TaskSchedulerImpl.logExecutorLoss(TaskSchedulerImpl.scala:972)
 - Lost executor 1 on X: The executor with id 1 exited with exit code 
0(success).
  
The API gave the following container statuses:
 
     container name: fluentd
     container image: docker-images-release.X.com/X/fluentd:X
     container state: terminated
     container started at: 2024-09-25T02:32:17Z
     container finished at: 2024-09-25T02:34:52Z
     exit code: 0
     termination reason: Completed
 
     container name: istio-proxy
     container image: docker-images-release.X.com/X-istio/proxyv2:X
     container state: running
     container started at: 2024-09-25T02:32:16Z
 
     container name: spark-kubernetes-executor
     container image: docker-dev-artifactory.X.com/X/spark-X:X
     container state: terminated
     container started at: 2024-09-25T02:32:17Z
     container finished at: 2024-09-25T02:35:28Z
     exit code: 52
     termination reason: Error {code}


> Incorrect exit code on Kubernetes when deploying with sidecars
> --
>
> Key: SPARK-49804
> URL: https://issues.apache.org/jira/browse/SPARK-49804
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.1.1, 3.4.3, 3.5.3
>Reporter: Oleksiy Dyagilev
>Priority: Minor
>
> When deploying Spark pods on Kubernetes with sidecars, the reported 
> executor's exit code may be incorrect.
> For example, the reported executor's exit code is 0, but the actual is 52 
> (OOM).
> {code:java}
> 2024-09-25 02:35:29,383 ERROR TaskSchedulerImpl: 
> org.apache.spark.scheduler.TaskSchedulerImpl.logExecutorLoss(TaskSchedulerImpl.scala:972)
>  - Lost executor 1 on X: The executor with id 1 exited with exit code 
> 0(success).
>   
> The API gave the following container statuses:
>  
>      container name: fluentd
>      container image: docker-images-release.X.com/X/fluentd:X
>      container state: terminated
>      container started at: 2024-09-25T02:32:17Z
>      container finished at: 2024-09-25T02:34:52Z
>      exit code: 0
>      termination reason: Completed
>  
>      container name: istio-proxy
>      container image: 
> docker-images-release.X.com/X-istio/proxyv2:X
>      container state: running
>      container started at: 2024-09-25T02:32:16Z
>  
>      container name: spark-kubernetes-executor
>      container image: docker-dev-artifactory.X.com/X/spark-X:X
>      container state: terminated
>      container started at: 2024-09-25T02:32:17Z
>      container finished at: 2024-09-25T02:35:28Z
>      exit code: 52
>      termination reason: Error {code}






[jira] [Created] (SPARK-49804) Incorrect exit code on Kubernetes when deploying with sidecars

2024-09-26 Thread Oleksiy Dyagilev (Jira)
Oleksiy Dyagilev created SPARK-49804:


 Summary: Incorrect exit code on Kubernetes when deploying with 
sidecars
 Key: SPARK-49804
 URL: https://issues.apache.org/jira/browse/SPARK-49804
 Project: Spark
  Issue Type: Bug
  Components: Kubernetes
Affects Versions: 3.5.3, 3.4.3, 3.1.1
Reporter: Oleksiy Dyagilev


When deploying Spark pods on Kubernetes with sidecars, the reported executor's 
exit code may be incorrect.

For example, the reported executor's exit code is 0, but the actual exit code is 52.
{code:java}
2024-09-25 02:35:29,383 ERROR TaskSchedulerImpl: 
org.apache.spark.scheduler.TaskSchedulerImpl.logExecutorLoss(TaskSchedulerImpl.scala:972)
 - Lost executor 1 on X: The executor with id 1 exited with exit code 
0(success).
  
The API gave the following container statuses:
 
     container name: fluentd
     container image: docker-images-release.X.com/X/fluentd:X
     container state: terminated
     container started at: 2024-09-25T02:32:17Z
     container finished at: 2024-09-25T02:34:52Z
     exit code: 0
     termination reason: Completed
 
     container name: istio-proxy
     container image: docker-images-release.X.com/X-istio/proxyv2:X
     container state: running
     container started at: 2024-09-25T02:32:16Z
 
     container name: spark-kubernetes-executor
     container image: docker-dev-artifactory.X.com/X/spark-X:X
     container state: terminated
     container started at: 2024-09-25T02:32:17Z
     container finished at: 2024-09-25T02:35:28Z
     exit code: 52
     termination reason: Error {code}






[jira] [Created] (SPARK-41554) Decimal.changePrecision produces ArrayIndexOutOfBoundsException

2022-12-16 Thread Oleksiy Dyagilev (Jira)
Oleksiy Dyagilev created SPARK-41554:


 Summary: Decimal.changePrecision produces 
ArrayIndexOutOfBoundsException
 Key: SPARK-41554
 URL: https://issues.apache.org/jira/browse/SPARK-41554
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.3.1
Reporter: Oleksiy Dyagilev


Reducing the scale of a {{Decimal}} by more than 18 produces an exception.
{code:java}
Decimal(1, 38, 19).changePrecision(38, 0){code}
{code:java}
java.lang.ArrayIndexOutOfBoundsException: 19
    at org.apache.spark.sql.types.Decimal.changePrecision(Decimal.scala:377)
    at 
org.apache.spark.sql.types.Decimal.changePrecision(Decimal.scala:328){code}
Reproducing with SQL query:
{code:java}
sql("select cast(cast(cast(cast(id as decimal(38,15)) as decimal(38,30)) as 
decimal(38,37)) as decimal(38,17)) from range(3)").show{code}
The bug exists only for a {{Decimal}} that is stored using the compact long representation; it works fine with a {{Decimal}} that uses {{scala.math.BigDecimal}} internally.
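
For reference, a quick REPL sketch contrasting the two representations (constructing from {{scala.math.BigDecimal}} below is just one way to force the non-compact path):
{code:scala}
import org.apache.spark.sql.types.Decimal

// Compact, long-backed Decimal: reducing the scale from 19 to 0 throws
// ArrayIndexOutOfBoundsException, as shown above.
// Decimal(1L, 38, 19).changePrecision(38, 0)

// The same logical value built from scala.math.BigDecimal keeps the
// non-compact representation, and the precision change succeeds.
val d = Decimal(BigDecimal(1L, 19), 38, 19)
assert(d.changePrecision(38, 0))
{code}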






[jira] [Commented] (SPARK-1199) Type mismatch in Spark shell when using case class defined in shell

2016-02-24 Thread Oleksiy Dyagilev (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163000#comment-15163000
 ] 

Oleksiy Dyagilev commented on SPARK-1199:
-

Yes, I did. It doesn't help; the inner class still doesn't have a no-arg constructor visible via reflection.

> Type mismatch in Spark shell when using case class defined in shell
> ---
>
> Key: SPARK-1199
> URL: https://issues.apache.org/jira/browse/SPARK-1199
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 0.9.0
>Reporter: Andrew Kerr
>Assignee: Prashant Sharma
>Priority: Blocker
> Fix For: 1.1.0
>
>
> *NOTE: This issue was fixed in 1.0.1, but the fix was reverted in Spark 1.0.2 
> pending further testing. The final fix will be in Spark 1.1.0.*
> Define a class in the shell:
> {code}
> case class TestClass(a:String)
> {code}
> and an RDD
> {code}
> val data = sc.parallelize(Seq("a")).map(TestClass(_))
> {code}
> define a function on it and map over the RDD
> {code}
> def itemFunc(a:TestClass):TestClass = a
> data.map(itemFunc)
> {code}
> Error:
> {code}
> :19: error: type mismatch;
>  found   : TestClass => TestClass
>  required: TestClass => ?
>   data.map(itemFunc)
> {code}
> Similarly with a mapPartitions:
> {code}
> def partitionFunc(a:Iterator[TestClass]):Iterator[TestClass] = a
> data.mapPartitions(partitionFunc)
> {code}
> {code}
> :19: error: type mismatch;
>  found   : Iterator[TestClass] => Iterator[TestClass]
>  required: Iterator[TestClass] => Iterator[?]
> Error occurred in an application involving default arguments.
>   data.mapPartitions(partitionFunc)
> {code}
> The behavior is the same whether in local mode or on a cluster.
> This isn't specific to RDDs. A Scala collection in the Spark shell has the 
> same problem.
> {code}
> scala> Seq(TestClass("foo")).map(itemFunc)
> :15: error: type mismatch;
>  found   : TestClass => TestClass
>  required: TestClass => ?
>   Seq(TestClass("foo")).map(itemFunc)
> ^
> {code}
> When run in the Scala console (not the Spark shell) there are no type 
> mismatch errors.






[jira] [Commented] (SPARK-1199) Type mismatch in Spark shell when using case class defined in shell

2016-02-23 Thread Oleksiy Dyagilev (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15159448#comment-15159448
 ] 

Oleksiy Dyagilev commented on SPARK-1199:
-

Thanks, any other options? I want to be able to define classes in the REPL.

> Type mismatch in Spark shell when using case class defined in shell
> ---
>
> Key: SPARK-1199
> URL: https://issues.apache.org/jira/browse/SPARK-1199
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 0.9.0
>Reporter: Andrew Kerr
>Assignee: Prashant Sharma
>Priority: Blocker
> Fix For: 1.1.0
>
>
> *NOTE: This issue was fixed in 1.0.1, but the fix was reverted in Spark 1.0.2 
> pending further testing. The final fix will be in Spark 1.1.0.*
> Define a class in the shell:
> {code}
> case class TestClass(a:String)
> {code}
> and an RDD
> {code}
> val data = sc.parallelize(Seq("a")).map(TestClass(_))
> {code}
> define a function on it and map over the RDD
> {code}
> def itemFunc(a:TestClass):TestClass = a
> data.map(itemFunc)
> {code}
> Error:
> {code}
> :19: error: type mismatch;
>  found   : TestClass => TestClass
>  required: TestClass => ?
>   data.map(itemFunc)
> {code}
> Similarly with a mapPartitions:
> {code}
> def partitionFunc(a:Iterator[TestClass]):Iterator[TestClass] = a
> data.mapPartitions(partitionFunc)
> {code}
> {code}
> :19: error: type mismatch;
>  found   : Iterator[TestClass] => Iterator[TestClass]
>  required: Iterator[TestClass] => Iterator[?]
> Error occurred in an application involving default arguments.
>   data.mapPartitions(partitionFunc)
> {code}
> The behavior is the same whether in local mode or on a cluster.
> This isn't specific to RDDs. A Scala collection in the Spark shell has the 
> same problem.
> {code}
> scala> Seq(TestClass("foo")).map(itemFunc)
> :15: error: type mismatch;
>  found   : TestClass => TestClass
>  required: TestClass => ?
>   Seq(TestClass("foo")).map(itemFunc)
> ^
> {code}
> When run in the Scala console (not the Spark shell) there are no type 
> mismatch errors.






[jira] [Commented] (SPARK-1199) Type mismatch in Spark shell when using case class defined in shell

2016-02-23 Thread Oleksiy Dyagilev (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15159432#comment-15159432
 ] 

Oleksiy Dyagilev commented on SPARK-1199:
-

Michael Armbrust, in my use case I have a library that relies on having a default constructor, and I want to use this library in the REPL. Is there any workaround for that?

> Type mismatch in Spark shell when using case class defined in shell
> ---
>
> Key: SPARK-1199
> URL: https://issues.apache.org/jira/browse/SPARK-1199
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 0.9.0
>Reporter: Andrew Kerr
>Assignee: Prashant Sharma
>Priority: Blocker
> Fix For: 1.1.0
>
>
> *NOTE: This issue was fixed in 1.0.1, but the fix was reverted in Spark 1.0.2 
> pending further testing. The final fix will be in Spark 1.1.0.*
> Define a class in the shell:
> {code}
> case class TestClass(a:String)
> {code}
> and an RDD
> {code}
> val data = sc.parallelize(Seq("a")).map(TestClass(_))
> {code}
> define a function on it and map over the RDD
> {code}
> def itemFunc(a:TestClass):TestClass = a
> data.map(itemFunc)
> {code}
> Error:
> {code}
> :19: error: type mismatch;
>  found   : TestClass => TestClass
>  required: TestClass => ?
>   data.map(itemFunc)
> {code}
> Similarly with a mapPartitions:
> {code}
> def partitionFunc(a:Iterator[TestClass]):Iterator[TestClass] = a
> data.mapPartitions(partitionFunc)
> {code}
> {code}
> :19: error: type mismatch;
>  found   : Iterator[TestClass] => Iterator[TestClass]
>  required: Iterator[TestClass] => Iterator[?]
> Error occurred in an application involving default arguments.
>   data.mapPartitions(partitionFunc)
> {code}
> The behavior is the same whether in local mode or on a cluster.
> This isn't specific to RDDs. A Scala collection in the Spark shell has the 
> same problem.
> {code}
> scala> Seq(TestClass("foo")).map(itemFunc)
> :15: error: type mismatch;
>  found   : TestClass => TestClass
>  required: TestClass => ?
>   Seq(TestClass("foo")).map(itemFunc)
> ^
> {code}
> When run in the Scala console (not the Spark shell) there are no type 
> mismatch errors.






[jira] [Commented] (SPARK-1199) Type mismatch in Spark shell when using case class defined in shell

2016-02-23 Thread Oleksiy Dyagilev (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15159338#comment-15159338
 ] 

Oleksiy Dyagilev commented on SPARK-1199:
-

I have problems declaring case classes in the shell (Spark 1.6).

This doesn't work for me:

{code}
scala> case class ABCD()
defined class ABCD

scala> new ABCD()
res33: ABCD = ABCD()

scala> classOf[ABCD].getConstructor()
java.lang.NoSuchMethodException: $iwC$$iwC$ABCD.<init>()
 at java.lang.Class.getConstructor0(Class.java:3074)
 at java.lang.Class.getConstructor(Class.java:1817)

scala> classOf[ABCD].getConstructors()
res31: Array[java.lang.reflect.Constructor[_]] = Array(public 
$iwC$$iwC$ABCD($iwC$$iwC))
{code}
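
For what it's worth, the same behaviour can be reproduced outside the REPL with an ordinary nested class, which is roughly what the shell's {{$iwC}} wrappers turn {{ABCD}} into (names below are made up for illustration):
{code:scala}
// A class nested inside another class only gets a constructor that takes the
// enclosing instance, so the reflective no-arg lookup fails just like in the shell.
class Wrapper {                  // stands in for the REPL's $iwC wrapper
  case class ABCD()
}

object ReplCtorDemo extends App {
  val outer = new Wrapper
  val abcd = new outer.ABCD()

  abcd.getClass.getConstructors.foreach(println)
  // prints: public Wrapper$ABCD(Wrapper)
  // abcd.getClass.getConstructor() would throw NoSuchMethodException, as above
}
{code}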

> Type mismatch in Spark shell when using case class defined in shell
> ---
>
> Key: SPARK-1199
> URL: https://issues.apache.org/jira/browse/SPARK-1199
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 0.9.0
>Reporter: Andrew Kerr
>Assignee: Prashant Sharma
>Priority: Blocker
> Fix For: 1.1.0
>
>
> *NOTE: This issue was fixed in 1.0.1, but the fix was reverted in Spark 1.0.2 
> pending further testing. The final fix will be in Spark 1.1.0.*
> Define a class in the shell:
> {code}
> case class TestClass(a:String)
> {code}
> and an RDD
> {code}
> val data = sc.parallelize(Seq("a")).map(TestClass(_))
> {code}
> define a function on it and map over the RDD
> {code}
> def itemFunc(a:TestClass):TestClass = a
> data.map(itemFunc)
> {code}
> Error:
> {code}
> :19: error: type mismatch;
>  found   : TestClass => TestClass
>  required: TestClass => ?
>   data.map(itemFunc)
> {code}
> Similarly with a mapPartitions:
> {code}
> def partitionFunc(a:Iterator[TestClass]):Iterator[TestClass] = a
> data.mapPartitions(partitionFunc)
> {code}
> {code}
> :19: error: type mismatch;
>  found   : Iterator[TestClass] => Iterator[TestClass]
>  required: Iterator[TestClass] => Iterator[?]
> Error occurred in an application involving default arguments.
>   data.mapPartitions(partitionFunc)
> {code}
> The behavior is the same whether in local mode or on a cluster.
> This isn't specific to RDDs. A Scala collection in the Spark shell has the 
> same problem.
> {code}
> scala> Seq(TestClass("foo")).map(itemFunc)
> :15: error: type mismatch;
>  found   : TestClass => TestClass
>  required: TestClass => ?
>   Seq(TestClass("foo")).map(itemFunc)
> ^
> {code}
> When run in the Scala console (not the Spark shell) there are no type 
> mismatch errors.






[jira] [Commented] (SPARK-8525) Bug in Streaming k-means documentation

2015-06-23 Thread Oleksiy Dyagilev (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597798#comment-14597798
 ] 

Oleksiy Dyagilev commented on SPARK-8525:
-

Makes sense, https://github.com/apache/spark/pull/6954

> Bug in Streaming k-means documentation
> --
>
> Key: SPARK-8525
> URL: https://issues.apache.org/jira/browse/SPARK-8525
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, MLlib
>Affects Versions: 1.4.0
>Reporter: Oleksiy Dyagilev
>Priority: Minor
>
> The expected input format is wrong in Streaming K-means documentation.
> https://spark.apache.org/docs/latest/mllib-clustering.html#streaming-k-means
> It might be a bug in implementation though, not sure.
> There shouldn't be any spaces in test data points. I.e. instead of 
> (y, [x1, x2, x3]) it should be
> (y,[x1,x2,x3])
> The exception thrown 
> org.apache.spark.SparkException: Cannot parse a double from:  
>   at 
> org.apache.spark.mllib.util.NumericParser$.parseDouble(NumericParser.scala:118)
>   at 
> org.apache.spark.mllib.util.NumericParser$.parseTuple(NumericParser.scala:103)
>   at 
> org.apache.spark.mllib.util.NumericParser$.parse(NumericParser.scala:41)
>   at 
> org.apache.spark.mllib.regression.LabeledPoint$.parse(LabeledPoint.scala:49)
> Also I would improve documentation saying explicitly that expected data types 
> for both 'x' and 'y' is Double. At the moment it's not obvious especially for 
> 'y'. 






[jira] [Created] (SPARK-8558) Script /dev/run-tests fails when _JAVA_OPTIONS env var set

2015-06-23 Thread Oleksiy Dyagilev (JIRA)
Oleksiy Dyagilev created SPARK-8558:
---

 Summary: Script /dev/run-tests fails when _JAVA_OPTIONS env var set
 Key: SPARK-8558
 URL: https://issues.apache.org/jira/browse/SPARK-8558
 Project: Spark
  Issue Type: Bug
  Components: Build, Tests
Affects Versions: 1.4.0
 Environment: Centos 6
Reporter: Oleksiy Dyagilev
Priority: Minor


The script dev/run-tests.py fails when the _JAVA_OPTIONS environment variable is set.

Steps to reproduce on Linux:
1. export _JAVA_OPTIONS="-Xmx2048M"
2. ./dev/run-tests

[pivot@fe2s spark]$ ./dev/run-tests
Traceback (most recent call last):
  File "./dev/run-tests.py", line 793, in 
main()
  File "./dev/run-tests.py", line 722, in main
java_version = determine_java_version(java_exe)
  File "./dev/run-tests.py", line 484, in determine_java_version
version, update = version_str.split('_')  # eg ['1.8.0', '25']
ValueError: need more than 1 value to unpack

The problem is in the 'determine_java_version' function in run-tests.py.
It runs 'java' and extracts the version from its output. However, when _JAVA_OPTIONS
is set, the output of the 'java' command contains an extra first line, which breaks
the parser:

[pivot@fe2s spark]$ java -version
Picked up _JAVA_OPTIONS: -Xmx2048M
java version "1.8.0_31"
Java(TM) SE Runtime Environment (build 1.8.0_31-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.31-b07, mixed mode)
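
The fix presumably just needs the parser to tolerate that banner. The script itself is Python, but the idea is easy to sketch (Scala here, names made up): drop the "Picked up _JAVA_OPTIONS" line before extracting the version.
{code:scala}
object JavaVersionSketch {
  // Matches e.g. 1.8.0_31 anywhere in a line.
  private val Version = """(\d+\.\d+\.\d+)_(\d+)""".r.unanchored

  def parse(javaVersionOutput: Seq[String]): Option[(String, Int)] =
    javaVersionOutput
      .filterNot(_.startsWith("Picked up _JAVA_OPTIONS"))   // the line that broke the parser
      .collectFirst { case Version(version, update) => (version, update.toInt) }
}

// JavaVersionSketch.parse(Seq(
//   "Picked up _JAVA_OPTIONS: -Xmx2048M",
//   "java version \"1.8.0_31\""
// ))  // => Some(("1.8.0", 31))
{code}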







[jira] [Updated] (SPARK-8525) Bug in Streaming k-means documentation

2015-06-22 Thread Oleksiy Dyagilev (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oleksiy Dyagilev updated SPARK-8525:

Description: 
The expected input format is wrong in Streaming K-means documentation.
https://spark.apache.org/docs/latest/mllib-clustering.html#streaming-k-means

It might be a bug in implementation though, not sure.

There shouldn't be any spaces in test data points. I.e. instead of 
(y, [x1, x2, x3]) it should be
(y,[x1,x2,x3])

The exception thrown 
org.apache.spark.SparkException: Cannot parse a double from:  
at 
org.apache.spark.mllib.util.NumericParser$.parseDouble(NumericParser.scala:118)
at 
org.apache.spark.mllib.util.NumericParser$.parseTuple(NumericParser.scala:103)
at 
org.apache.spark.mllib.util.NumericParser$.parse(NumericParser.scala:41)
at 
org.apache.spark.mllib.regression.LabeledPoint$.parse(LabeledPoint.scala:49)


Also, I would improve the documentation by stating explicitly that the expected data type for both 'x' and 'y' is Double. At the moment it's not obvious, especially for 'y'.
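
A quick way to see the difference without a streaming job is to call the parser directly (sketch against 1.4.0; the spaced call is commented out because it throws the exception above):
{code:scala}
import org.apache.spark.mllib.regression.LabeledPoint

// Parses fine: no spaces inside the tuple or the array.
LabeledPoint.parse("(1.0,[1.1,2.2,3.3])")

// Throws "Cannot parse a double from:  " -- the whitespace after the comma
// becomes a token the numeric parser cannot handle.
// LabeledPoint.parse("(1.0, [1.1, 2.2, 3.3])")
{code}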



  was:
The expected input format is wrong in Streaming K-means documentation.
https://spark.apache.org/docs/latest/mllib-clustering.html#streaming-k-means

There shouldn't be any spaces in test data points. I.e. instead of 
(y, [x1, x2, x3]) it should be
(y,[x1,x2,x3])

The exception thrown 
org.apache.spark.SparkException: Cannot parse a double from:  
at 
org.apache.spark.mllib.util.NumericParser$.parseDouble(NumericParser.scala:118)
at 
org.apache.spark.mllib.util.NumericParser$.parseTuple(NumericParser.scala:103)
at 
org.apache.spark.mllib.util.NumericParser$.parse(NumericParser.scala:41)
at 
org.apache.spark.mllib.regression.LabeledPoint$.parse(LabeledPoint.scala:49)



Also I would improve documentation saying explicitly that expected data types 
for both 'x' and 'y' is Double. At the moment it's not obvious especially for 
'y'. 




> Bug in Streaming k-means documentation
> --
>
> Key: SPARK-8525
> URL: https://issues.apache.org/jira/browse/SPARK-8525
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, MLlib
>Affects Versions: 1.4.0
>Reporter: Oleksiy Dyagilev
>Priority: Critical
>
> The expected input format is wrong in Streaming K-means documentation.
> https://spark.apache.org/docs/latest/mllib-clustering.html#streaming-k-means
> It might be a bug in implementation though, not sure.
> There shouldn't be any spaces in test data points. I.e. instead of 
> (y, [x1, x2, x3]) it should be
> (y,[x1,x2,x3])
> The exception thrown 
> org.apache.spark.SparkException: Cannot parse a double from:  
>   at 
> org.apache.spark.mllib.util.NumericParser$.parseDouble(NumericParser.scala:118)
>   at 
> org.apache.spark.mllib.util.NumericParser$.parseTuple(NumericParser.scala:103)
>   at 
> org.apache.spark.mllib.util.NumericParser$.parse(NumericParser.scala:41)
>   at 
> org.apache.spark.mllib.regression.LabeledPoint$.parse(LabeledPoint.scala:49)
> Also I would improve documentation saying explicitly that expected data types 
> for both 'x' and 'y' is Double. At the moment it's not obvious especially for 
> 'y'. 






[jira] [Created] (SPARK-8525) Bug in Streaming k-means documentation

2015-06-22 Thread Oleksiy Dyagilev (JIRA)
Oleksiy Dyagilev created SPARK-8525:
---

 Summary: Bug in Streaming k-means documentation
 Key: SPARK-8525
 URL: https://issues.apache.org/jira/browse/SPARK-8525
 Project: Spark
  Issue Type: Bug
  Components: Documentation, MLlib
Affects Versions: 1.4.0
Reporter: Oleksiy Dyagilev
Priority: Critical


The expected input format is wrong in Streaming K-means documentation.
https://spark.apache.org/docs/latest/mllib-clustering.html#streaming-k-means

There shouldn't be any spaces in test data points. I.e. instead of 
(y, [x1, x2, x3]) it should be
(y,[x1,x2,x3])

The exception thrown 
org.apache.spark.SparkException: Cannot parse a double from:  
at 
org.apache.spark.mllib.util.NumericParser$.parseDouble(NumericParser.scala:118)
at 
org.apache.spark.mllib.util.NumericParser$.parseTuple(NumericParser.scala:103)
at 
org.apache.spark.mllib.util.NumericParser$.parse(NumericParser.scala:41)
at 
org.apache.spark.mllib.regression.LabeledPoint$.parse(LabeledPoint.scala:49)



Also, I would improve the documentation by stating explicitly that the expected data type for both 'x' and 'y' is Double. At the moment it's not obvious, especially for 'y'.




