[jira] [Commented] (SPARK-12350) VectorAssembler#transform() initially throws an exception

2015-12-16 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060650#comment-15060650
 ] 

Jakob Odersky commented on SPARK-12350:
---

A git-bisect showed that the issue was introduced in 
4a46b8859d3314b5b45a67cdc5c81fecb6e9e78c, a commit that fixes SPARK-11563.
[~vanzin], any idea what could have gone wrong?

> VectorAssembler#transform() initially throws an exception
> -
>
> Key: SPARK-12350
> URL: https://issues.apache.org/jira/browse/SPARK-12350
> Project: Spark
>  Issue Type: Bug
>  Components: ML
> Environment: sparkShell command from sbt
>Reporter: Jakob Odersky
>
> Calling VectorAssembler.transform() initially throws an exception; subsequent 
> calls work.
> h3. Steps to reproduce
> In spark-shell,
> 1. Create a dummy dataframe and define an assembler
> {code}
> import org.apache.spark.ml.feature.VectorAssembler
> val df = sc.parallelize(List((1,2), (3,4))).toDF
> val assembler = new VectorAssembler().setInputCols(Array("_1", 
> "_2")).setOutputCol("features")
> {code}
> 2. Run
> {code}
> assembler.transform(df).show
> {code}
> Initially the following exception is thrown:
> {code}
> 15/12/15 16:20:19 ERROR TransportRequestHandler: Error opening stream 
> /classes/org/apache/spark/sql/catalyst/expressions/Object.class for request 
> from /9.72.139.102:60610
> java.lang.IllegalArgumentException: requirement failed: File not found: 
> /classes/org/apache/spark/sql/catalyst/expressions/Object.class
>   at scala.Predef$.require(Predef.scala:233)
>   at 
> org.apache.spark.rpc.netty.NettyStreamManager.openStream(NettyStreamManager.scala:60)
>   at 
> org.apache.spark.network.server.TransportRequestHandler.processStreamRequest(TransportRequestHandler.java:136)
>   at 
> org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:106)
>   at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:104)
>   at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
>   at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:86)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
>   at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> Subsequent calls work:
> {code}
> +---+---+-+
> | _1| _2| features|
> +---+---+-+
> |  1|  2|[1.0,2.0]|
> |  3|  4|[3.0,4.0]|
> +---+---+-+
> {code}
> It seems as though there is some internal state that is not initialized.
> [~iyounus] originally found this issue.






[jira] [Created] (SPARK-12374) Improve performance of range API via adding logical/physical operators

2015-12-16 Thread Xiao Li (JIRA)
Xiao Li created SPARK-12374:
---

 Summary: Improve performance of range API via adding 
logical/physical operators
 Key: SPARK-12374
 URL: https://issues.apache.org/jira/browse/SPARK-12374
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.6.0
Reporter: Xiao Li
Priority: Critical


Create an actual logical/physical operator for range, to match the performance 
of the RDD range API.
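
For illustration only, a rough PySpark sketch of the two range APIs whose 
performance is being compared (assumes a SparkContext {{sc}} and SQLContext 
{{sqlContext}} are in scope, as in the shells; sizes and the timing harness are 
made up):

{code}
# Illustrative micro-benchmark only: compares the RDD range API with the
# DataFrame/SQL range API this issue targets.
import time

n = 10 * 1000 * 1000

start = time.time()
sc.range(0, n).count()            # RDD range API
print("RDD range:       %.2f s" % (time.time() - start))

start = time.time()
sqlContext.range(0, n).count()    # DataFrame range API
print("DataFrame range: %.2f s" % (time.time() - start))
{code}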






[jira] [Assigned] (SPARK-12374) Improve performance of Range APIs via adding logical/physical operators

2015-12-16 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12374:


Assignee: Apache Spark

> Improve performance of Range APIs via adding logical/physical operators
> ---
>
> Key: SPARK-12374
> URL: https://issues.apache.org/jira/browse/SPARK-12374
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Xiao Li
>Assignee: Apache Spark
>Priority: Critical
>
> Create an actual logical/physical operator for range, to match the performance 
> of the RDD range API.






[jira] [Assigned] (SPARK-12374) Improve performance of Range APIs via adding logical/physical operators

2015-12-16 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12374:


Assignee: (was: Apache Spark)

> Improve performance of Range APIs via adding logical/physical operators
> ---
>
> Key: SPARK-12374
> URL: https://issues.apache.org/jira/browse/SPARK-12374
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Xiao Li
>Priority: Critical
>
> Create an actual logical/physical operator for range, to match the performance 
> of the RDD range API.






[jira] [Assigned] (SPARK-12350) VectorAssembler#transform() initially throws an exception

2015-12-16 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12350:


Assignee: Apache Spark

> VectorAssembler#transform() initially throws an exception
> -
>
> Key: SPARK-12350
> URL: https://issues.apache.org/jira/browse/SPARK-12350
> Project: Spark
>  Issue Type: Bug
>  Components: ML
> Environment: sparkShell command from sbt
>Reporter: Jakob Odersky
>Assignee: Apache Spark
>
> Calling VectorAssembler.transform() initially throws an exception; subsequent 
> calls work.
> h3. Steps to reproduce
> In spark-shell,
> 1. Create a dummy dataframe and define an assembler
> {code}
> import org.apache.spark.ml.feature.VectorAssembler
> val df = sc.parallelize(List((1,2), (3,4))).toDF
> val assembler = new VectorAssembler().setInputCols(Array("_1", 
> "_2")).setOutputCol("features")
> {code}
> 2. Run
> {code}
> assembler.transform(df).show
> {code}
> Initially the following exception is thrown:
> {code}
> 15/12/15 16:20:19 ERROR TransportRequestHandler: Error opening stream 
> /classes/org/apache/spark/sql/catalyst/expressions/Object.class for request 
> from /9.72.139.102:60610
> java.lang.IllegalArgumentException: requirement failed: File not found: 
> /classes/org/apache/spark/sql/catalyst/expressions/Object.class
>   at scala.Predef$.require(Predef.scala:233)
>   at 
> org.apache.spark.rpc.netty.NettyStreamManager.openStream(NettyStreamManager.scala:60)
>   at 
> org.apache.spark.network.server.TransportRequestHandler.processStreamRequest(TransportRequestHandler.java:136)
>   at 
> org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:106)
>   at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:104)
>   at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
>   at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:86)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
>   at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> Subsequent calls work:
> {code}
> +---+---+-+
> | _1| _2| features|
> +---+---+-+
> |  1|  2|[1.0,2.0]|
> |  3|  4|[3.0,4.0]|
> +---+---+-+
> {code}
> It seems as though there is some internal state that is not initialized.
> [~iyounus] originally found this issue.






[jira] [Commented] (SPARK-12372) Unary operator "-" fails for MLlib vectors

2015-12-16 Thread Christos Iraklis Tsatsoulis (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060858#comment-15060858
 ] 

Christos Iraklis Tsatsoulis commented on SPARK-12372:
-

If this is the case, then a warning/clarification in the documentation wouldn't 
hurt - Spark users are not expected to be aware of the internal "ongoing 
discussions" between Spark developers (BTW, any relevant link would be very 
welcome - I could not find any mention in the MLlib & Breeze docs, nor in the 
recent preprint papers on linalg & MLlib).
All in all, I suggest re-opening the issue with a different type (it's not a 
bug, as you say), with the required resolution being a note in the relevant 
docs ("don't try this..., because...").

> Unary operator "-" fails for MLlib vectors
> --
>
> Key: SPARK-12372
> URL: https://issues.apache.org/jira/browse/SPARK-12372
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib, PySpark
>Affects Versions: 1.5.2
>Reporter: Christos Iraklis Tsatsoulis
>
> Consider the following snippet in pyspark 1.5.2:
> {code:none}
> >>> from pyspark.mllib.linalg import Vectors
> >>> x = Vectors.dense([0.0, 1.0, 0.0, 7.0, 0.0])
> >>> x
> DenseVector([0.0, 1.0, 0.0, 7.0, 0.0])
> >>> -x
> Traceback (most recent call last):
>   File "", line 1, in 
> TypeError: func() takes exactly 2 arguments (1 given)
> >>> y = Vectors.dense([2.0, 0.0, 3.0, 4.0, 5.0])
> >>> y
> DenseVector([2.0, 0.0, 3.0, 4.0, 5.0])
> >>> x-y
> DenseVector([-2.0, 1.0, -3.0, 3.0, -5.0])
> >>> -y+x
> Traceback (most recent call last):
>   File "", line 1, in 
> TypeError: func() takes exactly 2 arguments (1 given)
> >>> -1*x
> DenseVector([-0.0, -1.0, -0.0, -7.0, -0.0])
> {code}
> Clearly, the unary operator {{-}} (minus) for vectors fails, giving errors 
> for expressions like {{-x}} and {{-y+x}}, despite the fact that {{x-y}} 
> behaves as expected.
> The last operation, {{-1*x}}, although mathematically "correct", includes 
> minus signs for the zero entries, which again is normally not expected.






[jira] [Commented] (SPARK-12350) VectorAssembler#transform() initially throws an exception

2015-12-16 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060873#comment-15060873
 ] 

Apache Spark commented on SPARK-12350:
--

User 'vanzin' has created a pull request for this issue:
https://github.com/apache/spark/pull/10337

> VectorAssembler#transform() initially throws an exception
> -
>
> Key: SPARK-12350
> URL: https://issues.apache.org/jira/browse/SPARK-12350
> Project: Spark
>  Issue Type: Bug
>  Components: ML
> Environment: sparkShell command from sbt
>Reporter: Jakob Odersky
>Assignee: Apache Spark
>
> Calling VectorAssembler.transform() initially throws an exception; subsequent 
> calls work.
> h3. Steps to reproduce
> In spark-shell,
> 1. Create a dummy dataframe and define an assembler
> {code}
> import org.apache.spark.ml.feature.VectorAssembler
> val df = sc.parallelize(List((1,2), (3,4))).toDF
> val assembler = new VectorAssembler().setInputCols(Array("_1", 
> "_2")).setOutputCol("features")
> {code}
> 2. Run
> {code}
> assembler.transform(df).show
> {code}
> Initially the following exception is thrown:
> {code}
> 15/12/15 16:20:19 ERROR TransportRequestHandler: Error opening stream 
> /classes/org/apache/spark/sql/catalyst/expressions/Object.class for request 
> from /9.72.139.102:60610
> java.lang.IllegalArgumentException: requirement failed: File not found: 
> /classes/org/apache/spark/sql/catalyst/expressions/Object.class
>   at scala.Predef$.require(Predef.scala:233)
>   at 
> org.apache.spark.rpc.netty.NettyStreamManager.openStream(NettyStreamManager.scala:60)
>   at 
> org.apache.spark.network.server.TransportRequestHandler.processStreamRequest(TransportRequestHandler.java:136)
>   at 
> org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:106)
>   at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:104)
>   at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
>   at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:86)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
>   at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> Subsequent calls work:
> {code}
> +---+---+-+
> | _1| _2| features|
> +---+---+-+
> |  1|  2|[1.0,2.0]|
> |  3|  4|[3.0,4.0]|
> +---+---+-+
> {code}
> It seems as though there is some internal state that is not initialized.
> [~iyounus] originally found this issue.






[jira] [Commented] (SPARK-12350) VectorAssembler#transform() initially throws an exception

2015-12-16 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060732#comment-15060732
 ] 

Jakob Odersky commented on SPARK-12350:
---

No functionality is broken, so silencing the exception would be a possible fix.

However, even if there is no loss of functionality, should the exception not be 
treated as an error?

> VectorAssembler#transform() initially throws an exception
> -
>
> Key: SPARK-12350
> URL: https://issues.apache.org/jira/browse/SPARK-12350
> Project: Spark
>  Issue Type: Bug
>  Components: ML
> Environment: sparkShell command from sbt
>Reporter: Jakob Odersky
>
> Calling VectorAssembler.transform() initially throws an exception; subsequent 
> calls work.
> h3. Steps to reproduce
> In spark-shell,
> 1. Create a dummy dataframe and define an assembler
> {code}
> import org.apache.spark.ml.feature.VectorAssembler
> val df = sc.parallelize(List((1,2), (3,4))).toDF
> val assembler = new VectorAssembler().setInputCols(Array("_1", 
> "_2")).setOutputCol("features")
> {code}
> 2. Run
> {code}
> assembler.transform(df).show
> {code}
> Initially the following exception is thrown:
> {code}
> 15/12/15 16:20:19 ERROR TransportRequestHandler: Error opening stream 
> /classes/org/apache/spark/sql/catalyst/expressions/Object.class for request 
> from /9.72.139.102:60610
> java.lang.IllegalArgumentException: requirement failed: File not found: 
> /classes/org/apache/spark/sql/catalyst/expressions/Object.class
>   at scala.Predef$.require(Predef.scala:233)
>   at 
> org.apache.spark.rpc.netty.NettyStreamManager.openStream(NettyStreamManager.scala:60)
>   at 
> org.apache.spark.network.server.TransportRequestHandler.processStreamRequest(TransportRequestHandler.java:136)
>   at 
> org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:106)
>   at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:104)
>   at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
>   at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:86)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
>   at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> Subsequent calls work:
> {code}
> +---+---+-+
> | _1| _2| features|
> +---+---+-+
> |  1|  2|[1.0,2.0]|
> |  3|  4|[3.0,4.0]|
> +---+---+-+
> {code}
> It seems as though there is some internal state that is not initialized.
> [~iyounus] originally found this issue.






[jira] [Commented] (SPARK-12350) VectorAssembler#transform() initially throws an exception

2015-12-16 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060819#comment-15060819
 ] 

Jakob Odersky commented on SPARK-12350:
---

Ok, thanks!

> VectorAssembler#transform() initially throws an exception
> -
>
> Key: SPARK-12350
> URL: https://issues.apache.org/jira/browse/SPARK-12350
> Project: Spark
>  Issue Type: Bug
>  Components: ML
> Environment: sparkShell command from sbt
>Reporter: Jakob Odersky
>
> Calling VectorAssembler.transform() initially throws an exception; subsequent 
> calls work.
> h3. Steps to reproduce
> In spark-shell,
> 1. Create a dummy dataframe and define an assembler
> {code}
> import org.apache.spark.ml.feature.VectorAssembler
> val df = sc.parallelize(List((1,2), (3,4))).toDF
> val assembler = new VectorAssembler().setInputCols(Array("_1", 
> "_2")).setOutputCol("features")
> {code}
> 2. Run
> {code}
> assembler.transform(df).show
> {code}
> Initially the following exception is thrown:
> {code}
> 15/12/15 16:20:19 ERROR TransportRequestHandler: Error opening stream 
> /classes/org/apache/spark/sql/catalyst/expressions/Object.class for request 
> from /9.72.139.102:60610
> java.lang.IllegalArgumentException: requirement failed: File not found: 
> /classes/org/apache/spark/sql/catalyst/expressions/Object.class
>   at scala.Predef$.require(Predef.scala:233)
>   at 
> org.apache.spark.rpc.netty.NettyStreamManager.openStream(NettyStreamManager.scala:60)
>   at 
> org.apache.spark.network.server.TransportRequestHandler.processStreamRequest(TransportRequestHandler.java:136)
>   at 
> org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:106)
>   at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:104)
>   at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
>   at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:86)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
>   at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> Subsequent calls work:
> {code}
> +---+---+-+
> | _1| _2| features|
> +---+---+-+
> |  1|  2|[1.0,2.0]|
> |  3|  4|[3.0,4.0]|
> +---+---+-+
> {code}
> It seems as though there is some internal state that is not initialized.
> [~iyounus] originally found this issue.






[jira] [Commented] (SPARK-12331) R^2 for regression through the origin

2015-12-16 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060875#comment-15060875
 ] 

Joseph K. Bradley commented on SPARK-12331:
---

+1 for this change based on the description (though I haven't checked the code 
& references myself).  CCing [~dbtsai]

It'd be great to add a unit test comparing with R results on the same data.
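
As a rough illustration of such a test (plain Python/NumPy rather than Spark 
code; the data and variable names are made up), using the no-intercept 
definition quoted in the description below:

{code}
# Sketch only: R^2 for regression through the origin, per the cited definition
# R^2 = sum(yhat_i^2) / sum(y_i^2).  The result should be compared against
# R / statsmodels fitted without an intercept on the same data.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 3.2, 3.9])

beta = np.dot(x, y) / np.dot(x, x)      # least-squares slope, no intercept
yhat = beta * x

r2_no_intercept = np.sum(yhat ** 2) / np.sum(y ** 2)
print(r2_no_intercept)
{code}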

> R^2 for regression through the origin
> -
>
> Key: SPARK-12331
> URL: https://issues.apache.org/jira/browse/SPARK-12331
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Imran Younus
>Priority: Minor
>
> The value of R^2 (coefficient of determination) obtained from 
> LinearRegressionModel is not consistent with R and statsmodels when 
> fitIntercept is false, i.e., regression through the origin. In this case, both 
> R and statsmodels use the definition of R^2 given by eq. (4') in the following 
> review paper:
> https://online.stat.psu.edu/~ajw13/stat501/SpecialTopics/Reg_thru_origin.pdf
> Here is the definition from this paper:
> R^2 = \sum_i \hat{y}_i^2 / \sum_i y_i^2
> The paper also describes why this should be the case. I've double-checked 
> that the values of R^2 from statsmodels and R are consistent with this 
> definition. On the other hand, scikit-learn doesn't use the above definition. 
> I would recommend using the above definition in Spark.






[jira] [Created] (SPARK-12371) Make sure Dataset nullability conforms to its underlying logical plan

2015-12-16 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-12371:
--

 Summary: Make sure Dataset nullability conforms to its underlying 
logical plan
 Key: SPARK-12371
 URL: https://issues.apache.org/jira/browse/SPARK-12371
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 1.6.0, 2.0.0
Reporter: Cheng Lian
Assignee: Cheng Lian


Currently it's possible to construct a Dataset whose nullability differs from 
that of its underlying logical plan; this should be caught during the analysis 
phase:

{code}
val rowRDD = sqlContext.sparkContext.parallelize(Seq(Row("hello"), Row(null)))
val schema = StructType(Seq(StructField("_1", StringType, nullable = false)))
val df = sqlContext.createDataFrame(rowRDD, schema)
df.as[Tuple1[String]].collect().foreach(println)

// Output:
//
//   (hello)
//   (null)
{code}






[jira] [Commented] (SPARK-12345) Mesos cluster mode is broken

2015-12-16 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060475#comment-15060475
 ] 

Apache Spark commented on SPARK-12345:
--

User 'tnachen' has created a pull request for this issue:
https://github.com/apache/spark/pull/10332

> Mesos cluster mode is broken
> 
>
> Key: SPARK-12345
> URL: https://issues.apache.org/jira/browse/SPARK-12345
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 1.6.0
>Reporter: Andrew Or
>Assignee: Apache Spark
>Priority: Critical
>
> The same setup worked in 1.5.2 but is now failing for 1.6.0-RC2.
> The driver is confused about where SPARK_HOME is. It resolves 
> `mesos.executor.uri` or `spark.mesos.executor.home` relative to the 
> filesystem where the driver runs, which is wrong.
> {code}
> I1215 15:00:39.411212 28032 exec.cpp:134] Version: 0.25.0
> I1215 15:00:39.413512 28037 exec.cpp:208] Executor registered on slave 
> 130bdc39-44e7-4256-8c22-602040d337f1-S1
> bin/spark-submit: line 27: 
> /Users/dragos/workspace/Spark/dev/rc-tests/spark-1.6.0-bin-hadoop2.6/bin/spark-class:
>  No such file or directory
> {code}






[jira] [Commented] (SPARK-12054) Consider nullable in codegen

2015-12-16 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060489#comment-15060489
 ] 

Apache Spark commented on SPARK-12054:
--

User 'davies' has created a pull request for this issue:
https://github.com/apache/spark/pull/10333

> Consider nullable in codegen
> 
>
> Key: SPARK-12054
> URL: https://issues.apache.org/jira/browse/SPARK-12054
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Davies Liu
>Assignee: Davies Liu
>
> Currently, we always check the nullability of expression results; we could 
> skip that check when the expression is not nullable.






[jira] [Updated] (SPARK-12361) Should set PYSPARK_DRIVER_PYTHON before python test

2015-12-16 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-12361:
---
Assignee: Jeff Zhang

> Should set PYSPARK_DRIVER_PYTHON before python test
> ---
>
> Key: SPARK-12361
> URL: https://issues.apache.org/jira/browse/SPARK-12361
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, Tests
>Affects Versions: 1.6.0
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
>Priority: Minor
>
> If PYSPARK_DRIVER_PYTHON is not set, a Python version mismatch exception may 
> happen (in my case PYSPARK_DRIVER_PYTHON is set in .profile). The weird thing 
> is that this exception won't cause the unit test to fail: the return code is 
> still 0, which hides the unit test failure. If I invoke the test command 
> directly, I can see the return code is not 0. This is very weird. 
> * invoke unit test command directly
> {code}
> export SPARK_TESTING=1
> export PYSPARK_PYTHON=python2.6
> bin/pyspark pyspark.ml.clustering  
> {code}
> * return code from python unit test
> {code}
> retcode = subprocess.Popen(
> [os.path.join(SPARK_HOME, "bin/pyspark"), test_name],
> stderr=per_test_output, stdout=per_test_output, env=env).wait()
> {code}
> * exception of python version mismatch
> {code}
>  File "/Users/jzhang/github/spark/python/lib/pyspark.zip/pyspark/worker.py", 
> line 64, in main
> ("%d.%d" % sys.version_info[:2], version))
> Exception: Python in worker has different version 2.6 than that in driver 
> 2.7, PySpark cannot run with different minor versions
> at 
> org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:166)
> at 
> org.apache.spark.api.python.PythonRunner$$anon$1.(PythonRDD.scala:207)
> at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:125)
> at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> at org.apache.spark.scheduler.Task.run(Task.scala:88)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {code}
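
The idea in the title - set PYSPARK_DRIVER_PYTHON before running each python 
test - could look roughly like the following sketch around the 
{{subprocess.Popen}} call quoted above. Only the PYSPARK_PYTHON / 
PYSPARK_DRIVER_PYTHON variable names come from this report; everything else is 
illustrative:

{code}
# Illustrative sketch: pin the driver's Python to the worker's Python before
# launching a pyspark test, so driver and worker minor versions match.
import os
import subprocess

env = dict(os.environ)
env["PYSPARK_PYTHON"] = env.get("PYSPARK_PYTHON", "python2.6")
env["PYSPARK_DRIVER_PYTHON"] = env["PYSPARK_PYTHON"]

retcode = subprocess.Popen(
    [os.path.join(os.environ["SPARK_HOME"], "bin/pyspark"),
     "pyspark.ml.clustering"],
    env=env).wait()
print(retcode)
{code}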






[jira] [Updated] (SPARK-12057) Prevent failure on corrupt JSON records

2015-12-16 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-12057:
-
Target Version/s: 1.6.1, 2.0.0  (was: 1.6.0)

> Prevent failure on corrupt JSON records
> ---
>
> Key: SPARK-12057
> URL: https://issues.apache.org/jira/browse/SPARK-12057
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Ian Macalinao
>Priority: Minor
>
> Return the failed record when a record cannot be parsed, allowing files that 
> contain corrupt records of any form to be parsed. Currently a corrupt record 
> throws an exception, causing the entire job to fail.
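
A minimal sketch of the scenario (illustrative records; assumes {{sc}} and 
{{sqlContext}} as in pyspark):

{code}
# Illustrative only: one well-formed and one corrupt JSON record.  Per this
# issue, hitting the corrupt record can currently raise an exception and fail
# the whole job instead of returning just the failed record.
good = '{"id": 1, "name": "a"}'
bad = '{"id": 2, "name": '      # truncated, unparsable record

df = sqlContext.read.json(sc.parallelize([good, bad]))
df.show()
{code}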






[jira] [Updated] (SPARK-12273) Spark Streaming Web UI does not list Receivers in order

2015-12-16 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-12273:
-
Assignee: Liwei Lin

> Spark Streaming Web UI does not list Receivers in order
> ---
>
> Key: SPARK-12273
> URL: https://issues.apache.org/jira/browse/SPARK-12273
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming, Web UI
>Affects Versions: 1.5.2
>Reporter: Liwei Lin
>Assignee: Liwei Lin
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: Spark-12273.png
>
>
> Currently the Streaming web UI does NOT list Receivers in order, while it 
> seems more convenient for the users if Receivers are listed in order.
> !Spark-12273.png!






[jira] [Updated] (SPARK-12361) Should set PYSPARK_DRIVER_PYTHON before python test

2015-12-16 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-12361:
---
Target Version/s:   (was: 1.6.1)

> Should set PYSPARK_DRIVER_PYTHON before python test
> ---
>
> Key: SPARK-12361
> URL: https://issues.apache.org/jira/browse/SPARK-12361
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, Tests
>Affects Versions: 1.6.0
>Reporter: Jeff Zhang
>Priority: Minor
>
> If PYSPARK_DRIVER_PYTHON is not set, a Python version mismatch exception may 
> happen (in my case PYSPARK_DRIVER_PYTHON is set in .profile). The weird thing 
> is that this exception won't cause the unit test to fail: the return code is 
> still 0, which hides the unit test failure. If I invoke the test command 
> directly, I can see the return code is not 0. This is very weird. 
> * invoke unit test command directly
> {code}
> export SPARK_TESTING=1
> export PYSPARK_PYTHON=python2.6
> bin/pyspark pyspark.ml.clustering  
> {code}
> * return code from python unit test
> {code}
> retcode = subprocess.Popen(
> [os.path.join(SPARK_HOME, "bin/pyspark"), test_name],
> stderr=per_test_output, stdout=per_test_output, env=env).wait()
> {code}
> * exception of python version mismatch
> {code}
>  File "/Users/jzhang/github/spark/python/lib/pyspark.zip/pyspark/worker.py", 
> line 64, in main
> ("%d.%d" % sys.version_info[:2], version))
> Exception: Python in worker has different version 2.6 than that in driver 
> 2.7, PySpark cannot run with different minor versions
> at 
> org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:166)
> at 
> org.apache.spark.api.python.PythonRunner$$anon$1.(PythonRDD.scala:207)
> at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:125)
> at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> at org.apache.spark.scheduler.Task.run(Task.scala:88)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {code}






[jira] [Updated] (SPARK-12373) Type coercion rule of dividing two decimal values may choose an intermediate precision that does not have enough number of digits at the left of decimal point

2015-12-16 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-12373:
-
Summary: Type coercion rule of dividing two decimal values may choose an 
intermediate precision that does not have enough number of digits at the left 
of decimal point   (was: Type coercion rule of dividing two decimal values may 
choose an intermediate precision that does not enough number of digits at the 
left of decimal point )

> Type coercion rule of dividing two decimal values may choose an intermediate 
> precision that does not have enough number of digits at the left of decimal 
> point 
> ---
>
> Key: SPARK-12373
> URL: https://issues.apache.org/jira/browse/SPARK-12373
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Yin Huai
>
> Looks like the {{widerDecimalType}} at 
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala#L432
>  can produce something like {{(38, 38)}} when we have two operand types 
> {{Decimal(38, 0)}} and {{Decimal(38, 38)}}. We should take a look at whether 
> there is a more reasonable way to handle precision/scale.






[jira] [Commented] (SPARK-12350) VectorAssembler#transform() initially throws an exception

2015-12-16 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060740#comment-15060740
 ] 

Marcelo Vanzin commented on SPARK-12350:


bq. should the exception not be treated as an error?

No, because the class might exist in other class loaders in the chain, as is 
the case here.

> VectorAssembler#transform() initially throws an exception
> -
>
> Key: SPARK-12350
> URL: https://issues.apache.org/jira/browse/SPARK-12350
> Project: Spark
>  Issue Type: Bug
>  Components: ML
> Environment: sparkShell command from sbt
>Reporter: Jakob Odersky
>
> Calling VectorAssembler.transform() initially throws an exception; subsequent 
> calls work.
> h3. Steps to reproduce
> In spark-shell,
> 1. Create a dummy dataframe and define an assembler
> {code}
> import org.apache.spark.ml.feature.VectorAssembler
> val df = sc.parallelize(List((1,2), (3,4))).toDF
> val assembler = new VectorAssembler().setInputCols(Array("_1", 
> "_2")).setOutputCol("features")
> {code}
> 2. Run
> {code}
> assembler.transform(df).show
> {code}
> Initially the following exception is thrown:
> {code}
> 15/12/15 16:20:19 ERROR TransportRequestHandler: Error opening stream 
> /classes/org/apache/spark/sql/catalyst/expressions/Object.class for request 
> from /9.72.139.102:60610
> java.lang.IllegalArgumentException: requirement failed: File not found: 
> /classes/org/apache/spark/sql/catalyst/expressions/Object.class
>   at scala.Predef$.require(Predef.scala:233)
>   at 
> org.apache.spark.rpc.netty.NettyStreamManager.openStream(NettyStreamManager.scala:60)
>   at 
> org.apache.spark.network.server.TransportRequestHandler.processStreamRequest(TransportRequestHandler.java:136)
>   at 
> org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:106)
>   at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:104)
>   at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
>   at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:86)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
>   at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> Subsequent calls work:
> {code}
> +---+---+-+
> | _1| _2| features|
> +---+---+-+
> |  1|  2|[1.0,2.0]|
> |  3|  4|[3.0,4.0]|
> +---+---+-+
> {code}
> It seems as though there is some internal state that is not initialized.
> [~iyounus] originally found this issue.






[jira] [Updated] (SPARK-12364) Add ML example for SparkR

2015-12-16 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-12364:
--
Assignee: Yanbo Liang  (was: Apache Spark)

> Add ML example for SparkR
> -
>
> Key: SPARK-12364
> URL: https://issues.apache.org/jira/browse/SPARK-12364
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, SparkR
>Reporter: Yanbo Liang
>Assignee: Yanbo Liang
> Fix For: 1.6.1, 2.0.0
>
>
> Add ML example for SparkR






[jira] [Resolved] (SPARK-12364) Add ML example for SparkR

2015-12-16 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley resolved SPARK-12364.
---
   Resolution: Fixed
Fix Version/s: 1.6.1
   2.0.0

Issue resolved by pull request 10324
[https://github.com/apache/spark/pull/10324]

> Add ML example for SparkR
> -
>
> Key: SPARK-12364
> URL: https://issues.apache.org/jira/browse/SPARK-12364
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, SparkR
>Reporter: Yanbo Liang
>Assignee: Apache Spark
> Fix For: 2.0.0, 1.6.1
>
>
> Add ML example for SparkR






[jira] [Updated] (SPARK-12363) PowerIterationClustering test case failed if we deprecated KMeans.setRuns

2015-12-16 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-12363:
--
Priority: Minor  (was: Major)

> PowerIterationClustering test case failed if we deprecated KMeans.setRuns
> -
>
> Key: SPARK-12363
> URL: https://issues.apache.org/jira/browse/SPARK-12363
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Reporter: Yanbo Liang
>Priority: Minor
>
> We plan to deprecate the `runs` parameter of KMeans; PowerIterationClustering 
> leverages KMeans to train its model.
> I removed the `setRuns` call used in PowerIterationClustering, but one of the 
> test cases failed.






[jira] [Commented] (SPARK-12363) PowerIterationClustering test case failed if we deprecated KMeans.setRuns

2015-12-16 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060862#comment-15060862
 ] 

Joseph K. Bradley commented on SPARK-12363:
---

Thanks for identifying this.  What do the predictions look like?  Does it 
improve if you increase the number of iterations KMeans runs for when called 
from PIC?

It seems like an intuitively reasonable test, but I could see it failing for 
bad initial cluster centers or if KMeans needs to run for more iterations.

> PowerIterationClustering test case failed if we deprecated KMeans.setRuns
> -
>
> Key: SPARK-12363
> URL: https://issues.apache.org/jira/browse/SPARK-12363
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Reporter: Yanbo Liang
>
> We plan to deprecate the `runs` parameter of KMeans; PowerIterationClustering 
> leverages KMeans to train its model.
> I removed the `setRuns` call used in PowerIterationClustering, but one of the 
> test cases failed.






[jira] [Updated] (SPARK-12304) Make Spark Streaming web UI display more friendly Receiver graphs

2015-12-16 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-12304:
-
Assignee: Liwei Lin

> Make Spark Streaming web UI display more friendly Receiver graphs
> -
>
> Key: SPARK-12304
> URL: https://issues.apache.org/jira/browse/SPARK-12304
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.5.2, 1.6.0
>Reporter: Liwei Lin
>Assignee: Liwei Lin
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: after-5.png, before-5.png
>
>
> Currently, the Spark Streaming web UI uses the same maxY when displaying the 
> 'Input Rate Times & Histograms' and 'Per-Receiver Times & Histograms' graphs. 
> This may lead to somewhat unfriendly graphs: once we have tens of Receivers 
> or more, every 'Per-Receiver Times' line almost hits the ground.
> This issue proposes to calculate a new maxY from the original one, which is 
> shared among all the 'Per-Receiver Times & Histograms' graphs.
> Before:
> !before-5.png!
> After:
> !after-5.png!






[jira] [Commented] (SPARK-12372) Unary operator "-" fails for MLlib vectors

2015-12-16 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060699#comment-15060699
 ] 

Joseph K. Bradley commented on SPARK-12372:
---

There simply isn't a unary minus operation.  There are ongoing discussions about 
turning MLlib vectors and matrices into a full-fledged local linear algebra 
library, but currently you could convert to numpy/scipy and use those libraries 
from pyspark.
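
For example, a small sketch of that workaround in pyspark (illustrative values):

{code}
# Workaround sketch: x.toArray() returns a NumPy array, so the arithmetic
# happens in NumPy; wrap the result back into an MLlib vector if one is
# needed downstream.
from pyspark.mllib.linalg import Vectors

x = Vectors.dense([0.0, 1.0, 0.0, 7.0, 0.0])
y = Vectors.dense([2.0, 0.0, 3.0, 4.0, 5.0])

neg_x = Vectors.dense(-x.toArray())                 # unary minus
diff = Vectors.dense(-y.toArray() + x.toArray())    # -y + x

print(neg_x)
print(diff)
{code}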

> Unary operator "-" fails for MLlib vectors
> --
>
> Key: SPARK-12372
> URL: https://issues.apache.org/jira/browse/SPARK-12372
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib, PySpark
>Affects Versions: 1.5.2
>Reporter: Christos Iraklis Tsatsoulis
>
> Consider the following snippet in pyspark 1.5.2:
> {code:none}
> >>> from pyspark.mllib.linalg import Vectors
> >>> x = Vectors.dense([0.0, 1.0, 0.0, 7.0, 0.0])
> >>> x
> DenseVector([0.0, 1.0, 0.0, 7.0, 0.0])
> >>> -x
> Traceback (most recent call last):
>   File "", line 1, in 
> TypeError: func() takes exactly 2 arguments (1 given)
> >>> y = Vectors.dense([2.0, 0.0, 3.0, 4.0, 5.0])
> >>> y
> DenseVector([2.0, 0.0, 3.0, 4.0, 5.0])
> >>> x-y
> DenseVector([-2.0, 1.0, -3.0, 3.0, -5.0])
> >>> -y+x
> Traceback (most recent call last):
>   File "", line 1, in 
> TypeError: func() takes exactly 2 arguments (1 given)
> >>> -1*x
> DenseVector([-0.0, -1.0, -0.0, -7.0, -0.0])
> {code}
> Clearly, the unary operator {{-}} (minus) for vectors fails, giving errors 
> for expressions like {{-x}} and {{-y+x}}, despite the fact that {{x-y}} 
> behaves as expected.
> The last operation, {{-1*x}}, although mathematically "correct", includes 
> minus signs for the zero entries, which again is normally not expected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-12372) Unary operator "-" fails for MLlib vectors

2015-12-16 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley closed SPARK-12372.
-
Resolution: Not A Problem

> Unary operator "-" fails for MLlib vectors
> --
>
> Key: SPARK-12372
> URL: https://issues.apache.org/jira/browse/SPARK-12372
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib, PySpark
>Affects Versions: 1.5.2
>Reporter: Christos Iraklis Tsatsoulis
>
> Consider the following snippet in pyspark 1.5.2:
> {code:none}
> >>> from pyspark.mllib.linalg import Vectors
> >>> x = Vectors.dense([0.0, 1.0, 0.0, 7.0, 0.0])
> >>> x
> DenseVector([0.0, 1.0, 0.0, 7.0, 0.0])
> >>> -x
> Traceback (most recent call last):
>   File "", line 1, in 
> TypeError: func() takes exactly 2 arguments (1 given)
> >>> y = Vectors.dense([2.0, 0.0, 3.0, 4.0, 5.0])
> >>> y
> DenseVector([2.0, 0.0, 3.0, 4.0, 5.0])
> >>> x-y
> DenseVector([-2.0, 1.0, -3.0, 3.0, -5.0])
> >>> -y+x
> Traceback (most recent call last):
>   File "", line 1, in 
> TypeError: func() takes exactly 2 arguments (1 given)
> >>> -1*x
> DenseVector([-0.0, -1.0, -0.0, -7.0, -0.0])
> {code}
> Clearly, the unary operator {{-}} (minus) for vectors fails, giving errors 
> for expressions like {{-x}} and {{-y+x}}, despite the fact that {{x-y}} 
> behaves as expected.
> The last operation, {{-1*x}}, although mathematically "correct", includes 
> minus signs for the zero entries, which again is normally not expected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-12374) Improve performance of Range APIs via adding logical/physical operators

2015-12-16 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12374:


Assignee: Apache Spark

> Improve performance of Range APIs via adding logical/physical operators
> ---
>
> Key: SPARK-12374
> URL: https://issues.apache.org/jira/browse/SPARK-12374
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Xiao Li
>Assignee: Apache Spark
>Priority: Critical
>
> Creating an actual logical/physical operator for range to match the 
> performance of the RDD range APIs. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-12374) Improve performance of Range APIs via adding logical/physical operators

2015-12-16 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12374:


Assignee: (was: Apache Spark)

> Improve performance of Range APIs via adding logical/physical operators
> ---
>
> Key: SPARK-12374
> URL: https://issues.apache.org/jira/browse/SPARK-12374
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Xiao Li
>Priority: Critical
>
> Creating an actual logical/physical operator for range to match the 
> performance of the RDD range APIs. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12350) VectorAssembler#transform() initially throws an exception

2015-12-16 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060773#comment-15060773
 ] 

Jakob Odersky commented on SPARK-12350:
---

Ok, but then why throw an exception in the first place?

> VectorAssembler#transform() initially throws an exception
> -
>
> Key: SPARK-12350
> URL: https://issues.apache.org/jira/browse/SPARK-12350
> Project: Spark
>  Issue Type: Bug
>  Components: ML
> Environment: sparkShell command from sbt
>Reporter: Jakob Odersky
>
> Calling VectorAssembler.transform() initially throws an exception; subsequent 
> calls work.
> h3. Steps to reproduce
> In spark-shell,
> 1. Create a dummy dataframe and define an assembler
> {code}
> import org.apache.spark.ml.feature.VectorAssembler
> val df = sc.parallelize(List((1,2), (3,4))).toDF
> val assembler = new VectorAssembler().setInputCols(Array("_1", 
> "_2")).setOutputCol("features")
> {code}
> 2. Run
> {code}
> assembler.transform(df).show
> {code}
> Initially the following exception is thrown:
> {code}
> 15/12/15 16:20:19 ERROR TransportRequestHandler: Error opening stream 
> /classes/org/apache/spark/sql/catalyst/expressions/Object.class for request 
> from /9.72.139.102:60610
> java.lang.IllegalArgumentException: requirement failed: File not found: 
> /classes/org/apache/spark/sql/catalyst/expressions/Object.class
>   at scala.Predef$.require(Predef.scala:233)
>   at 
> org.apache.spark.rpc.netty.NettyStreamManager.openStream(NettyStreamManager.scala:60)
>   at 
> org.apache.spark.network.server.TransportRequestHandler.processStreamRequest(TransportRequestHandler.java:136)
>   at 
> org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:106)
>   at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:104)
>   at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
>   at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:86)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
>   at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> Subsequent calls work:
> {code}
> +---+---+-+
> | _1| _2| features|
> +---+---+-+
> |  1|  2|[1.0,2.0]|
> |  3|  4|[3.0,4.0]|
> +---+---+-+
> {code}
> It seems as though there is some internal state that is not initialized.
> [~iyounus] originally found this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-12374) Improve performance of Range APIs via adding logical/physical operators

2015-12-16 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12374:


Assignee: Apache Spark

> Improve performance of Range APIs via adding logical/physical operators
> ---
>
> Key: SPARK-12374
> URL: https://issues.apache.org/jira/browse/SPARK-12374
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Xiao Li
>Assignee: Apache Spark
>Priority: Critical
>
> Creating an actual logical/physical operator for range to match the 
> performance of the RDD range APIs. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-12364) Add ML example for SparkR

2015-12-16 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-12364:
--
Assignee: Yanbo Liang

> Add ML example for SparkR
> -
>
> Key: SPARK-12364
> URL: https://issues.apache.org/jira/browse/SPARK-12364
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, SparkR
>Reporter: Yanbo Liang
>Assignee: Yanbo Liang
>
> Add ML example for SparkR



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-12361) Should set PYSPARK_DRIVER_PYTHON before python test

2015-12-16 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen resolved SPARK-12361.

   Resolution: Fixed
Fix Version/s: 2.0.0

Fixed by https://github.com/apache/spark/pull/10322

> Should set PYSPARK_DRIVER_PYTHON before python test
> ---
>
> Key: SPARK-12361
> URL: https://issues.apache.org/jira/browse/SPARK-12361
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, Tests
>Affects Versions: 1.6.0
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
>Priority: Minor
> Fix For: 2.0.0
>
>
> If PYSPARK_DRIVER_PYTHON is not set, a python version mismatch exception may 
> happen (when I set PYSPARK_DRIVER_PYTHON in .profile). And the weird thing is 
> that this exception won't cause the unit test to fail: the return_code is still 
> 0, which hides the unit test failure. But if I invoke the test command 
> directly, I can see the return code is not 0. This is very weird. 
> * invoke unit test command directly
> {code}
> export SPARK_TESTING=1
> export PYSPARK_PYTHON=python2.6
> bin/pyspark pyspark.ml.clustering  
> {code}
> * return code from python unit test
> {code}
> retcode = subprocess.Popen(
> [os.path.join(SPARK_HOME, "bin/pyspark"), test_name],
> stderr=per_test_output, stdout=per_test_output, env=env).wait()
> {code}
> * exception of python version mismatch
> {code}
>  File "/Users/jzhang/github/spark/python/lib/pyspark.zip/pyspark/worker.py", 
> line 64, in main
> ("%d.%d" % sys.version_info[:2], version))
> Exception: Python in worker has different version 2.6 than that in driver 
> 2.7, PySpark cannot run with different minor versions
> at 
> org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:166)
> at 
> org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:207)
> at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:125)
> at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> at org.apache.spark.scheduler.Task.run(Task.scala:88)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-12373) Type coercion rule of dividing two decimal values may choose an intermediate precision that does not enough number of digits at the left of decimal point

2015-12-16 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-12373:
-
Summary: Type coercion rule of dividing two decimal values may choose an 
intermediate precision that does not enough number of digits at the left of 
decimal point   (was: Type coercion rule for dividing two decimal values may 
choose an intermediate precision that does not enough number of digits at the 
left of decimal point )

> Type coercion rule of dividing two decimal values may choose an intermediate 
> precision that does not enough number of digits at the left of decimal point 
> --
>
> Key: SPARK-12373
> URL: https://issues.apache.org/jira/browse/SPARK-12373
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Yin Huai
>
> Looks like the {{widerDecimalType}} at 
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala#L432
>  can produce something like {{(38, 38)}} when we have two operand types 
> {{Decimal(38, 0)}} and {{Decimal(38, 38)}}. We should take a look at whether 
> there is a more reasonable way to handle precision/scale.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-12364) Add ML example for SparkR

2015-12-16 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12364:


Assignee: Apache Spark  (was: Yanbo Liang)

> Add ML example for SparkR
> -
>
> Key: SPARK-12364
> URL: https://issues.apache.org/jira/browse/SPARK-12364
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, SparkR
>Reporter: Yanbo Liang
>Assignee: Apache Spark
>
> Add ML example for SparkR



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12326) Move GBT implementation from spark.mllib to spark.ml

2015-12-16 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060887#comment-15060887
 ] 

Joseph K. Bradley commented on SPARK-12326:
---

The plan sounds good.  The critical item is #1 of course since that will let us 
improve GBTs in spark.ml.

For #2, I'd also recommend we take this opportunity to make some of those 
helper classes private when possible (especially if they are only needed during 
training) and maybe change the APIs (especially if we can eliminate duplicate 
data stored in the final model).

Can you please make 1 subtask for each of these 4 steps? Thanks!

> Move GBT implementation from spark.mllib to spark.ml
> 
>
> Key: SPARK-12326
> URL: https://issues.apache.org/jira/browse/SPARK-12326
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, MLlib
>Reporter: Seth Hendrickson
>
> Several improvements can be made to gradient boosted trees, but they are not 
> possible without moving the GBT implementation to spark.ml (e.g. 
> rawPrediction column, feature importance). This Jira is for moving the 
> current GBT implementation to spark.ml, which will have roughly the following 
> steps:
> 1. Copy the implementation to spark.ml and change spark.ml classes to use 
> that implementation. Current tests will ensure that the implementations learn 
> exactly the same models. 
> 2. Move the decision tree helper classes over to spark.ml (e.g. Impurity, 
> InformationGainStats, ImpurityStats, DTStatsAggregator, etc...). Since 
> eventually all tree implementations will reside in spark.ml, the helper 
> classes should as well.
> 3. Remove the spark.mllib implementation, and make the spark.mllib APIs 
> wrappers around the spark.ml implementation. The spark.ml tests will again 
> ensure that we do not change any behavior.
> 4. Move the unit tests to spark.ml, and change the spark.mllib unit tests to 
> verify model equivalence.
> Steps 2, 3, and 4 should be in separate Jiras. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-12373) Type coercion rule for dividing two decimal values may choose an intermediate precision that does not enough number of digits at the left of decimal point

2015-12-16 Thread Yin Huai (JIRA)
Yin Huai created SPARK-12373:


 Summary: Type coercion rule for dividing two decimal values may 
choose an intermediate precision that does not enough number of digits at the 
left of decimal point 
 Key: SPARK-12373
 URL: https://issues.apache.org/jira/browse/SPARK-12373
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Yin Huai


Looks like the {{widerDecimalType}} at 
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala#L432
 can produce something like {{(38, 38)}} when we have two operand types 
{{Decimal(38, 0)}} and {{Decimal(38, 38)}}. We should take a look at whether 
there is a more reasonable way to handle precision/scale.
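
To make the failure mode concrete, a rough sketch of how a widening rule of this 
general shape ends up with no integer digits (the formula below is an assumption 
for illustration, not necessarily exactly what {{widerDecimalType}} does):

{code:none}
MAX_PRECISION = 38

def wider_decimal_type(p1, s1, p2, s2):
    # Assumed shape of the rule: keep the larger scale and the larger number
    # of integer digits, then cap the total precision at 38.
    scale = max(s1, s2)
    int_digits = max(p1 - s1, p2 - s2)
    return (min(int_digits + scale, MAX_PRECISION), scale)

print(wider_decimal_type(38, 0, 38, 38))  # (38, 38): no digits left of the decimal point
{code}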



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-11608) ML 1.6 QA: Programming guide update and migration guide

2015-12-16 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley resolved SPARK-11608.
---
   Resolution: Fixed
Fix Version/s: 1.6.1
   2.0.0

Issue resolved by pull request 10235
[https://github.com/apache/spark/pull/10235]

> ML 1.6 QA: Programming guide update and migration guide
> ---
>
> Key: SPARK-11608
> URL: https://issues.apache.org/jira/browse/SPARK-11608
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, ML, MLlib
>Reporter: Joseph K. Bradley
>Assignee: Joseph K. Bradley
> Fix For: 2.0.0, 1.6.1
>
>
> Before the release, we need to update the MLlib Programming Guide.  Updates 
> will include:
> * Add migration guide subsection.
> ** Use the results of the QA audit JIRAs.
> * Check phrasing, especially in main sections (for outdated items such as "In 
> this release, ...")
> * Possibly reorganize parts of the Pipelines guide if needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-12374) Improve performance of Range APIs via adding logical/physical operators

2015-12-16 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-12374:

Summary: Improve performance of Range APIs via adding logical/physical 
operators  (was: Improve performance of range API via adding logical/physical 
operators)

> Improve performance of Range APIs via adding logical/physical operators
> ---
>
> Key: SPARK-12374
> URL: https://issues.apache.org/jira/browse/SPARK-12374
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Xiao Li
>Priority: Critical
>
> Creating an actual logical/physical operator for range to match the 
> performance of the RDD range APIs. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12367) NoSuchElementException during prediction with Random Forest Regressor

2015-12-16 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060720#comment-15060720
 ] 

Joseph K. Bradley commented on SPARK-12367:
---

This is likely caused by a feature value 1.0 which did not appear in the 
training data.  That prevents VectorIndexer from knowing about that value, so 
it does not have a corresponding index when trying to transform the test data.  
It will be handled by [SPARK-12375].
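
A minimal sketch of that failure mode in pyspark (assuming a 1.5.x shell where 
{{sqlContext}} is predefined; the value 1.0 appears only in the test data):

{code:none}
from pyspark.ml.feature import VectorIndexer
from pyspark.mllib.linalg import Vectors

# Training data only ever contains feature values 0.0 and 2.0.
train = sqlContext.createDataFrame(
    [(Vectors.dense([0.0]),), (Vectors.dense([2.0]),)], ["features"])
test = sqlContext.createDataFrame([(Vectors.dense([1.0]),)], ["features"])

model = VectorIndexer(inputCol="features", outputCol="indexed",
                      maxCategories=4).fit(train)

# The unseen value 1.0 has no category index, which should reproduce the
# reported "java.util.NoSuchElementException: key not found: 1.0" failure.
model.transform(test).show()
{code}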

> NoSuchElementException during prediction with Random Forest Regressor
> -
>
> Key: SPARK-12367
> URL: https://issues.apache.org/jira/browse/SPARK-12367
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 1.5.2
>Reporter: Eugene Morozov
> Attachments: CodeThatGivesANoSuchElementException.java, 
> complete-stack-trace.log, input.gz
>
>
> I'm consistently getting "java.util.NoSuchElementException: key not found: 
> 1.0" while trying to do a prediction on a trained model.
> I use the ml package (Pipeline API). The model is trained successfully; I see 
> some stats in the output: total, findSplitsBins, findBestSplits, 
> chooseSplits. I can even serialize it into a file and use it afterwards, but the 
> prediction is broken somehow.
> Code, input data and stack trace attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-12367) NoSuchElementException during prediction with Random Forest Regressor

2015-12-16 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley closed SPARK-12367.
-
Resolution: Duplicate

> NoSuchElementException during prediction with Random Forest Regressor
> -
>
> Key: SPARK-12367
> URL: https://issues.apache.org/jira/browse/SPARK-12367
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 1.5.2
>Reporter: Eugene Morozov
> Attachments: CodeThatGivesANoSuchElementException.java, 
> complete-stack-trace.log, input.gz
>
>
> I'm consistently getting "java.util.NoSuchElementException: key not found: 
> 1.0" while trying to do a prediction on a trained model.
> I use the ml package (Pipeline API). The model is trained successfully; I see 
> some stats in the output: total, findSplitsBins, findBestSplits, 
> chooseSplits. I can even serialize it into a file and use it afterwards, but the 
> prediction is broken somehow.
> Code, input data and stack trace attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-12375) VectorIndexer: allow unknown categories

2015-12-16 Thread Joseph K. Bradley (JIRA)
Joseph K. Bradley created SPARK-12375:
-

 Summary: VectorIndexer: allow unknown categories
 Key: SPARK-12375
 URL: https://issues.apache.org/jira/browse/SPARK-12375
 Project: Spark
  Issue Type: Sub-task
  Components: ML
Reporter: Joseph K. Bradley


Add option for allowing unknown categories, probably via a parameter like 
"allowUnknownCategories."
If true, then handle unknown categories during transform by assigning them to 
an extra category index.

The API should resemble the API used for StringIndexer.
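
A tiny sketch of the proposed transform behaviour for a single categorical 
feature (plain Python, not VectorIndexer internals; the mapping values below are 
made up for illustration):

{code:none}
category_index = {0.0: 0, 2.0: 1}    # learned during fit
unknown_index = len(category_index)  # extra index reserved for unseen values

def index_value(v, allow_unknown=True):
    # Known categories keep their learned indices; anything unseen either maps
    # to the extra index (proposed behaviour) or fails (today's behaviour).
    if v in category_index:
        return category_index[v]
    if allow_unknown:
        return unknown_index
    raise KeyError("key not found: %s" % v)

print(index_value(1.0))  # 2 -> the unseen value gets the extra category index
# index_value(1.0, allow_unknown=False) would raise, matching today's behaviour.
{code}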



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12350) VectorAssembler#transform() initially throws an exception

2015-12-16 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060721#comment-15060721
 ] 

Marcelo Vanzin commented on SPARK-12350:


I understand where the exception is coming from; I'm asking whether any actual 
functionality is broken by this, or whether it's just about the ugly exception 
being printed to the terminal.

It seems there isn't, so it's just about silencing the exception.

> VectorAssembler#transform() initially throws an exception
> -
>
> Key: SPARK-12350
> URL: https://issues.apache.org/jira/browse/SPARK-12350
> Project: Spark
>  Issue Type: Bug
>  Components: ML
> Environment: sparkShell command from sbt
>Reporter: Jakob Odersky
>
> Calling VectorAssembler.transform() initially throws an exception; subsequent 
> calls work.
> h3. Steps to reproduce
> In spark-shell,
> 1. Create a dummy dataframe and define an assembler
> {code}
> import org.apache.spark.ml.feature.VectorAssembler
> val df = sc.parallelize(List((1,2), (3,4))).toDF
> val assembler = new VectorAssembler().setInputCols(Array("_1", 
> "_2")).setOutputCol("features")
> {code}
> 2. Run
> {code}
> assembler.transform(df).show
> {code}
> Initially the following exception is thrown:
> {code}
> 15/12/15 16:20:19 ERROR TransportRequestHandler: Error opening stream 
> /classes/org/apache/spark/sql/catalyst/expressions/Object.class for request 
> from /9.72.139.102:60610
> java.lang.IllegalArgumentException: requirement failed: File not found: 
> /classes/org/apache/spark/sql/catalyst/expressions/Object.class
>   at scala.Predef$.require(Predef.scala:233)
>   at 
> org.apache.spark.rpc.netty.NettyStreamManager.openStream(NettyStreamManager.scala:60)
>   at 
> org.apache.spark.network.server.TransportRequestHandler.processStreamRequest(TransportRequestHandler.java:136)
>   at 
> org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:106)
>   at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:104)
>   at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
>   at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:86)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
>   at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> Subsequent calls work:
> {code}
> +---+---+-+
> | _1| _2| features|
> +---+---+-+
> |  1|  2|[1.0,2.0]|
> |  3|  4|[3.0,4.0]|
> +---+---+-+
> {code}
> It seems as though there is some internal state that is not initialized.
> [~iyounus] originally found this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: 

[jira] [Commented] (SPARK-12363) PowerIterationClustering test case failed if we deprecated KMeans.setRuns

2015-12-16 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060865#comment-15060865
 ] 

Joseph K. Bradley commented on SPARK-12363:
---

Setting priority to Minor since we'll notice this bug when it becomes a bug.

> PowerIterationClustering test case failed if we deprecated KMeans.setRuns
> -
>
> Key: SPARK-12363
> URL: https://issues.apache.org/jira/browse/SPARK-12363
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Reporter: Yanbo Liang
>Priority: Minor
>
> We plan to deprecate the `runs` parameter of KMeans; PowerIterationClustering 
> leverages KMeans to train its model.
> I removed the `setRuns` call used in PowerIterationClustering, but one of the 
> test cases failed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-12273) Spark Streaming Web UI does not list Receivers in order

2015-12-16 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-12273:
-
Affects Version/s: 1.6.0

> Spark Streaming Web UI does not list Receivers in order
> ---
>
> Key: SPARK-12273
> URL: https://issues.apache.org/jira/browse/SPARK-12273
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming, Web UI
>Affects Versions: 1.5.2, 1.6.0
>Reporter: Liwei Lin
>Assignee: Liwei Lin
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: Spark-12273.png
>
>
> Currently the Streaming web UI does NOT list Receivers in order; it would be 
> more convenient for users if Receivers were listed in order.
> !Spark-12273.png!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-12057) Prevent failure on corrupt JSON records

2015-12-16 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai reassigned SPARK-12057:


Assignee: Yin Huai

> Prevent failure on corrupt JSON records
> ---
>
> Key: SPARK-12057
> URL: https://issues.apache.org/jira/browse/SPARK-12057
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Ian Macalinao
>Assignee: Yin Huai
>Priority: Minor
>
> Return a failed record when a record cannot be parsed. This allows parsing files 
> containing corrupt records of any form. Currently a corrupt record throws an 
> exception, causing the entire job to fail.
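
The general technique being asked for, sketched outside of Spark (the 
{{_corrupt_record}} column name here is just illustrative):

{code:none}
import json

def parse_lines(lines, corrupt_column="_corrupt_record"):
    # Keep going on corrupt records instead of failing the whole job:
    # surface the raw text of the bad record under a designated column.
    for line in lines:
        try:
            yield json.loads(line)
        except ValueError:
            yield {corrupt_column: line}

print(list(parse_lines(['{"a": 1}', '{broken'])))
# [{'a': 1}, {'_corrupt_record': '{broken'}]
{code}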



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12350) VectorAssembler#transform() initially throws an exception

2015-12-16 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060668#comment-15060668
 ] 

Marcelo Vanzin commented on SPARK-12350:


So, if I understand correctly, the issue is just the scary log message, not 
that anything is actually wrong with the functionality?

> VectorAssembler#transform() initially throws an exception
> -
>
> Key: SPARK-12350
> URL: https://issues.apache.org/jira/browse/SPARK-12350
> Project: Spark
>  Issue Type: Bug
>  Components: ML
> Environment: sparkShell command from sbt
>Reporter: Jakob Odersky
>
> Calling VectorAssembler.transform() initially throws an exception; subsequent 
> calls work.
> h3. Steps to reproduce
> In spark-shell,
> 1. Create a dummy dataframe and define an assembler
> {code}
> import org.apache.spark.ml.feature.VectorAssembler
> val df = sc.parallelize(List((1,2), (3,4))).toDF
> val assembler = new VectorAssembler().setInputCols(Array("_1", 
> "_2")).setOutputCol("features")
> {code}
> 2. Run
> {code}
> assembler.transform(df).show
> {code}
> Initially the following exception is thrown:
> {code}
> 15/12/15 16:20:19 ERROR TransportRequestHandler: Error opening stream 
> /classes/org/apache/spark/sql/catalyst/expressions/Object.class for request 
> from /9.72.139.102:60610
> java.lang.IllegalArgumentException: requirement failed: File not found: 
> /classes/org/apache/spark/sql/catalyst/expressions/Object.class
>   at scala.Predef$.require(Predef.scala:233)
>   at 
> org.apache.spark.rpc.netty.NettyStreamManager.openStream(NettyStreamManager.scala:60)
>   at 
> org.apache.spark.network.server.TransportRequestHandler.processStreamRequest(TransportRequestHandler.java:136)
>   at 
> org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:106)
>   at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:104)
>   at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
>   at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:86)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
>   at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> Subsequent calls work:
> {code}
> +---+---+-+
> | _1| _2| features|
> +---+---+-+
> |  1|  2|[1.0,2.0]|
> |  3|  4|[3.0,4.0]|
> +---+---+-+
> {code}
> It seems as though there is some internal state that is not initialized.
> [~iyounus] originally found this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-12374) Improve performance of Range APIs via adding logical/physical operators

2015-12-16 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-12374:

Issue Type: Improvement  (was: Bug)

> Improve performance of Range APIs via adding logical/physical operators
> ---
>
> Key: SPARK-12374
> URL: https://issues.apache.org/jira/browse/SPARK-12374
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Xiao Li
>Priority: Critical
>
> Creating an actual logical/physical operator for range to match the 
> performance of the RDD range APIs. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12350) VectorAssembler#transform() initially throws an exception

2015-12-16 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060705#comment-15060705
 ] 

Jakob Odersky commented on SPARK-12350:
---

The end result seems to work; the console is, however, spammed with error 
messages.
I think this is due to a `require` that fails in 
{{core/src/main/scala/org/apache/spark/rpc/netty/NettyStreamManager.scala}}, 
line 60.
See my comment on 
https://github.com/apache/spark/commit/4a46b8859d3314b5b45a67cdc5c81fecb6e9e78c#commitcomment-15024736

> VectorAssembler#transform() initially throws an exception
> -
>
> Key: SPARK-12350
> URL: https://issues.apache.org/jira/browse/SPARK-12350
> Project: Spark
>  Issue Type: Bug
>  Components: ML
> Environment: sparkShell command from sbt
>Reporter: Jakob Odersky
>
> Calling VectorAssembler.transform() initially throws an exception; subsequent 
> calls work.
> h3. Steps to reproduce
> In spark-shell,
> 1. Create a dummy dataframe and define an assembler
> {code}
> import org.apache.spark.ml.feature.VectorAssembler
> val df = sc.parallelize(List((1,2), (3,4))).toDF
> val assembler = new VectorAssembler().setInputCols(Array("_1", 
> "_2")).setOutputCol("features")
> {code}
> 2. Run
> {code}
> assembler.transform(df).show
> {code}
> Initially the following exception is thrown:
> {code}
> 15/12/15 16:20:19 ERROR TransportRequestHandler: Error opening stream 
> /classes/org/apache/spark/sql/catalyst/expressions/Object.class for request 
> from /9.72.139.102:60610
> java.lang.IllegalArgumentException: requirement failed: File not found: 
> /classes/org/apache/spark/sql/catalyst/expressions/Object.class
>   at scala.Predef$.require(Predef.scala:233)
>   at 
> org.apache.spark.rpc.netty.NettyStreamManager.openStream(NettyStreamManager.scala:60)
>   at 
> org.apache.spark.network.server.TransportRequestHandler.processStreamRequest(TransportRequestHandler.java:136)
>   at 
> org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:106)
>   at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:104)
>   at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
>   at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:86)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
>   at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> Subsequent calls work:
> {code}
> +---+---+-+
> | _1| _2| features|
> +---+---+-+
> |  1|  2|[1.0,2.0]|
> |  3|  4|[3.0,4.0]|
> +---+---+-+
> {code}
> It seems as though there is some internal state that is not initialized.
> [~iyounus] originally found this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: 

[jira] [Assigned] (SPARK-12374) Improve performance of Range APIs via adding logical/physical operators

2015-12-16 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12374:


Assignee: Apache Spark

> Improve performance of Range APIs via adding logical/physical operators
> ---
>
> Key: SPARK-12374
> URL: https://issues.apache.org/jira/browse/SPARK-12374
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Xiao Li
>Assignee: Apache Spark
>Priority: Critical
>
> Creating an actual logical/physical operator for range to match the 
> performance of the RDD range APIs. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-12374) Improve performance of Range APIs via adding logical/physical operators

2015-12-16 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12374:


Assignee: (was: Apache Spark)

> Improve performance of Range APIs via adding logical/physical operators
> ---
>
> Key: SPARK-12374
> URL: https://issues.apache.org/jira/browse/SPARK-12374
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Xiao Li
>Priority: Critical
>
> Creating an actual logical/physical operator for range to match the 
> performance of the RDD range APIs. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-12376) Spark Streaming Java8APISuite fails in assertOrderInvariantEquals method

2015-12-16 Thread Evan Chen (JIRA)
Evan Chen created SPARK-12376:
-

 Summary: Spark Streaming Java8APISuite fails in 
assertOrderInvariantEquals method
 Key: SPARK-12376
 URL: https://issues.apache.org/jira/browse/SPARK-12376
 Project: Spark
  Issue Type: Bug
  Components: Tests
Affects Versions: 1.6.0
 Environment: Oracle Java 64-bit (build 1.8.0_66-b17)
Reporter: Evan Chen
Priority: Minor


org.apache.spark.streaming.Java8APISuite.java is failing because the 
assertOrderInvariantEquals method tries to sort an immutable list.

Here are the errors:

Tests run: 27, Failures: 0, Errors: 4, Skipped: 0, Time elapsed: 5.948 sec <<< 
FAILURE! - in org.apache.spark.streaming.Java8APISuite
testMap(org.apache.spark.streaming.Java8APISuite)  Time elapsed: 0.217 sec  <<< 
ERROR!
java.lang.UnsupportedOperationException: null
at java.util.AbstractList.set(AbstractList.java:132)
at java.util.AbstractList$ListItr.set(AbstractList.java:426)
at java.util.List.sort(List.java:482)
at java.util.Collections.sort(Collections.java:141)
at 
org.apache.spark.streaming.Java8APISuite.lambda$assertOrderInvariantEquals$1(Java8APISuite.java:444)

testFlatMap(org.apache.spark.streaming.Java8APISuite)  Time elapsed: 0.203 sec  
<<< ERROR!
java.lang.UnsupportedOperationException: null
at java.util.AbstractList.set(AbstractList.java:132)
at java.util.AbstractList$ListItr.set(AbstractList.java:426)
at java.util.List.sort(List.java:482)
at java.util.Collections.sort(Collections.java:141)
at 
org.apache.spark.streaming.Java8APISuite.lambda$assertOrderInvariantEquals$1(Java8APISuite.java:444)

testFilter(org.apache.spark.streaming.Java8APISuite)  Time elapsed: 0.209 sec  
<<< ERROR!
java.lang.UnsupportedOperationException: null
at java.util.AbstractList.set(AbstractList.java:132)
at java.util.AbstractList$ListItr.set(AbstractList.java:426)
at java.util.List.sort(List.java:482)
at java.util.Collections.sort(Collections.java:141)
at 
org.apache.spark.streaming.Java8APISuite.lambda$assertOrderInvariantEquals$1(Java8APISuite.java:444)

testTransform(org.apache.spark.streaming.Java8APISuite)  Time elapsed: 0.215 
sec  <<< ERROR!
java.lang.UnsupportedOperationException: null
at java.util.AbstractList.set(AbstractList.java:132)
at java.util.AbstractList$ListItr.set(AbstractList.java:426)
at java.util.List.sort(List.java:482)
at java.util.Collections.sort(Collections.java:141)
at 
org.apache.spark.streaming.Java8APISuite.lambda$assertOrderInvariantEquals$1(Java8APISuite.java:444)


Results :

Tests in error: 
  
Java8APISuite.testFilter:81->assertOrderInvariantEquals:444->lambda$assertOrderInvariantEquals$1:444
 » UnsupportedOperation
  
Java8APISuite.testFlatMap:360->assertOrderInvariantEquals:444->lambda$assertOrderInvariantEquals$1:444
 » UnsupportedOperation
  
Java8APISuite.testMap:63->assertOrderInvariantEquals:444->lambda$assertOrderInvariantEquals$1:444
 » UnsupportedOperation
  
Java8APISuite.testTransform:168->assertOrderInvariantEquals:444->lambda$assertOrderInvariantEquals$1:444
 » UnsupportedOperation
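
The failure comes from sorting the incoming lists in place (Collections.sort / 
List.sort), which an immutable backing list rejects; sorting copies avoids that. 
The same idea in a short sketch (Python here for brevity; the actual suite is Java):

{code:none}
def assert_order_invariant_equals(expected, actual):
    # Compare sorted copies instead of sorting the inputs in place,
    # so immutable inputs (e.g. tuples) are handled fine.
    assert [sorted(batch) for batch in expected] == [sorted(batch) for batch in actual]

assert_order_invariant_equals([(3, 1, 2)], [[1, 2, 3]])
{code}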



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12350) VectorAssembler#transform() initially throws an exception

2015-12-16 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060788#comment-15060788
 ] 

Marcelo Vanzin commented on SPARK-12350:


Well, that's what the fix will be.

> VectorAssembler#transform() initially throws an exception
> -
>
> Key: SPARK-12350
> URL: https://issues.apache.org/jira/browse/SPARK-12350
> Project: Spark
>  Issue Type: Bug
>  Components: ML
> Environment: sparkShell command from sbt
>Reporter: Jakob Odersky
>
> Calling VectorAssembler.transform() initially throws an exception; subsequent 
> calls work.
> h3. Steps to reproduce
> In spark-shell,
> 1. Create a dummy dataframe and define an assembler
> {code}
> import org.apache.spark.ml.feature.VectorAssembler
> val df = sc.parallelize(List((1,2), (3,4))).toDF
> val assembler = new VectorAssembler().setInputCols(Array("_1", 
> "_2")).setOutputCol("features")
> {code}
> 2. Run
> {code}
> assembler.transform(df).show
> {code}
> Initially the following exception is thrown:
> {code}
> 15/12/15 16:20:19 ERROR TransportRequestHandler: Error opening stream 
> /classes/org/apache/spark/sql/catalyst/expressions/Object.class for request 
> from /9.72.139.102:60610
> java.lang.IllegalArgumentException: requirement failed: File not found: 
> /classes/org/apache/spark/sql/catalyst/expressions/Object.class
>   at scala.Predef$.require(Predef.scala:233)
>   at 
> org.apache.spark.rpc.netty.NettyStreamManager.openStream(NettyStreamManager.scala:60)
>   at 
> org.apache.spark.network.server.TransportRequestHandler.processStreamRequest(TransportRequestHandler.java:136)
>   at 
> org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:106)
>   at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:104)
>   at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
>   at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:86)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
>   at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> Subsequent calls work:
> {code}
> +---+---+-+
> | _1| _2| features|
> +---+---+-+
> |  1|  2|[1.0,2.0]|
> |  3|  4|[3.0,4.0]|
> +---+---+-+
> {code}
> It seems as though there is some internal state that is not initialized.
> [~iyounus] originally found this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-12377) Wrong implementation for Row.__call__ in pyspark

2015-12-16 Thread Irakli Machabeli (JIRA)
Irakli Machabeli created SPARK-12377:


 Summary: Wrong implementation for Row.__call__ in pyspark
 Key: SPARK-12377
 URL: https://issues.apache.org/jira/browse/SPARK-12377
 Project: Spark
  Issue Type: Bug
  Components: PySpark, SQL
Reporter: Irakli Machabeli


Current code

def __call__(self, *args):
    """create new Row object"""
    return _create_row(self, args)


has to be 

def __call__(self, *args):
    """create new Row object"""
    return _create_row(self.__fields__, args)
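
For context, a sketch of the case the report appears to target: a Row built from 
keyword arguments and then called to produce a new Row with the same field names.

{code:none}
from pyspark.sql import Row

r = Row(name="Alice", age=11)
r2 = r("Bob", 12)
# With _create_row(self, args) the row's current *values* are passed where its
# field names belong, so r2 ends up with wrong field names;
# _create_row(self.__fields__, args) binds the new values to the original names.
{code}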



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-12345) Mesos cluster mode is broken when SPARK_HOME is set

2015-12-16 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-12345:
--
Summary: Mesos cluster mode is broken when SPARK_HOME is set  (was: Mesos 
cluster mode is broken)

> Mesos cluster mode is broken when SPARK_HOME is set
> ---
>
> Key: SPARK-12345
> URL: https://issues.apache.org/jira/browse/SPARK-12345
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 1.6.0
>Reporter: Andrew Or
>Assignee: Apache Spark
>Priority: Critical
>
> The same setup worked in 1.5.2 but is now failing for 1.6.0-RC2.
> The driver is confused about where SPARK_HOME is. It resolves 
> `mesos.executor.uri` or `spark.mesos.executor.home` relative to the 
> filesystem where the driver runs, which is wrong.
> {code}
> I1215 15:00:39.411212 28032 exec.cpp:134] Version: 0.25.0
> I1215 15:00:39.413512 28037 exec.cpp:208] Executor registered on slave 
> 130bdc39-44e7-4256-8c22-602040d337f1-S1
> bin/spark-submit: line 27: 
> /Users/dragos/workspace/Spark/dev/rc-tests/spark-1.6.0-bin-hadoop2.6/bin/spark-class:
>  No such file or directory
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-12345) Mesos cluster mode is broken

2015-12-16 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-12345:
--
Summary: Mesos cluster mode is broken  (was: Mesos cluster mode is broken 
when SPARK_HOME is set)

> Mesos cluster mode is broken
> 
>
> Key: SPARK-12345
> URL: https://issues.apache.org/jira/browse/SPARK-12345
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 1.6.0
>Reporter: Andrew Or
>Assignee: Apache Spark
>Priority: Critical
>
> The same setup worked in 1.5.2 but is now failing for 1.6.0-RC2.
> The driver is confused about where SPARK_HOME is. It resolves 
> `mesos.executor.uri` or `spark.mesos.executor.home` relative to the 
> filesystem where the driver runs, which is wrong.
> {code}
> I1215 15:00:39.411212 28032 exec.cpp:134] Version: 0.25.0
> I1215 15:00:39.413512 28037 exec.cpp:208] Executor registered on slave 
> 130bdc39-44e7-4256-8c22-602040d337f1-S1
> bin/spark-submit: line 27: 
> /Users/dragos/workspace/Spark/dev/rc-tests/spark-1.6.0-bin-hadoop2.6/bin/spark-class:
>  No such file or directory
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-12289) Support UnsafeRow in TakeOrderedAndProject/Limit

2015-12-16 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12289:


Assignee: Apache Spark

> Support UnsafeRow in TakeOrderedAndProject/Limit
> 
>
> Key: SPARK-12289
> URL: https://issues.apache.org/jira/browse/SPARK-12289
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Davies Liu
>Assignee: Apache Spark
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12289) Support UnsafeRow in TakeOrderedAndProject/Limit

2015-12-16 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060287#comment-15060287
 ] 

Apache Spark commented on SPARK-12289:
--

User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/10330

> Support UnsafeRow in TakeOrderedAndProject/Limit
> 
>
> Key: SPARK-12289
> URL: https://issues.apache.org/jira/browse/SPARK-12289
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Davies Liu
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-12318) Save mode in SparkR should be error by default

2015-12-16 Thread Shivaram Venkataraman (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shivaram Venkataraman updated SPARK-12318:
--
Assignee: Jeff Zhang

> Save mode in SparkR should be error by default
> --
>
> Key: SPARK-12318
> URL: https://issues.apache.org/jira/browse/SPARK-12318
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 1.5.2
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
>Priority: Minor
> Fix For: 2.0.0
>
>
> The save mode in SparkR should be consistent with that of the Scala API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-12345) Mesos cluster mode is broken

2015-12-16 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-12345.
---
Resolution: Fixed
  Assignee: Luc Bourlier  (was: Apache Spark)

> Mesos cluster mode is broken
> 
>
> Key: SPARK-12345
> URL: https://issues.apache.org/jira/browse/SPARK-12345
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 1.6.0
>Reporter: Andrew Or
>Assignee: Luc Bourlier
>Priority: Critical
> Fix For: 1.6.0
>
>
> The same setup worked in 1.5.2 but is now failing for 1.6.0-RC2.
> The driver is confused about where SPARK_HOME is. It resolves 
> `mesos.executor.uri` or `spark.mesos.executor.home` relative to the 
> filesystem where the driver runs, which is wrong.
> {code}
> I1215 15:00:39.411212 28032 exec.cpp:134] Version: 0.25.0
> I1215 15:00:39.413512 28037 exec.cpp:208] Executor registered on slave 
> 130bdc39-44e7-4256-8c22-602040d337f1-S1
> bin/spark-submit: line 27: 
> /Users/dragos/workspace/Spark/dev/rc-tests/spark-1.6.0-bin-hadoop2.6/bin/spark-class:
>  No such file or directory
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-12371) Make sure Dataset nullability conforms to its underlying logical plan

2015-12-16 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12371:


Assignee: Apache Spark  (was: Cheng Lian)

> Make sure Dataset nullability conforms to its underlying logical plan
> -
>
> Key: SPARK-12371
> URL: https://issues.apache.org/jira/browse/SPARK-12371
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 1.6.0, 2.0.0
>Reporter: Cheng Lian
>Assignee: Apache Spark
>
> Currently it's possible to construct a Dataset with different nullability 
> from its underlying logical plan, which should be caught during analysis 
> phase:
> {code}
> val rowRDD = sqlContext.sparkContext.parallelize(Seq(Row("hello"), Row(null)))
> val schema = StructType(Seq(StructField("_1", StringType, nullable = false)))
> val df = sqlContext.createDataFrame(rowRDD, schema)
> df.as[Tuple1[String]].collect().foreach(println)
> // Output:
> //
> //   (hello)
> //   (null)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-12324) The documentation sidebar does not collapse properly

2015-12-16 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley resolved SPARK-12324.
---
   Resolution: Fixed
Fix Version/s: 1.6.1
   2.0.0

Issue resolved by pull request 10297
[https://github.com/apache/spark/pull/10297]

> The documentation sidebar does not collapse properly
> 
>
> Key: SPARK-12324
> URL: https://issues.apache.org/jira/browse/SPARK-12324
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.5.2
>Reporter: Timothy Hunter
>Assignee: Timothy Hunter
>Priority: Minor
> Fix For: 2.0.0, 1.6.1
>
> Attachments: Screen Shot 2015-12-14 at 12.29.57 PM.png
>
>
> When the browser's window is reduced horizontally, the sidebar slides under 
> the main content and does not collapse:
>  - hide the sidebar when the browser's width is not large enough
>  - add a button to show and hide the sidebar



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-12345) Mesos cluster mode is broken

2015-12-16 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-12345:
--
Fix Version/s: 1.6.0

> Mesos cluster mode is broken
> 
>
> Key: SPARK-12345
> URL: https://issues.apache.org/jira/browse/SPARK-12345
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 1.6.0
>Reporter: Andrew Or
>Assignee: Apache Spark
>Priority: Critical
> Fix For: 1.6.0
>
>
> The same setup worked in 1.5.2 but is now failing for 1.6.0-RC2.
> The driver is confused about where SPARK_HOME is. It resolves 
> `mesos.executor.uri` or `spark.mesos.executor.home` relative to the 
> filesystem where the driver runs, which is wrong.
> {code}
> I1215 15:00:39.411212 28032 exec.cpp:134] Version: 0.25.0
> I1215 15:00:39.413512 28037 exec.cpp:208] Executor registered on slave 
> 130bdc39-44e7-4256-8c22-602040d337f1-S1
> bin/spark-submit: line 27: 
> /Users/dragos/workspace/Spark/dev/rc-tests/spark-1.6.0-bin-hadoop2.6/bin/spark-class:
>  No such file or directory
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-6518) Add example code and user guide for bisecting k-means

2015-12-16 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley resolved SPARK-6518.
--
   Resolution: Fixed
Fix Version/s: 1.6.1
   2.0.0

Issue resolved by pull request 9952
[https://github.com/apache/spark/pull/9952]

> Add example code and user guide for bisecting k-means
> -
>
> Key: SPARK-6518
> URL: https://issues.apache.org/jira/browse/SPARK-6518
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, MLlib
>Reporter: Yu Ishikawa
>Assignee: Yu Ishikawa
> Fix For: 2.0.0, 1.6.1
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-12309) Use sqlContext from MLlibTestSparkContext for spark.ml test suites

2015-12-16 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley resolved SPARK-12309.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 10279
[https://github.com/apache/spark/pull/10279]

> Use sqlContext from MLlibTestSparkContext for spark.ml test suites
> --
>
> Key: SPARK-12309
> URL: https://issues.apache.org/jira/browse/SPARK-12309
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Yanbo Liang
>Assignee: Yanbo Liang
> Fix For: 2.0.0
>
>
> Use the sqlContext from MLlibTestSparkContext rather than creating a new one for 
> spark.ml test cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10951) Support private S3 repositories using spark-submit via --repositories flag

2015-12-16 Thread Jerry Lam (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060451#comment-15060451
 ] 

Jerry Lam commented on SPARK-10951:
---

Any chance to have this feature in 1.6? :)

> Support private S3 repositories using spark-submit via --repositories flag
> --
>
> Key: SPARK-10951
> URL: https://issues.apache.org/jira/browse/SPARK-10951
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Submit
>Affects Versions: 1.5.1
>Reporter: Jerry Lam
>
> Currently spark-submit allows users to specify remote repositories using 
> --repositories as a means for --packages to handle jar dependencies.
> However, the remote repositories do not include private S3 repositories, 
> which require AWS credentials. It would be great to include an S3 resolver to 
> handle private S3 repositories. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-12215) User guide section for KMeans in spark.ml

2015-12-16 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley resolved SPARK-12215.
---
   Resolution: Fixed
Fix Version/s: 1.6.1
   2.0.0

Issue resolved by pull request 10244
[https://github.com/apache/spark/pull/10244]

> User guide section for KMeans in spark.ml
> -
>
> Key: SPARK-12215
> URL: https://issues.apache.org/jira/browse/SPARK-12215
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, ML
>Reporter: Joseph K. Bradley
>Assignee: Yu Ishikawa
> Fix For: 2.0.0, 1.6.1
>
>
> [~yuu.ishik...@gmail.com] Will you have time to add a user guide section for 
> this?  Thanks in advance!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12360) Support using 64-bit long type in SparkR

2015-12-16 Thread Shivaram Venkataraman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060381#comment-15060381
 ] 

Shivaram Venkataraman commented on SPARK-12360:
---

The lack of 64-bit numbers is a limitation in R, but I'd like to understand the 
use cases where this comes up before trying a complex fix. My understanding is 
that long values from JSON / HDFS / Parquet etc. will be read correctly because 
they go through the Scala layers, and the problem only comes up when somebody 
does a collect / UDF? If so, I think the problem may not be that important, as R 
users probably wouldn't expect long types to work in the R shell. 

It might also lead to another solution where we don't add a dependency on 
bit64, but instead check whether bit64 is available and, if so, avoid the 
truncation to double.
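
For context, the truncation mentioned in the description is plain IEEE-754 behaviour and is easy to reproduce outside Spark (nothing Spark-specific is assumed here):

{code}
# Above 2**53 a double can no longer represent every integer, so a 64-bit long
# stored as an R numeric (a double) loses information.
x = 2 ** 53
print(float(x) == float(x + 1))  # True: x and x + 1 collapse to the same double
{code}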

> Support using 64-bit long type in SparkR
> 
>
> Key: SPARK-12360
> URL: https://issues.apache.org/jira/browse/SPARK-12360
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Affects Versions: 1.5.2
>Reporter: Sun Rui
>
> R has no support for 64-bit integers, while in the Scala/Java API some methods 
> have one or more arguments of long type. Currently we only support passing an 
> integer, cast from a numeric on the R side, to the Scala/Java side for long-type 
> parameters of such methods. This may be a problem for large data sets.
> Storing a 64-bit integer in a double obviously does not work, as some 64-bit 
> integers cannot be exactly represented in double format, so x and x+1 can't 
> be distinguished.
> There is a bit64 package 
> (https://cran.r-project.org/web/packages/bit64/index.html) on CRAN which 
> supports vectors of 64-bit integers. We can investigate whether it can be used for 
> this purpose.
> Two questions are:
> 1. Is the license acceptable?
> 2. This would make SparkR depend on a non-base third-party package, which 
> may complicate the deployment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-12318) Save mode in SparkR should be error by default

2015-12-16 Thread Shivaram Venkataraman (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shivaram Venkataraman resolved SPARK-12318.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 10290
[https://github.com/apache/spark/pull/10290]

> Save mode in SparkR should be error by default
> --
>
> Key: SPARK-12318
> URL: https://issues.apache.org/jira/browse/SPARK-12318
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 1.5.2
>Reporter: Jeff Zhang
>Priority: Minor
> Fix For: 2.0.0
>
>
> The save mode in SparkR should be consistent with that of the Scala API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-8745) Remove GenerateProjection

2015-12-16 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu resolved SPARK-8745.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 10316
[https://github.com/apache/spark/pull/10316]

> Remove GenerateProjection
> -
>
> Key: SPARK-8745
> URL: https://issues.apache.org/jira/browse/SPARK-8745
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Davies Liu
> Fix For: 2.0.0
>
>
> Based on discussion offline with [~marmbrus], we should remove 
> GenerateProjection.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-12310) Add write.json and write.parquet for SparkR

2015-12-16 Thread Shivaram Venkataraman (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shivaram Venkataraman resolved SPARK-12310.
---
   Resolution: Fixed
 Assignee: Yanbo Liang  (was: Apache Spark)
Fix Version/s: 2.0.0
   1.6.1

Resolved by https://github.com/apache/spark/pull/10281

> Add write.json and write.parquet for SparkR
> ---
>
> Key: SPARK-12310
> URL: https://issues.apache.org/jira/browse/SPARK-12310
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Reporter: Yanbo Liang
>Assignee: Yanbo Liang
> Fix For: 1.6.1, 2.0.0
>
>
> Add write.json and write.parquet for SparkR



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-12350) VectorAssembler#transform() initially throws an exception

2015-12-16 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060500#comment-15060500
 ] 

Jakob Odersky edited comment on SPARK-12350 at 12/16/15 6:55 PM:
-

You're right, somewhere in the huge stack trace output I also see the dataframe 
displayed as a table.
The error only occurs in the latest upstream.


was (Author: jodersky):
You're right, somewhere in the huge stack trace output I also see the dataframe 
displayed as a table

> VectorAssembler#transform() initially throws an exception
> -
>
> Key: SPARK-12350
> URL: https://issues.apache.org/jira/browse/SPARK-12350
> Project: Spark
>  Issue Type: Bug
>  Components: ML
> Environment: sparkShell command from sbt
>Reporter: Jakob Odersky
>
> Calling VectorAssembler.transform() initially throws an exception, subsequent 
> calls work.
> h3. Steps to reproduce
> In spark-shell,
> 1. Create a dummy dataframe and define an assembler
> {code}
> import org.apache.spark.ml.feature.VectorAssembler
> val df = sc.parallelize(List((1,2), (3,4))).toDF
> val assembler = new VectorAssembler().setInputCols(Array("_1", 
> "_2")).setOutputCol("features")
> {code}
> 2. Run
> {code}
> assembler.transform(df).show
> {code}
> Initially the following exception is thrown:
> {code}
> 15/12/15 16:20:19 ERROR TransportRequestHandler: Error opening stream 
> /classes/org/apache/spark/sql/catalyst/expressions/Object.class for request 
> from /9.72.139.102:60610
> java.lang.IllegalArgumentException: requirement failed: File not found: 
> /classes/org/apache/spark/sql/catalyst/expressions/Object.class
>   at scala.Predef$.require(Predef.scala:233)
>   at 
> org.apache.spark.rpc.netty.NettyStreamManager.openStream(NettyStreamManager.scala:60)
>   at 
> org.apache.spark.network.server.TransportRequestHandler.processStreamRequest(TransportRequestHandler.java:136)
>   at 
> org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:106)
>   at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:104)
>   at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
>   at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:86)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
>   at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> Subsequent calls work:
> {code}
> +---+---+-+
> | _1| _2| features|
> +---+---+-+
> |  1|  2|[1.0,2.0]|
> |  3|  4|[3.0,4.0]|
> +---+---+-+
> {code}
> It seems as though there is some internal state that is not initialized.
> [~iyounus] originally found this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-12345) Mesos cluster mode is broken

2015-12-16 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-12345:
--
Target Version/s: 1.6.0  (was: 1.6.1)

> Mesos cluster mode is broken
> 
>
> Key: SPARK-12345
> URL: https://issues.apache.org/jira/browse/SPARK-12345
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 1.6.0
>Reporter: Andrew Or
>Assignee: Apache Spark
>Priority: Critical
> Fix For: 1.6.0
>
>
> The same setup worked in 1.5.2 but is now failing for 1.6.0-RC2.
> The driver is confused about where SPARK_HOME is. It resolves 
> `mesos.executor.uri` or `spark.mesos.executor.home` relative to the 
> filesystem where the driver runs, which is wrong.
> {code}
> I1215 15:00:39.411212 28032 exec.cpp:134] Version: 0.25.0
> I1215 15:00:39.413512 28037 exec.cpp:208] Executor registered on slave 
> 130bdc39-44e7-4256-8c22-602040d337f1-S1
> bin/spark-submit: line 27: 
> /Users/dragos/workspace/Spark/dev/rc-tests/spark-1.6.0-bin-hadoop2.6/bin/spark-class:
>  No such file or directory
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12350) VectorAssembler#transform() initially throws an exception

2015-12-16 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060500#comment-15060500
 ] 

Jakob Odersky commented on SPARK-12350:
---

You're right, somewhere in the huge stack trace output I also see the dataframe 
displayed as a table.

> VectorAssembler#transform() initially throws an exception
> -
>
> Key: SPARK-12350
> URL: https://issues.apache.org/jira/browse/SPARK-12350
> Project: Spark
>  Issue Type: Bug
>  Components: ML
> Environment: sparkShell command from sbt
>Reporter: Jakob Odersky
>
> Calling VectorAssembler.transform() initially throws an exception, subsequent 
> calls work.
> h3. Steps to reproduce
> In spark-shell,
> 1. Create a dummy dataframe and define an assembler
> {code}
> import org.apache.spark.ml.feature.VectorAssembler
> val df = sc.parallelize(List((1,2), (3,4))).toDF
> val assembler = new VectorAssembler().setInputCols(Array("_1", 
> "_2")).setOutputCol("features")
> {code}
> 2. Run
> {code}
> assembler.transform(df).show
> {code}
> Initially the following exception is thrown:
> {code}
> 15/12/15 16:20:19 ERROR TransportRequestHandler: Error opening stream 
> /classes/org/apache/spark/sql/catalyst/expressions/Object.class for request 
> from /9.72.139.102:60610
> java.lang.IllegalArgumentException: requirement failed: File not found: 
> /classes/org/apache/spark/sql/catalyst/expressions/Object.class
>   at scala.Predef$.require(Predef.scala:233)
>   at 
> org.apache.spark.rpc.netty.NettyStreamManager.openStream(NettyStreamManager.scala:60)
>   at 
> org.apache.spark.network.server.TransportRequestHandler.processStreamRequest(TransportRequestHandler.java:136)
>   at 
> org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:106)
>   at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:104)
>   at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
>   at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:86)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
>   at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> Subsequent calls work:
> {code}
> +---+---+-+
> | _1| _2| features|
> +---+---+-+
> |  1|  2|[1.0,2.0]|
> |  3|  4|[3.0,4.0]|
> +---+---+-+
> {code}
> It seems as though there is some internal state that is not initialized.
> [~iyounus] originally found this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-12324) The documentation sidebar does not collapse properly

2015-12-16 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-12324:
--
Target Version/s: 1.6.1, 2.0.0

> The documentation sidebar does not collapse properly
> 
>
> Key: SPARK-12324
> URL: https://issues.apache.org/jira/browse/SPARK-12324
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.5.2
>Reporter: Timothy Hunter
>Assignee: Timothy Hunter
>Priority: Minor
> Attachments: Screen Shot 2015-12-14 at 12.29.57 PM.png
>
>
> When the browser's window is reduced horizontally, the sidebar slides under 
> the main content and does not collapse:
>  - hide the sidebar when the browser's width is not large enough
>  - add a button to show and hide the sidebar



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-12372) Unary operator "-" fails for MLlib vectors

2015-12-16 Thread Christos Iraklis Tsatsoulis (JIRA)
Christos Iraklis Tsatsoulis created SPARK-12372:
---

 Summary: Unary operator "-" fails for MLlib vectors
 Key: SPARK-12372
 URL: https://issues.apache.org/jira/browse/SPARK-12372
 Project: Spark
  Issue Type: Bug
  Components: MLlib, PySpark
Affects Versions: 1.5.2
Reporter: Christos Iraklis Tsatsoulis


Consider the following snippet in pyspark 1.5.2:

{code:none}
>>> from pyspark.mllib.linalg import Vectors
>>> x = Vectors.dense([0.0, 1.0, 0.0, 7.0, 0.0])
>>> x
DenseVector([0.0, 1.0, 0.0, 7.0, 0.0])
>>> -x
Traceback (most recent call last):
  File "", line 1, in 
TypeError: func() takes exactly 2 arguments (1 given)
>>> y = Vectors.dense([2.0, 0.0, 3.0, 4.0, 5.0])
>>> y
DenseVector([2.0, 0.0, 3.0, 4.0, 5.0])
>>> x-y
DenseVector([-2.0, 1.0, -3.0, 3.0, -5.0])
>>> -y+x
Traceback (most recent call last):
  File "", line 1, in 
TypeError: func() takes exactly 2 arguments (1 given)
>>> -1*x
DenseVector([-0.0, -1.0, -0.0, -7.0, -0.0])
{code}

Clearly, the unary operator {{-}} (minus) for vectors fails, giving errors for 
expressions like {{-x}} and {{-y+x}}, despite the fact that {{x-y}} behaves as 
expected.
The last operation, {{-1*x}}, although mathematically "correct", includes minus 
signs for the zero entries, which again is normally not expected.
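
Until this is fixed, a hedged workaround sketch (it only assumes the public DenseVector constructor and toArray method, and is not the actual fix):

{code:none}
from pyspark.mllib.linalg import DenseVector

x = DenseVector([0.0, 1.0, 0.0, 7.0, 0.0])

# Negate via the underlying NumPy array; adding 0.0 turns the IEEE "-0.0"
# entries produced by the negation back into plain 0.0.
neg_x = DenseVector(-x.toArray() + 0.0)
print(neg_x)  # [0.0,-1.0,0.0,-7.0,0.0]
{code}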



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-9694) Add random seed Param to Scala CrossValidator

2015-12-16 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley resolved SPARK-9694.
--
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 9108
[https://github.com/apache/spark/pull/9108]

> Add random seed Param to Scala CrossValidator
> -
>
> Key: SPARK-9694
> URL: https://issues.apache.org/jira/browse/SPARK-9694
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Reporter: Joseph K. Bradley
>Assignee: Yanbo Liang
>Priority: Minor
> Fix For: 2.0.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-12376) Spark Streaming Java8APISuite fails in assertOrderInvariantEquals method

2015-12-16 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12376:


Assignee: (was: Apache Spark)

> Spark Streaming Java8APISuite fails in assertOrderInvariantEquals method
> 
>
> Key: SPARK-12376
> URL: https://issues.apache.org/jira/browse/SPARK-12376
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.6.0
> Environment: Oracle Java 64-bit (build 1.8.0_66-b17)
>Reporter: Evan Chen
>Priority: Minor
>
> org.apache.spark.streaming.Java8APISuite.java is failing due to trying to 
> sort an immutable list in the assertOrderInvariantEquals method.
> Here are the errors:
> Tests run: 27, Failures: 0, Errors: 4, Skipped: 0, Time elapsed: 5.948 sec 
> <<< FAILURE! - in org.apache.spark.streaming.Java8APISuite
> testMap(org.apache.spark.streaming.Java8APISuite)  Time elapsed: 0.217 sec  
> <<< ERROR!
> java.lang.UnsupportedOperationException: null
>   at java.util.AbstractList.set(AbstractList.java:132)
>   at java.util.AbstractList$ListItr.set(AbstractList.java:426)
>   at java.util.List.sort(List.java:482)
>   at java.util.Collections.sort(Collections.java:141)
>   at 
> org.apache.spark.streaming.Java8APISuite.lambda$assertOrderInvariantEquals$1(Java8APISuite.java:444)
> testFlatMap(org.apache.spark.streaming.Java8APISuite)  Time elapsed: 0.203 
> sec  <<< ERROR!
> java.lang.UnsupportedOperationException: null
>   at java.util.AbstractList.set(AbstractList.java:132)
>   at java.util.AbstractList$ListItr.set(AbstractList.java:426)
>   at java.util.List.sort(List.java:482)
>   at java.util.Collections.sort(Collections.java:141)
>   at 
> org.apache.spark.streaming.Java8APISuite.lambda$assertOrderInvariantEquals$1(Java8APISuite.java:444)
> testFilter(org.apache.spark.streaming.Java8APISuite)  Time elapsed: 0.209 sec 
>  <<< ERROR!
> java.lang.UnsupportedOperationException: null
>   at java.util.AbstractList.set(AbstractList.java:132)
>   at java.util.AbstractList$ListItr.set(AbstractList.java:426)
>   at java.util.List.sort(List.java:482)
>   at java.util.Collections.sort(Collections.java:141)
>   at 
> org.apache.spark.streaming.Java8APISuite.lambda$assertOrderInvariantEquals$1(Java8APISuite.java:444)
> testTransform(org.apache.spark.streaming.Java8APISuite)  Time elapsed: 0.215 
> sec  <<< ERROR!
> java.lang.UnsupportedOperationException: null
>   at java.util.AbstractList.set(AbstractList.java:132)
>   at java.util.AbstractList$ListItr.set(AbstractList.java:426)
>   at java.util.List.sort(List.java:482)
>   at java.util.Collections.sort(Collections.java:141)
>   at 
> org.apache.spark.streaming.Java8APISuite.lambda$assertOrderInvariantEquals$1(Java8APISuite.java:444)
> Results :
> Tests in error: 
>   
> Java8APISuite.testFilter:81->assertOrderInvariantEquals:444->lambda$assertOrderInvariantEquals$1:444
>  » UnsupportedOperation
>   
> Java8APISuite.testFlatMap:360->assertOrderInvariantEquals:444->lambda$assertOrderInvariantEquals$1:444
>  » UnsupportedOperation
>   
> Java8APISuite.testMap:63->assertOrderInvariantEquals:444->lambda$assertOrderInvariantEquals$1:444
>  » UnsupportedOperation
>   
> Java8APISuite.testTransform:168->assertOrderInvariantEquals:444->lambda$assertOrderInvariantEquals$1:444
>  » UnsupportedOperation



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-12376) Spark Streaming Java8APISuite fails in assertOrderInvariantEquals method

2015-12-16 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12376:


Assignee: Apache Spark

> Spark Streaming Java8APISuite fails in assertOrderInvariantEquals method
> 
>
> Key: SPARK-12376
> URL: https://issues.apache.org/jira/browse/SPARK-12376
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.6.0
> Environment: Oracle Java 64-bit (build 1.8.0_66-b17)
>Reporter: Evan Chen
>Assignee: Apache Spark
>Priority: Minor
>
> org.apache.spark.streaming.Java8APISuite.java is failing due to trying to 
> sort an immutable list in the assertOrderInvariantEquals method.
> Here are the errors:
> Tests run: 27, Failures: 0, Errors: 4, Skipped: 0, Time elapsed: 5.948 sec 
> <<< FAILURE! - in org.apache.spark.streaming.Java8APISuite
> testMap(org.apache.spark.streaming.Java8APISuite)  Time elapsed: 0.217 sec  
> <<< ERROR!
> java.lang.UnsupportedOperationException: null
>   at java.util.AbstractList.set(AbstractList.java:132)
>   at java.util.AbstractList$ListItr.set(AbstractList.java:426)
>   at java.util.List.sort(List.java:482)
>   at java.util.Collections.sort(Collections.java:141)
>   at 
> org.apache.spark.streaming.Java8APISuite.lambda$assertOrderInvariantEquals$1(Java8APISuite.java:444)
> testFlatMap(org.apache.spark.streaming.Java8APISuite)  Time elapsed: 0.203 
> sec  <<< ERROR!
> java.lang.UnsupportedOperationException: null
>   at java.util.AbstractList.set(AbstractList.java:132)
>   at java.util.AbstractList$ListItr.set(AbstractList.java:426)
>   at java.util.List.sort(List.java:482)
>   at java.util.Collections.sort(Collections.java:141)
>   at 
> org.apache.spark.streaming.Java8APISuite.lambda$assertOrderInvariantEquals$1(Java8APISuite.java:444)
> testFilter(org.apache.spark.streaming.Java8APISuite)  Time elapsed: 0.209 sec 
>  <<< ERROR!
> java.lang.UnsupportedOperationException: null
>   at java.util.AbstractList.set(AbstractList.java:132)
>   at java.util.AbstractList$ListItr.set(AbstractList.java:426)
>   at java.util.List.sort(List.java:482)
>   at java.util.Collections.sort(Collections.java:141)
>   at 
> org.apache.spark.streaming.Java8APISuite.lambda$assertOrderInvariantEquals$1(Java8APISuite.java:444)
> testTransform(org.apache.spark.streaming.Java8APISuite)  Time elapsed: 0.215 
> sec  <<< ERROR!
> java.lang.UnsupportedOperationException: null
>   at java.util.AbstractList.set(AbstractList.java:132)
>   at java.util.AbstractList$ListItr.set(AbstractList.java:426)
>   at java.util.List.sort(List.java:482)
>   at java.util.Collections.sort(Collections.java:141)
>   at 
> org.apache.spark.streaming.Java8APISuite.lambda$assertOrderInvariantEquals$1(Java8APISuite.java:444)
> Results :
> Tests in error: 
>   
> Java8APISuite.testFilter:81->assertOrderInvariantEquals:444->lambda$assertOrderInvariantEquals$1:444
>  » UnsupportedOperation
>   
> Java8APISuite.testFlatMap:360->assertOrderInvariantEquals:444->lambda$assertOrderInvariantEquals$1:444
>  » UnsupportedOperation
>   
> Java8APISuite.testMap:63->assertOrderInvariantEquals:444->lambda$assertOrderInvariantEquals$1:444
>  » UnsupportedOperation
>   
> Java8APISuite.testTransform:168->assertOrderInvariantEquals:444->lambda$assertOrderInvariantEquals$1:444
>  » UnsupportedOperation



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-11834) Ignore thresholds in LogisticRegression and update documentation

2015-12-16 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-11834:
--
Target Version/s: 1.6.1, 2.0.0  (was: 1.6.0)

> Ignore thresholds in LogisticRegression and update documentation
> 
>
> Key: SPARK-11834
> URL: https://issues.apache.org/jira/browse/SPARK-11834
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, ML
>Affects Versions: 1.6.0
>Reporter: Xiangrui Meng
>Assignee: Xiangrui Meng
>Priority: Minor
>
> ml.LogisticRegression does not support multiclass yet. So we should ignore 
> `thresholds` and update the documentation. In the next release, we can do 
> SPARK-11543.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10931) PySpark ML Models should contain Param values

2015-12-16 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15061051#comment-15061051
 ] 

Joseph K. Bradley commented on SPARK-10931:
---

I'd very strongly prefer not to modify every model.  I believe we can save a 
lot of code by using a generic, shared implementation.  Check out {{getattr}} 
here: [https://docs.python.org/2/library/functions.html]

In the wrapper.py file in spark.ml, there are some abstractions defined.  I'm 
hoping one of those can be modified to provide access to Params.
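
To make the idea concrete, here is a rough sketch of the kind of shared fallback this could look like (purely illustrative; the class name and the {{_java_obj}} attribute are assumptions, not the actual wrapper.py code):

{code}
class JavaParamsFallback(object):
    """Mixin sketch: expose the wrapped Java object's Param values generically."""

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails on the Python side.
        java_obj = self.__dict__.get("_java_obj")
        if java_obj is not None and java_obj.hasParam(name):
            param = java_obj.getParam(name)
            return java_obj.getOrDefault(param)
        raise AttributeError(name)
{code}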

> PySpark ML Models should contain Param values
> -
>
> Key: SPARK-10931
> URL: https://issues.apache.org/jira/browse/SPARK-10931
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Reporter: Joseph K. Bradley
>
> PySpark spark.ml Models are generally wrappers around Java objects and do not 
> even contain Param values.  This JIRA is for copying the Param values from 
> the Estimator to the model.
> This can likely be solved by modifying Estimator.fit to copy Param values, 
> but should also include proper unit tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12272) Gradient boosted trees: too slow at first when finding best splits

2015-12-16 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15061064#comment-15061064
 ] 

Joseph K. Bradley commented on SPARK-12272:
---

First comment: I'd check the number of partitions and the Spark UI to make sure 
workers are doing equal amounts of work.

Second comment: MLlib follows the PLANET implementation, so it will have 
trouble with that many features.  There is ongoing work to overcome that issue: 
[SPARK-3717]; I hope to push that work into Spark within a couple of months.

Third comment: My understanding of xgboost is that it trains each tree on a 
single worker, using a subset of the data (only the data on that 1 worker).  
This differs from other implementations, which train each tree on all of the 
data.  This means xgboost does not have to communicate much data, but also 
means its trees cannot be as accurate individually; it's a trade-off.  There is 
a JIRA for exploring xgboost on Spark: [SPARK-8547]

I hope these 2 linked JIRAs will address your needs!
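
As a concrete starting point for the first comment, a small sketch for checking partition balance from the PySpark shell (the {{data}} RDD name is hypothetical):

{code}
# Count records per partition; large skew here usually shows up as straggler
# tasks in the Spark UI.
sizes = data.mapPartitions(lambda it: [sum(1 for _ in it)]).collect()
print(data.getNumPartitions(), min(sizes), max(sizes))
{code}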

> Gradient boosted trees: too slow at first when finding best splits
> -
>
> Key: SPARK-12272
> URL: https://issues.apache.org/jira/browse/SPARK-12272
> Project: Spark
>  Issue Type: Request
>  Components: MLlib
>Affects Versions: 1.5.2
>Reporter: Wenmin Wu
> Attachments: training-log1.png, training-log2.pnd.png, 
> training-log3.png
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-12272) Gradient boosted trees: too slow at first when finding best splits

2015-12-16 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley closed SPARK-12272.
-
Resolution: Duplicate

> Gradient boosted trees: too slow at first when finding best splits
> -
>
> Key: SPARK-12272
> URL: https://issues.apache.org/jira/browse/SPARK-12272
> Project: Spark
>  Issue Type: Request
>  Components: MLlib
>Affects Versions: 1.5.2
>Reporter: Wenmin Wu
> Attachments: training-log1.png, training-log2.pnd.png, 
> training-log3.png
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10931) PySpark ML Models should contain Param values

2015-12-16 Thread Evan Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15061090#comment-15061090
 ] 

Evan Chen commented on SPARK-10931:
---

Hey Joseph,

If using the getattr method, are you suggesting fetching the parameter straight 
from the Model java object or from the Estimator and copying it into the Model 
itself?

Thanks

> PySpark ML Models should contain Param values
> -
>
> Key: SPARK-10931
> URL: https://issues.apache.org/jira/browse/SPARK-10931
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Reporter: Joseph K. Bradley
>
> PySpark spark.ml Models are generally wrappers around Java objects and do not 
> even contain Param values.  This JIRA is for copying the Param values from 
> the Estimator to the model.
> This can likely be solved by modifying Estimator.fit to copy Param values, 
> but should also include proper unit tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12386) Setting "spark.executor.port" leads to NPE in SparkEnv

2015-12-16 Thread Shixiong Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1506#comment-1506
 ] 

Shixiong Zhu commented on SPARK-12386:
--

If the user doesn't depend on the assumption that `spark.executor.port` is the 
port of the Akka actor system on the executor side, they can just remove the config. 
Even in 1.5, that assumption is unreliable because multiple executors may run on 
the same host.
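
A quick way to confirm from the PySpark shell whether the setting is still being picked up (assuming the usual {{sc}} context):

{code}
# True means spark.executor.port is still set somewhere (spark-defaults.conf,
# --conf, etc.) and can simply be removed to avoid the NPE on 1.6.
print(sc.getConf().contains("spark.executor.port"))
{code}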

> Setting "spark.executor.port" leads to NPE in SparkEnv
> --
>
> Key: SPARK-12386
> URL: https://issues.apache.org/jira/browse/SPARK-12386
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.0
>Reporter: Marcelo Vanzin
>Assignee: Apache Spark
>Priority: Critical
>
> From the list:
> {quote}
> when we set spark.executor.port in 1.6, we get thrown a NPE in 
> SparkEnv$.create(SparkEnv.scala:259).
> {quote}
> Fix is simple; it probably should make it into 1.6.0 since it will affect anyone 
> using that config option, but I'll leave that to the release manager's 
> discretion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-12380) MLLib should use the existing SQLContext instead of creating a new one

2015-12-16 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu resolved SPARK-12380.

   Resolution: Fixed
Fix Version/s: 1.6.1
   2.0.0

Issue resolved by pull request 10338
[https://github.com/apache/spark/pull/10338]

> MLLib should use the existing SQLContext instead of creating a new one
> ---
>
> Key: SPARK-12380
> URL: https://issues.apache.org/jira/browse/SPARK-12380
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib, PySpark
>Reporter: Davies Liu
>Assignee: Davies Liu
> Fix For: 2.0.0, 1.6.1
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-12164) [SQL] Display the binary/encoded values

2015-12-16 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust resolved SPARK-12164.
--
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 10215
[https://github.com/apache/spark/pull/10215]

> [SQL] Display the binary/encoded values
> ---
>
> Key: SPARK-12164
> URL: https://issues.apache.org/jira/browse/SPARK-12164
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Xiao Li
>Assignee: Xiao Li
> Fix For: 2.0.0
>
>
> So far, we are using a comma-separated decimal format to output the encoded 
> contents. This representation is rarely what users want when the data is binary, 
> which could be a common issue when we use the Dataset API. 
> For example, 
> {code}
> implicit val kryoEncoder = Encoders.kryo[KryoClassData]
> val ds = Seq(KryoClassData("a", 1), KryoClassData("b", 2), 
> KryoClassData("c", 3)).toDS()
> ds.show(20, false);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-12164) [SQL] Display the binary/encoded values

2015-12-16 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust updated SPARK-12164:
-
Assignee: Xiao Li

> [SQL] Display the binary/encoded values
> ---
>
> Key: SPARK-12164
> URL: https://issues.apache.org/jira/browse/SPARK-12164
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Xiao Li
>Assignee: Xiao Li
> Fix For: 2.0.0
>
>
> So far, we are using a comma-separated decimal format to output the encoded 
> contents. This representation is rarely what users want when the data is binary, 
> which could be a common issue when we use the Dataset API. 
> For example, 
> {code}
> implicit val kryoEncoder = Encoders.kryo[KryoClassData]
> val ds = Seq(KryoClassData("a", 1), KryoClassData("b", 2), 
> KryoClassData("c", 3)).toDS()
> ds.show(20, false);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-12350) VectorAssembler#transform() initially throws an exception

2015-12-16 Thread Jakob Odersky (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Odersky updated SPARK-12350:
--
Component/s: (was: ML)
 Spark Shell
 Spark Core

> VectorAssembler#transform() initially throws an exception
> -
>
> Key: SPARK-12350
> URL: https://issues.apache.org/jira/browse/SPARK-12350
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Spark Shell
> Environment: sparkShell command from sbt
>Reporter: Jakob Odersky
>Assignee: Apache Spark
>
> Calling VectorAssembler.transform() initially throws an exception, subsequent 
> calls work.
> h3. Steps to reproduce
> In spark-shell,
> 1. Create a dummy dataframe and define an assembler
> {code}
> import org.apache.spark.ml.feature.VectorAssembler
> val df = sc.parallelize(List((1,2), (3,4))).toDF
> val assembler = new VectorAssembler().setInputCols(Array("_1", 
> "_2")).setOutputCol("features")
> {code}
> 2. Run
> {code}
> assembler.transform(df).show
> {code}
> Initially the following exception is thrown:
> {code}
> 15/12/15 16:20:19 ERROR TransportRequestHandler: Error opening stream 
> /classes/org/apache/spark/sql/catalyst/expressions/Object.class for request 
> from /9.72.139.102:60610
> java.lang.IllegalArgumentException: requirement failed: File not found: 
> /classes/org/apache/spark/sql/catalyst/expressions/Object.class
>   at scala.Predef$.require(Predef.scala:233)
>   at 
> org.apache.spark.rpc.netty.NettyStreamManager.openStream(NettyStreamManager.scala:60)
>   at 
> org.apache.spark.network.server.TransportRequestHandler.processStreamRequest(TransportRequestHandler.java:136)
>   at 
> org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:106)
>   at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:104)
>   at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
>   at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:86)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
>   at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> Subsequent calls work:
> {code}
> +---+---+-+
> | _1| _2| features|
> +---+---+-+
> |  1|  2|[1.0,2.0]|
> |  3|  4|[3.0,4.0]|
> +---+---+-+
> {code}
> It seems as though there is some internal state that is not initialized.
> [~iyounus] originally found this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-12381) Move decision tree helper classes from spark.mllib to spark.ml

2015-12-16 Thread Seth Hendrickson (JIRA)
Seth Hendrickson created SPARK-12381:


 Summary: Move decision tree helper classes from spark.mllib to 
spark.ml
 Key: SPARK-12381
 URL: https://issues.apache.org/jira/browse/SPARK-12381
 Project: Spark
  Issue Type: Sub-task
  Components: ML, MLlib
Reporter: Seth Hendrickson


The helper classes for decision trees and decision tree ensembles (e.g. 
Impurity, InformationGainStats, ImpurityStats, DTStatsAggregator, etc.) 
currently reside in spark.mllib; as the algorithm implementations are moved 
to spark.ml, these helper classes should move as well.

We should take this opportunity to make some of those helper classes private 
when possible (especially if they are only needed during training) and maybe 
change the APIs (especially if we can eliminate duplicate data stored in the 
final model).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-12382) Remove spark.mllib GBT implementation and wrap spark.ml

2015-12-16 Thread Seth Hendrickson (JIRA)
Seth Hendrickson created SPARK-12382:


 Summary: Remove spark.mllib GBT implementation and wrap spark.ml
 Key: SPARK-12382
 URL: https://issues.apache.org/jira/browse/SPARK-12382
 Project: Spark
  Issue Type: Sub-task
Reporter: Seth Hendrickson


After the GBT implementation is moved to spark.ml, we should remove the 
implementation from spark.mllib. The MLlib GBTs will then just call the 
implementation in spark.ml.
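
A purely illustrative sketch of that wrapping pattern, with toy types rather than 
the real GBT API: the legacy entry point keeps its signature and simply forwards 
to the new implementation.

{code}
// Toy illustration of delegation; neither object corresponds to an actual Spark class.
object NewImplementation {
  def fit(data: Seq[Double]): Double = data.sum / data.size  // stand-in for the real training logic
}

object LegacyWrapper {
  def train(data: Seq[Double]): Double = NewImplementation.fit(data)  // thin forwarding call
}
{code}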






[jira] [Updated] (SPARK-9690) Add random seed Param to PySpark CrossValidator

2015-12-16 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-9690:
-
Assignee: Martin Menestret

> Add random seed Param to PySpark CrossValidator
> ---
>
> Key: SPARK-9690
> URL: https://issues.apache.org/jira/browse/SPARK-9690
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, PySpark
>Affects Versions: 1.4.1
>Reporter: Martin Menestret
>Assignee: Martin Menestret
>Priority: Minor
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The fold assignment in the PySpark ML CrossValidator depends on a rand whose 
> seed is hard-coded to 0, and the sql.functions rand wrapper ends up calling 
> sc._jvm.functions.rand() with no seed.
> To make cross-validation unit-testable, it should be possible to set this seed 
> explicitly, so that the output of the cross-validation (with 
> featureSubsetStrategy set to "all") is always the same.
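
For illustration, a minimal spark-shell sketch of why an explicit seed matters for 
reproducible fold assignment; the column name and fold count are made up, and this 
is not the CrossValidator code itself.

{code}
import org.apache.spark.sql.functions.rand
import sqlContext.implicits._

// rand(seed) produces the same fold assignment on every run; rand() does not.
val df = sc.parallelize(1 to 10).toDF("id")
val seeded   = df.withColumn("fold", (rand(42) * 3).cast("int"))   // reproducible 3-fold split
val unseeded = df.withColumn("fold", (rand() * 3).cast("int"))     // changes from run to run
seeded.show()
{code}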






[jira] [Commented] (SPARK-12380) MLlib should use existing SQLContext instead of creating a new one

2015-12-16 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060990#comment-15060990
 ] 

Apache Spark commented on SPARK-12380:
--

User 'davies' has created a pull request for this issue:
https://github.com/apache/spark/pull/10338

> MLlib should use existing SQLContext instead of creating a new one
> ---
>
> Key: SPARK-12380
> URL: https://issues.apache.org/jira/browse/SPARK-12380
> Project: Spark
>  Issue Type: Bug
>Reporter: Davies Liu
>Assignee: Davies Liu
>







[jira] [Commented] (SPARK-12331) R^2 for regression through the origin

2015-12-16 Thread DB Tsai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15061039#comment-15061039
 ] 

DB Tsai commented on SPARK-12331:
-

+1. A PR is welcome. Thanks.

> R^2 for regression through the origin
> -
>
> Key: SPARK-12331
> URL: https://issues.apache.org/jira/browse/SPARK-12331
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Imran Younus
>Priority: Minor
>
> The value of R^2 (the coefficient of determination) obtained from 
> LinearRegressionModel is not consistent with R and statsmodels when 
> fitIntercept is false, i.e., for regression through the origin. In this case, 
> both R and statsmodels use the definition of R^2 given by eq. (4') in the 
> following review paper:
> https://online.stat.psu.edu/~ajw13/stat501/SpecialTopics/Reg_thru_origin.pdf
> The definition from that paper is
> R^2 = \frac{\sum_i \hat{y}_i^2}{\sum_i y_i^2}
> The paper also explains why this should be the case. I have double-checked 
> that the values of R^2 from statsmodels and R are consistent with this 
> definition. scikit-learn, on the other hand, does not use this definition. 
> I would recommend adopting the above definition in Spark.
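
For concreteness, a tiny Scala sketch of the proposed definition with made-up 
numbers; this is just the arithmetic, not Spark's evaluator code.

{code}
// Uncentered R^2 for regression through the origin: the sum of squared fitted
// values divided by the sum of squared observations (made-up numbers).
val y    = Array(1.0, 2.0, 3.0)
val yHat = Array(0.9, 2.1, 2.8)
val r2ThroughOrigin = yHat.map(v => v * v).sum / y.map(v => v * v).sum
// (0.81 + 4.41 + 7.84) / (1.0 + 4.0 + 9.0) = 13.06 / 14.0 ≈ 0.933
{code}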






[jira] [Resolved] (SPARK-12320) throw exception if the number of fields does not line up for Tuple encoder

2015-12-16 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust resolved SPARK-12320.
--
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 10293
[https://github.com/apache/spark/pull/10293]

> throw exception if the number of fields does not line up for Tuple encoder
> --
>
> Key: SPARK-12320
> URL: https://issues.apache.org/jira/browse/SPARK-12320
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Wenchen Fan
> Fix For: 2.0.0
>
>







[jira] [Updated] (SPARK-12320) throw exception if the number of fields does not line up for Tuple encoder

2015-12-16 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust updated SPARK-12320:
-
Assignee: Wenchen Fan

> throw exception if the number of fields does not line up for Tuple encoder
> --
>
> Key: SPARK-12320
> URL: https://issues.apache.org/jira/browse/SPARK-12320
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
> Fix For: 2.0.0
>
>







[jira] [Created] (SPARK-12380) MLlib should use existing SQLContext instead of creating a new one

2015-12-16 Thread Davies Liu (JIRA)
Davies Liu created SPARK-12380:
--

 Summary: MLlib should use existing SQLContext instead of creating a new one
 Key: SPARK-12380
 URL: https://issues.apache.org/jira/browse/SPARK-12380
 Project: Spark
  Issue Type: Bug
Reporter: Davies Liu
Assignee: Davies Liu
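
For context, a minimal sketch of the pattern the summary asks for, assuming the 
SQLContext.getOrCreate API available since Spark 1.5; the helper name is made up.

{code}
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

// Reuse the already-active SQLContext instead of constructing a new one inside
// library code; getOrCreate returns the existing instance when there is one.
def toDataFrameHelper(sc: SparkContext) = {
  val sqlContext = SQLContext.getOrCreate(sc)
  import sqlContext.implicits._
  sc.parallelize(Seq((1, 2), (3, 4))).toDF("a", "b")
}
{code}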









[jira] [Updated] (SPARK-12326) Move GBT implementation from spark.mllib to spark.ml

2015-12-16 Thread Seth Hendrickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Seth Hendrickson updated SPARK-12326:
-
Description: 
Several improvements can be made to gradient boosted trees, but they are not 
possible without moving the GBT implementation to spark.ml (e.g. a rawPrediction 
column, feature importance). This Jira is for moving the current GBT 
implementation to spark.ml, which will involve roughly the following steps:

1. Copy the implementation to spark.ml and change spark.ml classes to use that 
implementation. Current tests will ensure that the implementations learn 
exactly the same models. 
2. Move the decision tree helper classes over to spark.ml (e.g. Impurity, 
InformationGainStats, ImpurityStats, DTStatsAggregator, etc...). Since 
eventually all tree implementations will reside in spark.ml, the helper classes 
should as well.
3. Remove the spark.mllib implementation, and make the spark.mllib APIs 
wrappers around the spark.ml implementation. The spark.ml tests will again 
ensure that we do not change any behavior.
4. Move the unit tests to spark.ml, and change the spark.mllib unit tests to 
verify model equivalence.

  was:
Several improvements can be made to gradient boosted trees, but they are not 
possible without moving the GBT implementation to spark.ml (e.g. a rawPrediction 
column, feature importance). This Jira is for moving the current GBT 
implementation to spark.ml, which will involve roughly the following steps:

1. Copy the implementation to spark.ml and change spark.ml classes to use that 
implementation. Current tests will ensure that the implementations learn 
exactly the same models. 
2. Move the decision tree helper classes over to spark.ml (e.g. Impurity, 
InformationGainStats, ImpurityStats, DTStatsAggregator, etc...). Since 
eventually all tree implementations will reside in spark.ml, the helper classes 
should as well.
3. Remove the spark.mllib implementation, and make the spark.mllib APIs 
wrappers around the spark.ml implementation. The spark.ml tests will again 
ensure that we do not change any behavior.
4. Move the unit tests to spark.ml, and change the spark.mllib unit tests to 
verify model equivalence.

Steps 2, 3, and 4 should be in separate Jiras. 


> Move GBT implementation from spark.mllib to spark.ml
> 
>
> Key: SPARK-12326
> URL: https://issues.apache.org/jira/browse/SPARK-12326
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, MLlib
>Reporter: Seth Hendrickson
>
> Several improvements can be made to gradient boosted trees, but they are not 
> possible without moving the GBT implementation to spark.ml (e.g. a 
> rawPrediction column, feature importance). This Jira is for moving the 
> current GBT implementation to spark.ml, which will involve roughly the 
> following steps:
> 1. Copy the implementation to spark.ml and change spark.ml classes to use 
> that implementation. Current tests will ensure that the implementations learn 
> exactly the same models. 
> 2. Move the decision tree helper classes over to spark.ml (e.g. Impurity, 
> InformationGainStats, ImpurityStats, DTStatsAggregator, etc...). Since 
> eventually all tree implementations will reside in spark.ml, the helper 
> classes should as well.
> 3. Remove the spark.mllib implementation, and make the spark.mllib APIs 
> wrappers around the spark.ml implementation. The spark.ml tests will again 
> ensure that we do not change any behavior.
> 4. Move the unit tests to spark.ml, and change the spark.mllib unit tests to 
> verify model equivalence.






[jira] [Created] (SPARK-12383) Move unit tests for GBT from spark.mllib to spark.ml

2015-12-16 Thread Seth Hendrickson (JIRA)
Seth Hendrickson created SPARK-12383:


 Summary: Move unit tests for GBT from spark.mllib to spark.ml
 Key: SPARK-12383
 URL: https://issues.apache.org/jira/browse/SPARK-12383
 Project: Spark
  Issue Type: Sub-task
Reporter: Seth Hendrickson


After the GBT implementation is moved from MLlib to ML, we should move the unit 
tests to ML as well.






[jira] [Assigned] (SPARK-12380) MLlib should use existing SQLContext instead of creating a new one

2015-12-16 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12380:


Assignee: Apache Spark  (was: Davies Liu)

> MLlib should use existing SQLContext instead of creating a new one
> ---
>
> Key: SPARK-12380
> URL: https://issues.apache.org/jira/browse/SPARK-12380
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib, PySpark
>Reporter: Davies Liu
>Assignee: Apache Spark
>







[jira] [Assigned] (SPARK-12380) MLlib should use existing SQLContext instead of creating a new one

2015-12-16 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12380:


Assignee: Davies Liu  (was: Apache Spark)

> MLlib should use existing SQLContext instead of creating a new one
> ---
>
> Key: SPARK-12380
> URL: https://issues.apache.org/jira/browse/SPARK-12380
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib, PySpark
>Reporter: Davies Liu
>Assignee: Davies Liu
>






