[jira] [Updated] (SPARK-10263) Add @Since annotation to ml.param and ml.*

2015-10-26 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-10263:
--
Assignee: Hiroshi Takahashi

> Add @Since annotation to ml.param and ml.*
> --
>
> Key: SPARK-10263
> URL: https://issues.apache.org/jira/browse/SPARK-10263
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, ML
>Reporter: Xiangrui Meng
>Assignee: Hiroshi Takahashi
>Priority: Minor
>  Labels: starter
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10654) Add columnSimilarities to IndexedRowMatrix

2015-10-26 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-10654:
--
Assignee: Reza Zadeh

> Add columnSimilarities to IndexedRowMatrix
> --
>
> Key: SPARK-10654
> URL: https://issues.apache.org/jira/browse/SPARK-10654
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Reporter: Reza Zadeh
>Assignee: Reza Zadeh
> Fix For: 1.6.0
>
>
> Add columnSimilarities to IndexedRowMatrix.
> Adding rowSimilarities to IndexedRowMatrix is tracked in a separate JIRA, 
> SPARK-4823.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-10654) Add columnSimilarities to IndexedRowMatrix

2015-10-26 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-10654.
---
   Resolution: Fixed
Fix Version/s: 1.6.0

Issue resolved by pull request 8792
[https://github.com/apache/spark/pull/8792]
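
For reference, a minimal usage sketch of the added method (spark-shell style, so {{sc}} is assumed; the input values are only illustrative):

{code}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.{IndexedRow, IndexedRowMatrix}

// Build a small IndexedRowMatrix and compute pairwise cosine similarities
// between its columns; the result is a CoordinateMatrix of upper-triangular entries.
val rows = sc.parallelize(Seq(
  IndexedRow(0L, Vectors.dense(1.0, 0.0, 2.0)),
  IndexedRow(1L, Vectors.dense(0.0, 3.0, 4.0))))
val mat = new IndexedRowMatrix(rows)
val sims = mat.columnSimilarities()
sims.entries.collect().foreach(println)
{code}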

> Add columnSimilarities to IndexedRowMatrix
> --
>
> Key: SPARK-10654
> URL: https://issues.apache.org/jira/browse/SPARK-10654
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Reporter: Reza Zadeh
> Fix For: 1.6.0
>
>
> Add columnSimilarities to IndexedRowMatrix.
> Adding rowSimilarities to IndexedRowMatrix is tracked in a separate JIRA, 
> SPARK-4823.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7751) Add @Since annotation to stable and experimental methods in MLlib

2015-10-26 Thread Xiangrui Meng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14975735#comment-14975735
 ] 

Xiangrui Meng commented on SPARK-7751:
--

Some line changes actually do not change the API. For example:

In 1.2:

{code}
class Pipeline {
}
{code}

Then in 1.4

{code}
class Pipeline(val uid: String) {

  def this() = this("...")
}
{code}

The default constructor has existed since 1.2, not 1.4.
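
As an illustration (a sketch only; the exact version strings and annotation placement in the actual source may differ), the example above could then be annotated like this:

{code}
import org.apache.spark.annotation.Since

class Pipeline @Since("1.4.0") (@Since("1.4.0") val uid: String) {

  // The no-arg constructor predates the uid parameter, so it keeps the older version.
  @Since("1.2.0")
  def this() = this("...")
}
{code}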

> Add @Since annotation to stable and experimental methods in MLlib
> -
>
> Key: SPARK-7751
> URL: https://issues.apache.org/jira/browse/SPARK-7751
> Project: Spark
>  Issue Type: Umbrella
>  Components: Documentation, MLlib
>Reporter: Xiangrui Meng
>Assignee: Xiangrui Meng
>Priority: Minor
>  Labels: starter
>
> This is useful to check whether a feature exists in some version of Spark. 
> This is an umbrella JIRA to track the progress. We want to have -@since tags- 
> @Since annotations for both stable (those without any 
> Experimental/DeveloperApi/AlphaComponent annotations) and experimental 
> methods in MLlib:
> (Do NOT tag private or package private classes or methods, nor local 
> variables and methods.)
> * an example PR for Scala: https://github.com/apache/spark/pull/8309
> We need to dig into the git commit history to figure out which Spark version 
> first introduced a method. Take `NaiveBayes.setModelType` as 
> an example: we can grep for `def setModelType` at different version git tags.
> {code}
> meng@xm:~/src/spark
> $ git show 
> v1.3.0:mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala
>  | grep "def setModelType"
> meng@xm:~/src/spark
> $ git show 
> v1.4.0:mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala
>  | grep "def setModelType"
>   def setModelType(modelType: String): NaiveBayes = {
> {code}
> If there are better ways, please let us know.
> We cannot add all -@since tags- @Since annotations in a single PR, which would 
> be hard to review. So we made sub-tasks for each package, for example 
> `org.apache.spark.classification`. Feel free to add more sub-tasks for Python 
> and the `spark.ml` package.
> Plan:
> 1. In 1.5, we try to add @Since annotation to all stable/experimental methods 
> under `spark.mllib`.
> 2. Starting from 1.6, we require @Since annotation in all new PRs.
> 3. In 1.6, we try to add @Since annotation to all stable/experimental methods 
> under `spark.ml`, `pyspark.mllib`, and `pyspark.ml`.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-11340) Support setting driver properties when starting Spark from R programmatically or from RStudio

2015-10-26 Thread Felix Cheung (JIRA)
Felix Cheung created SPARK-11340:


 Summary: Support setting driver properties when starting Spark 
from R programmatically or from RStudio
 Key: SPARK-11340
 URL: https://issues.apache.org/jira/browse/SPARK-11340
 Project: Spark
  Issue Type: Bug
  Components: SparkR
Affects Versions: 1.5.1
Reporter: Felix Cheung
Priority: Minor


Currently when sparkR.init() is called in 'client' mode, it launches the JVM 
backend but driver properties (like driver-memory) are not passed or settable 
by the user calling sparkR.init().

[~sunrui][~shivaram] and I discussed this offline and think we should support 
this.

This is the original thread:
>> From: rui@intel.com
>> To: dirceu.semigh...@gmail.com
>> CC: u...@spark.apache.org
>> Subject: RE: How to set memory for SparkR with master="local[*]"
>> Date: Mon, 26 Oct 2015 02:24:00 +
>>
>> As documented in
>> http://spark.apache.org/docs/latest/configuration.html#available-properties,
>>
>> Note for “spark.driver.memory”:
>>
>> Note: In client mode, this config must not be set through the 
>> SparkConf directly in your application, because the driver JVM has 
>> already started at that point. Instead, please set this through the 
>> --driver-memory command line option or in your default properties file.
>>
>>
>>
>> If you are to start a SparkR shell using bin/sparkR, then you can use 
>> bin/sparkR --driver-memory. You have no chance to set the driver 
>> memory size after the R shell has been launched via bin/sparkR.
>>
>>
>>
>> But if you are to start a SparkR shell manually without using 
>> bin/sparkR (for example, in RStudio), you can:
>>
>> library(SparkR)
>>
>> Sys.setenv("SPARKR_SUBMIT_ARGS" = "--conf spark.driver.memory=2g
>> sparkr-shell")
>>
>> sc <- sparkR.init()
>>
>>
>>
>> From: Dirceu Semighini Filho [mailto:dirceu.semigh...@gmail.com]
>> Sent: Friday, October 23, 2015 7:53 PM
>> Cc: user
>> Subject: Re: How to set memory for SparkR with master="local[*]"
>>
>>
>>
>> Hi Matej,
>>
>> I'm also using this and I'm having the same behavior here, my driver 
>> has only 530mb which is the default value.
>>
>>
>>
>> Maybe this is a bug.
>>
>>
>>
>> 2015-10-23 9:43 GMT-02:00 Matej Holec :
>>
>> Hello!
>>
>> How to adjust the memory settings properly for SparkR with master="local[*]"
>> in R?
>>
>>
>> *When running from  R -- SparkR doesn't accept memory settings :(*
>>
>> I use the following commands:
>>
>> R>  library(SparkR)
>> R>  sc <- sparkR.init(master = "local[*]", sparkEnvir =
>> list(spark.driver.memory = "5g"))
>>
>> Although the variable spark.driver.memory is correctly set (checked in 
>> http://node:4040/environment/), the driver has only the default 
>> amount of memory allocated (Storage Memory 530.3 MB).
>>
>> *But when running from  spark-1.5.1-bin-hadoop2.6/bin/sparkR -- OK*
>>
>> The following command:
>>
>> ]$ spark-1.5.1-bin-hadoop2.6/bin/sparkR --driver-memory 5g
>>
>> creates a SparkR session with properly adjusted driver memory (Storage 
>> Memory
>> 2.6 GB).
>>
>>
>> Any suggestion?
>>
>> Thanks
>> Matej
>>
>>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11340) Support setting driver properties when starting Spark from R programmatically or from RStudio

2015-10-26 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14975742#comment-14975742
 ] 

Apache Spark commented on SPARK-11340:
--

User 'felixcheung' has created a pull request for this issue:
https://github.com/apache/spark/pull/9290

> Support setting driver properties when starting Spark from R programmatically 
> or from RStudio
> -
>
> Key: SPARK-11340
> URL: https://issues.apache.org/jira/browse/SPARK-11340
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 1.5.1
>Reporter: Felix Cheung
>Priority: Minor
>
> Currently when sparkR.init() is called in 'client' mode, it launches the JVM 
> backend but driver properties (like driver-memory) are not passed or settable 
> by the user calling sparkR.init().
> [~sunrui][~shivaram] and I discussed this offline and think we should support 
> this.
> This is the original thread:
> >> From: rui@intel.com
> >> To: dirceu.semigh...@gmail.com
> >> CC: u...@spark.apache.org
> >> Subject: RE: How to set memory for SparkR with master="local[*]"
> >> Date: Mon, 26 Oct 2015 02:24:00 +
> >>
> >> As documented in
> >> http://spark.apache.org/docs/latest/configuration.html#available-properties,
> >>
> >> Note for “spark.driver.memory”:
> >>
> >> Note: In client mode, this config must not be set through the 
> >> SparkConf directly in your application, because the driver JVM has 
> >> already started at that point. Instead, please set this through the 
> >> --driver-memory command line option or in your default properties file.
> >>
> >>
> >>
> >> If you are to start a SparkR shell using bin/sparkR, then you can use 
> >> bin/sparkR --driver-memory. You have no chance to set the driver 
> >> memory size after the R shell has been launched via bin/sparkR.
> >>
> >>
> >>
> >> But if you are to start a SparkR shell manually without using 
> >> bin/sparkR (for example, in RStudio), you can:
> >>
> >> library(SparkR)
> >>
> >> Sys.setenv("SPARKR_SUBMIT_ARGS" = "--conf spark.driver.memory=2g
> >> sparkr-shell")
> >>
> >> sc <- sparkR.init()
> >>
> >>
> >>
> >> From: Dirceu Semighini Filho [mailto:dirceu.semigh...@gmail.com]
> >> Sent: Friday, October 23, 2015 7:53 PM
> >> Cc: user
> >> Subject: Re: How to set memory for SparkR with master="local[*]"
> >>
> >>
> >>
> >> Hi Matej,
> >>
> >> I'm also using this and I'm having the same behavior here, my driver 
> >> has only 530mb which is the default value.
> >>
> >>
> >>
> >> Maybe this is a bug.
> >>
> >>
> >>
> >> 2015-10-23 9:43 GMT-02:00 Matej Holec :
> >>
> >> Hello!
> >>
> >> How to adjust the memory settings properly for SparkR with 
> >> master="local[*]"
> >> in R?
> >>
> >>
> >> *When running from  R -- SparkR doesn't accept memory settings :(*
> >>
> >> I use the following commands:
> >>
> >> R>  library(SparkR)
> >> R>  sc <- sparkR.init(master = "local[*]", sparkEnvir =
> >> list(spark.driver.memory = "5g"))
> >>
> >> Although the variable spark.driver.memory is correctly set (checked in 
> >> http://node:4040/environment/), the driver has only the default 
> >> amount of memory allocated (Storage Memory 530.3 MB).
> >>
> >> *But when running from  spark-1.5.1-bin-hadoop2.6/bin/sparkR -- OK*
> >>
> >> The following command:
> >>
> >> ]$ spark-1.5.1-bin-hadoop2.6/bin/sparkR --driver-memory 5g
> >>
> >> creates a SparkR session with properly adjusted driver memory (Storage 
> >> Memory
> >> 2.6 GB).
> >>
> >>
> >> Any suggestion?
> >>
> >> Thanks
> >> Matej
> >>
> >>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-11340) Support setting driver properties when starting Spark from R programmatically or from RStudio

2015-10-26 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-11340:


Assignee: (was: Apache Spark)

> Support setting driver properties when starting Spark from R programmatically 
> or from RStudio
> -
>
> Key: SPARK-11340
> URL: https://issues.apache.org/jira/browse/SPARK-11340
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 1.5.1
>Reporter: Felix Cheung
>Priority: Minor
>
> Currently when sparkR.init() is called in 'client' mode, it launches the JVM 
> backend but driver properties (like driver-memory) are not passed or settable 
> by the user calling sparkR.init().
> [~sunrui][~shivaram] and I discussed this offline and think we should support 
> this.
> This is the original thread:
> >> From: rui@intel.com
> >> To: dirceu.semigh...@gmail.com
> >> CC: u...@spark.apache.org
> >> Subject: RE: How to set memory for SparkR with master="local[*]"
> >> Date: Mon, 26 Oct 2015 02:24:00 +
> >>
> >> As documented in
> >> http://spark.apache.org/docs/latest/configuration.html#available-properties,
> >>
> >> Note for “spark.driver.memory”:
> >>
> >> Note: In client mode, this config must not be set through the 
> >> SparkConf directly in your application, because the driver JVM has 
> >> already started at that point. Instead, please set this through the 
> >> --driver-memory command line option or in your default properties file.
> >>
> >>
> >>
> >> If you are to start a SparkR shell using bin/sparkR, then you can use 
> >> bin/sparkR --driver-memory. You have no chance to set the driver 
> >> memory size after the R shell has been launched via bin/sparkR.
> >>
> >>
> >>
> >> But if you are to start a SparkR shell manually without using 
> >> bin/sparkR (for example, in RStudio), you can:
> >>
> >> library(SparkR)
> >>
> >> Sys.setenv("SPARKR_SUBMIT_ARGS" = "--conf spark.driver.memory=2g
> >> sparkr-shell")
> >>
> >> sc <- sparkR.init()
> >>
> >>
> >>
> >> From: Dirceu Semighini Filho [mailto:dirceu.semigh...@gmail.com]
> >> Sent: Friday, October 23, 2015 7:53 PM
> >> Cc: user
> >> Subject: Re: How to set memory for SparkR with master="local[*]"
> >>
> >>
> >>
> >> Hi Matej,
> >>
> >> I'm also using this and I'm having the same behavior here, my driver 
> >> has only 530mb which is the default value.
> >>
> >>
> >>
> >> Maybe this is a bug.
> >>
> >>
> >>
> >> 2015-10-23 9:43 GMT-02:00 Matej Holec :
> >>
> >> Hello!
> >>
> >> How to adjust the memory settings properly for SparkR with 
> >> master="local[*]"
> >> in R?
> >>
> >>
> >> *When running from  R -- SparkR doesn't accept memory settings :(*
> >>
> >> I use the following commands:
> >>
> >> R>  library(SparkR)
> >> R>  sc <- sparkR.init(master = "local[*]", sparkEnvir =
> >> list(spark.driver.memory = "5g"))
> >>
> >> Although the variable spark.driver.memory is correctly set (checked in 
> >> http://node:4040/environment/), the driver has only the default 
> >> amount of memory allocated (Storage Memory 530.3 MB).
> >>
> >> *But when running from  spark-1.5.1-bin-hadoop2.6/bin/sparkR -- OK*
> >>
> >> The following command:
> >>
> >> ]$ spark-1.5.1-bin-hadoop2.6/bin/sparkR --driver-memory 5g
> >>
> >> creates a SparkR session with properly adjusted driver memory (Storage 
> >> Memory
> >> 2.6 GB).
> >>
> >>
> >> Any suggestion?
> >>
> >> Thanks
> >> Matej
> >>
> >>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-11340) Support setting driver properties when starting Spark from R programmatically or from RStudio

2015-10-26 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-11340:


Assignee: Apache Spark

> Support setting driver properties when starting Spark from R programmatically 
> or from RStudio
> -
>
> Key: SPARK-11340
> URL: https://issues.apache.org/jira/browse/SPARK-11340
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 1.5.1
>Reporter: Felix Cheung
>Assignee: Apache Spark
>Priority: Minor
>
> Currently when sparkR.init() is called in 'client' mode, it launches the JVM 
> backend but driver properties (like driver-memory) are not passed or settable 
> by the user calling sparkR.init().
> [~sunrui][~shivaram] and I discussed this offline and think we should support 
> this.
> This is the original thread:
> >> From: rui@intel.com
> >> To: dirceu.semigh...@gmail.com
> >> CC: u...@spark.apache.org
> >> Subject: RE: How to set memory for SparkR with master="local[*]"
> >> Date: Mon, 26 Oct 2015 02:24:00 +
> >>
> >> As documented in
> >> http://spark.apache.org/docs/latest/configuration.html#available-properties,
> >>
> >> Note for “spark.driver.memory”:
> >>
> >> Note: In client mode, this config must not be set through the 
> >> SparkConf directly in your application, because the driver JVM has 
> >> already started at that point. Instead, please set this through the 
> >> --driver-memory command line option or in your default properties file.
> >>
> >>
> >>
> >> If you are to start a SparkR shell using bin/sparkR, then you can use 
> >> bin/sparkR --driver-memory. You have no chance to set the driver 
> >> memory size after the R shell has been launched via bin/sparkR.
> >>
> >>
> >>
> >> But if you are to start a SparkR shell manually without using 
> >> bin/sparkR (for example, in RStudio), you can:
> >>
> >> library(SparkR)
> >>
> >> Sys.setenv("SPARKR_SUBMIT_ARGS" = "--conf spark.driver.memory=2g
> >> sparkr-shell")
> >>
> >> sc <- sparkR.init()
> >>
> >>
> >>
> >> From: Dirceu Semighini Filho [mailto:dirceu.semigh...@gmail.com]
> >> Sent: Friday, October 23, 2015 7:53 PM
> >> Cc: user
> >> Subject: Re: How to set memory for SparkR with master="local[*]"
> >>
> >>
> >>
> >> Hi Matej,
> >>
> >> I'm also using this and I'm having the same behavior here, my driver 
> >> has only 530mb which is the default value.
> >>
> >>
> >>
> >> Maybe this is a bug.
> >>
> >>
> >>
> >> 2015-10-23 9:43 GMT-02:00 Matej Holec :
> >>
> >> Hello!
> >>
> >> How to adjust the memory settings properly for SparkR with 
> >> master="local[*]"
> >> in R?
> >>
> >>
> >> *When running from  R -- SparkR doesn't accept memory settings :(*
> >>
> >> I use the following commands:
> >>
> >> R>  library(SparkR)
> >> R>  sc <- sparkR.init(master = "local[*]", sparkEnvir =
> >> list(spark.driver.memory = "5g"))
> >>
> >> Although the variable spark.driver.memory is correctly set (checked in 
> >> http://node:4040/environment/), the driver has only the default 
> >> amount of memory allocated (Storage Memory 530.3 MB).
> >>
> >> *But when running from  spark-1.5.1-bin-hadoop2.6/bin/sparkR -- OK*
> >>
> >> The following command:
> >>
> >> ]$ spark-1.5.1-bin-hadoop2.6/bin/sparkR --driver-memory 5g
> >>
> >> creates a SparkR session with properly adjusted driver memory (Storage 
> >> Memory
> >> 2.6 GB).
> >>
> >>
> >> Any suggestion?
> >>
> >> Thanks
> >> Matej
> >>
> >>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-11277) sort_array throws exception scala.MatchError

2015-10-26 Thread Jia Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jia Li updated SPARK-11277:
---
Description: 
I was trying out the sort_array function and hit this exception. 

I looked into the Spark source code and found that the root cause is that sort_array 
does not check for an array of NULLs. It's not meaningful to sort an array of 
entirely NULLs anyway. A similar issue exists with an array of struct type. 
I already have a fix for this issue and I'm going to create a pull request for 
it. 

scala> sqlContext.sql("select sort_array(array(null, null)) from t1").show()
scala.MatchError: ArrayType(NullType,true) (of class 
org.apache.spark.sql.types.ArrayType)
at 
org.apache.spark.sql.catalyst.expressions.SortArray.lt$lzycompute(collectionOperations.scala:68)
at 
org.apache.spark.sql.catalyst.expressions.SortArray.lt(collectionOperations.scala:67)
at 
org.apache.spark.sql.catalyst.expressions.SortArray.nullSafeEval(collectionOperations.scala:111)
at 
org.apache.spark.sql.catalyst.expressions.BinaryExpression.eval(Expression.scala:341)
at 
org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$9$$anonfun$applyOrElse$2.applyOrElse(Optimizer.scala:440)
at 
org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$9$$anonfun$applyOrElse$2.applyOrElse(Optimizer.scala:433)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:227)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:227)
at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:51)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:226)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:232)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:232)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:249)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
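
For illustration, a hedged sketch of the type dispatch involved (the helper below is hypothetical and simplified, not the actual Catalyst source): sort_array chooses an ordering by matching on the array's element type, so ArrayType(NullType,true) falls through the match and raises scala.MatchError; a fix needs an explicit case (or an up-front input check) for that shape, and likewise for struct elements.

{code}
import org.apache.spark.sql.types._

// Hypothetical helper (not the real SortArray code): decide whether an input
// type is something sort_array can order, instead of letting it fall through
// a partial pattern match and fail with scala.MatchError.
def isSortableArray(dt: DataType): Boolean = dt match {
  case ArrayType(NullType, _)      => false  // array of only NULLs: nothing meaningful to sort
  case ArrayType(_: StructType, _) => false  // struct elements: not orderable here either
  case ArrayType(_, _)             => true   // other element types proceed to sorting
  case _                           => false  // not an array at all
}
{code}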


  was:
I was trying out the sort_array function and hit this exception. 

I looked into the Spark source code and found that the root cause is that sort_array 
does not check for an array of NULLs. It's not meaningful to sort an array of 
entirely NULLs anyway.
I already have a fix for this issue and I'm going to create a pull request for 
it. 

scala> sqlContext.sql("select sort_array(array(null, null)) from t1").show()
scala.MatchError: ArrayType(NullType,true) (of class 
org.apache.spark.sql.types.ArrayType)
at 
org.apache.spark.sql.catalyst.expressions.SortArray.lt$lzycompute(collectionOperations.scala:68)
at 
org.apache.spark.sql.catalyst.expressions.SortArray.lt(collectionOperations.scala:67)
at 
org.apache.spark.sql.catalyst.expressions.SortArray.nullSafeEval(collectionOperations.scala:111)
at 
org.apache.spark.sql.catalyst.expressions.BinaryExpression.eval(Expression.scala:341)
at 
org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$9$$anonfun$applyOrElse$2.applyOrElse(Optimizer.scala:440)
at 
org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$9$$anonfun$applyOrElse$2.applyOrElse(Optimizer.scala:433)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:227)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:227)
at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:51)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:226)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:232)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:232)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:249)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)



> sort_array throws exception scala.MatchError
> 
>
> Key: SPARK-11277
> URL: https://issues.apache.org/jira/browse/SPARK-11277
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
> Environment: Linux
>Reporter: Jia Li
>
> I was trying out the sort_array function and hit this exception. 
> I looked into the Spark source code and found that the root cause is that 
> sort_array does not check for an array of NULLs. It's not meaningful to sort 
> an array of entirely NULLs anyway. A similar issue exists with an array of 
> struct type. 
> I already have a fix for this issue and I'm going to create a pull request 
> for it. 
> scala> sqlContext.sql("selec

[jira] [Commented] (SPARK-11338) HistoryPage not multi-tenancy enabled (app links not prefixed with APPLICATION_WEB_PROXY_BASE)

2015-10-26 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14975743#comment-14975743
 ] 

Apache Spark commented on SPARK-11338:
--

User 'ckadner' has created a pull request for this issue:
https://github.com/apache/spark/pull/9291
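
For context, a minimal sketch of the kind of change such a fix could make inside {{HistoryPage}} (the helper name makeAppLink is hypothetical and the actual pull request may differ; it assumes the existing UIUtils.prependBaseUri helper that other web UI pages already use):

{code}
import org.apache.spark.ui.UIUtils

// Hypothetical helper: build the application link so it honors uiRoot
// (APPLICATION_WEB_PROXY_BASE) like the rest of the web UI pages do.
def makeAppLink(appId: String): String =
  UIUtils.prependBaseUri(resource = s"/history/$appId")
{code}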

> HistoryPage not multi-tenancy enabled (app links not prefixed with 
> APPLICATION_WEB_PROXY_BASE)
> --
>
> Key: SPARK-11338
> URL: https://issues.apache.org/jira/browse/SPARK-11338
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Reporter: Christian Kadner
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Links on {{HistoryPage}} are not prepended with {{uiRoot}} ({{export 
> APPLICATION_WEB_PROXY_BASE=}}). This makes it 
> impossible/impractical to expose the *History Server* in a multi-tenancy 
> environment where each Spark service instance has one history server behind a 
> multi-tenant enabled proxy server.  All other Spark web UI pages are 
> correctly prefixed when the {{APPLICATION_WEB_PROXY_BASE}} variable is set.
> *Repro steps:*\\
> # Configure history log collection:
> {code:title=conf/spark-defaults.conf|borderStyle=solid}
> spark.eventLog.enabled true
> spark.eventLog.dir logs/history
> spark.history.fs.logDirectory  logs/history
> {code}
> ...create the logs folders:
> {code}
> $ mkdir -p logs/history
> {code}
> # Start the Spark shell and run the word count example:
> {code:java|borderStyle=solid}
> $ bin/spark-shell
> ...
> scala> sc.textFile("README.md").flatMap(_.split(" ")).map(w => (w, 
> 1)).reduceByKey(_ + _).collect
> scala> sc.stop
> {code}
> # Set the web proxy root path (i.e. {{/testwebuiproxy/..}}):
> {code}
> $ export APPLICATION_WEB_PROXY_BASE=/testwebuiproxy/..
> {code}
> # Start the history server:
> {code}
> $  sbin/start-history-server.sh
> {code}
> # Bring up the History Server web UI at {{localhost:18080}} and view the 
> application link in the HTML source text:
> {code:xml|borderColor=#c00}
> ...
> <th>App ID</th><th>App Name</th>...
>   <tr>
>     <td>
>       <a href="/history/local-1445896187531">local-1445896187531</a></td>
>     <td>Spark shell</td>
>   ...
> {code}
> *Notice*, application link "{{/history/local-1445896187531}}" does _not_ have 
> the prefix {{/testwebuiproxy/..}} \\ \\
> All site-relative links (URL starting with {{"/"}}) should have been 
> prepended with the uiRoot prefix {{/testwebuiproxy/..}} like this ...
> {code:xml|borderColor=#0c0}
> ...
> <th>App ID</th><th>App Name</th>...
>   <tr>
>     <td>
>       <a href="/testwebuiproxy/../history/local-1445896187531">local-1445896187531</a></td>
>     <td>Spark shell</td>
>   ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-11338) HistoryPage not multi-tenancy enabled (app links not prefixed with APPLICATION_WEB_PROXY_BASE)

2015-10-26 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-11338:


Assignee: (was: Apache Spark)

> HistoryPage not multi-tenancy enabled (app links not prefixed with 
> APPLICATION_WEB_PROXY_BASE)
> --
>
> Key: SPARK-11338
> URL: https://issues.apache.org/jira/browse/SPARK-11338
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Reporter: Christian Kadner
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Links on {{HistoryPage}} are not prepended with {{uiRoot}} ({{export 
> APPLICATION_WEB_PROXY_BASE=}}). This makes it 
> impossible/impractical to expose the *History Server* in a multi-tenancy 
> environment where each Spark service instance has one history server behind a 
> multi-tenant enabled proxy server.  All other Spark web UI pages are 
> correctly prefixed when the {{APPLICATION_WEB_PROXY_BASE}} variable is set.
> *Repro steps:*\\
> # Configure history log collection:
> {code:title=conf/spark-defaults.conf|borderStyle=solid}
> spark.eventLog.enabled true
> spark.eventLog.dir logs/history
> spark.history.fs.logDirectory  logs/history
> {code}
> ...create the logs folders:
> {code}
> $ mkdir -p logs/history
> {code}
> # Start the Spark shell and run the word count example:
> {code:java|borderStyle=solid}
> $ bin/spark-shell
> ...
> scala> sc.textFile("README.md").flatMap(_.split(" ")).map(w => (w, 
> 1)).reduceByKey(_ + _).collect
> scala> sc.stop
> {code}
> # Set the web proxy root path (i.e. {{/testwebuiproxy/..}}):
> {code}
> $ export APPLICATION_WEB_PROXY_BASE=/testwebuiproxy/..
> {code}
> # Start the history server:
> {code}
> $  sbin/start-history-server.sh
> {code}
> # Bring up the History Server web UI at {{localhost:18080}} and view the 
> application link in the HTML source text:
> {code:xml|borderColor=#c00}
> ...
> <th>App ID</th><th>App Name</th>...
>   <tr>
>     <td>
>       <a href="/history/local-1445896187531">local-1445896187531</a></td>
>     <td>Spark shell</td>
>   ...
> {code}
> *Notice*, application link "{{/history/local-1445896187531}}" does _not_ have 
> the prefix {{/testwebuiproxy/..}} \\ \\
> All site-relative links (URL starting with {{"/"}}) should have been 
> prepended with the uiRoot prefix {{/testwebuiproxy/..}} like this ...
> {code:xml|borderColor=#0c0}
> ...
> <th>App ID</th><th>App Name</th>...
>   <tr>
>     <td>
>       <a href="/testwebuiproxy/../history/local-1445896187531">local-1445896187531</a></td>
>     <td>Spark shell</td>
>   ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-11338) HistoryPage not multi-tenancy enabled (app links not prefixed with APPLICATION_WEB_PROXY_BASE)

2015-10-26 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-11338:


Assignee: Apache Spark

> HistoryPage not multi-tenancy enabled (app links not prefixed with 
> APPLICATION_WEB_PROXY_BASE)
> --
>
> Key: SPARK-11338
> URL: https://issues.apache.org/jira/browse/SPARK-11338
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Reporter: Christian Kadner
>Assignee: Apache Spark
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Links on {{HistoryPage}} are not prepended with {{uiRoot}} ({{export 
> APPLICATION_WEB_PROXY_BASE=}}). This makes it 
> impossible/impractical to expose the *History Server* in a multi-tenancy 
> environment where each Spark service instance has one history server behind a 
> multi-tenant enabled proxy server.  All other Spark web UI pages are 
> correctly prefixed when the {{APPLICATION_WEB_PROXY_BASE}} variable is set.
> *Repro steps:*\\
> # Configure history log collection:
> {code:title=conf/spark-defaults.conf|borderStyle=solid}
> spark.eventLog.enabled true
> spark.eventLog.dir logs/history
> spark.history.fs.logDirectory  logs/history
> {code}
> ...create the logs folders:
> {code}
> $ mkdir -p logs/history
> {code}
> # Start the Spark shell and run the word count example:
> {code:java|borderStyle=solid}
> $ bin/spark-shell
> ...
> scala> sc.textFile("README.md").flatMap(_.split(" ")).map(w => (w, 
> 1)).reduceByKey(_ + _).collect
> scala> sc.stop
> {code}
> # Set the web proxy root path (i.e. {{/testwebuiproxy/..}}):
> {code}
> $ export APPLICATION_WEB_PROXY_BASE=/testwebuiproxy/..
> {code}
> # Start the history server:
> {code}
> $  sbin/start-history-server.sh
> {code}
> # Bring up the History Server web UI at {{localhost:18080}} and view the 
> application link in the HTML source text:
> {code:xml|borderColor=#c00}
> ...
> <th>App ID</th><th>App Name</th>...
>   <tr>
>     <td>
>       <a href="/history/local-1445896187531">local-1445896187531</a></td>
>     <td>Spark shell</td>
>   ...
> {code}
> *Notice*, application link "{{/history/local-1445896187531}}" does _not_ have 
> the prefix {{/testwebuiproxy/..}} \\ \\
> All site-relative links (URL starting with {{"/"}}) should have been 
> prepended with the uiRoot prefix {{/testwebuiproxy/..}} like this ...
> {code:xml|borderColor=#0c0}
> ...
> <th>App ID</th><th>App Name</th>...
>   <tr>
>     <td>
>       <a href="/testwebuiproxy/../history/local-1445896187531">local-1445896187531</a></td>
>     <td>Spark shell</td>
>   ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-11341) Given non-zero ordinal toRow in the encoders of primitive types will cause problem

2015-10-26 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-11341:
---

 Summary: Given non-zero ordinal toRow in the encoders of primitive 
types will cause problem
 Key: SPARK-11341
 URL: https://issues.apache.org/jira/browse/SPARK-11341
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Liang-Chi Hsieh


The toRow in LongEncoder and IntEncoder writes to the given ordinal of an unsafe row 
that has only one field. Since the ordinal is parametric, passing a non-zero ordinal 
will cause a problem.
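
For illustration only, a hypothetical simplification of the bug shape (not the actual encoder source; toRowSketch is made up): the row is allocated with a single field, so writing at any caller-supplied ordinal other than 0 goes out of bounds.

{code}
import org.apache.spark.sql.catalyst.expressions.GenericMutableRow

// Hypothetical sketch: a one-field row written at a parametric ordinal.
def toRowSketch(value: Long, ordinal: Int): GenericMutableRow = {
  val row = new GenericMutableRow(1)  // only ordinal 0 exists
  row.setLong(ordinal, value)         // any ordinal != 0 fails at runtime
  row
}
{code}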



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11341) Given non-zero ordinal toRow in the encoders of primitive types will cause problem

2015-10-26 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14975749#comment-14975749
 ] 

Apache Spark commented on SPARK-11341:
--

User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/9292

> Given non-zero ordinal toRow in the encoders of primitive types will cause 
> problem
> --
>
> Key: SPARK-11341
> URL: https://issues.apache.org/jira/browse/SPARK-11341
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Liang-Chi Hsieh
>
> The toRow in LongEncoder and IntEncoder writes to the given ordinal of an unsafe 
> row that has only one field. Since the ordinal is parametric, passing a non-zero 
> ordinal will cause a problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-11341) Given non-zero ordinal toRow in the encoders of primitive types will cause problem

2015-10-26 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-11341:


Assignee: Apache Spark

> Given non-zero ordinal toRow in the encoders of primitive types will cause 
> problem
> --
>
> Key: SPARK-11341
> URL: https://issues.apache.org/jira/browse/SPARK-11341
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Liang-Chi Hsieh
>Assignee: Apache Spark
>
> The toRow in LongEncoder and IntEncoder writes to the given ordinal of an unsafe 
> row that has only one field. Since the ordinal is parametric, passing a non-zero 
> ordinal will cause a problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-11341) Given non-zero ordinal toRow in the encoders of primitive types will cause problem

2015-10-26 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-11341:


Assignee: (was: Apache Spark)

> Given non-zero ordinal toRow in the encoders of primitive types will cause 
> problem
> --
>
> Key: SPARK-11341
> URL: https://issues.apache.org/jira/browse/SPARK-11341
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Liang-Chi Hsieh
>
> The toRow in LongEncoder and IntEncoder writes to the given ordinal of an unsafe 
> row that has only one field. Since the ordinal is parametric, passing a non-zero 
> ordinal will cause a problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-11338) HistoryPage not multi-tenancy enabled (app links not prefixed with APPLICATION_WEB_PROXY_BASE)

2015-10-26 Thread Christian Kadner (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christian Kadner updated SPARK-11338:
-
Description: 
Links on {{HistoryPage}} are not prepended with {{uiRoot}} ({{export 
APPLICATION_WEB_PROXY_BASE=}}). This makes it 
impossible/impractical to expose the *History Server* in a multi-tenancy 
environment where each Spark service instance has one history server behind a 
multi-tenant enabled proxy server.  All other Spark web UI pages are correctly 
prefixed when the {{APPLICATION_WEB_PROXY_BASE}} environment variable is set.

*Repro steps:*\\
# Configure history log collection:
{code:title=conf/spark-defaults.conf|borderStyle=solid}
spark.eventLog.enabled true
spark.eventLog.dir logs/history
spark.history.fs.logDirectory  logs/history
{code}
...create the logs folders:
{code}
$ mkdir -p logs/history
{code}
# Start the Spark shell and run the word count example:
{code:java|borderStyle=solid}
$ bin/spark-shell
...
scala> sc.textFile("README.md").flatMap(_.split(" ")).map(w => (w, 
1)).reduceByKey(_ + _).collect
scala> sc.stop
{code}
# Set the web proxy root path (i.e. {{/testwebuiproxy/..}}):
{code}
$ export APPLICATION_WEB_PROXY_BASE=/testwebuiproxy/..
{code}
# Start the history server:
{code}
$  sbin/start-history-server.sh
{code}
# Bring up the History Server web UI at {{localhost:18080}} and view the 
application link in the HTML source text:
{code:xml|borderColor=#c00}
...
<th>App ID</th><th>App Name</th>...
  <tr>
    <td>
      <a href="/history/local-1445896187531">local-1445896187531</a></td>
    <td>Spark shell</td>
  ...
{code}
*Notice*, application link "{{/history/local-1445896187531}}" does _not_ have 
the prefix {{/testwebuiproxy/..}} \\ \\
All site-relative links (URL starting with {{"/"}}) should have been prepended 
with the uiRoot prefix {{/testwebuiproxy/..}} like this ...
{code:xml|borderColor=#0c0}
...
<th>App ID</th><th>App Name</th>...
  <tr>
    <td>
      <a href="/testwebuiproxy/../history/local-1445896187531">local-1445896187531</a></td>
    <td>Spark shell</td>
  ...
{code}

  was:
Links on {{HistoryPage}} are not prepended with {{uiRoot}} ({{export 
APPLICATION_WEB_PROXY_BASE=}}). This makes it 
impossible/impractical to expose the *History Server* in a multi-tenancy 
environment where each Spark service instance has one history server behind a 
multi-tenant enabled proxy server.  All other Spark web UI pages are correctly 
prefixed when the {{APPLICATION_WEB_PROXY_BASE}} variable is set.

*Repro steps:*\\
# Configure history log collection:
{code:title=conf/spark-defaults.conf|borderStyle=solid}
spark.eventLog.enabled true
spark.eventLog.dir logs/history
spark.history.fs.logDirectory  logs/history
{code}
...create the logs folders:
{code}
$ mkdir -p logs/history
{code}
# Start the Spark shell and run the word count example:
{code:java|borderStyle=solid}
$ bin/spark-shell
...
scala> sc.textFile("README.md").flatMap(_.split(" ")).map(w => (w, 
1)).reduceByKey(_ + _).collect
scala> sc.stop
{code}
# Set the web proxy root path (i.e. {{/testwebuiproxy/..}}):
{code}
$ export APPLICATION_WEB_PROXY_BASE=/testwebuiproxy/..
{code}
# Start the history server:
{code}
$  sbin/start-history-server.sh
{code}
# Bring up the History Server web UI at {{localhost:18080}} and view the 
application link in the HTML source text:
{code:xml|borderColor=#c00}
...
<th>App ID</th><th>App Name</th>...
  <tr>
    <td>
      <a href="/history/local-1445896187531">local-1445896187531</a></td>
    <td>Spark shell</td>
  ...
{code}
*Notice*, application link "{{/history/local-1445896187531}}" does _not_ have 
the prefix {{/testwebuiproxy/..}} \\ \\
All site-relative links (URL starting with {{"/"}}) should have been prepended 
with the uiRoot prefix {{/testwebuiproxy/..}} like this ...
{code:xml|borderColor=#0c0}
...
<th>App ID</th><th>App Name</th>...
  <tr>
    <td>
      <a href="/testwebuiproxy/../history/local-1445896187531">local-1445896187531</a></td>
    <td>Spark shell</td>
  ...
{code}


> HistoryPage not multi-tenancy enabled (app links not prefixed with 
> APPLICATION_WEB_PROXY_BASE)
> --
>
> Key: SPARK-11338
> URL: https://issues.apache.org/jira/browse/SPARK-11338
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Reporter: Christian Kadner
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Links on {{HistoryPage}} are not prepended with {{uiRoot}} ({{export 
> APPLICATION_WEB_PROXY_BASE=}}). This makes it 
> impossible/impractical to expose the *History Server* in a multi-tenancy 
> environment where each Spark service instance has one history server behind a 
> multi-tenant enabled proxy server.  All other Spark web UI pages are 
> correctly prefixed when the {{APPLICATION_WEB_PROXY_BASE}} environment 
> variable is set.
> *Repro steps:*\\
> # Configure history log collection:
> {code:title=conf/spark-defaults.conf|borderStyle=solid}
> spark.eventLog.enabled true
> spark.eventLog.dir logs/history
> spark.history

[jira] [Resolved] (SPARK-11297) code example generated by include_example is not exactly the same with {% highlight %}

2015-10-26 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-11297.
---
   Resolution: Fixed
Fix Version/s: 1.6.0

Issue resolved by pull request 9265
[https://github.com/apache/spark/pull/9265]

> code example generated by include_example is not exactly the same with {% 
> highlight %}
> --
>
> Key: SPARK-11297
> URL: https://issues.apache.org/jira/browse/SPARK-11297
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, ML, MLlib
>Reporter: Xusen Yin
>Assignee: Xusen Yin
> Fix For: 1.6.0
>
>
> Code example generated by include_example is a little different from the previous 
> {% highlight %} results, which causes a bigger font size for code examples. We 
> need to substitute "" with "", and add new code 
> tags to make it look the same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


