[jira] [Updated] (SPARK-10263) Add @Since annotation to ml.param and ml.*
[ https://issues.apache.org/jira/browse/SPARK-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiangrui Meng updated SPARK-10263:
----------------------------------
    Assignee: Hiroshi Takahashi

> Add @Since annotation to ml.param and ml.*
> ------------------------------------------
>
>                 Key: SPARK-10263
>                 URL: https://issues.apache.org/jira/browse/SPARK-10263
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Documentation, ML
>            Reporter: Xiangrui Meng
>            Assignee: Hiroshi Takahashi
>            Priority: Minor
>              Labels: starter
[jira] [Updated] (SPARK-10654) Add columnSimilarities to IndexedRowMatrix
[ https://issues.apache.org/jira/browse/SPARK-10654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiangrui Meng updated SPARK-10654:
----------------------------------
    Assignee: Reza Zadeh

> Add columnSimilarities to IndexedRowMatrix
> ------------------------------------------
>
>                 Key: SPARK-10654
>                 URL: https://issues.apache.org/jira/browse/SPARK-10654
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>            Reporter: Reza Zadeh
>            Assignee: Reza Zadeh
>             Fix For: 1.6.0
>
> Add columnSimilarities to IndexedRowMatrix.
> rowSimilarities for IndexedRowMatrix is being added in another JIRA, tracked by SPARK-4823.
[jira] [Resolved] (SPARK-10654) Add columnSimilarities to IndexedRowMatrix
[ https://issues.apache.org/jira/browse/SPARK-10654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiangrui Meng resolved SPARK-10654.
-----------------------------------
       Resolution: Fixed
    Fix Version/s: 1.6.0

Issue resolved by pull request 8792
[https://github.com/apache/spark/pull/8792]

> Add columnSimilarities to IndexedRowMatrix
> ------------------------------------------
>
>                 Key: SPARK-10654
>                 URL: https://issues.apache.org/jira/browse/SPARK-10654
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>            Reporter: Reza Zadeh
>             Fix For: 1.6.0
>
> Add columnSimilarities to IndexedRowMatrix.
> rowSimilarities for IndexedRowMatrix is being added in another JIRA, tracked by SPARK-4823.
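For context, a usage sketch of the new method (assumes a spark-shell session where {{sc}} is defined; the call mirrors the existing RowMatrix.columnSimilarities, so treat the exact return type as an assumption):

{code}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.{IndexedRow, IndexedRowMatrix}

// Two indexed rows of a 2 x 3 matrix.
val rows = sc.parallelize(Seq(
  IndexedRow(0L, Vectors.dense(1.0, 0.0, 2.0)),
  IndexedRow(1L, Vectors.dense(0.0, 3.0, 4.0))))

val mat = new IndexedRowMatrix(rows)

// Pairwise cosine similarities between columns, returned as
// upper-triangular (i, j, similarity) entries.
val sims = mat.columnSimilarities()
sims.entries.collect().foreach(println)
{code}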
[jira] [Commented] (SPARK-7751) Add @Since annotation to stable and experimental methods in MLlib
[ https://issues.apache.org/jira/browse/SPARK-7751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14975735#comment-14975735 ]

Xiangrui Meng commented on SPARK-7751:
--------------------------------------

Some line changes actually do not change the API. For example, in 1.2:

{code}
class Pipeline {
}
{code}

Then in 1.4:

{code}
class Pipeline(val uid: String) {
  def this() = this("...")
}
{code}

The default constructor is since 1.2, not 1.4.

> Add @Since annotation to stable and experimental methods in MLlib
> -----------------------------------------------------------------
>
>                 Key: SPARK-7751
>                 URL: https://issues.apache.org/jira/browse/SPARK-7751
>             Project: Spark
>          Issue Type: Umbrella
>          Components: Documentation, MLlib
>            Reporter: Xiangrui Meng
>            Assignee: Xiangrui Meng
>            Priority: Minor
>              Labels: starter
>
> This is useful to check whether a feature exists in some version of Spark. This is an umbrella JIRA to track the progress. We want to have -@since tag- @Since annotations for both stable (those without any Experimental/DeveloperApi/AlphaComponent annotations) and experimental methods in MLlib.
> (Do NOT tag private or package-private classes or methods, nor local variables and methods.)
> * an example PR for Scala: https://github.com/apache/spark/pull/8309
> We need to dig through the git commit history to figure out in which Spark version a method was first introduced. Take `NaiveBayes.setModelType` as an example: we can grep `def setModelType` at different version git tags.
> {code}
> meng@xm:~/src/spark
> $ git show v1.3.0:mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala | grep "def setModelType"
> meng@xm:~/src/spark
> $ git show v1.4.0:mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala | grep "def setModelType"
>   def setModelType(modelType: String): NaiveBayes = {
> {code}
> If there are better ways, please let us know.
> We cannot add all -@since tags- @Since annotations in a single PR, which would be hard to review. So we made subtasks for each package, for example `org.apache.spark.classification`. Feel free to add more sub-tasks for Python and the `spark.ml` package.
> Plan:
> 1. In 1.5, we try to add @Since annotations to all stable/experimental methods under `spark.mllib`.
> 2. Starting from 1.6, we require @Since annotations in all new PRs.
> 3. In 1.6, we try to add @Since annotations to all stable/experimental methods under `spark.ml`, `pyspark.mllib`, and `pyspark.ml`.
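To make the comment above concrete, a minimal sketch (an assumed illustration, not code from a Spark PR) of how the Pipeline example would be annotated so the no-arg constructor keeps its original version:

{code}
import org.apache.spark.annotation.Since

class Pipeline @Since("1.4.0") (@Since("1.4.0") val uid: String) {

  // The zero-arg constructor predates the uid parameter, so it is tagged
  // with the version in which it first appeared.
  @Since("1.2.0")
  def this() = this("...")
}
{code}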
[jira] [Created] (SPARK-11340) Support setting driver properties when starting Spark from R programmatically or from RStudio
Felix Cheung created SPARK-11340:
------------------------------------

             Summary: Support setting driver properties when starting Spark from R programmatically or from RStudio
                 Key: SPARK-11340
                 URL: https://issues.apache.org/jira/browse/SPARK-11340
             Project: Spark
          Issue Type: Bug
          Components: SparkR
    Affects Versions: 1.5.1
            Reporter: Felix Cheung
            Priority: Minor

Currently when sparkR.init() is called in 'client' mode, it launches the JVM backend, but driver properties (like driver-memory) are not passed or settable by the user calling sparkR.init().

[~sunrui][~shivaram] and I discussed this offline and think we should support this.

This is the original thread:

>> From: rui@intel.com
>> To: dirceu.semigh...@gmail.com
>> CC: u...@spark.apache.org
>> Subject: RE: How to set memory for SparkR with master="local[*]"
>> Date: Mon, 26 Oct 2015 02:24:00 +
>>
>> As documented in http://spark.apache.org/docs/latest/configuration.html#available-properties,
>>
>> Note for "spark.driver.memory":
>>
>> Note: In client mode, this config must not be set through the SparkConf directly in your application, because the driver JVM has already started at that point. Instead, please set this through the --driver-memory command line option or in your default properties file.
>>
>> If you are to start a SparkR shell using bin/sparkR, then you can use bin/sparkR --driver-memory. You have no chance to set the driver memory size after the R shell has been launched via bin/sparkR.
>>
>> But if you are to start a SparkR shell manually without using bin/sparkR (for example, in RStudio), you can:
>>
>> library(SparkR)
>> Sys.setenv("SPARKR_SUBMIT_ARGS" = "--conf spark.driver.memory=2g sparkr-shell")
>> sc <- sparkR.init()
>>
>> From: Dirceu Semighini Filho [mailto:dirceu.semigh...@gmail.com]
>> Sent: Friday, October 23, 2015 7:53 PM
>> Cc: user
>> Subject: Re: How to set memory for SparkR with master="local[*]"
>>
>> Hi Matej,
>> I'm also using this and I'm having the same behavior here, my driver has only 530mb which is the default value.
>>
>> Maybe this is a bug.
>>
>> 2015-10-23 9:43 GMT-02:00 Matej Holec :
>>
>> Hello!
>>
>> How to adjust the memory settings properly for SparkR with master="local[*]" in R?
>>
>> *When running from R -- SparkR doesn't accept memory settings :(*
>>
>> I use the following commands:
>>
>> R> library(SparkR)
>> R> sc <- sparkR.init(master = "local[*]", sparkEnvir = list(spark.driver.memory = "5g"))
>>
>> Despite the variable spark.driver.memory being correctly set (checked in http://node:4040/environment/), the driver has only the default amount of memory allocated (Storage Memory 530.3 MB).
>>
>> *But when running from spark-1.5.1-bin-hadoop2.6/bin/sparkR -- OK*
>>
>> The following command:
>>
>> ]$ spark-1.5.1-bin-hadoop2.6/bin/sparkR --driver-memory 5g
>>
>> creates a SparkR session with properly adjusted driver memory (Storage Memory 2.6 GB).
>>
>> Any suggestion?
>>
>> Thanks
>> Matej
[jira] [Commented] (SPARK-11340) Support setting driver properties when starting Spark from R programmatically or from RStudio
[ https://issues.apache.org/jira/browse/SPARK-11340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14975742#comment-14975742 ]

Apache Spark commented on SPARK-11340:
--------------------------------------

User 'felixcheung' has created a pull request for this issue:
https://github.com/apache/spark/pull/9290
[jira] [Assigned] (SPARK-11340) Support setting driver properties when starting Spark from R programmatically or from RStudio
[ https://issues.apache.org/jira/browse/SPARK-11340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-11340:
------------------------------------

    Assignee:     (was: Apache Spark)
[jira] [Assigned] (SPARK-11340) Support setting driver properties when starting Spark from R programmatically or from RStudio
[ https://issues.apache.org/jira/browse/SPARK-11340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-11340:
------------------------------------

    Assignee: Apache Spark
[jira] [Updated] (SPARK-11277) sort_array throws exception scala.MatchError
[ https://issues.apache.org/jira/browse/SPARK-11277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jia Li updated SPARK-11277:
---------------------------
    Description: 
I was trying out the sort_array function and hit this exception. I looked into the Spark source code and found the root cause: sort_array does not check for an array of NULLs. It's not meaningful to sort an array of entirely NULLs anyway. A similar issue exists with an array of struct type. I already have a fix for this issue and I'm going to create a pull request for it.

{code}
scala> sqlContext.sql("select sort_array(array(null, null)) from t1").show()
scala.MatchError: ArrayType(NullType,true) (of class org.apache.spark.sql.types.ArrayType)
  at org.apache.spark.sql.catalyst.expressions.SortArray.lt$lzycompute(collectionOperations.scala:68)
  at org.apache.spark.sql.catalyst.expressions.SortArray.lt(collectionOperations.scala:67)
  at org.apache.spark.sql.catalyst.expressions.SortArray.nullSafeEval(collectionOperations.scala:111)
  at org.apache.spark.sql.catalyst.expressions.BinaryExpression.eval(Expression.scala:341)
  at org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$9$$anonfun$applyOrElse$2.applyOrElse(Optimizer.scala:440)
  at org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$9$$anonfun$applyOrElse$2.applyOrElse(Optimizer.scala:433)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:227)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:227)
  at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:51)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:226)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:232)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:232)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:249)
  at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
{code}

    was:
I was trying out the sort_array function then hit this exception. I looked into the spark source code. I found the root cause is that sort_array does not check for an array of NULLs. It's not meaningful to sort an array of entirely NULLs anyway. I already have a fix for this issue and I'm going to create a pull request for it.

> sort_array throws exception scala.MatchError
> --------------------------------------------
>
>                 Key: SPARK-11277
>                 URL: https://issues.apache.org/jira/browse/SPARK-11277
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.0
>         Environment: Linux
>            Reporter: Jia Li
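One way the missing type check could look (a hypothetical sketch of the general approach; checkSortable is an invented helper name, and the actual pull request may differ):

{code}
import org.apache.spark.sql.types.{ArrayType, DataType, NullType}

// Reject element types that cannot be ordered before building the
// comparator, instead of letting the match fail at evaluation time.
def checkSortable(dt: DataType): Boolean = dt match {
  case ArrayType(NullType, _) => false // an array of only NULLs: nothing meaningful to sort
  case ArrayType(_, _)        => true  // other element types fall through to the existing comparator
  case _                      => false // sort_array only accepts arrays
}
{code}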
[jira] [Commented] (SPARK-11338) HistoryPage not multi-tenancy enabled (app links not prefixed with APPLICATION_WEB_PROXY_BASE)
[ https://issues.apache.org/jira/browse/SPARK-11338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14975743#comment-14975743 ]

Apache Spark commented on SPARK-11338:
--------------------------------------

User 'ckadner' has created a pull request for this issue:
https://github.com/apache/spark/pull/9291

> HistoryPage not multi-tenancy enabled (app links not prefixed with APPLICATION_WEB_PROXY_BASE)
> ----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-11338
>                 URL: https://issues.apache.org/jira/browse/SPARK-11338
>             Project: Spark
>          Issue Type: Bug
>          Components: Web UI
>            Reporter: Christian Kadner
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Links on {{HistoryPage}} are not prepended with {{uiRoot}} ({{export APPLICATION_WEB_PROXY_BASE=}}). This makes it impossible/impractical to expose the *History Server* in a multi-tenancy environment where each Spark service instance has one history server behind a multi-tenant enabled proxy server. All other Spark web UI pages are correctly prefixed when the {{APPLICATION_WEB_PROXY_BASE}} variable is set.
> *Repro steps:*
> # Configure history log collection:
> {code:title=conf/spark-defaults.conf|borderStyle=solid}
> spark.eventLog.enabled           true
> spark.eventLog.dir               logs/history
> spark.history.fs.logDirectory    logs/history
> {code}
> ...create the logs folders:
> {code}
> $ mkdir -p logs/history
> {code}
> # Start the Spark shell and run the word count example:
> {code:java|borderStyle=solid}
> $ bin/spark-shell
> ...
> scala> sc.textFile("README.md").flatMap(_.split(" ")).map(w => (w, 1)).reduceByKey(_ + _).collect
> scala> sc.stop
> {code}
> # Set the web proxy root path (i.e. {{/testwebuiproxy/..}}):
> {code}
> $ export APPLICATION_WEB_PROXY_BASE=/testwebuiproxy/..
> {code}
> # Start the history server:
> {code}
> $ sbin/start-history-server.sh
> {code}
> # Bring up the History Server web UI at {{localhost:18080}} and view the application link in the HTML source text:
> {code:xml|borderColor=#c00}
> ...
> <th>App ID</th><th>App Name</th>...
> <td><a href="/history/local-1445896187531">local-1445896187531</a></td><td>Spark shell</td>
> ...
> {code}
> *Notice*, the application link "{{/history/local-1445896187531}}" does _not_ have the prefix {{/testwebuiproxy/..}}
>
> All site-relative links (URLs starting with {{"/"}}) should have been prepended with the uiRoot prefix {{/testwebuiproxy/..}} like this ...
> {code:xml|borderColor=#0c0}
> ...
> <th>App ID</th><th>App Name</th>...
> <td><a href="/testwebuiproxy/../history/local-1445896187531">local-1445896187531</a></td><td>Spark shell</td>
> ...
> {code}
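For illustration, a minimal sketch of the kind of fix (assumed helper names, not the actual patch from PR 9291; Spark's UI code has its own UIUtils helpers that the real change may go through instead):

{code}
// Read the proxy base the same way the rest of the web UI does, and
// prepend it to the otherwise site-relative application link.
val uiRoot: String = sys.env.getOrElse("APPLICATION_WEB_PROXY_BASE", "")

def appLink(appId: String): String = s"$uiRoot/history/$appId"

// With APPLICATION_WEB_PROXY_BASE=/testwebuiproxy/.., appLink("local-1445896187531")
// yields "/testwebuiproxy/../history/local-1445896187531".
{code}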
[jira] [Assigned] (SPARK-11338) HistoryPage not multi-tenancy enabled (app links not prefixed with APPLICATION_WEB_PROXY_BASE)
[ https://issues.apache.org/jira/browse/SPARK-11338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-11338:
------------------------------------

    Assignee:     (was: Apache Spark)
[jira] [Assigned] (SPARK-11338) HistoryPage not multi-tenancy enabled (app links not prefixed with APPLICATION_WEB_PROXY_BASE)
[ https://issues.apache.org/jira/browse/SPARK-11338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-11338:
------------------------------------

    Assignee: Apache Spark
[jira] [Created] (SPARK-11341) Given non-zero ordinal toRow in the encoders of primitive types will cause problem
Liang-Chi Hsieh created SPARK-11341:
---------------------------------------

             Summary: Given non-zero ordinal toRow in the encoders of primitive types will cause problem
                 Key: SPARK-11341
                 URL: https://issues.apache.org/jira/browse/SPARK-11341
             Project: Spark
          Issue Type: Bug
          Components: SQL
            Reporter: Liang-Chi Hsieh

toRow in LongEncoder and IntEncoder writes to the given ordinal of an unsafe row that has only one field. Since the ordinal is parametric, passing a non-zero ordinal will cause a problem.
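To illustrate the hazard with a simplified stand-in (hypothetical writer code, not Spark's actual encoder internals):

{code}
// A one-field row only has ordinal 0, so a writer that forwards a
// parametric ordinal unchecked can write past the end of the row.
def writeLong(row: Array[Long], ordinal: Int, value: Long): Unit = {
  require(ordinal < row.length, s"ordinal $ordinal is out of range for a ${row.length}-field row")
  row(ordinal) = value
}

val row = new Array[Long](1)  // single field, like the row LongEncoder.toRow produces
writeLong(row, 0, 42L)        // fine
// writeLong(row, 1, 42L)     // fails: a single-field encoder must pin the ordinal to 0
{code}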
[jira] [Commented] (SPARK-11341) Given non-zero ordinal toRow in the encoders of primitive types will cause problem
[ https://issues.apache.org/jira/browse/SPARK-11341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14975749#comment-14975749 ]

Apache Spark commented on SPARK-11341:
--------------------------------------

User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/9292

> Given non-zero ordinal toRow in the encoders of primitive types will cause problem
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-11341
>                 URL: https://issues.apache.org/jira/browse/SPARK-11341
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Liang-Chi Hsieh
>
> toRow in LongEncoder and IntEncoder writes to the given ordinal of an unsafe row that has only one field. Since the ordinal is parametric, passing a non-zero ordinal will cause a problem.
[jira] [Assigned] (SPARK-11341) Given non-zero ordinal toRow in the encoders of primitive types will cause problem
[ https://issues.apache.org/jira/browse/SPARK-11341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-11341:
------------------------------------

    Assignee: Apache Spark
[jira] [Assigned] (SPARK-11341) Given non-zero ordinal toRow in the encoders of primitive types will cause problem
[ https://issues.apache.org/jira/browse/SPARK-11341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-11341:
------------------------------------

    Assignee:     (was: Apache Spark)
[jira] [Updated] (SPARK-11338) HistoryPage not multi-tenancy enabled (app links not prefixed with APPLICATION_WEB_PROXY_BASE)
[ https://issues.apache.org/jira/browse/SPARK-11338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christian Kadner updated SPARK-11338:
-------------------------------------
    Description: 
Links on {{HistoryPage}} are not prepended with {{uiRoot}} ({{export APPLICATION_WEB_PROXY_BASE=}}). This makes it impossible/impractical to expose the *History Server* in a multi-tenancy environment where each Spark service instance has one history server behind a multi-tenant enabled proxy server. All other Spark web UI pages are correctly prefixed when the {{APPLICATION_WEB_PROXY_BASE}} environment variable is set.

*Repro steps:*
# Configure history log collection:
{code:title=conf/spark-defaults.conf|borderStyle=solid}
spark.eventLog.enabled           true
spark.eventLog.dir               logs/history
spark.history.fs.logDirectory    logs/history
{code}
...create the logs folders:
{code}
$ mkdir -p logs/history
{code}
# Start the Spark shell and run the word count example:
{code:java|borderStyle=solid}
$ bin/spark-shell
...
scala> sc.textFile("README.md").flatMap(_.split(" ")).map(w => (w, 1)).reduceByKey(_ + _).collect
scala> sc.stop
{code}
# Set the web proxy root path (i.e. {{/testwebuiproxy/..}}):
{code}
$ export APPLICATION_WEB_PROXY_BASE=/testwebuiproxy/..
{code}
# Start the history server:
{code}
$ sbin/start-history-server.sh
{code}
# Bring up the History Server web UI at {{localhost:18080}} and view the application link in the HTML source text:
{code:xml|borderColor=#c00}
...
<th>App ID</th><th>App Name</th>...
<td><a href="/history/local-1445896187531">local-1445896187531</a></td><td>Spark shell</td>
...
{code}
*Notice*, the application link "{{/history/local-1445896187531}}" does _not_ have the prefix {{/testwebuiproxy/..}}

All site-relative links (URLs starting with {{"/"}}) should have been prepended with the uiRoot prefix {{/testwebuiproxy/..}} like this ...
{code:xml|borderColor=#0c0}
...
<th>App ID</th><th>App Name</th>...
<td><a href="/testwebuiproxy/../history/local-1445896187531">local-1445896187531</a></td><td>Spark shell</td>
...
{code}

    was:
Links on {{HistoryPage}} are not prepended with {{uiRoot}} ({{export APPLICATION_WEB_PROXY_BASE=}}). This makes it impossible/impractical to expose the *History Server* in a multi-tenancy environment where each Spark service instance has one history server behind a multi-tenant enabled proxy server. All other Spark web UI pages are correctly prefixed when the {{APPLICATION_WEB_PROXY_BASE}} variable is set.

> HistoryPage not multi-tenancy enabled (app links not prefixed with APPLICATION_WEB_PROXY_BASE)
> ----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-11338
>                 URL: https://issues.apache.org/jira/browse/SPARK-11338
>             Project: Spark
>          Issue Type: Bug
>          Components: Web UI
>            Reporter: Christian Kadner
>   Original Estimate: 48h
>  Remaining Estimate: 48h
[jira] [Resolved] (SPARK-11297) code example generated by include_example is not exactly the same with {% highlight %}
[ https://issues.apache.org/jira/browse/SPARK-11297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiangrui Meng resolved SPARK-11297.
-----------------------------------
       Resolution: Fixed
    Fix Version/s: 1.6.0

Issue resolved by pull request 9265
[https://github.com/apache/spark/pull/9265]

> code example generated by include_example is not exactly the same with {% highlight %}
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-11297
>                 URL: https://issues.apache.org/jira/browse/SPARK-11297
>             Project: Spark
>          Issue Type: Improvement
>          Components: Documentation, ML, MLlib
>            Reporter: Xusen Yin
>            Assignee: Xusen Yin
>             Fix For: 1.6.0
>
> Code examples generated by include_example are a little different from the previous {% highlight %} results, which causes a bigger font size in code examples. We need to substitute "" with "", and add new code tags to make them look the same.