[jira] [Assigned] (SPARK-12230) WeightedLeastSquares.fit() should handle division by zero properly if standard deviation of target variable is zero.
[ https://issues.apache.org/jira/browse/SPARK-12230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12230: Assignee: Apache Spark > WeightedLeastSquares.fit() should handle division by zero properly if > standard deviation of target variable is zero. > > > Key: SPARK-12230 > URL: https://issues.apache.org/jira/browse/SPARK-12230 > Project: Spark > Issue Type: Bug > Components: ML >Reporter: Imran Younus >Assignee: Apache Spark >Priority: Trivial > > This is a TODO in the WeightedLeastSquares.fit() method. If the standard > deviation of the target variable is zero, then the regression is > meaningless. I think the fit() method should inform the user and exit nicely. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-12230) WeightedLeastSquares.fit() should handle division by zero properly if standard deviation of target variable is zero.
[ https://issues.apache.org/jira/browse/SPARK-12230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12230: Assignee: (was: Apache Spark) > WeightedLeastSquares.fit() should handle division by zero properly if > standard deviation of target variable is zero. > > > Key: SPARK-12230 > URL: https://issues.apache.org/jira/browse/SPARK-12230 > Project: Spark > Issue Type: Bug > Components: ML >Reporter: Imran Younus >Priority: Trivial > > This is a TODO in the WeightedLeastSquares.fit() method. If the standard > deviation of the target variable is zero, then the regression is > meaningless. I think the fit() method should inform the user and exit nicely.
[jira] [Commented] (SPARK-12230) WeightedLeastSquares.fit() should handle division by zero properly if standard deviation of target variable is zero.
[ https://issues.apache.org/jira/browse/SPARK-12230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056360#comment-15056360 ] Apache Spark commented on SPARK-12230: -- User 'iyounus' has created a pull request for this issue: https://github.com/apache/spark/pull/10274 > WeightedLeastSquares.fit() should handle division by zero properly if > standard deviation of target variable is zero. > > > Key: SPARK-12230 > URL: https://issues.apache.org/jira/browse/SPARK-12230 > Project: Spark > Issue Type: Bug > Components: ML >Reporter: Imran Younus >Priority: Trivial > > This is a TODO in the WeightedLeastSquares.fit() method. If the standard > deviation of the target variable is zero, then the regression is > meaningless. I think the fit() method should inform the user and exit nicely.
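[Editor's note] The guard this ticket asks for can be illustrated outside Spark. The following is a minimal Python sketch, not the actual WeightedLeastSquares code; `weighted_std` and `check_target_std` are hypothetical helper names:

```python
import math

def weighted_std(values, weights):
    """Weighted (population) standard deviation of `values`."""
    total = sum(weights)
    mean = sum(w * v for v, w in zip(values, weights)) / total
    var = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return math.sqrt(var)

def check_target_std(labels, weights):
    """Fail fast with a clear message when the target is constant,
    instead of dividing by a zero standard deviation later."""
    if weighted_std(labels, weights) == 0.0:
        raise ValueError(
            "The standard deviation of the label is zero; "
            "the regression problem is degenerate.")
```

With a constant label column the check raises immediately, which is the "inform the user and exit nicely" behavior the reporter suggests.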
[jira] [Assigned] (SPARK-12271) Improve error message for Dataset.as[] when the schema is incompatible.
[ https://issues.apache.org/jira/browse/SPARK-12271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12271: Assignee: Apache Spark > Improve error message for Dataset.as[] when the schema is incompatible. > --- > > Key: SPARK-12271 > URL: https://issues.apache.org/jira/browse/SPARK-12271 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Nong Li >Assignee: Apache Spark > > It currently fails with an unexecutable exception.
[jira] [Assigned] (SPARK-12296) Feature parity for pyspark.mllib StandardScalerModel
[ https://issues.apache.org/jira/browse/SPARK-12296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12296: Assignee: (was: Apache Spark) > Feature parity for pyspark.mllib StandardScalerModel > > > Key: SPARK-12296 > URL: https://issues.apache.org/jira/browse/SPARK-12296 > Project: Spark > Issue Type: Sub-task > Components: MLlib, PySpark >Reporter: Joseph K. Bradley >Priority: Minor > > Some methods are missing, such as ways to access the std, mean, etc. This > JIRA is for feature parity for pyspark.mllib.feature.StandardScaler & > StandardScalerModel
[jira] [Assigned] (SPARK-12296) Feature parity for pyspark.mllib StandardScalerModel
[ https://issues.apache.org/jira/browse/SPARK-12296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12296: Assignee: Apache Spark > Feature parity for pyspark.mllib StandardScalerModel > > > Key: SPARK-12296 > URL: https://issues.apache.org/jira/browse/SPARK-12296 > Project: Spark > Issue Type: Sub-task > Components: MLlib, PySpark >Reporter: Joseph K. Bradley >Assignee: Apache Spark >Priority: Minor > > Some methods are missing, such as ways to access the std, mean, etc. This > JIRA is for feature parity for pyspark.mllib.feature.StandardScaler & > StandardScalerModel
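[Editor's note] The kind of accessors the ticket wants can be sketched in plain Python. This is a hypothetical stand-in, not the pyspark.mllib implementation; the class below only mirrors the shape of the API (fitted `mean`/`std` exposed as read-only properties alongside `transform`):

```python
class StandardScalerModelSketch:
    """Toy model holding fitted statistics, exposing them as properties."""

    def __init__(self, mean, std, with_mean=False, with_std=True):
        self._mean = list(mean)
        self._std = list(std)
        self.with_mean = with_mean
        self.with_std = with_std

    @property
    def mean(self):
        """Per-feature mean seen at fit time (copy, read-only)."""
        return list(self._mean)

    @property
    def std(self):
        """Per-feature standard deviation seen at fit time (copy)."""
        return list(self._std)

    def transform(self, vector):
        """Center and/or scale one dense vector."""
        out = []
        for x, m, s in zip(vector, self._mean, self._std):
            if self.with_mean:
                x = x - m
            if self.with_std and s != 0.0:
                x = x / s
            out.append(x)
        return out
```

Feature parity here mostly means adding the two properties, since the transform logic already exists on the Python side.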
[jira] [Reopened] (SPARK-12062) Master rebuilding historical SparkUI should be asynchronous
[ https://issues.apache.org/jira/browse/SPARK-12062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or reopened SPARK-12062: --- > Master rebuilding historical SparkUI should be asynchronous > --- > > Key: SPARK-12062 > URL: https://issues.apache.org/jira/browse/SPARK-12062 > Project: Spark > Issue Type: Bug > Components: Deploy >Affects Versions: 1.0.0 >Reporter: Andrew Or >Assignee: Bryan Cutler > > When a long-running application finishes, it takes a while (sometimes > minutes) to rebuild the SparkUI. However, in Master.scala this is currently > done within the RPC event loop, which runs in only one thread. Thus, in the > meantime no other applications can register with this master.
[jira] [Updated] (SPARK-12062) Master rebuilding historical SparkUI should be asynchronous
[ https://issues.apache.org/jira/browse/SPARK-12062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-12062: -- Target Version/s: 1.6.1, 2.0.0 > Master rebuilding historical SparkUI should be asynchronous > --- > > Key: SPARK-12062 > URL: https://issues.apache.org/jira/browse/SPARK-12062 > Project: Spark > Issue Type: Bug > Components: Deploy >Affects Versions: 1.0.0 >Reporter: Andrew Or >Assignee: Bryan Cutler > > When a long-running application finishes, it takes a while (sometimes > minutes) to rebuild the SparkUI. However, in Master.scala this is currently > done within the RPC event loop, which runs in only one thread. Thus, in the > meantime no other applications can register with this master.
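[Editor's note] The general shape of the fix, moving slow work off a single-threaded event loop onto a dedicated worker, can be sketched in Python with `concurrent.futures`. This is an illustration of the pattern only; `on_application_finished`, `rebuild_fn`, and `on_done` are hypothetical names, not Spark APIs:

```python
from concurrent.futures import ThreadPoolExecutor

# One dedicated thread for UI rebuilds, so the (single-threaded) RPC
# event loop is never blocked for minutes by a large event log.
rebuild_pool = ThreadPoolExecutor(max_workers=1,
                                  thread_name_prefix="ui-rebuild")

def on_application_finished(app_id, rebuild_fn, on_done):
    """Called from the event loop; submits the rebuild and returns
    immediately. `on_done` fires on the worker thread when finished."""
    future = rebuild_pool.submit(rebuild_fn, app_id)
    future.add_done_callback(lambda f: on_done(app_id, f.result()))
    return future
```

The event-loop thread only pays the cost of `submit`; registrations from other applications keep flowing while the rebuild runs.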
[jira] [Commented] (SPARK-4816) Maven profile netlib-lgpl does not work
[ https://issues.apache.org/jira/browse/SPARK-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056630#comment-15056630 ] RJ Nowling commented on SPARK-4816: --- Tried with Maven 3.3.9. I see no issues with the newer version of Maven: {code} $ mvn -version Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-10T16:41:47+00:00) Maven home: /root/apache-maven-3.3.9 Java version: 1.7.0_85, vendor: Oracle Corporation Java home: /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.85-2.6.1.2.el7_1.x86_64/jre Default locale: en_US, platform encoding: UTF-8 OS name: "linux", version: "3.10.0-229.1.2.el7.x86_64", arch: "amd64", family: "unix" $ zipinfo -1 assembly/target/scala-2.10/spark-assembly-1.4.1-hadoop2.4.0.jar | grep netlib-native netlib-native_ref-osx-x86_64.jnilib netlib-native_ref-osx-x86_64.jnilib.asc netlib-native_ref-osx-x86_64.pom netlib-native_ref-osx-x86_64.pom.asc META-INF/maven/com.github.fommil.netlib/netlib-native_ref-osx-x86_64/ META-INF/maven/com.github.fommil.netlib/netlib-native_ref-osx-x86_64/pom.xml META-INF/maven/com.github.fommil.netlib/netlib-native_ref-osx-x86_64/pom.properties netlib-native_ref-linux-x86_64.so META-INF/maven/com.github.fommil.netlib/netlib-native_ref-linux-x86_64/ META-INF/maven/com.github.fommil.netlib/netlib-native_ref-linux-x86_64/pom.xml META-INF/maven/com.github.fommil.netlib/netlib-native_ref-linux-x86_64/pom.properties netlib-native_ref-linux-i686.so META-INF/maven/com.github.fommil.netlib/netlib-native_ref-linux-i686/ META-INF/maven/com.github.fommil.netlib/netlib-native_ref-linux-i686/pom.xml META-INF/maven/com.github.fommil.netlib/netlib-native_ref-linux-i686/pom.properties netlib-native_ref-win-x86_64.dll netlib-native_ref-win-x86_64.dll.asc netlib-native_ref-win-x86_64.pom netlib-native_ref-win-x86_64.pom.asc META-INF/maven/com.github.fommil.netlib/netlib-native_ref-win-x86_64/ META-INF/maven/com.github.fommil.netlib/netlib-native_ref-win-x86_64/pom.xml 
META-INF/maven/com.github.fommil.netlib/netlib-native_ref-win-x86_64/pom.properties netlib-native_ref-win-i686.dll netlib-native_ref-win-i686.dll.asc netlib-native_ref-win-i686.pom netlib-native_ref-win-i686.pom.asc META-INF/maven/com.github.fommil.netlib/netlib-native_ref-win-i686/ META-INF/maven/com.github.fommil.netlib/netlib-native_ref-win-i686/pom.xml META-INF/maven/com.github.fommil.netlib/netlib-native_ref-win-i686/pom.properties netlib-native_ref-linux-armhf.so META-INF/maven/com.github.fommil.netlib/netlib-native_ref-linux-armhf/ META-INF/maven/com.github.fommil.netlib/netlib-native_ref-linux-armhf/pom.xml META-INF/maven/com.github.fommil.netlib/netlib-native_ref-linux-armhf/pom.properties netlib-native_system-osx-x86_64.jnilib netlib-native_system-osx-x86_64.jnilib.asc netlib-native_system-osx-x86_64.pom netlib-native_system-osx-x86_64.pom.asc META-INF/maven/com.github.fommil.netlib/netlib-native_system-osx-x86_64/ META-INF/maven/com.github.fommil.netlib/netlib-native_system-osx-x86_64/pom.xml META-INF/maven/com.github.fommil.netlib/netlib-native_system-osx-x86_64/pom.properties netlib-native_system-linux-x86_64.pom.asc netlib-native_system-linux-x86_64.pom netlib-native_system-linux-x86_64.so netlib-native_system-linux-x86_64.so.asc META-INF/maven/com.github.fommil.netlib/netlib-native_system-linux-x86_64/ META-INF/maven/com.github.fommil.netlib/netlib-native_system-linux-x86_64/pom.xml META-INF/maven/com.github.fommil.netlib/netlib-native_system-linux-x86_64/pom.properties netlib-native_system-linux-i686.pom netlib-native_system-linux-i686.so.asc netlib-native_system-linux-i686.pom.asc netlib-native_system-linux-i686.so META-INF/maven/com.github.fommil.netlib/netlib-native_system-linux-i686/ META-INF/maven/com.github.fommil.netlib/netlib-native_system-linux-i686/pom.xml META-INF/maven/com.github.fommil.netlib/netlib-native_system-linux-i686/pom.properties netlib-native_system-linux-armhf.pom netlib-native_system-linux-armhf.so.asc 
netlib-native_system-linux-armhf.pom.asc netlib-native_system-linux-armhf.so META-INF/maven/com.github.fommil.netlib/netlib-native_system-linux-armhf/ META-INF/maven/com.github.fommil.netlib/netlib-native_system-linux-armhf/pom.xml META-INF/maven/com.github.fommil.netlib/netlib-native_system-linux-armhf/pom.properties netlib-native_system-win-x86_64.dll netlib-native_system-win-x86_64.dll.asc netlib-native_system-win-x86_64.pom netlib-native_system-win-x86_64.pom.asc META-INF/maven/com.github.fommil.netlib/netlib-native_system-win-x86_64/ META-INF/maven/com.github.fommil.netlib/netlib-native_system-win-x86_64/pom.xml META-INF/maven/com.github.fommil.netlib/netlib-native_system-win-x86_64/pom.properties netlib-native_system-win-i686.dll netlib-native_system-win-i686.dll.asc netlib-native_system-win-i686.pom netlib-native_system-win-i686.pom.asc META-INF/maven/com.github.fommil.netlib/netlib-native_system-win-i686/ META-INF/maven/com.github.fommil.netlib/netlib-native_system-win-i686/pom.xml
[jira] [Resolved] (SPARK-4816) Maven profile netlib-lgpl does not work
[ https://issues.apache.org/jira/browse/SPARK-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-4816. -- Resolution: Fixed Assignee: Sean Owen Fix Version/s: 1.4.2 I see, so it's re-fixed for older (but supported) versions of Maven by a commit already in the branch. Elsewhere, it's a moot point. I guess we can consider it fixed as a better resolution here. > Maven profile netlib-lgpl does not work > --- > > Key: SPARK-4816 > URL: https://issues.apache.org/jira/browse/SPARK-4816 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 1.1.0 > Environment: maven 3.0.5 / Ubuntu >Reporter: Guillaume Pitel >Assignee: Sean Owen >Priority: Minor > Fix For: 1.4.2, 1.1.1 > > > When doing what the documentation recommends to recompile Spark with the Netlib > Native system binding (i.e. to bind with openblas or, in my case, MKL), > mvn -Pnetlib-lgpl -Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0 -DskipTests > clean package > The resulting assembly jar still lacked the netlib-system class. (I checked > the content of spark-assembly...jar) > When forcing the netlib-lgpl profile in the MLLib package to be active, the jar > is correctly built. > So I guess it's a problem with the way Maven passes profile activations to > child modules. > Also, despite the documentation claiming that if the job's jar contains > netlib with the necessary bindings, it should work, it does not. The classloader > must be unhappy with two occurrences of netlib?
[jira] [Comment Edited] (SPARK-2356) Exception: Could not locate executable null\bin\winutils.exe in the Hadoop
[ https://issues.apache.org/jira/browse/SPARK-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055633#comment-15055633 ] Michael Han edited comment on SPARK-2356 at 12/14/15 9:21 AM: -- Hello Everyone, I encountered this issue again today when I tried to create a cluster using two Windows 7 (64-bit) desktops. This error happens when I register the second worker with the master using the following command: spark-class org.apache.spark.deploy.worker.Worker spark://masternode:7077 Strangely, it works fine when I register the first worker with the master. Does anyone know a workaround for this issue? The above workaround works fine when I use local mode. I registered one worker successfully in the cluster, but when I run spark-submit on that worker, it also throws this exception. I tried to set HADOOP_HOME = C:\winutil in the env variables, but it doesn't work. The error is: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties 15/12/14 16:49:22 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 15/12/14 16:49:22 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries. 
at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:355) at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:370) at org.apache.hadoop.util.Shell.(Shell.java:363) at org.apache.hadoop.util.StringUtils.(StringUtils.java:79) at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:104) at org.apache.hadoop.security.Groups.(Groups.java:86) at org.apache.hadoop.security.Groups.(Groups.java:66) at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Group s.java:280) at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupI nformation.java:271) at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(Use rGroupInformation.java:248) at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject( UserGroupInformation.java:763) at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGrou pInformation.java:748) at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGr oupInformation.java:621) at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils .scala:2091) at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils .scala:2091) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2091) at org.apache.spark.SecurityManager.(SecurityManager.scala:212) at org.apache.spark.deploy.worker.Worker$.startRpcEnvAndEndpoint(Worker. 
scala:692) at org.apache.spark.deploy.worker.Worker$.main(Worker.scala:674) at org.apache.spark.deploy.worker.Worker.main(Worker.scala) 15/12/14 16:49:22 INFO SecurityManager: Changing view acls to: mh6 15/12/14 16:49:22 INFO SecurityManager: Changing modify acls to: mh6 15/12/14 16:49:22 INFO SecurityManager: SecurityManager: authentication disabled ; ui acls disabled; users with view permissions: Set(mh6); users with modify per missions: Set(mh6) 15/12/14 16:49:23 INFO Slf4jLogger: Slf4jLogger started 15/12/14 16:49:23 INFO Remoting: Starting remoting 15/12/14 16:49:24 INFO Remoting: Remoting started; listening on addresses :[akka .tcp://sparkWorker@167.3.129.160:46862] 15/12/14 16:49:24 INFO Utils: Successfully started service 'sparkWorker' on port 46862. 15/12/14 16:49:24 INFO Worker: Starting Spark worker 167.3.129.160:46862 with 4 cores, 2.9 GB RAM 15/12/14 16:49:24 INFO Worker: Running Spark version 1.5.2 15/12/14 16:49:24 INFO Worker: Spark home: C:\spark-1.5.2-bin-hadoop2.6\bin\.. 15/12/14 16:49:24 INFO Utils: Successfully started service 'WorkerUI' on port 80 81. 15/12/14 16:49:24 INFO WorkerWebUI: Started WorkerWebUI at http://167.3.129.160: 8081 15/12/14 16:49:24 INFO Worker: Connecting to master 192.168.79.1:7077... 15/12/14 16:49:39 INFO Worker: Retrying connection to master (attempt # 1) 15/12/14 16:49:39 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thr ead Thread[sparkWorker-akka.actor.default-dispatcher-2,5,main] java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.Futur eTask@3ef5e68c rejected from java.util.concurrent.ThreadPoolExecutor@741cb720[Ru nning, pool size = 1, active threads = 1, queued tasks = 0, completed tasks = 0] at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution (ThreadPoolExecutor.java:2047) at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.jav a:823) at
[jira] [Assigned] (SPARK-11882) Allow for running Spark applications against a custom coarse grained scheduler
[ https://issues.apache.org/jira/browse/SPARK-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-11882: Assignee: (was: Apache Spark) > Allow for running Spark applications against a custom coarse grained scheduler > -- > > Key: SPARK-11882 > URL: https://issues.apache.org/jira/browse/SPARK-11882 > Project: Spark > Issue Type: Wish > Components: Spark Core, Spark Submit >Reporter: Jacek Lewandowski >Priority: Minor > > SparkContext decides which scheduler to use according to the Master > URI. How about running applications against a custom scheduler? Such a custom > scheduler would just extend {{CoarseGrainedSchedulerBackend}}. > The custom scheduler would be created by a provided factory. Factories would > be defined in the configuration like > {{spark.scheduler.factory.<name>=<factory class>}}, where {{name}} is the > scheduler name. {{SparkContext}}, once it learns that the master address is not > for standalone, Yarn, Mesos, local or any other predefined scheduler, would > resolve the scheme from the provided master URI and look for the scheduler > factory with the name equal to the resolved scheme. > For example: > {{spark.scheduler.factory.custom=org.a.b.c.CustomSchedulerFactory}} > then the Master address would be {{custom://192.168.1.1}}
[jira] [Commented] (SPARK-11882) Allow for running Spark applications against a custom coarse grained scheduler
[ https://issues.apache.org/jira/browse/SPARK-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055842#comment-15055842 ] Apache Spark commented on SPARK-11882: -- User 'jacek-lewandowski' has created a pull request for this issue: https://github.com/apache/spark/pull/10292 > Allow for running Spark applications against a custom coarse grained scheduler > -- > > Key: SPARK-11882 > URL: https://issues.apache.org/jira/browse/SPARK-11882 > Project: Spark > Issue Type: Wish > Components: Spark Core, Spark Submit >Reporter: Jacek Lewandowski >Priority: Minor > > SparkContext decides which scheduler to use according to the Master > URI. How about running applications against a custom scheduler? Such a custom > scheduler would just extend {{CoarseGrainedSchedulerBackend}}. > The custom scheduler would be created by a provided factory. Factories would > be defined in the configuration like > {{spark.scheduler.factory.<name>=<factory class>}}, where {{name}} is the > scheduler name. {{SparkContext}}, once it learns that the master address is not > for standalone, Yarn, Mesos, local or any other predefined scheduler, would > resolve the scheme from the provided master URI and look for the scheduler > factory with the name equal to the resolved scheme. > For example: > {{spark.scheduler.factory.custom=org.a.b.c.CustomSchedulerFactory}} > then the Master address would be {{custom://192.168.1.1}}
[jira] [Assigned] (SPARK-11882) Allow for running Spark applications against a custom coarse grained scheduler
[ https://issues.apache.org/jira/browse/SPARK-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-11882: Assignee: Apache Spark > Allow for running Spark applications against a custom coarse grained scheduler > -- > > Key: SPARK-11882 > URL: https://issues.apache.org/jira/browse/SPARK-11882 > Project: Spark > Issue Type: Wish > Components: Spark Core, Spark Submit >Reporter: Jacek Lewandowski >Assignee: Apache Spark >Priority: Minor > > SparkContext decides which scheduler to use according to the Master > URI. How about running applications against a custom scheduler? Such a custom > scheduler would just extend {{CoarseGrainedSchedulerBackend}}. > The custom scheduler would be created by a provided factory. Factories would > be defined in the configuration like > {{spark.scheduler.factory.<name>=<factory class>}}, where {{name}} is the > scheduler name. {{SparkContext}}, once it learns that the master address is not > for standalone, Yarn, Mesos, local or any other predefined scheduler, would > resolve the scheme from the provided master URI and look for the scheduler > factory with the name equal to the resolved scheme. > For example: > {{spark.scheduler.factory.custom=org.a.b.c.CustomSchedulerFactory}} > then the Master address would be {{custom://192.168.1.1}}
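[Editor's note] The lookup the ticket proposes (resolve the scheme from the master URI, then find the matching `spark.scheduler.factory.<scheme>` entry) can be sketched in a few lines. This is a hypothetical illustration in Python, not SparkContext's Scala dispatch; `resolve_scheduler_factory` and `BUILT_IN` are invented names:

```python
from urllib.parse import urlparse

# Schemes SparkContext already knows how to dispatch.
BUILT_IN = {"local", "spark", "yarn", "mesos"}

def resolve_scheduler_factory(master_url, conf):
    """For a non-built-in scheme, return the factory class name
    registered under spark.scheduler.factory.<scheme>."""
    scheme = urlparse(master_url).scheme
    if scheme in BUILT_IN:
        return None  # handled by the existing predefined schedulers
    key = "spark.scheduler.factory." + scheme
    if key not in conf:
        raise ValueError(
            "No scheduler factory registered for scheme %r" % scheme)
    return conf[key]
```

With the ticket's example config, a master address of `custom://192.168.1.1` resolves to `org.a.b.c.CustomSchedulerFactory`.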
[jira] [Assigned] (SPARK-12320) throw exception if the number of fields does not line up for Tuple encoder
[ https://issues.apache.org/jira/browse/SPARK-12320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12320: Assignee: (was: Apache Spark) > throw exception if the number of fields does not line up for Tuple encoder > -- > > Key: SPARK-12320 > URL: https://issues.apache.org/jira/browse/SPARK-12320 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Wenchen Fan
[jira] [Commented] (SPARK-12320) throw exception if the number of fields does not line up for Tuple encoder
[ https://issues.apache.org/jira/browse/SPARK-12320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055858#comment-15055858 ] Apache Spark commented on SPARK-12320: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/10293 > throw exception if the number of fields does not line up for Tuple encoder > -- > > Key: SPARK-12320 > URL: https://issues.apache.org/jira/browse/SPARK-12320 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Wenchen Fan
[jira] [Assigned] (SPARK-12320) throw exception if the number of fields does not line up for Tuple encoder
[ https://issues.apache.org/jira/browse/SPARK-12320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12320: Assignee: Apache Spark > throw exception if the number of fields does not line up for Tuple encoder > -- > > Key: SPARK-12320 > URL: https://issues.apache.org/jira/browse/SPARK-12320 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Wenchen Fan >Assignee: Apache Spark
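[Editor's note] The up-front arity check this sub-task asks for amounts to comparing the tuple encoder's field count against the schema and raising a descriptive error early. A minimal Python sketch of that idea, with a hypothetical `validate_tuple_encoder` helper (not the Catalyst code):

```python
def validate_tuple_encoder(encoder_arity, schema_fields):
    """Raise a clear, early error when the tuple's arity does not
    match the number of schema fields, instead of failing later
    with a harder-to-diagnose exception."""
    if encoder_arity != len(schema_fields):
        raise ValueError(
            "Tuple encoder expects %d fields but the schema has %d: %s"
            % (encoder_arity, len(schema_fields),
               ", ".join(schema_fields)))
```

The value of the check is in the message: it names both counts and the schema fields, so the mismatch is obvious at the call site.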
[jira] [Assigned] (SPARK-12323) Don't assign default value for non-nullable columns of a Dataset
[ https://issues.apache.org/jira/browse/SPARK-12323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12323: Assignee: Apache Spark (was: Cheng Lian) > Don't assign default value for non-nullable columns of a Dataset > > > Key: SPARK-12323 > URL: https://issues.apache.org/jira/browse/SPARK-12323 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0, 2.0.0 >Reporter: Cheng Lian >Assignee: Apache Spark > > For a field of a Dataset, if it's specified as non-nullable in the schema of > the Dataset, we shouldn't assign default value for it if input data contain > null. Instead, a runtime exception with nice error message should be thrown, > and ask the user to use {{Option}} or nullable types (e.g., > {{java.lang.Integer}} instead of {{scala.Int}}).
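[Editor's note] The behavior the ticket wants, reject nulls in non-nullable columns rather than silently substituting a default, can be sketched outside Spark. This is a hypothetical Python illustration; `encode_row` and the `(name, nullable)` schema shape are invented for the example:

```python
def encode_row(row, schema):
    """schema: list of (field_name, nullable) pairs.
    Raise instead of substituting a default when a non-nullable
    field receives null, as the ticket proposes."""
    for (name, nullable), value in zip(schema, row):
        if value is None and not nullable:
            raise ValueError(
                "Null value found in non-nullable field %r; use Option "
                "or a nullable type (e.g. java.lang.Integer instead of "
                "scala.Int)." % name)
    return tuple(row)
```

Nulls in nullable fields pass through untouched; only a null in a non-nullable field triggers the error, and the message tells the user how to fix their schema.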
[jira] [Resolved] (SPARK-12016) word2vec load model can't use findSynonyms to get words
[ https://issues.apache.org/jira/browse/SPARK-12016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu resolved SPARK-12016. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 10100 [https://github.com/apache/spark/pull/10100] > word2vec load model can't use findSynonyms to get words > > > Key: SPARK-12016 > URL: https://issues.apache.org/jira/browse/SPARK-12016 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 1.5.2 > Environment: ubuntu 14.04 >Reporter: yuangang.liu > Fix For: 2.0.0 > > > I use word2vec.fit to train a word2vecModel and then save the model to the file > system. When I load the model from the file system, I find I can use > transform('a') to get a vector, but I can't use findSynonyms('a', 2) to get > any words. > I use the following code to test word2vec: > from pyspark import SparkContext > from pyspark.mllib.feature import Word2Vec, Word2VecModel > import os, tempfile > from shutil import rmtree > if __name__ == '__main__': > sc = SparkContext('local', 'test') > sentence = "a b " * 100 + "a c " * 10 > localDoc = [sentence, sentence] > doc = sc.parallelize(localDoc).map(lambda line: line.split(" ")) > model = Word2Vec().setVectorSize(10).setSeed(42).fit(doc) > syms = model.findSynonyms("a", 2) > print [s[0] for s in syms] > path = tempfile.mkdtemp() > model.save(sc, path) > sameModel = Word2VecModel.load(sc, path) > print model.transform("a") == sameModel.transform("a") > syms = sameModel.findSynonyms("a", 2) > print [s[0] for s in syms] > try: > rmtree(path) > except OSError: > pass > I got "[u'b', u'c']" from the first print, > then "True" and "[u'__class__']" > I don't know how to get 'b' or 'c' with sameModel.findSynonyms("a", 2)
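[Editor's note] What `findSynonyms` should still be able to do after a save/load round trip is a cosine-similarity nearest-neighbor lookup over the word vectors. A minimal self-contained sketch of that operation (plain Python over a `{word: vector}` dict, not the MLlib implementation):

```python
import math

def find_synonyms(word, num, vectors):
    """Return the `num` words whose vectors are most cosine-similar
    to `word`'s vector, as (word, similarity) pairs."""
    target = vectors[word]

    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)

    scored = [(w, cos(target, v)) for w, v in vectors.items() if w != word]
    scored.sort(key=lambda p: -p[1])
    return scored[:num]
```

The bug report boils down to the loaded model no longer having a usable vocabulary/vector table for this lookup, which is why `findSynonyms` returned `[u'__class__']` instead of real words.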
[jira] [Commented] (SPARK-4816) Maven profile netlib-lgpl does not work
[ https://issues.apache.org/jira/browse/SPARK-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056414#comment-15056414 ] RJ Nowling commented on SPARK-4816: --- Happy to try Maven 3.3.x and report back. Would certainly confirm whether it's a Maven bug or a regression in behavior. > Maven profile netlib-lgpl does not work > --- > > Key: SPARK-4816 > URL: https://issues.apache.org/jira/browse/SPARK-4816 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 1.1.0 > Environment: maven 3.0.5 / Ubuntu >Reporter: Guillaume Pitel >Priority: Minor > Fix For: 1.1.1 > > > When doing what the documentation recommends to recompile Spark with the Netlib > Native system binding (i.e. to bind with openblas or, in my case, MKL), > mvn -Pnetlib-lgpl -Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0 -DskipTests > clean package > The resulting assembly jar still lacked the netlib-system class. (I checked > the content of spark-assembly...jar) > When forcing the netlib-lgpl profile in the MLLib package to be active, the jar > is correctly built. > So I guess it's a problem with the way Maven passes profile activations to > child modules. > Also, despite the documentation claiming that if the job's jar contains > netlib with the necessary bindings, it should work, it does not. The classloader > must be unhappy with two occurrences of netlib?
[jira] [Assigned] (SPARK-12271) Improve error message for Dataset.as[] when the schema is incompatible.
[ https://issues.apache.org/jira/browse/SPARK-12271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12271: Assignee: Apache Spark > Improve error message for Dataset.as[] when the schema is incompatible. > --- > > Key: SPARK-12271 > URL: https://issues.apache.org/jira/browse/SPARK-12271 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Nong Li >Assignee: Apache Spark > > It currently fails with an unexecutable exception.
[jira] [Closed] (SPARK-11255) R Test build should run on R 3.1.1
[ https://issues.apache.org/jira/browse/SPARK-11255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shane knapp closed SPARK-11255. --- > R Test build should run on R 3.1.1 > -- > > Key: SPARK-11255 > URL: https://issues.apache.org/jira/browse/SPARK-11255 > Project: Spark > Issue Type: Bug > Components: SparkR >Reporter: Felix Cheung >Assignee: shane knapp >Priority: Minor > > Tests should run on R 3.1.1, which is the version listed as supported. > Apparently there are a few R changes that can go undetected since the Jenkins test > build is running something newer. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-11255) R Test build should run on R 3.1.1
[ https://issues.apache.org/jira/browse/SPARK-11255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shane knapp resolved SPARK-11255. - Resolution: Fixed. This is done. > R Test build should run on R 3.1.1 > -- -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-12271) Improve error message for Dataset.as[] when the schema is incompatible.
[ https://issues.apache.org/jira/browse/SPARK-12271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12271: Assignee: (was: Apache Spark) > Improve error message for Dataset.as[] when the schema is incompatible. > --- -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12296) Feature parity for pyspark.mllib StandardScalerModel
[ https://issues.apache.org/jira/browse/SPARK-12296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056576#comment-15056576 ] holdenk commented on SPARK-12296: - I can take this one :) > Feature parity for pyspark.mllib StandardScalerModel > > > Key: SPARK-12296 > URL: https://issues.apache.org/jira/browse/SPARK-12296 > Project: Spark > Issue Type: Sub-task > Components: MLlib, PySpark >Reporter: Joseph K. Bradley >Priority: Minor > > Some methods are missing, such as ways to access the std, mean, etc. This > JIRA is for feature parity for pyspark.mllib.feature.StandardScaler & > StandardScalerModel -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12304) Make Spark Streaming web UI display more friendly Receiver graphs
[ https://issues.apache.org/jira/browse/SPARK-12304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056422#comment-15056422 ] Apache Spark commented on SPARK-12304: -- User 'proflin' has created a pull request for this issue: https://github.com/apache/spark/pull/10276 > Make Spark Streaming web UI display more friendly Receiver graphs > - > > Key: SPARK-12304 > URL: https://issues.apache.org/jira/browse/SPARK-12304 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.5.2 >Reporter: Liwei Lin >Priority: Minor > Attachments: after-5.png, before-5.png > > > Currently, the Spark Streaming web UI uses the same maxY when displaying the 'Input > Rate Times & Histograms' and 'Per-Receiver Times & Histograms' graphs. > This may lead to somewhat unfriendly graphs: once we have tens of Receivers > or more, every 'Per-Receiver Times' line almost hits the ground. > This issue proposes to calculate a new maxY against the original one, which > is shared among all the 'Per-Receiver Times & Histograms' graphs. > Before: > !before-5.png! > After: > !after-5.png! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12317) Support configurate value with unit(e.g. kb/mb/gb) in SQL
[ https://issues.apache.org/jira/browse/SPARK-12317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056471#comment-15056471 ] Bo Meng commented on SPARK-12317: - Good point. We can follow the JVM convention for memory configuration: [g|G|m|M|k|K] > Support configurate value with unit(e.g. kb/mb/gb) in SQL > - > > Key: SPARK-12317 > URL: https://issues.apache.org/jira/browse/SPARK-12317 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.5.2 >Reporter: Yadong Qi >Priority: Minor > > e.g. `spark.sql.autoBroadcastJoinThreshold` should be configurable as `10MB` > instead of `10485760`, because `10MB` is easier to read than `10485760`. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-12323) Don't assign default value for non-nullable columns of a Dataset
[ https://issues.apache.org/jira/browse/SPARK-12323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12323: Assignee: Cheng Lian (was: Apache Spark) > Don't assign default value for non-nullable columns of a Dataset > > > Key: SPARK-12323 > URL: https://issues.apache.org/jira/browse/SPARK-12323 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0, 2.0.0 >Reporter: Cheng Lian >Assignee: Cheng Lian > > For a field of a Dataset, if it's specified as non-nullable in the schema of > the Dataset, we shouldn't assign default value for it if input data contain > null. Instead, a runtime exception with nice error message should be thrown, > and ask the user to use {{Option}} or nullable types (e.g., > {{java.lang.Integer}} instead of {{scala.Int}}). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
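The distinction the issue draws can be sketched in plain Java. This is an illustrative example only, not Spark's actual encoder code: a boxed type (like `java.lang.Integer`, or Scala's `Option[Int]`) can carry a null input, while a primitive-backed non-nullable field cannot, so the decoder should fail with a descriptive message rather than silently substitute a default. The names `decodeNullable`/`decodeNonNullable` are hypothetical.

```java
// Illustrative sketch, not Spark's encoder implementation. It shows why a
// non-nullable (primitive) field cannot represent a null input value: the
// decoder must either invent a default or fail loudly, and this issue argues
// for failing loudly with a pointer to the nullable alternative.
public class NullableFieldDemo {
    // Boxed Integer (analogous to java.lang.Integer / scala.Option[Int]) can hold null.
    static Integer decodeNullable(String raw) {
        return raw == null ? null : Integer.valueOf(raw);
    }

    // Primitive int (analogous to scala.Int) cannot hold null; throw a
    // descriptive runtime error instead of silently substituting 0.
    static int decodeNonNullable(String raw) {
        if (raw == null) {
            throw new RuntimeException(
                "Null value found for non-nullable field; "
                + "use java.lang.Integer (or scala.Option) instead of int");
        }
        return Integer.parseInt(raw);
    }

    public static void main(String[] args) {
        System.out.println(decodeNullable(null));  // null is representable here
        try {
            decodeNonNullable(null);
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());    // clear diagnostic, no silent default
        }
    }
}
```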
[jira] [Commented] (SPARK-4816) Maven profile netlib-lgpl does not work
[ https://issues.apache.org/jira/browse/SPARK-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056409#comment-15056409 ] Sean Owen commented on SPARK-4816: -- I'm on Maven 3.3.x. I wonder if that could be a difference -- can you try 3.3.x just to check? If you're correct, this is already fixed for the next 1.4, which should be 1.4.2. I don't know if/when that will be released though. (I also don't know why the branch shows 1.4.3-SNAPSHOT.) It's as fixed as it would be for this branch though. But then yes, it would be listed as fixed as part of any release notes, automatically. I think finding a relevant JIRA may be as good as it gets in the general case for finding out whether something's already known as an issue and fixed. This one ought to be easy to find by keyword. Of course -- if there is a problem -- just having it work in later releases is even better. I'm not aware of any additional fix that needs to be made though. As I say, I can't even reproduce it. > Maven profile netlib-lgpl does not work > --- -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5506) java.lang.ClassCastException using lambda expressions in combination of spark and Servlet
[ https://issues.apache.org/jira/browse/SPARK-5506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056413#comment-15056413 ] Pavan Achanta commented on SPARK-5506: -- I get the same exception while running a job from the IntelliJ IDE.
{code}
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class App {
    public static void main(String[] args) {
        String logFile = "/usr/local/spark-1.5.2/README.md"; // Should be some file on your system
        SparkConf conf = new SparkConf().setAppName("Simple Application")
            .set("spark.eventLog.enabled", "true")
            .set("spark.eventLog.dir", "/opt/logs/")
            //.setMaster("local")
            .setMaster("spark://localhost:7077");
        JavaSparkContext sc = new JavaSparkContext(conf);
        JavaRDD<String> logData = sc.textFile(logFile).cache();
        long numAs = logData.filter(s -> s.contains("a")).count();
        long numBs = logData.filter(s -> s.contains("b")).count();
        System.out.println("Lines with a: " + numAs + ", lines with b: " + numBs);
    }
}
{code}
The exception I see is as follows:
{code}
15/12/13 23:47:58 INFO SparkDeploySchedulerBackend: Registered executor: AkkaRpcEndpointRef(Actor[akka.tcp://sparkExecutor@127.0.0.1:50873/user/Executor#-484673147]) with ID 0
15/12/13 23:47:59 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 127.0.0.1, PROCESS_LOCAL, 2146 bytes)
15/12/13 23:47:59 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, 127.0.0.1, PROCESS_LOCAL, 2146 bytes)
15/12/13 23:47:59 INFO BlockManagerMasterEndpoint: Registering block manager 127.0.0.1:50877 with 530.0 MB RAM, BlockManagerId(0, 127.0.0.1, 50877)
15/12/13 23:48:00 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 127.0.0.1:50877 (size: 2.2 KB, free: 530.0 MB)
15/12/13 23:48:01 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, 127.0.0.1): java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field 
org.apache.spark.api.java.JavaRDD$$anonfun$filter$1.f$1 of type org.apache.spark.api.java.function.Function in instance of org.apache.spark.api.java.JavaRDD$$anonfun$filter$1 at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133) at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1305) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2006) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371) at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72) at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:98) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61) at org.apache.spark.scheduler.Task.run(Task.scala:88) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 15/12/13 23:48:01 INFO TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0) on executor 127.0.0.1: java.lang.ClassCastException (cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.api.java.JavaRDD$$anonfun$filter$1.f$1 of type org.apache.spark.api.java.function.Function in instance of org.apache.spark.api.java.JavaRDD$$anonfun$filter$1) [duplicate 1] 15/12/13 23:48:01 INFO
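The mechanics behind the stack trace above can be shown without Spark at all. A Java lambda is serialized as a `java.lang.invoke.SerializedLambda` and is re-materialized by calling back into the class that defined it, so deserialization only succeeds when the defining class is on the receiving classpath. On an executor that never received the application jar (a common situation when running straight from an IDE against a standalone master, without `spark-submit` or setting the jars on the `SparkConf`), that step fails with exactly this `ClassCastException`. The self-contained sketch below round-trips a serializable lambda inside one JVM, where the defining class is of course present; the interface name `SerPredicate` is made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Minimal demonstration of how a lambda travels over the wire: it must target
// a Serializable functional interface, and deserializing it requires the
// *defining class* on the classpath. Inside one JVM this round-trip succeeds;
// on a remote executor without the application jar, the deserialization step
// fails with the ClassCastException quoted in the comment above.
public class LambdaSerDemo {
    public interface SerPredicate extends Serializable {
        boolean test(String s);
    }

    public static SerPredicate roundTrip(SerPredicate p) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(p);  // written as java.lang.invoke.SerializedLambda
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (SerPredicate) in.readObject();  // needs the defining class present
        }
    }

    public static void main(String[] args) throws Exception {
        SerPredicate containsA = s -> s.contains("a");
        SerPredicate revived = roundTrip(containsA);
        System.out.println(revived.test("spark"));  // true
    }
}
```

The practical consequence, hedged as a suggestion rather than a confirmed fix for this report: make sure the jar containing the lambda-defining classes is shipped to the executors (for example via `spark-submit`, or by listing it in the jars passed to the `SparkConf`), or replace the lambdas with named classes packaged in that jar.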
[jira] [Closed] (SPARK-12282) Document spark.jars
[ https://issues.apache.org/jira/browse/SPARK-12282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Justin Bailey closed SPARK-12282. - > Document spark.jars > --- > > Key: SPARK-12282 > URL: https://issues.apache.org/jira/browse/SPARK-12282 > Project: Spark > Issue Type: Documentation > Components: Documentation >Reporter: Justin Bailey >Priority: Trivial > > The spark.jars property (as implemented in SparkSubmit.scala, > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L516) > is not documented anywhere, and should be. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11255) R Test build should run on R 3.1.1
[ https://issues.apache.org/jira/browse/SPARK-11255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056461#comment-15056461 ] shane knapp commented on SPARK-11255: - this is happening now. i forgot about it last week... :) > R Test build should run on R 3.1.1 > -- -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4816) Maven profile netlib-lgpl does not work
[ https://issues.apache.org/jira/browse/SPARK-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056368#comment-15056368 ] RJ Nowling commented on SPARK-4816: --- I want to push for two things (a) some sort of documentation for users (e.g., release notes in the next releases) and (b) make sure it's fixed in the latest releases. I want users to be able to find documentation (like this JIRA) so they don't have to spend time tracking it down like I did. Spark 1.4.2 hasn't been released yet and git has moved to a 1.4.3 SNAPSHOT. You mention adding the commit to the 1.5.x branch in the commit -- has this been done? Until 1.4.3 and a 1.5.x release are out with your change, this could still hit certain users, even if it's rare because it's tied to a specific Maven version or such. > Maven profile netlib-lgpl does not work > --- 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
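For readers hitting the same build problem, the report and workaround above can be condensed into a few build commands. These are illustrative: the assembly jar path varies by Spark version and profiles, and the module-level invocation is one hypothetical way to force the profile on the MLlib module rather than the exact steps the reporter used.

```
# Build with the netlib-lgpl profile from the top level, as documented:
mvn -Pnetlib-lgpl -Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0 -DskipTests clean package

# Verify the native-BLAS bridge classes actually made it into the assembly jar
# (an empty result reproduces the bug; jar path is version-dependent):
jar tf assembly/target/scala-*/spark-assembly-*.jar | grep -i netlib

# Workaround along the lines reported above: activate the profile when building
# the mllib module directly, together with its required dependencies:
mvn -Pnetlib-lgpl -pl mllib -am -DskipTests package
```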
[jira] [Commented] (SPARK-4816) Maven profile netlib-lgpl does not work
[ https://issues.apache.org/jira/browse/SPARK-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056375#comment-15056375 ] RJ Nowling commented on SPARK-4816: --- Also, what version of Maven are you running? > Maven profile netlib-lgpl does not work > --- -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12317) Support configurate value with unit(e.g. kb/mb/gb) in SQL
[ https://issues.apache.org/jira/browse/SPARK-12317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056446#comment-15056446 ] Bo Meng commented on SPARK-12317: - If we want to go that route, my suggestion is to support a Double as the number plus the unit, for example 1.5GB; that will make the configuration more general. > Support configurate value with unit(e.g. kb/mb/gb) in SQL > - -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-12282) Document spark.jars
[ https://issues.apache.org/jira/browse/SPARK-12282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Justin Bailey resolved SPARK-12282. --- Resolution: Not A Problem > Document spark.jars > --- -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12317) Support configurate value with unit(e.g. kb/mb/gb) in SQL
[ https://issues.apache.org/jira/browse/SPARK-12317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056448#comment-15056448 ] Sean Owen commented on SPARK-12317: --- I like the idea, though I am so used to the JVM's version of this, which doesn't allow fractional values. > Support configurate value with unit(e.g. kb/mb/gb) in SQL > - -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
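The convention discussed in this thread can be sketched in a few lines: JVM-style suffixes [kKmMgG] with binary multiples, plus the fractional extension suggested above so `1.5GB` works too. The class and method names here are hypothetical, and this is not Spark's actual configuration-parsing code; it only illustrates the proposal.

```java
// Hypothetical helper illustrating the suggestion in this thread: accept
// JVM-style suffixes [kKmMgG] (binary multiples), optionally followed by "b"/"B",
// and a possibly fractional number, so "10MB" or "1.5GB" can replace raw byte
// counts like 10485760. Illustrative only, not Spark's implementation.
public class SizeParser {
    public static long parseBytes(String s) {
        String v = s.trim().toLowerCase();
        long multiplier = 1L;
        if (v.endsWith("b")) v = v.substring(0, v.length() - 1);  // allow "10MB" or "10M"
        if (v.endsWith("k"))      { multiplier = 1L << 10; v = v.substring(0, v.length() - 1); }
        else if (v.endsWith("m")) { multiplier = 1L << 20; v = v.substring(0, v.length() - 1); }
        else if (v.endsWith("g")) { multiplier = 1L << 30; v = v.substring(0, v.length() - 1); }
        double number = Double.parseDouble(v.trim());  // fractional values allowed
        return (long) (number * multiplier);
    }

    public static void main(String[] args) {
        System.out.println(parseBytes("10MB"));   // 10485760
        System.out.println(parseBytes("1.5GB"));  // 1610612736
    }
}
```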
[jira] [Updated] (SPARK-12325) Inappropriate error messages in DataFrame StatFunctions
[ https://issues.apache.org/jira/browse/SPARK-12325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Narine Kokhlikyan updated SPARK-12325: -- Description: Hi there, I have mentioned this issue earlier in one of my pull requests for the SQL component, but I've never received feedback on any of them: https://github.com/apache/spark/pull/9366#issuecomment-155171975 Although this has been very frustrating, I'll try to list certain facts again: 1. I call the DataFrame correlation method and it says that the covariance is wrong. I do not think that this is an appropriate message to show here. scala> df.stat.corr("rating", "income") java.lang.IllegalArgumentException: requirement failed: Covariance calculation for columns with dataType StringType not supported. at scala.Predef$.require(Predef.scala:233) at org.apache.spark.sql.execution.stat.StatFunctions$$anonfun$collectStatisticalData$3.apply(StatFunctions.scala:81) 2. The biggest issue here is not the message shown, but the design. A class called CovarianceCounter does the computations for both correlation and covariance. This might be convenient from a certain perspective, but something like this is harder to understand and extend, especially if you want to add another algorithm, e.g. Spearman correlation, or something else. There are many possible solutions here, starting from: 1. just fixing the message 2. fixing the message and renaming CovarianceCounter and the corresponding methods 3. creating a CorrelationCounter and splitting the computations for correlation and covariance and many more. Since I'm not getting any response, and according to GitHub all five of you have been working on this, I'll try again: [~brkyvz], [~rxin], [~davies], [~viirya], [~cloud_fan] Can any of you, please, explain this behavior with the stat functions or communicate more about it? In case you are planning to remove it or something else, we'd truly appreciate it if you would communicate that. In fact, I would like to do a pull request on this, but since my pull requests in the SQL/ML components are just sitting there without any response, I'll wait for your response first. cc: [~shivaram], [~mengxr] Thank you, Narine > Inappropriate error messages in DataFrame StatFunctions > > > Key: SPARK-12325 > URL: https://issues.apache.org/jira/browse/SPARK-12325 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Narine Kokhlikyan >Priority: Critical
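The coupling the reporter describes follows from the math: Pearson correlation is covariance normalized by the two standard deviations, corr(x, y) = cov(x, y) / (stddev(x) * stddev(y)), which is why a single single-pass accumulator can back both statistics. The sketch below is illustrative only and is not Spark's actual `CovarianceCounter`; the class name `CoMoments` is made up.

```java
// Illustrative single-pass accumulator, not Spark's CovarianceCounter. It
// shows why one class can back both stat.cov and stat.corr, since
// corr(x, y) = cov(x, y) / (stddev(x) * stddev(y)).
public class CoMoments {
    private long n = 0;
    private double sumX = 0, sumY = 0, sumXX = 0, sumYY = 0, sumXY = 0;

    public void add(double x, double y) {
        n++;
        sumX += x; sumY += y;
        sumXX += x * x; sumYY += y * y; sumXY += x * y;
    }

    // Sample covariance (n - 1 in the denominator).
    public double covariance() {
        return (sumXY - sumX * sumY / n) / (n - 1);
    }

    // Pearson correlation, reusing the covariance plus the two variances.
    public double correlation() {
        double varX = (sumXX - sumX * sumX / n) / (n - 1);
        double varY = (sumYY - sumY * sumY / n) / (n - 1);
        return covariance() / Math.sqrt(varX * varY);
    }

    public static void main(String[] args) {
        CoMoments c = new CoMoments();
        double[][] pts = {{1, 2}, {2, 4}, {3, 6}, {4, 8}};  // y = 2x, perfectly linear
        for (double[] p : pts) c.add(p[0], p[1]);
        System.out.println(c.correlation());  // 1.0 (up to rounding)
    }
}
```

Splitting the two statistics into separate counters, as the reporter suggests, mainly costs a little duplicated accumulation logic; keeping them together saves a pass but couples the error messages, which is exactly the symptom reported.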
[jira] [Assigned] (SPARK-12302) Example for servlet filter used by spark.ui.filters
[ https://issues.apache.org/jira/browse/SPARK-12302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12302: Assignee: Apache Spark > Example for servlet filter used by spark.ui.filters > --- > > Key: SPARK-12302 > URL: https://issues.apache.org/jira/browse/SPARK-12302 > Project: Spark > Issue Type: Improvement > Components: Examples >Affects Versions: 1.5.2 >Reporter: Kai Sasaki >Assignee: Apache Spark >Priority: Trivial > Labels: examples, security > > Although the {{spark.ui.filters}} configuration uses a simple servlet filter, it is > often difficult to understand how to write the filter code and how to integrate it > with actual Spark applications. > It would be helpful to provide examples for trying out a secure Spark cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-12302) Example for servlet filter used by spark.ui.filters
[ https://issues.apache.org/jira/browse/SPARK-12302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12302: Assignee: (was: Apache Spark) > Example for servlet filter used by spark.ui.filters > --- -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12270) JDBC Where clause comparison doesn't work for DB2 char(n)
[ https://issues.apache.org/jira/browse/SPARK-12270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056870#comment-15056870 ] Apache Spark commented on SPARK-12270: -- User 'huaxingao' has created a pull request for this issue: https://github.com/apache/spark/pull/10262 > JDBC Where clause comparison doesn't work for DB2 char(n) > -- > > Key: SPARK-12270 > URL: https://issues.apache.org/jira/browse/SPARK-12270 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: Huaxin Gao >Priority: Minor > > I am doing some Spark JDBC tests against DB2. My test is like this: > {code}
> conn.prepareStatement("create table people (name char(32))").executeUpdate()
> conn.prepareStatement("insert into people values ('fred')").executeUpdate()
> sql(
>   s"""
>     |CREATE TEMPORARY TABLE foobar
>     |USING org.apache.spark.sql.jdbc
>     |OPTIONS (url '$url', dbtable 'PEOPLE', user 'testuser', password 'testpassword')
>   """.stripMargin.replaceAll("\n", " "))
> val df = sqlContext.sql("SELECT * FROM foobar WHERE NAME = 'fred'")
> {code}
> I am expecting to see one row with content 'fred' in df. However, no row is > returned. If I change the data type to varchar(32) in the create table > DDL, then I can get the row back correctly. The cause of the problem is that > DB2 defines char(n) as a fixed-length character string, > so if I have char(32), when doing "SELECT * FROM foobar WHERE NAME = > 'fred'", DB2 returns 'fred' padded with 28 spaces. Spark treats 'fred' > padded with spaces as not the same as 'fred', so df doesn't have any rows. If > I have varchar(32), DB2 just returns 'fred' for the select statement and df > has the right row. In order to make DB2 char(n) work for Spark, I suggest > changing the Spark code to trim the trailing spaces after getting the data from the > database. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
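The padding behavior and the proposed trim can be shown in isolation. This is a self-contained sketch, not Spark's JDBC read path: `padTo` stands in for what a DB2 CHAR(32) column hands back, and `rtrim` mirrors the suggested fix of stripping only trailing spaces before comparison.

```java
// Sketch of the workaround proposed above: DB2 pads CHAR(n) values with
// trailing spaces ("fred" stored in CHAR(32) comes back as "fred" plus 28
// spaces), so equality against the literal 'fred' fails unless the padding
// is stripped after reading. Illustrative only, not Spark's JDBC code.
public class CharPadding {
    // Simulates what a CHAR(32) column hands back for the value "fred".
    static String padTo(String s, int width) {
        StringBuilder sb = new StringBuilder(s);
        while (sb.length() < width) sb.append(' ');
        return sb.toString();
    }

    // Trim only *trailing* spaces, mirroring SQL CHAR comparison semantics;
    // leading spaces are significant and must be preserved.
    static String rtrim(String s) {
        int end = s.length();
        while (end > 0 && s.charAt(end - 1) == ' ') end--;
        return s.substring(0, end);
    }

    public static void main(String[] args) {
        String fromDb2 = padTo("fred", 32);
        System.out.println(fromDb2.equals("fred"));         // false: the reported bug
        System.out.println(rtrim(fromDb2).equals("fred"));  // true: after trimming
    }
}
```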
[jira] [Created] (SPARK-12325) Inappropriate error messages in DataFrame StatFunctions
Narine Kokhlikyan created SPARK-12325: - Summary: Inappropriate error messages in DataFrame StatFunctions Key: SPARK-12325 URL: https://issues.apache.org/jira/browse/SPARK-12325 Project: Spark Issue Type: Bug Components: SQL Reporter: Narine Kokhlikyan Priority: Critical Hi there, I have mentioned this issue earlier in one of my pull requests for SQL component, but I've never received a feedback in any of them. https://github.com/apache/spark/pull/9366#issuecomment-155171975 Although this has been very frustrating, I'll try to list certain facts again: 1. I call dataframe correlation method and it says that covariance is wrong. I do not think that this is an appropriate message to show here. scala> df.stat.corr("rating", "income") java.lang.IllegalArgumentException: requirement failed: Covariance calculation for columns with dataType StringType not supported. at scala.Predef$.require(Predef.scala:233) at org.apache.spark.sql.execution.stat.StatFunctions$$anonfun$collectStatisticalData$3.apply(StatFunctions.scala:81) 2. The biggest issue here is not the message shown, but the design. A class called CovarianceCounter does the computations both for correlation and covariance. This might be a convenient way from certain perspective, however something like this is harder to understand and extend, especially if you want to add another algorithm e.g. Spearman correlation, or something else. There are many possible solutions here: starting from 1. just fixing the message 2. fixing the message and renaming CovarianceCounter and corresponding methods 3. create CorrelationCounter and splitting the computations for correlation and covariance and many more Since I'm not getting any response and according to github all five of you have been working on this, I'll try again: [~brkyvz], [~rxin], [~davies], [~viirya], [~cloud_fan] Can any of you ,please, explain me such a behavior or communicate more about this. 
In case you are planning to remove it or something else, we'd truly appreciate it if you would communicate that. In fact, I would like to open a pull request for this, but since my pull requests in the SQL/ML components are just sitting there without any response, I'll wait for your response first. cc: [~shivaram], [~mengxr] Thank you, Narine
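The design point in item 2 can be illustrated outside Spark: a single one-pass accumulator can hold the moments that covariance and correlation share, while the two statistics stay behind separately named entry points with statistic-specific error messages. The sketch below is illustrative Python only, not Spark's actual CovarianceCounter.

```python
import math

class MomentAccumulator:
    """One-pass accumulator of the moments shared by covariance and
    correlation (illustrative sketch, not Spark's CovarianceCounter)."""

    def __init__(self):
        self.n = 0
        self.mean_x = self.mean_y = 0.0
        self.m2_x = self.m2_y = 0.0  # sums of squared deviations
        self.c_xy = 0.0              # co-moment: sum((x - mx) * (y - my))

    def add(self, x, y):
        # standard online (Welford-style) update
        self.n += 1
        dx = x - self.mean_x
        self.mean_x += dx / self.n
        self.m2_x += dx * (x - self.mean_x)
        dy = y - self.mean_y
        self.mean_y += dy / self.n
        self.m2_y += dy * (y - self.mean_y)
        self.c_xy += dx * (y - self.mean_y)

    def covariance(self):
        # population covariance
        return self.c_xy / self.n

    def correlation(self):
        denom = math.sqrt(self.m2_x * self.m2_y)
        if denom == 0.0:
            # the error names the statistic the caller actually asked for
            raise ValueError("correlation is undefined for a zero-variance column")
        return self.c_xy / denom
```

With this split, a failing `corr` call reports a correlation error rather than a covariance one, which is essentially options 2/3 from the description.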
[jira] [Updated] (SPARK-12325) Inappropriate error messages in DataFrame StatFunctions
[ https://issues.apache.org/jira/browse/SPARK-12325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Narine Kokhlikyan updated SPARK-12325: -- Description: Hi there, I have mentioned this issue earlier in one of my pull requests for SQL component, but I've never received a feedback in any of them. https://github.com/apache/spark/pull/9366#issuecomment-155171975 Although this has been very frustrating, I'll try to list certain facts again: 1. I call dataframe correlation method and it says that covariance is wrong. I do not think that this is an appropriate message to show here. scala> df.stat.corr("rating", "income") java.lang.IllegalArgumentException: requirement failed: Covariance calculation for columns with dataType StringType not supported. at scala.Predef$.require(Predef.scala:233) at org.apache.spark.sql.execution.stat.StatFunctions$$anonfun$collectStatisticalData$3.apply(StatFunctions.scala:81) 2. The biggest issue here is not the message shown, but the design. A class called CovarianceCounter does the computations both for correlation and covariance. This might be a convenient way from certain perspective, however something like this is harder to understand and extend, especially if you want to add another algorithm e.g. Spearman correlation, or something else. There are many possible solutions here: starting from 1. just fixing the message 2. fixing the message and renaming CovarianceCounter and corresponding methods 3. create CorrelationCounter and splitting the computations for correlation and covariance and many more Since I'm not getting any response and according to github all five of you have been working on this, I'll try again: [~brkyvz], [~rxin], [~davies], [~viirya], [~cloud_fan] Can any of you ,please, explain me such a behavior or communicate more about this ? In case you are planning to remove it or something else, we'd truly appreciate if you communicate. 
In fact, I would like to do a pull request on this, but since my pull requests in SQL/ML components are just staying there without any response, I'll wait for your response first. cc: [~shivaram], [~mengxr] Thank you, Narine was: Hi there, I have mentioned this issue earlier in one of my pull requests for SQL component, but I've never received a feedback in any of them. https://github.com/apache/spark/pull/9366#issuecomment-155171975 Although this has been very frustrating, I'll try to list certain facts again: 1. I call dataframe correlation method and it says that covariance is wrong. I do not think that this is an appropriate message to show here. scala> df.stat.corr("rating", "income") java.lang.IllegalArgumentException: requirement failed: Covariance calculation for columns with dataType StringType not supported. at scala.Predef$.require(Predef.scala:233) at org.apache.spark.sql.execution.stat.StatFunctions$$anonfun$collectStatisticalData$3.apply(StatFunctions.scala:81) 2. The biggest issue here is not the message shown, but the design. A class called CovarianceCounter does the computations both for correlation and covariance. This might be a convenient way from certain perspective, however something like this is harder to understand and extend, especially if you want to add another algorithm e.g. Spearman correlation, or something else. There are many possible solutions here: starting from 1. just fixing the message 2. fixing the message and renaming CovarianceCounter and corresponding methods 3. create CorrelationCounter and splitting the computations for correlation and covariance and many more Since I'm not getting any response and according to github all five of you have been working on this, I'll try again: [~brkyvz], [~rxin], [~davies], [~viirya], [~cloud_fan] Can any of you ,please, explain me such a behavior or communicate more about this. In case you are planning to remove it or something else, we'd truly appreciate if you communicate. 
In fact, I would like to do a pull request on this, but since my pull requests in SQL/ML components are just staying there without any response, I'll wait for your response first. cc: [~shivaram], [~mengxr] Thank you, Narine > Inappropriate error messages in DataFrame StatFunctions > > > Key: SPARK-12325 > URL: https://issues.apache.org/jira/browse/SPARK-12325 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Narine Kokhlikyan >Priority: Critical > > Hi there, > I have mentioned this issue earlier in one of my pull requests for SQL > component, but I've never received a feedback in any of them. > https://github.com/apache/spark/pull/9366#issuecomment-155171975 > Although this has been very frustrating, I'll try to list certain facts again: > 1. I call dataframe correlation
[jira] [Created] (SPARK-12326) Move GBT implementation from spark.mllib to spark.ml
Seth Hendrickson created SPARK-12326: Summary: Move GBT implementation from spark.mllib to spark.ml Key: SPARK-12326 URL: https://issues.apache.org/jira/browse/SPARK-12326 Project: Spark Issue Type: Improvement Components: ML, MLlib Reporter: Seth Hendrickson Several improvements can be made to gradient boosted trees, but are not possible without moving the GBT implementation to spark.ml (e.g. rawPrediction column, feature importance). This Jira is for moving the current GBT implementation to spark.ml, which will have roughly the following steps: 1. Copy the implementation to spark.ml and change spark.ml classes to use that implementation. Current tests will ensure that the implementations learn exactly the same models. 2. Move the decision tree helper classes over to spark.ml (e.g. Impurity, InformationGainStats, ImpurityStats, DTStatsAggregator, etc...). Since eventually all tree implementations will reside in spark.ml, the helper classes should as well. 3. Remove the spark.mllib implementation, and make the spark.mllib APIs wrappers around the spark.ml implementation. The spark.ml tests will again ensure that we do not change any behavior. 4. Move the unit tests to spark.ml, and change the spark.mllib unit tests to verify model equivalence. Steps 2, 3, and 4 should be in separate Jiras.
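Step 3 of the plan above (keeping the spark.mllib entry points as thin wrappers over the spark.ml implementation) is a standard facade move. A minimal, language-agnostic sketch with hypothetical class names, not Spark's real code:

```python
class MlGBT:
    """Stand-in for the new spark.ml implementation (hypothetical)."""

    def fit(self, labeled_points):
        # placeholder "training": just record where the work happened
        return {"impl": "spark.ml", "n": len(labeled_points)}

class MllibGBT:
    """Old-namespace entry point kept as a thin wrapper (step 3 of the plan):
    the public API is unchanged, but all work is delegated, so the
    equivalence tests in step 4 exercise a single implementation."""

    def __init__(self):
        self._delegate = MlGBT()

    def run(self, labeled_points):  # legacy method name preserved
        return self._delegate.fit(labeled_points)
```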
[jira] [Assigned] (SPARK-12296) Feature parity for pyspark.mllib StandardScalerModel
[ https://issues.apache.org/jira/browse/SPARK-12296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12296: Assignee: (was: Apache Spark) > Feature parity for pyspark.mllib StandardScalerModel > > > Key: SPARK-12296 > URL: https://issues.apache.org/jira/browse/SPARK-12296 > Project: Spark > Issue Type: Sub-task > Components: MLlib, PySpark >Reporter: Joseph K. Bradley >Priority: Minor > > Some methods are missing, such as ways to access the std, mean, etc. This > JIRA is for feature parity for pyspark.mllib.feature.StandardScaler & > StandardScalerModel
[jira] [Assigned] (SPARK-12296) Feature parity for pyspark.mllib StandardScalerModel
[ https://issues.apache.org/jira/browse/SPARK-12296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12296: Assignee: Apache Spark > Feature parity for pyspark.mllib StandardScalerModel > > > Key: SPARK-12296 > URL: https://issues.apache.org/jira/browse/SPARK-12296 > Project: Spark > Issue Type: Sub-task > Components: MLlib, PySpark >Reporter: Joseph K. Bradley >Assignee: Apache Spark >Priority: Minor > > Some methods are missing, such as ways to access the std, mean, etc. This > JIRA is for feature parity for pyspark.mllib.feature.StandardScaler & > StandardScalerModel
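For reference, the parity being asked for is simply that the fitted model expose the statistics it learned. A minimal plain-Python sketch of that contract (names are assumptions, not pyspark.mllib's actual API; note that Spark itself uses the sample rather than the population standard deviation):

```python
import math

class StandardScalerModel:
    """Illustrative sketch of the parity gap: the fitted model should
    expose its learned statistics via accessors."""

    def __init__(self, mean, std):
        self.mean = mean  # per-column means
        self.std = std    # per-column standard deviations

    @classmethod
    def fit(cls, rows):
        n = len(rows)
        cols = list(zip(*rows))
        mean = [sum(c) / n for c in cols]
        # population std here for simplicity; Spark uses the sample std,
        # one of the details to check when porting
        std = [math.sqrt(sum((v - m) ** 2 for v in c) / n)
               for c, m in zip(cols, mean)]
        return cls(mean, std)

    def transform(self, row):
        # zero-variance columns map to 0.0 rather than dividing by zero
        return [(v - m) / s if s else 0.0
                for v, m, s in zip(row, self.mean, self.std)]
```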
[jira] [Created] (SPARK-12324) The documentation sidebar does not collapse properly
Timothy Hunter created SPARK-12324: -- Summary: The documentation sidebar does not collapse properly Key: SPARK-12324 URL: https://issues.apache.org/jira/browse/SPARK-12324 Project: Spark Issue Type: Bug Components: Documentation, MLlib Affects Versions: 1.5.2 Reporter: Timothy Hunter When the browser's window is reduced horizontally, the sidebar slides under the main content and does not collapse: - hide the sidebar when the browser's width is not large enough - add a button to show and hide the sidebar
[jira] [Updated] (SPARK-12324) The documentation sidebar does not collapse properly
[ https://issues.apache.org/jira/browse/SPARK-12324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Hunter updated SPARK-12324: --- Attachment: Screen Shot 2015-12-14 at 12.29.57 PM.png > The documentation sidebar does not collapse properly > > > Key: SPARK-12324 > URL: https://issues.apache.org/jira/browse/SPARK-12324 > Project: Spark > Issue Type: Bug > Components: Documentation, MLlib >Affects Versions: 1.5.2 >Reporter: Timothy Hunter > Attachments: Screen Shot 2015-12-14 at 12.29.57 PM.png > > > When the browser's window is reduced horizontally, the sidebar slides under > the main content and does not collapse: > - hide the sidebar when the browser's width is not large enough > - add a button to show and hide the sidebar
[jira] [Commented] (SPARK-4816) Maven profile netlib-lgpl does not work
[ https://issues.apache.org/jira/browse/SPARK-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056675#comment-15056675 ] RJ Nowling commented on SPARK-4816: --- Agreed. Thanks! > Maven profile netlib-lgpl does not work > --- > > Key: SPARK-4816 > URL: https://issues.apache.org/jira/browse/SPARK-4816 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 1.1.0 > Environment: maven 3.0.5 / Ubuntu >Reporter: Guillaume Pitel >Assignee: Sean Owen >Priority: Minor > Fix For: 1.1.1, 1.4.2 > > > When doing what the documentation recommends to recompile Spark with the Netlib > native system binding (i.e. to bind with openblas or, in my case, MKL), > mvn -Pnetlib-lgpl -Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0 -DskipTests > clean package > The resulting assembly jar still lacked the netlib-system class. (I checked > the content of spark-assembly...jar) > When forcing the netlib-lgpl profile in the MLlib package to be active, the jar > is correctly built. > So I guess it's a problem with the way maven passes profile activations to > child modules. > Also, despite the documentation claiming that if the job's jar contains > netlib with the necessary bindings, it should work, it does not. The classloader > must be unhappy with two occurrences of netlib?
[jira] [Commented] (SPARK-12324) The documentation sidebar does not collapse properly
[ https://issues.apache.org/jira/browse/SPARK-12324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056677#comment-15056677 ] Timothy Hunter commented on SPARK-12324: I am creating a PR with a fix. cc [~josephkb] > The documentation sidebar does not collapse properly > > > Key: SPARK-12324 > URL: https://issues.apache.org/jira/browse/SPARK-12324 > Project: Spark > Issue Type: Bug > Components: Documentation, MLlib >Affects Versions: 1.5.2 >Reporter: Timothy Hunter > Attachments: Screen Shot 2015-12-14 at 12.29.57 PM.png > > > When the browser's window is reduced horizontally, the sidebar slides under > the main content and does not collapse: > - hide the sidebar when the browser's width is not large enough > - add a button to show and hide the sidebar
[jira] [Updated] (SPARK-12324) The documentation sidebar does not collapse properly
[ https://issues.apache.org/jira/browse/SPARK-12324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-12324: -- Priority: Minor (was: Major) Component/s: (was: MLlib) > The documentation sidebar does not collapse properly > > > Key: SPARK-12324 > URL: https://issues.apache.org/jira/browse/SPARK-12324 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 1.5.2 >Reporter: Timothy Hunter >Priority: Minor > Attachments: Screen Shot 2015-12-14 at 12.29.57 PM.png > > > When the browser's window is reduced horizontally, the sidebar slides under > the main content and does not collapse: > - hide the sidebar when the browser's width is not large enough > - add a button to show and hide the sidebar
[jira] [Assigned] (SPARK-12324) The documentation sidebar does not collapse properly
[ https://issues.apache.org/jira/browse/SPARK-12324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12324: Assignee: Apache Spark > The documentation sidebar does not collapse properly > > > Key: SPARK-12324 > URL: https://issues.apache.org/jira/browse/SPARK-12324 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 1.5.2 >Reporter: Timothy Hunter >Assignee: Apache Spark >Priority: Minor > Attachments: Screen Shot 2015-12-14 at 12.29.57 PM.png > > > When the browser's window is reduced horizontally, the sidebar slides under > the main content and does not collapse: > - hide the sidebar when the browser's width is not large enough > - add a button to show and hide the sidebar
[jira] [Commented] (SPARK-12324) The documentation sidebar does not collapse properly
[ https://issues.apache.org/jira/browse/SPARK-12324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056703#comment-15056703 ] Apache Spark commented on SPARK-12324: -- User 'thunterdb' has created a pull request for this issue: https://github.com/apache/spark/pull/10297 > The documentation sidebar does not collapse properly > > > Key: SPARK-12324 > URL: https://issues.apache.org/jira/browse/SPARK-12324 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 1.5.2 >Reporter: Timothy Hunter >Priority: Minor > Attachments: Screen Shot 2015-12-14 at 12.29.57 PM.png > > > When the browser's window is reduced horizontally, the sidebar slides under > the main content and does not collapse: > - hide the sidebar when the browser's width is not large enough > - add a button to show and hide the sidebar
[jira] [Commented] (SPARK-12326) Move GBT implementation from spark.mllib to spark.ml
[ https://issues.apache.org/jira/browse/SPARK-12326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056715#comment-15056715 ] Seth Hendrickson commented on SPARK-12326: -- [~josephkb] Could you review the plan above? I couldn't find any other Jira for moving GBTs to ML, and it seems like it would be good to get this done so we can move on to some other improvements that are needed as well. Thanks! > Move GBT implementation from spark.mllib to spark.ml > > > Key: SPARK-12326 > URL: https://issues.apache.org/jira/browse/SPARK-12326 > Project: Spark > Issue Type: Improvement > Components: ML, MLlib >Reporter: Seth Hendrickson > > Several improvements can be made to gradient boosted trees, but are not > possible without moving the GBT implementation to spark.ml (e.g. > rawPrediction column, feature importance). This Jira is for moving the > current GBT implementation to spark.ml, which will have roughly the following > steps: > 1. Copy the implementation to spark.ml and change spark.ml classes to use > that implementation. Current tests will ensure that the implementations learn > exactly the same models. > 2. Move the decision tree helper classes over to spark.ml (e.g. Impurity, > InformationGainStats, ImpurityStats, DTStatsAggregator, etc...). Since > eventually all tree implementations will reside in spark.ml, the helper > classes should as well. > 3. Remove the spark.mllib implementation, and make the spark.mllib APIs > wrappers around the spark.ml implementation. The spark.ml tests will again > ensure that we do not change any behavior. > 4. Move the unit tests to spark.ml, and change the spark.mllib unit tests to > verify model equivalence. > Steps 2, 3, and 4 should be in separate Jiras.
[jira] [Resolved] (SPARK-12275) No plan for BroadcastHint in some condition
[ https://issues.apache.org/jira/browse/SPARK-12275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-12275. --- Resolution: Fixed Fix Version/s: 1.5.3 Target Version/s: 1.5.3, 1.6.1, 2.0.0 (was: 1.5.3) > No plan for BroadcastHint in some condition > --- > > Key: SPARK-12275 > URL: https://issues.apache.org/jira/browse/SPARK-12275 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: yucai >Assignee: yucai > Labels: backport-needed > Fix For: 1.5.3, 1.6.1, 2.0.0 > > > *Summary* > No plan for BroadcastHint is generated in some condition. > *Test Case* > {code} > val df1 = Seq((1, "1"), (2, "2")).toDF("key", "value") > val parquetTempFile = > "%s/SPARK-_%d.parquet".format(System.getProperty("java.io.tmpdir"), > scala.util.Random.nextInt) > df1.write.parquet(parquetTempFile) > val pf1 = sqlContext.read.parquet(parquetTempFile) > #1. df1.join(broadcast(pf1)).count() > #2. broadcast(pf1).count() > {code} > *Result* > It will trigger assertion in QueryPlanner.scala, like below: > {code} > scala> df1.join(broadcast(pf1)).count() > java.lang.AssertionError: assertion failed: No plan for BroadcastHint > +- Relation[key#6,value#7] > ParquetRelation[hdfs://10.1.0.20:8020/tmp/SPARK-_1817830406.parquet] > at scala.Predef$.assert(Predef.scala:179) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54) > at > org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:336) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) > at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59) > at > 
org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54) > {code}
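For readers unfamiliar with the planner, the assertion in the stack trace comes from a strategy loop: every registered strategy is offered the logical node, and if none of them produces a candidate plan the `assert` fires with "No plan for ...". A toy Python reconstruction of that shape (illustrative only, not Spark's Scala code):

```python
class QueryPlanner:
    """Toy version of the failing code path: strategies are tried in order,
    and the assertion fires when none of them yields a plan."""

    def __init__(self, strategies):
        self.strategies = strategies

    def plan(self, node):
        candidates = [p for strategy in self.strategies for p in strategy(node)]
        assert candidates, f"No plan for {node}"
        return candidates[0]

def basic_operators(node):
    # a strategy that, like the buggy code path, skips BroadcastHint
    return [f"Scan({node})"] if node != "BroadcastHint" else []

def broadcast_hint_strategy(node):
    # the fix in spirit: some strategy (or a rule that eliminates the
    # hint earlier) must cover the node
    return ["BroadcastExchange"] if node == "BroadcastHint" else []
```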
[jira] [Commented] (SPARK-12296) Feature parity for pyspark.mllib StandardScalerModel
[ https://issues.apache.org/jira/browse/SPARK-12296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056723#comment-15056723 ] Apache Spark commented on SPARK-12296: -- User 'holdenk' has created a pull request for this issue: https://github.com/apache/spark/pull/10298 > Feature parity for pyspark.mllib StandardScalerModel > > > Key: SPARK-12296 > URL: https://issues.apache.org/jira/browse/SPARK-12296 > Project: Spark > Issue Type: Sub-task > Components: MLlib, PySpark >Reporter: Joseph K. Bradley >Priority: Minor > > Some methods are missing, such as ways to access the std, mean, etc. This > JIRA is for feature parity for pyspark.mllib.feature.StandardScaler & > StandardScalerModel
[jira] [Created] (SPARK-12318) Save mode in SparkR should be error by default
Jeff Zhang created SPARK-12318: -- Summary: Save mode in SparkR should be error by default Key: SPARK-12318 URL: https://issues.apache.org/jira/browse/SPARK-12318 Project: Spark Issue Type: Bug Components: SparkR Affects Versions: 1.5.2 Reporter: Jeff Zhang Priority: Minor The save mode in SparkR should be consistent with that of the Scala API
[jira] [Commented] (SPARK-12318) Save mode in SparkR should be error by default
[ https://issues.apache.org/jira/browse/SPARK-12318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055625#comment-15055625 ] Jeff Zhang commented on SPARK-12318: Working on it. > Save mode in SparkR should be error by default > -- > > Key: SPARK-12318 > URL: https://issues.apache.org/jira/browse/SPARK-12318 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 1.5.2 >Reporter: Jeff Zhang >Priority: Minor > > The save mode in SparkR should be consistent with that of the Scala API
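The requested behavior mirrors the Scala DataFrameWriter, whose default save mode is `error` (fail if the destination already exists). A small Python sketch of that contract — the function and argument names are illustrative, not SparkR's API:

```python
VALID_MODES = ("append", "overwrite", "ignore", "error")

def write_df(path, existing, mode="error"):
    """Sketch of the requested default: fail when the target exists,
    unless the caller explicitly opts into another mode.
    `existing` stands in for the target filesystem's state."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode should be one of {VALID_MODES}, got {mode!r}")
    if path in existing:
        if mode == "error":
            raise FileExistsError(f"path {path} already exists")
        if mode == "ignore":
            return False  # silently skip the write
    existing.add(path)
    return True
```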
[jira] [Commented] (SPARK-9578) Stemmer feature transformer
[ https://issues.apache.org/jira/browse/SPARK-9578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055626#comment-15055626 ] yuhao yang commented on SPARK-9578: --- PR was sent two days ago. I'm not sure why it's not linked here... https://github.com/apache/spark/pull/10272 > Stemmer feature transformer > --- > > Key: SPARK-9578 > URL: https://issues.apache.org/jira/browse/SPARK-9578 > Project: Spark > Issue Type: New Feature > Components: ML >Reporter: Joseph K. Bradley >Priority: Minor > > Transformer mentioned first in [SPARK-5571] based on suggestion from > [~aloknsingh]. Very standard NLP preprocessing task. > From [~aloknsingh]: > {quote} > We have one scala stemmer in scalanlp%chalk > https://github.com/scalanlp/chalk/tree/master/src/main/scala/chalk/text/analyze > which can easily be copied (as it is an Apache-licensed project) and is in Scala too. > I think this will be a better alternative than the Lucene englishAnalyzer or > opennlp. > Note: we already use scalanlp%breeze via the maven dependency, so I think > adding the scalanlp%chalk dependency is also an option. But as you said, we > can copy the code as it is small. > {quote}
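As a sketch of the transformer's input/output contract only — a real implementation would reuse a Porter-style stemmer such as the chalk one mentioned above — here is an intentionally naive suffix stripper in Python:

```python
def naive_stem(word):
    """Extremely naive suffix stripper, only to illustrate what a stemmer
    transformer maps over a token column; NOT a real stemming algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        # guard: keep at least a 3-character stem so short words survive
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def stem_column(tokens):
    """Transformer-style usage: map the stemmer over a token column."""
    return [naive_stem(t) for t in tokens]
```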
[jira] [Commented] (SPARK-10347) Investigate the usage of normalizePath()
[ https://issues.apache.org/jira/browse/SPARK-10347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055629#comment-15055629 ] Sun Rui commented on SPARK-10347: - A possible solution would be to provide a utility function, within which the pseudocode is as follows: {code} if (path does not contain scheme && default hadoop file system scheme is local) { normalizePath(path, mustWork=TRUE) } else { path } {code} The code piece to get the default hadoop file system scheme: {code} hadoopConf <- callJMethod(sc, "hadoopConfiguration") defaultScheme <- callJMethod(hadoopConf, "get", "fs.default.name") {code} > Investigate the usage of normalizePath() > > > Key: SPARK-10347 > URL: https://issues.apache.org/jira/browse/SPARK-10347 > Project: Spark > Issue Type: Bug > Components: SparkR >Reporter: Sun Rui >Priority: Minor > > Currently normalizePath() is used in several places allowing users to specify > paths via the use of tilde expansion, or to normalize a relative path to an > absolute path. However, normalizePath() is used for paths which are actually > expected to be a URI. normalizePath() may display warning messages when it > does not recognize a URI as a local file path. So suppressWarnings() is used > to suppress the possible warnings. > Worse than warnings, calling normalizePath() on a URI may cause an error, because > it may turn a user-specified relative path into an absolute path using the > local current directory, but this may not be correct because the path is > actually relative to the working directory of the default file system instead > of the local file system (depending on the Hadoop configuration of Spark).
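The guard in the pseudocode above translates directly to other languages. A hedged Python equivalent using `urllib.parse`, where `default_scheme` stands in for the value read from `fs.default.name` (function name and default are assumptions for illustration):

```python
import os.path
from urllib.parse import urlparse

def safe_normalize(path, default_scheme="file"):
    """Only normalize when the path carries no URI scheme AND the default
    filesystem is local; otherwise the path is relative to the default
    filesystem's working directory and must be left untouched."""
    if urlparse(path).scheme == "" and default_scheme == "file":
        return os.path.abspath(os.path.expanduser(path))
    return path
```

One caveat worth noting: on Windows, a drive letter like `C:` can be mistaken for a one-letter scheme by naive parsers, so a production version would need an extra check there.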
[jira] [Commented] (SPARK-2356) Exception: Could not locate executable null\bin\winutils.exe in the Hadoop
[ https://issues.apache.org/jira/browse/SPARK-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055633#comment-15055633 ] Michael Han commented on SPARK-2356: Hello everyone, I encountered this issue again today when I tried to create a cluster using two Windows 7 (64-bit) desktops. The error happens when I register the second worker with the master using the following command: spark-class org.apache.spark.deploy.worker.Worker spark://masternode:7077 Strangely, it works fine when I register the first worker with the master. Does anyone know a workaround for this issue? The workaround above works fine when I use local mode. > Exception: Could not locate executable null\bin\winutils.exe in the Hadoop > --- > > Key: SPARK-2356 > URL: https://issues.apache.org/jira/browse/SPARK-2356 > Project: Spark > Issue Type: Bug > Components: Windows >Affects Versions: 1.0.0 >Reporter: Kostiantyn Kudriavtsev >Priority: Critical > > I'm trying to run some transformations on Spark; they work fine on a cluster > (YARN, linux machines). However, when I try to run them on a local machine > (Windows 7) under a unit test, I get errors (I don't use Hadoop; I read the file > from the local filesystem): > {code} > 14/07/02 19:59:31 WARN NativeCodeLoader: Unable to load native-hadoop library > for your platform... using builtin-java classes where applicable > 14/07/02 19:59:31 ERROR Shell: Failed to locate the winutils binary in the > hadoop binary path > java.io.IOException: Could not locate executable null\bin\winutils.exe in the > Hadoop binaries. 
> at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318) > at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333) > at org.apache.hadoop.util.Shell.(Shell.java:326) > at org.apache.hadoop.util.StringUtils.(StringUtils.java:76) > at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93) > at org.apache.hadoop.security.Groups.(Groups.java:77) > at > org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:240) > at > org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:255) > at > org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:283) > at > org.apache.spark.deploy.SparkHadoopUtil.(SparkHadoopUtil.scala:36) > at > org.apache.spark.deploy.SparkHadoopUtil$.(SparkHadoopUtil.scala:109) > at > org.apache.spark.deploy.SparkHadoopUtil$.(SparkHadoopUtil.scala) > at org.apache.spark.SparkContext.(SparkContext.scala:228) > at org.apache.spark.SparkContext.(SparkContext.scala:97) > {code} > This happens because the Hadoop config is initialized each time a Spark > context is created, regardless of whether Hadoop is required or not. > I propose adding a special flag to indicate whether the Hadoop config is required > (or starting this configuration manually)
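The proposed flag aside, the usual workaround is to ensure `HADOOP_HOME` points at a directory containing `bin\winutils.exe` before any SparkContext or worker starts. A small pre-flight check, sketched in Python (the real lookup happens on the JVM side in Hadoop's `Shell` class; the function name here is illustrative):

```python
import os

def check_winutils():
    """Fail fast with an actionable message instead of Hadoop's opaque
    IOException. Only Windows needs winutils; elsewhere this is a no-op."""
    if os.name != "nt":
        return None
    home = os.environ.get("HADOOP_HOME")
    if not home:
        raise RuntimeError(
            "Set HADOOP_HOME (or the hadoop.home.dir JVM system property) "
            "to a directory containing bin\\winutils.exe"
        )
    path = os.path.join(home, "bin", "winutils.exe")
    if not os.path.isfile(path):
        raise RuntimeError(f"winutils.exe not found at {path}")
    return path
```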
[jira] [Updated] (SPARK-12325) Inappropriate error messages in DataFrame StatFunctions
[ https://issues.apache.org/jira/browse/SPARK-12325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Narine Kokhlikyan updated SPARK-12325: -- Affects Version/s: 1.5.2 > Inappropriate error messages in DataFrame StatFunctions > > > Key: SPARK-12325 > URL: https://issues.apache.org/jira/browse/SPARK-12325 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.2 >Reporter: Narine Kokhlikyan >Priority: Critical > > Hi there, > I have mentioned this issue earlier in one of my pull requests for SQL > component, but I've never received a feedback in any of them. > https://github.com/apache/spark/pull/9366#issuecomment-155171975 > Although this has been very frustrating, I'll try to list certain facts again: > 1. I call dataframe correlation method and it says that covariance is wrong. > I do not think that this is an appropriate message to show here. > scala> df.stat.corr("rating", "income") > java.lang.IllegalArgumentException: requirement failed: Covariance > calculation for columns with dataType StringType not supported. > at scala.Predef$.require(Predef.scala:233) > at > org.apache.spark.sql.execution.stat.StatFunctions$$anonfun$collectStatisticalData$3.apply(StatFunctions.scala:81) > 2. The biggest issue here is not the message shown, but the design. > A class called CovarianceCounter does the computations both for correlation > and covariance. This might be a convenient way > from certain perspective, however something like this is harder to understand > and extend, especially if you want to add another algorithm > e.g. Spearman correlation, or something else. > There are many possible solutions here: > starting from > 1. just fixing the message > 2. fixing the message and renaming CovarianceCounter and corresponding > methods > 3. 
creating a CorrelationCounter and splitting the computations for correlation > and covariance > and many more. > Since I'm not getting any response, and according to GitHub all five of you > have been working on this, I'll try again: > [~brkyvz], [~rxin], [~davies], [~viirya], [~cloud_fan] > Can any of you please explain this behavior of the stat functions or > communicate more about it? > In case you are planning to remove it or something else, we'd truly > appreciate it if you would communicate that. > In fact, I would like to open a pull request for this, but since my pull > requests in the SQL/ML components are just sitting there without any response, > I'll wait for your response first. > cc: [~shivaram], [~mengxr] > Thank you, > Narine
[jira] [Commented] (SPARK-12317) Support configuring values with units (e.g. kb/mb/gb) in SQL
[ https://issues.apache.org/jira/browse/SPARK-12317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056948#comment-15056948 ] kevin yu commented on SPARK-12317: -- I talked with Bo; I will work on this PR. Thanks. Kevin > Support configuring values with units (e.g. kb/mb/gb) in SQL > - > > Key: SPARK-12317 > URL: https://issues.apache.org/jira/browse/SPARK-12317 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.5.2 >Reporter: Yadong Qi >Priority: Minor > > e.g. `spark.sql.autoBroadcastJoinThreshold` should be configurable as `10MB` > instead of `10485760`, because `10MB` is easier to read than `10485760`.
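The requested behavior, accepting `10MB` where a raw byte count like `10485760` is expected, amounts to parsing a number plus an optional binary-unit suffix. Spark's network module already ships a similar helper (`JavaUtils.byteStringAsBytes`); the simplified parser below is only an independent sketch of the idea, not the proposed implementation.

```scala
// Hypothetical sketch: parse "10MB"-style byte strings into a byte count.
// Units are binary (1 KB = 1024 B), matching the 10MB -> 10485760 example.
object ByteStringSketch {
  private val units: Map[String, Long] = Map(
    "b"  -> 1L,
    "kb" -> 1024L,
    "mb" -> 1024L * 1024,
    "gb" -> 1024L * 1024 * 1024)

  // Case-insensitive: digits, optional whitespace, optional unit suffix.
  private val pattern = "(?i)([0-9]+)\\s*([a-z]*)".r

  def parseByteString(s: String): Long = s.trim match {
    case pattern(num, unit) =>
      val factor =
        if (unit.isEmpty) 1L // bare number: already a byte count
        else units.getOrElse(unit.toLowerCase,
          throw new IllegalArgumentException(s"Unknown size unit: $unit"))
      num.toLong * factor
    case other =>
      throw new IllegalArgumentException(s"Cannot parse byte string: $other")
  }

  def main(args: Array[String]): Unit = {
    println(parseByteString("10MB"))     // 10485760
    println(parseByteString("10485760")) // 10485760
  }
}
```

Accepting the bare-number form alongside the suffixed form keeps existing configurations like `10485760` working unchanged.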
[jira] [Created] (SPARK-12327) lint-r checks fail with commented code
Shivaram Venkataraman created SPARK-12327: - Summary: lint-r checks fail with commented code Key: SPARK-12327 URL: https://issues.apache.org/jira/browse/SPARK-12327 Project: Spark Issue Type: Bug Components: SparkR Reporter: Shivaram Venkataraman We get this after our R version downgrade {code}
R/RDD.R:183:68: style: Commented code should be removed. rdd@env$jrdd_val <- callJMethod(rddRef, "asJavaRDD") # rddRef$asJavaRDD() ^~
R/RDD.R:228:63: style: Commented code should be removed. #' http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence. ^~~~
R/RDD.R:388:24: style: Commented code should be removed. #' collectAsMap(rdd) # list(`1` = 2, `3` = 4) ^~
R/RDD.R:603:61: style: Commented code should be removed. #' unlist(collect(filterRDD(rdd, function (x) { x < 3 }))) # c(1, 2) ^~~~
R/RDD.R:762:20: style: Commented code should be removed. #' take(rdd, 2L) # list(1, 2) ^~
R/RDD.R:830:42: style: Commented code should be removed. #' sort(unlist(collect(distinct(rdd # c(1, 2, 3) ^~~
R/RDD.R:980:47: style: Commented code should be removed. #' collect(keyBy(rdd, function(x) { x*x })) # list(list(1, 1), list(4, 2), list(9, 3)) ^~~~
R/RDD.R:1194:27: style: Commented code should be removed. #' takeOrdered(rdd, 6L) # list(1, 2, 3, 4, 5, 6) ^~
R/RDD.R:1215:19: style: Commented code should be removed. #' top(rdd, 6L) # list(10, 9, 7, 6, 5, 4) ^~~
R/RDD.R:1270:50: style: Commented code should be removed. #' aggregateRDD(rdd, zeroValue, seqOp, combOp) # list(10, 4) ^~~
R/RDD.R:1374:6: style: Commented code should be removed. #' # list(list("a", 0), list("b", 3), list("c", 1), list("d", 4), list("e", 2)) ^~
R/RDD.R:1415:6: style: Commented code should be removed. #' # list(list("a", 0), list("b", 1), list("c", 2), list("d", 3), list("e", 4)) ^~
R/RDD.R:1461:6: style: Commented code should be removed. #' # list(list(1, 2), list(3, 4)) ^~~~
R/RDD.R:1527:6: style: Commented code should be removed. #' # list(list(0, 1000), list(1, 1001), list(2, 1002), list(3, 1003), list(4, 1004)) ^~~
R/RDD.R:1564:6: style: Commented code should be removed. #' # list(list(1, 1), list(1, 2), list(2, 1), list(2, 2)) ^~~~
R/RDD.R:1595:6: style: Commented code should be removed. #' # list(1, 1, 3) ^
R/RDD.R:1627:6: style: Commented code should be removed. #' # list(1, 2, 3) ^
R/RDD.R:1663:6: style: Commented code should be removed. #' # list(list(1, c(1,2), c(1,2,3)), list(2, c(3,4), c(4,5,6))) ^~
R/deserialize.R:22:3: style: Commented code should be removed. # void -> NULL ^~~~
R/deserialize.R:23:3: style: Commented code should be removed. # Int -> integer ^~
R/deserialize.R:24:3: style: Commented code should be removed. # String -> character ^~~
R/deserialize.R:25:3: style: Commented code should be removed. # Boolean -> logical ^~
R/deserialize.R:26:3: style: Commented code should be removed. # Float -> double ^~~
R/deserialize.R:27:3: style: Commented code should be removed. # Double -> double ^~~~
R/deserialize.R:28:3: style: Commented code should be removed. # Long -> double ^~
R/deserialize.R:29:3: style: Commented code should be removed. # Array[Byte] -> raw ^~
R/deserialize.R:30:3: style: Commented code should be removed. # Date -> Date ^~~~
R/deserialize.R:31:3: style: Commented code should be removed. # Time -> POSIXct ^~~
R/deserialize.R:33:3: style: Commented code should be removed. # Array[T] -> list() ^~
R/deserialize.R:34:3: style: Commented code should be removed. # Object -> jobj ^~
R/pairRDD.R:37:21: style: Commented code should be removed. #' lookup(rdd, 1) # list(1, 3) ^~
R/pairRDD.R:83:25: style: Commented code should be removed. #' collect(keys(rdd)) # list(1, 3)
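Most of these hits are roxygen example-output comments (`#' # list(...)`) and type-mapping notes rather than genuinely dead code, so they look like false positives of lintr's commented-code linter. If the project prefers silencing that linter over rewording every comment, lintr can be configured through a `.lintr` file at the package root. This is a sketch only: the helper name (`with_defaults` in lintr of this era, `linters_with_defaults` in later releases) depends on the installed lintr version.

```
linters: with_defaults(commented_code_linter = NULL)
```

The trade-off is that disabling the linter package-wide also stops it from catching real commented-out code in future changes.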
[jira] [Updated] (SPARK-12232) Create new R API for read.table to avoid conflict
[ https://issues.apache.org/jira/browse/SPARK-12232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung updated SPARK-12232: - Summary: Create new R API for read.table to avoid conflict (was: Consider exporting read.table in R) > Create new R API for read.table to avoid conflict > - > > Key: SPARK-12232 > URL: https://issues.apache.org/jira/browse/SPARK-12232 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 1.5.2 >Reporter: Felix Cheung >Priority: Minor > > Since we have read.df, read.json, and read.parquet (some in pending PRs), as well as > table(), we should consider having read.table() for consistency and > R-likeness. > However, this conflicts with utils::read.table, which returns an R data.frame. > It seems neither table() nor read.table() is desirable in this case. > table: https://stat.ethz.ch/R-manual/R-devel/library/base/html/table.html > read.table: > https://stat.ethz.ch/R-manual/R-devel/library/utils/html/read.table.html