[jira] [Assigned] (SPARK-12530) Build break at Spark-Master-Maven-Snapshots from #1293
[ https://issues.apache.org/jira/browse/SPARK-12530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12530: Assignee: Apache Spark > Build break at Spark-Master-Maven-Snapshots from #1293 > -- > > Key: SPARK-12530 > URL: https://issues.apache.org/jira/browse/SPARK-12530 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 2.0.0 >Reporter: Kazuaki Ishizaki >Assignee: Apache Spark > > Build break happens at Spark-Master-Maven-Snapshots from #1293 due to > compilation error of misc.scala. > {noformat} > /home/jenkins/workspace/Spark-Master-Maven-Snapshots/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala:61: > error: annotation argument needs to be a constant; found: "_FUNC_(input, > bitLength) - Returns a checksum of SHA-2 family as a hex string of the > ".+("input. SHA-224, SHA-256, SHA-384, and SHA-512 are supported. Bit length > of 0 is equivalent ").+("to 256") > "input. SHA-224, SHA-256, SHA-384, and SHA-512 are supported. Bit length > of 0 is equivalent " + > > ^ > {noformat} > https://amplab.cs.berkeley.edu/jenkins/view/Spark-Packaging/job/Spark-Master-Maven-Snapshots/1293/ > https://amplab.cs.berkeley.edu/jenkins/view/Spark-Packaging/job/Spark-Master-Maven-Snapshots/1293/consoleFull > This file is changed by [SPARK-12456] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
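The error above is scalac rejecting string concatenation as the argument of a Java-style annotation: {{"a " + "b"}} desugars to {{"a ".+("b")}}, which is not a compile-time constant in annotation position. A minimal, self-contained sketch of the failing pattern and the fix, using a stand-in ClassfileAnnotation rather than Spark's actual ExpressionDescription (assumption: Scala 2.11 semantics):
{code}
import scala.annotation.ClassfileAnnotation

// Stand-in for the Java ExpressionDescription annotation; like Java annotation
// arguments, ClassfileAnnotation arguments must be compile-time constants.
class usage(value: String) extends ClassfileAnnotation

// Rejected by scalac, as in the build log above: the concatenation desugars to
// "...".+("...") and is not a constant.
// @usage("_FUNC_(input, bitLength) - Returns a checksum of SHA-2 family " +
//        "as a hex string of the input.")
// class Sha2

// Accepted: a single string literal per annotation argument.
@usage("_FUNC_(input, bitLength) - Returns a checksum of SHA-2 family as a hex string of the input. SHA-224, SHA-256, SHA-384, and SHA-512 are supported. Bit length of 0 is equivalent to 256.")
class Sha2
{code}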
[jira] [Assigned] (SPARK-12530) Build break at Spark-Master-Maven-Snapshots from #1293
[ https://issues.apache.org/jira/browse/SPARK-12530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12530: Assignee: (was: Apache Spark) > Build break at Spark-Master-Maven-Snapshots from #1293 > -- > > Key: SPARK-12530 > URL: https://issues.apache.org/jira/browse/SPARK-12530 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 2.0.0 >Reporter: Kazuaki Ishizaki > > Build break happens at Spark-Master-Maven-Snapshots from #1293 due to > compilation error of misc.scala. > {noformat} > /home/jenkins/workspace/Spark-Master-Maven-Snapshots/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala:61: > error: annotation argument needs to be a constant; found: "_FUNC_(input, > bitLength) - Returns a checksum of SHA-2 family as a hex string of the > ".+("input. SHA-224, SHA-256, SHA-384, and SHA-512 are supported. Bit length > of 0 is equivalent ").+("to 256") > "input. SHA-224, SHA-256, SHA-384, and SHA-512 are supported. Bit length > of 0 is equivalent " + > > ^ > {noformat} > https://amplab.cs.berkeley.edu/jenkins/view/Spark-Packaging/job/Spark-Master-Maven-Snapshots/1293/ > https://amplab.cs.berkeley.edu/jenkins/view/Spark-Packaging/job/Spark-Master-Maven-Snapshots/1293/consoleFull > This file is changed by [SPARK-12456] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12530) Build break at Spark-Master-Maven-Snapshots from #1293
[ https://issues.apache.org/jira/browse/SPARK-12530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072103#comment-15072103 ] Apache Spark commented on SPARK-12530: -- User 'kiszk' has created a pull request for this issue: https://github.com/apache/spark/pull/10488 > Build break at Spark-Master-Maven-Snapshots from #1293 > -- > > Key: SPARK-12530 > URL: https://issues.apache.org/jira/browse/SPARK-12530 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 2.0.0 >Reporter: Kazuaki Ishizaki > > Build break happens at Spark-Master-Maven-Snapshots from #1293 due to > compilation error of misc.scala. > {noformat} > /home/jenkins/workspace/Spark-Master-Maven-Snapshots/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala:61: > error: annotation argument needs to be a constant; found: "_FUNC_(input, > bitLength) - Returns a checksum of SHA-2 family as a hex string of the > ".+("input. SHA-224, SHA-256, SHA-384, and SHA-512 are supported. Bit length > of 0 is equivalent ").+("to 256") > "input. SHA-224, SHA-256, SHA-384, and SHA-512 are supported. Bit length > of 0 is equivalent " + > > ^ > {noformat} > https://amplab.cs.berkeley.edu/jenkins/view/Spark-Packaging/job/Spark-Master-Maven-Snapshots/1293/ > https://amplab.cs.berkeley.edu/jenkins/view/Spark-Packaging/job/Spark-Master-Maven-Snapshots/1293/consoleFull > This file is changed by [SPARK-12456] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12461) Add ExpressionDescription to math functions
[ https://issues.apache.org/jira/browse/SPARK-12461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072110#comment-15072110 ] Apache Spark commented on SPARK-12461: -- User 'vectorijk' has created a pull request for this issue: https://github.com/apache/spark/pull/10489 > Add ExpressionDescription to math functions > --- > > Key: SPARK-12461 > URL: https://issues.apache.org/jira/browse/SPARK-12461 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12517) No default RDD name for ones created by sc.textFile
[ https://issues.apache.org/jira/browse/SPARK-12517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072184#comment-15072184 ] yaron weinsberg commented on SPARK-12517: - https://github.com/apache/spark/pull/10456 > No default RDD name for ones created by sc.textFile > > > Key: SPARK-12517 > URL: https://issues.apache.org/jira/browse/SPARK-12517 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.4.0, 1.4.1, 1.5.0, 1.5.2 >Reporter: yaron weinsberg >Priority: Minor > Labels: easyfix > Original Estimate: 24h > Remaining Estimate: 24h > > Having a default name for an RDD created from a file is very handy. > The feature was first added at commit: 7b877b2 but was later removed > (probably by mistake) at commit: fc8b581. > This change sets the default name of RDDs created via sc.textFile(...) to the > path argument. > Here is the symptom: > Using spark-1.5.2-bin-hadoop2.6: > scala> sc.textFile("/home/root/.bashrc").name > res5: String = null > scala> sc.binaryFiles("/home/root/.bashrc").name > res6: String = /home/root/.bashrc > while using Spark 1.3.1: > scala> sc.textFile("/home/root/.bashrc").name > res0: String = /home/root/.bashrc > scala> sc.binaryFiles("/home/root/.bashrc").name > res1: String = /home/root/.bashrc -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
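Until the default is restored, the symptom is easy to work around on the affected versions by naming the RDD yourself. A minimal sketch (standard RDD API; the local file path is just an example):
{code}
import org.apache.spark.{SparkConf, SparkContext}

object DefaultRddName {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("rdd-name").setMaster("local[*]"))
    val path = "/home/root/.bashrc" // any readable text file

    // On 1.4.0-1.5.2 textFile leaves the name null; setName returns the RDD
    // itself, so it can be chained at creation time.
    val rdd = sc.textFile(path).setName(path)
    println(rdd.name) // prints the path, matching the 1.3.1 behaviour
    sc.stop()
  }
}
{code}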
[jira] [Commented] (SPARK-12531) Add median and mode to Summary statistics
[ https://issues.apache.org/jira/browse/SPARK-12531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072196#comment-15072196 ] Sean Owen commented on SPARK-12531: --- Those are non-trivial to compute exactly; unlike moments, they can't be computed from a couple summary statistics. I am not sure this can be added to the generic object, no. Is it valuable enough for an additional implementation? > Add median and mode to Summary statistics > - > > Key: SPARK-12531 > URL: https://issues.apache.org/jira/browse/SPARK-12531 > Project: Spark > Issue Type: Improvement > Components: MLlib >Affects Versions: 1.5.2 >Reporter: Gaurav Kumar >Priority: Minor > > Summary statistics should also include calculating median and mode in > addition to mean, variance and others. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
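To make Sean Owen's point concrete: mean and variance fall out of running sums, but an exact median needs a global sort and an exact mode needs a shuffle over every distinct value. An illustrative sketch only, not a proposed MLlib API:
{code}
import org.apache.spark.rdd.RDD

object ExactStats {
  // Exact median: requires a full sort plus indexed lookups.
  def median(data: RDD[Double]): Double = {
    val indexed = data.sortBy(identity).zipWithIndex().map(_.swap).cache()
    val n = indexed.count()
    if (n % 2 == 1) indexed.lookup(n / 2).head
    else (indexed.lookup(n / 2 - 1).head + indexed.lookup(n / 2).head) / 2.0
  }

  // Exact mode: requires counting every distinct value.
  def mode(data: RDD[Double]): Double =
    data.map(x => (x, 1L)).reduceByKey(_ + _).max()(Ordering.by(_._2))._1
}
{code}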
[jira] [Commented] (SPARK-12521) DataFrame Partitions in java does not work
[ https://issues.apache.org/jira/browse/SPARK-12521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072225#comment-15072225 ] Sean Owen commented on SPARK-12521: --- PS see https://issues.apache.org/jira/browse/SPARK-12515 > DataFrame Partitions in java does not work > -- > > Key: SPARK-12521 > URL: https://issues.apache.org/jira/browse/SPARK-12521 > Project: Spark > Issue Type: Bug > Components: Java API, SQL >Affects Versions: 1.5.2 >Reporter: Sergey Podolsky > > Hello, > Partitioning does not work in the Java interface of the DataFrame: > {code} > SQLContext sqlContext = new SQLContext(sc); > Map<String, String> options = new HashMap<>(); > options.put("driver", ORACLE_DRIVER); > options.put("url", ORACLE_CONNECTION_URL); > options.put("dbtable", > "(SELECT * FROM JOBS WHERE ROWNUM < 1) tt"); > options.put("lowerBound", "2704225000"); > options.put("upperBound", "2704226000"); > options.put("partitionColumn", "ID"); > options.put("numPartitions", "10"); > DataFrame jdbcDF = sqlContext.load("jdbc", options); > List<Row> jobsRows = jdbcDF.collectAsList(); > System.out.println(jobsRows.size()); > {code} > gives while expected 1000. Is it because of the big decimal boundaries, or do > partitions not work at all in Java? > Thanks. > Sergey -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-12518) Problem in Spark deserialization with htsjdk BAMRecordCodec
[ https://issues.apache.org/jira/browse/SPARK-12518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reopened SPARK-12518: --- (Questions should go to the mailing list; this was not a Spark problem and was not resolved by a commit so "Fixed" is not the right resolution) > Problem in Spark deserialization with htsjdk BAMRecordCodec > --- > > Key: SPARK-12518 > URL: https://issues.apache.org/jira/browse/SPARK-12518 > Project: Spark > Issue Type: Question > Components: Java API >Affects Versions: 1.5.2 > Environment: Linux Red Hat 4.8.2-16, Java 8, htsjdk-1.130 >Reporter: Zhanpeng Wu > > When I used [htsjdk|https://github.com/samtools/htsjdk] in my Spark > application, I found a problem in record deserialization. The object of > *SAMRecord* could not be deserialized and threw the exception: > {quote} > WARN ThrowableSerializationWrapper: Task exception could not be deserialized > java.lang.ClassNotFoundException: htsjdk.samtools.util.RuntimeIOException > at java.net.URLClassLoader$1.run(URLClassLoader.java:372) > at java.net.URLClassLoader$1.run(URLClassLoader.java:361) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:360) > at java.lang.ClassLoader.loadClass(ClassLoader.java:424) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) > at java.lang.ClassLoader.loadClass(ClassLoader.java:357) > at java.lang.Class.forName0(Native Method) > at java.lang.Class.forName(Class.java:340) > at > org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67) > at > java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613) > at > java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518) > at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) > at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371) > at > org.apache.spark.ThrowableSerializationWrapper.readObject(TaskEndReason.scala:167) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:483) > at > java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017) > at > java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1896) > at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) > at > java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993) > at > java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918) > at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) > at > java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993) > at > java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918) > at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) > at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371) > at > org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72) > at > org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:98) > at > org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$2.apply$mcV$sp(TaskResultGetter.scala:108) > at > org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$2.apply(TaskResultGetter.scala:105) > at > org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$2.apply(TaskResultGetter.scala:105) > at > org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699) > at > org.apache.spark.scheduler.TaskResultGetter$$anon$3.run(TaskResultGetter.scala:105) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {quote} > It seems that the application encountered a premature EOF when deserializing. > Here is my test code: >
[jira] [Commented] (SPARK-12263) IllegalStateException: Memory can't be 0 for SPARK_WORKER_MEMORY without unit
[ https://issues.apache.org/jira/browse/SPARK-12263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072200#comment-15072200 ] Apache Spark commented on SPARK-12263: -- User 'nssalian' has created a pull request for this issue: https://github.com/apache/spark/pull/10483 > IllegalStateException: Memory can't be 0 for SPARK_WORKER_MEMORY without unit > - > > Key: SPARK-12263 > URL: https://issues.apache.org/jira/browse/SPARK-12263 > Project: Spark > Issue Type: Improvement > Components: Documentation >Reporter: Jacek Laskowski >Priority: Trivial > Labels: starter > > When starting a worker with the following command - note > {{SPARK_WORKER_MEMORY=1024}} it fails saying that the memory was 0 while it > was 1024 (without size unit). > {code} > ➜ spark git:(master) ✗ SPARK_WORKER_MEMORY=1024 SPARK_WORKER_CORES=5 > ./sbin/start-slave.sh spark://localhost:7077 > starting org.apache.spark.deploy.worker.Worker, logging to > /Users/jacek/dev/oss/spark/logs/spark-jacek-org.apache.spark.deploy.worker.Worker-1-japila.local.out > failed to launch org.apache.spark.deploy.worker.Worker: > INFO ShutdownHookManager: Shutdown hook called > INFO ShutdownHookManager: Deleting directory > /private/var/folders/0w/kb0d3rqn4zb9fcc91pxhgn8wgn/T/spark-f4e5f222-e938-46b2-a189-241453cf1f50 > full log in > /Users/jacek/dev/oss/spark/logs/spark-jacek-org.apache.spark.deploy.worker.Worker-1-japila.local.out > {code} > The full stack trace is as follows: > {code} > Setting default log level to "WARN". > To adjust logging level use sc.setLogLevel(newLevel). > INFO Worker: Registered signal handlers for [TERM, HUP, INT] > Exception in thread "main" java.lang.IllegalStateException: Memory can't be > 0, missing a M or G on the end of the memory specification? > at > org.apache.spark.deploy.worker.WorkerArguments.checkWorkerMemory(WorkerArguments.scala:179) > at > org.apache.spark.deploy.worker.WorkerArguments.(WorkerArguments.scala:64) > at org.apache.spark.deploy.worker.Worker$.main(Worker.scala:691) > at org.apache.spark.deploy.worker.Worker.main(Worker.scala) > INFO ShutdownHookManager: Shutdown hook called > INFO ShutdownHookManager: Deleting directory > /private/var/folders/0w/kb0d3rqn4zb9fcc91pxhgn8wgn/T/spark-f4e5f222-e938-46b2-a189-241453cf1f50 > {code} > The following command starts spark standalone worker successfully: > {code} > SPARK_WORKER_MEMORY=1g SPARK_WORKER_CORES=5 ./sbin/start-slave.sh > spark://localhost:7077 > {code} > The master reports: > {code} > INFO Master: Registering worker 192.168.1.6:63884 with 5 cores, 1024.0 MB RAM > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
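For context, the check fails because a memory string without a unit suffix is interpreted as bytes and rounds down to 0 MB. A sketch of the conversion, mirroring what Utils.memoryStringToMb did at the time (assumption: this is the code path the worker's memory check ends up in):
{code}
def memoryStringToMb(str: String): Int = {
  val lower = str.toLowerCase
  if (lower.endsWith("k")) (lower.dropRight(1).toLong / 1024).toInt
  else if (lower.endsWith("m")) lower.dropRight(1).toInt
  else if (lower.endsWith("g")) lower.dropRight(1).toInt * 1024
  else if (lower.endsWith("t")) lower.dropRight(1).toInt * 1024 * 1024
  else (lower.toLong / 1024 / 1024).toInt // no suffix: treated as bytes
}

// memoryStringToMb("1024") == 0    -> "Memory can't be 0" despite a nonzero value
// memoryStringToMb("1g")   == 1024 -> worker registers with 1024.0 MB RAM
{code}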
[jira] [Assigned] (SPARK-12263) IllegalStateException: Memory can't be 0 for SPARK_WORKER_MEMORY without unit
[ https://issues.apache.org/jira/browse/SPARK-12263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12263: Assignee: Apache Spark > IllegalStateException: Memory can't be 0 for SPARK_WORKER_MEMORY without unit > - > > Key: SPARK-12263 > URL: https://issues.apache.org/jira/browse/SPARK-12263 > Project: Spark > Issue Type: Improvement > Components: Documentation >Reporter: Jacek Laskowski >Assignee: Apache Spark >Priority: Trivial > Labels: starter > > When starting a worker with the following command - note > {{SPARK_WORKER_MEMORY=1024}} it fails saying that the memory was 0 while it > was 1024 (without size unit). > {code} > ➜ spark git:(master) ✗ SPARK_WORKER_MEMORY=1024 SPARK_WORKER_CORES=5 > ./sbin/start-slave.sh spark://localhost:7077 > starting org.apache.spark.deploy.worker.Worker, logging to > /Users/jacek/dev/oss/spark/logs/spark-jacek-org.apache.spark.deploy.worker.Worker-1-japila.local.out > failed to launch org.apache.spark.deploy.worker.Worker: > INFO ShutdownHookManager: Shutdown hook called > INFO ShutdownHookManager: Deleting directory > /private/var/folders/0w/kb0d3rqn4zb9fcc91pxhgn8wgn/T/spark-f4e5f222-e938-46b2-a189-241453cf1f50 > full log in > /Users/jacek/dev/oss/spark/logs/spark-jacek-org.apache.spark.deploy.worker.Worker-1-japila.local.out > {code} > The full stack trace is as follows: > {code} > Setting default log level to "WARN". > To adjust logging level use sc.setLogLevel(newLevel). > INFO Worker: Registered signal handlers for [TERM, HUP, INT] > Exception in thread "main" java.lang.IllegalStateException: Memory can't be > 0, missing a M or G on the end of the memory specification? > at > org.apache.spark.deploy.worker.WorkerArguments.checkWorkerMemory(WorkerArguments.scala:179) > at > org.apache.spark.deploy.worker.WorkerArguments.(WorkerArguments.scala:64) > at org.apache.spark.deploy.worker.Worker$.main(Worker.scala:691) > at org.apache.spark.deploy.worker.Worker.main(Worker.scala) > INFO ShutdownHookManager: Shutdown hook called > INFO ShutdownHookManager: Deleting directory > /private/var/folders/0w/kb0d3rqn4zb9fcc91pxhgn8wgn/T/spark-f4e5f222-e938-46b2-a189-241453cf1f50 > {code} > The following command starts spark standalone worker successfully: > {code} > SPARK_WORKER_MEMORY=1g SPARK_WORKER_CORES=5 ./sbin/start-slave.sh > spark://localhost:7077 > {code} > The master reports: > {code} > INFO Master: Registering worker 192.168.1.6:63884 with 5 cores, 1024.0 MB RAM > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-12532) Join-key Pushdown via Predicate Transitivity
Xiao Li created SPARK-12532: --- Summary: Join-key Pushdown via Predicate Transitivity Key: SPARK-12532 URL: https://issues.apache.org/jira/browse/SPARK-12532 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.6.0 Reporter: Xiao Li {code} "SELECT * FROM upperCaseData JOIN lowerCaseData where lowerCaseData.n = upperCaseData.N and lowerCaseData.n = 3" {code} {code} == Analyzed Logical Plan == N: int, L: string, n: int, l: string Project [N#16,L#17,n#18,l#19] +- Filter ((n#18 = N#16) && (n#18 = 3)) +- Join Inner, None :- Subquery upperCaseData : +- LogicalRDD [N#16,L#17], MapPartitionsRDD[17] at beforeAll at BeforeAndAfterAll.scala:187 +- Subquery lowerCaseData +- LogicalRDD [n#18,l#19], MapPartitionsRDD[19] at beforeAll at BeforeAndAfterAll.scala:187 {code} {code} == Optimized Logical Plan == Project [N#16,L#17,n#18,l#19] +- Join Inner, Some((n#18 = N#16)) :- Filter (N#16 = 3) : +- LogicalRDD [N#16,L#17], MapPartitionsRDD[17] at beforeAll at BeforeAndAfterAll.scala:187 +- Filter (n#18 = 3) +- LogicalRDD [n#18,l#19], MapPartitionsRDD[19] at beforeAll at BeforeAndAfterAll.scala:187 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-12532) Join-key Pushdown via Predicate Transitivity
[ https://issues.apache.org/jira/browse/SPARK-12532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-12532: Description: {code} "SELECT * FROM upperCaseData JOIN lowerCaseData where lowerCaseData.n = upperCaseData.N and lowerCaseData.n = 3" {code} {code} == Analyzed Logical Plan == N: int, L: string, n: int, l: string Project [N#16,L#17,n#18,l#19] +- Filter ((n#18 = N#16) && (n#18 = 3)) +- Join Inner, None :- Subquery upperCaseData : +- LogicalRDD [N#16,L#17], MapPartitionsRDD[17] at beforeAll at BeforeAndAfterAll.scala:187 +- Subquery lowerCaseData +- LogicalRDD [n#18,l#19], MapPartitionsRDD[19] at beforeAll at BeforeAndAfterAll.scala:187 {code} Before the improvement, the optimized logical plan is {code} == Optimized Logical Plan == Project [N#16,L#17,n#18,l#19] +- Join Inner, Some((n#18 = N#16)) :- LogicalRDD [N#16,L#17], MapPartitionsRDD[17] at beforeAll at BeforeAndAfterAll.scala:187 +- Filter (n#18 = 3) +- LogicalRDD [n#18,l#19], MapPartitionsRDD[19] at beforeAll at BeforeAndAfterAll.scala:187 {code} After the improvement, the optimized logical plan should be like {code} == Optimized Logical Plan == Project [N#16,L#17,n#18,l#19] +- Join Inner, Some((n#18 = N#16)) :- Filter (N#16 = 3) : +- LogicalRDD [N#16,L#17], MapPartitionsRDD[17] at beforeAll at BeforeAndAfterAll.scala:187 +- Filter (n#18 = 3) +- LogicalRDD [n#18,l#19], MapPartitionsRDD[19] at beforeAll at BeforeAndAfterAll.scala:187 {code} was: {code} "SELECT * FROM upperCaseData JOIN lowerCaseData where lowerCaseData.n = upperCaseData.N and lowerCaseData.n = 3" {code} {code} == Analyzed Logical Plan == N: int, L: string, n: int, l: string Project [N#16,L#17,n#18,l#19] +- Filter ((n#18 = N#16) && (n#18 = 3)) +- Join Inner, None :- Subquery upperCaseData : +- LogicalRDD [N#16,L#17], MapPartitionsRDD[17] at beforeAll at BeforeAndAfterAll.scala:187 +- Subquery lowerCaseData +- LogicalRDD [n#18,l#19], MapPartitionsRDD[19] at beforeAll at BeforeAndAfterAll.scala:187 {code} {code} == Optimized Logical Plan == Project [N#16,L#17,n#18,l#19] +- Join Inner, Some((n#18 = N#16)) :- Filter (N#16 = 3) : +- LogicalRDD [N#16,L#17], MapPartitionsRDD[17] at beforeAll at BeforeAndAfterAll.scala:187 +- Filter (n#18 = 3) +- LogicalRDD [n#18,l#19], MapPartitionsRDD[19] at beforeAll at BeforeAndAfterAll.scala:187 {code} > Join-key Pushdown via Predicate Transitivity > > > Key: SPARK-12532 > URL: https://issues.apache.org/jira/browse/SPARK-12532 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.6.0 >Reporter: Xiao Li > Labels: SQL > > {code} > "SELECT * FROM upperCaseData JOIN lowerCaseData where lowerCaseData.n = > upperCaseData.N and lowerCaseData.n = 3" > {code} > {code} > == Analyzed Logical Plan == > N: int, L: string, n: int, l: string > Project [N#16,L#17,n#18,l#19] > +- Filter ((n#18 = N#16) && (n#18 = 3)) >+- Join Inner, None > :- Subquery upperCaseData > : +- LogicalRDD [N#16,L#17], MapPartitionsRDD[17] at beforeAll at > BeforeAndAfterAll.scala:187 > +- Subquery lowerCaseData > +- LogicalRDD [n#18,l#19], MapPartitionsRDD[19] at beforeAll at > BeforeAndAfterAll.scala:187 > {code} > Before the improvement, the optimized logical plan is > {code} > == Optimized Logical Plan == > Project [N#16,L#17,n#18,l#19] > +- Join Inner, Some((n#18 = N#16)) >:- LogicalRDD [N#16,L#17], MapPartitionsRDD[17] at beforeAll at > BeforeAndAfterAll.scala:187 >+- Filter (n#18 = 3) > +- LogicalRDD [n#18,l#19], MapPartitionsRDD[19] at beforeAll at > BeforeAndAfterAll.scala:187 > {code} > After the 
improvement, the optimized logical plan should be like > {code} > == Optimized Logical Plan == > Project [N#16,L#17,n#18,l#19] > +- Join Inner, Some((n#18 = N#16)) >:- Filter (N#16 = 3) >: +- LogicalRDD [N#16,L#17], MapPartitionsRDD[17] at beforeAll at > BeforeAndAfterAll.scala:187 >+- Filter (n#18 = 3) > +- LogicalRDD [n#18,l#19], MapPartitionsRDD[19] at beforeAll at > BeforeAndAfterAll.scala:187 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
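The idea is that {{lowerCaseData.n = upperCaseData.N}} together with {{lowerCaseData.n = 3}} implies {{upperCaseData.N = 3}}, which can then be pushed below the join. A toy model of the inference step, deliberately independent of Catalyst's actual rule:
{code}
// Toy model of predicate transitivity (not the Catalyst implementation):
// attributes are plain strings, predicates are equalities.
sealed trait Pred
case class EqAttr(a: String, b: String) extends Pred // e.g. n = N
case class EqConst(a: String, v: Int) extends Pred   // e.g. n = 3

def inferTransitive(preds: Set[Pred]): Set[Pred] = {
  val inferred = for {
    EqAttr(a, b) <- preds
    EqConst(c, v) <- preds
    target <- Seq(a, b) if (c == a || c == b) && target != c
  } yield EqConst(target, v)
  preds ++ inferred
}

// inferTransitive(Set(EqAttr("n", "N"), EqConst("n", 3))) additionally yields
// EqConst("N", 3) -- the Filter (N#16 = 3) in the desired plan above.
{code}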
[jira] [Commented] (SPARK-12529) Spark streaming: java.lang.NoSuchFieldException: SHUTDOWN_HOOK_PRIORITY
[ https://issues.apache.org/jira/browse/SPARK-12529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072197#comment-15072197 ] Sean Owen commented on SPARK-12529: --- This means you've dragged in (old) Hadoop dependencies somehow in your app, or in your runtime classpath. I don't think this has to do with Spark per se. > Spark streaming: java.lang.NoSuchFieldException: SHUTDOWN_HOOK_PRIORITY > --- > > Key: SPARK-12529 > URL: https://issues.apache.org/jira/browse/SPARK-12529 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.2 > Environment: MacOSX Standalone >Reporter: Brad Cox > > Posted originally on stackoverflow. Reposted here on request by Josh Rosen. > I'm trying to start spark streaming in standalone mode (MacOSX) and getting > the following error no matter what: > Exception in thread "main" java.lang.ExceptionInInitializerError at > org.apache.spark.storage.DiskBlockManager.addShutdownHook(DiskBlockManager.scala:147) > at org.apache.spark.storage.DiskBlockManager.<init>(DiskBlockManager.scala:54) at > org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:75) at > org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:173) at > org.apache.spark.SparkEnv$.create(SparkEnv.scala:347) at > org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:194) at > org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:277) at > org.apache.spark.SparkContext.<init>(SparkContext.scala:450) at > org.apache.spark.streaming.StreamingContext$.createNewSparkContext(StreamingContext.scala:566) > at > org.apache.spark.streaming.StreamingContext$.createNewSparkContext(StreamingContext.scala:578) > at org.apache.spark.streaming.StreamingContext.<init>(StreamingContext.scala:90) > at > org.apache.spark.streaming.api.java.JavaStreamingContext.<init>(JavaStreamingContext.scala:78) > at io.ascolta.pcap.PcapOfflineReceiver.main(PcapOfflineReceiver.java:103) > Caused by: java.lang.NoSuchFieldException: SHUTDOWN_HOOK_PRIORITY at > java.lang.Class.getField(Class.java:1584) at > org.apache.spark.util.SparkShutdownHookManager.install(ShutdownHookManager.scala:220) > at > org.apache.spark.util.ShutdownHookManager$.shutdownHooks$lzycompute(ShutdownHookManager.scala:50) > at > org.apache.spark.util.ShutdownHookManager$.shutdownHooks(ShutdownHookManager.scala:48) > at > org.apache.spark.util.ShutdownHookManager$.addShutdownHook(ShutdownHookManager.scala:189) > at org.apache.spark.util.ShutdownHookManager$.<init>(ShutdownHookManager.scala:58) > at org.apache.spark.util.ShutdownHookManager$.<clinit>(ShutdownHookManager.scala) ... > 13 more > This symptom is discussed in relation to EC2 at > https://forums.databricks.com/questions/2227/shutdown-hook-priority-javalangnosuchfieldexceptio.html > as a Hadoop2 dependency. But I'm running locally (for now), and am using the > spark-1.5.2-bin-hadoop2.6.tgz binary from > https://spark.apache.org/downloads.html which I'd hoped would eliminate this > possibility. > I've pruned my code down to essentially nothing; like this: > SparkConf conf = new SparkConf() > .setAppName(appName) > .setMaster(master); > JavaStreamingContext ssc = new JavaStreamingContext(conf, new > Duration(1000)); > I've permuted maven dependencies to ensure all spark stuff is consistent at > version 1.5.2. Yet the ssc initialization above fails no matter what. So I > thought it was time to ask for help. > Build environment is eclipse and maven with the shade plugin. Launch/run is > from eclipse debugger, not spark-submit, for now. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
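Following up on Sean Owen's comment: Spark's shutdown-hook installation appears to reflectively read {{FileSystem.SHUTDOWN_HOOK_PRIORITY}}, a field present in Hadoop 2.x but not in old Hadoop 1.x jars, so the NoSuchFieldException points at a stray old Hadoop on the classpath. A hedged sketch of a runtime check (assumes some Hadoop is on the classpath; getCodeSource can be null for bootstrap classes):
{code}
object HadoopOnClasspath {
  def main(args: Array[String]): Unit = {
    val fsClass = Class.forName("org.apache.hadoop.fs.FileSystem")
    // Which jar the class came from usually exposes the stray dependency.
    println(fsClass.getProtectionDomain.getCodeSource.getLocation)
    println(org.apache.hadoop.util.VersionInfo.getVersion) // e.g. "2.6.0"
    // Throws NoSuchFieldException on Hadoop 1.x, matching the trace above.
    println(fsClass.getField("SHUTDOWN_HOOK_PRIORITY").get(null))
  }
}
{code}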
[jira] [Resolved] (SPARK-12518) Problem in Spark deserialization with htsjdk BAMRecordCodec
[ https://issues.apache.org/jira/browse/SPARK-12518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-12518. --- Resolution: Not A Problem > Problem in Spark deserialization with htsjdk BAMRecordCodec > --- > > Key: SPARK-12518 > URL: https://issues.apache.org/jira/browse/SPARK-12518 > Project: Spark > Issue Type: Question > Components: Java API >Affects Versions: 1.5.2 > Environment: Linux Red Hat 4.8.2-16, Java 8, htsjdk-1.130 >Reporter: Zhanpeng Wu > > When I used [htsjdk|https://github.com/samtools/htsjdk] in my Spark > application, I found a problem in record deserialization. The object of > *SAMRecord* could not be deserialized and threw the exception: > {quote} > WARN ThrowableSerializationWrapper: Task exception could not be deserialized > java.lang.ClassNotFoundException: htsjdk.samtools.util.RuntimeIOException > at java.net.URLClassLoader$1.run(URLClassLoader.java:372) > at java.net.URLClassLoader$1.run(URLClassLoader.java:361) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:360) > at java.lang.ClassLoader.loadClass(ClassLoader.java:424) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) > at java.lang.ClassLoader.loadClass(ClassLoader.java:357) > at java.lang.Class.forName0(Native Method) > at java.lang.Class.forName(Class.java:340) > at > org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67) > at > java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613) > at > java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518) > at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) > at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371) > at > org.apache.spark.ThrowableSerializationWrapper.readObject(TaskEndReason.scala:167) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:483) > at > java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017) > at > java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1896) > at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) > at > java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993) > at > java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918) > at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) > at > java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993) > at > java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918) > at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) > at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371) > at > org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72) > at > org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:98) > at > org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$2.apply$mcV$sp(TaskResultGetter.scala:108) > at > org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$2.apply(TaskResultGetter.scala:105) > at > org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$2.apply(TaskResultGetter.scala:105) > at > org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699) > at > org.apache.spark.scheduler.TaskResultGetter$$anon$3.run(TaskResultGetter.scala:105) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {quote} > It seems that the application encountered a premature EOF when deserializing. > Here is my test code: > {code:title=Test.java|borderStyle=solid} > public class Test { > public static void main(String[] args) { > SparkConf sparkConf =
[jira] [Assigned] (SPARK-12263) IllegalStateException: Memory can't be 0 for SPARK_WORKER_MEMORY without unit
[ https://issues.apache.org/jira/browse/SPARK-12263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12263: Assignee: (was: Apache Spark) > IllegalStateException: Memory can't be 0 for SPARK_WORKER_MEMORY without unit > - > > Key: SPARK-12263 > URL: https://issues.apache.org/jira/browse/SPARK-12263 > Project: Spark > Issue Type: Improvement > Components: Documentation >Reporter: Jacek Laskowski >Priority: Trivial > Labels: starter > > When starting a worker with the following command - note > {{SPARK_WORKER_MEMORY=1024}} it fails saying that the memory was 0 while it > was 1024 (without size unit). > {code} > ➜ spark git:(master) ✗ SPARK_WORKER_MEMORY=1024 SPARK_WORKER_CORES=5 > ./sbin/start-slave.sh spark://localhost:7077 > starting org.apache.spark.deploy.worker.Worker, logging to > /Users/jacek/dev/oss/spark/logs/spark-jacek-org.apache.spark.deploy.worker.Worker-1-japila.local.out > failed to launch org.apache.spark.deploy.worker.Worker: > INFO ShutdownHookManager: Shutdown hook called > INFO ShutdownHookManager: Deleting directory > /private/var/folders/0w/kb0d3rqn4zb9fcc91pxhgn8wgn/T/spark-f4e5f222-e938-46b2-a189-241453cf1f50 > full log in > /Users/jacek/dev/oss/spark/logs/spark-jacek-org.apache.spark.deploy.worker.Worker-1-japila.local.out > {code} > The full stack trace is as follows: > {code} > Setting default log level to "WARN". > To adjust logging level use sc.setLogLevel(newLevel). > INFO Worker: Registered signal handlers for [TERM, HUP, INT] > Exception in thread "main" java.lang.IllegalStateException: Memory can't be > 0, missing a M or G on the end of the memory specification? > at > org.apache.spark.deploy.worker.WorkerArguments.checkWorkerMemory(WorkerArguments.scala:179) > at > org.apache.spark.deploy.worker.WorkerArguments.(WorkerArguments.scala:64) > at org.apache.spark.deploy.worker.Worker$.main(Worker.scala:691) > at org.apache.spark.deploy.worker.Worker.main(Worker.scala) > INFO ShutdownHookManager: Shutdown hook called > INFO ShutdownHookManager: Deleting directory > /private/var/folders/0w/kb0d3rqn4zb9fcc91pxhgn8wgn/T/spark-f4e5f222-e938-46b2-a189-241453cf1f50 > {code} > The following command starts spark standalone worker successfully: > {code} > SPARK_WORKER_MEMORY=1g SPARK_WORKER_CORES=5 ./sbin/start-slave.sh > spark://localhost:7077 > {code} > The master reports: > {code} > INFO Master: Registering worker 192.168.1.6:63884 with 5 cores, 1024.0 MB RAM > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-12521) DataFrame Partitions in java does not work
[ https://issues.apache.org/jira/browse/SPARK-12521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-12521. --- Resolution: Not A Problem It's already described as an arg that controls partition stride. It wouldn't make sense to specify filters separately outside the WHERE clause here. > DataFrame Partitions in java does not work > -- > > Key: SPARK-12521 > URL: https://issues.apache.org/jira/browse/SPARK-12521 > Project: Spark > Issue Type: Bug > Components: Java API, SQL >Affects Versions: 1.5.2 >Reporter: Sergey Podolsky > > Hello, > Partitioning does not work in the Java interface of the DataFrame: > {code} > SQLContext sqlContext = new SQLContext(sc); > Map<String, String> options = new HashMap<>(); > options.put("driver", ORACLE_DRIVER); > options.put("url", ORACLE_CONNECTION_URL); > options.put("dbtable", > "(SELECT * FROM JOBS WHERE ROWNUM < 1) tt"); > options.put("lowerBound", "2704225000"); > options.put("upperBound", "2704226000"); > options.put("partitionColumn", "ID"); > options.put("numPartitions", "10"); > DataFrame jdbcDF = sqlContext.load("jdbc", options); > List<Row> jobsRows = jdbcDF.collectAsList(); > System.out.println(jobsRows.size()); > {code} > gives while expected 1000. Is it because of the big decimal boundaries, or do > partitions not work at all in Java? > Thanks. > Sergey -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
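To illustrate the resolution: the four partition options never filter rows; they only split the range of {{partitionColumn}} into {{numPartitions}} per-partition WHERE clauses, with the first and last partitions left open-ended so every row lands somewhere. Roughly what Spark generates, as a hedged sketch:
{code}
def partitionPredicates(column: String, lower: Long, upper: Long,
                        numPartitions: Int): Seq[String] = {
  val stride = (upper - lower) / numPartitions
  (0 until numPartitions).map { i =>
    val lo = lower + i * stride
    val hi = lo + stride
    if (i == 0) s"$column < $hi"                       // open below
    else if (i == numPartitions - 1) s"$column >= $lo" // open above
    else s"$column >= $lo AND $column < $hi"
  }
}

// To actually restrict the rows read, filter inside the dbtable query itself,
// e.g. "(SELECT * FROM JOBS WHERE ID BETWEEN 2704225000 AND 2704226000) tt".
{code}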
[jira] [Updated] (SPARK-11600) Spark MLlib 1.6 QA umbrella
[ https://issues.apache.org/jira/browse/SPARK-11600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-11600: - Target Version/s: 1.6.1 (was: 1.6.0) > Spark MLlib 1.6 QA umbrella > --- > > Key: SPARK-11600 > URL: https://issues.apache.org/jira/browse/SPARK-11600 > Project: Spark > Issue Type: Umbrella > Components: Documentation, ML, MLlib >Reporter: Joseph K. Bradley >Assignee: Joseph K. Bradley >Priority: Critical > > This JIRA lists tasks for the next MLlib release's QA period. > h2. API > * Check binary API compatibility (SPARK-11601) > * Audit new public APIs (from the generated html doc) > ** Scala (SPARK-11602) > ** Java compatibility (SPARK-11605) > ** Python coverage (SPARK-11604) > * Check Experimental, DeveloperApi tags (SPARK-11603) > h2. Algorithms and performance > *Performance* > * _List any other missing performance tests from spark-perf here_ > * ALS.recommendAll (SPARK-7457) > * perf-tests in Python (SPARK-7539) > * perf-tests for transformers (SPARK-2838) > * MultilayerPerceptron (SPARK-11911) > h2. Documentation and example code > * For new algorithms, create JIRAs for updating the user guide (SPARK-11606) > * For major components, create JIRAs for example code (SPARK-9670) > * Update Programming Guide for 1.6 (towards end of QA) (SPARK-11608) > * Update website (SPARK-11607) > * Merge duplicate content under examples/ (SPARK-11685) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8447) Test external shuffle service with all shuffle managers
[ https://issues.apache.org/jira/browse/SPARK-8447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-8447: Target Version/s: 1.6.1 (was: 1.6.0) > Test external shuffle service with all shuffle managers > --- > > Key: SPARK-8447 > URL: https://issues.apache.org/jira/browse/SPARK-8447 > Project: Spark > Issue Type: Bug > Components: Shuffle, Tests >Affects Versions: 1.4.0 >Reporter: Andrew Or >Priority: Critical > > There is a mismatch between the shuffle managers in Spark core and in the > external shuffle service. The latest unsafe shuffle manager is an example of > this (SPARK-8430). This issue arose because we apparently do not have > sufficient tests for making sure that these two components deal with the same > set of shuffle managers. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-11224) Flaky test: o.a.s.ExternalShuffleServiceSuite
[ https://issues.apache.org/jira/browse/SPARK-11224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-11224: - Target Version/s: 1.6.1 (was: 1.6.0) > Flaky test: o.a.s.ExternalShuffleServiceSuite > - > > Key: SPARK-11224 > URL: https://issues.apache.org/jira/browse/SPARK-11224 > Project: Spark > Issue Type: Bug > Components: Tests >Reporter: Andrew Or >Priority: Critical > Labels: flaky-test > > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/Spark-Master-SBT/3798/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.0,label=spark-test/testReport/junit/org.apache.spark/ExternalShuffleServiceSuite/using_external_shuffle_service/ -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-11266) Peak memory tests swallow failures
[ https://issues.apache.org/jira/browse/SPARK-11266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-11266: - Target Version/s: 1.6.1 (was: 1.6.0) > Peak memory tests swallow failures > -- > > Key: SPARK-11266 > URL: https://issues.apache.org/jira/browse/SPARK-11266 > Project: Spark > Issue Type: Bug > Components: Tests >Affects Versions: 1.5.0 >Reporter: Andrew Or >Priority: Critical > > You have something like the following without the tests failing: > {code} > 22:29:03.493 ERROR org.apache.spark.scheduler.LiveListenerBus: Listener > SaveInfoListener threw an exception > org.scalatest.exceptions.TestFailedException: peak execution memory > accumulator not set in 'aggregation with codegen' > at > org.apache.spark.AccumulatorSuite$$anonfun$verifyPeakExecutionMemorySet$1$$anonfun$27.apply(AccumulatorSuite.scala:340) > at > org.apache.spark.AccumulatorSuite$$anonfun$verifyPeakExecutionMemorySet$1$$anonfun$27.apply(AccumulatorSuite.scala:340) > at scala.Option.getOrElse(Option.scala:120) > {code} > E.g. > https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1936/consoleFull -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-11607) Update MLlib website for 1.6
[ https://issues.apache.org/jira/browse/SPARK-11607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-11607: - Target Version/s: 1.6.1 (was: 1.6.0) > Update MLlib website for 1.6 > > > Key: SPARK-11607 > URL: https://issues.apache.org/jira/browse/SPARK-11607 > Project: Spark > Issue Type: Sub-task > Components: Documentation, ML, MLlib >Reporter: Joseph K. Bradley >Assignee: Xiangrui Meng > > Update MLlib's website to include features in 1.6. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-11603) ML 1.6 QA: API: Experimental, DeveloperApi, final, sealed audit
[ https://issues.apache.org/jira/browse/SPARK-11603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-11603: - Target Version/s: 1.6.1 (was: 1.6.0) > ML 1.6 QA: API: Experimental, DeveloperApi, final, sealed audit > --- > > Key: SPARK-11603 > URL: https://issues.apache.org/jira/browse/SPARK-11603 > Project: Spark > Issue Type: Sub-task > Components: Documentation, ML, MLlib >Reporter: Joseph K. Bradley >Assignee: Xiangrui Meng > > We should make a pass through the items marked as Experimental or > DeveloperApi and see if any are stable enough to be unmarked. This will > probably not include the Pipeline APIs yet since some parts (e.g., feature > attributes) are still under flux. > We should also check for items marked final or sealed to see if they are > stable enough to be opened up as APIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10680) Flaky test: network.RequestTimeoutIntegrationSuite.timeoutInactiveRequests
[ https://issues.apache.org/jira/browse/SPARK-10680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-10680: - Target Version/s: 1.6.1 (was: 1.6.0) > Flaky test: network.RequestTimeoutIntegrationSuite.timeoutInactiveRequests > -- > > Key: SPARK-10680 > URL: https://issues.apache.org/jira/browse/SPARK-10680 > Project: Spark > Issue Type: Bug > Components: Spark Core, Tests >Reporter: Xiangrui Meng >Assignee: Josh Rosen >Priority: Critical > Labels: flaky-test > > Saw several failures recently. > https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.3,label=spark-test/3560/testReport/junit/org.apache.spark.network/RequestTimeoutIntegrationSuite/timeoutInactiveRequests/ > {code} > org.apache.spark.network.RequestTimeoutIntegrationSuite.timeoutInactiveRequests > Failing for the past 1 build (Since Failed#3560 ) > Took 6 sec. > Stacktrace > java.lang.NullPointerException: null > at > org.apache.spark.network.RequestTimeoutIntegrationSuite.timeoutInactiveRequests(RequestTimeoutIntegrationSuite.java:115) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-12507) Update Streaming configurations for 1.6
[ https://issues.apache.org/jira/browse/SPARK-12507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-12507: - Target Version/s: 1.6.1 (was: 1.6.0) > Update Streaming configurations for 1.6 > --- > > Key: SPARK-12507 > URL: https://issues.apache.org/jira/browse/SPARK-12507 > Project: Spark > Issue Type: Documentation > Components: Documentation, Streaming >Reporter: Shixiong Zhu >Assignee: Apache Spark > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-12534) Document missing command line options to Spark properties mapping
Felix Cheung created SPARK-12534: Summary: Document missing command line options to Spark properties mapping Key: SPARK-12534 URL: https://issues.apache.org/jira/browse/SPARK-12534 Project: Spark Issue Type: Bug Components: Deploy, Documentation, YARN Affects Versions: 1.5.2 Reporter: Felix Cheung Priority: Minor Several Spark properties equivalent to spark-submit command-line options are missing. {quote} The equivalent for spark-submit --num-executors should be spark.executor.instances when used in SparkConf: http://spark.apache.org/docs/latest/running-on-yarn.html Could you try setting that with sparkR.init()? _ From: Franc Carter Sent: Friday, December 25, 2015 9:23 PM Subject: number of executors in sparkR.init() To: Hi, I'm having trouble working out how to get the number of executors set when using sparkR.init(). If I start sparkR with sparkR --master yarn --num-executors 6 then I get 6 executors However, if I start sparkR with sparkR followed by sc <- sparkR.init(master="yarn-client", sparkEnvir=list(spark.num.executors='6')) then I only get 2 executors. Can anyone point me in the direction of what I might be doing wrong? I need to initialise this way so that RStudio can hook in to SparkR thanks -- Franc {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
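For reference, the property behind --num-executors is spark.executor.instances; spark.num.executors (used in the quoted thread) is not a real key, which is why only the default two executors came up. A minimal Scala sketch of the SparkConf equivalent:
{code}
import org.apache.spark.{SparkConf, SparkContext}

object NumExecutors {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("executors-example")
      .setMaster("yarn-client")
      .set("spark.executor.instances", "6") // same effect as --num-executors 6
    val sc = new SparkContext(conf)
    // ... job ...
    sc.stop()
  }
}
{code}
The same key should work from SparkR as well, e.g. sparkR.init(master="yarn-client", sparkEnvir=list(spark.executor.instances="6")).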
[jira] [Commented] (SPARK-4924) Factor out code to launch Spark applications into a separate library
[ https://issues.apache.org/jira/browse/SPARK-4924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072396#comment-15072396 ] Jiahongchao commented on SPARK-4924: Where is the official document? > Factor out code to launch Spark applications into a separate library > > > Key: SPARK-4924 > URL: https://issues.apache.org/jira/browse/SPARK-4924 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.0.0 >Reporter: Marcelo Vanzin >Assignee: Marcelo Vanzin > Fix For: 1.4.0 > > Attachments: spark-launcher.txt > > > One of the questions we run into rather commonly is "how to start a Spark > application from my Java/Scala program?". There currently isn't a good answer > to that: > - Instantiating SparkContext has limitations (e.g., you can only have one > active context at the moment, plus you lose the ability to submit apps in > cluster mode) > - Calling SparkSubmit directly is doable but you lose a lot of the logic > handled by the shell scripts > - Calling the shell script directly is doable, but sort of ugly from an API > point of view. > I think it would be nice to have a small library that handles that for users. > On top of that, this library could be used by Spark itself to replace a lot > of the code in the current shell scripts, which have a lot of duplication. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
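The library this issue produced is the spark-launcher module; its API reference is the org.apache.spark.launcher package javadoc/scaladoc shipped with each release. A minimal usage sketch (paths and class names are placeholders):
{code}
import org.apache.spark.launcher.SparkLauncher

object LaunchApp {
  def main(args: Array[String]): Unit = {
    val process = new SparkLauncher()
      .setSparkHome("/opt/spark")            // placeholder
      .setAppResource("/path/to/my-app.jar") // placeholder
      .setMainClass("com.example.MyApp")     // placeholder
      .setMaster("local[*]")
      .launch()                              // returns a java.lang.Process
    process.waitFor()
  }
}
{code}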
[jira] [Assigned] (SPARK-12534) Document missing command line options to Spark properties mapping
[ https://issues.apache.org/jira/browse/SPARK-12534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12534: Assignee: Apache Spark > Document missing command line options to Spark properties mapping > - > > Key: SPARK-12534 > URL: https://issues.apache.org/jira/browse/SPARK-12534 > Project: Spark > Issue Type: Bug > Components: Deploy, Documentation, YARN >Affects Versions: 1.5.2 >Reporter: Felix Cheung >Assignee: Apache Spark >Priority: Minor > > Several Spark properties equivalent to spark-submit command-line options are > missing. > {quote} > The equivalent for spark-submit --num-executors should be > spark.executor.instances > when used in SparkConf: > http://spark.apache.org/docs/latest/running-on-yarn.html > Could you try setting that with sparkR.init()? > _ > From: Franc Carter Sent: Friday, December 25, 2015 9:23 PM > Subject: number of executors in sparkR.init() > To: > Hi, > I'm having trouble working out how to get the number of executors set when > using sparkR.init(). > If I start sparkR with > sparkR --master yarn --num-executors 6 > then I get 6 executors > However, if I start sparkR with > sparkR > followed by > sc <- sparkR.init(master="yarn-client", > sparkEnvir=list(spark.num.executors='6')) > then I only get 2 executors. > Can anyone point me in the direction of what I might be doing wrong? I need to > initialise this way so that RStudio can hook in to SparkR > thanks > -- > Franc > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-12513) SocketReceiver hang in Netcat example
[ https://issues.apache.org/jira/browse/SPARK-12513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12513: Assignee: Apache Spark > SocketReceiver hang in Netcat example > - > > Key: SPARK-12513 > URL: https://issues.apache.org/jira/browse/SPARK-12513 > Project: Spark > Issue Type: Bug > Components: Streaming >Affects Versions: 1.5.2 >Reporter: Shawn Guo >Assignee: Apache Spark >Priority: Minor > > I added a SocketReceiver test based on the NetworkWordCount. > Using a pipe, I tail the continuous output to netcat > tail -f xxx.log | nc -lk > and create a SocketReceiver to receive the continuous output from the remote > netcat. > After about 10 hours, the SocketReceiver hangs and cannot receive any more data. > Netcat only accepts one socket connection and pushes "tail -f xxx.log" to the > connected socket; other connections are waiting in the netcat queue. > When the SocketReceiver is restarted, a new socket connection is created to connect to > Netcat. However, the old connection is not closed properly, and the new connection cannot > read anything from Netcat. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
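A sketch of the point being made (not Spark's built-in SocketReceiver, and the class name is hypothetical): the receiver has to close its socket in onStop, otherwise a single-client server such as netcat keeps the dead connection open and the restarted receiver blocks forever in the accept queue.
{code}
import java.net.Socket
import scala.io.Source
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

class ClosingSocketReceiver(host: String, port: Int)
  extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

  @volatile private var socket: Socket = _

  def onStart(): Unit = {
    socket = new Socket(host, port)
    new Thread("closing-socket-receiver") {
      override def run(): Unit = {
        // Push each received line into Spark, then ask for a restart when the
        // stream ends.
        Source.fromInputStream(socket.getInputStream, "UTF-8")
          .getLines().foreach(line => store(line))
        restart("Connection closed, restarting")
      }
    }.start()
  }

  // Releasing the socket here frees netcat's single connection slot, so the
  // restarted receiver does not sit in the accept queue.
  def onStop(): Unit = if (socket != null) socket.close()
}
{code}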
[jira] [Assigned] (SPARK-12513) SocketReceiver hang in Netcat example
[ https://issues.apache.org/jira/browse/SPARK-12513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12513: Assignee: (was: Apache Spark) > SocketReceiver hang in Netcat example > - > > Key: SPARK-12513 > URL: https://issues.apache.org/jira/browse/SPARK-12513 > Project: Spark > Issue Type: Bug > Components: Streaming >Affects Versions: 1.5.2 >Reporter: Shawn Guo >Priority: Minor > > I added a SocketReceiver test based on the NetworkWordCount. > Using a pipe, I tail the continuous output to netcat > tail -f xxx.log | nc -lk > and create a SocketReceiver to receive the continuous output from the remote > netcat. > After about 10 hours, the SocketReceiver hangs and cannot receive any more data. > Netcat only accepts one socket connection and pushes "tail -f xxx.log" to the > connected socket; other connections are waiting in the netcat queue. > When the SocketReceiver is restarted, a new socket connection is created to connect to > Netcat. However, the old connection is not closed properly, and the new connection cannot > read anything from Netcat. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-12461) Add ExpressionDescription to math functions
[ https://issues.apache.org/jira/browse/SPARK-12461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12461: Assignee: Apache Spark > Add ExpressionDescription to math functions > --- > > Key: SPARK-12461 > URL: https://issues.apache.org/jira/browse/SPARK-12461 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai >Assignee: Apache Spark > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-12532) Join-key Pushdown via Predicate Transitivity
[ https://issues.apache.org/jira/browse/SPARK-12532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-12532: - Shepherd: Michael Armbrust > Join-key Pushdown via Predicate Transitivity > > > Key: SPARK-12532 > URL: https://issues.apache.org/jira/browse/SPARK-12532 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.6.0 >Reporter: Xiao Li > Labels: SQL > > {code} > "SELECT * FROM upperCaseData JOIN lowerCaseData where lowerCaseData.n = > upperCaseData.N and lowerCaseData.n = 3" > {code} > {code} > == Analyzed Logical Plan == > N: int, L: string, n: int, l: string > Project [N#16,L#17,n#18,l#19] > +- Filter ((n#18 = N#16) && (n#18 = 3)) >+- Join Inner, None > :- Subquery upperCaseData > : +- LogicalRDD [N#16,L#17], MapPartitionsRDD[17] at beforeAll at > BeforeAndAfterAll.scala:187 > +- Subquery lowerCaseData > +- LogicalRDD [n#18,l#19], MapPartitionsRDD[19] at beforeAll at > BeforeAndAfterAll.scala:187 > {code} > Before the improvement, the optimized logical plan is > {code} > == Optimized Logical Plan == > Project [N#16,L#17,n#18,l#19] > +- Join Inner, Some((n#18 = N#16)) >:- LogicalRDD [N#16,L#17], MapPartitionsRDD[17] at beforeAll at > BeforeAndAfterAll.scala:187 >+- Filter (n#18 = 3) > +- LogicalRDD [n#18,l#19], MapPartitionsRDD[19] at beforeAll at > BeforeAndAfterAll.scala:187 > {code} > After the improvement, the optimized logical plan should be like > {code} > == Optimized Logical Plan == > Project [N#16,L#17,n#18,l#19] > +- Join Inner, Some((n#18 = N#16)) >:- Filter (N#16 = 3) >: +- LogicalRDD [N#16,L#17], MapPartitionsRDD[17] at beforeAll at > BeforeAndAfterAll.scala:187 >+- Filter (n#18 = 3) > +- LogicalRDD [n#18,l#19], MapPartitionsRDD[19] at beforeAll at > BeforeAndAfterAll.scala:187 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-12532) Join-key Pushdown via Predicate Transitivity
[ https://issues.apache.org/jira/browse/SPARK-12532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-12532: - Target Version/s: 2.0.0 > Join-key Pushdown via Predicate Transitivity > > > Key: SPARK-12532 > URL: https://issues.apache.org/jira/browse/SPARK-12532 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.6.0 >Reporter: Xiao Li >Assignee: Xiao Li > Labels: SQL > > {code} > "SELECT * FROM upperCaseData JOIN lowerCaseData where lowerCaseData.n = > upperCaseData.N and lowerCaseData.n = 3" > {code} > {code} > == Analyzed Logical Plan == > N: int, L: string, n: int, l: string > Project [N#16,L#17,n#18,l#19] > +- Filter ((n#18 = N#16) && (n#18 = 3)) >+- Join Inner, None > :- Subquery upperCaseData > : +- LogicalRDD [N#16,L#17], MapPartitionsRDD[17] at beforeAll at > BeforeAndAfterAll.scala:187 > +- Subquery lowerCaseData > +- LogicalRDD [n#18,l#19], MapPartitionsRDD[19] at beforeAll at > BeforeAndAfterAll.scala:187 > {code} > Before the improvement, the optimized logical plan is > {code} > == Optimized Logical Plan == > Project [N#16,L#17,n#18,l#19] > +- Join Inner, Some((n#18 = N#16)) >:- LogicalRDD [N#16,L#17], MapPartitionsRDD[17] at beforeAll at > BeforeAndAfterAll.scala:187 >+- Filter (n#18 = 3) > +- LogicalRDD [n#18,l#19], MapPartitionsRDD[19] at beforeAll at > BeforeAndAfterAll.scala:187 > {code} > After the improvement, the optimized logical plan should be like > {code} > == Optimized Logical Plan == > Project [N#16,L#17,n#18,l#19] > +- Join Inner, Some((n#18 = N#16)) >:- Filter (N#16 = 3) >: +- LogicalRDD [N#16,L#17], MapPartitionsRDD[17] at beforeAll at > BeforeAndAfterAll.scala:187 >+- Filter (n#18 = 3) > +- LogicalRDD [n#18,l#19], MapPartitionsRDD[19] at beforeAll at > BeforeAndAfterAll.scala:187 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-12532) Join-key Pushdown via Predicate Transitivity
[ https://issues.apache.org/jira/browse/SPARK-12532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-12532: - Assignee: Xiao Li > Join-key Pushdown via Predicate Transitivity > > > Key: SPARK-12532 > URL: https://issues.apache.org/jira/browse/SPARK-12532 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.6.0 >Reporter: Xiao Li >Assignee: Xiao Li > Labels: SQL > > {code} > "SELECT * FROM upperCaseData JOIN lowerCaseData where lowerCaseData.n = > upperCaseData.N and lowerCaseData.n = 3" > {code} > {code} > == Analyzed Logical Plan == > N: int, L: string, n: int, l: string > Project [N#16,L#17,n#18,l#19] > +- Filter ((n#18 = N#16) && (n#18 = 3)) >+- Join Inner, None > :- Subquery upperCaseData > : +- LogicalRDD [N#16,L#17], MapPartitionsRDD[17] at beforeAll at > BeforeAndAfterAll.scala:187 > +- Subquery lowerCaseData > +- LogicalRDD [n#18,l#19], MapPartitionsRDD[19] at beforeAll at > BeforeAndAfterAll.scala:187 > {code} > Before the improvement, the optimized logical plan is > {code} > == Optimized Logical Plan == > Project [N#16,L#17,n#18,l#19] > +- Join Inner, Some((n#18 = N#16)) >:- LogicalRDD [N#16,L#17], MapPartitionsRDD[17] at beforeAll at > BeforeAndAfterAll.scala:187 >+- Filter (n#18 = 3) > +- LogicalRDD [n#18,l#19], MapPartitionsRDD[19] at beforeAll at > BeforeAndAfterAll.scala:187 > {code} > After the improvement, the optimized logical plan should be like > {code} > == Optimized Logical Plan == > Project [N#16,L#17,n#18,l#19] > +- Join Inner, Some((n#18 = N#16)) >:- Filter (N#16 = 3) >: +- LogicalRDD [N#16,L#17], MapPartitionsRDD[17] at beforeAll at > BeforeAndAfterAll.scala:187 >+- Filter (n#18 = 3) > +- LogicalRDD [n#18,l#19], MapPartitionsRDD[19] at beforeAll at > BeforeAndAfterAll.scala:187 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12453) Spark Streaming Kinesis Example broken due to wrong AWS Java SDK version
[ https://issues.apache.org/jira/browse/SPARK-12453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072435#comment-15072435 ] Apache Spark commented on SPARK-12453: -- User 'BrianLondon' has created a pull request for this issue: https://github.com/apache/spark/pull/10492 > Spark Streaming Kinesis Example broken due to wrong AWS Java SDK version > > > Key: SPARK-12453 > URL: https://issues.apache.org/jira/browse/SPARK-12453 > Project: Spark > Issue Type: Bug > Components: Streaming >Affects Versions: 1.5.2 >Reporter: Martin Schade >Priority: Critical > Labels: easyfix > > The Spark Streaming Kinesis example (kinesis-asl) is broken because the wrong AWS > Java SDK version (1.9.16) is referenced alongside AWS KCL 1.3.0. > AWS KCL 1.3.0 expects AWS Java SDK 1.9.37. > Using 1.9.16 in combination with 1.3.0 fails to get data out of the > stream. > I tested Spark Streaming with 1.9.37 and it works fine. > Testing a simple KCL client outside of Spark with 1.3.0 and 1.9.16 also > fails, so the problem lies in the specific versions used in 1.5.2 and not in > Spark's implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
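Until a release picks up the fix, an application can pin the SDK version that KCL 1.3.0 expects. The version pair follows the comment above; the artifact wiring below is an assumption about a typical sbt build, not taken from this issue:
{code}
// build.sbt: force AWS Java SDK 1.9.37, overriding the 1.9.16 that
// spark-streaming-kinesis-asl 1.5.2 pulls in transitively.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-streaming-kinesis-asl" % "1.5.2",
  "com.amazonaws" % "amazon-kinesis-client" % "1.3.0",
  "com.amazonaws" % "aws-java-sdk" % "1.9.37"
)
{code}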
[jira] [Updated] (SPARK-12505) Pushdown a Limit on top of an Outer-Join
[ https://issues.apache.org/jira/browse/SPARK-12505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-12505: - Target Version/s: 2.0.0 > Pushdown a Limit on top of an Outer-Join > > > Key: SPARK-12505 > URL: https://issues.apache.org/jira/browse/SPARK-12505 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.5.0, 1.6.0 >Reporter: Xiao Li >Assignee: Apache Spark > > "Rule that applies to a Limit on top of an OUTER Join. The original Limit > won't go away after applying this rule, but additional Limit node(s) will be > created on top of the outer-side child (or children if it's a FULL OUTER > Join). " > – from https://issues.apache.org/jira/browse/CALCITE-832 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12531) Add median and mode to Summary statistics
[ https://issues.apache.org/jira/browse/SPARK-12531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072386#comment-15072386 ] Gaurav Kumar commented on SPARK-12531: -- [~srowen], I agree these are not exactly difficult to implement, but I guess these should be there in the library for the sake of completeness. For instance, while doing EDA on the data, one would use the {{Statistics.colStats(observations)}} and would expect to get all the required summary data similar to what {{R}}'s {{summary(dataset)}} does. > Add median and mode to Summary statistics > - > > Key: SPARK-12531 > URL: https://issues.apache.org/jira/browse/SPARK-12531 > Project: Spark > Issue Type: Improvement > Components: MLlib >Affects Versions: 1.5.2 >Reporter: Gaurav Kumar >Priority: Minor > > Summary statistics should also include calculating median and mode in > addition to mean, variance and others. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
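For reference, the entry point mentioned above; a minimal sketch assuming an existing SparkContext {{sc}} and toy data:
{code}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.stat.{MultivariateStatisticalSummary, Statistics}

val observations = sc.parallelize(Seq(
  Vectors.dense(1.0, 10.0),
  Vectors.dense(2.0, 20.0),
  Vectors.dense(3.0, 30.0)))

val summary: MultivariateStatisticalSummary = Statistics.colStats(observations)
println(summary.mean)      // per-column means
println(summary.variance)  // per-column variances
// median and mode are the additions this issue proposes;
// colStats does not expose them today.
{code}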
[jira] [Created] (SPARK-12533) hiveContext.table() throws the wrong exception
Michael Armbrust created SPARK-12533: Summary: hiveContext.table() throws the wrong exception Key: SPARK-12533 URL: https://issues.apache.org/jira/browse/SPARK-12533 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.6.0 Reporter: Michael Armbrust This should throw an {{AnalysisException}} that includes the table name instead of the following: {code} org.apache.spark.sql.catalyst.analysis.NoSuchTableException at org.apache.spark.sql.hive.client.ClientInterface$$anonfun$getTable$1.apply(ClientInterface.scala:122) at org.apache.spark.sql.hive.client.ClientInterface$$anonfun$getTable$1.apply(ClientInterface.scala:122) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.sql.hive.client.ClientInterface$class.getTable(ClientInterface.scala:122) at org.apache.spark.sql.hive.client.ClientWrapper.getTable(ClientWrapper.scala:60) at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:384) at org.apache.spark.sql.hive.HiveContext$$anon$2.org$apache$spark$sql$catalyst$analysis$OverrideCatalog$$super$lookupRelation(HiveContext.scala:458) at org.apache.spark.sql.catalyst.analysis.OverrideCatalog$class.lookupRelation(Catalog.scala:161) at org.apache.spark.sql.hive.HiveContext$$anon$2.lookupRelation(HiveContext.scala:458) at org.apache.spark.sql.SQLContext.table(SQLContext.scala:830) at org.apache.spark.sql.SQLContext.table(SQLContext.scala:826) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
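A minimal reproduction, assuming an initialized {{HiveContext}} named {{hiveContext}} and that the table name below is not registered (both names are illustrative):
{code}
// Currently surfaces NoSuchTableException without naming the table;
// the request is an AnalysisException that mentions "does_not_exist".
val df = hiveContext.table("does_not_exist")
{code}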
[jira] [Assigned] (SPARK-11559) Make `runs` no effect in k-means
[ https://issues.apache.org/jira/browse/SPARK-11559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-11559: Assignee: Apache Spark > Make `runs` no effect in k-means > > > Key: SPARK-11559 > URL: https://issues.apache.org/jira/browse/SPARK-11559 > Project: Spark > Issue Type: Improvement > Components: MLlib >Affects Versions: 1.6.0 >Reporter: Xiangrui Meng >Assignee: Apache Spark > > We deprecated `runs` in Spark 1.6 (SPARK-11358). In 1.7.0, we can either > remove `runs` or make it no effect (with warning messages). So we can > simplify the implementation. I prefer the latter for better binary > compatibility. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
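A sketch of the "no effect, with warning" option, which keeps the method for binary compatibility but ignores the value; the class, message, and version below are assumptions, not the actual patch:
{code}
class KMeansSketch {  // stand-in for mllib's KMeans, for illustration only
  @deprecated("This has no effect and a single run is always used.", "1.6.0")
  def setRuns(runs: Int): this.type = {
    // Intentionally not stored: the implementation always performs one run.
    System.err.println(s"WARN: setRuns($runs) has no effect; ignoring.")
    this
  }
}
{code}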
[jira] [Commented] (SPARK-11560) Optimize KMeans implementation
[ https://issues.apache.org/jira/browse/SPARK-11560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072403#comment-15072403 ] Apache Spark commented on SPARK-11560: -- User 'yanboliang' has created a pull request for this issue: https://github.com/apache/spark/pull/10306 > Optimize KMeans implementation > -- > > Key: SPARK-11560 > URL: https://issues.apache.org/jira/browse/SPARK-11560 > Project: Spark > Issue Type: Improvement > Components: MLlib >Affects Versions: 1.6.0 >Reporter: Xiangrui Meng > > After we dropped `runs`, we can simplify and optimize the k-means > implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-11559) Make `runs` no effect in k-means
[ https://issues.apache.org/jira/browse/SPARK-11559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-11559: Assignee: (was: Apache Spark) > Make `runs` no effect in k-means > > > Key: SPARK-11559 > URL: https://issues.apache.org/jira/browse/SPARK-11559 > Project: Spark > Issue Type: Improvement > Components: MLlib >Affects Versions: 1.6.0 >Reporter: Xiangrui Meng > > We deprecated `runs` in Spark 1.6 (SPARK-11358). In 1.7.0, we can either > remove `runs` or make it no effect (with warning messages). So we can > simplify the implementation. I prefer the latter for better binary > compatibility. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12363) PowerIterationClustering test case failed if we deprecated KMeans.setRuns
[ https://issues.apache.org/jira/browse/SPARK-12363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072254#comment-15072254 ] Apache Spark commented on SPARK-12363: -- User 'nongli' has created a pull request for this issue: https://github.com/apache/spark/pull/10420 > PowerIterationClustering test case failed if we deprecated KMeans.setRuns > - > > Key: SPARK-12363 > URL: https://issues.apache.org/jira/browse/SPARK-12363 > Project: Spark > Issue Type: Bug > Components: MLlib >Reporter: Yanbo Liang >Assignee: Apache Spark >Priority: Minor > > We plan to deprecate `runs` in KMeans; PowerIterationClustering > leverages KMeans to train its model. > I removed the `setRuns` call used in PowerIterationClustering, but one of the test > cases failed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12533) hiveContext.table() throws the wrong exception
[ https://issues.apache.org/jira/browse/SPARK-12533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072447#comment-15072447 ] Thomas Sebastian commented on SPARK-12533: -- Hi Michael, can you describe the scenario that reproduces this exception, e.g. any specific commands you are running? I shall work on a fix for this. > hiveContext.table() throws the wrong exception > -- > > Key: SPARK-12533 > URL: https://issues.apache.org/jira/browse/SPARK-12533 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: Michael Armbrust > > This should throw an {{AnalysisException}} that includes the table name > instead of the following: > {code} > org.apache.spark.sql.catalyst.analysis.NoSuchTableException > at > org.apache.spark.sql.hive.client.ClientInterface$$anonfun$getTable$1.apply(ClientInterface.scala:122) > at > org.apache.spark.sql.hive.client.ClientInterface$$anonfun$getTable$1.apply(ClientInterface.scala:122) > at scala.Option.getOrElse(Option.scala:120) > at > org.apache.spark.sql.hive.client.ClientInterface$class.getTable(ClientInterface.scala:122) > at > org.apache.spark.sql.hive.client.ClientWrapper.getTable(ClientWrapper.scala:60) > at > org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:384) > at > org.apache.spark.sql.hive.HiveContext$$anon$2.org$apache$spark$sql$catalyst$analysis$OverrideCatalog$$super$lookupRelation(HiveContext.scala:458) > at > org.apache.spark.sql.catalyst.analysis.OverrideCatalog$class.lookupRelation(Catalog.scala:161) > at > org.apache.spark.sql.hive.HiveContext$$anon$2.lookupRelation(HiveContext.scala:458) > at org.apache.spark.sql.SQLContext.table(SQLContext.scala:830) > at org.apache.spark.sql.SQLContext.table(SQLContext.scala:826) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-12531) Add median and mode to Summary statistics
[ https://issues.apache.org/jira/browse/SPARK-12531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072386#comment-15072386 ] Gaurav Kumar edited comment on SPARK-12531 at 12/28/15 3:36 AM: [~srowen], I agree these are not exactly difficult to implement, but I guess these should be there in the library for the sake of completeness. For instance, while doing EDA on the data, one would use the {{Statistics.colStats(observations)}} and would expect to get all the required summary data similar to what {{R}}'s {{summary(dataset)}} does. For the same reasons, while we are at it, we should also add 25th and 75th percentiles as well. was (Author: gauravkumar37): [~srowen], I agree these are not exactly difficult to implement, but I guess these should be there in the library for the sake of completeness. For instance, while doing EDA on the data, one would use the {{Statistics.colStats(observations)}} and would expect to get all the required summary data similar to what {{R}}'s {{summary(dataset)}} does. > Add median and mode to Summary statistics > - > > Key: SPARK-12531 > URL: https://issues.apache.org/jira/browse/SPARK-12531 > Project: Spark > Issue Type: Improvement > Components: MLlib >Affects Versions: 1.5.2 >Reporter: Gaurav Kumar >Priority: Minor > > Summary statistics should also include calculating median and mode in > addition to mean, variance and others. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12239) SparkR - Not distributing SparkR module in YARN
[ https://issues.apache.org/jira/browse/SPARK-12239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072402#comment-15072402 ] Sun Rui commented on SPARK-12239: - There are two ways to fix this issue formally: 1. Similar to https://github.com/apache/spark/pull/9290 for SPARK-11340. That is, if "yarn-client" is detected for master, then insert "--master yarn-client" into SPARKR_SUBMIT_ARGS; 2. A more generic way is to standardize the SPARKR_SUBMIT_ARGS env var and document that the command-line arguments intended for spark-submit should be assigned to SPARKR_SUBMIT_ARGS in order to launch SparkR in RStudio. > SparkR - Not distributing SparkR module in YARN > > > Key: SPARK-12239 > URL: https://issues.apache.org/jira/browse/SPARK-12239 > Project: Spark > Issue Type: Bug > Components: SparkR, YARN >Affects Versions: 1.5.2, 1.5.3 >Reporter: Sebastian YEPES FERNANDEZ >Priority: Critical > > Hello, > I am trying to use SparkR in a YARN environment and I have encountered > the following problem: > Everything works correctly when using bin/sparkR, but if I try running the > same jobs using sparkR directly through R, it does not work. > I have managed to track down what is causing the problem: when sparkR is > launched through R, the "SparkR" module is not distributed to the worker nodes. > I have tried working around this issue using the setting > "spark.yarn.dist.archives", but it does not work, as it deploys the > file/extracted folder with the extension ".zip" while workers are actually > looking for a folder with the name "sparkr" > Is there currently any way to make this work? > {code} > # spark-defaults.conf > spark.yarn.dist.archives /opt/apps/spark/R/lib/sparkr.zip > # R > library(SparkR, lib.loc="/opt/apps/spark/R/lib/") > sc <- sparkR.init(appName="SparkR", master="yarn-client", > sparkEnvir=list(spark.executor.instances="1")) > sqlContext <- sparkRSQL.init(sc) > df <- createDataFrame(sqlContext, faithful) > head(df) > 15/12/09 09:04:24 WARN TaskSetManager: Lost task 0.0 in stage 1.0 (TID 1, > fr-s-cour-wrk3.alidaho.com): java.net.SocketTimeoutException: Accept timed out > at java.net.PlainSocketImpl.socketAccept(Native Method) > at > java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409) > {code} > Container stderr: > {code} > 15/12/09 09:04:14 INFO storage.MemoryStore: Block broadcast_1 stored as > values in memory (estimated size 8.7 KB, free 530.0 MB) > 15/12/09 09:04:14 INFO r.BufferedStreamThread: Fatal error: cannot open file > '/hadoop/hdfs/disk02/hadoop/yarn/local/usercache/spark/appcache/application_1445706872927_1168/container_e44_1445706872927_1168_01_02/sparkr/SparkR/worker/daemon.R': > No such file or directory > 15/12/09 09:04:24 ERROR executor.Executor: Exception in task 0.0 in stage 1.0 > (TID 1) > java.net.SocketTimeoutException: Accept timed out > at java.net.PlainSocketImpl.socketAccept(Native Method) > at > java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409) > at java.net.ServerSocket.implAccept(ServerSocket.java:545) > at java.net.ServerSocket.accept(ServerSocket.java:513) > at org.apache.spark.api.r.RRDD$.createRWorker(RRDD.scala:426) > {code} > Worker node that ran the container: > {code} > # ls -la > /hadoop/hdfs/disk02/hadoop/yarn/local/usercache/spark/appcache/application_1445706872927_1168/container_e44_1445706872927_1168_01_02 > total 71M > drwx--x--- 3 yarn hadoop 4.0K Dec 9 09:04 . > drwx--x--- 7 yarn hadoop 4.0K Dec 9 09:04 ..
> -rw-r--r-- 1 yarn hadoop 110 Dec 9 09:03 container_tokens > -rw-r--r-- 1 yarn hadoop 12 Dec 9 09:03 .container_tokens.crc > -rwx-- 1 yarn hadoop 736 Dec 9 09:03 > default_container_executor_session.sh > -rw-r--r-- 1 yarn hadoop 16 Dec 9 09:03 > .default_container_executor_session.sh.crc > -rwx-- 1 yarn hadoop 790 Dec 9 09:03 default_container_executor.sh > -rw-r--r-- 1 yarn hadoop 16 Dec 9 09:03 .default_container_executor.sh.crc > -rwxr-xr-x 1 yarn hadoop 61K Dec 9 09:04 hadoop-lzo-0.6.0.2.3.2.0-2950.jar > -rwxr-xr-x 1 yarn hadoop 317K Dec 9 09:04 kafka-clients-0.8.2.2.jar > -rwx-- 1 yarn hadoop 6.0K Dec 9 09:03 launch_container.sh > -rw-r--r-- 1 yarn hadoop 56 Dec 9 09:03 .launch_container.sh.crc > -rwxr-xr-x 1 yarn hadoop 2.2M Dec 9 09:04 > spark-cassandra-connector_2.10-1.5.0-M3.jar > -rwxr-xr-x 1 yarn hadoop 7.1M Dec 9 09:04 spark-csv-assembly-1.3.0.jar > lrwxrwxrwx 1 yarn hadoop 119 Dec 9 09:03 __spark__.jar -> > /hadoop/hdfs/disk03/hadoop/yarn/local/usercache/spark/filecache/361/spark-assembly-1.5.3-SNAPSHOT-hadoop2.7.1.jar > lrwxrwxrwx 1 yarn hadoop 84 Dec 9 09:03 sparkr.zip -> > /hadoop/hdfs/disk01/hadoop/yarn/local/usercache/spark/filecache/359/sparkr.zip
[jira] [Resolved] (SPARK-12520) Python API dataframe join returns wrong results on outer join
[ https://issues.apache.org/jira/browse/SPARK-12520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu resolved SPARK-12520. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 10477 [https://github.com/apache/spark/pull/10477] > Python API dataframe join returns wrong results on outer join > - > > Key: SPARK-12520 > URL: https://issues.apache.org/jira/browse/SPARK-12520 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 1.4.1 >Reporter: Aravind B > Fix For: 2.0.0 > > > Consider the following dataframes: > """ > left_table: > +------------+------------+------+--------------+ > |head_id_left|tail_id_left|weight|joining_column| > +------------+------------+------+--------------+ > |           1|           2|     1|           1~2| > +------------+------------+------+--------------+ > right_table: > +-------------+-------------+--------------+ > |head_id_right|tail_id_right|joining_column| > +-------------+-------------+--------------+ > +-------------+-------------+--------------+ > """ > The following code returns an empty dataframe: > """ > joined_table = left_table.join(right_table, "joining_column", "outer") > """ > joined_table has zero rows. > However: > """ > joined_table = left_table.join(right_table, left_table.joining_column == > right_table.joining_column, "outer") > """ > returns the correct answer with one row. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-12520) Python API dataframe join returns wrong results on outer join
[ https://issues.apache.org/jira/browse/SPARK-12520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu updated SPARK-12520: --- Assignee: Xiao Li > Python API dataframe join returns wrong results on outer join > - > > Key: SPARK-12520 > URL: https://issues.apache.org/jira/browse/SPARK-12520 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 1.4.1 >Reporter: Aravind B >Assignee: Xiao Li > Fix For: 1.6.0, 2.0.0 > > > Consider the following dataframes: > """ > left_table: > +------------+------------+------+--------------+ > |head_id_left|tail_id_left|weight|joining_column| > +------------+------------+------+--------------+ > |           1|           2|     1|           1~2| > +------------+------------+------+--------------+ > right_table: > +-------------+-------------+--------------+ > |head_id_right|tail_id_right|joining_column| > +-------------+-------------+--------------+ > +-------------+-------------+--------------+ > """ > The following code returns an empty dataframe: > """ > joined_table = left_table.join(right_table, "joining_column", "outer") > """ > joined_table has zero rows. > However: > """ > joined_table = left_table.join(right_table, left_table.joining_column == > right_table.joining_column, "outer") > """ > returns the correct answer with one row. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-12520) Python API dataframe join returns wrong results on outer join
[ https://issues.apache.org/jira/browse/SPARK-12520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu updated SPARK-12520: --- Fix Version/s: 1.6.0 > Python API dataframe join returns wrong results on outer join > - > > Key: SPARK-12520 > URL: https://issues.apache.org/jira/browse/SPARK-12520 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 1.4.1 >Reporter: Aravind B > Fix For: 1.6.0, 2.0.0 > > > Consider the following dataframes: > """ > left_table: > +------------+------------+------+--------------+ > |head_id_left|tail_id_left|weight|joining_column| > +------------+------------+------+--------------+ > |           1|           2|     1|           1~2| > +------------+------------+------+--------------+ > right_table: > +-------------+-------------+--------------+ > |head_id_right|tail_id_right|joining_column| > +-------------+-------------+--------------+ > +-------------+-------------+--------------+ > """ > The following code returns an empty dataframe: > """ > joined_table = left_table.join(right_table, "joining_column", "outer") > """ > joined_table has zero rows. > However: > """ > joined_table = left_table.join(right_table, left_table.joining_column == > right_table.joining_column, "outer") > """ > returns the correct answer with one row. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-12520) Python API dataframe join returns wrong results on outer join
[ https://issues.apache.org/jira/browse/SPARK-12520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu updated SPARK-12520: --- Fix Version/s: 1.5.3 > Python API dataframe join returns wrong results on outer join > - > > Key: SPARK-12520 > URL: https://issues.apache.org/jira/browse/SPARK-12520 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 1.4.1 >Reporter: Aravind B >Assignee: Xiao Li > Fix For: 1.5.3, 1.6.0, 2.0.0 > > > Consider the following dataframes: > """ > left_table: > +------------+------------+------+--------------+ > |head_id_left|tail_id_left|weight|joining_column| > +------------+------------+------+--------------+ > |           1|           2|     1|           1~2| > +------------+------------+------+--------------+ > right_table: > +-------------+-------------+--------------+ > |head_id_right|tail_id_right|joining_column| > +-------------+-------------+--------------+ > +-------------+-------------+--------------+ > """ > The following code returns an empty dataframe: > """ > joined_table = left_table.join(right_table, "joining_column", "outer") > """ > joined_table has zero rows. > However: > """ > joined_table = left_table.join(right_table, left_table.joining_column == > right_table.joining_column, "outer") > """ > returns the correct answer with one row. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-12535) Generating scaladoc using sbt fails for network-common and catalyst modules
Jacek Laskowski created SPARK-12535: --- Summary: Generating scaladoc using sbt fails for network-common and catalyst modules Key: SPARK-12535 URL: https://issues.apache.org/jira/browse/SPARK-12535 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.0.0 Reporter: Jacek Laskowski Priority: Blocker Executing {{./build/sbt -Pyarn -Phadoop-2.6 -Dhadoop.version=2.7.1 -Dscala-2.11 -Phive -Phive-thriftserver -DskipTests network-common/compile:doc catalyst/compile:doc}} fails with scaladoc errors (the command was narrowed to the modules that failed - I initially used {{clean publishLocal}}). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12535) Generating scaladoc using sbt fails for network-common and catalyst modules
[ https://issues.apache.org/jira/browse/SPARK-12535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072501#comment-15072501 ] Jacek Laskowski commented on SPARK-12535: - I fixed the others, but this one has no solution yet: {code} [error] /Users/jacek/dev/oss/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala:61: annotation argument needs to be a constant; found: "_FUNC_(input, bitLength) - Returns a checksum of SHA-2 family as a hex string of the ".+("input. SHA-224, SHA-256, SHA-384, and SHA-512 are supported. Bit length of 0 is equivalent ").+("to 256") [error] "input. SHA-224, SHA-256, SHA-384, and SHA-512 are supported. Bit length of 0 is equivalent " + [error] ^ {code} > Generating scaladoc using sbt fails for network-common and catalyst modules > --- > > Key: SPARK-12535 > URL: https://issues.apache.org/jira/browse/SPARK-12535 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: Jacek Laskowski >Priority: Blocker > > Executing {{./build/sbt -Pyarn -Phadoop-2.6 -Dhadoop.version=2.7.1 > -Dscala-2.11 -Phive -Phive-thriftserver -DskipTests > network-common/compile:doc catalyst/compile:doc}} fail with scaladoc errors > (the command was narrowed to the modules that failed - I initially used > {{clean publishLocal}}). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
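For the record, the error goes away once the annotation argument is a single literal; a hedged before/after sketch with shortened strings (not the actual patch):
{code}
// Before: rejected, because scalac requires a Java-annotation argument to be
// one constant literal and does not fold the concatenation:
//   extended = "input. SHA-224, SHA-256, SHA-384, and SHA-512 are supported. " +
//     "Bit length of 0 is equivalent to 256"
// After: a single literal compiles, and scaladoc generation succeeds:
//   extended = "input. SHA-224, SHA-256, SHA-384, and SHA-512 are supported. Bit length of 0 is equivalent to 256"
{code}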
[jira] [Commented] (SPARK-12535) Generating scaladoc using sbt fails for network-common and catalyst modules
[ https://issues.apache.org/jira/browse/SPARK-12535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072525#comment-15072525 ] Herman van Hovell commented on SPARK-12535: --- This is caused by the same problem as SPARK-12530. > Generating scaladoc using sbt fails for network-common and catalyst modules > --- > > Key: SPARK-12535 > URL: https://issues.apache.org/jira/browse/SPARK-12535 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: Jacek Laskowski >Priority: Blocker > > Executing {{./build/sbt -Pyarn -Phadoop-2.6 -Dhadoop.version=2.7.1 > -Dscala-2.11 -Phive -Phive-thriftserver -DskipTests > network-common/compile:doc catalyst/compile:doc}} fail with scaladoc errors > (the command was narrowed to the modules that failed - I initially used > {{clean publishLocal}}). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-12531) Add median and mode to Summary statistics
Gaurav Kumar created SPARK-12531: Summary: Add median and mode to Summary statistics Key: SPARK-12531 URL: https://issues.apache.org/jira/browse/SPARK-12531 Project: Spark Issue Type: Improvement Components: MLlib Affects Versions: 1.5.2 Reporter: Gaurav Kumar Priority: Minor Summary statistics should also include calculating median and mode in addition to mean, variance and others. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org