[jira] [Resolved] (SPARK-5478) Add missing right parenthesis in Stage page Pending stages label
[ https://issues.apache.org/jira/browse/SPARK-5478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-5478. Resolution: Fixed Fix Version/s: 1.3.0 Assignee: Saisai Shao > Add missing right parenthesis in Stage page Pending stages label > - > > Key: SPARK-5478 > URL: https://issues.apache.org/jira/browse/SPARK-5478 > Project: Spark > Issue Type: Bug > Affects Versions: 1.3.0 > Reporter: Saisai Shao > Assignee: Saisai Shao > Priority: Minor > Fix For: 1.3.0 > > > A right parenthesis is missing in one label; this is a minor UI problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5478) Add missing right parenthesis in Stage page Pending stages label
[ https://issues.apache.org/jira/browse/SPARK-5478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5478: --- Affects Version/s: 1.3.0 > Add missing right parenthesis in Stage page Pending stages label > - > > Key: SPARK-5478 > URL: https://issues.apache.org/jira/browse/SPARK-5478 > Project: Spark > Issue Type: Bug > Affects Versions: 1.3.0 > Reporter: Saisai Shao > Assignee: Saisai Shao > Priority: Minor > Fix For: 1.3.0 > > > A right parenthesis is missing in one label; this is a minor UI problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5492) Thread statistics can break with older Hadoop versions
[ https://issues.apache.org/jira/browse/SPARK-5492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300968#comment-14300968 ] Apache Spark commented on SPARK-5492: - User 'sryza' has created a pull request for this issue: https://github.com/apache/spark/pull/4305 > Thread statistics can break with older Hadoop versions > -- > > Key: SPARK-5492 > URL: https://issues.apache.org/jira/browse/SPARK-5492 > Project: Spark > Issue Type: Bug > Components: Spark Core >Reporter: Patrick Wendell >Assignee: Sandy Ryza >Priority: Blocker > > {code} > java.lang.ClassNotFoundException: > org.apache.hadoop.fs.FileSystem$Statistics$StatisticsData > at java.net.URLClassLoader$1.run(URLClassLoader.java:366) > at java.net.URLClassLoader$1.run(URLClassLoader.java:355) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:354) > at java.lang.ClassLoader.loadClass(ClassLoader.java:425) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) > at java.lang.ClassLoader.loadClass(ClassLoader.java:358) > at java.lang.Class.forName0(Native Method) > at java.lang.Class.forName(Class.java:191) > at > org.apache.spark.deploy.SparkHadoopUtil.getFileSystemThreadStatisticsMethod(SparkHadoopUtil.scala:180) > at > org.apache.spark.deploy.SparkHadoopUtil.getFSBytesReadOnThreadCallback(SparkHadoopUtil.scala:139) > at > org.apache.spark.rdd.NewHadoopRDD$$anon$1$$anonfun$2.apply(NewHadoopRDD.scala:120) > at > org.apache.spark.rdd.NewHadoopRDD$$anon$1$$anonfun$2.apply(NewHadoopRDD.scala:118) > at scala.Option.orElse(Option.scala:257) > {code} > I think the issue is we need to catch ClassNotFoundException here: > https://github.com/apache/spark/blob/b1b35ca2e440df40b253bf967bb93705d355c1c0/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L144 > However, I'm really confused how this didn't fail our unit tests, since we > explicitly tried to test this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
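[Editor's note] A minimal Scala sketch of the fix suggested above (the object and method names are hypothetical, not the actual SparkHadoopUtil code): resolve the Hadoop 2.5+ thread-level statistics class reflectively and treat a missing class as "no callback available" instead of letting ClassNotFoundException escape into the task.
{code}
object ThreadStatsLookupSketch {
  // Older Hadoop versions do not ship FileSystem$Statistics$StatisticsData at all,
  // so the reflective lookup must swallow ClassNotFoundException and return None,
  // letting callers simply skip the bytes-read/bytes-written metrics.
  def statisticsDataClass(): Option[Class[_]] =
    try {
      Some(Class.forName("org.apache.hadoop.fs.FileSystem$Statistics$StatisticsData"))
    } catch {
      case _: ClassNotFoundException => None
    }
}
{code}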
[jira] [Commented] (SPARK-5510) How can I fix the spark-submit script and then run the program on a cluster?
[ https://issues.apache.org/jira/browse/SPARK-5510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300939#comment-14300939 ] yuhao yang commented on SPARK-5510: --- https://spark.apache.org/community.html check the mailing list section. > How can I fix the spark-submit script and then run the program on a cluster? > --- > > Key: SPARK-5510 > URL: https://issues.apache.org/jira/browse/SPARK-5510 > Project: Spark > Issue Type: Bug > Components: Spark Shell > Affects Versions: 1.0.2 > Reporter: hash-x > Labels: Help!!, spark-submit > > Reference: My question is how I can fix the script so that I can submit the program > to a Master from my laptop, rather than submitting it from a cluster node. Submitting > the program from Node 2 works for me, but submitting it from my laptop does not. How can I > fix this? > I have looked at the email below and I accept recommendation 1 - run > spark-shell from a cluster node - but I would like to solve the problem with > recommendation 2, and I am confused. > Hi Ken, > This is unfortunately a limitation of spark-shell and the way it works on the > standalone mode. > spark-shell sets an environment variable, SPARK_HOME, which tells Spark where > to find its > code installed on the cluster. This means that the path on your laptop must > be the same as > on the cluster, which is not the case. I recommend one of two things: > 1) Either run spark-shell from a cluster node, where it will have the right > path. (In general > it’s also better for performance to have it close to the cluster) > 2) Or, edit the spark-shell script and re-export SPARK_HOME right before it > runs the Java > command (ugly but will probably work). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5021) GaussianMixtureEM should be faster for SparseVector input
[ https://issues.apache.org/jira/browse/SPARK-5021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300931#comment-14300931 ] Joseph K. Bradley commented on SPARK-5021: -- It should be similar to GLMs: Take an RDD of Vectors. Inside GaussianMixture, you can match-case on Vector to see if it is Dense or Sparse and handle as needed. > GaussianMixtureEM should be faster for SparseVector input > - > > Key: SPARK-5021 > URL: https://issues.apache.org/jira/browse/SPARK-5021 > Project: Spark > Issue Type: Improvement > Components: MLlib >Affects Versions: 1.3.0 >Reporter: Joseph K. Bradley >Assignee: Manoj Kumar > > GaussianMixtureEM currently converts everything to dense vectors. It would > be nice if it were faster for SparseVectors (running in time linear in the > number of non-zero values). > However, this may not be too important since clustering should rarely be done > in high dimensions. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
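[Editor's note] To make the suggestion above concrete, here is a minimal Scala sketch (a hypothetical helper, not MLlib code) of pattern matching on the input Vector so a SparseVector is handled in time linear in its number of non-zeros instead of being densified:
{code}
import org.apache.spark.mllib.linalg.{DenseVector, SparseVector, Vector}

object GmmVectorSketch {
  // Example kernel: the same computation dispatched on the concrete Vector type.
  // For a SparseVector only the stored values are touched (linear in nnz).
  def sumOfSquares(v: Vector): Double = v match {
    case dv: DenseVector  => dv.values.map(x => x * x).sum
    case sv: SparseVector => sv.values.map(x => x * x).sum
  }
}
{code}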
[jira] [Commented] (SPARK-5021) GaussianMixtureEM should be faster for SparseVector input
[ https://issues.apache.org/jira/browse/SPARK-5021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300927#comment-14300927 ] Manoj Kumar commented on SPARK-5021: I see that it is resolved in master. What do you think should be the preferred datatype, to handle an array of SparseVectors? Do we use CoordinateMatrix? This might involve improving CoordinateMatrix to add additional functionality. > GaussianMixtureEM should be faster for SparseVector input > - > > Key: SPARK-5021 > URL: https://issues.apache.org/jira/browse/SPARK-5021 > Project: Spark > Issue Type: Improvement > Components: MLlib >Affects Versions: 1.3.0 >Reporter: Joseph K. Bradley >Assignee: Manoj Kumar > > GaussianMixtureEM currently converts everything to dense vectors. It would > be nice if it were faster for SparseVectors (running in time linear in the > number of non-zero values). > However, this may not be too important since clustering should rarely be done > in high dimensions. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5523) TaskMetrics and TaskInfo have innumerable copies of the hostname string
[ https://issues.apache.org/jira/browse/SPARK-5523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-5523: - Description: TaskMetrics and TaskInfo objects have the hostname associated with the task. As these are created (directly or through deserialization of RPC messages), each of them have a separate String object for the hostname even though most of them have the same string data in them. This results in thousands of string objects, increasing memory requirement of the driver. This can be easily deduped when deserializing a TaskMetrics object, or when creating a TaskInfo object. This affects streaming particularly bad due to the rate of job/stage/task generation. For solution, see how this dedup is done for StorageLevel. https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/StorageLevel.scala#L226 was: TaskMetrics and TaskInfo objects have the hostname associated with the task. As these are created (directly or through deserialization of RPC messages), each of them have a separate String object for the hostname even though most of them have the same string data in them. This results in thousands of string objects, increasing memory requirement of the driver. This can be easily deduped when deserializing a TaskMetrics object, or when creating a TaskInfo object (in TaskSchedulerImpl). This affects streaming particularly bad due to the rate of job/stage/task generation. For solution, see how this dedup is done for StorageLevel. https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/StorageLevel.scala#L226 > TaskMetrics and TaskInfo have innumerable copies of the hostname string > --- > > Key: SPARK-5523 > URL: https://issues.apache.org/jira/browse/SPARK-5523 > Project: Spark > Issue Type: Bug > Components: Spark Core, Streaming >Reporter: Tathagata Das > > TaskMetrics and TaskInfo objects have the hostname associated with the task. > As these are created (directly or through deserialization of RPC messages), > each of them have a separate String object for the hostname even though most > of them have the same string data in them. This results in thousands of > string objects, increasing memory requirement of the driver. > This can be easily deduped when deserializing a TaskMetrics object, or when > creating a TaskInfo object. > This affects streaming particularly bad due to the rate of job/stage/task > generation. > For solution, see how this dedup is done for StorageLevel. > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/StorageLevel.scala#L226 > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
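[Editor's note] A minimal Scala sketch of the proposed dedup (a hypothetical helper mirroring the StorageLevel approach linked above, not actual Spark code): keep one canonical String per hostname and reuse it whenever TaskMetrics/TaskInfo objects are deserialized or created.
{code}
import java.util.concurrent.ConcurrentHashMap

object HostnameCacheSketch {
  private val cache = new ConcurrentHashMap[String, String]()

  // Return a single shared String instance per distinct hostname, so thousands of
  // TaskMetrics/TaskInfo objects do not each hold their own copy of the same data.
  def canonicalize(host: String): String = {
    val prev = cache.putIfAbsent(host, host)
    if (prev == null) host else prev
  }
}
{code}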
[jira] [Updated] (SPARK-5523) TaskMetrics and TaskInfo have innumerable copies of the hostname string
[ https://issues.apache.org/jira/browse/SPARK-5523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-5523: - Description: TaskMetrics and TaskInfo objects have the hostname associated with the task. As these are created (directly or through deserialization of RPC messages), each of them have a separate String object for the hostname even though most of them have the same string data in them. This results in thousands of string objects, increasing memory requirement of the driver. This can be easily deduped when deserializing a TaskMetrics object, or when creating a TaskInfo object (in TaskSchedulerImpl). This affects streaming particularly bad due to the rate of job/stage/task generation. For solution, see how this dedup is done for StorageLevel. https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/StorageLevel.scala#L226 was: TaskMetrics and TaskInfo objects have the hostname associated with the task. As these are created (directly or through deserialization of RPC messages), each of them have a separate String object for the hostname even though most of them have the same string data in them. This results in thousands of string objects, increasing memory requirement of the driver. This can be easily deduped when deserializing a TaskMetrics object, or when creating a TaskInfo object (in TaskSchedulerImpl). This affects streaming particularly bad due to the rate of job/stage/task generation. > TaskMetrics and TaskInfo have innumerable copies of the hostname string > --- > > Key: SPARK-5523 > URL: https://issues.apache.org/jira/browse/SPARK-5523 > Project: Spark > Issue Type: Bug > Components: Spark Core, Streaming >Reporter: Tathagata Das > > TaskMetrics and TaskInfo objects have the hostname associated with the task. > As these are created (directly or through deserialization of RPC messages), > each of them have a separate String object for the hostname even though most > of them have the same string data in them. This results in thousands of > string objects, increasing memory requirement of the driver. > This can be easily deduped when deserializing a TaskMetrics object, or when > creating a TaskInfo object (in TaskSchedulerImpl). > This affects streaming particularly bad due to the rate of job/stage/task > generation. > For solution, see how this dedup is done for StorageLevel. > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/StorageLevel.scala#L226 > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5523) TaskMetrics and TaskInfo have innumerable copies of the hostname string
[ https://issues.apache.org/jira/browse/SPARK-5523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-5523: - Description: TaskMetrics and TaskInfo objects have the hostname associated with the task. As these are created (directly or through deserialization of RPC messages), each of them have a separate String object for the hostname even though most of them have the same string data in them. This results in thousands of string objects, increasing memory requirement of the driver. This can be easily deduped when deserializing a TaskMetrics object, or when creating a TaskInfo object (in TaskSchedulerImpl). This affects streaming particularly bad due to the rate of job/stage/task generation. was: TaskMetrics and TaskInfo objects have the hostname associated with the task. As these are created (directly or through deserialization of RPC messages), each of them have a separate String object for the hostname even though most of them have the same string data in them. This results in thousands of string objects, increasing memory requirement of the driver. This can be easily deduped when deserializing a TaskMetrics object, or when creating a TaskInfo object (in TaskSchedulerImpl). > TaskMetrics and TaskInfo have innumerable copies of the hostname string > --- > > Key: SPARK-5523 > URL: https://issues.apache.org/jira/browse/SPARK-5523 > Project: Spark > Issue Type: Bug > Components: Spark Core, Streaming >Reporter: Tathagata Das > > TaskMetrics and TaskInfo objects have the hostname associated with the task. > As these are created (directly or through deserialization of RPC messages), > each of them have a separate String object for the hostname even though most > of them have the same string data in them. This results in thousands of > string objects, increasing memory requirement of the driver. > This can be easily deduped when deserializing a TaskMetrics object, or when > creating a TaskInfo object (in TaskSchedulerImpl). > This affects streaming particularly bad due to the rate of job/stage/task > generation. > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5523) TaskMetrics and TaskInfo have innumerable copies of the hostname string
[ https://issues.apache.org/jira/browse/SPARK-5523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-5523: - Component/s: Streaming Spark Core > TaskMetrics and TaskInfo have innumerable copies of the hostname string > --- > > Key: SPARK-5523 > URL: https://issues.apache.org/jira/browse/SPARK-5523 > Project: Spark > Issue Type: Bug > Components: Spark Core, Streaming >Reporter: Tathagata Das > > TaskMetrics and TaskInfo objects have the hostname associated with the task. > As these are created (directly or through deserialization of RPC messages), > each of them have a separate String object for the hostname even though most > of them have the same string data in them. This results in thousands of > string objects, increasing memory requirement of the driver. > This can be easily deduped when deserializing a TaskMetrics object, or when > creating a TaskInfo object (in TaskSchedulerImpl). > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-5523) TaskMetrics and TaskInfo have innumerable copies of the hostname string
Tathagata Das created SPARK-5523: Summary: TaskMetrics and TaskInfo have innumerable copies of the hostname string Key: SPARK-5523 URL: https://issues.apache.org/jira/browse/SPARK-5523 Project: Spark Issue Type: Bug Reporter: Tathagata Das TaskMetrics and TaskInfo objects have the hostname associated with the task. As these are created (directly or through deserialization of RPC messages), each of them have a separate String object for the hostname even though most of them have the same string data in them. This results in thousands of string objects, increasing memory requirement of the driver. This can be easily deduped when deserializing a TaskMetrics object, or when creating a TaskInfo object (in TaskSchedulerImpl). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4349) Spark driver hangs on sc.parallelize() if exception is thrown during serialization
[ https://issues.apache.org/jira/browse/SPARK-4349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300913#comment-14300913 ] Patrick Wendell commented on SPARK-4349: [~mcheah] - minor, but when we close an issue that is a duplicate, we typically resolve it as "Duplicate" instead of fixed. Also the duplication link usually goes in the other direction, i.e. we say that this duplicates SPARK-4737 rather than "is duplicated by", because SPARK-4737 was merged. > Spark driver hangs on sc.parallelize() if exception is thrown during > serialization > -- > > Key: SPARK-4349 > URL: https://issues.apache.org/jira/browse/SPARK-4349 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.1.0 >Reporter: Matt Cheah >Priority: Critical > > Executing the following in the Spark Shell will lead to the Spark Shell > hanging after a stack trace is printed. The serializer is set to the Kryo > serializer. > {code} > scala> import com.esotericsoftware.kryo.io.Input > import com.esotericsoftware.kryo.io.Input > scala> import com.esotericsoftware.kryo.io.Output > import com.esotericsoftware.kryo.io.Output > scala> class MyKryoSerializable extends > com.esotericsoftware.kryo.KryoSerializable { def write (kryo: > com.esotericsoftware.kryo.Kryo, output: Output) { throw new > com.esotericsoftware.kryo.KryoException; } ; def read (kryo: > com.esotericsoftware.kryo.Kryo, input: Input) { throw new > com.esotericsoftware.kryo.KryoException; } } > defined class MyKryoSerializable > scala> sc.parallelize(Seq(new MyKryoSerializable, new > MyKryoSerializable)).collect > {code} > A stack trace is printed during serialization as expected, but another stack > trace is printed afterwards, indicating that the driver can't recover: > {code} > 14/11/11 14:10:03 ERROR OneForOneStrategy: actor name [ExecutorActor] is not > unique! 
> akka.actor.PostRestartException: exception post restart (class > java.io.IOException) > at > akka.actor.dungeon.FaultHandling$$anonfun$6.apply(FaultHandling.scala:249) > at > akka.actor.dungeon.FaultHandling$$anonfun$6.apply(FaultHandling.scala:247) > at > akka.actor.dungeon.FaultHandling$$anonfun$handleNonFatalOrInterruptedException$1.applyOrElse(FaultHandling.scala:302) > at > akka.actor.dungeon.FaultHandling$$anonfun$handleNonFatalOrInterruptedException$1.applyOrElse(FaultHandling.scala:297) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25) > at > akka.actor.dungeon.FaultHandling$class.finishRecreate(FaultHandling.scala:247) > at > akka.actor.dungeon.FaultHandling$class.faultRecreate(FaultHandling.scala:76) > at akka.actor.ActorCell.faultRecreate(ActorCell.scala:369) > at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:459) > at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478) > at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263) > at akka.dispatch.Mailbox.run(Mailbox.scala:219) > at > akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393) > at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > at > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > at > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > Caused by: akka.actor.InvalidActorNameException: actor name [ExecutorActor] > is not unique! > at > akka.actor.dungeon.ChildrenContainer$NormalChildrenContainer.reserve(ChildrenContainer.scala:130) > at akka.actor.dungeon.Children$class.reserveChild(Children.scala:77) > at akka.actor.ActorCell.reserveChild(ActorCell.scala:369) > at akka.actor.dungeon.Children$class.makeChild(Children.scala:202) > at akka.actor.dungeon.Children$class.attachChild(Children.scala:42) > at akka.actor.ActorCell.attachChild(ActorCell.scala:369) > at akka.actor.ActorSystemImpl.actorOf(ActorSystem.scala:552) > at org.apache.spark.executor.Executor.(Executor.scala:97) > at > org.apache.spark.scheduler.local.LocalActor.(LocalBackend.scala:53) > at > org.apache.spark.scheduler.local.LocalBackend$$anonfun$start$1.apply(LocalBackend.scala:96) > at > org.apache.spark.scheduler.local.LocalBackend$$anonfun$start$1.apply(LocalBackend.scala:96) > at akka.actor.TypedCreatorFunctionConsumer.produce(Props
[jira] [Reopened] (SPARK-4349) Spark driver hangs on sc.parallelize() if exception is thrown during serialization
[ https://issues.apache.org/jira/browse/SPARK-4349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell reopened SPARK-4349: > Spark driver hangs on sc.parallelize() if exception is thrown during > serialization > -- > > Key: SPARK-4349 > URL: https://issues.apache.org/jira/browse/SPARK-4349 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.1.0 >Reporter: Matt Cheah >Priority: Critical > > Executing the following in the Spark Shell will lead to the Spark Shell > hanging after a stack trace is printed. The serializer is set to the Kryo > serializer. > {code} > scala> import com.esotericsoftware.kryo.io.Input > import com.esotericsoftware.kryo.io.Input > scala> import com.esotericsoftware.kryo.io.Output > import com.esotericsoftware.kryo.io.Output > scala> class MyKryoSerializable extends > com.esotericsoftware.kryo.KryoSerializable { def write (kryo: > com.esotericsoftware.kryo.Kryo, output: Output) { throw new > com.esotericsoftware.kryo.KryoException; } ; def read (kryo: > com.esotericsoftware.kryo.Kryo, input: Input) { throw new > com.esotericsoftware.kryo.KryoException; } } > defined class MyKryoSerializable > scala> sc.parallelize(Seq(new MyKryoSerializable, new > MyKryoSerializable)).collect > {code} > A stack trace is printed during serialization as expected, but another stack > trace is printed afterwards, indicating that the driver can't recover: > {code} > 14/11/11 14:10:03 ERROR OneForOneStrategy: actor name [ExecutorActor] is not > unique! > akka.actor.PostRestartException: exception post restart (class > java.io.IOException) > at > akka.actor.dungeon.FaultHandling$$anonfun$6.apply(FaultHandling.scala:249) > at > akka.actor.dungeon.FaultHandling$$anonfun$6.apply(FaultHandling.scala:247) > at > akka.actor.dungeon.FaultHandling$$anonfun$handleNonFatalOrInterruptedException$1.applyOrElse(FaultHandling.scala:302) > at > akka.actor.dungeon.FaultHandling$$anonfun$handleNonFatalOrInterruptedException$1.applyOrElse(FaultHandling.scala:297) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25) > at > akka.actor.dungeon.FaultHandling$class.finishRecreate(FaultHandling.scala:247) > at > akka.actor.dungeon.FaultHandling$class.faultRecreate(FaultHandling.scala:76) > at akka.actor.ActorCell.faultRecreate(ActorCell.scala:369) > at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:459) > at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478) > at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263) > at akka.dispatch.Mailbox.run(Mailbox.scala:219) > at > akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393) > at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > at > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > at > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > Caused by: akka.actor.InvalidActorNameException: actor name [ExecutorActor] > is not unique! 
> at > akka.actor.dungeon.ChildrenContainer$NormalChildrenContainer.reserve(ChildrenContainer.scala:130) > at akka.actor.dungeon.Children$class.reserveChild(Children.scala:77) > at akka.actor.ActorCell.reserveChild(ActorCell.scala:369) > at akka.actor.dungeon.Children$class.makeChild(Children.scala:202) > at akka.actor.dungeon.Children$class.attachChild(Children.scala:42) > at akka.actor.ActorCell.attachChild(ActorCell.scala:369) > at akka.actor.ActorSystemImpl.actorOf(ActorSystem.scala:552) > at org.apache.spark.executor.Executor.(Executor.scala:97) > at > org.apache.spark.scheduler.local.LocalActor.(LocalBackend.scala:53) > at > org.apache.spark.scheduler.local.LocalBackend$$anonfun$start$1.apply(LocalBackend.scala:96) > at > org.apache.spark.scheduler.local.LocalBackend$$anonfun$start$1.apply(LocalBackend.scala:96) > at akka.actor.TypedCreatorFunctionConsumer.produce(Props.scala:343) > at akka.actor.Props.newActor(Props.scala:252) > at akka.actor.ActorCell.newActor(ActorCell.scala:552) > at > akka.actor.dungeon.FaultHandling$class.finishRecreate(FaultHandling.scala:234) > ... 11 more > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) --
[jira] [Resolved] (SPARK-4349) Spark driver hangs on sc.parallelize() if exception is thrown during serialization
[ https://issues.apache.org/jira/browse/SPARK-4349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-4349. Resolution: Duplicate > Spark driver hangs on sc.parallelize() if exception is thrown during > serialization > -- > > Key: SPARK-4349 > URL: https://issues.apache.org/jira/browse/SPARK-4349 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.1.0 >Reporter: Matt Cheah >Priority: Critical > > Executing the following in the Spark Shell will lead to the Spark Shell > hanging after a stack trace is printed. The serializer is set to the Kryo > serializer. > {code} > scala> import com.esotericsoftware.kryo.io.Input > import com.esotericsoftware.kryo.io.Input > scala> import com.esotericsoftware.kryo.io.Output > import com.esotericsoftware.kryo.io.Output > scala> class MyKryoSerializable extends > com.esotericsoftware.kryo.KryoSerializable { def write (kryo: > com.esotericsoftware.kryo.Kryo, output: Output) { throw new > com.esotericsoftware.kryo.KryoException; } ; def read (kryo: > com.esotericsoftware.kryo.Kryo, input: Input) { throw new > com.esotericsoftware.kryo.KryoException; } } > defined class MyKryoSerializable > scala> sc.parallelize(Seq(new MyKryoSerializable, new > MyKryoSerializable)).collect > {code} > A stack trace is printed during serialization as expected, but another stack > trace is printed afterwards, indicating that the driver can't recover: > {code} > 14/11/11 14:10:03 ERROR OneForOneStrategy: actor name [ExecutorActor] is not > unique! > akka.actor.PostRestartException: exception post restart (class > java.io.IOException) > at > akka.actor.dungeon.FaultHandling$$anonfun$6.apply(FaultHandling.scala:249) > at > akka.actor.dungeon.FaultHandling$$anonfun$6.apply(FaultHandling.scala:247) > at > akka.actor.dungeon.FaultHandling$$anonfun$handleNonFatalOrInterruptedException$1.applyOrElse(FaultHandling.scala:302) > at > akka.actor.dungeon.FaultHandling$$anonfun$handleNonFatalOrInterruptedException$1.applyOrElse(FaultHandling.scala:297) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25) > at > akka.actor.dungeon.FaultHandling$class.finishRecreate(FaultHandling.scala:247) > at > akka.actor.dungeon.FaultHandling$class.faultRecreate(FaultHandling.scala:76) > at akka.actor.ActorCell.faultRecreate(ActorCell.scala:369) > at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:459) > at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478) > at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263) > at akka.dispatch.Mailbox.run(Mailbox.scala:219) > at > akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393) > at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > at > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > at > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > Caused by: akka.actor.InvalidActorNameException: actor name [ExecutorActor] > is not unique! 
> at > akka.actor.dungeon.ChildrenContainer$NormalChildrenContainer.reserve(ChildrenContainer.scala:130) > at akka.actor.dungeon.Children$class.reserveChild(Children.scala:77) > at akka.actor.ActorCell.reserveChild(ActorCell.scala:369) > at akka.actor.dungeon.Children$class.makeChild(Children.scala:202) > at akka.actor.dungeon.Children$class.attachChild(Children.scala:42) > at akka.actor.ActorCell.attachChild(ActorCell.scala:369) > at akka.actor.ActorSystemImpl.actorOf(ActorSystem.scala:552) > at org.apache.spark.executor.Executor.(Executor.scala:97) > at > org.apache.spark.scheduler.local.LocalActor.(LocalBackend.scala:53) > at > org.apache.spark.scheduler.local.LocalBackend$$anonfun$start$1.apply(LocalBackend.scala:96) > at > org.apache.spark.scheduler.local.LocalBackend$$anonfun$start$1.apply(LocalBackend.scala:96) > at akka.actor.TypedCreatorFunctionConsumer.produce(Props.scala:343) > at akka.actor.Props.newActor(Props.scala:252) > at akka.actor.ActorCell.newActor(ActorCell.scala:552) > at > akka.actor.dungeon.FaultHandling$class.finishRecreate(FaultHandling.scala:234) > ... 11 more > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (SPARK-5500) Document that feeding hadoopFile into a shuffle operation will cause problems
[ https://issues.apache.org/jira/browse/SPARK-5500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5500: --- Priority: Critical (was: Major) > Document that feeding hadoopFile into a shuffle operation will cause problems > - > > Key: SPARK-5500 > URL: https://issues.apache.org/jira/browse/SPARK-5500 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.3.0 >Reporter: Sandy Ryza >Assignee: Sandy Ryza >Priority: Critical > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-1517) Publish nightly snapshots of documentation, maven artifacts, and binary builds
[ https://issues.apache.org/jira/browse/SPARK-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-1517: --- Assignee: Nicholas Chammas > Publish nightly snapshots of documentation, maven artifacts, and binary builds > -- > > Key: SPARK-1517 > URL: https://issues.apache.org/jira/browse/SPARK-1517 > Project: Spark > Issue Type: Improvement > Components: Build, Project Infra >Reporter: Patrick Wendell >Assignee: Nicholas Chammas >Priority: Blocker > > Should be pretty easy to do with Jenkins. The only thing I can think of that > would be tricky is to set up credentials so that jenkins can publish this > stuff somewhere on apache infra. > Ideally we don't want to have to put a private key on every jenkins box > (since they are otherwise pretty stateless). One idea is to encrypt these > credentials with a passphrase and post them somewhere publicly visible. Then > the jenkins build can download the credentials provided we set a passphrase > in an environment variable in jenkins. There may be simpler solutions as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-5522) Accelerate the History Server start
Mars Gu created SPARK-5522: -- Summary: Accelerate the History Server start Key: SPARK-5522 URL: https://issues.apache.org/jira/browse/SPARK-5522 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Mars Gu When starting the history server, all the log files are fetched and parsed in order to get the applications' metadata, e.g. App Name, Start Time, Duration, etc. In our production cluster there are 2,600 log files (160 GB) in HDFS, and it takes 3 hours to restart the history server, which is too long for us. It would be better if the history server fetched only the metadata during start-up rather than all the log files. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
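[Editor's note] A rough Scala sketch of the "metadata only" idea (assumptions: single-file JSON-lines event logs whose application-start event appears near the top of the file; this is not how FsHistoryProvider is actually written): read just the head of each log at start-up to build the listing, and replay the full file only when an application page is opened.
{code}
import java.io.{BufferedReader, InputStreamReader}
import org.apache.hadoop.fs.{FileSystem, Path}

object HistoryHeadSketch {
  // Read at most maxLines lines from the start of an event log instead of parsing
  // the whole file; enough to recover app name / start time for the listing page.
  def readHead(fs: FileSystem, log: Path, maxLines: Int = 50): Seq[String] = {
    val in = new BufferedReader(new InputStreamReader(fs.open(log)))
    try Iterator.continually(in.readLine()).takeWhile(_ != null).take(maxLines).toList
    finally in.close()
  }
}
{code}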
[jira] [Commented] (SPARK-5492) Thread statistics can break with older Hadoop versions
[ https://issues.apache.org/jira/browse/SPARK-5492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300905#comment-14300905 ] Patrick Wendell commented on SPARK-5492: [~sandyr] I share your confusion, Sandy. Nonetheless, catching ClassNotFoundException does seem reasonable. If anything it's just a better hedge against random Hadoop versions that might have intermediate sets of functionality. What do you think? > Thread statistics can break with older Hadoop versions > -- > > Key: SPARK-5492 > URL: https://issues.apache.org/jira/browse/SPARK-5492 > Project: Spark > Issue Type: Bug > Components: Spark Core >Reporter: Patrick Wendell >Assignee: Sandy Ryza >Priority: Blocker > > {code} > java.lang.ClassNotFoundException: > org.apache.hadoop.fs.FileSystem$Statistics$StatisticsData > at java.net.URLClassLoader$1.run(URLClassLoader.java:366) > at java.net.URLClassLoader$1.run(URLClassLoader.java:355) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:354) > at java.lang.ClassLoader.loadClass(ClassLoader.java:425) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) > at java.lang.ClassLoader.loadClass(ClassLoader.java:358) > at java.lang.Class.forName0(Native Method) > at java.lang.Class.forName(Class.java:191) > at > org.apache.spark.deploy.SparkHadoopUtil.getFileSystemThreadStatisticsMethod(SparkHadoopUtil.scala:180) > at > org.apache.spark.deploy.SparkHadoopUtil.getFSBytesReadOnThreadCallback(SparkHadoopUtil.scala:139) > at > org.apache.spark.rdd.NewHadoopRDD$$anon$1$$anonfun$2.apply(NewHadoopRDD.scala:120) > at > org.apache.spark.rdd.NewHadoopRDD$$anon$1$$anonfun$2.apply(NewHadoopRDD.scala:118) > at scala.Option.orElse(Option.scala:257) > {code} > I think the issue is we need to catch ClassNotFoundException here: > https://github.com/apache/spark/blob/b1b35ca2e440df40b253bf967bb93705d355c1c0/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L144 > However, I'm really confused how this didn't fail our unit tests, since we > explicitly tried to test this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5508) [hive context] Unable to query array once saved as parquet
[ https://issues.apache.org/jira/browse/SPARK-5508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5508: --- Component/s: (was: spark sql) SQL > [hive context] Unable to query array once saved as parquet > -- > > Key: SPARK-5508 > URL: https://issues.apache.org/jira/browse/SPARK-5508 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.2.1 > Environment: mesos, cdh >Reporter: Ayoub Benali > Labels: hivecontext, parquet > > When the table is saved as parquet, we cannot query a field which is an array > of struct, like show bellow: > {noformat} > scala> val data1="""{ > | "timestamp": 1422435598, > | "data_array": [ > | { > | "field1": 1, > | "field2": 2 > | } > | ] > | }""" > scala> val data2="""{ > | "timestamp": 1422435598, > | "data_array": [ > | { > | "field1": 3, > | "field2": 4 > | } > | ] > scala> val jsonRDD = sc.makeRDD(data1 :: data2 :: Nil) > scala> val rdd = hiveContext.jsonRDD(jsonRDD) > scala> rdd.printSchema > root > |-- data_array: array (nullable = true) > ||-- element: struct (containsNull = false) > |||-- field1: integer (nullable = true) > |||-- field2: integer (nullable = true) > |-- timestamp: integer (nullable = true) > scala> rdd.registerTempTable("tmp_table") > scala> hiveContext.sql("select data.field1 from tmp_table LATERAL VIEW > explode(data_array) nestedStuff AS data").collect > res3: Array[org.apache.spark.sql.Row] = Array([1], [3]) > scala> hiveContext.sql("SET hive.exec.dynamic.partition = true") > scala> hiveContext.sql("SET hive.exec.dynamic.partition.mode = nonstrict") > scala> hiveContext.sql("set parquet.compression=GZIP") > scala> hiveContext.setConf("spark.sql.parquet.binaryAsString", "true") > scala> hiveContext.sql("create external table if not exists > persisted_table(data_array ARRAY >, > timestamp INT) STORED AS PARQUET Location 'hdfs:///test_table'") > scala> hiveContext.sql("insert into table persisted_table select * from > tmp_table").collect > scala> hiveContext.sql("select data.field1 from persisted_table LATERAL VIEW > explode(data_array) nestedStuff AS data").collect > parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in > file hdfs://*/test_table/part-1 > at > parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:213) > at > parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:204) > at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:145) > at > org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) > at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) > at scala.collection.Iterator$class.foreach(Iterator.scala:727) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) > at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) > at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) > at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47) > at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273) > at scala.collection.AbstractIterator.to(Iterator.scala:1157) > at > scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265) > at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157) > at 
scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252) > at scala.collection.AbstractIterator.toArray(Iterator.scala:1157) > at org.apache.spark.rdd.RDD$$anonfun$17.apply(RDD.scala:797) > at org.apache.spark.rdd.RDD$$anonfun$17.apply(RDD.scala:797) > at > org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1353) > at > org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1353) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61) > at org.apache.spark.scheduler.Task.run(Task.scala:56) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0 > at java.util.ArrayList.rangeCheck(
[jira] [Updated] (SPARK-5508) [hive context] Unable to query array once saved as parquet
[ https://issues.apache.org/jira/browse/SPARK-5508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5508: --- Component/s: (was: Spark Core) spark sql > [hive context] Unable to query array once saved as parquet > -- > > Key: SPARK-5508 > URL: https://issues.apache.org/jira/browse/SPARK-5508 > Project: Spark > Issue Type: Bug > Components: spark sql >Affects Versions: 1.2.1 > Environment: mesos, cdh >Reporter: Ayoub Benali > Labels: hivecontext, parquet > > When the table is saved as parquet, we cannot query a field which is an array > of struct, like show bellow: > {noformat} > scala> val data1="""{ > | "timestamp": 1422435598, > | "data_array": [ > | { > | "field1": 1, > | "field2": 2 > | } > | ] > | }""" > scala> val data2="""{ > | "timestamp": 1422435598, > | "data_array": [ > | { > | "field1": 3, > | "field2": 4 > | } > | ] > scala> val jsonRDD = sc.makeRDD(data1 :: data2 :: Nil) > scala> val rdd = hiveContext.jsonRDD(jsonRDD) > scala> rdd.printSchema > root > |-- data_array: array (nullable = true) > ||-- element: struct (containsNull = false) > |||-- field1: integer (nullable = true) > |||-- field2: integer (nullable = true) > |-- timestamp: integer (nullable = true) > scala> rdd.registerTempTable("tmp_table") > scala> hiveContext.sql("select data.field1 from tmp_table LATERAL VIEW > explode(data_array) nestedStuff AS data").collect > res3: Array[org.apache.spark.sql.Row] = Array([1], [3]) > scala> hiveContext.sql("SET hive.exec.dynamic.partition = true") > scala> hiveContext.sql("SET hive.exec.dynamic.partition.mode = nonstrict") > scala> hiveContext.sql("set parquet.compression=GZIP") > scala> hiveContext.setConf("spark.sql.parquet.binaryAsString", "true") > scala> hiveContext.sql("create external table if not exists > persisted_table(data_array ARRAY >, > timestamp INT) STORED AS PARQUET Location 'hdfs:///test_table'") > scala> hiveContext.sql("insert into table persisted_table select * from > tmp_table").collect > scala> hiveContext.sql("select data.field1 from persisted_table LATERAL VIEW > explode(data_array) nestedStuff AS data").collect > parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in > file hdfs://*/test_table/part-1 > at > parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:213) > at > parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:204) > at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:145) > at > org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) > at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) > at scala.collection.Iterator$class.foreach(Iterator.scala:727) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) > at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) > at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) > at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47) > at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273) > at scala.collection.AbstractIterator.to(Iterator.scala:1157) > at > scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265) > at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157) > at 
scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252) > at scala.collection.AbstractIterator.toArray(Iterator.scala:1157) > at org.apache.spark.rdd.RDD$$anonfun$17.apply(RDD.scala:797) > at org.apache.spark.rdd.RDD$$anonfun$17.apply(RDD.scala:797) > at > org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1353) > at > org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1353) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61) > at org.apache.spark.scheduler.Task.run(Task.scala:56) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0 > at java.util.ArrayLis
[jira] [Updated] (SPARK-5521) PCA wrapper for easy transform vectors
[ https://issues.apache.org/jira/browse/SPARK-5521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-5521: - Target Version/s: 1.4.0 > PCA wrapper for easy transform vectors > -- > > Key: SPARK-5521 > URL: https://issues.apache.org/jira/browse/SPARK-5521 > Project: Spark > Issue Type: New Feature > Components: MLlib >Reporter: Kirill A. Korinskiy >Assignee: Kirill A. Korinskiy > > Implement a simple PCA wrapper for easy transform of vectors by PCA for > example LabeledPoint or another complicated structure. > Now all PCA transformation may take only matrix and haven't got any way to > take project from vectors. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5521) PCA wrapper for easy transform vectors
[ https://issues.apache.org/jira/browse/SPARK-5521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-5521: - Assignee: Kirill A. Korinskiy > PCA wrapper for easy transform vectors > -- > > Key: SPARK-5521 > URL: https://issues.apache.org/jira/browse/SPARK-5521 > Project: Spark > Issue Type: New Feature > Components: MLlib >Reporter: Kirill A. Korinskiy >Assignee: Kirill A. Korinskiy > > Implement a simple PCA wrapper for easy transform of vectors by PCA for > example LabeledPoint or another complicated structure. > Now all PCA transformation may take only matrix and haven't got any way to > take project from vectors. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5521) PCA wrapper for easy transform vectors
[ https://issues.apache.org/jira/browse/SPARK-5521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300902#comment-14300902 ] Apache Spark commented on SPARK-5521: - User 'catap' has created a pull request for this issue: https://github.com/apache/spark/pull/4304 > PCA wrapper for easy transform vectors > -- > > Key: SPARK-5521 > URL: https://issues.apache.org/jira/browse/SPARK-5521 > Project: Spark > Issue Type: New Feature > Components: MLlib >Reporter: Kirill A. Korinskiy > > Implement a simple PCA wrapper for easy transform of vectors by PCA for > example LabeledPoint or another complicated structure. > Now all PCA transformation may take only matrix and haven't got any way to > take project from vectors. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-5353) Log failures in ExceutorClassLoader
[ https://issues.apache.org/jira/browse/SPARK-5353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-5353. Resolution: Fixed Fix Version/s: 1.3.0 Assignee: Tobias Schlatter > Log failures in ExceutorClassLoader > --- > > Key: SPARK-5353 > URL: https://issues.apache.org/jira/browse/SPARK-5353 > Project: Spark > Issue Type: Improvement > Components: Spark Shell >Reporter: Tobias Schlatter >Assignee: Tobias Schlatter >Priority: Minor > Fix For: 1.3.0 > > > When the ExecutorClassLoader tries to load classes compiled in the Spark > Shell and fails, it silently passes loading to the parent ClassLoader. It > should log these failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-5515) Build fails with spark-ganglia-lgpl profile
[ https://issues.apache.org/jira/browse/SPARK-5515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-5515. Resolution: Fixed Fix Version/s: 1.3.0 Assignee: Kousuke Saruta > Build fails with spark-ganglia-lgpl profile > --- > > Key: SPARK-5515 > URL: https://issues.apache.org/jira/browse/SPARK-5515 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 1.3.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Blocker > Fix For: 1.3.0 > > > Build fails with spark-ganglia-lgpl profile at the moment. This is because > pom.xml for spark-ganglia-lgpl is not updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5515) Build fails with spark-ganglia-lgpl profile
[ https://issues.apache.org/jira/browse/SPARK-5515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300897#comment-14300897 ] Andrew Or commented on SPARK-5515: -- yes I just closed this > Build fails with spark-ganglia-lgpl profile > --- > > Key: SPARK-5515 > URL: https://issues.apache.org/jira/browse/SPARK-5515 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 1.3.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Blocker > Fix For: 1.3.0 > > > Build fails with spark-ganglia-lgpl profile at the moment. This is because > pom.xml for spark-ganglia-lgpl is not updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5515) Build fails with spark-ganglia-lgpl profile
[ https://issues.apache.org/jira/browse/SPARK-5515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300894#comment-14300894 ] Patrick Wendell commented on SPARK-5515: [~andrewor14] is this fixed now? > Build fails with spark-ganglia-lgpl profile > --- > > Key: SPARK-5515 > URL: https://issues.apache.org/jira/browse/SPARK-5515 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 1.3.0 >Reporter: Kousuke Saruta >Priority: Blocker > > Build fails with spark-ganglia-lgpl profile at the moment. This is because > pom.xml for spark-ganglia-lgpl is not updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-5208) Add more documentation to Netty-based configs
[ https://issues.apache.org/jira/browse/SPARK-5208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-5208. Resolution: Won't Fix > Add more documentation to Netty-based configs > -- > > Key: SPARK-5208 > URL: https://issues.apache.org/jira/browse/SPARK-5208 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.3.0 >Reporter: Kousuke Saruta > > SPARK-4864 added some documentation about Netty-based configs but I think we > need more. I think following configs can be useful for performance tuning. > * spark.shuffle.io.mode > * spark.shuffle.io.backLog > * spark.shuffle.io.receiveBuffer > * spark.shuffle.io.sendBuffer -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
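[Editor's note] For reference, the properties listed above are ordinary Spark configuration keys, so they can be tuned through SparkConf or spark-defaults.conf like any other setting; a usage sketch (the values below are illustrative only, not recommendations):
{code}
import org.apache.spark.SparkConf

object NettyTuningSketch {
  // Example values only -- appropriate numbers depend on the cluster and workload.
  val conf = new SparkConf()
    .set("spark.shuffle.io.mode", "NIO")              // or EPOLL on Linux
    .set("spark.shuffle.io.backLog", "256")           // listen-socket backlog
    .set("spark.shuffle.io.receiveBuffer", "4194304") // socket receive buffer, bytes
    .set("spark.shuffle.io.sendBuffer", "4194304")    // socket send buffer, bytes
}
{code}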
[jira] [Created] (SPARK-5521) PCA wrapper for easy transform vectors
Kirill A. Korinskiy created SPARK-5521: -- Summary: PCA wrapper for easy transform vectors Key: SPARK-5521 URL: https://issues.apache.org/jira/browse/SPARK-5521 Project: Spark Issue Type: New Feature Components: MLlib Reporter: Kirill A. Korinskiy Implement a simple PCA wrapper that makes it easy to transform vectors with PCA, for example the features of a LabeledPoint or another more complex structure. Currently the PCA transformation only accepts a matrix and provides no way to project individual vectors. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
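[Editor's note] A minimal Scala sketch of the kind of wrapper being proposed (the class name and API are hypothetical, not the submitted patch): compute the principal components once from the feature vectors, then project each LabeledPoint's features onto them.
{code}
import org.apache.spark.mllib.linalg.DenseVector
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

class SimplePCASketch(k: Int) {
  // Fit on the features and return the same labels paired with projected features.
  def fitTransform(data: RDD[LabeledPoint]): RDD[LabeledPoint] = {
    val pc  = new RowMatrix(data.map(_.features)).computePrincipalComponents(k)
    val pcT = pc.transpose // principal components are columns of pc, so pc^T * x projects x
    data.map { p =>
      val projected = pcT.multiply(new DenseVector(p.features.toArray))
      LabeledPoint(p.label, projected)
    }
  }
}
{code}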
[jira] [Commented] (SPARK-5021) GaussianMixtureEM should be faster for SparseVector input
[ https://issues.apache.org/jira/browse/SPARK-5021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300889#comment-14300889 ] Manoj Kumar commented on SPARK-5021: Sorry for the delay, I just started going through the source. Just a random question, why is this model named GaussianMixtureEM? Shouldn't it be renamed just GaussianMixtureModel since EM is just an optimization algo. > GaussianMixtureEM should be faster for SparseVector input > - > > Key: SPARK-5021 > URL: https://issues.apache.org/jira/browse/SPARK-5021 > Project: Spark > Issue Type: Improvement > Components: MLlib >Affects Versions: 1.3.0 >Reporter: Joseph K. Bradley >Assignee: Manoj Kumar > > GaussianMixtureEM currently converts everything to dense vectors. It would > be nice if it were faster for SparseVectors (running in time linear in the > number of non-zero values). > However, this may not be too important since clustering should rarely be done > in high dimensions. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-3996) Shade Jetty in Spark deliverables
[ https://issues.apache.org/jira/browse/SPARK-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-3996. Resolution: Fixed I've merged a new patch so closing this for now. > Shade Jetty in Spark deliverables > - > > Key: SPARK-3996 > URL: https://issues.apache.org/jira/browse/SPARK-3996 > Project: Spark > Issue Type: Task > Components: Spark Core >Affects Versions: 1.0.2, 1.1.0 >Reporter: Mingyu Kim >Assignee: Patrick Wendell > Fix For: 1.3.0 > > > We'd like to use Spark in a Jetty 9 server, and it's causing a version > conflict. Given that Spark's dependency on Jetty is light, it'd be a good > idea to shade this dependency. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5420) Cross-language load/store functions for creating and saving DataFrames
[ https://issues.apache.org/jira/browse/SPARK-5420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300873#comment-14300873 ] Apache Spark commented on SPARK-5420: - User 'yhuai' has created a pull request for this issue: https://github.com/apache/spark/pull/4294 > Cross-language load/store functions for creating and saving DataFrames > -- > > Key: SPARK-5420 > URL: https://issues.apache.org/jira/browse/SPARK-5420 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Patrick Wendell >Assignee: Yin Huai >Priority: Blocker > > We should have standard APIs for loading or saving a table from a data > store. Per comment discussion: > {code} > def loadData(datasource: String, parameters: Map[String, String]): DataFrame > def loadData(datasource: String, parameters: java.util.Map[String, String]): > DataFrame > def storeData(datasource: String, parameters: Map[String, String]): DataFrame > def storeData(datasource: String, parameters: java.util.Map[String, String]): > DataFrame > {code} > Python should have this too. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4705) Driver retries in yarn-cluster mode always fail if event logging is enabled
[ https://issues.apache.org/jira/browse/SPARK-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300868#comment-14300868 ] Twinkle Sachdeva commented on SPARK-4705: - Hi, Sorry, I just saw this update of yours. I missed the last two comments; will work on it. Thanks, > Driver retries in yarn-cluster mode always fail if event logging is enabled > --- > > Key: SPARK-4705 > URL: https://issues.apache.org/jira/browse/SPARK-4705 > Project: Spark > Issue Type: Bug > Components: Spark Core, YARN >Affects Versions: 1.2.0 >Reporter: Marcelo Vanzin > > yarn-cluster mode will retry running the driver in certain failure modes. If > event logging is enabled, this will most probably fail, because: > {noformat} > Exception in thread "Driver" java.io.IOException: Log directory > hdfs://vanzin-krb-1.vpc.cloudera.com:8020/user/spark/applicationHistory/application_1417554558066_0003 > already exists! > at org.apache.spark.util.FileLogger.createLogDir(FileLogger.scala:129) > at org.apache.spark.util.FileLogger.start(FileLogger.scala:115) > at > org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:74) > at org.apache.spark.SparkContext.(SparkContext.scala:353) > {noformat} > The event log path should be "more unique". Or perhaps retries of the same app > should clean up the old logs first. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
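One possible direction, sketched here only as an illustration and not as the fix that was eventually applied: make the event log directory unique per driver attempt rather than per application, so a retried driver does not collide with the directory left behind by the previous attempt.

{code}
// Illustrative only: the base path and application id are taken from the
// stack trace above; the per-attempt counter is a hypothetical assumption.
val baseLogDir  = "hdfs:///user/spark/applicationHistory"
val appId       = "application_1417554558066_0003"
val attempt     = 2 // hypothetical attempt counter
val eventLogDir = s"$baseLogDir/${appId}_attempt_$attempt"
{code}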
[jira] [Commented] (SPARK-4705) Driver retries in yarn-cluster mode always fail if event logging is enabled
[ https://issues.apache.org/jira/browse/SPARK-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300867#comment-14300867 ] Twinkle Sachdeva commented on SPARK-4705: - Hi, Sorry, I just saw this update of yours. I missed the last two comments; will work on it. Thanks, > Driver retries in yarn-cluster mode always fail if event logging is enabled > --- > > Key: SPARK-4705 > URL: https://issues.apache.org/jira/browse/SPARK-4705 > Project: Spark > Issue Type: Bug > Components: Spark Core, YARN >Affects Versions: 1.2.0 >Reporter: Marcelo Vanzin > > yarn-cluster mode will retry running the driver in certain failure modes. If > event logging is enabled, this will most probably fail, because: > {noformat} > Exception in thread "Driver" java.io.IOException: Log directory > hdfs://vanzin-krb-1.vpc.cloudera.com:8020/user/spark/applicationHistory/application_1417554558066_0003 > already exists! > at org.apache.spark.util.FileLogger.createLogDir(FileLogger.scala:129) > at org.apache.spark.util.FileLogger.start(FileLogger.scala:115) > at > org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:74) > at org.apache.spark.SparkContext.(SparkContext.scala:353) > {noformat} > The event log path should be "more unique". Or perhaps retries of the same app > should clean up the old logs first. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-4705) Driver retries in yarn-cluster mode always fail if event logging is enabled
[ https://issues.apache.org/jira/browse/SPARK-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Twinkle Sachdeva updated SPARK-4705: Comment: was deleted (was: Hi, Sorry, I just saw this update of yours. I missed the last two comments; will work on it. Thanks,) > Driver retries in yarn-cluster mode always fail if event logging is enabled > --- > > Key: SPARK-4705 > URL: https://issues.apache.org/jira/browse/SPARK-4705 > Project: Spark > Issue Type: Bug > Components: Spark Core, YARN >Affects Versions: 1.2.0 >Reporter: Marcelo Vanzin > > yarn-cluster mode will retry running the driver in certain failure modes. If > event logging is enabled, this will most probably fail, because: > {noformat} > Exception in thread "Driver" java.io.IOException: Log directory > hdfs://vanzin-krb-1.vpc.cloudera.com:8020/user/spark/applicationHistory/application_1417554558066_0003 > already exists! > at org.apache.spark.util.FileLogger.createLogDir(FileLogger.scala:129) > at org.apache.spark.util.FileLogger.start(FileLogger.scala:115) > at > org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:74) > at org.apache.spark.SparkContext.(SparkContext.scala:353) > {noformat} > The event log path should be "more unique". Or perhaps retries of the same app > should clean up the old logs first. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-5406) LocalLAPACK mode in RowMatrix.computeSVD should have much smaller upper bound
[ https://issues.apache.org/jira/browse/SPARK-5406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yuhao yang closed SPARK-5406. - Fixed and merged. Thanks > LocalLAPACK mode in RowMatrix.computeSVD should have much smaller upper bound > - > > Key: SPARK-5406 > URL: https://issues.apache.org/jira/browse/SPARK-5406 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 1.2.0 > Environment: centos, others should be similar >Reporter: yuhao yang >Assignee: yuhao yang >Priority: Minor > Fix For: 1.3.0 > > Original Estimate: 2h > Remaining Estimate: 2h > > In RowMatrix.computeSVD, under LocalLAPACK mode, the code would invoke > brzSvd. Yet breeze svd for dense matrices has a latent constraint. In its > implementation > ( > https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/linalg/functions/svd.scala >): > val workSize = ( 3 > * scala.math.min(m, n) > * scala.math.min(m, n) > + scala.math.max(scala.math.max(m, n), 4 * scala.math.min(m, n) > * scala.math.min(m, n) + 4 * scala.math.min(m, n)) > ) > val work = new Array[Double](workSize) > as a result, the column count must satisfy 7 * n * n + 4 * n < Int.MaxValue > thus, n < 17515. > This JIRA is only the first step. If possible, I hope Spark can handle matrix > computation up to 80K * 80K. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
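A quick sanity check of the bound quoted in the description, assuming the approximation workSize ≈ 7*n*n + 4*n for an m x n matrix with m >= n, and using Long arithmetic to avoid overflow.

{code}
// The breeze work array size must stay below Int.MaxValue for brzSvd to work.
def fits(n: Long): Boolean = 7L * n * n + 4L * n < Int.MaxValue

println(fits(17514)) // true
println(fits(17515)) // false, hence the "n < 17515" limit mentioned above
{code}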
[jira] [Resolved] (SPARK-4001) Add Apriori algorithm to Spark MLlib
[ https://issues.apache.org/jira/browse/SPARK-4001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-4001. -- Resolution: Fixed Fix Version/s: 1.3.0 Issue resolved by pull request 2847 [https://github.com/apache/spark/pull/2847] > Add Apriori algorithm to Spark MLlib > > > Key: SPARK-4001 > URL: https://issues.apache.org/jira/browse/SPARK-4001 > Project: Spark > Issue Type: New Feature > Components: MLlib >Reporter: Jacky Li >Assignee: Jacky Li > Fix For: 1.3.0 > > Attachments: Distributed frequent item mining algorithm based on > Spark.pptx > > > Apriori is the classic algorithm for frequent item set mining in a > transactional data set. It will be useful if Apriori algorithm is added to > MLLib in Spark -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-5406) LocalLAPACK mode in RowMatrix.computeSVD should have much smaller upper bound
[ https://issues.apache.org/jira/browse/SPARK-5406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-5406. -- Resolution: Fixed Fix Version/s: 1.3.0 Target Version/s: 1.3.0 (was: 1.2.1) Fixed by https://github.com/apache/spark/pull/4200 > LocalLAPACK mode in RowMatrix.computeSVD should have much smaller upper bound > - > > Key: SPARK-5406 > URL: https://issues.apache.org/jira/browse/SPARK-5406 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 1.2.0 > Environment: centos, others should be similar >Reporter: yuhao yang >Assignee: yuhao yang >Priority: Minor > Fix For: 1.3.0 > > Original Estimate: 2h > Remaining Estimate: 2h > > In RowMatrix.computeSVD, under LocalLAPACK mode, the code would invoke > brzSvd. Yet breeze svd for dense matrices has a latent constraint. In its > implementation > ( > https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/linalg/functions/svd.scala >): > val workSize = ( 3 > * scala.math.min(m, n) > * scala.math.min(m, n) > + scala.math.max(scala.math.max(m, n), 4 * scala.math.min(m, n) > * scala.math.min(m, n) + 4 * scala.math.min(m, n)) > ) > val work = new Array[Double](workSize) > as a result, the column count must satisfy 7 * n * n + 4 * n < Int.MaxValue > thus, n < 17515. > This JIRA is only the first step. If possible, I hope Spark can handle matrix > computation up to 80K * 80K. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5406) LocalLAPACK mode in RowMatrix.computeSVD should have much smaller upper bound
[ https://issues.apache.org/jira/browse/SPARK-5406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-5406: - Assignee: yuhao yang > LocalLAPACK mode in RowMatrix.computeSVD should have much smaller upper bound > - > > Key: SPARK-5406 > URL: https://issues.apache.org/jira/browse/SPARK-5406 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 1.2.0 > Environment: centos, others should be similar >Reporter: yuhao yang >Assignee: yuhao yang >Priority: Minor > Original Estimate: 2h > Remaining Estimate: 2h > > In RowMatrix.computeSVD, under LocalLAPACK mode, the code would invoke > brzSvd. Yet breeze svd for dense matrices has a latent constraint. In its > implementation > ( > https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/linalg/functions/svd.scala >): > val workSize = ( 3 > * scala.math.min(m, n) > * scala.math.min(m, n) > + scala.math.max(scala.math.max(m, n), 4 * scala.math.min(m, n) > * scala.math.min(m, n) + 4 * scala.math.min(m, n)) > ) > val work = new Array[Double](workSize) > as a result, the column count must satisfy 7 * n * n + 4 * n < Int.MaxValue > thus, n < 17515. > This JIRA is only the first step. If possible, I hope Spark can handle matrix > computation up to 80K * 80K. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-5520) Make FP-Growth implementation take generic item types
Xiangrui Meng created SPARK-5520: Summary: Make FP-Growth implementation take generic item types Key: SPARK-5520 URL: https://issues.apache.org/jira/browse/SPARK-5520 Project: Spark Issue Type: Improvement Components: MLlib Reporter: Xiangrui Meng There is no technical restriction on the item types in the FP-Growth implementation. We used String in the first PR for simplicity. Maybe we could make the type generic before 1.3 (and specialize it for Int/Long). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
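A sketch of the direction being suggested, with an illustrative class shape; the actual MLlib classes and result types may differ.

{code}
import org.apache.spark.rdd.RDD

// Illustrative shape only: frequent itemsets keyed by a generic item type
// instead of being hard-coded to String.
class GenericFPGrowthModel[Item](val freqItemsets: RDD[(Array[Item], Long)])
{code}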
[jira] [Created] (SPARK-5519) Add user guide for FP-Growth
Xiangrui Meng created SPARK-5519: Summary: Add user guide for FP-Growth Key: SPARK-5519 URL: https://issues.apache.org/jira/browse/SPARK-5519 Project: Spark Issue Type: Documentation Components: Documentation, MLlib Reporter: Xiangrui Meng We need to add a section for FP-Growth in the user guide after the FP-Growth PR is merged. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-5518) Error messages for plans with invalid AttributeReferences
Michael Armbrust created SPARK-5518: --- Summary: Error messages for plans with invalid AttributeReferences Key: SPARK-5518 URL: https://issues.apache.org/jira/browse/SPARK-5518 Project: Spark Issue Type: Sub-task Reporter: Michael Armbrust Priority: Blocker It is now possible for users to put invalid attribute references into query plans. We should check for this case at the end of analysis. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-4981) Add a streaming singular value decomposition
[ https://issues.apache.org/jira/browse/SPARK-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reza Zadeh updated SPARK-4981: -- Comment: was deleted (was: Another option: see slide 31 to solve the problem using IndexedRDDs, thanks to Ankur's nice slides and work on IndexedRDD: https://issues.apache.org/jira/secure/attachment/12656374/2014-07-07-IndexedRDD-design-review.pdf) > Add a streaming singular value decomposition > > > Key: SPARK-4981 > URL: https://issues.apache.org/jira/browse/SPARK-4981 > Project: Spark > Issue Type: New Feature > Components: MLlib, Streaming >Reporter: Jeremy Freeman > > This is for tracking WIP on a streaming singular value decomposition > implementation. This will likely be more complex than the existing streaming > algorithms (k-means, regression), but should be possible using the family of > sequential update rule outlined in this paper: > "Fast low-rank modifications of the thin singular value decomposition" > by Matthew Brand > http://www.stat.osu.edu/~dmsl/thinSVDtracking.pdf -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4981) Add a streaming singular value decomposition
[ https://issues.apache.org/jira/browse/SPARK-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300849#comment-14300849 ] Reza Zadeh commented on SPARK-4981: --- Another option: see slide 31 to solve the problem using IndexedRDDs, thanks to Ankur's nice slides and work on IndexedRDD: https://issues.apache.org/jira/secure/attachment/12656374/2014-07-07-IndexedRDD-design-review.pdf > Add a streaming singular value decomposition > > > Key: SPARK-4981 > URL: https://issues.apache.org/jira/browse/SPARK-4981 > Project: Spark > Issue Type: New Feature > Components: MLlib, Streaming >Reporter: Jeremy Freeman > > This is for tracking WIP on a streaming singular value decomposition > implementation. This will likely be more complex than the existing streaming > algorithms (k-means, regression), but should be possible using the family of > sequential update rule outlined in this paper: > "Fast low-rank modifications of the thin singular value decomposition" > by Matthew Brand > http://www.stat.osu.edu/~dmsl/thinSVDtracking.pdf -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4981) Add a streaming singular value decomposition
[ https://issues.apache.org/jira/browse/SPARK-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300848#comment-14300848 ] Reza Zadeh commented on SPARK-4981: --- Another option: see slide 31 to solve the problem using IndexedRDDs, thanks to Ankur's nice slides and work on IndexedRDD: https://issues.apache.org/jira/secure/attachment/12656374/2014-07-07-IndexedRDD-design-review.pdf > Add a streaming singular value decomposition > > > Key: SPARK-4981 > URL: https://issues.apache.org/jira/browse/SPARK-4981 > Project: Spark > Issue Type: New Feature > Components: MLlib, Streaming >Reporter: Jeremy Freeman > > This is for tracking WIP on a streaming singular value decomposition > implementation. This will likely be more complex than the existing streaming > algorithms (k-means, regression), but should be possible using the family of > sequential update rule outlined in this paper: > "Fast low-rank modifications of the thin singular value decomposition" > by Matthew Brand > http://www.stat.osu.edu/~dmsl/thinSVDtracking.pdf -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-5476) SQLContext.createDataFrame shouldn't be an implicit function
[ https://issues.apache.org/jira/browse/SPARK-5476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust closed SPARK-5476. --- Resolution: Won't Fix We decided this was too hard to do while maintaining compatibility. > SQLContext.createDataFrame shouldn't be an implicit function > > > Key: SPARK-5476 > URL: https://issues.apache.org/jira/browse/SPARK-5476 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Reynold Xin > > It is sort of strange to ask users to import sqlContext._ or > sqlContext.createDataFrame. > The proposal here is to ask users to define an implicit val for SQLContext, > and then dsl package object should include an implicit function that converts > an RDD[Product] to a DataFrame. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
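A rough sketch of what was being proposed (and ultimately not pursued): an implicit conversion in a dsl package object that requires an implicit SQLContext in scope. Names and placement here are illustrative assumptions.

{code}
import scala.language.implicitConversions
import scala.reflect.runtime.universe.TypeTag

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SQLContext}

object dsl {
  // Hypothetical: users declare `implicit val sqlContext = ...` once, and
  // RDDs of case classes then convert to DataFrames automatically.
  implicit def rddToDataFrame[A <: Product : TypeTag](rdd: RDD[A])
      (implicit sqlContext: SQLContext): DataFrame =
    sqlContext.createDataFrame(rdd)
}
{code}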
[jira] [Created] (SPARK-5517) Add input types for Java UDFs
Michael Armbrust created SPARK-5517: --- Summary: Add input types for Java UDFs Key: SPARK-5517 URL: https://issues.apache.org/jira/browse/SPARK-5517 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 1.3.0 Reporter: Michael Armbrust Assignee: Reynold Xin Priority: Blocker -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5182) Partitioning support for tables created by the data source API
[ https://issues.apache.org/jira/browse/SPARK-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-5182: Target Version/s: 1.4.0 (was: 1.3.0) > Partitioning support for tables created by the data source API > -- > > Key: SPARK-5182 > URL: https://issues.apache.org/jira/browse/SPARK-5182 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai >Priority: Blocker > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-4867) UDF clean up
[ https://issues.apache.org/jira/browse/SPARK-4867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-4867: Target Version/s: 1.4.0 (was: 1.3.0) > UDF clean up > > > Key: SPARK-4867 > URL: https://issues.apache.org/jira/browse/SPARK-4867 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Michael Armbrust >Priority: Blocker > > Right now our support and internal implementation of many functions has a few > issues. Specifically: > - UDFS don't know their input types and thus don't do type coercion. > - We hard code a bunch of built in functions into the parser. This is bad > because in SQL it creates new reserved words for things that aren't actually > keywords. Also it means that for each function we need to add support to > both SQLContext and HiveContext separately. > For this JIRA I propose we do the following: > - Change the interfaces for registerFunction and ScalaUdf to include types > for the input arguments as well as the output type. > - Add a rule to analysis that does type coercion for UDFs. > - Add a parse rule for functions to SQLParser. > - Rewrite all the UDFs that are currently hacked into the various parsers > using this new functionality. > Depending on how big this refactoring becomes we could split parts 1&2 from > part 3 above. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
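A hypothetical interface sketch of the typed registration change proposed above; the names and signature are illustrative, not Spark's actual API.

{code}
import org.apache.spark.sql.types.DataType

// Illustrative only: a registration call that carries input and return
// types, so the analyzer can insert type coercion around the UDF.
trait UdfRegistry {
  def registerFunction(name: String,
                       inputTypes: Seq[DataType],
                       returnType: DataType,
                       f: Seq[Any] => Any): Unit
}
{code}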
[jira] [Resolved] (SPARK-5465) Data source version of Parquet doesn't push down And filters properly
[ https://issues.apache.org/jira/browse/SPARK-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-5465. - Resolution: Fixed Fix Version/s: 1.3.0 Issue resolved by pull request 4255 [https://github.com/apache/spark/pull/4255] > Data source version of Parquet doesn't push down And filters properly > - > > Key: SPARK-5465 > URL: https://issues.apache.org/jira/browse/SPARK-5465 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 1.2.0, 1.2.1 >Reporter: Cheng Lian >Priority: Blocker > Fix For: 1.3.0 > > > The current implementation combines all predicates and then tries to convert > it to a single Parquet filter predicate. In this way, the Parquet filter > predicate can not be generated if any component of the original filters can > not be converted. (code lines > [here|https://github.com/apache/spark/blob/a731314c319a6f265060e05267844069027804fd/sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala#L197-L201]). > For example, {{a > 10 AND a < 20}} can be successfully converted, while {{a > > 10 AND a < b}} can't because Parquet doesn't accept filters like {{a < b}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
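An abstract sketch of the idea behind the fix, with hypothetical names rather than the actual Spark/Parquet classes: convert each source filter independently and push down only the ones that convert, instead of requiring the whole conjunction to be convertible.

{code}
// Keep only the predicates that have a valid translation for pushdown.
def convertible[A, B](filters: Seq[A])(convert: A => Option[B]): Seq[B] =
  filters.flatMap(convert(_))
{code}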
[jira] [Resolved] (SPARK-5262) widen types for parameters of coalesce()
[ https://issues.apache.org/jira/browse/SPARK-5262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-5262. - Resolution: Fixed Fix Version/s: 1.3.0 Issue resolved by pull request 4057 [https://github.com/apache/spark/pull/4057] > widen types for parameters of coalesce() > > > Key: SPARK-5262 > URL: https://issues.apache.org/jira/browse/SPARK-5262 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Adrian Wang > Fix For: 1.3.0 > > > Currently Coalesce(null, 1, null) would throw exceptions. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3454) Expose JSON representation of data shown in WebUI
[ https://issues.apache.org/jira/browse/SPARK-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300834#comment-14300834 ] Ryan Williams commented on SPARK-3454: -- I think it looks great, [~imranr]! Is the logical next step to flesh out exactly what the new POJOs will be? > Expose JSON representation of data shown in WebUI > - > > Key: SPARK-3454 > URL: https://issues.apache.org/jira/browse/SPARK-3454 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 1.1.0 >Reporter: Kousuke Saruta > Attachments: sparkmonitoringjsondesign.pdf > > > If the WebUI supported extracting its data in JSON format, it would be helpful for > users who want to analyse stage / task / executor information. > Fortunately, WebUI has a renderJson method, so we can implement the method in > each subclass. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-5196) Add comment field in Create Table Field DDL
[ https://issues.apache.org/jira/browse/SPARK-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-5196. - Resolution: Fixed Issue resolved by pull request 3999 [https://github.com/apache/spark/pull/3999] > Add comment field in Create Table Field DDL > --- > > Key: SPARK-5196 > URL: https://issues.apache.org/jira/browse/SPARK-5196 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.3.0 >Reporter: shengli > Fix For: 1.3.0 > > > Support `comment` in Create Table Field DDL -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-1825) Windows Spark fails to work with Linux YARN
[ https://issues.apache.org/jira/browse/SPARK-1825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-1825. Resolution: Fixed Fix Version/s: 1.3.0 > Windows Spark fails to work with Linux YARN > --- > > Key: SPARK-1825 > URL: https://issues.apache.org/jira/browse/SPARK-1825 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 1.0.0 >Reporter: Taeyun Kim >Assignee: Masayoshi TSUZUKI > Fix For: 1.3.0 > > Attachments: SPARK-1825.patch > > > Windows Spark fails to work with Linux YARN. > This is a cross-platform problem. > This error occurs when 'yarn-client' mode is used. > (yarn-cluster/yarn-standalone mode was not tested.) > On YARN side, Hadoop 2.4.0 resolved the issue as follows: > https://issues.apache.org/jira/browse/YARN-1824 > But Spark YARN module does not incorporate the new YARN API yet, so problem > persists for Spark. > First, the following source files should be changed: > - /yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala > - > /yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnableUtil.scala > Change is as follows: > - Replace .$() to .$$() > - Replace File.pathSeparator for Environment.CLASSPATH.name to > ApplicationConstants.CLASS_PATH_SEPARATOR (import > org.apache.hadoop.yarn.api.ApplicationConstants is required for this) > Unless the above are applied, launch_container.sh will contain invalid shell > script statements(since they will contain Windows-specific separators), and > job will fail. > Also, the following symptom should also be fixed (I could not find the > relevant source code): > - SPARK_HOME environment variable is copied straight to launch_container.sh. > It should be changed to the path format for the server OS, or, the better, a > separate environment variable or a configuration variable should be created. > - '%HADOOP_MAPRED_HOME%' string still exists in launch_container.sh, after > the above change is applied. maybe I missed a few lines. > I'm not sure whether this is all, since I'm new to both Spark and YARN. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-5176) Thrift server fails with confusing error message when deploy-mode is cluster
[ https://issues.apache.org/jira/browse/SPARK-5176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-5176. Resolution: Fixed Fix Version/s: 1.3.0 Assignee: Tom Panning Target Version/s: 1.3.0 > Thrift server fails with confusing error message when deploy-mode is cluster > > > Key: SPARK-5176 > URL: https://issues.apache.org/jira/browse/SPARK-5176 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.1.0, 1.2.0 >Reporter: Tom Panning >Assignee: Tom Panning > Labels: starter > Fix For: 1.3.0 > > > With Spark 1.2.0, when I try to run > {noformat} > $SPARK_HOME/sbin/start-thriftserver.sh --deploy-mode cluster --master > spark://xd-spark.xdata.data-tactics-corp.com:7077 > {noformat} > The log output is > {noformat} > Spark assembly has been built with Hive, including Datanucleus jars on > classpath > Spark Command: /usr/java/latest/bin/java -cp > ::/home/tpanning/Projects/spark/spark-1.2.0-bin-hadoop2.4/sbin/../conf:/home/tpanning/Projects/spark/spark-1.2.0-bin-hadoop2.4/lib/spark-assembly-1.2.0-hadoop2.4.0.jar:/home/tpanning/Projects/spark/spark-1.2.0-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar:/home/tpanning/Projects/spark/spark-1.2.0-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar:/home/tpanning/Projects/spark/spark-1.2.0-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar > -XX:MaxPermSize=128m -Xms512m -Xmx512m org.apache.spark.deploy.SparkSubmit > --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 > --deploy-mode cluster --master > spark://xd-spark.xdata.data-tactics-corp.com:7077 spark-internal > > Jar url 'spark-internal' is not in valid format. > Must be a jar file path in URL format (e.g. hdfs://host:port/XX.jar, > file:///XX.jar) > Usage: DriverClient [options] launch > [driver options] > Usage: DriverClient kill > Options: >-c CORES, --cores CORESNumber of cores to request (default: 1) >-m MEMORY, --memory MEMORY Megabytes of memory to request (default: > 512) >-s, --superviseWhether to restart the driver on failure >-v, --verbose Print more debugging output > > Using Spark's default log4j profile: > org/apache/spark/log4j-defaults.properties > {noformat} > I do not get this error if deploy-mode is set to client. The --deploy-mode > option is described by the --help output, so I expected it to work. I > checked, and this behavior seems to be present in Spark 1.1.0 as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5515) Build fails with spark-ganglia-lgpl profile
[ https://issues.apache.org/jira/browse/SPARK-5515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300819#comment-14300819 ] Apache Spark commented on SPARK-5515: - User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/4303 > Build fails with spark-ganglia-lgpl profile > --- > > Key: SPARK-5515 > URL: https://issues.apache.org/jira/browse/SPARK-5515 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 1.3.0 >Reporter: Kousuke Saruta >Priority: Blocker > > Build fails with spark-ganglia-lgpl profile at the moment. This is because > pom.xml for spark-ganglia-lgpl is not updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5515) Build fails with spark-ganglia-lgpl profile
[ https://issues.apache.org/jira/browse/SPARK-5515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300818#comment-14300818 ] Andrew Or commented on SPARK-5515: -- https://github.com/apache/spark/pull/4303 > Build fails with spark-ganglia-lgpl profile > --- > > Key: SPARK-5515 > URL: https://issues.apache.org/jira/browse/SPARK-5515 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 1.3.0 >Reporter: Kousuke Saruta >Priority: Blocker > > Build fails with spark-ganglia-lgpl profile at the moment. This is because > pom.xml for spark-ganglia-lgpl is not updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-5516) ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-22] shutting down ActorSystem [sparkDriver] java.lang.OutOfMemoryError: Java
wuyukai created SPARK-5516: -- Summary: ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-22] shutting down ActorSystem [sparkDriver] java.lang.OutOfMemoryError: Java heap space Key: SPARK-5516 URL: https://issues.apache.org/jira/browse/SPARK-5516 Project: Spark Issue Type: Bug Components: MLlib Affects Versions: 1.2.0 Environment: centos 6.5 Reporter: wuyukai Fix For: 1.2.2 When we ran the model of Gradient Boosting Tree, it throwed this exception below. The data we used is only 45M. We ran these data on 4 computers that each have 4 cores and 16GB RAM. We set the parameter "gradientboostedtrees.maxiteration" 50. 15/02/01 01:39:48 INFO DAGScheduler: Job 965 failed: collectAsMap at DecisionTree.scala:653, took 1.616976 s Exception in thread "main" org.apache.spark.SparkException: Job cancelled because SparkContext was shut down at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:702) at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:701) at scala.collection.mutable.HashSet.foreach(HashSet.scala:79) at org.apache.spark.scheduler.DAGScheduler.cleanUpAfterSchedulerStop(DAGScheduler.scala:701) at org.apache.spark.scheduler.DAGSchedulerEventProcessActor.postStop(DAGScheduler.scala:1428) at akka.actor.Actor$class.aroundPostStop(Actor.scala:475) at org.apache.spark.scheduler.DAGSchedulerEventProcessActor.aroundPostStop(DAGScheduler.scala:1375) at akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:210) at akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:172) at akka.actor.ActorCell.terminate(ActorCell.scala:369) at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:462) at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478) at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263) at akka.dispatch.Mailbox.run(Mailbox.scala:219) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) 15/02/01 01:39:48 ERROR ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-22] shutting down ActorSystem [sparkDriver] java.lang.OutOfMemoryError: Java heap space at java.util.Arrays.copyOf(Arrays.java:2271) at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113) at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93) at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140) at java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1876) at java.io.ObjectOutputStream$BlockDataOutputStream.setBlockDataMode(ObjectOutputStream.java:1785) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1188) at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347) at scala.collection.immutable.$colon$colon.writeObject(List.scala:379) at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at 
java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988) at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1495) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547) at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547) at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) at java.io.ObjectOutputStream.writeObject(ObjectOutputStre
[jira] [Closed] (SPARK-4859) Refactor LiveListenerBus and StreamingListenerBus
[ https://issues.apache.org/jira/browse/SPARK-4859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-4859. Resolution: Fixed Fix Version/s: 1.3.0 Assignee: Shixiong Zhu > Refactor LiveListenerBus and StreamingListenerBus > - > > Key: SPARK-4859 > URL: https://issues.apache.org/jira/browse/SPARK-4859 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.0.0 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu > Fix For: 1.3.0 > > > [#4006|https://github.com/apache/spark/pull/4006] refactors LiveListenerBus > and StreamingListenerBus and extracts the common code to a parent class > ListenerBus. > It also includes bug fixes in > [#3710|https://github.com/apache/spark/pull/3710]: > 1. Fix the race condition of queueFullErrorMessageLogged in LiveListenerBus > and StreamingListenerBus to avoid outputting queue-full-error logs multiple > times. > 2. Make sure the SHUTDOWN message will be delivered to listenerThread, so > that we can make sure listenerThread will always be able to exit. > 3. Log the error from listener rather than crashing listenerThread in > StreamingListenerBus. > While fixing the above bugs, we found it better to make LiveListenerBus and > StreamingListenerBus have the same behaviors. That would otherwise leave a lot of > duplicated code in LiveListenerBus and StreamingListenerBus. > Therefore, I extracted their common code to ListenerBus as a parent class: > LiveListenerBus and StreamingListenerBus only need to extend ListenerBus and > implement onPostEvent (how to process an event) and onDropEvent (do something > when dropping an event). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
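A minimal sketch of the parent-class shape described above, heavily simplified; the real trait in Spark carries more machinery such as the event queue and listener registration.

{code}
// Simplified illustration: subclasses define how to deliver an event to one
// listener and what to do when an event has to be dropped.
trait ListenerBus[L, E] {
  protected def onPostEvent(listener: L, event: E): Unit
  protected def onDropEvent(event: E): Unit
}
{code}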
[jira] [Commented] (SPARK-5492) Thread statistics can break with older Hadoop versions
[ https://issues.apache.org/jira/browse/SPARK-5492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300814#comment-14300814 ] Sandy Ryza commented on SPARK-5492: --- After seeing this I tried with 1.0.4 and didn't hit anything. I guess the ec2 setup is different in some way - I'll post a patch tonight. > Thread statistics can break with older Hadoop versions > -- > > Key: SPARK-5492 > URL: https://issues.apache.org/jira/browse/SPARK-5492 > Project: Spark > Issue Type: Bug > Components: Spark Core >Reporter: Patrick Wendell >Assignee: Sandy Ryza >Priority: Blocker > > {code} > java.lang.ClassNotFoundException: > org.apache.hadoop.fs.FileSystem$Statistics$StatisticsData > at java.net.URLClassLoader$1.run(URLClassLoader.java:366) > at java.net.URLClassLoader$1.run(URLClassLoader.java:355) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:354) > at java.lang.ClassLoader.loadClass(ClassLoader.java:425) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) > at java.lang.ClassLoader.loadClass(ClassLoader.java:358) > at java.lang.Class.forName0(Native Method) > at java.lang.Class.forName(Class.java:191) > at > org.apache.spark.deploy.SparkHadoopUtil.getFileSystemThreadStatisticsMethod(SparkHadoopUtil.scala:180) > at > org.apache.spark.deploy.SparkHadoopUtil.getFSBytesReadOnThreadCallback(SparkHadoopUtil.scala:139) > at > org.apache.spark.rdd.NewHadoopRDD$$anon$1$$anonfun$2.apply(NewHadoopRDD.scala:120) > at > org.apache.spark.rdd.NewHadoopRDD$$anon$1$$anonfun$2.apply(NewHadoopRDD.scala:118) > at scala.Option.orElse(Option.scala:257) > {code} > I think the issue is we need to catch ClassNotFoundException here: > https://github.com/apache/spark/blob/b1b35ca2e440df40b253bf967bb93705d355c1c0/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L144 > However, I'm really confused how this didn't fail our unit tests, since we > explicitly tried to test this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5155) Python API for MQTT streaming
[ https://issues.apache.org/jira/browse/SPARK-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300813#comment-14300813 ] Apache Spark commented on SPARK-5155: - User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/4303 > Python API for MQTT streaming > - > > Key: SPARK-5155 > URL: https://issues.apache.org/jira/browse/SPARK-5155 > Project: Spark > Issue Type: New Feature > Components: PySpark, Streaming >Reporter: Davies Liu > > Python API for MQTT Utils -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-5515) Build fails with spark-ganglia-lgpl profile
Kousuke Saruta created SPARK-5515: - Summary: Build fails with spark-ganglia-lgpl profile Key: SPARK-5515 URL: https://issues.apache.org/jira/browse/SPARK-5515 Project: Spark Issue Type: Bug Components: Build Affects Versions: 1.3.0 Reporter: Kousuke Saruta Priority: Blocker Build fails with spark-ganglia-lgpl profile at the moment. This is because pom.xml for spark-ganglia-lgpl is not updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5492) Thread statistics can break with older Hadoop versions
[ https://issues.apache.org/jira/browse/SPARK-5492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300804#comment-14300804 ] Michael Armbrust commented on SPARK-5492: - We hit this bug using the default hadoop version for the spark-ec2 scripts (1.0.4). > Thread statistics can break with older Hadoop versions > -- > > Key: SPARK-5492 > URL: https://issues.apache.org/jira/browse/SPARK-5492 > Project: Spark > Issue Type: Bug > Components: Spark Core >Reporter: Patrick Wendell >Assignee: Sandy Ryza >Priority: Blocker > > {code} > java.lang.ClassNotFoundException: > org.apache.hadoop.fs.FileSystem$Statistics$StatisticsData > at java.net.URLClassLoader$1.run(URLClassLoader.java:366) > at java.net.URLClassLoader$1.run(URLClassLoader.java:355) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:354) > at java.lang.ClassLoader.loadClass(ClassLoader.java:425) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) > at java.lang.ClassLoader.loadClass(ClassLoader.java:358) > at java.lang.Class.forName0(Native Method) > at java.lang.Class.forName(Class.java:191) > at > org.apache.spark.deploy.SparkHadoopUtil.getFileSystemThreadStatisticsMethod(SparkHadoopUtil.scala:180) > at > org.apache.spark.deploy.SparkHadoopUtil.getFSBytesReadOnThreadCallback(SparkHadoopUtil.scala:139) > at > org.apache.spark.rdd.NewHadoopRDD$$anon$1$$anonfun$2.apply(NewHadoopRDD.scala:120) > at > org.apache.spark.rdd.NewHadoopRDD$$anon$1$$anonfun$2.apply(NewHadoopRDD.scala:118) > at scala.Option.orElse(Option.scala:257) > {code} > I think the issue is we need to catch ClassNotFoundException here: > https://github.com/apache/spark/blob/b1b35ca2e440df40b253bf967bb93705d355c1c0/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L144 > However, I'm really confused how this didn't fail our unit tests, since we > explicitly tried to test this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-5514) collect should call executeCollect
Reynold Xin created SPARK-5514: -- Summary: collect should call executeCollect Key: SPARK-5514 URL: https://issues.apache.org/jira/browse/SPARK-5514 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 1.3.0 Reporter: Reynold Xin Priority: Blocker -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5513) Add NMF option to the new ALS implementation
[ https://issues.apache.org/jira/browse/SPARK-5513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300787#comment-14300787 ] Apache Spark commented on SPARK-5513: - User 'mengxr' has created a pull request for this issue: https://github.com/apache/spark/pull/4302 > Add NMF option to the new ALS implementation > > > Key: SPARK-5513 > URL: https://issues.apache.org/jira/browse/SPARK-5513 > Project: Spark > Issue Type: New Feature > Components: ML, MLlib >Reporter: Xiangrui Meng >Assignee: Xiangrui Meng > > Then we can swap "spark.mllib"'s implementation to use the new ALS impl. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-5513) Add NMF option to the new ALS implementation
Xiangrui Meng created SPARK-5513: Summary: Add NMF option to the new ALS implementation Key: SPARK-5513 URL: https://issues.apache.org/jira/browse/SPARK-5513 Project: Spark Issue Type: New Feature Components: ML, MLlib Reporter: Xiangrui Meng Assignee: Xiangrui Meng Then we can swap "spark.mllib"'s implementation to use the new ALS impl. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-5424) Make the new ALS implementation take generic ID types
[ https://issues.apache.org/jira/browse/SPARK-5424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-5424. -- Resolution: Fixed Fix Version/s: 1.3.0 Issue resolved by pull request 4281 [https://github.com/apache/spark/pull/4281] > Make the new ALS implementation take generic ID types > - > > Key: SPARK-5424 > URL: https://issues.apache.org/jira/browse/SPARK-5424 > Project: Spark > Issue Type: Improvement > Components: MLlib, Spark ML >Reporter: Xiangrui Meng >Assignee: Xiangrui Meng > Fix For: 1.3.0 > > > The new implementation uses local indices of users and items. So the input > user/item type could be generic, at least specialized for Int and Long. We > can expose the generic interface as a developer API. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4705) Driver retries in yarn-cluster mode always fail if event logging is enabled
[ https://issues.apache.org/jira/browse/SPARK-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300762#comment-14300762 ] bc Wong commented on SPARK-4705: [~twinkle], have you had a chance to work on this? > Driver retries in yarn-cluster mode always fail if event logging is enabled > --- > > Key: SPARK-4705 > URL: https://issues.apache.org/jira/browse/SPARK-4705 > Project: Spark > Issue Type: Bug > Components: Spark Core, YARN >Affects Versions: 1.2.0 >Reporter: Marcelo Vanzin > > yarn-cluster mode will retry running the driver in certain failure modes. If > event logging is enabled, this will most probably fail, because: > {noformat} > Exception in thread "Driver" java.io.IOException: Log directory > hdfs://vanzin-krb-1.vpc.cloudera.com:8020/user/spark/applicationHistory/application_1417554558066_0003 > already exists! > at org.apache.spark.util.FileLogger.createLogDir(FileLogger.scala:129) > at org.apache.spark.util.FileLogger.start(FileLogger.scala:115) > at > org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:74) > at org.apache.spark.SparkContext.(SparkContext.scala:353) > {noformat} > The event log path should be "more unique". Or perhaps retries of the same app > should clean up the old logs first. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-1180) Allow to provide a custom persistence engine
[ https://issues.apache.org/jira/browse/SPARK-1180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Davidson resolved SPARK-1180. --- Resolution: Duplicate Fix Version/s: 1.3.0 Assignee: Prashant Sharma > Allow to provide a custom persistence engine > > > Key: SPARK-1180 > URL: https://issues.apache.org/jira/browse/SPARK-1180 > Project: Spark > Issue Type: Improvement >Reporter: Jacek Lewandowski >Assignee: Prashant Sharma >Priority: Minor > Fix For: 1.3.0 > > > Currently Spark supports only predefined ZOOKEEPER and FILESYSTEM persistence > engines. It would be nice to give a possibility to provide custom persistence > engine by specifying a class name in {{spark.deploy.recoveryMode}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-5207) StandardScalerModel mean and variance re-use
[ https://issues.apache.org/jira/browse/SPARK-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-5207. -- Resolution: Fixed Fix Version/s: 1.3.0 Issue resolved by pull request 4140 [https://github.com/apache/spark/pull/4140] > StandardScalerModel mean and variance re-use > > > Key: SPARK-5207 > URL: https://issues.apache.org/jira/browse/SPARK-5207 > Project: Spark > Issue Type: Improvement > Components: MLlib >Reporter: Octavian Geagla >Assignee: Octavian Geagla > Fix For: 1.3.0 > > > From this discussion: > http://apache-spark-developers-list.1001551.n3.nabble.com/Re-use-scaling-means-and-variances-from-StandardScalerModel-td10073.html > Changing the constructor to public would be a simple change, but a discussion is > needed to determine what args are necessary for this change. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
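A rough sketch of the constructor-based reuse being discussed, with hypothetical names and fields; this is not the signature that was merged.

{code}
import org.apache.spark.mllib.linalg.Vector

// Hypothetical: a scaler model rebuilt directly from previously computed
// mean and variance vectors so they can be reused across datasets.
class ReusableScalerModel(val mean: Vector,
                          val variance: Vector,
                          val withMean: Boolean,
                          val withStd: Boolean)
{code}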
[jira] [Commented] (SPARK-2356) Exception: Could not locate executable null\bin\winutils.exe in the Hadoop
[ https://issues.apache.org/jira/browse/SPARK-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300243#comment-14300243 ] DeepakVohra commented on SPARK-2356: Thanks Sean. HADOOP_CONF_DIR shouldn't be required to be set if Hadoop is not used. Hadoop doesn't even get installed on Windows. > Exception: Could not locate executable null\bin\winutils.exe in the Hadoop > --- > > Key: SPARK-2356 > URL: https://issues.apache.org/jira/browse/SPARK-2356 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.0.0 >Reporter: Kostiantyn Kudriavtsev >Priority: Critical > > I'm trying to run some transformation on Spark, it works fine on cluster > (YARN, linux machines). However, when I'm trying to run it on local machine > (Windows 7) under unit test, I got errors (I don't use Hadoop, I'm read file > from local filesystem): > {code} > 14/07/02 19:59:31 WARN NativeCodeLoader: Unable to load native-hadoop library > for your platform... using builtin-java classes where applicable > 14/07/02 19:59:31 ERROR Shell: Failed to locate the winutils binary in the > hadoop binary path > java.io.IOException: Could not locate executable null\bin\winutils.exe in the > Hadoop binaries. > at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318) > at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333) > at org.apache.hadoop.util.Shell.(Shell.java:326) > at org.apache.hadoop.util.StringUtils.(StringUtils.java:76) > at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93) > at org.apache.hadoop.security.Groups.(Groups.java:77) > at > org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:240) > at > org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:255) > at > org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:283) > at > org.apache.spark.deploy.SparkHadoopUtil.(SparkHadoopUtil.scala:36) > at > org.apache.spark.deploy.SparkHadoopUtil$.(SparkHadoopUtil.scala:109) > at > org.apache.spark.deploy.SparkHadoopUtil$.(SparkHadoopUtil.scala) > at org.apache.spark.SparkContext.(SparkContext.scala:228) > at org.apache.spark.SparkContext.(SparkContext.scala:97) > {code} > It's happened because Hadoop config is initialized each time when spark > context is created regardless is hadoop required or not. > I propose to add some special flag to indicate if hadoop config is required > (or start this configuration manually) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5512) Run the PIC algorithm with degree vector
[ https://issues.apache.org/jira/browse/SPARK-5512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-5512: --- Priority: Minor (was: Major) > Run the PIC algorithm with degree vector > > > Key: SPARK-5512 > URL: https://issues.apache.org/jira/browse/SPARK-5512 > Project: Spark > Issue Type: Improvement >Reporter: Liang-Chi Hsieh >Priority: Minor > > As suggested by the Power Iteration Clustering paper, it is useful to set > the initial vector v0 as the degree vector d. This PR adds a method to run > PIC with that initialization. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5512) Run the PIC algorithm with degree vector
[ https://issues.apache.org/jira/browse/SPARK-5512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300191#comment-14300191 ] Apache Spark commented on SPARK-5512: - User 'viirya' has created a pull request for this issue: https://github.com/apache/spark/pull/4301 > Run the PIC algorithm with degree vector > > > Key: SPARK-5512 > URL: https://issues.apache.org/jira/browse/SPARK-5512 > Project: Spark > Issue Type: Improvement >Reporter: Liang-Chi Hsieh >Priority: Minor > > As suggested by the Power Iteration Clustering paper, it is useful to set > the initial vector v0 as the degree vector d. This PR adds a method to run > PIC with that initialization. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-5512) Run the PIC algorithm with degree vector
Liang-Chi Hsieh created SPARK-5512: -- Summary: Run the PIC algorithm with degree vector Key: SPARK-5512 URL: https://issues.apache.org/jira/browse/SPARK-5512 Project: Spark Issue Type: Improvement Reporter: Liang-Chi Hsieh As suggested by the Power Iteration Clustering paper, it is useful to set the initial vector v0 as the degree vector d. This PR adds a method to run PIC with that initialization. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
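A small sketch of the initialization being described, on a plain local affinity matrix; the normalization of v0 to sum to 1 is an assumption made for illustration, not necessarily the PR's exact code.

{code}
// Degree-vector initialization: d(i) is the row sum of the affinity matrix,
// normalized here so that v0 sums to 1.
def degreeInitialVector(affinity: Array[Array[Double]]): Array[Double] = {
  val degrees = affinity.map(_.sum)
  val total = degrees.sum
  degrees.map(_ / total)
}
{code}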
[jira] [Closed] (SPARK-5510) How can I fix the spark-submit script and then run the program on a cluster?
[ https://issues.apache.org/jira/browse/SPARK-5510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen closed SPARK-5510. > How can I fix the spark-submit script and then run the program on a cluster? > --- > > Key: SPARK-5510 > URL: https://issues.apache.org/jira/browse/SPARK-5510 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 1.0.2 >Reporter: hash-x > Labels: Help!!, spark-submit > > Reference: My question is how I can fix the script so that I can submit a program > to the master from my laptop rather than from a cluster node. Submitting the program > from Node 2 works for me, but submitting from the laptop does not. How can I fix this? > I have read the email below and accept recommendation 1 (run spark-shell from a cluster > node), but I would like to solve the problem using recommendation 2, and I am confused about how. > Hi Ken, > This is unfortunately a limitation of spark-shell and the way it works in > standalone mode. > spark-shell sets an environment variable, SPARK_HOME, which tells Spark where > to find its code installed on the cluster. This means that the path on your laptop must > be the same as on the cluster, which is not the case. I recommend one of two things: > 1) Either run spark-shell from a cluster node, where it will have the right > path. (In general it’s also better for performance to have it close to the cluster.) > 2) Or, edit the spark-shell script and re-export SPARK_HOME right before it > runs the Java command (ugly but will probably work). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-5510) How can I fix the spark-submit script and then run the program on a cluster?
[ https://issues.apache.org/jira/browse/SPARK-5510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hash-x updated SPARK-5510: -- Comment: was deleted (was: Mailing list? What is the site? Could you give me the link? OK, thank you! I am a beginner at Spark and Scala.) > How can I fix the spark-submit script and then run the program on a cluster? > --- > > Key: SPARK-5510 > URL: https://issues.apache.org/jira/browse/SPARK-5510 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 1.0.2 >Reporter: hash-x > Labels: Help!!, spark-submit > > Reference: My question is how I can fix the script so that I can submit a program > to the master from my laptop rather than from a cluster node. Submitting the program > from Node 2 works for me, but submitting from the laptop does not. How can I fix this? > I have read the email below and accept recommendation 1 (run spark-shell from a cluster > node), but I would like to solve the problem using recommendation 2, and I am confused about how. > Hi Ken, > This is unfortunately a limitation of spark-shell and the way it works in > standalone mode. > spark-shell sets an environment variable, SPARK_HOME, which tells Spark where > to find its code installed on the cluster. This means that the path on your laptop must > be the same as on the cluster, which is not the case. I recommend one of two things: > 1) Either run spark-shell from a cluster node, where it will have the right > path. (In general it’s also better for performance to have it close to the cluster.) > 2) Or, edit the spark-shell script and re-export SPARK_HOME right before it > runs the Java command (ugly but will probably work). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5510) How can I fix the spark-submit script and then run the program on a cluster?
[ https://issues.apache.org/jira/browse/SPARK-5510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300167#comment-14300167 ] hash-x commented on SPARK-5510: --- Mailing list? What is the site? Could you give me the link? OK, thank you! I am a beginner at Spark and Scala. > How can I fix the spark-submit script and then run the program on a cluster? > --- > > Key: SPARK-5510 > URL: https://issues.apache.org/jira/browse/SPARK-5510 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 1.0.2 >Reporter: hash-x > Labels: Help!!, spark-submit > > Reference: My question is how I can fix the script so that I can submit a program > to the master from my laptop rather than from a cluster node. Submitting the program > from Node 2 works for me, but submitting from the laptop does not. How can I fix this? > I have read the email below and accept recommendation 1 (run spark-shell from a cluster > node), but I would like to solve the problem using recommendation 2, and I am confused about how. > Hi Ken, > This is unfortunately a limitation of spark-shell and the way it works in > standalone mode. > spark-shell sets an environment variable, SPARK_HOME, which tells Spark where > to find its code installed on the cluster. This means that the path on your laptop must > be the same as on the cluster, which is not the case. I recommend one of two things: > 1) Either run spark-shell from a cluster node, where it will have the right > path. (In general it’s also better for performance to have it close to the cluster.) > 2) Or, edit the spark-shell script and re-export SPARK_HOME right before it > runs the Java command (ugly but will probably work). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
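The path mismatch described in the quoted email can also be worked around programmatically when the driver is a standalone application rather than spark-shell: SparkConf exposes setSparkHome, which tells the standalone workers where Spark is installed on the cluster nodes, independently of where it lives on the laptop. A minimal sketch, assuming a standalone master at spark://master:7077 and a cluster-side install under /opt/spark (both hypothetical values); this is a related approach, not the spark-shell script edit the email recommends:

{code}
import org.apache.spark.{SparkConf, SparkContext}

object SubmitFromLaptop {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("submit-from-laptop")
      // Assumption: the standalone master URL of the cluster.
      .setMaster("spark://master:7077")
      // Tell the workers where Spark is installed on the cluster nodes,
      // even though the laptop's local install lives somewhere else.
      .setSparkHome("/opt/spark")

    val sc = new SparkContext(conf)
    println(sc.parallelize(1 to 100).reduce(_ + _))
    sc.stop()
  }
}
{code}

The driver still needs network access to the master and workers, so this only helps when the laptop can reach the cluster directly.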