[jira] [Resolved] (SPARK-5478) Add missing right parenthesis in Stage page Pending stages label

2015-02-01 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-5478.

   Resolution: Fixed
Fix Version/s: 1.3.0
 Assignee: Saisai Shao

> Add missing right parenthesis in Stage page Pending stages label
> -
>
> Key: SPARK-5478
> URL: https://issues.apache.org/jira/browse/SPARK-5478
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.3.0
>Reporter: Saisai Shao
>Assignee: Saisai Shao
>Priority: Minor
> Fix For: 1.3.0
>
>
> A right parenthesis is missing in one label; this is a minor UI problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5478) Add missing right parenthesis in Stage page Pending stages label

2015-02-01 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-5478:
---
Affects Version/s: 1.3.0

> Add missing right parenthesis in Stage page Pending stages label
> -
>
> Key: SPARK-5478
> URL: https://issues.apache.org/jira/browse/SPARK-5478
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.3.0
>Reporter: Saisai Shao
>Assignee: Saisai Shao
>Priority: Minor
> Fix For: 1.3.0
>
>
> A right parenthesis is missing in one label; this is a minor UI problem.






[jira] [Commented] (SPARK-5492) Thread statistics can break with older Hadoop versions

2015-02-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300968#comment-14300968
 ] 

Apache Spark commented on SPARK-5492:
-

User 'sryza' has created a pull request for this issue:
https://github.com/apache/spark/pull/4305

> Thread statistics can break with older Hadoop versions
> --
>
> Key: SPARK-5492
> URL: https://issues.apache.org/jira/browse/SPARK-5492
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Patrick Wendell
>Assignee: Sandy Ryza
>Priority: Blocker
>
> {code}
>  java.lang.ClassNotFoundException: 
> org.apache.hadoop.fs.FileSystem$Statistics$StatisticsData
> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:191)
> at 
> org.apache.spark.deploy.SparkHadoopUtil.getFileSystemThreadStatisticsMethod(SparkHadoopUtil.scala:180)
> at 
> org.apache.spark.deploy.SparkHadoopUtil.getFSBytesReadOnThreadCallback(SparkHadoopUtil.scala:139)
> at 
> org.apache.spark.rdd.NewHadoopRDD$$anon$1$$anonfun$2.apply(NewHadoopRDD.scala:120)
> at 
> org.apache.spark.rdd.NewHadoopRDD$$anon$1$$anonfun$2.apply(NewHadoopRDD.scala:118)
> at scala.Option.orElse(Option.scala:257)
> {code}
> I think the issue is we need to catch ClassNotFoundException here:
> https://github.com/apache/spark/blob/b1b35ca2e440df40b253bf967bb93705d355c1c0/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L144
> However, I'm really confused how this didn't fail our unit tests, since we 
> explicitly tried to test this.
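A minimal sketch of the defensive lookup discussed above (this is not the actual
SparkHadoopUtil code; the fallback behaviour is an assumption):

{code}
import scala.util.control.NonFatal

// Look up the thread-level statistics class reflectively and treat its absence
// as "feature unavailable" instead of letting ClassNotFoundException propagate.
def threadStatsDataClass(): Option[Class[_]] =
  try {
    Some(Class.forName("org.apache.hadoop.fs.FileSystem$Statistics$StatisticsData"))
  } catch {
    case _: ClassNotFoundException => None // older Hadoop versions do not ship this class
    case NonFatal(_) => None               // stay conservative about other reflection errors
  }
{code}

Callers could then skip the bytes-read callback entirely when the class is missing.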






[jira] [Commented] (SPARK-5510) How can I fix the spark-submit script and then run the program on a cluster?

2015-02-01 Thread yuhao yang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300939#comment-14300939
 ] 

yuhao yang commented on SPARK-5510:
---

https://spark.apache.org/community.html
check the mailing list section.

> How can I fix the spark-submit script and then run the program on a cluster?
> ---
>
> Key: SPARK-5510
> URL: https://issues.apache.org/jira/browse/SPARK-5510
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 1.0.2
>Reporter: hash-x
>  Labels: Help!!, spark-submit
>
> Reference: My question is how I can fix the script so that I can submit the 
> program to a Master from my laptop, rather than having to submit it from a 
> cluster node. Submitting the program from Node 2 works for me, but submitting 
> from my laptop does not. How can I fix this?
> I have read the email quoted below and I accept recommendation 1 (run 
> spark-shell from a cluster node), but I would like to solve the problem using 
> recommendation 2, and I am confused about how to do that.
> Hi Ken,
> This is unfortunately a limitation of spark-shell and the way it works in 
> standalone mode.
> spark-shell sets an environment variable, SPARK_HOME, which tells Spark where 
> to find its
> code installed on the cluster. This means that the path on your laptop must 
> be the same as
> on the cluster, which is not the case. I recommend one of two things:
> 1) Either run spark-shell from a cluster node, where it will have the right 
> path. (In general
> it’s also better for performance to have it close to the cluster)
> 2) Or, edit the spark-shell script and re-export SPARK_HOME right before it 
> runs the Java
> command (ugly but will probably work).






[jira] [Commented] (SPARK-5021) GaussianMixtureEM should be faster for SparseVector input

2015-02-01 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300931#comment-14300931
 ] 

Joseph K. Bradley commented on SPARK-5021:
--

It should be similar to GLMs: take an RDD of Vectors. Inside GaussianMixture, 
you can pattern match on each Vector to see whether it is a DenseVector or a 
SparseVector and handle each case as needed.
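A small illustrative sketch of that match-case approach (sumOfSquares is a made-up
example, not part of GaussianMixture):

{code}
import org.apache.spark.mllib.linalg.{DenseVector, SparseVector, Vector}

// The sparse branch only touches the stored non-zero values, so its cost stays
// linear in the number of non-zeros.
def sumOfSquares(v: Vector): Double = v match {
  case dv: DenseVector  => dv.values.map(x => x * x).sum
  case sv: SparseVector => sv.values.map(x => x * x).sum
}
{code}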

> GaussianMixtureEM should be faster for SparseVector input
> -
>
> Key: SPARK-5021
> URL: https://issues.apache.org/jira/browse/SPARK-5021
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 1.3.0
>Reporter: Joseph K. Bradley
>Assignee: Manoj Kumar
>
> GaussianMixtureEM currently converts everything to dense vectors.  It would 
> be nice if it were faster for SparseVectors (running in time linear in the 
> number of non-zero values).
> However, this may not be too important since clustering should rarely be done 
> in high dimensions.






[jira] [Commented] (SPARK-5021) GaussianMixtureEM should be faster for SparseVector input

2015-02-01 Thread Manoj Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300927#comment-14300927
 ] 

Manoj Kumar commented on SPARK-5021:


I see that it is resolved in master.

What do you think should be the preferred datatype, to handle an array of 
SparseVectors? Do we use CoordinateMatrix? This might involve improving 
CoordinateMatrix to add additional functionality.

> GaussianMixtureEM should be faster for SparseVector input
> -
>
> Key: SPARK-5021
> URL: https://issues.apache.org/jira/browse/SPARK-5021
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 1.3.0
>Reporter: Joseph K. Bradley
>Assignee: Manoj Kumar
>
> GaussianMixtureEM currently converts everything to dense vectors.  It would 
> be nice if it were faster for SparseVectors (running in time linear in the 
> number of non-zero values).
> However, this may not be too important since clustering should rarely be done 
> in high dimensions.






[jira] [Updated] (SPARK-5523) TaskMetrics and TaskInfo have innumerable copies of the hostname string

2015-02-01 Thread Tathagata Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tathagata Das updated SPARK-5523:
-
Description: 
 TaskMetrics and TaskInfo objects carry the hostname associated with the task. 
As these objects are created (directly or through deserialization of RPC 
messages), each of them gets a separate String object for the hostname, even 
though most of them contain the same string data. This results in thousands of 
string objects and increases the memory requirements of the driver. 
These strings can easily be deduplicated when deserializing a TaskMetrics 
object, or when creating a TaskInfo object.

This affects streaming particularly badly because of the high rate of 
job/stage/task generation. 

For a solution, see how this deduplication is done for StorageLevel: 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/StorageLevel.scala#L226
 

  was:
 TaskMetrics and TaskInfo objects carry the hostname associated with the task. 
As these objects are created (directly or through deserialization of RPC 
messages), each of them gets a separate String object for the hostname, even 
though most of them contain the same string data. This results in thousands of 
string objects and increases the memory requirements of the driver. 
These strings can easily be deduplicated when deserializing a TaskMetrics 
object, or when creating a TaskInfo object (in TaskSchedulerImpl).

This affects streaming particularly badly because of the high rate of 
job/stage/task generation. 

For a solution, see how this deduplication is done for StorageLevel: 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/StorageLevel.scala#L226
 


> TaskMetrics and TaskInfo have innumerable copies of the hostname string
> ---
>
> Key: SPARK-5523
> URL: https://issues.apache.org/jira/browse/SPARK-5523
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Streaming
>Reporter: Tathagata Das
>
>  TaskMetrics and TaskInfo objects carry the hostname associated with the task. 
> As these objects are created (directly or through deserialization of RPC 
> messages), each of them gets a separate String object for the hostname, even 
> though most of them contain the same string data. This results in thousands of 
> string objects and increases the memory requirements of the driver. 
> These strings can easily be deduplicated when deserializing a TaskMetrics 
> object, or when creating a TaskInfo object.
> This affects streaming particularly badly because of the high rate of 
> job/stage/task generation. 
> For a solution, see how this deduplication is done for StorageLevel: 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/StorageLevel.scala#L226
>  
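A minimal sketch of the StorageLevel-style deduplication suggested here (the object
and method names are illustrative assumptions, not actual Spark code):

{code}
import java.util.concurrent.ConcurrentHashMap

// Keep one canonical String per distinct hostname and reuse it whenever a
// TaskInfo/TaskMetrics object is created or deserialized.
object HostnameCache {
  private val cache = new ConcurrentHashMap[String, String]()

  def canonicalize(host: String): String = {
    val existing = cache.putIfAbsent(host, host)
    if (existing == null) host else existing
  }
}

// e.g. when building a TaskInfo: val host = HostnameCache.canonicalize(rawHost)
{code}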






[jira] [Updated] (SPARK-5523) TaskMetrics and TaskInfo have innumerable copies of the hostname string

2015-02-01 Thread Tathagata Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tathagata Das updated SPARK-5523:
-
Description: 
 TaskMetrics and TaskInfo objects carry the hostname associated with the task. 
As these objects are created (directly or through deserialization of RPC 
messages), each of them gets a separate String object for the hostname, even 
though most of them contain the same string data. This results in thousands of 
string objects and increases the memory requirements of the driver. 
These strings can easily be deduplicated when deserializing a TaskMetrics 
object, or when creating a TaskInfo object (in TaskSchedulerImpl).

This affects streaming particularly badly because of the high rate of 
job/stage/task generation. 

For a solution, see how this deduplication is done for StorageLevel: 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/StorageLevel.scala#L226
 

  was:
 TaskMetrics and TaskInfo objects carry the hostname associated with the task. 
As these objects are created (directly or through deserialization of RPC 
messages), each of them gets a separate String object for the hostname, even 
though most of them contain the same string data. This results in thousands of 
string objects and increases the memory requirements of the driver. 
These strings can easily be deduplicated when deserializing a TaskMetrics 
object, or when creating a TaskInfo object (in TaskSchedulerImpl).

This affects streaming particularly badly because of the high rate of 
job/stage/task generation. 

 


> TaskMetrics and TaskInfo have innumerable copies of the hostname string
> ---
>
> Key: SPARK-5523
> URL: https://issues.apache.org/jira/browse/SPARK-5523
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Streaming
>Reporter: Tathagata Das
>
>  TaskMetrics and TaskInfo objects carry the hostname associated with the task. 
> As these objects are created (directly or through deserialization of RPC 
> messages), each of them gets a separate String object for the hostname, even 
> though most of them contain the same string data. This results in thousands of 
> string objects and increases the memory requirements of the driver. 
> These strings can easily be deduplicated when deserializing a TaskMetrics 
> object, or when creating a TaskInfo object (in TaskSchedulerImpl).
> This affects streaming particularly badly because of the high rate of 
> job/stage/task generation. 
> For a solution, see how this deduplication is done for StorageLevel: 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/StorageLevel.scala#L226
>  






[jira] [Updated] (SPARK-5523) TaskMetrics and TaskInfo have innumerable copies of the hostname string

2015-02-01 Thread Tathagata Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tathagata Das updated SPARK-5523:
-
Description: 
 TaskMetrics and TaskInfo objects carry the hostname associated with the task. 
As these objects are created (directly or through deserialization of RPC 
messages), each of them gets a separate String object for the hostname, even 
though most of them contain the same string data. This results in thousands of 
string objects and increases the memory requirements of the driver. 
These strings can easily be deduplicated when deserializing a TaskMetrics 
object, or when creating a TaskInfo object (in TaskSchedulerImpl).

This affects streaming particularly badly because of the high rate of 
job/stage/task generation. 

 

  was:
 TaskMetrics and TaskInfo objects carry the hostname associated with the task. 
As these objects are created (directly or through deserialization of RPC 
messages), each of them gets a separate String object for the hostname, even 
though most of them contain the same string data. This results in thousands of 
string objects and increases the memory requirements of the driver. 
These strings can easily be deduplicated when deserializing a TaskMetrics 
object, or when creating a TaskInfo object (in TaskSchedulerImpl).

 


> TaskMetrics and TaskInfo have innumerable copies of the hostname string
> ---
>
> Key: SPARK-5523
> URL: https://issues.apache.org/jira/browse/SPARK-5523
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Streaming
>Reporter: Tathagata Das
>
>  TaskMetrics and TaskInfo objects carry the hostname associated with the task. 
> As these objects are created (directly or through deserialization of RPC 
> messages), each of them gets a separate String object for the hostname, even 
> though most of them contain the same string data. This results in thousands of 
> string objects and increases the memory requirements of the driver. 
> These strings can easily be deduplicated when deserializing a TaskMetrics 
> object, or when creating a TaskInfo object (in TaskSchedulerImpl).
> This affects streaming particularly badly because of the high rate of 
> job/stage/task generation. 
>  






[jira] [Updated] (SPARK-5523) TaskMetrics and TaskInfo have innumerable copies of the hostname string

2015-02-01 Thread Tathagata Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tathagata Das updated SPARK-5523:
-
Component/s: Streaming
 Spark Core

> TaskMetrics and TaskInfo have innumerable copies of the hostname string
> ---
>
> Key: SPARK-5523
> URL: https://issues.apache.org/jira/browse/SPARK-5523
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Streaming
>Reporter: Tathagata Das
>
>  TaskMetrics and TaskInfo objects carry the hostname associated with the task. 
> As these objects are created (directly or through deserialization of RPC 
> messages), each of them gets a separate String object for the hostname, even 
> though most of them contain the same string data. This results in thousands of 
> string objects and increases the memory requirements of the driver. 
> These strings can easily be deduplicated when deserializing a TaskMetrics 
> object, or when creating a TaskInfo object (in TaskSchedulerImpl).
>  






[jira] [Created] (SPARK-5523) TaskMetrics and TaskInfo have innumerable copies of the hostname string

2015-02-01 Thread Tathagata Das (JIRA)
Tathagata Das created SPARK-5523:


 Summary: TaskMetrics and TaskInfo have innumerable copies of the 
hostname string
 Key: SPARK-5523
 URL: https://issues.apache.org/jira/browse/SPARK-5523
 Project: Spark
  Issue Type: Bug
Reporter: Tathagata Das


 TaskMetrics and TaskInfo objects carry the hostname associated with the task. 
As these objects are created (directly or through deserialization of RPC 
messages), each of them gets a separate String object for the hostname, even 
though most of them contain the same string data. This results in thousands of 
string objects and increases the memory requirements of the driver. 
These strings can easily be deduplicated when deserializing a TaskMetrics 
object, or when creating a TaskInfo object (in TaskSchedulerImpl).

 






[jira] [Commented] (SPARK-4349) Spark driver hangs on sc.parallelize() if exception is thrown during serialization

2015-02-01 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300913#comment-14300913
 ] 

Patrick Wendell commented on SPARK-4349:


[~mcheah] - minor, but when we close an issue that is a duplicate, we typically 
resolve it as "Duplicate" instead of fixed. Also the duplication link usually 
goes in the other direction, i.e. we say that this duplicates SPARK-4737 rather 
than "is duplicated by", because SPARK-4737 was merged.

> Spark driver hangs on sc.parallelize() if exception is thrown during 
> serialization
> --
>
> Key: SPARK-4349
> URL: https://issues.apache.org/jira/browse/SPARK-4349
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.0
>Reporter: Matt Cheah
>Priority: Critical
>
> Executing the following in the Spark Shell will lead to the Spark Shell 
> hanging after a stack trace is printed. The serializer is set to the Kryo 
> serializer.
> {code}
> scala> import com.esotericsoftware.kryo.io.Input
> import com.esotericsoftware.kryo.io.Input
> scala> import com.esotericsoftware.kryo.io.Output
> import com.esotericsoftware.kryo.io.Output
> scala> class MyKryoSerializable extends 
> com.esotericsoftware.kryo.KryoSerializable { def write (kryo: 
> com.esotericsoftware.kryo.Kryo, output: Output) { throw new 
> com.esotericsoftware.kryo.KryoException; } ; def read (kryo: 
> com.esotericsoftware.kryo.Kryo, input: Input) { throw new 
> com.esotericsoftware.kryo.KryoException; } }
> defined class MyKryoSerializable
> scala> sc.parallelize(Seq(new MyKryoSerializable, new 
> MyKryoSerializable)).collect
> {code}
> A stack trace is printed during serialization as expected, but another stack 
> trace is printed afterwards, indicating that the driver can't recover:
> {code}
> 14/11/11 14:10:03 ERROR OneForOneStrategy: actor name [ExecutorActor] is not 
> unique!
> akka.actor.PostRestartException: exception post restart (class 
> java.io.IOException)
>   at 
> akka.actor.dungeon.FaultHandling$$anonfun$6.apply(FaultHandling.scala:249)
>   at 
> akka.actor.dungeon.FaultHandling$$anonfun$6.apply(FaultHandling.scala:247)
>   at 
> akka.actor.dungeon.FaultHandling$$anonfun$handleNonFatalOrInterruptedException$1.applyOrElse(FaultHandling.scala:302)
>   at 
> akka.actor.dungeon.FaultHandling$$anonfun$handleNonFatalOrInterruptedException$1.applyOrElse(FaultHandling.scala:297)
>   at 
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
>   at 
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
>   at 
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
>   at 
> akka.actor.dungeon.FaultHandling$class.finishRecreate(FaultHandling.scala:247)
>   at 
> akka.actor.dungeon.FaultHandling$class.faultRecreate(FaultHandling.scala:76)
>   at akka.actor.ActorCell.faultRecreate(ActorCell.scala:369)
>   at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:459)
>   at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)
>   at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263)
>   at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>   at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>   at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: akka.actor.InvalidActorNameException: actor name [ExecutorActor] 
> is not unique!
>   at 
> akka.actor.dungeon.ChildrenContainer$NormalChildrenContainer.reserve(ChildrenContainer.scala:130)
>   at akka.actor.dungeon.Children$class.reserveChild(Children.scala:77)
>   at akka.actor.ActorCell.reserveChild(ActorCell.scala:369)
>   at akka.actor.dungeon.Children$class.makeChild(Children.scala:202)
>   at akka.actor.dungeon.Children$class.attachChild(Children.scala:42)
>   at akka.actor.ActorCell.attachChild(ActorCell.scala:369)
>   at akka.actor.ActorSystemImpl.actorOf(ActorSystem.scala:552)
>   at org.apache.spark.executor.Executor.<init>(Executor.scala:97)
>   at 
> org.apache.spark.scheduler.local.LocalActor.<init>(LocalBackend.scala:53)
>   at 
> org.apache.spark.scheduler.local.LocalBackend$$anonfun$start$1.apply(LocalBackend.scala:96)
>   at 
> org.apache.spark.scheduler.local.LocalBackend$$anonfun$start$1.apply(LocalBackend.scala:96)
>   at akka.actor.TypedCreatorFunctionConsumer.produce(Props

[jira] [Reopened] (SPARK-4349) Spark driver hangs on sc.parallelize() if exception is thrown during serialization

2015-02-01 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell reopened SPARK-4349:


> Spark driver hangs on sc.parallelize() if exception is thrown during 
> serialization
> --
>
> Key: SPARK-4349
> URL: https://issues.apache.org/jira/browse/SPARK-4349
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.0
>Reporter: Matt Cheah
>Priority: Critical
>
> Executing the following in the Spark Shell will lead to the Spark Shell 
> hanging after a stack trace is printed. The serializer is set to the Kryo 
> serializer.
> {code}
> scala> import com.esotericsoftware.kryo.io.Input
> import com.esotericsoftware.kryo.io.Input
> scala> import com.esotericsoftware.kryo.io.Output
> import com.esotericsoftware.kryo.io.Output
> scala> class MyKryoSerializable extends 
> com.esotericsoftware.kryo.KryoSerializable { def write (kryo: 
> com.esotericsoftware.kryo.Kryo, output: Output) { throw new 
> com.esotericsoftware.kryo.KryoException; } ; def read (kryo: 
> com.esotericsoftware.kryo.Kryo, input: Input) { throw new 
> com.esotericsoftware.kryo.KryoException; } }
> defined class MyKryoSerializable
> scala> sc.parallelize(Seq(new MyKryoSerializable, new 
> MyKryoSerializable)).collect
> {code}
> A stack trace is printed during serialization as expected, but another stack 
> trace is printed afterwards, indicating that the driver can't recover:
> {code}
> 14/11/11 14:10:03 ERROR OneForOneStrategy: actor name [ExecutorActor] is not 
> unique!
> akka.actor.PostRestartException: exception post restart (class 
> java.io.IOException)
>   at 
> akka.actor.dungeon.FaultHandling$$anonfun$6.apply(FaultHandling.scala:249)
>   at 
> akka.actor.dungeon.FaultHandling$$anonfun$6.apply(FaultHandling.scala:247)
>   at 
> akka.actor.dungeon.FaultHandling$$anonfun$handleNonFatalOrInterruptedException$1.applyOrElse(FaultHandling.scala:302)
>   at 
> akka.actor.dungeon.FaultHandling$$anonfun$handleNonFatalOrInterruptedException$1.applyOrElse(FaultHandling.scala:297)
>   at 
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
>   at 
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
>   at 
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
>   at 
> akka.actor.dungeon.FaultHandling$class.finishRecreate(FaultHandling.scala:247)
>   at 
> akka.actor.dungeon.FaultHandling$class.faultRecreate(FaultHandling.scala:76)
>   at akka.actor.ActorCell.faultRecreate(ActorCell.scala:369)
>   at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:459)
>   at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)
>   at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263)
>   at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>   at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>   at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: akka.actor.InvalidActorNameException: actor name [ExecutorActor] 
> is not unique!
>   at 
> akka.actor.dungeon.ChildrenContainer$NormalChildrenContainer.reserve(ChildrenContainer.scala:130)
>   at akka.actor.dungeon.Children$class.reserveChild(Children.scala:77)
>   at akka.actor.ActorCell.reserveChild(ActorCell.scala:369)
>   at akka.actor.dungeon.Children$class.makeChild(Children.scala:202)
>   at akka.actor.dungeon.Children$class.attachChild(Children.scala:42)
>   at akka.actor.ActorCell.attachChild(ActorCell.scala:369)
>   at akka.actor.ActorSystemImpl.actorOf(ActorSystem.scala:552)
>   at org.apache.spark.executor.Executor.<init>(Executor.scala:97)
>   at 
> org.apache.spark.scheduler.local.LocalActor.<init>(LocalBackend.scala:53)
>   at 
> org.apache.spark.scheduler.local.LocalBackend$$anonfun$start$1.apply(LocalBackend.scala:96)
>   at 
> org.apache.spark.scheduler.local.LocalBackend$$anonfun$start$1.apply(LocalBackend.scala:96)
>   at akka.actor.TypedCreatorFunctionConsumer.produce(Props.scala:343)
>   at akka.actor.Props.newActor(Props.scala:252)
>   at akka.actor.ActorCell.newActor(ActorCell.scala:552)
>   at 
> akka.actor.dungeon.FaultHandling$class.finishRecreate(FaultHandling.scala:234)
>   ... 11 more
> {code}




[jira] [Resolved] (SPARK-4349) Spark driver hangs on sc.parallelize() if exception is thrown during serialization

2015-02-01 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-4349.

Resolution: Duplicate

> Spark driver hangs on sc.parallelize() if exception is thrown during 
> serialization
> --
>
> Key: SPARK-4349
> URL: https://issues.apache.org/jira/browse/SPARK-4349
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.0
>Reporter: Matt Cheah
>Priority: Critical
>
> Executing the following in the Spark Shell will lead to the Spark Shell 
> hanging after a stack trace is printed. The serializer is set to the Kryo 
> serializer.
> {code}
> scala> import com.esotericsoftware.kryo.io.Input
> import com.esotericsoftware.kryo.io.Input
> scala> import com.esotericsoftware.kryo.io.Output
> import com.esotericsoftware.kryo.io.Output
> scala> class MyKryoSerializable extends 
> com.esotericsoftware.kryo.KryoSerializable { def write (kryo: 
> com.esotericsoftware.kryo.Kryo, output: Output) { throw new 
> com.esotericsoftware.kryo.KryoException; } ; def read (kryo: 
> com.esotericsoftware.kryo.Kryo, input: Input) { throw new 
> com.esotericsoftware.kryo.KryoException; } }
> defined class MyKryoSerializable
> scala> sc.parallelize(Seq(new MyKryoSerializable, new 
> MyKryoSerializable)).collect
> {code}
> A stack trace is printed during serialization as expected, but another stack 
> trace is printed afterwards, indicating that the driver can't recover:
> {code}
> 14/11/11 14:10:03 ERROR OneForOneStrategy: actor name [ExecutorActor] is not 
> unique!
> akka.actor.PostRestartException: exception post restart (class 
> java.io.IOException)
>   at 
> akka.actor.dungeon.FaultHandling$$anonfun$6.apply(FaultHandling.scala:249)
>   at 
> akka.actor.dungeon.FaultHandling$$anonfun$6.apply(FaultHandling.scala:247)
>   at 
> akka.actor.dungeon.FaultHandling$$anonfun$handleNonFatalOrInterruptedException$1.applyOrElse(FaultHandling.scala:302)
>   at 
> akka.actor.dungeon.FaultHandling$$anonfun$handleNonFatalOrInterruptedException$1.applyOrElse(FaultHandling.scala:297)
>   at 
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
>   at 
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
>   at 
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
>   at 
> akka.actor.dungeon.FaultHandling$class.finishRecreate(FaultHandling.scala:247)
>   at 
> akka.actor.dungeon.FaultHandling$class.faultRecreate(FaultHandling.scala:76)
>   at akka.actor.ActorCell.faultRecreate(ActorCell.scala:369)
>   at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:459)
>   at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)
>   at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263)
>   at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>   at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>   at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: akka.actor.InvalidActorNameException: actor name [ExecutorActor] 
> is not unique!
>   at 
> akka.actor.dungeon.ChildrenContainer$NormalChildrenContainer.reserve(ChildrenContainer.scala:130)
>   at akka.actor.dungeon.Children$class.reserveChild(Children.scala:77)
>   at akka.actor.ActorCell.reserveChild(ActorCell.scala:369)
>   at akka.actor.dungeon.Children$class.makeChild(Children.scala:202)
>   at akka.actor.dungeon.Children$class.attachChild(Children.scala:42)
>   at akka.actor.ActorCell.attachChild(ActorCell.scala:369)
>   at akka.actor.ActorSystemImpl.actorOf(ActorSystem.scala:552)
>   at org.apache.spark.executor.Executor.<init>(Executor.scala:97)
>   at 
> org.apache.spark.scheduler.local.LocalActor.<init>(LocalBackend.scala:53)
>   at 
> org.apache.spark.scheduler.local.LocalBackend$$anonfun$start$1.apply(LocalBackend.scala:96)
>   at 
> org.apache.spark.scheduler.local.LocalBackend$$anonfun$start$1.apply(LocalBackend.scala:96)
>   at akka.actor.TypedCreatorFunctionConsumer.produce(Props.scala:343)
>   at akka.actor.Props.newActor(Props.scala:252)
>   at akka.actor.ActorCell.newActor(ActorCell.scala:552)
>   at 
> akka.actor.dungeon.FaultHandling$class.finishRecreate(FaultHandling.scala:234)
>   ... 11 more
> {code}




[jira] [Updated] (SPARK-5500) Document that feeding hadoopFile into a shuffle operation will cause problems

2015-02-01 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-5500:
---
Priority: Critical  (was: Major)

> Document that feeding hadoopFile into a shuffle operation will cause problems
> -
>
> Key: SPARK-5500
> URL: https://issues.apache.org/jira/browse/SPARK-5500
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.3.0
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
>Priority: Critical
>







[jira] [Updated] (SPARK-1517) Publish nightly snapshots of documentation, maven artifacts, and binary builds

2015-02-01 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-1517:
---
Assignee: Nicholas Chammas

> Publish nightly snapshots of documentation, maven artifacts, and binary builds
> --
>
> Key: SPARK-1517
> URL: https://issues.apache.org/jira/browse/SPARK-1517
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Project Infra
>Reporter: Patrick Wendell
>Assignee: Nicholas Chammas
>Priority: Blocker
>
> Should be pretty easy to do with Jenkins. The only thing I can think of that 
> would be tricky is to set up credentials so that jenkins can publish this 
> stuff somewhere on apache infra.
> Ideally we don't want to have to put a private key on every jenkins box 
> (since they are otherwise pretty stateless). One idea is to encrypt these 
> credentials with a passphrase and post them somewhere publicly visible. Then 
> the jenkins build can download the credentials provided we set a passphrase 
> in an environment variable in jenkins. There may be simpler solutions as well.






[jira] [Created] (SPARK-5522) Accelerate the History Server start

2015-02-01 Thread Mars Gu (JIRA)
Mars Gu created SPARK-5522:
--

 Summary: Accelerate the History Server start
 Key: SPARK-5522
 URL: https://issues.apache.org/jira/browse/SPARK-5522
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Reporter: Mars Gu


When the history server starts, all the log files are fetched and parsed in 
order to get the applications' metadata, e.g. app name, start time, duration, 
etc. In our production cluster there are 2,600 log files (160 GB) in HDFS, and 
it takes 3 hours to restart the history server, which is a bit too long for us.

It would be better if the history server read only the metadata during 
start-up instead of fetching all the log files.







[jira] [Commented] (SPARK-5492) Thread statistics can break with older Hadoop versions

2015-02-01 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300905#comment-14300905
 ] 

Patrick Wendell commented on SPARK-5492:


[~sandyr] I share your confusion, Sandy. Nonetheless, catching 
ClassNotFoundException does seem reasonable. If anything it's just a better 
hedge against random Hadoop versions that might have intermediate sets of 
functionality. What do you think?

> Thread statistics can break with older Hadoop versions
> --
>
> Key: SPARK-5492
> URL: https://issues.apache.org/jira/browse/SPARK-5492
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Patrick Wendell
>Assignee: Sandy Ryza
>Priority: Blocker
>
> {code}
>  java.lang.ClassNotFoundException: 
> org.apache.hadoop.fs.FileSystem$Statistics$StatisticsData
> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:191)
> at 
> org.apache.spark.deploy.SparkHadoopUtil.getFileSystemThreadStatisticsMethod(SparkHadoopUtil.scala:180)
> at 
> org.apache.spark.deploy.SparkHadoopUtil.getFSBytesReadOnThreadCallback(SparkHadoopUtil.scala:139)
> at 
> org.apache.spark.rdd.NewHadoopRDD$$anon$1$$anonfun$2.apply(NewHadoopRDD.scala:120)
> at 
> org.apache.spark.rdd.NewHadoopRDD$$anon$1$$anonfun$2.apply(NewHadoopRDD.scala:118)
> at scala.Option.orElse(Option.scala:257)
> {code}
> I think the issue is we need to catch ClassNotFoundException here:
> https://github.com/apache/spark/blob/b1b35ca2e440df40b253bf967bb93705d355c1c0/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L144
> However, I'm really confused how this didn't fail our unit tests, since we 
> explicitly tried to test this.






[jira] [Updated] (SPARK-5508) [hive context] Unable to query array once saved as parquet

2015-02-01 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-5508:
---
Component/s: (was: spark sql)
 SQL

> [hive context] Unable to query array once saved as parquet
> --
>
> Key: SPARK-5508
> URL: https://issues.apache.org/jira/browse/SPARK-5508
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.2.1
> Environment: mesos, cdh
>Reporter: Ayoub Benali
>  Labels: hivecontext, parquet
>
> When the table is saved as Parquet, we cannot query a field that is an array 
> of structs, as shown below:
> {noformat}
> scala> val data1="""{
>  | "timestamp": 1422435598,
>  | "data_array": [
>  | {
>  | "field1": 1,
>  | "field2": 2
>  | }
>  | ]
>  | }"""
> scala> val data2="""{
>  | "timestamp": 1422435598,
>  | "data_array": [
>  | {
>  | "field1": 3,
>  | "field2": 4
>  | }
>  | ]
> scala> val jsonRDD = sc.makeRDD(data1 :: data2 :: Nil)
> scala> val rdd = hiveContext.jsonRDD(jsonRDD)
> scala> rdd.printSchema
> root
>  |-- data_array: array (nullable = true)
>  ||-- element: struct (containsNull = false)
>  |||-- field1: integer (nullable = true)
>  |||-- field2: integer (nullable = true)
>  |-- timestamp: integer (nullable = true)
> scala> rdd.registerTempTable("tmp_table")
> scala> hiveContext.sql("select data.field1 from tmp_table LATERAL VIEW 
> explode(data_array) nestedStuff AS data").collect
> res3: Array[org.apache.spark.sql.Row] = Array([1], [3])
> scala> hiveContext.sql("SET hive.exec.dynamic.partition = true")
> scala> hiveContext.sql("SET hive.exec.dynamic.partition.mode = nonstrict")
> scala> hiveContext.sql("set parquet.compression=GZIP")
> scala> hiveContext.setConf("spark.sql.parquet.binaryAsString", "true")
> scala> hiveContext.sql("create external table if not exists 
> persisted_table(data_array ARRAY<STRUCT<field1: INT, field2: INT>>, 
> timestamp INT) STORED AS PARQUET Location 'hdfs:///test_table'")
> scala> hiveContext.sql("insert into table persisted_table select * from 
> tmp_table").collect
> scala> hiveContext.sql("select data.field1 from persisted_table LATERAL VIEW 
> explode(data_array) nestedStuff AS data").collect
> parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in 
> file hdfs://*/test_table/part-1
>   at 
> parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:213)
>   at 
> parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:204)
>   at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:145)
>   at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>   at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
>   at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
>   at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
>   at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
>   at scala.collection.AbstractIterator.to(Iterator.scala:1157)
>   at 
> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
>   at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
>   at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
>   at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
>   at org.apache.spark.rdd.RDD$$anonfun$17.apply(RDD.scala:797)
>   at org.apache.spark.rdd.RDD$$anonfun$17.apply(RDD.scala:797)
>   at 
> org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1353)
>   at 
> org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1353)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>   at org.apache.spark.scheduler.Task.run(Task.scala:56)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
>   at java.util.ArrayList.rangeCheck(

[jira] [Updated] (SPARK-5508) [hive context] Unable to query array once saved as parquet

2015-02-01 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-5508:
---
Component/s: (was: Spark Core)
 spark sql

> [hive context] Unable to query array once saved as parquet
> --
>
> Key: SPARK-5508
> URL: https://issues.apache.org/jira/browse/SPARK-5508
> Project: Spark
>  Issue Type: Bug
>  Components: spark sql
>Affects Versions: 1.2.1
> Environment: mesos, cdh
>Reporter: Ayoub Benali
>  Labels: hivecontext, parquet
>
> When the table is saved as Parquet, we cannot query a field that is an array 
> of structs, as shown below:
> {noformat}
> scala> val data1="""{
>  | "timestamp": 1422435598,
>  | "data_array": [
>  | {
>  | "field1": 1,
>  | "field2": 2
>  | }
>  | ]
>  | }"""
> scala> val data2="""{
>  | "timestamp": 1422435598,
>  | "data_array": [
>  | {
>  | "field1": 3,
>  | "field2": 4
>  | }
>  | ]
> scala> val jsonRDD = sc.makeRDD(data1 :: data2 :: Nil)
> scala> val rdd = hiveContext.jsonRDD(jsonRDD)
> scala> rdd.printSchema
> root
>  |-- data_array: array (nullable = true)
>  ||-- element: struct (containsNull = false)
>  |||-- field1: integer (nullable = true)
>  |||-- field2: integer (nullable = true)
>  |-- timestamp: integer (nullable = true)
> scala> rdd.registerTempTable("tmp_table")
> scala> hiveContext.sql("select data.field1 from tmp_table LATERAL VIEW 
> explode(data_array) nestedStuff AS data").collect
> res3: Array[org.apache.spark.sql.Row] = Array([1], [3])
> scala> hiveContext.sql("SET hive.exec.dynamic.partition = true")
> scala> hiveContext.sql("SET hive.exec.dynamic.partition.mode = nonstrict")
> scala> hiveContext.sql("set parquet.compression=GZIP")
> scala> hiveContext.setConf("spark.sql.parquet.binaryAsString", "true")
> scala> hiveContext.sql("create external table if not exists 
> persisted_table(data_array ARRAY<STRUCT<field1: INT, field2: INT>>, 
> timestamp INT) STORED AS PARQUET Location 'hdfs:///test_table'")
> scala> hiveContext.sql("insert into table persisted_table select * from 
> tmp_table").collect
> scala> hiveContext.sql("select data.field1 from persisted_table LATERAL VIEW 
> explode(data_array) nestedStuff AS data").collect
> parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in 
> file hdfs://*/test_table/part-1
>   at 
> parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:213)
>   at 
> parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:204)
>   at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:145)
>   at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>   at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
>   at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
>   at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
>   at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
>   at scala.collection.AbstractIterator.to(Iterator.scala:1157)
>   at 
> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
>   at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
>   at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
>   at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
>   at org.apache.spark.rdd.RDD$$anonfun$17.apply(RDD.scala:797)
>   at org.apache.spark.rdd.RDD$$anonfun$17.apply(RDD.scala:797)
>   at 
> org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1353)
>   at 
> org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1353)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>   at org.apache.spark.scheduler.Task.run(Task.scala:56)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
>   at java.util.ArrayLis

[jira] [Updated] (SPARK-5521) PCA wrapper for easy transformation of vectors

2015-02-01 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-5521:
-
Target Version/s: 1.4.0

> PCA wrapper for easy transformation of vectors
> --
>
> Key: SPARK-5521
> URL: https://issues.apache.org/jira/browse/SPARK-5521
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Reporter: Kirill A. Korinskiy
>Assignee: Kirill A. Korinskiy
>
> Implement a simple PCA wrapper that makes it easy to transform vectors by PCA, 
> for example the features of a LabeledPoint or another more complicated structure.
> Currently the PCA transformation only accepts a matrix and offers no way to 
> project individual vectors.






[jira] [Updated] (SPARK-5521) PCA wrapper for easy transformation of vectors

2015-02-01 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-5521:
-
Assignee: Kirill A. Korinskiy

> PCA wrapper for easy transformation of vectors
> --
>
> Key: SPARK-5521
> URL: https://issues.apache.org/jira/browse/SPARK-5521
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Reporter: Kirill A. Korinskiy
>Assignee: Kirill A. Korinskiy
>
> Implement a simple PCA wrapper that makes it easy to transform vectors by PCA, 
> for example the features of a LabeledPoint or another more complicated structure.
> Currently the PCA transformation only accepts a matrix and offers no way to 
> project individual vectors.






[jira] [Commented] (SPARK-5521) PCA wrapper for easy transformation of vectors

2015-02-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300902#comment-14300902
 ] 

Apache Spark commented on SPARK-5521:
-

User 'catap' has created a pull request for this issue:
https://github.com/apache/spark/pull/4304

> PCA wrapper for easy transformation of vectors
> --
>
> Key: SPARK-5521
> URL: https://issues.apache.org/jira/browse/SPARK-5521
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Reporter: Kirill A. Korinskiy
>
> Implement a simple PCA wrapper that makes it easy to transform vectors by PCA, 
> for example the features of a LabeledPoint or another more complicated structure.
> Currently the PCA transformation only accepts a matrix and offers no way to 
> project individual vectors.






[jira] [Resolved] (SPARK-5353) Log failures in ExecutorClassLoader

2015-02-01 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-5353.

   Resolution: Fixed
Fix Version/s: 1.3.0
 Assignee: Tobias Schlatter

> Log failures in ExecutorClassLoader
> ---
>
> Key: SPARK-5353
> URL: https://issues.apache.org/jira/browse/SPARK-5353
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Shell
>Reporter: Tobias Schlatter
>Assignee: Tobias Schlatter
>Priority: Minor
> Fix For: 1.3.0
>
>
> When the ExecutorClassLoader tries to load classes compiled in the Spark 
> Shell and fails, it silently passes loading to the parent ClassLoader. It 
> should log these failures.
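An illustrative sketch of the requested behaviour (ReplClassLoader and fetchReplClass
are hypothetical stand-ins, not the actual ExecutorClassLoader code):

{code}
import org.slf4j.LoggerFactory

class ReplClassLoader(parent: ClassLoader) extends ClassLoader(parent) {
  private val log = LoggerFactory.getLogger(getClass)

  // Placeholder for the real lookup against the REPL class server.
  private def fetchReplClass(name: String): Class[_] =
    throw new ClassNotFoundException(name)

  override def findClass(name: String): Class[_] =
    try {
      fetchReplClass(name)
    } catch {
      case e: ClassNotFoundException =>
        // Log before giving up so the failure is no longer silent.
        log.error(s"Failed to load class $name from the REPL class server", e)
        throw e
    }
}
{code}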






[jira] [Closed] (SPARK-5515) Build fails with spark-ganglia-lgpl profile

2015-02-01 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-5515.

   Resolution: Fixed
Fix Version/s: 1.3.0
 Assignee: Kousuke Saruta

> Build fails with spark-ganglia-lgpl profile
> ---
>
> Key: SPARK-5515
> URL: https://issues.apache.org/jira/browse/SPARK-5515
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.3.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Blocker
> Fix For: 1.3.0
>
>
> Build fails with spark-ganglia-lgpl profile at the moment. This is because 
> pom.xml for spark-ganglia-lgpl is not updated.






[jira] [Commented] (SPARK-5515) Build fails with spark-ganglia-lgpl profile

2015-02-01 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300897#comment-14300897
 ] 

Andrew Or commented on SPARK-5515:
--

yes I just closed this

> Build fails with spark-ganglia-lgpl profile
> ---
>
> Key: SPARK-5515
> URL: https://issues.apache.org/jira/browse/SPARK-5515
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.3.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Blocker
> Fix For: 1.3.0
>
>
> Build fails with spark-ganglia-lgpl profile at the moment. This is because 
> pom.xml for spark-ganglia-lgpl is not updated.






[jira] [Commented] (SPARK-5515) Build fails with spark-ganglia-lgpl profile

2015-02-01 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300894#comment-14300894
 ] 

Patrick Wendell commented on SPARK-5515:


[~andrewor14] is this fixed now?

> Build fails with spark-ganglia-lgpl profile
> ---
>
> Key: SPARK-5515
> URL: https://issues.apache.org/jira/browse/SPARK-5515
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.3.0
>Reporter: Kousuke Saruta
>Priority: Blocker
>
> Build fails with spark-ganglia-lgpl profile at the moment. This is because 
> pom.xml for spark-ganglia-lgpl is not updated.






[jira] [Resolved] (SPARK-5208) Add more documentation to Netty-based configs

2015-02-01 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-5208.

Resolution: Won't Fix

>  Add more documentation to Netty-based configs
> --
>
> Key: SPARK-5208
> URL: https://issues.apache.org/jira/browse/SPARK-5208
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.3.0
>Reporter: Kousuke Saruta
>
> SPARK-4864 added some documentation about Netty-based configs, but I think we 
> need more. I think the following configs can be useful for performance tuning:
> * spark.shuffle.io.mode
> * spark.shuffle.io.backLog
> * spark.shuffle.io.receiveBuffer
> * spark.shuffle.io.sendBuffer






[jira] [Created] (SPARK-5521) PCA wrapper for easy transform vectors

2015-02-01 Thread Kirill A. Korinskiy (JIRA)
Kirill A. Korinskiy created SPARK-5521:
--

 Summary: PCA wrapper for easy transform vectors
 Key: SPARK-5521
 URL: https://issues.apache.org/jira/browse/SPARK-5521
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Reporter: Kirill A. Korinskiy


Implement a simple PCA wrapper that makes it easy to transform vectors by PCA, for 
example a LabeledPoint or another more complex structure.

Currently the PCA transformation accepts only a matrix and offers no way to project 
individual vectors.
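A minimal sketch of what such a wrapper could look like, assuming hypothetical PCA/PCAModel names and building on RowMatrix.computePrincipalComponents; this is illustrative only, not a proposed API:

{code}
import org.apache.spark.mllib.linalg.{Matrix, Vector, Vectors}
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.rdd.RDD

// Hypothetical wrapper: fit the principal components once, then project single vectors.
class PCAModel(val pc: Matrix) {
  // Project one vector onto the k principal components (pc is n x k, column-major).
  def transform(v: Vector): Vector = {
    val n = pc.numRows
    val k = pc.numCols
    val pcArr = pc.toArray  // column-major layout
    val vArr = v.toArray
    val projected = Array.tabulate(k) { j =>
      var sum = 0.0
      var i = 0
      while (i < n) { sum += pcArr(j * n + i) * vArr(i); i += 1 }
      sum
    }
    Vectors.dense(projected)
  }
}

object PCA {
  // Fit k principal components from an RDD of vectors.
  def fit(data: RDD[Vector], k: Int): PCAModel =
    new PCAModel(new RowMatrix(data).computePrincipalComponents(k))
}
{code}

A LabeledPoint could then be transformed as LabeledPoint(p.label, model.transform(p.features)).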



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5021) GaussianMixtureEM should be faster for SparseVector input

2015-02-01 Thread Manoj Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300889#comment-14300889
 ] 

Manoj Kumar commented on SPARK-5021:


Sorry for the delay, I just started going through the source. Just a random 
question: why is this model named GaussianMixtureEM? Shouldn't it be renamed to 
just GaussianMixtureModel, since EM is just an optimization algorithm?

> GaussianMixtureEM should be faster for SparseVector input
> -
>
> Key: SPARK-5021
> URL: https://issues.apache.org/jira/browse/SPARK-5021
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 1.3.0
>Reporter: Joseph K. Bradley
>Assignee: Manoj Kumar
>
> GaussianMixtureEM currently converts everything to dense vectors.  It would 
> be nice if it were faster for SparseVectors (running in time linear in the 
> number of non-zero values).
> However, this may not be too important since clustering should rarely be done 
> in high dimensions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-3996) Shade Jetty in Spark deliverables

2015-02-01 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-3996.

Resolution: Fixed

I've merged a new patch so closing this for now.

> Shade Jetty in Spark deliverables
> -
>
> Key: SPARK-3996
> URL: https://issues.apache.org/jira/browse/SPARK-3996
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core
>Affects Versions: 1.0.2, 1.1.0
>Reporter: Mingyu Kim
>Assignee: Patrick Wendell
> Fix For: 1.3.0
>
>
> We'd like to use Spark in a Jetty 9 server, and it's causing a version 
> conflict. Given that Spark's dependency on Jetty is light, it'd be a good 
> idea to shade this dependency.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5420) Cross-language load/store functions for creating and saving DataFrames

2015-02-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300873#comment-14300873
 ] 

Apache Spark commented on SPARK-5420:
-

User 'yhuai' has created a pull request for this issue:
https://github.com/apache/spark/pull/4294

> Cross-language load/store functions for creating and saving DataFrames
> --
>
> Key: SPARK-5420
> URL: https://issues.apache.org/jira/browse/SPARK-5420
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Patrick Wendell
>Assignee: Yin Huai
>Priority: Blocker
>
> We should have standard APIs for loading or saving a table from a data 
> store. Per comment discussion:
> {code}
> def loadData(datasource: String, parameters: Map[String, String]): DataFrame
> def loadData(datasource: String, parameters: java.util.Map[String, String]): 
> DataFrame
> def storeData(datasource: String, parameters: Map[String, String]): DataFrame
> def storeData(datasource: String, parameters: java.util.Map[String, String]): 
> DataFrame
> {code}
> Python should have this too.
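For illustration, usage might look roughly like the following; the receiver of storeData, the "parquet"/"json" source names, and the "path" option key are assumptions of this sketch, not part of the proposal:

{code}
// Hypothetical usage of the proposed API; names and parameters are illustrative only.
val events = sqlContext.loadData("parquet", Map("path" -> "/data/events"))
events.storeData("json", Map("path" -> "/data/events-json"))

// A Java caller would pass a java.util.Map for the parameters instead.
{code}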



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4705) Driver retries in yarn-cluster mode always fail if event logging is enabled

2015-02-01 Thread Twinkle Sachdeva (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300868#comment-14300868
 ] 

Twinkle Sachdeva commented on SPARK-4705:
-

Hi,

Sorry, I just saw this update of yours. I missed the last two comments; will work on 
it.

Thanks,

> Driver retries in yarn-cluster mode always fail if event logging is enabled
> ---
>
> Key: SPARK-4705
> URL: https://issues.apache.org/jira/browse/SPARK-4705
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, YARN
>Affects Versions: 1.2.0
>Reporter: Marcelo Vanzin
>
> yarn-cluster mode will retry running the driver in certain failure modes. If 
> event logging is enabled, this will most probably fail, because:
> {noformat}
> Exception in thread "Driver" java.io.IOException: Log directory 
> hdfs://vanzin-krb-1.vpc.cloudera.com:8020/user/spark/applicationHistory/application_1417554558066_0003
>  already exists!
> at org.apache.spark.util.FileLogger.createLogDir(FileLogger.scala:129)
> at org.apache.spark.util.FileLogger.start(FileLogger.scala:115)
> at 
> org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:74)
> at org.apache.spark.SparkContext.(SparkContext.scala:353)
> {noformat}
> The event log path should be "more unique". Or perhaps retries of the same app 
> should clean up the old logs first.
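One direction, sketched here purely as an illustration (the attempt number is assumed to be obtainable from the YARN environment; this is not an existing API), is to fold the attempt into the log directory name so retries don't collide:

{code}
// Illustrative only: derive a per-attempt event log directory.
def eventLogDir(base: String, appId: String, attempt: Int): String =
  if (attempt <= 1) s"$base/$appId" else s"$base/${appId}_attempt$attempt"

// eventLogDir("hdfs:///user/spark/applicationHistory",
//             "application_1417554558066_0003", 2)
// => hdfs:///user/spark/applicationHistory/application_1417554558066_0003_attempt2
{code}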



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4705) Driver retries in yarn-cluster mode always fail if event logging is enabled

2015-02-01 Thread Twinkle Sachdeva (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300867#comment-14300867
 ] 

Twinkle Sachdeva commented on SPARK-4705:
-

Hi,

Sorry, I just saw this update of yours. I missed the last two comments; will work on 
it.

Thanks,

> Driver retries in yarn-cluster mode always fail if event logging is enabled
> ---
>
> Key: SPARK-4705
> URL: https://issues.apache.org/jira/browse/SPARK-4705
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, YARN
>Affects Versions: 1.2.0
>Reporter: Marcelo Vanzin
>
> yarn-cluster mode will retry running the driver in certain failure modes. If 
> event logging is enabled, this will most probably fail, because:
> {noformat}
> Exception in thread "Driver" java.io.IOException: Log directory 
> hdfs://vanzin-krb-1.vpc.cloudera.com:8020/user/spark/applicationHistory/application_1417554558066_0003
>  already exists!
> at org.apache.spark.util.FileLogger.createLogDir(FileLogger.scala:129)
> at org.apache.spark.util.FileLogger.start(FileLogger.scala:115)
> at 
> org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:74)
> at org.apache.spark.SparkContext.(SparkContext.scala:353)
> {noformat}
> The event log path should be "more unique". Or perhaps retries of the same app 
> should clean up the old logs first.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-4705) Driver retries in yarn-cluster mode always fail if event logging is enabled

2015-02-01 Thread Twinkle Sachdeva (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Twinkle Sachdeva updated SPARK-4705:

Comment: was deleted

(was: Hi,

Sorry, i saw just this update of yours. Missed last two comments, will work on 
it.

Thanks,)

> Driver retries in yarn-cluster mode always fail if event logging is enabled
> ---
>
> Key: SPARK-4705
> URL: https://issues.apache.org/jira/browse/SPARK-4705
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, YARN
>Affects Versions: 1.2.0
>Reporter: Marcelo Vanzin
>
> yarn-cluster mode will retry running the driver in certain failure modes. If 
> event logging is enabled, this will most probably fail, because:
> {noformat}
> Exception in thread "Driver" java.io.IOException: Log directory 
> hdfs://vanzin-krb-1.vpc.cloudera.com:8020/user/spark/applicationHistory/application_1417554558066_0003
>  already exists!
> at org.apache.spark.util.FileLogger.createLogDir(FileLogger.scala:129)
> at org.apache.spark.util.FileLogger.start(FileLogger.scala:115)
> at 
> org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:74)
> at org.apache.spark.SparkContext.(SparkContext.scala:353)
> {noformat}
> The event log path should be "more unique". Or perhaps retries of the same app 
> should clean up the old logs first.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-5406) LocalLAPACK mode in RowMatrix.computeSVD should have much smaller upper bound

2015-02-01 Thread yuhao yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yuhao yang closed SPARK-5406.
-

Fixed and merged. Thanks

> LocalLAPACK mode in RowMatrix.computeSVD should have much smaller upper bound
> -
>
> Key: SPARK-5406
> URL: https://issues.apache.org/jira/browse/SPARK-5406
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Affects Versions: 1.2.0
> Environment: centos, others should be similar
>Reporter: yuhao yang
>Assignee: yuhao yang
>Priority: Minor
> Fix For: 1.3.0
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> In RowMatrix.computeSVD, under LocalLAPACK mode, the code would invoke 
> brzSvd. Yet breeze's svd for dense matrices has a latent constraint. In its 
> implementation
> ( 
> https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/linalg/functions/svd.scala
>):
>   val workSize = ( 3
> * scala.math.min(m, n)
> * scala.math.min(m, n)
> + scala.math.max(scala.math.max(m, n), 4 * scala.math.min(m, n)
>   * scala.math.min(m, n) + 4 * scala.math.min(m, n))
>   )
>   val work = new Array[Double](workSize)
> as a result, column num must satisfy 7 * n * n + 4 * n < Int.MaxValue
> thus, n < 17515.
> This jira is only the first step. If possible, I hope Spark can handle matrix 
> computation up to 80K * 80K.
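The n < 17515 figure follows from requiring the work array length to stay below Int.MaxValue; for m >= n with max(m, n) <= 4n^2 + 4n, the expression above reduces to 7n^2 + 4n, which a quick check confirms:

{code}
// Check the bound implied by the breeze work-array size (illustrative arithmetic only).
def workSizeFits(n: Long): Boolean = 7L * n * n + 4L * n < Int.MaxValue

assert(workSizeFits(17514L))   // largest column count that still fits
assert(!workSizeFits(17515L))  // hence the requirement n < 17515
{code}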



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-4001) Add Apriori algorithm to Spark MLlib

2015-02-01 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-4001.
--
   Resolution: Fixed
Fix Version/s: 1.3.0

Issue resolved by pull request 2847
[https://github.com/apache/spark/pull/2847]

> Add Apriori algorithm to Spark MLlib
> 
>
> Key: SPARK-4001
> URL: https://issues.apache.org/jira/browse/SPARK-4001
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Reporter: Jacky Li
>Assignee: Jacky Li
> Fix For: 1.3.0
>
> Attachments: Distributed frequent item mining algorithm based on 
> Spark.pptx
>
>
> Apriori is the classic algorithm for frequent item set mining in a 
> transactional data set.  It will be useful if Apriori algorithm is added to 
> MLLib in Spark



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-5406) LocalLAPACK mode in RowMatrix.computeSVD should have much smaller upper bound

2015-02-01 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-5406.
--
  Resolution: Fixed
   Fix Version/s: 1.3.0
Target Version/s: 1.3.0  (was: 1.2.1)

fixed by https://github.com/apache/spark/pull/4200

> LocalLAPACK mode in RowMatrix.computeSVD should have much smaller upper bound
> -
>
> Key: SPARK-5406
> URL: https://issues.apache.org/jira/browse/SPARK-5406
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Affects Versions: 1.2.0
> Environment: centos, others should be similar
>Reporter: yuhao yang
>Assignee: yuhao yang
>Priority: Minor
> Fix For: 1.3.0
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> In RowMatrix.computeSVD, under LocalLAPACK mode, the code would invoke 
> brzSvd. Yet breeze's svd for dense matrices has a latent constraint. In its 
> implementation
> ( 
> https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/linalg/functions/svd.scala
>):
>   val workSize = ( 3
> * scala.math.min(m, n)
> * scala.math.min(m, n)
> + scala.math.max(scala.math.max(m, n), 4 * scala.math.min(m, n)
>   * scala.math.min(m, n) + 4 * scala.math.min(m, n))
>   )
>   val work = new Array[Double](workSize)
> as a result, column num must satisfy 7 * n * n + 4 * n < Int.MaxValue
> thus, n < 17515.
> This jira is only the first step. If possible, I hope Spark can handle matrix 
> computation up to 80K * 80K.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5406) LocalLAPACK mode in RowMatrix.computeSVD should have much smaller upper bound

2015-02-01 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-5406:
-
Assignee: yuhao yang

> LocalLAPACK mode in RowMatrix.computeSVD should have much smaller upper bound
> -
>
> Key: SPARK-5406
> URL: https://issues.apache.org/jira/browse/SPARK-5406
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Affects Versions: 1.2.0
> Environment: centos, others should be similar
>Reporter: yuhao yang
>Assignee: yuhao yang
>Priority: Minor
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> In RowMatrix.computeSVD, under LocalLAPACK mode, the code would invoke 
> brzSvd. Yet breeze's svd for dense matrices has a latent constraint. In its 
> implementation
> ( 
> https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/linalg/functions/svd.scala
>):
>   val workSize = ( 3
> * scala.math.min(m, n)
> * scala.math.min(m, n)
> + scala.math.max(scala.math.max(m, n), 4 * scala.math.min(m, n)
>   * scala.math.min(m, n) + 4 * scala.math.min(m, n))
>   )
>   val work = new Array[Double](workSize)
> as a result, column num must satisfy 7 * n * n + 4 * n < Int.MaxValue
> thus, n < 17515.
> This jira is only the first step. If possible, I hope Spark can handle matrix 
> computation up to 80K * 80K.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-5520) Make FP-Growth implementation take generic item types

2015-02-01 Thread Xiangrui Meng (JIRA)
Xiangrui Meng created SPARK-5520:


 Summary: Make FP-Growth implementation take generic item types
 Key: SPARK-5520
 URL: https://issues.apache.org/jira/browse/SPARK-5520
 Project: Spark
  Issue Type: Improvement
  Components: MLlib
Reporter: Xiangrui Meng


There is no technical restriction on the item types in the FP-Growth 
implementation. We used String in the first PR for simplicity. Maybe we could 
make the type generic before 1.3 (and specialize it for Int/Long).
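A rough sketch of what the generic signature could look like (names are illustrative and the merged implementation may differ), using a ClassTag so the item type can later be specialized for Int/Long:

{code}
import scala.reflect.ClassTag

import org.apache.spark.SparkContext._
import org.apache.spark.rdd.RDD

// Illustrative only: a generic item type instead of hard-coding String.
class FPGrowthModel[Item](val freqItemCounts: Array[(Item, Long)])

class FPGrowth(minSupport: Double) {
  def run[Item: ClassTag](transactions: RDD[Array[Item]]): FPGrowthModel[Item] = {
    val minCount = math.ceil(minSupport * transactions.count()).toLong
    // First pass: frequent single items; the real algorithm then builds FP-trees per partition.
    val freqItems = transactions
      .flatMap(_.distinct.map(item => (item, 1L)))
      .reduceByKey(_ + _)
      .filter(_._2 >= minCount)
      .collect()
    new FPGrowthModel(freqItems)
  }
}
{code}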



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-5519) Add user guide for FP-Growth

2015-02-01 Thread Xiangrui Meng (JIRA)
Xiangrui Meng created SPARK-5519:


 Summary: Add user guide for FP-Growth
 Key: SPARK-5519
 URL: https://issues.apache.org/jira/browse/SPARK-5519
 Project: Spark
  Issue Type: Documentation
  Components: Documentation, MLlib
Reporter: Xiangrui Meng


We need to add a section for FP-Growth to the user guide after the 
FP-Growth PR is merged.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-5518) Error messages for plans with invalid AttributeReferences

2015-02-01 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-5518:
---

 Summary: Error messages for plans with invalid AttributeReferences
 Key: SPARK-5518
 URL: https://issues.apache.org/jira/browse/SPARK-5518
 Project: Spark
  Issue Type: Sub-task
Reporter: Michael Armbrust
Priority: Blocker


It is now possible for users to put invalid attribute references into query 
plans.  We should check for this case at the end of analysis.
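A rough sketch of such a check (illustrative only; it assumes the Catalyst QueryPlan/Expression API with expressions, references, and inputSet, and the real rule would live in the Analyzer):

{code}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

// Illustrative only: fail analysis if any expression refers to an attribute that is
// not produced by the operator's children. Method names are assumptions of this sketch.
def checkAttributeReferences(plan: LogicalPlan): Unit = {
  plan.foreach { operator =>
    operator.expressions.foreach { expr =>
      val missing = expr.references.filterNot(operator.inputSet.contains)
      if (missing.nonEmpty) {
        throw new RuntimeException(
          s"Invalid attribute reference(s) ${missing.mkString(", ")} in operator:\n$operator")
      }
    }
  }
}
{code}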



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-4981) Add a streaming singular value decomposition

2015-02-01 Thread Reza Zadeh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reza Zadeh updated SPARK-4981:
--
Comment: was deleted

(was: Another option: see slide 31 to solve the problem using IndexedRDDs, 
thanks to Ankur's nice slides and work on IndexedRDD:

https://issues.apache.org/jira/secure/attachment/12656374/2014-07-07-IndexedRDD-design-review.pdf)

> Add a streaming singular value decomposition
> 
>
> Key: SPARK-4981
> URL: https://issues.apache.org/jira/browse/SPARK-4981
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib, Streaming
>Reporter: Jeremy Freeman
>
> This is for tracking WIP on a streaming singular value decomposition 
> implementation. This will likely be more complex than the existing streaming 
> algorithms (k-means, regression), but should be possible using the family of 
> sequential update rule outlined in this paper:
> "Fast low-rank modifications of the thin singular value decomposition"
> by Matthew Brand
> http://www.stat.osu.edu/~dmsl/thinSVDtracking.pdf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4981) Add a streaming singular value decomposition

2015-02-01 Thread Reza Zadeh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300849#comment-14300849
 ] 

Reza Zadeh commented on SPARK-4981:
---

Another option: see slide 31 to solve the problem using IndexedRDDs, thanks to 
Ankur's nice slides and work on IndexedRDD:

https://issues.apache.org/jira/secure/attachment/12656374/2014-07-07-IndexedRDD-design-review.pdf

> Add a streaming singular value decomposition
> 
>
> Key: SPARK-4981
> URL: https://issues.apache.org/jira/browse/SPARK-4981
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib, Streaming
>Reporter: Jeremy Freeman
>
> This is for tracking WIP on a streaming singular value decomposition 
> implementation. This will likely be more complex than the existing streaming 
> algorithms (k-means, regression), but should be possible using the family of 
> sequential update rule outlined in this paper:
> "Fast low-rank modifications of the thin singular value decomposition"
> by Matthew Brand
> http://www.stat.osu.edu/~dmsl/thinSVDtracking.pdf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4981) Add a streaming singular value decomposition

2015-02-01 Thread Reza Zadeh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300848#comment-14300848
 ] 

Reza Zadeh commented on SPARK-4981:
---

Another option: see slide 31 to solve the problem using IndexedRDDs, thanks to 
Ankur's nice slides and work on IndexedRDD:

https://issues.apache.org/jira/secure/attachment/12656374/2014-07-07-IndexedRDD-design-review.pdf

> Add a streaming singular value decomposition
> 
>
> Key: SPARK-4981
> URL: https://issues.apache.org/jira/browse/SPARK-4981
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib, Streaming
>Reporter: Jeremy Freeman
>
> This is for tracking WIP on a streaming singular value decomposition 
> implementation. This will likely be more complex than the existing streaming 
> algorithms (k-means, regression), but should be possible using the family of 
> sequential update rule outlined in this paper:
> "Fast low-rank modifications of the thin singular value decomposition"
> by Matthew Brand
> http://www.stat.osu.edu/~dmsl/thinSVDtracking.pdf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-5476) SQLContext.createDataFrame shouldn't be an implicit function

2015-02-01 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust closed SPARK-5476.
---
Resolution: Won't Fix

We decided this was too hard to do while maintaining compatibility.

> SQLContext.createDataFrame shouldn't be an implicit function
> 
>
> Key: SPARK-5476
> URL: https://issues.apache.org/jira/browse/SPARK-5476
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>
> It is sort of strange to ask users to import sqlContext._ or 
> sqlContext.createDataFrame.
> The proposal here is to ask users to define an implicit val for SQLContext, 
> and then the dsl package object should include an implicit function that converts 
> an RDD[Product] to a DataFrame.
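Sketched very roughly, the proposal would have looked something like the following (illustrative only; the issue was closed as Won't Fix):

{code}
import scala.reflect.runtime.universe.TypeTag

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SQLContext}

// Illustrative only: a dsl-style package object whose implicit conversion picks up
// an implicit SQLContext from the caller's scope.
object dsl {
  implicit def rddToDataFrame[A <: Product : TypeTag](rdd: RDD[A])
      (implicit sqlContext: SQLContext): DataFrame =
    sqlContext.createDataFrame(rdd)
}

// Usage sketch:
//   implicit val sqlContext = new SQLContext(sc)
//   import dsl._
//   val df: DataFrame = peopleRdd   // converted implicitly
{code}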



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-5517) Add input types for Java UDFs

2015-02-01 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-5517:
---

 Summary: Add input types for Java UDFs
 Key: SPARK-5517
 URL: https://issues.apache.org/jira/browse/SPARK-5517
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 1.3.0
Reporter: Michael Armbrust
Assignee: Reynold Xin
Priority: Blocker






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5182) Partitioning support for tables created by the data source API

2015-02-01 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust updated SPARK-5182:

Target Version/s: 1.4.0  (was: 1.3.0)

> Partitioning support for tables created by the data source API
> --
>
> Key: SPARK-5182
> URL: https://issues.apache.org/jira/browse/SPARK-5182
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Yin Huai
>Priority: Blocker
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4867) UDF clean up

2015-02-01 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust updated SPARK-4867:

Target Version/s: 1.4.0  (was: 1.3.0)

> UDF clean up
> 
>
> Key: SPARK-4867
> URL: https://issues.apache.org/jira/browse/SPARK-4867
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Michael Armbrust
>Priority: Blocker
>
> Right now our support and internal implementation of many functions has a few 
> issues.  Specifically:
>  - UDFs don't know their input types and thus don't do type coercion.
>  - We hard-code a bunch of built-in functions into the parser.  This is bad 
> because in SQL it creates new reserved words for things that aren't actually 
> keywords.  Also it means that for each function we need to add support to 
> both SQLContext and HiveContext separately.
> For this JIRA I propose we do the following:
>  - Change the interfaces for registerFunction and ScalaUdf to include types 
> for the input arguments as well as the output type.
>  - Add a rule to analysis that does type coercion for UDFs.
>  - Add a parse rule for functions to SQLParser.
>  - Rewrite all the UDFs that are currently hacked into the various parsers 
> using this new functionality.
> Depending on how big this refactoring becomes we could split parts 1&2 from 
> part 3 above.
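A sketch of the registration side only (types and names are illustrative; org.apache.spark.sql.types is assumed as the location of DataType for this sketch):

{code}
import org.apache.spark.sql.types.DataType

// Illustrative only: a registration that records input and output types so an
// analysis rule can insert casts (type coercion) around mismatched arguments.
case class TypedUdf(
    name: String,
    inputTypes: Seq[DataType],
    returnType: DataType,
    func: Seq[Any] => Any)

class UdfRegistry {
  private val udfs = scala.collection.mutable.Map.empty[String, TypedUdf]

  def registerFunction(
      name: String,
      inputTypes: Seq[DataType],
      returnType: DataType)(func: Seq[Any] => Any): Unit = {
    udfs(name.toLowerCase) = TypedUdf(name, inputTypes, returnType, func)
  }

  def lookup(name: String): Option[TypedUdf] = udfs.get(name.toLowerCase)
}
{code}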



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-5465) Data source version of Parquet doesn't push down And filters properly

2015-02-01 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust resolved SPARK-5465.
-
   Resolution: Fixed
Fix Version/s: 1.3.0

Issue resolved by pull request 4255
[https://github.com/apache/spark/pull/4255]

> Data source version of Parquet doesn't push down And filters properly
> -
>
> Key: SPARK-5465
> URL: https://issues.apache.org/jira/browse/SPARK-5465
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 1.2.0, 1.2.1
>Reporter: Cheng Lian
>Priority: Blocker
> Fix For: 1.3.0
>
>
> The current implementation combines all predicates and then tries to convert 
> it to a single Parquet filter predicate. In this way, the Parquet filter 
> predicate can not be generated if any component of the original filters can 
> not be converted. (code lines 
> [here|https://github.com/apache/spark/blob/a731314c319a6f265060e05267844069027804fd/sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala#L197-L201]).
> For example, {{a > 10 AND a < 20}} can be successfully converted, while {{a > 
> 10 AND a < b}} can't because Parquet doesn't accept filters like {{a < b}}.
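The fix direction can be sketched generically: convert each predicate on its own and push down whatever subset converts, keeping the rest for evaluation after the scan (P stands in for Parquet's filter predicate type; this is an illustration, not the actual patch):

{code}
// Illustrative only: per-predicate conversion instead of converting the whole conjunction.
def selectPushdownFilters[F, P](
    predicates: Seq[F],
    convert: F => Option[P]): (Seq[P], Seq[F]) = {
  val converted = predicates.map(p => p -> convert(p))
  val pushed = converted.collect { case (_, Some(parquetPredicate)) => parquetPredicate }
  val residual = converted.collect { case (p, None) => p }
  (pushed, residual)
}

// With predicates {a > 10, a < b}, only `a > 10` converts, so it is pushed down and
// `a < b` is evaluated by Spark on the rows returned by the scan.
{code}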



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-5262) widen types for parameters of coalesce()

2015-02-01 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust resolved SPARK-5262.
-
   Resolution: Fixed
Fix Version/s: 1.3.0

Issue resolved by pull request 4057
[https://github.com/apache/spark/pull/4057]

> widen types for parameters of coalesce()
> 
>
> Key: SPARK-5262
> URL: https://issues.apache.org/jira/browse/SPARK-5262
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Adrian Wang
> Fix For: 1.3.0
>
>
> Currently Coalesce(null, 1, null) would throw exceptions.
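For context, a minimal reproduction of the reported behaviour might look like this (assuming an existing SQLContext named sqlContext; the exact exception is not reproduced here):

{code}
// Before the fix this could throw during type coercion of COALESCE's arguments;
// after widening, the query should simply return 1.
val result = sqlContext.sql("SELECT COALESCE(NULL, 1, NULL)").collect()
{code}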



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3454) Expose JSON representation of data shown in WebUI

2015-02-01 Thread Ryan Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300834#comment-14300834
 ] 

Ryan Williams commented on SPARK-3454:
--

I think it looks great, [~imranr]!

Is the logical next step to flesh out exactly what the new POJOs will be?

> Expose JSON representation of data shown in WebUI
> -
>
> Key: SPARK-3454
> URL: https://issues.apache.org/jira/browse/SPARK-3454
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 1.1.0
>Reporter: Kousuke Saruta
> Attachments: sparkmonitoringjsondesign.pdf
>
>
> If the WebUI supported extracting its data as JSON, it would be helpful for users 
> who want to analyse stage / task / executor information.
> Fortunately, WebUI has a renderJson method, so we can implement the method in 
> each subclass.
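As a minimal sketch of the direction (the surrounding WebUIPage plumbing is assumed and not shown; field names are illustrative), each page could expose the same model it renders as HTML via json4s:

{code}
import org.json4s.JsonAST.{JArray, JObject, JValue}
import org.json4s.JsonDSL._

// Illustrative only: a JSON view of the data a stages page would otherwise render as HTML.
case class StageSummary(stageId: Int, name: String, numTasks: Int, numFailedTasks: Int)

def renderJson(stages: Seq[StageSummary]): JValue =
  JObject("stages" -> JArray(stages.map { s =>
    ("stageId" -> s.stageId) ~
    ("name" -> s.name) ~
    ("numTasks" -> s.numTasks) ~
    ("numFailedTasks" -> s.numFailedTasks)
  }.toList))
{code}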



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-5196) Add comment field in Create Table Field DDL

2015-02-01 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust resolved SPARK-5196.
-
Resolution: Fixed

Issue resolved by pull request 3999
[https://github.com/apache/spark/pull/3999]

> Add comment field in Create Table Field DDL
> ---
>
> Key: SPARK-5196
> URL: https://issues.apache.org/jira/browse/SPARK-5196
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.3.0
>Reporter: shengli
> Fix For: 1.3.0
>
>
> Support `comment` in Create Table Field DDL



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-1825) Windows Spark fails to work with Linux YARN

2015-02-01 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-1825.

   Resolution: Fixed
Fix Version/s: 1.3.0

> Windows Spark fails to work with Linux YARN
> ---
>
> Key: SPARK-1825
> URL: https://issues.apache.org/jira/browse/SPARK-1825
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.0.0
>Reporter: Taeyun Kim
>Assignee: Masayoshi TSUZUKI
> Fix For: 1.3.0
>
> Attachments: SPARK-1825.patch
>
>
> Windows Spark fails to work with Linux YARN.
> This is a cross-platform problem.
> This error occurs when 'yarn-client' mode is used.
> (yarn-cluster/yarn-standalone mode was not tested.)
> On YARN side, Hadoop 2.4.0 resolved the issue as follows:
> https://issues.apache.org/jira/browse/YARN-1824
> But the Spark YARN module does not incorporate the new YARN API yet, so the 
> problem persists for Spark.
> First, the following source files should be changed:
> - /yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala
> - 
> /yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnableUtil.scala
> Change is as follows:
> - Replace .$() with .$$()
> - Replace File.pathSeparator for Environment.CLASSPATH.name with 
> ApplicationConstants.CLASS_PATH_SEPARATOR (import 
> org.apache.hadoop.yarn.api.ApplicationConstants is required for this)
> Unless the above are applied, launch_container.sh will contain invalid shell 
> script statements (since they will contain Windows-specific separators), and 
> the job will fail.
> Also, the following symptom should also be fixed (I could not find the 
> relevant source code):
> - The SPARK_HOME environment variable is copied straight to launch_container.sh. 
> It should be changed to the path format for the server OS, or, better, a 
> separate environment variable or a configuration variable should be created.
> - The '%HADOOP_MAPRED_HOME%' string still exists in launch_container.sh after 
> the above change is applied. Maybe I missed a few lines.
> I'm not sure whether this is all, since I'm new to both Spark and YARN.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-5176) Thrift server fails with confusing error message when deploy-mode is cluster

2015-02-01 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-5176.

  Resolution: Fixed
   Fix Version/s: 1.3.0
Assignee: Tom Panning
Target Version/s: 1.3.0

> Thrift server fails with confusing error message when deploy-mode is cluster
> 
>
> Key: SPARK-5176
> URL: https://issues.apache.org/jira/browse/SPARK-5176
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.1.0, 1.2.0
>Reporter: Tom Panning
>Assignee: Tom Panning
>  Labels: starter
> Fix For: 1.3.0
>
>
> With Spark 1.2.0, when I try to run
> {noformat}
> $SPARK_HOME/sbin/start-thriftserver.sh --deploy-mode cluster --master 
> spark://xd-spark.xdata.data-tactics-corp.com:7077
> {noformat}
> The log output is
> {noformat}
> Spark assembly has been built with Hive, including Datanucleus jars on 
> classpath
> Spark Command: /usr/java/latest/bin/java -cp 
> ::/home/tpanning/Projects/spark/spark-1.2.0-bin-hadoop2.4/sbin/../conf:/home/tpanning/Projects/spark/spark-1.2.0-bin-hadoop2.4/lib/spark-assembly-1.2.0-hadoop2.4.0.jar:/home/tpanning/Projects/spark/spark-1.2.0-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar:/home/tpanning/Projects/spark/spark-1.2.0-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar:/home/tpanning/Projects/spark/spark-1.2.0-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar
>  -XX:MaxPermSize=128m -Xms512m -Xmx512m org.apache.spark.deploy.SparkSubmit 
> --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 
> --deploy-mode cluster --master 
> spark://xd-spark.xdata.data-tactics-corp.com:7077 spark-internal
> 
> Jar url 'spark-internal' is not in valid format.
> Must be a jar file path in URL format (e.g. hdfs://host:port/XX.jar, 
> file:///XX.jar)
> Usage: DriverClient [options] launch
> [driver options]
> Usage: DriverClient kill  
> Options:
>-c CORES, --cores CORESNumber of cores to request (default: 1)
>-m MEMORY, --memory MEMORY Megabytes of memory to request (default: 
> 512)
>-s, --superviseWhether to restart the driver on failure
>-v, --verbose  Print more debugging output
>  
> Using Spark's default log4j profile: 
> org/apache/spark/log4j-defaults.properties
> {noformat}
> I do not get this error if deploy-mode is set to client. The --deploy-mode 
> option is described by the --help output, so I expected it to work. I 
> checked, and this behavior seems to be present in Spark 1.1.0 as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5515) Build fails with spark-ganglia-lgpl profile

2015-02-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300819#comment-14300819
 ] 

Apache Spark commented on SPARK-5515:
-

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/4303

> Build fails with spark-ganglia-lgpl profile
> ---
>
> Key: SPARK-5515
> URL: https://issues.apache.org/jira/browse/SPARK-5515
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.3.0
>Reporter: Kousuke Saruta
>Priority: Blocker
>
> Build fails with spark-ganglia-lgpl profile at the moment. This is because 
> pom.xml for spark-ganglia-lgpl is not updated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5515) Build fails with spark-ganglia-lgpl profile

2015-02-01 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300818#comment-14300818
 ] 

Andrew Or commented on SPARK-5515:
--

https://github.com/apache/spark/pull/4303

> Build fails with spark-ganglia-lgpl profile
> ---
>
> Key: SPARK-5515
> URL: https://issues.apache.org/jira/browse/SPARK-5515
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.3.0
>Reporter: Kousuke Saruta
>Priority: Blocker
>
> Build fails with spark-ganglia-lgpl profile at the moment. This is because 
> pom.xml for spark-ganglia-lgpl is not updated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-5516) ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-22] shutting down ActorSystem [sparkDriver] java.lang.OutOfMemoryError: Java

2015-02-01 Thread wuyukai (JIRA)
wuyukai created SPARK-5516:
--

 Summary: ActorSystemImpl: Uncaught fatal error from thread 
[sparkDriver-akka.actor.default-dispatcher-22] shutting down ActorSystem 
[sparkDriver] java.lang.OutOfMemoryError: Java heap space
 Key: SPARK-5516
 URL: https://issues.apache.org/jira/browse/SPARK-5516
 Project: Spark
  Issue Type: Bug
  Components: MLlib
Affects Versions: 1.2.0
 Environment: centos 6.5   

Reporter: wuyukai
 Fix For: 1.2.2


When we ran the Gradient Boosted Trees model, it threw the exception 
below. The data we used is only 45 MB. We ran it on 4 machines, each of which 
has 4 cores and 16 GB RAM. We set the parameter 
"gradientboostedtrees.maxiteration" to 50.

15/02/01 01:39:48 INFO DAGScheduler: Job 965 failed: collectAsMap at 
DecisionTree.scala:653, took 1.616976 s
Exception in thread "main" org.apache.spark.SparkException: Job cancelled 
because SparkContext was shut down
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:702)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:701)
at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
at 
org.apache.spark.scheduler.DAGScheduler.cleanUpAfterSchedulerStop(DAGScheduler.scala:701)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessActor.postStop(DAGScheduler.scala:1428)
at akka.actor.Actor$class.aroundPostStop(Actor.scala:475)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessActor.aroundPostStop(DAGScheduler.scala:1375)
at 
akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:210)
at 
akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:172)
at akka.actor.ActorCell.terminate(ActorCell.scala:369)
at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:462)
at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)
at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
15/02/01 01:39:48 ERROR ActorSystemImpl: Uncaught fatal error from thread 
[sparkDriver-akka.actor.default-dispatcher-22] shutting down ActorSystem 
[sparkDriver]
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2271)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
at 
java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
at 
java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1876)
at 
java.io.ObjectOutputStream$BlockDataOutputStream.setBlockDataMode(ObjectOutputStream.java:1785)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1188)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
at scala.collection.immutable.$colon$colon.writeObject(List.scala:379)
at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1495)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStre

[jira] [Closed] (SPARK-4859) Refactor LiveListenerBus and StreamingListenerBus

2015-02-01 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-4859.

   Resolution: Fixed
Fix Version/s: 1.3.0
 Assignee: Shixiong Zhu

> Refactor LiveListenerBus and StreamingListenerBus
> -
>
> Key: SPARK-4859
> URL: https://issues.apache.org/jira/browse/SPARK-4859
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.0.0
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
> Fix For: 1.3.0
>
>
> [#4006|https://github.com/apache/spark/pull/4006] refactors LiveListenerBus 
> and StreamingListenerBus and extracts the common code into a parent class, 
> ListenerBus.
> It also includes bug fixes in 
> [#3710|https://github.com/apache/spark/pull/3710]:
> 1. Fix the race condition of queueFullErrorMessageLogged in LiveListenerBus 
> and StreamingListenerBus to avoid outputting queue-full-error logs multiple 
> times.
> 2. Make sure the SHUTDOWN message will be delivered to listenerThread, so 
> that we can make sure listenerThread will always be able to exit.
> 3. Log the error from listener rather than crashing listenerThread in 
> StreamingListenerBus.
> While fixing the above bugs, we found it's better to make LiveListenerBus and 
> StreamingListenerBus have the same behaviors. Otherwise there would be a lot of 
> duplicated code in LiveListenerBus and StreamingListenerBus.
> Therefore, I extracted their common code into ListenerBus as a parent class: 
> LiveListenerBus and StreamingListenerBus only need to extend ListenerBus and 
> implement onPostEvent (how to process an event) and onDropEvent (do something 
> when dropping an event).
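The extracted parent class described above has roughly this shape (a simplified sketch; the real class also manages the event queue, the SHUTDOWN message, and logging):

{code}
import java.util.concurrent.CopyOnWriteArrayList

import scala.collection.JavaConverters._

// Simplified sketch of the common listener-bus logic pulled into one parent class.
trait ListenerBus[L <: AnyRef, E] {
  private val listeners = new CopyOnWriteArrayList[L]

  def addListener(listener: L): Unit = listeners.add(listener)

  // Subclasses decide how a single event is delivered to a single listener...
  protected def onPostEvent(listener: L, event: E): Unit

  // ...and what to do when the internal queue is full and an event must be dropped.
  protected def onDropEvent(event: E): Unit

  // Deliver an event to all listeners, logging listener errors instead of crashing the thread.
  final def postToAll(event: E): Unit = {
    listeners.asScala.foreach { listener =>
      try onPostEvent(listener, event)
      catch { case t: Throwable => System.err.println(s"Listener $listener threw $t") }
    }
  }
}
{code}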



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5492) Thread statistics can break with older Hadoop versions

2015-02-01 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300814#comment-14300814
 ] 

Sandy Ryza commented on SPARK-5492:
---

After seeing this I tried with 1.0.4 and didn't hit anything. I guess the ec2 
setup is different in some way - I'll post a patch tonight.

> Thread statistics can break with older Hadoop versions
> --
>
> Key: SPARK-5492
> URL: https://issues.apache.org/jira/browse/SPARK-5492
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Patrick Wendell
>Assignee: Sandy Ryza
>Priority: Blocker
>
> {code}
>  java.lang.ClassNotFoundException: 
> org.apache.hadoop.fs.FileSystem$Statistics$StatisticsData
> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:191)
> at 
> org.apache.spark.deploy.SparkHadoopUtil.getFileSystemThreadStatisticsMethod(SparkHadoopUtil.scala:180)
> at 
> org.apache.spark.deploy.SparkHadoopUtil.getFSBytesReadOnThreadCallback(SparkHadoopUtil.scala:139)
> at 
> org.apache.spark.rdd.NewHadoopRDD$$anon$1$$anonfun$2.apply(NewHadoopRDD.scala:120)
> at 
> org.apache.spark.rdd.NewHadoopRDD$$anon$1$$anonfun$2.apply(NewHadoopRDD.scala:118)
> at scala.Option.orElse(Option.scala:257)
> {code}
> I think the issue is we need to catch ClassNotFoundException here:
> https://github.com/apache/spark/blob/b1b35ca2e440df40b253bf967bb93705d355c1c0/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L144
> However, I'm really confused how this didn't fail our unit tests, since we 
> explicitly tried to test this.
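The likely fix, sketched here only as an illustration of the reflective lookup (the getBytesRead method name is an assumption), is to treat a missing class like a missing method and fall back to not collecting the metric:

{code}
import java.lang.reflect.Method

// Illustrative only: older Hadoop versions lack FileSystem$Statistics$StatisticsData,
// so a ClassNotFoundException should disable the bytes-read callback rather than fail the task.
def tryGetThreadStatisticsMethod(): Option[Method] =
  try {
    val statisticsDataClass =
      Class.forName("org.apache.hadoop.fs.FileSystem$Statistics$StatisticsData")
    Some(statisticsDataClass.getDeclaredMethod("getBytesRead"))
  } catch {
    case _: ClassNotFoundException | _: NoSuchMethodException => None
  }
{code}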



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5155) Python API for MQTT streaming

2015-02-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300813#comment-14300813
 ] 

Apache Spark commented on SPARK-5155:
-

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/4303

> Python API for MQTT streaming
> -
>
> Key: SPARK-5155
> URL: https://issues.apache.org/jira/browse/SPARK-5155
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark, Streaming
>Reporter: Davies Liu
>
> Python API for MQTT Utils



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-5515) Build fails with spark-ganglia-lgpl profile

2015-02-01 Thread Kousuke Saruta (JIRA)
Kousuke Saruta created SPARK-5515:
-

 Summary: Build fails with spark-ganglia-lgpl profile
 Key: SPARK-5515
 URL: https://issues.apache.org/jira/browse/SPARK-5515
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 1.3.0
Reporter: Kousuke Saruta
Priority: Blocker


Build fails with spark-ganglia-lgpl profile at the moment. This is because 
pom.xml for spark-ganglia-lgpl is not updated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5492) Thread statistics can break with older Hadoop versions

2015-02-01 Thread Michael Armbrust (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300804#comment-14300804
 ] 

Michael Armbrust commented on SPARK-5492:
-

We hit this bug using the default hadoop version for the spark-ec2 scripts 
(1.0.4).

> Thread statistics can break with older Hadoop versions
> --
>
> Key: SPARK-5492
> URL: https://issues.apache.org/jira/browse/SPARK-5492
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Patrick Wendell
>Assignee: Sandy Ryza
>Priority: Blocker
>
> {code}
>  java.lang.ClassNotFoundException: 
> org.apache.hadoop.fs.FileSystem$Statistics$StatisticsData
> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:191)
> at 
> org.apache.spark.deploy.SparkHadoopUtil.getFileSystemThreadStatisticsMethod(SparkHadoopUtil.scala:180)
> at 
> org.apache.spark.deploy.SparkHadoopUtil.getFSBytesReadOnThreadCallback(SparkHadoopUtil.scala:139)
> at 
> org.apache.spark.rdd.NewHadoopRDD$$anon$1$$anonfun$2.apply(NewHadoopRDD.scala:120)
> at 
> org.apache.spark.rdd.NewHadoopRDD$$anon$1$$anonfun$2.apply(NewHadoopRDD.scala:118)
> at scala.Option.orElse(Option.scala:257)
> {code}
> I think the issue is we need to catch ClassNotFoundException here:
> https://github.com/apache/spark/blob/b1b35ca2e440df40b253bf967bb93705d355c1c0/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L144
> However, I'm really confused how this didn't fail our unit tests, since we 
> explicitly tried to test this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-5514) collect should call executeCollect

2015-02-01 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-5514:
--

 Summary: collect should call executeCollect
 Key: SPARK-5514
 URL: https://issues.apache.org/jira/browse/SPARK-5514
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 1.3.0
Reporter: Reynold Xin
Priority: Blocker






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5513) Add NMF option to the new ALS implementation

2015-02-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300787#comment-14300787
 ] 

Apache Spark commented on SPARK-5513:
-

User 'mengxr' has created a pull request for this issue:
https://github.com/apache/spark/pull/4302

> Add NMF option to the new ALS implementation
> 
>
> Key: SPARK-5513
> URL: https://issues.apache.org/jira/browse/SPARK-5513
> Project: Spark
>  Issue Type: New Feature
>  Components: ML, MLlib
>Reporter: Xiangrui Meng
>Assignee: Xiangrui Meng
>
> Then we can swap "spark.mllib"'s implementation to use the new ALS impl.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-5513) Add NMF option to the new ALS implementation

2015-02-01 Thread Xiangrui Meng (JIRA)
Xiangrui Meng created SPARK-5513:


 Summary: Add NMF option to the new ALS implementation
 Key: SPARK-5513
 URL: https://issues.apache.org/jira/browse/SPARK-5513
 Project: Spark
  Issue Type: New Feature
  Components: ML, MLlib
Reporter: Xiangrui Meng
Assignee: Xiangrui Meng


Then we can swap "spark.mllib"'s implementation to use the new ALS impl.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-5424) Make the new ALS implementation take generic ID types

2015-02-01 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-5424.
--
   Resolution: Fixed
Fix Version/s: 1.3.0

Issue resolved by pull request 4281
[https://github.com/apache/spark/pull/4281]

> Make the new ALS implementation take generic ID types
> -
>
> Key: SPARK-5424
> URL: https://issues.apache.org/jira/browse/SPARK-5424
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib, Spark ML
>Reporter: Xiangrui Meng
>Assignee: Xiangrui Meng
> Fix For: 1.3.0
>
>
> The new implementation uses local indices of users and items. So the input 
> user/item type could be generic, at least specialized for Int and Long. We 
> can expose the generic interface as a developer API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4705) Driver retries in yarn-cluster mode always fail if event logging is enabled

2015-02-01 Thread bc Wong (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300762#comment-14300762
 ] 

bc Wong commented on SPARK-4705:


[~twinkle], have you had a chance to work on this?

> Driver retries in yarn-cluster mode always fail if event logging is enabled
> ---
>
> Key: SPARK-4705
> URL: https://issues.apache.org/jira/browse/SPARK-4705
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, YARN
>Affects Versions: 1.2.0
>Reporter: Marcelo Vanzin
>
> yarn-cluster mode will retry running the driver in certain failure modes. If 
> event logging is enabled, this will most probably fail, because:
> {noformat}
> Exception in thread "Driver" java.io.IOException: Log directory 
> hdfs://vanzin-krb-1.vpc.cloudera.com:8020/user/spark/applicationHistory/application_1417554558066_0003
>  already exists!
> at org.apache.spark.util.FileLogger.createLogDir(FileLogger.scala:129)
> at org.apache.spark.util.FileLogger.start(FileLogger.scala:115)
> at 
> org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:74)
> at org.apache.spark.SparkContext.(SparkContext.scala:353)
> {noformat}
> The event log path should be "more unique". Or perhaps retries of the same app 
> should clean up the old logs first.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-1180) Allow to provide a custom persistence engine

2015-02-01 Thread Aaron Davidson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Davidson resolved SPARK-1180.
---
   Resolution: Duplicate
Fix Version/s: 1.3.0
 Assignee: Prashant Sharma

> Allow to provide a custom persistence engine
> 
>
> Key: SPARK-1180
> URL: https://issues.apache.org/jira/browse/SPARK-1180
> Project: Spark
>  Issue Type: Improvement
>Reporter: Jacek Lewandowski
>Assignee: Prashant Sharma
>Priority: Minor
> Fix For: 1.3.0
>
>
> Currently Spark supports only predefined ZOOKEEPER and FILESYSTEM persistence 
> engines. It would be nice to give a possibility to provide custom persistence 
> engine by specifying a class name in {{spark.deploy.recoveryMode}}.
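A rough sketch of how such pluggability could work (illustrative only; the mechanism that was eventually merged may differ), treating an unrecognized recoveryMode value as a fully qualified class name:

{code}
// Illustrative only: fall back to reflective instantiation for custom engines.
// Assumes the custom class has a no-arg constructor; a real implementation would
// more likely pass a SparkConf and serializer through a factory interface.
def createPersistenceEngine(recoveryMode: String): AnyRef = recoveryMode match {
  case "ZOOKEEPER"  => ??? // built-in ZooKeeper-backed engine
  case "FILESYSTEM" => ??? // built-in filesystem-backed engine
  case "NONE"       => ??? // black-hole engine that persists nothing
  case customClass  => Class.forName(customClass).newInstance().asInstanceOf[AnyRef]
}
{code}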



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-5207) StandardScalerModel mean and variance re-use

2015-02-01 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-5207.
--
   Resolution: Fixed
Fix Version/s: 1.3.0

Issue resolved by pull request 4140
[https://github.com/apache/spark/pull/4140]

> StandardScalerModel mean and variance re-use
> 
>
> Key: SPARK-5207
> URL: https://issues.apache.org/jira/browse/SPARK-5207
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Reporter: Octavian Geagla
>Assignee: Octavian Geagla
> Fix For: 1.3.0
>
>
> From this discussion: 
> http://apache-spark-developers-list.1001551.n3.nabble.com/Re-use-scaling-means-and-variances-from-StandardScalerModel-td10073.html
> Changing the constructor to public would be a simple change, but a discussion 
> is needed to determine what args are necessary for this change.
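For context, the kind of usage being asked for, as a hedged sketch: build a StandardScalerModel directly from previously computed statistics instead of refitting. The constructor arguments shown (std, mean) are an assumption; settling the exact argument list is what this ticket discusses.

{code}
import org.apache.spark.mllib.feature.StandardScalerModel
import org.apache.spark.mllib.linalg.Vectors

// Statistics computed earlier (for example, on a training set).
val std  = Vectors.dense(1.5, 0.5)
val mean = Vectors.dense(10.0, -3.0)

// Assumed public constructor; re-uses the statistics without refitting.
val scaler = new StandardScalerModel(std, mean)
val scaled = scaler.transform(Vectors.dense(11.5, -2.5))
{code}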



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2356) Exception: Could not locate executable null\bin\winutils.exe in the Hadoop

2015-02-01 Thread DeepakVohra (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300243#comment-14300243
 ] 

DeepakVohra commented on SPARK-2356:


Thanks Sean. 

HADOOP_CONF_DIR shouldn't be required to be set if Hadoop is not used. 

Hadoop doesn't even get installed on Windows.

> Exception: Could not locate executable null\bin\winutils.exe in the Hadoop 
> ---
>
> Key: SPARK-2356
> URL: https://issues.apache.org/jira/browse/SPARK-2356
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.0
>Reporter: Kostiantyn Kudriavtsev
>Priority: Critical
>
> I'm trying to run some transformations on Spark. They work fine on a cluster 
> (YARN, Linux machines). However, when I try to run them on a local machine 
> (Windows 7) under a unit test, I get errors (I don't use Hadoop; I read files 
> from the local filesystem):
> {code}
> 14/07/02 19:59:31 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 14/07/02 19:59:31 ERROR Shell: Failed to locate the winutils binary in the 
> hadoop binary path
> java.io.IOException: Could not locate executable null\bin\winutils.exe in the 
> Hadoop binaries.
>   at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318)
>   at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333)
>   at org.apache.hadoop.util.Shell.(Shell.java:326)
>   at org.apache.hadoop.util.StringUtils.(StringUtils.java:76)
>   at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93)
>   at org.apache.hadoop.security.Groups.(Groups.java:77)
>   at 
> org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:240)
>   at 
> org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:255)
>   at 
> org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:283)
>   at 
> org.apache.spark.deploy.SparkHadoopUtil.(SparkHadoopUtil.scala:36)
>   at 
> org.apache.spark.deploy.SparkHadoopUtil$.(SparkHadoopUtil.scala:109)
>   at 
> org.apache.spark.deploy.SparkHadoopUtil$.(SparkHadoopUtil.scala)
>   at org.apache.spark.SparkContext.(SparkContext.scala:228)
>   at org.apache.spark.SparkContext.(SparkContext.scala:97)
> {code}
> This happens because the Hadoop config is initialized each time a Spark 
> context is created, regardless of whether Hadoop is required or not.
> I propose adding a special flag to indicate whether the Hadoop config is 
> required (or starting this configuration manually).
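Independently of the flag proposed above, a commonly used workaround on Windows (a hedged sketch, not part of this ticket's fix) is to point Hadoop's shell utilities at a local winutils.exe before creating the SparkContext. The path below is a placeholder and must contain bin\winutils.exe.

{code}
// Workaround sketch for local Windows tests; C:\hadoop is a placeholder path.
System.setProperty("hadoop.home.dir", "C:\\hadoop")

val sc = new org.apache.spark.SparkContext("local[2]", "unit-test")
// ... run the test, then sc.stop() ...
{code}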



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5512) Run the PIC algorithm with degree vector

2015-02-01 Thread Liang-Chi Hsieh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang-Chi Hsieh updated SPARK-5512:
---
Priority: Minor  (was: Major)

> Run the PIC algorithm with degree vector
> 
>
> Key: SPARK-5512
> URL: https://issues.apache.org/jira/browse/SPARK-5512
> Project: Spark
>  Issue Type: Improvement
>Reporter: Liang-Chi Hsieh
>Priority: Minor
>
> As suggested in the Power Iteration Clustering paper, it is useful to set the 
> initial vector v0 to the degree vector d. This PR adds a method to run PIC 
> with that initialization.
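A minimal, non-distributed sketch of the proposed initialization (plain Scala on a small dense affinity matrix, not the actual GraphX-based implementation): v0 is the normalized degree vector, followed by standard power iteration updates.

{code}
// Sketch: power iteration with the degree vector as the initial vector v0.
def powerIteration(w: Array[Array[Double]], iters: Int): Array[Double] = {
  val n = w.length
  val degrees = w.map(_.sum)                  // d(i) = sum_j W(i)(j)
  var v = degrees.map(_ / degrees.sum)        // v0 = d / ||d||_1
  for (_ <- 1 to iters) {
    val wv = Array.tabulate(n) { i => (0 until n).map(j => w(i)(j) * v(j)).sum }
    val norm = wv.map(math.abs).sum
    v = wv.map(_ / norm)                      // v(t+1) = W v(t) / ||W v(t)||_1
  }
  v
}
{code}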



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5512) Run the PIC algorithm with degree vector

2015-02-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300191#comment-14300191
 ] 

Apache Spark commented on SPARK-5512:
-

User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/4301

> Run the PIC algorithm with degree vector
> 
>
> Key: SPARK-5512
> URL: https://issues.apache.org/jira/browse/SPARK-5512
> Project: Spark
>  Issue Type: Improvement
>Reporter: Liang-Chi Hsieh
>Priority: Minor
>
> As suggested in the Power Iteration Clustering paper, it is useful to set the 
> initial vector v0 to the degree vector d. This PR adds a method to run PIC 
> with that initialization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-5512) Run the PIC algorithm with degree vector

2015-02-01 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-5512:
--

 Summary: Run the PIC algorithm with degree vector
 Key: SPARK-5512
 URL: https://issues.apache.org/jira/browse/SPARK-5512
 Project: Spark
  Issue Type: Improvement
Reporter: Liang-Chi Hsieh


As suggested in the Power Iteration Clustering paper, it is useful to set the 
initial vector v0 to the degree vector d. This PR adds a method to run PIC with 
that initialization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-5510) How can I fix the spark-submit script and then running the program on cluster ?

2015-02-01 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen closed SPARK-5510.


> How can I fix the spark-submit script and then running the program on cluster 
> ?
> ---
>
> Key: SPARK-5510
> URL: https://issues.apache.org/jira/browse/SPARK-5510
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 1.0.2
>Reporter: hash-x
>  Labels: Help!!, spark-submit
>
> Reference: My question is how I can fix the script so that I can submit the 
> program to a Master from my laptop, rather than from a cluster node. 
> Submitting the program from Node 2 works for me, but submitting from the 
> laptop does not. How can I fix this?
> I have looked at the email quoted below and accept recommendation 1 (run 
> spark-shell from a cluster node), but I want to solve the problem with 
> recommendation 2, and I am confused about how to do that.
> Hi Ken,
> This is unfortunately a limitation of spark-shell and the way it works in 
> standalone mode. spark-shell sets an environment variable, SPARK_HOME, which 
> tells Spark where to find its code installed on the cluster. This means that 
> the path on your laptop must be the same as on the cluster, which is not the 
> case. I recommend one of two things:
> 1) Either run spark-shell from a cluster node, where it will have the right 
> path. (In general it's also better for performance to have it close to the 
> cluster.)
> 2) Or, edit the spark-shell script and re-export SPARK_HOME right before it 
> runs the Java command (ugly but will probably work).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-5510) How can I fix the spark-submit script and then running the program on cluster ?

2015-02-01 Thread hash-x (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hash-x updated SPARK-5510:
--
Comment: was deleted

(was: Mailing list? What is the site? Could you point me to it? OK, thank you! 
I am a beginner at Spark and Scala.)

> How can I fix the spark-submit script and then running the program on cluster 
> ?
> ---
>
> Key: SPARK-5510
> URL: https://issues.apache.org/jira/browse/SPARK-5510
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 1.0.2
>Reporter: hash-x
>  Labels: Help!!, spark-submit
>
> Reference: My question is how I can fix the script so that I can submit the 
> program to a Master from my laptop, rather than from a cluster node. 
> Submitting the program from Node 2 works for me, but submitting from the 
> laptop does not. How can I fix this?
> I have looked at the email quoted below and accept recommendation 1 (run 
> spark-shell from a cluster node), but I want to solve the problem with 
> recommendation 2, and I am confused about how to do that.
> Hi Ken,
> This is unfortunately a limitation of spark-shell and the way it works in 
> standalone mode. spark-shell sets an environment variable, SPARK_HOME, which 
> tells Spark where to find its code installed on the cluster. This means that 
> the path on your laptop must be the same as on the cluster, which is not the 
> case. I recommend one of two things:
> 1) Either run spark-shell from a cluster node, where it will have the right 
> path. (In general it's also better for performance to have it close to the 
> cluster.)
> 2) Or, edit the spark-shell script and re-export SPARK_HOME right before it 
> runs the Java command (ugly but will probably work).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5510) How can I fix the spark-submit script and then running the program on cluster ?

2015-02-01 Thread hash-x (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300167#comment-14300167
 ] 

hash-x commented on SPARK-5510:
---

Mailing list? What is the site? Could you point me to it? OK, thank you! I am 
a beginner at Spark and Scala.

> How can I fix the spark-submit script and then running the program on cluster 
> ?
> ---
>
> Key: SPARK-5510
> URL: https://issues.apache.org/jira/browse/SPARK-5510
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 1.0.2
>Reporter: hash-x
>  Labels: Help!!, spark-submit
>
> Reference: My question is how I can fix the script so that I can submit the 
> program to a Master from my laptop, rather than from a cluster node. 
> Submitting the program from Node 2 works for me, but submitting from the 
> laptop does not. How can I fix this?
> I have looked at the email quoted below and accept recommendation 1 (run 
> spark-shell from a cluster node), but I want to solve the problem with 
> recommendation 2, and I am confused about how to do that.
> Hi Ken,
> This is unfortunately a limitation of spark-shell and the way it works in 
> standalone mode. spark-shell sets an environment variable, SPARK_HOME, which 
> tells Spark where to find its code installed on the cluster. This means that 
> the path on your laptop must be the same as on the cluster, which is not the 
> case. I recommend one of two things:
> 1) Either run spark-shell from a cluster node, where it will have the right 
> path. (In general it's also better for performance to have it close to the 
> cluster.)
> 2) Or, edit the spark-shell script and re-export SPARK_HOME right before it 
> runs the Java command (ugly but will probably work).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


