Re: EC2 Simple Cluster

2014-06-03 Thread Akhil Das
Hi Gianluca, I believe your cluster setup wasn't complete. Do check the ec2 script's console output for more details. Also, micro instances have only about 600 MB of memory. Thanks Best Regards On Tue, Jun 3, 2014 at 1:59 AM, Gianluca Privitera gianluca.privite...@studio.unibo.it wrote: Hi everyone,

Re: Using String Dataset for Logistic Regression

2014-06-03 Thread praveshjain1991
I am not sure. I have just been using some numerical datasets.

Re: pyspark problems on yarn (job not parallelized, and Py4JJavaError)

2014-06-03 Thread Andrew Or
I asked several people, no one seems to believe that we can do this: $ PYTHONPATH=/path/to/assembly/jar python import pyspark That is because people usually don't package python files into their jars. For pyspark, however, this will work as long as the jar can be opened and its contents can

Re: Using String Dataset for Logistic Regression

2014-06-03 Thread Xiangrui Meng
Yes. MLlib 1.0 supports sparse input data for linear methods. -Xiangrui On Mon, Jun 2, 2014 at 11:36 PM, praveshjain1991 praveshjain1...@gmail.com wrote: I am not sure. I have just been using some numerical datasets. -- View this message in context:
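
A minimal Scala sketch of the sparse-input path Xiangrui mentions, assuming MLlib 1.0's Vectors.sparse / LabeledPoint API, an existing SparkContext named sc, and made-up feature sizes and values:

    import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.regression.LabeledPoint

    // Two toy examples in a 1000-dimensional space, each with only a few non-zero features.
    val training = sc.parallelize(Seq(
      LabeledPoint(1.0, Vectors.sparse(1000, Array(10, 200), Array(1.0, 3.5))),
      LabeledPoint(0.0, Vectors.sparse(1000, Array(5, 999), Array(2.0, 0.5)))))

    val model = LogisticRegressionWithSGD.train(training, 100) // 100 iterations

String or categorical features still need to be encoded into such numeric vectors first; the linear methods only consume vectors of doubles.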

WebUI's Application count doesn't get updated

2014-06-03 Thread MrAsanjar .
Hi all, the Application running and completed counts do not get updated; they are always zero. I have run the SparkPi application at least 10 times. Please help. - *Workers:* 3 - *Cores:* 24 Total, 0 Used - *Memory:* 43.7 GB Total, 0.0 B Used - *Applications:* 0 Running, 0

Need equallyWeightedPartitioner Algorithm

2014-06-03 Thread Joe L
I need to partition my data into equally weighted partitions: suppose I have 20GB of data and I want 4 partitions where each partition holds 5GB of the data. Thanks
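
There is no built-in byte-weighted partitioner; a rough Scala sketch of the usual approximations, assuming existing RDDs named data (unkeyed) and keyedData (key-value pairs):

    // Full shuffle into 4 partitions with roughly equal record counts (not exact byte sizes).
    val balanced = data.repartition(4)

    // For a key-value RDD, a HashPartitioner spreads keys evenly across 4 partitions.
    import org.apache.spark.HashPartitioner
    val byKey = keyedData.partitionBy(new HashPartitioner(4))

Hitting exactly 5GB per partition would need a custom Partitioner driven by known record sizes.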

Re: WebUI's Application count doesn't get updated

2014-06-03 Thread Andrew Ash
Your applications are probably not connecting to your existing cluster and instead running in local mode. Are you passing the master URL to the SparkPi application? Andrew On Tue, Jun 3, 2014 at 12:30 AM, MrAsanjar . afsan...@gmail.com wrote: - HI all, - Application running and

Re: WebUI's Application count doesn't get updated

2014-06-03 Thread MrAsanjar .
Thanks for your reply Andrew. I am running applications directly on the master node. My cluster also contains three worker nodes, all visible in the WebUI. Spark Master at spark://sanjar-local-machine-1:7077 - *URL:* spark://sanjar-local-machine-1:7077 - *Workers:* 3 - *Cores:* 24 Total,

Re: WebUI's Application count doesn't get updated

2014-06-03 Thread Akhil Das
As Andrew said, your application is not running against the standalone cluster. You need to pass MASTER=spark://sanjar-local-machine-1:7077 before running your SparkPi example. Thanks Best Regards On Tue, Jun 3, 2014 at 1:12 PM, MrAsanjar . afsan...@gmail.com wrote: Thanks for your reply Andrew. I am
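
Equivalently, for your own standalone application (the bundled examples read the MASTER environment variable instead), the master URL can be set on the SparkConf; the app name and URL below simply mirror this thread:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("SparkPi")
      .setMaster("spark://sanjar-local-machine-1:7077") // connect to the cluster instead of local mode
    val sc = new SparkContext(conf)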

Re: WebUI's Application count doesn't get updated

2014-06-03 Thread MrAsanjar .
Thanks guys, that fixed my problem. As you might have noticed, I am VERY new to Spark. Building a Spark cluster using LXC has been a challenge. On Tue, Jun 3, 2014 at 2:49 AM, Akhil Das ak...@sigmoidanalytics.com wrote: As Andrew said, your application is not running against the standalone cluster. You need

Re: How can I make Spark 1.0 saveAsTextFile to overwrite existing file

2014-06-03 Thread Sean Owen
Ah, the output directory check was just not executed in the past. I thought it deleted the files. A third way indeed. FWIW I also think (B) is best. (A) and (C) both have their risks, but if they're non-default and everyone's willing to entertain a new arg to the API method, sure. (A) seems more

Reg: Add/Remove slave nodes spark-ec2

2014-06-03 Thread Sirisha Devineni
Hi All, I have created a spark cluster on EC2 using spark-ec2 script. Whenever more data is there to be processed I would like to add new slaves to existing cluster and would like to remove slave node when the data to be processed is low. It seems currently spark-ec2 doesn't have option to

Spark block manager registration extreme slow

2014-06-03 Thread Denes
Hi, My Spark installations (both 0.9.1 and 1.0.0) starts up extremely slow when starting a simple Spark Streaming job. I have to wait 6 (!) minutes at INFO storage.BlockManagerMasterActor$BlockManagerInfo: Registering block manager stage and another 4 (!) minutes at INFO util.MetadataCleaner:

Kryo deserialisation error

2014-06-03 Thread Denes
I tried to use Kryo as a serialiser in spark streaming, did everything according to the guide posted on the spark website, i.e. added the following lines: conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer"); conf.set("spark.kryo.registrator", "MyKryoRegistrator"); I also added
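
For reference, a minimal Scala version of that setup, using a made-up MyRecord class purely for illustration; the property keys are the standard spark.serializer and spark.kryo.registrator:

    import com.esotericsoftware.kryo.Kryo
    import org.apache.spark.SparkConf
    import org.apache.spark.serializer.KryoRegistrator

    case class MyRecord(id: Long, name: String) // hypothetical class to register

    class MyKryoRegistrator extends KryoRegistrator {
      override def registerClasses(kryo: Kryo) {
        kryo.register(classOf[MyRecord])
      }
    }

    val conf = new SparkConf()
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryo.registrator", "MyKryoRegistrator") // use the fully-qualified name if the class is in a package

The registrator class also has to be on the executors' classpath (for example inside the application jar); a registrator that cannot be loaded on the workers is one common cause of Kryo errors.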

Re: Yay for 1.0.0! EC2 Still has problems.

2014-06-03 Thread Jeremy Lee
Thanks for that, Matei! I'll look at that once I get a spare moment. :-) If you like, I'll keep documenting my newbie problems and frustrations... perhaps it might make things easier for others. Another issue I seem to have found (now that I can get small clusters up): some of the examples (the

Re: Is uberjar a recommended way of running Spark/Scala applications?

2014-06-03 Thread Pierre Borckmans
You might want to look at another great plugin: “sbt-pack” https://github.com/xerial/sbt-pack. It collects all the dependency JARs and creates launch scripts for *nix (including Mac OS) and Windows. HTH Pierre On 02 Jun 2014, at 17:29, Andrei faithlessfri...@gmail.com wrote: Thanks!

Error related to serialisation in spark streaming

2014-06-03 Thread nilmish
I am using the following code segment: countPerWindow.foreachRDD(new Function<JavaPairRDD<String, Long>, Void>() { @Override public Void call(JavaPairRDD<String, Long> rdd) throws Exception { Comparator<Tuple2<String, Long>> comp = new

Reconnect to an application/RDD

2014-06-03 Thread Oleg Proudnikov
Hi All, Is it possible to run a standalone app that would compute and persist/cache an RDD and then run other standalone apps that would gain access to that RDD? -- Thank you, Oleg

Re: Error related to serialisation in spark streaming

2014-06-03 Thread Sean Owen
Sorry if I'm dense, but is OptimisingSort your class? It's saying you have included something from it in a function that is shipped off to remote workers, but something in it is not java.io.Serializable. OptimisingSort$6$1 needs to be Serializable. On Tue, Jun 3, 2014 at 2:23 PM, nilmish

Prepare spark executor

2014-06-03 Thread yaoxin
Hi, Is there any way to prepare a Spark executor? Like what we do in MapReduce, where we implement a setup and a cleanup method. For my case, I need this prepare method to init StaticParser based on the env (dev, production). Then I can directly use this StaticParser on the executor, like this: object
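
A common Scala workaround, sketched here with a hypothetical Parser class and an existing RDD of lines named rdd: keep the instance in a singleton object so it is built at most once per executor JVM, and pass the environment in through the task closure:

    class Parser(env: String) {
      def parse(line: String): String = s"[$env] $line" // stand-in for the real parsing logic
    }

    object StaticParser {
      @volatile private var instance: Parser = _
      def get(env: String): Parser = {
        if (instance == null) synchronized {
          if (instance == null) instance = new Parser(env) // runs at most once per executor JVM
        }
        instance
      }
    }

    val env = "production" // shipped to the executors inside the task closure
    val parsed = rdd.mapPartitions { iter =>
      val parser = StaticParser.get(env)
      iter.map(line => parser.parse(line))
    }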

Re: Reconnect to an application/RDD

2014-06-03 Thread Gerard Maas
I don't think that's supported by default: when the standalone context closes, the related RDDs will be GC'ed. You should explore Spark Job Server, which allows you to cache RDDs by name and reuse them within a context. https://github.com/ooyala/spark-jobserver -kr, Gerard. On Tue, Jun 3,

Spark not working with mesos

2014-06-03 Thread praveshjain1991
I set up Spark-0.9.1 to run on mesos-0.13.0 using the steps mentioned here https://spark.apache.org/docs/0.9.1/running-on-mesos.html . The Mesos UI is showing two workers registered. I want to run these commands on Spark-shell scala val data = 1 to 1 data:

spark 1.0 not using properties file from SPARK_CONF_DIR

2014-06-03 Thread Eugen Cepoi
Is it on purpose that when setting SPARK_CONF_DIR spark-submit still loads the properties file from SPARK_HOME/conf/spark-defaults.conf? IMO it would be more natural to override what is defined in SPARK_HOME/conf by SPARK_CONF_DIR when defined (and SPARK_CONF_DIR being overridden by command line

Re: Spark not working with mesos

2014-06-03 Thread Akhil Das
1. Make sure the spark-*.tgz that you created with make_distribution.sh is accessible by all the slave nodes. 2. Check the worker node logs. Thanks Best Regards On Tue, Jun 3, 2014 at 8:13 PM, praveshjain1991 praveshjain1...@gmail.com wrote: I set up Spark-0.9.1 to run on mesos-0.13.0

---cores option in spark-shell

2014-06-03 Thread Marek Wiewiorka
Hi All, the Spark 1.0.0 documentation says there is an option --cores that one can use to set the number of cores that spark-shell uses on the cluster: "You can also pass an option --cores numCores to control the number of cores that spark-shell uses on the cluster." This

Re: ---cores option in spark-shell

2014-06-03 Thread Matt Kielo
I haven't been able to set the cores with that option in Spark 1.0.0 either. To work around that, setting the environment variable SPARK_JAVA_OPTS=-Dspark.cores.max=numCores seems to do the trick. Matt Kielo Data Scientist Oculus Info Inc. On Tue, Jun 3, 2014 at 11:15 AM, Marek Wiewiorka
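
For a standalone application (as opposed to spark-shell itself), the same property can also be set in code; a small sketch with a made-up app name and master URL:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("capped-cores-app")
      .setMaster("spark://master:7077")
      .set("spark.cores.max", "8") // the same property SPARK_JAVA_OPTS sets via -Dspark.cores.max
    val sc = new SparkContext(conf)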

Re: ---cores option in spark-shell

2014-06-03 Thread Marek Wiewiorka
That used to work with version 0.9.1 and earlier and does not seem to work with 1.0.0. M. 2014-06-03 17:53 GMT+02:00 Mikhail Strebkov streb...@gmail.com: Try -c numCores instead, works for me, e.g. bin/spark-shell -c 88 On Tue, Jun 3, 2014 at 8:15 AM, Marek Wiewiorka

Re: Re: how to construct a ClassTag object as a method parameter in Java

2014-06-03 Thread Michael Armbrust
Ah, this is a bug that was fixed in 1.0. I think you should be able to work around it by using a fake class tag: scala.reflect.ClassTag$.MODULE$.AnyRef() On Mon, Jun 2, 2014 at 8:22 PM, bluejoe2008 bluejoe2...@gmail.com wrote: spark 0.9.1 textInput is a JavaRDD object i am programming in

Re: Using MLLib in Scala

2014-06-03 Thread Xiangrui Meng
Hi Suela, (Please subscribe to our user mailing list and send your questions there in the future.) For your case, each file contains a column of numbers. So you can use `sc.textFile` to read them first, zip them together, and then create labeled points: val xx = sc.textFile("/path/to/ex2x.dat").map(x
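
Spelled out as a small Scala sketch (the paths are placeholders, and zip assumes the two RDDs end up with the same number of partitions and elements per partition):

    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.regression.LabeledPoint

    val xx = sc.textFile("/path/to/ex2x.dat").map(_.trim.toDouble)
    val yy = sc.textFile("/path/to/ex2y.dat").map(_.trim.toDouble)
    val points = xx.zip(yy).map { case (x, y) => LabeledPoint(y, Vectors.dense(x)) }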

Spark 1.0.0 fails if mesos.coarse set to true

2014-06-03 Thread Marek Wiewiorka
Hi All, I'm trying to run my code that used to work with mesos-0.14 and spark-0.9.0 with mesos-0.18.2 and spark-1.0.0. and I'm getting a weird error when I use coarse mode (see below). If I use the fine-grained mode everything is ok. Has anybody of you experienced a similar error? more stderr

Re: wholeTextFiles() : java.lang.IncompatibleClassChangeError: Found class org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected

2014-06-03 Thread Sean Owen
"Found class org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected" is the classic error meaning you compiled against Hadoop 1 but are running against Hadoop 2. I think you need to override the hadoop-client artifact that Spark depends on to be a Hadoop 2.x version. On Tue,

wholeTextFiles() : java.lang.IncompatibleClassChangeError: Found class org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected

2014-06-03 Thread toivoa
Hi, I set up a project under Eclipse using Maven: <dependency> <groupId>org.apache.spark</groupId> <artifactId>spark-core_2.10</artifactId> <version>1.0.0</version> </dependency> A simple example fails: def main(args: Array[String]): Unit = {

Re: wholeTextFiles() : java.lang.IncompatibleClassChangeError: Found class org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected

2014-06-03 Thread toivoa
Wow! What a quick reply! Adding <dependency> <groupId>org.apache.hadoop</groupId> <artifactId>hadoop-client</artifactId> <version>2.4.0</version> </dependency> solved the problem. But now I get 14/06/03 19:52:50 ERROR Shell: Failed to locate

mounting SSD devices of EC2 r3.8xlarge instances

2014-06-03 Thread Andras Barjak
Hi, I have noticed that upon launching a cluster consisting of r3.8xlarge high-memory instances, the standard /mnt /mnt2 /mnt3 /mnt4 temporary directories get created and set up for temp usage; however, they point to the root 8 GB filesystem. The 2x320 GB SSDs are not mounted and also they are

Re: How to create RDDs from another RDD?

2014-06-03 Thread Gerard Maas
Hi Andrew, Thanks for your answer. The reason of the question: I've been trying to contribute to the community by helping answering Spark-related questions on Stack Overflow. (note on that: Given the growing volume on the user list lately, I think it will need to scale out to other venues, so

Re: spark 1.0 not using properties file from SPARK_CONF_DIR

2014-06-03 Thread Patrick Wendell
You can set an arbitrary properties file by adding --properties-file argument to spark-submit. It would be nice to have spark-submit also look in SPARK_CONF_DIR as well by default. If you opened a JIRA for that I'm sure someone would pick it up. On Tue, Jun 3, 2014 at 7:47 AM, Eugen Cepoi

Re: wholeTextFiles() : java.lang.IncompatibleClassChangeError: Found class org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected

2014-06-03 Thread Sean Owen
I'd try the internet / SO first -- these are actually generic Hadoop-related issues. Here I think you don't have HADOOP_HOME or similar set. http://stackoverflow.com/questions/19620642/failed-to-locate-the-winutils-binary-in-the-hadoop-binary-path On Tue, Jun 3, 2014 at 5:54 PM, toivoa

Problems with connecting Spark to Hive

2014-06-03 Thread Lars Selsaas
Hi, I've installed Spark 1.0.0 on HDP 2.1. I moved the hive-site.xml file into the conf directory for Spark in an attempt to connect Spark with my existing Hive. Below is the full log from when I start Spark until I get the error. It seems to be building the assembly with Hive, so that part

Re: Failed to remove RDD error

2014-06-03 Thread Tathagata Das
It was not intended to be experimental, as this improves general performance. We have tested the feature since 0.9 and didn't see any problems. We need to investigate the cause of this. Can you give us the logs showing this error so that we can analyze it? TD On Tue, Jun 3, 2014 at 10:08 AM,

Re: wholeTextFiles() : java.lang.IncompatibleClassChangeError: Found class org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected

2014-06-03 Thread Matei Zaharia
Yeah unfortunately Hadoop 2 requires these binaries on Windows. Hadoop 1 runs just fine without them. Matei On Jun 3, 2014, at 10:33 AM, Sean Owen so...@cloudera.com wrote: I'd try the internet / SO first -- these are actually generic Hadoop-related issues. Here I think you don't have

Strange problem with saveAsTextFile after upgrade Spark 0.9.0-1.0.0

2014-06-03 Thread Marek Wiewiorka
Hi All, I've been experiencing a very strange error after upgrading from Spark 0.9 to 1.0 - it seems that the saveAsTextFile function is throwing java.lang.UnsupportedOperationException, which I have never seen before. Any hints appreciated. scheduler.TaskSetManager: Loss was due to

Re: Strange problem with saveAsTextFile after upgrade Spark 0.9.0-1.0.0

2014-06-03 Thread Gerard Maas
Have you tried re-compiling your job against the 1.0 release? On Tue, Jun 3, 2014 at 8:46 PM, Marek Wiewiorka marek.wiewio...@gmail.com wrote: Hi All, I've been experiencing a very strange error after upgrade from Spark 0.9 to 1.0 - it seems that saveAsTestFile function is throwing

Re: NoSuchElementException: key not found

2014-06-03 Thread Tathagata Das
I think I know what is going on! This is probably a race condition in the DAGScheduler. I have added a JIRA for this. The fix is not trivial though. https://issues.apache.org/jira/browse/SPARK-2002 A not-so-good workaround for now would be to not use coalesced RDDs, which avoids the race condition.

Re: Problems with connecting Spark to Hive

2014-06-03 Thread Yin Huai
Hello Lars, Can you check the value of hive.security.authenticator.manager in hive-site.xml? I guess the value is org.apache.hadoop.hive.ql.security.ProxyUserAuthenticator. This class was introduced in hive 0.13, but Spark SQL is based on hive 0.12 right now. Can you change the value of

SchemaRDD's saveAsParquetFile() throws java.lang.IncompatibleClassChangeError

2014-06-03 Thread k.tham
I'm trying to save an RDD as a parquet file through the saveAsParquetFile() API, with code that looks something like: val sc = ... val sqlContext = new org.apache.spark.sql.SQLContext(sc) import sqlContext._ val someRDD: RDD[SomeCaseClass] = ... someRDD.saveAsParquetFile("someRDD.parquet")

Re: SchemaRDD's saveAsParquetFile() throws java.lang.IncompatibleClassChangeError

2014-06-03 Thread Michael Armbrust
This thread seems to be about the same issue: https://www.mail-archive.com/user@spark.apache.org/msg04403.html On Tue, Jun 3, 2014 at 12:25 PM, k.tham kevins...@gmail.com wrote: I'm trying to save an RDD as a parquet file through the saveAsParquestFile() api, With code that looks something

Re: How to create RDDs from another RDD?

2014-06-03 Thread Andrew Ash
Hmm that sounds like it could be done in a custom OutputFormat, but I'm not familiar enough with custom OutputFormats to say that's the right thing to do. On Tue, Jun 3, 2014 at 10:23 AM, Gerard Maas gerard.m...@gmail.com wrote: Hi Andrew, Thanks for your answer. The reason of the

Re: SchemaRDD's saveAsParquetFile() throws java.lang.IncompatibleClassChangeError

2014-06-03 Thread k.tham
Oh, I missed that thread. Thanks

Re: Error related to serialisation in spark streaming

2014-06-03 Thread Mayur Rustagi
So are you using Java 7 or 8? Java 7 doesn't clean closures properly, so you need to define a static class as the function and then call that in your operations. Otherwise it'll try to send the whole enclosing class along with the function. Mayur Rustagi Ph: +1 (760) 203 3257 http://www.sigmoidanalytics.com @mayur_rustagi

Re: Problems with connecting Spark to Hive

2014-06-03 Thread Lars Selsaas
Thanks a lot, That worked great! Thanks, Lars On Tue, Jun 3, 2014 at 12:17 PM, Yin Huai huaiyin@gmail.com wrote: Hello Lars, Can you check the value of hive.security.authenticator.manager in hive-site.xml? I guess the value is

Re: Reg: Add/Remove slave nodes spark-ec2

2014-06-03 Thread Mayur Rustagi
You'll have to restart the cluster: create a copy of your existing slave, add it to the slaves file on the master, and restart the cluster. Mayur Rustagi Ph: +1 (760) 203 3257 http://www.sigmoidanalytics.com @mayur_rustagi https://twitter.com/mayur_rustagi On Tue, Jun 3, 2014 at 4:30 PM, Sirisha Devineni

Re: WebUI's Application count doesn't get updated

2014-06-03 Thread Mayur Rustagi
Did you use Docker or plain LXC specifically? Mayur Rustagi Ph: +1 (760) 203 3257 http://www.sigmoidanalytics.com @mayur_rustagi https://twitter.com/mayur_rustagi On Tue, Jun 3, 2014 at 1:40 PM, MrAsanjar . afsan...@gmail.com wrote: Thanks guys, that fixed my problem. As you might have

Re: Strange problem with saveAsTextFile after upgrade Spark 0.9.0-1.0.0

2014-06-03 Thread Marek Wiewiorka
Yes, I have - I compiled both Spark and my software from source - actually the whole processing executes fine; just saving the results is failing. 2014-06-03 21:01 GMT+02:00 Gerard Maas gerard.m...@gmail.com: Have you tried re-compiling your job against the 1.0 release? On Tue, Jun 3, 2014

Re: SchemaRDD's saveAsParquetFile() throws java.lang.IncompatibleClassChangeError

2014-06-03 Thread k.tham
I've read through that thread, and it seems for him, he needed to add a particular hadoop-client dependency. However, I don't think I should be required to do that as I'm not reading from HDFS. I'm just running a straight up minimal example, in local mode, and out of the box. Here's an example

Re: SchemaRDD's saveAsParquetFile() throws java.lang.IncompatibleClassChangeError

2014-06-03 Thread Sean Owen
All of that support code uses Hadoop-related classes, like OutputFormat, to do the writing to Parquet format. There's a Hadoop code dependency in play here even if the bytes aren't going to HDFS. On Tue, Jun 3, 2014 at 10:10 PM, k.tham kevins...@gmail.com wrote: I've read through that thread,
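
The fix that resolved the earlier wholeTextFiles() thread (pinning hadoop-client) can be expressed the same way in an sbt build; the versions below are only an example and should match the Hadoop your cluster and Spark build actually use:

    // build.sbt
    libraryDependencies ++= Seq(
      "org.apache.spark"  %% "spark-core"    % "1.0.0",
      "org.apache.spark"  %% "spark-sql"     % "1.0.0",
      "org.apache.hadoop" %  "hadoop-client" % "2.4.0" // align with the Hadoop version Spark was built against
    )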

Re: RDD with a Map

2014-06-03 Thread Doris Xin
Hey Amit, You might want to check out PairRDDFunctions http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.PairRDDFunctions. For your use case in particular, you can load the file as an RDD[(String, String)] and then use the groupByKey() function in PairRDDFunctions to
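
A minimal Scala sketch of that approach, assuming a hypothetical tab-separated two-column file:

    import org.apache.spark.SparkContext._ // brings the PairRDDFunctions implicits into scope for compiled code

    val pairs = sc.textFile("/path/to/data.tsv").map { line =>
      val Array(k, v) = line.split("\t", 2)
      (k, v)
    }
    val grouped = pairs.groupByKey() // RDD[(String, Iterable[String])]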

Re: Having spark-ec2 join new slaves to existing cluster

2014-06-03 Thread Nicholas Chammas
On Tue, Jun 3, 2014 at 6:52 AM, sirisha_devineni sirisha_devin...@persistent.co.in wrote: Did you open a JIRA ticket for this feature to be implemented in spark-ec2? If so can you please point me to the ticket? Just created it: https://issues.apache.org/jira/browse/SPARK-2008 Nick

Re: Error related to serialisation in spark streaming

2014-06-03 Thread Andrew Ash
Hi Mayur, is that closure cleaning a JVM issue or a Spark issue? I'm used to thinking of closure cleaner as something Spark built. Do you have somewhere I can read more about this? On Tue, Jun 3, 2014 at 12:47 PM, Mayur Rustagi mayur.rust...@gmail.com wrote: So are you using Java 7 or 8. 7

Re: Window slide duration

2014-06-03 Thread Vadim Chekan
Ok, it's a bug in spark. I've submitted a patch: https://issues.apache.org/jira/browse/SPARK-2009 On Mon, Jun 2, 2014 at 8:39 PM, Vadim Chekan kot.bege...@gmail.com wrote: Thanks for looking into this Tathagata. Are you looking for traces of ReceiveInputDStream.clearMetadata call? Here is

Re: NoSuchElementException: key not found

2014-06-03 Thread Michael Chang
Hi Tathagata, Thanks for your help! By not using coalesced RDD, do you mean not repartitioning my Dstream? Thanks, Mike On Tue, Jun 3, 2014 at 12:03 PM, Tathagata Das tathagata.das1...@gmail.com wrote: I think I know what is going on! This probably a race condition in the DAGScheduler. I

Re: NoSuchElementException: key not found

2014-06-03 Thread Tathagata Das
I am not sure what DStream operations you are using, but some operation is internally creating CoalescedRDDs. That is causing the race condition. I might be able to help if you can tell me what DStream operations you are using. TD On Tue, Jun 3, 2014 at 4:54 PM, Michael Chang m...@tellapart.com

Re: Window slide duration

2014-06-03 Thread Vadim Chekan
Better to build it from parts. http://www.newegg.com/Product/Product.aspx?Item=N82E16813157497 Passive cooling, and you can install 16GB of memory. The one you sent takes 4GB at most, which won't do. Pick a small case and be done with it. On Tue, Jun 3, 2014 at 4:35 PM, Vadim Chekan

Invalid Class Exception

2014-06-03 Thread Suman Somasundar
Hi all, I get the following exception when using Spark to run example k-means program. I am using Spark 1.0.0 and running the program locally. java.io.InvalidClassException: scala.Tuple2; invalid descriptor for field _1 at

Better line number hints for logging?

2014-06-03 Thread John Salvatier
I have created some extension methods for RDDs in RichRecordRDD and these are working exceptionally well for me. However, when looking at the logs, it's impossible to tell what's going on because all the line number hints point to RichRecordRDD.scala rather than the code that uses it. For example:

Re: Better line number hints for logging?

2014-06-03 Thread Matei Zaharia
You can use RDD.setName to give it a name. There’s also a creationSite field that is private[spark] — we may want to add a public setter for that later. If the name isn’t enough and you’d like this, please open a JIRA issue for it. Matei On Jun 3, 2014, at 5:22 PM, John Salvatier
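
For example (the path and name are placeholders):

    val records = sc.textFile("/path/to/records")
      .setName("RichRecordRDD input") // this name shows up in the web UI and in log lines
      .cache()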

Re: how to construct a ClassTag object as a method parameter in Java

2014-06-03 Thread Gino Bustelo
A better way seems to be to use ClassTag$.apply(Class). I'm going by memory since I'm on my phone, but I just did that today. Gino B. On Jun 3, 2014, at 11:04 AM, Michael Armbrust mich...@databricks.com wrote: Ah, this is a bug that was fixed in 1.0. I think you should be able to

spark is dead and pid file exists

2014-06-03 Thread Sophia
When I run Spark in Cloudera CDH5 with the service spark-master start command, it turns out that the Spark master is dead and a pid file exists. What can I do to solve the problem?

Re: A single build.sbt file to start Spark REPL?

2014-06-03 Thread Tobias Pfeiffer
Hi, I guess it should be possible to dig through the scripts bin/spark-shell, bin/spark-submit etc. and convert them to a long sbt command that you can run. I just tried sbt run-main org.apache.spark.deploy.SparkSubmit spark-shell --class org.apache.spark.repl.Main but that fails with Failed

Re: Invalid Class Exception

2014-06-03 Thread Matei Zaharia
What Java version do you have, and how did you get Spark (did you build it yourself by any chance or download a pre-built one)? If you build Spark yourself you need to do it with Java 6 — it’s a known issue because of the way Java 6 and 7 package JAR files. But I haven’t seen it result in this

Re: Yay for 1.0.0! EC2 Still has problems.

2014-06-03 Thread Matei Zaharia
Ah, sorry to hear you had more problems. Some thoughts on them: Thanks for that, Matei! I'll look at that once I get a spare moment. :-) If you like, I'll keep documenting my newbie problems and frustrations... perhaps it might make things easier for others. Another issue I seem to have

Re: Upgradation to Spark 1.0.0

2014-06-03 Thread Matei Zaharia
You can copy your configuration from the old one. I’d suggest just downloading it to a different location on each node first for testing, then you can delete the old one if things work. On Jun 3, 2014, at 12:38 AM, MEETHU MATHEW meethu2...@yahoo.co.in wrote: Hi , I am currently using

Re: access hdfs file name in map()

2014-06-03 Thread Xu (Simon) Chen
I don't quite get it... mapPartitionsWithIndex takes a function that maps an integer index and an iterator to another iterator. How does that help with retrieving the HDFS file name? I am obviously missing some context. Thanks. On May 30, 2014 1:28 AM, Aaron Davidson ilike...@gmail.com wrote:

How to stop a running SparkContext in the proper way?

2014-06-03 Thread MEETHU MATHEW
Hi, I want to know how I can stop a running SparkContext in a proper way so that the next time I start a new SparkContext, the web UI can be launched on the same port, 4040. Now when I quit the job using Ctrl+Z, the new SparkContexts are launched on new ports. I have the same problem with ipython

Re: spark is dead and pid file exists

2014-06-03 Thread Theodore Wong
Look in the directory /var/run/spark to see if a spark-master.pid file is left over from a crashed master, and remove it. -- Theodore Wong <t...@tmwong.org> www.tmwong.org

KMeans.train() throws NotSerializableException

2014-06-03 Thread bluejoe2008
When I called KMeans.train(), an error happened: 14/06/04 13:02:29 INFO scheduler.DAGScheduler: Submitting Stage 3 (MappedRDD[12] at map at KMeans.scala:123), which has no missing parents 14/06/04 13:02:29 INFO scheduler.DAGScheduler: Failed to run takeSample at KMeans.scala:260 Exception in
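
For reference, a plain MLlib KMeans.train call in Scala, with a placeholder input path and whitespace-separated coordinates assumed:

    import org.apache.spark.mllib.clustering.KMeans
    import org.apache.spark.mllib.linalg.Vectors

    val data = sc.textFile("/path/to/points.txt")
      .map(line => Vectors.dense(line.trim.split("\\s+").map(_.toDouble)))
    val model = KMeans.train(data, 3, 20) // k = 3 clusters, 20 iterations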

Re: How to stop a running SparkContext in the proper way?

2014-06-03 Thread Xiangrui Meng
Did you try sc.stop()? On Tue, Jun 3, 2014 at 9:54 PM, MEETHU MATHEW meethu2...@yahoo.co.in wrote: Hi, I want to know how I can stop a running SparkContext in a proper way so that next time when I start a new SparkContext, the web UI can be launched on the same port 4040.Now when i quit the
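
That is, a minimal sketch (the follow-up app name and master are placeholders):

    import org.apache.spark.{SparkConf, SparkContext}

    sc.stop() // releases the cluster resources and frees the web UI port (4040)
    val sc2 = new SparkContext(new SparkConf().setAppName("next-job").setMaster("local[*]"))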