Reading from multiple input paths into a single Resilient Distributed Dataset?

2013-12-16 Thread Archit Thakur
Hi, I want to read multiple paths into a single RDD. I know I can do it this way: sc.sequenceFile(/data/new_rdd_/*,-,-,-) What if they belong to different directories, or maybe different machines? Is the only way by joining two RDDs, that is, reading different paths into different RDDs and then
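One possible approach, without any join: a sketch against Spark's Scala API (paths, hostnames, and the app name below are made up). `union` merges RDDs read from unrelated locations into one RDD without a shuffle, and `textFile`/`sequenceFile` also accept a comma-separated list of paths.

```scala
import org.apache.spark.SparkContext

object MultiPathExample {
  def main(args: Array[String]) {
    val sc = new SparkContext("local", "multi-path-example")

    // Hypothetical paths; any mix of directories, or even different
    // filesystems/namenodes, works as long as each path is readable.
    val rddA = sc.textFile("hdfs://nn1/data/dir_a")
    val rddB = sc.textFile("hdfs://nn2/data/dir_b")

    // union simply concatenates the partitions of the two RDDs
    val combined = rddA.union(rddB)

    // Alternatively, a comma-separated path list reads into one RDD directly
    val combined2 = sc.textFile("hdfs://nn1/data/dir_a,hdfs://nn2/data/dir_b")

    println(combined.count())
    sc.stop()
  }
}
```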

Re: build spark with hadoop2.2.0

2013-12-16 Thread Prashant Sharma
It is temporarily disabled in master; there is a pending PR that fixes it. You can either wait for the PR to get merged or use the 0.8.1 release of Spark. On Mon, Dec 16, 2013 at 5:30 PM, Jython googch...@gmail.com wrote: Hi, pal! I cloned the https://github.com/apache/incubator-spark repo and build

Fwd: OOM

2013-12-16 Thread leosand...@gmail.com
leosand...@gmail.com From: leosand...@gmail.com Sent: 2013-12-16 20:01 To: user-subscribe Subject: OOM hello everyone, I have a problem when I run the wordcount example. I read data from HDFS; it's almost 7 GB. I haven't seen the info from the web UI or sparkhome/work. This is the console info :

Re: build spark with hadoop2.2.0

2013-12-16 Thread Jython
I don't know where to download the 0.8.1 version; please give a link. On Mon, Dec 16, 2013 at 8:03 PM, Prashant Sharma scrapco...@gmail.com wrote: It is temporarily disabled in master, there is a PR hanging that fixes it. You can either wait for the PR to get merged or use 0.8.1 release of

Re: build spark with hadoop2.2.0

2013-12-16 Thread Prashant Sharma
Hey, sorry I forgot about that. 0.8.1 is still being released and has reached RC4, http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc4/ but it should hopefully be good to use. Remember this link is only temporarily available and might be removed once 0.8.1 is released. On Mon, Dec 16,

Re: build spark with hadoop2.2.0

2013-12-16 Thread Prashant Sharma
Also you can read the docs here http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc4-docs/ and the same can also be checked out in https://github.com/apache/incubator-spark/tree/branch-0.8. HTH On Mon, Dec 16, 2013 at 5:51 PM, Prashant Sharma scrapco...@gmail.comwrote: Hey, Sorry I

Re: which line in SparkBuild.scala specifies hadoop-core-xxx.jar?

2013-12-16 Thread Nan Zhu
Hi, Azuryy, thank you for the reply. So you compiled Spark with mvn? I'm looking at the pom.xml; I think it does the same work as SparkBuild.scala. I'm still confused by that: in Spark, some classes use classes like InputFormat, and I assume that this should be included in

Re: some slaves don't actually start

2013-12-16 Thread Walrus theCat
I've combed through all of the logs (both STDERR and STDOUT) and this is all I've got. It just gives me a big long call to start a Spark worker, along with the classpath and the url. On Thu, Dec 12, 2013 at 10:30 PM, Hossein fal...@gmail.com wrote: Would you please provide some more

Re: which line in SparkBuild.scala specifies hadoop-core-xxx.jar?

2013-12-16 Thread Azuryy Yu
Yes, I used Maven. pom.xml specifies hadoop-client, but you can change it according to your Hadoop version. Our Hadoop is based on trunk, so we changed more in pom.xml. On Dec 16, 2013 9:05 PM, Nan Zhu zhunanmcg...@gmail.com wrote: Hi, Azuryy Thank you for the reply So you compiled Spark with mvn?

Run Spark on Yarn Remotely

2013-12-16 Thread Karavany, Ido
Hi All, We've started with deploying Spark on Hadoop 2 and YARN. Our previous configuration (still not a production cluster) was Spark on Mesos. We're running a Java application (which runs from a Tomcat server). The application builds a singleton Java Spark context when it is first launched and

Re: which line in SparkBuild.scala specifies hadoop-core-xxx.jar?

2013-12-16 Thread Nan Zhu
Hi, Gary, The page says Spark uses hadoop-client.jar to interact with HDFS, but why does it also download hadoop-core? Do I just need to change the dependency on hadoop-client to my local repo? Best, -- Nan Zhu School of Computer Science, McGill University On Monday, December 16, 2013

Best ways to use Spark with .NET code

2013-12-16 Thread Kenneth Tran
Hi, We have a large ML code base in .NET. Spark seems cool and we want to leverage it. What would be the best strategies to bridge our .NET code and Spark? 1. Initiate a Spark .NET project 2. A lightweight bridge between .NET and Java While (1) sounds too daunting, it's not clear to

Re: reading LZO compressed file in spark

2013-12-16 Thread Andrew Ash
Hi Rajeev, It looks like you're using the com.hadoop.mapred.DeprecatedLzoTextInputFormat input format above, while Stephen referred to com.hadoop.mapreduce.LzoTextInputFormat. I think the way to use this in Spark would be to use SparkContext.hadoopFile() or SparkContext.newAPIHadoopFile()
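A sketch of the new-API route (assuming hadoop-lzo's classes are on the classpath and indexed .lzo files exist at a hypothetical path; the old-API DeprecatedLzoTextInputFormat would go through hadoopFile() instead):

```scala
import org.apache.spark.SparkContext
import org.apache.hadoop.io.{LongWritable, Text}
// From the hadoop-lzo package (an assumed external dependency)
import com.hadoop.mapreduce.LzoTextInputFormat

object LzoReadExample {
  def main(args: Array[String]) {
    val sc = new SparkContext("local", "lzo-example")

    // newAPIHadoopFile takes the key, value, and InputFormat types;
    // LzoTextInputFormat yields (byte offset, line) pairs like TextInputFormat
    val lines = sc
      .newAPIHadoopFile[LongWritable, Text, LzoTextInputFormat]("hdfs:///data/file.lzo")
      .map(pair => pair._2.toString)  // keep just the line text

    println(lines.count())
    sc.stop()
  }
}
```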

Re: reading LZO compressed file in spark

2013-12-16 Thread Rajeev Srivastava
Thanks for your suggestion. I will try this and update by late evening. regards Rajeev Rajeev Srivastava Silverline Design Inc 2118 Walsh ave, suite 204 Santa Clara, CA, 95050 cell : 408-409-0940 On Mon, Dec 16, 2013 at 11:24 AM, Andrew Ash and...@andrewash.com wrote: Hi Rajeev, It looks

Re: which line in SparkBuild.scala specifies hadoop-core-xxx.jar?

2013-12-16 Thread Gary Malouf
Check out the dependencies for the version of hadoop-client you are using - I think you will find that hadoop-core is present there. On Mon, Dec 16, 2013 at 1:28 PM, Nan Zhu zhunanmcg...@gmail.com wrote: Hi, Gary, The page says Spark uses hadoop-client.jar to interact with HDFS, but why

Re: Run Spark on Yarn Remotely

2013-12-16 Thread Tom Graves
The hadoop conf dir is what controls which YARN cluster the job goes to, so it's a matter of putting in the correct configs for the cluster you want it to go to. You have to execute org.apache.spark.deploy.yarn.Client or your application will not run on YARN in standalone mode. The client is
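For reference, a yarn-standalone style launch might look like this (a sketch only; the jar paths, class name, and conf dir are placeholders for your own setup):

```shell
# HADOOP_CONF_DIR decides which YARN cluster receives the job
export HADOOP_CONF_DIR=/etc/hadoop/conf

# Submit through the YARN Client rather than running the app directly
SPARK_JAR=/path/to/spark-assembly.jar \
  ./spark-class org.apache.spark.deploy.yarn.Client \
    --jar /path/to/your-app.jar \
    --class your.app.Main \
    --args yarn-standalone
```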

Re: Best ways to use Spark with .NET code

2013-12-16 Thread Kenneth Tran
Hi Matei, 1. If I understand pipe correctly, I don't think that it can solve the problem if the algorithm is iterative and requires a reduction step in each iteration. Consider this simple linear regression example // Example: Batch-gradient-descent logistic regression, ignoring
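A sketch of the iterative pattern being discussed (all file paths and constants hypothetical): each iteration runs a distributed map plus a reduction, and the driver feeds the reduced gradient back into the next pass, which a one-shot pipe() to an external process cannot easily do.

```scala
import org.apache.spark.SparkContext

object BgdSketch {
  def main(args: Array[String]) {
    val sc = new SparkContext("local", "bgd-sketch")
    val dims = 10

    // Each line: dims feature values followed by a +/-1 label (assumed format)
    val data = sc.textFile("hdfs:///data/points")
      .map(line => line.split(' ').map(_.toDouble))
      .cache()

    var w = Array.fill(dims)(0.0)
    for (iter <- 1 to 20) {
      val gradient = data
        .map { p =>
          val (x, y) = (p.init, p.last)
          val margin = (x, w).zipped.map(_ * _).sum
          // per-point logistic-loss gradient contribution
          x.map(_ * (1.0 / (1.0 + math.exp(-y * margin)) - 1.0) * y)
        }
        .reduce((a, b) => (a, b).zipped.map(_ + _)) // reduction step each iteration
      w = (w, gradient).zipped.map(_ - 0.1 * _)     // driver-side weight update
    }
    sc.stop()
  }
}
```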

Re: Task not running in standalone cluster

2013-12-16 Thread Andrew Ash
Hi Jie, When you say firewall is closed does that mean ports are blocked between the worker nodes? I believe workers start up on a random port and send data directly between each other during shuffles. Your firewall may be blocking those connections. Can you try with the firewall temporarily

RE: Best ways to use Spark with .NET code

2013-12-16 Thread Silvio Fiorito
Have you looked at IKVM? http://www.ikvm.net/devguide/java2net.html From: Kenneth Tran o...@kentran.net Sent: 12/16/2013 7:43 PM To: user@spark.incubator.apache.org Subject: Re: Best ways to use Spark with .NET code Hi Matei, 1. If I

Re: Best ways to use Spark with .NET code

2013-12-16 Thread Matei Zaharia
Yup, this is true, pipe will add overhead. Might still be worth a shot though if you’re okay with having mixed Scala + .NET code. Matei On Dec 16, 2013, at 4:42 PM, Kenneth Tran o...@kentran.net wrote: Hi Matei, 1. If I understand pipe correctly, I don't think that it can solve the

OOM, help

2013-12-16 Thread leosand...@gmail.com
hello everyone, I have a problem when I run the wordcount example. I read data from HDFS; it's almost 7 GB. I haven't seen the info from the web UI or sparkhome/work. This is the console info : . 13/12/16 19:48:02 INFO LocalTaskSetManager: Size of task 52 is 1834 bytes 13/12/16 19:48:02 INFO

About spark.driver.host

2013-12-16 Thread Azuryy Yu
Hi, I am using spark-0.8.1, and what's the meaning of spark.driver.host? I ran SparkPi and it failed (either yarn-standalone or yarn-client). It is described as 'Hostname or IP address for the driver to listen on.' in the document, but on what host will the driver listen? The RM on YARN? If yes, I configured

RE: About spark.driver.host

2013-12-16 Thread Liu, Raymond
It's what it says in the document. For yarn-standalone mode, it will be the host where the Spark AM runs, while for yarn-client mode, it will be the local host where you run the cmd. And what's the cmd you used to run SparkPi? I think you actually don't need to set spark.driver.host manually for Yarn mode,

Re: About spark.driver.host

2013-12-16 Thread Azuryy Yu
Thanks, Raymond! My command for Yarn mode: SPARK_JAR=spark-0.8.1/lib/spark-assembly_2.9.3-0.8.1-incubating-hadoop1.2.1.jar ./spark-0.8.1/bin/spark-class org.apache.spark.deploy.yarn.Client --jar spark-0.8.1/spark-examples_2.9.3-0.8.1-incubating.jar --class org.apache.spark.examples.SparkPi

Re: About spark.driver.host

2013-12-16 Thread Azuryy Yu
Raymond: One addition: yes, I built Spark-0.8.1 with -Pnew-yarn, and I followed run-on-yarn.cmd strictly. The Spark web UI shows everything is good. On Tue, Dec 17, 2013 at 12:36 PM, Azuryy Yu azury...@gmail.com wrote: Thanks, Raymond! My command for Yarn mode:

RE: About spark.driver.host

2013-12-16 Thread Liu, Raymond
Hmm, I don't see what mode you are trying to use. Did you specify the MASTER in the conf file? I think in the run-on-yarn doc, the example for yarn-standalone mode mentions that you also need to pass in -args=yarn-standalone for the Client etc. And if using yarn-client mode, you don't need to invoke

Re: About spark.driver.host

2013-12-16 Thread Azuryy Yu
Hi Raymond, I specified Master and Slaves in the conf. As for yarn-standalone and yarn-client, I have some confusion: if I use yarn-standalone, does that mean it does not run on a YARN cluster, only pseudo-distributed? On Tue, Dec 17, 2013 at 1:03 PM, Liu, Raymond

RE: About spark.driver.host

2013-12-16 Thread Liu, Raymond
No, the name originates from the standard standalone mode, with a yarn prefix added to distinguish it, I think. But it does run on a YARN cluster. As for the way they run and the difference between yarn-standalone mode and yarn-client mode, the doc also has the details; in short, yarn-standalone has

Re: About spark.driver.host

2013-12-16 Thread Azuryy Yu
Thanks Raymond, it's clear now. On Tue, Dec 17, 2013 at 1:32 PM, Liu, Raymond raymond@intel.com wrote: No, the name originates from the standard standalone mode, with a yarn prefix added to distinguish it, I think. But it does run on a YARN cluster. About the way they run and difference of