Hi,
I want to read multiple paths into single RDD.
I know I can do it this way:
sc.sequenceFile("/data/new_rdd_/*", ...)
What if the paths belong to different directories, or maybe even different machines?
Is the only way to join two RDDs, i.e. reading the different paths into different RDDs and then
combining them?
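(As a rough sketch of what I mean, assuming SequenceFiles of LongWritable/Text keys and values and made-up paths; untested:)

import org.apache.spark.SparkContext
import org.apache.hadoop.io.{LongWritable, Text}

val sc = new SparkContext("local", "multi-path-sketch")

// Option 1: sequenceFile/textFile accept globs and comma-separated path lists,
// so several directories (even on different namenodes) can go into one RDD.
val combined = sc.sequenceFile(
  "hdfs://nn1/data/new_rdd_a/*,hdfs://nn2/data/new_rdd_b/*",
  classOf[LongWritable], classOf[Text])

// Option 2: read each location into its own RDD and union them.
val a = sc.sequenceFile("hdfs://nn1/data/new_rdd_a", classOf[LongWritable], classOf[Text])
val b = sc.sequenceFile("hdfs://nn2/data/new_rdd_b", classOf[LongWritable], classOf[Text])
val merged = a.union(b)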
It is temporarily disabled in master; there is a PR pending that fixes it.
You can either wait for the PR to get merged or use the 0.8.1 release of Spark.
On Mon, Dec 16, 2013 at 5:30 PM, Jython googch...@gmail.com wrote:
Hi, pal!
I cloned the https://github.com/apache/incubator-spark repo and built
leosand...@gmail.com
From: leosand...@gmail.com
Sent: 2013-12-16 20:01
To: user-subscribe
Subject: OOM
Hello everyone,
I have a problem when I run the wordcount example. I read data from HDFS; it's
almost 7 GB.
I haven't seen the info from the web UI or SPARK_HOME/work. This is the console
info:
I don't know where to download the 0.8.1 version; could you give a link, please?
On Mon, Dec 16, 2013 at 8:03 PM, Prashant Sharma scrapco...@gmail.comwrote:
It is temporarily disabled in master, there is a PR hanging that fixes it.
You can either wait for the PR to get merged or use 0.8.1 release of
Hey,
Sorry, I forgot about that. 0.8.1 is still being released and has reached
rc4, http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc4/, but it
should hopefully be good to use. Remember this link is only temporarily
available and might be removed once 0.8.1 is released.
On Mon, Dec 16,
Also, you can read the docs here:
http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc4-docs/ and the
same can also be checked out from
https://github.com/apache/incubator-spark/tree/branch-0.8.
HTH
On Mon, Dec 16, 2013 at 5:51 PM, Prashant Sharma scrapco...@gmail.comwrote:
Hey,
Sorry I
Hi, Azuryy
Thank you for the reply
So you compiled Spark with mvn?
I’m looking at the pom.xml; I think it does the same work as
SparkBuild.scala.
I’m still confused by one thing: in Spark, some classes use other classes like
InputFormat, and I assume that these should be included in
I've combed through all of the logs (both STDERR and STDOUT) and this is
all I've got. It just gives me a big long call to start a Spark worker,
along with the classpath and the url.
On Thu, Dec 12, 2013 at 10:30 PM, Hossein fal...@gmail.com wrote:
Would you please provide some more
Yes, I used Maven. The pom.xml specifies hadoop-client, but you can change it
according to your Hadoop version.
Our Hadoop is based on trunk, so we changed more in the pom.xml.
On Dec 16, 2013 9:05 PM, Nan Zhu zhunanmcg...@gmail.com wrote:
Hi, Azuryy
Thank you for the reply
So you compiled Spark with mvn?
Hi All,
We've started deploying Spark on Hadoop 2 and YARN. Our previous
configuration (still not a production cluster) was Spark on Mesos.
We're running a Java application (which runs from a Tomcat server). The
application builds a singleton Java Spark context when it is first launched and
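(A hedged sketch only of the kind of lazily-initialized singleton we mean; the object name, master URL, and property key are assumptions, not our actual code:)

import org.apache.spark.SparkContext

// One shared SparkContext for the whole server process; a JVM should hold at
// most one, so the web app creates it lazily on first use and reuses it after.
object SharedSparkContext {
  lazy val sc: SparkContext = new SparkContext(
    System.getProperty("spark.master.url", "local[4]"),  // hypothetical property
    "tomcat-spark-app")

  // Call this from the application's shutdown hook.
  def shutdown(): Unit = sc.stop()
}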
Hi, Gary,
The page says Spark uses hadoop-client.jar to interact with HDFS, but why does it
also download hadoop-core?
Do I just need to change the hadoop-client dependency to point at my local repo?
Best,
--
Nan Zhu
School of Computer Science,
McGill University
On Monday, December 16, 2013
Hi,
We have a large ML code base in .NET. Spark seems cool and we want to
leverage it. What would be the best strategy to bridge our .NET code
and Spark?
1. Initiate a Spark .NET project
2. A lightweight bridge between .NET and Java
While (1) sounds too daunting, it's not clear to
Hi Rajeev,
It looks like you're using the com.hadoop.mapred.DeprecatedLzoTextInputFormat
input format above, while Stephen referred to
com.hadoop.mapreduce.LzoTextInputFormat.
I think the way to use this in Spark would be to use
SparkContext.hadoopFile() or SparkContext.newAPIHadoopFile()
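Something along these lines might work (untested sketch; the input path and the hadoop-lzo dependency providing LzoTextInputFormat are assumptions):

import com.hadoop.mapreduce.LzoTextInputFormat   // from the hadoop-lzo library
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.spark.SparkContext

val sc = new SparkContext("local", "lzo-sketch")

// newAPIHadoopFile is for org.apache.hadoop.mapreduce input formats like
// LzoTextInputFormat; it yields (key, value) pairs, here (offset, line).
val lzoLines = sc.newAPIHadoopFile(
  "hdfs:///data/logs/part-00000.lzo",              // hypothetical path
  classOf[LzoTextInputFormat],
  classOf[LongWritable],
  classOf[Text],
  new Configuration())
  .map(_._2.toString)

lzoLines.take(5).foreach(println)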
Thanks for your suggestion. I will try this and update by late evening.
regards
Rajeev
Rajeev Srivastava
Silverline Design Inc
2118 Walsh ave, suite 204
Santa Clara, CA, 95050
cell : 408-409-0940
On Mon, Dec 16, 2013 at 11:24 AM, Andrew Ash and...@andrewash.com wrote:
Hi Rajeev,
It looks
Check out the dependencies for the version of hadoop-client you are using -
I think you will find that hadoop-core is present there.
On Mon, Dec 16, 2013 at 1:28 PM, Nan Zhu zhunanmcg...@gmail.com wrote:
Hi, Gary,
The page says Spark uses hadoop-client.jar to interact with HDFS, but why
The Hadoop conf dir is what controls which YARN cluster it goes to, so it's a
matter of putting in the correct configs for the cluster you want it to go to.
You have to execute org.apache.spark.deploy.yarn.Client or your application
will not run on YARN in standalone mode. The client is
Hi Matei,
1. If I understand pipe correctly, I don't think it can solve the
problem if the algorithm is iterative and requires a reduction step in each
iteration. Consider this simple linear regression example:
// Example: batch-gradient-descent logistic regression, ignoring
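(The example was cut off in the digest; below is only a hedged reconstruction of the general pattern I mean, an iterative job whose every step needs a cluster-wide reduction, with made-up data layout, step size, and iteration count, not the original code:)

import org.apache.spark.SparkContext

case class Point(features: Array[Double], label: Double)

val sc = new SparkContext("local[4]", "gd-sketch")

// Assumed input layout: "label f1 f2 ... fN" per line.
val points = sc.textFile("hdfs:///data/points").map { line =>
  val parts = line.split(' ').map(_.toDouble)
  Point(parts.tail, parts.head)
}.cache()

val numFeatures = 10      // assumption
var w = Array.fill(numFeatures)(0.0)
val stepSize = 0.1

for (i <- 1 to 20) {
  // The reduction step: per-record gradients computed on the workers must be
  // summed back on the driver before the next iteration can begin, which is
  // what a one-way pipe() to an external process cannot easily express.
  val gradient = points.map { p =>
    val margin = p.features.zip(w).map { case (x, wi) => x * wi }.sum
    val scale = 1.0 / (1.0 + math.exp(-margin)) - p.label
    p.features.map(_ * scale)
  }.reduce((g1, g2) => g1.zip(g2).map { case (x, y) => x + y })

  w = w.zip(gradient).map { case (wi, g) => wi - stepSize * g }
}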
Hi Jie,
When you say the firewall is closed, does that mean ports are blocked between
the worker nodes? I believe workers start up on a random port and send
data directly between each other during shuffles. Your firewall may be
blocking those connections. Can you try with the firewall temporarily
Have you looked at ikvm?
http://www.ikvm.net/devguide/java2net.html
From: Kenneth Tran o...@kentran.net
Sent: 12/16/2013 7:43 PM
To: user@spark.incubator.apache.org
Subject: Re: Best ways to use Spark with .NET code
Hi Matei,
1. If I
Yup, this is true, pipe will add overhead. Might still be worth a shot though
if you’re okay with having mixed Scala + .NET code.
Matei
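(As a hedged sketch of what the mixed setup could look like; the executable path, the Mono invocation, and the line-per-record protocol are assumptions:)

import org.apache.spark.SparkContext

val sc = new SparkContext("local[2]", "pipe-sketch")
val input = sc.parallelize(1 to 1000).map(_.toString)

// pipe() writes each partition's records to the external process's stdin, one
// per line, and reads its stdout back as an RDD of strings.
val scored = input.pipe("mono /opt/app/Score.exe")   // hypothetical .NET binary

scored.take(10).foreach(println)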
On Dec 16, 2013, at 4:42 PM, Kenneth Tran o...@kentran.net wrote:
Hi Matei,
1. If I understand pipe correctly, I don't think that it can solve the
Hello everyone,
I have a problem when I run the wordcount example. I read data from HDFS; it's
almost 7 GB.
I haven't seen the info from the web UI or SPARK_HOME/work. This is the console
info:
.
13/12/16 19:48:02 INFO LocalTaskSetManager: Size of task 52 is 1834 bytes
13/12/16 19:48:02 INFO
Hi,
I am using spark-0.8.1. What's the meaning of spark.driver.host? I ran
SparkPi and it failed (in either yarn-standalone or yarn-client mode).
The document says it is the 'Hostname or IP address for the driver to listen
on.' But what host will the driver listen on? The RM on YARN? If
yes, I configured
It's what it says in the document. For yarn-standalone mode, it will be the
host where the Spark AM runs, while for yarn-client mode, it will be the local
host where you run the command.
And what's the command you use to run SparkPi? I think you actually don't need
to set spark.driver.host manually for YARN mode,
Thanks, Raymond!
My command for Yarn mode:
SPARK_JAR=spark-0.8.1/lib/spark-assembly_2.9.3-0.8.1-incubating-hadoop1.2.1.jar
./spark-0.8.1/bin/spark-class org.apache.spark.deploy.yarn.Client --jar
spark-0.8.1/spark-examples_2.9.3-0.8.1-incubating.jar --class
org.apache.spark.examples.SparkPi
Raymond:
An additional note: yes, I built Spark 0.8.1 with -Pnew-yarn, and I followed
run-on-yarn.cmd strictly.
The Spark web UI shows everything as good.
On Tue, Dec 17, 2013 at 12:36 PM, Azuryy Yu azury...@gmail.com wrote:
Thanks, Raymond!
My command for Yarn mode:
Hmm, I don't see which mode you are trying to use. Did you specify the MASTER in
the conf file?
I think in the run-on-yarn doc, the example for yarn-standalone mode mentions
that you also need to pass in -args=yarn-standalone for the Client, etc.
And if using yarn-client mode, you don't need to invoke
Hi Raymond,
I specified the master and slaves in the conf.
As for yarn-standalone and yarn-client, I have some confusion:
If I use yarn-standalone, does that mean it does not run on the YARN cluster,
only pseudo-distributed?
On Tue, Dec 17, 2013 at 1:03 PM, Liu, Raymond
No, the name originates from the standard standalone mode with a yarn prefix
added to distinguish it, I think. But it does run on the YARN cluster.
As for how they run and the difference between yarn-standalone mode and
yarn-client mode, the doc has the details; in short, yarn-standalone has
Thanks Raymond, it's clear now.
On Tue, Dec 17, 2013 at 1:32 PM, Liu, Raymond raymond@intel.com wrote:
No, the name originates from the standard standalone mode with a yarn
prefix added to distinguish it, I think. But it does run on the YARN cluster.
As for how they run and the difference between