Hello,
I'm trying to run pyspark using the following setup:
- spark 1.6.1 standalone cluster on ec2
- virtualenv installed on master
- app is run using the following command:
export PYSPARK_DRIVER_PYTHON=/path_to_virtualenv/bin/python
export PYSPARK_PYTHON=/usr/bin/python
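A minimal sketch of that environment setup (paths as in the message above; the echo is just a sanity check before launching spark-submit):

```shell
# Driver runs inside the virtualenv on the master; executors use the system python.
export PYSPARK_DRIVER_PYTHON=/path_to_virtualenv/bin/python
export PYSPARK_PYTHON=/usr/bin/python
echo "$PYSPARK_DRIVER_PYTHON"
```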
Hi,
I'm trying to run spark applications on a standalone cluster, running on
top of AWS. Since my slaves are spot instances, they are sometimes
killed and lost when the spot price exceeds my bid. When apps are running during this
event, the spark application sometimes dies - and the driver process just
Hello spark-users,
I would like to use the spark standalone cluster for multi-tenancy, running
multiple apps at the same time. The issue is that when submitting an app to the
spark standalone cluster, you cannot pass --num-executors like on yarn,
but only --total-executor-cores. *This may cause
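A sketch of the only knob combination I know of on standalone (assuming 8 free cores; spark.executor.cores is a standard Spark setting): the executor count falls out of total cores divided by cores per executor.

```shell
# Standalone mode: executor count = total-executor-cores / spark.executor.cores
TOTAL_CORES=8
CORES_PER_EXECUTOR=2
echo $((TOTAL_CORES / CORES_PER_EXECUTOR))   # number of executors the app ends up with
# Illustrative submit command:
# spark-submit --master spark://master:7077 \
#   --total-executor-cores $TOTAL_CORES \
#   --conf spark.executor.cores=$CORES_PER_EXECUTOR app.py
```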
Hi all,
I'm running spark 1.2.0 on a 20-node Yarn emr cluster. I've noticed that
whenever I'm running a heavy computation job in parallel to other jobs
running, I'm getting these kinds of exceptions:
[task-result-getter-2] INFO org.apache.spark.scheduler.TaskSetManager -
Lost task 820.0 in
there. I believe passing it via the --name option to spark-submit
should work.
-Sandy
On Thu, Dec 11, 2014 at 10:28 AM, Tomer Benyamini tomer@gmail.com
wrote:
On Thu, Dec 11, 2014 at 8:27 PM, Tomer Benyamini tomer@gmail.com
wrote:
Hi,
I'm trying to set a custom spark app name
Hi,
I'm trying to set a custom spark app name when running a java spark app in
yarn-cluster mode.
SparkConf sparkConf = new SparkConf();
sparkConf.setMaster(System.getProperty("spark.master"));
sparkConf.setAppName(myCustomName);
sparkConf.set("spark.logConf", "true");
JavaSparkContext sc = new JavaSparkContext(sparkConf);
On Thu, Dec 11, 2014 at 8:27 PM, Tomer Benyamini tomer@gmail.com
wrote:
Hi,
I'm trying to set a custom spark app name when running a java spark app in
yarn-cluster mode.
SparkConf sparkConf = new SparkConf();
sparkConf.setMaster(System.getProperty("spark.master",
"org.apache.hadoop.fs.s3native.NativeS3FileSystem")
On Wed, Nov 26, 2014 at 1:47 AM, Tomer Benyamini tomer@gmail.com
wrote:
Thanks Lalit; setting the access + secret keys in the configuration
works even when calling sc.textFile. Is there a way to select which hadoop
s3 native filesystem
Hello,
I'm building a spark app that needs to read large amounts of log files from
s3. I do so in the code by constructing the file list and passing it
to the context as follows:
val myRDD = sc.textFile("s3n://mybucket/file1,s3n://mybucket/file2,...,
s3n://mybucket/fileN")
When running
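One note for this pattern: sc.textFile accepts a single comma-separated string of paths, so the list can be built up front; a small sketch (bucket and file names are placeholders):

```shell
# Build one comma-separated path string suitable for passing to sc.textFile
paths=""
for f in file1 file2 fileN; do
  paths="${paths:+$paths,}s3n://mybucket/$f"
done
echo "$paths"
```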
Thanks Lalit; setting the access + secret keys in the configuration works
even when calling sc.textFile. Is there a way to select which hadoop s3
native filesystem implementation would be used at runtime using the hadoop
configuration?
Thanks,
Tomer
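One approach (a sketch, not verified on this cluster; Spark forwards any spark.hadoop.* conf keys into the job's Hadoop Configuration) is to pin the implementation class at submit time:

```shell
# Pin the s3n filesystem implementation via a forwarded hadoop conf key
spark-submit \
  --conf spark.hadoop.fs.s3n.impl=org.apache.hadoop.fs.s3native.NativeS3FileSystem \
  myapp.jar
```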
On Wed, Nov 26, 2014 at 11:08 AM, lalit1303
Hello,
I would like to parallelize my work across the multiple RDDs I have. I wanted
to know if spark can support a foreach over an RDD of RDDs. Here's a
Java example:
public static void main(String[] args) {
    SparkConf sparkConf = new SparkConf().setAppName("testapp");
Hi,
I'm working on the problem of remotely submitting apps to the spark
master. I'm trying to use the spark-jobserver project
(https://github.com/ooyala/spark-jobserver) for that purpose.
For scala apps things seem to work smoothly, but for java
apps I have an issue with implementing
Hello,
I'm trying to read from s3 using a simple spark java app:
-
SparkConf sparkConf = new SparkConf().setAppName("TestApp");
sparkConf.setMaster("local");
JavaSparkContext sc = new JavaSparkContext(sparkConf);
sc.hadoopConfiguration().set("fs.s3.awsAccessKeyId", "XX");
Hi,
I'm trying to write my JavaPairRDD using saveAsNewAPIHadoopFile with
MultipleTextOutputFormat:
outRdd.saveAsNewAPIHadoopFile("/tmp", String.class, String.class,
MultipleTextOutputFormat.class);
but I'm getting this compilation error:
Bound mismatch: The generic method
, 2014 at 10:53 AM, Tomer Benyamini tomer@gmail.com
wrote:
Hi,
I'm trying to write my JavaPairRDD using saveAsNewAPIHadoopFile with
MultipleTextOutputFormat:
outRdd.saveAsNewAPIHadoopFile("/tmp", String.class, String.class,
MultipleTextOutputFormat.class);
but I'm getting this compilation
Hi,
I would like to upgrade a standalone cluster to 1.1.0. What's the best
way to do it? Should I just replace the existing /root/spark folder
with the uncompressed folder from
http://d3kbcqa49mib13.cloudfront.net/spark-1.1.0-bin-cdh4.tgz ? What
about hdfs and other installations?
I have spark
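A sketch of the in-place swap, assuming the spark-ec2 layout with /root/spark and its copy-dir helper script (back up the old folder first; hdfs lives outside /root/spark, so this only swaps spark itself):

```shell
# Fetch the new build, swap it in, then sync it out to the slaves
wget http://d3kbcqa49mib13.cloudfront.net/spark-1.1.0-bin-cdh4.tgz
tar xzf spark-1.1.0-bin-cdh4.tgz
mv /root/spark /root/spark-old
mv spark-1.1.0-bin-cdh4 /root/spark
/root/spark-ec2/copy-dir /root/spark   # distribute the new build to the slaves
```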
)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:374)
Any idea?
Thanks!
Tomer
On Sun, Sep 7, 2014 at 9:27 PM, Josh Rosen rosenvi...@gmail.com wrote:
If I recall, you should be able to start Hadoop MapReduce using
~/ephemeral-hdfs/sbin/start-mapred.sh.
On Sun, Sep 7, 2014 at 6:42 AM, Tomer Benyamini
8, 2014 at 3:28 AM, Tomer Benyamini tomer@gmail.com wrote:
~/ephemeral-hdfs/sbin/start-mapred.sh does not exist on spark-1.0.2;
I restarted hdfs using ~/ephemeral-hdfs/sbin/stop-dfs.sh and
~/ephemeral-hdfs/sbin/start-dfs.sh, but still getting the same error
when trying to run distcp
) running with
datanode process
--
Ye Xianjin
Sent with Sparrow
On Monday, September 8, 2014 at 11:13 PM, Tomer Benyamini wrote:
Still no luck, even when running stop-all.sh followed by start-all.sh.
On Mon, Sep 8, 2014 at 5:57 PM, Nicholas Chammas
nicholas.cham...@gmail.com wrote
Hi,
I would like to make sure I'm not exceeding the quota on the local
cluster's hdfs. I have a couple of questions:
1. How do I know the quota? Here's the output of hadoop fs -count -q,
which essentially does not tell me a lot:
[root@ip-172-31-7-49 ~]$ hadoop fs -count -q /
2147483647
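For what it's worth, the 2147483647 in that output is Integer.MAX_VALUE, which hdfs reports as the default namespace quota on / when no quota has actually been configured:

```shell
# hdfs's default namespace quota for / is Integer.MAX_VALUE, i.e. "no quota set"
echo $(( (1 << 31) - 1 ))
```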
Thanks! I found the hdfs ui via this port - http://[master-ip]:50070/.
It shows a 1-node hdfs, although I have 4 slaves in my cluster.
Any idea why?
On Sun, Sep 7, 2014 at 4:29 PM, Ognen Duzlevski
ognen.duzlev...@gmail.com wrote:
On 9/7/2014 7:27 AM, Tomer Benyamini wrote:
2. What should
Hi,
I would like to copy log files from s3 to the cluster's
ephemeral-hdfs. I tried to use distcp, but I guess mapred is not
running on the cluster - I'm getting the exception below.
Is there a way to activate it, or is there a spark alternative to distcp?
Thanks,
Tomer
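A sketch of the sequence that comes up later in this thread (script paths vary by AMI version; bucket and target paths are placeholders):

```shell
# Start the MapReduce daemons on the ephemeral-hdfs, then distcp from s3
~/ephemeral-hdfs/sbin/start-mapred.sh
~/ephemeral-hdfs/bin/hadoop distcp s3n://mybucket/logs/ hdfs:///logs/
```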
mapreduce.Cluster
you have a mapreduce
cluster on your hdfs?
And from the error message, it seems that you didn't specify your jobtracker
address.
--
Ye Xianjin
Sent with Sparrow
On Sunday, September 7, 2014 at 9:42 PM, Tomer Benyamini wrote:
Hi,
I would like to copy log files from s3 to the cluster's