Block

2014-03-11 Thread David Thomas
What is the concept of Block and BlockManager in Spark? How is a Block related to a Partition of an RDD?

pyspark broadcast error

2014-03-11 Thread Brad Miller
Hi All, When I run the program shown below, I receive the error shown below. I am running the current version of branch-0.9 from github. Note that I do not receive the error when I replace 2 ** 29 with 2 ** X, where X < 29. More interestingly, I do not receive the error when X = 30, and when X

building spark over proxy

2014-03-11 Thread hades dark
Can someone help me on how to build spark over proxy settings .. -- REGARDS ASHUTOSH JAIN IIT-BHU VARANASI

Re: building spark over proxy

2014-03-11 Thread Bharath Vissapragada
http://mail-archives.apache.org/mod_mbox/spark-user/201403.mbox/%3ccaaqhkj48japuzqc476es67c+rrfime87uprambdoofhcl0k...@mail.gmail.com%3E On Tue, Mar 11, 2014 at 11:44 AM, hades dark hades.o...@gmail.com wrote: Can someone help me on how to build spark over proxy settings .. -- REGARDS

Reading sequencefile

2014-03-11 Thread Jaonary Rabarisoa
Hi all, I'm trying to read a sequenceFile that represents a set of jpeg images generated using this tool: http://stuartsierra.com/2008/04/24/a-million-little-files . According to the documentation: Each key is the name of a file (a Hadoop “Text”), the value is the binary contents of the file (a
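A minimal spark-shell sketch of reading a SequenceFile of this shape, assuming Text keys and BytesWritable values; the path is a placeholder and sc is the shell's predefined context:

```scala
import org.apache.hadoop.io.{BytesWritable, Text}

// Keys are file names (Text), values are the raw bytes of each file (BytesWritable).
val images = sc.sequenceFile("hdfs:///path/to/images.seq", classOf[Text], classOf[BytesWritable])
  .map { case (name, bytes) =>
    // Hadoop reuses Writable objects, so copy the contents out before keeping them around.
    (name.toString, bytes.getBytes.take(bytes.getLength))
  }

images.first()._1   // name of one of the archived files
```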

Spark stand alone cluster mode

2014-03-11 Thread Gino Mathews
Hi, I am new to spark. I would like to run jobs in Spark stand alone cluster mode. No cluster manager other than Spark is used. (https://spark.apache.org/docs/0.9.0/spark-standalone.html) I have tried wordcount from spark shell and a stand alone scala app. The code reads input from HDFS and
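A minimal sketch of a standalone-mode word count submitted against a standalone master; the master URL, jar path, and HDFS locations are placeholders:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._   // pair RDD functions such as reduceByKey

object WordCount {
  def main(args: Array[String]) {
    // master URL of the standalone cluster, app name, Spark home, and the job's jar
    val sc = new SparkContext("spark://master-host:7077", "WordCount",
      System.getenv("SPARK_HOME"), Seq("target/wordcount.jar"))

    val counts = sc.textFile("hdfs://namenode:9000/input/text")
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.saveAsTextFile("hdfs://namenode:9000/output/counts")
    sc.stop()
  }
}
```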

Re: Spark stand alone cluster mode

2014-03-11 Thread Yana Kadiyska
does sbt show full-classpath show spark-core on the classpath? I am still pretty new to scala but it seems like you have val sparkCore = "org.apache.spark" %% "spark-core" % V.spark % "provided" -- I believe the provided part means it's in your classpath. Spark-shell script sets up

Pyspark Memory Woes

2014-03-11 Thread Aaron Olson
Dear Sparkians, We are working on a system to do relational modeling on top of Spark, all done in pyspark. While we've been learning a lot about Spark internals so far, we're currently running into memory issues and wondering how best to profile to fix them. Here are our symptoms: - We're

Spark usage patterns and questions

2014-03-11 Thread Sourav Chandra
Hi, I have some questions regarding usage patterns and debugging in spark/spark streaming. 1. What are some common design patterns for using broadcast variables? In my application I created some, and also created a scheduled task which periodically refreshes the variables. I want to know how
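One possible shape for the refresh pattern described above, sketched under the assumption that a lookup table is re-broadcast on a fixed schedule; loadLookupTable and the interval are illustrative:

```scala
import java.util.concurrent.{Executors, TimeUnit}

import org.apache.spark.SparkContext
import org.apache.spark.broadcast.Broadcast

class RefreshingBroadcast(sc: SparkContext) {

  // Placeholder for whatever re-reads the reference data (a database, a file, ...).
  private def loadLookupTable(): Map[String, String] = Map("key" -> "value")

  @volatile private var current: Broadcast[Map[String, String]] =
    sc.broadcast(loadLookupTable())

  // Re-broadcast on a fixed schedule; jobs submitted after a refresh pick up the new value
  // by calling get at submission time and closing over the returned Broadcast.
  private val scheduler = Executors.newSingleThreadScheduledExecutor()
  scheduler.scheduleAtFixedRate(new Runnable {
    def run() { current = sc.broadcast(loadLookupTable()) }
  }, 10, 10, TimeUnit.MINUTES)

  def get: Broadcast[Map[String, String]] = current
}
```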

Re: NO SUCH METHOD EXCEPTION

2014-03-11 Thread Matei Zaharia
Since it’s from Scala, it might mean you’re running with a different version of Scala than you compiled Spark with. Spark 0.8 and earlier use Scala 2.9, while Spark 0.9 uses Scala 2.10. Matei On Mar 11, 2014, at 8:19 AM, Jeyaraj, Arockia R (Arockia) arockia.r.jeya...@verizon.com wrote: Hi,
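A minimal build.sbt sketch of keeping the Scala version in line with the Spark build; the version strings are illustrative:

```scala
// The Scala version must match the one the Spark deployment was built against:
// 2.10.x for Spark 0.9, 2.9.x for Spark 0.8 and earlier.
scalaVersion := "2.10.3"

libraryDependencies += "org.apache.spark" %% "spark-core" % "0.9.0-incubating" % "provided"
```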

Re: Powered By Spark Page -- Companies Organizations

2014-03-11 Thread Matei Zaharia
Thanks, added you. On Mar 11, 2014, at 2:47 AM, Christoph Böhm listenbru...@gmx.net wrote: Dear Spark team, thanks for the great work and congrats on becoming an Apache top-level project! You could add us to your Powered-by-page, because we are using Spark (and Shark) to perform

Re: Pyspark Memory Woes

2014-03-11 Thread Sandy Ryza
Hi Aaron, When you say "Java heap space is 1.5G per worker, 24 or 32 cores across 46 nodes. It seems like we should have more than enough to do this comfortably.", how are you configuring this? -Sandy On Tue, Mar 11, 2014 at 10:11 AM, Aaron Olson aaron.ol...@shopify.com wrote: Dear Sparkians,

is spark.cleaner.ttl safe?

2014-03-11 Thread Michael Allman
Hello, I've been trying to run an iterative spark job that spills 1+ GB to disk per iteration on a system with limited disk space. I believe there's enough space if spark would clean up unused data from previous iterations, but as it stands the number of iterations I can run is limited by
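A sketch of turning the cleaner on, assuming Spark 0.9's SparkConf; the TTL value is illustrative. Note that anything older than the TTL, including persisted RDD blocks the job still needs, becomes eligible for cleanup, which is the crux of the safety question:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Enable the periodic metadata cleaner: metadata, shuffle data, and cached blocks older
// than the TTL (in seconds) are dropped, so the TTL must exceed any reuse window.
val conf = new SparkConf()
  .setMaster("spark://master-host:7077")
  .setAppName("iterative-job")
  .set("spark.cleaner.ttl", "3600")

val sc = new SparkContext(conf)
```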

RE: unsubscribe

2014-03-11 Thread Kapil Malik
Ohh ! I thought you're unsubscribing :) Kapil Malik | kma...@adobe.com | 33430 / 8800836581 -Original Message- From: Matei Zaharia [mailto:matei.zaha...@gmail.com] Sent: 12 March 2014 00:51 To: user@spark.apache.org Subject: Re: unsubscribe To unsubscribe from this list, please

Re: Pyspark Memory Woes

2014-03-11 Thread Sandy Ryza
Are you aware that you get an executor (and the 1.5GB) per machine, not per core? On Tue, Mar 11, 2014 at 12:52 PM, Aaron Olson aaron.ol...@shopify.com wrote: Hi Sandy, We're configuring that with the JAVA_OPTS environment variable in $SPARK_HOME/spark-worker-env.sh like this: # JAVA OPTS

Re: Out of memory on large RDDs

2014-03-11 Thread Grega Kespret
"Your input data read as RDD may be causing OOM, so that's where you can use different memory configuration." We are not getting any OOM exceptions, just akka future timeouts in mapoutputtracker and unsuccessful get of shuffle outputs, therefore refetching them. What is the industry practice
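The right fix depends on the job, but a spark-shell sketch of the memory-related knobs usually meant by "different memory configuration"; the path and fraction are illustrative:

```scala
import org.apache.spark.storage.StorageLevel

// 1) Persist the input serialized, spilling to disk rather than recomputing or failing.
val input = sc.textFile("hdfs:///large/input").persist(StorageLevel.MEMORY_AND_DISK_SER)

// 2) Lower the fraction of executor heap reserved for the RDD cache, set on the SparkConf
//    (or as a system property) before the context is created, e.g.:
//    spark.storage.memoryFraction = 0.4
```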

possible bug in Spark's ALS implementation...

2014-03-11 Thread Michael Allman
Hi, I'm implementing a recommender based on the algorithm described in http://www2.research.att.com/~yifanhu/PUB/cf.pdf. This algorithm forms the basis for Spark's ALS implementation for data sets with implicit features. The data set I'm working with is proprietary and I cannot share it,
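For context, a minimal spark-shell sketch of MLlib's implicit-feedback ALS (the formulation from the paper above); the input path, field layout, and parameter values are illustrative:

```scala
import org.apache.spark.mllib.recommendation.{ALS, Rating}

// Each line: user,item,count — the "rating" field carries the implicit confidence signal.
val ratings = sc.textFile("hdfs:///data/implicit-feedback.csv").map { line =>
  val Array(user, item, count) = line.split(',')
  Rating(user.toInt, item.toInt, count.toDouble)
}

// rank = 10, iterations = 10, lambda = 0.01, alpha = 40.0
val model = ALS.trainImplicit(ratings, 10, 10, 0.01, 40.0)
```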

Re: possible bug in Spark's ALS implementation...

2014-03-11 Thread Xiangrui Meng
Hi Michael, I can help check the current implementation. Would you please go to https://spark-project.atlassian.net/browse/SPARK and create a ticket about this issue with component MLlib? Thanks! Best, Xiangrui On Tue, Mar 11, 2014 at 3:18 PM, Michael Allman m...@allman.ms wrote: Hi, I'm

Re: How to create RDD from Java in-memory data?

2014-03-11 Thread wallacemann
Ah! Thank you. That'll work for now.

Re: Applications for Spark on HDFS

2014-03-11 Thread Sandy Ryza
Hi Paul, What do you mean by distributing the jars manually? If you register jars that are local to the client with SparkContext.addJars, Spark should handle distributing them to the workers. Are you taking advantage of this? -Sandy On Tue, Mar 11, 2014 at 3:09 PM, Paul Schooss

Re: How to create RDD from Java in-memory data?

2014-03-11 Thread wallacemann
In a similar vein, it would be helpful to have an Iterable way to access the data inside an RDD. The collect method takes everything in the RDD and puts it in a list, but this blows up memory. Since everything I want is already inside the RDD, it could be easy to iterate over the content without

Re: How to create RDD from Java in-memory data?

2014-03-11 Thread Mark Hamstra
https://github.com/apache/incubator-spark/pull/421 Works pretty good, but really needs to be enhanced to work with AsyncRDDActions. On Tue, Mar 11, 2014 at 4:50 PM, wallacemann wall...@bandpage.com wrote: In a similar vein, it would be helpful to have an Iterable way to access the data
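A sketch assuming the pull request above corresponds to something like RDD.toLocalIterator, which streams one partition at a time to the driver rather than building the single array that collect() returns:

```scala
val rdd = sc.parallelize(1 to 1000000, 100)

// Runs a job per partition as the iterator advances, so driver memory only has to hold
// one partition's worth of data at a time.
val it = rdd.toLocalIterator
it.take(5).foreach(println)
```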

Re: RDD.saveAs...

2014-03-11 Thread Matei Zaharia
I agree that we can’t keep adding these to the core API, partly because it will get unwieldy to maintain and partly just because each storage system will bring in lots of dependencies. We can simply have helper classes in different modules for each storage system. There’s some discussion on
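A hypothetical illustration of that design: a storage-specific save helper kept in its own module and attached to RDD through an implicit wrapper, so the core API stays small. "SomeStore" and every name on it are made up:

```scala
import org.apache.spark.rdd.RDD

// Stand-in for a real storage client; a real module would wrap an actual driver library.
class SomeStoreClient(endpoint: String) {
  def write(record: String): Unit = println(s"$endpoint <- $record")
  def close(): Unit = ()
}

class SomeStoreFunctions(rdd: RDD[String]) {
  def saveToSomeStore(endpoint: String): Unit =
    rdd.foreachPartition { records =>
      val client = new SomeStoreClient(endpoint)   // one client per partition, on the executor
      records.foreach(client.write)
      client.close()
    }
}

object SomeStoreImplicits {
  implicit def toSomeStoreFunctions(rdd: RDD[String]): SomeStoreFunctions =
    new SomeStoreFunctions(rdd)
}

// usage: import SomeStoreImplicits._ ; myRdd.saveToSomeStore("host:1234")
```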

Re: Block

2014-03-11 Thread dachuan
In my opinion, BlockManager manages many types of Block; an RDD's partition, a.k.a. RDDBlock, is one of them. Other types of Blocks are ShuffleBlock, IndirectBlock (if the task's return status is too large), etc. So, BlockManager is a layer that is independent of the RDD concept. On Mar 11, 2014
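A small spark-shell sketch of that relationship: once a partition of a persisted RDD is computed, the BlockManager stores it under a block id of the form rdd_<rddId>_<partitionIndex>:

```scala
// Four partitions persisted => four RDD blocks managed by the BlockManager,
// e.g. rdd_<id>_0 through rdd_<id>_3 (visible in the storage tab of the web UI).
val cached = sc.parallelize(1 to 100, 4).cache()
cached.count()
```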

Re: Are all transformations lazy?

2014-03-11 Thread Ewen Cheslack-Postava
You should probably be asking the opposite question: why do you think it *should* be applied immediately? Since the driver program hasn't requested any data back (distinct generates a new RDD, it doesn't return any data), there's no need to actually compute anything yet. As the documentation
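A spark-shell sketch of the point: the transformation returns a new RDD immediately, and work happens only when an action runs:

```scala
val words = sc.parallelize(Seq("a", "b", "a", "c"))
val unique = words.distinct()   // returns immediately; no job is launched
val n = unique.count()          // the action: this is when the distinct work actually happens
```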

Re: Are all transformations lazy?

2014-03-11 Thread David Thomas
I think you misunderstood my question - I should have stated it better. I'm not saying it should be applied immediately, but I'm trying to understand how Spark achieves this lazy computation of transformations. Maybe this is due to my ignorance of how Scala works, but when I see the code, I see that
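A deliberately simplified, non-Spark sketch of how that laziness can be achieved in Scala: a transformation only records its parent and the function to apply, and nothing runs until an action calls compute():

```scala
abstract class MiniRDD[T] {
  def compute(): Iterator[T]

  def map[U](f: T => U): MiniRDD[U] = {
    val parent = this
    new MiniRDD[U] { def compute() = parent.compute().map(f) }   // nothing evaluated here
  }

  def collect(): List[T] = compute().toList                      // the "action" forces the work
}

object LazyDemo extends App {
  val base = new MiniRDD[Int] { def compute() = (1 to 5).iterator }
  val doubled = base.map(_ * 2)   // no computation has happened yet
  println(doubled.collect())      // List(2, 4, 6, 8, 10) is produced only here
}
```

Spark's real RDD class is considerably more involved (partitions, dependencies, schedulers), but the core trick of deferring evaluation until an action is the same.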