I can see how that would be a valid use case. A lot of folks have code
written using the Hadoop MR APIs or other layers that use them. It would help
those dev teams migrate those apps to Spark if such a translation
layer were available.
On Feb 1, 2014 5:01 PM, Ankur Chauhan achau...@brightcove.com wrote:
I am using 2.9.3.
On Jan 27, 2014 11:50 PM, Khanderao kand khanderao.k...@gmail.com wrote:
The Scala version changed to 2.10 in Spark 0.9.0. Are you using the same
version?
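If the build needs updating to match, a minimal sbt sketch (assuming an sbt
build; the exact 2.10.x patch release is an assumption):

// build.sbt sketch: Spark 0.9.0 is built against Scala 2.10.x
scalaVersion := "2.10.3"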
On Tue, Jan 28, 2014 at 11:30 AM, Ashish Rangole arang...@gmail.com wrote:
Hi,
I am seeing the following error message when I began testing my Streaming
application locally. Could it be due to a mismatch with
old Spark jars somewhere, or is it something else?
Thanks,
Ashish
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
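A warning like this usually means two SLF4J backends ended up on the
classpath. One hedged way to resolve it in an sbt build is to exclude the
transitive binding; the coordinates below are an assumption for Spark 0.8.x:

// build.sbt sketch (hypothetical coordinates): drop the duplicate binding
libraryDependencies += "org.apache.spark" %% "spark-streaming" % "0.8.0-incubating" exclude("org.slf4j", "slf4j-log4j12")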
I wonder if it would help to have a generic monad container that wraps
either an RDD or a DStream and provides
map, flatMap, foreach and filter methods.
case class DataMonad[A](data: A) {
  def map[B](f: A => B): DataMonad[B] = {
    DataMonad(f(data))
  }
  def flatMap[B](f: A => DataMonad[B]): DataMonad[B] = {
    f(data)
  }
  // foreach and filter would follow the same pattern
}
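For example, a hypothetical usage sketch (sc and the input path are
assumptions):

val wrapped = DataMonad(sc.textFile("hdfs:///input"))  // DataMonad[RDD[String]]
val lengths = wrapped.map(rdd => rdd.map(_.length))    // DataMonad[RDD[Int]]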
You can compress a CSV or tab-delimited file as well :)
You can specify the codec of your choice, say Snappy, when writing out.
That's what we do. You can also write out the data as sequence files. RCFile
should also be possible given the flexibility of the Spark API, but we haven't
tried that.
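A minimal sketch, assuming an RDD[String] named lines, a made-up output path,
and a Spark version whose saveAsTextFile accepts a codec class (GzipCodec is
shown only because it ships with Hadoop; a Snappy codec class can be passed
the same way):

import org.apache.hadoop.io.compress.GzipCodec

// write the delimited lines compressed with the chosen codec
lines.saveAsTextFile("hdfs:///out/compressed", classOf[GzipCodec])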
On Dec 7,
that parameter:
http://spark.incubator.apache.org/docs/latest/api/core/index.html#org.apache.spark.api.java.JavaRDD
http://spark.incubator.apache.org/docs/latest/api/core/index.html#org.apache.spark.api.java.JavaPairRDD
--
*From:* Ashish Rangole [arang...@gmail.com]
*Sent:* Monday, December 09, 2013 7:41 PM
*To:* user@spark.incubator.apache.org
*Subject:* Re: JavaRDD, Specify number of tasks
AFAIK yes. IIRC, there is a 2nd parameter
That data size is sufficiently small for the cluster configuration that you
mention.
Are you doing the sort in local mode or on master only? Is the default
parallelism
system property being set prior to creating SparkContext?
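For reference, a sketch of setting that property before the context is
created (the master URL, app name, and the value 96 are placeholders):

System.setProperty("spark.default.parallelism", "96")
val sc = new SparkContext("spark://master:7077", "SortApp")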
On Mon, Dec 9, 2013 at 10:45 PM, Matt Cheah mch...@palantir.com wrote:
I am not sure if 32 partitions is a hard limit that you have.
Unless you have a strong reason to use only 32 partitions, please try
providing the second optional
argument (numPartitions) to the reduceByKey and sortByKey methods, which will
parallelize these reduce operations.
A number 3x the number of
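A hedged sketch of that advice (the input path, the key scheme, and the count
96, i.e. 3x a 32-core setup, are all assumptions):

val pairs = sc.textFile("hdfs:///input").map(line => (line, 1))
val counts = pairs.reduceByKey(_ + _, 96)  // second argument: numPartitions
val sorted = counts.sortByKey(true, 96)    // sort across 96 partitions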
I am sure you have already checked this, but any chance the classpath has
v0.7.x jars in it?
On Nov 29, 2013 4:40 PM, Walrus theCat walrusthe...@gmail.com wrote:
The full context isn't much -- this is the first thing I do in my main
method (assign a value to sc), and it throws this error.
On
Hi Walrus theCat,
We have been successfully using Spark 0.8 on EC2 ever since it was released,
and we do this
several times a day.
We use spark-ec2.py with the new version option (--spark-version=0.8.0) to
spin up a Spark 0.8 cluster on EC2.
The key is to use the new spark-ec2.py and not the
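A typical launch invocation looks like this (the key pair name, identity
file, and cluster name are placeholders):

./spark-ec2 -k my-keypair -i my-keypair.pem --spark-version=0.8.0 launch my-cluster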
You likely have a different jar version between client and server.
See the URL below for a similar problem to give you some idea:
http://hbase.apache.org/book.html#client_dependencies
On Fri, Nov 22, 2013 at 8:58 AM, Sriram Ramachandrasekaran
sri.ram...@gmail.com wrote:
It's a client and
in SPARK_JAVA_OPTS to log the lengths of GC pauses.
Matei
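The snippet above is truncated; the JVM flags it most likely refers to are
the standard GC-logging ones (shown here as an assumption):

SPARK_JAVA_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"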
On Oct 3, 2013, at 1:10 PM, Ashish Rangole arang...@gmail.com wrote:
Hi,
Trying to figure out what it means when the application (driver
program) logs end with lines like the ones below. This is with the
application running on Spark 0.8.0 on EC2.
Any help will be greatly appreciated.
Thanks!
13/10/03 16:17:33 INFO cluster.ClusterTaskSetManager: