Hi
Not sure if this will help you:
1. Create one application that puts files into your S3 bucket from a public
data source (you can use public wiki-data).
2. Create another application (a Spark Streaming one) which listens on
that bucket and performs some operation (caching, groupBy, etc.).
Something like this?
import java.util.List;
import org.apache.commons.configuration.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import
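And a rough Scala sketch of step 2, in case it helps (the bucket path, app
name, and batch interval are invented; assumes S3 credentials are configured
for the s3n:// scheme):
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object S3BucketListener {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("S3BucketListener").setMaster("local[2]")
    // Poll for newly arrived files every 30 seconds (interval is arbitrary).
    val ssc = new StreamingContext(conf, Seconds(30))

    // textFileStream picks up files newly created under the given path.
    val lines = ssc.textFileStream("s3n://my-bucket/wiki-data/")

    // Example operation: count lines per first token in each batch.
    val counts = lines.map(_.split("\\s+")(0)).countByValue()
    counts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}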
Hi,
When I run the following statements in spark-shell:
val file = sc.textFile("hdfs://nameservice1/user/study/spark/data/soc-LiveJournal1.txt")
val count = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
println(count.count())
it throws an exception:
Can you share some example code of what you are doing?
BTW, Gmail flags your mail as spam, saying it cannot verify it came from
yahoo.com. You might want to check your mail client settings. (It could be a
Gmail or Yahoo bug too, of course.)
On Fri, Jun 27, 2014 at 4:29 PM, harsh2005_7
The code base is huge, but I'm sharing a snapshot of it which I think might give
you some idea. Here is my class Player, which is supposed to be my vertex
attribute:
class Player(var RvalRdd: RDD[((Int, Int), Double)], Slope_m: Double)
    extends Serializable {
  // Some code here
}
As you can see
Hi:
Is there a comprehensive list of Spark properties (with permissible/default
values)?
Thanks
Mans
Hi all,
I am trying to create a custom RDD class for result sets of queries
supported in InMobi Grill (http://inmobi.github.io/grill/).
Each result set has a schema (similar to Hive's TableSchema) and a path in
HDFS containing the result set data.
An easy way of doing this would be to create a
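One possible shape for it, as a minimal sketch (the class names and the
one-part-file-per-partition layout are my assumptions, not Grill's actual
design):
import org.apache.spark.{Partition, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD

// Hypothetical partition: one per part-file of the result set in HDFS.
case class ResultSetPartition(index: Int, path: String) extends Partition

// Minimal custom RDD over a query result set stored under resultPath.
class ResultSetRDD(sc: SparkContext, resultPath: String, numParts: Int)
    extends RDD[String](sc, Nil) {

  override def getPartitions: Array[Partition] =
    (0 until numParts)
      .map(i => ResultSetPartition(i, s"$resultPath/part-$i"): Partition)
      .toArray

  override def compute(split: Partition, context: TaskContext): Iterator[String] = {
    val part = split.asInstanceOf[ResultSetPartition]
    // Real code would open part.path via the Hadoop FileSystem API and
    // decode rows according to the result set's schema.
    Iterator.empty
  }
}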
Hi all,
I reinstalled Spark and rebooted the system, but I am still not able to start the
workers. It throws the following exception:
Exception in thread "main" org.jboss.netty.channel.ChannelException: Failed to
bind to: master/192.168.125.174:0
I suspect the problem is with 192.168.125.174:0.
Hi,
Try setting --driver-java-options with spark-submit, or set
spark.executor.extraJavaOptions in spark-defaults.conf.
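For example, in spark-defaults.conf (the option value is only a placeholder):
spark.executor.extraJavaOptions -XX:+PrintGCDetails
or on the spark-submit command line (the class and jar names are hypothetical):
spark-submit --driver-java-options "-XX:+PrintGCDetails" \
  --class com.example.MyApp myapp.jar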
Thanks & Regards,
Meethu M
On Monday, 30 June 2014 1:28 PM, hansen han...@neusoft.com wrote:
Hi,
When I run the following statements in spark-shell:
val file =
Hello,
I’m trying to use KMeans with MLlib but am getting a TaskNotSerializable
error. I’m using Spark 0.9.1 and invoking the KMeans.run method with k = 2
and numPartitions = 200. Has anyone seen this error before and know what
could be the reason for this?
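For reference, a minimal sketch of that kind of invocation against the 0.9.x
MLlib API (the data points are invented; 0.9.x takes RDD[Array[Double]]):
import org.apache.spark.SparkContext
import org.apache.spark.mllib.clustering.KMeans

val sc = new SparkContext("local", "kmeans-sketch")
val points = sc.parallelize(Seq(
  Array(0.0, 0.0), Array(1.0, 1.0),
  Array(9.0, 8.0), Array(8.0, 9.0))).cache()
val model = KMeans.train(points, 2, 20) // k = 2, 20 iterations
println(model.clusterCenters.map(_.mkString("[", ",", "]")).mkString(" "))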
Thanks,
Daniel
I'm trying to perform operations on a large RDD that ends up being about 1.3
GB in memory when loaded. It's cached in memory during the first
operation, but when another task that uses the RDD begins, I get an
error saying the RDD was lost:
14/06/30 09:48:17 INFO
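One common guard against losing cached blocks under memory pressure (not
necessarily the fix for this case) is a disk-spilling storage level:
import org.apache.spark.storage.StorageLevel

// Evicted blocks spill to local disk instead of being recomputed later.
// The input path here is a placeholder.
val data = sc.textFile("hdfs://nameservice1/path/to/input")
  .persist(StorageLevel.MEMORY_AND_DISK)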
I'm hoping someone can clear up some confusion for me.
When I view the Spark 1.0 docs online (http://spark.apache.org/docs/1.0.0/),
they are different from the docs packaged with the Spark 1.0.0
download (spark-1.0.0.tgz).
In particular, in the online docs, there's a single merged Spark
Could you post the code snippet and the error stack trace? -Xiangrui
On Mon, Jun 30, 2014 at 7:03 AM, Daniel Micol dmi...@gmail.com wrote:
Hello,
I’m trying to use KMeans with MLlib but am getting a TaskNotSerializable
error. I’m using Spark 0.9.1 and invoking the KMeans.run method with k = 2
I am new to Spark Streaming and am wondering whether Spark Streaming tracks
counters (e.g., how many rows in each consumer, how many rows routed to an
individual reduce task, etc.) in any form, so I can get an idea of how the data
is skewed. I checked the Spark job page but don't see any.
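One common do-it-yourself substitute is counting with accumulators; a sketch
(where `stream` is a hypothetical DStream):
// Count rows per batch and print the running total on the driver.
val rowCount = ssc.sparkContext.accumulator(0L)
stream.foreachRDD { rdd =>
  rdd.foreach(_ => rowCount += 1L)
  println("rows so far: " + rowCount.value)
}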
--
Chen Song
Tobias,
Your suggestion is very helpful. I will definitely investigate it.
Just curious: suppose the batch size is t seconds. In practice, does Spark
always require the program to finish processing each t seconds of data
within t seconds of processing time? Can Spark begin to consume the new batch
Hi community, this one should be an easy one:
I have left spark.task.maxFailures at its default (which should be
4). I see a job that shows the following statistics for tasks:
Succeeded/Total
7109/819 (1 failed)
So there were 819 tasks to start with. I have 2 executors in that
cluster. From
Hi,
Spark 1.0 has been installed as standalone, but it can't read any compressed
(CMX/Snappy) or sequence files residing on HDFS. The key notable message is
"Unable to load native-hadoop library". Other related messages are:
Caused by: java.lang.IllegalStateException: Cannot load
Hi everyone, I was able to solve this issue. For now I changed the library code
and added the following to the class com.wcohen.ss.BasicStringWrapper:
public class BasicStringWrapper implements Serializable
However, I am still curious to know how to get around the issue when you don't
have
Hi,
I modified the example code for logistic regression to compute the
classification error. Please see below. However, the code fails when it makes a
call to:
labelsAndPreds.filter(lambda (v, p): v != p).count()
with an error message (something related to numpy or dot product):
Hi All,
I am resending this message because I suspect the original may have been
blocked from the mailing list due to attachments. Note that the mail does
appear in the Apache archives.
Bill,
let's say the processing time is t' and the window size t. Spark does not
*require* t' < t. In fact, for *temporary* peaks in your streaming data, I
think the way Spark handles it is very nice, in particular since 1) it does
not mix up the order in which items arrived in the stream, so items
Hi Daniel,
I also faced the same issue when using the Naive Bayes classifier in MLlib. I
was able to solve it by making all fields in the calling object either
transient or serializable.
Spark will print, in the error message, which class's object it could not
serialize; that can give you a
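A sketch of that pattern (the class and its fields are invented for
illustration):
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// The filter closure captures `this`, so the class must be Serializable;
// the SparkContext can't be serialized, so it is marked @transient.
class Classifier(@transient val sc: SparkContext) extends Serializable {
  val threshold = 0.5 // plain serializable field, safe to ship to executors

  def countPositive(scores: RDD[Double]): Long =
    scores.filter(_ > threshold).count()
}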
Done :)
Hi Andrew,
I'm actually using spark-submit, and I tried using
spark.executor.extraJavaOpts to configure the Tachyon client to connect to
the Tachyon HA master; however, the configuration settings were not picked up.
On the other hand, when I set the same Tachyon configuration parameters
through
Hi Prabeesh,
I've recently moved to Mesos 0.18.2 and Spark 1.0; so far no problems in
fine-grained mode, even for GraphX or MLlib workflows. If you have specific
code snippets, I can try them out.
Best regards
Lukasz
Were you using an old version of numpy, 1.4? I think this is fixed in
the latest master. Try replacing vec.dot(target) with numpy.dot(vec,
target), or use the latest master. -Xiangrui
On Mon, Jun 30, 2014 at 2:04 PM, Sam Jacobs sam.jac...@us.abb.com wrote:
Hi,
I modified the example code for