Are you by any chance using only memory in the storage level of the input
streams?
TD
On Mon, Jun 30, 2014 at 5:53 PM, Tobias Pfeiffer t...@preferred.jp wrote:
Bill,
let's say the processing time is t' and the window size t. Spark does not
*require* t' < t. In fact, for *temporary* peaks in
Hi,
I did netstat -na | grep 192.168.125.174 and it shows 192.168.125.174:7077
LISTEN (after starting the master).
I tried to execute the following script from the slaves manually, but it ends up
with the same exception and log. This script internally executes the java
command.
If you want to stick with Java serialization and need to serialize a
non-Serializable object, your best choices are probably to either subclass
it with a Serializable one or wrap it in a class of your own which
implements its own writeObject/readObject methods (see here:
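For what it's worth, a rough sketch of the wrapper approach in Scala (the class names below are made up for illustration, not from this thread):

import java.io.{ObjectInputStream, ObjectOutputStream}

// Stand-in for a third-party class you cannot make Serializable.
class ThirdPartyThing(val config: String)

// Wrapper that takes over serialization of the wrapped object.
class SerializableWrapper(thing0: ThirdPartyThing) extends Serializable {
  @transient private var thing: ThirdPartyThing = thing0
  def get: ThirdPartyThing = thing

  private def writeObject(out: ObjectOutputStream): Unit = {
    out.defaultWriteObject()
    out.writeUTF(thing.config)            // write just enough state to rebuild it
  }

  private def readObject(in: ObjectInputStream): Unit = {
    in.defaultReadObject()
    thing = new ThirdPartyThing(in.readUTF())   // rebuild on deserialization
  }
}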
Hi, all:
I'm compiling Spark by executing './make-distribution.sh --hadoop
0.20.205.0 --tgz '.
After the compilation completes, I find that the default version
number is 1.1.0-SNAPSHOT, i.e. spark-1.1.0-SNAPSHOT-bin-0.20.205.tgz.
Does anyone know how to set the version number myself?
Hi,
I am having an issue running the Scala example code. I have tested and am able
to run the Python example code successfully, but when I run the Scala code I
get this error
java.lang.ClassCastException: cannot assign instance of
org.apache.spark.examples.SparkPi$$anonfun$1 to field
Hi Spark,
I am running LBFGS on our user data. The data size with Kryo serialisation is
about 210G. The weight size is around 1,300,000. I am quite confused that the
performance is nearly the same whether the data is cached or not.
The program is simple:
points = sc.hadoopFile(int,
You can specify a custom name with the --name option. It will still contain
1.1.0-SNAPSHOT, but at least you can specify your company name.
If you want to replace SNAPSHOT with your company name, you will have to
edit make-distribution.sh and replace the following line:
VERSION=$(mvn
Sorry, there's a typo in my previous post, the line should read:
VERSION=$(mvn help:evaluate -Dexpression=project.version 2>/dev/null | grep
-v INFO | tail -n 1 | sed -e 's/SNAPSHOT/$COMPANYNAME/g')
On Tue, Jul 1, 2014 at 10:35 AM, Guillaume Ballet gbal...@gmail.com wrote:
You can specify a
Hi All,
Can we install RSpark on a Windows setup of R and use it to access a remote
Spark cluster?
Thanks
Stuti Awasthi
Hi,
The window size in Spark Streaming is time based, which means we have a
different number of elements in each window. For example, suppose you have two
streams (there might be more) which are related to each other and you want to
compare them over a specific time interval. I am not clear how that would work.
Hi,
I am trying to run a project which takes data as a DStream and dumps the
data into a Shark table after various operations. I am getting the
following error:
Exception in thread "main" org.apache.spark.SparkException: Job aborted:
Task 0.0:0 failed 1 times (most recent failure: Exception
Hi,
The window size in Spark Streaming is time based, which means we have a
different number of elements in each window. For example, suppose you have two
streams (there might be more) which are related to each other and you want to
compare them over a specific time interval. I am not clear how it will
Hi All,
We are using a Shark table to dump the data, and we are getting the following
error:
Exception in thread "main" org.apache.spark.SparkException: Job aborted:
Task 1.0:0 failed 1 times (most recent failure: Exception failure:
java.io.FileNotFoundException: http://IP/broadcast_1)
We don't know
Hi,
I am using Spark standalone mode with one master and 2 slaves. I am not able to
start the workers and connect them to the master using
./bin/spark-class org.apache.spark.deploy.worker.Worker spark://x.x.x.174:7077
The log says
Exception in thread "main"
Is this command working??
java -cp ::/usr/local/spark-1.0.0/conf:/usr/local/spark-1.0.0/assembly/target/scala-2.10/spark-assembly-1.0.0-hadoop1.2.1.jar
-XX:MaxPermSize=128m -Dspark.akka.logLifecycleEvents=true -Xms512m -Xmx512m
org.apache.spark.deploy.worker.Worker spark://x.x.x.174:7077
Thanks
Yes.
Thanks & Regards,
Meethu M
On Tuesday, 1 July 2014 6:14 PM, Akhil Das ak...@sigmoidanalytics.com wrote:
Is this command working??
java -cp
::/usr/local/spark-1.0.0/conf:/usr/local/spark-1.0.0/assembly/target/scala-2.10/spark-assembly-1.0.0-hadoop1.2.1.jar
-XX:MaxPermSize=128m
Thanks Xiangrui, your suggestion fixed the problem. I will see if I can upgrade
the numpy/python for a permanent fix. My current versions of python and numpy
are 2.6 and 4.1.9 respectively.
Thanks,
Sam
-Original Message-
From: Xiangrui Meng [mailto:men...@gmail.com]
Sent: Tuesday,
We changed the log level to DEBUG by replacing every INFO with DEBUG in
/root/ephemeral-hdfs/conf/log4j.properties and propagating it to the
cluster. There is some DEBUG output visible in both master and worker but
nothing really interesting regarding stages or scheduling. Since we
expected a
Can anyone explain to me what the difference is between a worker and a slave? I
have one master and two slaves which are connected to each other. Using the jps
command I can see the master in the master node and a worker in the slave nodes, but I
don't see any worker in my master node with this command
One thing we ran into was that there was another log4j.properties earlier
in the classpath. For us, it was in our MapR/Hadoop conf.
If that is the case, something like the following could help you track it
down. The only thing to watch out for is that you might have to walk up the
classloader
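(The snippet referred to above is cut off here; one way to do that kind of check from spark-shell might look like the following sketch.)

import scala.collection.JavaConverters._

// List every log4j.properties visible from each classloader in the chain,
// so you can see which copy wins.
var loader: ClassLoader = Thread.currentThread().getContextClassLoader
while (loader != null) {
  val urls = loader.getResources("log4j.properties").asScala.toList
  println(s"$loader -> $urls")
  loader = loader.getParent     // walk up to the parent classloader
}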
Are you looking at the driver log? (e.g. Shark?). I see a ton of
information in the INFO category on what query is being started, what
stage is starting and which executor stuff is sent to. So I'm not sure
if you're saying you see all that and you need more, or that you're
not seeing this type of
Dear Spark Users:
Spark 1.0 has been installed as standalone, but it can't read any compressed
(CMX/Snappy) or Sequence files residing on HDFS (it can read uncompressed files
from HDFS). The key notable message is "Unable to load native-hadoop
library". Other related messages are -
Are you saying that both streams come in at the same rate and you have
the same batch interval but the batch size ends up different? i.e. two
datapoints both arriving at X seconds after streaming starts end up in
two different batches? How do you define real time values for both
streams? I am
Hi Bin,
VD and ED are ClassTags; you can treat them as placeholders, like a template
parameter T in C++ (the analogy is not 100% exact).
You do not need to convert Graph[String, Double] to Graph[VD, ED].
Checking ClassTag's definition in Scala could help.
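For example, a minimal sketch against the standard GraphX API (assumes a spark-shell session where sc is available):

import scala.reflect.ClassTag
import org.apache.spark.graphx._

// Generic over the vertex/edge attribute types; the ClassTag context bounds
// play the role of the "placeholder" described above.
def countEdges[VD: ClassTag, ED: ClassTag](g: Graph[VD, ED]): Long = g.edges.count()

// A concrete graph: VD = String and ED = Double are inferred, no conversion needed.
val vertices = sc.parallelize(Seq((1L, "alice"), (2L, "bob")))
val edges    = sc.parallelize(Seq(Edge(1L, 2L, 0.5)))
val g: Graph[String, Double] = Graph(vertices, edges)
countEdges(g)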
Best,
On Jul 1, 2014, at 4:49 AM, Bin WU bw...@connect.ust.hk wrote:
Hi
You can use either bin/run-example or bin/spark-submit to run the example
code. scalac -d classes/ SparkKMeans.scala doesn't include the Spark
classpath. There are examples in the official doc:
http://spark.apache.org/docs/latest/quick-start.html#where-to-go-from-here
-Xiangrui
On Tue, Jul 1, 2014 at
Try to reduce the number of partitions to match the number of cores. We
will add treeAggregate to reduce the communication cost.
PR: https://github.com/apache/spark/pull/1110
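For illustration, a minimal sketch of matching the partition count to the core count before caching (the path and core count below are hypothetical; assumes a spark-shell session):

// Hypothetical values: 16 total executor cores, data already on HDFS.
val numCores = 16
val points = sc.textFile("hdfs:///data/points")
  .coalesce(numCores)    // fewer, larger partitions => fewer aggregation messages
  .cache()
points.count()           // materialize the cache

coalesce avoids a shuffle when reducing the partition count; use repartition if you need to increase it.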
-Xiangrui
On Tue, Jul 1, 2014 at 12:55 AM, Charles Li littlee1...@gmail.com wrote:
Hi Spark,
I am running LBFGS on our
I attended yesterday on ustream.tv, but can't find the links to today's
streams anywhere. help!
--
Aditya Varun Chadha | http://www.adichad.com | +91 81308 02929 (M)
General Session / Keynotes: http://www.ustream.tv/channel/spark-summit-2014
Track A: http://www.ustream.tv/channel/track-a1
Track B: http://www.ustream.tv/channel/track-b1
Hi Tobias,
Your explanation makes a lot of sense. Actually, I tried to use partial
data on the same program yesterday. It has been up for around 24 hours and
is still running correctly. Thanks!
Bill
On Mon, Jun 30, 2014 at 5:53 PM, Tobias Pfeiffer t...@preferred.jp wrote:
Bill,
let's say
Hi Tathagata,
Yes. The input stream is from Kafka and my program reads the data, keeps
all the data in memory, processes the data, and generates the output.
Bill
On Mon, Jun 30, 2014 at 11:45 PM, Tathagata Das tathagata.das1...@gmail.com
wrote:
Are you by any chance using only memory in the
In my use case, if I need to stop Spark Streaming for a while, data would
accumulate a lot on the Kafka topic-partitions. After I restart the Spark Streaming
job, the worker's heap will go out of memory on the fetch of the first batch.
I am wondering:
* Is there a way to throttle reading from Kafka in
This all seems pretty hackish and a lot of trouble to get around
limitations in MLlib.
The big limitation is that right now, the optimization algorithms work on
one large dataset at a time. We need a second set of methods that work on
a large number of medium-sized datasets.
I've started to code
Maybe reducing the batch duration would help :\
2014-07-01 17:57 GMT+01:00 Chen Song chen.song...@gmail.com:
In my use case, if I need to stop Spark Streaming for a while, data would
accumulate a lot on the Kafka topic-partitions. After I restart the Spark Streaming
job, the worker's heap will go
Michael -
Does Spark SQL support rlike and like yet? I am running into that same
error with a basic select * from table where field like '%foo%' using the
hql() function.
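For reference, a minimal sketch of that kind of query (hypothetical table and column names; assumes the Spark 1.0 HiveContext and an existing sc, e.g. in spark-shell built with Hive support):

import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)

// Hypothetical table/column names; this is the shape of the failing LIKE query.
val rows = hiveContext.hql("SELECT * FROM my_table WHERE my_field LIKE '%foo%'")
rows.collect().foreach(println)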
Thanks
On Wed, May 28, 2014 at 2:22 PM, Michael Armbrust mich...@databricks.com
wrote:
On Tue, May 27, 2014 at 6:08 PM,
Are these sessions recorded?
On Tue, Jul 1, 2014 at 9:47 AM, Alexis Roos alexis.r...@gmail.com wrote:
General Session / Keynotes: http://www.ustream.tv/channel/spark-summit-2014
Track A: http://www.ustream.tv/channel/track-a1
Where are you running the spark-class version? Hopefully also on the
workers.
If you're trying to centrally start/stop all workers, you can add a
slaves file to the spark conf/ directory which is just a list of your
hosts, one per line. Then you can just use ./sbin/start-slaves.sh to
start the
Hi Yana,
Yes, that is what I am saying. I need both streams to be at the same pace. I do
have timestamps for each datapoint. There is a way suggested by Tathagata Das
in an earlier post where you have a bigger window than required and you
fetch your required data from that window based on
it's kind of handy to be able to convert stuff to Breeze... is there some
other way I am supposed to access that functionality?
Yieldbot is pleased to announce the release of Flambo, our Clojure DSL for
Apache Spark.
Flambo allows one to write Spark applications in pure Clojure as an
alternative to the Scala, Java and Python APIs currently available in Spark.
We have already written a substantial amount of internal code in
We were not ready to expose it as a public API in v1.0. Both Breeze
and MLlib are in rapid development. It would be possible to expose it
as a developer API in v1.1. For now, it should be easy to define a
toBreeze method in your own project. -Xiangrui
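A minimal sketch of such a toBreeze helper (it mirrors what MLlib does internally, but lives in your own code rather than a public API):

import breeze.linalg.{DenseVector => BDV, SparseVector => BSV, Vector => BV}
import org.apache.spark.mllib.linalg.{DenseVector, SparseVector, Vector}

// Convert an MLlib vector to its Breeze counterpart.
def toBreeze(v: Vector): BV[Double] = v match {
  case dv: DenseVector  => new BDV[Double](dv.values)
  case sv: SparseVector => new BSV[Double](sv.indices, sv.values, sv.size)
}

Converting back is just a matter of wrapping the Breeze arrays with Vectors.dense or Vectors.sparse.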
On Tue, Jul 1, 2014 at 12:17 PM, Koert
It seems to be a bug. I have opened
https://issues.apache.org/jira/browse/SPARK-2339 to track it.
Thank you for reporting it.
Yin
On Tue, Jul 1, 2014 at 12:06 PM, Subacini B subac...@gmail.com wrote:
Hi All,
Running this join query
sql(SELECT * FROM A_TABLE A JOIN B_TABLE B WHERE
Hi,
I'm trying to get rid of an error (NoSuchMethodError) while using Amazon's
S3 client on Spark. I'm using the spark-submit script to run my code.
Reading about my options and other threads, it seemed the most logical way
would be to make sure my jar is loaded first. spark-submit in debug mode shows
Hi all,
I have an issue with multiple slf4j bindings. My program was running
correctly; I just added a new dependency, Kryo. When I submitted a
job, the job was killed because of the following error messages:
*SLF4J: Class path contains multiple SLF4J bindings.*
The log said there were
I am running Spark 1.0 on a 4-node standalone Spark cluster (1 master + 3
workers). Our app is fetching data from Cassandra and doing a basic filter, map,
and countByKey on that data. I have run into a strange problem. Even if the
number of rows in Cassandra is just 1M, the Spark job seems
Also, multiple calls to mapPartitions() will be pipelined by the Spark
execution engine into a single stage, so the overhead is minimal.
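A tiny sketch of that pipelining (assumes a spark-shell session): both mapPartitions calls below run in a single stage because there is no shuffle between them.

val rdd = sc.parallelize(1 to 1000000, 8)
val count = rdd
  .mapPartitions(iter => iter.map(_ * 2))         // narrow transformation
  .mapPartitions(iter => iter.filter(_ % 3 == 0)) // still the same stage
  .count()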
On Fri, Jun 13, 2014 at 9:28 PM, zhen z...@latrobe.edu.au wrote:
Thank you for your suggestion. We will try it out and see how it performs.
We
think the
A lot of things can get funny when you run distributed as opposed to
local -- e.g. some jar not making it over. Do you see anything of
interest in the log on the executor machines -- I'm guessing
192.168.222.152/192.168.222.164. From here
Yes, Spark attempts to achieve data locality (PROCESS_LOCAL or NODE_LOCAL)
where possible, just like MapReduce. It's a best practice to co-locate your
Spark workers on the same nodes as your HDFS DataNodes for just this
reason.
This is achieved through the RDD.preferredLocations() interface
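A small sketch of inspecting those preferred locations from spark-shell (the HDFS path below is hypothetical):

import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.TextInputFormat

val hadoopRdd = sc.hadoopFile[LongWritable, Text, TextInputFormat]("hdfs:///some/large/file")
hadoopRdd.partitions.foreach { p =>
  // the hosts holding the HDFS blocks backing this partition
  println(s"partition ${p.index}: ${hadoopRdd.preferredLocations(p).mkString(", ")}")
}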
They are recorded... For example, 2013: http://spark-summit.org/2013
I'm assuming the 2014 videos will be up in 1-2 weeks.
Marco
On Tue, Jul 1, 2014 at 3:18 PM, Soumya Simanta soumya.sima...@gmail.com
wrote:
Are these sessions recorded ?
On Tue, Jul 1, 2014 at 9:47 AM, Alexis Roos
Hi,
On Wed, Jul 2, 2014 at 1:57 AM, Chen Song chen.song...@gmail.com wrote:
* Is there a way to control how far a Kafka DStream can read on a
topic-partition (via offsets, for example)? By setting this to a small
number, it would force the DStream to read less data initially.
Please see the post at
I’d suggest asking the IBM Hadoop folks, but my guess is that the library
cannot be found in /opt/IHC/lib/native/Linux-amd64-64/. Or maybe if this
exception is happening in your driver program, the driver program’s
java.library.path doesn’t include this. (SPARK_LIBRARY_PATH from spark-env.sh
Awesome.
Just want to catch up on some sessions from other tracks.
Learned a ton over the last two days.
Thanks
Soumya
On Jul 1, 2014, at 8:50 PM, Matei Zaharia matei.zaha...@gmail.com wrote:
Yup, we’re going to try to get the videos up as soon as possible.
Matei
On Jul 1,
In your spark-env.sh, do you happen to set SPARK_PUBLIC_DNS or something of
that kind? This error suggests the worker is trying to bind a server to the
master's IP, which clearly doesn't make sense.
On Mon, Jun 30, 2014 at 11:59 PM, MEETHU MATHEW meethu2...@yahoo.co.in
wrote:
Hi,
I did
Hi Zhen,
The Scala Iterator trait supports cloning via the duplicate method
(http://www.scala-lang.org/api/current/index.html#scala.collection.Iterator@duplicate:(Iterator[A],Iterator[A])).
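For example, a quick sketch:

val (it1, it2) = Iterator(1, 2, 3).duplicate
// The two iterators advance independently over the same underlying elements.
println(it1.sum)      // 6
println(it2.toList)   // List(1, 2, 3)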
Regards,
Frank Austin Nothaft
fnoth...@berkeley.edu
fnoth...@eecs.berkeley.edu
202-340-0466
On Jun 13,
It could be because you are out of memory on the worker nodes and blocks are
not getting registered.
An older issue with 0.6.0 was dead nodes causing loss of tasks, then
resubmission of data in an infinite loop... It was fixed in 0.7.0 though.
Are you seeing a crash log in this log... or in the