Yes, we do track the issues in GitHub.
You can report it in the Calliope EA repo (http://github.com/tuplejump/calliope) -- just send me your GitHub ID,
or sign up from the Calliope homepage via the "Get Early Access" link
(https://docs.google.com/forms/d/1jFTqKnp_13vTjXwy3Zex58X1JKRsFJLLWNhyZ9mQUDg/viewform)
and we
Seems like it is not able to find a particular class:
org.apache.spark.metrics.sink.MetricsServlet.
How are you running your program? Is this an intermittent error? Does it go
away if you do a clean compilation of your project and run again?
TD
On Tue, Feb 4, 2014 at 9:22 AM, soojin
Hi Sourav,
For number of records received per second, you could use something like
this to calculate number of records in each batch, and divide it by your
batch size.
yourKafkaStream.foreachRDD { rdd =>
  val count = rdd.count()
  println("Current rate = " + (count / batchSize) + " records / second")
}
Hi,
For some of the stages I am getting a scheduler delay of ~300-400 ms, whereas
for some other tasks it is ~100 ms.
I am curious to know what factors I should look into when debugging scheduler
delay problems.
How can I fix this?
Thanks,
--
Sourav Chandra
Senior Software Engineer
To start the discussion on Calliope core (the Spark+Cassandra part of it)
becoming a contrib module in Spark (spark-cassandra), I have opened an issue
on the Spark JIRA: https://spark-project.atlassian.net/browse/SPARK-1054
Please let us know what you all feel of it... great/good idea/bad
idea/doesn't make
Hi Tathagata,
How can I find the batch size?
Thanks,
Sourav
On Wed, Feb 5, 2014 at 2:02 PM, Tathagata Das
tathagata.das1...@gmail.com wrote:
Hi Sourav,
For number of records received per second, you could use something like
this to calculate number of records in each batch, and divide it
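(For reference: the batch size used in the snippet above is simply the batch
interval passed when constructing the StreamingContext. A minimal sketch, with
the master URL and app name assumed:)

import org.apache.spark.streaming.{Seconds, StreamingContext}

// Sketch only -- "local[4]" and the app name are placeholders.
// The Duration here (2 seconds) is the batch interval that "batchSize" refers to.
val batchInterval = Seconds(2)
val ssc = new StreamingContext("local[4]", "RateExample", batchInterval)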
Responses inline.
On Mon, Feb 3, 2014 at 11:03 AM, Liam Stewart liam.stew...@gmail.com wrote:
I'm looking at adding spark / shark to our analytics pipeline and would
also like to use spark streaming for some incremental computations, but I
have some questions about the suitability of spark
Hi,
I'm trying to execute a streaming application using local[4], however I just
see one executor in the web UI. Shouldn't there be more -- one executor per
worker thread?
I'm trying to open connections to a MySQL database on all the worker nodes
and keep them open until the end of the stream.
Do you guys
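(One common way to handle this, sketched below with hypothetical JDBC details
and a hypothetical stream name, is to open a connection per partition on the
executors rather than on the driver:)

import java.sql.DriverManager

// Sketch only -- the JDBC URL and credentials are placeholders.
// A connection is created inside each partition, i.e. on the executor that
// processes it, used for that partition's records, and then closed.
myStream.foreachRDD { rdd =>
  rdd.foreachPartition { records =>
    val conn = DriverManager.getConnection("jdbc:mysql://dbhost/mydb", "user", "pass")
    try {
      records.foreach { record =>
        // write record using conn
      }
    } finally {
      conn.close()
    }
  }
}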
Hi,
Spark 0.8.1 had built-in support for Windows, and its sbt directory contains
sbt.bat as well as sbt-launch-0.11.3-2.jar.
Spark 0.9.0 does not have these files -- how should we run it on Windows?
--
Eran | CTO
Hi, I have a problem running examples in IntelliJ IDEA 13.02.
Any idea ?
Thanks
/usr/lib64/jvm/java-1.7.0-openjdk-1.7.0/bin/java -Didea.launcher.port=7534
-Didea.launcher.bin.path=/home/zgalic/development/idea-IC-133.696/bin
-Dfile.encoding=UTF-8 -classpath
Not sure it is the same, but I recall having the same problem yesterday when I
ran it using mvn. It appears that the mvn package did not complete well,
since I had to change something in the pom.xml, so when I ran
mvn exec:java -Dexec.mainClass=SimpleApp
it failed with the same error.
HTH
Eran
I'm using it through Git Bash and GitHub's Git Shell.
On Wed, Feb 5, 2014 at 3:33 PM, goi cto goi@gmail.com wrote:
Hi,
Spark 0.8.1 had built in support for windows and sbt directory contains
sbt.bat as well as sbt-launch-0.11.3-2.jar.
Spark 0.9.0 does not have these files. - How
Did you try it with Cygwin?
On Wed, Feb 5, 2014 at 4:08 PM, goi cto goi@gmail.com wrote:
So what is the equivalent of building it in Git
sbt\sbt assembly
?
On Wed, Feb 5, 2014 at 5:06 PM, Stevo Slavić ssla...@gmail.com wrote:
I'm using it through Git Bash and GitHub's Git Shell.
So what is the equivalent of building it in Git
sbt\sbt assembly
?
On Wed, Feb 5, 2014 at 5:06 PM, Stevo Slavić ssla...@gmail.com wrote:
I'm using it through Git Bash and GitHub's Git Shell.
On Wed, Feb 5, 2014 at 3:33 PM, goi cto goi@gmail.com wrote:
Hi,
Spark 0.8.1 had built in
Cool, got it to work just fine with Git Bash.
Thanks!
On Wed, Feb 5, 2014 at 5:10 PM, Haris Osmanagic
haris.osmana...@gmail.com wrote:
Did you try it with Cygwin?
On Wed, Feb 5, 2014 at 4:08 PM, goi cto goi@gmail.com wrote:
So what is the equivalent of building it in Git
sbt\sbt
I'm running a Spark cluster. (Spark-0.9.0_SNAPSHOT).
I connect to the Spark cluster from the spark-shell. I can see the Spark
web UI on n001:8080 and it shows that the master is running on
spark://n001:7077
However, when I try to connect to it from a standalone Scala program,
I'm getting
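(For context, a minimal sketch of connecting from a standalone program; the app
name and jar path below are placeholders, and the master URL must match exactly
what the web UI on n001:8080 reports:)

import org.apache.spark.{SparkConf, SparkContext}

// Sketch only -- app name and jar path are placeholders.
val conf = new SparkConf()
  .setMaster("spark://n001:7077")
  .setAppName("StandaloneTest")
  .setJars(Seq("target/scala-2.10/standalone-test_2.10-0.1.jar"))
val sc = new SparkContext(conf)
println(sc.parallelize(1 to 1000).count())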
You could also follow SBT or Maven's instructions for installing under
Windows (http://www.scala-sbt.org/release/docs/Getting-Started/Setup.html)
and use that to build from cmd.exe or powershell. We no longer provide a
sbt.bat file because we had to stop bundling binary artifacts (such as the
SBT
Hi Matei,
First, thank you a lot for the answer. You are right -- locally I'm missing the
hadoop-client dependency.
But on my cluster I deployed the latest version of spark-0.9.0, and now with the
same code I get the following error from sbt package:
[warn] ::
[warn]
Try depending on spark-core_2.10 rather than 2.10.3 -- the third digit was
dropped in the maven artifact and I hit this just yesterday as well.
Sent from my mobile phone
On Feb 5, 2014 10:41 AM, Dana Tontea d...@cylex.ro wrote:
Hi Matei,
Firstly thank you a lot for answer. You are right I'm
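(To illustrate the suggestion above, a minimal build.sbt sketch; the version
string is assumed to be the 0.9.0 release:)

// The Scala binary version (2.10) is part of the artifact name;
// the patch release (2.10.3) is not.
libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "0.9.0-incubating"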
What do you mean by the last version of spark-0.9.0? To be precise,
there isn't anything known as spark-0.9.0. What was released recently is
spark-0.9.0-incubating, and there is and only ever will be one version of
that. If you're talking about a 0.9.0-incubating-SNAPSHOT built locally,
then
Has anyone tried this? I'd like to read a bunch of Avro GenericRecords
from a Parquet file. I'm having a bit of trouble with respect to
dependencies. My latest attempt looks like this:
export
I'm assuming you checked all the jars in SPARK_CLASSPATH to confirm that
parquet/org/codehaus/jackson/JsonGenerationException.class exists in one of
them?
On Wed, Feb 5, 2014 at 12:02 PM, Uri Laserson laser...@cloudera.com wrote:
Has anyone tried this? I'd like to read a bunch of Avro
When you look at the web UI (port 8080) for the master, does it list at least
one connected worker?
On Wed, Feb 5, 2014 at 7:19 AM, Soumya Simanta soumya.sima...@gmail.com wrote:
I'm running a Spark cluster. (Spark-0.9.0_SNAPSHOT).
I connect to the Spark cluster from the spark-shell. I can see
I have successfully run spark-shell with Spark 0.9 and CDH4, but when I tried
the same with CDH5 I keep getting a JVM crash.
I am running an HDFS cluster with CDH5 (I tried both versions of the client,
2.2.0-mr1-CDH5.0.0-beta-1 and 2.2.0-CDH5.0.0).
Attached is the error log from the JVM: hs_err_pid28948.log
Hi,
I was hoping to have some discussion about how sparse matrices are
represented in MLLib. I noticed a few commits to add a basic MatrixEntry
object:
https://github.com/apache/incubator-spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/linalg/MatrixEntry.scala
I know that
I'm evaluating whether Spark would be a good fit in my current streaming data
processing pipeline, and I'm just a bit confused about the differentiation
between spark and spark streaming.
Spark seems to have a mature Python API that I plan on trying out, but Spark
Streaming appears to NOT have
Yes, of course. That class is a Jackson class, and I'm not sure why it's being
referred to as parquet.org.codehaus.jackson.JsonGenerationException.
org.codehaus.jackson.JsonGenerationException is on the classpath. But not
when it's prefixed by parquet.
On Wed, Feb 5, 2014 at 12:06 PM,
Hi Xiangrui,
We are also adding support for sparse formats in MLlib... if you have a pull
request or JIRA link, could you please point to it? Jblas does not
implement sparse formats, the last time I looked at it, but Colt had sparse
formats which could be reused...
Thanks.
Deb
On Jan 31, 2014 11:15
Hi,
I created a JIRA for discussion and track the progress:
https://spark-project.atlassian.net/browse/MLLIB-18
Let us move our discussion there.
Best,
Xiangrui
On Wed, Feb 5, 2014 at 3:35 PM, Debasish Das debasish.da...@gmail.com wrote:
Hi Xiangrui,
We are also adding support for sparse
Hi Imran,
Yes, for better performance in computation, we should use CSC or CSR
format. I believe that we are moving towards that direction. But let
us first discuss the format for sparse input. Do you mind moving our
discussion to the JIRA I created?
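(For readers following the thread, a minimal sketch of what a CSR layout looks
like, independent of whatever representation MLlib ends up adopting; the class
and field names are illustrative only:)

// Compressed Sparse Row (CSR): only non-zeros are stored. Row i's non-zero
// values live in values(rowPtr(i) until rowPtr(i + 1)), and colIndices gives
// the column index of each stored value.
case class CSRMatrix(
    numRows: Int,
    numCols: Int,
    rowPtr: Array[Int],     // length numRows + 1
    colIndices: Array[Int], // length nnz
    values: Array[Double]   // length nnz
) {
  def apply(i: Int, j: Int): Double = {
    var k = rowPtr(i)
    while (k < rowPtr(i + 1)) {
      if (colIndices(k) == j) return values(k)
      k += 1
    }
    0.0
  }
}

// Example: the 2x3 matrix [[1, 0, 2], [0, 0, 3]]
val m = CSRMatrix(2, 3, Array(0, 2, 3), Array(0, 2, 2), Array(1.0, 2.0, 3.0))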
After creating a lot of Spark connections, work/app-* folders in Worker
nodes keep getting created without any clean-up being done. This
particularly becomes a problem when the Spark driver programs ship jars or
files. Is there any way to garbage collect these without manually deleting
them?
I'm observing this as well on 0.9.0, with several 10s of GB accumulating in
that directory but never being cleaned up. I think this has gotten more
pronounced in 0.9.0 as well with large reducers spilling to disk.
On Wed, Feb 5, 2014 at 3:46 PM, Mingyu Kim m...@palantir.com wrote:
After
Yep, I did not include that jar in the class path. Now I've got some
real errors to try to work through. Thanks!
On Wed, Feb 5, 2014 at 3:52 PM, Jey Kottalam j...@cs.berkeley.edu wrote:
Hi Uri,
Could you try adding the parquet-jackson JAR to your classpath? There
may possibly be other
Hey,
In spark-shell, I'm doing:
val s3 = ... // connection to S3 using aws-java-sdk
val mapping: Map[String, String] = {
  // use s3 to load file, create a plain map
}
val rdd = sc.loadSomeData().map {
  // use mapping local var, but *not* s3
}
rdd.count()
This blows up with Task not serializable
val s3 = // connection to s3 using aws-java-sdk
Turns out:
@transient val s3 = ...
Works. I suppose I should have thought of this (...although I really
think this did work before :-), but I stumbled across:
@transient val sc = org.apache.spark.repl.Main.interp.createSparkContext();
In
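(A minimal sketch of the pattern described above, assuming spark-shell and the
aws-java-sdk client; the map contents are placeholders:)

import com.amazonaws.services.s3.AmazonS3Client

// Sketch only. In the REPL, top-level vals become fields of a wrapper object
// that is pulled into task closures; @transient keeps the non-serializable
// client out of what gets shipped to executors.
@transient val s3 = new AmazonS3Client()

// Built once on the driver (possibly using s3); a plain Map serializes fine.
val mapping: Map[String, String] = Map("example-key" -> "example-value")

val rdd = sc.parallelize(Seq("example-key")).map(k => mapping.getOrElse(k, ""))
rdd.count()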
I am cross-posting on the parquet mailing list. Short recap: I am trying
to read Parquet data from the spark interactive shell.
I have added all the necessary parquet jars to SPARK_CLASSPATH:
export
My spark is 0.9.0-SNAPSHOT, built from wherever master was at the time
(like a week or two ago).
If you're referring to the cloneRecords parameter, it appears to default to
true, but even when I add it explicitly, I get the same error.
On Wed, Feb 5, 2014 at 7:17 PM, Frank Austin Nothaft
Uri,
Er, yes, it is the cloneRecords, and when I said true, I meant false… Apologies
for the misdirection there.
Regards,
Frank Austin Nothaft
fnoth...@berkeley.edu
fnoth...@eecs.berkeley.edu
202-340-0466
On Feb 5, 2014, at 7:44 PM, Uri Laserson laser...@cloudera.com wrote:
My spark is
That cloneRecords parameter is gone, so either use the released 0.9.0 or
the current master.
On Thu, Feb 6, 2014 at 9:17 AM, Frank Austin Nothaft
fnoth...@berkeley.edu wrote:
Uri,
Er, yes, it is the cloneRecords, and when I said true, I meant false...
Apologies for the misdirection there.
Hi,
In older posts on Google Groups, there was mention of checking the logs on
“preferred/non-preferred” for data locality.
But I can’t seem to find this in 0.9.0 anymore. Has this been changed to
“PROCESS_LOCAL”, like this:
14/02/06 13:51:45 INFO TaskSetManager: Starting task 9.0:50 as TID
If you have multiple executors running on a single node then you might have
data that's on the same server but in different JVMs. Just on the same
server is NODE_LOCAL, but being in the same JVM is PROCESS_LOCAL.
Yes it was changed to be more specific than just preferred/non-preferred.
The new
// version 0.9.0
Hi Spark users,
My understanding of the MEMORY_AND_DISK_SER persistence level was that if
an RDD could fit into memory then it would be left there (same as
MEMORY_ONLY), and only if it was too big for memory would it spill to disk.
Here's how the docs describe it:
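(For illustration, a minimal sketch of requesting this storage level; the input
path is a placeholder:)

import org.apache.spark.storage.StorageLevel

// Sketch only -- the path is a placeholder. With MEMORY_AND_DISK_SER,
// partitions are stored as serialized bytes; those that do not fit in memory
// are written to disk rather than recomputed.
val data = sc.textFile("/path/to/input").persist(StorageLevel.MEMORY_AND_DISK_SER)
println(data.count())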