Hi,
Trying to use Spark Streaming, but I am struggling with word count :(
I want to consolidate the output of the word count (not on a per-window basis), so
I am using updateStateByKey(), but for some reason this is not working.
The function itself is not being invoked (I do not see the sysout output on
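For reference, a minimal sketch of a consolidated (stateful) word count with
updateStateByKey(); the socket source, host/port, and checkpoint path are
illustrative assumptions:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._  // pair-DStream implicits in Spark 1.x

object StatefulWordCount {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("StatefulWordCount"), Seconds(10))
    ssc.checkpoint("checkpoint")  // updateStateByKey requires a checkpoint directory

    // Add this batch's counts to the running total carried in the state.
    val updateFunc: (Seq[Int], Option[Int]) => Option[Int] =
      (values, state) => Some(values.sum + state.getOrElse(0))

    val words = ssc.socketTextStream("localhost", 9999).flatMap(_.split(" "))
    val wordCounts = words.map(word => (word, 1)).updateStateByKey(updateFunc)

    wordCounts.print()
    ssc.start()
    ssc.awaitTermination()
  }
}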
Hi,
I'm having trouble using both zipWithIndex and repartition. When I use them
both, the following action will get stuck and won't return.
I'm using Spark 1.1.0.
These two lines work as expected:
scala> sc.parallelize(1 to 10).repartition(10).count()
res0: Long = 10
scala> sc.parallelize(1 to
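The snippet is cut off here; per the workaround posted later in this thread,
the failing pattern pairs zipWithIndex with repartition, roughly:

// Reconstructed from the workaround later in the thread: on Spark 1.1.0
// this pairing hangs (tracked as SPARK-4433).
sc.parallelize(1 to 10).zipWithIndex().repartition(10).count()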
Hi Sadhan,
Could you please provide the stack trace of the
ArrayIndexOutOfBoundsException (if any)? The reason why the first
query succeeds is that Spark SQL doesn't bother reading all data from
the table to give COUNT(*). In the second case, however, the whole
table is asked to be
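A hedged illustration of the difference (assuming a SQLContext named
sqlContext; table and column names are hypothetical):

// COUNT(*) can be answered without materializing column values, so it
// may succeed even when reading the actual columns would fail.
sqlContext.sql("SELECT COUNT(*) FROM cached_table").collect()
// Projecting real columns forces a full read and can surface the
// ArrayIndexOutOfBoundsException.
sqlContext.sql("SELECT some_column FROM cached_table").collect()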
Hi Niko,
Have you tried running it while keeping the wordCounts.print()? Possibly the
import of the package org.apache.spark.streaming._ is missing, so during
sbt package it is unable to locate the saveAsTextFile API.
Go to
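For reference, a sketch of the import and calls in question; note that on a
DStream the method is saveAsTextFiles (plural), taking a path prefix, and the
output path here is illustrative:

import org.apache.spark.streaming._  // StreamingContext, Seconds, DStream, etc.

// wordCounts is assumed to be a DStream[(String, Int)]
wordCounts.print()
wordCounts.saveAsTextFiles("hdfs:///output/wordCounts")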
Hi Ben,
I haven't tried it with Python, but the instructions are the same as for
compiled Scala (jar) apps. What it's saying is that it's not possible to
offload the entire work to the master (à la Hadoop) in a fire-and-forget (or
rather submit-and-forget) manner when running on standalone.
Ashic,
Thanks for your email.
Two things:
1. I think a whole lot of data scientists and other people would love
it if they could just fire off jobs from their laptops. It is, in my
opinion, a commonly desired use case.
2. Did anyone actually get the Ooyala job server to work? I asked that
Hi Ognen,
Currently:
Note that cluster mode is currently not supported for standalone clusters,
Mesos clusters, or Python applications.
So it seems like YARN + Scala is the only option for fire-and-forget. It
shouldn't be too hard to create a proxy submitter, but yes, that does involve
another
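For reference, a sketch of the fire-and-forget submission that did work at the
time (Spark 1.1 on YARN in cluster mode; class and jar names are hypothetical):

# In yarn-cluster mode the driver runs inside the cluster, so the
# submitting machine can disconnect once the job is accepted.
./bin/spark-submit --master yarn-cluster \
  --class com.example.MyJob \
  my-job-assembly.jar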
I was trying to zip the RDD with another RDD. I store my matrix in HDFS and
load it as Ab_rdd = sc.textFile('data/Ab.txt', 100)
If I do
idx = sc.parallelize(range(m), 100)  # m is the number of records in Ab.txt
print matrix_Ab.matrix.zip(idx).first()
I got the following error:
If I store my
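For context, RDD.zip requires both RDDs to have the same number of partitions
and the same number of elements per partition, which a textFile load generally
won't share with a parallelized range. Indexing the RDD directly avoids the
mismatch; a Scala sketch under that assumption:

// zipWithIndex derives indices from the RDD's own partitioning, so no
// second, identically partitioned RDD is needed.
val ab = sc.textFile("data/Ab.txt", 100)
val indexed = ab.zipWithIndex()  // RDD[(String, Long)]
indexed.first()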
UPDATE
I have removed and added things systematically to the job and have figured
out that the inclusion of the construction of the SparkContext object is what
is causing it to fail.
The last run contained the code below.
I keep losing executors, apparently, and I'm not sure why. Some of the
Hi,
I wonder whether the PageRank implementation is correct. More specifically, I
am looking at the following function from PageRank.scala
https://github.com/apache/spark/blob/master/graphx/src/main/scala/org/apache/spark/graphx/lib/PageRank.scala
, which is given to Pregel:
def vertexProgram(id:
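For context, the Pregel-style static PageRank vertex program has roughly this
shape (a paraphrase, not the exact source; resetProb is the teleport
probability):

// Paraphrased shape of the update under discussion: the new rank mixes
// the teleport term with the sum of incoming messages.
val resetProb = 0.15
def vertexProgram(id: Long, attr: Double, msgSum: Double): Double =
  resetProb + (1.0 - resetProb) * msgSum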
Hi,
I have already successfully compiled and run the Spark examples. My problem is
that if I make some modifications (e.g., to SparkPi.scala or LogQuery.scala)
I have to use mvn -DskipTests package to rebuild the whole Spark project
and wait a relatively long time.
I also tried mvn scala:cc as
I haven't tried scala:cc, but you can ask Maven to just build a
particular sub-project. For example:
mvn -pl :spark-examples_2.10 compile
On Sat, Nov 15, 2014 at 5:31 PM, Yiming (John) Zhang sdi...@gmail.com wrote:
Hi,
I have already successfully compiled and run the Spark examples. My problem
If Spark is not installed on the client side, you won't be able to
deserialize the model. Instead of serializing the model object, you
may serialize the model weights array and implement predict on the
client side. -Xiangrui
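A minimal sketch of the suggested approach (the model type and file name are
assumptions): persist only the weights and intercept, then re-implement
predict on a client with no Spark dependency.

import java.io.{FileOutputStream, ObjectOutputStream}

// Spark side: extract and serialize the plain arrays.
// val model: LogisticRegressionModel = ...
// val out = new ObjectOutputStream(new FileOutputStream("weights.bin"))
// out.writeObject((model.weights.toArray, model.intercept))
// out.close()

// Client side (plain Scala, no Spark): linear-model predict by hand.
def predict(weights: Array[Double], intercept: Double,
            features: Array[Double]): Double = {
  val margin = weights.zip(features).map { case (w, x) => w * x }.sum + intercept
  if (margin > 0) 1.0 else 0.0  // binary decision at threshold 0
}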
On Fri, Nov 14, 2014 at 2:54 PM, xiaoyan yu xiaoyan...@gmail.com wrote:
This is a bug. Could you make a JIRA? -Xiangrui
On Sat, Nov 15, 2014 at 3:27 AM, lev kat...@gmail.com wrote:
Hi,
I'm having trouble using both zipWithIndex and repartition. When I use them
both, the following action will get stuck and won't return.
I'm using Spark 1.1.0.
These two lines
I think I understand where the bug is now. I created a JIRA
(https://issues.apache.org/jira/browse/SPARK-4433) and will make a PR
soon. -Xiangrui
On Sat, Nov 15, 2014 at 7:39 PM, Xiangrui Meng men...@gmail.com wrote:
This is a bug. Could you make a JIRA? -Xiangrui
On Sat, Nov 15, 2014 at 3:27
PR: https://github.com/apache/spark/pull/3291. For now, here is a workaround:
val a = sc.parallelize(1 to 10).zipWithIndex()
a.partitions // call .partitions explicitly
a.repartition(10).count()
Thanks for reporting the bug! -Xiangrui
On Sat, Nov 15, 2014 at 8:38 PM, Xiangrui Meng
Hi Cheng,
Thanks for your response. Here is the stack trace from the YARN logs:
Hi all,
We ran SparkSQL on TPC-DS benchmark Q19 with spark.sql.codegen=true and got
the exceptions below; has anyone else seen these before?
java.lang.ExceptionInInitializerError
	at org.apache.spark.sql.execution.SparkPlan.newProjection(SparkPlan.scala:92)
	at
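For reference, a sketch of how the flag in question is toggled in Spark 1.1
(the context creation is illustrative):

// spark.sql.codegen enables runtime bytecode generation for expression
// evaluation; it can be set on the SQLContext before running queries.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
sqlContext.setConf("spark.sql.codegen", "true")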