On my local (Windows) dev environment, I have been trying to get Spark
Streaming running to test my real-time(ish) jobs. I have set the checkpoint
directory as /tmp/spark and have installed the latest Cygwin. I keep getting
the following error:
org.apache.hadoop.util.Shell$ExitCodeException: chmod:
Hi everyone
It turns out that I had Chef installed, and its chmod took precedence over
Cygwin's chmod in the PATH. I fixed the environment variable and now it's
working fine.
On 1 September 2014 11:48, Aniket Bhatnagar aniket.bhatna...@gmail.com
wrote:
On my local (windows) dev
Hi,
An RDD replicated by an application is owned by only that application; no
other application can share it. So what is the motive behind providing the
RDD replication feature, and what operations can be performed on the
replicated RDD?
Thank you!!!
-karthik
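For context, RDD replication normally refers to the storage level chosen when persisting; a minimal Scala sketch of what that looks like (the context setup and the numbers are illustrative, not from the mail):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

val sc = new SparkContext(new SparkConf().setAppName("replication-sketch").setMaster("local[*]"))

// Persist with 2x replication: each cached partition is stored on two executors.
// The replicas live inside this application's block manager, so only this
// application can read them; replication buys fault tolerance, not sharing.
val data = sc.parallelize(1 to 1000000)
data.persist(StorageLevel.MEMORY_ONLY_2)

// Any normal RDD operation works on a replicated RDD; replication changes
// where the cached blocks live, not the API.
val total = data.map(_ * 2L).reduce(_ + _)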
Hi, I'm developing an application with Spark.
My Java application tries to create a SparkContext like this:
Creating spark context
public SparkContext createSparkContext(){
String execUri = System.getenv(SPARK_EXECUTOR_URI);
String[] jars = SparkILoop.getAddedJars();
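For comparison, a minimal sketch of the usual way a standalone application creates its own context, rather than copying the REPL's createSparkContext; the app name and master URL below are placeholders:

import org.apache.spark.{SparkConf, SparkContext}

// Minimal sketch: create the context directly from a SparkConf
// (app name and master URL are illustrative, not from the mail).
val conf = new SparkConf()
  .setAppName("MyApp")
  .setMaster("local[*]")
val sc = new SparkContext(conf)

// ... run jobs ...

sc.stop()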
Hi,all:
Can value in spark-defaults.conf support system variables?
Such as mess = ${user.home}/${user.name}.
Best Regards
Zhanfeng Huo
Hi,
Has anyone else also experienced
https://issues.apache.org/jira/browse/SPARK-2604?
It is an edge-case misconfiguration scenario, where the executor memory
requested is the same as the maximum memory allowed by YARN. In such a
situation, the application hangs, and the reason is not logged
Hi,
Currently the number of shuffle partitions is a config-driven parameter
(SHUFFLE_PARTITIONS). This means that anyone who is running a spark-sql query
should first analyze what value of SHUFFLE_PARTITIONS would give the best
performance for the query.
Shouldn't there be a logic
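For reference, the setting in question is spark.sql.shuffle.partitions (default 200); a minimal sketch of tuning it by hand, assuming an existing SparkContext `sc` and Spark 1.1-era Spark SQL:

val sqlContext = new org.apache.spark.sql.SQLContext(sc)

// Default is 200; today this has to be chosen manually per workload.
sqlContext.setConf("spark.sql.shuffle.partitions", "400")

// The same setting can also be changed from SQL:
sqlContext.sql("SET spark.sql.shuffle.partitions=400")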
Hi all
I am struggling to implement a use case wherein I need to trigger an action
in case no data has been received for X amount of time. I haven't been able
to figure out an easy way to do this. No state/foreach methods get called
when no data has arrived. I thought of generating a 'tick'
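One possible sketch of the "tick" idea, assuming foreachRDD still fires for empty batches in your setup; the stream name, threshold, and action below are illustrative, not from the thread:

// Assumes an existing DStream[String] named `stream`.
val inactivityThresholdMs = 60 * 1000L
@volatile var lastDataSeen = System.currentTimeMillis()

stream.foreachRDD { rdd =>
  // This function runs on the driver, so updating a driver-side var is fine.
  if (rdd.take(1).nonEmpty) {
    lastDataSeen = System.currentTimeMillis()
  } else if (System.currentTimeMillis() - lastDataSeen > inactivityThresholdMs) {
    // The "trigger an action" part goes here.
    println(s"No data received for more than $inactivityThresholdMs ms")
  }
}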
Hello Igor,
Although Decimal is supported, Hive 0.12 does not support user-definable
precision and scale (that was introduced in Hive 0.13).
Thanks,
Yin
On Sat, Aug 30, 2014 at 1:50 AM, Zitser, Igor igor.zit...@citi.com wrote:
Hi All,
New to spark and using Spark 1.0.2 and hive 0.12.
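To make the distinction concrete, a hedged sketch via HiveContext (table names are made up; on Spark 1.0.x the HiveQL entry point is hql, assuming an existing SparkContext `sc`):

import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)

// Works against Hive 0.12: plain DECIMAL, no user-defined precision/scale.
hiveContext.hql("CREATE TABLE amounts (amount DECIMAL)")

// Requires Hive 0.13+ (not available here): DECIMAL(precision, scale).
// hiveContext.hql("CREATE TABLE amounts2 (amount DECIMAL(18, 2))")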
If you join them, it'll give you the intersection and a list of the labels
where the value was found.
where the value was found.
a.join(b).collect
Array[(String, (String, String))] = Array((4,(a,b)), (3,(a,b)))
best,
matt
On 08/31/2014 09:23 PM, Liu, Raymond wrote:
You could use cogroup to combine RDDs in one RDD for
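For anyone reading along, hypothetical inputs that would produce the join result shown above might look like this (a and b here are made up, not the poster's data):

import org.apache.spark.SparkContext._  // pair RDD functions (join) in Spark 1.x

// Only keys present in both RDDs survive the join.
val a = sc.parallelize(Seq("3" -> "a", "4" -> "a", "5" -> "a"))
val b = sc.parallelize(Seq("3" -> "b", "4" -> "b", "6" -> "b"))

a.join(b).collect()
// Array[(String, (String, String))] = Array((4,(a,b)), (3,(a,b)))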
and in python,
map = {'a': 1, 'b': 2, 'c': 3}
rdd = sc.parallelize(map.items())
rdd.collect()
[('a', 1), ('c', 3), ('b', 2)]
best,
matt
On 08/28/2014 07:01 PM, Sean Owen wrote:
val map = Map("foo" -> 1, "bar" -> 2, "baz" -> 3)
val rdd = sc.parallelize(map.toSeq)
rdd is an RDD[(String,Int)] and
Hi,
I have installed Spark 1.0.2 and Shark 0.9.2 on Hadoop 2.4.1 (by compiling from
source).
spark: 1.0.2
shark: 0.9.2
hadoop: 2.4.1
java: java version “1.7.0_67”
protobuf: 2.5.0
I have tried the smoke test in Shark but got a
“java.util.NoSuchElementException” error; can you please advise
I don't believe that Shark works with Spark 1.0. Have you considered
trying Spark SQL?
On Mon, Sep 1, 2014 at 8:21 AM, arthur.hk.c...@gmail.com
arthur.hk.c...@gmail.com wrote:
Hi,
I have installed Spark 1.0.2 and Shark 0.9.2 on Hadoop 2.4.1 (by compiling
from source).
spark: 1.0.2
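A rough sketch of what the Spark SQL route looks like on Spark 1.0.x, assuming an existing SparkContext `sc` and an existing Hive table named src (both assumptions, not from the thread):

import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)
// hql is the HiveQL entry point in Spark 1.0.x (deprecated in favour of sql later).
val rows = hiveContext.hql("SELECT key, value FROM src LIMIT 10").collect()
rows.foreach(println)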
We tried to connect the old Simba Shark ODBC driver to the Thrift JDBC Server
with Spark 1.1 RC2 and it works fine.
Best
Paolo
Paolo Platter
Agile Lab CTO
From: Michael Armbrust mich...@databricks.com
Sent: Monday, 1 September 2014 19:43
To:
I guess it is not a question of Spark but a question of how you set up your
dataset. Think about what you want to model and how you can shape the data in
such a way that Spark can use it.
Akima is a technique I know:
a_{t+1} = C1 * a_{t} + C2 * a_{t-1} + ... + C6 * a_{t-5}
Spark can find the
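As a hedged illustration of "shaping the data so Spark can use it", one could build lagged feature vectors and let MLlib's linear regression estimate the coefficients C1..C6 (the series, lag depth, and iteration count below are made up; assumes an existing SparkContext `sc`):

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.{LabeledPoint, LinearRegressionWithSGD}

// Turn a time series into (label = a_{t+1}, features = the six previous values).
val series: Array[Double] = (1 to 100).map(_.toDouble).toArray
val examples = series.sliding(7).map { window =>
  LabeledPoint(window.last, Vectors.dense(window.init.reverse))
}.toSeq

val data = sc.parallelize(examples)
val model = LinearRegressionWithSGD.train(data, 100)  // 100 iterations
println(model.weights)  // estimated C1..C6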
Hi,
I have noticed that the GroupByTest example in
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/GroupByTest.scala
has been changed to be run using spark-submit.
Previously, I set local as the first command line parameter, and this enabled
me to
No, not currently.
2014-09-01 2:53 GMT-07:00 Zhanfeng Huo huozhanf...@gmail.com:
Hi,all:
Can value in spark-defaults.conf support system variables?
Such as mess = ${user.home}/${user.name}.
Best Regards
--
Zhanfeng Huo
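As a possible workaround (not a spark-defaults.conf feature), the substitution can be done in the application before building the conf; the key name "mess" comes from the question and "spark.mess" is made up for illustration:

import org.apache.spark.SparkConf

// Resolve the system properties yourself, then set the value programmatically.
val mess = s"${sys.props("user.home")}/${sys.props("user.name")}"
val conf = new SparkConf().set("spark.mess", mess)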
http://www.adamcrume.com/blog/archive/2014/02/19/fixing-sparks-rdd-zip
Please check this URL.
I got the same problem in v1.0.1.
In some cases, an RDD loses several elements after zip, so the total count of
the ZippedRDD is less than