get the same value of the broadcast variable
(e.g. if the variable is shipped to a new node later).
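A minimal sketch of the usual pattern, assuming a SparkContext named sc (data and names are hypothetical): broadcast once on the driver and read the value through .value inside closures, so every node sees the same value.

// Hedged sketch: broadcast on the driver, read via .value inside tasks.
val lookup = Map("a" -> 1, "b" -> 2)                // hypothetical driver-side data
val bcLookup = sc.broadcast(lookup)                 // shipped to each node once
val mapped = sc.parallelize(Seq("a", "b", "a"))
  .map(word => bcLookup.value.getOrElse(word, 0))   // same value everywhere
mapped.collect()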
Best Regards,
Shixiong Zhu
2014-11-12 15:20 GMT+08:00 qiaou qiaou8...@gmail.com:
This works! But can you explain why it should be used like this?
--
qiaou
Sent with Sparrow (http://www.sparrowmailapp.com)
it? Is there
a SparkContext field in the outer class?
Best Regards,
Shixiong Zhu
2014-10-28 0:28 GMT+08:00 octavian.ganea octavian.ga...@inf.ethz.ch:
I am also using Spark 1.1.0 and I ran it on a cluster of nodes (it works if I
run it in local mode!). If I put the accumulator inside the for loop, everything
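For reference, a minimal accumulator sketch under the Spark 1.1-era API, assuming a SparkContext named sc (names are hypothetical): the accumulator is created on the driver, only added to inside tasks, and read back on the driver.

// Hedged sketch: accumulators are write-only inside tasks; read .value on the driver.
val errorCount = sc.accumulator(0)
sc.parallelize(1 to 100).foreach { i =>
  if (i % 10 == 0) errorCount += 1      // adds are merged back to the driver
}
println(errorCount.value)               // read the total on the driver only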
It doesn't support such a query at the moment. I can easily reproduce it. Created a
JIRA here: https://issues.apache.org/jira/browse/SPARK-4296
Best Regards,
Shixiong Zhu
2014-11-07 16:44 GMT+08:00 Tridib Samanta tridib.sama...@live.com:
I am trying to group by on a calculated field. Is it supported?
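For illustration, a hedged sketch of the kind of query involved, where the same calculated expression appears in both SELECT and GROUP BY (table and column names are hypothetical):

// Hedged sketch: grouping by a calculated field such as age + 1.
sqlContext.sql("SELECT age + 1, COUNT(*) FROM people GROUP BY age + 1")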
Will this work even with Kryo serialization?
Right now spark.closure.serializer must be
org.apache.spark.serializer.JavaSerializer, so the serialization of closure
functions does not involve Kryo. Kryo is only used to serialize the data.
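A hedged configuration sketch: Kryo is set as the data serializer via spark.serializer, while closure serialization stays on the default Java serializer (the application name is hypothetical).

import org.apache.spark.{SparkConf, SparkContext}

// Hedged sketch: spark.serializer controls data serialization only.
val conf = new SparkConf()
  .setAppName("kryo-example")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
// spark.closure.serializer is left at its default (JavaSerializer).
val sc = new SparkContext(conf)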
Best Regards,
Shixiong Zhu
2014-11-07 12:27 GMT+08
is not persisted, Spark needs to
load the data again. You can call RDD.cache to persist the RDD in memory.
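A minimal sketch of the suggestion, assuming a SparkContext named sc and a hypothetical input path:

// Hedged sketch: cache the RDD so repeated actions reuse the in-memory data.
val lines = sc.textFile("/tmp/input.txt").cache()   // hypothetical path
val total = lines.count()                           // first action loads and caches
val sample = lines.take(5)                          // later actions read from the cache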
Best Regards,
Shixiong Zhu
2014-11-06 11:35 GMT+08:00 nsareen nsar...@gmail.com:
I noticed a behaviour where, if I'm using
val temp = sc.parallelize(1 to 10
Two limitations we found here:
http://apache-spark-user-list.1001560.n3.nabble.com/OutOfMemory-in-quot-cogroup-quot-td17349.html
Best Regards,
Shixiong Zhu
2014-11-06 2:04 GMT+08:00 Yangcheng Huang yangcheng.hu...@huawei.com:
Hi
One question about the power of spark.shuffle.spill –
Best Regards,
Shixiong Zhu
2014-11-06 7:56 GMT+08:00 ankits ankitso...@gmail.com:
In my Spark job, I have a loop something like this:
bla.foreachRDD(rdd => {
  // init some vars
  rdd.foreachPartition(partition => {
    // init some vars
    partition.foreach(kv => {
      ...
I am seeing
I mean updating the spark conf not only in the driver, but also in the
Spark Workers.
Because the driver configurations cannot be read by the Executors, they
still use the default spark.io.compression.codec to deserialize the tasks.
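A hedged sketch of the point being made: setting the codec only in the driver's SparkConf (as below) is not sufficient on its own; the same value also has to be present in every worker's configuration (the codec class is illustrative).

// Hedged sketch: this driver-side setting must match the workers' configuration.
val conf = new SparkConf()
  .set("spark.io.compression.codec", "org.apache.spark.io.SnappyCompressionCodec")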
Best Regards,
Shixiong Zhu
2014-10-28 16:39 GMT+08:00 buring
Or def getAs[T](i: Int): T
Best Regards,
Shixiong Zhu
2014-10-29 13:16 GMT+08:00 Zhan Zhang zzh...@hortonworks.com:
Can you use row(i).asInstanceOf[...]?
Thanks.
Zhan Zhang
On Oct 28, 2014, at 5:03 PM, Mohammed Guller moham...@glassbeam.com
wrote:
Hi –
The Spark SQL Row class has
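Both suggestions side by side, as a hedged sketch (column positions and types are hypothetical):

// Hedged sketch: two ways to pull typed values out of a Spark SQL Row.
val name = row.getAs[String](0)        // typed accessor by position
val age = row(1).asInstanceOf[Int]     // generic access plus a cast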
and these values cannot fit
into memory. Spilling data to disk helps nothing because cogroup needs to
read all values for a key into memory.
Any suggestions to solve these OOM cases? Thank you.
Best Regards,
Shixiong Zhu
to check if anyone has a similar problem and a
better solution.
Best Regards,
Shixiong Zhu
2014-10-28 0:13 GMT+08:00 Holden Karau hol...@pigscanfly.ca:
On Monday, October 27, 2014, Shixiong Zhu zsxw...@gmail.com wrote:
We encountered some special OOM cases of cogroup when the data in one
Are you using Spark standalone mode? If so, you need to
set spark.io.compression.codec for all workers.
Best Regards,
Shixiong Zhu
2014-10-28 10:37 GMT+08:00 buring qyqb...@gmail.com:
Here is the error log; I summarize it as follows:
INFO [binaryTest---main]: before first
WARN
Best Regards,
Shixiong Zhu
2014-08-14 22:11 GMT+08:00 Christopher Nguyen c...@adatao.com:
Hi Hoai-Thu, the issue of private default constructor is unlikely the
cause here, since Lance was already able to load/deserialize the model
object.
And on that side topic, I wish all serdes libraries
I think in the following case
class Foo { def foo() = Array(1.0) }
val t = new Foo
val m = t.foo
val r1 = sc.parallelize(List(1, 2, 3))
val r2 = r1.map(_ + m(0))
r2.toArray
Spark should not serialize t, but it looks like it will.
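A hedged workaround that is sometimes suggested for this kind of over-capture, not confirmed as the fix for this exact case: copy the needed value into a local val inside a block, so the closure captures only that value rather than the enclosing object.

// Hedged sketch: capture only the array, not the enclosing object.
val r3 = { val localM = m; r1.map(_ + localM(0)) }
r3.collect()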
Best Regards,
Shixiong Zhu
2014-08-14 23:22 GMT+08:00 lancezhange
You can use JavaPairRDD.saveAsHadoopFile/saveAsNewAPIHadoopFile.
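A hedged Scala sketch of the second method on a pair RDD (output path, key/value types, and output format are illustrative):

import org.apache.hadoop.io.Text
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat

// Hedged sketch: write a pair RDD with the new Hadoop output API.
val pairs = sc.parallelize(Seq(("a", "1"), ("b", "2")))
  .map { case (k, v) => (new Text(k), new Text(v)) }
pairs.saveAsNewAPIHadoopFile(
  "/tmp/out",                            // illustrative path
  classOf[Text], classOf[Text],
  classOf[TextOutputFormat[Text, Text]])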
Best Regards,
Shixiong Zhu
2014-06-20 14:22 GMT+08:00 abhiguruvayya sharath.abhis...@gmail.com:
Any inputs on this will be helpful.
A solution is using rdd.partitionBy(new HashPartitioner(1)) to
make sure there is only one partition. But that's not efficient for big
input.
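A hedged sketch of the suggestion, with a hypothetical pair RDD:

import org.apache.spark.HashPartitioner

// Hedged sketch: force all data into a single partition.
val pairs = sc.parallelize(Seq((1, "a"), (2, "b"), (3, "c")))
val single = pairs.partitionBy(new HashPartitioner(1))
println(single.partitions.length)        // 1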
Best Regards,
Shixiong Zhu
2014-04-02 11:10 GMT+08:00 Thierry Herrmann thierry.herrm...@gmail.com:
I'm new to Spark, but isn't this a pure Scala question?
to create an RDD from a collection.
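For completeness, a minimal sketch of creating an RDD from a local collection, assuming a SparkContext named sc:

// Hedged sketch: parallelize a local Scala collection into an RDD.
val data = Seq(1, 2, 3, 4, 5)
val rdd = sc.parallelize(data)
rdd.count()                              // 5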
Best Regards,
Shixiong Zhu
2014-03-19 20:52 GMT+08:00 Yana Kadiyska yana.kadiy...@gmail.com:
Not sure what you mean by not getting information on how to join. If
you mean that you can't see the result, I believe you need to collect
the result of the join, e.g. call .take(5) on it.
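A hedged join sketch, assuming two pair RDDs keyed by the same id (data is illustrative):

// Hedged sketch: join two pair RDDs, then bring a few rows back to the driver.
val left = sc.parallelize(Seq((1, "alice"), (2, "bob")))
val right = sc.parallelize(Seq((1, 30), (2, 25)))
val joined = left.join(right)            // RDD[(Int, (String, Int))]
joined.take(5).foreach(println)          // nothing comes back until an action runs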
Best Regards,
Shixiong Zhu
2014-03-09 13:30 GMT+08:00 Kane kane.ist...@gmail.com:
When I try to open a sequence file:
val t2 = sc.sequenceFile("/user/hdfs/e1Mseq", classOf[String],
classOf[String])
t2.groupByKey().take(5)
I get:
org.apache.spark.SparkException: Job aborted: Task 25.0:0
Regards,
Shixiong Zhu
2014-03-04 4:23 GMT+08:00 Oleksandr Olgashko alexandrolg...@gmail.com:
Hello. How should I best check two Vectors for equality?
val a = new Vector(Array(1))
val b = new Vector(Array(1))
println(a == b)
// false
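A likely explanation, shown as a hedged sketch that compares plain arrays rather than the Vector class itself: == on arrays is reference equality, so identical contents still compare as false, while an element-wise check succeeds.

// Hedged sketch: arrays compare by reference with ==, by contents with sameElements.
val x = Array(1.0)
val y = Array(1.0)
println(x == y)                          // false: different array instances
println(x.sameElements(y))               // true: element-wise equality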