Hi,
I'm trying to rename an ORC table (whether in Hive or Spark makes no
difference). After the rename, all the content of the table becomes
invisible in Spark, while it is still available in Hive. The problem can
always be reproduced with very simple steps:
spark shell
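For reference, a minimal repro along those lines in spark-shell might look like this (the table names, columns, and sample row are illustrative assumptions, not from the original report):

```scala
// Hypothetical repro sketch for the ORC rename issue (Spark 1.x with Hive support).
sqlContext.sql("CREATE TABLE t1 (id INT, name STRING) STORED AS ORC")
sqlContext.sql("INSERT INTO TABLE t1 SELECT 1, 'a'")
sqlContext.sql("ALTER TABLE t1 RENAME TO t2")
// In Hive:  SELECT * FROM t2  still returns the row.
// In Spark: the same query comes back empty after the rename.
sqlContext.sql("SELECT * FROM t2").show()
```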
I have 20 executors with 6 cores each. There are 5 stages in total, and the
job fails on the 5th. Each stage has 6474 tasks. The 5th stage is a count
operation that also performs an aggregateByKey as part of its lazy
evaluation.
I am setting:
spark.driver.memory=10G, spark.yarn.am.memory=2G and
spark.driver.maxResultSize=9G
The driver maintains the complete metadata of the application (scheduling
of executors and the messaging that controls execution).
This code seems to be failing in that code path only. That said, there is
JVM overhead that grows with the number of executors, stages, and tasks in
your app.
How big is your file, and can you also share the code snippet?
On Saturday, May 7, 2016, Johnny W. wrote:
hi spark-user,
I am using Spark 1.6.0. When I call sqlCtx.read.parquet to create a
dataframe from a parquet data source with a single parquet file, it yields
a stage with lots of small tasks. It seems the number of tasks depends on
how many executors I have instead of how many parquet
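A workaround sketch for collapsing those many small tasks (Spark 1.6; sqlCtx and the path below are assumed names):

```scala
// Read a small parquet source and merge its many small partitions.
val df = sqlCtx.read.parquet("/path/to/data.parquet")
println(df.rdd.partitions.length)   // often tracks parallelism, not file count
val merged = df.coalesce(1)         // coalesce merges partitions without a shuffle
println(merged.rdd.partitions.length)
```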
Hi Simon,
Thanks. I did actually have "SPARK_WORKER_CORES=8" in spark-env.sh; it's
commented as 'to set the number of cores to use on this machine'.
Not sure how this would interplay with SPARK_EXECUTOR_INSTANCES and
SPARK_EXECUTOR_CORES, but I removed it and still see no scaleup with
increasing
Right, but these logs are from the Spark driver, and the Spark driver seems to use Akka.
ERROR [sparkDriver-akka.actor.default-dispatcher-17]
akka.actor.ActorSystemImpl: Uncaught fatal error from thread
[sparkDriver-akka.remote.default-remote-dispatcher-5] shutting down
ActorSystem [sparkDriver]
I saw:
at akka.serialization.JavaSerializer.toBinary(Serializer.scala:129)
It was Akka that uses the JavaSerializer.
Cheers
On Sat, May 7, 2016 at 11:13 AM, Nirav Patel wrote:
Hi,
I thought I was using the Kryo serializer for shuffle. I could verify it
from the Spark UI's Environment tab:
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator com.myapp.spark.jobs.conf.SparkSerializerRegistrator
But then I see the following error in the driver logs; it
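As a side note, spark.serializer only covers data serialization (e.g. shuffle); in Spark 1.x the driver's Akka-based RPC still uses Java serialization for control messages, which is why JavaSerializer can appear in driver stack traces even with Kryo configured. A registrator like the one configured above is typically shaped like this (the registered class MyRecord is an assumption standing in for the app's own classes):

```scala
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.serializer.KryoRegistrator

// Sketch of a custom Kryo registrator matching the spark.kryo.registrator setting.
class SparkSerializerRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    kryo.register(classOf[MyRecord])
  }
}
```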
Hi Ted
Following is my use case.
I have a prediction algorithm where I need to update some records to
predict the target.
For example, I have the equation Y = mX + c.
I need to change the value of Xi for some records and calculate sum(Yi); if
the predicted value is not close to the target value, then repeat the
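That iterate-until-close loop can be sketched in plain Scala (the even-spread adjustment, the tolerance, and all names below are illustrative assumptions; m must be non-zero):

```scala
// Shift the Xi values until sum(Yi) for Y = mX + c lands within tolerance of the target.
def fitToTarget(xs: Array[Double], m: Double, c: Double,
                target: Double, tolerance: Double): Array[Double] = {
  require(m != 0.0, "slope must be non-zero")
  var current = xs
  var sumY = current.map(x => m * x + c).sum
  while (math.abs(sumY - target) > tolerance) {
    // Spread the remaining gap evenly across all records.
    val step = (target - sumY) / (m * current.length)
    current = current.map(_ + step)
    sumY = current.map(x => m * x + c).sum
  }
  current
}
```

With a purely linear model a single evenly-spread step closes the gap; a real predictor would adjust only the chosen records and iterate.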
Hi,
What is the easiest way of finding max(price) in the code below?
object CEP_AVG {
  def main(args: Array[String]) {
    // Create a local StreamingContext with two working threads and a batch
    // interval of 10 seconds.
    val sparkConf = new SparkConf().
      setAppName("CEP_AVG").
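As for max(price): the simplest form is the built-in max, or a reduce with math.max (the variable names below are assumptions):

```scala
// Plain collection:
val prices = Seq(12.5, 48.0, 7.25)
val maxPrice = prices.max  // 48.0
// On an RDD:     pricesRdd.max()  or  pricesRdd.reduce(math.max)
// On a DStream:  priceStream.reduce(math.max) yields the max per batch interval
```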
Thanks Cody. It turns out that there was an even simpler explanation (the
flaw you pointed out was accurate too). I had mutable.Map instances being
passed where KafkaUtils wants immutable ones.
On Fri, May 6, 2016 at 8:32 AM, Cody Koeninger wrote:
> Look carefully at the
Check how much free memory you have on your host:
/usr/bin/free
As heuristic values, start with these in
export SPARK_EXECUTOR_CORES=4 ## Number of cores for the workers (Default: 1)
export SPARK_EXECUTOR_MEMORY=8G ## Memory per Worker (e.g. 1000M, 2G) (Default: 1G)
export
Hello,
Is there a way to instruct treeReduce() to reduce RDD partitions on the
same node locally?
In my case, I'm using treeReduce() to reduce map results in parallel. My
reduce function is just arithmetically adding map results (i.e. no notion
of aggregation by key). As far as I understand, a
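One way to keep most of the summation node-local is to pre-reduce each partition with mapPartitions before the cross-node tree step (results is an assumed RDD[Double]):

```scala
// Sum each partition locally first, then tree-reduce only the per-partition sums.
val partial = results.mapPartitions(it => Iterator(it.sum))
val total   = partial.treeReduce(_ + _, depth = 2)
```

Note that reduce and treeReduce already combine values within each partition before anything crosses the network; the pre-sum just shrinks what the tree has to move.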
Hi,
I'm running Spark 1.6.1 on a single machine, initially a small one (8 cores,
16GB RAM), using "--master local[*]" with spark-submit, and I'm trying to see
scaling with increasing cores, unsuccessfully.
Initially I'm setting SPARK_EXECUTOR_INSTANCES=1, and increasing cores for
each executor.
Hi Divya,
I haven't actually used the package yet, but maybe you should check out the
Gitter room, where the creator is quite active. You can find it at
https://gitter.im/FRosner/drunken-data-quality .
There you should be able to get the information you need.
Best,
Rick
On 6 May 2016 12:34,
Hi,
Thank you for all those answers.
Below is the code I am trying out:
val records = sparkSession.read.format("csv").stream("/tmp/input")
val re = records.write.format("parquet").trigger(ProcessingTime(100.seconds)).
option("checkpointLocation", "/tmp/checkpoint")
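For comparison, in the structured streaming API as it stabilized in Spark 2.0, the same pipeline is written with readStream/writeStream and must be explicitly started (the schema and output path below are assumptions; file sources require a user-supplied schema):

```scala
import org.apache.spark.sql.streaming.ProcessingTime
import org.apache.spark.sql.types._

val schema = new StructType().add("value", StringType)  // assumed schema
val records = sparkSession.readStream.schema(schema).format("csv").load("/tmp/input")
val query = records.writeStream
  .format("parquet")
  .option("checkpointLocation", "/tmp/checkpoint")
  .option("path", "/tmp/output")  // output path is an assumption
  .trigger(ProcessingTime("100 seconds"))
  .start()
```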