Hi Tathagata/Cody,
I am facing a challenge in production with DAG behaviour during
checkpointing in Spark Streaming:
Step 1: Read data from Kafka every 15 min - call this KafkaStreamRDD, ~100
GB of data.
Step 2: Repartition KafkaStreamRDD from 5 to 100 partitions to parallelise
processing -
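The arithmetic behind step 2 can be sketched in plain Python (illustrative only; the 100 GB batch size and partition counts are the figures from the message above):

```python
def gb_per_task(batch_gb: float, num_partitions: int) -> float:
    """Approximate input each task handles, assuming the batch is spread
    evenly across partitions (Spark runs one task per partition)."""
    return batch_gb / num_partitions

# Before repartitioning: 5 Kafka partitions -> ~20 GB per task.
# After repartition(100): ~1 GB per task, at the cost of a full shuffle
# of the batch across the cluster.
```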
Hi All,
I have enabled replication for my RDDs.
The Storage tab of the Spark UI shows the replication level, 2x or 1x,
but the names shown are MappedRDD, ShuffledRDD, and so on, so I am not able
to tell which of my RDDs is replicated 2x and which 1x.
Please help.
Regards,
Gaurab
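One way to make the Storage tab readable is to name each RDD before persisting it, e.g. `rdd.setName("clicks").persist(StorageLevel.MEMORY_ONLY_2)` in the RDD API, so the UI shows "clicks" instead of MappedRDD/ShuffledRDD. A plain-Python stand-in below makes the level-to-label mapping explicit (assumption: storage levels whose name ends in `_2` keep two replicas):

```python
def replication_label(rdd_name: str, storage_level: str) -> str:
    """Mimic how a named, persisted RDD would read in the Storage tab:
    levels ending in '_2' (e.g. MEMORY_ONLY_2) mean two replicas."""
    replicas = 2 if storage_level.endswith("_2") else 1
    return f"{rdd_name}: {replicas}x"
```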
On Fri, Oct 23, 2015 at 8:36 AM, gaurav sharma <sharmagaura...@gmail.com> wrote:
Hi,
I created 2 workers on the same machine, each with 4 cores and 6 GB RAM.
I submitted the first job, and it allocated 2 cores on each of the worker
processes and utilized the full 4 GB RAM for each executor process.
When I submit my second job, it always stays in the WAITING state.
Cheers!!
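A likely cause: the first application grabbed all 8 cores, so the second has nothing to schedule on. In standalone mode, capping per-application resources lets both run side by side; a sketch (the master URL and job file are placeholders):

```shell
# Cap each app at half the cluster (2 workers x 4 cores = 8 cores total),
# and leave memory headroom so two executors fit in each worker's 6 GB.
spark-submit \
  --master spark://master-host:7077 \
  --total-executor-cores 4 \
  --executor-memory 2g \
  my_streaming_job.py
```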
On Tue, Oct 20, ...@databricks.com> wrote:
>
>> Could you periodically (say every 10 mins) run System.gc() on the driver.
>> The cleaning up of shuffles is tied to garbage collection.
>>
>>
>> On Fri, Aug 21, 2015 at 2:59 AM, gaurav sharma <sharmagaura...@gmail.com>
>> wrote:
Hi All,
I have a 24x7 running streaming process, which runs on 2-hour windowed data.
The issue I am facing is that my worker machines are running OUT OF DISK
space; I checked, and the SHUFFLE FILES are not getting cleaned up.
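The periodic System.gc() suggestion can be sketched like this (plain Python, with gc.collect() standing in for the JVM's System.gc(); the 10-minute interval is the one suggested above):

```python
import gc
import threading

def schedule_periodic_gc(interval_s: float = 600.0) -> threading.Timer:
    """Force a collection now, then reschedule. Spark's ContextCleaner
    removes old shuffle files only after the driver-side objects that
    reference them are garbage-collected, so forcing a periodic GC
    bounds disk growth on the workers."""
    gc.collect()
    timer = threading.Timer(interval_s, schedule_periodic_gc, args=(interval_s,))
    timer.daemon = True  # don't keep the driver alive just for this
    timer.start()
    return timer
```

Calling `timer.cancel()` on the returned timer stops the loop at shutdown.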
Ideally the 2 messages read from Kafka must differ on some parameter at
least, or else they are logically the same.
As a solution to your problem, if the message content is the same, you could
create a new field, UUID, which might play the role of the partition key
while inserting the 2 messages into Cassandra.
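The suggested UUID field can be sketched in plain Python (the field name `uuid` is an assumption, not from the thread):

```python
import uuid

def add_row_uuid(message: dict) -> dict:
    """Return a copy of the message with a fresh UUID attached, so two
    logically identical messages still get distinct keys when written
    to Cassandra."""
    return {**message, "uuid": str(uuid.uuid4())}
```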
I have run into similar exceptions:
ERROR DirectKafkaInputDStream: ArrayBuffer(java.net.SocketTimeoutException,
org.apache.spark.SparkException: Couldn't find leader offsets for
Set([AdServe,1]))
The issue happened on the Kafka side, where my broker offsets went out of
sync or were not returned.
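One common mitigation for transient "couldn't find leader offsets" timeouts is to retry the metadata fetch with backoff before failing the batch. A generic sketch (`fetch_offsets` is a placeholder for whatever call queries the brokers; OSError stands in for java.net.SocketTimeoutException):

```python
import time

def fetch_with_retry(fetch_offsets, attempts: int = 5, base_backoff_s: float = 1.0):
    """Call fetch_offsets(), retrying on timeouts with exponential
    backoff; re-raise after the final attempt so a real broker problem
    still surfaces instead of being swallowed."""
    for attempt in range(attempts):
        try:
            return fetch_offsets()
        except OSError:
            if attempt == attempts - 1:
                raise
            time.sleep(base_backoff_s * (2 ** attempt))
```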
Hi guys,
I too am facing a similar challenge with the direct stream.
I have 3 Kafka partitions and am running Spark on 18 cores, with the
parallelism level set to 48.
I am running a simple map-reduce job on the incoming stream.
Though the reduce stage takes milliseconds to seconds for around 15 million
packets,
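Worth noting for the setup above: with the direct stream, the parallelism setting of 48 does not help the first stage, because each batch RDD has exactly one Spark partition per Kafka partition. Only 3 of the 18 cores get map tasks unless the stream is repartitioned (which shuffles). Illustrative arithmetic:

```python
def first_stage_tasks(kafka_partitions: int, default_parallelism: int) -> int:
    """The Kafka direct stream creates one RDD partition per Kafka
    partition, so the first stage's task count ignores
    spark.default.parallelism entirely."""
    return kafka_partitions
```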
permission to write to /opt/ on that machine, as it's the one machine always
throwing up.
Thanks
Best Regards
On Sat, Jul 11, 2015 at 11:18 PM, gaurav sharma sharmagaura...@gmail.com
wrote:
Hi All,
I am facing this issue in my production environment.
My worker dies by throwing this exception.
But I see that space is available on all the partitions of my disk.
I did NOT see any abrupt increase in disk IO, which might have choked the
executor writing to the stderr file.
But still
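When the UI says space is available, it is still worth checking the specific mounts the worker actually writes to (work dir, stderr, shuffle spill), since they can live on different partitions. A quick standard-library sketch (`/opt` is the path mentioned elsewhere in the thread):

```python
import shutil

def free_gb(path: str) -> float:
    """Free space, in GB, on the filesystem that `path` lives on."""
    return shutil.disk_usage(path).free / 1e9

# e.g. free_gb("/opt") for the worker dir, free_gb("/tmp") for shuffle spill
```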
When you submit a job, Spark breaks it down into stages, as per the DAG.
The stages run transformations or actions on the RDDs. Each RDD consists of
N partitions, and the tasks created by Spark to execute a stage equal the
number of partitions. Every task is executed on one of the cores in use. It
will get shipped once to all nodes in the cluster and can be referenced by
them.
HTH.
-Todd
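The stage/partition/task relationship described above comes down to simple arithmetic (illustrative only, not a Spark API):

```python
def tasks_for_stage(num_partitions: int) -> int:
    """Spark launches one task per partition of the stage's RDD."""
    return num_partitions

def task_waves(num_partitions: int, total_cores: int) -> int:
    """Number of scheduling 'waves' needed to run a stage, assuming each
    task occupies one core for roughly equal time."""
    return -(-num_partitions // total_cores)  # ceiling division
```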
On Thu, Jun 11, 2015 at 8:23 AM, gaurav sharma sharmagaura...@gmail.com
wrote:
Hi,
I am using Kafka Spark cluster for real time aggregation analytics use case
in production.
*Cluster details*
*6 nodes*, each node running 1 Spark and 1 Kafka process.
Node 1 - 1 Master, 1 Worker, 1 Driver, 1 Kafka process
Node 2,3,4,5,6 - 1 Worker process each