From: amit assudani <aassud...@impetus.com>
Date: Thursday, June 16, 2016 at 6:11 PM
To: "user@spark.apache.org" <user@spark.apache.org>
Subject: Update Batch DF with Streaming
Hi All,
Can I update batch data frames loaded in memory with streaming data?
For example:
I have an employee DF registered as a temporary table; it has EmployeeID, Name,
Address, etc. fields. Assume it is very big and takes time to load into
memory.
I've two types of employee events (both
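What I have in mind is roughly this (a minimal sketch assuming Spark 1.6-era
APIs; the updates stream, the schema, and all names are placeholders of mine,
not tested code):

    // Minimal sketch, assuming Spark 1.6-era APIs. employeeDF, updates, and
    // the "employees" table name are illustrative placeholders.
    import org.apache.spark.sql.DataFrame;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SQLContext;
    import org.apache.spark.streaming.api.java.JavaDStream;

    public class BatchDfUpdate {
      public static void wireUp(final SQLContext sqlContext,
                                final DataFrame employeeDF,
                                JavaDStream<Row> updates) {
        // Cache the big batch DF once and expose it as a temp table.
        employeeDF.cache().registerTempTable("employees");

        // On each batch, union the new rows in and re-register the table;
        // re-registering replaces the previous definition of "employees".
        updates.foreachRDD(rdd -> {
          if (!rdd.isEmpty()) {
            DataFrame delta =
                sqlContext.createDataFrame(rdd, employeeDF.schema());
            sqlContext.table("employees").unionAll(delta)
                .cache().registerTempTable("employees");
          }
        });
      }
    }

One caveat with this pattern: the union grows without bound across batches, so
it would likely need periodic compaction or an updatable external store.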
Date: Monday, June 29, 2015 at 5:24 PM
To: amit assudani <aassud...@impetus.com>
Cc: Cody Koeninger <c...@koeninger.org>,
"user@spark.apache.org" <user@spark.apache.org>
Subject: Re: How to recover in case user errors in streaming
Date: Saturday, June 27, 2015 at 5:14 AM
To: amit assudani <aassud...@impetus.com>
Cc: Cody Koeninger <c...@koeninger.org>,
"user@spark.apache.org" <user@spark.apache.org>
Subject: Re: How to recover in case user errors in streaming
Also, how do you suggest catching exceptions when using a connector API like
saveAsNewAPIHadoopFiles?
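One workaround I can think of (sketch only): do the save per batch inside
foreachRDD instead of using the DStream-level saveAsNewAPIHadoopFiles, so the
write can be wrapped in a driver-side try/catch. The output format, key/value
types, and directory below are my assumptions:

    // Sketch: per-batch save inside foreachRDD so failures can be caught on
    // the driver. Output format, key/value types, and paths are assumptions.
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
    import org.apache.spark.streaming.api.java.JavaPairDStream;

    public class GuardedSave {
      public static void attach(JavaPairDStream<Text, Text> stream,
                                final String dir) {
        stream.foreachRDD((rdd, time) -> {
          try {
            // RDD-level equivalent of what saveAsNewAPIHadoopFiles schedules.
            rdd.saveAsNewAPIHadoopFile(dir + "/" + time.milliseconds(),
                Text.class, Text.class, TextOutputFormat.class);
          } catch (Exception e) {
            // Driver-side handling: log and decide whether to retry or skip.
            System.err.println("Batch " + time + " failed to save: " + e);
          }
        });
      }
    }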
From: amit assudani <aassud...@impetus.com>
Date: Monday, June 29, 2015 at 9:55 AM
To: Tathagata Das <t...@databricks.com>
Cc: Cody Koeninger <c...@koeninger.org>
Hi All,
While using checkpoints (on HDFS), if connectivity to the Hadoop cluster is
lost for a while and then restored, what happens to the running streaming job?
Is it always assumed that the connection to the checkpoint FS (in this case
HDFS) would ALWAYS be HA and would never fail for
in hand.
Regards,
Amit
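For reference, this is the standard driver-recovery wiring I mean (a minimal
sketch; the checkpoint path, app name, and batch interval are made-up values),
so the question is what happens when the HDFS behind checkpointDir goes away
mid-run:

    // Minimal sketch of the standard checkpoint/recovery wiring. The HDFS
    // path, app name, and batch interval are assumed values.
    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;

    public class CheckpointedApp {
      public static void main(String[] args) throws Exception {
        final String checkpointDir = "hdfs:///tmp/app-checkpoint"; // assumed

        // Recreate the context from the checkpoint if one exists,
        // otherwise build it fresh via the factory function.
        JavaStreamingContext jssc = JavaStreamingContext.getOrCreate(
            checkpointDir,
            () -> {
              SparkConf conf = new SparkConf().setAppName("CheckpointedApp");
              JavaStreamingContext ssc =
                  new JavaStreamingContext(conf, Durations.seconds(10));
              ssc.checkpoint(checkpointDir); // metadata checkpoints to HDFS
              // ... define DStreams here ...
              return ssc;
            });

        jssc.start();
        jssc.awaitTermination();
      }
    }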
From: Cody Koeninger <c...@koeninger.org>
Date: Friday, June 26, 2015 at 11:32 AM
To: amit assudani <aassud...@impetus.com>
Cc: "user@spark.apache.org" <user@spark.apache.org>
Problem: how do we recover from user errors (connectivity issues / storage
service down / etc.)?
Environment: Spark Streaming using Kafka Direct Streams
Code Snippet:
HashSet<String> topicsSet = new HashSet<String>(Arrays.asList(kafkaTopic1));
HashMap<String, String> kafkaParams = new HashMap<String, String>();
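The setup presumably continues along these lines (a reconstructed sketch, not
the original code; the broker address and the jssc streaming context are
placeholders of mine):

    // Sketch of a typical continuation (assumed). Requires:
    // import kafka.serializer.StringDecoder;
    // import org.apache.spark.streaming.api.java.JavaPairInputDStream;
    // import org.apache.spark.streaming.kafka.KafkaUtils;
    kafkaParams.put("metadata.broker.list", "broker1:9092"); // assumed broker

    JavaPairInputDStream<String, String> messages =
        KafkaUtils.createDirectStream(
            jssc,                                  // existing streaming context
            String.class, String.class,            // key / value types
            StringDecoder.class, StringDecoder.class,
            kafkaParams,
            topicsSet);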
:16 AM
To: amit assudani <aassud...@impetus.com>
Cc: "user@spark.apache.org" <user@spark.apache.org>, Tathagata Das
<t...@databricks.com>
Subject: Re: How to recover in case user errors in streaming
Also, what I understand is that max failures doesn't stop the entire stream; it
fails the job created for the specific batch, but the subsequent batches still
proceed, isn't that right? And the question still remains: how to keep track of
those failed batches?
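The best I have come up with so far is a driver-side try/catch per batch that
records the batch time of any failed output for later replay (sketch only; the
queue, the process() body, and all names are my own placeholders):

    // Sketch: record the Time of any batch whose output op failed, so it
    // can be replayed later. Queue and process() are illustrative.
    import java.util.Queue;
    import java.util.concurrent.ConcurrentLinkedQueue;
    import org.apache.spark.streaming.Time;
    import org.apache.spark.streaming.api.java.JavaDStream;

    public class FailedBatchTracker {
      // Driver-side record of batches whose output op failed.
      private static final Queue<Time> failed = new ConcurrentLinkedQueue<>();

      public static void attach(JavaDStream<String> lines) {
        lines.foreachRDD((rdd, time) -> {
          try {
            rdd.foreach(record -> process(record)); // action may throw
          } catch (Exception e) {
            failed.add(time); // this batch failed; later batches still run
          }
        });
      }

      private static void process(String record) { /* user logic, may fail */ }
    }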
From: amit assudani <aassud...@impetus.com>
Also, TaskContext.get() returns null when used in the foreach function below (I
get it when I use it in map, but the whole point here is to handle something
that is breaking in the action). Please help. :(
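To illustrate what I am seeing (sketch; names made up): TaskContext.get() is
only non-null inside closures that run in a task on an executor, and is null in
driver-side code such as the body of foreachRDD itself:

    // Sketch: where TaskContext is available. Names are illustrative.
    import org.apache.spark.TaskContext;
    import org.apache.spark.streaming.api.java.JavaDStream;

    public class TaskContextScope {
      public static void attach(JavaDStream<String> lines) {
        lines.foreachRDD(rdd -> {
          TaskContext driverSide = TaskContext.get();
          System.out.println("driver side: " + driverSide); // prints null
          rdd.foreach(record -> {
            TaskContext taskSide = TaskContext.get(); // non-null in a task
            System.out.println("partition " + taskSide.partitionId()
                + ", attempt " + taskSide.attemptNumber());
          });
        });
      }
    }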
From: amit assudani <aassud...@impetus.com>
Date: Friday, June 26, 2015
Hi Friends,
I am trying to solve a use case in Spark Streaming, and I need help getting to
the right approach for looking up / updating the master data.
Use case (simplified):
I have a dataset of entities with three attributes and an identifier/row key in
a persistent store.
Each attribute along with row key
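What I have so far is the generic join-per-batch lookup (a bare sketch; the
key/value types and all names are placeholders of mine), and the open question
is how to refresh/update the master data efficiently:

    // Sketch: join every batch against a master-data pair RDD by row key.
    // Types and names are illustrative assumptions.
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.streaming.api.java.JavaPairDStream;
    import scala.Tuple2;

    public class MasterDataLookup {
      public static JavaPairDStream<String, Tuple2<String, String>> enrich(
          JavaPairDStream<String, String> events,     // (rowKey, eventValue)
          final JavaPairRDD<String, String> master) { // (rowKey, masterValue)
        // Join each batch against the master data by row key.
        return events.transformToPair(rdd -> rdd.join(master));
      }
    }

As written, the captured master RDD is fixed; refreshing it would mean
periodically reloading it on the driver, which is part of what I am asking.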
Regards,
Amit
From: Tathagata Das <t...@databricks.com>
Date: Thursday, April 9, 2015 at 3:13 PM
To: amit assudani <aassud...@impetus.com>
Cc: "user@spark.apache.org" <user@spark.apache.org>
Subject: Re