Why don't you apply proper change data capture (CDC)?
It will be more complex, though.
On Mon, May 25, 2020 at 07:38, Manjunath Shetty H <
manjunathshe...@live.com> wrote:
Hi Mike,
Thanks for the response.
Even with that flag set, a data miss can still happen, right? The fetch is based on
the last watermark (the maximum timestamp among the rows the last batch job fetched).
Take a scenario like this, with a table whose rows become visible in this order:
a : 1
b : 2
c : 3
d : 4
f : 6
g : 7
h : 8
e : 5
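The scenario above can be sketched in plain Python (a stand-in for the actual batch job, with illustrative names): rows become visible in commit order, not timestamp order, so row `e` (timestamp 5) appears only after the watermark has already advanced to 8.

```python
# Plain-Python sketch of why a max-timestamp watermark can miss rows.
rows = [("a", 1), ("b", 2), ("c", 3), ("d", 4),
        ("f", 6), ("g", 7), ("h", 8),
        ("e", 5)]  # e commits last but carries an older timestamp

def fetch_batch(visible, watermark):
    """Return the visible rows with timestamp greater than the watermark."""
    return [(k, ts) for (k, ts) in visible if ts > watermark]

batch1 = fetch_batch(rows[:7], 0)        # first run: e not yet visible
watermark = max(ts for _, ts in batch1)  # watermark advances to 8
batch2 = fetch_batch(rows, watermark)    # second run: e has ts 5 <= 8
print(batch2)  # [] -- row e is never fetched
```

Once the watermark passes a late row's timestamp, no later batch will ever pick that row up, which is the data-miss concern raised here.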
Does anything different happen when you set the isolationLevel to do
dirty reads, i.e. "READ_UNCOMMITTED"?
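For reference, Spark's JDBC data source accepts `isolationLevel` as a read option, with `READ_UNCOMMITTED` among the supported values. A minimal sketch (the URL and table name are placeholders, not from this thread):

```python
# Hedged sketch: JDBC read options including "isolationLevel";
# url/dbtable values are placeholders.
jdbc_options = {
    "url": "jdbc:sqlserver://host:1433;databaseName=mydb",
    "dbtable": "my_table",
    "isolationLevel": "READ_UNCOMMITTED",  # i.e. dirty reads
}
# df = spark.read.format("jdbc").options(**jdbc_options).load()
```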
On Sun, May 24, 2020 at 7:50 PM Manjunath Shetty H wrote:
Hi,
We are writing an ETL pipeline using Spark that fetches data from SQL Server
in batch mode (every 15 mins). The problem we are facing is parallelising
single-table reads into multiple tasks without missing any data.
We have tried this:
* Use the `ROW_NUMBER` window function in
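As an alternative to a `ROW_NUMBER`-based split, Spark's JDBC source has built-in range partitioning via `partitionColumn`, `lowerBound`, `upperBound`, and `numPartitions`. A hedged sketch (column name and bounds are illustrative, and this does not by itself solve the late-row problem discussed above):

```python
# Hedged sketch: Spark's built-in JDBC range partitioning options.
# Column name and bounds are illustrative placeholders.
lower, upper, num_partitions = 1, 1_000_000, 8
opts = {
    "url": "jdbc:sqlserver://host:1433;databaseName=mydb",
    "dbtable": "my_table",
    "partitionColumn": "id",  # must be numeric, date, or timestamp
    "lowerBound": str(lower),
    "upperBound": str(upper),
    "numPartitions": str(num_partitions),
}
# df = spark.read.format("jdbc").options(**opts).load()
# Spark splits the [lowerBound, upperBound] range into roughly equal strides,
# one WHERE-clause range per task:
stride = (upper - lower) // num_partitions
print(stride)  # 124999
```

Each of the resulting tasks then issues its own range query, so a single table read runs as `numPartitions` parallel reads.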
On Sat, 16 May 2020, 22:34 Punna Yenumala wrote:
Hi Avadhut Narayan Joshi,
The use case is achievable using Spark. Connection to SQL Server is possible,
as Mich mentioned below, as long as there is a JDBC driver that can connect to
SQL Server. For production workloads, important points to consider:
>> what are the QoS requirements for your case? at least
How a Spark job reads data sources depends on the underlying source system and
on the job configuration: the number of executors and the cores per executor.
https://spark.apache.org/docs/latest/rdd-programming-guide.html#external-datasets
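For illustration, those two knobs are typically set at submission time; a hedged example (the values and `my_job.py` are placeholders):

```shell
# Hedged example: executor count and cores per executor at submit time.
# Values and the script name are illustrative placeholders.
spark-submit \
  --num-executors 4 \
  --executor-cores 2 \
  --executor-memory 4g \
  my_job.py
```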
About Shuffle operations.
I am writing something that partitions a data set and then trains a machine
learning model on the data in each partition.
The resulting model is very big, and right now I am storing it in an RDD as
a pair of:
partition_id and very_big_model_that_is_hundreds_of_megabytes_big
but it is becoming
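One common way around holding huge model objects in an RDD is to persist each trained model externally and keep only a small `(partition_id, path)` pair. A plain-Python stand-in for the per-partition loop (the "model" here is just a placeholder mean, not a real ML model, and the paths are temporary):

```python
# Plain-Python stand-in: train per partition, persist the model to a file,
# and keep only the small (partition_id, path) pair. The "model" is a
# placeholder (a mean), not a real ML model.
import os
import pickle
import tempfile

def train(rows):
    return sum(rows) / len(rows)  # placeholder for real training

partitions = {0: [1, 2, 3], 1: [10, 20, 30]}
out_dir = tempfile.mkdtemp()
model_paths = {}
for pid, rows in partitions.items():
    path = os.path.join(out_dir, f"model_{pid}.pkl")
    with open(path, "wb") as f:
        pickle.dump(train(rows), f)
    model_paths[pid] = path  # store the small path, not the big model
```

The RDD (or driver-side map) then carries only strings, and each model is loaded on demand from its path.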
Hi,
While reading streaming data from Kafka we use the following API:
df = spark \
    .readStream \
    .format("kafka") \
    .option("kafka.bootstrap.servers", "host1:port1,host2:port2") \
    .option("subscribe", "topic1") \
    .option("startingOffsets", "earliest") \
    .load()
My question is how to