NOLOGGING means no redo (or only minimal redo) is generated. However, undo is
still generated, and the transaction will still be rolled back in the event of
an issue.
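
For example, here is a minimal Oracle sketch of that distinction (the row
counts are illustrative only). Conventional DML against a NOLOGGING table is
still fully redo-logged; only direct-path operations get the minimal-redo
treatment, and undo is generated either way, so ROLLBACK still works:

ALTER TABLE testme NOLOGGING;

-- Conventional insert: full redo and undo are generated despite NOLOGGING.
INSERT INTO testme VALUES (1);

-- Direct-path insert: minimal redo on a NOLOGGING table, but the
-- transaction still generates undo metadata and can be rolled back.
INSERT /*+ APPEND */ INTO testme
SELECT level FROM dual CONNECT BY level <= 100;

ROLLBACK;  -- both inserts are undone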

Nick

From: Mich Talebzadeh [mailto:mich.talebza...@gmail.com]
Sent: Wednesday, April 20, 2016 4:08 PM
To: Andrés Ivaldi
Cc: user @spark
Subject: Re: Spark SQL Transaction

Well Oracle will allow that if the underlying table is in NOLOGGING mode :)

mtale...@mydb12.mich.LOCAL> create table testme(col1 int);
Table created.
mtale...@mydb12.mich.LOCAL> alter table testme NOLOGGING;
Table altered.
mtale...@mydb12.mich.LOCAL> insert into testme values(1);
1 row created.


Dr Mich Talebzadeh



LinkedIn  
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 20 April 2016 at 20:45, Andrés Ivaldi <iaiva...@gmail.com> wrote:
I think the same, and I don't think reducing the batch size improves speed,
but it will avoid losing all the data on a rollback.


Thanks for the help.


On Wed, Apr 20, 2016 at 4:03 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
Yep. I think it is not possible to make SQL Server do a non-logged
transaction. The other alternative is doing the inserts in small batches if
possible, or writing to a CSV-type file and using bulk copy to load the file
into MSSQL with frequent commits, say every 50K rows?
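
As a rough sketch of that bulk-copy route (the file path and delimiters here
are hypothetical), BULK INSERT's BATCHSIZE option commits each batch as its
own transaction, so a failure only rolls back the batch in flight:

BULK INSERT dbo.testme
FROM 'C:\load\testme.csv'        -- hypothetical path
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\n',
    BATCHSIZE       = 50000      -- commit every 50K rows
);

The bcp command-line utility gives the same behaviour via its -b batch_size
flag.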




Dr Mich Talebzadeh



LinkedIn  
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 20 April 2016 at 19:42, Andrés Ivaldi <iaiva...@gmail.com> wrote:
Yes, I know that behavior, but there is no explicit BEGIN TRANSACTION in my
code, so maybe Spark or the driver itself is adding the begin transaction, or
implicit transactions are configured. If Spark isn't adding a BEGIN
TRANSACTION on each insertion, then it is probably database or driver
configuration...
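
One way to check that on the SQL Server side is a quick sketch like this
(standard commands, run in the session doing the inserts):

-- List the options in effect for the current session; look for
-- implicit_transactions in the output.
DBCC USEROPTIONS;

-- Turn implicit transactions off for this session, so each statement
-- autocommits instead of opening a transaction that waits for COMMIT.
SET IMPLICIT_TRANSACTIONS OFF;

With JDBC, the equivalent knob is usually the connection's autocommit setting
rather than this server-side option.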

On Wed, Apr 20, 2016 at 3:33 PM, Mich Talebzadeh 
<mich.talebza...@gmail.com<mailto:mich.talebza...@gmail.com>> wrote:


You will see what is happening in SQL Server. First create a test table called
testme:

1> use tempdb
2> go
1> create table testme(col1 int)
2> go
-- Now explicitly begin a transaction and insert 1 row and select from table
1> begin tran
2> insert into testme values(1)
3> select * from testme
4> go
(1 row affected)
 col1
 -----------
           1
-- That value col1=1 is there
--
(1 row affected)
-- Now roll back that transaction, which in your case would mean killing your
-- Spark process!
--
1> rollback tran
2> select * from testme
3> go
 col1
 -----------
(0 rows affected)

-- You can see that the record has gone, as the transaction rolled back!



Dr Mich Talebzadeh



LinkedIn  
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 20 April 2016 at 18:42, Andrés Ivaldi <iaiva...@gmail.com> wrote:
Sorry I couldn't answer before. I want to know if Spark is responsible for
adding the BEGIN TRAN. The point is to prioritize insertion speed over the
risk of losing data: disabling transactions will speed up the insertion, and
we don't care about consistency... I'll disable the implicit transactions and
see what happens.

thanks

On Wed, Apr 20, 2016 at 12:09 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
Assuming that you are using JDBC to put data into any ACID-compliant database
(MSSQL, Sybase, Oracle, etc.), you are implicitly or explicitly adding BEGIN
TRAN to the INSERT statement in a distributed transaction. MSSQL does not know
or care where the data is coming from. If your connection completes OK, a
COMMIT TRAN will be sent, and that will tell MSSQL to commit the transaction.
If you kill the Spark transaction before MSSQL receives the COMMIT TRAN, the
transaction will be rolled back.

The only option, if you don't mind not getting the full data set into MSSQL,
is to break your insert into chunks at the source and send the data to MSSQL
in small batches. That way you will not lose all the data in MSSQL because of
a rollback.
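
The same per-chunk commit idea can be sketched purely in T-SQL, assuming a
hypothetical staging table dbo.staging that the rows land in first. Each
iteration commits independently, so a failure costs at most the chunk in
flight:

-- Move rows from the (hypothetical) staging table into the target
-- 50K rows at a time, committing after every chunk.
WHILE 1 = 1
BEGIN
    BEGIN TRAN;

    DELETE TOP (50000) FROM dbo.staging
    OUTPUT DELETED.col1 INTO dbo.testme (col1);

    IF @@ROWCOUNT = 0
    BEGIN
        ROLLBACK TRAN;   -- nothing left to move
        BREAK;
    END;

    COMMIT TRAN;
END;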

HTH


Dr Mich Talebzadeh



LinkedIn  
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 20 April 2016 at 07:33, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
Are you using JDBC to push data to MSSQL?


Dr Mich Talebzadeh



LinkedIn  
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 19 April 2016 at 23:41, Andrés Ivaldi <iaiva...@gmail.com> wrote:
I mean a local transaction. We ran a job that writes into SQL Server, then we
killed the Spark JVM just for testing purposes, and we realized that SQL
Server did a rollback.

Regards

On Tue, Apr 19, 2016 at 5:27 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
Hi,

What do you mean by *without transaction*? Do you mean forcing SQL Server to
accept a non-logged operation?


Dr Mich Talebzadeh



LinkedIn  
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 19 April 2016 at 21:18, Andrés Ivaldi <iaiva...@gmail.com> wrote:
Hello, is it possible to execute a SQL write without a transaction? We don't
need transactions to save our data, and they add overhead to SQL Server.

Regards.
--
Ing. Ivaldi Andres



