In your JDBC connection you can do

conn.commit();

or

conn.rollback();
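
For example, a minimal sketch of explicit transaction control (the
connection URL, credentials and table names here are placeholders, not
from your setup):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class TxnSketch {
        public static void main(String[] args) throws Exception {
            // placeholder URL, credentials and table
            Connection conn = DriverManager.getConnection(
                    "jdbc:sqlserver://dbhost:1433;databaseName=mydb",
                    "user", "pass");
            conn.setAutoCommit(false);       // take control of the transaction
            try (Statement st = conn.createStatement()) {
                st.executeUpdate("INSERT INTO mytable VALUES (1, 'a')");
                conn.commit();               // make the work permanent
            } catch (Exception e) {
                conn.rollback();             // undo everything since last commit
                throw e;
            } finally {
                conn.close();
            }
        }
    }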

Why don't you insert your data into a #table (a temporary table) in MSSQL
and from there do one INSERT/SELECT into the main table? That is standard
ETL practice. In that case your main table will be protected: either it
will have the full data or no data.
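
Something along these lines, reusing conn from the sketch above (#staging,
main_table and the columns are made-up names):

    try (Statement st = conn.createStatement()) {
        // #temp tables in MSSQL are private to this session
        st.executeUpdate("CREATE TABLE #staging (id INT, payload VARCHAR(100))");
        // ... bulk load all rows into #staging here ...
        st.executeUpdate("INSERT INTO main_table (id, payload) "
                       + "SELECT id, payload FROM #staging");
        conn.commit();   // main_table gets either all rows or none
    }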

Also, have you specified the max packet size in the JDBC connection for
loading into the MSSQL table? That will improve the speed.
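
With the Microsoft JDBC driver that is, I believe, the packetSize
connection property (its value must not exceed the server's configured
network packet size, so check the value with your DBA):

    // packetSize is in bytes; 32767 is the driver's documented maximum
    String url = "jdbc:sqlserver://dbhost:1433;databaseName=mydb"
               + ";packetSize=32767";
    Connection conn = DriverManager.getConnection(url, "user", "pass");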

Try experimenting by creating a CSV-type file and using a bulk load that
commits, say, every 10,000 rows into the MSSQL table. That will tell you
if there is any issue. Ask the DBA to provide you with the max packet size
etc. There is another limitation, which would be the transaction log in
the MSSQL database getting full.
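
A rough sketch of that loading loop, reusing conn from above (the file
name, table and column layout are invented; add java.io.BufferedReader,
java.io.FileReader and java.sql.PreparedStatement to the imports):

    conn.setAutoCommit(false);
    try (BufferedReader in = new BufferedReader(new FileReader("data.csv"));
         PreparedStatement ps = conn.prepareStatement(
                 "INSERT INTO mytable (id, payload) VALUES (?, ?)")) {
        String line;
        int n = 0;
        while ((line = in.readLine()) != null) {
            String[] cols = line.split(",");   // naive CSV split, no quoting
            ps.setInt(1, Integer.parseInt(cols[0]));
            ps.setString(2, cols[1]);
            ps.addBatch();
            if (++n % 10000 == 0) {
                ps.executeBatch();
                conn.commit();   // small transactions keep the txn log in check
            }
        }
        ps.executeBatch();       // flush the remainder
        conn.commit();
    }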

HTH



Dr Mich Talebzadeh



LinkedIn:
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 23 April 2016 at 13:57, Andrés Ivaldi <iaiva...@gmail.com> wrote:

> Hello, so I executed Profiler and found that implicit transactions were
> turned on by the JDBC driver. This is the default behavior of the MSSQL
> JDBC driver, but it's possible to change it with the setAutoCommit method.
> There is no property for that, so I have to do it in code. Do you know
> where I can access the instance of the JDBC class used by Spark on
> DataFrames?
>
> Regards.
>
> On Thu, Apr 21, 2016 at 10:59 AM, Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>> This statement:
>>
>> "...each database statement is atomic and is itself a transaction... your
>> statements should be atomic and there will be no ‘redo’ or ‘commit’ or
>> ‘rollback’."
>>
>> MSSQL complies with ACID, which requires that each transaction be "all
>> or nothing": if one part of the transaction fails, then the entire
>> transaction fails and the database state is left unchanged.
>>
>> Assuming that it is one transaction (though I much doubt JDBC does that,
>> as it would take forever), then either that transaction commits or it
>> rolls back in full. In MSSQL, redo + undo are combined in the syslogs
>> table of the database, meaning undo + redo log records will be generated
>> for that transaction in syslogs. So under normal operation every RDBMS,
>> including MSSQL, Oracle, Sybase and others, will comply with generating
>> redo and undo, and one cannot avoid it. If there is a batch transaction,
>> as I suspect in this case, it is either all or nothing. The thread owner
>> indicated that a rollback is happening, so that is consistent with all
>> rows being rolled back.
>>
>> I don't think Spark, Sqoop, or Hive can influence the transaction
>> behaviour of an RDBMS for DML. DQ (data queries) does not generate
>> transactions.
>>
>> HTH
>>
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn:
>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> On 21 April 2016 at 13:58, Michael Segel <msegel_had...@hotmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> Sometimes terms get muddled over time.
>>>
>>> If you’re not using transactions, then each database statement is atomic
>>> and is itself a transaction.
>>> So unless you have some explicit ‘Begin Work’ at the start, your
>>> statements should be atomic and there will be no ‘redo’ or ‘commit’ or
>>> ‘rollback’.
>>>
>>> I don’t see anything in Spark’s documentation about transactions, so the
>>> statements should be atomic.  (I’m not a guru here so I could be missing
>>> something in Spark)
>>>
>>> If you’re seeing the connection drop unexpectedly and then a rollback,
>>> could this be a setting or configuration of the database?
>>>
>>>
>>> > On Apr 19, 2016, at 1:18 PM, Andrés Ivaldi <iaiva...@gmail.com> wrote:
>>> >
>>> > Hello, is it possible to execute a SQL write without a transaction? We
>>> > don't need transactions to save our data, and this adds overhead to
>>> > SQL Server.
>>> >
>>> > Regards.
>>> >
>>> > --
>>> > Ing. Ivaldi Andres
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: user-h...@spark.apache.org
>>>
>>>
>>
>
>
> --
> Ing. Ivaldi Andres
>
