Re: Spark SaveMode

2019-07-21 Thread Mich Talebzadeh
I dug up some of my old stuff using Spark as an ETL tool. Regarding the question "Any reason why Spark's SaveMode doesn't have a mode that ignores Primary Key/Unique constraint violations?": there is no way Spark can determine whether a PK constraint is violated until it receives such a message from Oracle through …
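To make that concrete, here is a minimal sketch of where the violation actually surfaces, assuming a DataFrame df and the connection variables from the read example further down the thread (the table name and credentials are placeholders, not from the original post). Oracle raises ORA-00001 only when the batch executes, and Spark merely propagates it:

import org.apache.spark.sql.SaveMode

try {
  df.write.
    format("jdbc").
    option("url", _ORACLEserver).
    option("dbtable", "scratchpad.dummy").  // placeholder target table with a PK
    option("user", _username).
    option("password", _password).
    mode(SaveMode.Append).
    save()
} catch {
  case e: Exception =>
    // ORA-00001 (unique constraint violated) comes back from Oracle only
    // at execution time; Spark cannot pre-check the constraint itself.
    println(s"Batch insert failed: ${e.getMessage}")
}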

Re: Spark SaveMode

2019-07-20 Thread Mich Talebzadeh
A JDBC read from an Oracle table requires the Oracle JDBC driver ojdbc6.jar or higher; ojdbc6.jar works for 11g and 12c, added as --jars /ojdbc6.jar. Example with a parallel read (4 connections) to Oracle, with ID being your PK in the Oracle table: var _ORACLEserver= "jdbc:oracle:thin:@rhes564:1521:mydb12" var …
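For illustration, a minimal sketch of that parallel read in spark-shell; only the connection string comes from the post, while the credentials, the table name (scratchpad.dummy) and the ID bounds are placeholder assumptions:

// started as: spark-shell --jars /ojdbc6.jar
val _ORACLEserver = "jdbc:oracle:thin:@rhes564:1521:mydb12"
val _username = "scratchpad"  // placeholder credentials
val _password = "xxxxx"

val df = spark.read.
  format("jdbc").
  option("url", _ORACLEserver).
  option("dbtable", "scratchpad.dummy").  // placeholder table keyed on ID
  option("user", _username).
  option("password", _password).
  option("partitionColumn", "ID").  // numeric PK used to split the scan
  option("lowerBound", "1").        // assumed min(ID)
  option("upperBound", "100000").   // assumed max(ID)
  option("numPartitions", "4").     // 4 parallel connections
  load()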

Re: Spark SaveMode

2019-07-20 Thread Mich Talebzadeh
This behaviour is governed by the underlying RDBMS for a bulk insert, where it either commits or rolls back. You can insert the new rows into a staging table in Oracle (which is common in ETL) and then insert/select into the Oracle table in a shell routine. The other way is to use JDBC in Spark to read …
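A minimal sketch of that staging-table route, assuming a staging table scratchpad.stg_dummy and a target scratchpad.dummy keyed on ID (all names are placeholder assumptions):

// Step 1: bulk-append the incoming rows into the staging table via Spark JDBC
df.write.
  format("jdbc").
  option("url", _ORACLEserver).
  option("dbtable", "scratchpad.stg_dummy").
  option("user", _username).
  option("password", _password).
  mode("append").
  save()

// Step 2: in the shell routine (e.g. sqlplus), insert/select only the rows
// whose PK is not already present in the target:
//   INSERT INTO scratchpad.dummy
//   SELECT s.* FROM scratchpad.stg_dummy s
//   WHERE NOT EXISTS (SELECT 1 FROM scratchpad.dummy d WHERE d.id = s.id);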

Re: Spark SaveMode

2019-07-19 Thread Jörn Franke
This is not an issue of Spark but of the underlying database. The primary key constraint has a purpose, and ignoring it would defeat that purpose. Then, to handle your use case, you would need to make multiple decisions, which may imply you don't want to simply insert if not exists. Maybe you want to …
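For illustration, those per-row decisions are exactly what an Oracle MERGE spells out: update when the key already exists, insert when it does not. A sketch issued over plain JDBC, where all table and column names are placeholder assumptions:

import java.sql.DriverManager

val conn = DriverManager.getConnection(_ORACLEserver, _username, _password)
try {
  // WHEN MATCHED     -> the "key exists, update instead" decision
  // WHEN NOT MATCHED -> the "insert if not exists" decision
  conn.createStatement().executeUpdate("""
    MERGE INTO scratchpad.dummy t
    USING scratchpad.stg_dummy s
    ON (t.id = s.id)
    WHEN MATCHED THEN UPDATE SET t.payload = s.payload
    WHEN NOT MATCHED THEN INSERT (id, payload) VALUES (s.id, s.payload)
  """)
} finally {
  conn.close()
}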

Spark SaveMode

2019-07-19 Thread Richard
Any reason why Spark's SaveMode doesn't have a mode that ignores Primary Key/Unique constraint violations? Let's say I'm using Spark to migrate some data from Cassandra to Oracle; I want the insert operation to be "ignore if the primary key exists" instead of failing the whole batch. Thanks,
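For context: SaveMode currently offers ErrorIfExists, Append, Overwrite and Ignore, and Ignore operates on the whole destination (skip the save if the target already holds data), not per row, so it cannot express "skip rows whose PK already exists". A sketch of why it doesn't help here (connection details are placeholders):

import org.apache.spark.sql.SaveMode

df.write.
  format("jdbc").
  option("url", "jdbc:oracle:thin:@host:1521:mydb").  // placeholder URL
  option("dbtable", "target_table").                  // placeholder name
  option("user", "user").
  option("password", "password").
  mode(SaveMode.Ignore).  // table-level: skips the entire save because the
  save()                  // table already exists; rows are never checked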