Hello Spark users,

Spark-postgres is designed for reliable and performant ETL in big-data
workloads and offers read/write/SCD capabilities. Version 3 introduces
a datasource API and simplifies usage. It outperforms Sqoop by a
factor of 8 and the Apache Spark core JDBC source by infinity.

Features:
- use of PostgreSQL COPY statements
- parallel reads/writes
- use of HDFS to store intermediary CSV files
- reindex after bulk-loading
- SCD1 computations done on the Spark side
- use of unlogged tables when needed
- handling of arrays and multiline string columns
- useful JDBC functions (DDL, updates...)
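
To give an idea of what "COPY statements", "parallel reads" and "SCD1"
mean in practice, here is a minimal plain-Python sketch. It is not the
library's code or API: the function names (split_range,
copy_out_statements, scd1_changes) and the table/column names are made
up for illustration, and it assumes an integer key column to partition
on. It only builds the SQL strings and computes the SCD1 row sets; it
does not talk to PostgreSQL or Spark.

```python
from typing import Dict, List, Tuple


def split_range(lo: int, hi: int, parts: int) -> List[Tuple[int, int]]:
    """Split the inclusive key range [lo, hi] into at most `parts` contiguous chunks."""
    if parts < 1 or hi < lo:
        raise ValueError("need parts >= 1 and hi >= lo")
    total = hi - lo + 1
    parts = min(parts, total)          # never create empty chunks
    base, extra = divmod(total, parts)
    bounds = []
    start = lo
    for i in range(parts):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first chunks
        bounds.append((start, start + size - 1))
        start += size
    return bounds


def copy_out_statements(table: str, key: str, lo: int, hi: int, parts: int) -> List[str]:
    """Build one COPY ... TO STDOUT statement per chunk (hypothetical helper).

    Each statement could be run on its own connection, which is the idea
    behind a parallel read. Identifiers are interpolated as-is, so this
    sketch assumes trusted table and column names.
    """
    return [
        f"COPY (SELECT * FROM {table} WHERE {key} BETWEEN {a} AND {b}) "
        "TO STDOUT WITH (FORMAT csv)"
        for a, b in split_range(lo, hi, parts)
    ]


def scd1_changes(current: Dict[int, dict], incoming: Dict[int, dict]):
    """SCD type 1: new keys are inserted, changed rows are overwritten, no history is kept."""
    inserts = {k: v for k, v in incoming.items() if k not in current}
    updates = {k: v for k, v in incoming.items() if k in current and current[k] != v}
    return inserts, updates
```

For example, copy_out_statements("patient", "id", 1, 10, 3) yields three
statements covering ids 1-4, 5-7 and 8-10, which can then be executed
concurrently.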

The official repository:
https://framagit.org/parisni/spark-etl/tree/master/spark-postgres

And its mirror on Microsoft GitHub:
https://github.com/EDS-APHP/spark-etl/tree/master/spark-postgres

-- 
nicolas

