Re: Optimized way to use spark as db to hdfs etl

2016-11-06 Thread Sabarish Sasidharan
Please be aware that accumulators involve communication back with the driver and may not be efficient. I think the OP wants some way to extract the stats from the SQL plan, if they are stored in some internal data structure.

Regards
Sab
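Spark does expose per-operator statistics such as "numOutputRows" through the query execution. Below is a minimal Java sketch of reading that metric from a QueryExecutionListener, assuming a Spark 2.x version in which write actions notify registered listeners; where the metric sits in the physical plan depends on the operators involved, so treat this as illustrative rather than definitive:

    import org.apache.spark.sql.execution.QueryExecution;
    import org.apache.spark.sql.execution.metric.SQLMetric;
    import org.apache.spark.sql.util.QueryExecutionListener;

    import scala.Option;

    public class RowCountListener implements QueryExecutionListener {

        @Override
        public void onSuccess(String funcName, QueryExecution qe, long durationNs) {
            // Many physical operators publish a "numOutputRows" metric.
            // Only the root of the executed plan is inspected here; for some
            // queries the metric lives on a child node instead.
            Option<SQLMetric> metric = qe.executedPlan().metrics().get("numOutputRows");
            if (metric.isDefined()) {
                System.out.println(funcName + " produced " + metric.get().value() + " rows");
            }
        }

        @Override
        public void onFailure(String funcName, QueryExecution qe, Exception exception) {
            // no-op for this sketch
        }
    }

Register it once after creating the session, e.g. spark.listenerManager().register(new RowCountListener());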

Re: Optimized way to use spark as db to hdfs etl

2016-11-05 Thread Deepak Sharma
Hi Rohit

You can use an accumulator and increment it as each record is processed. At the end you can read the accumulator's value on the driver, which will give you the count.

HTH
Deepak
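A minimal Java sketch of that approach, assuming Spark 2.x (LongAccumulator); the JDBC URL, table name, credentials, and output path are placeholders:

    import java.util.Properties;

    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.util.LongAccumulator;

    public class CountingEtl {

        public static long etl(SparkSession spark) {
            Properties props = new Properties();
            props.setProperty("user", "etl_user");   // placeholder
            props.setProperty("password", "secret"); // placeholder

            Dataset<Row> source = spark.read()
                    .jdbc("jdbc:postgresql://dbhost:5432/mydb", "my_table", props);

            // The accumulator lives on the driver; executors only send increments.
            LongAccumulator rowCount = spark.sparkContext().longAccumulator("rowCount");

            // Pass each row through unchanged, bumping the counter as a side effect.
            // Caveat: accumulators updated inside transformations can over-count
            // if tasks are retried.
            JavaRDD<Row> counted = source.javaRDD().map(row -> {
                rowCount.add(1L);
                return row;
            });

            spark.createDataFrame(counted, source.schema())
                    .write()
                    .parquet("hdfs:///data/out/my_table");

            // Read the value only after the write (an action) has run.
            return rowCount.value();
        }
    }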

Optimized way to use spark as db to hdfs etl

2016-11-05 Thread Rohit Verma
I am using Spark to read from a database and write to HDFS as a Parquet file. Here is a code snippet.

    private long etlFunction(SparkSession spark) {
        spark.sqlContext().setConf("spark.sql.parquet.compression.codec", "SNAPPY");
        Properties properties = new Properties();
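The snippet is cut off in the archive. For context, a hedged guess at the overall shape of such an ETL, with hypothetical connection details; note that returning the count via count() triggers a second job over the source, which is the cost the replies above discuss avoiding:

    import java.util.Properties;

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class DbToHdfsEtl {

        private long etlFunction(SparkSession spark) {
            spark.sqlContext().setConf("spark.sql.parquet.compression.codec", "SNAPPY");

            Properties properties = new Properties();
            properties.setProperty("user", "etl_user");   // placeholder
            properties.setProperty("password", "secret"); // placeholder

            Dataset<Row> df = spark.read()
                    .jdbc("jdbc:postgresql://dbhost:5432/mydb", "my_table", properties);

            df.write().parquet("hdfs:///data/out/my_table");

            // Naive row count: re-runs the source scan as a second job.
            return df.count();
        }
    }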