Hi,

I am running a Spark batch processing job using the spark-submit command; my code snippet is below. Basically, I am converting a JSON RDD to Parquet and storing it in an HDFS location.
The problem I am facing is that when multiple jobs are triggered in parallel, even though each job executes properly (as I can see in the Spark web UI), not every Parquet file is created in the HDFS path. If 5 jobs are executed in parallel, then only 3 Parquet files get created. Is this a data loss scenario, or am I missing something here? Please help me with this.

Here tableName is unique, with a timestamp appended to it.

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val jsonRdd = sqlContext.jsonRDD(results)
val parquetTable = sqlContext.parquetFile(parquetFilePath)
parquetTable.registerTempTable(tableName)
jsonRdd.insertInto(tableName)

Regards,
Vasu C
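For completeness, here is the snippet written out end to end as it runs inside one job. This is a minimal sketch: `sc`, `results` (an RDD of JSON strings), and `parquetFilePath` come from earlier in the real job, and the timestamp format and the `events_` prefix used to build the unique table name are illustrative, not the actual values.

import java.text.SimpleDateFormat
import java.util.Date
import org.apache.spark.sql.SQLContext

// `sc`, `results`, and `parquetFilePath` are defined earlier in the job.
val sqlContext = new SQLContext(sc)

// tableName is made unique per job by appending a timestamp;
// the prefix and format here are placeholders.
val timestamp = new SimpleDateFormat("yyyyMMddHHmmss").format(new Date())
val tableName = s"events_$timestamp"

val jsonRdd = sqlContext.jsonRDD(results)               // infer schema from the JSON records
val parquetTable = sqlContext.parquetFile(parquetFilePath)
parquetTable.registerTempTable(tableName)               // expose the Parquet data as a temp table
jsonRdd.insertInto(tableName)                           // append the JSON rows into the Parquet table

Each of the 5 parallel jobs runs this same sequence with its own tableName, but all of them point at the same parquetFilePath.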