I am not sure if this is the easiest way to solve your problem, but you can
connect to the Hive metastore (through Derby) and find the HDFS path from
there.
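Not part of the original thread — a minimal sketch of what such a metastore lookup could look like, assuming the standard Hive metastore schema (the TBLS table holds table names, SDS holds storage descriptors including LOCATION). The JDBC URL and the `MetastoreLocation` helper are hypothetical placeholders; running the query requires the Derby driver on the classpath.

```scala
object MetastoreLocation {
  // Build the SQL that returns a table's HDFS location from the metastore.
  // TBLS.SD_ID links each table to its storage descriptor in SDS.
  def locationQuery(tableName: String): String =
    s"""SELECT s.LOCATION
       |FROM TBLS t JOIN SDS s ON t.SD_ID = s.SD_ID
       |WHERE t.TBL_NAME = '$tableName'""".stripMargin

  // To actually run it against a Derby-backed metastore (hypothetical path):
  //   val conn = java.sql.DriverManager.getConnection(
  //     "jdbc:derby:/path/to/metastore_db")
  //   val rs = conn.createStatement().executeQuery(locationQuery("my_table"))
  //   if (rs.next()) println(rs.getString("LOCATION"))
}
```

The query itself is the useful part; any JDBC client pointed at the metastore database would work equally well.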
On Wed, Feb 18, 2015 at 9:31 AM, Vasu C vasuc.bigd...@gmail.com wrote:
Hi,
I am running a Spark batch processing job using the spark-submit command, and
below is my code snippet. Basically it converts a JSON RDD to parquet and
stores it in an HDFS location.
The problem I am facing is that if multiple jobs are triggered in parallel,
even though each job executes properly (as I can see in the Spark web UI),
no parquet file is created in the HDFS path for some of them. If 5 jobs are
executed in parallel, only 3 parquet files get created.
Is this a data loss scenario, or am I missing something here? Please help me
with this.
Here, tableName is unique, with a timestamp appended to it.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
// parse the incoming JSON records into a SchemaRDD
val jsonRdd = sqlContext.jsonRDD(results)
// load the existing parquet file and expose it as a table
val parquetTable = sqlContext.parquetFile(parquetFilePath)
parquetTable.registerTempTable(tableName)
// append the JSON rows to the parquet-backed table
jsonRdd.insertInto(tableName)
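Not from the thread — one hedged sketch of a detail worth checking: a table name that is unique only down to a timestamp can still collide when several jobs launch in the same instant, and two jobs writing the same output path can silently clobber each other. Adding a random component (the `UniqueName` helper below is a hypothetical illustration, not the poster's code) guarantees each parallel job its own name and, by extension, its own output directory.

```scala
import java.util.UUID

object UniqueName {
  // Combine a timestamp with a random UUID so that names generated by
  // jobs started in the same millisecond still never collide.
  def uniqueTableName(prefix: String): String = {
    val ts = System.currentTimeMillis()
    s"${prefix}_${ts}_${UUID.randomUUID().toString.replace("-", "")}"
  }
}
```

Each job could then write to its own parquet path derived from that name, so no two parallel runs ever share a target directory.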
Regards,
Vasu C
--
*Arush Kharbanda* || Technical Teamlead
ar...@sigmoidanalytics.com || www.sigmoidanalytics.com