Re: Spark SQL partitioned tables - check for partition

2016-02-25 Thread Kevin Mellott
If you want to see which partitions exist on disk (without manually checking), you could write code against the Hadoop FileSystem library to check. Is that what you are asking?

https://hadoop.apache.org/docs/r2.4.1/api/org/apache/hadoop/fs/package-summary.html
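For example, a minimal sketch of that approach might look like the following. The table root path, partition column, and value are hypothetical placeholders, and it assumes the partCol=value directory layout that dataframe.write.partitionBy produces:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    // Returns true if the partCol=partValue directory exists under the
    // table's root path. Names and paths here are placeholders.
    def partitionExists(tableRoot: String, partCol: String, partValue: String): Boolean = {
      val conf = new Configuration()  // picks up HDFS settings from the classpath
      val path = new Path(s"$tableRoot/$partCol=$partValue")
      val fs = FileSystem.get(path.toUri, conf)
      fs.exists(path)
    }

You would call it with something like partitionExists("hdfs:///data/mytable", "partCol", "2016-02-25").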

Re: Spark SQL partitioned tables - check for partition

2016-02-25 Thread Deenar Toraskar
Kevin, I meant the partitions on disk/HDFS, not the in-memory RDD/DataFrame partitions. If I am right, mapPartitions or foreachPartition would identify and operate on the in-memory partitions.

Deenar

Re: Spark SQL partitioned tables - check for partition

2016-02-25 Thread Kevin Mellott
Once you have loaded information into a DataFrame, you can use the *mapPartitions* or *foreachPartition* operations to both identify the partitions and operate against them.

http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.DataFrame
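As a hedged sketch against the 1.6-era DataFrame API (the path is a hypothetical placeholder); note, per the follow-up above, that this walks the in-memory partitions, not the partCol=value directories on disk:

    import org.apache.spark.sql.{Row, SQLContext}

    def inspectPartitions(sqlContext: SQLContext): Unit = {
      val df = sqlContext.read.parquet("/data/mytable")  // hypothetical path
      df.foreachPartition { rows: Iterator[Row] =>
        // Runs once per in-memory partition, on the executors.
        println(s"this partition holds ${rows.size} rows")
      }
    }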

Spark SQL partitioned tables - check for partition

2016-02-25 Thread Deenar Toraskar
Hi,

How does one check for the presence of a partition in a Spark SQL partitioned table (saved using dataframe.write.partitionBy("partCol"), not Hive-compatible tables), other than physically checking the directory on HDFS or doing a count(*) with the partition columns in the where clause?
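For reference, the count(*) workaround mentioned here might look like this sketch (the path, column name, and value are hypothetical, against the 1.6-era API):

    import org.apache.spark.sql.SQLContext

    def hasPartition(sqlContext: SQLContext): Boolean = {
      // Table previously written with, e.g.:
      //   df.write.partitionBy("partCol").parquet("/data/mytable")
      val table = sqlContext.read.parquet("/data/mytable")
      // Filter on the partition column, then count; Spark should prune
      // the non-matching partition directories before scanning.
      table.filter(table("partCol") === "2016-02-25").count() > 0
    }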