Just wondering if anyone could help me out with this. Thank you!
*Regards,*
*Dhrubajyoti Hati.*

On Wed, Apr 22, 2020 at 7:15 PM Dhrubajyoti Hati <dhruba.w...@gmail.com> wrote:

> Hi,
>
> Is there any way to discard files starting with a dot (.) or ending with .tmp
> in a Hive partition while reading from a Hive table with the spark.read.table
> method?
>
> I tried using PathFilters, but they didn't work. I am using spark-submit
> and passing my Python (PySpark) file containing the source code.
>
> spark.sparkContext._jsc.hadoopConfiguration().set("mapreduce.input.pathFilter.class",
>     "com.abc.hadoop.utility.TmpFileFilter")
>
> package com.abc.hadoop.utility
>
> import org.apache.hadoop.fs.{Path, PathFilter}
>
> // Accepts every path except files whose name ends with ".tmp".
> class TmpFileFilter extends PathFilter {
>   override def accept(path: Path): Boolean = !path.getName.endsWith(".tmp")
> }
>
> Still, in the detailed (DEBUG) logs I can see that .tmp files are being considered:
>
> 20/04/22 12:58:44 DEBUG MapRFileSystem: getMapRFileStatus maprfs:///a/hour=05/host=abc/FlumeData.1587559137715
> 20/04/22 12:58:44 DEBUG MapRFileSystem: getMapRFileStatus maprfs:///a/hour=05/host=abc/FlumeData.1587556815621
> 20/04/22 12:58:44 DEBUG MapRFileSystem: getMapRFileStatus maprfs:///a/hour=05/host=abc/.FlumeData.1587560277337.tmp
>
> Is there any way to discard the .tmp files or the hidden files (file names
> starting with a dot or an underscore) in Hive partitions while reading from
> Spark?
>
> *Regards,*
> *Dhrubajyoti Hati.*
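For reference, the filtering rule asked about in the quoted message (skip file names starting with a dot or an underscore, and file names ending in .tmp) can be written as a small predicate. This is a minimal plain-Python sketch, not a Spark or Hadoop API: the helpers `is_skipped` and `keep_data_files` are hypothetical names, and the sketch only shows which of the file names from the DEBUG log the rule would drop. It does not hook into spark.read.table.

```python
def is_skipped(path: str) -> bool:
    """Hypothetical helper: True if the file should be ignored when reading.

    Mirrors the rule from the question: hidden files (leading "." or "_")
    and in-progress files ending in ".tmp" are skipped.
    """
    # Take the last path component, similar to Hadoop's Path.getName.
    name = path.rsplit("/", 1)[-1]
    return name.startswith((".", "_")) or name.endswith(".tmp")


def keep_data_files(paths):
    """Hypothetical helper: keep only the paths that pass the rule above."""
    return [p for p in paths if not is_skipped(p)]


if __name__ == "__main__":
    # File names copied from the DEBUG log in the quoted message.
    listing = [
        "maprfs:///a/hour=05/host=abc/FlumeData.1587559137715",
        "maprfs:///a/hour=05/host=abc/FlumeData.1587556815621",
        "maprfs:///a/hour=05/host=abc/.FlumeData.1587560277337.tmp",
    ]
    for p in keep_data_files(listing):
        print(p)
```

Only the file name is checked, not the directory components, so partition directories such as hour=05 are never affected by the rule.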