Can you provide a code sample, please?

On Fri, Sep 8, 2017 at 5:44 PM, Matthew Anthony <statm...@gmail.com> wrote:
> Hi all -
>
> Since upgrading to 2.2.0, we've noticed a significant increase in the time
> taken by read.parquet(...) operations. The parquet files are being read
> from S3. When the call is entered at the interactive terminal (pyspark in
> this case), the terminal sits "idle" for several minutes (as many as 10)
> before printing:
>
> "17/09/08 15:34:37 WARN SharedInMemoryCache: Evicting cached table
> partition metadata from memory due to size constraints
> (spark.sql.hive.filesourcePartitionFileCacheSize = 2000000000 bytes).
> This may impact query planning performance."
>
> In the Spark UI, no jobs run during this idle period. Afterwards, a short
> 1-task job lasting approximately 10 seconds runs, followed by another idle
> period of roughly 2-3 minutes before control returns to the terminal/CLI.
>
> Can someone explain what is happening in the background? Is there a
> misconfiguration we should be looking for? We are using a Hive metastore
> on the EMR cluster.