Hi Jeff,

Sadly, that does not resolve the issue. I am fairly sure that the mapping of partitions to their physical file locations can be saved and recovered in SPARK.
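For what it's worth, the workaround I am trying is to bypass the metastore partition lookup and read the Parquet files straight off S3. This is only a minimal sketch, assuming the table data is plain Parquet under a single S3 prefix (the bucket and path below are placeholders, not my real ones), and I have not yet verified that it is actually faster:

    from pyspark import SparkContext
    from pyspark.sql import HiveContext

    sc = SparkContext(appName="partition-lookup-workaround")
    sqlContext = HiveContext(sc)

    # Read the partitioned Parquet root directly instead of going through
    # the Hive table: Spark infers the partitions from the dt=.../hr=...
    # directory layout instead of consulting the metastore. Note this
    # still lists S3 under the prefix, so it may not help for very deep
    # layouts.
    df = sqlContext.read.parquet("s3n://my-bucket/warehouse/my_table")

    # Register it as a temp table so it can be queried as before.
    df.registerTempTable("my_table_tmp")

    sqlContext.sql("SELECT COUNT(*) FROM my_table_tmp").show()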
Regards,
Gourav Sengupta

On Wed, Dec 16, 2015 at 12:13 PM, Jeff Zhang <zjf...@gmail.com> wrote:

> Oh, you are using S3. As I remember, S3 has performance issues when
> processing a large number of files.
>
> On Wed, Dec 16, 2015 at 7:58 PM, Gourav Sengupta
> <gourav.sengu...@gmail.com> wrote:
>
>> The HIVE table has a very large number of partitions, around 365 * 5 *
>> 10, and when I ask the hive metastore to start running queries on it
>> (the one with .count() or .show()) it takes around 2 hours before the
>> job starts in SPARK.
>>
>> On the pyspark screen I can see that it is parsing the S3 locations for
>> these 2 hours.
>>
>> Regards,
>> Gourav
>>
>> On Wed, Dec 16, 2015 at 3:38 AM, Jeff Zhang <zjf...@gmail.com> wrote:
>>
>>> Currently it takes around 1.5 hours for me just to cache in the
>>> partition information and after that I can see that the job gets
>>> queued in the SPARK UI.
>>>
>>> I guess you mean the stage of getting the split info. I suspect it
>>> might be an issue with your cluster (or the metadata store); usually
>>> splitting does not take such a long time.
>>>
>>> On Wed, Dec 16, 2015 at 8:06 AM, Gourav Sengupta
>>> <gourav.sengu...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have a HIVE table with a few thousand partitions (based on date and
>>>> time). It takes a long time to run the first time, and subsequently
>>>> it is fast.
>>>>
>>>> Is there a way to store the cache of partition lookups so that every
>>>> time I start a new SPARK instance (I cannot keep my personal server
>>>> running continuously), I can immediately restore the temp table in
>>>> hiveContext without asking it to go and cache the partition lookups
>>>> again?
>>>>
>>>> Currently it takes around 1.5 hours for me just to cache in the
>>>> partition information, and after that I can see that the job gets
>>>> queued in the SPARK UI.
>>>>
>>>> Regards,
>>>> Gourav
>>>
>>> --
>>> Best Regards
>>>
>>> Jeff Zhang
>
> --
> Best Regards
>
> Jeff Zhang