Around 14,000 partitions need to be loaded every hour. Yes, I tested this, and it's taking a long time to load. A partition looks something like the following; it is further partitioned by userId, with all the userRecords for that date inside it:

5 2016-05-20 16:03 /user/user/userRecords/dtPartitioner=2012-09-12
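For anyone hitting the same wall, here is a minimal sketch of the "insert 100 partitions at a time" idea, written against the Spark 1.6-era HiveContext API. The table name userRecords and the column dtPartitioner come from the layout above; the staging table staging_userRecords, the batch size, and the setup code are illustrative assumptions, not anything confirmed in this thread:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Minimal sketch only: staging_userRecords and this setup are assumptions;
// userRecords and dtPartitioner come from the directory layout above.
val sc = new SparkContext(new SparkConf().setAppName("BatchedPartitionInsert"))
val sqlContext = new HiveContext(sc)

// Dynamic-partition inserts into a Hive table need these settings.
sqlContext.setConf("hive.exec.dynamic.partition", "true")
sqlContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")

val records = sqlContext.table("staging_userRecords") // hypothetical source

// The distinct partition keys are tiny (one string per partition), so
// collecting them to the driver is cheap even at ~14,000 partitions.
val partitionDates = records
  .select("dtPartitioner")
  .distinct()
  .collect()
  .map(_.getString(0))
  .toSeq

// Insert 100 partitions per job so no single insert has to materialize
// all ~14,000 partitions at once.
partitionDates.grouped(100).foreach { batch =>
  records
    .filter(records("dtPartitioner").isin(batch: _*))
    .write
    .mode("append")
    .insertInto("userRecords") // assumes column order matches the table,
                               // with the partition column last
}

Each pass re-reads the source, so caching records (or batching on a predicate the source can prune) may help; the point is only to bound how many partitions any single insert job has to track.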
On Sun, May 22, 2016 at 12:30 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> By partition, do you mean 14,000 files loaded in each batch session (say daily)?
>
> Have you actually tested this?
>
> Dr Mich Talebzadeh
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> http://talebzadehmich.wordpress.com
>
> On 22 May 2016 at 20:24, swetha kasireddy <swethakasire...@gmail.com> wrote:
>
>> The data is not very big, say 1 MB to 10 MB at most per partition. What is the best way to insert these 14k partitions with decent performance?
>>
>> On Sun, May 22, 2016 at 12:18 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>
>>> The acid question is: how many rows are you going to insert in a batch session? By the way, if this is purely a SQL operation, then you can do all of it in Hive running on the Spark engine. It will be very fast as well.
>>>
>>> On 22 May 2016 at 20:14, Jörn Franke <jornfra...@gmail.com> wrote:
>>>
>>>> 14,000 partitions seem to be far too many to be performant (except for large data sets). How much data does one partition contain?
>>>>
>>>>> On 22 May 2016, at 09:34, SRK <swethakasire...@gmail.com> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> In my Spark SQL query to insert data, I have around 14,000 partitions of data, which seems to be causing memory issues. How can I insert the data for 100 partitions at a time to avoid any memory issues?
>>>>>
>>>>> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-insert-data-for-100-partitions-at-a-time-using-Spark-SQL-tp26997.html
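On the suggestion above to do this purely in SQL on the Spark engine: the same batched insert can be expressed as a dynamic-partition INSERT run through the HiveContext. A hedged sketch, reusing sqlContext and the hypothetical staging_userRecords from the earlier snippet, with the date range standing in for one batch of partitions:

// One batch expressed as a dynamic-partition INSERT. The max-partitions
// settings matter because Hive's default caps (1000 per query, 100 per
// node) are well below ~14,000.
sqlContext.sql("SET hive.exec.max.dynamic.partitions = 20000")
sqlContext.sql("SET hive.exec.max.dynamic.partitions.pernode = 20000")

sqlContext.sql(
  """INSERT INTO TABLE userRecords PARTITION (dtPartitioner)
    |SELECT * FROM staging_userRecords
    |WHERE dtPartitioner >= '2012-09-01' AND dtPartitioner < '2012-10-01'
  """.stripMargin) // assumes dtPartitioner is the last column of the staging table

Whether this runs faster than the DataFrame route depends on the engine underneath; the batching idea is the same either way.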