Re: How to insert data for 100 partitions at a time using Spark SQL

swetha kasireddy Sun, 22 May 2016 12:25:58 -0700

The data is not very big: say 1 MB to 10 MB at most per partition. What is
the best way to insert these 14k partitions with decent performance?
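
For reference, the "100 partitions at a time" idea from the original question could be sketched roughly as below. This is only a minimal sketch under assumptions, not a tested recipe: the table names (target_table, staging_table) and the partition column (part_key) are hypothetical, the partition keys are assumed to be plain strings, and the partition column is assumed to be the last column of the source so that SELECT * lines up with a dynamic-partition insert. The code only builds the SQL text, one statement per batch of keys.

```python
from typing import Iterator, List, Sequence


def batches(values: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `values` holding at most `size` items each."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(values), size):
        yield list(values[start:start + size])


def insert_statement(target: str, source: str, column: str,
                     keys: Sequence[str]) -> str:
    """Build one dynamic-partition INSERT restricted to the given partition keys.

    All names passed in are hypothetical placeholders. Keys containing a
    single quote are rejected rather than escaped, to keep the sketch simple.
    """
    for key in keys:
        if "'" in key:
            raise ValueError("unsupported character in partition key: " + key)
    quoted = ", ".join("'" + key + "'" for key in keys)
    # Assumes `column` is the last column of `source`, as a dynamic-partition
    # insert expects the partition column at the end of the SELECT list.
    return (
        "INSERT INTO TABLE " + target + " PARTITION (" + column + ") "
        "SELECT * FROM " + source + " WHERE " + column + " IN (" + quoted + ")"
    )


def batched_inserts(target: str, source: str, column: str,
                    keys: Sequence[str], size: int = 100) -> List[str]:
    """Return one INSERT statement per batch of `size` partition keys."""
    return [insert_statement(target, source, column, group)
            for group in batches(keys, size)]
```

With 14,000 keys and a batch size of 100 this gives 140 statements. Each one could then be run in turn through the SQL context's sql method, so that a single statement only writes about 100 partitions. Dynamic-partition inserts usually also need hive.exec.dynamic.partition.mode set to nonstrict. Whether this actually relieves the memory pressure depends on the job, so it is worth trying on a small subset first.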


On Sun, May 22, 2016 at 12:18 PM, Mich Talebzadeh <mich.talebza...@gmail.com
> wrote:

> The acid question is: how many rows are you going to insert in a batch
> session? By the way, if this is purely an SQL operation, you can do all of
> that in Hive running on the Spark engine. It will be very fast as well.
>
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> On 22 May 2016 at 20:14, Jörn Franke <jornfra...@gmail.com> wrote:
>
>> 14000 partitions seem to be way too many to be performant (except for
>> large data sets). How much data does one partition contain?
>>
>> > On 22 May 2016, at 09:34, SRK <swethakasire...@gmail.com> wrote:
>> >
>> > Hi,
>> >
>> > In my Spark SQL query to insert data, I have around 14,000 partitions of
>> > data, which seems to be causing memory issues. How can I insert the data
>> > for 100 partitions at a time to avoid any memory issues?
>> >
>> >
>> >
>> > --
>> > View this message in context:
>> > http://apache-spark-user-list.1001560.n3.nabble.com/How-to-insert-data-for-100-partitions-at-a-time-using-Spark-SQL-tp26997.html
>> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> > For additional commands, e-mail: user-h...@spark.apache.org
>> >
>>
>>
>
