So, if I insert 1000 records at a time and the next 1000 records contain rows that fall into the same partition as the previous batch, that partition's data will be overwritten. How can I prevent overwriting valid data in this case? Could you post the example you are talking about?
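For what it's worth, here is a minimal sketch of the append-style alternative (it assumes a HiveContext and the users/recordsTemp tables from the query quoted further down): INSERT OVERWRITE with dynamic partitions replaces every partition the statement writes to, while INSERT INTO appends to it, so a later batch that lands in the same partition does not wipe the earlier rows.

  // Sketch only: append to partitions instead of overwriting them.
  // Table and column names are assumed from the query quoted below.
  sqlContext.sql("SET hive.exec.dynamic.partition=true")
  sqlContext.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

  // INSERT INTO adds rows to an existing partition; a second batch that lands
  // in the same (datePartition, idPartition) no longer replaces the first.
  sqlContext.sql(
    """INSERT INTO TABLE users PARTITION (datePartition, idPartition)
      |SELECT id, record, datePartition, idPartition
      |FROM recordsTemp""".stripMargin)

The trade-off is that appending duplicates rows if the same records are re-sent, so it only fits the case where each batch carries genuinely new data for a partition.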
What I am doing is, in the final insert into the ORC table, I insert/overwrite the data. So I need a way to insert all the data related to one partition at a time, so that it is not overwritten when I insert the next set of records.

On Sun, May 22, 2016 at 11:51 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> OK, is the staging table used for staging only?
>
> You can create a staging *directory* where you put your data (you can put
> 100s of files there) and do an insert/select that will take the data from
> those files into your main ORC table.
>
> I have an example of an insert/select of 100's of CSV files from a staging
> external table into an ORC table.
>
> My point is that you are more likely interested in doing analysis on the
> ORC table (read: internal) rather than on the staging table.
>
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> On 22 May 2016 at 19:43, swetha kasireddy <swethakasire...@gmail.com> wrote:
>
>> But how do I take 100 partitions at a time from the staging table?
>>
>> On Sun, May 22, 2016 at 11:26 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>
>>> OK, so you still keep the data as ORC in Hive for further analysis.
>>>
>>> What I have in mind is to have an external table as the staging table
>>> and do an insert into an ORC internal table which is bucketed and
>>> partitioned.
>>>
>>> On 22 May 2016 at 19:11, swetha kasireddy <swethakasire...@gmail.com> wrote:
>>>
>>>> I am looking at ORC. I insert the data using the following query.
>>>>
>>>> sqlContext.sql(
>>>>   """CREATE EXTERNAL TABLE IF NOT EXISTS records (id STRING, record STRING)
>>>>     |PARTITIONED BY (datePartition STRING, idPartition STRING)
>>>>     |STORED AS ORC LOCATION '/user/users'
>>>>     |TBLPROPERTIES ("orc.compress"="SNAPPY")""".stripMargin)
>>>>
>>>> sqlContext.sql(
>>>>   """FROM recordsTemp ps
>>>>     |INSERT OVERWRITE TABLE users PARTITION (datePartition, idPartition)
>>>>     |SELECT ps.id, ps.record, ps.datePartition, ps.idPartition""".stripMargin)
>>>>
>>>> On Sun, May 22, 2016 at 12:37 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>
>>>>> Where is your base table, and what format is it (Parquet, ORC, etc.)?
>>>>>
>>>>> On 22 May 2016 at 08:34, SRK <swethakasire...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> In my Spark SQL query to insert data, I have around 14,000 partitions
>>>>>> of data, which seems to be causing memory issues. How can I insert
>>>>>> the data for 100 partitions at a time to avoid any memory issues?
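On the "100 partitions at a time" question, one possible sketch (not from the thread) is to collect the distinct partition values from the staging table, split them into groups, and run one insert per group, so each statement only materialises a bounded number of partitions. The batchSize value and the grouping by datePartition are illustrative assumptions; the table and column names are the ones used above.

  // Sketch only: write a bounded number of partitions per statement.
  // Assumes a HiveContext (sqlContext) and the recordsTemp/users tables above.
  val batchSize = 100  // illustrative; tune to what the cluster tolerates

  // Distinct date partitions present in the staging data.
  val dates = sqlContext.sql("SELECT DISTINCT datePartition FROM recordsTemp")
    .map(_.getString(0))
    .collect()

  dates.grouped(batchSize).foreach { group =>
    val inList = group.map(d => s"'$d'").mkString(", ")
    // Each datePartition is written exactly once, with all of its rows in the
    // same statement, so the overwrite does not clobber earlier batches.
    sqlContext.sql(
      s"""INSERT OVERWRITE TABLE users PARTITION (datePartition, idPartition)
         |SELECT id, record, datePartition, idPartition
         |FROM recordsTemp
         |WHERE datePartition IN ($inList)""".stripMargin)
  }

With dynamic partitions, INSERT OVERWRITE only replaces the partitions that actually receive rows, so batching by datePartition is safe as long as all rows for a given datePartition arrive in the same batch.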