Re: How to delete Specific date data using hive QL?
Adding my two cents.

If you have an unpartitioned table and would like to partition it on specific columns of the source table, use a dynamic partition insert. That writes the source data into separate partitions of a partitioned target table.

http://kickstarthadoop.blogspot.com/2011/06/how-to-speed-up-your-hive-queries-in.html

Regards,
Bejoy KS

-----Original Message-----
From: Hamza Asad
Date: Tue, 4 Jun 2013 12:52:49
Reply-To: user@hive.apache.org
Subject: Re: How to delete Specific date data using hive QL?

Thank u s much nitin for your help.. :)
[...]
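A minimal sketch of the dynamic partition insert mentioned in this message, assuming a hypothetical unpartitioned source table `events` with columns `user_id`, `action` and `event_date`, and a hypothetical target table `events_by_date`. Only the column name `event_date` comes from the thread; everything else is made up for illustration.

```sql
-- Hypothetical tables and columns; only event_date appears in the thread.
-- Let Hive create partitions from the values it finds in the data.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Partitioned target table: the partition column is declared in
-- PARTITIONED BY, not in the regular column list.
CREATE TABLE events_by_date (
  user_id BIGINT,
  action  STRING
)
PARTITIONED BY (event_date STRING);

-- The dynamic partition column must come last in the SELECT list.
INSERT OVERWRITE TABLE events_by_date PARTITION (event_date)
SELECT user_id, action, event_date
FROM events;
```

Each distinct `event_date` value ends up in its own partition directory, so a table with a very large number of distinct dates may need the dynamic partition limits raised before this runs.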
Re: How to delete Specific date data using hive QL?
Thank you so much, Nitin, for your help. :)

On Tue, Jun 4, 2013 at 12:18 PM, Nitin Pawar wrote:
> 1- Does partitioning improve performance?
> --Only if you make use of partitions in your queries (mostly in where
> clause to limit data to your query for a specific value of partitioned
> column)
> [...]

--
*Muhammad Hamza Asad*
Re: How to delete Specific date data using hive QL?
1- Does partitioning improve performance?
Only if your queries make use of the partitions, typically by filtering on the partition column in the WHERE clause so that the query reads only the data for specific partition values.

2- Do I have to create a new partitioned table, or can I create partitions on the existing table by renaming the date column and adding a partition column event_date (the actual column name)?
You cannot create partitions on already existing data unless the data is already laid out in partitioned directories on HDFS. I would recommend that you:
- create a new table with the partition columns
- load the data from the old table into the partitioned table
- drop the old table

3- Can I import data directly into a partitioned table using the sqoop command?
Yes, you can import data directly into a partition.

For the exported data, you don't have to worry. It remains as it is.

On Tue, Jun 4, 2013 at 12:41 PM, Hamza Asad wrote:
> No i don't want to change my queries. I want that my queries work on same
> table and partition does not change its schema.
> [...]

--
Nitin Pawar
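The first answer above (partitioning helps only when the query filters on the partition column) could be illustrated as follows. This is a sketch under assumptions: `events_by_date` is a hypothetical table partitioned by `event_date`, and `action` is a hypothetical regular column.

```sql
-- Hypothetical table partitioned by event_date.
-- Filtering on the partition column lets Hive read only the
-- directory of the matching partition.
SELECT action, COUNT(*) AS hits
FROM events_by_date
WHERE event_date = '2013-06-03'
GROUP BY action;

-- A filter on a regular column alone still scans every partition,
-- so partitioning gives no benefit for this query.
SELECT COUNT(*)
FROM events_by_date
WHERE action = 'login';

-- List the partitions that currently exist.
SHOW PARTITIONS events_by_date;
```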
Re: How to delete Specific date data using hive QL?
No, I don't want to change my queries. I want my queries to keep working on the same table, and I don't want partitioning to change its schema. By schema I mean the schema on MySQL (the exported data).

A few more things:
1- Does partitioning improve performance?
2- Do I have to create a new partitioned table, or can I create partitions on the existing table by renaming the date column and adding a partition column event_date (the actual column name)?
3- Can I import data directly into a partitioned table using the sqoop command?

On Tue, Jun 4, 2013 at 11:40 AM, Nitin Pawar wrote:
> partitioning of data in hive is more for the reasons on how you layout
> data in a well defined manner so that when you access your data , you
> request only for specific data by specifying the partition columns in where
> clause.
> [...]

--
*Muhammad Hamza Asad*
Re: How to delete Specific date data using hive QL?
Partitioning in Hive is mainly about laying out your data in a well-defined manner so that, when you access it, you request only specific data by specifying the partition columns in the WHERE clause.

To answer your questions:
- Do you have to change your queries? Out of the box, the queries should work as they are, unless you change the table schema by removing or adding columns.
- Does the format change when you export data? If your SELECT statement does not change, the format will not change.
- Will the table schema change? Do you mean the schema on Hive or on MySQL?

On Tue, Jun 4, 2013 at 11:37 AM, Hamza Asad wrote:
> thats far more better :) ..
> Please tell me few more things. Do i have to change my query if i create
> table with partition on date?
> [...]

--
Nitin Pawar
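One way to keep existing queries working against the same table name, sketched under assumptions: `events` is the hypothetical original unpartitioned table and `events_by_date` is a hypothetical partitioned copy that has already been loaded and verified.

```sql
-- Hypothetical names. Swap the tables so existing queries keep
-- using the name "events".
ALTER TABLE events RENAME TO events_old;
ALTER TABLE events_by_date RENAME TO events;

-- A query written for the old table still works: the partition
-- column can be selected and filtered like any other column.
SELECT user_id, action, event_date
FROM events
WHERE event_date = '2013-06-03';

-- Partition columns are listed after the regular columns.
DESCRIBE events;
```

Because the partition column moves to the end of the column list, a query or export that relies on `SELECT *` column order may see `event_date` in a different position; naming the columns explicitly avoids that.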
Re: How to delete Specific date data using hive QL?
That's much better. :)

Please tell me a few more things. Do I have to change my queries if I create the table with a partition on date? Will the rest of the columns stay the same? Also, if I export that partitioned table to MySQL, will the schema of the exported table be the same as it was before partitioning?

On Tue, Jun 4, 2013 at 12:09 AM, Stephen Sprague wrote:
> there is no delete semantic.
> [...]

--
*Muhammad Hamza Asad*
Re: How to delete Specific date data using hive QL?
There is no delete semantic.

You either partition on the data you want to drop and use DROP PARTITION (or DROP TABLE for the whole shebang), or you do as Nitin suggests: select the inverse of the data you want to delete and store it back into the table itself. Not ideal, but maybe it could work for your situation.

Now here's another idea. This was just _recently_ discussed on this group, as coincidence would have it. If you had scanned just a few of the group's messages, you would have seen that and could then have added to the discussion! :)

On Mon, Jun 3, 2013 at 2:19 AM, Hamza Asad wrote:
> Thanx for your response nitin. Anybody else have any better solution?
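The DROP PARTITION route described in this message might look like the sketch below, assuming a hypothetical table `events_by_date` partitioned by a string column `event_date`; the dates are placeholders.

```sql
-- Hypothetical table partitioned by event_date.
-- Remove the data for a single date.
ALTER TABLE events_by_date
  DROP IF EXISTS PARTITION (event_date='2013-06-01');

-- Remove several dates in one statement.
ALTER TABLE events_by_date DROP IF EXISTS
  PARTITION (event_date='2013-06-02'),
  PARTITION (event_date='2013-06-03');
```

For a managed table this deletes the partition's files as well as its metadata; for an external table only the metadata is removed and the files stay on HDFS.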
Re: How to delete Specific date data using hive QL?
Thanks for your response, Nitin. Does anybody else have a better solution?

On Mon, Jun 3, 2013 at 1:27 PM, Nitin Pawar wrote:
> hive does not give you a record level deletion as of now.
> [...]

--
*Muhammad Hamza Asad*
Re: How to delete Specific date data using hive QL?
Hive does not give you record-level deletion as of now.

So unless the table is partitioned, the other option is to overwrite the table with only the data you want to keep. Please wait for others to suggest more options. This one is just mine, and it can be costly too.

On Mon, Jun 3, 2013 at 12:36 PM, Hamza Asad wrote:
> no, its not partitioned by date.

--
Nitin Pawar
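A sketch of the overwrite approach suggested in this message, assuming a hypothetical unpartitioned table `events` with a string column `event_date`; the dates are placeholders for the ones to be removed.

```sql
-- Hypothetical unpartitioned table events.
-- Rewrite the table with every row except those for the unwanted dates.
-- The IS NULL branch keeps rows without a date, which a bare NOT IN
-- would silently drop.
INSERT OVERWRITE TABLE events
SELECT *
FROM events
WHERE event_date NOT IN ('2013-06-01', '2013-06-02')
   OR event_date IS NULL;
```

This is the costly part: the whole table is read and rewritten even when only a small share of the rows is removed, so it may be worth testing against a copy of the table first.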
Re: How to delete Specific date data using hive QL?
No, it's not partitioned by date.

On Mon, Jun 3, 2013 at 11:19 AM, Nitin Pawar wrote:
> how is the data laid out?
> is it partitioned data by the date?

--
*Muhammad Hamza Asad*
Re: How to delete Specific date data using hive QL?
How is the data laid out? Is it partitioned by date?

On Mon, Jun 3, 2013 at 11:20 AM, Hamza Asad wrote:
> Dear all,
> How can i remove data of specific dates from HDFS using hive
> query language?

--
Nitin Pawar
How to delete Specific date data using hive QL?
Dear all,

How can I remove the data for specific dates from HDFS using the Hive query language?

--
*Muhammad Hamza Asad*