Re: How Can I store the Hive query result in one file ?
Would

    hive -e "query" > filename

or

    hive -f query.q > filename

do? Or do you specifically want the result written to a named file on HDFS only?

On Thu, Jul 4, 2013 at 3:12 PM, Matouk IFTISSEN matouk.iftis...@ysance.com wrote:

Hello Hive users,

Is there a way to store the result of a Hive query (SELECT * ...) in a single, specific file (with a given file name), in the way that (INSERT OVERWRITE LOCAL DIRECTORY '/directory_path_name/') stores it in a directory?

Thanks for your answers

-- Nitin Pawar
Re: How Can I store the Hive query result in one file ?
The question is what the volume of your output is. There is one file per output task (map or reduce) because that way each task can write its output independently and in parallel. That's how MapReduce works. Except by forcing the number of tasks to 1, there is no certain way to get a single output file. But if the volume is low enough, you could also capture the standard output into a local file, as Nitin described.

Bertrand

On Thu, Jul 4, 2013 at 12:38 PM, Nitin Pawar nitinpawar...@gmail.com wrote: [...]

-- Bertrand Dechoux
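For small results, the stdout capture mentioned above can be sketched as a small shell helper. This is only an illustration: the function name save_query_stdout and its arguments are made up for this example, and it assumes the hive CLI is on the PATH. The -S flag (silent mode) and -e flag (inline query) are standard Hive CLI options.

```shell
# save_query_stdout QUERY LOCAL_FILE
# Hypothetical helper: runs QUERY through the Hive CLI and redirects the
# rows printed on stdout into LOCAL_FILE on the local filesystem.
save_query_stdout() {
  query="$1"
  local_file="$2"
  # -S keeps the CLI quiet; the query result is what ends up on stdout.
  hive -S -e "$query" > "$local_file"
}
```

It would be called as, for example, save_query_stdout "SELECT * FROM customers" /tmp/customers.txt, where both the table and the path are placeholders. The output file name is chosen by the caller, but the file lives on the local disk, not on HDFS.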
Re: How Can I store the Hive query result in one file ?
I have found that for output larger than a few GB, redirecting stdout results in an incomplete file. For very large output, I do CREATE TABLE MYTABLE AS SELECT ... and then copy the resulting HDFS files directly out of /user/hive/warehouse.

From: Bertrand Dechoux decho...@gmail.com
To: user@hive.apache.org
Sent: Thursday, July 4, 2013 7:09 AM
Subject: Re: How Can I store the Hive query result in one file ?
[...]
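The CREATE TABLE ... AS SELECT route can be sketched roughly as follows. This is an assumption-laden illustration, not the poster's exact procedure: the function name export_query_result is invented here, the table is assumed to live in the default database under /user/hive/warehouse, and hadoop fs -getmerge is used as one way to copy all of the table's files into a single local file.

```shell
# export_query_result TABLE QUERY LOCAL_FILE
# Hypothetical helper: materialises QUERY as a Hive table, then merges the
# table's HDFS files into one local file.
export_query_result() {
  table="$1"
  query="$2"
  local_file="$3"
  # Step 1: let Hive write the result as a managed table.
  hive -e "CREATE TABLE ${table} AS ${query}" || return 1
  # Step 2: concatenate every file in the table directory into LOCAL_FILE.
  # The warehouse path is the default location and may differ per cluster.
  hadoop fs -getmerge "/user/hive/warehouse/${table}" "$local_file"
}
```

Because the files are copied out of HDFS instead of being streamed through the CLI's stdout, this avoids the truncation described above, at the cost of leaving a table behind that has to be dropped afterwards.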
Re: How Can I store the Hive query result in one file ?
Thanks for your responses. In effect, Bertrand's answer makes this possible: the Hive properties below force the job to write the Hive result into one file, without specifying its name (_0):

    set hive.exec.reducers.max = 1;
    set mapred.reduce.tasks = 1;

For Nitin: I want to store the results of the SELECT, not the stdout (log) of the query execution. Is your suggestion applicable to the results of the SELECT?

2013/7/4 Michael Malak michaelma...@yahoo.com [...]
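Put together, the two properties and the directory export from the original question might look like the script below. It is a sketch under assumptions: some_table is a placeholder table name, the target directory is the one quoted in the question, and the script only calls hive if the CLI is actually installed. One caveat, offered as an understanding and not a guarantee: these settings limit the number of reducers, so a query that runs as a map-only job may still produce one file per map task.

```shell
# Write the settings and the export statement into a script for `hive -f`.
query_file="$(mktemp)"
cat > "$query_file" <<'EOF'
-- Force a single reducer so the reduce stage writes a single file.
set hive.exec.reducers.max = 1;
set mapred.reduce.tasks = 1;
-- some_table is a hypothetical table name used for illustration only.
INSERT OVERWRITE LOCAL DIRECTORY '/directory_path_name/'
SELECT * FROM some_table;
EOF

# Run the script only when the Hive CLI is available on this machine.
if command -v hive >/dev/null 2>&1; then
  hive -f "$query_file"
else
  echo "hive not found; query script left in $query_file"
fi
```

The file name inside the directory is still chosen by Hive; renaming it afterwards is a separate step.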
Re: How Can I store the Hive query result in one file ?
The approach I described does not work on HDFS files. It is just one way to write the stdout to a file. I am not sure whether Hive allows named files for output, and the above settings will make your query run really slowly if you have a large dataset.

If you really need a specific file name, then for now I am not aware that Hive supports it. I did a quick search but did not find anything useful. If you need a quick way to get to your solution, Pig supports the store function, and its output is written to a named file.

I will search in more depth and see whether there is anything in Hive's configuration.

On Thu, Jul 4, 2013 at 8:50 PM, Matouk IFTISSEN matouk.iftis...@ysance.com wrote: [...]

-- Nitin Pawar
Re: How Can I store the Hive query result in one file ?
Normally, if you use set mapred.reduce.tasks=1, you get one output file. You can also look at hive.merge.mapfiles, mapred.reduce.tasks, and hive.merge.mapredfiles. You can also use a separate tool: https://github.com/edwardcapriolo/filecrush

On Thu, Jul 4, 2013 at 6:38 AM, Nitin Pawar nitinpawar...@gmail.com wrote: [...]
Re: How Can I store the Hive query result in one file ?
hive> set hive.io.output.fileformat=CSVTextFile;
hive> insert overwrite local directory '/usr/home/hadoop/da1/' select * from customers;

*** customers is a Hive table

From: Edward Capriolo edlinuxg...@gmail.com
To: user@hive.apache.org
Sent: Friday, July 5, 2013 12:10 AM
Subject: Re: How Can I store the Hive query result in one file ?
[...]
Re: How Can I store the Hive query result in one file ?
Adding to that: multiple files from the directory can be concatenated into one. Example:

    cat 0-0 00-1 0-2 > final

From: Raj Hadoop hadoop...@yahoo.com
To: user@hive.apache.org; matouk.iftis...@ysance.com
Sent: Friday, July 5, 2013 12:17 AM
Subject: Re: How Can I store the Hive query result in one file ?
[...]
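The concatenation step can be tried without a cluster. The sketch below fakes an export directory with three small part files and merges them in name order. The file names 000000_0, 000001_0 and 000002_0 and their contents are invented for the demo; the real names depend on what Hive wrote into the local directory.

```shell
# Stand-in for the directory filled by INSERT OVERWRITE LOCAL DIRECTORY.
out_dir="$(mktemp -d)"
printf '1,alice\n' > "$out_dir/000000_0"
printf '2,bob\n'   > "$out_dir/000001_0"
printf '3,carol\n' > "$out_dir/000002_0"

# The shell expands the glob in sorted order, so the parts are appended in
# name order. The merged file is written outside the directory so that it
# is not picked up by the glob itself.
final="$(mktemp)"
cat "$out_dir"/* > "$final"

cat "$final"
```

Writing the merged file into the same directory as the parts works too, as long as its name is not matched by the pattern given to cat.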