Re: Hive and Lzo Compression
Thanks for your replies and the link. I got it working, but I still wonder why the CREATE TABLE statement also worked without the STORED AS clause; that puzzles me a bit. I will use the STORED AS clause anyway, to be on the safe side.

From: Lefty Leverenz leftylever...@gmail.com
To: user@hive.apache.org
CC: w00t w00t w00...@yahoo.de
Sent: Saturday, August 10, 2013, 19:06
Subject: Re: Hive and Lzo Compression

I'm not seeing any documentation link in Sanjay's message, so here it is again (in the Hive wiki's language manual): https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LZO.
-- Lefty

On Thu, Aug 8, 2013 at 3:30 PM, Sanjay Subramanian sanjay.subraman...@wizecommerce.com wrote:

Please refer to this documentation here. Let me know if you need more clarification so that we can make this document better and more complete.
Thanks
sanjay

From: w00t w00t w00...@yahoo.de
Reply-To: user@hive.apache.org, w00t w00t w00...@yahoo.de
Date: Thursday, August 8, 2013 2:02 AM
To: user@hive.apache.org
Subject: Hive and Lzo Compression

Hello, I have started running Hive with LZO compression on Hortonworks 1.2. I have managed to install and configure LZO, and hive -e "set io.compression.codecs" shows me the LZO codecs:

    io.compression.codecs=
      org.apache.hadoop.io.compress.GzipCodec,
      org.apache.hadoop.io.compress.DefaultCodec,
      com.hadoop.compression.lzo.LzoCodec,
      com.hadoop.compression.lzo.LzopCodec,
      org.apache.hadoop.io.compress.BZip2Codec

However, I have some questions and would be happy if you could help me.

(1) CREATE TABLE statement

I read in several postings that I have to use the following storage clause in the CREATE TABLE statement:

    CREATE EXTERNAL TABLE txt_table_lzo ( txt_line STRING )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ''
    STORED AS INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
    LOCATION '/user/myuser/data/in/lzo_compressed';

SELECT statements on this table now run without any problems on the LZO data. However, I also created a table on the same data without this storage clause:

    CREATE EXTERNAL TABLE txt_table_lzo_tst ( txt_line STRING )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ''
    LOCATION '/user/myuser/data/in/lzo_compressed';

The interesting thing is that a SELECT statement on this table works as well. Can you help me understand why the second CREATE TABLE statement also works? What should I use in DDLs? Is it best practice to use the STORED AS clause with DeprecatedLzoTextInputFormat, or should I remove it?

(2) Output and intermediate compression settings

I want to use output compression. In Programming Hive by Capriolo, Wampler and Rutherglen, the following commands are recommended:

    SET hive.exec.compress.output=true;
    SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;

However, in some forum posts I found the following recommended settings:

    SET hive.exec.compress.output=true;
    SET mapreduce.output.fileoutputformat.compress=true;
    SET mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzopCodec;

Am I right that the first settings are for Hadoop versions prior to 0.23? Or is there another reason why the settings differ? I am using Hadoop 1.1.2 with Hive 0.10.0. Which settings would you recommend?

I also want to compress intermediate results. Again, Programming Hive recommends the following settings:

    SET hive.exec.compress.intermediate=true;
    SET mapred.map.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;

Is this the right setting? Or should I again use these settings (which look more valid for Hadoop 0.23 and greater)?

    SET hive.exec.compress.intermediate=true;
    SET mapreduce.map.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;

Thanks

CONFIDENTIALITY NOTICE == This email message and any attachments are for the exclusive use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message along with any attachments, from your computer system. If you are the intended recipient, please be advised that the content of this message is subject to access, review and disclosure by the sender's Email System Administrator.
Re: Nice hive notes and cheat sheets from Minwoo Kim
Thanks for mentioning my blog. I just uploaded a presentation about Hive hooks to my blog: http://julingks.tistory.com/entry/Apache-Hive-Hooks I hope it will be useful to anyone who wants to use Hive hooks. Thanks, Minwoo Kim

2013/8/2 Sanjay Subramanian sanjay.subraman...@wizecommerce.com:

http://julingks.tistory.com/category/Hive
Thanks
sanjay
LZO output compression
Hello, I am running Hortonworks 1.2 using Hadoop 1.1.2.21 and Hive 0.10.0.21. I set up LZO compression and can read LZO-compressed data without problems. My next step was to test output compression, so I created the following small script:

    SET hive.exec.compress.output=true;
    SET mapreduce.output.fileoutputformat.compress=true;
    SET mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzopCodec;
    DROP TABLE IF EXISTS simple_lzo;
    CREATE TABLE simple_lzo
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    AS SELECT count(*) FROM txt_table_lzo;

The output gets compressed, but with the default codec (deflate), not with LZO. Do you know what the problem could be and how I could debug it? There are no error messages. Additionally, I tried the properties for Hadoop 0.20:

    mapred.output.compress=true;
    mapred.map.output.compression.codec=com.hadoop.compression.lzo.LzopCodec

That didn't work either. In Pig or Java MapReduce, I have no problems generating LZO-compressed output. Thanks
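One hedged way to narrow this down from inside the Hive CLI is to print the values the session actually holds and then look at the extension of the files that were written. The sketch below assumes the table landed in the default warehouse directory /user/hive/warehouse; that path is an assumption, so check hive.metastore.warehouse.dir on your cluster.

```sql
-- SET with a property name and no value only displays the current setting.
SET hive.exec.compress.output;
SET mapred.output.compression.codec;

-- List the files behind the table.
-- ASSUMPTION: the default warehouse location; adjust the path for your cluster.
dfs -ls /user/hive/warehouse/simple_lzo;
```

Files ending in .deflate point to DefaultCodec, while LzopCodec writes files ending in .lzo, so the listing shows which codec was really applied.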
Re: LZO output compression
Oh, I got it working using these settings:

    SET hive.exec.compress.output=true;
    SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;

But I have one question that maybe one of you can explain: since I am running Hadoop 1.1.*, why do I need the old Hadoop 0.20 property?

    SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;

I assumed the settings for the newer Hadoop versions were:

    SET hive.exec.compress.output=true;
    SET mapreduce.output.fileoutputformat.compress=true;
    SET mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzopCodec;
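A likely explanation, offered as background rather than something stated in this thread: Hadoop 1.x descends from the 0.20 line, so it still reads the mapred.* property names, while the mapreduce.* names belong to the later 0.21/0.23 line. A minimal sketch of the settings for a Hadoop 1.x cluster follows; the first pair is the one reported as working above, and the second pair is taken from the Programming Hive recommendation quoted earlier and is untested here.

```sql
-- Final job output: property names read by Hadoop 1.x
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;

-- Intermediate (map) output: a different property from the one above
SET hive.exec.compress.intermediate=true;
SET mapred.map.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
```

Note that mapred.map.output.compression.codec only governs map output, which would be consistent with the earlier attempt leaving the final output in deflate.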
Re: Numbers display in Hive CLI
Well... a good ol' search (let's not use the word google) for "hive udf" turns up this: https://cwiki.apache.org/Hive/languagemanual-udf.html#LanguageManualUDF-StringFunctions and there's a reference to a function called format_number(). Or did you really want the *hive CLI* to format the number? If that's the case, then no, there is no option for that in the Hive client.

On Mon, Aug 12, 2013 at 11:30 PM, pandees waran pande...@gmail.com wrote:

Hi, I see that SUM(double_column) displays its result in scientific notation in the Hive CLI. Is there any way to customize the number display in the Hive CLI?
-- Thanks, Pandeeswaran
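As a small sketch of the format_number() suggestion: the table name my_table is a placeholder invented for illustration, and double_column is the column name used in the question. The function is documented as available from Hive 0.10.0.

```sql
-- my_table is a hypothetical table name.
-- format_number(x, d) returns a STRING with grouping separators,
-- rounded to d decimal places, so the sum is no longer shown in scientific notation.
SELECT format_number(SUM(double_column), 2) FROM my_table;
```

Because the result is a string, it is best applied in the outermost SELECT, after any arithmetic on the value is done.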
Passing mapreduce configuration parameters to hive udf
Hi there, I'm trying to pass some external properties to a UDF. In the MapReduce world I'm used to extending Configured in my classes, but in my UDF class when initializing a new Configuration object or HiveConf object it doesn't inherit any of those properties. I see it in the Job Configuration XML when the job runs but my UDF can't pick it up when it creates a new instance. Are there any other suggested ways of doing this? I could probably just add some conf file to distributed cache and load the properties on UDF initialization, but I figured I could get at the configuration through other means. Thanks in advance, Jon
Re: Numbers display in Hive CLI
Yeah. I would think it'd be a useful feature to have in the client, but probably not in the Hive CLI client. The Hive client seems pretty bare-bones, and my guess is it'll probably stay that way. The Beeline client, however, looks to be where these kinds of bells and whistles probably could/should be added. Check that app out and see if you agree (search for "hive beeline").

On Tue, Aug 13, 2013 at 9:47 AM, pandees waran pande...@gmail.com wrote:

Thanks Stephen! I shall check this. My requirement is to control the formatting at the session level by setting some properties. It looks like there is no such option as of now. Would this be a good feature for the Hive CLI? If many people think so, I can file a feature request.
Re: Passing mapreduce configuration parameters to hive udf
Hi Jon, Please refer to the following document: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select#LanguageManualSelect-REGEXColumnSpecification Hope this helps. Thanks -Abdelrahman
Re: Numbers display in Hive CLI
Sure, let me explore the Hive Beeline client.
Getting the raw hive column types from HiveResultSetMetaData
Hi all, I'd like to inspect the ResultSet returned by Hive JDBC and be able to reconstruct the complex types (i.e. map, struct and array). However, ResultSet and ResultSetMetaData only return the column type as string for these complex types, which makes it impossible to distinguish an array [1,2,3] from an actual string value [1,2,3].

Questions

1. I saw in the source code that HiveResultSetMetaData holds the raw column types but does not expose them (http://svn.apache.org/viewvc/hive/branches/HIVE-4115/jdbc/src/java/org/apache/hive/jdbc/HiveResultSetMetaData.java?view=markup). Can you add a method that returns the raw column type (e.g. map<string, string>)? (By the way, I'm new to this mailing list, so please let me know if this should be sent to the dev mailing list.)

2. Is there any other way to achieve what I want to do? Of course, you can inspect your query, but I was hoping the ResultSet could be self-sufficient.

Thanks, Mingyu
Re: Hive and Lzo Compression
Hi, I think the CREATE TABLE without the STORED AS clause will not give any errors while creating the table. However, when you query that table, since it contains .lzo files, you would get errors. With external tables, you are separating the table creation (definition) from the data, so only when the table is queried might Hive report errors. LZO compression rocks! I am so glad I used it in our projects here. Regards sanjay
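Since the original poster reported that SELECT did work on the table created without STORED AS, one way to see what that table actually uses is to compare the formats recorded in the metastore. This is a sketch, not a confirmed diagnosis; the table names are the ones from the original question.

```sql
-- Shows, among other details, the InputFormat and OutputFormat stored for each table.
DESCRIBE FORMATTED txt_table_lzo;
DESCRIBE FORMATTED txt_table_lzo_tst;
```

If the second table reports the plain text input format, a plausible reading (an assumption, not verified in this thread) is that the .lzo files are decompressed via the codec registered in io.compression.codecs, matched by file extension. The LZO-specific input format remains the one that can use LZO index files to split large files, which is a reason to keep the STORED AS clause.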
Re: LZO output compression
Check this class, where these are defined: http://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.1/src/mapred/org/apache/hadoop/mapreduce/lib/output/FileOutputFormat.java
Hiveserver2 Beeline command clarification
Hi guys, I just hooked up HiveServer2 to LDAP. In Beeline I realized you can log in like the following (no need to specify org.apache.hive.jdbc.HiveDriver):

    beeline> !connect jdbc:hive2://dev-thdp5:1 sanjay.subraman...@wizecommerce.com
    scan complete in 2ms
    Connecting to jdbc:hive2://dev-thdp5:1
    Enter password for jdbc:hive2://dev-thdp5:1:
    Connected to: Hive (version 0.10.0)
    Driver: Hive (version 0.10.0-cdh4.3.0)
    Transaction isolation: TRANSACTION_REPEATABLE_READ
    0: jdbc:hive2://dev-thdp5:1> show tables;
    +--------------------------+
    | tab_name                 |
    +--------------------------+
    | keyword_impressions_log  |
    +--------------------------+
    1 row selected (1.574 seconds)
    0: jdbc:hive2://dev-thdp5:1>

If this is also a correct way to use Beeline, then I actually prefer it, since the password is not visible.
sanjay