Re: How to write Block of queries in Hive?
Hello Aniket,

Thanks for the explanation. I have one more question. In SQL we can write multiple queries so that one query executes and passes its result to another query as input. Can we write something like that in Hive? I have also tried custom scripts in Hive, but I don't understand how to use them in a block of multiple queries.

Thanks and regards,
Bhavesh Shah

On Thu, Jan 5, 2012 at 11:43 AM, Aniket Mokashi aniket...@gmail.com wrote:

Hi Bhavesh,

[moving discussion to hive user list]

I suggest sending this discussion to the Hive user list in order to reach a broader audience.

As per my understanding, map_script and reduce_script in the query are custom scripts that run as streaming jobs. You are asking Hive to run map_script as the mapper on 2 columns (userid, date) to generate 3 new values: c1, c2, c3. After this, Hive distributes the records to reducers based on c2 and sorts them by c2 and then c1. reduce_script consumes these 3 columns and generates 2 columns (date, count) to store in pv_users_reduced.

Hope it helps.

Thanks,
Aniket

On Wed, Jan 4, 2012 at 8:55 PM, Bhavesh Shah bhavesh25s...@gmail.com wrote:

Hello,

I am new to Hive. I want to write a block of queries in Hive so that one query gives its result to another, as in SQL. I visited the link below:
http://karmasphere.com/ksc/hive-user-defined-functions.html

In that link I was looking for functions, but I found the block below, and I don't understand the USING 'map_script' and USING 'reduce_script' clauses in it:

  FROM (
    FROM pv_users
    MAP ( pv_users.userid, pv_users.date )
    USING 'map_script'
    AS c1, c2, c3
    DISTRIBUTE BY c2
    SORT BY c2, c1) map_output
  INSERT OVERWRITE TABLE pv_users_reduced
    REDUCE ( map_output.c1, map_output.c2, map_output.c3 )
    USING 'reduce_script'
    AS date, count;

Can anyone please tell me what the scripts are for and how to write a block of queries in Hive?

--
Regards,
Bhavesh Shah

--
...:::Aniket:::... Quetzalco@tl
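For readers unfamiliar with the streaming scripts discussed in this thread, here is a minimal sketch of what a map_script and reduce_script pair could look like. It assumes Python as the script language and assumes Hive's default streaming behavior of handing rows to the script as tab-separated lines on stdin and reading tab-separated lines back from stdout. The choice of c1 = userid, c2 = date, c3 = 1, and the function names map_rows and reduce_rows, are illustrative assumptions and do not come from the linked tutorial.

```python
from itertools import groupby


def map_rows(lines):
    """Mapper sketch: (userid, date) rows in, (c1, c2, c3) rows out.

    Assumption: c1 = userid, c2 = date, c3 = a constant 1 to be summed later.
    """
    for line in lines:
        userid, date = line.rstrip("\n").split("\t")
        yield "\t".join((userid, date, "1"))


def reduce_rows(lines):
    """Reducer sketch: (c1, c2, c3) rows in, (date, count) rows out.

    Relies on rows with the same c2 arriving next to each other, which is
    what DISTRIBUTE BY c2 SORT BY c2, c1 is meant to provide.
    """
    parsed = (line.rstrip("\n").split("\t") for line in lines)
    for date, group in groupby(parsed, key=lambda cols: cols[1]):
        total = sum(int(cols[2]) for cols in group)
        yield "\t".join((date, str(total)))


# Local dry run with made-up rows, no Hive involved.
sample = ["u1\t2012-01-01\n", "u2\t2012-01-01\n", "u3\t2012-01-02\n"]
for row in reduce_rows(map_rows(sample)):
    print(row)
```

In a real deployment each stage would probably live in its own executable file that applies the function to sys.stdin and writes the results to sys.stdout, and the files would need to be made available to the cluster before the query runs (for example with ADD FILE).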
[jira] [Updated] (HIVE-2682) Clean-up logs
[ https://issues.apache.org/jira/browse/HIVE-2682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HIVE-2682:
    Attachment: HIVE-2682.D1035.3.patch

rajat updated the revision "HIVE-2682 [jira] Clean-up logs".
Reviewers: JIRA, jsichi, jonchang, heyongqiang, njain

Incorporated suggestions from Dymtro (dms).

REVISION DETAIL
  https://reviews.facebook.net/D1035

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/JoinOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsAggregator.java
  ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsPublisher.java
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyStruct.java
  serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryStruct.java

Clean-up logs
    Key: HIVE-2682
    URL: https://issues.apache.org/jira/browse/HIVE-2682
    Project: Hive
    Issue Type: Wish
    Components: Logging
    Reporter: Rajat Goel
    Priority: Trivial
    Attachments: HIVE-2682.D1035.1.patch, HIVE-2682.D1035.2.patch, HIVE-2682.D1035.3.patch
    Original Estimate: 24h
    Remaining Estimate: 24h

Just wanted to clean up some logs being printed at the wrong log level:
1. org.apache.hadoop.hive.ql.exec.CommonJoinOperator prints "table 0 has 1000 rows for join key [...]" as WARNING. Is it really a warning?
2. org.apache.hadoop.hive.ql.exec.GroupByOperator prints "Hash Table completed flushed" and "Begin Hash Table flush at close: size = 21" as WARNING. It shouldn't be.
3. org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher prints "Warning. Invalid statistic.", which looks fishy.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-2693) Add DECIMAL data type
Add DECIMAL data type
    Key: HIVE-2693
    URL: https://issues.apache.org/jira/browse/HIVE-2693
    Project: Hive
    Issue Type: New Feature
    Components: Query Processor
    Reporter: Carl Steinbach

Add support for the DECIMAL data type. HIVE-2272 (TIMESTAMP) provides a nice template for how to do this.
[jira] [Commented] (HIVE-2694) Add FORMAT UDF
[ https://issues.apache.org/jira/browse/HIVE-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13180747#comment-13180747 ]

Carl Steinbach commented on HIVE-2694:

Ref: http://dev.mysql.com/doc/refman/5.1/en/string-functions.html#function_format

Formats the number X to a format like '#,###,###.##', rounded to D decimal places, and returns the result as a string. If D is 0, the result has no decimal point or fractional part.

  mysql> SELECT FORMAT(12332.123456, 4);
          -> '12,332.1235'
  mysql> SELECT FORMAT(12332.1,4);
          -> '12,332.1000'
  mysql> SELECT FORMAT(12332.2,0);
          -> '12,332'

Add FORMAT UDF
    Key: HIVE-2694
    URL: https://issues.apache.org/jira/browse/HIVE-2694
    Project: Hive
    Issue Type: New Feature
    Components: UDF
    Reporter: Carl Steinbach
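The FORMAT behavior quoted in the HIVE-2694 comment can be illustrated with a short Python sketch. This is not the Hive UDF (which did not exist yet when the issue was filed); format_number is a hypothetical name, D is assumed to be non-negative, and rounding half up is an assumption about how ties should be handled.

```python
from decimal import Decimal, ROUND_HALF_UP


def format_number(x, d):
    """Return x with thousands separators, rounded to d decimal places.

    Hypothetical helper mirroring the quoted FORMAT(X, D) description.
    Assumes d >= 0 and half-up rounding.
    """
    quantum = Decimal(1).scaleb(-d)  # 10 ** -d, e.g. 0.0001 for d = 4
    rounded = Decimal(str(x)).quantize(quantum, rounding=ROUND_HALF_UP)
    return f"{rounded:,.{d}f}"


# The three examples quoted from the MySQL manual.
print(format_number(12332.123456, 4))
print(format_number(12332.1, 4))
print(format_number(12332.2, 0))
```

Going through Decimal rather than float keeps the rounding step exact on the decimal digits as written, which matters for values that sit exactly on a rounding boundary.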
[jira] [Created] (HIVE-2694) Add FORMAT UDF
Add FORMAT UDF
    Key: HIVE-2694
    URL: https://issues.apache.org/jira/browse/HIVE-2694
    Project: Hive
    Issue Type: New Feature
    Components: UDF
    Reporter: Carl Steinbach
[jira] [Created] (HIVE-2695) Add PRINTF() Udf
Add PRINTF() Udf
    Key: HIVE-2695
    URL: https://issues.apache.org/jira/browse/HIVE-2695
    Project: Hive
    Issue Type: New Feature
    Components: UDF
    Reporter: Carl Steinbach
[jira] [Commented] (HIVE-2695) Add PRINTF() Udf
[ https://issues.apache.org/jira/browse/HIVE-2695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13180759#comment-13180759 ]

Carl Steinbach commented on HIVE-2695:

Add a PRINTF(String format, Obj... args) Udf that can format strings according to printf-style format strings.

Ref: http://docs.oracle.com/javase/1.5.0/docs/api/java/util/Formatter.html

Add PRINTF() Udf
    Key: HIVE-2695
    URL: https://issues.apache.org/jira/browse/HIVE-2695
    Project: Hive
    Issue Type: New Feature
    Components: UDF
    Reporter: Carl Steinbach
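To give a rough idea of what a PRINTF(format, args...) function would do, here is a small Python sketch. It uses Python's % operator as a stand-in for java.util.Formatter, which is an assumption made for illustration only: the two share common conversions such as %s, %d and %.2f, but Java's Formatter has features (for example argument indexes like %1$s) that Python's operator does not support. printf_sketch is a hypothetical name, not the proposed UDF.

```python
def printf_sketch(fmt, *args):
    """Format args according to a printf-style format string.

    Stand-in for the proposed PRINTF UDF; only conversions common to
    Python's % operator and java.util.Formatter are expected to behave alike.
    """
    return fmt % args


print(printf_sketch("%s has %d rows (%.1f%% of total)", "t1", 1000, 12.5))
```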
Hive-trunk-h0.21 - Build # 1185 - Still Failing
Changes for Build #1147
[namit] HIVE-2617 Insert overwrite table db.tname fails if partition already exists (Chinna Rao Lalam via namit)

Changes for Build #1148
[heyongqiang] HIVE-2651 [jira] The variable hive.exec.mode.local.auto.tasks.max should be changed (Namit Jain via Yongqiang He)
  Summary: HIVE-2651 It should be called hive.exec.mode.local.auto.input.files.max instead. The number of input files are checked currently.
  Test Plan: EMPTY
  Reviewers: JIRA, heyongqiang
  Reviewed By: heyongqiang
  CC: heyongqiang
  Differential Revision: 861
[cws] HIVE-727. Hive Server getSchema() returns wrong schema for 'Explain' queries (Prasad Mujumdar via cws)
[namit] HIVE-2611 Make index table output of create index command if index is table based (Kevin Wilfong via namit)

Changes for Build #1150
[jvs] HIVE-2657 [jira] builtins JAR is not being published to Maven repo hive-cli POM does not depend on it either (Carl Steinbach via John Sichi)
  Summary: Make hive-cli and hive-ql depend on hive-builtins
  Test Plan: EMPTY
  Reviewers: JIRA, jsichi
  Reviewed By: jsichi
  CC: jsichi
  Differential Revision: 897
[namit] HIVE-2654 hive.querylog.location requires parent directory to be exist or else folder creation fails (Chinna Rao Lalam via namit)

Changes for Build #1151
[hashutosh] HIVE-1892 : show functions also returns internal operators (Priyadarshini via Ashutosh Chauhan)

Changes for Build #1152

Changes for Build #1153
[namit] HIVE-2660 Need better exception handling in RCFile tolerate corruptions mode (Ramkumar Vadali via namit)

Changes for Build #1154
[cws] HIVE-2631. Make Hive work with Hadoop 1.0.0 (Ashutosh Chauhan via cws)

Changes for Build #1155
[cws] HIVE-BUILD. Update RELEASE_NOTES.txt with 0.8.0 release information (cws)

Changes for Build #1156

Changes for Build #1157

Changes for Build #1158
[namit] HIVE-2602 add support for insert partition overwrite(...) if not exists (Chinna Rao Lalam via namit)

Changes for Build #1159

Changes for Build #1160
[cws] HIVE-2005. Implement BETWEEN operator (Navis via cws)

Changes for Build #1161
[jvs] HIVE-2433. add DOAP file for Hive

Changes for Build #1162

Changes for Build #1163

Changes for Build #1164
[heyongqiang] HIVE-2666 [jira] StackOverflowError when using custom UDF in map join (Kevin Wilfong via Yongqiang He)
  Summary: Resource files are now added to the class path as soon as they are added via the CLI. This fixes the stack overflow error mentioned in the JIRA by ensuring a consistent class loader between serializers and deserializers for the same query. Note that now serdes which contain a static block to register themselves are now registered twice, once when adding the file to the class loader, and once when an instance of the class is created. Previously, registering a serde twice resulted in an exception, to avoid this, I have downgraded it to a warning.
  When a custom UDF is used as part of a join which is converted to a map join, the XMLEncoder enters an infinite loop when serializing the map reduce task for the second time, as part of sending it to be executed. This results in a stack overflow error.
  Test Plan: I ran the unit tests to verify nothing was broken. I ran several queries which used custom UDFs and involved a join which was converted to a map join. I verified these completed successfully consistently
  Reviewers: JIRA, heyongqiang
  Reviewed By: heyongqiang
  CC: heyongqiang, kevinwilfong
  Differential Revision: 957
[namit] HIVE-2642 fix Hive-2566 and make union optimization more aggressive (Yongqiang He via namit)

Changes for Build #1166

Changes for Build #1167

Changes for Build #1168
[heyongqiang] HIVE-2600: Enable/Add type-specific compression for rcfile (Krishna Kumar via He Yongqiang)

Changes for Build #1169

Changes for Build #1170
[cws] HIVE-1877. Add java_method() as a synonym for the reflect() UDF (Zhenxiao Luo via cws)

Changes for Build #1171

Changes for Build #1172

Changes for Build #1173

Changes for Build #1174

Changes for Build #1175
[hashutosh] HIVE-2681 : SUCESS is misspelled (jonchang via Ashutosh Chauhan)

Changes for Build #1176
[hashutosh] HIVE-2616 : Passing user identity from metastore client to server in non-secure mode (Ashutosh Chauhan)

Changes for Build #1177

Changes for Build #1178

Changes for Build #1179

Changes for Build #1180

Changes for Build #1181

Changes for Build #1182
[heyongqiang] HIVE-2621:Allow multiple group bys with the same input data and spray keys to be run on the same reducer. (Kevin via He Yongqiang)

Changes for Build #1184
[namit] HIVE-2690 a bug in 'alter table concatenate' that causes filenames getting double url encoded (He Yongqiang via namit)

Changes for Build #1185

27 tests failed.
REGRESSION: org.apache.hadoop.hive.cli.TestHBaseMinimrCliDriver.testCliDriver_hbase_bulk
Error Message: Unexpected exception
Stack Trace: junit.framework.AssertionFailedError: Unexpected exception at
[jira] [Created] (HIVE-2696) Conf variable to turn off setting the create time for a new partition
Conf variable to turn off setting the create time for a new partition
    Key: HIVE-2696
    URL: https://issues.apache.org/jira/browse/HIVE-2696
    Project: Hive
    Issue Type: New Feature
    Reporter: Kevin Wilfong
    Assignee: Kevin Wilfong
    Priority: Minor

There are some cases where the user does not want the create time for a partition to change on INSERT OVERWRITE to that partition. To accommodate this, we can add a new conf variable which will prevent the create time from being set.
[jira] [Resolved] (HIVE-2696) Conf variable to turn off setting the create time for a new partition
[ https://issues.apache.org/jira/browse/HIVE-2696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kevin Wilfong resolved HIVE-2696.
    Resolution: Not A Problem

Conf variable to turn off setting the create time for a new partition
    Key: HIVE-2696
    URL: https://issues.apache.org/jira/browse/HIVE-2696
    Project: Hive
    Issue Type: New Feature
    Reporter: Kevin Wilfong
    Assignee: Kevin Wilfong
    Priority: Minor

There are some cases where the user does not want the create time for a partition to change on INSERT OVERWRITE to that partition. To accommodate this, we can add a new conf variable which will prevent the create time from being set.
[jira] [Created] (HIVE-2697) Ant compile-test target should be triggered from subprojects, not from top-level targets
Ant compile-test target should be triggered from subprojects, not from top-level targets
    Key: HIVE-2697
    URL: https://issues.apache.org/jira/browse/HIVE-2697
    Project: Hive
    Issue Type: Improvement
    Components: Build Infrastructure
    Affects Versions: 0.8.0
    Reporter: Carl Steinbach
    Assignee: Carl Steinbach
    Fix For: 0.8.1
errors while running hive queries
Hello,

I am trying to run Hive queries, but I am getting the following errors:

hive> FROM (
          FROM t1
          MAP t1.patient_mrn, t1.encounter_date
          USING 'retrieve'
          AS mp1, mp2
          CLUSTER BY mp1) map_output
      INSERT OVERWRITE TABLE t3
          REDUCE map_output.mp1, map_output.mp2
          USING 'q1.txt'
          AS reducef1, reducef2;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201112281627_0097, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201112281627_0097
Kill Command = /home/hadoop/hadoop-0.20.2-cdh3u2//bin/hadoop job -Dmapred.job.tracker=localhost:54311 -kill job_201112281627_0097
2011-12-31 03:10:46,391 Stage-1 map = 0%, reduce = 0%
2011-12-31 03:11:29,794 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201112281627_0097 with errors
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
hive>

--
Regards,
Bhavesh Shah
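The console output above only reports that the job failed; it does not say why. One possible cause, and it is only a guess from the query text, is that the 'retrieve' or 'q1.txt' command could not be run as a streaming script or exited with an error. A way to narrow this down is to run both commands outside Hive on a few sample rows. The sketch below is a hypothetical local harness in Python: run_stage and simulate are made-up names, rows are assumed to be exchanged as tab-separated lines, and the single sort step only approximates what CLUSTER BY does on a real cluster.

```python
import subprocess
import sys


def run_stage(command, rows):
    """Pipe tab-separated rows through a streaming command, return its rows.

    Raises RuntimeError with the command's stderr if it exits non-zero,
    so the underlying script error becomes visible.
    """
    payload = "".join("\t".join(cols) + "\n" for cols in rows)
    result = subprocess.run(command, input=payload,
                            capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"{command!r} exited with {result.returncode}: "
            f"{result.stderr.strip()}")
    return [line.split("\t") for line in result.stdout.splitlines()]


def simulate(map_cmd, reduce_cmd, rows, cluster_col=0):
    """Approximate MAP ... CLUSTER BY ... REDUCE in a single process."""
    mapped = run_stage(map_cmd, rows)
    mapped.sort(key=lambda cols: cols[cluster_col])  # stand-in for CLUSTER BY
    return run_stage(reduce_cmd, mapped)


# Demo with an identity command standing in for the real scripts.
identity = [sys.executable, "-c",
            "import sys; sys.stdout.write(sys.stdin.read())"]
print(simulate(identity, identity, [["mrn2", "2011-12-30"],
                                    ["mrn1", "2011-12-31"]]))
```

Replacing the identity command with the real script invocations would show whether each one starts at all and what it writes to stderr. The task logs reachable from the Tracking URL in the output above remain the authoritative place to look for the actual failure.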
[jira] [Commented] (HIVE-2629) Make a single Hive binary work with both 0.20.x and 0.23.0
[ https://issues.apache.org/jira/browse/HIVE-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181185#comment-13181185 ]

Phabricator commented on HIVE-2629:

amareshwarisr has commented on the revision "HIVE-2629 [jira] Make a single Hive binary work with both 0.20.x and 0.23.0".

The directory commonSecure should be changed to common/secure. Why don't we put those files in the directory common itself? Why create a new directory? Putting them in common would make the code cleaner.

INLINE COMMENTS
  build-common.xml:118-123 Why are these changes required? If not required, can you remove them?
  build.properties:13-17 Are we going to add a new version here for all the upcoming versions as well? I don't think we should do it this way.
  shims/build.xml:57 Can we change commonSecure to common.secure in all the places?

REVISION DETAIL
  https://reviews.facebook.net/D711

Make a single Hive binary work with both 0.20.x and 0.23.0
    Key: HIVE-2629
    URL: https://issues.apache.org/jira/browse/HIVE-2629
    Project: Hive
    Issue Type: Bug
    Components: Shims
    Reporter: Carl Steinbach
    Assignee: Thomas Weise
    Fix For: 0.9.0
    Attachments: HIVE-2629.D711.1.patch, HIVE-2629.D711.2.patch, HIVE-2629.patch