Re: UDF reflect
It's probably not as pretty as the new built-in version, but this allows scripted UDFs in any javax.script language: https://github.com/livingsocial/HiveSwarm/blob/master/src/main/java/com/livingsocial/hive/udf/ScriptedUDF.java

On Thu, Apr 3, 2014 at 4:35 PM, Andy Srine andy.sr...@gmail.com wrote:
Thanks Edward. But inline Groovy is available in Hive 13, right? I am using an older version. Best, Andy

On Thu, Apr 3, 2014 at 11:37 AM, Edward Capriolo edlinuxg...@gmail.com wrote:
You can write UDFs in Groovy now. That pretty much means you can just write a quick method inline, which makes udf reflect much less useful.

On Thu, Apr 3, 2014 at 2:22 PM, Andy Srine andy.sr...@gmail.com wrote:
Thanks Szehon and Peyman. I want to call hashCode() on the UUID object. The object is stored in the table as a string, but I can convert it to a UUID; that's not the problem. Basically the question is: how do we call the reflect UDF on methods that take no arguments? How do I do the following using the reflect UDF?

(fromString(uid_str)).hashCode()

Thanks, Andy

On Thu, Apr 3, 2014 at 6:18 AM, Peyman Mohajerian mohaj...@gmail.com wrote:
Maybe your intention is the following: reflect(java.util.UUID, randomUUID)

On Thu, Apr 3, 2014 at 2:33 AM, Szehon Ho sze...@cloudera.com wrote:
Hi, according to the description of the reflect UDF, you are trying to call java.util.UUID.hashCode(uidString), which doesn't seem to be an existing method in either Java 6 or 7. http://docs.oracle.com/javase/7/docs/api/java/util/UUID.html#hashCode()
Thanks, Szehon

On Wed, Apr 2, 2014 at 2:13 PM, Andy Srine andy.sr...@gmail.com wrote:
Hi guys, I am trying to use the reflect UDF for a UUID method and am getting an exception. I believe this function should be available in Java 1.6.0_31, which the system is running.

select reflect(java.util.UUID, hashCode, uid_str) my_uid, ...

My suspicion is that this is because the Hive column I am calling this on is a string and not a UUID.
So I nested the reflects as shown below to go from a string to a UUID first and then hashCode it:

reflect(java.util.UUID, hashCode, reflect(java.util.UUID, fromString, uid_str)) my_uid,

In either case I always get the exception below, though the row of data it prints has no null in the uid_str column. Any ideas?

at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:565)
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
... 8 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: UDFReflect getMethod
at org.apache.hadoop.hive.ql.udf.generic.GenericUDFReflect.evaluate(GenericUDFReflect.java:164)
at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.evaluate(ExprNodeGenericFuncEvaluator.java:163)
at org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.getNewKey(KeyWrapperFactory.java:113)
at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:794)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:548)
... 9 more
Caused by: java.lang.NoSuchMethodException: java.util.UUID.hashCode(null)
at java.lang.Class.getMethod(Class.java:1605)
at org.apache.hadoop.hive.ql.udf.generic.GenericUDFReflect.evaluate(GenericUDFReflect.java:160)

Thanks, Andy
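For reference, the failing lookup can be reproduced in plain Java, outside of Hive: Class.getMethod resolves hashCode() only when asked with an empty parameter list, which is why handing it an extra argument (even a null) raises NoSuchMethodException. This is a minimal sketch of that behavior, not the reflect UDF's own code:

```java
import java.lang.reflect.Method;
import java.util.UUID;

public class ReflectDemo {
    public static void main(String[] args) throws Exception {
        UUID uuid = UUID.fromString("00000000-0000-0000-0000-000000000001");

        // hashCode() takes no arguments, so it must be looked up with an
        // empty parameter list and invoked with no arguments
        Method ok = UUID.class.getMethod("hashCode");
        System.out.println(ok.invoke(uuid));

        // looking it up with any parameter type fails, which is effectively
        // what reflect(...) does when given a surplus argument
        try {
            UUID.class.getMethod("hashCode", String.class);
        } catch (NoSuchMethodException e) {
            System.out.println("NoSuchMethodException");
        }
    }
}
```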
Re: Help with simple UDF error..
Try changing ArrayList to just List as the argument to the evaluate function.

On Thu, Feb 13, 2014 at 12:15 PM, Manish maa...@gmail.com wrote:
Hello: While running the Hive query:

hive> select Actor, count(*) from metrics where array_contains(type_tuple, Actor) and checkAttributes(attribute_tuple, p1, Actor);

I get this error:

FAILED: SemanticException [Error 10014]: Line 1:103 Wrong arguments 'Actor': No matching method for class yk.HiveCheckAllAttributes with (array<string>, string, string). Possible choices: _FUNC_(string, string, string)

The metrics Hive table schema looks like:

hive> describe metrics;
OK
col_name data_type comment
entity_uuid string
type_tuple array<string>
attribute_tuple array<string>

The class yk.HiveCheckAllAttributes extends UDF and has an evaluate method defined as:

--snip--
public boolean evaluate(ArrayList<String> payload, String pType, String type) { ... }

I have tried the Text type and still got the same error:

--snip--
public boolean evaluate(Text payload, String pType, String type) { ... }

How can I pass a Hive array<string> as an argument to a simple UDF? /Manish
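The reason the broader signature works can be shown in plain Java (the class and method names here are illustrative, not Hive's actual classes): Hive hands a UDF some List implementation of its own, so a method declared to take ArrayList cannot match, while one declared to take List accepts any implementation:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ListVsArrayList {
    // mirrors evaluate(ArrayList<String> payload, ...): only matches
    // actual ArrayList instances
    static boolean evalArrayList(ArrayList<String> payload) { return payload != null; }

    // mirrors evaluate(List<String> payload, ...): matches any List implementation
    static boolean evalList(List<String> payload) { return payload != null; }

    public static void main(String[] args) {
        // Hive passes its own (lazy) List implementation, not an ArrayList;
        // an unmodifiable wrapper stands in for it here
        List<String> hiveStyleList = Collections.unmodifiableList(Arrays.asList("a", "b"));

        // evalArrayList(hiveStyleList);  // would not match: the list is not an ArrayList
        System.out.println(evalList(hiveStyleList));
    }
}
```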
Re: Can data be passed to the final mode init call in a UDAF?
The code I'm having problems with is https://github.com/livingsocial/HiveSwarm/blob/WIP_scripting/src/main/java/com/livingsocial/hive/udf/ScriptedUDAF.java. There is a FIXME comment at the problem area, where the return type for the init method cannot be determined.

On Mon, Feb 10, 2014 at 4:26 PM, John Meagher john.meag...@gmail.com wrote:
I'm working on a UDAF that takes in a constant string that defines what the final output of the UDAF will be. In the mode=PARTIAL1 call to the init function all the parameters are available and the constant can be read, so the output ObjectInspector can be built. I haven't found a way to pass this same constant data through so it is available to the mode=FINAL call to init. Is there a way to pass constant parameters through the UDAF so they are available in the mode=FINAL init call?

I have tried having the PARTIAL1 init call return a StructObjectInspector containing constant object inspectors with the data, but when that is read during the FINAL call the incoming ObjectInspectors are no longer constants. I've looked for ways to access the JobConf object for the job to try to pass this parameter along, but I can't find a way to access that from Hive. I can pass the data along with the data coming from terminatePartial, but that data is not available until merge is called, and init in FINAL mode is called without merge having run yet.

Note: this is with Hive 0.10. While I'd like to get this working under 0.10, I'd also like to know if this is a feature available in newer releases. Thanks, John
Re: FUNCTION HIVE to DAYS OF WEEK
The pom.xml file at the top level specifies the version of Hadoop and Hive it compiles against. There are 2 ways to get things working:
1) Modify the HiveSwarm pom.xml file to reference the version of Hadoop and Hive you are using and recompile
2) Copy the day-of-week UDF over into your own project
Either approach should work.

On Tue, Feb 11, 2014 at 5:47 AM, Eduardo Parra Valdes | BEEVA eduardo.pa...@beeva.com wrote:
Hi John, thank you very much for your response. This project is very interesting. I tested your jar, but it doesn't work. I think that is because I have a different version of Hadoop: I have Hadoop 2.2.0 with YARN installed. The error is:

Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:425)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
... 14 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
... 17 more
Caused by: java.lang.RuntimeException: Map operator initialization failed
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:134)
... 22 more
Caused by: java.lang.NullPointerException
at org.apache.hadoop.hive.ql.exec.FunctionRegistry.isStateful(FunctionRegistry.java:1422)
at org.apache.hadoop.hive.ql.exec.FunctionRegistry.isDeterministic(FunctionRegistry.java:1385)
at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.isDeterministic(ExprNodeGenericFuncEvaluator.java:132)
at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluatorFactory.iterate(ExprNodeEvaluatorFactory.java:83)
at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluatorFactory.toCachedEval(ExprNodeEvaluatorFactory.java:73)
at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:57)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:377)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:453)
at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:409)
at org.apache.hadoop.hive.ql.exec.FilterOperator.initializeOp(FilterOperator.java:82)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:377)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:453)
at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:409)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:188)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:377)
at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:425)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:377)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:113)
... 22 more

I'll try other tests with another version of Hadoop. If you can see why your jar doesn't work here, I would appreciate it if you told me. Thanks a lot!

2014-02-10 22:43 GMT+01:00 John Meagher john.meag...@gmail.com
Can data be passed to the final mode init call in a UDAF?
I'm working on a UDAF that takes in a constant string that defines what the final output of the UDAF will be. In the mode=PARTIAL1 call to the init function all the parameters are available and the constant can be read, so the output ObjectInspector can be built. I haven't found a way to pass this same constant data through so it is available to the mode=FINAL call to init. Is there a way to pass constant parameters through the UDAF so they are available in the mode=FINAL init call?

I have tried having the PARTIAL1 init call return a StructObjectInspector containing constant object inspectors with the data, but when that is read during the FINAL call the incoming ObjectInspectors are no longer constants. I've looked for ways to access the JobConf object for the job to try to pass this parameter along, but I can't find a way to access that from Hive. I can pass the data along with the data coming from terminatePartial, but that data is not available until merge is called, and init in FINAL mode is called without merge having run yet.

Note: this is with Hive 0.10. While I'd like to get this working under 0.10, I'd also like to know if this is a feature available in newer releases. Thanks, John
Re: FUNCTION HIVE to DAYS OF WEEK
Here's one implementation of it: https://github.com/livingsocial/HiveSwarm#dayofweekdate. The code for it is pretty straightforward: https://github.com/livingsocial/HiveSwarm/blob/master/src/main/java/com/livingsocial/hive/udf/DayOfWeek.java

On Mon, Feb 10, 2014 at 4:38 PM, Stephen Sprague sprag...@gmail.com wrote:
Oddly enough, I don't see one here: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions. However, you're not the only one finding something like this useful; cf. https://issues.apache.org/jira/browse/HIVE-6046. In the meantime it appears you'll have to write your own UDF, or perhaps set up a streaming job via the transform() function. Cheers, Stephen.
PS. The transform() function is _really_ easy to use, so don't be scared off. Cf. https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Transform (the last example is the simplest).

On Mon, Feb 10, 2014 at 8:33 AM, Eduardo Parra Valdes | BEEVA eduardo.pa...@beeva.com wrote:
Hello all, I wish to find a Hive function that returns the day of the week (Monday, Tuesday, etc.) given a parameter (timestamp). Does anyone have an idea of how to do it? -- Eduardo Parra Valdés, BEEVA
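For anyone who can't use the jar, the heart of such a UDF is just a few lines of JDK date formatting. A minimal standalone sketch (this is illustrative, not the HiveSwarm code itself):

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

public class DayOfWeekDemo {
    public static void main(String[] args) throws Exception {
        SimpleDateFormat in = new SimpleDateFormat("yyyy-MM-dd");
        // "EEEE" formats the full day-of-week name
        SimpleDateFormat out = new SimpleDateFormat("EEEE", Locale.ENGLISH);
        Date d = in.parse("2014-02-10");
        System.out.println(out.format(d));
    }
}
```

A Hive UDF wrapping this would simply apply the same formatting per row; note that SimpleDateFormat is not thread-safe, so each UDF instance should hold its own formatter.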
Re: Formatting hive queries
I use vim and https://github.com/vim-scripts/SQLUtilities to do it. It's not Hive-specific; any SQL formatting tool will work.

On Tue, Jan 21, 2014 at 11:23 PM, pandees waran pande...@gmail.com wrote:
Hi, I would like to come up with code that automatically formats your hql files, because formatting is a tedious task and I would like to build a utility for it. Please let me know whether any specific utilities already exist for formatting Hive queries. -- Thanks, Pandeeswaran
Re: Problem with min function in HiveQL
Aggregate functions need to go in a HAVING clause instead of the WHERE clause. WHERE clauses are applied prior to aggregation; HAVING is applied post-aggregation.

select ... from ...
where <some row-level filter>
group by ...
having <some aggregate-level filter>

On Thu, Aug 29, 2013 at 2:49 PM, Jason Dere jd...@hortonworks.com wrote:
Looks like the issue is the use of min() within the WHERE clause; the place where the exception is thrown has the following comment:
// UDAF in filter condition, group-by caluse, param of funtion, etc.

On Aug 29, 2013, at 3:01 AM, Jérôme Verdier verdier.jerom...@gmail.com wrote:
Hi everybody, I am writing a HiveQL script to do some calculations, and I have a problem with the min() function. My Hive script is below:

INSERT INTO TABLE default.THM_CA_RGRP_PRODUITS_SEM
SELECT
  '${hiveconf:in_co_societe}' as co_societe,
  '${hiveconf:in_co_an_semaine}' as co_an_semaine,
  a.type_entite as type_entite,
  a.code_entite as code_entite,
  a.type_rgrp_produits as type_rgrp_produits,
  a.co_rgrp_produits as co_rgrp_produits,
  SUM(a.MT_CA_NET_TTC) as MT_CA_NET_TTC,
  SUM(a.MT_OBJ_CA_NET_TTC) as MT_OBJ_CA_NET_TTC,
  SUM(a.NB_CLIENTS) as NB_CLIENTS,
  SUM(a.MT_CA_NET_TTC_COMP) as MT_CA_NET_TTC_COMP,
  SUM(a.MT_OBJ_CA_NET_TTC_COMP) as MT_OBJ_CA_NET_TTC_COMP,
  SUM(a.NB_CLIENTS_COMP) as NB_CLIENTS_COMP
from default.THM_CA_RGRP_PRODUITS_JOUR a
JOIN default.CALENDRIER b -- A verifier
WHERE CAST(a.dt_jour AS TIMESTAMP) >= CAST(min(UNIX_TIMESTAMP(b.dt_jour)) AS BIGINT)
  AND CAST(a.dt_jour AS TIMESTAMP) < CAST(min(b.dt_jour)+1 AS TIMESTAMP)
  AND a.co_societe = '${hiveconf:in_co_societe}'
  AND ('${hiveconf:flg_reprise_entite}' = 0 OR a.type_entite <> 'MAG')
GROUP BY
  a.type_entite,
  a.code_entite,
  a.type_rgrp_produits,
  a.co_rgrp_produits;

And when I try to launch this, I get this error:

FAILED: SemanticException [Error 10128]: Line 20:62 Not yet supported place for UDAF 'min'

Obviously there is a problem with the min() function. How can I get past this error?
Thanks for your help.
Re: Java Courses for Scripters/Big Data Geeks
The Data Science course on Coursera has a pretty good overview of MapReduce, Hive, and Pig without going into the Java side of things: https://www.coursera.org/course/datasci. It's not in-depth, but it is enough to get started.

On Wed, Jul 17, 2013 at 3:52 PM, John Omernik j...@omernik.com wrote:
Hey all - I was wondering if there were any shortcut Java courses out there. I am not looking for a holistic learn-everything-about-Java course, but more of a "so you are a big data/Hive geek and you get Python/Perl pretty well, but when you try to understand Java your head explodes and it feels like you are missing something entry-level and basic, thus you need these basic things and you'll be fine" course. Any thoughts?
Re: Hive - max rows limit (int limit = 2^31). Need help (looks like a bug)
What is the data type of the p1 column? I've used Hive with partitions containing far more than 2 billion rows without any problems like this.

On Wed, May 29, 2013 at 2:41 PM, Gabi Kazav gabi.ka...@pursway.com wrote:
Hi, we are working with Hive on our Hadoop cluster. We are now facing an issue when joining a big partition with more than 2^31 rows. When the partition has more than 2147483648 rows (even 2147483649), the output of the join is a single row. When the partition has fewer than 2147483648 rows (even 2147483647), the output is correct.

Our test case: create a table with 2147483649 rows in a partition with the value 1, and join this table to another table with a single row and a single column with the value 1 on the partition key. Then delete 2 rows and run the same join.
1st run: only a single row is produced
2nd run: 2147483647 rows

The query we run to test the case is:

create table output_rows_over as select a.s1 from max_sint_rows a join small_table b on (a.p1=b.p1);

With more than 2^31 rows we get the following in the reducer log:
2013-05-27 21:51:14,186 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: TABLE_ID_1_ROWCOUNT:1
With fewer than 2^31 rows we get:
2013-05-27 23:43:14,681 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: TABLE_ID_1_ROWCOUNT:2147483647

Has anyone faced this issue? Does Hive have a workaround for it? I have huge partitions I need to work on and I cannot use Hive for them. Thanks, Gabi Kazav, Infrastructure Team Leader, Pursway.com
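The symptoms are consistent with a signed 32-bit counter wrapping somewhere in the join path; the reporter's threshold is exactly Integer.MAX_VALUE + 1. A tiny Java illustration of the wraparound itself (not Hive's actual code):

```java
public class OverflowDemo {
    public static void main(String[] args) {
        long rows = 2147483649L;      // 2^31 + 1 rows, as in the failing test case
        int counter = (int) rows;     // a 32-bit counter keeps only the low 32 bits
        System.out.println(counter);  // wraps to a negative value
    }
}
```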
Re: How to do a calculation among 2 rows?
There's a new release candidate version of Hive that includes windowing functions that address this. RC1 is available at http://people.apache.org/~hashutosh/hive-0.11.0-rc1/. Also see https://issues.apache.org/jira/browse/HIVE-896. Without 0.11 the approach from Paul will work, but it will be slow.

On Wed, May 15, 2013 at 7:53 AM, Paul COURTOIS p...@pole-conseils.com wrote:
Hi Bhathiya, what about a Cartesian product? The idea would be something like:

select a, b, c from t1 join t1 as t2 on t2.a = t1.a where t2.b < t1.b

It's not very elegant, but it should work ;-) Paul

2013/5/15 Bhathiya Jayasekara tobhathi...@gmail.com
Hi all, can someone please give me an idea for this? I'm totally stuck. http://stackoverflow.com/questions/16558438/in-hive-how-to-do-a-calculation-among-2-rows Thanks. --Bhathiya
Re: Hive Group By Limitations
> Not quite sure but I think each group by will give another M/R job.

It will be done in a single M/R job no matter how many fields are in the GROUP BY clause.

On Mon, May 6, 2013 at 2:07 PM, Peter Chu pete@outlook.com wrote:
In Hive, I cannot SELECT fields that are not in the GROUP BY clause. Example:

SELECT st.a, st.b, st.c, st.d FROM some_table st GROUP BY st.a; -- This does not work.

To make it work, I would need to add the other fields to the GROUP BY clause. Not quite sure, but I think each group by will give another M/R job. I am wondering if there is any other or better way to do group by. Peter
Re: Using Reflect: A thread for ideas
Another option for this functionality would be to use the Java scripting API. The basic structure of the call would be:

select script( scriptLanguage, scriptToRun, args... )

I haven't seen that in Hive, but something similar is available for Pig. Documentation for that is available at http://pig.apache.org/docs/r0.9.2/udf.html#js-udfs. There's also a variation in Jira: https://issues.apache.org/jira/browse/PIG-1777.

On Wed, Feb 13, 2013 at 11:38 PM, John Omernik j...@omernik.com wrote:
I stumbled across the little-documented reflect function today. I've always known about it, but Java scares me if it's not in a cup, so I didn't dig. Well, today I dug, and found an awesome use case for reflect (for me) and wanted to share. I also thought it would be nice to validate some thoughts I had on reflect, and how we could possibly share ideas on reflect so that folks could get more use out of this great feature of Hive.

Here's my example, a simple URL decode function:

select url, reflect('java.net.URLDecoder', 'decode', url, 'utf-8') as decoded_url from logs

Basically I am using the decode function of the java.net.URLDecoder class. Pretty awesome: it works great, with no files to distribute either. It even works through JDBC!

That said, I realized that the function being called has to return a simple data type. For example, I struggle to come up with a simple reflect() for making a hex MD5 out of a string, because the built-in functions return objects, whose methods can return what I am looking for. Which is great, but then I have to compile Java code, distribute a jar, and run that code; I am looking for something simple like the URL-decoding function. I love this reflect feature, but I think it's probably underutilized due to the perceived usability issues for beginners. So that leads me to my next thought.

What if we brainstorm handy functions in Java that are not included in the standard Hive language, that translate to Hive well using the reflect function, and show an example of their use? I went first with my URLDecode, and obviously I will be looking for more, but have you seen examples that were neat and worked well for you? Can you share? Perhaps if we get enough examples we could roll some of them into a page on the Hive wiki that folks can use to get over the perceived complexity of using Java reflect. Thanks to those who have worked hard to implement features like this; it is truly awesome.
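The static method that the reflect('java.net.URLDecoder', 'decode', ...) example resolves to can be checked directly in plain Java:

```java
import java.net.URLDecoder;

public class UrlDecodeDemo {
    public static void main(String[] args) throws Exception {
        // the same static method the reflect() UDF invokes per row:
        // URLDecoder.decode(String, String)
        System.out.println(URLDecoder.decode("a%20b%3Dc", "utf-8"));
    }
}
```

Any public static method with primitive/String arguments and a primitive/String return is a candidate for the same trick.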
Re: Help in hive query
The WHERE part in the approvals query can be moved up to be an IF in the SELECT:

SELECT client_id, receive_dd, receive_hh, receive_hh+1,
  COUNT(1) AS transaction_count,
  SUM( IF( response=00, 1, 0 ) ) AS approval_count,
  SUM( IF( response=00, 1, 0 ) ) / COUNT(1) * 100 AS percent
FROM sale_test
GROUP BY fdc_client_id, receive_dd, receive_hh, receive_hh+1;

On Tue, Oct 30, 2012 at 1:51 AM, dyuti a hadoop.hiv...@gmail.com wrote:
Hi all, I want to compute (no. of approvals in an hour / no. of transactions in that hour) * 100.

-- COUNT(1) AS cnt gives total transactions in an hour
SELECT client_id, receive_dd, receive_hh, receive_hh+1, COUNT(1) AS cnt FROM sale_test GROUP BY fdc_client_id, receive_dd, receive_hh, receive_hh+1;
GETREGREOS 23 16 17 5969
GETREGREOS 23 21 22 2602
GETREGREOS 24 3 4 114

-- Approved transactions, where response=00
SELECT client_id, receive_dd, receive_hh, receive_hh+1, COUNT(1) AS cnt FROM sale_test WHERE response=00 GROUP BY client_id, receive_dd, receive_hh, receive_hh+1;
GETREGREOS 23 16 17 5775
GETREGREOS 23 21 22 2515
GETREGREOS 24 3 4 103

I want to compute 100 * (5775/5969), 100 * (2515/2602), 100 * (103/114), and the same for all other clients for each hour, i.e. (no. of approvals in an hour / no. of transactions in that hour) * 100. Please help me out with how to achieve this in Hive. Thanks and regards, dti
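The SUM(IF(...))/COUNT(1) trick works because the conditional turns each row into a 0-or-1 contribution inside a single aggregation pass. The mechanics, mimicked in plain Java over made-up sample data:

```java
public class ApprovalRate {
    public static void main(String[] args) {
        // hypothetical response codes for transactions in one hour
        String[] responses = {"00", "00", "05", "00"};
        int cnt = 0, approvals = 0;
        for (String r : responses) {
            cnt++;                                // COUNT(1)
            approvals += r.equals("00") ? 1 : 0;  // SUM(IF(response=00, 1, 0))
        }
        System.out.println(100.0 * approvals / cnt);  // percent approved
    }
}
```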
Re: Hive directory permissions
Creating the /user/hive/warehouse folder is a one-time setup step that can be done as the hdfs user. With g+w permissions any user can then create and read the tables.

On Thu, Aug 16, 2012 at 9:57 AM, Connell, Chuck chuck.conn...@nuance.com wrote:
I have no doubt that works, but surely a Hive user should not need sudo privileges! I am also looking for best practices, since we have run into the same thing.

From: Himanish Kushary [mailto:himan...@gmail.com]
Sent: Thursday, August 16, 2012 9:51 AM
To: user@hive.apache.org
Subject: Re: Hive directory permissions
We usually start the shell through sudo; otherwise we get a "Permission denied" while creating Hive tables. But this is a good point; any suggestions/best practices from the user community? Thanks

On Thu, Aug 16, 2012 at 9:37 AM, Connell, Chuck chuck.conn...@nuance.com wrote:
I have run into similar problems. Thanks for the suggestions. One concern: isn't hdfs a highly privileged user within the Hadoop cluster? So do we really want it to be standard practice for all Hive users to su to hdfs?
Chuck Connell, Nuance R&D Data Team, Burlington, MA

From: Himanish Kushary [mailto:himan...@gmail.com]
Sent: Thursday, August 16, 2012 9:30 AM
To: user@hive.apache.org
Subject: Re: Hive directory permissions
Hi Sean, from the Hive language manual: "Moreover, we strongly advise users to create the HDFS directories /tmp and /user/hive/warehouse (aka hive.metastore.warehouse.dir) and set them chmod g+w before tables are created in Hive." My warehouse directory has the following permissions:

warehouse  dir  2012-08-15 15:51  rwxrwxr-x  hdfs  supergroup

Also, I make sure that I run the Hive CLI as the 'hdfs' user. Hope this helps. Thanks, Himanish

On Thu, Aug 16, 2012 at 1:50 AM, Sean Bigdatafun sean.bigdata...@gmail.com wrote:
I ran the following command trying to set up my Hive directory, but got the following error:

root@cdh-client:/# hadoop fs -mkdir /user/hive/warehouse
mkdir: org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=WRITE, inode=/user:hdfs:hadoop:drwxr-xr-x

Can someone tell me the correct ownership for Hive directories? My /user directory has the following ownership:

user  dir  2012-08-14 16:51  rwxr-xr-x  hdfs  hadoop

Should I create a subdirectory under /user with the following permissions?

hive  dir  2012-08-14 16:51  rwxr-xr-x  hive  hadoop

Thanks, Sean
Re: Converting rows into dynamic columns in Hive
R is another programming language, geared toward statistical analysis. It has some features that make it easy to write this sort of data manipulation: http://www.r-project.org/

Here's a sample R function that does something similar to what you need (note: I'm not a real R programmer, so this might be ugly, but it works):

abCountsToMatrix <- function(csvFilename, outputmatrixfile) {
  data <- read.csv(csvFilename, as.is=T, check.names=F, header=F)
  cat("Loaded", nrow(data), "rows.\n")
  idrows <- sort(unique(c(data[ , 1])))
  idcols <- sort(unique(c(data[ , 2])))
  cat("Found", length(idrows), "X", length(idcols), ".")
  output <- array(0, c(length(idrows), length(idcols)))
  rownames(output) <- idrows
  colnames(output) <- idcols
  for (i in 1:nrow(data)) {
    if (i %% 100 == 0) { cat("On", i, "of", nrow(data), "\n") }
    rowidx <- which(idrows == data[i, 1])
    colidx <- which(idcols == data[i, 2])
    count <- data[i, 3]
    output[rowidx, colidx] <- count
  }
  write.csv(output, file=outputmatrixfile, quote=F)
}

On Wed, Aug 8, 2012 at 8:58 PM, richin.j...@nokia.com wrote:
John, what is R?

-----Original Message-----
From: ext John Meagher [mailto:john.meag...@gmail.com]
Sent: Wednesday, August 08, 2012 4:34 PM
To: user@hive.apache.org
Subject: Re: Converting rows into dynamic columns in Hive
I don't think having dynamic columns is possible in Hive. I've always output from Hive a structure like your query output and used R to convert it into a dynamic column structure.

On Wed, Aug 8, 2012 at 3:56 PM, richin.j...@nokia.com wrote:
Thanks Ashish, that gives an idea. But I am not sure about the outer select loop; I would have to know all the values in the Beta column beforehand to do a max on each value. Is there a better way? Richin

From: ext Ashish Thusoo [mailto:athu...@qubole.com]
Sent: Tuesday, August 07, 2012 5:05 PM
To: user@hive.apache.org
Subject: Re: Converting rows into dynamic columns in Hive
You should be able to do this in Hive using a group by on alpha and then a combination of max and if statements, something along the following lines:

select alpha, max(abc), max(pqr), ...
from ( select alpha,
         if (beta == 'abc', Gamma, NULL) as abc,
         if (beta == 'pqr', Gamma, NULL) as pqr,
         ...
       from table ) t
group by alpha

Something on those lines. Ashish

On Tue, Aug 7, 2012 at 1:57 PM, richin.j...@nokia.com wrote:
Hi all, one of my query outputs looks like:

Alpha Beta Gamma
123 xyz 1.0
123 abc 0.5
123 pqr 1.3
123 def 2.1
456 xyz 0.1
456 abc 0.6
456 pqr 1.9
456 def 3.2
456 lmn 1.1
456 sdf 1.2

For data visualization purposes I want the output to look like this (basically taking rows from one table and making them columns in another table):

Alpha xyz abc pqr def lmn sdf
123 1.0 0.5 1.3 2.1
456 0.1 0.6 1.9 3.2 1.1 1.2

Question: can it be done in Hive? If not, any suggestions? Thanks, Richin
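The group-by-then-max(if(...)) pivot suggested above can be mimicked in plain Java to see the mechanics (illustrative only; the class name and the two sample rows per key are made up):

```java
import java.util.Map;
import java.util.TreeMap;

public class PivotDemo {
    public static void main(String[] args) {
        // rows of (alpha, beta, gamma), as in the sample data
        Object[][] rows = {
            {"123", "xyz", 1.0}, {"123", "abc", 0.5},
            {"456", "xyz", 0.1}, {"456", "abc", 0.6},
        };
        // group by alpha; for each beta value keep max(gamma), i.e.
        // max(if(beta == col, gamma, null)) per output column
        Map<String, Map<String, Double>> pivot = new TreeMap<>();
        for (Object[] r : rows) {
            pivot.computeIfAbsent((String) r[0], k -> new TreeMap<>())
                 .merge((String) r[1], (Double) r[2], Math::max);
        }
        System.out.println(pivot);
    }
}
```

As in the Hive version, the set of output columns (the distinct beta values) must be known up front to emit a fixed-width table.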
Re: Converting rows into dynamic columns in Hive
I don't think having dynamic columns is possible in Hive. I've always output from Hive a structure like your query output and used R to convert it into a dynamic column structure.

On Wed, Aug 8, 2012 at 3:56 PM, richin.j...@nokia.com wrote:
Thanks Ashish, that gives an idea. But I am not sure about the outer select loop; I would have to know all the values in the Beta column beforehand to do a max on each value. Is there a better way? Richin

From: ext Ashish Thusoo [mailto:athu...@qubole.com]
Sent: Tuesday, August 07, 2012 5:05 PM
To: user@hive.apache.org
Subject: Re: Converting rows into dynamic columns in Hive
You should be able to do this in Hive using a group by on alpha and then a combination of max and if statements, something along the following lines:

select alpha, max(abc), max(pqr), ...
from ( select alpha,
         if (beta == 'abc', Gamma, NULL) as abc,
         if (beta == 'pqr', Gamma, NULL) as pqr,
         ...
       from table ) t
group by alpha

Something on those lines. Ashish

On Tue, Aug 7, 2012 at 1:57 PM, richin.j...@nokia.com wrote:
Hi all, one of my query outputs looks like:

Alpha Beta Gamma
123 xyz 1.0
123 abc 0.5
123 pqr 1.3
123 def 2.1
456 xyz 0.1
456 abc 0.6
456 pqr 1.9
456 def 3.2
456 lmn 1.1
456 sdf 1.2

For data visualization purposes I want the output to look like this (basically taking rows from one table and making them columns in another table):

Alpha xyz abc pqr def lmn sdf
123 1.0 0.5 1.3 2.1
456 0.1 0.6 1.9 3.2 1.1 1.2

Question: can it be done in Hive? If not, any suggestions? Thanks, Richin