Re: How to write Block of queries in Hive?

2012-01-05 Thread Bhavesh Shah
Hello Aniket,
Thanks for the explanation.

I have one more question. In SQL we can write multiple queries in which one
query is executed and its result is passed to another query as input.
Can we write something like that in Hive?
I have also tried custom scripts in Hive, but I do not understand how to use
them in a block of queries (multiple queries).




Thanks and Regards,
Bhavesh Shah

On Thu, Jan 5, 2012 at 11:43 AM, Aniket Mokashi <aniket...@gmail.com> wrote:

 Hi Bhavesh,

 [moving discussion to hive user list]

 I suggest sending your question to the hive user list in order to reach a
 broader audience.

 As per my understanding, map_script and reduce_script in the query are
 custom scripts that run as streaming jobs. You are asking Hive to run
 map_script as the mapper on 2 columns (userid, date) to generate 3 new
 values: c1, c2, c3. After this, Hive will distribute your records to
 reducers based on c2 values and sort them on c2 and c1. 'reduce_script'
 will consume these 3 columns and generate 2 columns (date, count) to store
 in pv_users_reduced.

 Hope it helps.

 Thanks,
 Aniket

 On Wed, Jan 4, 2012 at 8:55 PM, Bhavesh Shah <bhavesh25s...@gmail.com>
 wrote:

  Hello,
  I am new to Hive. I want to write a block of queries in Hive so that one
  query gives its result to another one, like in SQL.
 
  I have also visited one link given below:
  http://karmasphere.com/ksc/hive-user-defined-functions.html
 
  In the above link I was looking for functions, but I found the query below
  and I don't understand the following clauses:
 
  USING 'map_script'
  USING 'reduce_script'
 
  in the following block:
 
 
  FROM (
   FROM pv_users
   MAP ( pv_users.userid, pv_users.date )
   USING 'map_script'
   AS c1, c2, c3
   DISTRIBUTE BY c2
   SORT BY c2, c1) map_output
   INSERT OVERWRITE TABLE pv_users_reduced
   REDUCE ( map_output.c1, map_output.c2, map_output.c3 )
   USING 'reduce_script'
   AS date, count;
 
 
  Please can anyone tell me what the scripts are used for and how to write a
  block of queries in Hive?
 
 
 
 
  --
  Regards,
  Bhavesh Shah
 



 --
 ...:::Aniket:::... Quetzalco@tl
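
To make the MAP/REDUCE query discussed in this thread more concrete, here is a
minimal sketch of what the two streaming scripts could look like. Hive streams
rows to such a script as tab-separated lines on stdin and reads tab-separated
lines back from stdout. What the scripts actually compute is not stated in the
thread, so the logic below is an assumption made for illustration: map_rows
tags each (userid, date) row with a count of 1, and reduce_rows sums those
counts per date. The function names and the "map"/"reduce" argument are
hypothetical.

```python
from itertools import groupby


def map_rows(lines):
    """Hypothetical 'map_script': (userid, date) -> (c1, c2, c3)."""
    for line in lines:
        userid, date = line.rstrip("\n").split("\t")
        # Assumed mapping: c1 = userid, c2 = date, c3 = 1 for this page view.
        yield "\t".join((userid, date, "1"))


def reduce_rows(lines):
    """Hypothetical 'reduce_script': (c1, c2, c3) -> (date, count)."""
    rows = (line.rstrip("\n").split("\t") for line in lines)
    # DISTRIBUTE BY c2 / SORT BY c2, c1 means all rows for one date reach the
    # same reducer next to each other, so groupby sees each date exactly once.
    for date, group in groupby(rows, key=lambda row: row[1]):
        yield "\t".join((date, str(sum(int(row[2]) for row in group))))


def main(argv, stdin, stdout):
    """Run one step over a stream: 'map' selects the mapper, else the reducer."""
    step = map_rows if argv[1:] == ["map"] else reduce_rows
    for out in step(stdin):
        stdout.write(out + "\n")
```

A real script file would end with a call such as
main(sys.argv, sys.stdin, sys.stdout) after importing sys, and would need to be
shipped to the cluster (for example with ADD FILE) before it is named in a
USING clause.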



[jira] [Updated] (HIVE-2682) Clean-up logs

2012-01-05 Thread Phabricator (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-2682:
--

Attachment: HIVE-2682.D1035.3.patch

rajat updated the revision HIVE-2682 [jira] Clean-up logs.
Reviewers: JIRA, jsichi, jonchang, heyongqiang, njain

  Incorporated suggestions from Dymtro (dms).

REVISION DETAIL
  https://reviews.facebook.net/D1035

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/JoinOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsAggregator.java
  ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsPublisher.java
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyStruct.java
  serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryStruct.java


 Clean-up logs
 -

 Key: HIVE-2682
 URL: https://issues.apache.org/jira/browse/HIVE-2682
 Project: Hive
  Issue Type: Wish
  Components: Logging
Reporter: Rajat Goel
Priority: Trivial
 Attachments: HIVE-2682.D1035.1.patch, HIVE-2682.D1035.2.patch, 
 HIVE-2682.D1035.3.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 Just wanted to clean up some logs being printed at the wrong log level:
 1. org.apache.hadoop.hive.ql.exec.CommonJoinOperator prints "table 0 has 1000 
 rows for join key [...]" as WARNING. Is it really a warning?
 2. org.apache.hadoop.hive.ql.exec.GroupByOperator prints "Hash Table 
 completed flushed" and "Begin Hash Table flush at close: size = 21" as 
 WARNING. It shouldn't be.
 3. org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher prints "Warning. 
 Invalid statistic." which looks fishy.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HIVE-2693) Add DECIMAL data type

2012-01-05 Thread Carl Steinbach (Created) (JIRA)
Add DECIMAL data type
-

 Key: HIVE-2693
 URL: https://issues.apache.org/jira/browse/HIVE-2693
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Carl Steinbach


Add support for the DECIMAL data type. HIVE-2272 (TIMESTAMP) provides a nice 
template for how to do this.





[jira] [Commented] (HIVE-2694) Add FORMAT UDF

2012-01-05 Thread Carl Steinbach (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13180747#comment-13180747
 ] 

Carl Steinbach commented on HIVE-2694:
--

Ref: 
http://dev.mysql.com/doc/refman/5.1/en/string-functions.html#function_format

Formats the number X to a format like '#,###,###.##', rounded to D decimal 
places, and returns the result as a string. If D is 0, the result has no 
decimal point or fractional part.

mysql> SELECT FORMAT(12332.123456, 4);
-> '12,332.1235'
mysql> SELECT FORMAT(12332.1,4);
-> '12,332.1000'
mysql> SELECT FORMAT(12332.2,0);
-> '12,332'
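
The behaviour described above can be sketched in a few lines of Python. This
is only an illustration of the semantics, not the proposed Hive UDF: the
function name format_number is hypothetical, and the half-up rounding mode is
an assumption, since the comment does not say how ties are rounded.

```python
from decimal import Decimal, ROUND_HALF_UP


def format_number(x, d):
    """Round x to d decimal places and add thousands separators."""
    # Decimal(10) ** -d is 1, 0.1, 0.01, ... and fixes the number of decimals.
    quantum = Decimal(10) ** -d
    # Going through str() avoids binary floating-point noise in the input.
    rounded = Decimal(str(x)).quantize(quantum, rounding=ROUND_HALF_UP)
    # The "," option inserts the thousands separator; "f" keeps fixed notation.
    return format(rounded, ",f")


print(format_number("12332.123456", 4))  # 12,332.1235
print(format_number("12332.1", 4))       # 12,332.1000
print(format_number("12332.2", 0))       # 12,332
```

The three calls mirror the MySQL examples quoted above.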

 Add FORMAT UDF
 --

 Key: HIVE-2694
 URL: https://issues.apache.org/jira/browse/HIVE-2694
 Project: Hive
  Issue Type: New Feature
  Components: UDF
Reporter: Carl Steinbach







[jira] [Created] (HIVE-2694) Add FORMAT UDF

2012-01-05 Thread Carl Steinbach (Created) (JIRA)
Add FORMAT UDF
--

 Key: HIVE-2694
 URL: https://issues.apache.org/jira/browse/HIVE-2694
 Project: Hive
  Issue Type: New Feature
  Components: UDF
Reporter: Carl Steinbach








[jira] [Created] (HIVE-2695) Add PRINTF() Udf

2012-01-05 Thread Carl Steinbach (Created) (JIRA)
Add PRINTF() Udf


 Key: HIVE-2695
 URL: https://issues.apache.org/jira/browse/HIVE-2695
 Project: Hive
  Issue Type: New Feature
  Components: UDF
Reporter: Carl Steinbach








[jira] [Commented] (HIVE-2695) Add PRINTF() Udf

2012-01-05 Thread Carl Steinbach (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13180759#comment-13180759
 ] 

Carl Steinbach commented on HIVE-2695:
--

Add a PRINTF(String format, Obj... args) Udf that can format strings according 
to printf-style format strings.

Ref: http://docs.oracle.com/javase/1.5.0/docs/api/java/util/Formatter.html
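
As a rough illustration of the idea, a printf-style helper takes a format
string plus a variable number of arguments and returns the formatted string.
The sketch below uses Python's % operator as a stand-in; its conversion syntax
overlaps with java.util.Formatter for common cases such as %s, %d and %.1f but
is not identical, and the function here is hypothetical rather than the
proposed UDF.

```python
def printf(fmt, *args):
    """Return fmt with its printf-style conversions filled in from args."""
    return fmt % args


# %s takes a string, %d an integer, %.1f a float with one decimal, %% a literal %.
print(printf("%s has %d rows (%.1f%% of total)", "t1", 1000, 12.5))
# prints: t1 has 1000 rows (12.5% of total)
```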


 Add PRINTF() Udf
 

 Key: HIVE-2695
 URL: https://issues.apache.org/jira/browse/HIVE-2695
 Project: Hive
  Issue Type: New Feature
  Components: UDF
Reporter: Carl Steinbach







Hive-trunk-h0.21 - Build # 1185 - Still Failing

2012-01-05 Thread Apache Jenkins Server
Changes for Build #1147
[namit] HIVE-2617 Insert overwrite table db.tname fails if partition already 
exists
(Chinna Rao Lalam via namit)


Changes for Build #1148
[heyongqiang] HIVE-2651 [jira] The variable hive.exec.mode.local.auto.tasks.max 
should be
changed
(Namit Jain via Yongqiang He)

Summary:
HIVE-2651

It should be called hive.exec.mode.local.auto.input.files.max instead.
The number of input files are checked currently.

Test Plan: EMPTY

Reviewers: JIRA, heyongqiang

Reviewed By: heyongqiang

CC: heyongqiang

Differential Revision: 861

[cws] HIVE-727. Hive Server getSchema() returns wrong schema for 'Explain' 
queries (Prasad Mujumdar via cws)

[namit] HIVE-2611 Make index table output of create index command if
index is table based (Kevin Wilfong via namit)


Changes for Build #1150
[jvs] HIVE-2657 [jira] builtins JAR is not being published to Maven repo &
hive-cli POM does not depend on it either
(Carl Steinbach via John Sichi)

Summary: Make hive-cli and hive-ql depend on hive-builtins

Test Plan: EMPTY

Reviewers: JIRA, jsichi

Reviewed By: jsichi

CC: jsichi

Differential Revision: 897

[namit] HIVE-2654 hive.querylog.location requires parent directory to be 
exist or
  else folder creation fails (Chinna Rao Lalam via namit)


Changes for Build #1151
[hashutosh] HIVE-1892 : show functions also returns internal operators 
(Priyadarshini via Ashutosh Chauhan)


Changes for Build #1152

Changes for Build #1153
[namit] HIVE-2660 Need better exception handling in RCFile tolerate corruptions
mode (Ramkumar Vadali via namit)


Changes for Build #1154
[cws] HIVE-2631. Make Hive work with Hadoop 1.0.0 (Ashutosh Chauhan via cws)


Changes for Build #1155
[cws] HIVE-BUILD. Update RELEASE_NOTES.txt with 0.8.0 release information (cws)


Changes for Build #1156

Changes for Build #1157

Changes for Build #1158
[namit] HIVE-2602 add support for insert partition overwrite(...) if not
  exists (Chinna Rao Lalam via namit)


Changes for Build #1159

Changes for Build #1160
[cws] HIVE-2005. Implement BETWEEN operator (Navis via cws)


Changes for Build #1161
[jvs] HIVE-2433. add DOAP file for Hive


Changes for Build #1162

Changes for Build #1163

Changes for Build #1164
[heyongqiang] HIVE-2666 [jira] StackOverflowError when using custom UDF in map 
join
(Kevin Wilfong via Yongqiang He)

Summary:
Resource files are now added to the class path as soon as they are added via the
CLI.  This fixes the stack overflow error mentioned in the JIRA by ensuring a
consistent class loader between serializers and deserializers for the same
query.

Note that serdes which contain a static block to register themselves are now
registered twice, once when adding the file to the class loader, and once when
an instance of the class is created.  Previously, registering a serde twice
resulted in an exception; to avoid this, I have downgraded it to a warning.

When a custom UDF is used as part of a join which is converted to a map join,
the XMLEncoder enters an infinite loop when serializing the map reduce task for
the second time, as part of sending it to be executed.  This results in a stack
overflow error.

Test Plan:
I ran the unit tests to verify nothing was broken.

I ran several queries which used custom UDFs and involved a join which was
converted to a map join.  I verified these completed successfully consistently

Reviewers: JIRA, heyongqiang

Reviewed By: heyongqiang

CC: heyongqiang, kevinwilfong

Differential Revision: 957

[namit] HIVE-2642 fix Hive-2566 and make union optimization more aggressive
(Yongqiang He via namit)


Changes for Build #1166

Changes for Build #1167

Changes for Build #1168
[heyongqiang] HIVE-2600: Enable/Add type-specific compression for rcfile 
(Krishna Kumar via He Yongqiang)


Changes for Build #1169

Changes for Build #1170
[cws] HIVE-1877. Add java_method() as a synonym for the reflect() UDF (Zhenxiao 
Luo via cws)


Changes for Build #1171

Changes for Build #1172

Changes for Build #1173

Changes for Build #1174

Changes for Build #1175
[hashutosh] HIVE-2681 : SUCESS is misspelled (jonchang via Ashutosh Chauhan)


Changes for Build #1176
[hashutosh] HIVE-2616 : Passing user identity from metastore client to server 
in non-secure mode (Ashutosh Chauhan)


Changes for Build #1177

Changes for Build #1178

Changes for Build #1179

Changes for Build #1180

Changes for Build #1181

Changes for Build #1182
[heyongqiang] HIVE-2621:Allow multiple group bys with the same input data and 
spray keys to be run on the same reducer. (Kevin via He Yongqiang)


Changes for Build #1184
[namit] HIVE-2690 a bug in 'alter table concatenate' that causes filenames 
getting
double url encoded (He Yongqiang via namit)


Changes for Build #1185



27 tests failed.
REGRESSION:  
org.apache.hadoop.hive.cli.TestHBaseMinimrCliDriver.testCliDriver_hbase_bulk

Error Message:
Unexpected exception

Stack Trace:
junit.framework.AssertionFailedError: Unexpected exception
at 

[jira] [Created] (HIVE-2696) Conf variable to turn off setting the create time for a new partition

2012-01-05 Thread Kevin Wilfong (Created) (JIRA)
Conf variable to turn off setting the create time for a new partition
-

 Key: HIVE-2696
 URL: https://issues.apache.org/jira/browse/HIVE-2696
 Project: Hive
  Issue Type: New Feature
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
Priority: Minor


There are some cases where the user does not want the create time for a 
partition to change on INSERT OVERWRITE to that partition.  To accommodate 
this, we can add a new conf variable which will prevent the create time from 
being set.





[jira] [Resolved] (HIVE-2696) Conf variable to turn off setting the create time for a new partition

2012-01-05 Thread Kevin Wilfong (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Wilfong resolved HIVE-2696.
-

Resolution: Not A Problem

 Conf variable to turn off setting the create time for a new partition
 -

 Key: HIVE-2696
 URL: https://issues.apache.org/jira/browse/HIVE-2696
 Project: Hive
  Issue Type: New Feature
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
Priority: Minor

 There are some cases where the user does not want the create time for a 
 partition to change on INSERT OVERWRITE to that partition.  To accommodate 
 this, we can add a new conf variable which will prevent the create time from 
 being set.





[jira] [Created] (HIVE-2697) Ant compile-test target should be triggered from subprojects, not from top-level targets

2012-01-05 Thread Carl Steinbach (Created) (JIRA)
Ant compile-test target should be triggered from subprojects, not from 
top-level targets


 Key: HIVE-2697
 URL: https://issues.apache.org/jira/browse/HIVE-2697
 Project: Hive
  Issue Type: Improvement
  Components: Build Infrastructure
Affects Versions: 0.8.0
Reporter: Carl Steinbach
Assignee: Carl Steinbach
 Fix For: 0.8.1








errors while running hive queries

2012-01-05 Thread Bhavesh Shah
Hello,
I am trying to run hive queries but I am getting errors as:

hive> FROM (
 FROM t1
 MAP t1.patient_mrn, t1.encounter_date
 USING 'retrieve'
 AS mp1, mp2
 CLUSTER BY mp1) map_output
   INSERT OVERWRITE TABLE t3
 REDUCE map_output.mp1, map_output.mp2
 USING 'q1.txt'
 AS reducef1, reducef2;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201112281627_0097, Tracking URL =
http://localhost:50030/jobdetails.jsp?jobid=job_201112281627_0097
Kill Command = /home/hadoop/hadoop-0.20.2-cdh3u2//bin/hadoop job
-Dmapred.job.tracker=localhost:54311 -kill job_201112281627_0097
2011-12-31 03:10:46,391 Stage-1 map = 0%,  reduce = 0%
2011-12-31 03:11:29,794 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201112281627_0097 with errors
FAILED: Execution Error, return code 2 from
org.apache.hadoop.hive.ql.exec.MapRedTask
hive>



-- 
Regards,
Bhavesh Shah


[jira] [Commented] (HIVE-2629) Make a single Hive binary work with both 0.20.x and 0.23.0

2012-01-05 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181185#comment-13181185
 ] 

Phabricator commented on HIVE-2629:
---

amareshwarisr has commented on the revision HIVE-2629 [jira] Make a single 
Hive binary work with both 0.20.x and 0.23.0.

  The directory commonSecure should be changed to common/secure. Why don't we 
put those files in the directory common itself? Why create a new directory? 
Putting them in common would make the code cleaner.


INLINE COMMENTS
  build-common.xml:118-123 Why are these changes required? If not required, can 
you remove them?
  build.properties:13-17 Are we going to add new version here for all the 
upcoming versions as well?  I don't think we should do it this way.
  shims/build.xml:57 Can we change commonSecure to common.secure in all the 
places?

REVISION DETAIL
  https://reviews.facebook.net/D711


 Make a single Hive binary work with both 0.20.x and 0.23.0
 --

 Key: HIVE-2629
 URL: https://issues.apache.org/jira/browse/HIVE-2629
 Project: Hive
  Issue Type: Bug
  Components: Shims
Reporter: Carl Steinbach
Assignee: Thomas Weise
 Fix For: 0.9.0

 Attachments: HIVE-2629.D711.1.patch, HIVE-2629.D711.2.patch, 
 HIVE-2629.patch



