[jira] Updated: (HIVE-30) Hive web interface
[ https://issues.apache.org/jira/browse/HIVE-30?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-30: Attachment: hive-30-7.patch This patch adds: * the ability to specify the WAR location in hive-site.conf (changed to HiveConf) * a class path that refers to hadoop.root rather than a hardcoded version, i.e. 0.19.0 Hive web interface -- Key: HIVE-30 URL: https://issues.apache.org/jira/browse/HIVE-30 Project: Hadoop Hive Issue Type: Bug Components: Web UI Reporter: Jeff Hammerbacher Assignee: Edward Capriolo Priority: Minor Fix For: 0.2.0 Attachments: HIVE-30-5.patch, HIVE-30-6.patch, hive-30-7.patch, HIVE-30-A.patch, HIVE-30.patch, HIVE-30.patch Hive needs a web interface. The initial checkin should have: * simple schema browsing * query submission * query history (similar to MySQL's SHOW PROCESSLIST) A suggested feature: the ability to have a query notify the user when it has completed. Edward Capriolo has expressed some interest in driving this process. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-30) Hive web interface
[ https://issues.apache.org/jira/browse/HIVE-30?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12668624#action_12668624 ] Ashish Thusoo commented on HIVE-30: --- Hi Edward, It seems like the latest patch contains the output of svn stat instead of svn diff... Thanks, Ashish
[jira] Updated: (HIVE-253) rand() gets precomputated in compilation phase
[ https://issues.apache.org/jira/browse/HIVE-253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Thusoo updated HIVE-253: --- Affects Version/s: 0.2.0 Fix Version/s: 0.2.0 Marking this for the 0.2.0 version. rand() gets precomputated in compilation phase -- Key: HIVE-253 URL: https://issues.apache.org/jira/browse/HIVE-253 Project: Hadoop Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.2.0 Reporter: Zheng Shao Assignee: Ashish Thusoo Priority: Blocker Fix For: 0.2.0 SELECT * FROM t WHERE rand() < 0.01; Hive will say: No need to submit job, because the condition evaluates to false. The rand() function is special in that it evaluates to a different value every time it is called. We should disallow computing its value in the compilation phase. One way to do that is to add an annotation to UDFRand and check for it in the compilation phase.
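The annotation idea in the issue can be sketched in plain Java. This is a minimal sketch under stated assumptions: NonDeterministic, RandSketch, PlusOneSketch and ConstantFolding.canFoldAtCompileTime are hypothetical names chosen for illustration, not Hive's actual classes or API. The compiler-side check simply refuses to pre-evaluate any UDF whose class carries the marker.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Random;

// Hypothetical marker: the UDF may return a different value on every call.
// RUNTIME retention lets the compiler inspect it via reflection.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface NonDeterministic {}

// Hypothetical stand-in for UDFRand (not Hive's real class).
@NonDeterministic
class RandSketch {
  private final Random rng = new Random();

  public double evaluate() {
    return rng.nextDouble();
  }
}

// Hypothetical deterministic UDF, for contrast: safe to fold.
class PlusOneSketch {
  public int evaluate(int x) {
    return x + 1;
  }
}

final class ConstantFolding {
  private ConstantFolding() {}

  // A call with constant arguments may be pre-evaluated at compile time
  // only if its UDF class is not marked @NonDeterministic.
  static boolean canFoldAtCompileTime(Class<?> udfClass) {
    return !udfClass.isAnnotationPresent(NonDeterministic.class);
  }
}
```

With such a check, a predicate like rand() < 0.01 would be left for evaluation at run time, once per row, instead of being folded into a constant false during compilation.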
[jira] Assigned: (HIVE-253) rand() gets precomputated in compilation phase
[ https://issues.apache.org/jira/browse/HIVE-253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Thusoo reassigned HIVE-253: -- Assignee: Ashish Thusoo
[jira] Assigned: (HIVE-79) Print number of raws inserted to table(s) when the query is finished.
[ https://issues.apache.org/jira/browse/HIVE-79?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Antony reassigned HIVE-79: - Assignee: Suresh Antony Print number of raws inserted to table(s) when the query is finished. -- Key: HIVE-79 URL: https://issues.apache.org/jira/browse/HIVE-79 Project: Hadoop Hive Issue Type: New Feature Components: Logging Reporter: Suresh Antony Assignee: Suresh Antony Priority: Minor Fix For: 0.2.0 It would be good to print the number of rows inserted into each table at the end of a query. For example, the query insert overwrite table tab1 select a.* from tab2 a where a.col1 = 10; could print something like: tab1 rows=100
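One way to produce the suggested summary is a per-query tally keyed by destination table. This is a minimal sketch: InsertRowCounter is a hypothetical class, not part of Hive, and in a real Hive job the per-task counts would still have to be collected across map-reduce tasks (for example via Hadoop job counters), which is not shown here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical per-query tally of rows written to each destination table.
final class InsertRowCounter {
  // LinkedHashMap keeps tables in the order they were first written to.
  private final Map<String, Long> rowsByTable = new LinkedHashMap<>();

  // Call once for every row written to the given table.
  void recordRow(String table) {
    rowsByTable.merge(table, 1L, Long::sum);
  }

  // One "table rows=N" line per destination table.
  String summary() {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<String, Long> e : rowsByTable.entrySet()) {
      if (sb.length() > 0) {
        sb.append('\n');
      }
      sb.append(e.getKey()).append(" rows=").append(e.getValue());
    }
    return sb.toString();
  }
}
```

For a multi-insert query, each destination table gets its own line, in the order the tables were first written.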
[jira] Created: (HIVE-261) union all query hangs
union all query hangs - Key: HIVE-261 URL: https://issues.apache.org/jira/browse/HIVE-261 Project: Hadoop Hive Issue Type: Bug Components: Query Processor Reporter: Hao Liu We have this query: SELECT a.u, b.id FROM ( SELECT a1.u, a1.id as id FROM t_1 a1 WHERE a1.date = '2009-01-01' UNION ALL SELECT a2.u, a2.id as id FROM t_2 a2 WHERE a2.date = '2009-01-01' UNION ALL ... SELECT aN.u, aN.id as id FROM t_N an WHERE aN.date = '2009-01-01' ) a JOIN t b ON a.id = b.id WHERE b.date='2009-01-01' GROUP BY a.u, b.id When we union more than 20 tables, the query hangs. It looks like something is wrong in the compiler.
[jira] Commented: (HIVE-260) hive cli should not output the line by default
[ https://issues.apache.org/jira/browse/HIVE-260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12668681#action_12668681 ] Ashish Thusoo commented on HIVE-260: Does this show up along with the message that outputs the number of reducers, etc.? If so, wouldn't it just go away when running the cli in silent mode? hive cli should not output the line by default -- Key: HIVE-260 URL: https://issues.apache.org/jira/browse/HIVE-260 Project: Hadoop Hive Issue Type: Bug Affects Versions: 0.2.0 Reporter: Zheng Shao Priority: Blocker This is at the beginning of hive cli output: Hive history file=/tmp/zshao/hive_job_log_zshao_200901291532_-1964746650.txt We should remove it.
[jira] Updated: (HIVE-30) Hive web interface
[ https://issues.apache.org/jira/browse/HIVE-30?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-30: Attachment: hive-30-9.patch Newest patch. (Not an svn stat this time, DOH!) Hive web interface -- Key: HIVE-30 URL: https://issues.apache.org/jira/browse/HIVE-30 Project: Hadoop Hive Issue Type: Bug Components: Web UI Reporter: Jeff Hammerbacher Assignee: Edward Capriolo Priority: Minor Fix For: 0.2.0 Attachments: HIVE-30-5.patch, HIVE-30-6.patch, hive-30-7.patch, hive-30-9.patch, HIVE-30-A.patch, HIVE-30.patch, HIVE-30.patch Hive needs a web interface. The initial checkin should have: * simple schema browsing * query submission * query history (similar to MySQL's SHOW PROCESSLIST) A suggested feature: the ability to have a query notify the user when it has completed. Edward Capriolo has expressed some interest in driving this process.
[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function
[ https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12668699#action_12668699 ] Edward Capriolo commented on HIVE-259: -- The 95th percentile is very often used in Internet Service Provider billing, so that might be useful. Calculating a percentile amounts to sorting and then picking an element. The syntax could be: * PERCENTILE(column, .99) * PERCENTILE(column, .50) In this manner you could compute any percentile. Add PERCENTILE aggregate function - Key: HIVE-259 URL: https://issues.apache.org/jira/browse/HIVE-259 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Venky Iyer Compute at least the 25th, 50th, and 75th percentiles
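The "sort, then pick an element" calculation from the comment can be sketched in Java. This sketch assumes the nearest-rank convention (rank = ceil(p * n), 1-based), which is one of several percentile definitions; PercentileSketch is a hypothetical name, and a real Hive aggregate would have to work over distributed partial results rather than one in-memory array.

```java
import java.util.Arrays;

final class PercentileSketch {
  private PercentileSketch() {}

  /**
   * Nearest-rank percentile: sort a copy of the values, then return the
   * element at 1-based rank ceil(p * n). p is a fraction, e.g. 0.95.
   */
  static double percentile(double[] values, double p) {
    if (values.length == 0) {
      throw new IllegalArgumentException("no values");
    }
    if (p < 0.0 || p > 1.0) {
      throw new IllegalArgumentException("p must be in [0, 1]");
    }
    double[] sorted = values.clone(); // leave the caller's array untouched
    Arrays.sort(sorted);
    int rank = (int) Math.ceil(p * sorted.length);
    // Clamp so that p = 0 maps to the smallest element.
    int index = Math.max(rank, 1) - 1;
    return sorted[index];
  }
}
```

Under this sketch, PERCENTILE(column, .50) corresponds to percentile(values, 0.5). Interpolating conventions (for example, linear interpolation between neighboring ranks) give slightly different answers; nearest-rank is shown only because it matches the "sort, then pick an element" description.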
[jira] Commented: (HIVE-82) Augment build.xml with a target to build the forrest docs and javadocs
[ https://issues.apache.org/jira/browse/HIVE-82?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12668700#action_12668700 ] Edward Capriolo commented on HIVE-82: - We can also use the Hive-Web-Interface to display the javadoc. If we create a folder $HIVE_HOME/doc, the hive web server can load it as a static context. Augment build.xml with a target to build the forrest docs and javadocs -- Key: HIVE-82 URL: https://issues.apache.org/jira/browse/HIVE-82 Project: Hadoop Hive Issue Type: New Feature Components: Build Infrastructure Reporter: Jeff Hammerbacher See hadoop's build.xml, especially the targets docs and javadoc-dev
[jira] Commented: (HIVE-260) hive cli should not output the line by default
[ https://issues.apache.org/jira/browse/HIVE-260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12668706#action_12668706 ] Zheng Shao commented on HIVE-260: - No. The number of reducers is not there.
[jira] Created: (HIVE-262) outer join gets some duplicate rows in some scenarios
outer join gets some duplicate rows in some scenarios - Key: HIVE-262 URL: https://issues.apache.org/jira/browse/HIVE-262 Project: Hadoop Hive Issue Type: Bug Components: Query Processor Reporter: Namit Jain Assignee: Namit Jain SELECT * FROM src src1 JOIN src src2 ON (src1.key = src2.key AND src1.key < 10) RIGHT OUTER JOIN src src3 ON (src1.key = src3.key AND src3.key < 20); returns duplicate rows for the outer join.
[jira] Commented: (HIVE-106) Join operation fails for some queries
[ https://issues.apache.org/jira/browse/HIVE-106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12668711#action_12668711 ] Namit Jain commented on HIVE-106: - Josh, can you provide the data files for the activities and users tables used by the failing query? Join operation fails for some queries - Key: HIVE-106 URL: https://issues.apache.org/jira/browse/HIVE-106 Project: Hadoop Hive Issue Type: Bug Components: Query Processor Reporter: Josh Ferguson Assignee: Namit Jain Priority: Critical The tables are: CREATE TABLE activities (actor_id STRING, actee_id STRING, properties MAP<STRING, STRING>) PARTITIONED BY (account STRING, application STRING, dataset STRING, hour INT) CLUSTERED BY (actor_id, actee_id) INTO 32 BUCKETS ROW FORMAT DELIMITED COLLECTION ITEMS TERMINATED BY '44' MAP KEYS TERMINATED BY '58' STORED AS TEXTFILE; Detailed Table Information: Table(tableName:activities,dbName:default,owner:Josh,createTime:1228208598,lastAccessTime:0,retention:0,sd:StorageDescriptor(cols:[FieldSchema(name:actor_id,type:string,comment:null), FieldSchema(name:actee_id,type:string,comment:null), FieldSchema(name:properties,type:map<string,string>,comment:null)],location:/user/hive/warehouse/activities,inputFormat:org.apache.hadoop.mapred.TextInputFormat,outputFormat:org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat,compressed:false,numBuckets:32,serdeInfo:SerDeInfo(name:null,serializationLib:org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe,parameters:{colelction.delim=44,mapkey.delim=58,serialization.format=org.apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol}),bucketCols:[actor_id, actee_id],sortCols:[],parameters:{}),partitionKeys:[FieldSchema(name:account,type:string,comment:null), FieldSchema(name:application,type:string,comment:null), FieldSchema(name:dataset,type:string,comment:null), FieldSchema(name:hour,type:int,comment:null)],parameters:{}) CREATE TABLE users (id STRING, properties MAP<STRING, STRING>) PARTITIONED BY (account 
STRING, application STRING, dataset STRING, hour INT) CLUSTERED BY (id) INTO 32 BUCKETS ROW FORMAT DELIMITED COLLECTION ITEMS TERMINATED BY '44' MAP KEYS TERMINATED BY '58' STORED AS TEXTFILE; Detailed Table Information: Table(tableName:users,dbName:default,owner:Josh,createTime:1228208633,lastAccessTime:0,retention:0,sd:StorageDescriptor(cols:[FieldSchema(name:id,type:string,comment:null), FieldSchema(name:properties,type:map<string,string>,comment:null)],location:/user/hive/warehouse/users,inputFormat:org.apache.hadoop.mapred.TextInputFormat,outputFormat:org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat,compressed:false,numBuckets:32,serdeInfo:SerDeInfo(name:null,serializationLib:org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe,parameters:{colelction.delim=44,mapkey.delim=58,serialization.format=org.apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol}),bucketCols:[id],sortCols:[],parameters:{}),partitionKeys:[FieldSchema(name:account,type:string,comment:null), FieldSchema(name:application,type:string,comment:null), FieldSchema(name:dataset,type:string,comment:null), FieldSchema(name:hour,type:int,comment:null)],parameters:{}) A working query is: SELECT activities.* FROM activities WHERE activities.dataset='poke' AND activities.properties['verb'] = 'Dance'; A non-working query is: SELECT activities.*, users.* FROM activities LEFT OUTER JOIN users ON activities.actor_id = users.id WHERE activities.dataset='poke' AND activities.properties['verb'] = 'Dance'; The exception is: java.lang.RuntimeException: Hive 2 Internal error: cannot evaluate index expression on string at org.apache.hadoop.hive.ql.exec.ExprNodeIndexEvaluator.evaluate(ExprNodeIndexEvaluator.java:64) at org.apache.hadoop.hive.ql.exec.ExprNodeFuncEvaluator.evaluate(ExprNodeFuncEvaluator.java:72) at org.apache.hadoop.hive.ql.exec.ExprNodeFuncEvaluator.evaluate(ExprNodeFuncEvaluator.java:72) at org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:67) at 
org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:262) at org.apache.hadoop.hive.ql.exec.JoinOperator.createForwardJoinObject(JoinOperator.java:257) at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:477) at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:467) at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:467) at org.apache.hadoop.hive.ql.exec.JoinOperator.checkAndGenObject(JoinOperator.java:507) at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:489) at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:140) at
[jira] Updated: (HIVE-262) outer join gets some duplicate rows in some scenarios
[ https://issues.apache.org/jira/browse/HIVE-262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-262: Attachment: patch.262.1.txt outer join gets some duplicate rows in some scenarios - Key: HIVE-262 URL: https://issues.apache.org/jira/browse/HIVE-262 Project: Hadoop Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.2.0 Reporter: Namit Jain Assignee: Namit Jain Fix For: 0.2.0 Attachments: patch.262.1.txt SELECT * FROM src src1 JOIN src src2 ON (src1.key = src2.key AND src1.key < 10) RIGHT OUTER JOIN src src3 ON (src1.key = src3.key AND src3.key < 20); returns duplicate rows for the outer join.
[jira] Updated: (HIVE-262) outer join gets some duplicate rows in some scenarios
[ https://issues.apache.org/jira/browse/HIVE-262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-262: Fix Version/s: 0.2.0 Affects Version/s: 0.2.0 Status: Patch Available (was: Open)
[jira] Updated: (HIVE-262) outer join gets some duplicate rows in some scenarios
[ https://issues.apache.org/jira/browse/HIVE-262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-262: Status: Open (was: Patch Available) outer join gets some duplicate rows in some scenarios - Key: HIVE-262 URL: https://issues.apache.org/jira/browse/HIVE-262 Project: Hadoop Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.2.0 Reporter: Namit Jain Assignee: Namit Jain Fix For: 0.2.0 Attachments: patch.262.1.txt, patch262.2.txt SELECT * FROM src src1 JOIN src src2 ON (src1.key = src2.key AND src1.key < 10) RIGHT OUTER JOIN src src3 ON (src1.key = src3.key AND src3.key < 20); returns duplicate rows for the outer join.
[jira] Updated: (HIVE-262) outer join gets some duplicate rows in some scenarios
[ https://issues.apache.org/jira/browse/HIVE-262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-262: Status: Patch Available (was: Open)
[jira] Updated: (HIVE-262) outer join gets some duplicate rows in some scenarios
[ https://issues.apache.org/jira/browse/HIVE-262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-262: Attachment: patch262.2.txt Forgot to update the parse result files.
[jira] Assigned: (HIVE-223) when using map-side aggregates - perform single map-reduce group-by
[ https://issues.apache.org/jira/browse/HIVE-223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain reassigned HIVE-223: --- Assignee: Namit Jain when using map-side aggregates - perform single map-reduce group-by --- Key: HIVE-223 URL: https://issues.apache.org/jira/browse/HIVE-223 Project: Hadoop Hive Issue Type: Improvement Components: Query Processor Reporter: Joydeep Sen Sarma Assignee: Namit Jain Today, even when we do map-side aggregates, we run multiple map-reduce jobs. However, the reason for doing multiple map-reduce group-bys (for single group-bys) was the fear of skew. When we do map-side aggregates, skew should mostly not exist. There can be two reasons for skew: - a large number of entries for a single grouping set: map-side aggregates should take care of this - a bad hash function that sends too much data to one reducer: we should be able to take care of this by having good hash functions (and prime-number reducer counts) So I think we should be able to do a single-stage map-reduce when doing map-side aggregates.
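The single-stage plan argued for above can be simulated in a few lines of Java. This is a sketch under assumptions: SingleStageGroupBy and its methods are hypothetical names, the aggregate is a simple count, and the partitioner uses the same (hashCode() & Integer.MAX_VALUE) % numReducers form as Hadoop's default HashPartitioner. Because each mapper emits at most one partial record per key, a key with many entries no longer floods a single reducer, and the reducers only merge partial counts.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SingleStageGroupBy {
  private SingleStageGroupBy() {}

  // Map side: collapse one mapper's input into a single partial count per key.
  static Map<String, Long> mapSideAggregate(List<String> split) {
    Map<String, Long> partial = new HashMap<>();
    for (String key : split) {
      partial.merge(key, 1L, Long::sum);
    }
    return partial;
  }

  // Partitioner: non-negative hash modulo the reducer count.
  static int partition(String key, int numReducers) {
    return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
  }

  // One map-reduce stage: partial aggregates are shuffled by key and merged
  // on the reducers, so no second job is needed.
  static List<Map<String, Long>> run(List<List<String>> splits, int numReducers) {
    List<Map<String, Long>> reducers = new ArrayList<>();
    for (int r = 0; r < numReducers; r++) {
      reducers.add(new HashMap<>());
    }
    for (List<String> split : splits) {
      for (Map.Entry<String, Long> e : mapSideAggregate(split).entrySet()) {
        int target = partition(e.getKey(), numReducers);
        reducers.get(target).merge(e.getKey(), e.getValue(), Long::sum);
      }
    }
    return reducers;
  }
}
```

Calling run(splits, 7) uses a prime reducer count, in line with the suggestion in the issue; the result is one map per reducer, and every key ends up on exactly one of them with its final count.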
[jira] Resolved: (HIVE-260) hive cli should not output the line by default
[ https://issues.apache.org/jira/browse/HIVE-260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao resolved HIVE-260. - Resolution: Invalid The -S option does remove that line.