[jira] Created: (HIVE-856) allow concat to take more than 2 arguments
allow concat to take more than 2 arguments
------------------------------------------

Key: HIVE-856
URL: https://issues.apache.org/jira/browse/HIVE-856
Project: Hadoop Hive
Issue Type: Improvement
Reporter: S. Alex Smith
Priority: Minor

MySQL's concat allows concat('a', 'b', 'c'), but Hive's currently accepts only two arguments.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
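As a rough illustration of the requested behaviour (not Hive's implementation), a variadic concat can be built by folding a two-argument version over the argument list. The function names are hypothetical, and None stands in for SQL NULL; MySQL's CONCAT returns NULL when any argument is NULL, which the sketch mirrors.

```python
from functools import reduce
from typing import Optional


def concat2(a: Optional[str], b: Optional[str]) -> Optional[str]:
    """Two-argument concat; None plays the role of SQL NULL."""
    if a is None or b is None:
        return None
    return a + b


def concat(*args: Optional[str]) -> Optional[str]:
    """Variadic concat, built by folding concat2 left to right."""
    if not args:
        raise TypeError("concat requires at least one argument")
    return reduce(concat2, args)


three = concat("a", "b", "c")
with_null = concat("a", None, "c")
```

Here `three` is the three strings joined, and `with_null` is None because one argument is NULL.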
[jira] Created: (HIVE-797) mappers should report life in ways other than emitting data
mappers should report life in ways other than emitting data
-----------------------------------------------------------

Key: HIVE-797
URL: https://issues.apache.org/jira/browse/HIVE-797
Project: Hadoop Hive
Issue Type: Bug
Reporter: S. Alex Smith

Mappers that perform a great deal of aggregation can be killed by a timeout even though they are running successfully. For example, in the following query the group-by operator stops the mapper from returning any rows of data until the map is entirely finished. If the data processing takes longer than the timeout limit, the job will fail. The mapper should instead give the tracker some indication that it is busy working. Alternatively, the tracker could ping the mapper with an appropriate question or warning before it sends a kill signal.

    FROM (
      FROM my_table
      SELECT TRANSFORM(my_data) USING 'my_boolean_function' AS boolean_output
    ) a
    SELECT boolean_output, COUNT(1)
    GROUP BY boolean_output
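A minimal sketch of the first suggestion, assuming the mapper is handed some callback for signalling liveness. The `report_progress` callback, the interval, and the function name are all hypothetical; this is not Hadoop's or Hive's API, only the shape of the idea: while aggregating, call the callback whenever enough time has passed, even though no rows have been emitted yet.

```python
import time
from collections import Counter
from typing import Callable, Dict, Hashable, Iterable


def aggregate_with_heartbeat(
    rows: Iterable[Hashable],
    report_progress: Callable[[], None],
    interval_seconds: float = 60.0,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[Hashable, int]:
    """Count rows per key, signalling liveness at most once per interval."""
    counts: Counter = Counter()
    last_report = clock()
    for row in rows:
        counts[row] += 1
        now = clock()
        if now - last_report >= interval_seconds:
            report_progress()  # hypothetical "I am still working" signal
            last_report = now
    return dict(counts)
```

The `clock` parameter exists only so the timing can be replaced in tests.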
[jira] Created: (HIVE-607) Create statistical UDFs.
Create statistical UDFs.
------------------------

Key: HIVE-607
URL: https://issues.apache.org/jira/browse/HIVE-607
Project: Hadoop Hive
Issue Type: New Feature
Components: Query Processor
Reporter: S. Alex Smith
Priority: Minor

Create UDFs replicating:

    STD()                   Return the population standard deviation
    STDDEV_POP() (v5.0.3)   Return the population standard deviation
    STDDEV_SAMP() (v5.0.3)  Return the sample standard deviation
    STDDEV()                Return the population standard deviation
    SUM()                   Return the sum
    VAR_POP() (v5.0.3)      Return the population standard variance
    VAR_SAMP() (v5.0.3)     Return the sample variance
    VARIANCE() (v4.1)       Return the population standard variance

as found at http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html.
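For reference, the difference between the population and sample variants in the list above is only the divisor (n versus n - 1). The following is a plain-Python sketch of those definitions, not a Hive UDF; the function names simply echo the MySQL names, and the sample data is made up.

```python
import math
from typing import Sequence


def var_pop(xs: Sequence[float]) -> float:
    """Population variance: mean squared deviation, divided by n."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def var_samp(xs: Sequence[float]) -> float:
    """Sample variance: same sum of squares, divided by n - 1."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


def stddev_pop(xs: Sequence[float]) -> float:
    return math.sqrt(var_pop(xs))


def stddev_samp(xs: Sequence[float]) -> float:
    return math.sqrt(var_samp(xs))


values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
population_variance = var_pop(values)   # sum of squares 32 over n = 8
sample_variance = var_samp(values)      # sum of squares 32 over n - 1 = 7
```

Per the listing, STD() and STDDEV() correspond to `stddev_pop`, and VARIANCE() to `var_pop`.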
[jira] Created: (HIVE-517) Silent flag does not work on local jobs.
Silent flag does not work on local jobs.
----------------------------------------

Key: HIVE-517
URL: https://issues.apache.org/jira/browse/HIVE-517
Project: Hadoop Hive
Issue Type: Bug
Components: Clients
Reporter: S. Alex Smith

The commands

    hive2 -S -e "from tmp_foo select count(1)" > my_stdout.txt

and

    hive2 -S -hiveconf mapred.job.tracker=local -hiveconf mapred.local.dir=/tmp/foo -e "from tmp_foo select count(1)" > my_stdout.txt

give different results. The former looks like:

    56

and the latter looks like:

    plan = /tmp/plan61908.xml
    Number of reduce tasks determined at compile time: 1
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapred.reduce.tasks=<number>
    Job running in-process (local Hadoop)
    map = 100%, reduce =0%
    map = 100%, reduce =100%
    Ended Job = job_local_1
    56
[jira] Created: (HIVE-484) Add flag to redirect system messages to stderr.
Add flag to redirect system messages to stderr.
-----------------------------------------------

Key: HIVE-484
URL: https://issues.apache.org/jira/browse/HIVE-484
Project: Hadoop Hive
Issue Type: Improvement
Components: Logging
Reporter: S. Alex Smith

(It is possible that this is a Hadoop issue rather than a Hive one.)

Hive currently writes job-status information, such as the percentage of the job completed, to stdout. This information can be removed with the silent flag -S, but it cannot reasonably be redirected. Users who want to capture the output of jobs are therefore stuck with either getting status information in their capture or forgoing status entirely. Both options are bad. It would be nice if there were a way to redirect everything that -S silences to stderr.
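The separation being asked for can be sketched in a few lines: result rows go to one stream and status messages to another, so a caller can capture the first without the second. The function name and the sample messages are illustrative assumptions, not Hive code.

```python
import io
import sys
from typing import Iterable, TextIO


def write_job_output(
    rows: Iterable[str],
    status: Iterable[str],
    out: TextIO = sys.stdout,
    err: TextIO = sys.stderr,
) -> None:
    """Write status messages to err and result rows to out."""
    for message in status:
        print(message, file=err)
    for row in rows:
        print(row, file=out)


# In-memory streams stand in for stdout and stderr in this demonstration.
captured_out, captured_err = io.StringIO(), io.StringIO()
write_job_output(
    rows=["56"],
    status=["map = 100%, reduce = 100%"],
    out=captured_out,
    err=captured_err,
)
```

With a split like this, redirecting stdout to a file would capture only the data, while progress would still appear on the terminal.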
[jira] Created: (HIVE-475) Lines exceeding mapred.linerecordreader.maxlength should cause exceptions
Lines exceeding mapred.linerecordreader.maxlength should cause exceptions
-------------------------------------------------------------------------

Key: HIVE-475
URL: https://issues.apache.org/jira/browse/HIVE-475
Project: Hadoop Hive
Issue Type: Improvement
Components: Serializers/Deserializers
Reporter: S. Alex Smith

Currently, rows of data that exceed mapred.linerecordreader.maxlength vanish silently. Instead, an option should be added to indicate what to do in this circumstance (drop the entire line, truncate after the maximum length, or fail the job), and the default behavior should be job failure.
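A small sketch of the three proposed behaviours, with failure as the default. The enum, the function, and the error message are hypothetical names chosen for illustration; nothing here reflects an existing Hive or Hadoop option.

```python
from enum import Enum
from typing import Iterable, List


class OverlongPolicy(Enum):
    SKIP = "skip"          # drop the entire line
    TRUNCATE = "truncate"  # keep only the first max_length characters
    FAIL = "fail"          # abort, the proposed default


def read_lines(
    lines: Iterable[str],
    max_length: int,
    policy: OverlongPolicy = OverlongPolicy.FAIL,
) -> List[str]:
    """Apply the chosen policy to every line longer than max_length."""
    kept: List[str] = []
    for number, line in enumerate(lines, start=1):
        if len(line) <= max_length:
            kept.append(line)
        elif policy is OverlongPolicy.SKIP:
            continue
        elif policy is OverlongPolicy.TRUNCATE:
            kept.append(line[:max_length])
        else:
            raise ValueError(
                f"line {number} has {len(line)} characters, limit is {max_length}"
            )
    return kept
```

Failing by default means a too-small limit is noticed immediately instead of showing up later as missing rows.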
[jira] Created: (HIVE-459) Allow limited 'OR' in join clauses
Allow limited 'OR' in join clauses
----------------------------------

Key: HIVE-459
URL: https://issues.apache.org/jira/browse/HIVE-459
Project: Hadoop Hive
Issue Type: Improvement
Components: Query Processor
Reporter: S. Alex Smith
Priority: Minor

A (somewhat) common use case for full outer joins is

    FROM a
    FULL OUTER JOIN b ON (b.foo = a.foo)
    FULL OUTER JOIN c ON (c.foo = a.foo OR c.foo = b.foo)

It would be nice if this (or equivalent functionality) could be supported by Hive. Note that this is a specific use of OR that would allow rows from all three tables to be clustered on the same key.
[jira] Created: (HIVE-353) Comments can't have semi-colons
Comments can't have semi-colons
-------------------------------

Key: HIVE-353
URL: https://issues.apache.org/jira/browse/HIVE-353
Project: Hadoop Hive
Issue Type: Bug
Components: Query Processor
Reporter: S. Alex Smith
Priority: Minor

    hive> CREATE TABLE tmp_foo(foo DOUBLE COMMENT ';');
    FAILED: Parse Error: line 2:7 mismatched input 'TABLE' expecting TEMPORARY in create function statement
    hive> CREATE TABLE tmp_foo(foo DOUBLE);
    OK
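The issue does not state the cause, but the symptom would be consistent with input being split on every semicolon, including ones inside string literals. Purely as an assumption-laden sketch (the function is hypothetical and ignores escaped quotes), a splitter that treats quoted semicolons as ordinary characters could look like this:

```python
from typing import List, Optional


def split_statements(script: str) -> List[str]:
    """Split on semicolons that are outside single- or double-quoted strings."""
    statements: List[str] = []
    current: List[str] = []
    quote: Optional[str] = None  # the quote character we are inside, if any
    for ch in script:
        if quote is not None:
            current.append(ch)
            if ch == quote:
                quote = None
        elif ch in ("'", '"'):
            quote = ch
            current.append(ch)
        elif ch == ";":
            statement = "".join(current).strip()
            if statement:
                statements.append(statement)
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements


parts = split_statements("CREATE TABLE tmp_foo(foo DOUBLE COMMENT ';'); SELECT 1")
```

Here the semicolon inside the COMMENT literal stays within the first statement.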
[jira] Created: (HIVE-351) Dividing by zero raises an exception.
Dividing by zero raises an exception.
-------------------------------------

Key: HIVE-351
URL: https://issues.apache.org/jira/browse/HIVE-351
Project: Hadoop Hive
Issue Type: Bug
Reporter: S. Alex Smith

Can I just have a NULL instead?
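The requested semantics, sketched in Python with None standing in for NULL. The function name is hypothetical; the only point is that a zero divisor yields NULL rather than an exception, and that NULL inputs propagate.

```python
from typing import Optional


def sql_divide(numerator: Optional[float], divisor: Optional[float]) -> Optional[float]:
    """Divide, returning None (NULL) for NULL inputs or a zero divisor."""
    if numerator is None or divisor is None or divisor == 0:
        return None
    return numerator / divisor
```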
[jira] Created: (HIVE-297) Parses doesn't catch certain type errors.
Parses doesn't catch certain type errors.
-----------------------------------------

Key: HIVE-297
URL: https://issues.apache.org/jira/browse/HIVE-297
Project: Hadoop Hive
Issue Type: Bug
Components: Query Processor
Reporter: S. Alex Smith

If table_a and table_c have schemas:

    userid bigint

and table_b has a schema:

    userid string

then the following will make it through the parser, but will fail when running:

    FROM (
      FROM table_a SELECT userid
      UNION ALL
      FROM table_b SELECT userid
    ) unioned
    INSERT OVERWRITE TABLE table_c SELECT *;

Specifically, the map step will throw:

    java.lang.RuntimeException: org.apache.hadoop.hive.serde2.SerDeException: java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.String

I have interpreted this as a bug in the parser, but it could also be viewed as a bug about not auto-casting. Note that this can be worked around by using explicit CAST statements.
[jira] Updated: (HIVE-297) Parses doesn't catch certain type errors.
[ https://issues.apache.org/jira/browse/HIVE-297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

S. Alex Smith updated HIVE-297:
-------------------------------

Description:
The following query:

    FROM (
      FROM (FROM my_table SELECT CAST(userid AS BIGINT) AS userid) a SELECT userid
      UNION ALL
      FROM (FROM my_table SELECT CAST(userid AS STRING) AS userid) b SELECT userid
    ) unioned
    SELECT DISTINCT userid;

is accepted by the parser, but throws the following at run time:

    java.lang.RuntimeException: org.apache.hadoop.hive.serde2.SerDeException: java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.String

(Note that this seems less silly if the inner queries are different tables with userid stored as a bigint and a string, respectively.)

I have interpreted this as a bug in the parser, but it could also be viewed as a bug about not auto-casting. This can be worked around by using explicit CAST statements.

was:
If table_a and table_c have schemas:

    userid bigint

and table_b has a schema:

    userid string

then the following will make it through the parser, but will fail when running:

    FROM (
      FROM table_a SELECT userid
      UNION ALL
      FROM table_b SELECT userid
    ) unioned
    INSERT OVERWRITE TABLE table_c SELECT *;

Specifically, the map step will throw:

    java.lang.RuntimeException: org.apache.hadoop.hive.serde2.SerDeException: java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.String

I have interpreted this as a bug in the parser, but it could also be viewed as a bug about not auto-casting. Note that this can be worked around by using explicit CAST statements.

Correcting example to one which actually exhibits the problem.

Parses doesn't catch certain type errors.
-----------------------------------------

Key: HIVE-297
URL: https://issues.apache.org/jira/browse/HIVE-297
Project: Hadoop Hive
Issue Type: Bug
Components: Query Processor
Reporter: S. Alex Smith
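A sketch of the kind of compile-time check this issue asks for: compare the column types of the two UNION ALL branches before running anything. Schemas are modelled as hypothetical lists of (name, type) pairs; this is not Hive's semantic analyzer, only an illustration of where the mismatch could be caught.

```python
from typing import List, Tuple

Schema = List[Tuple[str, str]]  # (column name, type name), a made-up model


def check_union_schemas(left: Schema, right: Schema) -> List[str]:
    """Return a description of each mismatch between two UNION ALL branches."""
    if len(left) != len(right):
        return [f"column count differs: {len(left)} vs {len(right)}"]
    problems = []
    for (left_name, left_type), (_, right_type) in zip(left, right):
        if left_type.lower() != right_type.lower():
            problems.append(f"column {left_name!r}: {left_type} vs {right_type}")
    return problems


mismatches = check_union_schemas([("userid", "bigint")], [("userid", "string")])
```

An empty result means the branches agree; a non-empty one could be reported as an error at compile time or used to decide where to insert a cast.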
[jira] Created: (HIVE-289) superfluous parenthesis break queries
superfluous parenthesis break queries
-------------------------------------

Key: HIVE-289
URL: https://issues.apache.org/jira/browse/HIVE-289
Project: Hadoop Hive
Issue Type: Bug
Components: Query Processor
Reporter: S. Alex Smith

Queries like

    from (my_table) select *;

break. This has a deleterious effect on automated query construction. In particular, the following works well:

    FROM %s do something complicated % my_table
    FROM %s do something complicated % (FROM my_other_table SELECT my_row WHERE my_table.ds='2009-02-13')

but if extra parentheses were correctly ignored, the following would also work:

    FROM (%s) do something complicated % my_table
    FROM (%s) do something complicated % FROM my_other_table SELECT my_row WHERE my_table.ds='2009-02-13'

which would allow a much more convenient and generic nesting of queries.
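Until extra parentheses are accepted, a query generator has to decide for itself when to add them. The helper below is a hypothetical workaround, not part of Hive: it leaves a bare table name alone and parenthesizes anything else, using a deliberately crude test for "bare table name".

```python
def as_from_source(source: str) -> str:
    """Wrap a subquery in parentheses, but leave a bare table name unwrapped."""
    source = source.strip()
    # Crude heuristic (an assumption): only letters, digits, '_' and '.'
    # means this is a table name rather than a query.
    is_bare_table = source.replace("_", "").replace(".", "").isalnum()
    return source if is_bare_table else f"({source})"


template = "FROM {source} SELECT *"
from_table = template.format(source=as_from_source("my_table"))
from_query = template.format(
    source=as_from_source("FROM my_other_table SELECT my_row")
)
```

If `FROM (my_table)` were accepted, the helper could disappear and the template could always write `FROM ({source})`.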
[jira] Resolved: (HIVE-155) parenthesis should not mess up a group by
[ https://issues.apache.org/jira/browse/HIVE-155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

S. Alex Smith resolved HIVE-155.
--------------------------------

Resolution: Duplicate

This is subsumed by a general parenthesis issue I just submitted.

parenthesis should not mess up a group by
-----------------------------------------

Key: HIVE-155
URL: https://issues.apache.org/jira/browse/HIVE-155
Project: Hadoop Hive
Issue Type: Improvement
Components: Clients
Reporter: S. Alex Smith
Priority: Minor

Using parentheses breaks group by statements, in the sense that:

    ... GROUP BY (mytable.a, mytable.b)

results in a parse error:

    FAILED: Parse Error: line 1:136 mismatched input ',' expecting )

This should be made to work equivalently to

    ... GROUP BY mytable.a, mytable.b
[jira] Created: (HIVE-275) hive -e 'query' writes non-output to stdout
hive -e 'query' writes non-output to stdout
-------------------------------------------

Key: HIVE-275
URL: https://issues.apache.org/jira/browse/HIVE-275
Project: Hadoop Hive
Issue Type: Bug
Components: Clients
Reporter: S. Alex Smith

A command like:

    hive -e 'select * from my_table'

produces output like:

    Hive history file=/tmp/asmith/hive_job_log_asmith_200902051423_1455103524.txt
    OK
    1 [4, 4]
    1 [0, 1]
    0 [0, 0]
    0 [1, 0]
    Time taken: 2.413 seconds

where all seven lines go to stdout, instead of lines 1, 2, and 7 going to stderr.
[jira] Updated: (HIVE-275) split log output from hive -e 'query' to something other than stdout
[ https://issues.apache.org/jira/browse/HIVE-275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

S. Alex Smith updated HIVE-275:
-------------------------------

Priority: Minor (was: Major)
Description:
A command like:

    hive -e 'select * from my_table'

produces output like:

    Hive history file=/tmp/asmith/hive_job_log_asmith_200902051423_1455103524.txt
    OK
    1 [4, 4]
    1 [0, 1]
    0 [0, 0]
    0 [1, 0]
    Time taken: 2.413 seconds

all of which goes to stdout. The non-data messages can be removed using '-S', but it would be nice to have a way to instead redirect them to (for example) stderr.

was:
A command like:

    hive -e 'select * from my_table'

produces output like:

    Hive history file=/tmp/asmith/hive_job_log_asmith_200902051423_1455103524.txt
    OK
    1 [4, 4]
    1 [0, 1]
    0 [0, 0]
    0 [1, 0]
    Time taken: 2.413 seconds

where all seven lines go to stdout, instead of lines 1, 2, and 7 going to stderr.

Issue Type: Improvement (was: Bug)
Summary: split log output from hive -e 'query' to something other than stdout (was: hive -e 'query' writes non-output to stdout)

split log output from hive -e 'query' to something other than stdout
--------------------------------------------------------------------

Key: HIVE-275
URL: https://issues.apache.org/jira/browse/HIVE-275
Project: Hadoop Hive
Issue Type: Improvement
Components: Clients
Reporter: S. Alex Smith
Priority: Minor
[jira] Commented: (HIVE-275) split log output from hive -e 'query' to something other than stdout
[ https://issues.apache.org/jira/browse/HIVE-275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12670976#action_12670976 ]

S. Alex Smith commented on HIVE-275:
------------------------------------

I see my update was out of sync with your comments. Yeah, -S seems to be what I want, although it would be nice to be able to get the log messages without having them pollute the output (as now requested). Certainly not vital.

split log output from hive -e 'query' to something other than stdout
--------------------------------------------------------------------

Key: HIVE-275
URL: https://issues.apache.org/jira/browse/HIVE-275
Project: Hadoop Hive
Issue Type: Improvement
Components: Clients
Reporter: S. Alex Smith
Priority: Minor

A command like:

    hive -e 'select * from my_table'

produces output like:

    Hive history file=/tmp/asmith/hive_job_log_asmith_200902051423_1455103524.txt
    OK
    1 [4, 4]
    1 [0, 1]
    0 [0, 0]
    0 [1, 0]
    Time taken: 2.413 seconds

all of which goes to stdout. The non-data messages can be removed using '-S', but it would be nice to have a way to instead redirect them to (for example) stderr.
[jira] Created: (HIVE-251) Failures in Transform don't stop the job
Failures in Transform don't stop the job
----------------------------------------

Key: HIVE-251
URL: https://issues.apache.org/jira/browse/HIVE-251
Project: Hadoop Hive
Issue Type: Bug
Components: Serializers/Deserializers
Reporter: S. Alex Smith

If the program executed via SELECT TRANSFORM() USING 'foo' exits with a non-zero exit status, Hive proceeds as if nothing bad happened. The main way the user learns that something went wrong is by checking the logs (probably because he got no output). This is doubly bad if the program fails only part of the time (say, on certain inputs), since the job will still produce output and the problem will likely go undetected.
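The behaviour being requested is the usual one for a parent process: treat a non-zero exit status from the child as a failure instead of ignoring it. A minimal sketch using Python's subprocess module follows; the wrapper name is hypothetical, and this is not how Hive launches TRANSFORM scripts.

```python
import subprocess
import sys
from typing import List


def run_transform(command: List[str], input_text: str) -> str:
    """Pipe input_text through command, failing loudly on a non-zero exit."""
    result = subprocess.run(
        command,
        input=input_text,
        capture_output=True,
        text=True,
        check=True,  # raises subprocess.CalledProcessError on non-zero exit
    )
    return result.stdout


# A stand-in transform script: upper-case whatever arrives on stdin.
upper_script = "import sys; sys.stdout.write(sys.stdin.read().upper())"
transformed = run_transform([sys.executable, "-c", upper_script], "abc")
```

Because the exception carries the exit status, a script that fails on only some inputs would stop the job at the first failure rather than produce partial output.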
[jira] Updated: (HIVE-221) select * fails on subqueries without alias
[ https://issues.apache.org/jira/browse/HIVE-221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

S. Alex Smith updated HIVE-221:
-------------------------------

Summary: select * fails on subqueries without alias (was: select * fails on tables without alias)

select * fails on subqueries without alias
------------------------------------------

Key: HIVE-221
URL: https://issues.apache.org/jira/browse/HIVE-221
Project: Hadoop Hive
Issue Type: Bug
Components: Query Processor
Reporter: S. Alex Smith
Priority: Minor

The following query fails:

    SELECT * FROM (SELECT * FROM tmp_asmith);

Additionally, although the error message is frequently "... expected Identifier", in certain more complicated expressions this is not the case. For example, in

    SELECT * FROM (SELECT * FROM tmp_asmith UNION ALL SELECT * FROM (SELECT * FROM tmp_asmith)) t;

the error message is

    FAILED: Parse Error: line 1:41 mismatched input 'UNION' expecting )

(but it works fine if (SELECT * FROM tmp_asmith) is given an alias).
[jira] Created: (HIVE-221) select * fails on tables without alias
select * fails on tables without alias
--------------------------------------

Key: HIVE-221
URL: https://issues.apache.org/jira/browse/HIVE-221
Project: Hadoop Hive
Issue Type: Bug
Components: Query Processor
Reporter: S. Alex Smith
Priority: Minor

The following query fails:

    SELECT * FROM (SELECT * FROM tmp_asmith);

Additionally, although the error message is frequently "... expected Identifier", in certain more complicated expressions this is not the case. For example, in

    SELECT * FROM (SELECT * FROM tmp_asmith UNION ALL SELECT * FROM (SELECT * FROM tmp_asmith)) t;

the error message is

    FAILED: Parse Error: line 1:41 mismatched input 'UNION' expecting )

(but it works fine if (SELECT * FROM tmp_asmith) is given an alias).
[jira] Created: (HIVE-198) Parse errors report incorrectly.
Parse errors report incorrectly.
--------------------------------

Key: HIVE-198
URL: https://issues.apache.org/jira/browse/HIVE-198
Project: Hadoop Hive
Issue Type: Bug
Components: Query Processor
Reporter: S. Alex Smith

The following two queries fail:

    CREATE TABLE output_table(userid, bigint);
    CREATE TABLE output_table(userid bigint, age int, sex string, location string);

each giving the error message

    FAILED: Parse Error: line 1:16 mismatched input 'TABLE' expecting KW_TEMPORARY

Although one might not catch it from the error message, the problem with the first is that there is a comma between userid and bigint, and the problem with the second is that location is a reserved keyword. Reported errors should describe the nature of the error more accurately, for example "no type given for column 'userid'" or "'location' is not a valid column name".
[jira] Created: (HIVE-156) Allow != in place of <>
Allow != in place of <>
-----------------------

Key: HIVE-156
URL: https://issues.apache.org/jira/browse/HIVE-156
Project: Hadoop Hive
Issue Type: Improvement
Reporter: S. Alex Smith
Priority: Trivial

I'm used to using != for inequality. It would be nice if Hive supported this as an alternative to <>.