[jira] Created: (HIVE-856) allow concat to take more than 2 arguments

2009-09-24 Thread S. Alex Smith (JIRA)
allow concat to take more than 2 arguments
--

 Key: HIVE-856
 URL: https://issues.apache.org/jira/browse/HIVE-856
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: S. Alex Smith
Priority: Minor


MySQL's concat accepts more than two arguments, e.g. concat('a', 'b', 'c'), 
but Hive's currently accepts only two.
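
A minimal sketch of the requested semantics, written in Python for illustration rather than as an actual Hive UDF. The name concat_all is hypothetical, and returning NULL (None) when any argument is NULL is an assumption carried over from MySQL's CONCAT.

```python
from typing import Optional


def concat_all(*args: Optional[str]) -> Optional[str]:
    """Concatenate any number of strings.

    None stands in for SQL NULL: if any argument is None the result is None
    (assumed behavior, modeled on MySQL's CONCAT).
    """
    if any(arg is None for arg in args):
        return None
    return "".join(args)


print(concat_all("a", "b", "c"))
```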

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-797) mappers should report life in ways other than emitting data

2009-08-25 Thread S. Alex Smith (JIRA)
mappers should report life in ways other than emitting data
---

 Key: HIVE-797
 URL: https://issues.apache.org/jira/browse/HIVE-797
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: S. Alex Smith


Mappers that perform a great deal of aggregation can be killed by a timeout 
even while they are running successfully.  For example, in the following query 
the group by operator stops the mapper from returning any rows of data until 
the map is entirely finished.  If the data processing takes longer than the 
timeout limit, the job will fail.  The mapper should instead give the tracker 
some indication that it is busy working.  Alternatively, the tracker could ping 
the mapper with an appropriate question or warning before it sends a kill signal.

FROM (
  FROM my_table
  SELECT TRANSFORM(my_data)
  USING 'my_boolean_function'
  AS boolean_output) a
SELECT boolean_output, COUNT(1)
GROUP BY boolean_output
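
The idea of reporting life without emitting data can be sketched as follows. This is an illustration in Python, not Hive or Hadoop code: count_by_key and report_progress are hypothetical names, and the callback stands in for whatever liveness signal the tracker would accept.

```python
from collections import Counter
from typing import Callable, Iterable


def count_by_key(rows: Iterable[str],
                 report_progress: Callable[[], None],
                 every: int = 1000) -> Counter:
    """Aggregate rows in memory, pinging the tracker every `every` rows.

    No output is produced until the loop ends, so the periodic callback is
    the only sign of life the caller sees while the aggregation runs.
    """
    counts: Counter = Counter()
    for i, key in enumerate(rows, start=1):
        counts[key] += 1
        if i % every == 0:
            report_progress()  # hypothetical "still alive" signal
    return counts
```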




[jira] Created: (HIVE-607) Create statistical UDFs.

2009-07-02 Thread S. Alex Smith (JIRA)
Create statistical UDFs.


 Key: HIVE-607
 URL: https://issues.apache.org/jira/browse/HIVE-607
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: S. Alex Smith
Priority: Minor


Create UDFs replicating:
  STD()                   Return the population standard deviation
  STDDEV_POP() (v5.0.3)   Return the population standard deviation
  STDDEV_SAMP() (v5.0.3)  Return the sample standard deviation
  STDDEV()                Return the population standard deviation
  SUM()                   Return the sum
  VAR_POP() (v5.0.3)      Return the population standard variance
  VAR_SAMP() (v5.0.3)     Return the sample variance
  VARIANCE() (v4.1)       Return the population standard variance

as found at http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html.
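
The difference between the population and sample variants can be illustrated with Python's standard statistics module. This is only a sketch of the results such UDFs would be expected to return, not a Hive implementation; the sample data are made up.

```python
import statistics

# Hypothetical column values; mean is 5.0, sum of squared deviations is 32.0.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

var_pop = statistics.pvariance(values)   # VAR_POP / VARIANCE: divide by n
var_samp = statistics.variance(values)   # VAR_SAMP: divide by n - 1
stddev_pop = statistics.pstdev(values)   # STD / STDDEV / STDDEV_POP
stddev_samp = statistics.stdev(values)   # STDDEV_SAMP

print(var_pop, var_samp, stddev_pop, stddev_samp)
```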




[jira] Created: (HIVE-517) Silent flag does not work on local jobs.

2009-05-26 Thread S. Alex Smith (JIRA)
Silent flag does not work on local jobs.


 Key: HIVE-517
 URL: https://issues.apache.org/jira/browse/HIVE-517
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Clients
Reporter: S. Alex Smith


The commands
  hive2 -S -e 'from tmp_foo select count(1)' > my_stdout.txt
and
  hive2 -S -hiveconf mapred.job.tracker=local -hiveconf mapred.local.dir=/tmp/foo -e 'from tmp_foo select count(1)' > my_stdout.txt
give different results.

The former looks like:
56

and the latter looks like:
plan = /tmp/plan61908.xml
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Job running in-process (local Hadoop)
 map = 100%,  reduce =0%
 map = 100%,  reduce =100%
Ended Job = job_local_1
56





[jira] Created: (HIVE-484) Add flag to redirect system messages to stderr.

2009-05-13 Thread S. Alex Smith (JIRA)
Add flag to redirect system messages to stderr.
---

 Key: HIVE-484
 URL: https://issues.apache.org/jira/browse/HIVE-484
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Logging
Reporter: S. Alex Smith


(It is possible that this is a Hadoop issue, rather than Hive)

Hive currently writes job-status information, such as the percentage of the job 
completed, to stdout.  This information can be suppressed with the silent flag 
-S, but it cannot reasonably be redirected.  This means that users who want 
to capture the output of jobs are stuck with either getting status information 
in their capture or forgoing status entirely.

Both these options are bad.

It would be nice if there were a way to redirect everything that -S silences 
to stderr.
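
The requested separation can be sketched in Python as below. This is an illustration of the stream split only, not Hive's client code; run_job and the progress message format are hypothetical.

```python
import sys
from typing import Iterable, TextIO


def run_job(rows: Iterable[str],
            data: TextIO = sys.stdout,
            status: TextIO = sys.stderr) -> None:
    """Write result rows to `data` and job-status messages to `status`."""
    rows = list(rows)
    for i, row in enumerate(rows, start=1):
        print(row, file=data)
        # Status goes to a separate stream so it never pollutes captured output.
        print("progress = %d%%" % (100 * i // len(rows)), file=status)
```

With this split, redirecting stdout to a file captures only the data, while status messages still reach the terminal through stderr.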




[jira] Created: (HIVE-475) Lines exceeding mapred.linerecordreader.maxlength should cause exceptions

2009-05-06 Thread S. Alex Smith (JIRA)
Lines exceeding mapred.linerecordreader.maxlength should cause exceptions
-

 Key: HIVE-475
 URL: https://issues.apache.org/jira/browse/HIVE-475
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: S. Alex Smith


Currently, rows of data that exceed mapred.linerecordreader.maxlength are 
dropped silently.  Instead, an option should be added to indicate what to do 
in this circumstance (drop the entire line, truncate after the max length, or 
fail the job), and the default behavior should be job failure.
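
The three proposed behaviors can be sketched as a small Python function. The policy names "skip", "truncate", and "fail" and the function apply_max_length are hypothetical; they are not existing Hive or Hadoop options.

```python
from typing import Optional


def apply_max_length(line: str, max_length: int,
                     policy: str = "fail") -> Optional[str]:
    """Handle a line that may exceed max_length.

    Returns the line to process, or None if the line should be dropped.
    The default policy is to fail, as the issue proposes.
    """
    if len(line) <= max_length:
        return line
    if policy == "skip":
        return None
    if policy == "truncate":
        return line[:max_length]
    if policy == "fail":
        raise ValueError("line of length %d exceeds limit %d"
                         % (len(line), max_length))
    raise ValueError("unknown policy: %r" % policy)
```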




[jira] Created: (HIVE-459) Allow limited 'OR' in join clauses

2009-04-29 Thread S. Alex Smith (JIRA)
Allow limited 'OR' in join clauses
--

 Key: HIVE-459
 URL: https://issues.apache.org/jira/browse/HIVE-459
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: S. Alex Smith
Priority: Minor


A (somewhat) common use case for full outer joins is

FROM a
FULL OUTER JOIN b ON (b.foo = a.foo)
FULL OUTER JOIN c ON (c.foo = a.foo OR c.foo = b.foo)

It would be nice if this (or equivalent functionality) could be supported by 
Hive.  Note that this is a specific use of OR that would allow rows from all 
three tables to be clustered on the same key.
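
The intended result, with rows from all three tables clustered on one key, can be sketched in Python. This is an illustration of the semantics only; it assumes each table has at most one row per value of foo, and full_outer_join_3 is a hypothetical name.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[int, Optional[str], Optional[str], Optional[str]]


def full_outer_join_3(a: Dict[int, str],
                      b: Dict[int, str],
                      c: Dict[int, str]) -> List[Row]:
    """Cluster rows of a, b and c on their shared key (foo).

    Each dict maps foo to that table's row. A key present in any table
    yields one output row; missing sides are None (SQL NULL).
    """
    keys = sorted(set(a) | set(b) | set(c))
    return [(k, a.get(k), b.get(k), c.get(k)) for k in keys]
```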




[jira] Created: (HIVE-353) Comments can't have semi-colons

2009-03-17 Thread S. Alex Smith (JIRA)
Comments can't have semi-colons
---

 Key: HIVE-353
 URL: https://issues.apache.org/jira/browse/HIVE-353
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: S. Alex Smith
Priority: Minor



hive> CREATE TABLE tmp_foo(foo DOUBLE COMMENT ';');
FAILED: Parse Error: line 2:7 mismatched input 'TABLE' expecting TEMPORARY in 
create function statement

hive> CREATE TABLE tmp_foo(foo DOUBLE);
OK





[jira] Created: (HIVE-351) Dividing by zero raises an exception.

2009-03-16 Thread S. Alex Smith (JIRA)
Dividing by zero raises an exception.
-

 Key: HIVE-351
 URL: https://issues.apache.org/jira/browse/HIVE-351
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: S. Alex Smith


Can I just have a NULL instead?
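
The requested behavior amounts to the following, sketched in Python with None standing in for NULL. The name sql_divide is hypothetical, and propagating NULL operands is an assumption based on usual SQL semantics.

```python
from typing import Optional


def sql_divide(numerator: Optional[float],
               denominator: Optional[float]) -> Optional[float]:
    """Divide, returning None (NULL) instead of raising on a zero divisor."""
    if numerator is None or denominator is None or denominator == 0:
        return None
    return numerator / denominator
```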




[jira] Created: (HIVE-297) Parses doesn't catch certain type errors.

2009-02-22 Thread S. Alex Smith (JIRA)
Parses doesn't catch certain type errors.
-

 Key: HIVE-297
 URL: https://issues.apache.org/jira/browse/HIVE-297
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: S. Alex Smith


If table_a and table_c have schemas:
 userid bigint

and table_b has a schema:
  userid string

Then the following will make it through the parser, but will fail when running:

FROM (
FROM table_a
SELECT userid
UNION ALL
FROM table_b
SELECT userid) unioned
INSERT OVERWRITE TABLE table_c
SELECT *;

Specifically, the map step will throw:
java.lang.RuntimeException: org.apache.hadoop.hive.serde2.SerDeException: 
java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.String

I have interpreted this as a bug in the parser, but it could also be viewed as 
a bug about not auto-casting.

Note that this can be worked around by using explicit CAST statements.
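
A compile-time check of the kind being asked for could look like the sketch below. It is written in Python for illustration and does not reflect Hive's semantic analyzer; union_type_mismatches and the schema representation (column name mapped to type name) are hypothetical.

```python
from typing import Dict, List


def union_type_mismatches(left: Dict[str, str],
                          right: Dict[str, str]) -> List[str]:
    """Compare the schemas of two UNION ALL branches before running the job.

    Returns a list of human-readable problems; an empty list means the
    branches agree on every column of the first branch.
    """
    problems = []
    for column, left_type in left.items():
        right_type = right.get(column)
        if right_type is None:
            problems.append("column '%s' is missing from the second branch"
                            % column)
        elif right_type != left_type:
            problems.append("column '%s' is %s in one branch and %s in the other"
                            % (column, left_type, right_type))
    return problems
```

Run against the schemas above, the check would flag userid (bigint versus string) before any map task starts.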




[jira] Updated: (HIVE-297) Parses doesn't catch certain type errors.

2009-02-22 Thread S. Alex Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

S. Alex Smith updated HIVE-297:
---

Description: 
The following query:

FROM (
  FROM (FROM my_table
        SELECT CAST(userid AS BIGINT) AS userid) a
  SELECT userid
  UNION ALL
  FROM (FROM my_table
        SELECT CAST(userid AS STRING) AS userid) b
  SELECT userid
) unioned
SELECT DISTINCT userid;

is accepted by the parser, but throws the following at run time:
java.lang.RuntimeException: org.apache.hadoop.hive.serde2.SerDeException: 
java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.String

(Note that this seems less silly if the inner queries are different tables with 
userid stored as a bigint and a string, respectively)

I have interpreted this as a bug in the parser, but it could also be viewed as 
a bug about not auto-casting.

This can be worked around by using explicit CAST statements.

  was:
If table_a and table_c have schemas:
 userid bigint

and table_b has a schema:
  userid string

Then the following with make it through the parser, but will fail when running:

FROM (
FROM table_a
SELECT userid
UNION ALL
FROM table_b
SELECT userid) unioned
INSERT OVERWRITE TABLE table_c
SELECT *;

Specifically, the map step with throw:
java.lang.RuntimeException: org.apache.hadoop.hive.serde2.SerDeException: 
java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.String

I have interpreted this as a bug in the parser, but it could also be viewed as 
a bug about not auto-casting.

Note that this can be worked around by using explicit CAST statements.


Correcting example to one which actually exhibits the problem.





[jira] Created: (HIVE-289) superfluous parenthesis break queries

2009-02-13 Thread S. Alex Smith (JIRA)
superfluous parenthesis break queries
-

 Key: HIVE-289
 URL: https://issues.apache.org/jira/browse/HIVE-289
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: S. Alex Smith


Queries like "from (my_table) select *;" break.

This has a deleterious effect on automated query construction.  In particular, 
the following works well:
FROM %s do something complicated % my_table
FROM %s do something complicated % (FROM my_other_table SELECT my_row 
WHERE my_table.ds='2009-02-13')

but if extra parentheses were correctly ignored, the following would also work:

FROM (%s) do something complicated % my_table
FROM (%s) do something complicated % FROM my_other_table SELECT my_row 
WHERE my_table.ds='2009-02-13'

which would allow a much more convenient and generic nesting of queries.
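
Until extra parentheses are accepted, a query generator has to decide for itself when to add them. A minimal Python sketch of that workaround follows; as_source is a hypothetical helper, and detecting a subquery by its leading keyword is a simplifying assumption.

```python
def as_source(source: str) -> str:
    """Wrap a subquery in parentheses, but leave a plain table name bare.

    A source is treated as a subquery if it starts with FROM or SELECT.
    """
    is_subquery = source.lstrip().upper().startswith(("FROM ", "SELECT "))
    return "(%s)" % source if is_subquery else source


template = "FROM %s SELECT COUNT(1)"
print(template % as_source("my_table"))
print(template % as_source("FROM my_other_table SELECT my_row"))
```

If the parser ignored superfluous parentheses, the helper would be unnecessary and the template could simply contain "FROM (%s)".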




[jira] Resolved: (HIVE-155) parenthesis should not mess up a group by

2009-02-13 Thread S. Alex Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

S. Alex Smith resolved HIVE-155.


Resolution: Duplicate

This is subsumed by a general parenthesis issue I just submitted.

 parenthesis should not mess up a group by
 -

 Key: HIVE-155
 URL: https://issues.apache.org/jira/browse/HIVE-155
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Clients
Reporter: S. Alex Smith
Priority: Minor

 Using parentheses breaks group by statements, in the sense that:
 ... GROUP BY (mytable.a, mytable.b)
 results in a parse error:
 FAILED: Parse Error: line 1:136 mismatched input ',' expecting )
 This should be made to work equivalently to
 ... GROUP BY mytable.a, mytable.b




[jira] Created: (HIVE-275) hive -e 'query' writes non-output to stdout

2009-02-05 Thread S. Alex Smith (JIRA)
hive -e 'query' writes non-output to stdout
---

 Key: HIVE-275
 URL: https://issues.apache.org/jira/browse/HIVE-275
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Clients
Reporter: S. Alex Smith


A command like: hive -e 'select * from my_table' produces output like:
Hive history file=/tmp/asmith/hive_job_log_asmith_200902051423_1455103524.txt
OK
1   [4, 4]
1   [0, 1]
0   [0, 0]
0   [1, 0]
Time taken: 2.413 seconds

where all seven lines go to stdout, instead of lines 1, 2, and 7 going to 
stderr.




[jira] Updated: (HIVE-275) split log output from hive -e 'query' to something other than stdout

2009-02-05 Thread S. Alex Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

S. Alex Smith updated HIVE-275:
---

   Priority: Minor  (was: Major)
Description: 
A command like: hive -e 'select * from my_table' produces output like:
Hive history file=/tmp/asmith/hive_job_log_asmith_200902051423_1455103524.txt
OK
1   [4, 4]
1   [0, 1]
0   [0, 0]
0   [1, 0]
Time taken: 2.413 seconds

all of which goes to stdout.  The non-data messages can be removed using '-S', 
but it would be nice to have a way to instead redirect them to (for example) 
stderr.

  was:
A command like: hive -e 'select * from my_table' produces output like:
Hive history file=/tmp/asmith/hive_job_log_asmith_200902051423_1455103524.txt
OK
1   [4, 4]
1   [0, 1]
0   [0, 0]
0   [1, 0]
Time taken: 2.413 seconds

where all seven lines go to stdout, instead of lines 1, 2, and 7 going to 
stderr.

 Issue Type: Improvement  (was: Bug)
Summary: split log output from hive -e 'query' to something other 
than stdout  (was: hive -e 'query' writes non-output to stdout)





[jira] Commented: (HIVE-275) split log output from hive -e 'query' to something other than stdout

2009-02-05 Thread S. Alex Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12670976#action_12670976
 ] 

S. Alex Smith commented on HIVE-275:


I see my update was out of sync with your comments.

Yeah, -S seems to be what I want, although it would be nice to be able to get 
the log messages without having them pollute the output (as now requested).  
Certainly not vital.





[jira] Created: (HIVE-251) Failures in Transform don't stop the job

2009-01-26 Thread S. Alex Smith (JIRA)
Failures in Transform don't stop the job


 Key: HIVE-251
 URL: https://issues.apache.org/jira/browse/HIVE-251
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: S. Alex Smith


If the program executed via SELECT TRANSFORM() USING 'foo' exits with a 
non-zero exit status, Hive proceeds as if nothing bad happened.  The main way 
for the user to find out that something went wrong is to check the logs 
(probably because there was no output).  This is doubly bad if the program 
fails only part of the time (say, on certain inputs), since the job will still 
produce output and the problem will likely go undetected.
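
The desired behavior, treating a non-zero exit status of the transform program as a job failure, can be sketched with Python's subprocess module. This is an illustration only, not Hive's script operator; run_transform is a hypothetical name.

```python
import subprocess
from typing import List


def run_transform(command: List[str], rows: List[str]) -> List[str]:
    """Pipe rows through an external program and fail loudly on a bad exit.

    Rows are sent on stdin, one per line; stdout lines are returned.
    A non-zero exit status raises instead of being silently ignored.
    """
    proc = subprocess.run(command, input="\n".join(rows),
                          capture_output=True, text=True)
    if proc.returncode != 0:
        raise RuntimeError("transform %r exited with status %d: %s"
                           % (command, proc.returncode, proc.stderr.strip()))
    return proc.stdout.splitlines()
```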




[jira] Updated: (HIVE-221) select * fails on subqueries without alias

2009-01-09 Thread S. Alex Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

S. Alex Smith updated HIVE-221:
---

Summary: select * fails on subqueries without alias  (was: select * fails 
on tables without alias)

 select * fails on subqueries without alias
 --

 Key: HIVE-221
 URL: https://issues.apache.org/jira/browse/HIVE-221
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: S. Alex Smith
Priority: Minor

 The following query fails:
 SELECT * FROM (SELECT * FROM tmp_asmith);  
 Additionally, although the error message is frequently "... expected 
 Identifies", in certain more complicated expressions this is not the case.  
 For example, in
 SELECT * FROM (SELECT * FROM tmp_asmith UNION ALL SELECT * FROM (SELECT * 
 FROM tmp_asmith)) t;  
 the error message is "FAILED: Parse Error: line 1:41 mismatched input 'UNION' 
 expecting )"
 (but it works fine if (SELECT * FROM tmp_asmith) is given an alias)




[jira] Created: (HIVE-221) select * fails on tables without alias

2009-01-09 Thread S. Alex Smith (JIRA)
select * fails on tables without alias
--

 Key: HIVE-221
 URL: https://issues.apache.org/jira/browse/HIVE-221
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: S. Alex Smith
Priority: Minor


The following query fails:
SELECT * FROM (SELECT * FROM tmp_asmith);  

Additionally, although the error message is frequently "... expected 
Identifies", in certain more complicated expressions this is not the case.  For 
example, in

SELECT * FROM (SELECT * FROM tmp_asmith UNION ALL SELECT * FROM (SELECT * FROM 
tmp_asmith)) t;  

the error message is "FAILED: Parse Error: line 1:41 mismatched input 'UNION' 
expecting )"
(but it works fine if (SELECT * FROM tmp_asmith) is given an alias)




[jira] Created: (HIVE-198) Parse errors report incorrectly.

2008-12-25 Thread S. Alex Smith (JIRA)
Parse errors report incorrectly.


 Key: HIVE-198
 URL: https://issues.apache.org/jira/browse/HIVE-198
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: S. Alex Smith


The following two queries fail:
CREATE TABLE output_table(userid, bigint);
CREATE TABLE output_table(userid bigint, age int, sex string, location string);
each giving the error message "FAILED: Parse Error: line 1:16 mismatched input 
'TABLE' expecting KW_TEMPORARY"

Although one might not catch it from the error message, the problem with the 
first is that there is a comma between userid and bigint, and the problem 
with the second is that location is a reserved keyword.  Reported errors 
should describe the nature of the error more accurately, such as "no type given 
for column 'userid'" or "'location' is not a valid column name".




[jira] Created: (HIVE-156) Allow != in place of <>

2008-12-10 Thread S. Alex Smith (JIRA)
Allow != in place of <>
---

 Key: HIVE-156
 URL: https://issues.apache.org/jira/browse/HIVE-156
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: S. Alex Smith
Priority: Trivial


I'm used to using != for inequality.  It would be nice if Hive supported this 
as an alternative to <>.
