date:20121005

[jira] [Created] (HIVE-3543) A hive-builtins snapshot is required on the classpath of generated eclipse files

2012-10-05 Thread Harsh J (JIRA)

Harsh J created HIVE-3543:
-

 Summary: A hive-builtins snapshot is required on the classpath of 
generated eclipse files
 Key: HIVE-3543
 URL: https://issues.apache.org/jira/browse/HIVE-3543
 Project: Hive
  Issue Type: Bug
  Components: Build Infrastructure
Affects Versions: 0.10.0
Reporter: Harsh J


We shouldn't rely on a jar from our own project already existing when 
generating the Eclipse files, like so:

{code}

{code}

Wouldn't the src on the classpath for builtins suffice instead? Requiring this 
jar forces one to run {{ant jar eclipse-files}} instead of the simpler 
{{ant compile eclipse-files}}. If source paths won't do, let's at least 
consider adding a classes/ directory instead of expecting a jar.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-3536) Output of sort merge join is no longer bucketed

2012-10-05 Thread Namit Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3536:
-

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Committed. Thanks Kevin

> Output of sort merge join is no longer bucketed
> ---
>
> Key: HIVE-3536
> URL: https://issues.apache.org/jira/browse/HIVE-3536
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.10.0
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
> Attachments: HIVE-3536.1.patch.txt, HIVE-3536.2.patch.txt, 
> HIVE-3536.3.patch.txt, HIVE-3536.D5907.2.patch, HIVE-3536.D5907.3.patch
>
>
> I don't know if this was a feature or a happy coincidence, but before 
> HIVE-3230, the output of a sort merge join on two partitions would be 
> bucketed, even if hive.enforce.bucketing was set to false.  This could 
> potentially save a reduce phase when inserting into a bucketed table.
> This would be good to have back.


[jira] [Commented] (HIVE-2228) Can't use DB qualified column names in WHERE or GROUP BY clauses

2012-10-05 Thread Zhenxiao Luo (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470919#comment-13470919
 ] 

Zhenxiao Luo commented on HIVE-2228:


@Namit: Thanks for the comment.
You are correct, none of the other clauses work either: SORT BY, ORDER BY, 
CLUSTER BY, and DISTRIBUTE BY.

I filed HIVE-3542 and linked it to this ticket.

> Can't use DB qualified column names in WHERE or GROUP BY clauses
> 
>
> Key: HIVE-2228
> URL: https://issues.apache.org/jira/browse/HIVE-2228
> Project: Hive
>  Issue Type: Bug
>  Components: Database/Schema, Query Processor, SQL
>Affects Versions: 0.7.0
>Reporter: Carl Steinbach
>Assignee: Zhenxiao Luo
> Attachments: HIVE-2228.1.patch.txt
>
>
> Hive doesn't allow you to use DB qualified column names in the WHERE or GROUP 
> BY clauses. The workaround is to define a table alias:
> {noformat}
> hive> CREATE DATABASE db1;
> OK
> hive> CREATE TABLE db1.t(a INT, b INT);
> OK
> hive> SELECT * FROM db1.t WHERE db1.t.a > 100;
> FAILED: Error in semantic analysis: Line 1:26 Invalid table alias or column 
> reference 'db1'
> hive> SELECT * FROM db1.t t WHERE t.a > 100;
> OK
> hive> SELECT * FROM db1.t GROUP BY db1.t.a;
> FAILED: Error in semantic analysis: Line 1:29 Invalid table alias or column 
> reference 'db1'
> hive> SELECT * FROM db1.t t GROUP BY t.a;
> OK
> {noformat}


[jira] [Created] (HIVE-3542) Can not use DB Qualified Name in Order By, Sort By, Distribute By, and Cluster By

2012-10-05 Thread Zhenxiao Luo (JIRA)

Zhenxiao Luo created HIVE-3542:
--

 Summary: Can not use DB Qualified Name in Order By, Sort By, 
Distribute By, and Cluster By
 Key: HIVE-3542
 URL: https://issues.apache.org/jira/browse/HIVE-3542
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.9.0
Reporter: Zhenxiao Luo
Assignee: Zhenxiao Luo
 Fix For: 0.10.0


CREATE DATABASE db1;

CREATE TABLE db1.t(a INT, b INT);

SELECT * FROM db1.t ORDER BY db1.t.a;
FAILED: SemanticException [Error 10004]: Line 3:29 Invalid table alias or 
column reference 'db1': (possible column names are: a, b)

SELECT * FROM db1.t SORT BY db1.t.a;
FAILED: SemanticException [Error 10004]: Line 3:28 Invalid table alias or 
column reference 'db1': (possible column names are: a, b)

SELECT * FROM db1.t CLUSTER BY db1.t.a;
FAILED: SemanticException [Error 10004]: Line 3:31 Invalid table alias or 
column reference 'db1': (possible column names are: a, b)

SELECT * FROM db1.t DISTRIBUTE BY db1.t.a;
FAILED: SemanticException [Error 10004]: Line 3:34 Invalid table alias or 
column reference 'db1': (possible column names are: a, b)

Using a table alias works:
SELECT * FROM db1.t t ORDER BY t.a;
OK
SELECT * FROM db1.t t SORT BY t.a;
OK
SELECT * FROM db1.t t CLUSTER BY t.a;
OK
SELECT * FROM db1.t t DISTRIBUTE BY t.a;
OK






[jira] [Updated] (HIVE-1977) DESCRIBE TABLE syntax doesn't support specifying a database qualified table name

2012-10-05 Thread Zhenxiao Luo (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhenxiao Luo updated HIVE-1977:
---

Attachment: HIVE-1977.4.patch.txt

> DESCRIBE TABLE syntax doesn't support specifying a database qualified table 
> name
> 
>
> Key: HIVE-1977
> URL: https://issues.apache.org/jira/browse/HIVE-1977
> Project: Hive
>  Issue Type: Bug
>  Components: Database/Schema, Query Processor, SQL
>Reporter: Carl Steinbach
>Assignee: Zhenxiao Luo
> Attachments: HIVE-1977.1.patch.txt, HIVE-1977.2.patch.txt, 
> HIVE-1977.3.patch.txt, HIVE-1977.4.patch.txt
>
>
> The syntax for DESCRIBE is broken. It should be:
> {code}
> DESCRIBE [EXTENDED] [database DOT]table [column]
> {code}
> but is actually
> {code}
> DESCRIBE [EXTENDED] table[DOT col_name]
> {code}
> Ref: http://dev.mysql.com/doc/refman/5.0/en/describe.html


[jira] [Updated] (HIVE-1977) DESCRIBE TABLE syntax doesn't support specifying a database qualified table name

2012-10-05 Thread Zhenxiao Luo (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhenxiao Luo updated HIVE-1977:
---

Status: Patch Available  (was: Open)

> DESCRIBE TABLE syntax doesn't support specifying a database qualified table 
> name
> 
>
> Key: HIVE-1977
> URL: https://issues.apache.org/jira/browse/HIVE-1977
> Project: Hive
>  Issue Type: Bug
>  Components: Database/Schema, Query Processor, SQL
>Reporter: Carl Steinbach
>Assignee: Zhenxiao Luo
> Attachments: HIVE-1977.1.patch.txt, HIVE-1977.2.patch.txt, 
> HIVE-1977.3.patch.txt, HIVE-1977.4.patch.txt
>
>
> The syntax for DESCRIBE is broken. It should be:
> {code}
> DESCRIBE [EXTENDED] [database DOT]table [column]
> {code}
> but is actually
> {code}
> DESCRIBE [EXTENDED] table[DOT col_name]
> {code}
> Ref: http://dev.mysql.com/doc/refman/5.0/en/describe.html


[jira] [Commented] (HIVE-1977) DESCRIBE TABLE syntax doesn't support specifying a database qualified table name

2012-10-05 Thread Zhenxiao Luo (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470911#comment-13470911
 ] 

Zhenxiao Luo commented on HIVE-1977:


Comments addressed and review request submitted at:
https://reviews.facebook.net/D5763

> DESCRIBE TABLE syntax doesn't support specifying a database qualified table 
> name
> 
>
> Key: HIVE-1977
> URL: https://issues.apache.org/jira/browse/HIVE-1977
> Project: Hive
>  Issue Type: Bug
>  Components: Database/Schema, Query Processor, SQL
>Reporter: Carl Steinbach
>Assignee: Zhenxiao Luo
> Attachments: HIVE-1977.1.patch.txt, HIVE-1977.2.patch.txt, 
> HIVE-1977.3.patch.txt, HIVE-1977.4.patch.txt
>
>
> The syntax for DESCRIBE is broken. It should be:
> {code}
> DESCRIBE [EXTENDED] [database DOT]table [column]
> {code}
> but is actually
> {code}
> DESCRIBE [EXTENDED] table[DOT col_name]
> {code}
> Ref: http://dev.mysql.com/doc/refman/5.0/en/describe.html


[jira] [Commented] (HIVE-3541) Allow keeping the bucket order while streaming bucketed table

2012-10-05 Thread Kevin Wilfong (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-3541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470851#comment-13470851
 ] 

Kevin Wilfong commented on HIVE-3541:
-

It would be good if the bucketing were maintained even in the face of selects, 
filters, and other operators that pass the values of the bucketing columns 
through unmodified.

> Allow keeping the bucket order while streaming bucketed table
> -
>
> Key: HIVE-3541
> URL: https://issues.apache.org/jira/browse/HIVE-3541
> Project: Hive
>  Issue Type: Improvement
>Reporter: Igor Kabiljo
>Priority: Minor
>
> If we have a bucketed table, for example table_a with columns col_key and 
> col_value (bucketed on col_key), and we need to create a new derived bucketed 
> table (for example, by SELECT col_key, col_value*2 FROM table_a), it would be 
> fastest to do so in a single streaming map-only job. 
> By specifying:
> SET hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
> we can make sure that each input bucket is read by exactly one mapper, 
> and that each mapper outputs exactly one file. With:
> SET hive.merge.mapfiles = false;
> SET hive.merge.mapredfiles = false;
> SET hive.enforce.bucketing = false;
> we can make sure those files are inserted as is into the output table. 
> However, bucket order is then not kept, so the resulting table is not 
> bucketed correctly.


[jira] [Created] (HIVE-3541) Allow keeping the bucket order while streaming bucketed table

2012-10-05 Thread Igor Kabiljo (JIRA)

Igor Kabiljo created HIVE-3541:
--

 Summary: Allow keeping the bucket order while streaming bucketed 
table
 Key: HIVE-3541
 URL: https://issues.apache.org/jira/browse/HIVE-3541
 Project: Hive
  Issue Type: Improvement
Reporter: Igor Kabiljo
Priority: Minor


If we have a bucketed table, for example table_a with columns col_key and 
col_value (bucketed on col_key), and we need to create a new derived bucketed 
table (for example, by SELECT col_key, col_value*2 FROM table_a), it would be 
fastest to do so in a single streaming map-only job. 

By specifying:
SET hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
we can make sure that each input bucket is read by exactly one mapper, and 
that each mapper outputs exactly one file. With:
SET hive.merge.mapfiles = false;
SET hive.merge.mapredfiles = false;
SET hive.enforce.bucketing = false;
we can make sure those files are inserted as is into the output table. 
However, bucket order is then not kept, so the resulting table is not bucketed 
correctly.


[jira] [Updated] (HIVE-1362) column level statistics

2012-10-05 Thread Carl Steinbach (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1362:
-

Status: Open  (was: Patch Available)

@Shreepadma: More comments on RB. Thanks.

> column level statistics
> ---
>
> Key: HIVE-1362
> URL: https://issues.apache.org/jira/browse/HIVE-1362
> Project: Hive
>  Issue Type: Sub-task
>  Components: Statistics
>Reporter: Ning Zhang
>Assignee: Shreepadma Venugopalan
> Attachments: HIVE-1362.1.patch.txt, HIVE-1362.2.patch.txt, 
> HIVE-1362.3.patch.txt, HIVE-1362.4.patch.txt, 
> HIVE-1362-gen_thrift.1.patch.txt, HIVE-1362-gen_thrift.2.patch.txt, 
> HIVE-1362-gen_thrift.3.patch.txt, HIVE-1362-gen_thrift.4.patch.txt
>
>



Re: Review Request: HIVE-1362: Support for column statistics in Hive

2012-10-05 Thread Carl Steinbach


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6878/#review12203
---



ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java


It looks like most of the classes that extend BaseSemanticAnalyzer are 
overriding init() with a NoOp method. If that's the case maybe it would make 
more sense to provide the NoOp method here instead of making it abstract.
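
A minimal sketch of that suggestion, not the actual Hive code: `BaseAnalyzer`, `DdlAnalyzer`, and `StatsAnalyzer` are hypothetical stand-ins for BaseSemanticAnalyzer and its subclasses. The base class supplies an empty init(), so only the subclasses that need setup have to override it.

```java
// Hypothetical stand-in for BaseSemanticAnalyzer.
abstract class BaseAnalyzer {
    // Default no-op: subclasses override this only when they need setup.
    public void init() {
    }

    public abstract String analyze(String command);
}

// Needs no setup, so it simply inherits the no-op init().
class DdlAnalyzer extends BaseAnalyzer {
    @Override
    public String analyze(String command) {
        return "ddl:" + command;
    }
}

// Needs setup, so it overrides init().
class StatsAnalyzer extends BaseAnalyzer {
    private boolean initialized = false;

    @Override
    public void init() {
        initialized = true;
    }

    @Override
    public String analyze(String command) {
        return (initialized ? "stats:" : "uninitialized:") + command;
    }
}
```

With this shape, the analyzers that currently override init() with an empty body could drop the override altogether.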



ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java


Please change this back. Static imports decrease noise, and it's easy to 
figure out where a token is defined using the navigation/lookup features 
provided by modern IDEs.



ql/src/java/org/apache/hadoop/hive/ql/parse/FunctionSemanticAnalyzer.java


Please change this back.



ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java


s/setups/sets up/



ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzerFactory.java


StatsSemanticAnalyzer should catch these exceptions and convert them to the 
proper SemanticException before rethrowing them.
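
A rough sketch of the catch-and-convert pattern, under stated assumptions: `SketchSemanticException` is a local stand-in for Hive's SemanticException, and `parseColumnCount` is a made-up low-level step. Neither is taken from the patch under review.

```java
// Local stand-in for Hive's SemanticException so the sketch compiles alone.
class SketchSemanticException extends Exception {
    SketchSemanticException(String message, Throwable cause) {
        super(message, cause);
    }
}

class StatsAnalyzerSketch {
    // Hypothetical low-level step that fails with an unrelated exception type.
    static int parseColumnCount(String raw) {
        return Integer.parseInt(raw.trim());
    }

    // Catch the low-level failure and rethrow it as the analyzer's own
    // exception type, keeping the original exception as the cause.
    static int analyze(String raw) throws SketchSemanticException {
        try {
            return parseColumnCount(raw);
        } catch (NumberFormatException e) {
            throw new SketchSemanticException("Invalid column count: " + raw, e);
        }
    }
}
```

Keeping the original exception as the cause preserves the stack trace for debugging while callers only have to handle one exception type.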



ql/src/java/org/apache/hadoop/hive/ql/parse/StatsSemanticAnalyzer.java


Might be a good idea to add a comment explaining that the table stats are 
implemented elsewhere.



ql/src/java/org/apache/hadoop/hive/ql/parse/StatsSemanticAnalyzer.java


s/Lvl/Level/



ql/src/java/org/apache/hadoop/hive/ql/parse/StatsSemanticAnalyzer.java


Formatting



ql/src/java/org/apache/hadoop/hive/ql/parse/StatsSemanticAnalyzer.java


Formatting



ql/src/java/org/apache/hadoop/hive/ql/parse/StatsSemanticAnalyzer.java


Formatting



ql/src/java/org/apache/hadoop/hive/ql/parse/StatsSemanticAnalyzer.java


Please use a StringBuilder instead of doing lots of String concatenation.
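
For illustration only, a small sketch of building query text with a single StringBuilder rather than repeated String concatenation. The class `QueryTextSketch`, the method `buildQuery`, and the query shape are assumptions made for this example, not code from the patch.

```java
import java.util.List;

class QueryTextSketch {
    // Builds "SELECT f(a), f(b) FROM t" by appending to one StringBuilder,
    // instead of creating a new String on every concatenation in the loop.
    static String buildQuery(String function, List<String> columns, String table) {
        StringBuilder sb = new StringBuilder("SELECT ");
        for (int i = 0; i < columns.size(); i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(function).append('(').append(columns.get(i)).append(')');
        }
        sb.append(" FROM ").append(table);
        return sb.toString();
    }
}
```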



ql/src/java/org/apache/hadoop/hive/ql/parse/StatsSemanticAnalyzer.java


Formatting



ql/src/java/org/apache/hadoop/hive/ql/parse/StatsSemanticAnalyzer.java


StringBuilder.


- Carl Steinbach


On Oct. 4, 2012, 5:45 a.m., Shreepadma Venugopalan wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6878/
> ---
> 
> (Updated Oct. 4, 2012, 5:45 a.m.)
> 
> 
> Review request for hive and Carl Steinbach.
> 
> 
> Description
> ---
> 
> This patch implements version 1 of the column statistics project in Hive. It 
> adds support for computing and persisting a statistical summary of column 
> values in Hive Tables and Partitions. In order to support column statistics 
> in Hive, this patch does the following:
> 
> * Adds a new compute stats UDAF to compute scalar statistics for all 
> primitive Hive data types. In version 1 of the project, we support the 
> following scalar statistics on primitive types: an estimate of the number of 
> distinct values, the number of null values, the number of trues/falses for 
> boolean typed columns, max and avg length for string and binary typed 
> columns, and max and min value for long and double typed columns. Note that 
> version 1 of the column stats project includes support for column statistics 
> both at the table and partition level.
> 
> * Adds Metastore schema tables to persist the newly added statistics both at 
> table and partition level.
> * Adds Metastore Thrift API to persist, retrieve and delete column statistics 
> at both table and partition level. 
> Please refer to the following wiki link for the details of the schema and the 
> Thrift API changes - 
> https://cwiki.apache.org/confluence/display/Hive/Column+Statistics+in+Hive
> 
> * Extends the analyze table compute statistics statement to trigger 
> statistics computation and persistence for one or more columns. Please note 
> that statistics for multiple columns are computed through a single scan of 
> the table data. Please refer to the following wiki link for the syntax 
> changes - 
> https://cwiki.apache.org/confluence/display/Hive/Column+Statistics+in+Hive
> 
> One thing missing from the patch at this point is the metastore upgrade 
> scripts for MySQL/Derby/Postgres/Oracle. I'm waiting for the review to 
> finalize the metastore schema changes before I go ahead and ad

[jira] [Commented] (HIVE-2935) Implement HiveServer2

2012-10-05 Thread Ashutosh Chauhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-2935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470766#comment-13470766
 ] 

Ashutosh Chauhan commented on HIVE-2935:


@Carl: One of the concerns folks have with Hive Server (HS1) is how well 
it handles concurrency. Can you explain how that concern is taken care of in 
HS2? In the course of developing and testing HS2, did you discover any 
concurrency bugs? It would be good to spell out those concurrency bugs and the 
fixes you made for them. The same goes for memory leaks, socket leaks, file 
descriptor leaks, etc., the typical resource leaks that occur in long-running 
server processes. Was there any work and/or testing in those areas? Are those 
tests reproducible? 

> Implement HiveServer2
> -
>
> Key: HIVE-2935
> URL: https://issues.apache.org/jira/browse/HIVE-2935
> Project: Hive
>  Issue Type: New Feature
>  Components: Server Infrastructure
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
>  Labels: HiveServer2
> Attachments: beelinepositive.tar.gz, HIVE-2935.1.notest.patch.txt, 
> HIVE-2935.2.notest.patch.txt, HIVE-2935.2.nothrift.patch.txt
>
>



[jira] [Commented] (HIVE-1362) column level statistics

2012-10-05 Thread Shreepadma Venugopalan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470698#comment-13470698
 ] 

Shreepadma Venugopalan commented on HIVE-1362:
--

@Namit: In the first version of the project, which this patch implements, the 
only way to trigger stats gathering is through an explicit ANALYZE command.

> column level statistics
> ---
>
> Key: HIVE-1362
> URL: https://issues.apache.org/jira/browse/HIVE-1362
> Project: Hive
>  Issue Type: Sub-task
>  Components: Statistics
>Reporter: Ning Zhang
>Assignee: Shreepadma Venugopalan
> Attachments: HIVE-1362.1.patch.txt, HIVE-1362.2.patch.txt, 
> HIVE-1362.3.patch.txt, HIVE-1362.4.patch.txt, 
> HIVE-1362-gen_thrift.1.patch.txt, HIVE-1362-gen_thrift.2.patch.txt, 
> HIVE-1362-gen_thrift.3.patch.txt, HIVE-1362-gen_thrift.4.patch.txt
>
>



[jira] [Commented] (HIVE-1362) column level statistics

2012-10-05 Thread Shreepadma Venugopalan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470686#comment-13470686
 ] 

Shreepadma Venugopalan commented on HIVE-1362:
--

I assume when you say row level statistics you are referring to table 
statistics. Today, table statistics is stored as part of the table_params. 
table_params table gets mapped to the TTable object in memory and it looks like 
the existing APIs sufficed. We want to have a dedicated Thrift API for column 
stats for the following reasons,

1. Column statistics are a property of the column and not the table, and hence 
don't belong with the table_params. Furthermore, we have seen customers with 
tables that are 100s-1000s of columns wide. Storing this information as 
table_params is going to bloat that table, and it will also make the output of 
DESCRIBE EXTENDED unreadable.

2. We want column statistics to be first-class metadata. In order to do so, 
we have to provide dedicated Thrift APIs to query and update them. We want the 
Thrift API to be self-documenting, i.e. if someone tells you that the metastore 
supports column stats, you should be able to look at the Thrift IDL and figure 
out which method you need to use to store/retrieve column stats. Right now a 
lot of the API doesn't satisfy that goal, since many methods are overloaded and 
other features are implemented by adding new key/value properties to different 
catalog objects that aren't easy to document via the Thrift API.

3. Additionally, storing column statistics as key/value pairs in the 
table_params table is not space efficient. We would need to repeat the keys for 
each column in the table for which statistics are gathered. Furthermore, 
by storing column stats in the table_params table we would de-normalize the 
schema completely and incur a performance penalty performing self-joins, though 
not necessarily in the metastore db, to retrieve the statistics associated with 
a column. 

> column level statistics
> ---
>
> Key: HIVE-1362
> URL: https://issues.apache.org/jira/browse/HIVE-1362
> Project: Hive
>  Issue Type: Sub-task
>  Components: Statistics
>Reporter: Ning Zhang
>Assignee: Shreepadma Venugopalan
> Attachments: HIVE-1362.1.patch.txt, HIVE-1362.2.patch.txt, 
> HIVE-1362.3.patch.txt, HIVE-1362.4.patch.txt, 
> HIVE-1362-gen_thrift.1.patch.txt, HIVE-1362-gen_thrift.2.patch.txt, 
> HIVE-1362-gen_thrift.3.patch.txt, HIVE-1362-gen_thrift.4.patch.txt
>
>



[jira] [Commented] (HIVE-2935) Implement HiveServer2

2012-10-05 Thread Carl Steinbach (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-2935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470649#comment-13470649
 ] 

Carl Steinbach commented on HIVE-2935:
--

bq. To follow on Edward's comment, I don't understand why beeline is in the 
patch. Is it integral to HiveServer2?

People need a way to interact with HiveServer2. We could have spent time 
modifying the existing CLI to work with HS2, but we decided against this 
approach because a) the HiveCLI has a lot of bugs, and b) we risked introducing 
new bugs in the process of modifying the CLI to work with both HS1 and HS2. We 
included BeeLine in this patch because most of the test coverage we have 
provided for HiveServer2 depends on the new TestBeeLineDriver, which in turn 
depends on BeeLine. 

> Implement HiveServer2
> -
>
> Key: HIVE-2935
> URL: https://issues.apache.org/jira/browse/HIVE-2935
> Project: Hive
>  Issue Type: New Feature
>  Components: Server Infrastructure
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
>  Labels: HiveServer2
> Attachments: beelinepositive.tar.gz, HIVE-2935.1.notest.patch.txt, 
> HIVE-2935.2.notest.patch.txt, HIVE-2935.2.nothrift.patch.txt
>
>



[jira] [Commented] (HIVE-3525) Avro Maps with Nullable Values fail with NPE

2012-10-05 Thread Jakob Homan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-3525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470640#comment-13470640
 ] 

Jakob Homan commented on HIVE-3525:
---

Reviewing...

> Avro Maps with Nullable Values fail with NPE
> 
>
> Key: HIVE-3525
> URL: https://issues.apache.org/jira/browse/HIVE-3525
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Sean Busbey
> Attachments: HIVE-3525.1.patch.txt, HIVE-3525.2.patch.txt
>
>
> When working against current trunk@1393794, using a backing Avro schema that 
> has a Map field with nullable values causes an NPE on deserialization when 
> the map contains a null value.


[jira] [Commented] (HIVE-2935) Implement HiveServer2

2012-10-05 Thread Carl Steinbach (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-2935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470641#comment-13470641
 ] 

Carl Steinbach commented on HIVE-2935:
--

bq. What do you think about forking beeline as a separate project inside hive, 
such as hive-beeline? A majority of this patch looks to be beeline 
with some subtle tweaks.

In the current version of the patch the BeeLine code is included in the 
hive-cli package. I think this makes sense since BeeLine is a CLI. On the other 
hand, if we added a new package for beeline we would be able to avoid adding 
dependencies on the other Hive JARs that the current CLI mandates we include. 
Providing this separation will probably be beneficial in the long term so I'll 
start making the change and will submit this in another ticket.

> Implement HiveServer2
> -
>
> Key: HIVE-2935
> URL: https://issues.apache.org/jira/browse/HIVE-2935
> Project: Hive
>  Issue Type: New Feature
>  Components: Server Infrastructure
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
>  Labels: HiveServer2
> Attachments: beelinepositive.tar.gz, HIVE-2935.1.notest.patch.txt, 
> HIVE-2935.2.notest.patch.txt, HIVE-2935.2.nothrift.patch.txt
>
>



[jira] [Commented] (HIVE-3538) Avro SerDe can't handle Nullable Enums

2012-10-05 Thread Jakob Homan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-3538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470639#comment-13470639
 ] 

Jakob Homan commented on HIVE-3538:
---

Good catch.  Let me take a look.

> Avro SerDe can't handle Nullable Enums
> --
>
> Key: HIVE-3538
> URL: https://issues.apache.org/jira/browse/HIVE-3538
> Project: Hive
>  Issue Type: Bug
>Reporter: Sean Busbey
> Attachments: HIVE-3538.tests.txt
>
>
> If a field has a schema that unions NULL with an enum, Avro fails to resolve 
> the union because Avro SerDe doesn't restore "enumness".
> Since the enum datum is a String, avro internals check the union for a string 
> schema, which is not present.


[jira] [Commented] (HIVE-3528) Avro SerDe doesn't handle serializing Nullable types that require access to a Schema

2012-10-05 Thread Jakob Homan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470635#comment-13470635
 ] 

Jakob Homan commented on HIVE-3528:
---

This looks good.  Let me test the patch.

> Avro SerDe doesn't handle serializing Nullable types that require access to a 
> Schema
> 
>
> Key: HIVE-3528
> URL: https://issues.apache.org/jira/browse/HIVE-3528
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Sean Busbey
>  Labels: avro
> Attachments: HIVE-3528.1.patch.txt
>
>
> Deserialization properly handles hiding Nullable Avro types, including 
> complex types like record, map, array, etc. However, when Serialization 
> attempts to write out these types it erroneously makes use of the UNION 
> schema that contains NULL and the other type.
> This results in Schema mis-match errors for Record, Array, Enum, Fixed, and 
> Bytes.
> Here's a [review board of unit tests that express the 
> problem|https://reviews.apache.org/r/7431/], as well as one that supports the 
> case that it's only when the schema is needed.


[jira] [Commented] (HIVE-2935) Implement HiveServer2

2012-10-05 Thread Carl Steinbach (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-2935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470623#comment-13470623
 ] 

Carl Steinbach commented on HIVE-2935:
--

@Namit: The only change we made to the Driver class was to wrap a monitor lock 
around the compile() call in order to serialize access to the compilation 
phase. I can split this out into a separate patch if you think that would be 
helpful.
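
To make the idea concrete, here is a simplified sketch rather than the code in the patch. `DriverSketch`, `COMPILE_MONITOR`, and the counters are hypothetical names, and a single lock shared by all instances is an assumption. Only the compile step runs under the monitor; the rest of run() stays outside it.

```java
class DriverSketch {
    // Shared monitor: at most one thread may be compiling at any time.
    private static final Object COMPILE_MONITOR = new Object();
    private static int inCompile = 0;     // guarded by COMPILE_MONITOR
    private static int maxInCompile = 0;  // guarded by COMPILE_MONITOR

    // Hypothetical stand-in for the real compilation work.
    // Must only be called while holding COMPILE_MONITOR.
    private String compile(String query) {
        inCompile++;
        maxInCompile = Math.max(maxInCompile, inCompile);
        String plan = "plan(" + query + ")";
        inCompile--;
        return plan;
    }

    String run(String query) {
        String plan;
        synchronized (COMPILE_MONITOR) {
            plan = compile(query);
        }
        // Execution happens outside the lock, so queries can still run in parallel.
        return "executed " + plan;
    }

    // Highest number of threads ever observed inside compile() at once.
    static int maxConcurrentCompiles() {
        synchronized (COMPILE_MONITOR) {
            return maxInCompile;
        }
    }
}
```

Holding the lock only around compilation keeps the critical section short, at the cost of queuing concurrent sessions whenever compilation is slow.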

> Implement HiveServer2
> -
>
> Key: HIVE-2935
> URL: https://issues.apache.org/jira/browse/HIVE-2935
> Project: Hive
>  Issue Type: New Feature
>  Components: Server Infrastructure
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
>  Labels: HiveServer2
> Attachments: beelinepositive.tar.gz, HIVE-2935.1.notest.patch.txt, 
> HIVE-2935.2.notest.patch.txt, HIVE-2935.2.nothrift.patch.txt
>
>



[jira] [Commented] (HIVE-1362) column level statistics

2012-10-05 Thread Shreepadma Venugopalan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470592#comment-13470592
 ] 

Shreepadma Venugopalan commented on HIVE-1362:
--

@Namit: I don't understand hive.stats.reliable very clearly. Can you please 
explain how hive.stats.reliable works? What are the semantics of 
hive.stats.reliable? Why do we need hive.stats.reliable? Thanks.

> column level statistics
> ---
>
> Key: HIVE-1362
> URL: https://issues.apache.org/jira/browse/HIVE-1362
> Project: Hive
>  Issue Type: Sub-task
>  Components: Statistics
>Reporter: Ning Zhang
>Assignee: Shreepadma Venugopalan
> Attachments: HIVE-1362.1.patch.txt, HIVE-1362.2.patch.txt, 
> HIVE-1362.3.patch.txt, HIVE-1362.4.patch.txt, 
> HIVE-1362-gen_thrift.1.patch.txt, HIVE-1362-gen_thrift.2.patch.txt, 
> HIVE-1362-gen_thrift.3.patch.txt, HIVE-1362-gen_thrift.4.patch.txt
>
>



[jira] [Updated] (HIVE-3522) Make separator for Entity name configurable

2012-10-05 Thread Raghotham Murthy (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghotham Murthy updated HIVE-3522:
---

Attachment: hive-3522.3.patch

> Make separator for Entity name configurable
> ---
>
> Key: HIVE-3522
> URL: https://issues.apache.org/jira/browse/HIVE-3522
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Raghotham Murthy
>Assignee: Raghotham Murthy
>Priority: Trivial
> Attachments: hive-3522.1.patch, hive-3522.2.patch, hive-3522.3.patch
>
>
> Right now it's hard-coded to '@'


[jira] [Updated] (HIVE-3522) Make separator for Entity name configurable

2012-10-05 Thread Raghotham Murthy (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghotham Murthy updated HIVE-3522:
---

Attachment: (was: hive-3522.3.patch)

> Make separator for Entity name configurable
> ---
>
> Key: HIVE-3522
> URL: https://issues.apache.org/jira/browse/HIVE-3522
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Raghotham Murthy
>Assignee: Raghotham Murthy
>Priority: Trivial
> Attachments: hive-3522.1.patch, hive-3522.2.patch, hive-3522.3.patch
>
>
> Right now it's hard-coded to '@'


[jira] [Updated] (HIVE-3522) Make separator for Entity name configurable

2012-10-05 Thread Raghotham Murthy (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghotham Murthy updated HIVE-3522:
---

Attachment: hive-3522.3.patch

> Make separator for Entity name configurable
> ---
>
> Key: HIVE-3522
> URL: https://issues.apache.org/jira/browse/HIVE-3522
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Raghotham Murthy
>Assignee: Raghotham Murthy
>Priority: Trivial
> Attachments: hive-3522.1.patch, hive-3522.2.patch, hive-3522.3.patch
>
>
> Right now it's hard-coded to '@'


[jira] [Updated] (HIVE-3536) Output of sort merge join is no longer bucketed

2012-10-05 Thread Phabricator (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-3536:
--

Attachment: HIVE-3536.D5907.3.patch

kevinwilfong updated the revision "HIVE-3536 [jira] Output of sort merge join 
is no longer bucketed".
Reviewers: JIRA, njain

  Fixed a bug causing non-prefixed files to be generated prefixed with null.

REVISION DETAIL
  https://reviews.facebook.net/D5907

AFFECTED FILES
  ql/src/test/results/clientpositive/smb_mapjoin_11.q.out
  ql/src/test/queries/clientpositive/smb_mapjoin_11.q
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java

To: JIRA, njain, kevinwilfong


> Output of sort merge join is no longer bucketed
> ---
>
> Key: HIVE-3536
> URL: https://issues.apache.org/jira/browse/HIVE-3536
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.10.0
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
> Attachments: HIVE-3536.1.patch.txt, HIVE-3536.2.patch.txt, 
> HIVE-3536.3.patch.txt, HIVE-3536.D5907.2.patch, HIVE-3536.D5907.3.patch
>
>
> I don't know if this was a feature or a happy coincidence, but before 
> HIVE-3230, the output of a sort merge join on two partitions would be 
> bucketed, even if hive.enforce.bucketing was set to false.  This could 
> potentially save a reduce phase when inserting into a bucketed table.
> This would be good to have back.


[jira] [Updated] (HIVE-3536) Output of sort merge join is no longer bucketed

2012-10-05 Thread Kevin Wilfong (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Wilfong updated HIVE-3536:


Attachment: HIVE-3536.3.patch.txt

> Output of sort merge join is no longer bucketed
> ---
>
> Key: HIVE-3536
> URL: https://issues.apache.org/jira/browse/HIVE-3536
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.10.0
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
> Attachments: HIVE-3536.1.patch.txt, HIVE-3536.2.patch.txt, 
> HIVE-3536.3.patch.txt, HIVE-3536.D5907.2.patch, HIVE-3536.D5907.3.patch
>
>
> I don't know if this was a feature or a happy coincidence, but before 
> HIVE-3230, the output of a sort merge join on two partitions would be 
> bucketed, even if hive.enforce.bucketing was set to false.  This could 
> potentially save a reduce phase when inserting into a bucketed table.
> This would be good to have back.


[jira] [Commented] (HIVE-3536) Output of sort merge join is no longer bucketed

2012-10-05 Thread Kevin Wilfong (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-3536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470495#comment-13470495
 ] 

Kevin Wilfong commented on HIVE-3536:
-

Fixed a bug causing non-prefixed files to be generated prefixed with null.

> Output of sort merge join is no longer bucketed
> ---
>
> Key: HIVE-3536
> URL: https://issues.apache.org/jira/browse/HIVE-3536
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.10.0
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
> Attachments: HIVE-3536.1.patch.txt, HIVE-3536.2.patch.txt, 
> HIVE-3536.3.patch.txt, HIVE-3536.D5907.2.patch, HIVE-3536.D5907.3.patch
>
>
> I don't know if this was a feature or a happy coincidence, but before 
> HIVE-3230, the output of a sort merge join on two partitions would be 
> bucketed, even if hive.enforce.bucketing was set to false.  This could 
> potentially save a reduce phase when inserting into a bucketed table.
> This would be good to have back.


[jira] [Created] (HIVE-3540) Non-local Hive query with custom InputFormat via CombineFileInputFormat fails with zipped data

2012-10-05 Thread Chris McConnell (JIRA)

Chris McConnell created HIVE-3540:
-

 Summary: Non-local Hive query with custom InputFormat via 
CombineFileInputFormat fails with zipped data
 Key: HIVE-3540
 URL: https://issues.apache.org/jira/browse/HIVE-3540
 Project: Hive
  Issue Type: Bug
  Components: CLI, Query Processor
Affects Versions: 0.8.1
Reporter: Chris McConnell
Priority: Minor


When accessing a table with zipped data using a custom InputFormat which 
extends CombineFileInputFormat, any non-local Hive execution will pick up the 
default org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.

The issue stems from the fact that (by default) InputFormat cannot handle 
concatenated zip files. 

To reproduce:

Create data that is text based, zip it
Create table with a custom input that extends CombineFileInputFormat
Execute query that is not local (select * from  limit 1;)




Build failed in Jenkins: Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false #159

2012-10-05 Thread Apache Jenkins Server

See 


--
[...truncated 10381 lines...]
 [echo] Project: odbc
 [copy] Warning: 

 does not exist.

ivy-resolve-test:
 [echo] Project: odbc

ivy-retrieve-test:
 [echo] Project: odbc

compile-test:
 [echo] Project: odbc

create-dirs:
 [echo] Project: serde
 [copy] Warning: 

 does not exist.

init:
 [echo] Project: serde

ivy-init-settings:
 [echo] Project: serde

ivy-resolve:
 [echo] Project: serde
[ivy:resolve] :: loading settings :: file = 

[ivy:report] Processing 

 to 


ivy-retrieve:
 [echo] Project: serde

dynamic-serde:

compile:
 [echo] Project: serde

ivy-resolve-test:
 [echo] Project: serde

ivy-retrieve-test:
 [echo] Project: serde

compile-test:
 [echo] Project: serde
[javac] Compiling 26 source files to 

[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

create-dirs:
 [echo] Project: service
 [copy] Warning: 

 does not exist.

init:
 [echo] Project: service

ivy-init-settings:
 [echo] Project: service

ivy-resolve:
 [echo] Project: service
[ivy:resolve] :: loading settings :: file = 

[ivy:report] Processing 

 to 


ivy-retrieve:
 [echo] Project: service

compile:
 [echo] Project: service

ivy-resolve-test:
 [echo] Project: service

ivy-retrieve-test:
 [echo] Project: service

compile-test:
 [echo] Project: service
[javac] Compiling 2 source files to 


test:
 [echo] Project: hive

test-shims:
 [echo] Project: hive

test-conditions:
 [echo] Project: shims

gen-test:
 [echo] Project: shims

create-dirs:
 [echo] Project: shims
 [copy] Warning: 

 does not exist.

init:
 [echo] Project: shims

ivy-init-settings:
 [echo] Project: shims

ivy-resolve:
 [echo] Project: shims
[ivy:resolve] :: loading settings :: file = 

[ivy:report] Processing 

 to 


ivy-retrieve:
 [echo] Project: shims

compile:
 [echo] Project: shims
 [echo] Building shims 0.20

build_shims:
 [echo] Project: shims
 [echo] Compiling 

 against hadoop 0.20.2 
(

ivy-init-settings:
 [echo] Project: shims

ivy-resolve-hadoop-shim:
 [echo] Project: shims
[ivy:resolve] :: loading settings :: file = 


ivy-retrieve-hadoop-shim:
 [echo] Project: shims
 [echo] Building shims 0.20S

build_shims:
 [echo] Project: shims
 [echo] Compiling

Build failed in Jenkins: Hive-0.9.1-SNAPSHOT-h0.21 #159

2012-10-05 Thread Apache Jenkins Server

See 

--
[...truncated 9143 lines...]

init:
 [echo] Project: cli

create-dirs:
 [echo] Project: jdbc
[mkdir] Created dir: 

[mkdir] Created dir: 

[mkdir] Created dir: 

[mkdir] Created dir: 

 [copy] Warning: 

 does not exist.

init:
 [echo] Project: jdbc

create-dirs:
 [echo] Project: hwi
[mkdir] Created dir: 

[mkdir] Created dir: 

[mkdir] Created dir: 

[mkdir] Created dir: 

 [copy] Warning: 

 does not exist.

init:
 [echo] Project: hwi

create-dirs:
 [echo] Project: hbase-handler
[mkdir] Created dir: 

[mkdir] Created dir: 

[mkdir] Created dir: 

[mkdir] Created dir: 

 [copy] Warning: 

 does not exist.

init:
 [echo] Project: hbase-handler

create-dirs:
 [echo] Project: pdk
[mkdir] Created dir: 

[mkdir] Created dir: 

[mkdir] Created dir: 

[mkdir] Created dir: 

 [copy] Warning: 

 does not exist.

init:
 [echo] Project: pdk

create-dirs:
 [echo] Project: builtins
[mkdir] Created dir: 

[mkdir] Created dir: 

[mkdir] Created dir: 

[mkdir] Created dir: 

 [copy] Warning: 

 does not exist.

init:
 [echo] Project: builtins

jar:
 [echo] Project: hive

create-dirs:
 [echo] Project: shims
 [copy] Warning: 

 does not exist.

init:
 [echo] Project: shims

ivy-init-settings:
 [echo] Project: shims

ivy-resolve:
 [echo] Project: shims
[ivy:resolve] :: loading settings :: file = 

[ivy:report] Processing 

 to 


ivy-retrieve:
 [echo] Project: shims

compile:
 [echo] Project: shims
 [echo] Building shims 0.20

build_shims:
 [echo] Project: shims
 [echo] Compiling 

 against hadoop 0.20.2 
(

ivy-init-settings:

[jira] [Commented] (HIVE-2935) Implement HiveServer2

2012-10-05 Thread Alan Gates (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-2935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470379#comment-13470379
 ] 

Alan Gates commented on HIVE-2935:
--

To follow on Edward's comment, I don't understand why beeline is in the patch.  
Is it integral to HiveServer2?

> Implement HiveServer2
> -
>
> Key: HIVE-2935
> URL: https://issues.apache.org/jira/browse/HIVE-2935
> Project: Hive
>  Issue Type: New Feature
>  Components: Server Infrastructure
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
>  Labels: HiveServer2
> Attachments: beelinepositive.tar.gz, HIVE-2935.1.notest.patch.txt, 
> HIVE-2935.2.notest.patch.txt, HIVE-2935.2.nothrift.patch.txt
>
>



[jira] [Commented] (HIVE-2935) Implement HiveServer2

2012-10-05 Thread Edward Capriolo (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-2935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470317#comment-13470317
 ] 

Edward Capriolo commented on HIVE-2935:
---

What do you think about forking beeline as a separate project inside Hive, such 
as hive-beeline? A majority of this patch looks to be beeline with some subtle 
tweaks. 

> Implement HiveServer2
> -
>
> Key: HIVE-2935
> URL: https://issues.apache.org/jira/browse/HIVE-2935
> Project: Hive
>  Issue Type: New Feature
>  Components: Server Infrastructure
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
>  Labels: HiveServer2
> Attachments: beelinepositive.tar.gz, HIVE-2935.1.notest.patch.txt, 
> HIVE-2935.2.notest.patch.txt, HIVE-2935.2.nothrift.patch.txt
>
>



[jira] [Commented] (HIVE-3538) Avro SerDe can't handle Nullable Enums

2012-10-05 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-3538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470316#comment-13470316
 ] 

Sean Busbey commented on HIVE-3538:
---

I think at least part of this is a bug in Avro. If you follow the flow of 
canDeserializeNullableEnums (in the attached test patch), GenericData properly 
validates the record inside the first call to verifyNullableType. Despite this, 
the call to GenericDatumWriter.write inside AvroGenericRecordWritable fails 
because it is unable to resolve a String against a [null, ENUM] union. Since 
the same sequence of validate followed by write works in the non-union case 
(canDeserializeEnums), this looks like a problem in Avro.

I'll follow up there; depending on the fix, this issue might be obviated.
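
To make the failure mode concrete, here is a toy model of resolving a datum 
against a union purely by its Java runtime type. ToyType, ToyColor, and 
ToyUnionResolver are hypothetical and are not Avro classes; the sketch only 
illustrates why a String datum finds no branch in a [null, ENUM] union, and it 
makes no claim about how Avro implements resolution internally.

```java
import java.util.Arrays;
import java.util.List;

// Toy schema kinds; hypothetical, not Avro's Schema.Type.
enum ToyType { NULL, STRING, ENUM }

// A sample enum used to show the case where "enumness" is preserved.
enum ToyColor { RED, GREEN }

class ToyUnionResolver {
    // Picks the union branch by looking only at the datum's runtime type.
    // Returns the index of the matching branch, or -1 if no branch matches.
    static int resolveByRuntimeType(List<ToyType> union, Object datum) {
        ToyType wanted;
        if (datum == null) {
            wanted = ToyType.NULL;
        } else if (datum instanceof String) {
            wanted = ToyType.STRING;
        } else if (datum instanceof Enum) {
            wanted = ToyType.ENUM;
        } else {
            return -1; // type not covered by this toy model
        }
        return union.indexOf(wanted);
    }

    public static void main(String[] args) {
        List<ToyType> nullableEnum = Arrays.asList(ToyType.NULL, ToyType.ENUM);
        // An enum value that has been flattened to a String finds no branch.
        System.out.println(resolveByRuntimeType(nullableEnum, "RED"));
        // The same value kept as an enum resolves to the ENUM branch.
        System.out.println(resolveByRuntimeType(nullableEnum, ToyColor.RED));
    }
}
```

In this model the lookup fails only because the datum arrives as a String; 
restoring the enum type before the lookup is enough for it to match.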

> Avro SerDe can't handle Nullable Enums
> --
>
> Key: HIVE-3538
> URL: https://issues.apache.org/jira/browse/HIVE-3538
> Project: Hive
>  Issue Type: Bug
>Reporter: Sean Busbey
> Attachments: HIVE-3538.tests.txt
>
>
> If a field has a schema that unions NULL with an enum, Avro fails to resolve 
> the union because Avro SerDe doesn't restore "enumness".
> Since the enum datum is a String, avro internals check the union for a string 
> schema, which is not present.


[jira] [Updated] (HIVE-3538) Avro SerDe can't handle Nullable Enums

2012-10-05 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HIVE-3538:
--

Attachment: HIVE-3538.tests.txt

Attaching tests to show what I'm talking about. Note that this is not a 
solution patch, just some unit tests.

> Avro SerDe can't handle Nullable Enums
> --
>
> Key: HIVE-3538
> URL: https://issues.apache.org/jira/browse/HIVE-3538
> Project: Hive
>  Issue Type: Bug
>Reporter: Sean Busbey
> Attachments: HIVE-3538.tests.txt
>
>
> If a field has a schema that unions NULL with an enum, Avro fails to resolve 
> the union because Avro SerDe doesn't restore "enumness".
> Since the enum datum is a String, avro internals check the union for a string 
> schema, which is not present.


Hive-trunk-h0.21 - Build # 1723 - Still Failing

2012-10-05 Thread Apache Jenkins Server

Changes for Build #1708

Changes for Build #1709
[namit] HIVE-3515 metadata_export_drop.q causes failure of other tests
(Ivan Gorbachev via namit)


Changes for Build #1710

Changes for Build #1711
[heyongqiang] HIVE-2206:add a new optimizer for query correlation discovery and 
optimization (Yin Huai via He Yongqiang)

[namit] HIVE-1367 cluster by multiple columns does not work if parenthesis is 
present
(Zhenxiao Luo via namit)


Changes for Build #1712
[cws] add instrumentation to capture if there is skew in reducers (Arun Dobriya 
via cws)

[namit] HIVE-3493 aggName of SemanticAnalyzer.getGenericUDAFEvaluator is 
generated in two
different ways (Yin Huai via namit)

[heyongqiang] revert r1392105 due to bylaw requirement mentioned by Carl 
Steinbach


Changes for Build #1713

Changes for Build #1714
[kevinwilfong] HIVE-3484. RetryingRawStore logic needs to be significantly 
reworked to support retries within transactions (Jean Xu via kevinwilfong)


Changes for Build #1715
[namit] HIVE-3495 For UDAFs, when generating a plan without 
map-side-aggregation, constant 
agg parameters will be replaced by ExprNodeColumnDesc (Yin Huai via namit)


Changes for Build #1716

Changes for Build #1717
[kevinwilfong] HIVE-3458. Parallel test script doesnt run all tests. (Ivan 
Gorbachev via kevinwilfong)

[hashutosh] HIVE-3481: : Hiveserver is not closing the existing 
driver handle before executing the next command. It results in to file handle 
leaks. (Kanna Karanam via Ashutosh Chauhan)


Changes for Build #1718

Changes for Build #1719
[kevinwilfong] HIVE-3498. hivetest.py fails with --revision option. (Ivan 
Gorbachev via kevinwilfong)


Changes for Build #1720

Changes for Build #1721

Changes for Build #1722

Changes for Build #1723
[namit] HIVE-3514 Refactor Partition Pruner so that logic can be reused.
(Gang Tim Liu via namit)




1 tests failed.
FAILED:  
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_script_broken_pipe1

Error Message:
Unexpected exception See build/ql/tmp/hive.log, or try "ant test ... 
-Dtest.silent=false" to get more logs.

Stack Trace:
junit.framework.AssertionFailedError: Unexpected exception
See build/ql/tmp/hive.log, or try "ant test ... -Dtest.silent=false" to get 
more logs.
at junit.framework.Assert.fail(Assert.java:47)
at 
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_script_broken_pipe1(TestNegativeCliDriver.java:11512)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at junit.framework.TestCase.runTest(TestCase.java:168)
at junit.framework.TestCase.runBare(TestCase.java:134)
at junit.framework.TestResult$1.protect(TestResult.java:110)
at junit.framework.TestResult.runProtected(TestResult.java:128)
at junit.framework.TestResult.run(TestResult.java:113)
at junit.framework.TestCase.run(TestCase.java:124)
at junit.framework.TestSuite.runTest(TestSuite.java:232)
at junit.framework.TestSuite.run(TestSuite.java:227)
at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:518)
at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1052)
at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:906)




The Apache Jenkins build system has built Hive-trunk-h0.21 (build #1723)

Status: Still Failing

Check console output at https://builds.apache.org/job/Hive-trunk-h0.21/1723/ to 
view the results.

[jira] [Commented] (HIVE-3514) Refactor Partition Pruner so that logic can be reused.

2012-10-05 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470298#comment-13470298
 ] 

Hudson commented on HIVE-3514:
--

Integrated in Hive-trunk-h0.21 #1723 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/1723/])
HIVE-3514 Refactor Partition Pruner so that logic can be reused.
(Gang Tim Liu via namit) (Revision 1394358)

 Result = FAILURE
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1394358
Files : 
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/PrunerExpressionOperatorFactory.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/PrunerOperatorFactory.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/PrunerUtils.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/ExprProcFactory.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/OpProcFactory.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java


> Refactor Partition Pruner so that logic can be reused.
> --
>
> Key: HIVE-3514
> URL: https://issues.apache.org/jira/browse/HIVE-3514
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Gang Tim Liu
>Assignee: Gang Tim Liu
>Priority: Minor
> Attachments: HIVE-3514.patch, HIVE-3514.patch.2, HIVE-3514.patch.3, 
> HIVE-3514.patch.4, HIVE-3514.patch.5
>
>
> The Partition Pruner has reusable logic, such as:
> 1. walk through operator tree
> 2. walk through operation tree
> 3. create pruning predicate
> The first candidate for reuse is the list bucketing pruner.
> Some considerations:
> 1. refactor for the general use case, not just list bucketing
> 2. avoid over-refactoring by focusing on pieces targeted for reuse


[jira] [Created] (HIVE-3539) add an option in hive where the location for managed tables/partitions cannot be specified

2012-10-05 Thread Namit Jain (JIRA)

Namit Jain created HIVE-3539:


 Summary: add an option in hive where the location for managed 
tables/partitions cannot be specified
 Key: HIVE-3539
 URL: https://issues.apache.org/jira/browse/HIVE-3539
 Project: Hive
  Issue Type: Bug
Reporter: Namit Jain
Assignee: Namit Jain


We run into scenarios where multiple tables/partitions point to the same data.
This is OK for external tables, but may create problems for managed tables.

An option should be provided to disable this behavior.


[jira] [Commented] (HIVE-2935) Implement HiveServer2

2012-10-05 Thread Namit Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-2935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470248#comment-13470248
 ] 

Namit Jain commented on HIVE-2935:
--

It would be useful if you could split the hive driver changes (if any) into a 
separate patch.
I mean the parts of the patch that could affect the stability of current 
hive (not using hive server).


> Implement HiveServer2
> -
>
> Key: HIVE-2935
> URL: https://issues.apache.org/jira/browse/HIVE-2935
> Project: Hive
>  Issue Type: New Feature
>  Components: Server Infrastructure
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
>  Labels: HiveServer2
> Attachments: beelinepositive.tar.gz, HIVE-2935.1.notest.patch.txt, 
> HIVE-2935.2.notest.patch.txt, HIVE-2935.2.nothrift.patch.txt
>
>



[jira] [Commented] (HIVE-1362) column level statistics

2012-10-05 Thread Namit Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470240#comment-13470240
 ] 

Namit Jain commented on HIVE-1362:
--

Would be easier to comment here:

>> It does not interact with hive.stats.reliable.

Can you fix that? If you want to do it in a follow-up, that's fine. 
But for now, please throw an error in that case if hive.stats.reliable
is set to true.

>>> Why does it make sense to add a thrift API for updating statistics? There 
>>> doesn't exist an interface for updating row level statistics. How is the 
>>> user supposed to compute these other than analyze, which anyway updates 
>>> the stats?

> column level statistics
> ---
>
> Key: HIVE-1362
> URL: https://issues.apache.org/jira/browse/HIVE-1362
> Project: Hive
>  Issue Type: Sub-task
>  Components: Statistics
>Reporter: Ning Zhang
>Assignee: Shreepadma Venugopalan
> Attachments: HIVE-1362.1.patch.txt, HIVE-1362.2.patch.txt, 
> HIVE-1362.3.patch.txt, HIVE-1362.4.patch.txt, 
> HIVE-1362-gen_thrift.1.patch.txt, HIVE-1362-gen_thrift.2.patch.txt, 
> HIVE-1362-gen_thrift.3.patch.txt, HIVE-1362-gen_thrift.4.patch.txt
>
>



Re: Build failed in Jenkins: Hive-0.9.1-SNAPSHOT-h0.21 #157

2012-10-05 Thread Lin

Unsubscribe.

Thanks.

[jira] [Commented] (HIVE-3533) ZooKeeperHiveLockManager does not respect the option to keep locks alive even after the current session has closed

2012-10-05 Thread Namit Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-3533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470234#comment-13470234
 ] 

Namit Jain commented on HIVE-3533:
--

+1

> ZooKeeperHiveLockManager does not respect the option to keep locks alive even 
> after the current session has closed
> --
>
> Key: HIVE-3533
> URL: https://issues.apache.org/jira/browse/HIVE-3533
> Project: Hive
>  Issue Type: Bug
>  Components: Locking
>Affects Versions: 0.9.0
>Reporter: Matt Martin
>Priority: Minor
> Attachments: HIVE-3533.1.patch.txt
>
>
> The HiveLockManager interface defines the following method:
> public List lock(List objs,
>   boolean keepAlive) throws LockException;
> ZooKeeperHiveLockManager implements HiveLockManager, but the current 
> implementation of the "lock" method never actually references the "keepAlive" 
> parameter.  As a result, all of the locks acquired by the "lock" method are 
> ephemeral.  In other words, Zookeeper-based locks only exist as long as the 
> underlying Zookeeper session exists.  As soon as the Zookeeper session ends, 
> any Zookeeper-based locks are automatically released.
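
One way the "keepAlive" flag could be honoured is to pick the kind of lock node 
from it, as in the minimal sketch below. SketchNodeMode is a local stand-in 
declared so the example compiles without the ZooKeeper jar (its constants are 
named after ZooKeeper's ephemeral-sequential and persistent-sequential node 
types), and SketchLockModes.modeFor is a hypothetical helper, not a method from 
Hive or from the attached patch.

```java
// Local stand-in for the node kinds a ZooKeeper client can create.
// EPHEMERAL_* nodes vanish when the creating session ends;
// PERSISTENT_* nodes stay until they are explicitly deleted.
enum SketchNodeMode { EPHEMERAL_SEQUENTIAL, PERSISTENT_SEQUENTIAL }

class SketchLockModes {
    // Hypothetical helper: choose the node kind from the caller's keepAlive flag
    // instead of always creating an ephemeral node.
    static SketchNodeMode modeFor(boolean keepAlive) {
        return keepAlive
            ? SketchNodeMode.PERSISTENT_SEQUENTIAL
            : SketchNodeMode.EPHEMERAL_SEQUENTIAL;
    }

    // True if a lock node of this kind outlives the session that created it.
    static boolean survivesSessionClose(SketchNodeMode mode) {
        return mode == SketchNodeMode.PERSISTENT_SEQUENTIAL;
    }
}
```

A persistent lock has to be released explicitly, so any real change along these 
lines would also need a cleanup path for locks whose owner has gone away.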


[jira] [Updated] (HIVE-2228) Can't use DB qualified column names in WHERE or GROUP BY clauses

2012-10-05 Thread Namit Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-2228:
-

Status: Open  (was: Patch Available)

minor comments on phabricator

> Can't use DB qualified column names in WHERE or GROUP BY clauses
> 
>
> Key: HIVE-2228
> URL: https://issues.apache.org/jira/browse/HIVE-2228
> Project: Hive
>  Issue Type: Bug
>  Components: Database/Schema, Query Processor, SQL
>Affects Versions: 0.7.0
>Reporter: Carl Steinbach
>Assignee: Zhenxiao Luo
> Attachments: HIVE-2228.1.patch.txt
>
>
> Hive doesn't allow you to use DB qualified column names in the WHERE or GROUP 
> BY clauses. The workaround is to define a table alias:
> {noformat}
> hive> CREATE DATABASE db1;
> OK
> hive> CREATE TABLE db1.t(a INT, b INT);
> OK
> hive> SELECT * FROM db1.t WHERE db1.t.a > 100;
> FAILED: Error in semantic analysis: Line 1:26 Invalid table alias or column 
> reference 'db1'
> hive> SELECT * FROM db1.t t WHERE t.a > 100;
> OK
> hive> SELECT * FROM db1.t GROUP BY db1.t.a;
> FAILED: Error in semantic analysis: Line 1:29 Invalid table alias or column 
> reference 'db1'
> hive> SELECT * FROM db1.t t GROUP BY t.a;
> OK
> {noformat}


[jira] [Updated] (HIVE-3537) release locks at the end of move tasks

2012-10-05 Thread Namit Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3537:
-

Attachment: hive.3537.1.patch

> release locks at the end of move tasks
> --
>
> Key: HIVE-3537
> URL: https://issues.apache.org/jira/browse/HIVE-3537
> Project: Hive
>  Issue Type: Bug
>  Components: Locking, Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3537.1.patch
>
>
> Look at HIVE-3106 for details.
> In order to make sure that concurrency is not an issue for multi-table 
> inserts, the current option is to introduce a dependency task, which thereby
> delays the creation of all partitions. It would be desirable to release the
> locks for the outputs as soon as the move task is completed. That way, for
> multi-table inserts, the concurrency can be enabled without delaying any 
> table.
> Currently, the move task contains an input/output, but they do not seem to be
> populated correctly.


[jira] [Updated] (HIVE-3537) release locks at the end of move tasks

2012-10-05 Thread Namit Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3537:
-

Status: Patch Available  (was: Open)

> release locks at the end of move tasks
> --
>
> Key: HIVE-3537
> URL: https://issues.apache.org/jira/browse/HIVE-3537
> Project: Hive
>  Issue Type: Bug
>  Components: Locking, Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3537.1.patch
>
>
> Look at HIVE-3106 for details.
> In order to make sure that concurrency is not an issue for multi-table 
> inserts, the current option is to introduce a dependency task, which thereby
> delays the creation of all partitions. It would be desirable to release the
> locks for the outputs as soon as the move task is completed. That way, for
> multi-table inserts, the concurrency can be enabled without delaying any 
> table.
> Currently, the movetask contains an input/output, but they do not seem to be
> populated correctly.


[jira] [Commented] (HIVE-3537) release locks at the end of move tasks

2012-10-05 Thread Namit Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-3537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470227#comment-13470227
 ] 

Namit Jain commented on HIVE-3537:
--

https://reviews.facebook.net/D5931

> release locks at the end of move tasks
> --
>
> Key: HIVE-3537
> URL: https://issues.apache.org/jira/browse/HIVE-3537
> Project: Hive
>  Issue Type: Bug
>  Components: Locking, Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3537.1.patch
>
>
> Look at HIVE-3106 for details.
> In order to make sure that concurrency is not an issue for multi-table 
> inserts, the current option is to introduce a dependency task, which thereby
> delays the creation of all partitions. It would be desirable to release the
> locks for the outputs as soon as the move task is completed. That way, for
> multi-table inserts, the concurrency can be enabled without delaying any 
> table.
> Currently, the movetask contains an input/output, but they do not seem to be
> populated correctly.


[jira] [Updated] (HIVE-3528) Avro SerDe doesn't handle serializing Nullable types that require access to a Schema

2012-10-05 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HIVE-3528:
--

Attachment: HIVE-3528.1.patch.txt

Patch with proposed fix and unit tests. Also on [review board 
#7431|https://reviews.apache.org/r/7431/]

> Avro SerDe doesn't handle serializing Nullable types that require access to a 
> Schema
> 
>
> Key: HIVE-3528
> URL: https://issues.apache.org/jira/browse/HIVE-3528
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Sean Busbey
>  Labels: avro
> Attachments: HIVE-3528.1.patch.txt
>
>
> Deserialization properly handles hiding Nullable Avro types, including 
> complex types like record, map, array, etc. However, when Serialization 
> attempts to write out these types it erroneously makes use of the UNION 
> schema that contains NULL and the other type.
> This results in Schema mismatch errors for Record, Array, Enum, Fixed, and 
> Bytes.
> Here's a [review board of unit tests that express the 
> problem|https://reviews.apache.org/r/7431/], as well as one showing that the 
> problem occurs only when the schema is needed.


[jira] [Updated] (HIVE-3528) Avro SerDe doesn't handle serializing Nullable types that require access to a Schema

2012-10-05 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HIVE-3528:
--

Status: Patch Available  (was: Open)

> Avro SerDe doesn't handle serializing Nullable types that require access to a 
> Schema
> 
>
> Key: HIVE-3528
> URL: https://issues.apache.org/jira/browse/HIVE-3528
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Sean Busbey
>  Labels: avro
> Attachments: HIVE-3528.1.patch.txt
>
>
> Deserialization properly handles hiding Nullable Avro types, including 
> complex types like record, map, array, etc. However, when Serialization 
> attempts to write out these types it erroneously makes use of the UNION 
> schema that contains NULL and the other type.
> This results in Schema mismatch errors for Record, Array, Enum, Fixed, and 
> Bytes.
> Here's a [review board of unit tests that express the 
> problem|https://reviews.apache.org/r/7431/], as well as one showing that the 
> problem occurs only when the schema is needed.


Re: Review Request: HIVE-3528

2012-10-05 Thread Sean Busbey


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7431/
---

(Updated Oct. 5, 2012, 8:39 a.m.)


Review request for hive.


Changes
---

Spun off nullable enums as HIVE-3538, because the issue appears to be String vs 
Avro EnumSymbol and the changes required were far more involved than what's 
here.


Description
---

Changes AvroSerDe to properly give the non-null schema to serialization 
routines when using Nullable complex types
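
A minimal sketch of what "giving the non-null schema" means, written against
Avro schemas in their plain JSON form (a union is a JSON array) rather than
against the AvroSerDe code itself. The helper names is_null and
non_null_schema are hypothetical, and the rule that a Nullable type is a
two-branch union with exactly one NULL branch is an assumption based on the
issue description.

```python
def is_null(branch):
    """True if a schema branch is the Avro null type (either JSON spelling)."""
    return branch == "null" or (
        isinstance(branch, dict) and branch.get("type") == "null"
    )


def non_null_schema(schema):
    """Unwrap a Nullable schema, i.e. a union of NULL and one other type.

    Any other schema (non-union, or a union that is not a simple Nullable)
    is returned unchanged.
    """
    if isinstance(schema, list) and len(schema) == 2:
        others = [b for b in schema if not is_null(b)]
        if len(others) == 1:
            return others[0]
    return schema


if __name__ == "__main__":
    nullable_array = ["null", {"type": "array", "items": "int"}]
    # The serializer would be handed the array schema, not the union.
    print(non_null_schema(nullable_array))
```

The point is only that a routine serializing an array, record, fixed, or
bytes value needs the schema of that type, so the union wrapper has to be
removed before the value is written.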


Diffs (updated)
-

  /trunk/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerializer.java 1394121 
  /trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerializer.java 1394121 

Diff: https://reviews.apache.org/r/7431/diff/


Testing
---

Adds tests that check each of the Avro types for which Serialization needs a 
user-provided schema, both as top level fields and as nested members of a 
complex type.


Thanks,

Sean Busbey

[jira] [Created] (HIVE-3538) Avro SerDe can't handle Nullable Enums

2012-10-05 Thread Sean Busbey (JIRA)

Sean Busbey created HIVE-3538:
-

 Summary: Avro SerDe can't handle Nullable Enums
 Key: HIVE-3538
 URL: https://issues.apache.org/jira/browse/HIVE-3538
 Project: Hive
  Issue Type: Bug
Reporter: Sean Busbey


If a field has a schema that unions NULL with an enum, Avro fails to resolve 
the union because Avro SerDe doesn't restore "enumness".

Since the enum datum is a String, Avro internals check the union for a string 
schema, which is not present.
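
A toy model of the failure described above, not Avro's real union
resolution. The EnumSymbol class and the resolve_union function are
hypothetical stand-ins. The model picks a union branch from the runtime type
of the datum, so a plain string looks for a "string" branch and finds none
in a union of NULL and an enum.

```python
class EnumSymbol:
    """Hypothetical wrapper marking a value as an enum symbol, not a string."""

    def __init__(self, symbol):
        self.symbol = symbol


def branch_type(branch):
    """Type name of a schema branch given in Avro's JSON form."""
    return branch["type"] if isinstance(branch, dict) else branch


def resolve_union(union, datum):
    """Return the index of the union branch that matches the datum's type."""
    if datum is None:
        wanted = "null"
    elif isinstance(datum, EnumSymbol):
        wanted = "enum"
    elif isinstance(datum, str):
        wanted = "string"
    else:
        raise TypeError("unsupported datum type in this sketch")
    for index, branch in enumerate(union):
        if branch_type(branch) == wanted:
            return index
    raise ValueError("no %s branch in union" % wanted)


if __name__ == "__main__":
    suit = {"type": "enum", "name": "Suit", "symbols": ["HEARTS", "SPADES"]}
    union = ["null", suit]
    print(resolve_union(union, EnumSymbol("HEARTS")))  # enum branch is found
    try:
        resolve_union(union, "HEARTS")                 # "enumness" was lost
    except ValueError as err:
        print("failed:", err)
```

In this model the fix is to hand the resolver a value that still carries its
enum identity, which corresponds to the String vs EnumSymbol distinction
mentioned in the review request.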

