[jira] [Commented] (HIVE-1446) Move Hive Documentation from the wiki to version control

2013-07-12 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706758#comment-13706758
 ] 

Lefty Leverenz commented on HIVE-1446:
--

I'd have called this Resolved:  Won't Fix but whatever you call it, thanks for 
putting this sad old JIRA to rest.

For those who don't already know:  the Hive xdocs are no longer available 
(since release 0.10.0).  Everything is in the Hive wiki now -- see 
[HIVE-3896|https://issues.apache.org/jira/browse/HIVE-3896].



> Move Hive Documentation from the wiki to version control
> 
>
> Key: HIVE-1446
> URL: https://issues.apache.org/jira/browse/HIVE-1446
> Project: Hive
>  Issue Type: Task
>  Components: Documentation
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
> Attachments: hive-1446.diff, hive-1446-part-1.diff, hive-logo-wide.png
>
>
> Move the Hive Language Manual (and possibly some other documents) from the 
> Hive wiki to version control. This work needs to be coordinated with the 
> hive-dev and hive-user community in order to avoid missing any edits as well 
> as to avoid or limit unavailability of the docs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4658) Make KW_OUTER optional in outer joins

2013-07-12 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13707028#comment-13707028
 ] 

Navis commented on HIVE-4658:
-

Sorry, missed this. Running test.

> Make KW_OUTER optional in outer joins
> -
>
> Key: HIVE-4658
> URL: https://issues.apache.org/jira/browse/HIVE-4658
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Navis
>Assignee: Edward Capriolo
>Priority: Trivial
> Attachments: hive-4658.2.patch.txt, HIVE-4658.D11091.1.patch
>
>
> For really trivial migration issue.



[jira] [Updated] (HIVE-4790) MapredLocalTask task does not make virtual columns

2013-07-12 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-4790:
--

Attachment: HIVE-4790.D11511.2.patch

navis updated the revision "HIVE-4790 [jira] MapredLocalTask task does not make 
virtual columns".

  Rebased to trunk

Reviewers: JIRA

REVISION DETAIL
  https://reviews.facebook.net/D11511

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D11511?vs=35157&id=35637#toc

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/PartitionKeySampler.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/ColumnStatsWork.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/FetchWork.java
  ql/src/test/queries/clientpositive/join_vc.q
  ql/src/test/results/clientpositive/join_vc.q.out

To: JIRA, navis


> MapredLocalTask task does not make virtual columns
> --
>
> Key: HIVE-4790
> URL: https://issues.apache.org/jira/browse/HIVE-4790
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
> Attachments: HIVE-4790.D11511.1.patch, HIVE-4790.D11511.2.patch
>
>
> From mailing list, 
> http://www.mail-archive.com/user@hive.apache.org/msg08264.html
> {noformat}
> SELECT *,b.BLOCK__OFFSET__INSIDE__FILE FROM a JOIN b ON b.rownumber = a.number;
> fails with this error:
>  
> > SELECT *,b.BLOCK__OFFSET__INSIDE__FILE FROM a JOIN b ON b.rownumber = a.number;
> Automatically selecting local only mode for query
> Total MapReduce jobs = 1
> setting HADOOP_USER_NAME pmarron
> 13/06/25 10:52:56 WARN conf.HiveConf: DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.
> Execution log at: /tmp/pmarron/.log
> 2013-06-25 10:52:56 Starting to launch local task to process map join; maximum memory = 932118528
> java.lang.RuntimeException: cannot find field block__offset__inside__file from [0:rownumber, 1:offset]
>     at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:366)
>     at org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector.getStructFieldRef(LazySimpleStructObjectInspector.java:168)
>     at org.apache.hadoop.hive.serde2.objectinspector.DelegatedStructObjectInspector.getStructFieldRef(DelegatedStructObjectInspector.java:74)
>     at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:57)
>     at org.apache.hadoop.hive.ql.exec.JoinUtil.getObjectInspectorsFromEvaluators(JoinUtil.java:68)
>     at org.apache.hadoop.hive.ql.exec.HashTableSinkOperator.initializeOp(HashTableSinkOperator.java:222)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:451)
>     at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:407)
>     at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:186)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
>     at org.apache.hadoop.hive.ql.exec.MapredLocalTask.initializeOperators(MapredLocalTask.java:394)
>     at org.apache.hadoop.hive.ql.exec.MapredLocalTask.executeFromChildJVM(MapredLocalTask.java:277)
>     at org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:676)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Execution failed with exit status: 2
> {noformat}



[jira] [Updated] (HIVE-4790) MapredLocalTask task does not make virtual columns

2013-07-12 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-4790:


Status: Patch Available  (was: Open)

> MapredLocalTask task does not make virtual columns
> --
>
> Key: HIVE-4790
> URL: https://issues.apache.org/jira/browse/HIVE-4790
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
> Attachments: HIVE-4790.D11511.1.patch, HIVE-4790.D11511.2.patch
>
>


[jira] [Updated] (HIVE-2608) Do not require AS a,b,c part in LATERAL VIEW

2013-07-12 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-2608:
--

Attachment: HIVE-2608.D4317.6.patch

navis updated the revision "HIVE-2608 [jira] Do not require AS a,b,c part in 
LATERAL VIEW".

  Rebased to trunk

Reviewers: JIRA

REVISION DETAIL
  https://reviews.facebook.net/D4317

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D4317?vs=35535&id=35643#toc

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/parse/FromClauseParser.g
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
  ql/src/test/queries/clientpositive/lateral_view_noalias.q
  ql/src/test/results/clientpositive/lateral_view_noalias.q.out

To: JIRA, navis
Cc: ikabiljo


> Do not require AS a,b,c part in LATERAL VIEW
> 
>
> Key: HIVE-2608
> URL: https://issues.apache.org/jira/browse/HIVE-2608
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor, UDF
>Affects Versions: 0.10.0
>Reporter: Igor Kabiljo
>Assignee: Navis
>Priority: Minor
> Attachments: HIVE-2608.D4317.5.patch, HIVE-2608.D4317.6.patch
>
>
> Currently, it is required to state column names when LATERAL VIEW is used.
> That shouldn't be necessary, since UDTF returns struct which contains column 
> names - and they should be used by default.
> For example, it would be great if this was possible:
> SELECT t.*, t.key1 + t.key4
> FROM some_table
> LATERAL VIEW JSON_TUPLE(json, 'key1', 'key2', 'key3', 'key4') t;



[jira] [Updated] (HIVE-2608) Do not require AS a,b,c part in LATERAL VIEW

2013-07-12 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-2608:


Affects Version/s: (was: 0.10.0)
   Status: Patch Available  (was: Open)

> Do not require AS a,b,c part in LATERAL VIEW
> 
>
> Key: HIVE-2608
> URL: https://issues.apache.org/jira/browse/HIVE-2608
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor, UDF
>Reporter: Igor Kabiljo
>Assignee: Navis
>Priority: Minor
> Attachments: HIVE-2608.D4317.5.patch, HIVE-2608.D4317.6.patch
>
>


[jira] [Updated] (HIVE-4730) Join on more than 2^31 records on single reducer failed (wrong results)

2013-07-12 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-4730:


Status: Patch Available  (was: Open)

> Join on more than 2^31 records on single reducer failed (wrong results)
> ---
>
> Key: HIVE-4730
> URL: https://issues.apache.org/jira/browse/HIVE-4730
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.11.0, 0.10.0, 0.9.0, 0.8.1, 0.8.0, 0.7.1
>Reporter: Gabi Kazav
>Assignee: Navis
>Priority: Critical
> Attachments: HIVE-4730.D11283.1.patch
>
>
> A join on more than 2^31 rows leads to wrong results. For example:
> Create table small_table (p1 string) ROW FORMAT DELIMITED LINES TERMINATED BY '\n';
> Create table big_table (p1 string) ROW FORMAT DELIMITED LINES TERMINATED BY '\n';
> Loading 1 row to small_table (the value 1).
> Loading 2149580800 rows to big_table with the same value (1 in this case).
> create table output as select a.p1 from big_table a join small_table b on (a.p1=b.p1);
> select count(*) from output; will return only 1 row...
> The reducer syslog:
> ...
> 2013-06-13 17:20:59,254 INFO ExecReducer: ExecReducer: processing 214700 rows: used memory = 32925960
> 2013-06-13 17:21:00,745 INFO ExecReducer: ExecReducer: processing 214800 rows: used memory = 12815184
> 2013-06-13 17:21:02,205 INFO ExecReducer: ExecReducer: processing 214900 rows: used memory = 26684552   <-- looks like wrong value..
> ...
> 2013-06-13 17:21:04,062 INFO ExecReducer: ExecReducer: processed 2149580801 rows: used memory = 17715896
> 2013-06-13 17:21:04,062 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 finished. closing...
> 2013-06-13 17:21:04,062 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarded 1 rows
> 2013-06-13 17:21:05,791 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: SKEWJOINFOLLOWUPJOBS:0
> 2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 finished. closing...
> 2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 forwarded 1 rows
> 2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 6 finished. closing...
> 2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 6 forwarded 0 rows
> 2013-06-13 17:21:05,946 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: TABLE_ID_1_ROWCOUNT:1
> 2013-06-13 17:21:05,946 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 Close done
> 2013-06-13 17:21:05,946 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 Close done
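
Editorial note: the report above does not name a root cause. One plausible
explanation, offered here only as an assumption, is that some row counter is
held in a 32-bit signed int, which cannot represent values above 2^31 - 1. The
sketch below is not Hive code; the class and method names are hypothetical. It
only shows what the reported row count 2149580800 becomes if it is narrowed to
an int.

```java
// Hypothetical illustration, not taken from the Hive source tree.
final class RowCounterOverflow {

    // What a 32-bit signed counter would hold for a given true row count.
    // Narrowing a long to an int keeps only the low 32 bits.
    static int asIntCounter(long rows) {
        return (int) rows;
    }

    public static void main(String[] args) {
        long rows = 2149580800L; // row count quoted in the report
        System.out.println("Integer.MAX_VALUE = " + Integer.MAX_VALUE);
        System.out.println("as long           = " + rows);
        // 2149580800 - 2^32 = -2145386496
        System.out.println("as int            = " + asIntCounter(rows));
    }
}
```

If a counter of this kind is involved, widening it to long would be the usual
remedy; whether that is what the attached patch does is not stated in this
message.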



[jira] [Updated] (HIVE-4838) Refactor MapJoin HashMap code to improve testability and readability

2013-07-12 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-4838:
---

Status: Patch Available  (was: Open)

Marking "Patch Available"

I tested this with an experimental test-patch build using the new ptest2 
infrastructure: 
https://builds.apache.org/job/Hive-Test-Patch-trunk-hadoop1-ptest/6/
As you can see, the tests all passed: 
https://builds.apache.org/job/Hive-Test-Patch-trunk-hadoop1-ptest/6/testReport/

If you look at the console output you can see the patch being applied. In 
addition, you can see that the patch file was a build parameter: 
https://builds.apache.org/job/Hive-Test-Patch-trunk-hadoop1-ptest/6/parameters/

> Refactor MapJoin HashMap code to improve testability and readability
> 
>
> Key: HIVE-4838
> URL: https://issues.apache.org/jira/browse/HIVE-4838
> Project: Hive
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Brock Noland
> Attachments: HIVE-4838.patch
>
>
> MapJoin is an essential component for high performance joins in Hive and the 
> current code has done great service for many years. However, the code is 
> showing its age and currently suffers from the following issues:
> * Uses static state via the MapJoinMetaData class to pass serialization 
> metadata to the Key, Row classes.
> * The API of a logical "Table Container" is not defined and therefore it's 
> unclear what APIs HashMapWrapper 
> needs to publicize. Additionally, HashMapWrapper has many unused public methods.
> * HashMapWrapper contains logic to serialize, test memory bounds, and 
> implement the table container. Ideally these logical units could be separated.
> * HashTableSinkObjectCtx has unused fields and unused methods.
> * CommonJoinOperator and children use ArrayList on the left-hand side when only 
> List is required.
> * There are unused classes MRU, DCLLItem and classes which duplicate 
> functionality MapJoinSingleKey and MapJoinDoubleKeys.
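
Editorial note: the ArrayList-versus-List point in the list above can be
sketched as follows. This is a generic illustration with a hypothetical
RowBuffer class, not code from CommonJoinOperator. Declaring the field and
parameter as List means the backing implementation can be swapped, for
example in a unit test, without touching the declaring class.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical class used only to illustrate "program to the interface".
final class RowBuffer {

    // Declared as List, not ArrayList: nothing here depends on the
    // concrete implementation chosen by the caller.
    private final List<Object> values;

    RowBuffer(List<Object> backing) {
        this.values = backing;
    }

    void add(Object value) {
        values.add(value);
    }

    int size() {
        return values.size();
    }

    public static void main(String[] args) {
        // Both implementations work with the same declaration.
        RowBuffer a = new RowBuffer(new ArrayList<Object>());
        RowBuffer b = new RowBuffer(new LinkedList<Object>());
        a.add("k1");
        b.add("k1");
        b.add("k2");
        System.out.println(a.size() + " " + b.size());
    }
}
```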



[jira] [Updated] (HIVE-4838) Refactor MapJoin HashMap code to improve testability and readability

2013-07-12 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-4838:
---

Attachment: HIVE-4838.patch

New patch contains trivial updates. It makes one variable transient to 
match the others, removes one comment I had added for myself only, and adds a 
couple of tests to TestMapJoinKey.

> Refactor MapJoin HashMap code to improve testability and readability
> 
>
> Key: HIVE-4838
> URL: https://issues.apache.org/jira/browse/HIVE-4838
> Project: Hive
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Brock Noland
> Attachments: HIVE-4838.patch, HIVE-4838.patch
>
>


[jira] [Resolved] (HIVE-4597) Remove test code from ql\src\java tree, place it in ql\src\test tree

2013-07-12 Thread Tony Murphy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tony Murphy resolved HIVE-4597.
---

Resolution: Not A Problem

There is no test code under ql/src/java. The test generation code does live 
with the vector expression generation code, but that is unavoidable. Remus has 
a task to move the generation code and integrate it with the build.

> Remove test code from ql\src\java tree, place it in ql\src\test tree
> -
>
> Key: HIVE-4597
> URL: https://issues.apache.org/jira/browse/HIVE-4597
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Remus Rusanu
>Assignee: Tony Murphy
>Priority: Minor
> Attachments: HIVE-4597.patch
>
>
> The TestCodeGen and the generated files (probably the templates too) belong 
> in the ql\src\test tree, not in the ql\src\java tree



[jira] [Commented] (HIVE-4838) Refactor MapJoin HashMap code to improve testability and readability

2013-07-12 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13707154#comment-13707154
 ] 

Ashutosh Chauhan commented on HIVE-4838:


I see there is an update to a .q.out file. Does that mean there is a correctness 
issue in the existing code?

> Refactor MapJoin HashMap code to improve testability and readability
> 
>
> Key: HIVE-4838
> URL: https://issues.apache.org/jira/browse/HIVE-4838
> Project: Hive
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Brock Noland
> Attachments: HIVE-4838.patch, HIVE-4838.patch
>
>


[jira] [Commented] (HIVE-4838) Refactor MapJoin HashMap code to improve testability and readability

2013-07-12 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13707181#comment-13707181
 ] 

Brock Noland commented on HIVE-4838:


Hi,

Correct, there is. It's related to the snippet of code I posted earlier. 
Basically, the equals implementation of MapJoinDoubleKey (and MapJoinObjectKey) 
is incorrect, resulting in different results for the following query depending 
on how it is executed (map-side vs reduce-side):

{noformat}
SELECT /*+ MAPJOIN(a) */ * FROM smb_input1 a JOIN smb_input1 b ON a.key <=> 
b.key AND a.value <=> b.value ORDER BY a.key, a.value, b.key, b.value;
{noformat}

Brock
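
Editorial note: the faulty equals implementation is not quoted in this thread,
so the following is only a generic sketch of what a null-tolerant two-field
key can look like. TwoFieldKey is a hypothetical class, not Hive's
MapJoinDoubleKey. The point it illustrates is that a hash-table key used for
a null-safe join has to treat two null fields as equal and has to keep
hashCode consistent with equals.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical key class; names are illustrative only.
final class TwoFieldKey {
    private final Object first;
    private final Object second;

    TwoFieldKey(Object first, Object second) {
        this.first = first;
        this.second = second;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof TwoFieldKey)) {
            return false;
        }
        TwoFieldKey that = (TwoFieldKey) other;
        // Objects.equals treats (null, null) as equal and never throws.
        return Objects.equals(first, that.first)
            && Objects.equals(second, that.second);
    }

    @Override
    public int hashCode() {
        // Objects.hash accepts null elements, so equal keys hash alike.
        return Objects.hash(first, second);
    }

    public static void main(String[] args) {
        Map<TwoFieldKey, String> table = new HashMap<>();
        table.put(new TwoFieldKey(null, "v"), "row");
        // A lookup with an equal key, including the null field, succeeds.
        System.out.println(table.get(new TwoFieldKey(null, "v")));
    }
}
```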

> Refactor MapJoin HashMap code to improve testability and readability
> 
>
> Key: HIVE-4838
> URL: https://issues.apache.org/jira/browse/HIVE-4838
> Project: Hive
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Brock Noland
> Attachments: HIVE-4838.patch, HIVE-4838.patch
>
>


[jira] [Commented] (HIVE-4838) Refactor MapJoin HashMap code to improve testability and readability

2013-07-12 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13707205#comment-13707205
 ] 

Ashutosh Chauhan commented on HIVE-4838:


Interesting. Let's tease that part out from the refactoring, then. We need to fix 
the correctness issue first. Can you create a separate JIRA for this issue and 
submit a minimal patch which fixes it?

> Refactor MapJoin HashMap code to improve testability and readability
> 
>
> Key: HIVE-4838
> URL: https://issues.apache.org/jira/browse/HIVE-4838
> Project: Hive
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Brock Noland
> Attachments: HIVE-4838.patch, HIVE-4838.patch
>
>


[jira] [Resolved] (HIVE-4839) build-common.xml has

2013-07-12 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman resolved HIVE-4839.
--

Resolution: Not A Problem

Since build-common.xml is always imported from 1 level down from hive root, 
this is OK (though confusing and fragile)

> build-common.xml has 
> -
>
> Key: HIVE-4839
> URL: https://issues.apache.org/jira/browse/HIVE-4839
> Project: Hive
>  Issue Type: Bug
>  Components: Build Infrastructure
>Affects Versions: 0.12.0
>Reporter: Eugene Koifman
> Fix For: 0.12.0
>
>
> build-common.xml has   
> which points above the root of the source tree.
> build.xml (in the same directory) haslocation="${basedir}"/>
> which is correct



[jira] [Created] (HIVE-4845) Correctness issue with MapJoins using the null safe operator

2013-07-12 Thread Brock Noland (JIRA)
Brock Noland created HIVE-4845:
--

 Summary: Correctness issue with MapJoins using the null safe 
operator
 Key: HIVE-4845
 URL: https://issues.apache.org/jira/browse/HIVE-4845
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland
Assignee: Brock Noland
Priority: Critical


I found a correctness issue while working on HIVE-4838. The following query 
from join_nullsafe.q gives different results depending on whether it's executed 
map-side or reduce-side:

{noformat}
SELECT /*+ MAPJOIN(a) */ * FROM smb_input1 a JOIN smb_input1 b ON a.key <=> 
b.key AND a.value <=> b.value ORDER BY a.key, a.value, b.key, b.value;
{noformat}
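
Editorial note: for readers unfamiliar with the null-safe operator used in the
query above, the sketch below contrasts it with ordinary SQL equality. The
class and method names are hypothetical and this is not Hive's implementation.
It models the documented behaviour: `=` yields NULL (unknown) when either
operand is NULL, while `<=>` yields TRUE when both operands are NULL and FALSE
when exactly one is.

```java
import java.util.Objects;

// Hypothetical helper class; illustrates operator semantics only.
final class NullSafeCompare {

    // Ordinary SQL '=': the result is unknown, modelled here as a null
    // Boolean, when either side is NULL.
    static Boolean sqlEquals(Object a, Object b) {
        if (a == null || b == null) {
            return null;
        }
        return a.equals(b);
    }

    // Null-safe '<=>': always TRUE or FALSE, never unknown.
    static boolean nullSafeEquals(Object a, Object b) {
        return Objects.equals(a, b);
    }

    public static void main(String[] args) {
        System.out.println(sqlEquals(null, null));      // unknown
        System.out.println(nullSafeEquals(null, null)); // both NULL
        System.out.println(nullSafeEquals(null, "x"));  // one NULL
        System.out.println(nullSafeEquals("x", "x"));   // same as '='
    }
}
```

A join condition written with `<=>` therefore matches rows whose keys are both
NULL, which is why a map-side hash-table key and a reduce-side comparison have
to agree on how NULL is handled.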




[jira] [Commented] (HIVE-4838) Refactor MapJoin HashMap code to improve testability and readability

2013-07-12 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13707213#comment-13707213
 ] 

Brock Noland commented on HIVE-4838:


Fair enough, I'll have a patch for HIVE-4845 shortly.

> Refactor MapJoin HashMap code to improve testability and readability
> 
>
> Key: HIVE-4838
> URL: https://issues.apache.org/jira/browse/HIVE-4838
> Project: Hive
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Brock Noland
> Attachments: HIVE-4838.patch, HIVE-4838.patch
>
>


[jira] [Updated] (HIVE-4838) Refactor MapJoin HashMap code to improve testability and readability

2013-07-12 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-4838:
---

Status: Open  (was: Patch Available)

Canceling patch to fix correctness issue in a separate JIRA.

> Refactor MapJoin HashMap code to improve testability and readability
> 
>
> Key: HIVE-4838
> URL: https://issues.apache.org/jira/browse/HIVE-4838
> Project: Hive
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Brock Noland
> Attachments: HIVE-4838.patch, HIVE-4838.patch
>
>
> MapJoin is an essential component for high performance joins in Hive and the 
> current code has done great service for many years. However, the code is 
> showing its age and currently suffers from the following issues:
> * Uses static state via the MapJoinMetaData class to pass serialization 
> metadata to the Key, Row classes.
> * The API of a logical "Table Container" is not defined and therefore it's 
> unclear what APIs HashMapWrapper 
> needs to publicize. Additionally HashMapWrapper has many used public methods.
> * HashMapWrapper contains logic to serialize, test memory bounds, and 
> implement the table container. Ideally these logical units could be separated.
> * HashTableSinkObjectCtx has unused fields and unused methods.
> * CommonJoinOperator and children use ArrayList on left hand side when only 
> List is required.
> * There are unused classes MRU, DCLLItem and classes which duplicate 
> functionality MapJoinSingleKey and MapJoinDoubleKeys.



Re: Review Request 11925: Hive-3159 Update AvroSerde to determine schema of new tables

2013-07-12 Thread Mohammad Islam


> On June 30, 2013, 2:43 a.m., Ashutosh Chauhan wrote:
> > Can you also run all new tests with ant test -Dhadoop.mr.rev=23 to make 
> > sure we are getting the right results? Otherwise, you might need to add more 
> > columns to the order-by columns.

Tested


> On June 30, 2013, 2:43 a.m., Ashutosh Chauhan wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java, line 
> > 101
> > 
> >
> > Any particular reason you made this synchronized ?

Removed.


> On June 30, 2013, 2:43 a.m., Ashutosh Chauhan wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java, line 
> > 103
> > 
> >
> > Have you tested this for both default db as well as non-default db?

Test case added.


> On June 30, 2013, 2:43 a.m., Ashutosh Chauhan wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java, line 
> > 121
> > 
> >
> > Instead of \n, can you use File.separator?

Removed.


> On June 30, 2013, 2:43 a.m., Ashutosh Chauhan wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java, 
> > line 43
> > 
> >
> > Is this meant to be Array[tinyint] => bytes?

Done


> On June 30, 2013, 2:43 a.m., Ashutosh Chauhan wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java, 
> > line 110
> > 
> >
> > Let's take care of this TODO. Should be straightforward.

Done


- Mohammad


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11925/#review22571
---


On June 18, 2013, 3:26 a.m., Mohammad Islam wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/11925/
> ---
> 
> (Updated June 18, 2013, 3:26 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Jakob Homan.
> 
> 
> Bugs: HIVE-3159
> https://issues.apache.org/jira/browse/HIVE-3159
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Problem:
> Hive doesn't support creating an Avro-based table using the HQL create table 
> command. It currently requires specifying an Avro schema literal or schema file 
> name.
> In many cases, this is very inconvenient for the user.
> Some of the unsupported use cases:
> 1. Create table ...  as SELECT ... from 
> 2. Create table ...  as SELECT from 
> 3. Create  table  without specifying Avro schema.
> 
> 
> Diffs
> -
> 
>   ql/src/test/queries/clientpositive/avro_create_as_select.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/avro_create_as_select2.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/avro_no_schema_test.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/avro_without_schema.q PRE-CREATION 
>   ql/src/test/results/clientpositive/avro_create_as_select.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/avro_create_as_select2.q.out 
> PRE-CREATION 
>   ql/src/test/results/clientpositive/avro_no_schema_test.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/avro_without_schema.q.out PRE-CREATION 
>   serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java 
> 13848b6 
>   serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java 
> PRE-CREATION 
>   serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java 
> 010f614 
>   serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java 
> PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/11925/diff/
> 
> 
> Testing
> ---
> 
> Wrote a new Java test class for a new Java class. Added a new test case to an 
> existing Java test class. In addition, there are 4 .q files for testing 
> multiple use cases.
> 
> 
> Thanks,
> 
> Mohammad Islam
> 
>



Re: Review Request 11925: Hive-3159 Update AvroSerde to determine schema of new tables

2013-07-12 Thread Mohammad Islam

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11925/
---

(Updated July 12, 2013, 6:49 p.m.)


Review request for hive, Ashutosh Chauhan and Jakob Homan.


Changes
---

Updated with Ashutosh's comments.


Bugs: HIVE-3159
https://issues.apache.org/jira/browse/HIVE-3159


Repository: hive-git


Description
---

Problem:
Hive doesn't support creating an Avro-based table using the HQL create table 
command. It currently requires specifying an Avro schema literal or schema file 
name.
In many cases, this is very inconvenient for the user.
Some of the unsupported use cases:
1. Create table ...  as SELECT ... from 
2. Create table ...  as SELECT from 
3. Create  table  without specifying Avro schema.
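
As a rough illustration of use case 3, the sketch below derives an Avro record 
schema (as JSON text) from Hive column names and types. It is only a sketch 
under stated assumptions: the class AvroSchemaSketch, its methods, the 
particular type mapping, and the choice to make every field a nullable union 
are hypothetical and are not taken from the TypeInfoToSchema class in this 
patch.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper; not part of Hive or of the patch under review. */
final class AvroSchemaSketch {
    private AvroSchemaSketch() {}

    /** Assumed mapping from a few Hive primitive type names to Avro primitive type names. */
    static String avroType(String hiveType) {
        switch (hiveType.toLowerCase(Locale.ROOT)) {
            case "tinyint":
            case "smallint":
            case "int":
                return "int";
            case "bigint":
                return "long";
            case "float":
                return "float";
            case "double":
                return "double";
            case "boolean":
                return "boolean";
            case "string":
                return "string";
            case "binary":
                return "bytes";
            default:
                throw new IllegalArgumentException("unsupported type in this sketch: " + hiveType);
        }
    }

    /** Builds the JSON text of an Avro record schema; columns are emitted in map iteration order. */
    static String recordSchema(String recordName, Map<String, String> columns) {
        StringBuilder json = new StringBuilder();
        json.append("{\"type\":\"record\",\"name\":\"").append(recordName).append("\",\"fields\":[");
        boolean first = true;
        for (Map.Entry<String, String> col : columns.entrySet()) {
            if (!first) {
                json.append(',');
            }
            first = false;
            // Assumption: each column becomes a nullable field, i.e. a union of "null" and the mapped type.
            json.append("{\"name\":\"").append(col.getKey())
                .append("\",\"type\":[\"null\",\"").append(avroType(col.getValue())).append("\"]}");
        }
        return json.append("]}").toString();
    }
}
```

Passing a LinkedHashMap keeps the fields in the same order as the table's 
columns, which matters because Avro record fields are ordered.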


Diffs (updated)
-

  ql/src/test/queries/clientpositive/avro_create_as_select.q PRE-CREATION 
  ql/src/test/queries/clientpositive/avro_create_as_select2.q PRE-CREATION 
  ql/src/test/queries/clientpositive/avro_no_schema_test.q PRE-CREATION 
  ql/src/test/queries/clientpositive/avro_without_schema.q PRE-CREATION 
  ql/src/test/results/clientpositive/avro_create_as_select.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/avro_create_as_select2.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/avro_no_schema_test.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/avro_without_schema.q.out PRE-CREATION 
  serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java 13848b6 
  serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java 
PRE-CREATION 
  serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java 
010f614 
  serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java 
PRE-CREATION 

Diff: https://reviews.apache.org/r/11925/diff/


Testing
---

Wrote a new Java test class for a new Java class. Added a new test case to an 
existing Java test class. In addition, there are 4 .q files for testing multiple 
use cases.


Thanks,

Mohammad Islam



[jira] [Commented] (HIVE-4838) Refactor MapJoin HashMap code to improve testability and readability

2013-07-12 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13707239#comment-13707239
 ] 

Edward Capriolo commented on HIVE-4838:
---

So which version is correct, the map join or the map-reduce join? Or were both 
producing the wrong results?

> Refactor MapJoin HashMap code to improve testability and readability
> 
>
> Key: HIVE-4838
> URL: https://issues.apache.org/jira/browse/HIVE-4838
> Project: Hive
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Brock Noland
> Attachments: HIVE-4838.patch, HIVE-4838.patch
>
>
> MapJoin is an essential component for high performance joins in Hive and the 
> current code has done great service for many years. However, the code is 
> showing its age and currently suffers from the following issues:
> * Uses static state via the MapJoinMetaData class to pass serialization 
> metadata to the Key, Row classes.
> * The API of a logical "Table Container" is not defined and therefore it's 
> unclear what APIs HashMapWrapper 
> needs to publicize. Additionally HashMapWrapper has many used public methods.
> * HashMapWrapper contains logic to serialize, test memory bounds, and 
> implement the table container. Ideally these logical units could be separated.
> * HashTableSinkObjectCtx has unused fields and unused methods.
> * CommonJoinOperator and children use ArrayList on left hand side when only 
> List is required.
> * There are unused classes MRU, DCLLItem and classes which duplicate 
> functionality MapJoinSingleKey and MapJoinDoubleKeys.



[jira] [Updated] (HIVE-4845) Correctness issue with MapJoins using the null safe operator

2013-07-12 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-4845:
---

Attachment: HIVE-4845.patch

> Correctness issue with MapJoins using the null safe operator
> 
>
> Key: HIVE-4845
> URL: https://issues.apache.org/jira/browse/HIVE-4845
> Project: Hive
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Brock Noland
>Priority: Critical
> Attachments: HIVE-4845.patch
>
>
> I found a correctness issue while working on HIVE-4838. The following query 
> from join_nullsafe.q gives different results depending on if it's executed 
> map-side or reduce-side:
> {noformat}
> SELECT /*+ MAPJOIN(a) */ * FROM smb_input1 a JOIN smb_input1 b ON a.key <=> 
> b.key AND a.value <=> b.value ORDER BY a.key, a.value, b.key, b.value;
> {noformat}



[jira] [Updated] (HIVE-4845) Correctness issue with MapJoins using the null safe operator

2013-07-12 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-4845:
---

Description: 
I found a correctness issue while working on HIVE-4838. The following query 
from join_nullsafe.q gives different results depending on if it's executed 
map-side or reduce-side:

{noformat}
SELECT /*+ MAPJOIN(a) */ * FROM smb_input1 a JOIN smb_input1 b ON a.key <=> 
b.key AND a.value <=> b.value ORDER BY a.key, a.value, b.key, b.value;
{noformat}

For that query rows which should be joined are not. For example, the reduce 
side outputs this row:

148 NULL 148 NULL

which makes sense since a.key is equal to b.key and a.value is equal to b.value.



  was:
I found a correctness issue while working on HIVE-4838. The following query 
from join_nullsafe.q gives different results depending on if it's executed 
map-side or reduce-side:

{noformat}
SELECT /*+ MAPJOIN(a) */ * FROM smb_input1 a JOIN smb_input1 b ON a.key <=> 
b.key AND a.value <=> b.value ORDER BY a.key, a.value, b.key, b.value;
{noformat}

For that query rows which should be joined are not. 



> Correctness issue with MapJoins using the null safe operator
> 
>
> Key: HIVE-4845
> URL: https://issues.apache.org/jira/browse/HIVE-4845
> Project: Hive
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Brock Noland
>Priority: Critical
> Attachments: HIVE-4845.patch
>
>
> I found a correctness issue while working on HIVE-4838. The following query 
> from join_nullsafe.q gives different results depending on if it's executed 
> map-side or reduce-side:
> {noformat}
> SELECT /*+ MAPJOIN(a) */ * FROM smb_input1 a JOIN smb_input1 b ON a.key <=> 
> b.key AND a.value <=> b.value ORDER BY a.key, a.value, b.key, b.value;
> {noformat}
> For that query rows which should be joined are not. For example, the reduce 
> side outputs this row:
> 148 NULL 148 NULL
> which makes sense since a.key is equal to b.key and a.value is equal to 
> b.value.



[jira] [Updated] (HIVE-4845) Correctness issue with MapJoins using the null safe operator

2013-07-12 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-4845:
---

Description: 
I found a correctness issue while working on HIVE-4838. The following query 
from join_nullsafe.q gives different results depending on if it's executed 
map-side or reduce-side:

{noformat}
SELECT /*+ MAPJOIN(a) */ * FROM smb_input1 a JOIN smb_input1 b ON a.key <=> 
b.key AND a.value <=> b.value ORDER BY a.key, a.value, b.key, b.value;
{noformat}

For that query rows which should be joined are not. 


  was:
I found a correctness issue while working on HIVE-4838. The following query 
from join_nullsafe.q gives different results depending on if it's executed 
map-side or reduce-side:

{noformat}
SELECT /*+ MAPJOIN(a) */ * FROM smb_input1 a JOIN smb_input1 b ON a.key <=> 
b.key AND a.value <=> b.value ORDER BY a.key, a.value, b.key, b.value;
{noformat}



> Correctness issue with MapJoins using the null safe operator
> 
>
> Key: HIVE-4845
> URL: https://issues.apache.org/jira/browse/HIVE-4845
> Project: Hive
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Brock Noland
>Priority: Critical
> Attachments: HIVE-4845.patch
>
>
> I found a correctness issue while working on HIVE-4838. The following query 
> from join_nullsafe.q gives different results depending on if it's executed 
> map-side or reduce-side:
> {noformat}
> SELECT /*+ MAPJOIN(a) */ * FROM smb_input1 a JOIN smb_input1 b ON a.key <=> 
> b.key AND a.value <=> b.value ORDER BY a.key, a.value, b.key, b.value;
> {noformat}
> For that query rows which should be joined are not. 



[jira] [Updated] (HIVE-4845) Correctness issue with MapJoins using the null safe operator

2013-07-12 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-4845:
---

Description: 
I found a correctness issue while working on HIVE-4838. The following query 
from join_nullsafe.q gives different results depending on if it's executed 
map-side or reduce-side:

{noformat}
SELECT /*+ MAPJOIN(a) */ * FROM smb_input1 a JOIN smb_input1 b ON a.key <=> 
b.key AND a.value <=> b.value ORDER BY a.key, a.value, b.key, b.value;
{noformat}

For that query, on the map side, rows which should be joined are not. For 
example, the reduce side outputs this row:

148 NULL 148 NULL

which makes sense since a.key is equal to b.key and a.value is equal to b.value 
but the current map-side code omits this row. The reason is that 
MapJoinDoubleKey is used for the map-side join which doesn't properly compare 
null values.



  was:
I found a correctness issue while working on HIVE-4838. The following query 
from join_nullsafe.q gives different results depending on if it's executed 
map-side or reduce-side:

{noformat}
SELECT /*+ MAPJOIN(a) */ * FROM smb_input1 a JOIN smb_input1 b ON a.key <=> 
b.key AND a.value <=> b.value ORDER BY a.key, a.value, b.key, b.value;
{noformat}

For that query rows which should be joined are not. For example, the reduce 
side outputs this row:

148 NULL 148 NULL

which makes sense since a.key is equal to b.key and a.value is equal to b.value.




> Correctness issue with MapJoins using the null safe operator
> 
>
> Key: HIVE-4845
> URL: https://issues.apache.org/jira/browse/HIVE-4845
> Project: Hive
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Brock Noland
>Priority: Critical
> Attachments: HIVE-4845.patch
>
>
> I found a correctness issue while working on HIVE-4838. The following query 
> from join_nullsafe.q gives different results depending on if it's executed 
> map-side or reduce-side:
> {noformat}
> SELECT /*+ MAPJOIN(a) */ * FROM smb_input1 a JOIN smb_input1 b ON a.key <=> 
> b.key AND a.value <=> b.value ORDER BY a.key, a.value, b.key, b.value;
> {noformat}
> For that query, on the map side, rows which should be joined are not. For 
> example, the reduce side outputs this row:
> 148 NULL 148 NULL
> which makes sense since a.key is equal to b.key and a.value is equal to 
> b.value but the current map-side code omits this row. The reason is that 
> MapJoinDoubleKey is used for the map-side join which doesn't properly compare 
> null values.



[jira] [Commented] (HIVE-4838) Refactor MapJoin HashMap code to improve testability and readability

2013-07-12 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13707269#comment-13707269
 ] 

Brock Noland commented on HIVE-4838:


Map-side is wrong and reduce-side is correct. For that query, on the map side, 
rows which should be joined are not. For example, the reduce side outputs this 
row:

{noformat}
a.key   a.value   b.key   b.value
148 NULL  148 NULL
{noformat}

which makes sense since a.key is equal to b.key and a.value is equal to b.value 
but the current map-side code omits this row. The reason is that 
MapJoinDoubleKey is used for the map-side join which doesn't properly compare 
null values.
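
To illustrate the null-comparison problem, here is a minimal sketch of a 
two-column key whose equals and hashCode treat NULL in the same position as 
equal. The class NullSafeDoubleKey is hypothetical; it is not the actual 
MapJoinDoubleKey code or the HIVE-4845 patch, and only shows the behaviour a 
null-safe join key needs when it is used as a hash map key.

```java
import java.util.Objects;

/**
 * Hypothetical two-column join key; NOT Hive's own key class.
 * Two keys are equal when both columns match, where two nulls
 * in the same column count as a match.
 */
final class NullSafeDoubleKey {
    private final Object first;
    private final Object second;

    NullSafeDoubleKey(Object first, Object second) {
        this.first = first;
        this.second = second;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof NullSafeDoubleKey)) {
            return false;
        }
        NullSafeDoubleKey that = (NullSafeDoubleKey) other;
        // Objects.equals returns true when both arguments are null,
        // false when exactly one is null, and a.equals(b) otherwise.
        return Objects.equals(first, that.first)
            && Objects.equals(second, that.second);
    }

    @Override
    public int hashCode() {
        // Objects.hash tolerates null elements, so equal keys hash alike.
        return Objects.hash(first, second);
    }
}
```

With this sketch, a key built from (148, NULL) on one side finds the entry 
stored under (148, NULL) on the other side, which corresponds to the row the 
reduce side outputs.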

> Refactor MapJoin HashMap code to improve testability and readability
> 
>
> Key: HIVE-4838
> URL: https://issues.apache.org/jira/browse/HIVE-4838
> Project: Hive
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Brock Noland
> Attachments: HIVE-4838.patch, HIVE-4838.patch
>
>
> MapJoin is an essential component for high performance joins in Hive and the 
> current code has done great service for many years. However, the code is 
> showing its age and currently suffers from the following issues:
> * Uses static state via the MapJoinMetaData class to pass serialization 
> metadata to the Key, Row classes.
> * The API of a logical "Table Container" is not defined and therefore it's 
> unclear what APIs HashMapWrapper 
> needs to publicize. Additionally HashMapWrapper has many used public methods.
> * HashMapWrapper contains logic to serialize, test memory bounds, and 
> implement the table container. Ideally these logical units could be separated.
> * HashTableSinkObjectCtx has unused fields and unused methods.
> * CommonJoinOperator and children use ArrayList on left hand side when only 
> List is required.
> * There are unused classes MRU, DCLLItem and classes which duplicate 
> functionality MapJoinSingleKey and MapJoinDoubleKeys.



[jira] [Updated] (HIVE-4845) Correctness issue with MapJoins using the null safe operator

2013-07-12 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-4845:
---

Description: 
I found a correctness issue while working on HIVE-4838. The following query 
from join_nullsafe.q gives different results depending on if it's executed 
map-side or reduce-side:

{noformat}
SELECT /*+ MAPJOIN(a) */ * FROM smb_input1 a JOIN smb_input1 b ON a.key <=> 
b.key AND a.value <=> b.value ORDER BY a.key, a.value, b.key, b.value;
{noformat}

For that query, on the map side, rows which should be joined are not. For 
example, the reduce side outputs this row:

{noformat}
a.key   a.value   b.key   b.value
148 NULL  148 NULL
{noformat}

which makes sense since a.key is equal to b.key and a.value is equal to b.value 
but the current map-side code omits this row. The reason is that 
MapJoinDoubleKey is used for the map-side join which doesn't properly compare 
null values.



  was:
I found a correctness issue while working on HIVE-4838. The following query 
from join_nullsafe.q gives different results depending on if it's executed 
map-side or reduce-side:

{noformat}
SELECT /*+ MAPJOIN(a) */ * FROM smb_input1 a JOIN smb_input1 b ON a.key <=> 
b.key AND a.value <=> b.value ORDER BY a.key, a.value, b.key, b.value;
{noformat}

For that query, on the map side, rows which should be joined are not. For 
example, the reduce side outputs this row:

148 NULL 148 NULL

which makes sense since a.key is equal to b.key and a.value is equal to b.value 
but the current map-side code omits this row. The reason is that 
MapJoinDoubleKey is used for the map-side join which doesn't properly compare 
null values.




> Correctness issue with MapJoins using the null safe operator
> 
>
> Key: HIVE-4845
> URL: https://issues.apache.org/jira/browse/HIVE-4845
> Project: Hive
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Brock Noland
>Priority: Critical
> Attachments: HIVE-4845.patch
>
>
> I found a correctness issue while working on HIVE-4838. The following query 
> from join_nullsafe.q gives different results depending on if it's executed 
> map-side or reduce-side:
> {noformat}
> SELECT /*+ MAPJOIN(a) */ * FROM smb_input1 a JOIN smb_input1 b ON a.key <=> 
> b.key AND a.value <=> b.value ORDER BY a.key, a.value, b.key, b.value;
> {noformat}
> For that query, on the map side, rows which should be joined are not. For 
> example, the reduce side outputs this row:
> {noformat}
> a.key   a.value   b.key   b.value
> 148 NULL  148 NULL
> {noformat}
> which makes sense since a.key is equal to b.key and a.value is equal to 
> b.value but the current map-side code omits this row. The reason is that 
> MapJoinDoubleKey is used for the map-side join which doesn't properly compare 
> null values.



[jira] [Commented] (HIVE-2436) Update project naming and description in Hive website

2013-07-12 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13707310#comment-13707310
 ] 

Brock Noland commented on HIVE-2436:


OK, I've got Forrest working so I can generate a patch. I noticed we are 
generating a PDF of each page: http://hive.apache.org/index.pdf

Is there a reason we are doing that? I'd like to disable it as the resulting 
patch will contain binary files and I don't see links to the PDF version 
anywhere, so I am guessing it's simply not used.

> Update project naming and description in Hive website
> -
>
> Key: HIVE-2436
> URL: https://issues.apache.org/jira/browse/HIVE-2436
> Project: Hive
>  Issue Type: Sub-task
>Reporter: John Sichi
>Assignee: Brock Noland
>
> http://www.apache.org/foundation/marks/pmcs.html#naming



[jira] [Commented] (HIVE-4838) Refactor MapJoin HashMap code to improve testability and readability

2013-07-12 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13707318#comment-13707318
 ] 

Edward Capriolo commented on HIVE-4838:
---

This is pretty sad news. How long has map-side join been broken for?

> Refactor MapJoin HashMap code to improve testability and readability
> 
>
> Key: HIVE-4838
> URL: https://issues.apache.org/jira/browse/HIVE-4838
> Project: Hive
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Brock Noland
> Attachments: HIVE-4838.patch, HIVE-4838.patch
>
>
> MapJoin is an essential component for high performance joins in Hive and the 
> current code has done great service for many years. However, the code is 
> showing its age and currently suffers from the following issues:
> * Uses static state via the MapJoinMetaData class to pass serialization 
> metadata to the Key, Row classes.
> * The API of a logical "Table Container" is not defined and therefore it's 
> unclear what APIs HashMapWrapper 
> needs to publicize. Additionally HashMapWrapper has many used public methods.
> * HashMapWrapper contains logic to serialize, test memory bounds, and 
> implement the table container. Ideally these logical units could be separated.
> * HashTableSinkObjectCtx has unused fields and unused methods.
> * CommonJoinOperator and children use ArrayList on left hand side when only 
> List is required.
> * There are unused classes MRU, DCLLItem and classes which duplicate 
> functionality MapJoinSingleKey and MapJoinDoubleKeys.



[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2013-07-12 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-2206:
--

Attachment: HIVE-2206.D11097.17.patch

yhuai updated the revision "HIVE-2206 [jira] add a new optimizer for query 
correlation discovery and optimization".

- Since hive already uses a single scan for a table with multiple aliases 
in a MR job, we can remove unnecessary code on merging TableScanOperators

Reviewers: JIRA

REVISION DETAIL
  https://reviews.facebook.net/D11097

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D11097?vs=35487&id=35661#toc

AFFECTED FILES
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
  conf/hive-default.xml.template
  ql/if/queryplan.thrift
  ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/MuxOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecReducer.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRUnion1.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/AbstractCorrelationProcCtx.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/CorrelationOptimizer.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/CorrelationUtilities.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/IntraQueryCorrelation.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/QueryPlanTreeTransformation.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/CommonJoinTaskDispatcher.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/DemuxDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/MuxDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/UnionDesc.java
  ql/src/test/queries/clientpositive/correlationoptimizer1.q
  ql/src/test/queries/clientpositive/correlationoptimizer10.q
  ql/src/test/queries/clientpositive/correlationoptimizer11.q
  ql/src/test/queries/clientpositive/correlationoptimizer12.q
  ql/src/test/queries/clientpositive/correlationoptimizer13.q
  ql/src/test/queries/clientpositive/correlationoptimizer14.q
  ql/src/test/queries/clientpositive/correlationoptimizer2.q
  ql/src/test/queries/clientpositive/correlationoptimizer3.q
  ql/src/test/queries/clientpositive/correlationoptimizer4.q
  ql/src/test/queries/clientpositive/correlationoptimizer5.q
  ql/src/test/queries/clientpositive/correlationoptimizer6.q
  ql/src/test/queries/clientpositive/correlationoptimizer7.q
  ql/src/test/queries/clientpositive/correlationoptimizer8.q
  ql/src/test/queries/clientpositive/correlationoptimizer9.q
  ql/src/test/results/clientpositive/correlationoptimizer1.q.out
  ql/src/test/results/clientpositive/correlationoptimizer10.q.out
  ql/src/test/results/clientpositive/correlationoptimizer11.q.out
  ql/src/test/results/clientpositive/correlationoptimizer12.q.out
  ql/src/test/results/clientpositive/correlationoptimizer13.q.out
  ql/src/test/results/clientpositive/correlationoptimizer14.q.out
  ql/src/test/results/clientpositive/correlationoptimizer2.q.out
  ql/src/test/results/clientpositive/correlationoptimizer3.q.out
  ql/src/test/results/clientpositive/correlationoptimizer4.q.out
  ql/src/test/results/clientpositive/correlationoptimizer5.q.out
  ql/src/test/results/clientpositive/correlationoptimizer6.q.out
  ql/src/test/results/clientpositive/correlationoptimizer7.q.out
  ql/src/test/results/clientpositive/correlationoptimizer8.q.out
  ql/src/test/results/clientpositive/correlationoptimizer9.q.out
  ql/src/test/results/compiler/plan/groupby2.q.xml
  ql/src/test/results/compiler/plan/groupby3.q.xml

To: JIRA, yhuai
Cc: brock


> add a new optimizer for query correlation discovery and optimization
> 
>
> Key: HIVE-2206
> URL: https://issues.apache.org/jira/browse/HIVE-2206
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.12.0
>Reporter: He Yongqiang
>Assignee: Yin Huai
>

[jira] [Commented] (HIVE-4838) Refactor MapJoin HashMap code to improve testability and readability

2013-07-12 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13707332#comment-13707332
 ] 

Brock Noland commented on HIVE-4838:


I think the equals method has been broken since HIVE-1754 but as far as I can 
tell it only affects joins with nulls in the join keys.
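
A small sketch of the difference between the two operators, assuming the 
standard SQL meaning of = and <=>: with =, a NULL operand never yields a match, 
while <=> treats two NULLs as equal. That would explain why a key comparison 
that mishandles nulls only shows up in joins with nulls in the join keys. The 
class and method names below are hypothetical and do not come from Hive.

```java
/** Hypothetical illustration of SQL equality semantics; not Hive code. */
final class NullSafeCompare {
    private NullSafeCompare() {}

    /** SQL "=": the result is unknown (modelled here as null) when either operand is NULL. */
    static Boolean sqlEquals(Object a, Object b) {
        if (a == null || b == null) {
            return null;
        }
        return a.equals(b);
    }

    /** SQL "<=>": never unknown; two NULLs compare as equal, one NULL compares as not equal. */
    static boolean nullSafeEquals(Object a, Object b) {
        if (a == null || b == null) {
            // True only when both references are null.
            return a == b;
        }
        return a.equals(b);
    }
}
```

A join condition only keeps rows for which the comparison is true, so an 
unknown result from = drops the row pair, whereas <=> keeps the pair when both 
sides are NULL.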

> Refactor MapJoin HashMap code to improve testability and readability
> 
>
> Key: HIVE-4838
> URL: https://issues.apache.org/jira/browse/HIVE-4838
> Project: Hive
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Brock Noland
> Attachments: HIVE-4838.patch, HIVE-4838.patch
>
>
> MapJoin is an essential component for high performance joins in Hive and the 
> current code has done great service for many years. However, the code is 
> showing its age and currently suffers from the following issues:
> * Uses static state via the MapJoinMetaData class to pass serialization 
> metadata to the Key, Row classes.
> * The API of a logical "Table Container" is not defined and therefore it's 
> unclear what APIs HashMapWrapper needs to publicize. Additionally, 
> HashMapWrapper has many unused public methods.
> * HashMapWrapper contains logic to serialize, test memory bounds, and 
> implement the table container. Ideally these logical units could be separated.
> * HashTableSinkObjectCtx has unused fields and unused methods.
> * CommonJoinOperator and children use ArrayList on left hand side when only 
> List is required.
> * There are unused classes MRU, DCLLItemm and classes which duplicate 
> functionality MapJoinSingleKey and MapJoinDoubleKeys.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2436) Update project naming and description in Hive website

2013-07-12 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13707336#comment-13707336
 ] 

Brock Noland commented on HIVE-2436:


I see the PDF links on the right hand side.

> Update project naming and description in Hive website
> -
>
> Key: HIVE-2436
> URL: https://issues.apache.org/jira/browse/HIVE-2436
> Project: Hive
>  Issue Type: Sub-task
>Reporter: John Sichi
>Assignee: Brock Noland
>
> http://www.apache.org/foundation/marks/pmcs.html#naming



[jira] [Commented] (HIVE-4845) Correctness issue with MapJoins using the null safe operator

2013-07-12 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13707337#comment-13707337
 ] 

Brock Noland commented on HIVE-4845:


I am running tests here 
https://builds.apache.org/user/brock/my-views/view/hive/job/Hive-Test-Patch-trunk-hadoop1-ptest/9/.

> Correctness issue with MapJoins using the null safe operator
> 
>
> Key: HIVE-4845
> URL: https://issues.apache.org/jira/browse/HIVE-4845
> Project: Hive
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Brock Noland
>Priority: Critical
> Attachments: HIVE-4845.patch
>
>
> I found a correctness issue while working on HIVE-4838. The following query 
> from join_nullsafe.q gives different results depending on whether it's executed 
> map-side or reduce-side:
> {noformat}
> SELECT /*+ MAPJOIN(a) */ * FROM smb_input1 a JOIN smb_input1 b ON a.key <=> 
> b.key AND a.value <=> b.value ORDER BY a.key, a.value, b.key, b.value;
> {noformat}
> For that query, on the map side, rows which should be joined are not. For 
> example, the reduce side outputs this row:
> {noformat}
> a.key   a.value   b.key   b.value
> 148 NULL  148 NULL
> {noformat}
> which makes sense since a.key is equal to b.key and a.value is equal to 
> b.value but the current map-side code omits this row. The reason is that 
> MapJoinDoubleKey is used for the map-side join which doesn't properly compare 
> null values.
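
To illustrate the kind of comparison the <=> operator needs, here is a minimal sketch of a two-field join key whose equals() treats two nulls as equal. The class name NullSafeKey is hypothetical; this is not Hive's MapJoinDoubleKeys code and not the attached patch.

```java
import java.util.Objects;

// Hypothetical two-field join key; not Hive's MapJoinDoubleKeys.
final class NullSafeKey {
    private final Object first;
    private final Object second;

    NullSafeKey(Object first, Object second) {
        this.first = first;
        this.second = second;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof NullSafeKey)) {
            return false;
        }
        NullSafeKey that = (NullSafeKey) other;
        // Objects.equals(null, null) is true, which matches <=> semantics.
        return Objects.equals(first, that.first)
            && Objects.equals(second, that.second);
    }

    @Override
    public int hashCode() {
        // Objects.hash tolerates null fields, so equal keys hash alike.
        return Objects.hash(first, second);
    }

    public static void main(String[] args) {
        NullSafeKey a = new NullSafeKey(148, null);
        NullSafeKey b = new NullSafeKey(148, null);
        System.out.println(a.equals(b)); // prints true
    }
}
```

With a key like this, the row (148, NULL) from the example above would find its match in a hash lookup, because both equals() and hashCode() agree for null fields.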



[jira] [Commented] (HIVE-4840) Fix eclipse template classpath to include the BoneCP lib

2013-07-12 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13707342#comment-13707342
 ] 

Ashutosh Chauhan commented on HIVE-4840:


+1

> Fix eclipse template classpath to include the BoneCP lib
> 
>
> Key: HIVE-4840
> URL: https://issues.apache.org/jira/browse/HIVE-4840
> Project: Hive
>  Issue Type: Bug
>Reporter: Yin Huai
>Assignee: Yin Huai
>Priority: Trivial
> Attachments: HIVE-4840.patch.txt
>
>
> HIVE-4807 did not change the classpath in eclipse template accordingly.



[jira] [Updated] (HIVE-4840) Fix eclipse template classpath to include the BoneCP lib

2013-07-12 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-4840:
---

   Resolution: Fixed
Fix Version/s: 0.12.0
   Status: Resolved  (was: Patch Available)

committed to trunk. Thanks, Yin!

> Fix eclipse template classpath to include the BoneCP lib
> 
>
> Key: HIVE-4840
> URL: https://issues.apache.org/jira/browse/HIVE-4840
> Project: Hive
>  Issue Type: Bug
>Reporter: Yin Huai
>Assignee: Yin Huai
>Priority: Trivial
> Fix For: 0.12.0
>
> Attachments: HIVE-4840.patch.txt
>
>
> HIVE-4807 did not change the classpath in eclipse template accordingly.



[jira] [Commented] (HIVE-2436) Update project naming and description in Hive website

2013-07-12 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13707385#comment-13707385
 ] 

Lefty Leverenz commented on HIVE-2436:
--

Is there a way to discover how often people click on the PDF links?

Now that you've pointed them out, I might download the PDFs.  But I wonder if 
anyone else finds them useful.

> Update project naming and description in Hive website
> -
>
> Key: HIVE-2436
> URL: https://issues.apache.org/jira/browse/HIVE-2436
> Project: Hive
>  Issue Type: Sub-task
>Reporter: John Sichi
>Assignee: Brock Noland
>
> http://www.apache.org/foundation/marks/pmcs.html#naming



[jira] [Updated] (HIVE-3264) Add support for binary dataype to AvroSerde

2013-07-12 Thread Mark Wagner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Wagner updated HIVE-3264:
--

Attachment: HIVE-3264.6.patch

I rebased Eli's patch on the current trunk. I've also made changes so that 
'fixed' fields come out as binary as well. This isn't an exact match, but I 
think it's better than an array of tinyints.

As for test cases, avro_nullable_fields.q includes fields of both fixed and 
bytes type, so it tests serializing and deserializing. There are also unit 
tests for each of those in TestAvroDeserializer and TestAvroSerializer.

> Add support for binary dataype to AvroSerde
> ---
>
> Key: HIVE-3264
> URL: https://issues.apache.org/jira/browse/HIVE-3264
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 0.9.0
>Reporter: Jakob Homan
>  Labels: patch
> Attachments: HIVE-3264-1.patch, HIVE-3264-2.patch, HIVE-3264-3.patch, 
> HIVE-3264-4.patch, HIVE-3264-5.patch, HIVE-3264.6.patch
>
>
> When the AvroSerde was written, Hive didn't have a binary type, so Avro's 
> byte array type is converted to an array of small ints.  Now that HIVE-2380 is 
> in, this step isn't necessary and we can convert both Avro's bytes type and 
> probably fixed type to Hive's binary type.
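
As a rough illustration of the two mappings discussed here, the sketch below contrasts the old element-per-byte conversion with a single byte[] copy. It assumes the Avro bytes value arrives as a java.nio.ByteBuffer; the class and method names are hypothetical and do not come from the AvroSerde source or the attached patches.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not the AvroSerde implementation.
final class AvroBytesSketch {
    // Old-style mapping: every byte becomes one element of an array of small ints.
    static List<Byte> toTinyintList(ByteBuffer buf) {
        ByteBuffer view = buf.duplicate(); // leave the caller's position untouched
        List<Byte> out = new ArrayList<>(view.remaining());
        while (view.hasRemaining()) {
            out.add(view.get());
        }
        return out;
    }

    // Binary-style mapping: copy the remaining bytes into one byte[].
    static byte[] toBinary(ByteBuffer buf) {
        ByteBuffer view = buf.duplicate();
        byte[] out = new byte[view.remaining()];
        view.get(out);
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.wrap(new byte[] {1, 2, 3});
        System.out.println(toTinyintList(buf).size());
        System.out.println(toBinary(buf).length);
    }
}
```

Both methods read through a duplicate of the buffer, so the same value can be converted more than once without rewinding.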



[jira] [Updated] (HIVE-4821) Implement vectorized type casting for all types

2013-07-12 Thread Sarvesh Sakalanaga (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sarvesh Sakalanaga updated HIVE-4821:
-

Assignee: Sarvesh Sakalanaga

> Implement vectorized type casting for all types
> ---
>
> Key: HIVE-4821
> URL: https://issues.apache.org/jira/browse/HIVE-4821
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: vectorization-branch
>Reporter: Eric Hanson
>Assignee: Sarvesh Sakalanaga
>
> Implement vectorized support for casting from any type to any type.
> From the documentation:
> cast(expr as <type>): Converts the results of the expression expr to <type>, 
> e.g. cast('1' as BIGINT) will convert the string '1' to its integral 
> representation. A null is returned if the conversion does not succeed.
> The current supported internal types are:
> LONG
> DOUBLE
> STRING
> TIMESTAMP
> Before implementation, determine the semantics of explicit casting 
> to types less general than the internal types. E.g. what if you cast DOUBLE 
> to TINYINT? Can we just cast internally to LONG and let the output process 
> cast to TINYINT? 
> This JIRA includes all work to make casting operate end-to-end in a SQL query 
> in vectorized mode, including updates to VectorizationContext.
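
The two points above can be sketched in plain Java. The first method mirrors the documented rule that a failed conversion yields null. The second shows what a DOUBLE to LONG to TINYINT chain does under Java's own narrowing rules, which is only one possible answer to the open question and not a statement of what Hive does. All names here are hypothetical and unrelated to VectorizationContext.

```java
// Hypothetical helper names; this is not Hive's vectorization code.
final class CastSketch {
    // cast(expr as BIGINT) on a string: null when the conversion fails.
    static Long castStringToLong(String s) {
        if (s == null) {
            return null;
        }
        try {
            return Long.parseLong(s);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // Two-step cast: DOUBLE -> internal LONG -> TINYINT.
    // Java's narrowing conversion silently wraps out-of-range values,
    // so the caller sees no error when the value does not fit.
    static byte doubleToTinyintViaLong(double d) {
        long asLong = (long) d; // truncates toward zero
        return (byte) asLong;   // keeps only the low 8 bits
    }

    public static void main(String[] args) {
        System.out.println(castStringToLong("1"));         // prints 1
        System.out.println(castStringToLong("abc"));       // prints null
        System.out.println(doubleToTinyintViaLong(300.7)); // prints 44
    }
}
```

The last line shows why the semantics need to be decided up front: 300.7 becomes 300 as a LONG and then wraps to 44 as a one-byte value, whereas a cast that follows the "null on failure" rule might be expected to return null instead.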



[jira] [Created] (HIVE-4846) Implement Vectorized Limit Operator

2013-07-12 Thread Sarvesh Sakalanaga (JIRA)
Sarvesh Sakalanaga created HIVE-4846:


 Summary: Implement Vectorized Limit Operator
 Key: HIVE-4846
 URL: https://issues.apache.org/jira/browse/HIVE-4846
 Project: Hive
  Issue Type: Sub-task
Reporter: Sarvesh Sakalanaga
Assignee: Sarvesh Sakalanaga






[jira] [Updated] (HIVE-4846) Implement Vectorized Limit Operator

2013-07-12 Thread Sarvesh Sakalanaga (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sarvesh Sakalanaga updated HIVE-4846:
-

Status: Patch Available  (was: Open)

> Implement Vectorized Limit Operator
> ---
>
> Key: HIVE-4846
> URL: https://issues.apache.org/jira/browse/HIVE-4846
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sarvesh Sakalanaga
>Assignee: Sarvesh Sakalanaga
> Attachments: Hive-4846.0.patch
>
>




[jira] [Updated] (HIVE-4846) Implement Vectorized Limit Operator

2013-07-12 Thread Sarvesh Sakalanaga (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sarvesh Sakalanaga updated HIVE-4846:
-

Attachment: Hive-4846.0.patch

With this patch, map-side limit operators are vectorized.

> Implement Vectorized Limit Operator
> ---
>
> Key: HIVE-4846
> URL: https://issues.apache.org/jira/browse/HIVE-4846
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sarvesh Sakalanaga
>Assignee: Sarvesh Sakalanaga
> Attachments: Hive-4846.0.patch
>
>




[jira] [Updated] (HIVE-4845) Correctness issue with MapJoins using the null safe operator

2013-07-12 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-4845:
---

Status: Patch Available  (was: Open)

Tests pass, results are here: 
https://builds.apache.org/user/brock/my-views/view/hive/job/Hive-Test-Patch-trunk-hadoop1-ptest/9/testReport/

> Correctness issue with MapJoins using the null safe operator
> 
>
> Key: HIVE-4845
> URL: https://issues.apache.org/jira/browse/HIVE-4845
> Project: Hive
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Brock Noland
>Priority: Critical
> Attachments: HIVE-4845.patch
>
>
> I found a correctness issue while working on HIVE-4838. The following query 
> from join_nullsafe.q gives different results depending on whether it's executed 
> map-side or reduce-side:
> {noformat}
> SELECT /*+ MAPJOIN(a) */ * FROM smb_input1 a JOIN smb_input1 b ON a.key <=> 
> b.key AND a.value <=> b.value ORDER BY a.key, a.value, b.key, b.value;
> {noformat}
> For that query, on the map side, rows which should be joined are not. For 
> example, the reduce side outputs this row:
> {noformat}
> a.key   a.value   b.key   b.value
> 148 NULL  148 NULL
> {noformat}
> which makes sense since a.key is equal to b.key and a.value is equal to 
> b.value but the current map-side code omits this row. The reason is that 
> MapJoinDoubleKey is used for the map-side join which doesn't properly compare 
> null values.



[jira] [Commented] (HIVE-4845) Correctness issue with MapJoins using the null safe operator

2013-07-12 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13707471#comment-13707471
 ] 

Brock Noland commented on HIVE-4845:


https://reviews.facebook.net/D11685

> Correctness issue with MapJoins using the null safe operator
> 
>
> Key: HIVE-4845
> URL: https://issues.apache.org/jira/browse/HIVE-4845
> Project: Hive
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Brock Noland
>Priority: Critical
> Attachments: HIVE-4845.patch
>
>
> I found a correctness issue while working on HIVE-4838. The following query 
> from join_nullsafe.q gives different results depending on whether it's executed 
> map-side or reduce-side:
> {noformat}
> SELECT /*+ MAPJOIN(a) */ * FROM smb_input1 a JOIN smb_input1 b ON a.key <=> 
> b.key AND a.value <=> b.value ORDER BY a.key, a.value, b.key, b.value;
> {noformat}
> For that query, on the map side, rows which should be joined are not. For 
> example, the reduce side outputs this row:
> {noformat}
> a.key   a.value   b.key   b.value
> 148 NULL  148 NULL
> {noformat}
> which makes sense since a.key is equal to b.key and a.value is equal to 
> b.value but the current map-side code omits this row. The reason is that 
> MapJoinDoubleKey is used for the map-side join which doesn't properly compare 
> null values.



Re: Review Request 11925: Hive-3159 Update AvroSerde to determine schema of new tables

2013-07-12 Thread Jakob Homan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11925/#review23102
---


Overall, looks good.  I'm concerned that there's no end-to-end test of having 
some non-Avro data in Hive, using Hive to write (and join it) to an Avro file, 
and verifying the content of the actual Avro file.  Would something like that 
be feasible?


ql/src/test/queries/clientpositive/avro_create_as_select.q


This test doesn't actually work on a non-avro table.  It just works on the 
definition of a non-avro table.  We should have another one that does a select 
from a populated, non-avro-backed table and verifies the values are converted 
correctly.



ql/src/test/queries/clientpositive/avro_without_schema.q


I don't see a test that exercises the generated schema through a whole 
mapreduce query, just ones that exercise the metadata store...



serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java


Should be a debug



serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java


spacing



serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java


We lose any column comments that may have been on the original schema.  
Does that normally happen with CTAS in Hive?



serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java


Rather than building the avro schema string by hand and then generating the 
schema object, why not generate the avro schema and then run toString on it?  
This guarantees the schema will be correctly formed and type checked.



serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java


Use j here instead, so as not to shadow the i.



serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java


Some spacing on the table would help.



serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java


formatting



serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java


fot -> for.  type info -> TypeInfo



serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java


Is there any way to turn off the union wrapping?  If not, does it require 
its own separate parameter?



serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java


formatting



serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java


Move this above the first function so it's clear to readers.



serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java


Yeah, do that.



serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java


Doesn't this duplicate generateSchema above?



serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java


Hive TypeInfo allows for non-string keys, while Avro does not.  There 
should be a check here for that.
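
A minimal sketch of such a check follows. Avro map keys are always strings, so a Hive map with any other key type has no direct Avro equivalent. The class name, method name, and the use of a plain type-name string are all assumptions made for illustration; the real code would presumably inspect the map's key TypeInfo instead.

```java
// Hypothetical guard; not part of TypeInfoToSchema.
final class MapKeyCheckSketch {
    // Avro maps only allow string keys, so reject anything else up front.
    static void requireStringKey(String hiveKeyTypeName) {
        if (!"string".equalsIgnoreCase(hiveKeyTypeName)) {
            throw new UnsupportedOperationException(
                "Avro maps require string keys, but the Hive map key type is "
                    + hiveKeyTypeName);
        }
    }

    public static void main(String[] args) {
        requireStringKey("string"); // passes silently
        try {
            requireStringKey("int");
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing early with a descriptive message keeps the problem at table-creation time rather than surfacing later as a malformed schema.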



serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java


can this be private?



serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java


nit: can this be made more readable?



serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java


Rather than hand-coding these, can we have Hive generate them?  This will 
catch any changes if Hive changes how it generates them later on.



serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java


Please separate out all the test cases into individual tests.



serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java


assert on the content of the exception, or subtype it and use the expects 
annotation.



serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java


Either add msg string or break out into separate tests.



serde/src/test/org/apache/hadoop/hive/

Re: Review Request 12480: HIVE-4732 Reduce or eliminate the expensive Schema equals() check for AvroSerde

2013-07-12 Thread Jakob Homan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/12480/#review23113
---


Do you have after-optimization performance numbers?  Can you add a test to 
verify that the reencoder cache is working correctly?  Feed in a record with 
one uuid, then another with a different one, and verify that the cache has two 
elements.  Adding a third record with the original UUID shouldn't increase the 
size of the cache.  Also verify that adding n records all with the same schema 
creates only one reencoder...
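
The scenario requested above can be sketched against a stand-in cache. Everything below is hypothetical: the class, its method names, and the use of schema strings and plain Object instances in place of Avro schemas and real re-encoders. It only illustrates the HashSet plus HashMap bookkeeping that the review request describes, not the patch itself.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Hypothetical stand-in for the caching described in the review request.
final class ReencoderCacheSketch {
    // Reader IDs whose schema already matches; no re-encoding needed.
    private final Set<UUID> noReencodingNeeded = new HashSet<>();
    // One re-encoder per reader ID whose schema differs.
    private final Map<UUID, Object> reencoders = new HashMap<>();
    private int schemaComparisons = 0;

    // Returns a re-encoder, or null when the record can be used as-is.
    Object lookup(UUID readerId, String writerSchema, String readerSchema) {
        if (noReencodingNeeded.contains(readerId)) {
            return null;
        }
        Object cached = reencoders.get(readerId);
        if (cached != null) {
            return cached;
        }
        schemaComparisons++; // the expensive check runs once per reader ID
        if (writerSchema.equals(readerSchema)) {
            noReencodingNeeded.add(readerId);
            return null;
        }
        Object reencoder = new Object();
        reencoders.put(readerId, reencoder);
        return reencoder;
    }

    int cacheSize() {
        return reencoders.size();
    }

    int comparisons() {
        return schemaComparisons;
    }

    public static void main(String[] args) {
        ReencoderCacheSketch cache = new ReencoderCacheSketch();
        UUID first = UUID.randomUUID();
        UUID second = UUID.randomUUID();
        cache.lookup(first, "schemaA", "schemaB");
        cache.lookup(second, "schemaA", "schemaB");
        cache.lookup(first, "schemaA", "schemaB"); // reuses the cached entry
        System.out.println(cache.cacheSize()); // prints 2
    }
}
```

A unit test along these lines would assert on cacheSize() after each record and on comparisons() after feeding n records from a single reader.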


serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java


verifiedRecordReaders -> noReencodingNeeded ?



serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java


readability: pull out getRecordReaderID into its own var



serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroGenericRecordWritable.java


Need to write out the uuid too



serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroGenericRecordWritable.java


Need to read in the uuid too


- Jakob Homan


On July 11, 2013, 3:31 p.m., Mohammad Islam wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/12480/
> ---
> 
> (Updated July 11, 2013, 3:31 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Jakob Homan.
> 
> 
> Bugs: HIVE-4732
> https://issues.apache.org/jira/browse/HIVE-4732
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> From our performance analysis, we found AvroSerde's schema.equals() call 
> consumed a substantial amount (nearly 40%) of time. This patch intends to 
> minimize the number of schema.equals() calls by performing the check as late 
> and as rarely as possible.
> 
> At first, we added a unique id for each record reader which is then included 
> in every AvroGenericRecordWritable. Then, we introduce two new data 
> structures (one hashset and one hashmap) to store intermediate data to avoid 
> duplicate checks. The HashSet contains all the record readers' IDs that don't 
> need any re-encoding. The HashMap, on the other hand, contains the already used 
> re-encoders. It works as a cache and allows re-encoder reuse. With this 
> change, our test shows nearly 40% reduction in Avro record reading time.
>  
>
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/io/avro/AvroGenericRecordReader.java 
> dbc999f 
>   serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java 
> c85ef15 
>   
> serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroGenericRecordWritable.java
>  66f0348 
>   serde/src/test/org/apache/hadoop/hive/serde2/avro/TestSchemaReEncoder.java 
> 9af751b 
>   serde/src/test/org/apache/hadoop/hive/serde2/avro/Utils.java 2b948eb 
> 
> Diff: https://reviews.apache.org/r/12480/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Mohammad Islam
> 
>



[jira] [Commented] (HIVE-3264) Add support for binary dataype to AvroSerde

2013-07-12 Thread Mark Wagner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13707503#comment-13707503
 ] 

Mark Wagner commented on HIVE-3264:
---

Also posted to RB: https://reviews.apache.org/r/12531/

> Add support for binary dataype to AvroSerde
> ---
>
> Key: HIVE-3264
> URL: https://issues.apache.org/jira/browse/HIVE-3264
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 0.9.0
>Reporter: Jakob Homan
>  Labels: patch
> Attachments: HIVE-3264-1.patch, HIVE-3264-2.patch, HIVE-3264-3.patch, 
> HIVE-3264-4.patch, HIVE-3264-5.patch, HIVE-3264.6.patch
>
>
> When the AvroSerde was written, Hive didn't have a binary type, so Avro's 
> byte array type is converted to an array of small ints.  Now that HIVE-2380 is 
> in, this step isn't necessary and we can convert both Avro's bytes type and 
> probably fixed type to Hive's binary type.



[jira] [Commented] (HIVE-4840) Fix eclipse template classpath to include the BoneCP lib

2013-07-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13707537#comment-13707537
 ] 

Hudson commented on HIVE-4840:
--

FAILURE: Integrated in Hive-trunk-hadoop1-ptest #70 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/70/])
HIVE-4840 : Fix eclipse template classpath to include the BoneCP lib (Yin Huai 
via Ashutosh Chauhan) (hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1502678)
* /hive/trunk/eclipse-templates/.classpath
* /hive/trunk/hcatalog/src/test/e2e/hcatalog/drivers/Util.pm


> Fix eclipse template classpath to include the BoneCP lib
> 
>
> Key: HIVE-4840
> URL: https://issues.apache.org/jira/browse/HIVE-4840
> Project: Hive
>  Issue Type: Bug
>Reporter: Yin Huai
>Assignee: Yin Huai
>Priority: Trivial
> Fix For: 0.12.0
>
> Attachments: HIVE-4840.patch.txt
>
>
> HIVE-4807 did not change the classpath in eclipse template accordingly.



[jira] [Updated] (HIVE-4843) Refactoring MapRedTask and ExecDriver for better re-usability (for tez) and readability

2013-07-12 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-4843:
-

Attachment: HIVE-4843.2.patch

New iteration of the changes. The first patch was a work in progress.

> Refactoring MapRedTask and ExecDriver for better re-usability (for tez) and 
> readability
> ---
>
> Key: HIVE-4843
> URL: https://issues.apache.org/jira/browse/HIVE-4843
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.12.0, tez-branch
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Attachments: HIVE-4843.1.patch, HIVE-4843.2.patch
>
>
> Currently, there are static APIs in multiple locations in ExecDriver and 
> MapRedTask that can be leveraged if put in the already existing utility class 
> in the exec package. This would help make the code more maintainable, 
> readable and also re-usable by other run-time infra such as tez.



[jira] [Created] (HIVE-4847) add rename database functionality

2013-07-12 Thread Greg Rahn (JIRA)
Greg Rahn created HIVE-4847:
---

 Summary: add rename database functionality
 Key: HIVE-4847
 URL: https://issues.apache.org/jira/browse/HIVE-4847
 Project: Hive
  Issue Type: New Feature
Affects Versions: 0.11.0
Reporter: Greg Rahn
Priority: Minor


There seems to be no way to rename a database in Hive; functionality to do so 
would be nice.

Proposed syntax:
ALTER DATABASE dbname RENAME TO newdbname;



[jira] [Resolved] (HIVE-3039) Move Hive admin documentation from wiki to version control

2013-07-12 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo resolved HIVE-3039.
---

Resolution: Won't Fix

> Move Hive admin documentation from wiki to version control
> --
>
> Key: HIVE-3039
> URL: https://issues.apache.org/jira/browse/HIVE-3039
> Project: Hive
>  Issue Type: Task
>  Components: Documentation
>Reporter: Lefty Leverenz
>Assignee: Lefty Leverenz
>  Labels: documentation
> Attachments: HIVE-3039.1.patch
>
>
> Move the Hive administrator documentation from the wiki to version control.  
> The wiki doc set for administrators currently has nine html files that need 
> to be converted to xml:
> Installing Hive
> Configuring Hive
> Setting up Metastore
> Setting up Hive Web Interface
> Setting up Thrift Hive Server
> Setting up Hive JDBC Server
> Setting up Hive ODBC Server
> Hive on Amazon Web Services
> Hive on Amazon ElasticMapReduce



[jira] [Commented] (HIVE-4838) Refactor MapJoin HashMap code to improve testability and readability

2013-07-12 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13707592#comment-13707592
 ] 

Yin Huai commented on HIVE-4838:


Hi Brock, I have a question. Does this correctness issue only affect joins with 
the <=> operator, or does it also affect the = operator?

> Refactor MapJoin HashMap code to improve testability and readability
> 
>
> Key: HIVE-4838
> URL: https://issues.apache.org/jira/browse/HIVE-4838
> Project: Hive
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Brock Noland
> Attachments: HIVE-4838.patch, HIVE-4838.patch
>
>
> MapJoin is an essential component for high performance joins in Hive and the 
> current code has done great service for many years. However, the code is 
> showing its age and currently suffers from the following issues:
> * Uses static state via the MapJoinMetaData class to pass serialization 
> metadata to the Key, Row classes.
> * The API of a logical "Table Container" is not defined and therefore it's 
> unclear what APIs HashMapWrapper needs to publicize. Additionally, 
> HashMapWrapper has many unused public methods.
> * HashMapWrapper contains logic to serialize, test memory bounds, and 
> implement the table container. Ideally these logical units could be separated.
> * HashTableSinkObjectCtx has unused fields and unused methods.
> * CommonJoinOperator and children use ArrayList on left hand side when only 
> List is required.
> * There are unused classes MRU, DCLLItemm and classes which duplicate 
> functionality MapJoinSingleKey and MapJoinDoubleKeys.



[jira] [Commented] (HIVE-4518) Counter Strike: Operation Operator

2013-07-12 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13707596#comment-13707596
 ] 

Gunther Hagleitner commented on HIVE-4518:
--

Yes, thanks for the nudge. I'll rebase/run tests again on the weekend.

> Counter Strike: Operation Operator
> --
>
> Key: HIVE-4518
> URL: https://issues.apache.org/jira/browse/HIVE-4518
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
> Attachments: HIVE-4518.1.patch, HIVE-4518.2.patch, HIVE-4518.3.patch, 
> HIVE-4518.4.patch
>
>
> Queries of the form:
> from foo
> insert overwrite table bar partition (p) select ...
> insert overwrite table bar partition (p) select ...
> insert overwrite table bar partition (p) select ...
> Generate a huge number of counters. The reason is that task.progress is 
> turned on for dynamic partitioning queries.
> The counters not only make queries slower than necessary (up to 50%), you will 
> also eventually run out. That's because we're wrapping them in enum values to 
> comply with hadoop 0.17.
> The real reason we turn task.progress on is that we need CREATED_FILES and 
> FATAL counters to ensure dynamic partitioning queries don't go haywire.
> The counters have counter-intuitive names like C1 through C1000 and don't 
> seem really useful by themselves.
> With hadoop 20+ you don't need to wrap the counters anymore, each operator 
> can simply create and increment counters. That should simplify the code a lot.



[jira] [Commented] (HIVE-4838) Refactor MapJoin HashMap code to improve testability and readability

2013-07-12 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13707658#comment-13707658
 ] 

Yin Huai commented on HIVE-4838:


From the code, it seems this issue only affects the <=> operator.

> Refactor MapJoin HashMap code to improve testability and readability
> 
>
> Key: HIVE-4838
> URL: https://issues.apache.org/jira/browse/HIVE-4838
> Project: Hive
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Brock Noland
> Attachments: HIVE-4838.patch, HIVE-4838.patch
>
>
> MapJoin is an essential component for high performance joins in Hive and the 
> current code has done great service for many years. However, the code is 
> showing its age and currently suffers from the following issues:
> * Uses static state via the MapJoinMetaData class to pass serialization 
> metadata to the Key, Row classes.
> * The API of a logical "Table Container" is not defined and therefore it's 
> unclear what APIs HashMapWrapper needs to publicize. Additionally, 
> HashMapWrapper has many unused public methods.
> * HashMapWrapper contains logic to serialize, test memory bounds, and 
> implement the table container. Ideally these logical units could be separated.
> * HashTableSinkObjectCtx has unused fields and unused methods.
> * CommonJoinOperator and children use ArrayList on left hand side when only 
> List is required.
> * There are unused classes MRU, DCLLItemm and classes which duplicate 
> functionality MapJoinSingleKey and MapJoinDoubleKeys.
