date:20090710

Fix bug in TypeConverter


 Key: HIVE-624
 URL: https://issues.apache.org/jira/browse/HIVE-624
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Zheng Shao
Assignee: Zheng Shao


There is a bug in the Converter - we are converting all objects to Primitive 
Java objects instead of Writable.
This has caused some queries to fail:

{code}
SELECT IF(false, 1, cast(2 as smallint)) + 3 FROM any_table;
{code}


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HIVE-624) Fix bug in TypeConverter


 [ 
https://issues.apache.org/jira/browse/HIVE-624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-624:


Attachment: HIVE-624.1.patch

This patch fixes the problem by rewriting all the converters.

The converters are expanded into a class hierarchy so that each converter can 
reuse its returned conversion result (a Writable object) more easily.


I also added Settable*ObjectInspector, which provides a delegated way of 
setting the value of an object and creating new objects.

The patch also adds one new test case for the converters and one more client 
positive test case.
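
The idea of a converter that reuses its result object can be sketched as follows. This is only an illustration under stated assumptions, not the code from the patch: IntHolder stands in for a mutable Writable, and SettableIntInspector, HolderIntInspector and ShortToIntConverter are hypothetical names chosen for this example.

```java
public class ConverterSketch {

    /** Hypothetical stand-in for a mutable Writable holding an int. */
    static final class IntHolder {
        int value;
    }

    /** Hypothetical settable inspector: creates objects and sets their value. */
    interface SettableIntInspector {
        Object create(int value);

        Object set(Object target, int value);
    }

    /** Inspector backed by the mutable IntHolder above. */
    static final class HolderIntInspector implements SettableIntInspector {
        @Override
        public Object create(int value) {
            IntHolder holder = new IntHolder();
            holder.value = value;
            return holder;
        }

        @Override
        public Object set(Object target, int value) {
            ((IntHolder) target).value = value;
            return target;
        }
    }

    /** Converts Short inputs to whatever int representation the inspector manages. */
    static final class ShortToIntConverter {
        private final SettableIntInspector outputInspector;
        private final Object reused; // allocated once, overwritten on every call

        ShortToIntConverter(SettableIntInspector outputInspector) {
            this.outputInspector = outputInspector;
            this.reused = outputInspector.create(0);
        }

        Object convert(Short input) {
            if (input == null) {
                return null;
            }
            // Delegate the write to the inspector instead of allocating a new object.
            return outputInspector.set(reused, input.intValue());
        }
    }

    public static void main(String[] args) {
        ShortToIntConverter converter = new ShortToIntConverter(new HolderIntInspector());
        Object first = converter.convert((short) 2);
        System.out.println(((IntHolder) first).value);
        Object second = converter.convert((short) 7);
        System.out.println(first == second); // same object, new value
    }
}
```

Because the same object is returned on every call, a caller that needs to keep a converted value across calls would have to copy it first.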



[jira] Updated: (HIVE-522) GenericUDAF: Extend UDAF to deal with complex types


 [ 
https://issues.apache.org/jira/browse/HIVE-522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-522:


Attachment: HIVE-522.6.patch

Merged with trunk again and fixed all tests.

 GenericUDAF: Extend UDAF to deal with complex types
 ---

 Key: HIVE-522
 URL: https://issues.apache.org/jira/browse/HIVE-522
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.4.0
Reporter: Zheng Shao
Assignee: Zheng Shao
 Fix For: 0.4.0

 Attachments: HIVE-522.1.patch, HIVE-522.2.patch, HIVE-522.3.patch, 
 HIVE-522.4.patch, HIVE-522.5.patch, HIVE-522.6.patch


 We can pass arbitrary arguments into GenericUDFs. We should do the same thing 
 to GenericUDAF so that UDAF can also take arbitrary arguments.


[jira] Created: (HIVE-625) Use of BinarySortableSerDe for serialization of the value between map and reduce boundary

Use of BinarySortableSerDe for serialization of the value between map and 
reduce boundary
-

 Key: HIVE-625
 URL: https://issues.apache.org/jira/browse/HIVE-625
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Zheng Shao
Assignee: Zheng Shao


We currently use LazySimpleSerDe, which serializes doubles to text format. Until 
we have LazyBinarySerDe, we should switch to BinarySortableSerDe because it is 
still much faster than LazySimpleSerDe.
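
To make the text-versus-binary distinction concrete, here is a minimal sketch that encodes a double both ways. It is an assumption-laden illustration, not the layout used by any Hive SerDe: the class and method names are hypothetical, and the bit-flipping scheme is one common way to make an 8-byte encoding sort in numeric order.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class DoubleEncodingSketch {

    /** Text encoding: the decimal string as UTF-8 bytes (variable length). */
    static byte[] encodeAsText(double value) {
        return Double.toString(value).getBytes(StandardCharsets.UTF_8);
    }

    /**
     * Order-preserving binary encoding: always 8 bytes, big-endian.
     * Non-negative values get the sign bit flipped; negative values get
     * all bits flipped, so unsigned byte order matches numeric order.
     */
    static byte[] encodeSortable(double value) {
        long bits = Double.doubleToLongBits(value);
        if ((bits & Long.MIN_VALUE) != 0) {
            bits = ~bits;
        } else {
            bits ^= Long.MIN_VALUE;
        }
        return ByteBuffer.allocate(Long.BYTES).putLong(bits).array();
    }

    /** Inverse of encodeSortable. */
    static double decodeSortable(byte[] encoded) {
        long bits = ByteBuffer.wrap(encoded).getLong();
        if ((bits & Long.MIN_VALUE) != 0) {
            bits ^= Long.MIN_VALUE; // original value was non-negative
        } else {
            bits = ~bits;           // original value was negative
        }
        return Double.longBitsToDouble(bits);
    }

    /** Compares two byte arrays as unsigned bytes, left to right. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        double value = Math.random();
        System.out.println("text bytes:   " + encodeAsText(value).length);
        System.out.println("binary bytes: " + encodeSortable(value).length);
        System.out.println("round trip ok: " + (decodeSortable(encodeSortable(value)) == value));
    }
}
```

The binary form needs no decimal formatting on write and no parsing on read, which is the kind of per-value work the text format has to do.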


[jira] Updated: (HIVE-625) Use of BinarySortableSerDe for serialization of the value between map and reduce boundary

[
https://issues.apache.org/jira/browse/HIVE-625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Zheng Shao updated HIVE-625:

Attachment: HIVE-625.1.patch

An extreme test case shows a big performance improvement.

{code}
select CAST(rand() * 1024 * 1024 AS INT) as a, rand() as b from mytable
cluster by a limit 10;
{code}

The key is an int and the value is a double. I ran this on an example table.

The mappers of the new code take 98 seconds on average.
The mappers of the old code (without this patch) take 165 seconds on average.

Although this is an extreme example, it shows the improvement from the binary
serialization format: mapper time drops by roughly 40%.
Note that the test was run with gzip as mapred.map.output.compression.codec,
so the time difference is somewhat exaggerated compared with what we would
see with LZO.


[jira] Updated: (HIVE-625) Use of BinarySortableSerDe for serialization of the value between map and reduce boundary


 [ 
https://issues.apache.org/jira/browse/HIVE-625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-625:


Status: Patch Available  (was: Open)


[jira] Commented: (HIVE-524) ExecDriver adds 0 byte file to input paths

2009-07-10 Thread Johan Oskarsson (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729598#action_12729598
 ] 

Johan Oskarsson commented on HIVE-524:
--

It seems HIVE-195 didn't fix this issue. I just ran into it again with the 
latest trunk checkout.

 ExecDriver adds 0 byte file to input paths
 --

 Key: HIVE-524
 URL: https://issues.apache.org/jira/browse/HIVE-524
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.4.0
Reporter: Johan Oskarsson
 Fix For: 0.4.0


 In the addInputPaths method in ExecDriver:
 If the input path of a partition cannot be found or contains no files with 
 data in them, a 0 byte file is created and added to the job instead. This 
 causes our custom InputFormat to throw an exception since it is asked to 
 process an unknown file format (not an lzo file).


Build failed in Hudson: Hive-trunk-h0.17 #149

See http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/149/changes

Changes:

[namit] HIVE-622. add UDF reverse
(Emil Ibrishimov via namit)

[namit] HIVE-553. add BinarySortableSerDe
(Zheng Shao via namit)

[zshao] HIVE-610. Move all properties from jpox.properties to hive-site.xml. 
(Prasad Chakka via zshao)

[namit] HIVE-527. Inserting into a partitioned table without specifying the 
partition field should fail.
(He Yongqiang via namit)

--
[...truncated 14563 lines...]
[junit] OK
[junit] Loading data to table src_sequencefile
[junit] OK
[junit] Loading data to table src_thrift
[junit] OK
[junit] Loading data to table src_json
[junit] OK
[junit] diff http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/build/ql/test/logs/negative/unknown_column6.q.out http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/ql/src/test/results/compiler/errors/unknown_column6.q.out
[junit] Done query: unknown_column6.q
[junit] Begin query: unknown_function1.q
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=12}
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table src
[junit] OK
[junit] Loading data to table src1
[junit] OK
[junit] Loading data to table src_sequencefile
[junit] OK
[junit] Loading data to table src_thrift
[junit] OK
[junit] Loading data to table src_json
[junit] OK
[junit] diff http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/build/ql/test/logs/negative/unknown_function1.q.out http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/ql/src/test/results/compiler/errors/unknown_function1.q.out
[junit] Done query: unknown_function1.q
[junit] Begin query: unknown_function2.q
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=12}
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table src
[junit] OK
[junit] Loading data to table src1
[junit] OK
[junit] Loading data to table src_sequencefile
[junit] OK
[junit] Loading data to table src_thrift
[junit] OK
[junit] Loading data to table src_json
[junit] OK
[junit] diff http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/build/ql/test/logs/negative/unknown_function2.q.out http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/ql/src/test/results/compiler/errors/unknown_function2.q.out
[junit] Done query: unknown_function2.q
[junit] Begin query: unknown_function3.q
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=12}
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table src
[junit] OK
[junit] Loading data to table src1
[junit] OK
[junit] Loading data to table src_sequencefile
[junit] OK
[junit] Loading data to table src_thrift
[junit] OK
[junit] Loading data to table src_json
[junit] OK
[junit] diff http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/build/ql/test/logs/negative/unknown_function3.q.out http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/ql/src/test/results/compiler/errors/unknown_function3.q.out
[junit] Done query: unknown_function3.q
[junit] Begin query: unknown_function4.q
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=12}
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table src
[junit] OK

Release 0.3.1?

2009-07-10 Thread Johan Oskarsson

Hi everyone,

All bugs except one assigned to 0.3.1 have been fixed for quite some
time. Shall we try to make a release candidate for 0.3.1 and push the
last (Windows) bug to 0.3.2?

/Johan

Build failed in Hudson: Hive-trunk-h0.18 #151

See http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/151/

--
started
Building remotely on minerva.apache.org (Ubuntu)
FATAL: remote file operation failed
hudson.util.IOException2: remote file operation failed
at hudson.FilePath.act(FilePath.java:430)
at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:469)
at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:418)
at hudson.model.AbstractProject.checkout(AbstractProject.java:801)
at hudson.model.AbstractBuild$AbstractRunner.checkout(AbstractBuild.java:314)
at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:266)
at hudson.model.Run.run(Run.java:896)
at hudson.model.Build.run(Build.java:112)
at hudson.model.ResourceController.execute(ResourceController.java:93)
at hudson.model.Executor.run(Executor.java:119)
Caused by: java.io.IOException: Unable to delete http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/build/test/junit_metastore_db/log - files in dir: [http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/build/test/junit_metastore_db/log/log3.dat]
at hudson.Util.deleteFile(Util.java:215)
at hudson.Util.deleteRecursive(Util.java:248)
at hudson.Util.deleteContentsRecursive(Util.java:182)
at hudson.Util.deleteRecursive(Util.java:247)
at hudson.Util.deleteContentsRecursive(Util.java:182)
at hudson.Util.deleteRecursive(Util.java:247)
at hudson.Util.deleteContentsRecursive(Util.java:182)
at hudson.Util.deleteRecursive(Util.java:247)
at hudson.Util.deleteContentsRecursive(Util.java:182)
at hudson.Util.deleteRecursive(Util.java:247)
at hudson.Util.deleteContentsRecursive(Util.java:182)
at hudson.scm.SubversionSCM$CheckOutTask.invoke(SubversionSCM.java:532)
at hudson.scm.SubversionSCM$CheckOutTask.invoke(SubversionSCM.java:476)
at hudson.FilePath$FileCallableWrapper.call(FilePath.java:1283)
at hudson.remoting.UserRequest.perform(UserRequest.java:69)
at hudson.remoting.UserRequest.perform(UserRequest.java:23)
at hudson.remoting.Request$2.run(Request.java:213)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)

Hudson build is back to normal: Hive-trunk-h0.17 #150

See http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/150/

Hudson build is back to normal: Hive-trunk-h0.18 #152

See http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/152/

[jira] Commented: (HIVE-625) Use of BinarySortableSerDe for serialization of the value between map and reduce boundary


[ 
https://issues.apache.org/jira/browse/HIVE-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729677#action_12729677
 ] 

Namit Jain commented on HIVE-625:
-

But is it a good idea to always do so?

Can't you come up with another test case where the opposite holds?
Suppose the reducer has a highly selective filter at the beginning, followed by a 
select.
With BinarySortableSerDe, all the columns are read, whereas with LazySimpleSerDe 
most of the columns are never even materialized.
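
The trade-off raised here, eager versus lazy materialization behind a selective filter, can be sketched with a toy row format. This is a hypothetical illustration only: EagerRow, LazyRow and runQuery are invented for this example and do not model the actual SerDe implementations.

```java
public class LazyRowSketch {

    /** Minimal row abstraction used by both variants. */
    interface Row {
        long get(int index);

        int width();

        int parsedCount();
    }

    /** Parses every column up front, whether or not it is read later. */
    static final class EagerRow implements Row {
        private final long[] columns;
        private int parsed;

        EagerRow(String line) {
            String[] parts = line.split(",");
            columns = new long[parts.length];
            for (int i = 0; i < parts.length; i++) {
                columns[i] = Long.parseLong(parts[i]);
                parsed++;
            }
        }

        public long get(int index) { return columns[index]; }

        public int width() { return columns.length; }

        public int parsedCount() { return parsed; }
    }

    /** Splits the line into raw fields but parses a column only on first read. */
    static final class LazyRow implements Row {
        private final String[] raw;
        private final long[] cache;
        private final boolean[] done;
        private int parsed;

        LazyRow(String line) {
            raw = line.split(",");
            cache = new long[raw.length];
            done = new boolean[raw.length];
        }

        public long get(int index) {
            if (!done[index]) {
                cache[index] = Long.parseLong(raw[index]);
                done[index] = true;
                parsed++;
            }
            return cache[index];
        }

        public int width() { return raw.length; }

        public int parsedCount() { return parsed; }
    }

    /** Filters on column 0, reads the other columns of matching rows, returns total parses. */
    static int runQuery(Row[] rows, long wantedKey) {
        int parses = 0;
        for (Row row : rows) {
            if (row.get(0) == wantedKey) {
                for (int i = 1; i < row.width(); i++) {
                    row.get(i); // the "select" touches the remaining columns
                }
            }
            parses += row.parsedCount();
        }
        return parses;
    }

    public static void main(String[] args) {
        String[] lines = {"1,10,20,30", "2,11,21,31"};
        Row[] eager = {new EagerRow(lines[0]), new EagerRow(lines[1])};
        Row[] lazy = {new LazyRow(lines[0]), new LazyRow(lines[1])};
        System.out.println("eager parses: " + runQuery(eager, 2));
        System.out.println("lazy parses:  " + runQuery(lazy, 2));
    }
}
```

The more selective the filter, the larger the share of columns the lazy variant never parses; when nearly every row passes the filter, the two variants do about the same work.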



[jira] Updated: (HIVE-473) Clean up after tests

2009-07-10 Thread Johan Oskarsson (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Johan Oskarsson updated HIVE-473:
-

Assignee: Johan Oskarsson
  Status: Patch Available  (was: Open)

 Clean up after tests
 

 Key: HIVE-473
 URL: https://issues.apache.org/jira/browse/HIVE-473
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Testing Infrastructure
Reporter: Johan Oskarsson
Assignee: Johan Oskarsson
Priority: Critical
 Fix For: 0.4.0

 Attachments: HIVE-473.patch


 The test suite creates a lot of temporary files that aren't cleaned up. For 
 example plan xml files, mapred/local and mapred/system files.


[jira] Commented: (HIVE-473) Clean up after tests


[ 
https://issues.apache.org/jira/browse/HIVE-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729733#action_12729733
 ] 

Namit Jain commented on HIVE-473:
-

+1

The patch looks good - will commit if the tests pass


[jira] Commented: (HIVE-625) Use of BinarySortableSerDe for serialization of the value between map and reduce boundary


[ 
https://issues.apache.org/jira/browse/HIVE-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729772#action_12729772
 ] 

Zheng Shao commented on HIVE-625:
-

For this particular case, I think predicate pushdown will move the filter to 
the mapper side, and the column pruner will prune all columns that are not 
accessed.
So the reducer will probably read every column that is passed across the 
map/reduce boundary.

I agree there can still be other cases where the opposite holds, but they won't 
appear often. I can also make this SerDe configurable if that's a better idea.

What do you think?



[jira] Commented: (HIVE-622) Implement reverse UDF


[ 
https://issues.apache.org/jira/browse/HIVE-622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729780#action_12729780
 ] 

Zheng Shao commented on HIVE-622:
-

Do we want to make these String UDFs more efficient by using Text directly 
instead of String?
I saw the comment in the code, but shall we do it now or later?


 Implement reverse UDF
 -

 Key: HIVE-622
 URL: https://issues.apache.org/jira/browse/HIVE-622
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Emil Ibrishimov
Assignee: Emil Ibrishimov
 Fix For: 0.4.0

 Attachments: HIVE-622.1.patch, HIVE-622.2.patch


 Implement reverse as requested in 
 https://issues.apache.org/jira/browse/HIVE-615


[jira] Commented: (HIVE-478) Surface processor time for queries

2009-07-10 Thread Adam Kramer (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729787#action_12729787
 ] 

Adam Kramer commented on HIVE-478:
--

Also, in case it was not obvious: the current system counts the time that 
passes while mappers/reducers are pending. This request would tell me how much 
time I actually used, i.e., excluding time spent waiting for mappers or reducers.
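
The difference between elapsed time and processor time can be illustrated inside a single JVM thread. This sketch only demonstrates the distinction; it does not show how Hadoop or Hive collect task statistics, and the class and method names are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class CpuTimeSketch {

    /** Returns {wallNanos, cpuNanos} for running the task on the current thread. */
    static long[] measure(Runnable task) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        boolean cpuSupported = threads.isCurrentThreadCpuTimeSupported();

        long wallStart = System.nanoTime();
        long cpuStart = cpuSupported ? threads.getCurrentThreadCpuTime() : 0L;

        task.run();

        // cpuNanos is -1 when the JVM cannot report per-thread CPU time.
        long cpuNanos = cpuSupported ? threads.getCurrentThreadCpuTime() - cpuStart : -1L;
        long wallNanos = System.nanoTime() - wallStart;
        return new long[] {wallNanos, cpuNanos};
    }

    /** A task that mostly waits, like a job sitting in a queue. */
    static void waitingTask() {
        try {
            Thread.sleep(200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        long[] result = measure(CpuTimeSketch::waitingTask);
        System.out.println("wall ms: " + result[0] / 1_000_000);
        System.out.println("cpu ms:  " + result[1] / 1_000_000);
    }
}
```

For a task that spends most of its time waiting, the wall-clock figure is dominated by the wait while the CPU figure stays small, which is why the two numbers answer different questions about a query.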

 Surface processor time for queries
 

 Key: HIVE-478
 URL: https://issues.apache.org/jira/browse/HIVE-478
 Project: Hadoop Hive
  Issue Type: Wish
  Components: Logging, Query Processor
Reporter: Adam Kramer

 We currently list real-time metrics of how long queries take--finished in: 
 1min 13sec appears on the job tracker. However, this is affected by a lot 
 more than just the quality or implementation of the query. For example, 
 number of mappers used varies a lot when you use subqueries versus 
 single-query aggregation, as does the amount of work necessary.
 For implementation comparisons (e.g., should I use this version of the query 
 or that one), it would be great to know the processor time used instead of 
 the real time used, both in terms of mapper CPU seconds and reducer CPU 
 seconds.


[jira] Commented: (HIVE-622) Implement reverse UDF


[ 
https://issues.apache.org/jira/browse/HIVE-622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729817#action_12729817
 ] 

Namit Jain commented on HIVE-622:
-

That API is misleading: charAt returns the character at a byte position, not at 
a character position. We can get it working using that, but it might be cleaner 
to traverse the byte array, extract the characters, and then reverse them.
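
The byte-array approach suggested above could look roughly like this. It is a sketch on plain Java byte arrays rather than on Hadoop's Text, it assumes the input is valid UTF-8, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class Utf8ReverseSketch {

    /** Number of bytes in the UTF-8 sequence that starts with this lead byte. */
    static int sequenceLength(byte lead) {
        int b = lead & 0xFF;
        if (b < 0x80) {
            return 1; // 0xxxxxxx
        }
        if ((b >> 5) == 0b110) {
            return 2; // 110xxxxx
        }
        if ((b >> 4) == 0b1110) {
            return 3; // 1110xxxx
        }
        if ((b >> 3) == 0b11110) {
            return 4; // 11110xxx
        }
        throw new IllegalArgumentException("not a UTF-8 lead byte: " + b);
    }

    /** Reverses the characters of UTF-8 input without decoding to a String. */
    static byte[] reverseUtf8(byte[] input) {
        byte[] output = new byte[input.length];
        int writeEnd = input.length; // output is filled from the back
        int read = 0;
        while (read < input.length) {
            int len = sequenceLength(input[read]);
            if (read + len > input.length) {
                throw new IllegalArgumentException("truncated UTF-8 sequence");
            }
            writeEnd -= len;
            // Copy the whole multi-byte character so its bytes keep their order.
            System.arraycopy(input, read, output, writeEnd, len);
            read += len;
        }
        return output;
    }

    /** Naive byte-wise reversal, shown for contrast; it breaks multi-byte characters. */
    static byte[] reverseBytes(byte[] input) {
        byte[] output = new byte[input.length];
        for (int i = 0; i < input.length; i++) {
            output[input.length - 1 - i] = input[i];
        }
        return output;
    }

    public static void main(String[] args) {
        byte[] utf8 = "a\u00F1b\u20AC".getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(reverseUtf8(utf8), StandardCharsets.UTF_8));
        System.out.println(new String(reverseBytes(utf8), StandardCharsets.UTF_8));
    }
}
```

Note that this reverses code points, so a base character followed by a combining mark would still come out in the wrong order; handling that is outside the scope of the sketch.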


[jira] Created: (HIVE-626) Typecast bug in Join operator

Typecast bug in Join operator
-

 Key: HIVE-626
 URL: https://issues.apache.org/jira/browse/HIVE-626
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Zheng Shao


There is a type cast error in the Join operator. It can be reproduced with the following steps:

{code}
create table zshao_foo (foo_id int, foo_name string, foo_a string, foo_b string,
foo_c string, foo_d string) row format delimited fields terminated by ','
stored as textfile;

create table zshao_bar (bar_id int, bar_0 int, foo_id int, bar_1 int, bar_name
string, bar_a string, bar_b string, bar_c string, bar_d string) row format
delimited fields terminated by ',' stored as textfile;

create table zshao_count (bar_id int, n int) row format delimited fields
terminated by ',' stored as textfile;


-- Each table is loaded from a local file that contains a single row:
--   zshao_foo:   1,foo1,a,b,c,d
--   zshao_bar:   10,0,1,1,bar10,a,b,c,d
--   zshao_count: 10,2

load data local inpath 'zshao_foo' overwrite into table zshao_foo;
load data local inpath 'zshao_bar' overwrite into table zshao_bar;
load data local inpath 'zshao_count' overwrite into table zshao_count;

explain extended
select zshao_foo.foo_name, zshao_bar.bar_name, n from zshao_foo join zshao_bar 
on zshao_foo.foo_id =
zshao_bar.foo_id join zshao_count on zshao_count.bar_id = zshao_bar.bar_id;
{code}

The case is from David Lerman.



[jira] Updated: (HIVE-624) Fix bug in TypeConverter


 [ 
https://issues.apache.org/jira/browse/HIVE-624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-624:


Attachment: HIVE-624.2.patch

Fixed a compilation problem that didn't surface until I ran ant clean.


[jira] Updated: (HIVE-626) Typecast bug in Join operator


 [ 
https://issues.apache.org/jira/browse/HIVE-626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-626:


Attachment: HIVE-626.1.showinfo.patch

I added some instrumentation to the code (see HIVE-626.1.showinfo.patch). The 
result of explain extended (below) shows that the order of the output columns 
of the JoinOperator does not match the order expected by the FileSinkOperator:

{code}
hive> explain extended
 select zshao_foo.foo_name, zshao_bar.bar_name, n from zshao_foo join 
zshao_bar on zshao_foo.foo_id =
 zshao_bar.foo_id join zshao_count on zshao_count.bar_id = 
zshao_bar.bar_id;
OK
ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_JOIN (TOK_TABREF zshao_foo) (TOK_TABREF 
zshao_bar) (= (. (TOK_TABLE_OR_COL zshao_foo) foo_id) (. (TOK_TABLE_OR_COL 
zshao_bar) foo_id))) (TOK_TABREF zshao_count) (= (. (TOK_TABLE_OR_COL 
zshao_count) bar_id) (. (TOK_TABLE_OR_COL zshao_bar) bar_id (TOK_INSERT 
(TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (. 
(TOK_TABLE_OR_COL zshao_foo) foo_name)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL 
zshao_bar) bar_name)) (TOK_SELEXPR (TOK_TABLE_OR_COL n)

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-2 depends on stages: Stage-1
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
Map Reduce
  Alias -> Map Operator Tree:
...
  Reduce Operator Tree:
Join Operator
  condition map:
   Inner Join 0 to 1
  condition expressions:
0 {VALUE._col1}
1 {VALUE._col0} {VALUE._col4}
  output names: _col1, _col6, _col10
  File Output Operator
compressed: true
GlobalTableId: 0
directory: hdfs://xxx:9000/tmp/hive-zshao/1413634235/10002
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: 
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
properties:
  name binary_table
  serialization.ddl struct binary_table { string _col1, string 
_col10, i32 _col6}
  serialization.format 
com.facebook.thrift.protocol.TBinaryProtocol
name: binary_table

  Stage: Stage-2
Map Reduce
  Alias -> Map Operator Tree:
$INTNAME
...

{code}


The output of the join has the order: output names: _col1, _col6, _col10
The FileSinkOperator expects: struct binary_table { string _col1, string 
_col10, i32 _col6}
So the second and third column names are swapped between the two operators.



[jira] Commented: (HIVE-626) Typecast bug in Join operator


[ 
https://issues.apache.org/jira/browse/HIVE-626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729899#action_12729899
 ] 

Zheng Shao commented on HIVE-626:
-

One hypothesis: the column names got sorted as strings somewhere, which would 
put _col10 before _col6.

I tried to disable the column pruner, but that didn't help either.
{code}
hive> set hive.optimize.ppd=false;
{code}
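
The sorting hypothesis is easy to check in isolation: ordering the three column names as plain strings gives exactly the order seen in the FileSinkOperator's struct. The sketch below only demonstrates the string ordering; where (or whether) such a sort happens in Hive is not established here, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ColumnOrderSketch {

    /** Sorts names as plain strings, the way a sorted map or a default sort would. */
    static List<String> sortAsStrings(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(Comparator.naturalOrder());
        return copy;
    }

    /** Sorts names of the form _colN by their numeric suffix N. */
    static List<String> sortByColumnIndex(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(Comparator.comparingInt(
                (String name) -> Integer.parseInt(name.substring("_col".length()))));
        return copy;
    }

    public static void main(String[] args) {
        List<String> joinOutput = Arrays.asList("_col1", "_col6", "_col10");
        System.out.println(sortAsStrings(joinOutput));     // [_col1, _col10, _col6]
        System.out.println(sortByColumnIndex(joinOutput)); // [_col1, _col6, _col10]
    }
}
```

String comparison goes character by character, so "_col10" sorts before "_col6" because '1' is less than '6'.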



[jira] Updated: (HIVE-473) Clean up after tests


 [ 
https://issues.apache.org/jira/browse/HIVE-473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-473:


  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Committed. Thanks Johan.


[jira] Commented: (HIVE-626) Typecast bug in Join operator


[ 
https://issues.apache.org/jira/browse/HIVE-626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729923#action_12729923
 ] 

Namit Jain commented on HIVE-626:
-

Zheng, you are disabling predicate pushdown, not column pruning.
To disable column pruning you need to change the code; there is no setting 
for it right now.


[jira] Updated: (HIVE-624) Fix bug in TypeConverter