[jira] Created: (HIVE-624) Fix bug in TypeConverter
Fix bug in TypeConverter
Key: HIVE-624
URL: https://issues.apache.org/jira/browse/HIVE-624
Project: Hadoop Hive
Issue Type: Bug
Components: Query Processor
Reporter: Zheng Shao
Assignee: Zheng Shao

There is a bug in the Converter - we are converting all objects to Primitive Java objects instead of Writable. This has caused some queries to fail:
{code}
SELECT IF(false, 1, cast(2 as smallint)) + 3 FROM any_table;
{code}
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-624) Fix bug in TypeConverter
[ https://issues.apache.org/jira/browse/HIVE-624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zheng Shao updated HIVE-624:
Attachment: HIVE-624.1.patch

This patch fixes the problem by rewriting all the converters. The converters are expanded into a class hierarchy so that they can more easily reuse the returned conversion result (a Writable object). I also added the Settable*ObjectInspector interfaces, which provide a delegated way of setting the value of an object and creating new objects. The patch also adds one new test case for the converters and one more client positive test case.
[jira] Updated: (HIVE-522) GenericUDAF: Extend UDAF to deal with complex types
[ https://issues.apache.org/jira/browse/HIVE-522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zheng Shao updated HIVE-522:
Attachment: HIVE-522.6.patch

Merged with trunk again and fixed all tests.

GenericUDAF: Extend UDAF to deal with complex types
Key: HIVE-522
URL: https://issues.apache.org/jira/browse/HIVE-522
Project: Hadoop Hive
Issue Type: New Feature
Components: Query Processor
Affects Versions: 0.4.0
Reporter: Zheng Shao
Assignee: Zheng Shao
Fix For: 0.4.0
Attachments: HIVE-522.1.patch, HIVE-522.2.patch, HIVE-522.3.patch, HIVE-522.4.patch, HIVE-522.5.patch, HIVE-522.6.patch

We can pass arbitrary arguments into GenericUDFs. We should do the same for GenericUDAF so that UDAFs can also take arbitrary arguments.
[jira] Created: (HIVE-625) Use of BinarySortableSerDe for serialization of the value between map and reduce boundary
Use of BinarySortableSerDe for serialization of the value between map and reduce boundary
Key: HIVE-625
URL: https://issues.apache.org/jira/browse/HIVE-625
Project: Hadoop Hive
Issue Type: Improvement
Components: Query Processor
Reporter: Zheng Shao
Assignee: Zheng Shao

We currently use LazySimpleSerDe, which serializes doubles to text format. Until we have LazyBinarySerDe, we should switch to BinarySortableSerDe because it is still much faster than LazySimpleSerDe.
[jira] Updated: (HIVE-625) Use of BinarySortableSerDe for serialization of the value between map and reduce boundary
[ https://issues.apache.org/jira/browse/HIVE-625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zheng Shao updated HIVE-625:
Attachment: HIVE-625.1.patch

An extreme test case shows a big performance improvement.
{code}
select CAST(rand() * 1024 * 1024 AS INT) as a, rand() as b from mytable cluster by a limit 10;
{code}
The key is an int and the value is a double. I ran this on an example table. The mappers of the new code take 98 seconds on average. The mappers of the old code (without this patch) take 165 seconds on average. Although this is an extreme example, it does show the large improvement from using the binary serialization format. Note that the test was done with gzip as mapred.map.output.compression.codec, so the time difference is somewhat exaggerated (compared with what we would see with Lzo).
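To make the text-versus-binary difference concrete, here is a minimal Python sketch, not the code from the patch. It compares the decimal text form of a double with a fixed 8-byte, order-preserving binary form. The function names are hypothetical, and the bit-flipping scheme is an assumption about how a sortable encoding can be built; it is not claimed to be exactly what BinarySortableSerDe does.

```python
import struct

MASK64 = 0xFFFFFFFFFFFFFFFF
SIGN_BIT = 1 << 63


def encode_double_text(value: float) -> bytes:
    """Text form: the decimal string, as a text-based SerDe would write it."""
    return repr(value).encode("ascii")


def encode_double_sortable(value: float) -> bytes:
    """Fixed 8-byte form whose byte-wise order matches numeric order.

    Assumed scheme: flip only the sign bit of non-negative values and
    flip every bit of negative values.
    """
    (bits,) = struct.unpack(">Q", struct.pack(">d", value))
    if bits & SIGN_BIT:
        bits = ~bits & MASK64   # negative: invert all bits
    else:
        bits ^= SIGN_BIT        # non-negative: set the top bit
    return struct.pack(">Q", bits)


def decode_double_sortable(data: bytes) -> float:
    """Inverse of encode_double_sortable."""
    (bits,) = struct.unpack(">Q", data)
    if bits & SIGN_BIT:
        bits ^= SIGN_BIT        # was non-negative
    else:
        bits = ~bits & MASK64   # was negative
    return struct.unpack(">d", struct.pack(">Q", bits))[0]
```

The binary form needs no number parsing or formatting on either side of the shuffle, and a value such as 0.123456789012345 takes 17 bytes as text but always 8 bytes in binary.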
[jira] Updated: (HIVE-625) Use of BinarySortableSerDe for serialization of the value between map and reduce boundary
[ https://issues.apache.org/jira/browse/HIVE-625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zheng Shao updated HIVE-625:
Status: Patch Available (was: Open)
[jira] Commented: (HIVE-524) ExecDriver adds 0 byte file to input paths
[ https://issues.apache.org/jira/browse/HIVE-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729598#action_12729598 ]
Johan Oskarsson commented on HIVE-524:

It seems HIVE-195 didn't fix this issue? I just ran into it again with the latest trunk checkout.

ExecDriver adds 0 byte file to input paths
Key: HIVE-524
URL: https://issues.apache.org/jira/browse/HIVE-524
Project: Hadoop Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.4.0
Reporter: Johan Oskarsson
Fix For: 0.4.0

In the addInputPaths method in ExecDriver: if the input path of a partition cannot be found or contains no files with data in them, a 0 byte file is created and added to the job instead. This causes our custom InputFormat to throw an exception, since it is asked to process an unknown file format (not an lzo file).
Build failed in Hudson: Hive-trunk-h0.17 #149
See http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/149/changes Changes: [namit] HIVE-622. add UDF reverse (Emil Ibrishimov via namit) [namit] HIVE-553. add BinarySortableSerDe (Zheng Shao via namit) [zshao] HIVE-610. Move all properties from jpox.properties to hive-site.xml. (Prasad Chakka via zshao) [namit] HIVE-527. Inserting into a partitioned table without specifying the partition field should fail. (He Yongqiang via namit) -- [...truncated 14563 lines...] [junit] OK [junit] Loading data to table src_sequencefile [junit] OK [junit] Loading data to table src_thrift [junit] OK [junit] Loading data to table src_json [junit] OK [junit] diff http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/build/ql/test/logs/negative/unknown_column6.q.out http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/ql/src/test/results/compiler/errors/unknown_column6.q.out [junit] Done query: unknown_column6.q [junit] Begin query: unknown_function1.q [junit] Loading data to table srcpart partition {ds=2008-04-08, hr=11} [junit] OK [junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12} [junit] OK [junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11} [junit] OK [junit] Loading data to table srcpart partition {ds=2008-04-09, hr=12} [junit] OK [junit] Loading data to table srcbucket [junit] OK [junit] Loading data to table srcbucket [junit] OK [junit] Loading data to table src [junit] OK [junit] Loading data to table src1 [junit] OK [junit] Loading data to table src_sequencefile [junit] OK [junit] Loading data to table src_thrift [junit] OK [junit] Loading data to table src_json [junit] OK [junit] diff http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/build/ql/test/logs/negative/unknown_function1.q.out http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/ql/src/test/results/compiler/errors/unknown_function1.q.out [junit] Done query: unknown_function1.q [junit] Begin query: 
unknown_function2.q [junit] Loading data to table srcpart partition {ds=2008-04-08, hr=11} [junit] OK [junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12} [junit] OK [junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11} [junit] OK [junit] Loading data to table srcpart partition {ds=2008-04-09, hr=12} [junit] OK [junit] Loading data to table srcbucket [junit] OK [junit] Loading data to table srcbucket [junit] OK [junit] Loading data to table src [junit] OK [junit] Loading data to table src1 [junit] OK [junit] Loading data to table src_sequencefile [junit] OK [junit] Loading data to table src_thrift [junit] OK [junit] Loading data to table src_json [junit] OK [junit] diff http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/build/ql/test/logs/negative/unknown_function2.q.out http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/ql/src/test/results/compiler/errors/unknown_function2.q.out [junit] Done query: unknown_function2.q [junit] Begin query: unknown_function3.q [junit] Loading data to table srcpart partition {ds=2008-04-08, hr=11} [junit] OK [junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12} [junit] OK [junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11} [junit] OK [junit] Loading data to table srcpart partition {ds=2008-04-09, hr=12} [junit] OK [junit] Loading data to table srcbucket [junit] OK [junit] Loading data to table srcbucket [junit] OK [junit] Loading data to table src [junit] OK [junit] Loading data to table src1 [junit] OK [junit] Loading data to table src_sequencefile [junit] OK [junit] Loading data to table src_thrift [junit] OK [junit] Loading data to table src_json [junit] OK [junit] diff http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/build/ql/test/logs/negative/unknown_function3.q.out http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/ql/src/test/results/compiler/errors/unknown_function3.q.out [junit] Done 
query: unknown_function3.q [junit] Begin query: unknown_function4.q [junit] Loading data to table srcpart partition {ds=2008-04-08, hr=11} [junit] OK [junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12} [junit] OK [junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11} [junit] OK [junit] Loading data to table srcpart partition {ds=2008-04-09, hr=12} [junit] OK [junit] Loading data to table srcbucket [junit] OK [junit] Loading data to table srcbucket [junit] OK [junit] Loading data to table src [junit] OK
Release 0.3.1?
Hi everyone,

All bugs except one assigned to 0.3.1 have been fixed for quite some time. Shall we try to make a release candidate for 0.3.1 and push the last (Windows) bug to 0.3.2?

/Johan
Build failed in Hudson: Hive-trunk-h0.18 #151
See http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/151/ -- started Building remotely on minerva.apache.org (Ubuntu) FATAL: remote file operation failed hudson.util.IOException2: remote file operation failed at hudson.FilePath.act(FilePath.java:430) at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:469) at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:418) at hudson.model.AbstractProject.checkout(AbstractProject.java:801) at hudson.model.AbstractBuild$AbstractRunner.checkout(AbstractBuild.java:314) at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:266) at hudson.model.Run.run(Run.java:896) at hudson.model.Build.run(Build.java:112) at hudson.model.ResourceController.execute(ResourceController.java:93) at hudson.model.Executor.run(Executor.java:119) Caused by: java.io.IOException: Unable to delete http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/build/test/junit_metastore_db/log - files in dir: [http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/build/test/junit_metastore_db/log/log3.dat] at hudson.Util.deleteFile(Util.java:215) at hudson.Util.deleteRecursive(Util.java:248) at hudson.Util.deleteContentsRecursive(Util.java:182) at hudson.Util.deleteRecursive(Util.java:247) at hudson.Util.deleteContentsRecursive(Util.java:182) at hudson.Util.deleteRecursive(Util.java:247) at hudson.Util.deleteContentsRecursive(Util.java:182) at hudson.Util.deleteRecursive(Util.java:247) at hudson.Util.deleteContentsRecursive(Util.java:182) at hudson.Util.deleteRecursive(Util.java:247) at hudson.Util.deleteContentsRecursive(Util.java:182) at hudson.scm.SubversionSCM$CheckOutTask.invoke(SubversionSCM.java:532) at hudson.scm.SubversionSCM$CheckOutTask.invoke(SubversionSCM.java:476) at hudson.FilePath$FileCallableWrapper.call(FilePath.java:1283) at hudson.remoting.UserRequest.perform(UserRequest.java:69) at hudson.remoting.UserRequest.perform(UserRequest.java:23) at hudson.remoting.Request$2.run(Request.java:213) 
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619)
Hudson build is back to normal: Hive-trunk-h0.17 #150
See http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/150/
Hudson build is back to normal: Hive-trunk-h0.18 #152
See http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/152/
[jira] Commented: (HIVE-625) Use of BinarySortableSerDe for serialization of the value between map and reduce boundary
[ https://issues.apache.org/jira/browse/HIVE-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729677#action_12729677 ]
Namit Jain commented on HIVE-625:

But is it a good idea to always do so? Can't you come up with another test case where the opposite holds? For example, the reducer has a highly selective filter at the beginning, followed by a select. With BinarySortableSerDe, all the columns are read, whereas with LazySimpleSerDe, most of the columns are not even materialized.
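The trade-off raised here can be sketched in a few lines of Python. This is an illustration only: EagerRow, LazyRow and select_with_filter are hypothetical names, and comma-separated text stands in for whatever the real SerDes read. The eager row decodes every column up front, while the lazy row decodes a column only when it is first accessed, so a selective filter on one column leaves the others untouched.

```python
class EagerRow:
    """Decodes every column as soon as the row is read."""

    def __init__(self, line, sep=","):
        self.fields = [float(raw) for raw in line.split(sep)]
        self.parsed = len(self.fields)  # number of columns decoded

    def get(self, index):
        return self.fields[index]


class LazyRow:
    """Keeps raw column text and decodes a column on first access only."""

    def __init__(self, line, sep=","):
        self._raw = line.split(sep)
        self._cache = {}
        self.parsed = 0  # number of columns decoded so far

    def get(self, index):
        if index not in self._cache:
            self._cache[index] = float(self._raw[index])
            self.parsed += 1
        return self._cache[index]


def select_with_filter(row_cls, lines, threshold=0.5):
    """Filter on column 0, then select column 3; also count decoded columns."""
    selected, parsed = [], 0
    for line in lines:
        row = row_cls(line)
        if row.get(0) > threshold:
            selected.append(row.get(3))
        parsed += row.parsed
    return selected, parsed


lines = ["0.1,1.0,2.0,3.0", "0.9,4.0,5.0,6.0", "0.2,7.0,8.0,9.0"]
print(select_with_filter(EagerRow, lines))
print(select_with_filter(LazyRow, lines))
```

Both variants return the same selected values; only the number of decoded columns differs. Whether that saving outweighs the cost of text parsing depends on how selective the filter is.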
[jira] Updated: (HIVE-473) Clean up after tests
[ https://issues.apache.org/jira/browse/HIVE-473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Johan Oskarsson updated HIVE-473:
Assignee: Johan Oskarsson
Status: Patch Available (was: Open)

Clean up after tests
Key: HIVE-473
URL: https://issues.apache.org/jira/browse/HIVE-473
Project: Hadoop Hive
Issue Type: Improvement
Components: Testing Infrastructure
Reporter: Johan Oskarsson
Assignee: Johan Oskarsson
Priority: Critical
Fix For: 0.4.0
Attachments: HIVE-473.patch

The test suite creates a lot of temporary files that aren't cleaned up, for example plan xml files and mapred/local and mapred/system files.
[jira] Commented: (HIVE-473) Clean up after tests
[ https://issues.apache.org/jira/browse/HIVE-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729733#action_12729733 ]
Namit Jain commented on HIVE-473:

+1. The patch looks good. I will commit it if the tests pass.
[jira] Commented: (HIVE-625) Use of BinarySortableSerDe for serialization of the value between map and reduce boundary
[ https://issues.apache.org/jira/browse/HIVE-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729772#action_12729772 ]
Zheng Shao commented on HIVE-625:

For this particular case, I think predicate pushdown will push the filter to the mapper side, and the column pruner will prune all columns that are not accessed. So the reducer will probably read all columns that are passed across the map and reduce boundary. I agree there can still be other cases where the opposite holds, but they won't appear often. I can also make this SerDe configurable if that's a better idea. What do you think?
[jira] Commented: (HIVE-622) Implement reverse UDF
[ https://issues.apache.org/jira/browse/HIVE-622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729780#action_12729780 ]
Zheng Shao commented on HIVE-622:

Do we want to make these String UDFs more efficient by using Text directly instead of String? I saw the comment in the code, but shall we do it now or later?

Implement reverse UDF
Key: HIVE-622
URL: https://issues.apache.org/jira/browse/HIVE-622
Project: Hadoop Hive
Issue Type: New Feature
Components: Query Processor
Reporter: Emil Ibrishimov
Assignee: Emil Ibrishimov
Fix For: 0.4.0
Attachments: HIVE-622.1.patch, HIVE-622.2.patch

Implement reverse as requested in https://issues.apache.org/jira/browse/HIVE-615
[jira] Commented: (HIVE-478) Surface processor time for queries
[ https://issues.apache.org/jira/browse/HIVE-478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729787#action_12729787 ]
Adam Kramer commented on HIVE-478:

Also, in case it was not obvious, the current system counts the time that passes while mappers/reducers are pending. This request would tell me how much time I actually used, i.e., it would not include time spent waiting for mappers or reducers.

Surface processor time for queries
Key: HIVE-478
URL: https://issues.apache.org/jira/browse/HIVE-478
Project: Hadoop Hive
Issue Type: Wish
Components: Logging, Query Processor
Reporter: Adam Kramer

We currently list real-time metrics of how long queries take: "finished in: 1min 13sec" appears on the job tracker. However, this is affected by a lot more than just the quality or implementation of the query. For example, the number of mappers used varies a lot when you use subqueries versus single-query aggregation, as does the amount of work necessary. For implementation comparisons (e.g., should I use this version of the query or that one), it would be great to know the processor time used instead of the real time used, both in terms of mapper CPU seconds and reducer CPU seconds.
[jira] Commented: (HIVE-622) Implement reverse UDF
[ https://issues.apache.org/jira/browse/HIVE-622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729817#action_12729817 ]
Namit Jain commented on HIVE-622:

That API is misleading: charAt returns the character at a byte position, not at a character position. We can get it working using that, but it might be cleaner to traverse the byte array, extract the characters and then reverse them.
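The byte-array approach suggested here can be sketched as follows. This is a Python illustration under stated assumptions, not the UDF from the patch: the input is assumed to be well-formed UTF-8, and utf8_char_length and reverse_utf8 are hypothetical helper names. The length of each character is read from its lead byte, the byte slices are collected, and the slices are emitted in reverse order so that multi-byte characters stay intact.

```python
def utf8_char_length(lead_byte: int) -> int:
    """Return how many bytes a UTF-8 character occupies, given its first byte."""
    if lead_byte < 0x80:
        return 1                      # 0xxxxxxx: ASCII
    if lead_byte >> 5 == 0b110:
        return 2                      # 110xxxxx
    if lead_byte >> 4 == 0b1110:
        return 3                      # 1110xxxx
    if lead_byte >> 3 == 0b11110:
        return 4                      # 11110xxx
    raise ValueError("not a UTF-8 lead byte: 0x%02x" % lead_byte)


def reverse_utf8(data: bytes) -> bytes:
    """Reverse a UTF-8 byte string character by character, not byte by byte."""
    chars = []
    i = 0
    while i < len(data):
        n = utf8_char_length(data[i])
        chars.append(data[i:i + n])   # one whole character
        i += n
    return b"".join(reversed(chars))


print(reverse_utf8("héllo".encode("utf-8")).decode("utf-8"))
```

Reversing the raw bytes instead would swap the lead and continuation bytes of every multi-byte character and produce invalid UTF-8.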
[jira] Created: (HIVE-626) Typecast bug in Join operator
Typecast bug in Join operator
Key: HIVE-626
URL: https://issues.apache.org/jira/browse/HIVE-626
Project: Hadoop Hive
Issue Type: Bug
Reporter: Zheng Shao

There is a type cast error in the Join operator. It can be reproduced with the following steps:
{code}
create table zshao_foo (foo_id int, foo_name string, foo_a string, foo_b string, foo_c string, foo_d string)
row format delimited fields terminated by ',' stored as textfile;
create table zshao_bar (bar_id int, bar_0 int, foo_id int, bar_1 int, bar_name string, bar_a string, bar_b string, bar_c string, bar_d string)
row format delimited fields terminated by ',' stored as textfile;
create table zshao_count (bar_id int, n int)
row format delimited fields terminated by ',' stored as textfile;

Each table has a single row as follows:
zshao_foo: 1,foo1,a,b,c,d
zshao_bar: 10,0,1,1,bar10,a,b,c,d
zshao_count: 10,2

load data local inpath 'zshao_foo' overwrite into table zshao_foo;
load data local inpath 'zshao_bar' overwrite into table zshao_bar;
load data local inpath 'zshao_count' overwrite into table zshao_count;

explain extended
select zshao_foo.foo_name, zshao_bar.bar_name, n
from zshao_foo
join zshao_bar on zshao_foo.foo_id = zshao_bar.foo_id
join zshao_count on zshao_count.bar_id = zshao_bar.bar_id;
{code}
The test case is from David Lerman.
[jira] Updated: (HIVE-624) Fix bug in TypeConverter
[ https://issues.apache.org/jira/browse/HIVE-624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zheng Shao updated HIVE-624:
Attachment: HIVE-624.2.patch

Fixed a compilation problem that didn't surface until I ran ant clean.
[jira] Updated: (HIVE-626) Typecast bug in Join operator
[ https://issues.apache.org/jira/browse/HIVE-626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zheng Shao updated HIVE-626:
Attachment: HIVE-626.1.showinfo.patch

I added some instrumentation to the code (see HIVE-626.1.showinfo.patch). The result of explain extended (below) shows that the order of the output columns of the JoinOperator does not match that of the FileSinkOperator:
{code}
hive> explain extended select zshao_foo.foo_name, zshao_bar.bar_name, n from zshao_foo join zshao_bar on zshao_foo.foo_id = zshao_bar.foo_id join zshao_count on zshao_count.bar_id = zshao_bar.bar_id;
OK
ABSTRACT SYNTAX TREE:
(TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_JOIN (TOK_TABREF zshao_foo) (TOK_TABREF zshao_bar) (= (. (TOK_TABLE_OR_COL zshao_foo) foo_id) (. (TOK_TABLE_OR_COL zshao_bar) foo_id))) (TOK_TABREF zshao_count) (= (. (TOK_TABLE_OR_COL zshao_count) bar_id) (. (TOK_TABLE_OR_COL zshao_bar) bar_id (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL zshao_foo) foo_name)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL zshao_bar) bar_name)) (TOK_SELEXPR (TOK_TABLE_OR_COL n)

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-2 depends on stages: Stage-1
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        ...
      Reduce Operator Tree:
        Join Operator
          condition map:
            Inner Join 0 to 1
          condition expressions:
            0 {VALUE._col1}
            1 {VALUE._col0} {VALUE._col4}
          output names: _col1, _col6, _col10
          File Output Operator
            compressed: true
            GlobalTableId: 0
            directory: hdfs://xxx:9000/tmp/hive-zshao/1413634235/10002
            table:
              input format: org.apache.hadoop.mapred.SequenceFileInputFormat
              output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
              properties:
                name binary_table
                serialization.ddl struct binary_table { string _col1, string _col10, i32 _col6}
                serialization.format com.facebook.thrift.protocol.TBinaryProtocol
              name: binary_table
  Stage: Stage-2
    Map Reduce
      Alias -> Map Operator Tree:
        $INTNAME
        ...
{code}
The output of the join has the order:
output names: _col1, _col6, _col10
The FileSinkOperator expects:
struct binary_table { string _col1, string _col10, i32 _col6}
[jira] Commented: (HIVE-626) Typecast bug in Join operator
[ https://issues.apache.org/jira/browse/HIVE-626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729899#action_12729899 ]
Zheng Shao commented on HIVE-626:

One hypothesis: the column names got sorted somewhere, which puts _col10 before _col6. I tried to disable the column pruner, but that did not help either.
{code}
hive> set hive.optimize.ppd=false;
{code}
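The sorting hypothesis is easy to check in isolation. The short Python sketch below only demonstrates how string sorting orders these names; it says nothing about where in Hive such a sort might happen, and column_index is a hypothetical helper.

```python
# Column names in the order the join emits them.
join_output = ["_col1", "_col6", "_col10"]

# Plain string sorting compares character by character, so "_col10"
# sorts before "_col6" because '1' < '6'.
lexicographic = sorted(join_output)


def column_index(name: str) -> int:
    """Extract the numeric suffix of a generated column name such as _col10."""
    return int(name[len("_col"):])


# Sorting on the numeric suffix keeps the original order.
numeric = sorted(join_output, key=column_index)

print(lexicographic)
print(numeric)
```

The string-sorted order is _col1, _col10, _col6, which is the order that appears in the serialization.ddl struct expected by the FileSinkOperator.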
[jira] Updated: (HIVE-473) Clean up after tests
[ https://issues.apache.org/jira/browse/HIVE-473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Namit Jain updated HIVE-473:
Resolution: Fixed
Hadoop Flags: [Reviewed]
Status: Resolved (was: Patch Available)

Committed. Thanks Johan.
[jira] Commented: (HIVE-626) Typecast bug in Join operator
[ https://issues.apache.org/jira/browse/HIVE-626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729923#action_12729923 ]
Namit Jain commented on HIVE-626:

Zheng, you are disabling predicate pushdown, not column pruning. To disable column pruning you need to change the code; there is no way to turn it off through configuration right now.
[jira] Updated: (HIVE-624) Fix bug in TypeConverter
[ https://issues.apache.org/jira/browse/HIVE-624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Namit Jain updated HIVE-624:
Resolution: Fixed
Hadoop Flags: [Reviewed]
Status: Resolved (was: Patch Available)

Committed. Thanks Zheng.