[jira] Commented: (HIVE-1526) Hive should depend on a release version of Thrift
[ https://issues.apache.org/jira/browse/HIVE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12934942#action_12934942 ]

Ning Zhang commented on HIVE-1526:

Carl, can you upload a new patch that takes my other comments into consideration? I'll start testing.

Hive should depend on a release version of Thrift

Key: HIVE-1526
URL: https://issues.apache.org/jira/browse/HIVE-1526
Project: Hive
Issue Type: Task
Components: Build Infrastructure, Clients
Reporter: Carl Steinbach
Assignee: Carl Steinbach
Fix For: 0.7.0
Attachments: HIVE-1526-no-codegen.3.patch.txt, HIVE-1526.2.patch.txt, HIVE-1526.3.patch.txt, hive-1526.txt, libfb303.jar, libthrift.jar, serde2_test.patch, svn_rm.sh, thrift-0.5.0.jar, thrift-fb303-0.5.0.jar

Hive should depend on a release version of Thrift, and ideally it should use Ivy to resolve this dependency. The Thrift folks are working on adding Thrift artifacts to a maven repository here: https://issues.apache.org/jira/browse/THRIFT-363

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1804) Mapjoin will fail if there are no files associating with the join tables
[ https://issues.apache.org/jira/browse/HIVE-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HIVE-1804: - Attachment: hive-1804-2.patch Remove all the debug print statements. Please review Mapjoin will fail if there are no files associating with the join tables Key: HIVE-1804 URL: https://issues.apache.org/jira/browse/HIVE-1804 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.7.0 Reporter: Liyin Tang Assignee: Liyin Tang Fix For: 0.7.0 Attachments: hive-1804-1.patch, hive-1804-2.patch If there are some empty tables without any file associated, the map join will fail. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1802) Encode MapReduce Shuffling Keys Differently for Single string/bigint Key
[ https://issues.apache.org/jira/browse/HIVE-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12934957#action_12934957 ]

He Yongqiang commented on HIVE-1802:

For one Text key in join, I think in your patch you still need an array copy. For one Text key in group by, an array copy is not needed. I mean the new code would only process one Text key in group by, for which we can avoid the array copy. For other cases, maybe we can optimize BinarySortableSerDe to use array copy instead of write?

Encode MapReduce Shuffling Keys Differently for Single string/bigint Key

Key: HIVE-1802
URL: https://issues.apache.org/jira/browse/HIVE-1802
Project: Hive
Issue Type: Improvement
Reporter: Siying Dong
Assignee: Siying Dong
Attachments: HIVE-1802.1.patch

Delimiters are not needed if we only have one shuffling key, and at the same time escaping delimiters is not needed. We can save some CPU time on serializing and shuffle a slightly smaller amount of data, reducing memory footprint and network traffic. Also, there is a bug: for group-by, we mistakenly add a -1 to the end of the key and pay for one more unnecessary mem-copy. This can be easily fixed.
[jira] Updated: (HIVE-1802) Encode MapReduce Shuffling Keys Differently for Single string/bigint Key
[ https://issues.apache.org/jira/browse/HIVE-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-1802: -- Status: Patch Available (was: Open) Encode MapReduce Shuffling Keys Differently for Single string/bigint Key - Key: HIVE-1802 URL: https://issues.apache.org/jira/browse/HIVE-1802 Project: Hive Issue Type: Improvement Reporter: Siying Dong Assignee: Siying Dong Attachments: HIVE-1802.1.patch, HIVE-1802.2.patch Delimiters are not needed if we only have one shuffling key, and in the same time escaping delimiters are not needed. We can save some CPU time on serializing and shuffle slightly less amount of data to save memory footprint and network traffic. Also there is a bug that for group-by, we by mistake add a -1 to the end of the key and pay one more unnecessary mem-copy. Can be easily fixed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1802) Encode MapReduce Shuffling Keys Differently for Single string/bigint Key
[ https://issues.apache.org/jira/browse/HIVE-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12934980#action_12934980 ]

Siying Dong commented on HIVE-1802:

For any Group by, we needed 2 mem-copies: one from the Text objects to the buffer, and one to add an extra tag to the end of the buffer. Now, the case with a single Text takes no mem-copy (except the first byte is 0), and for multiple keys it needs one (from the Text object to the buffer).

For join, we needed 2 mem-copies: one from Text to the buffer, and one to add the tag. Now a single Text needs one copy from the buffer to add a tag. In other cases we still need two copies.

Encode MapReduce Shuffling Keys Differently for Single string/bigint Key

Key: HIVE-1802
URL: https://issues.apache.org/jira/browse/HIVE-1802
Project: Hive
Issue Type: Improvement
Reporter: Siying Dong
Assignee: Siying Dong
Attachments: HIVE-1802.1.patch, HIVE-1802.2.patch

Delimiters are not needed if we only have one shuffling key, and at the same time escaping delimiters is not needed. We can save some CPU time on serializing and shuffle a slightly smaller amount of data, reducing memory footprint and network traffic. Also, there is a bug: for group-by, we mistakenly add a -1 to the end of the key and pay for one more unnecessary mem-copy. This can be easily fixed.
[jira] Created: (HIVE-1806) The merge criteria on dynamic partitions should be per partition
The merge criteria on dynamic partitions should be per partition

Key: HIVE-1806
URL: https://issues.apache.org/jira/browse/HIVE-1806
Project: Hive
Issue Type: Bug
Reporter: Ning Zhang
Assignee: Ning Zhang

Currently the criterion for whether a merge job should be fired on dynamically generated partitions is the average size of files across all dynamic partitions. It is very common that some dynamic partitions contain mostly large files and some contain mostly small files. Even when the average size of all the files is larger than hive.merge.smallfiles.avgsize, we should still merge the partitions that contain small files, and only those.
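To make the difference concrete, here is a minimal sketch of a per-partition check next to the global-average check described above. It is an illustration only, not the Hive implementation: the function names, the `(partition, size)` input shape, and the threshold argument (standing in for hive.merge.smallfiles.avgsize) are all assumptions.

```python
from collections import defaultdict


def global_average_triggers_merge(files, avg_size_threshold):
    """Current behavior as described: one average over all files."""
    sizes = [size for _, size in files]
    return bool(sizes) and sum(sizes) / len(sizes) < avg_size_threshold


def partitions_to_merge(files, avg_size_threshold):
    """Proposed behavior: decide per partition.

    files: iterable of (partition_name, size_in_bytes) pairs (hypothetical shape).
    Returns the sorted names of partitions whose own average file size
    is below the threshold.
    """
    totals = defaultdict(lambda: [0, 0])  # partition -> [total bytes, file count]
    for partition, size in files:
        totals[partition][0] += size
        totals[partition][1] += 1
    return sorted(
        partition
        for partition, (total, count) in totals.items()
        if total / count < avg_size_threshold
    )


MB = 1024 * 1024
example_files = [
    ("ds=1", 256 * MB), ("ds=1", 300 * MB),  # mostly large files
    ("ds=2", 1 * MB), ("ds=2", 2 * MB),      # mostly small files
]
```

With a 16 MB threshold, the global average of `example_files` is well above the threshold, so no merge would be fired, while the per-partition check still selects `ds=2`.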
[jira] Created: (HIVE-1807) No Element found exception in BucketMapJoinOptimizer
No Element found exception in BucketMapJoinOptimizer Key: HIVE-1807 URL: https://issues.apache.org/jira/browse/HIVE-1807 Project: Hive Issue Type: Bug Reporter: He Yongqiang Assignee: He Yongqiang -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1807) No Element found exception in BucketMapJoinOptimizer
[ https://issues.apache.org/jira/browse/HIVE-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Yongqiang updated HIVE-1807: --- Attachment: HIVE-1807.1.patch No Element found exception in BucketMapJoinOptimizer Key: HIVE-1807 URL: https://issues.apache.org/jira/browse/HIVE-1807 Project: Hive Issue Type: Bug Reporter: He Yongqiang Assignee: He Yongqiang Attachments: HIVE-1807.1.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1792) track the joins which are being converted to map-join automatically
[ https://issues.apache.org/jira/browse/HIVE-1792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HIVE-1792: - Attachment: (was: hive-1792-2.patch) track the joins which are being converted to map-join automatically --- Key: HIVE-1792 URL: https://issues.apache.org/jira/browse/HIVE-1792 Project: Hive Issue Type: New Feature Components: Query Processor Affects Versions: 0.7.0 Reporter: Liyin Tang Assignee: Liyin Tang Fix For: 0.7.0 Attachments: hive-1792-1.patch We should be able to track how many queries (join) got converted to map-join -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (HIVE-1808) bug in auto_join25.q
[ https://issues.apache.org/jira/browse/HIVE-1808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liyin Tang reassigned HIVE-1808:

Assignee: Liyin Tang

bug in auto_join25.q

Key: HIVE-1808
URL: https://issues.apache.org/jira/browse/HIVE-1808
Project: Hive
Issue Type: Bug
Reporter: Liyin Tang
Assignee: Liyin Tang
Attachments: hive-1808-1.patch

In this test case, there are 2 SET statements:
set hive.mapjoin.localtask.max.memory.usage = 0.0001;
set hive.mapjoin.check.memory.rows = 2;
But in HiveConf, the names of these 2 conf variables do not match the names used here.
[jira] Commented: (HIVE-1802) Encode MapReduce Shuffling Keys Differently for Single string/bigint Key
[ https://issues.apache.org/jira/browse/HIVE-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12935043#action_12935043 ] He Yongqiang commented on HIVE-1802: For any Group by, we needed 2 mem-copies. One from Text objects to buffer, one add an extra tag to the end of the buffer. I think for Join we will need array copy and put a tag at the end. I mean optimize BinarySortableSerDe might be a better idea to optimize cases when need array copy. The code can be cleaner and simpler if only optimize the one Text key case in Group by, and put other optimizations in BinarySortableSerDe. Encode MapReduce Shuffling Keys Differently for Single string/bigint Key - Key: HIVE-1802 URL: https://issues.apache.org/jira/browse/HIVE-1802 Project: Hive Issue Type: Improvement Reporter: Siying Dong Assignee: Siying Dong Attachments: HIVE-1802.1.patch, HIVE-1802.2.patch Delimiters are not needed if we only have one shuffling key, and in the same time escaping delimiters are not needed. We can save some CPU time on serializing and shuffle slightly less amount of data to save memory footprint and network traffic. Also there is a bug that for group-by, we by mistake add a -1 to the end of the key and pay one more unnecessary mem-copy. Can be easily fixed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1797) Compressed the hashtable dump file before put into distributed cache
[ https://issues.apache.org/jira/browse/HIVE-1797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12935044#action_12935044 ]

He Yongqiang commented on HIVE-1797:

will take a look

Compressed the hashtable dump file before put into distributed cache

Key: HIVE-1797
URL: https://issues.apache.org/jira/browse/HIVE-1797
Project: Hive
Issue Type: Improvement
Components: Query Processor
Affects Versions: 0.7.0
Reporter: Liyin Tang
Assignee: Liyin Tang
Attachments: hive-1797.patch, hive-1797_3.patch

Clearly, the size of the small table is the performance bottleneck for map join, because it affects both the memory usage and the size of the dumped hashtable file. That means there are 2 boundaries on map join performance:
1) The memory usage for the local task and the mapred task
2) The dumped hashtable file size for the distributed cache

The reason the test case in the last email spends most of its execution time on initializing is that it hits the second boundary. Since we have already bounded the memory usage, one thing we can do is make sure the performance never hits the second boundary before it hits the first.

Assuming the heap size is 1.6 G and the small table file size is 15M compressed (75M uncompressed), the local task can hold roughly 1.5M unique rows in memory. The dumped file size will be roughly 150M, which is too large to put into the distributed cache.

From experiments, we can basically conclude that when the dumped file size is smaller than 30M, the distributed cache works well and all the mappers will be initialized in a short time (less than 30 secs).

One easy implementation is to compress the hashtable file. I used gzip to compress the hashtable file, and the file size went from 100M to 13M. After several tests, all the mappers were initialized in less than 23 secs.

But this solution adds some decompression overhead to each mapper. Mappers on the same machine will duplicate the decompression work. Maybe in the future, we can let the distributed cache support this.
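As an illustration of the compress-before-distribute idea, the following is a minimal sketch using Python's standard gzip module. It is not the code from the attached patches (Hive itself is Java); the function names and file paths are hypothetical.

```python
import gzip
import shutil


def compress_file(src_path, dst_path):
    """Gzip src_path into dst_path, streaming so the whole dump
    never has to fit in memory at once."""
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)


def decompress_file(src_path, dst_path):
    """Inverse step that each mapper would run before loading the hashtable."""
    with gzip.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
```

The trade-off is the one noted in the description: a smaller file to ship through the distributed cache, paid for with a decompression step on every mapper.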
[jira] Updated: (HIVE-1807) No Element found exception in BucketMapJoinOptimizer
[ https://issues.apache.org/jira/browse/HIVE-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Yongqiang updated HIVE-1807: --- Attachment: HIVE-1807.2.patch No Element found exception in BucketMapJoinOptimizer Key: HIVE-1807 URL: https://issues.apache.org/jira/browse/HIVE-1807 Project: Hive Issue Type: Bug Reporter: He Yongqiang Assignee: He Yongqiang Attachments: HIVE-1807.1.patch, HIVE-1807.2.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1807) No Element found exception in BucketMapJoinOptimizer
[ https://issues.apache.org/jira/browse/HIVE-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12935059#action_12935059 ] He Yongqiang commented on HIVE-1807: a new patch addressed Ning's comments No Element found exception in BucketMapJoinOptimizer Key: HIVE-1807 URL: https://issues.apache.org/jira/browse/HIVE-1807 Project: Hive Issue Type: Bug Reporter: He Yongqiang Assignee: He Yongqiang Attachments: HIVE-1807.1.patch, HIVE-1807.2.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Hudson build is back to normal : Hive-trunk-h0.20 #431
See https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/431/
[jira] Commented: (HIVE-1804) Mapjoin will fail if there are no files associating with the join tables
[ https://issues.apache.org/jira/browse/HIVE-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12935080#action_12935080 ] He Yongqiang commented on HIVE-1804: will take a look Mapjoin will fail if there are no files associating with the join tables Key: HIVE-1804 URL: https://issues.apache.org/jira/browse/HIVE-1804 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.7.0 Reporter: Liyin Tang Assignee: Liyin Tang Fix For: 0.7.0 Attachments: hive-1804-1.patch, hive-1804-2.patch If there are some empty tables without any file associated, the map join will fail. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
hive roadmap
For the interest of the community, we have updated the following page: http://wiki.apache.org/hadoop/Hive/Roadmap If you are planning to work on a task, please add it to the appropriate section. This helps to track the major new features, and also helps new contributors to pick up a project. Thanks, -Namit/John
[jira] Created: (HIVE-1809) Hive comparison operators are broken for NaN values
Hive comparison operators are broken for NaN values

Key: HIVE-1809
URL: https://issues.apache.org/jira/browse/HIVE-1809
Project: Hive
Issue Type: Bug
Reporter: Paul Butler
Assignee: Paul Butler

Comparisons between NaN values and doubles do not work as expected:

hive> select 'NaN' = 4.3 from data_one limit 1;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Execution log at: /tmp/pbutler/pbutler_20101123145656_d23f9b77-8907-4ed3-aef9-8b99a1cc3138.log
Job running in-process (local Hadoop)
2010-11-23 14:56:40,488 null map = 100%, reduce = 0%
Ended Job = job_local_0001
OK
true
Time taken: 9.47 seconds

hive> select 4 'NaN' from data_one limit 1;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Execution log at: /tmp/pbutler/pbutler_20101123145858_0d243ac2-f745-4e25-9a38-509bef3bb370.log
Job running in-process (local Hadoop)
2010-11-23 14:58:45,689 null map = 100%, reduce = 0%
Ended Job = job_local_0001
OK
false
Time taken: 3.938 seconds
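For reference, a minimal sketch of how NaN compares under IEEE 754 rules, which is presumably what "as expected" refers to. That reading is an assumption: the report does not list the expected results or say what the attached patch changes. Python floats are used here because they follow the same rules as Java doubles for these operators.

```python
import math

nan = float("nan")

# Under IEEE 754, equality and every ordered comparison involving NaN
# is false; only "not equal" is true, even against NaN itself.
nan_comparisons = {
    "nan == 4.3": nan == 4.3,
    "nan != 4.3": nan != 4.3,
    "4 < nan": 4 < nan,
    "4 > nan": 4 > nan,
    "nan == nan": nan == nan,
}

for expression, result in nan_comparisons.items():
    print(f"{expression:12} -> {result}")

# The reliable way to detect NaN is an explicit check, not a comparison.
print("isnan:", math.isnan(nan))
```

By these rules the first query in the report, which returned true for an equality test against NaN, would be expected to return false.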
[jira] Updated: (HIVE-1809) Hive comparison operators are broken for NaN values
[ https://issues.apache.org/jira/browse/HIVE-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Butler updated HIVE-1809:

Attachment: HIVE-1809.patch

Hive comparison operators are broken for NaN values

Key: HIVE-1809
URL: https://issues.apache.org/jira/browse/HIVE-1809
Project: Hive
Issue Type: Bug
Reporter: Paul Butler
Assignee: Paul Butler
Attachments: HIVE-1809.patch

Comparisons between NaN values and doubles do not work as expected:

hive> select 'NaN' = 4.3 from data_one limit 1;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Execution log at: /tmp/pbutler/pbutler_20101123145656_d23f9b77-8907-4ed3-aef9-8b99a1cc3138.log
Job running in-process (local Hadoop)
2010-11-23 14:56:40,488 null map = 100%, reduce = 0%
Ended Job = job_local_0001
OK
true
Time taken: 9.47 seconds

hive> select 4 'NaN' from data_one limit 1;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Execution log at: /tmp/pbutler/pbutler_20101123145858_0d243ac2-f745-4e25-9a38-509bef3bb370.log
Job running in-process (local Hadoop)
2010-11-23 14:58:45,689 null map = 100%, reduce = 0%
Ended Job = job_local_0001
OK
false
Time taken: 3.938 seconds
[jira] Updated: (HIVE-1807) No Element found exception in BucketMapJoinOptimizer
[ https://issues.apache.org/jira/browse/HIVE-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1807: - Resolution: Fixed Fix Version/s: 0.7.0 Status: Resolved (was: Patch Available) Committed. Thanks Yongqiang! No Element found exception in BucketMapJoinOptimizer Key: HIVE-1807 URL: https://issues.apache.org/jira/browse/HIVE-1807 Project: Hive Issue Type: Bug Reporter: He Yongqiang Assignee: He Yongqiang Fix For: 0.7.0 Attachments: HIVE-1807.1.patch, HIVE-1807.2.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1809) Hive comparison operators are broken for NaN values
[ https://issues.apache.org/jira/browse/HIVE-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12935161#action_12935161 ]

Ning Zhang commented on HIVE-1809:

+1. start testing.

Hive comparison operators are broken for NaN values

Key: HIVE-1809
URL: https://issues.apache.org/jira/browse/HIVE-1809
Project: Hive
Issue Type: Bug
Reporter: Paul Butler
Assignee: Paul Butler
Attachments: HIVE-1809.patch

Comparisons between NaN values and doubles do not work as expected:

hive> select 'NaN' = 4.3 from data_one limit 1;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Execution log at: /tmp/pbutler/pbutler_20101123145656_d23f9b77-8907-4ed3-aef9-8b99a1cc3138.log
Job running in-process (local Hadoop)
2010-11-23 14:56:40,488 null map = 100%, reduce = 0%
Ended Job = job_local_0001
OK
true
Time taken: 9.47 seconds

hive> select 4 'NaN' from data_one limit 1;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Execution log at: /tmp/pbutler/pbutler_20101123145858_0d243ac2-f745-4e25-9a38-509bef3bb370.log
Job running in-process (local Hadoop)
2010-11-23 14:58:45,689 null map = 100%, reduce = 0%
Ended Job = job_local_0001
OK
false
Time taken: 3.938 seconds
[jira] Commented: (HIVE-1792) track the joins which are being converted to map-join automatically
[ https://issues.apache.org/jira/browse/HIVE-1792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12935167#action_12935167 ] Namit Jain commented on HIVE-1792: -- No need for this track the joins which are being converted to map-join automatically --- Key: HIVE-1792 URL: https://issues.apache.org/jira/browse/HIVE-1792 Project: Hive Issue Type: New Feature Components: Query Processor Affects Versions: 0.7.0 Reporter: Liyin Tang Assignee: Liyin Tang Fix For: 0.7.0 Attachments: hive-1792-1.patch, hive-1792-2.patch We should be able to track how many queries (join) got converted to map-join -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (HIVE-1785) change Pre/Post Query Hooks to take in 1 parameter: HookContext
[ https://issues.apache.org/jira/browse/HIVE-1785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi resolved HIVE-1785. -- Resolution: Fixed Release Note: PreExecute and PostExecute have been deprecated in favor of ExecuteWithHookContext. Committed. Thanks Liyin! Could you explain this change on the user mailing list? Also, we need a followup patch for changing the description of hive.exec.pre/post.hooks in conf/hive-default.xml (I just remembered that). change Pre/Post Query Hooks to take in 1 parameter: HookContext --- Key: HIVE-1785 URL: https://issues.apache.org/jira/browse/HIVE-1785 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.7.0 Reporter: Namit Jain Assignee: Liyin Tang Fix For: 0.7.0 Attachments: hive-1785_3.patch, hive-1785_4.patch, hive-1785_6.patch, hive_1785_1.patch, hive_1785_2.patch This way, it would be possible to add new parameters to the hooks without changing the existing hooks. This will be a incompatible change, and all the hooks need to change to the new API -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1538) FilterOperator is applied twice with ppd on.
[ https://issues.apache.org/jira/browse/HIVE-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HIVE-1538:

Attachment: patch-1538.txt

Patch with following changes:
* creates a filter operator with unpushed predicates, as a child of the operator through which the predicates could not be pushed.
* removes original filter operator if it does not have any non-final candidates. With creating a child filter operator with the non-final candidates and removing the original one, I'm seeing some problems. So, would like to do that in a followup jira.
* Updates all the tests with new explain plans.

FilterOperator is applied twice with ppd on.

Key: HIVE-1538
URL: https://issues.apache.org/jira/browse/HIVE-1538
Project: Hive
Issue Type: Bug
Components: Query Processor
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
Attachments: patch-1538.txt

With hive.optimize.ppd set to true, FilterOperator is applied twice. And it seems second operator is always filtering zero rows.
[jira] Updated: (HIVE-1538) FilterOperator is applied twice with ppd on.
[ https://issues.apache.org/jira/browse/HIVE-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated HIVE-1538: -- Fix Version/s: 0.7.0 Status: Patch Available (was: Open) FilterOperator is applied twice with ppd on. Key: HIVE-1538 URL: https://issues.apache.org/jira/browse/HIVE-1538 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Amareshwari Sriramadasu Assignee: Amareshwari Sriramadasu Fix For: 0.7.0 Attachments: patch-1538.txt With hive.optimize.ppd set to true, FilterOperator is applied twice. And it seems second operator is always filtering zero rows. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1096) Hive Variables
[ https://issues.apache.org/jira/browse/HIVE-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-1096: -- Attachment: hive-1096-15.patch.txt Hive Variables -- Key: HIVE-1096 URL: https://issues.apache.org/jira/browse/HIVE-1096 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Edward Capriolo Assignee: Edward Capriolo Fix For: 0.7.0 Attachments: 1096-9.diff, hive-1096-10-patch.txt, hive-1096-11-patch.txt, hive-1096-12.patch.txt, hive-1096-15.patch.txt, hive-1096-2.diff, hive-1096-7.diff, hive-1096-8.diff, hive-1096.diff From mailing list: --Amazon Elastic MapReduce version of Hive seems to have a nice feature called Variables. Basically you can define a variable via command-line while invoking hive with -d DT=2009-12-09 and then refer to the variable via ${DT} within the hive queries. This could be extremely useful. I can't seem to find this feature even on trunk. Is this feature currently anywhere in the roadmap?-- This could be implemented in many places. A simple place to put this is in Driver.compile or Driver.run we can do string substitutions at that level, and further downstream need not be effected. There could be some benefits to doing this further downstream, parser,plan. but based on the simple needs we may not need to overthink this. I will get started on implementing in compile unless someone wants to discuss this more. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1096) Hive Variables
[ https://issues.apache.org/jira/browse/HIVE-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Capriolo updated HIVE-1096:

Status: Patch Available (was: Open)

* trunk/conf/hive-default.xml: Spelling: substituation
Fixed

* trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/VariableSubstitution.java: Make these variables private?
Private variables are what got us into the mess with hadoop. I am not going to repeat the problem.

* trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/AddResourceProcessor.java: Since we want to do substitution for all commands it would probably make sense to do the substitution in CommandProcessorFactory.get() and make CommandProcessor an abstract class with the following implementation: ... In other words, CommandProcessorFactory would return a CommandProcessor object that has been initialized with a substituted copy of the command.
No. No more refactoring. It is working the way it is. Using factories is going to be major. I'm tired. It does not prove anything since this entire process is not very clever anyway. Currently it is slightly baked, but I believe that is better than being over designed.

* trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/SetProcessor.java: Replace these string literals with constants, e.g:
    public static final String ENV_PREFIX = "env:";
    public static final String SYSTEM_PREFIX = "system:";
    public static final String HIVECONF_PREFIX = "hiveconf:";
Fixed

* trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/SetProcessor.java:
    String propName = varname.substring(SYSTEM_PREFIX.length());
Fixed

* trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/SetProcessor.java: Can we remove this special case for silent? In SessionState this actually maps to hive.session.silent and I don't see any test cases that cover this case, i.e. that call set silent or set silent=x. It also seems that this introduces an inconsistency since set silent will show the value of hive.session.silent, but the output of set will not list a value for the property silent. Anyone know if there is any older code that depends on this behavior?
Do not really know. do not really care :) Out of scope. It is there I am leaving it.

As for the VAR. Turns out supporting this is not very easy. Adding Options Parsing to the CLI works, however the session state gives you nowhere to store variables except in the hive conf. SetProcessor works with SessionState not CLI SessionState. Again big refactoring is needed. What I did do is remove support for set y=${x}. This patch only adds set y=${hiveconf:x}. Thus if someone cares to add VAR X or ${x} or determine how to change the CLI to add this other map that can be shared across the session state this patch is not in the way. Thus substitution only works for ${hiveconf:x} ${system:x} and ${env:x}. Implementing ${x} and var can be done in a separate issue.

Hive Variables

Key: HIVE-1096
URL: https://issues.apache.org/jira/browse/HIVE-1096
Project: Hive
Issue Type: New Feature
Components: Query Processor
Reporter: Edward Capriolo
Assignee: Edward Capriolo
Fix For: 0.7.0
Attachments: 1096-9.diff, hive-1096-10-patch.txt, hive-1096-11-patch.txt, hive-1096-12.patch.txt, hive-1096-15.patch.txt, hive-1096-2.diff, hive-1096-7.diff, hive-1096-8.diff, hive-1096.diff

From mailing list:
--Amazon Elastic MapReduce version of Hive seems to have a nice feature called Variables. Basically you can define a variable via command-line while invoking hive with -d DT=2009-12-09 and then refer to the variable via ${DT} within the hive queries. This could be extremely useful. I can't seem to find this feature even on trunk. Is this feature currently anywhere in the roadmap?--

This could be implemented in many places. A simple place to put this is in Driver.compile or Driver.run: we can do string substitutions at that level, and further downstream need not be affected. There could be some benefits to doing this further downstream (parser, plan), but based on the simple needs we may not need to overthink this. I will get started on implementing in compile unless someone wants to discuss this more.
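The string-substitution approach discussed in this thread can be sketched roughly as follows. This is an illustration in Python, not the VariableSubstitution.java code from the patch; the function name, the dict-based inputs, and the choice to leave unknown variables untouched are all assumptions. It handles only the three prefixed forms (${hiveconf:x}, ${system:x}, ${env:x}) and deliberately ignores a bare ${x}, matching the scope described in the comment.

```python
import os
import re

# Matches ${hiveconf:name}, ${system:name} and ${env:name} only.
_PREFIXED_VAR = re.compile(r"\$\{(hiveconf|system|env):([^}]+)\}")


def substitute(command, hiveconf, system=None, env=None):
    """Return command with prefixed variables replaced by their values.

    hiveconf, system, env: plain mappings standing in for the Hive
    configuration, JVM system properties and process environment.
    Variables with no value are left as written (an assumption).
    """
    sources = {
        "hiveconf": hiveconf,
        "system": system or {},
        "env": os.environ if env is None else env,
    }

    def replace(match):
        prefix, name = match.group(1), match.group(2)
        return sources[prefix].get(name, match.group(0))

    return _PREFIXED_VAR.sub(replace, command)


query = "select * from t where dt = '${hiveconf:DT}'"
print(substitute(query, {"DT": "2009-12-09"}))
```

Doing the replacement on the raw command string before it reaches the parser is what keeps everything further downstream unaware of variables.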
[jira] Commented: (HIVE-1792) track the joins which are being converted to map-join automatically
[ https://issues.apache.org/jira/browse/HIVE-1792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12935234#action_12935234 ]

Liyin Tang commented on HIVE-1792:

There are 2 cases in which the common join runs. One is when the resolver of the conditional task returns the common join. The other is when the map join local task fails. If we do not reset the tag while getting the backup task, how can we distinguish these 2 cases?

track the joins which are being converted to map-join automatically

Key: HIVE-1792
URL: https://issues.apache.org/jira/browse/HIVE-1792
Project: Hive
Issue Type: New Feature
Components: Query Processor
Affects Versions: 0.7.0
Reporter: Liyin Tang
Assignee: Liyin Tang
Fix For: 0.7.0
Attachments: hive-1792-1.patch, hive-1792-2.patch, hive-1792-3.patch

We should be able to track how many queries (join) got converted to map-join