date:20101123


[ 
https://issues.apache.org/jira/browse/HIVE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12934942#action_12934942
 ] 

Ning Zhang commented on HIVE-1526:
--

Carl, can you upload a new patch that takes my other comments into consideration? 
I'll start testing.

 Hive should depend on a release version of Thrift
 -

 Key: HIVE-1526
 URL: https://issues.apache.org/jira/browse/HIVE-1526
 Project: Hive
  Issue Type: Task
  Components: Build Infrastructure, Clients
Reporter: Carl Steinbach
Assignee: Carl Steinbach
 Fix For: 0.7.0

 Attachments: HIVE-1526-no-codegen.3.patch.txt, HIVE-1526.2.patch.txt, 
 HIVE-1526.3.patch.txt, hive-1526.txt, libfb303.jar, libthrift.jar, 
 serde2_test.patch, svn_rm.sh, thrift-0.5.0.jar, thrift-fb303-0.5.0.jar


 Hive should depend on a release version of Thrift, and ideally it should use 
 Ivy to resolve this dependency.
 The Thrift folks are working on adding Thrift artifacts to a maven repository 
 here: https://issues.apache.org/jira/browse/THRIFT-363

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HIVE-1804) Mapjoin will fail if there are no files associating with the join tables


 [ 
https://issues.apache.org/jira/browse/HIVE-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1804:
-

Attachment: hive-1804-2.patch

Removed all the debug print statements.
Please review.

 Mapjoin will fail if there are no files associating with the join tables
 

 Key: HIVE-1804
 URL: https://issues.apache.org/jira/browse/HIVE-1804
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.7.0
Reporter: Liyin Tang
Assignee: Liyin Tang
 Fix For: 0.7.0

 Attachments: hive-1804-1.patch, hive-1804-2.patch


 If there are some empty tables without any file associated, the map join will 
 fail.


[jira] Commented: (HIVE-1802) Encode MapReduce Shuffling Keys Differently for Single string/bigint Key


[ 
https://issues.apache.org/jira/browse/HIVE-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12934957#action_12934957
 ] 

He Yongqiang commented on HIVE-1802:


For a single Text key in a join, I think your patch still needs an array copy.  
For a single Text key in a group-by, the array copy is not needed.

I mean the new code could handle only the single Text key case in group-by, where we can avoid 
the array copy.

For other cases, maybe we can optimize BinarySortableSerDe to use array copy 
instead of write?

 Encode MapReduce Shuffling Keys Differently for  Single string/bigint Key
 -

 Key: HIVE-1802
 URL: https://issues.apache.org/jira/browse/HIVE-1802
 Project: Hive
  Issue Type: Improvement
Reporter: Siying Dong
Assignee: Siying Dong
 Attachments: HIVE-1802.1.patch


 Delimiters are not needed if we only have one shuffling key, and at the same 
 time escaping delimiters is not needed. We can save some CPU time on 
 serialization and shuffle slightly less data, reducing memory footprint 
 and network traffic.
 Also, there is a bug: for group-by, we mistakenly add a -1 to the end of 
 the key and pay for one more unnecessary mem-copy. This can be easily fixed.


[jira] Updated: (HIVE-1802) Encode MapReduce Shuffling Keys Differently for Single string/bigint Key

2010-11-23 Thread Siying Dong (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-1802:
--

Status: Patch Available  (was: Open)

 Encode MapReduce Shuffling Keys Differently for  Single string/bigint Key
 -

 Key: HIVE-1802
 URL: https://issues.apache.org/jira/browse/HIVE-1802
 Project: Hive
  Issue Type: Improvement
Reporter: Siying Dong
Assignee: Siying Dong
 Attachments: HIVE-1802.1.patch, HIVE-1802.2.patch


 Delimiters are not needed if we only have one shuffling key, and at the same 
 time escaping delimiters is not needed. We can save some CPU time on 
 serialization and shuffle slightly less data, reducing memory footprint 
 and network traffic.
 Also, there is a bug: for group-by, we mistakenly add a -1 to the end of 
 the key and pay for one more unnecessary mem-copy. This can be easily fixed.


[jira] Commented: (HIVE-1802) Encode MapReduce Shuffling Keys Differently for Single string/bigint Key

2010-11-23 Thread Siying Dong (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12934980#action_12934980
 ] 

Siying Dong commented on HIVE-1802:
---

Previously, any group-by needed 2 mem-copies: one from the Text objects to the buffer, and one 
to add an extra tag to the end of the buffer.
Now, the case with a single Text key takes no mem-copy (except that the first byte is 0), 
and multiple keys need one (from the Text objects to the buffer).

Previously, a join needed 2 mem-copies: one from Text to the buffer, and one to add the tag.
Now a single Text key needs one copy from the buffer to add a tag. In other cases we 
still need two copies.

 Encode MapReduce Shuffling Keys Differently for  Single string/bigint Key
 -

 Key: HIVE-1802
 URL: https://issues.apache.org/jira/browse/HIVE-1802
 Project: Hive
  Issue Type: Improvement
Reporter: Siying Dong
Assignee: Siying Dong
 Attachments: HIVE-1802.1.patch, HIVE-1802.2.patch


 Delimiters are not needed if we only have one shuffling key, and at the same 
 time escaping delimiters is not needed. We can save some CPU time on 
 serialization and shuffle slightly less data, reducing memory footprint 
 and network traffic.
 Also, there is a bug: for group-by, we mistakenly add a -1 to the end of 
 the key and pay for one more unnecessary mem-copy. This can be easily fixed.


[jira] Created: (HIVE-1806) The merge criteria on dynamic partitions should be per partition

The merge criteria on dynamic partitions should be per partition
--

 Key: HIVE-1806
 URL: https://issues.apache.org/jira/browse/HIVE-1806
 Project: Hive
  Issue Type: Bug
Reporter: Ning Zhang
Assignee: Ning Zhang


Currently the criterion for whether a merge job should be fired on dynamically 
generated partitions is the average file size across all dynamic 
partitions. It is very common that some dynamic partitions contain mostly 
large files and some contain mostly small files. Even when the average size 
of all files is larger than hive.merge.smallfiles.avgsize, we should 
still merge those partitions, and only those, that contain small files. 
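
The difference between the global and the per-partition criterion can be sketched as follows. This is a minimal illustration, not the Hive implementation: the class name, method names, partition names, file sizes and the 16 MB threshold are all assumptions chosen for the example; only hive.merge.smallfiles.avgsize comes from the issue text.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: compares a global average-size check with a per-partition one.
class PerPartitionMergeSketch {

    // Average file size over every file in every partition (the current, global criterion).
    static long globalAverage(Map<String, long[]> fileSizesByPartition) {
        long total = 0;
        long count = 0;
        for (long[] sizes : fileSizesByPartition.values()) {
            for (long size : sizes) {
                total += size;
                count++;
            }
        }
        return count == 0 ? 0 : total / count;
    }

    // Partitions whose own average file size is below the threshold (the proposed criterion).
    static List<String> partitionsToMerge(Map<String, long[]> fileSizesByPartition,
                                          long avgSizeThreshold) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, long[]> entry : fileSizesByPartition.entrySet()) {
            long[] sizes = entry.getValue();
            if (sizes.length == 0) {
                continue; // nothing to merge in an empty partition
            }
            long total = 0;
            for (long size : sizes) {
                total += size;
            }
            if (total / sizes.length < avgSizeThreshold) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        Map<String, long[]> files = new LinkedHashMap<>();
        files.put("ds=1", new long[] {256 * mb, 256 * mb});     // mostly large files
        files.put("ds=2", new long[] {1 * mb, 2 * mb, 3 * mb}); // mostly small files
        long threshold = 16 * mb; // stand-in value for hive.merge.smallfiles.avgsize

        // The global average is dominated by the large files, so no merge would be fired.
        System.out.println(globalAverage(files) < threshold);
        // The per-partition check still selects the partition with small files.
        System.out.println(partitionsToMerge(files, threshold));
    }
}
```

With these made-up sizes the global average is well above the threshold, while the per-partition check selects only ds=2.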


[jira] Created: (HIVE-1807) No Element found exception in BucketMapJoinOptimizer

No Element found exception in BucketMapJoinOptimizer


 Key: HIVE-1807
 URL: https://issues.apache.org/jira/browse/HIVE-1807
 Project: Hive
  Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang





[jira] Updated: (HIVE-1807) No Element found exception in BucketMapJoinOptimizer


 [ 
https://issues.apache.org/jira/browse/HIVE-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1807:
---

Attachment: HIVE-1807.1.patch

 No Element found exception in BucketMapJoinOptimizer
 

 Key: HIVE-1807
 URL: https://issues.apache.org/jira/browse/HIVE-1807
 Project: Hive
  Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: HIVE-1807.1.patch





[jira] Updated: (HIVE-1792) track the joins which are being converted to map-join automatically


 [ 
https://issues.apache.org/jira/browse/HIVE-1792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1792:
-

Attachment: (was: hive-1792-2.patch)

 track the joins which are being converted to map-join automatically
 ---

 Key: HIVE-1792
 URL: https://issues.apache.org/jira/browse/HIVE-1792
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.7.0
Reporter: Liyin Tang
Assignee: Liyin Tang
 Fix For: 0.7.0

 Attachments: hive-1792-1.patch


 We should be able to track how many queries (join) got converted to
 map-join


[jira] Assigned: (HIVE-1808) bug in auto_join25.q


 [ 
https://issues.apache.org/jira/browse/HIVE-1808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang reassigned HIVE-1808:


Assignee: Liyin Tang

 bug in auto_join25.q
 

 Key: HIVE-1808
 URL: https://issues.apache.org/jira/browse/HIVE-1808
 Project: Hive
  Issue Type: Bug
Reporter: Liyin Tang
Assignee: Liyin Tang
 Attachments: hive-1808-1.patch


 In this test case, there are 2 SET statements:
 set hive.mapjoin.localtask.max.memory.usage = 0.0001;
 set hive.mapjoin.check.memory.rows = 2;
 But in HiveConf, the names of these 2 conf variables do not match the 
 names used in the test.


[jira] Commented: (HIVE-1802) Encode MapReduce Shuffling Keys Differently for Single string/bigint Key


[ 
https://issues.apache.org/jira/browse/HIVE-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12935043#action_12935043
 ] 

He Yongqiang commented on HIVE-1802:


For any Group by, we needed 2 mem-copies. One from Text objects to buffer, 
one add an extra tag to the end of the buffer.
I think for a join we will still need an array copy and a tag at the end.

I mean that optimizing BinarySortableSerDe might be a better way to handle the cases 
that need an array copy.
The code can be cleaner and simpler if it only optimizes the single Text key case in 
group-by and puts the other optimizations in BinarySortableSerDe.

 Encode MapReduce Shuffling Keys Differently for  Single string/bigint Key
 -

 Key: HIVE-1802
 URL: https://issues.apache.org/jira/browse/HIVE-1802
 Project: Hive
  Issue Type: Improvement
Reporter: Siying Dong
Assignee: Siying Dong
 Attachments: HIVE-1802.1.patch, HIVE-1802.2.patch


 Delimiters are not needed if we only have one shuffling key, and at the same 
 time escaping delimiters is not needed. We can save some CPU time on 
 serialization and shuffle slightly less data, reducing memory footprint 
 and network traffic.
 Also, there is a bug: for group-by, we mistakenly add a -1 to the end of 
 the key and pay for one more unnecessary mem-copy. This can be easily fixed.


[jira] Commented: (HIVE-1797) Compressed the hashtable dump file before put into distributed cache

[
https://issues.apache.org/jira/browse/HIVE-1797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12935044#action_12935044
]

He Yongqiang commented on HIVE-1797:

will take a look

Compressed the hashtable dump file before put into distributed cache

Key: HIVE-1797
URL: https://issues.apache.org/jira/browse/HIVE-1797
Project: Hive
Issue Type: Improvement
Components: Query Processor
Affects Versions: 0.7.0
Reporter: Liyin Tang
Assignee: Liyin Tang
Attachments: hive-1797.patch, hive-1797_3.patch

Clearly, the size of the small table is the performance bottleneck for map join,
because it affects both the memory usage and the size of the dumped
hashtable file.
That means there are 2 boundaries on map join performance:
1) The memory usage of the local task and the mapred task
2) The dumped hashtable file size for the distributed cache
The test case in the last email spends most of its execution time on
initialization because it hits the second boundary.
Since we have already bounded the memory usage, one thing we can do is make sure
the job never hits the second boundary before it hits the first
boundary.
Assuming the heap size is 1.6G and the small table file size is 15M
compressed (75M uncompressed),
the local task can hold roughly 1.5M unique rows in memory.
The dumped file will then be roughly 150M, which is too large to put into the
distributed cache.

From experiments, we can basically conclude that when the dumped file is
smaller than 30M,
the distributed cache works well and all the mappers are initialized in
a short time (less than 30 secs).
One easy implementation is to compress the hashtable file.
I used gzip to compress the hashtable file, and the file size shrank
from 100M to 13M.
After several tests, all the mappers were initialized in less than 23 secs.
But this solution adds some decompression overhead to each mapper:
mappers on the same machine will duplicate the decompression work.
Maybe in the future we can let the distributed cache support this.
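
The gzip step described above can be sketched with the standard java.util.zip classes. This is only an illustration under stated assumptions: the class and method names are hypothetical, the payload is an in-memory byte array standing in for the dumped hashtable file, and nothing here is taken from the attached patch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: gzip a serialized payload before shipping it, gunzip it on the reader side.
class GzipDumpSketch {

    // Producer side (the local task in the description): compress the dump once.
    static byte[] compress(byte[] raw) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Consumer side (each mapper in the description): every reader pays this cost again.
    static byte[] decompress(byte[] compressed) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = gz.read(buf)) != -1) {
                bos.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        // Repetitive stand-in payload; real hashtable dumps will compress differently.
        byte[] raw = new byte[1 << 20];
        for (int i = 0; i < raw.length; i++) {
            raw[i] = (byte) (i % 16);
        }
        byte[] packed = compress(raw);
        System.out.println(raw.length + " -> " + packed.length + " bytes");
        System.out.println(Arrays.equals(raw, decompress(packed)));
    }
}
```

The decompress call is the per-mapper overhead mentioned in the description: every reader repeats it, even when several readers run on the same machine.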


[jira] Updated: (HIVE-1807) No Element found exception in BucketMapJoinOptimizer


 [ 
https://issues.apache.org/jira/browse/HIVE-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1807:
---

Attachment: HIVE-1807.2.patch

 No Element found exception in BucketMapJoinOptimizer
 

 Key: HIVE-1807
 URL: https://issues.apache.org/jira/browse/HIVE-1807
 Project: Hive
  Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: HIVE-1807.1.patch, HIVE-1807.2.patch





[jira] Commented: (HIVE-1807) No Element found exception in BucketMapJoinOptimizer


[ 
https://issues.apache.org/jira/browse/HIVE-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12935059#action_12935059
 ] 

He Yongqiang commented on HIVE-1807:


A new patch that addresses Ning's comments.

 No Element found exception in BucketMapJoinOptimizer
 

 Key: HIVE-1807
 URL: https://issues.apache.org/jira/browse/HIVE-1807
 Project: Hive
  Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: HIVE-1807.1.patch, HIVE-1807.2.patch





Hudson build is back to normal : Hive-trunk-h0.20 #431

2010-11-23 Thread Apache Hudson Server

See https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/431/

[jira] Commented: (HIVE-1804) Mapjoin will fail if there are no files associating with the join tables


[ 
https://issues.apache.org/jira/browse/HIVE-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12935080#action_12935080
 ] 

He Yongqiang commented on HIVE-1804:


will take a look

 Mapjoin will fail if there are no files associating with the join tables
 

 Key: HIVE-1804
 URL: https://issues.apache.org/jira/browse/HIVE-1804
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.7.0
Reporter: Liyin Tang
Assignee: Liyin Tang
 Fix For: 0.7.0

 Attachments: hive-1804-1.patch, hive-1804-2.patch


 If there are some empty tables without any file associated, the map join will 
 fail.


hive roadmap

2010-11-23 Thread Namit Jain

For the interest of the community, we have updated the following page:
  http://wiki.apache.org/hadoop/Hive/Roadmap

If you are planning to work on a task, please add it to the appropriate section.
This helps to track the major new features, and also helps new contributors
pick up a project.



Thanks,
-Namit/John

[jira] Created: (HIVE-1809) Hive comparison operators are broken for NaN values

2010-11-23 Thread Paul Butler (JIRA)

Hive comparison operators are broken for NaN values
---

 Key: HIVE-1809
 URL: https://issues.apache.org/jira/browse/HIVE-1809
 Project: Hive
  Issue Type: Bug
Reporter: Paul Butler
Assignee: Paul Butler


Comparisons between NaN values and doubles do not work as expected:

hive> select 'NaN' = 4.3 from data_one limit 1;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Execution log at: 
/tmp/pbutler/pbutler_20101123145656_d23f9b77-8907-4ed3-aef9-8b99a1cc3138.log
Job running in-process (local Hadoop)
2010-11-23 14:56:40,488 null map = 100%,  reduce = 0%
Ended Job = job_local_0001
OK
true
Time taken: 9.47 seconds
hive> select 4  'NaN' from data_one limit 1;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Execution log at: 
/tmp/pbutler/pbutler_20101123145858_0d243ac2-f745-4e25-9a38-509bef3bb370.log
Job running in-process (local Hadoop)
2010-11-23 14:58:45,689 null map = 100%,  reduce = 0%
Ended Job = job_local_0001
OK
false
Time taken: 3.938 seconds
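
For reference, this is how comparisons involving NaN behave for Java primitive doubles, which follow IEEE 754. The sketch assumes that IEEE 754 semantics are the "expected" behavior the report has in mind; the report itself does not say which semantics the patch adopts, and the class and method names below are hypothetical.

```java
// Hypothetical sketch: IEEE 754 comparison semantics for NaN with Java primitive doubles.
class NaNComparisonSketch {

    // Primitive equality: always false when either operand is NaN.
    static boolean ieeeEquals(double a, double b) {
        return a == b;
    }

    // Primitive ordering: always false when either operand is NaN.
    static boolean ieeeGreaterThan(double a, double b) {
        return a > b;
    }

    // Total ordering used by sorting code: NaN is ordered after every other value.
    static int totalOrder(double a, double b) {
        return Double.compare(a, b);
    }

    public static void main(String[] args) {
        double nan = Double.parseDouble("NaN"); // the string 'NaN' converted to a double

        System.out.println(ieeeEquals(nan, 4.3));      // false
        System.out.println(ieeeGreaterThan(4, nan));   // false
        System.out.println(ieeeGreaterThan(nan, 4));   // false
        System.out.println(ieeeEquals(nan, nan));      // false
        System.out.println(totalOrder(nan, 4.3) > 0);  // true: Double.compare sorts NaN last
    }
}
```

Under these semantics the first query above, 'NaN' = 4.3, would be expected to return false rather than true. A comparator built on Double.compare gives a different answer than the primitive operators, which is one way such inconsistencies can arise.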


[jira] Updated: (HIVE-1809) Hive comparison operators are broken for NaN values

2010-11-23 Thread Paul Butler (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Butler updated HIVE-1809:
--

Attachment: HIVE-1809.patch

 Hive comparison operators are broken for NaN values
 ---

 Key: HIVE-1809
 URL: https://issues.apache.org/jira/browse/HIVE-1809
 Project: Hive
  Issue Type: Bug
Reporter: Paul Butler
Assignee: Paul Butler
 Attachments: HIVE-1809.patch


 Comparisons between NaN values and doubles do not work as expected:
 hive> select 'NaN' = 4.3 from data_one limit 1;
 Total MapReduce jobs = 1
 Launching Job 1 out of 1
 Number of reduce tasks is set to 0 since there's no reduce operator
 Execution log at: 
 /tmp/pbutler/pbutler_20101123145656_d23f9b77-8907-4ed3-aef9-8b99a1cc3138.log
 Job running in-process (local Hadoop)
 2010-11-23 14:56:40,488 null map = 100%,  reduce = 0%
 Ended Job = job_local_0001
 OK
 true
 Time taken: 9.47 seconds
 hive> select 4  'NaN' from data_one limit 1;
 Total MapReduce jobs = 1
 Launching Job 1 out of 1
 Number of reduce tasks is set to 0 since there's no reduce operator
 Execution log at: 
 /tmp/pbutler/pbutler_20101123145858_0d243ac2-f745-4e25-9a38-509bef3bb370.log
 Job running in-process (local Hadoop)
 2010-11-23 14:58:45,689 null map = 100%,  reduce = 0%
 Ended Job = job_local_0001
 OK
 false
 Time taken: 3.938 seconds


[jira] Updated: (HIVE-1807) No Element found exception in BucketMapJoinOptimizer


 [ 
https://issues.apache.org/jira/browse/HIVE-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1807:
-

   Resolution: Fixed
Fix Version/s: 0.7.0
   Status: Resolved  (was: Patch Available)

Committed. Thanks Yongqiang!

 No Element found exception in BucketMapJoinOptimizer
 

 Key: HIVE-1807
 URL: https://issues.apache.org/jira/browse/HIVE-1807
 Project: Hive
  Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang
 Fix For: 0.7.0

 Attachments: HIVE-1807.1.patch, HIVE-1807.2.patch





[jira] Commented: (HIVE-1809) Hive comparison operators are broken for NaN values


[ 
https://issues.apache.org/jira/browse/HIVE-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12935161#action_12935161
 ] 

Ning Zhang commented on HIVE-1809:
--

+1. start testing. 

 Hive comparison operators are broken for NaN values
 ---

 Key: HIVE-1809
 URL: https://issues.apache.org/jira/browse/HIVE-1809
 Project: Hive
  Issue Type: Bug
Reporter: Paul Butler
Assignee: Paul Butler
 Attachments: HIVE-1809.patch


 Comparisons between NaN values and doubles do not work as expected:
 hive> select 'NaN' = 4.3 from data_one limit 1;
 Total MapReduce jobs = 1
 Launching Job 1 out of 1
 Number of reduce tasks is set to 0 since there's no reduce operator
 Execution log at: 
 /tmp/pbutler/pbutler_20101123145656_d23f9b77-8907-4ed3-aef9-8b99a1cc3138.log
 Job running in-process (local Hadoop)
 2010-11-23 14:56:40,488 null map = 100%,  reduce = 0%
 Ended Job = job_local_0001
 OK
 true
 Time taken: 9.47 seconds
 hive> select 4  'NaN' from data_one limit 1;
 Total MapReduce jobs = 1
 Launching Job 1 out of 1
 Number of reduce tasks is set to 0 since there's no reduce operator
 Execution log at: 
 /tmp/pbutler/pbutler_20101123145858_0d243ac2-f745-4e25-9a38-509bef3bb370.log
 Job running in-process (local Hadoop)
 2010-11-23 14:58:45,689 null map = 100%,  reduce = 0%
 Ended Job = job_local_0001
 OK
 false
 Time taken: 3.938 seconds


[jira] Commented: (HIVE-1792) track the joins which are being converted to map-join automatically

2010-11-23 Thread Namit Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12935167#action_12935167
 ] 

Namit Jain commented on HIVE-1792:
--

No need for this

 track the joins which are being converted to map-join automatically
 ---

 Key: HIVE-1792
 URL: https://issues.apache.org/jira/browse/HIVE-1792
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.7.0
Reporter: Liyin Tang
Assignee: Liyin Tang
 Fix For: 0.7.0

 Attachments: hive-1792-1.patch, hive-1792-2.patch


 We should be able to track how many queries (join) got converted to
 map-join


[jira] Resolved: (HIVE-1785) change Pre/Post Query Hooks to take in 1 parameter: HookContext

2010-11-23 Thread John Sichi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi resolved HIVE-1785.
--

  Resolution: Fixed
Release Note: PreExecute and PostExecute have been deprecated in favor of 
ExecuteWithHookContext.

Committed.  Thanks Liyin!

Could you explain this change on the user mailing list?  Also, we need a 
followup patch for changing the description of hive.exec.pre/post.hooks in 
conf/hive-default.xml (I just remembered that).


 change Pre/Post Query Hooks to take in 1 parameter: HookContext
 ---

 Key: HIVE-1785
 URL: https://issues.apache.org/jira/browse/HIVE-1785
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.7.0
Reporter: Namit Jain
Assignee: Liyin Tang
 Fix For: 0.7.0

 Attachments: hive-1785_3.patch, hive-1785_4.patch, hive-1785_6.patch, 
 hive_1785_1.patch, hive_1785_2.patch


 This way, it would be possible to add new parameters to the hooks without 
 changing the existing hooks.
 This will be an incompatible change, and all the hooks need to change to the 
 new API.
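
The design idea, one context parameter instead of a growing argument list, can be sketched as below. The types here are hypothetical stand-ins written for this example; they are not Hive's HookContext or ExecuteWithHookContext, and the field names are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a hook that receives a single context object.
class HookContextSketch {

    // Stand-in context. New fields and getters can be added here later
    // without changing the signature that existing hooks implement.
    static final class Context {
        private final String queryId;
        private final String userName;

        Context(String queryId, String userName) {
            this.queryId = queryId;
            this.userName = userName;
        }

        String getQueryId() {
            return queryId;
        }

        String getUserName() {
            return userName;
        }
    }

    // Stand-in single-parameter hook interface.
    interface Hook {
        void run(Context context);
    }

    // Example hook that only reads the fields it cares about.
    static final class AuditHook implements Hook {
        final List<String> seen = new ArrayList<>();

        @Override
        public void run(Context context) {
            seen.add(context.getUserName() + ":" + context.getQueryId());
        }
    }

    public static void main(String[] args) {
        AuditHook hook = new AuditHook();
        hook.run(new Context("q1", "alice"));
        System.out.println(hook.seen);
    }
}
```

If a later release adds, say, another getter to the context, AuditHook compiles and runs unchanged, which is the property the description asks for.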


[jira] Updated: (HIVE-1538) FilterOperator is applied twice with ppd on.

2010-11-23 Thread Amareshwari Sriramadasu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-1538:
--

Attachment: patch-1538.txt

Patch with the following changes:
* Creates a filter operator with the unpushed predicates, as a child of the 
operator through which the predicates could not be pushed.
* Removes the original filter operator if it does not have any non-final 
candidates. 
I'm seeing some problems when creating a child filter operator with the non-final 
candidates and removing the original one, so I would like to do that 
in a followup jira.
* Updates all the tests with the new explain plans.

 FilterOperator is applied twice with ppd on.
 

 Key: HIVE-1538
 URL: https://issues.apache.org/jira/browse/HIVE-1538
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
 Attachments: patch-1538.txt


 With hive.optimize.ppd set to true, FilterOperator is applied twice, and it 
 seems the second operator always filters zero rows.


[jira] Updated: (HIVE-1538) FilterOperator is applied twice with ppd on.

2010-11-23 Thread Amareshwari Sriramadasu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-1538:
--

Fix Version/s: 0.7.0
   Status: Patch Available  (was: Open)

 FilterOperator is applied twice with ppd on.
 

 Key: HIVE-1538
 URL: https://issues.apache.org/jira/browse/HIVE-1538
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
 Fix For: 0.7.0

 Attachments: patch-1538.txt


 With hive.optimize.ppd set to true, FilterOperator is applied twice, and it 
 seems the second operator always filters zero rows.


[jira] Updated: (HIVE-1096) Hive Variables

2010-11-23 Thread Edward Capriolo (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Edward Capriolo updated HIVE-1096:
--

Attachment: hive-1096-15.patch.txt

Hive Variables
--

Key: HIVE-1096
URL: https://issues.apache.org/jira/browse/HIVE-1096
Project: Hive
Issue Type: New Feature
Components: Query Processor
Reporter: Edward Capriolo
Assignee: Edward Capriolo
Fix For: 0.7.0

Attachments: 1096-9.diff, hive-1096-10-patch.txt,
hive-1096-11-patch.txt, hive-1096-12.patch.txt, hive-1096-15.patch.txt,
hive-1096-2.diff, hive-1096-7.diff, hive-1096-8.diff, hive-1096.diff

From mailing list:
--Amazon Elastic MapReduce version of Hive seems to have a nice feature
called Variables. Basically you can define a variable via command-line
while invoking hive with -d DT=2009-12-09 and then refer to the variable via
${DT} within the hive queries. This could be extremely useful. I can't seem
to find this feature even on trunk. Is this feature currently anywhere in the
roadmap?--
This could be implemented in many places.
A simple place to put this is
Driver.compile or Driver.run: we can do string substitutions at that level,
and nothing further downstream need be affected.
There could be some benefits to doing this further downstream (parser, plan),
but based on the simple needs we may not need to overthink this.
I will get started on implementing this in compile unless someone wants to discuss
it more.


[jira] Updated: (HIVE-1096) Hive Variables

2010-11-23 Thread Edward Capriolo (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1096:
--

Status: Patch Available  (was: Open)

*  trunk/conf/hive-default.xml:
  Spelling: substituation

Fixed

* 
trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/VariableSubstitution.java:
  Make these variables private?

Private variables are what got us into the mess with hadoop. I am not going to 
repeat the problem.

* 
trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/AddResourceProcessor.java:
  Since we want to do substitution for all commands it would probably make 
sense to do the substitution in CommandProcessorFactory.get() and make 
CommandProcessor an abstract class with the following implementation:

...

In other words, CommandProcessorFactory would return a CommandProcessor object 
that has been initialized with a substituted copy of the command.

No. No more refactoring. It is working the way it is. Using factories is going to 
be a major change. I'm tired. It does not prove anything, since this entire process is 
not very clever anyway. Currently it is slightly baked, but I believe that is 
better than being over-designed.

* trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/SetProcessor.java:
  Replace these string literals with constants, e.g:

public static final String ENV_PREFIX = "env:";
public static final String SYSTEM_PREFIX = "system:";
public static final String HIVECONF_PREFIX = "hiveconf:";

Fixed
* trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/SetProcessor.java:
  String propName = varname.substring(SYSTEM_PREFIX.length());

Fixed

* trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/SetProcessor.java:
  Can we remove this special case for silent? In SessionState this 
actually maps to hive.session.silent, and I don't see any test cases that 
cover this case, i.e. that call "set silent" or "set silent=x". It also seems 
that this introduces an inconsistency, since "set silent" will show the value of 
hive.session.silent, but the output of "set" will not list a value for the 
property silent.

Anyone know if there is any older code that depends on this behavior?

I do not really know, and do not really care :) It is out of scope. It is there, so I am 
leaving it.

As for the VAR: it turns out supporting this is not very easy. Adding options 
parsing to the CLI works, but the session state gives you nowhere to store 
variables except in the hive conf. SetProcessor works with SessionState, not 
CLI SessionState. Again, big refactoring is needed.  

What I did do is remove support for set y=${x}. This patch only adds set 
y=${hiveconf:x}. Thus if someone cares to add VAR X or ${x}, or to determine how 
to change the CLI to add this other map that can be shared across the session 
state, this patch is not in the way.

Thus substitution only works for ${hiveconf:x}, ${system:x} and ${env:x}. 
Implementing ${x} and var can be done in a separate issue.
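
The prefix-based substitution described in this comment can be sketched roughly as follows. This is a minimal illustration, not the code in the attached patch: the class name, the method signature and the use of plain maps for the three sources are assumptions, and it does a single pass with no nested or recursive expansion. Only the three prefixes come from the discussion above.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: replace ${hiveconf:x}, ${system:x} and ${env:x}; leave anything else alone.
class VariableSubstitutionSketch {
    static final String ENV_PREFIX = "env:";
    static final String SYSTEM_PREFIX = "system:";
    static final String HIVECONF_PREFIX = "hiveconf:";

    // Matches ${...} and captures the text between the braces.
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    static String substitute(String text, Map<String, String> hiveConf,
                             Map<String, String> system, Map<String, String> env) {
        Matcher m = VAR.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1);
            String value = null;
            if (name.startsWith(HIVECONF_PREFIX)) {
                value = hiveConf.get(name.substring(HIVECONF_PREFIX.length()));
            } else if (name.startsWith(SYSTEM_PREFIX)) {
                value = system.get(name.substring(SYSTEM_PREFIX.length()));
            } else if (name.startsWith(ENV_PREFIX)) {
                value = env.get(name.substring(ENV_PREFIX.length()));
            }
            // Unprefixed references such as ${x}, and unknown names, are kept verbatim.
            String replacement = (value != null) ? value : m.group(0);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> hiveConf = new HashMap<>();
        hiveConf.put("dt", "2009-12-09");
        Map<String, String> none = Collections.emptyMap();

        String query = "select * from t where ds='${hiveconf:dt}' and x='${x}'";
        System.out.println(substitute(query, hiveConf, none, none));
    }
}
```

In this sketch an unprefixed ${x} passes through unchanged, which mirrors the statement above that ${x} is left for a separate issue.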

 Hive Variables
 --

 Key: HIVE-1096
 URL: https://issues.apache.org/jira/browse/HIVE-1096
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Fix For: 0.7.0

 Attachments: 1096-9.diff, hive-1096-10-patch.txt, 
 hive-1096-11-patch.txt, hive-1096-12.patch.txt, hive-1096-15.patch.txt, 
 hive-1096-2.diff, hive-1096-7.diff, hive-1096-8.diff, hive-1096.diff


 From mailing list:
 --Amazon Elastic MapReduce version of Hive seems to have a nice feature 
 called Variables. Basically you can define a variable via command-line 
 while invoking hive with -d DT=2009-12-09 and then refer to the variable via 
 ${DT} within the hive queries. This could be extremely useful. I can't seem 
 to find this feature even on trunk. Is this feature currently anywhere in the 
 roadmap?--
 This could be implemented in many places.
 A simple place to put this is 
 Driver.compile or Driver.run: we can do string substitutions at that level, 
 and nothing further downstream need be affected. 
 There could be some benefits to doing this further downstream (parser, plan), 
 but based on the simple needs we may not need to overthink this.
 I will get started on implementing this in compile unless someone wants to discuss 
 it more.


[jira] Commented: (HIVE-1792) track the joins which are being converted to map-join automatically