[jira] Commented: (HIVE-1096) Hive Variables

2010-02-17 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835112#action_12835112
 ] 

Edward Capriolo commented on HIVE-1096:
---


>>
Using the HiveConf namespace for user variables seems like a bad idea, 
especially since there are no checks in place to prevent people from defining 
variables with names like "hive.foo.bar". Giving users unprotected access to 
your configuration namespace usually leads to problems down the road.
Philosophically I agree. In practice, though, the Hive/Hadoop conf is easily 
manipulated by changing your hadoop-site.xml or hive-site.xml. Users do have 
unprotected access to the namespace; that is the nature of Hadoop. Users of 
Hive are setting variables all the time.

>>
It would be nice to be able to reference Java system properties using this 
syntax.
Hive/Hadoop do not often refer to system properties during normal operation. I 
am on the fence about this.

>>Driver.replace() iterates over the list of defined variables. Instead, I 
>>think it should iterate over the tokens in the command that match the pattern 
>>'${.*}'.

The Hive CLI really needs some type of top-level parser. Because we don't have 
one, there are many ways we could do certain things, but all of them are a 
little 'hackish'. If we had a real parser, reading character by character, we 
would not need a regex or string replace to do variable processing.

>>
This would make it easy to log any cases where the command contains 
"${foo.bar}" and foo.bar is undefined.

I think the code should substitute an empty string ("") for an undefined 
variable. 

>>Driver.replace(String) should have a name like 
>>Driver.interpolateCommandVariables(), or Driver.replaceVariables().
Agreed, my notorious spelling usually causes me to avoid words like 
'interpolate' :)

>>It would be nice to be able to prevent interpolation using an escape 
>>character, e.g. "\${somevar}".
I punted on that by making it possible to turn replacement on and off. As I 
said above, a true parser would handle cases like this.

>>It would be nice to be able to nest variable definitions, e.g. 
>>version="0.6.0", jar_name="hive-exec-${version}". The variable interpolation 
>>code in Hadoop's Configuration class does this.
Yes. Carl, I started with your code, but the bug I mentioned above (comments 
before set) caused me to rip the entire thing apart; before I found out what I 
was doing wrong I had ripped out all the regex. I am not very comfortable with 
regex, and I was not in love with the while construct with a return in the 
middle. Again, nesting would be easy with a true parser. I do not mind going 
back to your code.

>>The last thing we want is two completely separate variable interpolation 
>>mechanisms.

Agreed. The only real difference in implementation is that you're doing it with 
properties and I am doing it with HiveConf vars. If we support both, I think we 
are both happy. Any ideas?
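For what it's worth, supporting both could be as simple as a lookup chain that 
consults Java system properties first and falls back to the conf. A rough 
sketch only; the names (VariableResolver, resolve) are made up for 
illustration, and a plain Map stands in for HiveConf here:

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VariableResolver {
    // Matches ${name} tokens in a command string.
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    // Consults Java system properties first, then a conf map (a stand-in
    // for HiveConf here), so both styles of definition are honored.
    public static String resolve(String cmd, Map<String, String> conf) {
        Matcher m = VAR.matcher(cmd);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String name = m.group(1);
            String value = System.getProperty(name);   // -Dname=value wins
            if (value == null) {
                value = conf.get(name);                // then conf-style vars
            }
            if (value == null) {
                value = m.group(0);                    // undefined: leave token as-is
            }
            m.appendReplacement(sb, Matcher.quoteReplacement(value));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```

Either namespace then "just works" from the user's point of view, and the 
precedence rule is the only policy decision left to argue about.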

> Hive Variables
> --
>
> Key: HIVE-1096
> URL: https://issues.apache.org/jira/browse/HIVE-1096
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Attachments: 1096-9.diff, hive-1096-2.diff, hive-1096-7.diff, 
> hive-1096-8.diff, hive-1096.diff
>
>
> From mailing list:
> --Amazon Elastic MapReduce version of Hive seems to have a nice feature 
> called "Variables." Basically you can define a variable via command-line 
> while invoking hive with -d DT=2009-12-09 and then refer to the variable via 
> ${DT} within the hive queries. This could be extremely useful. I can't seem 
> to find this feature even on trunk. Is this feature currently anywhere in the 
> roadmap?--
> This could be implemented in many places.
> A simple place to put this is in Driver.compile or Driver.run: we can do 
> string substitutions at that level, and further downstream need not be 
> affected. 
> There could be some benefits to doing this further downstream (parser, plan), 
> but based on the simple needs we may not need to overthink this.
> I will get started on implementing in compile unless someone wants to discuss 
> this more.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1134) bucketing mapjoin where the big table contains more than 1 big partition

2010-02-17 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1134:
---

Attachment: hive-1134-2010-02-17.patch

The attached patch also fixes a bug in HIVE-917's patch: it should use MOD 
instead of DIV.
// if the big table has more buckets than the current small table,
// use "MOD" to get small table bucket names. For example, if the big
// table has 4 buckets and the small table has 2 buckets, then the
// mapping should be 0->0, 1->1, 2->0, 3->1.
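The mapping in the comment is just the big-table bucket index modulo the small 
table's bucket count. A quick illustrative sketch (the class and method names 
are made up here, not taken from the patch):

```java
public class BucketMapping {
    // Maps a big-table bucket to the small-table bucket it joins with.
    // MOD yields the round-robin mapping described above (0->0, 1->1,
    // 2->0, 3->1 for 4 big / 2 small buckets); integer division would
    // instead give 0->0, 1->0, 2->1, 3->1, pairing the wrong buckets.
    public static int smallBucketFor(int bigBucket, int numSmallBuckets) {
        return bigBucket % numSmallBuckets;
    }
}
```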

> bucketing mapjoin where the big table contains more than 1 big partition
> 
>
> Key: HIVE-1134
> URL: https://issues.apache.org/jira/browse/HIVE-1134
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: He Yongqiang
> Fix For: 0.6.0
>
> Attachments: hive-1134-2010-02-17.patch
>
>





[jira] Commented: (HIVE-1096) Hive Variables

2010-02-17 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835096#action_12835096
 ] 

Carl Steinbach commented on HIVE-1096:
--

bq. Test cases?
Missed the test case. Sorry.

Also, I think that in order to get committed this needs to
address the use case described in HIVE-1063. The last thing
we want is two completely separate variable interpolation 
mechanisms.

> Hive Variables
> --
>
> Key: HIVE-1096
> URL: https://issues.apache.org/jira/browse/HIVE-1096
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Attachments: 1096-9.diff, hive-1096-2.diff, hive-1096-7.diff, 
> hive-1096-8.diff, hive-1096.diff
>
>
> From mailing list:
> --Amazon Elastic MapReduce version of Hive seems to have a nice feature 
> called "Variables." Basically you can define a variable via command-line 
> while invoking hive with -d DT=2009-12-09 and then refer to the variable via 
> ${DT} within the hive queries. This could be extremely useful. I can't seem 
> to find this feature even on trunk. Is this feature currently anywhere in the 
> roadmap?--
> This could be implemented in many places.
> A simple place to put this is in Driver.compile or Driver.run: we can do 
> string substitutions at that level, and further downstream need not be 
> affected. 
> There could be some benefits to doing this further downstream (parser, plan), 
> but based on the simple needs we may not need to overthink this.
> I will get started on implementing in compile unless someone wants to discuss 
> this more.




[jira] Commented: (HIVE-1096) Hive Variables

2010-02-17 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835093#action_12835093
 ] 

Carl Steinbach commented on HIVE-1096:
--

* Test cases?
* Using the HiveConf namespace for user variables seems like a bad idea, 
especially since there are no checks in place to prevent people from defining 
variables with names like "hive.foo.bar". Giving users unprotected access to 
your configuration namespace usually leads to problems down the road.
* It would be nice to be able to reference Java system properties using this 
syntax.
* It would be nice to be able to nest variable definitions, e.g. 
version="0.6.0", jar_name="hive-exec-${version}". The variable interpolation 
code in Hadoop's Configuration class does this.
* It would be nice to be able to prevent interpolation using an escape 
character, e.g. "\${somevar}".
* Driver.replace(String) should have a name like 
Driver.interpolateCommandVariables(), or Driver.replaceVariables().
* Driver.replace() iterates over the list of defined variables. Instead, I 
think it should iterate over the tokens in the command that match the pattern 
'${.*}'. This would make it easy to log any cases where the command contains 
"${foo.bar}" and foo.bar is undefined.
* Replace the reference to the string literal "hive.variable.replace" with 
HIVEVARREPLACE.
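A sketch of the token-driven approach from the last two bullets: iterate over 
the ${...} matches in the command, warn when a variable is undefined, and 
re-scan until a fixed point so nested definitions like 
jar_name="hive-exec-${version}" expand fully. Names are illustrative only, not 
from the attached patches:

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VariableSubstitution {
    // Matches ${name} tokens; [^}]+ keeps each match to a single token.
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");
    private static final int MAX_DEPTH = 20;  // guard against cyclic definitions

    public static String substitute(String cmd, Properties vars) {
        String result = cmd;
        for (int depth = 0; depth < MAX_DEPTH; depth++) {
            Matcher m = VAR.matcher(result);
            StringBuffer sb = new StringBuffer();
            boolean replaced = false;
            while (m.find()) {
                String name = m.group(1);
                String value = vars.getProperty(name);
                if (value == null) {
                    // Undefined: log it and leave the token in place.
                    System.err.println("WARN: variable " + name + " is not defined");
                    m.appendReplacement(sb, Matcher.quoteReplacement(m.group(0)));
                } else {
                    m.appendReplacement(sb, Matcher.quoteReplacement(value));
                    replaced = true;
                }
            }
            m.appendTail(sb);
            result = sb.toString();
            if (!replaced) {
                break;  // fixed point reached; nested ${...} fully expanded
            }
        }
        return result;
    }
}
```

An escape such as "\${somevar}" is deliberately not handled here; as the 
discussion notes, that is easier with a real character-by-character parser 
than with a regex pass.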




> Hive Variables
> --
>
> Key: HIVE-1096
> URL: https://issues.apache.org/jira/browse/HIVE-1096
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Attachments: 1096-9.diff, hive-1096-2.diff, hive-1096-7.diff, 
> hive-1096-8.diff, hive-1096.diff
>
>
> From mailing list:
> --Amazon Elastic MapReduce version of Hive seems to have a nice feature 
> called "Variables." Basically you can define a variable via command-line 
> while invoking hive with -d DT=2009-12-09 and then refer to the variable via 
> ${DT} within the hive queries. This could be extremely useful. I can't seem 
> to find this feature even on trunk. Is this feature currently anywhere in the 
> roadmap?--
> This could be implemented in many places.
> A simple place to put this is in Driver.compile or Driver.run: we can do 
> string substitutions at that level, and further downstream need not be 
> affected. 
> There could be some benefits to doing this further downstream (parser, plan), 
> but based on the simple needs we may not need to overthink this.
> I will get started on implementing in compile unless someone wants to discuss 
> this more.




[jira] Updated: (HIVE-1178) enforce bucketing for a table

2010-02-17 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1178:
-

Status: Patch Available  (was: Open)

> enforce bucketing for a table
> -
>
> Key: HIVE-1178
> URL: https://issues.apache.org/jira/browse/HIVE-1178
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Fix For: 0.6.0
>
> Attachments: hive.1178.1.patch
>
>
> If the table being inserted into is bucketed, Hive currently does not try to 
> enforce that. An option should be added to check for this.
> Moreover, the number of buckets can be higher than the maximum number of 
> reducers, in which case a single reducer can write to multiple files.




[jira] Updated: (HIVE-1178) enforce bucketing for a table

2010-02-17 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1178:
-

Attachment: hive.1178.1.patch

> enforce bucketing for a table
> -
>
> Key: HIVE-1178
> URL: https://issues.apache.org/jira/browse/HIVE-1178
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Fix For: 0.6.0
>
> Attachments: hive.1178.1.patch
>
>
> If the table being inserted into is bucketed, Hive currently does not try to 
> enforce that. An option should be added to check for this.
> Moreover, the number of buckets can be higher than the maximum number of 
> reducers, in which case a single reducer can write to multiple files.




[jira] Updated: (HIVE-1096) Hive Variables

2010-02-17 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1096:
--

Attachment: 1096-9.diff

Regenerated against trunk.

> Hive Variables
> --
>
> Key: HIVE-1096
> URL: https://issues.apache.org/jira/browse/HIVE-1096
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Attachments: 1096-9.diff, hive-1096-2.diff, hive-1096-7.diff, 
> hive-1096-8.diff, hive-1096.diff
>
>
> From mailing list:
> --Amazon Elastic MapReduce version of Hive seems to have a nice feature 
> called "Variables." Basically you can define a variable via command-line 
> while invoking hive with -d DT=2009-12-09 and then refer to the variable via 
> ${DT} within the hive queries. This could be extremely useful. I can't seem 
> to find this feature even on trunk. Is this feature currently anywhere in the 
> roadmap?--
> This could be implemented in many places.
> A simple place to put this is in Driver.compile or Driver.run: we can do 
> string substitutions at that level, and further downstream need not be 
> affected. 
> There could be some benefits to doing this further downstream (parser, plan), 
> but based on the simple needs we may not need to overthink this.
> I will get started on implementing in compile unless someone wants to discuss 
> this more.




[jira] Updated: (HIVE-1126) Missing some Jdbc functionality like getTables getColumns and HiveResultSet.get* methods based on column name.

2010-02-17 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1126:
-

Status: Open  (was: Patch Available)

> Missing some Jdbc functionality like getTables getColumns and 
> HiveResultSet.get* methods based on column name.
> --
>
> Key: HIVE-1126
> URL: https://issues.apache.org/jira/browse/HIVE-1126
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Clients
>Affects Versions: 0.6.0
>Reporter: Bennie Schut
>Assignee: Bennie Schut
>Priority: Minor
> Fix For: 0.6.0
>
> Attachments: HIVE-1126-1.patch, HIVE-1126.patch
>
>
> I've been using the hive jdbc driver more and more and was missing some 
> functionality which I added
> HiveDatabaseMetaData.getTables
> Using "show tables" to get the info from hive.
> HiveDatabaseMetaData.getColumns
> Using "describe tablename" to get the columns.
> This makes using something like SQuirreL a lot nicer since you have the list 
> of tables and just click on the content tab to see what's in the table.
> I also implemented
> HiveResultSet.getObject(String columnName) so you call most get* methods 
> based on the column name.




[jira] Commented: (HIVE-1096) Hive Variables

2010-02-17 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835053#action_12835053
 ] 

Namit Jain commented on HIVE-1096:
--

Can you re-generate the patch ? It is not applying cleanly.

> Hive Variables
> --
>
> Key: HIVE-1096
> URL: https://issues.apache.org/jira/browse/HIVE-1096
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Attachments: hive-1096-2.diff, hive-1096-7.diff, hive-1096-8.diff, 
> hive-1096.diff
>
>
> From mailing list:
> --Amazon Elastic MapReduce version of Hive seems to have a nice feature 
> called "Variables." Basically you can define a variable via command-line 
> while invoking hive with -d DT=2009-12-09 and then refer to the variable via 
> ${DT} within the hive queries. This could be extremely useful. I can't seem 
> to find this feature even on trunk. Is this feature currently anywhere in the 
> roadmap?--
> This could be implemented in many places.
> A simple place to put this is in Driver.compile or Driver.run: we can do 
> string substitutions at that level, and further downstream need not be 
> affected. 
> There could be some benefits to doing this further downstream (parser, plan), 
> but based on the simple needs we may not need to overthink this.
> I will get started on implementing in compile unless someone wants to discuss 
> this more.




[jira] Created: (HIVE-1178) enforce bucketing for a table

2010-02-17 Thread Namit Jain (JIRA)
enforce bucketing for a table
-

 Key: HIVE-1178
 URL: https://issues.apache.org/jira/browse/HIVE-1178
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Fix For: 0.6.0


If the table being inserted into is bucketed, Hive currently does not try to 
enforce that. An option should be added to check for this.

Moreover, the number of buckets can be higher than the maximum number of 
reducers, in which case a single reducer can write to multiple files.




[jira] Updated: (HIVE-1177) build should set flag or detect conditions to avoid duplicating work

2010-02-17 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1177:
--

Summary: build should set flag or detect conditions to avoid duplicating 
work  (was: build should set flag or detect conditions to avoid deplicating 
work)

> build should set flag or detect conditions to avoid duplicating work
> 
>
> Key: HIVE-1177
> URL: https://issues.apache.org/jira/browse/HIVE-1177
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Build Infrastructure
>Reporter: Edward Capriolo
>Priority: Minor
>
> I made an ant target quick-test, which differs from test in that it
> has no dependencies.
> time ant -Dhadoop.version='0.18.3' -Doffline=true
> -Dtestcase=TestCliDriver -Dqfile=alter1.q quick-test
> BUILD SUCCESSFUL
> Total time: 15 seconds
> real 0m16.250s
> user 0m20.965s
> sys  0m1.579s
> time ant -Dhadoop.version='0.18.3' -Doffline=true
> -Dtestcase=TestCliDriver -Dqfile=alter1.q test
> BUILD SUCCESSFUL
> Total time: 26 seconds
> real 0m26.564s
> user 0m31.307s
> sys  0m2.346s
> Some makefiles set flag files like "make.ok" that allow the build process 
> to intelligently skip steps that are already done. Currently, a target like 
> test has no way of determining state and will re-issue dependent targets like 
> clean-test and jar (and their dependents). 
> Suggestion:
> Hive should set flags or intelligently determine the state of the build to 
> save CPU and development time. Targets should not re-execute unless a clean 
> is explicitly given.




[jira] Created: (HIVE-1177) build should set flag or detect conditions to avoid deplicating work

2010-02-17 Thread Edward Capriolo (JIRA)
build should set flag or detect conditions to avoid deplicating work


 Key: HIVE-1177
 URL: https://issues.apache.org/jira/browse/HIVE-1177
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Build Infrastructure
Reporter: Edward Capriolo
Priority: Minor


I made an ant target quick-test, which differs from test in that it
has no dependencies.

time ant -Dhadoop.version='0.18.3' -Doffline=true
-Dtestcase=TestCliDriver -Dqfile=alter1.q quick-test
BUILD SUCCESSFUL
Total time: 15 seconds

real 0m16.250s
user 0m20.965s
sys  0m1.579s

time ant -Dhadoop.version='0.18.3' -Doffline=true
-Dtestcase=TestCliDriver -Dqfile=alter1.q test
BUILD SUCCESSFUL
Total time: 26 seconds

real 0m26.564s
user 0m31.307s
sys  0m2.346s

Some makefiles set flag files like "make.ok" that allow the build process to 
intelligently skip steps that are already done. Currently, a target like test 
has no way of determining state and will re-issue dependent targets like 
clean-test and jar (and their dependents). 

Suggestion:
Hive should set flags or intelligently determine the state of the build to 
save CPU and development time. Targets should not re-execute unless a clean is 
explicitly given. 




[jira] Resolved: (HIVE-1163) Eclipse launchtemplate changes to enable debugging

2010-02-17 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao resolved HIVE-1163.
--

   Resolution: Fixed
Fix Version/s: 0.6.0
 Release Note: HIVE-1163. Eclipse launchtemplate changes to enable 
debugging. (Ning Zhang and Carl Steinbach via zshao)
 Hadoop Flags: [Reviewed]

Committed. Thanks Ning and Carl!

> Eclipse launchtemplate changes to enable debugging
> --
>
> Key: HIVE-1163
> URL: https://issues.apache.org/jira/browse/HIVE-1163
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.6.0
>
> Attachments: HIVE-1163.4.patch, HIVE-1163.patch, HIVE-1163_2.patch, 
> HIVE-1163_3.patch
>
>
> Some recent changes in the build.xml and build-common.xml breaks the 
> debugging functionality in eclipse. Some system defined properties were 
> missing when running eclipse debugger. 




[jira] Resolved: (HIVE-1136) add type-checking setters for HiveConf class to match existing getters

2010-02-17 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao resolved HIVE-1136.
--

  Resolution: Fixed
Release Note: HIVE-1136. Add type-checking setters for HiveConf class. 
(John Sichi via zshao)
Hadoop Flags: [Reviewed]

Committed. Thanks John!

> add type-checking setters for HiveConf class to match existing getters
> --
>
> Key: HIVE-1136
> URL: https://issues.apache.org/jira/browse/HIVE-1136
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Configuration
>Affects Versions: 0.6.0
>Reporter: John Sichi
>Assignee: John Sichi
> Fix For: 0.6.0
>
> Attachments: HIVE-1136.1.patch, HIVE-1136.2.patch
>
>
> This is a followup from HIVE-1129.




[jira] Commented: (HIVE-984) Building Hive occasionally fails with Ivy error: hadoop#core;0.20.1!hadoop.tar.gz(source): invalid md5:

2010-02-17 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834970#action_12834970
 ] 

John Sichi commented on HIVE-984:
-

I was able to verify that using a reliable server (tested with a privately 
hosted server of my own) allowed for successful artifact download from my home 
network.

Next step is to talk to some Facebook peeps to see if we can get what we need 
set up on mirror.facebook.net.


> Building Hive occasionally fails with Ivy error: 
> hadoop#core;0.20.1!hadoop.tar.gz(source): invalid md5:
> ---
>
> Key: HIVE-984
> URL: https://issues.apache.org/jira/browse/HIVE-984
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Build Infrastructure
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
> Attachments: HIVE-984.2.patch, HIVE-984.patch
>
>
> Folks keep running into this problem when building Hive from source:
> {noformat}
> [ivy:retrieve]
> [ivy:retrieve] :: problems summary ::
> [ivy:retrieve]  WARNINGS
> [ivy:retrieve]  [FAILED ]
> hadoop#core;0.20.1!hadoop.tar.gz(source): invalid md5:
> expected=hadoop-0.20.1.tar.gz: computed=719e169b7760c168441b49f405855b72
> (138662ms)
> [ivy:retrieve]  [FAILED ]
> hadoop#core;0.20.1!hadoop.tar.gz(source): invalid md5:
> expected=hadoop-0.20.1.tar.gz: computed=719e169b7760c168441b49f405855b72
> (138662ms)
> [ivy:retrieve]   hadoop-resolver: tried
> [ivy:retrieve]
> http://archive.apache.org/dist/hadoop/core/hadoop-0.20.1/hadoop-0.20.1.tar.gz
> [ivy:retrieve]  ::
> [ivy:retrieve]  ::  FAILED DOWNLOADS::
> [ivy:retrieve]  :: ^ see resolution messages for details  ^ ::
> [ivy:retrieve]  ::
> [ivy:retrieve]  :: hadoop#core;0.20.1!hadoop.tar.gz(source)
> [ivy:retrieve]  ::
> [ivy:retrieve]
> [ivy:retrieve] :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
> {noformat}
> The problem appears to lie with a) the Hive build scripts, b) Ivy, or 
> c) archive.apache.org.
> Besides fixing the actual bug, one other option worth considering is to add 
> the Hadoop jars to the Hive source repository.




[jira] Commented: (HIVE-1168) Fix Hive build on Hudson

2010-02-17 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834946#action_12834946
 ] 

John Sichi commented on HIVE-1168:
--

Thanks Johan!

0.17, 0.19, and 0.20 all passed, so maybe it was a one-off glitch on 0.18.  
Let's see if it clears itself up.


> Fix Hive build on Hudson
> 
>
> Key: HIVE-1168
> URL: https://issues.apache.org/jira/browse/HIVE-1168
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Build Infrastructure
>Reporter: Carl Steinbach
>Assignee: John Sichi
>Priority: Critical
>
> {quote}
> We need to delete the .ant directory containing the old ivy version in order 
> to fix it 
> (and if we're using the same environment for both trunk and branches, either 
> segregate them or script an rm to clean in between).
> {quote}
> It's worth noting that ant may have picked up the old version of Ivy from
> somewhere else. In order, Ant's classpath contains:
> # Ant's startup JAR file, ant-launcher.jar
> # Everything in the directory containing the version of ant-launcher.jar 
>   that's running, i.e. everything in ANT_HOME/lib
> # All JAR files in ${user.home}/.ant/lib
> # Directories and JAR files supplied via the -lib command line option.
> # Everything in the CLASSPATH variable unless the -noclasspath option is used.
> (2) implies that users on shared machines may have to install their own
> version of ant in order to get around these problems, assuming that the
> administrator has installed the ivy.jar in $ANT_HOME/lib




[jira] Created: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it

2010-02-17 Thread Prasad Chakka (JIRA)
'create if not exists' fails for a table name with 'select' in it
-

 Key: HIVE-1176
 URL: https://issues.apache.org/jira/browse/HIVE-1176
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Metastore, Query Processor
Reporter: Prasad Chakka



hive> create table if not exists tmp_select(s string, c string, n int);
org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got 
exception: javax.jdo.JDOUserException JDOQL Single-String query should always 
start with SELECT)
at org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:441)
at 
org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:423)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:5538)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5192)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:105)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:275)
at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:320)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:312)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: MetaException(message:Got exception: javax.jdo.JDOUserException 
JDOQL Single-String query should always start with SELECT)
at 
org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:612)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTables(HiveMetaStoreClient.java:450)
at org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:439)
... 15 more




Hudson build is back to normal : Hive-trunk-h0.20 #191

2010-02-17 Thread Apache Hudson Server
See 




Hudson build is back to normal : Hive-trunk-h0.19 #368

2010-02-17 Thread Apache Hudson Server
See 




[jira] Updated: (HIVE-1158) Introducing a new parameter for Map-side join bucket size

2010-02-17 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1158:
-

   Resolution: Fixed
Fix Version/s: 0.5.0
   Status: Resolved  (was: Patch Available)

Committed in 0.5 also. Thanks Ning

> Introducing a new parameter for Map-side join bucket size
> -
>
> Key: HIVE-1158
> URL: https://issues.apache.org/jira/browse/HIVE-1158
> Project: Hadoop Hive
>  Issue Type: Improvement
>Affects Versions: 0.5.0, 0.6.0
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.5.0
>
> Attachments: HIVE-1158.patch, HIVE-1158_branch_0_5.patch
>
>
> Map-side join cache the small table in memory and join with the split of the 
> large table at the mapper side. If the small table is too large, it uses 
> RowContainer to cache a number of rows indicated by parameter 
> hive.join.cache.size, whose default value is 25000. This parameter is also 
> used for regular reducer-side joins to cache all input tables except the 
> streaming table. This default value is too large for map-side join bucket 
> size, resulting in OOM exceptions sometimes. We should define a different 
> parameter to separate these two cache sizes. 




[jira] Commented: (HIVE-1168) Fix Hive build on Hudson

2010-02-17 Thread Johan Oskarsson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834822#action_12834822
 ] 

Johan Oskarsson commented on HIVE-1168:
---

Seems the builds started and got past this issue, but a test failed:
http://hudson.zones.apache.org/hudson/view/Hive/job/Hive-trunk-h0.18/368/testReport/junit/org.apache.hadoop.hive.cli/TestNegativeCliDriver/testNegativeCliDriver_script_broken_pipe1/

> Fix Hive build on Hudson
> 
>
> Key: HIVE-1168
> URL: https://issues.apache.org/jira/browse/HIVE-1168
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Build Infrastructure
>Reporter: Carl Steinbach
>Assignee: John Sichi
>Priority: Critical
>
> {quote}
> We need to delete the .ant directory containing the old ivy version in order 
> to fix it 
> (and if we're using the same environment for both trunk and branches, either 
> segregate them or script an rm to clean in between).
> {quote}
> It's worth noting that Ant may have picked up the old version of Ivy from
> somewhere else. In order, Ant's classpath contains:
> # Ant's startup JAR file, ant-launcher.jar
> # Everything in the directory containing the running ant-launcher.jar,
>   i.e. everything in ANT_HOME/lib
> # All JAR files in ${user.home}/.ant/lib
> # Directories and JAR files supplied via the -lib command-line option
> # Everything in the CLASSPATH variable, unless the -noclasspath option is used
> (2) implies that users on shared machines may have to install their own
> version of Ant in order to get around these problems, assuming that the
> administrator has installed ivy.jar in $ANT_HOME/lib.




Build failed in Hudson: Hive-trunk-h0.18 #368

2010-02-17 Thread Apache Hudson Server
See 

Changes:

[zshao] HIVE-1174. Fix Job counter error if hive.merge.mapfiles equals true. 
(Yongqiang He via zshao)

[namit] HIVE-917. Bucketed Map Join
(He Yongqiang via namit)

[namit] HIVE-1117. Make queryPlan serializable
(Zheng Shao via namit)

--
[...truncated 13170 lines...]
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table src
[junit] POSTHOOK: Output: defa...@src
[junit] OK
[junit] Loading data to table src1
[junit] POSTHOOK: Output: defa...@src1
[junit] OK
[junit] Loading data to table src_sequencefile
[junit] POSTHOOK: Output: defa...@src_sequencefile
[junit] OK
[junit] Loading data to table src_thrift
[junit] POSTHOOK: Output: defa...@src_thrift
[junit] OK
[junit] Loading data to table src_json
[junit] POSTHOOK: Output: defa...@src_json
[junit] OK
[junit] diff 

 

[junit] Done query: unknown_function4.q
[junit] Begin query: unknown_table1.q
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=11}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=11
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=12
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=11
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=12}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=12
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to table src
[junit] POSTHOOK: Output: defa...@src
[junit] OK
[junit] Loading data to table src1
[junit] POSTHOOK: Output: defa...@src1
[junit] OK
[junit] Loading data to table src_sequencefile
[junit] POSTHOOK: Output: defa...@src_sequencefile
[junit] OK
[junit] Loading data to table src_thrift
[junit] POSTHOOK: Output: defa...@src_thrift
[junit] OK
[junit] Loading data to table src_json
[junit] POSTHOOK: Output: defa...@src_json
[junit] OK
[junit] diff 

 

[junit] Done query: unknown_table1.q
[junit] Begin query: unknown_table2.q
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=11}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=11
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=12
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=11
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=12}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=12
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Loading data to

[jira] Updated: (HIVE-1096) Hive Variables

2010-02-17 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1096:
--

Status: Patch Available  (was: Open)

> Hive Variables
> --
>
> Key: HIVE-1096
> URL: https://issues.apache.org/jira/browse/HIVE-1096
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Attachments: hive-1096-2.diff, hive-1096-7.diff, hive-1096-8.diff, 
> hive-1096.diff
>
>
> From mailing list:
> --Amazon Elastic MapReduce version of Hive seems to have a nice feature 
> called "Variables." Basically you can define a variable via command-line 
> while invoking hive with -d DT=2009-12-09 and then refer to the variable via 
> ${DT} within the hive queries. This could be extremely useful. I can't seem 
> to find this feature even on trunk. Is this feature currently anywhere in the 
> roadmap?--
> This could be implemented in many places.
> The simplest place to put this is in Driver.compile or Driver.run: we can do 
> string substitution at that level, and nothing further downstream need be 
> affected. 
> There could be some benefits to doing this further downstream (parser, plan), 
> but given the simple needs we may not need to overthink this.
> I will get started on implementing it in compile unless someone wants to 
> discuss this more.
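String substitution at the Driver level, as proposed above, can be sketched in a few lines. This is an illustration only: the regex, the empty-string default for undefined variables, and the class/method names are assumptions, not the committed HIVE-1096 implementation.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VariableSubstitution {
    // Matches ${name} tokens; the variable name is captured in group 1.
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Replace each ${name} with its value, or "" when the name is undefined. */
    public static String substitute(String command, Map<String, String> vars) {
        Matcher m = VAR.matcher(command);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = vars.getOrDefault(m.group(1), "");
            // quoteReplacement keeps '$' and '\' in values from being
            // interpreted as back-references.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(substitute(
            "SELECT * FROM logs WHERE ds='${DT}'",
            Map.of("DT", "2009-12-09")));
        // prints: SELECT * FROM logs WHERE ds='2009-12-09'
    }
}
```

Iterating over the tokens in the command (rather than over the defined variables) also makes it trivial to log or reject references to undefined variables, as suggested in the review comments.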




[jira] Commented: (HIVE-1173) Partition pruner cancels pruning if non-deterministic function present in filtering expression only in joins is present in query

2010-02-17 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834788#action_12834788
 ] 

Namit Jain commented on HIVE-1173:
--

I think we should fix this for 0.5 also.

> Partition pruner cancels pruning if non-deterministic function present in 
> filtering expression only in joins is present in query
> 
>
> Key: HIVE-1173
> URL: https://issues.apache.org/jira/browse/HIVE-1173
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.4.0, 0.4.1
>Reporter: Vladimir Klimontovich
>
> Brief description:
> case 1) a non-deterministic function is present in the partition condition and 
> joins are present in the query => the partition pruner does not filter 
> partitions based on the condition
> case 2) a non-deterministic function is present in the partition condition and 
> joins are not present in the query => the partition pruner does filter 
> partitions based on the condition
> It's quite illogical for pruning to depend on the presence of joins in the query.
> Example:
> Let's consider the following sequence of Hive queries:
> 1) Create a non-deterministic function:
> create temporary function UDF2 as 'UDF2';
> {{
> import org.apache.hadoop.hive.ql.exec.UDF;
> import org.apache.hadoop.hive.ql.udf.UDFType;
> @UDFType(deterministic=false)
> public class UDF2 extends UDF {
>   public String evaluate(String val) {
>     return val;
>   }
> }
> }}
> 2) Create tables
> CREATE TABLE Main (
>   a STRING,
>   b INT
> )
> PARTITIONED BY(part STRING)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
> LINES TERMINATED BY '10'
> STORED AS TEXTFILE;
> ALTER TABLE Main ADD PARTITION (part="part1") LOCATION 
> "/hive-join-test/part1/";
> ALTER TABLE Main ADD PARTITION (part="part2") LOCATION 
> "/hive-join-test/part2/";
> CREATE TABLE Joined (
>   a STRING,
>   f STRING
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
> LINES TERMINATED BY '10'
> STORED AS TEXTFILE
> LOCATION '/hive-join-test/join/';
> 3) Run first query:
> select 
>   m.a,
>   m.b
> from Main m
> where
>   part > UDF2('part0') AND part = 'part1';
> The pruner will work for this query: 
> mapred.input.dir=hdfs://localhost:9000/hive-join-test/part1
> 4) Run second query (with join):
> select 
>   m.a,
>   j.a,
>   m.b
> from Main m
> join Joined j on
>   j.a=m.a
> where
>   part > UDF2('part0') AND part = 'part1';
> Pruner doesn't work: 
> mapred.input.dir=hdfs://localhost:9000/hive-join-test/part1,hdfs://localhost:9000/hive-join-test/part2,hdfs://localhost:9000/hive-join-test/join
> 5) Also, let's try running the query with a MAPJOIN hint:
> select /*+MAPJOIN(j)*/ 
>   m.a,
>   j.a,
>   m.b
> from Main m
> join Joined j on
>   j.a=m.a
> where
>   part > UDF2('part0') AND part = 'part1';
> The result is the same, pruner doesn't work: 
> mapred.input.dir=hdfs://localhost:9000/hive-join-test/part1,hdfs://localhost:9000/hive-join-test/part2
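The behavior reported above suggests the pruner gives up on the whole predicate once any non-deterministic function appears. A small sketch of why checking each AND conjunct separately would preserve pruning; the node shape and names are illustrative, not the actual PartitionPruner code:

```java
import java.util.List;

public class PrunerSketch {
    // Minimal expression node: an operator/function with a determinism flag
    // and child expressions (leaves have no children).
    public record Expr(String name, boolean deterministic, List<Expr> children) {
        // An expression is usable for compile-time pruning only if every
        // function in its subtree is deterministic.
        public boolean allDeterministic() {
            return deterministic && children.stream().allMatch(Expr::allDeterministic);
        }
    }

    public static void main(String[] args) {
        Expr udf2   = new Expr("UDF2", false, List.of());      // non-deterministic UDF
        Expr partGt = new Expr(">", true, List.of(udf2));      // part > UDF2('part0')
        Expr partEq = new Expr("=", true, List.of());          // part = 'part1'
        Expr and    = new Expr("AND", true, List.of(partGt, partEq));

        // Giving up on the whole conjunct loses the usable part = 'part1' filter:
        System.out.println(and.allDeterministic());    // false
        // Checking each AND child separately keeps it:
        System.out.println(partEq.allDeterministic()); // true
    }
}
```

Under this view, the query with the join fails to prune only because the combined predicate is rejected wholesale instead of per conjunct.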




[jira] Commented: (HIVE-1158) Introducing a new parameter for Map-side join bucket size

2010-02-17 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834784#action_12834784
 ] 

Namit Jain commented on HIVE-1158:
--

+1

0.5 patch looks good - will commit if the tests pass

> Introducing a new parameter for Map-side join bucket size
> -
>
> Key: HIVE-1158
> URL: https://issues.apache.org/jira/browse/HIVE-1158
> Project: Hadoop Hive
>  Issue Type: Improvement
>Affects Versions: 0.5.0, 0.6.0
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-1158.patch, HIVE-1158_branch_0_5.patch
>
>




[jira] Commented: (HIVE-1173) Partition pruner cancels pruning if non-deterministic function present in filtering expression only in joins is present in query

2010-02-17 Thread Vladimir Klimontovich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834710#action_12834710
 ] 

Vladimir Klimontovich commented on HIVE-1173:
-

I just tried the condition "part = 'part1' AND part > UDF2('part0')". The query 
plan remained the same.

> Partition pruner cancels pruning if non-deterministic function present in 
> filtering expression only in joins is present in query
> 
>
> Key: HIVE-1173
> URL: https://issues.apache.org/jira/browse/HIVE-1173
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.4.0, 0.4.1
>Reporter: Vladimir Klimontovich
>




[jira] Commented: (HIVE-1168) Fix Hive build on Hudson

2010-02-17 Thread Johan Oskarsson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834708#action_12834708
 ] 

Johan Oskarsson commented on HIVE-1168:
---

Apologies for not keeping an eye on it; I have been away for a few months. I 
poked the Hudson machine after an email from John, and the Hive build against 
Hadoop 0.17 finished OK. The other two will run as scheduled in a few hours, 
and I bet they will run fine.

> Fix Hive build on Hudson
> 
>
> Key: HIVE-1168
> URL: https://issues.apache.org/jira/browse/HIVE-1168
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Build Infrastructure
>Reporter: Carl Steinbach
>Assignee: John Sichi
>Priority: Critical
>




[jira] Updated: (HIVE-1158) Introducing a new parameter for Map-side join bucket size

2010-02-17 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1158:
-

Status: Patch Available  (was: Reopened)

All unit tests passed.

> Introducing a new parameter for Map-side join bucket size
> -
>
> Key: HIVE-1158
> URL: https://issues.apache.org/jira/browse/HIVE-1158
> Project: Hadoop Hive
>  Issue Type: Improvement
>Affects Versions: 0.5.0, 0.6.0
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-1158.patch, HIVE-1158_branch_0_5.patch
>
>




[jira] Updated: (HIVE-1158) Introducing a new parameter for Map-side join bucket size

2010-02-17 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1158:
-

Attachment: HIVE-1158_branch_0_5.patch

Uploading HIVE-1158_branch_0_5.patch for branch 0.5. This patch includes 
changes pulled from other patches in trunk to make the backport possible. 

Still running unit tests, but it seems all relevant tests have passed. I will 
update the test results once they are done. 

> Introducing a new parameter for Map-side join bucket size
> -
>
> Key: HIVE-1158
> URL: https://issues.apache.org/jira/browse/HIVE-1158
> Project: Hadoop Hive
>  Issue Type: Improvement
>Affects Versions: 0.5.0, 0.6.0
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-1158.patch, HIVE-1158_branch_0_5.patch
>
>
