[jira] Updated: (HIVE-1212) Explicitly say "Hive Internal Error" to ease debugging

2010-03-04 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1212:
-

Attachment: HIVE-1212.1.patch

> Explicitly say "Hive Internal Error" to ease debugging
> --
>
> Key: HIVE-1212
> URL: https://issues.apache.org/jira/browse/HIVE-1212
> Project: Hadoop Hive
>  Issue Type: Improvement
>Affects Versions: 0.6.0
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Attachments: HIVE-1212.1.patch
>
>
> Our users complain that Hive fails with error messages like "FAILED: Unknown 
> exception: null".
> We should explicitly say that this is an internal error of Hive, and show 
> more information (the stack trace) on the screen to ease bug reporting and 
> debugging.
> In other cases, we will still put the detailed information (stack trace) only in 
> the log, since users should be able to figure out what's wrong from a 
> single-line message.
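
The split described above can be sketched roughly as follows. This is only an illustration, not the attached patch: the class ErrorReportSketch, the nested UserFacingException and the exact message layout are hypothetical names chosen for the example.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

final class ErrorReportSketch {
    /** Hypothetical stand-in for an expected failure caused by the user's query. */
    static class UserFacingException extends Exception {
        UserFacingException(String message) {
            super(message);
        }
    }

    /** Builds the text shown on the console for a failed query. */
    static String describe(Throwable t) {
        if (t instanceof UserFacingException) {
            // Expected failure: a single line is enough, details go to the log only.
            return "FAILED: " + t.getMessage();
        }
        // Anything else is treated as an internal error: label it and show the trace.
        StringWriter trace = new StringWriter();
        t.printStackTrace(new PrintWriter(trace, true));
        return "FAILED: Hive Internal Error: " + t.getClass().getName()
                + "(" + t.getMessage() + ")\n" + trace;
    }

    public static void main(String[] args) {
        System.out.println(describe(new UserFacingException("Table not found: pokes")));
        System.out.println(describe(new NullPointerException()));
    }
}
```

With this shape, an unexpected NullPointerException is reported with its class name and stack trace instead of the bare "Unknown exception: null".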

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1212) Explicitly say "Hive Internal Error" to ease debugging

2010-03-04 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1212:
-

Attachment: (was: HIVE-1212.1.patch)




[jira] Updated: (HIVE-1212) Explicitly say "Hive Internal Error" to ease debugging

2010-03-04 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1212:
-

Attachment: HIVE-1212.1.patch

This also fixes UDFArgumentException reporting.





[jira] Updated: (HIVE-1212) Explicitly say "Hive Internal Error" to ease debugging

2010-03-04 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1212:
-

Affects Version/s: 0.6.0
   Status: Patch Available  (was: Open)




[jira] Commented: (HIVE-224) implement lfu based flushing policy for map side aggregates

2010-03-04 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841714#action_12841714
 ] 

Zheng Shao commented on HIVE-224:
-

Hi James, currently we don't have the bandwidth to do this, but I guess it 
won't be too hard - we just need to use 
http://java.sun.com/j2se/1.4.2/docs/api/java/util/LinkedHashMap.html (search 
for LRU).
Are you interested in joining forces on this?
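
For reference, the LinkedHashMap approach mentioned above could look roughly like this. It is a sketch under assumptions: the class name LruAggregationCache and the onFlush callback are hypothetical, and the policy is LRU (least recently used), which only approximates the LFU policy the issue asks for.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;

/** Bounded map that hands its least recently used entry to a callback when full. */
final class LruAggregationCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;

    private final int maxEntries;
    private final transient BiConsumer<K, V> onFlush; // hypothetical hook, e.g. forward the row to the reducer

    LruAggregationCache(int maxEntries, BiConsumer<K, V> onFlush) {
        super(16, 0.75f, true); // accessOrder = true: iteration order is least recently used first
        this.maxEntries = maxEntries;
        this.onFlush = onFlush;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        if (size() > maxEntries) {
            onFlush.accept(eldest.getKey(), eldest.getValue());
            return true; // tells LinkedHashMap to drop the eldest entry
        }
        return false;
    }

    public static void main(String[] args) {
        LruAggregationCache<String, Integer> cache =
                new LruAggregationCache<>(2, (k, v) -> System.out.println("flush " + k + "=" + v));
        cache.put("a", 1);
        cache.put("b", 1);
        cache.get("a");    // touching "a" makes "b" the eldest entry
        cache.put("c", 1); // exceeds the limit, so "b" is flushed
    }
}
```

A true LFU policy would additionally need a per-entry hit count; the map above only tracks recency.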


> implement lfu based flushing policy for map side aggregates
> ---
>
> Key: HIVE-224
> URL: https://issues.apache.org/jira/browse/HIVE-224
> Project: Hadoop Hive
>  Issue Type: Improvement
>Reporter: Joydeep Sen Sarma
>
> Currently we flush a random set of rows when the map-side hash table 
> approaches its memory limit.
> We have discussed a strategy of flushing the hash table entries that have 
> been seen the fewest times (effectively an LFU flushing strategy). This 
> should be very effective at reducing the amount of data sent from the map to 
> the reduce step, as well as reducing the chance of skew.




[jira] Commented: (HIVE-224) implement lfu based flushing policy for map side aggregates

2010-03-04 Thread James Warren (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841692#action_12841692
 ] 

James Warren commented on HIVE-224:
---

I think I bumped up against this or a related issue today - are there any plans 
to incorporate this into a future release?

thanks,
-James




storage handlers and HBase integration

2010-03-04 Thread John Sichi
Hey folks,

In case you're not following the action over at HIVE-705, we're getting close 
to having HBase integration committed into Hive.  I've written up docs here:

http://wiki.apache.org/hadoop/Hive/HBaseIntegration
http://wiki.apache.org/hadoop/Hive/StorageHandlers

(If you happened to read the first draft of the HBaseIntegration doc a few days 
ago, I've made a lot of updates today to fill out the details on column 
mapping.)

As part of commit, we'll be doing some code reviews within Facebook next week 
and logging a bunch of followup tasks; if you have any comments on the approach 
or implementation, please pile on in JIRA.

I'll be giving this a quick mention at the Hive user's group later this month, 
and then a more detailed presentation at the HBase User Group meeting in April.

JVS



[jira] Updated: (HIVE-705) Let Hive can analyse hbase's tables

2010-03-04 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-705:


Attachment: HIVE-705.3.patch

HIVE-705.3.patch resolves a conflict with trunk, fixes some serde bugs, and 
adds more tests.


> Let Hive can analyse hbase's tables
> ---
>
> Key: HIVE-705
> URL: https://issues.apache.org/jira/browse/HIVE-705
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.6.0
>Reporter: Samuel Guo
>Assignee: John Sichi
> Fix For: 0.6.0
>
> Attachments: hbase-0.19.3-test.jar, hbase-0.19.3.jar, 
> hbase-0.20.3-test.jar, hbase-0.20.3.jar, HIVE-705.1.patch, HIVE-705.2.patch, 
> HIVE-705.3.patch, HIVE-705_draft.patch, HIVE-705_revision806905.patch, 
> HIVE-705_revision883033.patch, zookeeper-3.2.2.jar
>
>
> Add a serde over HBase tables, so that Hive can easily analyse the data 
> stored in HBase.




[jira] Resolved: (HIVE-1194) sorted merge join

2010-03-04 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain resolved HIVE-1194.
--

  Resolution: Fixed
Hadoop Flags: [Reviewed]

Committed. Thanks Yongqiang

> sorted merge join
> -
>
> Key: HIVE-1194
> URL: https://issues.apache.org/jira/browse/HIVE-1194
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: He Yongqiang
> Fix For: 0.6.0
>
> Attachments: hive-1194-2010-02-28.patch, hive-1194-2010-3-2.2.patch, 
> hive-1194-2010-3-2.patch, hive-1194-2010-3-3-2.patch, 
> hive-1194-2010-3-3.patch, hive-1194-2010-3-4.patch
>
>
> If the input tables are sorted on the join key, and a mapjoin is being 
> performed, it is useful to exploit the sorted properties of the tables.
> This can lead to substantial CPU savings - this needs to work across bucketed 
> map joins also.
> Since sorted properties of a table are not currently enforced, a new 
> parameter can be added to request the sort-merge join.
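
The idea of exploiting sorted inputs can be illustrated with a small merge join over two key lists that are already sorted. This is a generic sketch, not the committed operator: the class SortMergeJoinSketch is a hypothetical name, and rows are reduced to their integer join keys to keep the example short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SortMergeJoinSketch {
    /** Joins two lists of keys, each sorted ascending, in a single forward pass. */
    static List<String> join(List<Integer> left, List<Integer> right) {
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            int cmp = Integer.compare(left.get(i), right.get(j));
            if (cmp < 0) {
                i++; // left key is smaller, it cannot match anything further right
            } else if (cmp > 0) {
                j++; // right key is smaller
            } else {
                int key = left.get(i);
                // Find the run of equal keys on each side.
                int iEnd = i;
                while (iEnd < left.size() && left.get(iEnd) == key) {
                    iEnd++;
                }
                int jEnd = j;
                while (jEnd < right.size() && right.get(jEnd) == key) {
                    jEnd++;
                }
                // Emit the cross product of the two runs.
                for (int a = i; a < iEnd; a++) {
                    for (int b = j; b < jEnd; b++) {
                        out.add(key + "\t" + key);
                    }
                }
                i = iEnd;
                j = jEnd;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> left = Arrays.asList(1, 2, 2, 3);
        List<Integer> right = Arrays.asList(2, 2, 2, 4);
        join(left, right).forEach(System.out::println);
    }
}
```

No hash table is built on either side, which is where the CPU savings come from; the price is that both inputs really must be sorted on the join key.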




[jira] Commented: (HIVE-1215) bogus assertion in GroupByOperator.initializeOp

2010-03-04 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841607#action_12841607
 ] 

Edward Capriolo commented on HIVE-1215:
---

Yes, you can reproduce this with a simple edit in build-common.xml and a .q 
file that runs select count(1) from src;

{noformat}
 



  
  
  
  
{noformat}

This should be easy to add.

or 


> bogus assertion in GroupByOperator.initializeOp
> ---
>
> Key: HIVE-1215
> URL: https://issues.apache.org/jira/browse/HIVE-1215
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.5.0
>Reporter: John Sichi
>Assignee: Edward Capriolo
> Fix For: 0.6.0
>
>
> export HADOOP_OPTS="-ea"
> and then run the following query in Hive:
> select count(1) from pokes;
> This causes an assertion failure:
> Caused by: java.lang.AssertionError
>   at 
> org.apache.hadoop.hive.ql.exec.GroupByOperator.initializeOp(GroupByOperator.java:161)
>   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:344)
>   at 
> org.apache.hadoop.hive.ql.exec.ExecReducer.configure(ExecReducer.java:143)
>   ... 10 more




[jira] Assigned: (HIVE-1215) bogus assertion in GroupByOperator.initializeOp

2010-03-04 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo reassigned HIVE-1215:
-

Assignee: Edward Capriolo




[jira] Commented: (HIVE-1095) Hive in Maven

2010-03-04 Thread Gerrit Jansen van Vuuren (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841600#action_12841600
 ] 

Gerrit Jansen van Vuuren commented on HIVE-1095:


What I said was a bit confusing, I've realized :) What I meant was:

There are two parts:

 -> Committing (if reviewed and accepted) the HIVE-1095-trunk.patch to 
trunk. This patch is generated against trunk. 

 -> Using the HIVE-1095-0.4.1.patch against version 0.4.1 of Hive to 
generate the Maven artifacts for Hive 0.4.1. Are commits allowed for already 
versioned releases? If so, it would be better to have it committed, because 
any changes to build.xml, ivy.xml or build-common.xml would mean that the patch 
needs regeneration.

So the broad scope and idea would be to publish the already released Hive 
versions to the Maven repo:
 0.3.0 
 0.4.0
 0.4.1 
 0.5.0
and then have the build in trunk, so that when another release is made the 
Maven publishing code is already in the build and it's only necessary to run 
ant maven-publish.

By writing this I've realized that I probably need to generate the patches for 
the builds of the other versions of Hive as well. Should I do this and attach 
them to this task?





> Hive in Maven
> -
>
> Key: HIVE-1095
> URL: https://issues.apache.org/jira/browse/HIVE-1095
> Project: Hadoop Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Affects Versions: 0.6.0
>Reporter: Gerrit Jansen van Vuuren
>Priority: Minor
> Fix For: 0.6.0
>
> Attachments: HIVE-1095-0.4.1.patch, HIVE-1095-Sample.patch, 
> HIVE-1095-trunk.patch
>
>
> Getting hive into maven main repositories
> Documentation on how to do this is on:
> http://maven.apache.org/guides/mini/guide-central-repository-upload.html




[jira] Commented: (HIVE-1215) bogus assertion in GroupByOperator.initializeOp

2010-03-04 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841591#action_12841591
 ] 

John Sichi commented on HIVE-1215:
--

We have tests for such queries, and we enable assertions in the junit ant task, 
but we do not seem to be enabling assertions for the forked JVMs which execute 
the plan.  We should find a way to address this in order to get more coverage 
for assertions (and also to expose any more bogus ones).
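
One way to check from inside a JVM whether assertions are actually on is the usual assert-with-side-effect idiom, sketched below. The class name AssertionProbe is hypothetical; a forked task JVM could log this value at startup to show whether -ea reached it.

```java
final class AssertionProbe {
    /** Returns true only if this class runs with assertions enabled (for example via -ea). */
    static boolean assertionsEnabled() {
        boolean enabled = false;
        // The assignment is evaluated only when assertions are enabled for this class.
        assert enabled = true;
        return enabled;
    }

    public static void main(String[] args) {
        System.out.println("assertions enabled: " + assertionsEnabled());
    }
}
```

Running the class with and without -ea on the command line shows the two outcomes.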





[jira] Created: (HIVE-1215) bogus assertion in GroupByOperator.initializeOp

2010-03-04 Thread John Sichi (JIRA)
bogus assertion in GroupByOperator.initializeOp
---

 Key: HIVE-1215
 URL: https://issues.apache.org/jira/browse/HIVE-1215
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.5.0
Reporter: John Sichi
 Fix For: 0.6.0


export HADOOP_OPTS="-ea"

and then run the following query in Hive:

select count(1) from pokes;

This causes an assertion failure:

Caused by: java.lang.AssertionError
at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.initializeOp(GroupByOperator.java:161)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:344)
at 
org.apache.hadoop.hive.ql.exec.ExecReducer.configure(ExecReducer.java:143)
... 10 more





[jira] Updated: (HIVE-1211) Tapping logs from child processes

2010-03-04 Thread bc Wong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bc Wong updated HIVE-1211:
--

Status: Patch Available  (was: Open)

> Tapping logs from child processes
> -
>
> Key: HIVE-1211
> URL: https://issues.apache.org/jira/browse/HIVE-1211
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Logging
>Reporter: bc Wong
> Attachments: HIVE-1211.1.patch
>
>
> Stdout/stderr from child processes (e.g. {{MapRedTask}}) are redirected to 
> the parent's stdout/stderr. There is little one can do to sort out which 
> log is from which query.




[jira] Commented: (HIVE-1095) Hive in Maven

2010-03-04 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841573#action_12841573
 ] 

He Yongqiang commented on HIVE-1095:


>>Would somebody be able to apply this patch after review (not commit) 
Does this mean the patch needs to be regenerated after any conflicting changes? 




[jira] Updated: (HIVE-1200) Fix CombineHiveInputFormat to work with multi-level of directories in a single table/partition

2010-03-04 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1200:
-

   Resolution: Fixed
Fix Version/s: 0.6.0
   0.5.1
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

Committed to both trunk and 0.5. Thanks Zheng

> Fix CombineHiveInputFormat to work with multi-level of directories in a 
> single table/partition
> --
>
> Key: HIVE-1200
> URL: https://issues.apache.org/jira/browse/HIVE-1200
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.5.1, 0.6.0
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Fix For: 0.5.1, 0.6.0
>
> Attachments: HIVE-1200.1.branch-0.5.patch, HIVE-1200.1.patch
>
>
> The CombineHiveInputFormat does not work with multiple levels of directories 
> in a single table/partition, because it uses exact-match logic instead of the 
> relativize logic used in MapOperator:
> {code}
> MapOperator.java:
>   if (!onepath.toUri().relativize(fpath.toUri()).equals(fpath.toUri())) {
> {code}
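
The difference between the two checks can be seen with plain java.net.URI, outside of Hive. This is a minimal sketch: the class RelativizeSketch and the hdfs://nn/... paths are made up for illustration, and only the relativize expression mirrors the quoted MapOperator line.

```java
import java.net.URI;

final class RelativizeSketch {
    /**
     * True if file lies anywhere below dir. URI.relativize returns the given URI
     * unchanged when dir is not a prefix of it, so "changed" means "is below dir".
     */
    static boolean isUnder(URI dir, URI file) {
        return !dir.relativize(file).equals(file);
    }

    public static void main(String[] args) {
        URI table = URI.create("hdfs://nn/warehouse/t");
        URI nested = URI.create("hdfs://nn/warehouse/t/ds=1/hr=2/part-0");
        URI other = URI.create("hdfs://nn/warehouse/t2/part-0");

        System.out.println(isUnder(table, nested)); // file two levels below the table directory
        System.out.println(isUnder(table, other));  // sibling directory with a similar name
    }
}
```

An exact match of the file's parent directory against the table path would reject the nested file, which is the failure described in this issue.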




[jira] Commented: (HIVE-1194) sorted merge join

2010-03-04 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841550#action_12841550
 ] 

Namit Jain commented on HIVE-1194:
--

+1

looks good - will commit if the tests pass




[jira] Updated: (HIVE-1194) sorted merge join

2010-03-04 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1194:
---

Attachment: hive-1194-2010-3-4.patch

attached a new patch 




[jira] Commented: (HIVE-1194) sorted merge join

2010-03-04 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841442#action_12841442
 ] 

Namit Jain commented on HIVE-1194:
--

I know - the log file is correct, but when I run the tests, I get a diff.




[jira] Updated: (HIVE-936) dynamic partitions creation based on values

2010-03-04 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-936:


Attachment: (was: dp_design.txt)

> dynamic partitions creation based on values
> ---
>
> Key: HIVE-936
> URL: https://issues.apache.org/jira/browse/HIVE-936
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: dp_design.txt
>
>
> If a Hive table is created as partitioned, DML can only insert into one 
> partition per query. Ideally partitions should be created on the fly based 
> on the value of the partition columns. As an example:
> {{{
>   create table T (a int, b string) partitioned by (ds string);
>   insert overwrite table T select a, b, ds from S where ds >= '2009-11-01' 
> and ds <= '2009-11-16';
> }}}
> should be able to execute in one DML statement rather than possibly 16 DML 
> statements, one for each distinct ds value. CTAS and alter table should be 
> able to do the same thing:
> {{{
>   create table T partitioned by (ds string) as select * from S where ds >= 
> '2009-11-01' and ds <= '2009-11-16';
> }}}
>  and
> {{{
>   create table T(a int, b string, ds string);
>   insert overwrite table T select * from S where ds >= '2009-11-01' and ds <= 
> '2009-11-16';
>   alter table T partitioned by (ds);
> }}}
> should all return the same results.
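
Conceptually, the single insert above has to route each row to a partition chosen by its ds value. The sketch below shows only that routing step in plain Java; DynamicPartitionSketch and the "ds=..." directory naming are assumptions made for the example, not the design in dp_design.txt.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class DynamicPartitionSketch {
    /** Groups rows of the form (a, b, ds) by the value of the trailing ds column. */
    static Map<String, List<String[]>> route(List<String[]> rows) {
        Map<String, List<String[]>> partitions = new TreeMap<>();
        for (String[] row : rows) {
            String ds = row[row.length - 1];
            // The partition column is not stored with the data columns.
            String[] data = Arrays.copyOf(row, row.length - 1);
            partitions.computeIfAbsent("ds=" + ds, k -> new ArrayList<>()).add(data);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"1", "x", "2009-11-01"},
                new String[] {"2", "y", "2009-11-02"},
                new String[] {"3", "z", "2009-11-01"});
        route(rows).forEach((partition, data) ->
                System.out.println(partition + " -> " + data.size() + " row(s)"));
    }
}
```

Each distinct ds value ends up as its own partition, created on first use, without a separate statement per value.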




[jira] Updated: (HIVE-936) dynamic partitions creation based on values

2010-03-04 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-936:


Attachment: dp_design.txt

Updated design notes after a group discussion.




[jira] Commented: (HIVE-1194) sorted merge join

2010-03-04 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841414#action_12841414
 ] 

He Yongqiang commented on HIVE-1194:


@namit,

The join results for key 498 are in the output:

496 val_496 496 val_496
498 val_498 498 val_498
498 val_498 498 val_498
498 val_498 498 val_498
498 val_498 498 val_498
498 val_498 498 val_498
498 val_498 498 val_498
498 val_498 498 val_498
498 val_498 498 val_498
498 val_498 498 val_498
5   val_5   5   val_5
5   val_5   5   val_5
5   val_5   5   val_5
5   val_5   5   val_5
5   val_5   5   val_5
5   val_5   5   val_5
5   val_5   5   val_5
5   val_5   5   val_5
5   val_5   5   val_5
9   val_9   9   val_9


I will add an automatic check query to the test and upload a new patch.
