[jira] Commented: (HIVE-617) Script to start classes with hadoop and hive environment

2009-07-13 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730720#action_12730720
 ] 

Zheng Shao commented on HIVE-617:
-

Looks great!
Can you include TestHive.java in the patch (and put it in some package such as 
org.apache.hadoop.hive.examples), and then invoke TestHive with "ant" in 
the "test" target (or maybe add a "test_scripts" target)?


> Script to start classes with hadoop and hive environment
> 
>
> Key: HIVE-617
> URL: https://issues.apache.org/jira/browse/HIVE-617
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Clients
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
>Priority: Minor
> Attachments: hive-617.2.patch, hive-617.patch, TestHive.java
>
>
> At times it may be required to write a process that uses both the Hadoop and 
> Hive environment and API. For example, someone may write an application that 
> uses the Hive API directly. This patch will add a more generic --jar 
> extension that can start any class with the proper environment.  
> RUNJAR=/opt/hive/lib/hive_hwi.jar RUNCLASS=test.TestHive hive --service jar

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-625) Use of BinarySortableSerDe for serialization of the value between map and reduce boundary

2009-07-13 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730713#action_12730713
 ] 

Namit Jain commented on HIVE-625:
-

It is difficult to generalize - think about a script operator which consumes 
most of the rows, similar to a filter.

We can make it configurable, but I am not sure what the default should be. In 
most of the cases, BinarySortableSerDe should be better.


> Use of BinarySortableSerDe for serialization of the value between map and 
> reduce boundary
> -
>
> Key: HIVE-625
> URL: https://issues.apache.org/jira/browse/HIVE-625
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Attachments: HIVE-625.1.patch
>
>
> We currently use LazySimpleSerDe, which serializes doubles to text format. 
> Until we have LazyBinarySerDe, we should switch to BinarySortableSerDe, 
> because that is still much faster than LazySimpleSerDe.
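
To illustrate that claim with a standalone example (this is not the actual SerDe code): serializing a double as text means formatting and later re-parsing a variable-length decimal string, while a binary serialization is a fixed 8 bytes.

{code}
// Standalone illustration only; not LazySimpleSerDe/BinarySortableSerDe code.
import java.nio.ByteBuffer;

public class DoubleSerializationSketch {
  public static void main(String[] args) {
    double d = 12345.6789;

    byte[] asText = Double.toString(d).getBytes();                   // formatted, variable length
    byte[] asBinary = ByteBuffer.allocate(8).putDouble(d).array();   // fixed 8 bytes, no parsing

    System.out.println("text bytes:   " + asText.length);            // 10 for this value
    System.out.println("binary bytes: " + asBinary.length);          // always 8
  }
}
{code}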

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-632) conditionalresolver and conditionalresolverctx should be serializable

2009-07-13 Thread Namit Jain (JIRA)
conditionalresolver and conditionalresolverctx should be serializable
---

 Key: HIVE-632
 URL: https://issues.apache.org/jira/browse/HIVE-632
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Fix For: 0.4.0


In the future, if query plans are cached, the resolver and the context will 
also need to be cached. Therefore, they need to be serializable.
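
A rough sketch of what the change amounts to (class and field names here are illustrative, not from an actual patch): the resolver and its context just need to implement java.io.Serializable so they can be written out along with a cached plan.

{code}
// Illustrative sketch only; the real ConditionalResolver/context classes may differ.
import java.io.Serializable;

public class ResolverSketch implements Serializable {
  private static final long serialVersionUID = 1L;

  // any state the resolver needs at plan-execution time must itself be
  // serializable so that caching the query plan also captures it
  private String exampleState;
}
{code}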

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-592) renaming internal table should rename HDFS and also change path of the table and partitions accordingly.

2009-07-13 Thread Prasad Chakka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Chakka updated HIVE-592:
---

Fix Version/s: 0.4.0
Affects Version/s: 0.4.0
   0.3.1
   0.3.0
   0.2.0
   Status: Patch Available  (was: Open)

The fix is a little bit involved since we don't want to overwrite existing data.
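
A rough sketch of the intent (not the actual patch): move the table's HDFS directory as part of the rename, refusing to clobber a pre-existing target directory.

{code}
// Illustrative sketch only, using plain Hadoop FileSystem calls; the real fix
// also has to update the table and partition paths stored in the metastore.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RenameTableDirSketch {
  public static void renameTableDir(Configuration conf, Path oldDir, Path newDir)
      throws Exception {
    FileSystem fs = oldDir.getFileSystem(conf);
    if (fs.exists(newDir)) {
      // the "don't want to overwrite existing data" concern from the comment above
      throw new RuntimeException("Target directory already exists: " + newDir);
    }
    if (!fs.rename(oldDir, newDir)) {
      throw new RuntimeException("Rename failed: " + oldDir + " -> " + newDir);
    }
  }
}
{code}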

> renaming internal table should rename HDFS and also change path of the table 
> and partitions accordingly.
> 
>
> Key: HIVE-592
> URL: https://issues.apache.org/jira/browse/HIVE-592
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.2.0, 0.3.0, 0.3.1, 0.4.0
>Reporter: Prasad Chakka
>Assignee: Prasad Chakka
> Fix For: 0.4.0
>
> Attachments: hive-592.2.patch
>
>
> rename table changes just the name of the table in the metastore but not in 
> hdfs. So if a table with the old name is created, it uses the hdfs directory 
> pointing to the renamed table.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-592) renaming internal table should rename HDFS and also change path of the table and partitions accordingly.

2009-07-13 Thread Prasad Chakka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Chakka updated HIVE-592:
---

Attachment: hive-592.2.patch

> renaming internal table should rename HDFS and also change path of the table 
> and partitions accordingly.
> 
>
> Key: HIVE-592
> URL: https://issues.apache.org/jira/browse/HIVE-592
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Prasad Chakka
>Assignee: Prasad Chakka
> Attachments: hive-592.2.patch
>
>
> rename table changes just the name of the table in the metastore but not in 
> hdfs. So if a table with the old name is created, it uses the hdfs directory 
> pointing to the renamed table.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-493) automatically infer existing partitions of table from HDFS files.

2009-07-13 Thread Prasad Chakka (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730693#action_12730693
 ] 

Prasad Chakka commented on HIVE-493:


@edward, you can write an external utility that keeps checking directories and 
compacting them as necessary. I don't understand 'virtual read only schema'.
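
A minimal sketch of such an external utility (paths and the threshold are made up for illustration): walk a table's warehouse directory and report partition directories with many small files, i.e. compaction candidates.

{code}
// Illustrative sketch only; not part of Hive itself.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PartitionCompactionScanner {
  public static void main(String[] args) throws Exception {
    Path tableDir = new Path(args[0]);   // e.g. /user/hive/warehouse/mytable
    FileSystem fs = tableDir.getFileSystem(new Configuration());
    for (FileStatus part : fs.listStatus(tableDir)) {
      if (!part.isDir()) {
        continue;                        // partitions are directories like ds=2009-07-13
      }
      int numFiles = fs.listStatus(part.getPath()).length;
      if (numFiles > 100) {              // arbitrary threshold for illustration
        System.out.println("compaction candidate: " + part.getPath()
            + " (" + numFiles + " files)");
      }
    }
  }
}
{code}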

> automatically infer existing partitions of table from HDFS files.
> -
>
> Key: HIVE-493
> URL: https://issues.apache.org/jira/browse/HIVE-493
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Metastore, Query Processor
>Affects Versions: 0.3.0, 0.3.1, 0.4.0
>Reporter: Prasad Chakka
>
> Initially, the partition list for a table was inferred from the HDFS directory 
> structure instead of looking into the metastore (partitions are created using 
> 'alter table ... add partition'). But this automatic inference was removed in 
> favor of the latter approach when the metastore checker feature was checked in, 
> and also to facilitate external partitions.
> Joydeep and Frederick mentioned that it would be simpler for users to create the 
> HDFS directory and let Hive infer the partition rather than explicitly add one. 
> But doing that raises the following issues:
> 1) External partitions -- we would have to mix both approaches, so the partition 
> list becomes a merged list of inferred and registered partitions, and 
> duplicates have to be resolved.
> 2) Partition-level schemas can't be supported. Which schema do we choose for the 
> inferred partitions? The table schema at the time the inferred partition was 
> created, or the latest table schema? How do we know the table schema at the time 
> the inferred partition was created?
> 3) If partitions have to be registered, a partition can be disabled without 
> actually deleting the data. This feature is not supported yet and may not be that 
> useful, but nevertheless it can't be supported with inferred partitions.
> 4) Indexes are being added, so if partitions are not registered, indexes 
> for such partitions cannot be maintained automatically.
> I would like to know what the general thinking about this is among users of 
> Hive. If inferred partitions are preferred, can we live with the restricted 
> functionality that this imposes?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-405) Operators should pass ObjectInspector in init instead of forward

2009-07-13 Thread Prasad Chakka (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730685#action_12730685
 ] 

Prasad Chakka commented on HIVE-405:


are you sure about it? that doesn't seem to be correct. parentsObjectInspector 
doesn't seem to have been set in some cases, especially for TableScan (or 
topOps)

> Operators should pass ObjectInspector in init instead of forward
> 
>
> Key: HIVE-405
> URL: https://issues.apache.org/jira/browse/HIVE-405
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.4.0
>Reporter: Zheng Shao
>Assignee: Prasad Chakka
>Priority: Critical
> Attachments: hive-405.patch
>
>
> We are always passing the same ObjectInspector, so there is no need to pass 
> it again and again in forward.
> Also there is a problem that can ONLY be fixed by passing ObjectInspector in 
> init: Outer Joins - Outer Joins may not be able to get ObjectInspectors for 
> all inputs, as a result, there is no way to construct an output 
> ObjectInspector based on the inputs. Currently we have hard-coded code that 
> assumes joins are always outputting Strings, which did break but was hidden 
> by the old framework (because we do toString() when serializing the output, 
> and toString() is defined for all Java Classes).
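
To make the idea concrete, a conceptual sketch (this is not the actual org.apache.hadoop.hive.ql.exec.Operator API): the ObjectInspectors of all parents are handed over once at initialization, so forward() only carries the row.

{code}
// Conceptual sketch only; real Operator signatures in the patch may differ.
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;

interface OperatorSketch {
  // called once per operator before any rows flow; an outer join can build its
  // output ObjectInspector here because every input's inspector is available
  void initialize(ObjectInspector[] parentObjectInspectors) throws Exception;

  // called once per row; no ObjectInspector travels with each call
  void forward(Object row) throws Exception;
}
{code}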

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-405) Operators should pass ObjectInspector in init instead of forward

2009-07-13 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730673#action_12730673
 ] 

Zheng Shao commented on HIVE-405:
-

I think Operator.inputObjectInspectors and Operator.parentsObjectInspector 
contain the same information.
So we don't need to add inputObjectInspectors to Operator.


> Operators should pass ObjectInspector in init instead of forward
> 
>
> Key: HIVE-405
> URL: https://issues.apache.org/jira/browse/HIVE-405
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.4.0
>Reporter: Zheng Shao
>Assignee: Prasad Chakka
>Priority: Critical
> Attachments: hive-405.patch
>
>
> We are always passing the same ObjectInspector, so there is no need to pass 
> it again and again in forward.
> Also there is a problem that can ONLY be fixed by passing ObjectInspector in 
> init: Outer Joins - Outer Joins may not be able to get ObjectInspectors for 
> all inputs, as a result, there is no way to construct an output 
> ObjectInspector based on the inputs. Currently we have hard-coded code that 
> assumes joins are always outputting Strings, which did break but was hidden 
> by the old framework (because we do toString() when serializing the output, 
> and toString() is defined for all Java Classes).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-405) Operators should pass ObjectInspector in init instead of forward

2009-07-13 Thread Prasad Chakka (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730670#action_12730670
 ] 

Prasad Chakka commented on HIVE-405:


why? the name of the argument to initialize() uses input. isn't input same as 
parents?

> Operators should pass ObjectInspector in init instead of forward
> 
>
> Key: HIVE-405
> URL: https://issues.apache.org/jira/browse/HIVE-405
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.4.0
>Reporter: Zheng Shao
>Assignee: Prasad Chakka
>Priority: Critical
> Attachments: hive-405.patch
>
>
> We are always passing the same ObjectInspector, so there is no need to pass 
> it again and again in forward.
> Also there is a problem that can ONLY be fixed by passing ObjectInspector in 
> init: Outer Joins - Outer Joins may not be able to get ObjectInspectors for 
> all inputs, as a result, there is no way to construct an output 
> ObjectInspector based on the inputs. Currently we have hard-coded code that 
> assumes joins are always outputting Strings, which did break but was hidden 
> by the old framework (because we do toString() when serializing the output, 
> and toString() is defined for all Java Classes).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-405) Operators should pass ObjectInspector in init instead of forward

2009-07-13 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730665#action_12730665
 ] 

Zheng Shao commented on HIVE-405:
-

@hive-405.patch:

Please use "parentsObjectInspector" instead of adding "inputObjectInspectors" 
to Operator.java.



> Operators should pass ObjectInspector in init instead of forward
> 
>
> Key: HIVE-405
> URL: https://issues.apache.org/jira/browse/HIVE-405
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.4.0
>Reporter: Zheng Shao
>Assignee: Prasad Chakka
>Priority: Critical
> Attachments: hive-405.patch
>
>
> We are always passing the same ObjectInspector, so there is no need to pass 
> it again and again in forward.
> Also there is a problem that can ONLY be fixed by passing ObjectInspector in 
> init: Outer Joins - Outer Joins may not be able to get ObjectInspectors for 
> all inputs, as a result, there is no way to construct an output 
> ObjectInspector based on the inputs. Currently we have hard-coded code that 
> assumes joins are always outputting Strings, which did break but was hidden 
> by the old framework (because we do toString() when serializing the output, 
> and toString() is defined for all Java Classes).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-605) add new datanucleus jars to .classpath

2009-07-13 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-605:


  Resolution: Fixed
Release Note: HIVE-605. Change eclipse classpath for datanucleus. (Prasad 
Chakka via zshao)
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Committed. Thanks Prasad.

> add new datanucleus jars to .classpath
> --
>
> Key: HIVE-605
> URL: https://issues.apache.org/jira/browse/HIVE-605
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Build Infrastructure
>Reporter: Prasad Chakka
>Assignee: Prasad Chakka
> Fix For: 0.4.0
>
> Attachments: hive-605.patch
>
>
> HIVE-445 replaced the jpox jars with upgraded versions. This JIRA is to fix 
> the .classpath so that eclipse works.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-405) Operators should pass ObjectInspector in init instead of forward

2009-07-13 Thread Prasad Chakka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Chakka updated HIVE-405:
---

Assignee: Prasad Chakka  (was: Zheng Shao)
  Status: Patch Available  (was: Open)

> Operators should pass ObjectInspector in init instead of forward
> 
>
> Key: HIVE-405
> URL: https://issues.apache.org/jira/browse/HIVE-405
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.4.0
>Reporter: Zheng Shao
>Assignee: Prasad Chakka
>Priority: Critical
> Attachments: hive-405.patch
>
>
> We are always passing the same ObjectInspector, so there is no need to pass 
> it again and again in forward.
> Also there is a problem that can ONLY be fixed by passing ObjectInspector in 
> init: Outer Joins - Outer Joins may not be able to get ObjectInspectors for 
> all inputs, as a result, there is no way to construct an output 
> ObjectInspector based on the inputs. Currently we have hard-coded code that 
> assumes joins are always outputting Strings, which did break but was hidden 
> by the old framework (because we do toString() when serializing the output, 
> and toString() is defined for all Java Classes).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-405) Operators should pass ObjectInspector in init instead of forward

2009-07-13 Thread Prasad Chakka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Chakka updated HIVE-405:
---

Attachment: hive-405.patch

> Operators should pass ObjectInspector in init instead of forward
> 
>
> Key: HIVE-405
> URL: https://issues.apache.org/jira/browse/HIVE-405
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.4.0
>Reporter: Zheng Shao
>Assignee: Zheng Shao
>Priority: Critical
> Attachments: hive-405.patch
>
>
> We are always passing the same ObjectInspector, so there is no need to pass 
> it again and again in forward.
> Also there is a problem that can ONLY be fixed by passing ObjectInspector in 
> init: Outer Joins - Outer Joins may not be able to get ObjectInspectors for 
> all inputs, as a result, there is no way to construct an output 
> ObjectInspector based on the inputs. Currently we have hard-coded code that 
> assumes joins are always outputting Strings, which did break but was hidden 
> by the old framework (because we do toString() when serializing the output, 
> and toString() is defined for all Java Classes).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HIVE-329) start and stop hive thrift server in daemon mode

2009-07-13 Thread Min Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Min Zhou reassigned HIVE-329:
-

Assignee: Min Zhou

> start and stop hive thrift server  in daemon mode
> -
>
> Key: HIVE-329
> URL: https://issues.apache.org/jira/browse/HIVE-329
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Server Infrastructure
>Affects Versions: 0.3.0
>Reporter: Min Zhou
>Assignee: Min Zhou
> Attachments: daemon.patch
>
>
> I wrote two shell scripts to make starting and stopping the hive thrift server more convenient.
> usage:
> bin/hive --service start-hive [HIVE_PORT]
> bin/hive --service stop-hive 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-623) Optimize length and reverse UDFs

2009-07-13 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730648#action_12730648
 ] 

Namit Jain commented on HIVE-623:
-

Can you make the check a function in 
./ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFUtils.java?

> Optimize length and reverse UDFs
> 
>
> Key: HIVE-623
> URL: https://issues.apache.org/jira/browse/HIVE-623
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Emil Ibrishimov
>Assignee: Emil Ibrishimov
>Priority: Minor
> Attachments: HIVE-623.1.patch, HIVE-623.2.patch, HIVE-623.3.patch
>
>
> Parse Text instead of converting to String in "length", "reverse" and other 
> UDFs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-623) Optimize length and reverse UDFs

2009-07-13 Thread Emil Ibrishimov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emil Ibrishimov updated HIVE-623:
-

Attachment: HIVE-623.3.patch

OK, now I check it with a bit mask.

> Optimize length and reverse UDFs
> 
>
> Key: HIVE-623
> URL: https://issues.apache.org/jira/browse/HIVE-623
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Emil Ibrishimov
>Assignee: Emil Ibrishimov
>Priority: Minor
> Attachments: HIVE-623.1.patch, HIVE-623.2.patch, HIVE-623.3.patch
>
>
> Parse Text instead of converting to String in "length", "reverse" and other 
> UDFs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-628) remove stray lines from hive-default.xml

2009-07-13 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-628:


   Resolution: Fixed
Fix Version/s: 0.4.0
 Release Note: HIVE-628. Remove stray lines from hive-default.xml. (Prasad 
Chakka via zshao) 
   Status: Resolved  (was: Patch Available)

Thanks Prasad.

> remove stray lines from hive-default.xml
> 
>
> Key: HIVE-628
> URL: https://issues.apache.org/jira/browse/HIVE-628
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Configuration
>Affects Versions: 0.4.0
>Reporter: Prasad Chakka
>Assignee: Prasad Chakka
> Fix For: 0.4.0
>
> Attachments: hive-628.patch
>
>
> some stray lines were left behind in the hive-default.xml from HIVE-610. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-629) concat files needed for map-reduce jobs also

2009-07-13 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730638#action_12730638
 ] 

Zheng Shao commented on HIVE-629:
-

Agreed. If we turn it on for map-reduce jobs, the threshold should be much 
lower - maybe 128MB per file or even 64MB per file.

This makes sure it only happens in rare cases (so it does not affect the 
performance of most queries).
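
A rough sketch of the kind of threshold check being discussed (names and numbers are illustrative only, not from a patch): trigger concatenation only when the average output file is well below the small-file threshold.

{code}
// Illustrative sketch only; the real feature would be driven by Hive configuration.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ConcatenateThresholdSketch {
  public static boolean shouldConcatenate(Path outputDir, long thresholdBytes)
      throws Exception {
    FileSystem fs = outputDir.getFileSystem(new Configuration());
    FileStatus[] files = fs.listStatus(outputDir);
    if (files == null || files.length <= 1) {
      return false;                      // nothing to merge
    }
    long totalBytes = 0;
    for (FileStatus f : files) {
      totalBytes += f.getLen();
    }
    // merge only when files are small on average, so most queries stay unaffected
    return (totalBytes / files.length) < thresholdBytes;
  }
}
{code}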


> concat files needed for map-reduce jobs also
> 
>
> Key: HIVE-629
> URL: https://issues.apache.org/jira/browse/HIVE-629
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.4.0
>Reporter: Namit Jain
>Assignee: Namit Jain
>
> Currently, hive concatenates files only if the job under consideration is a 
> map-only job. 
> I got requests from some users who want this behavior for map-reduce jobs 
> also - it may not be a good idea to turn it on by default.
> But we should provide an option to the user so that the concatenation can 
> happen even for map-reduce jobs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-623) Optimize length and reverse UDFs

2009-07-13 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730637#action_12730637
 ] 

Namit Jain commented on HIVE-623:
-

Otherwise, the code changes look good

> Optimize length and reverse UDFs
> 
>
> Key: HIVE-623
> URL: https://issues.apache.org/jira/browse/HIVE-623
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Emil Ibrishimov
>Assignee: Emil Ibrishimov
>Priority: Minor
> Attachments: HIVE-623.1.patch, HIVE-623.2.patch
>
>
> Parse Text instead of converting to String in "length", "reverse" and other 
> UDFs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-623) Optimize length and reverse UDFs

2009-07-13 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730634#action_12730634
 ] 

Namit Jain commented on HIVE-623:
-

yes, you should make that change and submit the patch

> Optimize length and reverse UDFs
> 
>
> Key: HIVE-623
> URL: https://issues.apache.org/jira/browse/HIVE-623
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Emil Ibrishimov
>Assignee: Emil Ibrishimov
>Priority: Minor
> Attachments: HIVE-623.1.patch, HIVE-623.2.patch
>
>
> Parse Text instead of converting to String in "length", "reverse" and other 
> UDFs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HIVE-623) Optimize length and reverse UDFs

2009-07-13 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain reassigned HIVE-623:
---

Assignee: Emil Ibrishimov

> Optimize length and reverse UDFs
> 
>
> Key: HIVE-623
> URL: https://issues.apache.org/jira/browse/HIVE-623
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Emil Ibrishimov
>Assignee: Emil Ibrishimov
>Priority: Minor
> Attachments: HIVE-623.1.patch, HIVE-623.2.patch
>
>
> Parse Text instead of converting to String in "length", "reverse" and other 
> UDFs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-623) Optimize length and reverse UDFs

2009-07-13 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730632#action_12730632
 ] 

Zheng Shao commented on HIVE-623:
-

Text.charAt(i) is actually very expensive - it calls ByteBuffer.wrap, which 
creates a new object.
Can we directly use the UTF-8 encoding to test whether a byte is a trailing 
byte or not (for both reverse and length)?
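
A hedged illustration of that suggestion (not necessarily the code in the patch): in UTF-8, continuation ("trailing") bytes always have the form 10xxxxxx, so counting characters over a Text's raw bytes only needs a mask test per byte, with no ByteBuffer.wrap calls.

{code}
// Illustration only; the actual UDF code in HIVE-623 may differ.
import org.apache.hadoop.io.Text;

public class Utf8LengthSketch {
  public static int utf8Length(Text t) {
    byte[] bytes = t.getBytes();   // backing array; only the first getLength() bytes are valid
    int len = 0;
    for (int i = 0; i < t.getLength(); i++) {
      // count only bytes that are NOT continuation bytes (10xxxxxx)
      if ((bytes[i] & 0xC0) != 0x80) {
        len++;
      }
    }
    return len;
  }
}
{code}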


> Optimize length and reverse UDFs
> 
>
> Key: HIVE-623
> URL: https://issues.apache.org/jira/browse/HIVE-623
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Emil Ibrishimov
>Priority: Minor
> Attachments: HIVE-623.1.patch, HIVE-623.2.patch
>
>
> Parse Text instead of converting to String in "length", "reverse" and other 
> UDFs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-623) Optimize length and reverse UDFs

2009-07-13 Thread Emil Ibrishimov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emil Ibrishimov updated HIVE-623:
-

Attachment: HIVE-623.2.patch

changed

> Optimize length and reverse UDFs
> 
>
> Key: HIVE-623
> URL: https://issues.apache.org/jira/browse/HIVE-623
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Emil Ibrishimov
>Priority: Minor
> Attachments: HIVE-623.1.patch, HIVE-623.2.patch
>
>
> Parse Text instead of converting to String in "length", "reverse" and other 
> UDFs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HIVE-626) Typecast bug in Join operator

2009-07-13 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao resolved HIVE-626.
-

   Resolution: Fixed
Fix Version/s: 0.4.0
 Release Note: HIVE-626. Fix Column Pruner column order bug. (Yongqiang He 
via zshao)
 Hadoop Flags: [Reviewed]

Committed. Thanks Yongqiang!


> Typecast bug in Join operator
> -
>
> Key: HIVE-626
> URL: https://issues.apache.org/jira/browse/HIVE-626
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Zheng Shao
>Assignee: He Yongqiang
> Fix For: 0.4.0
>
> Attachments: hive-626-2009-07-13-2.patch, hive-626-2009-07-13.patch, 
> HIVE-626.1.showinfo.patch, HIVE-626.2.showinfo_disable_cp.patch
>
>
> There is a type cast error in Join operator. Produced by the following steps:
> {code}
> create table zshao_foo (foo_id int, foo_name string, foo_a string, foo_b 
> string,
> foo_c string, foo_d string) row format delimited fields terminated by ','
> stored as textfile;
> create table zshao_bar (bar_id int, bar_0 int, foo_id int, bar_1 int, bar_name
> string, bar_a string, bar_b string, bar_c string, bar_d string) row format
> delimited fields terminated by ',' stored as textfile;
> create table zshao_count (bar_id int, n int) row format delimited fields
> terminated by ',' stored as textfile;
> Each table has a single row as follows:
> zshao_foo:
> 1,foo1,a,b,c,d
> zshao_bar:
> 10,0,1,1,bar10,a,b,c,d
> zshao_count:
> 10,2
> load data local inpath 'zshao_foo' overwrite into table zshao_foo;
> load data local inpath 'zshao_bar' overwrite into table zshao_bar;
> load data local inpath 'zshao_count' overwrite into table zshao_count;
> explain extended
> select zshao_foo.foo_name, zshao_bar.bar_name, n from zshao_foo join 
> zshao_bar on zshao_foo.foo_id =
> zshao_bar.foo_id join zshao_count on zshao_count.bar_id = zshao_bar.bar_id;
> {code}
> The case is from David Lerman.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HIVE-438) Make hive work with apache thrift

2009-07-13 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain resolved HIVE-438.
-

   Resolution: Fixed
Fix Version/s: 0.4.0
 Hadoop Flags: [Reviewed]

Committed. Thanks Raghu.

> Make hive work with apache thrift
> -
>
> Key: HIVE-438
> URL: https://issues.apache.org/jira/browse/HIVE-438
> Project: Hadoop Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Reporter: Raghotham Murthy
>Assignee: Raghotham Murthy
> Fix For: 0.4.0
>
> Attachments: hive-438.1.patch, HIVE-438.2.patch, hive-438.3.patch, 
> hive-438.serde.patch, libfb303.jar, libthrift.jar
>
>
> The following changes have to be made in hive to get it working with any 
> thrift in apache.
> - change libthrift.jar
> - change com.facebook.thrift to org.apache.thrift
> - handle some incompatible changes
> - fix some of the warnings

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-623) Optimize length and reverse UDFs

2009-07-13 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730602#action_12730602
 ] 

Namit Jain commented on HIVE-623:
-

In the test udf_reverse.q, can you change the test to select count(1) where 
reverse() ===...

The string cannot be printed - look at the test inputddl5.q

> Optimize length and reverse UDFs
> 
>
> Key: HIVE-623
> URL: https://issues.apache.org/jira/browse/HIVE-623
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Emil Ibrishimov
>Priority: Minor
> Attachments: HIVE-623.1.patch
>
>
> Parse Text instead of converting to String in "length", "reverse" and other 
> UDFs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HIVE-626) Typecast bug in Join operator

2009-07-13 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao reassigned HIVE-626:
---

Assignee: He Yongqiang

> Typecast bug in Join operator
> -
>
> Key: HIVE-626
> URL: https://issues.apache.org/jira/browse/HIVE-626
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Zheng Shao
>Assignee: He Yongqiang
> Attachments: hive-626-2009-07-13-2.patch, hive-626-2009-07-13.patch, 
> HIVE-626.1.showinfo.patch, HIVE-626.2.showinfo_disable_cp.patch
>
>
> There is a type cast error in Join operator. Produced by the following steps:
> {code}
> create table zshao_foo (foo_id int, foo_name string, foo_a string, foo_b 
> string,
> foo_c string, foo_d string) row format delimited fields terminated by ','
> stored as textfile;
> create table zshao_bar (bar_id int, bar_0 int, foo_id int, bar_1 int, bar_name
> string, bar_a string, bar_b string, bar_c string, bar_d string) row format
> delimited fields terminated by ',' stored as textfile;
> create table zshao_count (bar_id int, n int) row format delimited fields
> terminated by ',' stored as textfile;
> Each table has a single row as follows:
> zshao_foo:
> 1,foo1,a,b,c,d
> zshao_bar:
> 10,0,1,1,bar10,a,b,c,d
> zshao_count:
> 10,2
> load data local inpath 'zshao_foo' overwrite into table zshao_foo;
> load data local inpath 'zshao_bar' overwrite into table zshao_bar;
> load data local inpath 'zshao_count' overwrite into table zshao_count;
> explain extended
> select zshao_foo.foo_name, zshao_bar.bar_name, n from zshao_foo join 
> zshao_bar on zshao_foo.foo_id =
> zshao_bar.foo_id join zshao_count on zshao_count.bar_id = zshao_bar.bar_id;
> {code}
> The case is from David Lerman.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-631) TestParse dies

2009-07-13 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-631:


  Resolution: Fixed
Release Note: HIVE-631. Fix TestParse. (Namit Jain via zshao)
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Committed. Thanks Namit.

> TestParse dies 
> ---
>
> Key: HIVE-631
> URL: https://issues.apache.org/jira/browse/HIVE-631
> Project: Hadoop Hive
>  Issue Type: Test
>  Components: Testing Infrastructure
>Reporter: Namit Jain
>Assignee: Namit Jain
> Fix For: 0.4.0
>
> Attachments: hive.631.1.patch
>
>
> TestParse is dying. 
> This has probably not been working for a long time and needs to be fixed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-623) Optimize length and reverse UDFs

2009-07-13 Thread Emil Ibrishimov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emil Ibrishimov updated HIVE-623:
-

Attachment: HIVE-623.1.patch

Changes in this patch:
1) no strings are used for length() and reverse()
2) non-ascii tests added to udf_length.q and udf_reverse.q
3) zero-length tests added to udf_length.q

> Optimize length and reverse UDFs
> 
>
> Key: HIVE-623
> URL: https://issues.apache.org/jira/browse/HIVE-623
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Emil Ibrishimov
>Priority: Minor
> Attachments: HIVE-623.1.patch
>
>
> Parse Text instead of converting to String in "length", "reverse" and other 
> UDFs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-627) Optimizer should only access RowSchema (and not RowResolver)

2009-07-13 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730561#action_12730561
 ] 

Zheng Shao commented on HIVE-627:
-

Will we be able to clean up the whole OperatorCompile before optimization?

Also I feel that 1 and 2 can be done together, but 3 can stand by itself. 
We can do 1 and 2 in a separate issue.


The changes in 3 include:
3.1. clean up opParseContext before we enter the optimization step;
3.2. make ColumnPruner use RowSchema instead of RowResolver.


> Optimizer should only access RowSchema (and not RowResolver)
> 
>
> Key: HIVE-627
> URL: https://issues.apache.org/jira/browse/HIVE-627
> Project: Hadoop Hive
>  Issue Type: Improvement
>Reporter: Zheng Shao
>
> The column pruner is accessing RowResolver a lot of times, for things like 
> reverseLookup, and get(alias, column).
> These are not necessary - we should not need to translate an internal name to 
> (alias, column) and then translate back. We should be able to use internal 
> name from one operator to the other, using RowSchema.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-627) Optimizer should only access RowSchema (and not RowResolver)

2009-07-13 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730557#action_12730557
 ] 

Namit Jain commented on HIVE-627:
-

We can do the following cleanups:

1. Include a new field, OperatorCompile, which is a transient field in the 
Operator.
2. This can include the RowResolver - we can get rid of the whole hashmap, 
opParseContext. If other operators need more compile-time information, they can 
extend OperatorCompile.
3. All resolutions have already happened by the time the optimizer is called, so 
we don't need to do any name resolution at that point. However, to ensure that, 
we should just clean up the row resolver from OperatorCompile before calling the 
optimizer (see the sketch below).
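
A rough sketch of these three points (names are illustrative, not from a patch): compile-time-only state hangs off the operator as a transient field, so it is not serialized with the plan and can be wiped before optimization.

{code}
// Illustrative sketch only; the real Operator class is far richer than this.
public class OperatorWithCompileInfo {

  // items 1/2: compile-time information the runtime never needs, e.g. the
  // RowResolver, instead of keeping it in a global opParseContext map
  public static class OperatorCompile {
  }

  private transient OperatorCompile compileInfo;

  // item 3: wipe the compile-time info before handing the plan to the optimizer,
  // so the optimizer can only rely on RowSchema
  public void clearCompileInfo() {
    compileInfo = null;
  }
}
{code}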






> Optimizer should only access RowSchema (and not RowResolver)
> 
>
> Key: HIVE-627
> URL: https://issues.apache.org/jira/browse/HIVE-627
> Project: Hadoop Hive
>  Issue Type: Improvement
>Reporter: Zheng Shao
>
> The column pruner is accessing RowResolver a lot of times, for things like 
> reverseLookup, and get(alias, column).
> These are not necessary - we should not need to translate an internal name to 
> (alias, column) and then translate back. We should be able to use internal 
> name from one operator to the other, using RowSchema.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-631) TestParse dies

2009-07-13 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-631:


Status: Patch Available  (was: Open)

> TestParse dies 
> ---
>
> Key: HIVE-631
> URL: https://issues.apache.org/jira/browse/HIVE-631
> Project: Hadoop Hive
>  Issue Type: Test
>  Components: Testing Infrastructure
>Reporter: Namit Jain
>Assignee: Namit Jain
> Fix For: 0.4.0
>
> Attachments: hive.631.1.patch
>
>
> TestParse is dying. 
> This has probably not been working for a long time and needs to be fixed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-631) TestParse dies

2009-07-13 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-631:


Attachment: hive.631.1.patch

> TestParse dies 
> ---
>
> Key: HIVE-631
> URL: https://issues.apache.org/jira/browse/HIVE-631
> Project: Hadoop Hive
>  Issue Type: Test
>  Components: Testing Infrastructure
>Reporter: Namit Jain
>Assignee: Namit Jain
> Fix For: 0.4.0
>
> Attachments: hive.631.1.patch
>
>
> TestParse is dying. 
> This has probably not been working for a long time and needs to be fixed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-631) TestParse dies

2009-07-13 Thread Namit Jain (JIRA)
TestParse dies 
---

 Key: HIVE-631
 URL: https://issues.apache.org/jira/browse/HIVE-631
 Project: Hadoop Hive
  Issue Type: Test
  Components: Testing Infrastructure
Reporter: Namit Jain
Assignee: Namit Jain
 Fix For: 0.4.0


TestParse is dying. 

This has probably not been working for a long time and needs to be fixed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-627) Optimizer should only access RowSchema (and not RowResolver)

2009-07-13 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730495#action_12730495
 ] 

Zheng Shao commented on HIVE-627:
-

Talked with Namit offline. Namit can add more details, but the synopsis is that 
we should be able to do it, and the best way is to wipe out the RowResolvers 
before we enter into the optimization phase.


> Optimizer should only access RowSchema (and not RowResolver)
> 
>
> Key: HIVE-627
> URL: https://issues.apache.org/jira/browse/HIVE-627
> Project: Hadoop Hive
>  Issue Type: Improvement
>Reporter: Zheng Shao
>
> The column pruner is accessing RowResolver a lot of times, for things like 
> reverseLookup, and get(alias, column).
> These are not necessary - we should not need to translate an internal name to 
> (alias, column) and then translate back. We should be able to use internal 
> name from one operator to the other, using RowSchema.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-396) Hive performance benchmarks

2009-07-13 Thread Yuntao Jia (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuntao Jia updated HIVE-396:


Attachment: (was: benchmark_report_2009-07-03.pdf)

> Hive performance benchmarks
> ---
>
> Key: HIVE-396
> URL: https://issues.apache.org/jira/browse/HIVE-396
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Zheng Shao
> Attachments: hive_benchmark_2009-06-18.pdf, 
> hive_benchmark_2009-06-18.tar.gz, hive_benchmark_2009-07-12.pdf, 
> hive_benchmark_2009-07-12.tar.gz
>
>
> We need some performance benchmark to measure and track the performance 
> improvements of Hive.
> Some references:
> PIG performance benchmarks PIG-200
> PigMix: http://wiki.apache.org/pig/PigMix

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-396) Hive performance benchmarks

2009-07-13 Thread Yuntao Jia (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuntao Jia updated HIVE-396:


Attachment: (was: pig_queries.tar.gz)

> Hive performance benchmarks
> ---
>
> Key: HIVE-396
> URL: https://issues.apache.org/jira/browse/HIVE-396
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Zheng Shao
> Attachments: hive_benchmark_2009-06-18.pdf, 
> hive_benchmark_2009-06-18.tar.gz, hive_benchmark_2009-07-12.pdf, 
> hive_benchmark_2009-07-12.tar.gz
>
>
> We need some performance benchmark to measure and track the performance 
> improvements of Hive.
> Some references:
> PIG performance benchmarks PIG-200
> PigMix: http://wiki.apache.org/pig/PigMix

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-396) Hive performance benchmarks

2009-07-13 Thread Yuntao Jia (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuntao Jia updated HIVE-396:


Attachment: hive_benchmark_2009-07-12.pdf
hive_benchmark_2009-07-12.tar.gz

The latest Hive benchmark report, which covers a few things:

1. Includes timing results that use LZO compression to compress the 
intermediate map output data.
2. Includes timing results without compression of the intermediate map output 
data.
3. Includes the cluster hardware and software information.
4. Includes updated Hive benchmark queries.
5. Includes updated PIG benchmark queries.
6. Includes updated hadoop job source code.
7. A few other minor changes, such as README updates.

> Hive performance benchmarks
> ---
>
> Key: HIVE-396
> URL: https://issues.apache.org/jira/browse/HIVE-396
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Zheng Shao
> Attachments: benchmark_report_2009-07-03.pdf, 
> hive_benchmark_2009-06-18.pdf, hive_benchmark_2009-06-18.tar.gz, 
> hive_benchmark_2009-07-12.pdf, hive_benchmark_2009-07-12.tar.gz, 
> pig_queries.tar.gz
>
>
> We need some performance benchmark to measure and track the performance 
> improvements of Hive.
> Some references:
> PIG performance benchmarks PIG-200
> PigMix: http://wiki.apache.org/pig/PigMix

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-630) list collapsing and exploding missing in hive

2009-07-13 Thread Namit Jain (JIRA)
list collapsing and exploding missing in hive 
--

 Key: HIVE-630
 URL: https://issues.apache.org/jira/browse/HIVE-630
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Emil Ibrishimov


There is no way to collapse multiple elements in a list or explode a list.
It would be very useful to have this support.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-522) GenericUDAF: Extend UDAF to deal with complex types

2009-07-13 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-522:


Attachment: HIVE-522.8.patch

Merged with trunk again.

> GenericUDAF: Extend UDAF to deal with complex types
> ---
>
> Key: HIVE-522
> URL: https://issues.apache.org/jira/browse/HIVE-522
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.4.0
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Fix For: 0.4.0
>
> Attachments: HIVE-522.1.patch, HIVE-522.2.patch, HIVE-522.3.patch, 
> HIVE-522.4.patch, HIVE-522.5.patch, HIVE-522.6.patch, HIVE-522.8.patch
>
>
> We can pass arbitrary arguments into GenericUDFs. We should do the same thing 
> to GenericUDAF so that UDAF can also take arbitrary arguments.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HIVE-555) create temporary function support not only udf, but also udaf, genericudf, etc.

2009-07-13 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain resolved HIVE-555.
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]

Committed. Thanks Min

> create temporary function support not only udf, but also udaf,  genericudf, 
> etc.
> 
>
> Key: HIVE-555
> URL: https://issues.apache.org/jira/browse/HIVE-555
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.4.0
>Reporter: Min Zhou
>Assignee: Min Zhou
> Fix For: 0.4.0
>
> Attachments: HIVE-555-1.patch, HIVE-555-2.patch, HIVE-555-3.patch, 
> HIVE-555-4.patch
>
>
> Right now, the command 'create temporary function' only supports udf. 
> We can also let users write their own udaf and generic udf, and write generic 
> udaf in the future.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-617) Script to start classes with hadoop and hive environment

2009-07-13 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-617:
-

Attachment: hive-617.2.patch

Added License header. Added more verbose help comments. Fixed help instructions.

> Script to start classes with hadoop and hive environment
> 
>
> Key: HIVE-617
> URL: https://issues.apache.org/jira/browse/HIVE-617
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Clients
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
>Priority: Minor
> Attachments: hive-617.2.patch, hive-617.patch, TestHive.java
>
>
> At times it may be required to write a process that uses both the Hadoop and 
> Hive environment and API. For example, someone may write an application that 
> uses the Hive API directly. This patch will add a more generic --jar 
> extension that can start any class with the proper environment.  
> RUNJAR=/opt/hive/lib/hive_hwi.jar RUNCLASS=test.TestHive hive --service jar

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-629) concat files needed for map-reduce jobs also

2009-07-13 Thread Namit Jain (JIRA)
concat files needed for map-reduce jobs also


 Key: HIVE-629
 URL: https://issues.apache.org/jira/browse/HIVE-629
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.4.0
Reporter: Namit Jain
Assignee: Namit Jain


Currently, hive concatenates files only if the job under consideration is a 
map-only job. 

I got requests from some users who want this behavior for map-reduce jobs 
also - it may not be a good idea to turn it on by default.
But we should provide an option to the user so that the concatenation can happen 
even for map-reduce jobs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-628) remove stray lines from hive-default.xml

2009-07-13 Thread Prasad Chakka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Chakka updated HIVE-628:
---

Status: Patch Available  (was: Open)

> remove stray lines from hive-default.xml
> 
>
> Key: HIVE-628
> URL: https://issues.apache.org/jira/browse/HIVE-628
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Configuration
>Affects Versions: 0.4.0
>Reporter: Prasad Chakka
>Assignee: Prasad Chakka
> Attachments: hive-628.patch
>
>
> some stray lines were left behind in the hive-default.xml from HIVE-610. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-628) remove stray lines from hive-default.xml

2009-07-13 Thread Prasad Chakka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Chakka updated HIVE-628:
---

Attachment: hive-628.patch

> remove stray lines from hive-default.xml
> 
>
> Key: HIVE-628
> URL: https://issues.apache.org/jira/browse/HIVE-628
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Configuration
>Affects Versions: 0.4.0
>Reporter: Prasad Chakka
>Assignee: Prasad Chakka
> Attachments: hive-628.patch
>
>
> some stray lines were left behind in the hive-default.xml from HIVE-610. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-628) remove stray lines from hive-default.xml

2009-07-13 Thread Prasad Chakka (JIRA)
remove stray lines from hive-default.xml


 Key: HIVE-628
 URL: https://issues.apache.org/jira/browse/HIVE-628
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Configuration
Affects Versions: 0.4.0
Reporter: Prasad Chakka
Assignee: Prasad Chakka


some stray lines were left behind in the hive-default.xml from HIVE-610. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-555) create temporary function support not only udf, but also udaf, genericudf, etc.

2009-07-13 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730419#action_12730419
 ] 

Namit Jain commented on HIVE-555:
-

+1

looks good - am running tests right now

> create temporary function support not only udf, but also udaf,  genericudf, 
> etc.
> 
>
> Key: HIVE-555
> URL: https://issues.apache.org/jira/browse/HIVE-555
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.4.0
>Reporter: Min Zhou
>Assignee: Min Zhou
> Fix For: 0.4.0
>
> Attachments: HIVE-555-1.patch, HIVE-555-2.patch, HIVE-555-3.patch, 
> HIVE-555-4.patch
>
>
> Right now, the command 'create temporary function' only supports udf. 
> We can also let users write their own udaf and generic udf, and write generic 
> udaf in the future.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-473) Clean up after tests

2009-07-13 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730401#action_12730401
 ] 

Namit Jain commented on HIVE-473:
-

Zheng, did you mean the above comment for some other jira?

> Clean up after tests
> 
>
> Key: HIVE-473
> URL: https://issues.apache.org/jira/browse/HIVE-473
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Testing Infrastructure
>Reporter: Johan Oskarsson
>Assignee: Johan Oskarsson
>Priority: Critical
> Fix For: 0.4.0
>
> Attachments: HIVE-473.patch
>
>
> The test suite creates a lot of temporary files that aren't cleaned up. For 
> example plan xml files, mapred/local and mapred/system files.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-626) Typecast bug in Join operator

2009-07-13 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-626:
--

Attachment: hive-626-2009-07-13-2.patch

Added a new testcase and removed keyinfo and valueinfo from the explain plan. 
Tests passed in my local run.

> Typecast bug in Join operator
> -
>
> Key: HIVE-626
> URL: https://issues.apache.org/jira/browse/HIVE-626
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Zheng Shao
> Attachments: hive-626-2009-07-13-2.patch, hive-626-2009-07-13.patch, 
> HIVE-626.1.showinfo.patch, HIVE-626.2.showinfo_disable_cp.patch
>
>
> There is a type cast error in Join operator. Produced by the following steps:
> {code}
> create table zshao_foo (foo_id int, foo_name string, foo_a string, foo_b 
> string,
> foo_c string, foo_d string) row format delimited fields terminated by ','
> stored as textfile;
> create table zshao_bar (bar_id int, bar_0 int, foo_id int, bar_1 int, bar_name
> string, bar_a string, bar_b string, bar_c string, bar_d string) row format
> delimited fields terminated by ',' stored as textfile;
> create table zshao_count (bar_id int, n int) row format delimited fields
> terminated by ',' stored as textfile;
> Each table has a single row as follows:
> zshao_foo:
> 1,foo1,a,b,c,d
> zshao_bar:
> 10,0,1,1,bar10,a,b,c,d
> zshao_count:
> 10,2
> load data local inpath 'zshao_foo' overwrite into table zshao_foo;
> load data local inpath 'zshao_bar' overwrite into table zshao_bar;
> load data local inpath 'zshao_count' overwrite into table zshao_count;
> explain extended
> select zshao_foo.foo_name, zshao_bar.bar_name, n from zshao_foo join 
> zshao_bar on zshao_foo.foo_id =
> zshao_bar.foo_id join zshao_count on zshao_count.bar_id = zshao_bar.bar_id;
> {code}
> The case is from David Lerman.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.