[jira] Updated: (HIVE-868) add last ddl time and dml time for table/partition

2009-10-05 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-868:


Attachment: hive.868.3.patch

> add last ddl time and dml time for table/partition
> --
>
> Key: HIVE-868
> URL: https://issues.apache.org/jira/browse/HIVE-868
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.868.1.patch, hive.868.2.patch, hive.868.3.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-545) Use ArrayList instead of Vector in single-threaded Hive code

2009-10-05 Thread Cyrus Katrak (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12762496#action_12762496
 ] 

Cyrus Katrak commented on HIVE-545:
---

Patch attached. Tests still running, but looks good so far.
Sorry about the size of this; what started off as a quick fix sprawled into a
gargantuan search and replace of epic proportions:

-Replaced most instances of Vector with ArrayList
-Replaced most ArrayList variable declarations with List (ArrayList a = new 
ArrayList -> List a = new ArrayList)
-Replaced most ArrayList method argument declarations with List ( 
doWork(ArrayList a) -> doWork(List a) )
-Replaced most ArrayList template type declarations with List
(e.g. Map<K, ArrayList<V>> -> Map<K, List<V>>)
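
To illustrate the refactoring pattern listed above, here is a minimal sketch of coding against the List interface while creating unsynchronized ArrayLists. The class and method names are hypothetical and are not taken from HIVE-545.patch:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Vector;

class ListRefactorSketch {
    // Before: a concrete, synchronized collection type in the signature.
    static int countBefore(Vector<String> names) {
        return names.size();
    }

    // After: depend only on the List interface; callers choose the implementation.
    static int countAfter(List<String> names) {
        return names.size();
    }

    // Generic type arguments also use List rather than ArrayList,
    // while the values actually created are unsynchronized ArrayLists.
    static Map<String, List<String>> groupByFirstLetter(List<String> names) {
        Map<String, List<String>> groups = new HashMap<>();
        for (String name : names) {
            String key = name.isEmpty() ? "" : name.substring(0, 1);
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(name);
        }
        return groups;
    }
}
```

Because Vector also implements List, callers that still pass a Vector keep compiling against the List-typed signature, which lets such a change land incrementally.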

> Use ArrayList instead of Vector in single-threaded Hive code
> 
>
> Key: HIVE-545
> URL: https://issues.apache.org/jira/browse/HIVE-545
> Project: Hadoop Hive
>  Issue Type: Improvement
>Affects Versions: 0.4.0
>Reporter: Zheng Shao
> Attachments: HIVE-545.patch
>
>
> Most of the Hive code is single-threaded, but sometimes we are using Vector 
> instead of the more efficient ArrayList.
> See http://java.sun.com/j2se/1.5.0/docs/api/java/util/ArrayList.html
> "This class (ArrayList) is roughly equivalent to Vector, except that it is 
> unsynchronized."




[jira] Updated: (HIVE-545) Use ArrayList instead of Vector in single-threaded Hive code

2009-10-05 Thread Cyrus Katrak (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cyrus Katrak updated HIVE-545:
--

Attachment: HIVE-545.patch

> Use ArrayList instead of Vector in single-threaded Hive code
> 
>
> Key: HIVE-545
> URL: https://issues.apache.org/jira/browse/HIVE-545
> Project: Hadoop Hive
>  Issue Type: Improvement
>Affects Versions: 0.4.0
>Reporter: Zheng Shao
> Attachments: HIVE-545.patch
>
>
> Most of the Hive code is single-threaded, but sometimes we are using Vector 
> instead of the more efficient ArrayList.
> See http://java.sun.com/j2se/1.5.0/docs/api/java/util/ArrayList.html
> "This class (ArrayList) is roughly equivalent to Vector, except that it is 
> unsynchronized."




[jira] Commented: (HIVE-682) add UDF concat_ws

2009-10-05 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12762475#action_12762475
 ] 

Namit Jain commented on HIVE-682:
-

Index: ql/src/test/queries/clientpositive/udf_concat_ws.q
===
--- ql/src/test/queries/clientpositive/udf_concat_ws.q  (revision 0)
+++ ql/src/test/queries/clientpositive/udf_concat_ws.q  (revision 0)
@@ -0,0 +1,15 @@
+CREATE TABLE dest1(c1 STRING, c2 STRING, c3 STRING);
+
+FROM src INSERT OVERWRITE TABLE dest1 SELECT 'abc', 'xyz', '8675309'  WHERE 
src.key = 86;
+
+EXPLAIN
+SELECT concat_ws(dest1.c1, dest1.c2, dest1.c3),
+   concat_ws(',', dest1.c1, dest1.c2, dest1.c3),
+   concat_ws(NULL, dest1.c1, dest1.c2, dest1.c3),
+   concat_ws('**', dest1.c1, NULL, dest1.c3) FROM dest1;
+
+SELECT concat_ws(dest1.c1, dest1.c2, dest1.c3),
+   concat_ws(',', dest1.c1, dest1.c2, dest1.c3),
+   concat_ws(NULL, dest1.c1, dest1.c2, dest1.c3),
+   concat_ws('**', dest1.c1, NULL, dest1.c3) FROM dest1;




Please add the following in the above test:

describe function concat_ws;
describe function extended concat_ws;
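
The test queries above exercise concat_ws with a literal separator, a column used as the separator, a NULL separator, and a NULL argument. Assuming the UDF follows the MySQL semantics linked from the issue (a NULL separator yields NULL; NULL arguments after the separator are skipped), the expected values can be sketched in plain Java. This is not the code in concat_ws.patch, and the class name is hypothetical:

```java
import java.util.StringJoiner;

class ConcatWsSketch {
    // MySQL-style CONCAT_WS: null separator -> null result;
    // otherwise join the non-null arguments with the separator.
    static String concatWs(String sep, String... args) {
        if (sep == null) {
            return null;
        }
        StringJoiner joiner = new StringJoiner(sep);
        for (String a : args) {
            if (a != null) { // NULL arguments are skipped, not rendered
                joiner.add(a);
            }
        }
        return joiner.toString();
    }
}
```

Under these rules, with the row the test inserts ('abc', 'xyz', '8675309'), the four expressions would yield xyzabc8675309, abc,xyz,8675309, NULL, and abc**8675309.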



> add UDF concat_ws
> -
>
> Key: HIVE-682
> URL: https://issues.apache.org/jira/browse/HIVE-682
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Jonathan Chang
> Attachments: concat_ws.patch, concat_ws.patch, concat_ws.patch
>
>
> add UDF concat_ws
> look at 
> http://dev.mysql.com/doc/refman/5.0/en/func-op-summary-ref.html
> for details




[jira] Updated: (HIVE-682) add UDF concat_ws

2009-10-05 Thread Jonathan Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Chang updated HIVE-682:


Attachment: concat_ws.patch

This patch rebases a testcase; now all tests pass.

> add UDF concat_ws
> -
>
> Key: HIVE-682
> URL: https://issues.apache.org/jira/browse/HIVE-682
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Jonathan Chang
> Attachments: concat_ws.patch, concat_ws.patch, concat_ws.patch
>
>
> add UDF concat_ws
> look at 
> http://dev.mysql.com/doc/refman/5.0/en/func-op-summary-ref.html
> for details




[jira] Commented: (HIVE-682) add UDF concat_ws

2009-10-05 Thread Jonathan Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12762470#action_12762470
 ] 

Jonathan Chang commented on HIVE-682:
-

I thought I had already added a description and extended description via 
@description.  Could you elaborate on what else I should be doing?

> add UDF concat_ws
> -
>
> Key: HIVE-682
> URL: https://issues.apache.org/jira/browse/HIVE-682
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Jonathan Chang
> Attachments: concat_ws.patch, concat_ws.patch, concat_ws.patch
>
>
> add UDF concat_ws
> look at 
> http://dev.mysql.com/doc/refman/5.0/en/func-op-summary-ref.html
> for details




[jira] Updated: (HIVE-867) Add UDFs found in MySQL

2009-10-05 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-867:
-

Attachment: hive-867-1.diff

Not complete, but someone might want to take a look and comment.

> Add UDFs found in MySQL
> --
>
> Key: HIVE-867
> URL: https://issues.apache.org/jira/browse/HIVE-867
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Attachments: hive-867-1.diff
>
>
> Some UDFs that MySQL has that Hive does not:
> atan
> aes_decrypt
> aes_encrypt
> bit_and
> bit_count
> bit_length
> bit_or
> bit_xor
> char_length
> char
> character_length
> collation
> compress
> crc32
> encode
> encrypt
> format
> greatest
> in
> inet_aton
> inet_ntoa
> match
> md5
> oct
> ord
> pi
> radians
> sha1 / sha
> sign
> sleep
> truncate
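
Several functions in this list are small, self-contained computations. As one illustration, here is a plain-Java sketch of MySQL's inet_aton/inet_ntoa pair, converting between a dotted-quad IPv4 address and its 32-bit numeric value. It is not a Hive UDF class, and the names are hypothetical:

```java
class InetSketch {
    // MySQL INET_ATON: dotted-quad IPv4 string -> unsigned 32-bit value (held in a long).
    static long inetAton(String dotted) {
        String[] parts = dotted.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("expected a.b.c.d: " + dotted);
        }
        long value = 0;
        for (String p : parts) {
            int octet = Integer.parseInt(p);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("octet out of range: " + p);
            }
            value = (value << 8) | octet; // shift previous octets left, append this one
        }
        return value;
    }

    // MySQL INET_NTOA: the inverse mapping, one byte per octet.
    static String inetNtoa(long value) {
        return ((value >> 24) & 0xFF) + "." + ((value >> 16) & 0xFF) + "."
                + ((value >> 8) & 0xFF) + "." + (value & 0xFF);
    }
}
```

For example, 10.0.5.9 maps to 10*2^24 + 0*2^16 + 5*2^8 + 9 = 167773449. Unlike MySQL, which returns NULL for input it cannot parse, this sketch throws an exception and accepts only the full four-part form.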




[jira] Commented: (HIVE-784) Support subqueries in Hive

2009-10-05 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12762453#action_12762453
 ] 

Ning Zhang commented on HIVE-784:
-

Created a new task HIVE-870 for semi-join. Hive users should be able to specify
semi-join in their queries just like INNER/OUTER joins.

> Support subqueries in Hive
> --
>
> Key: HIVE-784
> URL: https://issues.apache.org/jira/browse/HIVE-784
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Ning Zhang
>Assignee: Ning Zhang
>
> Hive currently supports only views in the FROM clause; some Facebook use cases
> suggest that Hive should also support subqueries, such as those connected by
> IN/EXISTS, in the WHERE clause.




[jira] Created: (HIVE-870) semi joins

2009-10-05 Thread Ning Zhang (JIRA)
semi joins
--

 Key: HIVE-870
 URL: https://issues.apache.org/jira/browse/HIVE-870
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Ning Zhang
Assignee: Ning Zhang


Semi-join is an efficient way to unnest an IN/EXISTS subquery. For example,

select *
from A
where A.id IN
   (select id
    from B
    where B.date > '2009-10-01');

returns the rows of A whose id is in the set of ids found in B with a date later
than a given cutoff. This query can be unnested using an INNER JOIN or a LEFT
OUTER JOIN, but then we need to deduplicate the ids returned by the subquery on
table B. The semantics of LEFT SEMI JOIN are that as long as there is ANY row in
the right-hand table that matches the join key, the left-hand row is emitted as
a result, without looking further in the right-hand table for more matches.
This is exactly the semantics of the IN subquery.
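
The LEFT SEMI JOIN semantics described above can be sketched as a hash-based evaluation in plain Java. The row class and method names are hypothetical; this illustrates the semantics only, not how Hive's operators implement them:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SemiJoinSketch {
    // Hypothetical row: a join key plus a date column (yyyy-MM-dd strings
    // compare in date order). Rows of A simply ignore the date.
    static final class Row {
        final int id;
        final String date;
        Row(int id, String date) {
            this.id = id;
            this.date = date;
        }
    }

    // SELECT * FROM A WHERE A.id IN (SELECT id FROM B WHERE B.date > cutoff)
    // evaluated as A LEFT SEMI JOIN B.
    static List<Row> leftSemiJoin(List<Row> a, List<Row> b, String cutoff) {
        Set<Integer> keys = new HashSet<>();
        for (Row r : b) {
            if (r.date.compareTo(cutoff) > 0) {
                keys.add(r.id); // duplicate ids in B collapse into one key
            }
        }
        List<Row> out = new ArrayList<>();
        for (Row l : a) {
            if (keys.contains(l.id)) { // one match is enough; each A row emitted at most once
                out.add(l);
            }
        }
        return out;
    }

    // Plain inner join for contrast: one output row per matching (A, B) pair.
    static int innerJoinSize(List<Row> a, List<Row> b, String cutoff) {
        int n = 0;
        for (Row l : a) {
            for (Row r : b) {
                if (l.id == r.id && r.date.compareTo(cutoff) > 0) {
                    n++;
                }
            }
        }
        return n;
    }
}
```

Building a set of right-hand keys removes the need for a separate deduplication step: with two qualifying B rows for the same id, the inner join produces that A row twice, while the semi-join emits it once.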




[jira] Updated: (HIVE-859) Cleanup HWI build.xml

2009-10-05 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-859:


   Resolution: Fixed
Fix Version/s: 0.5.0
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

Committed. Thanks Edward

> Cleanup HWI build.xml
> -
>
> Key: HIVE-859
> URL: https://issues.apache.org/jira/browse/HIVE-859
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Build Infrastructure, Web UI
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Fix For: 0.5.0
>
> Attachments: hive-859.diff, hive-859.diff
>
>
> There are a few issues with hwi/build.xml. To name a few: 
> * war gets written to build/ not build/hwi
> * no apache header
> * war should be its own target
> * some unnecessary definitions/redefinitions  




[jira] Resolved: (HIVE-865) mapjoin: memory leak for same key with very large number of values

2009-10-05 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain resolved HIVE-865.
-

   Resolution: Fixed
Fix Version/s: 0.5.0
 Hadoop Flags: [Reviewed]

Committed. Thanks Ning

> mapjoin: memory leak for same key with very large number of values
> --
>
> Key: HIVE-865
> URL: https://issues.apache.org/jira/browse/HIVE-865
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Ning Zhang
> Fix For: 0.5.0
>
> Attachments: HIVE-865.patch, HIVE-865_2.patch
>
>





[jira] Created: (HIVE-869) Allow ScriptOperator to consume not all input data

2009-10-05 Thread Zheng Shao (JIRA)
Allow ScriptOperator to consume not all input data
--

 Key: HIVE-869
 URL: https://issues.apache.org/jira/browse/HIVE-869
 Project: Hadoop Hive
  Issue Type: New Feature
Affects Versions: 0.5.0
Reporter: Zheng Shao
 Fix For: 0.5.0


The ScriptOperator (SELECT TRANSFORM(a, b, c) USING 'myscript' AS (d, e, f) 
...) has a problem:
If the user script exits without consuming all data from standard input, we
report an error even if the script's exit code is 0.

We want an option that, when enabled, makes ScriptOperator return
successfully in that case.

If the option is not enabled, then we should stick to the current behavior.

The option can be called: "hive.script.ignore.pipe.broken.exception".
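
A minimal sketch of the proposed decision logic, assuming that a broken pipe surfaces as an IOException when writing to the script's standard input. The option name comes from the issue; the class and methods are hypothetical and are not ScriptOperator's actual code:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

class ScriptPipeSketch {
    // Writes rows to the script's stdin; returns true if the pipe broke
    // (the script stopped reading). Simplification: any IOException counts.
    static boolean feedRows(OutputStream scriptStdin, List<String> rows) {
        try {
            for (String row : rows) {
                scriptStdin.write((row + "\n").getBytes(StandardCharsets.UTF_8));
            }
            scriptStdin.flush();
            return false;
        } catch (IOException e) {
            return true;
        }
    }

    // ignorePipeBroken models "hive.script.ignore.pipe.broken.exception".
    static boolean succeeded(boolean ignorePipeBroken, boolean pipeBroken, int exitCode) {
        if (exitCode != 0) {
            return false; // a failing script is an error regardless of the option
        }
        // Option off: keep the current behavior (a broken pipe is an error).
        return !pipeBroken || ignorePipeBroken;
    }
}
```

Keeping the exit-code check separate means a script that both stops reading early and fails is still reported as an error, even with the option enabled.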





Amazon Elastic MapReduce Supports Apache Hive

2009-10-05 Thread Andrew Hitchcock
Greetings,

We are excited to announce that Amazon Elastic MapReduce now supports Apache
Hive, making the service even more compelling for large data set processing
and analytics. Hive is an open source data warehouse and analytics package
that runs on top of Hadoop. Hive is driven by a SQL-based language called
Hive QL that allows users to structure, summarize, and query data sources
stored in Amazon S3. Hive QL goes beyond standard SQL, adding first-class
support for map/reduce functions and complex, extensible user-defined data
types like JSON and Thrift. This capability allows processing of complex and
unstructured data sources, such as text documents and log files, in
applications such as data mining or click stream analysis. Hive also allows
user extensions via user-defined functions written in Java and deployed via
storage in Amazon S3.

Here are some resources to help you get started:

   - Tutorial: Running Hive on Amazon Elastic MapReduce
   - Video: a video tutorial
   - Sample application: Operating a Data Warehouse with Hive, Amazon Elastic
     MapReduce and Amazon SimpleDB

Sincerely,

The Amazon Elastic MapReduce Team


[jira] Updated: (HIVE-868) add last ddl time and dml time for table/partition

2009-10-05 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-868:


Attachment: hive.868.2.patch

fixed test outputs

> add last ddl time and dml time for table/partition
> --
>
> Key: HIVE-868
> URL: https://issues.apache.org/jira/browse/HIVE-868
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.868.1.patch, hive.868.2.patch
>
>





[jira] Updated: (HIVE-660) Fix UDFLike for multi-line inputs

2009-10-05 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-660:


Status: Open  (was: Patch Available)

> Fix UDFLike for multi-line inputs
> -
>
> Key: HIVE-660
> URL: https://issues.apache.org/jira/browse/HIVE-660
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Attachments: HIVE-660.1.patch
>
>
> We should use the DOTALL option in UDFLike, because '%' and '_' should also
> match newline characters.
> See 
> http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html#DOTALL
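
The effect of DOTALL can be seen with a simplified LIKE-to-regex translation. This sketch is an assumption about the general approach, not UDFLike's code; it ignores escape characters, and the class name is hypothetical:

```java
import java.util.regex.Pattern;

class LikeSketch {
    // Translate a SQL LIKE pattern into a Java regex: '%' -> ".*", '_' -> ".",
    // every other character matched literally. Escape characters are not handled.
    static Pattern likeToPattern(String like, boolean dotAll) {
        StringBuilder regex = new StringBuilder();
        for (char c : like.toCharArray()) {
            if (c == '%') {
                regex.append(".*");
            } else if (c == '_') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        // Without DOTALL, '.' does not match line terminators.
        return Pattern.compile(regex.toString(), dotAll ? Pattern.DOTALL : 0);
    }

    static boolean like(String value, String pattern, boolean dotAll) {
        return likeToPattern(pattern, dotAll).matcher(value).matches();
    }
}
```

Without the flag, matching "line1\nline2" against 'line1%' fails because '.*' stops at the line terminator; with DOTALL it succeeds, as does 'a_c' against "a\nc".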




[jira] Commented: (HIVE-682) add UDF concat_ws

2009-10-05 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12762354#action_12762354
 ] 

Namit Jain commented on HIVE-682:
-

Have you run all the tests?
The test show_functions.q should produce a diff after your changes.
Also, add describe and describe extended for the new function that you have
created.

> add UDF concat_ws
> -
>
> Key: HIVE-682
> URL: https://issues.apache.org/jira/browse/HIVE-682
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Jonathan Chang
> Attachments: concat_ws.patch, concat_ws.patch
>
>
> add UDF concat_ws
> look at 
> http://dev.mysql.com/doc/refman/5.0/en/func-op-summary-ref.html
> for details




[jira] Commented: (HIVE-865) mapjoin: memory leak for same key with very large number of values

2009-10-05 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12762348#action_12762348
 ] 

Namit Jain commented on HIVE-865:
-

+1

looks good - will commit if the tests pass

> mapjoin: memory leak for same key with very large number of values
> --
>
> Key: HIVE-865
> URL: https://issues.apache.org/jira/browse/HIVE-865
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Ning Zhang
> Attachments: HIVE-865.patch, HIVE-865_2.patch
>
>





[jira] Commented: (HIVE-784) Support subqueries in Hive

2009-10-05 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12762346#action_12762346
 ] 

Ning Zhang commented on HIVE-784:
-

Subqueries can be correlated or uncorrelated with their outer queries. If a
subquery is uncorrelated, it can be treated as a constant (a set of rows in the
case of IN, or a boolean in the case of EXISTS). This is the easiest case: the
subquery can be evaluated once and its result does not change. Correlated
subqueries are more common and much more complex.

In general, a correlated subquery can be treated as a function whose input
parameters are the correlated variables. For example, the following nested
subquery

select * 
from A
where exists (
select null 
from B
where B.id = A.id 
and B.date > '2009-10-01') 

can be treated as a function whose input parameter is the correlated variable
A.id. A naive plan is to evaluate this function for every row in A; this is
equivalent to a nested-loop (semi) join between A and B. A better evaluation
plan is to re-evaluate the subquery only when the input parameter changes
value. A further generalization is to rewrite the whole query to unnest the
subquery into a semi-join between A and B, which opens up more join algorithms
to choose from. A whole body of database research is dedicated to rewrite rules
for unnesting subqueries into joins.

There are also cases where nested subqueries cannot be unnested into joins,
particularly subqueries involving aggregations.

As a first step, we will work on cases where subqueries are uncorrelated or
can be unnested into semi-joins.
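
The two evaluation plans for the correlated EXISTS example above can be sketched in Java by treating the subquery as a function of A.id. The class, field, and method names are hypothetical, and the re-evaluate-on-change plan assumes A arrives sorted (or at least clustered) by id:

```java
import java.util.ArrayList;
import java.util.List;

class CorrelatedSubquerySketch {
    static final class RowB {
        final int id;
        final String date; // yyyy-MM-dd, so string order equals date order
        RowB(int id, String date) {
            this.id = id;
            this.date = date;
        }
    }

    private final List<RowB> b;
    private final String cutoff;
    int evaluations = 0; // how many times the subquery body has run

    CorrelatedSubquerySketch(List<RowB> b, String cutoff) {
        this.b = b;
        this.cutoff = cutoff;
    }

    // EXISTS (SELECT null FROM B WHERE B.id = aId AND B.date > cutoff),
    // viewed as a function of the correlated variable aId (= A.id).
    boolean subquery(int aId) {
        evaluations++;
        for (RowB r : b) {
            if (r.id == aId && r.date.compareTo(cutoff) > 0) {
                return true;
            }
        }
        return false;
    }

    // Naive plan: run the subquery for every row of A (a nested-loop semi-join).
    List<Integer> naive(List<Integer> aIds) {
        List<Integer> out = new ArrayList<>();
        for (int id : aIds) {
            if (subquery(id)) {
                out.add(id);
            }
        }
        return out;
    }

    // Better plan: re-run the subquery only when A.id changes value;
    // assumes equal ids are adjacent in the input.
    List<Integer> rerunOnChange(List<Integer> sortedAIds) {
        List<Integer> out = new ArrayList<>();
        boolean first = true;
        int prevId = 0;
        boolean prevResult = false;
        for (int id : sortedAIds) {
            if (first || id != prevId) {
                prevResult = subquery(id);
                prevId = id;
                first = false;
            }
            if (prevResult) {
                out.add(id);
            }
        }
        return out;
    }
}
```

For A ids 1, 1, 1, 2, 2, 3, the naive plan runs the subquery six times, while re-evaluating only on change runs it three times and returns the same rows.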

> Support subqueries in Hive
> --
>
> Key: HIVE-784
> URL: https://issues.apache.org/jira/browse/HIVE-784
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Ning Zhang
>Assignee: Ning Zhang
>
> Hive currently supports only views in the FROM clause; some Facebook use cases
> suggest that Hive should also support subqueries, such as those connected by
> IN/EXISTS, in the WHERE clause.




[jira] Commented: (HIVE-554) Add GenericUDF to create arrays, maps, and structs

2009-10-05 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12762340#action_12762340
 ] 

Zheng Shao commented on HIVE-554:
-

array and map are simpler; we can work on those first before getting into
struct.


> Add GenericUDF to create arrays, maps, and structs
> --
>
> Key: HIVE-554
> URL: https://issues.apache.org/jira/browse/HIVE-554
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Zheng Shao
>
> Here is an example:
> {code}
> SELECT array(1,2,3)[3], map("a":1,"b":2,"c":3)["a"], struct(user_id:3, 
> revenue: sum(rev))
> FROM table
> GROUP BY user_id;
> {code}
> This is relatively easy to do with the GenericUDF framework, and will greatly 
> increase the flexibility of the language.




[jira] Updated: (HIVE-865) mapjoin: memory leak for same key with very large number of values

2009-10-05 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-865:


Attachment: HIVE-865_2.patch

Cleaned up the comments in HIVE-865_2.patch

> mapjoin: memory leak for same key with very large number of values
> --
>
> Key: HIVE-865
> URL: https://issues.apache.org/jira/browse/HIVE-865
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Ning Zhang
> Attachments: HIVE-865.patch, HIVE-865_2.patch
>
>





[jira] Commented: (HIVE-236) RLIKE/REGEXP should allow matching partial strings

2009-10-05 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12762334#action_12762334
 ] 

Zheng Shao commented on HIVE-236:
-

Looks great! Will commit after testing.


> RLIKE/REGEXP should allow matching partial strings
> --
>
> Key: HIVE-236
> URL: https://issues.apache.org/jira/browse/HIVE-236
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Zheng Shao
>Assignee: Paul Yang
> Attachments: HIVE-236.1.patch, HIVE-236.2.patch, HIVE-236.3.patch
>
>
> The current behavior is that the regexp needs to match the whole string.
> But from mysql: ( 
> http://dev.mysql.com/doc/refman/5.0/en/regexp.html#operator_regexp )
> mysql> SELECT 'fofo' REGEXP '^fo';  -> 1
> We need to make it work the same way as MySQL.
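
In Java regex terms, the difference described in this issue is between Matcher.matches(), which must consume the whole input, and Matcher.find(), which succeeds if the pattern occurs anywhere. A minimal sketch of the two behaviors (hypothetical class name):

```java
import java.util.regex.Pattern;

class RegexpSketch {
    // Old behavior: the pattern has to match the entire string.
    static boolean wholeStringMatch(String s, String regex) {
        return Pattern.compile(regex).matcher(s).matches();
    }

    // MySQL-style REGEXP/RLIKE: true if the pattern matches anywhere in the string.
    static boolean partialMatch(String s, String regex) {
        return Pattern.compile(regex).matcher(s).find();
    }
}
```

With partial matching, anchors such as ^ and $ must be written explicitly when a whole-string match is wanted, as in MySQL: 'fofo' REGEXP '^fo' is true, while 'fofo' REGEXP '^fo$' is false.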




[jira] Updated: (HIVE-236) RLIKE/REGEXP should allow matching partial strings

2009-10-05 Thread Paul Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Yang updated HIVE-236:
---

Attachment: HIVE-236.3.patch

Modified REGEXP to log a warning only the first time it encounters an empty regexp.
Consolidated tests in udf_regexp.q into a single query.

> RLIKE/REGEXP should allow matching partial strings
> --
>
> Key: HIVE-236
> URL: https://issues.apache.org/jira/browse/HIVE-236
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Zheng Shao
>Assignee: Paul Yang
> Attachments: HIVE-236.1.patch, HIVE-236.2.patch, HIVE-236.3.patch
>
>
> The current behavior is that the regexp needs to match the whole string.
> But from mysql: ( 
> http://dev.mysql.com/doc/refman/5.0/en/regexp.html#operator_regexp )
> mysql> SELECT 'fofo' REGEXP '^fo';  -> 1
> We need to make it work the same way as MySQL.




[jira] Commented: (HIVE-865) mapjoin: memory leak for same key with very large number of values

2009-10-05 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12762328#action_12762328
 ] 

Namit Jain commented on HIVE-865:
-

Can you clean up the comments, etc., that you added for debugging?


+// turn off caching to see if it causes the mem leak



> mapjoin: memory leak for same key with very large number of values
> --
>
> Key: HIVE-865
> URL: https://issues.apache.org/jira/browse/HIVE-865
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Ning Zhang
> Attachments: HIVE-865.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-865) mapjoin: memory leak for same key with very large number of values

2009-10-05 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-865:


Attachment: HIVE-865.patch

This patch fixes the out-of-memory exception thrown by MapJoinOperator.
The changes include:
1) Make the persistent hashtable HTree non-transactional. (This is not the
cause of the out-of-memory error, but it may improve performance.)
2) Commit the RecordManager after every 100 record insertions. (This addresses
the main cause of the OOM issue.)
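
A sketch of the batching idea in item 2, written against a hypothetical Store interface rather than the real JDBM RecordManager API. The interval of 100 comes from the patch description; the assumption is that pending, uncommitted changes are what accumulate in memory until a commit:

```java
class BatchedCommitSketch {
    // Hypothetical minimal store interface; NOT the JDBM RecordManager API.
    interface Store {
        void put(Object key, Object value);
        void commit(); // flush pending changes so they stop accumulating in memory
    }

    static final int COMMIT_INTERVAL = 100; // interval taken from the patch description

    private final Store store;
    private int sinceCommit = 0;

    BatchedCommitSketch(Store store) {
        this.store = store;
    }

    void insert(Object key, Object value) {
        store.put(key, value);
        if (++sinceCommit >= COMMIT_INTERVAL) {
            store.commit();
            sinceCommit = 0;
        }
    }

    // Commit whatever is left over from the last partial batch.
    void close() {
        if (sinceCommit > 0) {
            store.commit();
            sinceCommit = 0;
        }
    }
}
```

Bounding the number of uncommitted inserts caps memory growth even when a single key has a very large number of values, at the cost of one commit per batch.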

> mapjoin: memory leak for same key with very large number of values
> --
>
> Key: HIVE-865
> URL: https://issues.apache.org/jira/browse/HIVE-865
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Ning Zhang
> Attachments: HIVE-865.patch
>
>





[jira] Commented: (HIVE-236) RLIKE/REGEXP should allow matching partial strings

2009-10-05 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12762314#action_12762314
 ] 

Zheng Shao commented on HIVE-236:
-

@HIVE-236.2.patch:

The warning should not be output on every call - we have seen problems with
too many repeated log messages filling up the log file.
Can you add a boolean variable "warned", so that we only warn once?

Also, it may help if you put all the test cases in a single query. That
reduces the time it takes to run the tests.
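
The warn-once suggestion can be sketched as follows. The logger is abstracted as a Consumer<String> so the sketch is self-contained, and returning false for an empty pattern is a hypothetical choice, not necessarily what the HIVE-236 patch does:

```java
import java.util.function.Consumer;
import java.util.regex.Pattern;

class WarnOnceSketch {
    private final Consumer<String> log; // stand-in for the UDF's logger
    private boolean warned = false;     // set after the first warning

    WarnOnceSketch(Consumer<String> log) {
        this.log = log;
    }

    boolean evaluate(String s, String regex) {
        if (regex.isEmpty()) {
            if (!warned) {
                warned = true;
                log.accept("REGEXP pattern is empty; further warnings will be suppressed");
            }
            return false; // hypothetical result for an empty pattern
        }
        return Pattern.compile(regex).matcher(s).find();
    }
}
```

Because the flag lives on the UDF instance, each instance logs at most once however many rows it evaluates.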


> RLIKE/REGEXP should allow matching partial strings
> --
>
> Key: HIVE-236
> URL: https://issues.apache.org/jira/browse/HIVE-236
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Zheng Shao
>Assignee: Paul Yang
> Attachments: HIVE-236.1.patch, HIVE-236.2.patch
>
>
> The current behavior is that the regexp needs to match the whole string.
> But from mysql: ( 
> http://dev.mysql.com/doc/refman/5.0/en/regexp.html#operator_regexp )
> mysql> SELECT 'fofo' REGEXP '^fo';  -> 1
> We need to make it work the same way as MySQL.




[jira] Updated: (HIVE-236) RLIKE/REGEXP should allow matching partial strings

2009-10-05 Thread Paul Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Yang updated HIVE-236:
---

Attachment: HIVE-236.2.patch

Updated udf1.q.out
Added logging for empty regexp

> RLIKE/REGEXP should allow matching partial strings
> --
>
> Key: HIVE-236
> URL: https://issues.apache.org/jira/browse/HIVE-236
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Zheng Shao
>Assignee: Paul Yang
> Attachments: HIVE-236.1.patch, HIVE-236.2.patch
>
>
> The current behavior is that the regexp needs to match the whole string.
> But from mysql: ( 
> http://dev.mysql.com/doc/refman/5.0/en/regexp.html#operator_regexp )
> mysql> SELECT 'fofo' REGEXP '^fo';  -> 1
> We need to make it work the same way as MySQL.




[jira] Resolved: (HIVE-855) UDF: Concat should accept multiple arguments

2009-10-05 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao resolved HIVE-855.
-

 Tags: udf
   Resolution: Fixed
Fix Version/s: 0.5.0
 Release Note: HIVE-855. UDF: Concat should accept multiple arguments. 
(Paul Yang via zshao)
 Hadoop Flags: [Reviewed]

Committed. Thanks Paul!

> UDF: Concat should accept multiple arguments
> 
>
> Key: HIVE-855
> URL: https://issues.apache.org/jira/browse/HIVE-855
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.5.0
>Reporter: Zheng Shao
>Assignee: Paul Yang
> Fix For: 0.5.0
>
> Attachments: HIVE-855.1.patch, HIVE-855.2.patch, HIVE-855.3.patch, 
> HIVE-855.4.patch, HIVE-855.5.patch
>
>
> According to mysql, concat should accept multiple arguments.
> http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_concat
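
For reference, MySQL's CONCAT accepts any number of arguments and returns NULL if any argument is NULL. A plain-Java sketch of those semantics (hypothetical class name; not the code from the HIVE-855 patches) could look like this:

```java
class ConcatSketch {
    // MySQL-style CONCAT: joins any number of arguments;
    // a single null argument makes the whole result null.
    static String concat(String... args) {
        StringBuilder sb = new StringBuilder();
        for (String a : args) {
            if (a == null) {
                return null;
            }
            sb.append(a);
        }
        return sb.toString();
    }
}
```

This differs from concat_ws, which skips NULL arguments instead of propagating them.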
