date:20081214

[jira] Updated: (HIVE-165) var(col) built-in to go with avg(col) and count(col)

2008-12-14 Thread Jeff Hammerbacher (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Hammerbacher updated HIVE-165:
---

Component/s: Query Processor

Adding to Query Processor component.

 var(col) built-in to go with avg(col) and count(col)
 

 Key: HIVE-165
 URL: https://issues.apache.org/jira/browse/HIVE-165
 Project: Hadoop Hive
  Issue Type: Wish
  Components: Query Processor
Reporter: Adam Kramer
Assignee: David Phillips
Priority: Minor

 The last step in the unholy triumvirate of statistical built-ins is the 
 variance. We already have the n (count) and the mean (avg). I currently have 
 a job or two that filters all of the data into a single reducer which just 
 computes mean/n/variance and writes it to a table...so my guess is that this 
 would be a pretty big speed increase. Not a huge deal though, as computing 
 the variance myself is trivial.
 (Average, variance, and n can be co-computed in one pass, so if you're doing 
 var() you can basically have avg() and count() for free.)
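
As a rough illustration of the "one pass" remark above, here is a minimal sketch in Python (not Hive code) that accumulates count, mean, and variance together using Welford's method. The class name RunningStats and its merge() helper are hypothetical; merge() stands in for combining partial results from several tasks, and the variance returned is the population variance, which is an assumption since the issue does not say which form is wanted.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Count, mean, and variance accumulated in a single pass (Welford's method)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the current mean

    def add(self, x: float) -> None:
        """Fold one value into the running count, mean, and m2."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "RunningStats") -> "RunningStats":
        """Combine two partial results, e.g. from two separate tasks."""
        if self.n == 0:
            return other
        if other.n == 0:
            return self
        n = self.n + other.n
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return RunningStats(n, mean, m2)

    def variance(self) -> float:
        """Population variance; NaN when no values were seen."""
        return self.m2 / self.n if self.n else float("nan")
```

Because avg() and count() fall out of the same state (mean and n), a var() built this way would carry them along at no extra cost, which is the point the reporter makes.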

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HIVE-160) sampling in a subquery is broken

2008-12-14 Thread Jeff Hammerbacher (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Hammerbacher updated HIVE-160:
---

Component/s: Query Processor

Adding to Query Processor component.

 sampling in a subquery is broken
 

 Key: HIVE-160
 URL: https://issues.apache.org/jira/browse/HIVE-160
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Venky Iyer




[jira] Updated: (HIVE-167) Hive: add a RegularExpressionDeserializer

2008-12-14 Thread Jeff Hammerbacher (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Hammerbacher updated HIVE-167:
---

Component/s: Serializers/Deserializers

Adding to Serializers/Deserializers component.

 Hive: add a RegularExpressionDeserializer
 -

 Key: HIVE-167
 URL: https://issues.apache.org/jira/browse/HIVE-167
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Zheng Shao

 We need a RegularExpressionDeserializer to read data based on a regex.  This 
 will be very useful for reading files such as Apache logs.
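
To make the idea concrete, here is a small Python sketch (not the proposed Hive class) of regex-based deserialization: each named group in the pattern becomes a column. The pattern COMMON_LOG and the function deserialize() are hypothetical and assume the Apache Common Log Format; the issue itself does not specify a pattern or an interface.

```python
import re
from typing import Dict, Optional

# Hypothetical pattern for the Apache Common Log Format; not taken from the issue.
COMMON_LOG = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)


def deserialize(line: str) -> Optional[Dict[str, str]]:
    """Return one row as a column-name -> string mapping, or None if the line does not match."""
    match = COMMON_LOG.match(line)
    return match.groupdict() if match else None
```

Returning None for non-matching lines is one possible choice; a real deserializer might instead emit a row of nulls or raise an error.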


[jira] Updated: (HIVE-161) for list column x that is sometimes null, select x.y will cause a nullpointerexception

2008-12-14 Thread Jeff Hammerbacher (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Hammerbacher updated HIVE-161:
---

Component/s: Query Processor

Adding to Query Processor component.

 for list column x that is sometimes null, select x.y will cause a 
 nullpointerexception
 --

 Key: HIVE-161
 URL: https://issues.apache.org/jira/browse/HIVE-161
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Venky Iyer
Assignee: Zheng Shao




[jira] Updated: (HIVE-137) Create tests for new date functions

2008-12-14 Thread Jeff Hammerbacher (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Hammerbacher updated HIVE-137:
---

Component/s: Testing Infrastructure

Adding to Testing Infrastructure component.

 Create tests for new date functions
 ---

 Key: HIVE-137
 URL: https://issues.apache.org/jira/browse/HIVE-137
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Testing Infrastructure
Reporter: David Phillips

 Validate that the date functions actually work.


[jira] Updated: (HIVE-135) need more accurate way of tracking memory consumption on map side aggregates

2008-12-14 Thread Jeff Hammerbacher (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Jeff Hammerbacher updated HIVE-135:
---

Component/s: Query Processor

Adding to Query Processor component.

need more accurate way of tracking memory consumption on map side aggregates

Key: HIVE-135
URL: https://issues.apache.org/jira/browse/HIVE-135
Project: Hadoop Hive
Issue Type: Bug
Components: Query Processor
Reporter: Joydeep Sen Sarma

from email thread:
Just trying it out - I am confused by one thing:

hive> set hive.map.aggr=true;
set hive.map.aggr=true;
hive> explain from mytable u insert overwrite directory
'/user/jssarma/tmp_agg' select u.a, avg(size(u.b)) group by u.a;
Everything looks good. Now I submit this query and this is what I see on the
tracker:
Map input records 87,912,961 0 87,912,961
Map output records 87,912,960 0 87,912,960
This doesn't make sense. With map-side aggregates, the mappers should be
emitting a vastly reduced number of rows.
I am wondering whether we should rethink our flushing logic. The freeMemory()
call is not reliable (since it doesn't account for stuff that's not cleaned
out by GC). Perhaps we should switch to an explicit setting for the amount of
memory for hash tables (we do know the size of each hash table entry and the
overall size, and should be able to guess reasonably). From what Dhruba
reported, there's no way to call the garbage collector and wait for it to
complete (to get a more accurate report of free memory), so the whole route
of obtaining free memory seems a little hosed.
By way of comparison, Hadoop also estimates memory usage in sorting. There,
the sort run is just stored in a sequential stream, and it takes the size
of the stream and compares it to the max allowed sort memory usage (which is a
configuration option).
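
The explicit-budget idea can be sketched as follows. This is an illustrative Python model, not Hive's implementation: the function map_side_avg() and its max_entries parameter are hypothetical, and a cap on the number of hash table entries is used as a stand-in for a byte budget (entries times an estimated entry size). The table is flushed when the cap is reached rather than when a free-memory probe says so.

```python
from typing import Dict, Iterable, Iterator, Tuple

Partial = Tuple[float, int]  # (running sum, running count) for one group key


def map_side_avg(rows: Iterable[Tuple[str, float]],
                 max_entries: int) -> Iterator[Tuple[str, Partial]]:
    """Yield partial (key, (sum, count)) pairs, flushing when the table is full."""
    table: Dict[str, Partial] = {}
    for key, value in rows:
        if key not in table and len(table) >= max_entries:
            # Budget exhausted: emit every partial aggregate and start over.
            yield from table.items()
            table = {}
        total, count = table.get(key, (0.0, 0))
        table[key] = (total + value, count + 1)
    # Emit whatever is left at end of input.
    yield from table.items()
```

With a generous budget the mapper emits one row per distinct key; with a tight one it degrades toward one row per input row, but the flush decision never depends on the garbage collector.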


[jira] Updated: (HIVE-136) SerDe should escape some special characters

2008-12-14 Thread Jeff Hammerbacher (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Hammerbacher updated HIVE-136:
---

Component/s: Serializers/Deserializers

Adding to Serializers/Deserializers component.

 SerDe should escape some special characters
 ---

 Key: HIVE-136
 URL: https://issues.apache.org/jira/browse/HIVE-136
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Zheng Shao

 MetadataTypedColumnsetSerDe and DynamicSerDe should escape some special 
 characters like '\n' or the column/item/key separator.
 Otherwise the data will look corrupted.
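
A minimal sketch of what such escaping might look like, written in Python rather than against the actual SerDe classes. The functions escape_field() and unescape_field(), the backslash escape character, and the two-character codes \n and \s are all assumptions made for illustration; the issue does not prescribe an escaping scheme.

```python
def escape_field(value: str, separator: str = "\x01", escape: str = "\\") -> str:
    """Replace the escape char, newline, and separator with two-character codes."""
    out = []
    for ch in value:
        if ch == escape:
            out.append(escape + escape)
        elif ch == "\n":
            out.append(escape + "n")
        elif ch == separator:
            out.append(escape + "s")
        else:
            out.append(ch)
    return "".join(out)


def unescape_field(text: str, separator: str = "\x01", escape: str = "\\") -> str:
    """Invert escape_field for the same separator and escape character."""
    out = []
    chars = iter(text)
    for ch in chars:
        if ch != escape:
            out.append(ch)
            continue
        code = next(chars, "")
        # "n" and "s" map back to newline and separator; anything else is literal.
        out.append({"n": "\n", "s": separator}.get(code, code))
    return "".join(out)
```

After escaping, a field no longer contains a raw newline or separator, so a reader that splits rows on newlines and columns on the separator sees the intended structure.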


[jira] Updated: (HIVE-133) Add support for RecordIO in serde2

2008-12-14 Thread Jeff Hammerbacher (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Hammerbacher updated HIVE-133:
---

Component/s: Serializers/Deserializers

Adding to Serializers/Deserializers component.

 Add support for RecordIO in serde2
 --

 Key: HIVE-133
 URL: https://issues.apache.org/jira/browse/HIVE-133
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Affects Versions: 0.19.0
Reporter: Johan Oskarsson
 Fix For: 0.19.0


 Currently there is no support for Hadoop's RecordIO (also known as Jute) in 
 Hive's new serde version 2.
 I believe quite a few Hadoop installations are using SequenceFiles with keys 
 and values in a combination of normal Writables and generated RecordIO 
 classes.
 This issue needs to cover the following points (as suggested by Joydeep Sen 
 Sarma):
 - traditionally our serdes have ignored the keys altogether (the row is 
 embedded in the value).
 - the jute code was written for an older version of the serde interface and 
 needs to be ported to the new interface


[jira] Updated: (HIVE-124) aggregation on empty table should still return 1 row

2008-12-14 Thread Jeff Hammerbacher (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Hammerbacher updated HIVE-124:
---

Component/s: Query Processor

Adding to Query Processor component.

 aggregation on empty table should still return 1 row
 

 Key: HIVE-124
 URL: https://issues.apache.org/jira/browse/HIVE-124
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Zheng Shao

 The query SELECT COUNT(1) FROM f_status_update fsu WHERE FALSE should 
 return a single row with value 0.
 Our code treats that query as SELECT 1, COUNT(1) FROM f_status_update fsu 
 WHERE FALSE GROUP BY 1, but these two queries are not equivalent: the 
 second query returns an empty result if the input is empty.
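
The difference between the two forms can be modelled in a few lines of Python. This is only an illustration of the semantics, not Hive code; global_count() and grouped_count() are hypothetical names. A global aggregate always produces exactly one row, while a grouped aggregate produces one row per group that actually occurs, and therefore none on empty input.

```python
from typing import Callable, Hashable, Iterable, List, Tuple


def global_count(rows: Iterable[object]) -> List[Tuple[int]]:
    """Model of SELECT COUNT(1) FROM t: always exactly one output row."""
    return [(sum(1 for _ in rows),)]


def grouped_count(rows: Iterable[object],
                  key: Callable[[object], Hashable]) -> List[Tuple[Hashable, int]]:
    """Model of SELECT k, COUNT(1) FROM t GROUP BY k: one row per observed group."""
    counts = {}
    for row in rows:
        k = key(row)
        counts[k] = counts.get(k, 0) + 1
    return list(counts.items())
```

On an empty input, global_count([]) yields a single row holding 0, whereas grouped_count([], lambda r: 1) yields no rows at all, because the constant group 1 is never observed.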


[jira] Updated: (HIVE-125) Hive CLI should allow ; inside quotes

2008-12-14 Thread Jeff Hammerbacher (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Hammerbacher updated HIVE-125:
---

Component/s: Clients

Adding to Clients component.

 Hive CLI should allow ; inside quotes
 -

 Key: HIVE-125
 URL: https://issues.apache.org/jira/browse/HIVE-125
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Clients
Reporter: Zheng Shao

 Currently the Hive CLI splits the command line whenever it sees a ;, even 
 inside quotes. This prevents users from entering ; in string literals or scripts.
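
One way to get the desired behaviour is to track quote state while scanning, and split only on semicolons seen outside quotes. The Python sketch below is illustrative and is not the CLI's code; split_statements() is a hypothetical name, and treating backslash as an escape character inside quotes is an assumption.

```python
from typing import List


def split_statements(line: str) -> List[str]:
    """Split on ';' except where it appears inside single or double quotes."""
    statements = []
    current = []
    quote = None      # the quote character we are inside, if any
    escaped = False   # True when the previous quoted character was a backslash
    for ch in line:
        if escaped:
            current.append(ch)
            escaped = False
        elif quote:
            current.append(ch)
            if ch == "\\":
                escaped = True
            elif ch == quote:
                quote = None
        elif ch in ("'", '"'):
            quote = ch
            current.append(ch)
        elif ch == ";":
            statements.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    statements.append("".join(current).strip())
    # Drop empty pieces such as the one after a trailing ';'.
    return [s for s in statements if s]
```

For example, the input select ';' from t; select 1; splits into two statements, with the quoted semicolon left intact in the first.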

