date:20100226

HiveInputFormat.getInputFormatFromCache swallows cause exception when 
throwing IOException


 Key: HIVE-1203
 URL: https://issues.apache.org/jira/browse/HIVE-1203
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.5.0, 0.4.1, 0.4.0
Reporter: Vladimir Klimontovich
 Fix For: 0.4.2, 0.5.1, 0.6.0


To fix this, simply pass the caught exception as the second parameter (the 
cause) of the IOException constructor. Patches for 0.4, 0.5 and trunk are available.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HIVE-1203) HiveInputFormat.getInputFormatFromCache swallows cause exception when throwing IOException


 [ 
https://issues.apache.org/jira/browse/HIVE-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Klimontovich updated HIVE-1203:


Attachment: 0.4.patch
0.5.patch


[jira] Updated: (HIVE-1203) HiveInputFormat.getInputFormatFromCache swallows cause exception when throwing IOException


 [ 
https://issues.apache.org/jira/browse/HIVE-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Klimontovich updated HIVE-1203:


Attachment: trunk.patch


[jira] Commented: (HIVE-1193) ensure sorting properties for a table

2010-02-26 Thread Edward Capriolo (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12838914#action_12838914
 ] 

Edward Capriolo commented on HIVE-1193:
---

Also how can the optimizer take advantage of this? If we know data is sorted we 
could do some aggressive pruning (if we know offsets) and short circuiting for 
some where conditions.

 ensure sorting properties for a table
 -

 Key: HIVE-1193
 URL: https://issues.apache.org/jira/browse/HIVE-1193
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Fix For: 0.6.0

 Attachments: hive.1193.1.patch


 If a table is declared as sorted and data is inserted into it, we currently do 
 not make sure that the data is sorted. Enforcing this might be useful for some 
 downstream operations.
 This cannot be made the default because of backward compatibility, but an 
 option can be added to enable it.


[jira] Commented: (HIVE-1202) Unknown exception : null while join


[ 
https://issues.apache.org/jira/browse/HIVE-1202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12838946#action_12838946
 ] 

He Yongqiang commented on HIVE-1202:


Actually, Hive does not support this kind of join. It only supports equi-joins. 
Please try something like this:
select a.name, b.* from classes a join classes b on a.name = b.number  where 
a.name  b.number

 Unknown exception : null while join
 -

 Key: HIVE-1202
 URL: https://issues.apache.org/jira/browse/HIVE-1202
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.4.1
 Environment: hive-0.4.1
 hadoop 0.19.1
Reporter: Mafish
 Fix For: 0.4.1

 Attachments: HIVE-1202.branch-0.4.1.patch


 Hive throws Unknown exception : null with query:
 select * from 
 (
   select name from classes 
 ) a
   join classes b
 where a.name  b.number
 After tracing the code, I found that this bug occurs under the following
 conditions:
 1. The query is a join.
 2. At least one of the join sources is a physical table (the right side in
 the above case).
 3. There is a WHERE clause whose condition(s) include columns from both sides
 of the join (a.name and b.number in this case).


[jira] Commented: (HIVE-1203) HiveInputFormat.getInputFormatFromCache swallows cause exception when throwing IOException


[ 
https://issues.apache.org/jira/browse/HIVE-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12838953#action_12838953
 ] 

He Yongqiang commented on HIVE-1203:


Vladimir, in this case, do we really need a stack trace? The failure is mostly 
caused by ClassNotFound etc. when creating an input format instance from the 
class name.
Is there some error that can only be found from the stack trace?


[jira] Commented: (HIVE-1193) ensure sorting properties for a table


[ 
https://issues.apache.org/jira/browse/HIVE-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12838958#action_12838958
 ] 

He Yongqiang commented on HIVE-1193:


@Zheng,
1. How do we make sure that the data is bucketed / sorted? By adding an 
additional map-reduce job?
Yes. 
2. What if the user already specified CLUSTER BY key in his query?
As in 1, a new job will be added to redistribute the data. 
If the user specifies a CLUSTER BY column different from the table's sort and 
bucket properties, maybe we should let it fail. But right now that CLUSTER BY is 
actually ignored.
3. Do we disable merging of small files when we do this?
Yes. We should disable it when enforceBucketing or enforceSorting is 
enabled.



[jira] Commented: (HIVE-1203) HiveInputFormat.getInputFormatFromCache swallows cause exception when throwing IOException


[ 
https://issues.apache.org/jira/browse/HIVE-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12838960#action_12838960
 ] 

Vladimir Klimontovich commented on HIVE-1203:
-

I think so. An exception could be thrown not only from Class.forName, but also 
from Class.newInstance. That was actually my case: due to incorrect settings, 
the constructor of my InputFormat was throwing an exception (and this exception 
was being swallowed).

Also, it's a general rule in most Java projects to pass the original exception 
as the cause. More information in the stack trace never hurts :) 
(Although I'm not sure that Hive should follow this rule)
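
The difference between swallowing and chaining can be sketched as below. This is only an illustration, not the code from the attached patches: the class and method names (CauseChainingSketch, BrokenFormat, createSwallowing, createChaining) are hypothetical, and the sketch uses getDeclaredConstructor().newInstance() rather than Class.newInstance, so a constructor failure arrives wrapped in an InvocationTargetException.

```java
import java.io.IOException;

class CauseChainingSketch {

    /** Hypothetical format class whose constructor fails, standing in for a misconfigured InputFormat. */
    public static class BrokenFormat {
        public BrokenFormat() {
            throw new IllegalStateException("bad settings");
        }
    }

    /** Swallows the cause: only the message survives in the IOException. */
    public static Object createSwallowing(String className) throws IOException {
        try {
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (Exception e) {
            throw new IOException("Cannot create an instance of " + className);
        }
    }

    /** Chains the cause: the caught exception is passed as the second constructor argument. */
    public static Object createChaining(String className) throws IOException {
        try {
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (Exception e) {
            throw new IOException("Cannot create an instance of " + className, e);
        }
    }

    /** Walks the cause chain down to the innermost exception. */
    public static Throwable rootCause(Throwable t) {
        while (t.getCause() != null) {
            t = t.getCause();
        }
        return t;
    }

    public static void main(String[] args) {
        String name = BrokenFormat.class.getName();
        try {
            createSwallowing(name);
        } catch (IOException e) {
            System.out.println("swallowed, cause = " + e.getCause());
        }
        try {
            createChaining(name);
        } catch (IOException e) {
            System.out.println("chained, root cause = " + rootCause(e));
        }
    }
}
```

With the chained variant, the constructor's own exception stays reachable through getCause(), which is the information that was missing in the case described above.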


[jira] Commented: (HIVE-1197) create a new input format where a mapper spans a file

[
https://issues.apache.org/jira/browse/HIVE-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12838959#action_12838959
]

Namit Jain commented on HIVE-1197:
--

Currently, the split that a mapper processes is determined by a variety of
parameters, including the dfs block size, min split size etc.

It might be useful to have an option for when the user wants a mapper to scan 1
file. This will be especially useful for sort-merge join.
If the data is partitioned into various buckets, and each bucket is sorted, the
sort-merge join can join the different buckets together.

For example, consider the following scenario:

table T1: sorted and bucketed by column 'key' into 1000 buckets
table T2: sorted and bucketed by column 'key' into 1000 buckets

and the query:

select * from T1 join T2 on key
mapjoin.

Instead of joining the table T1 with T2, the 1000 buckets can be joined with
each other individually.
Since the data is sorted on the join key, sort-merge join can be used.
Say the buckets are named: b0001, b0002 .. b1000
Say table T1 is the big table, and the buckets from T2 are being read as part
of the mapper which is spawned to process T1,
under the current approach, it will be very difficult to perform outer joins.

For example, if bucket b1 for T1 contains:

1
2
5
6
9
16
22
30

and the corresponding bucket for T2 contains:

2
4
8

If there are 2 mappers for bucket b1 for T1, processing 4 records each
((1,2,5,6) and (9,16,22,30) respectively),
it will be very difficult to perform an outer join. The mapper will need to peek
into the previous record
and the next record respectively.

Moreover, it will be very difficult to ensure that the result also has 1000
buckets. Another map-reduce job
will be needed for the same.

This can be easily solved if we are guaranteed that the whole bucket (or the
file corresponding to the bucket),
will be processed by a single mapper.
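
As a rough illustration of why a whole bucket per mapper makes the outer join simple, here is a minimal sketch of a full outer sort-merge join over the two example buckets. It is not Hive code: the class name BucketMergeJoinSketch and the method fullOuterJoin are hypothetical, keys are assumed unique within each bucket, and a missing side is rendered as the string "null".

```java
import java.util.ArrayList;
import java.util.List;

class BucketMergeJoinSketch {

    /**
     * Full outer sort-merge join of two sorted key arrays.
     * Each output row is "leftKey,rightKey"; an unmatched side is written as "null".
     * Assumes both inputs are sorted ascending and contain no duplicate keys.
     */
    public static List<String> fullOuterJoin(int[] left, int[] right) {
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.length && j < right.length) {
            if (left[i] == right[j]) {
                out.add(left[i] + "," + right[j]);
                i++;
                j++;
            } else if (left[i] < right[j]) {
                out.add(left[i] + ",null");   // key only in the left bucket
                i++;
            } else {
                out.add("null," + right[j]);  // key only in the right bucket
                j++;
            }
        }
        // Drain whichever side still has rows.
        while (i < left.length) {
            out.add(left[i++] + ",null");
        }
        while (j < right.length) {
            out.add("null," + right[j++]);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] t1Bucket = {1, 2, 5, 6, 9, 16, 22, 30};
        int[] t2Bucket = {2, 4, 8};
        for (String row : fullOuterJoin(t1Bucket, t2Bucket)) {
            System.out.println(row);
        }
    }
}
```

If the same routine were run separately on the two splits (1,2,5,6) and (9,16,22,30), each against the full T2 bucket, both runs would emit their own unmatched T2 rows, which is the difficulty described above.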

create a new input format where a mapper spans a file
-

Key: HIVE-1197
URL: https://issues.apache.org/jira/browse/HIVE-1197
Project: Hadoop Hive
Issue Type: New Feature
Components: Query Processor
Reporter: Namit Jain
Assignee: Siying Dong
Fix For: 0.6.0

This will be needed for Sort merge joins.


[jira] Commented: (HIVE-1203) HiveInputFormat.getInputFormatFromCache swallows cause exception when throwing IOException


[ 
https://issues.apache.org/jira/browse/HIVE-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12838968#action_12838968
 ] 

He Yongqiang commented on HIVE-1203:


Thanks for the explanation.  Yes, there is no harm to do that.
I will test and commit it.


[jira] Commented: (HIVE-1193) ensure sorting properties for a table

[
https://issues.apache.org/jira/browse/HIVE-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12839004#action_12839004
]

Namit Jain commented on HIVE-1193:
--

There are 2 different jiras: one for ensuring the bucketing properties and one
for ensuring the sorted properties.

Currently, even though the tables are declared sorted and bucketed during table
creation, these properties are not enforced.
It is up to the user to make sure the data is bucketed/sorted appropriately
while loading.
Since it is not enforced, the optimizer cannot take advantage of that because
it doesn't know whether the data is actually sorted.

There was a jira previously, which took advantage of the fact that the data is
sorted for processing for group by.
This is controlled by configurable parameters.

Going forward, we want to use them for joining, specifically for sort merge
joins.

@Edward, currently we are not doing skipping based on sorting properties.

Currently, we create an additional map-reduce job for bucketing/sorting.
Even if there is a CLUSTER BY, and the data is already bucketed/sorted by the
correct key, we don't use that. There
will be another map-reduce job. This can be optimized in the future.

Merging of map-only jobs is disabled, but the same should be done for
map-reduce jobs also. I will file a follow-up
jira on that.


Hive User Group Meeting 3/18/2010 7pm at Facebook

2010-02-26 Thread Zheng Shao

Hi all,

We are going to hold the second Hive User Group Meeting at 7PM on
3/18/2010 Thursday.

The agenda will be:

* Hive Tutorial: 20 min
* Hive User Case Study: 20 min
* New Features and API: 25 min
 JDBC/ODBC and CTAS
 UDF/UDAF/UDTF
 Create View/HBaseInputFormat
 Hive Join Strategy
 SerDe

The audience is beginner to intermediate Hive users/developers.

*** The details are here: http://www.facebook.com/event.php?eid=319237846974 ***
*** Please RSVP so we can schedule logistics accordingly. ***

-- 
Yours,
Zheng

[jira] Commented: (HIVE-801) row-wise IN would be useful

2010-02-26 Thread Adam Kramer (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12839052#action_12839052
 ] 

Adam Kramer commented on HIVE-801:
--

Also note that the true utility of this is syntax like
WHERE a.foo IN (b.*)

...for instances where b has many many columns and it is messy to articulate 
them. I'm thinking about a current table I have with 800 columns...is there a 
limit on the character-wise length of a query?

 row-wise IN would be useful
 ---

 Key: HIVE-801
 URL: https://issues.apache.org/jira/browse/HIVE-801
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Adam Kramer

 SELECT * FROM tablename t
 WHERE IN(12345,key1,key2,key3);
 ...IN would operate on a given row, and return True when the first argument 
 equaled at least one of the other arguments. So here IN would return true if 
 12345=key1 OR 12345=key2 OR 12345=key3 (but wouldn't test the latter two if 
 the first matched).
 This would also help with https://issues.apache.org/jira/browse/HIVE-783, if 
 IN were implemented in a manner that allows it to be used in an ON clause.
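
 The requested semantics can be sketched in plain Java as follows. This is 
 only an illustration of the short-circuit behaviour described above, not a 
 proposed Hive implementation: the names RowWiseInSketch and rowWiseIn are 
 hypothetical, and treating a null first argument as "never matches" is an 
 assumption, since the description does not say how NULLs should behave.

```java
class RowWiseInSketch {

    /**
     * Returns true when the first argument equals at least one of the others.
     * Stops at the first match, so later candidates are not tested.
     * Assumption: a null needle never matches anything.
     */
    public static boolean rowWiseIn(Object needle, Object... candidates) {
        if (needle == null) {
            return false;
        }
        for (Object candidate : candidates) {
            if (needle.equals(candidate)) {
                return true; // short-circuit on the first hit
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // One "row" with three key columns (made-up values).
        int key1 = 7;
        int key2 = 12345;
        int key3 = 99;
        System.out.println(rowWiseIn(12345, key1, key2, key3));
    }
}
```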


[jira] Updated: (HIVE-259) Add PERCENTILE aggregate function


 [ 
https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jerome Boulon updated HIVE-259:
---

Attachment: (was: Percentile.xlsx)

 Add PERCENTILE aggregate function
 -

 Key: HIVE-259
 URL: https://issues.apache.org/jira/browse/HIVE-259
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Venky Iyer
Assignee: Jerome Boulon
 Attachments: HIVE-259-2.patch, HIVE-259.1.patch, HIVE-259.patch, 
 jb2.txt, Percentile.xlsx


 Compute at least the 25th, 50th, and 75th percentiles


[jira] Updated: (HIVE-259) Add PERCENTILE aggregate function


 [ 
https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jerome Boulon updated HIVE-259:
---

Attachment: Percentile.xlsx

Percentiles that match included test case


[jira] Updated: (HIVE-259) Add PERCENTILE aggregate function


 [ 
https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jerome Boulon updated HIVE-259:
---

Attachment: HIVE-259-3.patch

- use Double instead of Integer for percentile so we can ask for 99.999 
percentile 
- checkstyle fix except State object
- new test case
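
To show why a Double argument matters for requests such as the 99.999th percentile, here is a small sketch. It does not reproduce the patch: the class PercentileSketch and the method nearestRank are hypothetical, and the nearest-rank definition used here is an assumption, since the message does not say which percentile definition the patch implements.

```java
import java.util.Arrays;

class PercentileSketch {

    /**
     * Nearest-rank percentile: the smallest value such that at least p percent
     * of the data is less than or equal to it. Requires 0 < p <= 100 and a
     * non-empty input. The input array is not modified.
     */
    public static double nearestRank(double[] values, double p) {
        if (values.length == 0 || p <= 0.0 || p > 100.0) {
            throw new IllegalArgumentException("need non-empty data and 0 < p <= 100");
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        double[] data = {40, 15, 50, 20, 35}; // made-up sample values
        for (double p : new double[] {25, 50, 75, 99.999}) {
            System.out.println(p + " -> " + nearestRank(data, p));
        }
    }
}
```

With an Integer parameter the last request in main could not be expressed at all, which is the motivation given in the patch notes.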



[jira] Updated: (HIVE-259) Add PERCENTILE aggregate function


 [ 
https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jerome Boulon updated HIVE-259:
---

Status: Open  (was: Patch Available)


[jira] Updated: (HIVE-259) Add PERCENTILE aggregate function


 [ 
https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jerome Boulon updated HIVE-259:
---

Status: Patch Available  (was: Open)

HIVE-259-3.patch


[jira] Created: (HIVE-1204) typedbytes: writing to stderr kills the mapper

typedbytes: writing to stderr kills the mapper
--

 Key: HIVE-1204
 URL: https://issues.apache.org/jira/browse/HIVE-1204
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Fix For: 0.6.0





[jira] Updated: (HIVE-1204) typedbytes: writing to stderr kills the mapper