[jira] Updated: (HIVE-1658) Fix describe [extended] column formatting

2010-10-01 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HIVE-1658:
---

Attachment: HIVE-1658-PrelimPatch.patch

Preliminary patch for the above-mentioned approach; this one felt easier. 
Comments welcome.

The code needs to be reorganized and cleaned up, but I wanted to upload the patch 
before I sign off for the day. Will proceed with test cases on confirmation of 
the approach.

> Fix describe [extended] column formatting
> -
>
> Key: HIVE-1658
> URL: https://issues.apache.org/jira/browse/HIVE-1658
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Paul Yang
>Assignee: Thiruvel Thirumoolan
> Attachments: HIVE-1658-PrelimPatch.patch
>
>
> When displaying the column schema, the formatting should be 
> name<TAB>type<TAB>comment
> to be in line with the previous formatting style for backward compatibility.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1658) Fix describe [extended] column formatting

2010-10-01 Thread Thiruvel Thirumoolan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916992#action_12916992
 ] 

Thiruvel Thirumoolan commented on HIVE-1658:


Patch in the works.

Changes:

1. 'describe' & 'describe extended' outputs will be the same as pre HIVE-558.
2. 'describe formatted' will use the new format for displaying columns and 
additional information.

Will implement the changes similar to how extended is implemented, using a 
boolean in DescTableDesc to denote the formatted keyword and formatting the 
output in DDLTask.describeTable.
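
The approach described above can be sketched roughly as follows. This is only an illustration, not the code in the patch: the class DescribeFormatDemo and the method formatColumn are hypothetical names, the fixed-width layout used for the "formatted" case is an assumption about what the new format looks like, and rendering a missing comment as an empty string is also an assumption. Only the idea of a boolean selecting between the legacy tab-separated output and the new layout comes from the comment.

```java
class DescribeFormatDemo {
    // Hypothetical helper: 'formatted' plays the role of the boolean that
    // would be carried in DescTableDesc for the 'formatted' keyword.
    static String formatColumn(String name, String type, String comment, boolean formatted) {
        // Assumption: a missing comment is rendered as an empty string.
        String c = (comment == null) ? "" : comment;
        if (formatted) {
            // Assumed "new" layout: left-aligned, fixed-width columns.
            return String.format("%-20s%-20s%s", name, type, c);
        }
        // Legacy layout: fields separated by a single tab.
        return name + "\t" + type + "\t" + c;
    }

    public static void main(String[] args) {
        System.out.println(formatColumn("id", "string", "row key", false));
        System.out.println(formatColumn("id", "string", "row key", true));
    }
}
```

Keeping the legacy branch as plain string concatenation makes it easy to see that 'describe' and 'describe extended' output stays parseable by splitting on tabs.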

> Fix describe [extended] column formatting
> -
>
> Key: HIVE-1658
> URL: https://issues.apache.org/jira/browse/HIVE-1658
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Paul Yang
>Assignee: Thiruvel Thirumoolan
>
> When displaying the column schema, the formatting should be 
> name<TAB>type<TAB>comment
> to be in line with the previous formatting style for backward compatibility.




[jira] Updated: (HIVE-1669) non-deterministic display of storage parameter in test

2010-10-01 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HIVE-1669:
---

Attachment: HIVE-1669.patch

Just sorting the param keys before displaying them. Test outputs not updated 
yet, will do so along with HIVE-1658.
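
A minimal sketch of the idea of sorting the parameter keys before display, assuming the parameters arrive as a Map of strings. This is not the contents of HIVE-1669.patch: the class SortedParamsDemo, the method render, and the sample parameter names are hypothetical and chosen only for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class SortedParamsDemo {
    // Hypothetical helper: renders key=value lines in sorted key order so the
    // output no longer depends on HashMap iteration order.
    static String render(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        // Copying into a TreeMap yields iteration in natural (lexicographic) key order.
        for (Map.Entry<String, String> e : new TreeMap<String, String>(params).entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<String, String>();
        params.put("serialization.format", "1"); // sample values, made up
        params.put("field.delim", ",");
        System.out.print(render(params));
    }
}
```

Sorting at display time leaves the underlying storage untouched, which keeps the change local to the formatting code and makes test outputs stable.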

> non-deterministic display of storage parameter in test
> --
>
> Key: HIVE-1669
> URL: https://issues.apache.org/jira/browse/HIVE-1669
> Project: Hadoop Hive
>  Issue Type: Sub-task
>Reporter: Ning Zhang
> Attachments: HIVE-1669.patch
>
>
> With the change to beautify the 'desc extended table', the storage parameters 
> are displayed in a non-deterministic order (since they are stored in a 
> HashMap). 




[jira] Commented: (HIVE-1452) Mapside join on non partitioned table with partitioned table causes error

2010-10-01 Thread Thiruvel Thirumoolan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916842#action_12916842
 ] 

Thiruvel Thirumoolan commented on HIVE-1452:


> However, I see different results when MAPJOIN is used. Will open another JIRA 
> for the same.

Have opened HIVE-1682 for the same.

> Mapside join on non partitioned table with partitioned table causes error
> -
>
> Key: HIVE-1452
> URL: https://issues.apache.org/jira/browse/HIVE-1452
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 0.6.0
>Reporter: Viraj Bhat
>Assignee: Thiruvel Thirumoolan
>
> I am running a script which contains two tables; one is dynamically partitioned 
> and stored as RCFormat and the other is stored as a TXT file.
> The TXT file is around 397MB in size and has around 24 million rows.
> {code}
> drop table joinquery;
> create external table joinquery (
>   id string,
>   type string,
>   sec string,
>   num string,
>   url string,
>   cost string,
>   listinfo array >
> ) 
> STORED AS TEXTFILE
> LOCATION '/projects/joinquery';
> CREATE EXTERNAL TABLE idtable20mil(
> id string
> )
> STORED AS TEXTFILE
> LOCATION '/projects/idtable20mil';
> insert overwrite table joinquery
>select 
>   /*+ MAPJOIN(idtable20mil) */
>   rctable.id,
>   rctable.type,
>   rctable.map['sec'],
>   rctable.map['num'],
>   rctable.map['url'],
>   rctable.map['cost'],
>   rctable.listinfo
> from rctable
> JOIN  idtable20mil on (rctable.id = idtable20mil.id)
> where
> rctable.id is not null and
> rctable.part='value' and
> rctable.subpart='value' and
> rctable.pty='100' and
> rctable.uniqid='1000'
> order by id;
> {code}
> Result:
> Possible error:
>   Data file split:string,part:string,subpart:string,subsubpart:string> is 
> corrupted.
> Solution:
>   Replace file. i.e. by re-running the query that produced the source table / 
> partition.
> -
> If I look at mapper logs.
> {verbatim}
> Caused by: java.io.IOException: java.io.EOFException
>   at 
> org.apache.hadoop.hive.ql.exec.persistence.MapJoinObjectValue.readExternal(MapJoinObjectValue.java:109)
>   at 
> java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1792)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1751)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1329)
>   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
>   at 
> org.apache.hadoop.hive.ql.util.jdbm.htree.HashBucket.readExternal(HashBucket.java:284)
>   at 
> java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1792)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1751)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1329)
>   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
>   at 
> org.apache.hadoop.hive.ql.util.jdbm.helper.Serialization.deserialize(Serialization.java:106)
>   at 
> org.apache.hadoop.hive.ql.util.jdbm.helper.DefaultSerializer.deserialize(DefaultSerializer.java:106)
>   at 
> org.apache.hadoop.hive.ql.util.jdbm.recman.BaseRecordManager.fetch(BaseRecordManager.java:360)
>   at 
> org.apache.hadoop.hive.ql.util.jdbm.recman.BaseRecordManager.fetch(BaseRecordManager.java:332)
>   at 
> org.apache.hadoop.hive.ql.util.jdbm.htree.HashDirectory.get(HashDirectory.java:195)
>   at org.apache.hadoop.hive.ql.util.jdbm.htree.HTree.get(HTree.java:155)
>   at 
> org.apache.hadoop.hive.ql.exec.persistence.HashMapWrapper.get(HashMapWrapper.java:114)
>   ... 11 more
> Caused by: java.io.EOFException
>   at java.io.DataInputStream.readInt(DataInputStream.java:375)
>   at 
> java.io.ObjectInputStream$BlockDataInputStream.readInt(ObjectInputStream.java:2776)
>   at java.io.ObjectInputStream.readInt(ObjectInputStream.java:950)
>   at org.apache.hadoop.io.BytesWritable.readFields(BytesWritable.java:153)
>   at 
> org.apache.hadoop.hive.ql.exec.persistence.MapJoinObjectValue.readExternal(MapJoinObjectValue.java:98)
> {verbatim}
> I am trying to create a testcase, which can demonstrate this error.




[jira] Created: (HIVE-1682) Wrong results with MAPJOIN when cols from non-MAPJOINed table are selected

2010-10-01 Thread Thiruvel Thirumoolan (JIRA)
Wrong results with MAPJOIN when cols from non-MAPJOINed table are selected
--

 Key: HIVE-1682
 URL: https://issues.apache.org/jira/browse/HIVE-1682
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.7.0
 Environment: Hive trunk (rev 1003407)
Hadoop 20.2
Reporter: Thiruvel Thirumoolan


The results of this query are wrong:

set hive.mapjoin.cache.numrows=100;
select /*+ MAPJOIN(invites) */ pokes.bar from pokes join invites on (pokes.bar 
= invites.bar);

Results of all the queries below match:

/* This is the same as the problematic query without specifying numrows, which 
defaults to 25k, much greater than the number of rows in the pokes table */
select /*+ MAPJOIN(invites) */ pokes.bar from pokes join invites on (pokes.bar 
= invites.bar);

set hive.mapjoin.cache.numrows=100;
select /*+ MAPJOIN(invites) */ invites.bar from pokes join invites on 
(pokes.bar = invites.bar);

select invites.bar from pokes join invites on (pokes.bar = invites.bar);

select pokes.bar from pokes join invites on (pokes.bar = invites.bar);




[jira] Resolved: (HIVE-1452) Mapside join on non partitioned table with partitioned table causes error

2010-10-01 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan resolved HIVE-1452.


Resolution: Duplicate

HIVE-1670 fixes the EOF issue and I don't see the problem with the queries I 
used above. Hence closing this one.

However, I see different results when MAPJOIN is used. Will open another JIRA 
for the same.




[jira] Commented: (HIVE-1658) Fix describe [extended] column formatting

2010-09-30 Thread Thiruvel Thirumoolan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916513#action_12916513
 ] 

Thiruvel Thirumoolan commented on HIVE-1658:


Sorry folks, was out sick for more than a week. Will upload a patch tomorrow.

> Fix describe [extended] column formatting
> -
>
> Key: HIVE-1658
> URL: https://issues.apache.org/jira/browse/HIVE-1658
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Paul Yang
>Assignee: Thiruvel Thirumoolan
>
> When displaying the column schema, the formatting should be 
> name<TAB>type<TAB>comment
> to be in line with the previous formatting style for backward compatibility.




[jira] Commented: (HIVE-1658) Fix describe [extended] column formatting

2010-09-21 Thread Thiruvel Thirumoolan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913043#action_12913043
 ] 

Thiruvel Thirumoolan commented on HIVE-1658:


OK, will revert the formatting of the columns (partitions also) and will 
leave the rest of the changes as is. That would mean the headers will also go 
away; the formatting doesn't look good with them and tab alone as separator.

> Fix describe [extended] column formatting
> -
>
> Key: HIVE-1658
> URL: https://issues.apache.org/jira/browse/HIVE-1658
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Paul Yang
>Assignee: Thiruvel Thirumoolan
>
> When displaying the column schema, the formatting should be 
> name<TAB>type<TAB>comment
> to be in line with the previous formatting style for backward compatibility.




[jira] Updated: (HIVE-558) describe extended table/partition output is cryptic

2010-09-20 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HIVE-558:
--

Attachment: HIVE-558.4.patch

Updated tests and re-ran the whole test suite against Hadoop 0.20 and 0.17. All tests 
pass; let me know if you see any problems.

> describe extended table/partition output is cryptic
> ---
>
> Key: HIVE-558
> URL: https://issues.apache.org/jira/browse/HIVE-558
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Prasad Chakka
>Assignee: Thiruvel Thirumoolan
> Attachments: HIVE-558.3.patch, HIVE-558.4.patch, HIVE-558.patch, 
> HIVE-558.patch, HIVE-558_PrelimPatch.patch, SampleOutputDescribe.txt
>
>
> describe extended table prints out the Thrift metadata object directly. The 
> information from it is not easy to read or parse. Output should be easy to 
> read and simple for programs to parse, e.g. to get the table location.




[jira] Updated: (HIVE-558) describe extended table/partition output is cryptic

2010-09-17 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HIVE-558:
--

Attachment: HIVE-558.3.patch

Changelog:

MetaDataFormatUtils: fixed a bug where Location is null for views & added 
copyright information
DDLSemanticAnalyzer: Use DescTableDesc.getSchema() as the method is static now.
QTestUtil: Additional tags added
Test case outputs updated (incl outputs for 0.17)

> describe extended table/partition output is cryptic
> ---
>
> Key: HIVE-558
> URL: https://issues.apache.org/jira/browse/HIVE-558
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Prasad Chakka
>Assignee: Thiruvel Thirumoolan
> Attachments: HIVE-558.3.patch, HIVE-558.patch, HIVE-558.patch, 
> HIVE-558_PrelimPatch.patch, SampleOutputDescribe.txt
>
>
> describe extended table prints out the Thrift metadata object directly. The 
> information from it is not easy to read or parse. Output should be easy to 
> read and simple for programs to parse, e.g. to get the table location.




[jira] Work started: (HIVE-558) describe extended table/partition output is cryptic

2010-09-03 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-558 started by Thiruvel Thirumoolan.

> describe extended table/partition output is cryptic
> ---
>
> Key: HIVE-558
> URL: https://issues.apache.org/jira/browse/HIVE-558
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Prasad Chakka
>Assignee: Thiruvel Thirumoolan
> Attachments: HIVE-558.patch, HIVE-558.patch, 
> HIVE-558_PrelimPatch.patch, SampleOutputDescribe.txt
>
>
> describe extended table prints out the Thrift metadata object directly. The 
> information from it is not easy to read or parse. Output should be easy to 
> read and simple for programs to parse, e.g. to get the table location.




[jira] Updated: (HIVE-558) describe extended table/partition output is cryptic

2010-09-03 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HIVE-558:
--

Attachment: HIVE-558.patch

Thanks Paul, attaching patch with bug fixed.

> describe extended table/partition output is cryptic
> ---
>
> Key: HIVE-558
> URL: https://issues.apache.org/jira/browse/HIVE-558
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Prasad Chakka
>Assignee: Thiruvel Thirumoolan
> Attachments: HIVE-558.patch, HIVE-558.patch, 
> HIVE-558_PrelimPatch.patch, SampleOutputDescribe.txt
>
>
> describe extended table prints out the Thrift metadata object directly. The 
> information from it is not easy to read or parse. Output should be easy to 
> read and simple for programs to parse, e.g. to get the table location.




[jira] Updated: (HIVE-558) describe extended table/partition output is cryptic

2010-08-25 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HIVE-558:
--

Attachment: HIVE-558.patch

Uploading revised patch, testing all possible cases.

> describe extended table/partition output is cryptic
> ---
>
> Key: HIVE-558
> URL: https://issues.apache.org/jira/browse/HIVE-558
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Prasad Chakka
> Attachments: HIVE-558.patch, HIVE-558_PrelimPatch.patch, 
> SampleOutputDescribe.txt
>
>
> describe extended table prints out the Thrift metadata object directly. The 
> information from it is not easy to read or parse. Output should be easy to 
> read and simple for programs to parse, e.g. to get the table location.




[jira] Commented: (HIVE-558) describe extended table/partition output is cryptic

2010-08-25 Thread Thiruvel Thirumoolan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902659#action_12902659
 ] 

Thiruvel Thirumoolan commented on HIVE-558:
---

Thanks Namit. I had conflated this with also obtaining the data for those 
corresponding fields, which is when I think it would be truly independent. But 
I guess it's safe to assume that the ordering of elements in DescTableDesc's 
schema won't change and that elements won't be added or removed.

> describe extended table/partition output is cryptic
> ---
>
> Key: HIVE-558
> URL: https://issues.apache.org/jira/browse/HIVE-558
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Prasad Chakka
> Attachments: HIVE-558_PrelimPatch.patch, SampleOutputDescribe.txt
>
>
> describe extended table prints out the Thrift metadata object directly. The 
> information from it is not easy to read or parse. Output should be easy to 
> read and simple for programs to parse, e.g. to get the table location.




[jira] Commented: (HIVE-1452) Mapside join on non partitioned table with partitioned table causes error

2010-08-24 Thread Thiruvel Thirumoolan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12901801#action_12901801
 ] 

Thiruvel Thirumoolan commented on HIVE-1452:


I am able to reproduce the problem with a smaller dataset (from the examples). 

In a MAPJOIN, when no columns are selected from the MAPJOINed table and if 
hive.mapjoin.cache.numrows < rowcount(MAPJOINed table), one can simulate the 
problem Viraj saw.

set hive.mapjoin.cache.numrows=100;
select /*+ MAPJOIN(invites) */ pokes.bar from pokes join invites on (pokes.bar 
= invites.bar);

The same happens when I MAPJOIN with pokes and select columns from invites. So 
this doesn't have to do with partitioning.

However, when I increase hive.mapjoin.cache.numrows to 500 OR select 
invites.bar instead, I am not able to reproduce the problem.

Hive - Running from trunk, may be a day old.
Hadoop - 20.2


[jira] Commented: (HIVE-558) describe extended table/partition output is cryptic

2010-08-23 Thread Thiruvel Thirumoolan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12901599#action_12901599
 ] 

Thiruvel Thirumoolan commented on HIVE-558:
---

Thanks for the feedback Namit.

Looks like I can reduce 'some' of the direct calls by getting properties 
through Table.getSchema() and looking up the properties defined in the 
Constants class. If something changes in the future, only MetaStoreUtils.getSchema() 
would need to change. That way I can also get the column names and types.

But I am not sure how to obtain the comments of the fields this way or 
the way you suggested (couldn't find a related usage); any pointers 
would be appreciated.

> describe extended table/partition output is cryptic
> ---
>
> Key: HIVE-558
> URL: https://issues.apache.org/jira/browse/HIVE-558
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Prasad Chakka
> Attachments: HIVE-558_PrelimPatch.patch, SampleOutputDescribe.txt
>
>
> describe extended table prints out the Thrift metadata object directly. The 
> information from it is not easy to read or parse. Output should be easy to 
> read and simple for programs to parse, e.g. to get the table location.




[jira] Updated: (HIVE-1583) Hive should not override Hadoop specific system properties

2010-08-23 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HIVE-1583:
---

Attachment: HIVE-1583.patch

Attaching patch.

What users set on the shell for CLASSPATH or HADOOP_CLASSPATH will be picked up 
first. I guess that's OK.

> Hive should not override Hadoop specific system properties
> --
>
> Key: HIVE-1583
> URL: https://issues.apache.org/jira/browse/HIVE-1583
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Configuration
>Reporter: Amareshwari Sriramadasu
> Attachments: HIVE-1583.patch
>
>
> Currently Hive overrides Hadoop specific system properties such as 
> HADOOP_CLASSPATH.
> It does the following in bin/hive script :
> {code}
> # pass classpath to hadoop
> export HADOOP_CLASSPATH=${CLASSPATH}
> {code}
> Instead, it should honor the value of HADOOP_CLASSPATH set by the client by 
> appending CLASSPATH to it.




[jira] Commented: (HIVE-1583) Hive should not override Hadoop specific system properties

2010-08-23 Thread Thiruvel Thirumoolan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12901516#action_12901516
 ] 

Thiruvel Thirumoolan commented on HIVE-1583:


The same is the case with CLASSPATH; it's being reset.

CLASSPATH="${HIVE_CONF_DIR}"

> Hive should not override Hadoop specific system properties
> --
>
> Key: HIVE-1583
> URL: https://issues.apache.org/jira/browse/HIVE-1583
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Configuration
>Reporter: Amareshwari Sriramadasu
>
> Currently Hive overrides Hadoop specific system properties such as 
> HADOOP_CLASSPATH.
> It does the following in bin/hive script :
> {code}
> # pass classpath to hadoop
> export HADOOP_CLASSPATH=${CLASSPATH}
> {code}
> Instead, it should honor the value of HADOOP_CLASSPATH set by the client by 
> appending CLASSPATH to it.




[jira] Updated: (HIVE-558) describe extended table/partition output is cryptic

2010-08-16 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HIVE-558:
--

Attachment: HIVE-558_PrelimPatch.patch
SampleOutputDescribe.txt

Have an initial patch. Wrote a MetaDataFormatUtils class which formats table 
information. This seemed better than the earlier approach I had. The outputs of 
describe & describe extended are attached.

Please let me know your thoughts on the approach and suggestions if any.

> describe extended table/partition output is cryptic
> ---
>
> Key: HIVE-558
> URL: https://issues.apache.org/jira/browse/HIVE-558
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Prasad Chakka
> Attachments: HIVE-558_PrelimPatch.patch, SampleOutputDescribe.txt
>
>
> describe extended table prints out the Thrift metadata object directly. The 
> information from it is not easy to read or parse. Output should be easy to 
> read and simple for programs to parse, e.g. to get the table location.




[jira] Created: (HIVE-1519) Insertion should throw an error when partition order is different than create table

2010-08-09 Thread Thiruvel Thirumoolan (JIRA)
Insertion should throw an error when partition order is different than create 
table
---

 Key: HIVE-1519
 URL: https://issues.apache.org/jira/browse/HIVE-1519
 Project: Hadoop Hive
  Issue Type: Bug
  Components: CLI
Reporter: Thiruvel Thirumoolan
 Fix For: 0.7.0


Hive should throw an error when the partition order specified during insert is 
different from the order specified during table creation.

Currently Hive allows the data insertion, but subsequent queries on the data 
don't return any results.
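
The kind of check being requested might look roughly like the sketch below. It is an assumption about the shape of such a validation, not code from Hive: the class PartitionOrderCheckDemo, the method checkPartitionOrder, and the partition column names are all hypothetical, and the comparison is a plain case-sensitive list comparison for simplicity.

```java
import java.util.Arrays;
import java.util.List;

class PartitionOrderCheckDemo {
    // Hypothetical check: the partition columns named in an insert must appear
    // in the same order as they were declared when the table was created.
    // Simplification: names are compared exactly (case-sensitive).
    static void checkPartitionOrder(List<String> declared, List<String> specified) {
        // List.equals compares element by element, so order matters.
        if (!declared.equals(specified)) {
            throw new IllegalArgumentException(
                "Partition columns " + specified + " do not match declared order " + declared);
        }
    }

    public static void main(String[] args) {
        List<String> declared = Arrays.asList("ds", "hr"); // made-up column names
        checkPartitionOrder(declared, Arrays.asList("ds", "hr")); // same order: accepted
        try {
            checkPartitionOrder(declared, Arrays.asList("hr", "ds")); // swapped: rejected
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing fast at insert time would surface the mistake immediately instead of leaving data that later queries cannot find.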




[jira] Commented: (HIVE-558) describe extended table/partition output is cryptic

2010-08-03 Thread Thiruvel Thirumoolan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895002#action_12895002
 ] 

Thiruvel Thirumoolan commented on HIVE-558:
---

Can I pick this one up if no one else is working on it? 

I would approach it by writing a Utils function to parse and display output of 
metastore/src/gen-javabean/*Table.toString() in a proper format. Or is there a 
Thrift helper that gets this done easier?

> describe extended table/partition output is cryptic
> ---
>
> Key: HIVE-558
> URL: https://issues.apache.org/jira/browse/HIVE-558
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Prasad Chakka
>
> describe extended table prints out the Thrift metadata object directly. The 
> information from it is not easy to read or parse. Output should be easy to 
> read and simple for programs to parse, e.g. to get the table location.
