FileFormat

2010-10-06 Thread gaurav jain


Hi,

insert overwrite directory "$dir" select * from xxx;

creates files of type attempt_201008201925_165088_r_00_0.gz




insert overwrite table "$table" select * from xxx;

creates file of type attempt_201008201925_165088_r_00_0



How can I configure "insert overwrite directory" to produce sequence files (not .gz files)?





Regards,
Gaurav Jain


  


FileFormat

2010-10-05 Thread gaurav jain
Hi,

insert overwrite directory "$dir" select * from xxx;

creates files of type attempt_201008201925_165088_r_00_0.gz




insert overwrite table "$table" select * from xxx;

creates file of type attempt_201008201925_165088_r_00_0



How can I configure "insert overwrite directory" to produce sequence files (not .gz files)?





Regards,
Gaurav Jain


  


[jira] Commented: (HIVE-1514) Be able to modify a partition's fileformat and file location information.

2010-08-13 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898369#action_12898369
 ] 

John Sichi commented on HIVE-1514:
--

Yongqiang, for reference doc updates, remember to add a phrase like "(Note:  
only available starting with 0.7.0)" so that users of earlier Hive versions 
know they need to upgrade if they want the feature.

> Be able to modify a partition's fileformat and file location information.
> -
>
> Key: HIVE-1514
> URL: https://issues.apache.org/jira/browse/HIVE-1514
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Attachments: hive-1514.1.patch, hive-1514.2.patch, hive-1514.3.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1514) Be able to modify a partition's fileformat and file location information.

2010-08-13 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898350#action_12898350
 ] 

He Yongqiang commented on HIVE-1514:


I updated the wiki page here:
http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL#Alter_Table.2BAC8-Partition_Location

This only changes the metadata. With this patch, you can point the partition 
at an external location and use a new file format. As long as the metadata you 
specify is correct, this will work.
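To make "this only changes the metadata" concrete, here is a minimal Java sketch under loose assumptions: a hypothetical in-memory stand-in for the metastore (PartitionAlterSketch, PartitionDescriptor, setLocation and setFileFormat are illustrative names, not Hive's metastore API, and the paths are made up). Altering a partition only rewrites the recorded location and format classes; nothing moves the files or checks that they really are in the declared format, which is why the metadata has to be correct.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for partition metadata; not Hive's metastore API.
public class PartitionAlterSketch {

    // What is recorded for one partition: where its files live and which
    // input/output format classes are used to read and write them.
    static final class PartitionDescriptor {
        String location;
        String inputFormat;
        String outputFormat;

        PartitionDescriptor(String location, String inputFormat, String outputFormat) {
            this.location = location;
            this.inputFormat = inputFormat;
            this.outputFormat = outputFormat;
        }
    }

    // Partition spec such as "ds=2010-08-13" -> recorded descriptor.
    private final Map<String, PartitionDescriptor> partitions = new HashMap<>();

    void addPartition(String spec, PartitionDescriptor desc) {
        partitions.put(spec, desc);
    }

    // Metadata only: repoints the partition; no files are moved or copied.
    void setLocation(String spec, String newLocation) {
        partitions.get(spec).location = newLocation;
    }

    // Metadata only: changes how existing files will be read; nothing verifies
    // that the files really are in this format.
    void setFileFormat(String spec, String inputFormat, String outputFormat) {
        PartitionDescriptor d = partitions.get(spec);
        d.inputFormat = inputFormat;
        d.outputFormat = outputFormat;
    }

    PartitionDescriptor describe(String spec) {
        return partitions.get(spec);
    }

    public static void main(String[] args) {
        PartitionAlterSketch ms = new PartitionAlterSketch();
        ms.addPartition("ds=2010-08-13", new PartitionDescriptor(
                "/user/hive/warehouse/t/ds=2010-08-13",
                "org.apache.hadoop.mapred.TextInputFormat",
                "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"));
        // Point the partition at RCFile data written by a process outside Hive.
        ms.setLocation("ds=2010-08-13", "/external/rcfile/2010-08-13");
        ms.setFileFormat("ds=2010-08-13",
                "org.apache.hadoop.hive.ql.io.RCFileInputFormat",
                "org.apache.hadoop.hive.ql.io.RCFileOutputFormat");
        PartitionDescriptor d = ms.describe("ds=2010-08-13");
        System.out.println(d.location + " " + d.inputFormat);
    }
}
```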

> Be able to modify a partition's fileformat and file location information.
> -
>
> Key: HIVE-1514
> URL: https://issues.apache.org/jira/browse/HIVE-1514
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Attachments: hive-1514.1.patch, hive-1514.2.patch, hive-1514.3.patch
>
>





[jira] Commented: (HIVE-1514) Be able to modify a partition's fileformat and file location information.

2010-08-13 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898337#action_12898337
 ] 

Ashutosh Chauhan commented on HIVE-1514:


From the JIRA description and discussion, it's not clear to me what changes went 
in here. It would be useful to summarize the use case that this JIRA satisfies. 
From a cursory look at the patch, it seems the following is now possible:
{code}
ALTER TABLE table_name [PARTITION partition_spec] SET LOCATION "new location";
ALTER TABLE table_name [PARTITION partition_spec] SET FILEFORMAT RCFILE;
{code}
or some such. If so, is the use case the following: a user created some data for 
an existing Hive table externally (meaning through some process outside of Hive) 
and now wants to query it from Hive, so she needs to do the metadata operation 
above (which this patch now enables)?

> Be able to modify a partition's fileformat and file location information.
> -
>
> Key: HIVE-1514
> URL: https://issues.apache.org/jira/browse/HIVE-1514
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Attachments: hive-1514.1.patch, hive-1514.2.patch, hive-1514.3.patch
>
>





[jira] Resolved: (HIVE-1514) Be able to modify a partition's fileformat and file location information.

2010-08-10 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain resolved HIVE-1514.
--

Hadoop Flags: [Reviewed]
  Resolution: Fixed

Committed. Thanks Yongqiang

> Be able to modify a partition's fileformat and file location information.
> -
>
> Key: HIVE-1514
> URL: https://issues.apache.org/jira/browse/HIVE-1514
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Attachments: hive-1514.1.patch, hive-1514.2.patch, hive-1514.3.patch
>
>





[jira] Commented: (HIVE-1514) Be able to modify a partition's fileformat and file location information.

2010-08-10 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896979#action_12896979
 ] 

Namit Jain commented on HIVE-1514:
--

+1

will commit if the tests pass

> Be able to modify a partition's fileformat and file location information.
> -
>
> Key: HIVE-1514
> URL: https://issues.apache.org/jira/browse/HIVE-1514
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Attachments: hive-1514.1.patch, hive-1514.2.patch, hive-1514.3.patch
>
>





[jira] Updated: (HIVE-1514) Be able to modify a partition's fileformat and file location information.

2010-08-10 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1514:
---

Attachment: hive-1514.3.patch

> Be able to modify a partition's fileformat and file location information.
> -
>
> Key: HIVE-1514
> URL: https://issues.apache.org/jira/browse/HIVE-1514
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Attachments: hive-1514.1.patch, hive-1514.2.patch, hive-1514.3.patch
>
>





[jira] Commented: (HIVE-1514) Be able to modify a partition's fileformat and file location information.

2010-08-09 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896777#action_12896777
 ] 

Namit Jain commented on HIVE-1514:
--

+new String[] { "ALTERTABLE_FILEFORMAT", "ALTERPARTITION_FILEFORMAR" });


There is a spelling mistake; it should be:
new String[] { "ALTERTABLE_FILEFORMAT", "ALTERPARTITION_FILEFORMAT" });

This will result in changes to a few log files.

> Be able to modify a partition's fileformat and file location information.
> -
>
> Key: HIVE-1514
> URL: https://issues.apache.org/jira/browse/HIVE-1514
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Attachments: hive-1514.1.patch, hive-1514.2.patch
>
>





[jira] Updated: (HIVE-1514) Be able to modify a partition's fileformat and file location information.

2010-08-09 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1514:
-

Status: Open  (was: Patch Available)

> Be able to modify a partition's fileformat and file location information.
> -
>
> Key: HIVE-1514
> URL: https://issues.apache.org/jira/browse/HIVE-1514
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Attachments: hive-1514.1.patch, hive-1514.2.patch
>
>





[jira] Updated: (HIVE-1514) Be able to modify a partition's fileformat and file location information.

2010-08-08 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1514:
---

Attachment: hive-1514.2.patch

> Be able to modify a partition's fileformat and file location information.
> -
>
> Key: HIVE-1514
> URL: https://issues.apache.org/jira/browse/HIVE-1514
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Attachments: hive-1514.1.patch, hive-1514.2.patch
>
>





[jira] Updated: (HIVE-1514) Be able to modify a partition's fileformat and file location information.

2010-08-05 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1514:
---

Status: Patch Available  (was: Open)

> Be able to modify a partition's fileformat and file location information.
> -
>
> Key: HIVE-1514
> URL: https://issues.apache.org/jira/browse/HIVE-1514
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Attachments: hive-1514.1.patch
>
>





[jira] Updated: (HIVE-1514) Be able to modify a partition's fileformat and file location information.

2010-08-05 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1514:
---

Attachment: hive-1514.1.patch

> Be able to modify a partition's fileformat and file location information.
> -
>
> Key: HIVE-1514
> URL: https://issues.apache.org/jira/browse/HIVE-1514
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Attachments: hive-1514.1.patch
>
>





[jira] Created: (HIVE-1514) Be able to modify a partition's fileformat and file location information.

2010-08-05 Thread He Yongqiang (JIRA)
Be able to modify a partition's fileformat and file location information.
-

 Key: HIVE-1514
 URL: https://issues.apache.org/jira/browse/HIVE-1514
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: He Yongqiang
Assignee: He Yongqiang







[jira] Updated: (HIVE-1448) bug in 'set fileformat'

2010-07-02 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1448:
---

Status: Resolved  (was: Patch Available)
Resolution: Won't Fix

Zheng pointed out offline that there is already a command to do the same thing:

ALTER TABLE table_name SET SERDE serde_class_name
 
http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL


> bug in 'set fileformat'
> ---
>
> Key: HIVE-1448
> URL: https://issues.apache.org/jira/browse/HIVE-1448
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.6.0, 0.7.0
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Fix For: 0.6.0, 0.7.0
>
> Attachments: hive-1448-brach-0.6.patch, hive-1448.1.patch
>
>





[jira] Updated: (HIVE-1448) bug in 'set fileformat'

2010-07-02 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1448:
---

Attachment: hive-1448-brach-0.6.patch
hive-1448.1.patch

> bug in 'set fileformat'
> ---
>
> Key: HIVE-1448
> URL: https://issues.apache.org/jira/browse/HIVE-1448
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.6.0, 0.7.0
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Fix For: 0.6.0, 0.7.0
>
> Attachments: hive-1448-brach-0.6.patch, hive-1448.1.patch
>
>





[jira] Updated: (HIVE-1448) bug in 'set fileformat'

2010-07-02 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1448:
---

   Status: Patch Available  (was: Open)
Fix Version/s: 0.6.0
   0.7.0

> bug in 'set fileformat'
> ---
>
> Key: HIVE-1448
> URL: https://issues.apache.org/jira/browse/HIVE-1448
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.6.0, 0.7.0
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Fix For: 0.6.0, 0.7.0
>
> Attachments: hive-1448-brach-0.6.patch, hive-1448.1.patch
>
>





[jira] Created: (HIVE-1448) bug in 'set fileformat'

2010-07-02 Thread He Yongqiang (JIRA)
bug in 'set fileformat'
---

 Key: HIVE-1448
 URL: https://issues.apache.org/jira/browse/HIVE-1448
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.6.0, 0.7.0
Reporter: He Yongqiang
Assignee: He Yongqiang







[jira] Commented: (HIVE-1085) ColumnarSerde should not be the default Serde when user specified a fileformat using 'stored as'.

2010-03-15 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845718#action_12845718
 ] 

Ning Zhang commented on HIVE-1085:
--

committed to 0.5. Thanks Yongqiang!

> ColumnarSerde should not be the default Serde when user specified a 
> fileformat using 'stored as'.
> -
>
> Key: HIVE-1085
> URL: https://issues.apache.org/jira/browse/HIVE-1085
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Fix For: 0.6.0
>
> Attachments: hive-1085-branch0.5.patch, hive-1085.2.patch, 
> hive-1085.patch
>
>





[jira] Updated: (HIVE-1085) ColumnarSerde should not be the default Serde when user specified a fileformat using 'stored as'.

2010-03-15 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1085:
---

Attachment: hive-1085-branch0.5.patch

patch for branch-0.5

> ColumnarSerde should not be the default Serde when user specified a 
> fileformat using 'stored as'.
> -
>
> Key: HIVE-1085
> URL: https://issues.apache.org/jira/browse/HIVE-1085
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Fix For: 0.6.0
>
> Attachments: hive-1085-branch0.5.patch, hive-1085.2.patch, 
> hive-1085.patch
>
>





[jira] Resolved: (HIVE-1085) ColumnarSerde should not be the default Serde when user specified a fileformat using 'stored as'.

2010-01-22 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang resolved HIVE-1085.
--

   Resolution: Fixed
Fix Version/s: 0.6.0
 Assignee: He Yongqiang

Committed. Thanks Yongqiang!

> ColumnarSerde should not be the default Serde when user specified a 
> fileformat using 'stored as'.
> -
>
> Key: HIVE-1085
> URL: https://issues.apache.org/jira/browse/HIVE-1085
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Fix For: 0.6.0
>
> Attachments: hive-1085.2.patch, hive-1085.patch
>
>





[jira] Commented: (HIVE-1085) ColumnarSerde should not be the default Serde when user specified a fileformat using 'stored as'.

2010-01-22 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803913#action_12803913
 ] 

Ning Zhang commented on HIVE-1085:
--

+1

looks good. Will commit once tests pass.

> ColumnarSerde should not be the default Serde when user specified a 
> fileformat using 'stored as'.
> -
>
> Key: HIVE-1085
> URL: https://issues.apache.org/jira/browse/HIVE-1085
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: He Yongqiang
> Attachments: hive-1085.2.patch, hive-1085.patch
>
>





[jira] Updated: (HIVE-1085) ColumnarSerde should not be the default Serde when user specified a fileformat using 'stored as'.

2010-01-22 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1085:
---

Attachment: hive-1085.2.patch

hive-1085.2.patch incorporates Ning's offline comments and is clearer for future 
changes.

> ColumnarSerde should not be the default Serde when user specified a 
> fileformat using 'stored as'.
> -
>
> Key: HIVE-1085
> URL: https://issues.apache.org/jira/browse/HIVE-1085
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: He Yongqiang
> Attachments: hive-1085.2.patch, hive-1085.patch
>
>





[jira] Updated: (HIVE-1085) ColumnarSerde should not be the default Serde when user specified a fileformat using 'stored as'.

2010-01-22 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1085:
---

Attachment: hive-1080.patch

> ColumnarSerde should not be the default Serde when user specified a 
> fileformat using 'stored as'.
> -
>
> Key: HIVE-1085
> URL: https://issues.apache.org/jira/browse/HIVE-1085
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: He Yongqiang
> Attachments: hive-1085.patch
>
>





[jira] Updated: (HIVE-1085) ColumnarSerde should not be the default Serde when user specified a fileformat using 'stored as'.

2010-01-22 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1085:
---

Attachment: (was: hive-1080.patch)

> ColumnarSerde should not be the default Serde when user specified a 
> fileformat using 'stored as'.
> -
>
> Key: HIVE-1085
> URL: https://issues.apache.org/jira/browse/HIVE-1085
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: He Yongqiang
> Attachments: hive-1085.patch
>
>





[jira] Updated: (HIVE-1085) ColumnarSerde should not be the default Serde when user specified a fileformat using 'stored as'.

2010-01-22 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1085:
---

Attachment: hive-1085.patch

> ColumnarSerde should not be the default Serde when user specified a 
> fileformat using 'stored as'.
> -
>
> Key: HIVE-1085
> URL: https://issues.apache.org/jira/browse/HIVE-1085
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: He Yongqiang
> Attachments: hive-1085.patch
>
>





[jira] Commented: (HIVE-1085) ColumnarSerde should not be the default Serde when user specified a fileformat using 'stored as'.

2010-01-22 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803870#action_12803870
 ] 

Ning Zhang commented on HIVE-1085:
--

To elaborate: currently ColumnarSerDe is used if hive.default.fileformat is set 
to RCFile. ColumnarSerDe should not be used if the user specifies a different 
storage format in the DDL.
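A minimal Java sketch of the rule described above, assuming a hypothetical helper chooseDefaultSerDe (not Hive's actual code path) and assuming LazySimpleSerDe is the fallback for non-RCFile formats: an explicit STORED AS clause takes precedence over hive.default.fileformat, and ColumnarSerDe is chosen only when the effective format is RCFile.

```java
// Hypothetical sketch of the intended default-SerDe rule; names are illustrative,
// not the code in the HIVE-1085 patch.
public class DefaultSerDeSketch {

    static final String LAZY_SIMPLE_SERDE = "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe";
    static final String COLUMNAR_SERDE = "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe";

    /**
     * @param storedAs          format named in the DDL's STORED AS clause, or null if absent
     * @param defaultFileFormat value of hive.default.fileformat, e.g. "TextFile" or "RCFile"
     */
    static String chooseDefaultSerDe(String storedAs, String defaultFileFormat) {
        // An explicit STORED AS clause wins over the session default.
        String effective = (storedAs != null) ? storedAs : defaultFileFormat;
        // Only RCFile gets ColumnarSerDe; every other format falls back to LazySimpleSerDe.
        return "RCFILE".equalsIgnoreCase(effective) ? COLUMNAR_SERDE : LAZY_SIMPLE_SERDE;
    }

    public static void main(String[] args) {
        // The buggy case: default is RCFile, but the user asked for SEQUENCEFILE.
        System.out.println(chooseDefaultSerDe("SEQUENCEFILE", "RCFile"));
        // No STORED AS clause: the RCFile default applies.
        System.out.println(chooseDefaultSerDe(null, "RCFile"));
    }
}
```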

> ColumnarSerde should not be the default Serde when user specified a 
> fileformat using 'stored as'.
> -
>
> Key: HIVE-1085
> URL: https://issues.apache.org/jira/browse/HIVE-1085
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: He Yongqiang
>





[jira] Created: (HIVE-1085) ColumnarSerde should not be the default Serde when user specified a fileformat using 'stored as'.

2010-01-22 Thread He Yongqiang (JIRA)
ColumnarSerde should not be the default Serde when user specified a fileformat 
using 'stored as'.
-

 Key: HIVE-1085
 URL: https://issues.apache.org/jira/browse/HIVE-1085
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: He Yongqiang







[jira] Updated: (HIVE-360) Generalize the FileFormat Interface in Hive

2009-04-09 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-360:


   Resolution: Fixed
Fix Version/s: 0.4.0
 Release Note: HIVE-360. Generalize the FileFormat Interface in Hive. (He 
Yongqiang via zshao)
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

Committed. Thanks Yongqiang.

> Generalize the FileFormat Interface in Hive
> ---
>
> Key: HIVE-360
> URL: https://issues.apache.org/jira/browse/HIVE-360
> Project: Hadoop Hive
>  Issue Type: Improvement
>Affects Versions: 0.4.0
>Reporter: Zheng Shao
>Assignee: He Yongqiang
> Fix For: 0.4.0
>
> Attachments: hive-360-2009-03-31.patch, hive-360-2009-04-01.patch, 
> hive-360-2009-04-04-4.patch, hive-360-2009-04-07-5.patch, 
> hive-360-2009-04-08-3.patch, hive-360-2009-04-08.patch, 
> hive-360-2009-04-09-3.patch, hive-360-2009-04-09.patch, HIVE-360.patch, 
> qfile.tar
>
>
> Currently the FileFormat support in Hive is not generalized - we do "if ... 
> else" to support TextFileFormat and SequenceFileFormat. There is no way to 
> support a 3rd one without changing the "if...else" structure. We should make 
> an interface for the file formats needed by Hive.
> The OutputFileFormat interface that Hive requires will contain one more 
> method than the Hadoop OutputFileFormat: create a file with a specific name.
> Hive.g:409 (Hive.g already supports the custom file format, but 
> DDLSemanticAnalyzer.java does not recognize it yet):
> {code}
> KW_STORED KW_AS KW_INPUTFORMAT inFmt=StringLiteral KW_OUTPUTFORMAT 
> outFmt=StringLiteral
> {code}
> Please add the handling of TOK_TABLEFILEFORMAT here:
> DDLSemanticAnalyzer.java:223
> {code}
> case HiveParser.TOK_TBLSEQUENCEFILE:
> ...
> {code}
> Please add the handling of custom outputFormat here by adding a new interface 
> (and casting the user-provided file format to that interface) instead of doing 
> "if ... else":
> FileSinkOperator.java:129-174:
> {code}
>   if(outputFormat instanceof IgnoreKeyTextOutputFormat) {
> finalPath = new Path(Utilities.toTempPath(conf.getDirName()), 
> Utilities.getTaskId(hconf) +
>  Utilities.getFileExtension(jc, isCompressed));
>   ...
> {code}
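A minimal Java sketch of the proposed design, with hypothetical names throughout (HiveFileFormat, TextFormat, SequenceFormat, load and finalPath are illustrative, not the interface the patch adds). The user-supplied OUTPUTFORMAT class name is loaded reflectively and cast to the Hive-side interface, and the final path is assembled the same way for every format, with only the extension delegated to the format. The ".gz" extension assumes a gzip codec for compressed text output.

```java
// Hypothetical sketch of the HIVE-360 proposal: each output format implements
// a Hive-side interface, so callers need no instanceof/if-else chain.
// Names are illustrative only; this is not the interface the patch added.
public class OutputFormatDispatchSketch {

    /** Minimal Hive-side contract assumed for this sketch. */
    public interface HiveFileFormat {
        /** Extension for the files this format writes. */
        String fileExtension(boolean compressed);
    }

    public static class TextFormat implements HiveFileFormat {
        @Override
        public String fileExtension(boolean compressed) {
            return compressed ? ".gz" : "";  // assumes a gzip codec for text output
        }
    }

    public static class SequenceFormat implements HiveFileFormat {
        @Override
        public String fileExtension(boolean compressed) {
            return "";  // compression is recorded inside the SequenceFile itself
        }
    }

    /** Loads a user-supplied class name (e.g. from STORED AS ... OUTPUTFORMAT 'cls') and
     *  casts it to the Hive-side interface; throws ClassCastException if it does not implement it. */
    static HiveFileFormat load(String className) throws ReflectiveOperationException {
        return Class.forName(className)
                .asSubclass(HiveFileFormat.class)
                .getDeclaredConstructor()
                .newInstance();
    }

    /** Same construction for every format; only the extension is delegated. */
    static String finalPath(String tmpDir, String taskId, HiveFileFormat fmt, boolean compressed) {
        return tmpDir + "/" + taskId + fmt.fileExtension(compressed);
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        HiveFileFormat text = load(TextFormat.class.getName());
        HiveFileFormat seq = load(SequenceFormat.class.getName());
        System.out.println(finalPath("/tmp/out", "000000_0", text, true)); // /tmp/out/000000_0.gz
        System.out.println(finalPath("/tmp/out", "000000_0", seq, true));  // /tmp/out/000000_0
    }
}
```

Adding a third format then means writing one more class that implements the interface; finalPath itself does not change.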




[jira] Commented: (HIVE-360) Generalize the FileFormat Interface in Hive

2009-04-09 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697696#action_12697696
 ] 

Zheng Shao commented on HIVE-360:
-

@Joydeep: Agreed, although currently the only use case for that is inside the 
text file format. It's rare enough that I think we can generalize it when we 
have a second use case.
We should also move the file format check function into the specific file 
format.
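A minimal Java sketch of moving the check into each format, assuming the check in question verifies that files being loaded match the table's storage format. InputFileChecker, TEXT, SEQUENCE and allValid are hypothetical names; the only format fact relied on is that Hadoop SequenceFiles begin with the magic bytes "SEQ" followed by a version byte.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: each file format owns its input check, so generic
// loading code asks the table's format instead of special-casing formats.
public class FormatCheckSketch {

    interface InputFileChecker {
        /** True if a file beginning with these bytes can be read by this format. */
        boolean accepts(byte[] header);
    }

    // Plain text has no header to verify, so every file is accepted.
    static final InputFileChecker TEXT = header -> true;

    // Hadoop SequenceFiles start with the magic bytes "SEQ" plus a version byte.
    static final InputFileChecker SEQUENCE = header ->
            header.length >= 4 && header[0] == 'S' && header[1] == 'E' && header[2] == 'Q';

    /** Generic code: delegates to the format's checker, never branches on format type. */
    static boolean allValid(InputFileChecker checker, List<byte[]> headers) {
        for (byte[] h : headers) {
            if (!checker.accepts(h)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] seqFile = {'S', 'E', 'Q', 6};
        byte[] textFile = "1\tfoo\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(allValid(SEQUENCE, Arrays.asList(seqFile)));           // true
        System.out.println(allValid(SEQUENCE, Arrays.asList(seqFile, textFile))); // false
        System.out.println(allValid(TEXT, Arrays.asList(textFile)));              // true
    }
}
```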


> Generalize the FileFormat Interface in Hive
> ---
>
> Key: HIVE-360
> URL: https://issues.apache.org/jira/browse/HIVE-360
> Project: Hadoop Hive
>  Issue Type: Improvement
>Affects Versions: 0.4.0
>Reporter: Zheng Shao
>Assignee: He Yongqiang
> Attachments: hive-360-2009-03-31.patch, hive-360-2009-04-01.patch, 
> hive-360-2009-04-04-4.patch, hive-360-2009-04-07-5.patch, 
> hive-360-2009-04-08-3.patch, hive-360-2009-04-08.patch, 
> hive-360-2009-04-09-3.patch, hive-360-2009-04-09.patch, HIVE-360.patch, 
> qfile.tar
>
>




[jira] Commented: (HIVE-360) Generalize the FileFormat Interface in Hive

2009-04-09 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697571#action_12697571
 ] 

Joydeep Sen Sarma commented on HIVE-360:


Looked at this a bit; looks great to me.

One comment is that the getFinalPath call should be made part of the 
HiveOutputFormat as well. Actually, all we need is to let the output format 
determine the file extension; the rest of the path name is always the same. But 
it's not a big deal.

> Generalize the FileFormat Interface in Hive
> ---
>
> Key: HIVE-360
> URL: https://issues.apache.org/jira/browse/HIVE-360
> Project: Hadoop Hive
>  Issue Type: Improvement
>Affects Versions: 0.4.0
>Reporter: Zheng Shao
>Assignee: He Yongqiang
> Attachments: hive-360-2009-03-31.patch, hive-360-2009-04-01.patch, 
> hive-360-2009-04-04-4.patch, hive-360-2009-04-07-5.patch, 
> hive-360-2009-04-08-3.patch, hive-360-2009-04-08.patch, 
> hive-360-2009-04-09-3.patch, hive-360-2009-04-09.patch, HIVE-360.patch, 
> qfile.tar
>
>




[jira] Updated: (HIVE-360) Generalize the FileFormat Interface in Hive

2009-04-09 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-360:


Attachment: HIVE-360.patch

Almost the same as Yongqiang's patch, except for a few typo fixes etc.

> Generalize the FileFormat Interface in Hive
> ---
>
> Key: HIVE-360
> URL: https://issues.apache.org/jira/browse/HIVE-360
> Project: Hadoop Hive
>  Issue Type: Improvement
>Affects Versions: 0.4.0
>Reporter: Zheng Shao
>Assignee: He Yongqiang
> Attachments: hive-360-2009-03-31.patch, hive-360-2009-04-01.patch, 
> hive-360-2009-04-04-4.patch, hive-360-2009-04-07-5.patch, 
> hive-360-2009-04-08-3.patch, hive-360-2009-04-08.patch, 
> hive-360-2009-04-09-3.patch, hive-360-2009-04-09.patch, HIVE-360.patch, 
> qfile.tar
>
>
> Currently the FileFormat support in Hive is not generalized - we do "if ... 
> else" to support TextFileFormat and SequenceFileFormat. There is no way to 
> support a 3rd one without changing the "if...else" structure. We should make 
> an interface for the FileFormat need for Hive.
> The OutputFileFormat interface that Hive requires will contain one more 
> method than the Hadoop OutputFileFormat - create a File with a specific name.
> Hive.g:409 (Hive.g already supports the custom file format but 
> DDLSemanticAnalyzer.java is not recognizing it yet
> {code}
> KW_STORED KW_AS KW_INPUTFORMAT inFmt=StringLiteral KW_OUTPUTFORMAT 
> outFmt=StringLiteral
> {code}
> Please add the handling of TOK_TABLEFILEFORMAT here:
> DDLSemanticAnalyzer.java:223
> {code}
> case HiveParser.TOK_TBLSEQUENCEFILE:
> ...
> {code}
> Please add the handling of custom outputFormat here by adding a new interface 
> (and cast the user-provided file format to that interface), instead of doing 
> "if ... else"
> FileSinkOperator.java:129-174:
> {code}
>   if (outputFormat instanceof IgnoreKeyTextOutputFormat) {
>     finalPath = new Path(Utilities.toTempPath(conf.getDirName()),
>         Utilities.getTaskId(hconf) +
>         Utilities.getFileExtension(jc, isCompressed));
>   ...
> {code}
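To make the request concrete, here is a minimal Java sketch of interface-based dispatch replacing the `instanceof` chain. Every name in it (`SketchOutputFormat`, `TextSketchFormat`, `SeqSketchFormat`, `SinkSketch`) is a hypothetical stand-in rather than Hive's actual API, and paths are plain strings instead of Hadoop `Path` objects: each format decides its own final file name, so the sink never branches on the concrete class.

```java
// Hypothetical stand-in for the Hive-side output format interface: beyond a
// plain Hadoop OutputFormat it adds one method that names the output file.
interface SketchOutputFormat {
    String finalFileName(String taskId, boolean isCompressed);
}

// Text output: the codec extension (".gz" assumed here) marks compressed files.
class TextSketchFormat implements SketchOutputFormat {
    public String finalFileName(String taskId, boolean isCompressed) {
        return taskId + (isCompressed ? ".gz" : "");
    }
}

// SequenceFile output: compression is recorded inside the file, so no extension.
class SeqSketchFormat implements SketchOutputFormat {
    public String finalFileName(String taskId, boolean isCompressed) {
        return taskId;
    }
}

// The sink only talks to the interface; there is no if/else over concrete formats.
class SinkSketch {
    static String finalPath(String dir, String taskId, boolean isCompressed,
                            SketchOutputFormat format) {
        return dir + "/" + format.finalFileName(taskId, isCompressed);
    }
}
```

With this shape, supporting a third format means adding one more implementation (even a lambda, since the interface has a single method) rather than another `else` branch.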

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-360) Generalize the FileFormat Interface in Hive

2009-04-08 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697408#action_12697408
 ] 

Zheng Shao commented on HIVE-360:
-

+1





[jira] Updated: (HIVE-360) Generalize the FileFormat Interface in Hive

2009-04-08 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-360:
--

Attachment: hive-360-2009-04-09-3.patch




[jira] Updated: (HIVE-360) Generalize the FileFormat Interface in Hive

2009-04-08 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-360:
--

Attachment: hive-360-2009-04-09.patch

1) Modified the .out files using -Doverwrite=true.
2) Removed fileformat_void.q.
3) Checked the errors from Hive's tests:
"
[junit] Test org.apache.hadoop.hive.cli.TestCliDriver FAILED
[junit] < FAILED: Error in semantic analysis: Output Format must implement HiveOutputFormat, otherwise it should be either IgnoreKeyTextOutputFormat or SequenceFileOutputFormat
[junit] > FAILED: Error in metadata: Class not found: ClassDoesNotExist
[junit] > FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
[junit] < FAILED: Error in semantic analysis: Output Format must implement HiveOutputFormat, otherwise it should be either IgnoreKeyTextOutputFormat or SequenceFileOutputFormat
[junit] > FAILED: Error in semantic analysis: line 4:23 Output Format must implement OutputFormat dest1
[junit] Test org.apache.hadoop.hive.cli.TestNegativeCliDriver FAILED
[junit] Test org.apache.hadoop.hive.ql.TestMTQueries FAILED
[junit] Test org.apache.hadoop.hive.ql.metadata.TestHiveMetaStoreChecker FAILED
[junit] Test org.apache.hadoop.hive.ql.parse.TestParse FAILED
"
Fixed them; they now pass on my local machine.
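The failures in this log come from validating the user-supplied output format class. A minimal sketch of such a check, modeled only on the error messages quoted in the log and not on Hive's actual code: the `*Stub` types are hypothetical stand-ins for the Hadoop and Hive classes, and the rule accepts any class implementing the Hive interface plus the two built-in formats.

```java
// Hypothetical stand-ins for the real Hadoop/Hive types.
interface OutputFormatStub {}
interface HiveOutputFormatStub {}
class IgnoreKeyTextOutputFormatStub implements OutputFormatStub {}
class SequenceFileOutputFormatStub implements OutputFormatStub {}
class CustomHiveOutputFormatStub implements OutputFormatStub, HiveOutputFormatStub {}
class PlainHadoopOutputFormatStub implements OutputFormatStub {}

class OutputFormatCheck {
    static final String NOT_HIVE_FORMAT =
        "Output Format must implement HiveOutputFormat, otherwise it should be "
        + "either IgnoreKeyTextOutputFormat or SequenceFileOutputFormat";

    // Returns null when the class is acceptable, otherwise an error message.
    static String validate(Class<?> cls) {
        if (HiveOutputFormatStub.class.isAssignableFrom(cls)) {
            return null; // implements the Hive-specific interface
        }
        if (cls == IgnoreKeyTextOutputFormatStub.class
                || cls == SequenceFileOutputFormatStub.class) {
            return null; // built-in formats accepted without the interface
        }
        return NOT_HIVE_FORMAT;
    }

    // Resolves the class name first, mirroring the "Class not found" failure.
    static String validate(String className) {
        try {
            return validate(Class.forName(className));
        } catch (ClassNotFoundException e) {
            return "Class not found: " + className;
        }
    }
}
```

A plain Hadoop format with no Hive counterpart is rejected, which is the case the negative test is meant to exercise.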





[jira] Updated: (HIVE-360) Generalize the FileFormat Interface in Hive

2009-04-07 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-360:
--

Attachment: hive-360-2009-04-08-3.patch




[jira] Updated: (HIVE-360) Generalize the FileFormat Interface in Hive

2009-04-07 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-360:
--

Attachment: hive-360-2009-04-08.patch




[jira] Updated: (HIVE-360) Generalize the FileFormat Interface in Hive

2009-04-07 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-360:
--

Attachment: qfile.tar




[jira] Commented: (HIVE-360) Generalize the FileFormat Interface in Hive

2009-04-07 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696854#action_12696854
 ] 

Zheng Shao commented on HIVE-360:
-

Some more details on 5): SequenceFileFormat inherits from FileOutputFormat 
using different generic arguments, which makes it impossible for 
HiveOutputFormat to extend OutputFormat while HiveSequenceFileFormat both 
inherits from SequenceFileFormat and implements HiveOutputFormat.

As a result, we have to drop HiveOutputFormat's inheritance from 
OutputFormat.
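The underlying Java rule is that a class may not be a subtype of two different parameterizations of the same generic interface. A minimal sketch with hypothetical stand-in types (`OutputFormatLike`, `SeqOutputFormatLike`, `HiveOutputFormatLike`, `HiveSeqOutputFormatLike`, not the real Hadoop or Hive classes) shows the design that compiles once the Hive interface stops extending the Hadoop one; the comment at the end describes the variant javac rejects.

```java
// Stand-in for Hadoop's generic OutputFormat interface.
interface OutputFormatLike<K, V> {
    String describe();
}

// Stand-in for a Hadoop class that has already fixed the type arguments.
class SeqOutputFormatLike implements OutputFormatLike<String, Integer> {
    public String describe() {
        return "sequence";
    }
}

// The Hive interface no longer extends OutputFormatLike, so it cannot clash
// with whatever parameterization the Hadoop superclass chose.
interface HiveOutputFormatLike<K, V> {
    String hiveWriterFor(String finalPath);
}

// Compiles: OutputFormatLike<String, Integer> comes from the superclass, and the
// unrelated Hive interface is implemented separately with other arguments.
class HiveSeqOutputFormatLike extends SeqOutputFormatLike
        implements HiveOutputFormatLike<Long, Object> {
    public String hiveWriterFor(String finalPath) {
        return describe() + ":" + finalPath;
    }
}

// By contrast, declaring
//   interface HiveOutputFormatLike<K, V> extends OutputFormatLike<K, V>
// would make HiveSeqOutputFormatLike implement both OutputFormatLike<String, Integer>
// and OutputFormatLike<Long, Object>, which javac rejects.
```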




[jira] Updated: (HIVE-360) Generalize the FileFormat Interface in Hive

2009-04-07 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-360:
--

Attachment: hive-360-2009-04-07-5.patch

1) "Can you move it into the HiveOutputFileFormat by adding a new method, like 
getFileExtension(jc, isCompressed)?"

  if (outputFormat instanceof IgnoreKeyTextOutputFormat) {
    finalPath = new Path(Utilities.toTempPath(conf.getDirName()),
        Utilities.getTaskId(jc) + Utilities.getFileExtension(jc, isCompressed));
  }

Done. A new method called getOutputFormatFinalPath(..) has been added to 
HiveFileFormatUtils.

2) HiveOutputFormatUtils has been renamed to HiveFileFormatUtils.

3) "initRecordWriter" in FileSinkOperator has been made static and renamed to 
getRecordWriter, which returns a RecordWriter.

4) Code from MoveTask.java has been moved to a method called checkInputFormat in 
HiveFileFormatUtils.


In addition to Zheng's suggestions on the JIRA page, we discussed other 
modifications:
5) Drop HiveOutputFormat's inheritance from OutputFormat, because 
SequenceFileOutputFormat inherits from FileOutputFormat differently in Hadoop 
0.17 and 0.19.
6) Modify tableDesc to support HiveOutputFormat directly.




[jira] Commented: (HIVE-360) Generalize the FileFormat Interface in Hive

2009-04-06 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696166#action_12696166
 ] 

Zheng Shao commented on HIVE-360:
-

MoveTask.java needs to be modified as well: it contains some code that checks 
the file format, which we need to change accordingly.

{code}
  // Probe: opening a SequenceFile.Reader throws IOException if the file
  // is not a SequenceFile.
  boolean fileIsSequenceFile = true;
  try {
    SequenceFile.Reader reader = new SequenceFile.Reader(
        fs, files.get(fileId).getPath(), conf);
    reader.close();
  } catch (IOException e) {
    fileIsSequenceFile = false;
  }
{code}
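The probe in this snippet detects a SequenceFile by trying to open a `SequenceFile.Reader` and treating any `IOException` as "not a SequenceFile", which also swallows unrelated I/O errors. An alternative technique, sketched here under the assumption that only the header matters, is to check the leading magic bytes: SequenceFile headers start with the ASCII bytes `SEQ` followed by a version byte. `SequenceFileSniffer` and its methods are hypothetical names, not part of Hadoop or Hive.

```java
import java.io.IOException;
import java.io.InputStream;

class SequenceFileSniffer {
    // SequenceFile headers begin with the ASCII bytes 'S', 'E', 'Q' (then a version byte).
    private static final byte[] MAGIC = {'S', 'E', 'Q'};

    // Pure check on an already-read prefix; never throws.
    static boolean hasSequenceFileMagic(byte[] prefix) {
        if (prefix.length < MAGIC.length) {
            return false; // too short to hold the magic bytes
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (prefix[i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    // Reads the first MAGIC.length bytes; real I/O errors propagate instead of
    // being mistaken for "not a SequenceFile".
    static boolean looksLikeSequenceFile(InputStream in) throws IOException {
        byte[] header = new byte[MAGIC.length];
        int read = 0;
        while (read < header.length) {
            int n = in.read(header, read, header.length - read);
            if (n < 0) {
                return false; // end of stream before the magic was complete
            }
            read += n;
        }
        return hasSequenceFileMagic(header);
    }
}
```

In Hive the stream would come from the file system (for example Hadoop's `FileSystem.open`); the byte-level check keeps genuine read errors distinct from a format mismatch, at the cost of not validating the rest of the header.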





[jira] Commented: (HIVE-360) Generalize the FileFormat Interface in Hive

2009-04-06 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696163#action_12696163
 ] 

Zheng Shao commented on HIVE-360:
-

This piece of code in FileSinkOperator looks bad.
Can you move it into the HiveOutputFileFormat by adding a new method, like 
getFileExtension(jc, isCompressed)?

{code}
  if (outputFormat instanceof IgnoreKeyTextOutputFormat) {
    finalPath = new Path(Utilities.toTempPath(conf.getDirName()),
        Utilities.getTaskId(jc) + Utilities.getFileExtension(jc, isCompressed));
  }
{code}

Can you also make "initRecordWriter" in FileSinkOperator static?  There is just 
one member variable that is referenced: outPath, and you can make 
initRecordWriter return the outWriter value.
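The suggestion amounts to turning a method that reads and writes instance fields into a static factory whose inputs and output are explicit. A minimal sketch with hypothetical names (`RecordWriterSketch`, `WriterFactory`), using `java.io.Writer` as a stand-in for Hadoop's `RecordWriter`:

```java
import java.io.Writer;

// Hypothetical factory standing in for "open a record writer on this path".
interface WriterFactory {
    Writer open(String path);
}

class RecordWriterSketch {
    // Static: everything it needs arrives as arguments (the old code read the
    // outPath field), and the writer is returned instead of stored in a field.
    static Writer getRecordWriter(String outPath, WriterFactory factory) {
        if (outPath == null || outPath.isEmpty()) {
            throw new IllegalArgumentException("outPath must be set");
        }
        return factory.open(outPath);
    }
}
```

Because the method touches no instance state, it can be called from helpers outside the operator as well as from the operator itself.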





[jira] Updated: (HIVE-360) Generalize the FileFormat Interface in Hive

2009-04-04 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-360:
--

Attachment: hive-360-2009-04-04-4.patch

1) Changed tableDesc in the getHiveRecordWriter method to java.util.Properties.
2) Added two .q files, one in clientpositive and the other in clientnegative.
Thanks, Zheng.




[jira] Commented: (HIVE-360) Generalize the FileFormat Interface in Hive

2009-04-02 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695227#action_12695227
 ] 

Zheng Shao commented on HIVE-360:
-

Good point. We should do an INSERT after the CREATE TABLE as well.





[jira] Commented: (HIVE-360) Generalize the FileFormat Interface in Hive

2009-04-02 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695219#action_12695219
 ] 

He Yongqiang commented on HIVE-360:
---

Is CREATE TABLE ... STORED AS INPUTFORMAT  OUTPUTFORMAT xxx a good test case 
for OutputFormat?
Hive only uses OutputFormat in its FileSinkOperator, and it seems CREATE only 
stores the information.




[jira] Commented: (HIVE-360) Generalize the FileFormat Interface in Hive

2009-04-02 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12694959#action_12694959
 ] 

Zheng Shao commented on HIVE-360:
-

Also, please include 2 tests that use the syntax CREATE TABLE ... STORED 
AS INPUTFORMAT  OUTPUTFORMAT xxx.
One positive test, and one negative test with invalid file formats (e.g., just 
a Hadoop file format, but no corresponding Hive file format).



> Generalize the FileFormat Interface in Hive
> ---
>
> Key: HIVE-360
> URL: https://issues.apache.org/jira/browse/HIVE-360
> Project: Hadoop Hive
>  Issue Type: Improvement
>Affects Versions: 0.4.0
>Reporter: Zheng Shao
>Assignee: He Yongqiang
> Attachments: hive-360-2009-03-31.patch, hive-360-2009-04-01.patch



[jira] Commented: (HIVE-360) Generalize the FileFormat Interface in Hive

2009-04-02 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12694957#action_12694957
 ] 

Zheng Shao commented on HIVE-360:
-

Currently getHiveRecordWriter takes a tableDesc. Can we change it to take 
java.util.Properties instead? We don't need all the information about the 
table to create the file.
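A minimal sketch of the suggested signature change: the writer factory receives java.util.Properties rather than a full table descriptor and reads only the few settings it needs. The property keys and the WriterSettings type are hypothetical, chosen for illustration; the Ctrl-A default reflects Hive's default text field delimiter.

```java
import java.util.Properties;

public class PropertiesWriterFactory {

  // Hypothetical keys for illustration; Hive's real property names may differ.
  public static final String FIELD_DELIM_KEY = "example.field.delim";
  public static final String COMPRESS_KEY = "example.compress";

  /** The few settings this sketch needs in order to create an output file. */
  public static final class WriterSettings {
    public final String path;
    public final char fieldDelim;
    public final boolean compressed;

    WriterSettings(String path, char fieldDelim, boolean compressed) {
      this.path = path;
      this.fieldDelim = fieldDelim;
      this.compressed = compressed;
    }
  }

  /** Reads only what it needs from the table properties, with defaults. */
  public static WriterSettings createWriterSettings(String path, Properties tableProps) {
    // Default to Ctrl-A (octal 001) when no delimiter property is set.
    String delim = tableProps.getProperty(FIELD_DELIM_KEY, "\001");
    boolean compress = Boolean.parseBoolean(tableProps.getProperty(COMPRESS_KEY, "false"));
    return new WriterSettings(path, delim.charAt(0), compress);
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty(FIELD_DELIM_KEY, ",");
    props.setProperty(COMPRESS_KEY, "true");
    WriterSettings s = createWriterSettings("/tmp/out/part-00000", props);
    System.out.println(s.path + " delim=" + s.fieldDelim + " compressed=" + s.compressed);
  }
}
```

The point of the narrower parameter is that the writer no longer depends on the whole table descriptor, only on a flat key/value view of it.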


> Generalize the FileFormat Interface in Hive
> ---
>
> Key: HIVE-360
> URL: https://issues.apache.org/jira/browse/HIVE-360
> Project: Hadoop Hive
>  Issue Type: Improvement
>Affects Versions: 0.4.0
>Reporter: Zheng Shao
>Assignee: He Yongqiang
> Attachments: hive-360-2009-03-31.patch, hive-360-2009-04-01.patch



[jira] Updated: (HIVE-360) Generalize the FileFormat Interface in Hive

2009-04-01 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-360:
--

Status: Patch Available  (was: Open)

> Generalize the FileFormat Interface in Hive
> ---
>
> Key: HIVE-360
> URL: https://issues.apache.org/jira/browse/HIVE-360
> Project: Hadoop Hive
>  Issue Type: Improvement
>Affects Versions: 0.4.0
>Reporter: Zheng Shao
>Assignee: He Yongqiang
> Attachments: hive-360-2009-03-31.patch, hive-360-2009-04-01.patch



[jira] Updated: (HIVE-360) Generalize the FileFormat Interface in Hive

2009-04-01 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-360:
--

Attachment: hive-360-2009-04-01.patch

hive-360-2009-04-01.patch is a refactored version of hive-360-2009-03-31.patch.
It adds javadoc and the Apache license header to each new file.

> Generalize the FileFormat Interface in Hive
> ---
>
> Key: HIVE-360
> URL: https://issues.apache.org/jira/browse/HIVE-360
> Project: Hadoop Hive
>  Issue Type: Improvement
>Affects Versions: 0.4.0
>Reporter: Zheng Shao
>Assignee: He Yongqiang
> Attachments: hive-360-2009-03-31.patch, hive-360-2009-04-01.patch



[jira] Updated: (HIVE-360) Generalize the FileFormat Interface in Hive

2009-04-01 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-360:
--

Status: Open  (was: Patch Available)

> Generalize the FileFormat Interface in Hive
> ---
>
> Key: HIVE-360
> URL: https://issues.apache.org/jira/browse/HIVE-360
> Project: Hadoop Hive
>  Issue Type: Improvement
>Affects Versions: 0.4.0
>Reporter: Zheng Shao
>Assignee: He Yongqiang
> Attachments: hive-360-2009-03-31.patch, hive-360-2009-04-01.patch



[jira] Commented: (HIVE-360) Generalize the FileFormat Interface in Hive

2009-04-01 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12694543#action_12694543
 ] 

He Yongqiang commented on HIVE-360:
---

Thanks, Thusoo.
Actually, I am refactoring the code now.
I have talked with Zheng about the current patch. There are some improvements:
(1) Make HiveOutputFormat an interface that extends OutputFormat, and add a 
new getRecordWriter. The main difference between this getRecordWriter and 
Hadoop OutputFormat's getRecordWriter is that the new one accepts a path 
parameter and creates the output file when it is called.
(2) Make HiveSequenceFileOutputFormat extend Hadoop's SequenceFileOutputFormat 
and implement the new HiveOutputFormat.
(3) Deprecate Hive's IgnoreKeyOutputFormat and replace it with a new 
IgnoreKeyOutputFormat that uses the new HiveOutputFormat.

This makes the code clearer. The disadvantage is that HiveOutputFormat's 
signature looks like:
{code}
HiveOutputFormat<K extends WritableComparable, V extends Writable>
    extends OutputFormat<K, V>
{code} 
so it can only use subclasses of WritableComparable as its key and subclasses 
of Writable as its value. I think that is OK in Hive, isn't it?

Should I cancel the patch now and resubmit once the refactoring is done?
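The three points above can be sketched in plain Java with stand-in types, since Hadoop's OutputFormat and SequenceFileOutputFormat are not assumed to be on the classpath. BaseOutputFormat, BaseFileFormat, HiveOutputFormatSketch and HiveFileFormatSketch are hypothetical, and the stand-in base format writes tab-separated text rather than real SequenceFiles. The Comparable bound on the key stands in for the WritableComparable restriction; the part being illustrated is the extra method that takes an explicit path and creates the file when called.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class HiveOutputFormatDesign {

  /** Minimal record writer: write key/value pairs, then close. */
  public interface RecordWriterSketch<K, V> extends AutoCloseable {
    void write(K key, V value) throws IOException;

    @Override
    void close() throws IOException;
  }

  /** Stand-in for Hadoop's OutputFormat: the framework chooses the file name. */
  public interface BaseOutputFormat<K, V> {
    RecordWriterSketch<K, V> getRecordWriter(String name) throws IOException;
  }

  /** Stand-in for the proposed HiveOutputFormat: adds a writer for an explicit path. */
  public interface HiveOutputFormatSketch<K extends Comparable<K>, V>
      extends BaseOutputFormat<K, V> {
    RecordWriterSketch<K, V> getHiveRecordWriter(Path finalOutPath) throws IOException;
  }

  /** Stand-in for an existing framework format; writes tab-separated text. */
  public static class BaseFileFormat<K, V> implements BaseOutputFormat<K, V> {
    @Override
    public RecordWriterSketch<K, V> getRecordWriter(String name) throws IOException {
      return open(Paths.get(name));
    }

    /** Creates (or truncates) the file immediately and returns a writer for it. */
    protected RecordWriterSketch<K, V> open(Path path) throws IOException {
      final BufferedWriter out = Files.newBufferedWriter(path, StandardCharsets.UTF_8);
      return new RecordWriterSketch<K, V>() {
        @Override
        public void write(K key, V value) throws IOException {
          out.write(key + "\t" + value);
          out.newLine();
        }

        @Override
        public void close() throws IOException {
          out.close();
        }
      };
    }
  }

  /** Point (2): reuse the existing class and add the Hive-specific method. */
  public static class HiveFileFormatSketch<K extends Comparable<K>, V>
      extends BaseFileFormat<K, V> implements HiveOutputFormatSketch<K, V> {
    @Override
    public RecordWriterSketch<K, V> getHiveRecordWriter(Path finalOutPath) throws IOException {
      return open(finalOutPath);
    }
  }

  public static void main(String[] args) throws IOException {
    Path out = Files.createTempDirectory("hive-sketch").resolve("part-00000");
    HiveOutputFormatSketch<String, Integer> fmt = new HiveFileFormatSketch<>();
    try (RecordWriterSketch<String, Integer> w = fmt.getHiveRecordWriter(out)) {
      w.write("a", 1);
      w.write("b", 2);
    }
    System.out.println(Files.readAllLines(out, StandardCharsets.UTF_8));
  }
}
```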

> Generalize the FileFormat Interface in Hive
> ---
>
> Key: HIVE-360
> URL: https://issues.apache.org/jira/browse/HIVE-360
> Project: Hadoop Hive
>  Issue Type: Improvement
>Affects Versions: 0.4.0
>Reporter: Zheng Shao
>Assignee: He Yongqiang
> Attachments: hive-360-2009-03-31.patch



[jira] Updated: (HIVE-360) Generalize the FileFormat Interface in Hive

2009-04-01 Thread Ashish Thusoo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Thusoo updated HIVE-360:
---

Affects Version/s: 0.4.0
   Status: Patch Available  (was: Open)

Marking this as patch submitted. That helps to get this on the radar for the 
reviewers.

Thanks,
Ashish

> Generalize the FileFormat Interface in Hive
> ---
>
> Key: HIVE-360
> URL: https://issues.apache.org/jira/browse/HIVE-360
> Project: Hadoop Hive
>  Issue Type: Improvement
>Affects Versions: 0.4.0
>Reporter: Zheng Shao
>Assignee: He Yongqiang
> Attachments: hive-360-2009-03-31.patch



[jira] Assigned: (HIVE-360) Generalize the FileFormat Interface in Hive

2009-04-01 Thread Ashish Thusoo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Thusoo reassigned HIVE-360:
--

Assignee: He Yongqiang

> Generalize the FileFormat Interface in Hive
> ---
>
> Key: HIVE-360
> URL: https://issues.apache.org/jira/browse/HIVE-360
> Project: Hadoop Hive
>  Issue Type: Improvement
>Reporter: Zheng Shao
>Assignee: He Yongqiang
> Attachments: hive-360-2009-03-31.patch



[jira] Updated: (HIVE-360) Generalize the FileFormat Interface in Hive

2009-03-31 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-360:
--

Attachment: hive-360-2009-03-31.patch

Attached hive-360-2009-03-31.patch, a draft version:
1) add a new HiveOutputFormat
2) wrap the existing IgnoreKeyTextOutputFormat and SequenceFileOutputFormat as 
HiveIgnoreKeyTextOutputFormat and HiveSequenceFileOutputFormat respectively
3) refactor FileSinkOperator to use HiveOutputFormat to create the record writer
4) add a HiveOutputFormatUtils for backward compatibility
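For the backward-compatibility item, one plausible approach (an assumption; the patch itself is not quoted here) is a substitution table: table metadata that still names a legacy output format is mapped to its Hive wrapper, and any other class name passes through unchanged. The simple class names come from item 2; the map and the substitute method are hypothetical.

```java
import java.util.Map;

public class OutputFormatSubstitutes {

  // Hypothetical table: legacy names found in existing table metadata
  // mapped to the wrapper classes named in the draft patch.
  private static final Map<String, String> SUBSTITUTES = Map.of(
      "IgnoreKeyTextOutputFormat", "HiveIgnoreKeyTextOutputFormat",
      "SequenceFileOutputFormat", "HiveSequenceFileOutputFormat");

  /** Returns the Hive wrapper for a legacy format, or the name unchanged. */
  public static String substitute(String outputFormatName) {
    return SUBSTITUTES.getOrDefault(outputFormatName, outputFormatName);
  }

  public static void main(String[] args) {
    for (String name : new String[] {"SequenceFileOutputFormat", "MyCustomHiveOutputFormat"}) {
      System.out.println(name + " -> " + substitute(name));
    }
  }
}
```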

> Generalize the FileFormat Interface in Hive
> ---
>
> Key: HIVE-360
> URL: https://issues.apache.org/jira/browse/HIVE-360
> Project: Hadoop Hive
>  Issue Type: Improvement
>Reporter: Zheng Shao
> Attachments: hive-360-2009-03-31.patch



[jira] Updated: (HIVE-360) Generalize the FileFormat Interface in Hive

2009-03-31 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-360:
--

Attachment: (was: hive-360-2009-03-31.patch)

> Generalize the FileFormat Interface in Hive
> ---
>
> Key: HIVE-360
> URL: https://issues.apache.org/jira/browse/HIVE-360
> Project: Hadoop Hive
>  Issue Type: Improvement
>Reporter: Zheng Shao
> Attachments: hive-360-2009-03-31.patch



[jira] Updated: (HIVE-360) Generalize the FileFormat Interface in Hive

2009-03-31 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-360:
--

Attachment: hive-360-2009-03-31.patch

Attached hive-360-2009-03-31.patch, a draft version:
1) add a new HiveOutputFormat
2) wrap the existing IgnoreKeyTextOutputFormat and SequenceFileOutputFormat as 
HiveIgnoreKeyTextOutputFormat and HiveSequenceFileOutputFormat respectively
3) refactor FileSinkOperator to use HiveOutputFormat to create the record writer
4) add a HiveOutputFormatUtils for backward compatibility

> Generalize the FileFormat Interface in Hive
> ---
>
> Key: HIVE-360
> URL: https://issues.apache.org/jira/browse/HIVE-360
> Project: Hadoop Hive
>  Issue Type: Improvement
>Reporter: Zheng Shao
> Attachments: hive-360-2009-03-31.patch



[jira] Commented: (HIVE-360) Generalize the FileFormat Interface in Hive

2009-03-29 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12693637#action_12693637
 ] 

He Yongqiang commented on HIVE-360:
---

TOK_TABLEFILEFORMAT is already handled in DDLSemanticAnalyzer.
{code}
case HiveParser.TOK_TBLSEQUENCEFILE:
  inputFormat = SEQUENCEFILE_INPUT;
  outputFormat = SEQUENCEFILE_OUTPUT;
  break;
{code}
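A sketch of how the STORED AS forms could resolve to a pair of format class names, with the custom INPUTFORMAT/OUTPUTFORMAT literals passed through as given. The Token enum (including NONE for a missing STORED AS clause) and the constant values are illustrative placeholders rather than DDLSemanticAnalyzer's real definitions, and the literals are assumed to be already unquoted.

```java
public class StoredAsResolver {

  /** Hypothetical stand-ins for the parser tokens discussed above. */
  public enum Token { NONE, TOK_TBLSEQUENCEFILE, TOK_TABLEFILEFORMAT }

  /** Input/output format class names chosen for a table. */
  public static final class Formats {
    public final String input;
    public final String output;

    Formats(String input, String output) {
      this.input = input;
      this.output = output;
    }
  }

  // Placeholder values for illustration only.
  public static final String TEXTFILE_INPUT = "org.apache.hadoop.mapred.TextInputFormat";
  public static final String TEXTFILE_OUTPUT = "org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat";
  public static final String SEQUENCEFILE_INPUT = "org.apache.hadoop.mapred.SequenceFileInputFormat";
  public static final String SEQUENCEFILE_OUTPUT = "org.apache.hadoop.mapred.SequenceFileOutputFormat";

  /** inFmt/outFmt are the INPUTFORMAT/OUTPUTFORMAT literals, used only for TOK_TABLEFILEFORMAT. */
  public static Formats resolve(Token token, String inFmt, String outFmt) {
    switch (token) {
      case NONE:
        return new Formats(TEXTFILE_INPUT, TEXTFILE_OUTPUT);
      case TOK_TBLSEQUENCEFILE:
        return new Formats(SEQUENCEFILE_INPUT, SEQUENCEFILE_OUTPUT);
      case TOK_TABLEFILEFORMAT:
        // Custom formats: take the user-supplied class names verbatim.
        if (inFmt == null || outFmt == null) {
          throw new IllegalArgumentException("INPUTFORMAT and OUTPUTFORMAT are both required");
        }
        return new Formats(inFmt, outFmt);
      default:
        throw new IllegalArgumentException("Unexpected token: " + token);
    }
  }

  public static void main(String[] args) {
    Formats f = resolve(Token.TOK_TABLEFILEFORMAT, "com.example.MyInputFormat", "com.example.MyOutputFormat");
    System.out.println(f.input + " / " + f.output);
  }
}
```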



> Generalize the FileFormat Interface in Hive
> ---
>
> Key: HIVE-360
> URL: https://issues.apache.org/jira/browse/HIVE-360
> Project: Hadoop Hive
>  Issue Type: Improvement
>Reporter: Zheng Shao



[jira] Created: (HIVE-360) Generalize the FileFormat Interface in Hive

2009-03-24 Thread Zheng Shao (JIRA)
Generalize the FileFormat Interface in Hive
---

 Key: HIVE-360
 URL: https://issues.apache.org/jira/browse/HIVE-360
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Zheng Shao


Currently the FileFormat support in Hive is not generalized: we use "if ... 
else" to support TextFileFormat and SequenceFileFormat. There is no way to 
support a third format without changing the "if ... else" structure. We should 
define an interface for the file formats Hive needs.

The OutputFileFormat interface that Hive requires will contain one more method 
than Hadoop's OutputFormat: one that creates a file with a specific name.

Hive.g:409 (Hive.g already supports custom file formats, but 
DDLSemanticAnalyzer.java does not recognize them yet):
{code}
KW_STORED KW_AS KW_INPUTFORMAT inFmt=StringLiteral KW_OUTPUTFORMAT 
outFmt=StringLiteral
{code}

Please add the handling of TOK_TABLEFILEFORMAT here:
DDLSemanticAnalyzer.java:223
{code}
case HiveParser.TOK_TBLSEQUENCEFILE:
...
{code}

Please handle custom output formats here by adding a new interface (and 
casting the user-provided output format to that interface) instead of using 
"if ... else":
FileSinkOperator.java:129-174:
{code}
  if (outputFormat instanceof IgnoreKeyTextOutputFormat) {
    finalPath = new Path(Utilities.toTempPath(conf.getDirName()),
        Utilities.getTaskId(hconf) + Utilities.getFileExtension(jc, isCompressed));
    ...
{code}
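A minimal sketch of the requested change, assuming a hypothetical per-format hook in place of the instanceof chain: the operator asks the format for its file extension and builds the final path from the task id, so adding a format needs no change to the operator. The interface and classes are illustrative stand-ins, and the ".gz" suffix for compressed text is an assumption, not Hive's actual extension logic.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class FinalPathDispatch {

  /** Hypothetical per-format hook; not Hive's actual interface. */
  public interface HiveOutputFormatSketch {
    /** File-name suffix this format wants for the given compression setting. */
    String fileExtension(boolean isCompressed);
  }

  /** Assumption: compressed text output is gzip and gets a ".gz" suffix. */
  public static class TextFormat implements HiveOutputFormatSketch {
    @Override
    public String fileExtension(boolean isCompressed) {
      return isCompressed ? ".gz" : "";
    }
  }

  /** Sequence files compress inside the file, so no suffix in this sketch. */
  public static class SequenceFormat implements HiveOutputFormatSketch {
    @Override
    public String fileExtension(boolean isCompressed) {
      return "";
    }
  }

  /** Builds tmpDir/(taskId + extension) without knowing the concrete format. */
  public static Path finalPath(Path tmpDir, String taskId,
                               HiveOutputFormatSketch format, boolean isCompressed) {
    return tmpDir.resolve(taskId + format.fileExtension(isCompressed));
  }

  public static void main(String[] args) {
    Path tmp = Paths.get("/tmp/hive-out/_tmp");
    System.out.println(finalPath(tmp, "attempt_0001_r_000000_0", new TextFormat(), true));
    System.out.println(finalPath(tmp, "attempt_0001_r_000000_0", new SequenceFormat(), true));
  }
}
```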

