[jira] Updated: (HIVE-693) Add a AWS S3 log format deserializer

2009-08-13 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-693:


   Resolution: Fixed
Fix Version/s: 0.5.0  (was: 0.3.0)
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

Committed. Thanks, Zheng and Andraz.

> Add a AWS S3 log format deserializer
> 
>
> Key: HIVE-693
> URL: https://issues.apache.org/jira/browse/HIVE-693
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Serializers/Deserializers
>Reporter: Zheng Shao
>Assignee: Andraz Tori
> Fix For: 0.5.0
>
> Attachments: HIVE-693.1.patch, HIVE-693.2.patch, inputs3.q, s3.log, 
> s3deserializer.diff, S3LogDeserializer.java, S3LogStruct.java
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-693) Add a AWS S3 log format deserializer

2009-08-12 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-693:


Attachment: HIVE-693.2.patch

Incorporated Ashish's comments.

Also removed the column definitions, since they will come directly from the SerDe.

@Andraz: To improve speed, instead of using a regex you can read the data in 
as org.apache.hadoop.io.Text and do the split yourself. Each field can be 
stored in a Text as well, and those Text objects can be reused across rows. 
This way, processing will be much faster.
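
A rough sketch of the idea, not the code in the patch. To stay self-contained it uses a reusable StringBuilder per field as a stand-in for a reused org.apache.hadoop.io.Text. The class name ReusableFieldSplitter and its methods are hypothetical. The splitting rules (space-separated, with "..." and [...] each treated as one field) are a simplified assumption about the S3 log layout and ignore escaped quotes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a log line by hand and reuses its field buffers.
final class ReusableFieldSplitter {
    // One buffer per column, allocated once and reused for every row.
    // Stand-in for reused org.apache.hadoop.io.Text objects.
    private final List<StringBuilder> fields = new ArrayList<>();
    private int fieldCount = 0;

    // Splits one line on spaces; "..." and [...] are each kept as a single
    // field with the delimiters stripped. Returns the number of fields found.
    int split(CharSequence line) {
        fieldCount = 0;
        int i = 0;
        final int n = line.length();
        while (i < n) {
            while (i < n && line.charAt(i) == ' ') {
                i++; // skip separators
            }
            if (i >= n) {
                break;
            }
            StringBuilder out = nextField();
            char c = line.charAt(i);
            char close = (c == '"') ? '"' : (c == '[') ? ']' : '\0';
            if (close != '\0') {
                i++; // skip the opening delimiter
                while (i < n && line.charAt(i) != close) {
                    out.append(line.charAt(i++));
                }
                if (i < n) {
                    i++; // skip the closing delimiter
                }
            } else {
                while (i < n && line.charAt(i) != ' ') {
                    out.append(line.charAt(i++));
                }
            }
        }
        return fieldCount;
    }

    // Returns the buffer for the next column, cleared but not reallocated.
    private StringBuilder nextField() {
        if (fieldCount == fields.size()) {
            fields.add(new StringBuilder());
        }
        StringBuilder sb = fields.get(fieldCount++);
        sb.setLength(0);
        return sb;
    }

    // Field contents are only valid until the next call to split().
    CharSequence get(int index) {
        if (index < 0 || index >= fieldCount) {
            throw new IndexOutOfBoundsException("no field " + index);
        }
        return fields.get(index);
    }
}
```

Because the buffers are overwritten on each call to split(), a caller that needs a value beyond the current row has to copy it first. The same caveat applies to reused Text objects.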


> Add a AWS S3 log format deserializer
> 
>
> Key: HIVE-693
> URL: https://issues.apache.org/jira/browse/HIVE-693
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Serializers/Deserializers
>Reporter: Zheng Shao
>Assignee: Andraz Tori
> Fix For: 0.3.0
>
> Attachments: HIVE-693.1.patch, HIVE-693.2.patch, inputs3.q, s3.log, 
> s3deserializer.diff, S3LogDeserializer.java, S3LogStruct.java
>
>





[jira] Updated: (HIVE-693) Add a AWS S3 log format deserializer

2009-08-07 Thread Andraz Tori (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andraz Tori updated HIVE-693:
-

Attachment: inputs3.q

Actually, the input.q was a bit old, sorry about that. Here's the fixed one.

Everything else seems OK.

> Add a AWS S3 log format deserializer
> 
>
> Key: HIVE-693
> URL: https://issues.apache.org/jira/browse/HIVE-693
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Serializers/Deserializers
>Reporter: Zheng Shao
>Assignee: Andraz Tori
> Fix For: 0.3.0
>
> Attachments: HIVE-693.1.patch, inputs3.q, s3.log, 
> s3deserializer.diff, S3LogDeserializer.java, S3LogStruct.java
>
>





[jira] Updated: (HIVE-693) Add a AWS S3 log format deserializer

2009-08-05 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-693:


Attachment: HIVE-693.1.patch

HIVE-693.1.patch: Andraz, I've moved all data and code to contrib. Can you 
review and comment?

Please note that if you upgrade from Hive 0.3 to Hive 0.4 and want to use this 
new SerDe, you will need to manually go through the metastore tables and 
replace the name of the SerDe class (since it has changed to 
org.apache.hadoop.hive.contrib.serde2.s3.S3LogDeserializer).

> Add a AWS S3 log format deserializer
> 
>
> Key: HIVE-693
> URL: https://issues.apache.org/jira/browse/HIVE-693
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Serializers/Deserializers
>Reporter: Zheng Shao
>Assignee: Andraz Tori
> Fix For: 0.3.0
>
> Attachments: HIVE-693.1.patch, s3.log, s3deserializer.diff, 
> S3LogDeserializer.java, S3LogStruct.java
>
>





[jira] Updated: (HIVE-693) Add a AWS S3 log format deserializer

2009-08-05 Thread Andraz Tori (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andraz Tori updated HIVE-693:
-

Attachment: s3.log

... forgot to add an s3.log for the previous patch.

Is there any chance of getting this into 0.4?

> Add a AWS S3 log format deserializer
> 
>
> Key: HIVE-693
> URL: https://issues.apache.org/jira/browse/HIVE-693
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Serializers/Deserializers
>Reporter: Zheng Shao
>Assignee: Andraz Tori
> Fix For: 0.3.0
>
> Attachments: s3.log, s3deserializer.diff, S3LogDeserializer.java, 
> S3LogStruct.java
>
>





[jira] Updated: (HIVE-693) Add a AWS S3 log format deserializer

2009-08-04 Thread Andraz Tori (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andraz Tori updated HIVE-693:
-

Attachment: s3deserializer.diff

the patch...

> Add a AWS S3 log format deserializer
> 
>
> Key: HIVE-693
> URL: https://issues.apache.org/jira/browse/HIVE-693
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Serializers/Deserializers
>Reporter: Zheng Shao
>Assignee: Andraz Tori
> Fix For: 0.3.0
>
> Attachments: s3deserializer.diff, S3LogDeserializer.java, 
> S3LogStruct.java
>
>





[jira] Updated: (HIVE-693) Add a AWS S3 log format deserializer

2009-08-04 Thread Andraz Tori (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andraz Tori updated HIVE-693:
-

Fix Version/s: 0.3.0
   Status: Patch Available  (was: Open)

Here's a patch with expected inputs and outputs so unit tests can be created...

I am still new to the Hive source tree, so someone else should take care of 
moving it to contrib.

> Add a AWS S3 log format deserializer
> 
>
> Key: HIVE-693
> URL: https://issues.apache.org/jira/browse/HIVE-693
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Serializers/Deserializers
>Reporter: Zheng Shao
>Assignee: Andraz Tori
> Fix For: 0.3.0
>
> Attachments: S3LogDeserializer.java, S3LogStruct.java
>
>





[jira] Updated: (HIVE-693) Add a AWS S3 log format deserializer

2009-07-28 Thread Andraz Tori (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andraz Tori updated HIVE-693:
-

Attachment: S3LogStruct.java
S3LogDeserializer.java

Deserializer implementation.

While it works, the code is by no means release-ready; it has to be cleaned up 
first. But it is better than nothing as a starting point for someone looking to 
integrate an S3 log deserializer.

I was quite amazed to find out that no one else had needed or published this.

> Add a AWS S3 log format deserializer
> 
>
> Key: HIVE-693
> URL: https://issues.apache.org/jira/browse/HIVE-693
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Serializers/Deserializers
>Reporter: Zheng Shao
> Attachments: S3LogDeserializer.java, S3LogStruct.java
>
>

