[jira] Updated: (HIVE-693) Add a AWS S3 log format deserializer
[ https://issues.apache.org/jira/browse/HIVE-693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-693:
Resolution: Fixed
Fix Version/s: 0.5.0 (was: 0.3.0)
Hadoop Flags: [Reviewed]
Status: Resolved (was: Patch Available)

Committed. Thanks, Zheng and Andraz.

> Add a AWS S3 log format deserializer
>
> Key: HIVE-693
> URL: https://issues.apache.org/jira/browse/HIVE-693
> Project: Hadoop Hive
> Issue Type: New Feature
> Components: Serializers/Deserializers
> Reporter: Zheng Shao
> Assignee: Andraz Tori
> Fix For: 0.5.0
>
> Attachments: HIVE-693.1.patch, HIVE-693.2.patch, inputs3.q, s3.log, s3deserializer.diff, S3LogDeserializer.java, S3LogStruct.java

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-693) Add a AWS S3 log format deserializer
[ https://issues.apache.org/jira/browse/HIVE-693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HIVE-693:
Attachment: HIVE-693.2.patch

Incorporated Ashish's comments. Also removed the column definitions, since they now come directly from the SerDe.

@Andraz: To improve speed, instead of using a regex you can read the data in as org.apache.hadoop.io.Text and split it yourself. Each field can also be stored in a Text, and the Text objects can be reused across rows. This way, processing will be much faster.
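Zheng's suggestion above (split each row by hand instead of with a regex, and reuse buffers across rows) can be illustrated with a small sketch. To stay self-contained and runnable without Hadoop on the classpath, it works on a plain byte[] and records each field as start/end offsets in arrays that are allocated once and reused for every row. The class name S3LogSplitter and its methods are hypothetical. It assumes space-separated fields where the timestamp is wrapped in [...] and the request line, referrer and user agent in "...", as in S3 server access logs, and it does not handle escaped quotes. In an actual SerDe the row would arrive as an org.apache.hadoop.io.Text (getBytes()/getLength()), and each field could be copied into a reused Text with Text.set(bytes, start, len).

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch: splits one S3 access-log row held in a byte[] without
 * java.util.regex. Field boundaries go into offset arrays that are allocated
 * once and reused for every row, so splitting creates no per-row objects.
 */
final class S3LogSplitter {
    private final int[] starts;
    private final int[] ends;

    S3LogSplitter(int maxFields) {
        starts = new int[maxFields];
        ends = new int[maxFields];
    }

    /**
     * Splits bytes[0, len) on spaces. A field opening with '[' runs to the next
     * ']', and one opening with '"' runs to the next '"'; the delimiters are
     * excluded from the field. Escaped quotes are NOT handled (assumption).
     * Returns the number of fields found, at most maxFields.
     */
    int split(byte[] bytes, int len) {
        int count = 0;
        int i = 0;
        while (i < len && count < starts.length) {
            while (i < len && bytes[i] == ' ') {
                i++; // skip separators
            }
            if (i >= len) {
                break;
            }
            byte close = bytes[i] == '[' ? (byte) ']' : bytes[i] == '"' ? (byte) '"' : (byte) ' ';
            boolean delimited = close != ' ';
            int start = delimited ? i + 1 : i;
            int end = start;
            while (end < len && bytes[end] != close) {
                end++;
            }
            starts[count] = start; // overwrite offsets left over from the previous row
            ends[count] = end;
            count++;
            i = delimited ? end + 1 : end; // step past the closing delimiter
        }
        return count;
    }

    int start(int field) {
        return starts[field];
    }

    int length(int field) {
        return ends[field] - starts[field];
    }

    /** For demos only; a real SerDe would copy into a reused buffer instead. */
    String fieldAsString(byte[] bytes, int field) {
        return new String(bytes, starts[field], length(field), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Made-up example row in S3 server access log layout.
        String line = "OWNER mybucket [06/Feb/2009:00:00:38 +0000] 192.0.2.3 REQUESTER REQID "
            + "REST.GET.OBJECT photos/a.jpg \"GET /mybucket/photos/a.jpg HTTP/1.1\" 200 - "
            + "2662992 3462992 70 10 \"-\" \"S3Console/0.4\"";
        byte[] row = line.getBytes(StandardCharsets.UTF_8);
        S3LogSplitter splitter = new S3LogSplitter(32); // one instance, reused for every row
        int n = splitter.split(row, row.length);
        System.out.println(n);
        System.out.println(splitter.fieldAsString(row, 2));
        System.out.println(splitter.fieldAsString(row, 8));
        System.out.println(splitter.fieldAsString(row, 16));
    }
}
```

The design choice mirrors Zheng's point: a single splitter (and, in a real SerDe, a single set of Text objects) lives for the lifetime of the deserializer and is overwritten on every call, so the per-row cost is one pass over the bytes with no pattern matching or allocation.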
[jira] Updated: (HIVE-693) Add a AWS S3 log format deserializer
[ https://issues.apache.org/jira/browse/HIVE-693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andraz Tori updated HIVE-693:
Attachment: inputs3.q

Actually, the input.q was a bit old, sorry about that. Here's the fixed one. Everything else seems OK.
[jira] Updated: (HIVE-693) Add a AWS S3 log format deserializer
[ https://issues.apache.org/jira/browse/HIVE-693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HIVE-693:
Attachment: HIVE-693.1.patch

HIVE-693.1.patch: Andraz, I've moved all data and code to contrib. Can you review and comment?

Please note that if you upgrade from Hive 0.3 to Hive 0.4 to use this new SerDe, you will need to go through the metastore tables manually and replace the name of the SerDe class, since it has changed to org.apache.hadoop.hive.contrib.serde2.s3.S3LogDeserializer.
[jira] Updated: (HIVE-693) Add a AWS S3 log format deserializer
[ https://issues.apache.org/jira/browse/HIVE-693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andraz Tori updated HIVE-693:
Attachment: s3.log

I forgot to add s3.log for the previous patch. Is there any chance of getting this into 0.4?
[jira] Updated: (HIVE-693) Add a AWS S3 log format deserializer
[ https://issues.apache.org/jira/browse/HIVE-693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andraz Tori updated HIVE-693:
Attachment: s3deserializer.diff

The patch.
[jira] Updated: (HIVE-693) Add a AWS S3 log format deserializer
[ https://issues.apache.org/jira/browse/HIVE-693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andraz Tori updated HIVE-693:
Fix Version/s: 0.3.0
Status: Patch Available (was: Open)

Here's a patch with expected inputs and outputs so that unit tests can be created. I am still new to the Hive source tree, so someone else should take care of moving it to contrib.
[jira] Updated: (HIVE-693) Add a AWS S3 log format deserializer
[ https://issues.apache.org/jira/browse/HIVE-693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andraz Tori updated HIVE-693:
Attachment: S3LogStruct.java, S3LogDeserializer.java

Deserializer implementation. While it works, the code is by no means release-ready; it has to be cleaned up first. Still, it is better than nothing as a starting point for anyone looking to integrate an S3 log deserializer. I was quite amazed to find that no one else had needed or published this.
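To make concrete what such a deserializer does for each row, here is a minimal regex-based sketch that parses one S3 server access log line into a flat record. It is not the attached S3LogDeserializer.java or S3LogStruct.java: the class and field names (S3LogRecord, S3LogLineParser) are hypothetical, it assumes the S3 access log field order from bucket owner through user agent and ignores any trailing fields, and it leaves out the Hive-specific parts (implementing Hive's Deserializer interface and providing an ObjectInspector that describes the row).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical flat record for one S3 access-log row; field names are assumptions. */
final class S3LogRecord {
    String bucketOwner, bucket, time, remoteIp, requester, requestId;
    String operation, key, requestUri, errorCode, referrer, userAgent;
    int httpStatus;
    // Numeric fields are -1 when the log contains "-".
    long bytesSent, objectSize, totalTime, turnAroundTime;
}

/** Hypothetical regex-based parser; not the attached S3LogDeserializer. */
final class S3LogLineParser {
    // One capture group per field, in S3 access log order. The bracketed time
    // and the quoted request URI, referrer and user agent are captured without
    // their delimiters. Trailing fields, if any, are matched by ".*" and ignored.
    // A user agent containing a '"' would be cut short (assumption: none do).
    private static final Pattern LINE = Pattern.compile(
        "(\\S+) (\\S+) \\[(.*?)\\] (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) \"(.*?)\" "
        + "(\\S+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) \"(.*?)\" \"(.*?)\".*");

    private static long num(String s) {
        return "-".equals(s) ? -1L : Long.parseLong(s);
    }

    /** Returns the parsed record, or null if the line is malformed. */
    static S3LogRecord parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        try {
            S3LogRecord r = new S3LogRecord();
            r.bucketOwner = m.group(1);
            r.bucket = m.group(2);
            r.time = m.group(3);
            r.remoteIp = m.group(4);
            r.requester = m.group(5);
            r.requestId = m.group(6);
            r.operation = m.group(7);
            r.key = m.group(8);
            r.requestUri = m.group(9);
            r.httpStatus = (int) num(m.group(10));
            r.errorCode = m.group(11);
            r.bytesSent = num(m.group(12));
            r.objectSize = num(m.group(13));
            r.totalTime = num(m.group(14));
            r.turnAroundTime = num(m.group(15));
            r.referrer = m.group(16);
            r.userAgent = m.group(17);
            return r;
        } catch (NumberFormatException e) {
            return null; // a numeric column held something other than digits or "-"
        }
    }

    public static void main(String[] args) {
        // Made-up example row in S3 server access log layout.
        S3LogRecord r = parse("OWNER mybucket [06/Feb/2009:00:00:38 +0000] 192.0.2.3 REQUESTER REQID "
            + "REST.GET.OBJECT photos/a.jpg \"GET /mybucket/photos/a.jpg HTTP/1.1\" 200 - "
            + "2662992 3462992 70 10 \"-\" \"S3Console/0.4\"");
        System.out.println(r.operation + " " + r.httpStatus + " " + r.bytesSent);
    }
}
```

Compiling one Pattern up front and matching each row against it keeps the code short, which is why it makes a reasonable starting point; the hand-written splitting Zheng suggests in his later comment trades that simplicity for speed.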