[jira] Commented: (HIVE-895) Add SerDe for Avro serialized data
[ https://issues.apache.org/jira/browse/HIVE-895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889558#action_12889558 ]

Alex Rovner commented on HIVE-895:
----------------------------------

Can someone please explain to me how this SerDe would work? Specifically, how would it deserialize the data? From what I understand, an Avro file has a header that defines the data stored in the file. In order to deserialize the data you need to read the header, which is a challenge with Hive's Deserializer interface because the initialize() method does not know anything about the input file. (Note: there is a hack that can get you the file by reading the map.input Hadoop property; this hack, however, is not good enough in Hive because someone might be using the CLI to run a query that does not trigger a map-reduce job.)

Does anyone know a good solution to this issue? I am actually trying to implement a different file format, but the idea of our format is similar to Avro's: each file has a header that contains a "schema".

Thanks

> Add SerDe for Avro serialized data
> ----------------------------------
>
>                 Key: HIVE-895
>                 URL: https://issues.apache.org/jira/browse/HIVE-895
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Serializers/Deserializers
>            Reporter: Jeff Hammerbacher
>
> As Avro continues to mature, having a SerDe to allow HiveQL queries over Avro data seems like a solid win.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
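[Editor's aside: to make the header-vs-initialize() mismatch concrete, here is a minimal, hypothetical Java sketch. The file format, class, and method names below are invented for illustration; they are not Hive's or Avro's actual APIs. The point is that when the schema lives in the data file itself, a deserializer that is only handed table properties at initialize() time has no way to know it - it must be given the file.]

```java
import java.io.*;

// Hypothetical illustration of the problem described above: the "schema"
// is embedded in each data file's header, so deserialization requires
// access to the file itself, not just table properties.
public class HeaderSchemaExample {

    // Write a file whose first line is a schema header, followed by rows.
    public static void writeFile(File f, String schema, String... rows) throws IOException {
        try (PrintWriter w = new PrintWriter(new FileWriter(f))) {
            w.println("SCHEMA:" + schema);
            for (String r : rows) w.println(r);
        }
    }

    // Unlike Hive's Deserializer.initialize(), this helper is handed the
    // file, so it can read the embedded schema before touching any rows.
    public static String readSchema(File f) throws IOException {
        try (BufferedReader r = new BufferedReader(new FileReader(f))) {
            String first = r.readLine();
            if (first == null || !first.startsWith("SCHEMA:"))
                throw new IOException("missing schema header");
            return first.substring("SCHEMA:".length());
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("demo", ".dat");
        f.deleteOnExit();
        writeFile(f, "name:string,age:int", "alice,30", "bob,25");
        System.out.println(readSchema(f)); // prints name:string,age:int
    }
}
```

The map.input hack mentioned above works around this by recovering the file path from the job configuration, but as noted, no such configuration exists when the CLI answers a query without launching a map-reduce job.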
[jira] Commented: (HIVE-1470) percentile_approx() fails with more than 1 reducer
[ https://issues.apache.org/jira/browse/HIVE-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889551#action_12889551 ]

John Sichi commented on HIVE-1470:
----------------------------------

Oh... if it requires more than one reducer, we'll need minimr mode in order to exercise it, meaning I need to finish HIVE-117.

> percentile_approx() fails with more than 1 reducer
> --------------------------------------------------
>
>                 Key: HIVE-1470
>                 URL: https://issues.apache.org/jira/browse/HIVE-1470
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.6.0
>            Reporter: Mayank Lahiri
>            Assignee: Mayank Lahiri
>             Fix For: 0.7.0
>
>         Attachments: HIVE-1470.1.patch
>
> The larger issue is that a UDAF that has variable return types needs two inner Evaluator classes. This patch fixes a NullPointerException bug that is only encountered when partial aggregations are invoked.
[jira] Commented: (HIVE-1470) percentile_approx() fails with more than 1 reducer
[ https://issues.apache.org/jira/browse/HIVE-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889550#action_12889550 ]

John Sichi commented on HIVE-1470:
----------------------------------

Unit test?

> percentile_approx() fails with more than 1 reducer
> --------------------------------------------------
>
>                 Key: HIVE-1470
>                 URL: https://issues.apache.org/jira/browse/HIVE-1470
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.6.0
>            Reporter: Mayank Lahiri
>            Assignee: Mayank Lahiri
>             Fix For: 0.7.0
>
>         Attachments: HIVE-1470.1.patch
>
> The larger issue is that a UDAF that has variable return types needs two inner Evaluator classes. This patch fixes a NullPointerException bug that is only encountered when partial aggregations are invoked.
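[Editor's aside: the NullPointerException described in the issue surfaces only when partial aggregations run, i.e. when more than one reducer (or a combiner) merges intermediate aggregation states - which is also why a minimr-mode test is needed to exercise it. The self-contained Java sketch below names the general pattern; the classes and the null guard are illustrative and simplified, not the actual HIVE-1470 patch.]

```java
import java.util.Arrays;

// A simplified, hypothetical sketch of the partial-aggregation pitfall:
// with multiple reducers, merge() can be handed a null partial (e.g. from
// a task that saw no rows), and dereferencing it throws an NPE.
public class PartialMergeExample {

    // Intermediate state for an approximate-percentile-style aggregate.
    static class Partial {
        double sum; long count;
        Partial(double s, long c) { sum = s; count = c; }
    }

    // Buggy merge: assumes the incoming partial is never null.
    public static Partial mergeUnsafe(Partial acc, Partial partial) {
        acc.sum += partial.sum;       // NPE when partial == null
        acc.count += partial.count;
        return acc;
    }

    // Fixed merge: skip empty partials instead of dereferencing them.
    public static Partial mergeSafe(Partial acc, Partial partial) {
        if (partial == null) return acc;  // the guard that avoids the NPE
        acc.sum += partial.sum;
        acc.count += partial.count;
        return acc;
    }

    public static void main(String[] args) {
        Partial acc = new Partial(0, 0);
        // One "reducer input" per task; the middle task produced no rows.
        for (Partial p : Arrays.asList(new Partial(10, 2), null, new Partial(5, 1))) {
            acc = mergeSafe(acc, p);
        }
        System.out.println(acc.sum + " " + acc.count); // prints 15.0 3
    }
}
```

With a single reducer the null partial never appears, which is why the bug hides until more than one reducer is used.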
[jira] Commented: (HIVE-1468) intermediate data produced for select queries ignores hive.exec.compress.intermediate
[ https://issues.apache.org/jira/browse/HIVE-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889528#action_12889528 ]

Zheng Shao commented on HIVE-1468:
----------------------------------

"select queries" means "SELECT" without "INSERT", correct? I agree that we should treat these queries differently; specifically, no compression (or maybe use LZO to save bandwidth - clients can be in other data centers) would be a big win.

> intermediate data produced for select queries ignores hive.exec.compress.intermediate
> -------------------------------------------------------------------------------------
>
>                 Key: HIVE-1468
>                 URL: https://issues.apache.org/jira/browse/HIVE-1468
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Joydeep Sen Sarma
>
> set hive.exec.compress.intermediate=false;
>
> explain extended select xxx from yyy;
> ...
> File Output Operator
>   compressed: true
>   GlobalTableId: 0
>
> It looks like only the intermediate locations identified while splitting MR tasks follow this directive. This should be fixed because it forces clients to always decompress output data (even if the config setting is altered).
[jira] Updated: (HIVE-1463) hive output file names are unnecessarily large
[ https://issues.apache.org/jira/browse/HIVE-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joydeep Sen Sarma updated HIVE-1463:
------------------------------------

    Attachment: 1463.3.patch

This fixes all the issues:

1) The regex is expanded to cover both 17 and later releases. In 17, tasks are indeed named _map_ and _reduce_ in local mode.

2) No change to strip leading zeros in the taskid. The ordering of files will not be changed by this diff. The filename component being removed is constant per map-reduce job (jobid + jobtracker_id etc.).

3) A one-line env setting in the build file allows us to control test execution logging from hive-log4j.

This passes all the tests. The problem with load_dyn_part2.q was due to incorrect regex application: the taskid matching has to be applied to the last component of the path name only. As an aside - replaceTaskIdFromFilename would also be easier to understand and simpler if it simply did this (cut the last component, replace the taskid, concat back and return).

> hive output file names are unnecessarily large
> ----------------------------------------------
>
>                 Key: HIVE-1463
>                 URL: https://issues.apache.org/jira/browse/HIVE-1463
>             Project: Hadoop Hive
>          Issue Type: Improvement
>            Reporter: Joydeep Sen Sarma
>         Attachments: 1463.2.patch, 1463.3.patch, hive-1463.1.patch
>
> Hive's output files are named like this:
>
> attempt_201006221843_431854_r_00_0
>
> Out of all of this goop, only one character '0' would have sufficed. We should fix this. This would help environments with namenode memory constraints.
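[Editor's aside: the restructuring suggested in the comment - cut the last path component, rewrite it, concatenate back - could be sketched as follows. The class name, method name, and regex below are illustrative guesses, not Hive's actual replaceTaskIdFromFilename implementation.]

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: operate on the last path component only, shorten
// attempt-style names down to the task id, and leave everything else alone.
public class TaskIdRename {

    // Matches names such as attempt_201006221843_431854_r_00_0 and
    // captures just the task id ("00" here).
    private static final Pattern ATTEMPT =
        Pattern.compile("^attempt_\\d+_\\d+_[rm]_(\\d+)_\\d+$");

    public static String shortenOutputName(String path) {
        // 1) cut the last component
        int slash = path.lastIndexOf('/');
        String dir = slash >= 0 ? path.substring(0, slash + 1) : "";
        String name = slash >= 0 ? path.substring(slash + 1) : path;
        // 2) rewrite only that component; directories are never touched,
        //    which avoids the load_dyn_part2.q-style misapplication
        Matcher m = ATTEMPT.matcher(name);
        // 3) concat back and return; non-matching names pass through unchanged
        return m.matches() ? dir + m.group(1) : path;
    }

    public static void main(String[] args) {
        System.out.println(shortenOutputName("/out/attempt_201006221843_431854_r_00_0"));
        // prints /out/00
    }
}
```

Keeping the taskid digits intact (rather than stripping leading zeros) preserves the file ordering, as point 2 of the update requires.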