[ https://issues.apache.org/jira/browse/HIVE-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889328#action_12889328 ]
Joydeep Sen Sarma commented on HIVE-1463:
-----------------------------------------
thanks for the review.
1) i checked this out. hadoop from 0.17 onwards always names task attempts as
<prefix>_<jtid>_[mr]_<taskid>_<attemptid>. in 0.17 the prefix was 'task'; in 0.18
and later it was changed to 'attempt'. jt = 'local' for local mode. otherwise
there's no difference between local and regular jobs.
i think 0.15 was different (that's where hive was initially started) - that's
why there were comments to the effect that jobs have _map_ in local mode.
one thing i can do is add tests under shim to make sure of this. if i am
unable to add a test - i will at least confirm the naming under 0.17.
2) good point! dropping the leading prefix is not necessary (since repeated
strings are factored out by hdfs now - it uses String.intern()). i can take
that part out.
will upload modified diff.
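
for reference, a minimal sketch of how the naming scheme in (1) could be checked and shortened. this is not the code in the patch: the class TaskAttemptName, its parse() and shortName() methods, and the choice of <taskid>_<attemptid> as the short form are all hypothetical, and the regex assumes the <jtid> part may itself contain underscores (as in the example in the issue description).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only (not part of Hive or Hadoop).
final class TaskAttemptName {
    // <prefix>_<jtid>_[mr]_<taskid>_<attemptid>
    // prefix is 'task' or 'attempt'; the jtid part may contain underscores,
    // so it is matched greedily and the trailing fields anchor the match.
    private static final Pattern NAME =
        Pattern.compile("^(task|attempt)_(.+)_([mr])_(\\d+)_(\\d+)$");

    final String prefix;
    final String jtId;
    final char type;        // 'm' for map, 'r' for reduce
    final String taskId;
    final String attemptId;

    private TaskAttemptName(String prefix, String jtId, char type,
                            String taskId, String attemptId) {
        this.prefix = prefix;
        this.jtId = jtId;
        this.type = type;
        this.taskId = taskId;
        this.attemptId = attemptId;
    }

    // Splits a name into its fields; rejects anything not matching the scheme.
    static TaskAttemptName parse(String name) {
        Matcher m = NAME.matcher(name);
        if (!m.matches()) {
            throw new IllegalArgumentException("unexpected task attempt name: " + name);
        }
        return new TaskAttemptName(m.group(1), m.group(2), m.group(3).charAt(0),
                                   m.group(4), m.group(5));
    }

    // One possible short form (an assumption): keep only task id and attempt id.
    String shortName() {
        return taskId + "_" + attemptId;
    }

    public static void main(String[] args) {
        TaskAttemptName n = parse("attempt_201006221843_431854_r_000000_0");
        System.out.println(n.shortName()); // prints 000000_0
    }
}
```

a test under shim could run names like these through parse() and fail if the hadoop version in use produces something that does not match.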
> hive output file names are unnecessarily large
> ----------------------------------------------
>
> Key: HIVE-1463
> URL: https://issues.apache.org/jira/browse/HIVE-1463
> Project: Hadoop Hive
> Issue Type: Improvement
> Reporter: Joydeep Sen Sarma
> Attachments: hive-1463.1.patch
>
>
> Hive's output files are named like this:
> attempt_201006221843_431854_r_000000_0
> out of all of this goop - only one character '0' would have sufficed. we
> should fix this. This would help environments with namenode memory
> constraints.