[jira] Commented: (HIVE-1463) hive output file names are unnecessarily large

Joydeep Sen Sarma (JIRA) Fri, 16 Jul 2010 14:09:17 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889328#action_12889328
 ]


Joydeep Sen Sarma commented on HIVE-1463:
-----------------------------------------

Thanks for the review.

1) I checked this out. Hadoop from 0.17 onwards always uses 
<prefix>_<jtid>_[mr]_<taskid>_<attemptid>. In 0.17 the prefix was 'task'; in 
0.18 and later it was changed to 'attempt'. jt = 'local' in local mode. 
Otherwise there is no difference between local and regular jobs.

   I think 0.15 was different (that is where Hive was initially started), 
which is why there were comments to the effect that jobs have _map_ in local 
mode.

   One thing I can do is add tests under shim to make sure of this. If I am 
unable to add a test, I will at least confirm the naming under 0.17.

2) Good point! Dropping the leading prefix is not necessary, since repeated 
strings are now factored out by HDFS (it uses String.intern()). I can take 
that part out.
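
To make the naming in point 1 concrete, here is a rough sketch (not the 
code in the patch) of how such a name could be split and shortened. The 
function name short_name and the regex are hypothetical. The pattern assumes 
a job number field between the jobtracker id and the m/r marker, as in the 
example name from the issue description, and it accepts both the 'task' and 
'attempt' prefixes as well as jt = 'local'.

```python
import re

# Hypothetical pattern for names such as
# attempt_201006221843_431854_r_000000_0.
# Assumptions: the prefix is 'task' or 'attempt', the jobtracker id
# contains no underscore (e.g. a timestamp or 'local'), and a numeric
# job number follows it.
TASK_NAME = re.compile(
    r"^(?P<prefix>task|attempt)_(?P<jtid>[^_]+)_(?P<job>\d+)"
    r"_(?P<kind>[mr])_(?P<task>\d+)_(?P<attempt>\d+)$"
)


def short_name(name):
    """Return '<taskid>_<attemptid>' if name matches, else name unchanged."""
    m = TASK_NAME.match(name)
    if m is None:
        # Unknown layout: leave the name alone rather than guess.
        return name
    return m.group("task") + "_" + m.group("attempt")


if __name__ == "__main__":
    print(short_name("attempt_201006221843_431854_r_000000_0"))
```

Names that do not match are returned unchanged, so an unexpected layout from 
an older Hadoop release would not be mangled.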

Will upload a modified diff.


> hive output file names are unnecessarily large
> ----------------------------------------------
>
>                 Key: HIVE-1463
>                 URL: https://issues.apache.org/jira/browse/HIVE-1463
>             Project: Hadoop Hive
>          Issue Type: Improvement
>            Reporter: Joydeep Sen Sarma
>         Attachments: hive-1463.1.patch
>
>
> Hive's output files are named like this:
> attempt_201006221843_431854_r_000000_0
> Out of all of this goop, only one character ('0') would have sufficed. We 
> should fix this. This would help environments with namenode memory 
> constraints.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
