[
https://issues.apache.org/jira/browse/HIVE-5102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13759624#comment-13759624
]
Phabricator commented on HIVE-5102:
-----------------------------------
ashutoshc has requested changes to the revision "HIVE-5102 [jira] ORC getSplits
should create splits based the stripes".
Looks good overall. Some minor comments.
INLINE COMMENTS
ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java:276 I didn't get
why we need to & 0xff here.
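For context on the & 0xff question: Java's byte type is signed, so widening a byte whose high bit is set produces a negative int unless it is masked. The sketch below assumes the line in question widens a single byte read from a buffer; that is an assumption, and the class and method names are hypothetical rather than taken from ReaderImpl.

```java
import java.nio.ByteBuffer;

final class UnsignedByteSketch {
  // Hypothetical helper: reads the last byte of the buffer as an unsigned value (0..255).
  static int lastByteUnsigned(ByteBuffer buffer) {
    return buffer.get(buffer.limit() - 1) & 0xff;
  }

  // The same read without the mask, which sign-extends the byte.
  static int lastByteSigned(ByteBuffer buffer) {
    return buffer.get(buffer.limit() - 1);
  }

  public static void main(String[] args) {
    ByteBuffer buffer = ByteBuffer.wrap(new byte[] {1, 2, (byte) 200});
    System.out.println(lastByteSigned(buffer));   // prints -56
    System.out.println(lastByteUnsigned(buffer)); // prints 200
  }
}
```

If the byte encodes a length or an offset, the unmasked form would yield a negative value for anything above 127.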
ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java:247 I think
it would be good to throw an exception if dirs.isEmpty(), because we spent a day
debugging a problem where Tez updated its code but didn't have this config
variable in its conf.
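A minimal sketch of the fail-fast check suggested for dirs.isEmpty(), assuming the directories arrive as a comma-separated configuration value. The helper name, the plain String input, and the exception type are hypothetical; the real code reads from the job configuration and may prefer a different exception.

```java
import java.util.ArrayList;
import java.util.List;

final class InputDirsSketch {
  // Hypothetical helper: parses a comma-separated directory list and rejects
  // an empty result instead of silently producing no splits.
  static List<String> parseInputDirs(String configured) {
    List<String> dirs = new ArrayList<>();
    if (configured != null) {
      for (String dir : configured.split(",")) {
        String trimmed = dir.trim();
        if (!trimmed.isEmpty()) {
          dirs.add(trimmed);
        }
      }
    }
    if (dirs.isEmpty()) {
      throw new IllegalArgumentException("No input directories are configured");
    }
    return dirs;
  }

  public static void main(String[] args) {
    System.out.println(parseInputDirs("/warehouse/a, /warehouse/b"));
    try {
      parseInputDirs("");
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Failing at this point names the missing setting directly, rather than leaving the caller with a job that reads nothing.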
ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java:261 There
are a few places in the Hive codebase that create thread pools. We should unify
them, but that's probably a topic for another JIRA.
ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java:69 It would
be good to add a comment for this field.
ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java:262 It would
be good to name this field splits. Also, should it be List<FileSplit> instead?
ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java:288 Why
should we allow this? Isn't the caller passing in the wrong argument in those
cases?
ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java:329 This
seems to introduce the possibility of a process hang. Is there a better way of
doing this?
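The code at line 329 is not shown in this message. Assuming the concern is an unbounded wait on worker threads, one possible alternative is to bound the wait with a timeout. The sketch below uses only java.util.concurrent; the class name, method name, and parameters are hypothetical and are not the patch's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

final class BoundedWaitSketch {
  // Runs all tasks on a fixed pool and waits at most timeoutMillis for the batch.
  static <T> List<T> runAll(List<Callable<T>> tasks, int threads, long timeoutMillis) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      // invokeAll with a timeout cancels any task that has not finished in time.
      List<Future<T>> futures = pool.invokeAll(tasks, timeoutMillis, TimeUnit.MILLISECONDS);
      List<T> results = new ArrayList<>();
      for (Future<T> future : futures) {
        // get() throws CancellationException for a task cancelled by the timeout.
        results.add(future.get());
      }
      return results;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while waiting for tasks", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("A task failed", e.getCause());
    } finally {
      // Always release the worker threads, even on failure or timeout.
      pool.shutdownNow();
    }
  }
}
```

With this shape a stuck task surfaces as an exception after the timeout instead of blocking the caller indefinitely.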
ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java:523 Add
@Override annotation?
ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java:285-288
This change shouldn't have made this test fail. Any reason for deleting it?
REVISION DETAIL
https://reviews.facebook.net/D12579
BRANCH
h-5102
ARCANIST PROJECT
hive
To: JIRA, ashutoshc, omalley
> ORC getSplits should create splits based the stripes
> -----------------------------------------------------
>
> Key: HIVE-5102
> URL: https://issues.apache.org/jira/browse/HIVE-5102
> Project: Hive
> Issue Type: Bug
> Components: File Formats
> Reporter: Owen O'Malley
> Assignee: Owen O'Malley
> Attachments: HIVE-5102.D12579.1.patch, HIVE-5102.D12579.2.patch
>
>
> Currently ORC inherits getSplits from FileFormat, which basically makes one
> split per HDFS block. This can create too little parallelism; it would be
> better to have getSplits look at the file footer and create splits based on
> the stripes.
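The idea in the description can be sketched as follows. This is a simplified illustration, not the algorithm in the attached patches: Stripe and Split are hypothetical stand-ins for the footer's stripe entries and for file splits, and the grouping rule (merge consecutive stripes until a minimum size is reached) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

final class StripeSplitSketch {
  // Hypothetical stand-in for a stripe entry read from the file footer.
  static final class Stripe {
    final long offset;
    final long length;
    Stripe(long offset, long length) {
      this.offset = offset;
      this.length = length;
    }
  }

  // Hypothetical stand-in for a split: a byte range of the file.
  static final class Split {
    final long start;
    final long length;
    Split(long start, long length) {
      this.start = start;
      this.length = length;
    }
  }

  // Groups consecutive stripes into splits of at least minSplitBytes, so every
  // split starts and ends on a stripe boundary. A trailing group may be smaller.
  static List<Split> splitsFor(List<Stripe> stripes, long minSplitBytes) {
    List<Split> splits = new ArrayList<>();
    long start = -1;
    long length = 0;
    for (Stripe stripe : stripes) {
      if (start < 0) {
        start = stripe.offset;
      }
      length = stripe.offset + stripe.length - start;
      if (length >= minSplitBytes) {
        splits.add(new Split(start, length));
        start = -1;
        length = 0;
      }
    }
    if (start >= 0) {
      splits.add(new Split(start, length));
    }
    return splits;
  }
}
```

Because split boundaries follow stripe boundaries instead of HDFS block boundaries, the number of splits is tied to the number of stripes in the file.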