[
https://issues.apache.org/jira/browse/TAJO-931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14104100#comment-14104100
]
ASF GitHub Bot commented on TAJO-931:
-------------------------------------
Github user hyunsik commented on the pull request:
https://github.com/apache/tajo/pull/119#issuecomment-52804586
I've rebased, addressed the review comments, and fixed some potential bugs.
Please review this.
> Output file can be punctuated depending on the file size.
> ---------------------------------------------------------
>
> Key: TAJO-931
> URL: https://issues.apache.org/jira/browse/TAJO-931
> Project: Tajo
> Issue Type: Improvement
> Components: physical operator
> Reporter: Hyunsik Choi
> Assignee: Hyunsik Choi
> Fix For: 0.9.0
>
>
> Some file formats (e.g., Parquet) are not splittable. When a file in such a
> format is very large, it spans multiple HDFS blocks. This causes remote HDFS
> access and limits the degree of parallelism, resulting in significant
> performance degradation.
> We can solve this problem if StoreTableExec or
> {Col|SortBased}PartitionStoreExec can punctuate the final output, that is,
> roll over to a new output file, according to the written size.
> In addition, we need a session variable that sets the per-file size of final
> output files, so TAJO-928 blocks this issue.
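
For illustration only, the size-based rollover described in the issue could be
sketched roughly as below. This is not the Tajo implementation: the class
RollingFileWriter, its methods, and the part-NNNNN file naming are hypothetical,
and the constructor argument maxBytesPerFile merely stands in for the proposed
session variable. The sketch assumes records must stay whole, so it only rolls
over at record boundaries, and it writes to the local file system rather than
HDFS.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: starts a new output file once a size threshold would be exceeded. */
class RollingFileWriter implements AutoCloseable {
    private final Path dir;
    private final String prefix;
    private final long maxBytesPerFile; // stand-in for the per-file size session variable
    private final List<Path> files = new ArrayList<>();
    private OutputStream out;
    private long written; // bytes written to the current file

    RollingFileWriter(Path dir, String prefix, long maxBytesPerFile) {
        if (maxBytesPerFile <= 0) {
            throw new IllegalArgumentException("maxBytesPerFile must be positive");
        }
        this.dir = dir;
        this.prefix = prefix;
        this.maxBytesPerFile = maxBytesPerFile;
    }

    /** Writes one whole record, rolling over first if it would overflow the current file. */
    void writeRecord(byte[] record) throws IOException {
        // A record larger than the limit still gets a file of its own (written == 0 case).
        if (out == null || (written > 0 && written + record.length > maxBytesPerFile)) {
            roll();
        }
        out.write(record);
        written += record.length;
    }

    /** Closes the current file, if any, and opens the next numbered one. */
    private void roll() throws IOException {
        if (out != null) {
            out.close();
        }
        Path next = dir.resolve(String.format("%s-%05d", prefix, files.size()));
        out = Files.newOutputStream(next);
        files.add(next);
        written = 0;
    }

    /** Returns the output files created so far, in creation order. */
    List<Path> files() {
        return files;
    }

    @Override
    public void close() throws IOException {
        if (out != null) {
            out.close();
            out = null;
        }
    }
}
```

Under these assumptions, each output file stays at or below the limit unless a
single record exceeds it. If the limit is chosen no larger than the HDFS block
size, each file can fit within one block.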