[jira] [Work logged] (HIVE-23158) Optimize S3A recordReader policy for Random IO formats

ASF GitHub Bot (Jira) Mon, 13 Apr 2020 10:37:05 -0700


     [ 
https://issues.apache.org/jira/browse/HIVE-23158?focusedWorklogId=421462&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421462
 ]


ASF GitHub Bot logged work on HIVE-23158:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 13/Apr/20 17:36
            Start Date: 13/Apr/20 17:36
    Worklog Time Spent: 10m 
      Work Description: pgaref commented on pull request #972: HIVE-23158 initial patch
URL: https://github.com/apache/hive/pull/972
 
 
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 421462)
    Time Spent: 20m  (was: 10m)

> Optimize S3A recordReader policy for Random IO formats
> ------------------------------------------------------
>
>                 Key: HIVE-23158
>                 URL: https://issues.apache.org/jira/browse/HIVE-23158
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Panagiotis Garefalakis
>            Assignee: Panagiotis Garefalakis
>            Priority: Trivial
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>         Attachments: HIVE-23158.01.patch
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> The S3A filesystem client (inherited from Hadoop) supports the notion of input 
> policies.
>  These policies tune the behaviour of the HTTP requests used to read 
> different file types such as TEXT or ORC.
> For formats such as ORC and Parquet that perform many seek operations, there 
> is an optimized RANDOM mode that reads files only partially instead of fully 
> (the default).
> I suggest adding some extra logic to HiveInputFormat to make 
> sure we optimize RecordReader requests for random IO when data is stored on 
> S3A in formats such as ORC or Parquet.
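
The proposal above could be sketched roughly as follows. This is a minimal illustration, not the code in the attached patch: the class name `S3aInputPolicySketch`, the helpers `wantsRandomPolicy` and `tuneForRead`, and the format list are hypothetical, and `java.util.Properties` stands in for a Hadoop `Configuration` so that the example needs no Hadoop dependency. The only name taken from Hadoop is the S3A option `fs.s3a.experimental.input.fadvise`, which accepts values such as `sequential`, `normal` and `random`.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Properties;

final class S3aInputPolicySketch {
    // Hadoop S3A option that selects the input policy.
    static final String FADVISE_KEY = "fs.s3a.experimental.input.fadvise";
    static final String RANDOM_POLICY = "random";

    // Hypothetical list of seek-heavy formats, for illustration only.
    static final List<String> RANDOM_IO_FORMATS = Arrays.asList("orc", "parquet");

    // True when the path is on S3A and the format is in the seek-heavy list.
    static boolean wantsRandomPolicy(URI path, String format) {
        String scheme = path.getScheme();
        return scheme != null
                && scheme.equalsIgnoreCase("s3a")
                && format != null
                && RANDOM_IO_FORMATS.contains(format.toLowerCase(Locale.ROOT));
    }

    // Returns a copy of conf, with the random policy set only when it applies.
    // Properties is an assumed stand-in for a Hadoop Configuration or JobConf.
    static Properties tuneForRead(Properties conf, URI path, String format) {
        Properties tuned = new Properties();
        tuned.putAll(conf);
        if (wantsRandomPolicy(path, format)) {
            tuned.setProperty(FADVISE_KEY, RANDOM_POLICY);
        }
        return tuned;
    }
}
```

Working on a copy leaves the caller's settings unchanged for reads that are not on S3A or that use sequential formats such as text. For example, `tuneForRead(conf, URI.create("s3a://bucket/t/part-0.orc"), "ORC")` sets the policy, while the same call with an `hdfs://` path or a `"text"` format returns the settings unmodified.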



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
