[ https://issues.apache.org/jira/browse/PIG-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy updated PIG-5432:
------------------------------------
    Hadoop Flags: Reviewed
      Resolution: Fixed
          Status: Resolved  (was: Patch Available)

+1. Committed to branch-0.18 and trunk. Thanks for the contribution [~jtolar]

> OrcStorage fails to detect schema in some cases
> -----------------------------------------------
>
>                 Key: PIG-5432
>                 URL: https://issues.apache.org/jira/browse/PIG-5432
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Jacob Tolar
>            Assignee: Jacob Tolar
>            Priority: Minor
>             Fix For: 0.18.0
>
>         Attachments: PIG-5432.v01.patch
>
>
> OrcStorage needs to detect the schema of its input data paths. If some of those 
> paths contain no ORC files (for example, only a _SUCCESS marker), schema 
> detection fails. 
> For example: 
> {code}
> A = LOAD '/path/to/20230101,/path/to/20230102' USING OrcStorage();
> {code}
> If {{/path/to/20230101}} contains only a _SUCCESS marker and {{20230102}} 
> contains data, OrcStorage fails to detect the schema and Pig exits with a 
> confusing/unhelpful error, something like "Cannot find any ORC files from 
> <locations>. Probably multiple load/store statements in script."
> The code is meant to search recursively through all input paths for an ORC 
> file (via Utils.depthFirstSearchForFile), but the search is implemented 
> incorrectly and returns early in this scenario.
> See: 
> https://github.com/apache/pig/blob/c0d75ba930f9aa5c6454d0264a96f82b45279202/src/org/apache/pig/builtin/OrcStorage.java#L389-L408
> https://github.com/apache/pig/blob/59ec4a326079c9f937a052194405415b1e3a2b06/src/org/apache/pig/impl/util/Utils.java#L629-L667
> I'll attach a patch.
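The intended search behaviour can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the committed patch: it uses java.nio.file on a local filesystem as a stand-in for Hadoop's FileSystem API, the class and method names (OrcSchemaSearchSketch, findFirstDataFile) are hypothetical, and it assumes any visible file (name not starting with '_' or '.') counts as data, so a _SUCCESS marker is skipped. The point it shows is that an input location without data files must not end the search; the loop falls through to the next input path and gives up only after every location has been checked.

{code:java}
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch; not Pig's Utils.depthFirstSearchForFile.
public class OrcSchemaSearchSketch {

    // Assumption: a data file is a regular file whose name does not start
    // with '_' or '.', so markers such as _SUCCESS are ignored.
    static boolean isVisibleFile(Path p) {
        String name = p.getFileName().toString();
        return Files.isRegularFile(p) && !name.startsWith("_") && !name.startsWith(".");
    }

    // Checks every input location in order and returns the first data file.
    // A location containing only _SUCCESS does not stop the search.
    static Optional<Path> findFirstDataFile(List<Path> locations) throws IOException {
        for (Path location : locations) {
            Optional<Path> found = depthFirst(location);
            if (found.isPresent()) {
                return found;
            }
            // Nothing found here: continue with the next location instead of returning early.
        }
        return Optional.empty();
    }

    // Recursive depth-first search below a single path.
    private static Optional<Path> depthFirst(Path path) throws IOException {
        if (!Files.isDirectory(path)) {
            return isVisibleFile(path) ? Optional.of(path) : Optional.empty();
        }
        List<Path> children = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(path)) {
            for (Path child : stream) {
                children.add(child);
            }
        }
        children.sort(null); // natural Path ordering, for a deterministic result
        for (Path child : children) {
            String name = child.getFileName().toString();
            if (name.startsWith("_") || name.startsWith(".")) {
                continue; // skip hidden entries such as _SUCCESS or _temporary
            }
            Optional<Path> found = depthFirst(child);
            if (found.isPresent()) {
                return found;
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        // Recreate the scenario from the report in a temporary directory.
        Path root = Files.createTempDirectory("pig5432");
        Path day1 = Files.createDirectories(root.resolve("20230101"));
        Path day2 = Files.createDirectories(root.resolve("20230102"));
        Files.createFile(day1.resolve("_SUCCESS"));
        Files.createFile(day2.resolve("_SUCCESS"));
        Files.createFile(day2.resolve("part-00000.orc"));

        Optional<Path> found = findFirstDataFile(List.of(day1, day2));
        System.out.println(found.map(p -> root.relativize(p).toString()).orElse("none"));
    }
}
{code}

On a Unix-like system, main prints 20230102/part-00000.orc: the marker-only 20230101 directory yields nothing, and the search moves on to 20230102 rather than reporting that no ORC files exist.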



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
