[ 
https://issues.apache.org/jira/browse/HADOOP-17945?focusedWorklogId=660960&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-660960
 ]

ASF GitHub Bot logged work on HADOOP-17945:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 06/Oct/21 14:45
            Start Date: 06/Oct/21 14:45
    Worklog Time Spent: 10m 
      Work Description: steveloughran commented on a change in pull request #3501:
URL: https://github.com/apache/hadoop/pull/3501#discussion_r723354042



##########
File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/JsonSerialization.java
##########
@@ -229,30 +235,44 @@ public T fromInstance(T instance) throws IOException {
 
   /**
    * Load from a Hadoop filesystem.
-   * There's a check for data availability after the file is open, by
-   * raising an EOFException if stream.available == 0.
-   * This allows for a meaningful exception without the round trip overhead
-   * of a getFileStatus call before opening the file. It may be brittle
-   * against an FS stream which doesn't return a value here, but the
-   * standard filesystems all do.
-   * JSON parsing and mapping problems
-   * are converted to IOEs.
    * @param fs filesystem
    * @param path path
    * @return a loaded object
-   * @throws IOException IO or JSON parse problems
+   * @throws PathIOException JSON parse problem
+   * @throws IOException IO problems
    */
   public T load(FileSystem fs, Path path) throws IOException {
-    try (FSDataInputStream dataInputStream = fs.open(path)) {
-      // throw an EOF exception if there is no data available.
-      if (dataInputStream.available() == 0) {

Review comment:
       Yes. available() only reports how much data is buffered and readable without blocking, not the length of the data.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 660960)
    Time Spent: 1h 10m  (was: 1h)

> JsonSerialization raises EOFException reading JSON data stored on google GCS
> ----------------------------------------------------------------------------
>
>                 Key: HADOOP-17945
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17945
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 3.3.1
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The JsonSerialization<> load code doesn't work on GCS, as it uses 
> "stream.available()" to fail with a meaningful message if the stream is empty.
> But that method is only meant to report how much data can be read without 
> blocking, something we actually get wrong ourselves. The Google GCS team 
> didn't get it wrong, so when there's no local buffer, available() returns 0 
> and the load code raises an EOFException even though the file has data.
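
The failure mode described above can be reproduced without Hadoop or GCS. The following is a minimal sketch using only java.io, not the change made in the pull request: the class AvailableVsRead, the UnbufferedStream wrapper and both check methods are hypothetical names introduced for illustration. UnbufferedStream stands in for any stream that holds data but reports nothing as buffered, which is what the InputStream.available() contract permits.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

final class AvailableVsRead {

  /**
   * Hypothetical stand-in for a remote stream: it has data, but reports
   * that nothing can be read without blocking.
   */
  static final class UnbufferedStream extends FilterInputStream {
    UnbufferedStream(InputStream in) {
      super(in);
    }

    @Override
    public int available() {
      return 0;
    }
  }

  /** Brittle check: treats "nothing buffered" as "nothing to read". */
  static void failIfNothingAvailable(InputStream in) throws IOException {
    if (in.available() == 0) {
      throw new EOFException("No data (according to available())");
    }
  }

  /**
   * Alternative check (an assumption, not the PR's code): read one byte,
   * fail only on a real end of stream, then push the byte back so the
   * caller can still parse the whole payload.
   */
  static InputStream failIfEmpty(InputStream in) throws IOException {
    PushbackInputStream pushback = new PushbackInputStream(in, 1);
    int first = pushback.read();
    if (first < 0) {
      throw new EOFException("Stream is empty");
    }
    pushback.unread(first);
    return pushback;
  }

  public static void main(String[] args) throws IOException {
    byte[] json = "{\"key\": \"value\"}".getBytes(StandardCharsets.UTF_8);

    try {
      failIfNothingAvailable(new UnbufferedStream(new ByteArrayInputStream(json)));
      System.out.println("available() check passed");
    } catch (EOFException e) {
      System.out.println("available() check failed on non-empty data: " + e.getMessage());
    }

    InputStream checked = failIfEmpty(new UnbufferedStream(new ByteArrayInputStream(json)));
    System.out.println("read() check passed, first byte: " + (char) checked.read());
  }
}
```

The read-based check costs no extra round trip beyond the read the parser would issue anyway, and it reports an empty stream only when the stream has really ended.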



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
