[ https://issues.apache.org/jira/browse/HDFS-10543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15340411#comment-15340411 ]
James Clampffer commented on HDFS-10543:
----------------------------------------

I think you need to turn
{code}
*nbyte = std::get<1>(returnstate);
{code}
into
{code}
*nbyte = *nbyte - std::get<1>(returnstate);
{code}
to make sure you only read as much as was originally requested.

> hdfsRead read stops at block boundary
> -------------------------------------
>
>                 Key: HDFS-10543
>                 URL: https://issues.apache.org/jira/browse/HDFS-10543
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client
>            Reporter: Xiaowei Zhu
>            Assignee: James Clampffer
>         Attachments: HDFS-10543.HDFS-8707.000.patch, HDFS-10543.HDFS-8707.001.patch
>
> Reproducer:
> {code}
> char *buf2 = new char[file_info->mSize];
> memset(buf2, 0, (size_t)file_info->mSize);
> int ret = hdfsRead(fs, file, buf2, file_info->mSize);
> delete [] buf2;
> if (ret != file_info->mSize) {
>   std::stringstream ss;
>   ss << "tried to read " << file_info->mSize << " bytes, but read " << ret << " bytes";
>   ReportError(ss.str());
>   hdfsCloseFile(fs, file);
>   continue;
> }
> {code}
> When this runs against a file ~1.4 GB in size, it returns an error like "tried to read 1468888890 bytes, but read 134217728 bytes". The HDFS cluster it runs against has a block size of 134217728 bytes, so hdfsRead appears to stop at a block boundary. This looks like a regression. We should add a retry so the read continues across block boundaries for files with multiple blocks.
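A worked example of the fix in the comment above, under the assumption that std::get<1>(returnstate) holds the number of bytes still unread rather than the number read: if *nbyte enters the call as 100 and 30 bytes are left over, the read actually delivered 100 - 30 = 70 bytes, which is what *nbyte - std::get<1>(returnstate) stores; assigning std::get<1>(returnstate) directly would report the 30 leftover bytes as the read count.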
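The retry the description asks for amounts to looping until the requested byte count arrives. Below is a minimal sketch against the libhdfs C API (hdfsRead returns the number of bytes read, 0 at EOF, -1 on error); readFully is a hypothetical helper name, not part of the library, and the include path may differ per build:

{code}
#include "hdfs.h"  /* libhdfs C API: hdfsFS, hdfsFile, hdfsRead, tSize */

/* Keep calling hdfsRead until `length` bytes arrive, EOF, or an error.
 * A single hdfsRead may legitimately return fewer bytes than requested
 * (e.g. when it stops at a block boundary), so multi-block files need
 * this loop. */
static tSize readFully(hdfsFS fs, hdfsFile file, char *buf, tSize length) {
  tSize total = 0;
  while (total < length) {
    tSize n = hdfsRead(fs, file, buf + total, length - total);
    if (n < 0)  return -1;   /* read error */
    if (n == 0) break;       /* EOF before the requested length */
    total += n;
  }
  return total;
}
{code}

With such a wrapper, the reproducer would compare readFully(fs, file, buf2, file_info->mSize) against file_info->mSize instead of the result of a single hdfsRead call.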