[ https://issues.apache.org/jira/browse/SQOOP-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Boglarka Egyed updated SQOOP-3243:
----------------------------------
    Description: 
Importing BLOB data into an encrypted zone causes a "Stream closed" error when:
* the BLOB data is bigger than 16MB -> a LobFile will be used to store it
* Java 8 is used -> it has a different implementation of the close() method of the FilterOutputStream class (see the sketch after the stack trace below)

{noformat}
17/10/12 07:16:04 INFO mapreduce.Job: Running job: job_1507777811520_5091
17/10/12 07:16:13 INFO mapreduce.Job: Job job_1507777811520_5091 running in uber mode : false
17/10/12 07:16:13 INFO mapreduce.Job: map 0% reduce 0%
17/10/12 07:22:37 INFO mapreduce.Job: Task Id : attempt_1507777811520_5091_m_000000_0, Status : FAILED
Error: java.io.IOException: Stream closed
at org.apache.hadoop.crypto.CryptoOutputStream.checkStream(CryptoOutputStream.java:268)
at org.apache.hadoop.crypto.CryptoOutputStream.flush(CryptoOutputStream.java:255)
at java.io.FilterOutputStream.flush(FilterOutputStream.java:140)
at java.io.DataOutputStream.flush(DataOutputStream.java:123)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:141)
at java.io.FilterOutputStream.close(FilterOutputStream.java:158)
at org.apache.commons.io.output.ProxyOutputStream.close(ProxyOutputStream.java:117)
at org.apache.sqoop.io.LobFile$V0Writer.close(LobFile.java:1669)
at org.apache.sqoop.lib.LargeObjectLoader.close(LargeObjectLoader.java:96)
at org.apache.sqoop.mapreduce.AvroImportMapper.cleanup(AvroImportMapper.java:79)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:148)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
{noformat}
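
Why Java 8 specifically: FilterOutputStream.close() changed between Java 7 and Java 8 in how it treats an IOException thrown by flush(). The sketch below is paraphrased from the JDK sources (the exact code may differ slightly between JDK updates):

{code:java}
// Java 7 and earlier: an IOException from flush() is swallowed, so
// flushing an already-closed underlying stream goes unnoticed.
public void close() throws IOException {
    try {
        flush();
    } catch (IOException ignored) {
    }
    out.close();
}

// Java 8: flush() runs inside try-with-resources, so its IOException
// now propagates to the caller (out is still closed afterwards).
public void close() throws IOException {
    try (OutputStream ostream = out) {
        flush();
    }
}
{code}

So the double close described below has presumably always happened; Java 8 merely stopped suppressing the resulting IOException.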

The root cause of this issue is in the LobFile$V0Writer.close() method, which is invoked from the Map cleanup. At line 1669 (per the stack trace) it tries to close the countingOut output stream. However, the out output stream has already been closed at line 1664, and out is just a wrapper around countingOut, so in the end both point to the same CryptoOutputStream instance. When the call reaches line 1669, the CryptoOutputStream instance has therefore already been closed by line 1664. The error surfaces because, during this second close(), java.io.BufferedOutputStream tries to call flush() on the underlying stream it wraps (in this case, the CryptoOutputStream), reaching line 255 of CryptoOutputStream, where checkStream() throws because the stream is closed.
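
The pattern can be reproduced without Sqoop or HDFS. In the sketch below all names are illustrative: StrictOutputStream is a hypothetical stand-in for CryptoOutputStream (which likewise rejects flush() after close() via checkStream()), and the wrapper chain is simplified compared to the real one in LobFile$V0Writer:

{code:java}
import java.io.*;

// Hypothetical stand-in for CryptoOutputStream: rejects any use after close().
class StrictOutputStream extends OutputStream {
    private boolean closed;
    @Override public void write(int b) throws IOException { checkStream(); }
    @Override public void flush() throws IOException { checkStream(); }
    @Override public void close() { closed = true; }
    private void checkStream() throws IOException {
        if (closed) throw new IOException("Stream closed");
    }
}

public class DoubleCloseRepro {
    public static void main(String[] args) throws IOException {
        OutputStream crypto = new StrictOutputStream();
        OutputStream countingOut = new BufferedOutputStream(crypto);
        OutputStream out = new DataOutputStream(countingOut);

        out.close();         // like line 1664: closes crypto through the wrapper chain
        countingOut.close(); // like line 1669: on Java 8 close() first calls flush(),
                             // which reaches the already-closed crypto stream and throws
    }
}
{code}

On Java 7 the second close() is silently ignored because FilterOutputStream swallows the IOException from flush(); on Java 8 it propagates as the "Stream closed" failure above. One possible fix (not necessarily the actual patch) would be to close only the outermost stream in V0Writer.close(), since closing it already closes the wrapped countingOut and the underlying CryptoOutputStream.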

  was:
Importing BLOB data into encrypted zone causes "Stream closed" error with
* BLOB data size bigger than 16MB
* Java 8

{noformat}
17/10/12 07:16:04 INFO mapreduce.Job: Running job: job_1507777811520_5091
17/10/12 07:16:13 INFO mapreduce.Job: Job job_1507777811520_5091 running in uber mode : false
17/10/12 07:16:13 INFO mapreduce.Job: map 0% reduce 0%
17/10/12 07:22:37 INFO mapreduce.Job: Task Id : attempt_1507777811520_5091_m_000000_0, Status : FAILED
Error: java.io.IOException: Stream closed
at org.apache.hadoop.crypto.CryptoOutputStream.checkStream(CryptoOutputStream.java:268)
at org.apache.hadoop.crypto.CryptoOutputStream.flush(CryptoOutputStream.java:255)
at java.io.FilterOutputStream.flush(FilterOutputStream.java:140)
at java.io.DataOutputStream.flush(DataOutputStream.java:123)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:141)
at java.io.FilterOutputStream.close(FilterOutputStream.java:158)
at org.apache.commons.io.output.ProxyOutputStream.close(ProxyOutputStream.java:117)
at org.apache.sqoop.io.LobFile$V0Writer.close(LobFile.java:1669)
at org.apache.sqoop.lib.LargeObjectLoader.close(LargeObjectLoader.java:96)
at org.apache.sqoop.mapreduce.AvroImportMapper.cleanup(AvroImportMapper.java:79)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:148)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
{noformat}

The root cause of this issue seems to be in LobFile.close method, which is 
being invoked from the Map cleanup. In line 1669, from the stacktrace, it's 
trying to close countingOut OS. However, at line 1664, out OS is already being 
closed. However, out OS is just a wrapper of countingOut OS, so at the end, 
both are pointing to same instance of CryptoOutputStream. When the call reaches 
line 1669, CryptoOutputStream instance is already closed by line 1664. The 
problem happens because java.io.BufferedOutputStream will try to call flush on 
the underlying OS it's wrapping (in this case, CryptoOutputStream), reaching 
line 255 of CryptoOutputStream.


> Importing BLOB data causes "Stream closed" error on encrypted HDFS
> ------------------------------------------------------------------
>
>                 Key: SQOOP-3243
>                 URL: https://issues.apache.org/jira/browse/SQOOP-3243
>             Project: Sqoop
>          Issue Type: Bug
>    Affects Versions: 1.4.6
>            Reporter: Boglarka Egyed
>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
