[jira] [Comment Edited] (SPARK-22526) Spark hangs while reading binary files from S3

2017-11-23 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16264095#comment-16264095
 ] 

Steve Loughran edited comment on SPARK-22526 at 11/23/17 3:47 PM:
--

# Fix the code you invoke.
# Wrap the code you invoke with something like the following (this is written straight into the JIRA, untested, and should really close the stream in something that swallows IOExceptions):

{code}
binaryRdd.map { t =>
  try {
    process(t._2)
  } finally {
    t._2.close()
  }
}
{code}
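For reference, a slightly fuller (still untested) sketch of that second option. It opens the PortableDataStream explicitly so there is a plain java.io.DataInputStream to close, and it swallows any IOException thrown by close() itself; process stands in for the caller's own code and is not a Spark API:

{code}
import java.io.{DataInputStream, IOException}

binaryRdd.map { case (path, portable) =>
  // Open the underlying stream explicitly so there is something to close.
  val in: DataInputStream = portable.open()
  try {
    process(in)               // caller-supplied processing of the file contents
  } finally {
    try {
      in.close()              // return the S3A connection to the pool
    } catch {
      case _: IOException =>  // swallow failures on close, as noted above
    }
  }
}
{code}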





> Spark hangs while reading binary files from S3
> --
>
> Key: SPARK-22526
> URL: https://issues.apache.org/jira/browse/SPARK-22526
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: mohamed imran
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Hi,
> I am using Spark 2.2.0 (a recent version) to read binary files from S3. I use
> sc.binaryFiles to read the files.
> It works fine for roughly the first 100 file reads, but then it hangs
> indefinitely for anywhere from 5 up to 40 minutes, much like the Avro file
> read issue (which was fixed in later releases).
> I tried setting fs.s3a.connection.maximum to some larger values, but that
> didn't help.
> Finally I ended up enabling Spark speculation, which again didn't help much.
> One thing I observed is that the connection is not closed after
> every read of a binary file from S3.
> Example: sc.binaryFiles("s3a://test/test123.zip")
> Please look into this major issue!
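For context on the fs.s3a.connection.maximum attempt mentioned above: Spark forwards spark.hadoop.-prefixed keys to the Hadoop configuration that S3A reads, so the setting is usually applied as in the sketch below. The property name is real; the value of 200 and the app name are only illustrations:

{code}
import org.apache.spark.{SparkConf, SparkContext}

// spark.hadoop.* keys are copied into the Hadoop Configuration used by s3a://.
val conf = new SparkConf()
  .setAppName("binary-files-from-s3")                      // illustrative name
  .set("spark.hadoop.fs.s3a.connection.maximum", "200")    // S3A connection pool size

val sc = new SparkContext(conf)
val binaryRdd = sc.binaryFiles("s3a://test/test123.zip")   // path taken from the report
{code}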






[jira] [Comment Edited] (SPARK-22526) Spark hangs while reading binary files from S3

2017-11-17 Thread mohamed imran (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16257224#comment-16257224
 ] 

mohamed imran edited comment on SPARK-22526 at 11/17/17 5:04 PM:
-

[~srowen] I am processing inside a foreach loop, like this.
Example code:

dataframe.collect.foreach { x =>
  val filepath = x.getAs[String]("filepath")
  val zipRdd = sc.binaryFiles(s"$filepath")   // the file name will be something like test.zip
  zipRdd.count
}

I don't process Avro files. I am processing binary files, which are compressed
normal CSV files, from S3.

After somewhere around the 100th to 150th read, Spark hangs while reading from S3.

Hope this info suffices to clarify the issue. Let me know if you need
anything else.
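A sketch of how that loop might look with each stream closed explicitly, along the lines of the suggestion in the comment above. The "filepath" column comes from the example here; the byte-counting body is only a stand-in for the real zip processing:

{code}
dataframe.collect().foreach { row =>
  val filepath = row.getAs[String]("filepath")
  val zipRdd = sc.binaryFiles(filepath)

  val bytesRead = zipRdd.map { case (_, portable) =>
    val in = portable.open()
    try {
      // Stand-in for real processing: drain the stream and count its bytes.
      Iterator.continually(in.read()).takeWhile(_ != -1).size.toLong
    } finally {
      in.close()   // make sure the S3A connection is released for every file
    }
  }.sum()

  println(s"$filepath: read $bytesRead bytes")
}
{code}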










[jira] [Comment Edited] (SPARK-22526) Spark hangs while reading binary files from S3

2017-11-15 Thread mohamed imran (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16253820#comment-16253820
 ] 

mohamed imran edited comment on SPARK-22526 at 11/15/17 5:22 PM:
-

Nope. Actually I am reading some zip files from Spark using sc.binaryFiles.
After roughly the 100th file read it hangs indefinitely. I was monitoring the
open connections of the Hadoop S3 FsInput streams: none of the connections were
closed after each binary file was read. Yes, I am sure it is not a network or
S3 issue. Spark version 2.2.0 / Hadoop version 2.7.3.








