Re: S3 files , Spark job hungsup

Jon Chase Tue, 23 Dec 2014 14:10:30 -0800

http://www.jets3t.org/toolkit/configuration.html

Put the following properties in a file named jets3t.properties and make
sure it is available while your Spark job runs (for example, place it in
~/ and pass it to spark-submit with --files ~/jets3t.properties).

httpclient.connection-timeout-ms
How many milliseconds to wait before a connection times out. 0 means
infinity.
Default: 60000

httpclient.socket-timeout-ms
How many milliseconds to wait before a socket connection times out. 0 means
infinity.
Default: 60000

You will also probably want to increase this value substantially:

httpclient.max-connections
The maximum number of simultaneous connections to allow globally
Default: 20
Note: If you have a fast Internet connection, you can improve the
performance of your S3 client by increasing this setting and the
corresponding S3 Service properties s3service.max-thread-count and
s3service.admin-max-thread-count. However, be careful because if you
increase this value too much for your connection you may exceed your
available bandwidth and cause communications errors.
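
As a rough sketch of how the pieces above fit together, the snippet below
writes a jets3t.properties file and shows, as a comment, how it might be
passed to spark-submit. The numeric values are illustrative assumptions only,
not tuned recommendations, and the class and jar names in the comment are
hypothetical placeholders. PROPS_FILE is just a shell variable introduced here
for convenience.

```shell
# Where to write the file; defaults to ~/jets3t.properties as suggested above.
PROPS_FILE="${PROPS_FILE:-$HOME/jets3t.properties}"

# Property names come from the JetS3t configuration page; the values are
# example assumptions (shorter timeouts than the 60000 ms default, more
# connections than the default of 20).
cat > "$PROPS_FILE" <<'EOF'
httpclient.connection-timeout-ms=30000
httpclient.socket-timeout-ms=30000
httpclient.max-connections=100
EOF

# Ship the file with the job (class and jar names are placeholders):
#   spark-submit --files "$PROPS_FILE" --class com.example.MyJob my-job.jar
```

If you raise httpclient.max-connections a lot, consider raising
s3service.max-thread-count and s3service.admin-max-thread-count as well, as
the note above mentions.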

On Mon, Dec 22, 2014 at 1:20 PM, durga katakam <durgak...@gmail.com> wrote:

> Yes. I am reading thousands of files every hour. Is there any way I can
> tell Spark to time out?
> Thanks for your help.
>
> -D
>
> On Mon, Dec 22, 2014 at 4:57 AM, Shuai Zheng <szheng.c...@gmail.com>
> wrote:
>
>> Is it possible that too many connections are open to read from S3 on one
>> node? I had this issue before because I opened a few hundred files on S3
>> to read from one node. It just blocked without any error until it timed
>> out later.
>>
>> On Monday, December 22, 2014, durga <durgak...@gmail.com> wrote:
>>
>>> Hi All,
>>>
>>> I am facing a strange issue sporadically. Occasionally my Spark job
>>> hangs while reading S3 files. It does not throw an exception or make any
>>> progress; it just hangs there.
>>>
>>> Is this a known issue? Please let me know how I can solve it.
>>>
>>> Thanks,
>>> -D
>
