>>>> know the following:
>>>> 1. Number of partitions
>>>> 2. Number of files
>>>> 3. Time taken to create the RDDs
>>>>
>>>>
>>>> Regards,
>>>> Gourav Sengupta
>>>>
>>>>
>>> Gourav Sengupta
>>>
>>>
>>> On Tue, Jan 26, 2016 at 1:12 PM, Gourav Sengupta <
>>> gourav.sengu...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> are
.
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-task-hangs-infinitely-when-accessing-S3-from-AWS-tp25289p26068.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>>> Hi,
>>>
>>> Are you creating RDDs out of the data?
>>>
>>>
>>>
>>> Regards,
>>> Gourav
>>>
>>> On Tue, Jan 26, 2016 at 12:45 PM, aecc <alessandroa...@gmail.com> wrote:
>>>
8.
> The number of partitions used when reading data is 7315.
> The maximum size of a file to read is 14G.
> The size of the folder is around 270G.
Sorry, I have not been able to solve the issue. I used speculation mode as
a workaround.
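
For readers unfamiliar with the workaround: with speculation enabled, Spark
relaunches a copy of a task that is running much slower than its peers, which
can hide a single hung S3 read without fixing its cause. Below is a minimal
sketch in Python of assembling the relevant spark-submit flags. The property
names are standard Spark settings; the helper name speculation_flags and the
values shown are illustrative assumptions, since the thread does not say which
settings were actually used.

```python
def speculation_flags(multiplier=1.5, quantile=0.75):
    """Build spark-submit --conf flags that enable speculative execution.

    The helper and its default values are assumptions for illustration;
    the thread does not state the settings the poster used.
    """
    conf = {
        "spark.speculation": "true",
        # A task becomes a candidate once it is this many times slower
        # than the median task in its stage.
        "spark.speculation.multiplier": str(multiplier),
        # Fraction of tasks in a stage that must finish before
        # speculation is considered for that stage.
        "spark.speculation.quantile": str(quantile),
    }
    flags = []
    for key, value in conf.items():
        flags += ["--conf", f"{key}={value}"]
    return flags


# Print the flags in a form that can be pasted after `spark-submit`.
print(" ".join(speculation_flags()))
```

Speculation only reruns the slow task elsewhere, so the job may finish while
the underlying hang in the S3 read remains undiagnosed.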
Any hints?
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-task-hangs-infinitely-when-accessing-S3-from-AWS-tp25289p25365.html
Reading files directly from Amazon S3 can be frustrating, especially if
you're dealing with a large number of input files. Could you please
elaborate on your use case? Does the S3 bucket in question already
contain a large number of files?
The implementation of the * wildcard operator in S3
/Spark-task-hangs-infinitely-when-accessing-S3-from-AWS-tp25289p25367.html
-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e
Any help on this? This is really blocking me, and I haven't found a feasible
solution yet.
Thanks.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-task-hangs-infinitely-when-accessing-S3-from-AWS-tp25289p25327.html
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-task-hangs-infinitely-when-accessing-S3-from-AWS-tp25289.html
Are you sitting behind a proxy or something? Can you look more into the
executor logs? I have a feeling that you are running out of memory
(and possibly hitting long GC pauses).
Thanks
Best Regards
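
One way to check the GC suspicion above is to have the executors write GC
activity to their logs. The sketch below, in Python, builds the spark-submit
flag for that. spark.executor.extraJavaOptions is a standard Spark property
and the JVM options are the ones commonly used on Java 8 and earlier; the
helper name gc_logging_flags is hypothetical, and newer JVMs use different GC
logging options.

```python
import shlex


def gc_logging_flags():
    """Build the spark-submit flag that turns on executor GC logging.

    Assumes a Java 8 or earlier JVM on the executors; the helper name
    is illustrative and not part of Spark.
    """
    java_opts = "-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
    return ["--conf", f"spark.executor.extraJavaOptions={java_opts}"]


# Quote each part so the space-separated JVM options survive the shell.
print(" ".join(shlex.quote(part) for part in gc_logging_flags()))
```

With this set, GC messages appear in each executor's stdout log, where long or
frequent collections would support the out-of-memory theory.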
On Thu, Sep 10, 2015 at 10:05 PM, Mario Pastorelli <
mario.pastore...@teralytics.ch> wrote:
> Dear
Dear community,
I am facing a problem accessing data on S3 via Spark. My current
configuration is the following:
- Spark 1.4.1
- Hadoop 2.7.1
- hadoop-aws-2.7.1
- Mesos 0.22.1
I am accessing the data using the s3a protocol but it just hangs. The job
runs through the whole data set but