Yes, I checked the Spark UI to follow what's going on. It starts several
tasks fine (8 tasks in my case) out of ~70k tasks, and then stalls.
I actually was able to get things to work by disabling dynamic allocation.
Basically, I set the number of executors manually, which disables dynamic allocation.
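For reference, a minimal sketch of spark-submit flags that do this (the executor count and memory values are placeholders, not recommendations; tune them for your cluster):

```shell
# Explicitly disable dynamic allocation and pin the executor count.
# Setting --num-executors (spark.executor.instances) fixes the number of
# executors up front instead of letting the cluster scale them dynamically.
spark-submit \
  --conf spark.dynamicAllocation.enabled=false \
  --num-executors 50 \
  --executor-memory 4g \
  your_job.jar
```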
Hi,
You can control the initial number of partitions (tasks) in v2.0.
https://www.mail-archive.com/user@spark.apache.org/msg51603.html
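Assuming the thread refers to the Spark 2.0 file-source options, a sketch of the relevant knobs would look like this (the values shown are the 2.0 defaults, for illustration only):

```shell
# spark.sql.files.maxPartitionBytes: maximum bytes packed into a single
#   partition when reading files (default 128 MB).
# spark.sql.files.openCostInBytes: estimated cost, in bytes, of opening a
#   file; raising it makes small files "look" bigger, so fewer files are
#   packed per partition and more partitions are created (default 4 MB).
spark-submit \
  --conf spark.sql.files.maxPartitionBytes=134217728 \
  --conf spark.sql.files.openCostInBytes=4194304 \
  your_job.jar
```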
// maropu
On Tue, Jun 14, 2016 at 7:24 AM, Mich Talebzadeh
wrote:
Have you looked at the Spark GUI to see what it is waiting for? Is there
available memory? What is the resource manager you are using?
Dr Mich Talebzadeh
LinkedIn:
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
Hi Michael,
Thanks for the suggestion to use Spark 2.0 preview. I just downloaded the
preview and tried using it, but I’m running into the exact same issue.
Khaled
> On Jun 13, 2016, at 2:58 PM, Michael Armbrust wrote:
You might try with the Spark 2.0 preview. We spent a bunch of time
improving the handling of many small files.
On Mon, Jun 13, 2016 at 11:19 AM, khaled.hammouda
wrote:
I'm trying to use Spark SQL to load JSON data that is split across about 70k
files in 24 directories in HDFS, using
sqlContext.read.json("hdfs:///user/hadoop/data/*/*").
This doesn't seem to work for some reason; I get timeout errors like the
following:
---
6/06/13 15:46:31 ERROR