Re: [TorrentBroadcast] Pyspark Application terminated saying "Failed to get broadcast_1_ piece0 of broadcast_1 in Spark 2.0.0"

2017-01-05 Thread Palash Gupta
Hi Marco, Yes, it was on the same host when the problem was found. Even when I tried to start on a different host, the problem is still there. Any hints or suggestions will be appreciated. Thanks & Best Regards, Palash Gupta From: Marco Mistroni To: Palash Gupta

Re: [TorrentBroadcast] Pyspark Application terminated saying "Failed to get broadcast_1_ piece0 of broadcast_1 in Spark 2.0.0"

2017-01-05 Thread Marco Mistroni
Hi, if it only happens when you run 2 apps at the same time, could it be that these 2 apps somehow run on the same host? Kr On 5 Jan 2017 9:00 am, "Palash Gupta" wrote: > Hi Marco and respected member, > > I have done all the possible things suggested by Forum but still I'm >

Re: [TorrentBroadcast] Pyspark Application terminated saying "Failed to get broadcast_1_ piece0 of broadcast_1 in Spark 2.0.0"

2017-01-05 Thread Palash Gupta
Hi Marco and respected member, I have done all the possible things suggested by Forum but still I'm having the same issue: 1. I will migrate my applications to production environment where I will have more resources Palash>> I migrated my application in production where I have more CPU Cores,

Re: [TorrentBroadcast] Pyspark Application terminated saying "Failed to get broadcast_1_ piece0 of broadcast_1 in Spark 2.0.0"

2016-12-31 Thread Palash Gupta
Hi Marco, Thanks! Please have my response: so you have a pyspark application running on spark 2.0 Palash>> Yes You have python scripts dropping files on HDFS Palash>> Yes (it is not part of spark process, just independent python script) then you have two spark job Palash>> Yes - 1 load expected

Re: [TorrentBroadcast] Pyspark Application terminated saying "Failed to get broadcast_1_ piece0 of broadcast_1 in Spark 2.0.0"

2016-12-30 Thread Marco Mistroni
Hi Palash so you have a pyspark application running on spark 2.0 You have python scripts dropping files on HDFS then you have two spark job - 1 load expected hour data (pls explain. How many files on average) - 1 load delayed data (pls explain. how many files on average) Do these scripts run

Re: [TorrentBroadcast] Pyspark Application terminated saying "Failed to get broadcast_1_ piece0 of broadcast_1 in Spark 2.0.0"

2016-12-30 Thread Palash Gupta
Hi Marco & Ayan, I now have a clearer idea about what Marco means by Reduce. I will do it to dig down. Let me answer your queries: When you see the broadcast errors, does your job terminate? Palash>> Yes it terminated the app. Or are you assuming that something is wrong just because you see the

Re: [TorrentBroadcast] Pyspark Application terminated saying "Failed to get broadcast_1_ piece0 of broadcast_1 in Spark 2.0.0"

2016-12-30 Thread Marco Mistroni
Correct. I mean reduce the functionality. Uhm, I realised I didn't ask you a fundamental question. When you see the broadcast errors, does your job terminate? Or are you assuming that something is wrong just because you see the message in the logs? Plus... Wrt logic: who writes the CSV? With what

Re: [TorrentBroadcast] Pyspark Application terminated saying "Failed to get broadcast_1_ piece0 of broadcast_1 in Spark 2.0.0"

2016-12-30 Thread ayan guha
@Palash: I think what Marco meant by "reduce functionality" is to reduce the scope of your application's functionality so that you can isolate the issue in certain part(s) of the app... I do not think he meant the "reduce" operation :) On Fri, Dec 30, 2016 at 9:26 PM, Palash Gupta <

Re: [TorrentBroadcast] Pyspark Application terminated saying "Failed to get broadcast_1_ piece0 of broadcast_1 in Spark 2.0.0"

2016-12-30 Thread Palash Gupta
Hi Marco, All of your suggestions so far are highly appreciated. I will try to implement them in my code and let you know. Let me answer your query: What does your program do? Palash>> In each hour I am loading many CSV files and then I'm making some KPI(s) out of them.

Re: [TorrentBroadcast] Pyspark Application terminated saying "Failed to get broadcast_1_ piece0 of broadcast_1 in Spark 2.0.0"

2016-12-30 Thread Palash Gupta
Hi Nicholas, Appreciated your response. Understand your articulated point & I will implement and let you know the status of the problem. Sample: // these lines are equivalent in Spark 2.0 spark.read.format("csv").option("header", "true").load("../Downloads/*.csv") spark.read.option("header",

Re: [TorrentBroadcast] Pyspark Application terminated saying "Failed to get broadcast_1_ piece0 of broadcast_1 in Spark 2.0.0"

2016-12-29 Thread Marco Mistroni
Hello, no, sorry, I don't have any further insight into that. I have seen similar errors but for completely different issues, and in most of the cases it had to do with my data or my processing rather than Spark itself. What does your program do? You say it runs for 2-3 hours, what is the logic?

Re: [TorrentBroadcast] Pyspark Application terminated saying "Failed to get broadcast_1_ piece0 of broadcast_1 in Spark 2.0.0"

2016-12-29 Thread Nicholas Hakobian
If you are using Spark 2.0 (as listed in the Stack Overflow post), why are you using the external CSV module from Databricks? Spark 2.0 includes the functionality from this external module natively, and it's possible you are mixing an older library with a newer Spark, which could explain a crash.
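A rough PySpark sketch of the point above, not taken from the original poster's code: the first helper scans a comma-separated `--packages` string for the external spark-csv module, and the second reads CSV through the reader built into Spark 2.0. The function names and the package coordinates in the usage note are illustrative assumptions.

```python
def legacy_csv_packages(packages):
    """Return the entries of a comma-separated --packages string that
    name the external spark-csv module (hypothetical helper)."""
    return [p.strip() for p in packages.split(",") if "spark-csv" in p]


def read_csv_native(spark, path):
    """Read CSV files with the reader built into Spark 2.0, so no external
    package is needed. `spark` is assumed to be an existing SparkSession."""
    return spark.read.option("header", "true").csv(path)
```

For example, `legacy_csv_packages("com.databricks:spark-csv_2.10:1.5.0,org.apache.hadoop:hadoop-aws:2.7.3")` returns only the spark-csv entry, which would be the one to drop from the submit command before switching to `read_csv_native`.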

Re: [TorrentBroadcast] Pyspark Application terminated saying "Failed to get broadcast_1_ piece0 of broadcast_1 in Spark 2.0.0"

2016-12-29 Thread Palash Gupta
Hi Marco, Thanks for your response. Yes, I tested it before & am able to load from the Linux filesystem, and it also sometimes has a similar issue. However, in both cases (either from Hadoop or the Linux file system), this error comes in some specific scenarios as per my observations: 1. When two parallel

Re: [TorrentBroadcast] Pyspark Application terminated saying "Failed to get broadcast_1_ piece0 of broadcast_1 in Spark 2.0.0"

2016-12-29 Thread Marco Mistroni
Hi Pls try to read a CSV from filesystem instead of hadoop. If you can read it successfully then your hadoop file is the issue and you can start debugging from there. Hth On 29 Dec 2016 6:26 am, "Palash Gupta" wrote: > Hi Apache Spark User team, > > > >
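One way to run the comparison suggested above is to load the same file once through a `file://` URI and once through an `hdfs://` URI and see which read fails. This is only a sketch: the helper names are hypothetical, `spark` is assumed to be an existing SparkSession, and an `hdfs:///...` URI without a host assumes the cluster's default filesystem is HDFS.

```python
def with_scheme(path, scheme):
    """Prefix an absolute path with an explicit URI scheme, leaving paths
    that already carry a scheme untouched (hypothetical helper)."""
    if "://" in path:
        return path
    return "{}://{}".format(scheme, path)


def try_count(spark, uri):
    """Attempt to read a CSV and count its rows.
    Returns (True, row_count) on success or (False, error_text) on failure."""
    try:
        return True, spark.read.option("header", "true").csv(uri).count()
    except Exception as exc:  # broad on purpose: this is a diagnostic probe
        return False, str(exc)
```

Calling `try_count(spark, with_scheme(path, "file"))` and `try_count(spark, with_scheme(path, "hdfs"))` on copies of the same file gives a quick indication of whether the failure follows the storage layer or the job itself.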

[TorrentBroadcast] Pyspark Application terminated saying "Failed to get broadcast_1_ piece0 of broadcast_1 in Spark 2.0.0"

2016-12-28 Thread Palash Gupta
Hi Apache Spark User team, Greetings! I started developing an application using Apache Hadoop and Spark using Python. My pyspark application randomly terminated saying "Failed to get broadcast_1*" and I have been searching for suggestions and support on Stack Overflow at Failed to get