Hi Marco,
Yes, it was on the same host when the problem was found.
Even when I tried starting on a different host, the problem was still there.
Any hints or suggestions will be appreciated.
Thanks & Best Regards,
Palash Gupta
From: Marco Mistroni
To: Palash Gupta
Hi
If it only happens when you run 2 apps at the same time, could it be that these 2
apps somehow run on the same host?
Kr
On 5 Jan 2017 9:00 am, "Palash Gupta" wrote:
Hi Marco and respected members,
I have done all the possible things suggested by the forum, but I'm still having
the same issue:

1. I will migrate my applications to a production environment where I will have
more resources
Palash>> I migrated my application to production, where I have more CPU cores,
Hi Marco,
Thanks!
Please find my responses:

so you have a pyspark application running on spark 2.0
Palash>> Yes

You have python scripts dropping files on HDFS
Palash>> Yes (it is not part of the Spark process, just an independent python script)

then you have two spark jobs
Palash>> Yes

- 1 load expected
Hi Palash,
So you have a pyspark application running on Spark 2.0.
You have python scripts dropping files on HDFS.
Then you have two Spark jobs:
- 1 loads expected hourly data (please explain: how many files on average?)
- 1 loads delayed data (please explain: how many files on average?)
Do these scripts run
Hi Marco & Ayan,
I now have a clearer idea of what Marco means by "reduce". I will use it to dig
deeper.
Let me answer your queries:

When you see the broadcast errors, does your job terminate?
Palash>> Yes, it terminated the app.

Or are you assuming that something is wrong just because you see the
Correct. I mean reduce the functionality.
Uhm, I realised I didn't ask you a fundamental question. When you see the
broadcast errors, does your job terminate? Or are you assuming that
something is wrong just because you see the message in the logs?
Plus, w.r.t. the logic: who writes the CSV? With what
@Palash: I think what Marco meant by "reduce functionality" is to reduce the
scope of your application's functionality so that you can isolate the issue
to certain part(s) of the app... I do not think he meant the "reduce" operation
:)
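That "reduce the scope" advice can be sketched in plain Python. Everything below is hypothetical (the stage names and the `run_pipeline` helper are illustrations, not part of Palash's app): run the job with only the first stage enabled, confirm it is clean, then re-enable stages one at a time until the broadcast error reappears.

```python
# Hypothetical sketch of "reducing functionality" to isolate a failure:
# run the pipeline with later stages disabled, then add them back one by one.
def run_pipeline(data, stages, enabled):
    """Run only the stages whose names are in `enabled`, in order."""
    for name, fn in stages:
        if name in enabled:
            data = fn(data)
    return data

# Toy stand-ins for the real stages (load CSVs, clean, compute KPIs).
stages = [("load",  lambda d: d + ["loaded"]),
          ("clean", lambda d: d + ["cleaned"]),
          ("kpi",   lambda d: d + ["kpi"])]

# First run with only "load" enabled; if that is clean, enable "clean", etc.
print(run_pipeline([], stages, enabled={"load"}))  # → ['loaded']
```

The stage whose re-enabling first triggers the error is where to start digging.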
On Fri, Dec 30, 2016 at 9:26 PM, Palash Gupta <
Hi Marco,
All of your suggestions so far are highly appreciated. I will apply them in my
code and let you know.
Let me answer your query:

What does your program do?
Palash>> Every hour I load many CSV files and then compute some KPI(s) from them.
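As a minimal illustration of that hourly CSV-to-KPI step (the column names and the averaging KPI are assumptions, not from the thread; this uses only the standard library so it can run outside Spark):

```python
import csv
import io

def avg_per_key(csv_text, key_col, val_col):
    """A toy KPI: average of val_col per key_col over the rows of one CSV."""
    totals = {}
    counts = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = row[key_col]
        totals[key] = totals.get(key, 0.0) + float(row[val_col])
        counts[key] = counts.get(key, 0) + 1
    return {k: totals[k] / counts[k] for k in totals}

sample = "cell,value\nA,1\nA,3\nB,2\n"
print(avg_per_key(sample, "cell", "value"))  # → {'A': 2.0, 'B': 2.0}
```

In the real app the same aggregation would be a Spark groupBy over the loaded DataFrames; the sketch only shows the shape of the computation.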
Hi Nicholas,
Your response is appreciated.
I understand your point; I will implement it and let you know the
status of the problem.
Sample:
# these lines are equivalent in Spark 2.0
spark.read.format("csv").option("header", "true").load("../Downloads/*.csv")
spark.read.option("header", "true").csv("../Downloads/*.csv")
Hello,
No, sorry, I don't have any further insight into that. I have seen similar
errors, but for completely different issues, and in most of the cases it had
to do with my data or my processing rather than Spark itself.
What does your program do? You say it runs for 2-3 hours; what is the
logic?
If you are using Spark 2.0 (as listed in the Stack Overflow post), why are
you using the external CSV module from Databricks? Spark 2.0 includes the
functionality from this external module natively, and it's possible you are
mixing an older library with a newer Spark, which could explain a crash.
Hi Marco,
Thanks for your response.
Yes, I tested it before and am able to load from the linux filesystem, and it
also sometimes has a similar issue.
However, in both cases (either from Hadoop or the linux filesystem), this error
comes up in some specific scenarios, as per my observations:
1. When two parallel
Hi
Pls try to read a CSV from the local filesystem instead of Hadoop. If you can
read it successfully, then your Hadoop file is the issue and you can start
debugging from there.
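One way to sketch that debugging step (the helper name and the `/tmp/debug` directory are assumptions, not from the thread) is a tiny path rewrite, so the same Spark read can be pointed at a local copy of the file:

```python
def to_local_uri(hdfs_path, local_dir="/tmp/debug"):
    """Map an hdfs:// path to a file:// path under local_dir, assuming the
    file has already been copied there (e.g. via `hdfs dfs -get`)."""
    name = hdfs_path.rstrip("/").rsplit("/", 1)[-1]
    return "file://{0}/{1}".format(local_dir, name)

print(to_local_uri("hdfs:///data/kpi_input.csv"))  # → file:///tmp/debug/kpi_input.csv

# The Spark read itself stays unchanged, e.g. (not executed here):
# spark.read.option("header", "true").csv(to_local_uri("hdfs:///data/kpi_input.csv"))
```

If the local read succeeds where the HDFS read fails, the problem is on the HDFS side (file, replication, or cluster), not in the parsing logic.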
Hth
On 29 Dec 2016 6:26 am, "Palash Gupta" wrote:
Hi Apache Spark User team,
Greetings!
I started developing an application using Apache Hadoop and Spark, using python.
My pyspark application randomly terminated saying "Failed to get broadcast_1*",
and I have been searching for suggestions and support on Stack Overflow at Failed
to get