This is with Spark 2.4.4 (Scala). I'm seeing some very strange behaviour
after reading a dataframe from a JSON file, using sparkSession.read in
permissive mode. I've included the corrupt-record column when reading the
data, as I want to log details of any errors in the input JSON file.
My suspicion
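[For reference, a minimal sketch of the kind of read being described. The schema, file name, and column name are illustrative, not taken from the original message:]

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("permissive-read").getOrCreate()

// PERMISSIVE mode (the default) routes malformed records into the column
// named by columnNameOfCorruptRecord instead of failing the read.
val df = spark.read
  .option("mode", "PERMISSIVE")
  .option("columnNameOfCorruptRecord", "_corrupt_record")
  .schema("id INT, name STRING, _corrupt_record STRING") // corrupt column must be in the schema
  .json("input.json")

// To log the bad rows: note that since Spark 2.3 you must cache (or persist)
// the dataframe before querying only the corrupt-record column, otherwise
// Spark raises an AnalysisException.
df.cache()
df.filter(df("_corrupt_record").isNotNull).show(false)
```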
Hi
we have a scenario where we have a large table, i.e. 5-6B records. The table
is a repository of data from the past N years. It is possible that some
updates take place on the data, and thus we are using a Delta table.
As part of the business process we know updates can happen only within the
past M years.
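[One common way to exploit that constraint is to put the M-year bound directly into the MERGE condition so Delta can prune the untouched partitions. A sketch, assuming the Delta Lake Scala API, a table partitioned by an `event_year` column, and an `updates` dataframe holding the incoming changes; all names and M = 2 are illustrative:]

```scala
import io.delta.tables.DeltaTable

// Constrain the merge to partitions that can actually change, so the
// bulk of the 5-6B-row table is never scanned or rewritten.
val target = DeltaTable.forPath(spark, "/delta/big_repository")

target.as("t")
  .merge(
    updates.as("u"),
    "t.id = u.id AND t.event_year >= year(current_date()) - 2" // M = 2 here
  )
  .whenMatched().updateAll()
  .whenNotMatched().insertAll()
  .execute()
```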
Have you deactivated the Spark UI?
I have read several threads explaining that the UI can lead to OOM because
it stores 1000 DAGs by default.
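[For reference, the retention the previous message refers to corresponds to settings like the following (a spark-defaults.conf sketch; the reduced values are illustrative, the defaults shown are Spark's documented ones):]

```
spark.ui.enabled                 false   # disable the UI entirely, or instead:
spark.ui.retainedJobs            100     # default 1000
spark.ui.retainedStages          100     # default 1000
spark.sql.ui.retainedExecutions  100     # default 1000
```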
On Sun, Oct 20, 2019 at 03:18:20AM -0700, Paul Wais wrote:
> Dear List,
>
> I've observed some sort of memory leak when using pyspark to run ~100
> jobs in
Well, dumb question:
Given the workflow outlined above, should Local Mode keep running? Or
is the leak a known issue? I just wanted to check because I can't
recall seeing this issue with a non-local master, though it's possible
there were task failures that hid the issue.
If this issue looks
Hello!
I have a use case where I have to apply multiple already-trained models
(e.g. M1, M2, ..., Mn) to the same Spark stream (fetched from Kafka).
The models were trained using the isolation forest algorithm from here:
https://github.com/titicaca/spark-iforest
I have found something similar
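[One way to structure this is to load the trained models once and fold all of them over each micro-batch. A sketch only: the `IForestModel` class name is assumed from the spark-iforest repo's Scala package, `featurize` is a hypothetical placeholder for your feature-assembly step, and all paths/topics are illustrative:]

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.ml.iforest.IForestModel // class name assumed from the repo

val spark = SparkSession.builder().getOrCreate()

// Load each already-trained model once on the driver (paths illustrative).
val models: Seq[IForestModel] =
  (1 to 3).map(i => IForestModel.load(s"/models/iforest_M$i"))

val stream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("subscribe", "events")
  .load()

// Score every micro-batch with all models; each transform appends a
// prediction column, renamed here so the n outputs stay distinct.
stream.writeStream.foreachBatch { (batch: DataFrame, _: Long) =>
  val scored = models.zipWithIndex.foldLeft(featurize(batch)) {
    case (df, (m, i)) =>
      m.transform(df).withColumnRenamed("prediction", s"prediction_M${i + 1}")
  }
  scored.write.mode("append").parquet("/out/scores")
}.start()
```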
Hi Debu,
you need to define Spark config properties before the jar file path in the
spark-submit command. Everything after the jar path will be passed as
arguments to your application.
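[Concretely, the split looks like this; jar path, options, and arguments are placeholders:]

```
# Everything before the application jar is interpreted by spark-submit;
# everything after it goes to the application's main() as arguments.
spark-submit \
  --master yarn \
  --conf spark.executor.memory=4g \
  --conf spark.sql.shuffle.partitions=200 \
  path/to/my-app.jar \
  appArg1 appArg2
```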
Best Regards
Debabrata Ghosh wrote on Thu, 31 Oct 2019 at
03:26:
> Greetings All !
>
> I needed some help in