[Spark SQL]: Dataframe group by potential bug (Scala)

2019-10-31 Thread ludwiggj
This is using Spark 2.4.4 with Scala. I'm seeing some very strange behaviour after reading a DataFrame from a JSON file using sparkSession.read in permissive mode. I've included the corrupt-record column when reading in the data, as I want to log details of any errors in the input JSON file. My suspicion
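A minimal sketch of the kind of read the poster describes, for context. The input path and the corrupt-record column name are assumptions, not from the thread; note that since Spark 2.3, filtering a file-sourced DataFrame on only the corrupt-record column raises an AnalysisException, and caching first is the documented workaround, which may relate to the "strange behaviour".

```scala
import org.apache.spark.sql.SparkSession

object PermissiveReadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("permissive-read")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical input path; "_corrupt_record" is the default name
    // controlled by spark.sql.columnNameOfCorruptRecord.
    val df = spark.read
      .option("mode", "PERMISSIVE")
      .option("columnNameOfCorruptRecord", "_corrupt_record")
      .json("input.json")

    // Cache before filtering on the corrupt-record column alone;
    // without this, Spark 2.3+ throws an AnalysisException.
    df.cache()
    df.filter(df("_corrupt_record").isNotNull).show(truncate = false)

    spark.stop()
  }
}
```

This sketch requires a Spark runtime and is not runnable standalone.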

Fwd: Delta with intelligent upsert

2019-10-31 Thread ayan guha
Hi, we have a scenario where we have a large table, i.e. 5-6B records. The table is a repository of data from the past N years. It is possible that some updates take place on the data, and thus we are using a Delta table. As part of the business process we know updates can happen only within M years of past
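One common way to exploit the "updates only touch the last M years" constraint is to add a date predicate to the Delta MERGE condition so file/partition pruning can skip older data. A hedged sketch using the Delta Lake Scala API; the table path, column names (`id`, `event_date`), and the horizon value standing in for the poster's M are all assumptions.

```scala
import io.delta.tables.DeltaTable

// Stand-in for the poster's M (years within which updates can occur).
val updateHorizonYears = 3

// `updates` is assumed to be a DataFrame of incoming changed rows.
val target = DeltaTable.forPath(spark, "/delta/events")

target.as("t")
  .merge(
    updates.as("u"),
    // The date predicate lets Delta prune files outside the update horizon
    // instead of scanning all N years of the 5-6B-row table.
    s"t.id = u.id AND t.event_date >= add_months(current_date(), -12 * $updateHorizonYears)")
  .whenMatched.updateAll()
  .whenNotMatched.insertAll()
  .execute()
```

This sketch requires a Spark runtime with the Delta Lake library and is not runnable standalone.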

Re: pyspark - memory leak leading to OOM after submitting 100 jobs?

2019-10-31 Thread Nicolas Paris
Have you deactivated the Spark UI? I have read several threads explaining that the UI can lead to OOM because it retains 1000 DAGs by default. On Sun, Oct 20, 2019 at 03:18:20AM -0700, Paul Wais wrote: > Dear List, > > I've observed some sort of memory leak when using pyspark to run ~100 > jobs in
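The retention the reply refers to is configurable. A config-fragment sketch of the relevant `spark-defaults.conf` settings; the specific values below are illustrative, not recommendations from the thread.

```properties
# Either disable the UI entirely on the long-running driver...
spark.ui.enabled                 false

# ...or keep it and lower the retention limits (defaults shown in comments):
spark.ui.retainedJobs            100     # default 1000
spark.ui.retainedStages          100     # default 1000
spark.ui.retainedTasks           10000   # default 100000
spark.sql.ui.retainedExecutions  100     # default 1000
```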

Re: pyspark - memory leak leading to OOM after submitting 100 jobs?

2019-10-31 Thread Paul Wais
Well, dumb question: Given the workflow outlined above, should Local Mode keep running? Or is the leak a known issue? I just wanted to check because I can't recall seeing this issue with a non-local master, though it's possible there were task failures that hid the issue. If this issue looks

[Spark Streaming] Apply multiple ML pipelines(Models) to the same stream

2019-10-31 Thread Spico Florin
Hello! I have a use case where I have to apply multiple already-trained models (e.g. M1, M2, ... Mn) to the same Spark stream (fetched from Kafka). The models were trained using the isolation forest algorithm from here: https://github.com/titicaca/spark-iforest I have found something similar
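One pattern for this is to score each micro-batch with every model inside `foreachBatch`. A hedged sketch; the Kafka broker, topic, model paths, and output paths are placeholders, and it assumes the iforest models were saved as standard `PipelineModel`s.

```scala
import org.apache.spark.ml.PipelineModel
import org.apache.spark.sql.DataFrame

// Load the already-trained models once, on the driver (paths are hypothetical).
val models = Seq("M1", "M2", "Mn").map(name => PipelineModel.load(s"/models/$name"))

val stream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("subscribe", "events")
  .load()

val query = stream.writeStream
  .foreachBatch { (batch: DataFrame, batchId: Long) =>
    // Every model scores the same micro-batch; each writes to its own sink.
    models.zipWithIndex.foreach { case (model, i) =>
      model.transform(batch).write.mode("append").parquet(s"/scores/model_$i")
    }
  }
  .start()
```

This sketch requires a Spark runtime with Kafka support and is not runnable standalone; in practice the Kafka value column would also need to be parsed into the feature schema the models expect.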

Re: Need help regarding logging / log4j.properties

2019-10-31 Thread Roland Johann
Hi Debu, you need to define Spark config properties before the jar file path in spark-submit. Everything after the jar path will be passed as arguments to your application. Best Regards. Debabrata Ghosh wrote on Thu, 31 Oct 2019 at 03:26: > Greetings All ! > > I needed some help in
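A config-fragment sketch of the ordering the reply describes; the class name, file names, and application arguments are placeholders.

```shell
# spark-submit reads options that appear BEFORE the application jar;
# everything AFTER the jar path is forwarded to the application's main().
spark-submit \
  --class com.example.Main \
  --files log4j.properties \
  --driver-java-options "-Dlog4j.configuration=file:log4j.properties" \
  app.jar appArg1 appArg2
```

Had `--driver-java-options` been placed after `app.jar`, it would have reached the application as an ordinary argument instead of configuring logging.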