Fwd: Recover RFormula Column Names

2019-10-28 Thread Andrew Redd
Hi All! I'm performing an econometric analysis over several billion rows of data and would like to use the PySpark SparkML implementation of linear regression. In the example below I'm trying to interact hour of day and month of year indicators. The StringIndexer documentation tells you what it's

[YarnShuffleService] Consistent OOMs when enabling Spark transport encryption

2019-10-28 Thread Anton Ippolitov
Hi, I have been experimenting with Spark 2.4.4 transport encryption and have encountered an issue with a couple of our jobs: they consistently make the YarnShuffleService die with OOM errors. It looks like the memory is full of io.netty.channel.ChannelOutboundBuffer$Entry objects each