Re: Recover RFormula Column Names

2019-10-30 Thread Alessandro Solimando
Glad to hear that, Andrew. While looking for the aforementioned SO answer, I stumbled upon a similar one for PySpark; it works, and since it is in Python you are also spared the "reflection" part. If you happen to try the RWrapperUtils it would be
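A minimal sketch of the PySpark approach, assuming the features column produced by RFormula carries the Spark ML attribute metadata under the `ml_attr` key (the layout such answers rely on). In PySpark, `df.schema[col].metadata` already returns that metadata as a plain dict, so no reflection is needed. The helper name `feature_names` and the example metadata below are hypothetical.

```python
from itertools import chain


def feature_names(metadata):
    """Return feature names ordered by their position in the vector.

    `metadata` is assumed to be the dict from df.schema[features_col].metadata.
    Spark ML groups attributes by type ("numeric", "binary", "nominal") under
    metadata["ml_attr"]["attrs"]; each entry has an "idx" and usually a "name".
    """
    attrs = metadata.get("ml_attr", {}).get("attrs", {})
    # Flatten the per-type lists into one sequence of attribute dicts.
    flat = chain.from_iterable(attrs.values())
    # Sort by vector index so position i in the result matches features[i].
    ordered = sorted(flat, key=lambda a: a["idx"])
    # Fall back to a placeholder when an attribute carries no name.
    return [a.get("name", "f{}".format(a["idx"])) for a in ordered]


if __name__ == "__main__":
    # Hypothetical metadata, shaped like an ML attribute group.
    example = {
        "ml_attr": {
            "attrs": {
                "numeric": [{"idx": 2, "name": "temp"}],
                "binary": [
                    {"idx": 0, "name": "hour_1"},
                    {"idx": 1, "name": "hour_2"},
                ],
            },
            "num_attrs": 3,
        }
    }
    print(feature_names(example))  # ['hour_1', 'hour_2', 'temp']
```

On a real transformed DataFrame this would be called as `feature_names(df.schema["features"].metadata)`, with `"features"` replaced by whatever the formula's features column is named.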

Re: Recover RFormula Column Names

2019-10-29 Thread Andrew Redd
Thanks Alessandro! That did the trick. All of the indices and interactions are in the metadata. I also wanted to confirm that this solution works in PySpark, as the metadata is carried over.

Andrew

On Tue, Oct 29, 2019 at 5:26 AM Alessandro Solimando <alessandro.solima...@gmail.com> wrote:
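Because the interaction terms land in the same metadata as the main effects, their names can be lined up with a fitted model's coefficients. A minimal sketch, assuming the `ml_attr` layout that Spark ML attaches to assembled feature vectors and that the indices run contiguously from 0; `coefficient_table` is a hypothetical helper, and `coefficients` would typically be `LinearRegressionModel.coefficients`.

```python
def coefficient_table(metadata, coefficients):
    """Pair each feature name with its fitted coefficient, in vector order.

    `metadata` is assumed to be df.schema[features_col].metadata; `coefficients`
    is any iterable of numbers (a pyspark DenseVector iterates as floats).
    """
    attrs = metadata.get("ml_attr", {}).get("attrs", {})
    # idx -> name across all attribute types (numeric, binary, nominal).
    names = {
        a["idx"]: a.get("name", "f{}".format(a["idx"]))
        for group in attrs.values()
        for a in group
    }
    coefs = [float(c) for c in coefficients]
    # Refuse to guess if metadata and model disagree on the feature layout.
    if set(names) != set(range(len(coefs))):
        raise ValueError("metadata indices do not match the coefficient vector")
    return [(names[i], c) for i, c in enumerate(coefs)]
```

A list of pairs is returned rather than a dict so that vector order is preserved and a repeated name cannot silently overwrite another coefficient.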

Re: Recover RFormula Column Names

2019-10-29 Thread Alessandro Solimando
Hello Andrew, a few years ago I had the same need, and I found this SO answer to be the way to go. Here is an extract of my (Scala) code (which was doing other things on top); I have removed the irrelevant parts but have not tested the result, so it might not work out

Fwd: Recover RFormula Column Names

2019-10-28 Thread Andrew Redd
Hi all! I'm performing an econometric analysis over several billion rows of data and would like to use the PySpark Spark ML implementation of linear regression. In the example below I'm trying to interact hour-of-day and month-of-year indicators. The StringIndexer documentation tells you what it's