Re: spark distribution build fails

2022-03-17 Thread Martin Grigorov
Hi, for the mail archives: this error happens when the user has the MAVEN_OPTS env var pre-exported. In this case ./build/mvn|sbt does not export its own MAVEN_OPTS with the -XssXYZ value, and the default JVM thread stack size is too low, which leads to the StackOverflowError. On Mon, Mar 14, 2022 at 11:13 PM
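
For anyone who wants to keep a pre-exported MAVEN_OPTS, here is a minimal sketch of the workaround the message implies: append an -Xss flag yourself when none is present. The helper name `with_stack_size` and its `xss_size` parameter are hypothetical. The thread does not state the value the build scripts use, so the size is left to the caller.

```python
import re

# Matches a JVM thread-stack-size flag such as -Xss<size>.
_XSS_FLAG = re.compile(r"(?:^|\s)-Xss\S+")


def with_stack_size(maven_opts, xss_size):
    """Return maven_opts with -Xss<xss_size> appended, unless an -Xss flag is already set.

    xss_size is supplied by the caller (hypothetical parameter); the thread
    does not say which value ./build/mvn would have exported.
    """
    opts = (maven_opts or "").strip()
    if _XSS_FLAG.search(opts):
        # The user already chose a stack size; leave it untouched.
        return opts
    flag = f"-Xss{xss_size}"
    return f"{opts} {flag}" if opts else flag
```

One way to use it is to set `os.environ["MAVEN_OPTS"] = with_stack_size(os.environ.get("MAVEN_OPTS"), size)` before launching the build from a wrapper script, with `size` chosen by you. Unsetting MAVEN_OPTS so the build script can apply its own defaults should work as well.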

Re: Continuous ML model training in stream mode

2022-03-17 Thread Sean Owen
(Thank you, not sure that was me though) I don't know of plans to expose the streaming impls in ML, as they still work fine in MLlib and they also don't come up much. Continuous training is relatively rare, maybe under-appreciated, but rare in practice. On Thu, Mar 17, 2022 at 1:57 PM Gourav

Re: Continuous ML model training in stream mode

2022-03-17 Thread Gourav Sengupta
Dear friends, a few years ago, I was at a London meetup watching Sean (Owen) demonstrate how we can try to predict the gender of individuals who are responding to tweets after accepting privacy agreements, if I am not mistaken. It was real time, it was spectacular, and it was the presentation

Re: [Pyspark] [Linear Regression] Can't Fit Data

2022-03-17 Thread Sean Owen
The error points you to the answer. Somewhere in your code you are parsing dates, and the date format is no longer valid / supported. These changes are documented in the docs it points you to. It is not related to the regression itself. On Thu, Mar 17, 2022 at 11:35 AM Bassett, Kenneth wrote: >
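
The error text is not quoted in this snippet, so the following is only a sketch under an assumption: that the failure comes from the Spark 3 datetime-pattern change, for which Spark provides the `spark.sql.legacy.timeParserPolicy` setting. The helper name `use_legacy_time_parser` is hypothetical, and `spark` is assumed to be an existing SparkSession.

```python
# Assumption: the failure is the Spark 3 datetime-pattern change.
# spark.sql.legacy.timeParserPolicy accepts EXCEPTION, CORRECTED, or LEGACY.
TIME_PARSER_POLICY_KEY = "spark.sql.legacy.timeParserPolicy"


def use_legacy_time_parser(spark, policy="LEGACY"):
    """Set the time parser policy on an existing SparkSession-like object.

    Hypothetical helper: it only calls spark.conf.set(key, value), so it
    works with any object exposing that method.
    """
    allowed = {"EXCEPTION", "CORRECTED", "LEGACY"}
    if policy not in allowed:
        raise ValueError(f"policy must be one of {sorted(allowed)}")
    spark.conf.set(TIME_PARSER_POLICY_KEY, policy)
    return policy
```

Restoring the legacy parser is a stopgap. Rewriting the date pattern to one that the current datetime pattern guide accepts is the longer-term fix the reply points toward.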

[Pyspark] [Linear Regression] Can't Fit Data

2022-03-17 Thread Bassett, Kenneth
Hello, I am having an issue with Linear Regression when trying to fit training data to the model. The code below used to work, but it stopped recently. Spark is version 3.2.1. # Split Data into train and test data train, test = data.randomSplit([0.9, 0.1]) y = 'Build_Rate' # Perform