MLeap and Spark ML SQLTransformer
I have a question about serializing a PySpark ML model to MLeap. The model uses SQLTransformer for some column-based transformations, e.g. adding log-scaled versions of some columns. MLeap doesn't support SQLTransformer — see here: https://github.com/combust/mleap/issues/126 — and that issue suggests two workarounds:

- For non-row operations, move the SQL out of the ML pipeline that you plan to serialize.
- For row-based operations, use the available ML transformers or write a custom transformer (this is where the custom transformer documentation helps).

I've implemented the first of these two suggestions: I've externalized the SQL transformation and applied it to the training data used to build the model, and I do the same for the input data when I run the model for evaluation. The problem I'm having is that I'm unable to obtain the same results across the two models:

*Model 1* - Pure Spark ML model containing the SQLTransformer plus the later transformations: SQLTransformer -> StringIndexer -> OneHotEncoderEstimator -> VectorAssembler -> RandomForestClassifier

*Model 2* - Externalized version, with the SQL queries run on the training data before building the model. The stages are everything after the SQLTransformer in Model 1: StringIndexer -> OneHotEncoderEstimator -> VectorAssembler -> RandomForestClassifier

I'm wondering how I could go about debugging this problem. Is there a way to somehow compare the results after each stage to see where the differences show up? Any suggestions are appreciated.

--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

To unsubscribe e-mail: user-unsubscr...@spark.apache.org
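One way to localize the divergence is to apply each fitted stage incrementally (a fitted `PipelineModel` exposes its stages via `model.stages`, and each stage has a `transform` method) and diff the intermediate DataFrames, e.g. with `DataFrame.exceptAll` on Spark 2.4+. A common source of mismatch in exactly this scenario is the log itself: Spark SQL's one-argument `LOG(x)` is the natural logarithm, so an externalized version written with base 10 will quietly diverge. Below is a minimal plain-Python sketch of the comparison idea (lists of dicts stand in for DataFrames; the data and the `diff_rows` helper are hypothetical illustrations, not part of the original post):

```python
import math

def diff_rows(rows_a, rows_b, key, tol=1e-9):
    """Return (key, column) pairs where numeric values differ beyond tol
    between two row sets joined on `key`."""
    index_b = {r[key]: r for r in rows_b}
    mismatches = []
    for ra in rows_a:
        rb = index_b.get(ra[key])
        if rb is None:
            mismatches.append((ra[key], "missing"))
            continue
        for col in ra:
            va, vb = ra[col], rb[col]
            if isinstance(va, float) and not math.isclose(va, vb, abs_tol=tol):
                mismatches.append((ra[key], col))
    return mismatches

# Hypothetical repro: SQLTransformer's LOG() is natural log (base e),
# but the externalized SQL used base-10 log for the derived column.
rows_sql = [{"id": 1, "x": 100.0, "log_x": math.log(100.0)}]
rows_ext = [{"id": 1, "x": 100.0, "log_x": math.log10(100.0)}]
print(diff_rows(rows_sql, rows_ext, "id"))  # -> [(1, 'log_x')]
```

The same row-by-row diff can be done natively on DataFrames (`df1.exceptAll(df2)`), applied after each stage until the first non-empty difference shows up.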
Re: 'ExecutorTaskSummary' alternative in Spark 2.3 onwards
Any advice/help here is much appreciated.

On Mon, Dec 30, 2019 at 4:16 PM Ninja Coder wrote:
> I have a Spark Streaming application (currently Spark 2.2) which is using
> `org.apache.spark.ui.exec.ExecutorTaskSummary` to grab executor details like
> duration, tasks failed, tasks completed, GC time, etc. after each batch is
> completed. These metrics are then loaded to Prometheus, and PagerDuty alerts
> are set on them.
>
> We are planning to upgrade Spark, and it looks like `ExecutorTaskSummary` is no
> longer available from Spark 2.3.
>
> I would like to know what other alternatives I can use.
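One supported alternative is Spark's monitoring REST API: the endpoint `/api/v1/applications/{app-id}/executors`, served by the driver UI, returns per-executor summaries whose fields (`totalDuration`, `failedTasks`, `completedTasks`, `totalGCTime`) cover the same metrics previously read from `ExecutorTaskSummary`. A minimal sketch of scraping it for Prometheus-style export (the function names and the sample JSON object are illustrative assumptions, not from the thread):

```python
import json
from urllib.request import urlopen

def executor_metrics(summary):
    """Map one ExecutorSummary JSON object from the REST API to the
    fields previously taken from ExecutorTaskSummary."""
    return {
        "executor_id": summary["id"],
        "duration_ms": summary["totalDuration"],
        "tasks_failed": summary["failedTasks"],
        "tasks_completed": summary["completedTasks"],
        "gc_time_ms": summary["totalGCTime"],
    }

def fetch_executor_metrics(ui_host, app_id):
    # The REST API is served by the driver's web UI (default port 4040),
    # or by the history server for completed applications.
    url = f"http://{ui_host}/api/v1/applications/{app_id}/executors"
    with urlopen(url) as resp:
        return [executor_metrics(s) for s in json.load(resp)]
```

For push-based use after each batch, the same numbers can also be accumulated in-process with a custom `SparkListener` (`onTaskEnd` carries task duration, GC time, and success/failure), which avoids polling the UI.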
Re: Java 11 support in Spark 2.5
From this thread (http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Spark-2-5-release-td27963.html#a27966), it looks like there is no confirmation yet that Spark 2.5 will happen at all, let alone have JDK 11 support. Spark 3 will most likely be out soon (tentatively this quarter, as per the mailing list), and Spark 3 is going to have JDK 11 support.

From: Sinha, Breeta (Nokia - IN/Bangalore)
Sent: Thursday, January 2, 2020 12:48 PM
To: user@spark.apache.org
Cc: Rao, Abhishek (Nokia - IN/Bangalore); Imandi, Srinivas (Nokia - IN/Bangalore)
Subject: Java 11 support in Spark 2.5

Hi All,

Wanted to know if Java 11 support is added in Spark 2.5. If so, what is the expected timeline for the Spark 2.5 release?

Kind Regards,
Breeta Sinha