MLeap and Spark ML SQLTransformer

2020-01-02 Thread femibyte

I have a question. I am trying to serialize a PySpark ML model to MLeap.
However, the model makes use of the SQLTransformer to do some column-based
transformations, e.g. adding log-scaled versions of some columns. MLeap
doesn't support SQLTransformer - see here:
https://github.com/combust/mleap/issues/126 - so I've implemented the first
of the two suggestions given there:

- For non-row operations, move the SQL out of the ML pipeline that you plan
  to serialize.
- For row-based operations, use the available ML transformers or write a
  custom transformer (this is where the custom transformer documentation
  helps).

I've externalized the SQL transformation on the training data used to build
the model, and I do the same for the input data when I run the model for
evaluation.

The problem I'm having is that I'm unable to obtain the same results from
the two models.

*Model 1* - Pure Spark ML pipeline containing:

   SQLTransformer -> StringIndexer -> OneHotEncoderEstimator ->
   VectorAssembler -> RandomForestClassifier

*Model 2* - Externalized version, with the SQL queries run on the training
data before building the model. The stages are everything after
SQLTransformer in Model 1:

   StringIndexer -> OneHotEncoderEstimator -> VectorAssembler ->
   RandomForestClassifier

I'm wondering how I could go about debugging this problem. Is there a way
to compare the results after each stage to see where the differences show
up? Any suggestions are appreciated.



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



unsubscribe

2020-01-02 Thread Amit Jain
unsubscribe


Re: 'ExecutorTaskSummary' alternative in Spark 2.3 onwards

2020-01-02 Thread Ninja Coder
Any advice/help here is much appreciated.

On Mon, Dec 30, 2019 at 4:16 PM Ninja Coder  wrote:

> I have a Spark Streaming application (currently Spark 2.2) which uses
> `org.apache.spark.ui.exec.ExecutorTaskSummary` to grab executor details
> such as duration, tasks failed, tasks completed, GC time, etc. after each
> batch completes. These metrics are then loaded into Prometheus, and
> PagerDuty alerts are set on them.
>
> We are planning to upgrade Spark, and it looks like `ExecutorTaskSummary`
> is no longer available from Spark 2.3 onward.
>
> I would like to know what alternatives I can use.
>
>
>
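For the archives: from Spark 2.3 onward, the same per-executor numbers are
exposed through the monitoring REST API endpoint
`/api/v1/applications/[app-id]/executors` instead of the internal
`ExecutorTaskSummary` class. A hedged sketch (the endpoint and field names
are from Spark's documented ExecutorSummary; the host/port and the choice
of fields to keep are assumptions):

```python
import json
from urllib.request import urlopen

def summarize_executors(executors_json):
    """Reduce the /executors payload to the fields the alerting cares
    about, keyed by executor id."""
    return {
        e["id"]: {
            "failedTasks": e["failedTasks"],
            "completedTasks": e["completedTasks"],
            "totalDuration": e["totalDuration"],
            "totalGCTime": e["totalGCTime"],
        }
        for e in executors_json
    }

def fetch_executor_summary(app_id, host="http://localhost:4040"):
    """Pull executor metrics from a running application's REST API."""
    url = f"{host}/api/v1/applications/{app_id}/executors"
    with urlopen(url) as resp:  # network call; needs a live Spark app
        return summarize_executors(json.load(resp))
```

The returned dict can be pushed to Prometheus after each batch, much as
the old `ExecutorTaskSummary` values were; a SparkListener attached via
`spark.extraListeners` is another option if polling is undesirable.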


Re: Java 11 support in Spark 2.5

2020-01-02 Thread Jatin Puri
From this thread
(http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Spark-2-5-release-td27963.html#a27966),
it looks like there is no confirmation yet that Spark 2.5 will have JDK 11
support.

Spark 3 will most likely be out soon (tentatively this quarter, per the
mailing list), and Spark 3 is going to have JDK 11 support.

From: Sinha, Breeta (Nokia - IN/Bangalore) 
Sent: Thursday, January 2, 2020 12:48 PM
To: user@spark.apache.org 
Cc: Rao, Abhishek (Nokia - IN/Bangalore); Imandi, Srinivas (Nokia - IN/Bangalore)
Subject: Java 11 support in Spark 2.5


Hi All,

Wanted to know if Java 11 support has been added in Spark 2.5. If so, what
is the expected timeline for the Spark 2.5 release?

Kind Regards,

Breeta Sinha