Re: I have trained a ML model, now what?

2019-01-23 Thread Pola Yao
Hi Riccardo, Right now, Spark does not support low-latency predictions in production. MLeap is an alternative and has been used in many scenarios. But it's good to see that the Spark community has decided to provide such support. On Wed, Jan 23, 2019 at 7:53 AM Riccardo Ferrari wrote: > Felix,
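For reference, a minimal sketch of exporting a fitted Spark pipeline as an MLeap bundle for low-latency serving, assuming the mleap-spark dependency is on the classpath; pipelineModel and df are hypothetical placeholders:

'''
import ml.combust.bundle.BundleFile
import ml.combust.mleap.spark.SparkSupport._
import org.apache.spark.ml.bundle.SparkBundleContext
import resource._

// Bind the transformed dataset so MLeap can capture the pipeline's schemas.
val sbc = SparkBundleContext().withDataset(pipelineModel.transform(df))

// Serialize the fitted pipeline to a zip bundle that MLeap can serve without Spark.
for (bundle <- managed(BundleFile("jar:file:/tmp/model.zip"))) {
  pipelineModel.writeBundle.save(bundle)(sbc).get
}
'''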

Re: How to force-quit a Spark application?

2019-01-22 Thread Pola Yao
anager" > thread, but I don't see that one in your list. > > On Wed, Jan 16, 2019 at 12:08 PM Pola Yao wrote: > > > > Hi Marcelo, > > > > Thanks for your response. > > > > I have dumped the threads on the server where I submitted the spark > applica

Re: How to force-quit a Spark application?

2019-01-16 Thread Pola Yao
AM Marcelo Vanzin wrote: > If System.exit() doesn't work, you may have a bigger problem > somewhere. Check your threads (using e.g. jstack) to see what's going > on. > > On Wed, Jan 16, 2019 at 8:09 AM Pola Yao wrote: > > > > Hi Marcelo, > > > > Thanks for

Re: How to force-quit a Spark application?

2019-01-16 Thread Pola Yao
if > something is creating a non-daemon thread that stays alive somewhere, > you'll see that. > > Or you can force quit with sys.exit. > > On Tue, Jan 15, 2019 at 1:30 PM Pola Yao wrote: > > > > I submitted a Spark job through ./spark-submit command, the code was > exe
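A tiny illustration (not from the thread) of the failure mode being described, where a non-daemon thread blocks JVM exit and sys.exit forces termination anyway:

'''
// A non-daemon thread (the default) keeps the JVM alive after main() returns:
val t = new Thread(new Runnable {
  override def run(): Unit = Thread.sleep(Long.MaxValue)
})
t.start()

// sys.exit (i.e. System.exit) runs shutdown hooks and then terminates the JVM
// regardless of any live non-daemon threads:
sys.exit(0)
'''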

How to force-quit a Spark application?

2019-01-15 Thread Pola Yao
I submitted a Spark job through the ./spark-submit command; the code executed successfully, but the application got stuck when trying to quit Spark. My code snippet: ''' { val spark = SparkSession.builder.master(...).getOrCreate val pool = Executors.newFixedThreadPool(3) implicit val xc =
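The snippet is cut off in the archive, but the symptom is consistent with a fixed thread pool whose non-daemon worker threads outlive the job. A hedged reconstruction of the likely fix, assuming xc was an ExecutionContext built from the pool:

'''
import java.util.concurrent.Executors
import scala.concurrent.ExecutionContext

val pool = Executors.newFixedThreadPool(3)
implicit val xc: ExecutionContext = ExecutionContext.fromExecutorService(pool)

try {
  // ... the parallel work submitted to xc goes here ...
} finally {
  pool.shutdown() // let the worker threads die so the JVM can exit
  spark.stop()
}
'''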

[Spark-ml] Error in training ML models: Missing an output location for shuffle xxx

2019-01-07 Thread Pola Yao
Hi Spark Community, I was using XGBoost-spark to train a machine learning model. The dataset was not large (around 1 GB). I used the following command to submit my application: ''' ./bin/spark-submit --master yarn --deploy-mode client --num-executors 50 --executor-cores 2 --executor-memory 3g
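"Missing an output location for shuffle" usually appears after executors have been lost, often because a container exceeded its YARN memory limit. A sketch of settings that commonly help on Spark 2.3+; the values are illustrative, not from the thread:

'''
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder
  .appName("xgboost-training")
  // Extra off-heap headroom per executor so YARN does not kill the container:
  .config("spark.executor.memoryOverhead", "1g")
  // More tolerance for slow shuffle fetches before tasks fail:
  .config("spark.network.timeout", "300s")
  .getOrCreate()
'''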

[spark-ml] How to write a Spark Application correctly?

2019-01-02 Thread Pola Yao
Hello Spark Community, I have a dataset of size 20 GB with 20 columns. Each column is categorical, so I applied a StringIndexer and one-hot encoding to every column. After that, I applied a VectorAssembler to all the newly derived columns to form a feature vector for each record, and then fed the feature
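A minimal sketch of the pipeline being described, assuming Spark 2.3/2.4 and a DataFrame df in which every column is categorical:

'''
import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.feature.{OneHotEncoderEstimator, StringIndexer, VectorAssembler}

val cols = df.columns

// One StringIndexer per categorical column.
val indexers: Array[PipelineStage] = cols.map { c =>
  new StringIndexer().setInputCol(c).setOutputCol(s"${c}_idx")
}

// One-hot encode all indexed columns in a single stage.
val encoder = new OneHotEncoderEstimator()
  .setInputCols(cols.map(c => s"${c}_idx"))
  .setOutputCols(cols.map(c => s"${c}_vec"))

// Assemble the encoded vectors into one feature vector per record.
val assembler = new VectorAssembler()
  .setInputCols(cols.map(c => s"${c}_vec"))
  .setOutputCol("features")

val model = new Pipeline()
  .setStages(indexers ++ Array(encoder, assembler))
  .fit(df)
'''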

Fwd: Train multiple machine learning models in parallel

2018-12-19 Thread Pola Yao
Hi Community, I have a 1 TB dataset which contains records for 50 users, about 20 GB per user on average. I wanted to use Spark to train a machine learning model (e.g., an XGBoost tree model) for each user; ideally, the result would be 50 models. However, it'd be infeasible to submit 50 Spark jobs
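One common pattern for this inside a single application is to run the per-user fits concurrently on the driver with Futures, letting Spark's scheduler interleave the resulting jobs. A rough sketch under that assumption; the user column and the stand-in estimator (LogisticRegression in place of XGBoost) are hypothetical, and each subset is assumed to already carry features/label columns:

'''
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.sql.functions.col

// Bound the driver-side concurrency; Spark interleaves the submitted jobs.
val pool = Executors.newFixedThreadPool(4)
implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)

val users = df.select("user").distinct.collect.map(_.getString(0))

val futures = users.toSeq.map { u =>
  Future {
    val subset = df.filter(col("user") === u).cache()
    val model = new LogisticRegression().fit(subset) // stand-in estimator
    subset.unpersist()
    u -> model
  }
}

val models = Await.result(Future.sequence(futures), Duration.Inf).toMap
pool.shutdown() // release the non-daemon worker threads
'''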