UI for spark machine learning.

2017-07-09 Thread Mahesh Sawaiker
Hi, 1) Is anyone aware of any workbench-style tool for running ML jobs in Spark? Specifically, the tool could be something like a web application that is configured to connect to a Spark cluster. The user would be able to select input training sets, probably from HDFS, train, and then run predictions,

spark-submit via cluster mode - setting dependencies classpath!

2017-07-09 Thread Kanagha
Hi, I'm trying to run a Phoenix Spark job in Spark cluster mode on a remote YARN cluster. When I do a spark-submit, all jars under SPARK_HOME get uploaded. I also need to point to the remote HBase jar folder location and other dependencies for running the job. Going through the docs, I see

Re: How do I find the time taken by each step in a stage in a Spark Job

2017-07-09 Thread swetha kasireddy
Yes, the Spark UI has some information, but it's not that helpful for finding out which particular stage is taking time. On Wed, Jun 28, 2017 at 12:51 AM, 萝卜丝炒饭 <1427357...@qq.com> wrote: > You can find the information from the Spark UI > > ---Original--- > *From:* "SRK" >

Spark streaming giving me a bunch of WARNINGS, please help me understand them

2017-07-09 Thread shyla deshpande
WARN Use an existing SparkContext, some configuration may not take effect. I wanted to restart the Spark Streaming app, so I stopped the running one and issued a new spark-submit. Why and how would it use an existing SparkContext? WARN Spark is not running in local mode, therefore the

Re: Spark streaming, Storage tab questions

2017-07-09 Thread anna stax
On Sun, Jul 9, 2017 at 4:33 PM, anna stax wrote: > Does each row represent the state of my app at a different time? > > When the fraction cached is 90% and the size on disk is 0, does that mean > 10% of the data is lost? It's neither in memory nor on disk? > > I am running spark

PySpark saving custom pipelines

2017-07-09 Thread Riccardo Ferrari
Hi list, I have developed some custom Transformers/Estimators in Python around some libraries (scipy). Now I would love to persist them for reuse in a streaming app. I am currently aware of this: https://issues.apache.org/jira/browse/SPARK-17025 However, I would like to hear from experienced