Re: Lag and queued up batches info in Structured Streaming UI

2018-06-27 Thread swetha kasireddy
Thanks TD, but the SQL plan does not seem to provide any information on which stage is taking longer, or help identify bottlenecks across the various stages. Spark Kafka Direct used to provide information about the various stages in a micro-batch and the time taken by each stage. Is there a way to

Re: Not able to overwrite cassandra table using Spark

2018-06-27 Thread Siva Samraj
You can try this; it should work: val finaldf = merchantdf.write.format("org.apache.spark.sql.cassandra").mode(SaveMode.Overwrite).option("confirm.truncate", true).options(Map("table" -> "tablename", "keyspace" -> "keyspace")).save() On Wed 27 Jun,

Not able to overwrite cassandra table using Spark

2018-06-27 Thread Abhijeet Kumar
Hello Team, I’m creating a DataFrame from a Cassandra table using the DataStax Spark connector. After making some modifications to the DataFrame, I’m trying to write it back to the Cassandra table, overwriting the old content. The code for that is:

Re: Live Streamed Code Review today at 11am Pacific

2018-06-27 Thread Holden Karau
Today @ 1:30pm Pacific I'll be looking at the current Spark 2.1.3 RC and seeing how we validate Spark releases - https://www.twitch.tv/events/VAg-5PKURQeH15UAawhBtw / https://www.youtube.com/watch?v=1_XLrlKS26o . Tomorrow @ 12:30 live PR reviews & Monday live coding -

Semi-Supervised self-training (e.g. partial fitting)

2018-06-27 Thread Mina Aslani
Hi, Is partial fitting/self-training available for a classifier (e.g. Regression) in Apache Spark? Best regards, Mina

Re: submitting dependencies

2018-06-27 Thread Jean Georges Perrin
Have you tried building an uber jar to bundle all your classes together? > On Jun 27, 2018, at 01:27, amin mohebbi > wrote: > > Could you please help me to understand how I should submit my spark > application ? > > I have used this connector

[PYSPARK Word2Vec] Error when loading Word2Vec before calling SparkSession

2018-06-27 Thread tgiordan
Hi, I apparently found a bug in pyspark v2.3.1 when loading a Word2VecModel before using SparkSession (for example: spark.read.parquet("df_tmp")). This bug happens in notebooks and pyspark-shell (not with spark-submit when you need to declare 'spark =

Re: [ClusterMode] -Dspark.master with missing secondary master IP

2018-06-27 Thread bsikander
We switched the port from 7077 to 6066 because we were losing 20 seconds each time we launched a driver. 10 seconds for failing to submit the driver on :7077. After losing 20 seconds, it used to fall back to some old way of driver submission. With 6066 we don't lose any time. -- Sent from:

[ClusterMode] -Dspark.master with missing secondary master IP

2018-06-27 Thread bsikander
We recently transitioned from client mode to cluster mode with a Spark Standalone deployment. We are using 2.2.1. We are also using SparkLauncher to launch the driver. The problem is that when my driver is launched, the spark.master property (-Dspark.master) is set to only the primary master IP.

[ANNOUNCE] Apache Bahir 2.2.1 Released

2018-06-27 Thread Luciano Resende
Apache Bahir provides extensions to multiple distributed analytic platforms, extending their reach with a diversity of streaming connectors and SQL data sources. The Apache Bahir community is pleased to announce the release of Apache Bahir 2.2.1 which provides the following extensions for Apache