[Spark structured streaming] Use of (flat)MapGroupsWithState takes a long time

2018-01-18 Thread chris-sw
Hi, I recently did some experiments with stateful structured streaming using flatMapGroupsWithState. The streaming application is quite simple: it receives data from Kafka, feeds it to the stateful operator (flatMapGroupsWithState), and sinks the output to the console. During a test with small

Re: Unsubscribe

2018-01-18 Thread Yash Sharma
Please send mail to user-unsubscr...@spark.apache.org to unsubscribe. Cheers On Fri., 19 Jan. 2018, 5:11 pm Anu B Nair, wrote:

Re: Unsubscribe

2018-01-18 Thread Yash Sharma
Please send mail to user-unsubscr...@spark.apache.org to unsubscribe. Cheers On Fri., 19 Jan. 2018, 5:28 pm Sbf xyz, wrote:

Unsubscribe

2018-01-18 Thread Sbf xyz

Unsubscribe

2018-01-18 Thread Anu B Nair

[Structured Streaming]: Structured Streaming into Redshift sink

2018-01-18 Thread Somasundaram Sekar
Is it possible to write a DataFrame backed by a Kafka streaming source into AWS Redshift? We have in the past used https://github.com/databricks/spark-redshift to write into Redshift, but I presume it will not work with *writeStream*. Also writing with JDBC connector with ForeachWriter is also may

Re: "Got wrong record after seeking to offset" issue

2018-01-18 Thread Justin Miller
Yeah, I saw that after I sent that e-mail out. I actually remembered another ticket that I had commented on that turned out to be unrelated to the issue I was seeing at the time. It may be related to the current issue: https://issues.apache.org/jira/browse/SPARK-17147

Re: "Got wrong record after seeking to offset" issue

2018-01-18 Thread Cody Koeninger
https://kafka.apache.org/documentation/#compaction On Thu, Jan 18, 2018 at 1:17 AM, Justin Miller wrote: > By compacted do you mean compression? If so, then we did recently turn on lz4 compression. If there’s another meaning, if there’s a command I can run to
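Cody's link is about Kafka log compaction, which is separate from compression: on a compacted topic Kafka retains at least the latest record for each key, older records with the same key can be removed, and the surviving records keep their original offsets. The offsets a consumer reads can therefore have gaps, so code that assumes offset n+1 always follows offset n may land on a different record than expected, which would be consistent with the "wrong record after seeking to offset" symptom in this thread. The sketch below is a simplified model written for illustration (not Kafka code); the function names and sample log are hypothetical, and tombstones and segment boundaries are ignored.

```python
# Simplified model of Kafka key-based log compaction (illustration only, not Kafka code).
# Assumptions: the whole log is eligible for compaction; tombstones (null values)
# and segment boundaries are ignored.


def compact(log):
    """Keep only the latest record per key; survivors keep their original offsets.

    `log` is a list of (offset, key, value) tuples in increasing offset order.
    """
    latest_offset = {}
    for offset, key, _value in log:
        latest_offset[key] = offset  # later records overwrite earlier ones
    return [rec for rec in log if latest_offset[rec[1]] == rec[0]]


def offset_gaps(records):
    """List the offsets missing between consecutive surviving records."""
    gaps = []
    for prev, cur in zip(records, records[1:]):
        gaps.extend(range(prev[0] + 1, cur[0]))
    return gaps


if __name__ == "__main__":
    log = [
        (0, "k1", "v1"),
        (1, "k2", "v1"),
        (2, "k2", "v2"),
        (3, "k3", "v1"),
        (4, "k2", "v3"),
    ]
    kept = compact(log)
    print([offset for offset, _, _ in kept])
    print(offset_gaps(kept))
```

In this toy log the compacted result keeps offsets 0, 3 and 4, so offsets 1 and 2 no longer exist even though nothing was corrupted.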

Re: Reading Hive RCFiles?

2018-01-18 Thread Michael Segel
No idea how that last line of garbage got into the message. On Jan 18, 2018, at 9:32 AM, Michael Segel wrote: > Hi, I’m trying to find out if there’s a simple way for Spark to be able to read an RCFile. I know I can create a table in Hive, then

Reading Hive RCFiles?

2018-01-18 Thread Michael Segel
Hi, I’m trying to find out if there’s a simple way for Spark to be able to read an RCFile. I know I can create a table in Hive, then drop the files into that directory and use a SQL context to read the file from Hive; however, I wanted to read the file directly. Not a lot of details to go

Re: Spark Stream is corrupted

2018-01-18 Thread KhajaAsmath Mohammed
Any solutions for this problem, please? Sent from my iPhone On Jan 17, 2018, at 10:39 PM, KhajaAsmath Mohammed wrote: > Hi, I have created a streaming object from a checkpoint, but it always throws an error saying the stream is corrupted when I restart Spark

Re: StreamingLogisticRegressionWithSGD : Multiclass Classification : Options

2018-01-18 Thread Patrick McCarthy
As a hack, you could train a number of one-vs.-all classifiers and then, post hoc, assign each example the class whose classifier gives the highest predicted probability. On Thu, Jan 18, 2018 at 12:17 AM, Sundeep Kumar Mehta wrote: > Hi, I was looking for Logistic Regression with Multi Class
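The one-vs.-all workaround can be sketched in plain Python. This is a minimal illustration of the technique, not StreamingLogisticRegressionWithSGD or any other Spark API: one binary logistic-regression model is trained per class on "this class vs. everything else", and the post-hoc step assigns the class whose model reports the highest probability. The helper names, learning rate, epoch count, and toy data are all assumptions chosen for the example.

```python
import math

# Hypothetical toy data: three 2-D clusters labelled 0, 1 and 2.
TOY_X = [(-3, -3), (-2, -3), (-3, -2), (3, -3), (2, -3), (3, -2), (0, 3), (1, 3), (-1, 3)]
TOY_Y = [0, 0, 0, 1, 1, 1, 2, 2, 2]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_binary(xs, labels, lr=0.1, epochs=500):
    """Batch gradient descent on the logistic loss; labels are 0/1. Returns (weights, bias)."""
    dim = len(xs[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        gw = [0.0] * dim
        gb = 0.0
        for x, y in zip(xs, labels):
            err = y - sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            for j in range(dim):
                gw[j] += err * x[j]
            gb += err
        # Step along the averaged gradient of the log-likelihood.
        w = [wi + lr * gj / len(xs) for wi, gj in zip(w, gw)]
        b += lr * gb / len(xs)
    return w, b


def train_one_vs_rest(xs, ys, num_classes):
    """One binary model per class k, trained on 'is class k' vs. 'anything else'."""
    return [train_binary(xs, [1 if y == k else 0 for y in ys]) for k in range(num_classes)]


def predict(models, x):
    """Post-hoc step: pick the class whose binary model gives the highest probability."""
    probs = [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) for w, b in models]
    return max(range(len(probs)), key=lambda k: probs[k])


if __name__ == "__main__":
    models = train_one_vs_rest(TOY_X, TOY_Y, num_classes=3)
    print([predict(models, x) for x in TOY_X])
```

Batch training is used here only to keep the sketch dependency-free; in a streaming job each of the K binary models would instead be updated incrementally as batches arrive, with the same argmax step at prediction time.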

Re: Writing data in HDFS highly available cluster

2018-01-18 Thread Subhash Sriram
Hi Soheil, We have a high availability cluster as well, but I never have to specify the active master when writing, only the cluster name. It works regardless of which node is the active master. Hope that helps. Thanks, Subhash Sent from my iPhone > On Jan 18, 2018, at 5:49 AM, Soheil

Writing data in HDFS highly available cluster

2018-01-18 Thread Soheil Pourbafrani
I have an HDFS highly available cluster with two NameNodes, one active and one standby. When I want to write data to HDFS, I use the active NameNode's address. Now, my question is: what happens if the active NameNode fails while Spark is writing data? Is there any way to set both active

Re: Good material to learn Apache Spark

2018-01-18 Thread Marco Mistroni
Jacek Laskowski, who is on this mailing list, wrote a book which is available online. HTH On Jan 18, 2018 6:16 AM, "Manuel Sopena Ballesteros" <manuel...@garvan.org.au> wrote: > Dear Spark community, I would like to learn more about Apache Spark. I have a Hortonworks HDP platform and have

Do Spark and Hive use the same SQL parser (ANTLR)?

2018-01-18 Thread Pralabh Kumar
Hi, Do Hive and Spark use the same SQL parser provided by ANTLR? Do they generate the same logical plan? Please help. Regards, Pralabh Kumar

Re: Spark application on yarn cluster clarification

2018-01-18 Thread Fawze Abujaber
Hi Soheil, ResourceManager and NodeManager are enough; of course, you need the DataNode and NameNode roles to be able to access the data. On Thu, 18 Jan 2018 at 10:12 Soheil Pourbafrani wrote: > I am setting up a YARN cluster to run Spark applications on it, but I'm

Writing to Redshift from Kafka Streaming source

2018-01-18 Thread Somasundaram Sekar
Hi, Is it possible to write a DataFrame backed by a Kafka streaming source into AWS Redshift? We have in the past used https://github.com/databricks/spark-redshift to write into Redshift, but I presume it will not work with DataFrame##writeStream(). Also writing with JDBC connector with

Spark application on yarn cluster clarification

2018-01-18 Thread Soheil Pourbafrani
I am setting up a YARN cluster to run Spark applications on it, but I'm a bit confused! Suppose I have a 4-node YARN cluster with one ResourceManager and three NodeManagers, and Spark is installed on all 4 nodes. Now my question is: when I want to submit a Spark application to the YARN cluster, is

spark linear regression model fit result is different from statsmodels linear model.

2018-01-18 Thread TonyHu
I want to do a VIF check on Spark, so I have to get R^2 (coefficient of determination) from the linear regression model. But the result is very different from the R^2 I get from the statsmodels linear model. I don't know why. -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
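For context on why the two numbers can disagree: the VIF of a predictor x_j is 1/(1 - R_j^2), where R_j^2 comes from regressing x_j on the remaining predictors, so whichever definition of R^2 a library uses flows straight into the VIF. One common source of mismatch, which may or may not be the cause here since the code wasn't posted: statsmodels' OLS only fits an intercept if the design matrix contains a constant column (for example via add_constant), and for a model without a constant it reports an uncentered R^2, whereas Spark's LinearRegression fits an intercept by default (fitIntercept=True) and reports the usual centered R^2. The plain-Python sketch below computes both versions on the same data; the helper names and toy data are hypothetical, and it handles a single regressor only.

```python
# Toy illustration (plain Python, not Spark or statsmodels code):
# centered vs. uncentered R^2 for a single regressor, and VIF = 1 / (1 - R^2).


def r2_with_intercept(x, y):
    """Fit y = a + b*x by least squares and return the usual (centered) R^2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    ssr = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    centered_tss = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ssr / centered_tss


def r2_through_origin(x, y):
    """Fit y = b*x (no intercept) and return the uncentered R^2."""
    b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    ssr = sum((yi - b * xi) ** 2 for xi, yi in zip(x, y))
    uncentered_tss = sum(yi * yi for yi in y)
    return 1.0 - ssr / uncentered_tss


def vif(r2):
    """Variance inflation factor from the R^2 of regressing x_j on the other predictors."""
    return 1.0 / (1.0 - r2)


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    y = [2.0, 1.0, 4.0, 3.0]
    for name, r2 in [("intercept", r2_with_intercept(x, y)), ("no intercept", r2_through_origin(x, y))]:
        print(name, round(r2, 4), round(vif(r2), 4))
```

On x = [1, 2, 3, 4] and y = [2, 1, 4, 3], the intercept fit gives R^2 = 0.36 (VIF = 1.5625), while the through-origin fit gives an uncentered R^2 of 784/900, about 0.871 (VIF about 7.76), so comparing like with like (same intercept setting, same R^2 definition) is a reasonable first check.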