Re: Why this code is errorfull

2017-12-13 Thread Soheil Pourbafrani
Also, I should mention that the `stream` DStream definition is: JavaInputDStream<...> stream = KafkaUtils.createDirectStream(ssc, LocationStrategies.PreferConsistent(), ConsumerStrategies.Subscribe(TOPIC, kafkaParams));
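For reference, a minimal Scala sketch of an equivalent direct-stream setup with the spark-streaming-kafka-0-10 API; the deserializers, broker address, group id and topic name are placeholders, since the original post does not show them, and ssc is an existing StreamingContext:

    import org.apache.kafka.clients.consumer.ConsumerRecord
    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.streaming.dstream.InputDStream
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    // Placeholder Kafka settings.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "localhost:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "example-group"
    )
    val topics = Array("TOPIC")

    val stream: InputDStream[ConsumerRecord[String, String]] =
      KafkaUtils.createDirectStream[String, String](
        ssc,
        PreferConsistent,
        Subscribe[String, String](topics, kafkaParams)
      )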

Why this code is errorfull

2017-12-13 Thread Soheil Pourbafrani
The following code is in Spark Streaming: JavaInputDStream results = stream.map(record -> SparkTest.getTime(record.value()) + ":" + Long.toString(System.currentTimeMillis()) + ":" + Arrays.deepToString(SparkTest.finall(record.value())) + ":" +
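If the snippet is complete as shown, one likely cause of the compile error is the result type: map() on an input DStream returns a plain mapped DStream, so the result cannot be declared as a JavaInputDStream. A hedged Scala sketch of that point, reusing the names from the post (`stream`, SparkTest.getTime and SparkTest.finall are assumed to exist as in the original code):

    import java.util.Arrays
    import org.apache.spark.streaming.dstream.DStream

    // Mapping the Kafka input stream yields a plain DStream[String];
    // that is the type the result should be declared as.
    val results: DStream[String] = stream.map { record =>
      SparkTest.getTime(record.value()) + ":" +
        System.currentTimeMillis().toString + ":" +
        Arrays.deepToString(SparkTest.finall(record.value()))
    }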

bulk upsert data batch from Kafka dstream into Postgres db

2017-12-13 Thread salemi
Hi all, we are consuming messages from Kafka using a Spark DStream. Once the processing is done, we would like to update/insert the data in bulk into the database. I was wondering what the best solution for this might be. Our Postgres database table is not partitioned. Thank you, Ali
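One common pattern (a sketch, not taken from this thread) is to write each partition in a single JDBC batch and let Postgres do the upsert with INSERT ... ON CONFLICT; the table name, columns, credentials and the element type of the stream are all placeholders:

    import java.sql.DriverManager
    import org.apache.spark.streaming.dstream.DStream

    // Assumes a DStream of (id, payload) pairs; adjust to the real record type.
    def upsertStream(dstream: DStream[(Long, String)]): Unit = {
      dstream.foreachRDD { rdd =>
        rdd.foreachPartition { partition =>
          // One connection and one batched statement per partition, not per row.
          val conn = DriverManager.getConnection(
            "jdbc:postgresql://dbhost:5432/mydb", "user", "password")
          conn.setAutoCommit(false)
          val stmt = conn.prepareStatement(
            "INSERT INTO events (id, payload) VALUES (?, ?) " +
              "ON CONFLICT (id) DO UPDATE SET payload = EXCLUDED.payload")
          try {
            partition.foreach { case (id, payload) =>
              stmt.setLong(1, id)
              stmt.setString(2, payload)
              stmt.addBatch()
            }
            stmt.executeBatch()
            conn.commit()
          } finally {
            stmt.close()
            conn.close()
          }
        }
      }
    }

For very large batches, writing into a staging table with COPY and merging from there is another option.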

spark streaming with flume: cannot assign requested address error

2017-12-13 Thread Junfeng Chen
I am trying to connect Spark Streaming with Flume in pull mode. I have three machines, and each one runs Spark and a Flume agent at the same time; they are master, slave1, and slave2. I have set the Flume sink to slave1 on port 6689. Telnet to slave1 6689 from the other two machines works well. In my code, I
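For reference, a minimal sketch of the pull-based receiver described above (spark-streaming-flume); ssc is an existing StreamingContext, and the host/port must match the Spark sink configured in the Flume agent on slave1:

    import org.apache.spark.streaming.flume.FlumeUtils

    // Polling (pull) mode: Spark connects to the Flume SparkSink on slave1:6689.
    val flumeStream = FlumeUtils.createPollingStream(ssc, "slave1", 6689)
    flumeStream
      .map(event => new String(event.event.getBody.array()))
      .print()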

How to control logging in testing package com.holdenkarau.spark.testing.

2017-12-13 Thread Marco Mistroni
Hi all, could anyone advise on how to control logging in com.holdenkarau.spark.testing? There are loads of Spark logging statements every time I run a test. I tried to disable Spark logging using the statements below, but with no success: import org.apache.log4j.Logger import
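One approach that usually works, sketched below (not specific to spark-testing-base): raise the log4j level for the noisy packages before the SparkContext is created, for example in the suite's beforeAll(); a log4j.properties under src/test/resources achieves the same.

    import org.apache.log4j.{Level, Logger}

    // Silence Spark's INFO chatter during tests; the package names are the
    // usual noisy ones and can be extended as needed.
    Logger.getLogger("org.apache.spark").setLevel(Level.WARN)
    Logger.getLogger("org.apache.hadoop").setLevel(Level.WARN)
    Logger.getLogger("org.spark_project").setLevel(Level.WARN)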

is Union or Join Supported for Spark Structured Streaming Queries in 2.2.0?

2017-12-13 Thread kant kodali
Hi all, I have messages in a queue that might be coming in with a few different schemas, e.g. msg1 schema1, msg2 schema2, msg3 schema3, msg4 schema1. I want to put all of this in one DataFrame. Is it possible with Structured Streaming? I am using Spark 2.2.0. Thanks!
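A hedged sketch of one way this is often handled: read the raw Kafka values once, parse them against each schema with from_json, project a common set of columns, and union the streaming DataFrames (union of streaming Datasets is supported in 2.2.0). The topic, schemas and column names below are made up:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.from_json
    import org.apache.spark.sql.types._

    val spark = SparkSession.builder.appName("multi-schema").getOrCreate()
    import spark.implicits._

    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "mytopic")
      .load()
      .selectExpr("CAST(value AS STRING) AS json")

    // Two made-up schemas sharing an `id` field.
    val schema1 = new StructType().add("id", LongType).add("a", StringType)
    val schema2 = new StructType().add("id", LongType).add("b", DoubleType)

    // Parse with each schema and keep a common projection before the union.
    val parsed1 = raw.select(from_json($"json", schema1).alias("m")).select($"m.id")
    val parsed2 = raw.select(from_json($"json", schema2).alias("m")).select($"m.id")

    val all = parsed1.union(parsed2)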

Re: Apache Spark documentation on mllib's Kmeans doesn't jibe.

2017-12-13 Thread Scott Reynolds
The train method is on the companion object: https://spark.apache.org/docs/2.1.0/api/scala/index.html#org.apache.spark.mllib.clustering.KMeans$ Here is a decent resource on companion object usage: https://docs.scala-lang.org/tour/singleton-objects.html
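A minimal sketch of calling train on the KMeans companion object (the RDD-based mllib API); the data and parameters below are made up, and sc is an existing SparkContext:

    import org.apache.spark.mllib.clustering.KMeans
    import org.apache.spark.mllib.linalg.Vectors

    // Toy data: two obvious clusters.
    val data = sc.parallelize(Seq(
      Vectors.dense(0.0, 0.0), Vectors.dense(0.1, 0.1),
      Vectors.dense(9.0, 9.0), Vectors.dense(9.1, 9.1)
    ))

    // train is defined on the companion object, not on a KMeans instance.
    val model = KMeans.train(data, k = 2, maxIterations = 20)
    model.clusterCenters.foreach(println)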

Re: Why do I see five attempts on my Spark application

2017-12-13 Thread sanat kumar Patnaik
It should be within your yarn-site.xml config file. The parameter name is yarn.resourcemanager.am.max-attempts. The directory should be /usr/lib/spark/conf/yarn-conf. Try to find this directory on your gateway node if you are using a Cloudera distribution.
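As a per-application alternative to the global YARN setting above, Spark also honours spark.yarn.maxAppAttempts (it must not exceed the YARN maximum); a small sketch with a made-up application name, equally settable via --conf on spark-submit:

    import org.apache.spark.SparkConf

    // Cap attempts for this application only; YARN's
    // yarn.resourcemanager.am.max-attempts remains the global upper bound.
    val conf = new SparkConf()
      .setAppName("my-app")
      .set("spark.yarn.maxAppAttempts", "1")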

Re: Why do I see five attempts on my Spark application

2017-12-13 Thread Subhash Sriram
There are some more properties specifically for YARN here: http://spark.apache.org/docs/latest/running-on-yarn.html Thanks, Subhash On Wed, Dec 13, 2017 at 2:32 PM, Subhash Sriram wrote: > http://spark.apache.org/docs/latest/configuration.html

Re: Why do I see five attempts on my Spark application

2017-12-13 Thread Subhash Sriram
http://spark.apache.org/docs/latest/configuration.html On Wed, Dec 13, 2017 at 2:31 PM, Toy wrote: > Hi, > > Can you point me to the config for that please?

Re: Why do I see five attempts on my Spark application

2017-12-13 Thread Toy
Hi, Can you point me to the config for that please? On Wed, 13 Dec 2017 at 14:23 Marcelo Vanzin wrote: > On Wed, Dec 13, 2017 at 11:21 AM, Toy wrote: > > I'm wondering why am I seeing 5 attempts for my Spark application? Does Spark application

Re: Why do I see five attempts on my Spark application

2017-12-13 Thread Marcelo Vanzin
On Wed, Dec 13, 2017 at 11:21 AM, Toy wrote: > I'm wondering why am I seeing 5 attempts for my Spark application? Does Spark application restart itself? It restarts itself if it fails (up to a limit that can be configured either per Spark application or globally in

Why do I see five attempts on my Spark application

2017-12-13 Thread Toy
Hi, I'm wondering why I am seeing 5 attempts for my Spark application. Does the Spark application restart itself? [image: Screen Shot 2017-12-13 at 2.18.03 PM.png]

Different behaviour when querying a spark DataFrame from dynamodb

2017-12-13 Thread Bogdan Cojocar
I am reading some data into a DataFrame from a DynamoDB table: val data = spark.read.dynamodb("table"); data.filter($"field1".like("%hello%")).createOrReplaceTempView("temp"); spark.sql("select * from temp").show() When I run the last statement, I get results. If, however, I try to do:
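The message is cut off, so the failing variant is unknown; as a guess, the sketch below contrasts the working form with the other common way of expressing the same query, i.e. registering the raw table and pushing the LIKE predicate into SQL (the dynamodb connector import, spark.implicits._ and the table name are assumed as in the post):

    // Form reported to work: filter via the DataFrame API, then register the view.
    val data = spark.read.dynamodb("table")
    data.filter($"field1".like("%hello%")).createOrReplaceTempView("temp")
    spark.sql("select * from temp").show()

    // A guess at the failing form: register the unfiltered table and filter in SQL.
    data.createOrReplaceTempView("raw")
    spark.sql("select * from raw where field1 like '%hello%'").show()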

Apache Spark documentation on mllib's Kmeans doesn't jibe.

2017-12-13 Thread Michael Segel
Hi, I just came across this while looking at the docs on how to use Spark's KMeans clustering. Note: this appears to be true in both the 2.1 and 2.2 documentation. The overview page: https://spark.apache.org/docs/2.1.0/mllib-clustering.html#k-means Here, the example contains the following line:

Re: Spark Streaming with Confluent

2017-12-13 Thread Gerard Maas
Hi Arkadiusz, Try 'rooting' your import. It looks like the import is being interpreted as relative to the previous one. 'Rooting' is done by adding the '_root_' prefix to your import: import org.apache.spark.streaming.kafka.KafkaUtils import
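Concretely, the suggested fix looks like this (sketch):

    // Anchoring the import at the root package prevents it from being resolved
    // relative to an enclosing package such as org.apache.
    import _root_.org.apache.spark.streaming.kafka.KafkaUtils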

Spark Streaming with Confluent

2017-12-13 Thread Arkadiusz Bicz
Hi, I am trying to test Spark Streaming 2.2.0 with Confluent 3.3.0, and I get a lot of errors during compilation. This is my sbt: lazy val sparkstreaming = (project in file(".")) .settings( name := "sparkstreaming", organization := "org.arek", version := "0.1-SNAPSHOT", scalaVersion
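For comparison, a hedged sketch of the sbt pieces usually needed for this combination; the Scala version and dependency list are assumptions (Spark 2.2.x is built for Scala 2.11, and the Confluent artifacts are only published in Confluent's own Maven repository, which is a frequent source of resolution errors):

    lazy val sparkstreaming = (project in file("."))
      .settings(
        name := "sparkstreaming",
        organization := "org.arek",
        version := "0.1-SNAPSHOT",
        scalaVersion := "2.11.11",
        resolvers += "confluent" at "http://packages.confluent.io/maven/",
        libraryDependencies ++= Seq(
          "org.apache.spark" %% "spark-streaming" % "2.2.0" % "provided",
          "org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.2.0",
          "io.confluent" % "kafka-avro-serializer" % "3.3.0"
        )
      )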

Determine Cook's distance / influential data points

2017-12-13 Thread Richard Siebeling
Hi, would it be possible to determine Cook's distance using Spark? Thanks, Richard
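For reference, the standard definition that any implementation would need to compute per observation (e_i = residual, h_ii = leverage from the hat matrix, p = number of model parameters, s^2 = mean squared error):

    D_i = \frac{e_i^2}{p \, s^2} \cdot \frac{h_{ii}}{(1 - h_{ii})^2}

The residuals are available from a fitted linear regression model in Spark; the leverages h_ii require the hat-matrix diagonal, which is the part that needs custom work.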

Re: Spark loads data from HDFS or S3

2017-12-13 Thread Jörn Franke
S3 can be operated more cheaply than HDFS on Amazon. As you correctly describe, it does not support data locality; the data is distributed to the workers. Depending on your use case, it can make sense to have HDFS as a temporary “cache” for S3 data.

Re: Spark loads data from HDFS or S3

2017-12-13 Thread Sebastian Nagel
> When Spark loads data from S3 (sc.textFile('s3://...')), how will all the data be spread over the workers? The data is read by the workers. Just make sure that the data is splittable, either by using a splittable format or by passing a list of files, e.g. sc.textFile('s3://.../*.txt'), to achieve full parallelism.
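A small sketch of that advice; the bucket, path and partition count are placeholders, and sc is an existing SparkContext:

    // A glob (or a comma-separated list of paths) lets every worker read a share
    // of the files; minPartitions can raise the parallelism further.
    val lines = sc.textFile("s3://my-bucket/data/*.txt", minPartitions = 64)
    println(lines.getNumPartitions)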

Spark loads data from HDFS or S3

2017-12-13 Thread Philip Lee
Hi, I have a few questions about the structure of HDFS and S3 when Spark loads data from these two storage systems. Generally, when Spark loads data from HDFS, HDFS supports data locality and already has the files distributed across datanodes, right? Spark can then just process the data on the workers. What about S3?