[Spark2.1] SparkStreaming to Cassandra performance problem

2018-04-28 Thread Saulo Sobreiro
Hi all, I am implementing a use case where I read some sensor data from Kafka with the Spark Streaming interface (KafkaUtils.createDirectStream) and, after some transformations, write the output (RDD) to Cassandra. Everything is working properly but I am having some trouble with performance.
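[Editor's note] A minimal sketch of the pipeline described, assuming the Kafka 0.10 direct-stream API and the DataStax spark-cassandra-connector; the topic name, keyspace, table, CSV payload format, and SensorReading case class are all illustrative, not from the original message:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}
import com.datastax.spark.connector._   // adds saveToCassandra to RDDs

case class SensorReading(id: String, ts: Long, value: Double)

object KafkaToCassandra {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("kafka-to-cassandra")
      .set("spark.cassandra.connection.host", "127.0.0.1") // illustrative host

    val ssc = new StreamingContext(conf, Seconds(5))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",
      "key.deserializer"  -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "sensor-consumers")

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("sensor-topic"), kafkaParams))

    stream
      .map { rec =>
        // Illustrative "id,timestamp,value" payload.
        val Array(id, ts, v) = rec.value().split(",")
        SensorReading(id, ts.toLong, v.toDouble)
      }
      .foreachRDD { rdd =>
        // One bulk write per micro-batch: the connector groups rows into
        // batches per partition, which is usually much faster than
        // opening a session and writing record by record.
        rdd.saveToCassandra("iot", "readings")
      }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

When throughput is the bottleneck, the connector's write-tuning settings (e.g. spark.cassandra.output.batch.size.rows and spark.cassandra.output.concurrent.writes) are a common place to start.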

Re: [Spark 2.x Core] .collect() size limit

2018-04-28 Thread Mark Hamstra
spark.driver.maxResultSize http://spark.apache.org/docs/latest/configuration.html On Sat, Apr 28, 2018 at 8:41 AM, klrmowse wrote: > i am currently trying to find a workaround for the Spark application i am > working on so that it does not have to use .collect() > > but,
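[Editor's note] For reference, a sketch of raising that setting at session construction; the values are illustrative. spark.driver.maxResultSize defaults to 1g in Spark 2.x and 0 disables the check entirely, though the driver heap (spark.driver.memory) must still be able to hold the collected data:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("collect-demo")
  // Cap on the total serialized size of results sent back to the driver;
  // a job's collect() is aborted once its results exceed this limit.
  .config("spark.driver.maxResultSize", "4g")
  // The driver JVM heap must also be large enough to deserialize the data.
  .config("spark.driver.memory", "8g")
  .getOrCreate()
```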

Re: A naive ML question

2018-04-28 Thread kant kodali
Hi, I mean a transaction typically goes through different states like STARTED, PENDING, CANCELLED, COMPLETED, SETTLED etc... Thanks, kant On Sat, Apr 28, 2018 at 4:11 AM, Jörn Franke wrote: > What do you mean by “how it evolved over time” ? A transaction describes >

Re: [Spark 2.x Core] .collect() size limit

2018-04-28 Thread Deepak Goel
I believe the virtualization of memory happens at the OS layer, hiding it completely from the application layer. On Sat, 28 Apr 2018, 22:22 Stephen Boesch, wrote: > While it is certainly possible to use VM I have seen in a number of places > warnings that collect() results must

Re: [Spark 2.x Core] .collect() size limit

2018-04-28 Thread Stephen Boesch
While it is certainly possible to use VM, I have seen in a number of places warnings that collect() results must be able to fit in memory. I'm not sure if that applies to *all* Spark calculations, but at the very least each of the specific collect()'s that are performed would need to be

Re: [Spark 2.x Core] .collect() size limit

2018-04-28 Thread Deepak Goel
There is something called *virtual memory*. On Sat, 28 Apr 2018, 21:19 Stephen Boesch, wrote: > Do you have a machine with terabytes of RAM? afaik collect() requires > RAM - so that would be your limiting factor. > > 2018-04-28 8:41 GMT-07:00 klrmowse : > >>

Re: [Spark 2.x Core] .collect() size limit

2018-04-28 Thread Stephen Boesch
Do you have a machine with terabytes of RAM? afaik collect() requires RAM - so that would be your limiting factor. 2018-04-28 8:41 GMT-07:00 klrmowse : > i am currently trying to find a workaround for the Spark application i am > working on so that it does not have to use

[Spark 2.x Core] .collect() size limit

2018-04-28 Thread klrmowse
I am currently trying to find a workaround for the Spark application I am working on so that it does not have to use .collect(). But, for now, it is going to have to use .collect(). What is the size limit (memory for the driver) of an RDD that .collect() can work with? I've been scouring

Sequence file to Image in spark

2018-04-28 Thread Selvam Raman
Hi All, I am trying to convert a sequence file to an image in Spark. I found that when I was reading a ByteArrayInputStream from the bytes it throws a serialization exception. Any insight will be helpful. scala> sc.sequenceFile[NullWritable,BytesWritable]("D:/seqImage").map(x =>
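[Editor's note] A sketch of one common fix for this kind of exception: Hadoop Writables are not Java-serializable and their backing buffers are reused between records, so copy the bytes out of each BytesWritable immediately and construct the ByteArrayInputStream inside the task rather than on the driver. The path and the ImageIO decoding step are illustrative:

```scala
import java.io.ByteArrayInputStream
import javax.imageio.ImageIO
import org.apache.hadoop.io.{BytesWritable, NullWritable}

val images = sc.sequenceFile[NullWritable, BytesWritable]("D:/seqImage")
  // copyBytes() returns a fresh Array[Byte], which IS serializable and
  // is not overwritten when Hadoop reuses the Writable for the next record.
  .map { case (_, bw) => bw.copyBytes() }
  .map { bytes =>
    // The stream is created inside the executor task, so it never has to
    // be serialized and shipped across the wire.
    ImageIO.read(new ByteArrayInputStream(bytes))
  }
```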

Re: Dataframe vs dataset

2018-04-28 Thread Michael Artz
Ok, from the language you used, you are saying kind of that Dataset is a subset of Dataframe. I would disagree because to me a DataFrame is just a Dataset of org.apache.spark.sql.Row. On Sat, Apr 28, 2018, 8:34 AM Marco Mistroni wrote: > Imho .neither..I see datasets as
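[Editor's note] This point is visible in Spark's own source, where DataFrame is literally a type alias: `type DataFrame = Dataset[Row]` in the org.apache.spark.sql package object. A plain-Scala toy mirroring that definition (the Row and Dataset classes here are simplified stand-ins, not the real Spark types):

```scala
object DataFrameAliasDemo {
  // Stand-ins for org.apache.spark.sql.Row and org.apache.spark.sql.Dataset.
  case class Row(values: Seq[Any])
  class Dataset[T](val data: Seq[T])

  // Mirrors Spark 2.x: type DataFrame = Dataset[Row]
  type DataFrame = Dataset[Row]

  def main(args: Array[String]): Unit = {
    val ds: Dataset[Row] = new Dataset(Seq(Row(Seq(1, "a"))))
    val df: DataFrame = ds // compiles with no conversion: same type
    println(df.data.length) // prints 1
  }
}
```

So a DataFrame is not a separate abstraction at all; it is the specific Dataset whose element type is Row.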

Re: Dataframe vs dataset

2018-04-28 Thread Marco Mistroni
Imho, neither. I see datasets as typed DFs and therefore DSs are enhanced DFs. Feel free to disagree. Kr On Sat, Apr 28, 2018, 2:24 PM Michael Artz wrote: > Hi, > > I use Spark everyday and I have a good grip on the basics of Spark, so > this question isnt for myself. But

Dataframe vs dataset

2018-04-28 Thread Michael Artz
Hi, I use Spark every day and I have a good grip on the basics of Spark, so this question isn't for myself. But this came up and I wanted to see what other Spark users would say, and I don't want to influence your answer. And SO is weird about polls. The question is "Which one do you feel is

Re: A naive ML question

2018-04-28 Thread Jörn Franke
What do you mean by “how it evolved over time”? A transaction describes basically an action at a certain point in time. Do you mean how a financial product evolved over time given a set of transactions? > On 28. Apr 2018, at 12:46, kant kodali wrote: > > Hi All, > > I

A naive ML question

2018-04-28 Thread kant kodali
Hi All, I have a bunch of financial transactional data and I was wondering if there is any ML model that can give me a graph structure for this data? In other words, show how a transaction has evolved over time? Any suggestions or references would help. Thanks!