[Spark2.1] SparkStreaming to Cassandra performance problem

2018-04-28 Thread Saulo Sobreiro
Hi all, I am implementing a use case where I read some sensor data from Kafka with the Spark Streaming interface (KafkaUtils.createDirectStream) and, after some transformations, write the output (RDD) to Cassandra. Everything is working properly but I am having some trouble with performance.
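[Editor's note] A minimal sketch of the pipeline described, assuming the Kafka 0.10 direct-stream API and the DataStax spark-cassandra-connector; the topic name, keyspace, table, CSV payload format, and SensorReading case class are all illustrative, not from the original message:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}
import com.datastax.spark.connector._   // adds saveToCassandra to RDDs

case class SensorReading(id: String, ts: Long, value: Double)

object KafkaToCassandra {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("kafka-to-cassandra")
      .set("spark.cassandra.connection.host", "127.0.0.1") // illustrative host

    val ssc = new StreamingContext(conf, Seconds(5))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",
      "key.deserializer"  -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "sensor-consumers")

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("sensor-topic"), kafkaParams))

    stream
      .map { rec =>
        // Illustrative "id,timestamp,value" payload.
        val Array(id, ts, v) = rec.value().split(",")
        SensorReading(id, ts.toLong, v.toDouble)
      }
      .foreachRDD { rdd =>
        // One bulk write per micro-batch: the connector groups rows into
        // batches per partition, which is usually much faster than
        // opening a session and writing record by record.
        rdd.saveToCassandra("iot", "readings")
      }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

When throughput is the bottleneck, the connector's write-tuning settings (e.g. spark.cassandra.output.batch.size.rows and spark.cassandra.output.concurrent.writes) are a common place to start.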

Re: [Spark 2.x Core] .collect() size limit

2018-04-28 Thread Mark Hamstra
spark.driver.maxResultSize http://spark.apache.org/docs/latest/configuration.html On Sat, Apr 28, 2018 at 8:41 AM, klrmowse wrote: > i am currently trying to find a workaround for the Spark application i am > working on so that it does not have to use .collect() > > but,
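[Editor's note] For reference, a sketch of raising that setting at session construction; the values are illustrative. spark.driver.maxResultSize defaults to 1g in Spark 2.x and 0 disables the check entirely, though the driver heap (spark.driver.memory) must still be able to hold the collected data:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("collect-demo")
  // Cap on the total serialized size of results sent back to the driver;
  // a job's collect() is aborted once its results exceed this limit.
  .config("spark.driver.maxResultSize", "4g")
  // The driver JVM heap must also be large enough to deserialize the data.
  .config("spark.driver.memory", "8g")
  .getOrCreate()
```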

Re: A naive ML question

2018-04-28 Thread kant kodali
Hi, I mean a transaction typically goes through different states like STARTED, PENDING, CANCELLED, COMPLETED, SETTLED etc... Thanks, kant On Sat, Apr 28, 2018 at 4:11 AM, Jörn Franke wrote: > What do you mean by “how it evolved over time” ? A transaction describes >

Re: [Spark 2.x Core] .collect() size limit

2018-04-28 Thread Deepak Goel
I believe the virtualization of memory happens at the OS layer, hiding it completely from the application layer. On Sat, 28 Apr 2018, 22:22 Stephen Boesch, wrote: > While it is certainly possible to use VM I have seen in a number of places > warnings that collect() results must

Re: [Spark 2.x Core] .collect() size limit

2018-04-28 Thread Stephen Boesch
While it is certainly possible to use VM, I have seen in a number of places warnings that collect() results must be able to fit in memory. I'm not sure if that applies to *all* Spark calculations, but at the very least each of the specific collect()'s that are performed would need to be

Re: [Spark 2.x Core] .collect() size limit

2018-04-28 Thread Deepak Goel
There is something called *virtual memory*. On Sat, 28 Apr 2018, 21:19 Stephen Boesch, wrote: > Do you have a machine with terabytes of RAM? afaik collect() requires > RAM - so that would be your limiting factor. > > 2018-04-28 8:41 GMT-07:00 klrmowse : > >>

Re: [Spark 2.x Core] .collect() size limit

2018-04-28 Thread Stephen Boesch
Do you have a machine with terabytes of RAM? afaik collect() requires RAM - so that would be your limiting factor. 2018-04-28 8:41 GMT-07:00 klrmowse : > i am currently trying to find a workaround for the Spark application i am > working on so that it does not have to use

[Spark 2.x Core] .collect() size limit

2018-04-28 Thread klrmowse
I am currently trying to find a workaround for the Spark application I am working on so that it does not have to use .collect(). But, for now, it is going to have to use .collect(). What is the size limit (memory for the driver) of an RDD that .collect() can work with? I've been scouring

Sequence file to Image in spark

2018-04-28 Thread Selvam Raman
Hi All, I am trying to convert a sequence file to an image in Spark. I found that when I was reading a ByteArrayInputStream from the bytes it throws a serialization exception. Any insight will be helpful. scala> sc.sequenceFile[NullWritable,BytesWritable]("D:/seqImage").map(x =>
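[Editor's note] A sketch of one common fix for this kind of exception: Hadoop Writables are not Java-serializable and their backing buffers are reused between records, so copy the bytes out of each BytesWritable immediately and construct the ByteArrayInputStream inside the task rather than on the driver. The path and the ImageIO decoding step are illustrative:

```scala
import java.io.ByteArrayInputStream
import javax.imageio.ImageIO
import org.apache.hadoop.io.{BytesWritable, NullWritable}

val images = sc.sequenceFile[NullWritable, BytesWritable]("D:/seqImage")
  // copyBytes() returns a fresh Array[Byte], which IS serializable and
  // is not overwritten when Hadoop reuses the Writable for the next record.
  .map { case (_, bw) => bw.copyBytes() }
  .map { bytes =>
    // The stream is created inside the executor task, so it never has to
    // be serialized and shipped across the wire.
    ImageIO.read(new ByteArrayInputStream(bytes))
  }
```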

Re: Dataframe vs dataset

2018-04-28 Thread Michael Artz
Ok, from the language you used, you are saying kind of that Dataset is a subset of Dataframe. I would disagree because to me a DataFrame is just a Dataset of org.apache.spark.sql.Row. On Sat, Apr 28, 2018, 8:34 AM Marco Mistroni wrote: > Imho .neither..I see datasets as
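[Editor's note] This point is visible in Spark's own source, where DataFrame is literally a type alias: `type DataFrame = Dataset[Row]` in the org.apache.spark.sql package object. A plain-Scala toy mirroring that definition (the Row and Dataset classes here are simplified stand-ins, not the real Spark types):

```scala
object DataFrameAliasDemo {
  // Stand-ins for org.apache.spark.sql.Row and org.apache.spark.sql.Dataset.
  case class Row(values: Seq[Any])
  class Dataset[T](val data: Seq[T])

  // Mirrors Spark 2.x: type DataFrame = Dataset[Row]
  type DataFrame = Dataset[Row]

  def main(args: Array[String]): Unit = {
    val ds: Dataset[Row] = new Dataset(Seq(Row(Seq(1, "a"))))
    val df: DataFrame = ds // compiles with no conversion: same type
    println(df.data.length) // prints 1
  }
}
```

So a DataFrame is not a separate abstraction at all; it is the specific Dataset whose element type is Row.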

Re: Dataframe vs dataset

2018-04-28 Thread Marco Mistroni
Imho, neither. I see datasets as typed DFs and therefore DSs are enhanced DFs. Feel free to disagree. Kr On Sat, Apr 28, 2018, 2:24 PM Michael Artz wrote: > Hi, > > I use Spark everyday and I have a good grip on the basics of Spark, so > this question isnt for myself. But

Dataframe vs dataset

2018-04-28 Thread Michael Artz
Hi, I use Spark every day and I have a good grip on the basics of Spark, so this question isn't for myself. But this came up and I wanted to see what other Spark users would say, and I don't want to influence your answer. And SO is weird about polls. The question is "Which one do you feel is

Re: A naive ML question

2018-04-28 Thread Jörn Franke
What do you mean by “how it evolved over time”? A transaction describes basically an action at a certain point in time. Do you mean how a financial product evolved over time given a set of transactions? > On 28. Apr 2018, at 12:46, kant kodali wrote: > > Hi All, > > I

A naive ML question

2018-04-28 Thread kant kodali
Hi All, I have a bunch of financial transactional data and I was wondering if there is any ML model that can give me a graph structure for this data? In other words, show how a transaction has evolved over time? Any suggestions or references would help. Thanks!