[python 2.4.3] correlation matrix

2019-08-28 Thread Rishi Shah
Hi All, What is the best way to calculate a correlation matrix? -- Regards, Rishi Shah
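Since the question is tagged with plain Python rather than a specific library, here is a minimal self-contained sketch of a Pearson correlation matrix using only the standard library. (In a Spark context one would more likely use `pyspark.ml.stat.Correlation` or `DataFrame.stat.corr`; with NumPy, `numpy.corrcoef` does the same thing. The function names below are illustrative, not from any library.)

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / float(n)
    my = sum(y) / float(n)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_matrix(columns):
    """Symmetric matrix of pairwise Pearson correlations between columns."""
    return [[pearson(c1, c2) for c2 in columns] for c1 in columns]
```

The diagonal is always 1.0, and perfectly anti-correlated columns give -1.0.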

Re: Structured Streaming Dataframe Size

2019-08-28 Thread Nick Dawes
Thank you, TD. A couple of follow-up questions, please. 1) "It only keeps around the minimal intermediate state data" How do you define "minimal" here? Is there a configuration property to control the time or size of the Streaming Dataframe? 2) I'm not writing anything out to any database or S3. My
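For context on what bounds the "minimal" state: in Structured Streaming the usual mechanism is a watermark (`withWatermark` on an event-time column), which lets Spark drop aggregation state for windows that can no longer receive data. The following is a toy pure-Python analogy of that eviction rule, not Spark's actual implementation; all names are illustrative.

```python
def run_windowed_counts(events, watermark_delay):
    """Toy analogy of watermarked streaming aggregation.

    'events' is a list of event times; state holds a count per event-time
    bucket. The watermark trails the max observed event time by
    watermark_delay: late data behind it is dropped, and state behind it
    is evicted. That eviction is what keeps the state 'minimal'.
    """
    state = {}          # event-time bucket -> count
    max_event_time = 0
    for event_time in events:
        max_event_time = max(max_event_time, event_time)
        watermark = max_event_time - watermark_delay
        if event_time >= watermark:   # data later than the watermark is dropped
            state[event_time] = state.get(event_time, 0) + 1
        for bucket in list(state):    # evict buckets behind the watermark
            if bucket < watermark:
                del state[bucket]
    return state
```

With a delay of 5, an event at time 10 advances the watermark to 5 and evicts the buckets for times 1-3, so only recent state survives.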

Re: Caching tables in spark

2019-08-28 Thread Tzahi File
I mean two separate spark jobs. On Wed, Aug 28, 2019 at 2:25 PM Subash Prabakar wrote: > When you mean by process is it two separate spark jobs? Or two stages within same spark code? > Thanks, Subash > On Wed, 28 Aug 2019 at 19:06, wrote: >> Take a look at this article

Re: Caching tables in spark

2019-08-28 Thread Subash Prabakar
When you mean by process is it two separate spark jobs? Or two stages within same spark code? Thanks Subash On Wed, 28 Aug 2019 at 19:06, wrote: > Take a look at this article > https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-rdd-caching.html > *From:* Tzahi File

RE: Caching tables in spark

2019-08-28 Thread email
Take a look at this article https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-rdd-caching.html From: Tzahi File Sent: Wednesday, August 28, 2019 5:18 AM To: user Subject: Caching tables in spark Hi, Looking for your knowledge with some question. I have 2

Caching tables in spark

2019-08-28 Thread Tzahi File
Hi, I'm looking for your advice on a question. I have 2 different processes that read from the same raw data table (around 1.5 TB). Is there a way to read this data once, cache it somehow, and use it in both processes? Thanks -- Tzahi File Data Engineer ironSource
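One point worth noting for this thread: a cached DataFrame (`df.cache()`) lives inside a single Spark application, so two separate Spark jobs cannot share it directly. A common workaround is to materialize the prepared data once (e.g., to Parquet, or to a shared store such as Alluxio) and have both jobs read that snapshot. Below is a toy pure-Python analogy of the "materialize once, reuse many times" pattern; the function and file names are illustrative, not Spark APIs.

```python
import json
import os
import tempfile

def expensive_load():
    """Stand-in for the costly scan of the 1.5 TB raw table."""
    return [{"id": i, "value": i * i} for i in range(5)]

def materialize_once(path):
    """First caller pays the scan cost and writes a snapshot; later
    callers just read the file, the way two Spark jobs can share one
    Parquet snapshot instead of each re-reading the raw table."""
    if not os.path.exists(path):
        with open(path, "w") as f:
            json.dump(expensive_load(), f)
    with open(path) as f:
        return json.load(f)

snapshot = os.path.join(tempfile.mkdtemp(), "raw_snapshot.json")
job_a = materialize_once(snapshot)   # pays the load cost
job_b = materialize_once(snapshot)   # reuses the snapshot
```

Within one application, `df.cache()` (or Spark SQL's `CACHE TABLE`) achieves the same reuse across two stages without any intermediate files.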

What is directory "/path/_spark_metadata" for?

2019-08-28 Thread Mark Zhao
Hey, When running Spark on Alluxio-1.8.2, I encounter the following exception in the Alluxio master.log: "alluxio.exception.FileDoesNotExistException: Path '/test-data/_spark_metadata' does not exist". What exactly is the "_spark_metadata" directory used for? And how can I fix this problem?

Low cache hit ratio when running Spark on Alluxio

2019-08-28 Thread Jerry Yan
Hi, We are running Spark jobs on an Alluxio cluster which is serving 13 gigabytes of data, with 99% of the data in memory. I was hoping to speed up the Spark jobs by reading the in-memory data in Alluxio, but found that the Alluxio local hit rate is only 1.68%, while the Alluxio remote hit rate is 98.32%.

How to improve loading data into Cassandra table in this scenario?

2019-08-28 Thread Shyam P
Updated the issue content: https://stackoverflow.com/questions/57684972/how-to-improve-performance-my-spark-job-here-to-load-data-into-cassandra-table Thank you.